# Accessing and Cleaning Data in Data Frames

Now that we know how to create and store a data frame using Pandas, we will focus on accessing and changing its data. Data cleaning accounts for a significant portion of a data scientist's or analyst's work; datasets (or portions of them) are often unnecessarily large, redundant, or irrelevant to a given task, so we clean them. This notebook covers some basic data cleaning; more complex methods will be covered in upcoming notebooks.

---

Our first task is to change the row indexing of a data frame. For example, in our previous notebook, we made a data frame for the top 10 happiest countries in the world. It has a "Ranking" column with values from 1 to 10, and the unlabeled leftmost column indexes each row from 0 to 9. Check it out below:

In [2]:
import pandas as pd
loc = "../DataSets/Simple/top-ten-happy-countries-forbes.xlsx"
df = pd.read_excel(loc)
df

,Ranking,Country,Happy Score
0,1,Finland,7.632
1,2,Norway,7.594
2,3,Denmark,7.555
3,4,Iceland,7.495
4,5,Switzerland,7.487
5,6,Netherlands,7.441
6,7,Canada,7.382
7,8,New Zealand,7.324
8,9,Sweden,7.314
9,10,Australia,7.272
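The 0 to 9 labels in the leftmost column are the default row index that Pandas assigns when none is specified. Here is a minimal sketch of how to inspect it, assuming a small hand-built frame (`sample`, a hypothetical name holding only the first three rows) in place of the Excel file:

```python
import pandas as pd

# Hand-built stand-in for the first three rows of the table (assumption:
# the Excel file itself is not needed to illustrate the default index).
sample = pd.DataFrame({
    "Ranking": [1, 2, 3],
    "Country": ["Finland", "Norway", "Denmark"],
    "Happy Score": [7.632, 7.594, 7.555],
})

# With no index specified, Pandas labels the rows 0, 1, 2, ...
default_labels = list(sample.index)
print(sample.index)
print(default_labels)

# "Ranking" is still an ordinary column at this point.
print(list(sample.columns))
```

The index is stored separately from the columns, which is why it shows up without a column name in the displayed table.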


Let's modify the row index so that it displays the ranking. We will do this by setting the `index` of our data frame to the "Ranking" column. Then we use Python's `del` statement to remove the "Ranking" column, which would otherwise duplicate the index.

In [4]:
df.index = df["Ranking"]
del df["Ranking"]
df

Ranking,Country,Happy Score
1,Finland,7.632
2,Norway,7.594
3,Denmark,7.555
4,Iceland,7.495
5,Switzerland,7.487
6,Netherlands,7.441
7,Canada,7.382
8,New Zealand,7.324
9,Sweden,7.314
10,Australia,7.272
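The same result can likely be reached in a single step with the `set_index` method, which moves a column into the index and removes it from the columns. This is an alternative to the assignment and `del` used in this notebook, not a replacement for it. The sketch below assumes a small hand-built frame (`sample` and `ranked` are hypothetical names) so that it runs without the Excel file:

```python
import pandas as pd

# Hand-built stand-in for the first three rows of the table (assumption:
# used only so the example runs without the Excel file).
sample = pd.DataFrame({
    "Ranking": [1, 2, 3],
    "Country": ["Finland", "Norway", "Denmark"],
    "Happy Score": [7.632, 7.594, 7.555],
})

# set_index returns a new frame with "Ranking" as the index and drops
# that column by default; the original frame is left unchanged.
ranked = sample.set_index("Ranking")
print(ranked)

# Rows can now be looked up by ranking with label-based .loc access.
top_country = ranked.loc[1, "Country"]
print(top_country)
```

Because `set_index` returns a new data frame rather than modifying the existing one, the result has to be assigned to a name (here `ranked`) to be kept.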


---