# PyTutorial 3.1 - Indexes

In this module, we will learn how to set, reset and use indexes.

In [55]:
import pandas as pd

people = {
    "first": ["Ben", "Han", "Luke"],
    "last": ["Kenobi", "Solo", "Skywalker"],
    "email": ["Benkenobi@email.com", "Hansolo@email.com", "Lukeskywalker@email.com"],
}
df = pd.DataFrame(people)

In [56]:
df

,first,last,email
0,Ben,Kenobi,Benkenobi@email.com
1,Han,Solo,Hansolo@email.com
2,Luke,Skywalker,Lukeskywalker@email.com


On the far left, you can see a column without a name. This is the index. The index identifies each row in the data frame. As you can see, the index is currently a range of numbers from 0 to 2.

We can change the index to something that better identifies each row.

In [57]:
df["email"]
# Shows the email column for every row.

0        Benkenobi@email.com
1          Hansolo@email.com
2    Lukeskywalker@email.com
Name: email, dtype: object

In [58]:
# Now we can make the email column the index.
df.set_index("email")

email,first,last
Benkenobi@email.com,Ben,Kenobi
Hansolo@email.com,Han,Solo
Lukeskywalker@email.com,Luke,Skywalker


In [59]:
# This doesn't permanently change the data frame. To do that, pass inplace=True after the name of the column you want as the index.

df.set_index("email", inplace=True)
df

email,first,last
Benkenobi@email.com,Ben,Kenobi
Hansolo@email.com,Han,Solo
Lukeskywalker@email.com,Luke,Skywalker
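
An alternative to inplace=True is to assign the result of set_index to a variable. The sketch below rebuilds the same data frame and shows that set_index on its own leaves the original untouched; the variable name `indexed` is only an illustration and does not appear elsewhere in this tutorial.

```python
import pandas as pd

people = {
    "first": ["Ben", "Han", "Luke"],
    "last": ["Kenobi", "Solo", "Skywalker"],
    "email": ["Benkenobi@email.com", "Hansolo@email.com", "Lukeskywalker@email.com"],
}
df = pd.DataFrame(people)

# set_index returns a new data frame; df itself is not modified.
# "indexed" is a hypothetical name chosen for this example.
indexed = df.set_index("email")

print(list(df.columns))       # the original still has the email column
print(list(indexed.columns))  # in the copy, email is no longer a column
print(indexed.index.name)     # because it is now the index
```

Either approach works; assigning the result keeps the original data frame available in case you still need it.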


In [60]:
# To see your index, use df.index (which will now show the emails).
df.index

Index(['Benkenobi@email.com', 'Hansolo@email.com', 'Lukeskywalker@email.com'], dtype='object', name='email')

In [61]:
# An email is a much more specific label, and if you're searching for information, you can simply use the loc indexer: df.loc['index label']. This returns all of the data in that row.
df.loc["Lukeskywalker@email.com"]

first         Luke
last     Skywalker
Name: Lukeskywalker@email.com, dtype: object

If you try to use df.loc[0], it raises a KeyError because you changed the index from the numbers 0-2 to the emails. The iloc indexer looks rows up by their integer position instead, whatever the labels are.

In [62]:
df.iloc[1]

first     Han
last     Solo
Name: Hansolo@email.com, dtype: object
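
To see the difference between label lookup and position lookup side by side, here is a small sketch. It rebuilds the email-indexed data frame so it can run on its own; the names `label_found` and `first_row` are just for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "first": ["Ben", "Han", "Luke"],
        "last": ["Kenobi", "Solo", "Skywalker"],
        "email": ["Benkenobi@email.com", "Hansolo@email.com", "Lukeskywalker@email.com"],
    }
).set_index("email")

# loc looks rows up by label. 0 is no longer a label, so this fails.
try:
    df.loc[0]
    label_found = True
except KeyError:
    label_found = False

# iloc looks rows up by position, so 0 still means "the first row".
first_row = df.iloc[0]

print(label_found)
print(first_row["first"], first_row["last"])
```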

If you want to reset the index, you can use the df.reset_index method. Don't forget to use inplace=True if you want to make these changes permanent.

In [63]:
df.reset_index(inplace=True)
df

,email,first,last
0,Benkenobi@email.com,Ben,Kenobi
1,Hansolo@email.com,Han,Solo
2,Lukeskywalker@email.com,Luke,Skywalker


If you are importing data, you can skip df.set_index and pass an extra argument where you load your data instead. In the line that loads the file, replace 'Column' with the name of the column you want to be the index. It would look like this:

df = pd.read_csv('folder/name_of_file.csv', index_col='Column')
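
Since that line needs a real file to run, here is a self-contained sketch of the same idea. It assumes a CSV with the same three columns as our people data and uses io.StringIO as a stand-in for a file on disk; the CSV text and the name `df_csv` are made up for this example.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a file on disk.
csv_text = """first,last,email
Ben,Kenobi,Benkenobi@email.com
Han,Solo,Hansolo@email.com
Luke,Skywalker,Lukeskywalker@email.com
"""

# index_col names the column to use as the index while loading.
df_csv = pd.read_csv(io.StringIO(csv_text), index_col="email")

print(df_csv.index.name)
print(df_csv.loc["Hansolo@email.com", "last"])
```

With a real file, you would pass the file path in place of io.StringIO(csv_text).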

In [67]:
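# You can use .sort_index to sort your data by its index.
df.sort_index(ascending=True)
# ascending=True sorts the index from lowest to highest (alphabetically for text labels), which might make your data easier to read. The index here is already 0-2, so the order does not change.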

,email,first,last
0,Benkenobi@email.com,Ben,Kenobi
1,Hansolo@email.com,Han,Solo
2,Lukeskywalker@email.com,Luke,Skywalker
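
Sorting is easier to see when the index holds text and the order is reversed. The sketch below sets the emails as the index again and sorts with ascending=False; the name `descending` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "first": ["Ben", "Han", "Luke"],
        "last": ["Kenobi", "Solo", "Skywalker"],
        "email": ["Benkenobi@email.com", "Hansolo@email.com", "Lukeskywalker@email.com"],
    }
).set_index("email")

# ascending=False sorts the index labels in reverse alphabetical order.
# sort_index returns a new data frame; df keeps its original order.
descending = df.sort_index(ascending=False)

print(list(descending.index))
print(list(df.index))
```

As with set_index, the sorted result is only kept if you assign it to a variable or pass inplace=True.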
