In [2]:
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@gmail.com", "JohnDoe@gmail.com"]
}

In [3]:
import pandas as pd

In [4]:
df = pd.DataFrame(people)

### DataFrame with the default index
* By default the DataFrame gets an integer index: **0, 1, 2, ...**
* Each index value is a unique identifier for its row

In [5]:
df

,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@gmail.com
2,John,Doe,JohnDoe@gmail.com


* Sometimes it makes more sense to identify rows by a different value, one that acts as a meaningful label for each row
* pandas does not force index values to be unique, but in practice they usually are
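To illustrate the point about uniqueness, here is a minimal sketch. The small `dup` frame and its repeated label are made up for this example and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical data: both rows deliberately share the index label "Doe".
dup = pd.DataFrame(
    {
        "first": ["Jane", "John"],
        "email": ["JaneDoe@gmail.com", "JohnDoe@gmail.com"],
    },
    index=["Doe", "Doe"],
)

# pandas accepts the duplicate labels; is_unique reports whether they repeat.
index_is_unique = dup.index.is_unique

# Selecting a duplicated label returns every matching row as a DataFrame.
doe_rows = dup.loc["Doe"]

print(index_is_unique)
print(len(doe_rows))
```

Checking `index.is_unique` before relying on label lookups is one way to catch accidental duplicates early.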

In [6]:
df['email']

0    CoreyMSchafer@gmail.com
1          JaneDoe@gmail.com
2          JohnDoe@gmail.com
Name: email, dtype: object

### DataFrame with a custom index
* We can set one by using the **set_index('column_name')** method
* Let's use the emails as the index
* So we will do ==> df.set_index('email')

In [7]:
df.set_index('email')

email,first,last
CoreyMSchafer@gmail.com,Corey,Schafer
JaneDoe@gmail.com,Jane,Doe
JohnDoe@gmail.com,John,Doe


* Now the email column is set as the index. It can still look like a regular column because the index keeps the column's name, **email**
* We can display df again to check whether the original DataFrame changed

In [8]:
df

,first,last,email
0,Corey,Schafer,CoreyMSchafer@gmail.com
1,Jane,Doe,JaneDoe@gmail.com
2,John,Doe,JohnDoe@gmail.com


* Here we see the DataFrame didn't actually change; it still has the default index, because set_index returns a new DataFrame
* To change the index on df itself we do **df.set_index('email', inplace=True)**
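As an alternative to `inplace=True`, the result of `set_index` can be assigned back to a variable. The sketch below uses a separate, hypothetical `df_demo` so it does not interfere with the notebook's `df`.

```python
import pandas as pd

people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@gmail.com", "JohnDoe@gmail.com"],
}
df_demo = pd.DataFrame(people)  # hypothetical name, used only in this sketch

# set_index returns a new DataFrame; df_demo itself still has an email column.
indexed = df_demo.set_index("email")
columns_before = list(df_demo.columns)

# Reassigning keeps the result without using inplace=True.
df_demo = df_demo.set_index("email")
columns_after = list(df_demo.columns)

print(columns_before)
print(columns_after)
print(df_demo.index.name)
```

Both approaches should leave the frame indexed by email; which one to use is largely a matter of style.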

In [10]:
df.set_index('email', inplace=True)

In [11]:
df

email,first,last
CoreyMSchafer@gmail.com,Corey,Schafer
JaneDoe@gmail.com,Jane,Doe
JohnDoe@gmail.com,John,Doe


In [12]:
df.index

Index(['CoreyMSchafer@gmail.com', 'JaneDoe@gmail.com', 'JohnDoe@gmail.com'], dtype='object', name='email')

### Why are custom indexes important/useful?
* **'email'** above gives us a nice unique identifier for the rows
* When we use **.loc** to search the DataFrame by label, these index values are the row labels
* Now that the index is set to the email, the default integer labels are gone
* So df.loc[0] raises an error ==> **KeyError** in recent pandas versions (older versions raised **TypeError**)
* Position-based access with **.iloc** still works, e.g. df.iloc[0]
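The failed integer lookup can be sketched as follows. The two-row `df_demo` is a hypothetical stand-in for the notebook's `df`, and the code catches both `KeyError` and `TypeError` because the exact exception type is assumed to depend on the installed pandas version.

```python
import pandas as pd

# Hypothetical stand-in for the email-indexed DataFrame.
df_demo = pd.DataFrame(
    {"first": ["Corey", "Jane"], "last": ["Schafer", "Doe"]},
    index=pd.Index(["CoreyMSchafer@gmail.com", "JaneDoe@gmail.com"], name="email"),
)

# No row is labelled 0 any more, so a label-based lookup fails.
try:
    df_demo.loc[0]
    lookup_failed = False
except (KeyError, TypeError) as exc:
    lookup_failed = True
    print(type(exc).__name__)

# iloc selects by position, so the first row is still reachable.
first_row = df_demo.iloc[0]
print(first_row["first"])
```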

In [13]:
df.loc['CoreyMSchafer@gmail.com']

first      Corey
last     Schafer
Name: CoreyMSchafer@gmail.com, dtype: object

In [14]:
df.loc['CoreyMSchafer@gmail.com', 'last']

'Schafer'

In [16]:
df.iloc[0]

first      Corey
last     Schafer
Name: CoreyMSchafer@gmail.com, dtype: object

### Resetting the index
* We can reset the index by using ==> **df.reset_index(inplace=True)**

In [17]:
df.reset_index(inplace=True)
df

,email,first,last
0,CoreyMSchafer@gmail.com,Corey,Schafer
1,JaneDoe@gmail.com,Jane,Doe
2,JohnDoe@gmail.com,John,Doe
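By default `reset_index` moves the old index back into a regular column, as the output above shows. A short sketch of the difference between that default and `drop=True`, which discards the old index instead, again using a hypothetical `df_demo`:

```python
import pandas as pd

# Hypothetical stand-in for the email-indexed DataFrame.
df_demo = pd.DataFrame(
    {"first": ["Corey", "Jane"], "last": ["Schafer", "Doe"]},
    index=pd.Index(["CoreyMSchafer@gmail.com", "JaneDoe@gmail.com"], name="email"),
)

# Default: the email index becomes a column again.
restored = df_demo.reset_index()

# drop=True: the email values are discarded and only 0, 1, ... remain.
dropped = df_demo.reset_index(drop=True)

print(list(restored.columns))
print(list(dropped.columns))
```

Use `drop=True` only when the old index values are no longer needed, since they are not kept anywhere in the result.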
