## Changing index of the DataFrame

- When we create a DataFrame or import a dataset with pandas, it automatically adds a default `index` that labels each row.

- By default this index starts at 0 and runs to n-1, where n is the number of rows. Columns without names are numbered the same way.

- The index makes it possible to look up specific rows in the DataFrame by label, just as column labels identify specific columns.

- Setting the index to one of the existing columns makes it easier to retrieve specific rows by label, and it also helps with other operations such as sorting by index.
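The points above can be sketched with a small frame. The data is typed in by hand here to mirror the first rows of the student data used in this section (an assumption; the tutorial reads it from a CSV file), and the variable names are illustrative only.

```python
import pandas as pd

# Small inline frame mirroring the first rows of the student data
# (built by hand here instead of being read from a CSV file).
df = pd.DataFrame({
    "SNAME": ["A", "B", "C"],
    "SCLASS": [10, 12, 11],
    "ENGLISH": [67, 87, 66],
})

# With no index specified, pandas assigns a RangeIndex from 0 to n-1.
print(df.index)            # RangeIndex(start=0, stop=3, step=1)

# Rows are looked up by index label with .loc.
row_by_default_label = df.loc[1]

# After making SNAME the index, the same row can be fetched by name.
by_name = df.set_index("SNAME")
english_of_b = by_name.loc["B", "ENGLISH"]
print(english_of_b)        # 87
```

Looking a row up by a meaningful label such as `"B"` is usually easier to read than remembering its integer position.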

**The Index column**

- Let’s take a simple example by importing a CSV file that contains some data into pandas.

In [2]:
import pandas as pd

In [3]:
pd.read_csv(r"data\std_data_nocols.csv",header=None)

Unnamed: 0,0,1,2,3,4
0,A,10,67,66,66.1
1,B,12,87,56,65.0
2,C,11,66,89,63.0
3,D,10,56,88,66.0
4,E,10,89,90,56.0
5,F,9,88,85,89.0
6,G,12,90,83,88.0
7,H,11,79,88,90.0


- The DataFrame above has no column names, so pandas assigns default column labels that start from 0 and continue to the number of columns minus one.

- The same applies to rows: to the left of the DataFrame, the index starts from 0 and continues to the number of rows minus one.
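A minimal sketch of the same behaviour without needing the file on disk: the CSV text below is a hypothetical in-memory stand-in for a headerless file, and `names` is shown as one optional way to supply column labels at read time.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file that has no header row.
csv_text = "A,10,67\nB,12,87\nC,11,66\n"

no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# Both axes receive default integer labels starting at 0.
print(list(no_header.columns))   # [0, 1, 2]
print(list(no_header.index))     # [0, 1, 2]

# Optionally, column names can be supplied while reading.
named = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["SNAME", "SCLASS", "ENGLISH"],
)
print(list(named.columns))       # ['SNAME', 'SCLASS', 'ENGLISH']
```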

**Specify the index column when reading**

- In many cases, our data source is a CSV file. Suppose that we have a file named `std_data1.csv` that contains the following data.

In [4]:
pd.read_csv(r"data\std_data1.csv")

Unnamed: 0,SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
0,A,10,67,66,66
1,B,12,87,56,65
2,C,11,66,89,63
3,D,10,56,88,66
4,E,10,89,90,56
5,F,9,88,85,89
6,G,12,90,83,88
7,H,11,79,88,90


- An untitled column of numbers has been created to the left of the `SNAME` column. This is the index of our DataFrame.

In [5]:
pd.read_csv(r"data\std_data1.csv",index_col="SNAME")

SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
A,10,67,66,66
B,12,87,56,65
C,11,66,89,63
D,10,56,88,66
E,10,89,90,56
F,9,88,85,89
G,12,90,83,88
H,11,79,88,90


- Here, the `index_col` parameter sets the `SNAME` column as the index of the DataFrame while the file is being read.
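As a self-contained sketch of `index_col`, the example below reads from an in-memory string (a hypothetical stand-in for the CSV file) and shows that the index column can be given either by name or by position.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of a CSV file with a header row.
csv_text = "SNAME,SCLASS,ENGLISH\nA,10,67\nB,12,87\nC,11,66\n"

# Select the index column by name...
by_name = pd.read_csv(io.StringIO(csv_text), index_col="SNAME")

# ...or by its position in the file (0 is the first column).
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(by_name.index.name)        # SNAME
print(list(by_name.columns))     # ['SCLASS', 'ENGLISH']
```

Setting the index at read time avoids a separate `set_index` step afterwards.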

**Set index with an existing DataFrame**

- After reading the data or in some other data processing steps, you may want to set the index manually. We can use the `set_index` method.

In [6]:
df=pd.read_csv(r"data\std_data1.csv")

In [16]:
df

Unnamed: 0,SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
0,A,10,67,66,66
1,B,12,87,56,65
2,C,11,66,89,63
3,D,10,56,88,66
4,E,10,89,90,56
5,F,9,88,85,89
6,G,12,90,83,88
7,H,11,79,88,90




**df.set_index()** sets the DataFrame index using one or more existing columns. Until an index is set, the DataFrame has the default RangeIndex (0 to n-1).

In [28]:
df_index=df.set_index("SNAME")

In [30]:
df_index

SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
A,10,67,66,66
B,12,87,56,65
C,11,66,89,63
D,10,56,88,66
E,10,89,90,56
F,9,88,85,89
G,12,90,83,88
H,11,79,88,90


In [29]:
df_index1=df.set_index("SNAME", drop=False)

In [31]:
df_index1

SNAME,SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
A,A,10,67,66,66
B,B,12,87,56,65
C,C,11,66,89,63
D,D,10,56,88,66
E,E,10,89,90,56
F,F,9,88,85,89
G,G,12,90,83,88
H,H,11,79,88,90


- With this method, you specify which column(s) should become the new index. Two things are worth noting.

  - This method returns a new DataFrame by default. If you want to change the index in place, run `df.set_index("SNAME", inplace=True)`.
  
  - If you want to keep the column after it has been set as the index, run `df.set_index("SNAME", drop=False)`.
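The two notes above, plus the multi-column case, can be checked with a short sketch. The frame is built inline from a few rows of the student data (an assumption made so the example runs on its own), and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "SNAME": ["A", "B", "C", "D"],
    "SCLASS": [10, 12, 11, 10],
    "ENGLISH": [67, 87, 66, 56],
})

# set_index returns a new frame; the original is left unchanged.
indexed = df.set_index("SNAME")
print(list(df.columns))            # ['SNAME', 'SCLASS', 'ENGLISH']
print(list(indexed.columns))       # ['SCLASS', 'ENGLISH']

# drop=False keeps SNAME as a regular column as well as the index.
kept = df.set_index("SNAME", drop=False)
print("SNAME" in kept.columns)     # True

# inplace=True modifies the frame it is called on (a copy here,
# so that `df` stays available for the other examples).
changed = df.copy()
changed.set_index("SNAME", inplace=True)

# Passing a list of columns builds a MultiIndex with one level per column.
multi = df.set_index(["SCLASS", "SNAME"])
print(multi.index.nlevels)         # 2
```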

**Reset index after some manipulations**

- When you process a DataFrame, some manipulations, such as dropping rows or selecting by index, leave you with a subset of the original index. To restore a continuous index, you can use the `reset_index` method.

`df.reset_index()`
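To make the "subset of the original index" concrete, here is a minimal sketch that filters rows and then renumbers them. The frame is typed in by hand from a few rows of the student data, and the threshold of 70 is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "SNAME": ["A", "B", "C", "D"],
    "ENGLISH": [67, 87, 66, 56],
})

# Filtering keeps the original labels, so the index now has a gap.
low = df[df["ENGLISH"] < 70]
print(list(low.index))             # [0, 2, 3]

# drop=True discards the old labels and renumbers from 0.
renumbered = low.reset_index(drop=True)
print(list(renumbered.index))      # [0, 1, 2]

# Without drop=True the old labels are kept in a new column named "index".
with_old = low.reset_index()
print(list(with_old.columns))      # ['index', 'SNAME', 'ENGLISH']
```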

In [44]:
df_index

SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
A,10,67,66,66
B,12,87,56,65
C,11,66,89,63
D,10,56,88,66
E,10,89,90,56
F,9,88,85,89
G,12,90,83,88
H,11,79,88,90


In [34]:
df_index.reset_index()    # moves SNAME from the index back into a regular column
                          # and restores the default integer index

Unnamed: 0,SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
0,A,10,67,66,66
1,B,12,87,56,65
2,C,11,66,89,63
3,D,10,56,88,66
4,E,10,89,90,56
5,F,9,88,85,89
6,G,12,90,83,88
7,H,11,79,88,90


In [45]:
df_index1

SNAME,SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
A,A,10,67,66,66
B,B,12,87,56,65
C,C,11,66,89,63
D,D,10,56,88,66
E,E,10,89,90,56
F,F,9,88,85,89
G,G,12,90,83,88
H,H,11,79,88,90


In [37]:
df_index1.reset_index(drop=True)  # with drop=True the old index is discarded instead of being added as a column

Unnamed: 0,SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
0,A,10,67,66,66
1,B,12,87,56,65
2,C,11,66,89,63
3,D,10,56,88,66
4,E,10,89,90,56
5,F,9,88,85,89
6,G,12,90,83,88
7,H,11,79,88,90


## Sorting dataframe
- Sorting dataframe by index
- Sorting dataframe by values

#### Sorting by index

**sort_index is used to sort the DataFrame by index**

By default, the index is sorted in ascending order.

In [38]:
df

Unnamed: 0,SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
0,A,10,67,66,66
1,B,12,87,56,65
2,C,11,66,89,63
3,D,10,56,88,66
4,E,10,89,90,56
5,F,9,88,85,89
6,G,12,90,83,88
7,H,11,79,88,90


In [39]:
df.sort_index()        #sorting index in ascending order

Unnamed: 0,SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
0,A,10,67,66,66
1,B,12,87,56,65
2,C,11,66,89,63
3,D,10,56,88,66
4,E,10,89,90,56
5,F,9,88,85,89
6,G,12,90,83,88
7,H,11,79,88,90


In [40]:
df.sort_index(ascending=False)         #sorting index in descending order

Unnamed: 0,SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
7,H,11,79,88,90
6,G,12,90,83,88
5,F,9,88,85,89
4,E,10,89,90,56
3,D,10,56,88,66
2,C,11,66,89,63
1,B,12,87,56,65
0,A,10,67,66,66
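Because the default integer index is already in order, the effect of `sort_index` is easier to see on a label index that starts out unsorted. The sketch below assumes a hand-built frame with `SNAME` as the index, using three rows of the student data in shuffled order.

```python
import pandas as pd

# SNAME labels deliberately out of order.
df_names = pd.DataFrame(
    {"ENGLISH": [87, 66, 67]},
    index=pd.Index(["B", "C", "A"], name="SNAME"),
)

ascending = df_names.sort_index()
descending = df_names.sort_index(ascending=False)

print(list(ascending.index))       # ['A', 'B', 'C']
print(list(descending.index))      # ['C', 'B', 'A']

# Each row's values move together with its label.
print(list(ascending["ENGLISH"]))  # [67, 87, 66]
```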


#### Sorting dataframe by values
**df.sort_values(by="ENGLISH") sorts the DataFrame by the values in the ENGLISH column**

In [42]:
df.sort_values(by="ENGLISH")  #ascending order

Unnamed: 0,SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
3,D,10,56,88,66
2,C,11,66,89,63
0,A,10,67,66,66
7,H,11,79,88,90
1,B,12,87,56,65
5,F,9,88,85,89
4,E,10,89,90,56
6,G,12,90,83,88


In [43]:
df.sort_values(by="ENGLISH",ascending=False)  #descending order

Unnamed: 0,SNAME,SCLASS,ENGLISH,COMPUTER,SCIENCE
6,G,12,90,83,88
4,E,10,89,90,56
5,F,9,88,85,89
1,B,12,87,56,65
7,H,11,79,88,90
0,A,10,67,66,66
2,C,11,66,89,63
3,D,10,56,88,66
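`sort_values` also accepts a list of columns, with a matching list for `ascending`. A minimal sketch, assuming a hand-built frame with five rows of the student data: sort by class in ascending order, then by English marks in descending order within each class.

```python
import pandas as pd

df = pd.DataFrame({
    "SNAME": ["A", "B", "C", "D", "E"],
    "SCLASS": [10, 12, 11, 10, 10],
    "ENGLISH": [67, 87, 66, 56, 89],
})

# Primary key: SCLASS ascending. Tie-breaker: ENGLISH descending.
ordered = df.sort_values(by=["SCLASS", "ENGLISH"], ascending=[True, False])

print(list(ordered["SNAME"]))      # ['E', 'A', 'D', 'C', 'B']

# The original row labels travel with the rows.
print(list(ordered.index))         # [4, 0, 3, 2, 1]
```

Like `set_index`, `sort_values` returns a new DataFrame and leaves `df` itself unchanged.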
