# Data Transformation
Once your data is clean, the next step is to reshape, reformat, and reorder it as needed for analysis. Pandas gives you plenty of flexible tools to do this.



### Sorting & Ranking
#### Sort by Values

In [3]:
import pandas as pd 


In [4]:
df = pd.read_csv("data.csv")
df

Unnamed: 0,NAME,SCHOOL,MARKS
0,anshi,DPS,56
1,anshika,DPS,78
2,avani,LLS,89
3,priya,LLS,89
4,aliza,RDS,45
5,sanya,RDS,56


In [5]:
df.sort_values("MARKS")

Unnamed: 0,NAME,SCHOOL,MARKS
4,aliza,RDS,45
0,anshi,DPS,56
5,sanya,RDS,56
1,anshika,DPS,78
2,avani,LLS,89
3,priya,LLS,89


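Passing a list of columns sorts by each one in turn. For example, df.sort_values(["Age", "Salary"]) sorts the DataFrame by the "Age" column first and breaks ties (two or more rows with the same "Age") using the "Salary" column. Sorting is ascending by default; set ascending=False to sort from highest to lowest.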
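The same idea applies to the sample table. The sketch below assumes the data is rebuilt inline rather than read from data.csv, and passes a list to `ascending` so that each sort key gets its own direction.

```python
import pandas as pd

# Inline copy of the sample table (an assumption; the notebook reads data.csv)
df = pd.DataFrame({
    "NAME": ["anshi", "anshika", "avani", "priya", "aliza", "sanya"],
    "SCHOOL": ["DPS", "DPS", "LLS", "LLS", "RDS", "RDS"],
    "MARKS": [56, 78, 89, 89, 45, 56],
})

# Highest MARKS first; rows with equal MARKS are ordered alphabetically by NAME
ranked = df.sort_values(["MARKS", "NAME"], ascending=[False, True])
print(ranked)
```

The list given to `ascending` must have the same length as the list of sort columns.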



In [7]:
df.sort_values("MARKS", ascending=False)


Unnamed: 0,NAME,SCHOOL,MARKS
2,avani,LLS,89
3,priya,LLS,89
1,anshika,DPS,78
0,anshi,DPS,56
5,sanya,RDS,56
4,aliza,RDS,45


In [8]:
df2 = df.sort_values(["MARKS", "NAME"]).copy()
df2


Unnamed: 0,NAME,SCHOOL,MARKS
4,aliza,RDS,45
0,anshi,DPS,56
5,sanya,RDS,56
1,anshika,DPS,78
2,avani,LLS,89
3,priya,LLS,89


#### Sort by Index
The df.sort_index() method sorts the DataFrame by its index labels. If the index is out of order (e.g., after sorting by values or another operation that reorders rows), sort_index() puts the rows back in index order. It does not renumber the index, so any gaps left by dropped rows remain.

In [10]:
df.sort_index()

Unnamed: 0,NAME,SCHOOL,MARKS
0,anshi,DPS,56
1,anshika,DPS,78
2,avani,LLS,89
3,priya,LLS,89
4,aliza,RDS,45
5,sanya,RDS,56
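
Because `df` is already in index order, the call above returns the same row order. Here is a minimal sketch with a small made-up frame whose index is out of order, which makes the effect visible. The frame and variable names are illustrative only.

```python
import pandas as pd

# Made-up frame with an out-of-order index and a gap (labels 2 and 3 missing)
shuffled = pd.DataFrame({"MARKS": [45, 56, 78]}, index=[4, 0, 1])

restored = shuffled.sort_index()                    # rows ordered by label: 0, 1, 4
descending = shuffled.sort_index(ascending=False)   # rows ordered by label: 4, 1, 0

print(restored)
print(descending)
```

The labels are only reordered, not renumbered, so the gap between 1 and 4 is still there.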


### Reset Index
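If you want the index to start from 0 and be sequential, you can reset it using reset_index(). Passing drop=True discards the old index instead of keeping it as a column. Without inplace=True, the method returns a new DataFrame and leaves the original unchanged.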

In [12]:
df2.reset_index(drop=True)

Unnamed: 0,NAME,SCHOOL,MARKS
0,aliza,RDS,45
1,anshi,DPS,56
2,sanya,RDS,56
3,anshika,DPS,78
4,avani,LLS,89
5,priya,LLS,89


In [13]:
df2

Unnamed: 0,NAME,SCHOOL,MARKS
4,aliza,RDS,45
0,anshi,DPS,56
5,sanya,RDS,56
1,anshika,DPS,78
2,avani,LLS,89
3,priya,LLS,89


In [14]:
df2.reset_index(drop=True, inplace=True)  # Reset the index in place and drop the old index
df2


Unnamed: 0,NAME,SCHOOL,MARKS
0,aliza,RDS,45
1,anshi,DPS,56
2,sanya,RDS,56
3,anshika,DPS,78
4,avani,LLS,89
5,priya,LLS,89
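
To see what `drop=True` is discarding, compare it with the default. This is a sketch on a small made-up frame; the names `df_sorted`, `kept`, and `dropped` are illustrative.

```python
import pandas as pd

# Made-up frame that mimics a sorted result with a scrambled index
df_sorted = pd.DataFrame(
    {"NAME": ["aliza", "anshi", "sanya"], "MARKS": [45, 56, 56]},
    index=[4, 0, 5],
)

kept = df_sorted.reset_index()               # old index becomes a column named "index"
dropped = df_sorted.reset_index(drop=True)   # old index is discarded

print(kept)
print(dropped)
```

Neither call modifies `df_sorted`; both return a new DataFrame with a fresh 0-based index.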


### Ranking
The .rank() method assigns ranks to the numeric values in a column, such as scores or points. By default (method='average'), tied values share the average of the ranks they would occupy, which can produce fractional ranks. For example, when ranking from highest to lowest (ascending=False), two people who share the top score both get a rank of 1.5, and the next person gets rank 3. You can change this behavior with the method parameter. One useful option is method='dense', which assigns the same rank to ties but doesn't leave gaps in the ranking sequence. This is helpful when you want a clean, consecutive ranking without skips.

df["Rank"] = df["Score"].rank()                 # Default: average method
df["Rank"] = df["Score"].rank(method="dense")   # 1, 2, 2, 3
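
To compare the tie-handling methods side by side, here is a small sketch using the marks from the sample table, rebuilt inline as a Series (an assumption; the notebook works on df2). All ranks are computed from highest to lowest.

```python
import pandas as pd

# Marks from the sample table, in sorted order
marks = pd.Series([45, 56, 56, 78, 89, 89], name="MARKS")

ranks = pd.DataFrame({
    "MARKS": marks,
    "average": marks.rank(ascending=False),               # ties share the mean rank
    "min": marks.rank(ascending=False, method="min"),     # ties share the lowest rank
    "dense": marks.rank(ascending=False, method="dense"), # like min, but no gaps
})
print(ranks)
```

With "min", the two top scorers both get rank 1 and the next rank is 3. With "dense", the next rank is 2.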

In [17]:
df2["Rank"] = df2["MARKS"].rank(ascending=False)
df2

Unnamed: 0,NAME,SCHOOL,MARKS,Rank
0,aliza,RDS,45,6.0
1,anshi,DPS,56,4.5
2,sanya,RDS,56,4.5
3,anshika,DPS,78,3.0
4,avani,LLS,89,1.5
5,priya,LLS,89,1.5


### Renaming Columns & Index
Use rename() with a columns or index mapping from old labels to new labels.

In [20]:
df2.rename(columns={"MARKS": "SCORE"}, inplace=True)
df2

Unnamed: 0,NAME,SCHOOL,SCORE,Rank
0,aliza,RDS,45,6.0
1,anshi,DPS,56,4.5
2,sanya,RDS,56,4.5
3,anshika,DPS,78,3.0
4,avani,LLS,89,1.5
5,priya,LLS,89,1.5
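
rename() also works without inplace=True, and it accepts a function in place of a mapping. The sketch below uses a one-row made-up frame to show both; the variable names are illustrative.

```python
import pandas as pd

# One-row made-up frame with the same column names as the sample table
df = pd.DataFrame({"NAME": ["aliza"], "SCHOOL": ["RDS"], "MARKS": [45]})

renamed = df.rename(columns={"MARKS": "SCORE"})  # returns a new DataFrame
lowered = df.rename(columns=str.lower)           # applies the function to every column label

print(renamed.columns.tolist())
print(lowered.columns.tolist())
print(df.columns.tolist())  # the original is unchanged
```

By default, keys in the mapping that do not match any existing label are silently ignored, so a typo in the old name leaves the frame unchanged.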


The same method renames index labels when you pass an index mapping:

In [22]:
df2.rename(index={0: "row1", 1: "row2"}, inplace=True)
df2

Unnamed: 0,NAME,SCHOOL,SCORE,Rank
row1,aliza,RDS,45,6.0
row2,anshi,DPS,56,4.5
2,sanya,RDS,56,4.5
3,anshika,DPS,78,3.0
4,avani,LLS,89,1.5
5,priya,LLS,89,1.5


### Changing Column Order
To reorder columns, index the DataFrame with a list of column names in the order you want:

In [24]:
df2 = df2[["Rank", "SCORE", "NAME", "SCHOOL"]]
df2

Unnamed: 0,Rank,SCORE,NAME,SCHOOL
row1,6.0,45,aliza,RDS
row2,4.5,56,anshi,DPS
2,4.5,56,sanya,RDS
3,3.0,78,anshika,DPS
4,1.5,89,avani,LLS
5,1.5,89,priya,LLS

You can also move one column to the front:
cols = ["NAME"] + [col for col in df.columns if col != "NAME"]
df = df[cols]
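
If you would rather move a single column in place than build a new column list, one alternative is to combine pop() and insert(). This is a sketch on a one-row made-up frame, not the approach used in the cells above.

```python
import pandas as pd

# One-row made-up frame with the reordered columns
df = pd.DataFrame({"Rank": [6.0], "SCORE": [45], "NAME": ["aliza"], "SCHOOL": ["RDS"]})

# pop() removes the column and returns it; insert() puts it back at position 0
df.insert(0, "NAME", df.pop("NAME"))

print(df.columns.tolist())
```

Unlike column selection with a list, this modifies the existing DataFrame.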


### Summary
Sort, rank, and rename to prepare your data.
Reordering and reshaping are key for EDA and visualization.