# Data Transformation

Once your data is clean, the next step is to **reshape, reformat, and reorder** it as needed for analysis. Pandas gives you plenty of flexible tools to do this.

---

## Sorting & Ranking

### Sort by Values

```python
df.sort_values("Age")                   # Ascending sort
df.sort_values("Age", ascending=False)  # Descending
df.sort_values(["Age", "Salary"])       # Sort by multiple columns
```
`df.sort_values(["Age", "Salary"])` sorts the DataFrame by the "Age" column first. Where two or more rows have the same "Age", it breaks the tie by sorting those rows by "Salary".
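
The tie-breaking behaviour can be sketched on a small made-up DataFrame (the names and values below are illustrative and not from any real dataset). Passing a list to `ascending` sets the sort direction for each column separately.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
people = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [30, 25, 30, 25],
    "Salary": [50000, 70000, 60000, 40000],
})

# Age ascending; rows with the same Age are ordered by Salary descending
sorted_people = people.sort_values(["Age", "Salary"], ascending=[True, False])
print(sorted_people)
```

Here the two 25-year-olds come first, and within each age group the higher salary is listed first.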

### Reset Index

After sorting or filtering, rows keep their original index labels. If you want the index to start from 0 and be sequential, reset it with `reset_index()`:
```python
df.reset_index(drop=True, inplace=True)  # Reset the index and drop the old index
```
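
As a small sketch of what `drop=True` changes, the example below resets the index of a sorted DataFrame both ways. The `scores` data is hypothetical. It uses the non-inplace form, which returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
scores = pd.DataFrame({"Score": [70, 90, 80]})

# Sorting keeps the original labels, so the index is now 1, 2, 0
by_score = scores.sort_values("Score", ascending=False)

# Default: the old index is kept as a new column named "index"
kept = by_score.reset_index()

# drop=True: the old index is discarded
dropped = by_score.reset_index(drop=True)

print(kept)
print(dropped)
```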
### Sort by Index

```python
df.sort_index()
```
`df.sort_index()` sorts the DataFrame by its index labels. If the index is out of order (for example, after sorting by values), `sort_index()` restores it to sorted order. It does not fill gaps left by dropped rows; use `reset_index()` for that.
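
A minimal sketch, assuming a made-up DataFrame whose index is deliberately out of order. It shows that `sort_index()` reorders the rows by label but keeps each label attached to its row.

```python
import pandas as pd

# Hypothetical data with a shuffled index, for illustration only
items = pd.DataFrame({"Value": [10, 20, 30, 40]}, index=[3, 0, 2, 1])

# Filtering drops the row labelled 3; the remaining labels are 0, 2, 1
filtered = items[items["Value"] > 10]

# Sort rows by index label, ascending and descending
restored = filtered.sort_index()
descending = filtered.sort_index(ascending=False)

print(restored)
```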
### Ranking
The `.rank()` method assigns ranks to numeric values in a column, such as scores or points. By default it ranks in ascending order (the smallest value gets rank 1) and gives tied values the average of the ranks they span, which can produce decimal numbers. For example, if two rows tie for first and second place, they both get a rank of 1.5.

You can change how ties are handled with the `method` parameter. One useful option is `method='dense'`, which assigns the same rank to ties but doesn't leave gaps in the ranking sequence. This is helpful when you want a clean, consecutive ranking without skips.
```python
df["Rank"] = df["Score"].rank()                 # Default: average method
df["Rank"] = df["Score"].rank(method="dense")   # 1, 2, 2, 3
```
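
To compare the tie-handling methods side by side, here is a short sketch on a made-up Series with one tie. The values are illustrative only.

```python
import pandas as pd

# Hypothetical scores with a tie at 80, for illustration only
scores = pd.Series([50, 80, 80, 90], name="Score")

ranks = pd.DataFrame({
    "Score": scores,
    "average": scores.rank(),                # ties share the mean rank
    "min": scores.rank(method="min"),        # ties share the lowest rank
    "dense": scores.rank(method="dense"),    # like "min", but with no gaps
    "dense_desc": scores.rank(method="dense", ascending=False),  # highest score is rank 1
})
print(ranks)
```

With `average`, the two 80s get 2.5 each. With `min`, they get 2 and the next rank is 4. With `dense`, they get 2 and the next rank is 3.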

---

## Renaming Columns & Index

```python
df.rename(columns={"oldName": "newName"}, inplace=True)
df.rename(index={0: "row1", 1: "row2"}, inplace=True)
```

To rename all columns:

```python
df.columns = ["Name", "Age", "City"]
```
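
As a sketch of two other ways to use `rename()`, the example below skips `inplace=True` and assigns the returned DataFrame instead, then passes a function rather than a dictionary. The column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data with inconsistent column names, for illustration only
raw = pd.DataFrame({"First Name": ["A"], "AGE": [30]})

# Without inplace=True, rename() returns a new DataFrame and leaves raw unchanged
renamed = raw.rename(columns={"First Name": "Name"})

# A function is applied to every column label
lowered = renamed.rename(columns=str.lower)

print(list(raw.columns))
print(list(renamed.columns))
print(list(lowered.columns))
```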

---

## Changing Column Order

Select the columns with a list of names in the order you want:

```python
df = df[["City", "Name", "Age"]]   # Reorder as desired
```

You can also move one column to the front:

```python
cols = ["Name"] + [col for col in df.columns if col != "Name"]
df = df[cols]
```
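
The same idea can be wrapped in a small helper. `move_to_front` is a hypothetical function written for this sketch, not part of pandas, and the sample data is made up.

```python
import pandas as pd

def move_to_front(frame, column):
    """Return a new DataFrame with `column` first (hypothetical helper)."""
    remaining = [c for c in frame.columns if c != column]
    return frame[[column] + remaining]

# Hypothetical sample data, for illustration only
cities = pd.DataFrame({"City": ["Pune"], "Name": ["Asha"], "Age": [30]})

reordered = move_to_front(cities, "Age")
print(list(reordered.columns))
```

Selecting with a list of column names returns a new DataFrame, so `cities` keeps its original column order.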

---



## Summary

- Sort, rank, and rename to prepare your data
- Reordering columns and resetting the index keep your data tidy for exploratory data analysis (EDA) and visualization

 

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("data.csv")
df

Unnamed: 0,Name,Age,Department,Position,Years_Experience,Salary,Performance_Rating,Projects_Completed,Hours_Worked_Per_Week,Education,Location,Skills
0,Shrii,28,Engineering,Senior Developer,5,95000,4.5,23,42,Masters,Bangalore,Python|JavaScript|React
1,Samay,32,Marketing,Marketing Manager,7,88000,4.2,18,38,Bachelors,Mumbai,SEO|Analytics|Content
2,Bhargav,26,Engineering,Junior Developer,2,65000,4.0,12,40,Bachelors,Pune,Java|SQL|Spring
3,Adarsh,35,Finance,Financial Analyst,10,105000,4.8,31,45,MBA,Delhi,Excel|SAP|Forecasting
4,Prabh,29,Design,Lead Designer,6,82000,4.6,27,39,Bachelors,Hyderabad,Figma|Adobe|UI/UX
5,MoneyBhai,41,Finance,CFO,15,185000,4.9,45,50,MBA,Mumbai,Strategy|Finance|Leadership
6,Krish,24,Engineering,Intern,1,45000,3.8,8,35,Bachelors,Bangalore,Python|HTML|CSS


In [3]:
df2 = df.sort_values("Age", ascending=False).copy()

In [4]:
df2.reset_index()

Unnamed: 0,index,Name,Age,Department,Position,Years_Experience,Salary,Performance_Rating,Projects_Completed,Hours_Worked_Per_Week,Education,Location,Skills
0,5,MoneyBhai,41,Finance,CFO,15,185000,4.9,45,50,MBA,Mumbai,Strategy|Finance|Leadership
1,3,Adarsh,35,Finance,Financial Analyst,10,105000,4.8,31,45,MBA,Delhi,Excel|SAP|Forecasting
2,1,Samay,32,Marketing,Marketing Manager,7,88000,4.2,18,38,Bachelors,Mumbai,SEO|Analytics|Content
3,4,Prabh,29,Design,Lead Designer,6,82000,4.6,27,39,Bachelors,Hyderabad,Figma|Adobe|UI/UX
4,0,Shrii,28,Engineering,Senior Developer,5,95000,4.5,23,42,Masters,Bangalore,Python|JavaScript|React
5,2,Bhargav,26,Engineering,Junior Developer,2,65000,4.0,12,40,Bachelors,Pune,Java|SQL|Spring
6,6,Krish,24,Engineering,Intern,1,45000,3.8,8,35,Bachelors,Bangalore,Python|HTML|CSS


In [5]:
# Reset the index without keeping the old index as a column (drop=True)
# and modify df2 itself instead of returning a new DataFrame (inplace=True).
df2.reset_index(drop=True, inplace=True)
df2

Unnamed: 0,Name,Age,Department,Position,Years_Experience,Salary,Performance_Rating,Projects_Completed,Hours_Worked_Per_Week,Education,Location,Skills
0,MoneyBhai,41,Finance,CFO,15,185000,4.9,45,50,MBA,Mumbai,Strategy|Finance|Leadership
1,Adarsh,35,Finance,Financial Analyst,10,105000,4.8,31,45,MBA,Delhi,Excel|SAP|Forecasting
2,Samay,32,Marketing,Marketing Manager,7,88000,4.2,18,38,Bachelors,Mumbai,SEO|Analytics|Content
3,Prabh,29,Design,Lead Designer,6,82000,4.6,27,39,Bachelors,Hyderabad,Figma|Adobe|UI/UX
4,Shrii,28,Engineering,Senior Developer,5,95000,4.5,23,42,Masters,Bangalore,Python|JavaScript|React
5,Bhargav,26,Engineering,Junior Developer,2,65000,4.0,12,40,Bachelors,Pune,Java|SQL|Spring
6,Krish,24,Engineering,Intern,1,45000,3.8,8,35,Bachelors,Bangalore,Python|HTML|CSS


In [6]:
df2["Rank"] = df2["Salary"].rank(ascending=False) 
df2

Unnamed: 0,Name,Age,Department,Position,Years_Experience,Salary,Performance_Rating,Projects_Completed,Hours_Worked_Per_Week,Education,Location,Skills,Rank
0,MoneyBhai,41,Finance,CFO,15,185000,4.9,45,50,MBA,Mumbai,Strategy|Finance|Leadership,1.0
1,Adarsh,35,Finance,Financial Analyst,10,105000,4.8,31,45,MBA,Delhi,Excel|SAP|Forecasting,2.0
2,Samay,32,Marketing,Marketing Manager,7,88000,4.2,18,38,Bachelors,Mumbai,SEO|Analytics|Content,4.0
3,Prabh,29,Design,Lead Designer,6,82000,4.6,27,39,Bachelors,Hyderabad,Figma|Adobe|UI/UX,5.0
4,Shrii,28,Engineering,Senior Developer,5,95000,4.5,23,42,Masters,Bangalore,Python|JavaScript|React,3.0
5,Bhargav,26,Engineering,Junior Developer,2,65000,4.0,12,40,Bachelors,Pune,Java|SQL|Spring,6.0
6,Krish,24,Engineering,Intern,1,45000,3.8,8,35,Bachelors,Bangalore,Python|HTML|CSS,7.0


In [7]:
df2.rename(columns={"Salary": "Income"}, inplace=True)
df2

Unnamed: 0,Name,Age,Department,Position,Years_Experience,Income,Performance_Rating,Projects_Completed,Hours_Worked_Per_Week,Education,Location,Skills,Rank
0,MoneyBhai,41,Finance,CFO,15,185000,4.9,45,50,MBA,Mumbai,Strategy|Finance|Leadership,1.0
1,Adarsh,35,Finance,Financial Analyst,10,105000,4.8,31,45,MBA,Delhi,Excel|SAP|Forecasting,2.0
2,Samay,32,Marketing,Marketing Manager,7,88000,4.2,18,38,Bachelors,Mumbai,SEO|Analytics|Content,4.0
3,Prabh,29,Design,Lead Designer,6,82000,4.6,27,39,Bachelors,Hyderabad,Figma|Adobe|UI/UX,5.0
4,Shrii,28,Engineering,Senior Developer,5,95000,4.5,23,42,Masters,Bangalore,Python|JavaScript|React,3.0
5,Bhargav,26,Engineering,Junior Developer,2,65000,4.0,12,40,Bachelors,Pune,Java|SQL|Spring,6.0
6,Krish,24,Engineering,Intern,1,45000,3.8,8,35,Bachelors,Bangalore,Python|HTML|CSS,7.0


In [8]:
df2.rename(index={0: "row1", 1: "row2"}, inplace=True)
df2

Unnamed: 0,Name,Age,Department,Position,Years_Experience,Income,Performance_Rating,Projects_Completed,Hours_Worked_Per_Week,Education,Location,Skills,Rank
row1,MoneyBhai,41,Finance,CFO,15,185000,4.9,45,50,MBA,Mumbai,Strategy|Finance|Leadership,1.0
row2,Adarsh,35,Finance,Financial Analyst,10,105000,4.8,31,45,MBA,Delhi,Excel|SAP|Forecasting,2.0
2,Samay,32,Marketing,Marketing Manager,7,88000,4.2,18,38,Bachelors,Mumbai,SEO|Analytics|Content,4.0
3,Prabh,29,Design,Lead Designer,6,82000,4.6,27,39,Bachelors,Hyderabad,Figma|Adobe|UI/UX,5.0
4,Shrii,28,Engineering,Senior Developer,5,95000,4.5,23,42,Masters,Bangalore,Python|JavaScript|React,3.0
5,Bhargav,26,Engineering,Junior Developer,2,65000,4.0,12,40,Bachelors,Pune,Java|SQL|Spring,6.0
6,Krish,24,Engineering,Intern,1,45000,3.8,8,35,Bachelors,Bangalore,Python|HTML|CSS,7.0


In [10]:
# Reorder Columns
df2 = df2[[
    "Rank", "Age", "Position", "Department", "Name",
    "Performance_Rating", "Hours_Worked_Per_Week", "Education",
    "Location", "Skills", "Years_Experience", "Income",
    "Projects_Completed",
]]
df2

Unnamed: 0,Rank,Age,Position,Department,Name,Performance_Rating,Hours_Worked_Per_Week,Education,Location,Skills,Years_Experience,Income,Projects_Completed
row1,1.0,41,CFO,Finance,MoneyBhai,4.9,50,MBA,Mumbai,Strategy|Finance|Leadership,15,185000,45
row2,2.0,35,Financial Analyst,Finance,Adarsh,4.8,45,MBA,Delhi,Excel|SAP|Forecasting,10,105000,31
2,4.0,32,Marketing Manager,Marketing,Samay,4.2,38,Bachelors,Mumbai,SEO|Analytics|Content,7,88000,18
3,5.0,29,Lead Designer,Design,Prabh,4.6,39,Bachelors,Hyderabad,Figma|Adobe|UI/UX,6,82000,27
4,3.0,28,Senior Developer,Engineering,Shrii,4.5,42,Masters,Bangalore,Python|JavaScript|React,5,95000,23
5,6.0,26,Junior Developer,Engineering,Bhargav,4.0,40,Bachelors,Pune,Java|SQL|Spring,2,65000,12
6,7.0,24,Intern,Engineering,Krish,3.8,35,Bachelors,Bangalore,Python|HTML|CSS,1,45000,8
