# 🧹 Data Transformation in Pandas

Once your data is clean, the next step is to **reshape**, **reformat**, and **reorder** it to prepare for analysis.
Pandas provides powerful tools to help you do just that. Let’s walk through the essentials:

---

## 🔢 Sorting & Ranking

### 📊 Sort by Values

```python
df.sort_values("Age")                     # Ascending by Age
df.sort_values("Age", ascending=False)    # Descending by Age
df.sort_values(["Age", "Salary"])         # Sort by Age, then Salary
```

> When sorting by multiple columns, Pandas uses the second column to break ties in the first.
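
As a minimal sketch of tie-breaking, the example below sorts a small made-up table (the `people` DataFrame and its values are illustrative assumptions, not data from this document). Passing a list to `ascending` sets the direction for each sort column separately.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate tie-breaking
people = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dev"],
    "Age": [30, 25, 30, 25],
    "Salary": [50000, 42000, 61000, 39000],
})

# Age ascending; within equal ages, Salary descending
by_age_then_salary = people.sort_values(
    ["Age", "Salary"], ascending=[True, False]
)
print(by_age_then_salary)
```

Both 25-year-olds come first, and within each age group the higher salary is listed first.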

---

### 🔁 Reset Index

```python
df.reset_index(drop=True, inplace=True)
```

* Resets the index to a default integer range (0, 1, 2, ...)
* `drop=True` discards the old index instead of adding it as a new column
* `inplace=True` modifies the DataFrame directly and returns `None`
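
The effect of `drop` is easiest to see side by side. The sketch below uses a small made-up `scores` DataFrame (an assumption for illustration) and the non-in-place form, which returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical data with a non-default index
scores = pd.DataFrame({"Score": [70, 90, 80]}, index=[10, 11, 12])
sorted_scores = scores.sort_values("Score", ascending=False)

# Without drop=True, the old index is kept as a column named "index"
kept = sorted_scores.reset_index()

# With drop=True, the old index is discarded
dropped = sorted_scores.reset_index(drop=True)

print(kept)
print(dropped)
```

Assigning the result (`df = df.reset_index(drop=True)`) is an alternative to `inplace=True` that works well in method chains.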

---

### 🗂 Sort by Index

```python
df.sort_index()
```

Use this to **sort the DataFrame by its index values**, especially after operations that disrupt index order.
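
A short sketch of the round trip, assuming a small made-up `films` DataFrame: sorting by values scrambles the index order, and `sort_index()` restores it.

```python
import pandas as pd

# Hypothetical data with the default 0, 1, 2 index
films = pd.DataFrame({"Year": [2019, 2016, 2018]})

by_year = films.sort_values("Year")       # index order is now 1, 2, 0
restored = by_year.sort_index()           # index order is back to 0, 1, 2
reversed_index = by_year.sort_index(ascending=False)  # 2, 1, 0

print(by_year.index.tolist())
print(restored.index.tolist())
```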

---

### 🏅 Ranking Data

```python
df["Rank"] = df["Score"].rank()                 # Default: average ranking
df["Rank"] = df["Score"].rank(method="dense")   # Dense ranking: 1, 2, 2, 3
```

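* `.rank()` assigns a rank to each value in a numeric column (like scores). By default the smallest value gets rank 1, and tied values share the average of their ranks.
* `method='dense'` gives tied values the same rank and leaves no gaps in the ranking sequence.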
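
To make the difference between ranking methods concrete, the sketch below ranks a made-up Series with one tie (the values are an assumption for illustration) using three of the methods that `rank()` supports.

```python
import pandas as pd

# Hypothetical scores with a tie at 85
scores = pd.Series([90, 85, 85, 70])

# ascending=False gives rank 1 to the highest score
average_rank = scores.rank(ascending=False)                # ties share the mean rank
min_rank = scores.rank(ascending=False, method="min")      # ties share the lowest rank
dense_rank = scores.rank(ascending=False, method="dense")  # like "min", but no gaps

print(pd.DataFrame({
    "score": scores,
    "average": average_rank,
    "min": min_rank,
    "dense": dense_rank,
}))
```

The ranks are returned as floats. When no value is missing, `.astype(int)` converts them to integers.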

---

## 🏷 Renaming Columns and Index

### Rename Individual Columns or Index Labels

```python
df.rename(columns={"oldName": "newName"}, inplace=True)
df.rename(index={0: "row1", 1: "row2"}, inplace=True)
```

### Rename All Columns

```python
df.columns = ["Name", "Age", "City"]
```
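
The two approaches behave differently in ways worth seeing once. In the sketch below (the `df_demo` DataFrame is a made-up assumption), `rename` returns a new DataFrame and also accepts a function, while assigning to `df.columns` requires exactly one name per column.

```python
import pandas as pd

# Hypothetical two-column DataFrame
df_demo = pd.DataFrame({"oldName": [1, 2], "Other": [3, 4]})

# rename returns a new DataFrame; df_demo itself is unchanged
renamed = df_demo.rename(columns={"oldName": "newName"})

# A function is applied to every column label
lowered = renamed.rename(columns=str.lower)

# Assigning a list of the wrong length raises ValueError
try:
    df_demo.columns = ["A", "B", "C"]
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(renamed.columns.tolist())
print(lowered.columns.tolist())
print(mismatch_error)
```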

---

## 🔄 Changing Column Order

### Reorder Entire DataFrame

```python
df = df[["City", "Name", "Age"]]   # Custom column order
```

### Move a Specific Column to the Front

```python
cols = ["Name"] + [col for col in df.columns if col != "Name"]
df = df[cols]
```
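
As an alternative sketch, a column can also be moved in place with `pop` and `insert`, which avoids building a new column list. The `table` DataFrame below is a made-up assumption, and both approaches are shown for comparison.

```python
import pandas as pd

# Hypothetical DataFrame with "Name" as the last column
table = pd.DataFrame({"City": ["Pune"], "Age": [31], "Name": ["Asha"]})

# List-based reorder: returns a new DataFrame
cols = ["Name"] + [col for col in table.columns if col != "Name"]
reordered = table[cols]

# In-place alternative: remove the column, then re-insert it at position 0
table.insert(0, "Name", table.pop("Name"))

print(reordered.columns.tolist())
print(table.columns.tolist())
```

Selecting with a list keeps only the columns named in it, so any column left out of the list is dropped from the result.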

---

## ✅ Summary

* **Sort**, **rank**, and **rename** your data to make it analysis-ready
* **Reordering and reshaping** are essential steps for EDA and visualization workflows
* Pandas makes all of this simple and intuitive with its versatile functions

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv("data2.csv")

In [4]:
df

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9


In [5]:
df.sort_values("Year")

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [6]:
df.sort_values("Year", ascending=False)

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0


In [7]:
df2 = df.sort_values(["Year", "IMDb"]).copy()

In [8]:
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [9]:
# df2.reset_index(drop=True, inplace=True)     # drop=True discards the old index instead of adding it as an "index" column; inplace=True modifies df2 directly

In [10]:
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [11]:
df2.sort_index()       # Sort rows by their index labels

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9


In [12]:
df2["Rank"] = df2["IMDb"].rank(ascending=False, method="dense")    # method="dense" gives tied values the same rank and leaves no gaps between ranks
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,Rank
2,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,9.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1,8.0
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
7,Hrithik Roshan,War,2019,Action,475,6.5,7.0
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2,3.0
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,11.0


In [13]:
df = df[["Film", "Actor", "Year", "Genre", "IMDb", "BoxOffice(INR Crore)"]]    # Rearrange columns

In [14]:
df

Unnamed: 0,Film,Actor,Year,Genre,IMDb,BoxOffice(INR Crore)
0,Pathaan,Shah Rukh Khan,2023,Action,7.2,1050
1,Tiger Zinda Hai,Salman Khan,2017,Action,6.0,565
2,Dangal,Aamir Khan,2016,Biography,8.4,2024
3,Brahmastra,Ranbir Kapoor,2022,Fantasy,5.6,431
4,Padmaavat,Ranveer Singh,2018,Historical,7.0,585
5,Andhadhun,Ayushmann Khurrana,2018,Thriller,8.3,111
6,Stree,Rajkummar Rao,2018,Horror Comedy,7.5,180
7,War,Hrithik Roshan,2019,Action,6.5,475
8,Good Newwz,Akshay Kumar,2019,Comedy,7.0,318
9,Bhool Bhulaiyaa 2,Kartik Aaryan,2022,Horror Comedy,5.9,266
