#### Data Transformation
Once the data is clean, the next step is to reshape, reformat, and reorder it as needed for analysis. This section covers the pandas tools for sorting, ranking, renaming, and reordering.

---

##### Sorting & Ranking
---

**Sort by Values**

In [10]:
import pandas as pd

df = pd.read_csv('sales_data.csv')

df.head()

#* Ascending sort
df.sort_values('Region')

#* Descending sort
df.sort_values('Region', ascending=False)

#* Sort by multiple columns
df.sort_values(['Product Category', 'Region'])

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal
10,10011,2024-01-11,Beauty Products,Chanel No. 5 Perfume,1,129.99,129.99,Europe,PayPal
16,10017,2024-01-17,Beauty Products,Dyson Supersonic Hair Dryer,1,399.99,399.99,Europe,PayPal
22,10023,2024-01-23,Beauty Products,Olay Regenerist Face Cream,1,49.99,49.99,Europe,PayPal
28,10029,2024-01-29,Beauty Products,MAC Ruby Woo Lipstick,1,29.99,29.99,Europe,PayPal
...,...,...,...,...,...,...,...,...,...
215,10216,2024-08-03,Sports,YETI Tundra 65 Cooler,1,349.99,349.99,Asia,Credit Card
221,10222,2024-08-09,Sports,Garmin Forerunner 945,1,599.99,599.99,Asia,Credit Card
227,10228,2024-08-15,Sports,Fitbit Luxe,2,149.95,299.90,Asia,Credit Card
233,10234,2024-08-21,Sports,Hydro Flask Standard Mouth Water Bottle,3,32.95,98.85,Asia,Credit Card


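`df.sort_values(['Product Category', 'Region'])` sorts the DataFrame by the 'Product Category' column first. Where there are ties (i.e. two or more rows with the same 'Product Category'), it sorts those rows by the 'Region' column. Note that `sort_values()` returns a new DataFrame and leaves `df` unchanged, so assign the result if you want to keep the sorted order.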
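When sorting by several columns, each column can have its own direction by passing a list to `ascending`. The sketch below uses a small made-up frame (the `sales` data and its values are illustrative, not taken from `sales_data.csv`) to show category sorted A to Z with units sold from high to low inside each category.

```python
import pandas as pd

# Hypothetical miniature of the sales data; values are illustrative only
sales = pd.DataFrame({
    "Product Category": ["Sports", "Books", "Sports", "Books"],
    "Region": ["Asia", "Europe", "Europe", "Asia"],
    "Units Sold": [3, 1, 2, 4],
})

# One ascending flag per sort column:
# category A-Z, then Units Sold high-to-low within each category
by_category = sales.sort_values(
    ["Product Category", "Units Sold"],
    ascending=[True, False],
)

print(by_category)
print(sales.index.tolist())  # the original frame keeps its row order
```

The rows keep their original index labels after sorting, which is why the index of `by_category` is no longer sequential.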

---

**Reset Index**

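If you want the index to start from 0 and be sequential (for example, after sorting or dropping rows), you can reset it using `reset_index()`. Passing `drop=True` discards the old index instead of keeping it as a new column.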

In [11]:
df.reset_index(drop=True, inplace=True)  # reset the index and drop the old index
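To make the effect of `drop` concrete, here is a minimal sketch on a made-up `scores` frame (the names and values are assumptions for illustration). It sorts first, so the index is out of order, and then resets it both ways.

```python
import pandas as pd

# Hypothetical data for illustration
scores = pd.DataFrame({"player": ["Ann", "Ben", "Cy"], "points": [7, 9, 5]})

# Sorting keeps the original labels, so the index becomes 1, 0, 2
ranked = scores.sort_values("points", ascending=False)

# drop=True throws the old labels away and numbers the rows 0..n-1
fresh = ranked.reset_index(drop=True)

# Without drop=True, the old labels are kept in a new column named "index"
kept = ranked.reset_index()

print(ranked.index.tolist())
print(fresh.index.tolist())
print(kept.columns.tolist())
```

This version assigns the result instead of using `inplace=True`; both approaches work, and assignment keeps the unsorted frame available for comparison.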

--- 

**Sort Index**
```Python
df.sort_index()
```

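`df.sort_index()` sorts the DataFrame by its index values. If the index is no longer in order (e.g., you have sorted by a column or performed other operations that rearrange the rows), you can use `sort_index()` to restore it to sorted order. It only reorders the existing labels: gaps left by dropped rows remain, and closing them requires `reset_index()`. Like `sort_values()`, it returns a new DataFrame.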
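A short sketch of `sort_index()` on a frame whose index is deliberately out of order (the `df_demo` frame is a made-up example):

```python
import pandas as pd

# Hypothetical frame with a shuffled index
df_demo = pd.DataFrame({"value": [30, 10, 20]}, index=[2, 0, 1])

# Ascending by index label (the default)
restored = df_demo.sort_index()

# Descending by index label
reverse = df_demo.sort_index(ascending=False)

print(restored)
print(reverse)
```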

---

**Ranking**

The `.rank()` method assigns ranks to the numeric values in a column, such as scores or points. Ranking is ascending by default, so the smallest value receives rank 1; pass `ascending=False` to rank the largest value first. By default, tied values receive the average of the ranks they span, which can result in decimal numbers. For example, if two rows tie for first place, they occupy ranks 1 and 2 and both get a rank of 1.5.

You can customize the ranking behavior using the `method` parameter. One useful option is `method='dense'`, which assigns the same rank to ties but doesn't leave gaps in the ranking sequence. This is helpful when you want a clean, consecutive ranking system without skips.

In [16]:
df['Rank'] = df['Units Sold'].rank(method="dense")
df.head()

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method,Rank
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card,2.0
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal,1.0
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card,3.0
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card,4.0
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal,1.0
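To see how the tie-handling options differ, the sketch below ranks one small made-up Series (`units`, illustrative values only) with several `method` settings side by side.

```python
import pandas as pd

# Hypothetical values with one tie (the two 1s)
units = pd.Series([2, 1, 3, 1], name="Units Sold")

comparison = pd.DataFrame({
    "units": units,
    "average": units.rank(),                 # default: ties share the mean rank
    "min": units.rank(method="min"),         # ties share the lowest rank, gaps follow
    "dense": units.rank(method="dense"),     # ties share a rank, no gaps
    "dense_desc": units.rank(method="dense", ascending=False),  # largest value ranks first
})

print(comparison)
```

The two rows with 1 unit get 1.5 under the default, 1.0 under `min` (with the next value jumping to 3.0), and 1.0 under `dense` (with the next value at 2.0). Ranks are returned as floats in every case.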


---

**Renaming Columns & Index**
```Python
df.rename(columns={"oldName":"newName"}, inplace=True)
df.rename(index={0:"row1", 1:"row2"}, inplace=True)
```

In [23]:
# Example

data = {
    'name':['Alice', 'Bob', 'Charlie'],
    'age':[20, 25, 29],
    'city':['Delhi', 'Mumbai', 'Kolkata']
}

df = pd.DataFrame(data)

#* Rename age column
df.rename(columns={'age':'new_age'}, inplace=True)
print(df.columns)

#* Rename indices
df.rename(index={0:"row1", 1:"row2", 2:"row3"}, inplace=True)
df.head()

#* To rename all columns
df.columns = ['new_name', 'new_age', 'new_city']
df.head()

Index(['name', 'new_age', 'city'], dtype='object')


Unnamed: 0,new_name,new_age,new_city
row1,Alice,20,Delhi
row2,Bob,25,Mumbai
row3,Charlie,29,Kolkata
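Besides a dictionary, `rename()` also accepts a function that is applied to every label, which is convenient when all names follow the same pattern. A minimal sketch, using a made-up `people` frame similar to the example above:

```python
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({"name": ["Alice", "Bob"], "age": [20, 25]})

# Apply a function to every column label
upper = people.rename(columns=str.upper)

# Apply a function to every index label: 0 -> "row1", 1 -> "row2"
prefixed = people.rename(index=lambda i: f"row{i + 1}")

print(upper.columns.tolist())
print(prefixed.index.tolist())
```

Without `inplace=True`, `rename()` returns a renamed copy and leaves `people` as it was.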


---

**Changing Column Order**

To reorder columns, index the DataFrame with a list of the column names in the order you want:

In [24]:
df = df[['new_city', 'new_age', 'new_name']]   # Reorder as desired

df

Unnamed: 0,new_city,new_age,new_name
row1,Delhi,20,Alice
row2,Mumbai,25,Bob
row3,Kolkata,29,Charlie
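Typing out every column name gets tedious on wide frames. The sketch below shows two common alternatives on a made-up `people` frame: moving selected columns to the front while keeping the rest in place, and using `reindex(columns=...)` with a computed order. The variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [20, 25, 29],
    "city": ["Delhi", "Mumbai", "Kolkata"],
})

# Move 'city' to the front; the remaining columns keep their current order
front = ["city"]
rest = [col for col in people.columns if col not in front]
reordered = people[front + rest]

# reindex accepts any list of labels, here the column names sorted alphabetically
alphabetical = people.reindex(columns=sorted(people.columns))

print(reordered.columns.tolist())
print(alphabetical.columns.tolist())
```

One difference worth knowing: indexing with a list raises a `KeyError` for a misspelled column, while `reindex` silently creates a column of `NaN` for any label it does not find.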
