Data Transformation
Once your data is clean, the next step is to reshape, reformat, and reorder it as needed for analysis. Pandas gives you plenty of flexible tools to do this.

Sorting & Ranking
Sort by Values
df.sort_values("Age")                   # Ascending sort
df.sort_values("Age", ascending=False)  # Descending
df.sort_values(["Age", "Salary"])       # Sort by multiple columns

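df.sort_values(["Age", "Salary"]) sorts the DataFrame by "Age" first and uses "Salary" only to break ties, that is, to order rows that share the same "Age". sort_values() returns a new, sorted DataFrame and leaves df unchanged unless you assign the result back or pass inplace=True. Rows whose sort key is missing (NaN) are placed last by default.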
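The tie-breaking rule is easier to see on a small table. The sketch below uses made-up names and values (not the dataset from this notebook) and assumes you want a different sort direction per column, which sort_values() accepts as a list for ascending. It also shows na_position, which controls where rows with a missing key end up.

```python
import pandas as pd

# Made-up illustration data; "Age" contains one missing value.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [21, 20, 21, None],
    "Salary": [5000, 4000, 3000, 6000],
})

# Age ascending, then Salary descending among rows with the same Age.
mixed = df.sort_values(["Age", "Salary"], ascending=[True, False])
print(mixed)

# Same sort, but rows with a missing Age are listed first instead of last.
nan_first = df.sort_values(["Age", "Salary"], ascending=[True, False], na_position="first")
print(nan_first)
```

Because "Salary" is numeric here, ties are broken by numeric value; a column of digit strings would be compared character by character instead.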

Reset Index
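Sorting (or dropping rows) keeps each row's original index label, so the index may no longer run 0, 1, 2, and so on. If you want the index to start from 0 and be sequential again, reset it with reset_index().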

df.reset_index(drop=True, inplace=True)  # Reset the index and drop the old index
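As a rough illustration of what drop=True changes, the sketch below sorts a small made-up table and resets its index both ways. It also uses ignore_index=True on sort_values(), which is available in pandas 1.0 and later, as a one-step alternative; the variable names are only for this example.

```python
import pandas as pd

# Made-up illustration data with distinct ages so the sort order is unambiguous.
df = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [21, 22, 20]})

by_age = df.sort_values("Age")           # index labels travel with the rows: 2, 0, 1

kept = by_age.reset_index()              # old labels are kept in a new "index" column
dropped = by_age.reset_index(drop=True)  # old labels are discarded

# Sort and renumber in one call (pandas >= 1.0).
one_step = df.sort_values("Age", ignore_index=True)

print(kept)
print(dropped)
```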

Sort by Index
df.sort_index()

The df.sort_index() method sorts the DataFrame by its index labels rather than by column values. If the rows are no longer in index order (e.g., after sorting by a column), sort_index() puts them back in label order. It only reorders rows: it does not renumber the index, so any gaps left by dropped rows remain.
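A minimal sketch of sort_index() on made-up data follows. The index is deliberately out of order and has gaps, to show that sorting by label reorders rows without renumbering them. The axis=1 call sorts the column labels instead of the row labels.

```python
import pandas as pd

# Made-up illustration data with an unordered, non-consecutive index.
df = pd.DataFrame(
    {"Score": [88, 95, 70], "Name": ["C", "A", "B"]},
    index=[5, 0, 3],
)

by_label = df.sort_index()                      # rows ordered by label: 0, 3, 5
by_label_desc = df.sort_index(ascending=False)  # rows ordered by label: 5, 3, 0
by_column = df.sort_index(axis=1)               # columns ordered alphabetically

print(by_label)
print(by_column)
```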

Ranking
The .rank() method assigns ranks to the values in a column, such as scores or points. By default, ranks run in ascending order, so the smallest value gets rank 1, and tied values receive the average of the ranks they span, which can produce fractional ranks. For example, if two rows tie for positions 1 and 2, both get a rank of 1.5. Pass ascending=False if the highest value should be rank 1.

You can change how ties are handled with the method parameter. One useful option is method='dense', which gives tied values the same rank and leaves no gaps in the ranking sequence. This is helpful when you want a clean, consecutive ranking without skips.

df["Rank"] = df["Score"].rank()                 # Default: average method
df["Rank"] = df["Score"].rank(method="dense")   # 1, 2, 2, 3
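To compare the tie-handling options side by side, here is a small sketch on made-up scores. It assumes the highest score should be rank 1, so it passes ascending=False; the "average", "min", and "dense" values are all documented options of the method parameter.

```python
import pandas as pd

# Made-up scores; two rows tie for the top score.
scores = pd.Series([90, 80, 90, 70], name="Score")

ranks = pd.DataFrame({
    "Score": scores,
    "average": scores.rank(ascending=False),                  # ties share the mean of ranks 1 and 2
    "min": scores.rank(ascending=False, method="min"),        # ties share the lowest rank; the next rank is skipped
    "dense": scores.rank(ascending=False, method="dense"),    # ties share a rank; no ranks are skipped
})
print(ranks)
```

With "min", the score after the tie gets rank 3 because ranks 1 and 2 are both used up by the tied rows; with "dense", it gets rank 2.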

Renaming Columns & Index
df.rename(columns={"oldName": "newName"}, inplace=True)
df.rename(index={0: "row1", 1: "row2"}, inplace=True)

To rename all columns at once, assign a list with exactly one name per column, in the existing column order:

df.columns = ["Name", "Age", "City"]
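The sketch below shows two variations on rename() using made-up column names: returning a new DataFrame instead of using inplace=True, and passing a function rather than a dictionary. With the default settings, dictionary keys that match no column are silently ignored, which is worth knowing when a rename appears to do nothing.

```python
import pandas as pd

# Made-up illustration data with inconsistent column names.
df = pd.DataFrame({"First Name": ["A"], "AGE": [21]})

# Dictionary form: "Missing" matches no column and is ignored by default.
renamed = df.rename(columns={"First Name": "Name", "Missing": "X"})

# Function form: the callable is applied to every column label.
lowered = df.rename(columns=str.lower)

print(renamed.columns.tolist())
print(lowered.columns.tolist())
```

Passing errors="raise" to rename() turns an unmatched key into a KeyError instead of ignoring it.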

Changing Column Order
Select the columns with a list in the order you want. Any column left out of the list is dropped from the result:

df = df[["City", "Name", "Age"]]   # Reorder as desired

You can also move one column to the front:

cols = ["Name"] + [col for col in df.columns if col != "Name"]
df = df[cols]
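As an alternative sketch for moving a single column, pop() removes a column and returns it, and insert() places it at a given position. Both modify the DataFrame in place, so the example works on a copy. The data and variable names are made up for illustration.

```python
import pandas as pd

# Made-up illustration data.
df = pd.DataFrame({"City": ["X"], "Age": [21], "Name": ["A"]})

# List-based approach: build the new order, then select.
cols = ["Name"] + [col for col in df.columns if col != "Name"]
front = df[cols]

# In-place approach: remove the column, then insert it at position 0.
moved = df.copy()
moved.insert(0, "Name", moved.pop("Name"))

print(front.columns.tolist())
print(moved.columns.tolist())
```

The list-based version is easier to extend to several columns; the pop/insert version avoids rebuilding the full column list.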

Summary
Sort, rank, and rename to prepare your data
Reordering and reshaping are key for EDA and visualization

In [None]:
import pandas as pd
data = {
    "Name": ["Tasmir", "Saad", "Aman", "Sharique", "Rahil"],
    "Age": [21, None, 21, None, 20],
    "Course": ["CSE", None, "MBA", "CSE", "BCA"],
    "Salary": ["9999999", "7459576", "6432795", "3445869", "5462923"],
    "Date": ["2023-01-01", "2023-02-01", "2023-03-01", "2023-04-01", "2023-05-01"],
    "Category": ["A", "B", "A", "C", "B"]
}
df = pd.DataFrame(data)
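print("\nOriginal DataFrame:-\n", df)

# sort_values() returns a new DataFrame each time; df itself is never modified here.
print("\n", df.sort_values("Age"))                   # ascending; rows with NaN Age go last
print("\n", df.sort_values("Age", ascending=False))  # descending; rows with NaN Age still go last
# "Salary" holds strings, so ties on "Age" are broken by string comparison, not numeric value.
print("\n", df.sort_values(["Age", "Salary"]))
print("\n", df.sort_index())                         # df is already in index order, so this prints it unchanged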



Original DataFrame:-
        Name   Age Course   Salary        Date Category
0    Tasmir  21.0    CSE  9999999  2023-01-01        A
1      Saad   NaN   None  7459576  2023-02-01        B
2      Aman  21.0    MBA  6432795  2023-03-01        A
3  Sharique   NaN    CSE  3445869  2023-04-01        C
4     Rahil  20.0    BCA  5462923  2023-05-01        B

        Name   Age Course   Salary        Date Category
4     Rahil  20.0    BCA  5462923  2023-05-01        B
0    Tasmir  21.0    CSE  9999999  2023-01-01        A
2      Aman  21.0    MBA  6432795  2023-03-01        A
1      Saad   NaN   None  7459576  2023-02-01        B
3  Sharique   NaN    CSE  3445869  2023-04-01        C

        Name   Age Course   Salary        Date Category
0    Tasmir  21.0    CSE  9999999  2023-01-01        A
2      Aman  21.0    MBA  6432795  2023-03-01        A
4     Rahil  20.0    BCA  5462923  2023-05-01        B
1      Saad   NaN   None  7459576  2023-02-01        B
3  Sharique   NaN    CSE  3445869  202

In [29]:
# Ranking
import pandas as pd
data = {
    "Name": ["Tasmir", "Saad", "Aman", "Sharique", "Rahil"],
    "Age": [21, None, 21, None, 20],
    "Course": ["CSE", None, "MBA", "CSE", "BCA"],
    "Salary": ["9999999", "7459576", "6432795", "3445869", "3445869"],
    "Date": ["2023-01-01", "2023-02-01", "2023-03-01", "2023-04-01", "2023-05-01"],
    "Category": ["A", "B", "A", "C", "B"]
}

df = pd.DataFrame(data)
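df.index = range(1, len(df) + 1)   # use 1-based row labels
print("\n", df)
# "Salary" holds strings; the ranks match numeric order here only because every value has seven digits.
df["Rank"] = df["Salary"].rank()                  # default "average": the two tied salaries both get 1.5
print("\n", df)
df["Rank2"] = df["Salary"].rank(method="dense")   # ties share a rank and no ranks are skipped
print("\n", df)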

print("Renaming columns and Indexes:-")
df.rename(columns={"Date": "Joining", "Course": "Stream"}, inplace=True)
print("\nDate renaming:-\n", df)
df.rename(index={1: "R1", 2: "R2"}, inplace=True)   # row labels 3, 4, and 5 are left as they are
print("\nRow renaming:-\n", df)

print("\nChanging Column Order")
# Selecting a list of columns reorders them and drops any column left out ("Category" here).
df = df[["Name", "Salary", "Rank", "Rank2", "Joining", "Stream", "Age"]]
print("\n", df)


        Name   Age Course   Salary        Date Category
1    Tasmir  21.0    CSE  9999999  2023-01-01        A
2      Saad   NaN   None  7459576  2023-02-01        B
3      Aman  21.0    MBA  6432795  2023-03-01        A
4  Sharique   NaN    CSE  3445869  2023-04-01        C
5     Rahil  20.0    BCA  3445869  2023-05-01        B

        Name   Age Course   Salary        Date Category  Rank
1    Tasmir  21.0    CSE  9999999  2023-01-01        A   5.0
2      Saad   NaN   None  7459576  2023-02-01        B   4.0
3      Aman  21.0    MBA  6432795  2023-03-01        A   3.0
4  Sharique   NaN    CSE  3445869  2023-04-01        C   1.5
5     Rahil  20.0    BCA  3445869  2023-05-01        B   1.5

        Name   Age Course   Salary        Date Category  Rank  Rank2
1    Tasmir  21.0    CSE  9999999  2023-01-01        A   5.0    4.0
2      Saad   NaN   None  7459576  2023-02-01        B   4.0    3.0
3      Aman  21.0    MBA  6432795  2023-03-01        A   3.0    2.0
4  Sharique   NaN    CSE  