### **Renaming Columns in Pandas**

In pandas, you can **rename columns** to make your DataFrame more readable and meaningful.
This is especially useful when working with large datasets or after importing data with generic column names.

---
✅ **Key Points**
* **`rename()`** is best for selective column renaming.
* **Assigning to `df.columns`** is useful when renaming all columns at once.
* **`add_prefix()`** and **`add_suffix()`** quickly modify all column names systematically.
---

➡️ **1. Renaming Specific Columns**

Use the **`rename()`** method and provide a dictionary mapping old column names to new ones. This changes only the specified columns, leaving others unchanged.

In [3]:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

df = df.rename(columns={'A': 'Column_1', 'B': 'Column_2'})
print(f"Renaming Specific Columns:\n{df}")

Renaming Specific Columns:
   Column_1  Column_2
0         1         4
1         2         5
2         3         6
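
One detail worth knowing: by default `rename()` silently ignores dictionary keys that match no column, so a typo in an old name goes unnoticed. The sketch below uses a deliberately wrong key (lowercase `'a'`, chosen only for illustration) to show the default behaviour and the `errors='raise'` option.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# By default, keys that match no column are ignored without any message.
# The lowercase 'a' is an intentional typo for this illustration.
unchanged = df.rename(columns={'a': 'Column_1'})
print(list(unchanged.columns))

# errors='raise' turns the same mistake into a KeyError.
try:
    df.rename(columns={'a': 'Column_1'}, errors='raise')
    raised = False
except KeyError:
    raised = True
print(f"KeyError raised: {raised}")
```

Passing `errors='raise'` is a reasonable default when the mapping is typed by hand.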


➡️ **2. Renaming All Columns**

You can directly assign a **list of new column names** to the DataFrame's `columns` attribute. The list length must match the number of columns in the DataFrame; otherwise pandas raises a `ValueError`.

In [2]:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.columns = ['New_Column_1', 'New_Column_2']

print(f"Renaming All Columns:\n{df}")

Renaming All Columns:
   New_Column_1  New_Column_2
0             1             4
1             2             5
2             3             6
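
As a small sketch of the length requirement, the snippet below assigns a list that is one name too short (the name `'Only_One_Name'` is made up for the example) and catches the resulting error. The original column names are left untouched.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Two columns but only one new name: pandas rejects the assignment.
try:
    df.columns = ['Only_One_Name']
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(f"Error: {error_message}")
print(f"Columns are still: {list(df.columns)}")
```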


➡️ **3. Adding Prefixes or Suffixes**

Pandas also provides convenient methods to **add prefixes or suffixes** to all column names at once.

In [5]:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

df = df.add_prefix('col_') # Add prefix to all column names
df = df.add_suffix('_data') # Add suffix to all column names

print(f"After adding Prefix and Suffix:\n{df}")

After adding Prefix and Suffix:
   col_A_data  col_B_data
0           1           4
1           2           5
2           3           6
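
For systematic changes that go beyond a fixed prefix or suffix, `rename()` also accepts a function that is applied to every column name. The following is a minimal sketch; the helper `to_snake_case` and the sample column names are hypothetical and only serve the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['Alice', 'Bob'],
    'Last Name': ['Smith', 'Jones']
})

def to_snake_case(name):
    """Hypothetical helper: lowercase the name and replace spaces with underscores."""
    return name.strip().lower().replace(' ', '_')

# rename() calls the function once for each column name
df = df.rename(columns=to_snake_case)
print(list(df.columns))
```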


🔹 **`Task:` You are given a dataset with some columns. Rename the columns based on the following rules**
* If the datatype of the **column is numerical** - then add a **suffix '_data'**
* If the datatype of the **column is date/time** - then add a **suffix '_date'**
* For every **other datatype** - add a **prefix 'col_'**

In [7]:
import pandas as pd

sample_data = {
    'order': [1, 2, 3, 4, 5],
    'ordertime': ['2023-03-15', '2023-04-01', 'invalid_date', '2023-03-20', '2023-04-05'],
    'total': [100.50, 200.75, 150.25, 300.00, 75.80]
}
df = pd.DataFrame(sample_data)

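# Method 1 of Solving the Problem
# Assumption: 'ordertime' is meant to be a date/time column, so parse it first.
# errors='coerce' turns unparseable values such as 'invalid_date' into NaT.
df['ordertime'] = pd.to_datetime(df['ordertime'], errors='coerce')

new_columns = []
for column in df.columns:
    if pd.api.types.is_numeric_dtype(df[column]):
        # Covers integer and float columns alike
        new_columns.append(column + '_data')
    elif pd.api.types.is_datetime64_any_dtype(df[column]):
        new_columns.append(column + '_date')
    else:
        new_columns.append('col_' + column)

df.columns = new_columns
print(f"Method 1 of Solving the Problem:\n{df}")

Method 1 of Solving the Problem:
   order_data ordertime_date  total_data
0           1     2023-03-15      100.50
1           2     2023-04-01      200.75
2           3            NaT      150.25
3           4     2023-03-20      300.00
4           5     2023-04-05       75.80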


In [8]:
import pandas as pd

sample_data = {
    'order': [1, 2, 3, 4, 5],
    'ordertime': ['2023-03-15', '2023-04-01', 'invalid_date', '2023-03-20', '2023-04-05'],
    'total': [100.50, 200.75, 150.25, 300.00, 75.80]
}
df = pd.DataFrame(sample_data)

# Method 2 of Solving the Problem
# Hard-code the mapping: 'order' and 'total' are numerical, 'ordertime' holds dates.
df = df.rename(columns={'order': 'order_data', 'ordertime': 'ordertime_date', 'total': 'total_data'})
print(f"Method 2 of Solving the Problem:\n{df}")

Method 2 of Solving the Problem:
   order_data ordertime_date  total_data
0           1     2023-03-15      100.50
1           2     2023-04-01      200.75
2           3   invalid_date      150.25
3           4     2023-03-20      300.00
4           5     2023-04-05       75.80


### **Reordering Columns in Pandas**
**Reordering columns** in a DataFrame refers to changing the sequence in which columns appear.
This is a **structural operation**: it only affects the order in which columns appear, not the underlying data.

---
➡️ **Key Notes**
* Reordering columns is **non-destructive**: the data remains unchanged.
* Always ensure that the column names in your list **exist** in the DataFrame; indexing with a missing name raises a `KeyError`.
* This technique is useful for **cleaner presentation** or **preparing data for export or analysis**.
---
✅ **Summary**

| Operation                      | Purpose                                   |
| ------------------------------ | ----------------------------------------- |
| `df[['col3', 'col1', 'col2']]` | Reorder columns manually                  |
| `df.reindex(columns=[...])`    | Alternative method to reorder dynamically |
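
The two operations in the table behave differently when a requested column does not exist. The sketch below uses a made-up column name `'Z'` to show the contrast: list indexing raises a `KeyError`, while `reindex()` quietly adds the column filled with `NaN`.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# List indexing refuses unknown column names ('Z' is hypothetical).
try:
    df[['B', 'A', 'Z']]
    indexing_failed = False
except KeyError:
    indexing_failed = True
print(f"Indexing raised KeyError: {indexing_failed}")

# reindex() accepts the unknown name and creates an all-NaN column.
reindexed = df.reindex(columns=['B', 'A', 'Z'])
print(reindexed)
```

Because `reindex()` does not complain, a misspelled name can produce an empty column instead of an error, so list indexing is the safer choice when every column is expected to exist.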


➡️ **1. Reordering Using a List of Column Names**

The most common and intuitive approach is to **index the DataFrame with a list** of column names in the desired order.

In [9]:
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Reorder columns
df_reordered = df[['C', 'A', 'B']]
print(f"Reordered Columns:\n{df_reordered}")

Reordered Columns:
   C  A  B
0  7  1  4
1  8  2  5
2  9  3  6
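
Typing out every column becomes tedious for wide DataFrames. One possible pattern, sketched here under the assumption that only a single column needs to move to the front, builds the list from `df.columns` instead. The variable names `first` and `new_order` are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Put one column first and keep the remaining columns in their current order
first = 'C'
new_order = [first] + [col for col in df.columns if col != first]

df_reordered = df[new_order]
print(list(df_reordered.columns))
```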


➡️ **2. Reordering and Selecting Specific Columns**

You can also **reorder and select** only specific columns at the same time by listing just the columns you want to keep.

For example, the following creates a DataFrame containing only the **'C'** and **'A'** columns, in that order:

In [11]:
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
selected_columns = ['C', 'A']
df_selected = df[selected_columns]

print(f"Reordering & Selecting Specific Columns:\n{df_selected}")

Reordering & Selecting Specific Columns:
   C  A
0  7  1
1  8  2
2  9  3


➡️ **`Task:` You have a DataFrame df with columns 'ID', 'Name', 'Age', 'City', and 'Salary'. Perform the following operations:**
* Reorder the columns to have **'Name' first**, **followed by 'Age', 'Salary' and 'City'.**
* You are given a list of countries; **add it as a new column 'Country' at the end**.

In [17]:
import pandas as pd

# Create the DataFrame
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris'],
    'Salary': [50000, 60000, 70000]
})
# Reorder the columns to have 'Name' first, followed by 'Age', 'Salary' and 'City'.
# This selection also drops 'ID'. .copy() makes the result an independent DataFrame,
# so adding a column below does not trigger a SettingWithCopyWarning.
df_reordered = df[['Name', 'Age', 'Salary', 'City']].copy()
print(f"Reordered Columns:\n{df_reordered}\n")

# Add a new column 'Country' at the end.
country = ['USA', 'UK', 'France']

df_reordered['Country'] = country
print(f"Revised DataFrame after adding Country column:\n{df_reordered}")

Reordered Columns:
      Name  Age  Salary      City
0    Alice   25   50000  New York
1      Bob   30   60000    London
2  Charlie   35   70000     Paris

Revised DataFrame after adding Country column:
      Name  Age  Salary      City Country
0    Alice   25   50000  New York     USA
1      Bob   30   60000    London      UK
2  Charlie   35   70000     Paris  France


### **Reordering Rows in Pandas**
In addition to rearranging columns, you can also **reorder rows** in a pandas DataFrame. The most common way to do this is by **sorting the data** based on one or more columns.

---
✅ **Key Points**
* The **`sort_values()`** method is the standard way to reorder rows by the values in one or more columns.
* Use **`ascending=False`** to sort in descending order.
* Sorting can be based on **numeric**, **string**, or **datetime** columns.
* Sorting keeps each row's original index label. To renumber the rows from 0 afterwards, use `df_sorted.reset_index(drop=True, inplace=True)`.
---

➡️ **1. Sorting by a Single Column**

Use the `sort_values()` method to reorder rows based on the values of a specific column. The example below sorts the rows by the **Age** column from smallest to largest.

In [18]:
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 20]
})

# Sort rows by 'Age' in ascending order
df_sorted = df.sort_values('Age')
print(f"Rows sorted by Age [in ascending order]:\n{df_sorted}")

Rows sorted by Age [in ascending order]:
      Name  Age
2  Charlie   20
0    Alice   25
1      Bob   30
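
Notice that the sorted rows keep their original index labels (2, 0, 1). A minimal sketch of two ways to get a fresh 0-based index follows: calling `reset_index(drop=True)` after sorting, or passing `ignore_index=True` to `sort_values()` directly. The name `df_sorted_alt` is used only for this comparison.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 20]
})

# Option 1: sort, then discard the old index labels
df_sorted = df.sort_values('Age').reset_index(drop=True)

# Option 2: let sort_values() renumber the rows itself
df_sorted_alt = df.sort_values('Age', ignore_index=True)

print(df_sorted)
print(f"Both options give the same result: {df_sorted.equals(df_sorted_alt)}")
```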


➡️ **2. Sorting by Multiple Columns**

You can sort by multiple columns by passing a list of column names to the `by` parameter. To use a different sort order for each column, pass a list of booleans of the same length to `ascending`.

🔹 **`Here, the DataFrame will be:`**
* **Sorted ascending** by `Department`, and
* **Sorted descending** by `Salary` for rows with the same `Department` value.

In [20]:
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT'],
    'Salary': [50000, 60000, 55000, 65000, 62000]
})

# Sort by Department (ascending) and then by Salary (descending)
df_sorted = df.sort_values(by=['Department', 'Salary'], ascending=[True, False])

print(f"Multiple column sorting [Department (ascending) & Salary (descending)]:\n{df_sorted}")

Multiple column sorting [Department (ascending) & Salary (descending)]:
      Name Department  Salary
3    David    Finance   65000
2  Charlie         HR   55000
0    Alice         HR   50000
4      Eva         IT   62000
1      Bob         IT   60000


### **Reordering Rows (Continued)**
Another common use case for reordering rows is to ensure that **data is in chronological order**, especially when working with **time series data**.
This is crucial for tasks like trend analysis, forecasting, and cumulative calculations.

---
✅ **Key Points**
* Always **convert date columns** to datetime objects using `pd.to_datetime()` before sorting. Date strings are sorted alphabetically, which matches chronological order only for zero-padded year-first formats.
* Sorting ensures that **time-dependent analyses** (like rolling averages or lag calculations) are accurate.
* Use `ascending=False` if you want the most recent dates first.
* After sorting, you can reset the index using:

  ```python
  df_sorted.reset_index(drop=True, inplace=True)
  ```

---

🔹 **Example: Sorting by Date**

In [None]:
import pandas as pd

# Sample DataFrame with unsorted dates
df = pd.DataFrame({
    'Date': ['2023-07-15', '2023-07-13', '2023-07-14'],
    'Sales': [250, 300, 275]
})
# Convert the 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Sort the DataFrame by date in ascending order
df_sorted = df.sort_values('Date')

print(df_sorted)

        Date  Sales
1 2023-07-13    300
2 2023-07-14    275
0 2023-07-15    250
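
To illustrate why chronological order matters for lag calculations, the sketch below computes the day-over-day change in sales with `diff()` on the same data, once in the original row order and once after sorting. The column name `Daily_Change` is introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-07-15', '2023-07-13', '2023-07-14']),
    'Sales': [250, 300, 275]
})

# diff() compares each row with the row above it, so on unsorted data
# it compares days that are not actually consecutive.
unsorted_change = df['Sales'].diff()

# After sorting by date, each difference is a true day-over-day change.
df_sorted = df.sort_values('Date', ignore_index=True)
df_sorted['Daily_Change'] = df_sorted['Sales'].diff()

print(f"Change in original row order:\n{unsorted_change}\n")
print(f"Change in chronological order:\n{df_sorted}")
```

The first value is `NaN` in both cases because the first row has no predecessor.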


In [4]:
import pandas as pd

# Create a sample DataFrame of customer service tickets
df = pd.DataFrame({
    'Ticket_ID': range(1, 11),
    'Customer': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
    'Issue': ['Login', 'Payment', 'Delivery', 'Refund', 'Login', 'Payment', 'Login', 'Delivery', 'Refund', 'Payment'],
    'Priority': ['High', 'Medium', 'Low', 'High', 'Medium', 'Low', 'High', 'Medium', 'Low', 'High'],
    'Days_Open': [1, 3, 2, 5, 1, 4, 2, 3, 6, 1]
})

# Define a custom sorting key for Priority
priority_order = {'High': 0, 'Medium': 1, 'Low': 2}

# Sort the DataFrame by Priority (custom order) and then by Days_Open (descending)
df_sorted = df.sort_values(
    by=['Priority', 'Days_Open'],
    key=lambda x: x.map(priority_order) if x.name == 'Priority' else x,
    ascending=[True, False]
)

print(f"Original DataFrame:\n{df}\n")
print(f"Sorted DataFrame (Most important tickets first):\n{df_sorted}")

Original DataFrame:
   Ticket_ID Customer     Issue Priority  Days_Open
0          1        A     Login     High          1
1          2        B   Payment   Medium          3
2          3        C  Delivery      Low          2
3          4        D    Refund     High          5
4          5        E     Login   Medium          1
5          6        F   Payment      Low          4
6          7        G     Login     High          2
7          8        H  Delivery   Medium          3
8          9        I    Refund      Low          6
9         10        J   Payment     High          1

Sorted DataFrame (Most important tickets first):
   Ticket_ID Customer     Issue Priority  Days_Open
3          4        D    Refund     High          5
6          7        G     Login     High          2
0          1        A     Login     High          1
9         10        J   Payment     High          1
1          2        B   Payment   Medium          3
7          8        H  Delivery   Medium          3
4          5        E     Login   Medium          1
8          9        I    Refund      Low          6
5          6        F   Payment      Low          4
2          3        C  Delivery      Low          2
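
An alternative to the `key` function, sketched here on a smaller made-up ticket table, is to convert the column to an ordered categorical. The category order is then stored with the column, so a plain `sort_values()` call respects it without a mapping dictionary.

```python
import pandas as pd

# Smaller, hypothetical ticket table used only for this illustration
df = pd.DataFrame({
    'Ticket_ID': [1, 2, 3, 4],
    'Priority': ['Medium', 'High', 'Low', 'High'],
    'Days_Open': [3, 1, 2, 5]
})

# Store the desired order in the column itself: High < Medium < Low
df['Priority'] = pd.Categorical(
    df['Priority'],
    categories=['High', 'Medium', 'Low'],
    ordered=True
)

# Priority sorts by category order, Days_Open breaks ties in descending order
df_sorted = df.sort_values(by=['Priority', 'Days_Open'], ascending=[True, False])
print(df_sorted)
```

This approach may be preferable when the same custom order is needed in several places, since the order is defined once on the column rather than at each sort.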

➡️ **`Task:` Perform the following operations and output the result to the console:**
* Sort the **books by 'Year' in descending order** (newest first). For books published in the same year, sort them by **'Sales' in descending order**.
* **Rename the column 'Title' as 'Book_name'**

In [7]:
import pandas as pd

# Create the DataFrame
df = pd.DataFrame({
    'Title': ['Book A', 'Book B', 'Book C', 'Book D', 'Book E'],
    'Author': ['Author 1', 'Author 2', 'Author 3', 'Author 1', 'Author 2'],
    'Year': [2020, 2019, 2020, 2018, 2019],
    'Sales': [5000, 6000, 4500, 3000, 7000]
})
# Sort the books by 'Year' in descending order & 'Sales' in descending order.
df_sorted = df.sort_values(by = ['Year', 'Sales'], ascending = [False, False])

# Rename the column 'Title' as 'Book_name'
df_renamed = df_sorted.rename(columns = {'Title': 'Book_name'})
print(f"Final Output after sorting 'Year by desc', 'Sales in desc' & 'Renamed Title as Book_name':\n{df_renamed}")

Final Output after sorting 'Year by desc', 'Sales in desc' & 'Renamed Title as Book_name':
  Book_name    Author  Year  Sales
0    Book A  Author 1  2020   5000
2    Book C  Author 3  2020   4500
4    Book E  Author 2  2019   7000
1    Book B  Author 2  2019   6000
3    Book D  Author 1  2018   3000
