# 8. Renaming: .rename()

-----

The `.rename()` method is your tool for changing the labels of your DataFrame's **rows (index)** or **columns**. Think of it as the "Find and Replace" function for your row and column names. You use it to clean up messy names (e.g., 'First Name ') or to make them more concise (e.g., changing 'customer\_email\_address' to 'email').

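**How It Works in Memory**: By default (`inplace=False`), `.rename()` does *not* change your original DataFrame. It returns a **new** DataFrame with new `Index` objects for the renamed axis (column labels are stored in an `Index` too; there is no separate `Column` class). Whether the underlying data blocks (the NumPy arrays) are copied or shared depends on your pandas version: with Copy-on-Write (the default from pandas 3.0), the new DataFrame *shares* the original data until one of the two is modified, which is very memory-efficient, while older versions copy the data by default (`copy=True`). If you set `inplace=True`, no new DataFrame is created; the labels of the original object are changed directly.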
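
As a quick check of the default behavior described above, here is a minimal sketch. The DataFrame and its column names are made up for illustration. It only looks at object identity and labels, not at how the data is stored.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({'First Name': ['Alice', 'Bob'], 'Age': [25, 30]})

renamed = df.rename(columns={'First Name': 'first_name'})

print(renamed is df)          # False: a separate DataFrame object
print(list(df.columns))       # ['First Name', 'Age'] (original untouched)
print(list(renamed.columns))  # ['first_name', 'Age']

# With inplace=True, df itself is relabeled and nothing useful is returned
result = df.rename(columns={'Age': 'age'}, inplace=True)
print(result)                 # None
print(list(df.columns))       # ['First Name', 'age']
```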

**When to Use This**: You will use this *all the time* during data cleaning.

  * Use `.rename()` to fix one or two messy column names after loading a file (e.g., `df.rename(columns={' old name ': 'new_name'})`).
  * Use it to change specific index labels that are incorrect.
  * Use it with a **function** (e.g., `str.lower`) to apply a consistent cleaning rule to *all* column names at once, like making them all lowercase.

-----

### 0\. Syntax & Parameters

This method is for both Series and DataFrames. The syntax below is for a **DataFrame**.

```python
dataframe.rename(mapper=None, index=None, columns=None, axis=None, inplace=False, ...)
```

  * **`mapper`**

      * **What it does:** A dictionary or a function to apply to the labels of an axis. Combine it with `axis` to choose the target (e.g., `mapper=str.lower, axis='columns'`). If you leave `axis` out, the mapper is applied to the **row labels**.
      * **Default value:** `None`
      * **When you would use it:** This is an alternative to the `index` and `columns` parameters. It's more flexible but also more verbose. Most people prefer using the `index` and `columns` parameters directly.
      * **What happens if you don't specify it:** Nothing. You'll use `index` or `columns` instead.

  * **`index`**

      * **What it does:** This is the key parameter for **row labels**. It can be a **dictionary** (`{'old_label': 'new_label'}`) to rename specific rows, or a **function** to apply to all row labels.
      * **Default value:** `None`
      * **When you would use it:** `df.rename(index={'row_1': 'row_A'})`
      * **What happens if you don't specify it:** No row labels are renamed.

  * **`columns`**

      * **What it does:** This is the key parameter for **column labels**. It can be a **dictionary** (`{'old_col': 'new_col'}`) to rename specific columns, or a **function** to apply to all column labels.
      * **Default value:** `None`
      * **When you would use it:** `df.rename(columns={'User ID': 'user_id'})`
      * **What happens if you don't specify it:** No column labels are renamed.

  * **`inplace`**

      * **What it does:** A boolean (True/False). If `False` (the default), it returns a *new copy* of the DataFrame. If `True`, it modifies the original DataFrame *directly* and returns `None`.
      * **Default value:** `False`
      * **When you would use it:** You *can* use `inplace=True` to avoid creating a second DataFrame object, but it's **discouraged**. It returns `None`, so it breaks method chaining and is easier to misuse. The standard, preferred way is to re-assign: `df = df.rename(...)`.
      * **What happens if you don't specify it:** The default `False` is used, which is safer.
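
To make the `mapper` parameter concrete, the sketch below compares it with the `columns` keyword and shows what happens when `axis` is left out. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'Name': ['Alice'], 'Age': [25]}, index=['ROW_1'])

# mapper + axis='columns' does the same job as columns=...
by_mapper = df.rename(str.lower, axis='columns')
by_keyword = df.rename(columns=str.lower)
print(list(by_mapper.columns))       # ['name', 'age']
print(by_mapper.equals(by_keyword))  # True

# Without axis, the mapper is applied to the row labels instead
rows_renamed = df.rename(str.lower)
print(list(rows_renamed.index))      # ['row_1']
print(list(rows_renamed.columns))    # ['Name', 'Age']
```

This is why the `index` and `columns` keywords are usually easier to read: the target axis is visible in the call itself.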

-----

### 1\. Basic Example

The most common use is to rename one or two columns using a dictionary.

**Example 1: Renaming a single column**

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})
print("--- Original DataFrame ---")
print(df)

# Rename the 'Age' column to 'Years'
# This returns a NEW DataFrame
df_renamed = df.rename(columns={'Age': 'Years'})

print("\n--- Renamed DataFrame ---")
print(df_renamed)

print("\n--- Original is Unchanged ---")
print(df)
```

**Output:**

```
--- Original DataFrame ---
    Name  Age
0  Alice   25
1    Bob   30

--- Renamed DataFrame ---
    Name  Years
0  Alice     25
1    Bob     30

--- Original is Unchanged ---
    Name  Age
0  Alice   25
1    Bob   30
```

**Explanation:**
We passed a dictionary `{'Age': 'Years'}` to the `columns` parameter. This means "find a column named 'Age' and rename it to 'Years'". The result was `df_renamed`. The original `df` was not affected because `inplace=False` is the default.

-----

### 2\. Intermediate Example

You can rename both columns and index labels at the same time.

**Example 2: Renaming multiple columns and index labels**

```python
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'User Age': [25, 30]
}, index=['ID_1', 'ID_2'])

print("--- Original DataFrame ---")
print(df)

# Rename 'User Age' -> 'Age' and 'Name' -> 'Full Name'
# Rename index 'ID_1' -> 'user_A'
df_cleaned = df.rename(
    columns={'User Age': 'Age', 'Name': 'Full Name'},
    index={'ID_1': 'user_A'}
)

print("\n--- Cleaned DataFrame ---")
print(df_cleaned)
```

**Output:**

```
--- Original DataFrame ---
       Name  User Age
ID_1  Alice        25
ID_2    Bob        30

--- Cleaned DataFrame ---
       Full Name  Age
user_A     Alice   25
ID_2         Bob   30
```

**Explanation:**
We passed two dictionaries: one to `columns` to fix two column names, and one to `index` to fix one row label. Pandas handled both operations in a single step. Note that 'ID\_2' was not in the `index` dictionary, so it was left unchanged.
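
The `index` parameter also accepts a function, which is handy when every row label follows the same pattern. A small sketch, reusing the data from Example 2 (the `user_` prefix is just an illustrative choice):

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob'], 'User Age': [25, 30]},
    index=['ID_1', 'ID_2'],
)

# The lambda is called once for every row label
df_rows = df.rename(index=lambda label: label.replace('ID_', 'user_'))

print(list(df_rows.index))    # ['user_1', 'user_2']
print(list(df_rows.columns))  # ['Name', 'User Age'] (columns untouched)
```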

-----

### 3\. Advanced or Tricky Case

The most powerful feature of `.rename()` is using a **function** to clean *all* labels at once.

**Example 3: Using a function to rename all columns**

This is extremely common when a file has messy, inconsistent column names.

```python
df = pd.DataFrame({
    ' First Name ': ['Alice', 'Bob'], # Note the whitespace
    'Email Address': [25, 30]  # Note the space and uppercase
})

print("--- Messy Columns ---")
print(df.columns)
print(df)

# We want all columns to be lowercase and have no spaces
def clean_col_name(col_name):
    return col_name.strip().lower().replace(' ', '_')

# Pass the FUNCTION itself (no parentheses) to the 'columns' param
df_clean_cols = df.rename(columns=clean_col_name)

print("\n--- Cleaned Columns ---")
print(df_clean_cols.columns)
print(df_clean_cols)
```

**Output:**

```
--- Messy Columns ---
Index([' First Name ', 'Email Address'], dtype='object')
   First Name   Email Address
0        Alice             25
1          Bob             30

--- Cleaned Columns ---
Index(['first_name', 'email_address'], dtype='object')
  first_name  email_address
0      Alice             25
1        Bob             30
```

**Explanation:**
We passed our custom `clean_col_name` function to the `columns` parameter. Pandas automatically called this function for *every* column name, passing in the old name (e.g., ' First Name ') and using the returned value (e.g., 'first\_name') as the new name.

**Example 4: Using a lambda function (a compact version)**

```python
# A shorter way to do the same thing using a lambda
df_clean_lambda = df.rename(columns=lambda x: x.strip().lower().replace(' ', '_'))

print("\n--- Cleaned with Lambda ---")
print(df_clean_lambda)
```

**Output:**

```
--- Cleaned with Lambda ---
  first_name  email_address
0      Alice             25
1        Bob             30
```

-----

### 4\. Real-World Use Case

This is a classic "data cleaning" step after loading a file.

**Example 5: Cleaning up after a `pd.read_csv`**

```python
# Imagine you just loaded a file and got this:
df = pd.DataFrame({
    'customer ID': ['c1', 'c2'],
    'Transaction Value (USD)': [100.5, 75.2]
})

print("--- Loaded Data (Messy) ---")
print(df)

# Clean it up for analysis
df_clean = df.rename(columns={
    'customer ID': 'customer_id',
    'Transaction Value (USD)': 'value_usd'
})

print("\n--- Cleaned Data (Ready for Analysis) ---")
print(df_clean)
```

**Output:**

```
--- Loaded Data (Messy) ---
  customer ID  Transaction Value (USD)
0          c1                    100.5
1          c2                     75.2

--- Cleaned Data (Ready for Analysis) ---
  customer_id  value_usd
0          c1      100.5
1          c2       75.2
```

**Explanation:**
The original names were bad for programming: they had spaces, capital letters, and parentheses. We used `rename` with a simple dictionary to map the messy names to clean, "snake\_case" names that are easy to use (e.g., `df_clean.customer_id`).
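
When a file has dozens of columns, writing the dictionary by hand gets tedious. One possible alternative is a general cleaning function. The helper below, `to_snake_case`, is a hypothetical name and only a sketch: it lowercases the label and collapses every run of characters that are not letters or digits into a single underscore.

```python
import re
import pandas as pd

def to_snake_case(label):
    """Hypothetical helper: lowercase and turn non-alphanumeric runs into '_'."""
    cleaned = re.sub(r'[^0-9a-zA-Z]+', '_', label.strip())
    return cleaned.strip('_').lower()

df = pd.DataFrame({
    'customer ID': ['c1', 'c2'],
    'Transaction Value (USD)': [100.5, 75.2],
})

df_auto = df.rename(columns=to_snake_case)
print(list(df_auto.columns))  # ['customer_id', 'transaction_value_usd']
```

The trade-off is that a rule cannot shorten names for you: it produces `transaction_value_usd`, not the hand-picked `value_usd`.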

-----

### 5\. Common Mistakes / Pitfalls

**Mistake 6: Forgetting to re-assign (The `inplace=False` mistake)**

```python
# Wrong code
df = pd.DataFrame({'A': [1], 'B': [2]})
print("--- Before ---")
print(df)

# Try to rename, but don't save the result
df.rename(columns={'A': 'col_A'})

print("\n--- After (Still has 'A'!) ---")
print(df)
```

**Error/Wrong Output:**

```
--- Before ---
   A  B
0  1  2

--- After (Still has 'A'!) ---
   A  B
0  1  2
```

**Why it happens:**
`df.rename()` returns a *new DataFrame* by default. It doesn't touch the original `df`. You didn't do anything with the new DataFrame it created, so it was just lost.

**Example 7: Corrected code (Two ways)**

```python
# Way 1: Re-assignment (Preferred)
df = pd.DataFrame({'A': [1], 'B': [2]})
df = df.rename(columns={'A': 'col_A'}) # Assign the new df back to 'df'
print("--- Corrected 1 (Re-assigned) ---")
print(df)

# Way 2: In-place (Discouraged)
df = pd.DataFrame({'A': [1], 'B': [2]})
df.rename(columns={'A': 'col_A'}, inplace=True) # Modifies df directly
print("\n--- Corrected 2 (In-place) ---")
print(df)
```

**Output:**

```
--- Corrected 1 (Re-assigned) ---
   col_A  B
0      1  2

--- Corrected 2 (In-place) ---
   col_A  B
0      1  2
```

**Mistake 8: Using a list instead of a dictionary**

```python
# Wrong code
df = pd.DataFrame({'A': [1], 'B': [2]})
try:
    # A list is neither a mapping nor a function, so pandas ends up trying to call it
    df.rename(columns=['col_A', 'B'])
except TypeError as e:
    print(f"Error: {e}")
```

**Error/Wrong Output:**

```
Error: 'list' object is not callable
```

**Why it happens:**
`.rename()` expects a *mapping* (`{'old': 'new'}`) via a dictionary, or a *rule* via a function. It doesn't accept a list. (If you want to replace *all* columns with a list, you use `df.columns = ['col_A', 'B']`, but that's a different, more dangerous operation).
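
If you already have the new names as a list, there are two ways to use it without assigning to `df.columns`. This is a sketch; `new_names` is a made-up list with one entry per column.

```python
import pandas as pd

df = pd.DataFrame({'A': [1], 'B': [2]})
new_names = ['col_A', 'col_B']  # hypothetical replacement names, in column order

# Option 1: pair old and new names into a dictionary for .rename()
mapping = dict(zip(df.columns, new_names))
df_zip = df.rename(columns=mapping)
print(list(df_zip.columns))   # ['col_A', 'col_B']

# Option 2: .set_axis() takes the list directly and returns a new DataFrame
df_axis = df.set_axis(new_names, axis='columns')
print(list(df_axis.columns))  # ['col_A', 'col_B']

print(list(df.columns))       # ['A', 'B'] (original untouched in both cases)
```

Be aware that `zip` stops at the shorter input, so a list that is too short leaves the remaining columns unrenamed instead of raising an error.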

-----

### 6\. Key Terms (Explained Simply)

  * **Mapper:** A general term for something that "maps" an old value to a new one. In this case, it's either a **dictionary** or a **function**.
  * **Dictionary (dict):** A Python object that stores `key: value` pairs. For `.rename()`, it's `{'old_name': 'new_name'}`.
  * **Function:** A reusable block of code. For `.rename()`, it's a rule that is applied to *every* label (e.g., `str.lower`).
  * **`inplace`:** A parameter that means "do this operation directly on the original object" instead of returning a new one.
  * **Index (Row Labels):** The labels for the rows (e.g., `0, 1, 2...`).
  * **Columns (Column Labels):** The labels for the columns (e.g., `'Age'`, `'Name'`).

-----

### 7\. Best Practices

  * **Prefer Re-assignment:** `df = df.rename(...)` is safer, more readable, and allows method chaining (e.g., `df.rename(...).dropna()`). Avoid `inplace=True`.
  * **Use Dictionaries for Specific Fixes:** If you only need to change 2 or 3 columns, a dictionary is the clearest way. `df.rename(columns={'oldA': 'newA', 'oldB': 'newB'})`.
  * **Use Functions for General Rules:** If you need to make *all* columns lowercase, remove spaces, or add a prefix, a function is the best way. `df.rename(columns=str.strip)` or `df.rename(columns=lambda x: 'col_' + x)`.
  * **Check Your Work:** After renaming, always check `df.columns` or `df.head()` to make sure your changes were applied correctly.
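
One way to "check your work" automatically is the `errors` parameter of `.rename()`. By default, dictionary keys that match no label are silently ignored. With `errors='raise'`, they cause a `KeyError`. The column names below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'User ID': [1, 2], 'Age': [25, 30]})

# Typo in the key ('User Id' instead of 'User ID'): ignored by default
unchanged = df.rename(columns={'User Id': 'user_id'})
print(list(unchanged.columns))  # ['User ID', 'Age']

# errors='raise' turns the same typo into a KeyError
typo_caught = False
try:
    df.rename(columns={'User Id': 'user_id'}, errors='raise')
except KeyError as exc:
    typo_caught = True
    print(f"KeyError: {exc}")
```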

-----

### 8\. Mini Summary

  * `.rename()` changes index (row) or column labels.
  * It does **not** change the data *inside* the DataFrame.
  * To change **columns**, use the `columns` parameter. To change **rows**, use the `index` parameter.
  * Use a **dictionary** (`{'old': 'new'}`) for specific, one-by-one changes.
  * Use a **function** (e.g., `str.lower`) to apply a cleaning rule to *all* labels.
  * It returns a **new copy** by default. You must re-assign: `df = df.rename(...)`.

-----

### 10\. Practice Tasks

**Data for Tasks:**

```python
df_practice = pd.DataFrame(
    {'Name': ['Dan', 'Eva', 'Sal'],
     ' PAY RATE ': [15.50, 22.00, 30.00],
     'HIRE DATE': ['2020-01-01', '2021-05-15', '2022-11-30']},
    index=['emp_1', 'emp_2', 'emp_3']
)
print("--- Practice Data ---")
print(df_practice)
```

**Task 9 (Easy):**
Create a new DataFrame `df_easy` from `df_practice` that renames the 'Name' column to 'Employee\_Name'.

**Task 10 (Medium):**
Create a new DataFrame `df_medium` from `df_practice` that renames the index label 'emp\_1' to 'e\_1' *and* the column 'HIRE DATE' to 'hire\_date'.

**Task 11 (Hard):**
Create a new DataFrame `df_hard` from `df_practice` where *all* column names are made lowercase, have no leading/trailing whitespace, and have spaces replaced with underscores (`_`). (Hint: Use a function or lambda).

-----

### 11\. Recommended Next Topic

You've learned how to explore and rename. The next logical step is to change the *structure* of the DataFrame by adding or removing columns entirely.

**Recommended:** **Adding/removing columns (`.insert()`, `del`, `.drop()`)**

-----

### 12\. Quick Reference Card

| Operation | Syntax |
| :--- | :--- |
| **Rename One Column** | `df = df.rename(columns={'OldName': 'NewName'})` |
| **Rename One Index** | `df = df.rename(index={'OldLabel': 'NewLabel'})` |
| **Rename Multiple** | `df = df.rename(columns={'A':'a', 'B':'b'}, index={0:'x'})` |
| **Clean All Columns** | `df = df.rename(columns=str.lower)` |
| **Clean All (Complex)** | `df = df.rename(columns=lambda c: c.strip().replace(' ','_'))` |
| **In-place (Discouraged)** | `df.rename(columns={'A':'a'}, inplace=True)` |

-----

### 13\. Common Interview Questions

1.  **How do you rename the column 'User ID' to 'user\_id'?**
      * `df = df.rename(columns={'User ID': 'user_id'})`
2.  **What's the difference between `df.rename(columns=...)` and `df.columns = ...`?**
      * `df.rename(columns=...)` uses a dictionary (or a function) to change *specific* columns, leaving others alone. It returns a new DataFrame, so it's safe.
      * `df.columns = [...]` *replaces all column names* from a list. The list *must* be the exact same length as the number of columns, or it will error. It's powerful but risky.
3.  **How would you make all column names in your DataFrame lowercase?**
      * Pass the `str.lower` function: `df = df.rename(columns=str.lower)`
4.  **Why did my `df.rename(...)` not work? I ran it but the columns are the same.**
      * You forgot to re-assign the result. `.rename()` returns a *new DataFrame* by default (`inplace=False`). You need to do `df = df.rename(...)`.
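
The difference from question 2 is easy to see side by side. A short sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({'User ID': [1], 'Age': [25], 'City': ['Oslo']})

# .rename() touches only the keys you list
partial = df.rename(columns={'User ID': 'user_id'})
print(list(partial.columns))  # ['user_id', 'Age', 'City']

# Assigning to .columns replaces every label, by position
full = df.copy()
full.columns = ['user_id', 'age', 'city']
print(list(full.columns))     # ['user_id', 'age', 'city']

# A list of the wrong length is rejected with a ValueError
length_error = False
try:
    full.columns = ['user_id', 'age']
except ValueError as exc:
    length_error = True
    print(f"ValueError: {exc}")
```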

-----

### 14\. Performance Considerations

  * **Time Complexity:** The relabeling itself is cheap, **O(M + N)** at most, where M is the number of columns and N is the number of rows. It iterates over the labels of the axis being renamed, not over the cell values.
  * **Memory Usage (View vs. Copy):** `.rename()` returns a new DataFrame with new `Index` objects for the renamed labels. With Copy-on-Write (the default from pandas 3.0), it **shares the original data** (the underlying NumPy arrays) until one of the two DataFrames is modified, which is extremely memory-efficient. Older versions copy the data by default (`copy=True`).
  * `inplace=True` relabels the existing object and avoids creating a second DataFrame, but it's not the recommended practice.
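
If you want to see how your own installation handles the data, NumPy's `np.shares_memory` can compare the underlying buffers. This is only a sketch with made-up data. The result is printed rather than assumed, because it can differ between pandas versions and Copy-on-Write settings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(5), 'b': np.arange(5.0)})
renamed = df.rename(columns={'a': 'alpha'})

# Labels differ, values are identical
print(list(renamed.columns))  # ['alpha', 'b']
print((renamed['alpha'].to_numpy() == df['a'].to_numpy()).all())  # True

# Do the two DataFrames point at the same buffer? Version-dependent.
shared = np.shares_memory(df['a'].to_numpy(), renamed['alpha'].to_numpy())
print(f"pandas {pd.__version__}: data shared = {shared}")
```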

-----

### 15\. When NOT to Use This

  * **Don't use `.rename()` to change the *data* in a column.** It only changes labels. To change the data *values*, use `.loc`, `.iloc`, or `.replace()`.
  * **Don't use `.rename()` to replace *all* column names if you have a simple list.** If you have 3 columns and you want to call them `['a', 'b', 'c']`, it's simpler (though riskier) to just do `df.columns = ['a', 'b', 'c']`. This is common right after a `pd.read_csv()` to set initial names.
  * **Don't use `df.rename(columns=...)` to change *index (row)* labels.** It won't work: keys that match no column are ignored by default, so nothing changes. You must use the `index` parameter: `df.rename(index=...)`.
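
The first and last points can be illustrated in a few lines. This is a sketch with hypothetical data, contrasting a label change (`.rename()`) with a value change (`.replace()`).

```python
import pandas as pd

df = pd.DataFrame({'status': ['ok', 'bad']}, index=['ID_1', 'ID_2'])

# Labels: the column is renamed, the values stay 'ok' / 'bad'
relabeled = df.rename(columns={'status': 'state'})
print(relabeled['state'].tolist())  # ['ok', 'bad']

# Values: the cell contents change, the column label stays 'status'
revalued = df.replace('bad', 'failed')
print(revalued['status'].tolist())  # ['ok', 'failed']

# A row label passed to columns= matches no column, so nothing changes
no_effect = df.rename(columns={'ID_1': 'user_A'})
print(list(no_effect.index))        # ['ID_1', 'ID_2']
```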