# 12. Indexing Operations
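- `df.set_index()`: Set one or more existing columns (or arrays) as the index.
- `df.reset_index()`: Reset the index to the default integer index, optionally moving the old index into columns.
- `df.reindex()`: Conform the DataFrame to a new set of row or column labels, filling labels that have no match.
- `df.index.name`: Get or set the name of the index.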


In [2]:
import pandas as pd
import numpy as np

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Hannah', 'Ivan', 'Julia'],
    'Role': ['Developer', 'Designer', 'Manager', 'Developer', 'Designer', 'Manager', 'Tester', 'Developer', 'Tester',
             'Designer'],
    'Experience (Years)': [5, 3, 10, 4, 2, 8, 6, 3, 7, 1],
    'Salary ($)': [80000, 65000, 120000, 75000, 60000, 110000, 70000, 72000, 68000, 62000],
    'Location': ['New York', 'San Francisco', 'Los Angeles', 'New York', 'San Francisco', 'Los Angeles', 'New York',
                 'Chicago', 'Chicago', 'Los Angeles']
}

# Creating DataFrame
df = pd.DataFrame(data)

print(df)

      Name       Role  Experience (Years)  Salary ($)       Location
0    Alice  Developer                   5       80000       New York
1      Bob   Designer                   3       65000  San Francisco
2  Charlie    Manager                  10      120000    Los Angeles
3    David  Developer                   4       75000       New York
4      Eva   Designer                   2       60000  San Francisco
5    Frank    Manager                   8      110000    Los Angeles
6    Grace     Tester                   6       70000       New York
7   Hannah  Developer                   3       72000        Chicago
8     Ivan     Tester                   7       68000        Chicago
9    Julia   Designer                   1       62000    Los Angeles


# pandas.DataFrame.set_index

`DataFrame.set_index(keys, *, drop=True, append=False, inplace=False, verify_integrity=False)`

Set the DataFrame index using existing columns.

Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on it.

---
## **Parameters**

### **keys**
- **Type:** `label` or `array-like` or `list of labels/arrays`
  This parameter can be either:
  - A single column key.
  - A single array of the same length as the calling DataFrame.
  - A list containing an arbitrary combination of column keys and arrays.
    Here, “array” encompasses:
    - `Series`
    - `Index`
    - `np.ndarray`
    - Instances of `Iterator`.
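
The three forms of `keys` listed above can be sketched as follows. This is a minimal illustration on a small made-up frame; the names `people`, `badge`, and the numeric codes are assumptions for the example and are not part of the employee table used in this section.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data).
people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"],
     "Role": ["Developer", "Designer", "Manager"]}
)

# 1) A single column key: the column becomes the index and is dropped.
by_name = people.set_index("Name")

# 2) A single array of the same length as the frame: no column is consumed,
#    and the resulting index has no name.
by_code = people.set_index(np.array([101, 102, 103]))

# 3) A list mixing a column key with an array-like (here a named Series).
badge = pd.Series(["A", "B", "C"], name="Badge")
mixed = people.set_index(["Role", badge])

print(by_name.index.name)        # Name
print(list(by_code.columns))     # ['Name', 'Role']
print(list(mixed.index.names))   # ['Role', 'Badge']
```

Passing an array leaves every column in place, whereas passing a column key removes that column unless `drop=False` is given.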

### **drop**
- **Type:** `bool`
- **Default:** `True`
  Delete columns to be used as the new index.

### **append**
- **Type:** `bool`
- **Default:** `False`
  Whether to append columns to the existing index.

### **inplace**
- **Type:** `bool`
- **Default:** `False`
  Whether to modify the DataFrame rather than creating a new one.


### **verify_integrity**
- **Type:** `bool`
- **Default:** `False`
  Check the new index for duplicates.
  - Setting this to `False` will defer the check until necessary and improve the method's performance.
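
A short sketch of what `verify_integrity` changes, using a made-up frame (`staff` is a hypothetical name) whose `Role` column contains a repeated value. With the default, the duplicate labels are accepted; with `verify_integrity=True`, pandas raises a `ValueError`.

```python
import pandas as pd

# Hypothetical data: "Developer" appears twice.
staff = pd.DataFrame(
    {"Role": ["Developer", "Designer", "Developer"],
     "Salary": [80000, 65000, 75000]}
)

# Default (verify_integrity=False): duplicate index labels are allowed.
loose = staff.set_index("Role")
print(loose.index.is_unique)  # False

# verify_integrity=True: the duplicate check runs immediately.
try:
    staff.set_index("Role", verify_integrity=True)
    raised = False
except ValueError as exc:
    raised = True
    print(f"rejected: {exc}")
```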



## **Returns**
- **Type:** `DataFrame` or `None`
  - DataFrame with changed row labels (if `inplace=False`).
  - `None` if `inplace=True`.
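
The two return modes can be checked directly. The frame below is a made-up two-row example (`frame` is a hypothetical name), intended only to show that the default call leaves the original untouched while `inplace=True` returns `None`.

```python
import pandas as pd

frame = pd.DataFrame({"Name": ["Alice", "Bob"], "Role": ["Developer", "Designer"]})

# Default: a new DataFrame is returned; the original keeps its integer index.
indexed = frame.set_index("Name")
print(list(frame.index))    # [0, 1]
print(list(indexed.index))  # ['Alice', 'Bob']

# inplace=True: the original is modified and the call returns None.
result = frame.set_index("Name", inplace=True)
print(result)               # None
print(list(frame.index))    # ['Alice', 'Bob']
```

Because of the `None` return value, an assignment such as `frame = frame.set_index(..., inplace=True)` would discard the DataFrame.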


In [3]:
print(df.set_index('Name'))

              Role  Experience (Years)  Salary ($)       Location
Name                                                             
Alice    Developer                   5       80000       New York
Bob       Designer                   3       65000  San Francisco
Charlie    Manager                  10      120000    Los Angeles
David    Developer                   4       75000       New York
Eva       Designer                   2       60000  San Francisco
Frank      Manager                   8      110000    Los Angeles
Grace       Tester                   6       70000       New York
Hannah   Developer                   3       72000        Chicago
Ivan        Tester                   7       68000        Chicago
Julia     Designer                   1       62000    Los Angeles


In [4]:
print(df.set_index(['Role','Name']).sort_index(level=0))

                   Experience (Years)  Salary ($)       Location
Role      Name                                                  
Designer  Bob                       3       65000  San Francisco
          Eva                       2       60000  San Francisco
          Julia                     1       62000    Los Angeles
Developer Alice                     5       80000       New York
          David                     4       75000       New York
          Hannah                    3       72000        Chicago
Manager   Charlie                  10      120000    Los Angeles
          Frank                     8      110000    Los Angeles
Tester    Grace                     6       70000       New York
          Ivan                      7       68000        Chicago


In [5]:
print(df.set_index(['Role','Name'],append=True).sort_index(level=0))

                     Experience (Years)  Salary ($)       Location
  Role      Name                                                  
0 Developer Alice                     5       80000       New York
1 Designer  Bob                       3       65000  San Francisco
2 Manager   Charlie                  10      120000    Los Angeles
3 Developer David                     4       75000       New York
4 Designer  Eva                       2       60000  San Francisco
5 Manager   Frank                     8      110000    Los Angeles
6 Tester    Grace                     6       70000       New York
7 Developer Hannah                    3       72000        Chicago
8 Tester    Ivan                      7       68000        Chicago
9 Designer  Julia                     1       62000    Los Angeles


In [6]:
print(df.set_index(['Role','Name'],append=True, drop=False).sort_index(level=0))

                        Name       Role  Experience (Years)  Salary ($)  \
  Role      Name                                                          
0 Developer Alice      Alice  Developer                   5       80000   
1 Designer  Bob          Bob   Designer                   3       65000   
2 Manager   Charlie  Charlie    Manager                  10      120000   
3 Developer David      David  Developer                   4       75000   
4 Designer  Eva          Eva   Designer                   2       60000   
5 Manager   Frank      Frank    Manager                   8      110000   
6 Tester    Grace      Grace     Tester                   6       70000   
7 Developer Hannah    Hannah  Developer                   3       72000   
8 Tester    Ivan        Ivan     Tester                   7       68000   
9 Designer  Julia      Julia   Designer                   1       62000   

                          Location  
  Role      Name                    
0 Developer Alice        

In [7]:
df1=df.copy()

In [8]:
df1.set_index(['Role','Name'],append=True, drop=True, inplace=True)

In [9]:
df1.index

MultiIndex([(0, 'Developer',   'Alice'),
            (1,  'Designer',     'Bob'),
            (2,   'Manager', 'Charlie'),
            (3, 'Developer',   'David'),
            (4,  'Designer',     'Eva'),
            (5,   'Manager',   'Frank'),
            (6,    'Tester',   'Grace'),
            (7, 'Developer',  'Hannah'),
            (8,    'Tester',    'Ivan'),
            (9,  'Designer',   'Julia')],
           names=[None, 'Role', 'Name'])

# pandas.DataFrame.reset_index

`DataFrame.reset_index(level=None, *, drop=False, inplace=False, col_level=0, col_fill='', allow_duplicates=<no_default>, names=None)`

Reset the index, or a level of it.

This method resets the index of the DataFrame and uses the default integer index instead. If the DataFrame has a `MultiIndex`, it can remove one or more levels.


## **Parameters**

### **level**
- **Type:** `int`, `str`, `tuple`, or `list`
- **Default:** `None`
  - Specifies the levels to remove from the index.
  - If not provided, removes all levels of the index by default.
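
As a minimal sketch of the `level` parameter, the example below builds a small two-level index from made-up data (`frame` is a hypothetical name) and compares resetting every level with resetting only one.

```python
import pandas as pd

frame = pd.DataFrame(
    {"Role": ["Developer", "Designer"],
     "Name": ["Alice", "Bob"],
     "Salary": [80000, 65000]}
).set_index(["Role", "Name"])

# level=None (default): every index level becomes a column again.
all_reset = frame.reset_index()
print(list(all_reset.columns))   # ['Role', 'Name', 'Salary']

# level="Name": only that level is moved out; "Role" stays as the index.
one_level = frame.reset_index(level="Name")
print(one_level.index.name)      # Role
print(list(one_level.columns))   # ['Name', 'Salary']
```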



### **drop**
- **Type:** `bool`
- **Default:** `False`
  - If `True`, does not insert the index into the DataFrame as columns.
  - Resets the index to the default integer index.
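
The effect of `drop` is easiest to see side by side. This is an illustrative sketch on a hypothetical two-row frame: with `drop=False` the old index comes back as a column, with `drop=True` it is discarded.

```python
import pandas as pd

frame = pd.DataFrame(
    {"Name": ["Alice", "Bob"], "Salary": [80000, 65000]}
).set_index("Name")

kept = frame.reset_index()               # drop=False: "Name" returns as a column
dropped = frame.reset_index(drop=True)   # drop=True: "Name" is discarded

print(list(kept.columns))     # ['Name', 'Salary']
print(list(dropped.columns))  # ['Salary']
print(list(dropped.index))    # [0, 1]
```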



### **inplace**
- **Type:** `bool`
- **Default:** `False`
  - If `True`, modifies the original DataFrame instead of returning a new one.



### **col_level**
- **Type:** `int` or `str`
- **Default:** `0`
  - For `MultiIndex` columns, determines which level the labels are inserted into.
  - By default, labels are inserted into the first level.



### **col_fill**
- **Type:** `object`
- **Default:** `''`
  - For `MultiIndex` columns, determines how the other levels are named.
  - If `None`, the index name is repeated.


### **allow_duplicates**
- **Type:** `bool`
- **Default:** `lib.no_default`
  - If `True`, allows duplicate column labels to be created.
  - **Added in version 1.5.0.**
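
A sketch of when `allow_duplicates` matters, assuming pandas 1.5 or later. The frame is made up: `set_index(..., drop=False)` is used so that `Name` exists both as the index and as a column, which makes a plain `reset_index()` collide with the existing column.

```python
import pandas as pd

# drop=False keeps "Name" as a column while also using it as the index.
frame = pd.DataFrame(
    {"Name": ["Alice", "Bob"], "Salary": [80000, 65000]}
).set_index("Name", drop=False)

# Default: inserting a second "Name" column is refused.
try:
    frame.reset_index()
    raised_without_flag = False
except ValueError as exc:
    raised_without_flag = True
    print(f"default behaviour: {exc}")

# allow_duplicates=True: the duplicate column label is permitted.
dup = frame.reset_index(allow_duplicates=True)
print(list(dup.columns))  # ['Name', 'Name', 'Salary']
```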


### **names**
- **Type:** `int`, `str`, or 1-dimensional `list`
- **Default:** `None`
  - Renames the column that contains the index data using the provided string.
  - If the DataFrame has a `MultiIndex`, this must be a list or tuple with a length equal to the number of levels.
  - **Added in version 1.5.0.**
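
The following sketch shows `names` for a single-level and a two-level index, assuming pandas 1.5 or later. The data and the labels `Employee`, `Job`, and `Person` are made up for the illustration.

```python
import pandas as pd

# Single-level, unnamed index: one string names the new column.
flat = pd.DataFrame({"Salary": [80000, 65000]}, index=["Alice", "Bob"])
flat_reset = flat.reset_index(names="Employee")
print(list(flat_reset.columns))   # ['Employee', 'Salary']

# Two-level index: one name per level, given as a list.
levels = pd.MultiIndex.from_tuples([("Developer", "Alice"), ("Designer", "Bob")])
multi = pd.DataFrame({"Salary": [80000, 65000]}, index=levels)
multi_reset = multi.reset_index(names=["Job", "Person"])
print(list(multi_reset.columns))  # ['Job', 'Person', 'Salary']
```

Without `names`, an unnamed single-level index would be inserted as a column called `index`, and unnamed levels of a `MultiIndex` as `level_0`, `level_1`, and so on.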


## **Returns**
- **Type:** `DataFrame` or `None`
  - Returns the DataFrame with the new index.
  - Returns `None` if `inplace=True`.


In [10]:
df1

                     Experience (Years)  Salary ($)       Location
  Role      Name                                                  
0 Developer Alice                     5       80000       New York
1 Designer  Bob                       3       65000  San Francisco
2 Manager   Charlie                  10      120000    Los Angeles
3 Developer David                     4       75000       New York
4 Designer  Eva                       2       60000  San Francisco
5 Manager   Frank                     8      110000    Los Angeles
6 Tester    Grace                     6       70000       New York
7 Developer Hannah                    3       72000        Chicago
8 Tester    Ivan                      7       68000        Chicago
9 Designer  Julia                     1       62000    Los Angeles


In [11]:
df1.columns=[['about_yourself','about_job', 'about_yourself'],list(df1.columns)]
print(df1)

                        about_yourself  about_job about_yourself
                    Experience (Years) Salary ($)       Location
  Role      Name                                                
0 Developer Alice                    5      80000       New York
1 Designer  Bob                      3      65000  San Francisco
2 Manager   Charlie                 10     120000    Los Angeles
3 Developer David                    4      75000       New York
4 Designer  Eva                      2      60000  San Francisco
5 Manager   Frank                    8     110000    Los Angeles
6 Tester    Grace                    6      70000       New York
7 Developer Hannah                   3      72000        Chicago
8 Tester    Ivan                     7      68000        Chicago
9 Designer  Julia                    1      62000    Los Angeles


In [12]:
print(df1.reset_index(level=1,col_fill='about_job',col_level=1))

           about_job     about_yourself  about_job about_yourself
                Role Experience (Years) Salary ($)       Location
  Name                                                           
0 Alice    Developer                  5      80000       New York
1 Bob       Designer                  3      65000  San Francisco
2 Charlie    Manager                 10     120000    Los Angeles
3 David    Developer                  4      75000       New York
4 Eva       Designer                  2      60000  San Francisco
5 Frank      Manager                  8     110000    Los Angeles
6 Grace       Tester                  6      70000       New York
7 Hannah   Developer                  3      72000        Chicago
8 Ivan        Tester                  7      68000        Chicago
9 Julia     Designer                  1      62000    Los Angeles


In [13]:
print(df1)
print(df1.reset_index(level=2,col_fill='about_yourself',col_level=1))

                        about_yourself  about_job about_yourself
                    Experience (Years) Salary ($)       Location
  Role      Name                                                
0 Developer Alice                    5      80000       New York
1 Designer  Bob                      3      65000  San Francisco
2 Manager   Charlie                 10     120000    Los Angeles
3 Developer David                    4      75000       New York
4 Designer  Eva                      2      60000  San Francisco
5 Manager   Frank                    8     110000    Los Angeles
6 Tester    Grace                    6      70000       New York
7 Developer Hannah                   3      72000        Chicago
8 Tester    Ivan                     7      68000        Chicago
9 Designer  Julia                    1      62000    Los Angeles
            about_yourself                     about_job about_yourself
                      Name Experience (Years) Salary ($)       Location
  Role     

In [14]:
print(df1.reset_index(names=['ID'],level=0,col_fill='about_yourself',col_level=1))

                  about_yourself                     about_job about_yourself
                              ID Experience (Years) Salary ($)       Location
Role      Name                                                               
Developer Alice                0                  5      80000       New York
Designer  Bob                  1                  3      65000  San Francisco
Manager   Charlie              2                 10     120000    Los Angeles
Developer David                3                  4      75000       New York
Designer  Eva                  4                  2      60000  San Francisco
Manager   Frank                5                  8     110000    Los Angeles
Tester    Grace                6                  6      70000       New York
Developer Hannah               7                  3      72000        Chicago
Tester    Ivan                 8                  7      68000        Chicago
Designer  Julia                9                  1      62000  

# pandas.DataFrame.reindex

`DataFrame.reindex(labels=None, *, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=nan, limit=None, tolerance=None)`

Conform DataFrame to new index with optional filling logic.

Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and `copy=False`.

#### Parameters:

- **labels** (`array-like`, optional):
  New labels / index to conform the axis specified by ‘axis’ to.

- **index** (`array-like`, optional):
  New labels for the index. Preferably an `Index` object to avoid duplicating data.

- **columns** (`array-like`, optional):
  New labels for the columns. Preferably an `Index` object to avoid duplicating data.

- **axis** (`int` or `str`, optional):
  Axis to target. Can be either the axis name (‘index’, ‘columns’) or number (0, 1).

- **method** (`{None, ‘backfill’/’bfill’, ‘pad’/’ffill’, ‘nearest’}`):
  Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

  - `None` (default): Don’t fill gaps.
  - `pad` / `ffill`: Propagate last valid observation forward to next valid.
  - `backfill` / `bfill`: Use next valid observation to fill gap.
  - `nearest`: Use nearest valid observations to fill gap.

- **copy** (`bool`, default `True`):
  Return a new object, even if the passed indexes are the same.

  > **Note**: The `copy` keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a `copy` keyword will use a lazy copy mechanism to defer the copy and ignore the `copy` keyword. The `copy` keyword will be removed in a future version of pandas.
  > You can already get the future behavior and improvements through enabling copy on write:

  ```python
  pd.options.mode.copy_on_write = True
  ```

- **level** (`int` or name):
  Broadcast across a level, matching Index values on the passed MultiIndex level.

- **fill_value** (scalar, default `np.nan`):
  Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

- **limit** (`int`, default `None`):
  Maximum number of consecutive elements to forward or backward fill.

- **tolerance** (optional):
  Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation `abs(index[indexer] - target) <= tolerance`.

  Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

#### Returns:

DataFrame with changed index.
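
The interaction of `method` and `tolerance` can be sketched on a small numeric index. The frame `readings` and the `target` labels are made up for this illustration; the index is monotonically increasing, which the fill methods require.

```python
import pandas as pd

# Hypothetical measurements at labels 0, 5 and 10.
readings = pd.DataFrame({"value": [10.0, 20.0, 30.0]}, index=[0, 5, 10])
target = [0, 1, 2, 4, 5, 9, 12]

# No method: only exact label matches (0 and 5) keep a value, the rest are NaN.
plain = readings.reindex(target)

# ffill: every new label takes the value of the last original label at or before it.
padded = readings.reindex(target, method="ffill")

# nearest + tolerance=1: a label is filled only if an original label lies
# within distance 1, i.e. abs(index[indexer] - target) <= 1.
near = readings.reindex(target, method="nearest", tolerance=1)

print(plain["value"].tolist())
print(padded["value"].tolist())
print(near["value"].tolist())
```

In this sketch, labels 2 and 12 stay empty under `tolerance=1` because their nearest original labels (0 and 10) are two steps away.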


In [24]:
df.reindex(list(range(12)),fill_value=0 )

       Name       Role  Experience (Years)  Salary ($)       Location
0     Alice  Developer                   5       80000       New York
1       Bob   Designer                   3       65000  San Francisco
2   Charlie    Manager                  10      120000    Los Angeles
3     David  Developer                   4       75000       New York
4       Eva   Designer                   2       60000  San Francisco
5     Frank    Manager                   8      110000    Los Angeles
6     Grace     Tester                   6       70000       New York
7    Hannah  Developer                   3       72000        Chicago
8      Ivan     Tester                   7       68000        Chicago
9     Julia   Designer                   1       62000    Los Angeles
10        0          0                   0           0              0
11        0          0                   0           0              0
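
`reindex` is not limited to row labels. As a minimal sketch on a made-up frame (the `Bonus` column is a hypothetical label that does not exist in the data), the `columns` keyword and the `labels` plus `axis` form select the same axis:

```python
import pandas as pd

frame = pd.DataFrame({"Name": ["Alice", "Bob"], "Salary": [80000, 65000]})

# Keep "Salary", drop "Name", and introduce a new, empty "Bonus" column.
by_keyword = frame.reindex(columns=["Salary", "Bonus"])

# Equivalent call using labels together with axis.
by_axis = frame.reindex(["Salary", "Bonus"], axis="columns")

# fill_value replaces the default NaN for labels that had no match.
filled = frame.reindex(columns=["Salary", "Bonus"], fill_value=0)

print(by_keyword)
print(filled)
```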


In [26]:
df.reindex(list(range(12)),method='ffill')

       Name       Role  Experience (Years)  Salary ($)       Location
0     Alice  Developer                   5       80000       New York
1       Bob   Designer                   3       65000  San Francisco
2   Charlie    Manager                  10      120000    Los Angeles
3     David  Developer                   4       75000       New York
4       Eva   Designer                   2       60000  San Francisco
5     Frank    Manager                   8      110000    Los Angeles
6     Grace     Tester                   6       70000       New York
7    Hannah  Developer                   3       72000        Chicago
8      Ivan     Tester                   7       68000        Chicago
9     Julia   Designer                   1       62000    Los Angeles
10    Julia   Designer                   1       62000    Los Angeles
11    Julia   Designer                   1       62000    Los Angeles
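
The `method` argument only works when the existing index is monotonic. The sketch below uses a made-up frame (`shuffled`) with an unsorted index to show the failure and one possible remedy, sorting the index first.

```python
import pandas as pd

# Index 2, 0, 1 is neither increasing nor decreasing.
shuffled = pd.DataFrame({"value": [1, 2, 3]}, index=[2, 0, 1])

try:
    shuffled.reindex([0, 1, 2, 3], method="ffill")
    raised = False
except ValueError as exc:
    raised = True
    print(f"rejected: {exc}")

# Sorting the index first makes forward filling possible.
fixed = shuffled.sort_index().reindex([0, 1, 2, 3], method="ffill")
print(fixed["value"].tolist())  # [2, 3, 1, 1]
```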


# pandas.Index.name

`property Index.name`

Return Index or MultiIndex name.
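
A brief sketch of reading and assigning the property, on a hypothetical two-row frame. It also shows where the name comes from after `set_index` and where it ends up after `reset_index`.

```python
import pandas as pd

frame = pd.DataFrame({"Name": ["Alice", "Bob"], "Salary": [80000, 65000]})

# A freshly created default index has no name.
print(frame.index.name)          # None

# Assigning to the property names the index.
frame.index.name = "ID"
print(frame.index.name)          # ID

# The name becomes the column label when the index is reset.
restored = frame.reset_index()
print(list(restored.columns))    # ['ID', 'Name', 'Salary']

# set_index carries the column label over as the index name.
by_name = frame.set_index("Name")
print(by_name.index.name)        # Name
```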

In [28]:
df.index.name='ID'
df

       Name       Role  Experience (Years)  Salary ($)       Location
ID                                                                   
0     Alice  Developer                   5       80000       New York
1       Bob   Designer                   3       65000  San Francisco
2   Charlie    Manager                  10      120000    Los Angeles
3     David  Developer                   4       75000       New York
4       Eva   Designer                   2       60000  San Francisco
5     Frank    Manager                   8      110000    Los Angeles
6     Grace     Tester                   6       70000       New York
7    Hannah  Developer                   3       72000        Chicago
8      Ivan     Tester                   7       68000        Chicago
9     Julia   Designer                   1       62000    Los Angeles


In [29]:
print(df)

       Name       Role  Experience (Years)  Salary ($)       Location
ID                                                                   
0     Alice  Developer                   5       80000       New York
1       Bob   Designer                   3       65000  San Francisco
2   Charlie    Manager                  10      120000    Los Angeles
3     David  Developer                   4       75000       New York
4       Eva   Designer                   2       60000  San Francisco
5     Frank    Manager                   8      110000    Los Angeles
6     Grace     Tester                   6       70000       New York
7    Hannah  Developer                   3       72000        Chicago
8      Ivan     Tester                   7       68000        Chicago
9     Julia   Designer                   1       62000    Los Angeles
