In [1]:
# Enable code formatting using external plugin: nb_black.
%reload_ext nb_black


# Pandas Tutorial - PART 2

**Ref: [Pandas Tutorials][1] by [Corey Schafer][2]**

[1]: https://youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS
[2]: https://coreyms.com/

### Topics:

1. [Basics of pandas](#Basics-of-pandas-using-test-data)
2. [`loc` and `iloc`](#Fetch-rows-and-columns-using-loc-and-iloc-indexers)
  1. [`iloc`](#Fetch-rows-using-iloc:-Integer-Location)
  2. [`loc`](#Fetch-rows-and-columns-using-loc:-Location)
3. [Custom Indexes](#Custom-Indexes)
4. [Filters](#Filters)

#### Load and configure `pandas` library

In [2]:
import pandas as pd

print("Pandas version:", pd.__version__)

# Limit the output width to 130 characters; longer lines wrap onto the next line.
pd.options.display.width = 130

Pandas version: 1.3.4



## Basics of `pandas` using test data

Basics:
1. Understand different data types used in `pandas` library. 
  1. Series
  2. DataFrame
  3. Panel (not discussed in the video; removed from pandas in version 0.25.0)
2. Different techniques to access rows and columns in a `DataFrame`.

Create a Python dictionary containing some sample data:

In [3]:
people = {
    "Code": ["IN028", "UK007", "US003", "US004", "XX001", "IN100"],
    "First Name": ["Dheemanth", "Alex", "Corey", "Bucky", "John", "M.S"],
    "Last Name": ["Bhat", "Rider", "Schafer", "Roberts", "Oldman", "Dhoni"],
    "Country": ["India", "U.K.", "U.S.", "U.S.", None, "India"],
    "Age": [28, 14, 32, 37, None, 40],
    "AOI": ["Codding", "Adventure", "Youtube", "Youtube", "History", "Cricket"],
    "Fictional": [False, True, False, False, True, False],
}

people

{'Code': ['IN028', 'UK007', 'US003', 'US004', 'XX001', 'IN100'],
 'First Name': ['Dheemanth', 'Alex', 'Corey', 'Bucky', 'John', 'M.S'],
 'Last Name': ['Bhat', 'Rider', 'Schafer', 'Roberts', 'Oldman', 'Dhoni'],
 'Country': ['India', 'U.K.', 'U.S.', 'U.S.', None, 'India'],
 'Age': [28, 14, 32, 37, None, 40],
 'AOI': ['Codding', 'Adventure', 'Youtube', 'Youtube', 'History', 'Cricket'],
 'Fictional': [False, True, False, False, True, False]}


Load the test data from the Python dictionary above into a `DataFrame` and display it.

In [4]:
df = pd.DataFrame(people)
df

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
0,IN028,Dheemanth,Bhat,India,28.0,Codding,False
1,UK007,Alex,Rider,U.K.,14.0,Adventure,True
2,US003,Corey,Schafer,U.S.,32.0,Youtube,False
3,US004,Bucky,Roberts,U.S.,37.0,Youtube,False
4,XX001,John,Oldman,,,History,True
5,IN100,M.S,Dhoni,India,40.0,Cricket,False



> Note:
> 1. The first column, which has no column name, is the **default index** that `pandas` provides.
> 2. The default index gives every row a **unique integer label**, starting at zero and increasing by one.
> 3. An index **need not be unique** in general, i.e., two or more rows can share the same label.
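
The third point can be checked on a small hand-made frame. The `scores` frame and its labels are illustrative and not part of the tutorial data; this is only a minimal sketch of how label lookup behaves when a label repeats.

```python
import pandas as pd

# Hypothetical frame whose index labels repeat; not part of the tutorial data.
scores = pd.DataFrame({"score": [10, 20, 30]}, index=["a", "a", "b"])

print(scores.index.is_unique)   # the label "a" appears twice

duplicates = scores.loc["a"]    # every row labelled "a" comes back
single = scores.loc["b"]        # a label that occurs once gives a single row

print(type(duplicates).__name__, len(duplicates))
print(type(single).__name__)
```

A repeated label returns a `DataFrame` with all matching rows, while a label that occurs once returns a `Series`.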

### `Series`

A `Series` is a 1-D labelled array (a vector). In a `DataFrame`, a `Series` can be either of these:
1. All rows of a single column.
2. All columns of a single row.

In [5]:
type(df["First Name"])

pandas.core.series.Series


### `DataFrame`

1. `DataFrame` is a 2-D array i.e., it contains rows and columns.
2. `DataFrame` is a container for multiple `Series` objects.

In [6]:
type(df[["First Name", "Last Name"]])

pandas.core.frame.DataFrame
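
The "container of `Series`" idea can be sketched by building a frame from two hand-made `Series` objects. The names `first`, `last` and `names` are illustrative and not part of the tutorial data.

```python
import pandas as pd

# Two illustrative Series that share the default 0..n-1 index.
first = pd.Series(["Alex", "Corey"], name="First Name")
last = pd.Series(["Rider", "Schafer"], name="Last Name")

# A dict of Series becomes a DataFrame with one column per Series.
names = pd.DataFrame({"First Name": first, "Last Name": last})

# Selecting one column hands back a Series again.
column = names["Last Name"]
print(type(names).__name__, type(column).__name__)
```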


### Access single column in a `DataFrame`

A single column can be accessed in two ways, for example:

In [7]:
# Technique #1: Bracket notation. Required when the column name contains a space, is a Python keyword, or clashes with a pandas attribute/method name.
df["First Name"]

0    Dheemanth
1         Alex
2        Corey
3        Bucky
4         John
5          M.S
Name: First Name, dtype: object


As seen in the above output, a `Series` (in this case the `First Name` column) also gets the **default unique index**, starting from zero and incremented by one.

In [8]:
# Technique #2: Using dot notation.
df.Country

0    India
1     U.K.
2     U.S.
3     U.S.
4     None
5    India
Name: Country, dtype: object
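
Dot notation only reaches columns whose names are valid Python identifiers and do not clash with an existing `DataFrame` attribute. A minimal sketch, using a hypothetical frame with a column named `count` (not part of the tutorial data):

```python
import pandas as pd

# Hypothetical frame: "count" clashes with the DataFrame.count method,
# and "First Name" contains a space.
frame = pd.DataFrame({"count": [3, 5], "First Name": ["Alex", "Corey"]})

dot_result = frame.count           # the bound method, not the column
bracket_result = frame["count"]    # the column, as a Series

print(callable(dot_result))
print(bracket_result.tolist())
print(frame["First Name"].tolist())  # `frame.First Name` would be a SyntaxError
```

Bracket notation works for every column name, which is why it is the safer default.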


#### Access single row of a column

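Accessing a single row of a column returns a scalar. Numeric and boolean columns return [Numpy data types][1], while `object` columns return the stored Python object (such as `str`).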

[1]: https://numpy.org/doc/stable/user/basics.types.html#array-types-and-conversions-between-types
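
A short sketch of the scalar types involved, using a small stand-in frame with assumed values. `DataFrame.at` and the NumPy scalar method `item()` are shown as alternatives to chained `[]` access; neither is used in the original notebook.

```python
import pandas as pd

# Small stand-in for the tutorial frame (assumed values, same column names).
people_df = pd.DataFrame(
    {"First Name": ["Alex", "Corey"], "Age": [14.0, 32.0], "Fictional": [True, False]}
)

name = people_df["First Name"][1]   # text column -> plain Python str
age = people_df["Age"][1]           # float64 column -> numpy.float64
same_age = people_df.at[1, "Age"]   # label-based scalar lookup in one step
py_age = age.item()                 # convert the NumPy scalar to a Python float

print(type(name), type(age), type(py_age))
```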

In [9]:
f_name = df["First Name"][2]
age = df["Age"][2]
is_fic = df["Fictional"][2]

print("Data type: ", type(f_name), "Value: ", f_name)
print("Data type: ", type(age), "Value: ", age)
print("Data type: ", type(is_fic), "Value: ", is_fic)

Data type:  <class 'str'> Value:  Corey
Data type:  <class 'numpy.float64'> Value:  32.0
Data type:  <class 'numpy.bool_'> Value:  False



#### Access selected rows of a column - Row Slicing

In [10]:
df["Last Name"][1:4]

1      Rider
2    Schafer
3    Roberts
Name: Last Name, dtype: object


### Access multiple columns in a `DataFrame`

To access multiple columns in a `DataFrame`, pass a list of column names using `[]` notation.

In [11]:
df[["First Name", "Last Name"]]

Unnamed: 0,First Name,Last Name
0,Dheemanth,Bhat
1,Alex,Rider
2,Corey,Schafer
3,Bucky,Roberts
4,John,Oldman
5,M.S,Dhoni



### Valid ways to access columns in `DataFrame`

✅ - Valid  
⚠️ - KeyError  
❌ - SyntaxError

1. ✅ `df["First Name"]`
2. ⚠️ `df["First Name", "Last Name"]`
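3. ✅ `df["First Name" and "Last Name"]` **# Python evaluates the `and` expression to `"Last Name"`, so this is the same as `df["Last Name"]`.**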
4. ⚠️ `df[0]`
5. ⚠️ `df[0, 1, 2]`
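6. ✅ `df[:]` **# A slice selects rows, not columns.**
7. ✅ `df[::-1]` **# Reverses the row order of the DataFrame.**
8. ✅ `df[0:3]` **# First three rows.**
9. ⚠️ `df[0, "First Name"]` **# The key is the tuple `(0, "First Name")`, which pandas looks up as a single column label.**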
10. ⚠️ `df[0, ["First Name"]]`
11. ⚠️ `df[0, "First Name":"Age"]`
12. ❌ `df[0, ["First Name":"Age"]]`
13. ✅ `df[["First Name"]]`
14. ✅ `df[["First Name", "Last Name"]]`
15. ✅ `df[["First Name" and "Last Name"]]` **# Same as `df[["Last Name"]]`.**
16. ⚠️ `df[[0]]`
17. ⚠️ `df[[0, 1, 2]]`
18. ❌ `df[[:]]`
19. ❌ `df[[::-1]]`
20. ❌ `df[[0:3]]`
21. ⚠️ `df[[0, "First Name"]]`
22. ❌ `df[[0, "First Name":"Age"]]`

> Note: A string or a list of strings selects columns in a `DataFrame`. A slice inside `[]` selects rows.
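
Three entries in the list above are easy to misread: the `and` expression, the plain slice, and the comma-separated key. The sketch below reproduces them on a small stand-in frame (assumed values; the name `demo` is illustrative).

```python
import pandas as pd

# Small stand-in for the tutorial frame (assumed values).
demo = pd.DataFrame(
    {
        "First Name": ["Alex", "Corey", "Bucky"],
        "Last Name": ["Rider", "Schafer", "Roberts"],
    }
)

# `and` is evaluated by Python before pandas ever sees the key.
key = "First Name" and "Last Name"
print(key)  # the second operand, because the first one is truthy

# A slice inside [] selects rows by position, not columns.
first_two = demo[0:2]
print(first_two.shape)

# A comma inside [] builds a tuple, which pandas looks up as ONE column label.
tuple_key_failed = False
try:
    demo["First Name", "Last Name"]
except KeyError as err:
    tuple_key_failed = True
    print("KeyError:", err)
```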

#### Get list of column names used in the `DataFrame`.

In [12]:
df.columns

Index(['Code', 'First Name', 'Last Name', 'Country', 'Age', 'AOI', 'Fictional'], dtype='object')


## Fetch rows and columns using `loc` and `iloc` indexers

All of the above techniques access data directly from a `DataFrame` by specifying a column name (or a list of column names) in `[]` (square brackets) notation. To access data by addressing rows (and optionally columns), use the **`indexers`**:

1. `iloc` - Integer Location
2. `loc` - Location

### Fetch rows using `iloc`: Integer Location

`iloc` allows us to access rows in a `DataFrame` using integer location. 

#### Access first row using integer location zero

In [13]:
type(df.iloc[0])

pandas.core.series.Series


In [14]:
df.iloc[0]

Code              IN028
First Name    Dheemanth
Last Name          Bhat
Country           India
Age                28.0
AOI             Codding
Fictional         False
Name: 0, dtype: object


As seen in the above output, a `Series` built from a row (in this case the first row) gets a **string index** made up of the column names.

#### Access multiple rows using list of integers

In [15]:
type(df.iloc[[0, 1, 2]])

pandas.core.frame.DataFrame


In [16]:
df.iloc[[3, 1, 2]]

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
3,US004,Bucky,Roberts,U.S.,37.0,Youtube,False
1,UK007,Alex,Rider,U.K.,14.0,Adventure,True
2,US003,Corey,Schafer,U.S.,32.0,Youtube,False



> Note: Rows are displayed based on the order passed in the list.

In [17]:
df.iloc[[-1]]

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
5,IN100,M.S,Dhoni,India,40.0,Cricket,False



In [18]:
df.iloc[[]]

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional



An empty list of integers returns an empty `DataFrame` that keeps the column names.

#### Access multiple rows using Slicing.

Syntax:

```python
df.iloc[start_row_idx:end_row_idx:offset]
```

> Note:
> 1. `start_row_idx` is optional and inclusive.
> 2. `end_row_idx` is optional and not inclusive.
> 3. `offset` is the slice step; it is optional and defaults to 1.
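
Row slicing with `iloc` follows the same rules as slicing a Python list. The sketch below compares the two on a hypothetical six-row frame (the names `letters` and `frame` are illustrative).

```python
import pandas as pd

# Hypothetical six-row frame, used only to compare the two slicing behaviours.
letters = list("abcdef")
frame = pd.DataFrame({"letter": letters})

rows_2_to_3 = frame.iloc[2:4]["letter"].tolist()     # end position 4 is excluded
every_other = frame.iloc[1::2]["letter"].tolist()    # start at 1, step of 2
reversed_rows = frame.iloc[::-1]["letter"].tolist()  # negative step reverses

print(rows_2_to_3, letters[2:4])
print(every_other, letters[1::2])
print(reversed_rows, letters[::-1])
```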

In [19]:
# Fetch rows starting from 2nd index till 3rd index.
df.iloc[2:4]

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
2,US003,Corey,Schafer,U.S.,32.0,Youtube,False
3,US004,Bucky,Roberts,U.S.,37.0,Youtube,False



In [20]:
# Fetch rows starting from 0th index till 1st index.
df.iloc[:2]

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
0,IN028,Dheemanth,Bhat,India,28.0,Codding,False
1,UK007,Alex,Rider,U.K.,14.0,Adventure,True



In [21]:
# Get rows with odd-numbered indexes.
df.iloc[1::2]

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
1,UK007,Alex,Rider,U.K.,14.0,Adventure,True
3,US004,Bucky,Roberts,U.S.,37.0,Youtube,False
5,IN100,M.S,Dhoni,India,40.0,Cricket,False



In [22]:
# Get all rows in reverse order of their index.
df.iloc[::-1]

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
5,IN100,M.S,Dhoni,India,40.0,Cricket,False
4,XX001,John,Oldman,,,History,True
3,US004,Bucky,Roberts,U.S.,37.0,Youtube,False
2,US003,Corey,Schafer,U.S.,32.0,Youtube,False
1,UK007,Alex,Rider,U.K.,14.0,Adventure,True
0,IN028,Dheemanth,Bhat,India,28.0,Codding,False



### Fetch columns using `iloc`

#### Access first column using integer location zero

In [23]:
type(df.iloc[:, 0])

pandas.core.series.Series


In [24]:
df.iloc[:, 0]

0    IN028
1    UK007
2    US003
3    US004
4    XX001
5    IN100
Name: Code, dtype: object


As seen in the above output, a `Series` built from a column (in this case the first column) keeps the **default integer index** of the `DataFrame`, whereas a `Series` built from a row is indexed by the **column names**.

#### Access multiple columns using list of integers

In [25]:
df.iloc[:, [1, 2, 0]]

Unnamed: 0,First Name,Last Name,Code
0,Dheemanth,Bhat,IN028
1,Alex,Rider,UK007
2,Corey,Schafer,US003
3,Bucky,Roberts,US004
4,John,Oldman,XX001
5,M.S,Dhoni,IN100



> Note: Columns are displayed based on the order passed in the list.

#### Access multiple columns using Slicing.

Syntax:

```python
df.iloc[start_row_idx:end_row_idx:offset, start_col_idx:end_col_idx:offset]
```

> Note:
> 1. `end_row_idx` and `end_col_idx` are optional and not inclusive.
> 2. `offset` is the slice step; it is optional and defaults to 1.
> 3. The index **is not counted as a column**, so column position 0 is the first data column.
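
A minimal sketch of combined row and column slicing, on a hypothetical frame with a string index so that positions and labels cannot be confused (the name `grid` and its values are illustrative).

```python
import pandas as pd

# Hypothetical 3x4 frame with string row labels.
grid = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [10, 11, 12]},
    index=["r0", "r1", "r2"],
)

block = grid.iloc[0:2, 1:3]  # rows at positions 0-1, columns at positions 1-2
first_col = grid.iloc[:, 0]  # position 0 is column "A"; the index is not counted

print(block)
print(first_col.name)
```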

In [26]:
print("All column names for reference:", df.columns)

# Fetch all rows but columns starting from 2nd index till 3rd index.
df.iloc[:, 2:4]

All column names for reference: Index(['Code', 'First Name', 'Last Name', 'Country', 'Age', 'AOI', 'Fictional'], dtype='object')


Unnamed: 0,Last Name,Country
0,Bhat,India
1,Rider,U.K.
2,Schafer,U.S.
3,Roberts,U.S.
4,Oldman,
5,Dhoni,India



In [27]:
# Fetch all rows but columns starting from 0th index till 2nd index.
df.iloc[:, :3]

Unnamed: 0,Code,First Name,Last Name
0,IN028,Dheemanth,Bhat
1,UK007,Alex,Rider
2,US003,Corey,Schafer
3,US004,Bucky,Roberts
4,XX001,John,Oldman
5,IN100,M.S,Dhoni



In [28]:
print("All column names for reference:", df.columns)

# Fetch all rows but only odd-indexed columns (positions 1, 3 and 5).
df.iloc[:, 1::2]

All column names for reference: Index(['Code', 'First Name', 'Last Name', 'Country', 'Age', 'AOI', 'Fictional'], dtype='object')


Unnamed: 0,First Name,Country,AOI
0,Dheemanth,India,Codding
1,Alex,U.K.,Adventure
2,Corey,U.S.,Youtube
3,Bucky,U.S.,Youtube
4,John,,History
5,M.S,India,Cricket



In [29]:
# Fetch all rows and columns in reverse order of their indexes.
df.iloc[::-1, ::-1]

Unnamed: 0,Fictional,AOI,Age,Country,Last Name,First Name,Code
5,False,Cricket,40.0,India,Dhoni,M.S,IN100
4,True,History,,,Oldman,John,XX001
3,False,Youtube,37.0,U.S.,Roberts,Bucky,US004
2,False,Youtube,32.0,U.S.,Schafer,Corey,US003
1,True,Adventure,14.0,U.K.,Rider,Alex,UK007
0,False,Codding,28.0,India,Bhat,Dheemanth,IN028



### Fetch rows and columns using `loc`: Location

1. `iloc` uses **integer locations** (positions) to fetch rows and columns, whereas `loc` uses **labels**.
2. _Labels_ are different from _integer locations_, so some of the techniques used with `iloc` do not work with `loc` and raise a `KeyError` instead.
3. _Labels_ can be strings or integers.
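
With the default index, labels and positions happen to coincide, which hides the difference. The sketch below uses a hypothetical frame whose integer labels differ from the row positions (the name `items` and its values are illustrative).

```python
import pandas as pd

# Hypothetical frame whose integer labels differ from the row positions.
items = pd.DataFrame({"item": ["pen", "ink", "pad"]}, index=[10, 20, 30])

by_position = items.iloc[0, 0]   # first row, whatever its label is
by_label = items.loc[10, "item"] # the row whose label is 10

try:
    items.loc[0]                 # no row carries the label 0
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(by_position, by_label, label_zero_found)
```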

#### Access first row using label zero

In [30]:
df.loc[0]

Code              IN028
First Name    Dheemanth
Last Name          Bhat
Country           India
Age                28.0
AOI             Codding
Fictional         False
Name: 0, dtype: object


#### Access multiple rows using list of labels

In [31]:
df.loc[[3, 1, 2]]

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
3,US004,Bucky,Roberts,U.S.,37.0,Youtube,False
1,UK007,Alex,Rider,U.K.,14.0,Adventure,True
2,US003,Corey,Schafer,U.S.,32.0,Youtube,False



> Note: Rows are displayed based on the order passed in the list.

Negative integers cannot be used to count rows from the end, because any number passed to `loc` is treated as a _label_ and must match a value in the index exactly.

In [32]:
try:
    print("Default index:", df.index)
    df.loc[[-1]]  # Does not work. Throws error.

except KeyError as ke:
    print(ke)

Default index: RangeIndex(start=0, stop=6, step=1)
"None of [Int64Index([-1], dtype='int64')] are in the [index]"



In [33]:
df.loc[[]]

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional



An empty list of labels returns an empty `DataFrame`.

#### Access multiple rows using Slicing.

Syntax:

```python
df.loc[start_row_label:end_row_label:offset]
```

> Note:
> 1. `end_row_label` **is inclusive** and optional, whereas in `iloc` the end is not inclusive.
> 2. `offset` is the slice step; it is optional and defaults to 1.
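
The inclusive/exclusive difference is easiest to see when the same slice is passed to both indexers. A minimal sketch on a hypothetical five-row frame (the name `nums` and its values are illustrative):

```python
import pandas as pd

# Hypothetical frame with the default index 0..4.
nums = pd.DataFrame({"value": [100, 101, 102, 103, 104]})

label_slice = nums.loc[1:3]      # labels 1, 2 and 3: the end label is included
position_slice = nums.iloc[1:3]  # positions 1 and 2: the end position is excluded

print(label_slice["value"].tolist())
print(position_slice["value"].tolist())
```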

In [34]:
# Fetch rows starting from 2nd index till 4th index.
df.loc[2:4]

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
2,US003,Corey,Schafer,U.S.,32.0,Youtube,False
3,US004,Bucky,Roberts,U.S.,37.0,Youtube,False
4,XX001,John,Oldman,,,History,True



In [35]:
# Fetch rows starting from 0th index till 2nd index.
df.loc[:2]

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
0,IN028,Dheemanth,Bhat,India,28.0,Codding,False
1,UK007,Alex,Rider,U.K.,14.0,Adventure,True
2,US003,Corey,Schafer,U.S.,32.0,Youtube,False



In [36]:
# Get rows with odd-numbered indexes.
df.loc[1::2]

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
1,UK007,Alex,Rider,U.K.,14.0,Adventure,True
3,US004,Bucky,Roberts,U.S.,37.0,Youtube,False
5,IN100,M.S,Dhoni,India,40.0,Cricket,False



In [37]:
# Get all rows in reverse order of their index.
df.loc[::-1]

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
5,IN100,M.S,Dhoni,India,40.0,Cricket,False
4,XX001,John,Oldman,,,History,True
3,US004,Bucky,Roberts,U.S.,37.0,Youtube,False
2,US003,Corey,Schafer,U.S.,32.0,Youtube,False
1,UK007,Alex,Rider,U.K.,14.0,Adventure,True
0,IN028,Dheemanth,Bhat,India,28.0,Codding,False



#### Access first column using integer label zero

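With `loc`, integers work only when they are actual column labels. The columns of this `DataFrame` are labelled with strings, so the integer `0` raises a `KeyError`.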

In [38]:
try:
    df.loc[:, 0]  # Does not work. Throws error.

except KeyError as ke:
    print(ke)

0



To access a single column of this `DataFrame` with `loc`, use the column name.

In [39]:
df.loc[:, "AOI"]

0      Codding
1    Adventure
2      Youtube
3      Youtube
4      History
5      Cricket
Name: AOI, dtype: object
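
The `KeyError` for `df.loc[:, 0]` comes from the column labels of `df` being strings. As a contrast, the sketch below builds a hypothetical frame without column names, so pandas labels the columns with integers and `loc[:, 0]` becomes a valid label lookup (the name `unnamed` is illustrative).

```python
import pandas as pd

# Hypothetical frame built without column names: the columns are labelled 0 and 1.
unnamed = pd.DataFrame([["IN028", 28], ["UK007", 14]])

codes = unnamed.loc[:, 0]  # works: 0 is a real column label here

print(list(unnamed.columns))
print(codes.tolist())
```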


#### Access multiple columns using list of _labels_ or column names

In [40]:
# Equivalent of `df.iloc[:, [1, 2, 0]]` in `loc`.
print(f"Columns: {[f'{index}: {column}' for index, column in enumerate(df.columns)]}")

df.loc[:, ["First Name", "Last Name", "Code"]]

Columns: ['0: Code', '1: First Name', '2: Last Name', '3: Country', '4: Age', '5: AOI', '6: Fictional']


Unnamed: 0,First Name,Last Name,Code
0,Dheemanth,Bhat,IN028
1,Alex,Rider,UK007
2,Corey,Schafer,US003
3,Bucky,Roberts,US004
4,John,Oldman,XX001
5,M.S,Dhoni,IN100



> Note: Columns are displayed based on the order passed in the list.

#### Access multiple columns using Slicing.

Syntax:

```python
df.loc[start_row_lbl:end_row_lbl:offset, start_col_lbl:end_col_lbl:offset]
```

> Note:
> 1. `end_row_lbl` and `end_col_lbl` **are inclusive** and optional.
> 2. `offset` is the slice step; it is an optional integer and defaults to 1.
> 3. The index **is not counted as a column**; column slices use column labels only.
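
To translate a label slice into the matching positional slice, the position of each label can be looked up with `Index.get_loc`. The sketch below uses a one-row stand-in frame with the tutorial's column names; the variable names are illustrative, and `get_loc` returns a plain integer here because the column labels are unique.

```python
import pandas as pd

# One-row stand-in frame that reuses the tutorial's column names.
cols = ["Code", "First Name", "Last Name", "Country", "Age"]
frame = pd.DataFrame([["IN028", "Dheemanth", "Bhat", "India", 28]], columns=cols)

start = frame.columns.get_loc("Last Name")  # integer position of the start label
stop = frame.columns.get_loc("Age")         # integer position of the end label

by_label = frame.loc[:, "Last Name":"Age"]
by_position = frame.iloc[:, start : stop + 1]  # +1 because iloc excludes the end

print(start, stop)
print(list(by_label.columns), list(by_position.columns))
```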

In [41]:
# Fetch all rows, and the columns at positions 2 to 4 (both ends included).
# Equivalent of `df.iloc[:, 2:5]`, because `loc` includes the end label.
print(f"2nd Column:'{df.columns[2]}', 4th Column: '{df.columns[4]}'")

df.loc[:, "Last Name":"Age"]

2nd Column:'Last Name', 4th Column: 'Age'


Unnamed: 0,Last Name,Country,Age
0,Bhat,India,28.0
1,Rider,U.K.,14.0
2,Schafer,U.S.,32.0
3,Roberts,U.S.,37.0
4,Oldman,,
5,Dhoni,India,40.0



In [42]:
# Fetch all rows, and the columns from the first one up to position 3 (included).
# Equivalent of `df.iloc[:, :4]`, because `loc` includes the end label.
print("3rd Column:", df.columns[3])

df.loc[:, :"Country"]

3rd Column: Country


Unnamed: 0,Code,First Name,Last Name,Country
0,IN028,Dheemanth,Bhat,India
1,UK007,Alex,Rider,U.K.
2,US003,Corey,Schafer,U.S.
3,US004,Bucky,Roberts,U.S.
4,XX001,John,Oldman,
5,IN100,M.S,Dhoni,India



In [43]:
# Fetch all rows and columns in reverse order of their indexes.
df.loc[::-1, ::-1]

Unnamed: 0,Fictional,AOI,Age,Country,Last Name,First Name,Code
5,False,Cricket,40.0,India,Dhoni,M.S,IN100
4,True,History,,,Oldman,John,XX001
3,False,Youtube,37.0,U.S.,Roberts,Bucky,US004
2,False,Youtube,32.0,U.S.,Schafer,Corey,US003
1,True,Adventure,14.0,U.K.,Rider,Alex,UK007
0,False,Codding,28.0,India,Bhat,Dheemanth,IN028



> Note:
> 1. In `iloc`, both rows and columns are accessed by **integer position**.
> 2. In `loc`, rows are accessed with **integer labels for the default index** and **string labels for a custom string index**.
> 3. In `loc`, columns are accessed by **column label**, which is a string in this `DataFrame`.

## Custom Indexes

Every `DataFrame` has an _index_, displayed at the far left without a column name. By default it holds **integers starting from zero, incremented by 1** up to the last row, and each integer acts as the label of one row in the `DataFrame`, for example:

```python
df.loc[3]  # Returns 4th row in the DataFrame
```
 
To replace the default index, call `set_index()` and pass the name of the column that should act as the new index. Syntax:
```python
df.set_index("col_name", inplace=True)  # False is default
```

#### Default index

In [44]:
df.index

RangeIndex(start=0, stop=6, step=1)


Set **_Code_** column as new _index_.

In [45]:
df.set_index("Code")

Code,First Name,Last Name,Country,Age,AOI,Fictional
IN028,Dheemanth,Bhat,India,28.0,Codding,False
UK007,Alex,Rider,U.K.,14.0,Adventure,True
US003,Corey,Schafer,U.S.,32.0,Youtube,False
US004,Bucky,Roberts,U.S.,37.0,Youtube,False
XX001,John,Oldman,,,History,True
IN100,M.S,Dhoni,India,40.0,Cricket,False



As seen in the above output, the _index_ now has the name **`Code`**, and its values are displayed in bold like the default _index_.

In [46]:
df  # Index is not updated

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
0,IN028,Dheemanth,Bhat,India,28.0,Codding,False
1,UK007,Alex,Rider,U.K.,14.0,Adventure,True
2,US003,Corey,Schafer,U.S.,32.0,Youtube,False
3,US004,Bucky,Roberts,U.S.,37.0,Youtube,False
4,XX001,John,Oldman,,,History,True
5,IN100,M.S,Dhoni,India,40.0,Cricket,False



> **Note: `set_index()` returns a new `DataFrame`. Unless `inplace` is set to `True` (or the result is assigned back), `df` itself is not changed.**
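
Assigning the result back is an alternative to `inplace=True`. The sketch below shows it on a small stand-in frame (assumed values; the name `staff` is illustrative and not part of the notebook).

```python
import pandas as pd

# Small stand-in for the tutorial frame (assumed values).
staff = pd.DataFrame({"Code": ["IN028", "UK007"], "Age": [28, 14]})

indexed = staff.set_index("Code")     # a new DataFrame; `staff` is untouched
columns_before = list(staff.columns)  # still contains "Code"

staff = staff.set_index("Code")       # assigning the result back keeps the change

print(columns_before)
print(list(staff.columns), staff.index.name)
```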

In [47]:
df.set_index("Code", inplace=True)
df

Code,First Name,Last Name,Country,Age,AOI,Fictional
IN028,Dheemanth,Bhat,India,28.0,Codding,False
UK007,Alex,Rider,U.K.,14.0,Adventure,True
US003,Corey,Schafer,U.S.,32.0,Youtube,False
US004,Bucky,Roberts,U.S.,37.0,Youtube,False
XX001,John,Oldman,,,History,True
IN100,M.S,Dhoni,India,40.0,Cricket,False



### Custom index effect on `loc` and `iloc`

#### `iloc`

`iloc` is not affected by the custom index: **the `iloc` indexer always works with integer positions**, so labels such as `"US004"` are rejected.

In [48]:
try:
    df.iloc["US004", 0]

except ValueError as te:
    print("ERROR:", te)

finally:
    print("\nFirst column of row at index 3:", df.iloc[3, 0])

ERROR: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

First column of row at index 3: Bucky



#### `loc`

After replacing the default numerical index with a custom index, the old integers no longer act as row _labels_. **The `loc` indexer works only with the labels of the current (custom) index**.

In [49]:
try:
    df.loc[5, "AOI"]

except KeyError as ke:
    print("KeyError:", ke)

finally:
    print("\n'AOI' of row at index 5:", df.loc["IN100", "AOI"])

KeyError: 5

'AOI' of row at index 5: Cricket
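
When a row is known by its label but has to be addressed with `iloc`, one option is to look up the label's position with `Index.get_loc` first. A minimal sketch on a stand-in frame (assumed values; the name `staff` is illustrative):

```python
import pandas as pd

# Stand-in frame with a custom string index named "Code".
staff = pd.DataFrame(
    {"First Name": ["Alex", "Corey", "Bucky"]},
    index=pd.Index(["UK007", "US003", "US004"], name="Code"),
)

position = staff.index.get_loc("US004")     # label -> integer position
via_iloc = staff.iloc[position, 0]          # positional access
via_loc = staff.loc["US004", "First Name"]  # label access to the same cell

print(position, via_iloc, via_loc)
```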



### Why modify default index with a custom index?

With the `loc` indexer, columns are addressed by strings, i.e., column names, whereas rows have to be addressed by the integers of the _default index_, which act as _labels_. After replacing the default _index_ with a custom index using `set_index()`, the _rows_ of a `DataFrame` can also be accessed with string _labels_, just like columns.

```python
# Before `set_index()`
df.loc[4, ["First Name", "Last Name"]]

# After `set_index()`
df.loc["XX001", ["First Name", "Last Name"]]
```

In [50]:
df.loc["XX001"]

First Name       John
Last Name      Oldman
Country          None
Age               NaN
AOI           History
Fictional        True
Name: XX001, dtype: object


#### Custom index in `DataFrame`

In [51]:
df.loc[["US003", "US004"]]

Code,First Name,Last Name,Country,Age,AOI,Fictional
US003,Corey,Schafer,U.S.,32.0,Youtube,False
US004,Bucky,Roberts,U.S.,37.0,Youtube,False



In [52]:
df.loc["UK007":"XX001"]

Code,First Name,Last Name,Country,Age,AOI,Fictional
UK007,Alex,Rider,U.K.,14.0,Adventure,True
US003,Corey,Schafer,U.S.,32.0,Youtube,False
US004,Bucky,Roberts,U.S.,37.0,Youtube,False
XX001,John,Oldman,,,History,True



#### Custom Index in `Series`

In [53]:
df.loc["UK007":"XX001", "Fictional"]

Code
UK007     True
US003    False
US004    False
XX001     True
Name: Fictional, dtype: bool


In [54]:
df["Age"]

Code
IN028    28.0
UK007    14.0
US003    32.0
US004    37.0
XX001     NaN
IN100    40.0
Name: Age, dtype: float64


> Note: Default index is replaced with custom index - `Code` - for both `DataFrame` and `Series`.

### How to revert from custom index to default integer index?

In [55]:
df.reset_index(inplace=True)
df

Unnamed: 0,Code,First Name,Last Name,Country,Age,AOI,Fictional
0,IN028,Dheemanth,Bhat,India,28.0,Codding,False
1,UK007,Alex,Rider,U.K.,14.0,Adventure,True
2,US003,Corey,Schafer,U.S.,32.0,Youtube,False
3,US004,Bucky,Roberts,U.S.,37.0,Youtube,False
4,XX001,John,Oldman,,,History,True
5,IN100,M.S,Dhoni,India,40.0,Cricket,False



> Note:
> 1. The index is not reverted permanently if **`inplace`** is not set to `True`.
> 2. Run `reset_index` only once: a second call moves the default integer index into a new column named `index`.
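
The effect of calling `reset_index` twice can be seen on a small stand-in frame (assumed values; the names below are illustrative). The `drop=True` parameter, which the notebook does not use, discards the old index instead of turning it into a column.

```python
import pandas as pd

# Stand-in frame that already uses "Code" as its custom index.
staff = pd.DataFrame({"Code": ["IN028", "UK007"], "Age": [28, 14]}).set_index("Code")

once = staff.reset_index()             # "Code" moves back to a regular column
twice = once.reset_index()             # the 0..n-1 index becomes a column named "index"
dropped = once.reset_index(drop=True)  # drop=True discards the old index instead

print(list(once.columns))
print(list(twice.columns))
print(list(dropped.columns))
```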

### Sorting Index column

Syntax:
```python
df.sort_index(inplace=True, ascending=False)  # inplace: False is default, ascending: True is default 
```

Set **_Code_** column as custom index.

In [56]:
df.set_index("Code", inplace=True)


Sorting index column in **ascending** order.

In [57]:
df.sort_index()  # Sorted indexes are not preserved unless `inplace` is set to `True`.

Code,First Name,Last Name,Country,Age,AOI,Fictional
IN028,Dheemanth,Bhat,India,28.0,Codding,False
IN100,M.S,Dhoni,India,40.0,Cricket,False
UK007,Alex,Rider,U.K.,14.0,Adventure,True
US003,Corey,Schafer,U.S.,32.0,Youtube,False
US004,Bucky,Roberts,U.S.,37.0,Youtube,False
XX001,John,Oldman,,,History,True



Sorting index column in **descending** order.

In [58]:
df.sort_index(ascending=False)  # Sorted indexes are not preserved unless `inplace` is set to `True`.

Code,First Name,Last Name,Country,Age,AOI,Fictional
XX001,John,Oldman,,,History,True
US004,Bucky,Roberts,U.S.,37.0,Youtube,False
US003,Corey,Schafer,U.S.,32.0,Youtube,False
UK007,Alex,Rider,U.K.,14.0,Adventure,True
IN100,M.S,Dhoni,India,40.0,Cricket,False
IN028,Dheemanth,Bhat,India,28.0,Codding,False



## Filters