<a href="https://colab.research.google.com/github/adhang/data-science-digitalskola/blob/update/08.%20Advanced%20Pandas%20Dataframe/Learn%20-%20Advanced%20Pandas%20Dataframe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Dataframe
Author: Adhang Muntaha Muhammad

[![LinkedIn](https://img.shields.io/badge/linkedin-0077B5?style=for-the-badge&logo=linkedin&logoColor=white&link=https://www.linkedin.com/in/adhangmuntaha/)](https://www.linkedin.com/in/adhangmuntaha/)
[![GitHub](https://img.shields.io/badge/github-121011?style=for-the-badge&logo=github&logoColor=white&link=https://github.com/adhang)](https://github.com/adhang)
[![Kaggle](https://img.shields.io/badge/kaggle-20BEFF?style=for-the-badge&logo=kaggle&logoColor=white&link=https://www.kaggle.com/adhang)](https://www.kaggle.com/adhang)
[![Tableau](https://img.shields.io/badge/tableau-E97627?style=for-the-badge&logo=tableau&logoColor=white&link=https://public.tableau.com/app/profile/adhang)](https://public.tableau.com/app/profile/adhang)
___
**Contents**
- Indexing Dataframe
- Dropping Columns and Rows
- Joining Dataframes
- Concatenating Dataframes
- Appending Dataframes
- Merge vs Join vs Concat vs Append
- Pivot Table
- Melting Dataframes
- Lambda Function on Dataframes

# Importing Libraries

In [1]:
import pandas as pd

# Reading Dataset
For this notebook, I will use my mentor's dataset from GitHub. You can check out his work [here](https://github.com/densaiko).

In [2]:
file_path = 'https://raw.githubusercontent.com/densaiko/data_science_learning/main/dataset/insurance.csv'

data = pd.read_csv(file_path)
data.head()

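|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.9 | 0 | yes | southwest | 16884.924 |
| 1 | 18 | male | 33.77 | 1 | no | southeast | 1725.5523 |
| 2 | 28 | male | 33.0 | 3 | no | southeast | 4449.462 |
| 3 | 33 | male | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 32 | male | 28.88 | 0 | no | northwest | 3866.8552 |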


# Indexing Dataframe

## Default Index
An index identifies each row (record). By default, pandas assigns a `RangeIndex` that runs from 0 to n-1, where n is the number of rows.

For example, let's see the dataset size using `shape`.

In [3]:
data.shape

(1338, 7)

As we can see, there are 1338 rows and 7 columns. Let's see the first 5 rows.

In [4]:
data.head()

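|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.9 | 0 | yes | southwest | 16884.924 |
| 1 | 18 | male | 33.77 | 1 | no | southeast | 1725.5523 |
| 2 | 28 | male | 33.0 | 3 | no | southeast | 4449.462 |
| 3 | 33 | male | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 32 | male | 28.88 | 0 | no | northwest | 3866.8552 |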


From this output, we can see that the index starts at 0 (the first five labels are 0 to 4). What about the last index?

In [5]:
data.tail()

|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 1333 | 50 | male | 30.97 | 3 | no | northwest | 10600.5483 |
| 1334 | 18 | female | 31.92 | 0 | no | northeast | 2205.9808 |
| 1335 | 18 | female | 36.85 | 0 | no | southeast | 1629.8335 |
| 1336 | 21 | female | 25.8 | 0 | no | southwest | 2007.945 |
| 1337 | 61 | female | 29.07 | 0 | yes | northwest | 29141.3603 |


Now, we know the last 5 rows. The last row has 1337 as the index. So, an index ranging from 0 to 1337 means it has 1338 rows.
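To see what the default index actually is, here is a minimal sketch on a tiny made-up dataframe (the values are illustrative, not the insurance data). Since no index is specified, pandas creates a `RangeIndex` whose last label is always `len(df) - 1`.

```python
import pandas as pd

# A tiny, made-up dataframe (not the insurance data)
df = pd.DataFrame({"age": [19, 18, 28], "sex": ["female", "male", "male"]})

# No index was specified, so pandas assigns a RangeIndex from 0 to n - 1
print(df.index)      # RangeIndex(start=0, stop=3, step=1)
print(df.index[-1])  # 2, i.e. len(df) - 1
```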

## Column as Index
Can we use a column as the index instead? Absolutely. The `set_index()` method does exactly this.

Let's say we will use the `sex` column as the index.

In [6]:
data.set_index('sex')

| sex | age | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|
| female | 19 | 27.900 | 0 | yes | southwest | 16884.92400 |
| male | 18 | 33.770 | 1 | no | southeast | 1725.55230 |
| male | 28 | 33.000 | 3 | no | southeast | 4449.46200 |
| male | 33 | 22.705 | 0 | no | northwest | 21984.47061 |
| male | 32 | 28.880 | 0 | no | northwest | 3866.85520 |
| ... | ... | ... | ... | ... | ... | ... |
| male | 50 | 30.970 | 3 | no | northwest | 10600.54830 |
| female | 18 | 31.920 | 0 | no | northeast | 2205.98080 |
| female | 18 | 36.850 | 0 | no | southeast | 1629.83350 |
| female | 21 | 25.800 | 0 | no | southwest | 2007.94500 |
| female | 61 | 29.070 | 0 | yes | northwest | 29141.36030 |


Note that `set_index()` only returns a new dataframe for display; `data` itself is unchanged. To keep the result, either pass `inplace=True` or assign the returned dataframe to a variable (a new one or the same one). Pick one of these options:

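```python
# Option 1: modify the original dataframe in place (the call returns None)
data.set_index('sex', inplace=True)

# Option 2: assign the result to a new variable (data is left unchanged)
new_data = data.set_index('sex')

# Option 3: overwrite the same variable
data = data.set_index('sex')
```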

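The three options above are alternatives, not steps to run in sequence: after any one of them, `sex` is the index and no longer a column. A minimal sketch on a made-up two-row dataframe shows both points, that `set_index()` leaves the original untouched and that setting the same column again fails.

```python
import pandas as pd

# Made-up two-row dataframe for illustration
df = pd.DataFrame({"sex": ["female", "male"], "age": [19, 18]})

# set_index() returns a NEW dataframe; df keeps its 'sex' column
indexed = df.set_index("sex")
print(df.columns.tolist())       # ['sex', 'age']
print(indexed.columns.tolist())  # ['age']
print(indexed.index.name)        # sex

# 'sex' is no longer a column of `indexed`, so setting it again fails
try:
    indexed.set_index("sex")
except KeyError as err:
    print("KeyError:", err)
```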


## Multiple Index
We can set multiple columns as the index. It will create a multi-index.

In [7]:
data.set_index(['sex','smoker'])

| sex | smoker | age | bmi | children | region | charges |
|---|---|---|---|---|---|---|
| female | yes | 19 | 27.900 | 0 | southwest | 16884.92400 |
| male | no | 18 | 33.770 | 1 | southeast | 1725.55230 |
| male | no | 28 | 33.000 | 3 | southeast | 4449.46200 |
| male | no | 33 | 22.705 | 0 | northwest | 21984.47061 |
| male | no | 32 | 28.880 | 0 | northwest | 3866.85520 |
| ... | ... | ... | ... | ... | ... | ... |
| male | no | 50 | 30.970 | 3 | northwest | 10600.54830 |
| female | no | 18 | 31.920 | 0 | northeast | 2205.98080 |
| female | no | 18 | 36.850 | 0 | southeast | 1629.83350 |
| female | no | 21 | 25.800 | 0 | southwest | 2007.94500 |
| female | yes | 61 | 29.070 | 0 | northwest | 29141.36030 |


Here, the `sex` and `smoker` are set as the index.
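A MultiIndex has one level per column passed to `set_index()`, and rows can be selected with a tuple that gives a value for each level. The sketch below uses a small made-up dataframe, not the insurance data.

```python
import pandas as pd

# Made-up rows for illustration
df = pd.DataFrame({
    "sex": ["female", "male", "male"],
    "smoker": ["yes", "no", "no"],
    "charges": [16884.924, 1725.5523, 4449.462],
})

multi = df.set_index(["sex", "smoker"])
print(multi.index.nlevels)       # 2
print(list(multi.index.names))   # ['sex', 'smoker']

# Select every row whose (sex, smoker) pair is ('male', 'no')
print(multi.loc[("male", "no")])
```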

## Reset Index
Let's say our dataframe has a confusing (or unordered) index. We can reset it using `reset_index()`.

To demonstrate this, I will create a new dataframe using random sampling. So, if you re-run this notebook, you may get a different result.
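If you want the same sample on every run, `sample()` accepts a `random_state` seed. A minimal sketch on a made-up dataframe (the seed `42` is an arbitrary choice):

```python
import pandas as pd

# Made-up dataframe with ten rows
df = pd.DataFrame({"age": range(20, 30)})

# The same random_state always draws the same rows
first = df.sample(n=5, random_state=42)
second = df.sample(n=5, random_state=42)
print(first.index.equals(second.index))  # True
```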

In [8]:
data_5 = data.sample(n=5)
data_5

|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 575 | 58 | female | 27.17 | 0 | no | northwest | 12222.8983 |
| 1186 | 20 | male | 35.625 | 3 | yes | northwest | 37465.34375 |
| 403 | 49 | male | 32.3 | 3 | no | northwest | 10269.46 |
| 300 | 36 | male | 27.55 | 3 | no | northeast | 6746.7425 |
| 536 | 33 | female | 38.9 | 3 | no | southwest | 5972.378 |


This dataframe has an unordered (messy?) index. Let's reset it.

In [9]:
data_5.reset_index()

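|   | index | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|---|
| 0 | 575 | 58 | female | 27.17 | 0 | no | northwest | 12222.8983 |
| 1 | 1186 | 20 | male | 35.625 | 3 | yes | northwest | 37465.34375 |
| 2 | 403 | 49 | male | 32.3 | 3 | no | northwest | 10269.46 |
| 3 | 300 | 36 | male | 27.55 | 3 | no | northeast | 6746.7425 |
| 4 | 536 | 33 | female | 38.9 | 3 | no | southwest | 5972.378 |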


We can see that the index has been reset. But wait: there's a new column named `index` that holds the previous index. To reset the index without creating this column, pass `drop=True`.

In [10]:
data_5.reset_index(drop=True)

|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 0 | 58 | female | 27.17 | 0 | no | northwest | 12222.8983 |
| 1 | 20 | male | 35.625 | 3 | yes | northwest | 37465.34375 |
| 2 | 49 | male | 32.3 | 3 | no | northwest | 10269.46 |
| 3 | 36 | male | 27.55 | 3 | no | northeast | 6746.7425 |
| 4 | 33 | female | 38.9 | 3 | no | southwest | 5972.378 |


Now, our dataframe contains a nice index, right?
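To compare the two variants side by side, here is a minimal sketch on a made-up dataframe whose index looks like the sampled one above (the values are illustrative):

```python
import pandas as pd

# Made-up dataframe with an unordered index, like data_5 above
df = pd.DataFrame({"age": [58, 20, 49]}, index=[575, 1186, 403])

kept = df.reset_index()              # old labels move into a new 'index' column
dropped = df.reset_index(drop=True)  # old labels are discarded

print(kept.columns.tolist())     # ['index', 'age']
print(dropped.columns.tolist())  # ['age']
print(dropped.index.tolist())    # [0, 1, 2]
```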

# Dropping Columns and Rows
Sometimes, for some reason, we have to delete or remove some columns or rows from our dataset.

In the previous notebook (Intermediate Pandas Dataframe), I covered several ways to select columns or rows, such as `loc` and `iloc`. In some cases, though, dropping a few columns is easier than selecting all the others.

## Dropping Columns
Let's say we want to keep all columns except `charges`. To drop a column, don't forget to pass `axis=1`, since columns are axis 1 (rows are axis 0).

In [11]:
drop_col_1 = data.drop('charges', axis=1)
drop_col_1.head()

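|   | age | sex | bmi | children | smoker | region |
|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.9 | 0 | yes | southwest |
| 1 | 18 | male | 33.77 | 1 | no | southeast |
| 2 | 28 | male | 33.0 | 3 | no | southeast |
| 3 | 33 | male | 22.705 | 0 | no | northwest |
| 4 | 32 | male | 28.88 | 0 | no | northwest |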


To drop multiple columns, we can use a list containing the column names.

In [12]:
drop_col_2 = data.drop(['age','sex'], axis=1)
drop_col_2.head()

|   | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|
| 0 | 27.9 | 0 | yes | southwest | 16884.924 |
| 1 | 33.77 | 1 | no | southeast | 1725.5523 |
| 2 | 33.0 | 3 | no | southeast | 4449.462 |
| 3 | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 28.88 | 0 | no | northwest | 3866.8552 |

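Instead of combining labels with `axis=1`, `drop()` also accepts a `columns=` keyword, which some find easier to read. A minimal sketch on a made-up dataframe showing that both forms give the same result:

```python
import pandas as pd

# Made-up dataframe for illustration
df = pd.DataFrame({
    "age": [19, 18],
    "sex": ["female", "male"],
    "charges": [16884.924, 1725.5523],
})

# columns=[...] is a shorthand for passing the labels together with axis=1
a = df.drop(["age", "sex"], axis=1)
b = df.drop(columns=["age", "sex"])
print(a.equals(b))          # True
print(df.columns.tolist())  # ['age', 'sex', 'charges'] -- df itself is unchanged
```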

## Dropping Rows
To drop a row, we pass its index label (not its position) and set the `axis` parameter to 0.

For example, let's drop the row whose index label is `1`.

In [13]:
drop_row = data.drop(1, axis=0)
drop_row.head()

|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.9 | 0 | yes | southwest | 16884.924 |
| 2 | 28 | male | 33.0 | 3 | no | southeast | 4449.462 |
| 3 | 33 | male | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 32 | male | 28.88 | 0 | no | northwest | 3866.8552 |
| 5 | 31 | female | 25.74 | 0 | no | southeast | 3756.6216 |


Now, the row that has index `1` has been removed.
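Here label `1` happens to be the second row, but that is only because `data` still has its default index. On a dataframe whose labels are out of order (for example after `sample()`), the difference matters. A minimal sketch with made-up values; to drop by position, look the label up first with `df.index[pos]`:

```python
import pandas as pd

# Made-up dataframe whose index labels are not in positional order
df = pd.DataFrame({"age": [58, 20, 49]}, index=[2, 0, 1])

by_label = df.drop(1, axis=0)               # removes the row LABELLED 1 (age 49)
by_position = df.drop(df.index[1], axis=0)  # removes the SECOND row (label 0, age 20)

print(by_label["age"].tolist())     # [58, 20]
print(by_position["age"].tolist())  # [58, 49]
```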

What if several rows share the same index label? Let's say the `sex` column is set as the index, and then we drop the label `female`. What will happen?

In [14]:
data_sex = data.set_index('sex')
data_sex.head()

| sex | age | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|
| female | 19 | 27.9 | 0 | yes | southwest | 16884.924 |
| male | 18 | 33.77 | 1 | no | southeast | 1725.5523 |
| male | 28 | 33.0 | 3 | no | southeast | 4449.462 |
| male | 33 | 22.705 | 0 | no | northwest | 21984.47061 |
| male | 32 | 28.88 | 0 | no | northwest | 3866.8552 |


In [15]:
data_sex_drop = data_sex.drop('female', axis=0)
data_sex_drop.head()

| sex | age | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|
| male | 18 | 33.77 | 1 | no | southeast | 1725.5523 |
| male | 28 | 33.0 | 3 | no | southeast | 4449.462 |
| male | 33 | 22.705 | 0 | no | northwest | 21984.47061 |
| male | 32 | 28.88 | 0 | no | northwest | 3866.8552 |
| male | 37 | 29.83 | 2 | no | northeast | 6406.4107 |


As you can see, all the rows that have `female` as the index are removed.
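A minimal sketch on a made-up dataframe confirms that every row carrying the label is removed, not just the first one. It also shows the `errors` parameter: by default a missing label raises a `KeyError`, while `errors='ignore'` skips it.

```python
import pandas as pd

# Made-up dataframe indexed by a repeated label
df = pd.DataFrame({
    "sex": ["female", "male", "male", "female"],
    "age": [19, 18, 28, 33],
}).set_index("sex")

# Both 'female' rows are removed
males = df.drop("female", axis=0)
print(len(males))  # 2

# A label that does not exist is skipped instead of raising KeyError
unchanged = df.drop("other", axis=0, errors="ignore")
print(len(unchanged))  # 4
```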

# Joining Dataframes
In my previous notebook (Intermediate Pandas Dataframe), we combined several tables column-wise (putting both tables' columns side by side) using the `merge()` method.
<br><br>
Pandas has another joining method named `join()`. It does the same kind of job as `merge()`, but by default `join()` matches rows on their index labels (not on their row positions) and performs a left join (`how='left'`), keeping every row of the left dataframe.
<br><br>
For example, I will create 2 new dataframes. The first one contains 10 records using random sampling. And the second one is half of the first dataframe (the first 5 rows).
<br><br>
**Note:** if you re-run this notebook, you may get a different result.
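Before running it on the insurance data, here is a minimal sketch with two tiny made-up dataframes. It shows that `join()` lines rows up by index label even when the right dataframe is in a different order, keeps every left row (filling `NaN` where there is no match), and refuses to run on overlapping column names unless suffixes are given.

```python
import pandas as pd

# Made-up dataframes; 'right' is in a different order and lacks label 164
left = pd.DataFrame({"age": [20, 37, 21]}, index=[815, 164, 1292])
right = pd.DataFrame({"age": [21, 20]}, index=[1292, 815])

# Without suffixes, the overlapping 'age' column raises a ValueError
try:
    left.join(right)
except ValueError as err:
    print("ValueError:", err)

# Rows are matched by label; label 164 has no partner, so it gets NaN
joined = left.join(right, lsuffix="_first", rsuffix="_second")
print(joined)
```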

In [16]:
data_left = data.sample(n=10)
data_right = data_left.iloc[:5]

display(data_left)
display(data_right)

|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 815 | 20 | female | 31.46 | 0 | no | southeast | 1877.9294 |
| 164 | 37 | male | 29.64 | 0 | no | northwest | 5028.1466 |
| 1292 | 21 | male | 23.21 | 0 | no | southeast | 1515.3449 |
| 985 | 44 | female | 25.8 | 1 | no | southwest | 7624.63 |
| 1263 | 43 | female | 29.9 | 1 | no | southwest | 7337.748 |
| 1032 | 30 | female | 27.93 | 0 | no | northeast | 4137.5227 |
| 708 | 31 | female | 30.495 | 3 | no | northeast | 6113.23105 |
| 562 | 27 | male | 30.5 | 0 | no | southwest | 2494.022 |
| 553 | 52 | female | 31.73 | 2 | no | northwest | 11187.6567 |
| 545 | 49 | male | 25.84 | 2 | yes | northwest | 23807.2406 |

|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 815 | 20 | female | 31.46 | 0 | no | southeast | 1877.9294 |
| 164 | 37 | male | 29.64 | 0 | no | northwest | 5028.1466 |
| 1292 | 21 | male | 23.21 | 0 | no | southeast | 1515.3449 |
| 985 | 44 | female | 25.8 | 1 | no | southwest | 7624.63 |
| 1263 | 43 | female | 29.9 | 1 | no | southwest | 7337.748 |


As you can see, the second dataframe contains the first 5 rows from the first dataframe. Let's do a join.
<br><br>
Because both dataframes have the same column names, we need to specify `lsuffix` (left suffix) and `rsuffix` (right suffix). The suffixes tell us whether a column comes from the left or the right dataframe; without them, `join()` raises a `ValueError` about overlapping columns.

In [17]:
data_left.join(data_right, lsuffix='_first', rsuffix='_second')

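|   | age_first | sex_first | bmi_first | children_first | smoker_first | region_first | charges_first | age_second | sex_second | bmi_second | children_second | smoker_second | region_second | charges_second |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 815 | 20 | female | 31.46 | 0 | no | southeast | 1877.9294 | 20.0 | female | 31.46 | 0.0 | no | southeast | 1877.9294 |
| 164 | 37 | male | 29.64 | 0 | no | northwest | 5028.1466 | 37.0 | male | 29.64 | 0.0 | no | northwest | 5028.1466 |
| 1292 | 21 | male | 23.21 | 0 | no | southeast | 1515.3449 | 21.0 | male | 23.21 | 0.0 | no | southeast | 1515.3449 |
| 985 | 44 | female | 25.8 | 1 | no | southwest | 7624.63 | 44.0 | female | 25.8 | 1.0 | no | southwest | 7624.63 |
| 1263 | 43 | female | 29.9 | 1 | no | southwest | 7337.748 | 43.0 | female | 29.9 | 1.0 | no | southwest | 7337.748 |
| 1032 | 30 | female | 27.93 | 0 | no | northeast | 4137.5227 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 708 | 31 | female | 30.495 | 3 | no | northeast | 6113.23105 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 562 | 27 | male | 30.5 | 0 | no | southwest | 2494.022 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 553 | 52 | female | 31.73 | 2 | no | northwest | 11187.6567 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 545 | 49 | male | 25.84 | 2 | yes | northwest | 23807.2406 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |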


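The first 5 rows contain full records because their index labels appear in both dataframes. The last 5 rows have `NaN` in every `_second` column, since those labels don't exist in `data_right`. These missing values are also why `age_second` and `children_second` are shown as floats (e.g. `20.0`): `NaN` is a float, so the integer columns are upcast.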
<br><br>
Every column name gets a `_first` or `_second` suffix; only the index is shared. That's because the two tables are joined with the index as the key, so the index is treated as one common column. All the other columns are treated as different columns, even when their names are the same. There is a reason behind this. Let's look at the two tables below.

> **order_table**

| **order_id** | **user_id** |  **name**  | **price** |
|:------------:|:-----------:|:----------:|:---------:|
|       1      |      2      |    mango   |    4000   |
|       2      |      1      | strawberry |    5000   |
|       3      |      1      |    lemon   |    4500   |

> **user_table**

| **user_id** | **name** |  **city** |
|:-----------:|:--------:|:---------:|
|      1      |  Adhang  |   Jogja   |
|      2      |  Muntaha |  Jakarta  |
|      3      | Muhammad | Palembang |

Both tables contain a `name` column. What if we merge those tables and assume that the `name` column is the same for both tables? It will be a disaster because the `name` column on `order_table` represents the product names, but the `name` column on `user_table` represents the user names.
<br><br>
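To handle this problem, pandas treats same-named columns as different columns and distinguishes them with suffixes. `merge()` adds the suffixes `_x` and `_y` automatically (configurable via its `suffixes` parameter), whereas `join()` requires you to supply `lsuffix` and `rsuffix` yourself.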

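Here is a minimal sketch that rebuilds the two example tables above and merges them on `user_id`. The suffixes `_product` and `_user` are an illustrative choice, picked to make the two `name` columns self-explanatory instead of the default `_x`/`_y`.

```python
import pandas as pd

# The two example tables from the text
order_table = pd.DataFrame({
    "order_id": [1, 2, 3],
    "user_id": [2, 1, 1],
    "name": ["mango", "strawberry", "lemon"],
    "price": [4000, 5000, 4500],
})
user_table = pd.DataFrame({
    "user_id": [1, 2, 3],
    "name": ["Adhang", "Muntaha", "Muhammad"],
    "city": ["Jogja", "Jakarta", "Palembang"],
})

# Both tables have a 'name' column, so each copy gets its own suffix
orders = order_table.merge(user_table, on="user_id", suffixes=("_product", "_user"))
print(orders[["order_id", "name_product", "name_user", "city"]])
```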
# Concatenating Dataframes
Concatenating combines several tables either row-wise or column-wise, depending on the `axis` parameter: `axis=0` stacks the rows of one table below the other, and `axis=1` places the columns side by side.

## Concatenate Columns
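Concatenating along the columns works like the `join()` method: the dataframes' columns are combined by aligning their index. One difference is that `pd.concat()` uses an outer join by default (`join='outer'`), keeping index labels from both dataframes, whereas `join()` defaults to a left join. Here both give the same rows, because every label in `data_right` also appears in `data_left`. To see the alignment more easily, let's look at both dataframes sorted by index.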

In [18]:
display(data_left.sort_index())
display(data_right.sort_index())

|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 164 | 37 | male | 29.64 | 0 | no | northwest | 5028.1466 |
| 545 | 49 | male | 25.84 | 2 | yes | northwest | 23807.2406 |
| 553 | 52 | female | 31.73 | 2 | no | northwest | 11187.6567 |
| 562 | 27 | male | 30.5 | 0 | no | southwest | 2494.022 |
| 708 | 31 | female | 30.495 | 3 | no | northeast | 6113.23105 |
| 815 | 20 | female | 31.46 | 0 | no | southeast | 1877.9294 |
| 985 | 44 | female | 25.8 | 1 | no | southwest | 7624.63 |
| 1032 | 30 | female | 27.93 | 0 | no | northeast | 4137.5227 |
| 1263 | 43 | female | 29.9 | 1 | no | southwest | 7337.748 |
| 1292 | 21 | male | 23.21 | 0 | no | southeast | 1515.3449 |

|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 164 | 37 | male | 29.64 | 0 | no | northwest | 5028.1466 |
| 815 | 20 | female | 31.46 | 0 | no | southeast | 1877.9294 |
| 985 | 44 | female | 25.8 | 1 | no | southwest | 7624.63 |
| 1263 | 43 | female | 29.9 | 1 | no | southwest | 7337.748 |
| 1292 | 21 | male | 23.21 | 0 | no | southeast | 1515.3449 |


In [19]:
pd.concat([data_left, data_right], axis=1)

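|   | age | sex | bmi | children | smoker | region | charges | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 164 | 37 | male | 29.64 | 0 | no | northwest | 5028.1466 | 37.0 | male | 29.64 | 0.0 | no | northwest | 5028.1466 |
| 545 | 49 | male | 25.84 | 2 | yes | northwest | 23807.2406 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 553 | 52 | female | 31.73 | 2 | no | northwest | 11187.6567 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 562 | 27 | male | 30.5 | 0 | no | southwest | 2494.022 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 708 | 31 | female | 30.495 | 3 | no | northeast | 6113.23105 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 815 | 20 | female | 31.46 | 0 | no | southeast | 1877.9294 | 20.0 | female | 31.46 | 0.0 | no | southeast | 1877.9294 |
| 985 | 44 | female | 25.8 | 1 | no | southwest | 7624.63 | 44.0 | female | 25.8 | 1.0 | no | southwest | 7624.63 |
| 1032 | 30 | female | 27.93 | 0 | no | northeast | 4137.5227 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1263 | 43 | female | 29.9 | 1 | no | southwest | 7337.748 | 43.0 | female | 29.9 | 1.0 | no | southwest | 7337.748 |
| 1292 | 21 | male | 23.21 | 0 | no | southeast | 1515.3449 | 21.0 | male | 23.21 | 0.0 | no | southeast | 1515.3449 |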


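In this run, the result also came out sorted by index, unlike the `join()` output. Don't rely on that ordering, though: `pd.concat()` has a `sort` parameter for the non-concatenation axis, and whether the result is sorted depends on it (and on your pandas version). Also note that `concat()` adds no suffixes, so the result contains duplicate column names (`age`, `sex`, and so on appear twice).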

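A minimal sketch on two made-up dataframes shows the default outer join, the `join='inner'` alternative, and the `keys=` parameter, which labels each source so the duplicate column names become distinguishable MultiIndex columns.

```python
import pandas as pd

# Made-up dataframes; 'right' lacks label 164
left = pd.DataFrame({"age": [20, 37, 21]}, index=[815, 164, 1292])
right = pd.DataFrame({"age": [21, 20]}, index=[1292, 815])

# Default join='outer' keeps the union of both indexes
outer = pd.concat([left, right], axis=1)
print(outer.columns.tolist())  # ['age', 'age'] -- no suffixes are added

# join='inner' keeps only labels that appear in both dataframes
inner = pd.concat([left, right], axis=1, join="inner")
print(sorted(inner.index))     # [815, 1292]

# keys= tags each source, giving MultiIndex columns instead of duplicate names
labelled = pd.concat([left, right], axis=1, keys=["first", "second"])
print(labelled.columns.tolist())  # [('first', 'age'), ('second', 'age')]
```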
## Concatenate Rows
Concatenating rows is like adding rows from another table.

In [20]:
pd.concat([data_left, data_right], axis=0)

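|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 815 | 20 | female | 31.46 | 0 | no | southeast | 1877.9294 |
| 164 | 37 | male | 29.64 | 0 | no | northwest | 5028.1466 |
| 1292 | 21 | male | 23.21 | 0 | no | southeast | 1515.3449 |
| 985 | 44 | female | 25.8 | 1 | no | southwest | 7624.63 |
| 1263 | 43 | female | 29.9 | 1 | no | southwest | 7337.748 |
| 1032 | 30 | female | 27.93 | 0 | no | northeast | 4137.5227 |
| 708 | 31 | female | 30.495 | 3 | no | northeast | 6113.23105 |
| 562 | 27 | male | 30.5 | 0 | no | southwest | 2494.022 |
| 553 | 52 | female | 31.73 | 2 | no | northwest | 11187.6567 |
| 545 | 49 | male | 25.84 | 2 | yes | northwest | 23807.2406 |
| 815 | 20 | female | 31.46 | 0 | no | southeast | 1877.9294 |
| 164 | 37 | male | 29.64 | 0 | no | northwest | 5028.1466 |
| 1292 | 21 | male | 23.21 | 0 | no | southeast | 1515.3449 |
| 985 | 44 | female | 25.8 | 1 | no | southwest | 7624.63 |
| 1263 | 43 | female | 29.9 | 1 | no | southwest | 7337.748 |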


What if the dataframes have different columns? Let's say the second dataframe only contains `age` and `sex` columns.

In [21]:
pd.concat([data_left, data_right.loc[:,['age','sex']]], axis=0)

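|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 815 | 20 | female | 31.46 | 0.0 | no | southeast | 1877.9294 |
| 164 | 37 | male | 29.64 | 0.0 | no | northwest | 5028.1466 |
| 1292 | 21 | male | 23.21 | 0.0 | no | southeast | 1515.3449 |
| 985 | 44 | female | 25.8 | 1.0 | no | southwest | 7624.63 |
| 1263 | 43 | female | 29.9 | 1.0 | no | southwest | 7337.748 |
| 1032 | 30 | female | 27.93 | 0.0 | no | northeast | 4137.5227 |
| 708 | 31 | female | 30.495 | 3.0 | no | northeast | 6113.23105 |
| 562 | 27 | male | 30.5 | 0.0 | no | southwest | 2494.022 |
| 553 | 52 | female | 31.73 | 2.0 | no | northwest | 11187.6567 |
| 545 | 49 | male | 25.84 | 2.0 | yes | northwest | 23807.2406 |
| 815 | 20 | female | NaN | NaN | NaN | NaN | NaN |
| 164 | 37 | male | NaN | NaN | NaN | NaN | NaN |
| 1292 | 21 | male | NaN | NaN | NaN | NaN | NaN |
| 985 | 44 | female | NaN | NaN | NaN | NaN | NaN |
| 1263 | 43 | female | NaN | NaN | NaN | NaN | NaN |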


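The columns that exist only in the first dataframe are filled with `NaN` for the rows coming from the second one, as you can see in the last 5 rows. Because `NaN` is a float, `children` is now displayed as a float (e.g. `0.0`).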

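A minimal sketch on two made-up dataframes shows the `NaN` filling, plus the `ignore_index=True` option, which discards the original labels and renumbers the result from 0:

```python
import pandas as pd

# Made-up dataframes; 'bottom' has no 'bmi' column
top = pd.DataFrame({"age": [20, 37], "bmi": [31.46, 29.64]}, index=[815, 164])
bottom = pd.DataFrame({"age": [21]}, index=[1292])

stacked = pd.concat([top, bottom], axis=0)
print(stacked.index.tolist())       # [815, 164, 1292]
print(stacked["bmi"].isna().sum())  # 1 -- the missing 'bmi' value is NaN

# ignore_index=True replaces the old labels with a fresh 0..n-1 index
renumbered = pd.concat([top, bottom], axis=0, ignore_index=True)
print(renumbered.index.tolist())    # [0, 1, 2]
```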
# Appending Dataframes
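Appending dataframes is like concatenating on axis 0 (rows).

**Note:** `DataFrame.append()` was deprecated in pandas 1.4 and removed in pandas 2.0. On current versions, the cell below raises an `AttributeError`; use `pd.concat([data_left, data_right])` instead.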

In [22]:
data_left.append(data_right)

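|   | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 815 | 20 | female | 31.46 | 0 | no | southeast | 1877.9294 |
| 164 | 37 | male | 29.64 | 0 | no | northwest | 5028.1466 |
| 1292 | 21 | male | 23.21 | 0 | no | southeast | 1515.3449 |
| 985 | 44 | female | 25.8 | 1 | no | southwest | 7624.63 |
| 1263 | 43 | female | 29.9 | 1 | no | southwest | 7337.748 |
| 1032 | 30 | female | 27.93 | 0 | no | northeast | 4137.5227 |
| 708 | 31 | female | 30.495 | 3 | no | northeast | 6113.23105 |
| 562 | 27 | male | 30.5 | 0 | no | southwest | 2494.022 |
| 553 | 52 | female | 31.73 | 2 | no | northwest | 11187.6567 |
| 545 | 49 | male | 25.84 | 2 | yes | northwest | 23807.2406 |
| 815 | 20 | female | 31.46 | 0 | no | southeast | 1877.9294 |
| 164 | 37 | male | 29.64 | 0 | no | northwest | 5028.1466 |
| 1292 | 21 | male | 23.21 | 0 | no | southeast | 1515.3449 |
| 985 | 44 | female | 25.8 | 1 | no | southwest | 7624.63 |
| 1263 | 43 | female | 29.9 | 1 | no | southwest | 7337.748 |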


See? The result is the same as `pd.concat([data_left, data_right], axis=0)`.

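Since `data_right` is a subset of `data_left`, appending it silently produces duplicate index labels. A minimal sketch on made-up data shows the `pd.concat()` replacement for `append()` and its `verify_integrity=True` option, which raises an error instead of creating duplicates:

```python
import pandas as pd

# Made-up data; 'second' is a subset of 'first', like data_right and data_left
first = pd.DataFrame({"age": [20, 37]}, index=[815, 164])
second = first.iloc[:1]

# pd.concat is the replacement for the removed first.append(second)
combined = pd.concat([first, second])
print(combined.index.tolist())  # [815, 164, 815]

# verify_integrity=True raises instead of silently duplicating labels
try:
    pd.concat([first, second], verify_integrity=True)
except ValueError as err:
    print("ValueError:", err)
```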
# Merge vs Join vs Concat vs Append
You may notice that there are several methods to combine tables. Here are the rules of thumb:
- `merge` - combines tables' columns by matching one or more key columns
- `join` - combines tables' columns by matching the index
- `append` - combines tables' rows (removed in pandas 2.0; use `concat` instead)
- `concat`
  - Axis 1 - combines tables' columns by aligning the index, similar to `join`
  - Axis 0 - combines tables' rows, similar to `append`

# Pivot Table
Some of you may be familiar with the term 'pivot table' from Microsoft Excel. The idea is the same: pandas `pivot_table()` creates a spreadsheet-style pivot table as a dataframe.
<br><br>
The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.
<br><br>
If you are not familiar with pivot tables, think of them as similar to the `groupby()` method: we group the data by some categories and then apply some aggregations. You can check my previous notebook (Intermediate Pandas Dataframe) for more details.

## Single Index
Suppose we want to know the average charges of each region. Hence, we will group our data by `region` and calculate the average of `charges`.

In [23]:
data[['region','charges']].groupby('region').mean().round(2)

| region | charges |
|---|---|
| northeast | 13406.38 |
| northwest | 12417.58 |
| southeast | 14735.41 |
| southwest | 12346.94 |


We can achieve the same thing using `pivot_table()` function.

In [24]:
pd.pivot_table(data, values='charges', index='region', aggfunc='mean').round(2)

| region | charges |
|---|---|
| northeast | 13406.38 |
| northwest | 12417.58 |
| southeast | 14735.41 |
| southwest | 12346.94 |

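`pivot_table()` uses `aggfunc='mean'` by default, so for an average the parameter could be omitted. A minimal sketch on made-up numbers comparing it with the `groupby()` version:

```python
import pandas as pd

# Made-up data: northeast averages 200.0, southwest 50.0
df = pd.DataFrame({
    "region": ["northeast", "northeast", "southwest"],
    "charges": [100.0, 300.0, 50.0],
})

# No aggfunc given, so the default 'mean' is used
pivoted = pd.pivot_table(df, values="charges", index="region")
grouped = df.groupby("region")[["charges"]].mean()

print(pivoted.loc["northeast", "charges"])               # 200.0
print((pivoted["charges"] == grouped["charges"]).all())  # True
```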

## Multiple Index
This time, we want the average charges of each region, broken down by sex. Hence, we will group our data by `region` and `sex`, then calculate the average of `charges`. The result is a dataframe with a MultiIndex.

In [25]:
data[['region','sex','charges']].groupby(['region','sex']).mean().round(2)

| region | sex | charges |
|---|---|---|
| northeast | female | 12953.2 |
| northeast | male | 13854.01 |
| northwest | female | 12479.87 |
| northwest | male | 12354.12 |
| southeast | female | 13499.67 |
| southeast | male | 15879.62 |
| southwest | female | 11274.41 |
| southwest | male | 13412.88 |


In [26]:
pd.pivot_table(data, values='charges', index=['region','sex'], aggfunc='mean').round(2)

| region | sex | charges |
|---|---|---|
| northeast | female | 12953.2 |
| northeast | male | 13854.01 |
| northwest | female | 12479.87 |
| northwest | male | 12354.12 |
| southeast | female | 13499.67 |
| southeast | male | 15879.62 |
| southwest | female | 11274.41 |
| southwest | male | 13412.88 |


## Multiple Columns
The last output may look a bit 'messy' because the `female` and `male` categories repeat in the rows. We can move `sex` to the columns to make it tidier, like this:

In [27]:
pd.pivot_table(data, values='charges', index='region', columns='sex', aggfunc='mean').round(2)

| region \ sex | female | male |
|---|---|---|
| northeast | 12953.2 | 13854.01 |
| northwest | 12479.87 | 12354.12 |
| southeast | 13499.67 | 15879.62 |
| southwest | 11274.41 | 13412.88 |


As you can see, it's cleaner and easier to read. Can we do the same with `groupby()`?

In [28]:
data[['region','sex','charges']].groupby(['region','sex']).mean().unstack().round(2)

| region | ('charges', 'female') | ('charges', 'male') |
|---|---|---|
| northeast | 12953.2 | 13854.01 |
| northwest | 12479.87 | 12354.12 |
| southeast | 13499.67 | 15879.62 |
| southwest | 11274.41 | 13412.88 |


Yes, we can! But calling `unstack()` every time we want multi-level columns can be inconvenient. That's why I prefer using `pivot_table()` to create pivot tables.
<br><br>
We can also set `sex` as the index, and set `region` as the columns.

In [41]:
pd.pivot_table(data, values='charges', index='sex', columns='region', aggfunc='mean').round(2)

| sex \ region | northeast | northwest | southeast | southwest |
|---|---|---|---|---|
| female | 12953.2 | 12479.87 | 13499.67 | 11274.41 |
| male | 13854.01 | 12354.12 | 15879.62 | 13412.88 |

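A minimal sketch on made-up numbers checks both claims: `pivot_table(..., columns=...)` gives the same values as `groupby(...).mean().unstack()`, and swapping `index` and `columns` is the same as transposing. Selecting `["charges"]` as a Series before `unstack()` avoids the extra `charges` column level seen in the output above.

```python
import pandas as pd

# Made-up data: one row per (region, sex) pair
df = pd.DataFrame({
    "region": ["northeast", "northeast", "southwest", "southwest"],
    "sex": ["female", "male", "female", "male"],
    "charges": [100.0, 200.0, 300.0, 400.0],
})

wide = pd.pivot_table(df, values="charges", index="region", columns="sex", aggfunc="mean")
via_groupby = df.groupby(["region", "sex"])["charges"].mean().unstack()
flipped = pd.pivot_table(df, values="charges", index="sex", columns="region", aggfunc="mean")

print(wide)
print((flipped == wide.T).all().all())  # True
```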

## Multiple Aggregations
Instead of using a single aggregation, we can apply multiple aggregations at once.
<br><br>
Suppose we want to know the range of charges in each region. Hence, we will group our data by `region` and calculate the minimum and maximum of `charges` at the same time.
<br><br>
To do this, we can pass a list of aggregations.

In [38]:
pd.pivot_table(data, values='charges', index='region', aggfunc=['min','max']).round(2)

| region | ('min', 'charges') | ('max', 'charges') |
|---|---|---|
| northeast | 1694.8 | 58571.07 |
| northwest | 1621.34 | 60021.4 |
| southeast | 1121.87 | 63770.43 |
| southwest | 1241.56 | 52590.83 |

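With a list of aggregations, the result has two column levels: the function name on top and the value column below. A minimal sketch on made-up numbers shows how to read a single cell with a `(function, column)` tuple:

```python
import pandas as pd

# Made-up data for two regions
df = pd.DataFrame({
    "region": ["northeast", "northeast", "southwest"],
    "charges": [1694.8, 58571.07, 1241.56],
})

stats = pd.pivot_table(df, values="charges", index="region", aggfunc=["min", "max"])
print(stats.columns.tolist())                      # [('min', 'charges'), ('max', 'charges')]
print(stats.loc["northeast", ("max", "charges")])  # 58571.07
```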

## Swap Level

In the last example, the aggregation function names (`min` and `max`) sit on the column level above the column name (`charges`). We can swap the two levels using the `swaplevel()` method.
<br><br>
Syntax
```
data.swaplevel(i, j, axis=0)
```
This method swaps levels `i` and `j`. The default axis is 0, which swaps the levels of a MultiIndex on the rows (index). To swap the levels of multi-level columns, set the axis to 1.
<br><br>
In this case, the aggregation functions (`min` and `max`) are on the first level (level 0), and the column name (`charges` in the previous example, `bmi` in the examples below) is on the second level (level 1). So, we will swap level 0 and level 1 on axis 1 (the column axis).

In [39]:
data_swap = pd.pivot_table(data, values='bmi', index='region', aggfunc=['min', 'max']).round(2)
data_swap = data_swap.swaplevel(0, 1, axis=1)
data_swap

| region | ('bmi', 'min') | ('bmi', 'max') |
|---|---|---|
| northeast | 15.96 | 48.07 |
| northwest | 17.39 | 42.94 |
| southeast | 19.8 | 53.13 |
| southwest | 17.4 | 47.6 |


Alternatively, we can use `swaplevel()` method on the `columns` attribute.

In [40]:
data_swap = pd.pivot_table(data, values='bmi', index='region', aggfunc=['min', 'max']).round(2)
data_swap.columns = data_swap.columns.swaplevel(0,1)
data_swap

| region | ('bmi', 'min') | ('bmi', 'max') |
|---|---|---|
| northeast | 15.96 | 48.07 |
| northwest | 17.39 | 42.94 |
| southeast | 19.8 | 53.13 |
| southwest | 17.4 | 47.6 |

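Both examples above swap column levels. For the default `axis=0`, here is a minimal sketch on made-up numbers that swaps the row levels of a grouped result. Note that `swaplevel()` does not reorder the rows, so `sort_index()` is used afterwards to group them by the new outer level.

```python
import pandas as pd

# Made-up data: one row per (region, sex) pair
df = pd.DataFrame({
    "region": ["northeast", "northeast", "southwest", "southwest"],
    "sex": ["female", "male", "female", "male"],
    "charges": [100.0, 200.0, 300.0, 400.0],
})

# Rows are indexed by (region, sex)
by_region_sex = df.groupby(["region", "sex"])[["charges"]].mean()

# axis=0 (the default) swaps the row levels; sort_index() regroups the rows
by_sex_region = by_region_sex.swaplevel(0, 1, axis=0).sort_index()
print(list(by_sex_region.index.names))                      # ['sex', 'region']
print(by_sex_region.loc[("male", "southwest"), "charges"])  # 400.0
```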

## Multiple Values
With a pivot table, we can also pass multiple values and apply the same or even different aggregation functions to them. To use different aggregation functions, pass a dictionary that maps each column name to its aggregation function(s) instead of a list.
<br><br>
Suppose we want to know the minimum and maximum age for each region, and we also want to know the average charges for each region.

In [32]:
pd.pivot_table(data, values=['age','charges'], index=['region'], aggfunc={'age':['min','max'],'charges':['mean']}).round(2)

| region | ('age', 'max') | ('age', 'min') | ('charges', 'mean') |
|---|---|---|---|
| northeast | 64 | 18 | 13406.38 |
| northwest | 64 | 19 | 12417.58 |
| southeast | 64 | 18 | 14735.41 |
| southwest | 64 | 19 | 12346.94 |

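The same dictionary also works with `groupby().agg()`, which makes the pivot-table/groupby parallel explicit. A minimal sketch on made-up numbers; the two results should hold the same values, though the sub-column order may differ (in the output above, `max` appears before `min`).

```python
import pandas as pd

# Made-up data for two regions
df = pd.DataFrame({
    "region": ["northeast", "northeast", "southwest"],
    "age": [18, 64, 19],
    "charges": [100.0, 300.0, 50.0],
})

spec = {"age": ["min", "max"], "charges": ["mean"]}
via_pivot = pd.pivot_table(df, values=["age", "charges"], index="region", aggfunc=spec)
via_groupby = df.groupby("region").agg(spec)

# Both have MultiIndex columns such as ('age', 'min')
print(via_pivot.loc["northeast", ("age", "max")])         # 64
print(via_groupby.loc["northeast", ("charges", "mean")])  # 200.0
```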

# Melting Dataframes

# Lambda Function on Dataframes