## Reshaping a DataFrame using Pandas melt()

This notebook accompanies the Medium article [Reshaping a DataFrame using Pandas melt()](https://bindichen.medium.com/reshaping-a-dataframe-using-pandas-melt-83a151ce1907).

Please check out the article for instructions.

**License**: [BSD 2-Clause](https://opensource.org/licenses/BSD-2-Clause)

In [1]:
import pandas as pd

## 1. Simplest melt

In [2]:
df_wide = pd.DataFrame({
    "Country": ["France", "US", "UK"],
    "22/01/2020": [1, 2, 3],
    "23/01/2020": [4, 5, 6],
    "24/01/2020": [7, 8, 9],
    "25/01/2020": [10, 11, 12],
    "26/01/2020": [13, 14, 15],
})
df_wide

| | Country | 22/01/2020 | 23/01/2020 | 24/01/2020 | 25/01/2020 | 26/01/2020 |
|---|---|---|---|---|---|---|
| 0 | France | 1 | 4 | 7 | 10 | 13 |
| 1 | US | 2 | 5 | 8 | 11 | 14 |
| 2 | UK | 3 | 6 | 9 | 12 | 15 |


In [3]:
# Without any arguments, every column (including Country) is melted into variable/value pairs
df_wide.melt()

| | variable | value |
|---|---|---|
| 0 | Country | France |
| 1 | Country | US |
| 2 | Country | UK |
| 3 | 22/01/2020 | 1 |
| 4 | 22/01/2020 | 2 |
| 5 | 22/01/2020 | 3 |
| 6 | 23/01/2020 | 4 |
| 7 | 23/01/2020 | 5 |
| 8 | 23/01/2020 | 6 |
| 9 | 24/01/2020 | 7 |
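
The output above shows only the first rows. As a quick sanity check, the sketch below rebuilds the same wide frame and confirms that an argument-free `melt()` produces one row per original cell, so the result has `rows * columns` rows and exactly two columns.

```python
import pandas as pd

df_wide = pd.DataFrame({
    "Country": ["France", "US", "UK"],
    "22/01/2020": [1, 2, 3],
    "23/01/2020": [4, 5, 6],
    "24/01/2020": [7, 8, 9],
    "25/01/2020": [10, 11, 12],
    "26/01/2020": [13, 14, 15],
})

# With no arguments, every column is treated as a value column
melted = df_wide.melt()

n_rows, n_cols = df_wide.shape
print(melted.shape)           # (18, 2): 3 rows * 6 columns
print(list(melted.columns))   # ['variable', 'value']
```

Because `Country` is melted together with the counts, the `value` column mixes strings and integers, which is rarely what you want. Passing `id_vars` avoids that.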


In [4]:
df_wide.melt(
    id_vars='Country',
)

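| | Country | variable | value |
|---|---|---|---|
| 0 | France | 22/01/2020 | 1 |
| 1 | US | 22/01/2020 | 2 |
| 2 | UK | 22/01/2020 | 3 |
| 3 | France | 23/01/2020 | 4 |
| 4 | US | 23/01/2020 | 5 |
| 5 | UK | 23/01/2020 | 6 |
| 6 | France | 24/01/2020 | 7 |
| 7 | US | 24/01/2020 | 8 |
| 8 | UK | 24/01/2020 | 9 |
| 9 | France | 25/01/2020 | 10 |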


## 2. With custom names

In [5]:
df_wide.melt(
    id_vars='Country',
    var_name='Date',
    value_name='Cases'
)

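| | Country | Date | Cases |
|---|---|---|---|
| 0 | France | 22/01/2020 | 1 |
| 1 | US | 22/01/2020 | 2 |
| 2 | UK | 22/01/2020 | 3 |
| 3 | France | 23/01/2020 | 4 |
| 4 | US | 23/01/2020 | 5 |
| 5 | UK | 23/01/2020 | 6 |
| 6 | France | 24/01/2020 | 7 |
| 7 | US | 24/01/2020 | 8 |
| 8 | UK | 24/01/2020 | 9 |
| 9 | France | 25/01/2020 | 10 |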
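
After melting, the `Date` column still holds the old column labels as strings. The following is a minimal sketch of one possible follow-up step, not part of the original notebook: it parses those labels with `pd.to_datetime`, assuming the day-first format `%d/%m/%Y` used in this toy frame, and then sorts by country and date.

```python
import pandas as pd

df_wide = pd.DataFrame({
    "Country": ["France", "US", "UK"],
    "22/01/2020": [1, 2, 3],
    "23/01/2020": [4, 5, 6],
})

df_long = df_wide.melt(id_vars="Country", var_name="Date", value_name="Cases")

# The melted labels are strings; parse them explicitly as day/month/year
df_long["Date"] = pd.to_datetime(df_long["Date"], format="%d/%m/%Y")

# With real datetimes, sorting is chronological rather than alphabetical
df_long = df_long.sort_values(["Country", "Date"]).reset_index(drop=True)
print(df_long)
```

Giving an explicit `format` avoids any ambiguity between day-first and month-first dates.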


## 3. Multiple ids

In [6]:
df_wide = pd.DataFrame({
    "Country": ["France", "US", "UK"],
    "Lat": [31.8257, 40.0, 55.3781],
    "Long": [117.2264, -100.0, -3.436],
    "22/01/2020": [1, 2, 3],
    "23/01/2020": [4, 5, 6],
    "24/01/2020": [7, 8, 9],
    "25/01/2020": [10, 11, 12],
    "26/01/2020": [13, 14, 15],
})
df_wide

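| | Country | Lat | Long | 22/01/2020 | 23/01/2020 | 24/01/2020 | 25/01/2020 | 26/01/2020 |
|---|---|---|---|---|---|---|---|---|
| 0 | France | 31.8257 | 117.2264 | 1 | 4 | 7 | 10 | 13 |
| 1 | US | 40.0 | -100.0 | 2 | 5 | 8 | 11 | 14 |
| 2 | UK | 55.3781 | -3.436 | 3 | 6 | 9 | 12 | 15 |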


In [7]:
df_wide.melt(
    id_vars=['Country', 'Lat', 'Long'],
    var_name='Date',
    value_name='Cases'
)

| | Country | Lat | Long | Date | Cases |
|---|---|---|---|---|---|
| 0 | France | 31.8257 | 117.2264 | 22/01/2020 | 1 |
| 1 | US | 40.0 | -100.0 | 22/01/2020 | 2 |
| 2 | UK | 55.3781 | -3.436 | 22/01/2020 | 3 |
| 3 | France | 31.8257 | 117.2264 | 23/01/2020 | 4 |
| 4 | US | 40.0 | -100.0 | 23/01/2020 | 5 |
| 5 | UK | 55.3781 | -3.436 | 23/01/2020 | 6 |
| 6 | France | 31.8257 | 117.2264 | 24/01/2020 | 7 |
| 7 | US | 40.0 | -100.0 | 24/01/2020 | 8 |
| 8 | UK | 55.3781 | -3.436 | 24/01/2020 | 9 |
| 9 | France | 31.8257 | 117.2264 | 25/01/2020 | 10 |


## 4. Specify the columns to melt

In [8]:
df_wide.melt(
    id_vars=['Country', 'Lat', 'Long'],
    value_vars=["24/01/2020", "25/01/2020"],
    var_name='Date',
    value_name='Cases'
)

| | Country | Lat | Long | Date | Cases |
|---|---|---|---|---|---|
| 0 | France | 31.8257 | 117.2264 | 24/01/2020 | 7 |
| 1 | US | 40.0 | -100.0 | 24/01/2020 | 8 |
| 2 | UK | 55.3781 | -3.436 | 24/01/2020 | 9 |
| 3 | France | 31.8257 | 117.2264 | 25/01/2020 | 10 |
| 4 | US | 40.0 | -100.0 | 25/01/2020 | 11 |
| 5 | UK | 55.3781 | -3.436 | 25/01/2020 | 12 |
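
One consequence of `value_vars` that is easy to miss: any column listed in neither `id_vars` nor `value_vars` is dropped from the result. The sketch below illustrates this on a two-country subset of the frame above, keeping only `Country` as the identifier, so `Lat`, `Long`, and the unselected date all disappear.

```python
import pandas as pd

df_wide = pd.DataFrame({
    "Country": ["US", "UK"],
    "Lat": [40.0, 55.3781],
    "Long": [-100.0, -3.436],
    "23/01/2020": [5, 6],
    "24/01/2020": [8, 9],
    "25/01/2020": [11, 12],
})

subset = df_wide.melt(
    id_vars=["Country"],
    value_vars=["24/01/2020", "25/01/2020"],
    var_name="Date",
    value_name="Cases",
)

# Lat, Long and 23/01/2020 were in neither list, so they are gone
print(list(subset.columns))  # ['Country', 'Date', 'Cases']
print(len(subset))           # 4: 2 countries * 2 selected dates
```

If you need a column to survive the reshape, list it in `id_vars`.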


## 5. Pandas melt

In [9]:
# pd.melt(df, ...) is the function form of df.melt(...); both return the same result
pd.melt(df_wide, id_vars=['Country', 'Lat', 'Long'])

| | Country | Lat | Long | variable | value |
|---|---|---|---|---|---|
| 0 | France | 31.8257 | 117.2264 | 22/01/2020 | 1 |
| 1 | US | 40.0 | -100.0 | 22/01/2020 | 2 |
| 2 | UK | 55.3781 | -3.436 | 22/01/2020 | 3 |
| 3 | France | 31.8257 | 117.2264 | 23/01/2020 | 4 |
| 4 | US | 40.0 | -100.0 | 23/01/2020 | 5 |
| 5 | UK | 55.3781 | -3.436 | 23/01/2020 | 6 |
| 6 | France | 31.8257 | 117.2264 | 24/01/2020 | 7 |
| 7 | US | 40.0 | -100.0 | 24/01/2020 | 8 |
| 8 | UK | 55.3781 | -3.436 | 24/01/2020 | 9 |
| 9 | France | 31.8257 | 117.2264 | 25/01/2020 | 10 |
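
To check the equivalence of the two call styles rather than take it on trust, the short sketch below runs both on a small frame and compares the results with `DataFrame.equals`.

```python
import pandas as pd

df_wide = pd.DataFrame({
    "Country": ["US", "UK"],
    "Lat": [40.0, 55.3781],
    "Long": [-100.0, -3.436],
    "22/01/2020": [2, 3],
    "23/01/2020": [5, 6],
})

id_cols = ["Country", "Lat", "Long"]

via_function = pd.melt(df_wide, id_vars=id_cols)  # top-level function
via_method = df_wide.melt(id_vars=id_cols)        # DataFrame method

# equals() compares values, dtypes, index and column labels
print(via_function.equals(via_method))  # True
```

Which form to use is a matter of style. The method form chains more naturally with other DataFrame calls.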


## 6. Bonus: Covid-19 time series data preprocessing

### 6.1 Loading confirmed, death, and recovered dataset

In [10]:
confirmed_df = pd.read_csv('time_series_covid19_confirmed_global.csv')
deaths_df = pd.read_csv('time_series_covid19_deaths_global.csv')
recovered_df = pd.read_csv('time_series_covid19_recovered_global.csv')

In [11]:
confirmed_df.head()

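| | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 4/10/20 | 4/11/20 | 4/12/20 | 4/13/20 | 4/14/20 | 4/15/20 | 4/16/20 | 4/17/20 | 4/18/20 | 4/19/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | Afghanistan | 33.0 | 65.0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 521 | 555 | 607 | 665 | 714 | 784 | 840 | 906 | 933 | 996 |
| 1 | NaN | Albania | 41.1533 | 20.1683 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 416 | 433 | 446 | 467 | 475 | 494 | 518 | 539 | 548 | 562 |
| 2 | NaN | Algeria | 28.0339 | 1.6596 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1761 | 1825 | 1914 | 1983 | 2070 | 2160 | 2268 | 2418 | 2534 | 2629 |
| 3 | NaN | Andorra | 42.5063 | 1.5218 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 601 | 601 | 638 | 646 | 659 | 673 | 673 | 696 | 704 | 713 |
| 4 | NaN | Angola | -11.2027 | 17.8739 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 24 | 24 |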


### 6.2 Reshaping them from wide to long format with the same date values

In [12]:
confirmed_df.columns

Index(['Province/State', 'Country/Region', 'Lat', 'Long', '1/22/20', '1/23/20',
       '1/24/20', '1/25/20', '1/26/20', '1/27/20', '1/28/20', '1/29/20',
       '1/30/20', '1/31/20', '2/1/20', '2/2/20', '2/3/20', '2/4/20', '2/5/20',
       '2/6/20', '2/7/20', '2/8/20', '2/9/20', '2/10/20', '2/11/20', '2/12/20',
       '2/13/20', '2/14/20', '2/15/20', '2/16/20', '2/17/20', '2/18/20',
       '2/19/20', '2/20/20', '2/21/20', '2/22/20', '2/23/20', '2/24/20',
       '2/25/20', '2/26/20', '2/27/20', '2/28/20', '2/29/20', '3/1/20',
       '3/2/20', '3/3/20', '3/4/20', '3/5/20', '3/6/20', '3/7/20', '3/8/20',
       '3/9/20', '3/10/20', '3/11/20', '3/12/20', '3/13/20', '3/14/20',
       '3/15/20', '3/16/20', '3/17/20', '3/18/20', '3/19/20', '3/20/20',
       '3/21/20', '3/22/20', '3/23/20', '3/24/20', '3/25/20', '3/26/20',
       '3/27/20', '3/28/20', '3/29/20', '3/30/20', '3/31/20', '4/1/20',
       '4/2/20', '4/3/20', '4/4/20', '4/5/20', '4/6/20', '4/7/20', '4/8/20',
       '4/9/20', '4/10/20'

In [13]:
# The first four columns are identifiers; every column after them is a date
dates = confirmed_df.columns[4:]
dates

Index(['1/22/20', '1/23/20', '1/24/20', '1/25/20', '1/26/20', '1/27/20',
       '1/28/20', '1/29/20', '1/30/20', '1/31/20', '2/1/20', '2/2/20',
       '2/3/20', '2/4/20', '2/5/20', '2/6/20', '2/7/20', '2/8/20', '2/9/20',
       '2/10/20', '2/11/20', '2/12/20', '2/13/20', '2/14/20', '2/15/20',
       '2/16/20', '2/17/20', '2/18/20', '2/19/20', '2/20/20', '2/21/20',
       '2/22/20', '2/23/20', '2/24/20', '2/25/20', '2/26/20', '2/27/20',
       '2/28/20', '2/29/20', '3/1/20', '3/2/20', '3/3/20', '3/4/20', '3/5/20',
       '3/6/20', '3/7/20', '3/8/20', '3/9/20', '3/10/20', '3/11/20', '3/12/20',
       '3/13/20', '3/14/20', '3/15/20', '3/16/20', '3/17/20', '3/18/20',
       '3/19/20', '3/20/20', '3/21/20', '3/22/20', '3/23/20', '3/24/20',
       '3/25/20', '3/26/20', '3/27/20', '3/28/20', '3/29/20', '3/30/20',
       '3/31/20', '4/1/20', '4/2/20', '4/3/20', '4/4/20', '4/5/20', '4/6/20',
       '4/7/20', '4/8/20', '4/9/20', '4/10/20', '4/11/20', '4/12/20',
       '4/13/20', '4/14/20', '4/15

In [14]:
# Identifier columns shared by all three datasets
id_cols = ['Province/State', 'Country/Region', 'Lat', 'Long']

confirmed_df_long = confirmed_df.melt(
    id_vars=id_cols, value_vars=dates, var_name='Date', value_name='Confirmed'
)
deaths_df_long = deaths_df.melt(
    id_vars=id_cols, value_vars=dates, var_name='Date', value_name='Deaths'
)
recovered_df_long = recovered_df.melt(
    id_vars=id_cols, value_vars=dates, var_name='Date', value_name='Recovered'
)
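
Passing the same `dates` to all three calls makes the shared date range explicit. For a single frame where every non-identifier column is a date, `value_vars` can also be left out, since `melt` then uses all remaining columns. The sketch below checks this on a toy frame. The frame and its values are illustrative and are not taken from the CSV files.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country/Region": ["US", "UK"],
    "Lat": [40.0, 55.3781],
    "Long": [-100.0, -3.436],
    "1/22/20": [1, 0],
    "1/23/20": [1, 2],
})

id_cols = ["Country/Region", "Lat", "Long"]
# Everything that is not an identifier column is treated as a date column
dates = [c for c in toy.columns if c not in id_cols]

explicit = toy.melt(id_vars=id_cols, value_vars=dates,
                    var_name="Date", value_name="Confirmed")
implicit = toy.melt(id_vars=id_cols,
                    var_name="Date", value_name="Confirmed")

print(explicit.equals(implicit))  # True
```

Keeping `value_vars` explicit is still a reasonable choice when several files must be reshaped over an identical set of dates.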

In [15]:
(confirmed_df_long.shape, deaths_df_long.shape, recovered_df_long.shape)

((23496, 6), (23496, 6), (22250, 6))
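
The recovered table is shorter than the other two (22250 rows against 23496), so some location and date combinations have no counterpart in it. One way to find them, sketched here on made-up data with a hypothetical single `Country` key, is a left merge with `indicator=True`, which adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Toy long-format frames; the values are illustrative only
confirmed_long = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Date": ["1/22/20"] * 3,
    "Confirmed": [1, 2, 3],
})
recovered_long = pd.DataFrame({
    "Country": ["A", "B"],
    "Date": ["1/22/20"] * 2,
    "Recovered": [0, 1],
})

check = confirmed_long.merge(
    recovered_long, how="left", on=["Country", "Date"], indicator=True
)

# Rows present only on the left have no recovered figure
unmatched = check[check["_merge"] == "left_only"]
print(unmatched[["Country", "Date"]])
```

Rows flagged `left_only` end up with `NaN` in `Recovered` after a left merge.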

### 6.3 Merging them

In [16]:
# Merging confirmed_df_long and deaths_df_long
full_table = confirmed_df_long.merge(
  right=deaths_df_long, 
  how='left',
  on=['Province/State', 'Country/Region', 'Date', 'Lat', 'Long']
)

# Merging full_table and recovered_df_long
full_table = full_table.merge(
  right=recovered_df_long, 
  how='left',
  on=['Province/State', 'Country/Region', 'Date', 'Lat', 'Long']
)

full_table

| | Province/State | Country/Region | Lat | Long | Date | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|---|---|
| 0 | NaN | Afghanistan | 33.000000 | 65.000000 | 1/22/20 | 0 | 0 | 0.0 |
| 1 | NaN | Albania | 41.153300 | 20.168300 | 1/22/20 | 0 | 0 | 0.0 |
| 2 | NaN | Algeria | 28.033900 | 1.659600 | 1/22/20 | 0 | 0 | 0.0 |
| 3 | NaN | Andorra | 42.506300 | 1.521800 | 1/22/20 | 0 | 0 | 0.0 |
| 4 | NaN | Angola | -11.202700 | 17.873900 | 1/22/20 | 0 | 0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 23491 | Saint Pierre and Miquelon | France | 46.885200 | -56.315900 | 4/19/20 | 1 | 0 | 0.0 |
| 23492 | NaN | South Sudan | 6.877000 | 31.307000 | 4/19/20 | 4 | 0 | 0.0 |
| 23493 | NaN | Western Sahara | 24.215500 | -12.885800 | 4/19/20 | 6 | 0 | 0.0 |
| 23494 | NaN | Sao Tome and Principe | 0.186360 | 6.613081 | 4/19/20 | 4 | 0 | 0.0 |
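
In the merged table `Recovered` is a float column while `Confirmed` and `Deaths` are integers, which suggests the left merge introduced missing values, and `Date` is still text. The sketch below shows one possible cleanup on toy data. It assumes that a missing recovered count can be treated as 0, which may not be appropriate for every analysis, and that dates follow the month-first `%m/%d/%y` format seen in the column labels.

```python
import pandas as pd

# Toy stand-in for the merged table; the values are illustrative only
full = pd.DataFrame({
    "Country/Region": ["A", "B"],
    "Date": ["1/22/20", "4/19/20"],
    "Confirmed": [0, 4],
    "Recovered": [0.0, float("nan")],
})

# Assumption: a missing recovered count means zero
full["Recovered"] = full["Recovered"].fillna(0).astype(int)

# Parse month/day/two-digit-year strings into real datetimes
full["Date"] = pd.to_datetime(full["Date"], format="%m/%d/%y")

print(full.dtypes)
```

With `Date` stored as a datetime, the table can be sorted, resampled, or filtered by date range directly.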


## Thanks for reading
