# Long Vs Wide Data
<br>

## Long data and wide data are two different formats for organizing and representing data in a tabular structure.

<img src="../../images/wide.jpg" style="display: block;margin-left: auto;margin-right: auto;
  width: 30%; border-radius:10px 10px 10px 10px; height:200px;">



## **Wide format** is where we have a single row for every data point with multiple columns to hold the values of various attributes.
<br>



<img src="../../images/long.jpg" style="display: block;margin-left: auto;margin-right: auto;
  width: 30%; border-radius:10px 10px 10px 10px; height:200px;">

## **Long format** is where, for each data point, we have as many rows as there are attributes, and each row holds the value of one attribute for that data point.
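To make the two definitions concrete, here is a small sketch that builds the same hypothetical student counts both ways by hand with pandas. The branch names, years, and counts are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: student counts per branch for two years.
# Wide: one row per branch, one column per year (attribute).
wide = pd.DataFrame({
    "branch": ["cse", "ece"],
    "2020": [100, 150],
    "2021": [120, 130],
})

# Long: one row per (branch, year) pair, each holding a single value.
long = pd.DataFrame({
    "branch": ["cse", "cse", "ece", "ece"],
    "year": ["2020", "2021", "2020", "2021"],
    "students": [100, 120, 150, 130],
})

print(wide.shape)  # (2, 3)
print(long.shape)  # (4, 3)
```

Both frames carry the same four counts; the long one simply spends a row on each of them.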

<br>

- ### Both formats have their advantages: wide data is compact and easy to read, while long data is easier to filter, group, and merge.
<br>

- ### The choice of format depends on the specific analysis or task at hand, as well as the tools and techniques being used. 
<br>

- ### It's often possible to convert between long and wide formats as needed, depending on the requirements of the analysis or the desired representation of the data.
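As a sketch of converting in both directions, a wide table can be melted into long format and then pivoted back. This assumes pandas; `DataFrame.pivot` is the standard pandas method for the reverse step, and the data are hypothetical.

```python
import pandas as pd

wide = pd.DataFrame({
    "branch": ["cse", "ece", "mech"],
    "2020": [100, 150, 60],
    "2021": [120, 130, 80],
})

# Wide -> long: keep "branch" as an identifier, unpivot the year columns.
long = wide.melt(id_vars="branch", var_name="year", value_name="students")

# Long -> wide: pivot() spreads the "year" values back into columns.
back = (
    long.pivot(index="branch", columns="year", values="students")
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover "year" label on the column axis
)

print(back.equals(wide))  # True
```

`pivot` requires each (index, column) pair to appear only once; with duplicate pairs, an aggregating method such as `pivot_table` would be needed instead.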

# Melt Method
<br>

- ### Convert data from wide format to long format with `DataFrame.melt()`

```python
# Example 1
import pandas as pd

# This is the wide DataFrame
pd.DataFrame({"cse": [120]})
```

|   | cse |
|---|-----|
| 0 | 120 |


```python
# Convert the wide DataFrame to a long DataFrame
pd.DataFrame({"cse": [120]}).melt()
# This is the long DataFrame
```

|   | variable | value |
|---|----------|-------|
| 0 | cse      | 120   |


```python
# Example 2
wide_df = pd.DataFrame({"cse": [120], "ece": [100], "mech": [50]})

wide_df
```

|   | cse | ece | mech |
|---|-----|-----|------|
| 0 | 120 | 100 | 50   |


```python
# convert wide to long
wide_df.melt()
```

|   | variable | value |
|---|----------|-------|
| 0 | cse      | 120   |
| 1 | ece      | 100   |
| 2 | mech     | 50    |
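One way to see the reshaping rule: with no `id_vars`, `melt()` turns every cell of the wide frame into one row of the long frame, walking the columns one after another. The sketch below adds a hypothetical second row of counts to make that visible.

```python
import pandas as pd

# Two rows of hypothetical counts, three branch columns.
wide_df = pd.DataFrame({"cse": [120, 60], "ece": [100, 40], "mech": [50, 30]})

long_df = wide_df.melt()

# Every cell of the wide frame becomes one row of the long frame.
print(len(long_df), wide_df.shape[0] * wide_df.shape[1])  # 6 6
print(long_df)
```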


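### `var_name` and `value_name` parameters

`var_name` renames the `variable` column and `value_name` renames the `value` column of the result.

```python
wide_df.melt(var_name="branch", value_name="num of student")
```

|   | branch | num of student |
|---|--------|----------------|
| 0 | cse    | 120            |
| 1 | ece    | 100            |
| 2 | mech   | 50             |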


```python
# Example 3
ds = {
    "branch": ["cse", "ece", "mech"],
    2020: [100, 150, 60],
    2021: [120, 130, 80],
    2022: [150, 140, 70],
}

wd_df = pd.DataFrame(ds)
wd_df
```

|   | branch | 2020 | 2021 | 2022 |
|---|--------|------|------|------|
| 0 | cse    | 100  | 120  | 150  |
| 1 | ece    | 150  | 130  | 140  |
| 2 | mech   | 60   | 80   | 70   |


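```python
# convert into long
wd_df.melt()
```

|    | variable | value |
|----|----------|-------|
| 0  | branch   | cse   |
| 1  | branch   | ece   |
| 2  | branch   | mech  |
| 3  | 2020     | 100   |
| 4  | 2020     | 150   |
| 5  | 2020     | 60    |
| 6  | 2021     | 120   |
| 7  | 2021     | 130   |
| 8  | 2021     | 80    |
| 9  | 2022     | 150   |
| 10 | 2022     | 140   |
| 11 | 2022     | 70    |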
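A quick sketch of one symptom of this problem: because the `branch` labels are melted together with the numeric year columns, the `value` column mixes strings and integers, so pandas stores it with the generic `object` dtype.

```python
import pandas as pd

wd_df = pd.DataFrame({
    "branch": ["cse", "ece", "mech"],
    2020: [100, 150, 60],
    2021: [120, 130, 80],
    2022: [150, 140, 70],
})

melted = wd_df.melt()

# Branch names and student counts now share one column,
# so pandas falls back to the generic object dtype.
print(melted["value"].dtype)  # object
print(len(melted))            # 12 (3 rows x 4 columns)
```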


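## Here the result is not meaningful
<br>

### because `branch` is an identifier column and should not be melted together with the year columns
<br>

### Use the `id_vars` parameter to keep such columns as identifiers instead of unpivoting them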

```python
wd_df
```

|   | branch | 2020 | 2021 | 2022 |
|---|--------|------|------|------|
| 0 | cse    | 100  | 120  | 150  |
| 1 | ece    | 150  | 130  | 140  |
| 2 | mech   | 60   | 80   | 70   |


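```python
wd_df.melt(id_vars=["branch"])
# now the long data is readable
```

|   | branch | variable | value |
|---|--------|----------|-------|
| 0 | cse    | 2020     | 100   |
| 1 | ece    | 2020     | 150   |
| 2 | mech   | 2020     | 60    |
| 3 | cse    | 2021     | 120   |
| 4 | ece    | 2021     | 130   |
| 5 | mech   | 2021     | 80    |
| 6 | cse    | 2022     | 150   |
| 7 | ece    | 2022     | 140   |
| 8 | mech   | 2022     | 70    |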
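`melt()` also accepts a `value_vars` parameter that selects which columns to unpivot; columns listed in neither `id_vars` nor `value_vars` are left out of the result. A sketch on the same data that keeps only the 2021 and 2022 columns:

```python
import pandas as pd

wd_df = pd.DataFrame({
    "branch": ["cse", "ece", "mech"],
    2020: [100, 150, 60],
    2021: [120, 130, 80],
    2022: [150, 140, 70],
})

# Only 2021 and 2022 are unpivoted; the 2020 column is dropped.
recent = wd_df.melt(id_vars=["branch"], value_vars=[2021, 2022])

print(recent)
```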


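```python
# rename the variable and value columns
wd_df.melt(id_vars=["branch"], var_name="year", value_name="Students")
```

|   | branch | year | Students |
|---|--------|------|----------|
| 0 | cse    | 2020 | 100      |
| 1 | ece    | 2020 | 150      |
| 2 | mech   | 2020 | 60       |
| 3 | cse    | 2021 | 120      |
| 4 | ece    | 2021 | 130      |
| 5 | mech   | 2021 | 80       |
| 6 | cse    | 2022 | 150      |
| 7 | ece    | 2022 | 140      |
| 8 | mech   | 2022 | 70       |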
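Long format is convenient for aggregation. As a sketch using pandas `groupby`, the total number of students per year becomes a single grouping on the `year` column:

```python
import pandas as pd

wd_df = pd.DataFrame({
    "branch": ["cse", "ece", "mech"],
    2020: [100, 150, 60],
    2021: [120, 130, 80],
    2022: [150, 140, 70],
})

long_df = wd_df.melt(id_vars=["branch"], var_name="year", value_name="Students")

# One groupby on "year" gives the per-year totals.
per_year = long_df.groupby("year")["Students"].sum()
print(per_year)  # 2020 -> 310, 2021 -> 330, 2022 -> 360
```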


## Example on a Real Dataset

The two CSV files hold COVID-19 confirmed-case and death counts in wide format: one row per region and one column per date.

```python
# Load each file into the variable that matches its contents
confirm = pd.read_csv("./datasets/covid_confirm.csv")

death = pd.read_csv("./datasets/covid_deaths.csv")
```

```python
death.head()
```

|   | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 12/24/22 | 12/25/22 | 12/26/22 | 12/27/22 | 12/28/22 | 12/29/22 | 12/30/22 | 12/31/22 | 1/1/23 | 1/2/23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 7845 | 7846 | 7846 | 7846 | 7846 | 7847 | 7847 | 7849 | 7849 | 7849 |
| 1 |  | Albania | 41.1533 | 20.1683 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3595 | 3595 | 3595 | 3595 | 3595 | 3595 | 3595 | 3595 | 3595 | 3595 |
| 2 |  | Algeria | 28.0339 | 1.6596 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 6881 | 6881 | 6881 | 6881 | 6881 | 6881 | 6881 | 6881 | 6881 | 6881 |
| 3 |  | Andorra | 42.5063 | 1.5218 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 165 | 165 | 165 | 165 | 165 | 165 | 165 | 165 | 165 | 165 |
| 4 |  | Angola | -11.2027 | 17.8739 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1928 | 1928 | 1928 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 |

```python
death.shape
# wide format: 4 identifier columns plus one column per date
```

```
(289, 1081)
```

```python
confirm.head()
```

|   | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 12/24/22 | 12/25/22 | 12/26/22 | 12/27/22 | 12/28/22 | 12/29/22 | 12/30/22 | 12/31/22 | 1/1/23 | 1/2/23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 207310 | 207399 | 207438 | 207460 | 207493 | 207511 | 207550 | 207559 | 207616 | 207627 |
| 1 |  | Albania | 41.1533 | 20.1683 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 333749 | 333749 | 333751 | 333751 | 333776 | 333776 | 333806 | 333806 | 333811 | 333812 |
| 2 |  | Algeria | 28.0339 | 1.6596 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 271194 | 271198 | 271198 | 271202 | 271208 | 271217 | 271223 | 271228 | 271229 | 271229 |
| 3 |  | Andorra | 42.5063 | 1.5218 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 47686 | 47686 | 47686 | 47686 | 47751 | 47751 | 47751 | 47751 | 47751 | 47751 |
| 4 |  | Angola | -11.2027 | 17.8739 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 104973 | 104973 | 104973 | 105095 | 105095 | 105095 | 105095 | 105095 | 105095 | 105095 |

```python
confirm.shape
# wide format: 4 identifier columns plus one column per date
```

```
(289, 1081)
```

## Reshape and merge the data

Goal: a single long table with these columns:

| Country | Date | Confirm | Death |
|---------|------|---------|-------|

```python
# Columns that identify a region; every other column is a date
id_cols = ["Province/State", "Country/Region", "Lat", "Long"]

dth = death.melt(id_vars=id_cols, var_name="date", value_name="Deaths")

cnf = confirm.melt(id_vars=id_cols, var_name="date", value_name="Confirm Cases")
```

```python
dth.head()
```

|   | Province/State | Country/Region | Lat | Long | date | Deaths |
|---|---|---|---|---|---|---|
| 0 |  | Afghanistan | 33.93911 | 67.709953 | 1/22/20 | 0 |
| 1 |  | Albania | 41.1533 | 20.1683 | 1/22/20 | 0 |
| 2 |  | Algeria | 28.0339 | 1.6596 | 1/22/20 | 0 |
| 3 |  | Andorra | 42.5063 | 1.5218 | 1/22/20 | 0 |
| 4 |  | Angola | -11.2027 | 17.8739 | 1/22/20 | 0 |

```python
cnf.head()
```

|   | Province/State | Country/Region | Lat | Long | date | Confirm Cases |
|---|---|---|---|---|---|---|
| 0 |  | Afghanistan | 33.93911 | 67.709953 | 1/22/20 | 0 |
| 1 |  | Albania | 41.1533 | 20.1683 | 1/22/20 | 0 |
| 2 |  | Algeria | 28.0339 | 1.6596 | 1/22/20 | 0 |
| 3 |  | Andorra | 42.5063 | 1.5218 | 1/22/20 | 0 |
| 4 |  | Angola | -11.2027 | 17.8739 | 1/22/20 | 0 |

```python
# now merge the two long tables on the region columns and the date
final = dth.merge(cnf, how="inner", on=id_cols + ["date"])[["Country/Region", "date", "Confirm Cases", "Deaths"]]

final
```

|        | Country/Region | date | Confirm Cases | Deaths |
|--------|---|---|---|---|
| 0      | Afghanistan | 1/22/20 | 0 | 0 |
| 1      | Albania | 1/22/20 | 0 | 0 |
| 2      | Algeria | 1/22/20 | 0 | 0 |
| 3      | Andorra | 1/22/20 | 0 | 0 |
| 4      | Angola | 1/22/20 | 0 | 0 |
| ...    | ... | ... | ... | ... |
| 311248 | West Bank and Gaza | 1/2/23 | 703228 | 5708 |
| 311249 | Winter Olympics 2022 | 1/2/23 | 535 | 0 |
| 311250 | Yemen | 1/2/23 | 11945 | 2159 |
| 311251 | Zambia | 1/2/23 | 334661 | 4024 |
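Two things remain awkward in `final`: some countries are split across several `Province/State` rows, and `date` is still text. A minimal follow-up sketch on a hypothetical miniature of the two files (the regions, coordinates, and counts are invented) parses the dates with `pd.to_datetime` and sums each country per date:

```python
import pandas as pd

# Hypothetical miniature of the two wide files: country "X" has two provinces,
# country "Y" has a single row with no Province/State (None).
ids = ["Province/State", "Country/Region", "Lat", "Long"]
confirm = pd.DataFrame({
    "Province/State": ["A", "B", None],
    "Country/Region": ["X", "X", "Y"],
    "Lat": [1.0, 2.0, 3.0],
    "Long": [1.0, 2.0, 3.0],
    "1/22/20": [5, 7, 1],
    "1/23/20": [8, 9, 2],
})
death = pd.DataFrame({
    "Province/State": ["A", "B", None],
    "Country/Region": ["X", "X", "Y"],
    "Lat": [1.0, 2.0, 3.0],
    "Long": [1.0, 2.0, 3.0],
    "1/22/20": [0, 0, 0],
    "1/23/20": [1, 0, 1],
})

# Same reshaping as above: wide -> long, then join on ids + date.
# pandas merge pairs null keys with each other, so the None province still matches.
cnf = confirm.melt(id_vars=ids, var_name="date", value_name="Confirm Cases")
dth = death.melt(id_vars=ids, var_name="date", value_name="Deaths")
final = dth.merge(cnf, how="inner", on=ids + ["date"])

# Parse the "m/d/yy" strings into real datetimes.
final["date"] = pd.to_datetime(final["date"], format="%m/%d/%y")

# One row per (country, date): sum the provinces of each country.
per_country = (
    final.groupby(["Country/Region", "date"], as_index=False)[["Confirm Cases", "Deaths"]]
    .sum()
)
print(per_country)
```

With real datetimes the table can be sorted, resampled, or sliced by date range, which is awkward while dates are strings like `1/2/23`.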
