# Data Reshaping


Let's load the Covid data again:

In [None]:
import pandas as pd
import glob
import os

all_names = glob.glob(os.path.join('DataFiles' , "*2022.csv"))

dfs=[pd.read_csv(name,sep=";") for name in all_names]

covid=pd.concat(dfs,ignore_index=True,copy=False)
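To see what the concatenation does without the real files, here is a minimal sketch that replaces the CSVs with two in-memory strings. The contents and the names `part1`, `part2` and `toy` are made up for illustration; only the `;` separator and the column names come from this notebook.

```python
import io
import pandas as pd

# Two hypothetical ';'-separated "files" (made-up numbers)
part1 = io.StringIO("estado;semanaEpi;casosNovos\nSP;1;10\nRJ;1;5\n")
part2 = io.StringIO("estado;semanaEpi;casosNovos\nSP;2;20\nRJ;2;8\n")

parts = [pd.read_csv(f, sep=";") for f in (part1, part2)]

# ignore_index=True discards each part's own row labels and builds a fresh 0..n-1 index
toy = pd.concat(parts, ignore_index=True)
print(toy)
```

Without `ignore_index=True` the row labels 0 and 1 would each appear twice in the result.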

Tabular data can come in a **long** or a **wide** shape. As you can see, the covid data is in the long one:

In [None]:
covid.head(10)

Long format is efficient, but some operations may need a wide format:

In [None]:
covidSemanaW=pd.pivot_table(covid,
                            values='casosNovos',
                            index=['estado'],
                            columns=['semanaEpi'],# one column per week (to wide)
                            aggfunc='sum')

covidSemanaW
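The same pivot is easier to follow on a tiny table. This is a sketch with made-up numbers; `toy_long` and `toy_wide` are hypothetical names, and only the column names are borrowed from the covid data.

```python
import pandas as pd

# Hypothetical long table: several rows may share the same state and week
toy_long = pd.DataFrame({
    "estado":     ["SP", "SP", "SP", "RJ", "RJ", "RJ"],
    "semanaEpi":  [1, 1, 2, 1, 2, 2],
    "casosNovos": [10, 5, 20, 3, 4, 6],
})

# Rows that share (estado, semanaEpi) are added together by aggfunc="sum"
toy_wide = pd.pivot_table(toy_long,
                          values="casosNovos",
                          index="estado",
                          columns="semanaEpi",
                          aggfunc="sum")
print(toy_wide)
```

Each state becomes one row and each week one column, so the six long rows collapse into a 2 x 2 table.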

Notice the column names:

In [None]:
covidSemanaW.columns

Pandas gave a name to the column index ('_semanaEpi_'), inherited from the variable you pivoted on. You can remove that name while turning the row index into a regular column:

In [None]:
covidSemanaW.reset_index().rename_axis(index=None, columns=None)
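A small sketch of what `rename_axis` changes here, assuming a made-up wide table (`toy_wide` and `flat` are hypothetical names):

```python
import pandas as pd

# Hypothetical wide table whose axes carry names, as pivot_table produces
toy_wide = pd.DataFrame(
    [[3, 10], [15, 20]],
    index=pd.Index(["RJ", "SP"], name="estado"),
    columns=pd.Index([1, 2], name="semanaEpi"),
)
print(toy_wide.columns.name)   # the label attached to the column axis

# reset_index moves 'estado' into the columns; rename_axis clears the axis labels
flat = toy_wide.reset_index().rename_axis(index=None, columns=None)
print(flat.columns.name)
print(list(flat.columns))
```

Only the axis label is removed; the column labels themselves (`'estado'` and the week numbers) stay as they are.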

We can save this after dropping the last column (week 52):

In [None]:
covidSemanaW=covidSemanaW.reset_index().rename_axis(index=None, columns=None)
covidSemanaW.drop(columns=[52],inplace=True)
covidSemanaW.to_csv(os.path.join('DataFiles','covidSemanaW.csv'),index=False)

We should be able to transform this wide version back into a long one:

In [None]:
covidSemanaL=covidSemanaW.set_index('estado').stack().reset_index()
covidSemanaL

In [None]:
# rename the columns that stack() left unnamed, then save the long version
covidSemanaL.rename(columns={'level_1':'semanaEpi',0:'cases'},inplace=True)
covidSemanaL.to_csv(os.path.join('DataFiles','covidSemanaL.csv'),index=False)
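The wide-to-long step can be checked on a toy table. The sketch below repeats the `stack` route used above and, as an alternative that is not used elsewhere in this notebook, the `melt` method, which names the new columns directly. The data and the names `toy_wide`, `via_stack` and `via_melt` are made up.

```python
import pandas as pd

# Hypothetical wide table, shaped like covidSemanaW (made-up numbers)
toy_wide = pd.DataFrame({"estado": ["RJ", "SP"], 1: [3, 15], 2: [10, 20]})

# Route used above: move the week columns into the index, then flatten and rename
via_stack = (toy_wide.set_index("estado")
                     .stack()
                     .reset_index()
                     .rename(columns={"level_1": "semanaEpi", 0: "cases"}))

# Alternative: melt takes the names of the new columns as arguments
via_melt = toy_wide.melt(id_vars="estado", var_name="semanaEpi", value_name="cases")

print(via_stack)
print(via_melt)
```

Both results hold the same rows; they differ only in row order (`stack` goes state by state, `melt` week by week).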

Let's make it a little more complex:

In [None]:
covidSemanaW2=pd.pivot_table(covid,
                            values=['casosNovos','obitosNovos'],
                            index=['regiao','estado'],
                            columns=['semanaEpi'],
                            aggfunc='sum')

covidSemanaW2

Now the rows have a _multi index_:

In [None]:
covidSemanaW2.index

Resetting the row index works well:

In [None]:
covidSemanaW2.reset_index()

The problem is the column names, which are now a multi index too:

In [None]:
covidSemanaW2.columns

Notice that, before making any changes, you can easily convert this into a long format:

In [None]:
covidSemanaW2.stack()

More interestingly, you can stack both column levels at once:

In [None]:
covidSemanaW2.stack([0,1])

In [None]:
covidSemanaW2.stack([0,1]).reset_index()

In [None]:
# rename

covidSemanaW2_L=covidSemanaW2.stack([0,1]).reset_index()
covidSemanaW2_L.rename(columns={'level_2':'measure',0:'counts'},inplace=True)
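To see where `'level_2'` and `0` come from, here is a sketch on a made-up table with the same column names (`toy_long`, `toy_wide2` and `toy_long2` are hypothetical names). The first column level holds the measure names and has no name of its own, so `reset_index` labels it by its position in the index.

```python
import pandas as pd

# Hypothetical long table with two measures (made-up numbers)
toy_long = pd.DataFrame({
    "regiao":      ["Sudeste", "Sudeste", "Sudeste", "Sudeste"],
    "estado":      ["SP", "SP", "RJ", "RJ"],
    "semanaEpi":   [1, 2, 1, 2],
    "casosNovos":  [15, 20, 3, 10],
    "obitosNovos": [2, 1, 0, 4],
})

toy_wide2 = pd.pivot_table(toy_long,
                           values=["casosNovos", "obitosNovos"],
                           index=["regiao", "estado"],
                           columns=["semanaEpi"],
                           aggfunc="sum")

# Column level 0 (the measures) is unnamed; level 1 is 'semanaEpi'
print(toy_wide2.columns.names)

# Stacking both levels leaves a single unnamed value column (0)
toy_long2 = (toy_wide2.stack([0, 1])
                      .reset_index()
                      .rename(columns={"level_2": "measure", 0: "counts"}))
print(toy_long2)
```

The result has one row per state, measure and week: 2 states x 2 measures x 2 weeks.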

In [None]:
# then

covidSemanaW2_L.to_csv(os.path.join('DataFiles','covidSemanaW2_L.csv'),index=False)

But suppose you decided to flatten these column names:

In [None]:
covidSemanaW2.columns

In [None]:
# with something like

["_".join((str(a),str(b))) for a,b in covidSemanaW2.columns]

In [None]:
NewNames=["_".join((str(a),str(b))) for a,b in covidSemanaW2.columns]
covidSemanaW2.columns=NewNames

#now you have
covidSemanaW2

If you start with flattened names like these, you can recover the multi index by splitting on the underscore (note that the week numbers come back as strings):

In [None]:
pd.MultiIndex.from_tuples(covidSemanaW2.columns.str.split('_').map(tuple))
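The round trip can be verified on a small column index. This is a sketch with hypothetical names (`cols`, `flat_names`, `recovered`), and it assumes that the measure names contain no underscore. Unlike the one-liner above, it converts the week back to an integer so that the recovered index matches the original one.

```python
import pandas as pd

# Hypothetical two-level column index, like the one pivot_table builds
cols = pd.MultiIndex.from_product(
    [["casosNovos", "obitosNovos"], [1, 2]],
    names=[None, "semanaEpi"],
)

# Flatten: join both levels with an underscore
flat_names = ["_".join((str(a), str(b))) for a, b in cols]
print(flat_names)

# Recover: split on the underscore; the week comes back as text, so convert it
recovered = pd.MultiIndex.from_tuples(
    [(measure, int(week)) for measure, week in (name.split("_") for name in flat_names)],
    names=[None, "semanaEpi"],
)
print(recovered.equals(cols))
```

If the week were left as a string, the labels would look the same when printed but would no longer compare equal to the original integer labels.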