# Data Aggregation


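Sometimes your data is recorded at a finer unit of analysis than the one you want to study, and you need to summarize it at a higher level (for example, municipalities into states). This is when you need the aggregation capabilities of Pandas.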

We will use data from here:

In [None]:
%%html
<iframe width="700" height="300" src="https://covid.saude.gov.br/" allowfullscreen></iframe>


I downloaded the data for 2022 into the _DataFiles_ folder:

In [None]:
import pandas as pd
import glob
import os

# collect the paths of every CSV file whose name ends in "2022"
all_names = glob.glob(os.path.join("DataFiles", "*2022.csv"))
all_names

In [None]:
# read each file (semicolon-separated) into its own data frame
dfs = [pd.read_csv(name, sep=";") for name in all_names]

Let's check that the column names match across the files:

In [None]:
for df in dfs:
    print(df.columns)

In [None]:
# then stack all the files into one data frame, renumbering the rows
covid = pd.concat(dfs, ignore_index=True, copy=False)
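As a minimal sketch of what the concatenation above does, here are two tiny made-up frames (hypothetical values, not the real files) stacked with `ignore_index=True`, which replaces the original row labels with a fresh 0..n-1 index:

```python
import pandas as pd

# Two made-up frames standing in for the downloaded CSV files (assumption)
part1 = pd.DataFrame({"estado": ["SP", "RJ"], "casosNovos": [10, 5]})
part2 = pd.DataFrame({"estado": ["SP", "RJ"], "casosNovos": [7, 3]})

# Stack the rows; ignore_index=True renumbers them from 0
stacked = pd.concat([part1, part2], ignore_index=True)
print(stacked)
```

Without `ignore_index=True`, each piece would keep its own labels (0, 1, 0, 1), so the row labels would repeat.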

In [None]:
#import dtale
#dtale.show(covid)

Let's see how many rows we have:

In [None]:
covid.shape[0]

Let's keep what we need:

In [None]:
toSelect = ['regiao', 'estado', 'municipio', 'data', 'semanaEpi', 'casosNovos', 'obitosNovos']
covid = covid[toSelect]

In [None]:
# you have the data at the municipal level

covid.head()

Let's aggregate:

In [None]:
# sum of cases by estado
covid.groupby('estado').agg({'casosNovos': 'sum'})
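One detail worth keeping in mind when grouping: by default, `groupby` leaves out rows whose grouping key is missing. The sketch below uses made-up values, and it only assumes that some rows might lack a value in `estado`; whether that happens in the downloaded files is something to check in the data itself.

```python
import pandas as pd

# Made-up data; the third row has no state on purpose (assumption)
toy = pd.DataFrame({
    "estado": ["SP", "SP", None, "RJ"],
    "casosNovos": [10, 20, 100, 5],
})

# Default behavior: the row with a missing key is left out of the result
default_sum = toy.groupby("estado").agg({"casosNovos": "sum"})

# dropna=False keeps a separate group for the missing key
with_missing = toy.groupby("estado", dropna=False).agg({"casosNovos": "sum"})

print(default_sum)
print(with_missing)
```

Comparing the total of the aggregated column against the total of the original column is a quick way to notice rows that were left out.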

In [None]:
# you can also group by more than one column:
covid.groupby(['estado','semanaEpi']).agg({'casosNovos': 'sum'})

In [None]:
# or compute several statistics for the same column:
covid.groupby(['estado','semanaEpi']).agg({'casosNovos': ['sum','mean']})
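Passing a list of functions, as in the cell above, produces a two-level column header. If flat column names are more convenient, one option is pandas' named aggregation. The sketch below uses made-up data, and the output names `casos_sum` and `casos_mean` are hypothetical names chosen for illustration:

```python
import pandas as pd

# Made-up data with the same column names as the notebook (assumption)
toy = pd.DataFrame({
    "estado": ["SP", "SP", "RJ", "RJ"],
    "semanaEpi": [1, 1, 1, 2],
    "casosNovos": [10, 20, 5, 7],
})

# Named aggregation: new_column=(source_column, function)
weekly = toy.groupby(["estado", "semanaEpi"]).agg(
    casos_sum=("casosNovos", "sum"),
    casos_mean=("casosNovos", "mean"),
)
print(weekly)
```

The result has one row per `estado` and `semanaEpi` pair, and a single level of column names.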

In [None]:
# sum of cases and deaths by estado
covidAGG=covid.groupby('estado').agg({'casosNovos': 'sum', 'obitosNovos': 'sum'})
covidAGG

Notice that the state (_estado_) is now the index, so it does not appear among the columns:

In [None]:
covidAGG.columns

You can save it as it is, writing the index to the file:

`covidAGG.to_csv(os.path.join("DataFiles","Aggregated_Covid.csv"),index=True)`

Or you can move the index back into the data frame as a regular column:

In [None]:
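# the index becomes the column 'estado' (it is not dropped);
# this returns a new data frame, covidAGG itself is unchanged
covidAGG.reset_index()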

In [None]:
# make the change permanent, then save without writing the row numbers
covidAGG = covidAGG.reset_index()
covidAGG.to_csv(os.path.join("DataFiles","Aggregated_Covid.csv"),index=False)
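As an alternative to calling `reset_index()` afterwards, `groupby` accepts `as_index=False`, which keeps the grouping key as a regular column from the start. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up data with the same column names as the notebook (assumption)
toy = pd.DataFrame({
    "estado": ["SP", "SP", "RJ"],
    "casosNovos": [10, 20, 5],
    "obitosNovos": [1, 2, 0],
})

# as_index=False keeps 'estado' as a column instead of turning it into the index
flat = toy.groupby("estado", as_index=False).agg(
    {"casosNovos": "sum", "obitosNovos": "sum"}
)
print(flat)
```

Either route gives a frame that can be saved with `index=False` without losing the state names.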