Let's load some COVID data from Brazil:

In [None]:

linkCovid="https://github.com/DACSS-Fundamentals/someData/raw/main/brazilCovid2022.csv"

covid=read.csv(linkCovid)


Now, check the data available:

In [None]:
str(covid)

Let's take a look:

In [None]:
head(covid,20)

Let's format the dates, and get date details:

In [None]:
covid$data=as.POSIXct(covid$data, format="%Y-%m-%d")
covid$day=format(covid$data,"%d")
covid$year=format(covid$data,"%Y")
covid$month=format(covid$data,"%m")

## see
head(covid)
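
To see what `format()` returns, here is a minimal sketch on a made-up data frame (`toy` and its three dates are hypothetical, not taken from the Brazil file). It uses `as.Date()` instead of `as.POSIXct()`, which is an alternative for values that carry no time of day. The extracted pieces are character strings, which is why the months appear later as `'07'` rather than `7`:

```r
# Hypothetical toy data; the column name `data` mirrors the notebook's column
toy <- data.frame(data = c("2022-01-15", "2022-02-03", "2022-07-31"))

# as.Date() stores calendar dates without a time of day or time zone
toy$data <- as.Date(toy$data, format = "%Y-%m-%d")

# format() returns character strings, so the leading zero is kept
toy$month <- format(toy$data, "%m")
toy$year <- format(toy$data, "%Y")

toy
class(toy$month)
```

If numeric months are preferred, `as.integer(toy$month)` converts them, but then comparisons should use `7` instead of `'07'`.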


Let's find out which months are available:

In [None]:
unique(covid$month)

So, we have data from January to July 2022.
Let's find the **count of new positive cases per month**:

In [None]:
sum(covid$casosNovos[covid$month=='07'])

In [None]:
sum(covid$casosNovos[covid$month=='06'])

In [None]:
sum(covid$casosNovos[covid$month=='05'])

...

In [None]:
sum(covid$casosNovos[covid$month=='01'])

We use **aggregation** to simplify the previous steps:

In [None]:
# sum of cases by month
casesSumByMonth=aggregate(data=covid,casosNovos~month,sum)
casesSumByMonth
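
As a quick check that one `aggregate()` call really replaces the subset-and-sum cells, here is a small sketch on invented numbers (the `toy` data frame is hypothetical):

```r
# Hypothetical toy data with two months
toy <- data.frame(
  month = c("01", "01", "02", "02", "02"),
  casosNovos = c(10, 20, 5, 5, 30)
)

# One subset-and-sum per month, as in the manual cells
manual <- c(
  sum(toy$casosNovos[toy$month == "01"]),
  sum(toy$casosNovos[toy$month == "02"])
)

# The same totals in a single call: one row per distinct month
byMonth <- aggregate(casosNovos ~ month, data = toy, FUN = sum)
byMonth

# TRUE when both approaches agree
all.equal(byMonth$casosNovos, manual)
```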

**AGGREGATING** capabilities allow us to produce useful output with little code:

* **The groupings**:

In the last example, _month_ was the **grouping** variable. We can have more than one of those:

In [None]:
# sum of cases by estado and month
casesSumByStateAndMonth=aggregate(data=covid,casosNovos~estado + month,sum)
casesSumByStateAndMonth
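
With two grouping variables, the result has one row per combination that actually occurs in the data. A minimal sketch on invented values (`toy` is hypothetical):

```r
# Hypothetical toy data: two states observed over two months
toy <- data.frame(
  estado = c("A", "A", "B", "B", "B"),
  month = c("01", "02", "01", "01", "02"),
  casosNovos = c(1, 2, 3, 4, 5)
)

# Each row of the output is one estado-month pair with its total
byStateMonth <- aggregate(casosNovos ~ estado + month, data = toy, FUN = sum)
byStateMonth
```

State B has two records in month 01, so that pair gets a single row holding their sum.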

* **The function to apply**:

We can have more than one function:

In [None]:
# sum and mean of cases by estado and week
casesSumAndMeanByStateAndWeek=aggregate(data=covid,casosNovos~estado + semanaEpi,
          function(x) c(mean = mean(x), sum = sum(x) ) )


head(casesSumAndMeanByStateAndWeek,30)

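...or better, since the previous result stores the mean and the sum together in a single matrix column, we can flatten it into regular columns: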

In [None]:
casesSumAndMeanByStateAndWeek=do.call(data.frame,
          aggregate(data=covid,casosNovos~estado + semanaEpi,
                    function(x) c(mean = mean(x), sum = sum(x) ) ))
head(casesSumAndMeanByStateAndWeek,30)
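
To see what `do.call(data.frame, ...)` changes, this sketch on invented values (`toy` is hypothetical) inspects the result before and after flattening:

```r
# Hypothetical toy data with two states
toy <- data.frame(
  estado = c("A", "A", "B", "B"),
  casosNovos = c(2, 4, 10, 30)
)

# When FUN returns a named vector, the results go into ONE matrix column
nested <- aggregate(casosNovos ~ estado, data = toy,
                    FUN = function(x) c(mean = mean(x), sum = sum(x)))
ncol(nested)                   # counts the matrix as a single column
is.matrix(nested$casosNovos)

# do.call(data.frame, ...) expands the matrix into separate columns
flat <- do.call(data.frame, nested)
names(flat)
flat
```

After flattening, each statistic is an ordinary column named after the variable and the statistic, which is easier to filter, sort, or plot.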

* **The variables transformed**:

We can apply the function to more than one variable:

In [None]:
# sum of cases and deaths by estado

CasesAndDeathsByState=aggregate(data=covid,
                                cbind(casosNovos,obitosNovos)~estado,
                                sum)

head(CasesAndDeathsByState,30)

* **The function according to variable**:

The function can vary according to the variable. A single `aggregate()` call applies the same function to every variable, so in this case **dplyr** is the convenient choice:

In [None]:
library(dplyr)
covid |>
  group_by(month) |>
  summarize(casosNovos_VAR = var(casosNovos),
            casosNovos_SD = sd(casosNovos),
            obitosNovos_Median = median(obitosNovos),
            obitosNovos_Mean = mean(obitosNovos))
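
For comparison, a rough base R counterpart is sketched below on invented values (`toy` and the result names are hypothetical). It runs one `aggregate()` per function and merges the pieces, which gets verbose quickly and shows why the dplyr version reads better:

```r
# Hypothetical toy data with cases and deaths
toy <- data.frame(
  month = c("01", "01", "02", "02"),
  casosNovos = c(10, 30, 5, 15),
  obitosNovos = c(1, 3, 2, 2)
)

# One aggregate() per variable-function pair
casesSD <- aggregate(casosNovos ~ month, data = toy, FUN = sd)
deathsMean <- aggregate(obitosNovos ~ month, data = toy, FUN = mean)

# Rename the value columns so they describe the statistic
names(casesSD)[2] <- "casosNovos_SD"
names(deathsMean)[2] <- "obitosNovos_Mean"

# Join the pieces on the grouping variable
merged <- merge(casesSD, deathsMean, by = "month")
merged
```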