
# Summarizing Data

> “What we have is a data glut.”
> — Vernor Vinge, Professor Emeritus of Mathematics, San Diego State University

As datasets grow in size and complexity, the ability to summarize information becomes an essential skill in any data scientist’s toolkit...

In [None]:
import pandas as pd

# Load the raw Ames housing data directly from the course repository
ames = pd.read_csv('https://raw.githubusercontent.com/bradleyboehmke/uc-bana-4080/refs/heads/main/data/ames_raw.csv')
ames.head()

### Summarizing a Series

In [None]:
ames['SalePrice'].sum()

In [None]:
ames['SalePrice'].mean()

In [None]:
ames['SalePrice'].median()

In [None]:
ames['SalePrice'].std()

In [None]:
ames['Neighborhood'].nunique()

In [None]:
ames['Neighborhood'].mode()
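
Unlike `.sum()` or `.mean()`, `.mode()` returns a Series rather than a single value, because several values can tie for most frequent. The sketch below illustrates this on a small made-up Series; the values are toy data, not taken from the Ames dataset.

```python
import pandas as pd

# Toy data, invented for illustration (not from the Ames dataset)
neighborhoods = pd.Series(["NAmes", "OldTown", "NAmes", "OldTown", "Gilbert"])

# .mode() returns every value tied for most frequent, in sorted order
modes = neighborhoods.mode()

# Take the first entry if a single scalar is needed
top = modes.iloc[0]

# .value_counts() shows the frequencies behind the mode
counts = neighborhoods.value_counts()
```

If only one value is most frequent, the result is still a Series of length one, so `.iloc[0]` works in both cases.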

### The `.describe()` Method

In [None]:
ames['SalePrice'].describe()

In [None]:
ames['Neighborhood'].describe()
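
The two calls above return different statistics because `.describe()` adapts to the column's data type. A minimal sketch on a made-up DataFrame (the `price` and `area` columns are hypothetical stand-ins, not Ames columns) shows which labels each kind of summary contains.

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    "price": [100, 200, 300, 400],   # numeric column
    "area": ["A", "B", "A", "A"],    # string (object) column
})

# Numeric column: count, mean, std, min, quartiles, max
num_summary = toy["price"].describe()

# String column: count, unique, top (most frequent value), freq (its count)
cat_summary = toy["area"].describe()
```

Individual statistics can then be pulled out by label, for example `num_summary["50%"]` for the median or `cat_summary["top"]` for the most frequent value.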

### Summarizing a DataFrame

In [None]:
ames[['SalePrice', 'Gr Liv Area']].mean()

In [None]:
ames[['SalePrice', 'Gr Liv Area']].median()

### The `.agg()` Method

In [None]:
ames.agg({'SalePrice': ['mean']})

In [None]:
ames.agg({'SalePrice': ['mean'], 'Gr Liv Area': ['mean']})

In [None]:
ames.agg({'SalePrice': ['mean', 'median'], 'Gr Liv Area': ['mean', 'min']})
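
When different columns request different statistics, as in the last cell, the result has one row per statistic and `NaN` wherever a statistic was not requested for a column. The sketch below reproduces that pattern on made-up data; the column names `price` and `area` are hypothetical.

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({"price": [100, 200, 600], "area": [10, 20, 30]})

# Each column gets its own list of statistics
summary = toy.agg({"price": ["mean", "median"], "area": ["mean", "min"]})

# Rows are the union of requested statistics; unrequested cells are NaN
price_min_missing = pd.isna(summary.loc["min", "price"])
area_median_missing = pd.isna(summary.loc["median", "area"])
```

Looking values up with `.loc[statistic, column]` avoids depending on the row order of the result.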

## Grouped Aggregation

In [None]:
ames.groupby('Neighborhood').agg({'SalePrice': ['mean', 'median']}).head()

In [None]:
# as_index=False keeps Neighborhood as a regular column instead of the index
ames.groupby('Neighborhood', as_index=False).agg({'SalePrice': ['mean', 'median']}).head()

In [None]:
ames.groupby(['Neighborhood', 'Yr Sold'], as_index=False).agg({'SalePrice': 'mean'})
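
Passing a dictionary of lists to `.agg()` after `.groupby()` produces two-level column labels such as `('SalePrice', 'mean')`. One alternative, sketched here on made-up data, is pandas named aggregation, which yields flat column names of your choosing. The names `hood`, `price`, `mean_price`, and `median_price` are hypothetical and used only for this illustration.

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    "hood": ["A", "A", "A", "B", "B"],
    "price": [100, 200, 600, 150, 250],
})

# Named aggregation: new_column=(source_column, function)
by_hood = toy.groupby("hood", as_index=False).agg(
    mean_price=("price", "mean"),
    median_price=("price", "median"),
)
```

The result has the plain columns `hood`, `mean_price`, and `median_price`, which is often easier to sort, filter, or plot than a two-level column index.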

## COVID-19 College Data Aggregation Exercise

In [None]:
data_url = "https://raw.githubusercontent.com/nytimes/covid-19-data/refs/heads/master/colleges/colleges.csv"
college_df = pd.read_csv(data_url)
college_df.head()

In [None]:
# Sample aggregation template
# college_df['cases'].mean()
# college_df[['cases', 'cases_2021']].sum()
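
As one possible starting point for the exercise, the sketch below applies a grouped sum to a small made-up table. It assumes the real data has a `state` column alongside `cases` and `cases_2021`; confirm this against the `college_df.head()` output before reusing the pattern. The rows and numbers are invented and do not reflect actual case counts.

```python
import pandas as pd

# Toy stand-in for college_df; values are invented for illustration
toy_colleges = pd.DataFrame({
    "state": ["Ohio", "Ohio", "Texas"],            # assumed column name
    "college": ["College A", "College B", "College C"],
    "cases": [10, 30, 25],
    "cases_2021": [4, 6, 5],
})

# Total cases per state, largest first
by_state = (
    toy_colleges
    .groupby("state", as_index=False)[["cases", "cases_2021"]]
    .sum()
    .sort_values("cases", ascending=False)
    .reset_index(drop=True)
)
```

Swapping `toy_colleges` for `college_df` should give the same kind of table for the real data, provided the assumed column names match.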