# Summarizing DataFrames

This notebook demonstrates how to compute simple statistical summaries for the columns of a pandas DataFrame: the mean, sum, median, and standard deviation for numeric columns, and counts for categorical columns.

In [None]:
import pandas as pd
gapminder = pd.read_csv('data/gapminder.csv')
gapminder

To apply a statistical summary such as the mean to a single column from a DataFrame, you first need to extract the column of interest, e.g., using the `df['col']` syntax, and then apply the relevant method (e.g., `.mean()`) to the resulting Series object.
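As a minimal sketch of this two-step pattern, the example below uses a small hypothetical DataFrame named `toy` (made-up values, not the real gapminder data) so that it runs without the CSV file:

```python
import pandas as pd

# Hypothetical stand-in for gapminder; the values are made up for illustration
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "lifeExp": [60.0, 62.0, 70.0, 72.0],
})

life_exp = toy["lifeExp"]        # step 1: single brackets extract one column as a Series
mean_life_exp = life_exp.mean()  # step 2: apply the method to that Series
print(mean_life_exp)
```

The same pattern should carry over to `gapminder` by swapping in the real DataFrame and column name.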



### Calculating the mean


In [None]:
# Use the .mean() method to compute the mean of the lifeExp column


Note that `.mean()` is a *method*, called on an object as in `series.mean()`, rather than a standalone function called as `mean(series)`.
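The difference can be sketched as follows, assuming nothing named `mean` has been imported into the namespace (for example via `from numpy import mean`). The Series here holds made-up values:

```python
import pandas as pd

life_exp = pd.Series([60.0, 62.0, 70.0, 72.0], name="lifeExp")  # made-up values

# Method call: .mean() is looked up on the Series object itself
method_result = life_exp.mean()

# Function-style call: Python has no built-in called `mean`,
# so this raises NameError unless such a name was imported elsewhere
try:
    mean(life_exp)
    function_call_failed = False
except NameError as err:
    function_call_failed = True
    print("NameError:", err)
```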

In [None]:
# Try to use mean() as a function to compute the mean of lifeExp


You can apply `.mean()` to multiple columns at once by extracting the relevant columns (e.g., using `df[['col1', 'col2']]` or `df.loc[:, ['col1', 'col2']]`), and then applying `.mean()` to the resulting DataFrame:
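A minimal sketch of both selection styles, again on a hypothetical `toy` DataFrame with made-up values:

```python
import pandas as pd

# Hypothetical stand-in for gapminder; the values are made up for illustration
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "lifeExp": [60.0, 62.0, 70.0, 72.0],
    "gdpPercap": [1000.0, 1200.0, 20000.0, 22000.0],
})

# Double brackets: a list of column names selects a sub-DataFrame
means = toy[["lifeExp", "gdpPercap"]].mean()

# Equivalent selection with .loc: all rows (:), the listed columns
same_means = toy.loc[:, ["lifeExp", "gdpPercap"]].mean()

print(means)  # one mean per selected column
```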

In [None]:
# use the .mean() method to compute the mean of both the lifeExp and gdpPercap columns simultaneously


What is the type of the object returned by `.mean()`?

In [None]:
# Check the type of the result above


Since `.mean()` can be applied to multiple columns at once, why doesn't applying `.mean()` directly to `gapminder` work?

In [None]:
# what happens when you try to apply the .mean() method to the entire gapminder DataFrame?


In [None]:
# show gapminder to try to figure out why the above command failed


### Extracting columns of a particular type

You can extract all of the numeric columns of gapminder using the `.select_dtypes()` method, and then apply the `.mean()` method to the resulting DataFrame:
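As a hedged sketch on a hypothetical `toy` DataFrame (made-up values), `.select_dtypes(include="number")` keeps the integer and float columns and drops the string column. The `numeric_only=True` argument to `.mean()` is shown as an alternative that gives the same result here; it is not part of the approach described above:

```python
import pandas as pd

# Hypothetical stand-in for gapminder; the values are made up for illustration
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [1992, 1997, 1992, 1997],
    "lifeExp": [60.0, 62.0, 70.0, 72.0],
})

numeric = toy.select_dtypes(include="number")  # keeps year and lifeExp only
numeric_means = numeric.mean()

# Alternative: let .mean() skip the non-numeric columns itself
alt_means = toy.mean(numeric_only=True)

print(numeric_means)
```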

In [None]:
# apply the .mean() method to the gapminder DataFrame
# but only to the columns of type 'number' extracted using .select_dtypes()



### Other statistical summaries: sum, median, std

The sum:

In [None]:
# apply the .sum() method to the gapminder DataFrame, but only to the columns of type 'number'


The median:

In [None]:
# apply the .median() method to the gapminder DataFrame, but only to the columns of type 'number'


The standard deviation (by default, pandas computes the sample standard deviation, i.e., `ddof=1`):

In [None]:
# apply the .std() method to the gapminder DataFrame, but only to the columns of type 'number'


### Counting the occurrences of each categorical value with the `.value_counts()` method

While the summaries above are only meaningful for numeric (float or integer) columns, there are some summaries that you can use for categorical columns too.

The `.value_counts()` method will compute the number of times each unique value appears in a column.
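For illustration, here is a minimal sketch on a made-up Series (not the real gapminder column). The result is a Series whose index holds the unique values and whose values are the counts, sorted from most to least frequent by default:

```python
import pandas as pd

# Made-up categorical values for illustration
continent = pd.Series(["Asia", "Asia", "Europe", "Asia", "Africa"], name="continent")

counts = continent.value_counts()  # unique values become the index
print(counts)
```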

In [None]:
# apply the .value_counts() method to the country column of the gapminder DataFrame


And the `continent` column:

In [None]:
# apply the .value_counts() method to the continent column of the gapminder DataFrame


## Exercise

Compute the average life expectancy for all countries in Asia in the year 1992.

## Standardizing a DataFrame

Let's create a version of gapminder that just contains the numeric columns:

In [None]:
# create gapminder_numeric, a subset of gapminder that contains only the columns of type 'number'


Note that pandas performs arithmetic between a DataFrame and a Series column-wise, matching each column to the Series entry with the same label, so standardization can be done by subtracting the column means and dividing by the column standard deviations:
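A minimal sketch of this calculation, using a hypothetical numeric DataFrame with made-up values. Both `.mean()` and `.std()` return a Series indexed by column name, which pandas aligns with the DataFrame's columns during the arithmetic:

```python
import pandas as pd

# Hypothetical numeric-only DataFrame; the values are made up for illustration
numeric = pd.DataFrame({
    "lifeExp": [60.0, 62.0, 70.0, 72.0],
    "gdpPercap": [1000.0, 1200.0, 20000.0, 22000.0],
})

# Each column has its own mean subtracted and is divided by its own std
standardized = (numeric - numeric.mean()) / numeric.std()

print(standardized.mean())  # approximately 0 for every column
print(standardized.std())   # approximately 1 for every column
```

Because `.std()` uses the sample standard deviation by default, the standardized columns have a sample standard deviation of 1, up to floating-point error.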

In [None]:
# create gapminder_std, which contains the standardized values of gapminder_numeric

# look at gapminder_std