# Grouped Computations for DataFrames

In [1]:
import pandas as pd
gapminder = pd.read_csv('data/gapminder.csv')
# Create `gapminder_numeric`, a copy of gapminder that contains only the numeric columns
gapminder_numeric = gapminder.select_dtypes(include='number').copy()

### The `.groupby()` method

So far we have seen how to compute statistical summaries across an entire column. Sometimes, though, we want to compute a summary separately for each of several groups, where the groups are defined by, for example, the unique values in a column.

The code below uses the `.groupby()` method to compute the mean of each column separately for each `year` value:

In [2]:
# Apply the .mean() method to gapminder_numeric, but group by the 'year' column.
gapminder_numeric.groupby('year').mean()

year,lifeExp,pop,gdpPercap
1952,49.05762,16950400.0,3725.276046
1957,51.507401,18763410.0,4299.408345
1962,53.609249,20421010.0,4725.812342
1967,55.67829,22658300.0,5483.653047
1972,57.647386,25189980.0,6770.082815
1977,59.570157,27676380.0,7313.166421
1982,61.533197,30207300.0,7518.901673
1987,63.212613,33038570.0,7900.920218
1992,64.160338,35990920.0,8158.608521
1997,65.014676,38839470.0,9090.175363
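
To make the grouping step more concrete, here is a minimal sketch on a small made-up DataFrame (the `toy` data and its values are hypothetical, not taken from gapminder). It compares `.groupby('year').mean()` with the same means computed by hand, one year at a time:

```python
import pandas as pd

# Hypothetical toy data (NOT real gapminder values), small enough to check by hand
toy = pd.DataFrame({
    'year': [1952, 1952, 2007, 2007],
    'lifeExp': [40.0, 60.0, 70.0, 80.0],
    'pop': [100, 300, 200, 400],
})

# One-step version: group the rows by 'year' and average every remaining column
grouped_means = toy.groupby('year').mean()

# Manual version: filter to each year in turn and compute the column means ourselves
manual_means = {
    int(year): toy.loc[toy['year'] == year, ['lifeExp', 'pop']].mean()
    for year in toy['year'].unique()
}

print(grouped_means)
```

Both approaches should agree. The grouped version simply does the filter-and-summarize step for every unique `year` value at once and returns the results as a single DataFrame indexed by `year`.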


### Grouping by multiple columns

We can group by multiple columns at once by passing a *list* of column names to the `.groupby()` method. If we choose, we can then extract a single column and compute its mean separately within each combination of the grouping values. For example, the code below computes the mean `lifeExp` value for each year-continent combination:

In [3]:
# group by the year and continent columns and compute the mean of the lifeExp column
gapminder.groupby(['year', 'continent'])['lifeExp'].mean()

year  continent
1952  Africa       39.135500
      Americas     53.279840
      Asia         46.314394
      Europe       64.408500
      Oceania      69.255000
1957  Africa       41.266346
      Americas     55.960280
      Asia         49.318544
      Europe       66.703067
      Oceania      70.295000
1962  Africa       43.319442
      Americas     58.398760
      Asia         51.563223
      Europe       68.539233
      Oceania      71.085000
1967  Africa       45.334538
      Americas     60.410920
      Asia         54.663640
      Europe       69.737600
      Oceania      71.310000
1972  Africa       47.450942
      Americas     62.394920
      Asia         57.319269
      Europe       70.775033
      Oceania      71.910000
1977  Africa       49.580423
      Americas     64.391560
      Asia         59.610556
      Europe       71.937767
      Oceania      72.855000
1982  Africa       51.592865
      Americas     66.228840
      Asia         62.617939
      Europe       72.80640
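
The output above is a Series whose index has two levels, `year` and `continent`. As a rough sketch of how such a result can be used, the example below repeats the computation on a small made-up DataFrame (the `toy` data is hypothetical), looks up one year-continent combination, and converts the result back into an ordinary table:

```python
import pandas as pd

# Hypothetical toy data (NOT real gapminder values)
toy = pd.DataFrame({
    'year': [1952, 1952, 1952, 2007, 2007, 2007],
    'continent': ['Africa', 'Africa', 'Asia', 'Africa', 'Asia', 'Asia'],
    'lifeExp': [38.0, 42.0, 50.0, 55.0, 70.0, 74.0],
})

# Group by both columns, then average lifeExp within each (year, continent) pair
mean_life = toy.groupby(['year', 'continent'])['lifeExp'].mean()

# The result is indexed by (year, continent) pairs, so a tuple selects one value
africa_1952 = mean_life.loc[(1952, 'Africa')]

# reset_index() turns the two index levels back into ordinary columns
mean_life_table = mean_life.reset_index()

print(mean_life_table)
```

Whether to keep the two-level index or call `.reset_index()` is a matter of convenience: the indexed form is handy for lookups, while the flat table is often easier to filter or plot.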

### Exercise

1. Compute the maximum population for each country.

2. Compute the mean `gdpPercap` for each continent, averaged across all years after 1990.

In [4]:
# compute the maximum population for each country
gapminder.groupby('country')['pop'].max()

country
Afghanistan           31889923
Albania                3600523
Algeria               33333216
Angola                12420476
Argentina             40301927
                        ...   
Vietnam               85262356
West Bank and Gaza     4018332
Yemen, Rep.           22211743
Zambia                11746035
Zimbabwe              12311143
Name: pop, Length: 142, dtype: int64

In [5]:
# compute the mean gdpPercap for each continent averaged across all years after 1990
gapminder.query('year > 1990').groupby('continent')['gdpPercap'].mean()

continent
Africa       2587.246913
Americas     9306.236000
Asia        10280.225202
Europe      20726.140986
Oceania     25416.796842
Name: gdpPercap, dtype: float64
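
In the second solution, the rows are filtered *before* they are grouped, so only years after 1990 contribute to each continent's mean. As a small sketch on made-up data (the `toy` DataFrame is hypothetical), the example below shows that filtering with `.query()` and filtering with a boolean mask give the same grouped result:

```python
import pandas as pd

# Hypothetical toy data (NOT real gapminder values)
toy = pd.DataFrame({
    'year': [1987, 1992, 2007, 1987, 1992, 2007],
    'continent': ['Africa', 'Africa', 'Africa', 'Europe', 'Europe', 'Europe'],
    'gdpPercap': [1000.0, 2000.0, 4000.0, 10000.0, 20000.0, 30000.0],
})

# Filter with .query(), then group and average
via_query = toy.query('year > 1990').groupby('continent')['gdpPercap'].mean()

# Same filter written as a boolean mask, then group and average
via_mask = toy[toy['year'] > 1990].groupby('continent')['gdpPercap'].mean()

print(via_query)
```

The 1987 rows are dropped in both versions, so each continent's mean is based only on its 1992 and 2007 rows.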