
<br/>

### Aggregates

Pandas provides a `.groupby()` method that splits a DataFrame into groups by the values in one or more columns, so you can compute an aggregate for each group. This is handy for finding sums, counts, and minimum and maximum values.

The example below shows how to use `count()`, `sum()`, `min()`, and `max()`:

In [12]:
import pandas as pd

# read data from the CSV file
data_dir = '../../../data/input/ch1/'
routes_file = data_dir + 'routes.csv'
routes = pd.read_csv(routes_file, header=0)

# get count of routes leaving a src
routes_per_src = routes.groupby('src').src.count()

# get total stops leading to each dest
stops_per_dest = routes.groupby('dest').stops.sum()

# get max stops per src
max_stops_src = routes.groupby('src').stops.max()

# get min stops per src
min_stops_src = routes.groupby('src').stops.min()

print("Routes per src:\n", routes_per_src.head(10))
print("Stops per dest:\n", stops_per_dest.head(10))
print("Max stops per src:\n", max_stops_src.sort_values(ascending=False).head())
print("Min stops per src:\n", min_stops_src.head())

Routes per src:
 src
AAE     9
AAL    20
AAN     2
AAQ     3
AAR     8
AAT     2
AAX     1
AAY     1
ABA     4
ABB     2
Name: src, dtype: int64
Stops per dest:
 dest
AAE    0
AAL    0
AAN    0
AAQ    0
AAR    0
AAT    0
AAX    0
AAY    0
ABA    0
ABB    0
Name: stops, dtype: int64
Max stops per src:
 src
BOS    1
YRT    1
HOU    1
ARN    1
YVR    1
Name: stops, dtype: int64
Min stops per src:
 src
AAE    0
AAL    0
AAN    0
AAQ    0
AAR    0
Name: stops, dtype: int64
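
If `routes.csv` is not at hand, the same four aggregates can be tried on a small in-memory DataFrame. The sketch below is only an illustration: `toy_routes` and its values are made up for this example and are not part of the real dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for routes.csv (not the real dataset)
toy_routes = pd.DataFrame({
    "src":   ["AAA", "AAA", "BBB", "BBB", "BBB"],
    "dest":  ["BBB", "CCC", "AAA", "CCC", "CCC"],
    "stops": [0, 1, 0, 2, 0],
})

# number of routes leaving each src (counts non-null 'stops' values)
routes_per_src = toy_routes.groupby("src")["stops"].count()

# total stops on routes arriving at each dest
stops_per_dest = toy_routes.groupby("dest")["stops"].sum()

# largest and smallest number of stops per src
max_stops_src = toy_routes.groupby("src")["stops"].max()
min_stops_src = toy_routes.groupby("src")["stops"].min()

print(routes_per_src)
print(stops_per_dest)
print(max_stops_src)
print(min_stops_src)
```

Each call returns a Series indexed by the grouping column, which is why the outputs above are labelled with `src` or `dest`.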



<br/>

### Multiple Aggregates Using .agg()

Alternatively, pandas provides the `.agg()` method, which applies several aggregates to a column in a single call. It produces the same results as the separate calls above, much more concisely:


In [6]:
# apply several aggregates to the stops column; the result is a DataFrame with one column per aggregate
grouped = routes.groupby('src').stops.agg([len, sum, min, max])
print(grouped.head(5))


     len  sum  min  max
src                    
AAE    9    0    0    0
AAL   20    0    0    0
AAN    2    0    0    0
AAQ    3    0    0    0
AAR    8    0    0    0
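
A variation worth knowing: `.agg()` also accepts aggregate names as strings, and newer pandas releases may emit a warning when Python builtins such as `sum` are passed instead. The sketch below uses string names and the named-aggregation form to pick the output column names. The `toy_routes` data and the names `n_routes`, `total_stops`, and `max_stops` are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for routes.csv (not the real dataset)
toy_routes = pd.DataFrame({
    "src":   ["AAA", "AAA", "BBB", "BBB", "BBB"],
    "dest":  ["BBB", "CCC", "AAA", "CCC", "CCC"],
    "stops": [0, 1, 0, 2, 0],
})

# string names instead of builtins; columns are named after the aggregates
summary = toy_routes.groupby("src")["stops"].agg(["count", "sum", "min", "max"])
print(summary)

# named aggregation: output_column=(input_column, aggregate)
named = toy_routes.groupby("src").agg(
    n_routes=("stops", "count"),
    total_stops=("stops", "sum"),
    max_stops=("stops", "max"),
)
print(named)
```

Note that `"count"` skips missing values while `len` does not, so the two can differ on columns that contain NaN.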



<br/>

### GroupBy Multiple Columns

You can pass a list of columns to the `groupby()` method to aggregate by multiple columns at the same time. The example below counts the routes for each (src, dest) pair:

In [9]:
# count the rows for each distinct (src, dest) pair
routes_per_pair = routes.groupby(['src', 'dest']).src.count()
routes_per_pair.head(25)

src  dest
AAE  ALG     1
     CDG     1
     IST     1
     LYS     1
     MRS     2
     ORN     1
     ORY     2
AAL  AAR     1
     AGP     1
     ALC     1
     AMS     2
     ARN     1
     BCN     2
     BLL     3
     CPH     2
     IST     1
     LGW     1
     OSL     3
     PMI     1
     SVG     1
AAN  CCJ     1
     PEW     1
AAQ  DME     1
     LED     1
     SVO     1
Name: src, dtype: int64
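
The result of a multi-column `groupby()` is indexed by a `MultiIndex`, one level per grouping column. As a minimal sketch, again on made-up data (`toy_routes` and the column name `n_routes` are assumptions for this example), the code below counts rows per pair with `.size()`, looks up a single pair, and flattens the result back into ordinary columns:

```python
import pandas as pd

# Hypothetical toy data standing in for routes.csv (not the real dataset)
toy_routes = pd.DataFrame({
    "src":   ["AAA", "AAA", "BBB", "BBB", "BBB"],
    "dest":  ["BBB", "CCC", "AAA", "CCC", "CCC"],
    "stops": [0, 1, 0, 2, 0],
})

# number of rows for each (src, dest) pair; the index has two levels
pair_counts = toy_routes.groupby(["src", "dest"]).size()
print(pair_counts)

# look up one pair with a tuple of index values
bbb_to_ccc = pair_counts.loc[("BBB", "CCC")]
print(bbb_to_ccc)

# turn the index levels back into regular columns
flat = pair_counts.reset_index(name="n_routes")
print(flat)
```

Flattening with `reset_index()` is convenient when the counts need to be filtered, sorted, or merged like any other DataFrame.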