## Grouped Aggregation

### Review

In [1]:
import pandas as pd

ames = pd.read_csv("data/ames_raw.csv")
ames.columns = ames.columns.str.lower().str.replace(" ", "_")
ames.head()

,order,pid,ms_subclass,ms_zoning,lot_frontage,lot_area,street,alley,lot_shape,land_contour,...,pool_area,pool_qc,fence,misc_feature,misc_val,mo_sold,yr_sold,sale_type,sale_condition,saleprice
0,1,526301100,20,RL,141.0,31770,Pave,,IR1,Lvl,...,0,,,,0,5,2010,WD,Normal,215000
1,2,526350040,20,RH,80.0,11622,Pave,,Reg,Lvl,...,0,,MnPrv,,0,6,2010,WD,Normal,105000
2,3,526351010,20,RL,81.0,14267,Pave,,IR1,Lvl,...,0,,,Gar2,12500,6,2010,WD,Normal,172000
3,4,526353030,20,RL,93.0,11160,Pave,,Reg,Lvl,...,0,,,,0,4,2010,WD,Normal,244000
4,5,527105010,60,RL,74.0,13830,Pave,,IR1,Lvl,...,0,,MnPrv,,0,3,2010,WD,Normal,189900


In [2]:
ames['saleprice'].mean()

180796.0600682594

In [3]:
ames[['saleprice', 'gr_liv_area']].mean()

saleprice      180796.060068
gr_liv_area      1499.690444
dtype: float64

In [4]:
ames.agg({
    'saleprice': ['mean', 'median', 'max'],
    'gr_liv_area': ['mean', 'median', 'min']
})

,saleprice,gr_liv_area
mean,180796.060068,1499.690444
median,160000.0,1442.0
max,755000.0,
min,,334.0
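
The blank cells in the output above are missing values (`NaN`). Each column was given a different list of functions, so pandas builds one row per function name and leaves a cell empty wherever that function was not requested for that column. A minimal sketch of the same behavior, using a small made-up DataFrame (the values are illustrative and not taken from the Ames data):

```python
import pandas as pd

# Toy data: hypothetical values, not from the Ames dataset
toy = pd.DataFrame({
    "saleprice": [100000, 150000, 200000],
    "gr_liv_area": [900, 1200, 1500],
})

# 'max' is requested only for saleprice, 'min' only for gr_liv_area
summary = toy.agg({
    "saleprice": ["mean", "max"],
    "gr_liv_area": ["mean", "min"],
})

# The unrequested combinations come back as NaN
print(summary)
print(summary.isna())
```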


In [12]:
ames.groupby('neighborhood', as_index=False).agg({'saleprice': ['min', 'max'], 'gr_liv_area': 'mean'})

,neighborhood,saleprice,saleprice,gr_liv_area
,,min,max,mean
0,Blmngtn,156820,264561,1404.892857
1,Blueste,115000,200000,1159.7
2,BrDale,83000,125500,1115.233333
3,BrkSide,39300,223500,1234.907407
4,ClearCr,107500,328000,1744.386364
5,CollgCr,110000,475000,1496.11985
6,Crawfor,90350,392500,1722.796117
7,Edwards,35000,415000,1337.737113
8,Gilbert,115000,377500,1620.89697
9,Greens,155000,214000,1157.25
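
Passing a dictionary with a list of functions produces a two-level column header, as in the output above. If flat column names are easier to work with, one option is pandas' named aggregation, where each keyword argument names an output column and maps it to a `(column, function)` pair. A hedged sketch on a small made-up DataFrame (the values and the output names `min_price`, `max_price`, and `mean_area` are chosen here for illustration):

```python
import pandas as pd

# Toy data: hypothetical values, not from the Ames dataset
toy = pd.DataFrame({
    "neighborhood": ["A", "A", "B"],
    "saleprice": [100000, 150000, 200000],
    "gr_liv_area": [900, 1100, 1500],
})

# Named aggregation: output_name=(input_column, function)
flat = toy.groupby("neighborhood", as_index=False).agg(
    min_price=("saleprice", "min"),
    max_price=("saleprice", "max"),
    mean_area=("gr_liv_area", "mean"),
)

# Columns are a single level: neighborhood, min_price, max_price, mean_area
print(flat)
```

The same pattern should carry over to `ames` by swapping in the real DataFrame.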


### Knowledge check

1. How would you express the following question as a grouped aggregation: _“What is the average above-ground square footage of homes by neighborhood and bedroom count?”_

2. Compute the answer to that question (variable hints: `gr_liv_area` = above-ground square footage, `neighborhood` = neighborhood, `bedroom_abvgr` = bedroom count).

3. Using the results from #2, find which neighborhoods have 1-bedroom homes that average more than 1,500 above-ground square feet.

In [15]:
results = ames.groupby(['neighborhood', 'bedroom_abvgr'], as_index=False).agg({'gr_liv_area': 'mean'})
results.head()

,neighborhood,bedroom_abvgr,gr_liv_area
0,Blmngtn,1,1546.666667
1,Blmngtn,2,1387.88
2,Blueste,1,1027.0
3,Blueste,2,1141.0
4,Blueste,3,1556.0


In [16]:
cond_a = results['bedroom_abvgr'] == 1
cond_b = results['gr_liv_area'] > 1500

results[cond_a & cond_b]

,neighborhood,bedroom_abvgr,gr_liv_area
0,Blmngtn,1,1546.666667
23,Crawfor,1,1801.0
39,GrnHill,1,1502.0
75,NridgHt,1,1574.0
100,SawyerW,1,1542.0
111,StoneBr,1,1838.875
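
The two boolean masks can also be written as a single `DataFrame.query` expression, which some readers find easier to scan. A small sketch, assuming a `results` frame shaped like the one above; to stay self-contained it rebuilds only the five rows shown by `results.head()`:

```python
import pandas as pd

# First five rows of the grouped results shown earlier
results = pd.DataFrame({
    "neighborhood": ["Blmngtn", "Blmngtn", "Blueste", "Blueste", "Blueste"],
    "bedroom_abvgr": [1, 2, 1, 2, 3],
    "gr_liv_area": [1546.666667, 1387.88, 1027.0, 1141.0, 1556.0],
})

# Equivalent to results[cond_a & cond_b]
big_one_bed = results.query("bedroom_abvgr == 1 and gr_liv_area > 1500")
print(big_one_bed)
```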
