### Introduction
Maps allow us to transform data in a DataFrame or Series one value at a time for an entire column. However, we often want to group our data and then do something specific to each group. We do this with the `groupby()` operation.

In [11]:
import pandas as pd

In [12]:
wine_store = pd.read_csv('../data/wine_catalog/wine_store_dataset.csv')

In [13]:
wine_store = wine_store.drop(columns=['Unnamed: 0'])

In [14]:
wine_store.head()

,country,designation,points,price,province,region_1,region_2,variety,winery,last_year_points
0,US,Martha's Vineyard,96.0,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz,94
1,Spain,Carodorum Selección Especial Reserva,96.0,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez,92
2,US,Special Selected Late Harvest,96.0,90.0,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley,100
3,US,Reserve,96.0,65.0,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi,94
4,France,La Brûlade,95.0,66.0,Provence,Bandol,,Provence red blend,Domaine de la Bégude,94


### Groupwise analysis
One function we've been using heavily thus far is `value_counts()`. We can replicate what it does with `groupby()`.

A `groupby()` operation involves some combination of splitting the object, applying a function, and combining the results.  
It can be used to group large amounts of data and compute operations on those groups.

In [5]:
wine_store.groupby('points').points.count()  # count how many rows have each `points` value

points
80.0       838
81.0      1407
82.0      3893
83.0      5756
84.0     10304
85.0     11912
86.0     14925
87.0     19944
88.0     17100
89.0     12248
90.0     15168
91.0      9929
92.0      8746
93.0      5720
94.0      3272
95.0      1612
96.0       655
97.0       342
98.0       127
99.0        43
100.0       24
Name: points, dtype: int64

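`groupby()` collected the rows that share the same point value, and `count()` then tallied the `points` entries in each group. `value_counts()` is essentially a shortcut for this `groupby()` operation, although by default it sorts the result by count rather than by group key.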
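
To make the relationship between the two calls concrete, here is a minimal sketch on a hand-made DataFrame. The name `toy` and its values are assumptions for illustration only and are not taken from the wine dataset.

```python
import pandas as pd

# Hypothetical toy data, not taken from the wine dataset.
toy = pd.DataFrame({"points": [88, 90, 88, 95, 90, 88]})

# Count rows per group; the result is ordered by the group key.
by_group = toy.groupby("points").points.count()

# value_counts() orders by count (descending) by default,
# so sort by index before comparing with the groupby result.
by_value = toy.points.value_counts().sort_index()

same_counts = by_group.tolist() == by_value.tolist()
print(by_group)
print(same_counts)
```

Once both results are put in the same order, the counts should match.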

Other summary functions work on grouped data in the same way. For example, to get the price of the most expensive wine in each point value category, we can do the following:

In [6]:
wine_store.groupby('points').price.max()

points
80.0       80.0
81.0       95.0
82.0      150.0
83.0      225.0
84.0      225.0
85.0      320.0
86.0      495.0
87.0      325.0
88.0      325.0
89.0      500.0
90.0      535.0
91.0     2013.0
92.0      670.0
93.0      770.0
94.0     1000.0
95.0      850.0
96.0     1300.0
97.0     1100.0
98.0     1900.0
99.0     2300.0
100.0    1400.0
Name: price, dtype: float64
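
The split, apply, and combine steps behind a grouped maximum can also be spelled out by hand. The sketch below uses a small made-up DataFrame (`toy` and its values are assumptions, not rows from the wine data) and is meant only to show what the one-line `groupby()` expression does conceptually, not how pandas implements it internally.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine table.
toy = pd.DataFrame({
    "points": [88, 90, 88, 95, 90],
    "price": [20.0, 35.0, 25.0, 120.0, 30.0],
})

# Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs.
# Apply: take the maximum price within each sub-DataFrame.
pieces = {key: group.price.max() for key, group in toy.groupby("points")}

# Combine: assemble the per-group results into one Series.
manual = pd.Series(pieces, name="price")

# The one-line groupby expression computes the same values.
direct = toy.groupby("points").price.max()
print(manual.tolist() == direct.tolist())
```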

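You can think of each group as a slice of the DataFrame containing only the rows whose values match. Each slice is accessible to us directly through the `apply()` method, and we can manipulate its data in any way we see fit. For example, here is one way to select the price of the first wine listed for each winery: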

In [9]:
wine_store.groupby('winery').apply(lambda df: df.price.iloc[0])

winery
 Rondinella          40.0
'37 Cellars           NaN
1+1=3                18.0
10 Knots             35.0
1000 Stories         19.0
                     ... 
Ñandú                17.0
Único Luis Miguel    15.0
àMaurice             35.0
áster                26.0
Štoka                23.0
Length: 14582, dtype: float64
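
To see what the function passed to `apply()` actually receives, it can help to pull out a single group. The following is a minimal sketch on made-up data (`toy`, the winery names, and the prices are illustrative assumptions). It selects the `price` column before calling `apply()`, so the function receives one Series per group rather than a whole sub-DataFrame.

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative only.
toy = pd.DataFrame({
    "winery": ["Heitz", "Ponzi", "Heitz", "Ponzi"],
    "price": [235.0, 65.0, 90.0, 40.0],
})

grouped = toy.groupby("winery")

# get_group() returns the slice of rows that belongs to one group.
heitz = grouped.get_group("Heitz")
print(heitz)

# Selecting the column first means the lambda sees a Series per group.
first_price = grouped.price.apply(lambda s: s.iloc[0])
print(first_price)
```

Unlike `iloc[0]`, the built-in `first()` aggregation skips missing values, so the two can differ when a group's first price is `NaN`.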

In [8]:
wine_store.country.iloc[0]

'US'

For more fine-grained control, you can also group by more than one column. For example, here is how to pick out the highest-scoring wine for each country and province:

In [24]:
wine_store.groupby(['country', 'province']).apply(lambda df: df.loc[df.points.idxmax()])
# x.to_csv('../data/wine_catalog/wine_store_country_winery.csv')

country,province,country,designation,points,price,province,region_1,region_2,variety,winery,last_year_points
Albania,Mirditë,Albania,,88.0,20.0,Mirditë,,,Kallmet,Arbëri,82
Argentina,Mendoza Province,Argentina,Nicasia Vineyard,97.0,120.0,Mendoza Province,Mendoza,,Malbec,Bodega Catena Zapata,92
Argentina,Other,Argentina,Reserva,95.0,90.0,Other,Salta,,Malbec,Colomé,92
Australia,Australia Other,Australia,Yattarna,92.0,65.0,Australia Other,South Eastern Australia,,Chardonnay,Penfolds,93
Australia,New South Wales,Australia,Noble One Botrytis,93.0,32.0,New South Wales,New South Wales,,Sémillon,De Bortoli,98
...,...,...,...,...,...,...,...,...,...,...,...
Uruguay,Juanico,Uruguay,Preludio Barrel Select Lote N 77,90.0,45.0,Juanico,,,Red Blend,Familia Deicas,100
Uruguay,Montevideo,Uruguay,Monte Vide Eu Tannat-Merlot-Tempranillo,90.0,57.0,Montevideo,,,Red Blend,Bouza,100
Uruguay,Progreso,Uruguay,RPF,89.0,23.0,Progreso,,,Tannat,Pisano,81
Uruguay,San Jose,Uruguay,,84.0,19.0,San Jose,,,Tannat-Syrah,Tanterra,93
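
An alternative way to pick the top row per group, sketched here as an option rather than as the approach used in this notebook, is to compute the winning index labels with `idxmax()` on the grouped column and then select those rows from the original frame. The DataFrame `toy` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the winery names are placeholders.
toy = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France"],
    "province": ["California", "California", "Oregon", "Provence", "Provence"],
    "points": [96, 90, 94, 95, 91],
    "winery": ["A", "B", "C", "D", "E"],
})

# Index label of the top-scoring row within each (country, province) group.
best_idx = toy.groupby(["country", "province"]).points.idxmax()

# Use those labels to pull the full rows from the original frame.
best = toy.loc[best_idx]
print(best)
```

The result keeps the original row index instead of a (country, province) MultiIndex, which may or may not be what you want.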


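Another `groupby()` method worth mentioning is `agg()`, which lets you run several different functions on each group at once. For example, we can generate a simple summary of prices per country: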

In [26]:
wine_store.groupby(['country']).price.agg([len, min, max]).head()

country,len,min,max
Albania,2.0,20.0,20.0
Argentina,4978.0,4.0,250.0
Australia,4495.0,5.0,850.0
Austria,2933.0,8.0,208.0
Bosnia and Herzegovina,4.0,12.0,13.0
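
When the output columns need descriptive names, pandas also supports named aggregation. The sketch below assumes a small made-up DataFrame, and the output column names (`n_wines`, `min_price`, `max_price`) are hypothetical labels chosen for this example.

```python
import pandas as pd

# Hypothetical toy data; the prices are illustrative only.
toy = pd.DataFrame({
    "country": ["US", "US", "France", "France", "France"],
    "price": [235.0, 65.0, 66.0, 20.0, 45.0],
})

# Named aggregation: each keyword becomes an output column,
# and each value is a (source column, function name) pair.
summary = toy.groupby("country").agg(
    n_wines=("price", "size"),
    min_price=("price", "min"),
    max_price=("price", "max"),
)
print(summary)
```

Like `len`, `"size"` counts every row in the group, including rows with a missing price; `"count"` would count only the non-missing values.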


`groupby()` lets you do many powerful things with your dataset.