# Grouping and Sorting
* Sometimes we want to split our data into groups and compute something for each group.
* To do so, we use **groupby()**.

In [3]:
# Import the data we will work on.
import pandas as pd
# Forward slashes avoid the invalid '\w' escape sequence and work on every OS.
reviews = pd.read_csv('winemag_data/winemag-data-130k-v2.csv')
reviews.head(5)

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


## Grouping And Aggregation
* **groupby()** takes a column name (or a list of column names) as an argument and returns a DataFrameGroupBy object.
* After grouping, we usually apply an aggregation function such as mean() or count() to get one summary value per group.
* groupby() is most useful on a column with repeated values. If every value in the column is unique, each group contains a single row, so aggregating tells us nothing new.
* Aggregations such as mean() only make sense for numeric columns. Older pandas versions silently dropped columns that could not be aggregated (such as strings); since pandas 2.0 we need to pass numeric_only=True, otherwise a TypeError is raised.
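The points above can be sketched on a tiny, made-up DataFrame (the `toy` data and its values are assumptions for illustration, not rows from the wine dataset):

```python
import pandas as pd

# Toy data standing in for the wine reviews (values are made up).
toy = pd.DataFrame({
    "country": ["Italy", "Italy", "US", "US", "US"],
    "points": [87, 90, 87, 88, 92],
    "price": [15.0, 30.0, 14.0, None, 65.0],
})

# groupby() alone computes nothing; it returns a DataFrameGroupBy object.
grouped = toy.groupby("country")
print(type(grouped).__name__)

# An aggregation collapses each group to one row.
# numeric_only=True restricts the mean to numeric columns.
mean_by_country = grouped.mean(numeric_only=True)
print(mean_by_country)
```

Missing prices are skipped by mean(), so the US price average here is computed from two values only.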

In [31]:
# help(reviews.groupby)
reviews.points.mean()
# numeric_only=True skips the string columns (required since pandas 2.0)
reviews.groupby('points').mean(numeric_only=True)



points,Unnamed: 0,price
80,75196.856423,16.372152
81,70736.263006,17.182353
82,67122.309368,18.870767
83,68088.008926,18.237353
84,63307.477469,19.310215
85,62419.718783,19.949562
86,65240.739127,22.133759
87,62941.281817,24.901884
88,66309.540885,28.687523
89,66030.442172,32.16964


In [34]:
# count() returns the number of non-null price values in each group; use size() to count all rows.
reviews.groupby('points').price.count()

points
80       395
81       680
82      1772
83      2886
84      6099
85      8902
86     11745
87     15767
88     16014
89     11324
90     14361
91     10564
92      8871
93      5935
94      3449
95      1406
96       482
97       207
98        69
99        28
100       19
Name: price, dtype: int64
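The difference between count() and size() matters here because some wines have no price. A minimal sketch on made-up data (the `toy` frame is an assumption for illustration):

```python
import pandas as pd

# Two of the five toy rows have a missing price.
toy = pd.DataFrame({
    "points": [87, 87, 88, 88, 88],
    "price": [15.0, None, 14.0, None, 65.0],
})

# count() skips missing values in the selected column.
non_null_prices = toy.groupby("points").price.count()

# size() counts every row in the group, missing values included.
rows_per_group = toy.groupby("points").size()

print(non_null_prices)
print(rows_per_group)
```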

## Agg function
* Instead of applying a single aggregation function, we can apply several at once with the **agg()** function.
* It takes a map (dictionary) where
  * key: the name of the column to aggregate.
  * value: a list of aggregation functions to apply to that column.
* It returns a DataFrame with one column per (column, function) pair; the column labels form a two-level MultiIndex.
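To see what agg() actually returns, here is a small sketch on made-up data (the `toy` frame and variable names are assumptions for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["Italy", "Italy", "US", "US"],
    "points": [87, 90, 85, 92],
})

summary = toy.groupby("country").agg({"points": ["min", "max"]})

# The column labels are (column, function) pairs.
print(summary.columns.tolist())

# Select one aggregated column with a tuple ...
lowest = summary[("points", "min")]
# ... or in two steps: first the outer level, then the inner one.
highest = summary["points"]["max"]
```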

In [43]:
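# Function names are passed as strings; recent pandas versions warn about the builtins min and max.
reviews.groupby('country').agg({'points': ['min', 'max']})

# This is how we can select a single aggregated column:
# reviews.groupby('country').agg({'points': ['min', 'max']}).points['min']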


,points,points
country,min,max
Argentina,80,97
Armenia,87,88
Australia,80,100
Austria,82,98
Bosnia and Herzegovina,85,88
Brazil,80,89
Bulgaria,80,91
Canada,82,94
Chile,80,95
China,89,89


In [36]:
reviews.groupby('points').agg({'price': ['min', 'max', 'mean', 'count']})


,price,price,price,price
points,min,max,mean,count
80,5.0,69.0,16.372152,395
81,5.0,130.0,17.182353,680
82,4.0,150.0,18.870767,1772
83,4.0,225.0,18.237353,2886
84,4.0,225.0,19.310215,6099
85,4.0,320.0,19.949562,8902
86,4.0,170.0,22.133759,11745
87,5.0,800.0,24.901884,15767
88,6.0,3300.0,28.687523,16014
89,7.0,500.0,32.16964,11324


### Aggregate on multiple columns
* We can aggregate several columns at once by adding one key per column to the map passed to agg().
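A minimal sketch of aggregating two columns with one map, on made-up data. Flattening the two-level column labels at the end is an optional extra, not something the notes require:

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["Italy", "Italy", "US", "US"],
    "points": [87, 90, 85, 92],
    "price": [15.0, 30.0, 14.0, 65.0],
})

# One dictionary, one key per column to aggregate.
stats = toy.groupby("country").agg({
    "points": ["min", "max"],
    "price": ["mean", "count"],
})

# Optionally flatten the (column, function) labels into plain strings.
stats.columns = ["_".join(pair) for pair in stats.columns]
print(stats)
```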


In [45]:
reviews.groupby('country').agg({'points': ['min', 'max'], 'price': ['min', 'max', 'mean', 'count']})


,points,points,price,price,price,price
country,min,max,min,max,mean,count
Argentina,80,97,4.0,230.0,24.510117,3756
Armenia,87,88,14.0,15.0,14.5,2
Australia,80,100,5.0,850.0,35.437663,2294
Austria,82,98,7.0,1100.0,30.762772,2799
Bosnia and Herzegovina,85,88,12.0,13.0,12.5,2
Brazil,80,89,10.0,60.0,23.765957,47
Bulgaria,80,91,8.0,100.0,14.64539,141
Canada,82,94,12.0,120.0,35.712598,254
Chile,80,95,5.0,400.0,20.786458,4416
China,89,89,18.0,18.0,18.0,1


## Group by multiple columns
* We can group by multiple columns by passing a list of column names to groupby().

> Remember that the aggregated result is a DataFrame, so we can work with it like any other DataFrame.
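As a rough illustration on made-up data (the `toy` frame is an assumption), grouping by two columns gives a result whose rows are indexed by both keys, and reset_index() turns those keys back into ordinary columns:

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["Italy", "Italy", "US", "US", "US"],
    "points": [87, 87, 87, 88, 88],
    "price": [15.0, 25.0, 14.0, 20.0, 40.0],
})

by_pair = toy.groupby(["country", "points"]).agg({"price": ["mean", "count"]})

# Rows are labelled by (country, points) pairs, columns by (column, function) pairs.
italy_87 = by_pair.loc[("Italy", 87), ("price", "mean")]

# reset_index() moves the group keys back into regular columns.
flat = by_pair.reset_index()
print(flat)
```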

In [52]:
reviews.groupby(['points', 'price']).agg({'country' : ['count'], 'price': ['min', 'max', 'mean', 'count']})

,,country,price,price,price,price
points,price,count,min,max,mean,count
80,5.0,2,5.0,5.0,5.0,2
80,6.0,3,6.0,6.0,6.0,3
80,7.0,10,7.0,7.0,7.0,10
80,8.0,23,8.0,8.0,8.0,23
80,9.0,20,9.0,9.0,9.0,20
...,...,...,...,...,...,...
100,550.0,1,550.0,550.0,550.0,1
100,617.0,1,617.0,617.0,617.0,1
100,650.0,1,650.0,650.0,650.0,1
100,848.0,1,848.0,848.0,848.0,1


In [53]:
reviews.groupby(['points', 'price']).describe()

,,Unnamed: 0,Unnamed: 0,Unnamed: 0,Unnamed: 0,Unnamed: 0,Unnamed: 0,Unnamed: 0,Unnamed: 0
points,price,count,mean,std,min,25%,50%,75%,max
80,5.0,2.0,74631.500000,72399.956219,23437.0,49034.25,74631.5,100228.75,125826.0
80,6.0,3.0,71962.333333,33749.954523,34501.0,57945.50,81390.0,90693.00,99996.0
80,7.0,10.0,74916.300000,39058.629188,13782.0,40718.50,81339.0,100282.50,125774.0
80,8.0,23.0,75017.826087,36799.926860,3640.0,42909.00,83847.0,99994.50,125825.0
80,9.0,20.0,69481.750000,35244.449756,21843.0,42195.75,57585.0,102982.75,125823.0
...,...,...,...,...,...,...,...,...,...
100,550.0,1.0,45781.000000,,45781.0,45781.00,45781.0,45781.00,45781.0
100,617.0,1.0,89729.000000,,89729.0,89729.00,89729.0,89729.00,89729.0
100,650.0,1.0,114972.000000,,114972.0,114972.00,114972.0,114972.00,114972.0
100,848.0,1.0,122935.000000,,122935.0,122935.00,122935.0,122935.00,122935.0


> [Reference video](https://www.youtube.com/watch?v=VRmXto2YA2I)