**[Pandas Micro-Course Home Page](https://www.kaggle.com/learn/pandas)**

---


# Intro
You'll use `groupby` often, so it's worth practicing until you can work with it efficiently.

# Relevant Resources
- [**Grouping Reference and Examples**](https://www.kaggle.com/residentmario/grouping-and-sorting-reference)  
- [Pandas cheat sheet](https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf)

# Set Up
Run the code cell below to load the data before running the exercises.

In [1]:
import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
#pd.set_option("display.max_rows", 5)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.grouping_and_sorting import *
print("Setup complete.")

Setup complete.


# Exercises

## 1.
Who are the most common wine reviewers in the dataset? Create a `Series` whose index is the `taster_twitter_handle` category from the dataset, and whose values count how many reviews each person wrote.

In [2]:
# Your code here

reviews_written = reviews.groupby('taster_twitter_handle').size()

q1.check()
reviews_written[reviews_written == 25514]

print(reviews_written == 25514)


<span style="color:#33cc33">Correct:</span> 


```python
reviews_written = reviews.groupby('taster_twitter_handle').size()
```
or
```python
reviews_written = reviews.groupby('taster_twitter_handle').taster_twitter_handle.count()
```


taster_twitter_handle
@AnneInVino         False
@JoeCz              False
@bkfiona            False
@gordone_cellars    False
@kerinokeefe        False
@laurbuzz           False
@mattkettmann       False
@paulgwine          False
@suskostrzewa       False
@vboone             False
@vossroger           True
@wawinereport       False
@wineschach         False
@winewchristina     False
@worldwineguys      False
dtype: bool


In [3]:
#q1.hint()
q1.solution()


<span style="color:#33cc99">Solution:</span> 
```python
reviews_written = reviews.groupby('taster_twitter_handle').size()
```
or
```python
reviews_written = reviews.groupby('taster_twitter_handle').taster_twitter_handle.count()
```
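The two solutions agree here because the counted column is also the grouping key. As a minimal sketch on a made-up toy frame (the `toy` data and its column names are hypothetical, not the wine dataset), this shows where `size()` and `count()` can differ: `size()` counts rows per group, while `count()` counts non-null values in the selected column.

```python
import pandas as pd

# Hypothetical toy data standing in for the reviews table.
toy = pd.DataFrame({
    "taster": ["@a", "@a", "@b", None, "@b", "@a"],
    "points": [88, None, 90, 85, 91, 87],
})

# Rows whose grouping key is missing are dropped by default (dropna=True).
by_size = toy.groupby("taster").size()                    # rows per group
by_key_count = toy.groupby("taster")["taster"].count()    # non-null keys per group
by_points_count = toy.groupby("taster")["points"].count() # non-null points per group

print(by_size.to_dict())          # {'@a': 3, '@b': 2}
print(by_key_count.to_dict())     # {'@a': 3, '@b': 2}
print(by_points_count.to_dict())  # {'@a': 2, '@b': 2}
```

Counting a different column than the key (`points` above) skips its missing values, so `size()` is the safer choice when the goal is "how many rows are in each group".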


## 2.
What is the best wine I can buy for a given amount of money? Create a `Series` whose index is wine prices and whose values are the maximum number of points awarded to any wine at that price. Sort the values by price, ascending (so that `4.0` dollars is at the top and `3300.0` dollars is at the bottom).

In [4]:
best_rating_per_price = reviews.groupby('price')['points'].max().sort_index()
q2.check()


<span style="color:#33cc33">Correct</span>

In [5]:
#q2.hint()
q2.solution()


<span style="color:#33cc99">Solution:</span> 
```python
best_rating_per_price = reviews.groupby('price')['points'].max().sort_index()
```
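To see the pattern on a small scale, here is a sketch using a made-up five-row frame (the `toy` data is hypothetical). It groups by price, keeps the highest score in each group, and sorts by the price index.

```python
import pandas as pd

# Hypothetical toy data: two wines at $4, one at $15, two at $20.
toy = pd.DataFrame({
    "price": [20.0, 4.0, 20.0, 4.0, 15.0],
    "points": [90, 84, 93, 86, 88],
})

# Highest score seen at each price, ordered by price ascending.
best = toy.groupby("price")["points"].max().sort_index()

print(best.index.tolist())  # [4.0, 15.0, 20.0]
print(best.tolist())        # [86, 88, 93]
```

`groupby` sorts group keys by default, so `sort_index()` does not change the order here; keeping it makes the intended ordering explicit.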

## 3.
What are the minimum and maximum prices for each `variety` of wine? Create a `DataFrame` whose index is the `variety` category from the dataset and whose values are the `min` and `max` prices for that variety.

In [6]:
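# String names give the same result as the builtins min/max,
# which newer pandas versions may warn about when passed to agg.
price_extremes = reviews.groupby('variety')['price'].agg(['min', 'max'])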

q3.check()
price_extremes


<span style="color:#33cc33">Correct</span>

| variety | min | max |
|---|---|---|
| Abouriou | 15.0 | 75.0 |
| Agiorgitiko | 10.0 | 66.0 |
| Aglianico | 6.0 | 180.0 |
| Aidani | 27.0 | 27.0 |
| Airen | 8.0 | 10.0 |
| Albana | 12.0 | 50.0 |
| Albanello | 20.0 | 20.0 |
| Albariño | 10.0 | 75.0 |
| Albarossa | 40.0 | 40.0 |
| Aleatico | 25.0 | 55.0 |


In [7]:
#q3.hint()
q3.solution()


<span style="color:#33cc99">Solution:</span> 
```python
price_extremes = reviews.groupby('variety').price.agg([min, max])
```
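As a small illustration of how `agg` with a list builds one column per aggregation, here is a sketch on made-up data (the `toy` frame is hypothetical). It passes the string names `'min'` and `'max'` rather than the Python builtins; both select the same pandas aggregations, and newer pandas releases may emit a warning for the builtin form.

```python
import pandas as pd

# Hypothetical toy data; one Merlot price is missing.
toy = pd.DataFrame({
    "variety": ["Syrah", "Merlot", "Syrah", "Merlot", "Airen"],
    "price": [30.0, 12.0, 45.0, None, 8.0],
})

# One output column per aggregation name; missing prices are skipped.
extremes = toy.groupby("variety")["price"].agg(["min", "max"])

print(list(extremes.columns))           # ['min', 'max']
print(extremes.loc["Syrah"].tolist())   # [30.0, 45.0]
print(extremes.loc["Merlot"].tolist())  # [12.0, 12.0]
```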

## 4.
What are the most expensive wine varieties? Create a variable `sorted_varieties` containing a copy of the dataframe from the previous question where varieties are sorted in descending order based on minimum price, then on maximum price (to break ties).

In [8]:
sorted_varieties = price_extremes.sort_values(by=['min','max'], ascending=False)

q4.check()
sorted_varieties


<span style="color:#33cc33">Correct</span>

| variety | min | max |
|---|---|---|
| Ramisco | 495.0 | 495.0 |
| Terrantez | 236.0 | 236.0 |
| Francisa | 160.0 | 160.0 |
| Rosenmuskateller | 150.0 | 150.0 |
| Tinta Negra Mole | 112.0 | 112.0 |
| Pignolo | 70.0 | 70.0 |
| Syrah-Cabernet Franc | 60.0 | 69.0 |
| Garnacha-Cariñena | 57.0 | 57.0 |
| Doña Blanca | 53.0 | 53.0 |
| Cercial | 50.0 | 50.0 |


In [9]:
#q4.hint()
q4.solution()


<span style="color:#33cc99">Solution:</span> 
```python
sorted_varieties = price_extremes.sort_values(by=['min', 'max'], ascending=False)
```
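The tie-breaking behavior is easier to see on a tiny frame. In this sketch (the `extremes` data and variety labels are made up), two varieties share the same minimum price, so the second sort key decides their order. Passing a list to `ascending` sets the direction per column.

```python
import pandas as pd

# Hypothetical min/max table; B and C tie on 'min'.
extremes = pd.DataFrame(
    {"min": [50.0, 60.0, 60.0, 10.0], "max": [50.0, 69.0, 80.0, 75.0]},
    index=pd.Index(["A", "B", "C", "D"], name="variety"),
)

# Both keys descending: the tie on min=60 is broken by the larger max.
ranked = extremes.sort_values(by=["min", "max"], ascending=False)
print(ranked.index.tolist())  # ['C', 'B', 'A', 'D']

# Per-column direction: min descending, max ascending within ties.
mixed = extremes.sort_values(by=["min", "max"], ascending=[False, True])
print(mixed.index.tolist())   # ['B', 'C', 'A', 'D']
```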

## 5.
Create a `Series` whose index is reviewers and whose values are the average review scores given out by each reviewer. Hint: you will need the `taster_name` and `points` columns.

In [10]:
reviewer_mean_ratings = reviews.groupby('taster_name').points.mean()

q5.check()
reviewer_mean_ratings


<span style="color:#33cc33">Correct</span>

taster_name
Alexander Peartree    85.855422
Anna Lee C. Iijima    88.415629
Anne Krebiehl MW      90.562551
Carrie Dykes          86.395683
Christina Pickard     87.833333
Fiona Adams           86.888889
Jeff Jenssen          88.319756
Jim Gordon            88.626287
Joe Czerwinski        88.536235
Kerin O’Keefe         88.867947
Lauren Buzzeo         87.739510
Matt Kettmann         90.008686
Michael Schachner     86.907493
Mike DeSimone         89.101167
Paul Gregutt          89.082564
Roger Voss            88.708003
Sean P. Sullivan      88.755739
Susan Kostrzewa       86.609217
Virginie Boone        89.213379
Name: points, dtype: float64

In [11]:
#q5.hint()
#q5.solution()

Are there significant differences in the average scores assigned by the various reviewers? Run the cell below to use the `describe()` method to see a summary of the range of values.

In [12]:
reviewer_mean_ratings.describe()

count    19.000000
mean     88.233026
std       1.243610
min      85.855422
25%      87.323501
50%      88.536235
75%      88.975256
max      90.562551
Name: points, dtype: float64
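One way to read a summary like this is to compare the gap between `max` and `min` with the standard deviation. The sketch below does that on made-up scores (the `toy` frame and its values are hypothetical); the result of `describe()` is itself a `Series`, so its entries can be looked up by label.

```python
import pandas as pd

# Hypothetical toy data: three reviewers with a handful of scores.
toy = pd.DataFrame({
    "taster_name": ["A", "A", "B", "B", "C"],
    "points": [80, 90, 88, 92, 87],
})

# Mean score per reviewer, then summary statistics of those means.
means = toy.groupby("taster_name")["points"].mean()
summary = means.describe()

# Gap between the most generous and the strictest reviewer.
spread = summary["max"] - summary["min"]
print(means.to_dict())  # {'A': 85.0, 'B': 90.0, 'C': 87.0}
print(spread)           # 5.0
```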

## 6.
Which combinations of countries and varieties are most common? Create a `Series` whose index is a `MultiIndex` of `{country, variety}` pairs. For example, a pinot noir produced in the US should map to `{"US", "Pinot Noir"}`. Sort the values in the `Series` in descending order based on wine count.

In [13]:
country_variety_counts = reviews.groupby(['country', 'variety']).size().sort_values(ascending=False)

q6.check()
country_variety_counts


<span style="color:#33cc33">Correct</span>

country     variety                   
US          Pinot Noir                    9885
            Cabernet Sauvignon            7315
            Chardonnay                    6801
France      Bordeaux-style Red Blend      4725
Italy       Red Blend                     3624
US          Syrah                         3244
            Red Blend                     2972
France      Chardonnay                    2808
Italy       Nebbiolo                      2736
US          Zinfandel                     2711
Portugal    Portuguese Red                2466
US          Merlot                        2311
Italy       Sangiovese                    2265
US          Sauvignon Blanc               2163
France      Pinot Noir                    1966
            Rosé                          1923
US          Bordeaux-style Red Blend      1824
Germany     Riesling                      1790
US          Riesling                      1753
Argentina   Malbec                        1510
Spain       Tempranil

In [14]:
#q6.hint()
q6.solution()


<span style="color:#33cc99">Solution:</span> 
```python
country_variety_counts = reviews.groupby(['country', 'variety']).size().sort_values(ascending=False)
```
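Grouping by a list of columns yields a `MultiIndex`, whose entries are tuples. As a minimal sketch on made-up rows (the `toy` frame is hypothetical), this builds the counts, looks up one pair by tuple, and flattens the result back into ordinary columns with `reset_index`.

```python
import pandas as pd

# Hypothetical toy data: 3 US Pinot Noirs, 2 French Chardonnays, 1 Italian Nebbiolo.
toy = pd.DataFrame({
    "country": ["US", "France", "US", "Italy", "France", "US"],
    "variety": ["Pinot Noir", "Chardonnay", "Pinot Noir",
                "Nebbiolo", "Chardonnay", "Pinot Noir"],
})

# Rows per (country, variety) pair, most common first.
counts = toy.groupby(["country", "variety"]).size().sort_values(ascending=False)

print(counts.index[0])                    # ('US', 'Pinot Noir')
print(counts.loc[("US", "Pinot Noir")])   # 3

# Turn the MultiIndex levels back into regular columns; "n" is an arbitrary name.
flat = counts.reset_index(name="n")
print(list(flat.columns))                 # ['country', 'variety', 'n']
```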

# Keep Going

Congrats. This is a really useful skill. 

Next up, learn about [**Data Types and Missing Data**](https://www.kaggle.com/residentmario/data-types-and-missing-data-reference) so you can work with more types of data.


