# Pandas for Data Analysis

[**Pandas**](https://pandas.pydata.org/) is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.

We will first introduce some core aspects of pandas using toy data, and then analyse a real data set. To start, we import the pandas package. By convention, we give it the shorthand name `pd` using `as`, so whenever we use the package we can type `pd.` instead of `pandas.`.

#### Useful links
* [Data Wrangling cheat sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)
* [Python For Data Science cheat sheet](https://www.utc.fr/~jlaforet/Suppl/python-cheatsheets.pdf)

In [117]:
import pandas as pd

### Creating and Reading Data

pandas has two core objects: **Series** and **DataFrame**.

[**Series**](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) is a one-dimensional (1d) *list* of values. Each value has a corresponding label in the *index*, and the Series can optionally have a *name*.

In [118]:
pd.Series([3780, 4120, 4750], index=[2020,2021,2022], name='sales')

2020    3780
2021    4120
2022    4750
Name: sales, dtype: int64
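Because every value has an index label, a Series can be queried much like a dictionary. A minimal sketch reusing the sales figures above; the variable name `yearly_sales` is chosen here for illustration:

```python
import pandas as pd

# Same values as above, stored in a variable (name chosen for illustration)
yearly_sales = pd.Series([3780, 4120, 4750], index=[2020, 2021, 2022], name='sales')

print(yearly_sales.loc[2021])       # 4120: look up a value by its index label
print(yearly_sales.index.tolist())  # [2020, 2021, 2022]
print(yearly_sales.values)          # [3780 4120 4750]: the underlying NumPy array
print(yearly_sales.name)            # sales
```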

[**DataFrame**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame) is a two-dimensional (2d) *table* of values. Each row is a "record" identified by its *index* label, and each column is a **Series** identified by its column *name*.

In [119]:
df = pd.DataFrame({'date': pd.date_range('2022-05-31', periods=5, freq='ME'),  # 'ME' = month-end frequency (pandas >= 2.2; older versions use 'M')
                   'sales': [300.12, 313.28, 330.64, 347.59, 352.11],
                   'department': 'domestic'})  # a single value is repeated for every row
df

|   | date       | sales  | department |
|---|------------|--------|------------|
| 0 | 2022-05-31 | 300.12 | domestic   |
| 1 | 2022-06-30 | 313.28 | domestic   |
| 2 | 2022-07-31 | 330.64 | domestic   |
| 3 | 2022-08-31 | 347.59 | domestic   |
| 4 | 2022-09-30 | 352.11 | domestic   |
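To see that each column really is a Series, here is a small sketch on a trimmed toy copy of the DataFrame above (the `date` column is left out for brevity, and the name `toy` is illustrative):

```python
import pandas as pd

# Toy DataFrame: a trimmed version of df above (date column left out)
toy = pd.DataFrame({'sales': [300.12, 313.28, 330.64, 347.59, 352.11],
                    'department': 'domestic'})

sales_col = toy['sales']                 # selecting one column gives a Series
print(isinstance(sales_col, pd.Series))  # True
print(sales_col.name)                    # sales: the column label becomes the Series name
print(toy['department'].unique())        # the scalar 'domestic' was repeated for every row
print(toy.dtypes)                        # one dtype per column
```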


**Checking the DataFrame index**

- The index is how Pandas labels and organizes the rows in your DataFrame.
- Knowing the index is important because it tells you how you can access, align, or join your data.


In [120]:
df.index

RangeIndex(start=0, stop=5, step=1)

**Setting a column as the index**
- Makes it easier to work with time series data, since dates are now row labels.
- `inplace=True` modifies `df` directly and returns `None`.

In [121]:
df.set_index('date', inplace=True)
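With dates as row labels, `.loc[]` can look rows up by date, including partial date strings. A sketch on a hypothetical three-row copy of the data, using reassignment instead of `inplace=True` (the name `monthly` is illustrative):

```python
import pandas as pd

# Hypothetical monthly figures, reusing the first three rows of df above
monthly = pd.DataFrame({'date': pd.to_datetime(['2022-05-31', '2022-06-30', '2022-07-31']),
                        'sales': [300.12, 313.28, 330.64]})

# Reassigning the result is equivalent to calling set_index with inplace=True
monthly = monthly.set_index('date')

print(monthly.loc['2022-06-30', 'sales'])  # look up a value by its date label: 313.28
print(monthly.loc['2022-06':'2022-07'])    # partial-string slice: all rows in June and July 2022
```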

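**We can also reset the index**, which moves `date` back into a regular column. Without `inplace=True`, `reset_index()` returns a new DataFrame and leaves `df` itself unchanged.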

In [122]:
df.reset_index()

|   | date       | sales  | department |
|---|------------|--------|------------|
| 0 | 2022-05-31 | 300.12 | domestic   |
| 1 | 2022-06-30 | 313.28 | domestic   |
| 2 | 2022-07-31 | 330.64 | domestic   |
| 3 | 2022-08-31 | 347.59 | domestic   |
| 4 | 2022-09-30 | 352.11 | domestic   |
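`reset_index()` keeps the old index as a column by default; passing `drop=True` discards it instead (the wine data below uses this option). A minimal sketch on a toy DataFrame with a named index (names are illustrative):

```python
import pandas as pd

# Toy DataFrame with a named, non-numeric index
toy = pd.DataFrame({'sales': [300.12, 313.28, 330.64]},
                   index=pd.Index(['a', 'b', 'c'], name='label'))

kept = toy.reset_index()              # old index becomes a regular column
dropped = toy.reset_index(drop=True)  # old index is discarded

print(kept.columns.tolist())     # ['label', 'sales']
print(dropped.columns.tolist())  # ['sales']
print(toy.index.tolist())        # ['a', 'b', 'c']: toy itself is unchanged
```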


**Checking the DataFrame columns**
- `df.columns` lists all column labels in the DataFrame.
- Useful for quickly checking the structure of your dataset.
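`.columns` returns the column labels as a pandas `Index`, which can be converted to a list or used for membership tests. A minimal sketch on a small toy DataFrame (the name `toy` is illustrative):

```python
import pandas as pd

# Toy DataFrame with two columns
toy = pd.DataFrame({'sales': [300.12, 313.28],
                    'department': ['domestic', 'domestic']})

print(toy.columns)             # an Index object holding the column labels
print(toy.columns.tolist())    # as a plain Python list: ['sales', 'department']
print('sales' in toy.columns)  # membership test: True
print(len(toy.columns))        # number of columns: 2
```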

### Exercise

1. Create a DataFrame called `sales` that matches the table below

| week       | electronics_sales | furniture_sales |
|------------|-------------------|-----------------|
| 2022-06-05 | 120               | 85              |
| 2022-06-12 | 135               | 90              |
| 2022-06-19 | 128               | 88              |
| 2022-06-26 | 150               | 95              |

2. Display the `sales` DataFrame
3. Set the `week` column as index

In [123]:
sales = pd.DataFrame({'week': pd.date_range('2022-06-01', periods=4, freq='W'),  # 'W' = weekly, ending on Sundays
                      'electronics_sales': [120, 135, 128, 150],
                      'furniture_sales': [85, 90, 88, 95]})

sales.set_index('week', inplace=True)
sales

| week       | electronics_sales | furniture_sales |
|------------|-------------------|-----------------|
| 2022-06-05 | 120               | 85              |
| 2022-06-12 | 135               | 90              |
| 2022-06-19 | 128               | 88              |
| 2022-06-26 | 150               | 95              |


More often, DataFrames are created from data files, such as **CSV (comma-separated values)** files, using the [`pd.read_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function.
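`pd.read_csv()` accepts a file path or any file-like object. To show what `index_col=0` does without needing the wine data file, here is a sketch that reads a small made-up CSV from a string via `io.StringIO`. Its first, unnamed column holds row numbers, which is how `DataFrame.to_csv()` writes the index by default:

```python
import io

import pandas as pd

# A tiny made-up CSV standing in for a file on disk
csv_text = """,country,points,price
0,Australia,83,5.0
1,France,85,12.0
2,Spain,86,
"""

# index_col=0 turns the first column into the row index instead of a data column
wines = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(wines)         # the empty price field is read as NaN
print(wines.dtypes)
```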

**Let's load our first real DataFrame**

In [124]:
reviews = pd.read_csv("../Data/wine_reviews.csv", index_col=0)  # use the file's first column as the row index
reviews.reset_index(drop=True, inplace=True)  # replace it with a fresh 0..n-1 index
reviews


|   | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Australia | Possibly a little sweet, this is a soft, easyg... |  | 83 | 5.0 | Australia Other | South Eastern Australia |  | Joe Czerwinski | @JoeCz | Banrock Station 2006 Chardonnay (South Eastern... | Chardonnay | Banrock Station |
| 1 | France | A soft, almost off dry wine that is full in th... | Réserve | 85 | 12.0 | Rhône Valley | Côtes du Rhône |  | Roger Voss | @vossroger | Cellier des Dauphins 2015 Réserve Rosé (Côtes ... | Rosé | Cellier des Dauphins |
| 2 | Spain | Generic white-fruit aromas of peach and apple ... | Estate Grown & Bottled | 86 | 9.0 | Northern Spain | Rueda |  | Michael Schachner | @wineschach | Esperanza 2013 Estate Grown & Bottled Verdejo-... | Verdejo-Viura | Esperanza |
| 3 | US | This is the winery's best Nebula in years. Whi... | Nebula | 87 | 29.0 | California | Paso Robles | Central Coast |  |  | Midnight 2010 Nebula Cabernet Sauvignon (Paso ... | Cabernet Sauvignon | Midnight |
| 4 | US | This is a very rich Pinot whose primary virtue... | Wiley Vineyard | 88 | 40.0 | California | Anderson Valley |  |  |  | Harrington 2006 Wiley Vineyard Pinot Noir (And... | Pinot Noir | Harrington |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 58482 | US | A solid effort from a dependable winery that u... | Winemaker's Reserve | 88 | 35.0 | California | Sonoma County | Sonoma |  |  | Château Souverain 1996 Winemaker's Reserve Cab... | Cabernet Sauvignon | Château Souverain |
| 58483 | Greece | Crushed thyme, pine resin and lemon start this... | Retsina of Attica | 86 | 9.0 | Attica |  |  | Susan Kostrzewa | @suskostrzewa | Kourtaki NV Retsina of Attica Savatiano (Attica) | Savatiano | Kourtaki |
| 58484 | Italy | Made from Negroamaro, this opens with aromas o... |  | 87 | 15.0 | Southern Italy | Salento |  | Kerin O’Keefe | @kerinokeefe | Masseria Altemura 2016 Rosato (Salento) | Rosato | Masseria Altemura |
| 58485 | US | This big, bold wine has the taste profile of a... | Estate Mae's Block Ravazzi Vineyard | 88 | 32.0 | California | Mendocino |  | Jim Gordon | @gordone_cellars | Jaxon Keys 2013 Estate Mae's Block Ravazzi Vin... | Zinfandel | Jaxon Keys |


### Viewing, Selecting, Assigning & Missing Data

In [125]:
reviews.shape

(58487, 13)

`.shape` returns a `(rows, columns)` tuple: the dataset has 58,487 reviews and 13 columns.

`.info()` prints a concise summary of the DataFrame: each column's name, non-null count and dtype, plus memory usage. It is a method, so it needs parentheses; writing `reviews.info` without them only displays the bound method object, not the summary.

In [126]:
reviews.info()

### Selecting Data

Selecting data, also called **indexing**, is one of the most common operations in pandas. We discuss four ways to select data from a DataFrame:
1. Selecting one **column** (as a Series)
2. Selecting by **label**
3. Selecting by **position**
4. Selecting by **conditions**

We will practice with the wine review DataFrame.

In [127]:
reviews.head()

|   | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Australia | Possibly a little sweet, this is a soft, easyg... |  | 83 | 5.0 | Australia Other | South Eastern Australia |  | Joe Czerwinski | @JoeCz | Banrock Station 2006 Chardonnay (South Eastern... | Chardonnay | Banrock Station |
| 1 | France | A soft, almost off dry wine that is full in th... | Réserve | 85 | 12.0 | Rhône Valley | Côtes du Rhône |  | Roger Voss | @vossroger | Cellier des Dauphins 2015 Réserve Rosé (Côtes ... | Rosé | Cellier des Dauphins |
| 2 | Spain | Generic white-fruit aromas of peach and apple ... | Estate Grown & Bottled | 86 | 9.0 | Northern Spain | Rueda |  | Michael Schachner | @wineschach | Esperanza 2013 Estate Grown & Bottled Verdejo-... | Verdejo-Viura | Esperanza |
| 3 | US | This is the winery's best Nebula in years. Whi... | Nebula | 87 | 29.0 | California | Paso Robles | Central Coast |  |  | Midnight 2010 Nebula Cabernet Sauvignon (Paso ... | Cabernet Sauvignon | Midnight |
| 4 | US | This is a very rich Pinot whose primary virtue... | Wiley Vineyard | 88 | 40.0 | California | Anderson Valley |  |  |  | Harrington 2006 Wiley Vineyard Pinot Noir (And... | Pinot Noir | Harrington |


**1. Selecting a column**

In [128]:
reviews.country

0        Australia
1           France
2            Spain
3               US
4               US
           ...    
58482           US
58483       Greece
58484        Italy
58485           US
58486        Spain
Name: country, Length: 58487, dtype: object
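Attribute access such as `reviews.country` is a shortcut for bracket access, `reviews["country"]`. Brackets always work, while attribute access fails for labels containing spaces or clashing with DataFrame method names. A sketch on made-up toy data (the `wines` DataFrame and its values are illustrative):

```python
import pandas as pd

# Toy wine data (made up for illustration)
wines = pd.DataFrame({'country': ['Australia', 'France'],
                      'points': [83, 85],
                      'wine type': ['white', 'rosé']})

# Attribute access and bracket access return the same Series
print(wines.country.equals(wines['country']))  # True

# Brackets are required when the label contains spaces
# or clashes with a DataFrame method name (e.g. a column called 'count')
print(wines['wine type'])

# A list of labels inside the brackets returns a DataFrame, not a Series
print(type(wines[['country', 'points']]).__name__)  # DataFrame
```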

**2. Selecting by label**

Here, a "label" is a row name from the `index` or a column name from `columns`.

In [129]:
reviews.loc[20, "country"]

'Chile'

Use `loc[]` to access part of the DataFrame by row and column **labels**. Note the `[]` instead of `()`.

We can use `:` inside `.loc[]` to select all rows for one or more columns:

In [130]:
reviews.loc[: , ["country", "province"]]

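|       | country   | province        |
|-------|-----------|-----------------|
| 0     | Australia | Australia Other |
| 1     | France    | Rhône Valley    |
| 2     | Spain     | Northern Spain  |
| 3     | US        | California      |
| 4     | US        | California      |
| ...   | ...       | ...             |
| 58482 | US        | California      |
| 58483 | Greece    | Attica          |
| 58484 | Italy     | Southern Italy  |
| 58485 | US        | California      |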


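Swapping the roles, `:` in the column position selects all columns for a specific range of rows. Unlike ordinary Python slices, label slices in `.loc[]` include both endpoints, so `2:5` returns rows 2, 3, 4 and 5: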

In [131]:
reviews.loc[2:5, :]

|   | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Spain | Generic white-fruit aromas of peach and apple ... | Estate Grown & Bottled | 86 | 9.0 | Northern Spain | Rueda |  | Michael Schachner | @wineschach | Esperanza 2013 Estate Grown & Bottled Verdejo-... | Verdejo-Viura | Esperanza |
| 3 | US | This is the winery's best Nebula in years. Whi... | Nebula | 87 | 29.0 | California | Paso Robles | Central Coast |  |  | Midnight 2010 Nebula Cabernet Sauvignon (Paso ... | Cabernet Sauvignon | Midnight |
| 4 | US | This is a very rich Pinot whose primary virtue... | Wiley Vineyard | 88 | 40.0 | California | Anderson Valley |  |  |  | Harrington 2006 Wiley Vineyard Pinot Noir (And... | Pinot Noir | Harrington |
| 5 | US | An unabashedly rich and luscious wine, this co... |  | 90 | 22.0 | California | Sonoma County-Monterey County-Santa Barbara Co... | California Other | Jim Gordon | @gordone_cellars | Meiomi 2013 Chardonnay (Sonoma County-Monterey... | Chardonnay | Meiomi |


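A bare `:` selects every row, so `reviews.loc[:]` returns the whole DataFrame. To select from a specific label until the end of the DataFrame, put the start label before the colon (for example, `reviews.loc[58480:]`).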

In [132]:
reviews.loc[:]

|   | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Australia | Possibly a little sweet, this is a soft, easyg... |  | 83 | 5.0 | Australia Other | South Eastern Australia |  | Joe Czerwinski | @JoeCz | Banrock Station 2006 Chardonnay (South Eastern... | Chardonnay | Banrock Station |
| 1 | France | A soft, almost off dry wine that is full in th... | Réserve | 85 | 12.0 | Rhône Valley | Côtes du Rhône |  | Roger Voss | @vossroger | Cellier des Dauphins 2015 Réserve Rosé (Côtes ... | Rosé | Cellier des Dauphins |
| 2 | Spain | Generic white-fruit aromas of peach and apple ... | Estate Grown & Bottled | 86 | 9.0 | Northern Spain | Rueda |  | Michael Schachner | @wineschach | Esperanza 2013 Estate Grown & Bottled Verdejo-... | Verdejo-Viura | Esperanza |
| 3 | US | This is the winery's best Nebula in years. Whi... | Nebula | 87 | 29.0 | California | Paso Robles | Central Coast |  |  | Midnight 2010 Nebula Cabernet Sauvignon (Paso ... | Cabernet Sauvignon | Midnight |
| 4 | US | This is a very rich Pinot whose primary virtue... | Wiley Vineyard | 88 | 40.0 | California | Anderson Valley |  |  |  | Harrington 2006 Wiley Vineyard Pinot Noir (And... | Pinot Noir | Harrington |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 58482 | US | A solid effort from a dependable winery that u... | Winemaker's Reserve | 88 | 35.0 | California | Sonoma County | Sonoma |  |  | Château Souverain 1996 Winemaker's Reserve Cab... | Cabernet Sauvignon | Château Souverain |
| 58483 | Greece | Crushed thyme, pine resin and lemon start this... | Retsina of Attica | 86 | 9.0 | Attica |  |  | Susan Kostrzewa | @suskostrzewa | Kourtaki NV Retsina of Attica Savatiano (Attica) | Savatiano | Kourtaki |
| 58484 | Italy | Made from Negroamaro, this opens with aromas o... |  | 87 | 15.0 | Southern Italy | Salento |  | Kerin O’Keefe | @kerinokeefe | Masseria Altemura 2016 Rosato (Salento) | Rosato | Masseria Altemura |
| 58485 | US | This big, bold wine has the taste profile of a... | Estate Mae's Block Ravazzi Vineyard | 88 | 32.0 | California | Mendocino |  | Jim Gordon | @gordone_cellars | Jaxon Keys 2013 Estate Mae's Block Ravazzi Vin... | Zinfandel | Jaxon Keys |
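Open-ended label slices work from either side: `start:` runs to the end and `:end` runs from the beginning, with the end label included. A sketch on a toy DataFrame whose labels are not positions (the name `toy` and its values are illustrative):

```python
import pandas as pd

# Toy DataFrame whose index labels are not positions
toy = pd.DataFrame({'letter': ['a', 'b', 'c', 'd', 'e']},
                   index=[10, 20, 30, 40, 50])

print(toy.loc[30:])    # from label 30 to the end: rows 30, 40, 50
print(toy.loc[:20])    # from the start up to and including label 20: rows 10, 20
print(toy.loc[20:40])  # both endpoints are included: rows 20, 30, 40
```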


**3. Selecting by position**

Here "position" means the *numerical location*, i.e., row number and column number (both start from 0 per Python convention), in the DataFrame.

Use `iloc[]` to access part of the DataFrame by row and column numbers. Note the `[]` instead of `()`.

In [133]:
reviews.iloc[1,0]

'France'

In [134]:
reviews.iloc[:, [0, 5]]  # all rows; columns 0 (country) and 5 (province)

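|       | country   | province        |
|-------|-----------|-----------------|
| 0     | Australia | Australia Other |
| 1     | France    | Rhône Valley    |
| 2     | Spain     | Northern Spain  |
| 3     | US        | California      |
| 4     | US        | California      |
| ...   | ...       | ...             |
| 58482 | US        | California      |
| 58483 | Greece    | Attica          |
| 58484 | Italy     | Southern Italy  |
| 58485 | US        | California      |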
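One practical difference between the two selectors: `.iloc[]` slices follow Python's convention and exclude the end position, while `.loc[]` slices include the end label. This only becomes visible when labels and positions differ, as in this toy DataFrame (name and values are illustrative):

```python
import pandas as pd

# Toy DataFrame whose index labels start at 100
toy = pd.DataFrame({'country': ['Australia', 'France', 'Spain', 'US', 'US']},
                   index=[100, 101, 102, 103, 104])

print(toy.iloc[1:3])     # positions 1 and 2: the end position is excluded
print(toy.loc[101:103])  # labels 101, 102 and 103: the end label is included
print(toy.iloc[-1])      # negative positions count from the end: the last row
```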


**4. Selecting by conditions**

This is also called **boolean indexing**, usually used to select *rows* satisfying certain conditions.

In [135]:
reviews.loc[reviews.country == "France"]

|   | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | France | A soft, almost off dry wine that is full in th... | Réserve | 85 | 12.0 | Rhône Valley | Côtes du Rhône |  | Roger Voss | @vossroger | Cellier des Dauphins 2015 Réserve Rosé (Côtes ... | Rosé | Cellier des Dauphins |
| 9 | France | Made from low-yielding, 70-year-old vines, thi... | Clos du Château | 92 | 21.0 | Provence | Côtes de Provence |  | Roger Voss | @vossroger | Domaine du Clos Gautier 2015 Clos du Château R... | Rosé | Domaine du Clos Gautier |
| 11 | France | This ripe, fruity wine has both freshness and ... | Réserve des Vignerons | 86 | 14.0 | Loire Valley | Saumur |  | Roger Voss | @vossroger | Cave de Saumur 2014 Réserve des Vignerons (Sa... | Chenin Blanc | Cave de Saumur |
| 12 | France | This is a smooth and creamy wine with soft app... | Clos le Vigneau | 90 | 18.0 | Loire Valley | Vouvray |  | Roger Voss | @vossroger | Château Gaudrelle 2010 Clos le Vigneau (Vouvray) | Chenin Blanc | Château Gaudrelle |
| 28 | France | This is pretty pale for a Tavel, with a copper... |  | 90 | 24.0 | Rhône Valley | Tavel |  | Joe Czerwinski | @JoeCz | Prieuré de Montézargues 2014 Tavel | Rosé | Prieuré de Montézargues |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 58460 | France | Just hinting at maturity, this rich and stylis... |  | 90 | 40.0 | Bordeaux | Canon-Fronsac |  | Roger Voss | @vossroger | Château Canon 2005 Canon-Fronsac | Bordeaux-style Red Blend | Château Canon |
| 58466 | France | This wine is crisp and refreshingly fruity. A ... |  | 85 | 16.0 | Burgundy | Mâcon-Villages |  | Roger Voss | @vossroger | Joseph Drouhin 2016 Mâcon-Villages | Chardonnay | Joseph Drouhin |
| 58467 | France | The dry character of this wine is emphasized b... |  | 85 | 11.0 | Alsace | Alsace |  | Roger Voss | @vossroger | Cave de Hunawihr 2013 Pinot Gris (Alsace) | Pinot Gris | Cave de Hunawihr |
| 58477 | France | This wine comes from the stony slopes above th... | Renaissance | 93 | 23.0 | Southwest France | Gaillac |  | Roger Voss | @vossroger | Domaine Rotier 2015 Renaissance Red (Gaillac) | Red Blend | Domaine Rotier |
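Under the hood, `reviews.country == "France"` produces a boolean Series (a "mask") with one `True`/`False` per row, and `.loc[]` keeps the rows marked `True`. A sketch on made-up toy data (the `wines` DataFrame and the name `mask` are illustrative):

```python
import pandas as pd

# Toy wine data (made up for illustration)
wines = pd.DataFrame({'country': ['Australia', 'France', 'Spain', 'France'],
                      'points': [83, 85, 86, 92]})

mask = wines.country == "France"  # a boolean Series, one value per row
print(mask.tolist())              # [False, True, False, True]

# .loc keeps only the rows where the mask is True
print(wines.loc[mask])
print(mask.sum())                 # 2: True counts as 1, so this counts matching rows
```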


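How can we combine more than one condition?

- Put the combined condition inside `.loc[]`; after the comma we can still pass a list of column labels.
- Join conditions with `&` (and). Python's `and`, `or` and `not` keywords do not work element-wise on a Series.
- Wrap each condition in `()`. In Python, bitwise operators like `&` have *higher* precedence than comparisons like `==` and `>=`, so without parentheses `reviews.country == "France" & reviews.points >= 95` would be parsed as `reviews.country == ("France" & reviews.points) >= 95`, which is not what we want.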

In [136]:
reviews.loc[(reviews.country == "France") & (reviews.points >= 95), ["country", "description"]]

|       | country | description                                         |
|-------|---------|-----------------------------------------------------|
| 40    | France  | A gorgeously perfumed wine, dominated by the r...   |
| 441   | France  | This wine has power along with great fruit and...   |
| 597   | France  | The wood element is important here, but it is ...   |
| 868   | France  | The wine from this clos or walled vineyard in ...   |
| 949   | France  | Produced from vines mainly planted in the 1970...   |
| ...   | ...     | ...                                                 |
| 57771 | France  | 96–98. Barrel sample. This powerful, impressiv...   |
| 57948 | France  | The heady scent of Damask rose hints unmistaka...   |
| 58065 | France  | This is a full-bodied and ripe wine, showing s...   |
| 58148 | France  | Toasty aromas are not enough to smother the in...   |
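Besides `&` (and), boolean Series also support `|` (or) and `~` (not), with the same need for parentheses around each comparison. A sketch on made-up toy data (names and values are illustrative):

```python
import pandas as pd

# Toy wine data (made up for illustration)
wines = pd.DataFrame({'country': ['France', 'France', 'Italy', 'US'],
                      'points': [96, 88, 95, 90]})

# AND: both conditions must hold
both = wines.loc[(wines.country == "France") & (wines.points >= 95)]

# OR: at least one condition holds
either = wines.loc[(wines.country == "France") | (wines.points >= 95)]

# NOT: invert a condition
not_france = wines.loc[~(wines.country == "France")]

print(both.index.tolist())        # [0]
print(either.index.tolist())      # [0, 1, 2]
print(not_france.index.tolist())  # [2, 3]
```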


To match any of several values in a single column, use `.isin()` and pass it a list of values:

In [137]:
reviews.loc[reviews.country.isin(["France", "Argentina"])]

|   | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | France | A soft, almost off dry wine that is full in th... | Réserve | 85 | 12.0 | Rhône Valley | Côtes du Rhône |  | Roger Voss | @vossroger | Cellier des Dauphins 2015 Réserve Rosé (Côtes ... | Rosé | Cellier des Dauphins |
| 9 | France | Made from low-yielding, 70-year-old vines, thi... | Clos du Château | 92 | 21.0 | Provence | Côtes de Provence |  | Roger Voss | @vossroger | Domaine du Clos Gautier 2015 Clos du Château R... | Rosé | Domaine du Clos Gautier |
| 11 | France | This ripe, fruity wine has both freshness and ... | Réserve des Vignerons | 86 | 14.0 | Loire Valley | Saumur |  | Roger Voss | @vossroger | Cave de Saumur 2014 Réserve des Vignerons (Sa... | Chenin Blanc | Cave de Saumur |
| 12 | France | This is a smooth and creamy wine with soft app... | Clos le Vigneau | 90 | 18.0 | Loire Valley | Vouvray |  | Roger Voss | @vossroger | Château Gaudrelle 2010 Clos le Vigneau (Vouvray) | Chenin Blanc | Château Gaudrelle |
| 28 | France | This is pretty pale for a Tavel, with a copper... |  | 90 | 24.0 | Rhône Valley | Tavel |  | Joe Czerwinski | @JoeCz | Prieuré de Montézargues 2014 Tavel | Rosé | Prieuré de Montézargues |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 58466 | France | This wine is crisp and refreshingly fruity. A ... |  | 85 | 16.0 | Burgundy | Mâcon-Villages |  | Roger Voss | @vossroger | Joseph Drouhin 2016 Mâcon-Villages | Chardonnay | Joseph Drouhin |
| 58467 | France | The dry character of this wine is emphasized b... |  | 85 | 11.0 | Alsace | Alsace |  | Roger Voss | @vossroger | Cave de Hunawihr 2013 Pinot Gris (Alsace) | Pinot Gris | Cave de Hunawihr |
| 58477 | France | This wine comes from the stony slopes above th... | Renaissance | 93 | 23.0 | Southwest France | Gaillac |  | Roger Voss | @vossroger | Domaine Rotier 2015 Renaissance Red (Gaillac) | Red Blend | Domaine Rotier |
| 58478 | Argentina | Racy plum and cherry aromas lead to an edgy pa... | Finca Lalande | 87 | 16.0 | Mendoza Province | Mendoza |  | Michael Schachner | @wineschach | Domaine Bousquet 2016 Finca Lalande Malbec (Me... | Malbec | Domaine Bousquet |


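Missing values can also be used as conditions: `.isnull()` is `True` where a value is missing and `.notnull()` is `True` where it is present.

In [138]:
missing_price = reviews.loc[reviews.price.isnull()]  # rows with no price, kept in a variable
reviews.loc[reviews.price.notnull()]  # rows with a price; only a cell's last expression is displayed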

|   | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Australia | Possibly a little sweet, this is a soft, easyg... |  | 83 | 5.0 | Australia Other | South Eastern Australia |  | Joe Czerwinski | @JoeCz | Banrock Station 2006 Chardonnay (South Eastern... | Chardonnay | Banrock Station |
| 1 | France | A soft, almost off dry wine that is full in th... | Réserve | 85 | 12.0 | Rhône Valley | Côtes du Rhône |  | Roger Voss | @vossroger | Cellier des Dauphins 2015 Réserve Rosé (Côtes ... | Rosé | Cellier des Dauphins |
| 2 | Spain | Generic white-fruit aromas of peach and apple ... | Estate Grown & Bottled | 86 | 9.0 | Northern Spain | Rueda |  | Michael Schachner | @wineschach | Esperanza 2013 Estate Grown & Bottled Verdejo-... | Verdejo-Viura | Esperanza |
| 3 | US | This is the winery's best Nebula in years. Whi... | Nebula | 87 | 29.0 | California | Paso Robles | Central Coast |  |  | Midnight 2010 Nebula Cabernet Sauvignon (Paso ... | Cabernet Sauvignon | Midnight |
| 4 | US | This is a very rich Pinot whose primary virtue... | Wiley Vineyard | 88 | 40.0 | California | Anderson Valley |  |  |  | Harrington 2006 Wiley Vineyard Pinot Noir (And... | Pinot Noir | Harrington |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 58482 | US | A solid effort from a dependable winery that u... | Winemaker's Reserve | 88 | 35.0 | California | Sonoma County | Sonoma |  |  | Château Souverain 1996 Winemaker's Reserve Cab... | Cabernet Sauvignon | Château Souverain |
| 58483 | Greece | Crushed thyme, pine resin and lemon start this... | Retsina of Attica | 86 | 9.0 | Attica |  |  | Susan Kostrzewa | @suskostrzewa | Kourtaki NV Retsina of Attica Savatiano (Attica) | Savatiano | Kourtaki |
| 58484 | Italy | Made from Negroamaro, this opens with aromas o... |  | 87 | 15.0 | Southern Italy | Salento |  | Kerin O’Keefe | @kerinokeefe | Masseria Altemura 2016 Rosato (Salento) | Rosato | Masseria Altemura |
| 58485 | US | This big, bold wine has the taste profile of a... | Estate Mae's Block Ravazzi Vineyard | 88 | 32.0 | California | Mendocino |  | Jim Gordon | @gordone_cellars | Jaxon Keys 2013 Estate Mae's Block Ravazzi Vin... | Zinfandel | Jaxon Keys |


#### Missing Data

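Detect missing data (`NaN`, i.e. `np.nan`, or `None`). `isna()` returns `True` wherever a value is missing (`isnull()` is an alias), and `.any()` then reports, for each column, whether it contains at least one missing value: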

In [139]:
reviews.isna().any()

country                   True
description              False
designation               True
points                   False
price                     True
province                  True
region_1                  True
region_2                  True
taster_name               True
taster_twitter_handle     True
title                    False
variety                  False
winery                   False
dtype: bool
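Replacing `.any()` with `.sum()` counts the missing values per column instead, since each `True` counts as 1. A sketch on made-up toy data (names and values are illustrative):

```python
import numpy as np
import pandas as pd

# Toy data with missing values of both kinds: None and np.nan
wines = pd.DataFrame({'country': ['France', None, 'Italy'],
                      'price': [12.0, np.nan, np.nan]})

print(wines.isna())        # True wherever a value is missing
print(wines.isna().any())  # does each column have at least one missing value?
print(wines.isna().sum())  # how many missing values per column: country 1, price 2
```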

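Fill in missing data with `fillna()`. Here every missing value is replaced by the placeholder `1`, so no column reports missing values any more. `fillna()` returns a new DataFrame and leaves `reviews` unchanged: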

In [140]:
reviews.fillna(1).isna().any()

country                  False
description              False
designation              False
points                   False
price                    False
province                 False
region_1                 False
region_2                 False
taster_name              False
taster_twitter_handle    False
title                    False
variety                  False
winery                   False
dtype: bool
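A single placeholder rarely suits every column. `fillna()` also accepts a dictionary mapping column names to fill values; a sketch on made-up toy data, where filling `price` with its median is just one illustrative choice:

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration)
wines = pd.DataFrame({'country': ['France', None, 'Italy'],
                      'price': [12.0, np.nan, 20.0]})

# A different fill value per column, chosen to suit each column's type
filled = wines.fillna({'country': 'Unknown', 'price': wines.price.median()})

print(filled)                     # country -> 'Unknown', price -> 16.0 in row 1
print(wines.isna().sum().sum())   # 2: the original DataFrame is unchanged
```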

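Drop missing data with `dropna()`. By default it removes every row that has at least one missing value, so the result can be much smaller than the original:

In [141]:
reviews.dropna()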

|   | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | US | This opens with a pleasing toasty aroma, follo... | Five Faces | 90 | 33.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Fullerton 2014 Five Faces Pinot Noir (Willamet... | Pinot Noir | Fullerton |
| 10 | US | Amidst cola and sassafras on the nose, there's... | White Hawk Vineyard | 89 | 38.0 | California | Santa Barbara County | Central Coast | Matt Kettmann | @mattkettmann | Deep Sea 2011 White Hawk Vineyard Syrah (Santa... | Syrah | Deep Sea |
| 19 | US | A blend of Ciel du Cheval and Force Majeure vi... | Crazy Mary | 93 | 48.0 | Washington | Red Mountain | Columbia Valley | Sean P. Sullivan | @wawinereport | Mark Ryan 2012 Crazy Mary Mourvèdre (Red Mount... | Mourvèdre | Mark Ryan |
| 23 | US | This cool climate region is well-suited to thi... | Gorge Crest | 90 | 28.0 | Oregon | Columbia Gorge (OR) | Oregon Other | Paul Gregutt | @paulgwine | Phelps Creek 2015 Gorge Crest Gewürztraminer (... | Gewürztraminer | Phelps Creek |
| 36 | US | This varietal Cabernet Sauvignon hails from th... | Sievers Reserve | 93 | 80.0 | California | Napa Valley | Napa | Virginie Boone | @vboone | Volker Eisele Family Estate 2013 Sievers Reser... | Cabernet Sauvignon | Volker Eisele Family Estate |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 58432 | US | An elegant debut for this new Oregon winery, t... | Winemaker's Cuvée | 92 | 32.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Elizabeth Chambers 2011 Winemaker's Cuvée Pino... | Pinot Noir | Elizabeth Chambers |
| 58450 | US | Tart and tannic, this has a sleek, almost stee... | Sentience | 89 | 55.0 | Oregon | Applegate Valley | Southern Oregon | Paul Gregutt | @paulgwine | Cowhorn 2011 Sentience Syrah (Applegate Valley) | Syrah | Cowhorn |
| 58457 | US | This smells like sweet red cherries and ripe p... | Estate | 86 | 30.0 | California | Livermore Valley | Central Coast | Jim Gordon | @gordone_cellars | Fenestra 2011 Estate Grenache (Livermore Valley) | Grenache | Fenestra |
| 58469 | US | This tasty, toasty red wine is all Sangiovese.... | Kiona Estate | 90 | 29.0 | Washington | Red Mountain | Columbia Valley | Paul Gregutt | @paulgwine | Barrister 2011 Kiona Estate Sangiovese (Red Mo... | Sangiovese | Barrister |
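`dropna()` can be made less aggressive: `subset=` only looks at the listed columns, and `axis=1, how='all'` drops columns that are entirely empty. A sketch on made-up toy data (names and values are illustrative):

```python
import numpy as np
import pandas as pd

# Toy data: every row has at least one missing value
wines = pd.DataFrame({'country': ['France', 'Spain', None],
                      'price': [12.0, np.nan, np.nan],
                      'region_2': [np.nan, np.nan, np.nan]})

print(len(wines.dropna()))                  # 0: every row has at least one NaN
print(len(wines.dropna(subset=['price'])))  # 1: only rows missing a price are dropped
print(wines.dropna(axis=1, how='all').columns.tolist())  # ['country', 'price']
```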


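The same methods work on a single column: after `fillna(1)`, the `price` Series has no missing values left.

In [142]: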
reviews.price.fillna(1).isna().any()

np.False_

#### Exercise

Create a "sub"-DataFrame from `reviews` that contains the `country`, `province`, `region_1` and `region_2` columns with index labels `10`, `750` and `1200`.

In [143]:
reviews.loc[[10, 750, 1200], ["country", "province", "region_1", "region_2"]]

Create a "sub"-DataFrame from `reviews` that contains all reviews with at least 95 points for wines from Oceanian countries (Australia and New Zealand).

In [None]:
reviews.loc[(reviews.points >= 95) & (reviews.country.isin(["Australia", "New Zealand"]))]

### Summary Functions

Summary functions allow us to quickly describe and understand a dataset by computing key statistics. Common examples include `.mean()`, `.median()`, `.min()`, `.max()`, and `.sum()` for numerical data, as well as `.value_counts()` for categorical data. These functions can be applied to entire DataFrames or specific columns, giving us insights such as average values, distributions, and totals. Using `.describe()` provides a convenient overview of multiple summary statistics at once.


In [None]:
reviews.describe()

In [None]:
reviews.country.value_counts()
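`describe()` summarizes the numeric columns by default, and `value_counts()` counts each distinct value, sorted from most to least frequent. A sketch on made-up toy data (names and values are illustrative):

```python
import pandas as pd

# Toy wine data (made up for illustration)
wines = pd.DataFrame({'country': ['US', 'France', 'US', 'Italy', 'US', 'France'],
                      'points': [88, 90, 85, 92, 87, 91]})

print(wines.country.value_counts())                # US 3, France 2, Italy 1
print(wines.country.value_counts(normalize=True))  # proportions instead of counts
print(wines.describe())                            # count, mean, std, min, quartiles, max of 'points'
```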

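For numerical columns, `.mean()`, `.median()`, `.min()`, `.max()` and `.sum()` give quick statistics. Missing values are skipped by default (`skipna=True`), so the mean below is computed over the reviews that have a price.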

In [None]:
print("the mean of price is:", reviews.price.mean())

the mean of price is: 35.537221527828834
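The effect of skipping missing values is easiest to see on a small Series. A sketch with made-up prices (the name `prices` is illustrative); `.agg()` computes several statistics at once:

```python
import numpy as np
import pandas as pd

# Made-up prices with one missing value
prices = pd.Series([5.0, 12.0, np.nan, 29.0, 40.0], name='price')

print(prices.mean())               # 21.5: the NaN is skipped by default
print(prices.median())             # 20.5
print(prices.min(), prices.max())  # 5.0 40.0
print(prices.sum())                # 86.0
print(prices.mean(skipna=False))   # nan: missing values are no longer ignored
print(prices.agg(['mean', 'median', 'min', 'max', 'sum']))
```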


## All Done!