**[Pandas Course Home Page](https://www.kaggle.com/learn/pandas)**

---


# Introduction

This is the workbook component of the "Indexing, selecting, assigning" section. For the reference component, [**click here**](https://www.kaggle.com/residentmario/indexing-selecting-assigning-reference).

Selecting specific values of a pandas `DataFrame` or `Series` to work on is an implicit step in almost any data operation you'll run. One of the first things to learn when working with data in Python is therefore how to select the data points relevant to you quickly and effectively.

In this set of exercises we will work on exploring the [Wine Reviews dataset](https://www.kaggle.com/zynicide/wine-reviews). 

# Relevant Resources
* **[Quickstart to indexing and selecting data](https://www.kaggle.com/residentmario/indexing-and-selecting-data/)** 
* [Indexing and Selecting Data](https://pandas.pydata.org/pandas-docs/stable/indexing.html) section of pandas documentation
* [Pandas Cheat Sheet](https://assets.datacamp.com/blog_assets/PandasPythonForDataScience.pdf)




# Set Up

Run the following cell to load your data and limit printed output to 5 rows.

In [6]:
import pandas as pd

reviews = pd.read_csv("../Data Visualisation/winemag-data_first150k.csv", index_col=0)
pd.set_option("display.max_rows", 5)

Look at an overview of your data by running the following line:

In [7]:
reviews.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,US,This tremendous 100% varietal wine hails from ...,Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
1,Spain,"Ripe aromas of fig, blackberry and cassis are ...",Carodorum Selección Especial Reserva,96,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
2,US,Mac Watson honors the memory of a wine once ma...,Special Selected Late Harvest,96,90.0,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley
3,US,"This spent 20 months in 30% new French oak, an...",Reserve,96,65.0,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi
4,France,"This is the top wine from La Bégude, named aft...",La Brûlade,95,66.0,Provence,Bandol,,Provence red blend,Domaine de la Bégude


# Exercises

## 1.

Select the `description` column from `reviews` and assign the result to the variable `desc`.

In [10]:
# Your code here

desc = reviews.description
desc

0         This tremendous 100% varietal wine hails from ...
1         Ripe aromas of fig, blackberry and cassis are ...
                                ...                        
150928    A perfect salmon shade, with scents of peaches...
150929    More Pinot Grigios should taste like this. A r...
Name: description, Length: 150930, dtype: object

Follow-up question: what type of object is `desc`? If you're not sure, you can check by calling Python's `type` function: `type(desc)`.

In [11]:
#q1.hint()
#q1.solution()

type(desc)

pandas.core.series.Series
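To make the `Series` versus `DataFrame` distinction concrete, here is a minimal sketch on a small made-up table. The name `toy` and its values are assumptions for illustration only and are not part of the wine data.

```python
import pandas as pd

# Hypothetical stand-in for `reviews`; the values are made up.
toy = pd.DataFrame(
    {"country": ["US", "Spain", "Italy"], "points": [96, 95, 90]}
)

as_attribute = toy.country      # attribute access returns a Series
as_key = toy["country"]         # key access returns the same Series
as_frame = toy[["country"]]     # a list of column names returns a DataFrame

print(type(as_attribute).__name__)
print(type(as_frame).__name__)
```

Attribute access and key access give the same result here; key access is the safer choice when a column name contains spaces or clashes with a `DataFrame` method.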

## 2.

Select the first value from the description column of `reviews`, assigning it to variable `first_description`.

In [12]:
first_description = reviews['description'][0]

first_description

'This tremendous 100% varietal wine hails from Oakville and was aged over three years in oak. Juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. Balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. Enjoy 2022–2030.'
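Note that `reviews['description'][0]` looks up the index *label* `0`, which happens to be the first row because the labels start at 0. The sketch below uses a hypothetical `Series` (`s`, with made-up labels) to show how label-based and position-based access differ when that is not the case.

```python
import pandas as pd

# Hypothetical Series whose integer index labels do not start at 0.
s = pd.Series(["first", "second", "third"], index=[10, 20, 30])

by_position = s.iloc[0]   # element at position 0
by_label = s.loc[10]      # element with index label 10

# With an integer index, plain [] is a label lookup, so label 0 is missing.
try:
    s[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(by_position, by_label, label_zero_found)
```

If "first" is meant positionally, `iloc[0]` states that intent regardless of what the index labels are.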

## 3. 

Select the first row of data (the first record) from `reviews`, assigning it to the variable `first_row`.

In [13]:
first_row = reviews.iloc[0]

first_row

country                                                       US
description    This tremendous 100% varietal wine hails from ...
                                     ...                        
variety                                       Cabernet Sauvignon
winery                                                     Heitz
Name: 0, Length: 10, dtype: object

## 4.

Select the first 10 values from the `description` column in `reviews`, assigning the result to variable `first_descriptions`.

Hint: format your output as a `pandas` `Series`.

In [14]:
first_descriptions = reviews.description.iloc[0:10]

first_descriptions

0    This tremendous 100% varietal wine hails from ...
1    Ripe aromas of fig, blackberry and cassis are ...
                           ...                        
8    This re-named vineyard was formerly bottled as...
9    The producer sources from two blocks of the vi...
Name: description, Length: 10, dtype: object

## 5.

Select the records with index labels `1`, `2`, `3`, `5`, and `8`, assigning the result to the variable `sample_reviews`.

In other words, generate the following DataFrame:

![](https://i.imgur.com/sHZvI1O.png)

In [15]:
sample_reviews = reviews.loc[[1,2,3,5,8]]

sample_reviews

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
1,Spain,"Ripe aromas of fig, blackberry and cassis are ...",Carodorum Selección Especial Reserva,96,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
2,US,Mac Watson honors the memory of a wine once ma...,Special Selected Late Harvest,96,90.0,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley
3,US,"This spent 20 months in 30% new French oak, an...",Reserve,96,65.0,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi
5,Spain,"Deep, dense and pure from the opening bell, th...",Numanthia,95,73.0,Northern Spain,Toro,,Tinta de Toro,Numanthia
8,US,This re-named vineyard was formerly bottled as...,Silice,95,65.0,Oregon,Chehalem Mountains,Willamette Valley,Pinot Noir,Bergström
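As a small illustration of selecting by a list of labels, the following sketch uses a made-up frame (`toy` is hypothetical) to show that `loc` returns rows in the order the labels are listed, not in the order they appear in the frame.

```python
import pandas as pd

# Hypothetical data with the default integer index 0..3.
toy = pd.DataFrame({"points": [96, 96, 95, 94]})

# A list of labels selects exactly those rows, in the order given.
picked = toy.loc[[3, 1]]

print(picked.index.tolist())
```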


## 6.

Create a variable `df` containing the `country`, `province`, `region_1`, and `region_2` columns of the records with the index labels `0`, `1`, `10`, and `100`. In other words, generate the following `DataFrame`:

![](https://i.imgur.com/FUCGiKP.png)

In [16]:
df = reviews.loc[[0,1,10,100], ['country', 'province', 'region_1', 'region_2']]

df

Unnamed: 0,country,province,region_1,region_2
0,US,California,Napa Valley,Napa
1,Spain,Northern Spain,Toro,
10,Italy,Northeastern Italy,Collio,
100,US,California,South Coast,South Coast
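Rows and columns can be selected in a single call by passing two selectors. The sketch below, on a hypothetical frame named `toy` with made-up values, compares the label-based form with a position-based equivalent.

```python
import pandas as pd

# Hypothetical stand-in for `reviews`; the values are made up.
toy = pd.DataFrame(
    {
        "country": ["US", "Spain", "Italy"],
        "points": [96, 95, 90],
        "province": ["California", "Northern Spain", "Tuscany"],
    }
)

# loc takes row labels and column names.
by_label = toy.loc[[0, 2], ["country", "province"]]

# iloc takes row positions and column positions.
by_position = toy.iloc[[0, 2], [0, 2]]

print(by_label)
```

The label-based form tends to be easier to read and keeps working if the column order changes.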


## 7.

Create a variable `df` containing the `country` and `variety` columns of the first 100 records. 

Hint: you may use `loc` or `iloc`. When working on the answer to this question and several of the ones that follow, keep in mind the following "gotcha" described in the [reference](https://www.kaggle.com/residentmario/indexing-selecting-assigning-reference) for this tutorial section:

> `iloc` uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. So `0:10` will select entries `0,...,9`. `loc`, meanwhile, indexes inclusively. So `0:10` will select entries `0,...,10`.

> [...]

> ...[consider] when the DataFrame index is a simple numerical list, e.g. `0,...,1000`. In this case `reviews.iloc[0:1000]` will return 1000 entries, while `reviews.loc[0:1000]` returns 1001 of them! To get 1000 elements using `loc`, you will need to go one lower and ask for `reviews.loc[0:999]`.
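The difference in slice endpoints can be checked on a small frame. This is a minimal sketch using a hypothetical 20-row frame (`toy`) with the default integer index.

```python
import pandas as pd

# Hypothetical frame with default index labels 0..19.
toy = pd.DataFrame({"value": range(20)})

with_iloc = toy.iloc[0:10]      # positions 0..9, end excluded
with_loc = toy.loc[0:10]        # labels 0..10, end included
with_loc_fixed = toy.loc[0:9]   # labels 0..9, matches the iloc slice

print(len(with_iloc), len(with_loc), len(with_loc_fixed))
```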

In [17]:
# When slicing, do not wrap the slice in a list: [[:99]] is invalid.

df = reviews.loc[:99, ['country', 'variety']]

df

Unnamed: 0,country,variety
0,US,Cabernet Sauvignon
1,Spain,Tinta de Toro
...,...,...
98,France,Merlot-Malbec
99,France,Ugni Blanc-Colombard


## 8.

Create a DataFrame `italian_wines` containing reviews of wines made in `Italy`. Hint: `reviews.country` equals what?

In [18]:
italian_wines = reviews[reviews.country == 'Italy']

italian_wines

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
10,Italy,"Elegance, complexity and structure come togeth...",Ronco della Chiesa,95,80.0,Northeastern Italy,Collio,,Friulano,Borgo del Tiglio
32,Italy,"Underbrush, scorched earth, menthol and plum s...",Vigna Piaggia,90,,Tuscany,Brunello di Montalcino,,Sangiovese,Abbadia Ardenga
...,...,...,...,...,...,...,...,...,...,...
150927,Italy,This classic example comes from a cru vineyard...,Terre di Dora,91,20.0,Southern Italy,Fiano di Avellino,,White Blend,Terredora
150929,Italy,More Pinot Grigios should taste like this. A r...,,90,15.0,Northeastern Italy,Alto Adige,,Pinot Grigio,Alois Lageder
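Conditional selection works in two steps: the comparison produces a boolean `Series`, and indexing with that `Series` keeps the rows where it is `True`. A minimal sketch on made-up data (`toy` and `mask` are hypothetical names):

```python
import pandas as pd

# Hypothetical stand-in for `reviews`; the values are made up.
toy = pd.DataFrame(
    {"country": ["US", "Italy", "Spain", "Italy"], "points": [96, 95, 90, 88]}
)

mask = toy.country == "Italy"   # boolean Series, one entry per row
italian = toy[mask]             # keeps only rows where mask is True

print(mask.tolist())
print(italian.index.tolist())
```

The selected rows keep their original index labels, which is why the output above jumps from label `10` to `32` rather than being renumbered.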


## 9.

Create a DataFrame `top_oceania_wines` containing all reviews with at least 95 points (out of 100) for wines from Australia or New Zealand.

In [20]:
# top_oceania_wines = reviews[(reviews.country == 'Australia') | (reviews.country == 'New Zealand')][reviews.points > 94]
top_oceania_wines = reviews.loc[
    (reviews.country.isin(['Australia', 'New Zealand']))
    & (reviews.points >= 95)
]

top_oceania_wines

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
2148,Australia,Full-bodied and plush yet vibrant and imbued w...,The Factor,98,125.0,South Australia,Barossa Valley,,Shiraz,Torbreck
2458,Australia,This is a top example of the classic Australia...,The Peake,96,150.0,South Australia,McLaren Vale,,Cabernet-Shiraz,Hickinbotham
...,...,...,...,...,...,...,...,...,...,...
150562,Australia,"As unevolved as they are, the dense and multil...",Grange,96,185.0,South Australia,South Australia,,Shiraz,Penfolds
150563,Australia,"Seamless luxury from stem to stern, this ‘baby...",RWT,95,70.0,South Australia,Barossa Valley,,Shiraz,Penfolds
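Conditions are combined with `&` (and) and `|` (or). Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `==` and `>=` in Python. The sketch below, on a hypothetical frame `toy` with made-up values, checks that the `isin` form and the explicit `|` form select the same rows.

```python
import pandas as pd

# Hypothetical stand-in for `reviews`; the values are made up.
toy = pd.DataFrame(
    {
        "country": ["Australia", "New Zealand", "US", "Australia"],
        "points": [96, 94, 97, 95],
    }
)

# Membership test combined with a points threshold.
with_isin = toy.loc[
    toy.country.isin(["Australia", "New Zealand"]) & (toy.points >= 95)
]

# The same selection written with explicit comparisons joined by |.
with_or = toy.loc[
    ((toy.country == "Australia") | (toy.country == "New Zealand"))
    & (toy.points >= 95)
]

print(with_isin.index.tolist())
```

Building one combined mask and applying it in a single `loc` call avoids chaining two selections one after the other.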


## Keep going

Move on to the **[Summary functions and maps workbook](https://www.kaggle.com/kernels/fork/595524)**.


