# Introduction

Selecting specific values of a pandas DataFrame or Series to work on is an implicit step in almost any data operation you'll run. One of the first things you need to learn when working with data in Python is therefore how to select the data points relevant to you quickly and effectively.

In [1]:

import pandas as pd
reviews = pd.read_csv("winemag-data-130k-v2.csv", index_col=0)
#pd.set_option('max_rows', 5)

# Native accessors

Native Python objects provide good ways of indexing data. Pandas carries all of these over, which makes it easy to get started.

Consider this DataFrame:

In [3]:
reviews

id,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,dupe?
94355,Austria,"""Chremisa,"" the ancient name of Krems, is comm...",Edition Chremisa Sandgrube 13,85,24.0,Niederösterreich,,,Roger Voss,@vossroger,Winzer Krems 2011 Edition Chremisa Sandgrube 1...,Grüner Veltliner,Winzer Krems,
126883,US,$10 for this very drinkable Cab? That's crazy....,,87,10.0,California,North Coast,North Coast,Virginie Boone,@vboone,Line 39 2009 Cabernet Sauvignon (North Coast),Cabernet Sauvignon,Line 39,
119493,US,$14 is a pretty good price for a Chardonnay th...,Whiplash,86,14.0,California,California,California Other,,,Jamieson Ranch 2011 Whiplash Chardonnay (Calif...,Chardonnay,Jamieson Ranch,
126909,Spain,"). Earth, cola and leather aromas are good, ho...",Finca Resalso,86,15.0,Northern Spain,Ribera del Duero,,Michael Schachner,@wineschach,Emilio Moro 2009 Finca Resalso (Ribera del Du...,Tinto Fino,Emilio Moro,
119752,Spain,). Light and lemony on the nose. The palate ha...,,87,17.0,Galicia,Rías Baixas,,Michael Schachner,@wineschach,La Caña 2010 Albariño (Rías Baixas),Albariño,La Caña,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
80210,Italy,Zonchera is Ceretto's more affordable base Bar...,Zonchera,90,48.0,Piedmont,Barolo,,,,Ceretto 2004 Zonchera (Barolo),Nebbiolo,Ceretto,
76487,Italy,Zonin's 2006 Amarone opens with very ripe arom...,,88,70.0,Veneto,Amarone della Valpolicella,,,,Zonin 2006 Amarone della Valpolicella,"Corvina, Rondinella, Molinara",Zonin,
86953,Italy,Zorzettig's precious Picolit dessert wine deli...,,90,,Northeastern Italy,Colli Orientali del Friuli,,,,Zorzettig 2006 Picolit (Colli Orientali del Fr...,Picolit,Zorzettig,
18824,US,Zucca has made a fragrant and floral Sangioves...,Sangiovese Rosato,87,18.0,California,Amador County,Sierra Foothills,Virginie Boone,@vboone,Zucca 2010 Sangiovese Rosato Rosé (Amador County),Rosé,Zucca,


In Python, we can access the property of an object by accessing it as an attribute. A `book` object, for example, might have a `title` property, which we can access by calling `book.title`. Columns in a pandas DataFrame work in much the same way. 

Hence to access the `country` property of `reviews` we can use:

In [5]:
reviews.country

id
94355     Austria
126883         US
119493         US
126909      Spain
119752      Spain
           ...   
80210       Italy
76487       Italy
86953       Italy
18824          US
88999     Austria
Name: country, Length: 119988, dtype: object

If we have a Python dictionary, we can access its values using the indexing (`[]`) operator. We can do the same with columns in a DataFrame:

In [7]:
reviews['country']

id
94355     Austria
126883         US
119493         US
126909      Spain
119752      Spain
           ...   
80210       Italy
76487       Italy
86953       Italy
18824          US
88999     Austria
Name: country, Length: 119988, dtype: object

These are the two ways of selecting a specific Series out of a DataFrame. Neither is more or less syntactically valid than the other, but the indexing operator `[]` has the advantage that it can handle column names containing spaces or other reserved characters (e.g. if we had a `country providence` column, `reviews.country providence` wouldn't work).
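As a minimal sketch of that difference, the snippet below builds a small made-up DataFrame (the name `toy` and its contents are assumptions for illustration, not part of the wine dataset) and reads columns both ways:

```python
import pandas as pd

# Hypothetical two-column frame; one column name contains a space.
toy = pd.DataFrame({
    "country": ["Italy", "US"],
    "country providence": ["Tuscany", "California"],
})

# Attribute access and the indexing operator return the same Series.
same = toy.country.equals(toy["country"])

# A name containing a space only works with the indexing operator;
# writing `toy.country providence` would be a SyntaxError.
spaced = toy["country providence"]
```

Here `same` is `True`, and `spaced` holds the values of the column whose name attribute access cannot express.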

Doesn't a pandas Series look kind of like a fancy dictionary? It pretty much is, so it's no surprise that, to drill down to a single specific value, we need only use the indexing operator `[]` once more:

In [9]:
reviews['country'][0]

'Italy'

# Indexing in pandas

The indexing operator and attribute selection are nice because they work just like they do in the rest of the Python ecosystem. This makes them easy for a novice to pick up and use. However, pandas has its own accessor operators, `loc` and `iloc`. For more advanced operations, these are the ones you're supposed to be using.

### Index-based selection

Pandas indexing works in one of two paradigms. The first is **index-based selection**: selecting data based on its numerical position in the data. `iloc` follows this paradigm.

To select the first row of data in a DataFrame, we may use the following:

In [11]:
reviews.iloc[0]

country                                                            Austria
description              "Chremisa," the ancient name of Krems, is comm...
designation                                  Edition Chremisa Sandgrube 13
points                                                                  85
price                                                                 24.0
province                                                  Niederösterreich
region_1                                                               NaN
region_2                                                               NaN
taster_name                                                     Roger Voss
taster_twitter_handle                                           @vossroger
title                    Winzer Krems 2011 Edition Chremisa Sandgrube 1...
variety                                                   Grüner Veltliner
winery                                                        Winzer Krems
dupe?                    

Both `loc` and `iloc` are row-first, column-second. This is the opposite of what we do in native Python, which is column-first, row-second.

This means that it's marginally easier to retrieve rows, and marginally harder to retrieve columns. To get a column with `iloc`, we can do the following:

In [13]:
reviews.iloc[:, 0]

id
94355     Austria
126883         US
119493         US
126909      Spain
119752      Spain
           ...   
80210       Italy
76487       Italy
86953       Italy
18824          US
88999     Austria
Name: country, Length: 119988, dtype: object

On its own, the `:` operator, which also comes from native Python, means "everything". When combined with other selectors, however, it can be used to indicate a range of values. For example, to select the `country` column from just the first, second, and third row, we would do:

In [15]:
reviews.iloc[:3, 0]

id
94355     Austria
126883         US
119493         US
Name: country, dtype: object

Or, to select just the second and third entries, we would do:

In [17]:
reviews.iloc[1:3, 0]

id
126883    US
119493    US
Name: country, dtype: object

It's also possible to pass a list:

In [19]:
reviews.iloc[[0, 1, 2], 0]

id
94355     Austria
126883         US
119493         US
Name: country, dtype: object

Finally, it's worth knowing that negative numbers can be used in selection. This will start counting backwards from the _end_ of the values. So, for example, here are the last five rows of the dataset:

In [21]:
reviews.iloc[-5:]

id,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,dupe?
80210,Italy,Zonchera is Ceretto's more affordable base Bar...,Zonchera,90,48.0,Piedmont,Barolo,,,,Ceretto 2004 Zonchera (Barolo),Nebbiolo,Ceretto,
76487,Italy,Zonin's 2006 Amarone opens with very ripe arom...,,88,70.0,Veneto,Amarone della Valpolicella,,,,Zonin 2006 Amarone della Valpolicella,"Corvina, Rondinella, Molinara",Zonin,
86953,Italy,Zorzettig's precious Picolit dessert wine deli...,,90,,Northeastern Italy,Colli Orientali del Friuli,,,,Zorzettig 2006 Picolit (Colli Orientali del Fr...,Picolit,Zorzettig,
18824,US,Zucca has made a fragrant and floral Sangioves...,Sangiovese Rosato,87,18.0,California,Amador County,Sierra Foothills,Virginie Boone,@vboone,Zucca 2010 Sangiovese Rosato Rosé (Amador County),Rosé,Zucca,
88999,Austria,Zweigelt can do easy-drinking styles but in th...,Heideboden,90,26.0,Burgenland,,,Anne Krebiehl MW,@AnneInVino,Nittnaus Hans und Christine 2013 Heideboden Zw...,Zweigelt,Nittnaus Hans und Christine,
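The positional selections in this section can be tried on a small made-up frame. This is only a sketch: `toy`, its values, and its index labels are assumptions chosen so that labels and positions visibly differ.

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0, 1, 2, ...
toy = pd.DataFrame(
    {"country": ["Austria", "US", "US", "Spain"], "points": [85, 87, 86, 86]},
    index=[30, 10, 20, 40],
)

first_row = toy.iloc[0]        # first row, returned as a Series
first_col = toy.iloc[:, 0]     # every row of the first column
first_two = toy.iloc[:2, 0]    # rows at positions 0 and 1 (end excluded)
picked = toy.iloc[[0, 2], 0]   # rows at positions 0 and 2, given as a list
last_two = toy.iloc[-2:]       # negative positions count back from the end
```

Because `iloc` ignores the index labels, `first_row` is the row labelled `30`, and `last_two` contains the rows labelled `20` and `40`.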


### Label-based selection

The second paradigm for attribute selection is the one followed by the `loc` operator: **label-based selection**. In this paradigm, it's the data index value, not its position, which matters.

For example, to get the `country` of the entry whose index label is `0`, we would now do the following:

In [23]:
reviews.loc[0, 'country']

'Italy'

`iloc` is conceptually simpler than `loc` because it ignores the dataset's indices. When we use `iloc` we treat the dataset like a big matrix (a list of lists), one that we have to index into by position. `loc`, by contrast, uses the information in the indices to do its work. Since your dataset usually has meaningful indices, it's usually easier to do things using `loc` instead. For example, here's one operation that's much easier using `loc`:

In [25]:
reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]

id,taster_name,taster_twitter_handle,points
94355,Roger Voss,@vossroger,85
126883,Virginie Boone,@vboone,87
119493,,,86
126909,Michael Schachner,@wineschach,86
119752,Michael Schachner,@wineschach,87
...,...,...,...
80210,,,90
76487,,,88
86953,,,90
18824,Virginie Boone,@vboone,87
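To make the contrast with `iloc` concrete, here is a small sketch on a made-up frame (the name `toy`, its values, and its index labels are assumptions for illustration). The same cell is reached once by label and once by position:

```python
import pandas as pd

# Hypothetical frame with index labels that differ from row positions.
toy = pd.DataFrame(
    {
        "country": ["Austria", "US", "Spain"],
        "points": [85, 87, 86],
        "price": [24.0, 10.0, 15.0],
    },
    index=[30, 10, 20],
)

by_label = toy.loc[10, "country"]            # row labelled 10, column by name
by_position = toy.iloc[1, 0]                 # same cell: second row, first column
subset = toy.loc[:, ["country", "points"]]   # all rows, columns chosen by name
```

With `loc` the column list reads like the question being asked; the `iloc` equivalent would need the integer positions of each column.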


### Choosing between `loc` and `iloc`

When choosing or transitioning between `loc` and `iloc`, there is one "gotcha" worth keeping in mind, which is that the two methods use slightly different indexing schemes.

`iloc` uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. So `0:10` will select entries `0,...,9`. `loc`, meanwhile, indexes inclusively. So `0:10` will select entries `0,...,10`.

Why the change? Remember that `loc` can index any stdlib type: strings, for example. If we have a DataFrame with index values `Apples, ..., Potatoes, ...`, and we want to select "all the alphabetical fruit choices between Apples and Potatoes", then it's a lot more convenient to index `df.loc['Apples':'Potatoes']` than it is to index something like `df.loc['Apples':'Potatoet']` (`t` coming after `s` in the alphabet).

This is particularly confusing when the DataFrame index is a simple numerical list, e.g. `0,...,1000`. In this case `df.iloc[0:1000]` will return 1000 entries, while `df.loc[0:1000]` returns 1001 of them! To get 1000 elements using `loc`, you will need to go one lower and ask for `df.loc[0:999]`.

Otherwise, the semantics of using `loc` are the same as those for `iloc`.
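The slicing difference is easy to check on throwaway data. The sketch below uses two made-up frames (`nums` and `produce` are hypothetical names, and their contents are invented for illustration):

```python
import pandas as pd

# Hypothetical frame with the default integer index 0, ..., 1000.
nums = pd.DataFrame({"value": range(1001)})

n_iloc = len(nums.iloc[0:1000])      # end position excluded
n_loc = len(nums.loc[0:1000])        # end label included
n_loc_fixed = len(nums.loc[0:999])   # one lower to match iloc

# Hypothetical frame with a sorted string index.
produce = pd.DataFrame(
    {"stock": [5, 3, 8, 2]},
    index=["Apples", "Bananas", "Potatoes", "Radishes"],
)
fruit = produce.loc["Apples":"Potatoes"]   # both endpoints are kept
```

`n_iloc` and `n_loc_fixed` are both 1000, `n_loc` is 1001, and `fruit` keeps the `Potatoes` row.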

# Manipulating the index

Label-based selection derives its power from the labels in the index. Critically, the index we use is not immutable. We can manipulate the index in any way we see fit.

The `set_index()` method can be used to do the job. Here is what happens when we `set_index` to the `title` field:

In [27]:
reviews.set_index("title")

title,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,variety,winery,dupe?
Winzer Krems 2011 Edition Chremisa Sandgrube 13 Grüner Veltliner (Niederösterreich),Austria,"""Chremisa,"" the ancient name of Krems, is comm...",Edition Chremisa Sandgrube 13,85,24.0,Niederösterreich,,,Roger Voss,@vossroger,Grüner Veltliner,Winzer Krems,
Line 39 2009 Cabernet Sauvignon (North Coast),US,$10 for this very drinkable Cab? That's crazy....,,87,10.0,California,North Coast,North Coast,Virginie Boone,@vboone,Cabernet Sauvignon,Line 39,
Jamieson Ranch 2011 Whiplash Chardonnay (California),US,$14 is a pretty good price for a Chardonnay th...,Whiplash,86,14.0,California,California,California Other,,,Chardonnay,Jamieson Ranch,
Emilio Moro 2009 Finca Resalso (Ribera del Duero),Spain,"). Earth, cola and leather aromas are good, ho...",Finca Resalso,86,15.0,Northern Spain,Ribera del Duero,,Michael Schachner,@wineschach,Tinto Fino,Emilio Moro,
La Caña 2010 Albariño (Rías Baixas),Spain,). Light and lemony on the nose. The palate ha...,,87,17.0,Galicia,Rías Baixas,,Michael Schachner,@wineschach,Albariño,La Caña,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Ceretto 2004 Zonchera (Barolo),Italy,Zonchera is Ceretto's more affordable base Bar...,Zonchera,90,48.0,Piedmont,Barolo,,,,Nebbiolo,Ceretto,
Zonin 2006 Amarone della Valpolicella,Italy,Zonin's 2006 Amarone opens with very ripe arom...,,88,70.0,Veneto,Amarone della Valpolicella,,,,"Corvina, Rondinella, Molinara",Zonin,
Zorzettig 2006 Picolit (Colli Orientali del Friuli),Italy,Zorzettig's precious Picolit dessert wine deli...,,90,,Northeastern Italy,Colli Orientali del Friuli,,,,Picolit,Zorzettig,
Zucca 2010 Sangiovese Rosato Rosé (Amador County),US,Zucca has made a fragrant and floral Sangioves...,Sangiovese Rosato,87,18.0,California,Amador County,Sierra Foothills,Virginie Boone,@vboone,Rosé,Zucca,


This is useful if you can come up with an index for the dataset which is better than the current one.
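One detail worth seeing in a small sketch: `set_index()` returns a new DataFrame by default and leaves the original untouched. The frame `toy` and its wine titles below are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a column that would make a good index.
toy = pd.DataFrame({"title": ["Wine A", "Wine B"], "points": [88, 91]})

# set_index returns a new DataFrame; `toy` keeps its original index.
by_title = toy.set_index("title")

# The new labels can be used directly with loc.
points_b = by_title.loc["Wine B", "points"]
```

After this, `title` is still a regular column of `toy`, while in `by_title` it has moved into the index.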

# Conditional selection

So far we've been indexing various strides of data, using structural properties of the DataFrame itself. To do *interesting* things with the data, however, we often need to ask questions based on conditions. 

For example, suppose that we're interested specifically in better-than-average wines produced in Italy.

We can start by checking if each wine is Italian or not:

In [29]:
reviews.country == 'Italy'

id
94355     False
126883    False
119493    False
126909    False
119752    False
          ...  
80210      True
76487      True
86953      True
18824     False
88999     False
Name: country, Length: 119988, dtype: bool

This operation produced a Series of `True`/`False` booleans based on the `country` of each record. This result can then be used inside of `loc` to select the relevant data:

In [31]:
reviews.loc[reviews.country == 'Italy']

id,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,dupe?
115413,Italy,). Rubiolo is a soft and lush Chianti Classico...,Rubiolo,88,23.0,Tuscany,Chianti Classico,,,,Gagliole 2009 Rubiolo (Chianti Classico),Sangiovese,Gagliole,
23362,Italy,". There's a compact, traditional characteristi...",Vigneto Bellavista,90,169.0,Tuscany,Chianti Classico,,,,Castello di Ama 2006 Vigneto Bellavista (Chia...,Sangiovese,Castello di Ama,
76094,Italy,. This rich and opulent blend of Cabernet Sauv...,Volpolo,92,50.0,Tuscany,Bolgheri,,,,Podere Sapaio 2008 Volpolo (Bolgheri),Red Blend,Podere Sapaio,
83416,Italy,"100% Cabernet Sauvignon, this opens with aroma...",Basilica del Cortaccio,91,40.0,Tuscany,Toscana,,Kerin O’Keefe,@kerinokeefe,Cafaggio 2010 Basilica del Cortaccio Cabernet ...,Cabernet Sauvignon,Cafaggio,
4515,Italy,95 Massolino 2012 Parafada (Barolo). A textb...,Parafada,95,96.0,Piedmont,Barolo,,Kerin O’Keefe,@kerinokeefe,Massolino 2012 Parafada (Barolo),Nebbiolo,Massolino,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1349,Italy,Zibibbo is usually presented as a passito dess...,,86,35.0,Sicily & Sardinia,Sicilia,,,,Barraco 2010 Zibibbo (Sicilia),Zibibbo,Barraco,
70889,Italy,Zisola is a Nero d'Avola that is aged 10 month...,,90,23.0,Sicily & Sardinia,Sicilia,,,,Zisola 2006 Nero d'Avola (Sicilia),Nero d'Avola,Zisola,
80210,Italy,Zonchera is Ceretto's more affordable base Bar...,Zonchera,90,48.0,Piedmont,Barolo,,,,Ceretto 2004 Zonchera (Barolo),Nebbiolo,Ceretto,
76487,Italy,Zonin's 2006 Amarone opens with very ripe arom...,,88,70.0,Veneto,Amarone della Valpolicella,,,,Zonin 2006 Amarone della Valpolicella,"Corvina, Rondinella, Molinara",Zonin,


This DataFrame has ~20,000 rows. The original had ~120,000 (the outputs above report a length of 119,988). That means that roughly one in six wines originates from Italy.

We also wanted to know which ones are better than average. Wines are reviewed on an 80-to-100 point scale, so this could mean wines that accrued at least 90 points.

We can use the ampersand (`&`) to bring the two questions together:

In [33]:
reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)]

id,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,dupe?
23362,Italy,". There's a compact, traditional characteristi...",Vigneto Bellavista,90,169.0,Tuscany,Chianti Classico,,,,Castello di Ama 2006 Vigneto Bellavista (Chia...,Sangiovese,Castello di Ama,
76094,Italy,. This rich and opulent blend of Cabernet Sauv...,Volpolo,92,50.0,Tuscany,Bolgheri,,,,Podere Sapaio 2008 Volpolo (Bolgheri),Red Blend,Podere Sapaio,
83416,Italy,"100% Cabernet Sauvignon, this opens with aroma...",Basilica del Cortaccio,91,40.0,Tuscany,Toscana,,Kerin O’Keefe,@kerinokeefe,Cafaggio 2010 Basilica del Cortaccio Cabernet ...,Cabernet Sauvignon,Cafaggio,
4515,Italy,95 Massolino 2012 Parafada (Barolo). A textb...,Parafada,95,96.0,Piedmont,Barolo,,Kerin O’Keefe,@kerinokeefe,Massolino 2012 Parafada (Barolo),Nebbiolo,Massolino,
81278,Italy,"A 25th anniversary special selection, this bea...",Selezione XXV Anno,93,60.0,Tuscany,Brunello di Montalcino,,,,Castello Romitorio 2006 Selezione XXV Anno (B...,Sangiovese Grosso,Castello Romitorio,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
54397,Italy,Zenato is a reliable producer of quality Amaro...,,92,30.0,Veneto,Amarone della Valpolicella Classico,,,,Zenato 2008 Amarone della Valpolicella Classico,"Corvina, Rondinella, Molinara",Zenato,
59349,Italy,Zenato's rich and powerful Recioto opens with ...,500 ml,90,,Veneto,Recioto della Valpolicella Classico,,,,Zenato 2006 500 ml (Recioto della Valpolicell...,"Corvina, Rondinella, Molinara",Zenato,
70889,Italy,Zisola is a Nero d'Avola that is aged 10 month...,,90,23.0,Sicily & Sardinia,Sicilia,,,,Zisola 2006 Nero d'Avola (Sicilia),Nero d'Avola,Zisola,
80210,Italy,Zonchera is Ceretto's more affordable base Bar...,Zonchera,90,48.0,Piedmont,Barolo,,,,Ceretto 2004 Zonchera (Barolo),Nebbiolo,Ceretto,


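Suppose we'll buy any wine that's made in Italy _or_ which is rated above average. For this we use a pipe (`|`). Note that the threshold in this cell is `min(lista)`, which evaluates to `1`, so every scored wine passes the points test and all rows come back; substituting `90` applies the same above-average cut as before:

In [35]:
lista = [1, 20, 30]
# min(lista) is 1, so `reviews.points >= min(lista)` holds for every scored wine
reviews.loc[(reviews.country == 'Italy') | (reviews.points >= min(lista))]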

id,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,dupe?
94355,Austria,"""Chremisa,"" the ancient name of Krems, is comm...",Edition Chremisa Sandgrube 13,85,24.0,Niederösterreich,,,Roger Voss,@vossroger,Winzer Krems 2011 Edition Chremisa Sandgrube 1...,Grüner Veltliner,Winzer Krems,
126883,US,$10 for this very drinkable Cab? That's crazy....,,87,10.0,California,North Coast,North Coast,Virginie Boone,@vboone,Line 39 2009 Cabernet Sauvignon (North Coast),Cabernet Sauvignon,Line 39,
119493,US,$14 is a pretty good price for a Chardonnay th...,Whiplash,86,14.0,California,California,California Other,,,Jamieson Ranch 2011 Whiplash Chardonnay (Calif...,Chardonnay,Jamieson Ranch,
126909,Spain,"). Earth, cola and leather aromas are good, ho...",Finca Resalso,86,15.0,Northern Spain,Ribera del Duero,,Michael Schachner,@wineschach,Emilio Moro 2009 Finca Resalso (Ribera del Du...,Tinto Fino,Emilio Moro,
119752,Spain,). Light and lemony on the nose. The palate ha...,,87,17.0,Galicia,Rías Baixas,,Michael Schachner,@wineschach,La Caña 2010 Albariño (Rías Baixas),Albariño,La Caña,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
80210,Italy,Zonchera is Ceretto's more affordable base Bar...,Zonchera,90,48.0,Piedmont,Barolo,,,,Ceretto 2004 Zonchera (Barolo),Nebbiolo,Ceretto,
76487,Italy,Zonin's 2006 Amarone opens with very ripe arom...,,88,70.0,Veneto,Amarone della Valpolicella,,,,Zonin 2006 Amarone della Valpolicella,"Corvina, Rondinella, Molinara",Zonin,
86953,Italy,Zorzettig's precious Picolit dessert wine deli...,,90,,Northeastern Italy,Colli Orientali del Friuli,,,,Zorzettig 2006 Picolit (Colli Orientali del Fr...,Picolit,Zorzettig,
18824,US,Zucca has made a fragrant and floral Sangioves...,Sangiovese Rosato,87,18.0,California,Amador County,Sierra Foothills,Virginie Boone,@vboone,Zucca 2010 Sangiovese Rosato Rosé (Amador County),Rosé,Zucca,
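The two combinations can be compared side by side on a tiny made-up frame. This is a sketch under assumptions: `toy` and its values are invented, and the threshold of 90 follows the "above average" cut discussed earlier.

```python
import pandas as pd

# Hypothetical frame covering all four combinations of the two conditions.
toy = pd.DataFrame({
    "country": ["Italy", "Italy", "US", "US"],
    "points": [92, 85, 91, 84],
})

is_italian = toy.country == "Italy"   # boolean Series, one value per row
is_good = toy.points >= 90

both = toy.loc[is_italian & is_good]     # Italian AND at least 90 points
either = toy.loc[is_italian | is_good]   # Italian OR at least 90 points
```

Naming the masks first avoids a common pitfall: when the comparisons are written inline, each one needs its own parentheses, because `&` and `|` bind more tightly than `==` and `>=`.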


Pandas comes with a few built-in conditional selectors, two of which we will highlight here. 

The first is `isin`. `isin` lets you select data whose value "is in" a list of values. For example, here's how we can use it to select wines only from Italy or France:

In [37]:
reviews.loc[reviews.country.isin(['Italy', 'France'])]

id,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,dupe?
115413,Italy,). Rubiolo is a soft and lush Chianti Classico...,Rubiolo,88,23.0,Tuscany,Chianti Classico,,,,Gagliole 2009 Rubiolo (Chianti Classico),Sangiovese,Gagliole,
101344,France,. Recent vintages have left Bordeaux with an a...,,87,17.0,Bordeaux,Bordeaux Supérieur,,Roger Voss,@vossroger,Château Sainte-Barbe 2010 Bordeaux Supérieur,Bordeaux-style Red Blend,Château Sainte-Barbe,
23362,Italy,". There's a compact, traditional characteristi...",Vigneto Bellavista,90,169.0,Tuscany,Chianti Classico,,,,Castello di Ama 2006 Vigneto Bellavista (Chia...,Sangiovese,Castello di Ama,
76094,Italy,. This rich and opulent blend of Cabernet Sauv...,Volpolo,92,50.0,Tuscany,Bolgheri,,,,Podere Sapaio 2008 Volpolo (Bolgheri),Red Blend,Podere Sapaio,
83416,Italy,"100% Cabernet Sauvignon, this opens with aroma...",Basilica del Cortaccio,91,40.0,Tuscany,Toscana,,Kerin O’Keefe,@kerinokeefe,Cafaggio 2010 Basilica del Cortaccio Cabernet ...,Cabernet Sauvignon,Cafaggio,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98934,France,Zind-Humbrecht owns some great Grand Cru viney...,Clos Saint Urbain Rangen de Thann Grand Cru,94,105.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Zind-Humbrecht 2012 Clos Saint Urbain ...,Pinot Gris,Domaine Zind-Humbrecht,
70889,Italy,Zisola is a Nero d'Avola that is aged 10 month...,,90,23.0,Sicily & Sardinia,Sicilia,,,,Zisola 2006 Nero d'Avola (Sicilia),Nero d'Avola,Zisola,
80210,Italy,Zonchera is Ceretto's more affordable base Bar...,Zonchera,90,48.0,Piedmont,Barolo,,,,Ceretto 2004 Zonchera (Barolo),Nebbiolo,Ceretto,
76487,Italy,Zonin's 2006 Amarone opens with very ripe arom...,,88,70.0,Veneto,Amarone della Valpolicella,,,,Zonin 2006 Amarone della Valpolicella,"Corvina, Rondinella, Molinara",Zonin,
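A short sketch of `isin` on made-up data (the frame `toy` is an assumption for illustration). It also shows that the resulting mask can be inverted with `~` to select everything else:

```python
import pandas as pd

# Hypothetical single-column frame.
toy = pd.DataFrame({"country": ["Italy", "France", "US", "Spain"]})

mask = toy.country.isin(["Italy", "France"])   # True where the value is in the list
selected = toy.loc[mask]
others = toy.loc[~mask]                        # ~ flips the boolean mask
```

`isin` is equivalent to chaining several `==` comparisons with `|`, but it stays readable as the list of accepted values grows.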


The second is `isnull` (and its companion `notnull`). These methods let you highlight values which are (or are not) empty (`NaN`). For example, to filter out wines lacking a price tag in the dataset, here's what we would do:

In [39]:
reviews.loc[reviews.price.notnull()]

id,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,dupe?
94355,Austria,"""Chremisa,"" the ancient name of Krems, is comm...",Edition Chremisa Sandgrube 13,85,24.0,Niederösterreich,,,Roger Voss,@vossroger,Winzer Krems 2011 Edition Chremisa Sandgrube 1...,Grüner Veltliner,Winzer Krems,
126883,US,$10 for this very drinkable Cab? That's crazy....,,87,10.0,California,North Coast,North Coast,Virginie Boone,@vboone,Line 39 2009 Cabernet Sauvignon (North Coast),Cabernet Sauvignon,Line 39,
119493,US,$14 is a pretty good price for a Chardonnay th...,Whiplash,86,14.0,California,California,California Other,,,Jamieson Ranch 2011 Whiplash Chardonnay (Calif...,Chardonnay,Jamieson Ranch,
126909,Spain,"). Earth, cola and leather aromas are good, ho...",Finca Resalso,86,15.0,Northern Spain,Ribera del Duero,,Michael Schachner,@wineschach,Emilio Moro 2009 Finca Resalso (Ribera del Du...,Tinto Fino,Emilio Moro,
119752,Spain,). Light and lemony on the nose. The palate ha...,,87,17.0,Galicia,Rías Baixas,,Michael Schachner,@wineschach,La Caña 2010 Albariño (Rías Baixas),Albariño,La Caña,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129643,US,Zocker flies under the radar with this Austria...,Paragon Vineyard,90,20.0,California,Edna Valley,Central Coast,,,Zocker 2012 Paragon Vineyard Grüner Veltliner ...,Grüner Veltliner,Zocker,
80210,Italy,Zonchera is Ceretto's more affordable base Bar...,Zonchera,90,48.0,Piedmont,Barolo,,,,Ceretto 2004 Zonchera (Barolo),Nebbiolo,Ceretto,
76487,Italy,Zonin's 2006 Amarone opens with very ripe arom...,,88,70.0,Veneto,Amarone della Valpolicella,,,,Zonin 2006 Amarone della Valpolicella,"Corvina, Rondinella, Molinara",Zonin,
18824,US,Zucca has made a fragrant and floral Sangioves...,Sangiovese Rosato,87,18.0,California,Amador County,Sierra Foothills,Virginie Boone,@vboone,Zucca 2010 Sangiovese Rosato Rosé (Amador County),Rosé,Zucca,
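The same pattern on a small made-up frame, as a sketch (`toy` and its values are assumptions for illustration), showing `notnull` and `isnull` as complementary masks:

```python
import pandas as pd

# Hypothetical frame with one missing price.
toy = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C"],
    "price": [24.0, float("nan"), 15.0],
})

priced = toy.loc[toy.price.notnull()]    # rows that have a price
unpriced = toy.loc[toy.price.isnull()]   # rows where the price is NaN
```

Every row lands in exactly one of the two results, so `len(priced) + len(unpriced)` equals `len(toy)`.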


# Assigning data

Going the other way, assigning data to a DataFrame is easy. You can assign either a constant value:

In [45]:
# Plain-dict analogy: assigning to a new key adds it to the dictionary.
a = {"age": 20}
a['phone'] = 100

In [47]:
reviews['critic'] = 'everyone'
reviews['critic']

id
94355     everyone
126883    everyone
119493    everyone
126909    everyone
119752    everyone
            ...   
80210     everyone
76487     everyone
86953     everyone
18824     everyone
88999     everyone
Name: critic, Length: 119988, dtype: object

Or with an iterable of values:

In [49]:
reviews['index_backwards'] = range(len(reviews), 0, -1)
reviews['index_backwards']

id
94355     119988
126883    119987
119493    119986
126909    119985
119752    119984
           ...  
80210          5
76487          4
86953          3
18824          2
88999          1
Name: index_backwards, Length: 119988, dtype: int64
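Both kinds of assignment can be checked on a tiny made-up frame. In this sketch `toy` and its values are assumptions for illustration; the column names mirror the ones used above.

```python
import pandas as pd

# Hypothetical three-row frame.
toy = pd.DataFrame({"country": ["Italy", "US", "Spain"]})

# A constant is broadcast to every row.
toy["critic"] = "everyone"

# An iterable is assigned element by element and must match the row count.
toy["index_backwards"] = range(len(toy), 0, -1)
```

If the iterable's length differs from the number of rows, pandas raises a `ValueError` rather than padding or truncating.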

# Your turn
