# Native accessors
Native Python objects provide good ways of indexing data, and pandas carries them over.

In [2]:
import pandas as pd
wine_reviews = pd.read_csv("E:/winemag-data_first150k.csv", index_col=0)
wine_reviews

,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,US,This tremendous 100% varietal wine hails from ...,Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
1,Spain,"Ripe aromas of fig, blackberry and cassis are ...",Carodorum Selección Especial Reserva,96,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
2,US,Mac Watson honors the memory of a wine once ma...,Special Selected Late Harvest,96,90.0,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley
3,US,"This spent 20 months in 30% new French oak, an...",Reserve,96,65.0,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi
4,France,"This is the top wine from La Bégude, named aft...",La Brûlade,95,66.0,Provence,Bandol,,Provence red blend,Domaine de la Bégude
...,...,...,...,...,...,...,...,...,...,...
150925,Italy,Many people feel Fiano represents southern Ita...,,91,20.0,Southern Italy,Fiano di Avellino,,White Blend,Feudi di San Gregorio
150926,France,"Offers an intriguing nose with ginger, lime an...",Cuvée Prestige,91,27.0,Champagne,Champagne,,Champagne Blend,H.Germain
150927,Italy,This classic example comes from a cru vineyard...,Terre di Dora,91,20.0,Southern Italy,Fiano di Avellino,,White Blend,Terredora
150928,France,"A perfect salmon shade, with scents of peaches...",Grand Brut Rosé,90,52.0,Champagne,Champagne,,Champagne Blend,Gosset


In [3]:
# To access a particular column, we can use attribute access
wine_reviews.country

0             US
1          Spain
2             US
3             US
4         France
           ...  
150925     Italy
150926    France
150927     Italy
150928    France
150929     Italy
Name: country, Length: 150930, dtype: object

In [4]:
# Alternatively, we can use dictionary-style indexing with the [] operator
wine_reviews['country']

0             US
1          Spain
2             US
3             US
4         France
           ...  
150925     Italy
150926    France
150927     Italy
150928    France
150929     Italy
Name: country, Length: 150930, dtype: object

In [5]:
# Indexing the resulting Series again selects a single value (column first, then row)
wine_reviews['country'][0]

'US'
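Attribute access and the [] operator usually return the same column, but they are not fully interchangeable. The sketch below uses a small hypothetical DataFrame (the "count" and "wine name" columns are invented for illustration) to show two cases where only [] works: a column whose name clashes with an existing DataFrame method, and a column name that contains a space.

```python
import pandas as pd

# Hypothetical stand-in for wine_reviews; "count" and "wine name" are invented columns
reviews = pd.DataFrame({
    "country": ["US", "Spain", "US"],
    "count": [1, 2, 3],
    "wine name": ["Martha's Vineyard", "Reserva", "Reserve"],
})

# For a simple name, attribute access and [] give the same Series
same = reviews.country.equals(reviews["country"])

# reviews.count is the DataFrame.count method, so this column needs []
count_is_method = callable(reviews.count)
counts = reviews["count"].tolist()

# A name with a space cannot be written as an attribute at all
names = reviews["wine name"].tolist()

# Chaining [] selects the column first, then the row label
first_country = reviews["country"][0]

print(same, count_is_method, first_country)
```

In practice, [] is the safer default when column names come from external data.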

# Indexing in Pandas
Pandas has its own accessor operators, loc and iloc.
## Index-based selection
Pandas indexing works in one of two paradigms. The first is index-based selection: selecting data based on its numerical position in the data. iloc follows this paradigm.

In [6]:
wine_reviews.iloc[0]

country                                                       US
description    This tremendous 100% varietal wine hails from ...
designation                                    Martha's Vineyard
points                                                        96
price                                                        235
province                                              California
region_1                                             Napa Valley
region_2                                                    Napa
variety                                       Cabernet Sauvignon
winery                                                     Heitz
Name: 0, dtype: object

Both loc and iloc are row-first, column-second. This is the opposite of native Python, which is column-first, row-second (as in wine_reviews['country'][0]). This means it is marginally easier to retrieve rows and marginally harder to retrieve columns. To get a column with iloc:

In [7]:
wine_reviews.iloc[:, 0]

0             US
1          Spain
2             US
3             US
4         France
           ...  
150925     Italy
150926    France
150927     Italy
150928    France
150929     Italy
Name: country, Length: 150930, dtype: object
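To make the row-first versus column-first difference concrete, here is a minimal sketch on a hypothetical three-row frame standing in for wine_reviews. Both lookups reach the same cell, but the arguments come in opposite orders.

```python
import pandas as pd

# Hypothetical three-row stand-in for wine_reviews
reviews = pd.DataFrame({
    "country": ["US", "Spain", "US"],
    "points": [96, 96, 96],
})

# Native Python style: column first, then row
native = reviews["country"][1]

# iloc: row position first, then column position
positional = reviews.iloc[1, 0]

# A single argument to iloc selects a row...
row = reviews.iloc[1]
# ...so selecting a column needs ":" ("all rows") in the first slot
column = reviews.iloc[:, 0]

print(native, positional, row["country"], column.tolist())
```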

In [8]:
# On its own, : selects every row
# With bounds, it selects a specific range; here, the first three rows (positions 0, 1, 2)
wine_reviews.iloc[:3, 0]

0       US
1    Spain
2       US
Name: country, dtype: object

In [9]:
# or to select only the second and third rows (positions 1 and 2; the end position 3 is excluded)
wine_reviews.iloc[1:3, 0]

1    Spain
2       US
Name: country, dtype: object

In [10]:
# The same rows can also be selected by passing a list of positions
wine_reviews.iloc[[0, 1, 2], 0]

0       US
1    Spain
2       US
Name: country, dtype: object
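A slice and a list of positions can pick the same rows, but they behave differently: the slice end is excluded, while a list is taken literally, in the order given. A small sketch on made-up data:

```python
import pandas as pd

# Hypothetical stand-in: five rows with the default 0..4 index
reviews = pd.DataFrame({"country": ["US", "Spain", "US", "US", "France"]})

# Slice: start included, end excluded, so positions 1 and 2
sliced = reviews.iloc[1:3, 0]

# List: exactly the positions given, in the order given
listed = reviews.iloc[[2, 0, 4], 0]

print(sliced.tolist())
print(listed.index.tolist(), listed.tolist())
```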

In [12]:
# Negative numbers count from the end, so -5: selects the last five rows
wine_reviews.iloc[-5:]

,country,description,designation,points,price,province,region_1,region_2,variety,winery
150925,Italy,Many people feel Fiano represents southern Ita...,,91,20.0,Southern Italy,Fiano di Avellino,,White Blend,Feudi di San Gregorio
150926,France,"Offers an intriguing nose with ginger, lime an...",Cuvée Prestige,91,27.0,Champagne,Champagne,,Champagne Blend,H.Germain
150927,Italy,This classic example comes from a cru vineyard...,Terre di Dora,91,20.0,Southern Italy,Fiano di Avellino,,White Blend,Terredora
150928,France,"A perfect salmon shade, with scents of peaches...",Grand Brut Rosé,90,52.0,Champagne,Champagne,,Champagne Blend,Gosset
150929,Italy,More Pinot Grigios should taste like this. A r...,,90,15.0,Northeastern Italy,Alto Adige,,Pinot Grigio,Alois Lageder
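Negative positions follow the same rules as Python lists: -1 is the last row, -2 the one before it, and so on. A minimal sketch on a hypothetical frame, also checking that iloc[-5:] matches DataFrame.tail(5):

```python
import pandas as pd

# Hypothetical seven-row stand-in for wine_reviews
reviews = pd.DataFrame({"points": [96, 96, 96, 96, 95, 91, 90]})

# -1 is the last row
last = reviews.iloc[-1]

# A negative start in a slice counts from the end: the last five rows
last_five = reviews.iloc[-5:]

# tail(5) returns the same rows
same_as_tail = last_five.equals(reviews.tail(5))
```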


## Label-based selection
The second paradigm is label-based selection, which loc follows. Here it is the data's index value, not its position, that matters.

In [13]:
wine_reviews.loc[0, 'country']

'US'

iloc is conceptually simpler than loc because it ignores the dataset's indices. When we use iloc, we treat the dataset like a big matrix (a list of lists), one that we have to index into by position. loc, by contrast, uses the information in the indices to do this work.
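The difference only shows up once the index labels stop matching the positions, for example after sorting. The sketch below uses invented rows and sorts them by price with sort_values, which reorders the rows but keeps each row's original label:

```python
import pandas as pd

# Hypothetical rows; the default index labels are 0, 1, 2
reviews = pd.DataFrame({
    "country": ["US", "Spain", "US"],
    "price": [235.0, 110.0, 90.0],
})

# Sorting keeps the labels, so the index becomes 2, 1, 0
by_price = reviews.sort_values("price")

# iloc ignores the labels: position 0 is now the cheapest wine (label 2)
first_by_position = by_price.iloc[0]["price"]

# loc uses the labels: label 0 is still the $235 wine
label_zero = by_price.loc[0, "price"]
```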

In [14]:
wine_reviews.loc[:, ['designation', 'points', 'price']]

,designation,points,price
0,Martha's Vineyard,96,235.0
1,Carodorum Selección Especial Reserva,96,110.0
2,Special Selected Late Harvest,96,90.0
3,Reserve,96,65.0
4,La Brûlade,95,66.0
...,...,...,...
150925,,91,20.0
150926,Cuvée Prestige,91,27.0
150927,Terre di Dora,91,20.0
150928,Grand Brut Rosé,90,52.0


### Choosing between loc and iloc
- iloc uses the standard Python indexing scheme, where the first element of the range is included and the last one is excluded. So 0:10 will select entries 0,...,9.
- loc, meanwhile, indexes inclusively. So 0:10 will select entries 0,...,10.
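The off-by-one difference is easy to check on a hypothetical 20-row frame with the default 0..19 index:

```python
import pandas as pd

# Hypothetical 20-row frame with the default 0..19 index
reviews = pd.DataFrame({"points": range(80, 100)})

by_position = reviews.iloc[0:10]  # positions 0..9, end excluded
by_label = reviews.loc[0:10]      # labels 0..10, end included

print(len(by_position), len(by_label))
```

So with loc, getting the first ten rows of a default index means writing 0:9.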

# Manipulating the index
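Label-based selection derives its power from the labels in the index. Critically, the index we use is not immutable: we can manipulate it as we see fit.
- The set_index() method can be used to do the job. It returns a new DataFrame, so wine_reviews itself keeps its original index unless we reassign the result.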

In [15]:
wine_reviews.head()

,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,US,This tremendous 100% varietal wine hails from ...,Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
1,Spain,"Ripe aromas of fig, blackberry and cassis are ...",Carodorum Selección Especial Reserva,96,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
2,US,Mac Watson honors the memory of a wine once ma...,Special Selected Late Harvest,96,90.0,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley
3,US,"This spent 20 months in 30% new French oak, an...",Reserve,96,65.0,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi
4,France,"This is the top wine from La Bégude, named aft...",La Brûlade,95,66.0,Provence,Bandol,,Provence red blend,Domaine de la Bégude


In [22]:
wine_reviews.set_index("price")

price,country,description,designation,points,province,region_1,region_2,variety,winery
235.0,US,This tremendous 100% varietal wine hails from ...,Martha's Vineyard,96,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
110.0,Spain,"Ripe aromas of fig, blackberry and cassis are ...",Carodorum Selección Especial Reserva,96,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
90.0,US,Mac Watson honors the memory of a wine once ma...,Special Selected Late Harvest,96,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley
65.0,US,"This spent 20 months in 30% new French oak, an...",Reserve,96,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi
66.0,France,"This is the top wine from La Bégude, named aft...",La Brûlade,95,Provence,Bandol,,Provence red blend,Domaine de la Bégude
...,...,...,...,...,...,...,...,...,...
20.0,Italy,Many people feel Fiano represents southern Ita...,,91,Southern Italy,Fiano di Avellino,,White Blend,Feudi di San Gregorio
27.0,France,"Offers an intriguing nose with ginger, lime an...",Cuvée Prestige,91,Champagne,Champagne,,Champagne Blend,H.Germain
20.0,Italy,This classic example comes from a cru vineyard...,Terre di Dora,91,Southern Italy,Fiano di Avellino,,White Blend,Terredora
52.0,France,"A perfect salmon shade, with scents of peaches...",Grand Brut Rosé,90,Champagne,Champagne,,Champagne Blend,Gosset
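With price as the index, loc looks rows up by price. The sketch below uses a hypothetical subset of the rows above; note that index labels need not be unique, so the two $20 wines both come back, and reset_index moves the index back into an ordinary column.

```python
import pandas as pd

# Hypothetical subset of the rows shown above
reviews = pd.DataFrame({
    "country": ["US", "Spain", "Italy", "Italy"],
    "winery": ["Heitz", "Bodega Carmen Rodríguez", "Feudi di San Gregorio", "Terredora"],
    "price": [235.0, 110.0, 20.0, 20.0],
})

# set_index returns a new DataFrame; reviews itself is unchanged
by_price = reviews.set_index("price")

# Duplicate labels: loc returns every matching row as a DataFrame
twenty = by_price.loc[20.0]

# reset_index turns the index back into the first column
restored = by_price.reset_index()
```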


## Conditional Selection

In [24]:
# Say we are interested in wines produced in the US.
wine_reviews.country == 'US'

0          True
1         False
2          True
3          True
4         False
          ...  
150925    False
150926    False
150927    False
150928    False
150929    False
Name: country, Length: 150930, dtype: bool

This operation produced a Series of True/False booleans based on the country of each record. This result can then be used inside of loc to select the relevant data:

In [26]:
wine_reviews.loc[wine_reviews.country == 'India']

,country,description,designation,points,price,province,region_1,region_2,variety,winery
1156,India,"Dark violet-red in color, this wine has a bouq...",Estate Bottled,90,13.0,Nashik,,,Shiraz,Sula
2292,India,"Aromas of blackberry, cherry preserves, white ...",Estate Bottled,91,12.0,Nashik,,,Shiraz,Sula
3145,India,A nose of cut herbs and just-mown grass backed...,,89,12.0,Nashik,,,Sauvignon Blanc,Sula
3174,India,"Aromas of pink grapefruit, grass and coriander...",,90,12.0,Nashik,,,Sauvignon Blanc,Sula
3840,India,"Pineapple, grapefruit, and apricot show bright...",,90,10.0,Nashik,,,Chenin Blanc,Sula
7804,India,This wine features a fresh nose of grapefruit ...,,87,12.0,Nashik,,,Chenin Blanc,Sula
36579,India,Charred wood and smoke dominate the nose and p...,Concerto Collection Basso Reserve,82,20.0,Nashik,,,Cabernet Sauvignon,Good Earth Wine
80529,India,Charred wood and smoke dominate the nose and p...,Concerto Collection Basso Reserve,82,20.0,Nashik,,,Cabernet Sauvignon,Good Earth Wine
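The mask is an ordinary boolean Series aligned on the index, so it can also be stored and reused. A minimal sketch with made-up rows:

```python
import pandas as pd

# Hypothetical four-row stand-in for wine_reviews
reviews = pd.DataFrame({
    "country": ["US", "India", "US", "India"],
    "points": [96, 90, 88, 82],
})

# The comparison builds a boolean Series aligned on the index
is_india = reviews.country == "India"

# Summing booleans counts the True values
n_india = int(is_india.sum())

# loc keeps only the rows where the mask is True
india = reviews.loc[is_india]

# Plain [] with a boolean mask selects the same rows
same_rows = india.equals(reviews[is_india])
```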


Suppose we also want to know which of these wines are better than average. Wines are reviewed on an 80-to-100-point scale, so here we take that to mean wines that accrued at least 85 points. The two conditions are combined with &:

In [29]:
wine_reviews.loc[(wine_reviews.country == "India") & (wine_reviews.points >= 85)]

,country,description,designation,points,price,province,region_1,region_2,variety,winery
1156,India,"Dark violet-red in color, this wine has a bouq...",Estate Bottled,90,13.0,Nashik,,,Shiraz,Sula
2292,India,"Aromas of blackberry, cherry preserves, white ...",Estate Bottled,91,12.0,Nashik,,,Shiraz,Sula
3145,India,A nose of cut herbs and just-mown grass backed...,,89,12.0,Nashik,,,Sauvignon Blanc,Sula
3174,India,"Aromas of pink grapefruit, grass and coriander...",,90,12.0,Nashik,,,Sauvignon Blanc,Sula
3840,India,"Pineapple, grapefruit, and apricot show bright...",,90,10.0,Nashik,,,Chenin Blanc,Sula
7804,India,This wine features a fresh nose of grapefruit ...,,87,12.0,Nashik,,,Chenin Blanc,Sula


In [30]:
# Or we want wines that satisfy either of the conditions, combined with | (or)
wine_reviews.loc[(wine_reviews.country == 'India') | (wine_reviews.points >= 85)]

,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,US,This tremendous 100% varietal wine hails from ...,Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
1,Spain,"Ripe aromas of fig, blackberry and cassis are ...",Carodorum Selección Especial Reserva,96,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
2,US,Mac Watson honors the memory of a wine once ma...,Special Selected Late Harvest,96,90.0,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley
3,US,"This spent 20 months in 30% new French oak, an...",Reserve,96,65.0,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi
4,France,"This is the top wine from La Bégude, named aft...",La Brûlade,95,66.0,Provence,Bandol,,Provence red blend,Domaine de la Bégude
...,...,...,...,...,...,...,...,...,...,...
150925,Italy,Many people feel Fiano represents southern Ita...,,91,20.0,Southern Italy,Fiano di Avellino,,White Blend,Feudi di San Gregorio
150926,France,"Offers an intriguing nose with ginger, lime an...",Cuvée Prestige,91,27.0,Champagne,Champagne,,Champagne Blend,H.Germain
150927,Italy,This classic example comes from a cru vineyard...,Terre di Dora,91,20.0,Southern Italy,Fiano di Avellino,,White Blend,Terredora
150928,France,"A perfect salmon shade, with scents of peaches...",Grand Brut Rosé,90,52.0,Champagne,Champagne,,Champagne Blend,Gosset
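Conditions are combined element-wise with & (and), | (or) and ~ (not). The parentheses around each comparison are required: & and | bind more tightly than ==, so without them Python would evaluate "India" & wine_reviews.points first, which is not the intended comparison. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical four-row stand-in for wine_reviews
reviews = pd.DataFrame({
    "country": ["US", "India", "US", "India"],
    "points": [96, 90, 84, 82],
})

india = reviews.country == "India"
good = reviews.points >= 85

both = reviews.loc[india & good]     # India and at least 85 points
either = reviews.loc[india | good]   # India or at least 85 points
not_india = reviews.loc[~india]      # everything except India
```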


# Built-in conditional selectors
- .isin
This lets us select data whose value "is in" a list of values.
For example, to select only the wines from Italy:

In [34]:
wine_reviews.loc[wine_reviews.country.isin(['Italy'])]

,country,description,designation,points,price,province,region_1,region_2,variety,winery
10,Italy,"Elegance, complexity and structure come togeth...",Ronco della Chiesa,95,80.0,Northeastern Italy,Collio,,Friulano,Borgo del Tiglio
32,Italy,"Underbrush, scorched earth, menthol and plum s...",Vigna Piaggia,90,,Tuscany,Brunello di Montalcino,,Sangiovese,Abbadia Ardenga
35,Italy,"Forest floor, tilled soil, mature berry and a ...",Riserva,90,135.0,Tuscany,Brunello di Montalcino,,Sangiovese,Carillon
37,Italy,"Aromas of forest floor, violet, red berry and ...",,90,29.0,Tuscany,Vino Nobile di Montepulciano,,Sangiovese,Avignonesi
38,Italy,"This has a charming nose that boasts rose, vio...",,90,23.0,Tuscany,Chianti Classico,,Sangiovese,Casina di Cornia
...,...,...,...,...,...,...,...,...,...,...
150920,Italy,"Rich and mature aromas of smoke, earth and her...",Brut Riserva,91,19.0,Northeastern Italy,Trento,,Champagne Blend,Letrari
150922,Italy,Made by 30-ish Roberta Borghese high above Man...,Superiore,91,,Northeastern Italy,Colli Orientali del Friuli,,Tocai,Ronchi di Manzano
150925,Italy,Many people feel Fiano represents southern Ita...,,91,20.0,Southern Italy,Fiano di Avellino,,White Blend,Feudi di San Gregorio
150927,Italy,This classic example comes from a cru vineyard...,Terre di Dora,91,20.0,Southern Italy,Fiano di Avellino,,White Blend,Terredora
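isin is most useful with several values, where it replaces a chain of == comparisons joined by |. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical five-row stand-in for wine_reviews
reviews = pd.DataFrame({"country": ["US", "Italy", "France", "Spain", "Italy"]})

# isin tests each value against the whole list at once
italy_or_france = reviews.loc[reviews.country.isin(["Italy", "France"])]

# Equivalent selection written with == and |
chained = reviews.loc[(reviews.country == "Italy") | (reviews.country == "France")]
```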


- .isnull
It also has a companion, .notnull. These methods let us highlight values which are (or are not) empty, such as NaN.
For example, to find the wines lacking a price tag in the dataset:


In [35]:
wine_reviews.loc[wine_reviews.price.isnull()]

,country,description,designation,points,price,province,region_1,region_2,variety,winery
32,Italy,"Underbrush, scorched earth, menthol and plum s...",Vigna Piaggia,90,,Tuscany,Brunello di Montalcino,,Sangiovese,Abbadia Ardenga
56,France,"Delicious while also young and textured, this ...",Le Pavé,90,,Loire Valley,Sancerre,,Sauvignon Blanc,Domaine Vacheron
72,Italy,"This offers aromas of red rose, wild berry, da...",Bussia Riserva,91,,Piedmont,Barolo,,Nebbiolo,Silvano Bolmida
82,Italy,"Berry, baking spice, dried iris, mint and a hi...",Palliano Riserva,91,,Piedmont,Roero,,Nebbiolo,Ceste
116,Spain,Aromas of brandied cherry and crème de cassis ...,Dulce Tinto,86,,Levante,Jumilla,,Monastrell,Casa de la Ermita
...,...,...,...,...,...,...,...,...,...,...
150377,New Zealand,"Light and a bit herbal, like a pleasant St.-Jo...",Matheson,84,,Hawke's Bay,,,Syrah,Matua Valley
150378,New Zealand,"Impressive purple color, but less intense on t...",,84,,Martinborough,,,Syrah,Kusuda
150587,Canada,"Shows pronounced oily, earthy, almost tobacco-...",Icewine,90,,Ontario,Lake Erie North Shore,,Riesling,Colio
150673,US,"Cherry-scented, clean and fruity. Good concent...",,87,,California,Dry Creek Valley,Sonoma,Zinfandel,Taft Street
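isnull and notnull return complementary boolean Series, so one finds the missing prices and the other keeps only priced wines; summing the mask counts the gaps. A sketch with invented rows:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with two missing prices
reviews = pd.DataFrame({
    "country": ["Italy", "France", "US"],
    "price": [np.nan, np.nan, 65.0],
})

missing = reviews.loc[reviews.price.isnull()]   # rows without a price
priced = reviews.loc[reviews.price.notnull()]   # rows with a price

# isnull() is a boolean Series, so sum() counts the missing prices
n_missing = int(reviews.price.isnull().sum())
```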


# Assigning Data
It is comparatively easy to assign data to a DataFrame. We can assign a constant value to every row:

In [38]:
wine_reviews['critic'] = "everyone"
wine_reviews['critic']

0         everyone
1         everyone
2         everyone
3         everyone
4         everyone
            ...   
150925    everyone
150926    everyone
150927    everyone
150928    everyone
150929    everyone
Name: critic, Length: 150930, dtype: object
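A scalar is broadcast to every row. Assigning to a new column name creates the column, while assigning to an existing name overwrites it. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical three-row stand-in for wine_reviews
reviews = pd.DataFrame({"country": ["US", "Spain", "US"]})

# A scalar is repeated for every row of the new column
reviews["critic"] = "everyone"

# Assigning to an existing column replaces its values
reviews["critic"] = "nobody"
```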

In [39]:
# or with an iterable of values, one per row
wine_reviews['index_backwards'] = range(len(wine_reviews), 0, -1)
wine_reviews['index_backwards']

0         150930
1         150929
2         150928
3         150927
4         150926
           ...  
150925         5
150926         4
150927         3
150928         2
150929         1
Name: index_backwards, Length: 150930, dtype: int32
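An iterable must supply exactly one value per row; pandas raises a ValueError when the lengths differ instead of padding or truncating. The sketch below, on a hypothetical three-row frame, shows both cases (the "too_short" column is invented for illustration):

```python
import pandas as pd

# Hypothetical three-row stand-in for wine_reviews
reviews = pd.DataFrame({"country": ["US", "Spain", "US"]})

# One value per row, counting down from len(reviews) to 1
reviews["index_backwards"] = range(len(reviews), 0, -1)

# A mismatched length is rejected, and the column is not created
try:
    reviews["too_short"] = [1, 2]
    length_error = False
except ValueError:
    length_error = True
```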