# Session: Data Manipulation
## Introduction
We'll be using the wine-reviews dataset for today's session.

Let's have a look at our dataset:


In [None]:
import pandas as pd

wine_reviews = pd.read_csv("/content/wine-dataset.csv", index_col=0)
pd.set_option("display.max_rows", 5)

In [None]:
wine_reviews.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


In [None]:
wine_reviews.shape

(10336, 13)

# Checking and removing duplicates

In [None]:
wine_reviews.duplicated()

0        False
1        False
         ...  
33606    False
33607    False
Length: 33608, dtype: bool

In [None]:
duplicate_rows = wine_reviews.duplicated()

duplicate_rows.head(2409)

0       False
1       False
        ...  
2407    False
2408     True
Length: 2409, dtype: bool

In [None]:
wine_reviews[duplicate_rows]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
2408,US,"This is weighty, creamy and medium to full in ...",,85,14.0,California,North Coast,North Coast,Virginie Boone,@vboone,Souverain 2010 Chardonnay (North Coast),Chardonnay,Souverain
2409,Italy,There's a touch of toasted almond at the start...,Sallier de la Tour,85,13.0,Sicily & Sardinia,Sicilia,,,,Tasca d'Almerita 2011 Sallier de la Tour Grill...,Grillo,Tasca d'Almerita
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10148,Portugal,A surprisingly fresh wine for an Alvarinho fro...,Terra d'Alter,87,14.0,Alentejano,,,Roger Voss,@vossroger,Terras de Alter 2014 Terra d'Alter Alvarinho (...,Alvarinho,Terras de Alter
10149,Portugal,"This is an attractive, perfumed wine that is a...",Terra d'Alter,87,10.0,Alentejano,,,Roger Voss,@vossroger,Terras de Alter 2014 Terra d'Alter White (Alen...,Portuguese White,Terras de Alter


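These are the duplicated rows. `duplicated()` returned one value per row (33,608 in total), and after dropping the duplicates 33,005 rows remain, so 603 rows were duplicates.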
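To get the number of duplicates directly instead of reading it off the printed output, you can sum the boolean mask (each `True` counts as 1). The minimal sketch below uses a made-up DataFrame (the columns and values are hypothetical) to show `sum()`, plus the `keep=False` and `subset` parameters of `duplicated()`:

```python
import pandas as pd

# Hypothetical toy data: row 2 is an exact copy of row 0
df = pd.DataFrame({
    "country": ["Italy", "US", "Italy", "US"],
    "points": [87, 87, 87, 90],
    "price": [15.0, 14.0, 15.0, 14.0],
})

# duplicated() marks every repeat after the first occurrence as True
n_duplicates = df.duplicated().sum()

# keep=False marks *all* copies, including the first occurrence
all_copies = df[df.duplicated(keep=False)]

# subset= compares only the listed columns
same_country = df.duplicated(subset=["country"]).sum()

print(n_duplicates)     # 1
print(len(all_copies))  # 2
print(same_country)     # 2
```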

Now let's drop these rows:

In [None]:
wine_reviews.drop_duplicates(inplace=True)
# inplace=True removes the duplicates from wine_reviews itself instead of returning a new DataFrame

wine_reviews

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
...,...,...,...,...,...,...,...,...,...,...,...,...,...
33606,US,"This is soft, smooth and refined, with rich, c...",Estate Grown and Bottled,91,25.0,California,Santa Ynez Valley,Central Coast,,,Zaca Mesa 2009 Estate Grown and Bottled Syrah ...,Syrah,Zaca Mesa
33607,Italy,"Bosco Faiano is a hearty, masculine wine, aged...",Bosco Faiano,91,45.0,South,,,,,,,


In [None]:
wine_reviews.shape

(33005, 13)

# Dropping columns
We drop unwanted features, i.e. columns that don't contribute useful information and might only add noise.

This is another part of data cleaning.

To check and review a column's contents, let's use `value_counts()`:

In [None]:
wine_reviews.taster_name

0        Kerin O’Keefe
1           Roger Voss
             ...      
33606              NaN
33607              NaN
Name: taster_name, Length: 33005, dtype: object

In [None]:
wine_reviews.taster_name.value_counts()

Roger Voss           2005
Michael Schachner    1248
                     ... 
Carrie Dykes            6
Fiona Adams             3
Name: taster_name, Length: 18, dtype: int64

In [None]:
#wine_reviews.points

wine_reviews.price

0         NaN
1        15.0
         ... 
33606    25.0
33607    45.0
Name: price, Length: 33005, dtype: float64

In [None]:
wine_reviews.title.value_counts()

#wine_reviews.designation.value_counts()

Korbel NV Brut Sparkling (California)                                4
G. H. Mumm NV Brut Rosé  (Champagne)                                 3
                                                                    ..
Torii Mor 2013 Nysa Vineyard Pinot Noir (Dundee Hills)               1
Zaca Mesa 2009 Estate Grown and Bottled Syrah (Santa Ynez Valley)    1
Name: title, Length: 32916, dtype: int64

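Evidently, we don't need the `title` column: almost every title is unique (32,916 distinct values across 33,005 rows), and it mostly repeats information already held in `winery`, `designation`, `variety` and the region columns.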
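One way to back this up with numbers is to compare each column's count of distinct values with the number of rows: a column where nearly every value is unique behaves like an identifier. Below is a minimal sketch on a hypothetical toy frame; the 0.9 cutoff is an arbitrary assumption for illustration, not a rule:

```python
import pandas as pd

# Hypothetical toy data standing in for wine_reviews
df = pd.DataFrame({
    "title": ["A 2013 Red", "B 2011 White", "C 2013 Rosé", "D 2012 Red"],
    "country": ["Italy", "US", "US", "Italy"],
    "points": [87, 88, 87, 90],
})

# Fraction of distinct values per column (1.0 means every value is unique)
unique_ratio = df.nunique() / len(df)

# Columns above the (arbitrary) 0.9 threshold look like identifiers
id_like = unique_ratio[unique_ratio > 0.9].index.tolist()
print(id_like)  # ['title']
```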

In [None]:
wine_reviews.variety.value_counts()

Pinot Noir    3341
Chardonnay    2879
              ... 
Mavrud           1
Mission          1
Name: variety, Length: 498, dtype: int64

Let's select the unwanted columns to remove:


*   `description`
*   `region_2`
*   `taster_twitter_handle`
*   `title`





In [None]:
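reviews = wine_reviews.drop(["description", "region_2", "taster_twitter_handle", "title"], axis=1)

# drop() returns a new DataFrame and leaves wine_reviews unchanged;
# pass inplace=True instead if you want to modify wine_reviews itself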

reviews

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
0,Italy,Vulkà Bianco,87,,Sicily & Sardinia,Etna,Kerin O’Keefe,White Blend,Nicosia
1,Portugal,Avidagos,87,15.0,Douro,,Roger Voss,Portuguese Red,Quinta dos Avidagos
...,...,...,...,...,...,...,...,...,...
33606,US,Estate Grown and Bottled,91,25.0,California,Santa Ynez Valley,,Syrah,Zaca Mesa
33607,Italy,Bosco Faiano,91,45.0,South,,,,


In [None]:
wine_reviews

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
...,...,...,...,...,...,...,...,...,...,...,...,...,...
33606,US,"This is soft, smooth and refined, with rich, c...",Estate Grown and Bottled,91,25.0,California,Santa Ynez Valley,Central Coast,,,Zaca Mesa 2009 Estate Grown and Bottled Syrah ...,Syrah,Zaca Mesa
33607,Italy,"Bosco Faiano is a hearty, masculine wine, aged...",Bosco Faiano,91,45.0,South,,,,,,,


Initially there were 13 columns, out of which we've removed 4, so `reviews` has 9 columns. As the output above shows, `wine_reviews` itself still has all 13.
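`drop()` also accepts the column names through the `columns=` keyword, which is equivalent to passing `axis=1` and is often easier to read. A small sketch with hypothetical toy columns, which also confirms that the original frame is left untouched unless `inplace=True` is used:

```python
import pandas as pd

# Hypothetical toy frame
df = pd.DataFrame({"country": ["Italy"], "title": ["X"], "points": [87]})

# These two calls are equivalent; both return a new DataFrame
a = df.drop(["title"], axis=1)
b = df.drop(columns=["title"])

print(a.equals(b))       # True
print(list(df.columns))  # ['country', 'title', 'points'] (df is unchanged)
```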

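## Handling NaN values
Dealing with missing values is another step in the data cleaning process.

There are two common ways to deal with NaN values:


1.   Removing the rows that contain NaN
2.   Replacing the NaN values with a statistic of that column, such as its mean (or, as we do for `points` below, its minimum)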
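Both strategies can be compared in a minimal sketch on a hypothetical toy frame with one missing price:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing price
df = pd.DataFrame({"points": [87, 88, 90], "price": [10.0, np.nan, 20.0]})

# Option 1: drop every row that contains at least one NaN (returns a copy)
dropped = df.dropna()

# Option 2: fill NaN with the column mean (mean() skips NaN by default)
filled = df.assign(price=df["price"].fillna(df["price"].mean()))

print(len(dropped))              # 2
print(filled["price"].tolist())  # [10.0, 15.0, 20.0]
```

Dropping loses the whole row (including its valid `points` value), while filling keeps every row at the cost of inventing a plausible price.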





In [None]:
reviews.dropna()

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
3,US,Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,Alexander Peartree,Riesling,St. Julian
4,US,Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Paul Gregutt,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...
33602,US,Naked,88,11.0,Washington,Columbia Valley (WA),Paul Gregutt,Chardonnay,Ryan Patrick
33605,Australia,George Wyndham Founder's Reserve,91,20.0,South Australia,Langhorne Creek,Joe Czerwinski,Shiraz,Wyndham Estate


Let's fill null values in the `points` column with the column's minimum:

In [None]:
min_points= reviews.points.min()
min_points

80

In [None]:
reviews[reviews.price.isnull()]

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
0,Italy,Vulkà Bianco,87,,Sicily & Sardinia,Etna,Kerin O’Keefe,White Blend,Nicosia
13,Italy,Rosso,87,,Sicily & Sardinia,Etna,Kerin O’Keefe,Nerello Mascalese,Masseria Setteporte
...,...,...,...,...,...,...,...,...,...
33586,Italy,Riserva,88,,Tuscany,Chianti Rufina,,Sangiovese,Villa Travignoli
33598,Italy,Lamole,88,,Tuscany,Chianti Classico,,Sangiovese,Castelli del Grevepesa


In [None]:
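# Assign the result back rather than calling fillna(..., inplace=True) on a single column:
# recent pandas versions warn about that pattern, and under Copy-on-Write it no longer updates reviews
reviews['points'] = reviews['points'].fillna(min_points)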

In [None]:
reviews

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
0,Italy,Vulkà Bianco,87,,Sicily & Sardinia,Etna,Kerin O’Keefe,White Blend,Nicosia
1,Portugal,Avidagos,87,15.0,Douro,,Roger Voss,Portuguese Red,Quinta dos Avidagos
...,...,...,...,...,...,...,...,...,...
33606,US,Estate Grown and Bottled,91,25.0,California,Santa Ynez Valley,,Syrah,Zaca Mesa
33607,Italy,Bosco Faiano,91,45.0,South,,,,


In [None]:
reviews[reviews.points.isnull()]

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery


Let's replace null values in the `price` column with the column's average (mean):

In [None]:
avg = reviews.price.mean()
avg

34.95570198329854

In [None]:
# Assign the filled column back instead of using inplace=True on a single column
reviews['price'] = reviews['price'].fillna(avg)

In [None]:
reviews[reviews.price.isnull()]

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery


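Next, drop rows where there's no province. Note that calling `dropna()` on the `province` column alone only returns a cleaned copy of that Series and leaves `reviews` unchanged; to drop whole rows of the DataFrame based on specific columns, pass them to `subset`: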
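The difference between the two calls is easiest to see on a hypothetical toy frame where one row lacks a province and another lacks a region:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: row 1 has no province, row 0 has no region
df = pd.DataFrame({
    "country": ["US", "Italy", "US"],
    "province": ["Oregon", np.nan, "Michigan"],
    "region_1": [np.nan, "Etna", "Lake Michigan Shore"],
})

# Series.dropna(): a cleaned copy of one column; df itself is unchanged
provinces = df["province"].dropna()

# DataFrame.dropna(subset=...): drops whole rows, looking only at 'province'
rows_with_province = df.dropna(subset=["province"])

print(len(df))                         # 3
print(provinces.tolist())              # ['Oregon', 'Michigan']
print(list(rows_with_province.index))  # [0, 2]
```

Row 0 survives the `subset` call even though its `region_1` is missing, because only `province` is checked.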

In [None]:
reviews.province.dropna()

0        Sicily & Sardinia
3                 Michigan
               ...        
33602           Washington
33605      South Australia
Name: province, Length: 14518, dtype: object

In [None]:
reviews[reviews.province.isnull()]

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery


In [None]:
reviews.dropna(subset=['province', 'country'])

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
0,Italy,Vulkà Bianco,87,34.955702,Sicily & Sardinia,Etna,Kerin O’Keefe,White Blend,Nicosia
3,US,Reserve Late Harvest,87,13.000000,Michigan,Lake Michigan Shore,Alexander Peartree,Riesling,St. Julian
...,...,...,...,...,...,...,...,...,...
33602,US,Naked,88,11.000000,Washington,Columbia Valley (WA),Paul Gregutt,Chardonnay,Ryan Patrick
33605,Australia,George Wyndham Founder's Reserve,91,20.000000,South Australia,Langhorne Creek,Joe Czerwinski,Shiraz,Wyndham Estate


## Dropping the rows

Finally, let's drop all remaining rows that contain NaN values in any column:

In [None]:
reviews.dropna(inplace=True)
reviews

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
0,Italy,Vulkà Bianco,87,34.955702,Sicily & Sardinia,Etna,Kerin O’Keefe,White Blend,Nicosia
3,US,Reserve Late Harvest,87,13.000000,Michigan,Lake Michigan Shore,Alexander Peartree,Riesling,St. Julian
...,...,...,...,...,...,...,...,...,...
33602,US,Naked,88,11.000000,Washington,Columbia Valley (WA),Paul Gregutt,Chardonnay,Ryan Patrick
33605,Australia,George Wyndham Founder's Reserve,91,20.000000,South Australia,Langhorne Creek,Joe Czerwinski,Shiraz,Wyndham Estate


Often we'll want to group our data and then do something specific to each group. As you'll learn, we do this with the `groupby()` operation. We'll also cover some additional topics, such as more complex ways to index your DataFrames, along with how to sort your data.

# Groupwise analysis

One function we've been using heavily thus far is `value_counts()`. Let's use it to count the reviews per country, and then use `groupby()` to compute per-country statistics such as the mean and maximum price:

In [None]:
reviews.country.value_counts()

US           6494
France       3583
             ... 
Australia     356
Canada         49
Name: country, Length: 7, dtype: int64

In [None]:
reviews.groupby('country').price.mean()

country
Argentina    26.125085
Australia    39.709803
               ...    
Spain        29.472186
US           39.796363
Name: price, Length: 7, dtype: float64

In [None]:
reviews.groupby('country').price.max()

country
Argentina    215.0
Australia    350.0
             ...  
Spain        770.0
US           625.0
Name: price, Length: 7, dtype: float64

In [None]:
reviews[reviews.variety=='Riesling']

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
3,US,Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,Alexander Peartree,Riesling,St. Julian
97,US,Ingle Vineyard,88,20.0,New York,Finger Lakes,Anna Lee C. Iijima,Riesling,Heron Hill
...,...,...,...,...,...,...,...,...,...
33454,France,Réserve Saint Jean,88,17.0,Alsace,Alsace,Anne Krebiehl MW,Riesling,Marcel Hugg
33483,US,Sawmill Creek Vineyards,87,17.0,New York,Finger Lakes,Anna Lee C. Iijima,Riesling,Billsboro


In [None]:
reviews[reviews.country=='Argentina']

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
16,Argentina,Felix,87,30.0,Other,Cafayate,Michael Schachner,Malbec,Felix Lavaque
17,Argentina,Winemaker Selection,87,13.0,Mendoza Province,Mendoza,Michael Schachner,Malbec,Gaucho Andino
...,...,...,...,...,...,...,...,...,...
33379,Argentina,Don Giaroli,81,14.0,Mendoza Province,Mendoza,Michael Schachner,Syrah,Hollen Family Vineyards
33380,Argentina,Malbec-Merlot-Cabernet Sauvignon,81,19.0,Mendoza Province,Agrelo,Michael Schachner,Bordeaux-style Red Blend,Trapezio


In [None]:
reviews.groupby('winery').points.mean()

winery
1+1=3           86.000
1000 Stories    91.000
                 ...  
Öko             85.000
àMaurice        89.125
Name: points, Length: 5516, dtype: float64

In [None]:
reviews.groupby('winery').points.mean().sort_values()

winery
Tenimenti Montagnana    80.0
Baroncini               80.0
                        ... 
Château Pontet-Canet    97.0
Pirouette               98.0
Name: points, Length: 5516, dtype: float64

In [None]:
reviews.head()

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
0,Italy,Vulkà Bianco,87,,Sicily & Sardinia,Etna,Kerin O’Keefe,White Blend,Nicosia
1,Portugal,Avidagos,87,15.0,Douro,,Roger Voss,Portuguese Red,Quinta dos Avidagos
2,US,,87,14.0,Oregon,Willamette Valley,Paul Gregutt,Pinot Gris,Rainstorm
3,US,Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,Alexander Peartree,Riesling,St. Julian
4,US,Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Paul Gregutt,Pinot Noir,Sweet Cheeks


In [None]:
reviews.groupby('points').points.count()

points
80     18
81     53
       ..
99      2
100     2
Name: points, Length: 21, dtype: int64

In [None]:
reviews.points.value_counts()

90     1926
88     1827
       ... 
98       10
100       1
Name: points, Length: 20, dtype: int64

`groupby()` created one group of reviews for each distinct point value. Then, for each of these groups, we grabbed the `points` column and counted how many entries it contained. `value_counts()` is essentially a shortcut for this `groupby()` operation; the only visible difference is the ordering (`value_counts()` sorts by count, `groupby()` by the group key).
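A quick sketch on hypothetical toy scores confirms that both approaches produce the same counts once they are put in the same order:

```python
import pandas as pd

# Hypothetical toy scores and prices
df = pd.DataFrame({
    "points": [87, 90, 87, 88, 90, 87],
    "price": [10.0, 30.0, 12.0, 20.0, 25.0, 11.0],
})

via_groupby = df.groupby("points").points.count()  # ordered by point value
via_value_counts = df.points.value_counts()        # ordered by count

print(via_groupby.to_dict())                    # {87: 3, 88: 1, 90: 2}
print(via_value_counts.sort_index().to_dict())  # {87: 3, 88: 1, 90: 2}
```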

We can use any of the summary functions we've used before with this data. For example, to get the average price of the wines in each point-value category, we can do the following:

In [None]:
reviews.groupby('points').price.mean()

points
80      16.888889
81      19.865682
          ...    
98     190.100000
100    350.000000
Name: price, Length: 20, dtype: float64

You can think of each group we generate as being a slice of our DataFrame containing only the rows whose values match. This DataFrame is accessible to us directly using the `apply()` method, which calls a function of our choosing on each group, so we can manipulate the data in any way we see fit. As a quick reminder, a function can be defined either with `def` or, for short one-liners, with `lambda`:

In [None]:
def show_min(arr):
    return arr.min()

show_avg = lambda arr: arr.mean()
# a lambda is a shorthand for defining a small, one-line function

In [None]:
arr1 = reviews.points

show_min(arr1)

80

In [None]:
show_avg(reviews.points)

88.89729990356798

Now here's one way of selecting the title of the first wine reviewed from each winery in the dataset (we use `wine_reviews` here because `reviews` no longer has a `title` column):

In [None]:
wine_reviews.groupby('winery').apply(lambda df: df.title.iloc[0])

winery
1+1=3                          1+1=3 NV Rosé Sparkling (Cava)
10 Knots                 10 Knots 2010 Viognier (Paso Robles)
                                  ...                        
àMaurice    àMaurice 2013 Fred Estate Syrah (Walla Walla V...
Štoka                         Štoka 2009 Izbrani Teran (Kras)
Length: 10475, dtype: object
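The same "first title per winery" idea can be sketched on a hypothetical toy frame. Selecting the `title` column before calling `apply()` passes each group's titles to the lambda as a Series. When no titles are missing, the built-in `first()` gives the same result (`first()` skips NaN, whereas `iloc[0]` does not):

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "winery": ["Nicosia", "Rainstorm", "Nicosia"],
    "title": ["Nicosia 2013 White", "Rainstorm 2013 Pinot Gris", "Nicosia 2015 Red"],
})

# apply() receives each winery's titles as a Series; keep the first one
first_titles = df.groupby("winery")["title"].apply(lambda s: s.iloc[0])

# Built-in equivalent for data without missing titles
same = df.groupby("winery")["title"].first()

print(first_titles.to_dict())
# {'Nicosia': 'Nicosia 2013 White', 'Rainstorm': 'Rainstorm 2013 Pinot Gris'}
```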

In [None]:
wine_reviews.iloc[0]

country                                                    Italy
description    Aromas include tropical fruit, broom, brimston...
                                     ...                        
variety                                              White Blend
winery                                                   Nicosia
Name: 0, Length: 13, dtype: object

For even more fine-grained control, you can also group by more than one column. For example, after a quick look at the US reviews alone, here's how we would pick out the best-rated wine by country _and_ province:

In [None]:
reviews[reviews.country=='US']

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
3,US,Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,Alexander Peartree,Riesling,St. Julian
4,US,Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Paul Gregutt,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...
33599,US,Ingle Vineyard,88,15.0,New York,Finger Lakes,Anna Lee C. Iijima,Cabernet Franc,Heron Hill
33602,US,Naked,88,11.0,Washington,Columbia Valley (WA),Paul Gregutt,Chardonnay,Ryan Patrick


In [None]:
reviews.groupby(['country', 'province']).apply(lambda df: df.loc[df.points.idxmax()])

Unnamed: 0_level_0,Unnamed: 1_level_0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
country,province,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Argentina,Mendoza Province,Argentina,"If the color doesn't tell the full story, the ...",Nicasia Vineyard,97,120.0,Mendoza Province,Mendoza,,Michael Schachner,@wineschach,Bodega Catena Zapata 2006 Nicasia Vineyard Mal...,Malbec,Bodega Catena Zapata
Argentina,Other,Argentina,"Take note, this could be the best wine Colomé ...",Reserva,95,90.0,Other,Salta,,Michael Schachner,@wineschach,Colomé 2010 Reserva Malbec (Salta),Malbec,Colomé
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Uruguay,San Jose,Uruguay,"Baked, sweet, heavy aromas turn earthy with ti...",El Preciado Gran Reserva,87,50.0,San Jose,,,Michael Schachner,@wineschach,Castillo Viejo 2005 El Preciado Gran Reserva R...,Red Blend,Castillo Viejo
Uruguay,Uruguay,Uruguay,"Cherry and berry aromas are ripe, healthy and ...",Blend 002 Limited Edition,91,22.0,Uruguay,,,Michael Schachner,@wineschach,Narbona NV Blend 002 Limited Edition Tannat-Ca...,Tannat-Cabernet Franc,Narbona
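The same selection can also be written without `apply()`: `idxmax()` on the grouped `points` column returns the index label of each group's top-scoring row, and `loc` then looks those rows up in the original frame. A sketch on hypothetical toy rows (as with `idxmax()` in general, ties resolve to the first occurrence):

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "country": ["US", "US", "US", "Italy"],
    "province": ["Oregon", "Oregon", "Michigan", "Tuscany"],
    "points": [87, 92, 88, 90],
    "winery": ["A", "B", "C", "D"],
})

# Index label of the highest-scoring row in each (country, province) group
best_idx = df.groupby(["country", "province"])["points"].idxmax()

# Look those rows up in the original frame
best = df.loc[best_idx]
print(best["winery"].tolist())  # ['D', 'C', 'B']
```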


Another `groupby()` method worth mentioning is `agg()`, which lets you run several different functions on your DataFrame simultaneously. As a baseline, here is a single aggregation, the maximum price per country:

In [None]:
reviews.groupby('country').price.max()

country
Argentina    215.0
Australia    350.0
             ...  
Spain        770.0
US           625.0
Name: price, Length: 7, dtype: float64

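With `agg()`, we can compute a simple statistical summary (the number of reviews and the minimum and maximum price per country) in one call: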

In [None]:
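# 'min' and 'max' as strings use pandas' own implementations;
# passing the built-ins min/max triggers a FutureWarning in recent pandas versions
reviews.groupby(['country']).price.agg([len, 'min', 'max'])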

Unnamed: 0_level_0,len,min,max
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Argentina,765,4.0,215.0
Australia,356,7.0,350.0
...,...,...,...
Spain,1333,4.0,770.0
US,6494,5.0,625.0


Effective use of `groupby()` will allow you to do lots of really powerful things with your dataset.
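Besides a list of functions, `agg()` also accepts keyword arguments (pandas calls this "named aggregation"), which lets you choose the output column names yourself. A sketch on hypothetical toy prices; note that `'count'` counts non-missing values, unlike `len`:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "country": ["US", "US", "Spain", "Spain", "Spain"],
    "price": [5.0, 625.0, 4.0, 770.0, 20.0],
})

# Each keyword becomes an output column: name=aggregation
summary = df.groupby("country").price.agg(
    n_reviews="count", cheapest="min", priciest="max"
)
print(summary)
```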

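# Label encoding

Most machine-learning models can't be fed data that contains string values.

One common way to deal with this is to assign a numerical label to each categorical value. Note that these integer labels imply an order between the categories that usually has no real meaning.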

In [None]:

from sklearn.preprocessing import LabelEncoder


label_encoder = LabelEncoder()

# Fit the encoder on the 'country' column and replace each country name with its integer label
reviews['country'] = label_encoder.fit_transform(reviews['country'])


Let's have a look at the new DataFrame:


In [None]:
reviews

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
0,4,Vulkà Bianco,87,34.955702,Sicily & Sardinia,Etna,Kerin O’Keefe,White Blend,Nicosia
3,6,Reserve Late Harvest,87,13.000000,Michigan,Lake Michigan Shore,Alexander Peartree,Riesling,St. Julian
...,...,...,...,...,...,...,...,...,...
33602,6,Naked,88,11.000000,Washington,Columbia Valley (WA),Paul Gregutt,Chardonnay,Ryan Patrick
33605,1,George Wyndham Founder's Reserve,91,20.000000,South Australia,Langhorne Creek,Joe Czerwinski,Shiraz,Wyndham Estate


In [None]:
reviews.country

0        4
3        6
        ..
33602    6
33605    1
Name: country, Length: 14518, dtype: int64
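In the output above, Australia became 1, Italy 4 and US 6: `LabelEncoder` numbers the distinct values in sorted (alphabetical) order. As a pandas-only sketch on hypothetical toy values, `pd.factorize(..., sort=True)` produces the same kind of alphabetical codes and makes it easy to map them back to the original strings. Also keep in mind that scikit-learn's documentation intends `LabelEncoder` for target labels (`y`); for input features it offers `OrdinalEncoder` and `OneHotEncoder`.

```python
import pandas as pd

# Hypothetical toy values
countries = pd.Series(["US", "Italy", "France", "US", "Italy"])

# sort=True numbers the categories alphabetically, as LabelEncoder does
codes, uniques = pd.factorize(countries, sort=True)

print(codes.tolist())  # [2, 1, 0, 2, 1]
mapping = {label: code for code, label in enumerate(uniques)}
print(mapping)         # {'France': 0, 'Italy': 1, 'US': 2}

# Decoding: index the sorted uniques with the codes
decoded = uniques[codes]
print(decoded.tolist() == countries.tolist())  # True
```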

# Multi-indexes

In all of the examples we've seen thus far, we've been working with DataFrame or Series objects with a single-label index. `groupby()` is slightly different in that, depending on the operation we run, it will sometimes result in what is called a multi-index.

A multi-index differs from a regular index in that it has multiple levels. For example:

In [None]:
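# reviews no longer has a description column (and its country column is now label-encoded),
# so we group the original wine_reviews here
countries_reviewed = wine_reviews.groupby(['country', 'province']).description.agg([len])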
countries_reviewed

Unnamed: 0_level_0,Unnamed: 1_level_0,len
country,province,Unnamed: 2_level_1
Argentina,Mendoza Province,3264
Argentina,Other,536
...,...,...
Uruguay,San Jose,3
Uruguay,Uruguay,24


In [None]:
mi = countries_reviewed.index
type(mi)

pandas.core.indexes.multi.MultiIndex

Multi-indices have several methods for dealing with their tiered structure which are absent for single-level indices. They also require two levels of labels to retrieve a value. Dealing with multi-index output is a common "gotcha" for users new to pandas.
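A small sketch on a hypothetical toy frame shows what "two levels of labels" means in practice: a full `(country, province)` tuple reaches a single value, the outer label alone returns a sub-frame, and `xs()` selects on an inner level by name:

```python
import pandas as pd

# Hypothetical toy counts with a two-level (country, province) index
df = pd.DataFrame({
    "country": ["Argentina", "Argentina", "Uruguay"],
    "province": ["Mendoza Province", "Other", "San Jose"],
    "n": [5, 2, 1],
}).set_index(["country", "province"])

# Both levels are needed to reach a single value
print(df.loc[("Argentina", "Other"), "n"])  # 2

# Only the outer level: a sub-frame indexed by 'province'
print(df.loc["Argentina"])

# xs() selects on the inner level instead
print(df.xs("San Jose", level="province"))
```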

The use cases for a multi-index are detailed alongside instructions on using them in the [MultiIndex / Advanced Selection](https://pandas.pydata.org/pandas-docs/stable/advanced.html) section of the pandas documentation.

However, in general the multi-index method you will use most often is the one for converting back to a regular index, the `reset_index()` method:

In [None]:
countries_reviewed.reset_index()

Unnamed: 0,country,province,len
0,Argentina,Mendoza Province,3264
1,Argentina,Other,536
...,...,...,...
423,Uruguay,San Jose,3
424,Uruguay,Uruguay,24


# Sorting

`sort_values()` reorders the rows by the values of a column, in ascending order by default:

In [None]:
reviews


Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
0,Italy,Vulkà Bianco,87,34.955702,Sicily & Sardinia,Etna,Kerin O’Keefe,White Blend,Nicosia
3,US,Reserve Late Harvest,87,13.000000,Michigan,Lake Michigan Shore,Alexander Peartree,Riesling,St. Julian
...,...,...,...,...,...,...,...,...,...
33602,US,Naked,88,11.000000,Washington,Columbia Valley (WA),Paul Gregutt,Chardonnay,Ryan Patrick
33605,Australia,George Wyndham Founder's Reserve,91,20.000000,South Australia,Langhorne Creek,Joe Czerwinski,Shiraz,Wyndham Estate


In [None]:
reviews.sort_values(by='price')

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
1987,Spain,Flirty Bird,85,4.0,Central Spain,Vino de la Tierra de Castilla,Michael Schachner,Syrah,Felix Solis
29553,Argentina,Red,84,4.0,Mendoza Province,Mendoza,Michael Schachner,Malbec-Syrah,Broke Ass
...,...,...,...,...,...,...,...,...,...
353,France,Le Montrachet,96,630.0,Burgundy,Montrachet,Roger Voss,Chardonnay,Louis Latour
15846,Spain,El Perer,96,770.0,Catalonia,Priorat,Michael Schachner,Carignan,Marco Abella


In [None]:
reviews.sort_values(by='points')

Unnamed: 0,country,designation,points,price,province,region_1,taster_name,variety,winery
7094,Argentina,Gran Reserva,80,18.0,Mendoza Province,Uco Valley,Michael Schachner,Malbec,Finca El Origen
19123,France,Vin de Pays de l'Ardeche,80,10.0,France Other,France,Michael Schachner,Viognier,Georges Duboeuf
...,...,...,...,...,...,...,...,...,...
16772,Italy,Vigna Rionda Riserva,98,151.0,Piedmont,Barolo,Kerin O’Keefe,Nebbiolo,Massolino
345,Australia,Rare,100,350.0,Victoria,Rutherglen,Joe Czerwinski,Muscat,Chambers Rosewood Vineyards
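To put the best-rated wines first, pass `ascending=False`; to break ties, pass a list of columns (with one `ascending` flag per column); and to undo a sort, use `sort_index()`. A sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "country": ["Spain", "Argentina", "France", "Spain"],
    "points": [85, 84, 96, 96],
    "price": [4.0, 4.0, 630.0, 770.0],
})

# Highest score first; among equal scores, cheapest first
ranked = df.sort_values(by=["points", "price"], ascending=[False, True])
print(ranked.index.tolist())  # [2, 3, 0, 1]

# sort_index() restores the original row order
print(ranked.sort_index().index.tolist())  # [0, 1, 2, 3]
```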
