
# Summary Functions

In [1]:
import pandas as pd

data_path = 'data/winemag-data-130k-v2.csv'

data = pd.read_csv(data_path)

In [2]:
data.head()

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


## Describe

`describe()` can also be called on a single column to get a summary of just that column.

In [3]:
data.points.describe()

count    129971.000000
mean         88.447138
std           3.039730
min          80.000000
25%          86.000000
50%          88.000000
75%          91.000000
max         100.000000
Name: points, dtype: float64
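`describe()` is type-aware: on a text column it reports `count`, `unique`, `top` and `freq` instead of quantiles. A minimal sketch, assuming a small made-up frame (`toy` and its values are illustrative, not rows from the wine data):

```python
import pandas as pd

# Hypothetical toy data, not taken from the wine reviews file.
toy = pd.DataFrame({
    "country": ["US", "US", "Italy", None],
    "points": [87, 90, 92, 85],
})

# Numeric column: count, mean, std, min, quartiles, max.
numeric_summary = toy.points.describe()

# Text column: count (non-null), unique, top (most frequent), freq.
text_summary = toy.country.describe()

print(numeric_summary)
print(text_summary)
```

Missing values are excluded from `count` in both cases.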

## Mean, max, and other

Functions such as `mean()`, `max()`, `unique()` and `value_counts()` can likewise be called on a single column.

In [4]:
data.country.unique()

array(['Italy', 'Portugal', 'US', 'Spain', 'France', 'Germany',
       'Argentina', 'Chile', 'Australia', 'Austria', 'South Africa',
       'New Zealand', 'Israel', 'Hungary', 'Greece', 'Romania', 'Mexico',
       'Canada', nan, 'Turkey', 'Czech Republic', 'Slovenia',
       'Luxembourg', 'Croatia', 'Georgia', 'Uruguay', 'England',
       'Lebanon', 'Serbia', 'Brazil', 'Moldova', 'Morocco', 'Peru',
       'India', 'Bulgaria', 'Cyprus', 'Armenia', 'Switzerland',
       'Bosnia and Herzegovina', 'Ukraine', 'Slovakia', 'Macedonia',
       'China', 'Egypt'], dtype=object)

In [5]:
data.points.mean()

88.44713820775404

In [7]:
data.points.value_counts()

88     17207
87     16933
90     15410
86     12600
89     12226
91     11359
92      9613
85      9530
93      6489
84      6480
94      3758
83      3025
82      1836
95      1535
81       692
96       523
80       397
97       229
98        77
99        33
100       19
Name: points, dtype: int64
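The same functions are easier to follow on a handful of values. The sketch below uses a made-up `scores` Series (an assumption for illustration) and also shows `value_counts(normalize=True)`, which returns shares instead of raw counts:

```python
import pandas as pd

# Hypothetical scores, not taken from the wine data.
scores = pd.Series([88, 87, 88, 90, 88, 87], name="points")

top_score = scores.max()
average = scores.mean()

# value_counts() sorts by frequency, most common value first.
counts = scores.value_counts()

# normalize=True divides each count by the total number of values.
shares = scores.value_counts(normalize=True)

print(counts)
print(shares)
```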

# Maps

A map is a term, borrowed from mathematics, for a function that takes one set of values and "maps" them to another set of values, transforming data from the format it is in now to the format that we want it to be in later.

Let's say we want to center the wine scores on 0 by subtracting the mean score from each one. We can use `map()` to do this:

In [8]:
points_mean = data.points.mean()
data.points.map(lambda p: p - points_mean)

0        -1.447138
1        -1.447138
2        -1.447138
3        -1.447138
4        -1.447138
            ...   
129966    1.552862
129967    1.552862
129968    1.552862
129969    1.552862
129970    1.552862
Name: points, Length: 129971, dtype: float64

The function you pass to map() should expect a single value from the Series (a point value, in the above example), and return a transformed version of that value. map() returns a new Series where all the values have been transformed by your function.
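A small sketch of that contract, assuming a made-up `points` Series: the function sees one value at a time, and the original Series is left untouched. `map()` also accepts a dict, in which case values without a matching key become NaN.

```python
import pandas as pd

# Hypothetical scores, not taken from the wine data.
points = pd.Series([87, 90, 95], name="points")
mean_points = points.mean()

# The lambda receives a single score and returns the transformed score.
centered = points.map(lambda p: p - mean_points)

# Dict form: 90 has no entry, so it maps to NaN.
labels = points.map({87: "good", 95: "classic"})

print(centered)
print(labels)
print(points)  # unchanged
```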

## Apply


`apply()` is the equivalent method for transforming a whole DataFrame by calling a custom function on each row.

In [9]:
def mean_modification(row):
    row.points = row.points - points_mean
    return row

data.apply(mean_modification, axis='columns')

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,-1.447138,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,-1.447138,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,-1.447138,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,-1.447138,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,-1.447138,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,1.552862,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,129967,US,Citation is given as much as a decade of bottl...,,1.552862,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,129968,France,Well-drained gravel soil gives this wine its c...,Kritt,1.552862,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,129969,France,"A dry style of Pinot Gris, this is crisp with ...",,1.552862,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss


If we had called `data.apply()` with `axis='index'`, then instead of passing a function to transform each row, we would need to give a function to transform each column.

<b>Note: `map()` and `apply()` do not modify the original DataFrame; they return a new Series and a new DataFrame, respectively.</b>
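The two `axis` values can be compared side by side on a tiny frame. This is a sketch on made-up data (`toy` is an assumption for illustration); the functions here return a scalar, so each call produces a Series rather than a DataFrame.

```python
import pandas as pd

# Hypothetical toy data, not taken from the wine reviews file.
toy = pd.DataFrame({"points": [87, 90], "price": [15.0, 30.0]})

# axis='columns': the function receives one row at a time
# (a Series indexed by column name).
per_row = toy.apply(lambda row: row.points / row.price, axis="columns")

# axis='index': the function receives one column at a time.
per_column = toy.apply(lambda col: col.max() - col.min(), axis="index")

print(per_row)
print(per_column)
print(toy)  # the original frame is unchanged
```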

# Direct mapping and applying

pandas also provides built-in operators that do the same work directly on whole columns. They are faster than calling `map()` or `apply()` with a Python function.

In [11]:
data.points - points_mean

0        -1.447138
1        -1.447138
2        -1.447138
3        -1.447138
4        -1.447138
            ...   
129966    1.552862
129967    1.552862
129968    1.552862
129969    1.552862
129970    1.552862
Name: points, Length: 129971, dtype: float64

In [13]:
data.country + '-' + data.region_1

0                     Italy-Etna
1                            NaN
2           US-Willamette Valley
3         US-Lake Michigan Shore
4           US-Willamette Valley
                   ...          
129966                       NaN
129967                 US-Oregon
129968             France-Alsace
129969             France-Alsace
129970             France-Alsace
Length: 129971, dtype: object
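The NaN entries in the output above come from rows where one of the two columns is missing: the operators propagate missing values. A minimal sketch on made-up rows (the `toy` frame and the `"unknown"` placeholder are assumptions for illustration) shows one way to keep those rows by filling the gap first:

```python
import pandas as pd

# Hypothetical toy data, not taken from the wine reviews file.
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "region_1": ["Etna", None, "Willamette Valley"],
})

# A missing region makes the whole combined value missing.
combined = toy.country + "-" + toy.region_1

# Filling the gap first keeps every row.
filled = toy.country + "-" + toy.region_1.fillna("unknown")

print(combined)
print(filled)
```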

# Some useful examples

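## Number of countries present in the dataset

`count()` would return the number of non-null entries in the column (129908 here), not the number of distinct countries. `nunique()` counts the distinct non-null values, which matches the 43 country names listed by `unique()` above.

In [14]:
data.country.nunique()

43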
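The difference between `count()`, `nunique()` and `unique()` is easy to see on a small made-up Series (`countries` below is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical values, not taken from the wine data.
countries = pd.Series(["US", "Italy", "US", np.nan, "France"])

# Number of non-null entries (rows that have a country at all).
non_null_rows = countries.count()

# Number of distinct values, ignoring NaN by default.
distinct = countries.nunique()

# unique() keeps NaN as its own entry, so its length is one larger here.
distinct_incl_nan = len(countries.unique())

print(non_null_rows, distinct, distinct_incl_nan)
```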

## Number of reviews per country

In [15]:
data.country.value_counts()

US                        54504
France                    22093
Italy                     19540
Spain                      6645
Portugal                   5691
Chile                      4472
Argentina                  3800
Austria                    3345
Australia                  2329
Germany                    2165
New Zealand                1419
South Africa               1401
Israel                      505
Greece                      466
Canada                      257
Hungary                     146
Bulgaria                    141
Romania                     120
Uruguay                     109
Turkey                       90
Slovenia                     87
Georgia                      86
England                      74
Croatia                      73
Mexico                       70
Moldova                      59
Brazil                       52
Lebanon                      35
Morocco                      28
Peru                         16
Ukraine                      14
Czech Re

## Title of the wine with the best points-to-price ratio

Find the title of the wine that has the best points-to-price ratio.

In [37]:
# Points earned per unit of price; NaN where the price is missing.
points_to_price_ratio = data.points / data.price
print(points_to_price_ratio)

0              NaN
1         5.800000
2         6.214286
3         6.692308
4         1.338462
            ...   
129966    3.214286
129967    1.200000
129968    3.000000
129969    2.812500
129970    4.285714
Length: 129971, dtype: float64


In [39]:
# idxmax() returns the index label of the largest ratio, not the ratio itself.
best_idx = points_to_price_ratio.idxmax()
print(best_idx)

64590


In [41]:
data.loc[best_idx, 'title']

'Bandit NV Merlot (California)'

In [42]:
data.loc[best_idx].title

'Bandit NV Merlot (California)'
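Two properties of `idxmax()` matter here: it skips NaN by default, and when several rows tie for the maximum it returns only the first label. The sketch below uses made-up wines (`toy` and its titles are assumptions for illustration) and shows how a boolean mask retrieves every tied row:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the wine reviews file.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "points": [87, 86, 90, 86],
    "price": [np.nan, 4.0, 30.0, 4.0],
})

ratio = toy.points / toy.price

# NaN is skipped; on a tie the first matching label is returned.
best_label = ratio.idxmax()
best_title = toy.loc[best_label, "title"]

# All titles that share the maximum ratio.
tied_titles = toy.loc[ratio == ratio.max(), "title"].tolist()

print(best_title)
print(tied_titles)
```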

## Description value

Many wines are described as either tropical or fruity. Count how many descriptions contain each of these two words.

In [56]:
# Filtering the DataFrame and calling count() gives non-null counts for every column.
tropical_count = data[data.description.map(lambda des: 'tropical' in des)].count()
fruity_count = data[data.description.map(lambda des: 'fruity' in des)].count()
print(tropical_count)
print(fruity_count)
# Summing the boolean mask gives a single number, because True counts as 1.
n_tropical = data.description.map(lambda des: 'tropical' in des).sum()
n_fruity = data.description.map(lambda des: 'fruity' in des).sum()
print(n_fruity)
print(n_tropical)

Unnamed: 0               3607
country                  3605
description              3607
designation              2401
points                   3607
price                    3447
province                 3605
region_1                 2822
region_2                 1624
taster_name              2768
taster_twitter_handle    2580
title                    3607
variety                  3607
winery                   3607
dtype: int64
Unnamed: 0               9090
country                  9086
description              9090
designation              6334
points                   9090
price                    8050
province                 9086
region_1                 6893
region_2                 2265
taster_name              8068
taster_twitter_handle    7503
title                    9090
variety                  9090
winery                   9090
dtype: int64
9090
3607


In [57]:
pd.Series([n_tropical, n_fruity], index=['tropical', 'fruity'])

tropical    3607
fruity      9090
dtype: int64
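The same count can be written without a lambda using the vectorized `str.contains()`. This is a sketch on made-up descriptions (the `descriptions` Series is an assumption for illustration); `regex=False` asks for a plain substring match, which mirrors the `in` test used above.

```python
import pandas as pd

# Hypothetical descriptions, not taken from the wine data.
descriptions = pd.Series([
    "Aromas include tropical fruit and broom",
    "This is ripe and fruity",
    "Tart and snappy",
    "Fruity, with tropical notes",
])

# Case-sensitive substring match, summed as 0/1 per row.
counts = pd.Series({
    word: descriptions.str.contains(word, regex=False).sum()
    for word in ["tropical", "fruity"]
})

# case=False also matches the capitalised "Fruity".
fruity_any_case = descriptions.str.contains("fruity", case=False, regex=False).sum()

print(counts)
print(fruity_any_case)
```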

## Star Rating

We want to give each row a star rating. A wine from Canada automatically gets 3 stars. Otherwise, a wine with 95 points or more gets 3 stars, a wine with at least 85 points gets 2 stars, and a wine below 85 points gets 1 star.

In [60]:
data['stars'] = 0

In [61]:
data.stars

0         0
1         0
2         0
3         0
4         0
         ..
129966    0
129967    0
129968    0
129969    0
129970    0
Name: stars, Length: 129971, dtype: int64

In [62]:
def star_rating(row):
    # Canadian wines always get 3 stars, regardless of points.
    if row.country == 'Canada':
        row.stars = 3
    elif row.points >= 95:
        row.stars = 3
    elif row.points >= 85:
        row.stars = 2
    else:
        row.stars = 1
    return row

In [63]:
data.apply(star_rating, axis = 'columns')

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,stars
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia,2
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos,2
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm,2
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian,2
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef),2
129967,129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation,2
129968,129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser,2
129969,129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss,2
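Because `apply()` returns a new object, the call above does not by itself store the ratings in `data`. One possible variant, sketched on made-up rows (`toy` and `stars_for` are hypothetical names), has the function return just the rating and assigns the resulting Series to a column:

```python
import pandas as pd

# Hypothetical toy data, not taken from the wine reviews file.
toy = pd.DataFrame({
    "country": ["Canada", "US", "Italy", "France"],
    "points": [82, 96, 85, 84],
})

def stars_for(row):
    """Return the star rating for one row, using the rules described above."""
    if row.country == "Canada" or row.points >= 95:
        return 3
    if row.points >= 85:
        return 2
    return 1

# apply() yields one rating per row; assigning it stores the result.
toy["stars"] = toy.apply(stars_for, axis="columns")

print(toy)
```

Returning a scalar keeps the function free of side effects on the row it receives.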
