
# Analysis of Wine Reviews

What makes a great wine? In this analysis we explore whether country, region, and production year affect the quality of a wine. Since taste is subjective, we also look at the tasters themselves and at the wording of each wine's description.

```python
import pandas as pd

# Load the wine reviews and preview the first five rows
df = pd.read_csv('wine.csv')
df.head()
```

| Unnamed: 0 | country | description | designation | points | price | province | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 |  | Sicily & Sardinia | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... |  | 87 | 14.0 | Oregon | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Alexander Peartree |  | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |


# Which country produces wine with the most points, on average?

On average, England produces the wines that receive the highest scores (91.58 points), with India (90.22 points) and Austria (90.10 points) close behind. Interestingly, France (88.85 points) is only 8th on the list and Italy (88.56 points) is 14th; I had expected both to be nearer the top.

```python
# Mean score per country, highest first
df.groupby('country')['points'].mean().sort_values(ascending=False)
```

```
country
England                   91.581081
India                     90.222222
Austria                   90.101345
Germany                   89.851732
Canada                    89.369650
Hungary                   89.191781
China                     89.000000
France                    88.845109
Luxembourg                88.666667
Australia                 88.580507
Switzerland               88.571429
Morocco                   88.571429
US                        88.563720
Italy                     88.562231
Israel                    88.471287
New Zealand               88.303030
Portugal                  88.250220
Turkey                    88.088889
Slovenia                  88.068966
South Africa              88.056388
Bulgaria                  87.936170
Georgia                   87.686047
Lebanon                   87.685714
Armenia                   87.500000
Serbia                    87.500000
Spain                     87.288337
Greece                    87.283262
Czech Republic
```
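A country's average can rest on very few reviews, so a ranking of raw means may favour countries with only a handful of bottles in the dataset. The sketch below is a minimal illustration, assuming a small made-up DataFrame in place of `wine.csv` and an arbitrary cutoff `MIN_REVIEWS` (both hypothetical): it reports the number of reviews next to each mean and drops thinly reviewed countries before ranking.

```python
import pandas as pd

# Hypothetical toy data standing in for the real wine.csv columns
df = pd.DataFrame({
    "country": ["England", "England", "France", "France", "France",
                "Spain", "Spain", "Spain", "Spain", "Italy"],
    "points": [93, 90, 88, 89, 90, 87, 88, 86, 89, 87],
})

# Mean score and the number of reviews behind each mean
stats = df.groupby("country")["points"].agg(["mean", "count"])

# Arbitrary cutoff (an assumption): ignore countries with fewer reviews than this
MIN_REVIEWS = 3
ranked = stats[stats["count"] >= MIN_REVIEWS].sort_values("mean", ascending=False)
print(ranked)
```

In this toy data England's 91.5 comes from just two reviews, so the cutoff removes it and France ranks first. The right threshold for the real data is a judgment call.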

# Which taster gives the lowest scores (points), on average?

As mentioned above, wine tasting is very subjective. Alexander Peartree has the lowest average score of all the tasters, at 85.86 points. Anne Krebiehl MW, on the other hand, has the highest, at 90.56 points.

```python
# Mean score per taster, lowest first
df.groupby('taster_name')['points'].mean().sort_values()
```

```
taster_name
Alexander Peartree    85.855422
Carrie Dykes          86.395683
Susan Kostrzewa       86.609217
Fiona Adams           86.888889
Michael Schachner     86.907493
Lauren Buzzeo         87.739510
Christina Pickard     87.833333
Jeff Jenssen          88.319756
Anna Lee C. Iijima    88.415629
Joe Czerwinski        88.536235
Jim Gordon            88.626287
Roger Voss            88.708003
Sean P. Sullivan      88.755739
Kerin O’Keefe         88.867947
Paul Gregutt          89.082564
Mike DeSimone         89.101167
Virginie Boone        89.213379
Matt Kettmann         90.008686
Anne Krebiehl MW      90.562551
Name: points, dtype: float64
```
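Because tasters score on different baselines, the same number means different things coming from different people. One way to flag a score that strays from a taster's usual pattern is to standardize scores within each taster. This is a minimal sketch with made-up tasters `A` and `B` (hypothetical names and scores), computing a per-taster z-score with `groupby(...).transform`:

```python
import pandas as pd

# Hypothetical toy data: taster A scores low, taster B scores high
df = pd.DataFrame({
    "taster_name": ["A", "A", "A", "B", "B", "B"],
    "points": [84, 86, 88, 89, 91, 93],
})

by_taster = df.groupby("taster_name")["points"]

# z-score within each taster: distance from that taster's own mean,
# in units of that taster's sample standard deviation
df["taster_z"] = (df["points"] - by_taster.transform("mean")) / by_taster.transform("std")
print(df)
```

Here an 88 from taster A and a 93 from taster B both come out at +1, i.e. equally far above what each taster usually gives.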

# Which variety of wine is the most expensive, on average?

While a low price does not necessarily mean a wine is bad, a high price is often taken as a sign of a good-quality wine. Ramisco tops the list at an average of $495, and the next closest is Terrantez at $236.

```python
# Mean price per variety, most expensive first (varieties with no listed price give NaN)
df.groupby('variety')['price'].mean().sort_values(ascending=False)
```

```
variety
Ramisco                           495.000000
Terrantez                         236.000000
Francisa                          160.000000
Rosenmuskateller                  150.000000
Malbec-Cabernet                   113.333333
                                     ...    
Roscetto                                 NaN
Sauvignon Blanc-Sauvignon Gris           NaN
Tempranillo-Malbec                       NaN
Vital                                    NaN
Zelen                                    NaN
Name: price, Length: 707, dtype: float64
```
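The output above mixes very different evidence: a variety at the top may rest on a single bottle, and the `NaN` rows at the bottom are varieties with no listed price at all. A minimal sketch on made-up data (the variety names `Grape A`, `Grape B`, `Grape C` are placeholders) shows how to keep the number of priced reviews next to each mean and drop varieties that have none:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a review with no listed price
df = pd.DataFrame({
    "variety": ["Grape A", "Grape B", "Grape B", "Grape B", "Grape C"],
    "price": [120.0, 20.0, 30.0, np.nan, np.nan],
})

# mean() and count() both skip NaN, so count is the number of priced reviews
price_stats = df.groupby("variety")["price"].agg(["mean", "count"])

# Drop varieties with no priced reviews at all (their mean is NaN)
price_stats = price_stats.dropna(subset=["mean"]).sort_values("mean", ascending=False)
print(price_stats)
```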

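# Which year's wines have the best score (points), on average?

When asked what a good year for wine is, 1969 stands out, with its wines receiving an average score of 98 points. 1952 (95.5 points), 1968, and 1966 (95 points each) are also strong years, though some of these averages may rest on only a handful of bottles. The 1147 entry, which also averages 95.5 points, is almost certainly not a vintage: the pattern in the next cell picks up the last four-digit number anywhere in the title, so values such as 1147, 1492, and 1762 probably come from wine names rather than years. Those bottles would need a closer look before drawing any conclusions.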

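```python
# Pull a four-digit number out of each title as the vintage.
# The greedy .* means this captures the LAST four-digit run in the title.
df['year'] = df['title'].str.extract(r'.*(\d{4}).*', expand=False)
df.groupby('year')['points'].mean().sort_values(ascending=False)
```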

```
year
1969    98.0
1952    95.5
1147    95.5
1968    95.0
1966    95.0
        ... 
1887    83.0
1912    83.0
1922    83.0
1762    83.0
1492    81.5
Name: points, Length: 168, dtype: float64
```
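A stricter pattern can keep numbers like 1147 or 1492 out of the year column. This is a minimal sketch, assuming that real vintages are standalone numbers from 1900 to 2099 (an assumption that would also discard any genuine older vintage); it compares the notebook's pattern with the stricter one on a title from the preview above plus two hypothetical titles:

```python
import pandas as pd

# One title from the preview above plus two hypothetical titles
titles = pd.Series([
    "Nicosia 2013 Vulkà Bianco (Etna)",
    "Hypothetical Winery 2014 Cuvée 1147",
    "Hypothetical Cellars 1492 Red",
])

# Notebook pattern: greedy .* means the LAST four-digit run is captured
loose = titles.str.extract(r'.*(\d{4}).*', expand=False)

# Stricter assumption: a vintage is a standalone 19xx or 20xx number
strict = titles.str.extract(r'\b((?:19|20)\d{2})\b', expand=False)

print(pd.DataFrame({"title": titles, "loose": loose, "strict": strict}))
```

The loose pattern returns 1147 and 1492 for the made-up titles, while the strict one returns 2014 and `NaN`.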

# Do reviews with the word "depth" in them tend to get better than average or worse than average points?

Reviews whose description contains the word 'depth' receive more points on average: 90.11, versus 88.41 for reviews without it.

```python
# Flag descriptions that mention 'depth' (case-insensitive), then compare mean points
df['depth'] = df['description'].str.contains('depth', case=False)
df.groupby('depth')['points'].mean().sort_values(ascending=False)
```

```
depth
True     90.112109
False    88.413685
Name: points, dtype: float64
```
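The question is phrased relative to the average, while the cell above compares reviews with and without the word. A minimal sketch on a made-up four-row DataFrame (hypothetical data) measures each group against the overall mean instead:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "description": ["Great depth here", "Light and fruity", "Depth and length", "Simple"],
    "points": [92, 85, 90, 85],
})

overall = df["points"].mean()
has_depth = df["description"].str.contains("depth", case=False)

# Positive values: the group scores above the overall average; negative: below
vs_overall = df.groupby(has_depth)["points"].mean() - overall
print(f"overall mean: {overall}")
print(vs_overall)
```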

# Do reviews with the word "fruity" in them tend to get better than average or worse than average points?

By contrast, reviews that include the word 'fruity' receive fewer points on average: 87.60, versus 88.51 for reviews without it.

```python
# Flag descriptions that mention 'fruity' (case-insensitive), then compare mean points
df['fruity'] = df['description'].str.contains('fruity', case=False)
df.groupby('fruity')['points'].mean().sort_values(ascending=False)
```

```
fruity
False    88.513558
True     87.600529
Name: points, dtype: float64
```

# Do reviews with the word "herbal" in them tend to get better than average or worse than average points?

The same holds for 'herbal': reviews that include it average 87.44 points, versus 88.49 for reviews without it.

```python
# Flag descriptions that mention 'herbal' (case-insensitive), then compare mean points
df['herbal'] = df['description'].str.contains('herbal', case=False)
df.groupby('herbal')['points'].mean().sort_values(ascending=False)
```

```
herbal
False    88.492726
True     87.438434
Name: points, dtype: float64
```
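The three word checks follow the same pattern, so they can be folded into one loop. A minimal sketch, using a hypothetical helper `keyword_effect` and made-up descriptions; like the notebook cells, it treats each word as a case-insensitive pattern:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "description": [
        "Depth and structure",
        "Fruity, herbal notes",
        "An in-depth, herbal nose",
        "Fruitiness dominates",
    ],
    "points": [91, 85, 89, 86],
})


def keyword_effect(frame, words):
    """Mean points for reviews with and without each word (case-insensitive match)."""
    rows = {}
    for word in words:
        mask = frame["description"].str.contains(word, case=False)
        rows[word] = {
            "with": frame.loc[mask, "points"].mean(),
            "without": frame.loc[~mask, "points"].mean(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


result = keyword_effect(df, ["depth", "fruity", "herbal"])
print(result)
```

Because this is a substring match, 'in-depth' counts as 'depth', while 'Fruitiness' does not count as 'fruity'.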

# Do longer reviews come with more or fewer points, on average?

Review length (in characters) and points have a moderate positive correlation of 0.558, meaning that longer reviews tend to come with higher scores.

```python
# Review length in characters, then its Pearson correlation with points
df['review_length'] = df['description'].str.len()
df[['review_length', 'points']].corr()
```

|  | review_length | points |
|---|---|---|
| review_length | 1.0 | 0.55776 |
| points | 0.55776 | 1.0 |
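`str.len()` counts characters, including spaces and punctuation, not words. A minimal sketch on made-up descriptions computes a word count as well, assuming that splitting on whitespace is a good enough definition of a word:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "description": [
        "Short and tart",
        "A longer review with more to say",
        "Rich, deep and layered wine",
    ],
    "points": [84, 92, 90],
})

# str.len() counts characters (including spaces and punctuation)
df["review_length"] = df["description"].str.len()
# Splitting on whitespace gives a simple word count
df["word_count"] = df["description"].str.split().str.len()

corr = df[["review_length", "word_count", "points"]].corr()
print(corr)
```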


# Which region of the province Sicily & Sardinia produces the best wine, on average?

Of the 16 regions of Sicily & Sardinia captured from the wine titles, Faro leads the group with an average of 94 points. Its closest rival is Alghero, at an average of 91.5 points.

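```python
# Take the first single-word name in parentheses in the title as the region
df['region'] = df['title'].str.extract(r'\((\w+)\)', expand=False)
# Keep only wines from the Sicily & Sardinia province
sicily_sardinia = df[df['province'] == 'Sicily & Sardinia']
sicily_sardinia.groupby('region')['points'].mean().sort_values(ascending=False)
```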

```
region
Faro         94.000000
Alghero      91.500000
Eloro        90.200000
Salina       89.875000
Etna         89.869048
Noto         89.428571
Vittoria     88.791667
Siracusa     88.500000
Marsala      88.285714
Mamertino    87.500000
Sicilia      87.438919
Alcamo       87.285714
Menfi        87.250000
Sardinia     87.200000
Erice        86.818182
Monreale     86.666667
Name: points, dtype: float64
```
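The pattern `\((\w+)\)` only captures a name made of a single word, so any multi-word appellation in parentheses would come back as `NaN` and drop out of the ranking above. A minimal sketch using two titles from the preview at the top of the notebook compares it with a looser pattern, assuming that everything inside the first pair of parentheses is the region:

```python
import pandas as pd

# Two titles taken from the preview at the top of the notebook
titles = pd.Series([
    "Nicosia 2013 Vulkà Bianco (Etna)",
    "Rainstorm 2013 Pinot Gris (Willamette Valley)",
])

# Notebook pattern: \w+ cannot span a space, so multi-word names are missed
single_word = titles.str.extract(r'\((\w+)\)', expand=False)

# Looser assumption: take everything between the first pair of parentheses
any_text = titles.str.extract(r'\(([^()]+)\)', expand=False)

print(pd.DataFrame({"single_word": single_word, "any_text": any_text}))
```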

# In Conclusion

Wine tasting is subjective, but according to the tasters in this dataset, England produces the highest-scoring wines on average. Anne Krebiehl MW gives the highest scores on average and Alexander Peartree the lowest, so a low score from Krebiehl or a high score from Peartree is worth taking note of. Ramisco is, on average, the most expensive variety, and anything from 1969 would be worth tasting.

The wording of a review is also associated with its score: 'depth' tends to go with higher scores, while 'fruity' and 'herbal' tend to go with lower ones. Longer descriptions also tend to accompany higher scores.

If you are partial to Sicily & Sardinia, try something from Faro, which has the highest average score in that province.