# Milestone 1

In [1]:
# Import packages that are used
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Project idea

We want to build an interactive visualization of a map of France and its wine regions. Hovering over or clicking on a region shows, for example, the wine types grown there, their average ratings, and the words typically used to describe them.

### Data Cleaning

In [12]:
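# Load the full WineEnthusiast review dataset
df = pd.read_csv('Data/winemag-data-130k-v2.csv', sep=',')
# Keep only the French wines (rows with a missing country are excluded)
france = df[df.country.str.contains('France', na=False)]
france.head(3)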

,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
7,7,France,This dry and restrained wine offers spice in p...,,87,24.0,Alsace,Alsace,,Roger Voss,@vossroger,Trimbach 2012 Gewurztraminer (Alsace),Gewürztraminer,Trimbach
9,9,France,This has great depth of flavor with its fresh ...,Les Natures,87,27.0,Alsace,Alsace,,Roger Voss,@vossroger,Jean-Baptiste Adam 2012 Les Natures Pinot Gris...,Pinot Gris,Jean-Baptiste Adam
11,11,France,"This is a dry wine, very spicy, with a tight, ...",,87,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Leon Beyer 2012 Gewurztraminer (Alsace),Gewürztraminer,Leon Beyer


In [13]:
france.isnull().sum()

Unnamed: 0                   0
country                      0
description                  0
designation               7563
points                       0
price                     4317
province                     0
region_1                    76
region_2                 22093
taster_name                265
taster_twitter_handle      265
title                        0
variety                      0
winery                       0
dtype: int64

We are not interested in the following columns: 'Unnamed: 0' (a leftover row index with no meaning), country (every remaining row is France), region_2 (empty for all French wines), and taster_name & taster_twitter_handle (who wrote the review is not important).

Designation and price also have many missing values, but we keep both columns for now in case they turn out to be of interest later on.
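To judge how serious the missing values are, the raw counts can be turned into shares. The following is a minimal sketch on a made-up toy frame (the `toy` data and the name `missing_share` are assumptions for illustration, not part of the dataset); the same call should work on `france`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the `france` frame; the values are made up for illustration.
toy = pd.DataFrame({
    "designation": ["Les Natures", None, None, "Réserve"],
    "price": [24.0, np.nan, 30.0, 27.0],
    "points": [87, 87, 90, 88],
})

# isnull() marks missing cells; the column-wise mean is the fraction missing.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

Columns with a high share are candidates for dropping or for special handling in the visualization.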

In [68]:
# Drop the columns we do not need and rename region_1 to region
wines = france.drop(['Unnamed: 0', 'country', 'region_2', 'taster_name','taster_twitter_handle'], axis=1)\
    .rename(columns={'region_1':'region'})
wines.head(3)

,description,designation,points,price,province,region,title,variety,winery
7,This dry and restrained wine offers spice in p...,,87,24.0,Alsace,Alsace,Trimbach 2012 Gewurztraminer (Alsace),Gewürztraminer,Trimbach
9,This has great depth of flavor with its fresh ...,Les Natures,87,27.0,Alsace,Alsace,Jean-Baptiste Adam 2012 Les Natures Pinot Gris...,Pinot Gris,Jean-Baptiste Adam
11,"This is a dry wine, very spicy, with a tight, ...",,87,30.0,Alsace,Alsace,Leon Beyer 2012 Gewurztraminer (Alsace),Gewürztraminer,Leon Beyer


### Dataset description

The data was scraped from WineEnthusiast on November 22nd, 2017.

| Column name | Description |
| --- | --- |
| description | A few sentences from a sommelier describing the wine's taste, smell, look, feel, etc. | 
| designation | The vineyard within the winery where the grapes that made the wine are from. | 
| points | The number of points WineEnthusiast rated the wine on a scale of 1-100 (though they say they only post reviews for wines that score >=80). | 
| price | The cost for a bottle of the wine. (Currency?) | 
| province | The province or state that the wine is from. |
| region_1 | The wine growing area in a province or state (renamed to region in our cleaned data). |
| title | The title of the wine. | 
| variety | The type of grapes used to make the wine. |
| winery | The winery that made the wine. |
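The stated score range is easy to verify before relying on it. The sketch below uses a made-up toy frame (`toy` and `in_range` are hypothetical names) and assumes the published scores should lie between 80 and 100, as the description of `points` suggests.

```python
import pandas as pd

# Toy stand-in for the `wines` frame; the values are made up for illustration.
toy = pd.DataFrame({
    "points": [87, 90, 84, 100],
    "price": [24.0, None, 12.0, 250.0],
})

# between() is inclusive on both ends by default.
in_range = toy["points"].between(80, 100)
print("all scores within 80-100:", bool(in_range.all()))
print("min / max points:", toy["points"].min(), toy["points"].max())
```

Running the same check on `wines` would confirm whether any rating falls outside the range WineEnthusiast says it publishes.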

### Data Exploration

In [65]:
# Mean price, mean points and number of reviews per province, region and variety
grouped = wines.groupby(['province', 'region', 'variety'])\
    .agg({'price':'mean', 'points':'mean', 'title':'count'}).reset_index()
grouped

,province,region,variety,price,points,title
0,Alsace,Alsace,Alsace white blend,29.571429,90.235294,51
1,Alsace,Alsace,Auxerrois,24.142857,88.454545,11
2,Alsace,Alsace,Champagne Blend,16.500000,84.000000,2
3,Alsace,Alsace,Chardonnay,23.200000,90.600000,5
4,Alsace,Alsace,Chasselas,22.000000,89.500000,4
...,...,...,...,...,...,...
1032,Southwest France,Vin de Pays du Comté Tolosan,Sauvignon Blanc,10.000000,85.000000,1
1033,Southwest France,Vin de Pays du Comté Tolosan,Syrah,18.000000,89.000000,1
1034,Southwest France,Vin de Pays du Lot,Malbec,12.000000,84.500000,4
1035,Southwest France,Vin de Pays du Lot,Rosé,12.000000,84.000000,1
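
In the table above, `price` and `points` are means and `title` is a review count, which the column names do not show. One possible way to make this explicit is pandas named aggregation. This is a sketch on made-up toy data; the output names `mean_price`, `mean_points` and `n_reviews` are our own choice.

```python
import pandas as pd

# Toy stand-in for the `wines` frame; the values are made up for illustration.
toy = pd.DataFrame({
    "province": ["Alsace", "Alsace", "Alsace", "Beaujolais"],
    "region": ["Alsace", "Alsace", "Alsace", "Beaujolais"],
    "variety": ["Riesling", "Riesling", "Pinot Gris", "Gamay"],
    "price": [20.0, 30.0, None, 15.0],
    "points": [88, 90, 87, 86],
    "title": ["a", "b", "c", "d"],
})

# Named aggregation: output_name=(input_column, aggregation function).
# "count" counts non-null titles, i.e. the number of reviews per group.
summary = (
    toy.groupby(["province", "region", "variety"])
    .agg(
        mean_price=("price", "mean"),
        mean_points=("points", "mean"),
        n_reviews=("title", "count"),
    )
    .reset_index()
)
print(summary)
```

Groups whose prices are all missing get a missing `mean_price`, which is worth keeping in mind when displaying prices per region.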


In [66]:
# List the distinct varieties reviewed in each province and region
provinces = wines.groupby(['province', 'region']).agg({'variety':'unique'}).reset_index()
provinces

,province,region,variety
0,Alsace,Alsace,"[Gewürztraminer, Pinot Gris, Riesling, White B..."
1,Alsace,Crémant d'Alsace,"[Sparkling Blend, Pinot Noir, Pinot Gris, Pino..."
2,Beaujolais,Beaujolais,[Gamay]
3,Beaujolais,Beaujolais Blanc,[Chardonnay]
4,Beaujolais,Beaujolais Rosé,"[Rosé, Gamay]"
...,...,...,...
381,Southwest France,Saint-Mont,"[Red Blend, White Blend, Tannat]"
382,Southwest France,Saussignac,[Bordeaux-style White Blend]
383,Southwest France,Vin de Pays des Côtes de Gascogne,"[Ugni Blanc-Colombard, Bordeaux-style Red Blen..."
384,Southwest France,Vin de Pays du Comté Tolosan,"[Chenin Blanc, Malbec, Chardonnay-Viognier, Sa..."
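
The project idea also calls for the words typically used to describe the wines of a region. A very rough sketch of how this could be computed is shown below, on made-up toy data. The helper `top_words`, the tiny `STOPWORDS` set and the plain word count are all assumptions for illustration; a real version would likely need a proper stop word list and possibly a weighting such as TF-IDF.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical, deliberately tiny stop word list.
STOPWORDS = {"this", "is", "a", "the", "and", "with", "of", "has", "its",
             "in", "wine", "very"}

# Toy stand-in for the `wines` frame; the descriptions are made up.
toy = pd.DataFrame({
    "region": ["Alsace", "Alsace", "Beaujolais"],
    "description": [
        "This dry wine offers spice and a tight texture.",
        "This is a dry wine, very spicy, with spice and fruit.",
        "Fresh red fruit with a light, juicy texture.",
    ],
})

def top_words(descriptions, n=3):
    """Return the n most frequent non-stop words across the descriptions."""
    counts = Counter()
    for text in descriptions:
        # [^\W\d_]+ matches runs of letters, including accented ones.
        words = re.findall(r"[^\W\d_]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]

# One list of frequent words per region.
typical = toy.groupby("region")["description"].apply(top_words)
print(typical)
```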
