# PTDS EDA Project


By: Hilary Chan, Elif Ho

Topic: Wine Variety Analysis

Description: We use a wide range of descriptive data to analyze white wine varieties from around the world.

The data was retrieved from Kaggle (https://www.kaggle.com/zhenyulin/whitewinepricerating).

In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import pandas as pd
import numpy as np
sns.set(color_codes=True)  # apply seaborn's default theme in place of matplotlib's default colours
mpl.rcParams['figure.figsize'] = [13, 8]  # default figure size (width, height), in inches

In [3]:
#import data
wine_df = pd.read_csv("white-wine-price-rating.csv")
wine_df.head(5)


Unnamed: 0,FullName,Winery,WineName,Year,Region,RegionalVariety,VintageRating,VintageRatingCount,WineRating,WineRatingCount,VintagePrice,WinePrice,VintageRatingPriceRatio,WineRatingPriceRatio
0,Domaine Coche-Dury Meursault Les Rougeots 2001,Domaine Coche-Dury,Meursault Les Rougeots,2001,Burgundy,Côte de Beaune White,4.9,25,4.7,755,806.58,806.58,0.006075,0.005827
1,Joseph Drouhin Montrachet Grand Cru Marquis de...,Joseph Drouhin,Montrachet Grand Cru Marquis de Laguiche,2015,Burgundy,Côte de Beaune White,4.8,46,4.6,1191,680.0,680.0,0.007059,0.006765
2,Marcassin Marcassin Vineyard Chardonnay 2013,Marcassin,Marcassin Vineyard Chardonnay,2013,Californian,Chardonnay,4.8,28,4.6,884,448.0,448.0,0.010714,0.010268
3,M. Chapoutier Ermitage Le Méal Blanc 2006,M. Chapoutier,Ermitage Le Méal Blanc,2006,Northern Rhône,White,4.8,31,4.5,414,164.675,164.675,0.029148,0.027327
4,Domaine Coche-Dury Corton-Charlemagne Grand Cr...,Domaine Coche-Dury,Corton-Charlemagne Grand Cru,2007,Burgundy,Côte de Beaune White,4.8,35,4.7,454,3478.36,3478.36,0.00138,0.001351


# Data Fields Introduction (extract from Kaggle.com)
FullName  : Winery + Wine Name + Year              
Winery    : Name of the Winery              
WineName  : Name of the Wine              
Year      : Year which the wine is produced             
Region    : Region where the wine is produced               
RegionalVariety  : Sub-area inside the region       
VintageRating    : average rating of this vintage        
VintageRatingCount  :how many people have rated the vintage    
WineRating          : average rating of all vintages    
WineRatingCount     : how many people have rated the wine       
VintagePrice        : GBP price/750ml (a normal bottle = 750ml)     
WinePrice           : GBP price/750ml    
VintageRatingPriceRatio  : Vintage Rating / VintagePrice
WineRatingPriceRatio     : Wine Rating / WinePrice


In [None]:
wine_df.info()

In [None]:
wine_df.columns

In [None]:
wine_df.shape #4594 rows, 14 columns

# Data Cleansing

First, we perform data cleansing on the imported dataset.

Check how many missing values there are in each column.

In [None]:
sns.heatmap(wine_df.isnull())

In [4]:
#List that stores the column name
col_list=list(wine_df.columns)

#Create a list that stores the number of missing values in each column
def check_missing(col):
    if wine_df[col].isna().sum() >0:
         return int(wine_df[col].isna().sum()) #count how many missing values
value_list = list(map(check_missing, col_list))
print(col_list)
print('\n')
print(value_list)
print('\n')

miss_val_dict = {}
for i in range(len(value_list)):
    miss_val_dict[col_list[i]]=value_list[i]
print('The number of missing values in each column is:\n {}'.format(miss_val_dict))       


['FullName', 'Winery', 'WineName', 'Year', 'Region', 'RegionalVariety', 'VintageRating', 'VintageRatingCount', 'WineRating', 'WineRatingCount', 'VintagePrice', 'WinePrice', 'VintageRatingPriceRatio', 'WineRatingPriceRatio']


[None, None, None, None, 377, 377, None, None, None, None, None, None, None, None]


The number of missing values in each column is:
 {'FullName': None, 'Winery': None, 'WineName': None, 'Year': None, 'Region': 377, 'RegionalVariety': 377, 'VintageRating': None, 'VintageRatingCount': None, 'WineRating': None, 'WineRatingCount': None, 'VintagePrice': None, 'WinePrice': None, 'VintageRatingPriceRatio': None, 'WineRatingPriceRatio': None}
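
The per-column count above can also be obtained directly from pandas. The following is a minimal sketch on a small made-up frame (`toy_df` and its values are assumptions for illustration, not rows from the dataset); it keeps only the columns that actually have missing values instead of reporting `None` for the rest.

```python
import pandas as pd

# Toy frame standing in for wine_df (illustrative values only)
toy_df = pd.DataFrame({
    "Winery": ["A", "B", "C"],
    "Region": ["Burgundy", None, None],
    "RegionalVariety": ["Chablis", None, "Riesling"],
})

# isna().sum() counts missing values per column in one call
missing_counts = toy_df.isna().sum()

# Keep only the columns that have at least one missing value
missing_only = missing_counts[missing_counts > 0].to_dict()
print(missing_only)
```

On the real data the same two lines would be applied to `wine_df` instead of `toy_df`.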


"Region" and "RegionalVariety" contain missing values in the dataframe.

# Handle Missing Values

In [5]:
wine_df_fmt= wine_df.copy()

#Fill missing Region / RegionalVariety values with the placeholder 'Others'

wine_df_fmt.fillna({'Region':'Others','RegionalVariety':'Others'},inplace = True)

wine_df_fmt.isna().any()

FullName                   False
Winery                     False
WineName                   False
Year                       False
Region                     False
RegionalVariety            False
VintageRating              False
VintageRatingCount         False
WineRating                 False
WineRatingCount            False
VintagePrice               False
WinePrice                  False
VintageRatingPriceRatio    False
WineRatingPriceRatio       False
dtype: bool

In [None]:
wine_df_fmt.describe()

In [None]:
wine_df_fmt['Region'].value_counts()

# Adding New Data Fields
1. Price in HKD: The source prices are in GBP, so we add extra price fields in HKD for reference.
2. Sweetness: Define the sweetness of each white wine variety (Light and Sweet, Light and Zesty, Herbaceous, Bold and Dry, Bold and Sweet, Dry and Zesty, Dry, or Unclassified).
3. Country: Categorize each wine by country, based on the Region column.

In [None]:
wine_df_fmt['RegionalVariety'].value_counts()

In [6]:
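# NOTE: a set has no guaranteed order for strings across Python sessions, so pairing
# `variety` by position with the hand-written `taste` list below is fragile;
# re-check the resulting dictionary after every kernel restart.
variety = list(set(list(wine_df_fmt['RegionalVariety'])))
variety

# Trailing spaces removed from 'Light and Sweet' so the labels match the selection tool's options
taste = ['Dry', 'Bold and Dry', 'Light and Sweet', 'Unclassified', 'Light and Sweet', 'Herbaceous', 'Bold and Dry', 'Light and Zesty',
         'Unclassified', 'Light and Sweet', 'Light and Sweet', 'Dry and Zesty', 'Herbaceous', 'Dry and Zesty', 'Dry', 'Bold and Dry', 'Bold and Sweet',
         'Light and Zesty', 'Light and Zesty', 'Unclassified', 'Light and Zesty', 'Bold and Dry', 'Unclassified', 'Herbaceous', 'Light and Sweet', 'Herbaceous', 'Light and Sweet', 'Light and Zesty', 'Bold and Dry', 'Light and Zesty', 'Light and Zesty', 'Light and Zesty'
]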
taste

sweetness_dict = {}
for i in range(len(variety)):
    sweetness_dict[variety[i]]=taste[i]
sweetness_dict

print('Chardonnay is {}.'.format(sweetness_dict['Chardonnay']))
print('Riesling is {}.'.format(sweetness_dict['Riesling']))

print(sweetness_dict.keys())

#map back the Sweetness value from the dictionary for each row in the dataframe
wine_df_fmt['Sweetness']=wine_df_fmt['RegionalVariety'].map(sweetness_dict)

#exchange rate 1GBP = 10.60 HKD (on 2021/10/05)
gbp_xrate = 10.6

#Create the Price(in HKD) column
wine_df_fmt['Price(HKD)'] = wine_df_fmt['WinePrice'] * gbp_xrate
wine_df_fmt['VintagePrice(HKD)'] = wine_df_fmt['VintagePrice'] * gbp_xrate



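#Create the Country column
# NOTE: as with `variety`, the order of this set can differ between sessions,
# so the positional pairing with the `country` list below is fragile.
region = list(set(list(wine_df_fmt['Region'])))
region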

country = ['Canada','Germany','Australia','Italy','France','Chile','France','France','South Africa','France','Italy','Spain'\
,'Portugal','Greece','France','Italy','France','USA','France','Portugal','USA','USA','France','New Zealand','Portugal','Others'\
           ,'Greece','Austria','Italy','Argentina','Chile','France','France']
country

country_dict = {}
for i in range(len(region)):
    country_dict[region[i]]=country[i]
country_dict

#map back the Country value from the dictionary for each row in the dataframe
wine_df_fmt['Country']=wine_df_fmt['Region'].map(country_dict)
wine_df_fmt.head(10)


Chardonnay is Light and Zesty.
Riesling is Dry and Zesty.
dict_keys(['Saint-Péray', 'Sauvignon Blanc', 'Côte Chalonnaise White', 'Vinho Verde White', 'Rioja White', 'Pinot Blanc', 'Côte de Beaune White', 'Viognier', 'White', 'Vin Jaune', 'Torrontes', 'Moscatel', 'Albariño', 'Riesling', 'Pinot Gris', 'Müller Thurgau', 'Chenin Blanc', 'Condrieu', 'Chablis', 'Grüner Veltliner', 'Malagouzia', 'Verdejo', 'Pinot Grigio', 'White Blend', 'Muscadet', 'Grauburgunder', 'Others', 'Soave', 'Macônnais White', 'Gewürztraminer', 'Chardonnay', 'Gavi'])


Unnamed: 0,FullName,Winery,WineName,Year,Region,RegionalVariety,VintageRating,VintageRatingCount,WineRating,WineRatingCount,VintagePrice,WinePrice,VintageRatingPriceRatio,WineRatingPriceRatio,Sweetness,Price(HKD),VintagePrice(HKD),Country
0,Domaine Coche-Dury Meursault Les Rougeots 2001,Domaine Coche-Dury,Meursault Les Rougeots,2001,Burgundy,Côte de Beaune White,4.9,25,4.7,755,806.58,806.58,0.006075,0.005827,Bold and Dry,8549.748,8549.748,France
1,Joseph Drouhin Montrachet Grand Cru Marquis de...,Joseph Drouhin,Montrachet Grand Cru Marquis de Laguiche,2015,Burgundy,Côte de Beaune White,4.8,46,4.6,1191,680.0,680.0,0.007059,0.006765,Bold and Dry,7208.0,7208.0,France
2,Marcassin Marcassin Vineyard Chardonnay 2013,Marcassin,Marcassin Vineyard Chardonnay,2013,Californian,Chardonnay,4.8,28,4.6,884,448.0,448.0,0.010714,0.010268,Light and Zesty,4748.8,4748.8,France
3,M. Chapoutier Ermitage Le Méal Blanc 2006,M. Chapoutier,Ermitage Le Méal Blanc,2006,Northern Rhône,White,4.8,31,4.5,414,164.675,164.675,0.029148,0.027327,Unclassified,1745.555,1745.555,Portugal
4,Domaine Coche-Dury Corton-Charlemagne Grand Cr...,Domaine Coche-Dury,Corton-Charlemagne Grand Cru,2007,Burgundy,Côte de Beaune White,4.8,35,4.7,454,3478.36,3478.36,0.00138,0.001351,Bold and Dry,36870.616,36870.616,France
5,Domaine Coche-Dury Corton-Charlemagne Grand Cr...,Domaine Coche-Dury,Corton-Charlemagne Grand Cru,2009,Burgundy,Côte de Beaune White,4.8,35,4.7,454,4022.8,4022.8,0.001193,0.001168,Bold and Dry,42641.68,42641.68,France
6,Keller G-Max Riesling 2009,Keller,G-Max Riesling,2009,German,Riesling,4.8,37,4.7,209,2420.0,2420.0,0.001983,0.001942,Dry and Zesty,25652.0,25652.0,Others
7,Château Haut-Brion Pessac-Léognan Blanc (Grand...,Château Haut-Brion,Pessac-Léognan Blanc (Grand Cru Classé de Graves),2005,Bordeaux,White,4.8,39,4.5,820,864.0,864.0,0.005556,0.005208,Unclassified,9158.4,9158.4,Italy
8,Domaine de La Romanée-Conti Montrachet Grand C...,Domaine de La Romanée-Conti,Montrachet Grand Cru,2010,Burgundy,Côte de Beaune White,4.8,43,4.7,1348,7249.11,7249.11,0.000662,0.000648,Bold and Dry,76840.566,76840.566,France
9,Domaine de La Romanée-Conti Montrachet Grand C...,Domaine de La Romanée-Conti,Montrachet Grand Cru,2014,Burgundy,Côte de Beaune White,4.8,43,4.7,1348,5419.19,5419.19,0.000886,0.000867,Bold and Dry,57443.414,57443.414,France
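
Pairing a hand-written list with `list(set(...))` by position depends on the set's iteration order, which for strings can change from one Python session to the next. In the preview above, for example, `Californian` is paired with France and `German` with Others, which is consistent with this kind of misalignment. A minimal sketch of a more robust approach is an explicit dictionary keyed by region name. The mapping below is partial and hypothetical (only a few regions are listed for illustration), and `toy_df` is a made-up stand-in for `wine_df_fmt`.

```python
import pandas as pd

# Hypothetical, partial mapping for illustration; extend it to every region in the data
region_to_country = {
    "Burgundy": "France",
    "Bordeaux": "France",
    "Northern Rhône": "France",
    "Californian": "USA",
    "German": "Germany",
}

# Toy frame standing in for wine_df_fmt
toy_df = pd.DataFrame({"Region": ["Burgundy", "Californian", "German", "Others"]})

# map() looks each region up by key, so row order and set order no longer matter;
# regions missing from the dictionary become NaN and are then labelled 'Others'
toy_df["Country"] = toy_df["Region"].map(region_to_country).fillna("Others")
print(toy_df)
```

The same pattern would apply to the Sweetness column, with a dictionary keyed by `RegionalVariety`.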


In [None]:
wine_df_fmt['Price(HKD)'].describe()

# General Analysis


In [None]:
wine_df_fmt.groupby('RegionalVariety')['Price(HKD)'].agg(['min','max','mean','median'])

There is a bottle of Côte de Beaune White that costs more than HKD 97,308.

In [None]:
#The average price for each wine type , in HKD
wine_df_fmt.groupby('RegionalVariety')['Price(HKD)'].mean().sort_values(ascending = False).plot(kind='bar', title="Mean Wine Price by each Wine Variety") 
plt.xlabel('Regional Variety')
plt.ylabel('Wine Price')
plt.show()

Conclusion: Among all wine varieties, Côte de Beaune White costs the most on average.

In [None]:
#The average Wine Rating for each wine type
wine_df_fmt.groupby('RegionalVariety')['WineRating'].mean().sort_values(ascending = False)

In [None]:
fig,ax = plt.subplots()
wine_df_fmt.groupby('RegionalVariety')['WineRating'].mean().plot(kind='bar', title="Mean Wine Rating by each Wine Variety") 
ax.set_ylim([3.5, 4.5]) # set y axis limits
plt.xlabel('Regional Variety')
plt.ylabel('Wine Rating')
plt.show()

Conclusion: White Blend has the best average wine rating among all wine varieties.

Next, we look at how many rated bottles the data contains for each Sweetness type.

In [None]:
#Sweetness
colors_list = ['#cee588', '#e3f0bb','#f3ffcc', '#f1f7dd','#f5ffd6', '#f9ffe5']
wine_df_fmt.Sweetness.value_counts().plot(kind='bar', figsize=(9,6),color=colors_list)  #note: 'Unclassified' is still included in this count
plt.title('Sweetness Distribution')
plt.xlabel('Sweetness')
plt.ylabel('Count of bottles')
plt.show()
print(wine_df_fmt.Sweetness.value_counts())

Wines with a bold and dry taste have the most rated bottles, followed by Herbaceous and Dry wines. If you are a white wine enthusiast who prefers a bold and dry or herbaceous taste, this dataset should provide some good recommendations.

In [None]:
wine_df_fmt.describe()

# Country + Winery Analysis

    1. Which country has the widest variety of wines?
    2. Which country has the most wineries?
    3. Which country and winery have the highest/lowest wine rating?



In [None]:
wine_df_fmt['Country'].value_counts()

In [None]:
ctry_variety=wine_df_fmt.loc[:,['Country','RegionalVariety']].sort_values(by=['Country','RegionalVariety']).drop_duplicates()
ctry_variety.reset_index(inplace=True, drop=True)
ctry_variety.head(10)

In [None]:
#To show the number of wine variety types of each Country.
ctry_variety.groupby('Country')['RegionalVariety'].count().sort_values(ascending=False)

In [None]:
#Chart: Bar Chart
ctry_variety.groupby('Country')['RegionalVariety'].count().sort_values(ascending=False).plot(kind='bar',figsize = (9,6))
plt.title('Number of Wine Variety for each Country')
plt.xlabel('Country')
plt.ylabel('Count')
plt.show()

France has the most wine varieties of all countries, followed by Italy and the USA.
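
The distinct-variety count per country can also be computed in one step with `nunique()`, without building a de-duplicated helper table first. This is a small sketch on made-up rows (`toy_df` is an assumption for illustration only).

```python
import pandas as pd

# Toy rows standing in for wine_df_fmt; duplicates are intentional
toy_df = pd.DataFrame({
    "Country": ["France", "France", "France", "Italy", "Italy"],
    "RegionalVariety": ["Chablis", "Chablis", "Condrieu", "Soave", "Soave"],
})

# nunique() counts distinct varieties per country, ignoring repeated rows
variety_counts = (
    toy_df.groupby("Country")["RegionalVariety"]
    .nunique()
    .sort_values(ascending=False)
)
print(variety_counts)
```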

In [None]:
wine_df_fmt['WineRating'].describe()

In [None]:
wine_df_fmt.groupby(['Country','Winery'])['WineRating'].agg(['min','max','mean','median'])

In [None]:
wine_df_fmt.groupby(['Country','Winery'])['WineRating'].min().sort_values(ascending=True)

The Paradies winery in Germany has the lowest wine rating, while the Domaine de La Romanée-Conti winery in France has the highest.
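
One way to read off the lowest- and highest-rated wineries without scanning a sorted table is `idxmin()` / `idxmax()` on the grouped result. The sketch below ranks wineries by their mean rating, which is an assumption (the cell above sorts by the minimum rating instead), and uses made-up winery names and ratings.

```python
import pandas as pd

# Toy rows standing in for wine_df_fmt (names and ratings are invented)
toy_df = pd.DataFrame({
    "Country": ["France", "France", "Germany", "Germany"],
    "Winery": ["Winery A", "Winery A", "Winery B", "Winery C"],
    "WineRating": [4.7, 4.5, 3.6, 4.1],
})

# Mean rating per (Country, Winery) pair
mean_rating = toy_df.groupby(["Country", "Winery"])["WineRating"].mean()

# idxmin / idxmax return the (Country, Winery) index label of the extreme value
lowest = mean_rating.idxmin()
highest = mean_rating.idxmax()
print("Lowest:", lowest, "Highest:", highest)
```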

In [None]:
wine_df_fmt.groupby('Country')['WineRatingCount'].sum().sort_values(ascending=False).head(10)


French wine has been rated the most, followed by Italian and American wine.

# Wine / Vintage Rating Analysis


In [None]:
set(wine_df_fmt['Year'])

In [None]:
mpl.rcParams['figure.figsize'] = [13, 8]
wine_bp=sns.boxplot(data = wine_df_fmt,x="Year",y="VintagePrice(HKD)", showfliers = False) #showfliers = False: hide outlier points in the plot (the data itself is unchanged)
wine_bp.set_xticklabels(wine_bp.get_xticklabels(),rotation=30)

In [None]:
wine_df_fmt.loc[:,['WineName','Year']].sort_values(by=['WineName','Year'],ascending=True)

In [None]:
#The top 10 vintages (years) by mean Vintage Rating
wine_df_fmt.groupby(['Year']).VintageRating.mean().sort_values(ascending=False).head(10)

In [None]:
wine_df_fmt.groupby(['WineName','Year','VintagePrice(HKD)']).VintageRating.mean().sort_values(ascending=False).head(10)

Looking at the top 10 vintages, 1986 had the highest mean Vintage Rating overall. Looking at individual bottles, however, Unendlich Riesling 2017 and Meursault Les Rougeots 2001 had the highest Vintage Rating.

In [None]:
wine_df_fmt.groupby(['VintageRating'])['VintagePrice(HKD)'].mean().sort_values(ascending=False).head(10)

In [None]:
wine_df_fmt[wine_df_fmt['VintageRating']==4.9]['VintagePrice(HKD)'].min()

In [None]:
wine_df_fmt[wine_df_fmt['VintageRating']==4.9]['VintagePrice(HKD)'].max()

Conclusion: Comparing the mean price at each Vintage Rating level, we found no clear positive relationship between price and Vintage Rating. A higher price does not necessarily mean a higher Vintage Rating.
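
A rank correlation gives a single number to go with this kind of comparison. The sketch below computes a Spearman correlation as the Pearson correlation of the ranks, which keeps it robust to very expensive bottles. The values are invented for illustration and say nothing about the actual dataset; on the real data the two columns would come from `wine_df_fmt`.

```python
import pandas as pd

# Made-up ratings and prices, for illustration only
toy_df = pd.DataFrame({
    "VintageRating": [3.8, 4.0, 4.2, 4.5, 4.9],
    "VintagePrice(HKD)": [150.0, 120.0, 900.0, 600.0, 8500.0],
})

# Spearman correlation = Pearson correlation computed on the ranks
rating_rank = toy_df["VintageRating"].rank()
price_rank = toy_df["VintagePrice(HKD)"].rank()
spearman = rating_rank.corr(price_rank)
print(round(spearman, 3))
```

A value near 0 would support the "no clear relationship" reading, while a value near 1 would suggest that higher-rated vintages do tend to cost more.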

In [None]:
wine_df_fmt.loc[:,['WineName','Year','Price(HKD)']].sort_values(by='Price(HKD)',ascending=False).head(10)

Montrachet Grand Cru wines occupy the top 10 most expensive bottles in the selection of roughly 5,000 wines.

In [None]:
wine_df_fmt.loc[:,['WineName','Year','Price(HKD)']].sort_values(by='Price(HKD)',ascending=False).tail(10)

In contrast, the table above shows the 10 cheapest bottles.

# White Wine Selection Tool
This tool recommends wines based on the user's preferences (price, country, sweetness).
The user answers several questions, and the tool returns the recommended wines.
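
The budget question can be validated with a small pure function plus a loop, which avoids re-prompting through recursion. This is only a sketch: `parse_budget` and `ask_budget` are hypothetical helper names, and the 100 to 100,000 range is taken from the prompt text used by the tool.

```python
def parse_budget(min_text, max_text, low=100, high=100_000):
    """Return (minimum, maximum) as ints, or None if the input is invalid."""
    # isdecimal() guarantees that int() will accept the text
    if not (min_text.isdecimal() and max_text.isdecimal()):
        return None
    minimum, maximum = int(min_text), int(max_text)
    # Both values must be in range, and the minimum must not exceed the maximum
    if not (low <= minimum <= maximum <= high):
        return None
    return minimum, maximum


def ask_budget():
    """Keep asking until the user enters a valid budget range."""
    while True:
        budget = parse_budget(input("Minimum:").strip(), input("Maximum:").strip())
        if budget is not None:
            return budget
        print("Please enter whole numbers from 100 to 100,000, minimum first.")


print(parse_budget("500", "5000"))
```

Keeping the checks in `parse_budget` makes them easy to test without typing into a prompt.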

In [13]:
Exit = False 
sweetness = ['Bold and Dry' , 'Light and Zesty', 'Light and Sweet', 'Herbaceous', 'Bold and Sweet', 'Dry and Zesty', 'Dry']
ctry_list = list(wine_df_fmt['Country'].unique())

def q1():
    print('What is your budget? Please enter range from HKD $100 to HKD$100,000\t')
    minimum = input('Minimum:')
    maximum =input('Maximum:')
    
    if minimum.isdigit() == False:
        print('Wrong Input\t')
        print('Please enter range from HKD $100 to HKD$100,000:\t')
        return q1()
    if int(minimum) < 100 or int(minimum) > 100000:
        print("Out of range.\t")
        print('Please enter range from HKD $100 to HKD$100,000:\t')
        return q1()
    if maximum.isdigit() == False:
        print('Wrong Input\t')
        print('Please enter range from HKD $100 to HKD$100,000:\t')
        return q1()
    if int(maximum) < 100 or int(maximum) > 100000:
        print("Out of range.\t")
        print('Please enter range from HKD $100 to HKD$100,000:\t')
        return q1()
    if int(minimum) > int(maximum):
        print("Your minimum price should not be greater than your maximum price.\t")
        print('Please enter range from HKD $100 to HKD$100,000:\t')
        return q1()
    else:
        return minimum, maximum
#q1();

def q2():
    question2 = input('How sweet do you prefer?\n Choose from Bold and Dry/ Light and Zesty/ Light and Sweet/ Herbaceous/ Bold and Sweet/ Dry and Zesty/ Dry:\t')
    if not question2.replace(" ", "").isalpha():  # remove the whitespace, then check that only letters remain
        print('Wrong Input! Please re-enter.\t')
        return q2()
    else:
        return question2
#q2();

def q3():
    print('\nThe Country list is {}\n'.format(ctry_list))
    print("If you do not have any country preference, Please type 'None'")
    question3 = input("Which country's white wine do you prefer? (Do not type the ''):\t")
    if not question3.replace(" ", "").isalpha():  # remove the whitespace, then check that only letters remain
        print('Wrong Input! Please re-enter.\t')
        return q3()
    else:
        return question3

#q3();

user_input={}
print('Welcome to the White Wine Selection Tool\n')
while not Exit:
    print('Please answer the following questions to get the best white wine recommendation\n')
    user_min,user_max = q1()
    user_input['Amount_Min'] = user_min
    user_input['Amount_Max'] =user_max
    
    sweetness = q2()
    user_input['Sweetness']= sweetness
    
    country = q3()
    user_input['Country']= country
    break;

print('\nYour input is:\n Minimum Amount:{} \n Maximum Amount: {} \n Country:{}\n'.format(user_input['Amount_Min'],user_input['Amount_Max'],user_input['Country']))

#Extract the records base on the user input , return suggestions
#No Country Preference or didn't enter the correct Country name
selection1 = (wine_df_fmt['Price(HKD)'] >= int(user_input['Amount_Min'])) \
            & (wine_df_fmt['Price(HKD)'] <= int(user_input['Amount_Max'])) \
            & (wine_df_fmt['Sweetness'] == user_input['Sweetness'])

#Have Country Preference
selection2 = (wine_df_fmt['Price(HKD)'] >= int(user_input['Amount_Min'])) \
            & (wine_df_fmt['Price(HKD)'] <= int(user_input['Amount_Max'])) \
            & (wine_df_fmt['Sweetness'] == user_input['Sweetness']) \
            & (wine_df_fmt['Country'] == user_input['Country']) 

suggestion_table1= wine_df_fmt[selection1]
suggestion_table2= wine_df_fmt[selection2]

suggestion_table_fnl1= suggestion_table1.loc[:,['FullName','Year','Country','Winery','RegionalVariety','Sweetness','Price(HKD)','WineRating']].copy().sort_values(by='WineRating', ascending = False).head(10)
suggestion_table_fnl2= suggestion_table2.loc[:,['FullName','Year','Country','Winery','RegionalVariety','Sweetness','Price(HKD)','WineRating']].copy().sort_values(by='WineRating', ascending = False).head(10)

if user_input['Country'] not in ctry_list:
    if suggestion_table_fnl1.empty:
        print("Sorry, there aren't any suggestions that fit your preference")
    else:
        print('Here are your suggestions. (We will show the top 10 rated)\n')
        display(suggestion_table_fnl1)

else:
    if suggestion_table_fnl2.empty:
        print("Sorry, there aren't any suggestions that fit your preference")
    else:
        print('Here are your suggestions. (We will show the top 10 rated)\n')
        display(suggestion_table_fnl2)

print('Thank You for using the tool. See you next time!')

Welcome to the White Wine Selection Tool

Please answer the following questions to get the best white wine recommendation

What is your budget? Please enter range from HKD $100 to HKD$100,000	
Minimum:500
Maximum:5000
How sweet do you prefer?
 Choose from Bold and Dry/ Light and Zesty/ Light and Sweet/ Herbaceous/ Bold and Sweet/ Dry and Zesty/ Dry:	Bold and Dry

The Country list is ['France', 'Portugal', 'Others', 'Italy', 'USA', 'New Zealand', 'Chile', 'Argentina', 'Canada', 'Spain', 'Greece', 'Australia', 'Austria', 'Germany', 'South Africa']

If you do not have any country preference, Please type 'None'
Which country's white wine do you prefer? (Do not type the ''):	None

Your input is:
 Minimum Amount:500 
 Maximum Amount: 5000 
 Country:None

Here are your suggestions. (We will show the top 10 rated)



Unnamed: 0,FullName,Year,Country,Winery,RegionalVariety,Sweetness,Price(HKD),WineRating
477,Domaine des Comtes Lafon Meursault-Porusots Pr...,2011,France,Domaine des Comtes Lafon,Côte de Beaune White,Bold and Dry,2030.536,4.6
42,Olivier Leflaive Montrachet Grand Cru 2011,2011,France,Olivier Leflaive,Côte de Beaune White,Bold and Dry,4222.298,4.6
109,Domaine Leflaive Chevalier-Montrachet Grand Cr...,2003,France,Domaine Leflaive,Côte de Beaune White,Bold and Dry,4853.74,4.6
73,Domaine Leflaive Chevalier-Montrachet Grand Cr...,2009,France,Domaine Leflaive,Côte de Beaune White,Bold and Dry,4876.0,4.6
98,Domaine Coche-Dury Puligny-Montrachet Les Ense...,2009,France,Domaine Coche-Dury,Côte de Beaune White,Bold and Dry,4746.892,4.6
48,Joseph Drouhin Montrachet Grand Cru Marquis de...,1998,France,Joseph Drouhin,Côte de Beaune White,Bold and Dry,3710.0,4.6
190,Joseph Drouhin Montrachet Grand Cru Marquis de...,2003,France,Joseph Drouhin,Côte de Beaune White,Bold and Dry,3295.222,4.6
145,Domaine Leflaive Chevalier-Montrachet Grand Cr...,2008,France,Domaine Leflaive,Côte de Beaune White,Bold and Dry,4770.0,4.6
178,Domaine Coche-Dury Meursault Blanc 2016,2016,France,Domaine Coche-Dury,Côte de Beaune White,Bold and Dry,3909.704,4.5
179,Domaine Leflaive Bienvenues-Bâtard-Montrachet ...,2009,France,Domaine Leflaive,Côte de Beaune White,Bold and Dry,3879.6,4.5


Thank You for using the tool. See you next time!
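
The two selection masks in the tool differ only by the country condition, so the filtering step could be folded into one function with an optional country. The following is a sketch under that assumption; `recommend` is a hypothetical helper, and `toy_df` holds invented rows that only reuse the column names of `wine_df_fmt`.

```python
import pandas as pd


def recommend(df, min_price, max_price, sweetness, country=None, top_n=10):
    """Return the top-rated wines matching the budget, sweetness and optional country."""
    # between() is inclusive on both ends by default
    mask = df["Price(HKD)"].between(min_price, max_price) & (df["Sweetness"] == sweetness)
    if country is not None:
        mask &= df["Country"] == country
    return df[mask].sort_values("WineRating", ascending=False).head(top_n)


# Invented rows for illustration only
toy_df = pd.DataFrame({
    "FullName": ["Wine A 2011", "Wine B 2009", "Wine C 2016", "Wine D 2015"],
    "Country": ["France", "France", "USA", "France"],
    "Sweetness": ["Bold and Dry", "Bold and Dry", "Bold and Dry", "Herbaceous"],
    "Price(HKD)": [2030.5, 7200.0, 900.0, 1500.0],
    "WineRating": [4.6, 4.7, 4.2, 4.4],
})

picks = recommend(toy_df, 500, 5000, "Bold and Dry")
french_picks = recommend(toy_df, 500, 5000, "Bold and Dry", country="France")
print(picks)
```

Passing `country=None` corresponds to the "no country preference" branch of the tool.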


# Summary of the Project
Overall, the dataset is relatively clean, apart from a few columns with missing values that needed to be filled. Before analysing the data, we listed the perspectives we could work from, based on our knowledge of white wine. People normally look at the year, country, grape type, sweetness, and so on when choosing a bottle of white wine, so we based our analysis on these attributes.

Throughout the process of analysing the data, we gained knowledge about white wine, which led to our decision to build the mini 'White Wine Selection Tool'. One downside is that the dataset contains only around 5,000 wines. Even so, we were quite satisfied with the result in general.

Moreover, while analysing the data, we found outliers that could affect the analysis. We had to decide whether to remove them for the sake of accuracy, which was another challenge.

In summary, we learnt a lot while working on this EDA project, especially by applying as much Python as we could.
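
One common way to make the outlier decision explicit is the 1.5 × IQR rule, the same convention box plots typically use for their whiskers. The sketch below flags outliers rather than dropping them, so the choice to keep or remove them stays visible. The prices are invented for illustration.

```python
import pandas as pd

# Made-up prices with one extreme bottle
prices = pd.Series([120.0, 150.0, 180.0, 200.0, 220.0, 260.0, 300.0, 76840.0])

# Interquartile range and the conventional 1.5 * IQR fences
q_low, q_high = prices.quantile(0.25), prices.quantile(0.75)
iqr = q_high - q_low
lower, upper = q_low - 1.5 * iqr, q_high + 1.5 * iqr

# Boolean flag per bottle: True where the price falls outside the fences
is_outlier = (prices < lower) | (prices > upper)
print(prices[is_outlier])
```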