## Summary:
I collected data from Vivino.com through its REST API. I logged my network traffic with Google Chrome's DevTools and obtained a JSON response with information on different wines. I then filtered the response and selected the wine information that is useful for the project. The selected variables are: wine_name, winery, wine_year, wine_rating, wine_price, wine_region, wine_country, grape_information, wine_acidity, wine_intensity, wine_sweetness, wine_tannin and wine_description. These features will be used in the EDA and in building the recommendation system. Note that the response does not contain every feature for every wine: some wines have all variables, while for others some fields are empty.
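Because some fields are missing for some wines, every nested lookup has to tolerate absent keys. A minimal sketch of one way to do that is shown here; the helper `safe_get` and the `sample` record are hypothetical illustrations, not part of the Vivino response or of the scraper in this notebook.

```python
from typing import Any


def safe_get(record: Any, path: list, default: Any = None) -> Any:
    """Follow `path` through nested dicts/lists; return `default` if a step is missing."""
    current = record
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # Missing key, empty list, or a None in the middle of the path
            return default
    return current


# Hypothetical record shaped like the JSON described above (not real data)
sample = {"vintage": {"wine": {"name": "Example Red", "style": None}}}

name = safe_get(sample, ["vintage", "wine", "name"])
grape = safe_get(sample, ["vintage", "wine", "style", "grapes", 0, "name"])
print(name, grape)
```

With a helper like this, a missing `style` block yields the default value instead of raising an exception.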



In [1]:
# Importing Required Libraries
import requests
import csv
import pandas as pd
import numpy as np

In [11]:
'''
The function below takes two arguments (more can be added if you want to change other API parameters):

1) wine_type_id (str): the type of wine on Vivino: '1' for Red Wine, '2' for White Wine, '3' for
Sparkling Wine, '4' for Rose Wine, '7' for Dessert Wine and '24' for Fortified Wine.

2) price_increment (int): the amount by which the minimum price of the API request is raised after each
pass of the outer for loop.

'''

def my_vivino_scraper(wine_type_id, price_increment):
    # Base URL of the explore endpoint; the query string is passed separately through `params` below
    url = 'https://www.vivino.com/api/explore/explore'


    # Request parameters; adjust them to your preferences
    params = {
                "country_code": "",
                "currency_code": "USD",
                "grape_filter": "varietal",
                "min_rating": "1", # Ratings range from 1 to 5, so a minimum of 1 returns wines across the full range of ratings
                "order_by": "price", # Sort wines by price
                "order": "asc", # Cheapest wines first
                "page": 1,   # Page number: one page contains 25 wines
                "price_range_max": "50000", # Maximum wine price
                "price_range_min": 4.99, # Minimum wine price
                "wine_type_ids[]": wine_type_id # Vivino has 6 wine types
    }

    # The User Agent was taken from Vivino Website response in the Google Chrome Dev Tools. 
    headers = {
        "user-agent": "Mozilla/5.0"
    }

    # Requests access to Vivino website using parameters described above
    response = requests.get(url, params=params, headers=headers)
    response.raise_for_status()

    # Select the section of the JSON output that holds the wine information
    records = response.json()["explore_vintage"]["records"]

    # Create one list for each field I want to collect
    wine_name, winery, wine_description, wine_price, wine_year, wine_rating, wine_region, wine_country, grape_information, wine_acidity, wine_intensity, wine_sweetness, wine_tannin =  [], [], [], [], [], [], [], [], [], [], [], [], []


    '''
    The code below contains three nested for loops. The innermost loop appends the data from each record
    of the current JSON response (up to 25 entries per page) to the lists defined above.
    Since one page only holds 25 wines, the middle loop increments the page number to request the next 25 wines.
    The outer loop raises the minimum price so that wines across a wide range of prices are collected.
    '''
    
    for k in range (250):

        for i in range (250):

            for j in range(len(records)):
                # Append The variables below with JSON output
                wine_name.append(records[j]['vintage']['wine']['name'])
                wine_price.append(records[j]['price']['amount'])
                wine_year.append(records[j]['vintage']['year'])
                wine_rating.append(records[j]['vintage']['statistics']['ratings_average'])
                winery.append(records[j]['vintage']['wine']['winery']['name'])

                # The fields below are missing for some wines, so each lookup is wrapped in
                # try/except and NaN is appended when the value is unavailable
                try:
                    grape_information.append(records[j]['vintage']['wine']['style']['grapes'][0]['name'])
                except (KeyError, IndexError, TypeError):
                    grape_information.append(np.nan)

                try:
                    wine_acidity.append(records[j]['vintage']['wine']['taste']['structure']['acidity'])
                except (KeyError, IndexError, TypeError):
                    wine_acidity.append(np.nan)

                try:
                    wine_intensity.append(records[j]['vintage']['wine']['taste']['structure']['intensity'])
                except (KeyError, IndexError, TypeError):
                    wine_intensity.append(np.nan)

                try:
                    wine_sweetness.append(records[j]['vintage']['wine']['taste']['structure']['sweetness'])
                except (KeyError, IndexError, TypeError):
                    wine_sweetness.append(np.nan)

                try:
                    wine_tannin.append(records[j]['vintage']['wine']['taste']['structure']['tannin'])
                except (KeyError, IndexError, TypeError):
                    wine_tannin.append(np.nan)

                try:
                    wine_region.append(records[j]['vintage']['wine']['region']['name'])
                except (KeyError, IndexError, TypeError):
                    wine_region.append(np.nan)

                try:
                    wine_country.append(records[j]['vintage']['wine']['region']['country']['name'])
                except (KeyError, IndexError, TypeError):
                    wine_country.append(np.nan)

                try:
                    wine_description.append(records[j]['vintage']['wine']['style']['description'])
                except (KeyError, IndexError, TypeError):
                    wine_description.append(np.nan)

            # Request the next page of results
            params['page'] += 1
            response = requests.get(url, params=params, headers=headers)
            response.raise_for_status()
            records = response.json()['explore_vintage']['records']

        params['price_range_min'] += price_increment
    
    return (wine_name, winery, wine_description, wine_price, wine_year, wine_rating, wine_region, wine_country, grape_information, wine_acidity, wine_intensity, wine_sweetness, wine_tannin)
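In `my_vivino_scraper`, `params['page']` keeps increasing across price steps and every price step always requests 250 pages. The sketch below shows an alternative loop structure that restarts at page 1 for each minimum price and stops a price band as soon as a page comes back empty. Whether that matches the intended behaviour is an assumption on my part. The names `collect_records` and `fake_fetch` are hypothetical, and the HTTP call is replaced by a stand-in so the example runs offline.

```python
from typing import Callable, Dict, List


def collect_records(fetch_page: Callable[[float, int], List[Dict]],
                    start_price: float,
                    price_increment: float,
                    n_price_steps: int,
                    max_pages: int) -> List[Dict]:
    """Collect records band by band, restarting at page 1 for each minimum price."""
    collected: List[Dict] = []
    price_min = start_price
    for _ in range(n_price_steps):
        for page in range(1, max_pages + 1):
            records = fetch_page(price_min, page)
            if not records:
                # An empty page means this price band is exhausted
                break
            collected.extend(records)
        price_min += price_increment
    return collected


def fake_fetch(price_min: float, page: int) -> List[Dict]:
    """Stand-in for the HTTP request: two pages of three rows per price band."""
    if page > 2:
        return []
    return [{"price_min": price_min, "page": page, "row": i} for i in range(3)]


rows = collect_records(fake_fetch, start_price=4.99, price_increment=10,
                       n_price_steps=2, max_pages=250)
print(len(rows))
```

In real use, `fetch_page` would wrap the `requests.get(url, params=params, headers=headers)` call from the function above and return the `records` list from the JSON response.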


 

# Red Wine 

In [8]:
# Get the wine data we are interested in and call our function (Red Wine - '1', Price Increment Per loop - 10)
wine_name, winery, wine_description, wine_price, wine_year, wine_rating, wine_region, wine_country,  grape_information, wine_acidity, wine_intensity, wine_sweetness, wine_tannin = my_vivino_scraper('1', 10)

# Store the variables in the dictionary format
my_dictionary = ({
    'wine name': wine_name,
    'winery': winery,
    'wine year': wine_year,
    'wine rating': wine_rating,
    'wine price': wine_price,
    'wine region': wine_region,
    'wine country': wine_country,
    'grape information': grape_information,
    'wine acidity': wine_acidity,
    'wine intensity': wine_intensity,
    'wine sweetness': wine_sweetness,
    'wine tannin': wine_tannin, 
    'wine description': wine_description
})

# Convert dictionary to Pandas DataFrame and delete all duplicates 
data = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in my_dictionary.items()]))
data.drop_duplicates(inplace= True)

# Write the results to a CSV file
data.to_csv('Red_wine.csv')

Unnamed: 0,wine name,winery,wine year,wine rating,wine price,wine region,wine country,grape information,wine acidity,wine intensity,wine sweetness,wine tannin,wine description
0,Côtes du Rhône,Montalcour,2014,3.4,5.00,Côtes-du-Rhône,France,Shiraz/Syrah,3.482611,3.792743,1.415443,3.364203,"The Southern Rhône is situated in a large, spr..."
1,Cabernet Sauvignon,Gallo Family Vineyards,2018,3.6,5.59,California,United States,Cabernet Sauvignon,2.842703,4.697890,1.850506,3.247603,"Known as the king of red wine grapes, Cabernet..."
2,Merlot,Gallo Family Vineyards,2015,3.5,5.59,California,United States,Merlot,1.620626,3.832823,2.327318,1.985121,"California Merlots tend to more fruit-forward,..."
3,Cabernet - Merlot,Yellow Tail,2016,3.4,5.95,South Eastern Australia,Australia,Cabernet Sauvignon,3.898815,4.941685,1.548283,3.418870,Australian Bordeaux blends are known for power...
4,Bin 40 Merlot,Lindeman's,2016,3.4,5.95,South Eastern Australia,Australia,Merlot,1.750228,3.885053,1.628029,2.053112,Although many most closely associate Shiraz wi...
...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,Pinotage,Beaumont,2017,4.0,36.99,Walker Bay,South Africa,Pinotage,3.116149,3.827951,1.476855,3.343851,"An incredibly interesting wine, Pinotage is a ..."
196,Purisima Mountain Vineyard Syrah,Beckmen,2017,4.1,36.99,Ballard Canyon,United States,Shiraz/Syrah,3.056360,4.460651,1.619383,3.620568,Californian Syrah certainly isn't a wine for t...
197,Ribera del Duero,Finca Villacreces,2016,3.9,36.99,Ribera del Duero,Spain,Tempranillo,3.706280,3.961251,1.581326,3.569809,"Rioja may be the most famous region in Spain, ..."
198,Les Fiefs de Lagrange Saint-Julien,Château Lagrange,2014,3.7,36.99,Saint-Julien,France,Cabernet Sauvignon,4.221845,3.868328,1.594276,3.893925,"Saint-Julien offers balanced, age-worthy blend..."
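The same dictionary, DataFrame and deduplication steps are repeated for every wine type, so they could be wrapped in a small helper. This is only a sketch: `lists_to_frame` and the toy data are hypothetical, and the `reset_index` call is an addition that the cells in this notebook do not make.

```python
import pandas as pd


def lists_to_frame(columns: dict) -> pd.DataFrame:
    """Build a DataFrame from lists (shorter lists are padded with NaN) and drop duplicate rows."""
    frame = pd.DataFrame({name: pd.Series(values) for name, values in columns.items()})
    return frame.drop_duplicates().reset_index(drop=True)


# Toy data standing in for the lists returned by the scraper
toy = {
    "wine name": ["Example Red", "Example Red", "Example White"],
    "wine price": [9.99, 9.99, 12.50],
}
frame = lists_to_frame(toy)
print(frame.shape)
# frame.to_csv("Example_wine.csv")  # hypothetical file name
```

Each wine-type cell would then reduce to one call to the scraper, one call to the helper and one `to_csv` call.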


# White Wine 

In [13]:
# Get the wine data we are interested in and call our function (White wine - '2', Price Increment Per loop - 10)
wine_name, winery, wine_description, wine_price, wine_year, wine_rating, wine_region, wine_country,  grape_information, wine_acidity, wine_intensity, wine_sweetness, wine_tannin = my_vivino_scraper('2', 10)

# Store the variables in the dictionary format
my_dictionary = ({
    'wine name': wine_name,
    'winery': winery,
    'wine year': wine_year,
    'wine rating': wine_rating,
    'wine price': wine_price,
    'wine region': wine_region,
    'wine country': wine_country,
    'grape information': grape_information,
    'wine acidity': wine_acidity,
    'wine intensity': wine_intensity,
    'wine sweetness': wine_sweetness,
    'wine tannin': wine_tannin, 
    'wine description': wine_description

})

# Convert dictionary to Pandas DataFrame and delete all duplicates 
data = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in my_dictionary.items()]))
data.drop_duplicates(inplace= True)
# Write the results to a CSV file
data.to_csv('White_wine.csv')
data

Unnamed: 0,wine name,winery,wine year,wine rating,wine price,wine region,wine country,grape information,wine acidity,wine intensity,wine sweetness,wine tannin,wine description
0,Moscato,Gallo Family Vineyards,2016,3.9,5.59,California,United States,,,,,,
1,Sauvignon Blanc,Gallo Family Vineyards,2017,3.3,5.59,California,United States,Sauvignon Blanc,,,,,California is known primarily for its Cabernet...
2,Bin 85 Pinot Grigio,Lindeman's,2019,3.5,5.95,South Eastern Australia,Australia,Pinot Gris,2.736555,2.982939,1.724204,,
3,Bin 65 Chardonnay,Lindeman's,2017,3.2,5.95,South Eastern Australia,Australia,Chardonnay,3.205055,3.460357,1.938636,,The main styles are unoaked Chardonnays in the...
4,Astica Sauvignon Blanc,Trapiche,2015,3.2,5.99,Mendoza,Argentina,Sauvignon Blanc,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
620,Pouilly-Fuissé Le Clos Monopole,Château Fuissé,2016,4.4,79.97,Pouilly-Fuissé,France,Chardonnay,2.869677,3.153957,1.883385,,
621,Camp Meeting Ridge Vineyard Chardonnay,Flowers,2013,4.4,79.97,Sonoma Coast,United States,Chardonnay,3.341709,4.615300,2.762171,,"With over 100,000 acres in production, Chardon..."
622,Chardonnay (Hyde Vineyard),HDV,2012,4.3,79.97,Los Carneros,United States,Chardonnay,2.898492,4.633958,2.407105,,
623,Chardonnay Four,Liquid Farm,2013,4.3,79.99,Sta. Rita Hills,United States,Chardonnay,3.464207,4.424609,2.454313,,"With over 100,000 acres in production, Chardon..."


# Sparkling Wine

In [None]:
# Get the wine data we are interested in and call our function (Sparkling wine - '3', Price Increment Per loop - 10)
wine_name, winery, wine_description, wine_price, wine_year, wine_rating, wine_region, wine_country,  grape_information, wine_acidity, wine_intensity, wine_sweetness, wine_tannin = my_vivino_scraper('3', 10)

# Store the variables in the dictionary format
my_dictionary = ({
    'wine name': wine_name,
    'winery': winery,
    'wine year': wine_year,
    'wine rating': wine_rating,
    'wine price': wine_price,
    'wine region': wine_region,
    'wine country': wine_country,
    'grape information': grape_information,
    'wine acidity': wine_acidity,
    'wine intensity': wine_intensity,
    'wine sweetness': wine_sweetness,
    'wine tannin': wine_tannin, 
    'wine description': wine_description
})

# Convert dictionary to Pandas DataFrame and delete all duplicates 
data = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in my_dictionary.items()]))
data.drop_duplicates(inplace= True)

# Write the results to a CSV file
data.to_csv('Sparkling_wine.csv')

# Rose Wine

In [None]:
# Get the wine data we are interested in and call our function (Rose Wine - '4', Price Increment Per loop - 15)
wine_name, winery, wine_description, wine_price, wine_year, wine_rating, wine_region, wine_country,  grape_information, wine_acidity, wine_intensity, wine_sweetness, wine_tannin = my_vivino_scraper('4', 15)

# Store the variables in the dictionary format
my_dictionary = ({
    'wine name': wine_name,
    'winery': winery,
    'wine year': wine_year,
    'wine rating': wine_rating,
    'wine price': wine_price,
    'wine region': wine_region,
    'wine country': wine_country,
    'grape information': grape_information,
    'wine acidity': wine_acidity,
    'wine intensity': wine_intensity,
    'wine sweetness': wine_sweetness,
    'wine tannin': wine_tannin, 
    'wine description': wine_description

})

# Convert dictionary to Pandas DataFrame and delete all duplicates 
data = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in my_dictionary.items()]))
data.drop_duplicates(inplace= True)
# Write the results to a CSV file
data.to_csv('Rose_wine.csv')
data

# Dessert Wine

In [None]:
# Get the wine data we are interested in and call our function (Dessert Wine - '7', Price Increment Per loop - 15)
wine_name, winery, wine_description, wine_price, wine_year, wine_rating, wine_region, wine_country,  grape_information, wine_acidity, wine_intensity, wine_sweetness, wine_tannin = my_vivino_scraper('7', 15)

# Store the variables in the dictionary format
my_dictionary = ({
    'wine name': wine_name,
    'winery': winery,
    'wine year': wine_year,
    'wine rating': wine_rating,
    'wine price': wine_price,
    'wine region': wine_region,
    'wine country': wine_country,
    'grape information': grape_information,
    'wine acidity': wine_acidity,
    'wine intensity': wine_intensity,
    'wine sweetness': wine_sweetness,
    'wine tannin': wine_tannin, 
    'wine description': wine_description
})

# Convert dictionary to Pandas DataFrame and delete all duplicates 
data = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in my_dictionary.items()]))
data.drop_duplicates(inplace= True)
# Write the results to a CSV file
data.to_csv('Dessert_wine.csv')
data

# Fortified Wine

In [1]:
# Get the wine data we are interested in and call our function (Fortified Wine - '24', Price Increment Per loop - 15)
wine_name, winery, wine_description, wine_price, wine_year, wine_rating, wine_region, wine_country,  grape_information, wine_acidity, wine_intensity, wine_sweetness, wine_tannin = my_vivino_scraper('24', 15)

# Store the variables in the dictionary format
my_dictionary = ({
    'wine name': wine_name,
    'winery': winery,
    'wine year': wine_year,
    'wine rating': wine_rating,
    'wine price': wine_price,
    'wine region': wine_region,
    'wine country': wine_country,
    'grape information': grape_information,
    'wine acidity': wine_acidity,
    'wine intensity': wine_intensity,
    'wine sweetness': wine_sweetness,
    'wine tannin': wine_tannin, 
    'wine description': wine_description

})

# Convert dictionary to Pandas DataFrame and delete all duplicates 
data = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in my_dictionary.items()]))
data.drop_duplicates(inplace= True)
# Write the results to a CSV file
data.to_csv('Fortified_wine.csv')
data

# Combining All Wine Data into one DataFrame 

In [3]:
red_wine_df = pd.read_csv('Red_wine.csv',index_col = [0])

red_wine_df.drop_duplicates(inplace = True)
red_wine_df['wine type'] = 'Red'
red_wine_df = red_wine_df[['wine name', 'winery', 'wine year', 'wine rating', 'wine price', 'wine type', 
                           'wine region', 'wine country', 'grape information', 'wine acidity', 'wine intensity',
                          'wine sweetness', 'wine tannin',  'wine description']]
print(red_wine_df.info())
red_wine_df


<class 'pandas.core.frame.DataFrame'>
Int64Index: 10305 entries, 0 to 22749
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   wine name          10305 non-null  object 
 1   winery             10305 non-null  object 
 2   wine year          10305 non-null  object 
 3   wine rating        10305 non-null  float64
 4   wine price         10305 non-null  float64
 5   wine type          10305 non-null  object 
 6   wine region        10296 non-null  object 
 7   wine country       10296 non-null  object 
 8   grape information  9911 non-null   object 
 9   wine acidity       5576 non-null   float64
 10  wine intensity     9801 non-null   float64
 11  wine sweetness     9801 non-null   float64
 12  wine tannin        9801 non-null   float64
 13  wine description   6864 non-null   object 
dtypes: float64(6), object(8)
memory usage: 1.2+ MB
None


Unnamed: 0,wine name,winery,wine year,wine rating,wine price,wine type,wine region,wine country,grape information,wine acidity,wine intensity,wine sweetness,wine tannin,wine description
0,Cabernet Sauvignon,Carta Vieja,2019,3.4,4.99,Red,Loncomilla Valley,Chile,Cabernet Sauvignon,3.043747,3.781781,1.772746,3.164342,Cabernet Sauvignon is the most widely grown gr...
1,Merlot,Carta Vieja,2019,3.4,4.99,Red,Loncomilla Valley,Chile,Merlot,2.020424,3.482722,1.949938,2.366192,Merlot is a staple of the wine producing regio...
2,Cabernet Sauvignon,Three Wishes,N.V.,3.1,4.99,Red,California,United States,Cabernet Sauvignon,3.206160,4.545627,1.962006,3.569953,"Known as the king of red wine grapes, Cabernet..."
3,Cabernet Sauvignon,Crane Lake,2016,3.4,4.99,Red,California,United States,Cabernet Sauvignon,3.014199,4.738935,1.743180,3.540428,"Known as the king of red wine grapes, Cabernet..."
4,Pinot Noir,Crane Lake,2016,3.4,4.99,Red,California,United States,Pinot Noir,3.405433,2.832203,1.500866,2.147871,Pinot Noir has the well deserved reputation of...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22583,Estate Red,Arkenstone,2016,4.6,169.99,Red,Howell Mountain,United States,Cabernet Sauvignon,,4.848333,1.185455,4.116818,
22586,Cornas 'Reynard',Thierry Allemand,2012,4.5,375.00,Red,Cornas,France,Shiraz/Syrah,,4.542016,1.535436,3.488779,
22598,Beckstoffer Missouri Hopper Vineyard Cabernet ...,The Debate,2012,4.6,195.00,Red,Oakville,United States,Cabernet Sauvignon,,4.589830,1.992003,3.406146,
22599,Insignia,Joseph Phelps,2007,4.7,279.99,Red,Napa Valley,United States,Cabernet Sauvignon,,4.777280,1.732723,3.498302,


In [4]:
white_wine_df = pd.read_csv('White_wine.csv',index_col = [0] )

white_wine_df.drop_duplicates(inplace = True)

white_wine_df['wine type'] = 'White'
white_wine_df = white_wine_df[['wine name', 'winery', 'wine year', 'wine rating', 'wine price', 'wine type',  
                           'wine region', 'wine country', 'grape information', 'wine acidity', 'wine intensity',
                          'wine sweetness', 'wine tannin',  'wine description']]
# white_wine_df.info()
white_wine_df

Unnamed: 0,wine name,winery,wine year,wine rating,wine price,wine type,wine region,wine country,grape information,wine acidity,wine intensity,wine sweetness,wine tannin,wine description
0,Bin 85 Pinot Grigio,Lindeman's,2019,3.5,5.95,White,South Eastern Australia,Australia,Pinot Gris,2.737233,2.982922,1.724406,,
1,Bin 65 Chardonnay,Lindeman's,2017,3.2,5.95,White,South Eastern Australia,Australia,Chardonnay,3.204798,3.460279,1.938847,,The main styles are unoaked Chardonnays in the...
2,Sauvignon Blanc,Woodbridge by Robert Mondavi,2017,3.3,5.99,White,California,United States,Sauvignon Blanc,4.063097,2.953770,1.419617,,California is known primarily for its Cabernet...
3,Grillo,Tola,2018,3.6,5.99,White,Terre Siciliane,Italy,Malvasia,2.979659,3.463870,1.172446,,Italy is responsible for some of the finest wi...
4,Coastal Estates Chardonnay,Beaulieu Vineyard (BV),2017,3.5,6.95,White,California,United States,Chardonnay,2.986839,4.776081,2.910311,,"With over 100,000 acres in production, Chardon..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7719,Puligny-Montrachet 1er Cru 'Champ Canet',Etienne Sauzet,2013,4.3,175.00,White,Puligny-Montrachet 1er Cru 'Champ Canet',France,Chardonnay,3.285686,3.566332,1.772265,,
7720,Clos Sainte Hune Riesling Alsace,Trimbach,1995,4.5,395.00,White,Alsace,France,Riesling,4.441244,3.116488,2.384468,,"Seldom oaked, Alsatian Riesling are typically ..."
7721,Blanc de Valandraud No. 1 Bordeaux Blanc,Château Valandraud,2014,4.0,86.50,White,Bordeaux,France,Sauvignon Blanc,2.943649,3.812263,1.867314,,Bordeaux is the largest wine producing region ...
7722,Chassagne-Montrachet,Bruno Colin,2017,4.2,79.99,White,Chassagne-Montrachet,France,Chardonnay,3.299258,3.938477,1.587363,,


In [5]:
sparkling_wine_df = pd.read_csv('Sparkling_wine.csv',index_col = [0])

sparkling_wine_df.drop_duplicates(inplace = True)

sparkling_wine_df['wine type'] = 'Sparkling'
sparkling_wine_df = sparkling_wine_df[['wine name', 'winery', 'wine year', 'wine rating', 'wine price', 'wine type',
                           'wine region', 'wine country', 'grape information', 'wine acidity', 'wine intensity',
                          'wine sweetness', 'wine tannin',  'wine description']]

print(sparkling_wine_df.info())

sparkling_wine_df



<class 'pandas.core.frame.DataFrame'>
Int64Index: 2230 entries, 0 to 1095
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   wine name          2230 non-null   object 
 1   winery             2230 non-null   object 
 2   wine year          2230 non-null   object 
 3   wine rating        2230 non-null   float64
 4   wine price         2230 non-null   float64
 5   wine type          2230 non-null   object 
 6   wine region        2228 non-null   object 
 7   wine country       2228 non-null   object 
 8   grape information  1749 non-null   object 
 9   wine acidity       1731 non-null   float64
 10  wine intensity     1731 non-null   float64
 11  wine sweetness     1 non-null      float64
 12  wine tannin        0 non-null      float64
 13  wine description   1571 non-null   object 
dtypes: float64(6), object(8)
memory usage: 261.3+ KB
None


Unnamed: 0,wine name,winery,wine year,wine rating,wine price,wine type,wine region,wine country,grape information,wine acidity,wine intensity,wine sweetness,wine tannin,wine description
0,Governor's Cuvée Stanford Brut Champagne,Weibel Family,N.V.,3.0,5.99,Sparkling,California,United States,Chardonnay,2.923922,3.703530,,,The elegance of sparkling wine suits Northern ...
1,Cava Gran Brut Rosé,Campo Viejo,N.V.,3.6,7.95,Sparkling,Cava,Spain,Xarel-lo,3.640720,2.933004,,,
2,Verdi Raspberry Sparkletini Spumante,Bosca,N.V.,3.6,7.95,Sparkling,,,,,,,,
3,Verdi Peach Sparkletini,Bosca,N.V.,3.7,7.95,Sparkling,Asti,Italy,,,,,,
4,Cava Gran Brut Reserva,Campo Viejo,N.V.,3.6,7.95,Sparkling,Cava,Spain,Xarel-lo,3.493387,2.953652,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1091,Brut Champagne,Dom Pérignon,1999,4.6,199.99,Sparkling,Champagne,France,Chardonnay,4.441510,3.995037,,,While there are many sparkling wine regions ar...
1092,Brut Champagne,Dom Pérignon,1992,4.5,450.00,Sparkling,Champagne,France,Chardonnay,4.441510,3.995037,,,While there are many sparkling wine regions ar...
1093,Millésime Champagne Grand Cru 'Verzenay',Hugues Godmé,2009,4.1,74.99,Sparkling,Champagne Grand Cru 'Verzenay',France,Chardonnay,4.487732,4.007870,,,While there are many sparkling wine regions ar...
1094,Brut Vintage Champagne (Extra Cuvée de Réserve),Pol Roger,2009,4.3,78.66,Sparkling,Champagne,France,Chardonnay,4.550522,4.015999,,,While there are many sparkling wine regions ar...


In [6]:
rose_wine_df =  pd.read_csv('Rose_wine.csv',index_col = [0])

rose_wine_df.drop_duplicates(inplace = True)

rose_wine_df['wine type'] = 'Rose'
rose_wine_df = rose_wine_df[['wine name', 'winery', 'wine year', 'wine rating', 'wine price', 'wine type',  
                           'wine region', 'wine country', 'grape information', 'wine acidity', 'wine intensity',
                          'wine sweetness', 'wine tannin',  'wine description']]

rose_wine_df

Unnamed: 0,wine name,winery,wine year,wine rating,wine price,wine type,wine region,wine country,grape information,wine acidity,wine intensity,wine sweetness,wine tannin,wine description
0,Zinfandel White,Canyon Road,2018,3.4,10.00,Rose,California,United States,,,,,,
1,Rosé,Ménage à Trois,2018,3.5,10.49,Rose,California,United States,,,,,,
2,Rosé,Hess Select,2019,3.8,10.95,Rose,California,United States,,,,,,
3,Rosé of Malbec,Crios,2019,3.7,10.95,Rose,Lujan de Cuyo,Argentina,,,,,,
4,Rosé,Chateau Ste. Michelle,2018,3.6,10.95,Rose,Yakima Valley,United States,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1266,Private Bin Rosé,Villa Maria,2016,3.6,11.98,Rose,Marlborough,New Zealand,,,,,,
1272,Inspiration Rosé,Château de Berne,2017,4.0,19.99,Rose,Côtes de Provence,France,Shiraz/Syrah,3.929884,2.498264,1.412894,,No summer afternoon is complete without a litt...
1466,Bordeaux Rosé,Château Auguste,2018,4.1,16.79,Rose,Bordeaux,France,,,,,,
1470,Rosé,Massaya,2018,3.7,23.00,Rose,Bekaa Valley,Lebanon,,,,,,


In [7]:
dessert_wine_df = pd.read_csv('Dessert_wine.csv',index_col = [0])

dessert_wine_df.drop_duplicates(inplace = True)

dessert_wine_df['wine type'] = 'Dessert'

dessert_wine_df = dessert_wine_df[['wine name', 'winery', 'wine year', 'wine rating', 'wine price', 'wine type',  
                           'wine region', 'wine country', 'grape information', 'wine acidity', 'wine intensity',
                          'wine sweetness', 'wine tannin',  'wine description']]
print(dessert_wine_df.info())

dessert_wine_df

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1022 entries, 0 to 726
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   wine name          1022 non-null   object 
 1   winery             1022 non-null   object 
 2   wine year          1022 non-null   object 
 3   wine rating        1022 non-null   float64
 4   wine price         1022 non-null   float64
 5   wine type          1022 non-null   object 
 6   wine region        1022 non-null   object 
 7   wine country       1022 non-null   object 
 8   grape information  616 non-null    object 
 9   wine acidity       559 non-null    float64
 10  wine intensity     559 non-null    float64
 11  wine sweetness     558 non-null    float64
 12  wine tannin        10 non-null     float64
 13  wine description   164 non-null    object 
dtypes: float64(6), object(8)
memory usage: 119.8+ KB
None


Unnamed: 0,wine name,winery,wine year,wine rating,wine price,wine type,wine region,wine country,grape information,wine acidity,wine intensity,wine sweetness,wine tannin,wine description
0,Fruitscato Watermelon,Barefoot,N.V.,4.0,5.99,Dessert,California,United States,,,,,,
1,Sweet Walter Red,Bully Hill,N.V.,3.7,8.99,Dessert,New York,United States,,,,,,
2,Sweet Walter White,Bully Hill,N.V.,4.0,8.99,Dessert,New York,United States,,,,,,
3,Harvest Select Sweet Riesling,Chateau Ste. Michelle,2017,4.0,9.95,Dessert,Columbia Valley,United States,Riesling,2.899289,2.511005,3.641250,,
4,Harvest Select Sweet Riesling,Chateau Ste. Michelle,2018,3.9,9.99,Dessert,Columbia Valley,United States,Riesling,2.899289,2.511005,3.641250,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
489,Sauternes,Château Raymond-Lafon,2002,4.1,42.89,Dessert,Sauternes,France,Sauvignon Blanc,4.369525,4.699751,4.912559,,
695,Vouvray Le Mont Moelleux,Domaine Huet,2017,4.3,39.98,Dessert,Vouvray,France,,,,,,
724,La Chapelle de Lafaurie-Peyraguey Sauternes,Château Lafaurie-Peyraguey,2015,4.1,44.99,Dessert,Sauternes,France,Sauvignon Blanc,4.300760,4.520865,4.919000,,
725,Alvaréga Malvasia di Bosa,G. Battista Columbu,2014,3.8,33.71,Dessert,Malvasia di Bosa,Italy,,,,,,


In [8]:
fortified_wine_df = pd.read_csv('Fortified_wine.csv',index_col = [0])

fortified_wine_df.drop_duplicates(inplace = True)

fortified_wine_df['wine type'] = 'Fortified'
fortified_wine_df = fortified_wine_df[['wine name', 'winery', 'wine year', 'wine rating', 'wine price', 'wine type',  
                           'wine region', 'wine country', 'grape information', 'wine acidity', 'wine intensity',
                          'wine sweetness', 'wine tannin',  'wine description']]

print(fortified_wine_df.info())

fortified_wine_df

<class 'pandas.core.frame.DataFrame'>
Int64Index: 811 entries, 0 to 813
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   wine name          811 non-null    object 
 1   winery             811 non-null    object 
 2   wine year          811 non-null    object 
 3   wine rating        811 non-null    float64
 4   wine price         811 non-null    float64
 5   wine type          811 non-null    object 
 6   wine region        809 non-null    object 
 7   wine country       809 non-null    object 
 8   grape information  766 non-null    object 
 9   wine acidity       87 non-null     float64
 10  wine intensity     87 non-null     float64
 11  wine sweetness     87 non-null     float64
 12  wine tannin        3 non-null      float64
 13  wine description   8 non-null      object 
dtypes: float64(6), object(8)
memory usage: 95.0+ KB
None


Unnamed: 0,wine name,winery,wine year,wine rating,wine price,wine type,wine region,wine country,grape information,wine acidity,wine intensity,wine sweetness,wine tannin,wine description
0,Colheita Port (Single Year Tawny),Smith Woodhouse,2000,4.2,55.99,Fortified,Porto,Portugal,Touriga Nacional,,,,,
1,Colheita Malmsey Madeira (Single Harvest),Blandy's,1999,4.2,57.20,Fortified,Malmsey Madeira,Portugal,Verdelho,3.959141,4.638202,4.667680,,
2,Quinta do Panascal Vintage Port,Fonseca,2005,4.2,57.99,Fortified,Porto,Portugal,Touriga Nacional,,,,,
3,Vintage Port,Quinta do Vesuvio,2012,4.2,59.95,Fortified,Porto,Portugal,Touriga Nacional,,,,,
4,Vintage Port,Kopke,2005,4.3,59.99,Fortified,Porto,Portugal,Touriga Nacional,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
807,10 Years Old White Port,Quinta de Santa Eufémia,N.V.,4.1,29.99,Fortified,Porto,Portugal,Gouveio,,,,,
809,Late Bottled Vintage Port,Quinta do Crasto,2015,4.2,27.48,Fortified,Porto,Portugal,Touriga Nacional,,,,,
810,Pedro Ximenez 1827 Jerez-Xeres-Sherry,Osborne,N.V.,3.9,17.59,Fortified,Pedro Ximénez Sherry (PX),Spain,Pedro Ximenez,2.871021,4.823324,4.946997,,
811,Lagrima Fine White Port,Krohn,N.V.,3.8,12.52,Fortified,Porto,Portugal,Gouveio,,,,,


In [9]:
wine_df = pd.concat([red_wine_df, white_wine_df, sparkling_wine_df, rose_wine_df, dessert_wine_df, fortified_wine_df])
wine_df.drop_duplicates(inplace = True )

wine_df['wine price'].apply(type).value_counts()

wine_df = wine_df.reset_index(drop = True)
wine_df.to_csv('wine_df.csv')
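The six per-type cells above follow the same pattern (drop duplicates, add a `wine type` column, reorder columns) before the final `pd.concat`. A compact sketch of that pattern as a loop is given below; `tag_and_combine`, `COLUMN_ORDER` and the toy frames are hypothetical, and the column list is shortened for illustration.

```python
import pandas as pd

# Shortened column list for illustration; the notebook uses 14 columns
COLUMN_ORDER = ["wine name", "wine price", "wine type"]


def tag_and_combine(frames_by_type: dict) -> pd.DataFrame:
    """Label each frame with its wine type, then concatenate and deduplicate."""
    tagged = []
    for wine_type, frame in frames_by_type.items():
        frame = frame.drop_duplicates().copy()
        frame["wine type"] = wine_type
        tagged.append(frame[COLUMN_ORDER])
    combined = pd.concat(tagged, ignore_index=True)
    return combined.drop_duplicates().reset_index(drop=True)


# Toy frames standing in for the CSV files read above
frames = {
    "Red": pd.DataFrame({"wine name": ["A", "A"], "wine price": [5.0, 5.0]}),
    "White": pd.DataFrame({"wine name": ["B"], "wine price": [7.5]}),
}
wine_df_sketch = tag_and_combine(frames)
print(wine_df_sketch)
```

In real use the dictionary values would come from calls such as `pd.read_csv('Red_wine.csv', index_col=[0])`.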

In [10]:
wine_df

Unnamed: 0,wine name,winery,wine year,wine rating,wine price,wine type,wine region,wine country,grape information,wine acidity,wine intensity,wine sweetness,wine tannin,wine description
0,Cabernet Sauvignon,Carta Vieja,2019,3.4,4.99,Red,Loncomilla Valley,Chile,Cabernet Sauvignon,3.043747,3.781781,1.772746,3.164342,Cabernet Sauvignon is the most widely grown gr...
1,Merlot,Carta Vieja,2019,3.4,4.99,Red,Loncomilla Valley,Chile,Merlot,2.020424,3.482722,1.949938,2.366192,Merlot is a staple of the wine producing regio...
2,Cabernet Sauvignon,Three Wishes,N.V.,3.1,4.99,Red,California,United States,Cabernet Sauvignon,3.206160,4.545627,1.962006,3.569953,"Known as the king of red wine grapes, Cabernet..."
3,Cabernet Sauvignon,Crane Lake,2016,3.4,4.99,Red,California,United States,Cabernet Sauvignon,3.014199,4.738935,1.743180,3.540428,"Known as the king of red wine grapes, Cabernet..."
4,Pinot Noir,Crane Lake,2016,3.4,4.99,Red,California,United States,Pinot Noir,3.405433,2.832203,1.500866,2.147871,Pinot Noir has the well deserved reputation of...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21976,10 Years Old White Port,Quinta de Santa Eufémia,N.V.,4.1,29.99,Fortified,Porto,Portugal,Gouveio,,,,,
21977,Late Bottled Vintage Port,Quinta do Crasto,2015,4.2,27.48,Fortified,Porto,Portugal,Touriga Nacional,,,,,
21978,Pedro Ximenez 1827 Jerez-Xeres-Sherry,Osborne,N.V.,3.9,17.59,Fortified,Pedro Ximénez Sherry (PX),Spain,Pedro Ximenez,2.871021,4.823324,4.946997,,
21979,Lagrima Fine White Port,Krohn,N.V.,3.8,12.52,Fortified,Porto,Portugal,Gouveio,,,,,


## End of Data Collection Notebook:
We now have 21,981 unique wines collected from the Vivino REST API. More than 10,000 are red wines, around 6,000 are white wines, and the rest are spread across sparkling, rose, dessert and fortified wines. The next notebook contains my exploratory data analysis of the collected wine data.
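The per-type breakdown quoted above can be checked directly from the combined DataFrame with `value_counts`. The sketch below uses a toy frame with made-up rows; on the real data the same call would be applied to `wine_df['wine type']`.

```python
import pandas as pd

# Toy frame with made-up rows; the real data lives in wine_df
toy_wine_df = pd.DataFrame({"wine type": ["Red", "Red", "White", "Sparkling"]})

# Number of wines per type, largest group first
counts = toy_wine_df["wine type"].value_counts()
print(counts)
```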