## Vivino Webscraper 

#### Introduction: 
Vivino is a popular website for wine enthusiasts, built around a database of wines from around the world. This project is a proof of concept for scraping data from Vivino in order to better understand the wine market in Spain. It has some limitations. First, the list of scraped wines is not a complete view of Spanish wine: wine culture in Spain has traditionally centered on small production and barrel sales, so many small producers are not included in Vivino's database and are therefore not represented in the data. Second, the results include only wines currently available for purchase. Lastly, because Vivino is a consumer database, its ratings may not reflect professional opinion, and the 5-star rating system limits how much nuance they can carry.

In [2]:
import requests
import json
import pandas as pd
import time


In [30]:

# Request headers
# Only the User-Agent key needs to be set
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
}

# Query-string parameters sent with every request
# These are the only filters this search needs
payload = {
        "country_codes[]": ["es"],  # "FR", "IT", "DE", "CL", "PT", "AU", "AT", "AR", "US" <-- can add more country codes here
        "currency_code": "EUR",
        "grape_filter": "varietal",
        "min_rating": "1",
        "order_by": "price",
        "order": "asc",
        "page": 1,
        "price_range_max": "19.99",
        "price_range_min": "15",
        "wine_type_ids[]": "1",
}

# Perform an initial request to find out how many results match the filters
r = requests.get('https://www.vivino.com/api/explore/explore',
                 params=payload, headers=headers)
n_matches = r.json()['explore_vintage']['records_matched']
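
To see how the `payload` dictionary turns into a query string, the sketch below encodes a trimmed copy of it with the standard library's `urlencode`. It is an illustration only: `requests` builds the URL itself when given `params`, and `doseq=True` is used here to mirror how list values become one repeated key per item. The second country code is added purely to show that behaviour.

```python
from urllib.parse import urlencode

# Trimmed copy of the payload from the cell above
sample_payload = {
    "country_codes[]": ["es", "fr"],  # "fr" is added only to show list handling
    "currency_code": "EUR",
    "page": 1,
}

# doseq=True expands a list value into one key=value pair per item
query = urlencode(sample_payload, doseq=True)
print(query)
# country_codes%5B%5D=es&country_codes%5B%5D=fr&currency_code=EUR&page=1
```

The square brackets in the key are percent-encoded as `%5B%5D`, which is why the keys look different in the final URL than in the dictionary.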


In [31]:
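# Create an empty DataFrame with the output columns (Spanish labels:
# winery, year, wine ID, wine, rating, number of reviews, price, region)
column_names = ["Bodega", "Año", "Vino_ID", "Vino", "Rating", "Nº_reseñas", "Precio", "Región"]
df = pd.DataFrame(columns=column_names)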
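
The next cell walks through the results 25 at a time, so the number of pages has to be derived from `n_matches`. A minimal sketch of that arithmetic follows, assuming the page size really is 25 as the notebook states. `page_count` is a hypothetical helper, not part of the notebook or of any library.

```python
def page_count(n_matches: int, per_page: int = 25) -> int:
    """Return how many pages are needed to cover n_matches results."""
    # Ceiling division: a partially filled last page still counts as a page
    return (n_matches + per_page - 1) // per_page


for n in (0, 25, 26, 840):
    print(n, page_count(n))
# 0 0
# 25 1
# 26 2
# 840 34
```

Plain `int(n_matches / 25)` rounds down instead, which would skip the last page whenever the total is not a multiple of 25.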


In [32]:
# Iterate over every page of results
# Each page holds 25 results, so round the page count up
# to include a final, partially filled page
for i in range((n_matches + 24) // 25):
    # Adds the page on the payload
    payload['page'] = i + 1

    print(f'Requesting data from page: {payload["page"]}')

    # Request this page of results
    r = requests.get('https://www.vivino.com/api/explore/explore',
                     params=payload, headers=headers)
    # Flatten each match into one row, in the same order as column_names
    results = [
        (
            t["vintage"]["wine"]["winery"]["name"],
            t["vintage"]["year"],
            t["vintage"]["wine"]["id"],
            f'{t["vintage"]["wine"]["name"]} {t["vintage"]["year"]}',
            t["vintage"]["statistics"]["ratings_average"],
            t["vintage"]["statistics"]["ratings_count"],
            t["prices"][0]["amount"],
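            t["vintage"]["wine"]["region"]["name"],
        )
        for t in r.json()["explore_vintage"]["matches"]
    ]

    # Add this page's rows to the running DataFrame
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    df2 = pd.DataFrame(results, columns=column_names)
    df = pd.concat([df, df2])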

Requesting data from page: 1
Requesting data from page: 2
Requesting data from page: 3
Requesting data from page: 4
Requesting data from page: 5
Requesting data from page: 6
Requesting data from page: 7
Requesting data from page: 8
Requesting data from page: 9
Requesting data from page: 10
Requesting data from page: 11
Requesting data from page: 12
Requesting data from page: 13
Requesting data from page: 14
Requesting data from page: 15
Requesting data from page: 16
Requesting data from page: 17
Requesting data from page: 18
Requesting data from page: 19
Requesting data from page: 20
Requesting data from page: 21
Requesting data from page: 22
Requesting data from page: 23
Requesting data from page: 24
Requesting data from page: 25
Requesting data from page: 26
Requesting data from page: 27
Requesting data from page: 28
Requesting data from page: 29
Requesting data from page: 30
Requesting data from page: 31
Requesting data from page: 32
Requesting data from page: 33
Requesting data fro
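
The list comprehension in the loop indexes straight into each match, so a single record without a price or a region would stop the whole run. The sketch below shows one defensive alternative. `flatten_match` is a hypothetical helper, and whether the API ever omits these fields is an assumption; the sample record is hand-built from the first row of the output shown further down, not an actual API response.

```python
def flatten_match(match: dict) -> tuple:
    """Turn one match into a row, using None where a field is missing."""
    vintage = match.get("vintage") or {}
    wine = vintage.get("wine") or {}
    winery = wine.get("winery") or {}
    region = wine.get("region") or {}
    stats = vintage.get("statistics") or {}
    prices = match.get("prices") or []
    return (
        winery.get("name"),
        vintage.get("year"),
        wine.get("id"),
        f'{wine.get("name")} {vintage.get("year")}',
        stats.get("ratings_average"),
        stats.get("ratings_count"),
        prices[0].get("amount") if prices else None,
        region.get("name"),
    )


# Hand-built sample mirroring the keys the notebook reads
sample = {
    "vintage": {
        "year": 2018,
        "wine": {
            "id": 9220215,
            "name": "La Vanidosa No. 1 Garnacha Tinta",
            "winery": {"name": "D. Mateos"},
            "region": {"name": "Rioja"},
        },
        "statistics": {"ratings_average": 4.1, "ratings_count": 85},
    },
    "prices": [{"amount": 15.0}],
}

row = flatten_match(sample)
print(row)

# The same record with no price and no region still yields a row
incomplete = {"vintage": dict(sample["vintage"]), "prices": []}
incomplete["vintage"]["wine"] = dict(sample["vintage"]["wine"], region=None)
partial_row = flatten_match(incomplete)
print(partial_row)
```

Inside the loop, this would replace the tuple in the comprehension with `flatten_match(t)`. Missing values then show up as `None` in the DataFrame rather than raising `IndexError` or `TypeError`.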

In [33]:
df

Unnamed: 0,Bodega,Año,Vino_ID,Vino,Rating,Nº_reseñas,Precio,Región
0,D. Mateos,2018,9220215,La Vanidosa No. 1 Garnacha Tinta 2018,4.1,85,15.0000,Rioja
1,Raúl Pérez,2019,1216033,Ultreia Mencía 2019,4.1,31,15.0000,Bierzo
2,Forlong,2018,2006804,Assemblage 2018,3.8,72,15.0000,Cádiz
3,LaOsa,2015,5610890,Trasto Prieto Picudo 2015,3.7,27,15.0000,Castilla y León
4,Herència Altés,2018,1224551,L'Estel 2018,3.8,41,15.0000,Terra Alta
...,...,...,...,...,...,...,...,...
8,Nassos,2018,2471944,Tinto 2018,3.9,95,18.3636,Priorato
9,Fedellos do Couto,2019,3254559,Cortezada 2019,3.9,143,15.2500,Ribeira Sacra
10,Sierra Salinas,2012,1190555,Mira 2012,3.8,63,16.6600,Alicante
11,Baron de Ley,2016,1182136,Finca Monasterio Rioja 2016,4.1,11462,15.9500,Rioja
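
The row labels in the output above end at 11 rather than at a running total. That is consistent with each page's frame keeping its own 0-based index when the frames are stacked. The sketch below reproduces the effect with two tiny placeholder frames (the wine names `A` to `D` are made up) and shows how `ignore_index=True` renumbers the rows.

```python
import pandas as pd

# Two placeholder "pages" of results
page1 = pd.DataFrame({"Vino": ["A", "B"], "Precio": [15.0, 15.5]})
page2 = pd.DataFrame({"Vino": ["C", "D"], "Precio": [16.0, 16.5]})

# Default behaviour: each frame keeps its own labels, so they repeat
stacked = pd.concat([page1, page2])
print(list(stacked.index))  # [0, 1, 0, 1]

# ignore_index=True assigns a fresh 0..n-1 index to the result
renumbered = pd.concat([page1, page2], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]
```

Because the notebook writes the CSV with `index=False`, the repeated labels do not reach the file. They only matter when selecting rows by label, for example with `df.loc[0]`.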


In [34]:
df.to_csv("tintos15_20.csv", index=False)