# Individual Assignment: Wine!

There are 10 questions in this assignment. Some questions unlock others. If you can't answer a question, you can skip it and come back to it later, and if the question is locked, I'll provide a data sample for you to use, **at a cost of 0.5 points.**

The wine dataset is available in the file `wine.json`. This data contains information about wine reviews. It's a list of dictionaries, where each dictionary represents a wine review. The keys in the dictionary are:

* `points`: how many points the taster gave the wine on a scale of 1-100
* `title`: the title of the wine
* `description`: a description of the wine
* `taster_name`: the name of the taster
* `taster_twitter_handle`: the twitter handle of the taster
* `price`: the cost for a bottle of the wine
* `designation`: the vineyard within the winery where the grapes that made the wine are from
* `variety`: the type of grapes used to make the wine
* `region_1`: the wine-growing area within a province or state (e.g. Napa Valley)
* `region_2`: a more specific region within a wine growing area
* `province`: the province or state that the wine is from
* `country`: the country that the wine is from
* `winery`: the winery that made the wine



### Rules:
* For each question, print the answer in the cell below the question. You can use `print()` or just type the variable name.
* You can use any resources you like --including the internet, your notes, and the past notebooks-- except ChatGPT/Copilot/other AI writing tools. Using AI-based tools or asking other people for help will result in a 0 for the assignment, an immediate Fail in the course, and a report to the Dean of Students.
* You have 80 minutes to complete the assignment.
* You can submit the assignment as many times as you like; only the last submission will be graded.
* You can't work with other people on the assignment.

### 0. Load the data as a Python object and print the first item

In [1]:
import json
with open('wine-data-set.json', encoding='utf-8') as json_file:
    json_data = json.load(json_file)

json_data[0]

{'points': 89,
 'title': 'Caymus 1998 Cabernet Sauvignon (Napa Valley)',
 'description': 'Creamy black cherry aromas layered with fresh brussel sprouts and spicy arugula flavors of red plums and toasted oak.',
 'taster_name': None,
 'taster_twitter_handle': None,
 'price': 70,
 'designation': None,
 'variety': 'Cabernet Sauvignon',
 'region_1': 'Napa Valley',
 'region_2': 'Napa',
 'province': 'California',
 'country': 'US',
 'winery': 'Caymus'}

### 1. How many wine reviews are included in the dataset? (1 point)

In [2]:
len(json_data)

10000

### 2. Add a new {key:value} pair in each item in the list (1 point)

The new key should be called *length* and it should indicate the number of words in the *description* value.

For example, the following description:
* "Very strong taste like apple and cinnamon"

should have a *length* value of **7** 
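
As a quick check of the example above, here is a minimal sketch of the word count on a single, hypothetical review dictionary (`sample_review` is not part of the dataset). It assumes that "words" means whitespace-separated tokens.

```python
# Hypothetical sample review; only the 'description' key matters here.
sample_review = {'description': 'Very strong taste like apple and cinnamon'}

# str.split() with no arguments splits on any run of whitespace,
# so repeated spaces do not produce empty "words".
sample_review['length'] = len(sample_review['description'].split())

print(sample_review['length'])  # 7
```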

In [4]:
for review in json_data:
    length = len(review['description'].split())
    review['length'] = length 

json_data[:2]

[{'points': 89,
  'title': 'Caymus 1998 Cabernet Sauvignon (Napa Valley)',
  'description': 'Creamy black cherry aromas layered with fresh brussel sprouts and spicy arugula flavors of red plums and toasted oak.',
  'taster_name': None,
  'taster_twitter_handle': None,
  'price': 70,
  'designation': None,
  'variety': 'Cabernet Sauvignon',
  'region_1': 'Napa Valley',
  'region_2': 'Napa',
  'province': 'California',
  'country': 'US',
  'winery': 'Caymus',
  'length': 19},
 {'points': 98,
  'title': 'M. Chapoutier 1999 Le Méal Ermitage  (Hermitage)',
  'description': "Chapoutier's selections of the best parcels of vines in Hermitage are set to become legendary. Sold under the ancient spelling of the appellation name (Ermitage), they represent the epitome of the power and concentration that lies behind the reputation of the appellation. This cuvée is the best of the collection, with its brooding, opaque character, suggesting rather than revealing power at this stage. Age it until your 


### 3. How many different countries have their wines reviewed in the dataset? (1 point)

In [27]:
reviewed = {review['country'] for review in json_data if review['points'] is not None and review['country'] is not None}
total = {review['country'] for review in json_data}  # every distinct country value, including None

print(f'Countries Reviewed: {len(reviewed)}')
print(f'Distinct country values (including None): {len(total)}')

Countries Reviewed: 33
Distinct country values (including None): 34


33

### 4. Build a dictionary with the following structure: (1 point)

{country: number of wines reviewed coming from that country}

In [33]:
distinct_countries = {}

for review in json_data:
    country = review['country']
    if country in distinct_countries:
        distinct_countries[country] += 1
    else:
        distinct_countries[country] = 1

print(distinct_countries)

{'US': 4317, 'France': 1716, 'Chile': 362, 'Italy': 1432, 'Germany': 155, 'New Zealand': 100, 'South Africa': 111, 'Argentina': 319, 'Spain': 462, 'Portugal': 396, 'Austria': 287, 'Greece': 32, 'Australia': 165, 'Canada': 17, None: 5, 'England': 11, 'Hungary': 13, 'Georgia': 7, 'Mexico': 10, 'Israel': 32, 'Bulgaria': 13, 'Brazil': 6, 'Uruguay': 3, 'Slovenia': 4, 'Ukraine': 2, 'Turkey': 4, 'Croatia': 2, 'Lebanon': 3, 'Romania': 6, 'Moldova': 2, 'Czech Republic': 3, 'Serbia': 1, 'India': 1, 'China': 1}


In [34]:
countries_dict = {}

for country in reviewed:  
    count = 0 

    for review in json_data:  
        if review["country"] == country:  
            count += 1  
            
    countries_dict[country] = count
    
countries_dict

{'Argentina': 319,
 'Bulgaria': 13,
 'Romania': 6,
 'South Africa': 111,
 'Canada': 17,
 'Moldova': 2,
 'Spain': 462,
 'Australia': 165,
 'Ukraine': 2,
 'Georgia': 7,
 'New Zealand': 100,
 'Czech Republic': 3,
 'India': 1,
 'Chile': 362,
 'Greece': 32,
 'Brazil': 6,
 'Israel': 32,
 'Mexico': 10,
 'Hungary': 13,
 'England': 11,
 'Germany': 155,
 'France': 1716,
 'Turkey': 4,
 'Portugal': 396,
 'Serbia': 1,
 'US': 4317,
 'Lebanon': 3,
 'Austria': 287,
 'Italy': 1432,
 'China': 1,
 'Croatia': 2,
 'Slovenia': 4,
 'Uruguay': 3}
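
The same country-to-count dictionary can also be built in one pass with `collections.Counter` from the standard library. This is only a sketch: `sample_reviews` is a hypothetical miniature stand-in for `json_data`, and reviews without a country are assumed to be skipped.

```python
from collections import Counter

# Hypothetical miniature stand-in for json_data.
sample_reviews = [
    {'country': 'US'},
    {'country': 'France'},
    {'country': 'US'},
    {'country': None},
]

# Count one occurrence per review, skipping reviews without a country.
country_counts = Counter(
    review['country'] for review in sample_reviews
    if review['country'] is not None
)

print(dict(country_counts))  # {'US': 2, 'France': 1}
```

Compared with looping over the whole list once per country, this reads the reviews only once.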

### 5. Build a dictionary with the following structure (1 point)
{country: average points of wines coming from that country}

In [38]:
country_averages = {}

for country in reviewed:
    list_of_reviewed = []
    
    for review in json_data:
        if review['country'] == country:
            list_of_reviewed.append(review['points'])
    
    average = sum(list_of_reviewed) / len(list_of_reviewed)

    country_averages[country] = average

country_averages

{'Argentina': 86.68025078369907,
 'Bulgaria': 88.23076923076923,
 'Romania': 86.83333333333333,
 'South Africa': 88.03603603603604,
 'Canada': 88.94117647058823,
 'Moldova': 87.0,
 'Spain': 87.16666666666667,
 'Australia': 88.4969696969697,
 'Ukraine': 84.0,
 'Georgia': 87.28571428571429,
 'New Zealand': 88.3,
 'Czech Republic': 88.0,
 'India': 92.0,
 'Chile': 86.29558011049724,
 'Greece': 87.125,
 'Brazil': 85.66666666666667,
 'Israel': 88.71875,
 'Mexico': 84.5,
 'Hungary': 89.53846153846153,
 'England': 91.72727272727273,
 'Germany': 89.83225806451613,
 'France': 88.91433566433567,
 'Turkey': 87.5,
 'Portugal': 88.02525252525253,
 'Serbia': 86.0,
 'US': 88.54829742876998,
 'Lebanon': 87.0,
 'Austria': 90.82229965156795,
 'Italy': 88.27374301675978,
 'China': 89.0,
 'Croatia': 81.5,
 'Slovenia': 87.0,
 'Uruguay': 87.33333333333333}
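
The averages can also be computed in a single pass by keeping a running sum and count per country. The sketch below uses hypothetical sample data (`sample_reviews`) and assumes that reviews with no country are ignored.

```python
from collections import defaultdict

# Hypothetical miniature stand-in for json_data.
sample_reviews = [
    {'country': 'US', 'points': 90},
    {'country': 'US', 'points': 86},
    {'country': 'France', 'points': 92},
    {'country': None, 'points': 88},
]

# country -> [sum of points, number of reviews]
totals = defaultdict(lambda: [0, 0])

for review in sample_reviews:
    country = review['country']
    if country is None:
        continue  # no country to attribute the points to
    totals[country][0] += review['points']
    totals[country][1] += 1

averages = {country: total / count for country, (total, count) in totals.items()}

print(averages)  # {'US': 88.0, 'France': 92.0}
```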

### 6. What's the province that produces the wines with the highest rating? (1 point)

Build a dictionary with the following structure:

{province: average points of wines coming from that province}

And then sort the dictionary by the average points, and print the province with the highest average points.

Hint: you can sort a dictionary by value using the following code:

```
sorted_dict = sorted(my_dict.items(), key=lambda x: x[1], reverse=True)
```
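
To see what the hint produces, here is a small sketch on a hypothetical dictionary (`toy_averages`, with made-up keys and values). Note that `sorted` on `.items()` returns a list of `(key, value)` tuples, not a dictionary.

```python
# Hypothetical averages; the keys and values are made up for illustration.
toy_averages = {'A': 87.5, 'B': 92.0, 'C': 90.25}

# Sort the (key, value) pairs by value, highest first.
ranked = sorted(toy_averages.items(), key=lambda x: x[1], reverse=True)
print(ranked)  # [('B', 92.0), ('C', 90.25), ('A', 87.5)]

# The first tuple holds the key with the highest value.
best_key, best_value = ranked[0]
print(best_key, best_value)  # B 92.0

# If only the top entry is needed, max() avoids sorting everything.
assert max(toy_averages, key=toy_averages.get) == best_key
```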

In [54]:
provinces = {review['province'] for review in json_data}
provinces_averages = {}

for province in provinces:
    list_of_points = []

    for review in json_data:
        if review['province'] == province:
            list_of_points.append(review['points'])
            
    average = sum(list_of_points) / len(list_of_points)

    provinces_averages[province] = average

print(provinces_averages)
sorted_dict = sorted(provinces_averages.items(), key=lambda x: x[1], reverse=True)
print(sorted_dict[0])

{'New South Wales': 88.16666666666667, 'Illinois': 86.0, "Hawke's Bay": 90.07692307692308, 'Mosel-Saar-Ruwer': 90.0, 'Codru Region': 87.0, 'Spanish Islands': 86.2, 'Tuscany': 88.5163043478261, 'Württemberg': 87.0, 'Western Australia': 88.85, 'Virginia': 85.62222222222222, 'Istria': 81.5, 'Santorini': 90.0, 'Jerusalem Hills': 91.0, 'Itata Valley': 84.0, 'Eisenberg': 92.0, 'Chile': 85.57142857142857, 'Brda': 82.0, 'Michigan': 87.75, 'Nahe': 89.125, 'Pirque': 92.0, 'Oregon': 88.77105263157895, 'Alentejo': 87.28571428571429, 'Marchigue': 92.0, 'Setubal': 83.0, 'Santa Cruz': 91.0, 'Stellenbosch': 88.55555555555556, 'Letrinon': 89.0, 'Casablanca Valley': 86.42307692307692, 'Missouri': 82.66666666666667, 'Atalanti Valley': 91.0, 'British Columbia': 89.81818181818181, 'Germany': 86.5, 'Veneto': 87.84390243902439, 'Sicily & Sardinia': 87.63448275862069, 'Kremstal': 92.48, 'Nemea': 87.33333333333333, 'Wagram': 92.33333333333333, 'Turkey': 85.0, 'Levante': 87.1875, 'New Jersey': 83.0, 'Alsace': 8

### 7. Update each wine's description by adding at the end of each description the following piece of text (1 point):

"This is a {designation} from {country} that scored {points} points"

In [55]:
for review in json_data:
    designation = review['designation']
    country = review['country']
    points = review['points']
    review['description'] = f'This is a {designation} from {country} that scored {points}'

json_data[:2]

[{'points': 89,
  'title': 'Caymus 1998 Cabernet Sauvignon (Napa Valley)',
  'description': 'This is a None from US that scored 89',
  'taster_name': None,
  'taster_twitter_handle': None,
  'price': 70,
  'designation': None,
  'variety': 'Cabernet Sauvignon',
  'region_1': 'Napa Valley',
  'region_2': 'Napa',
  'province': 'California',
  'country': 'US',
  'winery': 'Caymus',
  'length': 19},
 {'points': 98,
  'title': 'M. Chapoutier 1999 Le Méal Ermitage  (Hermitage)',
  'description': 'This is a Le Méal Ermitage from France that scored 98',
  'taster_name': 'Roger Voss',
  'taster_twitter_handle': '@vossroger',
  'price': 150,
  'designation': 'Le Méal Ermitage',
  'variety': 'Rhône-style Red Blend',
  'region_1': 'Hermitage',
  'region_2': None,
  'province': 'Rhône Valley',
  'country': 'France',
  'winery': 'M. Chapoutier',
  'length': 81}]
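
The question asks for the sentence to be added at the end of the existing description, ending in the word "points". A minimal sketch of appending rather than overwriting, on a hypothetical review (`sample_review` and its values are made up), assuming a single space separates the old text from the new sentence:

```python
# Hypothetical review; the values are made up for illustration.
sample_review = {
    'description': 'Very strong taste like apple and cinnamon',
    'designation': 'Reserve',
    'country': 'US',
    'points': 90,
}

suffix = (
    f"This is a {sample_review['designation']} from "
    f"{sample_review['country']} that scored {sample_review['points']} points"
)

# Keep the original text and add the new sentence after it.
sample_review['description'] = f"{sample_review['description']} {suffix}"

print(sample_review['description'])
# Very strong taste like apple and cinnamon This is a Reserve from US that scored 90 points
```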



### 8. What's the proportion of wine tasters that have a Twitter account? (1 point)

In [None]:
tasters = {review["taster_name"] for review in json_data if review["taster_name"] is not None}

twitter_tasters = {review["taster_twitter_handle"] for review in json_data if review["taster_twitter_handle"] is not None}

print(len(twitter_tasters))
print(len(tasters))
proportion = 100 * len(twitter_tasters) / len(tasters)

print(f"{proportion:.2f} % of the wine tasters have a Twitter account:")
print(*twitter_tasters, sep='\n')

14
18
77.78 % of the wine tasters have a Twitter account:
@bkfiona
@kerinokeefe
@JoeCz
@AnneInVino
@wawinereport
@worldwineguys
@suskostrzewa
@wineschach
@vboone
@paulgwine 
@vossroger
@mattkettmann
@laurbuzz
@gordone_cellars


### Question 9 (1 point)

* Create a function called `affordable_wines` that receives the wine reviews list and a specific budget, and returns how many wines you can buy within that budget. (0.5 points)
* Create another function called `twitter_presence` that receives the wine reviews list and a wine name and returns True if the wine has a twitter handle for the taster, and False otherwise. (0.5 points)

Test your functions with these examples:

* `affordable_wines(wines, 10)` should return 423 wines in that budget
* `twitter_presence(wines, "Nicosia 2013 Vulkà Bianco  (Etna)")` should return True, meaning there is a twitter handle for the taster of that wine

In [66]:
def affordable_wines(json_data, budget):
    count = 0

    for review in json_data:
        if review['price'] != None and review['price'] <= budget:
            count += 1
    
    return count

print(affordable_wines(json_data, 10))

423
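
For the second function, a minimal sketch of one possible `twitter_presence` follows. It assumes that the "wine name" is the review's `title` field, that the first matching review decides the answer, and that an unknown title returns False. The sample data (`sample_wines`) is hypothetical.

```python
def twitter_presence(reviews, wine_title):
    """Return True if the review for wine_title has a taster Twitter handle."""
    for review in reviews:
        if review['title'] == wine_title:
            return review['taster_twitter_handle'] is not None
    return False  # assumption: a title that is not in the list counts as False


# Hypothetical miniature stand-in for the wine reviews list.
sample_wines = [
    {'title': 'Wine A', 'taster_twitter_handle': '@example_taster'},
    {'title': 'Wine B', 'taster_twitter_handle': None},
]

print(twitter_presence(sample_wines, 'Wine A'))  # True
print(twitter_presence(sample_wines, 'Wine B'))  # False
```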


### Question 10 (1 point)

* Which is the most common variety of wine in the dataset? (0.3 points)
* Which is the most expensive wine in the dataset? (0.3 points)
* Which is the taster (other than `None`) that has reviewed the most wines? (0.4 points)

In [68]:
variety_of_wine = {}

for review in json_data:
    variety = review['variety']
    if variety in variety_of_wine:
        variety_of_wine[variety] += 1
    else:
        variety_of_wine[variety] = 1

most_variety = max(variety_of_wine, key = variety_of_wine.get)

print(most_variety, variety_of_wine[most_variety])

Pinot Noir 1007


In [70]:
wine_prices = {review['title']: review['price'] for review in json_data if review['price'] != None}

most_expensive = max(wine_prices, key = wine_prices.get)

print(most_expensive, wine_prices[most_expensive])

Biondi Santi 2007 Riserva  (Brunello di Montalcino) 800


In [71]:
most_expensive_wine = ""
highest_price = 0

for review in json_data:
    if review['price'] != None and float(review['price']) > highest_price:
        highest_price = float(review['price'])
        most_expensive_wine = review['title']

print(f"The most expensive wine is {most_expensive_wine}, with a price of {highest_price}")

The most expensive wine is Biondi Santi 2007 Riserva  (Brunello di Montalcino), with a price of 800.0


In [72]:
tasters = {}

for review in json_data:
    taster = review['taster_name']
    if taster is None:
        continue  # skip reviews without a named taster
    tasters[taster] = tasters.get(taster, 0) + 1

most_reviews = max(tasters, key = tasters.get)

print(most_reviews, tasters[most_reviews])

Roger Voss 1932
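
The count-then-`max` pattern used in this question can also be written with `collections.Counter.most_common`. A sketch on hypothetical data (`sample_reviews` and the taster names are made up):

```python
from collections import Counter

# Hypothetical miniature stand-in for json_data.
sample_reviews = [
    {'taster_name': 'Taster A'},
    {'taster_name': 'Taster B'},
    {'taster_name': 'Taster A'},
    {'taster_name': None},
]

# Count reviews per named taster, ignoring reviews with no taster.
review_counts = Counter(
    review['taster_name'] for review in sample_reviews
    if review['taster_name'] is not None
)

# most_common(1) returns a one-element list of (name, count) tuples.
top_taster, top_count = review_counts.most_common(1)[0]

print(top_taster, top_count)  # Taster A 2
```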
