# Individual Assignment: Wine!

There are 10 questions in this assignment. Some questions unlock others. If you can't answer a question, you can skip it and come back to it later, and if the question is locked, I'll provide a data sample for you to use, **at a cost of 0.5 points.**

The wine dataset is available in the file `wine.json`. This data contains information about wine reviews. It's a list of dictionaries, where each dictionary represents a wine review. The keys in the dictionary are:

* `points`: how many points the taster gave the wine on a scale of 1-100
* `title`: the title of the wine
* `description`: a description of the wine
* `taster_name`: the name of the taster
* `taster_twitter_handle`: the twitter handle of the taster
* `price`: the cost for a bottle of the wine
* `designation`: the vineyard within the winery where the grapes that made the wine are from
* `variety`: the type of grapes used to make the wine
* `region_1`: the wine-growing area within a province or state (e.g. Etna)
* `region_2`: a more specific region within a wine growing area
* `province`: the province or state that the wine is from
* `country`: the country that the wine is from
* `winery`: the winery that made the wine



### Rules:
* For each question, print the answer in the cell below the question. You can use `print()` or just type the variable name.
* You can use any resources you like --including the internet, your notes, and the past notebooks-- except ChatGPT/Copilot/other AI writing tools. Using AI-based tools or asking other people for help will result in a 0 for the assignment, an immediate Fail in the course, and a report to the Dean of Students.
* You have 80 minutes to complete the assignment.
* You can submit the assignment as many times as you like; only the last submission will be graded.
* You can't work with other people on the assignment.

### 0. Load the data as a Python object and print the first item

In [1]:
import json

with open("wine.json") as f:
    wines = json.load(f)


In [None]:
wines[0]

{'points': '87',
 'title': 'Nicosia 2013 Vulkà Bianco  (Etna)',
 'description': "Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity.",
 'taster_name': 'Kerin O’Keefe',
 'taster_twitter_handle': '@kerinokeefe',
 'price': None,
 'designation': 'Vulkà Bianco',
 'variety': 'White Blend',
 'region_1': 'Etna',
 'region_2': None,
 'province': 'Sicily & Sardinia',
 'country': 'Italy',
 'winery': 'Nicosia'}

### 1. How many wine reviews are included in the dataset? (1 point)

In [None]:
print(f"Total reviews included: {len(wines)}")

Total reviews included: 129971


### 2. Add a new {key:value} pair in each item in the list (1 point)

The new key should be called *length* and it should indicate the number of words in the *description* value.

For example, the following description:
* "Very strong taste like apple and cinnamon"

should have a *length* value of **7** 

In [None]:
# Using for loops

for wine in wines:
    wine["length"] = len(wine["description"].split())
    
wines[0]

{'points': '87',
 'title': 'Nicosia 2013 Vulkà Bianco  (Etna)',
 'description': "Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity.",
 'taster_name': 'Kerin O’Keefe',
 'taster_twitter_handle': '@kerinokeefe',
 'price': None,
 'designation': 'Vulkà Bianco',
 'variety': 'White Blend',
 'region_1': 'Etna',
 'region_2': None,
 'province': 'Sicily & Sardinia',
 'country': 'Italy',
 'winery': 'Nicosia',
 'length': 24}
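
A caveat about counting words, shown as a minimal sketch on a made-up string: `str.split(" ")` splits on every single space, so consecutive spaces produce empty strings that are counted as words, while `str.split()` with no argument splits on any run of whitespace. The `sample` text below is hypothetical and not taken from `wine.json`.

```python
# Hypothetical description with a double space between "strong" and "taste"
sample = "Very strong  taste like apple and cinnamon"

words_single_space = sample.split(" ")  # keeps an empty string for the extra space
words_whitespace = sample.split()       # collapses runs of whitespace

print(len(words_single_space))  # 8
print(len(words_whitespace))    # 7
```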

### 3. How many different countries have wines reviewed in the dataset? (1 point)

In [None]:
# using set/list/dict comprehension

countries_without_none = {wine["country"] for wine in wines if wine["country"] is not None}
countries_with_none = {wine["country"] for wine in wines}

print(f"Different countries (without None): {len(countries_without_none)}")
print(f"Different countries (with None): {len(countries_with_none)}")

Different countries (without None): 43
Different countries (with None): 44


### 4. Build a dictionary with the following structure: (1 point)

{country: number of wines reviewed coming from that country}

In [None]:
# initialize the dictionary we are going to use for storing the results
countries_dict = {}

# first we loop through the set of countries we built in (3)
for country in countries_without_none:  
    
    # set a counter to zero to add to it in every loop
    count = 0 
    
    # for every country we loop through all the wine reviews
    for wine in wines:  
        
        # when the wine comes from the country we are currently processing, we add 1 to the counter
        if wine["country"] == country:  
            count += 1  
            
    # we add the total count to the 'country' key 
    countries_dict[country] = count
    
countries_dict

{'Canada': 257,
 'Israel': 505,
 'France': 22093,
 'China': 1,
 'Germany': 2165,
 'Egypt': 1,
 'Croatia': 73,
 'India': 9,
 'US': 54504,
 'Lebanon': 35,
 'Slovakia': 1,
 'Moldova': 59,
 'Chile': 4472,
 'Georgia': 86,
 'Bulgaria': 141,
 'Czech Republic': 12,
 'Armenia': 2,
 'Austria': 3345,
 'Greece': 466,
 'Mexico': 70,
 'South Africa': 1401,
 'New Zealand': 1419,
 'Switzerland': 7,
 'Hungary': 146,
 'Ukraine': 14,
 'Spain': 6645,
 'Serbia': 12,
 'Italy': 19540,
 'Peru': 16,
 'Morocco': 28,
 'Turkey': 90,
 'Brazil': 52,
 'Cyprus': 11,
 'Uruguay': 109,
 'Australia': 2329,
 'Argentina': 3800,
 'England': 74,
 'Slovenia': 87,
 'Portugal': 5691,
 'Luxembourg': 6,
 'Bosnia and Herzegovina': 2,
 'Romania': 120,
 'Macedonia': 12}
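
The nested loops above scan the whole list once per country. As an alternative sketch, `collections.Counter` from the standard library can build the same kind of dictionary in a single pass. The `sample_wines` list is a hypothetical mini-sample with the same shape as the items in `wines`, so the counts below are not results from the real dataset.

```python
from collections import Counter

# Hypothetical mini-sample with the same shape as the items in `wines`
sample_wines = [
    {"country": "Italy"},
    {"country": "France"},
    {"country": "Italy"},
    {"country": None},
]

# One pass over the list; reviews with no country are skipped
country_counts = Counter(
    wine["country"] for wine in sample_wines if wine["country"] is not None
)

print(dict(country_counts))  # {'Italy': 2, 'France': 1}
```

A `Counter` is a `dict` subclass, so it can be indexed and iterated like the dictionary built by hand.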

### 5. Build a dictionary with the following structure (1 point)
{country: average points of wines coming from that country}

In [None]:
# initialize the dictionary we are going to use for storing the results
country_avg_dict = {}

# first we loop through the set of countries we built in (3)
for country in countries_without_none:
    
    # now we initialize a list that will store the rating of the wines from the country we are looping through
    ratings = []
    
    # for every country we loop through all the wine reviews
    for wine in wines:
        
        # when the wine comes from the country we are currently processing,
        # we add its rating to the list
        if wine["country"] == country:
            ratings.append(int(wine["points"]))
            
    # now we calculate the average score by dividing the total points scored by the total reviews
    total_points = sum(ratings)
    total_reviews = len(ratings)
    country_avg_dict[country] = total_points / total_reviews
    
country_avg_dict

{'Canada': 89.36964980544747,
 'Israel': 88.47128712871287,
 'France': 88.84510931064138,
 'China': 89.0,
 'Germany': 89.85173210161663,
 'Egypt': 84.0,
 'Croatia': 87.21917808219177,
 'India': 90.22222222222223,
 'US': 88.56372009393806,
 'Lebanon': 87.68571428571428,
 'Slovakia': 87.0,
 'Moldova': 87.20338983050847,
 'Chile': 86.4935152057245,
 'Georgia': 87.68604651162791,
 'Bulgaria': 87.93617021276596,
 'Czech Republic': 87.25,
 'Armenia': 87.5,
 'Austria': 90.10134529147982,
 'Greece': 87.28326180257511,
 'Mexico': 85.25714285714285,
 'South Africa': 88.05638829407566,
 'New Zealand': 88.3030303030303,
 'Switzerland': 88.57142857142857,
 'Hungary': 89.1917808219178,
 'Ukraine': 84.07142857142857,
 'Spain': 87.28833709556058,
 'Serbia': 87.5,
 'Italy': 88.56223132036847,
 'Peru': 83.5625,
 'Morocco': 88.57142857142857,
 'Turkey': 88.08888888888889,
 'Brazil': 84.67307692307692,
 'Cyprus': 87.18181818181819,
 'Uruguay': 86.75229357798165,
 'Australia': 88.58050665521684,
 'Argentin
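
The per-country average can also be computed in one pass by accumulating a running total and a count for each country. This is a minimal sketch on a hypothetical `sample_wines` list (made-up records, with `points` stored as strings like in the first item of the dataset); it is not the method the assignment requires.

```python
from collections import defaultdict

# Hypothetical mini-sample; `points` are strings, as in the first item of `wines`
sample_wines = [
    {"country": "Italy", "points": "87"},
    {"country": "Italy", "points": "91"},
    {"country": "France", "points": "90"},
    {"country": None, "points": "85"},
]

totals = defaultdict(int)  # sum of points per country
counts = defaultdict(int)  # number of reviews per country

for wine in sample_wines:
    country = wine["country"]
    if country is None:
        continue  # skip reviews without a country
    totals[country] += int(wine["points"])
    counts[country] += 1

averages = {country: totals[country] / counts[country] for country in totals}
print(averages)  # {'Italy': 89.0, 'France': 90.0}
```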

### 6. What's the province that produces the wines with the highest rating? (1 point)

In [None]:
# first we need to get all the unique provinces there are: sets to the rescue
provinces = {wine["province"] for wine in wines}

# now we do the same as in (5) but with provinces
province_avg_dict = {}


for province in provinces:
    ratings = []
    for wine in wines:
        if wine["province"] == province:
            ratings.append(int(wine["points"]))
            
    # same approach as in (5)
    total_points = sum(ratings)
    total_reviews = len(ratings)
    province_avg_dict[province] = total_points / total_reviews
    

# we have the provinces and their scores, and now we need to find the best one:

# 1. using max() on `province_avg_dict` with dict.get() as the key function
best_province = max(
    province_avg_dict,  
    # with this key function, max() compares the values and returns the key that has the highest one
    key=province_avg_dict.get  
)

rating_best_province = province_avg_dict[best_province]

print(f"Best province is {best_province} with a rating of {rating_best_province} points")

Best province is Südburgenland with a rating of 94.0 points


In [None]:
# 2. using sorted() with dict.items(), lambda functions and indexing!
#   2.1. sorted() gets the iterable: our dictionary
#   2.2. dict.items() "breaks" the dictionary into a list of tuples: [(key1, val1), (key2, val2), ...]
#   2.3. key=lambda x: x[1] means that the sorting should be done according to the second element of each tuple
#   2.4. out of the sorted list of tuples, we only want the first item: [0]
#        This item contains the best province and its rating

best_province, rating_best_province = (
    sorted(
        province_avg_dict.items(), 
        key=lambda x: x[1], 
        reverse=True
    )
    [0]
)

print(f"Best province is {best_province} with a rating of {rating_best_province} points")

Best province is Südburgenland with a rating of 94.0 points


### 7. Update each wine's description by adding at the end of each description the following piece of text (1 point):

"This is a {designation} from {country} that scored {points} points"

In [None]:
# we loop again through each review
for wine in wines:
    
    # we fetch all the necessary info in each review: designation, country, and points
    designation = wine["designation"]
    country = wine["country"]
    points = wine["points"]
    
    # with the relevant info, we add a string to the current description
    wine["description"] += f" This is a {designation} from {country} that scored {points} points"
    
wines[0]["description"]

"Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity. This is a Vulkà Bianco from Italy that scored 87 points"

### 8. What's the proportion of wine tasters that have a Twitter account? (1 point)

In [None]:
# first we find the unique tasters adding them to a set
tasters = {wine["taster_name"] for wine in wines if wine["taster_name"] is not None}

# then we create another set that contains the distinct twitter handles of the tasters
twitter_tasters = {wine["taster_twitter_handle"] for wine in wines if wine["taster_twitter_handle"] is not None}

# now we calculate the proportion by dividing the number of twitter handles by the total number of unique tasters
proportion = 100 * len(twitter_tasters) / len(tasters)

print(f"{proportion:.2f} % of the wine tasters have a Twitter account:")
print(*twitter_tasters, sep='\n')

78.95 % of the wine tasters have a Twitter account:
@mattkettmann
@bkfiona
@winewchristina
@suskostrzewa
@vossroger
@gordone_cellars
@laurbuzz
@worldwineguys
@vboone
@JoeCz
@wineschach
@AnneInVino
@kerinokeefe
@wawinereport
@paulgwine 
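
The calculation above compares the number of distinct handles with the number of distinct names, which gives the proportion of tasters only if no two tasters share a handle. As a sketch under that caveat, the tasters can be counted by name instead. The records below are made up for illustration; whether any handle is actually shared in `wine.json` is not checked here.

```python
# Hypothetical records: two tasters share one handle, one has none
sample_wines = [
    {"taster_name": "Taster A", "taster_twitter_handle": "@shared"},
    {"taster_name": "Taster B", "taster_twitter_handle": "@shared"},
    {"taster_name": "Taster C", "taster_twitter_handle": None},
    {"taster_name": None, "taster_twitter_handle": None},
]

all_tasters = {
    w["taster_name"] for w in sample_wines if w["taster_name"] is not None
}

# Count tasters (by name) that have a handle, not the handles themselves
tasters_with_handle = {
    w["taster_name"]
    for w in sample_wines
    if w["taster_name"] is not None and w["taster_twitter_handle"] is not None
}

distinct_handles = {
    w["taster_twitter_handle"]
    for w in sample_wines
    if w["taster_twitter_handle"] is not None
}

by_name = 100 * len(tasters_with_handle) / len(all_tasters)
by_handle = 100 * len(distinct_handles) / len(all_tasters)

print(f"{by_name:.2f}")    # 66.67
print(f"{by_handle:.2f}")  # 33.33
```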


### Question 9 (1 point)

* Create a function called `affordable_wines` that receives the wines reviews list and a specific budget, and returns how many wines you can buy within that budget. (0.5 points)
* Create another function called `twitter_presence` that receives the wines reviews list and a wine name and returns True if the wine has a twitter handle for the taster, and False otherwise. (0.5 points)

Test your functions with these examples:

* `affordable_wines(wines, 10)` should return 6280 wines in that budget
* `twitter_presence(wines, "Nicosia 2013 Vulkà Bianco  (Etna)")` should return True, meaning there is a twitter handle for the taster of that wine

In [None]:
def affordable_wines(wines, budget):
    on_budget = 0
    for wine in wines:
        if wine['price'] is not None and float(wine['price']) <= budget:
            on_budget += 1  
    return on_budget

affordable_wines(wines, 10)

6280

In [None]:
def twitter_presence(wines, wine_name):
    for wine in wines:
        # a missing handle may be None or an empty string, so rely on truthiness
        if wine['title'] == wine_name and wine['taster_twitter_handle']:
            return True
    return False

twitter_presence(wines, "Nicosia 2013 Vulkà Bianco  (Etna)")

True
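
In the first item shown earlier, missing values such as `price` and `region_2` appear as `None` rather than as empty strings. Assuming missing Twitter handles are stored the same way, a presence check has to reject `None`, because `None != ""` evaluates to `True`. A minimal sketch on made-up records follows; `has_handle` and `sample_wines` are hypothetical names used only for illustration.

```python
# Hypothetical records: one review with a handle, one without
sample_wines = [
    {"title": "Wine A", "taster_twitter_handle": "@someone"},
    {"title": "Wine B", "taster_twitter_handle": None},
]

def has_handle(wines, wine_name):
    """Return True if any review titled `wine_name` has a taster handle."""
    for wine in wines:
        # A truthiness test rejects both None and ""
        if wine["title"] == wine_name and wine["taster_twitter_handle"]:
            return True
    return False

print(has_handle(sample_wines, "Wine A"))  # True
print(has_handle(sample_wines, "Wine B"))  # False
print(None != "")                          # True
```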

### Question 10 (1 point)

* Which is the most common variety of wine in the dataset? (0.3 points)
* Which is the most expensive wine in the dataset? (0.3 points)
* Which is the taster (other than `None`) that has reviewed the most wines? (0.4 points)

In [None]:
# 1 most common variety

unique_varieties = {wine['variety'] for wine in wines}

varieties = {}

for variety in unique_varieties:
    varieties[variety] = 0

    for wine in wines:
        if wine['variety'] == variety:
            varieties[variety] += 1

most_common_variety = max(varieties, key=varieties.get)

print(f"The most common variety is {most_common_variety}, with {varieties[most_common_variety]} wines out of {len(wines)} reviews")

The most common variety is Pinot Noir, with 13272 wines out of 129971 reviews


In [None]:
# 2 most expensive wine in the data

most_expensive_wine = ""
highest_price = 0

for wine in wines:
    if wine['price'] is not None and float(wine['price']) > highest_price:
        highest_price = float(wine['price'])
        most_expensive_wine = wine['title']

print(f"The most expensive wine is {most_expensive_wine}, with a price of {highest_price}")

The most expensive wine is Château les Ormes Sorbet 2013  Médoc, with a price of 3300.0
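
The same search can be written with `max()` and a key function, in the spirit of the approaches used in question 6. This is a sketch on a hypothetical `sample_wines` list with made-up titles and prices; reviews without a price are filtered out first because `float(None)` raises a `TypeError`.

```python
# Hypothetical mini-sample; prices may be missing (None)
sample_wines = [
    {"title": "Wine A", "price": "15.0"},
    {"title": "Wine B", "price": None},
    {"title": "Wine C", "price": "120.0"},
]

# Keep only the reviews that have a price
priced = [w for w in sample_wines if w["price"] is not None]

# max() returns the whole review whose price is highest
most_expensive = max(priced, key=lambda w: float(w["price"]))

print(most_expensive["title"], float(most_expensive["price"]))  # Wine C 120.0
```

If `priced` could be empty, `max()` would raise a `ValueError`; passing `default=None` avoids that.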


In [None]:
# 3 taster that has reviewed the most wines

unique_tasters = {wine['taster_name'] for wine in wines}

tasters = {}

for taster in unique_tasters:
    tasters[taster] = 0

    for wine in wines:
        if wine['taster_name'] == taster and wine['taster_name'] is not None:
            tasters[taster] += 1

most_active_taster = max(tasters, key=tasters.get)

print(f"The most active taster is {most_active_taster}, with {tasters[most_active_taster]} reviews")

The most active taster is Roger Voss, with 25514 reviews
