# Individual Assignment: Wine!

There are 10 questions in this assignment. Some questions unlock others. If you can't answer a question, you can skip it and come back to it later. If a skipped question locks a later one, I'll provide a data sample for you to use, **at a cost of 0.5 points.**

The wine dataset is available in the file `wine.json`. This data contains information about wine reviews. It's a list of dictionaries, where each dictionary represents a wine review. The keys in the dictionary are:

* `points`: how many points the taster gave the wine on a scale of 1-100
* `title`: the title of the wine
* `description`: a description of the wine
* `taster_name`: the name of the taster
* `taster_twitter_handle`: the twitter handle of the taster
* `price`: the cost for a bottle of the wine
* `designation`: the vineyard within the winery where the grapes that made the wine are from
* `variety`: the type of grapes used to make the wine
* `region_1`: the wine growing area within a province or state (for example, Napa Valley)
* `region_2`: a more specific region within a wine growing area
* `province`: the province or state that the wine is from
* `country`: the country that the wine is from
* `winery`: the winery that made the wine



### Rules:
* When I ask a question, print the answer in the cell below the question. You can use `print()` or just type the variable name.
* You can use any resources you like (including the internet, your notes, and the past notebooks) except ChatGPT, Copilot, and other AI writing tools. Using AI-based tools or asking other people for help will result in a 0 for the assignment, an immediate Fail in the course, and a report to the Dean of Students.
* You have 80 minutes to complete the assignment.
* You can submit the assignment as many times as you like, but only the last submission will be graded.
* You can't work with other people on the assignment.

### Question 1 (1 point)

Read the JSON data into a variable called `wine`. Remember to make sure that Python is looking in the right place for the file. You can check the current working directory with the following code:

```python
import os
os.getcwd()
```

If the file is not in the current working directory, you can change the working directory with the following code:

```python
os.chdir('path/to/file')
```

How many reviews are in the data?

**HELP: If you can't figure out how to read the JSON file, raise your hand and I'll tell you how to do it. (-0.5 points)**

In [63]:
import json

with open('wine-data-set.json', encoding = 'utf-8') as f:
    wine = json.load(f)

wine[:2]

[{'points': 89,
  'title': 'Caymus 1998 Cabernet Sauvignon (Napa Valley)',
  'description': 'Creamy black cherry aromas layered with fresh brussel sprouts and spicy arugula flavors of red plums and toasted oak.',
  'taster_name': None,
  'taster_twitter_handle': None,
  'price': 70,
  'designation': None,
  'variety': 'Cabernet Sauvignon',
  'region_1': 'Napa Valley',
  'region_2': 'Napa',
  'province': 'California',
  'country': 'US',
  'winery': 'Caymus'},
 {'points': 98,
  'title': 'M. Chapoutier 1999 Le Méal Ermitage  (Hermitage)',
  'description': "Chapoutier's selections of the best parcels of vines in Hermitage are set to become legendary. Sold under the ancient spelling of the appellation name (Ermitage), they represent the epitome of the power and concentration that lies behind the reputation of the appellation. This cuvée is the best of the collection, with its brooding, opaque character, suggesting rather than revealing power at this stage. Age it until your new-born baby is

### Question 2 (1 point)

What is the average price of a bottle of wine in the dataset?

In [64]:
# Keep only the reviews that actually have a price
prices = [review['price'] for review in wine if review['price'] is not None]

average_price = sum(prices) / len(prices)
print(f'The average price is {round(average_price, 2)}')

The average price is 34.75


### Question 3 (1 point)

Build a dictionary with the following structure:
```python
ratings = {
    country: average points of all its wines
}
```

What is the `country` whose wines have the highest average `points`?

Hints:
* Remember to use the unique countries in the dataset, so as not to double count.
* Printing the countries and manually finding the highest average is not fully correct. You have to do it with code.

In [100]:
points_dict = {}

for review in wine:
    country = review['country']
    point = review['points']
    if country in points_dict:
        points_dict[country].append(point)
    else:
        points_dict[country] = [point]

for country, points in points_dict.items():
    total = sum(points)
    length = len(points)
    average = total / length
    points_dict[country] = average

print(points_dict)

highest_average_score = max(points_dict, key = points_dict.get)
print(highest_average_score, points_dict[highest_average_score])

{'US': 88.54829742876998, 'France': 88.91433566433567, 'Chile': 86.29558011049724, 'Italy': 88.27374301675978, 'Germany': 89.83225806451613, 'New Zealand': 88.3, 'South Africa': 88.03603603603604, 'Argentina': 86.68025078369907, 'Spain': 87.16666666666667, 'Portugal': 88.02525252525253, 'Austria': 90.82229965156795, 'Greece': 87.125, 'Australia': 88.4969696969697, 'Canada': 88.94117647058823, None: 89.2, 'England': 91.72727272727273, 'Hungary': 89.53846153846153, 'Georgia': 87.28571428571429, 'Mexico': 84.5, 'Israel': 88.71875, 'Bulgaria': 88.23076923076923, 'Brazil': 85.66666666666667, 'Uruguay': 87.33333333333333, 'Slovenia': 87.0, 'Ukraine': 84.0, 'Turkey': 87.5, 'Croatia': 81.5, 'Lebanon': 87.0, 'Romania': 86.83333333333333, 'Moldova': 87.0, 'Czech Republic': 88.0, 'Serbia': 86.0, 'India': 92.0, 'China': 89.0}
India 92.0


### Question 4 (1 point)

Using the `ratings` dictionary created in the previous answer, what are the average ratings of the following countries:

* `Egypt`
* `Slovenia`
* `Uruguay`

**HELP: If you couldn't create the `ratings` dictionary in Q3, use the following dictionary to solve it. (-0.5 points)**
```python
ratings_dictionary = {'England': 91.58108108108108, 'India': 90.22222222222223, 'Austria': 90.10134529147982, 'Germany': 89.85173210161663, 'Canada': 89.36964980544747, 'Hungary': 89.1917808219178, 'China': 89.0, 'France': 88.84510931064138, 'Luxembourg': 88.66666666666667, None: 88.63492063492063, 'Australia': 88.58050665521684, 'Switzerland': 88.57142857142857, 'Morocco': 88.57142857142857, 'US': 88.56372009393806, 'Italy': 88.56223132036847, 'Israel': 88.47128712871287, 'New Zealand': 88.3030303030303, 'Portugal': 88.25021964505359, 'Turkey': 88.08888888888889, 'Slovenia': 88.06896551724138, 'South Africa': 88.05638829407566, 'Bulgaria': 87.93617021276596, 'Georgia': 87.68604651162791, 'Lebanon': 87.68571428571428, 'Armenia': 87.5, 'Serbia': 87.5, 'Spain': 87.28833709556058, 'Greece': 87.28326180257511, 'Czech Republic': 87.25, 'Croatia': 87.21917808219177, 'Moldova': 87.20338983050847, 'Cyprus': 87.18181818181819, 'Slovakia': 87.0, 'Macedonia': 86.83333333333333, 'Uruguay': 86.75229357798165, 'Argentina': 86.71026315789474, 'Bosnia and Herzegovina': 86.5, 'Chile': 86.4935152057245, 'Romania': 86.4, 'Mexico': 85.25714285714285, 'Brazil': 84.67307692307692, 'Ukraine': 84.07142857142857, 'Egypt': 84.0, 'Peru': 83.5625}
```

In [70]:
ratings_dictionary = {'England': 91.58108108108108, 'India': 90.22222222222223, 'Austria': 90.10134529147982, 'Germany': 89.85173210161663, 'Canada': 89.36964980544747, 'Hungary': 89.1917808219178, 'China': 89.0, 'France': 88.84510931064138, 'Luxembourg': 88.66666666666667, None: 88.63492063492063, 'Australia': 88.58050665521684, 'Switzerland': 88.57142857142857, 'Morocco': 88.57142857142857, 'US': 88.56372009393806, 'Italy': 88.56223132036847, 'Israel': 88.47128712871287, 'New Zealand': 88.3030303030303, 'Portugal': 88.25021964505359, 'Turkey': 88.08888888888889, 'Slovenia': 88.06896551724138, 'South Africa': 88.05638829407566, 'Bulgaria': 87.93617021276596, 'Georgia': 87.68604651162791, 'Lebanon': 87.68571428571428, 'Armenia': 87.5, 'Serbia': 87.5, 'Spain': 87.28833709556058, 'Greece': 87.28326180257511, 'Czech Republic': 87.25, 'Croatia': 87.21917808219177, 'Moldova': 87.20338983050847, 'Cyprus': 87.18181818181819, 'Slovakia': 87.0, 'Macedonia': 86.83333333333333, 'Uruguay': 86.75229357798165, 'Argentina': 86.71026315789474, 'Bosnia and Herzegovina': 86.5, 'Chile': 86.4935152057245, 'Romania': 86.4, 'Mexico': 85.25714285714285, 'Brazil': 84.67307692307692, 'Ukraine': 84.07142857142857, 'Egypt': 84.0, 'Peru': 83.5625}

print(ratings_dictionary['Egypt'])
print(ratings_dictionary['Slovenia'])
print(ratings_dictionary['Uruguay'])

84.0
88.06896551724138
86.75229357798165


### Question 5 (1 point)

Some data preparation: 
* If there is a wine that doesn't have a price, fill the price of that wine with the average price of all the wines in that country

For example:
* Country B has 3 wines with `[20, 10, None]` as prices
* Then calculate the average price of the wines with prices in that country, and substitute the `None`s with that average (in this case, the average of `[20, 10]`).
* Final prices for the three wines in country B should be `[20, 10, 15]`.
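
The example above can be sketched in code as follows. This is a minimal illustration, not the expected solution: `sample_reviews` and `fill_missing_prices` are hypothetical names, and the sketch assumes that a wine from a country with no priced wines at all simply keeps `None`.

```python
from collections import defaultdict

# Hypothetical sample with the same keys as the reviews in the dataset
sample_reviews = [
    {'country': 'B', 'price': 20},
    {'country': 'B', 'price': 10},
    {'country': 'B', 'price': None},
    {'country': 'C', 'price': None},  # no priced wines in C (assumption: stays None)
]


def fill_missing_prices(reviews):
    """Replace each missing price with the average known price of its country."""
    # First pass: collect the known prices per country
    known_prices = defaultdict(list)
    for review in reviews:
        if review['price'] is not None:
            known_prices[review['country']].append(review['price'])

    # Second pass: compute the average per country
    averages = {country: sum(prices) / len(prices)
                for country, prices in known_prices.items()}

    # Third pass: fill the gaps in place, directly on the reviews
    for review in reviews:
        if review['price'] is None and review['country'] in averages:
            review['price'] = averages[review['country']]
    return reviews


fill_missing_prices(sample_reviews)
print([review['price'] for review in sample_reviews])  # [20, 10, 15.0, None]
```

Updating the reviews in place means later questions that read `review['price']` see the filled values without needing a second data structure.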

In [94]:
prices_dict = {}

# First pass: collect the prices
for review in wine:
    country = review['country']
    price = review['price']
    if price is not None: 
        if country in prices_dict:
            prices_dict[country].append(price)
        else:
            prices_dict[country] = [price]

# Second pass: compute the averages
average_prices_dict = {}
for country, prices in prices_dict.items():
    total = sum(prices)
    length = len(prices)
    average = total / length
    average_prices_dict[country] = average

# Third pass: update the dictionary with averages where price is None
updated_dict = {}
for review in wine:
    country = review['country']
    price = review['price']
    if price is not None:
        if country in updated_dict:
            updated_dict[country].append(price)
        else:
            updated_dict[country] = [price]
    else:
        if country in average_prices_dict:
            average_price = average_prices_dict[country]
            if country in updated_dict:
                updated_dict[country].append(average_price)
            else:
                updated_dict[country] = [average_price]

print(updated_dict)

{'US': [70, 150, 100, 120, 60, 150, 65, 100, 48, 42, 65, 48, 16, 28, 35, 32, 30, 17, 36, 21, 15, 20, 20, 35, 100, 20, 35, 24, 19, 18, 15, 25, 17, 35, 16, 35, 10, 33, 39, 15, 15, 15, 19, 19, 17, 13, 25, 30, 20, 25, 22, 22, 120, 65, 28, 24, 38, 39, 40, 48, 50, 25, 25, 60, 25, 53, 55, 55, 16, 22, 55, 20, 24, 15, 24, 80, 65, 28, 80, 50, 43, 44, 43, 19, 38, 85, 23, 32, 19, 36, 40, 30, 28, 56, 25, 30, 12, 26, 17, 10, 38, 26, 16, 72, 30, 15, 25, 14, 72, 68, 55, 30, 61, 55, 68, 50, 160, 55, 28, 30, 54, 30, 110, 39, 30, 75, 65, 62, 30, 26, 60, 29, 65, 26, 38, 75, 60, 25, 18, 55, 65, 60, 75, 22, 46, 20, 44, 58, 28, 39, 35, 35, 30, 85, 45, 19, 36, 30, 26, 50, 14, 25, 50, 40, 65, 30, 36, 34, 32, 50, 20, 38, 42, 42, 65, 52, 55, 45, 38, 22, 30, 45, 30, 50, 85, 80, 50, 40, 65, 38, 150, 50, 46, 55, 33, 25, 45, 28, 28, 135, 18, 25, 25, 28, 42, 16, 25, 10, 11, 14, 26, 26, 11, 15, 135, 16, 13, 10, 18, 20, 20, 15, 19, 10, 20, 16, 28, 42, 30, 16, 16, 28, 13, 12, 147, 30, 35, 13, 17, 17, 20, 25, 36, 22, 16,

### Question 6 (1 point)

Similar to the `ratings` dictionary, create a new dictionary `countries` where the key is the `country` and the value is a tuple containing the following:
```python
countries = {
    country: (
        average points of all its wines,
        average price of all its wines,
        ratio of average points to average price
    )
}
```

For example (not real numbers):
```python
countries = {
    'France': (90, 20, 4.5),
    'Italy': (88, 15, 5.8),
    'Spain': (85, 10, 8.5)
}
```

We want lots of points on average at a lower price.

What is the `country` whose wines have the highest average `points` to `price` ratio?
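
One possible way to build the tuple structure and pick the best ratio is sketched below on made-up data. The names `sample_reviews` and `build_countries` are hypothetical, and the sketch assumes that missing prices are skipped when averaging and that countries with no known price are left out.

```python
# Hypothetical reviews chosen so the averages match the example numbers above
sample_reviews = [
    {'country': 'France', 'points': 92, 'price': 25},
    {'country': 'France', 'points': 88, 'price': 15},
    {'country': 'Spain', 'points': 86, 'price': 10},
    {'country': 'Spain', 'points': 84, 'price': None},
]


def build_countries(reviews):
    """Map each country to (average points, average price, points/price ratio)."""
    points_by_country = {}
    prices_by_country = {}
    for review in reviews:
        country = review['country']
        points_by_country.setdefault(country, []).append(review['points'])
        if review['price'] is not None:
            prices_by_country.setdefault(country, []).append(review['price'])

    countries = {}
    for country, points in points_by_country.items():
        prices = prices_by_country.get(country)
        if not prices:
            continue  # assumption: no known price means no ratio, so skip
        avg_points = sum(points) / len(points)
        avg_price = sum(prices) / len(prices)
        countries[country] = (avg_points, avg_price, avg_points / avg_price)
    return countries


countries = build_countries(sample_reviews)

# key= tells max() to compare the ratio (index 2), not the country names
best_country = max(countries, key=lambda country: countries[country][2])
print(best_country)  # Spain
```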

In [None]:
unique_countries = {review['country'] for review in wine}

new_dict = {}

for country in unique_countries:
    ratio = points_dict[country] / average_prices_dict[country]
    new_dict[country] = (points_dict[country], average_prices_dict[country], ratio)

print(new_dict)

{'Turkey': (87.5, 31.0, 2.8225806451612905), 'France': (88.91433566433567, 38.11685393258427, 2.332677713160557), 'South Africa': (88.03603603603604, 28.47747747747748, 3.091426763682379), 'Argentina': (86.68025078369907, 27.266025641025642, 3.1790570406152705), 'Australia': (88.4969696969697, 37.25766871165644, 2.3752685757625653), 'Spain': (87.16666666666667, 26.49344978165939, 3.2901214218998955), 'Moldova': (87.0, 13.0, 6.6923076923076925), 'Brazil': (85.66666666666667, 30.4, 2.817982456140351), 'Serbia': (86.0, 15.0, 5.733333333333333), 'Croatia': (81.5, 17.0, 4.794117647058823), 'Romania': (86.83333333333333, 18.0, 4.8240740740740735), 'Uruguay': (87.33333333333333, 24.0, 3.638888888888889), 'Slovenia': (87.0, 19.666666666666668, 4.423728813559322), 'Hungary': (89.53846153846153, 28.916666666666668, 3.0964309465750386), 'Bulgaria': (88.23076923076923, 14.0, 6.302197802197802), 'Ukraine': (84.0, 11.0, 7.636363636363637), 'Italy': (88.27374301675978, 38.019747235387044, 2.321786688

### Question 7 (1 point)

Using the `countries` dictionary created in the previous answer, which of the following countries has the lowest points-to-price ratio?

* `Cyprus`
* `Brazil`
* `India`

Again, you need to find the solution via code, not by printing the countries and finding the lowest ratio.

**HELP: If you couldn't create the `countries` dictionary, use the following dictionary to solve it. (-0.5 points)**
```python
{'Switzerland': (88.57142857142857, 85.28571428571429, 1.0385259631490786), 'Ukraine': (84.07142857142857, 9.214285714285714, 9.124031007751938), 'France': (88.84510931064138, 50.44096557625022, 1.761368131946972), 'New Zealand': (88.3030303030303, 28.71683227387118, 3.0749572049203806), 'Uruguay': (86.75229357798165, 26.40366972477064, 3.2856150104239057), 'Slovenia': (88.06896551724138, 29.952709646822846, 2.9402670594973386), 'Macedonia': (86.83333333333333, 15.583333333333334, 5.572192513368983), 'Morocco': (88.57142857142857, 19.5, 4.542124542124542), 'Germany': (89.85173210161663, 43.22511895400462, 2.0786925351721273), 'Moldova': (87.20338983050847, 16.74576271186441, 5.20748987854251), 'Cyprus': (87.18181818181819, 16.272727272727273, 5.357541899441341), 'Argentina': (86.71026315789474, 25.250984316518174, 3.4339359634852884), 'Lebanon': (87.68571428571428, 30.685714285714287, 2.8575418994413404), 'Austria': (90.10134529147982, 40.21345058732233, 2.2405773186717064), 'Canada': (89.36964980544747, 36.330749534908094, 2.4598900641886674), 'Egypt': (84.0, 88.66754349046016, 0.9473590526282896), 'Chile': (86.4935152057245, 21.632842067954186, 3.9982502037423777), 'China': (89.0, 18.0, 4.944444444444445), 'India': (90.22222222222223, 13.333333333333334, 6.766666666666667), 'Czech Republic': (87.25, 24.25, 3.597938144329897), 'Brazil': (84.67307692307692, 29.977986926550436, 2.8245084344901423), 'Hungary': (89.1917808219178, 40.97516808122459, 2.1767276377027667), 'Luxembourg': (88.66666666666667, 23.333333333333332, 3.8000000000000003), 'Greece': (87.28326180257511, 23.072528919739334, 3.7829950113488127), 'Turkey': (88.08888888888889, 24.633333333333333, 3.576003608479928), 'Croatia': (87.21917808219177, 27.174160416353985, 3.209636535070332), 'Australia': (88.58050665521684, 36.233156357592215, 2.4447361356266684), 'Mexico': (85.25714285714285, 26.785714285714285, 3.182933333333333), 'Armenia': (87.5, 14.5, 6.0344827586206895), 'South Africa': 
(88.05638829407566, 29.577821794179147, 2.977108622359912), 'Israel': (88.47128712871287, 33.5615655251901, 2.6360894000105426), 'Slovakia': (87.0, 16.0, 5.4375), 'Serbia': (87.5, 24.5, 3.5714285714285716), 'Bulgaria': (87.93617021276596, 14.645390070921986, 6.004358353510896), 'Italy': (88.56223132036847, 46.217621803872724, 1.9162005283652987), 'Portugal': (88.25021964505359, 35.137228194299844, 2.511587401176113), 'Peru': (83.5625, 18.0625, 4.626297577854671), 'Romania': (86.4, 15.241666666666667, 5.6686714051394205), None: (88.63492063492063, 28.645492778168947, 3.094201287487374), 'US': (88.56372009393806, 36.80110724188513, 2.4065504201226693), 'Georgia': (87.68604651162791, 20.92993160772165, 4.189504684252194), 'Spain': (87.28833709556058, 28.86762804139494, 3.0237446932041956), 'Bosnia and Herzegovina': (86.5, 12.5, 6.92), 'England': (91.58108108108108, 54.16377107059393, 1.6908180370550563)}
```


In [121]:
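candidates = ('Cyprus', 'Brazil', 'India')
# Keep only the requested countries that are present in the dictionary
three_countries = {country: new_dict[country][2] for country in candidates if country in new_dict}
print(three_countries)
# min() with key= compares the ratios; without key= it would compare the country names
print(min(three_countries, key=three_countries.get))

{'Brazil': 2.817982456140351, 'India': 4.842105263157895}
Brazil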


### Question 8 (1 point)

Create a list called `top_wines` that contains all the wines that have achieved the maximum rating (0.1 points)

* First calculate the maximum rating, then extract all the wine reviews that have that rating.
* The result should be a list of dictionaries, where each dictionary is a wine review, something like the following.

```python
top_wines = [
    {
        'country': 'France',
        'description': 'This is a top wine',
        'points': 100,
        'price': 100,
        'title': 'Top Wine A',
        'variety': 'Top Variety A',
        'winery': 'Top Winery A'
    },
    {
        'country': 'Italy',
        'description': 'This is another top wine',
        'points': 100,
        'price': 100,
        'title': 'Second Top Wine B',
        'variety': 'Top Variety B',
        'winery': 'Top Winery B'
    }
]
```

* What is the country with the most top wines? (0.3 points)
* What is the average price of a top wine? (0.3 points)
* What is the most common variety among the top wines? (0.3 points)



In [136]:
# Find the maximum rating instead of hard-coding it
max_points = max(review['points'] for review in wine)
print(max_points)

# All the reviews that reached the maximum rating
top_wines = [review for review in wine if review['points'] == max_points]
print({review['title']: review['points'] for review in top_wines})
print(top_wines)

# Count the top wines per country
top_wines_per_country = {}
for review in top_wines:
    country = review['country']
    top_wines_per_country[country] = top_wines_per_country.get(country, 0) + 1

# key= makes max() compare the counts rather than the country names
print(f'The country with the most top wines is {max(top_wines_per_country, key=top_wines_per_country.get)}')

# Average price: sum (not max) divided by the count, skipping missing prices
top_wine_prices = [review['price'] for review in top_wines if review['price'] is not None]
average_top_wine_price = sum(top_wine_prices) / len(top_wine_prices)
print(f'The average Top Wine price is {average_top_wine_price}')

# Count how many top wines there are of each variety
most_present_variety = {}
for review in top_wines:
    variety = review['variety']
    most_present_variety[variety] = most_present_variety.get(variety, 0) + 1

print(f'The most present variety is {max(most_present_variety, key=most_present_variety.get)}')

99
{'Venge 2008 Family Reserve Cabernet Sauvignon (Oakville)': 99, 'Chambers Rosewood Vineyards NV Rare Muscadelle (Rutherglen)': 99}
[{'points': 99, 'title': 'Venge 2008 Family Reserve Cabernet Sauvignon (Oakville)', 'description': "An absolute joy and triumph. Just superb, showcasing the best of Oakville. Perfect tannins, as pure as velvet and sweet, and perfect oak, too, with beautifully applied char and wood spice. That the oak is 100% new is in keeping with the wine's volumetrics. The wine's flavors are a profound, heady expression of blackberries, blueberries, cassis and dark, barely sweetened chocolate. Just spectacular, a real achievement by any world class standard. Production was a scant 275 cases.", 'taster_name': None, 'taster_twitter_handle': None, 'price': 125, 'designation': 'Family Reserve', 'variety': 'Cabernet Sauvignon', 'region_1': 'Oakville', 'region_2': 'Napa', 'province': 'California', 'country': 'US', 'winery': 'Venge'}, 

### Question 9 (1 point)

* Create a function called `affordable_wines` that receives the wine reviews list and a budget, and returns how many wines you can buy within that budget. (0.5 points)
* Create another function called `twitter_presence` that receives the wine reviews list and a wine name, and returns True if the wine has a twitter handle for the taster, and False otherwise. (0.5 points)

Test your functions with these examples:

* `affordable_wines(wines, 10)` should return 6280 wines in that budget
* `twitter_presence(wines, "Nicosia 2013 Vulkà Bianco  (Etna)")` should return True, meaning there is a twitter handle for the taster of that wine
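
A minimal sketch of the two functions follows, run on made-up data rather than the real dataset. It rests on two assumptions that the text does not spell out: "within budget" is read as `price <= budget` (wines without a price are not counted), and the wine name is matched against the `title` key. `sample_reviews` is a hypothetical sample.

```python
# Hypothetical sample with the same keys as the reviews in the dataset
sample_reviews = [
    {'title': 'Wine A', 'price': 8, 'taster_twitter_handle': '@taster_a'},
    {'title': 'Wine B', 'price': 10, 'taster_twitter_handle': None},
    {'title': 'Wine C', 'price': None, 'taster_twitter_handle': '@taster_c'},
    {'title': 'Wine D', 'price': 25, 'taster_twitter_handle': None},
]


def affordable_wines(reviews, budget):
    """Count the wines whose price is known and does not exceed the budget."""
    return sum(
        1 for review in reviews
        if review['price'] is not None and review['price'] <= budget
    )


def twitter_presence(reviews, wine_name):
    """Return True if any review of the wine has a taster twitter handle."""
    return any(
        review['title'] == wine_name
        and review['taster_twitter_handle'] is not None
        for review in reviews
    )


print(affordable_wines(sample_reviews, 10))        # 2
print(twitter_presence(sample_reviews, 'Wine A'))  # True
print(twitter_presence(sample_reviews, 'Wine B'))  # False
```

Using `any()` covers the case where the same title appears in more than one review, and a name that is not in the list simply returns False.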

### Question 10 (1 point)

* Which is the most common variety of wine in the dataset? (0.3 points)
* Which is the most expensive wine in the dataset? (0.3 points)
* Which is, on average, the most expensive variety of wine in the dataset? (0.2 points)
* Which is the taster (other than `None`) that has reviewed the most wines? (0.2 points)
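
The four questions above all reduce to counting or taking a maximum with a `key`. The sketch below shows one way to approach them with `collections.Counter` on made-up data. `sample_reviews` is hypothetical, and wines without a price are assumed to be ignored in the price questions.

```python
from collections import Counter

# Hypothetical sample with the same keys as the reviews in the dataset
sample_reviews = [
    {'title': 'Wine A', 'variety': 'Pinot Noir', 'price': 30, 'taster_name': 'Taster X'},
    {'title': 'Wine B', 'variety': 'Pinot Noir', 'price': 50, 'taster_name': 'Taster X'},
    {'title': 'Wine C', 'variety': 'Riesling', 'price': 100, 'taster_name': None},
    {'title': 'Wine D', 'variety': 'Pinot Noir', 'price': None, 'taster_name': 'Taster Y'},
]

# Most common variety: count every variety and take the top entry
variety_counts = Counter(review['variety'] for review in sample_reviews)
most_common_variety = variety_counts.most_common(1)[0][0]

# Most expensive wine: compare reviews by price, skipping missing prices
priced_reviews = [r for r in sample_reviews if r['price'] is not None]
most_expensive_wine = max(priced_reviews, key=lambda r: r['price'])['title']

# Most expensive variety on average: group the prices, then average each group
prices_by_variety = {}
for review in priced_reviews:
    prices_by_variety.setdefault(review['variety'], []).append(review['price'])
average_by_variety = {v: sum(p) / len(p) for v, p in prices_by_variety.items()}
priciest_variety = max(average_by_variety, key=average_by_variety.get)

# Busiest taster: count the names, leaving out None
taster_counts = Counter(
    r['taster_name'] for r in sample_reviews if r['taster_name'] is not None
)
top_taster = taster_counts.most_common(1)[0][0]

print(most_common_variety)   # Pinot Noir
print(most_expensive_wine)   # Wine C
print(priciest_variety)      # Riesling
print(top_taster)            # Taster X
```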