# Assignment 7: Dictionaries (Soccer)

## Changes to the LLM feedback feature

Thank you for filling out our Canvas survey on the feedback feature. A large majority of people wanted the feature to be offered for the rest of the course, so we'll be including the feature in the rest of the assignments this semester.

Many people mentioned they found the input-based method for controlling the feedback to be annoying, so we've integrated it into the grader check:

```Python
student_grader_with_feedback.check("q1", should_get_llm_feedback=False)
```

If you are stuck and want help from an LLM to address the autograder output, just change the second parameter's value in the check call above to `True`. Then you can turn it back off by setting it back to `False`. Each question will by default have the feedback feature turned off. 

Many students said that the output was too cryptic, so we've tried to change the service to be more direct and concise.

Remember that this tool costs us money to run and that you are supposed to use this feature only when you are stuck. Do your best to troubleshoot issues in your code by using print statements and the debugging tools before you turn on LLM feedback for a question.

## Your Information

At the start of each assignment, you will need to provide us your name and the name of the partner you worked with for this assignment (if you had one). Double click on the cell below or click once and hit enter to edit it. Replace "First Last" with your first name and last name. Replace "None" with the first and last name of your partner if you had one for this assignment. We ask for this information so we don't accuse you of cheating when your code looks like your partner's.

Please keep these lines commented so they don't cause an error.

In [1]:
# MY NAME: Hyokyung Kim
# PARTNER: None

## Learning Objectives

In this assignment, you will...

- Write programs to interpret data present in csv files
- Use lists and dictionaries effectively to manage data

## Imports

Every project will begin with some import statements. It's crucial that you run the cell below; otherwise, we will not be able to grade your code and provide feedback to you.

In [2]:
import math
import csv
import os
import student_grader_with_feedback
student_grader_with_feedback.initialize(os.getcwd(), "p7")

## Lab portion (1 question)

### Dataset

In P7, you will be analyzing the dataset obtained from the video game [FC'24](https://www.ea.com/games/ea-sports-fc/fc-24) (formerly the FIFA series). The full dataset has some data on nearly 20,000 soccer stars who play in the top soccer leagues across the world, so we will use a smaller subset of this data.

You will practice creating dictionaries using the data in `small_soccer_stars.csv` and using dictionaries to answer simple questions. This dataset has all the same columns as the full dataset, in the same format, but far fewer rows (only **50**), so it should be easier to get started with our functions and data structures on this dataset.

For now, open `small_soccer_stars.csv` by clicking on the name of the file from the file tab on the left of Jupyter Lab (or using Microsoft Excel or some other Spreadsheet viewer) and look at the list of players in the dataset. A small portion of the dataset `small_soccer_stars.csv` is reproduced here:

ID|Name|Age|Nationality|Team|League|Value|Wage|Attacking|Movement|Defending|Goalkeeping|Overall rating|Position|Height|Preferred foot
--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--
239085|E. Haaland|22|Norway|Manchester City|Premier League (England)|€185M|€340K|78.6|83.6|38.0|10.4|91|ST|195cm|Left
231747|K. Mbappé|24|France|Paris Saint Germain|Ligue 1 (France)|€181.5M|€230K|83.0|92.4|30.7|8.4|91|ST|182cm|Right
192985|K. De Bruyne|32|Belgium|Manchester City|Premier League (England)|€103M|€350K|82.4|78.8|63.0|11.2|91|CM|181cm|Right
202126|H. Kane|29|England|FC Bayern München|Bundesliga (Germany)|€119.5M|€170K|88.0|74.0|43.3|10.8|90|ST|188cm|Right
192119|T. Courtois|31|Belgium|Real Madrid|La Liga (Spain)|€63M|€250K|17.2|58.0|18.0|86.6|90|GK|200cm|Left

The data shows:
- `ID` : the **unique ID** used by FC'24 for identifying each player,
- `Name` : the **name** of the player,
- `Age` : the **age** of the player,
- `Nationality` : the **national team** the player represents,
- `Team` : the **football club** that the player represents,
- `League`: the **football league** that the player's club is a part of,
- `Value` : the **value** of the player in the transfer market (in **Euros**),
- `Wage` : the **weekly wages** earned by the player as per their contract (in **Euros**),
- `Attacking` : the **total attacking stats** of the player (out of **100.0**),
- `Movement` : the **total movement stats** of the player (out of **100.0**),
- `Defending` : the **total defending stats** of the player (out of **100.0**),
- `Goalkeeping` : the **total goalkeeping stats** of the player (out of **100.0**),
- `Overall rating` : the **overall rating** of the player (out of **100**),
- `Position` : the player's **favored position** on the soccer field,
- `Height` : the **height** of the player (in **centimeters**),
- `Preferred foot` : the player's **favored foot**.

### Getting familiar with the dataset

In this section, we've provided a lot of code for you. Just go line by line and make sure you know how it works!

In [3]:
def process_csv(filename):
    example_file = open(filename, encoding="utf-8")
    example_reader = csv.reader(example_file)
    example_data = list(example_reader)
    example_file.close()
    return example_data

In [4]:
csv_data = process_csv("small_soccer_stars.csv")

# splits the header and other rows into appropriate variables
csv_header = csv_data[0]
csv_rows = csv_data[1:]

Basic questions about players

In [5]:
# What is the first player's `Name`?
name_col_idx = csv_header.index('Name')
first_player_name = csv_rows[0][name_col_idx]
print("The first player's name is: ", first_player_name)

# What `Team` does the second player in the dataset play for?
team_col_idx = csv_header.index('Team')
second_player_team = csv_rows[1][team_col_idx]
print("The second player plays for: ", second_player_team)

# What is the `Age` of the third player in the dataset?
age_col_idx = csv_header.index('Age')
third_player_age = csv_rows[2][age_col_idx]
print("The third player is ", third_player_age, " years old.")

# What is the `Height` of the fourth player in the dataset?
height_col_idx = csv_header.index('Height')
fourth_player_height = csv_rows[3][height_col_idx]
print("The fourth player is ", fourth_player_height, " tall.")

The first player's name is:  E. Haaland
The second player plays for:  Paris Saint Germain
The third player is  32  years old.
The fourth player is  188cm  tall.


As we can see from the output above, indexing into our `csv_rows` variable returns string values.

In particular, the `Height` data was represented in units of **cm**. The `Height` data of **all** players in the dataset has the first few characters as numbers that represent the player's **height** and the last two characters are `"cm"` representing the **unit**. If we want to compare the heights of different players, it would make sense to **slice** off the units `"cm"`, and to represent the number as an **int**. 

In [6]:
fourth_player_height_without_units = fourth_player_height[:-2]
fourth_player_height_int = int(fourth_player_height_without_units)
fourth_player_height_int

188

Similarly, the `Wage` and `Value` values are strings with extra characters that don't represent the number.

In [7]:
# What is the `Wage` of the fifth player?
wage_col_idx = csv_header.index('Wage')
fifth_player_wage = csv_rows[4][wage_col_idx]
print("The wage of the fifth player is: ", fifth_player_wage)

# What is the `Value` of the tenth player?
value_col_idx = csv_header.index('Value')
tenth_player_value = csv_rows[9][value_col_idx]
print("The value of the tenth player is: ", tenth_player_value)

The wage of the fifth player is:  €250K
The value of the tenth player is:  €158.5M


It would make sense to clean this data up a bit, and to represent these values as **ints**.

Recall that you already created a function in P5 to convert the **number of reviews** of the games into **ints**, and another function to convert the **price** of games into **floats**. The functions `format_price` and `format_num_reviews` can be made to work here with a few tweaks.

In [8]:
def format_euros(euros):
    euros = euros[1:]
    if euros[-1] == 'K':
        euros = float(euros[:-1])*1e3
    elif euros[-1] == 'M':
        euros = float(euros[:-1])*1e6
    else:
        euros = float(euros)
    return round(euros)

fifth_player_wage_int = format_euros(fifth_player_wage)
print("Unformatted: ", fifth_player_wage, " -> formatted: ", fifth_player_wage_int)

tenth_player_value_int = format_euros(tenth_player_value)
print("Unformatted: ", tenth_player_value, " -> formatted: ", tenth_player_value_int)

Unformatted:  €250K  -> formatted:  250000
Unformatted:  €158.5M  -> formatted:  158500000


Similar to the last assignment, we can create a `cell` function to retrieve values from our CSV data and typecast the result:

In [9]:
def cell(row_idx, col_name):
    col_idx = csv_header.index(col_name)
    val = csv_rows[row_idx][col_idx]
    if col_name in ['ID', 'Age', 'Overall rating']:
        return int(val)
    elif col_name in ['Attacking', 'Movement', 'Defending', 'Goalkeeping']:
        return float(val)
    elif col_name in ['Height']:
        return int(val[:-2])
    elif col_name in ['Wage', 'Value']:
        return format_euros(val)
    else:
        return val

# What is the `Value` of the last player?
last_player_value = cell(-1, 'Value')
print("The last player's value is: ", last_player_value)

# What is the `Overall rating` of the third to last player?
third_last_player_overall = cell(-3, 'Overall rating')
print("The third to last player's Overall rating is: ", third_last_player_overall)

# What is the `Attacking` stat of the thirteenth player in the dataset?
thirteenth_player_attacking = cell(12, 'Attacking')
print("The thirteenth player's Attacking stat is: ", thirteenth_player_attacking)

# What is the `Height` of the fourteenth player in the dataset? Notice how this is formatted!
fourteenth_player_height = cell(13, 'Height')
print("The fourteenth player's Height is: ", fourteenth_player_height)

# What is the `Preferred foot` of the fifteenth player in the dataset?
fifteenth_player_foot = cell(14, 'Preferred foot')
print("The fifteenth player's Preferred foot is: ", fifteenth_player_foot)

The last player's value is:  80000000
The third to last player's Overall rating is:  86
The thirteenth player's Attacking stat is:  79.8
The fourteenth player's Height is:  193
The fifteenth player's Preferred foot is:  Right


Hopefully the above code snippets gave you a good idea of the dataset you'll be working with for this project. You can always come back to this section if you're confused about how the dataset was constructed and how data is accessed.

### Lab question

#### Lab Question 1

Please confirm that you understand all of the above code and have not modified these important variables and functions: `csv_header`, `csv_data`, `format_euros`, and `cell`.

Points possible: 100.0

In [10]:
im_ready_to_start_project = True
im_ready_to_start_project

True

In [11]:
student_grader_with_feedback.check("lab-q1", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q1...
Great job! You passed all test cases for this question.


### Submit your Lab

**Congrats! You've reached the end of the lab.**

Please submit your notebook file on Gradescope for lab-p7. The next section of the file belongs to the project, so make sure that you understand the contents of the lab before continuing.

## Project portion (23 questions)

#### Project Question 1

Create a dictionary mapping each **column name** in `csv_header` to its **value** for the second player in `csv_rows` (i.e. the player at index `1`). Your code must call the `cell` function.

Points possible: 4.0

In [17]:
# replace the ... with your code

second_player = {}
for column in csv_header:
    second_player[column] = cell(1, column)

second_player

{'ID': 231747,
 'Name': 'K. Mbappé',
 'Age': 24,
 'Nationality': 'France',
 'Team': 'Paris Saint Germain',
 'League': 'Ligue 1 (France)',
 'Value': 181500000,
 'Wage': 230000,
 'Attacking': 83.0,
 'Movement': 92.4,
 'Defending': 30.7,
 'Goalkeeping': 8.4,
 'Overall rating': 91,
 'Position': 'ST',
 'Height': 182,
 'Preferred foot': 'Right'}

In [18]:
student_grader_with_feedback.check("q1", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q1...
Great job! You passed all test cases for this question.


#### Project Question 2

Find the `Position` of the second player by indexing into the dictionary `second_player`.

Points possible: 3.0

In [19]:
# replace the ... with appropriate values

second_player_position = second_player['Position']

second_player_position

'ST'

In [20]:
student_grader_with_feedback.check("q2", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q2...
Great job! You passed all test cases for this question.


#### Project Question 3

Find the `Height` of the second player by indexing into the dictionary `second_player`.

Points possible: 3.0

In [21]:
# replace the ... with appropriate values
second_player_height = second_player['Height']

second_player_height

182

In [22]:
student_grader_with_feedback.check("q3", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q3...
Great job! You passed all test cases for this question.


#### Background: dictionaries representing rows versus columns

We have now seen how to create a **dictionary** to represent all the data in a **single row** of the dataset. We will now create **dictionaries** to represent all the data in a **single column** of the dataset.

When we created **dictionaries** for each **row**, the **column names** were our **unique keys**. If we want to similarly create a **dictionary** for each **column**, we will need a **unique key** for **each row**. Luckily, the dataset already provides us with a **unique** `ID` for each player. So, we will let the `ID` of **each** player be the **key** corresponding to the data of the player.
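To make the row-versus-column distinction concrete, here is a minimal sketch on a hypothetical two-player table. The names `header`, `rows`, `first_row`, and `age_by_id` are made up for illustration and are not part of the assignment; the values are copied from the sample rows shown earlier.

```python
# Hypothetical two-player table, used only for illustration
header = ["ID", "Name", "Age"]
rows = [
    [239085, "E. Haaland", 22],
    [231747, "K. Mbappé", 24],
]

# Row dictionary: the column names are the unique keys
first_row = {}
for col_idx in range(len(header)):
    first_row[header[col_idx]] = rows[0][col_idx]

# Column dictionary: each player's unique ID is the key
id_idx = header.index("ID")
age_idx = header.index("Age")
age_by_id = {}
for row in rows:
    age_by_id[row[id_idx]] = row[age_idx]

print(first_row)  # {'ID': 239085, 'Name': 'E. Haaland', 'Age': 22}
print(age_by_id)  # {239085: 22, 231747: 24}
```

A row dictionary answers "what do we know about this one player?", while a column dictionary answers "what is this one stat for every player?".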

#### Project Question 4

Create a dictionary mapping each player's `ID` to their `Movement` stat.

Your output **must** be a **dict** where each key is an `ID` and the value is the `Movement` stat. Key-value pairs within this **dict** should look like this:

```python
{
    239085: 83.6,
    231747: 92.4,
    192985: 78.8,
    202126: 74.0,
    192119: 58.0,
    ...
}
```

As a sidenote, in programming, you will often see variables storing dictionaries with a naming pattern of `value_by_key`. That's because indexing into it with this pattern makes it clearer what's being output: `value_by_key[key] == value`. To make this more concrete, the variable `player_movement_by_id` maps ids to movement stats, and indexing into it with an id would return a movement stat for the player with that id. 
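As a small illustration of the `value_by_key` naming pattern, the sketch below uses a hypothetical dictionary `height_by_name`, filled in by hand with two heights from the sample rows shown earlier. It is not a variable you need to create for this assignment.

```python
# Hypothetical dictionary following the value_by_key pattern:
# the part after "by" is the key, the part before "by" is the value
height_by_name = {"E. Haaland": 195, "K. Mbappé": 182}

# Indexing with a name (the key) returns a height (the value)
haaland_height = height_by_name["E. Haaland"]
print(haaland_height)  # 195
```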

Points possible: 5.0

In [25]:
player_movement_by_id = {}

# loop through all the rows of the dataset
for row_idx in range(len(csv_rows)):
    player_id = cell(row_idx, 'ID')  # extract the player's 'ID' using the `cell` function
    player_movement = cell(row_idx, 'Movement')  # extract the player's 'Movement' using the `cell` function
    player_movement_by_id[player_id] = player_movement  # add the `player_id`-`player_movement` key-value pair to the dict

player_movement_by_id


{239085: 83.6,
 231747: 92.4,
 192985: 78.8,
 202126: 74.0,
 192119: 58.0,
 188545: 80.8,
 158023: 87.0,
 264012: 88.8,
 239818: 65.6,
 238794: 90.8,
 231866: 68.4,
 212831: 54.6,
 209331: 90.2,
 203376: 70.0,
 192448: 52.4,
 190871: 87.4,
 165153: 78.4,
 252371: 82.2,
 239053: 81.8,
 231478: 85.4,
 218667: 84.4,
 210257: 64.6,
 200389: 58.4,
 194765: 84.8,
 121939: 80.2,
 256630: 84.0,
 235073: 48.2,
 232293: 82.8,
 230621: 57.6,
 228702: 83.4,
 222665: 79.2,
 215698: 57.0,
 212622: 79.4,
 212198: 76.8,
 207865: 75.8,
 200145: 64.8,
 200104: 83.6,
 193080: 56.4,
 177003: 82.8,
 167495: 53.6,
 153079: 80.8,
 265674: 86.6,
 256790: 88.0,
 253163: 71.4,
 251854: 84.4,
 247635: 83.4,
 246669: 85.4,
 241721: 87.4,
 241096: 82.8,
 240130: 77.4}

In [26]:
student_grader_with_feedback.check("q4", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q4...
Great job! You passed all test cases for this question.


#### Project Question 5

What is the `Movement` stat of the player with `ID` *239818*?

Points possible: 3.0

In [27]:
# we have done this one for you

player_239818_movement = player_movement_by_id[239818]
player_239818_movement

65.6

In [28]:
student_grader_with_feedback.check("q5", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q5...
Great job! You passed all test cases for this question.


#### Project Question 6

What is the `Movement` stat of the player with `ID` *209331*?

Points possible: 3.0

In [29]:
player_209331_movement = player_movement_by_id[209331]

player_209331_movement


90.2

In [30]:
student_grader_with_feedback.check("q6", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q6...
Great job! You passed all test cases for this question.


#### Project Question 7

Create a **dictionary** that maps the `ID` of **each** player to a **dictionary** that stores **all** the data about a player.

The first few key/value pairs of your **dict** will look like this:

```python
{239085: {'ID': 239085,
  'Name': 'E. Haaland',
  'Age': 22,
  'Nationality': 'Norway',
  'Team': 'Manchester City',
  'League': 'Premier League (England)',
  'Value': 185000000,
  'Wage': 340000,
  'Attacking': 78.6,
  'Movement': 83.6,
  'Defending': 38.0,
  'Goalkeeping': 10.4,
  'Overall rating': 91,
  'Position': 'ST',
  'Height': 195,
  'Preferred foot': 'Left'},
 231747: {'ID': 231747,
  'Name': 'K. Mbappé',
  'Age': 24,
  'Nationality': 'France',
  'Team': 'Paris Saint Germain',
  'League': 'Ligue 1 (France)',
  'Value': 181500000,
  'Wage': 230000,
  'Attacking': 83.0,
  'Movement': 92.4,
  'Defending': 30.7,
  'Goalkeeping': 8.4,
  'Overall rating': 91,
  'Position': 'ST',
  'Height': 182,
  'Preferred foot': 'Right'},
 ...
}
```

You will have to loop through the `csv_header` and add each column/value pair to the `player_dict` you create for the current player being considered at each iteration of your loop. Look back at question 1 if you get stuck.

Points possible: 5.0

In [31]:
player_dict_by_id = {}

for row_idx in range(len(csv_rows)):
    player_dict = {}
    
    # Loop through each column in the csv_header and populate the player_dict
    for column in csv_header:
        player_dict[column] = cell(row_idx, column)  # Extract column data using the cell function

    # Extract the player's ID
    player_id = player_dict['ID']
    
    # Add the player_id and player_dict to the player_dict_by_id dictionary
    player_dict_by_id[player_id] = player_dict

player_dict_by_id


{239085: {'ID': 239085,
  'Name': 'E. Haaland',
  'Age': 22,
  'Nationality': 'Norway',
  'Team': 'Manchester City',
  'League': 'Premier League (England)',
  'Value': 185000000,
  'Wage': 340000,
  'Attacking': 78.6,
  'Movement': 83.6,
  'Defending': 38.0,
  'Goalkeeping': 10.4,
  'Overall rating': 91,
  'Position': 'ST',
  'Height': 195,
  'Preferred foot': 'Left'},
 231747: {'ID': 231747,
  'Name': 'K. Mbappé',
  'Age': 24,
  'Nationality': 'France',
  'Team': 'Paris Saint Germain',
  'League': 'Ligue 1 (France)',
  'Value': 181500000,
  'Wage': 230000,
  'Attacking': 83.0,
  'Movement': 92.4,
  'Defending': 30.7,
  'Goalkeeping': 8.4,
  'Overall rating': 91,
  'Position': 'ST',
  'Height': 182,
  'Preferred foot': 'Right'},
 192985: {'ID': 192985,
  'Name': 'K. De Bruyne',
  'Age': 32,
  'Nationality': 'Belgium',
  'Team': 'Manchester City',
  'League': 'Premier League (England)',
  'Value': 103000000,
  'Wage': 350000,
  'Attacking': 82.4,
  'Movement': 78.8,
  'Defending': 63.0,

In [32]:
student_grader_with_feedback.check("q7", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q7...
Great job! You passed all test cases for this question.


#### WARNING: For the rest of the assignment, you MUST access all data using ONLY the data structure `player_dict_by_id`.

Now that we have created the data structure `player_dict_by_id`, we can access all the data that we need for this project using this data structure instead of using `csv_rows` or the `cell` function.

#### Project Question 8

Find **all** the statistics of the player with the `ID` *256630*.

Your output should look like this:

```python
{'ID': 256630,
 'Name': 'F. Wirtz',
 'Age': 20,
 'Nationality': 'Germany',
 'Team': 'Bayer 04 Leverkusen',
 'League': 'Bundesliga (Germany)',
 'Value': 118500000,
 'Wage': 77000,
 'Attacking': 73.2,
 'Movement': 84.0,
 'Defending': 50.3,
 'Goalkeeping': 11.4,
 'Overall rating': 87,
 'Position': 'CAM',
 'Height': 177,
 'Preferred foot': 'Right'}
```

Points possible: 3.0

In [33]:
player_256630 = player_dict_by_id[256630]

player_256630



{'ID': 256630,
 'Name': 'F. Wirtz',
 'Age': 20,
 'Nationality': 'Germany',
 'Team': 'Bayer 04 Leverkusen',
 'League': 'Bundesliga (Germany)',
 'Value': 118500000,
 'Wage': 77000,
 'Attacking': 73.2,
 'Movement': 84.0,
 'Defending': 50.3,
 'Goalkeeping': 11.4,
 'Overall rating': 87,
 'Position': 'CAM',
 'Height': 177,
 'Preferred foot': 'Right'}

In [34]:
student_grader_with_feedback.check("q8", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q8...
Great job! You passed all test cases for this question.


#### Project Question 9

Find the `Name` of the player with the `ID` *200104*.

Points possible: 3.0

In [35]:
player_200104 = player_dict_by_id[200104]  # Get the player dictionary for ID 200104
player_200104_name = player_200104['Name']  # Extract the 'Name' from the player dictionary

player_200104_name


'H. Son'

In [36]:
student_grader_with_feedback.check("q9", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q9...
Great job! You passed all test cases for this question.


#### Project Question 10

Find the `Overall rating` of the player with the `ID` *252371*.

Points possible: 3.0

In [37]:
player_252371 = player_dict_by_id[252371]  # Get the player dictionary for ID 252371
player_252371_rating = player_252371['Overall rating']  # Extract the 'Overall rating' from the player dictionary

player_252371_rating


88

In [38]:
student_grader_with_feedback.check("q10", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q10...
Great job! You passed all test cases for this question.


#### Project Question 11

Create a dictionary mapping each player's `ID` to their `Attacking` stat.

Your output **must** be a **dict** where each key is an `ID` and the value is the `Attacking` stat. Key-value pairs within this **dict** should look like this:

```python
{
    239085: 78.6,
    231747: 83.0,
    192985: 82.4,
    202126: 88.0,
    192119: 17.2,
    ...
}
```

Points possible: 5.0

In [39]:
attacking_stat_by_id = {}

for player_id in player_dict_by_id:
    player_attacking = player_dict_by_id[player_id]['Attacking']  # Extract the 'Attacking' stat
    attacking_stat_by_id[player_id] = player_attacking  # Add the ID and Attacking stat to the dict

attacking_stat_by_id


{239085: 78.6,
 231747: 83.0,
 192985: 82.4,
 202126: 88.0,
 192119: 17.2,
 188545: 86.6,
 158023: 81.8,
 264012: 82.6,
 239818: 57.0,
 238794: 73.8,
 231866: 74.0,
 212831: 27.8,
 209331: 79.8,
 203376: 63.0,
 192448: 23.6,
 190871: 80.0,
 165153: 86.6,
 252371: 77.0,
 239053: 74.4,
 231478: 80.6,
 218667: 77.0,
 210257: 26.2,
 200389: 19.0,
 194765: 86.4,
 121939: 69.2,
 256630: 73.2,
 235073: 17.0,
 232293: 78.2,
 230621: 16.0,
 228702: 76.6,
 222665: 78.2,
 215698: 26.2,
 212622: 77.6,
 212198: 80.4,
 207865: 66.8,
 200145: 74.4,
 200104: 81.2,
 193080: 20.6,
 177003: 76.0,
 167495: 24.8,
 153079: 81.2,
 265674: 70.2,
 256790: 69.4,
 253163: 62.6,
 251854: 66.8,
 247635: 73.6,
 246669: 74.6,
 241721: 75.8,
 241096: 74.0,
 240130: 63.2}

In [40]:
student_grader_with_feedback.check("q11", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q11...
Great job! You passed all test cases for this question.


#### Project Question 12

Create a dictionary mapping each player's `Name` to their `ID`.

Your output **must** be a **dict** where each key is a `Name` and the value is the `ID`.

**Note**: In general, creating a **dict** mapping each `Name` to the `ID` is a **bad idea** as names are **not** unique. Since **keys** in **dicts** need to be unique, this could result in some names being mapped to the **wrong** `ID` (of a **different** player with the **same** name). However, in `small_soccer_stars.csv`, the names have been **chosen** to be **unique**, so we don't have to worry about this issue.

The **first five** key-value pairs within this **dict** would be these:

```python
{
    'E. Haaland': 239085,
    'K. Mbappé': 231747,
    'K. De Bruyne': 192985,
    'H. Kane': 202126,
    'T. Courtois': 192119,
    ...
}
```
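To see why the note above about non-unique names matters, here is a minimal sketch with two made-up players who share a name. The list `players`, the IDs, and the name `'J. Silva'` are all hypothetical and do not come from the dataset.

```python
# Hypothetical players who share a name but have different (made-up) IDs
players = [
    (1001, "J. Silva"),
    (1002, "J. Silva"),
]

id_by_name = {}
for player_id, name in players:
    # The second assignment to the same key overwrites the first one
    id_by_name[name] = player_id

print(id_by_name)       # {'J. Silva': 1002}
print(len(id_by_name))  # 1
```

The dictionary ends up with one entry instead of two, and the first player's ID is lost without any error message.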

Points possible: 5.0

In [41]:
player_id_by_name = {}

for player_id in player_dict_by_id:
    player_name = player_dict_by_id[player_id]['Name']  # Extract the player's name
    player_id_by_name[player_name] = player_id  # Add the player's name and ID to the dict

player_id_by_name


{'E. Haaland': 239085,
 'K. Mbappé': 231747,
 'K. De Bruyne': 192985,
 'H. Kane': 202126,
 'T. Courtois': 192119,
 'R. Lewandowski': 188545,
 'L. Messi': 158023,
 'S. Smith': 264012,
 'Rúben Dias': 239818,
 'Vini Jr.': 238794,
 'Rodri': 231866,
 'Alisson': 212831,
 'M. Salah': 209331,
 'V. van Dijk': 203376,
 'M. ter Stegen': 192448,
 'Neymar Jr': 190871,
 'K. Benzema': 165153,
 'J. Bellingham': 252371,
 'F. Valverde': 239053,
 'L. Martínez': 231478,
 'Bernardo Silva': 218667,
 'Ederson': 210257,
 'J. Oblak': 200389,
 'A. Griezmann': 194765,
 'P. Lahm': 121939,
 'F. Wirtz': 256630,
 'G. Kobel': 235073,
 'V. Osimhen': 232293,
 'G. Donnarumma': 230621,
 'F. de Jong': 228702,
 'M. Ødegaard': 222665,
 'M. Maignan': 215698,
 'J. Kimmich': 212622,
 'Bruno Fernandes': 212198,
 'Marquinhos': 207865,
 'Casemiro': 200145,
 'H. Son': 200104,
 'De Gea': 193080,
 'L. Modrić': 177003,
 'M. Neuer': 167495,
 'S. Agüero': 153079,
 'S. Bacha': 265674,
 'J. Musiala': 256790,
 'R. Araujo': 253163,
 'Pedri

In [42]:
student_grader_with_feedback.check("q12", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q12...
Great job! You passed all test cases for this question.


#### Project Question 13

Create a dictionary mapping each player's `Name` to their `Nationality`.

Your output **must** be a **dict** where each key is a `Name` and the value is the `Nationality`. Key-value pairs in this dictionary should look like this:

```python
{
    'E. Haaland': 'Norway',
    'K. Mbappé': 'France',
    'K. De Bruyne': 'Belgium',
    'H. Kane': 'England',
    'T. Courtois': 'Belgium',
    ...
}
```

Points possible: 5.0

In [43]:
nationality_by_name = {}

for player_id in player_dict_by_id:
    player_name = player_dict_by_id[player_id]['Name']  # Extract the player's Name
    player_nationality = player_dict_by_id[player_id]['Nationality']  # Extract the player's Nationality
    nationality_by_name[player_name] = player_nationality  # Map Name to Nationality

nationality_by_name


{'E. Haaland': 'Norway',
 'K. Mbappé': 'France',
 'K. De Bruyne': 'Belgium',
 'H. Kane': 'England',
 'T. Courtois': 'Belgium',
 'R. Lewandowski': 'Poland',
 'L. Messi': 'Argentina',
 'S. Smith': 'United States',
 'Rúben Dias': 'Portugal',
 'Vini Jr.': 'Brazil',
 'Rodri': 'Spain',
 'Alisson': 'Brazil',
 'M. Salah': 'Egypt',
 'V. van Dijk': 'Netherlands',
 'M. ter Stegen': 'Germany',
 'Neymar Jr': 'Brazil',
 'K. Benzema': 'France',
 'J. Bellingham': 'England',
 'F. Valverde': 'Uruguay',
 'L. Martínez': 'Argentina',
 'Bernardo Silva': 'Portugal',
 'Ederson': 'Brazil',
 'J. Oblak': 'Slovenia',
 'A. Griezmann': 'France',
 'P. Lahm': 'Germany',
 'F. Wirtz': 'Germany',
 'G. Kobel': 'Switzerland',
 'V. Osimhen': 'Nigeria',
 'G. Donnarumma': 'Italy',
 'F. de Jong': 'Netherlands',
 'M. Ødegaard': 'Norway',
 'M. Maignan': 'France',
 'J. Kimmich': 'Germany',
 'Bruno Fernandes': 'Portugal',
 'Marquinhos': 'Brazil',
 'Casemiro': 'Brazil',
 'H. Son': 'Korea Republic',
 'De Gea': 'Spain',
 'L. Modrić'

In [44]:
student_grader_with_feedback.check("q13", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q13...
Great job! You passed all test cases for this question.


#### Project Question 14

Create a dictionary mapping each `Preferred foot` (i.e., `'Right'` or `'Left'`) to the **number** of players with that `Preferred foot`.

This **dict** should be as follows:

```python
{'Left': 10, 'Right': 40}
```

If we want to add to the value in a dictionary, we have to first make sure that the key/value pair exists in the dictionary. The `if preferred_foot not in count_by_foot:` line in the code below helps us accomplish this. If we have not seen the key before, we initialize its count to 0.
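The sketch below shows, on a hypothetical three-item list, what goes wrong without that check and how the check fixes it. The names `feet` and `count_by_foot_demo` are made up for illustration so they do not clash with your own variables.

```python
feet = ["Left", "Right", "Right"]  # hypothetical sample, not the real dataset

count_by_foot_demo = {}

# Without initializing the key first, += fails because there is
# no existing value to add to
try:
    count_by_foot_demo["Left"] += 1
except KeyError:
    print("KeyError: 'Left' is not a key yet")

# With the membership check, every key starts at 0 before it is incremented
for foot in feet:
    if foot not in count_by_foot_demo:
        count_by_foot_demo[foot] = 0
    count_by_foot_demo[foot] += 1

print(count_by_foot_demo)  # {'Left': 1, 'Right': 2}
```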

Points possible: 5.0

In [45]:
count_by_foot = {}

for player_id in player_dict_by_id:
    preferred_foot = player_dict_by_id[player_id]['Preferred foot']  # Extract the player's Preferred foot
    if preferred_foot not in count_by_foot:
        count_by_foot[preferred_foot] = 0  # Initialize count if the foot hasn't been seen
    count_by_foot[preferred_foot] += 1  # Increment the count

count_by_foot


{'Left': 10, 'Right': 40}

In [46]:
student_grader_with_feedback.check("q14", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q14...
Great job! You passed all test cases for this question.


#### Project Question 15

Create a dictionary mapping each `Nationality` to the **number** of players with that `Nationality`.

This **dict** should look like this:

```python
{
    'Norway': 2,
    'France': 5,
    'Belgium': 2,
    'England': 3,
    'Poland': 1,
    ...
}
```

Points possible: 5.0

In [47]:
count_by_nationality = {}

for player_id in player_dict_by_id:
    nationality = player_dict_by_id[player_id]['Nationality']  # Extract the player's Nationality
    if nationality not in count_by_nationality:
        count_by_nationality[nationality] = 0  # Initialize count if the nationality hasn't been seen
    count_by_nationality[nationality] += 1  # Increment the count

count_by_nationality


{'Norway': 2,
 'France': 5,
 'Belgium': 2,
 'England': 3,
 'Poland': 1,
 'Argentina': 3,
 'United States': 1,
 'Portugal': 4,
 'Brazil': 7,
 'Spain': 3,
 'Egypt': 1,
 'Netherlands': 2,
 'Germany': 6,
 'Uruguay': 2,
 'Slovenia': 1,
 'Switzerland': 1,
 'Nigeria': 1,
 'Italy': 2,
 'Korea Republic': 1,
 'Croatia': 1,
 'Georgia': 1}

In [48]:
student_grader_with_feedback.check("q15", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q15...
Great job! You passed all test cases for this question.


#### Project Question 16

Create a dictionary mapping each `Nationality` to the **sum** of the `Attacking` stat of all the players with that `Nationality`.

The entries in this **dict** should look like this:

```python
{
    'Norway': 156.8,
    'France': 352.4,
    'Belgium': 99.6,
    'England': 239.6,
    'Poland': 86.6,
    ...
}
```

Points possible: 5.0

In [49]:
attacking_total_by_nationality = {}

for player_id in player_dict_by_id:
    nationality = player_dict_by_id[player_id]['Nationality']  # Extract the player's Nationality
    attacking = player_dict_by_id[player_id]['Attacking']  # Extract the player's Attacking stat
    
    if nationality not in attacking_total_by_nationality:
        attacking_total_by_nationality[nationality] = 0  # Initialize the sum if the nationality hasn't been seen
    attacking_total_by_nationality[nationality] += attacking  # Add the player's Attacking stat to the total

attacking_total_by_nationality


{'Norway': 156.8,
 'France': 352.4,
 'Belgium': 99.60000000000001,
 'England': 239.6,
 'Poland': 86.6,
 'Argentina': 243.59999999999997,
 'United States': 82.6,
 'Portugal': 290.2,
 'Brazil': 412.2,
 'Spain': 161.39999999999998,
 'Egypt': 79.8,
 'Netherlands': 139.6,
 'Germany': 337.79999999999995,
 'Uruguay': 137.0,
 'Slovenia': 19.0,
 'Switzerland': 17.0,
 'Nigeria': 78.2,
 'Italy': 90.0,
 'Korea Republic': 81.2,
 'Croatia': 76.0,
 'Georgia': 73.6}

In [50]:
student_grader_with_feedback.check("q16", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q16...
Great job! You passed all test cases for this question.
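
Some totals in the output above, such as `99.60000000000001`, differ slightly from the expected `99.6`. This comes from ordinary binary floating-point rounding, not from a mistake in the loop. The sketch below uses a well-known made-up example rather than the assignment data to show how such values can be compared with `math.isclose` or tidied for display with `round`.

```python
import math

total = 0.1 + 0.2                 # classic example of float rounding
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False: exact comparison fails
print(math.isclose(total, 0.3))   # True: comparison within a small tolerance
print(round(total, 1))            # 0.3
```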


#### Project Question 17

Use the data in the **dictionaries** `player_id_by_name` and `player_movement_by_id` to create a **new dictionary** that maps each player's `Name` to their `Movement` stat.

Key-value pairs within this **dict** should look like this:

```python
{
    'E. Haaland': 83.6,
    'K. Mbappé': 92.4,
    'K. De Bruyne': 78.8,
    'H. Kane': 74.0,
    'T. Courtois': 58.0,
    ...
}
```

Managing multiple dictionaries can get confusing because you have to think about multiple types of key/value pairs at once. Remember that you can always see the values of all your global variables in the debugging window. You can also add print statements or use the debugger to step through your code.
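
As a rough illustration of the two-step lookup described above, here is the same pattern on a tiny made-up dataset, with a print statement showing each intermediate value. The `toy_` dictionaries and their contents are hypothetical.

```python
# Hypothetical toy data: one dict maps names to IDs, the other maps IDs to a stat
toy_id_by_name = {'Player A': 101, 'Player B': 102}
toy_movement_by_id = {101: 80.0, 102: 65.5}

toy_movement_by_name = {}
for name in toy_id_by_name:
    pid = toy_id_by_name[name]            # first lookup: name -> ID
    movement = toy_movement_by_id[pid]    # second lookup: ID -> stat
    print(name, '->', pid, '->', movement)  # trace each step while debugging
    toy_movement_by_name[name] = movement
```

Printing the intermediate `pid` makes it easy to spot when a key from one dictionary is being used to index the wrong dictionary.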

Points possible: 5.0

In [51]:
movement_stat_by_name = {}

for player_name in player_id_by_name:
    player_id = player_id_by_name[player_name]  # Get the player's ID from player_id_by_name
    player_movement = player_movement_by_id[player_id]  # Get the player's Movement stat from player_movement_by_id
    movement_stat_by_name[player_name] = player_movement  # Map the player's name to their Movement stat

movement_stat_by_name


{'E. Haaland': 83.6,
 'K. Mbappé': 92.4,
 'K. De Bruyne': 78.8,
 'H. Kane': 74.0,
 'T. Courtois': 58.0,
 'R. Lewandowski': 80.8,
 'L. Messi': 87.0,
 'S. Smith': 88.8,
 'Rúben Dias': 65.6,
 'Vini Jr.': 90.8,
 'Rodri': 68.4,
 'Alisson': 54.6,
 'M. Salah': 90.2,
 'V. van Dijk': 70.0,
 'M. ter Stegen': 52.4,
 'Neymar Jr': 87.4,
 'K. Benzema': 78.4,
 'J. Bellingham': 82.2,
 'F. Valverde': 81.8,
 'L. Martínez': 85.4,
 'Bernardo Silva': 84.4,
 'Ederson': 64.6,
 'J. Oblak': 58.4,
 'A. Griezmann': 84.8,
 'P. Lahm': 80.2,
 'F. Wirtz': 84.0,
 'G. Kobel': 48.2,
 'V. Osimhen': 82.8,
 'G. Donnarumma': 57.6,
 'F. de Jong': 83.4,
 'M. Ødegaard': 79.2,
 'M. Maignan': 57.0,
 'J. Kimmich': 79.4,
 'Bruno Fernandes': 76.8,
 'Marquinhos': 75.8,
 'Casemiro': 64.8,
 'H. Son': 83.6,
 'De Gea': 56.4,
 'L. Modrić': 82.8,
 'M. Neuer': 53.6,
 'S. Agüero': 80.8,
 'S. Bacha': 86.6,
 'J. Musiala': 88.0,
 'R. Araujo': 71.4,
 'Pedri': 84.4,
 'K. Kvaratskhelia': 83.4,
 'B. Saka': 85.4,
 'Rafael Leão': 87.4,
 'S. Tonali'

In [52]:
student_grader_with_feedback.check("q17", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q17...
Great job! You passed all test cases for this question.


#### Project Question 18

Use the data in the **dictionaries** `player_id_by_name` and `attacking_stat_by_id` to create a **new dictionary** that maps each player's `Name` to their `Attacking` stat.

Key-value pairs within this **dict** should look like this:

```python
{
    'E. Haaland': 78.6,
    'K. Mbappé': 83.0,
    'K. De Bruyne': 82.4,
    'H. Kane': 88.0,
    'T. Courtois': 17.2,
    ...
}
```

Points possible: 5.0

In [53]:
attacking_stat_by_name = {}

for player_name in player_id_by_name:
    player_id = player_id_by_name[player_name]  # Get the player's ID from player_id_by_name
    player_attacking = player_dict_by_id[player_id]['Attacking']  # Get the player's Attacking stat from player_dict_by_id
    attacking_stat_by_name[player_name] = player_attacking  # Map the player's name to their Attacking stat

attacking_stat_by_name


{'E. Haaland': 78.6,
 'K. Mbappé': 83.0,
 'K. De Bruyne': 82.4,
 'H. Kane': 88.0,
 'T. Courtois': 17.2,
 'R. Lewandowski': 86.6,
 'L. Messi': 81.8,
 'S. Smith': 82.6,
 'Rúben Dias': 57.0,
 'Vini Jr.': 73.8,
 'Rodri': 74.0,
 'Alisson': 27.8,
 'M. Salah': 79.8,
 'V. van Dijk': 63.0,
 'M. ter Stegen': 23.6,
 'Neymar Jr': 80.0,
 'K. Benzema': 86.6,
 'J. Bellingham': 77.0,
 'F. Valverde': 74.4,
 'L. Martínez': 80.6,
 'Bernardo Silva': 77.0,
 'Ederson': 26.2,
 'J. Oblak': 19.0,
 'A. Griezmann': 86.4,
 'P. Lahm': 69.2,
 'F. Wirtz': 73.2,
 'G. Kobel': 17.0,
 'V. Osimhen': 78.2,
 'G. Donnarumma': 16.0,
 'F. de Jong': 76.6,
 'M. Ødegaard': 78.2,
 'M. Maignan': 26.2,
 'J. Kimmich': 77.6,
 'Bruno Fernandes': 80.4,
 'Marquinhos': 66.8,
 'Casemiro': 74.4,
 'H. Son': 81.2,
 'De Gea': 20.6,
 'L. Modrić': 76.0,
 'M. Neuer': 24.8,
 'S. Agüero': 81.2,
 'S. Bacha': 70.2,
 'J. Musiala': 69.4,
 'R. Araujo': 62.6,
 'Pedri': 66.8,
 'K. Kvaratskhelia': 73.6,
 'B. Saka': 74.6,
 'Rafael Leão': 75.8,
 'S. Tonali'

In [54]:
student_grader_with_feedback.check("q18", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q18...
Great job! You passed all test cases for this question.


#### Project Question 19

Use the data in the **dictionaries** `movement_stat_by_name` and `attacking_stat_by_name` to create a **dictionary** that maps each player's `Name` to the **sum** of their `Movement` and `Attacking` stats.

Key-value pairs within this **dict** should look like these:

```python
{
    'E. Haaland': 162.2,
    'K. Mbappé': 175.4,
    'K. De Bruyne': 161.2,
    'H. Kane': 162.0,
    'T. Courtois': 75.2,
    ...
}
```
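
Combining two dictionaries this way only works when both contain the same keys. A minimal sketch on made-up data (all `toy_` names are hypothetical) that checks this before summing:

```python
# Hypothetical toy data with matching keys
toy_movement = {'Player A': 80.0, 'Player B': 65.5}
toy_attacking = {'Player A': 70.0, 'Player B': 20.5}

# Dictionary key views compare like sets, so this checks both dicts cover the same names
same_keys = toy_movement.keys() == toy_attacking.keys()

toy_sum = {}
for name in toy_movement:
    toy_sum[name] = toy_movement[name] + toy_attacking[name]

print(same_keys, toy_sum)  # True {'Player A': 150.0, 'Player B': 86.0}
```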

Points possible: 5.0

In [55]:
sum_stats_by_name = {}

for player_name in movement_stat_by_name:
    movement_stat = movement_stat_by_name[player_name]  # Get the Movement stat
    attacking_stat = attacking_stat_by_name[player_name]  # Get the Attacking stat
    sum_stats_by_name[player_name] = movement_stat + attacking_stat  # Sum of Movement and Attacking stats

sum_stats_by_name


{'E. Haaland': 162.2,
 'K. Mbappé': 175.4,
 'K. De Bruyne': 161.2,
 'H. Kane': 162.0,
 'T. Courtois': 75.2,
 'R. Lewandowski': 167.39999999999998,
 'L. Messi': 168.8,
 'S. Smith': 171.39999999999998,
 'Rúben Dias': 122.6,
 'Vini Jr.': 164.6,
 'Rodri': 142.4,
 'Alisson': 82.4,
 'M. Salah': 170.0,
 'V. van Dijk': 133.0,
 'M. ter Stegen': 76.0,
 'Neymar Jr': 167.4,
 'K. Benzema': 165.0,
 'J. Bellingham': 159.2,
 'F. Valverde': 156.2,
 'L. Martínez': 166.0,
 'Bernardo Silva': 161.4,
 'Ederson': 90.8,
 'J. Oblak': 77.4,
 'A. Griezmann': 171.2,
 'P. Lahm': 149.4,
 'F. Wirtz': 157.2,
 'G. Kobel': 65.2,
 'V. Osimhen': 161.0,
 'G. Donnarumma': 73.6,
 'F. de Jong': 160.0,
 'M. Ødegaard': 157.4,
 'M. Maignan': 83.2,
 'J. Kimmich': 157.0,
 'Bruno Fernandes': 157.2,
 'Marquinhos': 142.6,
 'Casemiro': 139.2,
 'H. Son': 164.8,
 'De Gea': 77.0,
 'L. Modrić': 158.8,
 'M. Neuer': 78.4,
 'S. Agüero': 162.0,
 'S. Bacha': 156.8,
 'J. Musiala': 157.4,
 'R. Araujo': 134.0,
 'Pedri': 151.2,
 'K. Kvaratskhelia

In [56]:
student_grader_with_feedback.check("q19", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q19...
Great job! You passed all test cases for this question.


#### Project Question 20

Use the data in the **dictionaries** `count_by_nationality` and `attacking_total_by_nationality` to create a **dictionary** that maps each `Nationality` in the dataset to the **average** `Attacking` stat for players of that `Nationality`.

Key-value pairs within this **dict** should look like these:

```python
{
    'Norway': 78.4,
    'France': 70.48,
    'Belgium': 49.8,
    'England': 79.86666666666666,
    'Poland': 86.6,
    ...
}
```
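
The average for each key is that key's total divided by its count. A small sketch on made-up numbers (the `toy_` dictionaries are hypothetical) shows the pattern; note that `/` always produces a float, even when the count is an integer.

```python
# Hypothetical toy totals and counts keyed by the same nations
toy_total = {'X': 150.0, 'Y': 90.0}
toy_count = {'X': 2, 'Y': 3}

toy_avg = {}
for nation in toy_total:
    toy_avg[nation] = toy_total[nation] / toy_count[nation]  # total / count = average

print(toy_avg)  # {'X': 75.0, 'Y': 30.0}
```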

Points possible: 5.0

In [57]:
avg_attacking_by_nationality = {}

for nationality in attacking_total_by_nationality:
    total_attacking = attacking_total_by_nationality[nationality]  # Total Attacking stat for this nationality
    player_count = count_by_nationality[nationality]  # Number of players from this nationality
    avg_attacking_by_nationality[nationality] = total_attacking / player_count  # Calculate the average

avg_attacking_by_nationality


{'Norway': 78.4,
 'France': 70.47999999999999,
 'Belgium': 49.800000000000004,
 'England': 79.86666666666666,
 'Poland': 86.6,
 'Argentina': 81.19999999999999,
 'United States': 82.6,
 'Portugal': 72.55,
 'Brazil': 58.885714285714286,
 'Spain': 53.79999999999999,
 'Egypt': 79.8,
 'Netherlands': 69.8,
 'Germany': 56.29999999999999,
 'Uruguay': 68.5,
 'Slovenia': 19.0,
 'Switzerland': 17.0,
 'Nigeria': 78.2,
 'Italy': 45.0,
 'Korea Republic': 81.2,
 'Croatia': 76.0,
 'Georgia': 73.6}

In [58]:
student_grader_with_feedback.check("q20", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q20...
Great job! You passed all test cases for this question.


#### Project Question 21

Which `Nationality` has the players with the **highest** **average** `Attacking` stat?

To answer this question, loop over the nations in `avg_attacking_by_nationality`. If the value of `highest_avg_nation` is `None`, or if the current nation has a higher average `Attacking` stat than the highest one seen so far, update the `highest_avg_nation` variable.
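
One possible shape for this loop, sketched on made-up data (`toy_avg_by_nation` and `best_nation` are hypothetical names), tracks only the best key and looks up its value when comparing:

```python
# Hypothetical toy averages
toy_avg_by_nation = {'X': 75.0, 'Y': 30.0, 'Z': 81.5}

best_nation = None
for nation in toy_avg_by_nation:
    # The None check handles the first iteration, when there is nothing to compare against
    if best_nation is None or toy_avg_by_nation[nation] > toy_avg_by_nation[best_nation]:
        best_nation = nation

print(best_nation)  # Z
```

Because `or` short-circuits, the lookup `toy_avg_by_nation[best_nation]` is never attempted while `best_nation` is still `None`.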

Points possible: 5.0

In [59]:
highest_avg_nation = None
highest_avg_stat = -float('inf')  # Initialize to negative infinity to ensure any stat will be higher

for nationality in avg_attacking_by_nationality:
    avg_attacking_stat = avg_attacking_by_nationality[nationality]  # Get the average attacking stat for the nationality
    if highest_avg_nation is None or avg_attacking_stat > highest_avg_stat:
        highest_avg_nation = nationality  # Update to the nationality with the highest average
        highest_avg_stat = avg_attacking_stat  # Update the highest attacking stat

highest_avg_nation


'Poland'

In [60]:
student_grader_with_feedback.check("q21", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q21...
Great job! You passed all test cases for this question.


#### Project Question 22

What is the `Name` of the player with the **highest** `Attacking` stat?

You **must** find the **key** (i.e., `Name`) with the **highest** **value** (i.e., `Attacking` stat) in the **dict** `attacking_stat_by_name`.

Points possible: 5.0

In [61]:
best_attacker = None
highest_attacking_stat = -float('inf')  # Initialize to negative infinity

for player_name in attacking_stat_by_name:
    attacking_stat = attacking_stat_by_name[player_name]  # Get the Attacking stat for the player
    if best_attacker is None or attacking_stat > highest_attacking_stat:
        best_attacker = player_name  # Update to the player with the highest attacking stat
        highest_attacking_stat = attacking_stat  # Update the highest attacking stat

best_attacker


'H. Kane'

In [62]:
student_grader_with_feedback.check("q22", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q22...
Great job! You passed all test cases for this question.


#### Project Question 23

What is the `Name` of the player with the **lowest** sum of `Movement` and `Attacking` stats?

**Hint**: Use the `sum_stats_by_name` dict.
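
As a way to cross-check a hand-written loop, Python's built-in `min` and `max` accept a `key` function, and passing a dictionary's `get` method makes them compare by value while returning the key. This sketch uses made-up data (`toy_sum_by_name` is hypothetical) and is an alternative technique, not the loop the assignment walks through.

```python
# Hypothetical toy sums keyed by player name
toy_sum_by_name = {'Player A': 150.0, 'Player B': 86.0, 'Player C': 120.0}

# Iterating a dict yields its keys; key=... tells min/max to rank them by value
lowest_name = min(toy_sum_by_name, key=toy_sum_by_name.get)
highest_name = max(toy_sum_by_name, key=toy_sum_by_name.get)

print(lowest_name, highest_name)  # Player B Player A
```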

Points possible: 5.0

In [63]:
worst_attacker = None
lowest_sum_stat = float('inf')  # Initialize to positive infinity

for player_name in sum_stats_by_name:
    sum_stat = sum_stats_by_name[player_name]  # Get the sum of Movement and Attacking stats for the player
    if worst_attacker is None or sum_stat < lowest_sum_stat:
        worst_attacker = player_name  # Update to the player with the lowest sum
        lowest_sum_stat = sum_stat  # Update the lowest sum stat

worst_attacker


'G. Kobel'

In [121]:
student_grader_with_feedback.check("q23", should_get_llm_feedback=False)

## Submission

Make sure you have run all cells in your notebook in order before submitting on Gradescope. Your notebook should not contain any uncaught exceptions; otherwise, the Gradescope autograder will not give you any points.