# Assignment 5: Investigating Steam Game Data

## Your Information

At the start of each assignment, you will need to provide us your name and the name of the partner you worked with for this assignment (if you had one). In the cell below, replace "First Last" with your first name and last name. Replace "None" with the first and last name of your partner if you had one for this assignment. We ask for this information so we don't accuse you of cheating when your code looks like your partner's.

Please keep these lines commented so they don't cause an error.

In [549]:
# MY NAME: Angel Huang
# PARTNER: None

## Imports

Every project will begin with some import statements. It's crucial that you run the cell below, otherwise we will not be able to grade your code and provide feedback to you.

In [550]:
# it is considered a good coding practice to place all import statements at the top of the notebook

import os
import project
import student_grader
student_grader.initialize(os.getcwd(), "p5")

## Learning Objectives:

In this assignment you will:
- learn the project API
- write loops
- perform string manipulations

# Lab portion

## Segment 1: Learning the API


### Task 2.1: Examine the `steam.csv` file

The `project.py` file will allow you to access the dataset you'll use this week, `steam.csv`. We generated this data file by writing a Python program to extract data for the **most popular** games (i.e., games that have sold at least half a million copies) on [Steam](https://store.steampowered.com/), a game distribution platform.

Open `steam.csv` by double-clicking the file in the left-hand file browser in JupyterLab. You may also use Microsoft Excel or another spreadsheet viewer. The data shows:

* `name` (the **name** of the game),
* `publisher` (the **name** of the **publisher** of this game),
* `release_date` (the **date** this game was **released**),
* `avg_playtime` (the **average** amount of **time**, in minutes, that each purchaser **played** this game),
* `price` (the **price** in `$` of this game),
* `positive_reviews` (the **number** of **positive reviews** of this game),
* `negative_reviews` (the **number** of **negative reviews** of this game).

Often, we'll organize data by assigning numbers (called **indexes**) to different parts of the data (e.g., rows or columns in a table). In Computer Science, indexing typically starts at `0`; i.e., when you have a sequence of things, you start counting them from `0` instead of `1`. Thus, you should **ignore the numbers shown by your spreadsheet viewer to the left of the rows**. From the perspective of `project.py`, the indexes of `POSTAL`, `Half-Life`, and `Team Fortress Classic` are `0`, `1`, and `2`, respectively (and so on).

For example, consider this excerpt of `steam.csv` as viewed in Microsoft Excel:

![table.PNG](images/table.PNG)

The **index** of `Team Fortress Classic` is `2`, even though it is the third entry in the dataset and appears on **row** `4` of the spreadsheet (row `1` holds the column headers).
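The relationship between a game's index and its spreadsheet row can be written as a tiny helper. `spreadsheet_row` is a hypothetical name used only for illustration (it is not part of `project.py`), and it assumes exactly one header row and a viewer that numbers rows starting from 1.

```python
def spreadsheet_row(idx):
    """Hypothetical helper: the spreadsheet row that holds the game at index idx.

    Assumes one header row and 1-based row numbers in the spreadsheet viewer.
    """
    header_rows = 1
    # +1 converts a 0-based index to a 1-based row, then skip past the header
    return idx + 1 + header_rows

print(spreadsheet_row(0))  # 2 (POSTAL)
print(spreadsheet_row(2))  # 4 (Team Fortress Classic)
```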

### Task 2.2: Explore the API

Use the inspection process from Lab-P3 and Lab-P4 to learn more about the `project` API. In Lab-P3, we saw how to use `dir` and `help` to learn an API. Run the following cells to explore the API:

In [551]:
# use the 'dir' function to learn more about the project API
dir(project)

['Dict',
 'List',
 '__annotations__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_games',
 'count',
 'csv',
 'csv_file',
 'csv_reader',
 'get_avg_playtime',
 'get_name',
 'get_negative_reviews',
 'get_positive_reviews',
 'get_price',
 'get_publisher',
 'get_release_date',
 'row']

Spend some time reading about each of the eight functions whose names don't begin with an underscore (`count` and the seven `get_...` functions). For example, run this to learn about `count`:

In [552]:
help(project.count) 

Help on function count in module project:

count() -> int
    The number of games in the dataset



Alternatively, you could run the following to just see the function's documentation:

In [553]:
print(project.count.__doc__)

The number of games in the dataset


You may also open up the `project.py` file directly to learn about the functions provided. E.g., you might see this:

```python
def count() -> int:
    """The number of games in the dataset"""

    return len(_games)
```

You don't need to understand the code in the functions, but the strings in triple quotes (called *docstrings*) explain what each function does. `project.count.__doc__` simply gives you the docstring of the `count` function. Most docstrings explain what the function returns when called; here, it is the number of games. The `-> int` is a type hint meaning that the function is expected to return an integer.
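To see a docstring and type hints side by side, here is a small hypothetical function (`discounted_price` is not part of `project.py`). It also illustrates that Python does not enforce type hints at runtime: passing `int` values where `float` is annotated still works.

```python
def discounted_price(price: float, percent_off: float) -> float:
    """The price after taking percent_off percent off"""
    return price * (1 - percent_off / 100)

# __doc__ holds the docstring, just like project.count.__doc__
print(discounted_price.__doc__)   # The price after taking percent_off percent off

# Type hints are documentation only; ints are accepted without complaint
print(discounted_price(20, 25))   # 15.0
```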

Try learning about the other functions in `project.py` by using the `help` function. For example, you may try:

In [554]:
help(project.get_name)

Help on function get_name in module project:

get_name(idx: int) -> str
    The name of the game at row idx



In [555]:
# now try getting help for the other functions in the `project` module
help(project.get_avg_playtime)

Help on function get_avg_playtime in module project:

get_avg_playtime(idx: int) -> int
    The average playtime (in hours) of the game at row idx



### Task 2.2.1: Getting familiar with `project.py`

You will now demonstrate your familiarity with the functions inside the `project` module by answering a few simple questions. You should have already imported the `project` module into this notebook. Make sure you placed the `import` statement at the **top** of the notebook in the designated cell.

**Remember:** In Computer Science, we start indexing at `0`.

#### Lab question 1

What is the `name` of the game at **index** `0`?

Points possible: 4.0

In [556]:
# we have done this for you!
name_idx0 = project.get_name(0)

name_idx0

'POSTAL'

In [557]:
student_grader.check("lab-q1", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q1...
Great job! You passed all test cases for this question.


True

#### Lab question 2


What is the `name` of the game at **index** `3`?

Points possible: 4.0

In [558]:
# replace the ... below with your code
name_idx3 = project.get_name(3)
name_idx3

'Legacy of Kain: Soul Reaver'

In [559]:
student_grader.check("lab-q2", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q2...
Great job! You passed all test cases for this question.


True

#### Lab question 3


What is the `publisher` of the game at **index** `9`? 

Points possible: 4.0

In [560]:
# replace the ... below with your code
publisher_idx9 = project.get_publisher(9)
publisher_idx9

'Valve'

In [561]:
student_grader.check("lab-q3", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q3...
Great job! You passed all test cases for this question.


True

#### Lab question 4

What is the `name` of the **last** game in the dataset? We have done this question for you, please don't edit it.

Remember that indices in Python start at 0. For example, if you have a list containing 10 items, the first item is at index 0 and the last item is at index 9. That's why we subtract 1 from the total number of games in the dataset.

Points possible: 4.0

In [562]:
# we have done this for you!
name_idx_last = project.get_name(project.count() - 1)
name_idx_last

'Against the Storm'

In [563]:
student_grader.check("lab-q4", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q4...
Great job! You passed all test cases for this question.


True

#### Lab question 5

How many `positive_reviews` does the game at **index** `7` have? 

The value you get back is a string that ends with `"K"`. In this dataset, `"K"` represents one thousand and `"M"` represents one million. This is fine for now, but in the project portion you'll need to convert these **strings** to **integers** (e.g., `"11.77K"` should become `11770` and `"2.55M"` should become `2550000`). After this question, we will review some string operations that build up to this conversion.

Points possible: 4.0

In [564]:
# replace the ... below with your code
positive_reviews_idx7 =  project.get_positive_reviews(7)
positive_reviews_idx7

'11.77K'

In [565]:
student_grader.check("lab-q5", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q5...
Great job! You passed all test cases for this question.


True

## Segment 2: Working with strings

### Task 3.1: Indexing / slicing Strings

Stepping back from the Steam data, Tasks 3.1 and 3.2 introduce some basic string operations. While strings will be covered in more detail during Friday's lecture, we will cover the essentials now.

We can think of a string as a sequence of characters. For example, the string `my_str = 'hello_world!'` can be written as...

| index  | 0    | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   |
| ------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| string | h    | e    | l    | l    | o    | _    | w    | o    | r    | l    | d    | !    |

... where we can then access specific characters of the string by an **index**, e.g. `my_str[0]` returns `'h'`, `my_str[1]` returns `'e'`, and `my_str[8]` returns `'r'`.

Furthermore, we can **slice** strings -- that is, get a particular section of characters. For example,

- `my_str[1:5]` returns `'ello'`
- `my_str[:8]` returns `'hello_wo'`
- `my_str[5:]` returns `'_world!'`
- `my_str[:]` returns `'hello_world!'`

Try running this in the cell below.

In [566]:
my_str = 'hello_world!'
print("my_str[0] returns", my_str[0])
print("my_str[8] returns", my_str[8])
print("my_str[1:5] returns", my_str[1:5])
print("my_str[:8] returns", my_str[:8])
print("my_str[5:] returns", my_str[5:])
print("my_str[:] returns", my_str[:])

my_str[0] returns h
my_str[8] returns r
my_str[1:5] returns ello
my_str[:8] returns hello_wo
my_str[5:] returns _world!
my_str[:] returns hello_world!


Notice that slicing is **inclusive** on the lower bound and **exclusive** on the upper bound. We can also leave out a bound to start from the beginning (e.g. `my_str[:6]`) or to keep going until the end (e.g. `my_str[8:]`). Lastly, a negative index will count **backwards** from the **end** of the string.

Try running the cell below.

In [567]:
print("my_str[-1] returns", my_str[-1])
print("my_str[-4:-1] returns", my_str[-4:-1])

my_str[-1] returns !
my_str[-4:-1] returns rld

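A few more details about negative indexes and out-of-range positions may help before the exercises. The sketch below reuses `my_str` and shows that a negative index `-k` is the same as `len(my_str) - k`, that slice bounds past the end are clamped, and that a single index past the end raises an `IndexError`.

```python
my_str = 'hello_world!'
n = len(my_str)  # 12 characters: valid indexes are 0..11, or -12..-1

# A negative index -k refers to the same character as index n - k
print(my_str[-1] == my_str[n - 1])    # True
print(my_str[-4:-1] == my_str[8:11])  # True

# Slice bounds past the end are clamped, so this does not raise an error...
print(my_str[5:100])                  # _world!

# ...but a single out-of-range index does raise an IndexError
try:
    my_str[n]
except IndexError as err:
    print("IndexError:", err)         # IndexError: string index out of range
```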

**Your Turn!** Try slicing the below phone number! Can you extract the area code (first 3 digits), exchange code (middle 3 digits), and line number (last 4 digits) of the given phone number?

#### Lab question 6

What is the **last digit** of the phone number: `608-867-5309`?

Points possible: 4.0

In [568]:
# replace the ... with your code
phone_number = "608-867-5309"
last_digit = phone_number[-1]
last_digit

'9'

In [569]:
student_grader.check("lab-q6", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q6...
Great job! You passed all test cases for this question.


True

#### Lab question 7

What is the **area code** (i.e., the first three characters) of the phone number: `608-867-5309`?

Points possible: 4.0

In [570]:
# replace the ... with your code
phone_number = "608-867-5309"
area_code = phone_number[:3]

area_code

'608'

In [571]:
student_grader.check("lab-q7", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q7...
Great job! You passed all test cases for this question.


True

#### Lab question 8

What is the **line number** (i.e., the last four characters) of the phone number: `608-867-5309`?

Points possible: 4.0

In [572]:
# replace the ... with your code
phone_number = "608-867-5309"
line_number = phone_number[-4:]

line_number

'5309'

In [573]:
student_grader.check("lab-q8", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q8...
Great job! You passed all test cases for this question.


True

#### Lab question 9

What is the **exchange code** (i.e., middle three characters) of the phone number: `608-867-5309`?

Points possible: 4.0

In [574]:
# replace the ... with your code
phone_number = "608-867-5309"
exchange_code = phone_number[4:7]

exchange_code

'867'

In [575]:
student_grader.check("lab-q9", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q9...
Great job! You passed all test cases for this question.


True

#### Lab question 10

What is the **department code** (i.e., the letters at the start) of the course: `CS220`?

Points possible: 4.0

In [576]:
course = 'CS220'
dept_code = course[0:2]

dept_code

'CS'

In [577]:
student_grader.check("lab-q10", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q10...
Great job! You passed all test cases for this question.


True

#### Lab question 11

What is the **course code** (i.e., the numbers at the end) of the course: `CS220`?

Points possible: 4.0

In [578]:
course = 'CS220'
course_code = course[2:]

course_code

'220'

In [579]:
student_grader.check("lab-q11", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q11...
Great job! You passed all test cases for this question.


True

After that short detour, we will now go back to working on the Steam game dataset.

### Task 3.2: Calculating Number of Reviews

Question 5 showed us that the number of positive (or negative) reviews is represented as a string with the suffix `'K'` for thousands or `'M'` for millions.

We can **index** the last character of these numbers to find the suffix, then use it to determine whether the suffix represents a thousand or a million.

#### Lab question 12

What is the **suffix** (i.e., the last character) of the number `"3.19M"`? Try to use negative indexing (indexing using a negative value; e.g., `-1` to represent the last character).

Points possible: 4.0

In [580]:
# replace the ... with your code
cost = "3.19M"
suffix = cost[-1]

suffix

'M'

In [581]:
student_grader.check("lab-q12", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q12...
Great job! You passed all test cases for this question.


True

#### Lab question 13

How many millions are there in the number `"3.19M"`?

Just as we found the suffix by **indexing**, we can also find the number by **slicing**. Answer the question by slicing the string to obtain the number of millions, then typecasting the **string** into a **float**. Try to use negative indexing.

Points possible: 4.0

In [582]:
# replace the ... with your code
cost = "3.19M"
millions = float(cost[:-1])

millions

3.19

In [583]:
student_grader.check("lab-q13", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q13...
Great job! You passed all test cases for this question.


True

#### Rounding `millions`

Putting these together: the suffix `'M'` tells you that the string refers to **millions**, and **slicing** gives you the number of millions. You can then convert the **string** to an **integer** by multiplying that number by one million (`10**6`) and rounding to the nearest `int`.

In [584]:
round(millions * (10**6))

3190000
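Rounding, rather than calling `int()`, matters because multiplying a float by a power of ten is not always exact in binary floating point. The sketch below uses `0.29 * 100` as a well-known example of a product that lands just below a whole number, then applies the same idea to a `'K'` value such as the one from lab question 5.

```python
# Multiplying a float by a power of ten is not always exact in binary floating point
product = 0.29 * 100
print(product)         # 28.999999999999996
print(int(product))    # 28  (int() truncates toward zero)
print(round(product))  # 29  (round() goes to the nearest integer)

# The same idea with a 'K' suffix: slice off the suffix, convert, scale, round
thousands = float("11.77K"[:-1])
print(round(thousands * 10**3))  # 11770
```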

### Task 3.3: Slicing dates

Run the cell below, which prints the release date of the first game in the dataset.

In [585]:
print(project.get_release_date(0))

11/14/1997


Each date is a string in `mm/dd/yyyy` format. The month and day always use **two digits**, even when one digit would suffice; for example, September 4, 1804 is stored as `'09/04/1804'` rather than `'9/4/1804'`.
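Because every field has a fixed width, each part of the date always sits at the same indexes, so plain slicing is enough. The sketch below uses the sample date `'09/04/1804'` and also shows that `int()` accepts a leading zero inside a string, so `'09'` converts cleanly to `9`.

```python
date = '09/04/1804'

# Fixed-width fields: mm -> [0:2], dd -> [3:5], yyyy -> [6:10]
print(date[0:2], date[3:5], date[6:10])  # 09 04 1804

# int() accepts leading zeros in a string, so '09' becomes 9
print(int(date[0:2]))                    # 9
```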

To extract the month, we could run the following code...

In [586]:
project.get_release_date(0)[:2]

'11'

Notice, however, that this is the *string* `'11'`.

In the next question, you'll write the code to get this as the *int* (e.g. `11`).

#### Lab question 14

In which `month` is the game at **index** `0` released?

Your answer **must** be an `int` between `1` and `12`. You **must not** hardcode the answer, but use the appropriate function from the `project` module to find the release date of the game.

**Hint:** Try writing the code for this question step-by-step. For example, you could first access the entire date, check the output, then slice the month, check the output, then convert to an int.

Points possible: 4.0

In [587]:
# replace the ... with your code
month_idx0 = int(project.get_release_date(0)[:2])

month_idx0

11

In [588]:
student_grader.check("lab-q14", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q14...
Great job! You passed all test cases for this question.


True

### Task 3.4: Helper Functions for Month, Day, and Year

The functions below will be useful in P5. Answer the questions below by defining functions that return the day and the year as an `int`. The function that returns the month has already been written for you.

Be aware that for the three functions below, we take a `date` as input (e.g., `"02/21/2024"`), **not** the index of a game.

In [589]:
def get_month(date):
    """The month of a date in the 'mm/dd/yyyy' format"""
    return int(date[:2])

You can confirm that `get_month` works by running the cell below.

In [590]:
month = get_month("02/21/2024")
month

2

#### Lab question 15

You must now define the `get_year` function, which will take in the `date` as a `str` and return the `year` as an `int`.

Points possible: 4.0

In [591]:
# replace the pass with your code

def get_year(date):
    """The year of a date in the 'mm/dd/yyyy' format"""
    
    return int(date[6:])

In [592]:
student_grader.check("lab-q15", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q15...
Great job! You passed all test cases for this question.


True

#### Lab question 16

You must now define the `get_day` function, which will take in the `date` as a `str` and return the `day` as an `int`.

Points possible: 4.0

In [593]:
# replace the pass with your code

def get_day(date):
    """The day of a date in the 'mm/dd/yyyy' format"""
    
    return int(date[3:5])

In [594]:
student_grader.check("lab-q16", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q16...
Great job! You passed all test cases for this question.


True

### Task 3.5: Using Helper Functions

Using the helper functions you made above, complete the following questions.

**Hint:** You'll use these helper functions in combination with functions from the project module.

#### Lab question 17

On what `day` was the game at **index** `200` released?

You **must** answer this question by calling the `get_day` function. Find the release date of the game using the appropriate function in the `project` module.

Points possible: 4.0

In [595]:
# replace the ... with your code
day_released_idx200 = get_day(project.get_release_date(200))

day_released_idx200

25

In [596]:
student_grader.check("lab-q17", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q17...
Great job! You passed all test cases for this question.


True

#### Lab question 18

In which `year` was the game at **index** `300` released?

You **must** answer this question by calling the `get_year` function.

Points possible: 4.0

In [597]:
# replace with your code
year_released_idx300 =  get_year(project.get_release_date(300))

year_released_idx300

2011

In [598]:
student_grader.check("lab-q18", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q18...
Great job! You passed all test cases for this question.


True

#### Lab question 19

In which `month` was the game at **index** `400` released?

You **must** answer this question by calling the `get_month` function.

Points possible: 4.0

In [599]:
# replace the ... with your code
month_released_idx400 =  get_month(project.get_release_date(400))

month_released_idx400

1

In [600]:
student_grader.check("lab-q19", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q19...
Great job! You passed all test cases for this question.


True

## Segment 3: Looping

### Task 4.1: `while` and `for` loops

Run the below code and observe the output.

In [601]:
idx = 0
while idx < 5:
    print(idx)
    idx += 1

0
1
2
3
4


Equivalently, we can use `for` with `range(n)`, which produces the integers from `0` up to, but not including, `n`.

In [602]:
for idx in range(5):
    print(idx)

0
1
2
3
4


Now, we will try to use `while` and `for` loops to answer a few simple questions.

#### Lab question 20

What is the sum of the numbers *0 to 25*, both *inclusive*? 

You **must** answer this with a `while` loop.

Note that you can uncomment/comment multiple lines at a time in JupyterLab by highlighting the lines you wish to uncomment/comment, then holding `Control` and pressing `/` on Windows, or holding `Command` and pressing `/` on Mac.

Points possible: 4.0

In [603]:
# Uncomment and complete the while loop, replace each ... with your code

i = 0
sum_while = 0

while i<26:
    sum_while += i 
    i += 1

sum_while

325

In [604]:
student_grader.check("lab-q20", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q20...
Great job! You passed all test cases for this question.


True

#### Lab question 21

What is the sum of the numbers *0 to 25*, both inclusive?

You **must** answer this with a `for` loop.

Points possible: 4.0

In [605]:
# Uncomment and complete the for loop, replace each ... with your code

sum_for = 0

for i in range(26):
    sum_for += i

sum_for

325

In [606]:
student_grader.check("lab-q21", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q21...
Great job! You passed all test cases for this question.


True

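One way to double-check the answers to questions 20 and 21 is the closed-form formula for the sum 0 + 1 + ... + n, which is n(n + 1)/2. The sketch below compares the formula with a `for` loop; note that `range(n + 1)` is needed so that `n` itself is included.

```python
n = 25

loop_total = 0
for i in range(n + 1):              # range(n + 1) covers 0, 1, ..., n
    loop_total += i

formula_total = n * (n + 1) // 2    # closed-form sum of 0..n
print(loop_total, formula_total)    # 325 325
```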
### Task 4.2: Looping through games


You have had some practice with simple looping structures. You will now loop through the Steam game dataset.

Run the below code and observe the output.

In [607]:
for idx in range(10):
    print(project.get_name(idx))

POSTAL
Half-Life
Team Fortress Classic
Legacy of Kain: Soul Reaver
Half-Life: Opposing Force
Counter-Strike
Ricochet
Gothic 1
Deathmatch Classic
Half-Life: Blue Shift


Can you make the code above display the **year of release** of the first 10 games? How about the **first 15** games? If you run into any issues, feel free to ask your TA/PM for help.

You are now ready to answer some interesting questions with loops.

#### Lab question 22

What is the **total** `avg_playtime` (in minutes) of the **first** `10` games in the dataset?

Points possible: 4.0

In [608]:
# replace each ... with your code
total_avg_playtime_first10 = 0
for idx in range(10):
    total_avg_playtime_first10 += project.get_avg_playtime(idx)

total_avg_playtime_first10

997

In [609]:
student_grader.check("lab-q22", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q22...
Great job! You passed all test cases for this question.


True

### Task 4.3: Filtering


You will now *filter* the data using an `if` condition as you loop through the dataset.
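The filtering pattern (loop over every position, test a condition, and update a counter only when the condition holds) does not depend on the Steam data. Here is a self-contained sketch of the same pattern on the phone number from earlier, counting how many of its characters are dashes:

```python
phone_number = "608-867-5309"

num_dashes = 0
for idx in range(len(phone_number)):
    if phone_number[idx] == '-':  # filter: only count characters that match
        num_dashes += 1

print(num_dashes)  # 2
```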

#### Lab question 23

How many games in the dataset have an **average** playtime (`avg_playtime`) of **at least** 200?

**Hint:** Recall that the `project.count()` function returns the total number of games in the dataset.

Points possible: 4.0

In [610]:
# replace each ... with your code
num_game_200_hours = 0
for idx in range(project.count()): # loop through ALL games in the dataset; do NOT hardcode the number here
    if project.get_avg_playtime(idx)>=200: # replace ... with a Boolean expression to check if the average playtime of the game in the current iteration is at least 200
        num_game_200_hours += 1

num_game_200_hours

1444

In [611]:
student_grader.check("lab-q23", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q23...
Great job! You passed all test cases for this question.


True

#### Lab question 24

How many games have a `name` that **starts** with the letter `'G'` in the dataset?

Points possible: 4.0

In [612]:
num_game_g = 0

# TODO: loop through all games in the dataset
#     TODO: inside the loop, update the value of 'num_game_g' only if
#           the name of the game at the current idx starts with 'G'
for i in range(project.count()):
    if project.get_name(i)[0] == 'G':
        num_game_g += 1
        
num_game_g

77

In [613]:
student_grader.check("lab-q24", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q24...
Great job! You passed all test cases for this question.


True
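Indexing with `[0]` works for question 24 as long as no game name is empty. A possible alternative is the built-in string method `str.startswith`, which returns `False` for an empty string instead of raising an error. Note that both checks are case-sensitive. The names below are sample strings chosen for illustration.

```python
print("Gothic 1"[0] == 'G')         # True
print("Gothic 1".startswith('G'))   # True
print("gothic 1".startswith('G'))   # False (the comparison is case-sensitive)
print("".startswith('G'))           # False (no error on an empty name)

try:
    ""[0]  # indexing an empty string fails
except IndexError:
    print("indexing an empty string raises IndexError")
```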

#### Lab question 25

How many games in the dataset are published by *Ubisoft* (i.e., games whose `publisher` is equal to `"Ubisoft"`)?

Points possible: 4.0

In [614]:
num_game_ubisoft = 0

# TODO: loop through all games in the dataset
#     TODO: inside the loop, update the value of 'num_game_ubisoft' only if
#           the publisher of the game at the current idx is 'Ubisoft'
for i in range(project.count()):
    if project.get_publisher(i) == 'Ubisoft':
        num_game_ubisoft += 1
        
num_game_ubisoft

48

In [615]:
student_grader.check("lab-q25", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for lab-q25...
Great job! You passed all test cases for this question.


True

### Submission

Save this p5.ipynb notebook and submit it to the lab-p5 assignment on Gradescope.

## Project Portion (20 questions, 7 functions)

We won't explain here how to use the `project` module (the code in the `project.py` file) to access this data. Refer to Lab-P5 to understand how the module works. If necessary, use the `help` function to learn about the various functions inside `project.py`. Feel free to look at the `project.py` code if you are curious about how it works.

This project consists of writing code to answer 20 questions.

### Dataset

The dataset you will be working with in this project is linked [here](./steam.csv). Be sure to look at this csv to see what it contains, and specifically what the names of the columns are.

If needed, you can open the `steam.csv` file to verify answers to simple questions, but you must still have the correct code in your submission!

### Project Requirements:

You **must not** hardcode indices in your code unless specified in the question. If you are not sure what hardcoding is, here is a simple test you can use to determine whether you have hardcoded:

*If we were to change the data (e.g. add more games, remove some games, or swap some columns or rows), would your code still find the correct answer to the question as it is asked?*
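The test above can be seen on a small scale with strings. In this sketch, `last_char_hardcoded` and `last_char_general` are hypothetical helpers: the first bakes in a fixed index that only suits 9-character titles, while the second computes the index from the data, in the same spirit as using `project.count() - 1` instead of a literal index.

```python
def last_char_hardcoded(title):
    return title[8]               # hardcoded: only correct for 9-character titles

def last_char_general(title):
    return title[len(title) - 1]  # works for any non-empty title

print(last_char_hardcoded("Half-Life"), last_char_general("Half-Life"))  # e e
print(last_char_general("Gothic 1"))                                      # 1

try:
    last_char_hardcoded("Gothic 1")  # only 8 characters, so index 8 does not exist
except IndexError:
    print("the hardcoded version breaks when the data changes")
```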

### Required Functions:

- `format_price`
- `format_num_reviews`
- `get_month`
- `get_day`
- `get_year`
- `best_in_range`
- `get_year_total`
    
Students are only allowed to use Python commands and concepts that have been taught in the course prior to the release of P5. Therefore, **you should not use concepts/modules such as lists, dictionaries, or the pandas module, to name a few examples**. Otherwise, the grader may not award points, even if your answer is correct.

### Incremental Coding and Testing:

You should always strive to code incrementally, since it helps you avoid hard-to-find bugs. Write a few lines of code and test them before proceeding to write more. You can call the `print` function to check the output of intermediate steps, or use the debugger that we covered in P2.
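Here is a small illustration of that workflow. It assumes a single sample value, `'$14.99'`, shaped like what `project.get_price` returns; each step is printed and checked before the next one is written.

```python
# Step 1: get one raw value and look at it (repr shows the quotes, so you can see it is a str)
raw_price = "$14.99"          # sample value, shaped like project.get_price output
print(repr(raw_price))        # '$14.99'

# Step 2: slice off the '$' prefix and check the result before converting
digits = raw_price[1:]
print(repr(digits))           # '14.99'

# Step 3: convert only once the slice looks right
price = float(digits)
print(price, type(price))     # 14.99 <class 'float'>
```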

#### Project Question 1

How **many** games does the dataset have?

Points possible: 2.0

In [616]:
# Replace the None with your code

num_games = project.count()
num_games

2254

In [617]:
student_grader.check("q1", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q1...
Great job! You passed all test cases for this question.


True

#### Project Question 2

What is the `publisher` of the game at index *220*?

Points possible: 2.0

In [618]:
# Replace the None with your code

publisher_220 = project.get_publisher(220)
publisher_220

'Bethesda Softworks'

In [619]:
student_grader.check("q2", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q2...
Great job! You passed all test cases for this question.


True

#### Project Question 3

What is the `name` of the game at the **end** of the dataset?

**Hint**: Your code should work even if the number of games in the dataset were to change. You should use the `num_games` variable from q1 to determine the last index, then get the name of the game at that index.

Points possible: 4.0

In [620]:
# Replace each None with your code

last_index = project.count() - 1
name_last_index = project.get_name(last_index)
name_last_index

'Against the Storm'

In [621]:
student_grader.check("q3", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q3...
Great job! You passed all test cases for this question.


True

#### Project Function 1: `format_price(price)`

If you look at the dataset, you will notice that the price of each game is not a number. It is a string with a dollar sign (`'$'`) as a prefix. For example, `project.get_price(10)` returns `'$14.99'`. You will have to convert these 'numbers' into **floats** before you can perform any mathematical operations on them.

Since you will need to format the price of any game before you can apply mathematical operations, you **must** create a general helper function that handles the `'$'` prefix. This function should take in the strings from the `price` column as input, and return a **float**. Refer back to Lab-P5 to understand how to slice a string when calculating the number of reviews, as that will help you here.

**Warning:** Your function `format_price(price)` must take in the price as a **string** and return a float.

Points possible: 2.0

In [622]:
# replace the pass with your code

def format_price(price):
    return float(price[1:])

In [623]:
student_grader.check("format_price", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for format_price...
Great job! You passed all test cases for this question.


True

#### Project Question 4

How many games in the dataset have `price` **under** 10 dollars?

You should loop through all of the indices using `range(num_games)`, use `project.get_price` to get the price, and the `format_price` function to convert the price to a float.

Points possible: 4.0

In [624]:
# Replace the ... with your code

cheap_games = 0

for i in range(num_games):
    if format_price(project.get_price(i)) < 10:
        cheap_games += 1

cheap_games

1011

In [625]:
student_grader.check("q4", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q4...
Great job! You passed all test cases for this question.


True

#### Project Question 5

What is the (formatted) `price` of the game *Call of Duty: Modern Warfare*?

There is **exactly one** game in this dataset named *Call of Duty: Modern Warfare*. You should exit the loop and **stop** iterating as soon as you find the game named *Call of Duty: Modern Warfare* by using `break`.
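The effect of `break` can be checked on a small made-up list. In this sketch, `names` is a hypothetical stand-in for calling `project.get_name(i)` on each index, and the counter shows that the loop stops right after the match instead of visiting every game.

```python
# Hypothetical names standing in for project.get_name(i) over the dataset
names = ["Game A", "Call of Duty: Modern Warfare", "Game C", "Game D"]

found_idx = None
checked = 0
for i in range(len(names)):
    checked += 1
    if names[i] == "Call of Duty: Modern Warfare":
        found_idx = i
        break  # leave the loop immediately; later names are never examined

print(found_idx, checked)  # prints: 1 2
```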

Points possible: 4.0

In [626]:
# Replace the ... below with your code

price_call_of_duty = None

for i in range(num_games):
    if project.get_name(i) == "Call of Duty: Modern Warfare":
        price_call_of_duty = format_price(project.get_price(i))
        break

price_call_of_duty

59.99

In [627]:
student_grader.check("q5", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q5...
Great job! You passed all test cases for this question.


True

#### Project Question 6

What is the **average** `price` of all games published by *Warner Bros. Games*?

**Hint:** Loop through all the games in the dataset, and use `project.get_publisher` to find all games published by *Warner Bros. Games*. Then, use `project.get_price` and `format_price` to compute the prices of these games.

You should find the **total** `price` of the games published by *Warner Bros. Games*, as well as the **number** of games published by them, and **divide** them to find the **average**.
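The total-and-count pattern can be sketched on a small made-up list. The `games` list and the `average_price_for` helper are hypothetical, not part of `project`. The sketch also shows one way to avoid a `ZeroDivisionError` when no game matches, which this question does not require because *Warner Bros. Games* does appear in the dataset.

```python
# Hypothetical (publisher, formatted price) pairs standing in for the dataset
games = [
    ("Warner Bros. Games", 20.0),
    ("Other Publisher", 10.0),
    ("Warner Bros. Games", 60.0),
]

def average_price_for(target, games):
    """Average price of games whose publisher equals target, or None if none match."""
    total_price = 0
    count = 0
    for publisher, price in games:
        if publisher == target:
            total_price += price  # running total of matching prices
            count += 1            # number of matching games
    if count == 0:
        return None  # avoids dividing by zero when no game matches
    return total_price / count

print(average_price_for("Warner Bros. Games", games))  # prints: 40.0
print(average_price_for("Nobody", games))              # prints: None
```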

Points possible: 4.0

In [628]:
# Replace each ... with your code

total_price_warner_bros = 0
count_warner_bros = 0

for i in range(project.count()):
    if project.get_publisher(i) == "Warner Bros. Games":
        count_warner_bros += 1
        total_price_warner_bros += format_price(project.get_price(i))

average_price_warner_bros = total_price_warner_bros/count_warner_bros
average_price_warner_bros

41.35454545454546

In [629]:
student_grader.check("q6", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q6...
Great job! You passed all test cases for this question.


True

#### Project Question 7

What is the **highest** average playtime (`avg_playtime`) of any game in the dataset?

Your answer **must** be an **integer** which is the **highest** value of average playtime for any game in the dataset. 

**Hint:** Loop through the dataset and get the average playtime of each game using `project.get_avg_playtime`. Update the value of the `highest_avg_playtime` variable whenever you find an average playtime **greater than** its current value.
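The running-maximum update from the hint can be sketched on hypothetical data. Here `playtimes` stands in for `project.get_avg_playtime(idx)`, and starting at `0` assumes playtimes are never negative; if negative values were possible, a `None` starting value (as in Question 12) would be safer. The final line just confirms that the built-in `max` agrees with the loop.

```python
# Hypothetical average playtimes (minutes) standing in for project.get_avg_playtime(idx)
playtimes = [120, 4500, 30, 4500, 990]

highest = 0  # safe start only because playtimes are assumed to be non-negative
for idx in range(len(playtimes)):
    this_playtime = playtimes[idx]  # look the value up once per game
    if this_playtime > highest:
        highest = this_playtime

print(highest)                    # prints: 4500
print(highest == max(playtimes))  # prints: True
```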

Points possible: 4.0

In [630]:
# Replace the ... below with your code

highest_avg_playtime = 0

for idx in range(num_games):
    if project.get_avg_playtime(idx) > highest_avg_playtime:
        highest_avg_playtime = project.get_avg_playtime(idx)

highest_avg_playtime

114219

In [631]:
student_grader.check("q7", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q7...
Great job! You passed all test cases for this question.


True

#### Project Function 2: `format_num_reviews(num_reviews)`

If you look at the dataset, you will notice that the numbers of positive and negative reviews are not stored directly as numbers. Instead, they are strings that may have a suffix (`"K"` or `"M"`) attached at the very end. You will have to convert these 'numbers' into integers before you can perform any mathematical operations on them.

Since you will need to format the number of positive and negative reviews for multiple games, you **must** create a general helper function that handles the `"K"` and `"M"` suffixes. Remember that `"K"` stands for thousand, and `"M"` stands for million. For example, your function should convert the string `"13.5M"` to `13500000`, `"6.9K"` to `6900`, and so on. For `"6.9K"`, this requires you to first convert `"6.9"` into the float `6.9` and then multiply it by the value of K, which is `1000`, to get `6900.0`. Note that for **some** games, the number of reviews does **not** have **any** suffix. For instance, the game `Grand Theft Auto` at index `80` has `'332'` positive reviews. Your function **must** also handle such inputs.

This function should take in a string from either the `positive_reviews` or the `negative_reviews` column as input, and return an **int** (use the `round` function to convert **floats** into **ints**). Refer to Task 3.2 in Lab-P5 to understand how to slice and calculate the number of reviews.

**Warning:** Your function `format_num_reviews(num_reviews)` must take in the number of reviews as a **string**, and it should return an **integer**.
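As an illustration of the same idea, the suffix handling can also be written with a lookup table. This is a sketch of one possible alternative, not the structure the assignment asks for; `SUFFIX_MULTIPLIERS` and `parse_review_count` are hypothetical names. Calling `round` at the end turns the float into an `int` and absorbs any tiny floating-point error from the multiplication.

```python
# Multiplier for each suffix (an illustrative lookup table)
SUFFIX_MULTIPLIERS = {"K": 1_000, "M": 1_000_000}

def parse_review_count(num_reviews):
    """Convert strings like '13.5M', '6.9K', or '332' into an int."""
    suffix = num_reviews[-1]
    if suffix in SUFFIX_MULTIPLIERS:
        # drop the suffix, convert the rest to float, then scale it
        value = float(num_reviews[:-1]) * SUFFIX_MULTIPLIERS[suffix]
    else:
        value = float(num_reviews)  # no suffix: the string is already a plain number
    return round(value)  # round() with one argument returns an int

for text in ["13.5M", "6.9K", "332"]:
    print(text, "->", parse_review_count(text))
# prints:
# 13.5M -> 13500000
# 6.9K -> 6900
# 332 -> 332
```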

Points possible: 3.0

In [632]:
# Replace each ... with your code

def format_num_reviews(num_reviews):
    last_char = num_reviews[-1]

    # Handle reviews ending with "M", like "13.5M"
    if last_char == 'M':
        review_num = float(num_reviews[:-1])*1000000

    # Handle reviews ending with "K", like "6.9K"
    elif last_char == 'K':
        review_num = float(num_reviews[:-1])*1000

    # Handle reviews with no suffix, like "332"
    else:
        review_num = float(num_reviews)

    # Round review_num to an int and return
    return round(review_num)

In [633]:
student_grader.check("format_num_reviews", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for format_num_reviews...
Great job! You passed all test cases for this question.


True

#### Project Question 8

How many `positive reviews` does the game named *South Park: The Stick of Truth* have?

There is **exactly one** game in this dataset named *South Park: The Stick of Truth*. You **must** exit the loop, and **stop** iterating as soon as you find the game named *South Park: The Stick of Truth*.

You **must** use the `format_num_reviews` function to answer this question. Your answer **must** be an `int`. 

Points possible: 4.0

In [634]:
# Replace each ... with your code

num_positive_reviews_southpark = 0

for i in range(project.count()):
    if project.get_name(i) == "South Park: The Stick of Truth":
        num_positive_reviews_southpark = format_num_reviews(project.get_positive_reviews(i))
        break

num_positive_reviews_southpark

57460

In [635]:
student_grader.check("q8", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q8...
Great job! You passed all test cases for this question.


True

#### Project Question 9

What is the **highest** number of **negative reviews** received by any game?

You **must** use the `project.get_negative_reviews` and `format_num_reviews` functions to answer this question.

Points possible: 4.0

In [636]:
# Replace each ... with your code

most_negative_reviews = 0
for idx in range(num_games):
    this_game_negative_reviews = format_num_reviews(project.get_negative_reviews(idx))
    if this_game_negative_reviews > most_negative_reviews:
        most_negative_reviews = this_game_negative_reviews
most_negative_reviews

961680

In [637]:
student_grader.check("q9", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q9...
Great job! You passed all test cases for this question.


True

#### Project Question 10

What is the `name` of the game with the **highest** number of **negative reviews**?

You **must** find the `name` of the game you found in the previous question. There is a **unique** game in the dataset with the highest number of negative reviews, so you do **not** have to worry about **breaking ties**. Specifically, you can just loop through the games to find the index of the game that has the same (`==`) number of negative reviews as the `most_negative_reviews` variable you computed in the question above.
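The two-pass idea (first find the maximum, then find the first index holding it) can be sketched on hypothetical data. The `negative_reviews` list below deliberately contains a tie, unlike the real dataset, to show that `break` keeps the first match; `list.index` gives the same first-match answer.

```python
# Hypothetical negative-review counts, already converted to ints
negative_reviews = [1200, 96000, 540, 96000]

# Pass 1: running maximum, as in the previous question
most = 0
for count in negative_reviews:
    if count > most:
        most = count

# Pass 2: first index whose value equals that maximum
most_idx = None
for i in range(len(negative_reviews)):
    if negative_reviews[i] == most:
        most_idx = i
        break  # stop at the first match

print(most, most_idx)                            # prints: 96000 1
print(negative_reviews.index(most) == most_idx)  # prints: True
```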

Points possible: 3.0

In [638]:
# Replace the ... with your code

most_negative_reviews_name = None

for i in range(project.count()):
    if format_num_reviews(project.get_negative_reviews(i)) == most_negative_reviews:
        most_negative_reviews_name = project.get_name(i)
        break

most_negative_reviews_name

'Counter-Strike: Global Offensive'

In [639]:
student_grader.check("q10", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q10...
Great job! You passed all test cases for this question.


True

#### Project Question 11

How **many** games in the dataset were released on *February 19*?

You **must** find the number of games whose month of release was *February* (i.e., `month` *2*), and day of release was *19*. The `year` does not matter.

**Hint:** You can find the release date of each game using the `project.get_release_date` function. You can extract the `month` and `day` using the `get_month` and `get_day` functions that you wrote in the lab portion of this assignment.

Points possible: 5.0

In [640]:
# Replace the ... with your code

num_released_feb_19 = 0
for i in range(project.count()):
    release_date = project.get_release_date(i)
    if get_month(release_date) == 2 and get_day(release_date) == 19:
        num_released_feb_19 += 1

num_released_feb_19

7

In [641]:
student_grader.check("q11", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q11...
Great job! You passed all test cases for this question.


True

#### Project Question 12

What is the **earliest** `year` in which a game with a `price` of **at least** `25` was released?

**Warning:** You need to find the **earliest** release year among those games with `price` **at least** `25`. 

Here, `earliest_over_25_release_year` is initialized to `None` instead of `0` so that `0` is never mistaken for a valid year. If it started at `0`, checking whether the current release year is less than `earliest_over_25_release_year` would never succeed, because no release year is less than zero. Instead, we check whether `earliest_over_25_release_year == None` **or** the current release year is less than `earliest_over_25_release_year`.
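The difference between a `None` start and a `0` start can be seen on a few hypothetical `(release year, price)` pairs; the `games` list is made up for illustration. The sketch uses `is None`, the idiomatic spelling of the `== None` check described above.

```python
# Hypothetical (release year, price) pairs; only games priced at least 25 count
games = [(2015, 9.99), (2012, 29.99), (2010, 4.99), (2014, 59.99)]

earliest_year = None  # None means "no qualifying game seen yet"
for year, price in games:
    if price < 25:
        continue  # skip games priced under 25
    if earliest_year is None or year < earliest_year:
        earliest_year = year

print(earliest_year)  # prints: 2012

# Starting at 0 instead: no year is ever less than 0, so it never updates
wrong = 0
for year, price in games:
    if price >= 25 and year < wrong:
        wrong = year

print(wrong)  # prints: 0
```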

Points possible: 7.0

In [642]:
# Replace each ... with your code

earliest_over_25_release_year = None

for idx in range(num_games):
    
    # Skip games with a price less than 25
    if format_price(project.get_price(idx)) < 25:
        continue

    # Update earliest_over_25_release_year if this game was released before it
    this_game_release_year = get_year(project.get_release_date(idx))
    if earliest_over_25_release_year == None or this_game_release_year < earliest_over_25_release_year:
        earliest_over_25_release_year = this_game_release_year

earliest_over_25_release_year

2007

In [643]:
student_grader.check("q12", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q12...
Great job! You passed all test cases for this question.


True

#### Background for q13

In the next question, the goal is to find the first game from a specific publisher that has the lowest price in our dataset. To ensure that we pick the first game with the lowest price (and not a later one), we must use the `<` operator when comparing prices.

**Why use `<` instead of `<=`?**

The code will loop through a list of games, checking each game published by Chubby Pixel.

At each step, we compare the price of the current game with the current lowest price (`cheapest_chubby_pixel_price`).

If we used `<=` (less than or equal to), every time the loop found another Chubby Pixel game with the same lowest price, that later game would replace the one already selected. As a result, the last occurrence of the lowest price would be chosen.

By using `<` (less than), we only update `cheapest_chubby_pixel_idx` and `cheapest_chubby_pixel_price` when we find a game with a price strictly lower than the current lowest. Once we have found a game with the lowest price, a later game with the same price will not replace it, so the first game with the lowest price is selected.

**Example to clarify:**

Suppose the games list has the following Chubby Pixel prices at different indices:

* Index 2: $5

* Index 5: $3 (lowest price)

* Index 8: $3 (same lowest price)

If we use `<=`, the second occurrence of the price ($3 at index 8) overwrites the one found earlier (at index 5), and you end up selecting the last game with the lowest price.

If we use `<`, the game at index 5 (the first one with the lowest price) is chosen and is not replaced when the loop encounters another game with the same price later.

**Key takeaway**

For this question, use `<` to ensure that once you find the first game with the lowest price, you won't accidentally replace it with a game that has the same price but comes later in the list.
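The example above can be rebuilt in a few lines to compare the two operators directly. The `publishers` and `prices` lists and the `cheapest_index` helper are hypothetical; the indices not mentioned in the example hold a placeholder publisher so the loop skips them.

```python
# Chubby Pixel games at indices 2, 5 and 8, as in the example; others are placeholders
publishers = ["Other"] * 9
prices = [0.0] * 9
for idx, price in [(2, 5.0), (5, 3.0), (8, 3.0)]:
    publishers[idx] = "Chubby Pixel"
    prices[idx] = price

def cheapest_index(use_less_equal):
    """Index of the cheapest Chubby Pixel game, comparing with <= or with <."""
    best_idx = None
    best_price = None
    for idx in range(len(prices)):
        if publishers[idx] != "Chubby Pixel":
            continue
        price = prices[idx]
        if use_less_equal:
            better = best_price is None or price <= best_price  # ties replace
        else:
            better = best_price is None or price < best_price   # ties are kept out
        if better:
            best_idx = idx
            best_price = price
    return best_idx

print(cheapest_index(use_less_equal=False))  # prints: 5 (first cheapest, using <)
print(cheapest_index(use_less_equal=True))   # prints: 8 (last cheapest, using <=)
```

The same reasoning explains Question 14: switching to `>=` when searching for the highest price keeps the **last** tied game.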

#### Project Question 13

What is the `name` of the game with the **lowest** `price` that was published by *Chubby Pixel*?

You **must** find the `name` of the game published by *Chubby Pixel* that has the **lowest** value of `price`. There are **multiple** *Chubby Pixel* games tied with the **lowest** `price`. 

You need to find the **first** such game (i.e., with the least index) that appears in the dataset. **Hint**: make sure you read the above section for background on this question. The code you end up writing will look similar to the code for question 12.

Points possible: 7.0

In [644]:
# Replace each ... with your code

cheapest_chubby_pixel_idx = None
cheapest_chubby_pixel_price = None

# Loop through the dataset to find the right value for cheapest_chubby_pixel_idx
for idx in range(project.count()):
    if project.get_publisher(idx) == "Chubby Pixel":
        if (cheapest_chubby_pixel_price == None) or (format_price(project.get_price(idx)) < cheapest_chubby_pixel_price):
            cheapest_chubby_pixel_price = format_price(project.get_price(idx)) 
            cheapest_chubby_pixel_idx = idx

# Use your idx variable to get the name of the game with `project.get_name(cheapest_chubby_pixel_idx)`
cheapest_chubby_pixel = project.get_name(cheapest_chubby_pixel_idx)
cheapest_chubby_pixel

'Heaven Island - VR MMO'

In [645]:
student_grader.check("q13", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q13...
Great job! You passed all test cases for this question.


True

#### Project Question 14

What is the `name` of the game with the **highest** `price` that was published by *Bethesda Softworks*?

You **must** find the `name` of the game published by *Bethesda Softworks* that has the **highest** value of `price`. There are **multiple** games tied with the **highest** `price`. You need to find the **last** such game (i.e., with the highest index) that appears in the dataset. To accomplish this, you should use `>=` when comparing prices.

Points possible: 7.0

In [646]:
# Replace each ... with your code

priciest_bethesda_idx = None
highest_bethesda_price = None

for idx in range(project.count()):
    if project.get_publisher(idx) == "Bethesda Softworks":
        if (highest_bethesda_price == None) or (format_price(project.get_price(idx)) >= highest_bethesda_price):
            highest_bethesda_price = format_price(project.get_price(idx)) 
            priciest_bethesda_idx = idx

priciest_bethesda = project.get_name(priciest_bethesda_idx)
priciest_bethesda

'Starfield'

In [647]:
student_grader.check("q14", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q14...
Great job! You passed all test cases for this question.


True

#### Project Function 6: `best_in_range(year1, year2)`

When deciding what games to buy, people sometimes choose the **most positively reviewed** games. We want to write a function to compute this.

The function `best_in_range(year1, year2)` should take in two years, `year1` and `year2`, as its inputs, and return the **index** of the game which was released between `year1` and `year2` (both **inclusive**) and has the **most** `positive_reviews`. In case of any ties, you must return the index of the **first** game in the dataset with the most `positive_reviews`.

If there are **no** games released between the given `year1` and `year2`, the function **must** return `None`.

Remember that you can simplify the check for a value within a range. If you have the variable `release_year` for a game as an integer, that could look like this:

```Python
if year1 <= release_year <= year2:
```
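A chained comparison behaves like two comparisons joined by `and`, and both bounds are included. The helper `in_range` below is hypothetical and just wraps the condition so its behavior at the edges can be checked. Note the last case: equal bounds select a single year.

```python
def in_range(release_year, year1, year2):
    """True if release_year lies between year1 and year2, both bounds inclusive."""
    # Evaluated as (year1 <= release_year) and (release_year <= year2)
    return year1 <= release_year <= year2

print(in_range(2016, 2016, 2020))  # prints: True  (lower bound included)
print(in_range(2020, 2016, 2020))  # prints: True  (upper bound included)
print(in_range(2021, 2016, 2020))  # prints: False
print(in_range(2022, 2022, 2022))  # prints: True  (equal bounds select one year)
```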

Points possible: 6.0

In [648]:
# Replace the ... with your code

def best_in_range(year1, year2):
    best_idx = None
    best_num_reviews = None

    for idx in range(project.count()):
        if year1 <= get_year(project.get_release_date(idx)) <= year2:
            if best_num_reviews == None or (format_num_reviews(project.get_positive_reviews(idx))>best_num_reviews):
                best_num_reviews = format_num_reviews(project.get_positive_reviews(idx))
                best_idx = idx
      
    return best_idx

In [649]:
student_grader.check("best_in_range", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for best_in_range...
Great job! You passed all test cases for this question.


True

#### Project Question 15

What is the `name` of the **best** game released between *2016* and *2020* (inclusive of both bounds)?

You **must** answer this question by calling the `best_in_range` function.

Points possible: 3.0

In [650]:
# Replace each ... with your code

best_game_2016_2020_idx = best_in_range(2016, 2020)
best_game_2016_2020 = project.get_name(best_game_2016_2020_idx)
best_game_2016_2020

'PUBG: BATTLEGROUNDS'

In [651]:
student_grader.check("q15", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q15...
Great job! You passed all test cases for this question.


True

#### Project Question 16

What is the `price` of the **best** game in `year` *2022*?

You **must** use `project.get_price` to get the price of the game, and `format_price` to convert the price to `float`.

**Hint:** Think of a clever way to use the function `best_in_range`.

Points possible: 4.0

In [652]:
# Replace each ... with your code

best_game_2022_idx = best_in_range(2022, 2022)
best_game_2022_price = format_price(project.get_price(best_game_2022_idx))
best_game_2022_price

59.99

In [653]:
student_grader.check("q16", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q16...
Great job! You passed all test cases for this question.


True

#### Project Question 17

Who **published** the **best** game of all time in the dataset?

Think of a clever way to use the function `best_in_range`.

Points possible: 4.0

In [654]:
# Replace the ... with your code

min_release_year = None
max_release_year = None
for i in range(project.count()):
    release_year = get_year(project.get_release_date(i))
    if min_release_year == None or release_year < min_release_year:
        min_release_year = release_year
    if max_release_year == None or release_year > max_release_year:
        max_release_year = release_year

best_game_publisher = project.get_publisher(best_in_range(min_release_year, max_release_year))
best_game_publisher

'Valve'

In [655]:
student_grader.check("q17", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q17...
Great job! You passed all test cases for this question.


True

#### Project Function 7: `get_year_total(year)`

This function should take in `year` as its input and return the number of games released in the given `year`. When no game was released in the given `year`, this function should return `0`.

Points possible: 4.0

In [656]:
# Replace the ... with your code
def get_year_total(year):
    count = 0
    for i in range(project.count()):
        if get_year(project.get_release_date(i)) == year:
            count += 1
    return count

In [657]:
student_grader.check("get_year_total", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for get_year_total...
Great job! You passed all test cases for this question.


True

#### Project Question 18

How **many** games were released in the `year` *2023*?

You **must** answer this question by calling `get_year_total`.

Points possible: 3.0

In [658]:
# Replace the ... with your code

total_games_2023 = get_year_total(2023)
total_games_2023

86

In [659]:
student_grader.check("q18", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q18...
Great job! You passed all test cases for this question.


True

#### Project Question 19

How **many** games were released in the **21st century** (*2001 to 2024*, both inclusive)?

You **must** answer this question by **looping** across the years in this period, and calling the function `get_year_total`.
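Because `range` excludes its stop value, covering 2001 through 2024 needs a stop of 2025. The sketch below loops directly over the years. `sum_over_years` is a hypothetical helper, and the `fake_counts` dictionary is made-up data standing in for `get_year_total(year)`, which needs the real dataset.

```python
def sum_over_years(first_year, last_year, yearly_total):
    """Add yearly_total(year) for every year from first_year to last_year inclusive."""
    total = 0
    for year in range(first_year, last_year + 1):  # +1 because range() excludes its stop
        total += yearly_total(year)
    return total

# Hypothetical per-year game counts standing in for get_year_total(year)
fake_counts = {2001: 3, 2010: 5, 2024: 2}

print(sum_over_years(2001, 2024, lambda year: fake_counts.get(year, 0)))  # prints: 10
print(len(range(2001, 2025)))  # prints: 24
```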

Points possible: 5.0

In [660]:
# Replace the ... with your code

total_games_in_century = 0
for year in range(2001, 2025):
    total_games_in_century += get_year_total(year)

total_games_in_century

2247

In [661]:
student_grader.check("q19", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q19...
Great job! You passed all test cases for this question.


True

#### Project Question 20

What is the **average** `number of reviews` received by the **best** game of each year from *2001* to *2024* (both inclusive)?

You **must** use the `best_in_range` function to identify the **best** game of each year, and you **must** use `format_num_reviews` to convert the `positive_reviews` and `negative_reviews` into an `int`, and **add** them up to get the `number of reviews`. You **must** ignore any year in which no games were released while computing the average.
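Skipping empty years matters for the denominator as well as the total: a skipped year must not be counted in `num_years`. In this sketch, the `best_per_year` dictionary is hypothetical data standing in for the result of `best_in_range(year, year)` plus the two review lookups, with `None` marking a year without games. Storing each year's result once also avoids repeating a lookup that, for the real `best_in_range`, scans the whole dataset on every call.

```python
# Hypothetical best game per year: (positive, negative) review counts, or None if no games
best_per_year = {
    2001: (900, 100),
    2002: None,        # no games released this year, so it is skipped
    2003: (2500, 500),
}

total_reviews = 0
num_years = 0
for year in range(2001, 2004):
    best = best_per_year[year]  # look the result up once per year
    if best is None:
        continue  # excluded from both the total and the count
    positive, negative = best
    total_reviews += positive + negative
    num_years += 1

average = total_reviews / num_years
print(average)  # prints: 2000.0
```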

Points possible: 5.0

In [662]:
# Replace each ... with your code

total_num_reviews = 0
num_years = 0

for i in range(2001, 2025):
    best_idx = best_in_range(i, i)
    if best_idx == None:   # skip years in which no games were released
        continue
    total_num_reviews += format_num_reviews(project.get_positive_reviews(best_idx)) + format_num_reviews(project.get_negative_reviews(best_idx))
    num_years += 1

average_num_reviews_best = total_num_reviews/num_years
average_num_reviews_best

1013364.5652173914

In [663]:
student_grader.check("q20", should_get_llm_feedback=False)

Make sure you saved the notebook before running this cell. Running check for q20...
Great job! You passed all test cases for this question.


True

## Submission and Grading

**Congrats on finishing p5!**

Make sure you have saved and run all cells in your notebook in order before submitting on Gradescope. Your notebook should not contain any uncaught exceptions; otherwise, the Gradescope autograder will not give you full points. Also, make sure you have put your name in the cell at the top of this notebook.

To shut down the kernel and close Jupyter select `File -> Shut Down` in the menu.

Make sure you complete the required [Canvas quiz](https://canvas.wisc.edu/courses/427075/quizzes/578050) about the LLM feedback feature.