# Homework 3: DataFrames, Control Flow, and Probability

## Due Tuesday, February 4th at 11:59PM

Welcome to Homework 3! This homework will cover lots of different topics:
- Grouping with subgroups (see [BPD 11](https://notes.dsc10.com/02-data_sets/groupby.html#subgroups))
- Merging DataFrames (see [BPD 13](https://notes.dsc10.com/02-data_sets/merging.html))
- Conditional statements (see [CIT 9.1](https://inferentialthinking.com/chapters/09/1/Conditional_Statements.html))
- Iteration (see [CIT 9.2](https://inferentialthinking.com/chapters/09/2/Iteration.html))
- Probability (see [CIT 9.5](https://inferentialthinking.com/chapters/09/5/Finding_Probabilities.html))

### Instructions

Remember to start early and submit often. You are given six slip days throughout the quarter to extend deadlines. See the syllabus for more details. With the exception of using slip days, late work will not be accepted unless you have made special arrangements with your instructor.

**Important**: For homeworks, the `otter` tests don't usually tell you that your answer is correct. More often, they help catch careless mistakes. It's up to you to ensure that your answer is correct. If you're not sure, ask someone (not for the answer, but for some guidance about your approach). These are great questions for office hours (the schedule can be found [here](https://dsc10.com/calendar)) or Ed. Directly sharing answers is not okay, but discussing problems with the course staff or with other students is encouraged. 

In [None]:
# Please don't change this cell, but do make sure to run it.
import babypandas as bpd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (10, 5)

import numpy as np
import otter
grader = otter.Notebook()

# We need to import some extra packages for some fun demonstrations.
from ipywidgets import interact, widgets
from IPython.display import YouTubeVideo, HTML, display, clear_output, Image, IFrame

### Supplemental Video on DataHub and Jupyter Notebooks

In Lab 0, we linked you to a video that walks you through key ideas you should be aware of when working on DataHub and in Jupyter Notebooks, including
- how files are organized on DataHub
- what it means to "restart the kernel"
- how to use keyboard shortcuts (most important: use `SHIFT + ENTER` to run a cell!)

Now that you have some experience with Jupyter Notebooks, we're linking this video again for your convenience. If you feel a little shaky on how to work your way around a notebook or troubleshoot issues, we recommend you give it another watch. (When troubleshooting, make sure to always check the [Debugging](https://dsc10.com/debugging/) tab on the course website as well.)

The video is quite long, but if you open the video directly on YouTube (which you can do by clicking the video's title after it loads in the next cell) you'll see timestamps in the description which you can use to jump to different parts of the video depending on what you'd like to learn more about.

In [None]:
# Run this cell.
YouTubeVideo('Hq8VaNirDRQ')

## 0. Mid-Quarter Survey

We'd like to hear from you on how DSC 10 has been going so far this quarter. To do so, we've put together a survey that asks you to provide feedback on all aspects of the course. You can provide as much or as little detail as you'd like. We value your input and will use the results of the survey to improve the course!

This survey is entirely anonymous, though you are free to leave your name and email if you want. The responses to the survey will be visible to both course staff and the Data Science Student Representatives. There will also be a question at the end of the survey that will allow you to provide feedback on the DSC program as a whole.

<center><h3>Click <a href="https://forms.gle/3CPD5WACZEZqHmmR7">here</a> to access the survey.</h3></center>

After completing the survey, enter the keyword provided at the end of the survey to get credit towards this homework assignment.

In [None]:
survey_keyword = ...

In [None]:
grader.check("q0")

## 1. 100 Years of "J" Baby Names 👶🏻

What letter does your first name start with? In this problem, we'll look at baby names starting with the letter "J". The file `data/baby_names.csv` contains information from the [Social Security Administration](https://www.ssa.gov/oact/babynames/limits.html) about "J" baby names in the US from 1924 to 2023 — that's one hundred years of data! Run the cell below to read in the data.

In [None]:
baby = bpd.read_csv('data/baby_names.csv')
baby

The DataFrame `baby` has a row for each `'State'` (50 US states plus Washington DC), `'Gender'` (`'M'` or `'F'`, as assigned at birth), `'Year'` (between 1924 and 2023), and `'Name'`. The `'Count'` column records the number of babies of that gender who were given that name in one state in one year.

The first row in `baby` contains the name John. Below, we look at only the rows corresponding to the name John.

In [None]:
baby[baby.get('Name') == 'John']

The first row of the DataFrame shows that there were 36 male babies named John born in Alaska in 1924. There are many other rows corresponding to the name John, which come from other years and other states, and there are even some rows for female babies named John!


Run the cell below to find out when and where many female Johns were born.

In [None]:
female_john = baby[(baby.get('Name') == 'John') & (baby.get('Gender') == 'F')]
female_john.sort_values(by='Count', ascending=False)

**Question 1.1.** There are many more male Johns than female Johns, so let's look at the popularity of the name John in male babies over time. Create a line plot that shows how the number of male babies named John has changed over time in the US. Then use your plot to answer the question that follows.

In [None]:
# Create your line plot here.

Around what year was the peak in popularity for the name John in male babies? Choose the closest answer from the options below and set `male_john_peak` to 1, 2, 3, or 4 corresponding to your answer choice.
1. 1930
2. 1950
3. 1970
4. 1990

In [None]:
male_john_peak = ...

In [None]:
grader.check("q1_1")

**Question 1.2.** In the `baby` DataFrame, how many babies of each gender were born in each state? Create a DataFrame named `num_babies` with one row for each gender in each state and columns `'State'`, `'Gender'`, and `'Count'`, where `'Count'` contains the total number of babies with a "J" name of that gender in that state. The first few rows of `num_babies` are shown below.

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>State</th>
      <th>Gender</th>
      <th>Count</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>AK</td>
      <td>F</td>
      <td>15495</td>
    </tr>
    <tr>
      <th>1</th>
      <td>AK</td>
      <td>M</td>
      <td>44767</td>
    </tr>
    <tr>
      <th>2</th>
      <td>AL</td>
      <td>F</td>
      <td>191205</td>
    </tr>
    <tr>
      <th>3</th>
      <td>AL</td>
      <td>M</td>
      <td>555313</td>
    </tr>
  </tbody>
</table>

***Hints:***
- You can do this in one line of code.
- Don't forget to use `.reset_index()`.


In [None]:
num_babies = ...
num_babies

In [None]:
grader.check("q1_2")

A gendered name is a combination of a name and a gender, such as female John. Let's explore the average age of people with each gendered name. For example, let's calculate the average age of all female Johns.

In [None]:
female_john

We'll define the age of a person as 2024 (the current year) minus the year in which the person was born. This doesn't take into account people's birthdays, because we don't have that information. For example, if a female John was born in 1984, they will be counted as 2024 - 1984 = 40 years old. Therefore the **total age** of all the female Johns is given below.

In [None]:
total_age = ((2024 - female_john.get('Year')) * female_john.get('Count')).sum()
total_age

To find the average age, we need to know how many female Johns there are. The **total count** of female Johns is given below.

In [None]:
total_count = female_john.get('Count').sum()
total_count

Therefore the **average age** of female Johns is given below.

In [None]:
average_age = total_age / total_count
average_age

Notice that we _cannot_ calculate the average age of female Johns as follows.

In [None]:
age = 2024 - female_john.get('Year')
age.mean()

This is incorrect because it does not take into account the fact that there were more female Johns born some years than others. 
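
To see why the two calculations disagree, here is a minimal sketch using made-up numbers. The names `toy_years` and `toy_counts` and their values are hypothetical and are not taken from `baby`. With one baby born in 1950 and nine born in 2000, the weighted average sits close to the age of the larger group, while the unweighted average treats both years equally.

```python
# Made-up data for illustration only (not taken from `baby`).
toy_years = [1950, 2000]   # birth years
toy_counts = [1, 9]        # number of babies born in each of those years

# Weighted approach: each birth year contributes once per baby born that year.
toy_total_age = 0
toy_total_count = 0
for i in range(len(toy_years)):
    toy_total_age = toy_total_age + (2024 - toy_years[i]) * toy_counts[i]
    toy_total_count = toy_total_count + toy_counts[i]
weighted_average = toy_total_age / toy_total_count

# Unweighted approach: each birth year contributes once, regardless of its count.
unweighted_average = ((2024 - toy_years[0]) + (2024 - toy_years[1])) / 2

print(weighted_average)    # 29.0
print(unweighted_average)  # 49.0
```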

**Question 1.3.** Create a DataFrame named `avg_age` that has one row for each gendered name and columns `'Gender'`, `'Name'`, and `'Average_Age'`, where `'Average_Age'` contains the average age of all people with that gendered name. The first few rows of `avg_age` are shown below.

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Gender</th>
      <th>Name</th>
      <th>Average_Age</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>F</td>
      <td>Ja</td>
      <td>24.000000</td>
    </tr>
    <tr>
      <th>1</th>
      <td>M</td>
      <td>Ja</td>
      <td>24.571429</td>
    </tr>
    <tr>
      <th>2</th>
      <td>F</td>
      <td>Jace</td>
      <td>10.451613</td>
    </tr>
    <tr>
      <th>3</th>
      <td>M</td>
      <td>Jace</td>
      <td>11.472549</td>
    </tr>
  </tbody>
</table>

***Hints:***
- Before attempting this question, make sure you understand the strategy shown above for finding the average age of female Johns. You will need to generalize this approach.
- This is a multi-step problem. Add cells and display your intermediate results so you can see your progress as you go.
- You should check that the average age for female Johns in your DataFrame `avg_age` is the same as we found above.


In [None]:
avg_age = ...
avg_age

In [None]:
grader.check("q1_3")

## 2. Holding Space 👸🏼💗💚🧙🏻‍♀️

*Wicked (2024)* is a film adaptation of a stage musical of the [same name](https://en.wikipedia.org/wiki/Wicked_(musical)), both of which are based on the *Oz* books. Set against a backdrop of political intrigue and conflict in the Land of Oz, the movie follows the story of Elphaba Thropp, a character who is ostracized due to her green skin and who becomes the Wicked Witch of the West. The movie tells the story of Elphaba's unexpected friendship with Galinda Upland, a ditzy yet caring character who ultimately becomes Glinda the Good.
<br>
<br>
<center><img src=images/wicked_poster.jpg width=250><br>(<a href="http://www.impawards.com/2024/wicked_ver2_xlg.html">source</a>)</center>

The film broke numerous records, most notably becoming the [highest-grossing film adaptation of a Broadway musical](https://www.billboard.com/lists/broadway-musical-films-biggest-box-office-wicked/). So it seems fitting to analyze the film's music; after all, it is a **musical**. 🎶

In the `singers` DataFrame below, each row contains information about a song and an artist who performed in that song. There can be multiple rows with the same `'Song Title'` since a song can include more than one singer, and there can be multiple rows with the same `'Artist'` because an artist can sing more than one song on the soundtrack. <br>

In [None]:
singers = bpd.read_csv('data/wicked_soundtrack.csv')
singers

The `'Artist'` column includes the real names of the artists who performed each song, but most moviegoers will know the names of the characters better than the names of the actors who portrayed them.

Run the cell below to load in a dataset which contains information about many of the characters and their respective actors. We'll read this into a DataFrame called `actors`.

In [None]:
actors = bpd.read_csv('data/wicked_characters.csv')
actors

**Question 2.1.** Using the `merge` method, combine the `actors` and `singers` DataFrames, and assign the resulting DataFrame to the variable `actors_and_singers`. 
- `actors_and_singers` should contain all of the columns in both `actors` and `singers`, except the `'Artist'` column from `singers`, which is redundant with the `'Actor'` column from `actors`.
- Sort `actors_and_singers` by `'Character'` in ascending order.



In [None]:
actors_and_singers = ...
actors_and_singers

In [None]:
grader.check("q2_1")

**Question 2.2.** If you completed Question 2.1 correctly, you'll notice that `actors_and_singers` has fewer rows than `singers` but more rows than `actors`. This is because some artists in `singers` are not part of the main cast, and some people in `actors` sing multiple songs.

Below, assign `actors_not_singers` to the number of actors that are in `actors` but not in `singers`. Similarly, assign `singers_not_actors` to the number of singers that are in `singers` but not in `actors`.

_Hint_: There are two ways to find the number of unique values in a column.

1. Group by that column. On the resulting DataFrame, use `.shape[0]`.

2. Use the `.unique()` method on the Series corresponding to that column. Use `len` on the resulting array.

You'll need to do this three times – once each for the columns that contain the people's names in `actors`, `singers`, and `actors_and_singers`.



In [None]:
actors_not_singers = ...
singers_not_actors = ...
print('There are', actors_not_singers, 'actors in the main cast that are not singers.')
print('There are', singers_not_actors, 'singers on the soundtrack that are not in the main cast.')

In [None]:
grader.check("q2_2")

Now that we better understand how `actors_and_singers` came to be, let's use it to answer some *Wicked* trivia questions. <br> <br>

<center><img src=images/wicked_west_end.jpg width=250><br>A poster for the musical in the West End (<a href="https://collections.vam.ac.uk/item/O155331/wicked-poster-dewynters-ltd/">source</a>)</center>

**Question 2.3.** *Wicked* is told from the perspective of the two main characters, Elphaba and Galinda. Elphaba is arguably the more important character, as the word "wicked" is attributed to her, and she is the face of the musical. However, which character is more <a href="https://www.youtube.com/watch?v=5VjTswqyHdA" style="color:pink;">popular</a> in the movie?

According to the `actors_and_singers` DataFrame, who appears in more songs, `'Elphaba Thropp'` or `'Galinda Upland'`? Set `most_popular_character` to 1, 2, or 3 corresponding to the answer choices below.

1. Elphaba
2. Galinda
3. They are tied.


In [None]:
most_popular_character = ...

In [None]:
grader.check("q2_3")

**Question 2.4.** How many songs include *both* Elphaba and Galinda? 

Using the `actors_and_singers` DataFrame, set the variable `num_shared_songs` to the number of songs that include both main characters. 

In [None]:
num_shared_songs = ...
num_shared_songs

In [None]:
grader.check("q2_4")

## 3. Tritonmobile 🚗 🚙

UCSD is launching its own car dealership called Tritonmobile! Every UCSD student that buys a Tritonmobile will have 5 years to pay off a car loan, plus they'll get a special Triton discount. You and Charlie have been hired as data scientists for the project. Your job is to make a monthly car-payment calculator for the Tritonmobile website. 

Car payments depend on a few factors, including the price of the car, the amount paid up-front (called the down payment), the interest rate, and the length of the loan (in this case, 5 years). 

Tritonmobile will use a student's credit score to determine their interest rate. Interest is essentially the cost of borrowing money - it is money paid on top of borrowed money. Interest rates are mainly decided based on a person's credit score, which is a way of measuring financial trustworthiness. To learn more about this, check out [this article](https://www.equifax.com/personal/education/personal-finance/articles/-/learn/what-do-interest-rates-mean/) about what interest rates mean and [this article](https://www.experian.com/blogs/ask-experian/why-do-people-with-higher-credit-scores-get-lower-interest-rates/#:~:text=People%20with%20higher%20credit%20scores,always%20looking%20to%20minimize%20risk.) about why credit scores affect interest rates.

Let's walk through how Tritonmobile will calculate the loan payment on a \\$20,000 car with a \\$5,000 down payment, when the vehicle is sold to a student with a credit score of 750. The steps are as follows.

1. **Calculate the interest rate of the loan based on the credit score**.  Tritonmobile determines the interest rate according to the table below.

| Credit Rating |  Credit Score |Interest Rate |
| --- | --- | --- |
| super prime | 781 and above | 5.6% |
| prime | 661 to 780 | 6.9% |
| near prime | 601 to 660 |  9.3% |
| subprime | 501 to 600 |  11.9% |
| deep subprime | up to 500 | 14.2% |

For our example, if the car buyer has a 750 credit score, their interest rate is 6.9\%. 

2. **Calculate the principal amount on the loan using the down payment**. The principal amount is how much money you have to pay off (through monthly installments) after giving a down payment. This is usually calculated by simply subtracting the down payment from the cost of the car. However, Tritonmobile is offering a special discount for UCSD students called the Triton discount that lowers the principal even further. The higher the down payment as a percentage of the car's cost, the higher the discount. The Triton discount is determined by the table below.

| Down Payment as Percentage of Cost | Triton Discount |
| --- | --- |
| [0%, 10%) | 2% |
| [10%, 20%) | 5% |
| [20%, 30%) | 9% |
| [30%, 40%) | 14% |
| 40% or above | 20% |

Note that the notation $[a, b)$ means "greater than or equal to $a$ and less than $b$". For example, a down payment equal to exactly 10% of the cost of the car earns the buyer a 5% Triton discount, while a down payment of 9.9% earns only a 2% discount.

For our example, a down payment of \\$5,000 on a \\$20,000 car means the down payment already covers 25% of the cost of the car. According to the table above, with 25% of the cost of the car already covered by the down payment, the Triton discount is 9%. This discount is applied to the remaining balance of the car after the down payment, which is \\$15,000. So the buyer gets an additional discount of \\$1,350, which comes from 9\% of \\$15,000. This means that instead of having to pay off \\$15,000 through monthly installments, they only need to pay off \\$13,650 (which is \\$1,350 less than \\$15,000).

3. **Calculate the monthly payment based on the interest rate and principal.** The formula below details how to calculate the monthly payment. This is called the *annuity* formula and it's widely used to determine monthly loan payments, a process known as [*amortization*](https://en.wikipedia.org/wiki/Amortization_calculator). 
    
$$ \text{monthly car payment} = \text{principal} \cdot\frac{\frac{\text{interest}}{12}}{1-\left(1+ \frac{\text{interest}}{12}\right)^{-\text{number of months}}}$$

For this example, we previously calculated the interest rate to be 6.9\% (or 0.069 as a proportion) and the principal to be \\$13,650. A five-year loan is 60 months. Plugging these values into the formula and rounding to the nearest dollar yields \\$270, which is the monthly payment.
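
To make the substitution explicit, here is the same calculation written out with the example's numbers. The intermediate values are rounded, so treat this as an approximate check rather than exact figures.

$$ 13650 \cdot\frac{\frac{0.069}{12}}{1-\left(1+ \frac{0.069}{12}\right)^{-60}} = 13650 \cdot \frac{0.00575}{1 - 1.00575^{-60}} \approx 13650 \cdot 0.019754 \approx 269.64$$

Rounding 269.64 to the nearest integer gives 270.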

Now that you know how monthly loan payments are calculated, let's build our loan payment calculator!

**Question 3.1.** Complete the implementation of the function `calculate_interest`, which takes in a credit score (`score`) and returns the corresponding interest rate as a **proportion**. For instance, `calculate_interest(750)` should evaluate to `.069` and `calculate_interest(781)` should evaluate to `.056`.

For your convenience, the interest table is shown again below.

| Credit Rating |  Credit Score |Interest Rate |
| --- | --- | --- |
| super prime | 781 and above | 5.6% |
| prime | 661 to 780 | 6.9% |
| near prime | 601 to 660 |  9.3% |
| subprime | 501 to 600 |  11.9% |
| deep subprime | up to 500 | 14.2% |

***Hint:*** 
- Use `elif`.

In [None]:
def calculate_interest(score):  
    ...

# Feel free to change the line below to try other examples.
calculate_interest(750) # should be 0.069

In [None]:
grader.check("q3_1")

**Question 3.2.** Next, you need to calculate the Triton discount and the resulting principal for the loan. Complete the implementation of the function `calculate_principal`, which takes in the price of a car (`price`) and a down payment (`down_payment`) and returns the **principal after the Triton discount**. For instance, `calculate_principal(20_000, 5_000)` should evaluate to `13650.0` as calculated in the example above. For your convenience, the Triton discount table is shown again below.

| Down Payment as Percentage of Cost | Triton Discount |
| --- | --- |
| [0%, 10%) | 2% |
| [10%, 20%) | 5% |
| [20%, 30%) | 9% |
| [30%, 40%) | 14% |
| 40% or above | 20% |

***Hint:*** There are 3 steps to this process:
1. Calculate the percentage of the car's price that is covered by the down payment (e.g. a \\$5,000 down payment covers 25\% of a \\$20,000 car).
2. Use the table above to determine the Triton discount (e.g. if 25\% of the price is covered by the down payment, the Triton discount is 9\%).
3. Calculate the principal after the discount. Remember that the discount is applied to the price of the car less the down payment amount (e.g. with a 9\% discount, the balance of \\$15,000 yields a principal of \\$13,650).

In [None]:
def calculate_principal(price, down_payment):  
    ...

# Feel free to change the line below to try other examples.
calculate_principal(20_000, 5_000) # should be 13650.0

In [None]:
grader.check("q3_2")

**Question 3.3.** Finally, complete the implementation of the function `monthly_payment`, which takes in the price of a car (`price`), a down payment (`down_payment`), and a credit score (`score`), and returns the monthly payment amount, **rounded to the nearest integer**. For example, a \\$20,000 car with a \\$5,000 down payment and a 750 credit score should produce a monthly payment of \\$270. For your convenience, the monthly payment formula is shown here again.

$$ \text{monthly car payment} = \text{principal} \cdot\frac{\frac{\text{interest}}{12}}{1-\left(1+ \frac{\text{interest}}{12}\right)^{-\text{number of months}}}$$


**Notes:**
- In this formula, the quantity $1 + \frac{\text{interest}}{12}$ in the denominator is raised to the *negative* of the number of months on the loan.
- All Tritonmobile loans are paid over 5 years, or 60 months.
- You should use both of the functions you implemented in 3.1 and 3.2.

In [None]:
def monthly_payment(price, down_payment, score): 
    ...

# Feel free to change the line below to try other examples.
monthly_payment(20_000, 5_000, 750) # should be 270

In [None]:
grader.check("q3_3")

### Final Product

You just did all of the math necessary to build a car payment calculator, like the ones you see on many dealership websites. Charlie used your functions to create this interactive widget for the Tritonmobile website. Run the following cell once you've completed the rest of this question!

In [None]:
# Don't worry about the code, just play with the sliders that appear after running.
def plot_car_payment_calculator(price, down_payment, credit_score):
    s = f'''
    <h1>Tritonmobile Car Payment Calculator 🚘 </h1>
    <h3>{'${:,.2f}'.format(price)}</h3>
    Total Car Price
    <h3>{'${:,.2f}'.format(down_payment)}</h3>
    Down Payment
    <h3>{'{:,.0f}'.format(credit_score)}</h3>
    Credit Score
    
    <h1>{'${:,.0f}'.format(monthly_payment(price, down_payment, credit_score))}</h1>
    Your Monthly Payment
    
    '''
    display(HTML(s))
    
interact(plot_car_payment_calculator, price=(10_000, 50_000, 1000), down_payment=(0, 10_000, 100), credit_score=(300, 850, 1));



## 4. Alternating Products

In this problem, we'll define two functions that compute some sort of "alternating product" of a sequence of values.

**Question 4.1.** Complete the implementation of the function `alternating_product`, which takes in an array of numbers, `values`, and returns the product of every other element in `values`, starting with the first element (at position `0`). Example behavior is shown below.

```py
>>> alternating_product(np.array([2, 3.5, 1, 1.5]))
2.0 # comes from 2 * 1

>>> alternating_product(np.array([2, 3.5, 1, 1.5, 4.5]))
9.0 # comes from 2 * 1 * 4.5
```

In [None]:
def alternating_product(values):
    ...
    
# Feel free to change this input to make sure your function works correctly.
alternating_product(np.array([2, 3.5, 1, 1.5]))

In [None]:
grader.check("q4_1")

**Question 4.2.** In math, the word "alternating" is also used to describe sequences of numbers where the signs oscillate back and forth between positive and negative. Complete the implementation of the function `alternating_sign_product`, which takes in an array of positive numbers, `values`, and returns the product of every element in `values`, with alternating signs, starting with a positive sign for element `0`, a negative sign for element `1`, and so on. Example behavior is shown below.

```py
>>> alternating_sign_product(np.array([2, 3.5, 1]))
-7.0 # comes from 2 * (-3.5) * 1

>>> alternating_sign_product(np.array([2, 3.5, 1, 1.5]))
10.5 # comes from 2 * (-3.5) * 1 * (-1.5)
```

***Hint:*** If `x` is an integer, `x % 2` evaluates to 0 when `x` is even and to 1 when `x` is odd. If `x` represents the position of an element in the array, you can use this to help you figure out whether the sign should be positive or negative.
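
As a quick illustration of the hint, the sketch below prints each position from 0 to 4 next to the value of `x % 2`. The list `remainders` is a name introduced here only to collect the results; how you use this pattern inside your function is up to you.

```python
# Show how x % 2 alternates between 0 and 1 as the position x increases.
remainders = []
for x in range(5):
    remainders.append(x % 2)
    print(x, x % 2)
```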


In [None]:
def alternating_sign_product(values):
    ...
    
# Feel free to change this input to make sure your function works correctly.
alternating_sign_product(np.array([2, 3.5, 1]))

In [None]:
grader.check("q4_2")

## 5. Lucky Triton Lotto 🔱 🎱 

Suppose UCSD holds an annual lottery called the Lucky Triton Lotto, where students can enter to win Triton Cash, or even free housing! Here's how the Lucky Triton Lotto works:

- First, you pick five **different** numbers, one at a time, from 1 to 29. This is because, according to [USNews](https://www.usnews.com/best-colleges/university-of-california-san-diego-1317), UCSD is ranked 29th in the nation among the best universities to attend for 2024-2025. Let's say you select 7, 12, 24, 15, and 13, in that order.
- Then, you separately pick a number from 1 to 12. This is because UCSD's Data Science program is ranked 12th in [USNews's](https://www.usnews.com/best-colleges/rankings/computer-science/data-analytics-science) best undergraduate Data Science programs list (though we think it's number one). Let's say you select 3.
- The six numbers you have selected, or  **your numbers**, can be represented all together as (7, 12, 24, 15, 13, 3). This is a _sequence_ of six numbers – **order matters**!

The **winning numbers** are chosen by King Triton drawing five balls, one at a time, **without replacement**, from a pot of white balls numbered 1 to 29. Then, he draws a gold ball, the Tritonball, from a pot of gold balls numbered 1 to 12. Both pots are completely separate, hence the different ball colors. For example, maybe the winning numbers are (15, 9, 24, 23, 1, 3).

We’ll assume for this problem that in order to win the grand prize (free housing), all six of your numbers need to match the winning numbers and be in the **exact same order**. In other words, your entire sequence of numbers must be exactly the same as the sequence of winning numbers. However, if some numbers in your sequence match up with the corresponding number in the winning sequence, you will still win some Triton Cash. 

Suppose again that you select (7, 12, 24, 15, 13, 3) and the winning numbers are (15, 9, 24, 23, 1, 3). In this case, two of your numbers are considered to match two of the winning numbers. 
- Your numbers: (7, 12, **24**, 15, 13, **3**)
- Winning numbers: (15, 9, **24**, 23, 1, **3**)

You won't win free housing, but you will win some Triton Cash. Note that although both sequences include the number 15 within the first five numbers (representing a white ball), since they are in different positions, that's not considered a match.
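
The position-by-position matching rule described above can be sketched as a short loop. The names `your_numbers`, `winning_numbers`, and `num_matches` are introduced here only for illustration; the lottery itself does not depend on this code.

```python
your_numbers = [7, 12, 24, 15, 13, 3]
winning_numbers = [15, 9, 24, 23, 1, 3]

# Count positions where your number equals the winning number at that same position.
num_matches = 0
for i in range(len(your_numbers)):
    if your_numbers[i] == winning_numbers[i]:
        num_matches = num_matches + 1

print(num_matches)  # 2 (positions 2 and 5; the 15s are in different positions)
```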


**Question 5.1.** What is the probability that your Tritonball number (the last number in your sequence) matches the winning Tritonball number? Calculate your answer and assign it to `tritonball_chance`. If you need to do any calculations (e.g. multiplication or division), make Python do it; don't use a separate calculator. Your result should be a decimal number between 0 and 1.

In [None]:
tritonball_chance = ...
tritonball_chance

In [None]:
grader.check("q5_1")

**Question 5.2.** What is the probability that your first three numbers match the first three winning numbers? Calculate your answer and assign it to `first_three_chance`. If you need to do any calculations (e.g. multiplication or division), make Python do it; don't use a separate calculator. Your result should be a decimal number between 0 and 1.

***Hint:*** You need **all three** of the first three numbers to match. What probability rule should you use?

In [None]:
first_three_chance = ...
first_three_chance

In [None]:
grader.check("q5_2")

**Question 5.3.** What is the probability that you win the grand prize, free housing? Calculate your answer and assign it to `free_housing_chance`. If you need to do any calculations (e.g. multiplication or division), make Python do it; don't use a separate calculator. Your result should be a decimal number between 0 and 1.

***Hint:*** When you select a ball without replacement, what happens to the total number of balls you can select next time?
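
To build intuition for drawing without replacement, here is a sketch of a much smaller, hypothetical lottery (not the Lucky Triton Lotto): two balls are drawn in order, without replacement, from balls numbered 1 to 5. The pick `toy_pick` is arbitrary, and `itertools.permutations` is used only to list every possible ordered draw so that the count can be compared against multiplying probabilities.

```python
from itertools import permutations

# Hypothetical toy lottery: draw 2 balls in order, without replacement, from 1 to 5.
toy_pick = (2, 5)  # an arbitrary pick, chosen for illustration

# Every ordered way to draw 2 different balls out of 5.
outcomes = list(permutations(range(1, 6), 2))

# Count how many of the equally likely outcomes match the pick exactly.
num_winning = 0
for outcome in outcomes:
    if outcome == toy_pick:
        num_winning = num_winning + 1

brute_force_chance = num_winning / len(outcomes)

# After the first ball is drawn, only 4 balls remain for the second draw.
multiplication_rule_chance = (1 / 5) * (1 / 4)

print(len(outcomes), brute_force_chance, multiplication_rule_chance)
```

Both approaches agree in this toy setting, which is why multiplying probabilities works even though the pot shrinks after each draw.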

In [None]:
free_housing_chance = ...
free_housing_chance

In [None]:
grader.check("q5_3")

**Question 5.4.** What is the probability that you do **not** win free housing? Calculate your answer and assign it to `no_free_housing_chance`. If you need to do any calculations (e.g. multiplication or division), make Python do it; don't use a separate calculator. Your result should be a decimal number between 0 and 1.

In [None]:
no_free_housing_chance = ...
no_free_housing_chance

In [None]:
grader.check("q5_4")

## Finish Line: Almost there, but make sure to follow the steps below to submit! 🏁

**_Citations:_** Did you use any generative artificial intelligence tools to assist you on this assignment? If so, please state, for each tool you used, the name of the tool (e.g., ChatGPT) and the problem(s) in this assignment where you used the tool for help.

<hr style="color:Maroon;background-color:Maroon;border:0 none; height: 3px;">

Please cite tools here.

<hr style="color:Maroon;background-color:Maroon;border:0 none; height: 3px;">

To submit your assignment:

1. Select `Kernel -> Restart & Run All` to ensure that you have executed all cells, including the test cells.
1. Read through the notebook to make sure everything is fine and all tests passed.
1. Run the cell below to run all tests, and make sure that they all pass.
1. Download your notebook using `File -> Download as -> Notebook (.ipynb)`, then upload your notebook to Gradescope.
1. Stick around while the Gradescope autograder grades your work. Make sure you see that all tests have passed on Gradescope.
1. Check that you have a confirmation email from Gradescope and save it as proof of your submission.

In [None]:
grader.check_all()