In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("hw07.ipynb")

<img style="display: block; margin-left: auto; margin-right: auto" src="./ccsf-logo.png" width="250rem;" alt="The CCSF black and white logo">

# Homework 7: Simulation

## References

* [Sections 10.1 - 10.4](https://inferentialthinking.com/chapters/10/1/Empirical_Distributions.html)
* [`datascience` Documentation](https://datascience.readthedocs.io/)
* [Python Quick Reference](https://ccsf-math-108.github.io/materials-fa23/resources/quick_reference.html)

## Assignment Reminders

- Make sure to run the code cell at the top of this notebook that starts with `# Initialize Otter` to load the auto-grader.
- For tasks marked with 🔎, which require written explanations, provide your answer in the designated space.
- Throughout this assignment and all future ones, please be sure to not re-assign variables throughout the notebook! _For example, if you use `max_temperature` in your answer to one question, do not reassign it later on. Otherwise, you will fail tests that you thought you were passing previously!_
- We encourage you to discuss this assignment with others but make sure to write and submit your own code. Refer to the syllabus to learn more about how to learn cooperatively.
- Unless you are asked otherwise, use the non-interactive visualizations when asked to produce a visualization for a task.
- View the related <a href="https://ccsf.instructure.com" target="_blank">Canvas</a> Assignment page for additional details.

Run the following code cell to import the tools for this assignment.

In [None]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

## Roulette Again

A Nevada roulette wheel has 38 pockets and a small ball that rests on the wheel. When the wheel is spun, the ball comes to rest in one of the 38 pockets. That pocket is declared the winner. 

The pockets are labeled 0, 00, 1, 2, 3, 4, ... , 36. Pockets 0 and 00 are green, and the other pockets are alternately red and black. The table `wheel` is a representation of a Nevada roulette wheel. **Note that *both* columns consist of strings.** Below is an example of a roulette wheel!

<img src="./roulette_wheel.jpeg" alt="roulette wheel" width="330px">

Run the cell below to load the `wheel` table.

In [None]:
wheel = Table.read_table('roulette_wheel.csv', dtype=str)
wheel

### Betting on Red

If you bet on *red*, you are betting that the winning pocket will be red. This bet *pays 1 to 1*. That means if you place a one-dollar bet on red, then:

- If the winning pocket is red, you gain 1 dollar. That is, you get your original dollar back, plus one more dollar.
- If the winning pocket is not red, you lose your dollar. In other words, you gain -1 dollars.

Let's see if you can make money by betting on red at roulette.

#### Task 01 📍

Define a function `dollar_bet_on_red`. The function definition should:
1. Have one argument `color` that is a `str` for the name of a color.
2. Return your gain in dollars as an `int` if that color had won and you had placed a one-dollar bet on red. 

Consider the following as you work:
* Remember that the gain can be negative.
* Make sure your function returns an integer.
* You can assume that the only colors that will be passed as arguments are `'red'`, `'black'`, and `'green'`. Your function doesn't have to check that the input is correct.

_Points:_ 3

In [None]:
def dollar_bet_on_red(...):
    ...

In [None]:
grader.check("task_01")

Run the cell below to make sure your function is working.

In [None]:
print(dollar_bet_on_red('green'))
print(dollar_bet_on_red('black'))
print(dollar_bet_on_red('red'))

#### Task 02 📍

1. Add a column labeled `'Winnings: Red'` as the last column of the table `wheel`. For each pocket, the column should contain your gain in dollars if that pocket won and you had bet one dollar on red.
2. Your code should apply the function `dollar_bet_on_red` to the `wheel` table to create an array `red_winnings`.

_Points:_ 3

In [None]:
red_winnings = ...
wheel = ...
wheel

In [None]:
grader.check("task_02")

### Simulating 10 Bets on Red

Roulette wheels are set up so that each time they are spun, the winning pocket is equally likely to be any of the 38 pockets regardless of the results of all other spins. Let's see what would happen if we decided to bet one dollar on red each round.
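
The independence described above is what sampling with replacement captures: every draw sees the full set of outcomes again. As a minimal sketch, the example below uses a made-up four-slice spinner rather than the `wheel` table. The names `spinner` and `spins` are hypothetical and not part of the assignment.

```python
import numpy as np

# Hypothetical four-slice spinner; NOT the roulette wheel from this assignment.
spinner = np.array(['north', 'east', 'south', 'west'])

# np.random.choice samples with replacement by default, so each of the
# 12 spins is equally likely to land on any slice, regardless of the others.
spins = np.random.choice(spinner, 12)

# Count how many times each slice came up in this particular run.
labels, counts = np.unique(spins, return_counts=True)
print(spins)
print({str(label): int(count) for label, count in zip(labels, counts)})
```

Running the sketch several times gives different counts each time, which is the run-to-run variation the simulation tasks below explore.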

#### Task 03 📍

1. Create a table `ten_bets` by sampling the table `wheel` to simulate 10 spins of the roulette wheel. 
2. Your table should have the same three column labels as in `wheel`. 
3. Once you've created that table, set `sum_bets` to your net gain, an integer, in all 10 bets, assuming that you bet one dollar on red each time.

While you are working on this, it may be helpful to print out `ten_bets` after you create it!

_Points:_ 5

In [None]:
ten_bets = ...
sum_bets = ...(ten_bets.column('Winnings: Red'))
sum_bets

In [None]:
grader.check("task_03")

Run the cells above a few times to see how much money you would make if you made 10 one-dollar bets on red. Making a negative amount of money doesn't feel good, but it is a reality in gambling. Casinos are a business, and they make money when gamblers lose.

#### Task 04 📍

Let's see what would happen if you made more bets. 

Define a function `net_gain_red` that takes one argument, the number of bets (`int`), and returns the net gain (`int`) from that many one-dollar bets on red.

Reference the `wheel` table that you've defined in the notebook within your function.

_Points:_ 2

In [None]:
def net_gain_red(...):
    ...
    
net_gain_red(10)

In [None]:
grader.check("task_04")

#### Task 05 📍

1. Complete the cell below to simulate the net gain in 200 one-dollar bets on red, repeating the process 10,000 times. 
2. After the cell is run, `all_gains_red` should be an array with 10,000 entries, each of which is the net gain in 200 one-dollar bets on red.

_Points:_ 2
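
The repeat-and-collect pattern this task asks for can be sketched on a different experiment. The example below counts heads in 50 tosses of a fair coin and repeats that 1,000 times. The names `coin`, `heads_in_tosses`, `toy_tosses`, `toy_trials`, and `all_head_counts` are illustrative only and not part of the assignment.

```python
import numpy as np

coin = np.array(['heads', 'tails'])

def heads_in_tosses(num_tosses):
    """Return the number of heads in num_tosses tosses of a fair coin."""
    tosses = np.random.choice(coin, num_tosses)
    return np.count_nonzero(tosses == 'heads')

toy_tosses = 50
toy_trials = 1000

# Start with an empty array and append one simulated value per repetition.
all_head_counts = np.array([])
for _ in range(toy_trials):
    all_head_counts = np.append(all_head_counts, heads_in_tosses(toy_tosses))

print(len(all_head_counts))
```

The sketch keeps the one-value simulation in its own function, so the loop only has to call it and store the result.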

In [None]:
num_bets = ...
repetitions = ...

all_gains_red = ...
...

len(all_gains_red) # Do not change this line! Check that all_gains_red is length 10000.

In [None]:
grader.check("task_05")

Run the cell below to visualize the results of your simulation.

In [None]:
gains = Table().with_columns('Net Gain on Red', all_gains_red)
gains.hist(bins = np.arange(-80, 41, 4))

#### Task 06 📍

Using the histogram above, decide whether the following statement is true or false:

>If you make 200 one-dollar bets on red, your chance of losing money is more than 50%.

Assign `loss_more_than_50` to either `True` or `False` depending on your answer to the question.

_Points:_ 2

In [None]:
loss_more_than_50 = ...

In [None]:
grader.check("task_06")

### Betting on a Split

If betting on red doesn't seem like a good idea, a gambler might want to try a different bet. A bet on a *split* is a bet on two consecutive numbers such as 5 and 6. This bet pays 17 to 1. That means if you place a one-dollar bet on the split 5 and 6, then:

- If the winning pocket is either 5 or 6, your gain is 17 dollars.
- If any other pocket wins, you lose your dollar, so your gain is -1 dollars.

#### Task 07 📍

Define a function `dollar_bet_on_5_6_split`.
1. The function should have one argument (type `str`) that represents the pocket number.
2. The function should return the gain in dollars if that pocket won and you had bet one dollar on the 5-6 split.

Remember that the pockets are represented as strings.

_Points:_ 4

In [None]:
def dollar_bet_on_5_6_split(...):
    ...

In [None]:
grader.check("task_07")

Run the cell below to check that your function is doing what it should.

In [None]:
print(dollar_bet_on_5_6_split('5'))
print(dollar_bet_on_5_6_split('6'))
print(dollar_bet_on_5_6_split('00'))
print(dollar_bet_on_5_6_split('23'))

#### Task 08 📍

Add a column `'Winnings: 5-6 Split'` to the end of the `wheel` table. For each pocket, the column should contain your gain in dollars if that pocket won and you had bet one dollar on the 5-6 split.

_Points:_ 3

In [None]:
split_winnings = ...
wheel = ...

In [None]:
grader.check("task_08")

#### Task 09 📍

1. Simulate the net gain in 200 one-dollar bets on the 5-6 split.
2. Repeat the simulation for a total of 10,000 times.
3. Store your gains in the array `all_gains_split`.

_Points:_ 2

In [None]:
all_gains_split = ...
...

# Do not change the two lines below
gains = gains.with_columns('Net Gain on Split', all_gains_split)
gains.hist(bins = np.arange(-200, 150, 20))

In [None]:
grader.check("task_09")

#### Task 10 📍

Look carefully at the visualization above, and assign `histogram_statements` to an array of the numbers of each statement below that can be correctly inferred from the overlaid histogram.

1. If you bet one dollar 200 times on a split, your chance of losing money is more than 50%.
2. If you bet one dollar 200 times in roulette, your chance of making more than 50 dollars is greater if you bet on a split each time than if you bet on red each time.
3. If you bet one dollar 200 times in roulette, your chance of losing more than 50 dollars is greater if you bet on a split each time than if you bet on red each time.

Notice that you've already seen one of these statements in a prior question.

_Points:_ 3

In [None]:
histogram_statements = ...

In [None]:
grader.check("task_10")

If this exercise has put you off playing roulette, it has done its job. If you are still curious about other bets, [here](https://en.wikipedia.org/wiki/Roulette#Bet_odds_table) they all are, and [here](https://en.wikipedia.org/wiki/Roulette#House_edge) is the bad news. The house (that is, the casino) always has an edge over the gambler.
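
One way to see that edge, as a rough sketch, is to compute the average gain per one-dollar bet from the pocket counts described earlier: 18 red pockets and 2 split pockets out of 38. The notebook itself does not use expected values, so the helper `expected_gain` and its arguments are assumptions made purely for illustration.

```python
from fractions import Fraction

def expected_gain(winning_pockets, payout, total_pockets=38):
    """Average gain per one-dollar bet that pays `payout` to 1 (illustrative helper)."""
    p_win = Fraction(winning_pockets, total_pockets)
    return p_win * payout + (1 - p_win) * (-1)

red = expected_gain(winning_pockets=18, payout=1)
split = expected_gain(winning_pockets=2, payout=17)

print(red, split)    # both are -1/19
print(float(red))    # roughly -0.0526 dollars per bet
```

Under these assumptions, both bets lose a little over 5 cents per dollar wagered on average.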

## Three Ways Python Draws Random Samples

You have learned three ways to draw random samples using Python:

- `tbl.sample` draws a random sample of rows from the table `tbl`. The output is a table consisting of the sampled rows. 

- `np.random.choice` draws a random sample from a population whose elements are in an array. The output is an array consisting of the sampled elements.

- `sample_proportions` draws from a categorical distribution whose proportions are in an array. The output is an array consisting of the sampled proportions in all the categories. 
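
As a rough side-by-side of the two array-based methods, the sketch below uses a made-up population of marble colors. It relies only on NumPy. Because `sample_proportions` comes from the `datascience` library, it is imitated here with `np.random.multinomial` divided by the sample size, which is an assumption about equivalent behavior and not the library call itself. `tbl.sample` is left out because it needs a `datascience` table. All variable names are hypothetical.

```python
import numpy as np

# Hypothetical population: one entry per marble.
marbles = np.array(['red', 'red', 'red', 'blue', 'blue', 'green'])

# np.random.choice: the output is an array of the sampled elements.
drawn = np.random.choice(marbles, 10)
print(drawn)

# Stand-in for sample_proportions(10, color_proportions): the output is
# an array with one sampled proportion per category (red, blue, green).
color_proportions = np.array([3, 2, 1]) / 6
sampled_proportions = np.random.multinomial(10, color_proportions) / 10
print(sampled_proportions)
```

Comparing the two printed arrays shows the different shape of output each method produces from the same population.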

Run the following four code cells and look through the code. You'll use this information for the following two tasks.

In [None]:
top = Table.read_table('top_movies_2017.csv').select(0, 1)
top.show(3)

In [None]:
studios_with_counts = top.group('Studio').sort('count', descending=True)
studios_with_counts.show(3)

In [None]:
studios_of_all_movies = top.column('Studio')
distinct_studios = studios_with_counts.column('Studio')

print("studios_of_all_movies:", studios_of_all_movies[:10], "...")
print("\n distinct_studios:", distinct_studios)

In [None]:
studio_counts_only = studios_with_counts.column('count')
studio_proportions_only = studio_counts_only / sum(studio_counts_only)

print("studio_counts_only:", studio_counts_only)
print("\n studio_proportions_only:", studio_proportions_only)

Each of the following two tasks presents a scenario. Determine which three of the six options below are true for that scenario, and list them in the answer cell. If your answer includes any of options 1 through 3, state what you would fill in the blank to make it true: `top`, `studios_with_counts`, `studios_of_all_movies`, `distinct_studios`, `studio_counts_only` or `studio_proportions_only`.

1. This can be done using `sample` and the table _________.
2. This can be done using `np.random.choice` and the array ________.
3. This can be done using `sample_proportions` and the array _______.
4. This cannot be done using `sample` and the data given.
5. This cannot be done using `np.random.choice` and the data given.
6. This cannot be done using `sample_proportions` and the data given.

### Task 11 📍🔎

<!-- BEGIN QUESTION -->

Simulate a sample of 10 movies drawn at random with replacement from the 200 movies. Output `True` if Paramount appears more often than Warner Brothers among the studios that released the sampled movies, and `False` otherwise.

*Example Answer:* (1.) `studios_of_all_movies`, (3.) `top`, (5.)

***Note***: Do not explain your answer for any of the options you've chosen; please follow the structure of the example answer provided.

_Points:_ 3

_Type your answer here, replacing this text._

<!-- END QUESTION -->

### Task 12 📍🔎

<!-- BEGIN QUESTION -->

Simulate a sample of 10 movies drawn at random with replacement from the 200 movies. Output `True` if the first sampled movie was released by the same studio as the last sampled movie, and `False` otherwise.

*Example Answer:* (1.) `studios_of_all_movies`, (3.) `top`, (5.)

***Note***: Do not explain your answer for any of the options you've chosen; please follow the structure of the example answer provided.

_Points:_ 3

_Type your answer here, replacing this text._

<!-- END QUESTION -->

## Assessing Jade's Models

### Games with Jade

Our friend Jade comes over and asks us to play a game with her. The game works like this: 

> We will draw randomly with replacement from a simplified 13 card deck with 4 face cards (A, J, Q, K), and 9 numbered cards (2, 3, 4, 5, 6, 7, 8, 9, 10). If we draw cards with replacement 13 times, and if the number of face cards is greater than or equal to 4, we lose.
> 
> Otherwise, we win.

We play the game once and we lose, observing 8 total face cards. We are angry and accuse Jade of cheating! Jade is adamant, however, that the deck is fair.

Jade's model claims that there is an equal chance of getting any of the cards (A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K), but we do not believe her. We believe that the deck is clearly rigged, with face cards (A, J, Q, K) being more likely than the numbered cards (2, 3, 4, 5, 6, 7, 8, 9, 10).

#### Task 13 📍

Assign `deck_model_probabilities` to a two-item array containing the chance of drawing a face card as the first element, and the chance of drawing a numbered card as the second element under Jade's model. Since we're working with probabilities, make sure your values are between 0 and 1. 


_Points:_ 3

In [None]:
deck_model_probabilities = ...
deck_model_probabilities

In [None]:
grader.check("task_13")

#### Task 14 📍

We believe Jade's model is incorrect. In particular, we believe there is a larger chance of getting a face card. Which of the following statistics can we use during our simulation to test between the model and our alternative? Assign `statistic_choice` to the correct answer.

1. The actual number of face cards we get in 13 draws
2. The distance (absolute value) between the actual number of face cards in 13 draws and the expected number of face cards in 13 draws (4)
3. The expected number of face cards in 13 draws (4)



_Points:_ 2

In [None]:
statistic_choice = ...
statistic_choice

In [None]:
grader.check("task_14")

#### Task 15 📍

Define the function `deck_simulation_and_statistic`, which, given a sample size and an array of model proportions (like the one you created in Task 13), returns the number of face cards in one simulation of drawing `sample_size` cards under the model specified in `model_proportions`.

As you form your response, think about how you can use the function `sample_proportions`. 


_Points:_ 1

In [None]:
def deck_simulation_and_statistic(sample_size, model_proportions):
    ...

deck_simulation_and_statistic(13, deck_model_probabilities)

In [None]:
grader.check("task_15")

#### Task 16 📍

Use your function from above to simulate the drawing of 13 cards 5000 times under the proportions that you specified in Task 13. Keep track of all of your statistics in `deck_statistics`.


_Points:_ 2

In [None]:
repetitions = 5000 
...

deck_statistics

In [None]:
grader.check("task_16")

Let's take a look at the distribution of simulated statistics.

In [None]:
# Draw the distribution of the simulated statistics
Table().with_column('Deck Statistics', deck_statistics).hist()
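
One common way to read a histogram like this is to ask how often the simulated statistic is at least as large as the value actually observed. The sketch below does that for an unrelated toy model: 20 tosses of a fair coin, with a made-up observation of 16 heads. The names `toy_statistics`, `toy_observed`, and the others are hypothetical, and the approach is offered as one possible aid, not as the required form of an answer.

```python
import numpy as np

coin = np.array(['heads', 'tails'])
toy_observed = 16     # made-up observation: 16 heads in 20 tosses
toy_tosses = 20
toy_trials = 2000

# Simulate the number of heads under the fair-coin model, many times.
toy_statistics = np.array([])
for _ in range(toy_trials):
    heads = np.count_nonzero(np.random.choice(coin, toy_tosses) == 'heads')
    toy_statistics = np.append(toy_statistics, heads)

# Proportion of simulations with at least as many heads as observed.
proportion_at_least = np.count_nonzero(toy_statistics >= toy_observed) / toy_trials
print(proportion_at_least)
```

A small proportion suggests the observation would be unusual if the model were true. A large one suggests the observation is consistent with the model.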

#### Task 17 📍🔎

<!-- BEGIN QUESTION -->

Given your observed value, do you believe that Jade's model is reasonable, or is our alternative more likely? Explain your answer using the distribution drawn in the previous problem. 


_Points:_ 2

_Type your answer here, replacing this text._

<!-- END QUESTION -->

## Submit your Homework to Canvas

Once you have finished working on the homework tasks, prepare to submit your work in Canvas by completing the following steps.

1. In the related Canvas Assignment page, check the rubric to know how you will be scored for this assignment.
2. Double-check that you have run the code cell near the end of the notebook that contains the command `grader.check_all()`. This command runs all of the tests on your responses to the auto-graded tasks marked with 📍.
3. Double-check your responses to the manually graded tasks marked with 📍🔎.
4. Select the menu items "File" and "Save Notebook" in the notebook's Toolbar to save your work and create a specific checkpoint in the notebook's work history.
5. Select the menu items "File" and "Download" in the notebook's Toolbar to download the notebook (.ipynb) file.
6. In the related Canvas Assignment page, click Start Assignment or New Attempt to upload the downloaded .ipynb file.

**Keep in mind that the autograder does not always check for correctness. Sometimes it just checks for the format of your answer, so passing the autograder for a question does not mean you got the answer correct for that question.**

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()