In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab_6.ipynb")

# Lab 6: Simulations, iterations, and probability

We will go over [iteration](https://www.inferentialthinking.com/chapters/09/2/Iteration.html) and [simulations](https://www.inferentialthinking.com/chapters/09/3/Simulation.html), as well as introduce the concept of [randomness](https://www.inferentialthinking.com/chapters/09/Randomness.html).

**Reading**: 
* [Randomness](https://www.inferentialthinking.com/chapters/09/randomness.html) 
* [Sampling and Empirical Distributions](https://www.inferentialthinking.com/chapters/10/sampling-and-empirical-distributions.html)
* [Testing Hypotheses](https://www.inferentialthinking.com/chapters/11/testing-hypotheses.html)

First, set up the tests and imports by running the cell below.

In [1]:
# Run this cell, but please don't change it.

# These lines import the Numpy and Datascience modules.
import numpy as np
from datascience import *

# These lines do some fancy plotting magic
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

## 1. Poutine and Conditionals

In Python, the Boolean data type contains only two values: `True` and `False`. Expressions containing comparison operators such as `<` (less than), `>` (greater than), and `==` (equal to) evaluate to Boolean values. A list of common comparison operators can be found below!

<img src="comparisons.png">

Run the cell below to see an example of a comparison operator in action.

In [2]:
3 < 3 + 1

We can even assign the result of a comparison operation to a variable.

In [3]:
result = 10 / 2 == 5
result

Arrays are compatible with comparison operators. The output is an array of boolean values.

In [4]:
make_array(1, 5, 7, 8, 3, -1) > 3

One day, when you come home after a long week, you see a hot bowl of poutine waiting on the dining table! Let's say that whenever you take a fry from the bowl, it will either have only **cheese**, only **gravey**, **both** cheese and gravey, or **neither** cheese nor gravey (a sad fry indeed). 

Let's try to simulate taking fries from the bowl at random using the function `np.random.choice(...)`.

### `np.random.choice`

`np.random.choice` picks one item at random from the given array. It is equally likely to pick any of the items. Run the cell below several times, and observe how the results change.

In [5]:
fries = make_array('cheese', 'gravey', 'both', 'neither')
np.random.choice(fries)

To repeat this process multiple times, pass in an int `n` as the second argument to return `n` random choices. By default, `np.random.choice` samples **with replacement**, so the same item can appear more than once, and returns an *array* of items.
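
The difference between sampling with and without replacement can be seen in a small sketch. The `toppings` array is a hypothetical example, and it is built with `np.array` rather than `make_array` so that the snippet only needs NumPy.

```python
import numpy as np

# Hypothetical array of toppings, used only for this illustration.
toppings = np.array(['ketchup', 'mayo', 'vinegar', 'salt'])

# Default behaviour: sampling with replacement, so repeats are possible.
with_replacement = np.random.choice(toppings, 6)

# replace=False samples without replacement, so no item appears twice.
# The sample size cannot exceed the length of the array in this case.
without_replacement = np.random.choice(toppings, 4, replace=False)

print(with_replacement)
print(without_replacement)
```

Because the draws are random, the printed arrays will change from run to run.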

Run the next cell to see an example of sampling with replacement 10 times from the `fries` array.

In [6]:
np.random.choice(fries, 10)

To count the number of times a certain type of fry is randomly chosen, we can use `np.count_nonzero`.

### `np.count_nonzero`

`np.count_nonzero` counts the number of non-zero values that appear in an array. When an array of Boolean values is passed to the function, it counts the number of `True` values (remember that in Python, `True` is coded as 1 and `False` is coded as 0).
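
Combining a comparison with `np.count_nonzero` gives a count of the items that satisfy a condition. Here is a minimal sketch on a numeric array; the names `values`, `is_large`, and `num_large` are made up for this illustration.

```python
import numpy as np

values = np.array([1, 5, 7, 8, 3, -1])

# The comparison produces one Boolean per element.
is_large = values > 3

# Counting the True values tells us how many elements exceed 3.
num_large = np.count_nonzero(is_large)

print(is_large)
print(num_large)
```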

Run the next cell to see an example that uses `np.count_nonzero`.

In [7]:
np.count_nonzero(make_array(True, False, False, True, True))

**Question 1.** Assume we took ten fries at random, and stored the results in an array called `ten_fries` as done below. Find the number of fries with only cheese using code (do not hardcode the answer).  

*Hint:* Our solution involves a comparison operator (e.g. `==`, `<`, ...) and the `np.count_nonzero` function.

<!--
BEGIN QUESTION
name: q11
-->

In [8]:
ten_fries = make_array('neither', 'cheese', 'both', 'both', 'cheese', 'gravey', 'both', 'neither', 'cheese', 'both')
number_cheese = ...
number_cheese

In [None]:
grader.check("q11")

**Conditional Statements**

A conditional statement is a multi-line statement that allows Python to choose among different alternatives based on the truth value of an expression.

Here is a basic example.

```
def sign(x):
    if x > 0:
        return 'Positive'
    else:
        return 'Negative'
```

If the input `x` is greater than `0`, we return the string `'Positive'`. Otherwise (including when `x` is exactly `0`), we return `'Negative'`.

If we want to test multiple conditions at once, we use the following general format.

```
if <if expression>:
    <if body>
elif <elif expression 0>:
    <elif body 0>
elif <elif expression 1>:
    <elif body 1>
...
else:
    <else body>
```

Only the body for the first conditional expression that is true will be evaluated. Each `if` and `elif` expression is evaluated and considered in order, starting at the top. As soon as a true value is found, the corresponding body is executed, and the rest of the conditional statement is skipped. If none of the `if` or `elif` expressions are true, then the `else body` is executed. 
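
As a small illustration of this format, the sketch below extends the `sign` example with an `elif` branch. The function name `sign_with_zero` and the `'Zero'` label are hypothetical and are not used elsewhere in this lab.

```python
def sign_with_zero(x):
    """Hypothetical variant of `sign` that treats zero as its own case."""
    if x > 0:
        # Checked first; if true, the remaining branches are skipped.
        return 'Positive'
    elif x == 0:
        # Only reached when the `if` expression was false.
        return 'Zero'
    else:
        # Reached when none of the expressions above were true.
        return 'Negative'

print(sign_with_zero(4), sign_with_zero(0), sign_with_zero(-2))
```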

For more examples and explanation, refer to the section on conditional statements [here](https://www.inferentialthinking.com/chapters/09/1/conditional-statements.html).

**Question 2.** Complete the following conditional statement so that the string `'More please'` is assigned to the variable `say_please` if the number of fries with cheese in `ten_fries` is less than `5`.

*Hint*: You should be using `number_cheese` from Question 1.

<!--
BEGIN QUESTION
name: q12
-->

In [10]:
say_please = '?'

if ...:
    say_please = 'More please' 

say_please

In [None]:
grader.check("q12")

**Question 3.** Write a function called `poutine_reaction` that returns a reaction (as a string) based on the type of poutine passed in as an argument. Use the table below to match the poutine type to the appropriate reaction.

| Ingredient | Reaction |
| ---------- |:--------:|
| gravey     | Savoury! |
| cheese     | Cheesy!  |
| neither    | What?    |
| both       | Mmmmm    |


*Hint:* If you're failing the test, double check the spelling of your reactions. Case matters.

<!--
BEGIN QUESTION
name: q13
-->

In [12]:
def poutine_reaction(poutine):
    if poutine == "cheese":
        ...
    ... :
        ...
    ... :
        ...
    ... :
        ...


savoury_poutine = poutine_reaction('gravey')
savoury_poutine

In [None]:
grader.check("q13")

**Question 4.** Create a table `ten_fries_reactions` that consists of the fries in `ten_fries` as well as the reactions for each of those fries. The columns should be called `Fries` and `Reactions`.

*Hint:* Use the `apply` method. 

<!--
BEGIN QUESTION
name: q14
-->

In [18]:
ten_fries_tbl = Table().with_column('Fries', ten_fries)
...
ten_fries_reactions

In [None]:
grader.check("q14")

**Question 5.** Using code, find the number of 'Mmmmm' reactions for the fries in `ten_fries_reactions`.

<!--
BEGIN QUESTION
name: q15
-->

In [20]:
number_mmm_reactions = ...
number_mmm_reactions

In [None]:
grader.check("q15")

## 3. Simulations and For Loops
Using a `for` statement, we can perform a task multiple times. This is known as iteration.

One use of iteration is to loop through a set of values. For instance, we can print out all of the colors of the rainbow.

In [23]:
rainbow = make_array("red", "orange", "yellow", "green", "blue", "indigo", "violet")

for color in rainbow:
    print(color)

We can see that the indented part of the `for` loop, known as the body, is executed once for each item in `rainbow`. The name `color` is assigned to the next value in `rainbow` at the start of each iteration. Note that the name `color` is arbitrary; we could easily have named it something else. The important thing is we stay consistent throughout the `for` loop. 

In [24]:
for another_name in rainbow:
    print(another_name)

In general, however, we would like the variable name to be somewhat informative. 
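
A `for` loop is also useful for building up a result one item at a time. The sketch below, with the hypothetical name `total_letters`, starts a running total at zero and updates it once per iteration. It uses `np.array` instead of `make_array` so that it only depends on NumPy.

```python
import numpy as np

rainbow = np.array(["red", "orange", "yellow", "green", "blue", "indigo", "violet"])

# Start the running total before the loop begins.
total_letters = 0

for color in rainbow:
    # Add the length of the current color name to the total.
    total_letters = total_letters + len(color)

print(total_letters)
```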

**Question 1.** In the following cell, we've loaded the text of _Pride and Prejudice_ by Jane Austen, split it into individual words, and stored these words in an array `p_and_p_words`. Using a `for` loop, assign `longer_than_five` to the number of words in the novel that are more than 5 letters long.

*Hint*: You can find the number of letters in a word with the `len` function.

<!--
BEGIN QUESTION
name: q31
-->

In [25]:
austen_string = open('Austen_PrideAndPrejudice.txt', encoding='utf-8').read()
p_and_p_words = np.array(austen_string.split())

longer_than_five = ...

# a for loop would be useful here



longer_than_five

In [None]:
grader.check("q31")

**Question 2.** Using a simulation with 10,000 trials, assign `num_different` to the number of trials in which two words picked uniformly at random (with replacement) from _Pride and Prejudice_ have different lengths.

*Hint 1*: What function did we use in section 1 to sample at random with replacement from an array? 

*Hint 2*: Remember that `!=` checks for non-equality between two items.

<!--
BEGIN QUESTION
name: q32
-->

In [27]:
trials = 10000
num_different = ...

for ... in ...:
    ...

num_different

In [None]:
grader.check("q32")

We can also use `np.random.choice` to simulate multiple trials.
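
For instance, a single call can stand in for many repeated trials of a simple experiment. The following sketch simulates 100 flips of a fair coin and counts the heads; the names `coin`, `num_flips`, `flips`, and `num_heads` are chosen just for this example.

```python
import numpy as np

coin = np.array(['heads', 'tails'])
num_flips = 100

# One call produces all 100 flips, sampled with replacement.
flips = np.random.choice(coin, num_flips)

# Count how many of the simulated flips came up heads.
num_heads = np.count_nonzero(flips == 'heads')

print(num_heads)
```

The count will vary from run to run, but it should usually land somewhere near 50.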

**Question 3.** Allie is playing darts. Her dartboard contains ten equal-sized zones with point values from 1 to 10. Write code that simulates her total score after 1000 dart tosses.

*Hint:* First decide the possible values you can take in the experiment (point values in this case). Then use `np.random.choice` to simulate Allie's tosses. Finally, sum up the scores to get Allie's total score.

<!--
BEGIN QUESTION
name: q33
-->

In [29]:
num_tosses = 1000
simulated_tosses = ...
total_score = ...
total_score

In [None]:
grader.check("q33")

## 4. 2019 Football Season


Graham is trying to analyze how well the Edmonton football team (Edm) performed in the 2019 season. A football game is divided into four periods, called quarters. The number of points Edm scored in each quarter and the number of points their opponent scored in each quarter are stored in a file called `edm.csv`.

In [32]:
# Just run this cell
# Read in the edm csv file
games = Table().read_table("edm.csv")
games.show()

Let's start by finding the total points each team scored in a game.

**Question 1.** Write a function called `sum_scores`.  It should take four arguments, where each argument is the team's score for that quarter. It should return the team's total score for that game.


<!--
BEGIN QUESTION
name: q4_1
manual: false
-->

In [33]:
def sum_scores(...):
    '''Returns the total score calculated by adding up the score of each quarter'''
    ...

sum_scores(14, 7, 3, 0) #DO NOT CHANGE THIS LINE

In [None]:
grader.check("q4_1")

**Question 2.** Create a new table `final_scores` with three columns in this *specific* order: `Opponent`, `Edm Score`, `Opponent Score`. You will have to create the `Edm Score` and `Opponent Score` columns. Use the function `sum_scores` you just defined in the previous question for this problem.

*Hint:* If you want to apply a function that takes in multiple arguments, you can pass multiple column names as arguments in `tbl.apply()`. The column values will be passed into the corresponding arguments of the function. Take a look at the Python reference for syntax.

*Tip:* If you’re running into issues creating `final_scores`, check that `edm_scores` and `opponent_scores` output what you want.
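
To build intuition for what passing several column names to `tbl.apply()` does, here is a conceptual sketch in plain NumPy. It is not the `datascience` implementation: `apply_rowwise`, `add_two`, `col_a`, and `col_b` are hypothetical names, and the helper only mimics the idea of calling a function once per row with one value from each column.

```python
import numpy as np

def apply_rowwise(fn, *columns):
    """Hypothetical helper: call fn once per row, passing one value
    from each column, and collect the results in an array."""
    return np.array([fn(*row_values) for row_values in zip(*columns)])

def add_two(x, y):
    """Toy two-argument function used only for this illustration."""
    return x + y

col_a = np.array([1, 2, 3])
col_b = np.array([10, 20, 30])

# The first column feeds the first argument, the second column the second.
sums = apply_rowwise(add_two, col_a, col_b)
print(sums)
```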


<!--
BEGIN QUESTION
name: q4_2
manual: false
-->

In [37]:
edm_scores = ...
opponent_scores = ...
final_scores = ...
final_scores

In [None]:
grader.check("q4_2")

We can get specific row objects from a table. Use `tbl.row(n)` to get the row at index `n` (rows are numbered starting from 0). `row.item("column_name")` selects the element in that row that corresponds to `column_name`. Here's an example:

In [41]:
# Just run this cell
games.row(10)

In [42]:
# Just run this cell
games.row(10).item("Edm 4Q")

**Question 3.** We want to see whether or not Edm won a particular game. Write a function called `did_edm_win`.  It should take one argument: a row object from the `final_scores` table. It should return `True` if Edm's score was greater than the opponent's score, and `False` otherwise.


<!--
BEGIN QUESTION
name: q4_3
manual: false
-->

In [43]:
def did_edm_win(row):
    ...

In [None]:
grader.check("q4_3")

**Question 4.** Graham wants to see how Edm did against every opponent during the 2019 season. Using the `final_scores` table, assign `results` to an array of `True` and `False` values that correspond to whether or not Edm won. Add the `results` array to the `final_scores` table, and assign this to `final_scores_with_results`. Then, respectively assign the number of wins and losses Edm had to `edm_wins` and `edm_losses`.

*Hint*: When you pass only a function name and no column labels to `tbl.apply()`, the function gets applied to every row in `tbl`.


<!--
BEGIN QUESTION
name: q4_4
manual: false
-->

In [46]:
results = ...
final_scores_with_results = ...
edm_wins = ...
edm_losses = ...

# Don't delete or edit the following line:
print(f"In the 2019 Season, Edm Football won {edm_wins} games and lost {edm_losses} games.")

In [None]:
grader.check("q4_4")

## 5. Probability


We will be testing some probability concepts that were introduced in lecture. For all of the following problems, we will introduce a problem statement and give you a proposed answer. You must assign the provided variable to one of the following three integers, depending on whether the proposed answer is too low, too high, or correct. 

1. Assign the variable to 1 if you believe our proposed answer is too high.
2. Assign the variable to 2 if you believe our proposed answer is too low.
3. Assign the variable to 3 if you believe our proposed answer is correct.


You are more than welcome to create more cells in this notebook to use for arithmetic.
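
One way to sanity-check a hand calculation is to compare it against a quick simulation. The sketch below does this for a separate toy problem, the chance of at least one head in three tosses of a fair coin. All names (`exact_chance`, `trials`, `successes`, `estimated_chance`) are hypothetical.

```python
import numpy as np

# Complement rule: P(at least one head) = 1 - P(no heads in 3 tosses).
exact_chance = 1 - (1 / 2) ** 3

coin = np.array(['heads', 'tails'])
trials = 10000
successes = 0

for _ in np.arange(trials):
    tosses = np.random.choice(coin, 3)
    # A trial succeeds if at least one of the three tosses is heads.
    if np.count_nonzero(tosses == 'heads') >= 1:
        successes = successes + 1

estimated_chance = successes / trials
print(exact_chance, estimated_chance)
```

With this many trials, the estimate should typically fall close to the exact value, though it will not match it exactly.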

**Question 1.** You roll a 6-sided die 10 times. What is the chance of getting 10 sixes?

Our proposed answer: $$\left(\frac{1}{6}\right)^{10}$$

Assign `ten_sixes` to either 1, 2, or 3 depending on if you think our answer is too high, too low, or correct. 

<!--
BEGIN QUESTION
name: q5_1
manual: false
-->

In [51]:
ten_sixes = ...
ten_sixes

In [None]:
grader.check("q5_1")

**Question 2.** Take the same problem set-up as before, rolling a fair die 10 times. What is the chance that every roll is less than or equal to 5?

Our proposed answer: $$1 - \left(\frac{1}{6}\right)^{10}$$

Assign `five_or_less` to either 1, 2, or 3. 

<!--
BEGIN QUESTION
name: q5_2
manual: false
-->

In [54]:
five_or_less = ...
five_or_less

In [None]:
grader.check("q5_2")

**Question 3.** Assume we are picking a lottery ticket. We must choose three distinct numbers from 1 to 1000 and write them on a ticket. Next, someone picks three numbers one by one from a bowl with numbers from 1 to 1000 each time without putting the previous number back in. We win if our numbers are all called in order. 

If we decide to play the game and pick our numbers as 12, 140, and 890, what is the chance that we win? 

Our proposed answer: $$\left(\frac{3}{1000}\right)^3$$

Assign `lottery` to either 1, 2, or 3. 

<!--
BEGIN QUESTION
name: q5_3
manual: false
-->

In [57]:
lottery = ...

In [None]:
grader.check("q5_3")

**Question 4.** Assume we have two lists, list A and list B. List A contains the numbers [20,10,30], while list B contains the numbers [10,30,20,40,30]. We choose one number from list A randomly and one number from list B randomly. What is the chance that the number we drew from list A is larger than or equal to the number we drew from list B?

Our proposed answer: $$1/5$$

Assign `list_chances` to either 1, 2, or 3. 

*Hint: Consider the different possible ways that the items in List A can be greater than or equal to items in List B. Try working out your thoughts with pencil and paper. What do you think the correct answer will be close to?*

<!--
BEGIN QUESTION
name: q5_4
manual: false
-->

In [60]:
list_chances = ...

In [None]:
grader.check("q5_4")

## 6. Monkeys Typing Shakespeare
##### (...or at least the string "Cmput191")

A monkey is banging repeatedly on the keys of a typewriter. Each time, the monkey is equally likely to hit any of the 26 lowercase letters of the English alphabet, the 26 uppercase letters of the English alphabet, or the 10 digits from 0 to 9, regardless of what it has hit before. There are no other keys on the keyboard.  

This question is inspired by a mathematical theorem called the Infinite monkey theorem (<https://en.wikipedia.org/wiki/Infinite_monkey_theorem>), which postulates that if you put a monkey in the situation described above for an infinite time, they will eventually type out all of Shakespeare’s works.

**Question 1.** Suppose the monkey hits the keyboard 8 times.  Compute the chance that the monkey types the sequence `Cmput191`.  (Call this `data_chance`.) Use algebra and type in an arithmetic expression that Python can evaluate.

<!--
BEGIN QUESTION
name: q6_1
manual: false
-->

In [63]:
data_chance = ...
data_chance

In [None]:
grader.check("q6_1")

**Question 2.** Write a function called `simulate_key_strike`.  It should take **no arguments**, and it should return a random one-character string that is equally likely to be any of the 26 lower-case English letters, 26 upper-case English letters, or any number between 0-9 (inclusive). 

<!--
BEGIN QUESTION
name: q6_2
manual: false
-->

In [65]:
# We have provided the code below to compute a list called keys,
# containing all the lower-case English letters, upper-case English letters, and the digits 0-9 (inclusive).  Print it if you
# want to verify what it contains.
import string
keys = list(string.ascii_lowercase + string.ascii_uppercase + string.digits)

def simulate_key_strike():
    """Simulates one random key strike."""
    ...

# An example call to your function:
simulate_key_strike()

In [None]:
grader.check("q6_2")

**Question 3.** Write a function called `simulate_several_key_strikes`.  It should take one argument: an integer specifying the number of key strikes to simulate. It should return a string containing that many characters, each one obtained from simulating a key strike by the monkey.

*Hint:* If you make a list or array of the simulated key strikes called `key_strikes_array`, you can convert that to a string by calling `"".join(key_strikes_array)`
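
As a quick illustration of `join`, the sketch below glues the items of a list and of an array into single strings. The names `letters`, `word`, `more_letters`, and `another_word` are made up for this example.

```python
import numpy as np

# Joining a list of one-character strings with an empty separator.
letters = ['c', 'a', 't']
word = "".join(letters)

# The same call works on a NumPy array of strings.
more_letters = np.array(['d', 'o', 'g'])
another_word = "".join(more_letters)

print(word, another_word)
```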

<!--
BEGIN QUESTION
name: q6_3
manual: false
-->

In [69]:
def simulate_several_key_strikes(num_strikes):
    ...

# An example call to your function:
simulate_several_key_strikes(11)

In [None]:
grader.check("q6_3")

**Question 4.** Call `simulate_several_key_strikes` 5000 times, each time simulating the monkey striking 8 keys.  Compute the proportion of times the monkey types `"Cmput191"`, calling that proportion `data_proportion`.

<!--
BEGIN QUESTION
name: q6_4
manual: false
-->

In [73]:
...
data_proportion

In [None]:
grader.check("q6_4")

<!-- BEGIN QUESTION -->

**Question 5.** Check the value your simulation computed for `data_proportion`.  Is your simulation a good way to estimate the chance that the monkey types `"Cmput191"` in 8 strikes (the answer to question 1)?  Why or why not?

<!--
BEGIN QUESTION
name: q6_5
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**Question 6.** Compute the chance that the monkey types the letter `"t"` at least once in 5 strikes.  Call it `t_chance`. Use algebra and type in an arithmetic expression that Python can evaluate. 

<!--
BEGIN QUESTION
name: q6_6
manual: false
-->

In [75]:
t_chance = ...
t_chance

In [None]:
grader.check("q6_6")

<!-- BEGIN QUESTION -->

**Question 7.** Do you think that a computer simulation is more or less effective for estimating `t_chance` than it was for estimating `data_chance`? Why or why not? (You don't need to write a simulation, but it is an interesting exercise.)

<!--
BEGIN QUESTION
name: q6_7
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



This lab is altered from the original [Berkeley data-8 course](http://data8.org/), which is licensed under the [Creative Commons license](https://creativecommons.org/licenses/by-nc/4.0/).

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export()