# Data 80A/180A Data Science for Everyone

# Lab 7: Simulations

#### Today's lab

Welcome to Lab 7! In today's lab, you'll learn about:


1. [9 Randomness](https://www.inferentialthinking.com/chapters/09/Randomness.html)
2. [9.2 Iteration](https://www.inferentialthinking.com/chapters/09/2/Iteration.html) 
3. [9.3 Simulations](https://www.inferentialthinking.com/chapters/09/3/Simulation.html)
4. [9.5 Probability](https://inferentialthinking.com/chapters/09/5/Finding_Probabilities.html)

First, set up the tests and imports by running the cell below.

In [None]:
# Run this cell, but please don't change it.

# These lines import the Numpy and Datascience modules.
import numpy as np
from datascience import *

# These lines do some fancy plotting magic
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
              
import otter
grader = otter.Notebook()

## 1. Conditional Statements and Randomness 

In Python, the boolean data type contains only two unique values:  `True` and `False`. Expressions containing comparison operators such as `<` (less than), `>` (greater than), and `==` (equal to) evaluate to Boolean values. A list of common comparison operators can be found below!

<img src="comparisons.png">

Run the cells below to see an example of a comparison operator in action.

In [None]:
3 > 1 + 1

In [None]:
x = 7
4 <= x <= 10
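
The chained comparison in the cell above can be read as two comparisons joined by `and`. The following is a minimal sketch of that equivalence; the name `y` is introduced only for illustration.

```python
x = 7

# A chained comparison and its spelled-out form give the same result.
chained = 4 <= x <= 10
spelled_out = (4 <= x) and (x <= 10)
print(chained, spelled_out)  # True True

# A value outside the range fails the chained comparison.
y = 12  # hypothetical value, not part of the lab
outside = 4 <= y <= 10
print(outside)  # False
```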

We can even assign the result of a comparison operation to a variable.

In [None]:
result = (10 / 2 == 5)  # the condition is inside the parentheses
result

Arrays are compatible with comparison operators. The output is an array of boolean values.

In [None]:
make_array(1, 5, 7, 8, 3, -1) > 3

One day, when you come home after a long week, you see a hot bowl of nachos waiting on the dining table! Let's say that whenever you take a nacho from the bowl, it will either have only **cheese**, only **salsa**, **both** cheese and salsa, or **neither** cheese nor salsa (a sad tortilla chip indeed). 

Let's simulate taking nachos from the bowl at random using the function `np.random.choice(...)`.

### `np.random.choice`

`np.random.choice` picks one item at random from the given **array**. It is equally likely to pick any of the items. Run the cell below several times, and observe how the results change.

In [None]:
nachos = make_array('cheese', 'salsa', 'both', 'neither')
np.random.choice(nachos)

To repeat this process multiple times, pass in an int `n` as the second argument to return `n` random choices. By default, `np.random.choice` samples **with replacement**, so the same item can appear more than once, and returns an *array* of items.

Run the next cell to see an example of sampling with replacement 10 times from the `nachos` array.

In [None]:
np.random.choice(nachos, 10)
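
To make the phrase "with replacement" more concrete, here is a small sketch that contrasts the default behavior with NumPy's `replace=False` option. The `toppings` array is a hypothetical stand-in built with `np.array` instead of `make_array`, so the cell runs without the `datascience` package.

```python
import numpy as np

# Hypothetical stand-in for the nachos array, built with plain NumPy.
toppings = np.array(['cheese', 'salsa', 'both', 'neither'])

# Default: sample with replacement, so repeats are possible and we can
# draw more items than the array contains.
with_replacement = np.random.choice(toppings, 10)

# replace=False samples without replacement: each item appears at most
# once, and the sample size cannot exceed the array length.
without_replacement = np.random.choice(toppings, 4, replace=False)

print(with_replacement)
print(without_replacement)
```

Because the draws are random, the printed arrays will differ from run to run.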

To count the number of times a certain type of nacho is randomly chosen, we can use `np.count_nonzero`.

### `np.count_nonzero`

`np.count_nonzero` counts the number of non-zero values that appear in an array. When an array of boolean values is passed to the function, it counts the number of `True` values (remember that in Python, `True` is coded as 1 and `False` is coded as 0).

Run the next cell to see an example that uses `np.count_nonzero`.

In [None]:
np.count_nonzero(make_array(True, False, False, True, True))
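
Since comparing an array to a value produces an array of booleans, the two ideas can be combined to count matching items. The sketch below uses a made-up `flips` array (not part of the lab) and plain `np.array` so that it runs on its own.

```python
import numpy as np

# Hypothetical example data: the results of six coin flips.
flips = np.array(['heads', 'tails', 'heads', 'heads', 'tails', 'heads'])

# Comparing an array to a value produces an array of booleans.
is_heads = flips == 'heads'

# Counting the True values gives the number of matching items.
num_heads = np.count_nonzero(is_heads)
print(num_heads)  # 4
```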

**Question 1.1.** Assume we have an array `ten_nachos` as shown below. Find the number of nachos with only `cheese` using code (do not hardcode the answer).  

*Hint:* Our solution involves a comparison operator (e.g. `==`, `<`, ...) and the `np.count_nonzero` function.

In [None]:
ten_nachos = make_array('neither', 'cheese', 'both', 'both', 'cheese', 'salsa', 'both', 'neither', 'cheese', 'both')
number_cheese = ...
number_cheese

In [None]:
grader.check("q11")

**Conditional Statements**

A conditional statement is a multi-line statement that allows Python to choose among different alternatives based on the truth value of an expression.

Here is a basic example.

```
def sign(x):
    if x > 0:
        return 'Positive'
    elif x == 0:
        return 'Zero'
    else:
        return 'Negative'
```

If the input `x` is greater than `0`, we return the string `'Positive'`. If the input `x` is `0`, we return the string `'Zero'`. Otherwise, we return `'Negative'`.
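
As a quick check of the description above, the sketch below repeats the `sign` definition and calls it once for each branch. The input values are arbitrary.

```python
def sign(x):
    if x > 0:
        return 'Positive'
    elif x == 0:
        return 'Zero'
    else:
        return 'Negative'

# One call per branch of the conditional statement.
print(sign(3))     # Positive
print(sign(0))     # Zero
print(sign(-2.5))  # Negative
```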

If we want to test multiple conditions at once, we use the following general format.

```
if <if expression>:
    <if body>
elif <elif expression 0>:
    <elif body 0>
elif <elif expression 1>:
    <elif body 1>
...
else:
    <else body>
```

Only the body for the first conditional expression that is true will be evaluated. Each `if` and `elif` expression is evaluated and considered in order, starting at the top. As soon as a true value is found, the corresponding body is executed, and the rest of the conditional statement is skipped. If none of the `if` or `elif` expressions are true, then the `else body` is executed. 
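
Because only the first true branch runs, the order of the conditions matters. The sketch below illustrates this with two hypothetical functions, `size_label` and `size_label_fixed`, which are not part of the lab.

```python
def size_label(n):
    # For n = 150 both conditions are true, but only the first
    # true branch runs, so 'huge' can never be returned.
    if n > 10:
        return 'big'
    elif n > 100:
        return 'huge'
    else:
        return 'small'

def size_label_fixed(n):
    # Checking the more specific condition first makes 'huge' reachable.
    if n > 100:
        return 'huge'
    elif n > 10:
        return 'big'
    else:
        return 'small'

print(size_label(150))        # big
print(size_label_fixed(150))  # huge
```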

For more examples and explanation, refer to the section on conditional statements [here](https://inferentialthinking.com/chapters/09/1/Conditional_Statements.html).

**Question 1.2.** Complete the following conditional statement so that if the number of nachos with cheese in `ten_nachos` is less than `5`, the string `'More please!'` is assigned to the variable `more_nachos`.

*Hint*: You should be using `number_cheese` from Question 1.1.

In [None]:
more_nachos = 'No more!'

if ...:
    more_nachos = 'More please!'
    
more_nachos


In [None]:
grader.check("q12")

**Question 1.3.** Write a function called `nacho_reaction` that returns a reaction (as a string) based on the type of nacho passed in as an argument. Use the table below to match the nacho type to the appropriate reaction.

<img src="nacho_reactions.png">

*Hint:* If you're failing the test, double check the spelling of your reactions.

In [None]:

def nacho_reaction(nacho):
    if nacho == "cheese":
        return ...
    ... :
        ...
    ... :
        ...
    ... :
        ...

spicy_nacho = nacho_reaction('salsa')
spicy_nacho

In [None]:
grader.check("q13")

**Question 1.4.** Create a table `ten_nachos_reactions` that consists of the nachos in `ten_nachos` as well as the reactions for each of those nachos. The columns should be called `Nachos` and `Reactions`.

*Hint:* Use the `apply` method. 

The table should look like:  
<img src="nachos.png" width="150">

In [None]:
ten_nachos_reactions = Table().with_column('Nachos', ten_nachos)
...

ten_nachos_reactions

In [None]:
grader.check("q14")

**Question 1.5.** Using code, find the number of 'Wow!' reactions for the nachos in `ten_nachos_reactions`.

*Hint:* Use `np.count_nonzero` to count the number of rows with the value 'Wow!' in the `Reactions` column.  

In [None]:
number_wow_reactions = ...
number_wow_reactions

In [None]:
grader.check("q15")

## 2. For Loops and Simulation

Using a `for` statement, we can perform a task multiple times. This is known as iteration.

One use of iteration is to loop through a set of values. For instance, we can print out all of the colors of the rainbow.

In [None]:
rainbow = make_array("red", "orange", "yellow", "green", "blue", "indigo", "violet")

for color in rainbow:
    print(color)

We can see that the indented part of the `for` loop, known as the body, is executed once for each item in `rainbow`. The name `color` is assigned to the next value in `rainbow` at the start of each iteration. Note that the name `color` is arbitrary; we could easily have named it something else. The important thing is we stay consistent throughout the `for` loop. 

In [None]:
for another_name in rainbow:
    print(another_name)

In general, however, we would like the variable name to be somewhat informative. 
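
A `for` loop is often combined with a conditional statement to count the items that meet some condition. The following is a minimal sketch of that pattern using made-up data; the names `temperatures` and `warm_days` are hypothetical.

```python
# Hypothetical data: daily high temperatures.
temperatures = [61, 75, 68, 82, 90, 71]

# Start the count at 0, then add 1 each time the condition holds.
warm_days = 0
for temp in temperatures:
    if temp > 70:
        warm_days = warm_days + 1

print(warm_days)  # 4
```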

**Question 2.1.** In the following cell, we've loaded the text of _Pride and Prejudice_ by Jane Austen, split it into individual words, and stored these words in an array `p_and_p_words`. Using a `for` loop, assign `longer_than_five_count` to the number of words in the novel that are more than 5 letters long.

*Hint*: You can find the number of letters in a word with the `len` function.

In [None]:
austen_string = open('Austen_PrideAndPrejudice.txt', encoding='utf-8').read()
p_and_p_words = np.array(austen_string.split())

longer_than_five_count = 0

for word in p_and_p_words:
    ...


longer_than_five_count        

In [None]:
grader.check("q21") 

**Question 2.2.** Using a `for` loop, modify your solution for Question 2.1 so that your code counts:

* `shorter_than_five` -- the number of words in the novel that are less than 5 letters long

* `between_five_and_nine` -- the number of words in the novel that are between 5 (inclusive) and 9 (inclusive) letters long

* `between_ten_and_fourteen` -- the number of words in the novel that are between 10 (inclusive) and 14 (inclusive) letters long

* `longer_than_fourteen` -- the number of words in the novel that are more than 14 letters long

In [None]:
shorter_than_five = 0
between_five_and_nine = 0
between_ten_and_fourteen = 0
longer_than_fourteen = 0

for word in p_and_p_words:
    ...
    
    
    
        
print('Shorter than 5 count is', shorter_than_five)
print('Between 5 and 9 count is', between_five_and_nine)
print('Between 10 and 14 count is', between_ten_and_fourteen)
print('Longer than 14 count is', longer_than_fourteen)

In [None]:
grader.check("q22")

Another use of iteration is to loop through some code a fixed number of times.  For example, if we want to loop through some code `6` times, we write 

    for i in np.arange(6):
        # body of loop
    
where `i` acts as a counter, starting at `0` in the first iteration of the loop, then `i` becomes `1` in the second iteration of the loop, and so on until `i` ends at `5` in the sixth iteration of the loop.
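
The sketch below first records the values that `i` takes, then shows that the loop body does not have to use `i` at all: it can simply repeat a random action a fixed number of times. The `coin` array and the other names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Record the value of the counter in each iteration.
counter_values = []
for i in np.arange(6):
    counter_values.append(int(i))

print(counter_values)  # [0, 1, 2, 3, 4, 5]

# The body can ignore the counter. Here the loop repeats a random
# coin flip 6 times and counts how many flips come up heads.
coin = np.array(['heads', 'tails'])
num_heads = 0
for i in np.arange(6):
    if np.random.choice(coin) == 'heads':
        num_heads = num_heads + 1

print(num_heads)
```

The second printed value changes from run to run, but it is always between 0 and 6.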

**Question 2.3.** Using a simulation with 10,000 trials, assign `num_different` to the number of trials in which two words picked uniformly at random (with replacement) from _Pride and Prejudice_ have different lengths.

*Hint 1*: What function did we use in Section 1 to sample at random with replacement from an array? 

*Hint 2*: Remember that `!=` checks for non-equality between two items.

In [None]:
trials = 10000
num_different = 0

for i in np.arange(trials):
    # first: pick two words at random from p_and_p_words
    ...
    
    # second: if the lengths of these two words are different, then increase num_different by 1
    ...

    
        
num_different

In [None]:
grader.check("q23")

We can also use `np.random.choice` to simulate multiple trials.

**Question 2.4.** Allie is playing darts. Her dartboard contains ten equal-sized zones with point values from 1 to 10. Write code that simulates her total score after 1000 dart tosses.

*Hint:* First decide the possible values you can take in the experiment (point values in this case). Then use `np.random.choice` to simulate Allie's tosses. Finally, sum up the scores to get Allie's total score.

In [None]:
possible_point_values = ...  # create an array containing values from 1 to 10 
num_tosses = 1000
simulated_tosses = ...
total_score = sum(...)

total_score

In [None]:
grader.check("q24")

## 3. Probability


Now we turn our attention to probability. For a refresher, read section [9.5 Probability](https://inferentialthinking.com/chapters/09/5/Finding_Probabilities.html).
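
As a brief warm-up, the sketch below works through two rules on a made-up experiment (rolling a fair die and flipping a fair coin): multiplying probabilities when two things must both happen, and subtracting from 1 to find the chance that something does not happen. The experiment and all names are assumptions chosen for illustration, and the multiplication result is checked by listing every equally likely outcome.

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]
coin = ['heads', 'tails']

# Every (roll, flip) pair is equally likely: 6 * 2 = 12 outcomes.
outcomes = list(product(die, coin))

# Multiplication rule: P(roll a 6 and flip heads) = 1/6 * 1/2.
six_and_heads = Fraction(1, 6) * Fraction(1, 2)

# Check by counting the matching outcomes directly.
matching = sum(1 for roll, flip in outcomes if roll == 6 and flip == 'heads')
counted = Fraction(matching, len(outcomes))

# Complement rule: P(not a 6) = 1 - P(6).
not_six = 1 - Fraction(1, 6)

print(six_and_heads, counted, not_six)  # 1/12 1/12 5/6
```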

**Question 3.1.**  Our DATA 80A/180A class has launched a TikTok account.  Four members of the class — Almudena, Ben, Natasha, and Santiago — have auditioned to perform an activity in a promotional TikTok video for DATA 80A/180A. Prof. Wang will select exactly one of the four auditioners at random (with equal probability). The selected auditioner then selects exactly one
activity at random (with equal probability) — from among *acting, dancing,* and *singing* — to perform in the video.
Determine the probability that Ben is selected to perform and they choose to dance in the video.
Express your answer as a Python expression (e.g., 1/8 + 1/9).

In [None]:
Ben_dance = ...
Ben_dance

In [None]:
grader.check("q31")

**Question 3.2.** Determine the probability that Ben is selected to perform but they choose **not** to dance
in the video.

In [None]:
Ben_not_dance = ...
Ben_not_dance

In [None]:
grader.check("q32")

**Question 3.3.** Prof. Wang did not select Almudena to perform in the video. Determine the probability
that Ben was selected.

In [None]:
Ben_selected = ...
Ben_selected

In [None]:
grader.check("q33")

**Question 3.4.** The course members decide to release a TikTok video at the beginning of every week of the semester, for a
total of fourteen (14) weeks. Each video has a 15% chance of going viral in each of the first 24 hours
after its release. Thereafter, it has no chance of going viral.  Determine the probability that none of the fourteen videos goes viral within 24 hours of release.

In [None]:
no_viral_videos = ...
no_viral_videos

In [None]:
grader.check("q34")

**Question 3.5.** After seeing the DATA 80A/180A class transition to a position of social-media influence, the CS 63
class turns jealous and decides to pirate DATA 80A/180A TikTok videos.
There is a 9% chance that the CS 63 class members copy three (3) or more DATA 80A/180A TikTok videos, a 21%
chance that they copy exactly two (2) DATA 80A/180A TikTok videos, and a 62% chance that they do not
pirate (copy) any TikTok video from DATA 80A/180A.
Determine the probability that the CS 63 class members copy exactly one DATA 80A/180A TikTok video. Express your
answer as a percent.

In [None]:
exactly_one = ...
exactly_one

In [None]:
grader.check("q35")

Congratulations, you're done with Lab 7! 

Be sure to:

   * run all the tests,
   * save your notebook and download a pdf version of it,
   * submit your work to Canvas,
   * and ask a lab instructor to check you off.