# STOR 120 - Lab 5: Simulations

Welcome to Lab 5! 

We will go over [iteration](https://www.inferentialthinking.com/chapters/09/2/Iteration.html) and [simulations](https://www.inferentialthinking.com/chapters/09/3/Simulation.html), as well as introduce the concept of [randomness](https://www.inferentialthinking.com/chapters/09/Randomness.html).

First, run the cell below.

In [1]:
# Run this cell, but please don't change it.

# These lines import the Numpy and Datascience modules.
import numpy as np
from datascience import *

# These lines do some fancy plotting magic
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

In [39]:
import pandas as pd

## 1. Nachos and Conditionals

In Python, the Boolean data type has only two values: `True` and `False`. Expressions containing comparison operators such as `<` (less than), `>` (greater than), and `==` (equal to) evaluate to Boolean values. The table below lists common comparison operators.

<img src="comparisons.png">

Run the cell below to see an example of a comparison operator in action.

In [2]:
3 > 1 + 1

True

We can even assign the result of a comparison operation to a variable.

In [3]:
result = 10 / 2 == 5
result

True

Arrays are compatible with comparison operators. The output is an array of boolean values.

In [4]:
make_array(1, 5, 7, 8, 3, -1) > 3

array([False,  True,  True,  True, False, False])
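
The other comparison operators work element-wise in the same way, including on arrays of strings. The following is a minimal sketch; it uses plain `np.array` in place of `make_array` so that it runs without the `datascience` package, and the array names are made up for illustration.

```python
import numpy as np

# Hypothetical example arrays; np.array stands in for make_array here.
temperatures = np.array([58, 71, 64, 80])
toppings = np.array(['cheese', 'salsa', 'cheese'])

# Each comparison is applied to every element, producing a Boolean array.
is_warm = temperatures >= 64       # False, True, True, True
is_cheese = toppings == 'cheese'   # True, False, True
not_cheese = toppings != 'cheese'  # False, True, False

print(is_warm)
print(is_cheese)
print(not_cheese)
```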

One day, when you come home after a long week, you see a hot bowl of nachos waiting on the dining table! Let's say that whenever you take a nacho from the bowl, it will either have only **cheese**, only **salsa**, **both** cheese and salsa, or **neither** cheese nor salsa (a sad tortilla chip indeed). 

Let's simulate taking nachos from the bowl at random using the function `np.random.choice(...)`.

### `np.random.choice`

`np.random.choice` picks one item at random from the given array. It is equally likely to pick any of the items. Run the cell below several times, and observe how the results change.

In [7]:
nachos = make_array('cheese', 'salsa', 'both', 'neither')
np.random.choice(nachos)

'both'

To repeat this process multiple times, pass in an int `n` as the second argument to get `n` independent random choices. By default, `np.random.choice` samples **with replacement**, so the same item can appear more than once, and returns an *array* of items.

Run the next cell to see an example of sampling with replacement 10 times from the `nachos` array.

In [8]:
np.random.choice(nachos, 10)

array(['both', 'salsa', 'cheese', 'neither', 'both', 'both', 'neither',
       'cheese', 'cheese', 'salsa'], dtype='<U7')
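
To make the phrase "with replacement" concrete, the sketch below contrasts the default behavior with `replace=False`, which draws each item at most once. The seed value is arbitrary and only makes the sketch repeatable; `np.array` is used in place of `make_array` so the code runs on its own.

```python
import numpy as np

nachos = np.array(['cheese', 'salsa', 'both', 'neither'])

# Seeding is optional; the value 120 is an arbitrary choice for this sketch.
np.random.seed(120)

# With replacement (the default): items may repeat, so we can draw 10 from 4.
with_replacement = np.random.choice(nachos, 10)

# Without replacement: each item is drawn at most once, so we can draw at most 4.
without_replacement = np.random.choice(nachos, 4, replace=False)

print(with_replacement)
print(without_replacement)
```

Asking for more than 4 items with `replace=False` raises an error, because the array runs out of items to draw.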

To count the number of times a certain type of nacho is randomly chosen, we can use `np.count_nonzero`.

### `np.count_nonzero`

`np.count_nonzero` counts the number of non-zero values in an array. When an array of Boolean values is passed to the function, it counts the number of `True` values (remember that in Python, `True` is treated as 1 and `False` as 0).

Run the next cell to see an example that uses `np.count_nonzero`.

In [7]:
np.count_nonzero(make_array(True, False, False, True, True))

3
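
Since a comparison on an array produces a Boolean array, the two ideas combine naturally. Here is a small sketch with made-up scores (the array and variable names are hypothetical):

```python
import numpy as np

# Hypothetical scores, used only for illustration.
scores = np.array([72, 95, 88, 61, 90])

# The comparison yields a Boolean array ...
passed = scores >= 80
# ... and count_nonzero counts its True entries.
num_passed = np.count_nonzero(passed)

# Because True counts as 1, np.sum gives the same count.
print(num_passed, np.sum(passed))
```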

**Question 1.** Assume we took ten nachos at random, and stored the results in an array called `ten_nachos` as done below. Find the number of nachos with only cheese using code (do not hardcode the answer).  

*Hint:* Our solution involves a comparison operator (e.g. `==`, `<`, ...) and the `np.count_nonzero` function.

<!--
BEGIN QUESTION
name: q11
-->

In [44]:
ten_nachos = make_array('neither', 'cheese', 'both', 'both', 'cheese', 'salsa', 'both', 'neither', 'cheese', 'both')
number_cheese = np.count_nonzero(ten_nachos == 'cheese')
number_cheese

3

**Conditional Statements**

A conditional statement is a multi-line statement that allows Python to choose among different alternatives based on the truth value of an expression.

Here is a basic example.

```
def sign(x):
    if x > 0:
        return 'Positive'
    else:
        return 'Negative'
```

If the input `x` is greater than `0`, we return the string `'Positive'`. Otherwise, we return `'Negative'`; note that this simple version also returns `'Negative'` when `x` is `0`.

If we want to test multiple conditions at once, we use the following general format.

```
if <if expression>:
    <if body>
elif <elif expression 0>:
    <elif body 0>
elif <elif expression 1>:
    <elif body 1>
...
else:
    <else body>
```

Python evaluates the `if` and `elif` expressions in order, starting at the top. As soon as one of them is true, the corresponding body is executed and the rest of the conditional statement is skipped, so at most one body runs. If none of the `if` or `elif` expressions is true, the `else` body is executed.

For more examples and explanation, refer to the section on conditional statements [here](https://www.inferentialthinking.com/chapters/09/1/Conditional_Statements.html).
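
As a concrete instance of the general `if`/`elif`/`else` format, here is a short sketch. The function `describe_temperature` and its thresholds are invented for illustration.

```python
def describe_temperature(degrees):
    """Hypothetical helper: classify a temperature in degrees Fahrenheit."""
    if degrees > 85:
        return 'Hot'
    elif degrees > 60:
        return 'Mild'
    elif degrees > 32:
        return 'Cold'
    else:
        return 'Freezing'

# 90 also satisfies degrees > 60 and degrees > 32, but only the body of the
# first true expression runs, so the result is 'Hot'.
print(describe_temperature(90))
print(describe_temperature(70))
print(describe_temperature(10))
```

The order of the branches matters: if `degrees > 32` were tested first, every temperature above 32 would be labeled `'Cold'`.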

**Question 2.** Complete the following conditional statement so that the string `'More please'` is assigned to the variable `say_please` if the number of nachos with cheese in `ten_nachos` is less than `5`.

*Hint*: You should be using `number_cheese` from Question 1.

<!--
BEGIN QUESTION
name: q12
-->

In [10]:
say_please = ''  # stays empty unless the condition below is true

if number_cheese < 5:
    say_please = 'More please'
say_please

'More please'

**Question 3.** Write a function called `nacho_reaction` that returns a reaction (as a string) based on the type of nacho passed in as an argument. Use the table below to match the nacho type to the appropriate reaction.

<img src="nacho_reactions.png">

<!--
BEGIN QUESTION
name: q13
-->

In [18]:
def nacho_reaction(nacho):
    if nacho == 'cheese':
        return "Cheesy!"
    elif nacho == 'salsa': 
        return "Spicy!"
    elif nacho == 'both': 
        return "Wow!"
    else: 
        return "Meh."

spicy_nacho = nacho_reaction('salsa')
spicy_nacho

'Spicy!'

**Question 4.** Create a table `ten_nachos_reactions` that consists of the nachos in `ten_nachos` as well as the reactions for each of those nachos. The columns should be called `Nachos` and `Reactions`.

*Hint:* Use the `apply` method. 
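
Conceptually, `tbl.apply(f, 'Column')` calls `f` once on each entry of the column and collects the return values into an array. The sketch below mimics that behavior with plain NumPy so that it runs without the `datascience` package; the function `shout` is a made-up example, not part of the lab.

```python
import numpy as np

def shout(word):
    """Hypothetical example function: upper-case a word and add '!'."""
    return word.upper() + '!'

words = np.array(['cheese', 'salsa', 'both'])

# Call shout once per entry and collect the results into a new array,
# which is roughly what apply does for a single column.
shouted = np.array([shout(word) for word in words])

print(shouted)
```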

<!--
BEGIN QUESTION
name: q14
-->

In [62]:
ten_nachos_tbl = Table().with_column('Nachos', ten_nachos)
ten_nachos_reactions = ten_nachos_tbl.with_column('Reactions', ten_nachos_tbl.apply(nacho_reaction, 'Nachos'))

**Question 5.** Using code, find the number of 'Wow!' reactions for the nachos in `ten_nachos_reactions`.

<!--
BEGIN QUESTION
name: q15
-->

In [63]:
number_wow_reactions = np.count_nonzero(ten_nachos_reactions.column('Reactions') == 'Wow!')
number_wow_reactions

4

## 2. Simulations and For Loops

Using a `for` statement, we can perform a task multiple times. This is known as iteration.

One use of iteration is to loop through a set of values. For instance, we can print out all of the colors of the rainbow.

In [None]:
rainbow = make_array("red", "orange", "yellow", "green", "blue", "indigo", "violet")

for color in rainbow:
    print(color)

We can see that the indented part of the `for` loop, known as the body, is executed once for each item in `rainbow`. The name `color` is assigned to the next value in `rainbow` at the start of each iteration. Note that the name `color` is arbitrary; we could easily have named it something else. The important thing is we stay consistent throughout the `for` loop. 

In [None]:
for another_name in rainbow:
    print(another_name)

In general, however, we would like the variable name to be somewhat informative. 
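
A loop body often updates a running tally. The following minimal sketch counts the rainbow colors with more than 5 letters; it uses a plain Python list rather than `make_array` so that it runs on its own, and the name `long_names` is made up for illustration.

```python
rainbow = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]

# Start the tally at zero, then update it inside the loop body.
long_names = 0
for color in rainbow:
    if len(color) > 5:
        long_names = long_names + 1

# orange, yellow, indigo, and violet each have 6 letters.
print(long_names)
```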

**Question 1.** In the following cell, we've loaded the text of _The Great Gatsby_ by F. Scott Fitzgerald, split it into individual words, and stored these words in an array `TGG_words`. Assign `longer_than_seven` to the number of words in the novel that are more than 7 letters long.

*Hint*: You can find the number of letters in a word with the `len` function.

<!--
BEGIN QUESTION
name: q21
-->

In [77]:
TheGreatGatsby_string = open('../Datasets/TheGreatGatsby.txt', encoding='utf-8').read()
TGG_words = np.array(TheGreatGatsby_string.split())


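# Collect every word longer than 7 characters, then count them.
long_words = []
for word in TGG_words:
    if len(word) > 7:
        long_words.append(word)

# The question asks for a number, so store the length of the list.
longer_than_seven = len(long_words)

# Display the matching words themselves.
long_words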

['Gutenberg',
 'Fitzgerald',
 'anywhere',
 'restrictions',
 'whatsoever.',
 'Gutenberg',
 'included',
 'www.gutenberg.org.',
 'Fitzgerald',
 'recently',
 'updated:',
 'Language:',
 'Produced',
 'Standard',
 'project,',
 'transcription',
 'produced',
 'Gutenberg',
 'Australia.',
 'GUTENBERG',
 'Fitzgerald',
 'Contents',
 'gold-hatted,',
 'high-bouncing',
 'd’Invilliers',
 'vulnerable',
 '“Whenever',
 'criticizing',
 'anyone,”',
 'remember',
 'advantages',
 'unusually',
 'communicative',
 'reserved',
 'understood',
 'consequence,',
 'inclined',
 'judgements,',
 'abnormal',
 'unjustly',
 'politician,',
 'confidences',
 'unsought—frequently',
 'preoccupation,',
 'realized',
 'unmistakable',
 'intimate',
 'revelation',
 'quivering',
 'horizon;',
 'intimate',
 'revelations',
 'plagiaristic',
 'suppressions.',
 'Reserving',
 'judgements',
 'infinite',
 'something',
 'snobbishly',
 'suggested,',
 'snobbishly',
 'fundamental',
 'decencies',
 'parcelled',
 'unequally',
 'boasting',
 'tolerance,'

**Question 2.** Using a simulation with 10,000 trials, assign `num_different` to the number of trials in which two words picked uniformly at random (with replacement) from _The Great Gatsby_ have different lengths.

*Hint 1*: What function did we use in section 1 to sample at random with replacement from an array? 

*Hint 2*: Remember that `!=` checks for non-equality between two items.

<!--
BEGIN QUESTION
name: q22
-->

In [83]:
num_trials = 10000
num_different = 0  # number of trials in which the two words have different lengths

# In each trial, pick two words at random (with replacement) and
# add 1 to the count if their lengths differ.
for trial in np.arange(num_trials):
    word_1 = np.random.choice(TGG_words)
    word_2 = np.random.choice(TGG_words)
    if len(word_1) != len(word_2):
        num_different = num_different + 1

num_different

8702
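
One way to build confidence in a simulation like this is to try the same approach on a problem whose exact answer is known. The sketch below is not part of the lab: it counts how often two fair dice show different faces, where the exact probability is 5/6. It draws all rolls at once instead of looping, and the seed value is an arbitrary choice that only makes the sketch repeatable.

```python
import numpy as np

np.random.seed(5)  # optional; arbitrary value chosen for repeatability

faces = np.arange(1, 7)
num_trials = 10000

# Draw all first rolls and all second rolls at once, with replacement.
first_rolls = np.random.choice(faces, num_trials)
second_rolls = np.random.choice(faces, num_trials)

# Count the trials in which the two rolls differ.
num_different_rolls = np.count_nonzero(first_rolls != second_rolls)

# The simulated proportion should be close to the exact value 1 - 1/6 = 5/6.
print(num_different_rolls / num_trials, 5 / 6)
```

With 10,000 trials the simulated proportion typically lands within about a percentage point of the exact value, though it changes from run to run when no seed is set.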

In [79]:
np.random.choice(TGG_words)

'Edgar.'

Congratulations, you're done with Lab 5!