In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("pre05.ipynb")

<table style="width: 100%;">
<tr style="background-color: transparent;">
<td width="100px"><img src="https://cs104williams.github.io/assets/cs104-logo.png" width="90px" style="text-align: center"/></td>
<td>
  <p style="margin-bottom: 0px; text-align: left; font-size: 18pt;"><strong>CSCI 104: Data Science and Computing for All</strong><br>
                Williams College<br>
                Fall 2024</p>
</td>
</tr>
</table>


# Prelab 5: Conditionals and Simulations

**Instructions**
- Before you begin, execute the cell at the TOP of the notebook to load the provided tests, as well as the following cell to set up the notebook by importing some helpful libraries. Each time you start your server, you will need to execute these cells again.  
- Be sure to consult your [Python Reference](https://cs104williams.github.io/assets/python-library-ref.html)!
- Complete this notebook by filling in the cells provided. 
- Please do not reassign variables throughout the notebook.  For example, if you use `max_temperature` in your answer to one question, do not reassign it later on. Otherwise, you will fail tests that you thought you were passing previously.
- There are no hidden tests in prelabs.

<hr/>
<h2>Setup</h2>


In [None]:
# Run this cell to set up the notebook.
# These lines import the numpy, datascience, and cs104 libraries.

import numpy as np
from datascience import *
from cs104 import *
%matplotlib inline

### Conditionals Review

In Python, the Boolean data type contains only two unique values:  `True` and `False`. Expressions containing comparison operators such as `<` (less than), `>` (greater than), and `==` (equal to) evaluate to Boolean values. A list of common comparison operators can be found below!

| Comparison     | True example | False example | 
| -------------- | ------------ | ------------- |
| Less than      | 2 < 3        | 2 < 2         | 
| Greater than   | 3 > 2        | 2 > 2         | 
| Less than or equal | 2 <= 2   | 3 <= 2        |
| Greater than or equal | 2 >= 2   | 2 >= 3     |
| Equal          | 3 == 3       | 3 == 2        |
| Not equal      | 3 != 2       | 3 != 3        |

Run the cell below to see an example of a comparison operator in action.

In [None]:
3 > 1 + 1

We can even assign the result of a comparison operation to a variable.

In [None]:
result = 10 / 2 == 5
result

We can also combine two comparisons with `or` (at least one test is true) and `and` (both tests must be true):

In [None]:
x = 1
y = 1
print(y > 0 and x < 2)
print(y > 1 and x < 2)
print(y > 1 or  x < 2)

Arrays are compatible with comparison operators. The output is an array of boolean values.

In [None]:
make_array(1, 5, 7, 8, 3, -1) > 3

Conditional tests are often used in conditional statements that enable you to choose among different code alternatives based on the truth value of an expression.

Here is a basic example.

```
def sign(x):
    if x > 0:
        return 'Positive'
    else:
        return 'Negative'
```

If the input `x` is greater than `0`, we return the string `'Positive'`. Otherwise, we return `'Negative'`.

If we want to test multiple conditions at once, we use the following general format.

```
if <if expression>:
    <if body>
elif <elif expression 0>:
    <elif body 0>
elif <elif expression 1>:
    <elif body 1>
...
else:
    <else body>
```

Only the body for the first conditional expression that is true will be evaluated. Each `if` and `elif` expression is evaluated and considered in order, starting at the top. As soon as a true value is found, the corresponding body is executed, and the rest of the conditional statement is skipped. If none of the `if` or `elif` expressions are true, then the `else body` is executed. 
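
To make the "first true expression wins" rule concrete, here is a small sketch. The function names `sign_with_zero` and `size_label` are hypothetical and used only for illustration; they are not part of the prelab or the course libraries.

```python
def sign_with_zero(x):
    # Hypothetical three-way variant of sign: adds a separate case for zero.
    if x > 0:
        return 'Positive'
    elif x == 0:
        return 'Zero'
    else:
        return 'Negative'

def size_label(x):
    # For x = 500, both x > 100 and x > 10 are true,
    # but only the body of the first true test is executed.
    if x > 100:
        return 'large'
    elif x > 10:
        return 'medium'
    else:
        return 'small'

print(sign_with_zero(0))
print(size_label(500))
print(size_label(50))
```

Because the tests are considered from top to bottom, the order of the branches matters: swapping the two tests in `size_label` would make the `'large'` branch unreachable.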

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 1. Conditional Tests and Statements (20 pts)



<font color='#B1008E'>

##### Learning objectives
    
- Use Python's conditional operators
- Use if statements to conditionally execute statements
</font>

The following cell loads blood pressure data collected by the [National Center for Health Statistics](https://www.cdc.gov/nchs/index.htm) at the CDC.  It includes measurements for about 5,500 people.  Recall that blood pressure is usually reported as two numbers:

* the **systolic pressure** measures the pressure in your arteries when your heart beats; and
* the **diastolic pressure** measures the pressure in your arteries when your heart rests between beats.

"Normal" blood pressure would be measurements like "110 over 80", or a systolic pressure of 110 and diastolic pressure of 80.

In [None]:
bp = Table().read_table("blood_pressure.csv")
bp.show(5)

#### Part 1.1 (5 pts)


We will add new columns to our table.  The first new column will classify each individual as either "young" or "old", depending on whether they are younger than 65 or not.  To do this, first complete the following function that returns "young" or "old" for the given age.

In [None]:
def age_category(age):
    if ...:
        ...
    else:
        ...


print(50, "is", age_category(50))
print(70, "is", age_category(70))


In [None]:
grader.check("p1.1")

#### Part 1.2 (5 pts)


Use Table's `apply` method to apply your `age_category` function to all rows in your `bp` table.  Save that array in `age_categories`.  Then create a new copy of `bp` called `bp_age` with an additional column `Age category` containing the results.

In [None]:
age_categories = ...
bp_age = ...
bp_age.show(10)

Here is a histogram of systolic pressure for the two groups.  Later in the class we'll be able to quantify the difference between the groups, but this histogram already suggests that the two age groups differ.

In [None]:
bp_age.hist('Systolic', group='Age category')

In [None]:
grader.check("p1.2")

#### Part 1.3 (5 pts)


High blood pressure can cause various serious medical conditions.  The American Heart Association publishes the following [standards](https://www.heart.org/en/health-topics/high-blood-pressure/understanding-blood-pressure-readings) for assessing elevated pressure levels:


|   BLOOD PRESSURE CATEGORY   | SYSTOLIC      | and/or | DIASTOLIC |
| --------------------------- | --------      |  -     |  ----------|
| 1. Normal                  | less than 120 | and    | less than 80  |
| 2. Elevated                | 120-129       | and    | less than 80  |
| 3. High (Stage 1)          | 130-139       | or     | 80-89         |
| 4. High (Stage 2)          | 140 or higher | or     | 90 or higher  |
| 5. Crisis                  | 180 or higher | or     | 120 or higher |

We'll now categorize the blood pressure of our study participants using these rules.  First, complete the following function to return the category for a measurement.  The categories are "1. Normal", "2. Elevated", etc.  Be sure to include the numbers at the start of the category names to match our tests.

There are many ways to write this.  One approach is to test the cases in the opposite order from how they are listed in the table.  Try to keep your logic as simple as possible.

In [None]:
def bp_category(systolic, diastolic):
    if systolic >= 180 or diastolic >= 120:
        return "5. Crisis"
    elif ...:
        ...
    ...
    else:
        ...


In [None]:
grader.check("p1.3")

#### Part 1.4 (5 pts)


Apply your `bp_category` function to the measurements in all rows in your `bp_age` table, and create a new copy of the table with an additional column `BP Category` containing the results.

In [None]:
bp_categories = ...
bp_full = ...
bp_full.show(10)

In [None]:
grader.check("p1.4")

Here's a bar chart again illustrating how our two groups of participants differ in their blood pressure categories.  In the coming weeks, we'll learn how to take our general observation that the two age groups differ and make that a precise, quantitative conclusion.

In [None]:
bp_by_category = bp_full.pivot('Age category', 'BP Category')
bp_by_category = bp_by_category.with_columns('young', bp_by_category.column('young') / sum(bp_by_category.column('young')))
bp_by_category = bp_by_category.with_columns('old', bp_by_category.column('old') / sum(bp_by_category.column('old')))
bp_by_category.barh('BP Category')

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 2. For Loops (5 pts)



<font color='#B1008E'>
    
##### Learning objectives
- Use for loops to iterate over elements of an array
</font>

You may wish to go back and review the Loops lecture notebook and video if you are not sure how to approach writing this part.

#### Part 2.1 (5 pts)


This part and the next explore properties of five-letter English words.  We begin by loading the array of five-letter words we saw earlier in the semester.

In [None]:
fives = Table().read_table('fives.csv').column('Five Letter Words')
print(np.random.choice(fives, 10))  # show ten random words from the table

We'd like to compute the proportion of words that contain a given letter.  For example, the proportion of words containing at least one 'a' is about 0.38. Complete the function `proportion_with_letter` to calculate this. The following conditional test will be useful:

In [None]:
# An example of how to test if a letter is in a word
word = 'okapi'
if 'a' in word:
    print('okapi has an a')
if 'b' in word:
    print('okapi has a b')

In [None]:
def proportion_with_letter(letter):
    count = ...
    for ... in ...:
        ...
    return ...

proportion_with_letter('a')

In [None]:
grader.check("p2.1")

With your function, we can now create a bar chart with the proportion of words containing each letter in the alphabet. 

Let's look at the top 10! 

In [None]:
letters = Table().with_columns("letter", make_array('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 
                                                        'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 
                                                        's', 't', 'u', 'v', 'w', 'x', 'y', 'z'))
proportions = letters.with_columns("proportion", letters.apply(proportion_with_letter, "letter"))
top_k_letters = proportions.sort('proportion',descending=True).take(np.arange(10))
top_k_letters.barh('letter', 'proportion')

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 3. Simulation and Random Choice (10 pts)



<font color='#B1008E'>
    
##### Learning objectives
- Use `np.random.choice`
- Use loops to simulate the outcome of multiple trials involving random chance
</font>

The `np.random.choice` function picks an item at random from an array. It is equally likely to pick any of the items. Run the cell below several times, and observe how the results change.

In [None]:
# Run several times and observe changes 
nums = make_array(1,2,3,4)
np.random.choice(nums)

If you pass in an integer `n` as the second argument, the function returns an array of `n` random choices.  By default, `np.random.choice` samples **with replacement**, meaning that each of the `n` items is drawn from all of the items appearing in the array, even if those items have already been picked one or more times.

Run the next cell to see an example of sampling with replacement 10 times from the `nums` array.

In [None]:
# Run several times and observe changes 
np.random.choice(nums, 10)
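
For contrast, the sketch below compares the default behavior with sampling *without* replacement, using NumPy's `replace=False` argument. It uses `np.array` as a stand-in for `make_array`, and the names `values`, `with_repl`, and `without_repl` are hypothetical, chosen so that nothing in the notebook is reassigned.

```python
import numpy as np

values = np.array([1, 2, 3, 4])  # stand-in for make_array(1, 2, 3, 4)

# With replacement (the default): 10 draws from only 4 items,
# so some items must appear more than once.
with_repl = np.random.choice(values, 10)

# Without replacement: each item can be drawn at most once,
# so 4 draws return the 4 values in a random order.
without_repl = np.random.choice(values, 4, replace=False)

print(with_repl)
print(without_repl)
```

Asking for more than 4 draws with `replace=False` would raise an error, since the array would run out of items to pick.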

#### Part 3.1 (5 pts)


We have a dartboard containing ten equal-sized zones with scores taking point values from 1 to 10.  We throw 1,000 darts at the board, and given our level of skill, they land randomly across those 10 zones.  Write code that simulates our total score after 1,000 dart tosses.  Here, we'll want to use [np.random.choice](https://cs104williams.github.io/assets/python-library-ref.html#random.choice) to simulate multiple trials.

*Hint*: How can we create the `dart_values` we want? Either `make_array` or `np.arange` could be helpful here.  

In [None]:
num_tosses = 1000
dart_values = ... 
simulated_tosses = ...
total_score = ...
total_score

In [None]:
grader.check("p3.1")

#### Part 3.2 (5 pts)


We'll now write a slightly more sophisticated simulation using our `fives` array from the previous question.  

In particular, using a simulation with 10,000 trials, assign `same_start_count` to the number of times (in 10,000 trials) that two words picked uniformly at random (with replacement) from `fives` start with the same letter.  



*Hints:*
- We have given you a helper function `first_letter` that returns the first letter of a string.  
- If you have strings `a` and `b`, the test `first_letter(a) == first_letter(b)` will compute whether `a` and `b` start with the same letter.
- This is a difficult problem! We recommend adding some "scratch" cells and testing intermediate pieces of your solution. 

In [None]:
def first_letter(word):
    return word[0]

trials = 10000
same_start_count = ...

for ... in ...:
    ...

same_start_count

In [None]:
grader.check("p3.2")

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 4. Putting it All Together (30 pts)



<font color='#B1008E'>
    
##### Learning objectives
- Use `np.count_nonzero` and `np.random.choice`
- Use loops to simulate the outcome of multiple trials involving random chance.
</font>

This question puts together all of this week's material and is an opportunity to use conditionals, loops, and randomness in the larger context of simulating the payouts for a slot machine.  Our slot machine has three wheels each containing four symbols -- '🍒', '🔔', '🍋', and '🍉'.  When you take a turn, you insert a coin, pull a lever, and spin the wheels.  The payout for the spin depends on the three symbols that are showing when the wheels stop.  The payout may be negative, meaning you lose your money, or non-negative, meaning you break even or win money.

We'll use `np.random.choice` to "spin" our wheels:

In [None]:
symbols = make_array('🍒', '🔔', '🍋', '🍉')
np.random.choice(symbols)

Recall that if you pass in an integer `n` as the second argument, the function returns an array of `n` random choices.

In [None]:
np.random.choice(symbols, 10)

To count the number of times a certain type of symbol is randomly chosen, we can use `np.count_nonzero`, a function that counts the number of non-zero values that appear in an array. When an array of boolean values is passed to the function, it counts the number of `True` values.

Run the next cell to see an example that uses `np.count_nonzero`.

In [None]:
np.count_nonzero(make_array(True, False, False, True, True))

#### Part 4.1 (5 pts)


Assume we took ten symbols at random, and stored the results in an array called `ten_symbols`, as below. Find the number of bell symbols using code (do not hardcode the answer).  

*Hint:* Our solution involves a comparison operator (e.g. `==`, `<`, ...) and the `np.count_nonzero` function.

In [None]:
ten_symbols = make_array('🍉', '🍋', '🍉', '🍉', '🍒', '🍉', '🔔', '🍒', '🔔', '🔔')
number_bells = ...
number_bells

In [None]:
grader.check("p4.1")

#### Part 4.2 (5 pts)


We'll now simulate 10,000 plays of the slot machine from lecture.  Here's our slot machine's `payout_for_symbols` function:

In [None]:
slot_symbols = make_array('🍒', '🔔', '🍋', '🍉')

def payout_for_symbols(symbols):
    """
    Given an array of three symbols drawn from slot_symbols, this function
    computes the payout for that combination.
    """
    if np.count_nonzero(symbols == '🍋') > 0:  
        # any lemons means we lose
        return -1  
    elif np.count_nonzero(symbols == '🍒') == len(symbols):
        # all cherries means jackpot!
        return 15
    else:
        # otherwise, we count the bells.
        return 3 * np.count_nonzero(symbols == '🔔')

check(payout_for_symbols(make_array('🍒', '🍒', '🍒')) == 15)

Use that function in a new function that draws three random symbols from `slot_symbols` and returns the payout for those symbols.

In [None]:
def payout_for_one_spin():
    ...

Run this cell a couple times to verify that the result looks good:

In [None]:
payout_for_one_spin()

In [None]:
grader.check("p4.2")

#### Part 4.3 (5 pts)


Now, use our **simulation algorithm** to simulate multiple rounds (one round is one spin). Assign the variable `outcomes` to an array that stores the payouts for 10,000 rounds of playing the slot machine.

*Note:* From now on, we'll use general functions like the `simulate` function we wrote in lecture to run simulations.  However, we'd like you to write the full algorithm from scratch this one time to gain experience building arrays with `np.append`. 
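
As a reminder of how `np.append` behaves, here is a minimal sketch on an unrelated example that collects the squares of a few numbers. The name `squares` is hypothetical, and `np.array([])` is used as a stand-in for an empty `make_array()`.

```python
import numpy as np

squares = np.array([])       # start with an empty array
for i in np.arange(5):       # i takes the values 0, 1, 2, 3, 4
    # np.append returns a NEW array with the value added at the end;
    # it does not change the original, so we reassign the name.
    squares = np.append(squares, i * i)

print(squares)
```

Forgetting the reassignment is a common mistake: calling `np.append(squares, i * i)` on its own line leaves `squares` unchanged.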

In [None]:
outcomes = ...
for i in ...:
    payout = ...
    outcomes = ...

outcomes

In [None]:
grader.check("p4.3")

#### Part 4.4 (5 pts)


After creating our `outcomes` array, we determine that the proportion of 15's in the array is 0.013.  

Suppose we run the simulation again, but use 1,000,000 trials this time.  In that case, we determine that the proportion of 15's is 0.015. 

Assign the variable `probability_of_15` to the value--either `0.013` or `0.015`--that is closer to the true probability of scoring 15 when playing the slot machine. 

In [None]:
probability_of_15 = ...
probability_of_15

In [None]:
grader.check("p4.4")

#### Part 4.5 (5 pts)


Set `positive_proportion` to the proportion of outcomes in your simulation that had a positive payout. 

In [None]:
positive_proportion = ...
positive_proportion

In [None]:
grader.check("p4.5")

#### Part 4.6 (5 pts)


Set `mean_payout` to the average payout from your simulation.

In [None]:
mean_payout = ...
mean_payout

In [None]:
grader.check("p4.6")

**Thought question:** If you were running a casino, would you want to install our slot machine?  Why or why not?

<hr class="m-0" style="border: 3px solid #500082;"/>

# You're Done!
Follow these steps to submit your work:
* Run the tests and verify that they pass as you expect. 
* Choose **Save Notebook** from the **File** menu.
* **Run the final cell** and click the link below to download the zip file. 

Once you have downloaded that file, go to [Gradescope](https://www.gradescope.com/) and submit the zip file to 
the corresponding assignment. For Prelab N, the assignment will be called "Prelab N Autograder".

Once you have submitted, your Gradescope assignment should show you passing all the tests you passed in your assignment notebook.


## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)