# Statistical Data Management Session 6: Discrete Random Variables (chapter 4 in McClave & Sincich)


**It is not necessary to use Python for all of the exercises. We clearly indicate when it is. ("Run the following cell of code" or "Use Python" etc.)**


## 1. Expected Value

You play the following game at a fair. You pay €4.20 to participate and are presented with a bag containing 240 marbles: 10 red and 230 blue. You may pick one at random without looking. If the marble you draw is red, you win €100; if it is blue, you win nothing.

1. Let $x$ represent your profit. Find $E(x)$.
2. Now let the fee to play be a variable $c$ (previously €4.20). How many marbles should be red to make this game profitable for you, as a function of $c$? The potential prize of €100 stays the same. Check your answer with $c=4.2$.
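
One possible way to check both parts numerically is sketched below. It assumes that "profit" means the payout minus the fee (the fee is not refunded on a win) and that the bag always holds 240 marbles. The function names `expected_profit` and `min_red_for_profit` are made up for this illustration. Exact fractions are used to avoid rounding issues near the break-even point.

```python
from fractions import Fraction


def expected_profit(n_red, fee, n_total=240, prize=100):
    """E(x) for one play: sum of (profit * probability) over both outcomes.

    Assumption: profit = payout - fee, and the fee is never refunded.
    """
    p_red = Fraction(n_red, n_total)
    profit_red = prize - fee   # red marble: receive the prize, fee already paid
    profit_blue = -fee         # blue marble: only the fee is lost
    return profit_red * p_red + profit_blue * (1 - p_red)


def min_red_for_profit(fee, n_total=240, prize=100):
    """Smallest number of red marbles for which E(x) > 0 (None if impossible)."""
    for n_red in range(n_total + 1):
        if expected_profit(n_red, fee, n_total, prize) > 0:
            return n_red
    return None


fee = Fraction(21, 5)  # EUR 4.20 as an exact fraction
print(float(expected_profit(10, fee)))
print(min_red_for_profit(fee))
```

Under these assumptions, the condition $E(x) > 0$ with $r$ red marbles reads $100 \cdot r/240 - c > 0$; the loop simply evaluates it for increasing $r$.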

## 2. Expected Value *(based on ex. 4.48 from the book)*

**The Showcase Showdown.** On the popular television game show The Price is Right, contestants can play "The Showcase Showdown." The game involves a large wheel with 20 nickel values, 5, 10, 15, 20, ..., 95, 100, marked on it. Contestants spin the wheel once or twice, with the objective of obtaining the highest total score *without going over a dollar (100)*. Let $x$ represent the score of a single contestant playing "The Showcase Showdown." Assume a "fair" wheel (i.e., a wheel with equally likely outcomes). If the total of the player's spins exceeds 100, the total score $x$ is set to 0.

1. If the player is permitted only one spin of the wheel, find the probability distribution for $x$.
2. Find $E(x)$ and interpret this value.
3. Find the standard deviation of $x$.
4. Suppose the player obtains a 20 on the first spin and decides to spin again. What is, in this case, the probability that the player's total score exceeds a dollar (and is consequently reset to 0)?
5. Find the probability distribution for $x$ in case the player obtained 20 on the first spin and decides to spin again.
6. Given that the player obtains a 65 on the first spin and decides to spin again, find the probability that the player's total score exceeds a dollar (and is consequently reset to 0).
7. **Run the following cell of code.** Make sure you understand what happens. The `scipy.stats` (`sts`) package API may be found [here](https://docs.scipy.org/doc/scipy/reference/stats.html). Have a look at `rv_discrete`.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats as sts
%matplotlib inline

outcomes = np.arange(5, 101, 5)
print(outcomes)
probabilities = np.full(20, 0.05)  # equal probability for each of the 20 outcomes

#manually overwrite in case of non-equal probabilities (make sure they sum to 1 though!):
#uncomment following two lines to use this
#probabilities[0] = 0.07
#probabilities[1] = 0.03

print(probabilities)
x = sts.rv_discrete(values=(outcomes, probabilities))
print(x.mean())
print(x.std())
print(x.var())

plt.figure(figsize=(10, 6))
plt.stem(outcomes, x.pmf(outcomes))
plt.title("Stem diagram of discrete probability distribution", fontsize=16)
plt.show()
plt.close()
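
For items 4 to 6, the distribution of the total after a second spin can also be enumerated directly. The sketch below is independent of the cell above and uses only the standard library. `second_spin_distribution` is a hypothetical helper name, and the wheel is assumed fair, as stated in the exercise.

```python
from collections import defaultdict
from fractions import Fraction


def second_spin_distribution(first_spin):
    """Distribution of the final score x after a given first spin and one more spin.

    Totals above 100 are reset to 0, as in the rules of the game.
    """
    p_each = Fraction(1, 20)        # fair wheel: 20 equally likely values
    dist = defaultdict(Fraction)    # score -> probability (missing keys start at 0)
    for second_spin in range(5, 101, 5):
        total = first_spin + second_spin
        score = total if total <= 100 else 0
        dist[score] += p_each
    return dict(dist)


for first in (20, 65):
    dist = second_spin_distribution(first)
    print(first, "P(score reset to 0) =", dist.get(0, 0))
```

The returned dictionary could be turned into an `rv_discrete` object by passing its keys and values (converted to floats) as the `values` pair.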

## 3. The Binomial Random Variable *(based on ex. 4.66 from the book)*

**Caesarian births.** The American College of Obstetricians and Gynecologists reports that 32% of all births in the United States take place by Caesarian section each year. (*National Vital Statistics Reports*, Mar. 2010).
1. In a random sample of 100 births, how many, on average, will take place by Caesarian section?
2. What is the expected standard deviation of the number of Caesarian section births in a sample of 100 births?
3. Assuming that the distribution is mound-shaped, use your answers from 1 and 2 to compute the interval that will likely contain 95% of the observations.
4. **Run the following cell of code.** What do the numerical values represent? (Look up `binom` in the `stats` API!)
5. Plot the cdf **using Python**.

In [None]:
caeserian = sts.binom(100, 0.32)
outcomes = np.arange(0, 101, 1)

plt.figure(figsize=(10, 6))
plt.stem(outcomes, caeserian.pmf(outcomes))
plt.title('PMF of binomial distribution with $n=100$, $p=0.32$', fontsize=16)
plt.show()
plt.close()

print(caeserian.mean()) 
print(caeserian.std()) 
print(caeserian.cdf(20))
print(caeserian.pmf(30))

6. Run the following cell of code, giving the variable a meaningful name. Explain why the two outcomes are the same.

In [None]:
<...> = sts.binom(100, 0.68)
print(caeserian.cdf(40))
print(1 - <...>.cdf(59))

## 4. The Poisson Random Variable

The 1953 storm that caused severe flooding in the Netherlands and Belgium is described as a "once in 250 year storm". (SNICK, I. 2009. "Duizendjarige storm zet zeespiegel vier meter hoger". https://www.standaard.be/cnt/9i25dc0f). **Use Python** to answer the following questions.


1. How many such storms are to be expected in the next 1000 years? How does the knowledge that one took place only 70 years ago influence your answer?
2. Run the following cell of code.

In [None]:
storms = sts.poisson(4)
outcomes = np.arange(0, 15, 1)

# two different plots, but here with some extra tricks to represent both 
# clearly in one graph (you're not required to be able to do this yourself)
plt.figure(figsize=(10,6))
plt.title('Number of storms to be expected over the next 1000 years', fontsize=16)
plt.stem(outcomes, storms.cdf(outcomes), label="cdf")
plt.stem(outcomes, storms.pmf(outcomes), linefmt='C1-', markerfmt='C1o', label="pmf")
plt.legend()
plt.show()
plt.close()

3. What is the probability that 2 or fewer such storms occur in the next 1000 years?
4. What is the probability that no such storms occur in the next 1000 years?
5. What is the probability that 5 or more such storms occur in the next 1000 years?
6. What is the probability that 100 or more such storms occur in the next 1000 years?
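
The probabilities in items 3 to 6 can be cross-checked from the Poisson formula $P(x=k)=e^{-\lambda}\lambda^k/k!$ without `scipy`. This is a minimal sketch with $\lambda = 4$, the value used in the cell above. The helper names are illustrative only. With the frozen distribution `storms`, the same quantities should be available through its `pmf`, `cdf` and `sf` methods.

```python
from math import exp, lgamma, log


def poisson_pmf(k, lam):
    """P(x = k) for a Poisson(lam) count, evaluated in log space for stability."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))


def poisson_cdf(k, lam):
    """P(x <= k), summed directly from the pmf."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


lam = 4  # value used in the cell above: 1000 years / 250 years per storm

p_at_most_2 = poisson_cdf(2, lam)
p_none = poisson_pmf(0, lam)
p_at_least_5 = 1 - poisson_cdf(4, lam)

# Upper tail P(x >= 100): summing the tiny terms directly avoids the
# cancellation in 1 - cdf(99), which is lost to rounding in floating point.
# The sum is truncated at k = 199; later terms are negligibly small.
p_at_least_100 = sum(poisson_pmf(k, lam) for k in range(100, 200))

print(p_at_most_2, p_none, p_at_least_5, p_at_least_100)
```

The last item illustrates why a survival function is preferable to `1 - cdf` for very small tail probabilities.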


## 5. Train Delays

You commute by train and have noticed that it has a 15% probability of being delayed. You take a course which has 13 exercise sessions, and you will sometimes arrive late because of a delayed train. If the lecturer wants to allow for only a 1% probability of wrongfully accusing a student, from how many late arrivals onwards should he/she suspect that you are using train delays as an excuse too often? **Use Python to answer this question.**
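
One way to frame this is as a binomial tail problem, sketched below under some assumptions that the exercise does not state explicitly: you take the train to all 13 sessions, delays are independent from one session to the next, and a "wrongful accusation" means that an honest student reaches the suspicion threshold purely by chance. The helper `prob_at_least` is a made-up name. With `scipy`, `sts.binom(13, 0.15).sf(k - 1)` should give the same tail probability.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(x >= k) for a binomial(n, p) count, summed directly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_sessions = 13
p_delay = 0.15
max_false_accusation = 0.01

# Smallest number of late arrivals k for which an honest student reaches k
# or more with probability at most 1%. k = n_sessions + 1 always qualifies
# (empty sum = 0), so the search is guaranteed to terminate.
threshold = next(
    k
    for k in range(n_sessions + 2)
    if prob_at_least(k, n_sessions, p_delay) <= max_false_accusation
)
print(threshold, prob_at_least(threshold, n_sessions, p_delay))
```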

## 6. Tracking Missiles *(ex. 4.65 from the book)*

The U.S. government has devoted considerable funding to missile defense research over the past 20 years. The latest development is the Space-Based Infrared System (SBIRS), which uses satellite imagery to detect and track missiles (*Chance*, Summer 2005). The probability that an intruding object (e.g., a missile) will be detected on a flight track by SBIRS is 0.8. Consider a sample of 20 simulated tracks, each with an intruding object. Let $x$ equal the number of these tracks on which SBIRS detects the object. **Use Python where required.**

1. Graph the probability mass function (pmf) and the cumulative distribution function (cdf) for all possible numbers of tracks with objects detected by the SBIRS. Use two different stem plots.
2. Find $P(x = 15)$, the probability that SBIRS will detect the object on exactly 15 tracks.
3. Find $P(x \geq 15)$, the probability that SBIRS will detect the object on at least 15 tracks. 
4. Find $E(x)$ and interpret the result.
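
The numbers behind items 1 to 4 can be tabulated from the binomial formula, as in the sketch below. It assumes the 20 tracks are independent with the same detection probability, which is what the binomial model requires. Variable names are illustrative. No plotting is done here.

```python
from itertools import accumulate
from math import comb

n_tracks = 20
p_detect = 0.8

ks = list(range(n_tracks + 1))
# pmf[k] = P(x = k) = C(n, k) * p^k * (1 - p)^(n - k)
pmf = [comb(n_tracks, k) * p_detect**k * (1 - p_detect) ** (n_tracks - k) for k in ks]
cdf = list(accumulate(pmf))  # running sum: cdf[k] = P(x <= k)

p_exactly_15 = pmf[15]
p_at_least_15 = 1 - cdf[14]  # complement of P(x <= 14)
mean = sum(k * p for k, p in zip(ks, pmf))

print(p_exactly_15, p_at_least_15, mean)
```

The lists `pmf` and `cdf` could then be passed to `plt.stem(ks, ...)` in two separate figures, in the same way as in the earlier cells.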

## 7. Radioactive Decay

A sample of $10^9$ carbon-14 atoms undergoes radioactive decay: after one year, only $99.9879 \%$ of the carbon-14 atoms remain. **Use Python to answer the following questions**.

1. How many atoms decay on average during one day? (Assume the year in question is not a leap year.)
2. Find the probability that on a given day, between 320 and 360 atoms of the sample decay.
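
A possible approach is sketched below. It rests on two modelling assumptions that are not part of the exercise text: the decay rate is treated as constant over the year (the fraction lost is tiny), and the number of decays per day is modelled as a Poisson count. "Between 320 and 360" is read as inclusive on both ends. With `scipy`, `sts.poisson(lam)` and a difference of two `cdf` values should give the same result.

```python
from math import exp, lgamma, log

n_atoms = 10**9
fraction_remaining = 0.999879   # 99.9879 % left after one year
days_per_year = 365             # not a leap year

decays_per_year = n_atoms * (1 - fraction_remaining)
lam = decays_per_year / days_per_year  # average number of decays per day


def poisson_pmf(k, lam):
    """P(x = k) for a Poisson(lam) count, evaluated in log space for stability."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))


# Assumption: both endpoints (320 and 360) are included.
p_320_to_360 = sum(poisson_pmf(k, lam) for k in range(320, 361))

print(lam, p_320_to_360)
```

Working in log space avoids overflow in $\lambda^k$ and $k!$, which become very large for $\lambda$ of this size.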