# Homework 1 - Andrew Schwartz - PHYS 403, SPR 2024

## Problem 1

The probability of either $A$ or $B$ (the logical sum $A + B$) being true is
$$
P(A + B|I) = P(A|I) + P(B|I) − P(A, B|I).
$$
It’s easy to see this graphically with a Venn diagram: $A + B = A ∪ B$ is the union of the disjoint sets $A$ and $B$ minus their overlap $AB = A ∩ B$:

![venn](venn.png)

Prove the relation using the rules of probability and basic Boolean algebra.

## Solution

Each of $A$ and $B$ can either be $\texttt{true}$ or $\texttt{false}$ given $I$, meaning there are four possible outcomes, with the combination of $A$ or $B$ being either $\texttt{true}$ or $\texttt{false}$ as shown in the following truth table:

Row | $A$ | $B$ | $A+B$
---|---|---|---
1 | F | F | F
2 | F | T | T
3 | T | F | T
4 | T | T | T

Thus, the probability of $A$ or $B$ being true is the total probability of rows 2, 3, and 4.

In terms of the rows of the table, $P(A|I)$ is the total probability of rows 3 and 4, and $P(B|I)$ is that of rows 2 and 4, whereas $P(A+B|I)$ is that of rows 2, 3, and 4. Because the four rows are mutually exclusive, the sum $P(A|I)+P(B|I)$ counts row 4 (both true, with probability $P(AB|I)$) twice, so we must subtract it once, giving the final answer of
$$
P(A+B|I)=P(A|I)+P(B|I)-P(AB|I).
$$
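
The same relation also follows algebraically from the rules of probability. What follows is a sketch of one standard route, assuming the sum rule $P(X|I)+P(\overline{X}|I)=1$, the product rule $P(XY|I)=P(X|I)P(Y|X,I)$, and De Morgan's identity $A+B=\overline{\overline{A}\,\overline{B}}$:

$$
\begin{align}
P(A+B|I)&=1-P(\overline{A}\,\overline{B}|I) \\
&=1-P(\overline{A}|I)P(\overline{B}|\overline{A},I) \\
&=1-P(\overline{A}|I)\left[1-P(B|\overline{A},I)\right] \\
&=P(A|I)+P(\overline{A}B|I) \\
&=P(A|I)+P(B|I)P(\overline{A}|B,I) \\
&=P(A|I)+P(B|I)\left[1-P(A|B,I)\right] \\
&=P(A|I)+P(B|I)-P(AB|I)
\end{align}
$$

Each step applies either the sum rule (to swap a proposition for its negation) or the product rule (to split or recombine a joint probability).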

## Problem 2

According to a 2014 report by the Centers for Disease Control and Prevention, adult smokers are roughly 25 times more likely to develop lung cancer than nonsmokers, with all genders affected equally. Suppose you learn that a given woman has been diagnosed with lung cancer. If you know nothing else about her, what is the probability that she is a smoker? 
*Hint: express your answer in terms of the unknown fraction $s$ of women who smoke.*

Then provide a numerical result using a recent estimate of $s$, documenting the source of your estimate.

## Solution


Given: 
1. $P(C|S)=25P(C|\overline{S})$
2. $P(S)=s=0.101$ as per [CDC](https://www.cdc.gov/tobacco/data_statistics/fact_sheets/adult_data/cig_smoking/index.htm) for 2021

Want: $P(S|C)$

From Bayes's rule, we know that $P(S|C)=\frac{P(C|S)P(S)}{P(C)}$.
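Expanding $P(C)$ over $S$ and $\overline{S}$ with the product rule, and then using given 1 to replace $P(C|\overline{S})$ with $\frac{1}{25}P(C|S)$, we have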
$$
\begin{align}
P(CS)+P(C\overline{S})&=P(C) \\
P(C|S)P(S)+P(C|\overline{S})P(\overline{S})&=P(C) \\
P(C|S)\left[P(S)+\frac{1}{25}P(\overline{S})\right]&=P(C) \\
P(C|S)&=\frac{P(C)}{P(S)+\frac{1}{25}P(\overline{S})}
\end{align}
$$ 

Thus, we can plug this and given 2 into the first equation and find
$$
\begin{align}
P(S|C)&=\frac{P(C)}{P(C)}\frac{P(S)}{P(S)+\frac{1}{25}P(\overline{S})} \\
&=\frac{s}{s+\frac{1}{25}(1-s)} \\
&\approx0.737
\end{align}
$$
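
As a quick numerical check, the closed-form result can be evaluated directly. This is a minimal sketch: the function name `posterior_smoker` and the extra values of $s$ (0.05 and 0.20) are illustrative assumptions, included only to show how the answer depends on the smoking fraction.

```python
def posterior_smoker(s, risk_ratio=25.0):
    """P(S|C) for smoking fraction s, assuming P(C|S) = risk_ratio * P(C|not S).

    Algebraically the same as s / (s + (1 - s) / risk_ratio).
    """
    return risk_ratio * s / (risk_ratio * s + (1.0 - s))


# s = 0.101 is the CDC figure used above; the other two values are hypothetical.
for s in (0.05, 0.101, 0.20):
    print(f"s = {s:.3f} -> P(S|C) = {posterior_smoker(s):.3f}")
```

For $s=0.101$ this reproduces the value of about 0.737 found above.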

## Problem 3

The gene for blue eye color is recessive. If you have two brown-eyed parents who carry the gene, their child has a 25% chance of having blue eyes. Consider a family with two parents, both with brown eyes who carry the recessive gene, and three children.

### (a)
If it is known that at least one child has blue eyes, what is the probability that at least two children have blue eyes?

### Solution

Want: $P(2+3|1+2+3)$ (the probability that 2 or 3 of the children have blue eyes given that at least one of them does, where each number denotes exactly that many blue-eyed children).

We use Bayes's rule to reverse the conditional:

$$
P(2+3|1+2+3)=\frac{P(1+2+3|2+3)P(2+3)}{P(1+2+3)}.
$$

The first factor is simple: if 2 or 3 of the children have blue eyes, then certainly at least one does, so $P(1+2+3|2+3)=1$. All that remains is to determine $P(2+3)$ and $P(1+2+3)$. The latter is $P(1+2+3)=1-P(0)=1-\left(\frac{3}{4}\right)^3=\frac{37}{64}$, and the former is $P(2+3)=1-P(0+1)=1-\left[P(0)+P(1)-P(01)\right]$. The last term, the probability of there being both exactly 0 and exactly 1 blue-eyed children, describes a contradiction and is therefore 0. We already have $P(0)=\left(\frac{3}{4}\right)^3=\frac{27}{64}$, so the only new probability is $P(1)=\frac{3}{4}\frac{3}{4}\frac{1}{4}+\frac{3}{4}\frac{1}{4}\frac{3}{4}+\frac{1}{4}\frac{3}{4}\frac{3}{4}=\frac{27}{64}$, with one term for each choice of which child has blue eyes. Putting these together, we have $P(2+3)=1-\left(\frac{27}{64}+\frac{27}{64}\right)=\frac{10}{64}$. Finally, we combine these probabilities to get

$$
\begin{align}
P(2+3|1+2+3)&=\frac{1\cdot\frac{10}{64}}{\frac{37}{64}} \\
&=\frac{10}{37} \\
&=0.\overline{270}
\end{align}
$$
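
The arithmetic for part (a) can be verified exactly with rational numbers. This is a minimal sketch, assuming each child independently has blue eyes with probability $\frac{1}{4}$ so that the number of blue-eyed children is binomial; the helper name `p_blue` is illustrative.

```python
from fractions import Fraction
from math import comb


def p_blue(k, n=3, p=Fraction(1, 4)):
    """Binomial probability that exactly k of n children have blue eyes."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_at_least_one = 1 - p_blue(0)           # P(1+2+3)
p_at_least_two = p_blue(2) + p_blue(3)   # P(2+3)
answer_a = p_at_least_two / p_at_least_one

print(p_at_least_one, p_at_least_two, answer_a)  # prints: 37/64 5/32 10/37
```

Here `5/32` is simply $\frac{10}{64}$ in lowest terms.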

### (b)
If it is known that the youngest child has blue eyes, what is the probability that at least two children have blue eyes?

### Solution
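This is the probability that the oldest child ($O$) or the middle child ($M$) has blue eyes, given that the youngest child ($Y$) does: $P(O+M|Y)$. The children's eye colors are independent, so conditioning on $Y$ does not change the probabilities for $O$ and $M$: $P(O|Y)=P(M|Y)=\frac{1}{4}$ and $P(OM|Y)=\frac{1}{16}$. From Problem 1, we know that this probability is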
$$
\begin{align}
&P(O|Y)+P(M|Y)-P(OM|Y) \\
=&1/4+1/4-1/16 \\
=&7/16 \\
=&0.4375
\end{align}
$$
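
The same number can be obtained by brute force, summing over all $2^3$ eye-color outcomes. This is a sketch under the same independence assumption; `conditional_probability` and `at_least_two` are illustrative names, not part of any library.

```python
from fractions import Fraction
from itertools import product

P_BLUE = Fraction(1, 4)  # chance that any one child has blue eyes


def conditional_probability(event, given):
    """Exact P(event | given), summed over all 2**3 eye-color outcomes."""
    p_given = Fraction(0)
    p_both = Fraction(0)
    for kids in product([True, False], repeat=3):  # (oldest, middle, youngest)
        weight = Fraction(1)
        for blue in kids:
            weight *= P_BLUE if blue else 1 - P_BLUE
        if given(kids):
            p_given += weight
            if event(kids):
                p_both += weight
    return p_both / p_given


def at_least_two(kids):
    return sum(kids) >= 2


answer_b = conditional_probability(at_least_two, given=lambda kids: kids[2])
print(answer_b)  # prints: 7/16
```

Passing `given=any` instead conditions on at least one blue-eyed child and recovers the part (a) result.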

### (c)
Write a short program using a random number generator to simulate both cases. Show that you get the same answer.

In [3]:
try:
    import cupy as np
except ImportError:
    import numpy as np

from prettytable import PrettyTable

In [4]:
# are the eyes blue?
eye_choices = [True, False, False, False]

# generate 10 million sets of children in order (oldest, middle, youngest)
n = 10_000_000
children = np.random.choice(eye_choices, (n, 3))

# at least one child has blue eyes (the given for part a)
a = np.any(children, axis=1)
# in how many cases is the given for part a met?
n_a = np.sum(a)

# youngest has blue eyes
b = children[:, 2]
n_b = np.sum(b)

# reduce children to counts of children with blue eyes
n_blue = np.sum(children, axis=1)

# fraction of the cases meeting each given that have 2 or more blue-eyed children
p_a = np.sum(n_blue[a] >= 2) / n_a
p_b = np.sum(n_blue[b] >= 2) / n_b

# print as a nice table of the expected probability calculated above and the simulated probability found here 
t = PrettyTable()
t.align = 'r'
t.field_names = ["Part", "Expected", "Simulated"]
t.add_rows([
    ["a", 10 / 37, p_a],
    ["b", 7 / 16, p_b]
])
print(t)

+------+--------------------+--------------------+
| Part |           Expected |          Simulated |
+------+--------------------+--------------------+
|    a | 0.2702702702702703 | 0.2702866238622694 |
|    b |             0.4375 | 0.4373643316576401 |
+------+--------------------+--------------------+
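
Whether the simulated values count as "the same answer" depends on the sampling noise expected at this sample size. A rough check, assuming each conditional fraction behaves like a binomial proportion: `n_a` and `n_b` were not printed above, so they are approximated here by their expected values $\frac{37}{64}n$ and $\frac{1}{4}n$, and the helper name `z_score` is illustrative.

```python
from math import sqrt


def z_score(simulated, expected, n_given):
    """Deviation of a simulated fraction from its expected value, in units of
    the binomial standard error sqrt(p * (1 - p) / n_given)."""
    std_err = sqrt(expected * (1 - expected) / n_given)
    return (simulated - expected) / std_err


n = 10_000_000
# Assumption: the number of conditioning cases is replaced by its expectation.
z_a = z_score(0.2702866238622694, 10 / 37, n * 37 / 64)
z_b = z_score(0.4373643316576401, 7 / 16, n * 1 / 4)
print(f"z_a = {z_a:.2f}, z_b = {z_b:.2f}")
```

Both deviations come out smaller than one standard error in magnitude, so the simulation is consistent with the analytic results.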
