## Sample space analysis

A _sample space_ is the set of all the possible outcomes an action could produce. 

For example, flipping a coin produces either heads or tails.

In [1]:
sample_space = {"heads", "tails"}

probability_heads = 1 / len(sample_space)
print(f"Probability of choosing heads is {probability_heads}")

Probability of choosing heads is 0.5


### Events

An _event_ is the subset of those elements within `sample_space` that satisfy some _event condition_.

Example event conditions for a coin flip:
- is heads or tails
- is heads
- is tails
- is neither

**Defining event conditions**

In [2]:
def is_head_or_tails(outcome):
    return outcome in sample_space


def is_neither(outcome):
    return not is_head_or_tails(outcome)


def is_heads(outcome):
    return outcome == "heads"


def is_tails(outcome):
    return outcome == "tails"

**Defining an event-detection function**

In [3]:
def get_matching_event(event_condition, sample_space):
    return {outcome for outcome in sample_space if event_condition(outcome)}

**Detecting events using event conditions**

In [4]:
event_conditions = [is_head_or_tails, is_heads, is_tails, is_neither]

for event_condition in event_conditions:
    print(f"Event condition: {event_condition.__name__}")
    event = get_matching_event(event_condition, sample_space)
    print(f"Event: {event}\n")

Event condition: is_head_or_tails
Event: {'heads', 'tails'}

Event condition: is_heads
Event: {'heads'}

Event condition: is_tails
Event: {'tails'}

Event condition: is_neither
Event: set()



**Computing event probabilities**

In [5]:
def compute_probability(event_condition, generic_sample_space):
    event = get_matching_event(event_condition, generic_sample_space)
    return len(event) / len(generic_sample_space)


for event_condition in event_conditions:
    prob = compute_probability(event_condition, sample_space)
    name = event_condition.__name__
    print(f"Probability of event arising from {name} is {prob}")

Probability of event arising from is_head_or_tails is 1.0
Probability of event arising from is_heads is 0.5
Probability of event arising from is_tails is 0.5
Probability of event arising from is_neither is 0.0


#### Analyzing a biased coin

**Representing a weighted sample space**

In [6]:
weighted_sample_space = {"heads": 4, "tails": 1}

**Checking the weighted sample space size**

In [7]:
sample_space_size = sum(weighted_sample_space.values())
assert sample_space_size == 5

**Checking the weighted event size**

In [8]:
event = get_matching_event(is_head_or_tails, weighted_sample_space)
event_size = sum(weighted_sample_space[outcome] for outcome in event)
assert event_size == 5

**Defining a generalized event probability function**

In [9]:
def compute_event_probability(event_condition, generic_sample_space):
    event = get_matching_event(event_condition, generic_sample_space)
    if isinstance(generic_sample_space, set):
        return len(event) / len(generic_sample_space)
    event_size = sum(generic_sample_space[outcome] for outcome in event)
    return event_size / sum(generic_sample_space.values())

**Computing weighted event probabilities**

In [10]:
for event_condition in event_conditions:
    prob = compute_event_probability(event_condition, weighted_sample_space)
    name = event_condition.__name__
    print(f"Probability of event arising from {name} is {prob}")

Probability of event arising from is_head_or_tails is 1.0
Probability of event arising from is_heads is 0.8
Probability of event arising from is_tails is 0.2
Probability of event arising from is_neither is 0.0
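
**Checking that an unweighted sample space is a special case**

One way to read the generalized function: an unweighted sample space behaves like a weighted one in which every outcome has weight 1. The sketch below restates the two helper functions so that it runs on its own. `to_weighted` is a hypothetical helper introduced here for illustration only; it is not part of the notebook.

```python
def get_matching_event(event_condition, sample_space):
    return {outcome for outcome in sample_space if event_condition(outcome)}


def compute_event_probability(event_condition, generic_sample_space):
    event = get_matching_event(event_condition, generic_sample_space)
    if isinstance(generic_sample_space, set):
        return len(event) / len(generic_sample_space)
    event_size = sum(generic_sample_space[outcome] for outcome in event)
    return event_size / sum(generic_sample_space.values())


def to_weighted(unweighted_sample_space):
    # Hypothetical helper: give every outcome the same weight of 1.
    return {outcome: 1 for outcome in unweighted_sample_space}


def is_heads(outcome):
    return outcome == "heads"


fair_coin = {"heads", "tails"}
fair_coin_weighted = to_weighted(fair_coin)

# Both representations should give the same probability for a fair coin.
prob_unweighted = compute_event_probability(is_heads, fair_coin)
prob_weighted = compute_event_probability(is_heads, fair_coin_weighted)
assert prob_unweighted == prob_weighted == 0.5
```

Under this reading, the biased coin only differs in that its weights are 4 and 1 instead of 1 and 1.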


## Computing nontrivial probabilities

The following problems show how `compute_event_probability` can be used to answer less trivial probability questions.

### Problem 1: Analyzing a family with four children

Suppose a family has four children. What is the probability that exactly two of the children are boys? Assume each child is equally likely to be a boy or a girl, so the sample space is unweighted.

**Computing the sample space of children**

In [11]:
possible_children = ["boy", "girl"]
sample_space = set()

for child1 in possible_children:
    for child2 in possible_children:
        for child3 in possible_children:
            for child4 in possible_children:
                outcome = (child1, child2, child3, child4)
                sample_space.add(outcome)
sample_space

{('boy', 'boy', 'boy', 'boy'),
 ('boy', 'boy', 'boy', 'girl'),
 ('boy', 'boy', 'girl', 'boy'),
 ('boy', 'boy', 'girl', 'girl'),
 ('boy', 'girl', 'boy', 'boy'),
 ('boy', 'girl', 'boy', 'girl'),
 ('boy', 'girl', 'girl', 'boy'),
 ('boy', 'girl', 'girl', 'girl'),
 ('girl', 'boy', 'boy', 'boy'),
 ('girl', 'boy', 'boy', 'girl'),
 ('girl', 'boy', 'girl', 'boy'),
 ('girl', 'boy', 'girl', 'girl'),
 ('girl', 'girl', 'boy', 'boy'),
 ('girl', 'girl', 'boy', 'girl'),
 ('girl', 'girl', 'girl', 'boy'),
 ('girl', 'girl', 'girl', 'girl')}

In [12]:
import itertools

# The `*` unpacks the list, making this equivalent to
# product(possible_children, possible_children, possible_children, possible_children)
all_combinations = itertools.product(*(4 * [possible_children]))
assert set(all_combinations) == sample_space

In [13]:
sample_space_efficient = itertools.product(possible_children, repeat=4)
assert set(sample_space_efficient) == sample_space

**Computing the probability of exactly two boys**

In [14]:
def has_two_boys(outcome):
    return outcome.count("boy") == 2


prob = compute_event_probability(has_two_boys, sample_space)

print(f"Probability of 2 boys is {prob}")

Probability of 2 boys is 0.375
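
**Cross-checking with a binomial coefficient**

The same answer can be reached by counting instead of enumerating: there are "4 choose 2" ways to pick which two of the four children are boys, out of 2^4 equally likely birth orders. The sketch below is only a sanity check on the enumeration above, and it assumes Python 3.8 or later for `math.comb`.

```python
from math import comb

num_children = 4
num_boys = 2

# Number of birth orders with exactly two boys, out of all 2**4 birth orders.
favorable = comb(num_children, num_boys)
total = 2 ** num_children
prob = favorable / total

print(f"{favorable} of {total} outcomes have exactly 2 boys: {prob}")
# prints: 6 of 16 outcomes have exactly 2 boys: 0.375
```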


### Problem 2: Analyzing multiple die rolls

We have a fair six-sided die whose faces are numbered from 1 to 6. The die is rolled 6 times. What is the probability that these six die rolls add up to 21?

In [15]:
possible_rolls = list(range(1, 7))
possible_rolls

[1, 2, 3, 4, 5, 6]

In [16]:
sample_space = set(itertools.product(possible_rolls, repeat=6))


def has_sum_of_21(outcome):
    return sum(outcome) == 21


prob = compute_event_probability(has_sum_of_21, sample_space)
print(f"6 rolls sum to 21 has probability of {prob:.3f}")

6 rolls sum to 21 has probability of 0.093


### Problem 3: Computing die-roll probabilities using a weighted sample space

This is the same question as Problem 2, but here we convert our unweighted sample space set into a weighted sample space dictionary that maps each possible sum to the number of roll combinations producing it.



In [17]:
from collections import defaultdict

weighted_sample_space = defaultdict(int)
for outcome in sample_space:
    total = sum(outcome)
    weighted_sample_space[total] += 1
len(weighted_sample_space)

31

**Checking very rare die-roll combinations**

In [18]:
# We must roll all 1s to get a sum of 6.
assert weighted_sample_space[6] == 1

# We must roll all 6s to get a sum of 36.
assert weighted_sample_space[36] == 1

**Checking a more common die-roll combination**

In [19]:
num_combinations = weighted_sample_space[21]
print(f"There are {num_combinations} ways for 6 die rolls to sum to 21")

There are 4332 ways for 6 die rolls to sum to 21


**Comparing weighted events and regular events**

In [20]:
event = get_matching_event(has_sum_of_21, sample_space)
assert weighted_sample_space[21] == len(event)
assert sum(weighted_sample_space.values()) == len(sample_space)

**Computing the weighted event probability of die rolls**

In [21]:
prob = compute_event_probability(lambda x: x == 21, weighted_sample_space)
assert prob == compute_event_probability(has_sum_of_21, sample_space)
print(f"6 rolls sum to 21 with probability of {prob:.3f}")

6 rolls sum to 21 with probability of 0.093


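**Comparing weighted to unweighted sample space size**

What is the benefit of using a weighted sample space? It uses less memory: the weighted sample space holds one entry per possible sum, while the unweighted one holds one entry per sequence of rolls.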

In [22]:
print("Number of elements in unweighted sample space:")
print(len(sample_space))

print()

print("Number of elements in weighted sample space:")
print(len(weighted_sample_space))

Number of elements in unweighted sample space:
46656

Number of elements in weighted sample space:
31
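
**Building the weighted sample space without storing every outcome**

The memory saving only helps if the full set of outcomes is never materialized. As a rough sketch, and not the notebook's own approach, the weights can be tallied while iterating over `itertools.product` directly, using `collections.Counter` in place of the `defaultdict`. The name `streamed_weights` is introduced here for illustration.

```python
from collections import Counter
from itertools import product

possible_rolls = range(1, 7)

# Tally each sum as the roll sequences are generated one at a time;
# the 46656 tuples are never held in memory together.
streamed_weights = Counter(
    sum(outcome) for outcome in product(possible_rolls, repeat=6)
)

assert len(streamed_weights) == 31
assert sum(streamed_weights.values()) == 6 ** 6
assert streamed_weights[21] == 4332
```

Because `Counter` is a `dict` subclass, the result should be usable anywhere the weighted sample space dictionary is expected.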
