# Let's *Program to Learn*: Probability

**Motivation:** Assume we have weather statistics for the last 200 years. Analysing the data, we find that, *on average*, a certain city has 5 rainy days out of the 31 days of July. Suppose the climate does not change much. What is the probability that it rains on 10 July next year?
In this case, we use the past relative frequency of an event (rainy days) to predict that event in the future: statistical data about past events lets us assign a *numerical value* to the probability of an event. Basically, we are employing [the empirical law of large numbers](https://en.wikipedia.org/wiki/Law_of_large_numbers).
In the following, we use this concept to practise probability.

## Simulation

**Problem:** There is a jar containing 3 yellow, 2 red, 2 green and 1 blue marble. What is the probability of pulling *one yellow* marble?

We try to simulate the experiment and find an estimate of the answer.
The first step is to model the experiment: how should we model the jar and the marbles? To find a proper structure for the elements of our experiment, it is important to understand which actions we need to perform on them.

First, we scale the problem down to a simpler case and see how to simulate an experiment for probability problems. We start with tossing a fair coin. What is the probability of getting a *Head*?
We simply choose to model the head of the coin with 0 and the tail with 1. This is just a choice; a more readable alternative would be $\{'H','T'\}$. How do we model the action of tossing the coin? The result of tossing is a random selection from $\{0,1\}$ (or $\{'H','T'\}$), where 0 represents the head and 1 the tail. Let's see how Python supports this operation: check the [`random` module documentation](https://docs.python.org/3/library/random.html).

In [24]:
import random

count_head = 0
max_num_experiments = 1_000_000
for _ in range(0,max_num_experiments):
    side = random.randint(0,1)
    if side == 0: # if side is Head
        count_head = count_head + 1
print('Number of heads:',count_head)
print('Probability of Head:',count_head/max_num_experiments)

Number of heads: 499644
Probability of Head: 0.499644


We have seen that, based on [the empirical law of large numbers](https://en.wikipedia.org/wiki/Law_of_large_numbers), we can calculate the relative frequency of an event and use it as an *estimate* of its probability.
Back to our original jar-of-marbles problem: we need a structure that (1) can simulate pulling marbles randomly and (2) can log the frequency of the target event (*what is our target event?*). Here is our first attempt.

In [2]:
import random as rand

jar = ['Y','Y','Y','B','R','R','G','G'] # a jar containing 3 yellow, 2 red, 2 green and 1 blue marbles

count_yellow = 0 # target event
max_num_experiments = 1_000_000

for _ in range(0,max_num_experiments):
    rand.shuffle(jar) # we shuffle the container
    marble = rand.choice(jar) # select an item from a list randomly
    if marble == 'Y':
        count_yellow = count_yellow + 1
print('Number of yellow pulls:',count_yellow)
print('Probability of pulling one Yellow marble from ',jar,' is :',count_yellow/max_num_experiments)

Number of yellow pulls: 374874
Probability of pulling one Yellow marble from  ['Y', 'Y', 'R', 'R', 'G', 'Y', 'G', 'B']  is : 0.374874


It seems we have found a solution. But now we become curious about the probabilities of other events: what about pulling one red marble? One green marble? We need to refactor the code so that it keeps the frequencies of all possible events; then a single run answers all of these questions.

In [3]:
import random as rand

jar = ['Y','Y','Y','B','R','R','G','G'] # a jar containing 3 yellow, 2 red, 2 green and 1 blue marbles
count_events = [0,0,0,0] # each index indicates a color
yellow_index = 0
red_index = 1
blue_index = 2
green_index = 3

max_num_experiments = 1_000_000
for _ in range(0,max_num_experiments):
    rand.shuffle(jar)
    marble = rand.choice(jar) # select an item from a list randomly
    if marble == 'Y':
        count_events[yellow_index] = count_events[yellow_index] + 1
    if marble == 'R':
        count_events[red_index] = count_events[red_index] + 1
    if marble == 'B':
        count_events[blue_index] = count_events[blue_index] + 1
    if marble == 'G':
        count_events[green_index] = count_events[green_index] + 1
print('Number of events:',count_events)
print('Probability of pulling one Yellow marble from ',jar,' is :',count_events[yellow_index]/max_num_experiments)
print('Probability of pulling one Red marble from ',jar,' is :',count_events[red_index]/max_num_experiments)
print('Probability of pulling one Green marble from ',jar,' is :',count_events[green_index]/max_num_experiments)
print('Probability of pulling one Blue marble from ',jar,' is :',count_events[blue_index]/max_num_experiments)

Number of events: [375464, 249155, 125128, 250253]
Probability of pulling one Yellow marble from  ['R', 'B', 'Y', 'Y', 'Y', 'G', 'G', 'R']  is : 0.375464
Probability of pulling one Red marble from  ['R', 'B', 'Y', 'Y', 'Y', 'G', 'G', 'R']  is : 0.249155
Probability of pulling one Green marble from  ['R', 'B', 'Y', 'Y', 'Y', 'G', 'G', 'R']  is : 0.250253
Probability of pulling one Blue marble from  ['R', 'B', 'Y', 'Y', 'Y', 'G', 'G', 'R']  is : 0.125128


Is it possible to write the code differently? Encoding the counts as a list of frequencies works, but the mapping from list index to color is hidden in separate variables, which hurts the readability of the code. Let's refactor the code and use a dictionary instead.

In [4]:
import random as rand

jar = ['Y','Y','Y','B','R','R','G','G'] # a jar containing 3 yellow, 2 red, 2 green and 1 blue marbles
count_events = {'Y':0 , 'R':0 , 'G':0 , 'B':0 } # a dictionary whose keys are colors and whose values are frequencies

max_num_experiments = 1_000_000
for _ in range(0,max_num_experiments):
    rand.shuffle(jar)
    marble = rand.choice(jar) # select an item from a list randomly
    count_events[marble] = count_events[marble] + 1
print('Number of events:',count_events)
probabilities = {'Yellow':count_events['Y']/max_num_experiments , 'Red':count_events['R']/max_num_experiments , 'Green':count_events['G']/max_num_experiments , 'Blue':count_events['B']/max_num_experiments }
print('Probability of pulling marbles from ',jar,' is :',probabilities)

Number of events: {'Y': 374527, 'R': 250737, 'G': 249287, 'B': 125449}
Probability of pulling marbles from  ['Y', 'G', 'R', 'B', 'R', 'G', 'Y', 'Y']  is : {'Yellow': 0.374527, 'Red': 0.250737, 'Green': 0.249287, 'Blue': 0.125449}


Next step: refactor the first experiment (coin tossing) as well and turn both pieces of code into reusable functions. Functions help us divide long code into smaller segments, each of which can be reused (called) in other parts of the program.


In [7]:
import random as rand


max_num_experiments = 1_000_000 # global variable

def simulate_marble():
    '''This function simulates pulling one marble from a jar containing 3 yellow, 2 red, 2 green and 1 blue marbles.'''
    jar = ['Y','Y','Y','B','R','R','G','G'] # a jar containing 3 yellow, 2 red, 2 green and 1 blue marbles
    count_events = {'Y':0 , 'R':0 , 'G':0 , 'B':0 } # a dictionary whose keys are colors and whose values are frequencies

    for _ in range(0,max_num_experiments):
        rand.shuffle(jar)
        marble = rand.choice(jar) # select an item from a list randomly
        count_events[marble] = count_events[marble] + 1
    print('Number of events:',count_events)
    probabilities = {'Yellow':count_events['Y']/max_num_experiments , 'Red':count_events['R']/max_num_experiments , 'Green':count_events['G']/max_num_experiments , 'Blue':count_events['B']/max_num_experiments }
    print('Probability of pulling marbles from ',jar,' is :',probabilities)

def simulate_toss_coin():
    '''This function simulates tossing a fair coin.'''
    coin = ['H','T']
    count_events = {'H':0 , 'T':0}
    
    for _ in range(0,max_num_experiments):
        rand.shuffle(coin)
        side = rand.choice(coin)
        count_events[side] = count_events[side] + 1
    print('Number of events:',count_events)
    probabilities = {'Head':count_events['H']/max_num_experiments , 'Tail':count_events['T']/max_num_experiments}
    print('Probability of sides:',coin,' is :',probabilities)
    
simulate_marble()
simulate_toss_coin()

Number of events: {'Y': 375368, 'R': 250030, 'G': 249365, 'B': 125237}
Probability of pulling marbles from  ['R', 'G', 'R', 'Y', 'Y', 'Y', 'B', 'G']  is : {'Yellow': 0.375368, 'Red': 0.25003, 'Green': 0.249365, 'Blue': 0.125237}
Number of events: {'H': 500345, 'T': 499655}
Probability of sides: ['T', 'H']  is : {'Head': 0.500345, 'Tail': 0.499655}


Reviewing the implementation of both functions we recognise some similarities (*like what?*). Can we improve the code? Let's refactor the code.

In [8]:
import random as rand


def simulate_one_rand_event(sample_space, frequencies, max_num_experiments = 1_000_000):
    '''This function simulates an experiment given the sample space and a structure to store the frequencies.'''

    for _ in range(0,max_num_experiments):
        rand.shuffle(sample_space)
        sample = rand.choice(sample_space) # select an item from a list randomly
        frequencies[sample] = frequencies[sample] + 1
    probabilities = {}
    for key in frequencies.keys():
        probabilities[key] = frequencies[key]/max_num_experiments
    return probabilities

    
def main():
    result_marble = simulate_one_rand_event(['Y','Y','Y','B','R','R','G','G'] , {'Y':0 , 'R':0 , 'G':0 , 'B':0 })
    result_coin = simulate_one_rand_event(['H','T'] , {'H':0 , 'T':0} , 10_000_000)
    print(result_marble)
    print(result_coin)

main()

{'Y': 0.374752, 'R': 0.250385, 'G': 0.24915, 'B': 0.125713}
{'H': 0.4999809, 'T': 0.5000191}


**Exercise:** Do we need to pass frequencies? Refactor the code. 
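One possible answer to the exercise, sketched under the assumption that we may use `collections.Counter` from the standard library: the function builds the frequency table itself from the outcomes it observes, so the caller only passes the sample space. The name `simulate_one_rand_event_v2` is hypothetical. The shuffle is dropped because `random.choice` already picks uniformly, and dropping it also avoids mutating the caller's list.

```python
import random
from collections import Counter

def simulate_one_rand_event_v2(sample_space, max_num_experiments=1_000_000):
    '''Estimate the probability of each outcome in sample_space by repeated random draws.

    The frequency table is built inside the function, so the caller does not pass it.'''
    frequencies = Counter(random.choice(sample_space) for _ in range(max_num_experiments))
    # Counter returns 0 for outcomes that never occurred, so every outcome gets an entry.
    return {outcome: frequencies[outcome] / max_num_experiments for outcome in set(sample_space)}

print(simulate_one_rand_event_v2(['Y','Y','Y','B','R','R','G','G']))
print(simulate_one_rand_event_v2(['H','T'], 100_000))
```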

## Theory

### Basic Concepts

**Relative Frequency:** Suppose $A$ is a random event. The relative frequency of the event $A$ in $n$ repetitions of the experiment is defined as $f_n(A)=n(A)/n$, where $n(A)$ is the number of times that event $A$ occurred in the $n$ repetitions of the experiment. The relative frequency is a number between 0 and 1 (why?).
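A short sketch of this definition, assuming we store the $n$ repetitions as a list of outcomes and the event $A$ as a Python set; the helper name `relative_frequency` is ours, not part of the text. Since $0 \le n(A) \le n$, the result always lies between 0 and 1, and for a fair die the printed values should drift towards $1/6$ as $n$ grows.

```python
import random

def relative_frequency(outcomes, event):
    '''Return f_n(A) = n(A)/n, where n(A) counts the outcomes that belong to the event A.'''
    n = len(outcomes)
    n_A = sum(1 for outcome in outcomes if outcome in event)
    return n_A / n

# Roll a fair die n times and track the relative frequency of the event A = {6}.
for n in (10, 1_000, 100_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, relative_frequency(rolls, {6}))
```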

**Random Process:** A process is random if, whenever it is carried out, exactly one of its possible outcomes occurs, but it is impossible to predict with certainty which outcome that will be.

**Sample Space:** In theory, the sample space of a random process (or an experiment) is the set of all possible outcomes of that experiment. 

**Event:** Any *subset* of the sample space is called an *event*. 

**Probability of an Event:** A *probability measure* is simply a function $P$ that assigns a numerical probability to each event, i.e. to each subset of the sample space. If $S$ is a finite sample space whose outcomes are all equally likely and $E$ is an event in $S$, then the probability of $E$ is $P(E) = |E| / |S|$.

For example, tossing a fair coin can be considered a random process. The sample space is the set of all possible outcomes, i.e. $\{Head, Tail\}$. One event is getting a head, which is the subset $\{Head\}$; another event is $\{Tail\}$. The probability of getting a head is $P(\{Head\}) = \frac{|\{Head\}|}{|\{Head, Tail\}|} = \frac{1}{2}=0.5$.
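The formula $P(E) = |E| / |S|$ translates almost directly into Python sets. This is a minimal sketch that assumes a finite sample space with equally likely outcomes; the helper name `probability` is ours. `fractions.Fraction` keeps the result exact, so $\frac{1}{2}$ stays `1/2` instead of a rounded float.

```python
from fractions import Fraction

def probability(event, sample_space):
    '''P(E) = |E| / |S| for a finite sample space with equally likely outcomes.'''
    if not event <= sample_space:
        raise ValueError('an event must be a subset of the sample space')
    return Fraction(len(event), len(sample_space))

S = {'Head', 'Tail'}
print(probability({'Head'}, S))          # 1/2
print(float(probability({'Head'}, S)))   # 0.5
```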

**Exercise:** What are the domain, codomain and range of function $P$ (in the coin tossing experiment)?


In [19]:
# **Exercise:** There is a jar containing 3 yellow, 2 red, 2 green and 1 blue marble.
# We pull one marble. What is the random process? What is the sample space? What are the events?
# What is the probability of pulling *one yellow* marble?
# Label each marble individually (Y1, Y2, ...) so that all 8 outcomes are equally likely.
yellow = {'Y' + str(i) for i in range(1, 4)}
red = {'R' + str(i) for i in range(1, 3)}
green = {'G' + str(i) for i in range(1, 3)}
blue = {'B1'}  # same label in the sample space and in the event, so each event is a subset of it
sample_space = yellow | red | green | blue
event_marble_is = {'Yellow': yellow, 'Red': red, 'Green': green, 'Blue': blue}
p = len(event_marble_is['Yellow']) / len(sample_space)
print(sample_space , event_marble_is , p)

{'R2', 'R1', 'Y1', 'B1', 'G2', 'Y3', 'G1', 'Y2'} {'Yellow': {'Y3', 'Y2', 'Y1'}, 'Red': {'R2', 'R1'}, 'Green': {'G1', 'G2'}, 'Blue': {'B1'}} 0.375


**Problem:** There are two fair coins. We toss both independently and at the same time. What is the probability of getting exactly one head? Solve it both with a Python simulation and in theory.

**Problem:** What is the probability of getting a sum of 7 when rolling two six-sided dice? Solve it both with a Python simulation and in theory.

## We need to count

**Motivation:** We have learned that to calculate the probability of an event we need to count: we need to know the sizes of the event and of the sample space. In some cases this is easy, as with tossing a fair coin: there are two possible outcomes and we want the probability of one of them. In other cases counting is not as simple as tossing a coin. For example, a survey has 20 questions, each with three possible answers (yes, no, maybe). What is the probability of ...? To calculate the probability of any event here, we need to know the sample space and the size of the event. So, let's explore some basic counting techniques: 1, 2, 3, ...

**Problem:** In a survey there are 20 questions, each with three possible answers (yes, no, maybe). If every answer is equally likely, what is the probability of answering No to all questions?

**Exercise:** Implement a simulation for the experiment and find an estimation for the probability. What is the result? Discuss.

In [20]:
def simulate_survey_qs(num_of_experiments = 10_000_000):
    '''In a survey there are 20 questions with three possible answers for each (yes, no, maybe). What is the probability of having no for all questions?'''
    result = {'Freq':0,'Prob':0.0}
    questions = [[0,1,2]]*20 # 0: NO, 1: MAYBE, 2: YES
    # todo: implement the experiment here
    
    result['Prob'] = result['Freq']/num_of_experiments
    return result

print(simulate_survey_qs.__doc__,simulate_survey_qs())

In a survey there are 20 questions with three possible answers for each (yes, no, maybe). What is the probability of having no for all questions? {'Freq': 0, 'Prob': 0.0}


To find a solution to the survey problem, we first scale the problem down and look for a pattern in the solution. Let's assume a survey with only 3 questions. How many possible answers are there for question 1? Three, right? Easy. How many for question 2? Again, three. Now, how many combinations of answers are there for the first two questions? For each answer to question 1 there are three possible answers to question 2, so the problem turns into generating the Cartesian product of two sets: $Q1=\{Y,M,N\}$ and $Q2=\{Y,M,N\}$. The product $Q1 \times Q2$ (what is the result?) contains all possible combinations of answers to questions 1 and 2. Extending this with the possible answers to question 3 gives three sets, i.e. $Q1 \times Q2 \times Q3$ (what is the result?). Let's generalise this to 20 questions. What is the result of $Q1 \times Q2 \times \dots \times Q20$?

**Exercise (challenging):** Without using existing libraries, implement a Python solution that generates all possible combinations of answers for the 20 questions.

While you are busy proposing your solution to the previous exercise, I take the easy path and use an existing library: [itertools](https://docs.python.org/3/library/itertools.html). Check the code below.

In [23]:
import itertools as it # 'it' instead of 'iter', which would shadow Python's built-in iter()

options = {0,1,2} # to save some space in the print, 0, 1, 2 are used instead of No, Maybe, Yes.
num_questions = 4 # to keep the output short, the number of questions is set to 4.
survey_sample_space = it.product(options,repeat=num_questions) # check the reference
print(set(survey_sample_space))

{(1, 2, 1, 1), (2, 1, 0, 0), (2, 1, 1, 1), (0, 1, 2, 1), (0, 1, 1, 2), (0, 1, 0, 0), (2, 2, 1, 0), (0, 2, 2, 1), (2, 2, 0, 1), (1, 0, 2, 2), (0, 2, 0, 1), (2, 0, 0, 1), (1, 0, 1, 0), (0, 2, 1, 2), (0, 0, 2, 0), (2, 2, 2, 1), (1, 1, 0, 1), (2, 0, 1, 1), (2, 0, 2, 0), (0, 0, 2, 2), (1, 1, 2, 0), (1, 2, 1, 0), (2, 0, 2, 2), (2, 1, 1, 0), (2, 1, 0, 2), (1, 2, 0, 1), (0, 1, 2, 0), (1, 2, 1, 2), (1, 2, 2, 1), (0, 1, 1, 1), (1, 1, 1, 0), (0, 0, 0, 0), (2, 1, 1, 2), (2, 1, 2, 1), (1, 0, 0, 1), (0, 1, 0, 2), (2, 2, 1, 2), (0, 2, 2, 0), (1, 0, 2, 1), (2, 0, 0, 0), (0, 2, 1, 1), (1, 1, 1, 2), (0, 0, 0, 2), (0, 0, 1, 1), (1, 0, 1, 2), (2, 0, 0, 2), (0, 0, 2, 1), (1, 1, 2, 2), (2, 1, 0, 1), (1, 2, 0, 0), (0, 1, 2, 2), (1, 2, 2, 0), (0, 1, 1, 0), (2, 2, 0, 0), (0, 2, 0, 0), (2, 1, 2, 0), (1, 0, 0, 0), (1, 2, 0, 2), (0, 1, 0, 1), (2, 2, 1, 1), (2, 2, 2, 0), (1, 1, 0, 0), (1, 0, 2, 0), (0, 2, 2, 2), (1, 2, 2, 2), (2, 2, 0, 2), (0, 2, 1, 0), (0, 2, 0, 2), (1, 1, 1, 1), (0, 0, 0, 1), (2, 0, 1, 0), ...}

**Exercise:** Back to our original problem (the survey): what is the event? Can you find the event in the generated sample space? What is the probability of answering No to all questions in a survey of 4 questions? Discuss your solution.

**Combinatorics** is the field of mathematics concerned with counting and generating arrangements. We have already started working in this field, and now we would like to formulate its rules for our problems.

**Exercise:** Above we saw how to generate the sample space for a survey of four questions. What is its cardinality? Can you calculate the cardinality of the sample space for a survey of 20 questions? Note: run your code before continuing with the rest.

**Rule 1 (Multiplication):** If there are $n$ options for stage one (step, problem, etc.) and $m$ options for stage two, then there are $n \times m$ options for both stages together, provided that the stages are *independent* (the options available in one stage do not depend on the choice made in the other).

**Exercise:** You are asked to generate unique codes for a group of students using four characters: the first character can be I or T, the second must be 0, and the third and fourth can be any digit except 0. How many students can be uniquely coded? Implement a program that generates all the codes.
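A quick sanity check of Rule 1 with `itertools.product`, using two survey questions as the two stages; the set names `Q1` and `Q2` follow the notation used for the survey above. Every extra question multiplies the count by 3 again, which is where $3^k$ comes from. For 20 questions we only compute $3^{20}$ rather than enumerating the product, since that would mean roughly 3.5 billion tuples.

```python
import itertools as it

Q1 = {'Y', 'M', 'N'}   # possible answers to question 1
Q2 = {'Y', 'M', 'N'}   # possible answers to question 2
pairs = list(it.product(Q1, Q2))       # the Cartesian product Q1 x Q2
print(len(pairs), len(Q1) * len(Q2))   # 9 9

# Same check for k questions with three answers each: |Q1 x ... x Qk| = 3**k
for k in range(1, 6):
    assert len(list(it.product(Q1, repeat=k))) == 3 ** k

print(3 ** 20)   # 3486784401, size of the sample space for 20 questions
```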



In [6]:
'''Exercise: What is the final value of count after finishing the execution?'''

count = 0
for _ in range(0,25):
    for _ in range(5,30):
        for _ in range(0,100):
            count = count + 1

**Exercise:** Let's not forget our starting problem: the probability of answering **No** to all the questions in a survey with 20 questions, where each question has three options. What is the cardinality of the sample space? What is the cardinality of the event (No to all the questions)? What is the probability?
<!-- Answers: $|\text{sample space}| = 3^{20}$, $|\text{event}| = 1$, $P(\text{No for all 20 questions})=\frac{1}{3^{20}}$ -->

**Problem:** Assume the word `COMPUTER`. Let's assume we randomly arrange the letters of the word. What is the probability that the letters `CO` remain next to each other as a unit?

We will tackle this problem with three different techniques: a simulation in Python; generating the event and the sample space with `itertools` to calculate the probability exactly; and the multiplication rule, without programming.

*Simulation:* Similar to the previous simulations we simulate the experiment: we shuffle the word and we check if `CO` remains together. See the code below.

In [54]:
import random as rand

word = 'COMPUTER'
sub_str = 'CO'
count = 0
max_experiment = 1_000_000
for _ in range(0,max_experiment):
    word_list = list(word)
    rand.shuffle(word_list)
    shuffled_word = ''.join(word_list)
    if sub_str in shuffled_word:
        count+=1
print('Probability (in simulation) of having CO in a random word using COMPUTER is:', count/max_experiment)



Probability (in simulation) of having CO in a random word using COMPUTER is: 0.125148


**Exercise:** Use your programming skills and try to generalise the above code with the help of functions, higher-order functions.

*Theory in Python (itertools):* First we need to generate the event and the sample space. To generate the sample space, we need all possible orderings of the letters in `COMPUTER`. In combinatorics, the different arrangements of a collection of objects are called *permutations*, and the `itertools` library in Python provides a function for them. Check the code below.

In [59]:
import itertools as it

word = 'COMPUTER'
sub_str = 'CO'
ss_disjoined = list(it.permutations(word)) # the sample space will be generated with each word as disjoined letters.
ss_joined = list(map(''.join,ss_disjoined)) # here we join the letter together for all the words within the sample space.
# what is map() ? 
event = list(filter(lambda s: sub_str in s , ss_joined )) # we remove the words that don't have CO together.
# what is filter() here?
print('Probability (theory with itertools) of having CO in a random word using COMPUTER is:',len(event)/len(ss_joined))

Probability (theory with itertools) of having CO in a random word using COMPUTER is: 0.125


**Exercise:** We used lists for our sample space and event, but we have learned that they are sets. Try the solution with sets and check the final result. Why is there no difference?

*Theory (without programming):* Let's apply the multiplication rule to count the elements of the sample space. We want to generate words using the letters of `COMPUTER`. There are 8 places (the length of the word) to fill: 8 options for the first place, 7 options for the second place (why 7? repetition is not allowed), 6 options for the third place, and so on. The answer is therefore $8\times7\times6\times5\times4\times3\times2\times1$. Mathematics has a shorter way of writing the product of all positive integers up to 8: it is called *8 factorial* and written $8!$.
Now we need to count the event. We glue `CO` together and treat it as a single unit. If we replace `CO` with `X`, the question becomes: "how many permutations does the word `XMPUTER` have?" The answer is $7!$ (why?).
Now based on the theory of the probability we can say $P(E)=\frac{7!}{8!}=\frac{1}{8}$. Does our answer match our previous results?

**Define $n!$:** We can define $n!$ (read "n factorial") recursively as $n!=n\times(n-1)!$ with $0!=1$, or simply as $n!=n\times(n-1)\times \dots \times 2 \times 1$, where $\dots$ stands for all the integers between its left and right neighbours.
**Permutation:** In combinatorics, the possible arrangements of a collection of objects are called its permutations. The number of permutations of $n$ distinct objects (no repeated object) is $P_{n}=n!$.
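The recursive definition maps directly onto a recursive Python function. This sketch uses the standard convention $0! = 1$ as the stopping point of the recursion and compares the result with `math.factorial` and with the number of tuples `itertools.permutations` generates; the function name `factorial` is ours.

```python
import itertools as it
import math

def factorial(n):
    '''Recursive definition: 0! = 1 and n! = n * (n-1)! for n >= 1.'''
    if n < 0:
        raise ValueError('factorial is only defined for non-negative integers')
    if n == 0:
        return 1
    return n * factorial(n - 1)

print(factorial(8))                              # 40320
print(factorial(8) == math.factorial(8))         # True
# P_n = n!: compare with the number of arrangements itertools generates
print(len(list(it.permutations('COMPUTER'))))    # 40320
```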

It seems we are using more and more built-in functions and functions from `itertools`. Let's spend some time reading about them and practising with them.
Here are some learning resources:
- `map()`:[Reference](https://docs.python.org/3/library/functions.html?highlight=map#map), [Examples](https://www.programiz.com/python-programming/methods/built-in/map)
- `filter()`: [Reference](https://docs.python.org/3/library/functions.html?highlight=map#filter), [Examples](https://www.programiz.com/python-programming/methods/built-in/filter)
- `join()`: [Reference](https://docs.python.org/3/library/stdtypes.html#str.join) , [Examples](https://www.programiz.com/python-programming/methods/string/join)
- `zip()`:[Reference](https://docs.python.org/3/library/functions.html?highlight=zip#zip), [Examples](https://www.programiz.com/python-programming/methods/built-in/zip)
- `product()`:[Reference](https://docs.python.org/3/library/itertools.html?highlight=combination#itertools.product), [Examples](https://note.nkmk.me/en/python-itertools-product/)
- `permutations()`:[Reference](https://docs.python.org/3/library/itertools.html?highlight=permutation#itertools.permutations), [Examples](https://inventwithpython.com/blog/2021/07/03/combinations-and-permutations-in-python-with-itertools/)
- `combinations()`:[Reference](https://docs.python.org/3/library/itertools.html?highlight=combination#itertools.combinations) , [Examples](https://inventwithpython.com/blog/2021/07/03/combinations-and-permutations-in-python-with-itertools/)

In [1]:
# some useful functions from itertools

from itertools import * # what is new here?

L1 = list('abc')
L2 = list('123')
zipped = zip(L1,L2) # aggregates elements from iterables
# todo: print elements of zipped 
# todo: experiment what if iterables of zip do not have same length?
# todo: experiment what if we pass more than 2 iterables to zip?
s = '_'.join(('a','1'))  # joins string elements of the iterable
# todo: check what is the result? what is the separator?
# todo: concat elements of zipped without any separator
p1 = product(L1,L2)
p2 = product(range(1,7),repeat=2) # why repeat= is needed?
# print the results

# design some exercises here to practice other functions from the list above.


**Problem:** Assume the word `ANALYSIS`. Let's assume we randomly arrange the letters of the word. What is the probability that the letters `SIS` remain next to each other as a unit?

As with the previous problem, we will tackle this one with three different techniques: a simulation in Python; generating the event and the sample space with `itertools` to calculate the probability exactly; and the multiplication rule, without programming.

*Simulation:* Similar to the previous simulations we simulate the experiment: we shuffle the word and we check if `SIS` remains together. See the code below.

In [1]:
import random as rand

word = 'ANALYSIS'
sub_str = 'SIS'
count = 0
max_experiment = 1_000_000
for _ in range(0,max_experiment):
    word_list = list(word)
    rand.shuffle(word_list)
    shuffled_word = ''.join(word_list)
    if sub_str in shuffled_word:
        count+=1
print('Probability (in simulation) of having SIS in a random word using ANALYSIS is:', count/max_experiment)

Probability (in simulation) of having SIS in a random word using ANALYSIS is: 0.03593


*Theory in Python (itertools):* First we need to generate the event and the sample space. To generate the sample space, we need all possible orderings of the letters in `ANALYSIS`. This is again a permutation problem, with one small difference from the previous case: some letters are repeated, so we keep only the distinct words (note the `set` below). Check the code below.

In [3]:
import itertools as it

word = 'ANALYSIS'
sub_str = 'SIS'
ss_disjoined = list(it.permutations(word)) # the sample space will be generated with each word as disjoined letters.
ss_joined = set(map(''.join,ss_disjoined)) # here we join the letter together for all the words within the sample space.
# what is map() ? 
event = set(filter(lambda s: sub_str in s , ss_joined )) # we remove the words that don't have SIS together.
# what is filter() here?
print('Probability (theory with itertools) of having SIS in a random word using the letters of ANALYSIS is:',len(event)/len(ss_joined))

Probability (theory with itertools) of having SIS in a random word using the letters of ANALYSIS is: 0.03571428571428571


*Theory (without programming):* This is a permutation problem of a special type: a permutation with repeated elements. We want to generate words using the letters of `ANALYSIS`, in which the letters `A` and `S` each appear twice. Therefore, $|S|=\frac{8!}{2!\times 2!}$ (why?). Similarly, treating `SIS` as a single unit leaves six objects (`A`, `N`, `A`, `L`, `Y`, `SIS`) in which `A` appears twice, so $|E|=\frac{6!}{2!}$ (why?). Then $P(E)=\frac{|E|}{|S|}=\frac{6!/2!}{8!/(2!\times 2!)}=\frac{360}{10080}=\frac{1}{28}$.
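The hand calculation can be checked with `math.factorial` and exact fractions, and cross-checked by brute force. Keeping only distinct strings in a `set` is what prevents the repeated letters from being counted more than once; the variable names here are illustrative.

```python
import itertools as it
from fractions import Fraction
from math import factorial

word = 'ANALYSIS'
# |S|: arrangements of 8 letters where A and S each appear twice
size_S = factorial(8) // (factorial(2) * factorial(2))
# |E|: glue SIS into one block -> 6 objects (A, N, A, L, Y, SIS) with A twice
size_E = factorial(6) // factorial(2)
print(size_S, size_E, Fraction(size_E, size_S))   # 10080 360 1/28

# Cross-check by brute force: distinct strings only, so repeated letters are not double counted
distinct = set(map(''.join, it.permutations(word)))
print(len(distinct) == size_S)   # True
```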

# [The rest is in progress]

**Problem:** We roll a fair six-sided die. What is the probability of getting a 6? What is the probability of getting any number except 6? How can we tackle this problem?
<!-- There are two ways. Let's first discuss the easiest way. We have already learned how to calculate the probability of getting a 6. The probability of any other number is 1 - P(side is 6). How can we justify this using sets? [todo: rule of addition] -->
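A possible way to see the complement idea with sets, assuming all six faces are equally likely: the event "any number except 6" is the set difference $S \setminus \{6\}$. Because $\{6\}$ and its complement do not overlap and together make up $S$, their probabilities add up to 1, so $P(\text{not } 6) = 1 - P(6)$. The variable names are illustrative.

```python
from fractions import Fraction

S = set(range(1, 7))      # sample space of one roll of a six-sided die
six = {6}                 # event: the roll is 6
not_six = S - six         # complement of the event within S

p_six = Fraction(len(six), len(S))
p_not_six = Fraction(len(not_six), len(S))
print(p_six, p_not_six)          # 1/6 5/6
print(p_not_six == 1 - p_six)    # True
```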

In [13]:
import itertools as it

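days = ['Fri','Sat','Sun']
dishes = ['Chinese','Mexican','Pizza','Pasta','Fries']
# choose one dish for each day (dishes may repeat): |dishes|**|days| = 5**3 = 125 menus
dishes_products = it.product(dishes,repeat=len(days))
# print(set(dishes_products))  # note: printing here would exhaust the iterator before the next line
# turn each tuple into a mapping day -> dish, e.g. {'Fri': 'Chinese', 'Sat': 'Pizza', 'Sun': 'Pasta'}
mappings = [dict(zip(days,m)) for m in dishes_products]
# print('Number of possible mappings:',len(mappings), '\n Mappings:',mappings)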



**Exercise:** You are asked to generate plate numbers for cars. A plate number has the format AB-CD-EF-GHIJ, where A and B are letters of the English alphabet, C and D are digits other than 0, E and F are again letters, and GHIJ are four digits. Print only the first 10 numbers. How many cars can be numbered?
<!--
*Solution:* Apply the multiplication rule: $26^{2}\times9^{2}\times26^{2}\times10^{4}$.
-->
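A sketch of the plate-number generator, assuming "letters" means the 26 uppercase English letters (`string.ascii_uppercase`) and that G, H, I and J may be any digit 0 to 9. Each position is one stage of the multiplication rule; `itertools.islice` takes the first 10 plates from the lazy `product` without building the whole (very large) list.

```python
import itertools as it
import string

letters = string.ascii_uppercase      # assumption: the 26 uppercase English letters
nonzero_digits = '123456789'          # C and D: any digit except 0
digits = string.digits                # G, H, I, J: any digit 0-9

# One iterable per position of AB-CD-EF-GHIJ
positions = [letters] * 2 + [nonzero_digits] * 2 + [letters] * 2 + [digits] * 4
plates = ('{}{}-{}{}-{}{}-{}{}{}{}'.format(*p) for p in it.product(*positions))

for plate in it.islice(plates, 10):   # only the first 10, the full list is huge
    print(plate)

total = len(letters)**2 * len(nonzero_digits)**2 * len(letters)**2 * len(digits)**4
print('Number of plates:', total)
```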

**Exercise:** You need to go from point A to point D. Between these points you need to stop at points B and C to do some shopping. Between A and B you have three ways to commute: $AB=\{Bus,Tram,Car\}$. The options between B and C are $BC=\{Bike, Walk\}$ and, finally, $CD=\{Bus,Bike\}$. In how many ways can you commute from A to D? Generate the possible options (with and without itertools).
<!--
*Solution:* Apply multiplication rule and the answer will be $|AB|\times|BC|\times|CD|$. To generate all the possible cases we can find the result of $AB \times BC \times CD$. The code below will generate all the possible cases. 
-->

In [16]:
import itertools as it
AB={'Bus','Tram','Car'}
BC={'Bike', 'Walk'}
CD={'Bus','Bike'}
commute = set(it.product(AB,BC,CD))
#print(commute)

**Exercise:** Assume you want to choose a book from your library. The library has three shelves: shelf one has 57 books, shelf two 26 and shelf three 44. In how many different ways can you choose a book?
<!--
*Solution:* Apply addition rule. 
-->
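This exercise relies on the addition rule, which the notebook has not stated explicitly yet. In short: if we choose one item from one of several groups that do not overlap, the number of possible choices is the sum of the group sizes. A small sketch with Python sets, where the book labels are made up purely for illustration:

```python
# Hypothetical book labels: each shelf is a set of distinct books, and no book is on two shelves.
shelf1 = {f'S1-book{i}' for i in range(57)}
shelf2 = {f'S2-book{i}' for i in range(26)}
shelf3 = {f'S3-book{i}' for i in range(44)}

# Addition rule: for pairwise disjoint sets, |A ∪ B ∪ C| = |A| + |B| + |C|
all_books = shelf1 | shelf2 | shelf3
print(len(all_books), len(shelf1) + len(shelf2) + len(shelf3))   # 127 127
```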

A permutation is an arrangement of a set of objects or data in sequential order; counting permutations means determining the number of ways the set can be arranged. An important point to remember is that in a permutation the order of the objects matters.

$n!$ applies when repetition is not allowed. Derive this rule from the basic multiplication rule.

Example: How many four-digit numbers can be formed from the digits in the set $S=\{2,5,6,8\}$ when:
- repetition is allowed.
- repetition is not allowed.
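A sketch of both cases, assuming "numbers" means four-digit numbers built from the digits of $S$. With repetition each of the 4 positions has 4 choices, giving $4^4$ (`itertools.product`); without repetition the choices shrink to $4\times3\times2\times1 = 4!$ (`itertools.permutations`), which is exactly the multiplication rule derivation of $n!$.

```python
import itertools as it

S = ['2', '5', '6', '8']

# Repetition allowed: 4 choices for each of the 4 positions -> 4**4
with_rep = [''.join(t) for t in it.product(S, repeat=len(S))]
# Repetition not allowed: 4 * 3 * 2 * 1 = 4!
without_rep = [''.join(t) for t in it.permutations(S)]

print(len(with_rep), 4 ** 4)              # 256 256
print(len(without_rep), 4 * 3 * 2 * 1)    # 24 24
print(without_rep[:5])
```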

**Exercise:** There are 10 cars in a race. In how many different orders can they cross the finish line?
<!--
*Solution:* This is a permutation problem: we need to count the possible orders, and the order matters here. For $n$ participants, there are $n!$ possible orders in which they can cross the finish line.
-->

**Permutation with repeated elements**
Example: Consider the name `ANN`. Using the letters of the name:
- How many three-letter words (the meaning is not important) can you generate?
- In how many ways can you arrange the letters of the name?

In [62]:
import itertools as it

name = 'ANN'
three_let_words = list(it.product(name,repeat=len(name)))
#print(three_let_words)
words_list = list(map(lambda t : ''.join(t) , three_let_words))
words_set = set(words_list)
#print(words_list)
#print(words_set)
arrangements = list(it.permutations(name))
arrangements_list = list(map(lambda t:''.join(t), arrangements))
#print(arrangements_list)
#print(set(arrangements_list))
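The distinct arrangements of a word with repeated letters follow the formula $\frac{n!}{k_1!\,k_2!\cdots}$, where $k_i$ is how often each distinct letter occurs; the `ANALYSIS` calculation $\frac{8!}{2!\times2!}$ is one instance. Here is a sketch using `collections.Counter` to find the $k_i$; the function name `distinct_arrangements` is hypothetical. It is compared against brute force, where the `set` removes duplicate tuples.

```python
import itertools as it
from collections import Counter
from math import factorial

def distinct_arrangements(word):
    '''n! / (k1! * k2! * ...), where k_i is how often each distinct letter occurs.'''
    count = factorial(len(word))
    for k in Counter(word).values():
        count //= factorial(k)   # exact: the partial products of k_i! always divide n!
    return count

for w in ['ANN', 'ANALYSIS']:
    brute_force = len(set(it.permutations(w)))   # distinct tuples only
    print(w, distinct_arrangements(w), brute_force)
```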

Consider the two names `David` and `Tennessee`. For each name, find the number of possibilities in the following problems and generate all of them.
1. In how many ways can you arrange the letters?
2. How many words of the same length can be generated using the letters?
3. How many 3-letter words can be generated using the letters?

In [61]:
import itertools as it
name = 'abc'
#print(list(it.permutations(name,r=2)))
#print(list(it.combinations(name,r=2)))


In [52]:
import random as rand
import itertools as it

word = 'ANALYSIS'
substr = 'SIS'

# Simulation
count = 0
max_experiment = 10_000_000
for _ in range(0,max_experiment):
    word_list = list(word)
    rand.shuffle(word_list)
    shuffled = ''.join(word_list)
    if substr in shuffled:
        count+=1
print('prob is=',count/max_experiment)

# itertools
word_perms = list(it.permutations(word))
word_perms_joined = list(map(''.join,word_perms))
sample_space = len(set(word_perms_joined))  # |S|: number of distinct arrangements
# treat SIS as one unit; removesuffix (Python 3.9+) works here only because 'SIS' is a suffix of 'ANALYSIS'
word_sis = list(word.removesuffix(substr))+[substr]
event = len(set(it.permutations(word_sis)))  # |E|: distinct arrangements with SIS kept together
print(word_sis)
print(event/sample_space)
print(1/28)  # the value derived by hand




prob is= 0.0357635
['A', 'N', 'A', 'L', 'Y', 'SIS']
0.03571428571428571
0.03571428571428571


In [60]:
import itertools as it
word = 'ANALYSIS'
word_perms = list(it.permutations(word))
word_perms_joined = list(map(''.join,word_perms))
#print(word_perms_joined)

In contrast, a combination is a selection of objects or data from a larger set in which the order of selection does not matter.
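The difference is easy to see by generating both for a small set. This sketch assumes Python 3.8 or later for `math.perm` and `math.comb`. Each unordered selection of $r$ items can be arranged in $r!$ orders, so the number of permutations of $r$ items is the number of combinations times $r!$.

```python
import itertools as it
import math

items = 'abcd'
r = 2
ordered = list(it.permutations(items, r))     # order matters: ('a','b') != ('b','a')
unordered = list(it.combinations(items, r))   # order ignored: only ('a','b')

print(len(ordered), math.perm(len(items), r))     # 12 12
print(len(unordered), math.comb(len(items), r))   # 6 6
# Each unordered selection of r items can be arranged in r! orders
print(len(ordered) == len(unordered) * math.factorial(r))   # True
```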

## Problems:
1. A jar contains 30 red, 12 yellow, 8 green and 5 blue marbles. You draw a marble, record its color and put it back, three times in total. What is the probability that you get NO red marble?
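One way to check an answer to this problem, combining the two approaches used throughout the notebook. Because each marble is put back, the three draws are independent stages, so the exact value is the per-draw probability of "not red" cubed. The simulation uses `random.choices`, which samples with replacement; the number of experiments is an arbitrary choice.

```python
import random
from fractions import Fraction

jar = ['R'] * 30 + ['Y'] * 12 + ['G'] * 8 + ['B'] * 5   # 55 marbles in total

# Exact value: with replacement the draws are independent, so multiply the per-draw probabilities.
p_not_red = Fraction(len(jar) - jar.count('R'), len(jar))
exact = p_not_red ** 3
print('Exact:', exact, float(exact))

# Simulation: random.choices samples with replacement (k draws per experiment).
max_num_experiments = 200_000
count = 0
for _ in range(max_num_experiments):
    draws = random.choices(jar, k=3)
    if 'R' not in draws:
        count += 1
print('Simulation:', count / max_num_experiments)
```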