<a href="https://colab.research.google.com/github/AzuHooks/Course_FutureCoder_MH/blob/main/Probability_mh.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Probability 
---
Probability is a measure of how likely it is that an event will happen.

**In simple terms**:

Coin toss - there are 2 possible outcomes (heads or tails). Therefore the probability that the coin will land on heads is 1/2 (or 0.5). 

**In general**:

Probability = number of ways it can happen / total number of possible outcomes (when every outcome is equally likely)

Probability is always between 0 and 1 (0 = impossible, 1 = certain)
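The formula above can be sketched in a few lines of Python. This is only an illustration: the helper name `probability` is made up for this example, and it assumes every outcome is equally likely.

```python
from fractions import Fraction

def probability(favourable, total):
    """Return favourable/total as an exact fraction (hypothetical helper).

    Assumes all outcomes are equally likely.
    """
    if total <= 0 or not 0 <= favourable <= total:
        raise ValueError("need total > 0 and 0 <= favourable <= total")
    return Fraction(favourable, total)

# Rolling an even number on a fair six-sided die:
# 3 ways it can happen (2, 4, 6) out of 6 possible outcomes
p_even = probability(3, 6)
print(p_even, float(p_even))  # 1/2 0.5
```

Using `Fraction` keeps the result exact, which makes it easy to compare with hand calculations.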


### Coded example coin-toss
---
The 0.5 (50%) coin toss probability is the probability in ideal conditions. Whilst if we flipped a coin 10 times, we might not get heads 50% of the time, if we flipped a coin 10,000 times, we would see heads close to 50% of the time.

Run the code below to simulate coin toss trials.  It is set to 10 tosses of the coin.  Run it a few times to see how the actual proportion of heads varies from the standard probability (0.5).  **You should see that it varies quite a bit**.

Now change the number_of_trials to 100 and run it a few times.  **How close does it get now?**

Now change the number_of_trials to 1000 and run it a few times.  **How close does it get now?**

Try 10000 and 100000. **Does it get closer to 0.5 more often?**

**You are seeing the accuracy improve as the number of trials gets larger.**

In [None]:
import random

# generate a coin toss and return 1 if it tossed a head and 0 if it was a tail
def toss_the_coin():
    coin_toss = random.random()  # random float in the range [0.0, 1.0)
    if coin_toss < 0.5:
        return 1
    else:
        return 0


# simulate tossing a coin n times, counting the number of heads and returning that number divided by the number of tosses
def simulate(n):
    num_heads = 0
    for i in range(n):
        num_heads += toss_the_coin()
    return num_heads / n

# Test with simulations
number_of_trials = 10
result = simulate(number_of_trials)
print(result, "-", str(round(result * 100)) + "% of coin tosses were heads")

The simulation supports the theoretical probability: there is a large amount of error when the number of trials is low (10), but the error almost disappears when a much higher number of trials is run (10000 or more). This means that, as long as there is enough data, we can estimate probabilities from real-world observations.

**In practice, this means that if you have a very large dataset, you can estimate the probability that a particular event will or will not occur. A common use of probability in data science is hypothesis testing, which asks how likely the observed data would be if a given hypothesis were true.**

## Key Expressions:
---
**P(A)** stands for probability of event A  
**P(B)** stands for probability of event B

### Independent Events
---
Independent events are events which are NOT affected by previous events. Take our coin toss: no matter what has come up on previous tosses, it has no impact on subsequent tosses (i.e. the probability is not affected). So even if heads has come up 3 times in a row, it is still 50% likely to come up on the 4th toss.

To work out the probability of two independent events occurring together:
  
Probability of 'Event A' AND 'Event B' occurring = Probability of Event A * Probability of Event B

`P(A AND B) = P(A) * P(B)`
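As a quick check of the multiplication rule, the sketch below lists every equally likely outcome of two fair coin tosses and counts them. The helper `prob` is a name invented for this example.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coin tosses: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Exact probability of an event (a True/False test) over the outcomes above."""
    matching = [o for o in outcomes if event(o)]
    return Fraction(len(matching), len(outcomes))

p_a = prob(lambda o: o[0] == "H")            # Event A: first toss is heads
p_b = prob(lambda o: o[1] == "H")            # Event B: second toss is heads
p_a_and_b = prob(lambda o: o == ("H", "H"))  # A AND B: both tosses are heads

print(p_a, p_b, p_a_and_b)     # 1/2 1/2 1/4
print(p_a_and_b == p_a * p_b)  # True
```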

### Mutually exclusive events
---
Mutually exclusive events are events which simply CANNOT happen simultaneously - for example, you cannot throw both a 2 and a 5 on a single die at once. 

To work out the probability of either of two mutually exclusive events occurring (Event A OR Event B):

Probability of 'Event A' OR 'Event B' occurring = Probability of Event A + Probability of Event B

`P(A OR B) = P(A) + P(B)`
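The addition rule can be checked the same way for the die example, by counting faces. This is a small sketch; `prob_die` is a hypothetical helper name.

```python
from fractions import Fraction

faces = range(1, 7)  # the six faces of a fair die

def prob_die(event):
    """Exact probability that a single roll satisfies the given test."""
    matching = [f for f in faces if event(f)]
    return Fraction(len(matching), len(faces))

p_two = prob_die(lambda f: f == 2)                # P(A): rolling a 2
p_five = prob_die(lambda f: f == 5)               # P(B): rolling a 5
p_two_or_five = prob_die(lambda f: f in (2, 5))   # P(A OR B)

print(p_two, p_five, p_two_or_five)       # 1/6 1/6 1/3
print(p_two_or_five == p_two + p_five)    # True
```

The simple sum only works because a single roll cannot be both a 2 and a 5.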


### Conditional Probability
---
Conditional probability applies when the probability of an event depends on a previous event having occurred (so the events are dependent, the opposite of independent).  

**For example**: You have a bag with 6 balls in (3 red and 3 blue), but you want the bag to only contain blue balls.  

If you draw a ball, the probability that it is red is 3/6 or 0.5. The probability that it is blue is also 3/6 or 0.5.  

You draw a red ball, then throw this ball in the bin.  There are 5 balls left in the bag, 3 blue and 2 red.   

If you draw a second ball, the probability that it is red is now 2/5 or 0.4 and the probability that it is blue is 3/5 or 0.6.  The probabilities are altered as there are now more blue than red balls left in the bag. 

You draw another ball, and this time it's blue.  You throw this one in the bin as well.  There are now 4 balls left in the bag, 2 red and 2 blue.  The probability of drawing a red is now 0.5 again, and the probability of blue has also returned to 0.5.

To work out conditional probabilities (dependent events):  

`Probability of event A and event B = probability of event A * probability of event B GIVEN event A has happened`  

So: Probability of drawing a red then a blue ball for example:  

Probability of drawing Red (3/6 or 0.5) * Probability of drawing blue given a red has been removed (3/5 or 0.6)  
 = 0.5 * 0.6 = 0.3
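The red-then-blue calculation can be verified by listing every ordered pair of balls that could be drawn without replacement. This is a sketch for illustration only; the variable names are chosen for this example.

```python
from fractions import Fraction
from itertools import permutations

bag = ["R", "R", "R", "B", "B", "B"]  # 3 red and 3 blue balls

# Every ordered pair of two different balls (first draw, second draw): 6 * 5 = 30 pairs
draws = list(permutations(bag, 2))

red_first = [d for d in draws if d[0] == "R"]
red_then_blue = [d for d in red_first if d[1] == "B"]

p_red = Fraction(len(red_first), len(draws))                     # P(A)
p_blue_given_red = Fraction(len(red_then_blue), len(red_first))  # P(B GIVEN A)
p_red_then_blue = Fraction(len(red_then_blue), len(draws))       # P(A AND B)

print(p_red, p_blue_given_red, p_red_then_blue)       # 1/2 3/5 3/10
print(p_red_then_blue == p_red * p_blue_given_red)    # True
```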

### Exercise 1 - dice pairs
---
Imagine you roll two dice. The probability that they both show the same number (i.e. a double) is 1/6 (there are 36 possible combinations of two numbers and 6 of them are doubles).

Create a function that:
* assigns a random number to each dice (between 1 and 6) to simulate rolling the dice (use random.randint(1,6) for each)
* does this 100 times and counts the number of times that the two dice are the same
* returns the number of doubles divided by the number of throws (in this case 100)

Run the function and print the result.  

Is it generally around 1/6 or 0.167?

What happens if you roll 1000 times instead of 100?

In [10]:
import random
# simulate rolling two dice; return 1 if both show the same number and 0 otherwise
def simulate():
    dice_roll_1 = random.randint(1,6)
    dice_roll_2 = random.randint(1,6)
    if dice_roll_1 == dice_roll_2:
      return 1
    else:
      return 0
# roll two dice num_of_times times and return the number of doubles divided by the number of throws
def roll_the_dice(num_of_times):
  num_of_doubles = 0
  num_of_throws = num_of_times
  for i in range(num_of_times):
      num_of_doubles += simulate()
  return num_of_doubles/num_of_throws

result = roll_the_dice(100)
print(result, "-", str(int(result*100))+"%")


0.13 - 13%


### Exercise 2
---
The probability of getting 3 heads in a row if you toss a coin three times is:
```
0.5 * 0.5 * 0.5 which is 0.125
```
This means that if you were to toss a coin 1000 times you would expect to get 3 in a row 125 times (0.125 * 1000)

Write a function that will be given the number of heads in a row (`num_in_a_row`) and the number of times the coin will be tossed (`num_times`) - *see the code cell*.

It will:

* calculate the probability of getting heads `num_in_a_row` times in a row
* calculate how many times `num_in_a_row` heads in a row would be expected if the coin was tossed `num_times` times
* print the probability
* return the expected number of times

**Expected output if `num_in_a_row` is 3 and `num_times` is 1000:**
Probability: 0.125  
Expected number of times for 3 heads in a row:  125

**TEST THE CODE:**  
Change `row` to 2 and `times` to 100 and run the code again.  You should see that the probability is 0.25 and the expected number of times for 2 heads in a row is 25.


In [2]:
def calculate_expected(num_in_a_row, num_times):
    # coin tosses are independent, so the probability of num_in_a_row heads
    # in a row is 0.5 multiplied by itself num_in_a_row times
    probability = 0.5 ** num_in_a_row
    print("Probability:", probability)
    # expected number of times, as defined in the exercise: probability * num_times
    return probability * num_times

# Test the code
row = 3
times = 1000
result = int(calculate_expected(row, times))
print("Expected number of times for", row, "heads in a row:", result)

Probability: 0.125
Expected number of times for 3 heads in a row: 125


### Exercise 3
---

A restaurant offers the following options:

***Starter – soup or salad***

***Main – chicken, fish or vegetarian***

***Dessert – ice cream or cake***

Diners may choose not to order a starter or a dessert, but all diners must order a main.

How many possible different combinations of starter, main and dessert are there?
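One way to count the combinations is to list them. The sketch below assumes that "no starter" and "no dessert" each count as an option (written here as `"none"`), which gives 3 choices for every course.

```python
from itertools import product

# "none" stands for skipping that course (an assumption made for this sketch)
starters = ["none", "soup", "salad"]
mains = ["chicken", "fish", "vegetarian"]
desserts = ["none", "icecream", "cake"]

# every possible (starter, main, dessert) combination
combinations = list(product(starters, mains, desserts))

print(len(combinations))  # 27, i.e. 3 * 3 * 3
print(combinations[0])    # ('none', 'chicken', 'none')
```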

The restaurant's data suggests that:

* 50% of diners order salad as a starter, 20% order soup and 30% do not have a starter  
* 40% order fish, 40% order chicken and 20% order the vegetarian option
* 30% order cake, 30% order ice cream and 40% do not order a dessert


Write a function that will accept the three choices of food for a three course meal and will return the probable number of diners, out of every 100, who would order that combination.   

In [3]:
def calculate_probability(starter, main, dessert):
  prob_salad = 0.50
  prob_soup = 0.20
  no_starter = 0.30

  prob_chicken = 0.40
  prob_fish = 0.40
  prob_veg = 0.20

  prob_icecream = 0.30
  prob_cake = 0.30
  no_desert = 0.40

  starter_prob = 0
  main_prob = 0
  desert_prob = 0

  if(starter == "none"):
      starter_prob += no_starter
  elif(starter == "salad"):
      starter_prob += prob_salad
  elif(starter == "soup"):
      starter_prob += prob_soup

  if(main == "chicken"):
      main_prob += prob_chicken
  elif(main == "fish"):
      main_prob += prob_fish
  elif (main == "vegetarian"):
      main_prob += prob_veg

  if(dessert =="none"):
      desert_prob += no_desert
  elif(dessert == "cake"):
      desert_prob += prob_cake
  elif(dessert == "icecream"):
      desert_prob += prob_icecream

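  # assuming the three choices are independent, multiply the probabilities
  # (P(A AND B) = P(A) * P(B)), then scale to a number of diners out of 100
  num_of_diners = (starter_prob * main_prob * desert_prob) * 100
  return num_of_diners


print("Probable number of diners : ",round(calculate_probability("salad", "chicken","none")))

Probable number of diners :  8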


### Exercise 4 
---
Write a function to calculate the probability of selecting a red token from each of the following configurations:

1. A bag with 4 red tokens and 4 green tokens.
2. A bag with 4 red tokens, 4 green tokens and 10 yellow tokens.
3. A bag with 0 red tokens, 4 green tokens and 10 yellow tokens.

In [8]:
import random

def drawing_red_token(numTrials):
    bucket1 = ['R', 'R', 'R', 'R', 'G', 'G', 'G', 'G']
    bucket2 = ['R', 'R', 'R', 'R', 'G', 'G', 'G', 'G', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y']
    bucket3 = ['G', 'G', 'G', 'G', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y']
    counter1 = 0
    counter2 = 0
    counter3 = 0
    for i in range(numTrials):
            bag1 = random.choice(bucket1)
            bag2 = random.choice(bucket2)
            bag3 = random.choice(bucket3)
            if bag1 == 'R':
                counter1 += 1
            if bag2 == 'R':
                counter2 += 1
            if bag3 == 'R':
                counter3 += 1
    print("Probability of Bag1: " ,counter1/numTrials)
    print("Probability of Bag2: ",counter2 / numTrials)
    print("Probability of Bag3: ",counter3 / numTrials)
    return

drawing_red_token(1000)

Probability of Bag1:  0.504
Probability of Bag2:  0.215
Probability of Bag3:  0.0


### Exercise 5 - challenging
---

An experiment consists of selecting a token from a bag and flipping a coin. The bag contains 3 red tokens and 4 blue tokens. A token is selected at random from the bag, its colour is noted and then the token is returned to the bag.

When a red token is selected, a biased coin with probability 4/5 of landing heads is spun.

When a blue token is selected, a biased coin with probability 2/5 of landing heads is spun.

Write a function that will:

1. Approximate the probability of picking a red token
2. Approximate the probability of obtaining heads
3. If a heads is obtained, approximate the probability of also having selected a red token.
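One possible approach is sketched below as a simulation. It is not the only way to solve the exercise, and the function name `simulate_experiment` and its `seed` parameter are choices made for this sketch. For comparison, working the numbers by hand gives P(red) = 3/7 (about 0.429), P(heads) = 3/7 * 4/5 + 4/7 * 2/5 = 4/7 (about 0.571) and P(red GIVEN heads) = (12/35) / (20/35) = 3/5.

```python
import random

def simulate_experiment(num_trials, seed=None):
    """Approximate P(red), P(heads) and P(red GIVEN heads) by simulation."""
    rng = random.Random(seed)              # a seed makes the run repeatable
    tokens = ["R"] * 3 + ["B"] * 4         # 3 red tokens and 4 blue tokens
    heads_prob = {"R": 4 / 5, "B": 2 / 5}  # biased coin used for each colour

    red = heads = red_and_heads = 0
    for _ in range(num_trials):
        token = rng.choice(tokens)         # the token is returned, so the bag never changes
        is_heads = rng.random() < heads_prob[token]
        if token == "R":
            red += 1
        if is_heads:
            heads += 1
            if token == "R":
                red_and_heads += 1

    p_red = red / num_trials
    p_heads = heads / num_trials
    # conditional probability: only look at the trials where heads came up
    p_red_given_heads = red_and_heads / heads if heads else float("nan")
    return p_red, p_heads, p_red_given_heads

p_red, p_heads, p_red_given_heads = simulate_experiment(200000, seed=1)
print("P(red):", round(p_red, 3))
print("P(heads):", round(p_heads, 3))
print("P(red GIVEN heads):", round(p_red_given_heads, 3))
```

With a large number of trials the three estimates should land close to the hand-calculated values, in the same way the coin-toss simulation settles near 0.5.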