# Lab | Intro to Probability

**Objective**

Welcome to this Intro to Probability lab, where we explore decision-making scenarios through the lens of probability and strategic analysis. In business, informed decisions are crucial, especially under uncertainty. This lab focuses on scenarios where probabilistic outcomes shape strategy. You will work through exercises that require assessing and choosing optimal paths based on data-driven insights. The goal is to build your skills by applying probability concepts to real-world problems.

In [3]:
#Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st

%matplotlib inline

**Challenge 1**

#### Ironhack Airlines 

Airlines often sell more tickets than they have seats available; this is called overbooking. Consider the following:
- A plane has 450 seats.
- Based on historical data, we conclude that each individual passenger has a 3% chance of missing their flight.

If Ironhack Airlines routinely sells 460 tickets, what is the chance that they have a seat for every passenger?

In [13]:
# Binomial distribution - for an experiment where each trial has only TWO possible outcomes ('success' or 'fail'),
# it gives the probability of having k successes in n independent trials with individual success probability p

# NOTE: 'success' is a loose concept here; it just means the outcome I am counting

# here 'success' = a passenger missing the flight (p = 0.03), so the prob. of a passenger showing up is 1 - p = 0.97
# 460 - 450 = 10, so AT LEAST 10 passengers must not show up for everyone who does show up to have a seat

k = 10 # minimum # of no-shows needed so that everyone who shows up has a seat -> # of 'successes'
n = 460 # of tickets sold
p = 0.03 # probability of each indiv. passenger missing the flight -> 3% 'success'

# {binom.pmf(k, n, p): .2f} -> example code in the Intro to Prob. lab
# the : .2f format spec rounds the output to two digits after the decimal point; it goes inside the { } of an f-string
# binom.pmf with a k of 10 gives the prob. of EXACTLY 10 ppl not showing up (pmf = prob. of one specific value, like 1/6 for any single face of a die)
# pmf (prob. mass func.) - the discrete counterpart of a pdf; its values over all possible outcomes sum to 1 [ALL prob. = 1 !!!]
# pdf (prob. density func.) is the equivalent for continuous variables
# with pmf alone we would need to add up the pmf of EACH case: 10, 11, 12... up to 460 ppl not showing up
# binom.cdf gives the cumulative prob., i.e. P(X <= k) = pmf(0) + pmf(1) + ... + pmf(k), so cdf(460) = 1
# knowing all the above, it is more efficient to compute 1 - binom.cdf(9): this removes the cumulative prob. of 9 or fewer ppl not showing up
# 1 - binom.cdf(9) = binom.cdf(460) - binom.cdf(9) = pmf(10) + pmf(11) + ... + pmf(460)


c1 = (round(1 - st.binom.cdf((k-1), n, p), 2)*100)
print(f"There's an {c1}% chance that Ironhack Airlines will have a seat for all passengers if they sell 460 tickets for a plane with only 450 seats")

There's an 88.0% chance that Ironhack Airlines will have a seat for all passengers if they sell 460 tickets for a plane with only 450 seats
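
As a cross-check on the reasoning in the comments, the same tail probability can be computed three ways: as the complement of the CDF, with SciPy's survival function `binom.sf`, and by summing the PMF over 10 to 460 no-shows. This is a minimal sketch; the variable names `via_cdf`, `via_sf` and `via_pmf` are illustrative and not part of the original lab.

```python
import numpy as np
import scipy.stats as st

n, p, k = 460, 0.03, 10  # tickets sold, no-show probability, no-shows needed

via_cdf = 1 - st.binom.cdf(k - 1, n, p)                   # 1 - P(X <= 9)
via_sf = st.binom.sf(k - 1, n, p)                         # P(X > 9), the same quantity
via_pmf = st.binom.pmf(np.arange(k, n + 1), n, p).sum()   # P(X = 10) + ... + P(X = 460)

print(f"{via_cdf:.4f} {via_sf:.4f} {via_pmf:.4f}")
```

All three values should agree up to floating-point error; `sf` is usually the most convenient form for "more than k" questions.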


**Challenge 2**

#### Ironhack Call Center 

Suppose a customer service representative at a call center is handling customer complaints. Consider the following:
- The probability of successfully resolving a customer complaint on the first attempt is 0.3. 


What is the probability that the representative needs to make at least three attempts before successfully resolving a customer complaint?

In [19]:
prob = 0.3 # prob. of resolving the complaint on any single attempt (assumption: same for every attempt, attempts independent)
neg_prob = 1 - prob # prob. of not resolving it on a single attempt

# Approach: Prob of SUCCESS (accumulate the early yes's & subtract from 1)

# right on the 1st attempt = 30%... so right on the second attempt is 70% AND 30% (so .7 * .3)...
# ... this means to be right on the 3rd attempt is 70% * 70% * 30% (so .7 * .7 * .3)...
# ... but since we are looking for AT LEAST 3 attempts we would have to add up getting it right on attempt 3, 4, 5... out to infinity
# ... so instead we subtract the prob. of needing 1 OR needing 2 attempts from 1 (total prob. space)... OR means add
# ... this gives us 1 - ((prob. of needing 1 attempt = .3) + (prob. of needing 2 attempts = .7 * .3))

at_least_3_prob = ((1 - (prob + (neg_prob * prob)))*100)
print(f"The prob. that the rep needs to make at least 3 attempts before successfully resolving a cust. complaint is {at_least_3_prob}%")

The prob. that the rep needs to make at least 3 attempts before successfully resolving a cust. complaint is 49.0%


In [53]:
# Approach: Prob of FAILURE (accumulating no's, then stop because everything after that counts as 'at least 3 attempts')

# Another way to think of this: the rep needs to FAIL the first two attempts; whatever happens afterwards, at least 3 attempts are needed
# this would be .7 AND .7, which gives the prob. of failing twice in a row
# we don't subtract from the total prob. space because this approach computes the target event directly
alt_way = neg_prob * neg_prob
round(alt_way,2)

0.49
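
The number of attempts up to and including the first success follows a geometric distribution, so both approaches can be checked against `scipy.stats.geom`. A minimal sketch, assuming independent attempts that each succeed with probability 0.3 (the names `at_least_3` and `by_hand` are illustrative):

```python
import scipy.stats as st

prob = 0.3  # success probability on any single attempt (assumed constant)

# geom models the attempt number on which the first success occurs (1, 2, 3, ...)
# sf(2) = P(attempts > 2) = P(at least 3 attempts are needed)
at_least_3 = st.geom.sf(2, prob)
by_hand = (1 - prob) ** 2  # fail twice in a row

print(round(at_least_3, 2), round(by_hand, 2))
```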

**Challenge 3**

#### Ironhack Website

Consider a scenario related to Ironhack website traffic, where:
- our website receives on average 500 visits per hour.
- the website's server is designed to handle up to 550 visits per hour.


What is the probability of the website server being overwhelmed?

In [39]:
avg_visits = 500 #visits per hour
s_capacity = 550 #visits per hour

# hourly visits are modelled as Poisson with mean 500
# cumulative prob. up to and including 550: CDF(550) = the prob. that the # of visits in an hour is 550 or fewer
c3_cdf = st.poisson.cdf(s_capacity, avg_visits)

# the prob. that it is more than 550 is 1 - c3_cdf
c3 = 1 - c3_cdf
# c3 is a probability between 0 and 1 (0.013 = 1.3%), so no % sign on the raw value
print(f"The prob. of the website server being overwhelmed is {c3:.3f}")

The prob. of the website server being overwhelmed is 0.013


What is the probability of being overwhelmed at some point during a day? (Consider 24 hours.)

In [38]:
# prob. of not being overwhelmed in any of the 24 hours (treating the hours as independent)
twentyfourhours = c3_cdf ** 24

# being overwhelmed at least once in 24 hours
oncein24 = 1 - twentyfourhours

# oncein24 is a probability between 0 and 1 (0.27 = 27%), so no % sign on the raw value
print(f"The prob. of the website server being overwhelmed at some point during a 24 hour day is {oncein24:.2f}")

The prob. of the website server being overwhelmed at some point during a 24 hour day is 0.27
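
The 24-hour step can also be phrased as a binomial problem: each hour is a trial that "succeeds" (server overwhelmed) with the hourly probability, and we want at least one success in 24 trials. The sketch below assumes, as the cells above do, Poisson-distributed hourly visits with mean 500 and independent hours; the names `p_hour`, `p_day_power` and `p_day_binom` are illustrative.

```python
import scipy.stats as st

p_hour = st.poisson.sf(550, 500)           # P(visits > 550) in a single hour
p_day_power = 1 - (1 - p_hour) ** 24       # complement of "no overwhelmed hour in 24"
p_day_binom = st.binom.sf(0, 24, p_hour)   # P(at least one overwhelmed hour out of 24)

print(f"{p_hour:.3f} {p_day_power:.2f} {p_day_binom:.2f}")
```

Both daily figures are the same quantity, so they should match up to floating-point error.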


**Challenge 4**

#### Ironhack Helpdesk

Consider a scenario related to the time between arrivals of customers at a service desk.

On average, a customer arrives every 10 minutes.

What is the probability that the next customer will arrive within the next 5 minutes?

In [34]:
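# lambda is the rate of events per unit of time: 1 customer per 10 minutes = 0.1 per minute
lambda_value = 0.1

# scipy parametrizes the exponential by scale = 1/lambda (the mean time between arrivals, here 10 minutes)
arrival_dist = st.expon(scale = 1/lambda_value)

# P(next customer arrives within 5 minutes)
arrival_dist.cdf(5)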

0.3934693402873666

If there is no customer for 15 minutes, employees can take a 5-minute break.

What is the probability of an employee taking a break?

In [35]:
# sf = survival function = 1 - cdf: P(no customer arrives in the next 15 minutes)
st.expon.sf(15, scale=1/lambda_value)

0.22313016014842982
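
Both helpdesk answers follow from the closed form of the exponential CDF, F(t) = 1 - exp(-lambda * t), so they can be reproduced with the standard library alone. A minimal sketch comparing the hand computation with SciPy (the names `p_within_5` and `p_none_15` are illustrative):

```python
import math
import scipy.stats as st

lambda_value = 0.1  # arrivals per minute (one customer every 10 minutes on average)

p_within_5 = 1 - math.exp(-lambda_value * 5)   # P(T <= 5)
p_none_15 = math.exp(-lambda_value * 15)       # P(T > 15)

print(p_within_5, st.expon.cdf(5, scale=1 / lambda_value))
print(p_none_15, st.expon.sf(15, scale=1 / lambda_value))
```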

**Challenge 5**

The weights of a certain species of birds follow a normal distribution with a mean weight of 150 grams and a standard deviation of 10 grams. 

- If we randomly select a bird, what is the probability that its weight is between 140 and 160 grams?

In [48]:
import scipy.stats as stats # same module as `st` above, under the alias used in the cells below

mean_bird_weight = 150
std_bird_weight = 10

z_140 = (140 - mean_bird_weight) / std_bird_weight 
z_160 = (160 - mean_bird_weight) / std_bird_weight 

prob_140 = stats.norm.cdf(z_140)
prob_160 = stats.norm.cdf(z_160)

bird_prob = (prob_160 - prob_140)*100
print(f"The probability that a bird's weight is between 140 and 160 grams is: {bird_prob:.2f}%")

The probability that a bird's weight is between 140 and 160 grams is: 68.27%
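
Standardising to z-scores by hand is optional: `norm` accepts the mean and standard deviation through its `loc` and `scale` parameters. A minimal sketch of the same calculation on the original gram scale (the names `bird_dist` and `p_between` are illustrative):

```python
import scipy.stats as st

mean_w, std_w = 150, 10  # grams

bird_dist = st.norm(loc=mean_w, scale=std_w)          # frozen normal distribution
p_between = bird_dist.cdf(160) - bird_dist.cdf(140)   # P(140 <= weight <= 160)

print(f"{p_between:.2%}")
```

The interval is exactly one standard deviation either side of the mean, which is why the result matches the familiar 68% figure from the empirical rule.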


**Challenge 6**

If the lifetime (in hours) of a certain electronic component follows an exponential distribution with a mean lifetime of 50 hours, what is the probability that the component fails within the first 30 hours?

In [50]:
mean_lifetime = 50
lambda_param = 1 / mean_lifetime
time = 30

# Calculate the probability using the exponential CDF
probability = stats.expon.cdf(time, scale=1/lambda_param) *100

print(f"The probability that the component fails within the first 30 hours is: {probability:.2f}%")

The probability that the component fails within the first 30 hours is: 45.12%
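
A quick simulation is another way to sanity-check an analytic result like this one. The sketch below draws exponential lifetimes with mean 50 hours and counts the fraction that fail within 30 hours; the seed and sample size are arbitrary choices, not part of the original lab.

```python
import numpy as np

rng = np.random.default_rng(seed=0)                   # fixed seed, chosen arbitrarily
lifetimes = rng.exponential(scale=50, size=200_000)   # simulated lifetimes in hours

simulated = (lifetimes <= 30).mean()   # fraction of components failing within 30 hours
exact = 1 - np.exp(-30 / 50)           # closed-form exponential CDF at t = 30

print(f"simulated: {simulated:.3f}, exact: {exact:.3f}")
```

With this many samples the simulated fraction should land within a fraction of a percentage point of the exact value.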
