# Lab | Intro to Probability

**Objective**

Welcome to this Intro to Probability lab, where we explore decision-making scenarios through the lens of probability and strategic analysis. In the business world, making informed decisions is crucial, especially when faced with uncertainties. This lab focuses on scenarios where probabilistic outcomes play a significant role in shaping strategies and outcomes. Students will engage in exercises that require assessing and choosing optimal paths based on data-driven insights. The goal is to enhance your skills by applying probability concepts to solve real-world problems.

**Challenge 1**

#### Ironhack Airlines 

Airlines often sell more tickets than they have seats available; this is called overbooking. Consider the following:
- A plane has 450 seats. 
- Based on historical data, we conclude that each individual passenger has a 3% chance of missing their flight. 

If Ironhack Airlines routinely sells 460 tickets, what is the chance that there are seats for all passengers who show up?
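Both cells below compute the same binomial tail. Writing $X$ for the number of no-shows among the 460 tickets, and assuming each passenger misses the flight independently with probability 0.03, everyone gets a seat exactly when at most 450 show up, i.e. when $X \ge 10$:

$$
P(\text{seats for everyone}) = P(X \ge 10) = 1 - \sum_{k=0}^{9} \binom{460}{k} (0.03)^k (0.97)^{460-k}
$$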

```python
from scipy.stats import binom
```

```python
# Rephrase the question: if each Ironhack Airlines passenger has a 3% chance of missing their flight,
# how likely is it that at least 10 of the 460 ticket holders don't show up?

# Number of tickets sold
tickets = 460

# Probability of a ticket buyer missing their flight
p_miss = 0.03

# Binomial distribution of the number of no-shows
binom_dist = binom(tickets, p_miss)

# Gets the probability of only nine or fewer passengers missing their flight, and subtracts that
# from 1 to get the probability that 10 or more will miss their flight.
1 - binom_dist.cdf(9)
```

0.8844772466215439

```python
# Another way to do this is just the reverse: looking at the chance that no more than 450 people show up.

tickets = 460
p_make = 0.97  # probability that a ticket holder shows up

binom_dist = binom(tickets, p_make)

binom_dist.cdf(450)
```

0.8844772466215431
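As a quick cross-check (a sketch, not part of the original lab), the two approaches can be compared directly. The survival function `binom.sf(k, n, p)`, which is P(X > k), also gives the complementary risk that more than 450 passengers show up and someone is bumped:

```python
from scipy.stats import binom

tickets = 460
p_miss = 0.03

# Approach 1: at least 10 no-shows, i.e. P(no-shows > 9)
p_enough_seats_misses = binom.sf(9, tickets, p_miss)

# Approach 2: at most 450 passengers show up
p_enough_seats_shows = binom.cdf(450, tickets, 1 - p_miss)

# Complement: more than 450 passengers show up, so the flight is overbooked
p_overbooked = binom.sf(450, tickets, 1 - p_miss)

print(round(p_enough_seats_misses, 4))
print(round(p_enough_seats_shows, 4))
print(round(p_overbooked, 4))
```

The first two numbers match the cells above (about 0.8845), and the third is the remaining probability, about 0.1155.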

**Challenge 2**

#### Ironhack Call Center 

Suppose a customer service representative at a call center is handling customer complaints. Consider the following:
- The probability of successfully resolving a customer complaint on the first attempt is 0.3. 


What is the probability that the representative needs to make at least three attempts before successfully resolving a customer complaint?
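With $X$ the attempt on which the first success occurs, $p = 0.3$ per attempt, and attempts assumed independent, the geometric distribution gives:

$$
P(X = k) = (1-p)^{k-1}\,p, \qquad P(X \ge k) = (1-p)^{k-1}
$$

so $P(X = 3) = 0.7^2 \cdot 0.3 = 0.147$, while $P(X \ge 3) = 0.7^2 = 0.49$.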

```python
from scipy.stats import geom

# Set probability of resolving an issue on any single attempt
p_resolve = 0.3

# Create a geometric distribution object (number of attempts up to and including the first success)
p_first_success_on_attempt = geom(p_resolve)

# The number in the parentheses is the attempt on which the first success happens,
# so pmf(3) is the probability that the first success comes on exactly the 3rd attempt.
p_first_success_on_attempt.pmf(3)

# Probability of succeeding for the first time on the 3rd call is 14.7%.
```

0.14699999999999996

```python
p_first_success_on_attempt.pmf(5)
# Probability of succeeding for the first time on the 5th call is 7.2%.
```


0.07202999999999998

This means that long runs of failed calls become less and less likely: there's only a 7.2% chance that the first success comes on exactly the 5th call. The chance that it takes at least 5 calls (the first 4 all fail) is higher, 0.7 × 0.7 × 0.7 × 0.7 ≈ 24%, because that also counts every later attempt.
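The question itself asks for *at least* three attempts, which is a tail probability rather than the single `pmf(3)` value. A sketch using the survival function `sf(k)`, which is P(X > k), under the same independence assumption. "At least three attempts" is read here as "the first two attempts fail", so P(X ≥ 3) = P(X > 2); if the wording is meant as three failed attempts *before* the success, `sf(3)` would be the one to use instead.

```python
from scipy.stats import geom

p_resolve = 0.3
attempts = geom(p_resolve)  # X = attempt on which the first success occurs

# P(X >= 3): the first two attempts both fail
p_at_least_3 = attempts.sf(2)

# P(X >= 5): the first four attempts all fail
p_at_least_5 = attempts.sf(4)

print(round(p_at_least_3, 4))  # 0.49
print(round(p_at_least_5, 4))  # 0.2401
```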

However, this assumes that each phone call has the same chance of getting the issue resolved, and that each call is independent of the others. In real life it likely wouldn't play out like this, because consecutive calls are usually about the same unresolved issue. The problem may then be more likely to be solved if the next call is escalated to a manager, or equally or less likely to be solved if the caller reaches an employee of the same department and seniority level.
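The independence point can be made concrete with a small helper that takes a separate success probability for each attempt. This is a minimal sketch: `first_success_prob` is a hypothetical helper, and the 0.6 success rate for an escalated call is an illustrative assumption, not lab data. With a constant 0.3 it reproduces `geom.pmf(3)`; with an escalation on the third call, the answer changes.

```python
from scipy.stats import geom


def first_success_prob(attempt_probs):
    """Probability that the first success happens on the last attempt in attempt_probs.

    attempt_probs[i] is the (assumed) success probability of attempt i + 1;
    attempts are treated as independent given those probabilities.
    """
    prob_all_previous_fail = 1.0
    for p in attempt_probs[:-1]:
        prob_all_previous_fail *= 1 - p
    return prob_all_previous_fail * attempt_probs[-1]


# Constant success probability: matches the geometric model
constant = first_success_prob([0.3, 0.3, 0.3])

# Hypothetical escalation: third call handled by a manager with a 60% success rate
escalated = first_success_prob([0.3, 0.3, 0.6])

print(round(constant, 3), round(geom.pmf(3, 0.3), 3))  # 0.147 0.147
print(round(escalated, 3))                             # 0.294
```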

So, for the geometric model to really apply here, we would be looking at the chance that the representative fails to resolve one issue with one caller, then fails a different issue with another caller, and then finally succeeds on the first attempt at yet another issue. The question becomes "how likely is the representative to be presented with ***n*** back-to-back issues they cannot resolve?", where ***n*** is the number of unresolved issues in a row and ***n+1***, the attempt on which the first success happens, is the number that goes in the parentheses of `pmf()`.

**Challenge 3**

#### Ironhack Website

Consider a scenario related to Ironhack website traffic, where:
- our website receives on average 500 visits per hour.
- the website's server is designed to handle up to 550 visits per hour.


What is the probability of the website server being overwhelmed?

```python
from scipy.stats import poisson

ave_hourly_visits = 500
poisson_dist = poisson(ave_hourly_visits)

# This only shows the probability of getting exactly 550 visitors in one hour.
poisson_dist.pmf(550)
```

0.0015115070495210661

```python
# The server is overwhelmed in a given hour if it gets MORE than 550 visits. Either it is
# overwhelmed or it isn't, so the two probabilities add up to 1 (100%). The probability of
# NOT being overwhelmed is the probability of getting UP TO 550 visits in an hour, so the
# chance of being overwhelmed is 1 minus that.

# Gets the probability of up to 550 visitors on the site in a given hour.
p_not_overwhelmed = poisson_dist.cdf(550)

# Gets the "remaining" probability: more than 550 visitors in that hour.
1 - p_not_overwhelmed

# So there is only about a 1.3% chance of the website being overwhelmed in any given hour.
```

0.01289822084039205
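A small aside, not in the original: `scipy.stats` distributions also provide `sf(k)`, the survival function P(X > k), which returns the same tail directly and is more accurate than `1 - cdf(k)` when the tail is tiny. Printing it next to `pmf(550)` makes the contrast explicit: the chance of *exactly* 550 visits is much smaller than the chance of *more than* 550.

```python
from scipy.stats import poisson

visits = poisson(500)  # hourly visits, assumed Poisson with mean 500

p_exactly_550 = visits.pmf(550)
p_more_than_550 = visits.sf(550)   # P(X > 550)
p_check = 1 - visits.cdf(550)      # same quantity via the complement

print(f"P(X = 550) = {p_exactly_550:.4f}")
print(f"P(X > 550) = {p_more_than_550:.4f}")
```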

What is the probability of being overwhelmed at some point during a day? (Consider 24 hours.)

```python
# To calculate this, we repeat the hourly probability over multiple hours, in this case 24,
# assuming that what happens in one hour is independent of the other hours.

# Set number of hours
hours = 24

# Saves the probability of not getting more than 550 visitors in an hour (i.e. not overwhelmed in that hour).
p_one_hr_no_overload = poisson_dist.cdf(550)

# Probability of "not overwhelmed" 24 hours in a row.
# The hourly probability is about 98.7%, so this computes 98.7% * 98.7% * ... (24 times).
p_mult_hours_no_overload = p_one_hr_no_overload ** hours

# That is the probability of NOT being overwhelmed over the 24 hours, so we subtract it from 1.
1 - p_mult_hours_no_overload
```


0.2677043869515715
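The `** hours` step relies on the hours being independent. A Monte Carlo sketch, assuming numpy is available and each hour's visits are an independent Poisson(500) draw, simulates many days and counts how often at least one hour exceeds 550 visits; the estimate should land close to the 26.8% above.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

n_days = 100_000
hours = 24
capacity = 550

# Simulated visits: one row per day, one column per hour
visits = rng.poisson(lam=500, size=(n_days, hours))

# A day counts as "overwhelmed" if any hour exceeds capacity
overwhelmed_days = (visits > capacity).any(axis=1)
estimate = overwhelmed_days.mean()

print(f"Simulated P(overwhelmed at least once in 24 h) = {estimate:.3f}")
```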

**Challenge 4**

#### Ironhack Helpdesk

Consider a scenario related to the time between arrivals of customers at a service desk.

On average, a customer arrives every 10 minutes.

What is the probability that the next customer will arrive within the next 5 minutes?

```python
# This won't work. Don't do it. Don't use it. It's not right. (Kept only to show the mistake.)

# from scipy.stats import expon

# lambda_v = 10/60        # wrong rate: 10/60 is neither the per-minute rate (1/10) nor the per-hour rate (6)

# lambda_inv = expon(scale = 1/lambda_v)   # exponential distribution with mean (scale) 1/lambda_v = 6 minutes

# lambda_inv.cdf(5)   # checks the probability that a customer arrives within five minutes

# When run uncommented, this gave a 56.54% chance of a customer arriving within 5 minutes, which is wrong.
```

0.5654017914929218

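**Big deal, draw attention to this: the 56.54% above is wrong.** The rate has to use the same time unit as the question. One customer every 10 minutes is a rate of 1/10 = 0.1 customers per minute, which is what we used yesterday.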

```python
from scipy.stats import expon

# lambda_v is 0.1 because we are working with minutes, not hours: the average is 1 customer
# per 10 minutes, i.e. a rate of 1/10 = 0.1 per minute. That's why it was 0.1 yesterday.
lambda_v = 0.1

# Exponential distribution of the waiting time; scipy parametrizes it by scale = 1/rate (the mean wait)
lambda_inv = expon(scale = 1/lambda_v)

lambda_inv.cdf(5)   # We're working in minutes and want the chance of a customer coming within 5 minutes.
# that's 39.35%
```

0.3934693402873666

```python
time_between = 10   # average time between arrivals, in minutes

lambda_inv = expon(scale = time_between)

lambda_inv.cdf(5)

# Same result. Use scale = 1/lambda_v if you start from the rate, and
# scale = the average time between events if you start from that.
```

0.3934693402873666
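The exponential CDF has a closed form, P(T ≤ t) = 1 - exp(-rate × t), which makes the unit issue easy to check by hand. A sketch comparing it with `expon`, using the standard library `math` module plus scipy: in minutes the rate is 0.1 per minute; in hours it is 6 per hour with t = 5/60 hours. As long as the rate and t share a unit, both give the same answer.

```python
import math

from scipy.stats import expon

# Working in minutes: rate of 0.1 customers per minute, t = 5 minutes
rate_per_min = 1 / 10
p_minutes = 1 - math.exp(-rate_per_min * 5)

# Working in hours: rate of 6 customers per hour, t = 5/60 hours
rate_per_hour = 60 / 10
p_hours = 1 - math.exp(-rate_per_hour * (5 / 60))

# scipy's version, parametrized by the mean waiting time (scale = 1 / rate)
p_scipy = expon(scale=1 / rate_per_min).cdf(5)

print(round(p_minutes, 4), round(p_hours, 4), round(p_scipy, 4))  # 0.3935 0.3935 0.3935
```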

If there is no customer for 15 minutes, employees can take a 5-minute break.

What is the probability of an employee taking a break?

```python
# This is exactly the same sub-problem as in the lesson, with different wording.

# lambda_inv is the exponential distribution from the previous cell (mean 10 minutes between arrivals).
# cdf(15) is the probability that the next customer arrives within 15 minutes, so
# 1 - cdf(15) is the probability that nobody arrives for 15 minutes, i.e. a break.
1 - lambda_inv.cdf(15)

# There's a 22.31% chance of getting to take a five-minute break between any two customer visits.
```

0.2231301601484298
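`1 - cdf(15)` is again the survival function, `sf(15)`, and for an exponential with mean 10 minutes it equals exp(-15/10) = exp(-1.5). A quick sketch checking both forms. It also illustrates the memoryless property, a standard fact about the exponential distribution that helps explain why the answer holds after any customer visit: having already waited 10 quiet minutes, the chance of 15 more quiet minutes is still exp(-1.5).

```python
import math

from scipy.stats import expon

wait = expon(scale=10)  # minutes until the next customer arrives, mean 10

# Probability that nobody arrives for 15 minutes, two equivalent ways
p_break_sf = wait.sf(15)             # survival function, P(T > 15)
p_break_closed = math.exp(-15 / 10)  # closed form exp(-t / mean)

# Memorylessness: given no customer in the first 10 minutes,
# the chance of 15 more quiet minutes is still P(T > 15)
p_15_more_after_10 = wait.sf(25) / wait.sf(10)

print(round(p_break_sf, 4), round(p_break_closed, 4), round(p_15_more_after_10, 4))
```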

**Challenge 5**

The weights of a certain species of birds follow a normal distribution with a mean weight of 150 grams and a standard deviation of 10 grams. 

- If we randomly select a bird, what is the probability that its weight is between 140 and 160 grams?

```python
from scipy.stats import norm

m = 150    # Sets mean to be applied in normal distribution
std = 10   # Sets standard deviation to be applied in normal distribution

norm_dist = norm(loc = m, scale = std)    # creates a normal distribution model in which the probabilities of different values can be analyzed

# Gets the probability of a bird weighing 160 g or less, then subtracts the probability of a bird
# weighing 140 g or less, giving the probability of a bird between those two weights.
norm_dist.cdf(160) - norm_dist.cdf(140)
```

0.6826894921370859
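Since 140 g and 160 g are exactly one standard deviation either side of the 150 g mean, the same probability can be read off the standard normal after standardizing with z = (x - mean) / std; this is the familiar 68% of the 68-95-99.7 rule. A short sketch:

```python
from scipy.stats import norm

mean, std = 150, 10
low, high = 140, 160

# Standardize the bounds: z = (x - mean) / std
z_low = (low - mean) / std    # -1.0
z_high = (high - mean) / std  # 1.0

# Standard normal (mean 0, std 1) probability between the two z-scores
p_standardized = norm.cdf(z_high) - norm.cdf(z_low)

# Same calculation on the original gram scale
weights = norm(loc=mean, scale=std)
p_original = weights.cdf(high) - weights.cdf(low)

print(round(p_standardized, 4), round(p_original, 4))  # 0.6827 0.6827
```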

**Challenge 6**

If the lifetime (in hours) of a certain electronic component follows an exponential distribution with a mean lifetime of 50 hours, what is the probability that the component fails within the first 30 hours?

```python
from scipy.stats import expon   # already imported above; repeated so this cell runs on its own

mean_lifetime = 50      # average lifetime of the electronic component, in hours

# We use scale = mean_lifetime (not 1/mean_lifetime) because we have the average lifetime, not the failure rate.
lambda_inv = expon(scale = mean_lifetime)

lambda_inv.cdf(30)
# There is a 45.12% chance that the component fails within the first 30 hours.
```

0.4511883639059735
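As a hand check, the failure rate is $\lambda = 1/50$ per hour, so the closed-form exponential CDF gives the same number:

$$
P(T \le 30) = 1 - e^{-30/50} = 1 - e^{-0.6} \approx 1 - 0.5488 = 0.4512
$$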