# Introduction

These are some exercises and quizzes from the __"Become a Probability & Statistics Master"__ (https://www.udemy.com/course/statistics-probability/) course on Udemy.

The main topics of this notebook are:

- Data distributions
- Probability
- Discrete random variables

---

In [6]:
# Importing useful libraries

import pandas as pd
import math
from math import sqrt
import numpy as np
import statistics as st
import scipy
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

# 03 Data distributions
Link: https://drive.google.com/drive/folders/1Vu56xgUToj9XqvnMqDeNCxWJmDFGSgum

---
### 01 Mean, variance, and standard deviation

__1.Question:__ Find the mean and population standard deviation for the data set.

In [18]:
data = [6,3,3,2,2]

mean = np.mean(data)
std = st.pstdev(data) #pstdev for population standard deviation

print("The mean is",(mean))
print("The standard deviation is",round(std,4))

The mean is 3.2
The standard deviation is 1.4697


__2.Question:__ Find the mean and sample standard deviation for the data set.

In [19]:
data = [2,4,7,9,10]

mean = np.mean(data)
std = st.stdev(data) #stdev for sample standard deviation

print("The mean is",(mean))
print("The standard deviation is",round(std,4))

The mean is 6.4
The standard deviation is 3.3615
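
The two standard deviations used above differ only in the divisor: the population version divides the sum of squared deviations by n, the sample version by n - 1. A minimal sketch of both formulas in plain Python follows; the helper names `population_std` and `sample_std` are illustrative and not part of the course material.

```python
from math import sqrt

def population_std(data):
    # divide the sum of squared deviations by n
    mean = sum(data) / len(data)
    return sqrt(sum((x - mean) ** 2 for x in data) / len(data))

def sample_std(data):
    # divide by n - 1 instead (Bessel's correction)
    mean = sum(data) / len(data)
    return sqrt(sum((x - mean) ** 2 for x in data) / (len(data) - 1))

print(round(population_std([6, 3, 3, 2, 2]), 4))  # data set from question 1
print(round(sample_std([2, 4, 7, 9, 10]), 4))     # data set from question 2
```

The printed values should agree with the `st.pstdev` and `st.stdev` results above.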


__3.Question:__ Consider the small population: 1, 2, 1. If each number is increased by 4, how will the population standard deviation change?

In [3]:
data = [1,2,1]
data_increased = np.array(data) + 4 #a NumPy array makes it easy to add a number to every element of the list

std = st.pstdev(data) #pstdev for population standard deviation
std_increased = np.std(data_increased) #np.std defaults to ddof=0, i.e. the population standard deviation

print("The standard deviation of data is",round(std,4))
print("The standard deviation of data_increased is",round(std_increased,4))
print("Both population standard deviations are the same.")

The standard deviation of data is 0.4714
The standard deviation of data_increased is 0.4714
Both population standard deviations are the same.
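
Adding a constant moves every value and the mean by the same amount, so each deviation from the mean is unchanged. The sketch below checks this for a few shifts; the list of shifts is chosen only for illustration.

```python
import statistics

data = [1, 2, 1]
base = statistics.pstdev(data)

for shift in [4, -10, 100]:
    shifted = [x + shift for x in data]
    # deviations from the mean are unchanged, so the spread is too
    same = abs(statistics.pstdev(shifted) - base) < 1e-9
    print(shift, same)
```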


---
### 04 Normal distributions and z-scores

__1.Question:__ A third grade class has a mean height of 50′′ with a standard deviation of 3′′. What is the approximate percentile of a third grader who is 53′′ tall?

In [5]:
mean = 50
std = 3
x =53

In [6]:
z = round((x - mean)/std,4)
z

1.0

In [7]:
# Using norm cdf function to find the area under the curve, having z-score = 1 
p = stats.norm.cdf(z)

print("The approximate percentile of a third grader who is 53′′ tall is",round(p,4))

The approximate percentile of a third grader who is 53′′ tall is 0.8413
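
The z-score step can also be folded into the distribution itself. As a rough sketch, the standard library's `statistics.NormalDist` (Python 3.8+) takes the mean and standard deviation directly; the variable names here are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=50, sigma=3)  # class heights in inches

# cdf(53) is the share of students at or below 53 inches
p = heights.cdf(53)
print(round(p, 4))     # proportion below 53 inches
print(round(p * 100))  # the same value expressed as a percentile
```

A proportion of about 0.84 means the student is at roughly the 84th percentile.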


# 04 Probability
Link: https://drive.google.com/drive/folders/1Vu56xgUToj9XqvnMqDeNCxWJmDFGSgum

---
### 03 Independent and dependent events and conditional probability

__1.Question:__ Events A and B are independent events. Find P(B) if P(A and B) = 0.25 and P(A) = 0.5.

In [10]:
# Since the events are independent, we know that
# P(A and B) = P(A) ⋅ P(B), so P(B) = P(A and B) / P(A)

pa = 0.5
pab = 0.25
pb = pab/pa

print("The probability of B is",round(pb,4))

The probability of B is 0.5


__2.Question:__ Suppose that Katie rolls a six-sided die twice. Event A is that the first roll is a 6, so P(A) is the probability that the first roll is a 6. Event B is that the second roll is a 6, so P(B) is the probability that the second roll is a 6. Find P(A and B), the probability that both rolls are a 6.

In [11]:
# Since the events are independent, we know that
# P(A and B) = P(A) ⋅ P(B)

pa = 1/6
pb = 1/6
pab = pa*pb

print("The probability of rolling a 6 on both of the two rolls is",round(pab,4))

The probability of rolling a 6 on both of the two rolls is 0.0278
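
The product rule can be checked by listing all 36 equally likely outcomes of two rolls and counting. This is only a sketch; the names `rolls`, `p_a`, `p_b`, and `p_ab` are illustrative.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all 36 (first, second) outcomes

p_a = Fraction(sum(1 for first, _ in rolls if first == 6), len(rolls))
p_b = Fraction(sum(1 for _, second in rolls if second == 6), len(rolls))
p_ab = Fraction(sum(1 for r in rolls if r == (6, 6)), len(rolls))

print(p_a, p_b, p_ab)
print(p_ab == p_a * p_b)  # independence: P(A and B) = P(A) * P(B)
```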


# 05 Discrete random variables
Link: https://drive.google.com/drive/folders/1Vu56xgUToj9XqvnMqDeNCxWJmDFGSgum

---
### 01 Discrete probability

__1.Question:__ You purchase a raffle ticket for 125. In exchange, you’ll be allowed to participate in two drawings. In each drawing, you blindly pick
one of three tokens. One token is worth 0, one is worth 50, and one is worth 100. Let Y be the profit made by a raffle ticket. Find the expected
value for Y after the two drawings.

__Explanation__

Each drawing pays 0, 50, or 100, so the total winnings from the two drawings are 0, 50, 100, 150, or 200. There are 9 equally likely combinations, from (0,0) to (100,100), so the probability of winning each amount is

| Winnings | 0 | 50 | 100 | 150 | 200 |
|-------|------|-----|-----|-----|-----|
| Probability | 1/9 | 2/9 | 3/9 | 2/9 | 1/9 |

But the ticket costs 125, so we subtract the cost from each possible amount won to get the distribution of the profit Y.

| Y | -125 | -75 | -25 | 25 | 75 |
|-------|------|-----|-----|-----|-----|
| P(Y) | 1/9 | 2/9 | 3/9 | 2/9 | 1/9 |

In [13]:
def expected_value(values, probabilities):
    return sum([v * p for v, p in zip(values, probabilities)])

y = [-125,-75,-25,25,75]
py = [1/9,2/9,3/9,2/9,1/9]

exp = round(expected_value(y, py))

print("the expected value for Y is",exp)

the expected value for Y is -25
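
The probability table in the explanation above can be reproduced by enumerating the 9 possible pairs of draws. The sketch assumes the two drawings are independent and each token is equally likely in each drawing, as the explanation implies; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

tokens = [0, 50, 100]
cost = 125

# profit for each of the 9 equally likely (first, second) draws
profits = [a + b - cost for a, b in product(tokens, repeat=2)]
counts = Counter(profits)

distribution = {y: Fraction(c, len(profits)) for y, c in sorted(counts.items())}
expected = sum(y * p for y, p in distribution.items())

for y, p in distribution.items():
    print(y, p)
print("expected value:", expected)
```

Using `Fraction` keeps the probabilities exact, so 3/9 is shown in lowest terms as 1/3.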


__2.Question:__ The following table shows the 2017 AP Statistics Exam score distribution for all students taking the test in the United States. Let Z
represent the exam score. Find μZ and σZ.

| Score | 1   |  2  |  3  |  4  |  5  |
|-------|------|-----|-----|-----|-----|
|   Probability  | 0.136 | 0.159 | 0.248 | 0.202 | 0.255 |

__Explanation__

Z is a discrete random variable with sample space {1, 2, 3, 4, 5}. The percentage of students taking the exam who received each of those scores is given in the table.

We’ll find the mean of this discrete random variable as:

In [9]:
def expected_value(values, probabilities):
    return sum([v * p for v, p in zip(values, probabilities)])

z_values = [1,2,3,4,5]
pz = [0.136,0.159,0.248,0.202,0.255]

exp = round(expected_value(z_values, pz),4)

print("the expected value for Z is",exp)

the expected value for Z is 3.281


In [28]:
import scipy
from scipy import stats
values = [1,2,3,4,5]
probabilities = [0.136,0.159,0.248,0.202,0.255]

# named z_std rather than st, so the `statistics as st` import alias is not overwritten
z_std = round(stats.rv_discrete(values=(values, probabilities)).std(),3)

In [29]:
print("the standard deviation for Z is",z_std)

the standard deviation for Z is 1.359
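
The same standard deviation can be worked out from the definition, as the square root of the probability-weighted squared distances from the mean. A minimal sketch in plain Python, with illustrative variable names:

```python
from math import sqrt

values = [1, 2, 3, 4, 5]
probabilities = [0.136, 0.159, 0.248, 0.202, 0.255]

mean = sum(v * p for v, p in zip(values, probabilities))
# variance: probability-weighted squared distance from the mean
variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probabilities))
std = sqrt(variance)

print(round(mean, 3), round(std, 3))
```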


---
### 03 Combinations of random variables

__1.Question:__ Sandwiches can be purchased in the school cafeteria. Bread is baked each day and the sandwich is topped with meat and cheese, then
sold for $3.50.

The weight of the bread used for each sandwich is normally distributed with mean of 2.3 ounces and standard deviation of 0.4 ounces. The weight of the meat and cheese used for each sandwich is normally distributed with mean of 2.5 ounces and standard deviation of 0.6 ounces. Suppose you purchase a sandwich at random from the school cafeteria. What is the probability that the overall weight of the sandwich exceeds 6 ounces?
Assume the two variables are independent.

In [2]:
## data:

# bread weight, mean and standard deviation
b_mean = 2.3
b_stdv = 0.4

# meat and cheese weight, mean and standard deviation
mc_mean = 2.5
mc_stdv = 0.6

First, we need to find the mean and the standard deviation of the sandwich weight, which is the sum of the weight of the bread and the weight of the meat and cheese.

In [16]:
# mean: the mean of a sum is the sum of the means
s_mean = b_mean + mc_mean

# standard deviation: square the two standard deviations to get the variances, add them together
# (valid because the two weights are independent), then take the square root of the sum

s_stdv = round(sqrt(b_stdv**2 + mc_stdv**2),4)

print(s_mean)
print(s_stdv)

4.8
0.7211


The task asks for the probability that a randomly picked sandwich weighs more than 6 ounces, so at this point we calculate a z-score for x = 6.

In [20]:
x = 6
z = round((x - s_mean)/s_stdv,2)

print(z)

1.66


In [22]:
# Using norm cdf function to find the area under the curve
p = stats.norm.cdf(z)

# Since we want the probability of more than 6 ounces, we need the remaining area under the curve:

p_minus = 1 - p

print("The probability that a randomly picked sandwich weighs more than 6 ounces is",round(p_minus,4))

The probability that a randomly picked sandwich weighs more than 6 ounces is 0.0485
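
As a cross-check, the standard library's `statistics.NormalDist` (Python 3.8+) can add two independent normal variables directly. This is a sketch with illustrative names, not the course's method.

```python
from statistics import NormalDist

bread = NormalDist(mu=2.3, sigma=0.4)
meat_cheese = NormalDist(mu=2.5, sigma=0.6)

# adding two NormalDist objects treats them as independent:
# means add, and variances (not standard deviations) add
sandwich = bread + meat_cheese

p_over_6 = 1 - sandwich.cdf(6)
print(round(sandwich.mean, 4), round(sandwich.stdev, 4))
print(round(p_over_6, 4))
```

Because the z-score is not rounded to two decimals here, the result is close to the value above but can differ slightly in the later decimal places.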


---
### 04 Permutations and combinations

__1.Question:__ Out of 30 students in a math class, how many study groups of 5 students can be formed from the class members?

__Solution__: This is a combination question where n = 30 and k = 5. The order in which we choose the 5 study group members doesn’t matter in this situation.

In [46]:
# source: https://www.geeksforgeeks.org/permutation-and-combination-in-python/?ref=lbp

# Count all combinations of a given length
from itertools import combinations

arr = list(range(1, 31))  # students numbered 1 to 30
k = 5

def rSubset(arr, k):
    # return list of all subsets of length k
    # to deal with duplicate subsets use
    # set(combinations(arr, k))
    return list(combinations(arr, k))

print("The number of groups of 5 students is: ",len(rSubset(arr, k)))

The number of groups of 5 students is:  142506
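
Listing every group works but builds all 142,506 tuples in memory. If only the count is needed, `math.comb` (Python 3.8+) returns it directly; the closed form is included for comparison. A small sketch:

```python
import math

n, k = 30, 5

# closed form: n! / (k! * (n - k)!)
by_formula = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(math.comb(n, k), by_formula)
```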


__2.Question:__ Four children are sledding in a toboggan. How many ways can the children arrange themselves on the toboggan?

__Solution__: This is a permutation question. We have 4 people we’re arranging and we’ll arrange those 4 people as many different ways as we can.

In [47]:
# source: https://www.geeksforgeeks.org/permutation-and-combination-in-python/?ref=lbp

# Count all permutations
# of a given length
from itertools import permutations

# Get all orderings of length 4
# of the 4 children
arr = [1, 2, 3, 4]
k = 4

def rSubset(arr, k):
    # return list of all orderings of length k
    return list(permutations(arr, k))
  
# Driver Function

print("The number of ways we can arrange the 4 people is: ",len(rSubset(arr, k)))

The number of ways we can arrange the 4 people is:  24
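
The count of full arrangements of n distinct items is n!, so the list of permutations does not have to be built just to count them. A short sketch using the standard library:

```python
import math

n = 4

by_factorial = math.factorial(n)  # n! orderings of n distinct children
by_perm = math.perm(n, n)         # same count via math.perm (Python 3.8+)

print(by_factorial, by_perm)
```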


__3.Question:__ Sawyer is taking a 5-question biology test, and the test only requires him to answer 3 out of the 5 questions. He gets to choose which 3
he answers. How many different ways could he choose exactly 3 of the 5 questions?

__Solution__: To figure out how many different ways Sawyer could answer exactly 3 of the 5 questions, we need the formula for combinations. 
We have 5 questions and want to know how many ways we can pick 3 of the 5 questions. The order won’t matter, which is why we need the combination, and not the permutation.

In [48]:
# source: https://www.geeksforgeeks.org/permutation-and-combination-in-python/?ref=lbp

# Count all combinations of a given length
from itertools import combinations

arr = [1,2,3,4,5]
k = 3

def rSubset(arr, k):
    # return list of all subsets of length k
    # to deal with duplicate subsets use
    # set(combinations(arr, k))
    return list(combinations(arr, k))

print("The number of ways to choose 3 of the 5 questions is: ",len(rSubset(arr, k)))

The number of ways to choose 3 of the 5 questions is:  10


---
### 05 Binomial random variables

__1.Question:__ Let X be a binomial random variable with n = 15 and p = 0.45. Find P(X = 9).

In [51]:
# scipy.stats.binom.pmf() function is used to obtain the probability mass function for a certain value of r, n and p. 
# We can obtain the distribution by passing all possible values of r(0 to n).
# https://www.geeksforgeeks.org/python-binomial-distribution/#:~:text=Binomial%20distribution%20is%20a%20probability,a%20number%20of%20Bernoulli%20trials.

r = 9 #(number of success)
n = 15
p =0.45

pr = scipy.stats.binom.pmf(r, n, p)

print("The probability of exactly 9 successes: ",round(pr,4))

The probability of exactly 9 successes:  0.1048
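
What `binom.pmf` computes can be written out from the binomial formula, P(X = k) = C(n, k) p^k (1 - p)^(n - k). A minimal sketch; the helper name `binomial_pmf` is illustrative and is not a SciPy function.

```python
import math

def binomial_pmf(k, n, p):
    # C(n, k) ways to place k successes among n trials,
    # each arrangement having probability p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

print(round(binomial_pmf(9, 15, 0.45), 4))
```

Summing `binomial_pmf(k, n, p)` over k from 0 to n should give 1, which is a quick sanity check on the formula.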


__2.Question:__ Suppose 35 % of our nation's high school seniors will be taking at least one AP Exam this year. We select 80 students at random from our
nation. What is the probability that exactly 30 will be taking at least one exam?

In [52]:
# scipy.stats.binom.pmf() function is used to obtain the probability mass function for a certain value of r, n and p. 
# We can obtain the distribution by passing all possible values of r(0 to n).
# https://www.geeksforgeeks.org/python-binomial-distribution/#:~:text=Binomial%20distribution%20is%20a%20probability,a%20number%20of%20Bernoulli%20trials.

r = 30 #(number of success)
n = 80
p =0.35

pr = scipy.stats.binom.pmf(r, n, p)

print("The probability of exactly 30 successes: ",round(pr,4))

The probability of exactly 30 successes:  0.0824
