# <p style="text-align: center;">Statistics for data analysis</p>

In [1]:
# importing libraries

import numpy as np
import statistics

from scipy.stats import norm
from scipy.stats import binom
from scipy.stats import poisson
from scipy.stats import expon

### <p style="text-align: center;">1: Measures of Spread</p>

A teacher wants to assess the performance of a class in a mathematics quiz. The class consists of 10 students, and their quiz scores are as follows:
<br /><br />
Class: [40, 30, 20, 60, 70, 60, 80, 50, 60, 60]

In [2]:
quizz_score = [40,30, 20, 60, 70, 60, 80, 50, 60, 60]

1) Calculate the variance of the quiz scores for the class. Show your calculations step by step.
<br /><br />
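The cell below uses `statistics.variance`, which computes the sample variance: the sum of squared deviations from the sample mean $\bar{x}$, divided by $n - 1$:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

If the 10 scores are treated as the whole population rather than a sample, the sum is divided by $n$ instead (`statistics.pvariance`).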

In [3]:
print('Variance: {:.2f}'.format(statistics.variance(quizz_score)))

Variance: 334.44
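
The step-by-step calculation the question asks for can be sketched in plain Python. This is a minimal sketch, not part of the original notebook: the names `scores`, `mean`, `squared_devs`, and `sum_sq` are illustrative. It shows both the sample variance (dividing by n - 1) and the population variance (dividing by n).

```python
# Step-by-step variance of the quiz scores (illustrative names, not from the notebook).
scores = [40, 30, 20, 60, 70, 60, 80, 50, 60, 60]

n = len(scores)                                   # number of observations
mean = sum(scores) / n                            # step 1: mean of the scores
squared_devs = [(x - mean) ** 2 for x in scores]  # step 2: squared deviations from the mean
sum_sq = sum(squared_devs)                        # step 3: sum of the squared deviations

sample_variance = sum_sq / (n - 1)    # divides by n - 1, like statistics.variance
population_variance = sum_sq / n      # divides by n, like statistics.pvariance

print('mean = {:.1f}'.format(mean))
print('sum of squared deviations = {:.1f}'.format(sum_sq))
print('sample variance = {:.2f}'.format(sample_variance))
print('population variance = {:.2f}'.format(population_variance))
```

The sample variance reproduces the value printed above; the population variance is smaller because the same sum is divided by 10 instead of 9.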


2) Calculate the standard deviation of the quiz scores.
<br /><br />
The standard deviation is the square root of the variance. `statistics.stdev` returns the sample standard deviation, so it divides by $n - 1$:

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

In [4]:
#2
print('Standard Deviation: {:.2f}'.format(statistics.stdev(quizz_score)))

Standard Deviation: 18.29
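
To make the relationship between variance and standard deviation explicit, the square root can be taken by hand. This is a minimal sketch using only the standard library; the variable names are illustrative, and the population value is shown only for comparison.

```python
import math
import statistics

scores = [40, 30, 20, 60, 70, 60, 80, 50, 60, 60]

# Square root of the sample variance (n - 1 in the denominator).
sample_std = math.sqrt(statistics.variance(scores))

# Square root of the population variance (n in the denominator).
population_std = math.sqrt(statistics.pvariance(scores))

print('sample standard deviation = {:.2f}'.format(sample_std))
print('population standard deviation = {:.2f}'.format(population_std))
```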


### <p style="text-align: center;">2: Normal Distribution</p>

Consider a continuous random variable X that follows a normal distribution with a mean (μ) of 4 and a standard deviation (σ) of 2.
<br /><br />
1) Write a Python program to calculate the Probability Density Function (PDF) of X at x = 3. Your program should use the provided mean and standard deviation values.
<br /><br />
PDF can be calculated using the following formula:

$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

In [5]:
# loc is the mean (mu = 4) and scale is the standard deviation (sigma = 2)
PDFnormal = norm.pdf(3, loc=4, scale=2)
print('f(3) = {:.3f}'.format(PDFnormal))

f(3) = 0.176
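
The PDF formula can also be evaluated directly, which is a useful cross-check on the library call. The helper `normal_pdf` below is a hypothetical name introduced for illustration; it is a direct transcription of the formula, not a replacement for `norm.pdf`.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Evaluate the density at x = 3 for mu = 4, sigma = 2.
print('f(3) = {:.3f}'.format(normal_pdf(3, 4, 2)))
```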


### <p style="text-align: center;">3: Binomial Distribution</p>

1) Sixty-five percent of people pass the state driver’s exam on the first try. A group of 50 individuals who have taken the driver’s exam is randomly selected. Give two reasons why this is a binomial problem.
<br/><br/>
* reason 1: Each trial has only two possible outcomes: pass (success) or fail (failure).
* reason 2: The trials are independent of one another, and the probability of success (0.65) is the same on every trial.

2) Suppose you play a game that you can only either win or lose. The probability that you win any game is 55%, and the probability that you lose is 45%. Each game you play is independent.

* If you play the game 20 times, write the function that describes the probability that you win 15 of the 20 times.

The PMF can be calculated using the following formula:

$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$
<br/><br/>
For this problem, with $n = 20$, $k = 15$ and $p = 0.55$, we have:

$P(X = 15) = \binom{20}{15} 0.55^{15} (1 - 0.55)^{20 - 15}$

In [6]:
PMFbinom = binom.pmf(15,20,0.55)
print('P(X=15) = {:.3f}'.format(PMFbinom))

P(X=15) = 0.036
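
The PMF formula above can be checked without SciPy. The sketch below assumes Python 3.8 or later for `math.comb`; the helper name `binomial_pmf` is illustrative and not part of the original notebook.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of winning exactly 15 of 20 games when p = 0.55.
print('P(X=15) = {:.3f}'.format(binomial_pmf(15, 20, 0.55)))
```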


* Find the mean number of wins.
<br/><br/>
Mean of a Binomial Distribution:

$\mu = n \cdot p$,

$\mu = 20 \cdot 0.55$

In [7]:
MeanBinom = binom.mean(20,0.55)
print('mean = {:.3f}'.format(MeanBinom))

mean = 11.000


* Find the standard deviation of wins.
<br/><br/>
Standard deviation of a Binomial Distribution:

$\sigma = \sqrt{n \cdot p \cdot (1 - p)}$,

$\sigma = \sqrt{20 \cdot 0.55 \cdot (1 - 0.55)}$

In [8]:
#Find the standard deviation of wins.

STDBinom = binom.std(20,0.55)
print('STD = {:.3f}'.format(STDBinom))

STD = 2.225
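
Both closed-form results can be reproduced directly from the formulas for the mean and the standard deviation. This is a minimal sketch with illustrative variable names.

```python
import math

n = 20      # number of games played
p = 0.55    # probability of winning a single game

mean_wins = n * p                       # mu = n * p
std_wins = math.sqrt(n * p * (1 - p))   # sigma = sqrt(n * p * (1 - p))

print('mean = {:.3f}'.format(mean_wins))
print('STD = {:.3f}'.format(std_wins))
```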


### <p style="text-align: center;">4: Poisson Distribution</p>

You notice that a news reporter says "uh," on average, two times per broadcast. What is the probability that the news reporter says "uh" more than two times per broadcast? This is a Poisson problem because you are interested in knowing the number of times the news reporter says "uh" during a broadcast.

1) What is the interval of interest?

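One broadcast. A Poisson variable counts events over a fixed interval, and here the events are counted per broadcast. The event whose probability is asked for, $X > 2$, is written out in part 4.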

2) What is the average number of times the news reporter says "uh" during one broadcast?
<br/><br/>
The reporter says "uh", on average, 2 times per broadcast, so $\lambda = 2$.

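3) What does X represent? Write the correct notation for the Poisson distribution.
<br/><br/>
X represents the number of times the news reporter says "uh" during one broadcast. It follows a Poisson distribution with mean $\lambda = 2$:

$X \sim \text{Poisson}(\lambda = 2)$, with $P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$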


4) Write a mathematical statement for the probability question

$P(X > 2) = 1 - P(X \le 2) = 1 - \sum_{i=0}^{2} \frac{e^{-2} \, 2^i}{i!}$


5) Find the probability that the news reporter says "uh" more than two times per broadcast.

In [9]:
#5: P(X > 2) = 1 - P(X <= 2), with lambda = 2
CDFpoisson = 1 - poisson.cdf(2,2)
print('P(X>2) = {:.3f}'.format(CDFpoisson))

P(X>2) = 0.323
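
The mathematical statement from part 4 can be evaluated term by term. The helper `poisson_pmf` below is a hypothetical name introduced for illustration; it transcribes the Poisson PMF and sums the terms for 0, 1 and 2 occurrences.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 2  # average number of "uh" per broadcast

# P(X <= 2) is the sum of the PMF at k = 0, 1, 2.
p_at_most_two = sum(poisson_pmf(k, lam) for k in range(3))
p_more_than_two = 1 - p_at_most_two

print('P(X>2) = {:.3f}'.format(p_more_than_two))
```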


### <p style="text-align: center;">5: Exponential Distribution</p>

Suppose that an average of 30 customers per hour arrive at a store, and the time between arrivals is exponentially distributed.

1) On average, how many minutes elapse between two successive arrivals?

One hour has 60 minutes.
If we expect 30 customers in 1 hour, then we can expect 60 min / 30 customers = 2 min/customer.

Answer: 2 minutes/customer

2) When the store first opens, how long on average does it take for three customers to arrive?
<br/><br/>
If 1 customer arrives every 2 minutes on average, it will take 3 × 2 = 6 minutes on average for 3 customers to arrive.

Answer: 6 minutes
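
The 6-minute answer can be sanity-checked with a small simulation. This is a sketch under stated assumptions: inter-arrival times are drawn from an exponential distribution with rate 0.5 per minute, the seed and the number of trials are arbitrary choices, and the function name `time_until_third_arrival` is illustrative.

```python
import random

random.seed(0)            # arbitrary seed, only for reproducibility
rate_per_minute = 30 / 60  # 30 customers per hour = 0.5 customers per minute
n_trials = 100_000         # arbitrary number of simulated store openings

def time_until_third_arrival(rate):
    """Sum of three independent exponential inter-arrival times (in minutes)."""
    return sum(random.expovariate(rate) for _ in range(3))

total = sum(time_until_third_arrival(rate_per_minute) for _ in range(n_trials))
average_wait = total / n_trials

print('simulated mean wait for 3 customers = {:.2f} minutes'.format(average_wait))
```

The simulated mean should land close to 6 minutes, since the expected value of a sum is the sum of the expected values.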

3) After a customer arrives, find the probability that it takes less than one minute for the next customer to arrive.
<br/><br/>
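Formula for the CDF of an exponential distribution, where the rate is $\lambda = 0.5$ customers per minute (the mean time between arrivals, $1/\lambda = 2$ minutes, is what SciPy takes as `scale`):

$F(x;\lambda) = 1 - e^{-\lambda x}$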


In [10]:
# 3
CDFexp = expon.cdf(x=1,scale=2)
print('F(1) = {:.3f}'.format(CDFexp))

F(1) = 0.393
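
The link between the rate $\lambda$ in the formula and the `scale` argument used above can be made explicit by evaluating the CDF by hand. The helper `exponential_cdf` is a hypothetical name introduced for illustration.

```python
import math

rate = 30 / 60     # lambda: 0.5 customers per minute
scale = 1 / rate   # mean time between arrivals: 2 minutes (SciPy's scale)

def exponential_cdf(x, rate):
    """P(X <= x) for an exponential distribution with the given rate."""
    return 1 - math.exp(-rate * x)

print('scale = {:.1f} minutes'.format(scale))
print('F(1) = {:.3f}'.format(exponential_cdf(1, rate)))
```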


4) After a customer arrives, find the probability that it takes more than five minutes for the next customer to arrive.

$P(X > 5) = 1 - F(5;\lambda) = e^{-5\lambda}$

In [11]:
#4: P(X > 5) is the complement of the CDF at x = 5
SFexp = 1 - expon.cdf(x=5,scale=2)
print('P(X>5) = {:.3f}'.format(SFexp))

P(X>5) = 0.082


5) Seventy percent of the customers arrive within how many minutes of the previous customer?
<br/><br/>
Considering the formula:

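$P = 1 - e^{-\lambda t}$, for $P = 0.7$
<br/><br/>
we can isolate t:

$t = -\frac{\ln(1 - P)}{\lambda} = -\frac{\ln(0.3)}{0.5} \approx 2.408$

This inverse of the CDF is the percent point function, which `expon.ppf` computes.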

In [12]:
#5
PPFexp = expon.ppf(0.7,scale=2)
print('t = {:.3f} minutes'.format(PPFexp))

t = 2.408 minutes
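
Isolating t from the CDF gives a closed form that can be evaluated without SciPy. The helper `exponential_quantile` below is a hypothetical name introduced for illustration.

```python
import math

def exponential_quantile(p, rate):
    """Time t such that P(X <= t) = p, from solving p = 1 - exp(-rate * t) for t."""
    return -math.log(1 - p) / rate

# 70% of customers arrive within t minutes of the previous one (rate = 0.5 per minute).
t = exponential_quantile(0.7, 0.5)
print('t = {:.3f} minutes'.format(t))
```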


In [13]:
# We can verify the result above by plugging it back into the CDF. The result should be 0.7
proof = expon.cdf(2.408, scale=2)
print('P(X<=2.408) = {:.3f}'.format(proof))

P(X<=2.408) = 0.700


6) Is an exponential distribution reasonable for this situation?
<br/><br/>
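Yes. The exponential distribution is a continuous distribution used to model the time elapsed between events that occur independently at a constant average rate. That is the situation here: we want to model the time between customer arrivals. The assumption is reasonable as long as customers arrive independently of one another and the arrival rate stays roughly constant over the period considered.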