## Probability Distributions
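The probability distribution of a random variable `X` describes the values that `X` can take and the probability associated with each of them. For a discrete random variable it is given by a probability mass function (PMF), `P(X = x)`. For a continuous random variable it is given by a probability density function (PDF), whose integral over an interval is the probability that `X` falls in that interval.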
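
As a minimal illustration (the fair die is an example chosen here, not taken from the source), a discrete distribution can be written as a mapping from each possible value to its probability:

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face 1..6 has probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# A valid distribution has non-negative probabilities that sum to 1
assert all(p >= 0 for p in pmf.values())
print(sum(pmf.values()))  # → 1
```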


### Cumulative Distribution Function
The cumulative distribution function (CDF) of a random variable `X`, written `F(x)`, gives for every value `x` the probability that `X` takes a value less than or equal to `x`: `F(x) = P(X <= x)`. It is non-decreasing and runs from 0 to 1.
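
For a discrete random variable the CDF is the running total of the PMF. A short sketch, again using a fair die as an illustrative example:

```python
from fractions import Fraction
from itertools import accumulate

# Fair six-sided die (illustrative example): values and their probabilities
values = list(range(1, 7))
probs = [Fraction(1, 6)] * 6

# F(x) = P(X <= x) is the cumulative sum of the probabilities up to x
cdf = dict(zip(values, accumulate(probs)))

print(cdf[2])  # → 1/3
print(cdf[6])  # → 1
```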

### The Expected Value of X
![image-2.png](attachment:image-2.png)
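
For a discrete random variable, the expected value is the probability-weighted sum of its values, `E[X] = Σ x · P(X = x)`. A minimal sketch with a fair die (an illustrative example, not from the source):

```python
import numpy as np

# Illustrative discrete distribution: a fair six-sided die
x = np.arange(1, 7)        # possible values
p = np.full(6, 1 / 6)      # P(X = x) for each value

# E[X] = sum over x of x * P(X = x)
expected = np.sum(x * p)
print(expected)  # approximately 3.5
```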

### The Variance and Standard Deviation
![image-4.png](attachment:image-4.png)

![image-5.png](attachment:image-5.png)
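
The variance is the expected squared deviation from the mean, and the standard deviation is its square root. The sketch below (same illustrative fair die) computes it both from the definition and from the shortcut `Var(X) = E[X²] - (E[X])²`:

```python
import numpy as np

# Illustrative discrete distribution: a fair six-sided die
x = np.arange(1, 7)
p = np.full(6, 1 / 6)

mu = np.sum(x * p)                    # E[X]
variance = np.sum((x - mu) ** 2 * p)  # Var(X) = E[(X - mu)^2]
std_dev = np.sqrt(variance)           # standard deviation

# Equivalent shortcut: Var(X) = E[X^2] - (E[X])^2
shortcut = np.sum(x**2 * p) - mu**2

print(variance, std_dev)  # roughly 2.9167 and 1.7078
```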

### Covariance
![image-6.png](attachment:image-6.png)
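
Covariance averages the product of each variable's deviation from its mean. The sketch below uses made-up paired data and the sample covariance (dividing by n - 1), which is what `np.cov` computes by default; the population version divides by n instead.

```python
import numpy as np

# Illustrative paired observations (made-up data)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

n = len(x)
# Sample covariance: sum of products of deviations from the means, divided by n - 1
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is Cov(x, y)
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)  # both approximately 4.0
```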

### Correlation Coefficient
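The sign of the covariance tells whether two variables tend to move together (positive) or in opposite directions (negative), but its magnitude depends on the units of the variables, so it does not tell how strong the relationship is. The correlation coefficient fixes this by dividing the covariance by the product of the two standard deviations, giving a unitless value between -1 and 1 that measures how strongly the variables are linearly related.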

![image-7.png](attachment:image-7.png)
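
The sketch below (made-up data) computes the correlation coefficient as covariance divided by the product of standard deviations, and shows the point made above: rescaling one variable changes the covariance but not the correlation.

```python
import numpy as np

# Illustrative paired observations (made-up data)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

def correlation(a, b):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    cov = np.cov(a, b)[0, 1]                      # sample covariance (ddof=1)
    return cov / (a.std(ddof=1) * b.std(ddof=1))  # std devs with the same ddof

r = correlation(x, y)

# Rescaling x (e.g. a change of units) multiplies the covariance but leaves r unchanged
x_scaled = 100 * x
print(np.cov(x, y)[0, 1], np.cov(x_scaled, y)[0, 1])  # covariance grows 100x
print(r, correlation(x_scaled, y))                     # correlation stays the same
```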

### Some Special Distributions

- **Discrete**
  - Binomial
  - Poisson
  - Hypergeometric
- **Continuous**
  - Uniform
  - Exponential
  - Normal
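
Each of these distributions has a corresponding object in `scipy.stats`, which the cells below use. As a small sketch, discrete distributions are instances of `rv_discrete` and expose a probability mass function (`.pmf`), while continuous ones are instances of `rv_continuous` and expose a density (`.pdf`):

```python
from scipy import stats

# scipy.stats objects for the distributions listed above
discrete = {"Binomial": stats.binom, "Poisson": stats.poisson, "Hypergeometric": stats.hypergeom}
continuous = {"Uniform": stats.uniform, "Exponential": stats.expon, "Normal": stats.norm}

# Discrete distributions expose a probability mass function (.pmf)
for name, dist in discrete.items():
    print(name, isinstance(dist, stats.rv_discrete), hasattr(dist, "pmf"))

# Continuous distributions expose a probability density function (.pdf)
for name, dist in continuous.items():
    print(name, isinstance(dist, stats.rv_continuous), hasattr(dist, "pdf"))
```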

```python
import scipy
import numpy as np

from scipy.stats import binom
```

### Binomial Distribution

![Binomial distribution](http://www.stat.yale.edu/Courses/1997-98/101/binpdf.gif)

**Q** : A survey found that 65% of all financial consumers were very satisfied with their primary financial institutions. Suppose that 25 financial consumers are sampled. If the survey result still holds true today, what is the probability that exactly 19 are very satisfied with their primary financial institution?

```python
# X ~ Binomial(n = 25, p = 0.65); P(X = 19)
binom.pmf(k=19, n=25, p=0.65)
```

0.09077799859322791
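
The same value follows from the binomial formula `P(X = k) = C(n, k) p^k (1 - p)^(n - k)`. A quick check with `math.comb`:

```python
from math import comb
from scipy.stats import binom

n, p, k = 25, 0.65, 19

# Binomial PMF: P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)
manual = comb(n, k) * p**k * (1 - p) ** (n - k)

print(manual, binom.pmf(k, n, p))  # both approximately 0.0908
```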

**Q** : According to the U.S. Census Bureau, approximately 6% of all workers in Jackson are unemployed. In conducting a random telephone survey in Jackson, what is the probability of getting two or fewer unemployed workers in a sample of 20?

```python
# P(X <= 2) for X ~ Binomial(n = 20, p = 0.06)
binom.cdf(2, 20, 0.06)
```

0.8850275957378545
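
"Two or fewer" means `P(X = 0) + P(X = 1) + P(X = 2)`, which is exactly what the CDF accumulates. A short check summing the individual PMF terms:

```python
import numpy as np
from scipy.stats import binom

n, p = 20, 0.06

# P(X <= 2) = P(X = 0) + P(X = 1) + P(X = 2)
pmf_terms = binom.pmf(np.arange(0, 3), n, p)
total = pmf_terms.sum()

print(pmf_terms)                  # the three individual probabilities
print(total, binom.cdf(2, n, p))  # both approximately 0.8850
```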

**Q** : Compute the binomial probability `P(X = 10)` for n = 20 and p = 0.40.

```python
binom.pmf(k=10, n=20, p=0.4)
```

0.11714155053639011

### Poisson Distribution

```python
from scipy.stats import poisson
```

```python
# P(X = 3) for a Poisson distribution with mean lambda = 2
poisson.pmf(3, 2)
```

0.18044704431548356
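
The Poisson PMF is `P(X = k) = λ^k e^(-λ) / k!`. Evaluating it directly reproduces the value above:

```python
from math import exp, factorial
from scipy.stats import poisson

lam, k = 2, 3

# Poisson PMF: P(X = k) = lam**k * e**(-lam) / k!
manual = lam**k * exp(-lam) / factorial(k)

print(manual, poisson.pmf(k, lam))  # both approximately 0.1804
```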

**Q** Suppose bank customers arrive randomly on weekday afternoons at an average of 3.2 customers every 4 minutes. What is the probability of exactly 5 customers arriving in a 4-minute interval on a weekday afternoon?

```python
# lambda = 3.2 arrivals per 4-minute interval; P(X = 5)
poisson.pmf(5, 3.2)
```

0.11397938346351824

**Q** Suppose bank customers arrive randomly on weekday afternoons at an average of 3.2 customers every 4 minutes. What is the probability of having more than 7 customers in a 4-minute interval on a weekday afternoon?

```python
# P(X <= 7) for lambda = 3.2
prob = poisson.cdf(7, 3.2)
prob
```

0.9831701582510425

```python
# P(X > 7) = 1 - P(X <= 7)
prob_more_than_7 = 1 - prob
prob_more_than_7
```

0.01682984174895752
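
The complement `1 - prob` can also be obtained directly with SciPy's survival function, `sf(k) = P(X > k)`, which avoids the subtraction and tends to be more accurate when the tail probability is very small:

```python
from scipy.stats import poisson

lam = 3.2  # average arrivals per 4-minute interval

# P(X > 7) via the survival function sf(k) = 1 - cdf(k)
p_more_than_7 = poisson.sf(7, lam)

print(p_more_than_7)  # approximately 0.0168
```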

**Q** A bank has an average random arrival rate of 3.2 customers every 4 minutes. What is the probability of getting exactly 10 customers during an 8-minute interval?

```python
# An 8-minute interval is twice as long, so the mean is lambda = 2 * 3.2 = 6.4
poisson.pmf(10, 6.4)
```

0.052790043854115495

### Uniform Distribution

**Q** Suppose the amount of time it takes to assemble a module ranges from 27 to 39 seconds and that assembly times are uniformly distributed. Describe the distribution. What is the probability that a given assembly will take between 30 and 35 seconds?

```python
# Whole-second points in the range [27, 39]; the distribution itself is continuous
U = np.arange(27, 40, 1)
U
```

array([27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39])

```python
from scipy.stats import uniform

# SciPy parameterizes U(a, b) as loc = a and scale = b - a = 39 - 27 = 12
uniform.mean(loc=27, scale=12)
```

33.0

```python
# CDF evaluated at 30, 31, ..., 35 seconds
uniform.cdf(np.arange(30, 36, 1), loc=27, scale=12)
```

array([0.25      , 0.33333333, 0.41666667, 0.5       , 0.58333333,
       0.66666667])

```python
# P(30 <= X <= 35) = F(35) - F(30), using the CDF values printed above
Prob = 0.66666667 - 0.25
Prob
```

0.41666667
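
For a uniform distribution on `[a, b]`, the probability of an interval is simply its length divided by `b - a`. The sketch below computes it both from that formula and from the CDF directly, rather than copying printed CDF values by hand:

```python
from scipy.stats import uniform

a, b = 27, 39  # assembly time range in seconds

# For U(a, b), P(c <= X <= d) = (d - c) / (b - a)
by_formula = (35 - 30) / (b - a)

# Same result from the CDF, evaluated directly
by_cdf = uniform.cdf(35, loc=a, scale=b - a) - uniform.cdf(30, loc=a, scale=b - a)

print(by_formula, by_cdf)  # both approximately 0.4167
```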

**Q** According to the National Association of Insurance Commissioners, the average annual cost for automobile insurance in the US in a recent year was `$691`. Suppose automobile insurance costs are uniformly distributed in the US with a range from `$200` to `$1182`. What is the standard deviation of this uniform distribution?

```python
# loc = a = 200, scale = b - a = 1182 - 200 = 982
uniform.mean(loc=200, scale=982)
```

691.0

```python
uniform.std(loc=200, scale=982)
```

283.4789821721062
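
Both results follow from the closed-form moments of `U(a, b)`: the mean is `(a + b) / 2` and the standard deviation is `(b - a) / √12`. A quick check:

```python
import math
from scipy.stats import uniform

a, b = 200, 1182  # cost range in dollars

# For U(a, b): mean = (a + b) / 2, standard deviation = (b - a) / sqrt(12)
mean = (a + b) / 2
std = (b - a) / math.sqrt(12)

print(mean, std)  # 691.0 and approximately 283.48
print(uniform.std(loc=a, scale=b - a))
```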

### Normal Distribution

![](https://ds055uzetaobb.cloudfront.net/brioche/uploads/enBFdM8LyU-basic-normal-distribution.png?width=1200)


```python
from scipy.stats import norm
```

```python
# x = 68, mean m = 65.5, standard deviation s = 2.5 (x is one s above the mean)
val, m, s = 68, 65.5, 2.5
```

```python
# P(X <= 68)
norm.cdf(val, m, s)
```

0.8413447460685429

```python
# P(X > 68)
1 - norm.cdf(val, m, s)
```

0.15865525393145707

```python
# P(63 <= X <= 68): within one standard deviation of the mean
norm.cdf(val, m, s) - norm.cdf(63, m, s)
```

0.6826894921370859
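
The interval from 63 to 68 is one standard deviation on either side of the mean 65.5, so the result is the familiar ≈ 68% of the 68-95-99.7 rule. The sketch below reproduces the rule on the standard normal for 1, 2 and 3 standard deviations:

```python
from scipy.stats import norm

# Probability of falling within k standard deviations of the mean (standard normal)
within = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(f"within {k} sd: {prob:.4f}")  # roughly 0.6827, 0.9545, 0.9973
```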

**Q** What is the probability of obtaining a score greater than 700 on a GMAT test that has a mean of 494 and a standard deviation of 100? Assume GMAT scores are normally distributed.

`P(X > 700 | μ = 494, σ = 100) = ?`

```python
1 - norm.cdf(700, 494, 100)
```

0.019699270409376912
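
Passing the mean and standard deviation to `norm.cdf` is equivalent to standardizing first with `z = (x - μ) / σ` and then using the standard normal. A short sketch of the by-hand route:

```python
from scipy.stats import norm

x, mu, sigma = 700, 494, 100

# Standardize: z = (x - mu) / sigma, then use the standard normal (mean 0, sd 1)
z = (x - mu) / sigma
p_standardized = 1 - norm.cdf(z)

print(z, p_standardized)  # 2.06 and approximately 0.0197
```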

**Q** For the same GMAT examination, what is the probability of randomly drawing a score that is 550 or less?

```python
norm.cdf(550, 494, 100)
```

0.712260281150973

**Q** What is the probability of randomly obtaining a score between 300 and 600 on the GMAT exam?

```python
norm.cdf(600, 494, 100) - norm.cdf(300, 494, 100)
```

0.8292378553956377

**Q** What is the probability of randomly obtaining a score between 350 and 450 on the GMAT exam?

```python
norm.cdf(450, 494, 100) - norm.cdf(350, 494, 100)
```

0.2550348541262666

```python
# z-value with 95% of the standard normal area to its left
norm.ppf(0.95)
```

1.6448536269514722

```python
# z-value with 0.6672 of the standard normal area to its right
norm.ppf(1 - 0.6672)
```

-0.43219457763866204
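
`norm.ppf` (the percent point function) is the inverse of `norm.cdf`: given an area to the left, it returns the corresponding value. With `loc` and `scale` it works on a non-standard normal too; the sketch below uses the GMAT parameters from the earlier questions (mean 494, standard deviation 100) as an illustration:

```python
from scipy.stats import norm

# ppf is the inverse of cdf: it returns the value with the given area to its left
z95 = norm.ppf(0.95)
print(z95, norm.cdf(z95))  # approximately 1.6449 and 0.95

# loc/scale give percentiles of a non-standard normal, e.g. the 95th percentile
# of GMAT scores with mean 494 and sd 100, which equals 494 + 100 * z95
print(norm.ppf(0.95, loc=494, scale=100))  # approximately 658.5
```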

### Hypergeometric Distribution

**Q** Suppose there are 18 major computer companies in the US and that 12 are located in California's Silicon Valley. If three computer companies are selected randomly from the entire list, what is the probability that one or more of the selected companies are located in Silicon Valley?

```python
from scipy.stats import hypergeom
```

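```python
# SciPy's argument order is (k, M, n, N):
# M = 18 companies in total, n = 12 in Silicon Valley, N = 3 selected.
# sf(0) = P(X > 0) = P(X >= 1)
pval = hypergeom.sf(0, 18, 12, 3)
pval
```

≈ 0.9755 (exactly 1 - C(6, 3) / C(18, 3) = 199/204)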
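
Writing out the hypergeometric formula `P(X = k) = C(n, k) C(M - n, N - k) / C(M, N)` with `math.comb` makes SciPy's `(k, M, n, N)` argument order explicit for this question: M = 18 companies in total, n = 12 in Silicon Valley, N = 3 selected.

```python
from math import comb
from scipy.stats import hypergeom

M, n, N = 18, 12, 3  # population size, companies in Silicon Valley, companies drawn

# Hypergeometric PMF: P(X = k) = C(n, k) * C(M - n, N - k) / C(M, N)
p_zero = comb(n, 0) * comb(M - n, N) / comb(M, N)
p_at_least_one = 1 - p_zero

print(p_at_least_one, hypergeom.sf(0, M, n, N))  # both approximately 0.9755
```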

**Q** A western city has 18 police officers eligible for promotion. Eleven of the 18 are Hispanic. Suppose only five of the police officers are chosen for promotion. If the officers chosen for promotion had been selected by chance alone, what is the probability that one or fewer of the five promoted officers would have been Hispanic?

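```python
# sf(1) = P(X > 1) = P(X >= 2), with M = 18 officers in total.
# The hypergeometric distribution is symmetric in n and N, so passing
# n = 5, N = 11 gives the same result as n = 11 Hispanic, N = 5 promoted.
pval = hypergeom.sf(1, 18, 5, 11)
pval
```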

0.9526143790849686

```python
# P(X <= 1) = 1 - P(X >= 2)
cdf = 1 - pval
cdf
```

0.0473856209150314
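
The same answer can be obtained in one call with `hypergeom.cdf`, passing the arguments in SciPy's `(k, M, n, N)` order so that n is the number of Hispanic officers and N the number promoted:

```python
from scipy.stats import hypergeom

M, n, N = 18, 11, 5  # officers eligible, Hispanic officers, officers promoted

# P(X <= 1) directly from the CDF
p_one_or_fewer = hypergeom.cdf(1, M, n, N)

print(p_one_or_fewer)  # approximately 0.0474
```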

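### Exponential Distribution

**Q** A manufacturing firm has been involved in statistical quality control for several years. As part of the production process, parts are randomly selected and tested. From the records of these tests, it has been established that a defective part occurs in a pattern that is Poisson distributed with an average of 1.38 defects every 20 minutes during production runs. Use this information to determine the probability that less than 15 minutes will elapse between any two defects.

When events follow a Poisson process with rate λ, the time between consecutive events is exponentially distributed with the same rate λ. Here λ = 1.38 per 20-minute interval, and 15 minutes is 0.75 of that interval.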

```python
# Mean time between defects, 1 / lambda, in 20-minute units (about 14.5 minutes)
mu1 = 1 / 1.38
mu1
```

0.7246376811594204

```python
from scipy.stats import expon
```

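```python
# P(T < 0.75), where 0.75 of a 20-minute unit = 15 minutes; SciPy's scale is 1 / lambda
expon.cdf(0.75, loc=0, scale=1 / 1.38)
```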

0.6447736190750485
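
The answer does not depend on the time unit as long as λ and t use the same one. As a sketch, expressing the rate per minute (1.38 / 20) and t = 15 minutes gives the same probability, both from `expon.cdf` and from the formula `1 - e^(-λt)`:

```python
import math
from scipy.stats import expon

defects_per_20_min = 1.38
rate_per_minute = defects_per_20_min / 20  # lambda expressed per minute

# Exponential CDF: P(T <= t) = 1 - exp(-lambda * t); SciPy uses scale = 1 / lambda
t = 15  # minutes
p_minutes = expon.cdf(t, scale=1 / rate_per_minute)
p_formula = 1 - math.exp(-rate_per_minute * t)

print(p_minutes, p_formula)  # both approximately 0.6448
```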

In [34]:
# we can also define the function manually

def CDFExponential(lamb,x):
    if x <=0:
        cdf = 0
    else:
        cdf = 1-np.exp(-lamb*x)