## Binomial distribution

#### https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html

#### Q1) For a random variable that follows a binomial distribution with n trials and probability of success p, find the probability of seeing exactly x successes

a) n = 12, p = 3/4, x = 10

In [None]:
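from scipy.stats import binom
import numpy as np
np.random.seed(42)  # fix the seed so the later random cells are reproducible
n = 12    # number of trials
p = 0.75  # probability of success
x = 10    # number of successes
mean, var = binom.stats(n, p)  # mean = n*p, variance = n*p*(1-p)
print(mean, var)
binom.pmf(x, n, p)  # P(X = 10)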
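As a cross-check, the binomial PMF can be evaluated directly from its formula $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, together with the mean $np$ and variance $np(1-p)$ that `binom.stats` reports. This is a minimal standard-library sketch; the helper name `binom_pmf` is hypothetical.

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials (hypothetical helper)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p, x = 12, 0.75, 10
prob = binom_pmf(x, n, p)
mean = n * p            # expected number of successes
var = n * p * (1 - p)   # variance of the number of successes
print(round(prob, 4), mean, var)
```

This prints `0.2323 9.0 2.25`, which should agree with the SciPy cell above.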

#### Q2) Is it unusual to see fewer than 3 heads in 12 flips of a fair coin? Why?

In [None]:
n = 12
p = 0.5
n_heads = 2  # "fewer than 3" means at most 2 heads
binom.cdf(n_heads, n, p)  # P(X <= 2) = P(X < 3)
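Since "fewer than 3 heads" means 0, 1, or 2 heads, $P(X < 3) = P(X \le 2)$, which is why 2 is passed to `binom.cdf`. A minimal standard-library sketch that sums the PMF and applies the common (conventional, not universal) 0.05 cutoff for calling an outcome unusual:

```python
from math import comb

n, p = 12, 0.5
# P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)
p_fewer_than_3 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3))
print(p_fewer_than_3)   # 79/4096, about 0.0193

# Assumed convention: an outcome with probability below 0.05 is "unusual"
is_unusual = p_fewer_than_3 < 0.05
print(is_unusual)
```

Under that convention, fewer than 3 heads is unusual: it happens in under 2% of sets of 12 flips.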

## Poisson distribution

A DVD has, on average, one defect every 2 inches along its track. What is the probability of seeing fewer than 3 defects within a 5-inch section of its track?


In [None]:
from scipy.stats import poisson
lam = 5/2  # 5 inches at 1 defect per 2 inches = 2.5 expected defects
poisson.cdf(2, lam)  # P(X <= 2) = P(X < 3)
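The Poisson rate scales with the length of track examined: one defect per 2 inches gives $\lambda = 5/2 = 2.5$ expected defects in 5 inches, and "fewer than 3" again means $P(X \le 2)$. Summing the Poisson PMF $e^{-\lambda}\lambda^k/k!$ by hand is a minimal check of the SciPy result:

```python
from math import exp, factorial

lam = 5 / 2   # expected defects in a 5-inch section (1 defect per 2 inches)
# P(X < 3) = sum of P(X = k) for k = 0, 1, 2
p_fewer_than_3 = sum(exp(-lam) * lam**k / factorial(k) for k in range(3))
print(round(p_fewer_than_3, 4))
```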

## Histograms

In [None]:
import numpy as np
import matplotlib.pyplot as plt
x1 = np.random.normal(0, 0.8, 1000)
x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)

kwargs = dict(histtype='stepfilled', alpha=0.7, density=True, bins=20)

plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs)
plt.show()
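With `density=True`, each bar's height is its count divided by (total count × bin width), so the bar areas sum to 1 and histograms of different samples can be compared as probability densities. A minimal NumPy check of that normalization using `np.histogram` with the same `bins=20` (the generator seed is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(0, 0.8, 1000)

# Density-normalized histogram, as plt.hist draws it with density=True
heights, edges = np.histogram(sample, bins=20, density=True)
area = np.sum(heights * np.diff(edges))
print(area)   # total area under the histogram, 1 up to rounding
```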

#### The average salary for first-year teachers is 27,989 USD. Assume the distribution is approximately normal with a standard deviation of 3,250 USD.

What is the probability that a randomly selected first-year teacher has a salary less than 20,000 USD?

What is the probability that a randomly selected first-year teacher makes between 20,000 USD and 30,000 USD each year?



In [None]:
import scipy.stats as st

mu, std = 27989, 3250

std_normal_2 = (20000-mu)/std  # z-score of 20,000 USD

print(st.norm.cdf(std_normal_2))  # P(salary < 20,000)


In [None]:
std_normal_3 = (30000-mu)/std  # z-score of 30,000 USD

# P(20,000 < salary < 30,000) = F(30,000) - F(20,000)
prob_2_3 = st.norm.cdf(std_normal_3) - st.norm.cdf(std_normal_2)
print(prob_2_3)
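Standardizing by hand is optional: the distribution can be parameterized directly, and `st.norm.cdf(20000, loc=mu, scale=std)` is the equivalent SciPy one-liner. A minimal cross-check with the standard library's `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

salary = NormalDist(mu=27989, sigma=3250)

p_below_20k = salary.cdf(20000)                        # P(S < 20,000)
p_20k_to_30k = salary.cdf(30000) - salary.cdf(20000)   # P(20,000 < S < 30,000)
print(round(p_below_20k, 4), round(p_20k_to_30k, 4))
```

The results should match the z-score cells: roughly a 0.7% chance of earning under 20,000 USD and about 72.5% of salaries between 20,000 and 30,000 USD.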

### Men's heights are normally distributed with a mean of 69.0 inches and a standard deviation of 2.8 inches, while women's heights are normally distributed with a mean of 63.6 inches and a standard deviation of 2.5 inches.

1- What percentage of men must duck when walking through a door that is 72 inches high?

2- What percentage of women must duck when walking through a door that is 72 inches high?

3- What door height would allow at least 95% of men to walk through the door without ducking?

In [None]:
mu_men, std_men = 69, 2.8
mu_women, std_women = 63.6, 2.5

# Percentage taller than the 72-inch door: 100 * P(H > 72)
print(100 * (1 - st.norm.cdf((72 - mu_men) / std_men)))
print(100 * (1 - st.norm.cdf((72 - mu_women) / std_women)))


In [None]:
z_95 = st.norm.ppf(0.95)  # z-score below which 95% of a standard normal lies
print(z_95)
door_height = z_95 * std_men + mu_men  # convert the z-score back to inches
print(door_height)
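The same answers can be read off without z-scores: the share who must duck is the upper tail $P(H > 72) = 1 - F(72)$, and the 95% door height is the 0.95 quantile $\mu + z_{0.95}\sigma$ with $z_{0.95} \approx 1.645$. A minimal standard-library sketch using `statistics.NormalDist`, reporting percentages rather than fractions:

```python
from statistics import NormalDist

men = NormalDist(mu=69.0, sigma=2.8)
women = NormalDist(mu=63.6, sigma=2.5)
door = 72

pct_men_duck = 100 * (1 - men.cdf(door))      # men taller than the door
pct_women_duck = 100 * (1 - women.cdf(door))  # women taller than the door

# Height exceeded by only 5% of men: the 0.95 quantile of men's heights
door_95 = men.inv_cdf(0.95)
print(f"{pct_men_duck:.1f}% of men, {pct_women_duck:.2f}% of women, door {door_95:.1f} in")
```

About 14.2% of men but only about 0.04% of women must duck, and a door of roughly 73.6 inches clears 95% of men.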

# Linear Regression

## Given two variables $\textit{x}$ and $\textit{y}$, we want to test whether there is a linear relationship between them. In other words, we want to test whether their relation can be described by:

## $y = mx + b + e$
## where $m$ is the slope coefficient, $b$ is the intercept term, and $e$ is the random noise.
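For a single predictor, the least-squares estimates have a closed form: $\hat m = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and $\hat b = \bar y - \hat m \bar x$. A minimal standard-library sketch, assuming synthetic data built like the related-variables cell below (true $m = 3$, $b = -8$, noise standard deviation 5); the helper `fit_line` is hypothetical:

```python
import random

def fit_line(xs, ys):
    """Return (m_hat, b_hat) minimizing the sum of squared residuals (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m_hat = sxy / sxx
    b_hat = y_bar - m_hat * x_bar
    return m_hat, b_hat

random.seed(0)
xs = [-50 + 100 * i / 99 for i in range(100)]        # same grid as np.linspace(-50, 50, 100)
ys = [-8 + 3 * x + random.gauss(0, 5) for x in xs]   # y = mx + b + e
m_hat, b_hat = fit_line(xs, ys)
print(m_hat, b_hat)   # should land near 3 and -8
```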

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
# Generate two independent uniform random variables of the same size (no linear relation expected)
x1 = 50 * np.random.random_sample(size = 50)
y1 = 50 * np.random.random_sample(size = 50)
# Store variables in a dataframe
data1 = pd.DataFrame({'x':x1, 'y':y1})

# Plot the dataset
data1.plot.scatter(x = 'x', y = 'y')
plt.show()

In [None]:
# Generate two linearly related variables: intercept -8, slope 3, plus Gaussian noise (sd 5)
x2 = np.linspace(-50, 50, 100)
y2 = -8 + 3*x2 + 5*np.random.normal(size = x2.shape)
# Store variables in a dataframe
data2 = pd.DataFrame({'x':x2, 'y':y2})

# Plot the dataset
data2.plot.scatter(x = 'x', y = 'y')
plt.show()

## Use OLS from statsmodels to find the coefficients


In [None]:
import statsmodels.api as sm
X2 = sm.add_constant(x2)  # design matrix with columns [1, x]; x2 itself stays one-dimensional

results = sm.OLS(y2, X2).fit()

print(results.summary())
results.params
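`sm.add_constant` prepends a column of ones to the predictor, so the design matrix has columns `[1, x]` and `results.params` comes back in the order `[const, x]`, that is `[b, m]`. A minimal NumPy sketch of the same least-squares problem solved with `np.linalg.lstsq` (the generator seed is an arbitrary assumption, so the numbers will differ slightly from the statsmodels run):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-50, 50, 100)
y = -8 + 3 * x + 5 * rng.normal(size=x.shape)

# Design matrix with the intercept column first, like sm.add_constant(x)
X = np.column_stack([np.ones_like(x), x])
params, *_ = np.linalg.lstsq(X, y, rcond=None)
b_hat, m_hat = params
print(b_hat, m_hat)
```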

In [None]:
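b_hat, m_hat = results.params  # params are ordered [const, slope]
y_hat = m_hat * data2['x'] + b_hat  # fitted line evaluated on the original 1D x values
data2.plot.scatter(x = 'x', y = 'y')
plt.plot(data2['x'], y_hat, color='red')
plt.show()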

### Linear regression using the scikit-learn library
https://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares

In [None]:
from sklearn import linear_model
x2 = np.linspace(-50, 50, 100)
y2 = -8 + 3*x2 + 5*np.random.normal(size = x2.shape)
reg = linear_model.LinearRegression()
reg.fit(x2, y2)  # raises ValueError: scikit-learn expects X with shape (n_samples, n_features)

In [None]:
x2 = x2.reshape(-1,1)  # one feature column: shape (100, 1)
y2 = y2.reshape(-1,1)
print(x2.shape, y2.shape)
reg.fit(x2, y2)
print(reg.intercept_, reg.coef_)  # intercept near -8, slope near 3

### Curve fitting


In [None]:
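# Small hand-made dataset that roughly follows y = x, with some noise.

xdata = np.array([0.0,1.0,2.0,3.0,4.0,5.0])
ydata = np.array([0.1,0.9,2.2,2.8,3.9,5.1])
# Initial guess for the parameters (a, b, c).
x0 = np.array([0.0, 0.0, 0.0])
sigma = np.array([1.0,1.0,1.0,1.0,1.0,1.0])  # uncertainty of each ydata point
plt.errorbar(xdata, ydata, yerr=sigma, fmt='o')  # data points with their error bars
plt.show()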

In [None]:
# Fit a second-degree polynomial: y = a + b*x + c*x^2
import scipy.optimize as optimization
def func(x, a, b, c):
    return a + b*x + c*x*x

x0 = np.array([0.0, 0.0, 0.0])  # initial guess for (a, b, c)
print(optimization.curve_fit(func, xdata, ydata, p0=x0, sigma=sigma))


What does `curve_fit` return?
Check:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html


In [None]:
parameter_estimates, _ = optimization.curve_fit(func, xdata, ydata, p0=x0, sigma=sigma)
print(parameter_estimates)
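`curve_fit` returns a tuple `(popt, pcov)`: the fitted parameters and their estimated covariance matrix, whose diagonal square roots give one-standard-deviation parameter errors. Because $a + bx + cx^2$ is linear in $a$, $b$, $c$, the same weighted least-squares fit can be cross-checked with `np.polyfit`, which takes weights `w = 1/sigma` and returns coefficients from the highest degree down. A minimal sketch:

```python
import numpy as np

xdata = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
ydata = np.array([0.1, 0.9, 2.2, 2.8, 3.9, 5.1])
sigma = np.ones_like(ydata)

# polyfit returns [c, b, a] for the model c*x**2 + b*x + a
c, b, a = np.polyfit(xdata, ydata, 2, w=1 / sigma)
print(a, b, c)   # compare with the curve_fit estimates above
```

For this data the fit is approximately $a = 0.1$, $b \approx 0.881$, $c \approx 0.021$, so the quadratic term is small and the data are close to linear.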