# Probability Distributions

## Geometric Distribution

Anele plays tournament chess against a computer program, and the probability that he wins any given game is 0.25. He plays several practice games every day, one after the other.
Find the probability that, on a given day:
a) Anele wins for the first time on the 4th game played.
b) Anele has to play more than 4 games before he wins for the first time.

In [1]:
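# Geometric question: X = number of the game on which Anele first wins, p = 0.25
from scipy.stats import geom
a = geom.pmf(4, 0.25)  # a) P(X = 4)
print(a)
b = geom.sf(4, 0.25)   # b) P(X > 4), the survival function
print(b)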

0.10546875
0.31640625000000006
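
As a cross-check, both answers follow from the closed forms of the geometric distribution, $P(X=k)=(1-p)^{k-1}p$ and $P(X>k)=(1-p)^{k}$. The sketch below evaluates them in plain Python; the variable names are illustrative and not part of the original notebook.

```python
p = 0.25  # probability of winning any single game

# a) First win on game 4: three losses followed by one win
first_win_on_4th = (1 - p) ** 3 * p

# b) More than 4 games needed: the first four games are all losses
more_than_4 = (1 - p) ** 4

print(first_win_on_4th)
print(more_than_4)
```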


## Binomial Distribution

The random variable $X$ has the binomial distribution $X \sim \mathrm{Bin}(30, 0.3)$. Determine each of the following.
a) $P(X=11)$
b) $P(X<15)$
c) $P(X>10)$
d) $P(8< X \leq 13)$

In [2]:
from scipy.stats import binom
a = binom.pmf(k=11, n=30, p=0.3)  # a) P(X = 11)
print(a)
b = binom.cdf(k=14, n=30, p=0.3)  # b) P(X < 15) = P(X <= 14)
print(b)
c = binom.sf(k=10, n=30, p=0.3)   # c) P(X > 10)
print(c)
# d) P(8 < X <= 13) = P(X <= 13) - P(X <= 8)
d = binom.cdf(k=13, n=30, p=0.3) - binom.cdf(k=8, n=30, p=0.3)
print(d)

0.11030781900935062
0.9830626885074504
0.2696296136972765
0.5284295460349258
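
The same probabilities can be rebuilt from the binomial formula $P(X=k)=\binom{n}{k}p^{k}(1-p)^{n-k}$ by summing point probabilities over the required range. This is a minimal sketch using only the standard library; `binom_pmf` and the `*_manual` names are hypothetical helpers introduced for illustration.

```python
from math import comb

n, p = 30, 0.3

def binom_pmf(k):
    """P(X = k) for X ~ Bin(n, p), straight from the binomial formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# b) P(X < 15): sum the point probabilities for k = 0, ..., 14
b_manual = sum(binom_pmf(k) for k in range(0, 15))

# d) P(8 < X <= 13): sum the point probabilities for k = 9, ..., 13
d_manual = sum(binom_pmf(k) for k in range(9, 14))

print(b_manual)
print(d_manual)
```

Writing out the range of `k` explicitly is a useful guard against off-by-one mistakes with strict and non-strict inequalities.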


## Poisson Distribution

A bakery sells chocolate birthday cakes over the internet. Orders for chocolate cakes arrive at random, at a constant rate of 4.5 per day. At the start of any given day, the bakery produces 6 chocolate birthday cakes and produces no more until the day after.
a) Find the probability that by the middle of a working day the bakery would have sold half the chocolate birthday cakes it produced for that day.
b) Calculate the probability that by the end of the working day the bakery would have sold all the chocolate birthday cakes it produced for the day.


In [3]:
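from scipy.stats import poisson
# a) Half a day has rate 4.5 / 2 = 2.25; "sold half" is read as exactly 3 of the 6 cakes
a = poisson.pmf(k=3, mu=2.25)
print(a)
# b) Selling all 6 cakes needs at least 6 orders in a full day: P(X >= 6) = P(X > 5)
b = poisson.sf(k=5, mu=4.5)
print(b)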


0.20009384037916433
0.29706956513917254
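
For reference, the Poisson probabilities can also be computed from $P(X=k)=e^{-\mu}\mu^{k}/k!$, with the half-day rate obtained by scaling the daily rate. The sketch below assumes the same reading of the question as the cell above (exactly 3 orders by midday, at least 6 orders by the end of the day); `poisson_pmf` is a hypothetical helper.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)

daily_rate = 4.5
half_day_rate = daily_rate / 2  # a Poisson rate scales with the length of the interval

# a) Exactly 3 orders in half a day
a_manual = poisson_pmf(3, half_day_rate)

# b) At least 6 orders in a full day: 1 - P(X <= 5)
b_manual = 1 - sum(poisson_pmf(k, daily_rate) for k in range(6))

print(a_manual)
print(b_manual)
```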


## Normal Distribution

The average number of acres burned by forest and range fires in a large New Mexico county is 4,300 acres
per year, with a standard deviation of 750 acres. The distribution of the number of acres burned is normal.
What is the probability that between 2,500 and 4,200 acres will be burned in any given year?

In [4]:
from scipy.stats import norm
# P(2500 < X < 4200) = F(4200) - F(2500) for X ~ N(4300, 750^2)
a = norm.cdf(x=4200, loc=4300, scale=750) - norm.cdf(x=2500, loc=4300, scale=750)
print(a)

0.43876734745178986
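
The textbook route to this answer standardizes both bounds with $z=(x-\mu)/\sigma$ and looks up the standard normal CDF $\Phi$. Here is a small sketch of that calculation using `math.erf` from the standard library; `std_normal_cdf` is a hypothetical helper name.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 4300, 750

z_low = (2500 - mu) / sigma   # -2.4
z_high = (4200 - mu) / sigma  # about -0.133

prob = std_normal_cdf(z_high) - std_normal_cdf(z_low)
print(prob)
```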


# Measures of Central Tendency

In [5]:
import numpy as np
from scipy import stats

x = [2,3,4,5,3,2,3,4,5,3,2,4,5,3,2,4,5,2,2,4,5,2,3,3,3,4,3,3,3,3,4,3,2]
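a = np.mean(x)
# stats.mode returns (mode, count); keep only the mode.
# Older SciPy versions return it as a one-element array, as in the output below.
b = stats.mode(x)
b = b[0]
c = np.median(x)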
print("The mean, mode and median for the array is {}, {} and {} respectively".format(a,b,c))

The mean, mode and median for the array is 3.272727272727273, [3] and 3.0 respectively
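
If SciPy is not needed for anything else, the same three summaries are available in Python's standard-library `statistics` module, which returns plain numbers rather than arrays. A minimal sketch on the same data:

```python
import statistics

x = [2,3,4,5,3,2,3,4,5,3,2,4,5,3,2,4,5,2,2,4,5,2,3,3,3,4,3,3,3,3,4,3,2]

mean_x = statistics.mean(x)      # arithmetic mean
mode_x = statistics.mode(x)      # most frequent value
median_x = statistics.median(x)  # middle value of the sorted data

print("The mean, mode and median are {}, {} and {} respectively".format(mean_x, mode_x, median_x))
```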


# Measures of Dispersion


In [6]:
x = [2,3,4,5,3,2,3,4,5,3,2,4,5,3,2,4,5,2,2,4,5,2,3,3,3,4,3,3,3,3,4,3,2]
# np.std and np.var default to the population formulas (ddof=0)
a = np.std(x)
print("The standard deviation is {}".format(a))
b = np.var(x)
print("The variance is {}".format(b))
c = max(x)-min(x)
print("The range is {}".format(c))

The standard deviation is 0.9930890671620569
The variance is 0.9862258953168045
The range is 3
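
The values above divide by $n$ (population formulas). When the data are a sample, the usual estimate divides by $n-1$ instead. The sketch below contrasts the two conventions with the standard-library `statistics` module; the variable names are illustrative only.

```python
import statistics

x = [2,3,4,5,3,2,3,4,5,3,2,4,5,3,2,4,5,2,2,4,5,2,3,3,3,4,3,3,3,3,4,3,2]

pop_var = statistics.pvariance(x)  # divides by n
pop_std = statistics.pstdev(x)     # square root of pop_var
samp_var = statistics.variance(x)  # divides by n - 1
samp_std = statistics.stdev(x)     # square root of samp_var

print(pop_var, pop_std)
print(samp_var, samp_std)
```

With NumPy, the sample versions correspond to passing `ddof=1` to `np.var` and `np.std`.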


# Covariance and Correlation

In [7]:
from scipy import stats
x = [2,3,4,5,3,2,3,4,5,3,2,4,5,3,2,4,5,2,2,4,5,2,3,3,3,4,3,3,3,3,4,3,2]
z = [22,23,24,25,23,22,23,42,25,23,22,24,52,32,22,42,52,22,22,42,52,22,23,23,23,24,23,23,23,23,24,23,22]

np.cov(x,z)


array([[ 1.01704545,  6.53977273],
       [ 6.53977273, 93.67613636]])

In [8]:
stats.pearsonr(x, z)


(0.6700049158871217, 1.998606625233726e-05)
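
The correlation coefficient is the covariance rescaled by the two standard deviations, $r = \operatorname{cov}(x,z)/(s_x s_z)$. The following sketch recomputes both quantities from first principles, using the $n-1$ divisor that matches the covariance matrix printed above; all names are illustrative.

```python
from math import sqrt

x = [2,3,4,5,3,2,3,4,5,3,2,4,5,3,2,4,5,2,2,4,5,2,3,3,3,4,3,3,3,3,4,3,2]
z = [22,23,24,25,23,22,23,42,25,23,22,24,52,32,22,42,52,22,22,42,52,22,23,23,23,24,23,23,23,23,24,23,22]

n = len(x)
mean_x = sum(x) / n
mean_z = sum(z) / n

# Sample covariance and variances (divide by n - 1)
cov_xz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z)) / (n - 1)
var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
var_z = sum((b - mean_z) ** 2 for b in z) / (n - 1)

# Pearson correlation: covariance divided by the product of standard deviations
r = cov_xz / sqrt(var_x * var_z)

print(cov_xz)
print(r)
```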

# Linear Regression

In [9]:
x = [22,23,24,25,23,22,23,42,25,23,22,24,52,32,22,42,52,22,22,42,52,22,23,23,23,24,23,23,23,23,24,23,22]
y = [12,13,14,15,13,12,13,14,15,13,12,14,15,13,12,14,15,12,12,14,15,12,13,13,13,14,13,13,13,13,14,13,12]

slope, intercept, r, p, se = stats.linregress(x, y)
print(slope)
print(intercept)
print(r)
print(p)
print(se)

0.06981257960817613
11.343361436283132
0.6700049158871219
1.998606625233714e-05
0.013892725652319315
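
The slope and intercept of a simple least-squares line have closed forms: the slope is $S_{xy}/S_{xx}$ and the intercept is $\bar{y} - \text{slope}\cdot\bar{x}$. The sketch below recomputes them by hand and uses the fitted line for one prediction; the names `s_xy`, `s_xx` and `predict` are hypothetical.

```python
x = [22,23,24,25,23,22,23,42,25,23,22,24,52,32,22,42,52,22,22,42,52,22,23,23,23,24,23,23,23,23,24,23,22]
y = [12,13,14,15,13,12,13,14,15,13,12,14,15,13,12,14,15,12,12,14,15,12,13,13,13,14,13,13,13,13,14,13,12]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of cross-products and squares about the means
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

def predict(new_x):
    """Value of the fitted line at new_x."""
    return intercept + slope * new_x

print(slope, intercept)
print(predict(30))
```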


# Percentiles

In [10]:
from scipy import stats
x = [22,23,24,25,23,22,23,42,25,23,22,24,52,32,22,42,52,22,22,42,52,22,23,23,23,24,23,23,23,23,24,23,22]
stats.scoreatpercentile(x, 50)

23.0
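
A percentile can be computed by sorting the data, locating the fractional position $(n-1)\,q/100$, and interpolating linearly between the two neighbouring values. The sketch below implements that rule; it is assumed, not confirmed here, to correspond to the default `'fraction'` interpolation of `scoreatpercentile`, and `percentile` is a hypothetical helper.

```python
def percentile(data, q):
    """q-th percentile (0 <= q <= 100) with linear interpolation between order statistics."""
    ordered = sorted(data)
    pos = (len(ordered) - 1) * q / 100        # fractional index into the sorted data
    lower = int(pos)                          # index at or below pos
    upper = min(lower + 1, len(ordered) - 1)  # next index, clamped at the end
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

x = [22,23,24,25,23,22,23,42,25,23,22,24,52,32,22,42,52,22,22,42,52,22,23,23,23,24,23,23,23,23,24,23,22]

print(percentile(x, 50))  # the median
print(percentile(x, 25), percentile(x, 75))
```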

# Confidence Interval

In [11]:
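# x is still the list defined in cell In [10]
mean, sigma = np.mean(x), np.std(x)
# Central 85% range of a normal distribution with the sample mean and standard
# deviation; an interval for the mean itself would use scale=sigma/np.sqrt(len(x))
stats.norm.interval(0.85, loc=mean, scale=sigma)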


(13.916377639386395, 41.35634963334088)

In [12]:
# Central 95% range of the fitted normal distribution, not a CI for the mean
stats.norm.interval(0.95, loc=mean, scale=sigma)

(8.956203483325261, 46.31652378940201)
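
Calling `norm.interval` with `scale=sigma` gives the range that covers the central share of a normal distribution fitted to the data. A confidence interval for the mean is narrower, because it uses the standard error $\sigma/\sqrt{n}$ as the scale. The sketch below computes both 95% intervals with the standard library so the difference is visible; it assumes the normal approximation and the population standard deviation (the same convention as `np.std`), and the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

x = [22,23,24,25,23,22,23,42,25,23,22,24,52,32,22,42,52,22,22,42,52,22,23,23,23,24,23,23,23,23,24,23,22]

n = len(x)
m = mean(x)
sigma = pstdev(x)  # population standard deviation, as np.std computes by default

z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96

# Range covering the central 95% of a normal distribution N(m, sigma^2)
spread_interval = (m - z * sigma, m + z * sigma)

# Normal-approximation 95% confidence interval for the mean
se = sigma / sqrt(n)
mean_ci = (m - z * se, m + z * se)

print(spread_interval)
print(mean_ci)
```

For a sample this small, a t-based interval with the sample standard deviation would be the more conservative choice.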

# Skewness 

In [13]:
from scipy import stats
stats.skew(x)

1.7326425394468898

# Kurtosis

In [14]:
from scipy import stats
stats.kurtosis(x)

1.4051407765816792
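
Both shape measures are ratios of central moments: skewness is $m_3/m_2^{3/2}$ and excess kurtosis is $m_4/m_2^{2}-3$, where $m_k$ is the mean of $(x-\bar{x})^k$. The sketch below computes them directly, assuming the uncorrected (biased) moment definitions and the Fisher convention for kurtosis, and assuming `x` is the list defined in cell In [10]; `central_moment` is a hypothetical helper.

```python
x = [22,23,24,25,23,22,23,42,25,23,22,24,52,32,22,42,52,22,22,42,52,22,23,23,23,24,23,23,23,23,24,23,22]

n = len(x)
mean_x = sum(x) / n

def central_moment(k):
    """k-th central moment of x, dividing by n (no bias correction)."""
    return sum((v - mean_x) ** k for v in x) / n

m2 = central_moment(2)
m3 = central_moment(3)
m4 = central_moment(4)

skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3  # subtract 3 so a normal distribution scores 0

print(skewness)
print(excess_kurtosis)
```

A positive skewness reflects the long right tail created by the few values of 42 and 52.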

# Linear Algebra

In [16]:
# Import the required libraries
from scipy import linalg
import numpy as np

# Initializing the matrix
x = np.array([[7, 2, 5, 6, 8], [5, 4, 4, 5, 6], [5, 4, 4, 5, 7], [4, 7, 9, 8, 6], [7, 8, 9, 5, 6]])

# Find the inverse of matrix x
y = linalg.inv(x)
print(y)

[[ 0.09090909  0.81818182 -0.72727273 -0.18181818  0.09090909]
 [-0.30958231  0.37592138  0.04422604 -0.05651106  0.04176904]
 [ 0.21130221 -0.82800983  0.25552826  0.11793612  0.13022113]
 [-0.01228501  0.94348894 -0.71253071  0.13267813 -0.22850123]
 [ 0.         -1.          1.          0.          0.        ]]
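
A quick sanity check on any computed inverse is that multiplying it by the original matrix gives the identity. The sketch below does this in plain Python with the rounded values printed above, so the product is expected to match the identity only to within rounding error; `y_printed` and `max_dev` are illustrative names.

```python
x = [[7, 2, 5, 6, 8],
     [5, 4, 4, 5, 6],
     [5, 4, 4, 5, 7],
     [4, 7, 9, 8, 6],
     [7, 8, 9, 5, 6]]

# Inverse as printed above (rounded to 8 decimal places)
y_printed = [[ 0.09090909,  0.81818182, -0.72727273, -0.18181818,  0.09090909],
             [-0.30958231,  0.37592138,  0.04422604, -0.05651106,  0.04176904],
             [ 0.21130221, -0.82800983,  0.25552826,  0.11793612,  0.13022113],
             [-0.01228501,  0.94348894, -0.71253071,  0.13267813, -0.22850123],
             [ 0.0,        -1.0,         1.0,         0.0,         0.0]]

size = len(x)

# Matrix product x @ y_printed
product = [[sum(x[i][k] * y_printed[k][j] for k in range(size)) for j in range(size)]
           for i in range(size)]

# Largest absolute deviation of the product from the identity matrix
max_dev = max(abs(product[i][j] - (1.0 if i == j else 0.0))
              for i in range(size) for j in range(size))
print(max_dev)
```

With NumPy arrays, the equivalent check would be `np.allclose(x @ y, np.eye(5))`.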
