## Table of Contents

1. **[Import Libraries](#lib)**
2. **[Descriptive Statistics](#des)**
    - 2.1 - **[Measures of Central Tendency](#CT)**
    - 2.2 - **[Measures of Dispersion](#disp)**
    - 2.3 - **[Skewness and Kurtosis](#sk)**
    - 2.4 - **[Covariance and Correlation](#cc)**
3. **[Probability](#prob)**
    - 3.1 - **[Conditional Probability](#cond)**
        - 3.1.1 - **[Bayes' Theorem](#bayes)**

<a id="lib"></a>
# 1. Import Libraries

**Let us import the required libraries.**

In [None]:
# import 'pandas'
import pandas as pd

# import 'numpy'
import numpy as np

# import subpackage of matplotlib
import matplotlib.pyplot as plt
from matplotlib import gridspec
%matplotlib inline

# import 'seaborn'
import seaborn as sns

# to suppress warnings
from warnings import filterwarnings
filterwarnings('ignore')

# import 'factorial' from math library
from math import factorial

# import 'stats' package from scipy library
from scipy import stats
from scipy.stats import randint
from scipy.stats import skewnorm

# import 'random' to generate a random sample
import random

In [None]:
# set the plot size using 'rcParams'
# once the plot size is set using 'rcParams', it sets the size of all the forthcoming plots in the file
# pass width and height in inches to 'figure.figsize'
plt.rcParams['figure.figsize'] = [15,8]

The study of statistics is mainly divided into two parts: `Descriptive` and `Inferential`.

Here we mainly focus on `Inferential Statistics`. Before that, let us recall the descriptive statistics methods learned as a part of exploratory data analysis.

<a id="des"></a>
# 2. Descriptive Statistics

Descriptive statistics summarize or describe the given data. They include measures of central tendency, measures of dispersion, and measures of the shape of the data distribution.

<a id="CT"></a>
## 2.1 Measures of Central Tendency

A measure of central tendency is a value that describes the central position of the data. It includes the mean, median, and mode; partition values, which describe the position of the data more generally, are also covered here.

### Mean:
It is defined as the ratio of the sum of all the observations to the total number of observations. It is affected by the presence of outliers.

### Median:
It is the middlemost observation in the data when the observations are arranged in increasing or decreasing order. It divides the dataset into two equal parts. When the number of observations is even, the median is the average of the two middle observations.

### Mode:
It is defined as the value in the data with the highest frequency. There can be more than one mode in the data.

### Partition values:
Partition values are defined as the values that divide the data into equal parts. `Quartiles` divide the data into 4 equal parts, `Deciles` divide the data into 10 equal parts and `Percentiles` divide the data into 100 equal parts.
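The example below computes only the mean and the median. As a sketch of the remaining measures, the snippet uses Python's `statistics.multimode` for the mode and `np.percentile` for partition values. The `ratings` list is hypothetical, chosen only because it contains repeated values; NumPy's default linear interpolation is assumed for the percentiles.

```python
import statistics

import numpy as np

# one-day sales (in dollars) of the 12 branches, the same data used in the examples
sale = [165, 182, 140, 193, 172, 168, 174, 124, 187, 204, 148, 175]

# hypothetical ratings with repeated values, used only to illustrate the mode
ratings = [3, 4, 4, 5, 2, 5, 4, 5]

# 'multimode' returns every value that has the highest frequency
modes = statistics.multimode(ratings)
print('Mode(s):', modes)

# quartiles: the 25th, 50th and 75th percentiles (the 50th percentile is the median)
quartiles = np.percentile(sale, [25, 50, 75])

# deciles: the 10th, 20th, ..., 90th percentiles
deciles = np.percentile(sale, np.arange(10, 100, 10))

print('Quartiles:', quartiles)
print('Deciles:', deciles)
```

Here 4 and 5 both occur three times, so the ratings have two modes, and the second quartile of the sales equals their median.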

### Example:

#### 1. A manager handles 12 branches of a supermarket situated in the U.S.A. Consider the one-day sales (in dollars) of all the branches. Calculate the mean and median to find the average sale.
    
    Sale = [165, 182, 140, 193, 172, 168, 174, 124, 187, 204, 148, 175]

In [None]:
# given data
sale = [165, 182, 140, 193, 172, 168, 174, 124, 187, 204, 148, 175]

# calculate mean sale
mean_sale = np.mean(sale)
print('Mean:', mean_sale)

# calculate median sale
med_sale = np.median(sale)
print('Median:', med_sale)

Mean: 169.33333333333334
Median: 173.0


<a id="disp"></a>
## 2.2 Measures of Dispersion

A measure of dispersion describes the variability in the data. Some of the measures of dispersion are range, variance, standard deviation, coefficient of variation, and IQR.

### Range:
It is defined as the difference between the largest and smallest observation in the data. It is affected by the presence of extreme observations.

### Variance:
It measures the dispersion of the data around the mean. It is defined as the average of the squared differences between each observation and the mean. The sample variance divides the sum of squared differences by n - 1 instead of n.

### Standard Deviation:
It is the positive square root of the variance. The standard deviation has the same unit as the data points. A variable with a near-zero standard deviation is almost constant, so it carries little information for the analysis.
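The definitions above translate directly into NumPy. The sketch below computes the variance by hand and compares it with `np.var`; `np.var` and `np.std` divide by n by default (`ddof=0`, population variance), while `ddof=1` divides by n - 1 (sample variance). Which of the two to report is a modelling choice that the notebook does not fix.

```python
import numpy as np

sale = np.array([165, 182, 140, 193, 172, 168, 174, 124, 187, 204, 148, 175])
n = len(sale)

# squared deviation of each observation from the mean
sq_dev = (sale - sale.mean()) ** 2

# population variance: average of the squared deviations (divide by n)
pop_var = sq_dev.sum() / n

# sample variance: divide by n - 1 instead (Bessel's correction)
samp_var = sq_dev.sum() / (n - 1)

# the standard deviation is the positive square root of the variance
pop_std = np.sqrt(pop_var)

print('Population variance:', pop_var, '| np.var:', np.var(sale))
print('Sample variance:', samp_var, '| np.var(ddof=1):', np.var(sale, ddof=1))
print('Population standard deviation:', pop_std)
```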

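### Coefficient of Variation:
It is the ratio of the standard deviation to the mean, so it measures the relative dispersion of the data points around the mean. It is usually expressed as a percentage. Because it does not depend on the unit of measurement, we can compare the coefficients of variation of two or more groups to identify the group with relatively more spread.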
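A minimal sketch of the coefficient of variation, assuming the population standard deviation (`np.std` with its default `ddof=0`) as in the example below. The helper name `coefficient_of_variation` is hypothetical. Converting the sales from dollars to cents changes the standard deviation but not the CV, which is what makes the CV comparable across groups.

```python
import numpy as np

def coefficient_of_variation(values):
    """Return the CV in percent: population standard deviation / mean * 100."""
    values = np.asarray(values, dtype=float)
    return np.std(values) / np.mean(values) * 100

sale_dollars = np.array([165, 182, 140, 193, 172, 168, 174, 124, 187, 204, 148, 175])

# the same sales expressed in cents
sale_cents = sale_dollars * 100

cv_dollars = coefficient_of_variation(sale_dollars)
cv_cents = coefficient_of_variation(sale_cents)

print('CV (dollars): {:.2f}%'.format(cv_dollars))
print('CV (cents):   {:.2f}%'.format(cv_cents))
```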

### Interquartile Range (IQR):
It is defined as the difference between the third and the first quartiles, so it gives the spread of the middle 50% of the data. The IQR can be used to identify outliers in the data, for instance with the common 1.5 × IQR rule.
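A sketch of the 1.5 × IQR rule (Tukey's fences), which is a widely used convention rather than the only way to flag outliers: observations below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are treated as potential outliers. NumPy's default linear interpolation is assumed for the quartiles.

```python
import numpy as np

sale = np.array([165, 182, 140, 193, 172, 168, 174, 124, 187, 204, 148, 175])

# first and third quartiles (NumPy's default linear interpolation)
q1, q3 = np.quantile(sale, [0.25, 0.75])
iqr = q3 - q1

# Tukey's fences: 1.5 * IQR below Q1 and above Q3
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# observations outside the fences are flagged as potential outliers
outliers = sale[(sale < lower_fence) | (sale > upper_fence)]

print('Q1 =', q1, '| Q3 =', q3, '| IQR =', iqr)
print('Fences:', lower_fence, 'to', upper_fence)
print('Potential outliers:', outliers)
```

With these data the lower fence is 127, so the smallest sale (124) is flagged; whether it is a genuine outlier is still a judgment about the business, not something the rule decides.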

### Example:

#### 1. A manager handles 12 branches of a supermarket situated in the U.S.A. Consider the one-day sales (in dollars) of all the branches. Calculate the standard deviation of the sales. Also, find the range in which the middle 50% of the sales lie.
    
    Sale = [165, 182, 140, 193, 172, 168, 174, 124, 187, 204, 148, 175]

In [None]:
# given sale
sale = [165, 182, 140, 193, 172, 168, 174, 124, 187, 204, 148, 175]

# calculate standard deviation
# note: np.std uses ddof=0 by default, i.e. the population standard deviation
std_sale = np.std(sale)
print('Standard Deviation:', std_sale)

# calculate the IQR to obtain the range of middle 50% of the sale

# 1st quartile
# pass the sale values to the parameter, 'a'
# pass the required quantile value to the parameter, 'q'
Q1_sale = np.quantile(a = sale, q = 0.25)

# 3rd quartile
# pass the sale values to the parameter, 'a'
# pass the required quantile value to the parameter, 'q'
Q3_sale = np.quantile(a = sale, q = 0.75)

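# calculate IQR
IQR = Q3_sale - Q1_sale

# the middle 50% of the sales lies between Q1 and Q3; the IQR is the width of this range
print('Middle 50% of the sale lies between', Q1_sale, 'and', Q3_sale)
print('IQR:', IQR)

Standard Deviation: 21.76898915634093
Middle 50% of the sale lies between 160.75 and 183.25
IQR: 22.5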


<a id="sk"></a>
## 2.3 Skewness and Kurtosis

### Skewness:
It measures the asymmetry of the data distribution about its mean. The value of skewness can be `positive` (longer right tail), `negative` (longer left tail), or `zero` (for example, a symmetric distribution such as the normal distribution).

### Kurtosis:
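It describes how heavy the tails of the data distribution are relative to a normal distribution (it is often loosely described as peakedness). `stats.kurtosis` in scipy returns the excess kurtosis by default, for which a normal distribution has the value zero. A positive value represents a `leptokurtic` distribution, a negative value represents a `platykurtic` distribution, and a zero value represents a `mesokurtic` distribution.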
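To see what the signs look like in practice, the sketch below draws large simulated samples from a few standard distributions and prints `stats.skew` and `stats.kurtosis` (excess kurtosis, scipy's default `fisher=True`). The distributions, sample size, and seed are arbitrary choices for illustration; the theoretical values in the comments are standard results, and the sample values only approximate them.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
n = 100_000

samples = {
    'normal (mesokurtic)': rng.normal(size=n),              # skewness 0, excess kurtosis 0
    'uniform (platykurtic)': rng.uniform(size=n),           # skewness 0, excess kurtosis -1.2
    'laplace (leptokurtic)': rng.laplace(size=n),           # skewness 0, excess kurtosis 3
    'exponential (right-skewed)': rng.exponential(size=n),  # skewness 2, excess kurtosis 6
}

results = {}
for name, x in samples.items():
    # (skewness, excess kurtosis) of each simulated sample
    results[name] = (stats.skew(x), stats.kurtosis(x))
    print(f'{name:28s} skewness = {results[name][0]:6.2f}   kurtosis = {results[name][1]:6.2f}')
```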

### Example:

#### 1. A manager handles 12 branches of a supermarket situated in the U.S.A. Consider the one-day sales (in dollars) of all the branches. Identify the type of skewness and kurtosis of the sales.
    
    Sale = [165, 182, 140, 193, 172, 168, 174, 124, 187, 204, 148, 175]

In [None]:
# calculate the value of skewness to identify the type
sale_skew = stats.skew(sale)
print('Skewness of Sale:', sale_skew)

# calculate the value of kurtosis to identify the type
sale_kurt = stats.kurtosis(sale)
print('Kurtosis of Sale:', sale_kurt)

Skewness of Sale: -0.5285526567587567
Kurtosis of Sale: -0.38240010775017863


The above output shows that the skewness is negative, which implies that the sales data are `negatively skewed` (the left tail is longer than the right). The kurtosis is also negative, which implies that the distribution of the sales is `platykurtic`.

<a id="cc"></a>
## 2.4 Covariance and Correlation

### Covariance:
It measures the degree to which two variables vary together. Covariance can take any value from $-\infty$ to $\infty$. Its magnitude is hard to interpret because it depends on the units of the two variables.

### Correlation:
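It is the normalized value of covariance: Pearson's correlation coefficient is the covariance divided by the product of the two standard deviations, so it is unitless and always lies between -1 and +1. A correlation near +1 indicates a `strong positive` linear relationship between the variables, a value near -1 indicates a `strong negative` linear relationship, and a value near 0 indicates little or no linear relationship.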
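A sketch of the relationship between the two measures using the sales and working-hours data from the example below: the correlation is computed from the covariance and checked against `np.corrcoef`, and the sales are rescaled to cents to show that covariance depends on units while correlation does not. `np.cov` divides by n - 1 by default, so `ddof=1` is used for the standard deviations to match.

```python
import numpy as np

sale = np.array([165, 182, 140, 193, 172, 168, 174, 124, 187, 204, 148, 175])
working_hrs = np.array([7, 8.5, 8, 10, 9, 8, 8.5, 7.5, 9.5, 8.5, 8, 9])

# sample covariance between working hours and sales (n - 1 denominator)
cov_xy = np.cov(working_hrs, sale)[0, 1]

# Pearson correlation: covariance divided by the product of the standard deviations
r = cov_xy / (np.std(working_hrs, ddof=1) * np.std(sale, ddof=1))

# rescaling sales from dollars to cents multiplies the covariance by 100 but leaves r unchanged
cov_cents = np.cov(working_hrs, sale * 100)[0, 1]
r_cents = np.corrcoef(working_hrs, sale * 100)[0, 1]

print('Covariance (hours, dollars):', cov_xy)
print('Covariance (hours, cents):  ', cov_cents)
print('Correlation:', r, '| np.corrcoef:', np.corrcoef(working_hrs, sale)[0, 1])
```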

### Example:

#### 1. A manager handles 12 branches of a supermarket situated in the U.S.A. Consider the one-day sales (in dollars) and the working hours of all the branches. Find the relationship between the working hours of a store and its sales.
    Sale = [165, 182, 140, 193, 172, 168, 174, 124, 187, 204, 148, 175]
    Working hours = [7, 8.5, 8, 10, 9, 8, 8.5, 7.5, 9.5, 8.5, 8, 9]

In [None]:
# given data
sale = pd.Series([165, 182, 140, 193, 172, 168, 174, 124, 187, 204, 148, 175])
working_hrs = pd.Series([7, 8.5, 8, 10, 9, 8, 8.5, 7.5, 9.5, 8.5, 8, 9])

# calculate the correlation coefficient to find the relationship between working hours and sales of a store
corr_coeff = working_hrs.corr(sale)

print('Correlation coefficient:', corr_coeff)

Correlation coefficient: 0.6447248082202144


The correlation coefficient of about 0.64 indicates a moderately strong positive linear relationship between the working hours and the sales of a store: branches that work longer hours tend to have higher sales.

<a id="prob"></a>
# 3. Probability

An event is the outcome or collection of outcomes of an experiment. It is a subset of the `sample space`, which is defined as the set of all possible outcomes of an experiment.

### Example:

In [None]:
# consider the set of the first ten prime numbers
sample_space = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}

# consider an event A: occurrence of an even prime number
A = set()

# use the for loop to check the condition of even number on each element of the sample space
for i in sample_space:

    # pass a condition to check whether the number is even
    if (i%2 == 0):

        # add the number to 'A' if it is even
        A.add(i)
# print the set A
print('A =', A)

A = {2}


`Probability` is a measure of the likelihood that an event occurs. The probability of occurrence of an event A is denoted as `P(A)`. It takes values between 0 and 1 (inclusive), and the probability of the sample space is always 1. When all outcomes of an experiment are equally likely, P(A) is the number of outcomes in A divided by the number of outcomes in the sample space.

The probability of the complement of an event A is `P(A') = 1-P(A)`.
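A minimal sketch of these definitions using Python sets, assuming all ten outcomes of the prime-number sample space above are equally likely. `fractions.Fraction` keeps the probabilities exact.

```python
from fractions import Fraction

# sample space: the first ten prime numbers (equally likely outcomes assumed)
sample_space = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}

# event A: occurrence of an even prime number
A = {x for x in sample_space if x % 2 == 0}

# complement A': every outcome in the sample space that is not in A
A_complement = sample_space - A

# classical probability: favorable outcomes / total outcomes
p_A = Fraction(len(A), len(sample_space))
p_A_complement = Fraction(len(A_complement), len(sample_space))

print('P(A) =', p_A)
print("P(A') =", p_A_complement)
print('1 - P(A) =', 1 - p_A)
```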

### Example:

#### 1. If the letters of the word `AABRAAKAADAABRAA` are arranged at random, find the probability that 10 A's come consecutively in the word.

In [None]:
# set the frequency of each letter in the given word
length_of_word = len("AABRAAKAADAABRAA")
No_of_A = 10
No_of_B = 2
No_of_R = 2
No_of_K = 1
No_of_D = 1

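If the 10 A's come consecutively in the word, we treat them as a single block ([AAAAAAAAAA]BRKDBR).

Now there are 6 + 1 = 7 units to arrange: the 6 remaining letters and the block of A's. Since the 2 B's and the 2 R's are identical, we divide by 2! for each of them.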

In [None]:
# number of words when all A's are together
no_words_with_10A = factorial(7) / (factorial(No_of_B)*factorial(No_of_D)*factorial(No_of_K)*factorial(No_of_R))

# total number of words using the letters of the word "AABRAAKAADAABRAA"
total_words = factorial(length_of_word) /(factorial(No_of_A)*factorial(No_of_B)*factorial(No_of_D)*factorial(No_of_K)
                                          *factorial(No_of_R))

In [None]:
# the required probability is
req_prob = (no_words_with_10A/total_words)
print("The probability that 10 A's come consecutively in the word is", req_prob)

The probability that 10 A's come consecutively in the word is 0.0008741258741258741
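As a cross-check, one can count only the positions of the A's: in a random arrangement the 10 identical A's occupy one of C(16, 10) equally likely sets of positions, and exactly 16 - 10 + 1 = 7 of those sets are consecutive blocks. The sketch below uses `math.comb` (Python 3.8+) and `fractions.Fraction` to confirm that this gives the same probability as the grouping argument.

```python
from fractions import Fraction
from math import comb, factorial

n_letters = 16
n_A = 10

# position-based count: 7 consecutive blocks out of C(16, 10) position sets
p_positions = Fraction(n_letters - n_A + 1, comb(n_letters, n_A))

# grouping argument: 7!/(2!2!) favorable words out of 16!/(10!2!2!) words
favorable = factorial(7) // (factorial(2) * factorial(2))
total = factorial(16) // (factorial(10) * factorial(2) * factorial(2))
p_grouping = Fraction(favorable, total)

print('Position-based count:', p_positions)
print('Grouping argument:   ', p_grouping, '=', float(p_grouping))
```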


#### 2. If the letters of the word `AABRAAKAADAABRAA` are arranged at random, find the probability that the 2 B's come together and the 2 R's come together.

We treat the 2 B's as one block and the 2 R's as another block. The 12 remaining letters (10 A's, K and D) and the 2 blocks give 12 + 1 + 1 = 14 units. Since the letters within each block are identical, each block can be arranged in only one way, so we only divide by 10! for the identical A's.

In [None]:
# number of words when the 2 B's form one block and the 2 R's form another block
# the 14 units are: 10 A's, K, D, [BB] and [RR]
no_words_with_2B_2R = factorial(14) / (factorial(No_of_A)*factorial(No_of_D)*factorial(No_of_K))

# total number of words using the letters of the word "AABRAAKAADAABRAA"
total_words = factorial(length_of_word) /(factorial(No_of_A)*factorial(No_of_B)*factorial(No_of_D)*factorial(No_of_K)
                                          *factorial(No_of_R))

In [None]:
# the required probability is
req_prob = (no_words_with_2B_2R/total_words)
print("The probability that the 2 B's and the 2 R's each come together is", req_prob)

The probability that the 2 B's and the 2 R's each come together is 0.016666666666666666


#### 3. A kitchen set contains 10 knives, 3 of which are defective. Two knives are drawn at random without replacement. What is the probability that neither of the two knives is defective?

In [None]:
# define a function to calculate combinations
def combination(n, r):
    result = factorial(n) / (factorial(r) * factorial(n-r))
    return result

# here 3 knives out of 10 are defective and 7 are not defective
# probability of selecting two non defective knives
probability = (combination(3, 0) * combination(7, 2)) / combination(10, 2)

print("The probability that none of the two knives is defective is", probability)

The probability that none of the two knives is defective is 0.4666666666666667
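The combination formula above corresponds to drawing the two knives without replacement. For comparison, if the first knife were put back before the second draw, the two draws would be independent and the probability would be (7/10)². The sketch below computes both exactly; it uses `math.comb` (Python 3.8+) in place of the hand-written `combination` function.

```python
from fractions import Fraction
from math import comb

good, defective = 7, 3
total = good + defective

# without replacement: choose 2 of the 7 good knives out of all C(10, 2) pairs
p_without = Fraction(comb(good, 2) * comb(defective, 0), comb(total, 2))

# with replacement: two independent draws, each good with probability 7/10
p_with = Fraction(good, total) ** 2

print('Without replacement:', p_without, '=', float(p_without))
print('With replacement:   ', p_with, '=', float(p_with))
```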


#### 4. A new vaccine is to be tested on patients. There are 5 diabetic patients (all with the same type of diabetes), 9 patients with a similar heart condition, and 11 patients with the same liver condition. One patient is chosen at random. What is the probability that the patient is not diabetic?

In [None]:
# total number of patients: 5 diabetic + 9 heart + 11 liver
no_patients = 5 + 9 + 11

# probability of selecting a diabetic patient
prob_diabetic = 5 / no_patients

# we want to calculate the probability that the selected patient is not diabetic
req_prob = 1 - prob_diabetic

print('The probability that the selected patient is not diabetic is', req_prob)

The probability that the selected patient is not diabetic is 0.8


### Odds

Probability can also be expressed in terms of `odds`. The odds in favor of an event are the ratio of the number of outcomes favorable to the event to the number of outcomes not favorable to it. If the odds in favor of event A are a:b, then $P(A) = \frac{a}{a+b}$.
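A small sketch of the conversion in both directions: odds a:b give P(A) = a/(a+b), and conversely the odds in favor equal P(A)/(1 - P(A)). The helper names are hypothetical, and `fractions.Fraction` is used so the result is exact (for instance 11/25 instead of a float such as 0.43999999999999995).

```python
from fractions import Fraction

def odds_to_probability(a, b):
    """Probability of an event whose odds in favor are a:b."""
    return Fraction(a, a + b)

def probability_to_odds(p):
    """Odds in favor of an event with probability p (0 < p < 1), as P(A) / (1 - P(A))."""
    p = Fraction(p)
    return p / (1 - p)

p = odds_to_probability(14, 11)
print('P(A) =', p)
print("P(A') =", 1 - p)
print('odds in favor =', probability_to_odds(p))
```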

### Example:

#### 1. The odds that a New Yorker picked at random will be either overweight or obese are 14:11. What is the probability that the person is fit (is not overweight or obese)?

In [None]:
# here odds are 14:11
a = 14
b = 11

# required probability is that the person is fit
# let, A: The person is either overweight or obese
# to find: P(A') = 1 - P(A)
req_prob = 1 - (a/(a+b))

print('The probability that the person is fit is', req_prob)

The probability that the person is fit is 0.43999999999999995


<a id="cond"></a>
## 3.1 Conditional Probability

Consider two events X and Y. The conditional probability of `X given Y` is the probability that event X occurs given that event Y has already occurred. It is denoted by `P(X|Y)` and, provided P(Y) > 0, defined as:

<p style='text-indent:25em'> <strong> $ P(X|Y) = \frac{P(X \cap Y)}{P(Y)} $</strong> </p>

Where,<br>
P(X $\cap$ Y): the probability of the intersection of events X and Y<br>
P(Y): Probability of an event Y

If X and Y are `mutually exclusive` events, then P(X|Y) = 0, since P(X $\cap$ Y) = 0.

If X and Y are `independent` events, then P(X|Y) = P(X), since P(X $\cap$ Y) = P(X) $\cdot$ P(Y).
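A minimal sketch of the definition with Python sets, using a single fair die as an assumed example with equally likely outcomes. `prob` and `cond_prob` are hypothetical helpers. The event "even" is independent of "1 or 2" and mutually exclusive with "1 or 3", which illustrates both special cases.

```python
from fractions import Fraction

def prob(event, sample_space):
    """Classical probability, assuming equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

def cond_prob(X, Y, sample_space):
    """P(X | Y) = P(X ∩ Y) / P(Y); requires P(Y) > 0."""
    return prob(X & Y, sample_space) / prob(Y, sample_space)

die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
low = {1, 2}        # independent of 'even'
odd_low = {1, 3}    # mutually exclusive with 'even'

print('P(even | low) =', cond_prob(even, low, die), '| P(even) =', prob(even, die))
print('P(even | odd_low) =', cond_prob(even, odd_low, die))
```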

### Example:

#### 1. A random experiment results in an integer outcome from 21 to 30. Consider two events X and Y.
        X: Occurrence of an even number
        Y: Occurrence of a number divisible by 4
        
#### Calculate the probability that an even number will occur given that the number is divisible by 4.

In [None]:
# given sample space
samp_space = {21, 22, 23, 24, 25, 26, 27, 28, 29, 30}

# event X: Occurrence of an even number
X = {22, 24, 26, 28, 30}

# event Y: Occurrence of a number divisible by 4
Y = {24, 28}

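# to find: P(X|Y)

# P(X ∩ Y): the intersection X & Y = {24, 28} contains 2 of the 10 outcomes
Prob_X_inter_Y = len(X & Y) / len(samp_space)

# P(Y): 2 of the 10 outcomes are divisible by 4
Prob_Y = len(Y) / len(samp_space)

req_prob = Prob_X_inter_Y / Prob_Y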

print('The probability that an even number will occur given that the number is divisible by 4 is', req_prob)

The probability that an even number will occur given that the number is divisible by 4 is 1.0


Since Y $\subset$ X, P(X $\cap$ Y) = P(Y), which implies P(X|Y) = 1.

#### 2. A pair of fair dice is rolled. If the product of the numbers that appear is 6, find the probability that the second die shows an even number.

In [None]:
# total number of elements in sample space
samp_space = 36

# consider an event A: Getting the product of numbers as 6
# A = {(1,6), (2,3), (3,2), (6,1)}
# number of elements for event A
num_A = 4

# consider an event B: Occurrence of an even number on a second die
# B = {(1,2), (1,4), (1,6), (2,2), (2,4), (2,6), (3,2), (3,4), (3,6),
#      (4,2), (4,4), (4,6), (5,2), (5,4), (5,6), (6,2), (6,4), (6,6)}
# number of elements for event B
num_B = 18

# to find: P(B|A)
# B ∩ A = {(1,6), (3,2)}
# number of elements in B ∩ A
num_B_inter_A = 2

# calculate required probabilities
prob_B_inter_A = num_B_inter_A / samp_space
prob_A = num_A / samp_space

# calculate conditional probability
req_prob = prob_B_inter_A / prob_A

print('The probability that the second die shows an even number given the product of numbers is 6:', req_prob)

The probability that the second die shows an even number given the product of numbers is 6: 0.5
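The events above were listed by hand. As a check, the sketch below enumerates all 36 equally likely outcomes with `itertools.product`. Because P(B ∩ A) / P(A) = n(B ∩ A) / n(A) (the 36 cancels), the conditional probability is simply a count within A.

```python
from fractions import Fraction
from itertools import product

# all 36 equally likely outcomes (first die, second die)
outcomes = list(product(range(1, 7), repeat=2))

# event A: the product of the numbers is 6
A = [o for o in outcomes if o[0] * o[1] == 6]

# event B ∩ A: the product is 6 and the second die is even
B_and_A = [o for o in A if o[1] % 2 == 0]

p_B_given_A = Fraction(len(B_and_A), len(A))

print('A =', A)
print('B ∩ A =', B_and_A)
print('P(B | A) =', p_B_given_A)
```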


<a id="bayes"></a>
### 3.1.1 Bayes' Theorem

Bayes' theorem follows from the definition of conditional probability. It is used to update the probability of an event using information about another event that has already occurred. It is also known as `Bayes' Law` or `Bayes' Rule`. For two events X and Y (with P(X) > 0), it is given as:

<p style='text-indent:25em'><strong>$P(Y|X) =\frac{P(Y).P(X|Y)}{P(X)}$</strong></p>

Where,<br>
P(Y|X): Probability that event Y will occur given that event X has already occurred<br>
P(X|Y): Probability that event X will occur given that event Y has already occurred<br>
P(X), P(Y): Probability of event X and Y respectively
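The formula follows in one step from the definition of conditional probability, since P(X ∩ Y) can be written in two ways. The denominator P(X) is often computed with the law of total probability, as in the explosion example. Here $Y_1, \dots, Y_k$ denote mutually exclusive and exhaustive events (for instance, the possible causes of an explosion); this notation is introduced only for this derivation.

$$
P(X \cap Y) = P(Y) \cdot P(X|Y) = P(X) \cdot P(Y|X)
\quad\Longrightarrow\quad
P(Y|X) = \frac{P(Y) \cdot P(X|Y)}{P(X)}
$$

$$
P(X) = \sum_{i=1}^{k} P(Y_i) \cdot P(X|Y_i),
\qquad
P(Y_j|X) = \frac{P(Y_j) \cdot P(X|Y_j)}{\sum_{i=1}^{k} P(Y_i) \cdot P(X|Y_i)}
$$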

### Example:
<table align="center" width=350>
    <tr>
        <td width="20%">
            <img src="matrix.png">
        </td>
    </tr>
</table>

#### 1. What is the probability that a girl is chosen given that the chosen person likes pink color?

In [2]:
# probability that the favorite color is pink given girl: P(Pink | Girl)
prob_PgG = 70/120

# probability of being a girl: P(Girl)
prob_G = 120/190

# probability that the favorite color is pink: P(Pink)
prob_P = 80/190

#  probability that a girl is chosen given that she likes pink color: P(Girl | Pink)
# using Bayes' theorem
Prob_GgP = (prob_PgG * prob_G) / prob_P

# use 'round()' to round-off the value to 2 digits
req_prob = round(Prob_GgP, 2)

print('The probability that a girl is chosen given that she likes Pink is', req_prob)

The probability that a girl is chosen given that she likes Pink is 0.88
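Since the probabilities in this example come from counts, Bayes' theorem should agree with counting directly within the group that likes pink. Assuming the counts implied by the code (190 people, 120 girls, 80 who like pink, 70 of whom are girls), the sketch below checks this exactly with `fractions.Fraction`.

```python
from fractions import Fraction

# counts implied by the probabilities used in the cell above
total, girls, pink, pink_girls = 190, 120, 80, 70

p_pink_given_girl = Fraction(pink_girls, girls)
p_girl = Fraction(girls, total)
p_pink = Fraction(pink, total)

# Bayes' theorem: P(Girl | Pink) = P(Pink | Girl) * P(Girl) / P(Pink)
p_girl_given_pink_bayes = p_pink_given_girl * p_girl / p_pink

# direct count: girls among the people who like pink
p_girl_given_pink_count = Fraction(pink_girls, pink)

print('Bayes:', p_girl_given_pink_bayes, '| direct count:', p_girl_given_pink_count,
      '=', float(p_girl_given_pink_count))
```

The exact value is 7/8 = 0.875, which the cell above rounds to 0.88.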


#### 2. In an armament production station, an explosion can occur due to a short circuit, a fault in the machinery, or the negligence of workers. From experience, the chances of these causes are 0.1, 0.3, and 0.6, respectively. The chief engineer estimates that an explosion occurs with probability:
        1. 0.3 if there is a short circuit
        2. 0.2 if there is a fault in the machinery
        3. 0.25 if the workers are negligent
#### Given that an explosion has occurred, determine its most likely cause.

In [None]:
# prior probability that the cause is a short circuit: P(SC)
prob_sc = 0.1

# prior probability that the cause is a fault in the machinery: P(FM)
prob_fm = 0.3

# prior probability that the cause is the negligence of workers: P(NW)
prob_nw = 0.6

# probability that an explosion occurs given a short circuit: P(E | SC)
prob_exp_sc = 0.3

# probability that an explosion occurs given a fault in the machinery: P(E | FM)
prob_exp_fm = 0.2

# probability that an explosion occurs given the negligence of workers: P(E | NW)
prob_exp_nw = 0.25

# probability of explosion, using the law of total probability: P(E)
prob_exp = (prob_sc*prob_exp_sc) + (prob_fm*prob_exp_fm) + (prob_nw*prob_exp_nw)

# use Bayes' theorem to calculate the probabilities of cause of explosion given there is an explosion
prob_sc_exp = (prob_exp_sc * prob_sc) / prob_exp
prob_fm_exp = (prob_exp_fm * prob_fm) / prob_exp
prob_nw_exp = (prob_exp_nw * prob_nw) / prob_exp

print('prob_sc_exp', prob_sc_exp)
print('prob_fm_exp', prob_fm_exp)
print('prob_nw_exp', prob_nw_exp)

prob_sc_exp 0.125
prob_fm_exp 0.25
prob_nw_exp 0.625


The negligence of workers is the most likely cause of an explosion in the factory.
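The cells above apply Bayes' theorem cause by cause. A minimal sketch of the same calculation as one reusable function; `posterior` is a hypothetical helper, and the causes are assumed to be mutually exclusive and exhaustive (their prior probabilities sum to 1).

```python
def posterior(priors, likelihoods):
    """Return P(cause | evidence) for each cause via Bayes' theorem.

    priors: dict mapping each cause to P(cause)
    likelihoods: dict mapping each cause to P(evidence | cause)
    """
    # law of total probability: P(evidence) = sum of P(cause) * P(evidence | cause)
    evidence = sum(priors[c] * likelihoods[c] for c in priors)
    return {c: priors[c] * likelihoods[c] / evidence for c in priors}

priors = {'short circuit': 0.1, 'machinery fault': 0.3, 'worker negligence': 0.6}
likelihoods = {'short circuit': 0.3, 'machinery fault': 0.2, 'worker negligence': 0.25}

post = posterior(priors, likelihoods)

# the most likely cause is the one with the largest posterior probability
most_likely = max(post, key=post.get)

for cause, p in post.items():
    print(f'P({cause} | explosion) = {p:.3f}')
print('Most likely cause:', most_likely)
```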