# Inferential Statistics

## Key Terms:
- A **random experiment** is an action that leads to one of several possible outcomes, for example tossing a coin or rolling a die. Each time the experiment is performed is a **trial**, and the **sample space** is the set of all possible outcomes of the experiment. An **event** is a subset of the sample space, that is, a collection of one or more outcomes.
<br></br>
- A **random variable** is a variable that assigns a numerical value to each outcome of a random experiment. The **probability** of an event measures how likely the event is to occur.

Let A be an event in an experiment whose outcomes are all equally likely. Then,
- $P(A) = \frac{\text{Number of favourable outcomes}}{\text{Total number of outcomes}}$
- $0 \le P(A) \le 1$

### A Few Rules of Probability

- The probability of complementary event A' of A is given by $P(A')=1-P(A)$
- If A and B are any two events, then  $P(A \cup B) = P(A)+P(B)-P(A \cap B)$
- When two events are mutually exclusive (disjoint), $P(A \cap B)=0$
- **Multiplication theorem:** If A and B are independent events then $P(A \cap B)=P(A)P(B)$
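
As a quick sanity check, the complement rule, the addition rule and the mutually exclusive case can be verified by counting outcomes of a single fair die roll. This is a minimal sketch: the helper `prob` and the event names are illustrative and assume all six outcomes are equally likely.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (all outcomes equally likely).
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
odd = {1, 3, 5}
div_by_3 = {3, 6}

# Complement rule: P(A') = 1 - P(A)
complement_ok = prob(sample_space - even) == 1 - prob(even)

# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
union_lhs = prob(even | div_by_3)
union_rhs = prob(even) + prob(div_by_3) - prob(even & div_by_3)

# Mutually exclusive events share no outcomes, so P(A ∩ B) = 0
disjoint_prob = prob(even & odd)

print(complement_ok, union_lhs, union_rhs, disjoint_prob)
```

Using `Fraction` keeps the arithmetic exact, so both sides of each rule can be compared with `==` without worrying about floating-point rounding.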

### A Few Important Terms

- **Joint Probability:** It is the probability that two events occur together, written $P(A \cap B)$. It is only meaningful when both events can be observed in the same trial.

    - For example, from a deck of 52 cards, the joint probability of picking a card that is both red and a 6 is $P(6 \cap red) = \frac{2}{52} = \frac{1}{26}$, since a deck has two red sixes: the six of hearts and the six of diamonds. Because the events "6" and "red" are independent in this example, the joint probability can also be computed as $P(6 \cap red) = P(6) \times P(red) = \frac{4}{52} \times \frac{26}{52} = \frac{1}{26}$
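
The card example can be reproduced by enumerating a standard 52-card deck and counting. The sketch below is illustrative only: the rank and suit labels are assumptions chosen for readability.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
# Map each suit to its colour.
suits = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}

# Every (rank, suit) pair is one equally likely card.
deck = list(product(ranks, suits))
n_cards = len(deck)

p_six = Fraction(sum(rank == "6" for rank, suit in deck), n_cards)
p_red = Fraction(sum(suits[suit] == "red" for rank, suit in deck), n_cards)
p_six_and_red = Fraction(
    sum(rank == "6" and suits[suit] == "red" for rank, suit in deck), n_cards
)

# Direct count and the product P(6) * P(red) agree because the events are independent.
print(p_six_and_red, p_six * p_red)
```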

<hr>

- **Marginal Probability:** It is the probability of an event for one random variable, irrespective of the outcome of another random variable.
    - For example, the probability of X=A over all outcomes of Y: $P(X=A) = \sum_{i} P(X=A, Y=y_i)$
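
To make the summation concrete, here is a small sketch that builds a joint distribution for one fair die roll and sums out one variable. The variables are hypothetical choices for illustration: X is the parity of the outcome and Y records whether the outcome is at most 3.

```python
from collections import defaultdict
from fractions import Fraction

# Joint distribution P(X = x, Y = y) for one roll of a fair die.
joint = defaultdict(Fraction)
for outcome in range(1, 7):
    x = "odd" if outcome % 2 else "even"
    y = "<=3" if outcome <= 3 else ">3"
    joint[(x, y)] += Fraction(1, 6)

# Marginal of X: sum the joint probabilities over every value of Y.
marginal_x = defaultdict(Fraction)
for (x, y), p in joint.items():
    marginal_x[x] += p

print(dict(joint))
print(dict(marginal_x))
```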

<hr>

- **Conditional Probability:** The probability of event A given that event B has already occurred is $P(A | B) = \frac{P(A \cap B)}{P(B)}$, provided that $P(B) > 0$

#### Examples:

1. Consider a random experiment of rolling a die. What is the probability of getting an even number or a number divisible by 3?

Solution: 
- When a die is rolled, the sample space is {1, 2, 3, 4, 5, 6}
- Let A be an event of getting an even number. Then favourable outcomes are {2, 4, 6}.
- Let B be an event of getting a number divisible by 3. Then favourable outcomes are {3, 6}
- Events A and B are not mutually exclusive, because $A \cap B = \{6\}$.
- $P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{3}{6} + \frac{2}{6} - \frac{1}{6} = \frac{4}{6} = \frac{2}{3}$
               

2. Consider a random experiment of rolling a die. What is the probability of getting a number that is both even and divisible by 3?

Solution: 
- When a die is rolled, the sample space is {1, 2, 3, 4, 5, 6}
- Let A be an event of getting an even number. Then favourable outcomes are {2, 4, 6}.
- Let B be an event of getting a number divisible by 3. Then favourable outcomes are {3, 6}
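- Only the outcome 6 is both even and divisible by 3, so $A \cap B = \{6\}$ and $P(A \cap B) = \frac{1}{6}$
- Here A and B happen to be independent, so the multiplication theorem gives the same result: $P(A)P(B) = \frac{3}{6} \cdot \frac{2}{6} = \frac{1}{6}$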
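
The multiplication rule is valid only for independent events, so it is worth checking independence by counting before relying on it. In the sketch below, `independent` is an illustrative helper and the event `at_most_3` is an assumed counter-example that is not part of the exercise.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event on a fair die, by counting outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

def independent(a, b):
    """Two events are independent exactly when P(A ∩ B) = P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
div_by_3 = {3, 6}
at_most_3 = {1, 2, 3}

# Even and divisible-by-3: P(A ∩ B) = 1/6 and P(A)P(B) = 1/2 * 1/3 = 1/6.
print(independent(even, div_by_3))

# Even and at-most-3: P(A ∩ B) = 1/6 but P(A)P(B) = 1/2 * 1/2 = 1/4.
print(independent(even, at_most_3))
```

For the second pair the product formula would give the wrong answer, so the intersection has to be counted directly.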
               

3. Consider a random experiment of rolling a die. Let A be the event that the outcome is an odd number and B the event that the outcome is less than or equal to 3. What is the probability of A given B, P(A|B)?

Solution: 
- When a die is rolled, the sample space is {1, 2, 3, 4, 5, 6}
- Let A be an event of getting an odd number. Then favourable outcomes are {1, 3, 5}.
- Let B be an event of getting a number $\le 3$. Then favourable outcomes are {1,2,3}
- Favourable outcomes for both A and B are {1, 3}
- $P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{\frac{2}{6}}{\frac{3}{6}} = \frac{2}{3}$
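
The same conditional probability can be computed directly from the definition. This is a minimal sketch assuming a fair die; `cond_prob` is an illustrative helper name.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event on a fair die, by counting outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return prob(a & b) / p_b

odd = {1, 3, 5}
at_most_3 = {1, 2, 3}

p_a_given_b = cond_prob(odd, at_most_3)
print(p_a_given_b)
```

Conditioning on B effectively shrinks the sample space to {1, 2, 3}, of which two outcomes are odd.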
               

4. Consider the table given below, which shows the probabilities of people owning pets. What is the probability that a randomly selected person is male, given that they own a pet?
![image.png](attachment:image.png)

Solution:
- Let M stand for male and PO for pet owner, so the formula becomes: $P(M|PO) = \frac{P(M \cap PO)}{P(PO)}$
- From the table, $P(M \cap PO) = 0.41$ and $P(PO) = 0.86$
- So, $P(M|PO) = \frac{0.41}{0.86} \approx 0.4767$

### Loading data and importing libraries

In [2]:
# Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns

In [3]:
# Loading data
cars_data = pd.read_csv('datasets/UsedCarsPrice.csv', index_col = 0, na_values=["??", "????"]) 
cars_data.head(10)

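|   | Price | Age | KM | FuelType | HP | MetColor | Automatic | CC | Doors | Weight |
|---|-------|-----|----|----------|----|----------|-----------|----|-------|--------|
| 0 | 13500 | 23.0 | 46986.0 | Diesel | 90.0 | 1.0 | 0 | 2000 | three | 1165 |
| 1 | 13750 | 23.0 | 72937.0 | Diesel | 90.0 | 1.0 | 0 | 2000 | 3 | 1165 |
| 2 | 13950 | 24.0 | 41711.0 | Diesel | 90.0 | NaN | 0 | 2000 | 3 | 1165 |
| 3 | 14950 | 26.0 | 48000.0 | Diesel | 90.0 | 0.0 | 0 | 2000 | 3 | 1165 |
| 4 | 13750 | 30.0 | 38500.0 | Diesel | 90.0 | 0.0 | 0 | 2000 | 3 | 1170 |
| 5 | 12950 | 32.0 | 61000.0 | Diesel | 90.0 | 0.0 | 0 | 2000 | 3 | 1170 |
| 6 | 16900 | 27.0 | NaN | Diesel | NaN | NaN | 0 | 2000 | 3 | 1245 |
| 7 | 18600 | 30.0 | 75889.0 | NaN | 90.0 | 1.0 | 0 | 2000 | 3 | 1245 |
| 8 | 21500 | 27.0 | 19700.0 | Petrol | 192.0 | 0.0 | 0 | 1800 | 3 | 1185 |
| 9 | 12950 | 23.0 | 71138.0 | Diesel | NaN | NaN | 0 | 1900 | 3 | 1105 |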


In [4]:
# Size of the data
print('Shape:', cars_data.shape)

Shape: (1436, 10)


In [5]:
# Checking for NULL values
cars_data.isna().sum()

Price          0
Age          100
KM            15
FuelType     100
HP             6
MetColor     150
Automatic      0
CC             0
Doors          0
Weight         0
dtype: int64

### Frequency Table
- `pd.crosstab()` computes a simple cross tabulation of one or more factors
- By default, it returns a frequency table (counts) of the factors

In [6]:
# Number of cars having various fuel types using crosstab()
pd.crosstab(index = cars_data['FuelType'], columns = 'count', dropna = True)

| FuelType | count |
|----------|-------|
| CNG | 15 |
| Diesel | 144 |
| Petrol | 1177 |


### Two way tables

In [7]:
# Two-way table of fuel type vs gear type (automatic or manual)
pd.crosstab(index=cars_data['Automatic'], columns = cars_data['FuelType'], dropna=True)

| Automatic ↓ / FuelType → | CNG | Diesel | Petrol |
|--------------------------|-----|--------|--------|
| 0 | 15 | 144 | 1104 |
| 1 | 0 | 0 | 73 |


#### Two Way table with Joint Probability
- Joint probability is the likelihood of two events happening at the same time. The events do not need to be independent.
- Convert the table values from numbers to proportions to get joint probability table

In [8]:
# Joint Probability
pd.crosstab(index = cars_data['Automatic'], columns = cars_data['FuelType'], normalize = True, dropna = True)

| Automatic ↓ / FuelType → | CNG | Diesel | Petrol |
|--------------------------|-----|--------|--------|
| 0 | 0.011228 | 0.107784 | 0.826347 |
| 1 | 0.0 | 0.0 | 0.054641 |
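
What `normalize = True` does can be sketched by hand: every cell count is divided by the grand total. The block below rebuilds the counts from the two-way table shown earlier instead of reading the CSV, so that it runs on its own; the variable names are illustrative.

```python
import pandas as pd

# Counts copied from the two-way table of Automatic vs FuelType above.
counts = pd.DataFrame(
    [[15, 144, 1104], [0, 0, 73]],
    index=pd.Index([0, 1], name="Automatic"),
    columns=pd.Index(["CNG", "Diesel", "Petrol"], name="FuelType"),
)

# Grand total: number of cars with a known fuel type.
total = counts.to_numpy().sum()

# Joint probability of each (Automatic, FuelType) combination.
joint = counts / total
print(joint.round(6))
```

The total is smaller than the number of rows in the data set because rows with a missing `FuelType` do not appear in the cross tabulation.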


#### Two Way table with Marginal probability
- Marginal probability is the probability of a single event, irrespective of the outcome of the other variable.
- Setting `margins=True` adds the row sums and column sums of the joint probability table (the `All` row and column).

In [9]:
# Joint probabilities with row sums and column sums (marginal probabilities)
pd.crosstab(index=cars_data['Automatic'], columns = cars_data['FuelType'], normalize = True, margins= True, dropna=True)

| Automatic ↓ / FuelType → | CNG | Diesel | Petrol | All |
|--------------------------|-----|--------|--------|-----|
| 0 | 0.011228 | 0.107784 | 0.826347 | 0.945359 |
| 1 | 0.0 | 0.0 | 0.054641 | 0.054641 |
| All | 0.011228 | 0.107784 | 0.880988 | 1.0 |


***Interpretation:***
- The marginal probability that a car has a manual gearbox, regardless of fuel type (CNG, Diesel or Petrol), is 0.945
- The marginal probability that a car runs on petrol, regardless of gearbox type, is 0.881 (that is, about 88% of the cars are petrol cars)
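
The `All` row and column are simply sums over the joint probability table. A minimal sketch, again rebuilding the counts from the two-way table above so that it runs without the CSV (variable names are illustrative):

```python
import pandas as pd

# Counts copied from the two-way table of Automatic vs FuelType above.
counts = pd.DataFrame(
    [[15, 144, 1104], [0, 0, 73]],
    index=pd.Index([0, 1], name="Automatic"),
    columns=pd.Index(["CNG", "Diesel", "Petrol"], name="FuelType"),
)
joint = counts / counts.to_numpy().sum()

# Marginal of gearbox type: sum each row across all fuel types.
p_gearbox = joint.sum(axis=1)

# Marginal of fuel type: sum each column across both gearbox types.
p_fuel = joint.sum(axis=0)

print(p_gearbox.round(6))
print(p_fuel.round(6))
```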

#### Two Way table with Conditional probability
- Conditional probability is the probability of an event (A), given that another event (B) has already occurred.

In [10]:
# Conditional probabilities P(FuelType | Automatic): each row is normalized to sum to 1
pd.crosstab(index=cars_data['Automatic'], columns = cars_data['FuelType'], normalize = 'index', margins= True, dropna=True)

| Automatic ↓ / FuelType → | CNG | Diesel | Petrol |
|--------------------------|-----|--------|--------|
| 0 | 0.011876 | 0.114014 | 0.874109 |
| 1 | 0.0 | 0.0 | 1.0 |
| All | 0.011228 | 0.107784 | 0.880988 |


***Interpretation:***
- Observe that each row sums to 1. The `All` row is the marginal distribution of fuel type.
- Given that the gearbox type is manual, 
    - the probability of the car being CNG Fuel Type is 0.011876
    - the probability of the car being Petrol Fuel Type is 0.874109
- Given that the gearbox type is automatic, 
    - the probability of the car being Petrol Fuel Type is 1

In [11]:
# Conditional probabilities P(Automatic | FuelType): each column is normalized to sum to 1
pd.crosstab(index=cars_data['Automatic'], columns = cars_data['FuelType'], normalize = 'columns', margins= True, dropna=True)

| Automatic ↓ / FuelType → | CNG | Diesel | Petrol | All |
|--------------------------|-----|--------|--------|-----|
| 0 | 1.0 | 1.0 | 0.937978 | 0.945359 |
| 1 | 0.0 | 0.0 | 0.062022 | 0.054641 |


***Interpretation:***
- Observe that each column sums to 1. The `All` column is the marginal distribution of gearbox type.
- Given that the Fuel type is CNG, 
    - the probability of the car being automatic is 0
    - the probability of the car having a manual gearbox is 1
- Given that the Fuel type is Petrol, 
    - the probability of the car being automatic is 0.062022
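
Both conditional tables can be reproduced by dividing the counts by the appropriate row or column totals, which also ties them back to the definition $P(A|B) = \frac{P(A \cap B)}{P(B)}$. This is a sketch that rebuilds the counts from the two-way table above so that it runs without the CSV; the variable names are illustrative.

```python
import pandas as pd

# Counts copied from the two-way table of Automatic vs FuelType above.
counts = pd.DataFrame(
    [[15, 144, 1104], [0, 0, 73]],
    index=pd.Index([0, 1], name="Automatic"),
    columns=pd.Index(["CNG", "Diesel", "Petrol"], name="FuelType"),
)

# P(FuelType | Automatic): divide each row by its row total (rows sum to 1).
given_gearbox = counts.div(counts.sum(axis=1), axis=0)

# P(Automatic | FuelType): divide each column by its column total (columns sum to 1).
given_fuel = counts.div(counts.sum(axis=0), axis=1)

# Cross-check against the definition: joint probability / marginal probability.
joint = counts / counts.to_numpy().sum()
p_petrol = joint["Petrol"].sum()
p_auto_given_petrol = joint.loc[1, "Petrol"] / p_petrol

print(given_gearbox.round(6))
print(given_fuel.round(6))
print(round(p_auto_given_petrol, 6))
```

The conditional probability of an automatic gearbox given petrol differs from the joint probability of "automatic and petrol" because the denominator is the number of petrol cars, not the number of all cars.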