## Missing Values
Missing values occur in a dataset when some information is not recorded for a variable.
There are three missing data mechanisms:

### 1. Missing Completely at Random (MCAR)
Missing completely at random (MCAR) is a type of missing data mechanism in which the probability of a value being missing is unrelated to both the observed data and the missing data. In other words, if the data is MCAR, the missing values are randomly distributed throughout the dataset, and there is no systematic reason for why they are missing.

For example, in a survey about the prevalence of a certain disease, the missing data would be MCAR if some responses were lost purely by chance (say, a random subset of questionnaires was misplaced), so that the missingness is unrelated to the participants' disease status or to any other variable measured in the survey.
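
The MCAR idea can be sketched on synthetic data, assuming NumPy and pandas are available. The column names, the 20% missing rate, and the seed are illustrative assumptions and are not part of the Titanic example used later. Because every value has the same chance of being lost, the observed disease rate should stay close to the rate in the full data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 10_000

# Hypothetical survey data: age and a disease indicator
survey = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "has_disease": rng.random(n) < 0.1,
})

# MCAR: every answer has the same 20% chance of being lost,
# independent of age and of the disease status itself
mcar_mask = rng.random(n) < 0.2
survey["has_disease_obs"] = survey["has_disease"].astype(float).mask(mcar_mask)

full_rate = survey["has_disease"].mean()
observed_rate = survey["has_disease_obs"].mean()  # NaN is skipped by default
print(f"full: {full_rate:.3f}, observed: {observed_rate:.3f}")
```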


### 2. Missing at Random (MAR)
Missing at Random (MAR) is a type of missing data mechanism in which the probability of a value being missing depends only on the observed data, but not on the missing data itself. In other words, if the data is MAR, the missing values are systematically related to the observed data, but not to the missing data.
Here are a few examples of missing at random:

Income data: Suppose you are collecting income data from a group of people, but some participants choose not to report their income. If the decision to report or not report income is related to the participant's age or gender, but not to their income level, then the data is missing at random.

Medical data: Suppose you are collecting medical data on patients, including their blood pressure, but some blood pressure readings are missing. If the patients with missing readings are more likely to be younger or to have healthier lifestyles (both of which are recorded), and among patients of the same age and lifestyle the missingness is unrelated to the actual blood pressure value, then the data is missing at random.
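
The income example can be illustrated with a small simulation, assuming NumPy and pandas. The income formula, the age cutoff of 45, and the missing rates are made-up assumptions chosen only to make the effect visible. Non-response depends on the observed age alone, so the overall observed mean is biased, while the means within each age group remain close to the truth.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 20_000

age = rng.integers(20, 70, size=n)
# Hypothetical relationship: income rises with age, plus noise
income = 20_000 + 800 * age + rng.normal(0, 5_000, size=n)

# MAR: the probability of non-response depends only on the observed age
# (older respondents skip the question more often), not on income itself
p_missing = np.where(age >= 45, 0.6, 0.1)
missing = rng.random(n) < p_missing

people = pd.DataFrame({"age": age, "income": income})
people["income_obs"] = people["income"].mask(missing)
people["older"] = people["age"] >= 45

# Overall, the observed mean is too low because older, higher earners drop out
print(people[["income", "income_obs"]].mean())

# Within each age group, observed and true means agree closely
group_means = people.groupby("older")[["income", "income_obs"]].mean()
print(group_means)
```

This is why methods that condition on the observed variables (here, age) can correct for MAR missingness, whereas simply averaging the observed values cannot.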

### 3. Missing Not at Random (MNAR)
Missing not at random (MNAR) is a type of missing data mechanism in which the probability of a value being missing depends on the missing value itself. In other words, if the data is MNAR, the missingness is not random and depends on unobserved or unmeasured factors that are associated with the missing values.

For example, suppose you are collecting data on the income and job satisfaction of employees in a company. If employees with higher incomes are more likely to refuse to report their income, then the data is missing not at random: the missingness depends on the income value itself, which is exactly what was not observed. By contrast, if non-response depended only on a recorded job satisfaction score, the data would be MAR rather than MNAR.
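
A minimal MNAR sketch on synthetic data, assuming NumPy and pandas. The log-normal income distribution, the 75th-percentile threshold, and the missing rates are illustrative assumptions. Here higher incomes are withheld more often, so the observed mean underestimates the true mean, and no observed column explains why.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 20_000

# Hypothetical right-skewed incomes
income = pd.Series(rng.lognormal(mean=10.5, sigma=0.5, size=n), name="income")

# MNAR: values above the 75th percentile are withheld far more often,
# so missingness depends on the (unobserved) value itself
threshold = income.quantile(0.75)
p_missing = np.where(income > threshold, 0.7, 0.05)
income_obs = income.mask(rng.random(n) < p_missing)

print(f"true mean:     {income.mean():,.0f}")
print(f"observed mean: {income_obs.mean():,.0f}")  # systematically too low
```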

## Examples

In [8]:
import seaborn as sns

In [9]:
# Load the Titanic dataset, which contains many missing values
df = sns.load_dataset('titanic')

In [None]:
df.head()

In [None]:
# Check for missing values: the result is True wherever a value is missing
df.isnull()

In [None]:
# Count the missing values in each column
df.isnull().sum()

In [None]:
# Number of rows (data points) and columns

df.shape


In [None]:
# Row-wise deletion: drop every row that has at least one missing value
df.dropna().shape

# This is the easiest way to handle missing values, but we lose a lot of data

In [None]:
# Column-wise deletion: drop every column that has at least one missing value
df.dropna(axis=1)
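
Deletion does not have to be all or nothing. The sketch below uses a small made-up DataFrame (the `toy` frame and its values are hypothetical, chosen to resemble a few Titanic columns) to show the `subset` and `thresh` parameters of `DataFrame.dropna`, which limit how much data is discarded.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values in two columns
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 41.0],
    "deck": [np.nan, np.nan, "C", np.nan],
    "fare": [7.25, 71.28, 8.05, 53.10],
})

rows_all = toy.dropna()                    # drop rows with any missing value
rows_age = toy.dropna(subset=["age"])      # only require 'age' to be present
cols_kept = toy.dropna(axis=1, thresh=3)   # keep columns with >= 3 non-missing values

print(rows_all.shape, rows_age.shape, list(cols_kept.columns))
```

Restricting deletion to the columns an analysis actually needs usually keeps far more rows than a blanket `dropna()`.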

## Imputing Missing Values
### 1. Mean Value Imputation

In [None]:
sns.histplot(df['age'],kde=True)

In [21]:
df['Age_mean']=df['age'].fillna(df['age'].mean())

In [None]:
df[['Age_mean','age']] 
# NaN value will be replaced by mean of the age

In [55]:
# Mean imputation works well when the data is roughly normally distributed

### 2. Median Value Imputation (when the dataset has outliers)

In [23]:
df['age_median']=df['age'].fillna(df['age'].median())
# NaN value will be replaced by median of the age

In [None]:
df[['age_median','Age_mean','age']]
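
To see why the median is preferred when outliers are present, here is a small sketch on made-up ages (the values, including the two implausible entries, are hypothetical). The mean is pulled toward the outliers, while the median stays near the bulk of the data.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with two data-entry outliers and two missing values
ages = pd.Series([22, 25, 27, 29, 31, 33, 35, np.nan, 250, 300, np.nan])

mean_fill = ages.fillna(ages.mean())      # fills NaN with a value inflated by outliers
median_fill = ages.fillna(ages.median())  # fills NaN with a typical age

print(ages.mean())    # pulled upward by the outliers
print(ages.median())  # stays near the bulk of the data
```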

### 3. Mode Imputation Technique (for categorical values)

In [None]:
df[df['embarked'].isnull()]

In [None]:
df['embarked'].unique()

In [27]:
# mode() ignores NaN by default, so no explicit filtering is needed
mode_value = df['embarked'].mode()[0]

In [28]:
df['embarked_mode']=df['embarked'].fillna(mode_value)

In [None]:
df[['embarked_mode','embarked']]

In [None]:
df['embarked_mode'].isnull().sum()

In [None]:
df['embarked'].isnull().sum()
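
One detail worth noting about the mode: `Series.mode()` returns a Series rather than a single value, because several categories can be tied for most frequent. The sketch below uses a made-up `ports` Series (hypothetical values) to show this and why indexing with `[0]` is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical port codes where 'C' and 'S' are tied for most frequent
ports = pd.Series(["S", "C", "S", np.nan, "Q", "C", np.nan])

modes = ports.mode()        # NaN is ignored; tied modes are returned in sorted order
fill_value = modes[0]       # picks the first of the tied modes
ports_filled = ports.fillna(fill_value)

print(list(modes), fill_value)
print(ports_filled.isnull().sum())
```

When there is a tie, taking `[0]` is an arbitrary choice, so it is worth checking `len(modes)` before relying on it.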