## __Imbalanced Dataset Handling__

#### An imbalanced dataset tends to produce a biased model. For example, if we train a binary classifier on a dataset with 1000 datapoints, 900 YES and 100 NO, the model will be biased towards YES.
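To make the bias concrete, here is a minimal sketch using the 900/100 split from the example above. The "model" is a hypothetical stand-in that always answers YES; it is not a trained classifier, only an illustration of why plain accuracy can hide the problem.

```python
import numpy as np

# 900 YES (encoded as 1) and 100 NO (encoded as 0), as in the example above
y_true = np.array([1] * 900 + [0] * 100)

# Hypothetical "model" that ignores its input and always predicts YES
y_pred = np.ones_like(y_true)

# Overall accuracy: fraction of predictions that match the true label
accuracy = (y_pred == y_true).mean()

# Recall on the NO class: fraction of actual NO points predicted as NO
no_recall = ((y_pred == 0) & (y_true == 0)).sum() / (y_true == 0).sum()

print(f"accuracy     = {accuracy:.2f}")
print(f"recall on NO = {no_recall:.2f}")
```

The accuracy looks high even though not a single NO datapoint is identified, which is the behaviour a model trained on imbalanced data can drift towards.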
#### There are two methods to handle this:

### 1- **Up Sampling**
#### We increase the number of datapoints in the minority class by sampling them with replacement until it matches the size of the majority class.
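As a rough illustration of what "sampling with replacement" means, the sketch below draws from a toy minority class of three points until it reaches a hypothetical target size of nine. The array values and `target_size` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

minority = np.array([10, 20, 30])   # toy minority class with 3 datapoints
target_size = 9                     # hypothetical size of the majority class

# Draw random positions with replacement, so the same point can be picked repeatedly
idx = rng.integers(0, len(minority), size=target_size)
upsampled = minority[idx]

print(upsampled)
```

No new information is created: every value in `upsampled` is a copy of an existing minority datapoint.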

In [5]:
import numpy as np
import pandas as pd

## Set the random seed so the generated random numbers are the same on every run
np.random.seed(123)

## Compute the size of each class (90% class 0, 10% class 1)
n_samples = 1000
class_0_ratio = 0.9
n_class_0 = int(n_samples * class_0_ratio)
n_class_1 = n_samples - n_class_0

In [6]:
n_class_0,n_class_1

(900, 100)

In [7]:
## Create one dataframe per class; class 1 is shifted (mean 2) so the classes are separable

class_0 = pd.DataFrame({
    'feature_1': np.random.normal(loc=0, scale=1, size=n_class_0),
    'feature_2': np.random.normal(loc=0, scale=1, size=n_class_0),
    'target': [0] * n_class_0
})

class_1 = pd.DataFrame({
    'feature_1': np.random.normal(loc=2, scale=1, size=n_class_1),
    'feature_2': np.random.normal(loc=2, scale=1, size=n_class_1),
    'target': [1] * n_class_1
})

In [9]:
df = pd.concat([class_0, class_1]).reset_index(drop=True)

In [10]:
df['target'].value_counts()

target
0    900
1    100
Name: count, dtype: int64

In [12]:
df_minority = df[df['target']==1]
df_majority = df[df['target']==0]

In [None]:
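from sklearn.utils import resample

## Upsample the minority class to the size of the majority class
df_minority_upsampled = resample(
    df_minority,
    replace=True,                # sample with replacement
    n_samples=len(df_majority),  # match the majority class size
    random_state=42,             # make the draw reproducible
)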
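After resampling, a common next step is to combine the upsampled minority rows with the majority rows and check that the classes are balanced. The sketch below is self-contained and rebuilds a small stand-in dataframe; the names `minority_upsampled` and `df_upsampled` are illustrative and not taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample

# Stand-in for the notebook's dataframe: 900 rows of class 0, 100 rows of class 1
rng = np.random.default_rng(123)
df = pd.DataFrame({
    "feature_1": rng.normal(size=1000),
    "target": [0] * 900 + [1] * 100,
})

df_majority = df[df["target"] == 0]
df_minority = df[df["target"] == 1]

# Sample the minority class with replacement up to the majority class size
minority_upsampled = resample(
    df_minority,
    replace=True,
    n_samples=len(df_majority),
    random_state=42,
)

# Combine both classes into one balanced dataframe
df_upsampled = pd.concat([df_majority, minority_upsampled]).reset_index(drop=True)

counts = df_upsampled["target"].value_counts()
print(counts)
```

Because the extra minority rows are duplicates, it is generally safer to upsample only the training split, so that copies of the same datapoint do not end up in both training and test data.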