### Q) What is class imbalance in machine learning?
Class imbalance refers to a situation where the classes in a classification problem are not represented equally. One class (the majority class) has significantly more samples than the other class(es) (the minority class(es)). Class imbalance can pose challenges for machine learning algorithms, as they tend to be biased towards the majority class and have lower predictive performance for the minority class.
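
A quick way to see how imbalanced a dataset is: count the labels and compare the largest class to the smallest. The sketch below is only illustrative; the function name `class_distribution` and the "legit"/"fraud" labels are made up for the example.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset and the majority/minority ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {cls: n / total for cls, n in counts.items()}
    # Ratio of the largest class to the smallest class (1.0 means perfectly balanced).
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return shares, imbalance_ratio


# Hypothetical labels: 950 "legit" samples and 50 "fraud" samples.
labels = ["legit"] * 950 + ["fraud"] * 50
shares, ratio = class_distribution(labels)
print(shares)
print(ratio)
```

Here the majority class is 19 times larger than the minority class, which most practitioners would call a clearly imbalanced problem.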

### Q) Why is class imbalance a problem in machine learning?
Class imbalance can lead to biased models that have poor performance on the minority class. The model tends to predict the majority class most of the time, resulting in low sensitivity/recall for the minority class and a high false negative rate. Class imbalance can also cause issues with model evaluation metrics, such as accuracy, as even a simple majority class prediction can yield high accuracy due to the imbalanced class distribution.
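
The accuracy issue is easy to reproduce by hand. The sketch below assumes a made-up dataset with 990 negative and 10 positive samples and a "model" that always predicts the majority class; `accuracy` and `recall` are small helper functions written for this example, not library calls.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of actual `positive` samples that were predicted as `positive`."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical data: class 0 is the majority, class 1 the minority.
y_true = [0] * 990 + [1] * 10
y_pred = [0] * len(y_true)  # baseline that always predicts the majority class

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred, positive=1)
print(f"accuracy={acc:.2f}, minority recall={rec:.2f}")
```

The baseline reaches 99% accuracy while finding none of the minority samples, which is why recall, precision, or F1 on the minority class is usually reported alongside accuracy.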

### Q) What are undersampling and oversampling techniques?
Undersampling and oversampling are techniques used to address class imbalance in machine learning:

* Undersampling removes samples from the majority class so that its size is closer to that of the minority class.

* Oversampling adds samples to the minority class so that its size is closer to that of the majority class.
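
The two ideas can be sketched in their simplest (random) form using only the standard library. The helper names `group_by_class`, `random_undersample`, and `random_oversample` are hypothetical, and the data is a toy list of integers; in practice a library such as imbalanced-learn is normally used instead.

```python
import random
from collections import Counter, defaultdict


def group_by_class(samples, labels):
    """Map each class label to the list of samples that carry it."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return groups


def random_undersample(samples, labels, seed=0):
    """Shrink every class to the size of the smallest class (no replacement)."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = min(len(members) for members in groups.values())
    pairs = [(s, label) for label, members in groups.items()
             for s in rng.sample(members, target)]
    return [s for s, _ in pairs], [label for _, label in pairs]


def random_oversample(samples, labels, seed=0):
    """Grow every class to the size of the largest class by duplicating samples."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = max(len(members) for members in groups.values())
    pairs = []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        pairs.extend((s, label) for s in members + extra)
    return [s for s, _ in pairs], [label for _, label in pairs]


# Toy data: 90 majority samples (class 0) and 10 minority samples (class 1).
samples = list(range(100))
labels = [0] * 90 + [1] * 10

x_under, y_under = random_undersample(samples, labels)
x_over, y_over = random_oversample(samples, labels)
print(Counter(y_under))  # 10 samples per class
print(Counter(y_over))   # 90 samples per class
```

Both functions return the samples grouped by class, so the result should be shuffled before training. Resampling should also be applied to the training split only, so that the test set keeps the original class distribution.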

### Q) What is the goal of undersampling?
The goal of undersampling is to create a balanced dataset by reducing the number of samples in the majority class. By undersampling, we aim to eliminate the bias towards the majority class and give equal importance to both classes during model training. This helps the model learn the patterns and characteristics of the minority class more effectively.

### Q) What is the goal of oversampling?
The goal of oversampling is to address class imbalance by increasing the number of samples in the minority class. By replicating existing minority class instances or synthesizing new ones, oversampling gives the model more exposure to the minority class, which can improve its ability to learn the distinguishing features and make accurate predictions for that class.

### Q) What are some commonly used undersampling techniques?
Some commonly used undersampling techniques include:

* Random Undersampling: Randomly selects a subset of samples from the majority class, typically so that it matches the size of the minority class.

* Tomek Links: Finds pairs of samples from different classes that are each other's nearest neighbors (Tomek links) and removes the majority class sample from each pair, which makes the class boundary clearer.

* Cluster Centroids: Clusters the majority class samples (typically with k-means) and replaces each cluster with its centroid, so the majority class is represented by a smaller set of points.
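
To make the Tomek Links idea concrete, here is a minimal brute-force sketch using Euclidean distance. The function names and the toy 2-D points are assumptions made for illustration; the search is O(n²), so it is only suitable for small examples.

```python
import math


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def remove_majority_in_links(points, labels, majority):
    """Drop the majority class member of every Tomek link."""
    drop = {k for pair in tomek_links(points, labels)
            for k in pair if labels[k] == majority}
    kept = [(p, y) for k, (p, y) in enumerate(zip(points, labels)) if k not in drop]
    return [p for p, _ in kept], [y for _, y in kept]


# Toy data: the majority point (1.0, 1.0) sits right next to a minority point.
points = [(0.0, 0.0), (0.2, 0.1), (0.0, 0.3), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0)]
labels = ["maj", "maj", "maj", "maj", "min", "min"]

links = tomek_links(points, labels)
clean_points, clean_labels = remove_majority_in_links(points, labels, majority="maj")
print(links)
print(clean_points)
```

In this toy set, the only Tomek link is the pair formed by the majority point `(1.0, 1.0)` and the minority point `(1.1, 1.0)`, so only that majority point is removed. Note that Tomek Links cleans the boundary rather than balancing the classes, so it is often combined with another resampling method.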

### Q) What are some commonly used oversampling techniques?
Some commonly used oversampling techniques include:

* Random Oversampling: Randomly replicates samples from the minority class to increase its representation in the dataset.

* SMOTE (Synthetic Minority Over-sampling Technique): Generates synthetic samples by interpolating between a minority class sample and one of its k nearest minority class neighbors.

* ADASYN (Adaptive Synthetic Sampling): Similar to SMOTE, but generates more synthetic samples around minority class samples that are harder to learn, meaning those with more majority class samples among their nearest neighbors.
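
The interpolation step behind SMOTE can be sketched in a few lines. This is a simplified illustration, not the reference algorithm: the function name `smote_sketch` and its parameters are hypothetical, base samples are picked at random rather than evenly, and the neighbor search is brute force.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k minority samples closest to the base sample.
        neighbors = sorted((j for j in range(len(minority)) if j != i),
                           key=lambda j: math.dist(base, minority[j]))[:k]
        neighbor = minority[rng.choice(neighbors)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority class with four 2-D samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote_sketch(minority, n_new=6)
print(new_points)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region the minority class already occupies. ADASYN uses the same interpolation but decides how many points to generate per sample based on how many majority class neighbors that sample has.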

### Q) What are the advantages and disadvantages of undersampling and oversampling?

Advantages of undersampling:

* It reduces training time and memory use, since the dataset is smaller.

* Targeted methods such as Tomek Links can remove noisy or borderline majority class samples.

Disadvantages of undersampling:

* It can lead to the loss of potentially important information from the majority class.

* It leaves available data unused, which is especially costly when the dataset is small to begin with.

Advantages of oversampling:

* It increases the representation of the minority class, allowing the model to learn its patterns more effectively.

* It retains all available data, preventing information loss.

Disadvantages of oversampling:

* Random oversampling duplicates minority class samples, which increases the risk of overfitting to those samples, especially when the minority class is small.

* It increases the dataset size, and with it the training time and memory use.