# Data Cleaning

<a id="data_normalizing"></a>
## Data Normalizing

Normalization rescales the data to be analyzed into a specific range, such as [0.0, 1.0], so that features measured on different scales can be compared and combined on an equal footing.

Data normalization is a key pre-processing step that maps each feature's current range onto a new, standardized range, which can help forecasting and prediction models become more accurate. The independent variables (features) in a dataset often differ widely in units and magnitude; without normalization, features with large values can dominate features with small values in models that rely on distances or gradient-based optimization. Putting every feature on a comparable scale makes the results of different predictive models more consistent and comparable, and usually makes training more stable and reliable.
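The mapping from a feature's current range onto a new one can be sketched as a short function. This is a minimal illustration in plain Python, not a library API: the function name `rescale` and the `ages`/`incomes` values are hypothetical, chosen only to show two features with very different magnitudes landing on the same [0, 1] scale.

```python
def rescale(values, new_min=0.0, new_max=1.0):
    """Linearly map values from their current [min, max] onto [new_min, new_max].

    Hypothetical helper for illustration; not taken from any library.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no range to map; send every value to new_min.
        return [float(new_min) for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]


# Hypothetical features on very different scales.
ages = [18, 30, 45, 60]
incomes = [20_000, 48_000, 75_000, 150_000]

print([round(v, 3) for v in rescale(ages)])     # [0.0, 0.286, 0.643, 1.0]
print([round(v, 3) for v in rescale(incomes)])  # [0.0, 0.215, 0.423, 1.0]
```

Passing other bounds, e.g. `rescale(values, -1, 1)`, maps the same feature onto [-1, 1] instead.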

### **Some common approaches:**

1.	Min-Max Scaling (Rescaling)

	•	Formula:
    $X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$

	•	Maps data to [0, 1]; for a custom range [a, b], use $a + X'(b - a)$.

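	•	Sensitive to outliers: a single extreme value sets $X_{\min}$ or $X_{\max}$ and squeezes the remaining values into a narrow band.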


2.	Z-Score Normalization (Standardization)

	•	Formula:

    $X' = \frac{X - \mu}{\sigma}$

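	•	Centers the data at mean 0 with standard deviation 1, where $\mu$ and $\sigma$ are the feature's mean and standard deviation.

	•	Works best when the data are roughly normally distributed; unlike min-max scaling, the output is not bounded to a fixed range.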



3.	Robust Scaling (Median & IQR-based)

	•	Formula: 
  
    $X' = \frac{X - \text{median}(X)}{\text{IQR}(X)}$

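	•	Uses the median and the interquartile range (IQR, the 75th percentile minus the 25th percentile), which are much less affected by outliers than the minimum, maximum, mean, or standard deviation.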


4.	Log Transformation

	•	Formula:

    $X' = \log(X + c)$

	•	Compresses large values, which reduces right skew; the constant $c$ is chosen so that $X + c > 0$ (for example, $c = 1$ for non-negative counts).

5.	Max Abs Scaling

	•	Formula:

    $X' = \frac{X}{\max(|X|)}$

	•	Scales data to [-1, 1] by dividing by the largest absolute value; the data are not shifted or centered, so zeros stay zero (useful for sparse data).

6.	L2 Normalization (Unit Vector Scaling)

	•	Formula:

    $X' = \frac{X}{\lVert X \rVert_2}$

	•	Scales each data point (a row vector, not a feature column) to unit Euclidean norm, $\lVert X \rVert_2 = \sqrt{\sum_i X_i^2}$; useful for text and image data.
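The six formulas can be sketched directly in plain Python to make each definition concrete. This is a minimal, standard-library-only sketch, not any library's API: the function names are hypothetical, the first five functions treat one feature as a list of numbers while `l2_normalize` treats one sample (row), inputs are assumed non-empty and non-constant so no divisor is zero, and the quartile convention used for the IQR is an assumption (libraries interpolate quartiles in slightly different ways).

```python
import math
import statistics

# Hypothetical helper names. The first five work on one feature (a list of numbers).
# Inputs are assumed non-empty and non-constant, so no divisor below is zero.


def min_max(xs):
    """X' = (X - X_min) / (X_max - X_min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def z_score(xs):
    """X' = (X - mu) / sigma, using the population standard deviation."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def robust(xs):
    """X' = (X - median(X)) / IQR(X)."""
    med = statistics.median(xs)
    # Quartile convention is an assumption; libraries interpolate quartiles differently.
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - med) / (q3 - q1) for x in xs]


def log_transform(xs, c=1.0):
    """X' = log(X + c); c must make every X + c positive."""
    return [math.log(x + c) for x in xs]


def max_abs(xs):
    """X' = X / max(|X|): divide by the largest absolute value, not by |X_max|."""
    m = max(abs(x) for x in xs)
    return [x / m for x in xs]


def l2_normalize(row):
    """x' = x / ||x||_2, applied to one sample (row), not to a feature column."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row]


print(min_max([2, 4, 6]))        # [0.0, 0.5, 1.0]
print(robust([1, 2, 3, 4, 5]))   # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(max_abs([-8, 2, 4]))       # [-1.0, 0.25, 0.5]
print(l2_normalize([3, 4]))      # [0.6, 0.8]
```

The `max_abs` call shows why the divisor is the largest absolute value: with `[-8, 2, 4]`, dividing by $|X_{\max}| = 4$ would map -8 to -2, outside [-1, 1]. In practice, scikit-learn's `MinMaxScaler`, `StandardScaler`, `RobustScaler`, `MaxAbsScaler`, and `Normalizer` cover these cases, and the statistics (min, max, mean, quartiles) should be computed on the training data and reused on new data.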

The right choice depends on the dataset (its distribution, skew, outliers, and sparsity) and on the model type.
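How the data shape the choice can be seen by applying two of the scalers to a feature with a single outlier. This is a minimal sketch with hypothetical function names and a made-up feature; the quartile convention for the IQR is an assumption. Min-max scaling squeezes the five typical values into roughly [0, 0.004], while robust scaling keeps them spread between -1 and 0.6 and leaves only the outlier far from zero.

```python
import statistics


def min_max(xs):
    # X' = (X - X_min) / (X_max - X_min); assumes a non-constant feature.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def robust(xs):
    # X' = (X - median) / IQR; the "inclusive" quartile method is an assumed convention.
    med = statistics.median(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - med) / (q3 - q1) for x in xs]


data = [10, 11, 12, 13, 14, 1000]  # hypothetical feature with one extreme outlier

print([round(v, 3) for v in min_max(data)])  # [0.0, 0.001, 0.002, 0.003, 0.004, 1.0]
print([round(v, 3) for v in robust(data)])   # [-1.0, -0.6, -0.2, 0.2, 0.6, 395.0]
```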