# **1. What are Scaling and Normalization?**

Scaling: Scaling transforms numerical data to fit a specific range, typically 0 to 1 or -1 to 1. It changes the range of the values but not the shape of their distribution. The purpose of scaling is to bring all variables to a comparable scale so that differences in their original measurement units do not dominate.

Normalization: Normalization, as used in this notebook, changes the shape of a variable's distribution so that it more closely resembles a normal (Gaussian) distribution. The term is also used more loosely for rescaling to a fixed range or to a standard deviation of 1 (usually called standardization), but here it refers to a distribution-changing transform such as Box-Cox. The purpose of normalization is to make skewed variables easier to compare and to suit methods that assume normally distributed data.
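
To make the scaling definition concrete, here is a minimal sketch of Min-Max scaling on a handful of values. The helper name `min_max_scale` and the sample heights are made up for illustration; they are not part of any library used in this notebook.

```python
import numpy as np

def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values to the range [new_min, new_max] (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    old_min, old_max = values.min(), values.max()
    if old_max == old_min:
        raise ValueError("cannot scale constant data: max equals min")
    # Map to [0, 1] first, then stretch and shift to the target range
    unit = (values - old_min) / (old_max - old_min)
    return unit * (new_max - new_min) + new_min

# Hypothetical sample data, measured in centimetres
heights_cm = [150.0, 160.0, 170.0, 190.0]

scaled_01 = min_max_scale(heights_cm)              # values: 0, 0.25, 0.5, 1
scaled_pm1 = min_max_scale(heights_cm, -1.0, 1.0)  # values: -1, -0.5, 0, 1

print(scaled_01)
print(scaled_pm1)
```

The relative spacing of the values is preserved in both cases; only the range changes.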



---
Let's import the necessary libraries.


In [None]:

# Data handling
import pandas as pd
import numpy as np

# Box-Cox transformation
from scipy import stats

# Min-Max scaling helper
from mlxtend.preprocessing import minmax_scaling

# Plotting
import seaborn as sns
import matplotlib.pyplot as plt

# **2. Scaling our Data**

---
Let's take a look at the following code snippet.


In [None]:
# Generate random exponential data
original_data = np.random.exponential(size=1000)

# Perform Min-Max scaling manually
scaled_data = (original_data - np.min(original_data)) / (np.max(original_data) - np.min(original_data))

# Plot the original and scaled data
fig, ax = plt.subplots(1, 2, figsize=(15, 3))
ax[0].hist(original_data, bins=30, density=True)
ax[0].set_title("Original Data")
ax[1].hist(scaled_data, bins=30, density=True)
ax[1].set_title("Scaled Data")
plt.show()

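The above code generates random exponential data using np.random.exponential and scales it to the range 0 to 1 with a manual Min-Max formula: subtract the minimum, then divide by the range (maximum minus minimum). Finally, the original and scaled data are plotted side by side as histograms using Matplotlib's hist. Note that the shape of the distribution is unchanged; only the range on the x-axis differs.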
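
For comparison, a scikit-learn version could look like the sketch below. It assumes scikit-learn is installed; the seed and the variable names `data`, `manual`, and `scaled` are chosen here for illustration. `MinMaxScaler` expects a 2D array, so the 1D data is reshaped into a single column first.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Seeded generator so the example is reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(0)
data = rng.exponential(size=1000)

# Manual Min-Max scaling, as in the cell above
manual = (data - data.min()) / (data.max() - data.min())

# MinMaxScaler works column-wise on 2D input; the default feature_range is (0, 1)
scaler = MinMaxScaler()
scaled = scaler.fit_transform(data.reshape(-1, 1)).ravel()

# Both approaches should agree up to floating-point error
print(np.allclose(manual, scaled))
```

A fitted scaler also remembers the minimum and maximum it saw, so the same transformation can later be applied to new data with `scaler.transform`.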

# **3. Normalizing our Data**

---
The following code snippet shows how to normalize our data.


In [None]:
# normalize the exponential data with boxcox
normalized_data = stats.boxcox(original_data)

# plot both together to compare
fig, ax = plt.subplots(1, 2, figsize=(15, 3))
sns.histplot(original_data, ax=ax[0], kde=True, legend=False)
ax[0].set_title("Original Data")
sns.histplot(normalized_data[0], ax=ax[1], kde=True, legend=False)
ax[1].set_title("Normalized data")
plt.show()

This code uses the stats module imported from scipy earlier to call the boxcox function. boxcox applies the Box-Cox transformation to original_data, which must be strictly positive, and returns a tuple: the first element is the transformed (normalized) data and the second is the fitted lambda parameter. That is why the plot uses normalized_data[0].

The original and normalized data are then plotted side by side using seaborn.histplot.
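
As a rough numerical check of what the plots show, the sketch below unpacks both return values of boxcox, compares skewness before and after, and reverses the transformation with scipy.special.inv_boxcox. The seed and the variable names are assumptions made for this example.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

# Seeded generator so the example is reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(0)
data = rng.exponential(size=1000)  # Box-Cox needs strictly positive values

# With no lambda given, boxcox returns the transformed data and the fitted lambda
transformed, fitted_lambda = stats.boxcox(data)

# Skewness near 0 indicates a roughly symmetric distribution
skew_before = stats.skew(data)
skew_after = stats.skew(transformed)
print("lambda:", fitted_lambda)
print("skewness before:", skew_before)
print("skewness after:", skew_after)

# The transformation is invertible when lambda is known
recovered = inv_boxcox(transformed, fitted_lambda)
print(np.allclose(recovered, data))
```

Keeping the fitted lambda is useful whenever values need to be mapped back to the original units, for example after making predictions on the transformed scale.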

# **Challenge Problem: Perform Scaling and Normalization on the Houseprice Dataset.**