# Preprocessing Methods

## StandardScaler
`StandardScaler` transforms each numerical feature so that it has a mean of zero and a standard deviation of one. This is useful for machine learning algorithms that perform better when the input variables are on a comparable scale.

The formula for standard scaling is:

$$z = \frac{x - \mu}{\sigma}$$

Where:

- $z$ is the scaled data.
- $x$ is the original data.
- $\mu$ is the mean of the data.
- $\sigma$ is the standard deviation of the data (`StandardScaler` uses the population standard deviation, i.e. `ddof=0`).

<font color='Blue'><b>Example:</b></font> The following example demonstrates how to use the `StandardScaler` from `sklearn.preprocessing` to standardize the features of a dataset by removing the mean and scaling to unit variance. The original and scaled data are displayed for comparison.

In [None]:
# Import the StandardScaler class
from sklearn.preprocessing import StandardScaler

# Import pandas
import pandas as pd

# Create some sample data
data = {
    "x1": [1, 2, 3, 4],
    "x2": [5, 6, 7, 8]
}

# Create a DataFrame from the data
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original DataFrame:")
display(df)

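# StandardScaler divides by the population standard deviation (ddof=0),
# whereas pandas' .std() defaults to the sample standard deviation (ddof=1)
print('\nMean and Standard Deviation:')
df_temp = pd.concat([df.mean(axis = 0).to_frame('Mean'), df.std(axis = 0, ddof = 0).to_frame('STD')], axis = 1)
display(df_temp)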

# Initialize the scaler object
scaler = StandardScaler()

# Fit and transform the DataFrame
df_scaled = scaler.fit_transform(df)

# Convert the scaled array to a DataFrame
df_scaled = pd.DataFrame(df_scaled, columns=["x1", "x2"])

# Print the scaled DataFrame
print("Scaled DataFrame:")
display(df_scaled)

You can see that the scaled data has a mean of zero and a standard deviation of one for each feature. This makes the data more suitable for machine learning algorithms that are sensitive to feature scale, such as regularized linear regression, k-nearest neighbors, and neural networks.
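As a quick check of the formula, the z-scores can be computed by hand and compared with the scaler's output. This is a minimal sketch: the variable names `manual` and `sk_scaled` are illustrative, and the comparison relies on `StandardScaler` dividing by the population standard deviation (`ddof=0`).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"x1": [1, 2, 3, 4], "x2": [5, 6, 7, 8]})

# Manual z-scores: subtract the column mean, divide by the population std (ddof=0)
manual = (df - df.mean()) / df.std(ddof=0)

# Same transformation with scikit-learn
sk_scaled = StandardScaler().fit_transform(df)

print(np.allclose(manual.to_numpy(), sk_scaled))  # do the two results agree?
print(sk_scaled.mean(axis=0))                     # column means, approximately 0
print(sk_scaled.std(axis=0))                      # column standard deviations, approximately 1
```

Using pandas' default `df.std()` (which has `ddof=1`) in the manual computation would give slightly different values from the scaler.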

## Kernel Density Estimation (KDE) plot

A **Kernel Density Estimation (KDE) plot** visualizes the distribution of observations in a dataset. KDE is a non-parametric method for estimating the probability density function of a continuous variable; in other words, the plot shows the estimated probability density at each value of that variable.

A **univariate KDE plot** represents the probability distribution of a single variable. The area under the curve between two values approximates the probability that the variable falls in that interval.

It's like a smooth, continuous version of a histogram. Instead of bars, you get a curve (or a 'hill') that shows where most of the data points fall. The higher the hill, the more data points in that area.

The total area under a KDE curve is 1. This is because the curve represents a probability density, and the total probability over all possible outcomes is always 1.
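The unit-area property can be checked numerically. The sketch below uses `scipy.stats.gaussian_kde` rather than seaborn, and the sample is synthetic data chosen only for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
sample = rng.normal(loc=40, scale=12, size=300)  # illustrative data

kde = gaussian_kde(sample)

# Integrate the estimated density over the whole real line
total_area = kde.integrate_box_1d(-np.inf, np.inf)
print(round(total_area, 6))

# Probability mass that the estimate assigns to the interval [30, 50]
p_30_50 = kde.integrate_box_1d(30, 50)
print(0 < p_30_50 < 1)
```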



---


**FYI Only:**

A probability density function (PDF), denoted $f(x)$, describes the relative likelihood of a continuous random variable $X$ taking on a value near $x$. For a continuous random variable, the probability that $X$ takes on any exact value is zero because there are infinitely many possible values. Therefore, we use the PDF to calculate the probability that $X$ falls within a range of values, by integrating the PDF over that range.

Mathematically, the probability that $X$ lies in the interval $[a, b]$ is given by the integral of the PDF from $a$ to $b$, i.e.,

\begin{equation}P(a \leq X \leq b) = \int_{a}^{b} f(x) dx\end{equation}

This integral represents the area under the curve of the PDF from $a$ to $b$, which gives the probability that $X$ takes a value in this range.
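As an illustration, the integral can be evaluated numerically for a standard normal variable on the interval $[-1, 1]$. The choice of distribution and interval is an assumption made for this sketch; the result is compared with the difference of the cumulative distribution function at the two endpoints.

```python
from scipy.integrate import quad
from scipy.stats import norm

a, b = -1.0, 1.0

# Numerically integrate the standard normal PDF f(x) from a to b
area, _ = quad(norm.pdf, a, b)

# The same probability from the cumulative distribution function
prob = norm.cdf(b) - norm.cdf(a)

print(round(area, 4), round(prob, 4))  # both are approximately 0.6827
```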



---



In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

plt.style.use('https://raw.githubusercontent.com/HatefDastour/ENSF444/main/Files/mystyle.mplstyle')
# Set the seed for the random number generator
np.random.seed(0)

# Simulate the ages of 300 people (integers from 1 to 79; the upper bound of randint is exclusive)
ages = np.random.randint(1, 80, 300)

# Create a new figure
plt.figure(figsize=(6, 5))

# Plot the histogram together with its KDE curve
sns.histplot(ages, stat = 'density', kde=True, color='ForestGreen', alpha =0.3)

# Plot the KDE separately; by default kdeplot extends the curve beyond the data range
sns.kdeplot(ages, color='red', lw = 3)

plt.tight_layout()

This plot gives you a 'smoothed' view of where the ages fall, showing you the distribution of ages in your data. The higher the curve at any point, the more people of that age you have in your data.


The KDE plot in this example might appear to stretch from -20 to 100. This is because the KDE plot uses a technique called **kernel smoothing**.

In kernel smoothing, each data point is replaced with a curve (or 'kernel'). The curves of nearby data points overlap, and where they overlap the most is where you see the 'hills' in your KDE plot.

However, these curves can also extend beyond the range of your data, which is why you see the plot stretching from about -20 to 100, even though your data only ranges from 1 to 79.
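Kernel smoothing can be sketched by hand: place one Gaussian bump on each data point and average them. The helper names `gaussian_kernel` and `manual_kde` are hypothetical, and the bandwidth is fixed by hand here, whereas seaborn selects it automatically, so the curve will not match the plot exactly.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used as the kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def manual_kde(grid, data, bandwidth):
    """Average of one Gaussian bump per data point, evaluated on `grid`."""
    u = (grid[:, None] - data[None, :]) / bandwidth
    return gaussian_kernel(u).mean(axis=1) / bandwidth

np.random.seed(0)
ages = np.random.randint(1, 80, 300)

bandwidth = 8.0                      # hypothetical value, chosen by hand
grid = np.linspace(-20, 100, 601)
density = manual_kde(grid, ages, bandwidth)

# The estimate is positive below the smallest observed age
below_data = density[grid < ages.min()]
print(below_data.max() > 0)

# Riemann-sum approximation of the area under the curve (close to 1)
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A larger bandwidth widens each bump, which makes the curve smoother and pushes it further past the edges of the data.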

## Boxplots

A box plot, also referred to as a box-and-whisker diagram, is a graphical tool that visually depicts the distribution of a dataset. The components of a box plot include:

1. **Minimum Score**: This is the smallest value in the dataset, excluding outliers, and is represented by the left whisker's end.
2. **Lower Quartile ($Q_1$)**: This is the value below which 25% of the data falls, also known as the first quartile.
3. **Median ($Q_2$)**: This is the middle value of the dataset, also referred to as the second quartile. Half of the data points are equal to or greater than this value, and the other half are less.
4. **Upper Quartile ($Q_3$)**: This is the value below which 75% of the data falls, also known as the third quartile. Consequently, 25% of the data is above this value.
5. **Maximum Score**: This is the largest value in the dataset, excluding outliers, and is represented by the right whisker's end.
6. **Whiskers**: These are the lines extending from the box to the minimum and maximum described above. They indicate the variability of the data outside the interquartile range.
7. **Interquartile Range (IQR)**: This is the range between the first quartile ($Q_1$) and the third quartile ($Q_3$), and is represented by the box:

$$IQR = Q_3 - Q_1$$

Outliers, which are data points that deviate significantly from the overall distribution pattern, are also depicted in a box plot. They are typically shown as individual points outside the whiskers. A data point is considered an outlier if it satisfies one of the following conditions:

- The data point is less than $Q_1 - 1.5 \times IQR$
- The data point is greater than $Q_3 + 1.5 \times IQR$

When outliers are present, the whiskers no longer reach the true minimum and maximum of the data. The values $Q_1 - 1.5 \times IQR$ and $Q_3 + 1.5 \times IQR$ act as fences: each whisker ends at the most extreme data point that still lies within the fences (the convention followed by matplotlib and seaborn), and every point beyond the fences is drawn as an outlier.
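The quantities behind a box plot can be computed directly with NumPy. The sketch below applies the 1.5 × IQR rule to a synthetic dataset with two injected outliers; the variable names are illustrative.

```python
import numpy as np

np.random.seed(0)
values = np.concatenate([np.random.rand(100) * 10, [20, 25]])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are flagged as outliers
outliers = values[(values < lower_fence) | (values > upper_fence)]

# Whiskers end at the most extreme points that are still inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

print("Q1, Q3, IQR:", q1, q3, iqr)
print("outliers:", outliers)
print("whiskers:", whisker_low, whisker_high)
```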

In [None]:
# Set the seed for the random number generator
np.random.seed(0)

# Create 1D dataset with some outliers
data_1d = np.concatenate([np.random.rand(100, 1) * 10,
                          np.array([[20], [25]])])

fig, ax = plt.subplots(1, 1, figsize=(8, 3))
sns.boxplot(x=data_1d.flatten(), ax=ax, color='#9fc5e8', linewidth = 2, linecolor = 'k')
_ = ax.set_title('Box Plot - Original 1D Data')
plt.tight_layout()
display(pd.DataFrame(data = data_1d, columns = ['Original 1D Data']))

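<font color='Blue'><b>Example:</b></font> The following example applies `StandardScaler` to a 1D dataset with two outliers and compares box plots and density plots before and after scaling.

In [None]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns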

def plot_data(ax, data, color, title):
    sns.kdeplot(data = data.flatten(),
                ax = ax, fill = True,
                color = color,
                lw = 2,
                legend = False)
    ax.set_title(title, weight='bold')
    ax.grid(axis = 'x')

# Set the seed for the random number generator
np.random.seed(0)

# Create 1D dataset with some outliers
data_1d = np.concatenate([np.random.rand(100, 1) * 10,
                          np.array([[20], [25]])])

# Initialize the standard scaler
scaler = StandardScaler()

# Fit and transform the 1D dataset
scaled_data_1d = scaler.fit_transform(data_1d)

# Create figure with additional row for box plots
fig, axs = plt.subplots(2, 2, figsize=(12, 6),
                        sharex =  'col', sharey = False,
                        height_ratios=[0.2, 0.8])

# Box plots
sns.boxplot(x=data_1d.flatten(), ax=axs[0, 0], color='#9fc5e8', linewidth = 2, linecolor = 'k')
axs[0, 0].set_title('Box Plot - Original 1D Data')
sns.boxplot(x=scaled_data_1d.flatten(), ax=axs[0, 1], color= '#b6d7a8', linewidth = 2, linecolor = 'k')
axs[0, 1].set_title('Box Plot - Scaled 1D Data')

# Density plots
plot_data(axs[1, 0], data_1d, '#0b5394', 'Density Plot - Original 1D Data')
plot_data(axs[1, 1], scaled_data_1d, '#38761d', 'Density Plot - Scaled 1D Data')
# axs[1, 0].set(ylim = [0,.4])
plt.tight_layout()

# Display the original and scaled data (sorted)
display(pd.DataFrame(data = np.hstack([data_1d, scaled_data_1d]), columns = ['Original 1D Data', 'Scaled 1D Data']).sort_values(by = ['Original 1D Data']))

<font color='Blue'><b>Example:</b></font> The following example applies `StandardScaler` to a 2D dataset and plots the points before and after scaling.

In [None]:
import numpy as np
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

def plot_data(ax, data, facecolor, edgecolor, title, alpha=0.7):
    # Scatter plot of the two features on fixed, equal-aspect axes
    ax.scatter(data[:, 0], data[:, 1], fc=facecolor, alpha=alpha, ec=edgecolor)
    ax.set_title(title, weight='bold')
    ax.set(xlim=[-6, 6], ylim=[-6, 6],
           xlabel = 'Feature 1', ylabel = 'Feature 2',
           aspect='equal')

# Set the seed for the random number generator
np.random.seed(0)

# Create 2D dataset
data_2d = np.random.rand(100, 2)
data_2d[:,0] = data_2d[:,0] * 10 - 5
data_2d[:,1] = data_2d[:,1] * 8 - 4

# Initialize the standard scaler
scaler = StandardScaler()

# Fit and transform the 2D dataset
scaled_data_2d = scaler.fit_transform(data_2d)

# Define color and alpha
facecolor_orig = '#cc0000'
edgecolor_orig = '#990000'
facecolor_scaled = '#6aa84f'
edgecolor_scaled = '#38761d'
alpha = 0.7

# Visualize the original and scaled 2D dataset
fig, axs = plt.subplots(1, 2, figsize=(12, 6))

plot_data(axs[0], data_2d, facecolor_orig, edgecolor_orig, 'Original 2D Data')
plot_data(axs[1], scaled_data_2d, facecolor_scaled, edgecolor_scaled, 'Scaled 2D Data')

plt.tight_layout()

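<font color='Blue'><b>Example:</b></font> The following example standardizes the four features of the Iris dataset and compares the density of each feature before and after scaling.

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns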

# Function for data visualization
def plot_data(ax, data, color, title):
    sns.kdeplot(data = data, ax = ax, fill = True, color = color, legend = False)
    ax.set_title(title, weight='bold')
    ax.grid(axis = 'x')
    ax.set(ylim = [0, 1])

# Set the seed for the random number generator
np.random.seed(0)

# Load Iris dataset
iris = load_iris()

# Create DataFrame for visualization
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# StandardScaler
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
scaled_data = pd.DataFrame(data=scaled_data, columns=iris.feature_names)
# Create figure and axes
fig, axs = plt.subplots(len(data.columns), 2, figsize=(12, 8), sharex=True, sharey = True)

# Subplots for each feature
for i, feature in enumerate(data.columns):
    plot_data(axs[i, 0], data[feature], '#0b5394', f'Original Data - {feature}')
    plot_data(axs[i, 1], scaled_data[feature], '#6aa84f', f'Scaled Data - {feature}')

plt.tight_layout()

The same distributions can also be overlaid on a single pair of axes:

In [None]:
import seaborn as sns

# Create figure and axes
fig, axs = plt.subplots(1, 2, figsize=(12, 5), sharey = True)

sns.kdeplot(ax = axs[0], data=data, fill=True, common_norm=False, palette="tab10", alpha=.5, linewidth= 2)
axs[0].set_title('Iris Data', weight = 'bold')
sns.kdeplot(ax = axs[1], data=scaled_data, fill=True, common_norm=False, palette="tab10", alpha=.5, linewidth=2)
axs[1].set_title('Iris Data (Scaled)', weight = 'bold')
plt.tight_layout()

## Robust Scaler Method

Given a dataset, for each feature, the `RobustScaler` adjusts the values using the following formula:

$$x_{scaled} = \frac{x - median(x)}{IQR(x)}$$

Where:
- $x$ is the original feature vector
- $median(x)$ is the median of the feature vector
- $IQR(x)$ is the interquartile range of the feature vector, which is the difference between the third quartile (75th percentile) and the first quartile (25th percentile)
- $x_{scaled}$ is the scaled feature vector

This formula subtracts the median and then divides by the interquartile range. Because the median and the interquartile range are barely affected by a few extreme values, `RobustScaler` is less sensitive to outliers than scalers based on the mean and standard deviation, which makes it particularly useful for data that contains outliers.
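The formula can be checked against scikit-learn on a small column that contains one outlier. This is a minimal sketch with illustrative data; it assumes the default settings of `RobustScaler` (centering on the median and scaling by the 25th to 75th percentile range).

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100 is an outlier

# Manual computation: (x - median) / IQR
median = np.median(x, axis=0)
q1, q3 = np.percentile(x, [25, 75], axis=0)
manual = (x - median) / (q3 - q1)

# Same transformation with scikit-learn
sk_scaled = RobustScaler().fit_transform(x)

print(np.allclose(manual, sk_scaled))
print(manual.ravel())  # [-1.  -0.5  0.   0.5 48.5]
```

Here the median is 3 and the IQR is 2, so the four regular values map to a small interval around zero while the outlier stays far away at 48.5.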

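<font color='Blue'><b>Example:</b></font> The following example computes the quartiles and IQR of a small DataFrame and then scales it with `RobustScaler`.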

In [None]:
# Import the RobustScaler class
from sklearn.preprocessing import RobustScaler

# Import pandas
import pandas as pd

# Create some sample data
data = {
    "x1": [1, 2, 3, 4],
    "x2": [5, 6, 7, 8]
}

# Create a DataFrame from the data
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original DataFrame:")
display(df)

# Calculate Q1, Q2 (median), and Q3
Q1 = df.quantile(0.25)
Q2 = df.median()
Q3 = df.quantile(0.75)
IQR = Q3 - Q1

df_temp = pd.concat([Q1.to_frame('Q1'), Q2.to_frame('Q2'), Q3.to_frame('Q3'), IQR.to_frame('IQR')], axis = 1)
print('\nQ1, Median, Q3, and IQR:')
display(df_temp)

# Initialize the scaler object
scaler = RobustScaler()

# Fit and transform the DataFrame
df_scaled = scaler.fit_transform(df)

# Convert the scaled array to a DataFrame
df_scaled = pd.DataFrame(df_scaled, columns=["x1", "x2"])

# Print the scaled DataFrame
print("Scaled DataFrame:")
display(df_scaled)


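<font color='Blue'><b>Example:</b></font> The following example applies `RobustScaler` to the 1D dataset with two outliers and compares box plots and density plots before and after scaling.

In [None]:
# Import the necessary libraries
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler
import matplotlib.pyplot as plt
import seaborn as sns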

# Set the style for the plot
plt.style.use('https://raw.githubusercontent.com/HatefDastour/ENSF444/main/Files/mystyle.mplstyle')

# Define a function to plot the data
def plot_data(ax, data, color, title):
    sns.kdeplot(data = data.flatten(),
                ax = ax, fill = True,
                color = color,
                lw = 2,
                legend = False)
    ax.set_title(title, weight='bold')
    ax.grid(axis = 'x')

# Set the seed for the random number generator
np.random.seed(0)

# Create 1D dataset with some outliers
data_1d = np.concatenate([np.random.rand(100, 1) * 10,
                          np.array([[20], [25]])])

# Initialize the robust scaler
scaler = RobustScaler()

# Fit and transform the 1D dataset
scaled_data_1d = scaler.fit_transform(data_1d)

# Create figure with additional row for box plots
fig, axs = plt.subplots(2, 2, figsize=(12, 6),
                        sharex =  'col', sharey = False,
                        height_ratios=[0.2, 0.8])

# Box plots
sns.boxplot(x=data_1d.flatten(), ax=axs[0, 0], color='#9fc5e8', linewidth = 2, linecolor = 'k')
axs[0, 0].set_title('Box Plot - Original 1D Data')
sns.boxplot(x=scaled_data_1d.flatten(), ax=axs[0, 1], color= '#b6d7a8', linewidth = 2, linecolor = 'k')
axs[0, 1].set_title('Box Plot - Scaled 1D Data')

# Density plots
plot_data(axs[1, 0], data_1d, '#0b5394', 'Density Plot - Original 1D Data')
plot_data(axs[1, 1], scaled_data_1d, '#38761d', 'Density Plot - Scaled 1D Data')
# axs[1, 0].set(ylim = [0,.6])
plt.tight_layout()
plt.show()

# Display the original and scaled data (sorted)
display(pd.DataFrame(data = np.hstack([data_1d, scaled_data_1d]), columns = ['Original 1D Data', 'Scaled 1D Data']).sort_values(by = ['Original 1D Data']))

The `RobustScaler` doesn't remove the outliers; it only reduces their impact on the scaling itself. The outliers are still present in the scaled data, but because the median and IQR barely change when a few extreme values are added, the bulk of the data keeps a comparable spread. With `StandardScaler`, by contrast, outliers inflate the mean and standard deviation and squeeze the remaining values together.

This can be particularly useful for machine learning models that are sensitive to the range of the input features.
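The difference between the two scalers can be made concrete by measuring how much room the non-outlying values occupy after each transformation. The data and variable names below are illustrative assumptions, not taken from the examples above.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [100.0]])  # last value is an outlier
inliers = slice(0, 5)

robust = RobustScaler().fit_transform(x)
standard = StandardScaler().fit_transform(x)

# Range covered by the five inliers after each transformation
robust_spread = robust[inliers].max() - robust[inliers].min()
standard_spread = standard[inliers].max() - standard[inliers].min()
print(robust_spread, standard_spread)

# The outlier is still present after robust scaling
print(robust[-1, 0])
```

The inliers span a much wider interval under `RobustScaler`, because its scaling statistics are hardly moved by the single extreme value.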

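<font color='Blue'><b>Example:</b></font> The following example applies `RobustScaler` to the four features of the Iris dataset and compares the density of each feature before and after scaling.

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import RobustScaler
import matplotlib.pyplot as plt
import seaborn as sns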

# Function for data visualization
def plot_data(ax, data, color, title):
    sns.kdeplot(data = data, ax = ax, fill = True, color = color, legend = False)
    ax.set_title(title, weight='bold')
    ax.grid(axis = 'x')
    ax.set(ylim = [0, 1])

# Set the seed for the random number generator
np.random.seed(0)

# Load Iris dataset
iris = load_iris()

# Create DataFrame for visualization
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Initialize the robust scaler
scaler = RobustScaler()
scaled_data = scaler.fit_transform(data)
scaled_data = pd.DataFrame(data=scaled_data, columns=iris.feature_names)
# Create figure and axes
fig, axs = plt.subplots(len(data.columns), 2, figsize=(12, 8), sharex=True, sharey = True)

# Subplots for each feature
for i, feature in enumerate(data.columns):
    plot_data(axs[i, 0], data[feature], '#0b5394', f'Original Data - {feature}')
    plot_data(axs[i, 1], scaled_data[feature], '#6aa84f', f'Scaled Data - {feature}')

plt.tight_layout()

The same distributions can also be overlaid on a single pair of axes:

In [None]:
import seaborn as sns

# Create figure and axes
fig, axs = plt.subplots(1, 2, figsize=(12, 5), sharey = True)

sns.kdeplot(ax = axs[0], data=data, fill=True, common_norm=False, palette="tab10", alpha=.5, linewidth= 2)
axs[0].set_title('Iris Data', weight = 'bold')
sns.kdeplot(ax = axs[1], data=scaled_data, fill=True, common_norm=False, palette="tab10", alpha=.5, linewidth=2)
axs[1].set_title('Iris Data (Scaled)', weight = 'bold')
plt.tight_layout()

## MinMax Scaler Method

The `MinMaxScaler` is a data normalization technique used in machine learning preprocessing. It scales each feature to a given range, by default between 0 and 1 (`feature_range=(0, 1)`).

- It subtracts the minimum value in the feature from each value in the feature.
- It then divides the result by the range of that feature (i.e., the difference between the maximum and minimum value).
- The result is that the minimum value of the feature becomes 0, the maximum value becomes 1, and all other values lie in between on a relative scale.

Mathematically, for each feature and the default range $[0, 1]$, the `MinMaxScaler` adjusts the values using the following formula:

$$x_{scaled} = \frac{x - min(x)}{max(x) - min(x)}$$

where:
- $x$ is the original feature vector
- $min(x)$ is the minimum value of the feature vector
- $max(x)$ is the maximum value of the feature vector
- $x_{scaled}$ is the scaled feature vector

This scaling method is beneficial when you want your data to be bounded within a certain range. However, `MinMaxScaler` does not reduce the impact of outliers: a single extreme value determines the minimum or maximum and compresses all other values into a narrow part of the range. If your data contains significant outliers, consider a more robust method, such as the `RobustScaler`.
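The formula can be reproduced in a few lines and compared with scikit-learn. This is a minimal sketch with illustrative data that includes one outlier, using the default range of 0 to 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100 is an outlier

# Manual computation: (x - min) / (max - min)
x_min, x_max = x.min(axis=0), x.max(axis=0)
manual = (x - x_min) / (x_max - x_min)

# Same transformation with scikit-learn
sk_scaled = MinMaxScaler().fit_transform(x)

print(np.allclose(manual, sk_scaled))
print(manual.ravel())  # the four small values end up very close to 0
```

The outlier is mapped to 1 and the remaining values occupy only about 3% of the output range, which is the compression effect of outliers on this scaler.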

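<font color='Blue'><b>Example:</b></font> The following example applies `MinMaxScaler` to the 1D dataset with two outliers and compares box plots and density plots before and after scaling.

In [None]:
# Import the necessary libraries
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
import seaborn as sns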

# Set the style for the plot
plt.style.use('https://raw.githubusercontent.com/HatefDastour/ENSF444/main/Files/mystyle.mplstyle')

# Define a function to plot the data
def plot_data(ax, data, color, title):
    sns.kdeplot(data = data.flatten(),
                ax = ax, fill = True,
                color = color,
                lw = 2,
                legend = False)
    ax.set_title(title, weight='bold')
    ax.grid(axis = 'x')

# Set the seed for the random number generator
np.random.seed(0)

# Create 1D dataset with some outliers
data_1d = np.concatenate([np.random.rand(100, 1) * 10,
                          np.array([[20], [25]])])

# Initialize the MinMax scaler
scaler = MinMaxScaler()

# Fit and transform the 1D dataset
scaled_data_1d = scaler.fit_transform(data_1d)

# Create figure with additional row for box plots
fig, axs = plt.subplots(2, 2, figsize=(12, 6),
                        sharex =  'col', sharey = False,
                        height_ratios=[0.2, 0.8])

# Box plots
sns.boxplot(x=data_1d.flatten(), ax=axs[0, 0], color='#9fc5e8', linewidth = 2, linecolor = 'k')
axs[0, 0].set_title('Box Plot - Original 1D Data')
sns.boxplot(x=scaled_data_1d.flatten(), ax=axs[0, 1], color= '#b6d7a8', linewidth = 2, linecolor = 'k')
axs[0, 1].set_title('Box Plot - Scaled 1D Data')

# Density plots
plot_data(axs[1, 0], data_1d, '#0b5394', 'Density Plot - Original 1D Data')
plot_data(axs[1, 1], scaled_data_1d, '#38761d', 'Density Plot - Scaled 1D Data')
plt.tight_layout()
plt.show()

# Display the original and scaled data (sorted)
display(pd.DataFrame(data = np.hstack([data_1d, scaled_data_1d]), columns = ['Original 1D Data', 'Scaled 1D Data']).sort_values(by = ['Original 1D Data']))

## Normalizer method

`Normalizer` is a transformer that rescales each non-zero row (sample) of a data matrix to unit norm: every row vector is divided by its own length, so the result has magnitude one. This is useful for text classification or clustering because, after $\ell_2$ normalization, the cosine similarity between two vectors is simply their dot product. The length of a vector can be measured with different norms: $\ell_1$, $\ell_2$, or $\ell_\infty$ (`'l1'`, `'l2'`, or `'max'` in scikit-learn).

1. **$\ell_1$ norm (Manhattan norm)**: The l1 norm of a vector `x` is the sum of the absolute values of its elements. If we use the l1 norm, the normalized vector $x_{scaled}$ is given by:

   $$ x_{scaled} = \frac{x}{||x||_1} $$

   where $||x||_1$ is the l1 norm of the vector `x`, calculated as $||x||_1 = \sum |x_i|$.

2. **$\ell_2$ norm (Euclidean norm)**: The l2 norm of a vector `x` is the square root of the sum of the squares of its elements. If we use the l2 norm, the normalized vector $x_{scaled}$ is given by:

   $$ x_{scaled} = \frac{x}{||x||_2} $$

   where $||x||_2$ is the l2 norm of the vector `x`, calculated as $||x||_2 = \sqrt{ \sum x_i^2}$.

3. **$\ell_\infty$ norm (Maximum norm)**: The $\ell_\infty$ norm of a vector `x` is the maximum absolute value among its elements (scikit-learn calls it `'max'`). If we use this norm, the normalized vector $x_{scaled}$ is given by:

   $$ x_{scaled} = \frac{x}{||x||_{\infty}} $$

   where $||x||_{\infty}$ is the $\ell_\infty$ norm of the vector `x`, calculated as $||x||_{\infty} = \max_i |x_i|$.

In all cases, the transformed vectors have unit norm: the sum of the absolute values of their elements (for $\ell_1$), the square root of the sum of their squared elements (for $\ell_2$), or the maximum absolute value among their elements (for $\ell_\infty$) equals 1. With the $\ell_2$ norm in particular, the cosine similarity between two normalized vectors reduces to their dot product, which is why this transformation is common in text classification and clustering.

In scikit-learn (version 1.4.0 at the time of writing), `Normalizer` from `sklearn.preprocessing` scales each **row** of the data to unit norm. For each row, it computes the norm selected by the `norm` parameter (`'l1'`, `'l2'`, or `'max'`) and divides every element in the row by it. Note that this normalization is applied independently to each sample (row), not to the features (columns), unlike the scalers above.
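The row-wise behaviour can be verified by dividing each row by its norm with NumPy and comparing the result with `Normalizer`. This is a sketch under the assumption that NumPy's `ord` values 1, 2 and `np.inf` correspond to scikit-learn's `'l1'`, `'l2'` and `'max'`; the dictionary `norm_orders` is introduced here only for illustration.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[1.0, 5.0], [2.0, 6.0], [3.0, 7.0], [4.0, 8.0]])

# scikit-learn norm name -> NumPy `ord` value
norm_orders = {"l1": 1, "l2": 2, "max": np.inf}

matches = {}
for name, order in norm_orders.items():
    # One norm per row (axis=1), kept as a column so it broadcasts over the row
    row_norms = np.linalg.norm(X, ord=order, axis=1, keepdims=True)
    manual = X / row_norms
    sk_normalized = Normalizer(norm=name).fit_transform(X)
    matches[name] = bool(np.allclose(manual, sk_normalized))
print(matches)

# After l2 normalization, the dot product of two rows equals their cosine similarity
X_l2 = Normalizer(norm="l2").fit_transform(X)
cosine = X[0] @ X[1] / (np.linalg.norm(X[0]) * np.linalg.norm(X[1]))
dot_of_normalized = X_l2[0] @ X_l2[1]
print(np.isclose(dot_of_normalized, cosine))
```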

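<font color='Blue'><b>Example:</b></font> The following example normalizes the rows of a small DataFrame with each of the three norms supported by `Normalizer`.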

In [None]:
# Import the Normalizer class
from sklearn.preprocessing import Normalizer

# Import pandas
import pandas as pd

# Create some sample data
data = {
    "x1": [1, 2, 3, 4],
    "x2": [5, 6, 7, 8]
}

# Create a DataFrame from the data
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original Data:")
display(df)

# List of norms
norms = ['l1', 'l2', 'max']

# Loop over the norms
for norm in norms:
    # Initialize the Normalizer object
    normalizer = Normalizer(norm=norm)

    # Fit and transform the DataFrame
    df_normalized = normalizer.fit_transform(df)

    # Convert the normalized array to a DataFrame
    df_normalized = pd.DataFrame(df_normalized, columns=["x1", "x2"])

    # Print the normalized DataFrame
    print(f"\nNormalized Data by {norm} norm:")
    display(df_normalized)