# Normalization:
- Normalization is a technique used to rescale numeric data into a specific range, usually between 0 and 1.
- It brings all features onto the same scale, so features with large magnitudes do not dominate features with small ones.
- A common method is Min-Max scaling, x' = (x - min) / (max - min), where the minimum value is transformed to 0 and the maximum value to 1, with other values scaled proportionally in between.
Examples:
1. Suppose you have a dataset of house prices ranging from $100,000 to $1,000,000. Normalization would transform these prices to a range between 0 and 1, making $100,000 equivalent to 0 and $1,000,000 equivalent to 1.
2. Imagine you're analyzing data on customer satisfaction scores (ranging from 1 to 10) and product prices (ranging from $10 to $1000). Normalization would scale both features to a range between 0 and 1, allowing for fair comparison.
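
The Min-Max formula can be tried by hand before reaching for a library. The snippet below is a minimal sketch in plain Python; the helper name `min_max_scale` and the price values are made up for illustration and are not part of the notebook.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] using x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so the formula would divide by zero.
        # Mapping everything to 0.0 is one possible convention (an assumption here).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical house prices spanning the range used in the first example.
prices = [100_000, 325_000, 550_000, 1_000_000]
scaled_prices = min_max_scale(prices)
print(scaled_prices)  # [0.0, 0.25, 0.5, 1.0]
```

The cheapest house maps to 0, the most expensive to 1, and the rest keep their relative spacing.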



In [1]:
# Importing Libraries:
# We begin by importing the necessary libraries: load_wine from sklearn.datasets
# to load the Wine dataset, MinMaxScaler from sklearn.preprocessing to perform
# Min-Max scaling, and pandas to work with data in DataFrame format.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import load_wine


In [3]:
# Loading the Wine Dataset:
# We load the Wine dataset using the load_wine() function from scikit-learn. 
# This dataset contains information about different attributes of wines.
wine_data=load_wine()

In [4]:
# Creating a DataFrame:
# We create a pandas DataFrame (wine_df) to store the features of the Wine dataset. 
# We use data attribute of the loaded dataset to access the feature data, 
# and feature_names attribute to set column names.
wine_df=pd.DataFrame(data=wine_data.data,columns=wine_data.feature_names)

In [5]:
# Displaying the Original Dataset:
# We print the first few rows of the original Wine dataset (wine_df) to see its structure and values.
print("original data",wine_df.head(5))

original data    alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  \
0    14.23        1.71  2.43               15.6      127.0           2.80   
1    13.20        1.78  2.14               11.2      100.0           2.65   
2    13.16        2.36  2.67               18.6      101.0           2.80   
3    14.37        1.95  2.50               16.8      113.0           3.85   
4    13.24        2.59  2.87               21.0      118.0           2.80   

   flavanoids  nonflavanoid_phenols  proanthocyanins  color_intensity   hue  \
0        3.06                  0.28             2.29             5.64  1.04   
1        2.76                  0.26             1.28             4.38  1.05   
2        3.24                  0.30             2.81             5.68  1.03   
3        3.49                  0.24             2.18             7.80  0.86   
4        2.69                  0.39             1.82             4.32  1.04   

   od280/od315_of_diluted_wines  proline  
0    

In [7]:
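# Selecting a Subset of Columns:
# We pick the column names at positions 1, 2 and 3 (malic_acid, ash, alcalinity_of_ash).
# Note: updated_data is not used in the cells below; the scaler is fitted on the full DataFrame.
updated_data=wine_df.columns[[1,2,3]]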

In [8]:
updated_data

Index(['malic_acid', 'ash', 'alcalinity_of_ash'], dtype='object')
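
The notebook does not say what the selected columns are for. One plausible use, sketched below as an assumption, is to scale only those columns and leave the rest of the DataFrame untouched. The names `df`, `selected` and `subset_scaled` are introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import MinMaxScaler

# Rebuild the feature DataFrame so this sketch runs on its own.
wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)

# Same selection as above: malic_acid, ash, alcalinity_of_ash.
selected = df.columns[[1, 2, 3]]

# Scale only the selected columns; all other columns keep their original values.
subset_scaled = df.copy()
subset_scaled[selected] = MinMaxScaler().fit_transform(df[selected])

print(subset_scaled[selected].min().tolist())  # per-column minimum after scaling
print(subset_scaled[selected].max().tolist())  # per-column maximum after scaling
```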

In [9]:
# Initializing the MinMaxScaler:
# We create an instance of MinMaxScaler and assign it to the variable scaler. 
# This scaler will be used to perform Min-Max scaling on the dataset.
scaler=MinMaxScaler()

In [10]:
# Normalizing the Dataset:
# We apply Min-Max scaling to the original dataset using the fit_transform() method of the scaler. 
# This method both fits the scaler to the data and transforms the data simultaneously. 
# The normalized data is stored in the variable wine_normalized.
wine_normalized=scaler.fit_transform(wine_df)

In [13]:
print(wine_normalized)

[[0.84210526 0.1916996  0.57219251 ... 0.45528455 0.97069597 0.56134094]
 [0.57105263 0.2055336  0.4171123  ... 0.46341463 0.78021978 0.55064194]
 [0.56052632 0.3201581  0.70053476 ... 0.44715447 0.6959707  0.64693295]
 ...
 [0.58947368 0.69960474 0.48128342 ... 0.08943089 0.10622711 0.39728959]
 [0.56315789 0.36561265 0.54010695 ... 0.09756098 0.12820513 0.40085592]
 [0.81578947 0.66403162 0.73796791 ... 0.10569106 0.12087912 0.20114123]]
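
As a sanity check on the output above, the sketch below confirms two things on the same Wine data: calling `fit_transform()` gives the same result as calling `fit()` and then `transform()`, and every scaled column spans 0 to 1. The variable names are chosen for this illustration only.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import MinMaxScaler

X = load_wine().data

# One step: learn each column's min/max and rescale immediately.
one_step = MinMaxScaler().fit_transform(X)

# Two steps: learn the min/max first, then apply the rescaling.
two_step_scaler = MinMaxScaler()
two_step_scaler.fit(X)
two_step = two_step_scaler.transform(X)

same_result = np.allclose(one_step, two_step)
col_min = one_step.min(axis=0)
col_max = one_step.max(axis=0)

print(same_result)
print(np.allclose(col_min, 0.0), np.allclose(col_max, 1.0))
```

Keeping `fit()` and `transform()` separate is useful when the same fitted scaler has to be applied to more than one array.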


In [15]:
# Converting Normalized Data to DataFrame:
# We convert the normalized data (wine_normalized) back to a pandas DataFrame (wine_normalized_df) 
# so that we can easily view and work with the normalized dataset.
wine_normalized_df = pd.DataFrame(data=wine_normalized, columns=wine_data.feature_names)


In [16]:
# Displaying the Normalized Dataset:
# Finally, we print the first few rows of the normalized Wine dataset (wine_normalized_df) 
# to see how the features have been scaled between 0 and 1.
print("Normalized data", wine_normalized_df.head(5))

Normalized data     alcohol  malic_acid       ash  alcalinity_of_ash  magnesium  \
0  0.842105    0.191700  0.572193           0.257732   0.619565   
1  0.571053    0.205534  0.417112           0.030928   0.326087   
2  0.560526    0.320158  0.700535           0.412371   0.336957   
3  0.878947    0.239130  0.609626           0.319588   0.467391   
4  0.581579    0.365613  0.807487           0.536082   0.521739   

   total_phenols  flavanoids  nonflavanoid_phenols  proanthocyanins  \
0       0.627586    0.573840              0.283019         0.593060   
1       0.575862    0.510549              0.245283         0.274448   
2       0.627586    0.611814              0.320755         0.757098   
3       0.989655    0.664557              0.207547         0.558360   
4       0.627586    0.495781              0.490566         0.444795   

   color_intensity       hue  od280/od315_of_diluted_wines   proline  
0         0.372014  0.455285                      0.970696  0.561341  
1         0.

# Standardization:
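- Standardization (z-score scaling) transforms each feature to have a mean of 0 and a standard deviation of 1 by subtracting the feature's mean and dividing by its standard deviation.
- It doesn't bound the values to a specific range, so standardized values can be negative or greater than 1; it's useful when the features have different units or scales.
- Because every standardized feature has zero mean and unit variance, different features become easier to compare.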


Examples:
1. Let's say you have two features in a dataset: height (in centimeters) and weight (in kilograms). Standardization would transform both features so that their mean becomes 0 and standard deviation becomes 1, making them directly comparable.
2. Consider a dataset containing exam scores (ranging from 0 to 100) and study hours (ranging from 0 to 50). Standardization would make both features comparable by transforming them to have a mean of 0 and a standard deviation of 1, regardless of their original units.
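
The height and weight example can be worked through with the z-score formula z = (x - mean) / std. The snippet below is a minimal sketch in plain Python; the helper `standardize` and the sample values are invented for illustration. It uses the population standard deviation, which is also what scikit-learn's StandardScaler uses.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature has no spread; returning zeros is an assumed convention.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical measurements in different units.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [50.0, 60.0, 70.0, 80.0, 90.0]

z_heights = standardize(heights_cm)
z_weights = standardize(weights_kg)

print(z_heights)
print(z_weights)
```

Although the two features are measured in different units, their z-scores come out on the same scale because each is expressed in standard deviations from its own mean.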


In [19]:
# Importing Libraries:
# We import necessary libraries: load_wine to load the Wine dataset, StandardScaler 
# for standardization, and pandas for working with DataFrames.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_wine

In [21]:
# Loading the Wine Dataset:
# We load the Wine dataset using load_wine() from scikit-learn and create a DataFrame wine_df1 to store the feature data.
wine_data1=load_wine()
wine_df1=pd.DataFrame(data=wine_data1.data, columns=wine_data1.feature_names)


In [22]:
# Displaying the Original Dataset:
# We display the first few rows of the original Wine dataset (wine_df1) to understand its structure and values.
wine_df1.head(5)

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050.0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480.0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 |


In [23]:
# Initializing StandardScaler:
# We create an instance of StandardScaler and assign it to the variable scaler1.
# This scaler will be used to perform standardization on the dataset.
scaler1=StandardScaler()


In [32]:
# Standardizing the Dataset:
# We standardize the original dataset using the fit_transform() method of the scaler. 
# This method fits the scaler to the data and transforms it simultaneously. 
# The standardized data is stored in the variable wine_Standardized.
wine_Standardized=scaler1.fit_transform(wine_df1)
print(wine_Standardized)

[[ 1.51861254 -0.5622498   0.23205254 ...  0.36217728  1.84791957
   1.01300893]
 [ 0.24628963 -0.49941338 -0.82799632 ...  0.40605066  1.1134493
   0.96524152]
 [ 0.19687903  0.02123125  1.10933436 ...  0.31830389  0.78858745
   1.39514818]
 ...
 [ 0.33275817  1.74474449 -0.38935541 ... -1.61212515 -1.48544548
   0.28057537]
 [ 0.20923168  0.22769377  0.01273209 ... -1.56825176 -1.40069891
   0.29649784]
 [ 1.39508604  1.58316512  1.36520822 ... -1.52437837 -1.42894777
  -0.59516041]]
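
To confirm that the output above really has zero mean and unit standard deviation per column, the sketch below checks both on the same Wine data and compares the result against the formula applied by hand with NumPy. The variable names are chosen for this illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X = load_wine().data
Z = StandardScaler().fit_transform(X)

# Per-column statistics of the standardized data.
col_means = Z.mean(axis=0)
col_stds = Z.std(axis=0)  # NumPy default ddof=0, i.e. population std

# The same transformation written out by hand: (x - mean) / std for each column.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(col_means, 0.0), np.allclose(col_stds, 1.0))
print(np.allclose(Z, manual))
```

The means are zero only up to floating-point rounding, which is why the check uses `np.allclose` rather than exact equality.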


In [28]:
# Converting Standardized Data to DataFrame:
# We convert the standardized data (wine_Standardized) back to a pandas 
# DataFrame (wine_Standardized_df) for easy viewing and further analysis.
wine_Standardized_df=pd.DataFrame(data=wine_Standardized,columns=wine_data1.feature_names)

In [30]:
# Displaying the Standardized Dataset:
# Finally, we print the standardized Wine dataset (wine_Standardized_df); pandas abbreviates 
# the middle rows. Each feature now has a mean of 0 and a standard deviation of 1.
print("Standardized Data",wine_Standardized_df)

Standardized Data       alcohol  malic_acid       ash  alcalinity_of_ash  magnesium  \
0    1.518613   -0.562250  0.232053          -1.169593   1.913905   
1    0.246290   -0.499413 -0.827996          -2.490847   0.018145   
2    0.196879    0.021231  1.109334          -0.268738   0.088358   
3    1.691550   -0.346811  0.487926          -0.809251   0.930918   
4    0.295700    0.227694  1.840403           0.451946   1.281985   
..        ...         ...       ...                ...        ...   
173  0.876275    2.974543  0.305159           0.301803  -0.332922   
174  0.493343    1.412609  0.414820           1.052516   0.158572   
175  0.332758    1.744744 -0.389355           0.151661   1.422412   
176  0.209232    0.227694  0.012732           0.151661   1.422412   
177  1.395086    1.583165  1.365208           1.502943  -0.262708   

     total_phenols  flavanoids  nonflavanoid_phenols  proanthocyanins  \
0         0.808997    1.034819             -0.659563         1.224884   
1      

In [31]:
wine_Standardized_df.head(5)

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.518613 | -0.56225 | 0.232053 | -1.169593 | 1.913905 | 0.808997 | 1.034819 | -0.659563 | 1.224884 | 0.251717 | 0.362177 | 1.84792 | 1.013009 |
| 1 | 0.24629 | -0.499413 | -0.827996 | -2.490847 | 0.018145 | 0.568648 | 0.733629 | -0.820719 | -0.544721 | -0.293321 | 0.406051 | 1.113449 | 0.965242 |
| 2 | 0.196879 | 0.021231 | 1.109334 | -0.268738 | 0.088358 | 0.808997 | 1.215533 | -0.498407 | 2.135968 | 0.26902 | 0.318304 | 0.788587 | 1.395148 |
| 3 | 1.69155 | -0.346811 | 0.487926 | -0.809251 | 0.930918 | 2.491446 | 1.466525 | -0.981875 | 1.032155 | 1.186068 | -0.427544 | 1.184071 | 2.334574 |
| 4 | 0.2957 | 0.227694 | 1.840403 | 0.451946 | 1.281985 | 0.808997 | 0.663351 | 0.226796 | 0.401404 | -0.319276 | 0.362177 | 0.449601 | -0.037874 |
