
# Data Scaling and Normalization

**Importance of Scaling and Normalization in Machine Learning**

- What is Scaling and Normalization?

  - Preprocessing techniques used to transform numerical features to a common range or distribution

- Why is Scaling and Normalization important?

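  - Improves Algorithm Performance: distance-based and gradient-based methods behave poorly when features are on very different scales

  - Ensures Fair Comparisons: a feature measured in large units (e.g., income in dollars) does not dominate one measured in small units (e.g., age in years) just because of its magnitude

  - Stabilizes Training: gradient descent tends to converge faster and more reliably when features have similar ranges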
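To make the "fair comparisons" point concrete, here is a minimal NumPy sketch with two made-up samples (age in years, income in dollars). The values and the assumed feature ranges (0 to 100 for age, 0 to 200,000 for income) are hypothetical and chosen only to show how the raw Euclidean distance, which k-NN and K-Means rely on, is dominated by the feature with the largest units.

```python
import numpy as np

# Two hypothetical samples: [age in years, income in dollars] (made-up values)
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 51_000.0])
diff = a - b

# Raw Euclidean distance and each feature's share of the squared distance
raw_dist = np.linalg.norm(diff)
raw_share = diff ** 2 / np.sum(diff ** 2)

# Divide each feature by an assumed range so both lie roughly in [0, 1]
feature_range = np.array([100.0, 200_000.0])  # hypothetical ranges, for illustration only
scaled_diff = diff / feature_range
scaled_dist = np.linalg.norm(scaled_diff)
scaled_share = scaled_diff ** 2 / np.sum(scaled_diff ** 2)

print("Raw distance:", raw_dist)
print("Raw share of squared distance (age, income):", raw_share.round(4))
print("Scaled distance:", scaled_dist)
print("Scaled share of squared distance (age, income):", scaled_share.round(4))
```

In the raw data, income accounts for almost all of the squared distance. After dividing by the ranges, the 35-year age gap (35% of its range) outweighs the $1,000 income gap (0.5% of its range).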

**Methods: Min-Max Scaling, Standardization (Z-Score Scaling)**

- Min-Max Scaling

  - Transforms features to a specified range, typically [0, 1]

  - Ensures all feature values are within the same range

  - Use cases: k-NN or Neural Networks

  - Limitations: Sensitive to outliers, as extreme values can distort the scale

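- Standardization (Z-Score Scaling)

  - Centers each feature around zero and scales it to have a standard deviation of 1

  - Gives each feature a mean of 0 and a standard deviation of 1; it does not change the shape of the distribution, so skewed data stays skewed (it does not make features normally distributed)

  - Use cases: SVM, Logistic Regression, and PCA

  - Advantages: Output is not bounded to a fixed range and does not depend only on the minimum and maximum, so it is usually less distorted by outliers than Min-Max Scaling; the mean and standard deviation are still influenced by extreme values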
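The two methods can be written as $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$ (Min-Max Scaling) and $z = \frac{x - \mu}{\sigma}$ (Standardization). The sketch below, which uses a made-up single feature containing one extreme value, computes both by hand with NumPy and checks them against scikit-learn's `MinMaxScaler` and `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# A made-up single feature (one column) where 100.0 acts as an outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Min-Max Scaling by hand: (x - min) / (max - min)
x_minmax = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

# Standardization by hand: (x - mean) / std, with the population std (ddof=0),
# which is what StandardScaler uses
x_std = (x - x.mean(axis=0)) / x.std(axis=0)

# The hand-written versions agree with scikit-learn's scalers
assert np.allclose(x_minmax, MinMaxScaler().fit_transform(x))
assert np.allclose(x_std, StandardScaler().fit_transform(x))

print("Min-Max scaled:", x_minmax.ravel().round(3))
print("Standardized:  ", x_std.ravel().round(3))
```

Both transforms are affine, so the outlier squeezes the four ordinary values close together in either output. The difference is that Min-Max Scaling pins the outlier to exactly 1 and the rest near 0, while Standardization leaves the output unbounded.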

**When to Use Scaling and Normalization for Different Algorithms**

- Algorithms that Require Scaling

  - Distance-Based Algorithms

    - k-NN, SVM, K-Means clustering

  - Gradient-Based Models

    - Linear Regression (when trained with gradient descent or regularized), Logistic Regression, and Neural Networks

- Algorithms Less Sensitive to Scaling

  - Tree-based Models

    - Decision Trees, Random Forests, Gradient Boosting
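Tree-based models split on one feature at a time by comparing it with a threshold, so a strictly increasing transformation such as Min-Max Scaling changes the threshold values but not which samples go to each side. A minimal sketch checking this on Iris, assuming default `DecisionTreeClassifier` settings and the same fixed `random_state` for both trees so they break ties between equally good splits the same way:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_scaled = MinMaxScaler().fit_transform(X)

# Same random_state so both trees consider candidate features in the same order
tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

# Same feature tested at every node and same number of samples reaching each node
same_features = np.array_equal(tree_raw.tree_.feature, tree_scaled.tree_.feature)
same_node_sizes = np.array_equal(tree_raw.tree_.n_node_samples, tree_scaled.tree_.n_node_samples)

# Same predictions on the training data
same_predictions = np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled))

print("Same split features:", same_features)
print("Same node sizes:", same_node_sizes)
print("Same predictions:", same_predictions)
# Only the threshold values differ, because they live on each feature's own scale
print("Root threshold (raw vs scaled):", tree_raw.tree_.threshold[0], tree_scaled.tree_.threshold[0])
```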

**1. Apply Min-Max Scaling and Standardization to a dataset using scikit-learn**

**2. Observe the effects of scaling on model performance by training a k-NN classifier before and after scaling**

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import pandas as pd

# Load Iris Dataset
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Display dataset Information
print("Dataset Information:")
print(X.describe())
print("Target Classes: \n", data.target_names)

# Split Dataset (the same split is reused for every model below)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train k-NN Classifier (without scaling)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Predict and Evaluate (without scaling)
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy without Scaling: \n", accuracy)

# Apply Min-Max Scaling (fit on the training set only, then apply to the test set)
scaler_minmax = MinMaxScaler()
X_train_scaled_minmax = scaler_minmax.fit_transform(X_train)
X_test_scaled_minmax = scaler_minmax.transform(X_test)

# Train k-NN Classifier on scaled data
knn_scaled_minmax = KNeighborsClassifier(n_neighbors=5)
knn_scaled_minmax.fit(X_train_scaled_minmax, y_train)

# Predict and Evaluate (with scaling)
y_pred_scaled_minmax = knn_scaled_minmax.predict(X_test_scaled_minmax)
accuracy_scaled_minmax = accuracy_score(y_test, y_pred_scaled_minmax)

# Display Results
print("Accuracy with Min-Max Scaling: \n", accuracy_scaled_minmax)

# Apply Standardization (fit on the training set only to avoid leaking
# test-set statistics into the scaler, then apply to the test set)
scaler_std = StandardScaler()
X_train_std = scaler_std.fit_transform(X_train)
X_test_std = scaler_std.transform(X_test)

# Train k-NN Classifier on Standardized Data
knn_std = KNeighborsClassifier(n_neighbors=5)
knn_std.fit(X_train_std, y_train)

# Predict and Evaluate (with standardization)
y_pred_std = knn_std.predict(X_test_std)
accuracy_std = accuracy_score(y_test, y_pred_std)
print("Accuracy with Standardization: \n", accuracy_std)
```

```
Dataset Information:
       sepal length (cm)  sepal width (cm)  petal length (cm)  \
count         150.000000        150.000000         150.000000   
mean            5.843333          3.057333           3.758000   
std             0.828066          0.435866           1.765298   
min             4.300000          2.000000           1.000000   
25%             5.100000          2.800000           1.600000   
50%             5.800000          3.000000           4.350000   
75%             6.400000          3.300000           5.100000   
max             7.900000          4.400000           6.900000   

       petal width (cm)  
count        150.000000  
mean           1.199333  
std            0.762238  
min            0.100000  
25%            0.300000  
50%            1.300000  
75%            1.800000  
max            2.500000  
Target Classes: 
 ['setosa' 'versicolor' 'virginica']
Accuracy without Scaling: 
 1.0
Accuracy with Min-Max Scaling: 
 1.0
Accuracy with Standardization: 
 1.0
```
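In this run all three models reach 1.0 accuracy, so this split of Iris does not show an effect of scaling: the four features are all measured in centimetres with comparable spreads (standard deviations between about 0.44 and 1.77 in the summary table), and the test set holds only 30 samples. The sketch below repeats the comparison on scikit-learn's Wine dataset (`load_wine`), whose features have very different ranges (for example, proline runs into the hundreds and thousands while hue stays around 1). It assumes 5-fold cross-validation and wraps each scaler in a pipeline, so the scaler is fit only on the training folds of each split.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = load_wine(return_X_y=True)

# Inside a pipeline, the scaler is refit on the training folds of every CV split
models = {
    "No scaling": KNeighborsClassifier(n_neighbors=5),
    "Min-Max Scaling": make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Standardization": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 stratified folds
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Using a pipeline is also a convenient way to avoid the data-leakage mistake of fitting a scaler on the full dataset before splitting.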