# -*- coding: utf-8 -*-
"""Data_Preprocessing_PCA_scaling.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1-iHkX1j0pG199_7e497L467_h6x34b_20
"""

import numpy as np
from sklearn.preprocessing import MinMaxScaler, normalize
from sklearn.decomposition import PCA
import pandas as pd

"""**Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.**

**Answer:**

Min-Max scaling is a data preprocessing technique that linearly rescales each numerical feature to a fixed range, typically 0 to 1. Because the transformation is linear, it preserves the ordering and relative spacing of the original values; only the minimum and maximum are pinned to the ends of the range.

**Formula:**

$X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}}$
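
As a sanity check on the formula, here is a minimal sketch that applies it directly with NumPy. The helper name `min_max_scale` is hypothetical (it is not a library function), and the sketch assumes the feature is not constant, so the denominator is non-zero:

```python
import numpy as np

def min_max_scale(x):
    # Hypothetical helper: applies (x - min) / (max - min) element-wise.
    # Assumes x.max() != x.min(); a constant feature would divide by zero.
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

values = [10, 20, 30, 40, 50]
scaled = min_max_scale(values)
print(scaled)  # the values 0, 0.25, 0.5, 0.75 and 1
```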

**How it's used:**

* **Normalization:** It puts all features on a comparable scale, which prevents features with larger ranges from dominating the model.
* **Algorithm sensitivity:** Algorithms that rely on distances or gradient descent (e.g., k-nearest neighbors, k-means, neural networks) are sensitive to the scale of the input features, and Min-Max scaling often improves their performance.
* **Image processing:** It is used to scale pixel intensities to a specific range.
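
To make the point about dominating features concrete, the sketch below compares Euclidean distances before and after scaling. The two features (income and age) and their values are invented purely for illustration:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical features: income (large range) and age (small range).
X = np.array([[30000.0, 25.0], [30500.0, 60.0], [60000.0, 26.0]])

def dist(u, v):
    # Euclidean distance between two samples.
    return float(np.linalg.norm(u - v))

# Unscaled: income dominates, so sample 0 looks far closer to sample 1
# than to sample 2, even though sample 1 differs a lot in age.
raw_01, raw_02 = dist(X[0], X[1]), dist(X[0], X[2])

# Scaled: both features span 0 to 1, so age differences count as well.
Xs = MinMaxScaler().fit_transform(X)
scaled_01, scaled_02 = dist(Xs[0], Xs[1]), dist(Xs[0], Xs[2])

print("raw distances:   ", raw_01, raw_02)
print("scaled distances:", scaled_01, scaled_02)
```

In this toy case the raw distances differ by a large factor, while the scaled distances come out nearly equal, because each feature now contributes on the same footing.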

**Example:**
"""

data = np.array([10, 20, 30, 40, 50]).reshape(-1, 1)  # Reshape to a column vector
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)
print("Original Data:\n", data)
print("Scaled Data:\n", scaled_data)

"""**Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.**

**Answer:**

The Unit Vector technique, also known as normalization or L2 normalization, rescales each sample (row vector) so that its Euclidean norm (length) is 1. It does this by dividing every component of the vector by the vector's magnitude.

**Formula:**

$X_{normalized} = \frac{X}{\|X\|}$
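
The formula can be checked by hand with NumPy. This is a minimal sketch that treats each row as one vector and assumes no row is all zeros (its norm would be 0):

```python
import numpy as np

# Each row is one vector; divide it by its Euclidean length.
X = np.array([[3.0, 4.0], [5.0, 12.0], [6.0, 8.0]])
row_norms = np.linalg.norm(X, axis=1, keepdims=True)  # lengths 5, 13 and 10
unit_rows = X / row_norms

print(unit_rows)
print(np.linalg.norm(unit_rows, axis=1))  # every row now has length 1
```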

**Differences from Min-Max scaling:**

* **Range:** Min-Max scaling maps each feature (column) to a specific range (e.g., 0 to 1), while Unit Vector scaling rescales each sample (row) to have a unit norm.
* **Focus:** Min-Max scaling preserves the relative spacing of values within a feature, while Unit Vector scaling discards magnitude and keeps only the direction of each vector.
* **Applications:** Min-Max scaling is often used when the range of the data is important, while Unit Vector scaling is used when the direction of the data is important, such as in text processing (e.g., cosine similarity).
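
The cosine-similarity connection can be sketched as follows: once rows have unit length, their dot product is the cosine of the angle between them. The term-count vectors here are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import normalize

# Hypothetical term-count vectors; the second is the first doubled,
# and the third shares no terms with the other two.
counts = np.array([[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [0.0, 0.0, 3.0]])
unit = normalize(counts, norm='l2')

# For unit-length rows, the dot product equals the cosine similarity.
cosine = unit @ unit.T
print(np.round(cosine, 3))
```

Rows 0 and 1 point in the same direction, so their similarity is 1 despite the different magnitudes; row 2 is orthogonal to both, giving 0.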

**Example:**
"""

data = np.array([[3, 4], [5, 12], [6, 8]])
normalized_data = normalize(data, norm='l2')
print("Original Data:\n", data)
print("Normalized Data:\n", normalized_data)

"""**Q3. What is PCA (Principle Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.**

**Answer:**

PCA (Principal Component Analysis) is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional representation while preserving as much variance as possible. It identifies the principal components, which are orthogonal directions that capture the most variance in the data.

**How it's used:**

* **Dimensionality reduction:** It reduces the number of features in a dataset, simplifying the model and reducing computational cost.
* **Feature extraction:** It constructs new features (the principal components) as linear combinations of the original ones, which can improve the performance of machine learning models.
* **Data visualization:** It can be used to visualize high-dimensional data in a lower-dimensional space.
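
A small sketch of "preserving as much variance as possible", using synthetic data (an assumption, not part of the assignment) that varies mostly along one direction. It inspects `explained_variance_ratio_` and checks that the components are orthonormal:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic 3-D data that varies mostly along one direction, plus small noise.
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, 0.5]]) + 0.05 * rng.normal(size=(200, 3))

pca = PCA(n_components=2).fit(X)

# Fraction of the total variance captured by each retained component.
print(pca.explained_variance_ratio_)

# The component vectors are orthonormal, so this is close to the identity.
print(np.round(pca.components_ @ pca.components_.T, 6))
```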

**Example:**
"""

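# Note: these rows are evenly spaced along a single line, so the first
# component captures essentially all the variance and the second is ~0.
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(data)  # centers the data, then projects it
print("Original Data:\n", data)
print("Reduced Data:\n", reduced_data)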

"""**Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.**

**Answer:**

PCA is a feature extraction technique. It transforms the original features into a new set of features, called principal components, which are linear combinations of the original features. These principal components capture the most variance in the data, and they can be used as new features for machine learning models.

**How PCA is used for Feature Extraction:**

1.  **Calculate the covariance matrix:** Center each feature on its mean, then compute the covariance matrix, which describes how the original features vary together.
2.  **Calculate the eigenvectors and eigenvalues:** The eigenvectors represent the directions of the principal components, and the eigenvalues represent the amount of variance captured by each principal component.
3.  **Select the principal components:** Select the principal components that capture the most variance.
4.  **Transform the data:** Transform the original data into the new feature space defined by the selected principal components.
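
The four steps can be sketched with plain NumPy and compared against scikit-learn. The random data is illustrative only; the sign of each component is arbitrary, so the comparison uses absolute values:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # illustrative random data

# Step 1: center the data and compute the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Step 2: eigen-decomposition. eigh returns eigenvalues in ascending order,
# so reorder them from largest to smallest.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 3: keep the two directions with the largest eigenvalues.
W = eigvecs[:, :2]

# Step 4: project the centered data onto those directions.
projected = Xc @ W

# Compare with scikit-learn (component signs may differ).
sk_pca = PCA(n_components=2).fit(X)
sk_projected = sk_pca.transform(X)
print(np.allclose(np.abs(projected), np.abs(sk_projected), atol=1e-6))
```

In practice `PCA` relies on a more numerically robust decomposition internally, so the manual version is mainly useful for seeing what the steps mean.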

**Example:**
"""

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
pca = PCA(n_components=2)
new_features = pca.fit_transform(data)
print("Original Data:\n", data)
print("Extracted Features (Principal Components):\n", new_features)

"""**Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.**

**Answer:**

To preprocess the data using Min-Max scaling, I would follow these steps:

1.  **Identify the numerical features:** In this case, the numerical features are price, rating, and delivery time.
2.  **Apply Min-Max scaling to each numerical feature:** I would use the `MinMaxScaler` from scikit-learn to scale each feature to a range of 0 to 1.
3.  **Combine the scaled features with the categorical features (if any):** If the dataset contains categorical features, I would encode them using techniques like one-hot encoding and combine them with the scaled numerical features.
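
Step 3 could look roughly like the sketch below. The `cuisine` column and all values are assumptions added for illustration; the question itself only names price, rating, and delivery time:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample; 'cuisine' is an assumed categorical feature.
df = pd.DataFrame({
    'price': [10, 20, 30, 40],
    'rating': [3.5, 4.0, 4.8, 2.9],
    'delivery_time': [20, 15, 10, 25],
    'cuisine': ['thai', 'pizza', 'thai', 'sushi'],
})
numeric_cols = ['price', 'rating', 'delivery_time']

# Scale the numeric columns to the range 0 to 1.
scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(df[numeric_cols]),
    columns=numeric_cols,
    index=df.index,
)

# One-hot encode the categorical column and join it with the scaled features.
encoded = pd.get_dummies(df['cuisine'], prefix='cuisine')
features = pd.concat([scaled, encoded], axis=1)
print(features)
```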

**Python example:**
"""

data = pd.DataFrame({
    'price': [10, 20, 30, 40, 50],
    'rating': [3, 4, 5, 2, 1],
    'delivery_time': [20, 15, 10, 25, 30]
})
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)
scaled_df = pd.DataFrame(scaled_data, columns=data.columns)
print(scaled_df)

"""**Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.**

**Answer:**

To reduce the dimensionality of the stock price dataset using PCA, I would follow these steps:

1.  **Standardize the data:** Stock price data can have features with different scales. I would standardize the data using `StandardScaler` to ensure that all features have a mean of 0 and a standard deviation of 1.
2.  **Apply PCA:** I would use the `PCA` class from scikit-learn to perform PCA on the standardized data.
3.  **Determine the number of principal components:** I would analyze the explained variance ratio to determine the number of principal components that capture a significant amount of variance (e.g., 95%).
4.  **Transform the data:** I would transform the original data into the new feature space defined by the selected principal components.
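
Step 3 can be sketched by accumulating `explained_variance_ratio_` until it reaches the target. The synthetic array stands in for the real financial features, which are not available here; passing a float to `n_components` is scikit-learn's built-in way to request a variance fraction and should select the same number of components:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))  # synthetic stand-in for the stock features

X_std = StandardScaler().fit_transform(X)

# Fit with all components, then find the smallest k whose cumulative
# explained variance reaches 95%.
full_pca = PCA().fit(X_std)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.95) + 1)

# Shortcut: a float n_components asks for that fraction of variance.
auto_pca = PCA(n_components=0.95).fit(X_std)
print(k, auto_pca.n_components_)
```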

**Python example:**
"""

from sklearn.preprocessing import StandardScaler

data = np.random.rand(100, 20)  # Example dataset with 100 samples and 20 features
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
pca = PCA(n_components=10)  # Retain 10 components (arbitrary here; in practice choose via explained variance)
reduced_data = pca.fit_transform(scaled_data)
print("Original Data Shape:", data.shape)
print("Reduced Data Shape:", reduced_data.shape)

"""**Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.**
"""

data = np.array([1, 5, 10, 15, 20]).reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_data = scaler.fit