### Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

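Ans: Min-Max scaling rescales each feature so that its values fall in the range [0, 1], using the formula

\[ X_{\text{scaled}} = \frac{X - X_{\text{min}}}{X_{\text{max}} - X_{\text{min}}} \]

where \( X_{\text{min}} \) and \( X_{\text{max}} \) are the minimum and maximum values of that feature. The scaled data is then passed to the machine learning model for training and prediction.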

In [None]:
# Illustration by code
from sklearn.preprocessing import MinMaxScaler

# Each row is a sample, each column is a feature
data = [[1, 23], [89, 43], [22, 45], [11, 23]]
scaler = MinMaxScaler()

# Each column is rescaled independently to the range [0, 1]
scaled_data = scaler.fit_transform(data)
print(scaled_data)

# The output is the normalized numerical values of the given data
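
As a cross-check, the Min-Max formula can also be applied by hand with NumPy and compared against `MinMaxScaler`. This is only a minimal sketch on the same four rows used above; the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same toy data as above: rows are samples, columns are features
data = np.array([[1, 23], [89, 43], [22, 45], [11, 23]], dtype=float)

# Per-column minimum and maximum
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Apply X_scaled = (X - X_min) / (X_max - X_min) column by column
manual_scaled = (data - col_min) / (col_max - col_min)

# Compare against scikit-learn's implementation
sklearn_scaled = MinMaxScaler().fit_transform(data)
print(np.allclose(manual_scaled, sklearn_scaled))  # True
```

Both routes give the same matrix, which shows that the scaler works on each column separately.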

### Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

Ans: The Unit Vector technique is a feature scaling method that converts each data point (row) into a unit vector by dividing it by its magnitude (norm), so every sample ends up with length 1.
Min-Max scaling, in contrast, rescales each feature (column) to the range [0, 1] using that feature's minimum and maximum.

In [None]:
import numpy as np
from sklearn.preprocessing import Normalizer

# Example dataset
data = np.array([[160, 50],
                 [170, 60],
                 [180, 70]])

# Initialize Normalizer with l2 norm
normalizer = Normalizer(norm='l2')

# Fit the Normalizer to the data and transform it
unit_scaled_data = normalizer.fit_transform(data)

# Print the scaled data
print("Unit Vector Scaled Data:")
print(unit_scaled_data)
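
The same result can be sketched by hand to make the row-wise nature of unit vector scaling explicit. The snippet below assumes the L2 norm, as in the `Normalizer` example above, and reuses the same toy data.

```python
import numpy as np

# Same toy data as above
data = np.array([[160, 50],
                 [170, 60],
                 [180, 70]], dtype=float)

# L2 norm (magnitude) of every row, kept as a column for broadcasting
row_norms = np.linalg.norm(data, axis=1, keepdims=True)

# Divide each row by its own magnitude
unit_rows = data / row_norms

print(unit_rows)
# Every row should now have length 1 (up to floating-point error)
print(np.linalg.norm(unit_rows, axis=1))
```

Unlike Min-Max scaling, no column statistics are involved: each sample is scaled using only its own values.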


### Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

Ans: Principal Component Analysis (PCA) is a dimensionality reduction technique used to identify patterns in data and to represent it in a more compact form. It achieves this by transforming the original features into a new set of orthogonal (uncorrelated) features called principal components. These principal components are ordered by the amount of variance they explain in the data, with the first principal component explaining the most variance, followed by the second, and so on.

PCA works by finding the directions (principal components) in which the data varies the most. It then projects the data onto these directions, resulting in a lower-dimensional representation while preserving the maximum amount of variance in the data.

Here's how PCA is typically applied:

1. **Standardize the data**: PCA is sensitive to the scale of the features, so it's important to standardize the data (subtract mean and divide by standard deviation) before applying PCA.

2. **Compute the covariance matrix**: PCA calculates the covariance matrix of the standardized data, which represents the relationships between all pairs of features.

3. **Compute the eigenvectors and eigenvalues of the covariance matrix**: PCA decomposes the covariance matrix into its eigenvectors and eigenvalues. Eigenvectors represent the directions of maximum variance (principal components), while eigenvalues represent the amount of variance explained by each eigenvector.

4. **Select the principal components**: PCA selects the top \( k \) eigenvectors (principal components) corresponding to the \( k \) largest eigenvalues to retain the most important information in the data.

5. **Transform the data**: Finally, PCA projects the original data onto the selected principal components to obtain the lower-dimensional representation.
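
The five steps above can be sketched directly with NumPy. This is an illustrative from-scratch version on the Iris data, not how scikit-learn implements PCA internally; `k = 2` is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data

# 1. Standardize: zero mean and unit standard deviation per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the features (columns)
cov = np.cov(X_std, rowvar=False)

# 3. Eigen-decomposition; eigh is meant for symmetric matrices
#    and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort by descending eigenvalue and keep the top k eigenvectors
order = np.argsort(eigenvalues)[::-1]
k = 2
top_vectors = eigenvectors[:, order[:k]]
explained_ratio = eigenvalues[order] / eigenvalues.sum()

# 5. Project the standardized data onto the selected components
X_projected = X_std @ top_vectors

print("Explained variance ratio:", explained_ratio[:k])
print("Projected shape:", X_projected.shape)
```

The signs of the projected columns may differ from those returned by a library implementation, because an eigenvector is only defined up to its sign.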



In [None]:
# Here's an example of how PCA can be applied using Python's scikit-learn library:


from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target variable

# Standardize the features
X_standardized = (X - X.mean(axis=0)) / X.std(axis=0)

# Initialize PCA with 2 principal components
pca = PCA(n_components=2)

# Fit and transform the standardized data
X_pca = pca.fit_transform(X_standardized)

# Print the explained variance ratio
print("Explained Variance Ratio:", pca.explained_variance_ratio_)

# Print the transformed data
print("Transformed Data:")
print(X_pca)


### Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

Ans: PCA and feature extraction are closely related concepts, as PCA can be used as a feature extraction technique. Feature extraction refers to the process of deriving new features from the original set of features in a dataset. It aims to reduce the dimensionality of the data while preserving the most important information.

PCA achieves feature extraction by transforming the original features into a new set of orthogonal features called principal components. These principal components are linear combinations of the original features and capture the directions of maximum variance in the data. By selecting a subset of principal components that explain the most variance, PCA effectively extracts the most informative features from the original dataset.

Here's how PCA can be used for feature extraction:

1. Standardize the data: As with PCA for dimensionality reduction, it's important to standardize the data (subtract mean and divide by standard deviation) before applying PCA for feature extraction.
2. Compute the covariance matrix: PCA calculates the covariance matrix of the standardized data, representing the relationships between all pairs of features.
3. Compute the eigenvectors and eigenvalues: PCA decomposes the covariance matrix into its eigenvectors and eigenvalues. Eigenvectors represent the directions of maximum variance (principal components), while eigenvalues represent the amount of variance explained by each eigenvector.
4. Select the principal components: PCA selects the top k eigenvectors (principal components) corresponding to the k largest eigenvalues to retain the most important information in the data.
5. Transform the data: Finally, PCA projects the original data onto the selected principal components to obtain the lower-dimensional representation, effectively extracting new features from the original dataset.

In [None]:
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target variable

# Standardize the features
X_standardized = (X - X.mean(axis=0)) / X.std(axis=0)

# Initialize PCA with 2 principal components
pca = PCA(n_components=2)

# Fit and transform the standardized data
X_extracted_features = pca.fit_transform(X_standardized)

# Print the transformed data
print("Extracted Features:")
print(X_extracted_features)
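
To make the "linear combination" idea concrete, the following sketch prints the weights that each principal component assigns to the original Iris features, then rebuilds the extracted features by hand from those weights. It relies on the fitted `components_` and `mean_` attributes of scikit-learn's `PCA`; the other variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X = iris.data
X_standardized = (X - X.mean(axis=0)) / X.std(axis=0)

pca = PCA(n_components=2)
X_extracted = pca.fit_transform(X_standardized)

# Each row of components_ holds the weights of one principal component
for i, weights in enumerate(pca.components_, start=1):
    terms = " ".join(f"{w:+.2f}*{name}" for w, name in zip(weights, iris.feature_names))
    print(f"PC{i} = {terms}")

# Extracted features = centered data multiplied by the component weights
manual = (X_standardized - pca.mean_) @ pca.components_.T
print(np.allclose(manual, X_extracted))  # True
```

Each extracted feature is therefore a weighted sum of all four original measurements rather than a subset of them, which is what separates feature extraction from feature selection.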


### Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.

Ans: To use Min-Max scaling to preprocess the data for building a recommendation system for a food delivery service, you would follow these steps:

1. **Understand the dataset**: First, carefully examine the dataset to identify the features that need to be scaled. In this case, the features could include price, rating, and delivery time.

2. **Perform Min-Max scaling**: Apply Min-Max scaling to each feature individually. This process will rescale each feature to a range between 0 and 1, ensuring that they all have a similar scale.

3. **Normalization formula**: Utilize the Min-Max scaling formula for each feature:
   \[ X_{\text{scaled}} = \frac{X - X_{\text{min}}}{X_{\text{max}} - X_{\text{min}}} \]
   Where:
   - \( X \) is the original value of the feature.
   - \( X_{\text{min}} \) is the minimum value of the feature in the dataset.
   - \( X_{\text{max}} \) is the maximum value of the feature in the dataset.
   - \( X_{\text{scaled}} \) is the scaled value of the feature.

4. **Implement scaling**: Use a library like scikit-learn in Python to implement Min-Max scaling. Here's a basic example:

```python
from sklearn.preprocessing import MinMaxScaler

# Assuming 'data' is your dataset containing features like price, rating, and delivery time
# Initialize MinMaxScaler
scaler = MinMaxScaler()

# Apply Min-Max scaling to each feature individually
scaled_data = scaler.fit_transform(data)
```

5. **Verify the scaled data**: Check the scaled data to ensure that each feature now falls within the desired range of 0 to 1.

6. **Use scaled data for recommendation system**: Utilize the scaled data in building your recommendation system. The scaled features will now have a similar scale, preventing any one feature from dominating the recommendation process due to its larger magnitude.

By applying Min-Max scaling, price, rating, and delivery time end up on the same [0, 1] scale, so similarity or distance calculations in the recommendation system are not dominated by whichever feature has the largest raw values.
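
A small end-to-end sketch of these steps is shown below. The numbers are made up purely for illustration (they are not from any real food delivery dataset), and the column order price, rating, delivery time in minutes is an assumption. The scaler is fitted on the existing data and then reused for new rows, so that new items are scaled with the same minimum and maximum.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical values; columns: price, rating, delivery time (minutes)
train = np.array([
    [250.0, 4.2, 30.0],
    [120.0, 3.8, 45.0],
    [480.0, 4.9, 20.0],
    [300.0, 4.0, 60.0],
])
# A hypothetical new item arriving after the scaler was fitted
new_items = np.array([[200.0, 4.5, 25.0]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)   # learn min/max and scale
new_scaled = scaler.transform(new_items)     # reuse the learned min/max

# Verify: every training column spans exactly [0, 1]
print(train_scaled.min(axis=0), train_scaled.max(axis=0))
print(new_scaled)
```

Values in new rows can fall outside [0, 1] if they lie beyond the range seen when the scaler was fitted.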

### Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

Ans: Using PCA to reduce the dimensionality of the dataset for predicting stock prices involves the following steps:

1. **Understand the dataset**: Thoroughly examine the dataset containing features such as company financial data (e.g., revenue, earnings, debt-to-equity ratio) and market trends (e.g., stock market indices, interest rates, economic indicators).

2. **Standardize the data**: Since PCA is sensitive to the scale of the features, it's important to standardize the data before applying PCA. Standardization involves subtracting the mean and dividing by the standard deviation for each feature.

3. **Apply PCA**: Once the data is standardized, apply PCA to the dataset to identify the principal components. PCA will find the directions in which the data varies the most and represent the original features in terms of these principal components.

4. **Determine the number of components**: Decide on the number of principal components to retain. This decision can be based on the cumulative explained variance ratio. It's common to retain enough components to explain a significant portion (e.g., 80-90%) of the total variance in the data.

5. **Transform the data**: Transform the standardized data using the selected principal components. This will result in a lower-dimensional representation of the dataset, where each data point is represented by a reduced set of features (the principal components).

6. **Model training**: Use the reduced-dimensional dataset for training your stock price prediction model. Since the dataset now contains fewer features, training the model may be computationally faster and less prone to overfitting.

Here's a high-level overview of how you could implement PCA for dimensionality reduction in Python using scikit-learn:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assuming 'data' is your dataset containing features
# Step 2: Standardize the data
scaler = StandardScaler()
standardized_data = scaler.fit_transform(data)

# Step 3: Apply PCA
pca = PCA(n_components=0.90)  # Retain 90% of the variance
pca.fit(standardized_data)

# Step 4: Determine the number of components
n_components = pca.n_components_

# Step 5: Transform the data
reduced_data = pca.transform(standardized_data)

# Step 6: Use reduced data for model training
# Continue with model training using 'reduced_data' as features
```

In this example:
- We standardize the data using `StandardScaler`.
- We apply PCA with the goal of retaining 90% of the variance.
- We transform the standardized data using the selected principal components.
- The reduced-dimensional dataset (`reduced_data`) can then be used for training your stock price prediction model.
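
Because no real stock dataset is given here, the sketch below runs the same workflow on synthetic data: 20 correlated features generated from 3 hidden factors, which is an assumption made only so the example can run. It also fits the scaler and PCA on an earlier training slice and reuses them on later rows, which is one common way to keep information from later rows out of the preprocessing.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real features: 200 time-ordered rows,
# 20 correlated columns driven by 3 latent factors plus small noise
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
data = latent @ mixing + 0.1 * rng.normal(size=(200, 20))

# Chronological split: earlier rows for fitting, later rows held out
train, test = data[:150], data[150:]

# Standardize, then keep enough components for 90% of the variance
reducer = make_pipeline(StandardScaler(), PCA(n_components=0.90))
train_reduced = reducer.fit_transform(train)
test_reduced = reducer.transform(test)

pca = reducer.named_steps["pca"]
print("Components kept:", pca.n_components_)
print("Cumulative explained variance:", np.cumsum(pca.explained_variance_ratio_))
```

Wrapping both steps in a pipeline means the held-out rows are transformed with the mean, standard deviation, and components learned from the training rows only.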

### Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.

In [29]:
import numpy as np

# Define the dataset
data = np.array([1, 5, 10, 15, 20])

# Find the minimum and maximum values
min_val = np.min(data)
max_val = np.max(data)

# Apply Min-Max scaling
scaled_data = (data - min_val) / (max_val - min_val)

# Transform the scaled values to the range of -1 to 1
scaled_data_new = 2 * scaled_data - 1

print("Original values:", data)
print("Min-Max scaled values:", scaled_data_new)


Original values: [ 1  5 10 15 20]
Min-Max scaled values: [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
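
The two-step calculation above is a special case of scaling to an arbitrary range \( [a, b] \): \( X_{\text{scaled}} = a + \frac{(X - X_{\text{min}})(b - a)}{X_{\text{max}} - X_{\text{min}}} \). As an alternative sketch, scikit-learn's `MinMaxScaler` accepts the target range through its `feature_range` parameter and should reproduce the same values.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# MinMaxScaler expects a 2D array: one column holding the five values
data = np.array([1, 5, 10, 15, 20], dtype=float).reshape(-1, 1)

# Scale directly to the range [-1, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(data).ravel()

print(scaled)
```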


### Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?

In [30]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np

# Define the dataset containing features: [height, weight, age, gender, blood pressure]
data = np.array([
    [160, 60, 30, 1, 120],
    [170, 65, 35, 0, 130],
    [180, 70, 40, 1, 140],
    [165, 55, 25, 0, 110],
    [175, 75, 45, 1, 150]
])

# Step 1: Standardize the data
scaler = StandardScaler()
standardized_data = scaler.fit_transform(data)

# Step 2: Apply PCA
pca = PCA()
pca.fit(standardized_data)

# Step 3: Determine the number of components to retain based on explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_
cumulative_variance_ratio = np.cumsum(explained_variance_ratio)

# Determine the number of principal components to retain (e.g., retain enough components to explain 90% of the variance)
n_components = np.argmax(cumulative_variance_ratio >= 0.9) + 1

print("Number of principal components to retain:", n_components)


Number of principal components to retain: 2
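
To see why two components are enough here, it helps to print the cumulative explained variance for each component. The sketch below reuses the same five toy rows; the 90% threshold is a rule of thumb taken from the code above, not a fixed requirement.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same toy rows as above: [height, weight, age, gender, blood pressure]
data = np.array([
    [160, 60, 30, 1, 120],
    [170, 65, 35, 0, 130],
    [180, 70, 40, 1, 140],
    [165, 55, 25, 0, 110],
    [175, 75, 45, 1, 150],
], dtype=float)

standardized = StandardScaler().fit_transform(data)
pca = PCA().fit(standardized)

# Show how much variance each additional component adds
cumulative = np.cumsum(pca.explained_variance_ratio_)
for i, (ratio, total) in enumerate(zip(pca.explained_variance_ratio_, cumulative), start=1):
    print(f"PC{i}: explained={ratio:.3f}, cumulative={total:.3f}")

threshold = 0.90  # assumed cut-off, as in the code above
n_components = int(np.argmax(cumulative >= threshold)) + 1
print("Components needed:", n_components)
```

In this toy data, weight, age, and blood pressure rise and fall together exactly, so they carry the same information and collapse into a single direction. That is why the first two components already pass the 90% threshold. With only five samples and a binary gender column, the count should be treated as illustrative; on a real dataset it would need to be re-derived from the cumulative explained variance.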
