# Q-1

### Min-Max scaling, also known as normalization, is a data preprocessing technique used to rescale numerical features to a specific range, typically 0 to 1. Each value is transformed using the minimum and maximum of its feature: scaled_value = (value - min) / (max - min).
### It helps to eliminate the influence of different scales and units of measurement among features, allowing them to be directly comparable.
### It can improve the performance of machine learning algorithms that are sensitive to the scale of the input features, such as distance-based algorithms (e.g., K-means clustering) or algorithms that rely on gradient descent optimization.

### Here's an example to illustrate the application of Min-Max scaling in Python using the sklearn library:

In [None]:
from sklearn.preprocessing import MinMaxScaler

# Create an instance of MinMaxScaler
scaler = MinMaxScaler()

# Fit and transform the data
scaled_data = scaler.fit_transform(data)


### In this example, data represents the original numerical data that needs to be scaled. First, an instance of the MinMaxScaler class is created. Then, the fit_transform() method is applied to the data, which computes the minimum and maximum values of each feature and scales the data accordingly. The resulting scaled_data will have the same number of rows and columns as the original data, but with the values scaled between 0 and 1.
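### As a minimal runnable sketch of the above, the snippet below uses a small made-up array (the values are assumptions for illustration only) and checks that MinMaxScaler matches the formula (x - min) / (max - min) applied to each column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: 4 samples, 2 features on very different scales
data = np.array([[1.0, 200.0],
                 [2.0, 400.0],
                 [3.0, 600.0],
                 [5.0, 1000.0]])

scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)

# Same result from the formula, computed column by column
col_min = data.min(axis=0)
col_max = data.max(axis=0)
manual = (data - col_min) / (col_max - col_min)

print(scaled_data)
print(np.allclose(scaled_data, manual))
```

### Both columns end up on the same 0 to 1 scale even though the second feature was originally hundreds of times larger than the first.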

# Q-2

### The Unit Vector technique, also known as vector normalization, is a scaling method that rescales each sample's feature vector to have a unit norm (a length of 1). Unlike Min-Max scaling, which maps each feature (column) to a specific range, Unit Vector scaling works on each sample (row): it keeps the direction of the feature vector and discards its magnitude.
### The formula for Unit Vector scaling is:
### unit_vector = value / norm
### where value is one component of a sample's feature vector and norm is the Euclidean (L2) norm of that vector.
### The main difference between Unit Vector scaling and Min-Max scaling is that Unit Vector scaling preserves the ratios between the values within each sample while adjusting their magnitudes. It ensures that all feature vectors have the same length (unit norm), while each vector keeps its original direction.
### Unit Vector scaling is particularly useful when the magnitude of a feature vector matters less than its direction. It is commonly used in text classification or document analysis tasks, where the relative frequency of words in a document is more relevant than the raw counts, which grow with document length.

### Example to illustrate the application of Unit Vector scaling in Python using the sklearn library:

In [None]:
from sklearn.preprocessing import Normalizer

# Create an instance of Normalizer
scaler = Normalizer(norm='l2')

# Fit and transform the data
scaled_data = scaler.fit_transform(data)


### In this example, data represents the original numerical data that needs to be scaled. An instance of the Normalizer class is created, specifying the norm as 'l2' so that the Euclidean norm is used. Then, the fit_transform() method is applied to the data, which divides each row (each sample's feature vector) by its norm. Normalizer is stateless, so the fit step learns nothing from the data. The resulting scaled_data will have the same number of rows and columns as the original data, with each row having a unit norm.
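### A small sketch with made-up numbers (the array below is an assumption for illustration) shows what "unit norm" means in practice. Rows that point in the same direction but differ in magnitude map to the same unit vector:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical data: the first and third rows point in the same direction
data = np.array([[3.0, 4.0],
                 [1.0, 0.0],
                 [6.0, 8.0]])

scaled_data = Normalizer(norm='l2').fit_transform(data)

# Length of every row after scaling
row_norms = np.linalg.norm(scaled_data, axis=1)

print(scaled_data)
print(row_norms)
```

### Here [3, 4] and [6, 8] both become [0.6, 0.8], since only the direction of each row is kept.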

# Q-3

### PCA, which stands for Principal Component Analysis, is a popular dimensionality reduction technique used to transform a high-dimensional dataset into a lower-dimensional space while retaining most of the important information. It achieves this by identifying the principal components, which are linear combinations of the original features, that capture the maximum amount of variance in the data.
### Example to illustrate the application of PCA in Python using the sklearn library:

In [None]:
from sklearn.decomposition import PCA

# Create an instance of PCA with the desired number of components
pca = PCA(n_components=2)

# Fit and transform the data
reduced_data = pca.fit_transform(data)


### In this example, data represents the original high-dimensional dataset. An instance of the PCA class is created with the n_components parameter set to 2, indicating that we want to retain 2 principal components. The fit_transform() method is then applied to the data, which performs PCA on the data and returns the transformed lower-dimensional representation stored in reduced_data. This reduced data can be used for visualization, analysis, or as input to downstream machine learning models.
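### To make the snippet above runnable, the sketch below builds a synthetic dataset (an assumption for illustration: 5 observed features driven by 2 hidden factors plus a little noise) and inspects how much variance the 2 retained components explain:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical dataset: 100 samples, 5 features generated from 2 latent factors
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
data = latent @ mixing + 0.05 * rng.normal(size=(100, 5))

pca = PCA(n_components=2)
reduced_data = pca.fit_transform(data)

print(reduced_data.shape)             # (100, 2)
print(pca.explained_variance_ratio_)  # share of total variance per component
```

### Because the data was built from 2 underlying factors, the 2 components should account for nearly all of the variance. Real datasets rarely compress this cleanly.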

# Q-4

### PCA (Principal Component Analysis) can be used for feature extraction, which means creating a smaller set of new features from the original ones. This differs from feature selection, which keeps a subset of the original features unchanged. Feature extraction aims to reduce the dimensionality of the dataset while preserving as much relevant information as possible.
### The relationship between PCA and feature extraction lies in the fact that PCA can be used to transform the original features into a new set of uncorrelated variables, called principal components. These principal components are linear combinations of the original features and capture the maximum variance in the data. By selecting a subset of the top-ranked principal components, we can effectively perform feature extraction.
### Example to illustrate how PCA can be used for feature extraction in Python using the sklearn library:

In [None]:
from sklearn.decomposition import PCA

# Create an instance of PCA with the desired number of components
pca = PCA(n_components=2)

# Fit and transform the data
reduced_features = pca.fit_transform(data)

# Use the transformed features for further analysis or modeling


### In this example, data represents the original dataset with multiple features. We create an instance of the PCA class with n_components set to 2, indicating that we want to extract 2 features. The fit_transform() method is then applied to the data, which performs PCA and returns the transformed features stored in reduced_features.

### The resulting reduced_features will have a reduced dimensionality compared to the original dataset, with each sample represented by two principal components. These components can be considered as the extracted features that capture the most important information in the data. We can then use these transformed features for further analysis, visualization, or as input to machine learning models.
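### The two properties described above, that the extracted features are linear combinations of the original features and that they are uncorrelated, can be checked directly. This is a sketch on synthetic data (the dataset is an assumption for illustration), using the fitted mean_ and components_ attributes of PCA:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Hypothetical dataset: 200 samples, 4 features, with feature 1 strongly tied to feature 0
data = rng.normal(size=(200, 4))
data[:, 1] = 2.0 * data[:, 0] + 0.1 * data[:, 1]

pca = PCA(n_components=2)
reduced_features = pca.fit_transform(data)

# Each extracted feature is a linear combination of the centered original features
manual = (data - pca.mean_) @ pca.components_.T
print(np.allclose(reduced_features, manual))

# The covariance between the two extracted features is numerically zero
cov = np.cov(reduced_features, rowvar=False)
print(cov)
```

### The rows of pca.components_ hold the weights of each original feature, which helps when interpreting what an extracted feature represents.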

# Q-5

### In the context of building a recommendation system for a food delivery service, Min-Max scaling can be used to preprocess the numerical features like price, rating, and delivery time. The goal of Min-Max scaling is to rescale the values of these features to a common range, typically between 0 and 1, based on their minimum and maximum values.
### Step-by-step explanation of how Min-Max scaling can be applied to preprocess the data:


### 1. Identify the numerical features: In this case, we have features like price, rating, and delivery time, which are numerical and need to be scaled.
### 2. Compute the minimum and maximum values: Calculate the minimum and maximum values of each feature. This can be done by iterating through the dataset or using built-in functions like min() and max().
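### 3. Apply the Min-Max scaling formula: scaled_value = (value - min) / (max - min), where min and max are the minimum and maximum of that feature.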
### 4. Perform Min-Max scaling: Use the computed minimum and maximum values to scale the feature values accordingly. This can be done manually or using libraries like sklearn.preprocessing.MinMaxScaler in Python.
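### The steps above can be sketched as follows. The restaurant records are invented for illustration, and the column order (price, rating, delivery time in minutes) is an assumption:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical records: columns are price, rating, delivery time (minutes)
features = np.array([[ 8.5, 4.2, 25.0],
                     [15.0, 4.8, 40.0],
                     [ 5.0, 3.9, 15.0],
                     [22.0, 4.5, 55.0]])

scaler = MinMaxScaler()
scaled = scaler.fit_transform(features)

# Minimum and maximum learned for each column (step 2)
print(scaler.data_min_)
print(scaler.data_max_)

# A new restaurant is scaled with the already fitted minimum and maximum
new_item = np.array([[12.0, 4.0, 30.0]])
new_scaled = scaler.transform(new_item)
print(new_scaled)
```

### Reusing the fitted scaler for new records keeps them on the same scale as the data the model was trained on. A new value outside the fitted range would fall outside 0 to 1.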

### By applying Min-Max scaling, the numerical features will be transformed to a common range between 0 and 1. This puts the features on comparable scales, so that no feature dominates the recommendation system merely because of its units (for example, delivery time in minutes versus rating on a 1 to 5 scale). It also makes distance and similarity calculations between items more meaningful.

### Once the data is preprocessed using Min-Max scaling, it can be used as input to train recommendation models, such as collaborative filtering or content-based filtering, to generate personalized recommendations for users based on their preferences, price ranges, ratings, and delivery times.

# Q-6

### In the context of building a model to predict stock prices with a dataset containing multiple features, PCA (Principal Component Analysis) can be used to reduce the dimensionality of the dataset. The goal of PCA is to transform the original features into a new set of uncorrelated variables, called principal components, while retaining most of the important information.
### An overview of how PCA can be applied to reduce the dimensionality of the dataset:
### 1. Preprocess the data: Start by preprocessing the dataset, which may involve steps like handling missing values and encoding categorical variables if applicable.
### 2. Standardize the features: PCA is sensitive to the scale of the features, because features with larger variance dominate the principal components. It is therefore important to standardize the features to zero mean and unit variance so that they contribute comparably to the analysis.
### 3. Apply PCA: Once the data is preprocessed, apply PCA to the standardized features. PCA computes the principal components, which are linear combinations of the original features, capturing the maximum variance in the data.
### 4. Determine the number of components: Evaluate the cumulative explained variance ratio or eigenvalues associated with each principal component to determine the number of components to retain. Typically, you aim to retain enough components to explain a large portion of the variance (e.g., 80-90% or higher).
### 5. Project the data onto the selected components: After determining the desired number of components, project the standardized features onto those components to obtain the reduced-dimensional representation of the dataset.
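### Steps 2 to 5 can be sketched with a scikit-learn pipeline. The data below is a synthetic stand-in for stock features (an assumption for illustration, not real market data), and the 90% variance target is likewise an assumed choice:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical stand-in: 250 rows, 10 correlated columns on very different scales
latent = rng.normal(size=(250, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(250, 10))
X *= rng.uniform(1.0, 1000.0, size=10)

# Standardize, then keep as many components as needed to explain 90% of the variance
pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.90))
X_reduced = pipeline.fit_transform(X)

pca = pipeline.named_steps["pca"]
print(X_reduced.shape)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```

### Passing a float between 0 and 1 as n_components lets PCA choose the number of components from the variance target. For a forecasting task, the pipeline should be fitted on the training period only and then applied to later data with pipeline.transform().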

### By reducing the dimensionality of the dataset using PCA, several benefits can be achieved:

### Dimensionality reduction: PCA helps to reduce the number of features, which can improve computational efficiency, reduce noise, and address the curse of dimensionality.

### Eliminating multicollinearity: PCA transforms the original features into uncorrelated principal components, addressing issues of multicollinearity that may exist in the dataset.

### Capturing important information: PCA retains the most important information by selecting the principal components that explain the most variance in the data.

### After applying PCA, the reduced-dimensional dataset can be used as input for training a stock price prediction model, such as regression or time series forecasting models. The reduced features can provide a more compact representation of the original data while still capturing the essential patterns and relationships necessary for accurate predictions.

# Q-7

In [18]:
from sklearn.preprocessing import MinMaxScaler

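values = [1, 5, 10, 15, 20]

# Scale to the range [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1,1))

# MinMaxScaler expects a 2D input of shape (n_samples, n_features)
values_2d = [[value] for value in values]

scaled_data = scaler.fit_transform(values_2d)

# Convert the NumPy result to a list of single-element lists for display
scaled_flat_data = [[value[0]] for value in scaled_data]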

In [19]:
scaled_flat_data

[[-0.9999999999999999],
 [-0.5789473684210525],
 [-0.05263157894736836],
 [0.47368421052631593],
 [1.0]]
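
### As a cross-check, the same values can be computed by hand with the general Min-Max formula for a target range [low, high]: scaled = (value - min) / (max - min) * (high - low) + low. This is a plain-Python sketch in which low, high and manual_scaled are names introduced here for illustration:

```python
values = [1, 5, 10, 15, 20]
low, high = -1, 1  # target range

v_min, v_max = min(values), max(values)

# General Min-Max formula for an arbitrary target range
manual_scaled = [
    (v - v_min) / (v_max - v_min) * (high - low) + low
    for v in values
]

print(manual_scaled)
```

### For example, 5 maps to (5 - 1) / (20 - 1) * 2 - 1 = -11/19, which is approximately -0.5789 and agrees with the output above. The -0.9999999999999999 in that output is floating-point rounding of -1.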

# Q-8

### To determine the number of principal components to retain for feature extraction using PCA, you typically consider the cumulative explained variance ratio or the eigenvalues associated with each principal component. The explained variance ratio of a component is the fraction of the total variance in the dataset that it explains, and the cumulative explained variance ratio is the running sum of these fractions over the first k components. A common approach is to retain the smallest number of components whose cumulative ratio reaches a chosen threshold.
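
### One way to apply this rule is sketched below. The dataset is synthetic and the 95% threshold is an assumed choice, not a value taken from the question:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)

# Hypothetical data: 300 samples, 8 features generated from 3 latent factors plus noise
latent = rng.normal(size=(300, 3))
data = latent @ rng.normal(size=(3, 8)) + 0.2 * rng.normal(size=(300, 8))

# Keep all components so the full variance spectrum is visible
pca = PCA().fit(data)
cumulative = np.cumsum(pca.explained_variance_ratio_)

threshold = 0.95  # assumed target share of variance
# Index of the first cumulative value that reaches the threshold, converted to a count
n_components = int(np.searchsorted(cumulative, threshold) + 1)

print(cumulative)
print(n_components)
```

### Plotting cumulative against the component index (a scree-style plot) gives the same information visually and makes the point of diminishing returns easier to spot.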