Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

Ans - 

**Min-Max Scaling (Normalization) :**

- Min-Max scaling is a technique used in machine learning to rescale numerical features within a specific range.
- The goal is to bring all the feature values into a similar scale, usually between 0 and 1, or sometimes between -1 and 1.
- This makes it easier for the algorithm to learn patterns from the data, as features with larger magnitudes won't dominate those with smaller magnitudes.
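
In formula form, Min-Max scaling of a value $x$ can be written as below. The first expression maps to the range 0 to 1; the second generalizes it to an arbitrary target range, where the symbols $a$ and $b$ (lower and upper bound) are introduced here only for illustration:

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},
\qquad
x'_{[a,b]} = a + \frac{(x - x_{\min})\,(b - a)}{x_{\max} - x_{\min}}
```

Setting $a = -1$ and $b = 1$ gives $x' = 2\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} - 1$.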

**Example :**

In [5]:
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
df = sns.load_dataset('taxis')
df.head()

Unnamed: 0,pickup,dropoff,passengers,distance,fare,tip,tolls,total,color,payment,pickup_zone,dropoff_zone,pickup_borough,dropoff_borough
0,2019-03-23 20:21:09,2019-03-23 20:27:24,1,1.6,7.0,2.15,0.0,12.95,yellow,credit card,Lenox Hill West,UN/Turtle Bay South,Manhattan,Manhattan
1,2019-03-04 16:11:55,2019-03-04 16:19:00,1,0.79,5.0,0.0,0.0,9.3,yellow,cash,Upper West Side South,Upper West Side South,Manhattan,Manhattan
2,2019-03-27 17:53:01,2019-03-27 18:00:25,1,1.37,7.5,2.36,0.0,14.16,yellow,credit card,Alphabet City,West Village,Manhattan,Manhattan
3,2019-03-10 01:23:59,2019-03-10 01:49:51,1,7.7,27.0,6.15,0.0,36.95,yellow,credit card,Hudson Sq,Yorkville West,Manhattan,Manhattan
4,2019-03-30 13:27:42,2019-03-30 13:37:14,3,2.16,9.0,1.1,0.0,13.4,yellow,credit card,Midtown East,Yorkville West,Manhattan,Manhattan


In [6]:
min_max = MinMaxScaler()
min_max

In [7]:
min_max.fit_transform(df[['total']])

array([[0.06713923],
       [0.0461042 ],
       [0.07411249],
       ...,
       [0.09220839],
       [0.03169663],
       [0.10869064]])

---
Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

Ans -

**Unit Vector :**

- The Unit Vector technique, also known as unit-norm scaling, rescales each sample (a row of feature values) so that the magnitude (length) of its vector becomes 1.
- Each row is divided by its own norm, usually the L2 (Euclidean) norm, so the direction of the vector is preserved and only its length changes.
- It is used in machine learning when the direction of a sample matters more than its magnitude, for example when samples are compared by cosine similarity.
- Imagine a dataset where one feature ranges from 0 to 100 while another ranges from -1000 to 1000.
- Min-Max scaling works column by column: it maps each feature into a predefined range (typically 0 to 1) using that feature's minimum and maximum.
- Unit Vector scaling works row by row: it does not force each feature into a fixed range, it only guarantees that every sample has a length of 1 (so individual values end up between -1 and 1).
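
The difference can be seen on a tiny made-up array (the values are illustrative assumptions, not taken from any dataset). This sketch applies L2 unit-vector scaling row by row and Min-Max scaling column by column, using plain NumPy:

```python
import numpy as np

# Hypothetical samples: each row is one sample, each column one feature.
X = np.array([[3.0, 4.0],
              [6.0, 8.0],
              [1.0, 0.0]])

# Unit Vector (L2) scaling: divide each row by its own Euclidean length.
row_norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / row_norms

# Min-Max scaling: rescale each column to [0, 1] using its own min and max.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_minmax = (X - col_min) / (col_max - col_min)

print("Unit vector scaled:\n", X_unit)
print("Row lengths:", np.linalg.norm(X_unit, axis=1))
print("Min-Max scaled:\n", X_minmax)
```

The first two rows point in the same direction, so unit-vector scaling maps both to the same vector, while Min-Max scaling keeps them apart.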

**Example :**

In [8]:
import seaborn as sns
import pandas as pd
from sklearn.preprocessing import normalize
df = sns.load_dataset('titanic')
df

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True


In [9]:
df.dropna(subset=['age', 'fare'], inplace=True)  # Drop rows with NaN values in 'age' or 'fare' columns

In [10]:
# normalize() rescales each row to unit L2 norm by default (norm='l2', axis=1)
normalized_df = pd.DataFrame(normalize(df[['age', 'fare']]), columns = ['age', 'fare'])
normalized_df

Unnamed: 0,age,fare
0,0.949757,0.312988
1,0.470417,0.882444
2,0.956551,0.291564
3,0.550338,0.834942
4,0.974555,0.224148
...,...,...
709,0.801231,0.598355
710,0.901002,0.433816
711,0.535052,0.844819
712,0.654931,0.755689


---

Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

Ans - 

**PCA (Principal Component Analysis) :**

- PCA (Principal Component Analysis) is a feature extraction technique used in machine learning and data analysis.
- It is primarily used to reduce the dimensionality of a dataset while preserving as much variance as possible.
- It does this by transforming the data into a new set of uncorrelated variables called principal components, each of which is a linear combination of the original features.
- The components are ordered by the amount of variance they capture, so the first few describe the main sources of variation in the data.
- Keeping only those leading components reduces the amount of information needed to describe the data, making it easier to analyze, visualize, and model.
- PCA is one of the most commonly used unsupervised machine learning algorithms.
- It's used in a variety of applications, including:
    - **Exploratory data analysis**
    - **Dimensionality reduction**
    - **Information compression**
    - **Data de-noising**
    
**Example :**
- Imagine you have a basket of fruits with various attributes like weight, size, color, and sweetness. 
- PCA helps simplify this complex data by finding the most important patterns, like overall size, sweetness, and color intensity. 
- By focusing on these main patterns, PCA reduces the number of features, making it easier to understand and analyze the dataset.
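
A minimal sketch of this idea, using NumPy only. The fruit measurements are randomly generated for illustration (they are an assumption, not real data), with weight and size deliberately made to vary together. PCA is computed here through the singular value decomposition of the standardized data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic, hypothetical fruit data: size drives weight, sweetness is independent.
size = rng.normal(8.0, 2.0, n)                   # e.g. diameter in cm
weight = 20.0 * size + rng.normal(0.0, 5.0, n)   # grams, strongly tied to size
sweetness = rng.normal(12.0, 1.5, n)             # e.g. sugar content
X = np.column_stack([weight, size, sweetness])

# Standardize so every feature has zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: the rows of Vt are the principal directions,
# and the squared singular values are proportional to the variance captured.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)

# Keep the first two components: 3 original features -> 2 new features.
X_reduced = X_std @ Vt[:2].T

print("Explained variance ratio:", explained_ratio)
print("Reduced shape:", X_reduced.shape)
```

Because weight and size carry almost the same information in this synthetic example, two components should capture nearly all of the variance.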

---

Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

Ans - 

**Relationship Between PCA (Principal Component Analysis) and Feature Extraction :**
- PCA is a feature extraction technique that simplifies data by identifying and capturing important patterns. 
- It transforms original features into new variables called principal components. 
- These components represent the most significant variations in the data. 
- By selecting a subset of principal components, PCA extracts a reduced set of informative features, improving efficiency in machine learning tasks.

**How PCA Can Be Used For Feature Extraction :**
- PCA helps in feature extraction by transforming the original features into a smaller set of principal components.
- These components capture the main patterns or variations in the data. 
- By selecting a subset of these components, we effectively extract a reduced set of features that retains most of the important information while discarding less relevant details. 
- This simplified set of features can improve the performance of machine learning models by reducing noise, computational complexity, and overfitting.

**Example :**
- In a dataset about cars with features like horsepower and weight, PCA simplifies the data by finding the main patterns, such as a combined notion of car size and power.
- It transforms the original features into fewer variables called principal components, which capture these key characteristics.
- The extracted components can then be used in place of the original features, which makes the cars easier to analyze.
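
A short sketch of this with scikit-learn's `PCA`. The car data is synthetic and hypothetical (generated so that heavier cars tend to have more horsepower); only the overall pattern matters, not the specific numbers:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 300

# Synthetic, hypothetical car data: heavier cars tend to have more horsepower.
weight = rng.normal(1500.0, 300.0, n)                  # kg
horsepower = 0.1 * weight + rng.normal(0.0, 10.0, n)   # hp
X = np.column_stack([horsepower, weight])

# Standardize first so the feature with larger units does not dominate.
X_std = StandardScaler().fit_transform(X)

# Extract a single new feature: the first principal component.
pca = PCA(n_components=1)
car_feature = pca.fit_transform(X_std)

print("Loadings of PC1 on [horsepower, weight]:", pca.components_[0])
print("Variance explained by PC1:", pca.explained_variance_ratio_[0])
print("Extracted feature shape:", car_feature.shape)
```

The loadings in `pca.components_` show how each original feature contributes to the extracted feature. Here both features contribute equally, so the component can be read as a single "size and power" score.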

---

Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.

Ans - 

**How We Can Use Min-Max Scaling To Preprocess The Data :**
- By scaling the features using Min-Max scaling, we ensure that no single feature dominates the analysis due to differences in scale. 
- This helps prevent bias and ensures that the recommendation system can effectively consider all features when making recommendations.
- By applying Min-Max scaling to features like price and delivery time in the food delivery service dataset, we can ensure that all features are on a similar scale, making them directly comparable and facilitating analysis.

**Min-Max Scaling (Normalization) :**
- Min-Max scaling, also known as normalization, is a preprocessing technique that rescales numerical features to a specific range.
- The goal is to bring all the feature values onto a similar scale, usually between 0 and 1, or sometimes between -1 and 1.
- This makes it easier for the algorithm to learn patterns from the data, as features with larger magnitudes won't dominate those with smaller magnitudes.

**Here's How We Can Use Min-Max Scaling :**
1. For each numerical feature (e.g. price, delivery time), identify the minimum and maximum values in the dataset.
2. Subtract the minimum value from each data point and divide by the range (maximum value minus minimum value) to scale the feature values between 0 and 1.
3. Optionally, to scale the feature values between -1 and 1 instead, multiply the 0-to-1 result by 2 and subtract 1 (equivalently, subtract the midpoint of the minimum and maximum from each data point and divide by half of the range).
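
A minimal NumPy sketch of Min-Max scaling for this use case. The helper `min_max_scale` and the restaurant values are hypothetical and introduced only for illustration:

```python
import numpy as np

def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale a 1-D array to feature_range using its own min and max."""
    values = np.asarray(values, dtype=float)
    lo, hi = feature_range
    v_min, v_max = values.min(), values.max()
    if v_max == v_min:
        # A constant feature has no spread; map it to the lower bound.
        return np.full_like(values, lo)
    scaled_01 = (values - v_min) / (v_max - v_min)  # rescale into [0, 1]
    return lo + scaled_01 * (hi - lo)               # stretch to the requested range

# Hypothetical restaurant records (illustrative values only).
price = np.array([150.0, 300.0, 450.0, 900.0])      # currency units
rating = np.array([3.5, 4.0, 4.8, 4.2])             # stars
delivery_time = np.array([20.0, 35.0, 50.0, 30.0])  # minutes

features = {"price": price, "rating": rating, "delivery_time": delivery_time}
scaled = {name: min_max_scale(col) for name, col in features.items()}

for name, col in scaled.items():
    print(name, np.round(col, 3))

print("price in [-1, 1]:", np.round(min_max_scale(price, (-1.0, 1.0)), 3))
```

In practice the minimum and maximum would be learned from the training data and reused for new records, which is what `fit` and `transform` on scikit-learn's `MinMaxScaler` do.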

---

Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

Ans -

**PCA simplifies the dataset by focusing on the most important patterns in the data while retaining most of the essential information. This can lead to more efficient model training, reduced computational complexity, and better generalization when predicting stock prices.**

**To use PCA for dimensionality reduction in predicting stock prices, follow these steps :**
1. **Identify Features**: Select relevant features like financial data and market trends from the dataset.
2. **Standardize Features**: Ensure all features have zero mean and unit variance.
3. **Apply PCA**: Use PCA to find underlying patterns and transform features into principal components.
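4. **Select Components**: Choose the number of principal components based on cumulative explained variance, for example the smallest number of components that explains a chosen share (such as 95%) of the total variance.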
5. **Reduce Dimensionality**: Project data onto selected principal components to reduce dimensionality.
6. **Train Model**: Use the reduced-dimensional data to train the stock price prediction model.
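
The steps above can be sketched as a scikit-learn pipeline. Everything about the data here is an assumption: the features are synthetic stand-ins generated from three hidden factors, and `LinearRegression` is only a placeholder for whatever model the project would actually use:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n_samples, n_features = 500, 20

# Synthetic stand-in for financial and market features: 20 observed columns
# driven by only 3 hidden factors plus a little noise (an assumption).
factors = rng.normal(size=(n_samples, 3))
mixing = rng.normal(size=(3, n_features))
X = factors @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))
y = factors @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n_samples)

# Keep the row order: fit on the earlier rows, evaluate on the later ones.
split = 400
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = make_pipeline(
    StandardScaler(),                            # step 2: standardize
    PCA(n_components=0.95, svd_solver="full"),   # steps 3-5: keep 95% of variance
    LinearRegression(),                          # step 6: train the model
)
model.fit(X_train, y_train)

pca = model.named_steps["pca"]
print("Components kept:", pca.n_components_)
print("Test R^2:", model.score(X_test, y_test))
```

Putting the scaler and PCA inside the pipeline means both are fitted on the training rows only and then applied unchanged to the test rows. Since three hidden factors generate this synthetic data, one would expect roughly three components to be kept.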

---

Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.

Ans -

In [11]:
import numpy as np
data = np.array([1, 5, 10, 15, 20])

Min = np.min(data)
Max = np.max(data)

# Apply Min-Max scaling
data_scaled = 2 * ((data - Min) / (Max - Min)) - 1

print("Original Dataset :", data)
print("Min-Max Scaled Dataset :", data_scaled)

Original Dataset : [ 1  5 10 15 20]
Min-Max Scaled Dataset : [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
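
The same result can be cross-checked with scikit-learn's `MinMaxScaler`, whose `feature_range` parameter sets the target range. A minimal sketch:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([1, 5, 10, 15, 20], dtype=float)

# MinMaxScaler expects a 2-D array of shape (n_samples, n_features).
scaler = MinMaxScaler(feature_range=(-1, 1))
data_scaled = scaler.fit_transform(data.reshape(-1, 1)).ravel()

print("Min-Max Scaled Dataset :", data_scaled)
```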


---
Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?

Ans -

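- The optimal number of principal components to retain depends on the data, the specific modeling task, and computational constraints, so it cannot be fixed from the feature names alone.
- A common approach is to standardize the features, fit PCA, and keep the smallest number of components whose cumulative explained variance reaches a chosen threshold (for example 90-95%). Experimentation and validation are then used to confirm the choice for the particular application.
- Height and weight are often correlated, so fewer than five components may be enough here, but this has to be checked against the explained variance of the actual data.
- Gender is categorical, so it would need to be encoded numerically (or left out) before PCA is applied.
- It's also crucial to understand the context of the analysis and the specific goals of the project when deciding which features to pass to PCA.
    - If we're interested in physical characteristics, such as those relevant to healthcare or fitness, features like height, weight, and blood pressure may be more important.
    - If we're focusing on demographic factors, gender and age could be more significant.
    - In some cases, the importance of a feature may depend on its correlation with the target variable or its predictive power in a model.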
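
One common way to pick the number of components is to look at cumulative explained variance. The sketch below does this with NumPy on synthetic, hypothetical data for the five features (gender is encoded as 0/1, and the 90% threshold is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Synthetic, hypothetical patient data (not real measurements).
height = rng.normal(170.0, 10.0, n)                                # cm
weight = 70.0 + 0.9 * (height - 170.0) + rng.normal(0.0, 6.0, n)   # kg
age = rng.normal(45.0, 12.0, n)                                    # years
gender = rng.integers(0, 2, n).astype(float)                       # encoded 0/1
blood_pressure = 100.0 + 0.5 * age + rng.normal(0.0, 8.0, n)       # mmHg
X = np.column_stack([height, weight, age, gender, blood_pressure])

# Standardize, then get the variance captured by each component via SVD.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
singular_values = np.linalg.svd(X_std, compute_uv=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained_ratio)

# Smallest number of components whose cumulative variance reaches the threshold.
threshold = 0.90
n_keep = int(np.searchsorted(cumulative, threshold) + 1)

print("Cumulative explained variance:", np.round(cumulative, 3))
print("Components to keep for", threshold, ":", n_keep)
```

The number printed depends on the correlations in the data and on the chosen threshold, which is why the answer for a real dataset has to come from its own explained-variance profile.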
    
---