In [None]:
#Question 1

In feature engineering, a parameter refers to the characteristics or properties used to transform raw data into features that are more meaningful for machine learning models. Essentially, parameters help define how data should be processed and converted into a format suitable for the model to learn from. Here are a few common types of parameters in feature engineering:

1. **Scaling Parameters**: These parameters involve scaling features to a specific range. For example, normalization scales features to a range between 0 and 1, while standardization scales features to have a mean of 0 and a standard deviation of 1.

2. **Encoding Parameters**: These parameters define how categorical variables are transformed into numerical values. Examples include one-hot encoding, label encoding, and binary encoding.

3. **Aggregation Parameters**: These parameters are used to create features by aggregating data over a specific window or group. For instance, calculating the rolling average, sum, or count over a time period.

4. **Interaction Parameters**: These parameters create new features by combining existing features through mathematical operations such as addition, multiplication, or division. For example, creating a new feature by multiplying the `age` and `income` features.

5. **Text Processing Parameters**: These parameters define how textual data is transformed into numerical features. Examples include tokenization, stemming, lemmatization, and vectorization techniques like TF-IDF or word embeddings.

6. **Date and Time Parameters**: These parameters extract meaningful features from date and time data, such as the day of the week, month, hour, or time difference between events.
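
A few of the transformations listed above can be sketched with pandas. This is only a minimal illustration: the DataFrame, its column names (`age`, `income`, `city`, `signup`) and its values are made up for the example and are not taken from any real dataset.

```python
import pandas as pd

# Hypothetical raw data; column names and values are invented for illustration
raw = pd.DataFrame({
    "age": [25, 40, 58],
    "income": [30000, 52000, 81000],
    "city": ["Paris", "Lyon", "Paris"],
    "signup": pd.to_datetime(["2024-01-15", "2024-03-02", "2024-03-30"]),
})

features = pd.DataFrame(index=raw.index)

# Scaling: min-max normalization maps income onto the range [0, 1]
income_min, income_max = raw["income"].min(), raw["income"].max()
features["income_scaled"] = (raw["income"] - income_min) / (income_max - income_min)

# Interaction: product of two existing features
features["age_x_income"] = raw["age"] * raw["income"]

# Date and time: day of week (Monday=0) and month
features["signup_dow"] = raw["signup"].dt.dayofweek
features["signup_month"] = raw["signup"].dt.month

# Encoding: one-hot columns for the categorical city feature
features = features.join(pd.get_dummies(raw["city"], prefix="city"))

print(features)
```

Each derived column corresponds to one of the parameter types described above (scaling, interaction, date and time, encoding).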


In [None]:
#Question 2

**Correlation** is a statistical measure that describes the extent to which two variables are related to each other. It tells us whether and how strongly pairs of variables are associated. The correlation coefficient, typically denoted as \( r \), ranges from -1 to 1:
- \( r = 1 \): Perfect positive correlation, where as one variable increases, the other also increases in a perfect linear relationship.
- \( r = -1 \): Perfect negative correlation, where as one variable increases, the other decreases in a perfect linear relationship.
- \( r = 0 \): No correlation, indicating that there's no linear relationship between the variables.

### Negative Correlation
A **negative correlation** means that as one variable increases, the other variable tends to decrease, and vice versa. In other words, the variables move in opposite directions. This is also known as an inverse correlation.

For example:
- **Stock Prices and Gold Prices**: Often, when stock prices fall, gold prices tend to rise, and vice versa. Investors might turn to gold as a safe haven during market downturns.
- **Temperature and Heating Bills**: As the temperature outside decreases, the heating bills tend to increase because more energy is used to heat homes.

A negative correlation is typically represented with a correlation coefficient between -1 and 0.
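
As a rough illustration of the temperature and heating bill example, the sketch below computes \( r \) with NumPy. The numbers are invented for demonstration purposes and do not come from real measurements.

```python
import numpy as np

# Hypothetical monthly observations (values invented for illustration)
temperature_c = np.array([-5, 0, 5, 10, 15, 20, 25])
heating_bill = np.array([210, 180, 150, 115, 80, 55, 40])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r
r = np.corrcoef(temperature_c, heating_bill)[0, 1]
print(f"r = {r:.3f}")
```

Because the bill falls steadily as the temperature rises, the coefficient comes out close to -1.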



In [None]:
#Question 3

**Machine Learning (ML)** is a subset of artificial intelligence (AI) that focuses on the development of algorithms and statistical models enabling computers to learn from and make predictions or decisions based on data, without being explicitly programmed. Essentially, it involves training a model on a dataset to recognize patterns and make informed decisions or predictions when presented with new data.

### Main Components of Machine Learning:
1. **Data**: The foundation of machine learning. High-quality, relevant data is essential for training accurate models. It includes features (input variables) and labels (output variables) in supervised learning.

2. **Features**: Individual measurable properties or characteristics of the data. Feature engineering involves selecting, modifying, or creating features to improve the model's performance.

3. **Models**: Mathematical representations or algorithms used to make predictions or decisions. Common models include linear regression, decision trees, neural networks, and support vector machines.

4. **Training**: The process of feeding data into the model to learn the underlying patterns. The model adjusts its parameters based on the data to minimize errors and improve accuracy.

5. **Evaluation**: Assessing the model's performance using metrics like accuracy, precision, recall, F1 score, and confusion matrix. Evaluation helps determine how well the model generalizes to new data.

6. **Hyperparameters**: Settings or configurations of the learning process that are set before training. Examples include learning rate, number of layers in a neural network, and the number of trees in a random forest.

7. **Loss Function**: A mathematical function that measures the difference between the model's predictions and the actual values. The goal is to minimize the loss function during training.

8. **Optimization Algorithms**: Techniques used to minimize the loss function and find the best parameters for the model. Common optimization algorithms include gradient descent, stochastic gradient descent, and Adam optimizer.

9. **Validation and Testing**: Splitting the data into training, validation, and test sets to evaluate the model's performance on unseen data. Validation helps fine-tune the model, while testing assesses its final performance.

10. **Deployment**: Integrating the trained model into a real-world application where it can make predictions or decisions on new data. Deployment includes monitoring and updating the model as needed.


In [None]:
#Question 4

The **loss value** is the number produced by a loss function (also called a cost function). It measures how far a machine learning model's predictions are from the actual data, provides a quantitative assessment of the model's performance, and guides the optimization process during training. Here's how the loss value helps determine the quality of a model:

### 1. **Indicator of Model Accuracy**
The loss value quantifies the difference between the predicted values and the actual values. A lower loss value indicates that the model's predictions are closer to the actual values, signifying better performance. Conversely, a higher loss value suggests poor predictions and a need for model improvement.

### 2. **Guiding the Training Process**
During training, the model's parameters are adjusted to minimize the loss value. Optimization algorithms (e.g., gradient descent) use the loss value to update the model's parameters iteratively. The goal is to find the parameter values that minimize the loss, leading to more accurate predictions.

### 3. **Comparing Models**
The loss value can be used to compare different models or different versions of the same model. By evaluating the loss on a validation set (a subset of data not used in training), we can determine which model performs best on unseen data.

### 4. **Evaluating Overfitting and Underfitting**
The loss value helps identify overfitting and underfitting issues:
- **Overfitting**: The model performs well on the training data but poorly on the validation data, resulting in a low training loss but high validation loss.
- **Underfitting**: The model performs poorly on both the training and validation data, resulting in high loss values for both.

### Example of Loss Functions
- **Mean Squared Error (MSE)**: Commonly used for regression tasks, it measures the average squared difference between predicted and actual values.
- **Cross-Entropy Loss**: Often used for classification tasks, it measures the difference between the predicted probability distribution and the actual distribution.

### Practical Example:
Consider a linear regression model predicting house prices based on square footage. During training, the model's parameters (slope and intercept) are adjusted to minimize the loss value (e.g., MSE). A lower loss value indicates that the model's predictions are closer to the actual house prices, thus signifying a good model.
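
To make the two loss functions above concrete, here is a minimal NumPy sketch. The helper names (`mean_squared_error`, `binary_cross_entropy`) and all of the numbers are assumptions chosen for illustration; libraries such as scikit-learn ship their own implementations.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood for binary labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

# Hypothetical house prices (in thousands) and two sets of predictions
prices = [200, 250, 300]
mse_good = mean_squared_error(prices, [205, 245, 310])
mse_poor = mean_squared_error(prices, [150, 320, 260])
print("MSE, closer predictions:", mse_good)
print("MSE, worse predictions:", mse_poor)

# Hypothetical binary labels with confident vs. hesitant probability estimates
labels = [1, 0, 1]
bce_confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8])
bce_unsure = binary_cross_entropy(labels, [0.6, 0.5, 0.4])
print("Cross-entropy, confident and correct:", bce_confident)
print("Cross-entropy, unsure:", bce_unsure)
```

In both cases the predictions that sit closer to the true values produce the lower loss.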


In [None]:
#Question 5

**Continuous and categorical variables** are types of data used in statistical analysis and machine learning, each serving different purposes:

### Continuous Variables
Continuous variables are quantitative (numerical) variables that can take on an infinite number of values within a given range. They are measured on a continuous scale and often represent physical quantities or measurements. Examples include:
- **Temperature**: Can be measured in degrees Celsius or Fahrenheit and can take any value within a given range.
- **Height**: Can be measured in centimeters, inches, or any other unit, and can take any value within the range of human height.
- **Weight**: Can be measured in kilograms, pounds, or any other unit, and can take any value within the range of human or object weight.

### Categorical Variables
Categorical variables, also known as qualitative variables, represent distinct categories or groups. They take on a limited number of values and are often used to represent labels or classes. They should not be confused with discrete variables, which are numeric counts. Examples include:
- **Gender**: Categories such as male, female, and other.
- **Marital Status**: Categories such as single, married, divorced, and widowed.
- **Color**: Categories such as red, blue, green, and yellow.

### Key Differences:
- **Scale**: Continuous variables are measured on a continuous scale, while categorical variables represent distinct groups or categories.
- **Values**: Continuous variables can take an infinite number of values within a range, while categorical variables have a limited set of possible values.
- **Mathematical Operations**: Continuous variables can be subjected to arithmetic operations (e.g., addition, subtraction, multiplication, division), while categorical variables are typically analyzed using counting, frequency, and proportions.

### Practical Example:
Suppose we are analyzing a dataset of students' exam scores.
- The **exam scores** (e.g., 75, 88, 92) would be a continuous variable because they can take any value within the range of possible scores.
- The **grade levels** (e.g., freshman, sophomore, junior, senior) would be categorical variables because they represent distinct categories.
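
The student example can be sketched in pandas as follows. The DataFrame and its values are hypothetical; the point is only that a mean is meaningful for the continuous column, while counting is the natural summary for the categorical one.

```python
import pandas as pd

# Hypothetical student records (values invented for illustration)
students = pd.DataFrame({
    "exam_score": [75.0, 88.5, 92.0, 61.5],
    "grade_level": ["freshman", "senior", "junior", "freshman"],
})

# Mark the column explicitly as categorical
students["grade_level"] = students["grade_level"].astype("category")

# Arithmetic summaries suit the continuous variable
mean_score = students["exam_score"].mean()

# Frequency counts suit the categorical variable
level_counts = students["grade_level"].value_counts()

print(students.dtypes)
print("Mean exam score:", mean_score)
print(level_counts)
```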


In [None]:
#Question 6

Handling categorical variables is crucial in machine learning since most algorithms require numerical input. Here are some common techniques to transform categorical variables into numerical formats that models can interpret:

### 1. **One-Hot Encoding**
One-hot encoding converts each category into a new binary column (0 or 1). This is useful when you have a small number of categories.

Example:
| Color  | Red | Blue | Green |
|--------|-----|------|-------|
| Red    |  1  |  0   |   0   |
| Blue   |  0  |  1   |   0   |
| Green  |  0  |  0   |   1   |

### 2. **Label Encoding**
Label encoding assigns a unique integer to each category. This method can be efficient but may imply an ordinal relationship where none exists.

Example:
| Color | Encoded |
|-------|---------|
| Red   |    0    |
| Blue  |    1    |
| Green |    2    |

### 3. **Binary Encoding**
Binary encoding first encodes the categories as integers and then converts those integers into binary code. Each binary digit is then split into separate columns.

Example for 4 categories:
| Category | Integer | Binary | Col1 | Col2 | Col3 |
|----------|---------|--------|------|------|------|
|   A      |    1    |  001   |  0   |  0   |  1   |
|   B      |    2    |  010   |  0   |  1   |  0   |
|   C      |    3    |  011   |  0   |  1   |  1   |
|   D      |    4    |  100   |  1   |  0   |  0   |

### 4. **Target Encoding**
Target encoding replaces a categorical value with the mean of the target variable for that category. This method is useful for high-cardinality features and typically applies to supervised learning tasks.

Example:
| Category | Average Target Value |
|----------|----------------------|
|    A     |         0.8          |
|    B     |         0.4          |
|    C     |         0.6          |

### 5. **Frequency Encoding**
Frequency encoding replaces each category with its frequency (or probability) of occurrence in the dataset.

Example:
| Category | Frequency |
|----------|-----------|
|    A     |    0.5    |
|    B     |    0.3    |
|    C     |    0.2    |

### Choosing the Right Technique
The choice of encoding technique depends on the problem and dataset characteristics:
- Use **one-hot encoding** for variables with a small number of categories.
- **Label encoding** is simple and can be used when the categories have an ordinal relationship.
- **Binary encoding** is effective for high-cardinality categorical variables.
- **Target encoding** leverages information from the target variable but may require regularization to prevent overfitting.
- **Frequency encoding** is useful when the category frequency provides meaningful information.
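
Several of these encodings can be approximated with plain pandas, as in the sketch below. The `color` and `bought` columns are hypothetical, and the target encoding shown here has no smoothing or regularization, so treat it as an illustration rather than a production recipe.

```python
import pandas as pd

# Hypothetical data: a categorical feature and a binary target
df = pd.DataFrame({
    "color": ["red", "blue", "green", "red", "blue", "red"],
    "bought": [1, 0, 1, 1, 1, 0],
})

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["color"], prefix="color")

# Label encoding: integer codes, assigned in alphabetical order of the categories
df["color_label"] = df["color"].astype("category").cat.codes

# Frequency encoding: share of rows that belong to each category
df["color_freq"] = df["color"].map(df["color"].value_counts(normalize=True))

# Target encoding: mean of the target per category
# (in practice, compute this on the training data only to avoid leakage)
df["color_target"] = df["color"].map(df.groupby("color")["bought"].mean())

print(one_hot)
print(df)
```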


In [None]:
#Question 7

**Training and testing a dataset** are key steps in building and evaluating machine learning models.

### Training Dataset
The **training dataset** is the portion of the data used to train a machine learning model. During this phase, the model learns the patterns and relationships in the data by adjusting its parameters to minimize the error (loss) between its predictions and the actual values.

### Testing Dataset
The **testing dataset** is the portion of the data used to evaluate the performance of the trained model. The model makes predictions on this unseen data, and the results are compared to the actual values to assess how well the model generalizes to new, unseen data.

### Why Split the Data?
Splitting the data into training and testing sets ensures that the model is evaluated on data it hasn't seen before. This makes it possible to detect overfitting, where the model performs well on the training data but poorly on new data.

### Example of Data Splitting
Suppose we have a dataset with 1,000 samples. We might split it as follows:
- **Training Dataset**: 80% (800 samples)
- **Testing Dataset**: 20% (200 samples)

### Workflow:
1. **Prepare Data**: Clean and preprocess the data.
2. **Split Data**: Divide the data into training and testing sets.
3. **Train Model**: Use the training dataset to train the model.
4. **Evaluate Model**: Use the testing dataset to evaluate the model's performance.
5. **Optimize Model**: Fine-tune the model based on evaluation results.
6. **Deploy Model**: Use the optimized model in real-world applications.

This process helps ensure that the model performs well on new, unseen data and generalizes effectively to different scenarios.


In [None]:
#Question 8

`sklearn.preprocessing` is a module in the popular machine learning library, scikit-learn, that provides various methods and tools to preprocess and transform data. Preprocessing is a crucial step in the machine learning pipeline, as it involves preparing raw data into a suitable format for model training and evaluation. Here's a quick overview of some key functionalities provided by `sklearn.preprocessing`:

### 1. **Scaling and Normalization**
- **StandardScaler**: Standardizes features by removing the mean and scaling to unit variance.
- **MinMaxScaler**: Transforms features by scaling each feature to a given range, usually between 0 and 1.
- **MaxAbsScaler**: Scales each feature by its maximum absolute value, preserving the sign and sparsity of the data.
- **RobustScaler**: Scales features using statistics that are robust to outliers (e.g., median and interquartile range).

### 2. **Encoding Categorical Variables**
- **OneHotEncoder**: Converts categorical variables into a one-hot numeric array.
- **LabelEncoder**: Encodes categorical labels with integer values.
- **OrdinalEncoder**: Encodes categorical features as an ordinal array.

### 3. **Binarization**
- **Binarizer**: Converts numerical features to binary values based on a threshold.

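### 4. **Imputation**
Imputation is often done alongside preprocessing, but note that the imputers live in the separate `sklearn.impute` module rather than in `sklearn.preprocessing`.
- **SimpleImputer**: Handles missing values by replacing them with a specified strategy (e.g., mean, median, most frequent).
- **KNNImputer**: Imputes missing values using the k-Nearest Neighbors approach.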

### 5. **Generating Polynomial Features**
- **PolynomialFeatures**: Generates polynomial and interaction features, allowing for the creation of more complex feature sets.

### 6. **Discretization**
- **KBinsDiscretizer**: Discretizes continuous features into k bins using various strategies (e.g., uniform, quantile, k-means).

### Example Code:
Here's an example of how to use `StandardScaler` to scale features:

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Sample data
data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# Initialize the scaler
scaler = StandardScaler()

# Fit and transform the data
scaled_data = scaler.fit_transform(data)

print("Scaled Data:\n", scaled_data)
```

These preprocessing techniques help ensure that the data is in a consistent format and can improve the performance of machine learning models.
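
As a further sketch, `MinMaxScaler` and `OneHotEncoder` from the lists above can be used in the same fit/transform style. The arrays are made-up sample data. The scaler is fitted on the training rows only and then reused on the test row, which is a common convention rather than something the library enforces.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical numeric training and test rows
X_train = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
X_test = np.array([[2.0, 25.0]])

# Learn min/max from the training data only, then reuse them on the test data
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# One-hot encode a single categorical column; the default output is sparse,
# so .toarray() converts it to a dense array for printing
colors = np.array([["red"], ["blue"], ["green"], ["red"]])
encoder = OneHotEncoder()
colors_encoded = encoder.fit_transform(colors).toarray()

print("Scaled training data:\n", X_train_scaled)
print("Scaled test data:\n", X_test_scaled)
print("Categories:", encoder.categories_)
print("One-hot encoded colors:\n", colors_encoded)
```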

In [None]:
#Question 9

A **test set** is a subset of a dataset used to evaluate the performance of a trained machine learning model. It consists of data that the model has not seen during the training phase, making it crucial for assessing how well the model generalizes to new, unseen data. The test set helps ensure that the model's performance metrics accurately reflect its ability to make predictions on real-world data.

### Key Points about the Test Set:
1. **Evaluation Purpose**: The primary purpose of the test set is to provide an unbiased evaluation of the model's performance. It helps detect overfitting, where the model performs well on the training data but poorly on new data.
2. **Held-out Data**: The test set is separate from the training and validation sets. It is held out until the very end of the model development process to ensure that the evaluation is based on unseen data.
3. **Performance Metrics**: Common metrics used to evaluate the model on the test set include accuracy, precision, recall, F1 score, mean squared error (MSE), and area under the curve (AUC), among others.

### Example Workflow:
1. **Split the Data**: Divide the dataset into training, validation, and test sets (e.g., 70% training, 15% validation, 15% test).
2. **Train the Model**: Use the training set to train the machine learning model.
3. **Tune the Model**: Use the validation set to fine-tune hyperparameters and prevent overfitting.
4. **Evaluate the Model**: Use the test set to evaluate the final model's performance and report metrics.

### Practical Example:
```python
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split the data into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Test Set Accuracy:", accuracy)
```

In this example, we split the Iris dataset into training and test sets, train a Random Forest classifier on the training set, and evaluate its accuracy on the test set.
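
The example workflow above also mentions a separate validation set. One possible way to obtain a roughly 70/15/15 split is to call `train_test_split` twice, as sketched below. The intermediate names (`X_rest`, `y_rest`) and the use of `stratify` are choices made for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First hold out about 15% of the data as the test set
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42, stratify=y
)

# Then carve out about 15% of the original data (0.15 / 0.85 of the remainder)
# as the validation set; what is left is the training set
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15 / 0.85, random_state=42, stratify=y_rest
)

print("Train / validation / test sizes:", len(X_train), len(X_val), len(X_test))
```

The validation set is then used for tuning, and the test set is touched only once at the end.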


In [None]:
#Question 10

### Splitting Data for Model Fitting in Python
To split data for model fitting (training and testing) in Python, we can use the `train_test_split` function from the `scikit-learn` library. This function randomly divides the dataset into training and testing sets based on the specified ratio. Here's a simple example:

```python
from sklearn.model_selection import train_test_split
import numpy as np

# Example data
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([0, 1, 0, 1, 0])

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("X_train:", X_train)
print("X_test:", X_test)
print("y_train:", y_train)
print("y_test:", y_test)
```
### Approach to a Machine Learning Problem:
1. **Define the Problem**:
   - Clearly understand the problem and identify the type of problem (classification, regression, clustering, etc.).
   
2. **Collect and Prepare Data**:
   - Gather relevant data from various sources.
   - Clean the data by handling missing values, removing duplicates, and correcting errors.
   - Split the data into training, validation, and test sets.

3. **Exploratory Data Analysis (EDA)**:
   - Visualize the data to understand distributions, relationships, and patterns.
   - Generate summary statistics and identify outliers.

4. **Feature Engineering**:
   - Select relevant features and create new features if necessary.
   - Encode categorical variables, scale numerical variables, and handle missing values.

5. **Select a Model**:
   - Choose appropriate machine learning algorithms based on the problem type and data characteristics.
   - Consider multiple algorithms to compare performance.

6. **Train the Model**:
   - Fit the model to the training data.
   - Use cross-validation to fine-tune hyperparameters and prevent overfitting.

7. **Evaluate the Model**:
   - Assess the model's performance using the validation set and appropriate metrics (e.g., accuracy, precision, recall, F1 score, mean squared error).
   - Identify any issues like overfitting or underfitting.

8. **Optimize the Model**:
   - Improve the model by fine-tuning hyperparameters, adding or removing features, or selecting a different algorithm.
   - Use techniques like grid search or random search for hyperparameter tuning.

9. **Test the Model**:
   - Evaluate the final model on the test set to obtain an unbiased estimate of its performance.

10. **Deploy the Model**:
    - Integrate the trained model into a real-world application.
    - Monitor the model's performance and update it as needed.

### Example Workflow:
Here's a complete workflow to illustrate the process:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Test Set Accuracy:", accuracy)
```
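
Steps 6 and 8 mention cross-validation and grid search, which the workflow above leaves out. A minimal sketch using `GridSearchCV` could look like this. The parameter grid is an arbitrary illustration, not a recommended search space.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical search space; these values are illustrative only
param_grid = {"n_estimators": [25, 50], "max_depth": [2, None]}

# 5-fold cross-validation is run on the training set only
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)

# The held-out test set is used once, after tuning is finished
print("Test set accuracy:", search.score(X_test, y_test))
```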



In [None]:
#Question 11

**Exploratory Data Analysis (EDA)** is a crucial step in the data science process, and it's performed before fitting a model to the data for several important reasons:

### 1. **Understanding Data Distribution**
EDA helps you understand the distribution and characteristics of your data, such as central tendencies (mean, median), variability (standard deviation, range), and the presence of skewness or kurtosis. This knowledge is essential for choosing appropriate algorithms and preprocessing steps.

### 2. **Identifying Outliers and Anomalies**
EDA allows you to detect outliers or anomalies that could skew your model's performance. Identifying and addressing these anomalies (e.g., by removing or transforming them) ensures that your model isn't adversely affected by abnormal data points.

### 3. **Handling Missing Values**
EDA helps identify missing values in your dataset and provides insights into their patterns. You can then decide on the best strategy to handle them, such as imputation, deletion, or using algorithms that can handle missing data natively.

### 4. **Feature Selection and Engineering**
Through EDA, you can determine which features are relevant to your predictive task and which ones might be redundant or irrelevant. It also helps in creating new features (feature engineering) that can enhance model performance.

### 5. **Detecting Multicollinearity**
EDA helps identify highly correlated features (multicollinearity), which can affect the stability and interpretability of some models. By understanding these relationships, you can decide to combine, transform, or remove certain features.

### 6. **Visualizing Data Relationships**
EDA involves visualizing relationships between variables using plots like scatter plots, histograms, box plots, and heatmaps. These visualizations provide intuitive insights into data patterns and interactions that might not be apparent from numerical summaries alone.

### 7. **Validating Assumptions**
Many machine learning algorithms have assumptions about the data (e.g., linearity, normality, homoscedasticity). EDA helps validate these assumptions and decide if transformations are necessary to meet these requirements.

### 8. **Guiding Model Selection**
The insights gained from EDA guide the selection of appropriate machine learning algorithms. For example, understanding the nature of your target variable (classification vs. regression) and the relationships between features can influence your choice of model.

### Practical Example of EDA:
```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load a sample dataset
data = sns.load_dataset('titanic')

# Basic statistics
print(data.describe())

# Missing values
print(data.isnull().sum())

# Histograms for numerical features
data.hist(bins=20, figsize=(10, 8))
plt.show()

# Box plots for numerical features
sns.boxplot(data=data)
plt.show()

# Correlation matrix heatmap
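# numeric_only=True skips text columns, which would otherwise raise an error in recent pandas
sns.heatmap(data.corr(numeric_only=True), annot=True, cmap='coolwarm')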
plt.show()

# Pair plot for visualizing relationships between features
sns.pairplot(data.dropna(), hue='survived')
plt.show()
```
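
Outliers can also be flagged numerically rather than only by eye. The sketch below applies the common 1.5 × IQR rule to a small, made-up series of fares. Both the values and the 1.5 multiplier are conventions assumed for this example.

```python
import pandas as pd

# Hypothetical fares with one obviously extreme value
fares = pd.Series([7.25, 8.05, 13.0, 26.0, 30.5, 35.0, 512.33])

# Interquartile range: spread of the middle 50% of the data
q1, q3 = fares.quantile(0.25), fares.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as potential outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = fares[(fares < lower) | (fares > upper)]

print("Bounds:", lower, upper)
print("Potential outliers:\n", outliers)
```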



In [None]:
#Question 12

**Correlation** is a statistical measure that describes the extent to which two variables are related to each other. It tells us whether and how strongly pairs of variables are associated. The correlation coefficient, typically denoted as \( r \), ranges from -1 to 1:
- \( r = 1 \): Perfect positive correlation, where as one variable increases, the other also increases in a perfect linear relationship.
- \( r = -1 \): Perfect negative correlation, where as one variable increases, the other decreases in a perfect linear relationship.
- \( r = 0 \): No correlation, indicating that there's no linear relationship between the variables.

### Negative Correlation
A **negative correlation** means that as one variable increases, the other variable tends to decrease, and vice versa. In other words, the variables move in opposite directions. This is also known as an inverse correlation.

For example:
- **Stock Prices and Gold Prices**: Often, when stock prices fall, gold prices tend to rise, and vice versa. Investors might turn to gold as a safe haven during market downturns.
- **Temperature and Heating Bills**: As the temperature outside decreases, the heating bills tend to increase because more energy is used to heat homes.

A negative correlation is typically represented with a correlation coefficient between -1 and 0.

### Example:
Suppose we have two variables: \( X \) (the number of hours studied) and \( Y \) (the number of distractions). If we find a negative correlation between them (e.g., \( r = -0.6 \)), it would mean that as the number of hours studied increases, the number of distractions tends to decrease.


In [None]:
#Question 13

A **negative correlation** means that as one variable increases, the other variable tends to decrease, and vice versa. Essentially, the variables move in opposite directions. This inverse relationship can be represented by a correlation coefficient between -1 and 0. Here are a couple of examples to illustrate negative correlation:

- **Stock Prices and Gold Prices**: Often, when stock prices fall, gold prices tend to rise, and vice versa. Investors might turn to gold as a safe haven during market downturns.
- **Temperature and Heating Bills**: As the temperature outside decreases, heating bills tend to increase because more energy is used to heat homes.

Imagine two variables, \( X \) (the number of hours studied) and \( Y \) (the number of distractions). If we find a negative correlation between them (e.g., \( r = -0.6 \)), it would mean that as the number of hours studied increases, the number of distractions tends to decrease.


In [None]:
#Question 14

You can find the correlation between variables in Python using various methods and libraries, such as `pandas` and `numpy`. Here's a step-by-step guide to calculating the correlation between variables:

### Using Pandas:
Pandas provides a convenient method called `corr()` to calculate the correlation between columns in a DataFrame.

```python
import pandas as pd

# Sample data
data = {
    'X': [1, 2, 3, 4, 5],
    'Y': [2, 4, 6, 8, 10],
    'Z': [5, 4, 3, 2, 1]
}
df = pd.DataFrame(data)

# Calculate correlation matrix
correlation_matrix = df.corr()
print(correlation_matrix)
```

### Using NumPy:
NumPy provides a function called `corrcoef()` to calculate the correlation coefficient between arrays.

```python
import numpy as np

# Sample data
X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 6, 8, 10])
Z = np.array([5, 4, 3, 2, 1])

# Calculate correlation coefficient matrix
correlation_matrix = np.corrcoef([X, Y, Z])
print(correlation_matrix)
```

### Visualizing Correlation with Seaborn:
You can also visualize the correlation matrix using a heatmap with the Seaborn library.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Calculate correlation matrix using pandas
correlation_matrix = df.corr()

# Create a heatmap
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.show()
```

### Example Output:
The correlation matrix will show the correlation coefficients between the variables. For example, in the pandas code, you would get:

```
          X         Y         Z
X  1.000000  1.000000 -1.000000
Y  1.000000  1.000000 -1.000000
Z -1.000000 -1.000000  1.000000
```

In this example:
- The correlation between `X` and `Y` is 1, indicating a perfect positive correlation.
- The correlation between `X` and `Z` is -1, indicating a perfect negative correlation.


In [None]:
#Question 15

**Causation** refers to a relationship between two variables where one variable directly affects or influences the other. In other words, causation implies that changes in one variable (the cause) lead to changes in another variable (the effect). Establishing causation typically requires controlled experiments or strong evidence of a direct cause-and-effect relationship.

### Difference Between Correlation and Causation:

**Correlation**: Measures the strength and direction of the relationship between two variables. It does not imply causation; it simply indicates that the variables move together in some way, either positively or negatively. Correlation can be identified through observational data and statistical analysis.

**Causation**: Indicates that one variable directly influences another. To establish causation, one usually needs to conduct controlled experiments or longitudinal studies where the influence of confounding factors is minimized.

### Example:

#### Correlation Example:
Suppose we find a positive correlation between ice cream sales and drowning incidents. As ice cream sales increase, the number of drowning incidents also increases. This observation might lead one to think that ice cream consumption causes drowning.

However, this is a correlation, not causation. The actual causative factor here could be **temperature**. During the summer months, higher temperatures lead to increased ice cream sales and more people swimming, which can result in more drowning incidents. Therefore, the correlation between ice cream sales and drowning incidents is influenced by the underlying variable—temperature.

#### Causation Example:
Imagine a study investigating the effect of a new medication on reducing blood pressure. If the study is well-designed with a control group and random assignment, and it finds that patients taking the medication experience a significant reduction in blood pressure compared to those in the control group, we can infer causation. In this case, the medication is the cause, and the reduction in blood pressure is the effect.
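
Why random assignment supports a causal reading can be sketched with simulated data. Every number here (baseline pressure, noise, and a true effect of -8) is a made-up assumption for illustration, not a result from any real study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical baseline systolic blood pressure for each patient
baseline = rng.normal(150, 10, size=n)

# Random assignment: exactly half of the patients receive the medication
treated = rng.permutation(n) < n // 2

# Assumed true effect of the medication on blood pressure
true_effect = -8.0
outcome = baseline + true_effect * treated + rng.normal(0, 5, size=n)

# Because assignment is random, the two groups are comparable on average,
# so the difference in group means estimates the causal effect.
estimated_effect = outcome[treated].mean() - outcome[~treated].mean()

print(f"Estimated effect: {estimated_effect:.2f}")
```

The estimate lands near the assumed true effect because randomization balances the baseline pressure (and any other patient characteristic) across the two groups.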




In [None]:
#Question 16

An **optimizer** in machine learning is an algorithm that adjusts the parameters of a model to minimize the loss function, thereby improving the model's performance. Optimizers are essential for training models, as they guide the learning process by updating weights and biases based on the gradients of the loss function.

### Types of Optimizers:

#### 1. **Gradient Descent (GD)**
Gradient Descent is the most basic optimization algorithm. It updates the model's parameters in the opposite direction of the gradient of the loss function with respect to the parameters. The learning rate (\(\alpha\)) controls the size of the steps taken.

**Example**:
```python
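# Gradient Descent Pseudo-code
for each epoch:
    # Batch gradient descent makes one update per epoch,
    # using the gradient computed over the entire training set
    gradient_of_loss = average gradient over all training examples
    weights = weights - learning_rate * gradient_of_loss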
```

#### 2. **Stochastic Gradient Descent (SGD)**
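SGD is a variant of Gradient Descent that updates the model's parameters using one training example at a time, rather than the entire dataset. Each update is therefore much cheaper to compute, which makes SGD suitable for large datasets, but the individual updates are noisier than those of full-batch Gradient Descent.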

**Example**:
```python
# Stochastic Gradient Descent Pseudo-code
for each epoch:
    for each training example:
        weights = weights - learning_rate * gradient_of_loss
```

#### 3. **Mini-Batch Gradient Descent**
This method is a compromise between GD and SGD. It updates the model's parameters using small batches of training data, providing a balance between the efficiency of SGD and the stability of GD.

**Example**:
```python
# Mini-Batch Gradient Descent Pseudo-code
for each epoch:
    for each mini-batch:
        weights = weights - learning_rate * gradient_of_loss
```
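
The three variants above differ only in how many examples feed each update, so they can be sketched as one runnable function with a `batch_size` argument. This is a minimal NumPy sketch on synthetic, noiseless data; the `train` function, the data, and the hyperparameter values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noiseless data generated from y = 2*x1 - 1*x2
X = rng.normal(size=(200, 2))
true_weights = np.array([2.0, -1.0])
y = X @ true_weights


def train(batch_size, epochs=200, learning_rate=0.05):
    """Minimise mean squared error using updates of the given batch size."""
    weights = np.zeros(X.shape[1])
    n_samples = len(X)
    for _ in range(epochs):
        order = rng.permutation(n_samples)  # reshuffle every epoch
        for start in range(0, n_samples, batch_size):
            batch = order[start:start + batch_size]
            error = X[batch] @ weights - y[batch]
            gradient_of_loss = 2 * X[batch].T @ error / len(batch)
            weights = weights - learning_rate * gradient_of_loss
    return weights


w_gd = train(batch_size=len(X))  # batch gradient descent: 1 update per epoch
w_sgd = train(batch_size=1)      # stochastic gradient descent: 200 updates per epoch
w_mini = train(batch_size=32)    # mini-batch gradient descent: 7 updates per epoch

print("GD:        ", w_gd)
print("SGD:       ", w_sgd)
print("Mini-batch:", w_mini)
```

On this easy problem all three recover weights close to `[2, -1]`; the practical differences show up in cost per update and in how noisy the path to the solution is.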

#### 4. **Momentum**
Momentum is an extension of SGD that accumulates a velocity vector in the direction of the gradient, smoothing updates and accelerating convergence.

**Example**:
```python
# Momentum Pseudo-code
velocity = 0
for each epoch:
    for each mini-batch:
        velocity = momentum * velocity - learning_rate * gradient_of_loss
        weights = weights + velocity
```

#### 5. **Adam (Adaptive Moment Estimation)**
Adam is an adaptive learning rate optimization algorithm that combines the advantages of both AdaGrad and RMSProp. It computes adaptive learning rates for each parameter.

**Example**:
```python
# Adam Pseudo-code
m = 0  # First moment estimate
v = 0  # Second moment estimate
t = 0  # Time step

for each epoch:
    for each mini-batch:
        t += 1
        g = gradient_of_loss
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * (g ** 2)
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        weights = weights - learning_rate * m_hat / (sqrt(v_hat) + epsilon)
```

#### 6. **RMSProp (Root Mean Square Propagation)**
RMSProp adapts the learning rate for each parameter by dividing the learning rate by an exponentially decaying average of squared gradients.

**Example**:
```python
# RMSProp Pseudo-code
cache = 0

for each epoch:
    for each mini-batch:
        g = gradient_of_loss
        cache = decay_rate * cache + (1 - decay_rate) * (g ** 2)
        weights = weights - learning_rate * g / (sqrt(cache) + epsilon)
```
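
The Momentum, Adam, and RMSProp update rules above can be turned into runnable NumPy code on a toy problem. This is a sketch under stated assumptions: the quadratic loss, the `target` vector, the `run_*` function names, and all hyperparameter values are chosen here for illustration and are not tuned recommendations.

```python
import numpy as np

# Toy problem: minimise f(w) = ||w - target||^2, whose gradient is 2 * (w - target)
target = np.array([3.0, -2.0])


def gradient_of_loss(weights):
    return 2 * (weights - target)


def run_momentum(steps=1000, learning_rate=0.05, momentum=0.9):
    weights, velocity = np.zeros(2), np.zeros(2)
    for _ in range(steps):
        velocity = momentum * velocity - learning_rate * gradient_of_loss(weights)
        weights = weights + velocity
    return weights


def run_rmsprop(steps=1000, learning_rate=0.01, decay_rate=0.9, epsilon=1e-8):
    weights, cache = np.zeros(2), np.zeros(2)
    for _ in range(steps):
        g = gradient_of_loss(weights)
        cache = decay_rate * cache + (1 - decay_rate) * g**2
        weights = weights - learning_rate * g / (np.sqrt(cache) + epsilon)
    return weights


def run_adam(steps=1000, learning_rate=0.05, beta1=0.9, beta2=0.999, epsilon=1e-8):
    weights, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
    for t in range(1, steps + 1):
        g = gradient_of_loss(weights)
        m = beta1 * m + (1 - beta1) * g      # first moment estimate
        v = beta2 * v + (1 - beta2) * g**2   # second moment estimate
        m_hat = m / (1 - beta1**t)           # bias correction
        v_hat = v / (1 - beta2**t)
        weights = weights - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return weights


print("Momentum:", run_momentum())
print("RMSProp: ", run_rmsprop())
print("Adam:    ", run_adam())
```

All three end up near `target`. RMSProp with a fixed learning rate tends to hover within a small distance of the minimum rather than settling exactly on it, since its step size does not shrink with the gradient.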



In [None]:
#Question 17

`sklearn.linear_model` is a module in the scikit-learn library that provides a wide range of linear models for regression and classification tasks. Linear models are a class of algorithms where the prediction is a linear combination of the input features. Here are some of the key models available within this module:

### 1. **Linear Regression**
Used for regression tasks, it models the relationship between the dependent variable and one or more independent variables by fitting a linear equation.

**Example**:
```python
from sklearn.linear_model import LinearRegression

# Initialize the model
model = LinearRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
```

### 2. **Ridge Regression**
Ridge regression is a regularized version of linear regression that adds an L2 penalty (proportional to the sum of squared coefficients) to the loss, which helps prevent overfitting by shrinking the coefficients toward zero.

**Example**:
```python
from sklearn.linear_model import Ridge

# Initialize the model
model = Ridge(alpha=1.0)

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
```

### 3. **Lasso Regression**
Lasso regression adds an L1 penalty (proportional to the sum of absolute coefficient values) that encourages sparsity: some coefficients are driven exactly to zero, which effectively performs feature selection.

**Example**:
```python
from sklearn.linear_model import Lasso

# Initialize the model
model = Lasso(alpha=0.1)

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
```

### 4. **Logistic Regression**
Used for binary and multiclass classification tasks, logistic regression models the probability of the output belonging to a particular class.

**Example**:
```python
from sklearn.linear_model import LogisticRegression

# Initialize the model
model = LogisticRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
```

### 5. **Elastic Net**
Elastic Net combines the penalties of both Ridge and Lasso regression, balancing between L1 and L2 regularization.

**Example**:
```python
from sklearn.linear_model import ElasticNet

# Initialize the model
model = ElasticNet(alpha=1.0, l1_ratio=0.5)

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
```

### 6. **Perceptron**
The Perceptron is a simple classification algorithm that makes predictions using a linear function. It's a basic form of neural network.

**Example**:
```python
from sklearn.linear_model import Perceptron

# Initialize the model
model = Perceptron()

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
```
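
The snippets above assume that `X_train`, `y_train`, and `X_test` already exist. A self-contained sketch that creates synthetic data and compares three of the regressors might look like the following; the dataset sizes, noise level, and `alpha` values are arbitrary assumptions chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression data: 10 features, only 3 of which carry signal
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=1.0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    n_zero = int((model.coef_ == 0).sum())  # coefficients set exactly to zero
    results[name] = (model.score(X_test, y_test), n_zero)
    print(f"{name}: R^2 = {results[name][0]:.3f}, zero coefficients = {n_zero}")
```

The count of zero coefficients is the thing to look at: Ridge shrinks coefficients but leaves them nonzero, while Lasso can remove uninformative features entirely.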



In [None]:
#Question 18

The `model.fit()` method is a crucial part of the machine learning workflow, and it is used to train a machine learning model on a given dataset. When you call `model.fit()`, the model learns from the training data by adjusting its internal parameters to minimize the error between its predictions and the actual target values.

### What `model.fit()` Does:
1. **Learning Process**: `model.fit()` takes in the training data and the corresponding target values (labels) and uses them to learn the underlying patterns in the data.
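2. **Parameter Adjustment**: During fitting, the model estimates its parameters (for example, weights and biases) so as to minimize a loss function, which measures the difference between the predicted and actual target values. Some models do this iteratively (for instance with gradient descent), while others, such as ordinary least squares in `LinearRegression`, solve for the parameters directly.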
3. **Model Training**: The method trains the model on the provided data, preparing it to make predictions on new, unseen data.

### Required Arguments:
The arguments that must be given to `model.fit()` depend on the type of model and the specific library being used. However, most models in scikit-learn require the following two arguments:

1. **X (Features)**: The training data, typically represented as a 2D array or DataFrame where each row is a sample and each column is a feature.
2. **y (Target)**: The target values (labels) corresponding to the training data, typically represented as a 1D array or Series.

### Example in Scikit-learn:
Here is an example of using `model.fit()` with a simple linear regression model:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Sample training data
X_train = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y_train = np.array([2, 3, 4, 5, 6])

# Initialize the model
model = LinearRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# The model is now trained and ready to make predictions
```

### Additional Arguments:
Some models may accept additional optional arguments in `model.fit()`, such as:
- `sample_weight`: An array of weights to apply to individual samples during training (supported by many scikit-learn estimators).
- `epochs`: The number of times to iterate over the training data. This is an argument of `fit()` in deep learning libraries such as Keras, not in scikit-learn.
- `batch_size`: The number of samples to use in each batch during training. Like `epochs`, this belongs to Keras-style `fit()` methods rather than scikit-learn.

These additional arguments allow for more fine-tuned control over the training process.
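
As an illustration of `sample_weight`, the sketch below fits the same linear regression twice, once unweighted and once with an outlier given zero weight. The data points and the weights are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Four points on the line y = x, plus one outlier at x = 5
X_train = np.array([[1], [2], [3], [4], [5]])
y_train = np.array([1.0, 2.0, 3.0, 4.0, 20.0])

# Unweighted fit: the outlier pulls the slope upward
unweighted = LinearRegression().fit(X_train, y_train)

# Weighted fit: a weight of zero makes the model ignore the outlier
weights = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
weighted = LinearRegression().fit(X_train, y_train, sample_weight=weights)

print("Unweighted slope:", unweighted.coef_[0])
print("Weighted slope:  ", weighted.coef_[0])
```

Note that `fit()` returns the fitted estimator itself, which is why the call can be chained directly after the constructor.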


In [None]:
#Question 19

The `model.predict()` method is used to generate predictions from a trained machine learning model. It takes input data (features) and returns the predicted output based on the model's learned parameters. Essentially, it applies the model to new data to make predictions.

### What `model.predict()` Does:
1. **Generates Predictions**: The method takes input data and uses the trained model to produce predictions. These predictions can be continuous values (for regression tasks) or class labels (for classification tasks).
2. **Applies Learned Parameters**: The model applies the parameters (weights and biases) it learned during training to the input data to calculate the predicted values.
3. **Inference**: The method is used during the inference phase, where the goal is to make predictions on new, unseen data.

### Required Arguments:
The main argument required by `model.predict()` is:
- **X (Features)**: The input data for which you want to generate predictions. This is typically represented as a 2D array or DataFrame where each row is a sample and each column is a feature.

### Example in Scikit-learn:
Here is an example of using `model.predict()` with a simple linear regression model:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Sample training data
X_train = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y_train = np.array([2, 3, 4, 5, 6])

# Sample test data
X_test = np.array([[6, 7], [7, 8], [8, 9]])

# Initialize the model
model = LinearRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions on the test data
predictions = model.predict(X_test)

print("Predictions:", predictions)
```

In this example:
- The model is trained using `X_train` and `y_train`.
- The `model.predict()` method is used to make predictions on `X_test`.
- The predicted values are stored in the `predictions` array.
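
For classification, `predict()` returns class labels rather than continuous values. The sketch below uses made-up one-feature data and also calls `predict_proba()`, which `LogisticRegression` provides for obtaining class probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature data: small values are class 0, large values class 1
X_train = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X_train, y_train)

X_new = np.array([[2.5], [8.5]])
labels = clf.predict(X_new)               # hard class labels
probabilities = clf.predict_proba(X_new)  # one column per class, rows sum to 1

print("Labels:", labels)
print("Probabilities:\n", probabilities)
```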


In [None]:
#Question 20

**Continuous and categorical variables** are types of data used in statistical analysis and machine learning, each serving different purposes:

### Continuous Variables
Continuous variables are a type of quantitative (numerical) variable that can take on an infinite number of values within a given range. They are measured on a continuous scale and often represent physical quantities or measurements. Examples include:
- **Temperature**: Can be measured in degrees Celsius or Fahrenheit and can take any value within a given range.
- **Height**: Can be measured in centimeters, inches, or any other unit, and can take any value within the range of human height.
- **Weight**: Can be measured in kilograms, pounds, or any other unit, and can take any value within the range of human or object weight.

### Categorical Variables
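Categorical variables, also known as qualitative variables, represent distinct categories or groups. They take on a limited number of values and are often used to represent labels or classes. They should not be confused with discrete variables, which are numerical but countable (for example, the number of children in a family). Examples include: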
- **Gender**: Categories such as male, female, and other.
- **Marital Status**: Categories such as single, married, divorced, and widowed.
- **Color**: Categories such as red, blue, green, and yellow.

### Key Differences:
- **Scale**: Continuous variables are measured on a continuous scale, while categorical variables represent distinct groups or categories.
- **Values**: Continuous variables can take an infinite number of values within a range, while categorical variables have a limited set of possible values.
- **Mathematical Operations**: Continuous variables can be subjected to arithmetic operations (e.g., addition, subtraction, multiplication, division), while categorical variables are typically analyzed using counting, frequency, and proportions.

### Practical Example:
Suppose we are analyzing a dataset of students' exam scores.
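- The **exam scores** (e.g., 75, 88, 92) would be treated as a continuous variable: although whole-number scores are strictly discrete, they are numeric, span a wide range, and support arithmetic such as averaging.
- The **grade levels** (e.g., freshman, sophomore, junior, senior) would be a categorical variable because they represent distinct categories (specifically an ordinal one, since the levels have a natural order).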
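
In pandas, the distinction shows up in the column dtypes and in which summaries are meaningful. The student records below are hypothetical, and the column names are chosen for this sketch.

```python
import pandas as pd

# Hypothetical student records
df = pd.DataFrame({
    "exam_score": [75.0, 88.5, 92.0, 61.5],
    "grade_level": ["freshman", "senior", "junior", "freshman"],
})

# Mark the grade level as an ordered categorical variable
levels = ["freshman", "sophomore", "junior", "senior"]
df["grade_level"] = pd.Categorical(df["grade_level"], categories=levels, ordered=True)

# Arithmetic summaries make sense for the continuous column
mean_score = df["exam_score"].mean()

# Counts are the natural summary for the categorical column
level_counts = df["grade_level"].value_counts()

print(df.dtypes)
print("Mean score:", mean_score)
print(level_counts)
```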


In [None]:
#Question 21

**Feature scaling** is a preprocessing technique used to normalize the range of independent variables or features in a dataset. It involves transforming the data so that the features are on a similar scale, typically within a fixed range such as [0, 1] or with a mean of 0 and a standard deviation of 1. This process ensures that no single feature dominates the others due to its scale, which can significantly impact the performance of machine learning models.

### Common Feature Scaling Techniques:

1. **Standardization (Z-score normalization)**
Standardization scales the features to have a mean of 0 and a standard deviation of 1.
- Formula: \( X' = \frac{X - \mu}{\sigma} \)
- Where \( \mu \) is the mean and \( \sigma \) is the standard deviation.

**Example**:
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```

2. **Min-Max Scaling (Normalization)**
Min-Max scaling scales the features to a fixed range, typically [0, 1].
- Formula: \( X' = \frac{X - X_{min}}{X_{max} - X_{min}} \)

**Example**:
```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
```

3. **Max Abs Scaling**
Max Abs scaling scales each feature by its maximum absolute value, preserving the sign and sparsity of the data.
- Formula: \( X' = \frac{X}{\max(|X|)} \)

**Example**:
```python
from sklearn.preprocessing import MaxAbsScaler

scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)
```

4. **Robust Scaling**
Robust scaling uses statistics that are robust to outliers, such as the median and the interquartile range (IQR).
- Formula: \( X' = \frac{X - \text{median}}{\text{IQR}} \)

**Example**:
```python
from sklearn.preprocessing import RobustScaler

scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)
```
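
To connect the four formulas to the scikit-learn classes, the following sketch computes each one by hand with NumPy and checks it against the corresponding scaler. The sample values are made up; note that `StandardScaler` uses the population standard deviation, which is also NumPy's default for `std`.

```python
import numpy as np
from sklearn.preprocessing import (MaxAbsScaler, MinMaxScaler,
                                   RobustScaler, StandardScaler)

# Hypothetical single-feature data with one negative and one large value
X = np.array([[-2.0], [0.0], [1.0], [3.0], [8.0]])

# Manual versions of the four formulas (computed per column)
standardized = (X - X.mean(axis=0)) / X.std(axis=0)
min_max = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
max_abs = X / np.abs(X).max(axis=0)
q1, median, q3 = np.percentile(X, [25, 50, 75], axis=0)
robust = (X - median) / (q3 - q1)

# Compare each manual result with the scikit-learn scaler
checks = {
    "StandardScaler": np.allclose(standardized, StandardScaler().fit_transform(X)),
    "MinMaxScaler": np.allclose(min_max, MinMaxScaler().fit_transform(X)),
    "MaxAbsScaler": np.allclose(max_abs, MaxAbsScaler().fit_transform(X)),
    "RobustScaler": np.allclose(robust, RobustScaler().fit_transform(X)),
}
print(checks)
```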

### How Feature Scaling Helps in Machine Learning:

1. **Improves Convergence Speed**:
   - Many optimization algorithms, such as gradient descent, converge faster when features are on a similar scale.

2. **Enhances Model Performance**:
   - Models like Support Vector Machines (SVM) and K-Nearest Neighbors (KNN) rely on distances between samples, so they are sensitive to the scale of the data. Feature scaling puts all features on a comparable footing, which often improves model performance.

3. **Reduces Bias**:
   - Unscaled features can bias the model towards features with larger scales, leading to suboptimal predictions. Scaling ensures that each feature contributes proportionately.

4. **Stabilizes Numerical Computations**:
   - Feature scaling can prevent numerical instability issues in algorithms that involve distance metrics or covariance matrices.

5. **Standardizes Interpretations**:
   - When features are scaled similarly, it becomes easier to interpret the coefficients of models like linear regression.

### Practical Example:
Let's say we have a dataset with height recorded in meters (roughly 1.5 to 2.0) and weight recorded in kilograms (roughly 40 to 120). Without scaling, a distance-based model such as KNN would be driven almost entirely by weight, simply because its values span a much larger numeric range. After scaling, both height and weight contribute on a comparable footing to the model's learning process.


In [None]:
#Question 22

To perform feature scaling in Python, you can use the `scikit-learn` library, which provides various scaling techniques such as StandardScaler, MinMaxScaler, MaxAbsScaler, and RobustScaler. Here are the steps to perform scaling using these techniques:

### 1. **StandardScaler**:
Standardizes features by removing the mean and scaling to unit variance.

```python
from sklearn.preprocessing import StandardScaler

# Sample data
X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

# Initialize the scaler
scaler = StandardScaler()

# Fit the scaler to the data and transform the data
X_scaled = scaler.fit_transform(X)

print("Standardized Data:\n", X_scaled)
```

### 2. **MinMaxScaler**:
Scales features to a specified range, typically between 0 and 1.

```python
from sklearn.preprocessing import MinMaxScaler

# Initialize the scaler
scaler = MinMaxScaler()

# Fit the scaler to the data and transform the data
X_scaled = scaler.fit_transform(X)

print("Normalized Data:\n", X_scaled)
```

### 3. **MaxAbsScaler**:
Scales each feature by its maximum absolute value, preserving the sign and sparsity of the data.

```python
from sklearn.preprocessing import MaxAbsScaler

# Initialize the scaler
scaler = MaxAbsScaler()

# Fit the scaler to the data and transform the data
X_scaled = scaler.fit_transform(X)

print("MaxAbs Scaled Data:\n", X_scaled)
```

### 4. **RobustScaler**:
Scales features using statistics that are robust to outliers, such as the median and the interquartile range (IQR).

```python
from sklearn.preprocessing import RobustScaler

# Initialize the scaler
scaler = RobustScaler()

# Fit the scaler to the data and transform the data
X_scaled = scaler.fit_transform(X)

print("Robust Scaled Data:\n", X_scaled)
```

### Practical Example:
Let's use a simple dataset and apply these scaling techniques:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler, MaxAbsScaler, RobustScaler

# Sample data as a DataFrame
data = {
    'Feature1': [1, 3, 5, 7, 9],
    'Feature2': [2, 4, 6, 8, 10]
}
df = pd.DataFrame(data)

# Initialize scalers
scalers = {
    'StandardScaler': StandardScaler(),
    'MinMaxScaler': MinMaxScaler(),
    'MaxAbsScaler': MaxAbsScaler(),
    'RobustScaler': RobustScaler()
}

# Apply each scaler and print the results
for scaler_name, scaler in scalers.items():
    scaled_data = scaler.fit_transform(df)
    print(f"\n{scaler_name}:\n", scaled_data)
```
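
When the data is split into training and test sets, a common practice is to fit the scaler on the training rows only and then reuse the learned statistics on the test rows. The sketch below shows that pattern with `StandardScaler`; the split sizes and `random_state` are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], dtype=float)
X_train, X_test = train_test_split(X, test_size=0.4, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training rows
X_test_scaled = scaler.transform(X_test)        # reuse the same mean and std

# The learned statistics are stored on the fitted scaler
print("Training means:", scaler.mean_)
print("Training stds: ", scaler.scale_)

# inverse_transform maps scaled values back to the original units
X_test_restored = scaler.inverse_transform(X_test_scaled)
print("Restored test rows:\n", X_test_restored)
```

Calling `fit_transform` on the test set instead would let information from the test data leak into preprocessing, which can make evaluation results look better than they should.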


In [None]:
#Question 23

`sklearn.preprocessing` is a module in the popular machine learning library, scikit-learn, that provides various methods and tools to preprocess and transform data. Preprocessing is a crucial step in the machine learning pipeline, as it involves preparing raw data into a suitable format for model training and evaluation.

## Functionalities

### 1. **Scaling and Normalization**
- **StandardScaler**: Standardizes features by removing the mean and scaling to unit variance.
- **MinMaxScaler**: Transforms features by scaling each feature to a given range, usually between 0 and 1.
- **MaxAbsScaler**: Scales each feature by its maximum absolute value, preserving the sign and sparsity of the data.
- **RobustScaler**: Scales features using statistics that are robust to outliers (e.g., median and interquartile range).

### 2. **Encoding Categorical Variables**
- **OneHotEncoder**: Converts categorical variables into a one-hot numeric array.
- **LabelEncoder**: Encodes categorical labels with integer values.
- **OrdinalEncoder**: Encodes categorical features as an ordinal array.

### 3. **Binarization**
- **Binarizer**: Converts numerical features to binary values based on a threshold.

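### 4. **Imputation**
Imputers handle missing values. In current scikit-learn releases they live in the separate `sklearn.impute` module rather than in `sklearn.preprocessing`:
- **SimpleImputer**: Replaces missing values using a specified strategy (e.g., mean, median, most frequent).
- **KNNImputer**: Imputes missing values using the k-Nearest Neighbors approach.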

### 5. **Generating Polynomial Features**
- **PolynomialFeatures**: Generates polynomial and interaction features, allowing for the creation of more complex feature sets.

### 6. **Discretization**
- **KBinsDiscretizer**: Discretizes continuous features into k bins using various strategies (e.g., uniform, quantile, k-means).
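
Three of the less familiar transformers listed above can be tried on a tiny array. This is a minimal sketch; the sample values, the threshold, and the number of bins are arbitrary assumptions.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, KBinsDiscretizer, PolynomialFeatures

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])

# Binarization: values strictly above the threshold become 1, the rest 0
binary = Binarizer(threshold=4.0).fit_transform(X)

# Polynomial features of degree 2 for columns a, b: 1, a, b, a^2, a*b, b^2
poly = PolynomialFeatures(degree=2).fit_transform(X)

# Discretization: two equal-width bins per feature, returned as bin indices
bins = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="uniform").fit_transform(X)

print("Binarized:\n", binary)
print("Polynomial features:\n", poly)
print("Bin indices:\n", bins)
```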

### Example Code:
Here's an example of how to use `StandardScaler` to scale features:

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Sample data
data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# Initialize the scaler
scaler = StandardScaler()

# Fit and transform the data
scaled_data = scaler.fit_transform(data)

print("Scaled Data:\n", scaled_data)
```




In [None]:
#Question 24

To split data for model fitting (training and testing) in Python, you can use the `train_test_split` function from the `scikit-learn` library. This function allows you to randomly divide the dataset into training and testing sets based on a specified ratio.

### Example:
```python
from sklearn.model_selection import train_test_split
import numpy as np

# Sample data
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([0, 1, 0, 1, 0])

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("X_train:", X_train)
print("X_test:", X_test)
print("y_train:", y_train)
print("y_test:", y_test)
```

### Explanation:
- **`train_test_split`**: This function from the `sklearn.model_selection` module is used to split arrays or matrices into random train and test subsets.
- **`X`**: The feature data, typically a 2D array where each row is a sample and each column is a feature.
- **`y`**: The target data (labels), typically a 1D array.
- **`test_size`**: The proportion of the dataset to include in the test split. In this example, 20% of the data is used for testing.
- **`random_state`**: Controls the shuffling applied to the data before splitting. Providing a specific value ensures reproducibility of the splits.

### Output:
- **`X_train`**: The training set features.
- **`X_test`**: The test set features.
- **`y_train`**: The training set labels.
- **`y_test`**: The test set labels.

This method helps ensure that your machine learning model is trained on one portion of the data and evaluated on another, unseen portion, which is crucial for assessing the model's performance and generalization ability.
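
Two behaviours of `train_test_split` are worth checking in code: a fixed `random_state` reproduces the same split, and the `stratify` argument keeps class proportions the same in both subsets. The imbalanced labels below are made up for this sketch.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 16 samples of class 0 and 4 of class 1
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 16 + [1] * 4)

# Same random_state -> identical split on every call
split_a = train_test_split(X, y, test_size=0.25, random_state=42)
split_b = train_test_split(X, y, test_size=0.25, random_state=42)
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))

# stratify=y keeps the 80/20 class ratio in both the train and test subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print("Reproducible:", same_split)
print("Class 1 in train:", y_train.sum(), "of", len(y_train))
print("Class 1 in test: ", y_test.sum(), "of", len(y_test))
```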


In [None]:
#Question 25

**Data encoding** is the process of transforming categorical data into a numerical format that can be used by machine learning algorithms. Many machine learning models require numerical input, so converting categorical variables (such as text labels) into numerical values is essential. There are several techniques for encoding categorical data, each with its own advantages and use cases.

### Common Data Encoding Techniques:

1. **One-Hot Encoding**:
One-hot encoding converts each category into a new binary column (0 or 1). This is useful when you have a small number of categories and want to avoid implying any ordinal relationship between them.

**Example**:
```python
from sklearn.preprocessing import OneHotEncoder
import pandas as pd

# Sample data
data = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red']})

# Initialize the encoder (`sparse_output` replaced the `sparse` argument in scikit-learn 1.2)
encoder = OneHotEncoder(sparse_output=False)

# Fit and transform the data
encoded_data = encoder.fit_transform(data[['Color']])

print(encoded_data)
```

Output:
```
[[0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]]
```

2. **Label Encoding**:
Label encoding assigns a unique integer to each category. This method can be efficient but may imply an ordinal relationship where none exists.

**Example**:
```python
from sklearn.preprocessing import LabelEncoder

# Sample data
data = ['Red', 'Blue', 'Green', 'Blue', 'Red']

# Initialize the encoder
encoder = LabelEncoder()

# Fit and transform the data
encoded_data = encoder.fit_transform(data)

print(encoded_data)
```

Output:
```
[2 0 1 0 2]
```

3. **Binary Encoding**:
Binary encoding first encodes the categories as integers and then converts those integers into binary code. Each binary digit is then split into separate columns.

**Example**:
```python
from category_encoders import BinaryEncoder
import pandas as pd

# Sample data
data = pd.DataFrame({'Category': ['A', 'B', 'C', 'A', 'B']})

# Initialize the encoder
encoder = BinaryEncoder()

# Fit and transform the data
encoded_data = encoder.fit_transform(data['Category'])

print(encoded_data)
```

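Output (the encoder numbers categories from 1 in order of appearance, so A, B, and C become binary 01, 10, and 11; exact details may vary by `category_encoders` version):
```
   Category_0  Category_1
0           0           1
1           1           0
2           1           1
3           0           1
4           1           0
```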

4. **Target Encoding**:
Target encoding replaces a categorical value with the mean of the target variable for that category. This method is useful for high-cardinality features and typically applies to supervised learning tasks.

**Example**:
```python
from category_encoders import TargetEncoder
import pandas as pd

# Sample data
data = pd.DataFrame({'Category': ['A', 'B', 'C', 'A', 'B'],
                     'Target': [1, 0, 1, 1, 0]})

# Initialize the encoder
encoder = TargetEncoder()

# Fit and transform the data
encoded_data = encoder.fit_transform(data['Category'], data['Target'])

print(encoded_data)
```

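Unsmoothed per-category means of the target are shown below. `TargetEncoder` applies smoothing by default, which pulls each value toward the overall target mean (0.6 here), so the numbers it actually prints will differ:
```
   Category
0       1.0
1       0.0
2       1.0
3       1.0
4       0.0
```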
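
The idea behind target encoding can be reproduced by hand with pandas. The sketch below computes the plain per-category mean of `Target` with no smoothing, which is a simplification of what `TargetEncoder` does; the `Encoded` column name is chosen here for illustration.

```python
import pandas as pd

# Same sample data as in the example above
data = pd.DataFrame({'Category': ['A', 'B', 'C', 'A', 'B'],
                     'Target': [1, 0, 1, 1, 0]})

# Mean of the target within each category (no smoothing)
category_means = data.groupby('Category')['Target'].mean()

# Replace each category with its mean
data['Encoded'] = data['Category'].map(category_means)

print(data)
```

In practice the means should be computed on the training data only; computing them on rows that are later used for evaluation leaks target information.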

5. **Frequency Encoding**:
Frequency encoding replaces each category with its frequency (or probability) of occurrence in the dataset.

**Example**:
```python
import pandas as pd

# Sample data
data = pd.DataFrame({'Category': ['A', 'B', 'C', 'A', 'B']})

# Calculate frequency encoding
encoding = data['Category'].value_counts() / len(data)

# Map encoding to the data
data['Encoded'] = data['Category'].map(encoding)

print(data)
```

Output:
```
  Category  Encoded
0        A      0.4
1        B      0.4
2        C      0.2
3        A      0.4
4        B      0.4
```

### Choosing the Right Encoding Technique:
The choice of encoding technique depends on the problem and dataset characteristics:
- Use **one-hot encoding** for variables with a small number of categories.
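- **Label encoding** is simple and can be used when the categories have an ordinal relationship, provided the assigned integers follow the real order. In scikit-learn, `LabelEncoder` is intended for target labels, while `OrdinalEncoder` is the equivalent for input features.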
- **Binary encoding** is effective for high-cardinality categorical variables.
- **Target encoding** leverages information from the target variable but may require regularization to prevent overfitting.
- **Frequency encoding** is useful when the category frequency provides meaningful information.
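
For ordinal categories, the integer codes should follow the real order rather than alphabetical order. One way to sketch this is with scikit-learn's `OrdinalEncoder` and an explicit category list; the `Size` column and its values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ordinal feature
data = pd.DataFrame({'Size': ['Medium', 'Small', 'Large', 'Small']})

# Spell out the order explicitly; by default categories are sorted alphabetically,
# which would put 'Large' before 'Medium' and 'Small'
encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
encoded = encoder.fit_transform(data[['Size']])

print(encoded.ravel())
```

With the explicit list, `Small`, `Medium`, and `Large` map to 0, 1, and 2, so the numeric order matches the meaning of the categories.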
