### 1. **What is a parameter?**  
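In machine learning, a parameter is a value inside the model that the algorithm learns from data during training, such as the weights and biases of a neural network. Parameters differ from hyperparameters (for example, the learning rate), which are set before training rather than learned.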
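
As a minimal sketch of this idea, the snippet below fits scikit-learn's `LinearRegression` to made-up data generated from y = 3x + 2. The data and variable names are assumptions chosen for illustration; the fitted slope and intercept are the learned parameters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (an assumption for illustration): y = 3x + 2 with no noise.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 3.0 * X.ravel() + 2.0

model = LinearRegression()
model.fit(X, y)

# The fitted weight and bias are the parameters learned during training.
weight = model.coef_[0]
bias = model.intercept_
print(f"weight={weight:.2f}, bias={bias:.2f}")
```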

---

### 2. **What is correlation?**  
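Correlation measures the strength and direction of the relationship between two variables, showing whether they move together (positive correlation) or in opposite directions (negative correlation). The Pearson correlation coefficient ranges from -1 to +1, with 0 indicating no linear relationship.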

---

### 3. **What does negative correlation mean?**  
Negative correlation means that as one variable increases, the other decreases. For example, an increase in exercise time may correlate with a decrease in weight.
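
The exercise example can be sketched with NumPy. The numbers below are invented for illustration only; `np.corrcoef` computes the Pearson coefficient, which should come out close to -1 for this data.

```python
import numpy as np

# Made-up weekly exercise hours and body weight (kg) for six people.
exercise_hours = np.array([1, 2, 3, 4, 5, 6])
weight_kg = np.array([90, 86, 83, 80, 76, 73])

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is Pearson's r.
r = np.corrcoef(exercise_hours, weight_kg)[0, 1]
print(f"Pearson r = {r:.3f}")
```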

---

### 4. **Define Machine Learning. What are the main components in Machine Learning?**  
Machine learning is the practice of building algorithms that learn patterns from data and use those patterns to make predictions, instead of following hand-written rules.  
Main components:  
1. **Data**: Input data to train the model.  
2. **Features**: Relevant input variables for prediction.  
3. **Model**: Algorithm for learning patterns.  
4. **Training**: Learning phase in which the model's parameters are fitted to the data (labeled data in supervised learning).  
5. **Evaluation**: Measuring model performance on data not used for training.  

---

### 5. **How does the loss value help in determining whether the model is good or not?**  
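The loss value quantifies how far the model's predictions are from the actual values, so on the same data a lower loss means a closer fit. To judge whether a model is good, look at the loss on held-out (validation or test) data: a low training loss combined with a much higher validation loss indicates overfitting.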
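
To make this concrete, here is a small sketch using mean squared error (MSE) as the loss. MSE is only one possible choice of loss, and the prediction values are invented for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 7.0]
close_preds = [2.5, 5.5, 7.0]  # errors: 0.5, -0.5, 0.0
far_preds = [0.0, 9.0, 4.0]    # errors: 3.0, -4.0, 3.0

loss_close = mse(y_true, close_preds)
loss_far = mse(y_true, far_preds)
print(loss_close, loss_far)
```

The predictions that sit closer to the true values produce the smaller loss.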

---

### 6. **What are continuous and categorical variables?**  
- **Continuous variables**: Numerical values that can take any value within a range (e.g., height, weight).  
- **Categorical variables**: Values that fall into a limited set of categories or labels (e.g., colors, gender).  

---

### 7. **How do we handle categorical variables in Machine Learning? What are the common techniques?**  
Common techniques include:  
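1. **One-Hot Encoding**: Converts each category into its own binary (0/1) column.  
2. **Label Encoding**: Assigns a unique integer to each category; the integers can imply an order that the categories do not actually have.  
3. **Target Encoding**: Replaces each category with the mean of the target variable for that category.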
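
A quick way to try the first two techniques is with pandas alone. This is a sketch on an invented `color` column, not the only way to do it.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)

# Label (integer) encoding: each category is mapped to an integer code.
# Codes follow the sorted category order and carry no real ranking.
labels = df["color"].astype("category").cat.codes

print(one_hot)
print(labels.tolist())
```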

---

### 8. **What do you mean by training and testing a dataset?**  
- **Training dataset**: Used to train the model.  
- **Testing dataset**: Used to evaluate the model's performance.

---

### 9. **What is `sklearn.preprocessing`?**  
`sklearn.preprocessing` is a module in Scikit-learn used for feature scaling, normalization, encoding, and other preprocessing tasks.

---

### 10. **What is a Test set?**  
A test set is a portion of the dataset used to evaluate the model's generalization performance on unseen data.

---

### 11. **How do we split data for model fitting (training and testing) in Python?**  
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

---

### 12. **How do you approach a Machine Learning problem?**  
1. Define the problem.  
2. Collect and clean data.  
3. Perform Exploratory Data Analysis (EDA).  
4. Feature engineering and preprocessing.  
5. Choose a model.  
6. Train and validate the model.  
7. Tune hyperparameters on the validation data if needed.  
8. Test the final model once on the held-out test set.  
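
The workflow above can be sketched end to end on scikit-learn's built-in iris dataset. The dataset, the scaler, and the logistic regression model are assumptions picked for illustration; a real project would substitute its own data and model.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Problem and data: classify iris species; this dataset is already clean.
X, y = load_iris(return_X_y=True)

# Hold out a test set before any fitting happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocessing and model chained into one pipeline.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Validation: 5-fold cross-validation on the training data only.
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)

# Final check: fit on all training data, then score once on the test set.
pipeline.fit(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)
print(f"CV accuracy: {cv_scores.mean():.3f}, test accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline means it is refit within each cross-validation fold, so no information from the validation folds leaks into preprocessing.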

---

### 13. **Why do we have to perform EDA before fitting a model to the data?**  
EDA helps identify patterns, detect anomalies, and understand relationships between variables, which improves model selection and feature engineering.
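
A few pandas calls cover a first pass of EDA. The sketch below uses a small invented DataFrame with one missing value and one implausible age; the column names and the plausibility range are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with a missing value and an implausible age.
df = pd.DataFrame({
    "age": [25, 32, 47, np.nan, 230],
    "income": [40_000, 52_000, 81_000, 60_000, 58_000],
})

summary = df.describe()               # count, mean, std, min, quartiles, max
missing_per_column = df.isna().sum()  # number of missing values per column

# Flag ages outside a plausible range as candidate anomalies.
suspicious = df[(df["age"] < 0) | (df["age"] > 120)]
print(summary, missing_per_column, suspicious, sep="\n\n")
```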



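---

### 16. **How can you find correlation between variables in Python?**  
```python
import pandas as pd

# df is assumed to be an existing pandas DataFrame
# numeric_only=True skips non-numeric columns (pandas 1.5 or later)
correlation_matrix = df.corr(numeric_only=True)
print(correlation_matrix)
```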
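
For a self-contained version, the sketch below builds a small invented DataFrame and computes the full correlation matrix, a single pairwise correlation, and a rank-based (Spearman) matrix. It assumes pandas 1.5 or later for the `numeric_only` argument.

```python
import pandas as pd

# Made-up data; "city" is non-numeric and is left out of the calculation.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 70, 78],
    "hours_gaming": [9, 7, 6, 4, 2],
    "city": ["A", "B", "A", "C", "B"],
})

corr = df.corr(numeric_only=True)                  # Pearson by default
pair = df["hours_studied"].corr(df["exam_score"])  # one pair of columns
rank_corr = df.corr(method="spearman", numeric_only=True)
print(corr)
```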

---

### 17. **What is causation? Explain the difference between correlation and causation with an example.**  
- **Causation**: A change in one variable directly produces a change in another.  
- **Difference**: Correlation only shows that two variables move together; it does not show that one causes the other. Establishing causation requires ruling out other explanations, such as a third variable that influences both.  
Example: Ice cream sales and drowning incidents correlate but do not cause each other; both are influenced by summer heat.
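
The ice cream example can be simulated. In the sketch below all numbers are synthetic assumptions: temperature drives both quantities, and neither affects the other. The raw correlation is strong, but it largely disappears once the linear effect of temperature is removed from both variables (a partial correlation).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Simulated confounder: daily temperature drives both quantities.
temperature = rng.uniform(10, 35, size=n)
ice_cream_sales = 20 * temperature + rng.normal(0, 40, size=n)
drownings = 0.3 * temperature + rng.normal(0, 1.0, size=n)

def residuals(y, x):
    """Remove the linear effect of x from y using a least-squares line."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)

# Raw correlation: strong, although neither variable affects the other.
raw_r = np.corrcoef(ice_cream_sales, drownings)[0, 1]

# Partial correlation: correlate what is left after accounting for temperature.
partial_r = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]
print(f"raw r = {raw_r:.2f}, partial r = {partial_r:.2f}")
```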

---

### 18. **What is an Optimizer? What are different types of optimizers? Explain each with an example.**  
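An optimizer is the algorithm that updates the model's parameters to minimize the loss function.  
Examples:  
1. **SGD (Stochastic Gradient Descent)**: Updates weights using the gradient computed on a single random sample or a small mini-batch instead of the full dataset.  
2. **Adam**: Combines momentum and adaptive learning rates for faster convergence.  
3. **RMSprop**: Scales the learning rate of each parameter by a moving average of its recent squared gradients, which helps prevent overshooting.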
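
The update rules behind these three optimizers can be sketched in plain Python on a one-dimensional toy loss, f(w) = (w - 3)^2. This is an illustration under simplifying assumptions: the gradient is exact rather than estimated from a mini-batch (so the `sgd` function is really plain gradient descent), and the step counts and learning rates are arbitrary choices.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, which is minimized at w = 3."""
    return 2.0 * (w - 3.0)

def sgd(w, steps=500, lr=0.05):
    # Step against the gradient with a fixed learning rate.
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def rmsprop(w, steps=500, lr=0.05, decay=0.9, eps=1e-8):
    avg_sq = 0.0  # moving average of squared gradients
    for _ in range(steps):
        g = grad(w)
        avg_sq = decay * avg_sq + (1 - decay) * g ** 2
        w = w - lr * g / (math.sqrt(avg_sq) + eps)
    return w

def adam(w, steps=500, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    m, v = 0.0, 0.0  # moving averages of the gradient and squared gradient
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd, w_rmsprop, w_adam = sgd(0.0), rmsprop(0.0), adam(0.0)
print(w_sgd, w_rmsprop, w_adam)
```

All three end up near w = 3. With a constant learning rate, RMSprop tends to hover in a small neighbourhood of the minimum rather than settle on it exactly, which is one reason learning-rate schedules are often paired with it.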

---

### 19. **What is `sklearn.linear_model`?**  
It is a module in Scikit-learn for linear models like linear regression, logistic regression, and ridge regression.

---

### 20. **What does `model.fit()` do? What arguments must be given?**  
`model.fit()` trains the model on the data. It takes the input data (`X`) and, for supervised models, the target variable (`y`):  
```python
model.fit(X_train, y_train)
```

---

### 21. **What does `model.predict()` do? What arguments must be given?**  
`model.predict()` generates predictions using the trained model. It takes the input data (`X`):  
```python
predictions = model.predict(X_test)
```
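
Putting `fit` and `predict` together, here is a runnable sketch on synthetic data. The dataset from `make_classification` and the choice of `LogisticRegression` are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression()
fitted = model.fit(X_train, y_train)         # fit returns the estimator itself
predictions = model.predict(X_test)          # one predicted label per test row
probabilities = model.predict_proba(X_test)  # one probability per class and row
print(predictions[:5])
```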

---
### 23. **What is feature scaling? How does it help in Machine Learning?**  
Feature scaling brings the independent variables (features) onto a comparable range so that features with large numeric values do not dominate those with small ones. It matters most for algorithms that are sensitive to feature magnitudes, such as SVMs, k-NN, and models trained with gradient descent.  
- **Standardization**: Transforms each feature to have a mean of 0 and a standard deviation of 1.  
- **Normalization**: Rescales each feature to a fixed range, typically [0, 1].
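
Both transformations are short formulas. The NumPy sketch below applies them column by column to an invented two-feature array, so the arithmetic is visible without any scaler class.

```python
import numpy as np

# Two features on very different scales (illustrative values).
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0]])

# Standardization: subtract the column mean, divide by the column std.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max normalization: map each column onto [0, 1].
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(X_std)
print(X_minmax)
```

After either transformation the two columns become identical, even though the raw values differed by a factor of 1000.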

---


### 24. **How do we perform scaling in Python?**  
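We use Scikit-learn's preprocessing module for scaling. Fit the scaler on the training data only and reuse it on the test data, so that test-set statistics do not leak into training:  
1. **Standardization** using `StandardScaler`:  
   ```python
   from sklearn.preprocessing import StandardScaler

   scaler = StandardScaler()
   X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data
   X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
   ```  

2. **Normalization** using `MinMaxScaler`:  
   ```python
   from sklearn.preprocessing import MinMaxScaler

   scaler = MinMaxScaler()
   X_train_normalized = scaler.fit_transform(X_train)  # learn min and max from training data
   X_test_normalized = scaler.transform(X_test)        # reuse the training range
   ```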

---

### 25. **What is `sklearn.preprocessing`?**  
`sklearn.preprocessing` is a module in Scikit-learn that provides tools for transforming features and preprocessing data. It includes:  
- Scaling: `StandardScaler`, `MinMaxScaler`.  
- Encoding: `OneHotEncoder`, `OrdinalEncoder`, `LabelEncoder`.  
- Polynomial feature transformation: `PolynomialFeatures`.  

Missing-value imputation lives in the separate `sklearn.impute` module (for example, `SimpleImputer`).
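
As one example from this module, the sketch below expands two features into their degree-2 polynomial terms with `PolynomialFeatures`. The input values are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features, a = 2 and b = 3.
X = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Output columns are: 1, a, b, a^2, a*b, b^2
print(X_poly)
```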

---

### 26. **How do we split data for model fitting (training and testing) in Python?**  
We use `train_test_split` from `sklearn.model_selection`:  
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
- **`test_size`**: Proportion of the dataset used for testing (e.g., 20%).  
- **`random_state`**: Ensures reproducibility.
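
A small sketch on invented arrays shows what the two arguments do: `test_size=0.2` sends 2 of 10 samples to the test set, and repeating the call with the same `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same random_state, same split.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)
```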

---

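### 27. **Explain data encoding.**  
Data encoding converts categorical variables into numerical format so they can be used in machine learning models.  
Types of encoding:  
1. **Label Encoding**: Assigns a unique integer to each category. Scikit-learn intends `LabelEncoder` for target labels; `OrdinalEncoder` is the counterpart for input features.  
   ```python
   from sklearn.preprocessing import LabelEncoder

   # categorical_data is assumed to be a 1-D array-like of category labels
   encoder = LabelEncoder()
   encoded = encoder.fit_transform(categorical_data)
   ```  

2. **One-Hot Encoding**: Converts categories into binary columns.  
   ```python
   import numpy as np
   from sklearn.preprocessing import OneHotEncoder

   encoder = OneHotEncoder()
   # OneHotEncoder expects 2-D input, so reshape the 1-D data into a single column
   column = np.asarray(categorical_data).reshape(-1, 1)
   one_hot_encoded = encoder.fit_transform(column).toarray()
   ```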

3. **Target Encoding**: Replaces categories with the mean of the target variable for each category.
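
Target encoding can be sketched with a pandas `groupby`. The `city` and `price` columns are invented for illustration. In practice the category means should be computed on the training data only and then mapped onto the test data, otherwise the target leaks into the features.

```python
import pandas as pd

# Hypothetical data: "city" is the categorical feature, "price" the target.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B", "C"],
    "price": [100, 200, 300, 500, 400, 250],
})

# Mean of the target for each category.
category_means = df.groupby("city")["price"].mean()

# Replace each category with its mean target value.
df["city_encoded"] = df["city"].map(category_means)
print(df)
```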

