# Feature Engineering

---

**Q1. What is a parameter?**  
A parameter is an internal variable of a machine learning model that is learned from the training data. In models like linear regression, parameters are the coefficients (weights) and intercepts that define the relationship between input features and output predictions. They are optimized during the training process using algorithms like gradient descent.
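
As a minimal sketch of what "learned from the training data" means, the snippet below fits scikit-learn's `LinearRegression` to synthetic data generated from y = 3x + 2 (made-up values, chosen only for illustration) and reads back the learned weight and intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative assumption): y = 3x + 2 with no noise
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

model = LinearRegression()
model.fit(X, y)

# The learned parameters: one weight per feature plus an intercept
weight = model.coef_[0]
intercept = model.intercept_
print(f"weight={weight:.2f}, intercept={intercept:.2f}")
```

Note that `LinearRegression` solves the least-squares problem directly rather than running gradient descent; the parameters are learned from data either way.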

---

**Q2. What is correlation? What does negative correlation mean?**  
Correlation measures the strength and direction of the statistical relationship between two variables. The most common measure, the Pearson correlation coefficient, captures linear association and ranges from -1 to +1. A positive value means that as one variable increases, the other tends to increase; a value near 0 means little or no linear relationship.  
Negative correlation means the two variables tend to move in opposite directions: as one increases, the other tends to decrease. For example, if grades tend to fall as time spent watching TV rises, the two variables are negatively correlated. A correlation of -1 represents a perfect negative linear relationship.
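
The TV-versus-grades example can be sketched with NumPy. The numbers below are hypothetical, invented only to show a negative Pearson coefficient.

```python
import numpy as np

# Hypothetical data: hours of TV per day and exam grade for six students
tv_hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
grades = np.array([90.0, 85.0, 80.0, 70.0, 65.0, 60.0])

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r(tv_hours, grades)
r = np.corrcoef(tv_hours, grades)[0, 1]
print(f"Pearson r = {r:.3f}")

# An exactly linear decreasing relationship gives r = -1
r_perfect = np.corrcoef(tv_hours, -2.0 * tv_hours + 100.0)[0, 1]
```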

---

**Q3. Define Machine Learning. What are the main components in Machine Learning?**  
Machine Learning is a subset of AI that enables systems to learn from data without being explicitly programmed.  
**Main components:**
- **Data**: Raw input used to learn patterns.
- **Model**: The algorithm or function being trained (e.g., decision tree).
- **Loss function**: Measures how far the model's predictions are from the true values.
- **Optimizer**: Adjusts model parameters to minimize loss.
- **Evaluation Metrics**: Measure model performance.

---

**Q4. How does loss value help in determining whether the model is good or not?**  
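The loss value quantifies the error between predicted and actual values, so a lower loss means the predictions are closer to the targets. Training loss alone is not enough to judge a model, though: compare it with the loss on held-out (validation or test) data. If training loss keeps falling while validation loss rises, the model is overfitting. Common loss functions: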
- **MSE (Mean Squared Error)** for regression
- **Cross-entropy** for classification  
During training, optimizers minimize this loss to improve predictions.
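
To make the two loss functions concrete, here is a small NumPy sketch. The helper names `mse` and `binary_cross_entropy` and all input values are illustrative, not part of any library.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error for regression."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy; p_pred holds predicted probabilities of class 1."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Made-up targets with a close and a poor set of predictions for each loss
good_mse = mse([3.0, 5.0], [2.5, 5.5])
bad_mse = mse([3.0, 5.0], [0.0, 9.0])
good_bce = binary_cross_entropy([1, 0], [0.9, 0.1])
bad_bce = binary_cross_entropy([1, 0], [0.4, 0.6])
print(good_mse, bad_mse, good_bce, bad_bce)
```

In both cases the predictions closer to the targets receive the lower loss.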

---

**Q5. What are continuous and categorical variables?**  
- **Continuous Variables**: Numeric values that can take any value within a range (e.g., height, temperature).
- **Categorical Variables**: Take one of a limited set of categories or labels (e.g., gender, color).  
Handling both types correctly is key for effective modeling.

---

**Q6. How do we handle categorical variables in Machine Learning? What are the common techniques?**  
Categorical variables are handled by encoding them into numeric values:
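- **Label Encoding**: Maps each category to an integer. The integers imply an order that may not exist, so use it with care for unordered features.
- **One-Hot Encoding**: Creates one binary column per category.
- **Ordinal Encoding**: Maps categories to integers that follow a meaningful order (e.g., small < medium < large).  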
Use `sklearn.preprocessing` or `pandas.get_dummies()` for these techniques.
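
A short sketch of two of these techniques, using a hypothetical DataFrame with an unordered column (`color`) and an ordered one (`size`):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: "color" has no natural order, "size" does
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["small", "large", "medium", "small"],
})

# One-hot encoding: one binary column per color
one_hot = pd.get_dummies(df["color"], prefix="color")

# Ordinal encoding: pass the category order explicitly
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_codes = encoder.fit_transform(df[["size"]]).ravel()

print(one_hot)
print(size_codes)
```

Passing `categories` explicitly matters here: without it, `OrdinalEncoder` sorts the categories alphabetically, which would not match the intended small-to-large order.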

---

**Q7. What do you mean by training and testing a dataset?**  
- **Training**: The portion of the data the model learns from.
- **Testing**: The held-out portion used to evaluate the model’s performance on unseen data.  
Splitting does not make a model generalize by itself; it lets you check whether the model generalizes to new data rather than just memorizing the training data.
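
The gap between memorizing and generalizing can be illustrated with a small experiment. This is a sketch on synthetic data (random features, 20% of labels deliberately flipped); the model choice is an assumption made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (illustrative): the label depends on one feature, with 20% of labels flipped
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] > 0).astype(int)
flip = rng.random(400) < 0.2
y[flip] = 1 - y[flip]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# An unrestricted tree can memorize the training set, noise included
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")
```

Only the test accuracy reveals that the tree has fitted the noise; the training accuracy alone would look perfect.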

---

**Q8. What is `sklearn.preprocessing`?**  
`sklearn.preprocessing` is a module in Scikit-learn used for transforming features. Key tools:
- `StandardScaler`, `MinMaxScaler` – Feature scaling
- `LabelEncoder`, `OneHotEncoder` – Categorical encoding
- `PolynomialFeatures` – Creating polynomial and interaction terms

---

**Q9. What is a test set?**  
A test set is a subset of the data used only to evaluate the performance of a trained model. It should not be used during training to prevent data leakage.

---

**Q10. How do we split data for model fitting (training and testing) in Python? How do you approach a Machine Learning problem?**  
Use `train_test_split`:
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

To approach a machine learning problem:
1. Understand the problem and data
2. Collect and clean the dataset
3. Perform EDA (exploratory data analysis)
4. Engineer and select features
5. Encode and scale features
6. Split the data
7. Train various models
8. Evaluate and tune the best model
9. Deploy and monitor
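
Steps 5 through 8 can be sketched with a scikit-learn `Pipeline`. The dataset, column names, and model choice below are hypothetical and far too small for real conclusions; the point is only the order of operations.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy dataset: predict whether a customer buys (1) or not (0)
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63, 41, 38, 26, 57, 33, 45],
    "city": ["A", "B", "A", "C", "B", "C", "A", "B", "C", "A", "B", "C"],
    "bought": [0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1],
})
X, y = df[["age", "city"]], df["bought"]

# Step 6: split first so preprocessing is fitted on training rows only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Step 5: scale numeric columns, one-hot encode categorical columns
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Steps 7-8: train a model and evaluate it on held-out data
clf = Pipeline([("prep", preprocess), ("model", LogisticRegression())])
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
test_accuracy = clf.score(X_test, y_test)
print(predictions, test_accuracy)
```

Wrapping the preprocessing in the pipeline means the scaler and encoder are fitted only on the training split, which avoids leaking test-set statistics into training.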

---

**Q11. Why do we have to perform EDA before fitting a model to the data?**  
EDA (Exploratory Data Analysis) helps in:
- Understanding variable distributions
- Detecting missing values, outliers
- Visualizing relationships
- Informing feature selection  
It ensures better modeling decisions and avoids poor assumptions.
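
A few of these checks can be sketched in pandas. The DataFrame is hypothetical, and the 1.5 × IQR rule is one common outlier heuristic among several.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one missing value and one obvious outlier
df = pd.DataFrame({
    "age": [25, 32, 47, np.nan, 38, 29],
    "income": [40_000, 52_000, 61_000, 58_000, 1_000_000, 45_000],
})

summary = df.describe()    # count, mean, std, and quartiles per column
missing = df.isna().sum()  # number of missing values per column

# Flag outliers with the 1.5 * IQR rule
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)]

print(summary)
print(missing)
print(outliers)
```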

---

**Q12. What is correlation?**  
(Already answered in Q2)

---

**Q13. What does negative correlation mean?**  
(Already answered in Q2)

---

**Q14. How can you find correlation between variables in Python?**  
Use `.corr()` in pandas:
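```python
import pandas as pd
# `df` is assumed to be an existing DataFrame; numeric_only=True skips non-numeric columns
corr_matrix = df.corr(numeric_only=True)
```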
Visualize with seaborn:
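```python
import seaborn as sns
# `df` is assumed to be an existing DataFrame; numeric_only=True skips non-numeric columns
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
```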

---

**Q15. What is causation? Explain difference between correlation and causation with an example.**  
- **Correlation**: Two variables tend to change together.
- **Causation**: A change in one variable directly produces a change in the other.

**Example**:  
Correlation: Ice cream sales and drowning incidents both rise in summer.  
Causation: Hot weather drives both; ice cream sales do not cause drownings.
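
The ice cream example can be simulated. In this sketch (all coefficients are made up), temperature drives both quantities, so they correlate strongly even though neither affects the other; once the linear effect of temperature is removed, the correlation largely disappears.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated confounder: daily temperature drives both quantities (made-up coefficients)
temperature = rng.normal(25.0, 5.0, size=n)
ice_cream_sales = 10.0 * temperature + rng.normal(0.0, 20.0, size=n)
drownings = 0.5 * temperature + rng.normal(0.0, 1.0, size=n)

# Raw correlation looks strong even though neither variable affects the other
raw_r = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(values, confounder):
    """Remove the linear effect of the confounder from values."""
    slope, intercept = np.polyfit(confounder, values, 1)
    return values - (slope * confounder + intercept)

# Correlation after controlling for temperature (a partial correlation)
partial_r = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(f"raw r = {raw_r:.2f}, partial r = {partial_r:.2f}")
```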

---

**Q16. What is an Optimizer? What are different types of optimizers? Explain each with an example.**  
An optimizer updates model parameters to minimize loss.

**Types:**
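- **SGD**: Stochastic gradient descent; updates parameters using the gradient computed on a mini-batch (or a single sample) of the data.
- **Adam**: Combines momentum with per-parameter adaptive learning rates.
- **RMSprop**: Scales each parameter's learning rate by a moving average of its squared gradients.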

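```python
import tensorflow as tf
# `model` is assumed to be an existing tf.keras model
# Other built-in optimizer names include 'sgd' and 'rmsprop'
model.compile(optimizer='adam', loss='binary_crossentropy')
```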
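
To show how the three update rules differ, here is a simplified pure-Python sketch that minimizes the toy loss f(w) = (w - 3)². It uses the exact gradient, so there is no mini-batch sampling, and the hyperparameter values are assumptions chosen for this toy problem rather than recommended defaults.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, which is minimized at w = 3."""
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=500):
    for _ in range(steps):
        w -= lr * grad(w)  # plain step against the gradient
    return w

def rmsprop(w, lr=0.01, steps=1000, decay=0.9, eps=1e-8):
    # Steps are normalized by recent gradient magnitude, so a smaller lr is used here
    avg_sq = 0.0
    for _ in range(steps):
        g = grad(w)
        avg_sq = decay * avg_sq + (1 - decay) * g * g  # moving average of squared gradients
        w -= lr * g / (math.sqrt(avg_sq) + eps)
    return w

def adam(w, lr=0.1, steps=500, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # momentum: moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g  # moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

results = {"sgd": sgd(0.0), "rmsprop": rmsprop(0.0), "adam": adam(0.0)}
print(results)
```

All three end up near w = 3; they differ in how the step size is derived from the gradient history.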

---

**Q17. What is `sklearn.linear_model`?**  
This module contains linear models like:
- `LinearRegression`
- `LogisticRegression`
- `Ridge`, `Lasso`  
`LinearRegression`, `Ridge`, and `Lasso` are regressors; `LogisticRegression`, despite its name, is a classifier.

---

**Q18. What does `model.fit()` do? What arguments must be given?**  
`model.fit()` trains the model: it estimates the model's parameters from the training data.
```python
model.fit(X_train, y_train)
```
Arguments:
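- `X_train`: feature matrix of shape `(n_samples, n_features)`
- `y_train`: target values (labels), one per sample; unsupervised estimators do not need it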

---

**Q19. What does `model.predict()` do? What arguments must be given?**  
Generates predictions from the trained model.
```python
predictions = model.predict(X_test)
```
Arguments:
- `X_test`: feature matrix with the same columns and preprocessing as the training data
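
A self-contained sketch of `fit()` followed by `predict()`, using made-up data (hours studied versus pass/fail) and `LogisticRegression` as an example estimator:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data (illustrative): hours studied -> passed (1) or failed (0)
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [6.0], [7.0], [8.0], [9.0]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_test = np.array([[0.5], [9.5]])

model = LogisticRegression()
model.fit(X_train, y_train)  # learn parameters from features and labels

predictions = model.predict(X_test)          # one class label per row of X_test
probabilities = model.predict_proba(X_test)  # shape (n_samples, n_classes)
print(predictions)
print(probabilities.round(3))
```

`predict()` takes only features; the labels are what it returns.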

---

**Q20. What are continuous and categorical variables?**  
(Already answered in Q5)

---

**Q21. What is feature scaling? How does it help in Machine Learning?**  
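Feature scaling puts numeric features on a comparable scale, either by standardizing them (zero mean, unit variance) or by normalizing them to a fixed range such as [0, 1]. This keeps features with large numeric ranges from dominating distance-based models (e.g., k-NN, SVM) and regularized models, and it helps gradient-based optimizers converge faster.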

---

**Q22. How do we perform scaling in Python?**  
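```python
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# X_train and X_test are assumed to come from train_test_split (see Q10).
# Fit on the training split only, then reuse the same statistics on the test split.
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```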
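
To see what scaling does to the numbers, this sketch applies `StandardScaler` and `MinMaxScaler` to two made-up features with very different ranges:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales (made-up values): age in years, income in dollars
X = np.array([[25.0, 40_000.0],
              [35.0, 60_000.0],
              [45.0, 80_000.0]])

# Standardization: (x - mean) / std, giving each column mean 0 and std 1
X_std = StandardScaler().fit_transform(X)

# Min-max normalization: (x - min) / (max - min), mapping each column to [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)

print(X_std.round(3))
print(X_minmax)
```

After either transform, the age and income columns carry identical values, because the raw columns differ only in scale and offset.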

---

**Q23. What is `sklearn.preprocessing`?**  
(Already answered in Q8)

---

**Q24. How do we split data for model fitting (training and testing) in Python?**  
(Already answered in Q10)

---

**Q25. Explain data encoding.**  
Data encoding converts categorical variables into a numerical format that ML models can understand.

**Types:**
- Label Encoding
- One-Hot Encoding
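- Binary Encoding  
Label and one-hot encoding are available through `sklearn.preprocessing` and `pandas.get_dummies()`; binary encoding needs a third-party package such as `category_encoders`.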
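
A sketch of the scikit-learn encoders on made-up values. It also shows one practical detail: with `handle_unknown="ignore"`, a category never seen during fitting is encoded as an all-zero row instead of raising an error.

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Made-up training and test categories; "purple" never appears in training
train_colors = [["red"], ["green"], ["blue"], ["green"]]
test_colors = [["blue"], ["purple"]]

# One-hot encoding for a feature column; unknown categories become all-zero rows
encoder = OneHotEncoder(handle_unknown="ignore")
train_encoded = encoder.fit_transform(train_colors).toarray()
test_encoded = encoder.transform(test_colors).toarray()
categories = encoder.categories_[0].tolist()

# Label encoding, which scikit-learn intends for target labels rather than features
label_codes = LabelEncoder().fit_transform(["cat", "dog", "cat", "bird"])

print(categories)
print(test_encoded)
print(label_codes)
```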

---
