# **Assignment: Feature Engineering**

---

### **1. What is a parameter?**

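A **parameter** is an internal value of a model that is **learned from training data** and used to make predictions, such as the **weights** and bias in a linear regression model. Parameters differ from **hyperparameters** (e.g., the learning rate), which are set before training rather than learned.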
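
To make this concrete, here is a minimal sketch that fits a scikit-learn `LinearRegression` on a small synthetic dataset and reads back the learned parameters. The data (generated from `y = 3x + 2`) and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data generated from y = 3x + 2 (illustrative assumption)
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 3.0 * X.ravel() + 2.0

model = LinearRegression()
model.fit(X, y)

weight = model.coef_[0]   # learned slope (a parameter)
bias = model.intercept_   # learned intercept (a parameter)
print(f"weight={weight:.2f}, bias={bias:.2f}")
```

Because the data has no noise, the learned weight and bias should match the values used to generate it.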

---

### **2. What is correlation?**

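**Correlation** measures the strength and direction of the **relationship** between two variables, that is, how one changes with respect to the other. The most common measure, the Pearson correlation coefficient, captures linear relationships and ranges from **-1 to +1**: values near +1 or -1 indicate a strong relationship, and values near 0 indicate little or no linear relationship.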
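
As a rough illustration, the Pearson coefficient can be computed by hand and compared with NumPy's `np.corrcoef`. The helper `pearson_r` and the sample values are hypothetical, made up for this sketch.

```python
import numpy as np

# Made-up sample data for illustration
hours_studied = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
exam_score = np.array([52.0, 58.0, 65.0, 70.0, 78.0])

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    numerator = (x_centered * y_centered).sum()
    denominator = np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum())
    return numerator / denominator

r = pearson_r(hours_studied, exam_score)
r_numpy = np.corrcoef(hours_studied, exam_score)[0, 1]
print(f"manual r={r:.3f}, numpy r={r_numpy:.3f}")
```

Both values should agree, and here they are close to +1 because the scores rise almost linearly with study hours.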

---

### **3. What does negative correlation mean?**

A **negative correlation** means that as one variable **increases**, the other tends to **decrease**. For example, hours of weekly exercise may be negatively correlated with body weight.

---

### **4. Define Machine Learning. What are the main components in Machine Learning?**

**Machine Learning (ML)** is a field of AI where machines learn from **data** to make decisions or predictions.

Main components:

* **Data**
* **Model**
* **Loss Function**
* **Optimizer**
* **Evaluation Metrics**

---

### **5. How does loss value help in determining whether the model is good or not?**

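The **loss value** measures how far the model's predictions are from the actual values, as computed by the loss function. A **lower loss** generally indicates a better fit, but it should also be checked on validation or test data: a very low training loss combined with a high validation loss is a sign of overfitting.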
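
The sketch below computes mean squared error (one common loss function, chosen here as an assumption) for two hypothetical sets of predictions, to show how the loss ranks them.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 7.0, 9.0]
preds_model_a = [2.5, 5.5, 7.0, 8.0]    # hypothetical predictions, close to the truth
preds_model_b = [1.0, 8.0, 4.0, 12.0]   # hypothetical predictions, far from the truth

loss_a = mean_squared_error(y_true, preds_model_a)  # 0.375
loss_b = mean_squared_error(y_true, preds_model_b)  # 7.75
print(loss_a, loss_b)
```

Model A has the lower loss, so on this data it fits better than model B.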

---

### **6. What are continuous and categorical variables?**

* **Continuous**: Numerical values that can take any value within a range (e.g., height, salary)
* **Categorical**: Values drawn from a fixed set of groups or labels (e.g., gender, country)

---

### **7. How do we handle categorical variables in Machine Learning? What are the common techniques?**

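Most models require numeric input, so categorical variables are converted to numbers first. Common techniques:

* **Label Encoding**: maps each category to an integer; in scikit-learn, `LabelEncoder` is intended for target labels
* **One-Hot Encoding**: creates one binary column per category; suited to unordered categories
* **Ordinal Encoding**: maps categories to integers that follow a meaningful order (e.g., small < medium < large)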
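
The three techniques above can be sketched with scikit-learn's encoders. The `sizes` column and its category order are assumptions made up for this example.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# One categorical column with three possible values (illustrative data)
sizes = np.array([["small"], ["large"], ["medium"], ["small"]], dtype=object)

# Label encoding: integers assigned in sorted (alphabetical) class order
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(sizes.ravel())

# Ordinal encoding: integers follow the order we pass in explicitly
ordinal_encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
ordinal = ordinal_encoder.fit_transform(sizes)

# One-hot encoding: one binary column per category (sparse by default)
onehot_encoder = OneHotEncoder()
onehot = onehot_encoder.fit_transform(sizes).toarray()

print(labels)
print(ordinal.ravel())
print(onehot)
```

Note that label encoding assigns `large=0, medium=1, small=2`, which does not reflect the natural size order; passing the order explicitly to `OrdinalEncoder` avoids that.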

---

### **8. What do you mean by training and testing a dataset?**

* **Training**: The model learns patterns from the training portion of the data
* **Testing**: The trained model is evaluated on unseen data to estimate how well it generalizes

---

### **9. What is sklearn.preprocessing?**

`sklearn.preprocessing` is a module in **scikit-learn** for **data transformation** tasks like scaling, encoding, and normalization.

---

### **10. What is a Test set?**

The **Test set** is a portion of the dataset used **only for evaluating** the model’s performance after training.

---

### **11. How do we split data for model fitting (training and testing) in Python?**

Using `train_test_split` from `sklearn.model_selection`:

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
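
For a version that runs on its own, the sketch below builds a tiny synthetic dataset (an assumption for illustration) and checks the resulting shapes. The `stratify=y` argument is optional; it keeps the class proportions the same in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Tiny synthetic dataset: 10 samples, 2 features, balanced binary labels
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```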

---

### **12. How do you approach a Machine Learning problem?**

Steps:

1. Understand the problem
2. Collect and clean data
3. Perform EDA (Exploratory Data Analysis)
4. Feature Engineering
5. Train model
6. Evaluate and tune
7. Deploy
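
The steps above can be sketched end to end with scikit-learn. This is a minimal illustration on the bundled iris dataset, not a complete workflow: the dataset, the choice of model, and the single accuracy metric are assumptions, and deployment is left out.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Steps 1-2: the problem is classifying iris species; the data is already clean
X, y = load_iris(return_X_y=True)

# Step 3 (minimal EDA): check the size of the data
print("samples, features:", X.shape)

# Hold out a test set before any fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Steps 4-5: scale the features, then train a classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Step 6: evaluate on the held-out data
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the model in one pipeline means the scaler is fitted on the training data only.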

---

### **13. Why do we have to perform EDA before fitting a model to the data?**

EDA helps to:

* Understand **data distribution**
* Detect **missing values/outliers**
* Choose the right **features and models**

---

### **14. How can you find correlation between variables in Python?**

Using **Pandas**:

```python
import pandas as pd

# df is an existing DataFrame with numeric columns; Pearson correlation is the default
df.corr()
```

Or using **Seaborn**:

```python
import seaborn as sns
sns.heatmap(df.corr(), annot=True)
```
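
A self-contained sketch of the Pandas approach, using a small made-up DataFrame (the column names and values are assumptions):

```python
import pandas as pd

# Illustrative data: study time, exam score, and gaming time for five students
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 70, 78],
    "hours_gaming": [9, 7, 6, 4, 2],
})

corr_matrix = df.corr()  # pairwise Pearson correlation

studied_vs_score = corr_matrix.loc["hours_studied", "exam_score"]
studied_vs_gaming = corr_matrix.loc["hours_studied", "hours_gaming"]
print(corr_matrix.round(2))
```

In this data, `hours_studied` is strongly positively correlated with `exam_score` and strongly negatively correlated with `hours_gaming`.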

---

### **15. What is causation? Explain difference between correlation and causation with an example.**

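* **Correlation**: Two variables tend to change together.
* **Causation**: A change in one variable directly **causes** a change in the other.

Correlation does not imply causation.

Example:

* Correlation without causation: Ice cream sales ↑ and drowning cases ↑ together, but neither causes the other; both increase due to hot weather (a third, confounding variable).
* Causation: Hot weather causes ice cream sales to rise.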
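
The ice cream example can be simulated to show how a third variable produces correlation without causation. All numbers below are invented for the simulation. Temperature drives both quantities, and once its effect is removed (by correlating the residuals of simple linear fits), the apparent relationship should largely disappear.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical generating process: temperature drives both variables
temperature = rng.normal(25.0, 5.0, n)
ice_cream_sales = 10.0 * temperature + rng.normal(0.0, 20.0, n)
drownings = 0.5 * temperature + rng.normal(0.0, 1.0, n)

# Raw correlation between the two outcomes
raw_corr = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(y, x):
    """Part of y that a straight-line fit on x does not explain."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlation after removing the effect of temperature from both variables
partial_corr = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(f"raw: {raw_corr:.2f}, controlling for temperature: {partial_corr:.2f}")
```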

---

### **16. What is an Optimizer? What are different types of optimizers? Explain each with an example.**

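An **Optimizer** is the algorithm that updates the model's parameters to minimize the **loss function**, typically using the gradient of the loss.

Common types:

* **SGD** (Stochastic Gradient Descent): updates parameters using the gradient computed on a single sample or mini-batch, scaled by a fixed learning rate
* **Adam** (Adaptive Moment Estimation): keeps running averages of the gradient and its square to adapt the step size for each parameter
* **RMSprop**: divides the learning rate by a running average of recent squared gradients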

Example:

```python
import torch  # assumes `model` is an existing torch.nn.Module
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```
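
To show what an optimizer does internally, here is a minimal plain-Python sketch of the gradient descent and Adam update rules on a toy one-parameter loss, `(w - 3)^2`. The loss, the learning rates, and the function names are assumptions for illustration. The gradient here is exact rather than estimated from mini-batches, so the "SGD" loop shows only the update rule, and RMSprop is omitted for brevity.

```python
import math

def gradient(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def run_sgd(w=0.0, lr=0.1, steps=500):
    # Gradient descent update: step against the gradient with a fixed learning rate
    for _ in range(steps):
        w -= lr * gradient(w)
    return w

def run_adam(w=0.0, lr=0.1, steps=500, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0  # running averages of the gradient and the squared gradient
    for t in range(1, steps + 1):
        g = gradient(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd = run_sgd()
w_adam = run_adam()
print(w_sgd, w_adam)
```

Both runs should end close to `w = 3`, the minimum of the toy loss.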

---

### **17. What is sklearn.linear\_model?**

It’s a module in scikit-learn for **linear models** like:

* `LinearRegression`
* `LogisticRegression`

---

### **18. What does model.fit() do? What arguments must be given?**

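`model.fit()` trains the model: it learns the model's parameters from the training data.

Arguments (for supervised scikit-learn estimators):

* `X_train`: feature matrix of shape `(n_samples, n_features)`
* `y_train`: target values, one per sample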

Example:

```python
model.fit(X_train, y_train)
```

---

### **19. What does model.predict() do? What arguments must be given?**

It **predicts output** for new input data using the parameters learned by `fit()`.

Arguments:

* `X_test`: input features with the same columns as the training data

Example:

```python
predictions = model.predict(X_test)
```
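
A runnable sketch of `fit()` followed by `predict()`, using `LogisticRegression` from `sklearn.linear_model` on a tiny made-up dataset (the data and the class boundary are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One feature; small values belong to class 0, large values to class 1
X_train = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
fitted = model.fit(X_train, y_train)  # fit() returns the estimator itself

# New inputs must have the same number of features as the training data
X_test = np.array([[0.0], [9.0]])
predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)  # one column per class

print(predictions)
```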

---

### **20. What are continuous and categorical variables?**

(Same as Q6)

* **Continuous**: Measurable quantities
* **Categorical**: Classes or categories

---

### **21. What is feature scaling? How does it help in Machine Learning?**

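**Feature scaling** transforms features so that they share a **comparable scale** (for example, zero mean and unit variance, or a fixed range such as 0 to 1). It prevents features with large numeric ranges from dominating distance-based models such as k-nearest neighbors, and it speeds up convergence for models trained with gradient descent.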
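
The two most common scaling formulas, standardization (`(x - mean) / std`) and min-max scaling (`(x - min) / (max - min)`), can be sketched directly in NumPy. The age and salary values are made up for illustration.

```python
import numpy as np

# Two features on very different scales: age (years) and salary
X = np.array([
    [25.0, 30000.0],
    [35.0, 60000.0],
    [45.0, 90000.0],
])

# Standardization: each column gets zero mean and unit variance
standardized = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max scaling: each column is mapped to the range [0, 1]
min_max = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(standardized.round(3))
print(min_max)
```

After either transformation, age and salary contribute on the same scale even though their raw values differ by several orders of magnitude.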

---

### **22. How do we perform scaling in Python?**

Using `StandardScaler`:

```python
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```

Or `MinMaxScaler`:

```python
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
```
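
When the data is split into training and test sets, a common practice is to fit the scaler on the training set only and reuse its statistics on the test set, so that no information from the test data leaks into training. A minimal sketch with made-up values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from the training data
X_test_scaled = scaler.transform(X_test)        # reuses the training mean and std

print(scaler.mean_, X_test_scaled)
```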

---

### **23. What is sklearn.preprocessing?**

(Repeated — same as Q9)

It includes tools to **scale**, **normalize**, and **encode** data for ML models.

---

### **24. How do we split data for model fitting (training and testing) in Python?**

(Repeated — same as Q11)

---

### **25. Explain data encoding.**

**Data encoding** converts **categorical variables into numerical form**.

Common techniques:

* **Label Encoding**
* **One-Hot Encoding**

Example using Pandas:

```python
import pandas as pd
pd.get_dummies(df, columns=['Category'])  # one binary column per category value
```
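
A self-contained version of the same call, with a small made-up DataFrame (the `Category` and `price` values are assumptions). Passing `dtype=int` yields 0/1 columns instead of booleans.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["red", "green", "red"],
    "price": [10, 12, 9],
})

# Replace the 'Category' column with one 0/1 column per distinct value
encoded = pd.get_dummies(df, columns=["Category"], dtype=int)
print(encoded)
```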

---



### **26. What does negative correlation mean?**

(Repeated, same as Q3)

A **negative correlation** means that as one variable increases, the other decreases; they move in **opposite directions**.

