### **1. What is a parameter?**  
A **parameter** is a numerical value that defines a characteristic of a population or a Machine Learning model.  
- In statistics, parameters describe the entire population (e.g., population mean, standard deviation).  
- In ML, parameters are learned from training data (e.g., weights in neural networks, coefficients in linear regression).  
- Together, a model's learned parameters determine the predictions it makes.  
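
To make the ML sense of the term concrete, here is a minimal sketch that fits a straight line to a few made-up points by ordinary least squares. The slope and intercept it computes are the model's parameters. The data and the helper name `fit_line` are illustrative assumptions, not part of any library.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(f"learned parameters: slope={slope:.2f}, intercept={intercept:.2f}")
```

The two numbers are not set by hand; they are estimated from the data, which is what distinguishes parameters from settings chosen before training.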

---

### **2. What is correlation?**  
**Correlation** measures the strength and direction of the linear relationship between two variables. The correlation coefficient ranges from **-1 to +1**:  
- **+1** → Perfect positive correlation (both variables increase together).  
- **0** → No linear correlation (a non-linear relationship may still exist).  
- **-1** → Perfect negative correlation (one increases, the other decreases).  
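
The two extreme values can be reproduced with a short sketch of the Pearson correlation coefficient in plain Python. The helper name `pearson_r` and the sample data are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # increases in step with hours
falling = [10, 8, 6, 4, 2]    # decreases as hours increase

r_rising = pearson_r(hours, rising)
r_falling = pearson_r(hours, falling)
print(r_rising, r_falling)
```

Because both series are exact linear functions of `hours`, the coefficients land on the extremes; real data usually falls somewhere in between.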

---

### **3. What does negative correlation mean?**  
A **negative correlation** means that as one variable increases, the other decreases.  
- Example: As the price of a product increases, the sales decrease.  
- It is represented by a **correlation coefficient below 0**, down to a minimum of -1.  

---

### **4. Define Machine Learning. What are the main components in Machine Learning?**  
**Machine Learning (ML)** is a branch of AI that enables computers to learn from data and make predictions.  

**Main Components:**  
1. **Dataset** – The data used for training/testing.  
2. **Features** – Input variables used to make predictions.  
3. **Model** – The algorithm that learns patterns from data.  
4. **Loss Function** – Measures model error.  
5. **Optimizer** – Adjusts parameters to minimize loss.  
6. **Evaluation Metrics** – Measure model performance (e.g., accuracy for classification, RMSE for regression).  

---

### **5. How does loss value help in determining whether the model is good or not?**  
- The **loss value** quantifies the difference between predicted and actual values.  
- **Lower loss** → Better model performance.  
- **High loss** → Model needs improvement (e.g., better data, tuning hyperparameters).  
- A **good model** has low loss on both the training set and the test set.  
- **Overfitting** occurs if training loss is low but test loss is high.  
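
As a rough illustration of these points, the sketch below computes mean squared error (one common loss) on a training set and a test set. All numbers are hypothetical, and the factor of 10 used to flag a gap is an arbitrary threshold chosen for the example.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions for one model
y_train_true = [3.0, 5.0, 7.0, 9.0]
y_train_pred = [3.1, 4.9, 7.0, 9.1]   # very close on the training data
y_test_true = [4.0, 6.0, 8.0]
y_test_pred = [5.5, 4.0, 10.0]        # far off on unseen data

train_loss = mse(y_train_true, y_train_pred)
test_loss = mse(y_test_true, y_test_pred)
print(f"train MSE={train_loss:.4f}, test MSE={test_loss:.4f}")

# A large gap between the two losses is a typical sign of overfitting
if test_loss > 10 * train_loss:
    print("test loss is much higher than training loss: possible overfitting")
```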

---

### **6. What are continuous and categorical variables?**  
- **Continuous Variables** → Numeric variables that can take any value within a range (e.g., height, temperature).  
- **Categorical Variables** → Variables that take one of a fixed set of categories (e.g., gender, color).  
- Most ML models require categorical variables to be converted into numerical form.  

---

### **7. How do we handle categorical variables in Machine Learning? What are the common techniques?**  
Categorical variables need to be encoded for ML models.  

**Common Techniques:**  
1. **One-Hot Encoding** – Converts categories into binary columns (e.g., Male → [1,0], Female → [0,1]).  
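2. **Label Encoding** – Assigns an arbitrary integer to each category (e.g., Red → 1, Blue → 2). The integers can suggest an order that does not exist, so it is usually better suited to target labels than to input features.  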
3. **Ordinal Encoding** – Used for ranked categories (e.g., Small → 1, Medium → 2, Large → 3).  
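
The mechanics of one-hot and ordinal encoding can be sketched in plain Python. The helpers `one_hot` and `ordinal` are hypothetical names written for this example; in practice a library encoder would normally be used.

```python
def one_hot(values):
    """Map each value to a 0/1 vector with one slot per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def ordinal(values, order):
    """Map each value to its 1-based rank in an explicitly given order."""
    rank = {category: i + 1 for i, category in enumerate(order)}
    return [rank[v] for v in values]


colors = ["Red", "Blue", "Red", "Green"]
sizes = ["Small", "Large", "Medium"]

color_vectors = one_hot(colors)   # columns are ordered Blue, Green, Red
size_ranks = ordinal(sizes, ["Small", "Medium", "Large"])
print(color_vectors)
print(size_ranks)
```

One-hot encoding adds one column per category and implies no order, while ordinal encoding keeps a single column and relies on the order passed in.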

---

### **8. What do you mean by training and testing a dataset?**  
- **Training Set** → Used to train the ML model.  
- **Testing Set** → Used to evaluate the model's performance.  
- Evaluating on a held-out test set does not prevent **overfitting**, but it exposes it: an overfit model scores well on the training set and poorly on the test set.  

---

### **9. What is sklearn.preprocessing?**  
`sklearn.preprocessing` is a **Scikit-Learn** module for data preprocessing.  
It provides tools for:  
- **Scaling** (StandardScaler, MinMaxScaler)  
- **Encoding categorical variables** (OneHotEncoder, OrdinalEncoder, LabelEncoder)  

Missing-value imputation (`SimpleImputer`) is provided by the separate `sklearn.impute` module, not by `sklearn.preprocessing`.  
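
A minimal sketch of two of these tools in use, assuming scikit-learn and NumPy are installed. The `ages` and `cities` arrays are made-up data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up numeric feature (one column) and categorical feature (one column)
ages = np.array([[20.0], [30.0], [40.0]])
cities = np.array([["Paris"], ["Rome"], ["Paris"]])

# StandardScaler: subtract the column mean, divide by the column standard deviation
scaled_ages = StandardScaler().fit_transform(ages)

# OneHotEncoder returns a sparse matrix by default; toarray() makes it dense
encoded_cities = OneHotEncoder().fit_transform(cities).toarray()

print(scaled_ages.ravel())
print(encoded_cities)   # columns follow sorted category order: Paris, Rome
```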

---

### **10. What is a Test set?**  
The **Test set** is a portion of the dataset **not used for training** but for evaluating the model’s performance.  
It helps determine how well the model generalizes to new, unseen data.  

---

### **11. How do we split data for model fitting (training and testing) in Python?**  
Using **train_test_split** from `sklearn.model_selection`:  
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
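Here, `test_size=0.2` holds out **20% of the data** for testing, and `random_state=42` fixes the shuffle so the split is reproducible. `X` and `y` are assumed to be an existing feature matrix and target vector.  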

---

### **12. How do you approach a Machine Learning problem?**  
1. **Understand the problem** – Define the objective.  
2. **Collect data** – Gather and preprocess data.  
3. **EDA (Exploratory Data Analysis)** – Identify patterns, missing values, and outliers.  
4. **Feature Engineering** – Transform variables, handle missing data.  
5. **Split data** – Divide into training/testing sets.  
6. **Select a model** – Choose an appropriate ML algorithm.  
7. **Train the model** – Fit the model using training data.  
8. **Evaluate performance** – Use test data to check accuracy.  
9. **Optimize and Tune** – Improve the model by tuning hyperparameters.  
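
The steps above can be sketched end to end on the Iris dataset that ships with scikit-learn. This is one possible arrangement, not a prescribed workflow: the choice of logistic regression, the 80/20 split, and the use of a pipeline are all assumptions for the example, and EDA and hyperparameter tuning (steps 3 and 9) are left out to keep it short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Steps 1-2: the objective is to classify iris species; the data ships with scikit-learn
X, y = load_iris(return_X_y=True)

# Step 5: hold out 20% of the rows for testing, keeping class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Steps 4 and 6: scale the features, then use logistic regression as the model
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Step 7: fit on the training data only
model.fit(X_train, y_train)

# Step 8: evaluate on data the model has not seen
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the model in one pipeline means the scaler is fitted on the training rows only, which avoids leaking test-set statistics into training.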

---

### **13. Why do we have to perform EDA before fitting a model to the data?**  
EDA helps:  
- Identify missing values and outliers.  
- Understand feature distributions.  
- Find correlations between variables.  
- Select the most relevant features.  
- Improve model accuracy by preprocessing data properly.  
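
A few of these checks can be run with pandas in a handful of lines. The DataFrame below is made-up data containing one missing value and one obvious outlier, and the 1.5 × IQR rule is just one common rule of thumb for flagging outliers.

```python
import pandas as pd

# Made-up data with one missing age and one extreme income
df = pd.DataFrame({
    "age": [25, 32, 47, None, 51],
    "income": [40_000, 52_000, 61_000, 58_000, 900_000],
})

print(df.isna().sum())   # missing values per column
print(df.describe())     # count, mean, std, min, quartiles, max
print(df.corr())         # pairwise Pearson correlations

# Flag incomes more than 1.5 IQRs above the third quartile
q1, q3 = df["income"].quantile([0.25, 0.75])
outliers = df[df["income"] > q3 + 1.5 * (q3 - q1)]
print(outliers)
```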

---

### **14. How can you find correlation between variables in Python?**  
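Using the pandas `DataFrame.corr()` method:  
```python
import pandas as pd

# df is assumed to be an existing DataFrame with numeric columns
print(df.corr())   # pairwise Pearson correlation matrix
```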
Using `seaborn` heatmap:  
```python
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(df.corr(), annot=True, cmap="coolwarm")
plt.show()
```

---

### **15. What is causation? Explain difference between correlation and causation with an example.**  
- **Correlation** → Two variables tend to move together; this alone does not show that one causes the other.  
- **Causation** → One variable **directly affects** the other.  

**Example:**  
- **Correlation:** Ice cream sales & shark attacks both increase in summer, yet neither causes the other; hot weather drives both.  
- **Causation:** Increased temperature **causes** more ice cream sales.  
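
The example can be mimicked with hypothetical numbers. In the sketch below both series are computed from temperature alone, so neither depends on the other, yet their correlation is still very high. The formulas and values are invented purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly data: temperature is the common cause
temperature = [10, 14, 18, 22, 26, 30]
ice_cream_sales = [20 * t + 50 for t in temperature]   # depends only on temperature
shark_attacks = [0.5 * t - 3 for t in temperature]     # depends only on temperature

r = pearson_r(ice_cream_sales, shark_attacks)
print(f"correlation between sales and attacks: {r:.2f}")
```

A high coefficient here says nothing about one series causing the other; it only reflects the shared driver.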

---

### **16. What is an Optimizer? What are different types of optimizers?**  
An **optimizer** updates model parameters to minimize loss.  

**Types of Optimizers:**  
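1. **Gradient Descent** – Updates the weights in the direction opposite to the gradient of the loss, computed over the full training set.  
2. **SGD (Stochastic Gradient Descent)** – Computes the gradient on a single sample or a small mini-batch, so each update is cheaper but noisier.  
3. **Adam (Adaptive Moment Estimation)** – Adapts the step size per parameter using running averages of the gradient and its square; widely used as a default in deep learning.  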
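
Plain (batch) gradient descent is simple enough to sketch in a few lines. The example below fits `y = w * x + b` by minimizing mean squared error; the learning rate, step count, and data are assumptions picked so that this small problem converges.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Move each parameter a small step against its gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1

w, b = gradient_descent(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

SGD would differ only in computing the gradients from one sample or a small batch per step instead of the whole dataset.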

---

### **17. What is sklearn.linear_model?**  
`sklearn.linear_model` provides Scikit-Learn’s **linear models**, including:  
- **LinearRegression** (for regression tasks).  
- **LogisticRegression** (for classification tasks).  
- **Ridge** and **Lasso** (linear regression with L2 and L1 regularization, respectively).  
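
A short sketch comparing an unregularized and a regularized linear model on made-up data, assuming scikit-learn and NumPy are installed. The value `alpha=10.0` is deliberately large so that the shrinkage is easy to see.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Made-up data generated from y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # alpha sets the strength of the L2 penalty

print("LinearRegression slope:", plain.coef_[0])
print("Ridge slope:           ", ridge.coef_[0])
```

The Ridge slope comes out smaller than the unregularized one because the penalty pulls coefficients toward zero.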

---

### **18. What does model.fit() do? What arguments must be given?**  
**model.fit()** trains the model: it estimates the model's parameters from the training data and stores them on the model object.  

Example:  
```python
model.fit(X_train, y_train)
```
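Arguments (for supervised Scikit-Learn estimators):  
- `X_train` → Feature matrix of shape `(n_samples, n_features)`.  
- `y_train` → Target values, one per sample. Unsupervised estimators and transformers need only `X`.  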

---

### **19. What does model.predict() do? What arguments must be given?**  
**model.predict()** makes predictions on new data.  

Example:  
```python
y_pred = model.predict(X_test)
```
Arguments:  
- `X_test` → Feature matrix for prediction, with the same columns (in the same order) as the data used in `fit()`.  
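
For a runnable version, the sketch below trains a `LinearRegression` model and then predicts on new inputs. The data is made up, and the choice of model is only for illustration; other Scikit-Learn estimators follow the same fit-then-predict pattern.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data generated from y = 2x + 1
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])   # shape (n_samples, n_features)
y_train = np.array([1.0, 3.0, 5.0, 7.0])           # shape (n_samples,)

model = LinearRegression()
model.fit(X_train, y_train)          # learns coef_ and intercept_

X_test = np.array([[5.0], [10.0]])   # new inputs, same number of columns as X_train
y_pred = model.predict(X_test)
print(y_pred)
```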

---

### **20. What is Feature Scaling? How does it help in Machine Learning?**  
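Feature Scaling rescales features to a comparable range so that features with large numeric values do not dominate those with small ones.  
It matters most for **distance-based models** (e.g., KNN, SVM) and for models trained with gradient descent; tree-based models are largely unaffected.  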
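
The two most common rescaling formulas are easy to write out by hand. The helpers `standardize` and `min_max` below are hypothetical names for this sketch, and the income and age values are made up.

```python
import math


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


incomes = [30_000, 45_000, 60_000, 90_000]   # made-up values on a large scale
ages = [22, 35, 47, 58]                      # made-up values on a small scale

scaled_incomes = standardize(incomes)
scaled_ages = min_max(ages)
print(scaled_incomes)
print(scaled_ages)
```

After rescaling, a difference of one unit means roughly the same thing in both columns, which is what distance calculations implicitly assume.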

---

### **21. How do we perform Scaling in Python?**  
Using `StandardScaler` (rescales each feature to zero mean and unit variance):  
```python
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```
Using `MinMaxScaler` (rescales each feature to the range [0, 1] by default):  
```python
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
```
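
When the data is split into training and test sets, a common practice is to fit the scaler on the training rows only and reuse its statistics on the test rows, so that no information from the test set leaks into training. A minimal sketch with a made-up feature matrix:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: 10 rows, 2 columns on very different scales
X = np.column_stack([np.arange(10, dtype=float), np.arange(10, dtype=float) * 1000])

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # learn mean and std from training rows
X_test_scaled = scaler.transform(X_test)         # reuse those statistics, do not refit

print(X_train_scaled.mean(axis=0))   # approximately 0 for each column
print(X_train_scaled.std(axis=0))    # approximately 1 for each column
```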

---

### **22. Explain Data Encoding.**  
Data encoding converts categorical data into numerical form.  

**Types:**  
1. **Label Encoding** – Assigns numbers to categories.  
2. **One-Hot Encoding** – Converts categories into binary columns.  
3. **Ordinal Encoding** – Used for ordered categories.  
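
A short sketch of label and ordinal encoding with Scikit-Learn, using made-up category lists. Note that Scikit-Learn's encoders number categories from 0, and `LabelEncoder` assigns codes in sorted order of the class names.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# LabelEncoder numbers classes in sorted order, starting at 0
colors = ["Red", "Blue", "Red", "Green"]
color_codes = LabelEncoder().fit_transform(colors)
print(color_codes)   # Blue=0, Green=1, Red=2

# OrdinalEncoder expects 2-D input; `categories` fixes the intended ranking
sizes = [["Small"], ["Large"], ["Medium"]]
size_encoder = OrdinalEncoder(categories=[["Small", "Medium", "Large"]])
size_codes = size_encoder.fit_transform(sizes)
print(size_codes.ravel())   # Small=0, Medium=1, Large=2
```

Passing the order explicitly matters for ranked categories: without it, the encoder falls back to sorted order, which would place "Large" before "Medium".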

---
