Q1 What is a Parameter?

A **parameter** is an internal variable in a model that the algorithm learns from data during training.

Example:
- In Linear Regression → `y = mx + c`, the parameters are `m` (slope) and `c` (intercept).
- In Neural Networks → weights and biases are parameters.
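As a minimal sketch of parameters being learned from data, the snippet below fits scikit-learn's `LinearRegression` to points generated from `y = 3x + 2` and reads the learned slope and intercept back from the fitted model. The data and the values 3 and 2 are illustrative assumptions, not part of the original notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data generated from y = 3x + 2 (assumed for this example)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 3 * X.ravel() + 2

model = LinearRegression().fit(X, y)

# The learned parameters live on the fitted model
m = model.coef_[0]      # slope
c = model.intercept_    # intercept
print(f"m = {m:.2f}, c = {c:.2f}")
```

The values of `m` and `c` are not set by hand; `fit` estimates them from `X` and `y`.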


Q2 What is Correlation?

**Correlation** measures how two numerical variables move together.

- Range: `-1 ≤ r ≤ +1`
- `+1`: Perfect positive correlation
- `0`: No linear correlation
- `-1`: Perfect negative correlation
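To make the range above concrete, here is a small sketch that computes the Pearson coefficient directly from its definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The helper name `pearson_r` and the sample arrays are assumptions made for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation computed from its definition (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # y rises with x
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # y falls as x rises
print(r_pos, r_neg)
```

The result should agree with `np.corrcoef(x, y)[0, 1]`, which is the more usual way to get the same number.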


Q2 What does Negative Correlation mean?

A negative correlation means that as one variable increases, the other decreases.

Example: As price increases, demand decreases.


Q3 Define Machine Learning. What are the main components in Machine Learning?

Machine Learning (ML) is a branch of AI where computers learn from data without being explicitly programmed.

Main Components:
1. Data
2. Model
3. Parameters
4. Loss Function
5. Optimizer
6. Evaluation Metrics
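The six components above can be seen together in a hand-written training loop. This is only a sketch under assumed values: the data follow `y = 2x + 1`, and the learning rate and step count are picked for illustration.

```python
import numpy as np

# 1. Data (illustrative, generated from y = 2x + 1)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# 3. Parameters, initialised to zero
m, c = 0.0, 0.0
learning_rate = 0.05  # assumed value

for _ in range(2000):
    y_pred = m * x + c                 # 2. Model: a straight line
    error = y_pred - y
    loss = np.mean(error ** 2)         # 4. Loss function: mean squared error
    grad_m = 2 * np.mean(error * x)    # gradient of the loss w.r.t. m
    grad_c = 2 * np.mean(error)        # gradient of the loss w.r.t. c
    m -= learning_rate * grad_m        # 5. Optimizer: plain gradient descent
    c -= learning_rate * grad_c

# 6. Evaluation metric: mean absolute error of the fitted line
mae = np.mean(np.abs((m * x + c) - y))
print(f"m = {m:.3f}, c = {c:.3f}, MAE = {mae:.6f}")
```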


Q4 How does loss value help in determining whether the model is good or not?

- Loss quantifies the difference between predicted and actual values.
- Lower loss, measured with the same loss function on the same data, generally means a better fit.
- High training loss → underfitting; low training loss but high validation loss → overfitting.


Example: Calculate MSE loss

```python
from sklearn.metrics import mean_squared_error
y_true = [3, 5, 7]
y_pred = [2.8, 5.2, 7.1]
print(mean_squared_error(y_true, y_pred))
```
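The underfitting and overfitting pattern described above can be illustrated by comparing training and validation loss for a too-simple and a too-flexible model. The sketch below uses `np.polyfit` on noisy samples of a sine curve; the data, noise level, and polynomial degrees are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_sine(x):
    # Illustrative ground truth plus Gaussian noise
    return np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

x_train = np.linspace(0.0, 1.0, 8)
x_val = np.linspace(0.05, 0.95, 8)
y_train = noisy_sine(x_train)
y_val = noisy_sine(x_val)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

results = {}
for degree in (1, 7):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_loss = mse(y_train, np.polyval(coeffs, x_train))
    val_loss = mse(y_val, np.polyval(coeffs, x_val))
    results[degree] = (train_loss, val_loss)
    print(f"degree {degree}: train MSE = {train_loss:.4f}, validation MSE = {val_loss:.4f}")
```

The degree-1 line cannot follow the curve, so its training loss stays high (underfitting). The degree-7 polynomial passes through all eight training points, so its training loss is near zero while its validation loss is not (overfitting).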


Q5 What are Continuous and Categorical Variables?

| Type | Description | Example |
|------|--------------|----------|
| **Continuous** | Numeric values that can take any value within a range | Height, Salary |
| **Categorical** | Labels that represent groups or categories | Gender, City |


Q6 How do we handle categorical variables in Machine Learning? What are the common techniques?

Common techniques:
1. Label Encoding
2. One-Hot Encoding
3. Target Encoding




In [1]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green']})
le = LabelEncoder()
df['Label'] = le.fit_transform(df['Color'])
print(df)

   Color  Label
0    Red      2
1   Blue      0
2  Green      1
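The cell above covers label encoding. The other two techniques from the list can be sketched with plain pandas, as below. The `Price` column and its values are made up for illustration, and the target encoding shown is the simplest variant (the per-category mean of the target). In practice those means would be computed on the training split only, to avoid leaking the target.

```python
import pandas as pd

# Illustrative data; 'Price' plays the role of the target
df = pd.DataFrame({
    "Color": ["Red", "Blue", "Green", "Blue"],
    "Price": [10.0, 20.0, 30.0, 40.0],
})

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(df["Color"], prefix="Color")

# Target encoding: replace each category with the mean target for that category
target_means = df.groupby("Color")["Price"].mean()
df["Color_target"] = df["Color"].map(target_means)

print(one_hot)
print(df)
```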


Q7 What do you mean by training and testing a dataset?

- Training dataset: Used to train the model.
- Testing dataset: Used to evaluate performance on unseen data.


Q8 What is sklearn.preprocessing?

`sklearn.preprocessing` is a module for transforming data — scaling, encoding, normalization, etc.


Q9 What is a Test set?

A test set is the portion of data reserved for final evaluation to check generalization performance.


Q10 How do we split data for model fitting (training and testing) in Python?




In [2]:
from sklearn.model_selection import train_test_split
X, y = [[1], [2], [3], [4], [5]], [0, 1, 0, 1, 0]  # sample data; X and y must exist before splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Q10 How do you approach a Machine Learning problem?

1. Understand the problem
2. Collect & clean data
3. Perform EDA
4. Preprocess (encode, scale)
5. Split data
6. Choose model
7. Train & evaluate
8. Tune hyperparameters
9. Test & deploy
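Steps 4 to 9 above can be sketched end to end with scikit-learn. The choices here are assumptions for illustration only: the built-in Iris dataset, a `StandardScaler` plus `LogisticRegression` pipeline, and a small grid over the regularisation strength `C`. Deployment is out of scope for the snippet.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load data (collection and cleaning are trivial for this built-in dataset)
X, y = load_iris(return_X_y=True)

# Split data, keeping class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocess and choose a model: scaling and classifier in one pipeline
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Train and tune hyperparameters with 5-fold cross-validation on the training set
search = GridSearchCV(pipeline, {"logisticregression__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Final check on the held-out test set
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, so the validation folds never influence the scaling statistics.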


Q11 Why do we have to perform EDA before fitting a model to the data?

EDA helps to:
- Detect missing/outlier values
- Understand distributions
- Identify correlations
- Spot potential data leakage (e.g., features that encode the target)
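A few pandas calls cover most of these checks. The sketch below uses a small made-up DataFrame with one missing value and one obvious outlier; the column names and the 1.5 × IQR outlier rule are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: one missing salary and one implausible age
df = pd.DataFrame({
    "age": [22, 25, 27, 30, 31, 95],
    "salary": [30000, 35000, np.nan, 42000, 45000, 47000],
})

# Missing values per column
missing_counts = df.isna().sum()

# Distribution summary (count, mean, std, quartiles)
summary = df.describe()

# Outliers by the 1.5 * IQR rule
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

# Pairwise correlations between numeric columns
correlations = df.corr()

print(missing_counts, summary, outliers, correlations, sep="\n\n")
```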


Q12 What is correlation?

Correlation is a statistical measure that describes the strength and direction of a relationship between two (or more) variables.

In simple terms, it tells you how changes in one variable are associated with changes in another.

Q13 What does negative correlation mean?

A negative correlation is a relationship in which one variable tends to decrease as the other increases.

- Direction: the variables move in opposite directions.
- Correlation coefficient (r): between -1 and 0.
- Example: Temperature ↑ → Heating bills ↓.

Q14 How can you find correlation between variables in Python?




In [3]:
import pandas as pd

data = pd.DataFrame({'A':[1,2,3,4],'B':[2,4,6,8]})
print(data.corr())

     A    B
A  1.0  1.0
B  1.0  1.0


Q15 What is Causation? Explain difference between correlation and causation with an example.

- Correlation: Two variables tend to move together.
- Causation: A change in one variable directly produces a change in another.

Example:
Ice cream sales and drowning are correlated due to hot weather (common cause), but ice cream doesn’t cause drowning.
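The ice cream example can be simulated to show how a common cause produces correlation without causation. In the sketch below everything is synthetic and assumed: temperature drives both quantities, and neither affects the other. Removing the part of each variable explained by temperature (a simple linear fit) should make the correlation largely disappear.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Common cause: temperature (standardised, synthetic)
temperature = rng.normal(size=n)

# Both outcomes depend on temperature but not on each other
ice_cream_sales = 2.0 * temperature + rng.normal(size=n)
drownings = 2.0 * temperature + rng.normal(size=n)

raw_r = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(y, x):
    """Part of y left over after a straight-line fit on x (hypothetical helper)."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlation after controlling for temperature
partial_r = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(f"raw r = {raw_r:.2f}, r after controlling for temperature = {partial_r:.2f}")
```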


Q16 What is an Optimizer? What are different types of optimizers? Explain each with an example.

An optimizer updates model parameters to minimize loss.

| Optimizer | Description |
|------------|-------------|
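| SGD | Steps along the negative gradient, usually computed on a mini-batch |
| Momentum | Accumulates past gradients so updates carry inertia and oscillate less |
| RMSProp | Scales each parameter's step by a running average of its squared gradients |
| Adam | Combines momentum with RMSProp-style scaling, plus bias correction |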

Example (PyTorch):

    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
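To show what each optimizer in the table does, here is a sketch of the four update rules in plain Python, applied to the one-dimensional loss `(w - 3)^2`. The loss, learning rates, and step counts are assumptions chosen so that every variant settles near the minimum at `w = 3`. These are simplified textbook forms, not the exact PyTorch implementations.

```python
import math

def grad(w):
    # Gradient of the illustrative loss (w - 3)^2
    return 2.0 * (w - 3.0)

def run_sgd(w=0.0, lr=0.1, steps=500):
    for _ in range(steps):
        w -= lr * grad(w)                      # step against the gradient
    return w

def run_momentum(w=0.0, lr=0.1, beta=0.9, steps=500):
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad(w)   # accumulate past gradients
        w -= lr * velocity
    return w

def run_rmsprop(w=0.0, lr=0.01, rho=0.9, eps=1e-8, steps=1000):
    sq_avg = 0.0
    for _ in range(steps):
        g = grad(w)
        sq_avg = rho * sq_avg + (1 - rho) * g * g   # running average of g^2
        w -= lr * g / (math.sqrt(sq_avg) + eps)     # per-parameter step scaling
    return w

def run_adam(w=0.0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g             # momentum term
        v = beta2 * v + (1 - beta2) * g * g         # RMSProp-style term
        m_hat = m / (1 - beta1 ** t)                # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for name, fn in [("SGD", run_sgd), ("Momentum", run_momentum),
                 ("RMSProp", run_rmsprop), ("Adam", run_adam)]:
    print(f"{name}: w = {fn():.3f}")
```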


Q17 What is sklearn.linear_model?

`sklearn.linear_model` is the scikit-learn module that contains linear models such as:
- `LinearRegression`
- `LogisticRegression`
- `Ridge`, `Lasso`, etc.

```python
from sklearn.linear_model import LinearRegression
model = LinearRegression()
```


Q18 What does model.fit() do? What arguments must be given?

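- Fits (trains) the model on the training data by estimating its parameters.
- Arguments: `X_train` (features, shape `(n_samples, n_features)`) and, for supervised models, `y_train` (targets).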


Q19 What does model.predict() do? What arguments must be given?

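- Returns predictions for new inputs using the trained model.
- Argument: the feature matrix to predict on, e.g. `X_test`, with the same columns used in training.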
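A short sketch of `fit` followed by `predict`, using `LogisticRegression` on a tiny made-up dataset. The feature values and labels are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative training data: small values are class 0, large values are class 1
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [7.0], [8.0], [9.0], [10.0]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X_train, y_train)          # learn parameters from features and labels

X_new = np.array([[1.5], [8.5]])     # unseen inputs, same number of columns
predictions = model.predict(X_new)   # one predicted label per row
print(predictions)
```

`fit` takes both features and targets, while `predict` takes features only and returns an array with one entry per input row.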


Q20 What are continuous and categorical variables?

1. Continuous variables:
   - Can take any numerical value within a range.
   - Represent measurable quantities (can be decimals or fractions).
   - Examples: height, weight, temperature, time, income.

2. Categorical variables:
   - Represent groups or categories, not numerical values.
   - Can be nominal (no order) or ordinal (ordered categories).
   - Examples: gender, blood type, color, education level.

In short:
- Continuous = measurable numbers
- Categorical = names or labels
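In pandas, the distinction usually shows up in column dtypes. The sketch below uses a made-up DataFrame to separate numeric from non-numeric columns and to mark an ordinal variable with an explicit category order; the column names and values are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "height_cm": [170.2, 165.5, 180.1],                  # continuous
    "city": ["Pune", "Delhi", "Pune"],                   # nominal categorical
    "education": pd.Categorical(                         # ordinal categorical
        ["Bachelor", "High School", "Master"],
        categories=["High School", "Bachelor", "Master"],
        ordered=True,
    ),
})

continuous_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# For an ordered categorical, the integer codes follow the declared order
education_codes = df["education"].cat.codes.tolist()

print(continuous_cols, categorical_cols, education_codes)
```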

Q21 What is Feature Scaling? How does it help in Machine Learning?

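Feature scaling transforms features to comparable ranges so that no feature dominates simply because of its units.
It speeds up convergence of gradient-based optimizers and helps distance-based models such as k-NN and SVMs; tree-based models are largely unaffected.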

    from sklearn.preprocessing import StandardScaler
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
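`StandardScaler` applies z = (x − mean) / std to each column. The sketch below checks that against a manual computation on a small made-up array (the age and salary values are illustrative assumptions).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative features: age and salary on very different scales
X = np.array([
    [18.0, 20000.0],
    [25.0, 35000.0],
    [30.0, 40000.0],
    [45.0, 60000.0],
    [50.0, 80000.0],
])

# Manual standardisation, column by column (population std, ddof=0)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

X_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))
print(X_sklearn.mean(axis=0), X_sklearn.std(axis=0))
```

After scaling, each column has mean 0 and standard deviation 1, so age and salary contribute on the same scale.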



Q22 How do we perform scaling in Python?




In [3]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


data = {
    'Age': [18, 25, 30, 45, 50],
    'Salary': [20000, 35000, 40000, 60000, 80000]
}

X = pd.DataFrame(data)
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

X_scaled = pd.DataFrame(X_scaled, columns=X.columns)
print(X_scaled)


       Age    Salary
0  0.00000  0.000000
1  0.21875  0.250000
2  0.37500  0.333333
3  0.84375  0.666667
4  1.00000  1.000000


Q23 What is sklearn.preprocessing?

sklearn.preprocessing is a module in Scikit-learn (sklearn) that provides tools for data preprocessing — the step where you prepare raw data before feeding it into a machine learning model.


In [4]:
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

X = pd.DataFrame({'Age': [18, 25, 30, 45], 'Salary': [20000, 35000, 60000, 80000]})

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled)


[[0.         0.        ]
 [0.25925926 0.25      ]
 [0.44444444 0.66666667]
 [1.         1.        ]]


Q24 How do we split data for model fitting (training and testing) in Python?

To split data into training and testing sets in Python, you typically use the train_test_split function from sklearn.model_selection.

This helps evaluate how well your machine learning model performs on unseen (test) data.

In [6]:
from sklearn.model_selection import train_test_split
import pandas as pd

# Sample data
X = pd.DataFrame({'feature': [1, 2, 3, 4, 5]})
y = pd.Series([0, 1, 0, 1, 0])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train)
print(y_train)



   feature
4        5
2        3
0        1
3        4
4    0
2    0
0    0
3    1
dtype: int64


Q25 Explain Data Encoding

**Data Encoding** converts text/categorical data into numeric format.

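- Label Encoding
- One-Hot Encoding
- Binary/Hash Encoding

Example (one-hot encoding with pandas):

```python
import pandas as pd
pd.get_dummies(df['Category'])  # assumes df is a DataFrame with a 'Category' column
```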
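Binary encoding is less common than the other two, so here is a hand-rolled sketch of the idea: give each category an integer code, then spread the bits of that code across a few 0/1 columns. The sample data and column names are assumptions. Dedicated libraries offer ready-made encoders, but this version needs only pandas.

```python
import pandas as pd

df = pd.DataFrame({"Category": ["A", "B", "C", "D", "A"]})  # illustrative data

# Step 1: integer code per category (alphabetical order: A=0, B=1, C=2, D=3)
codes = df["Category"].astype("category").cat.codes

# Step 2: number of bits needed to represent the largest code
n_bits = max(1, int(codes.max()).bit_length())

# Step 3: one column per bit, least significant bit first
binary = pd.DataFrame(
    {f"Category_bit{b}": (codes // 2 ** b) % 2 for b in range(n_bits)}
)

print(pd.concat([df, binary], axis=1))
```

Four categories fit into two columns here, whereas one-hot encoding would need four. The saving grows with the number of categories.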
