## 1. What is a parameter?
A **parameter** in machine learning is a variable whose value the model learns from the training data. Parameters determine how the model maps inputs to outputs. Examples include the weights in a neural network and the coefficients in a linear regression. Settings chosen before training, such as the learning rate, are hyperparameters rather than parameters.
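
As a rough illustration, the sketch below fits a one-feature linear regression by ordinary least squares in plain Python. The data and the helper name `fit_line` are made up for this example; the two values it returns, the slope (weight) and the intercept (bias), play the role of the model's learned parameters.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data that follows y = 10 * x exactly
xs = [1, 2, 3, 4, 5]
ys = [10, 20, 30, 40, 50]

slope, intercept = fit_line(xs, ys)
print("learned weight (slope):", slope)
print("learned bias (intercept):", intercept)
```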


## 2. What is correlation?
**Correlation** measures the strength and direction of the linear relationship between two variables, that is, how one variable tends to change as the other changes. The Pearson correlation coefficient (r) ranges from -1 to 1:
- **1**: Perfect positive correlation
- **0**: No correlation
- **-1**: Perfect negative correlation
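
To make the coefficient concrete, here is a minimal sketch of the Pearson formula (covariance divided by the product of the two spreads) in plain Python. The function name `pearson_r` and the sample numbers are assumptions chosen for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

hours = [1, 2, 3, 4, 5]
r_up = pearson_r(hours, [2, 4, 6, 8, 10])    # values rise together
r_down = pearson_r(hours, [10, 8, 6, 4, 2])  # one rises as the other falls
print(r_up, r_down)
```

The first pair should come out at (approximately) 1 and the second at (approximately) -1, matching the two ends of the scale above.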

## 3. What does negative correlation mean?
A **negative correlation** means that as one variable increases, the other tends to decrease. For example, in a study of exercise vs. body fat percentage, people who exercise more tend to have a lower body fat percentage, which indicates a negative correlation.


## 4. Define Machine Learning. What are the main components in Machine Learning?
**Machine Learning (ML)** is a field of AI that enables computers to learn from data without being explicitly programmed.

### Main components in ML:
1. **Data** – Input data used for training and testing.
2. **Features** – Important characteristics extracted from data.
3. **Model** – A mathematical function that learns from the data.
4. **Loss Function** – Measures how far the model’s predictions are from the actual values.
5. **Optimization Algorithm** – Adjusts parameters to minimize loss.
6. **Training Process** – Model learns patterns from data.
7. **Evaluation** – Testing how well the model generalizes to unseen data.

## 5. How does loss value help in determining whether the model is good or not?
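The **loss value** quantifies the difference between the model’s predictions and the actual values. A lower loss means the predictions are closer to the targets, but a low loss on the training data alone is not enough: the loss on held-out data shows whether the model generalizes. Common loss functions include: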
- **Mean Squared Error (MSE)** – Used in regression problems.
- **Cross-Entropy Loss** – Used in classification problems.
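
The two losses above can be sketched in a few lines of plain Python. The function names and the sample predictions are illustrative assumptions, and the cross-entropy shown is the binary (two-class) form.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood for labels in {0, 1}."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Regression: predictions close to the targets give a small MSE
mse_good = mean_squared_error([3.0, 5.0], [2.5, 5.5])
mse_bad = mean_squared_error([3.0, 5.0], [0.0, 9.0])

# Classification: confident, correct probabilities give a small cross-entropy
bce_good = binary_cross_entropy([1, 0], [0.9, 0.2])
bce_bad = binary_cross_entropy([1, 0], [0.2, 0.9])

print("MSE:", mse_good, "vs", mse_bad)
print("Cross-entropy:", bce_good, "vs", bce_bad)
```

In both cases the better set of predictions produces the lower loss value.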

## 6. What are continuous and categorical variables?
- **Continuous variables**: Numeric values that can take any value within a range (e.g., height, weight, temperature).
- **Categorical variables**: Discrete values representing categories or labels (e.g., colors, gender, city names).


## 7. How do we handle categorical variables in Machine Learning? What are the common techniques?
Handling categorical variables is crucial for ML models. Common techniques include:
1. **Label Encoding** – Assigns a unique number to each category.
2. **One-Hot Encoding** – Creates binary columns for each category.
3. **Ordinal Encoding** – Assigns ordered values to categorical data.
4. **Frequency Encoding** – Replaces categories with their occurrence count.
5. **Target Encoding** – Uses the mean of the target variable per category.
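
Frequency and target encoding are less commonly shown than label or one-hot encoding, so here is a small plain-Python sketch of both. The city names and target values are invented for the example. In practice the target means would normally be computed on the training split only, so that the test labels do not leak into the features.

```python
from collections import Counter, defaultdict

# Hypothetical data: a categorical feature and a numeric target
cities = ["Paris", "Rome", "Paris", "Oslo", "Rome", "Paris"]
target = [1, 0, 1, 0, 1, 0]

# Frequency encoding: replace each category with how often it occurs
counts = Counter(cities)
freq_encoded = [counts[c] for c in cities]

# Target encoding: replace each category with the mean target for that category
sums = defaultdict(float)
for c, t in zip(cities, target):
    sums[c] += t
target_means = {c: sums[c] / counts[c] for c in counts}
target_encoded = [target_means[c] for c in cities]

print(freq_encoded)
print(target_encoded)
```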

## 8. What do you mean by training and testing a dataset?
- **Training dataset**: Used to train the model.
- **Testing dataset**: Used to evaluate model performance.
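
Splitting the data into training and testing sets does not prevent overfitting by itself, but it makes overfitting detectable: a model that scores well on the training set and poorly on the test set is not generalizing.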


## 9. What is sklearn.preprocessing?
`sklearn.preprocessing` is a module in Scikit-learn that provides tools for feature scaling, normalization, and encoding categorical variables. Examples:
```python
from sklearn.preprocessing import StandardScaler, OneHotEncoder
```

## 10. What is a Test set?
The test set is a subset of data used to evaluate the trained model. It is separate from the training data to measure how well the model generalizes to unseen data.

```python
from sklearn.model_selection import train_test_split

# Sample dataset (features and labels)
X = [[1], [2], [3], [4], [5]]  # Features
y = [10, 20, 30, 40, 50]       # Target values

# Splitting data into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Print the results
print("X_train:", X_train)
print("X_test:", X_test)
print("y_train:", y_train)
print("y_test:", y_test)
```

Output:

```
X_train: [[5], [3], [1], [4]]
X_test: [[2]]
y_train: [50, 30, 10, 40]
y_test: [20]
```


## Steps to Approach a Machine Learning Problem

- **Understand the problem** – Define the objective and goals.
- **Collect & preprocess data** – Handle missing values, encoding, and normalization.
- **Feature engineering** – Select and transform features.
- **Choose a model** – Select the appropriate algorithm.
- **Train the model** – Fit the model to training data.
- **Evaluate performance** – Use metrics like accuracy, precision, recall, etc.
- **Optimize the model** – Fine-tune hyperparameters.
- **Deploy the model** – Integrate into a production environment.
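
The steps above can be sketched end to end with Scikit-learn. This is only one possible arrangement: the built-in iris dataset stands in for a real problem, and the choice of `StandardScaler` plus `LogisticRegression` is an assumption made for the example, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Collect data: a small built-in dataset stands in for a real problem
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocess and choose a model: scaling followed by logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Train on the training split only
model.fit(X_train, y_train)

# Evaluate on data the model has not seen
accuracy = accuracy_score(y_test, model.predict(X_test))
print("test accuracy:", accuracy)
```

Putting the scaler inside the pipeline means it is fitted on the training rows only, so no information from the test rows reaches the model during training.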


## 11. Why do we have to perform EDA before fitting a model to the data?
**Exploratory Data Analysis (EDA)** helps us understand the dataset before training a model. It helps in:
- Detecting missing values and outliers.
- Identifying data distributions.
- Understanding relationships between variables.
- Selecting relevant features.
- Choosing the right preprocessing steps.
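
A few pandas calls cover most of the checks listed above. The tiny DataFrame below is invented for illustration, with one missing value and one suspicious outlier planted in it.

```python
import pandas as pd

# Hypothetical dataset with one missing age and one extreme income
df = pd.DataFrame({
    "age": [25, 32, 47, None, 51],
    "income": [30000, 42000, 58000, 61000, 1000000],
    "city": ["Rome", "Oslo", "Rome", "Paris", "Oslo"],
})

missing_per_column = df.isna().sum()         # missing values per column
summary = df.describe()                      # distribution of numeric columns
correlations = df[["age", "income"]].corr()  # pairwise relationships
city_counts = df["city"].value_counts()      # balance of a categorical column

print(missing_per_column)
print(summary)
print(correlations)
print(city_counts)
```

In the `describe()` output, a maximum far above the 75th percentile (as for `income` here) is a typical hint that an outlier needs a closer look.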

## 12. What is correlation?
**Correlation** measures the strength and direction of the linear relationship between two variables. It shows how one variable tends to change with respect to another. The correlation coefficient (r) ranges from -1 to 1:
- **1** → Perfect positive correlation.
- **0** → No linear correlation.
- **-1** → Perfect negative correlation.

## 13. What does negative correlation mean?
A **negative correlation** means that as one variable increases, the other decreases.  
**Example:** As the number of hours spent watching TV increases, physical activity decreases.


## 14. How can you find correlation between variables in Python?
In Python, we use pandas to calculate correlation:

```python
import pandas as pd

# Sample dataset
data = {'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]}
df = pd.DataFrame(data)

# Compute the pairwise correlation matrix
print(df.corr())
```

## 15. What is causation? Explain the difference between correlation and causation with an example.
- **Causation** means that a change in one variable **directly produces** a change in another.
- **Correlation** only shows that two variables move together; it does not imply cause and effect.

**Example:**  
- **Correlation:** Ice cream sales and drowning incidents are correlated (both increase in summer).  
- **Causation:** Eating more ice cream does not cause drowning. Hot weather is a common cause (a confounder) that drives both: more people buy ice cream and more people swim.
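
The ice cream example can be simulated to show how a shared cause produces correlation without causation. Everything below is synthetic: the coefficients, noise levels, and variable names are assumptions picked for the sketch, and the adjustment step uses the known simulation coefficients rather than estimating them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical confounder: daily temperature in degrees Celsius
temperature = rng.uniform(10, 35, size=500)

# Both quantities depend on temperature, but not on each other
ice_cream_sales = 2.0 * temperature + rng.normal(0, 5, size=500)
drownings = 0.3 * temperature + rng.normal(0, 1.5, size=500)

# Raw correlation looks strong even though there is no causal link
r_raw = np.corrcoef(ice_cream_sales, drownings)[0, 1]

# Remove the temperature effect (using the known simulation coefficients)
ice_residual = ice_cream_sales - 2.0 * temperature
drown_residual = drownings - 0.3 * temperature
r_adjusted = np.corrcoef(ice_residual, drown_residual)[0, 1]

print("correlation before adjusting for temperature:", round(r_raw, 2))
print("correlation after adjusting for temperature:", round(r_adjusted, 2))
```

Once the effect of temperature is removed, the remaining correlation should be close to zero, which is what one would expect if neither variable influences the other.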


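## 16. What is an Optimizer? What are different types of optimizers? Explain each with an example.
An **optimizer** updates the model’s parameters to minimize the loss function.

### Types of Optimizers:
1. **Gradient Descent:** The basic method. It moves each parameter a small step against the gradient of the loss; stochastic gradient descent (SGD) does this on mini-batches of data.
   ```python
   from tensorflow.keras.optimizers import SGD
   optimizer = SGD(learning_rate=0.01)
   ```
2. **Adam:** Adapts the learning rate of each parameter using running averages of the gradient and of its square.
3. **RMSprop:** Scales each update by a running average of recent squared gradients; often used for recurrent neural networks (RNNs).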
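
To show what these optimizers do under the hood, the sketch below applies the standard update rules of gradient descent, RMSprop, and Adam to a toy one-parameter loss, f(w) = (w - 3)^2. It is a plain-Python illustration, not how Keras implements them; the function names, learning rates, and step counts are assumptions chosen so that each run ends near the minimum at w = 3.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, minimized at w = 3."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, lr=0.1, steps=500):
    for _ in range(steps):
        w -= lr * grad(w)  # step against the gradient
    return w

def rmsprop(w, lr=0.05, decay=0.9, eps=1e-8, steps=500):
    avg_sq = 0.0  # running average of squared gradients
    for _ in range(steps):
        g = grad(w)
        avg_sq = decay * avg_sq + (1 - decay) * g * g
        w -= lr * g / (math.sqrt(avg_sq) + eps)
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m, v = 0.0, 0.0  # running averages of the gradient and its square
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_gd = gradient_descent(0.0)
w_rms = rmsprop(0.0)
w_adam = adam(0.0)
print(w_gd, w_rms, w_adam)
```

With a fixed learning rate, RMSprop tends to hover in a small band around the minimum rather than settle exactly on it, which is why learning-rate schedules are often used alongside it.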

## 17. What is sklearn.linear_model?
`sklearn.linear_model` is a module in Scikit-learn for implementing linear models such as:

- **Linear Regression**
- **Logistic Regression**
- **Ridge Regression**
- **Lasso Regression**


## 18. What does model.fit() do? What arguments must be given?
`model.fit()` trains the model on data. For supervised models it takes two arguments:

- `X_train` – Features for training.
- `y_train` – Target variable.

## 19. What does model.predict() do? What arguments must be given?
`model.predict()` uses the trained model to make predictions for new inputs.
Arguments:

- `X_test` – Features to predict on, with the same columns as the training features.
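
A minimal sketch of the two calls together, using `LinearRegression` on made-up data that follows y = 10 * x. The variable name `X_new` is just an illustration; any array with the same number of feature columns as the training data works.

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data that follows y = 10 * x
X_train = [[1], [2], [3], [4]]
y_train = [10, 20, 30, 40]

model = LinearRegression()
model.fit(X_train, y_train)  # learn the coefficient and intercept

X_new = [[5], [6]]           # unseen rows with the same number of columns
predictions = model.predict(X_new)
print(predictions)
```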

## 20. What are continuous and categorical variables?
- **Continuous variables**: Numeric values that can take any value within a range (e.g., height, weight).
- **Categorical variables**: Discrete values representing categories (e.g., gender, city).

## 21. What is feature scaling? How does it help in Machine Learning?
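Feature scaling puts all numerical features on a similar scale, so that a feature with large values (such as income) does not dominate one with small values (such as age). This matters most for distance-based models and for models trained with gradient descent, which usually converge faster on scaled features.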
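
The two most common scaling formulas are easy to write out by hand. The sketch below is plain Python with invented sample values; `standardize` and `min_max_scale` are hypothetical helper names that mirror the arithmetic behind standardization (z-scores) and min-max scaling.

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical features on very different scales
ages = [20, 30, 40, 50, 60]
incomes = [20000, 35000, 50000, 65000, 80000]

ages_z = standardize(ages)
incomes_z = standardize(incomes)
ages_01 = min_max_scale(ages)

print(ages_z)
print(incomes_z)
print(ages_01)
```

After standardization the two columns become directly comparable even though the raw incomes are a thousand times larger than the ages.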

## 22. How do we perform scaling in Python?
Using `StandardScaler` or `MinMaxScaler` from Scikit-learn:

```python
from sklearn.preprocessing import StandardScaler

# X is the feature matrix defined in the earlier train/test split example
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```


## 23. What is sklearn.preprocessing?
`sklearn.preprocessing` provides functions for:

- Scaling (`StandardScaler`, `MinMaxScaler`)
- Encoding categorical data (`OneHotEncoder`, `LabelEncoder`)
- Normalization (`Normalizer`)

## 24. How do we split data for model fitting (training and testing) in Python?
Using `train_test_split` from Scikit-learn:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```


## 25. Explain Data Encoding

Data encoding converts categorical variables into numerical values, allowing machine learning models to process them effectively.

### Types of Encoding:

- **Label Encoding**: Assigns a unique number to each category. It suits ordinal categorical variables; applied to nominal data, it may introduce an artificial order between categories.

- **One-Hot Encoding**: Creates separate binary columns for each category. It is useful for nominal categorical variables (where order doesn’t matter).

### When to Use:
- **Label Encoding**: When dealing with ordinal data (e.g., Low, Medium, High).
- **One-Hot Encoding**: When dealing with nominal data (e.g., colors, city names).
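
A short Scikit-learn sketch of both cases follows. For the ordinal case it uses `OrdinalEncoder` with an explicit category order rather than `LabelEncoder`, because `LabelEncoder` is intended for target labels and numbers the categories alphabetically; that substitution, and the sample values, are choices made for this example.

```python
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Ordinal data: pass the category order explicitly so Low < Medium < High
sizes = [["Low"], ["High"], ["Medium"], ["Low"]]
ordinal = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
sizes_encoded = ordinal.fit_transform(sizes)

# Nominal data: one binary column per category, no implied order
colors = [["red"], ["green"], ["blue"], ["green"]]
one_hot = OneHotEncoder()
colors_encoded = one_hot.fit_transform(colors).toarray()  # dense for printing

print(sizes_encoded.ravel())
print(one_hot.categories_[0])
print(colors_encoded)
```

The one-hot columns follow the order reported in `one_hot.categories_`, so each row has a single 1 in the column of its own category.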
