


#**Feature Engineering**

----

#**1. What is a parameter?**

 - A parameter is a value that a machine learning model learns from the data during training. It's like a knob that the model adjusts to fit the data better.

  - For example, in a simple linear regression (predicting house prices based on size), the parameters are the slope and intercept of the line.

**Example:** Imagine predicting the price of a pizza based on its size. The parameter might be "price per square inch." The model learns this value from the data.
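As a minimal sketch of the pizza example, the snippet below fits a straight line with NumPy's `polyfit`. The sizes and prices are made up for illustration (generated from a price of 0.10 per square inch plus a base price of 2.0), so the fit should recover roughly those two parameters.

```python
import numpy as np

# Hypothetical data: pizza area in square inches and price in dollars.
# Prices were generated as 0.10 * area + 2.0 (an assumption for this demo).
area = np.array([50.0, 80.0, 110.0, 150.0, 200.0])
price = 0.10 * area + 2.0

# Fit a degree-1 polynomial (a straight line): price ≈ slope * area + intercept.
# polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(area, price, deg=1)

print(f"learned slope (price per square inch): {slope:.3f}")
print(f"learned intercept (base price): {intercept:.3f}")
```

The slope and intercept are the parameters: they are not typed in by hand but estimated from the data.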

----

#**2. What is correlation? What does negative correlation mean?**

 - Correlation measures how strongly two variables are related and in which direction they change together.

 - **Positive Correlation:** When one variable increases, the other also tends to increase.
 - **Negative Correlation:** When one variable increases, the other tends to decrease.

**Example:**
* **Positive:** Study time and exam scores (more study, higher score).
* **Negative:** Temperature and the sale of hot chocolate (higher temperature, fewer hot chocolate sales).
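The two examples can be sketched numerically with the Pearson correlation coefficient, which ranges from -1 to 1. The numbers below are invented for illustration, and `np.corrcoef` is used as one common way to compute the coefficient.

```python
import numpy as np

# Hypothetical measurements (made up for this demo)
temperature = np.array([5, 10, 15, 20, 25, 30])            # degrees Celsius
hot_chocolate_sales = np.array([90, 80, 65, 50, 30, 20])   # cups sold
study_hours = np.array([1, 2, 3, 4, 5, 6])
exam_scores = np.array([52, 58, 65, 70, 78, 85])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient
r_negative = np.corrcoef(temperature, hot_chocolate_sales)[0, 1]
r_positive = np.corrcoef(study_hours, exam_scores)[0, 1]

print(f"temperature vs. hot chocolate sales: r = {r_negative:.2f}")
print(f"study hours vs. exam scores: r = {r_positive:.2f}")
```

A value near -1 signals a strong negative correlation, a value near +1 a strong positive one, and a value near 0 little linear relationship.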

---

#**3. Define Machine Learning. What are the main components in Machine Learning?**

 - Machine Learning is a field of computer science that enables computers to learn from data without explicit programming. It allows systems to improve their performance on a specific task through experience.

**Main Components:**
* **Data:** The raw material used to train the model.
* **Model:** The mathematical function, with adjustable parameters, that maps inputs to predictions.
* **Learning Algorithm:** The process by which the model adjusts its parameters to fit the data.

**Example:** A spam email filter learns from examples of spam and non-spam emails to classify new emails.

----

#**4. How does loss value help in determining whether the model is good or not?**

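 -  The loss value (or error) measures how far the model's predictions are from the actual values. A lower loss indicates a better fit. Comparing the loss on training data with the loss on unseen data also shows whether the model generalizes or has merely memorized the training set.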

**Example:** If a model predicts house prices and the loss value is high, it means the predictions are far from the actual prices, indicating a poor model.
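To make this concrete, here is a small sketch that uses mean squared error as the loss. MSE is only one common choice (an assumption here, since the text does not name a specific loss), and the prices and predictions are invented.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical house prices (in thousands) and two sets of predictions
actual = [200, 250, 300]
good_predictions = [205, 245, 310]
poor_predictions = [150, 320, 220]

loss_good = mean_squared_error(actual, good_predictions)
loss_poor = mean_squared_error(actual, poor_predictions)

print(f"loss of the closer predictions: {loss_good}")
print(f"loss of the farther predictions: {loss_poor}")
```

The predictions that sit closer to the actual prices produce the much smaller loss, which is how the loss value ranks one model above another.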

-----

#**5. What are continuous and categorical variables?**

* **Continuous Variables:** Can take any value within a range (e.g., height, temperature).
* **Categorical Variables:** Represent categories or groups (e.g., color, gender).

**Example:**
* **Continuous:** Age (e.g., 25.5 years).
* **Categorical:** Eye color (e.g., blue, brown, green).


-----

#**6. How do we handle categorical variables in Machine Learning? What are the common techniques?**

 - Most machine learning models work with numbers, not categories, so categorical variables are converted into numerical form before training.

**Common Techniques:**
* **One-Hot Encoding:** Creates a new binary column for each category.
* **Label Encoding:** Assigns a unique integer to each category. The integers imply an order, which can mislead models that treat the numbers as magnitudes.

**Example:** For "color" (red, blue, green), one-hot encoding creates three columns: "is_red," "is_blue," "is_green," with 1 or 0 indicating the presence of a color.
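Both techniques can be sketched on a tiny, made-up "color" column. This example assumes pandas' `get_dummies` for one-hot encoding and scikit-learn's `LabelEncoder` for label encoding; the `is_` prefix simply mirrors the column names used above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column of colors
df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["color"], prefix="is", dtype=int)

# Label encoding: one integer per category (classes are sorted alphabetically)
encoder = LabelEncoder()
labels = encoder.fit_transform(df["color"])

print(one_hot)
print(list(encoder.classes_), list(labels))
```

scikit-learn documents `LabelEncoder` as intended for target labels; for input features, `OneHotEncoder` or `OrdinalEncoder` from the same module are the usual choices.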

-----

#**7. What do you mean by training and testing a dataset?**


* **Training:** The process of teaching the model by showing it the data and adjusting its parameters.
* **Testing:** Evaluating the model's performance on unseen data to see how well it generalizes.

**Example:** You train a model to recognize cats using a set of cat images. Then, you test it on new cat images it hasn't seen before.

-----

#**8. What is sklearn.preprocessing?**

 - `sklearn.preprocessing` is a module in scikit-learn (a Python library) that provides tools for data preprocessing, such as scaling, encoding, and transforming data.

**Example:** Using `sklearn.preprocessing.StandardScaler` to standardize numerical features.

-----

#**9. What is a Test set?**

-  A test set is a portion of the data that is held back from the training process. It's used to evaluate the model's performance on unseen data.

**Example:** After training a model to predict stock prices, you use the test set to see how well it predicts prices it hasn't seen before.

------

#**10. How do we split data for model fitting (training and testing) in Python?**

Using `train_test_split` from `sklearn.model_selection`.

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

-----

#**11. Why do we have to perform EDA before fitting a model to the data?**

 - Exploratory Data Analysis (EDA) helps us understand the data, identify patterns, and detect issues like missing values or outliers, which can affect model performance.

**Example:** Plotting histograms to see the distribution of numerical features.
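A few basic EDA checks can be sketched with pandas. The dataset below is invented and deliberately contains one missing value and one extreme income; the 1.5 × IQR rule is just one common heuristic for flagging outliers, not the only one.

```python
import pandas as pd

# Hypothetical dataset with one missing age and one suspicious income
df = pd.DataFrame({
    "age": [25, 32, 47, None, 38],
    "income": [40_000, 52_000, 61_000, 58_000, 1_000_000],
})

# Summary statistics for each numeric column
print(df.describe())

# Count of missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# Simple outlier check: values more than 1.5 * IQR above the third quartile
q1, q3 = df["income"].quantile([0.25, 0.75])
outliers = df[df["income"] > q3 + 1.5 * (q3 - q1)]
print(outliers)
```

With matplotlib installed, `df.hist()` would draw the histograms mentioned above.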


-----

#**12. What is correlation?**

- Correlation measures the relationship between two variables, indicating how they move together.

**Positive Correlation:** Both variables increase or decrease together.

**Negative Correlation:** One variable increases as the other decreases.

**Example:**
* **Positive:** Study time and exam scores (more study, higher scores).
* **Negative:** Temperature and the number of people wearing sweaters (higher temperature, fewer sweaters).

----

#**13. What does negative correlation mean?**

 - Negative correlation means that as one variable increases, the other tends to decrease.

Example: Think about the relationship between exercise and body fat percentage.

As you exercise more (increase in exercise), your body fat percentage generally tends to decrease (decrease in body fat). This is an example of a negative correlation.

----

#**14. How can you find correlation between variables in Python?**

-  Using the `corr()` method in pandas.

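```python
import pandas as pd

# Small illustrative DataFrame (values are made up for demonstration)
df = pd.DataFrame({"hours": [1, 2, 3, 4], "score": [52, 60, 71, 80]})
# Pairwise Pearson correlation between the numeric columns
correlation_matrix = df.corr()
```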

----

#**15. What is causation? Explain the difference between correlation and causation with an example.**


* **Correlation:** Two variables move together.
* **Causation:** One variable causes the other to change.

**Example:** Ice cream sales and sunburns are correlated (both increase in summer), but ice cream doesn't cause sunburns. The common cause is the weather.
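The ice cream example can be simulated as a rough sketch. Everything here is synthetic: temperature is the assumed common cause, and the coefficients and noise levels are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical common cause: daily temperature in degrees Celsius
temperature = rng.uniform(10, 35, size=500)

# Both quantities depend on temperature, not on each other
ice_cream_sales = 20 * temperature + rng.normal(0, 40, size=500)
sunburns = 0.5 * temperature + rng.normal(0, 1, size=500)  # arbitrary units

r = np.corrcoef(ice_cream_sales, sunburns)[0, 1]
print(f"correlation between ice cream sales and sunburns: {r:.2f}")
```

The two series end up strongly correlated even though neither appears in the formula for the other, which is why a high correlation alone cannot establish causation.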

----

#**16. What is an Optimizer? What are different types of optimizers? Explain each with an example.**

-  An optimizer is an algorithm that adjusts the model's parameters to minimize the loss function.

**Types:**
* **Gradient Descent:** Iteratively moves the parameters in the direction that reduces the loss, using a fixed learning rate.
* **Adam:** Adaptive Moment Estimation; adapts the step size for each parameter using running averages of past gradients.

**Example:** Gradient Descent is like rolling a ball down a hill to find the lowest point.
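The "ball rolling downhill" picture can be sketched in a few lines of plain Python. The loss function, starting point, learning rate, and step count below are all assumptions chosen for the demo.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Hypothetical loss: L(w) = (w - 3)^2, whose minimum is at w = 3
def loss_gradient(w):
    return 2 * (w - 3)

w_final = gradient_descent(loss_gradient, start=0.0)
print(f"w after 100 steps: {w_final:.4f}")
```

Each step moves `w` a little further toward the minimum at 3. A learning rate that is too large would overshoot, and one that is too small would need many more steps.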

----

#**17. What is sklearn.linear_model?**

 - `sklearn.linear_model` is a module in scikit-learn that provides tools for linear models like linear regression and logistic regression.

**Example:** Using `sklearn.linear_model.LinearRegression` to build a linear regression model.

-----

#**18. What does model.fit() do? What arguments must be given?**

 - `model.fit()` trains the model using the training data.

**Arguments:**
* `X_train`: The features of the training data.
* `y_train`: The target variable of the training data.

**Example:** `model.fit(X_train, y_train)`

----

#**19. What does model.predict() do? What arguments must be given?**

 - `model.predict()` uses the trained model to make predictions on new data.

**Arguments:**
* `X_test`: The features of the data you want to predict.

**Example:** `predictions = model.predict(X_test)`
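Putting `fit()` and `predict()` together, here is a minimal sketch with `LinearRegression`. The data is made up and follows y = 2x + 1 exactly, so the model should recover that line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data following y = 2x + 1
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])  # 2D: (n_samples, n_features)
y_train = np.array([3.0, 5.0, 7.0, 9.0])
X_test = np.array([[5.0], [6.0]])

model = LinearRegression()
model.fit(X_train, y_train)          # learns coef_ and intercept_
predictions = model.predict(X_test)  # one prediction per row of X_test

print(model.coef_, model.intercept_, predictions)
```

Note that the features are passed as a 2D array, one row per sample, even when there is only a single feature.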

----

#**20. What are continuous and categorical variables?**

 - In machine learning, we often deal with different types of data. Two common types are:

**Continuous Variables:** These are variables that can take on any value within a given range. Think of them as measurements.

Examples: A person's height (e.g., 165.5 cm), temperature (e.g., 25.7°C), or the price of a house (e.g., ₹5,500,000).

**Categorical Variables:** These variables represent distinct categories or groups.

Examples: A person's blood type (A, B, AB, O), the type of fruit (apple, banana, orange), or a person's marital status (single, married, divorced).

----

#**21. What is feature scaling? How does it help in Machine Learning?**

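 - Feature scaling is the process of bringing features onto a comparable range. It helps gradient-based models converge faster and keeps features with large numeric ranges from dominating distance-based models such as k-nearest neighbors.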

**Example:** Scaling features like age and income to have similar ranges.
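The two most common scaling formulas can be written out by hand as a sketch. The ages and incomes are invented, and the helper names `standardize` and `min_max` are hypothetical, chosen only for this illustration.

```python
import numpy as np

# Hypothetical features on very different scales
age = np.array([20.0, 30.0, 40.0, 50.0, 60.0])
income = np.array([20_000.0, 40_000.0, 60_000.0, 80_000.0, 100_000.0])

def standardize(x):
    """Rescale to zero mean and unit standard deviation."""
    return (x - x.mean()) / x.std()

def min_max(x):
    """Rescale linearly to the range [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

print(standardize(age), standardize(income))
print(min_max(age), min_max(income))
```

After scaling, age and income occupy the same range, so neither feature outweighs the other simply because of its units.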

----

#**22. How do we perform scaling in Python?**

 - Using `StandardScaler` or `MinMaxScaler` from `sklearn.preprocessing`.

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```
-----

#**23. What is sklearn.preprocessing?**

 - `sklearn.preprocessing` is a module of the scikit-learn library that provides tools for transforming data before feeding it into a machine learning model, such as scaling numerical features and encoding categorical variables. (Imputing missing values is handled by the separate `sklearn.impute` module.)

**Example:**

```python
from sklearn.preprocessing import StandardScaler

# Create a scaler object
scaler = StandardScaler()

# Fit the scaler to your data and transform it
scaled_data = scaler.fit_transform(your_data)
```


------

#**24. How do we split data for model fitting (training and testing) in Python?**

- In Python, we commonly use the `train_test_split` function from the `sklearn.model_selection` module to divide our data into training and testing sets.

Here's how it works:

```python
from sklearn.model_selection import train_test_split

# Assuming 'X' is your feature data and 'y' is your target variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

----

#**25. Explain data encoding.**

 -  Data encoding is the process of converting categorical data into numerical data so that machine learning models can process it.

**Example:** One-hot encoding converts a "color" column into one binary (0/1) column per color.