# Feature Engineering






## **1. What is a parameter?**

- A parameter in machine learning is an internal variable whose value is determined during training. For example, in linear regression (`y = wx + b`), weights (`w`) and bias (`b`) are parameters learned from the data to fit the model.
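
As a minimal sketch of this idea, the snippet below fits scikit-learn's `LinearRegression` to a few points that lie exactly on `y = 2x + 1` and reads back the learned parameters. The data values are assumptions made up for illustration, not taken from a real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data generated from y = 2x + 1 (assumed for this sketch)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X.ravel() + 1.0

model = LinearRegression()
model.fit(X, y)

# The learned parameters: weight w (coef_) and bias b (intercept_)
w = model.coef_[0]
b = model.intercept_
print(f"w = {w:.2f}, b = {b:.2f}")
```

Values that are set before training rather than learned from data (for example, a learning rate) are usually called hyperparameters instead.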



## **2. What is correlation?**

- Correlation measures the strength and direction of the relationship between two variables. The correlation coefficient (most commonly Pearson's, which captures linear relationships) ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive).



## **3. What does negative correlation mean?**

- Negative correlation means that as one variable increases, the other tends to decrease. It is indicated by a correlation coefficient less than 0. For example, as the temperature decreases, heater sales might increase.
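
The heater example can be checked numerically. The figures below are invented for illustration; `np.corrcoef` returns the Pearson correlation matrix, and the off-diagonal entry is the coefficient between the two arrays.

```python
import numpy as np

# Illustrative monthly figures (made up): colder months sell more heaters
temperature_c = np.array([30, 25, 20, 15, 10, 5, 0])
heater_sales = np.array([5, 8, 15, 22, 35, 50, 64])

# Off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(temperature_c, heater_sales)[0, 1]
print(f"r = {r:.2f}")
```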


## **4. Define Machine Learning. What are the main components in Machine Learning?**

- Machine Learning is a subset of Artificial Intelligence where systems learn from data, identify patterns, and make decisions with minimal human intervention.

**Main Components:**
   - **Data:** The information used to train and test models.
   - **Model:** The mathematical algorithm that learns patterns from data.
   - **Loss Function:** Measures the error between predicted and true values.
   - **Optimizer:** Updates model parameters to minimize the loss.
   - **Evaluation Metric:** Assesses the model’s performance.



## **5. How does loss value help in determining whether the model is good or not?**

- The loss value quantifies the error in predictions. A lower loss suggests the model's predictions are close to actual values, indicating better performance. High loss points to poor model fit.
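
To make this concrete, here is a small sketch that computes mean squared error (one common regression loss) for two hypothetical sets of predictions. All numbers are made up for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 7.0]          # actual values (illustrative)
close_preds = [2.9, 5.1, 7.2]     # hypothetical well-fitting model
far_preds = [1.0, 8.0, 4.0]       # hypothetical poorly fitting model

loss_close = mse(y_true, close_preds)
loss_far = mse(y_true, far_preds)
print(loss_close, loss_far)
```

Loss values are most useful in comparison: across models on the same data, and on held-out data as well as training data, since a low training loss alone can hide overfitting.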



## **6. What are continuous and categorical variables?**

- **Continuous variables:** Numeric values that can take any value within a range (e.g., height, weight, temperature).
- **Categorical variables:** Represent categories or groups (e.g., gender, city, color).



## **7. How do we handle categorical variables in Machine Learning? What are the common techniques?**

- Categorical variables are transformed into numeric form for model compatibility.

**Common Techniques:**
  - **Label Encoding:** Assigns each category a unique integer.
  - **One-Hot Encoding:** Transforms every category into a separate binary column.



## **8. What do you mean by training and testing a dataset?**

- **Training dataset:** The portion of data used to fit the model.
- **Testing dataset:** The portion of data used to evaluate model performance on unseen data.


## **9. What is sklearn.preprocessing?**

- `sklearn.preprocessing` is a scikit-learn module providing tools for scaling, transforming, and encoding data before training machine learning models.


## **10. What is a Test set?**

- A test set is a data subset not used during model training, reserved to evaluate model performance objectively.


## **11. How do we split data for model fitting (training and testing) in Python?**

- We can use `train_test_split`:

      from sklearn.model_selection import train_test_split

      # Assume X and y are already defined (features and target)
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
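
For a version that runs on its own, the sketch below builds a small synthetic `X` and `y` (assumed purely for illustration) and prints the resulting split sizes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: 10 samples, 2 features
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

With 10 samples and `test_size=0.2`, 8 rows go to training and 2 to testing.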


## **12. How do you approach a Machine Learning problem?**

1. **Understand the Problem**
2. **Collect Data**
3. **Clean and Preprocess Data**
4. **Perform EDA (Exploratory Data Analysis)**
5. **Feature Engineering**
6. **Model Selection and Training**
7. **Evaluation**
8. **Model Tuning**
9. **Deployment**


## **13. Why do we have to perform EDA before fitting a model to the data?**

- EDA helps you understand data distributions, detect outliers, spot data quality issues, and identify relationships between features, guiding better preprocessing and feature engineering.
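
A few pandas calls cover much of a first EDA pass. The sketch below uses a tiny made-up DataFrame (the column names and values are assumptions for illustration) and flags outliers with the 1.5 × IQR rule, which is one common heuristic among several.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value and one obvious outlier
df = pd.DataFrame({
    "age": [25, 32, 47, 51, np.nan, 38],
    "income": [40_000, 52_000, 61_000, 58_000, 45_000, 900_000],
})

summary = df.describe()     # count, mean, std, min, quartiles, max per column
missing = df.isna().sum()   # number of missing values per column
corr = df.corr()            # pairwise Pearson correlations

# Flag income outliers with the 1.5 * IQR rule
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)]

print(summary, missing, corr, outliers, sep="\n\n")
```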



## **14. What is correlation?**

- Correlation quantifies the strength and direction of the association between two variables. (This repeats question 2.)

## **15. What does negative correlation mean?**

- Negative correlation means that as one variable increases, the other tends to decrease. (This repeats question 3.)


## **16. How can you find correlation between variables in Python?**

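- Using pandas' `.corr()`:

      import pandas as pd

      # Illustrative values; any two numeric columns work
      data = {'A': [1, 2, 3, 4, 5], 'B': [2, 4, 5, 4, 5]}

      df = pd.DataFrame(data)

      # Pairwise Pearson correlation matrix
      print(df.corr())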



## **17. What is causation? Explain difference between correlation and causation with an example.**

- **Causation** means that a change in one variable directly produces a change in another.
- **Correlation** means that two variables move together, but one does not necessarily cause the other.

**Example:** Ice cream sales and drownings both increase in summer, so they are correlated, but buying ice cream does not cause drowning. A third factor, hot weather, drives both.
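
This example can be simulated. In the sketch below, a hypothetical `temperature` variable drives both ice cream sales and drownings; the coefficients and noise levels are invented for illustration. The two outcomes end up correlated even though neither appears in the other's formula.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Common cause: daily temperature (illustrative values)
temperature = rng.uniform(10, 35, size=500)

# Both outcomes depend on temperature plus independent noise,
# but neither depends on the other
ice_cream_sales = 20 * temperature + rng.normal(0, 50, size=500)
drownings = 0.3 * temperature + rng.normal(0, 1, size=500)

df = pd.DataFrame({
    "temperature": temperature,
    "ice_cream_sales": ice_cream_sales,
    "drownings": drownings,
})

r = df["ice_cream_sales"].corr(df["drownings"])
print(f"correlation(ice_cream_sales, drownings) = {r:.2f}")
```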



## **18. What is an Optimizer? What are different types of optimizers? Explain each with an example.**

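- An **optimizer** is the algorithm that updates model parameters to minimize the loss.

    - **Gradient Descent:** Computes the gradient of the loss over the entire dataset and takes one update step per pass. Example: fitting a linear regression on a small dataset that fits in memory.
    - **Stochastic Gradient Descent (SGD):** Updates parameters using a single sample (or a small mini-batch) at a time, which makes each step cheaper but noisier. Example: training on a dataset too large to process in one pass.
    - **Adam Optimizer:** Short for adaptive moment estimation. It keeps running averages of the gradient and of its square and uses them to give each parameter its own step size, which often speeds up convergence. Example: a common default for training neural networks.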
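
The three optimizers can be compared on a toy problem. The sketch below fits `y = w * x + b` to synthetic data with plain NumPy. The data, learning rates, and epoch counts are assumptions chosen for illustration, and the Adam function follows the standard published update rule rather than any particular library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data from y = 3x + 2 plus a little noise
X = rng.uniform(-1, 1, size=200)
y = 3.0 * X + 2.0 + rng.normal(0, 0.1, size=200)

def gradients(w, b, xb, yb):
    """Gradients of mean squared error for y_hat = w * x + b on a batch."""
    error = (w * xb + b) - yb
    return 2 * np.mean(error * xb), 2 * np.mean(error)

def batch_gd(lr=0.1, epochs=500):
    """Gradient descent: one update per pass, using the entire dataset."""
    w = b = 0.0
    for _ in range(epochs):
        gw, gb = gradients(w, b, X, y)
        w, b = w - lr * gw, b - lr * gb
    return w, b

def sgd(lr=0.05, epochs=20, batch_size=1):
    """SGD: one update per sample (or mini-batch), visited in shuffled order."""
    w = b = 0.0
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            gw, gb = gradients(w, b, X[idx], y[idx])
            w, b = w - lr * gw, b - lr * gb
    return w, b

def adam(lr=0.05, epochs=500, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: per-parameter steps scaled by running gradient moments."""
    theta = np.zeros(2)   # [w, b]
    m = np.zeros(2)       # first moment: running mean of gradients
    v = np.zeros(2)       # second moment: running mean of squared gradients
    for t in range(1, epochs + 1):
        g = np.array(gradients(theta[0], theta[1], X, y))
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta[0], theta[1]

results = {"batch_gd": batch_gd(), "sgd": sgd(), "adam": adam()}
for name, (w, b) in results.items():
    print(f"{name}: w = {w:.2f}, b = {b:.2f}")
```

All three should land near `w = 3` and `b = 2`; the SGD result fluctuates slightly because each step sees only one sample.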


## **19. What is sklearn.linear_model?**

- `sklearn.linear_model` is a scikit-learn module containing linear models such as `LinearRegression` and `LogisticRegression`.


## **20. What does model.fit() do? What arguments must be given?**

- `model.fit()` trains the model using the provided data.

  **Arguments:**
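  - `X`: feature data, typically a 2-D array or DataFrame of shape `(n_samples, n_features)`
  - `y`: target values (required for supervised models; unsupervised estimators such as `KMeans` need only `X`)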

        from sklearn.linear_model import LinearRegression
        model = LinearRegression()
        model.fit(X_train, y_train)




## **21. What does model.predict() do? What arguments must be given?**

- `model.predict()` returns the model's predictions for new data points. The model must already be fitted.

  **Argument:**
  - `X`: feature data, with the same feature columns used in `fit()`

        predictions = model.predict(X_test)
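
Putting `fit()` and `predict()` together, the sketch below trains on synthetic data (assumed for illustration) and predicts on held-out rows. `predict()` expects a 2-D input with the same number of feature columns that `fit()` saw.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: y = 4*x1 - 2*x2 + 1 plus small noise (illustrative)
X = rng.normal(size=(100, 2))
y = 4 * X[:, 0] - 2 * X[:, 1] + 1 + rng.normal(0, 0.1, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)            # learn coefficients from the training rows
predictions = model.predict(X_test)    # one prediction per test row

print(predictions.shape)
print(model.score(X_test, y_test))     # R^2 on the held-out rows
```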




## **22. What are continuous and categorical variables?**

- **Continuous:** Data that can take any value in a range (e.g., price, height).
- **Categorical:** Data with finite labels or classes (e.g., blood type, brand).



## **23. What is feature scaling? How does it help in Machine Learning?**

- Feature scaling brings features onto a comparable scale so that features with large numeric ranges do not dominate those with small ranges. This matters most for algorithms that rely on distances or gradient-based optimization (e.g., KNN, SVM); tree-based models are largely unaffected.


## **24. How do we perform scaling in Python?**

- Using `StandardScaler` (zero mean, unit variance) or `MinMaxScaler` (rescales to the range [0, 1] by default):

      from sklearn.preprocessing import StandardScaler
      scaler = StandardScaler()
      X_scaled = scaler.fit_transform(X)
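
In practice the scaler is usually fitted on the training split only and then applied to the test split, so that test-set statistics do not leak into training. The sketch below shows this pattern with both scalers on made-up data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Illustrative features on very different scales: age (years) and income
X = np.array([
    [25, 40_000], [32, 52_000], [47, 61_000], [51, 58_000],
    [38, 45_000], [29, 75_000], [60, 90_000], [43, 67_000],
], dtype=float)

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# StandardScaler: subtract the mean and divide by the standard deviation
std_scaler = StandardScaler()
X_train_std = std_scaler.fit_transform(X_train)   # learn mean/std from train only
X_test_std = std_scaler.transform(X_test)         # reuse the same mean/std

# MinMaxScaler: rescale each feature to the [0, 1] range seen in training
mm_scaler = MinMaxScaler()
X_train_mm = mm_scaler.fit_transform(X_train)
X_test_mm = mm_scaler.transform(X_test)

print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
print(X_train_mm.min(axis=0), X_train_mm.max(axis=0))
```

Test-set values can fall slightly outside [0, 1] after `MinMaxScaler.transform`, because the range was learned from the training rows alone.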


## **25. What is sklearn.preprocessing?**

- A module in scikit-learn for preprocessing data, including scaling, normalizing, and encoding.

## **26. How do we split data for model fitting (training and testing) in Python?**

- We can use `train_test_split`:

      from sklearn.model_selection import train_test_split

      # Assume X and y are already defined (features and target)
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


## **27. Explain data encoding?**

- Encoding is the process of converting categorical data into numerical values so that models can process it.

      import pandas as pd
      from sklearn.preprocessing import LabelEncoder

      df = pd.DataFrame({'City': ['Delhi', 'Mumbai', 'Chennai']})
      encoder = LabelEncoder()
      # Each city is mapped to an integer (classes are sorted alphabetically)
      df['City_encoded'] = encoder.fit_transform(df['City'])
      print(df)

  Or with one-hot encoding:

      # One binary column per city
      df = pd.get_dummies(df, columns=['City'])
      print(df)
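
When the same encoding has to be reused on new data, scikit-learn's `OneHotEncoder` can be more convenient than `get_dummies`, because it remembers the categories seen during `fit`. A minimal sketch, assuming a hypothetical second DataFrame `new` that contains a city not present in training:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"City": ["Delhi", "Mumbai", "Chennai"]})
new = pd.DataFrame({"City": ["Mumbai", "Kolkata"]})  # "Kolkata" is unseen (hypothetical)

# handle_unknown="ignore" encodes unseen categories as all zeros instead of raising
encoder = OneHotEncoder(handle_unknown="ignore")
train_encoded = encoder.fit_transform(train[["City"]]).toarray()
new_encoded = encoder.transform(new[["City"]]).toarray()

print(encoder.categories_[0])
print(new_encoded)
```

The unseen city becomes a row of zeros, so the encoded output keeps the same columns the model was trained on.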






       