# Theory Questions

1. What is a parameter ?

    - In Machine Learning, a parameter is an internal variable that a model learns from the training data. For example, in linear regression, the slope (m) and intercept (b) in the equation y = mx + b are parameters. The model adjusts these during training to best fit the data.
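
As a minimal sketch of the idea above, the snippet below fits a linear regression to synthetic data and reads back the learned slope and intercept. The data values are made up for illustration and follow y = 2x + 1 exactly, so the fitted parameters should come out close to 2 and 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 2x + 1 (values chosen for illustration)
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X.ravel() + 1.0

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # learned parameter m
intercept = model.intercept_  # learned parameter b
print(f"m = {slope:.2f}, b = {intercept:.2f}")
```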

2. What is correlation ?
    What does negative correlation mean ?

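    - ***Correlation*** measures the strength and direction of the linear relationship between two variables, that is, how they move in relation to each other. The correlation coefficient (commonly Pearson's r) ranges from -1 to +1. A value close to +1 indicates a strong positive correlation, a value close to -1 a strong negative correlation, and a value near 0 little or no linear relationship.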

        ***Negative correlation*** means that as one variable increases, the other decreases. For example, if more hours of study reduce the number of mistakes in a test, they have a negative correlation.

3. Define Machine Learning. What are the main components in Machine Learning ? 

    - ***Machine Learning (ML)*** is a branch of AI where computers learn patterns from data to make decisions. Its main components are:

        - Data

        - Features (input variables)

        - Model/Algorithm

        - Loss function

        - Optimizer (for improving the model)

        - Training and testing process

4. How does loss value help in determining whether the model is good or not ?

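    - The loss value measures how far the model's predictions are from the actual values. A lower loss means the predictions are closer to the targets on the data it was computed on. During training, an optimizer adjusts the parameters to minimize this value. A low training loss combined with a much higher loss on unseen data is a sign of overfitting.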
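
To make the idea of a loss value concrete, here is a small sketch that computes mean squared error (one common loss for regression) for two sets of predictions. The target and prediction values are hypothetical, and the helper `mean_squared_error` is written by hand here for illustration.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 7.0]
close_preds = [2.9, 5.1, 7.2]   # hypothetical predictions from a well-fit model
far_preds = [1.0, 8.0, 4.0]     # hypothetical predictions from a poorly fit model

loss_close = mean_squared_error(y_true, close_preds)
loss_far = mean_squared_error(y_true, far_preds)
print(loss_close, loss_far)
```

The predictions that sit closer to the targets produce the smaller loss, which is the signal the optimizer follows during training.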

5. What are continuous and categorical variables ? 

    - ***Continuous variables*** can take any numeric value within a range (e.g., height, weight).

        ***Categorical variables*** represent categories or groups (e.g., gender: male/female, color: red/blue).

6. How do we handle categorical variables in Machine Learning? What are the common techniques ?

    - Categorical variables must be converted into numbers. Common techniques include:

        - *Label Encoding* (assigning numbers to categories)

        - *One-Hot Encoding* (creating binary columns for each category)

        **Example using One-Hot Encoding:**

        Color = Red, Blue, Green → Red: [1,0,0], Blue: [0,1,0], Green: [0,0,1]
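
The Red/Blue/Green example above can be reproduced with pandas. This is a sketch that assumes pandas is available; the explicit `categories` order is only there so the columns come out as Red, Blue, Green rather than alphabetically.

```python
import pandas as pd

# Fix the category order so the columns match Red, Blue, Green
colors = pd.Series(
    pd.Categorical(["Red", "Blue", "Green"], categories=["Red", "Blue", "Green"]),
    name="Color",
)

# One binary column per category; dtype=int gives 0/1 instead of booleans
one_hot = pd.get_dummies(colors, dtype=int)
print(one_hot)
```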

7. What do you mean by training and testing a dataset ?

    - ***Training dataset*** is used to teach the model patterns in the data.

        ***Testing dataset*** is used to check how well the model performs on new, unseen data.

8. What is sklearn.preprocessing ?

    - It is a module in Scikit-learn that provides functions to preprocess data, such as:

        - Scaling features

        - Encoding categorical variables

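        - Handling missing values is a related step, but the imputers (e.g., `SimpleImputer`) live in the separate `sklearn.impute` module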

        **Example :** `from sklearn.preprocessing import StandardScaler`

9. What is a Test set ?

    - A test set is a portion of the dataset that is ***not*** used during training. It helps evaluate how well the model generalizes to new data.

10. How do we split data for model fitting (training and testing) in Python ?
     How do you approach a Machine Learning problem?

     -  We use `train_test_split()` from `sklearn.model_selection`:

          ```python
          from sklearn.model_selection import train_test_split
          # Hold out 20% of the rows for testing; X holds the features, y the target
          X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
          ```

          A typical approach to a Machine Learning problem includes these steps:

          ***1.*** Understanding the problem and data

          ***2.*** Performing EDA (Exploratory Data Analysis)

          ***3.*** Preprocessing data (handling nulls, encoding, scaling)

          ***4.*** Splitting data

          ***5.*** Choosing and training a model

          ***6.*** Evaluating model performance

          ***7.*** Tuning and improving
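
The steps above can be sketched end to end as follows. This is only one possible workflow, not a prescribed one: the Iris dataset, the `StandardScaler` plus `LogisticRegression` pipeline, and the `random_state` value are all choices made here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Steps 1-2: load the data and take a first look at it
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)

# Step 4: split before fitting anything, so the test set stays unseen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Steps 3 and 5: scale the features and train a model in one pipeline
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Step 6: evaluate on the held-out test set
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Putting the scaler inside the pipeline means it is fitted on the training rows only, which avoids leaking information from the test set.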

11. Why do we have to perform EDA before fitting a model to the data ?

    - EDA helps understand the dataset's structure, patterns, and anomalies. It reveals:

        - Missing values

        - Outliers

        - Variable relationships (via plots)

        These findings guide feature selection and preprocessing, which in turn improves model accuracy.
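
A few pandas calls cover most of the checks listed above. The sketch below uses a small hypothetical DataFrame (the `age` and `income` values are invented) with one missing value and one obvious outlier, and flags outliers with the common 1.5 × IQR rule.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one missing value and one obvious outlier
df = pd.DataFrame({
    "age": [22, 25, 27, np.nan, 24, 26],
    "income": [30_000, 32_000, 31_000, 29_000, 33_000, 500_000],
})

missing_counts = df.isnull().sum()   # missing values per column
summary = df.describe()              # count, mean, std, min, quartiles, max

# Flag outliers with the common 1.5 * IQR rule
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)]

correlations = df.corr()             # pairwise relationships between variables
print(missing_counts, summary, outliers, correlations, sep="\n\n")
```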

12. What is correlation ?

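    - Correlation measures the strength and direction of the linear relationship between two variables, that is, how they move in relation to each other. The correlation coefficient (commonly Pearson's r) ranges from -1 to +1. A value close to +1 indicates a strong positive correlation, a value close to -1 a strong negative correlation, and a value near 0 little or no linear relationship.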

13. What does negative correlation mean ?

    - Negative correlation means that as one variable increases, the other decreases. For example, if more hours of study reduce the number of mistakes in a test, they have a negative correlation.

14. How can you find correlation between variables in Python ?

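    - Use the `.corr()` method in pandas:

        ```python
        import pandas as pd
        # df is assumed to be an existing DataFrame with numeric columns
        df.corr()  # pairwise Pearson correlation matrix
        ```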
        You can visualize it using a heatmap with Seaborn:

        ```python
        import seaborn as sns
        sns.heatmap(df.corr(), annot=True)
        ```
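
For a self-contained version, the sketch below builds a small hypothetical DataFrame (the column names and values are invented to mirror the study-hours example) and reads one coefficient out of the correlation matrix.

```python
import pandas as pd

# Hypothetical study data: more hours studied, fewer mistakes
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "mistakes": [12, 10, 9, 6, 4, 3],
})

corr_matrix = df.corr()  # Pearson correlation by default
r = corr_matrix.loc["hours_studied", "mistakes"]
print(corr_matrix)
```

Here `r` comes out strongly negative, matching the definition of negative correlation. If the DataFrame also contains text columns, recent pandas versions need `df.corr(numeric_only=True)`.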

15. What is causation? Explain difference between correlation and causation with an example.

    - ***Causation*** means a change in one variable directly produces a change in another.

        ***Correlation*** means two variables move together, but one does not necessarily cause the other.

        **Example:**

        - *Correlation*: Ice cream sales and drowning cases both rise in summer because hot weather drives both; neither causes the other.

        - *Causation*: Smoking causes lung disease
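
The ice cream example can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the numbers are synthetic, temperature is modeled as the only common cause, and the helper `residuals` is a hypothetical name. The point is that the correlation between the two series largely disappears once the effect of temperature is removed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulation: temperature drives both quantities independently
temperature = rng.uniform(10, 35, size=1000)
ice_cream_sales = 20 * temperature + rng.normal(0, 50, size=1000)
drownings = 0.3 * temperature + rng.normal(0, 1, size=1000)

# The two series are strongly correlated even though neither affects the other
raw_corr = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(y, x):
    """Part of y that a straight-line fit on x does not explain."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)

# Removing the effect of temperature from both makes the correlation vanish
partial_corr = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]
print(f"raw: {raw_corr:.2f}, controlling for temperature: {partial_corr:.2f}")
```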

16. What is an Optimizer? What are different types of optimizers? Explain each with an example.

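    - An optimizer is the algorithm that adjusts the model’s parameters during training to reduce the loss.

        Common optimizers:

         - ***SGD (Stochastic Gradient Descent)***: Updates the weights after each sample or small batch (mini-batch) of data, using the gradient of the loss and a fixed learning rate.

         - ***Adam***: Combines momentum with RMSProp-style scaling, so each parameter gets its own adaptive learning rate.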

        **Example:**

        ```python
        import tensorflow as tf
        optimizer = tf.keras.optimizers.Adam()  # default learning rate is 0.001
        ```
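
To show what the two update rules do, here is a plain-Python sketch that minimizes a toy loss, (w - 3)², with each of them. It is an illustration under simplifying assumptions, not how a framework implements them: the functions `grad`, `run_sgd`, and `run_adam` are hypothetical names, and the gradient is exact rather than estimated from a mini-batch.

```python
import math

def grad(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def run_sgd(w=0.0, lr=0.1, steps=100):
    # SGD update rule: step against the gradient with a fixed learning rate.
    # Here the gradient is exact; real SGD estimates it from a sample or mini-batch.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_adam(w=0.0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m, v = 0.0, 0.0  # running averages of the gradient and its square
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # momentum term
        v = beta2 * v + (1 - beta2) * g * g    # RMSProp-style scaling term
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd = run_sgd()
w_adam = run_adam()
print(w_sgd, w_adam)
```

Both runs should end near w = 3. The difference is in the step size: SGD scales every step by the same learning rate, while Adam divides by a running estimate of the gradient's magnitude.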

17. What is sklearn.linear_model ?

    - It is a Scikit-learn module for applying linear models like:

        - `LinearRegression()` for predicting continuous values

        - `LogisticRegression()` for classification

        These classes are used to build simple regression or classification models.
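
As a brief sketch of the classification side, the snippet below trains `LogisticRegression` on hypothetical pass/fail data (the hours and outcomes are invented for illustration) and predicts for two new students.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied -> failed (0) or passed (1)
hours = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(hours, passed)

new_students = np.array([[1.0], [4.0]])
predictions = clf.predict(new_students)          # predicted class labels
probabilities = clf.predict_proba(new_students)  # columns: P(class 0), P(class 1)
print(predictions, probabilities)
```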

18. What does model.fit() do ? What arguments must be given ?

    - `model.fit()` trains the model using the training data.

        **Example:**

        ```python
        model.fit(X_train, y_train)
        ```
        
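        **Arguments:**

        - `X_train`: input features

        - `y_train`: target values (required for supervised models; unsupervised estimators such as clustering models are fit on `X` alone)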

19. What does model.predict() do ? What arguments must be given ?

    - `model.predict()` makes predictions on new data using the trained model.

      **Example:**

      ```python
      predictions = model.predict(X_test)
      ```  

      **Argument:**

        - `X_test`: test feature set
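
The following sketch puts `fit()` and `predict()` together on synthetic data. The data follows y = 3x + 2 exactly (an assumption made for illustration), so the predictions on the test rows should match the true targets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data following y = 3x + 2 (chosen for illustration)
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)           # learn from features and targets
predictions = model.predict(X_test)   # only features are passed here

print(predictions[:3], y_test[:3])
```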

20. What are continuous and categorical variables ?

    - ***Continuous variables*** can take any numeric value within a range (e.g., height, weight).

        ***Categorical variables*** represent categories or groups (e.g., gender: male/female, color: red/blue).

21. What is feature scaling ? How does it help in Machine Learning ?

    - Feature scaling brings all features onto a comparable scale so that features with large numeric ranges do not dominate the others. It helps distance-based algorithms such as KNN and SVM perform better, and it helps models trained with gradient descent converge faster.

22. How do we perform scaling in Python ?

    - Using `StandardScaler` or `MinMaxScaler` from `sklearn.preprocessing`:

        **Example:**

        ```python
        from sklearn.preprocessing import StandardScaler
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        ```
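
A slightly fuller sketch shows both scalers and the usual train/test pattern. The feature values are hypothetical; the important detail is that the scaler is fitted on the training data only and then reused on the test data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales: age (years), income
X_train = np.array([[25, 30_000], [35, 60_000], [45, 90_000]], dtype=float)
X_test = np.array([[30, 45_000]], dtype=float)

# StandardScaler: subtract the mean and divide by the standard deviation
std_scaler = StandardScaler()
X_train_std = std_scaler.fit_transform(X_train)  # learn mean/std on training data
X_test_std = std_scaler.transform(X_test)        # reuse them on the test data

# MinMaxScaler: rescale each feature to the range [0, 1]
minmax_scaler = MinMaxScaler()
X_train_minmax = minmax_scaler.fit_transform(X_train)

print(X_train_std, X_test_std, X_train_minmax, sep="\n\n")
```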

23. What is sklearn.preprocessing ?

    - It’s a Scikit-learn module used for preparing data before modeling. It includes:

        - Encoding (Label, One-Hot)

        - Scaling (Standard, MinMax)

        - Imputing missing values is a related step, handled by the separate `sklearn.impute` module (e.g., `SimpleImputer`)

24. How do we split data for model fitting (training and testing) in Python ?

    -  We use `train_test_split()` from `sklearn.model_selection`:

          ```python
          from sklearn.model_selection import train_test_split
          # Hold out 20% of the rows for testing; X holds the features, y the target
          X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
          ```
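
A runnable variant with a small hypothetical dataset is sketched below. The `random_state` and `stratify` arguments are optional additions, included here because they make the split reproducible and keep the class balance the same in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 10 samples, 2 features, balanced binary labels
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # 20% of the rows go to the test set
    random_state=42,   # fixed seed so the split is reproducible
    stratify=y,        # keep the class proportions the same in both sets
)
print(X_train.shape, X_test.shape)
```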

25. Explain data encoding ?

    - Data encoding transforms categorical values into numeric format so models can understand them.

        **Types:**

        - ***Label Encoding***: Converts each category to an integer, which can imply an order the categories do not actually have.

        - ***One-Hot Encoding***: Creates binary columns for each category.

        **Example:**
        ```python
        from sklearn.preprocessing import OneHotEncoder
        ```
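
Both encodings can be sketched with scikit-learn as follows. The `colors` array is a hypothetical input. Note that scikit-learn documents `LabelEncoder` as intended for target labels; for input features, `OrdinalEncoder` is the usual counterpart.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

colors = np.array(["Red", "Blue", "Green", "Blue"])

# Label encoding: each category becomes an integer (classes are sorted alphabetically)
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(colors)

# One-hot encoding: one binary column per category; the input must be 2-D
one_hot_encoder = OneHotEncoder()
one_hot = one_hot_encoder.fit_transform(colors.reshape(-1, 1)).toarray()

print(label_encoder.classes_, labels)
print(one_hot_encoder.categories_[0], one_hot, sep="\n")
```

Because the categories are sorted alphabetically, the columns here are Blue, Green, Red, so `"Red"` maps to label 2 and to the one-hot row `[0, 0, 1]`.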