1. What is a parameter?
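   - A parameter in feature engineering is a value that controls how a feature is created or transformed.
   - For example:
     - In scaling, the mean and standard deviation are parameters learned from the training data.
     - In binning, the number of bins is a setting chosen by the user.
   - Learned values are computed when the transformer is fitted, while user-chosen settings (often called hyperparameters) are fixed beforehand. Both determine how the data is transformed before it reaches the model.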
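
As a minimal sketch of the scaling and binning examples (the feature values are made up for illustration, and `standardize` is a hypothetical helper, not a library function), the mean and standard deviation can be computed from the training data and reused on new data, while the number of bins is simply set by hand:

```python
from statistics import fmean, pstdev

# Hypothetical values of a single numeric feature.
train_values = [10.0, 20.0, 30.0, 40.0, 50.0]
new_values = [30.0, 60.0]

# "Fit": learn the scaling parameters from the training data only.
mean = fmean(train_values)
std = pstdev(train_values)  # population standard deviation


def standardize(values, mean, std):
    """Apply already-learned scaling parameters to any list of values."""
    return [(v - mean) / std for v in values]


train_scaled = standardize(train_values, mean, std)
new_scaled = standardize(new_values, mean, std)  # reuses the same parameters

# A setting chosen by hand rather than learned: the number of equal-width bins.
n_bins = 5
bin_width = (max(train_values) - min(train_values)) / n_bins

print(mean, std, new_scaled, bin_width)
```

The new values are scaled with the parameters learned from the training values, which is the same idea as calling `fit` once and `transform` many times in scikit-learn.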

2. What is correlation? What does negative correlation mean?
   - Correlation measures the relationship between two variables, showing how they move together.
     - Positive correlation: Both variables increase or decrease together.
     - Negative correlation: One variable increases while the other decreases.
   - For example, temperature and hot coffee sales have a negative correlation: as temperature rises, coffee sales drop.

3. Define Machine Learning. What are the main components in Machine Learning?
   - Machine Learning is a method where computers learn patterns from data to make predictions or decisions without being explicitly programmed.
   - Main components of ML:
     - Data - Raw information used for training.
     - Features - Important attributes extracted from data.
     - Model - An algorithm that learns from data.
     - Training - Process of teaching the model using data.
     - Evaluation - Testing the model's accuracy and performance.
     - Prediction - Using the trained model to make decisions.

4. How does loss value help in determining whether the model is good or not?
   - The loss value measures how far the model's predictions are from the actual values.
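     - Low loss = The model's predictions are close to the actual values (good fit).
     - High loss = The model's predictions are far from the actual values, so it needs improvement.
   - During training, the model's parameters are adjusted to reduce the loss. Comparing the loss on training data and on unseen data also shows whether the model generalizes or is overfitting.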
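
To make the idea concrete, here is a small sketch that uses mean squared error as the loss. The target values and the two sets of predictions are invented for illustration, and MSE is only one of many possible loss functions:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual values and predictions from two models.
y_true = [3.0, 5.0, 7.0, 9.0]
good_pred = [3.0, 5.5, 6.5, 9.0]   # close to the actual values
poor_pred = [6.0, 2.0, 10.0, 5.0]  # far from the actual values

loss_good = mean_squared_error(y_true, good_pred)
loss_poor = mean_squared_error(y_true, poor_pred)

print(loss_good, loss_poor)
```

The model with the smaller loss is the one whose predictions sit closer to the actual values.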

5. What are continuous and categorical variables?
   - Continuous variables:
     - A variable that can take any numerical value within a range.
       - Example: Height, weight, temperature.
   - Categorical variables:
     - A variable that represents distinct groups or categories.
       - Example: Gender (Male/Female), Colors (Red/Blue/Green).

6. How do we handle categorical variables in Machine Learning? What are the common techniques?
   - Categorical variables need to be converted into numerical form for ML models.
   - Common Techniques:
     - Label Encoding - Assigns a unique number to each category (e.g., Male -> 0, Female -> 1).
     - One-Hot Encoding - Creates a separate binary column for each category (e.g., Red -> [1,0,0], Blue -> [0,1,0]).
     - Ordinal Encoding - Assigns numbers based on order (e.g., Small -> 1, Medium -> 2, Large -> 3).
     - Frequency Encoding - Replaces categories with their occurrence count.
     - Target Encoding - Replaces categories with the mean of the target variable.
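
The techniques above can be sketched in plain Python to show what each one produces. The color column and the binary target are made-up data, and in practice a library such as scikit-learn or pandas would normally do this work:

```python
from collections import Counter, defaultdict

# Hypothetical categorical column and binary target.
colors = ["Red", "Blue", "Green", "Blue", "Red", "Blue"]
target = [1, 0, 1, 0, 1, 1]

categories = sorted(set(colors))  # ['Blue', 'Green', 'Red']

# Label encoding: one integer per category.
label_map = {cat: i for i, cat in enumerate(categories)}
label_encoded = [label_map[c] for c in colors]

# One-hot encoding: one binary column per category.
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]

# Frequency encoding: replace each category with how often it occurs.
counts = Counter(colors)
freq_encoded = [counts[c] for c in colors]

# Target encoding: replace each category with the mean target for that category.
# To avoid leaking the target, these means should come from training data only.
target_sums = defaultdict(float)
for c, t in zip(colors, target):
    target_sums[c] += t
target_means = {cat: target_sums[cat] / counts[cat] for cat in categories}
target_encoded = [target_means[c] for c in colors]

print(label_encoded, one_hot[0], freq_encoded, target_encoded)
```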

7. What do you mean by training and testing a dataset?
   - Training Dataset: Used to teach the model by finding patterns in the data.
   - Testing Dataset: Used to evaluate the model's performance on unseen data.
   - Example:
     - Suppose you are training a model to recognize cats and dogs:
       - The training set helps the model learn the difference.
       - The testing set checks if the model correctly identifies new images.

8. What is sklearn.preprocessing?
   - sklearn.preprocessing is a module in Scikit-Learn that provides tools for scaling, transforming, and encoding data before training a machine learning model.
   - Common classes:
     - StandardScaler - Standardizes data (mean = 0, std = 1).
     - MinMaxScaler - Scales data to a fixed range (e.g., 0 to 1).
     - LabelEncoder - Converts categorical labels into numbers.
     - OneHotEncoder - Converts categories into binary vectors.
     - Binarizer - Converts values into 0s and 1s based on threshold.
   - It helps improve model performance by making data suitable for learning.

9. What is a Test set?
   - A test set is a portion of the dataset used to evaluate a trained machine learning model. It contains unseen data to check how well the model performs on new inputs.
   - Example: If training a model to recognize cats and dogs, the test set includes new images the model hasn't seen before to measure accuracy.

10. How do we split data for model fitting (training and testing) in Python? How do you approach a Machine Learning problem?
    - In Python, we use train_test_split from sklearn.model_selection to split data into training and testing sets.
    - Example:
              from sklearn.model_selection import train_test_split

              X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    - Explanation:
      - X, y -> Features and target variable
      - test_size=0.2 -> 20% of data for testing, 80% for training.
      - random_state=42 -> Ensures reproducibility

    - This helps train the model on one part and test it on unseen data.
    - Approach to a Machine Learning Problem:
      - Define the Problem - Understand the goal and data requirements.
      - Collect Data - Gather relevant and high-quality data.
      - Preprocess Data - Handle missing values, remove duplicates, and clean data.
      - Feature Engineering - Select, create, and transform features.
      - Split Data - Divide into training and testing sets.
      - Choose a Model - Select the right algorithm based on the problem.
      - Train the Model - Fit the model using the training data.
      - Evaluate the Model - Test performance using the test data.
      - Tune the Model - Optimize hyperparameters for better accuracy.
      - Deploy and Monitor - Use the model in real-world applications and track its performance.
    - This structured approach ensures a systematic and effective ML solution.
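
The middle steps of this approach (split, preprocess, train, evaluate) might look like the sketch below. The Iris dataset, the `StandardScaler` step, and `LogisticRegression` are illustrative choices, not recommendations for any particular problem:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Collect data: a small built-in dataset stands in for real data here.
X, y = load_iris(return_X_y=True)

# Split data: hold out 20% for testing, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocess + choose a model: the pipeline fits the scaler on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Train the model.
model.fit(X_train, y_train)

# Evaluate the model on unseen data.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the model in one pipeline keeps the preprocessing tied to the training data, so the test set stays unseen until evaluation.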

11. Why do we have to perform EDA before fitting a model to the data?
    - Exploratory Data Analysis (EDA) helps us understand the dataset before training a model. It is important because:
      - Detects Missing Values - Helps handle incomplete data.
      - Finds Outliers - Identifies unusual values that may affect the model.
      - Understands Data Distribution - Checks patterns, trends, and relationships.
      - Identifies Feature Importance - Helps select the right features for training.
      - Helps Spot Data Leakage - Reveals features that carry information about the target that would not be available at prediction time.
    - EDA supports better models by making sure the data is clean and well understood before fitting.
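
A few of these checks can be sketched with pandas. The small DataFrame is invented for illustration, and the 1.5 × IQR rule is just one common convention for flagging outliers:

```python
import pandas as pd

# Hypothetical dataset with one missing value and one extreme income.
df = pd.DataFrame({
    "age": [25, 32, 47, None, 51, 29],
    "income": [40_000, 52_000, 61_000, 58_000, 1_000_000, 45_000],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune"],
})

# Detect missing values per column.
missing_per_column = df.isna().sum()

# Understand the distribution of numeric columns.
summary = df.describe()

# Find outliers in income with the IQR rule.
q1 = df["income"].quantile(0.25)
q3 = df["income"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# Look at category balance and relationships between numeric features.
city_counts = df["city"].value_counts()
numeric_corr = df[["age", "income"]].corr()

print(missing_per_column, summary, outliers, city_counts, numeric_corr, sep="\n\n")
```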

12. What is correlation?
    - Correlation measures the relationship between two variables and how they move together.
      - Positive Correlation: Both variables increase or decrease together.
      - Negative Correlation: One variable increases while the other decreases.
      - Zero Correlation: No relationship between variables.
    - Example:
      - Height & Weight → Positive Correlation
      - Temperature & Hot Coffee Sales → Negative Correlation

13. What does negative correlation mean?
    - Negative correlation means that as one variable increases, the other decreases (or vice versa).
    - Example:
      - Temperature & Hot Coffee Sales → As temperature rises, coffee sales drop.
      - Exercise Time & Body Weight → More exercise tends to go with lower weight.
    - A stronger negative correlation (closer to -1) means a stronger inverse relationship.
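
As a rough illustration of what "closer to -1" means, the Pearson correlation coefficient can be computed by hand. The temperature and sales figures are invented, and `pearson_r` is a hypothetical helper written for this sketch:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily temperature (°C) and hot coffee cups sold.
temperature_c = [5, 10, 15, 20, 25, 30]
coffee_sales = [120, 110, 95, 80, 70, 50]

r = pearson_r(temperature_c, coffee_sales)
print(f"r = {r:.3f}")  # negative, because sales fall as temperature rises
```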

14. How can you find correlation between variables in Python?
    - Use pandas' DataFrame.corr() to compute pairwise correlations between columns (NumPy's np.corrcoef is an alternative for arrays).
    - Example:
             import pandas as pd

             # Sample DataFrame
             data = {'A': [1, 2, 3, 4], 'B': [4, 3, 2, 1]}
             df = pd.DataFrame(data)

             # Calculate correlation
             correlation_matrix = df.corr()
             print(correlation_matrix)

    - Other Methods:
      - Pearson Correlation (default in .corr()) - Measures linear relationship.
      - Spearman & Kendall Correlation - Rank-based measures, used for ordinal data or monotonic but non-linear relationships.
             df.corr(method='spearman')  # Spearman correlation
             df.corr(method='kendall')   # Kendall correlation

    - This helps understand relationships between variables in datasets.

15. What is causation? Explain difference between correlation and causation with an example.
    - Causation means that one event directly causes another to happen.
    - Difference Between Correlation and Causation:
      - Correlation: Two variables move together, but one does not necessarily cause the other.
      - Causation: One variable directly affects the other.
    - Example:
      - Correlation: Ice cream sales and drowning cases increase together.
        - But eating ice cream doesn't cause drowning (summer heat is the real factor).
      - Causation: More exercise leads to weight loss.
        - Here, exercise directly causes weight loss.
    - Key Rule: Correlation does not imply causation!

16. What is an Optimizer? What are different types of optimizers? Explain each with an example.
    - An optimizer in machine learning adjusts the model's parameters (weights) to minimize the loss function and improve accuracy.
    - Types of Optimizers:
      - Gradient Descent (GD)
        - Updates weights using the entire dataset.
        - Example: Used in simple regression models.
      - Stochastic Gradient Descent (SGD)
        - Updates weights using one random sample at a time.
        - Example: Used in online learning and large datasets.
      - Mini-Batch Gradient Descent
        - Updates weights using small batches of data.
        - Example: Used in deep learning for efficiency.
      - Adam (Adaptive Moment Estimation)
        - Combines momentum and adaptive learning rates for faster convergence.
        - Example: Used in deep learning models like CNNs and RNNs.
      - RMSprop (Root Mean Square Propagation)
        - Adjusts learning rates based on recent gradients, preventing large updates.
        - Example: Works well for recurrent neural networks (RNNs).
      - Adagrad (Adaptive Gradient Algorithm)
        - Adapts learning rate based on past gradients, useful for sparse data.
        - Example: Used in NLP and recommendation systems.
    - Each optimizer helps improve model performance based on the problem and dataset.
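
The first three optimizers differ only in how many samples feed each weight update, which the following sketch tries to show with a single function. The data, the learning rate, and the helper `fit_slope` are assumptions made for illustration. Adam, RMSprop, and Adagrad add per-parameter learning-rate logic that is not shown here:

```python
import random


def fit_slope(xs, ys, batch_size, learning_rate=0.01, epochs=200, seed=0):
    """Fit y = w * x by gradient steps on the mean squared error."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch with respect to w.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * grad
    return w


# Hypothetical noise-free data generated from y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]

w_batch = fit_slope(xs, ys, batch_size=len(xs))  # gradient descent: whole dataset
w_sgd = fit_slope(xs, ys, batch_size=1)          # stochastic: one sample per update
w_mini = fit_slope(xs, ys, batch_size=2)         # mini-batch: small groups

print(w_batch, w_sgd, w_mini)
```

All three variants should approach a slope of 2 on this data. On real, noisy data the single-sample updates are noisier but cheaper per step.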

17. What is sklearn.linear_model?
    - sklearn.linear_model is a module in Scikit-Learn that provides various linear models for regression and classification tasks.
    - Common Models:
      - a. LinearRegression - Fits a straight line to predict continuous values.
              from sklearn.linear_model import LinearRegression
              model = LinearRegression()
      - b. LogisticRegression - Used for binary/multi-class classification.
              from sklearn.linear_model import LogisticRegression
              model = LogisticRegression()
      - c. Ridge & Lasso Regression - Linear regression with regularization to prevent overfitting.
              from sklearn.linear_model import Ridge, Lasso
              ridge_model = Ridge(alpha=1.0)
              lasso_model = Lasso(alpha=0.1)
      - d. SGDClassifier & SGDRegressor - Uses Stochastic Gradient Descent for large datasets.
              from sklearn.linear_model import SGDClassifier, SGDRegressor
              clf = SGDClassifier()
              reg = SGDRegressor()

    - This module is useful for solving regression and classification problems efficiently.

18. What does model.fit() do? What arguments must be given?
    - model.fit() trains a machine learning model by learning patterns from the given data. It adjusts model parameters to minimize errors.
    - Required Arguments:
      - X (Features/Input Data) - The independent variables.
      - y (Target/Labels) - The dependent variable (what we want to predict).
    - Example:
              from sklearn.linear_model import LinearRegression

              model = LinearRegression()
              model.fit(X_train, y_train)  # Training the model
    - Once trained, the model can make predictions using .predict().

19. What does model.predict() do? What arguments must be given?
    - model.predict() makes predictions using the trained model on new or unseen data.
    - Required Argument:
      - X (Features/Input Data) - The data for which predictions are needed.
    - Example:
              y_pred = model.predict(X_test)  # Predicting on test data
    - It outputs predicted values based on learned patterns.
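
A small end-to-end sketch of `fit` followed by `predict`, using made-up data that follows y = 2x + 1. Note that scikit-learn expects `X` as a 2D array of shape (n_samples, n_features), even when there is only one feature:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: one feature, target follows y = 2x + 1.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X_train, y_train)  # learns the slope and intercept

# New inputs must have the same number of columns as the training data.
X_new = np.array([[5.0], [6.0]])
y_pred = model.predict(X_new)

print(model.coef_, model.intercept_, y_pred)
```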

20. What are continuous and categorical variables?
    - Continuous Variables
      - Can take any numeric value within a range.
      - Example: Height, weight, temperature.
    - Categorical Variables
      - Represent distinct groups or categories.
      - Example: Gender (Male/Female), Colors (Red/Blue/Green).
    - Key Difference: Continuous variables are measured, while categorical variables are grouped.

21. What is feature scaling? How does it help in Machine Learning?
    - Feature scaling is the process of normalizing or standardizing numerical data so that all features have a similar scale.
    - How Does It Help in Machine Learning?
      - Improves Model Performance - Prevents features with larger values from dominating.
      - Speeds Up Training - Helps gradient-based models converge faster.
      - Better Distance Calculations - Essential for algorithms like KNN and K-Means.
    - Common Methods:
      - Standardization (StandardScaler) - Scales data to have mean = 0 and std = 1.
      - Normalization (MinMaxScaler) - Scales data to a fixed range (0 to 1 by default).
    - Example:
             from sklearn.preprocessing import StandardScaler

             scaler = StandardScaler()
             X_scaled = scaler.fit_transform(X)

    - Feature scaling ensures fair comparisons and better model accuracy.

22. How do we perform scaling in Python?
    - Use sklearn.preprocessing for feature scaling.
    - Standardization (StandardScaler)
      - Scales data to have mean = 0 and standard deviation = 1.
            from sklearn.preprocessing import StandardScaler

            scaler = StandardScaler()
            X_scaled = scaler.fit_transform(X)

    - Normalization (MinMaxScaler)
      - Scales data to a fixed range (0 to 1 by default).
           from sklearn.preprocessing import MinMaxScaler

           scaler = MinMaxScaler()
           X_scaled = scaler.fit_transform(X)

    - Feature scaling helps improve model performance and training speed.
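
When the data is split into training and testing sets, a common practice is to fit the scaler on the training portion only and reuse it on the test portion, so that no information from the test set influences the scaling. A minimal sketch, assuming a small made-up feature matrix:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: two features on very different scales.
X = np.column_stack([np.arange(1.0, 11.0), np.arange(1.0, 11.0) * 100.0])

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data
X_test_scaled = scaler.transform(X_test)        # reuse them; do not refit on test data

print(scaler.mean_, scaler.scale_)
```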

23. What is sklearn.preprocessing?
    - sklearn.preprocessing is a module that provides tools for scaling, transforming, and encoding data before training a machine learning model.
    - Common Classes:
      - StandardScaler - Standardizes data (mean = 0, std = 1).
      - MinMaxScaler - Scales data to a fixed range (0 to 1 by default).
      - LabelEncoder - Converts categorical labels into numbers.
      - OneHotEncoder - Converts categorical data into binary vectors.
      - Binarizer - Converts values into 0s and 1s based on a threshold.
    - Example:
             from sklearn.preprocessing import StandardScaler

             scaler = StandardScaler()
             X_scaled = scaler.fit_transform(X)

    - It helps make data suitable for machine learning models.

24. How do we split data for model fitting (training and testing) in Python?
    - In Python, we use train_test_split from sklearn.model_selection to split data into training and testing sets.
    - Example:
              from sklearn.model_selection import train_test_split

              X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    - Explanation:
      - X, y -> Features and target variable
      - test_size=0.2 -> 20% of data for testing, 80% for training.
      - random_state=42 -> Ensures reproducibility

    - This helps train the model on one part and test it on unseen data.

25. Explain data encoding?
    - Data encoding is the process of converting categorical data into numerical format so machine learning models can understand it.
    - Common Encoding Methods:
      - a. Label Encoding - Assigns a unique number to each category.
            from sklearn.preprocessing import LabelEncoder
            encoder = LabelEncoder()
            y_encoded = encoder.fit_transform(y)
      - b. One-Hot Encoding - Converts categories into binary vectors.
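            from sklearn.preprocessing import OneHotEncoder
            # sparse_output=False returns a dense array (the argument was named `sparse` before scikit-learn 1.2)
            encoder = OneHotEncoder(sparse_output=False)
            X_encoded = encoder.fit_transform(X)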
      - c. Ordinal Encoding - Assigns numbers based on category order (for ranked data).
      - d. Target Encoding - Replaces categories with the mean of the target variable.
    - Encoding ensures categorical data is properly used in ML models.
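
Ordinal encoding has no snippet above, so here is one possible sketch with scikit-learn's `OrdinalEncoder`. The size values are made up, and the category order is passed explicitly because the default ordering is alphabetical rather than by meaning. The encoder numbers categories from 0, not from 1:

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ranked feature with a single column.
sizes = [["Small"], ["Large"], ["Medium"], ["Small"]]

# Pass the intended order explicitly: Small < Medium < Large.
encoder = OrdinalEncoder(categories=[["Small", "Medium", "Large"]])
sizes_encoded = encoder.fit_transform(sizes)

print(sizes_encoded.ravel())  # Small -> 0, Medium -> 1, Large -> 2
```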
