1. What is a parameter?
  - A **parameter** is a variable or value that defines and controls the behavior of a function, system, or model. In programming, parameters are the named inputs of a function that let it perform a specific task on the values passed to it. In machine learning, model parameters (such as the weights and bias of a linear regression) are learned from the training data.
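  A minimal sketch of both meanings in plain Python, assuming a one-feature linear model. The function predict and its arguments are hypothetical names chosen for illustration: x is the input passed to the function, while w and b play the role of model parameters that a training algorithm would normally learn from data.

```python
def predict(x, w, b):
    """Linear model y = w * x + b; w (weight) and b (bias) are the model's parameters."""
    return w * x + b


# Here the parameter values are chosen by hand; in machine learning,
# training would estimate w and b from data instead.
print(predict(2.0, w=3.0, b=1.0))  # prints 7.0
```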


2.  What is correlation? What does negative correlation mean?
  - **Correlation** is a statistical measure of the strength and direction of the relationship between two variables (the common Pearson coefficient measures linear relationships). It indicates whether the variables move together (positive correlation), move in opposite directions (negative correlation), or show no linear connection (zero correlation).
  - **Negative correlation** means that as one variable increases, the other decreases. In other words, they move in opposite directions. For example, in economics, higher unemployment rates often correlate negatively with consumer spending—when job losses rise, spending tends to drop.
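  To make the sign concrete, here is a minimal pure-Python sketch of the Pearson correlation coefficient, r = cov(x, y) / (σx · σy). The data below is hypothetical, invented to mimic the unemployment-versus-spending example: as one list rises the other falls, so r comes out negative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of summed squared deviations (proportional to the std devs)
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data: unemployment rate (%) vs. a consumer spending index
unemployment = [3, 4, 5, 6, 7]
spending = [109, 107, 106, 103, 101]

r = pearson(unemployment, spending)
print(round(r, 3))  # negative: the variables move in opposite directions
```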


3. Define Machine Learning. What are the main components in Machine Learning?
  - **Machine Learning (ML)** is a subset of artificial intelligence (AI) that enables computers to learn from data and improve their performance on a task without being explicitly programmed. It uses algorithms to identify patterns, make predictions, and optimize decisions based on experience.

  The main components are:

  1. **Data** – The foundation of ML; high-quality, representative data is crucial for training models.
  2. **Features** – Characteristics or attributes of data that influence predictions.
  3. **Algorithms** – Procedures (e.g., linear regression, decision trees) that process data and learn patterns from it.
  4. **Model** – The trained system that makes predictions or classifications.
  5. **Training** – The process where the model learns from data using algorithms.
  6. **Evaluation** – Assessing model performance using metrics like accuracy or precision.
  7. **Deployment** – Integrating the trained model into real-world applications for decision-making.
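  The components above can be traced in a short scikit-learn sketch. It assumes scikit-learn is installed and uses its bundled Iris dataset and LogisticRegression purely as an illustrative choice of data and algorithm; deployment is out of scope here.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Data and features: X holds 4 numeric features per flower, y the species label
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Algorithm: logistic regression (max_iter raised so the solver converges)
algorithm = LogisticRegression(max_iter=1000)

# Training: fit() learns the model's parameters and returns the fitted model
model = algorithm.fit(X_train, y_train)

# Evaluation: accuracy on held-out data the model never saw during training
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```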



4. How does loss value help in determining whether the model is good or not?
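  - The loss value measures how far a model's predictions are from the actual values (for example, mean squared error for regression or cross-entropy for classification). A lower loss means the predictions are closer to the targets. A high loss signals large errors and suggests changes such as better data, tuned hyperparameters, or a different algorithm. A low training loss alone does not prove the model is good: comparing training loss with loss on validation or test data helps detect overfitting, where the model fits the training data well but generalizes poorly.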
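  As a worked example, the sketch below computes mean squared error (MSE) by hand for two sets of hypothetical predictions against the same targets. The hand-written mean_squared_error is for illustration; scikit-learn's sklearn.metrics.mean_squared_error computes the same quantity.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 7.0, 9.0]
pred_a = [2.9, 5.2, 6.8, 9.1]   # hypothetical predictions from model A
pred_b = [1.0, 6.5, 4.0, 12.0]  # hypothetical predictions from model B

loss_a = mean_squared_error(y_true, pred_a)
loss_b = mean_squared_error(y_true, pred_b)
print(loss_a, loss_b)  # model A has the lower loss, so it fits these targets better
```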



5.  What are continuous and categorical variables?
  - **Continuous variables** are numerical and can take any value within a range. They are measurable and have infinite possible values, such as height, weight, or temperature.

  - **Categorical variables** represent distinct groups or categories. They are usually non-numerical labels, such as colors, types of food, or gender; numbers that only serve as labels (for example, postal codes) are categorical too.
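  In pandas, a quick first pass at separating the two kinds of variables is to look at column data types. This is only a heuristic sketch on a hypothetical DataFrame: numeric columns are candidates for continuous variables, and numeric-looking codes (such as postal codes) still need a manual check.

```python
import pandas as pd

# Hypothetical dataset mixing both kinds of variables
df = pd.DataFrame({
    "height_cm": [170.2, 165.5, 180.1],
    "temperature_c": [36.6, 37.1, 36.9],
    "color": ["red", "blue", "red"],
    "food_type": ["fruit", "meat", "grain"],
})

# Numeric (float) columns are candidates for continuous variables
continuous_cols = df.select_dtypes(include="number").columns.tolist()
# Text columns are candidates for categorical variables
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print("Continuous:", continuous_cols)
print("Categorical:", categorical_cols)
```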



6. How do we handle categorical variables in Machine Learning? What are the common techniques?
  - Handling **categorical variables** in Machine Learning is crucial because models typically work with numerical data. Here are some common techniques to transform categorical variables into a format suitable for ML algorithms:


  1. **Label Encoding** – Assigns unique numerical values to each category (e.g., "Red" = 0, "Blue" = 1).
  2. **One-Hot Encoding** – Creates binary columns for each category (e.g., "Red" → [1, 0, 0], "Blue" → [0, 1, 0]).
  3. **Ordinal Encoding** – Assigns numbers based on order/rank (e.g., "Low" = 1, "Medium" = 2, "High" = 3).
  4. **Binary Encoding** – Converts categories into binary numbers to reduce dimensionality.
  5. **Target Encoding** – Replaces categories with the mean of their corresponding target values (used in supervised learning).
  6. **Frequency/Count Encoding** – Represents categories using their frequency in the dataset.
  7. **Embedding Layers (for Deep Learning)** – Maps categories to dense vector representations.

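  Each method has trade-offs that depend on the dataset and model: label encoding imposes an artificial order on unordered categories, one-hot encoding can create many columns for features with many categories, and target encoding can leak target information unless it is computed on the training data only.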
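  Several of these techniques can be sketched with plain pandas on a small hypothetical dataset (the column names and values below are invented). Binary encoding and embedding layers are left out because they usually rely on additional libraries.

```python
import pandas as pd

# Hypothetical training data: a nominal column, an ordinal column and a numeric target
df = pd.DataFrame({
    "color": ["Red", "Blue", "Green", "Red", "Blue", "Red"],
    "size": ["Low", "High", "Medium", "Low", "Medium", "High"],
    "price": [10, 20, 15, 12, 18, 14],
})

# Label encoding: integer codes assigned in sorted order (Blue=0, Green=1, Red=2)
df["color_label"] = df["color"].astype("category").cat.codes

# One-hot encoding: one 0/1 column per color
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)

# Ordinal encoding: an explicit mapping that preserves the natural order
df["size_ordinal"] = df["size"].map({"Low": 1, "Medium": 2, "High": 3})

# Frequency/count encoding: how often each category appears
df["color_freq"] = df["color"].map(df["color"].value_counts())

# Target encoding: mean target value per category (compute on training data only)
df["color_target"] = df["color"].map(df.groupby("color")["price"].mean())

print(pd.concat([df, one_hot], axis=1))
```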



7.  What do you mean by training and testing a dataset?
  - **Training** means fitting a machine learning model on the training set: the model adjusts its parameters to learn patterns that let it make accurate predictions.

  - **Testing** means evaluating the trained model on a separate test set it did not see during training, to check how well it generalizes to new inputs.
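  What a train/test split does can be sketched in plain Python without any library: shuffle the row indices with a fixed seed, then carve off a fraction for testing. The function split_indices and its test_fraction argument are hypothetical names for illustration; in practice train_test_split from scikit-learn does this job.

```python
import random


def split_indices(n_rows, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx): a shuffled, non-overlapping split of range(n_rows)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(10)
print("Train rows:", sorted(train_idx))
print("Test rows:", sorted(test_idx))
```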



8. What is sklearn.preprocessing?
  - sklearn.preprocessing is a module in Scikit-learn that provides tools for transforming raw data into a format suitable for machine learning models. It includes methods for scaling, normalizing, encoding categorical variables, and feature engineering to improve model performance.

  - **StandardScaler** – Scales data to have a mean of 0 and variance of 1.
  - **MinMaxScaler** – Normalizes features within a fixed range (e.g., 0 to 1).
  - **LabelEncoder** – Converts categorical labels into integers; it is intended for the target variable y rather than input features.
  - **OneHotEncoder** – Converts categorical data into binary columns.
  - **PolynomialFeatures** – Generates polynomial and interaction terms (e.g., x1^2 and x1*x2) for feature expansion.
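  A short sketch of two of these tools, assuming scikit-learn and NumPy are installed; the input values are arbitrary. PolynomialFeatures with degree 2 expands two features into a bias column, the original features, their squares, and their product, while LabelEncoder numbers the sorted class names from 0.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, PolynomialFeatures

# PolynomialFeatures (degree 2) on one sample with two features, x1 = 2 and x2 = 3
X = np.array([[2.0, 3.0]])
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print(X_poly)  # columns: 1, x1, x2, x1^2, x1*x2, x2^2 -> [[1. 2. 3. 4. 6. 9.]]

# LabelEncoder on target labels: classes are sorted, then numbered from 0
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(["cat", "dog", "cat", "bird"])
print(encoder.classes_, y_encoded)
```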

  



9. What is a Test set?
  - A **test set** is a portion of a dataset held out from training and used only to evaluate the trained machine learning model. Because the model has not seen this data, performance on the test set estimates how well the model generalizes to new inputs; good test performance suggests a reliable model, provided the test set was not also used to tune the model.



10. How do we split data for model fitting (training and testing) in Python? How do you approach a Machine Learning problem?
  - Splitting Data for Model Fitting in Python:
    To split data into training and testing sets, we use train_test_split from Scikit-learn (sklearn.model_selection). Holding out a test set lets us evaluate model performance on data the model has not seen.

```python
from sklearn.model_selection import train_test_split

# Sample data (X: features, y: target)
X = [[1], [2], [3], [4], [5], [6]]
y = [10, 20, 30, 40, 50, 60]

# Split the data (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training set:", X_train, y_train)
print("Testing set:", X_test, y_test)
```

  test_size = 0.2 sends 20% of the data to the test set (rounded up, so 2 of the 6 samples here).
  random_state = 42 fixes the shuffling so the split is reproducible.

  - Approach to Solving a Machine Learning Problem:
  A structured approach helps in building an effective ML model. Here’s a general workflow:
  1. **Define the Problem**  
    - Understand the goal: Classification? Regression? Clustering?
    - Identify the target variable (for supervised problems).

  2. **Collect & Preprocess Data**  
    - Gather reliable datasets.
    - Handle missing values, duplicates, and outliers.
    - Perform feature selection and engineering.

  3. **Explore & Visualize Data**  
    - Generate summary statistics.
    - Use visualizations (histograms, scatter plots, correlations).

  4. **Choose and Train a Model**  
    - Select a suitable algorithm (Linear Regression, Decision Trees, Neural Networks, etc.).
    - Split data for **training** and **testing**.
    - Fit the model on the training data.

  5. **Evaluate the Model**  
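    - Use metrics suited to the task, such as accuracy, precision, and recall for classification or RMSE for regression.
    - Tune hyperparameters with techniques like **Grid Search**, using **Cross-validation** on the training data to compare settings.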

  6. **Deploy the Model**  
    - Integrate the model into applications.
    - Monitor performance on real-world data.

  7. **Improve & Iterate**  
    - Continuously refine based on feedback and new data.
    - Optimize with better feature selection or alternative algorithms.
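  Step 5 can be sketched with scikit-learn's GridSearchCV, which tries every combination in a parameter grid and scores each one with k-fold cross-validation on the training data only; the test set is used once at the end. The dataset (Iris), the estimator (KNeighborsClassifier), and the grid values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Candidate hyperparameter values to compare
param_grid = {"n_neighbors": [1, 3, 5, 7], "weights": ["uniform", "distance"]}

# 5-fold cross-validation on the training set for every combination in the grid
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best hyperparameters:", search.best_params_)
print("Mean CV accuracy:", round(search.best_score_, 3))
# Final check on the held-out test set, using the best model refitted on all training data
print("Test accuracy:", round(search.score(X_test, y_test), 3))
```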




11.  Why do we have to perform EDA before fitting a model to the data?
  - **Exploratory Data Analysis (EDA)** helps us understand the dataset before fitting a model. It reveals patterns, anomalies, missing values, correlations, and feature distributions, so problems can be fixed before they distort training. Proper EDA can improve model accuracy, prevent errors caused by dirty data, and guide feature selection and preprocessing choices, making the ML pipeline more efficient.
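  A first EDA pass in pandas might look like the sketch below. The DataFrame is hypothetical and was built to contain one missing value, one duplicated row, and one implausible age, so each check has something to find.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with a missing value, a duplicate row and an outlier
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 32, 230],
    "income": [40000, 52000, np.nan, 81000, 52000, 60000],
    "city": ["A", "B", "A", "C", "B", "A"],
})

print(df.describe())                 # summary statistics (the age of 230 stands out)
missing = df.isna().sum()            # missing values per column
duplicates = df.duplicated().sum()   # number of fully duplicated rows
print(missing)
print("Duplicate rows:", duplicates)
print(df["city"].value_counts())     # distribution of a categorical column
print(df[["age", "income"]].corr())  # pairwise correlation of numeric columns
```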





14. How can you find correlation between variables in Python?
  - In Python, we can compute the correlation between variables using Pandas or NumPy.

  Using Pandas:

```python
import pandas as pd

data = {'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Pairwise correlation matrix of all numeric columns (Pearson by default)
correlation_matrix = df.corr()
print(correlation_matrix)
```


  Using NumPy:

```python
import numpy as np

x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]

# Returns a 2x2 matrix; the off-diagonal entry [0, 1] is the correlation of x and y
correlation = np.corrcoef(x, y)
print(correlation)
```

  These methods return correlation coefficients ranging from -1 to 1:

  +1 → Perfect positive linear correlation

  -1 → Perfect negative linear correlation

  0 → No linear correlation

  Values close to +1 or -1 indicate a strong relationship; values near 0 indicate a weak one.



15.  What is causation? Explain difference between correlation and causation with an example.
  - **Causation** means that one event directly causes another to happen. In other words, a change in one variable leads to a change in another.

  Difference Between Correlation and Causation:
  - **Correlation** shows that two variables are related but does not imply that one causes the other.  
  - **Causation** confirms that one variable directly affects another.

  Example:  
  - **Correlation:** Ice cream sales and swimming-pool drownings rise and fall together. Both increase in hot summer weather (a confounding variable), but ice cream does not cause drownings!  
  - **Causation:** Drinking contaminated water causes illnesses such as cholera.
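  The ice cream example can be simulated with NumPy to show how a confounder produces correlation without causation. All numbers are synthetic assumptions: temperature drives both series, and neither series affects the other. After removing the linear effect of temperature from each series (by regressing it out with np.polyfit), the remaining correlation should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic data: temperature (the confounder) drives both variables
temperature = rng.uniform(15, 35, size=n)
ice_cream_sales = 2.0 * temperature + rng.normal(0, 3, size=n)
drownings = 0.5 * temperature + rng.normal(0, 1.5, size=n)

# Raw correlation between ice cream sales and drownings is strongly positive
raw_corr = np.corrcoef(ice_cream_sales, drownings)[0, 1]


def residuals(y, x):
    """Part of y left over after removing a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


# Correlation after controlling for temperature (a partial correlation)
partial_corr = np.corrcoef(
    residuals(ice_cream_sales, temperature), residuals(drownings, temperature)
)[0, 1]

print("Raw correlation:", round(raw_corr, 2))
print("Controlling for temperature:", round(partial_corr, 2))
```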



16. What is an Optimizer? What are different types of optimizers? Explain each with an example.
  - An optimizer is an algorithm that adjusts a machine learning model's parameters (weights) to minimize the loss function and improve accuracy. It plays a crucial role in training deep learning models by refining predictions through iterative adjustments.


  Types of Optimizers with Examples:
  1. **Gradient Descent** – Basic optimization method that updates weights in the direction opposite to the gradient of the loss function, computed over the whole dataset.  
    - Example: Used in Linear Regression to find the best-fit line.

  2. **Stochastic Gradient Descent (SGD)** – Updates weights using a single random data point (or, in practice, a small mini-batch) instead of the whole dataset, making each update much cheaper.  
    - Example: Used in image classification models like CNNs.

  3. **Momentum** – Enhances **SGD** by adding a velocity term that accumulates past gradients, damping oscillations and helping the optimizer move through flat regions and shallow local minima.  
    - Example: Used in deep learning architectures like ResNet.

  4. **Adam (Adaptive Moment Estimation)** – Combines **momentum** with **RMSprop**-style scaling, adapting the learning rate of each parameter for better convergence.  
    - Example: Commonly used in deep learning models for NLP and computer vision.

  5. **RMSprop (Root Mean Square Propagation)** – Divides the learning rate by a running average of recent squared gradients, giving each parameter an adaptive step size that speeds up training and prevents instability.  
    - Example: Used in reinforcement learning applications.
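  The update rules can be compared on a toy problem. This is a minimal sketch, assuming a one-dimensional loss L(w) = (w - 3)^2 with its minimum at w = 3 and illustrative hyperparameter values (the Adam betas and epsilon are the commonly used defaults). Gradients are exact, so no data sampling happens and SGD's randomness is not shown; the momentum update is one common formulation among several.

```python
import math


def grad(w):
    """Gradient of the toy loss L(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


def gradient_descent(lr=0.1, steps=200):
    w = 0.0
    for _ in range(steps):
        w -= lr * grad(w)  # step against the gradient
    return w


def momentum(lr=0.1, beta=0.9, steps=200):
    w, velocity = 0.0, 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad(w)  # accumulate past gradients
        w -= lr * velocity
    return w


def rmsprop(lr=0.01, rho=0.9, eps=1e-8, steps=1000):
    w, sq_avg = 0.0, 0.0
    for _ in range(steps):
        g = grad(w)
        sq_avg = rho * sq_avg + (1 - rho) * g * g  # running mean of squared gradients
        w -= lr * g / (math.sqrt(sq_avg) + eps)    # per-parameter adaptive step
    return w


def adam(lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g * g  # second moment (RMSprop-style)
        m_hat = m / (1 - beta1 ** t)         # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


for name, fn in [("GD", gradient_descent), ("Momentum", momentum),
                 ("RMSprop", rmsprop), ("Adam", adam)]:
    print(name, round(fn(), 4))
```

  All four should end near w = 3; RMSprop keeps taking steps of roughly lr in size, so it settles into a small oscillation around the minimum rather than landing exactly on it.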



17. What is sklearn.linear_model ?
  - sklearn.linear_model is a module in Scikit-learn that provides linear models for regression and classification tasks. It includes algorithms like Linear Regression, Logistic Regression, Ridge, Lasso, and Elastic Net, which help in predicting continuous values or classifying data based on features.
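  A small comparison of three regressors from sklearn.linear_model on a tiny illustrative dataset that follows y = 3x + 1 exactly. Plain least squares recovers the slope, while the penalized models (Ridge with an L2 penalty, Lasso with an L1 penalty) shrink it; the alpha values are arbitrary choices for the demonstration.

```python
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Tiny illustrative dataset following y = 3x + 1 exactly
X = [[0], [1], [2], [3], [4]]
y = [1, 4, 7, 10, 13]

models = {
    "LinearRegression": LinearRegression(),
    "Ridge (alpha=1.0)": Ridge(alpha=1.0),  # L2 penalty shrinks coefficients
    "Lasso (alpha=0.1)": Lasso(alpha=0.1),  # L1 penalty shrinks and can zero them out
}

coefs = {}
for name, model in models.items():
    model.fit(X, y)
    coefs[name] = model.coef_[0]
    print(f"{name}: slope={model.coef_[0]:.3f}, intercept={model.intercept_:.3f}")
```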



18. What does model.fit() do? What arguments must be given?
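  - The fit() method trains a machine learning model on the given data. It adjusts the model's parameters based on the input features (and, for supervised models, the target values) and returns the fitted model itself.

  Arguments:
  - X_train – feature/input data used for training, shaped (n_samples, n_features).
  - y_train – target/output values corresponding to X_train; required for supervised models, while unsupervised models such as KMeans are fitted with X alone.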
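  The sketch below contrasts a supervised and an unsupervised call to fit(), assuming scikit-learn and NumPy are installed; the data is invented so that y = 2x and the inputs form two obvious groups.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

X_train = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_train = np.array([2.0, 4.0, 6.0, 20.0, 22.0, 24.0])

# Supervised: fit(X, y) learns coef_ and intercept_ and returns the estimator itself
reg = LinearRegression()
fitted = reg.fit(X_train, y_train)
print(fitted is reg, reg.coef_, reg.intercept_)

# Unsupervised: fit(X) needs no target; KMeans learns cluster centers from X alone
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
print(km.cluster_centers_)
```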



19. What does model.predict() do? What arguments must be given?
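  - The predict() method is used after training to make predictions on new or unseen data based on the learned patterns. It returns an array with one prediction per input row.

  Arguments:
  - X_test – input features for which predictions are needed, with the same number of features (columns) as the training data.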
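  A minimal fit-then-predict sketch, assuming scikit-learn is installed; the training data is invented to follow y = 2x + 1 so the predictions are easy to check by hand.

```python
from sklearn.linear_model import LinearRegression

# Training data following y = 2x + 1 (illustrative)
X_train = [[0], [1], [2], [3]]
y_train = [1, 3, 5, 7]
model = LinearRegression().fit(X_train, y_train)

# New inputs must have the same number of features (here, one column)
X_test = [[10], [20]]
predictions = model.predict(X_test)
print(predictions)  # approximately [21. 41.]
```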



20. What are continuous and categorical variables?
  - Continuous variables are numerical values that can take any value within a range. They are measurable and can have infinitely many possible values, such as height, temperature, or time.

  - Categorical variables represent distinct groups or labels rather than numerical measurements. They define categories like colors, types of cars, or customer preferences.



21. What is feature scaling? How does it help in Machine Learning?
  - **Feature Scaling in Machine Learning**  
  **Feature scaling** is a preprocessing technique that standardizes or normalizes numerical features so they share a comparable scale. Because raw features can have very different ranges, scaling helps models treat them evenly.

  How It Helps:  
  - **Improves Model Accuracy** – Prevents features with large numeric ranges (e.g., income in dollars vs. age in years) from dominating the model.
  - **Boosts Convergence Speed** – Speeds up training, especially for gradient-based models.
  - **Enhances Performance in Distance-Based Models** – Essential for algorithms like k-NN and k-Means clustering.
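  The effect on distance-based models can be seen with three hypothetical customers described by age and income. Without scaling, Euclidean distance (what k-NN uses by default) is driven almost entirely by income, so customer c looks far closer to a than b does, even though c is 30 years older. After z-score standardization, the same formula StandardScaler applies, both features contribute and the two distances become comparable.

```python
import numpy as np

# Hypothetical customers: [age in years, annual income in dollars]
X = np.array([
    [25.0, 50000.0],  # customer a
    [26.0, 60000.0],  # customer b: almost the same age as a
    [55.0, 50500.0],  # customer c: 30 years older than a, similar income
])


def dist(p, q):
    return float(np.linalg.norm(p - q))  # Euclidean distance


# Unscaled: income differences swamp age differences
raw_ab, raw_ac = dist(X[0], X[1]), dist(X[0], X[2])

# Standardize each column (z-score with population std, as StandardScaler does)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
std_ab, std_ac = dist(X_std[0], X_std[1]), dist(X_std[0], X_std[2])

print(f"Unscaled: d(a,b)={raw_ab:.1f}, d(a,c)={raw_ac:.1f}")
print(f"Scaled:   d(a,b)={std_ab:.2f}, d(a,c)={std_ac:.2f}")
```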




22.  How do we perform scaling in Python?
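  - Feature scaling can be done using Scikit-learn's preprocessing module. The two most common techniques are:
  
  1. Standardization (each feature rescaled to mean 0 and standard deviation 1)

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])  # sample data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```


  2. Min-Max Scaling (each feature mapped to the range [0, 1] by default)

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])  # sample data
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
```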
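  When the data is split into training and test sets, the scaler should learn its statistics from the training set only and then reuse them on the test set; fitting on all the data lets test information leak into training. A sketch with synthetic features (the random data is an assumption for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))  # synthetic features

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(6))  # ~0 for every column
print(X_test_scaled.mean(axis=0).round(3))   # close to, but not exactly, 0
```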




24. Explain data encoding?
  - Data encoding transforms categorical variables into numerical values so machine learning models can process them effectively.

  Common Encoding Techniques:

  1. **Label Encoding** – Assigns a unique number to each category (e.g., "Red" → 0, "Blue" → 1).  
  2. **One-Hot Encoding** – Creates binary columns for each category (e.g., "Red" → [1, 0, 0], "Blue" → [0, 1, 0]).  
  3. **Ordinal Encoding** – Assigns ordered numerical values based on ranking (e.g., "Low" → 1, "Medium" → 2, "High" → 3).  
  4. **Target Encoding** – Replaces each category with the mean of the target variable for that category.  
  5. **Binary Encoding** – Converts categories into binary digits to reduce dimensionality.
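  Scikit-learn provides encoders for two of these techniques. A minimal sketch on invented values: OneHotEncoder sorts the categories alphabetically and returns a sparse matrix by default (converted here with toarray()), and OrdinalEncoder takes the category order explicitly and numbers it from 0 rather than 1, which preserves the same ranking.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

colors = np.array([["Red"], ["Blue"], ["Green"], ["Red"]])
levels = np.array([["Low"], ["High"], ["Medium"]])

# One-hot: categories sorted (Blue, Green, Red); unseen categories become all zeros
onehot = OneHotEncoder(handle_unknown="ignore")
colors_encoded = onehot.fit_transform(colors).toarray()
print(onehot.categories_)
print(colors_encoded)

# Ordinal: pass the order explicitly; categories are numbered from 0
ordinal = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
levels_encoded = ordinal.fit_transform(levels)
print(levels_encoded)  # Low -> 0, High -> 2, Medium -> 1
```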

