# Assignment Questions

Q1. What is a parameter?

 - A parameter is a named placeholder in the definition of a function, procedure, or command; the value passed in for it (the argument) customizes the behavior. In Machine Learning, the term also covers the internal values a model learns during training, such as weights and biases.
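
A minimal sketch of how a parameter acts as a placeholder for input values. The function `scale` and its parameter names are hypothetical, chosen only for illustration.

```python
def scale(values, factor=2.0):
    """Multiply every number in `values` by `factor`.

    `values` and `factor` are parameters: named placeholders
    declared in the function definition.
    """
    return [v * factor for v in values]


# The concrete values supplied in a call fill those placeholders.
doubled = scale([1, 2, 3])              # factor falls back to its default, 2.0
tripled = scale([1, 2, 3], factor=3.0)  # the caller overrides factor
```

Changing only the value given for `factor` changes what the same function produces, which is the "customize its behavior" part of the definition.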

Q2. What is correlation?
What does negative correlation mean?

 - Correlation is a statistical measure that describes the strength and direction of a relationship between two variables.

 ❌ Negative Correlation

When one variable increases, the other decreases.

Example: Hours of study vs. Number of errors on a test

Another example: Price of a product vs. Quantity demanded

So, negative correlation means the two variables move in opposite directions.

Q3. Define Machine Learning. What are the main components in Machine Learning?

 - Machine Learning is a branch of Artificial Intelligence (AI) that enables systems to learn from data, identify patterns, and make decisions with minimal human intervention.

  🧩 Main Components of Machine Learning

  -   Data

     The most critical component.

     Includes raw input like text, numbers, images, or sensor readings.

     The quality and quantity of data strongly affect the model’s performance.

  -  Model

     A mathematical representation of a real-world process.

     It learns patterns from the data.

     Examples: Linear regression, decision trees, neural networks.

  - Algorithm

     The method used to train the model.

     It defines how the model will learn from the data.

     Examples: Gradient Descent, Random Forest, K-Means.

  - Training

     The process where the model learns from the historical data.

     It adjusts internal parameters to minimize prediction error.

  - Evaluation

     Checks how well the model performs using metrics like:

     Accuracy, Precision, Recall, F1-Score, RMSE, etc.

     Uses a test dataset or cross-validation.

  - Prediction / Inference

     Using the trained model to make predictions or decisions on new (unseen) data.

  - Feedback / Optimization

     Model performance is monitored and improved over time.

     Helps in refining the model using new or updated data.

Q4. How does loss value help in determining whether the model is good or not?

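 - The loss value measures how far a model’s predictions are from the actual outcomes. A lower loss means the predictions are closer to the targets, while a higher loss means they are further off. During training the optimizer tries to minimize the loss, so a steadily falling loss shows the model is learning. To judge model quality, compare the loss on the training data with the loss on held-out (validation or test) data: a low training loss paired with a much higher held-out loss points to overfitting.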
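
As a rough illustration, the sketch below computes one common loss, mean squared error, for two sets of predictions. The helper `mse` and both prediction lists are made up for this example.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


y_true = [3.0, 5.0, 7.0, 9.0]
close_preds = [2.5, 5.5, 7.0, 8.0]   # hypothetical model A, near the targets
far_preds = [6.0, 1.0, 10.0, 4.0]    # hypothetical model B, far from the targets

loss_a = mse(y_true, close_preds)    # 0.375
loss_b = mse(y_true, far_preds)      # 14.75
```

Model A has the lower loss, so on this data its predictions are the better ones.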

Q5. What are continuous and categorical variables?

 - ✅ Continuous Variables:

Variables that can take any numerical value within a range.

Example: Height, Weight, Temperature, Income.

✅ Categorical Variables:

Variables that represent categories or groups, often non-numeric.

Example: Gender (Male/Female), Country, Product Type, Yes/No.

In short, continuous = numbers you can measure, categorical = labels or categories you can group.

Q6. How do we handle categorical variables in Machine Learning? What are the common techniques?

 - 🛠️ Common Techniques to Handle Categorical Variables:

  - Label Encoding

     Assigns each category a unique number.

     Example: ["Red", "Blue", "Green"] → [0, 1, 2]

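     The integers imply an order that may not exist, so use it for ordinal data (where order matters) or for target labels; for nominal features, One-Hot Encoding is usually safer.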

  - One-Hot Encoding

     Creates binary columns for each category.

     Example: ["Red", "Blue"] → [1,0], [0,1]

     Common for nominal data (no order).

  - Ordinal Encoding

     Like label encoding but keeps the order meaningful.

     Example: ["Low", "Medium", "High"] → [1, 2, 3]

  - Target Encoding (Mean Encoding)

     Replaces a category with the mean of the target variable for that category.

     Useful for high-cardinality categorical features.

  - Frequency Encoding

     Replaces categories with how often they appear in the dataset.
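
The techniques above could be sketched with pandas and Scikit-learn roughly as follows. The DataFrame, its column names, and the `price` target are invented for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy data; every value here is made up.
df = pd.DataFrame({
    "color": ["Red", "Blue", "Green", "Blue"],
    "size": ["Low", "High", "Medium", "Low"],
    "price": [10.0, 20.0, 30.0, 40.0],   # stands in for the target variable
})

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["color"], prefix="color")

# Ordinal encoding with an explicit order.
# Note: OrdinalEncoder numbers the categories from 0, not 1.
ordinal = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
df["size_encoded"] = ordinal.fit_transform(df[["size"]]).ravel()

# Frequency encoding: replace each category with its count.
df["color_freq"] = df["color"].map(df["color"].value_counts())

# Target (mean) encoding: replace each category with the mean target value.
# In practice, compute these means on the training split only to avoid leakage.
df["color_target"] = df.groupby("color")["price"].transform("mean")
```

Label encoding is left out here because Scikit-learn's `LabelEncoder` is intended for target labels rather than input features.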

Q7. What do you mean by training and testing a dataset?

 - ✅ Training Dataset:

Used to train the model — i.e., help it learn patterns and relationships in the data.

The model adjusts its parameters based on this data.

✅ Testing Dataset:

Used to evaluate the model’s performance on unseen data.

Helps check if the model can generalize well beyond the data it was trained on.

Q8. What is sklearn.preprocessing?

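 - sklearn.preprocessing is a module in Scikit-learn that provides tools to prepare and transform your data before feeding it into a machine learning model, such as scalers (StandardScaler, MinMaxScaler) and encoders (OneHotEncoder, OrdinalEncoder, LabelEncoder).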

Q9. What is a Test set?

 - A test set is a portion of the dataset that is kept separate and not used during training. It is used to evaluate how well a trained machine learning model performs on unseen data.



Q10. How do we split data for model fitting (training and testing) in Python?
     How do you approach a Machine Learning problem?

 - In Python, you can use train_test_split from sklearn.model_selection to split your data into training and testing sets.
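
A minimal sketch of such a split. The toy arrays and the 80/20 ratio are assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, plus a binary target.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

For classification problems with imbalanced classes, passing `stratify=y` keeps the class proportions similar in both sets.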

 Approach to a Machine Learning Problem:
When solving a Machine Learning problem, you generally follow these steps:

Define the Problem:

Understand the problem you're trying to solve (e.g., classification, regression, clustering).

Understand the data and the desired outcome.

Data Collection:

Gather and collect data from reliable sources.

Data Preprocessing:

Clean the data: Handle missing values, remove duplicates, etc.

Feature engineering: Create or modify features to improve model performance.

Encode categorical variables (e.g., Label Encoding, One-Hot Encoding).

Scale features (e.g., using StandardScaler or MinMaxScaler).

Split the Data:

Split the data into training and testing sets (typically 80/20 or 70/30).

Select a Model:

Choose an appropriate machine learning algorithm (e.g., Decision Trees, Random Forest, SVM, Linear Regression).

Train the Model:

Fit the model on the training data using the fit() method.

Evaluate the Model:

Evaluate the model's performance on the test set using metrics like accuracy, precision, recall, F1-score, or MSE.

Model Tuning:

Fine-tune the model’s hyperparameters to improve performance using techniques like GridSearchCV or RandomizedSearchCV.

Deploy the Model:

Once you're happy with the performance, deploy the model into production for real-time predictions.
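
The steps above, from splitting through tuning and evaluation, can be sketched end to end as follows. The Iris dataset, the LogisticRegression model, and the small grid of `C` values are illustrative choices, not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Data collection: a small built-in classification dataset.
X, y = load_iris(return_X_y=True)

# Split the data (80/20), keeping class proportions with stratify.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Preprocessing and model selection bundled in one pipeline, so the
# scaler is fitted only on the training folds.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Model tuning: try a few regularization strengths with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluation on the held-out test set.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
```

Deployment is outside the scope of this sketch; the fitted `search` object is what would be saved and served.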

Q11. Why do we have to perform EDA before fitting a model to the data?

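 - EDA (Exploratory Data Analysis) is crucial because it helps you understand the data before applying any machine learning model. It reveals missing values, outliers, skewed distributions, class imbalance, and correlations between features, which in turn guides data cleaning, feature engineering, and the choice of model.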

Q12. What is correlation?

 - Correlation is a statistical measure that describes the strength and direction of a relationship between two variables.

Q13. What does negative correlation mean?

 - Negative correlation means that when one variable increases, the other variable decreases (or vice versa). In other words, the variables move in opposite directions.

Q14. How can you find correlation between variables in Python?

 - You can find the correlation between variables in Python using the Pandas DataFrame.corr() method, which returns a correlation matrix (Pearson by default) for the numeric columns. To visualize it, use Seaborn's heatmap(), which shows the strength of each relationship using colors and, with annot=True, the values. For two specific variables, you can also use NumPy’s corrcoef() function.
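
A short sketch of both approaches. The DataFrame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data; the numbers are invented.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "test_errors": [12, 10, 9, 6, 4, 2],
    "shoe_size": [8, 6, 9, 7, 10, 7],
})

# Correlation matrix for all columns (all are numeric here).
# If the frame also held text columns, recent pandas versions
# would need df.corr(numeric_only=True).
corr_matrix = df.corr()

# Correlation coefficient for one specific pair of variables.
r = np.corrcoef(df["hours_studied"], df["test_errors"])[0, 1]

# Optional visualization, assuming seaborn and matplotlib are installed:
# import seaborn as sns
# sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")
```

Here `r` comes out strongly negative, matching the "hours of study vs. number of errors" example of negative correlation.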

Q15. What is causation? Explain difference between correlation and causation with an example.

 - Causation is a relationship where one variable directly affects another: a change in one causes a change in the other. Correlation, on the other hand, only indicates that two variables move together; it does not prove that one causes the other. For example, there may be a strong correlation between ice cream sales and drowning incidents during summer, but this does not mean ice cream causes drowning. Instead, both are influenced by a third factor, hot weather. In contrast, causation would be seen in a scenario like increased hours of studying leading to higher test scores, where one directly impacts the other. Thus, correlation can suggest a potential relationship, but establishing causation requires evidence of a direct cause-and-effect link, such as a controlled experiment.

Q16. What is an Optimizer? What are different types of optimizers? Explain each with an example.

 - An optimizer is an algorithm used to adjust the weights and biases of a machine learning model during training to minimize the loss function (i.e., error). It helps the model learn the best parameters for accurate predictions.

⚙️ Common Types of Optimizers (with examples):
1️⃣ Gradient Descent (GD)
Basic optimization technique that computes the gradient of the loss function over the entire training set and moves the weights a small step in the opposite direction.

Formula:
w = w - α * ∂L/∂w
(where α is learning rate, ∂L/∂w is the gradient)

Example:
Used in simple linear regression to minimize the difference between predicted and actual values.

2️⃣ Stochastic Gradient Descent (SGD)
Updates weights one sample at a time instead of the whole dataset.

Faster but may fluctuate during training.

Example:
Used in online learning or when training on large datasets like image classification.

3️⃣ Mini-Batch Gradient Descent
A balance between GD and SGD: updates weights using small batches of data.

Most commonly used in deep learning.

Example:
Training neural networks using TensorFlow or PyTorch with batch sizes of 32 or 64.

4️⃣ Adam (Adaptive Moment Estimation)
Combines momentum and RMSProp.

Adapts the learning rate for each parameter, leading to faster and more stable convergence.

Example:
Popular optimizer for deep learning models like CNNs or RNNs.

5️⃣ RMSprop (Root Mean Square Propagation)
Adjusts the learning rate for each parameter by dividing it by a moving average of recent squared gradients.

Works well with non-stationary objectives (i.e., when the data distribution changes).

Example:
Used in training recurrent neural networks (RNNs).

6️⃣ Adagrad
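Adapts the learning rate per parameter using the accumulated sum of past squared gradients, so frequently updated parameters take smaller steps and rarely updated ones take larger steps.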

Works well for sparse data (like text data).

Example:
Used in natural language processing tasks like sentiment analysis.
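
To make the update rules concrete, here is a small NumPy sketch of batch gradient descent, mini-batch gradient descent, and Adam fitting a single weight. The toy data, function names, learning rates, and step counts are assumptions chosen for this example; in practice these optimizers come from libraries such as TensorFlow or PyTorch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)   # true weight is 3.0


def gradient(w, X_batch, y_batch):
    """Gradient of mean squared error with respect to the single weight w."""
    error = X_batch[:, 0] * w - y_batch
    return 2.0 * np.mean(error * X_batch[:, 0])


def batch_gd(lr=0.1, epochs=100):
    """Gradient descent: every update uses the whole dataset."""
    w = 0.0
    for _ in range(epochs):
        w -= lr * gradient(w, X, y)          # w = w - α * ∂L/∂w
    return w


def minibatch_gd(lr=0.05, epochs=20, batch_size=32):
    """Mini-batch variant: every update uses a small shuffled batch."""
    w = 0.0
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            w -= lr * gradient(w, X[idx], y[idx])
    return w


def adam(lr=0.1, steps=300, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum plus an RMSprop-style per-parameter step size."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = gradient(w, X, y)
        m = beta1 * m + (1 - beta1) * g          # moving average of gradients
        v = beta2 * v + (1 - beta2) * g ** 2     # moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w


w_gd, w_mb, w_adam = batch_gd(), minibatch_gd(), adam()
```

All three estimates should land close to the true weight of 3.0. Setting `batch_size=1` in `minibatch_gd` gives plain stochastic gradient descent.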

Q17. What is sklearn.linear_model ?

 - sklearn.linear_model is a module in the Scikit-learn library that provides linear models for regression and classification tasks, such as LinearRegression, LogisticRegression, Ridge, and Lasso.

Q18. What does model.fit() do? What arguments must be given?

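 - The model.fit() function in Scikit-learn is used to train the machine learning model on your data. It learns the relationship between the input features (X) and the target variable (y) by adjusting internal parameters. For supervised models the required arguments are X (the feature matrix, with shape n_samples × n_features) and y (the target values); unsupervised estimators such as KMeans need only X.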



Q19. What does model.predict() do? What arguments must be given?

 - model.predict() is used to make predictions using a trained model. After the model has been fitted with model.fit(), you pass new input data to predict() to get the predicted output (labels or values).

The only required argument is X, as in model.predict(X): the new input data, with the same feature columns as the data used for training.
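
A minimal sketch of fit() followed by predict(), using LinearRegression from sklearn.linear_model. The toy numbers are made up so that the target is exactly twice the feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data: X must be 2-D (n_samples, n_features), y is 1-D.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])   # y = 2 * x

model = LinearRegression()
model.fit(X_train, y_train)                # learns the coefficient and intercept

# New, unseen inputs with the same single feature column.
X_new = np.array([[5.0], [6.0]])
y_pred = model.predict(X_new)              # approximately [10.0, 12.0]
```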



Q20. What are continuous and categorical variables?

 - Continuous and Categorical Variables:

Continuous Variables are numeric variables that can take an infinite number of values within a range.
Example: height, weight, temperature, income.

Categorical Variables are variables that represent groups or categories. They can be nominal (no order, like gender or color) or ordinal (with order, like education level or rating).
Example: gender, country, product type, rating (low/medium/high).

👉 Continuous = measurable numbers

👉 Categorical = labels or categories

Q21. What is feature scaling? How does it help in Machine Learning?

 - Feature scaling is the process of standardizing or normalizing the range of independent variables (features) so that they are on a similar scale.

 ✅ Why is it Important in Machine Learning?

Improves model performance: Some algorithms (like KNN, SVM, Logistic Regression) are sensitive to the scale of data.

Faster convergence: Gradient-based models (e.g., linear regression, neural networks) train faster when features are scaled.

Avoids bias: Features with larger ranges can dominate those with smaller ranges without scaling.

Q22. How do we perform scaling in Python?

 - You can scale your features using StandardScaler or MinMaxScaler from the sklearn.preprocessing module.
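
A short sketch of both scalers. The arrays are toy values, and fitting on the training data only is a common convention rather than something stated in the original answer.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales (made-up values).
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[2.0, 400.0]])

# Standardization: each column gets mean 0 and standard deviation 1.
standard = StandardScaler()
X_train_std = standard.fit_transform(X_train)   # learn mean/std from training data
X_test_std = standard.transform(X_test)         # reuse them on the test data

# Min-max normalization: each training column is mapped to the range [0, 1].
minmax = MinMaxScaler()
X_train_mm = minmax.fit_transform(X_train)
X_test_mm = minmax.transform(X_test)            # may fall outside [0, 1]
```

Calling `fit_transform` on the training set and only `transform` on the test set keeps information from the test data out of the scaling statistics.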

Q23. What is sklearn.preprocessing?

 - sklearn.preprocessing is a module in Scikit-learn that provides tools to prepare and transform your data before feeding it into a machine learning model.

Q24. How do we split data for model fitting (training and testing) in Python?

 - In Python, you can use the train_test_split() function from the sklearn.model_selection module to split your data into training and testing sets.



Q25. Explain data encoding?

 - Data encoding refers to the process of converting categorical variables into a numeric format that can be used by machine learning models. Many algorithms require input data to be in numeric form because they perform mathematical calculations that rely on numerical values.