In [None]:
# 1. What is a parameter?

# A parameter in Machine Learning is an internal variable of the model whose value is learned from data during training.
# Example: In linear regression, the slope (weight) and intercept (bias) are parameters.
# These are not set manually; a training algorithm such as gradient descent adjusts them.
# (Values chosen by hand before training, such as the learning rate, are hyperparameters.)
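
# A minimal sketch of parameters being learned, assuming scikit-learn is installed.
# The toy data is made up for illustration and follows y = 2x + 1 exactly, so the fitted
# slope and intercept should recover those values. Note that LinearRegression solves for
# them with least squares rather than gradient descent; the point is only that they come from the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data generated from y = 2x + 1 (illustration only)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = 2.0 * X.ravel() + 1.0

model = LinearRegression()
model.fit(X, y)  # the parameters are estimated here, not set by hand

slope = model.coef_[0]        # learned weight
intercept = model.intercept_  # learned bias
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```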

In [None]:
# 2. What is correlation?

# Correlation is a statistical measure of the strength and direction of the linear relationship between two variables.
# The Pearson correlation coefficient ranges from -1 to +1.
# +1 = perfect positive linear relationship, -1 = perfect negative linear relationship, 0 = no linear relationship.

In [None]:
# 3. What does negative correlation mean?

# A negative correlation means that as one variable increases, the other tends to decrease.
# Example: The number of hours spent watching TV 📺 vs. exam scores 📊.
# More TV time → lower scores → negative correlation.
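
# A small sketch of the TV example, assuming NumPy is available. The hours and scores
# are invented for illustration; np.corrcoef returns the Pearson correlation matrix,
# and the off-diagonal entry is the coefficient between the two variables.

```python
import numpy as np

# Hypothetical data: daily TV hours and exam scores for six students
tv_hours = np.array([1, 2, 3, 4, 5, 6])
exam_scores = np.array([90, 85, 78, 70, 65, 55])

# Off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(tv_hours, exam_scores)[0, 1]
print(f"Pearson r = {r:.3f}")  # a value close to -1 indicates a strong negative correlation
```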

In [None]:
# 4. Define Machine Learning. What are the main components in Machine Learning?

# Machine Learning (ML) is a field of AI that enables systems to learn from data and improve without being explicitly programmed.

# Main components:
# Data – Input for training.
# Features – Independent variables used for learning.
# Model – Algorithm/architecture that learns patterns.
# Loss Function – Measures prediction error.
# Optimizer – Minimizes loss by updating parameters.
# Evaluation – Validating performance using metrics.

In [None]:
# 5. How does loss value help in determining whether the model is good or not?

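# The loss value measures how far the model's predictions are from the actual outputs.
# Lower loss on a given dataset means the predictions fit that data better.
# Example: In regression, Mean Squared Error (MSE); in classification, Cross-Entropy Loss.
# If training loss decreases over iterations, the model is fitting the training data; loss on held-out (validation) data should decrease too, otherwise the model may be overfitting.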
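
# The two losses named above can be computed by hand. This is a sketch with made-up
# targets, predictions, and probabilities, assuming NumPy; the cross-entropy shown is
# the binary form (log loss).

```python
import numpy as np

# Regression: mean squared error on made-up values
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])
mse = np.mean((y_true - y_pred) ** 2)

# Binary classification: cross-entropy on made-up predicted probabilities
labels = np.array([1, 0, 1])
probs = np.array([0.9, 0.2, 0.6])  # predicted probability of class 1
cross_entropy = -np.mean(
    labels * np.log(probs) + (1 - labels) * np.log(1 - probs)
)

print(f"MSE = {mse:.4f}")
print(f"Cross-entropy = {cross_entropy:.4f}")
```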

In [None]:
# 6. What are continuous and categorical variables?

# Continuous variables: Numeric, can take any value within a range. (e.g., height, salary, temperature)
# Categorical variables: Represent groups/categories. (e.g., gender: Male/Female, country: India/USA)

In [None]:
# 7. How do we handle categorical variables in Machine Learning? What are the common techniques?

# Handling categorical data is essential because most ML algorithms require numeric input.

# Techniques:
# Label Encoding – Assigns an integer to each category (e.g., Female=0, Male=1).
# One-Hot Encoding – Creates binary columns for each category.
# Target Encoding – Replaces category with mean of target variable.
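
# Target encoding can be sketched with a pandas groupby. The City/Purchased data and the
# column names are hypothetical. In practice the category means would be computed on the
# training split only, so that the target does not leak into the test data.

```python
import pandas as pd

# Hypothetical data: a categorical feature and a binary target
df = pd.DataFrame({
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Bangalore', 'Mumbai', 'Delhi'],
    'Purchased': [1, 0, 1, 0, 1, 0],
})

# Mean of the target for each category
city_means = df.groupby('City')['Purchased'].mean()

# Replace each category with its target mean
df['City_Target'] = df['City'].map(city_means)
print(df)
```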

In [None]:
# 8. What do you mean by training and testing a dataset?

# Training dataset: Used to teach the model patterns.
# Testing dataset: Used to evaluate performance on unseen data.

# Evaluating on data the model has not seen shows whether it generalizes, rather than just memorizing the training set.

In [None]:
# 9. How do you approach a Machine Learning problem?

# General ML problem-solving approach:
# Understand the problem & business objective.
# Collect & explore data.
# Perform EDA (Exploratory Data Analysis).
# Preprocess data (cleaning, encoding, scaling).
# Split into train/test sets.
# Train model(s).
# Evaluate using metrics (accuracy, precision, recall, etc.).
# Tune hyperparameters.
# Deploy the best-performing model.
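
# The split / train / tune / evaluate steps above might look like this in scikit-learn.
# This is a sketch only: the bundled Iris dataset, logistic regression, and the small grid
# of C values are stand-ins chosen for illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Collect data (a small dataset that ships with scikit-learn)
X, y = load_iris(return_X_y=True)

# Split into train/test sets, keeping class proportions the same in both
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Preprocess (scaling) and model in one pipeline, so the scaler is fit on training data only
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])

# Tune one hyperparameter with 5-fold cross-validation on the training set
search = GridSearchCV(pipeline, {'clf__C': [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluate the best model on the held-out test set
accuracy = accuracy_score(y_test, search.predict(X_test))
print("Best C:", search.best_params_['clf__C'])
print(f"Test accuracy: {accuracy:.3f}")
```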

In [None]:
# 10. Why do we have to perform EDA before fitting a model to the data?

# EDA (Exploratory Data Analysis) helps in:
# Understanding distributions, patterns, and relationships.
# Identifying missing values, outliers, and anomalies.
# Choosing correct preprocessing techniques.
# Deciding which features are useful.
# EDA helps confirm that the dataset is clean and ready for modeling.
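
# A few of these checks sketched with pandas. The Age/Salary values are invented, with one
# missing age and one implausible salary planted on purpose; the 1.5 * IQR rule used for
# outliers is one common convention, assumed here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one extreme salary
df = pd.DataFrame({
    'Age': [22, 25, 27, 30, np.nan, 35, 29],
    'Salary': [30000, 32000, 35000, 40000, 38000, 500000, 36000],
})

# Distributions: count, mean, std, min, quartiles, max
print(df.describe())

# Missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# Outliers: rows whose salary lies more than 1.5 * IQR outside the quartiles
q1 = df['Salary'].quantile(0.25)
q3 = df['Salary'].quantile(0.75)
iqr = q3 - q1
outliers = df[(df['Salary'] < q1 - 1.5 * iqr) | (df['Salary'] > q3 + 1.5 * iqr)]
print(outliers)
```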

In [None]:
# 11. What is causation? Explain difference between correlation and causation with an example.

# Correlation: Two variables move together but don’t necessarily affect each other.
# Causation: One variable directly affects the other.

# Example:
# Ice cream sales 🍦 and drowning incidents 🏊 are correlated (both increase in summer).
# But ice cream does not cause drowning → correlation without causation.
# Smoking 🚬 causes lung cancer → causation.
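
# The ice cream example can be simulated. In this sketch temperature is a hypothetical
# common cause that drives both quantities; all coefficients and noise levels are invented.
# The raw correlation is strong, but once temperature is regressed out of both variables,
# the correlation between what remains is close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated common cause and two variables that each depend only on it
temperature = rng.uniform(15, 40, size=500)
ice_cream_sales = 10 * temperature + rng.normal(0, 20, size=500)
drownings = 0.5 * temperature + rng.normal(0, 2, size=500)

# Raw correlation between the two effects
raw_r = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(y, x):
    """Return what is left of y after removing a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlation after removing the effect of temperature from both variables
partial_r = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(f"Raw correlation: {raw_r:.3f}")
print(f"Correlation after controlling for temperature: {partial_r:.3f}")
```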

In [None]:
# 12. What is an Optimizer? What are different types of optimizers? Explain each with an example.

# An optimizer updates model parameters to minimize loss.

# Types:
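# Gradient Descent – Updates parameters in steps proportional to the negative gradient of the loss, computed on the full training set.
# Stochastic Gradient Descent (SGD) – Estimates the gradient from one sample (or a small mini-batch) at a time, so each update is cheaper but noisier.
# Adam (Adaptive Moment Estimation) – Combines momentum & adaptive per-parameter learning rates; widely used.
# RMSProp – Scales each parameter's learning rate by a running average of its recent squared gradients.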
# Example: In neural networks, Adam optimizer updates weights after each batch.
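
# A sketch of the update rules for plain gradient descent and Adam on a toy one-parameter
# loss f(w) = (w - 3)^2, assuming NumPy. The loss, learning rate, and step counts are
# chosen for illustration; the Adam constants are the commonly used defaults. A real
# network would compute the gradient from a batch of data instead of a known formula.

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, which is minimised at w = 3
    return 2.0 * (w - 3.0)

lr = 0.1

# Gradient descent: step against the gradient
w_gd = 0.0
for _ in range(100):
    w_gd -= lr * grad(w_gd)

# Adam: running averages of the gradient (m) and squared gradient (v)
w_adam = 0.0
m, v = 0.0, 0.0
beta1, beta2, eps = 0.9, 0.999, 1e-8
for t in range(1, 501):
    g = grad(w_adam)
    m = beta1 * m + (1 - beta1) * g        # first moment (momentum)
    v = beta2 * v + (1 - beta2) * g * g    # second moment (adaptive scaling)
    m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w_adam -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(f"Gradient descent: w = {w_gd:.4f}")
print(f"Adam:             w = {w_adam:.4f}")
```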

In [None]:
# 13. What is feature scaling? How does it help in Machine Learning?

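# Feature scaling brings numerical features onto a comparable scale, by normalization (e.g., to the 0–1 range) or standardization (mean 0, standard deviation 1).
# It helps because many ML methods (distance-based models like KNN and SVM, and models trained with gradient descent) are sensitive to differences in feature ranges.

# Example: Age (20–60) vs. Salary (10,000–100,000). Without scaling, salary dominates.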
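
# A sketch of why salary dominates a distance calculation, assuming NumPy. The three people
# are hypothetical; the min-max ranges are the Age and Salary ranges from the example above.
# Before scaling, person A looks closest to B (35 years older); after scaling, A is closest to C.

```python
import numpy as np

# Columns: [Age, Salary]; values are made up for illustration
a = np.array([25.0, 50000.0])
b = np.array([60.0, 50200.0])  # very different age, almost the same salary
c = np.array([27.0, 52000.0])  # similar age, slightly different salary

# Min-max scaling with the assumed feature ranges
mins = np.array([20.0, 10000.0])
maxs = np.array([60.0, 100000.0])

def scale(x):
    return (x - mins) / (maxs - mins)

# Euclidean distances before scaling
raw_ab = np.linalg.norm(a - b)
raw_ac = np.linalg.norm(a - c)

# Euclidean distances after scaling
scaled_ab = np.linalg.norm(scale(a) - scale(b))
scaled_ac = np.linalg.norm(scale(a) - scale(c))

print(f"Raw:    A-B = {raw_ab:.1f}, A-C = {raw_ac:.1f}")
print(f"Scaled: A-B = {scaled_ab:.3f}, A-C = {scaled_ac:.3f}")
```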

In [None]:
# 14. Explain data encoding.

# Data encoding is the process of converting categorical variables into numerical format so that ML models can process them.

# Types:

# Label Encoding – assigns numeric labels.
# One-Hot Encoding – creates binary variables.
# Ordinal Encoding – assigns ordered values.
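
# Ordinal encoding sketched with an ordered pandas Categorical. The Size column and its
# order (Small < Medium < Large) are hypothetical; the order has to be supplied explicitly,
# otherwise the categories would be sorted alphabetically.

```python
import pandas as pd

# Hypothetical ordered categorical feature
df = pd.DataFrame({'Size': ['Small', 'Large', 'Medium', 'Small']})

# The intended order, from lowest to highest
size_order = ['Small', 'Medium', 'Large']

# .codes gives each value's position in size_order
df['Size_Ordinal'] = pd.Categorical(
    df['Size'], categories=size_order, ordered=True
).codes
print(df)
```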

In [5]:
# 1. Perform correlation between variables in Python

# Correlation shows relationships between numerical variables.

import pandas as pd

data = {
    'Age': [20, 22, 25, 30, 35],
    'Salary': [20000, 25000, 32000, 40000, 50000],
    'Experience': [1, 2, 3, 5, 7]
}
df = pd.DataFrame(data)

# Correlation matrix
correlation_matrix = df.corr()
print(correlation_matrix)



                 Age    Salary  Experience
Age         1.000000  0.997500    0.999422
Salary      0.997500  1.000000    0.997249
Experience  0.999422  0.997249    1.000000


In [8]:
# 2. Encode categorical variables in Python

# Categorical variables (e.g., Gender, Country) need encoding.

import pandas as pd

data = {
    'Name': ['A', 'B', 'C', 'D'],
    'Gender': ['Male', 'Female', 'Female', 'Male'],
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Bangalore']
}
df = pd.DataFrame(data)

# Label Encoding (LabelEncoder assigns codes in sorted order of the categories: Female=0, Male=1)
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
df['Gender_Label'] = label_encoder.fit_transform(df['Gender'])

# One-Hot Encoding
df_onehot = pd.get_dummies(df, columns=['City'])

print(df)
print(df_onehot)

  Name  Gender       City  Gender_Label
0    A    Male      Delhi             1
1    B  Female     Mumbai             0
2    C  Female      Delhi             0
3    D    Male  Bangalore             1
  Name  Gender  Gender_Label  City_Bangalore  City_Delhi  City_Mumbai
0    A    Male             1           False        True        False
1    B  Female             0           False       False         True
2    C  Female             0           False        True        False
3    D    Male             1            True       False        False


In [9]:
# 3. Perform scaling in Python

# Scaling puts all features on a comparable scale.

import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler


data = {
    'Age': [20, 25, 30, 35, 40],
    'Salary': [20000, 35000, 50000, 65000, 80000]
}
df = pd.DataFrame(data)

# Standardization (mean=0, variance=1)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
df_scaled = pd.DataFrame(scaled_data, columns=['Age', 'Salary'])

# Normalization (0 to 1 range)
minmax = MinMaxScaler()
normalized_data = minmax.fit_transform(df)
df_normalized = pd.DataFrame(normalized_data, columns=['Age', 'Salary'])

print("Standardized Data:\n", df_scaled)
print("\nNormalized Data:\n", df_normalized)

Standardized Data:
         Age    Salary
0 -1.414214 -1.414214
1 -0.707107 -0.707107
2  0.000000  0.000000
3  0.707107  0.707107
4  1.414214  1.414214

Normalized Data:
     Age  Salary
0  0.00    0.00
1  0.25    0.25
2  0.50    0.50
3  0.75    0.75
4  1.00    1.00


In [10]:
# 4. Split dataset into training and testing in Python

# Splitting ensures the model is tested on unseen data.

import pandas as pd
from sklearn.model_selection import train_test_split

data = {
    'Feature1': [5, 10, 15, 20, 25, 30, 35, 40],
    'Feature2': [1, 2, 3, 4, 5, 6, 7, 8],
    'Target':   [0, 1, 0, 1, 0, 1, 0, 1]
}
df = pd.DataFrame(data)

X = df[['Feature1', 'Feature2']]   # Features
y = df['Target']                   # Target

# Splitting into roughly 70% training, 30% testing (with 8 rows, the test set is rounded up to 3 rows)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print("Training Features:\n", X_train)
print("\nTesting Features:\n", X_test)

Training Features:
    Feature1  Feature2
7        40         8
2        15         3
4        25         5
3        20         4
6        35         7

Testing Features:
    Feature1  Feature2
1        10         2
5        30         6
0         5         1
