Question 1: Explain the differences between AI, ML, Deep Learning (DL), and Data Science (DS).

Artificial Intelligence (AI) is a broad field that focuses on creating systems capable of performing tasks that typically require human intelligence.
Machine Learning (ML) is a subset of AI where algorithms learn patterns from data to make predictions or decisions without being explicitly programmed.
Deep Learning (DL) is a subset of ML that uses neural networks with many layers to model complex patterns in large datasets.
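Data Science (DS) is an interdisciplinary field focused on extracting insights from data using statistics, ML, and domain knowledge. It overlaps with AI and ML rather than being a subset of either, whereas DL sits inside ML, which in turn sits inside AI.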

Question 2: What are the types of machine learning? Describe each with one real-world example.

1. Supervised Learning: The model is trained on labeled data. Example: Predicting house prices using historical data.
2. Unsupervised Learning: The model finds patterns in unlabeled data. Example: Customer segmentation using clustering.
3. Reinforcement Learning: The model learns by interacting with an environment and receiving feedback in the form of rewards or penalties. Example: Training robots to walk.
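
As a rough illustration of the first two types, the sketch below fits a supervised model on labeled (size, price) pairs and then clusters unlabeled customer rows. All numbers are made up, and scikit-learn's LinearRegression and KMeans are just one possible choice of models. Reinforcement learning is left out because it needs an environment to interact with.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Supervised: features AND labels are given (toy, made-up numbers).
sizes_sqft = np.array([[800], [1000], [1200], [1500], [2000]])
prices = np.array([160_000, 200_000, 240_000, 300_000, 400_000])
reg = LinearRegression().fit(sizes_sqft, prices)
predicted_price = reg.predict(np.array([[1100]]))[0]
print(f"Predicted price for 1100 sqft: {predicted_price:.0f}")

# Unsupervised: only features are given; the algorithm groups similar rows.
# Columns: annual spend, visits per month (hypothetical customers).
customers = np.array([[200, 1], [220, 2], [210, 1],
                      [950, 9], [1000, 10], [980, 8]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
segments = kmeans.labels_
print("Cluster assigned to each customer:", segments)
```

The cluster ids themselves are arbitrary; only the grouping of rows carries meaning.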

Question 3: Define overfitting, underfitting, and the bias-variance tradeoff in machine learning.

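Overfitting: The model fits noise and incidental details in the training data, so it scores well on the training set but performs poorly on new data.
Underfitting: The model is too simple to capture the underlying patterns, so it performs poorly on both the training data and new data.
Bias-Variance Tradeoff: Bias is error from overly simplistic assumptions (it leads to underfitting); variance is error from sensitivity to the particular training sample (it leads to overfitting). Increasing model complexity typically lowers bias but raises variance, so the goal is the complexity that minimizes total error on unseen data.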
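
One way to see these effects is to fit polynomials of increasing degree to noisy data and compare training and test error. This is a minimal sketch on synthetic data; the sine target, the noise level, and the degrees 1, 3, and 12 are all arbitrary choices made for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    # Noisy samples of one period of a sine wave (synthetic, illustrative).
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

def mse(model, x, y):
    # Mean squared error of the fitted polynomial on (x, y).
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 12):
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree={degree:2d}  train MSE={results[degree][0]:.3f}  "
          f"test MSE={results[degree][1]:.3f}")
```

Training error can only go down as the degree grows. A straight line (degree 1) usually shows high error on both sets, which is underfitting, while a high degree tends to show a widening gap between training and test error, which is the signature of overfitting.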

Question 4: What are outliers in a dataset? List three common techniques for handling them.

Outliers are data points that significantly deviate from the majority of observations.
Techniques:
1. Removal of outliers if they are due to data entry errors.
2. Transformation (e.g., log transformation) to reduce impact.
3. Capping or Winsorization to limit extreme values.
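
The three techniques can be sketched with NumPy as follows. The data is a made-up toy array, and the IQR rule with a 1.5 multiplier is only one common convention for deciding what counts as an outlier. Capping at the IQR bounds is shown here as a simple variant; classic Winsorization caps at chosen percentiles instead.

```python
import numpy as np

# Toy data with one obvious extreme value (made up for illustration).
values = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0])

# Detection with the IQR rule (the 1.5 multiplier is a common convention).
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (values < lower) | (values > upper)

# 1. Removal: keep only the points inside the bounds.
removed = values[~is_outlier]

# 2. Transformation: log1p compresses large values (needs values > -1).
logged = np.log1p(values)

# 3. Capping: clip everything to the IQR bounds.
capped = np.clip(values, lower, upper)

print("outliers:", values[is_outlier])
print("after removal:", removed)
print("after capping:", capped)
```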

Question 5: Explain the process of handling missing values and mention one imputation technique for numerical and one for categorical data.

Handling missing values involves identifying them, understanding why they are missing, and deciding whether to remove or impute them.
Numerical imputation: Replace missing values with the mean (or the median when the distribution is skewed).
Categorical imputation: Replace missing values with the mode (the most frequent category).
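
The steps above can be sketched with pandas. The table, its column names, and its values are hypothetical; the mean is used for the numerical column and the mode for the categorical one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with one missing value per column.
df_missing = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "city": ["Paris", "Lyon", None, "Paris"],
})

# Identify: count the missing values in each column.
print(df_missing.isna().sum())

# Impute: mean for the numerical column, mode for the categorical one.
imputed = df_missing.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])

print(imputed)
```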

Question 6: Write a Python program that creates a synthetic imbalanced dataset and prints the class distribution.

In [None]:
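from sklearn.datasets import make_classification
import numpy as np

# Generate 1,000 samples with a roughly 90/10 class split.
# The default flip_y=0.01 randomly reassigns a few labels, so the counts are not exactly 900/100.
X, y = make_classification(n_samples=1000, n_features=5, n_classes=2,
                           weights=[0.9, 0.1], random_state=42)

# Count the samples per class; cast to int so the dictionary prints as plain numbers.
unique, counts = np.unique(y, return_counts=True)
print({int(label): int(count) for label, count in zip(unique, counts)})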

Question 7: Implement one-hot encoding using pandas.

In [None]:
import pandas as pd

colors = ['Red', 'Green', 'Blue', 'Green', 'Red']
df = pd.DataFrame({'Color': colors})

# One indicator column per category; dtype=int yields 0/1 rather than True/False.
encoded_df = pd.get_dummies(df, columns=['Color'], dtype=int)
print(encoded_df)

Question 8: Generate samples, introduce missing values, impute, and plot histograms.

In [None]:
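import numpy as np
import matplotlib.pyplot as plt

# Seeded generator so the samples and the missing positions are reproducible.
rng = np.random.default_rng(42)

# Draw 1,000 samples from a normal distribution (mean 50, std 10)
# and set 50 randomly chosen entries to NaN.
data = rng.normal(50, 10, 1000)
missing_indices = rng.choice(1000, 50, replace=False)
data_with_nan = data.copy()
data_with_nan[missing_indices] = np.nan

# Plot before imputation (NaNs are filtered out because they cannot be binned)
plt.hist(data_with_nan[~np.isnan(data_with_nan)], bins=30, edgecolor='black')
plt.title('Before Imputation')
plt.show()

# Impute with the mean of the observed values
mean_val = np.nanmean(data_with_nan)
data_with_nan[np.isnan(data_with_nan)] = mean_val

# Plot after imputation; the 50 imputed values appear as a spike at the mean
plt.hist(data_with_nan, bins=30, edgecolor='black')
plt.title('After Imputation')
plt.show()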

Question 9: Implement Min-Max scaling.

In [None]:
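from sklearn.preprocessing import MinMaxScaler
import numpy as np

# MinMaxScaler expects a 2D array of shape (n_samples, n_features).
data = np.array([[2], [5], [10], [15], [20]])

# Rescale each feature to the default range [0, 1].
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)
print(scaled_data)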
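
For reference, the same scaling can be written by hand from the formula x' = (x - min) / (max - min). This is a minimal NumPy sketch on the same five values; it assumes the values are not all identical, since the range would then be zero.

```python
import numpy as np

values = np.array([2.0, 5.0, 10.0, 15.0, 20.0])

# Min-max scaling: x' = (x - min) / (max - min), mapping values to [0, 1].
value_range = values.max() - values.min()
manual_scaled = (values - values.min()) / value_range

print(manual_scaled)
```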

Question 10: Data preparation plan for customer transaction dataset.

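Step 1: Handle missing ages → Impute with the median (numerical), which is robust to skew.
Step 2: Handle outliers in transaction amount → Use the IQR method to detect extreme values and cap them.
Step 3: Handle imbalance → Apply SMOTE or class weighting. SMOTE interpolates between numeric feature vectors, so run it after encoding and on the training split only.
Step 4: Encode categorical variables → Use one-hot encoding for nominal features.
Step 5: Scale numerical features → Apply standard scaling or min-max scaling, fitting the scaler on the training split only.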
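
The plan can be sketched end to end with scikit-learn. Everything specific here is an assumption made for illustration: the column names (age, amount, payment_method), the synthetic data, the binary target, and the choice of logistic regression. Class weighting is used for the imbalance step instead of SMOTE, because SMOTE lives in the separate imbalanced-learn package.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; every column name here is an assumption.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 70, n).astype(float),
    "amount": rng.lognormal(3.0, 1.0, n),  # right-skewed, contains extremes
    "payment_method": rng.choice(["card", "cash", "wallet"], n),
})
df.loc[rng.choice(n, 20, replace=False), "age"] = np.nan  # missing ages
y = (rng.random(n) < 0.1).astype(int)                      # about 10% positives

# Outliers: cap transaction amounts at the IQR bounds.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
df["amount"] = df["amount"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Missing values and scaling for numeric columns, one-hot for categoricals.
numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                    ("scale", StandardScaler())])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "amount"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["payment_method"]),
])

# Imbalance: class weighting in the classifier (swapped in for SMOTE).
model = Pipeline([("prep", preprocess),
                  ("clf", LogisticRegression(class_weight="balanced",
                                             max_iter=1000))])
model.fit(df, y)
X_prepared = model.named_steps["prep"].transform(df)
print(X_prepared.shape)
```

In a real project the data would be split first, with the IQR bounds computed and the pipeline fitted on the training split only, so that no information leaks from the test set.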