Question 1: What is Dimensionality Reduction? Why is it important in machine learning?
Answer: Dimensionality Reduction is the process of reducing the number of input features (variables) in a dataset while keeping as much important information as possible.
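**Why it is important in machine learning**
- Removes noise and redundant features
- Reduces computation time and memory use
- Helps prevent overfitting (fewer features means fewer parameters for the model to fit)
- Makes visualization possible by projecting high-dimensional data into 2D or 3D
- Can improve model accuracy, especially when there are many features relative to the number of samples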

Question 2: Name and briefly describe three common dimensionality reduction techniques.
Answer:
1. Principal Component Analysis (PCA)
- PCA transforms the original features into a smaller set of new features called principal components.
- The components are ordered so that the first captures the most variance (information) in the data, the second captures the most of what remains, and so on.
- It is widely used for speeding up models, noise removal, and visualization.
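A minimal PCA sketch with scikit-learn, assuming the built-in Iris dataset (4 features) as sample data. Standardizing first is a common choice because PCA is sensitive to feature scale; `explained_variance_ratio_` shows how much variance each kept component carries.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Sample data: 150 samples, 4 features (labels are not needed for PCA)
X, _ = load_iris(return_X_y=True)

# PCA is scale-sensitive, so put every feature on the same scale first
X_scaled = StandardScaler().fit_transform(X)

# Keep the 2 directions (principal components) with the largest variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                     # 150 samples, now with 2 features
print(pca.explained_variance_ratio_)  # share of total variance kept by each component
```

If the two ratios add up to most of the variance, the 2D version loses little information and can be plotted directly.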

2. Linear Discriminant Analysis (LDA)
- LDA is a supervised technique used for classification problems.
- It reduces dimensions by finding feature combinations that maximize separation between classes while keeping each class compact. With C classes, it can produce at most C - 1 components.
- Useful when the goal is to improve classification performance.
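A short LDA sketch with scikit-learn, again assuming the Iris dataset as sample data. Unlike PCA, `fit_transform` needs the class labels `y`, and with 3 classes at most 2 components are available.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Sample data: 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# LDA is supervised: it uses y to find directions that separate the classes.
# Maximum number of components = n_classes - 1 = 2 here.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X_lda.shape)                    # 150 samples, 2 discriminant components
print(lda.explained_variance_ratio_)  # share of between-class separation per component
print(lda.score(X, y))                # the fitted model can also classify directly
```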

3. t-SNE (t-Distributed Stochastic Neighbor Embedding)
- A technique mainly used for visualizing high-dimensional data in 2D or 3D.
- It preserves local structure, meaning points that are close in high-dimensional space remain close in the plot.
- Popular in visualizing clusters (e.g., images, embeddings).
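A minimal t-SNE sketch, assuming scikit-learn's built-in handwritten digits (64 pixel features per image) as the high-dimensional data. Only the first 500 images are used to keep it fast; `perplexity` (roughly, how many neighbors each point pays attention to) is left at a typical value of 30.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 8x8 digit images flattened to 64 features; a subset keeps runtime short
X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]

# Embed into 2D while trying to keep nearby points nearby
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)

print(X_2d.shape)  # 500 points, 2 coordinates each, ready for a scatter plot
```

Coloring a scatter plot of `X_2d` by `y` (for example with matplotlib) typically shows one group per digit. Note that scikit-learn's `TSNE` has no separate `transform` for new points, which is one reason t-SNE is used for visualization rather than as a preprocessing step.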

Question 3: What is clustering in unsupervised learning? Mention three popular clustering algorithms.
Answer: Clustering is a method in unsupervised learning where similar data points are grouped together without using any labels.
**Three Popular Clustering Algorithms**
1. K-Means Clustering
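- Groups data into K clusters by repeatedly assigning each point to its nearest cluster center (centroid) and then recomputing each center as the mean of its points. K must be chosen in advance.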
- Simple, fast, and widely used.

2. Hierarchical Clustering
- Builds a tree-like structure of clusters.
- You can choose the number of clusters by cutting the tree.

3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
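- Forms clusters from regions of densely packed points.
- Does not need the number of clusters in advance.
- Can detect irregularly shaped clusters and labels points in sparse regions as outliers (noise).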
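A sketch of the two non-K-Means algorithms with scikit-learn, assuming the synthetic "two moons" dataset, whose curved clusters are a classic case that K-Means handles poorly. The parameter values (`linkage="single"`, `eps=0.2`, `min_samples=5`) are illustrative choices for this data, not general defaults.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_moons

# Two interleaving half-moons: non-spherical clusters
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Hierarchical (agglomerative): merge closest clusters bottom-up,
# then "cut the tree" where 2 clusters remain
agg = AgglomerativeClustering(n_clusters=2, linkage="single")
agg_labels = agg.fit_predict(X)

# DBSCAN: no K needed; eps = neighborhood radius, min_samples = density threshold.
# Points that belong to no dense region get the label -1 (noise).
db = DBSCAN(eps=0.2, min_samples=5)
db_labels = db.fit_predict(X)

print("Agglomerative clusters:", sorted(set(agg_labels)))
print("DBSCAN clusters:", sorted(set(db_labels) - {-1}))
print("DBSCAN noise points:", int((db_labels == -1).sum()))
```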

Question 4: Explain the concept of anomaly detection and its significance.
Answer: Anomaly detection is the process of identifying unusual, rare, or abnormal data points that do not follow the expected pattern.

**Significance of Anomaly Detection**
1. Detects Fraud
Used in:
- Credit card transactions
- Online payments
- Banking systems
It helps catch suspicious activities early.

2. Improves Safety
In industries, anomaly detection can identify:
- Machine failures
- Equipment overheating
- Early signs of system breakdown
This helps prevent accidents.

3. Ensures Data Quality
Anomalies may be due to:
- Mistakes in data entry
- Sensor errors
- Incorrect measurements
Detecting and correcting them improves data quality.

4. Helps in Security
Useful in:
- Network intrusion detection
- Detecting abnormal login attempts
- Spotting cyberattacks

Question 5: List and briefly describe three types of anomaly detection techniques.
Answer:
1. Statistical Methods
These techniques assume that normal data follows a certain statistical pattern (like mean and standard deviation).
Anything too far from the expected range is marked as an anomaly.

2. Clustering-Based Methods
These methods group similar data points into clusters.
Points that don't fit into any cluster or are too far from cluster centers are considered anomalies.

3. Machine Learning-Based Methods (Model-Based)
These models learn what “normal data” looks like.
Data points the model cannot explain well are flagged as anomalies.

Common ML methods:
- Isolation Forest
- One-Class SVM
- Autoencoders (Neural Networks)
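The three families above can be sketched side by side on synthetic 2D data. Everything here is an assumption for illustration: two "normal" clusters plus three hand-placed outliers (rows 200 to 202), a 3-standard-deviation cutoff for the statistical method, and a 2% contamination rate for Isolation Forest.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

# Synthetic data: two normal clusters plus 3 obvious outliers (rows 200, 201, 202)
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
blob_b = rng.normal(loc=[5.0, 0.0], scale=1.0, size=(100, 2))
outliers = np.array([[2.5, 8.0], [0.0, -8.0], [5.0, 9.0]])
X = np.vstack([blob_a, blob_b, outliers])

# 1. Statistical: flag points with any feature more than 3 standard deviations from the mean
z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
stat_flags = np.where((z > 3).any(axis=1))[0]

# 2. Clustering-based: distance from each point to its nearest K-Means center
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
dist_to_center = km.transform(X).min(axis=1)
cluster_flags = np.argsort(dist_to_center)[-3:]  # the 3 points farthest from any center

# 3. Model-based: Isolation Forest predicts -1 for anomalies
iso = IsolationForest(contamination=0.02, random_state=0).fit(X)
iso_flags = np.where(iso.predict(X) == -1)[0]

print("Statistical:", stat_flags)
print("Clustering-based:", np.sort(cluster_flags))
print("Isolation Forest:", iso_flags)
```

The simple z-score rule works here because each feature is roughly bell-shaped around its mean; for multi-cluster or skewed data, the clustering- and model-based methods are usually more reliable.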

Question 6: What is time series analysis? Mention two key components of time series data.
Answer: Time series analysis is the process of studying data that is collected over time at regular intervals (daily, monthly, yearly, etc.) to understand patterns, trends, and future behavior.
Examples:
- Daily temperature readings
- Monthly sales data
- Stock prices changing every minute

**Two Key Components of Time Series Data**
1. Trend
A long-term increase or decrease in the data.
Example: Sales increasing steadily year after year.

2. Seasonality
Patterns that repeat at regular intervals.
Example: Ice-cream sales rising every summer.
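To make the two components concrete, here is a sketch that builds a synthetic monthly series (an assumption: a linear trend plus a 12-month sine wave plus noise) and then separates them again. Fitting a straight line is a simple trend estimate that assumes the trend is roughly linear; averaging the detrended values per calendar month gives the seasonal pattern.

```python
import numpy as np

# Synthetic monthly data (assumed for illustration): 6 years = 72 months
rng = np.random.default_rng(0)
months = np.arange(72)
trend = 100 + 2.0 * months                          # steady long-term increase
seasonality = 10 * np.sin(2 * np.pi * months / 12)  # repeats every 12 months
series = trend + seasonality + rng.normal(0, 1, size=months.size)

# Trend: fit a straight line through the series
slope, intercept = np.polyfit(months, series, deg=1)
trend_est = slope * months + intercept

# Seasonality: remove the trend, then average each calendar month across the 6 years
detrended = series - trend_est
seasonal_profile = detrended.reshape(6, 12).mean(axis=0)

print(f"Estimated trend: {slope:.2f} units per month")
print("Peak month of the seasonal pattern:", seasonal_profile.argmax())
```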

Question 7: Describe the difference between seasonality and cyclic behavior in time series.
Answer:
**Seasonality**
- Repeating patterns at fixed, regular intervals.
- Happens due to calendar-based factors like weather, festivals, or months.
Examples:
- Ice-cream sales increase every summer
- Shopping spikes every December

**Cyclic Behavior**
- Long-term ups and downs that do not follow a fixed schedule.
- Usually influenced by economic, social, or business cycles.
Examples:
- Economic recession and recovery
- Housing market rise and fall
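One practical check for seasonality is the autocorrelation at the suspected period: a fixed-period pattern stays strongly correlated with itself at lags 12, 24, and so on, whereas cycles whose length varies typically do not line up at any single fixed lag. The sketch below uses a hypothetical `autocorr` helper and a pure synthetic 12-month sine wave.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of series x at a given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

months = np.arange(120)                          # 10 years of monthly data
seasonal = 10 * np.sin(2 * np.pi * months / 12)  # fixed 12-month period

for lag in (6, 12, 24):
    print(f"lag {lag:2d}: {autocorr(seasonal, lag):+.2f}")
```

For this series the correlation is strongly negative at lag 6 (about -0.95, half a season out of phase) and strongly positive at lags 12 and 24 (about 0.90 and 0.80; the values shrink only because fewer points overlap at longer lags).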

Question 8: Write Python code to perform K-means clustering on a sample dataset.
(Include your Python code and output in the code box below.)
Answer:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Create a sample dataset: 200 points around 3 centers
X, y = make_blobs(n_samples=200, centers=3, random_state=42)

# Apply K-Means with K = 3
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)

# Output cluster centers and first 10 labels
output = {
    "Cluster Centers": kmeans.cluster_centers_.tolist(),
    "First 10 Cluster Labels": kmeans.labels_[:10].tolist()
}

output
```

Output:
```
{'Cluster Centers': [[-2.6588212877132675, 8.957568212540515],
  [-6.745393807586039, -6.851443370075764],
  [4.632182275818423, 2.1012137717846726]],
 'First 10 Cluster Labels': [2, 0, 1, 1, 1, 1, 0, 1, 2, 1]}
```
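To show what `KMeans` does internally, here is a from-scratch sketch of the standard assign-then-update loop (Lloyd's algorithm) in NumPy, on the same sample data. The function names are hypothetical, and the initialization is a simple farthest-point heuristic (a simplification of the k-means++ idea), not scikit-learn's exact method.

```python
import numpy as np
from sklearn.datasets import make_blobs

def assign_labels(X, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans_numpy(X, k, max_iter=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm). Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random point, then repeatedly
    # add the point farthest from all centers chosen so far
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2).min(axis=1)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)

    for _ in range(max_iter):
        labels = assign_labels(X, centers)       # assignment step
        new_centers = np.array([                 # update step: mean of each cluster
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):    # converged: centers stopped moving
            break
        centers = new_centers
    return centers, assign_labels(X, centers)

X, _ = make_blobs(n_samples=200, centers=3, random_state=42)
centers, labels = kmeans_numpy(X, k=3)
print(np.round(centers, 2))
print(labels[:10])
```

On well-separated data like this, the centers should match scikit-learn's up to ordering; the cluster numbers themselves are arbitrary, so the labels may be a permutation of the ones printed above.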

Question 9: What is inheritance in OOP? Provide a simple example in Python.
Answer: Inheritance is an Object-Oriented Programming (OOP) concept where a child class (subclass) can inherit properties and methods from a parent class (superclass).

- It allows code reusability
- Makes programs easier to extend and maintain

```python
# Parent class
class Animal:
    def sound(self):
        return "This animal makes a sound"

# Child class (inherits from Animal) and overrides sound()
class Dog(Animal):
    def sound(self):
        return "Dog barks"

# Create objects
a = Animal()
d = Dog()

print(a.sound())   # Output: This animal makes a sound
print(d.sound())   # Output: Dog barks
```

Output:
```
This animal makes a sound
Dog barks
```
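In the example above, `Dog` overrides `sound()`, so nothing is visibly taken from the parent. A slightly extended sketch, using a hypothetical `describe()` method and `name`/`breed` attributes that are not part of the original example, shows a method inherited unchanged and the parent's initializer reused through `super()`.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is an animal"

    def sound(self):
        return "This animal makes a sound"

class Dog(Animal):
    def __init__(self, name, breed):
        super().__init__(name)  # reuse the parent's initializer to set self.name
        self.breed = breed      # add a Dog-specific attribute

    def sound(self):            # override the parent's method
        return f"{self.name} barks"

d = Dog("Rex", "Beagle")
print(d.describe())           # inherited unchanged from Animal: Rex is an animal
print(d.sound())              # overridden in Dog: Rex barks
print(isinstance(d, Animal))  # True, because a Dog is also an Animal
```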


Question 10: How can time series analysis be used for anomaly detection?
Answer: Time series analysis helps detect unusual or unexpected data points that do not follow the normal pattern over time.
It looks for points that suddenly behave differently from the usual trend, seasonality, or patterns in the data.

1. Detect deviations from trends

If values suddenly jump or drop far away from the usual trend, they are flagged as anomalies.

2. Identify unusual seasonal behavior

If data behaves differently from normal seasonal patterns, it is considered an anomaly.

3. Use forecasting models to detect anomalies

Forecasting models such as ARIMA, LSTM networks, or Prophet predict the expected value at each time step.
If the actual value is far from the predicted value (a large forecast error, or residual), it is flagged as an anomaly.

4. Use statistical thresholds

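Common statistical tools for time series include:
- Mean
- Standard deviation
- Moving average
- Z-scores

A point is flagged when it lies too far from the moving average, for example when its z-score (its distance from the moving average, measured in standard deviations) exceeds a threshold such as 3.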
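Combining the forecasting and threshold ideas, here is a sketch using pandas on a synthetic daily series (an assumption: weekly seasonality plus noise, with a spike and a drop injected on days 60 and 95). The rolling mean acts as a very simple forecast, the rolling standard deviation measures typical spread, and points with |z| > 3 are flagged. `shift(1)` makes sure each day is compared only with the window before it.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (assumed data): weekly pattern + noise + 2 injected anomalies
rng = np.random.default_rng(0)
days = pd.date_range("2024-01-01", periods=120, freq="D")
values = 50 + 3 * np.sin(2 * np.pi * np.arange(120) / 7) + rng.normal(0, 1, 120)
values[60] += 20  # sudden spike
values[95] -= 20  # sudden drop
s = pd.Series(values, index=days)

# Expected value = moving average of the previous 14 days; spread = their std dev
window = 14
expected = s.rolling(window).mean().shift(1)
spread = s.rolling(window).std().shift(1)

# Z-score: how many standard deviations each point is from what was expected
z = (s - expected) / spread
anomalies = s[z.abs() > 3]
print(anomalies)
```

A 14-day window covers two full weekly cycles, so the weekly pattern largely averages out of the expected value. A model that forecasts the seasonal pattern explicitly (such as the ones named above) would give a tighter threshold, at the cost of more setup.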