Question 1: What is Dimensionality Reduction? Why is it important in machine
learning?

Answer

Dimensionality Reduction is a process in machine learning where we reduce the number of input variables (features) while keeping as much important information as possible.
In simple words: it means shrinking the dataset but keeping the meaning.

Example:
If your dataset has 100 features, dimensionality reduction may reduce it to 10–20 meaningful features.

Common techniques include:

PCA (Principal Component Analysis)

t-SNE

LDA (Linear Discriminant Analysis)

Autoencoders

Why is Dimensionality Reduction Important in Machine Learning?

1. Reduces Overfitting
Fewer features → less noise → the model learns meaningful patterns instead of memorizing.

2. Improves Training Speed
Fewer features mean less computation per training step, so models generally train faster.

3. Improves Visualization
We can compress high-dimensional data (like 100D) into 2D or 3D to visualize clusters or patterns.

4. Removes Multicollinearity
Many features in a dataset may be redundant or correlated. Techniques such as PCA combine correlated features into a smaller set of uncorrelated components, which removes the redundancy.

5. Reduces Storage & Computation Cost
Less data → less memory → more efficiency.

Question 2: Name and briefly describe three common dimensionality reduction
techniques.

Answer

1. Principal Component Analysis (PCA)

PCA is a linear dimensionality reduction technique that transforms the original features into a new set of uncorrelated features called principal components.
The components are ordered by how much of the data's variance they capture, so keeping only the first few reduces dimensions while retaining most of the information.
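As a minimal sketch of the idea, the following computes principal components with NumPy's SVD on synthetic data. The data, the variable names, and the choice of two components are assumptions made for illustration; in practice a library class such as scikit-learn's PCA would usually be used instead.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 200 samples, 3 features,
# where the third feature is almost a copy of the first.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# 1. Center each feature at zero.
X_centered = X - X.mean(axis=0)

# 2. SVD of the centered data: the rows of Vt are the principal directions,
#    ordered from largest to smallest variance.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained_variance = S**2 / (len(X) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# 3. Keep the first two components and project the data onto them.
n_components = 2
X_reduced = X_centered @ Vt[:n_components].T

print("explained variance ratio:", np.round(explained_ratio, 3))
print("reduced shape:", X_reduced.shape)
```

Because the third feature nearly duplicates the first, two components should capture almost all of the variance here, and the projected columns are uncorrelated with each other.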

2. Linear Discriminant Analysis (LDA)

LDA is a supervised technique that reduces dimensions by finding a new feature space that maximizes class separability.
It is mainly used for classification tasks: it projects the data onto at most C − 1 new axes (for C classes) chosen to best separate the classes.
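A short sketch of LDA used for dimensionality reduction, assuming scikit-learn and its bundled Iris dataset (neither is prescribed by the answer above). Iris has 3 classes, so LDA can produce at most 2 components.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Iris: 150 samples, 4 features, 3 classes (dataset choice is an assumption).
X, y = load_iris(return_X_y=True)

# LDA is supervised, so the class labels y are required when fitting.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X.shape, "->", X_lda.shape)
```

Unlike PCA, the fit step takes the labels, which is what makes the resulting axes class-aware.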

3. t-Distributed Stochastic Neighbor Embedding (t-SNE)

t-SNE is a nonlinear technique mainly used for visualization.
It reduces high-dimensional data to 2D or 3D, preserving local structure (nearby points remain close).
It is very effective for visualizing clusters in datasets like images or text embeddings.
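A minimal sketch of t-SNE for visualization, assuming scikit-learn and a small subset of its digits dataset. The subset size, perplexity, and initialization are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 8x8 digit images flattened to 64 features; a 200-sample subset keeps this quick.
X, y = load_digits(return_X_y=True)
X, y = X[:200], y[:200]

# Embed the 64-dimensional points into 2D. Perplexity must be smaller than
# the number of samples; random_state fixes the otherwise stochastic result.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_2d = tsne.fit_transform(X)

print(X.shape, "->", X_2d.shape)
```

The two output columns can then be used as x and y coordinates in a scatter plot colored by the digit label.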

Question 3: What is clustering in unsupervised learning? Mention three popular
clustering algorithms.

Answer

Clustering is an unsupervised learning technique that groups data points into clusters based on their similarity.
The idea is that points in the same cluster are more similar to each other than to points in other clusters.

It is used when we do not have labeled data, and we want to discover natural patterns or groups in the dataset.

Common applications: customer segmentation, anomaly detection, image grouping, document clustering, etc.

Three Popular Clustering Algorithms
1. K-Means Clustering

Divides data into K groups based on distance to the cluster centers (centroids).

Simple, fast, and widely used, but the number of clusters K must be chosen in advance.

2. Hierarchical Clustering

Builds a tree-like structure (dendrogram) of clusters.

Can be agglomerative (bottom-up) or divisive (top-down).

3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

Forms clusters based on dense regions of data.

Can detect arbitrary-shaped clusters and identify noise/outliers.
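To illustrate how DBSCAN separates dense regions from noise, here is a small sketch using scikit-learn. The points and the eps and min_samples values are made up for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hand-made points (an assumption for illustration).
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],   # dense group A
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],   # dense group B
    [10.0, -10.0],                                    # isolated point
])

# eps: neighborhood radius; min_samples: points needed to form a dense region.
db = DBSCAN(eps=0.5, min_samples=3).fit(points)

# Points labeled -1 belong to no cluster and are treated as noise.
print(db.labels_)
```

The two dense groups receive their own cluster labels, while the isolated point is labeled -1. Note that the number of clusters is not specified anywhere; it follows from the density parameters.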

Question 4: Explain the concept of anomaly detection and its significance.

Answer

Concept of Anomaly Detection

Anomaly detection is the process of identifying data points, events, or patterns that deviate significantly from the normal behavior of a system.
These unusual observations are called anomalies or outliers.
It is commonly used when abnormal events are rare but critical, and labels may not be available.

Significance of Anomaly Detection
1. Fraud Detection

Used in banking and finance to detect unusual transactions that may indicate credit card fraud or identity theft.

2. Network and Cybersecurity

Helps in spotting suspicious network activity, malware, or intrusions by detecting abnormal patterns.

3. Fault & Failure Detection

Industries use it to detect machine failures or sensor faults early, preventing major damage or downtime.

4. Quality Control in Manufacturing

Identifies defective or abnormal products during production.

5. Healthcare Monitoring

Detects abnormal patient vital signs, helping with early diagnosis of emergencies.


Question 5: List and briefly describe three types of anomaly detection techniques.

Answer
1. Statistical (Probability-Based) Methods

These methods assume that normal data follows a certain statistical distribution (e.g., Gaussian).
Any point that falls far from the expected distribution is flagged as an anomaly.
Examples: Z-score, Gaussian model, moving averages.

2. Distance-Based Methods

These methods detect anomalies by measuring the distance between data points.
Points that are far away from most other points are considered anomalies.
Example: K-Nearest Neighbors (KNN) anomaly detection.

3. Density-Based Methods

These methods look at how dense the neighborhood around a data point is.
Points in low-density regions are classified as anomalies.
Examples: DBSCAN, Local Outlier Factor (LOF).
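The first two approaches can be sketched with the Python standard library alone. The data, the function names, the threshold of 3, and k = 3 are assumptions chosen for illustration.

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Statistical method: flag values far from the mean, in units of std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / std > threshold]

def knn_distance_scores(values, k=3):
    """Distance-based method: score each value by the distance to its k-th nearest neighbor."""
    scores = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(dists[k - 1])
    return scores

# Illustrative readings: 19 values near 10 and one obvious outlier at the end.
data = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
        11, 9, 10, 12, 10, 9, 11, 10, 10, 50]

flagged = zscore_anomalies(data)
scores = knn_distance_scores(data)
most_isolated = scores.index(max(scores))

print("z-score anomalies:", flagged)
print("most isolated index by k-NN distance:", most_isolated)
```

Both methods single out the last reading. One caveat with the z-score approach: on very small samples a single outlier inflates the standard deviation enough that it may not cross a threshold of 3.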

Question 6: What is time series analysis? Mention two key components of time series
data.

Answer
Time series analysis is a method of analyzing data that is collected over time at regular intervals (e.g., hourly, daily, monthly).
The goal is to identify patterns, trends, and relationships in the data to make forecasts or understand past behavior.

Examples: stock prices, temperature readings, sales data, electricity usage.

Two Key Components of Time Series Data
1. Trend

The long-term increase or decrease in the data over time.
Example: A steady rise in yearly sales.

2. Seasonality

Regular, repeating patterns that occur at fixed intervals.
Example: Higher ice cream sales every summer.
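As a rough sketch of how the two components can be separated, the following builds a synthetic series from a known trend and seasonal pattern, then recovers both with a centered moving average. The series, the period of 3, and the function name are assumptions. The simple centering used here works only for an odd window.

```python
# Synthetic series (assumption): linear trend 2*t plus a repeating pattern of period 3.
period = 3
seasonal_pattern = [2.0, -1.0, -1.0]   # sums to zero over one period
series = [2.0 * t + seasonal_pattern[t % period] for t in range(12)]

def centered_moving_average(values, window):
    """Average over one full period, which cancels the seasonal pattern."""
    half = window // 2
    return [
        sum(values[t - half : t + half + 1]) / window
        for t in range(half, len(values) - half)
    ]

# Trend estimate for t = 1 .. 10 (the first and last points have no full window).
trend = centered_moving_average(series, period)

# Seasonality: average the detrended values at each position in the cycle.
time_index = range(1, len(series) - 1)
detrended = [series[t] - trend[t - 1] for t in time_index]
seasonal = []
for phase in range(period):
    phase_values = [d for t, d in zip(time_index, detrended) if t % period == phase]
    seasonal.append(sum(phase_values) / len(phase_values))

print("trend:", trend)
print("seasonal:", seasonal)
```

On this noise-free series the estimated trend rises by 2 per step and the seasonal averages match the pattern that was built in. Real data would also leave a residual component.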

Question 7: Describe the difference between seasonality and cyclic behavior in time
series.


Answer

Seasonality

Seasonality refers to regular, repeating patterns in a time series that occur at fixed, known intervals.

These patterns are usually driven by calendar-related effects such as daily, weekly, monthly, or yearly cycles.

Example:
High sales during every Diwali season or increased electricity usage every summer.

Cyclic Behavior

Cyclic behavior refers to fluctuations that occur over long periods, but not at fixed or regular intervals.

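Cycles are usually related to economic or business conditions, and their duration can vary widely.

Example:
Business cycles of expansion and recession, which may last anywhere from a few years to a decade or more.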
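One common way to check for a fixed period is the autocorrelation at different lags. This diagnostic is not part of the answer above, and the series and function name are assumptions. A seasonal series shows a strong positive autocorrelation at its period, while a cyclic series with varying cycle lengths generally does not give such a sharp, stable peak.

```python
import math

def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    variance_sum = sum((v - mean) ** 2 for v in values)
    covariance_sum = sum(
        (values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum

# Synthetic seasonal series (assumption): a sine wave with a fixed period of 12 steps.
seasonal_series = [math.sin(2 * math.pi * t / 12) for t in range(120)]

acf_full_period = autocorrelation(seasonal_series, 12)   # one full period apart
acf_half_period = autocorrelation(seasonal_series, 6)    # half a period apart

print("lag 12:", round(acf_full_period, 2))
print("lag 6:", round(acf_half_period, 2))
```

Values one full period apart move together (strongly positive), while values half a period apart move in opposite directions (strongly negative).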

Question 8: Write Python code to perform K-means clustering on a sample dataset.
(Include your Python code and output in the code box below.)


Answer

In [3]:
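from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate a sample dataset: 100 points scattered around 3 centers
X, y = make_blobs(n_samples=100, centers=3, random_state=42)

# Fit K-means with K=3; random_state makes the run reproducible
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)

# Cluster index assigned to each point, and the learned centroid coordinates
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Show the first 10 labels and the 3 centroids
labels[:10], centroids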



(array([1, 2, 0, 2, 1, 2, 0, 2, 2, 0], dtype=int32),
 array([[-2.66780392,  8.93576069],
        [-6.95170962, -6.67621669],
        [ 4.49951001,  1.93892013]]))

Question 9: What is inheritance in OOP? Provide a simple example in Python.

Answer

Inheritance is an important concept in Object-Oriented Programming (OOP) where one class (child / subclass) can acquire the properties and methods of another class (parent / superclass).
It promotes code reusability, extensibility, and a clean structure.

In [4]:

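# Parent class (superclass)
class Animal:
    def sound(self):
        return "Animals make different sounds"


# Child class (subclass): inherits from Animal and overrides sound()
class Dog(Animal):
    def sound(self):
        return "Dog barks"


a = Animal()
d = Dog()

print(a.sound())  # calls Animal.sound()
print(d.sound())  # calls the overridden Dog.sound()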


Animals make different sounds
Dog barks
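The example above shows a child class overriding a parent method. A complementary sketch, with hypothetical class names, shows the child reusing a parent method unchanged and extending the parent's constructor with super().

```python
class Vehicle:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is a vehicle"


class Car(Vehicle):
    def __init__(self, name, doors):
        super().__init__(name)   # reuse the parent's initialization
        self.doors = doors       # add an attribute specific to Car

    def door_info(self):
        return f"{self.name} has {self.doors} doors"


car = Car("Sedan", 4)
print(car.describe())    # inherited from Vehicle, not redefined in Car
print(car.door_info())   # defined only in Car
```

Here Car never defines describe(), yet it can call it, which is the reuse that inheritance provides.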


Question 10: How can time series analysis be used for anomaly detection?

Answer

Time series analysis can be used for anomaly detection by identifying unexpected changes or unusual patterns in data that is recorded over time. The goal is to detect points or periods that do not follow the normal trend, seasonality, or behavior of the time series.

How Time Series Helps in Anomaly Detection
1. Detecting Sudden Spikes or Drops

Time series models (like ARIMA, Moving Average, or Exponential Smoothing) predict expected values.
If the actual value deviates significantly from the prediction → it's an anomaly.

Example:
A sudden drop in website traffic or a sudden spike in electricity usage.

2. Identifying Seasonal Anomalies

Time series models understand seasonality (e.g., daily, monthly, yearly patterns).
If a value breaks the usual seasonal pattern → anomaly.

Example:
Very low sales during a festival when sales are normally high.

3. Detecting Trend Changes

An unexpected change in the long-term upward or downward trend may indicate an anomaly.

Example:
A steady temperature rise suddenly stops or reverses sharply.

4. Using Statistical Thresholds

We can compute rolling mean and standard deviation.
If a data point falls outside acceptable limits (for example, more than 3 standard deviations from the rolling mean) → anomaly.

5. Using Machine Learning Time Series Models

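Models like LSTM and Prophet can learn time-based patterns and flag anomalies when observed values depart from their forecasts. Isolation Forest is a general-purpose anomaly detector rather than a time series model, but it can be applied to time-derived features such as lagged values or rolling statistics.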
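The statistical-threshold approach from point 4 can be sketched with the standard library. The readings, the window of 5, and the 3-sigma limit are illustrative assumptions. Each point is compared against the mean and standard deviation of the values just before it.

```python
import statistics

def rolling_anomalies(values, window=5, n_sigmas=3.0):
    """Flag indices whose value is far from the mean of the preceding window."""
    flagged = []
    for t in range(window, len(values)):
        history = values[t - window : t]
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)
        # Skip flat windows, where the standard deviation is zero.
        if std > 0 and abs(values[t] - mean) > n_sigmas * std:
            flagged.append(t)
    return flagged

# Illustrative readings with a single spike at index 9.
readings = [10, 11, 10, 9, 10, 11, 10, 9, 10, 30, 10, 11, 10, 9, 10]
anomalies = rolling_anomalies(readings)

print("anomalous indices:", anomalies)
```

Only the spike is flagged. The points right after it are not, because the spike inflates the rolling standard deviation of the windows that contain it.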

Summary

Time series analysis detects anomalies by comparing observed values with expected patterns derived from:

Trends

Seasonality

Forecasting models

Statistical deviations