Q1. Define a Markov Decision Process (MDP).

A Markov Decision Process (MDP) is a mathematical framework for modeling decision-making problems in which outcomes are partly random and partly under the control of an agent. It is used to find a policy, a rule for choosing an action in each state, that maximizes the expected long-term reward in an environment.

Q2. List and explain the five components of an MDP.

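An MDP consists of five components, often written as the tuple (S, A, P, R, γ):

1. States (S): all possible situations the agent can be in.

2. Actions (A): all possible actions the agent can take.

3. Transition probabilities (P): the probability P(s' | s, a) of moving from state s to state s' after taking action a.

4. Reward function (R): the immediate feedback received after taking an action in a state.

5. Discount factor (γ): a number between 0 and 1 that determines how much future rewards are valued compared to immediate rewards.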
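As a sketch, the five components can be held together in a small Python dataclass. The class name `MDP`, its field names, the `check` helper, and the two-state example are all hypothetical, and the sketch assumes rewards depend only on the current state and action, R(s, a).

```python
from dataclasses import dataclass


@dataclass
class MDP:
    """A finite MDP stored as the tuple (S, A, P, R, gamma)."""
    states: list[str]                                      # S
    actions: list[str]                                     # A
    transitions: dict[tuple[str, str], dict[str, float]]   # P: (s, a) -> {s': P(s' | s, a)}
    rewards: dict[tuple[str, str], float]                  # R: (s, a) -> immediate reward
    gamma: float                                           # discount factor in [0, 1]

    def check(self) -> None:
        """Raise ValueError unless gamma is in [0, 1] and each P(. | s, a) sums to 1."""
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError(f"gamma must be in [0, 1], got {self.gamma}")
        for (s, a), dist in self.transitions.items():
            total = sum(dist.values())
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"P(. | {s}, {a}) sums to {total}, not 1")


# A hypothetical two-state example
toy = MDP(
    states=["start", "goal"],
    actions=["stay", "move"],
    transitions={
        ("start", "stay"): {"start": 1.0},
        ("start", "move"): {"goal": 0.8, "start": 0.2},  # the move can fail
        ("goal", "stay"): {"goal": 1.0},
        ("goal", "move"): {"goal": 1.0},
    },
    rewards={
        ("start", "stay"): 0.0,
        ("start", "move"): -1.0,
        ("goal", "stay"): 0.0,
        ("goal", "move"): 0.0,
    },
    gamma=0.9,
)
toy.check()  # passes silently
```

The 0.8 / 0.2 split on `("start", "move")` is where the "partly random" part of the definition shows up: the agent chooses the action, but the environment decides the next state.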

Q3. Explain the Markov Property.

The Markov Property states that the future state depends only on the current state and action, not on the sequence of past states. In other words, the present contains all the information needed to make decisions.
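Formally, using the common notation S_t and A_t for the state and action at time step t (symbols not defined elsewhere in this notebook), the Markov Property says that conditioning on the full history adds nothing beyond the current state and action:

```latex
P(S_{t+1} = s' \mid S_t, A_t) = P(S_{t+1} = s' \mid S_0, A_0, S_1, A_1, \ldots, S_t, A_t)
```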

Q4. Difference between policy vs value function, and reward vs return.

Policy vs Value Function:
A policy defines what action to take in each state, while a value function estimates how good it is to be in a state (or take an action) in terms of future rewards.

Reward vs Return:
Reward is the immediate feedback after an action, whereas return is the total accumulated reward over time, usually discounted.
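In the usual notation, with R_{t+1} the reward received after acting at time t, γ the discount factor, and π the policy, the return and the state-value function can be written as follows (the Q6 exercise is this sum truncated after three rewards):

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

V^{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right]
```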

Q5. Robot grid scenario analysis.

States: Each position in the grid

Actions: Move up, down, left, right

Rewards: Positive reward for reaching the goal, negative reward for each move

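Task Type: Episodic sequential decision-making problem (each episode ends when the robot reaches the goal)

Deterministic Policy: In each state, always take the single move that reduces the distance to the goal, breaking ties in a fixed order so that every state maps to exactly one action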
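A minimal sketch of this deterministic policy, assuming an obstacle-free grid with (row, col) coordinates, a hypothetical reward of -1 per move and +10 on the move that reaches the goal, and a fixed tie-break of moving vertically before horizontally. The names `greedy_policy`, `MOVES`, and `run_episode` are illustrative only.

```python
from typing import Optional

Position = tuple[int, int]  # (row, col); row 0 is the top of the grid

# Hypothetical action set: each action shifts (row, col) by a fixed offset
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def greedy_policy(state: Position, goal: Position) -> Optional[str]:
    """Deterministic policy: exactly one action per state.

    Closes the row gap first, then the column gap; every move reduces the
    Manhattan distance to the goal by 1. Returns None at the goal (terminal).
    """
    row, col = state
    goal_row, goal_col = goal
    if row > goal_row:
        return "up"
    if row < goal_row:
        return "down"
    if col > goal_col:
        return "left"
    if col < goal_col:
        return "right"
    return None


def run_episode(start: Position, goal: Position,
                step_reward: float = -1.0, goal_reward: float = 10.0):
    """Follow the policy from start to goal; return the path and total reward."""
    state, total, path = start, 0.0, [start]
    while (action := greedy_policy(state, goal)) is not None:
        d_row, d_col = MOVES[action]
        state = (state[0] + d_row, state[1] + d_col)
        path.append(state)
        # The move that reaches the goal earns goal_reward; every other move costs step_reward
        total += goal_reward if state == goal else step_reward
    return path, total


path, total = run_episode(start=(0, 0), goal=(2, 3))
print(path)   # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
print(total)  # 6.0
```

From (0, 0) to a goal at (2, 3) the policy takes five moves (down, down, right, right, right), so the undiscounted total is 4 × (-1) + 10 = 6.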

Q6. Numerical Exercise (γ = 0.9, rewards = [5, 2, 1]).

The return is calculated as:

$$G = 5 + 0.9 \times 2 + 0.9^2 \times 1 = 5 + 1.8 + 0.81 = 7.61$$
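The same arithmetic as a small Python sketch; `discounted_return` and `discounted_return_backward` are illustrative helper names. The backward version uses the recursion G_t = R_{t+1} + γG_{t+1}, a common way to accumulate returns once an episode has finished.

```python
def discounted_return(rewards, gamma):
    """G = r1 + gamma*r2 + gamma**2*r3 + ... (forward sum)."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


def discounted_return_backward(rewards, gamma):
    """Same value via the recursion G_t = r_{t+1} + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [5, 2, 1]
print(round(discounted_return(rewards, 0.9), 2))           # 7.61
print(round(discounted_return_backward(rewards, 0.9), 2))  # 7.61
```

Rounding is only for display: floating-point addition can leave the raw result a tiny distance from 7.61.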

In [1]:
#Q1. Generate numbers from 1 to 20 and extract even numbers.

numbers = list(range(1, 21))

even_numbers_loop = []
for num in numbers:
    if num % 2 == 0:
        even_numbers_loop.append(num)

even_numbers_comp = [num for num in numbers if num % 2 == 0]

even_numbers_loop, even_numbers_comp


([2, 4, 6, 8, 10, 12, 14, 16, 18, 20], [2, 4, 6, 8, 10, 12, 14, 16, 18, 20])

In [2]:
#Q2. Write a function to compute the mean of a list.

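def calculate_mean(values):
    # Guard against an empty list, which would otherwise raise ZeroDivisionError
    if not values:
        raise ValueError("calculate_mean() requires at least one value")
    return sum(values) / len(values)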

sample_data = [10, 20, 30, 40]
calculate_mean(sample_data)


25.0

In [3]:
#Q3. Create NumPy arrays and compute statistics.

import numpy as np

data_array = np.array([4, 8, 15, 16, 23, 42])

mean_value = np.mean(data_array)
std_value = np.std(data_array)  # population std (ddof=0); pass ddof=1 for the sample std

mean_value, std_value


(np.float64(18.0), np.float64(12.315302134607444))

In [4]:
#Q4. Create a Pandas DataFrame and filter rows.

import pandas as pd

data = {
    "Name": ["A", "B", "C", "D"],
    "Score": [85, 60, 90, 55]
}

df = pd.DataFrame(data)

filtered_df = df[df["Score"] > 70]
filtered_df


  Name  Score
0    A     85
2    C     90


Section 3: Decision Trees

Q1. What problems do decision trees solve?

Decision trees are used for classification and regression problems. They work well for structured data and problems that require interpretable decision rules.

Q2. Define entropy and information gain.
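
Entropy measures the uncertainty (impurity) of the class labels in a dataset: it is 0 when every sample belongs to one class and highest when the classes are evenly mixed. Information gain measures how much entropy is reduced after splitting the data on a feature.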
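In the standard notation (these symbols are not defined in the original), for a dataset S with c classes where p_i is the fraction of samples in class i, and S_v is the subset of S on which feature A takes value v:

```latex
H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i

IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, H(S_v)
```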

Q3. Why do decision trees overfit?

Decision trees overfit when they grow too deep: without limits such as max_depth or min_samples_leaf, a tree can keep splitting until every leaf is pure, memorizing noise in the training data instead of learning general patterns.

Q4. Best root node (intuitive).

The best root node is the feature that splits the data most effectively into pure subsets, usually the one with the highest information gain.
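The intuition can be checked with a short pure-Python sketch that computes entropy, information gain for categorical features, and picks the feature with the highest gain as the root. The toy rows, feature names, and labels are made up for illustration. Note that scikit-learn's DecisionTreeClassifier uses binary threshold splits and Gini impurity by default, so this shows the information-gain idea rather than that class's exact procedure.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the classes present in labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy before the split minus the size-weighted entropy after it."""
    n = len(labels)
    groups = {}  # feature value -> labels of the rows with that value
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    after = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - after


def best_root(rows, labels):
    """Pick the feature whose split gives the highest information gain."""
    return max(rows[0], key=lambda f: information_gain(rows, labels, f))


# Made-up toy data: "student" separates the labels perfectly, "income" not at all
rows = [
    {"income": "high", "student": "no"},
    {"income": "high", "student": "yes"},
    {"income": "low", "student": "yes"},
    {"income": "low", "student": "no"},
]
labels = ["no", "yes", "yes", "no"]

for feature in rows[0]:
    print(feature, information_gain(rows, labels, feature))
print("root:", best_root(rows, labels))  # root: student
```

Splitting on "student" yields two pure subsets (gain 1.0 bit), while "income" leaves both subsets as mixed as the original (gain 0.0), so "student" is the better root.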

Python Practice: Train a DecisionTreeClassifier.

Train a DecisionTreeClassifier with max_depth=3 and generate predictions.

In [5]:
from sklearn.tree import DecisionTreeClassifier

X = [[25], [30], [45], [35], [22]]
y = [0, 0, 1, 1, 0]

model = DecisionTreeClassifier(max_depth=3)
model.fit(X, y)

# Predicting on the training data shows only how well the tree fits, not how it generalizes
predictions = model.predict(X)
predictions


array([0, 0, 1, 1, 0])

Section 4: K-Means Clustering

Q1. Is K-Means supervised or unsupervised?

K-Means is an unsupervised learning algorithm: it groups unlabeled data points by similarity and never uses target labels.

Q2. Describe the K-Means algorithm steps.
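
1. Choose the number of clusters (k).

2. Initialize k cluster centroids (for example, by picking k random data points).

3. Assign each point to the nearest centroid.

4. Update each centroid to the mean of the points assigned to it.

5. Repeat steps 3 and 4 until the assignments or centroids stop changing (convergence).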
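The steps above can be sketched in NumPy as plain Lloyd's K-Means. The function name `kmeans`, the random-point initialization, the `n_iters` cap, and the toy data are assumptions for illustration; scikit-learn's `KMeans` defaults to the smarter k-means++ initialization.

```python
import numpy as np


def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's K-Means on an (n_samples, n_features) float array.

    Step 1 (choosing k) is the caller's job via the k argument.
    """
    rng = np.random.default_rng(seed)
    # Step 2: initialize centroids as k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

The cluster numbers themselves are arbitrary, but the first three points should share one label and the last three the other.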

Q3. Why is feature scaling important?
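
Feature scaling ensures that all features contribute comparably to the distance calculations used by K-Means. Without it, a feature with a large numeric range (such as Income, with values in the tens of thousands) dominates the Euclidean distance, and a feature with a small range (such as Age) is effectively ignored.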
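A small NumPy sketch of the effect, using the same (Age, Income) rows as the KMeans practice cell in this section. The `standardize` helper is illustrative; it applies z-score scaling (column mean 0, population standard deviation 1), the same transformation scikit-learn's `StandardScaler` performs.

```python
import numpy as np

# (Age, Income) rows taken from the KMeans practice cell in this section
X = np.array([[25, 40000], [45, 80000], [23, 35000], [35, 65000]], dtype=float)


def standardize(X):
    """Z-score each column: subtract the column mean, divide by the column std."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


# Per-feature gap between customer 0 and customer 1, before scaling
raw_gap = np.abs(X[0] - X[1])   # Age differs by 20, Income by 40000
raw_dist = np.linalg.norm(X[0] - X[1])

# The same gap after scaling: both features are now on a comparable scale
Z = standardize(X)
scaled_gap = np.abs(Z[0] - Z[1])

print(raw_gap, raw_dist)
print(scaled_gap)
```

Before scaling, the distance (about 40000) is almost exactly the Income gap, so the 20-year Age gap barely registers. After scaling, the two per-feature gaps are of similar size (roughly 2.3 for Age and 2.2 for Income), so both features shape the clusters.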

Q4. Explain the Elbow Method and inertia.
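
Inertia is the sum of squared distances between each point and the centroid of its cluster (the within-cluster sum of squares); lower values mean tighter clusters. Because inertia keeps falling as k grows (it reaches 0 when every point is its own cluster), the lowest value is not the target. The Elbow Method instead plots inertia against k and picks the "elbow", the point after which adding more clusters no longer reduces inertia much.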
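A minimal NumPy sketch of inertia and the elbow shape on a made-up one-dimensional dataset. The clusterings for each k are hand-picked optimal ones (in practice K-Means would compute them), and the `inertia` helper name is illustrative; scikit-learn exposes the same quantity as the fitted `KMeans.inertia_` attribute.

```python
import numpy as np


def inertia(X, labels, centroids):
    """Within-cluster sum of squared distances to each point's own centroid."""
    return float(((X - centroids[labels]) ** 2).sum())


X = np.array([[1.0], [2.0], [9.0], [10.0]])

# Hand-picked best clusterings for each k (in practice K-Means finds these)
clusterings = {
    1: (np.array([0, 0, 0, 0]), np.array([[5.5]])),
    2: (np.array([0, 0, 1, 1]), np.array([[1.5], [9.5]])),
    3: (np.array([0, 0, 1, 2]), np.array([[1.5], [9.0], [10.0]])),
    4: (np.array([0, 1, 2, 3]), np.array([[1.0], [2.0], [9.0], [10.0]])),
}

for k, (labels, centroids) in clusterings.items():
    print(k, inertia(X, labels, centroids))
# 1 65.0
# 2 1.0
# 3 0.5
# 4 0.0
```

Inertia drops sharply from k = 1 to k = 2 and only slightly after that, so the elbow is at k = 2, matching the two obvious groups in the data.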

Python Practice: Fit a KMeans model with k=3 and extract cluster labels.

In [6]:
from sklearn.cluster import KMeans
import numpy as np

data_points = np.array([
    [25, 40000],
    [45, 80000],
    [23, 35000],
    [35, 65000]
])

# Features are unscaled here, so Income dominates the Euclidean distance
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(data_points)

cluster_labels = kmeans.labels_
cluster_labels


array([1, 0, 1, 2], dtype=int32)

Section 5: Applied Scenario

You are given customer data (Age, Income, Spending Score). Choose between Decision Tree, K-Means, or MDP
for segmentation and justify your choice. If recommendations are sequential, which concept applies?


Answer:

K-Means clustering is the most suitable algorithm for customer segmentation because the goal is to group customers based on similarities in their attributes such as Age, Income, and Spending Score, without having any predefined class labels. This makes the problem an unsupervised learning task. K-Means works by measuring distances between data points and forming clusters such that customers within the same cluster have similar characteristics and behavior patterns. These clusters can then be used by businesses to design targeted marketing strategies, personalized offers, and customer-specific services.

On the other hand, Decision Trees are mainly used for supervised learning tasks where labeled outcomes are available, such as predicting whether a customer will buy a product or not. Since customer segmentation does not involve labeled target variables, Decision Trees are less appropriate for this task.

If customer recommendations or decisions depend on a sequence of actions over time, such as recommending products based on previous purchases, browsing history, or long-term engagement, then a Markov Decision Process (MDP) is the appropriate concept. MDPs model sequential decision-making problems, where the current decision affects future states and rewards. In such cases, the goal is to find an optimal policy that maximizes long-term reward rather than making isolated decisions.
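To illustrate the sequential case, here is a minimal value-iteration sketch on a hypothetical two-state customer MDP. Value iteration is one standard way to compute an optimal policy when transition probabilities and rewards are known; the original answer does not name an algorithm, and every state, action, probability, and reward below is invented for illustration.

```python
# Hypothetical two-state customer MDP: every number below is illustrative.
GAMMA = 0.9
STATES = ["browsing", "engaged"]
ACTIONS = ["recommend", "discount"]

# P[(s, a)] -> {s': probability}
P = {
    ("browsing", "recommend"): {"engaged": 0.5, "browsing": 0.5},
    ("browsing", "discount"): {"engaged": 0.9, "browsing": 0.1},
    ("engaged", "recommend"): {"engaged": 0.8, "browsing": 0.2},
    ("engaged", "discount"): {"engaged": 1.0},
}
# R[(s, a)] -> immediate reward
R = {
    ("browsing", "recommend"): 0.0,
    ("browsing", "discount"): -2.0,   # the discount costs money up front
    ("engaged", "recommend"): 5.0,    # an engaged customer often buys
    ("engaged", "discount"): 3.0,
}


def q_value(V, s, a):
    """Immediate reward plus the discounted expected value of the next state."""
    return R[(s, a)] + GAMMA * sum(p * V[s2] for s2, p in P[(s, a)].items())


def value_iteration(tol=1e-10):
    """Apply the Bellman optimality update until the values stop changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in ACTIONS) for s in STATES}
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            break
        V = new_V
    # Greedy policy with respect to the converged values
    policy = {s: max(ACTIONS, key=lambda a: q_value(new_V, s, a)) for s in STATES}
    return new_V, policy


V, policy = value_iteration()
print(policy)  # {'browsing': 'discount', 'engaged': 'recommend'}
print(V)
```

With these made-up numbers, V(engaged) is about 38.44 and V(browsing) about 32.02. In the browsing state the optimal action has the lower immediate reward (-2 versus 0) because it usually leads to the more valuable engaged state; that trade-off between immediate and future reward is what one-off segmentation cannot capture.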

In summary, K-Means is best for static customer segmentation based on similarity, while MDPs are appropriate when customer interactions are dynamic and decisions must consider future outcomes.