## 1. Problem Definition & Objective

### a. Selected Project Track

AI in Personalized Learning

### b. Problem Statement

Traditional education systems often follow a one-size-fits-all approach, where the same learning content is delivered to all students regardless of their individual understanding and progress. This can lead to learning gaps, reduced engagement, and inefficient study efforts, especially when students struggle with specific topics.

With the availability of student performance data, there is a growing need for intelligent systems that can analyze learning progress and provide personalized recommendations. This project addresses this need by developing an AI-based system that adapts learning paths based on a student’s performance across different topics.

### c. Objective of the Project

The objectives of this project are:

- To analyze student performance data across multiple topics.
- To classify students into learning levels (Beginner, Intermediate, Advanced) using AI techniques.
- To recommend personalized learning paths based on identified strengths and weaknesses.
- To demonstrate the role of AI in enabling adaptive and data-driven learning systems.

## 2. Data Understanding & Preparation

This section describes the dataset used for the personalized learning recommendation system, including the data source, structure, preprocessing steps, and feature preparation required for model training.


### 2.1 Dataset Source

The dataset used in this project is a **synthetic student performance dataset** created specifically for demonstrating an AI-based personalized learning system.

The dataset simulates quiz scores of students across different learning topics. Synthetic data suits this project because it avoids privacy concerns, allows controlled experimentation, and makes it easy to illustrate how a personalized learning system processes student scores.


In [25]:
import pandas as pd
import numpy as np

# Create a synthetic dataset
data = {
    "student_id": [1,1,1,2,2,2,3,3,3,4,4,4],
    "topic": ["Basics","Loops","Arrays",
              "Basics","Loops","Arrays",
              "Basics","Loops","Arrays",
              "Basics","Loops","Arrays"],
    "score": [85, 60, 40,
              78, 55, 65,
              92, 88, 80,
              50, 45, 30]
}

df = pd.DataFrame(data)
df


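| | student_id | topic | score |
|---|---|---|---|
| 0 | 1 | Basics | 85 |
| 1 | 1 | Loops | 60 |
| 2 | 1 | Arrays | 40 |
| 3 | 2 | Basics | 78 |
| 4 | 2 | Loops | 55 |
| 5 | 2 | Arrays | 65 |
| 6 | 3 | Basics | 92 |
| 7 | 3 | Loops | 88 |
| 8 | 3 | Arrays | 80 |
| 9 | 4 | Basics | 50 |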


### 2.3 Data Exploration

To understand the dataset, we examine its structure, data types, and basic statistical properties.


In [26]:
# Display basic information
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   student_id  12 non-null     int64 
 1   topic       12 non-null     object
 2   score       12 non-null     int64 
dtypes: int64(2), object(1)
memory usage: 420.0+ bytes


In [27]:
# Statistical summary of scores
df.describe()


| | student_id | score |
|---|---|---|
| count | 12.0 | 12.0 |
| mean | 2.5 | 64.0 |
| std | 1.167748 | 20.538213 |
| min | 1.0 | 30.0 |
| 25% | 1.75 | 48.75 |
| 50% | 2.5 | 62.5 |
| 75% | 3.25 | 81.25 |
| max | 4.0 | 92.0 |
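
Because each row is one student-topic pair, the summary above pools all topics together. The following is a minimal sketch of a per-topic view, assuming the same twelve rows as the dataset defined earlier; the names `wide` and `topic_means` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same synthetic rows as the dataset defined in Section 2.1
df = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "topic": ["Basics", "Loops", "Arrays"] * 4,
    "score": [85, 60, 40, 78, 55, 65, 92, 88, 80, 50, 45, 30],
})

# One row per student, one column per topic
wide = df.pivot(index="student_id", columns="topic", values="score")

# Mean score per topic across the four students
topic_means = df.groupby("topic")["score"].mean()

print(wide)
print(topic_means)
```

A wide layout like this makes it easier to see at a glance which topic is weakest for each student.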


### 2.4 Data Cleaning & Preprocessing

The dataset is small and well-structured, with no missing or duplicate values. However, basic preprocessing is still performed to ensure data quality and consistency.

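The learning topics are categorical and are encoded as integers so that they can serve as a model feature. Note that the classifier trained in Section 4 uses only the score; the encoded topic is prepared but not used as an input.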


In [28]:
# Check for missing values
df.isnull().sum()


| Column | Missing values |
|---|---|
| student_id | 0 |
| topic | 0 |
| score | 0 |
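
The missing-value check above does not cover duplicates or out-of-range scores. The sketch below shows one way to check both, assuming scores lie on a 0 to 100 scale (as in the interactive prompt in Section 4.6); the counter names are illustrative.

```python
import pandas as pd

# Same synthetic rows as the dataset defined in Section 2.1
df = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "topic": ["Basics", "Loops", "Arrays"] * 4,
    "score": [85, 60, 40, 78, 55, 65, 92, 88, 80, 50, 45, 30],
})

# Fully duplicated rows, and repeated (student, topic) pairs
n_duplicate_rows = int(df.duplicated().sum())
n_duplicate_pairs = int(df.duplicated(subset=["student_id", "topic"]).sum())

# Scores outside the assumed 0-100 range
n_out_of_range = int((~df["score"].between(0, 100)).sum())

print(n_duplicate_rows, n_duplicate_pairs, n_out_of_range)
```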


In [29]:
# Encode topic as numerical values
df["topic_encoded"] = df["topic"].astype("category").cat.codes
df


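| | student_id | topic | score | topic_encoded |
|---|---|---|---|---|
| 0 | 1 | Basics | 85 | 1 |
| 1 | 1 | Loops | 60 | 2 |
| 2 | 1 | Arrays | 40 | 0 |
| 3 | 2 | Basics | 78 | 1 |
| 4 | 2 | Loops | 55 | 2 |
| 5 | 2 | Arrays | 65 | 0 |
| 6 | 3 | Basics | 92 | 1 |
| 7 | 3 | Loops | 88 | 2 |
| 8 | 3 | Arrays | 80 | 0 |
| 9 | 4 | Basics | 50 | 1 |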
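
The integer codes follow the alphabetical order of the topic names, which is why Arrays is 0 and Basics is 1. The sketch below recovers that mapping and, as an alternative, fixes an explicit order. The teaching order Basics, Loops, Arrays is an assumption taken from the order in which the topics are listed in the dataset, not something the original notebook states.

```python
import pandas as pd

topics = pd.Series(["Basics", "Loops", "Arrays"] * 4, name="topic")

# Default behaviour: categories are sorted, so codes are alphabetical
topic_cat = topics.astype("category")
code_to_topic = dict(enumerate(topic_cat.cat.categories))
print(code_to_topic)

# Alternative: an ordered categorical with an assumed teaching order
teaching_order = ["Basics", "Loops", "Arrays"]  # assumption, see lead-in
ordered = pd.Categorical(topics, categories=teaching_order, ordered=True)
print(list(ordered.codes[:3]))
```

Keeping the mapping next to the encoded column avoids reading the codes as a difficulty ranking they do not represent.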


### 2.5 Feature Engineering

For personalized learning recommendations, the primary feature is the student's performance score for each topic.

Additionally, a learning level label is created based on score thresholds:
- Beginner: Score < 50
- Intermediate: 50 ≤ Score ≤ 75
- Advanced: Score > 75

These labels are used as the target variable for the AI model.


In [30]:
# Function to assign learning level
def assign_level(score):
    if score < 50:
        return "Beginner"
    elif score <= 75:
        return "Intermediate"
    else:
        return "Advanced"

df["learning_level"] = df["score"].apply(assign_level)
df


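| | student_id | topic | score | topic_encoded | learning_level |
|---|---|---|---|---|---|
| 0 | 1 | Basics | 85 | 1 | Advanced |
| 1 | 1 | Loops | 60 | 2 | Intermediate |
| 2 | 1 | Arrays | 40 | 0 | Beginner |
| 3 | 2 | Basics | 78 | 1 | Advanced |
| 4 | 2 | Loops | 55 | 2 | Intermediate |
| 5 | 2 | Arrays | 65 | 0 | Intermediate |
| 6 | 3 | Basics | 92 | 1 | Advanced |
| 7 | 3 | Loops | 88 | 2 | Advanced |
| 8 | 3 | Arrays | 80 | 0 | Advanced |
| 9 | 4 | Basics | 50 | 1 | Intermediate |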
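
The same thresholds can also be applied to the whole column at once instead of row by row. This is a sketch using `numpy.select`, offered as an alternative and not as the notebook's method; the twelve scores are the ones from the dataset defined earlier.

```python
import numpy as np
import pandas as pd

scores = pd.Series([85, 60, 40, 78, 55, 65, 92, 88, 80, 50, 45, 30], name="score")
values = scores.to_numpy()

# Conditions are evaluated in order; the first match wins
conditions = [values < 50, values <= 75]
labels = ["Beginner", "Intermediate"]

levels = pd.Series(
    np.select(conditions, labels, default="Advanced"),
    index=scores.index,
    name="learning_level",
)
print(levels.value_counts())
```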


### 2.6 Final Prepared Dataset

After preprocessing and feature engineering, the dataset is ready for model training and recommendation generation. The final dataset contains encoded features and labeled learning levels that enable the AI system to adapt learning paths based on student performance.


## 3. Model / System Design

This section explains the design of the AI-based personalized learning path recommendation system, including the AI techniques used, the system pipeline, and the justification for design choices.


### 3.1 AI Technique Used

This project uses a **Machine Learning–based classification approach combined with a rule-based recommendation strategy** to personalize learning paths.

- **Machine Learning (ML)** is used to classify a student's learning level for each topic based on performance scores.
- **Rule-based logic** is used to recommend appropriate learning material corresponding to the predicted learning level.

A **Decision Tree Classifier** is selected as the primary ML model due to its simplicity, interpretability, and suitability for small educational datasets.


### 3.2 System Architecture / Pipeline

The system follows a structured AI pipeline consisting of the following steps:

1. **Input Data Collection**  
   Student performance data (quiz scores per topic) is provided as input.

2. **Data Preprocessing**  
   - Cleaning and validation of scores  
   - Encoding categorical features  
   - Generation of learning level labels  

3. **Model Training**  
   A Decision Tree Classifier is trained using student scores to predict learning levels:
   - Beginner  
   - Intermediate  
   - Advanced  

4. **Prediction & Inference**  
   The trained model predicts the learning level for each student-topic pair.

5. **Recommendation Generation**  
   Based on the predicted learning level, the system recommends:
   - Remedial content for beginners  
   - Practice material for intermediate learners  
   - Advanced resources for advanced learners  

6. **Output**  
   Personalized learning paths are generated for each student.


### 3.3 Scalability and Extensibility

Although the current system is implemented as a prototype, it is designed to be scalable. The model can be extended to:

- Support more topics and subjects
- Integrate real-time student data
- Be deployed as a web or mobile application
- Incorporate advanced models such as neural networks or reinforcement learning

This modular design allows easy integration with real-world e-learning platforms.


## 4. Core Implementation

This section implements the core AI logic of the personalized learning path recommendation system. It includes model training, prediction of learning levels, and generation of personalized learning recommendations based on student performance.


### 4.1 Model Training

A Decision Tree Classifier is trained to predict the learning level of students based on their performance scores. The model learns decision rules that map scores to learning levels.


In [31]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Encode target labels
label_encoder = LabelEncoder()
df["learning_level_encoded"] = label_encoder.fit_transform(df["learning_level"])

# Features and target
X = df[["score"]]
y = df["learning_level_encoded"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Initialize and train the model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
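
Since interpretability is one reason for choosing a decision tree, it can help to print the rules the tree actually learned. The sketch below uses scikit-learn's `export_text`. It assumes the training set consists of the eight rows that remain after removing the four test rows (indices 10, 9, 0, 8) listed in Section 4.4, and it trains on the string labels directly instead of the encoded ones; `tree` and `learned_thresholds` are illustrative names.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

def assign_level(score):
    if score < 50:
        return "Beginner"
    elif score <= 75:
        return "Intermediate"
    return "Advanced"

# Assumption: dataset minus the four test rows shown in Section 4.4
train = pd.DataFrame({"score": [60, 40, 78, 55, 65, 92, 88, 30]})
y_train = train["score"].apply(assign_level)

tree = DecisionTreeClassifier(random_state=42).fit(train[["score"]], y_train)

# Human-readable view of the learned decision rules
print(export_text(tree, feature_names=["score"]))

# Split points of the internal nodes (leaves have feature index -2)
learned_thresholds = sorted(
    float(t) for t, f in zip(tree.tree_.threshold, tree.tree_.feature) if f >= 0
)
print(learned_thresholds)
```

Comparing the printed split points with the labeling thresholds from Section 2.5 shows how closely the tree recovered the rule from the available training scores.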


### 4.2 Model Prediction

The trained model is used to predict the learning level for the held-out test scores, which it did not see during training.


In [32]:
# Make predictions
y_pred = model.predict(X_test)

# Decode predicted labels
predicted_levels = label_encoder.inverse_transform(y_pred)
predicted_levels


array(['Beginner', 'Intermediate', 'Advanced', 'Advanced'], dtype=object)

### 4.3 Recommendation Logic

Based on the predicted learning level, the system recommends appropriate learning material. This recommendation logic ensures that students receive content suited to their current understanding.


In [33]:
# Recommendation function
def recommend_content(level):
    if level == "Beginner":
        return "Recommended: Basic tutorials and remedial practice exercises"
    elif level == "Intermediate":
        return "Recommended: Practice problems and conceptual explanations"
    else:
        return "Recommended: Advanced tutorials and challenge problems"

# Apply recommendations
recommendations = [recommend_content(level) for level in predicted_levels]
recommendations


['Recommended: Basic tutorials and remedial practice exercises',
 'Recommended: Practice problems and conceptual explanations',
 'Recommended: Advanced tutorials and challenge problems',
 'Recommended: Advanced tutorials and challenge problems']
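
One design detail worth noting: the `else` branch of `recommend_content` returns the advanced recommendation for any label that is not Beginner or Intermediate, including a misspelled one. A stricter variant, sketched here with a lookup table, raises an error instead. The names `RECOMMENDATIONS` and `recommend_content_strict` are hypothetical and not part of the original notebook.

```python
import pandas as pd

RECOMMENDATIONS = {
    "Beginner": "Recommended: Basic tutorials and remedial practice exercises",
    "Intermediate": "Recommended: Practice problems and conceptual explanations",
    "Advanced": "Recommended: Advanced tutorials and challenge problems",
}

def recommend_content_strict(level):
    """Return the recommendation for a known level; raise on anything else."""
    try:
        return RECOMMENDATIONS[level]
    except KeyError:
        raise ValueError(f"Unknown learning level: {level!r}") from None

# Same predicted levels as in the output of Section 4.2
predicted = pd.Series(["Beginner", "Intermediate", "Advanced", "Advanced"])
print(predicted.map(recommend_content_strict).tolist())
```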

### 4.4 Sample Personalized Recommendations

The following table shows sample predictions and corresponding personalized learning recommendations generated by the system.


In [34]:
# Display sample results
results = X_test.copy()
results["Predicted Learning Level"] = predicted_levels
results["Recommendation"] = recommendations
results


| | score | Predicted Learning Level | Recommendation |
|---|---|---|---|
| 10 | 45 | Beginner | Recommended: Basic tutorials and remedial prac... |
| 9 | 50 | Intermediate | Recommended: Practice problems and conceptual ... |
| 0 | 85 | Advanced | Recommended: Advanced tutorials and challenge ... |
| 8 | 80 | Advanced | Recommended: Advanced tutorials and challenge ... |


### 4.5 End-to-End Pipeline Execution

The system successfully performs the complete pipeline:
- Accepts student performance scores
- Predicts learning levels using an AI model
- Generates personalized learning recommendations

This demonstrates a functional AI-based personalized learning system prototype.
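
The steps above produce one recommendation per score. A learning path additionally implies an order. The sketch below is one possible reading, not the notebook's implementation: it orders a student's topics weakest-first and labels each with the rule thresholds from Section 2.5 in place of the trained model. `build_learning_path` is a hypothetical helper, and the example scores are those of student 1.

```python
def level_for(score):
    """Rule thresholds from Section 2.5."""
    if score < 50:
        return "Beginner"
    elif score <= 75:
        return "Intermediate"
    return "Advanced"

def build_learning_path(topic_scores):
    """Return (topic, score, level) tuples ordered from weakest to strongest."""
    ordered = sorted(topic_scores.items(), key=lambda item: item[1])
    return [(topic, score, level_for(score)) for topic, score in ordered]

path = build_learning_path({"Basics": 85, "Loops": 60, "Arrays": 40})
for step, (topic, score, level) in enumerate(path, start=1):
    print(f"{step}. {topic} (score {score}, {level})")
```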


### 4.6 Student Performance Evaluation Module

This module simulates a real-world scenario where a student provides their performance data. The system evaluates the input using the trained AI model and generates personalized learning recommendations.


In [41]:
def evaluate_student(student_id, topic_scores):
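    """
    Print a per-topic evaluation report and an overall summary for one student.

    student_id: int or str identifier (used only for display)
    topic_scores: dictionary {topic: score}, with scores on a 0-100 scale
    """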
    print(f"\nEvaluation Report for Student ID: {student_id}")
    print("=" * 60)

    weak_topics = []
    moderate_topics = []
    strong_topics = []

    for topic, score in topic_scores.items():
        # Create DataFrame with correct feature name
        input_df = pd.DataFrame({"score": [score]})

        # Predict learning level
        predicted_level_encoded = model.predict(input_df)[0]
        predicted_level = label_encoder.inverse_transform([predicted_level_encoded])[0]

        # Categorize topics
        if predicted_level == "Beginner":
            weak_topics.append(topic)
        elif predicted_level == "Intermediate":
            moderate_topics.append(topic)
        else:
            strong_topics.append(topic)

        # Recommendation
        recommendation = recommend_content(predicted_level)

        print(f"\nTopic: {topic}")
        print(f"Score: {score}")
        print(f"Learning Level: {predicted_level}")
        print(f"{recommendation}")

    # Overall Summary
    print("\n" + "=" * 60)
    print("OVERALL PERFORMANCE SUMMARY")

    print(f"\nStrong Topics: {strong_topics if strong_topics else 'None'}")
    print(f"Moderate Topics: {moderate_topics if moderate_topics else 'None'}")
    print(f"Weak Topics: {weak_topics if weak_topics else 'None'}")

    if weak_topics:
        print("\nOverall Recommendation:")
        print("Focus on strengthening weak topics before moving to advanced material.")
    else:
        print("\nOverall Recommendation:")
        print("Good progress overall. Continue practicing and explore advanced topics.")


In [42]:
# Simulated student input
student_input = {
    "Basics": 72,
    "Loops": 45,
    "Arrays": 88
}

# Run evaluation
evaluate_student(student_id=101, topic_scores=student_input)



Evaluation Report for Student ID: 101

Topic: Basics
Score: 72
Learning Level: Advanced
Recommended: Advanced tutorials and challenge problems

Topic: Loops
Score: 45
Learning Level: Beginner
Recommended: Basic tutorials and remedial practice exercises

Topic: Arrays
Score: 88
Learning Level: Advanced
Recommended: Advanced tutorials and challenge problems

OVERALL PERFORMANCE SUMMARY

Strong Topics: ['Basics', 'Arrays']
Moderate Topics: None
Weak Topics: ['Loops']

Overall Recommendation:
Focus on strengthening weak topics before moving to advanced material.


In [43]:
def interactive_student_evaluation():
    print("AI-Based Personalized Learning Evaluation")
    print("=" * 60)

    # Student ID
    student_id = input("Enter Student ID: ")

    # Topics to evaluate
    topics = ["Basics", "Loops", "Arrays"]
    topic_scores = {}

    for topic in topics:
        while True:
            try:
                score = float(input(f"Enter score for {topic} (0–100): "))
                if 0 <= score <= 100:
                    topic_scores[topic] = score
                    break
                else:
                    print("Please enter a valid score between 0 and 100.")
            except ValueError:
                print("Invalid input. Please enter a numeric value.")

    # Evaluate student
    evaluate_student(student_id, topic_scores)


In [44]:
interactive_student_evaluation()


AI-Based Personalized Learning Evaluation
Enter Student ID: 1
Enter score for Basics (0–100): 100
Enter score for Loops (0–100): 75
Enter score for Arrays (0–100): 85

Evaluation Report for Student ID: 1

Topic: Basics
Score: 100.0
Learning Level: Advanced
Recommended: Advanced tutorials and challenge problems

Topic: Loops
Score: 75.0
Learning Level: Advanced
Recommended: Advanced tutorials and challenge problems

Topic: Arrays
Score: 85.0
Learning Level: Advanced
Recommended: Advanced tutorials and challenge problems

OVERALL PERFORMANCE SUMMARY

Strong Topics: ['Basics', 'Loops', 'Arrays']
Moderate Topics: None
Weak Topics: None

Overall Recommendation:
Good progress overall. Continue practicing and explore advanced topics.


## 5. Evaluation & Analysis

This section evaluates the performance of the AI model used in the personalized learning path recommendation system. Both quantitative metrics and qualitative analysis are used to assess the effectiveness of the system.


In [37]:
from sklearn.metrics import accuracy_score, classification_report

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
accuracy


1.0

In [38]:
# Detailed classification report
print(classification_report(y_test, y_pred, target_names=label_encoder.classes_))


              precision    recall  f1-score   support

    Advanced       1.00      1.00      1.00         2
    Beginner       1.00      1.00      1.00         1
Intermediate       1.00      1.00      1.00         1

    accuracy                           1.00         4
   macro avg       1.00      1.00      1.00         4
weighted avg       1.00      1.00      1.00         4
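
A single split with four test samples gives a coarse estimate. As a complementary check, the sketch below runs stratified 3-fold cross-validation over all twelve rows. The fold count of 3 is an assumption chosen because the smallest class (Beginner) has three samples, and the string labels are used directly. No expected scores are quoted here because they depend on the fold assignment.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

def assign_level(score):
    if score < 50:
        return "Beginner"
    elif score <= 75:
        return "Intermediate"
    return "Advanced"

# Same twelve scores as the dataset defined in Section 2.1
X = pd.DataFrame({"score": [85, 60, 40, 78, 55, 65, 92, 88, 80, 50, 45, 30]})
y = X["score"].apply(assign_level)

# Stratification keeps every class present in each training fold
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
fold_accuracy = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=cv)

print(fold_accuracy)
print(fold_accuracy.mean())
```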



### 5.3 Sample Output Analysis

On the four test samples, the model predicts the same learning levels that the labeling rule assigns. The sample predictions show that:

- Students with low scores are classified as **Beginner** and receive remedial recommendations.
- Students with moderate scores are classified as **Intermediate** and receive practice-oriented content.
- Students with high scores are classified as **Advanced** and receive challenging learning resources.

This indicates that the system adapts its recommendations to individual performance, at least on this small test set.


### 5.4 Qualitative Evaluation

In addition to numerical accuracy, the system is evaluated qualitatively by examining the relevance of the generated recommendations.

The recommendations align with pedagogical principles by ensuring that:
- Learners are not overwhelmed with advanced content prematurely.
- Weak areas are reinforced through targeted learning materials.
- Strong areas are encouraged through advanced challenges.


### 5.5 Limitations of the System

Despite its effectiveness, the system has several limitations:

- The dataset is synthetic and limited in size, which may not capture the full diversity of real-world student behavior.
- The model uses only quiz scores and does not consider other factors such as learning speed, engagement level, or preferred learning style.
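- The Decision Tree model may overfit on small datasets. It places each split midway between adjacent training scores, so its learned boundaries need not match the labeling thresholds from Section 2.5. In Section 4.6, for example, scores of 72 and 75 are predicted as Advanced although the labeling rule assigns them to Intermediate.
- The labels are derived directly from the score, which is also the model's only input, and the test set contains only four samples. The reported accuracy of 1.0 therefore shows that the tree reproduces the labeling rule on those samples, not how well it would generalize.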

These limitations highlight the need for larger datasets and more advanced modeling techniques in future implementations.
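
One way to make the effect of a small training set concrete is to sweep every integer score and compare the model's prediction with the labeling rule. The sketch below does this under the assumption that the training set is the dataset minus the four test rows shown in Section 4.4; `grid` and `disagreements` are illustrative names.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def assign_level(score):
    if score < 50:
        return "Beginner"
    elif score <= 75:
        return "Intermediate"
    return "Advanced"

# Assumption: dataset minus the four test rows shown in Section 4.4
train = pd.DataFrame({"score": [60, 40, 78, 55, 65, 92, 88, 30]})
tree = DecisionTreeClassifier(random_state=42).fit(
    train[["score"]], train["score"].apply(assign_level)
)

# Compare rule and model on every integer score from 0 to 100
grid = pd.DataFrame({"score": range(0, 101)})
grid["rule"] = grid["score"].apply(assign_level)
grid["model"] = tree.predict(grid[["score"]])

disagreements = grid.loc[grid["rule"] != grid["model"], "score"].tolist()
print(disagreements)
```

Any score in the printed list is one where a student would receive a different recommendation from the model than from the rule used to create the labels.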


## 6. Ethical Considerations & Responsible AI

This section discusses the ethical aspects and responsible use of artificial intelligence in the context of personalized learning systems.


### 6.1 Bias and Fairness

AI-based learning systems can unintentionally introduce bias if the training data does not represent all types of learners fairly.

In this project, bias may arise due to:
- Use of a small synthetic dataset
- Limited representation of diverse learning behaviors

To make potential bias easier to detect, the system uses transparent decision rules and an interpretable model. Future versions can incorporate larger and more diverse datasets to improve fairness.


### 6.2 Dataset Limitations and Transparency

The dataset used in this project is synthetic and simplified. While this supports experimentation and privacy preservation, it may not fully reflect real-world educational scenarios.

Clear documentation of data sources and assumptions ensures transparency and prevents misuse or overinterpretation of results.


### 6.3 Responsible Use of AI

This system is intended to support learners and educators rather than replace human decision-making.

The recommendations generated by the system should be used as guidance, with final learning decisions made by students or instructors. Ethical deployment requires maintaining human oversight and avoiding over-reliance on automated recommendations.


### Future Scope

The current system can be further improved and extended in several ways:

- Integration of real-world student data from learning management systems
- Inclusion of additional features such as learning pace, engagement level, and learning style
- Deployment as a web or mobile application for real-time usage
- Use of advanced AI techniques such as deep learning or reinforcement learning for dynamic personalization
- Incorporation of feedback loops to continuously improve recommendations

These enhancements would make the system more robust, scalable, and applicable to real-world personalized learning platforms.
