
In [None]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
mentor_data = [
    {"MentorID": "M1", "Subjects": "Mathematics, English", "College": "Delhi University", "Preparation_Level": "Advanced", "Learning_Style": "Visual"},
    {"MentorID": "M2", "Subjects": "History, Political Science", "College": "IIT Delhi", "Preparation_Level": "Intermediate", "Learning_Style": "Auditory"},
    {"MentorID": "M3", "Subjects": "Mathematics, Logic", "College": "DU", "Preparation_Level": "Advanced", "Learning_Style": "Kinesthetic"},
    {"MentorID": "M4", "Subjects": "English, Reasoning", "College": "St. Stephen's", "Preparation_Level": "Beginner", "Learning_Style": "Visual"},
    {"MentorID": "M5", "Subjects": "English, Mathematics", "College": "Delhi University", "Preparation_Level": "Advanced", "Learning_Style": "Auditory"}
]

mentors_df = pd.DataFrame(mentor_data)

In [None]:
mentors_df

,MentorID,Subjects,College,Preparation_Level,Learning_Style
0,M1,"Mathematics, English",Delhi University,Advanced,Visual
1,M2,"History, Political Science",IIT Delhi,Intermediate,Auditory
2,M3,"Mathematics, Logic",DU,Advanced,Kinesthetic
3,M4,"English, Reasoning",St. Stephen's,Beginner,Visual
4,M5,"English, Mathematics",Delhi University,Advanced,Auditory


In [None]:
aspirant_profile = {
    "Preferred_Subjects": "Mathematics, English",
    "Target_College": "Delhi University",
    "Preparation_Level": "Advanced",
    "Learning_Style": "Visual"
}

In [None]:
def combine_features(row):
    """Join a mentor's profile fields into one space-separated text string."""
    return f"{row['Subjects']} {row['College']} {row['Preparation_Level']} {row['Learning_Style']}"

In [None]:
mentors_df["combined_features"] = mentors_df.apply(combine_features, axis=1)


In [None]:
mentors_df["combined_features"]

,combined_features
0,"Mathematics, English Delhi University Advanced..."
1,"History, Political Science IIT Delhi Intermedi..."
2,"Mathematics, Logic DU Advanced Kinesthetic"
3,"English, Reasoning St. Stephen's Beginner Visual"
4,"English, Mathematics Delhi University Advanced..."


In [None]:
aspirant_features = (f"{aspirant_profile['Preferred_Subjects']} "
                     f"{aspirant_profile['Target_College']} "
                     f"{aspirant_profile['Preparation_Level']} "
                     f"{aspirant_profile['Learning_Style']}")

In [None]:
aspirant_features

'Mathematics, English Delhi University Advanced Visual'

In [None]:
# Mentor documents first, aspirant last, so the aspirant is always the final entry
corpus = mentors_df["combined_features"].tolist() + [aspirant_features]

In [None]:
corpus

['Mathematics, English Delhi University Advanced Visual',
 'History, Political Science IIT Delhi Intermediate Auditory',
 'Mathematics, Logic DU Advanced Kinesthetic',
 "English, Reasoning St. Stephen's Beginner Visual",
 'English, Mathematics Delhi University Advanced Auditory',
 'Mathematics, English Delhi University Advanced Visual']

In [None]:
# Lowercase, drop commas, and split on whitespace (other punctuation, e.g. "st.", is kept)
tokenized_corpus = [doc.lower().replace(",", "").split() for doc in corpus]


In [None]:
all_tokens = set(token for doc in tokenized_corpus for token in doc)

In [None]:
all_tokens = sorted(all_tokens)


In [None]:
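def vectorize(tokens, vocabulary):
    """Return a binary bag-of-words vector: 1 where a vocabulary token occurs in `tokens`."""
    # Map each token to its position once, instead of scanning the list per token
    index_of = {token: i for i, token in enumerate(vocabulary)}
    vector = np.zeros(len(vocabulary))
    for token in tokens:
        if token in index_of:
            vector[index_of[token]] = 1
    return vector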


In [None]:
vocabulary = all_tokens

In [None]:
vectorized_docs = np.array([vectorize(doc, vocabulary) for doc in tokenized_corpus])


In [None]:
vectorized_docs

array([[1., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
        0., 1., 1.],
       [0., 1., 0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 0., 1., 0.,
        0., 0., 0.],
       [1., 0., 0., 0., 1., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0.,
        0., 0., 0.],
       [0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1.,
        1., 0., 1.],
       [1., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
        0., 1., 0.],
       [1., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
        0., 1., 1.]])

In [None]:
mentor_vectors = vectorized_docs[:-1]
aspirant_vector = vectorized_docs[-1].reshape(1, -1)

In [None]:
similarities = cosine_similarity(mentor_vectors, aspirant_vector).flatten()

# Attach similarity scores to the mentor dataframe
mentors_df["similarity_score"] = similarities

In [None]:
top_mentors = mentors_df.sort_values("similarity_score", ascending=False).head(3)

print("Top Mentor Recommendations:")
print(top_mentors[["MentorID", "combined_features", "similarity_score"]])

Top Mentor Recommendations:
  MentorID                                  combined_features  \
0       M1  Mathematics, English Delhi University Advanced...   
4       M5  English, Mathematics Delhi University Advanced...   
2       M3         Mathematics, Logic DU Advanced Kinesthetic   

   similarity_score  
0          1.000000  
4          0.833333  
2          0.365148  


**Feedback Loop:**  
   Collect explicit feedback (e.g., ratings, comments) or implicit signals (e.g., mentor profile clicks, session attendance)
   from aspirants. This feedback can be used to adjust the similarity scoring function. For example, if certain subjects or features
   are consistently associated with higher engagement, their weights can be increased in the feature representation.
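
One way to picture the weighting idea above is to scale each vocabulary dimension before computing cosine similarity. The sketch below uses a toy five-token vocabulary in the notebook's binary bag-of-words style; the `feedback_weights` values are hypothetical and stand in for whatever engagement statistics are actually collected.

```python
import math

# Toy subset of the notebook's vocabulary, with binary vectors in the same style.
vocabulary = ["advanced", "auditory", "english", "mathematics", "visual"]
aspirant = [1, 0, 1, 1, 1]
mentors = {
    "M1": [1, 0, 1, 1, 1],
    "M5": [1, 1, 1, 1, 0],
}

def weighted_cosine(a, b, weights):
    """Cosine similarity after scaling each dimension by its weight."""
    wa = [w * x for w, x in zip(weights, a)]
    wb = [w * x for w, x in zip(weights, b)]
    dot = sum(x * y for x, y in zip(wa, wb))
    norm = math.sqrt(sum(x * x for x in wa)) * math.sqrt(sum(y * y for y in wb))
    return dot / norm if norm else 0.0

# Hypothetical feedback result: "mathematics" matches drive engagement (1.0 = neutral).
feedback_weights = {"mathematics": 2.0}
weights = [feedback_weights.get(token, 1.0) for token in vocabulary]
uniform = [1.0] * len(vocabulary)

for mentor_id, vector in mentors.items():
    before = weighted_cosine(vector, aspirant, uniform)
    after = weighted_cosine(vector, aspirant, weights)
    print(mentor_id, round(before, 3), round(after, 3))
```

With uniform weights this reduces to the plain cosine similarity used earlier, so the weights can start at 1.0 and drift only as feedback accumulates.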


**Machine Learning Models:**  
   Beyond simple similarity measures, one could train a model (e.g., a regression model or a ranking model) that predicts
   the probability of a successful mentorship pairing. As more feedback is collected, the model can be retrained to improve predictions.
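
As a rough illustration of the model-based approach, the sketch below fits a small logistic regression by gradient descent on per-pairing match features. Everything here is an assumption made for illustration: the four features, the synthetic training rows, and the labels are invented, and in practice a library model (for example from scikit-learn) trained on real feedback would replace the hand-written loop.

```python
import math

# Synthetic, purely illustrative data: one row per past aspirant-mentor pairing.
# Features: [subject overlap (0..1), same college, same preparation level, same learning style]
X = [
    [1.0, 1, 1, 1],
    [0.5, 1, 1, 0],
    [0.5, 0, 1, 0],
    [0.0, 0, 0, 0],
    [0.5, 0, 0, 1],
    [0.0, 1, 0, 0],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = pairing rated successful (hypothetical labels)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Predicted probability that a pairing with these features succeeds."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, label in zip(X, y):
            error = predict_proba(features, weights, bias) - label
            grad_b += error
            for j, x in enumerate(features):
                grad_w[j] += error * x
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

weights, bias = fit_logistic(X, y)
strong = predict_proba([1.0, 1, 1, 0], weights, bias)  # close match on most features
weak = predict_proba([0.0, 0, 0, 1], weights, bias)    # only learning style matches
print(f"strong match: {strong:.2f}, weak match: {weak:.2f}")
```

Retraining then amounts to calling `fit_logistic` again on the enlarged feedback set and ranking mentors by predicted probability instead of raw similarity.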


**AI-Augmented Retrieval:**  
   By integrating retrieval-augmented generation (RAG), for example with LlamaIndex,
   the system can automatically incorporate up-to-date mentor profiles, external reviews, and other real-time
   data to refine recommendations dynamically.
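
The retrieval half of that idea can be sketched without any framework. The example below is a library-free stand-in, not LlamaIndex code: it ranks hypothetical mentor snippets by Jaccard token overlap with a query and assembles the top hits into a prompt that a language model could consume. The `documents` texts and the `retrieve` and `build_prompt` helpers are assumptions made for illustration.

```python
# Hypothetical free-text snippets attached to mentor profiles (not from the notebook's data).
documents = {
    "M1": "Mentors Mathematics and English; alumni of Delhi University; uses visual aids.",
    "M2": "Covers History and Political Science; prefers audio discussions.",
    "M3": "Focuses on Mathematics and Logic puzzles with hands-on practice.",
}

def tokenize(text):
    """Lowercase and strip basic punctuation, similar to the notebook's simple tokenizer."""
    for ch in ",;.?":
        text = text.replace(ch, " ")
    return set(text.lower().split())

def retrieve(query, documents, k=2):
    """Rank documents by Jaccard token overlap with the query; return the top-k ids."""
    query_tokens = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        doc_tokens = tokenize(text)
        union = query_tokens | doc_tokens
        score = len(query_tokens & doc_tokens) / len(union) if union else 0.0
        scored.append((score, doc_id))
    # Highest score first; ties broken by id for deterministic output
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]

def build_prompt(query, documents, k=2):
    """Assemble the retrieved snippets into context for a downstream language model."""
    hits = retrieve(query, documents, k)
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"

query = "Which mentor suits Mathematics and English with visual learning?"
print(build_prompt(query, documents))
```

A production setup would typically swap the token-overlap scorer for embedding-based search and refresh `documents` from live sources, which is the part a framework such as LlamaIndex is meant to handle.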
