## Step 1: Import Required Libraries

We begin by importing the necessary Python libraries:

- `pandas` for handling and organizing data in DataFrames.
- `random` to help randomly generate mock data for aspirants and mentors.


In [None]:
import pandas as pd
import random

## Step 2: Generate Mock Data for Aspirants and Mentors

We create sample profiles for both aspirants and mentors:

- Define lists of possible subjects, colleges, learning styles, and teaching styles.
- Generate:
  - 300 **aspirant profiles** with random combinations of preferred subjects, target colleges, and learning styles.
  - 300 **mentor profiles** with random combinations of expertise, alma maters, and teaching styles.

This mock data stands in for real user profiles so that we can build and test the recommendation system.


In [None]:
subjects = ["Constitution", "Contracts", "Torts", "Property", "Criminal", "Evidence", "Administrative Law", "International Law"]
colleges = ["NLSIU", "NLU Delhi", "NUJS", "NLU Jodhpur", "NLU Bangalore"]
learning_styles = ["Self-paced", "Structured", "Interactive"]
teaching_styles = ["Structured", "Interactive", "Self-paced"]

aspirants_data = {
    "Aspirant": [f"Aspirant {i+1}" for i in range(300)],
    "Preferred Subjects": [", ".join(random.sample(subjects, 2)) for _ in range(300)],
    "Target College": [random.choice(colleges) for _ in range(300)],
    "Learning Style": [random.choice(learning_styles) for _ in range(300)]
}

mentors_data = {
    "Mentor": [f"Mentor {i+1}" for i in range(300)],
    "Expertise": [", ".join(random.sample(subjects, 2)) for _ in range(300)],
    "College": [random.choice(colleges) for _ in range(300)],
    "Teaching Style": [random.choice(teaching_styles) for _ in range(300)]
}
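
Because the profiles are drawn with `random`, every run produces different data, and the previews in later steps will vary from run to run. A minimal sketch of one way to make generation reproducible, assuming a hypothetical helper `make_aspirants` and an arbitrary seed of 42 (neither is part of the original notebook):

```python
import random

# Shortened subject list, for illustration only
subjects = ["Constitution", "Contracts", "Torts", "Property"]

def make_aspirants(n, seed):
    """Hypothetical helper: build n aspirant rows from a seeded generator."""
    rng = random.Random(seed)  # local generator; the global random state is untouched
    return [
        {
            "Aspirant": f"Aspirant {i + 1}",
            "Preferred Subjects": ", ".join(rng.sample(subjects, 2)),
        }
        for i in range(n)
    ]

first_run = make_aspirants(5, seed=42)
second_run = make_aspirants(5, seed=42)
print(first_run == second_run)  # the same seed yields the same profiles
```

Calling `random.seed(42)` once before the dictionary definitions above would have a similar effect on the global generator.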

## Step 3: Convert to DataFrames and Save as CSV

We now convert the aspirants and mentors data into `pandas` DataFrames.  
Then, we save these DataFrames as CSV files (`aspirants_data.csv` and `mentors_data.csv`) for future use.

This allows us to easily inspect, load, or share the data later.


In [None]:
aspirants_df = pd.DataFrame(aspirants_data)
mentors_df = pd.DataFrame(mentors_data)

aspirants_df.to_csv("aspirants_data.csv", index=False)
mentors_df.to_csv("mentors_data.csv", index=False)
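
Passing `index=False` keeps the row index out of the file; without it, reloading the CSV would add an extra `Unnamed: 0` column. A small sketch of a round-trip check, using a two-row stand-in DataFrame and a temporary directory (the file name `aspirants_sample.csv` is illustrative):

```python
import os
import tempfile

import pandas as pd

# Two-row stand-in for aspirants_df
df = pd.DataFrame({
    "Aspirant": ["Aspirant 1", "Aspirant 2"],
    "Target College": ["NLSIU", "NUJS"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "aspirants_sample.csv")
    df.to_csv(path, index=False)   # write without the row index
    reloaded = pd.read_csv(path)   # read it back while the file still exists

round_trip_ok = df.equals(reloaded)
print(round_trip_ok)
```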

## Step 4: Preview Aspirants Data

To verify that the mock data was generated correctly, we display the first few rows of the aspirants DataFrame using `.head()`.


In [None]:
print(aspirants_df.head())


     Aspirant               Preferred Subjects Target College Learning Style
0  Aspirant 1           Constitution, Evidence          NLSIU     Structured
1  Aspirant 2           Evidence, Constitution      NLU Delhi     Self-paced
2  Aspirant 3                  Torts, Property  NLU Bangalore     Structured
3  Aspirant 4  International Law, Constitution      NLU Delhi    Interactive
4  Aspirant 5           Evidence, Constitution    NLU Jodhpur    Interactive


## Step 5: Preview Mentors Data

To verify that the mock data was generated correctly, we display the first few rows of the mentors DataFrame using `.head()`.


In [None]:
print(mentors_df.head())

     Mentor                    Expertise        College Teaching Style
0  Mentor 1  Property, International Law           NUJS     Structured
1  Mentor 2           Evidence, Criminal    NLU Jodhpur     Structured
2  Mentor 3           Property, Criminal      NLU Delhi     Structured
3  Mentor 4             Torts, Contracts  NLU Bangalore     Structured
4  Mentor 5           Criminal, Property          NLSIU     Structured


## Step 6: Feature Extraction with TF-IDF

To match aspirants with mentors based on subjects, we convert their text data into numerical features using **TF-IDF (Term Frequency-Inverse Document Frequency)**.

- `TfidfVectorizer` is fitted on the aspirants' "Preferred Subjects" and transforms them into feature vectors.
- We reuse the same fitted vectorizer to transform the mentors' "Expertise" fields, so both sets of vectors share the same vocabulary and vector space.


In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
aspirant_features = vectorizer.fit_transform(aspirants_df["Preferred Subjects"])
mentor_features = vectorizer.transform(mentors_df["Expertise"])


## Step 7: Recommend Mentors Using Cosine Similarity

We now compute the similarity between aspirants and mentors based on their subject preferences:

- **Cosine similarity** is used to measure how similar each aspirant is to each mentor.
- For each aspirant, we find the **top 5 most similar mentors**.
- We extract and display the matching mentors' details for the first aspirant as a sample.


In [None]:
from sklearn.metrics.pairwise import cosine_similarity

similarity_scores = cosine_similarity(aspirant_features, mentor_features)
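# argsort sorts ascending, so the last 5 columns of each row are the 5 highest-scoring mentors (best match last)
top_5_mentors_indices = similarity_scores.argsort(axis=1)[:, -5:]
top_5_mentors = [mentors_df.iloc[idx].reset_index(drop=True) for idx in top_5_mentors_indices]

# Show the recommendations for the first aspirant
print(top_5_mentors[0])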


       Mentor               Expertise        College Teaching Style
0  Mentor 118  Evidence, Constitution    NLU Jodhpur    Interactive
1   Mentor 89  Constitution, Evidence    NLU Jodhpur    Interactive
2   Mentor 98  Constitution, Evidence           NUJS     Self-paced
3   Mentor 92  Constitution, Evidence  NLU Bangalore    Interactive
4   Mentor 41  Constitution, Evidence          NLSIU     Self-paced
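
Slicing the last columns of an ascending `argsort` returns the top matches with the best one last. If a best-first ranking is preferred, the sorted indices can be reversed before slicing. A minimal sketch on a made-up score matrix (the values and `k = 2` are illustrative):

```python
import numpy as np

# Made-up similarity scores: 2 aspirants (rows) x 4 mentors (columns)
scores = np.array([
    [0.1, 0.9, 0.5, 0.7],
    [0.8, 0.2, 0.6, 0.4],
])
k = 2

# Last k columns of an ascending sort: top k, best match last
ascending_top_k = np.argsort(scores, axis=1)[:, -k:]

# Reverse each row first, then take the first k: top k, best match first
best_first_top_k = np.argsort(scores, axis=1)[:, ::-1][:, :k]

print(ascending_top_k.tolist())   # [[3, 1], [2, 0]]
print(best_first_top_k.tolist())  # [[1, 3], [0, 2]]
```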


## Step 8: Incorporate Feedback to Adjust Feature Weights

We simulate a feedback system to fine-tune the recommendation engine:

- A small **feedback dataset** includes ratings and comments from aspirants about their matched mentors.
- We calculate the **average rating** from this feedback.
- Based on the average:
  - If the average rating is above 4.0, we increase the weight on subject matching by 0.1.
  - If the average rating is below 3.0, we increase the weight on learning style by 0.1 to try to improve personalization.
  - Otherwise, the weights stay unchanged.

This simple logic can evolve into a more robust learning loop using real-time user data.


In [None]:
feedback_data = {
    "Aspirant": ["Aspirant 1", "Aspirant 2", "Aspirant 3"],
    "Mentor Selected": ["Mentor A", "Mentor B", "Mentor C"],
    "Rating": [4.5, 3.8, 5.0],
    "Feedback": ["Very insightful", "Helpful but needs better explanation", "Excellent guidance"]
}

feedback_df = pd.DataFrame(feedback_data)

feature_weights = {
    "Subjects": 0.4,
    "Target College": 0.3,
    "Learning Style": 0.3
}

# Use the pandas mean so that no NumPy import is needed
average_rating = feedback_df["Rating"].mean()

if average_rating > 4.0:
    feature_weights["Subjects"] += 0.1
elif average_rating < 3.0:
    feature_weights["Learning Style"] += 0.1

print("Updated Feature Weights:", feature_weights)


Updated Feature Weights: {'Subjects': 0.5, 'Target College': 0.3, 'Learning Style': 0.3}
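
After the update the weights sum to 1.1 rather than 1.0, and in the steps shown here they are not yet applied to the similarity scores. The following is a rough sketch of how they might be normalized and combined into a single match score. The `combined_score` rule, which uses exact-match indicators for college and style, is an assumption for illustration and not the notebook's method.

```python
feature_weights = {"Subjects": 0.5, "Target College": 0.3, "Learning Style": 0.3}

def normalize(weights):
    """Rescale the weights so that they sum to 1."""
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}

def combined_score(subject_similarity, same_college, same_style, weights):
    """Hypothetical scoring rule: weighted sum of three signals in [0, 1]."""
    return (
        weights["Subjects"] * subject_similarity
        + weights["Target College"] * float(same_college)
        + weights["Learning Style"] * float(same_style)
    )

normalized = normalize(feature_weights)

# Example: identical subjects, same college, different learning style
score = combined_score(1.0, True, False, normalized)
print(normalized)
print(round(score, 3))
```

Normalizing after each update keeps scores comparable across feedback rounds, since the total weight no longer grows with every adjustment.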
