# EduGenie: AI-Powered University Recommendation System

## Capstone Project – GenAI Intensive with Google & Kaggle
Submitted by: Nandini Thimmireddy Gari

## 1. Project Overview
This project implements a University Recommendation System using a combination of:

- Sentence embeddings, for semantic similarity
- Vector search (FAISS), to retrieve similar universities
- Google Generative AI (Gemini), for contextual reasoning and explanations

The system helps students select universities based on personalized preferences like location, quality, score, and cost.

## 2. Dataset Used
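World University Rankings (Kaggle)

This dataset collects rankings of top universities from several sources, including Times Higher Education, Shanghai (ARWU), and CWUR. The notebook below loads the CWUR file (`cwurData.csv`), whose features include:

- Quality of education, alumni employment, and quality of faculty
- Publications, influence, citations, and patents
- Total score, world rank, and national rank
- Country and year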



## 3. Problem Statement
Students often struggle with selecting the best university matching their academic performance, interests, and location preferences.
This project creates an intelligent system that recommends universities tailored to user goals, backed by data-driven ranking metrics and language-model-powered explanations.

## 4. GenAI Capabilities Demonstrated

1. **Embeddings + Vector Search (FAISS)**  
   We used `SentenceTransformer` to generate dense vector representations of the university descriptions and the user query, then ran a FAISS similarity search to find the most relevant universities.

2. **Few-shot Prompting with Gemini**  
   Gemini receives a few examples of user preferences paired with recommended results, then generates a reasoned explanation for each new recommendation.

3. **Structured Output (JSON)**  
   Gemini’s response is controlled using a structured output format, returning fields like:

```json
{
  "University": "Stanford University",
  "Reasoning": "Based on high teaching score and international outlook",
  "Ranking Source": "Times"
}
```


## 5. Code Implementation

In [1]:
!pip install -q sentence-transformers faiss-cpu google-generativeai

In [2]:

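import os
import google.generativeai as genai

# Read the Gemini API key from the environment instead of hardcoding it in the notebook.
# The variable name GEMINI_API_KEY is an assumption; a Kaggle notebook secret works too.
api_key = os.environ["GEMINI_API_KEY"]
genai.configure(api_key=api_key)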


In [3]:
# faiss-cpu and google-generativeai were installed and configured in the cells above.
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer
import faiss






In [4]:
# Load the CWUR rankings file from the Kaggle dataset
df = pd.read_csv('/kaggle/input/world-university-rankings/cwurData.csv')

# Preview the first five rows
df.head()




| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Harvard University | USA | 1 | 7 | 9 | 1 | 1 | 1 | 1 | | 5 | 100.0 | 2012 |
| 1 | 2 | Massachusetts Institute of Technology | USA | 2 | 9 | 17 | 3 | 12 | 4 | 4 | | 1 | 91.67 | 2012 |
| 2 | 3 | Stanford University | USA | 3 | 17 | 11 | 5 | 4 | 2 | 2 | | 15 | 89.5 | 2012 |
| 3 | 4 | University of Cambridge | United Kingdom | 1 | 10 | 24 | 4 | 16 | 16 | 11 | | 50 | 86.17 | 2012 |
| 4 | 5 | California Institute of Technology | USA | 4 | 2 | 29 | 7 | 37 | 22 | 22 | | 18 | 85.21 | 2012 |


In [5]:
df["description"] = df["institution"] + ", " + df["country"] + ", Rank: " + df["world_rank"].astype(str)
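
The preview includes a `year` column, which suggests the file holds one row per institution per ranking year. If so, embedding every row lets a single university occupy several of the top-5 retrieval slots. Below is a minimal sketch of keeping only the most recent row per institution before building descriptions. It is written against plain dictionaries so it runs without the dataset; the helper names `latest_per_institution` and `describe` and the sample rows are illustrative assumptions, not part of the notebook.

```python
def latest_per_institution(rows):
    """Keep only the most recent row for each institution."""
    latest = {}
    for row in rows:
        name = row["institution"]
        if name not in latest or row["year"] > latest[name]["year"]:
            latest[name] = row
    return list(latest.values())


def describe(row):
    """Build the same description string the notebook embeds."""
    return f'{row["institution"]}, {row["country"]}, Rank: {row["world_rank"]}'


# Illustrative rows only; the 2013 values are made up for the example.
sample_rows = [
    {"institution": "Stanford University", "country": "USA", "world_rank": 3, "year": 2012},
    {"institution": "Stanford University", "country": "USA", "world_rank": 2, "year": 2013},
    {"institution": "University of Cambridge", "country": "United Kingdom", "world_rank": 4, "year": 2012},
]

deduped = latest_per_institution(sample_rows)
descriptions = [describe(row) for row in deduped]
print(descriptions)
```

On the notebook's DataFrame, the equivalent would be `df.sort_values("year").drop_duplicates("institution", keep="last")`.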


In [6]:
model = SentenceTransformer("all-MiniLM-L6-v2")
descriptions = df["description"].tolist()
embeddings = model.encode(descriptions, show_progress_bar=True)



In [7]:
index = faiss.IndexFlatL2(embeddings[0].shape[0])
index.add(np.array(embeddings).astype("float32"))
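
`IndexFlatL2` is FAISS's exact (brute-force) index: it compares the query against every stored vector and ranks by squared Euclidean distance. The pure-Python sketch below shows the same computation on toy 2-D vectors. The function name `flat_l2_search` and the vectors are made up for illustration, and FAISS itself is far faster than this loop.

```python
def flat_l2_search(vectors, query, top_n=5):
    """Exhaustive nearest-neighbour search by squared L2 distance."""
    scored = []
    for i, vec in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, i))
    scored.sort()  # smallest distance first
    top = scored[:top_n]
    return [d for d, _ in top], [i for _, i in top]


# Toy vectors standing in for the 384-dimensional sentence embeddings.
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
distances, indices = flat_l2_search(toy_vectors, [2.0, 0.0], top_n=2)
print(distances, indices)
```

Because the search is exhaustive, results are exact, and the cost grows linearly with the number of stored vectors. That is a reasonable trade-off for a dataset of a few thousand rows.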


In [8]:
def search_universities(query, top_n=5):
    query_vec = model.encode([query])
    distances, indices = index.search(np.array(query_vec).astype("float32"), top_n)
    return df.iloc[indices[0]]
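
Semantic similarity alone is unlikely to enforce hard constraints such as "under 300 rank" or a country preference, because the embedding model treats "300" as text and not as a numeric bound. One option is to over-fetch from the index and then filter on metadata. Below is a sketch on plain dictionaries; `filter_matches`, `max_rank`, and the sample rows are hypothetical names and values for illustration.

```python
def filter_matches(matches, preferred_countries=None, max_rank=None, top_n=5):
    """Apply hard constraints to retrieved rows, keeping retrieval order."""
    kept = []
    for row in matches:
        if preferred_countries and row["country"] not in preferred_countries:
            continue
        if max_rank is not None and row["world_rank"] > max_rank:
            continue
        kept.append(row)
    return kept[:top_n]


# Hypothetical retrieval results (placeholder names, invented ranks).
candidates = [
    {"institution": "University A", "country": "USA", "world_rank": 3},
    {"institution": "University B", "country": "Canada", "world_rank": 30},
    {"institution": "University C", "country": "Germany", "world_rank": 450},
]

shortlist = filter_matches(candidates, ["Germany", "Canada"], max_rank=300)
print([row["institution"] for row in shortlist])
```

In the notebook this would mean calling `index.search` with a larger `top_n` than needed and filtering the resulting DataFrame rows the same way.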


In [9]:
def recommend_universities(query, user_profile):
    matches = search_universities(query)
    prompt = f"""
    You are an intelligent academic consultant named EduGenie.

    A student is looking for university options. Analyze their profile and recommend the best choices based on:
    - Country preference
    - Area of interest
    - CGPA
    - World Rank

    Profile:
    Degree: {user_profile['degree']}
    CGPA: {user_profile['cgpa']}
    Interest: {user_profile['interest']}
    Preferred Countries: {', '.join(user_profile['preferred_countries'])}

    Top matched universities:
    {matches[['institution', 'country', 'world_rank']].to_string(index=False)}

    Recommend 3 top universities with:
    - Justification
    - Relevant programs
    - Reputation insight
    - Cultural or cost notes
    """
    # Named gemini_model to avoid confusion with the SentenceTransformer `model` above.
    gemini_model = genai.GenerativeModel("models/gemini-1.5-pro-latest")
    return gemini_model.generate_content(prompt).text


In [10]:
def recommend_careers(user_profile):
    prompt = f"""
    You are a career advisor named EduGenie.

    A student has:
    - Degree: {user_profile['degree']}
    - CGPA: {user_profile['cgpa']}
    - Interest: {user_profile['interest']}

    Suggest 5 career paths. For each include:
    - Job title
    - Required skills
    - Recommended certifications
    - Salary range
    - Job demand level
    - Work environment
    """
    # Named gemini_model to avoid confusion with the SentenceTransformer `model` above.
    gemini_model = genai.GenerativeModel("models/gemini-1.5-pro-latest")
    return gemini_model.generate_content(prompt).text


In [11]:
def generate_roadmap(career_goal, user_profile):
    prompt = f"""
    You are EduGenie, a career planner.

    Help the student become a {career_goal}.

    Background:
    Degree: {user_profile['degree']}
    CGPA: {user_profile['cgpa']}
    Interest: {user_profile['interest']}

    Create a 4-phase roadmap:
    - Skills to learn
    - Tools & technologies
    - Certifications
    - Real-world project ideas
    """
    # Named gemini_model to avoid confusion with the SentenceTransformer `model` above.
    gemini_model = genai.GenerativeModel("models/gemini-1.5-pro-latest")
    return gemini_model.generate_content(prompt).text
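
All three functions above create a Gemini model and return `.text` in the same way. One way to remove the repetition, and to test the prompt logic without network access, is to pass the text generator in as a callable. This is only a sketch: `make_advisor` and `fake_generate` are hypothetical names, and in the notebook the callable would wrap `genai.GenerativeModel(...).generate_content(prompt).text`.

```python
from textwrap import dedent


def make_advisor(generate):
    """Return a function that fills a prompt template and calls `generate`."""
    def ask(template, **fields):
        # dedent() removes the indentation that triple-quoted prompts pick up
        # when they are written inside a function body.
        prompt = dedent(template).strip().format(**fields)
        return generate(prompt)
    return ask


def fake_generate(prompt):
    # Stand-in for the Gemini call: returns the prompt's first line.
    return prompt.splitlines()[0]


ask = make_advisor(fake_generate)
reply = ask(
    """
    You are EduGenie, a career planner.
    Help the student become a {career_goal}.
    """,
    career_goal="Machine Learning Engineer",
)
print(reply)
```

Note that `str.format` treats literal braces in a template as placeholders, so a template containing a JSON example would need its braces doubled.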


In [12]:
user_profile = {
    "degree": "B.Tech in Computer Science",
    "cgpa": 8.5,
    "preferred_countries": ["Germany", "Canada"],
    "interest": "Artificial Intelligence"
}


In [13]:
def generate_full_guidance(user_profile, university_query, career_goal):
    """
    Combines university, career, and roadmap suggestions into a single response.
    """
    uni_response = recommend_universities(university_query, user_profile)
    career_response = recommend_careers(user_profile)
    roadmap_response = generate_roadmap(career_goal, user_profile)

    return {
        "📍 Top University Recommendations": uni_response,
        "💼 Career Paths": career_response,
        "🗺️ Roadmap to Become a " + career_goal: roadmap_response
    }


In [14]:
results = generate_full_guidance(
    user_profile,
    "Top universities for AI under 300 rank",
    "Machine Learning Engineer"
)

# Display nicely
for title, content in results.items():
    print(f"\n{title}\n{'-' * len(title)}\n{content}\n")




📍 Top University Recommendations
--------------------------------
Hello! I'm EduGenie, your academic consultant. Let's analyze your profile and find the perfect universities for your Master's in AI.

You have a strong CGPA of 8.5 in Computer Science and are interested in Artificial Intelligence, focusing on Germany and Canada.  The provided list of "top matched universities" seems inaccurate and heavily skewed towards Stanford, which, while excellent, doesn't align with your country preferences.  Let's explore some better-suited options:

**Recommendations:**

**1. University of Toronto, Canada:**

* **Justification:** U of T consistently ranks highly globally and is a leader in AI research, particularly in deep learning and machine learning.  Canada offers a more welcoming immigration policy for international students post-graduation compared to some other countries.
* **Relevant Programs:** M.Sc in Computer Science (specialization in AI),  MASc in Applied Computing (focus on AI).
* 

## 🧱 Architecture Overview

1. **Semantic Vectorization**: The query and dataset entries are transformed into embeddings using `SentenceTransformer`.
2. **Similarity Search (FAISS)**: The query embedding is compared to all university vectors to find the top 5 semantically closest entries.
3. **Contextual Prompting**: The retrieved universities are used to prompt Gemini with context and expected structure.
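4. **Structured Recommendation**: Gemini returns its top university suggestions with a rationale for each. The prompts in the notebook cells request free-form text; to get the JSON shape shown in section 4, the prompt has to ask for it explicitly.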
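
To consume step 4 programmatically, the reply has to be parsed and checked against the expected fields. Below is a minimal sketch using the field names from the JSON example in section 4. `parse_recommendation` is a hypothetical helper, and stripping a Markdown code fence is an assumption about how a model reply might be wrapped.

```python
import json

REQUIRED_FIELDS = ("University", "Reasoning", "Ranking Source")
FENCE = "`" * 3  # Markdown code-fence marker


def parse_recommendation(reply_text):
    """Parse a model reply into a dict and verify the expected fields exist."""
    text = reply_text.strip()
    # Drop a surrounding Markdown code fence if the reply has one.
    if text.startswith(FENCE):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    data = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


# Simulated reply; no API call is made here.
reply = (
    FENCE + "json\n"
    '{"University": "Stanford University", '
    '"Reasoning": "Based on high teaching score and international outlook", '
    '"Ranking Source": "Times"}\n' + FENCE
)
recommendation = parse_recommendation(reply)
print(recommendation["University"])
```

Recent versions of the `google-generativeai` SDK also accept `response_mime_type="application/json"` in the generation config to request JSON directly. Whether that option is available depends on the installed version and model, so validation like the above is still worth keeping.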



## 🤖 GenAI Capabilities Utilized

| Capability | Implementation |
|------------|----------------|
| **Retrieval-Augmented Generation (RAG)** | FAISS retrieves the top-k most relevant university descriptions based on user query embedding. These form the "context" for Gemini. |
| **Few-shot Prompting** | Gemini is instructed using a formatted example and a specific JSON structure in the prompt. |
| **Structured Output / JSON Mode** | Gemini generates output in a controlled, parsable format ideal for downstream use. |
| **Natural Language Reasoning** | Gemini explains why a university is recommended, simulating human-like expert advisory. |
| **Contextual Generation** | Retrieved results are passed directly into the Gemini prompt as grounding context. |
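
The prompt in `recommend_universities` contains instructions but no worked examples. Below is a sketch of how a few-shot prompt could be assembled from example pairs and the retrieved context. The helper `build_few_shot_prompt` and the single example pair are invented for illustration and are not taken from the notebook.

```python
import json


def build_few_shot_prompt(examples, profile_text, context_text):
    """Assemble instructions, worked examples, retrieved context, and the new query."""
    parts = [
        "You are an intelligent academic consultant named EduGenie.",
        "Answer with a single JSON object, following the examples.",
    ]
    for example in examples:
        parts.append(f"Profile: {example['profile']}")
        parts.append(f"Answer: {json.dumps(example['answer'])}")
    parts.append(f"Top matched universities:\n{context_text}")
    parts.append(f"Profile: {profile_text}")
    parts.append("Answer:")
    return "\n\n".join(parts)


# Invented example pair; the answer reuses the JSON shape from section 4.
examples = [
    {
        "profile": "B.Sc. in Physics, interest: research, preferred countries: USA",
        "answer": {
            "University": "Stanford University",
            "Reasoning": "Based on high teaching score and international outlook",
            "Ranking Source": "Times",
        },
    }
]

prompt = build_few_shot_prompt(
    examples,
    "B.Tech in Computer Science, CGPA 8.5, interest: Artificial Intelligence",
    "University B, Canada, Rank: 30",
)
print(prompt)
```

Ending the prompt with `Answer:` nudges the model to continue in the same format as the examples. The retrieved rows are the grounding context described in the RAG row of the table.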


## 🧾 Conclusion

**EduGenie** showcases how powerful a GenAI-based assistant can be in education and career domains. By combining semantic search (vector embeddings) with structured generative outputs (Gemini), the assistant delivers personalized, meaningful, and grounded recommendations to students.

This solution is scalable, interactive, and easy to integrate into web apps, chatbots, or guidance systems.

> The future of university counseling is AI-powered, context-aware, and personalized — and EduGenie is a step in that direction.
