# RAG-Powered Chatbot for Student Mentorship
This notebook builds a Retrieval-Augmented Generation (RAG) chatbot that ingests student data (projects, skills, and academic details) and lets mentors ask natural-language questions. The system retrieves relevant student profiles, highlights their strengths, and suggests suitable project directions and potential colleges. The goal is personalized guidance, faster decision-making, and mentorship support that scales.





In [None]:
!pip install -U langchain==0.3.8 langchain-core langchain-community langchain-openai langchain-chroma sentence-transformers chromadb pypdf docx2txt jq

Collecting langchain==0.3.8
  Downloading langchain-0.3.8-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-core
  Downloading langchain_core-1.0.1-py3-none-any.whl.metadata (3.5 kB)
Collecting langchain-community
  Downloading langchain_community-0.4-py3-none-any.whl.metadata (3.0 kB)
Collecting langchain-openai
  Downloading langchain_openai-1.0.1-py3-none-any.whl.metadata (1.8 kB)
Collecting langchain-chroma
  Downloading langchain_chroma-1.0.0-py3-none-any.whl.metadata (1.9 kB)
Collecting sentence-transformers
  Downloading sentence_transformers-5.1.2-py3-none-any.whl.metadata (16 kB)
Collecting chromadb
  Downloading chromadb-1.2.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.2 kB)
Collecting pypdf
  Downloading pypdf-6.1.3-py3-none-any.whl.metadata (7.1 kB)
Collecting docx2txt
  Downloading docx2txt-0.9-py3-none-any.whl.metadata (529 bytes)
Collecting jq
  Downloading jq-1.10.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.0 

In [None]:
import langchain
print(langchain.__version__)

0.3.8


In [33]:
import os
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGSMITH_PROJECT"] = "default"
os.environ["LANGSMITH_API_KEY"] = "YOUR_LANGSMITH_KEY"
os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY"

In [None]:
from langsmith import Client

client = Client(api_key=os.environ["LANGSMITH_API_KEY"])

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="openrouter/andromeda-alpha",
                 openai_api_base="https://openrouter.ai/api/v1",
                 api_key=os.environ["OPENAI_API_KEY"])

messages = [{"role": "user", "content": "Hello, How are you today?"}]
llm_response = llm.invoke(input=messages)
print(llm_response)

content="Hi there! While I don't experience feelings in the way humans do, I'm always excited to engage in conversations and help out! ٩(◕‿◕｡)۶ I'd love to hear how you're doing and what's on your mind today. What would you like to chat about?\n" additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 298, 'prompt_tokens': 19, 'total_tokens': 317, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 0}}, 'model_name': 'openrouter/andromeda-alpha', 'system_fingerprint': None, 'id': 'gen-1761498229-72172BHHILVmivf3JlU3', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None} id='run--4ddf35c5-e7e7-4786-a038-6142e619adba-0' usage_metadata={'input_tokens': 19, 'output_tokens': 298, 'total_tokens': 317, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 0}}


In [None]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()
output_parser.invoke(llm_response)

"Hi there! While I don't experience feelings in the way humans do, I'm always excited to engage in conversations and help out! ٩(◕‿◕｡)۶ I'd love to hear how you're doing and what's on your mind today. What would you like to chat about?\n"

In [None]:
chain = llm | output_parser
chain.invoke("Tell me a joke")

'Alright, here\'s a light-hearted one for you:\n\n**Why did three ducks walk into a store?**  \nThe first duck said, *"I need lipstick!*"  \nThe second duck said, *"I need a comb!*"  \nAnd the third duck said, *"I need a bow!"*  \n\nWhen they left, the clerk sighed, *"That\'s odd."*  \n\n🦆 *Ba-dum-tss!*  \n\n(Okay, maybe that\'s a stretch, but hey, I\'m fowl humor.) 😄\n'

## Importing and Loading Data
- College.csv - `College Name, Accepted Skills, Projects`
- CollegePref.csv - `Student Name, CollegePref1, CollegePref2, CollegePref3`
- Marksheet.csv - `Student Name, Data Structures and Algorithms, Finite Automata, Computer Organization, Discrete Mathematics, Computer Networks, Machine Learning`
- Projects.csv - `Project Name, Subject Skill Needed`
- Projects.json - `{Student: {Project1, Skill1, Project2, Skill2}}`
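
As a rough illustration of the `Projects.json` layout listed above, the sketch below parses a made-up record and pairs each `ProjectN` key with its `SkillN` key. The student name, the project names, and the `pair_projects` helper are hypothetical; only the key pattern is taken from the list above.

```python
import json

# Hypothetical sample following the {Student: {Project1, Skill1, ...}} pattern.
sample_json = """
{
  "Student_X": {
    "Project1": "Example Project A", "Skill1": "Python",
    "Project2": "Example Project B", "Skill2": "SQL"
  }
}
"""


def pair_projects(details):
    """Return (project, skill) tuples for every ProjectN/SkillN pair present."""
    pairs = []
    i = 1
    # Stop at the first missing index instead of assuming an even key count.
    while f"Project{i}" in details and f"Skill{i}" in details:
        pairs.append((details[f"Project{i}"], details[f"Skill{i}"]))
        i += 1
    return pairs


records = json.loads(sample_json)
for student, details in records.items():
    print(student, pair_projects(details))
```

Walking the indices until one is missing is one possible design choice; it tolerates records that carry extra keys besides the project/skill pairs.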

In [None]:
import pandas as pd
import json
from pathlib import Path

college_csv = "College.csv"
college_pref_csv = "CollegePref.csv"
marksheet_csv = "Marksheet.csv"
projects_csv = "Projects.csv"
projects_json = "Projects.json"

## Creating embedding chunks and corresponding metadata

In [None]:
texts = []
metadatas = []

# 1. Process colleges
college_df = pd.read_csv(college_csv, sep=",")

for _, row in college_df.iterrows():
    text = (
        f"College Name: {row['College Name']}\n"
        f"Accepted Skills: {row['Accepted Skills']}\n"
        f"Preferred Projects: {row['Projects']}"
    )
    metadata = {
        "type": "college",
        "college_name": row["College Name"],
    }
    texts.append(text)
    metadatas.append(metadata)


# 2. Process CollegePref.csv
pref_df = pd.read_csv(college_pref_csv, sep=",")

for _, row in pref_df.iterrows():
    text = (
        f"Student Name: {row['Student Name']}\n"
        f"Preferred Colleges: {row['CollegePref1']}, {row['CollegePref2']}, {row['CollegePref3']}"
    )
    metadata = {
        "type": "student_pref",
        "student_name": row["Student Name"],
    }
    texts.append(text)
    metadatas.append(metadata)


# 3. Process Marksheet.csv
marksheet_df = pd.read_csv(marksheet_csv, sep=",")

for _, row in marksheet_df.iterrows():
    subjects = ", ".join(
        f"{col} - {row[col]}" for col in marksheet_df.columns if col != "Student Name"
    )
    text = f"Student: {row['Student Name']}\nScores: {subjects}"
    metadata = {
        "type": "marksheet",
        "student_name": row["Student Name"],
    }
    texts.append(text)
    metadatas.append(metadata)


# 4. Process Projects.csv
proj_df = pd.read_csv(projects_csv, sep=",")

for _, row in proj_df.iterrows():
    text = (
        f"Project Name: {row['Project Name']}\n"
        f"Required Subject Skill: {row['Subject Skill Needed']}"
    )
    metadata = {
        "type": "project_template",
        "project_name": row["Project Name"],
    }
    texts.append(text)
    metadatas.append(metadata)


# 5. Process Projects.json
with open(projects_json, "r") as f:
    project_data = json.load(f)

for student, details in project_data.items():
    pairs = [
        (details[f"Project{i}"], details[f"Skill{i}"])
        for i in range(1, len(details)//2 + 1)
    ]
    proj_text = "\n".join(f"- {p} (Skill Gained: {s})" for p, s in pairs)
    text = f"Student: {student}\nCompleted Projects:\n{proj_text}"
    metadata = {
        "type": "student_projects",
        "student_name": student,
    }
    texts.append(text)
    metadatas.append(metadata)

In [None]:
for i in range(len(texts)):
    print(f"\n--- Chunk {i+1} ---")
    print("Text:\n", texts[i])
    print("Metadata:", metadatas[i])

print(f"\n Total chunks created: {len(texts)}")


--- Chunk 1 ---
Text:
 College Name: Aaravpur Academy of Science
Accepted Skills: GCP | Matplotlib | OpenCV | Pandas | React | Rust | Swift | Tailwind CSS | TensorFlow | TypeScript
Preferred Projects: Auto-Grader | Code Editor Lite | Job Portal | Weather Dashboard
Metadata: {'type': 'college', 'college_name': 'Aaravpur Academy of Science'}

--- Chunk 2 ---
Text:
 College Name: Aaravpur Center for AI
Accepted Skills: Agile | C | CI/CD | Express | GCP | MongoDB | TypeScript | scikit-learn
Preferred Projects: Compiler Tiny | Crypto Arbitrage Bot | Job Portal | News Aggregator | Scheduling System | Search Engine Mini | Spam Filter
Metadata: {'type': 'college', 'college_name': 'Aaravpur Center for AI'}

--- Chunk 3 ---
Text:
 College Name: Aaravpur Center for Data Science
Accepted Skills: Agile | C++ | Django | Express | Java | Node.js | Pandas | PostgreSQL | React | Swift
Preferred Projects: Blog CMS | Fitness Tracker | Memory Allocator Demo | Network Packet Sniffer
Metadata: {'type': 'co

In [None]:
print(len(texts[0]), "\n", texts[0])

241 
 College Name: Aaravpur Academy of Science
Accepted Skills: GCP | Matplotlib | OpenCV | Pandas | React | Rust | Swift | Tailwind CSS | TensorFlow | TypeScript
Preferred Projects: Auto-Grader | Code Editor Lite | Job Portal | Weather Dashboard


In [None]:
print(len(metadatas[0]), "\n", metadatas[0])

2 
 {'type': 'college', 'college_name': 'Aaravpur Academy of Science'}


In [None]:
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import SentenceTransformerEmbeddings

embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

db = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    metadatas=metadatas,
    collection_name="college_rag",
    persist_directory="./chroma_db"
)

In [None]:
import shutil

# Chroma 0.4+ writes to persist_directory automatically, so the deprecated
# db.persist() call is not needed.
persist_directory = "./chroma_db"
archive_base = "college_rag_embeddings"  # make_archive appends ".zip"
shutil.make_archive(base_name=archive_base, format="zip", root_dir=persist_directory)


'/content/college_rag_embeddings.zip'

## Filtering documents by metadata and query

In [None]:
query = "College for computer science projects"
# Named where_filter to avoid shadowing the built-in filter()
where_filter = {
    "$and": [
        {"type": "marksheet"},
        {"student_name": {"$in": ["Student_1", "Student_2"]}}
    ]
}
results = db.similarity_search(query, k=3, filter=where_filter)

for i, res in enumerate(results, 1):
    print(f"\nRank {i}")
    print("Content:", res.page_content)
    print("Metadata:", res.metadata)


Rank 1
Content: Student: Student_1
Scores: Data Structures and Algorithms - 63, Finite Automata - 93, Computer Organization - 91, Discrete Mathematics - 51, Computer Networks - 45, Machine Learning - 41
Metadata: {'student_name': 'Student_1', 'type': 'marksheet'}

Rank 2
Content: Student: Student_2
Scores: Data Structures and Algorithms - 40, Finite Automata - 96, Computer Organization - 54, Discrete Mathematics - 55, Computer Networks - 38, Machine Learning - 73
Metadata: {'type': 'marksheet', 'student_name': 'Student_2'}
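
To make the filter semantics concrete, here is a minimal pure-Python sketch of how a `$and` / `$in` metadata filter selects documents. The `matches` helper is hypothetical and is not Chroma's implementation; it only mirrors the three forms used in this notebook (plain equality, `$in`, and `$and`).

```python
def matches(metadata, where):
    """Return True if `metadata` satisfies a Chroma-style `where` filter.

    Illustrative only: supports plain equality, "$in", and "$and".
    """
    if "$and" in where:
        return all(matches(metadata, clause) for clause in where["$and"])
    for key, condition in where.items():
        value = metadata.get(key)
        if isinstance(condition, dict):
            if set(condition) != {"$in"}:
                raise ValueError(f"unsupported operator in {condition!r}")
            if value not in condition["$in"]:
                return False
        elif value != condition:
            return False
    return True


where = {
    "$and": [
        {"type": "marksheet"},
        {"student_name": {"$in": ["Student_1", "Student_2"]}},
    ]
}

print(matches({"type": "marksheet", "student_name": "Student_1"}, where))
print(matches({"type": "marksheet", "student_name": "Student_3"}, where))
print(matches({"type": "college", "college_name": "Example College"}, where))
```

Under this reading, only the two marksheet chunks for `Student_1` and `Student_2` pass the filter, which would explain why the search above returns two results even though `k=3`.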


In [None]:
query = "Projects for Machine Learning"
# Named where_filter to avoid shadowing the built-in filter()
where_filter = {"type": "project_template"}
results = db.similarity_search(query, k=3, filter=where_filter)

for i, res in enumerate(results, 1):
    print(f"\nRank {i}")
    print("Content:", res.page_content)
    print("Metadata:", res.metadata)


Rank 1
Content: Project Name: Search Engine Mini - Data v11
Required Subject Skill: Machine Learning
Metadata: {'project_name': 'Search Engine Mini - Data v11', 'type': 'project_template'}

Rank 2
Content: Project Name: Search Engine Mini - Systems v12
Required Subject Skill: Machine Learning
Metadata: {'type': 'project_template', 'project_name': 'Search Engine Mini - Systems v12'}

Rank 3
Content: Project Name: Search Engine Mini - Data v12
Required Subject Skill: Machine Learning
Metadata: {'type': 'project_template', 'project_name': 'Search Engine Mini - Data v12'}


## Searching by query only (no metadata filter)

In [None]:
query = "College for computer science projects"
results = db.similarity_search(query, k=3)

for i, res in enumerate(results, 1):
    print(f"\nRank {i}")
    print("Content:", res.page_content)
    print("Metadata:", res.metadata)


Rank 1
Content: College Name: Global College of Computing Mumbai
Accepted Skills: Azure | C++ | GCP | OpenCV | Tailwind CSS | TensorFlow
Preferred Projects: Compiler Tiny | Memory Allocator Demo | Scheduling System | Todo API
Metadata: {'college_name': 'Global College of Computing Mumbai', 'type': 'college'}

Rank 2
Content: College Name: Ranchi Institute of Technology
Accepted Skills: C | CI/CD | GCP | Matplotlib | Node.js | OpenCV | PostgreSQL | PyTorch | Redis | scikit-learn
Preferred Projects: Course Planner | Crypto Arbitrage Bot | Event Ticketing | IoT Sensor Monitor | Note Taking App | URL Shortener | VR Gallery
Metadata: {'type': 'college', 'college_name': 'Ranchi Institute of Technology'}

Rank 3
Content: Student Name: Student_338
Preferred Colleges: School of Computer Science Surat, Global College of Computing Bengaluru, Faculty of Technology Xavierabad
Metadata: {'student_name': 'Student_338', 'type': 'student_pref'}


In [None]:
query = "Give me some simulation projects?"
results = db.similarity_search(query, k=3)

for i, res in enumerate(results, 1):
    print(f"\nRank {i}")
    print("Content:", res.page_content)
    print("Metadata:", res.metadata)


Rank 1
Content: Project Name: Edge Caching Simulator - ML v5
Required Subject Skill: Discrete Mathematics
Metadata: {'type': 'project_template', 'project_name': 'Edge Caching Simulator - ML v5'}

Rank 2
Content: Project Name: Traffic Simulation - ML v10
Required Subject Skill: Computer Organization
Metadata: {'type': 'project_template', 'project_name': 'Traffic Simulation - ML v10'}

Rank 3
Content: Project Name: Edge Caching Simulator - Data v10
Required Subject Skill: Discrete Mathematics
Metadata: {'project_name': 'Edge Caching Simulator - Data v10', 'type': 'project_template'}


In [None]:
from langchain_core.prompts import ChatPromptTemplate

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

In [None]:
# Merge the text of all retrieved docs into a single context string
def docs2str(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [None]:
retriever = db.as_retriever(search_kwargs={"k": 3})
retriever.invoke("College for computer science projects")

[Document(metadata={'type': 'college', 'college_name': 'Global College of Computing Mumbai'}, page_content='College Name: Global College of Computing Mumbai\nAccepted Skills: Azure | C++ | GCP | OpenCV | Tailwind CSS | TensorFlow\nPreferred Projects: Compiler Tiny | Memory Allocator Demo | Scheduling System | Todo API'),
 Document(metadata={'college_name': 'Ranchi Institute of Technology', 'type': 'college'}, page_content='College Name: Ranchi Institute of Technology\nAccepted Skills: C | CI/CD | GCP | Matplotlib | Node.js | OpenCV | PostgreSQL | PyTorch | Redis | scikit-learn\nPreferred Projects: Course Planner | Crypto Arbitrage Bot | Event Ticketing | IoT Sensor Monitor | Note Taking App | URL Shortener | VR Gallery'),
 Document(metadata={'student_name': 'Student_338', 'type': 'student_pref'}, page_content='Student Name: Student_338\nPreferred Colleges: School of Computer Science Surat, Global College of Computing Bengaluru, Faculty of Technology Xavierabad')]
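
The retrieved documents carry metadata, but `docs2str` forwards only `page_content` to the prompt. If the model should also see where each chunk came from, one possible variant is sketched below. The `Doc` dataclass is a stand-in for a LangChain `Document` (it assumes only the `page_content` and `metadata` attributes), and both `docs2str_with_sources` and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (same two attribute names)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def docs2str_with_sources(docs):
    """Join chunks, prefixing each with its metadata sorted by key."""
    blocks = []
    for doc in docs:
        source = ", ".join(f"{k}={v}" for k, v in sorted(doc.metadata.items()))
        blocks.append(f"[{source}]\n{doc.page_content}")
    return "\n\n".join(blocks)


docs = [
    Doc("College Name: Example College",
        {"type": "college", "college_name": "Example College"}),
    Doc("Student Name: Student_X",
        {"type": "student_pref", "student_name": "Student_X"}),
]
print(docs2str_with_sources(docs))
```

Sorting the metadata keys keeps the prefix stable, since the key order in the retrieved metadata varies between results.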

In [None]:
from langchain_core.runnables import RunnablePassthrough
rag_chain = (
    {"context": retriever | docs2str, "question": RunnablePassthrough()} | prompt
)
rag_chain.invoke("Give some simulation projects?")

ChatPromptValue(messages=[HumanMessage(content='Answer the question based only on the following context:\nProject Name: Edge Caching Simulator - ML v5\nRequired Subject Skill: Discrete Mathematics\n\nProject Name: Traffic Simulation - ML v10\nRequired Subject Skill: Computer Organization\n\nProject Name: Traffic Simulation - Systems v11\nRequired Subject Skill: Discrete Mathematics\n\nQuestion: Give some simulation projects?\n', additional_kwargs={}, response_metadata={})])

In [None]:
rag_chain = (
    {"context": retriever | docs2str, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
question = "Give some projects which require language Java."
response = rag_chain.invoke(question)
print(response)

The following projects require the Java programming language based on the provided context:  

1. **Spam Filter** (Skill Gained: Java, as per Student_496's completed project)  
2. **Auto-Grader** (Preferred Project at Faculty of Technology Lucknow)  
3. **Forum Platform** (Preferred Project at Faculty of Technology Lucknow)  
4. **Memory Allocator Demo** (Preferred Project at Faculty of Technology Lucknow)  
5. **Scheduling System** (Preferred Project at Faculty of Technology Lucknow)  
6. **Compiler Tiny** (Preferred Project at School of Computer Science Chennai)  
7. **Crypto Arbitrage Bot** (Preferred Project at School of Computer Science Chennai)  
8. **Event Ticketing** (Preferred Project at School of Computer Science Chennai)  

Note: While "Scheduling System" appears in both colleges' preferred projects, it is listed once here as a Java-required project.



## Deriving context from history

In [None]:
from langchain_core.messages import HumanMessage, AIMessage

chat_history = []
chat_history.extend([
    HumanMessage(content=question),
    AIMessage(content=response)
])
chat_history

[HumanMessage(content='Give some projects which require language Java.', additional_kwargs={}, response_metadata={}),
 AIMessage(content='The following projects require the Java programming language based on the provided context:  \n\n1. **Spam Filter** (Skill Gained: Java, as per Student_496\'s completed project)  \n2. **Auto-Grader** (Preferred Project at Faculty of Technology Lucknow)  \n3. **Forum Platform** (Preferred Project at Faculty of Technology Lucknow)  \n4. **Memory Allocator Demo** (Preferred Project at Faculty of Technology Lucknow)  \n5. **Scheduling System** (Preferred Project at Faculty of Technology Lucknow)  \n6. **Compiler Tiny** (Preferred Project at School of Computer Science Chennai)  \n7. **Crypto Arbitrage Bot** (Preferred Project at School of Computer Science Chennai)  \n8. **Event Ticketing** (Preferred Project at School of Computer Science Chennai)  \n\nNote: While "Scheduling System" appears in both colleges\' preferred projects, it is listed once here as 

In [None]:
from langchain_core.prompts import MessagesPlaceholder

contextualize_q_system_prompt = (
    "INSTRUCTIONS: You MUST NOT answer the question under any circumstances. "
    "Your sole task is to REFORMULATE the latest user question so that it is a "
    "standalone question, understandable without any chat history. "
    "If the question is already standalone, return it as-is. "
    "DO NOT PROVIDE ANSWERS, EXAMPLES, OR ADDITIONAL INFORMATION. "
    "ONLY RETURN THE REFORMULATED QUESTION."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{input}")
    ]
)

contextualize_chain = contextualize_q_prompt | llm | StrOutputParser()
contextualize_chain.invoke({"input": "Tell just one another similar programming language like it?", "chat_history": chat_history})

"A programming language similar to Java is **C++**. Both are object-oriented, used for system/application development, and emphasize strong typing. They differ in features like Java's platform independence (via the JVM) versus C++'s lower-level control and manual memory management.\n"

## Passing the contextualized question to the retriever

In [None]:
retriever = db.as_retriever(search_kwargs={"k": 3,
                                           "filter": {"type": "student_projects"}})
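# A plain retriever takes only the query string (the second positional argument
# of invoke() is a run config, not chat history); history is handled by the
# history-aware retriever in the next cell.
retriever.invoke("Give some similar programming languages like Java?")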

[Document(metadata={'type': 'student_projects', 'student_name': 'Student_496'}, page_content='Student: Student_496\nCompleted Projects:\n- Spam Filter (Skill Gained: Java)\n- Code Editor Lite (Skill Gained: CI/CD)\n- Gesture Recognition (Skill Gained: FastAPI)'),
 Document(metadata={'student_name': 'Student_345', 'type': 'student_projects'}, page_content='Student: Student_345\nCompleted Projects:\n- Gesture Recognition (Skill Gained: Java)\n- Traffic Simulation (Skill Gained: TypeScript)\n- Blog CMS (Skill Gained: Bash)'),
 Document(metadata={'student_name': 'Student_121', 'type': 'student_projects'}, page_content='Student: Student_121\nCompleted Projects:\n- Network Packet Sniffer (Skill Gained: PyTorch)\n- Event Ticketing (Skill Gained: Java)\n- Compiler Tiny (Skill Gained: C)')]

In [None]:
from langchain.chains import create_history_aware_retriever

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

history_aware_retriever.invoke({"input": "Give some similar programming languages like it?", "chat_history": chat_history})

[Document(metadata={'student_name': 'Student_121', 'type': 'student_projects'}, page_content='Student: Student_121\nCompleted Projects:\n- Network Packet Sniffer (Skill Gained: PyTorch)\n- Event Ticketing (Skill Gained: Java)\n- Compiler Tiny (Skill Gained: C)'),
 Document(metadata={'student_name': 'Student_93', 'type': 'student_projects'}, page_content='Student: Student_93\nCompleted Projects:\n- Gesture Recognition (Skill Gained: TypeScript)\n- Chat App (Skill Gained: Rust)\n- Search Engine Mini (Skill Gained: Agile)'),
 Document(metadata={'student_name': 'Student_496', 'type': 'student_projects'}, page_content='Student: Student_496\nCompleted Projects:\n- Spam Filter (Skill Gained: Java)\n- Code Editor Lite (Skill Gained: CI/CD)\n- Gesture Recognition (Skill Gained: FastAPI)')]

## Conversational RAG
Handling follow-up questions.

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import MessagesPlaceholder

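# Wrap the base retriever so that follow-up questions are rewritten using the
# chat history before retrieval.
history_aware_retriever = create_history_aware_retriever(
    llm, db.as_retriever(search_kwargs={"k": 3}), contextualize_q_prompt
)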

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant. Use the following context to answer the user's question. Answer only from the context, not from your own knowledge. Also output the metadata of the documents you refer to."),
    ("system", "Context: {context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}")
])

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

In [None]:
rag_chain.invoke({"input": "Give me some programming languages and what projects can be made using them, and which colleges accept those projects?", "chat_history": chat_history})

{'input': 'Give me some programming languages and what projects can be made using them, and which colleges accept those projects?',
 'chat_history': [HumanMessage(content='Give some projects which require language Java.', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The following projects require the Java programming language based on the provided context:  \n\n1. **Spam Filter** (Skill Gained: Java, as per Student_496\'s completed project)  \n2. **Auto-Grader** (Preferred Project at Faculty of Technology Lucknow)  \n3. **Forum Platform** (Preferred Project at Faculty of Technology Lucknow)  \n4. **Memory Allocator Demo** (Preferred Project at Faculty of Technology Lucknow)  \n5. **Scheduling System** (Preferred Project at Faculty of Technology Lucknow)  \n6. **Compiler Tiny** (Preferred Project at School of Computer Science Chennai)  \n7. **Crypto Arbitrage Bot** (Preferred Project at School of Computer Science Chennai)  \n8. **Event Ticketing** (Preferred Project

## Final Chatbot

In [None]:
retriever = db.as_retriever(search_kwargs={"k": 5})

contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant. Using the chat history, reformulate the user's follow-up question so that it is standalone; if it is already standalone, return it as is."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}")
])

history_aware_retriever = create_history_aware_retriever(llm, retriever, contextualize_q_prompt)

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the provided context strictly to answer. Answer within the context. Also output the metadata of the referenced documents.\nContext: {context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}")
])

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

chat_history = []

print("RAG Chatbot ready! Type 'exit' to stop.\n")

while True:
    user_query = input("\nYou: ").strip()
    if user_query.lower() in ["exit", "quit"]:
        print("Goodbye!")
        break

    response = rag_chain.invoke({
        "input": user_query,
        "chat_history": chat_history
    })

    answer = response["answer"]
    sources = response.get("context", [])
    print(f"\nBot: {answer}")
    print("\nReferenced sources:")
    for doc in sources:
        print(f"- {doc.metadata}")

    chat_history.append(HumanMessage(content=user_query))
    chat_history.append(AIMessage(content=answer))


RAG Chatbot ready! Type 'exit' to stop.

You: Tell me some python projects.

Bot: **Python Projects**  
1. File Sharing Service  
2. Handwriting OCR  
3. Music Streaming Prototype  
4. Note Taking App  
5. Compiler Tiny  
6. Forum Platform  
7. Scheduling System  
8. URL Shortener  
9. AR Ruler  
10. Anomaly Detector  
11. Code Editor Lite  
12. Job Portal  
13. Crypto Arbitrage Bot  
14. Expense Splitter  
15. Search Engine Mini  
16. Weather Dashboard  
17. VR Gallery  

**Metadata of Referenced Documents**  
- **School of Computer Science Guwahati**: Accepted Skills: Bash, Django, FastAPI, Flask, PyTorch, scikit-learn; Preferred Projects: File Sharing Service, Handwriting OCR, Music Streaming Prototype, Note Taking App.  
- **School of Computer Science Surat**: Accepted Skills: Kotlin, Linux, Matplotlib, NumPy, Storyboard; Preferred Projects: Compiler Tiny, Forum Platform, Scheduling System, URL Shortener.  
- **Pune Academy of Science**: Accepted Skills: C, C++, Flask, Java, Kafka,