Assignment 2: Personalized Course Recommendation Engine
1. Background & Context
Online learning platforms host thousands of courses across domains, and learners often feel overwhelmed by the choices. A personalized recommender that understands both course content and individual learner profiles can boost engagement and completion rates by suggesting the most relevant next steps.
2. Problem Statement
“Design and implement a Course Recommendation Engine that, given a user query (completed courses + a short interests blurb), returns the top-5 most relevant courses from a catalog of course offerings, using embedding models and a vector database for semantic matching.”
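The problem statement describes the user query as two parts: completed courses and a short interests blurb. As a rough sketch of how those two parts might be combined into one string before embedding, the snippet below uses a hypothetical `LearnerQuery` class; the class name, field names, and the joined text format are assumptions for illustration and are not used elsewhere in this notebook.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerQuery:
    """Hypothetical container for the two inputs named in the problem statement."""

    completed_courses: list[str] = field(default_factory=list)  # course titles
    interests: str = ""  # short free-text blurb

    def to_search_text(self) -> str:
        """Join both parts into a single string that can be embedded as the query."""
        parts = []
        if self.completed_courses:
            parts.append("Completed courses: " + ", ".join(self.completed_courses))
        if self.interests.strip():
            parts.append("Interests: " + self.interests.strip())
        return ". ".join(parts)


query = LearnerQuery(
    completed_courses=["Python Programming for Data Science"],
    interests="I enjoy data visualization",
)
print(query.to_search_text())
```

Keeping the two parts separate until the last moment also makes it possible to use the completed courses for filtering, not only for matching.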

In [1]:
!python -m pip install pypdf sentence-transformers faiss-cpu --quiet





In [12]:
pip install langchain-chroma




In [5]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import pandas as pd



Setting the embedding model name and the LLM model name

In [6]:
import os

embedding_model_name = "models/gemini-embedding-001"
model_name = "gemini-2.0-flash"

In [7]:
df=pd.read_csv("assignment2dataset.csv")
df.head(5)

Unnamed: 0,course_id,title,description
0,C001,Foundations of Machine Learning,Understand foundational machine learning algor...
1,C002,Deep Learning with TensorFlow and Keras,Explore neural network architectures using Ten...
2,C003,Natural Language Processing Fundamentals,Dive into NLP techniques for processing and un...
3,C004,Computer Vision and Image Processing,Learn the principles of computer vision and im...
4,C005,Reinforcement Learning Basics,Get introduced to reinforcement learning parad...


In [8]:
df["title_descrp"]=df["title"]+"//"+df["description"]

In [9]:
df.head(5)

Unnamed: 0,course_id,title,description,title_descrp
0,C001,Foundations of Machine Learning,Understand foundational machine learning algor...,Foundations of Machine Learning//Understand fo...
1,C002,Deep Learning with TensorFlow and Keras,Explore neural network architectures using Ten...,Deep Learning with TensorFlow and Keras//Explo...
2,C003,Natural Language Processing Fundamentals,Dive into NLP techniques for processing and un...,Natural Language Processing Fundamentals//Dive...
3,C004,Computer Vision and Image Processing,Learn the principles of computer vision and im...,Computer Vision and Image Processing//Learn th...
4,C005,Reinforcement Learning Basics,Get introduced to reinforcement learning parad...,Reinforcement Learning Basics//Get introduced ...


Generating vector embeddings

In [10]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embeddings = GoogleGenerativeAIEmbeddings(model=embedding_model_name)

In [12]:
import os
vector_db_path = "VectorDB_Chroma_assignment_2"
os.makedirs(vector_db_path,exist_ok=True)

In [32]:
# pip uninstall opentelemetry-sdk opentelemetry-exporter-otlp

In [13]:
from langchain_chroma import Chroma

# Build a persistent Chroma collection; course_id is stored as metadata on each document
vectorstore = Chroma.from_texts(
    texts=df["title_descrp"].to_list(),
    embedding=embeddings,
    metadatas=[{"course_id": cid} for cid in df["course_id"]],
    collection_name="dummydata",
    collection_metadata={"use_type": "TRAINING AND EXPERIMENTATION"},
    persist_directory=vector_db_path,
)
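Because the collection is persisted to disk and the cell above lets Chroma generate random document ids, re-running the cell can add a second copy of every course. One possible way around this is to pass stable ids derived from `course_id`. The helper below is a sketch under that assumption: `build_records` is a hypothetical name, and the rows use placeholder descriptions rather than the real dataset.

```python
def build_records(rows, separator="//"):
    """Turn (course_id, title, description) rows into texts, metadatas and ids.

    Using course_id as the document id is an assumption of this sketch;
    the cell above does not pass ids at all.
    """
    texts, metadatas, ids = [], [], []
    for course_id, title, description in rows:
        texts.append(f"{title}{separator}{description}")
        metadatas.append({"course_id": course_id})
        ids.append(course_id)
    return texts, metadatas, ids


# Placeholder rows for illustration only (not the real descriptions)
rows = [
    ("C001", "Foundations of Machine Learning", "placeholder description"),
    ("C002", "Deep Learning with TensorFlow and Keras", "placeholder description"),
]
texts, metadatas, ids = build_records(rows)
print(ids)
```

The three lists could then be passed as `texts=`, `metadatas=` and `ids=` to `Chroma.from_texts`. Whether a repeated id overwrites or raises depends on the installed version, so it is worth checking the document count after a second run.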

In [14]:
print("Number of docs dumped into vector DB")
print(len(vectorstore.get()['ids']))

Number of docs dumped into vector DB
25


In [15]:
# using vector db object to initialize a retriever object - to perform vector search/retrieval
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})

In [16]:
retrieved_docs = retriever.invoke("Python Programming for data science")
len(retrieved_docs)

5
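To make the "similarity" search less of a black box, here is a minimal brute-force sketch of what a top-k retrieval does conceptually: score every stored vector against the query vector and keep the k closest. The three-dimensional vectors are made up, and cosine distance is an assumption of this sketch; the metric Chroma actually uses depends on how the collection is configured, and real stores use an approximate index rather than a full scan.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity, so 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Score every document against the query and return the k closest."""
    scored = [(doc_id, cosine_distance(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy embeddings, invented for illustration
toy_vectors = {
    "C016": [0.9, 0.1, 0.0],
    "C001": [0.6, 0.4, 0.1],
    "C009": [0.0, 0.2, 0.9],
}
ranked = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print([doc_id for doc_id, _ in ranked])  # ['C016', 'C001']
```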

In [None]:
# Test
ret_docs = vectorstore.similarity_search_with_score("Python programming for data science",k=5)
print(ret_docs[0])

(Document(id='128d285f-93ce-462b-a653-1e3cce4a17fa', metadata={'course_id': 'C016'}, page_content='Python Programming for Data Science//Learn Python fundamentals for data science: variables, control flow, functions, and object-oriented programming. Advance to data handling with pandas, numerical computing with NumPy, and basic plotting with matplotlib. You’ll build reproducible data workflows, clean and transform datasets, and perform exploratory analysis, laying the groundwork for machine learning and statistical modeling projects.'), 0.16565607488155365)


In [18]:
retrieved_docs[1].page_content


'Foundations of Machine Learning//Understand foundational machine learning algorithms including regression, classification, clustering, and dimensionality reduction. This course covers data pre-processing, feature engineering, model selection, hyperparameter tuning, and evaluation metrics. Hands-on labs use scikit-learn and Python to implement end-to-end workflows on real-world datasets, preparing learners for practical machine learning applications with interactive engaging exercises.'
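Since each stored text is the title and description joined with `//`, the two parts can be recovered from `page_content` when only the title is needed for display. The helper below is a small sketch; `split_title_description` is a hypothetical name, and the sample string is a shortened copy of the output above.

```python
def split_title_description(page_content, separator="//"):
    """Split on the first separator only, so later '//' in a description is kept."""
    title, _, description = page_content.partition(separator)
    return title.strip(), description.strip()


sample = (
    "Foundations of Machine Learning//Understand foundational machine learning "
    "algorithms including regression, classification, clustering, and dimensionality reduction."
)
title, description = split_title_description(sample)
print(title)  # Foundations of Machine Learning
```

This only works if no title contains `//` itself; storing the title as a separate metadata field would avoid that assumption.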

In [19]:
# Perform similarity search in Chroma vector store
results = vectorstore.similarity_search_with_score("python programming for data science", k=5)

# Extract course IDs and distance scores (lower score = closer match)
recommendations = [(doc.metadata["course_id"], score) for doc, score in results]

In [20]:
print(results)

[(Document(id='128d285f-93ce-462b-a653-1e3cce4a17fa', metadata={'course_id': 'C016'}, page_content='Python Programming for Data Science//Learn Python fundamentals for data science: variables, control flow, functions, and object-oriented programming. Advance to data handling with pandas, numerical computing with NumPy, and basic plotting with matplotlib. You’ll build reproducible data workflows, clean and transform datasets, and perform exploratory analysis, laying the groundwork for machine learning and statistical modeling projects.'), 0.21110709011554718), (Document(id='02f6a707-a939-406d-916e-4022995801a4', metadata={'course_id': 'C001'}, page_content='Foundations of Machine Learning//Understand foundational machine learning algorithms including regression, classification, clustering, and dimensionality reduction. This course covers data pre-processing, feature engineering, model selection, hyperparameter tuning, and evaluation metrics. Hands-on labs use scikit-learn and Python to i

In [21]:
print(recommendations)

[('C016', 0.21110709011554718), ('C001', 0.39871352910995483), ('C017', 0.4062874913215637), ('C003', 0.42717573046684265), ('C002', 0.44606900215148926)]
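The top hit here is the very course named in the query, which would not be a useful recommendation for a learner who has already completed it. A possible post-processing step, sketched below, drops completed courses from the scored list. `exclude_completed` is a hypothetical helper; the scores are copied from the output above and are treated as distances (lower is better).

```python
# Scores copied from the printed recommendations above
recommendations = [
    ("C016", 0.21110709011554718),
    ("C001", 0.39871352910995483),
    ("C017", 0.4062874913215637),
    ("C003", 0.42717573046684265),
    ("C002", 0.44606900215148926),
]


def exclude_completed(scored, completed_ids, k=5):
    """Drop already-completed courses and return up to k closest remaining ones."""
    completed = set(completed_ids)
    remaining = [(cid, dist) for cid, dist in scored if cid not in completed]
    remaining.sort(key=lambda pair: pair[1])  # smallest distance first
    return remaining[:k]


next_steps = exclude_completed(recommendations, completed_ids=["C016"])
print([cid for cid, _ in next_steps])  # ['C001', 'C017', 'C003', 'C002']
```

Filtering shrinks the list, so to still end up with five results one would need to request more than five from the vector store, for example `k = 5 + len(completed_ids)`.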


Generation 

In [23]:
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """
Answer this question using the provided context only. List recommendations and comment on relevance. Give the course ID as well. Ensure you give the top 5 recommended courses.
Give the response in markdown format.
{question}

Context:
{context}
"""

prompt = PromptTemplate.from_template(message)
prompt

PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='\nAnswer this question using the provided context only. List recommendations and comment on relevance. Give the course ID as well. Ensure you give the top 5 recommended courses.\nGive the response in markdown format.\n{question}\n\nContext:\n{context}\n')

In [24]:
print(retriever)

tags=['Chroma', 'GoogleGenerativeAIEmbeddings'] vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x71b774222450> search_kwargs={'k': 5}


In [25]:
from langchain.chat_models import init_chat_model
llm = init_chat_model(model_name, model_provider="google_genai")

# LCEL chain: the retriever fills {context}; RunnablePassthrough forwards the user's question unchanged into {question} at run time
rag_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm
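In the chain above the retriever output goes into the prompt as-is, so the `{context}` slot receives the string form of a list of `Document` objects (ids, metadata dicts and all). A common alternative is to format the documents into plain lines first. The sketch below shows one possible formatter; `Doc` is a stand-in class so the example runs without LangChain, `format_docs` is a hypothetical helper, and the sample descriptions are shortened.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document: text plus a metadata dict."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Render each document as '[course_id] title: description', one per line."""
    lines = []
    for doc in docs:
        course_id = doc.metadata.get("course_id", "unknown")
        title, _, description = doc.page_content.partition("//")
        lines.append(f"[{course_id}] {title}: {description}")
    return "\n".join(lines)


docs = [
    Doc("Python Programming for Data Science//Learn Python fundamentals for data science.",
        {"course_id": "C016"}),
    Doc("Foundations of Machine Learning//Understand foundational machine learning algorithms.",
        {"course_id": "C001"}),
]
print(format_docs(docs))
```

With real retrieved documents, which expose the same two attributes, the formatter could likely be chained as `retriever | format_docs` in place of the bare `retriever`. That wiring has not been run in this notebook.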

In [26]:
response = rag_chain.invoke("python programming")

print(response.content)

Based on the context provided, here are the top 5 recommended courses related to Python programming, along with comments on their relevance:

**Top 5 Recommended Courses:**

1.  **Course ID: C016** - *Python Programming for Data Science*
    *   **Relevance:** Highly relevant. This course directly focuses on Python programming, specifically tailored for data science applications. It covers fundamental Python concepts and essential libraries like pandas, NumPy, and matplotlib, crucial for anyone working with data in Python.
2.  **Course ID: C004** - *Computer Vision and Image Processing*
    *   **Relevance:** Relevant. While not solely focused on basic Python, this course uses Python extensively with libraries like OpenCV, scikit-image, and TensorFlow. It's a good choice if you're interested in applying Python to image-related tasks.
3.  **Course ID: C001** - *Foundations of Machine Learning*
    *   **Relevance:** Relevant. This course uses Python's scikit-learn library for implementi

Test Results

In [27]:
response_1 = rag_chain.invoke("""I’ve completed the ‘Python Programming for Data Science’ course and enjoy data 
visualization. What should I take next?""")

print(response_1.content)

Based on your completion of 'Python Programming for Data Science' and your interest in data visualization, here are the top 5 recommended courses:

1.  **Data Visualization with Tableau (C014)**: This course directly builds upon your interest in data visualization. It focuses on using Tableau to create compelling visuals and interactive dashboards. **Relevance:** High, as it aligns perfectly with your stated interest.

2.  **R Programming and Statistical Analysis (C017)**: Since you enjoy data visualization, learning R and ggplot2 can provide alternative visualization capabilities. This course also covers statistical analysis, broadening your data science skillset. **Relevance:** High, provides alternative visualization and expands analytical skills.

3.  **SQL for Data Analysis (C012)**: SQL is essential for data extraction and manipulation, a critical step before visualization. This course helps you retrieve and prepare data for visualization tools. **Relevance:** Medium, as SQL skil

In [28]:
response_2 = rag_chain.invoke("""I know Azure basics and want to manage containers and build CI/CD pipelines. 
Recommend courses.""")

print(response_2.content)

Based on your interest in managing containers and building CI/CD pipelines with your existing Azure knowledge, here are the top 3 recommended courses from the provided context:

1.  **Course ID: C009**
    *   **Course Title:** Containerization with Docker and Kubernetes
    *   **Relevance:** This course directly addresses your interest in container management. It covers Docker and Kubernetes, essential tools for containerizing and orchestrating applications. It's highly relevant for deploying applications on Azure.

2.  **Course ID: C008**
    *   **Course Title:** DevOps Practices and CI/CD
    *   **Relevance:** This course is crucial for building CI/CD pipelines. It covers essential DevOps tools like Git, Jenkins/GitHub Actions, Terraform, and automated testing. It directly aligns with your goal of automating software delivery.

3.  **Course ID: C007**
    *   **Course Title:** Cloud Computing with Azure
    *   **Relevance:** This course is relevant as it covers Azure Kubernetes 

In [29]:
response_3 = rag_chain.invoke("""My background is in ML fundamentals; I’d like to specialize in neural networks and 
production workflows.""")

print(response_3.content)

Here are the top 5 recommended courses based on your background and interests, along with relevance comments:

1.  **Deep Learning with TensorFlow and Keras (C002)**: Highly relevant. This course directly addresses your interest in neural networks and provides hands-on experience with TensorFlow and Keras, essential frameworks for deep learning.

2.  **MLOps: Productionizing Machine Learning (C025)**: Highly relevant. This course is crucial for your goal of specializing in production workflows. It covers the tools and practices for deploying and maintaining ML models at scale.

3.  **Computer Vision and Image Processing (C004)**: Relevant. Since you want to specialize in neural networks, this course teaches CNNs which is a type of neural network.

4.  **Foundations of Machine Learning (C001)**: Relevant. This course will give you a better understanding of the fundamentals of machine learning.

5.  **Reinforcement Learning Basics (C005)**: Somewhat relevant. This course teaches reinforc

In [30]:
response_4 = rag_chain.invoke("""I want to learn to build and deploy microservices with Kubernetes—what courses fit 
best?""")

print(response_4.content)

Here are the top 5 recommended courses based on your interest in learning to build and deploy microservices with Kubernetes, along with their relevance and course IDs:

1.  **Containerization with Docker and Kubernetes (C009)**
    *   **Relevance:** This course directly addresses your request. It covers Docker fundamentals and Kubernetes orchestration, including deploying microservices architectures. It is highly relevant.
2.  **APIs and Microservices Architecture (C010)**
    *   **Relevance:** This course is also highly relevant. It focuses on designing and implementing APIs and microservices, covering patterns, deployment, versioning, and security. It provides the architectural knowledge needed for building microservices.
3.  **Cloud Computing with Azure (C007)**
    *   **Relevance:** This course includes Azure Kubernetes Service (AKS). If you plan to deploy your microservices on Azure, this course is very relevant, as it will give you hands-on experience with the cloud platform.


In [31]:
response_5 = rag_chain.invoke("""I’m interested in blockchain and smart contracts but have no prior experience. Which 
courses do you suggest?""")

print(response_5.content)

Based on your interest in blockchain and smart contracts with no prior experience, here's a recommended course:

**Top Recommendation:**

*   **C023 - Blockchain Technology and Smart Contracts:** This course is highly relevant as it directly addresses your interest. It covers blockchain fundamentals like cryptographic hashing, consensus algorithms, and distributed ledgers. It also teaches smart contract development using Solidity on Ethereum, covering token standards, decentralized application patterns, and security best practices. The hands-on labs provide practical experience in deploying contracts and building a decentralized marketplace.

The other courses are not relevant to your request.
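The responses above do not always contain five courses even though the prompt asks for them (this one lists a single course). A quick way to check this automatically is to pull the course IDs out of the response text and count them. The sketch below assumes IDs always look like `C` followed by three digits, as in the catalog shown earlier; `extract_course_ids` is a hypothetical helper and the sample strings are excerpts of the outputs above.

```python
import re

# Assumed ID format: 'C' followed by exactly three digits (e.g. C023)
COURSE_ID_PATTERN = re.compile(r"\bC\d{3}\b")


def extract_course_ids(text):
    """Return distinct course IDs in order of first appearance."""
    seen = []
    for course_id in COURSE_ID_PATTERN.findall(text):
        if course_id not in seen:
            seen.append(course_id)
    return seen


blockchain_excerpt = "*   **C023 - Blockchain Technology and Smart Contracts:** This course is highly relevant"
kubernetes_excerpt = (
    "1.  **Containerization with Docker and Kubernetes (C009)**\n"
    "2.  **APIs and Microservices Architecture (C010)**\n"
    "3.  **Cloud Computing with Azure (C007)**"
)

for name, text in [("blockchain", blockchain_excerpt), ("kubernetes", kubernetes_excerpt)]:
    ids = extract_course_ids(text)
    print(name, ids, "meets top-5" if len(ids) >= 5 else "fewer than 5")
```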
