### Assignment 2: Personalized Course Recommendation Engine


1. Background & Context
Online learning platforms host thousands of courses across many domains, and learners 
often feel overwhelmed by the choice. A personalized recommender that understands both 
course content and individual learner profiles can boost engagement and completion rates 
by suggesting the most relevant next steps.

2. Problem Statement
“Design and implement a Course Recommendation Engine that—given a user query
(completed courses + a short interests blurb)—returns the top-5 most relevant courses 
from a catalog of course offerings, using embedding models and a vector database for 
semantic matching.”
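
The problem statement describes the user query as two parts: completed courses and an interests blurb. A minimal sketch of flattening both into one search string is shown here; the helper name `build_query` and its sentence template are hypothetical and are not prescribed by the assignment.

```python
def build_query(completed_courses, interests):
    """Combine completed course titles and an interests blurb into one query string.

    Hypothetical helper: the name and the wording template are illustrative assumptions.
    """
    interests = interests.strip()
    if completed_courses:
        done = ", ".join(title.strip() for title in completed_courses)
        return f"I have completed: {done}. My interests: {interests}"
    return f"My interests: {interests}"


query = build_query(["Python Programming for Data Science"], "data visualization")
print(query)
```

The resulting string can then be passed to whatever retrieval step is used, in the same way as the free-text queries later in this notebook.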


### Installing necessary packages

In [2]:
!python -m pip install pypdf sentence-transformers faiss-cpu pandas langchain langchain-google-genai langchain-chroma --quiet


#### Importing libraries

In [4]:

from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import pandas as pd




### Assigning the embedding model and model name for LLM

In [3]:
import os

embedding_model_name = "models/gemini-embedding-001"
model_name = "gemini-2.0-flash"
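
The Gemini integrations used later need an API key, and this notebook does not show where it is set. As a sketch, assuming the key is read from the `GOOGLE_API_KEY` environment variable, it could be supplied as follows; `ensure_api_key` is a hypothetical helper name.

```python
import os
from getpass import getpass


def ensure_api_key(env_var="GOOGLE_API_KEY", ask=getpass):
    """Set env_var from a prompt if it is not already defined.

    Hypothetical helper: `ask` is injectable so the interactive prompt
    can be replaced in non-interactive runs.
    """
    if not os.environ.get(env_var):
        os.environ[env_var] = ask(f"Enter {env_var}: ")
    return env_var in os.environ
```

Calling `ensure_api_key()` once before creating the embedding or chat model objects keeps the key out of the notebook source.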

### Reading the dataset

In [5]:
df=pd.read_csv("dummydatadataset.csv")
df.head(5)

| Unnamed: 0 | course_id | title | description |
|---|---|---|---|
| 0 | C001 | Foundations of Machine Learning | Understand foundational machine learning algor... |
| 1 | C002 | Deep Learning with TensorFlow and Keras | Explore neural network architectures using Ten... |
| 2 | C003 | Natural Language Processing Fundamentals | Dive into NLP techniques for processing and un... |
| 3 | C004 | Computer Vision and Image Processing | Learn the principles of computer vision and im... |
| 4 | C005 | Reinforcement Learning Basics | Get introduced to reinforcement learning parad... |


In [6]:
# Combine title and description into a single text field to embed
df["title_descrp"] = df["title"] + "//" + df["description"]

In [7]:
df.head(5)

| Unnamed: 0 | course_id | title | description | title_descrp |
|---|---|---|---|---|
| 0 | C001 | Foundations of Machine Learning | Understand foundational machine learning algor... | Foundations of Machine Learning//Understand fo... |
| 1 | C002 | Deep Learning with TensorFlow and Keras | Explore neural network architectures using Ten... | Deep Learning with TensorFlow and Keras//Explo... |
| 2 | C003 | Natural Language Processing Fundamentals | Dive into NLP techniques for processing and un... | Natural Language Processing Fundamentals//Dive... |
| 3 | C004 | Computer Vision and Image Processing | Learn the principles of computer vision and im... | Computer Vision and Image Processing//Learn th... |
| 4 | C005 | Reinforcement Learning Basics | Get introduced to reinforcement learning parad... | Reinforcement Learning Basics//Get introduced ... |


### Generating vector embeddings

In [8]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embeddings = GoogleGenerativeAIEmbeddings(model=embedding_model_name)

In [23]:
import os
vector_db_path = "VectorDB_Chroma_assignment_2"
os.makedirs(vector_db_path,exist_ok=True)

In [None]:
from langchain_chroma import Chroma

# Embed each course text and persist it, adding course_id as metadata
vectorstore = Chroma.from_texts(
    texts=df["title_descrp"].to_list(),
    embedding=embeddings,
    metadatas=[{"course_id": cid} for cid in df["course_id"]],
    collection_name="dummydata",
    collection_metadata={"use_type": "TRAINING AND EXPERIMENTATION"},
    persist_directory=vector_db_path,
)

In [25]:
print("Number of docs dumped into vector DB")
print(len(vectorstore.get()['ids']))

Number of docs dumped into vector DB
25


In [26]:
# using vector db object to initialize a retriever object - to perform vector search/retrieval
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})
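
To make the `k=5` similarity retrieval concrete, here is a rough NumPy sketch of ranking documents by distance to a query vector. It assumes cosine distance purely for illustration; the metric Chroma actually applies depends on how the collection is configured. The function name and the toy vectors are made up.

```python
import numpy as np


def top_k_by_cosine_distance(query_vec, doc_vecs, k=5):
    """Return (row_index, distance) pairs for the k rows of doc_vecs closest to query_vec."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    # Normalise so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    distances = 1.0 - d @ q
    order = np.argsort(distances)[:k]
    return [(int(i), float(distances[i])) for i in order]


# Toy 2-D "embeddings", invented for illustration only.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k_by_cosine_distance([1.0, 0.1], docs, k=2))
```

The smallest distance comes first, which matches the ordering of the scored results printed further down.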

In [27]:
retrieved_docs = retriever.invoke("Python Programming for data science")
len(retrieved_docs)

5

In [None]:
#### testing
ret_docs = vectorstore.similarity_search_with_score("Python programming for data science",k=5)
print(ret_docs[0])

(Document(id='0f4b0ec5-5161-4b84-a758-537c466883d7', metadata={'course_id': 'C016'}, page_content='Python Programming for Data Science//Learn Python fundamentals for data science: variables, control flow, functions, and object-oriented programming. Advance to data handling with pandas, numerical computing with NumPy, and basic plotting with matplotlib. You’ll build reproducible data workflows, clean and transform datasets, and perform exploratory analysis, laying the groundwork for machine learning and statistical modeling projects.'), 0.16565607488155365)


In [29]:
retrieved_docs[1].page_content

'Foundations of Machine Learning//Understand foundational machine learning algorithms including regression, classification, clustering, and dimensionality reduction. This course covers data pre-processing, feature engineering, model selection, hyperparameter tuning, and evaluation metrics. Hands-on labs use scikit-learn and Python to implement end-to-end workflows on real-world datasets, preparing learners for practical machine learning applications with interactive engaging exercises.'

In [32]:

# Perform similarity search in the Chroma vector store
results = vectorstore.similarity_search_with_score("python programming for data science", k=5)

# Extract course IDs and scores (Chroma returns distances, so lower means more similar)
recommendations = [(doc.metadata["course_id"], score) for doc, score in results]


In [34]:
print(results)

[(Document(id='0f4b0ec5-5161-4b84-a758-537c466883d7', metadata={'course_id': 'C016'}, page_content='Python Programming for Data Science//Learn Python fundamentals for data science: variables, control flow, functions, and object-oriented programming. Advance to data handling with pandas, numerical computing with NumPy, and basic plotting with matplotlib. You’ll build reproducible data workflows, clean and transform datasets, and perform exploratory analysis, laying the groundwork for machine learning and statistical modeling projects.'), 0.21110709011554718), (Document(id='5d398890-3a56-4a76-8bde-35b6634ce89a', metadata={'course_id': 'C001'}, page_content='Foundations of Machine Learning//Understand foundational machine learning algorithms including regression, classification, clustering, and dimensionality reduction. This course covers data pre-processing, feature engineering, model selection, hyperparameter tuning, and evaluation metrics. Hands-on labs use scikit-learn and Python to i

In [33]:
print(recommendations)

[('C016', 0.21110709011554718), ('C001', 0.39871352910995483), ('C017', 0.4062874913215637), ('C003', 0.42717573046684265), ('C002', 0.44606900215148926)]
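
The top hit here, C016, is the kind of course a learner may already have completed. One possible post-processing step, sketched under the assumption that completed course IDs are known, is to drop those IDs before returning recommendations. `exclude_completed` is a hypothetical helper, not part of the notebook.

```python
def exclude_completed(recommendations, completed_ids, k=5):
    """Drop already-completed courses and return the k closest remaining ones.

    recommendations: list of (course_id, distance) pairs, lower distance = closer.
    """
    completed = set(completed_ids)
    remaining = [(cid, dist) for cid, dist in recommendations if cid not in completed]
    return sorted(remaining, key=lambda pair: pair[1])[:k]


# Scores copied from the printed output above, rounded to four decimals.
recs = [("C016", 0.2111), ("C001", 0.3987), ("C017", 0.4063), ("C003", 0.4272), ("C002", 0.4461)]
print(exclude_completed(recs, {"C016"}, k=3))
# [('C001', 0.3987), ('C017', 0.4063), ('C003', 0.4272)]
```

Because filtering shrinks the list, retrieving more than five candidates up front would be needed to still return five courses.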


### Generation

In [60]:
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """
Answer this question using the provided context only. List recommendations and comment on relevance. Give the course ID as well. Ensure to give the top 5 recommended courses.
Give the response in markdown format.
{question}

Context:
{context}
"""

prompt = PromptTemplate.from_template(message)
prompt

PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='\nAnswer this question using the provided context only. List recommendations and comment on relevance. Give the course ID as well. Ensure to give the top 5 recommended courses.\nGive the response in markdown format.\n{question}\n\nContext:\n{context}\n')

In [61]:
print(retriever)

tags=['Chroma', 'GoogleGenerativeAIEmbeddings'] vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x7a76527eff50> search_kwargs={'k': 5}


In [62]:
from langchain.chat_models import init_chat_model
llm = init_chat_model(model_name, model_provider="google_genai")

# Chain the steps with LangChain: the retriever fills "context", and
# RunnablePassthrough forwards the user's question unchanged at runtime
rag_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm
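
As written, the chain hands the retriever's list of documents to the prompt as-is. An optional variation, sketched here, renders each document as one readable line first. `format_docs` and `FakeDoc` are hypothetical names, and the sample description is placeholder text rather than catalog data.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document, used only to make the sketch runnable."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Render each document as 'course_id | title: description', one per line."""
    lines = []
    for doc in docs:
        # The notebook joins title and description with "//".
        title, _, description = doc.page_content.partition("//")
        course_id = doc.metadata.get("course_id", "unknown")
        lines.append(f"{course_id} | {title}: {description}")
    return "\n".join(lines)


sample = [FakeDoc("Reinforcement Learning Basics//Short placeholder description.", {"course_id": "C005"})]
print(format_docs(sample))
```

In a LangChain pipeline, a function like this could be piped after the retriever (for example `retriever | format_docs`) so that the prompt receives plain text with the course ID attached.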


In [63]:
response = rag_chain.invoke("python programming")

print(response.content)

Here are the top 5 recommended courses based on the context, along with their relevance to Python programming:

1.  **Course ID:** C016
    *   **Course Title:** Python Programming for Data Science
    *   **Relevance:** This course is directly relevant as it focuses on Python programming fundamentals specifically for data science, covering essential libraries like pandas, NumPy, and matplotlib.

2.  **Course ID:** C001
    *   **Course Title:** Foundations of Machine Learning
    *   **Relevance:** Highly relevant. While not solely focused on Python, it uses Python and scikit-learn for implementing machine learning algorithms.

3.  **Course ID:** C004
    *   **Course Title:** Computer Vision and Image Processing
    *   **Relevance:** Relevant. This course uses Python libraries like OpenCV, scikit-image, and TensorFlow for computer vision tasks.

4.  **Course ID:** C003
    *   **Course Title:** Natural Language Processing Fundamentals
    *   **Relevance:** Relevant. This course use

### Test Results
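
Each test query in this section is sent through `rag_chain.invoke` in its own cell. As a minimal sketch, the same runs could be collected in one loop. `run_queries` and `EchoChain` are hypothetical names, and `EchoChain` only imitates the `.invoke(...)` / `.content` shape so that the example runs without an API key.

```python
from types import SimpleNamespace


class EchoChain:
    """Stand-in for a chain: returns an object whose .content echoes the query."""

    def invoke(self, query):
        return SimpleNamespace(content=f"echo: {query}")


def run_queries(chain, queries):
    """Invoke the chain once per query and map each query to its response text."""
    results = {}
    for query in queries:
        response = chain.invoke(query)
        # Fall back to str() if the response has no .content attribute.
        results[query] = getattr(response, "content", str(response))
    return results


answers = run_queries(EchoChain(), ["python programming", "blockchain"])
for question, answer in answers.items():
    print(question, "->", answer)
```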

#### 1. “I’ve completed the ‘Python Programming for Data Science’ course and enjoy data visualization. What should I take next?”

In [64]:
response_1 = rag_chain.invoke("""I’ve completed the ‘Python Programming for Data Science’ course and enjoy data 
visualization. What should I take next?""")

print(response_1.content)

Okay, based on your completion of 'Python Programming for Data Science' and your interest in data visualization, here are my top 5 recommended courses:

1.  **Data Visualization with Tableau (C014)**
    *   **Relevance:** Directly addresses your interest in data visualization. Since you enjoyed data visualization and have a Python background, Tableau will provide a strong platform to expand those skills.
2.  **R Programming and Statistical Analysis (C017)**
    *   **Relevance:** Builds on your programming skills and introduces another popular language for data analysis and visualization. Knowledge of R and its visualization libraries (ggplot2) would be valuable.
3.  **SQL for Data Analysis (C012)**
    *   **Relevance:** Essential for data professionals. Being able to extract and manipulate data using SQL is crucial for creating visualizations from various data sources.
4.  **Foundations of Machine Learning (C001)**
    *   **Relevance:** Provides a strong foundation for machine lear

#### 2. “I know Azure basics and want to manage containers and build CI/CD pipelines. Recommend courses.”

In [65]:
response_2 = rag_chain.invoke("""I know Azure basics and want to manage containers and build CI/CD pipelines. 
Recommend courses.""")

print(response_2.content)

Based on your interest in managing containers and building CI/CD pipelines after learning Azure basics, here are the top 3 recommended courses:

1.  **Containerization with Docker and Kubernetes (C009)**: This course is highly relevant as it directly addresses container management using Docker and Kubernetes. You'll learn the fundamentals of containerization, orchestration, and deployment of microservices, which aligns perfectly with your goals.
2.  **DevOps Practices and CI/CD (C008)**: This course is highly relevant as it covers essential DevOps methodologies, including CI/CD pipeline implementation, version control with Git, and automation using tools like Jenkins or GitHub Actions. It also touches upon container registry integration, which complements the containerization course.
3.  **Cloud Computing with Azure (C007)**: This course is relevant as it will allow you to further build your Azure skills in areas such as Azure Kubernetes Service and other relevant services.

The remain

#### 3. My background is in ML fundamentals; I’d like to specialize in neural networks and production workflows.

In [66]:
response_3 = rag_chain.invoke("""My background is in ML fundamentals; I’d like to specialize in neural networks and 
production workflows.""")

print(response_3.content)

Based on your background in ML fundamentals and interest in specializing in neural networks and production workflows, here are my top 5 course recommendations:

1.  **Course ID:** C002

    **Course Title:** Deep Learning with TensorFlow and Keras

    **Relevance:** This course is highly relevant as it directly focuses on neural network architectures using TensorFlow and Keras. It covers various types of neural networks (feedforward, CNNs, RNNs) and their applications, which is exactly what you're looking for.

2.  **Course ID:** C025

    **Course Title:** MLOps: Productionizing Machine Learning

    **Relevance:** This course is essential for your interest in production workflows. It covers the practices needed to deploy and maintain ML models at scale, including model versioning, CI/CD, monitoring, and data drift detection.

3.  **Course ID:** C004

    **Course Title:** Computer Vision and Image Processing

    **Relevance:** This course teaches the principles of computer vision a

#### 4. I want to learn to build and deploy microservices with Kubernetes—what courses fit best?

In [67]:
response_4 = rag_chain.invoke("""I want to learn to build and deploy microservices with Kubernetes—what courses fit 
best?""")

print(response_4.content)

Based on your request to learn how to build and deploy microservices with Kubernetes, here are the top 5 recommended courses from the provided context:

1.  **C009 - Containerization with Docker and Kubernetes:** This course is highly relevant as it directly covers container fundamentals with Docker and Kubernetes, including orchestration, deployments, services, ingress, cluster provisioning, autoscaling, rolling updates, and Helm. The hands-on labs deploy microservices architectures, making it an ideal choice.
2.  **C010 - APIs and Microservices Architecture:** This course is highly relevant as it focuses on designing and implementing RESTful and GraphQL APIs for microservices. It covers microservices patterns like service discovery, circuit breakers, API gateways, containerized deployment, versioning, and security. The labs guide you through building, testing, and deploying interconnected services.
3.  **C008 - DevOps Practices and CI/CD:** This course is relevant as it teaches DevOp

#### 5. I’m interested in blockchain and smart contracts but have no prior experience. Which courses do you suggest?

In [68]:
response_5 = rag_chain.invoke("""I’m interested in blockchain and smart contracts but have no prior experience. Which 
courses do you suggest?""")

print(response_5.content)

Based on your interest in blockchain and smart contracts with no prior experience, here are the top recommended courses from the provided context:

**Top 1 Recommended Course:**

*   **Course ID:** C023
*   **Course Title:** Blockchain Technology and Smart Contracts
*   **Relevance:** This course is highly relevant. It directly addresses your stated interest in blockchain and smart contracts. It covers fundamental concepts like cryptographic hashing, consensus algorithms, and distributed ledgers, as well as practical skills like developing smart contracts using Solidity on Ethereum. It also covers token standards, DApp patterns, and security, making it comprehensive for beginners.

Since only one course is directly related to the request, here are some courses that might be helpful to learn before or after the main course.

**Additional Recommended Courses (to complement C023):**
*(These courses are recommended to enhance your skills in related areas that might be beneficial for blockc