# RAG Application with LangChain and HuggingFace LLM

In [1]:
# Install the necessary packages
!pip install torch -q
!pip install transformers -q
!pip install numpy -q
!pip install langchain -q
!pip install langchain_community -q
!pip install langchain-chroma -q
!pip install sentence_transformers -q

In [2]:
import os
from google.colab import userdata  # reads secrets stored in Colab's Secrets panel

### Initialize HuggingFace LLM

Model repository URL: https://huggingface.co/mistralai/Mistral-7B-v0.1

In [3]:
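from langchain_community.llms import HuggingFaceHub

# Initialize the Hugging Face LLM (queried through the Hugging Face Inference API).
# Note: HuggingFaceHub is deprecated in recent LangChain releases; the
# langchain-huggingface package provides HuggingFaceEndpoint as its replacement.
llm = HuggingFaceHub(
    repo_id="mistralai/Mistral-7B-v0.1",
    model_kwargs={"temperature": 0.1, "max_length": 500},
    # API token read from the Colab secret named HUGGINGFACE_API_KEY
    huggingfacehub_api_token=userdata.get("HUGGINGFACE_API_KEY"),
)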
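The `temperature` value in `model_kwargs` rescales the model's next-token logits before sampling. Dividing by a small value such as 0.1 sharpens the distribution so that generation is close to greedy, which suits answering from a fixed context. The sketch below illustrates this with made-up logits; it is a simplified model of temperature scaling, not the code the Inference API actually runs.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw logits into probabilities after dividing by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
```

At `t = 0.1` nearly all of the probability mass lands on the highest-scoring token, whereas at `t = 1.0` the other tokens keep a noticeable share.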


### Initialize Embedding Model

Model documentation (Sentence Transformers): https://sbert.net/

In [4]:
from langchain_community.embeddings import HuggingFaceEmbeddings

# Sentence Transformers model that maps each text chunk to a dense vector
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
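`all-mpnet-base-v2` turns a text into a single 768-dimensional vector by averaging (mean-pooling) the transformer's token embeddings and then scaling the result to unit length. The sketch below reproduces only those last two steps on tiny, made-up 3-dimensional token vectors, using an attention mask to skip padding; the real computation happens inside `sentence-transformers`.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask value is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                summed[i] += value
            count += 1
    return [s / count for s in summed]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


# Hypothetical token embeddings for a 4-token input; the last position is padding.
tokens = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0], [2.0, 1.0, 1.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]

sentence_vector = l2_normalize(mean_pool(tokens, mask))
print(sentence_vector)
```

Because the output vectors have unit length, their dot product equals their cosine similarity, which is what makes them convenient for the similarity search used later.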



### Initialize Output Parser

In [5]:
from langchain_core.output_parsers import StrOutputParser

# Converts the LLM output into a plain Python string
output_parser = StrOutputParser()

### Load PDF Document

In [6]:
!pip install pypdf -qU

In [7]:
from langchain_community.document_loaders import PyPDFLoader

# Load the PDF document; PyPDFLoader returns one Document per page
loader = PyPDFLoader("/content/codeprolk.pdf")

docs = loader.load()

In [8]:
len(docs)

4

In [9]:
docs[0]

Document(metadata={'source': '/content/codeprolk.pdf', 'page': 0}, page_content="Introduction to CodePRO LK  \nCodePRO LK  is a dynamic educational platform that offers a diverse range of technology -\nrelated courses in Sinhala, aimed at empowering Sri Lankans with valuable skills in \nprogramming, data science, and machine learning. Founded by Dinesh Piyasamara  during the \nCOVID -19 pandemic, CodePRO LK addresses the growing need for accessible, high -quality \ntech education tailored to the local community.  \n \nFounding and Vision  \nOrigin and Motivation  \nThe inception of CodePRO LK was driven by the challenges posed by the COVID -19 pandemic, \nwhich highlighted the necessity for remote learning and digital skills. Recognizing this, Dinesh \nPiyasamara launched CodePRO LK to provide Sri Lankan students with the  tools and knowledge \nto thrive in a digitally -driven world, all through their native language.  \nVision and Mission  \n• Vision : To assist talented Sri Lankans i

### Split Documents into Chunks

In [10]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Initialize the text splitter (chunk_size and chunk_overlap are counted in characters)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)

# Split the documents into chunks
splits = text_splitter.split_documents(docs)

In [11]:
len(splits)

20
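`chunk_size=400` and `chunk_overlap=50` are measured in characters, since the splitter's default length function is `len`. `RecursiveCharacterTextSplitter` first tries to break on paragraph, line, and word boundaries, so its chunks vary in length. The simplified fixed-window splitter below ignores those separators and only shows how the overlap makes neighbouring chunks share their boundary text; it is an illustration, not LangChain's algorithm.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of at most `chunk_size` characters.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbouring chunks share `chunk_overlap` characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Placeholder text of 1000 characters (a repeating alphabet).
sample = "".join(chr(ord("a") + i % 26) for i in range(1000))
chunks = split_with_overlap(sample, chunk_size=400, chunk_overlap=50)
print([len(c) for c in chunks])
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one of the two neighbouring chunks, as long as it is shorter than the overlap.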

### Create Vector Store and Retriever

In [12]:
from langchain_chroma import Chroma

# Create a vector store from the document chunks
vectorstore = Chroma.from_documents(documents=splits, embedding=embedding_model)

In [20]:
# Create a retriever from the vector store
retriever = vectorstore.as_retriever()
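Called without arguments, `as_retriever()` performs a plain similarity search, and with Chroma's defaults it returns the 4 closest chunks. Conceptually this is a nearest-neighbour lookup over the stored embeddings. The sketch below ranks hypothetical chunks by cosine similarity to a query vector. Chroma itself uses an approximate index and, by default, L2 distance, which yields the same ordering as cosine similarity when the vectors have unit length.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=4):
    """Return the k (text, score) pairs most similar to the query, best first."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 2-D "embeddings"; real all-mpnet-base-v2 vectors have 768 dimensions.
store = [
    ("chunk about courses", [0.9, 0.1]),
    ("chunk about the YouTube channel", [0.1, 0.9]),
    ("chunk about the founder", [0.7, 0.7]),
]
query = [1.0, 0.0]  # pretend embedding of "what are the courses they offer"

for text, score in top_k(query, store, k=2):
    print(f"{score:.3f}  {text}")
```

In the notebook, the number of returned chunks could be changed with `vectorstore.as_retriever(search_kwargs={"k": 2})`.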

### Define Prompt Template

In [21]:
from langchain_core.prompts import ChatPromptTemplate

# Define prompt template
template = """
Answer this question using the provided context only.

{question}

Context:
{context}

Answer:
"""

prompt = ChatPromptTemplate.from_template(template)

In [22]:
prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='\nAnswer this question using the provided context only.\n\n{question}\n\nContext:\n{context}\n\nAnswer:\n'), additional_kwargs={})])
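In the chain assembled in the next section, the retriever's output goes straight into `{context}`, so the prompt receives the Python repr of a list of `Document` objects; the recorded outputs further down show `[Document(metadata=...` inside the context. A common remedy is to join only the `page_content` of each chunk before filling the template. The sketch below shows that idea with plain `str.format`; `Chunk` is a hypothetical stand-in for LangChain's `Document`, `format_docs` is an illustrative helper name, and the chunk texts are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


TEMPLATE = """
Answer this question using the provided context only.

{question}

Context:
{context}

Answer:
"""


def format_docs(docs):
    """Join chunk texts with blank lines instead of embedding their repr."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [
    Chunk("CodePRO LK offers free courses in Sinhala.", {"page": 0}),
    Chunk("The platform also runs a YouTube channel.", {"page": 1}),
]
filled = TEMPLATE.format(question="who is codeprolk?", context=format_docs(retrieved))
print(filled)
```

In LangChain terms, the same effect would come from using `retriever | format_docs` as the `"context"` entry of the chain; piping a runnable into a plain function wraps the function as a `RunnableLambda`.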

### Chain Retriever and Prompt Template with LLM

In [23]:
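from langchain_core.runnables import RunnablePassthrough

# The dict runs both branches on the input question: the retriever fetches the
# matching chunks (inserted into {context} as a list of Document objects) and
# RunnablePassthrough forwards the question unchanged.
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | output_parser
)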
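The `|` operator composes the steps left to right: the question is sent to both the retriever and `RunnablePassthrough`, the resulting dict fills the prompt, the prompt goes to the LLM, and `StrOutputParser` returns text. The sketch below mimics that data flow with ordinary functions. `fake_retriever` and `fake_llm` are hypothetical stubs, so no model or vector store is needed, and the stub context is joined into a string for readability, whereas the notebook inserts the raw list.

```python
def fake_retriever(question):
    """Hypothetical stub: return canned chunk texts instead of querying Chroma."""
    return ["CodePRO LK is an educational platform.", "Courses are taught in Sinhala."]


def passthrough(value):
    """Return the input unchanged, like RunnablePassthrough."""
    return value


def fill_prompt(inputs):
    """Fill the notebook's template from a dict with 'context' and 'question'."""
    return (
        "\nAnswer this question using the provided context only.\n\n"
        f"{inputs['question']}\n\nContext:\n{inputs['context']}\n\nAnswer:\n"
    )


def fake_llm(prompt_text):
    """Hypothetical stub LLM that only reports the prompt length."""
    return f"(stub answer for a {len(prompt_text)}-character prompt)"


def parse_output(text):
    """Return the model output as a plain string, like StrOutputParser."""
    return str(text)


def run_chain(question):
    # Step 1: like the dict literal in the chain, run each branch on the same input.
    inputs = {
        "context": "\n\n".join(fake_retriever(question)),
        "question": passthrough(question),
    }
    # Steps 2-4: prompt -> LLM -> output parser, applied in sequence.
    return parse_output(fake_llm(fill_prompt(inputs)))


print(run_chain("who is codeprolk?"))
```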

#### Invoke RAG Chain with Example Questions

In [24]:
response = chain.invoke("who is codeprolk?")
print(response)

Human: 
Answer this question using the provided context only.

who is codeprolk?

Context:
[Document(metadata={'page': 3, 'source': '/content/codeprolk.pdf'}, page_content='Partnerships and Collaborations  \nCodePRO LK is exploring partnerships with educational institutions, tech companies, and \nindustry experts to enrich its content and provide learners with access to a broader range of \nresources and opportunities. These collaborations aim to bridge the gap be tween education and \nindustry, ensuring that learners are well -prepared for real -world challenges.'), Document(metadata={'page': 1, 'source': '/content/codeprolk.pdf'}, page_content='Community and Support  \nCodePRO LK has cultivated a vibrant community where learners can interact, share insights, and \nsupport each other. Additionally, the platform offers consultation services for personalized \nlearning support.  \n \nCodePRO LK YouTube Channel  \nOverview  \nThe CodePRO LK YouTube Channel  is a crucial extension of the 

In [28]:
response = chain.invoke("what are the courses they offer")
print(response)

Human: 
Answer this question using the provided context only.

what are the courses they offer

Context:
[Document(metadata={'page': 0, 'source': '/content/codeprolk.pdf'}, page_content='Course Offerings  \nVariety and Accessibility  \nCodePRO LK stands out for its wide array of free courses, all presented in Sinhala. The courses \ncater to various proficiency levels, from beginners to intermediates, ensuring that learners of all \nstages can benefit.  \nKey Courses  \n1. Python GUI – Tkinter : This course covers the essentials of creating graphical user'), Document(metadata={'page': 1, 'source': '/content/codeprolk.pdf'}, page_content='Learning Experience  \nCourse Structure  \nEach course is meticulously structured to provide a holistic learning experience, comprising:  \n• Video Lectures : Detailed tutorials that break down complex concepts.  \n• Quizzes : Interactive quizzes to reinforce learning.  \n• Assignments : Hands -on tasks to apply theoretical knowledge.  \nCommunity and S

In [29]:
response = chain.invoke("what are the popular videos in codeprolk youtube channel")
print(response)

Human: 
Answer this question using the provided context only.

what are the popular videos in codeprolk youtube channel

Context:
[Document(metadata={'page': 2, 'source': '/content/codeprolk.pdf'}, page_content='Community Engagement  \nThe YouTube channel has amassed a substantial following, with thousands of subscribers \nactively engaging with the content. Viewers frequently leave comments expressing their \nappreciation and sharing how the videos have assisted them in their learning journ eys. \nImpact  \nThe CodePRO LK YouTube channel has played a significant role in democratizing tech'), Document(metadata={'page': 1, 'source': '/content/codeprolk.pdf'}, page_content='Community and Support  \nCodePRO LK has cultivated a vibrant community where learners can interact, share insights, and \nsupport each other. Additionally, the platform offers consultation services for personalized \nlearning support.  \n \nCodePRO LK YouTube Channel  \nOverview  \nThe CodePRO LK YouTube Channel  is a
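Each printed response above begins with `Human:` followed by the whole prompt, so the generated answer only appears after the echoed prompt (and the recorded outputs are truncated). One way to keep just the generated part is to take the text after the last `Answer:` marker. The helper below is a hypothetical post-processing sketch; it assumes the template is unchanged and that the model does not itself write `Answer:` in its reply.

```python
def extract_answer(response, marker="Answer:"):
    """Return the text after the last `marker`, or the whole response if absent."""
    head, sep, tail = response.rpartition(marker)
    return tail.strip() if sep else response.strip()


echoed = (
    "Human: \nAnswer this question using the provided context only.\n\n"
    "who is codeprolk?\n\nContext:\n...\n\nAnswer:\n"
    "CodePRO LK is an educational platform."  # hypothetical model continuation
)
print(extract_answer(echoed))
```

Since a plain function can be piped after a runnable, the helper could be appended as a final step, for example `chain | extract_answer`.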