### RAG with Gemma and LangChain
> Summarize a URL using Gemma

🌐 WebBaseLoader: Loads a web page from a URL as LangChain documents.

🤗 HuggingFaceEmbeddings: Turns text chunks into embeddings with a sentence-transformers model from Hugging Face.

🗃️ FAISS: Stores the embeddings and runs similarity search over them.

🤖🧠 google/gemma-2b-it: The instruction-tuned 2B Gemma model, loaded from Hugging Face.

[Hemanth HM](https://h3manth.com)

In [None]:
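# Install the dependencies
!pip install langchain langchain_community transformers sentence_transformers faiss-cpu langchainhub -q
# google/gemma-2b-it is a gated model: accept its license on Hugging Face and
# authenticate (e.g. with huggingface_hub.login()) before loading it
!pip install huggingface_hub -q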

In [34]:
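# Helper: render model output as a Markdown blockquote in the notebook
import textwrap

from IPython.display import display
from IPython.display import Markdown

def to_markdown(text):
  # Turn "•" bullets into Markdown list items
  text = text.replace('•', '  *')
  # Prefix every line, blank ones included, with "> " so it renders as one blockquote
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))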
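The string transformation inside `to_markdown` can be checked without IPython. The sketch below mirrors it in a hypothetical `quote_block` function that returns the raw Markdown string instead of a `Markdown` display object. The `predicate=lambda _: True` argument matters because `textwrap.indent` skips whitespace-only lines by default, which would split the blockquote at every blank line.

```python
import textwrap

def quote_block(text):
    # Hypothetical stand-in for to_markdown: same string transformation, no IPython
    text = text.replace('•', '  *')
    # predicate=lambda _: True also prefixes blank lines, keeping one continuous blockquote
    return textwrap.indent(text, '> ', predicate=lambda _: True)

print(quote_block("Points:\n\n• one\n• two"))
```

Without the predicate, the blank line after `Points:` would stay unprefixed and Markdown would render two separate quotes.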

In [35]:
# Import dependencies
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

In [5]:
# Load the page and split it into ~500-character chunks with no overlap
loader = WebBaseLoader("https://h3manth.com")
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
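`chunk_size` and `chunk_overlap` are measured in characters here, since the splitter's default length function is `len`. The sketch below is a deliberately simplified, hypothetical fixed-window splitter (`split_fixed`) that only shows what the two parameters control. The real `RecursiveCharacterTextSplitter` is smarter: it tries to break on paragraph, line, and word boundaries (`"\n\n"`, `"\n"`, `" "`) before falling back to single characters.

```python
def split_fixed(text, chunk_size, chunk_overlap=0):
    """Hypothetical fixed-window splitter: consecutive chunks share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks

print(split_fixed("abcdefghij", 4))     # no overlap, like chunk_overlap=0 above
print(split_fixed("abcdefghij", 4, 1))  # each chunk repeats the previous chunk's last character
```

A small overlap can help when an answer straddles a chunk boundary; the notebook uses none.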

In [6]:
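# Embed every chunk and index the vectors in FAISS
# (HuggingFaceEmbeddings() defaults to sentence-transformers/all-mpnet-base-v2)
vectorstore = FAISS.from_documents(documents=all_splits, embedding=HuggingFaceEmbeddings())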
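Under the hood, the vector store keeps one embedding per chunk, and the retriever returns the chunks whose vectors lie closest to the question's embedding. By default LangChain's FAISS wrapper uses a flat L2 (Euclidean) index, and `as_retriever()` returns the top 4 matches. A minimal brute-force sketch of that search, using toy 2-D vectors in place of real embeddings; `nearest` is a hypothetical helper, and FAISS performs the same kind of search far more efficiently.

```python
import math

def nearest(query, vectors, k=4):
    """Return indices of the k vectors closest to query by Euclidean (L2) distance."""
    ranked = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return ranked[:k]

# Toy "embeddings" for four chunks (real ones come from HuggingFaceEmbeddings)
chunk_vectors = [(0.0, 1.0), (1.0, 0.0), (0.9, 0.1), (-1.0, 0.0)]
query_vector = (1.0, 0.05)
print(nearest(query_vector, chunk_vectors, k=2))
```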

In [24]:
# Create a prompt in Gemma's chat format (<start_of_turn>/<end_of_turn> markers)
from langchain.prompts import PromptTemplate

prompt_template = """
<start_of_turn>user
Answer the question based on your knowledge and the provided context.

**Before responding:**

* Think carefully about the question and the context.
* Write your step-by-step reasoning process.
* Explain your answer clearly and concisely.

**Question:**

{question}

**Context:**

{context}
<end_of_turn>
<start_of_turn>model
"""
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)
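By default, `PromptTemplate` fills `{question}` and `{context}` with Python `str.format`-style substitution. The hypothetical sketch below does the same with a shortened copy of the template, to show the shape of the string Gemma receives: the user turn is wrapped in `<start_of_turn>user` and `<end_of_turn>`, and the prompt ends with an open `<start_of_turn>model` turn for the model to complete.

```python
# Shortened, hypothetical copy of prompt_template, kept in Gemma's turn format
GEMMA_TEMPLATE = """<start_of_turn>user
Answer the question based on the provided context.

**Question:**

{question}

**Context:**

{context}
<end_of_turn>
<start_of_turn>model
"""

def build_prompt(question, context):
    # Same placeholder substitution PromptTemplate performs for f-string templates
    return GEMMA_TEMPLATE.format(question=question, context=context)

print(build_prompt("What does the page cover?", "(retrieved chunks go here)"))
```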


In [25]:
# Load the Model
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")


In [27]:
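# Create the text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=1000,      # cap on generated tokens (the prompt is not counted)
    temperature=0.1,          # low temperature: near-greedy, focused output
    top_p=0.95,               # nucleus sampling over the top 95% of probability mass
    repetition_penalty=1.15,  # discourage tokens already present in the sequence
    do_sample=True,           # sample instead of greedy decoding (enables temperature/top_p)
)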
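To see roughly what the sampling arguments do to the model's next-token scores, here is a simplified, stdlib-only sketch over three made-up logits. It follows the usual order (repetition penalty, then temperature, then top-p filtering) but is not the `transformers` implementation; the function names and numbers are hypothetical. In this toy case, `temperature=0.1` makes the distribution so peaked that top-p 0.95 keeps only the most likely token, which illustrates why the notebook's settings sample close to greedily despite `do_sample=True`.

```python
import math

def apply_repetition_penalty(logits, seen_ids, penalty):
    # Make already-seen tokens less likely: shrink positive logits, push negative ones lower
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] * penalty if out[i] < 0 else out[i] / penalty
    return out

def softmax_with_temperature(logits, temperature):
    # Dividing by a small temperature sharpens the distribution toward the top token
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    # Keep the smallest set of most-likely tokens whose probabilities sum to >= top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalized over the kept tokens

logits = [2.0, 1.0, 0.1]
penalized = apply_repetition_penalty(logits, seen_ids=[0], penalty=1.15)
print(top_p_filter(softmax_with_temperature(penalized, 1.0), 0.95))  # all three tokens survive
print(top_p_filter(softmax_with_temperature(penalized, 0.1), 0.95))  # only token 0 survives
```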

In [28]:
# Wrap the pipeline as a LangChain LLM and build a prompt -> LLM -> string chain
from langchain_core.output_parsers import StrOutputParser

llm = HuggingFacePipeline(pipeline=pipe)
llm_chain = prompt | llm | StrOutputParser()
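The `|` operator in `prompt | llm | StrOutputParser()` is LangChain Expression Language (LCEL): each step's output becomes the next step's input, and `.invoke()` runs the whole sequence. The toy `Step` class below is a hypothetical, much-reduced illustration of that idea through `__or__`, not LangChain's actual `Runnable` implementation.

```python
class Step:
    """Hypothetical toy runnable: wraps a function and composes with |."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # (a | b).invoke(x) == b.invoke(a.invoke(x))
        return Step(lambda x: other.invoke(self.invoke(x)))

    def invoke(self, x):
        return self.fn(x)

# Stand-ins for prompt, llm and StrOutputParser()
fill_prompt = Step(lambda q: f"Question: {q}")
fake_llm = Step(lambda p: {"text": p.upper()})
parse = Step(lambda out: out["text"])

chain = fill_prompt | fake_llm | parse
print(chain.invoke("what is rag?"))
```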

In [29]:
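# Create the RAG chain (an LCEL pipeline, not the legacy RetrievalQA class)
from langchain_core.runnables import RunnablePassthrough

# Retrieve the chunks most similar to the question
retriever = vectorstore.as_retriever()

def format_docs(docs):
    # Join the retrieved chunks' text so the prompt gets plain text, not Document reprs
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | llm_chain
)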
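A dict at the head of an LCEL chain is coerced into a `RunnableParallel`: every value receives the same input (here, the question string) and the results are collected under the dict's keys, which is how the prompt gets both `context` and `question`. `RunnablePassthrough()` returns its input unchanged. A plain-Python sketch of that fan-out, with a hypothetical `fake_retriever` standing in for the FAISS retriever and the retrieved chunks joined into one string:

```python
def fake_retriever(question):
    # Hypothetical stand-in for the FAISS retriever: always returns the same two chunks
    return ["Chunk one.", "Chunk two."]

def passthrough(x):
    # Equivalent of RunnablePassthrough(): return the input unchanged
    return x

def run_parallel(steps, value):
    # Equivalent of RunnableParallel: same input to every step, results keyed by name
    return {name: step(value) for name, step in steps.items()}

inputs = run_parallel(
    {"context": lambda q: "\n\n".join(fake_retriever(q)), "question": passthrough},
    "What is FAISS?",
)
print(inputs)
```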

In [36]:
# Ask question
question = "Summarize the data in 8 points"
summary = rag_chain.invoke(question)
to_markdown(summary)

> Sure, here is a summary of the data in 8 points:
> 
> * Hemanth.HM's paws on tech.
> * DuckDuckGo community leader.
> * Member of Node.js Foundation.
> * Google Launchpad Accelerator Mentor.
> * TC39er.us podcast host.
> * Curates JSfeatures.in.
> * Co-hosts ReadSpecWith.us.