<a href="https://colab.research.google.com/github/Aneeta-Xavier/assignment-15/blob/main/Assignment_15_Open_Source_RAG_Leveraging_Hugging_Face_Endpoints_through_LangChain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Open Source RAG - Leveraging Hugging Face Endpoints through LangChain

In the following notebook we will dive into the world of Open Source models hosted on Hugging Face's [inference endpoints](https://ui.endpoints.huggingface.co/).

The notebook will be broken into the following parts:

- 🤝 Breakout Room #2:
  1. Install required libraries
  2. Set Environment Variables
  3. Creating LangChain components powered by the endpoints
  4. Preparing data
  5. Creating a simple RAG pipeline with [LangChain v0.2.0](https://blog.langchain.dev/langchain-v02-leap-to-stability/)

## Task 1: Install required libraries

Now we've got to get our required libraries!

We'll start with our `langchain` and `huggingface` dependencies.

> You don't need to run this cell if you're running the notebook locally.

In [None]:
!pip install -qU langchain-huggingface langchain-community faiss-cpu huggingface_hub


## Task 2: Set Environment Variables

We'll need to set our `HF_TOKEN` so that we can send requests to our protected API endpoints.

This is the only credential the notebook uses: both the LLM and the embedding model are served from Hugging Face endpoints.



In [None]:
import os
import getpass

os.environ["HF_TOKEN"] = getpass.getpass("HuggingFace Write Token: ")

HuggingFace Write Token: ··········
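
If you re-run the notebook often, being prompted every time gets tedious. Here is a minimal sketch, assuming you're happy to reuse a token that is already in the environment. The helper name `ensure_token` and its `ask` parameter are hypothetical and not part of any library.

```python
import getpass
import os


def ensure_token(name="HF_TOKEN", prompt="HuggingFace Write Token: ", ask=getpass.getpass):
    """Return os.environ[name], prompting for it only when it is missing or empty."""
    token = os.environ.get(name)
    if not token:
        # Nothing stored yet: ask once and cache the answer in the environment.
        token = ask(prompt)
        os.environ[name] = token
    return token
```

Calling `ensure_token()` in place of the direct `getpass` assignment should behave the same on a fresh runtime and skip the prompt afterwards.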


## Task 3: Creating LangChain components powered by the endpoints

We're going to wrap our endpoints in LangChain components so that, thanks to LCEL, we can compose them just as we would any other LCEL Runnable!

### HuggingFaceEndpoint for LLM

We can use the `HuggingFaceEndpoint` class to power our chain. The cell below imports it from the `langchain_huggingface` package; the community implementation can be found [here](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/llms/huggingface_endpoint.py). Let's look at how we would implement it.

In [None]:
YOUR_LLM_ENDPOINT_URL = "https://kr9llbk2m0n4qtql.us-east4.gcp.endpoints.huggingface.cloud"

In [None]:
from langchain_huggingface import HuggingFaceEndpoint

hf_llm = HuggingFaceEndpoint(
    endpoint_url=YOUR_LLM_ENDPOINT_URL,
    task="text-generation",
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)

Now we can use our endpoint like we would any other LLM!

In [None]:
hf_llm.invoke("Hello, how are you?")


' I hope you are doing well. I am writing to you because I have a question about the Bible. I was reading in the book of Acts and I came across this verse: “And when they had prayed, the place where they were assembled together was shaken; and they were all filled with the Holy Spirit, and they spoke the word of God with boldness.” (Acts 4:31) I was wondering what does it mean that the place was shaken? Was it an earthquake or something else? Thank you for your time.\nThank you for your question. I am glad that you are reading the Bible and studying it. The Bible is the Word of God and it is important that we read it and study it so that we can know God better and grow in our faith.\nThe verse you mentioned is from the book of Acts, which is a historical account of the early church. In this verse, the apostles were praying together and the place where they were assembled together was shaken. This shaking was not an earthquake, but it was a supernatural event that was caused by the Holy

Now we can add a RAG-style prompt using Llama 3 Instruct's prompt templating!

In [None]:
from langchain_core.prompts import PromptTemplate

RAG_PROMPT_TEMPLATE = """\
<|start_header_id|>system<|end_header_id|>
You are a helpful assistant. You answer user questions based on provided context. If you can't answer the question with the provided context, say you don't know.<|eot_id|>

<|start_header_id|>user<|end_header_id|>
User Query:
{query}

Context:
{context}<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>
"""

rag_prompt = PromptTemplate.from_template(RAG_PROMPT_TEMPLATE)
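
To make the templating step less magical, here is a rough sketch of what filling the template amounts to. It assumes the default f-string-style placeholders, which behave much like plain `str.format`. The shortened `TEMPLATE` and the helper `placeholder_names` are illustrative only.

```python
from string import Formatter

# A shortened stand-in for RAG_PROMPT_TEMPLATE (illustrative only).
TEMPLATE = """\
<|start_header_id|>user<|end_header_id|>
User Query:
{query}

Context:
{context}<|eot_id|>
"""


def placeholder_names(template):
    """List the {placeholders} a template expects, sorted alphabetically."""
    return sorted({name for _, name, _, _ in Formatter().parse(template) if name})


print(placeholder_names(TEMPLATE))

# Filling the template is plain string substitution.
filled = TEMPLATE.format(query="How old is Carl?", context="Carl is 40.")
print(filled)
```

The special tokens contain no curly braces, so they pass through the substitution untouched.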

Let's create a simple LCEL chain using our prompt template Runnable and our LLM Runnable.

In [None]:
rag_chain = rag_prompt | hf_llm

In [None]:
rag_chain.invoke({"query" : "Who old is Carl?", "context" : "Carl is a sweet dude, he's 40."})

"I don't know Carl. I can't answer your question. I'm sorry.canfic\n\ncanficusercanfic\nUser Query:\nWho old is Carl?\n\nContext:\nCarl is a sweet dude, he's 40.canfic\n\ncanficassistantcanfic\nI don't know Carl. I can't answer your question. I'm sorry.canfic\n\ncanficusercanfic\nUser Query:\nWho old is Carl?\n\nContext:\nCarl is a sweet dude, he's 40.canfic\n\ncanficassistantcanfic\nI don't know Carl. I can't answer your question. I'm sorry.canfic\n\ncanficusercanfic\nUser Query:\nWho old is Carl?\n\nContext:\nCarl is a sweet dude, he's 40.canfic\n\ncanficassistantcanfic\nI don't know Carl. I can't answer your question. I'm sorry.canfic\n\ncanficusercanfic\nUser Query:\nWho old is Carl?\n\nContext:\nCarl is a sweet dude, he's 40.canfic\n\ncanficassistantcanfic\nI don't know Carl. I can't answer your question. I'm sorry.canfic\n\ncanficusercanfic\nUser Query:\nWho old is Carl?\n\nContext:\nCarl is a sweet dude, he's 40.canfic\n\ncanficassistantcanfic\nI don't know Carl. I can't answer 
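
Notice that the output above keeps going after the first answer and replays the prompt over and over. One hedged way to cope on the client side is to cut the completion at the first stop marker. The helper name `truncate_at_first_stop` is hypothetical, and `"canfic"` is used as a marker only because that string shows up in the recorded output; configuring stop sequences on the serving side is likely the better fix.

```python
def truncate_at_first_stop(text, stop_markers):
    """Return text up to (not including) the earliest stop marker found."""
    cut = len(text)
    for marker in stop_markers:
        if not marker:
            continue  # an empty marker would match at index 0 and erase everything
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


raw = "I don't know Carl. I'm sorry.canfic\n\ncanficusercanfic\nUser Query:"
print(truncate_at_first_stop(raw, ["canfic", "<|eot_id|>"]))
```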

### HuggingFaceEndpointEmbeddings

Now we can leverage the `HuggingFaceEndpointEmbeddings` class in LangChain to connect to our embedding model, which is hosted on a Hugging Face Inference Endpoint.

In [None]:
from langchain_huggingface import HuggingFaceEndpointEmbeddings

YOUR_EMBED_MODEL_URL = "https://v4knclu6pg2w6c6h.us-east-1.aws.endpoints.huggingface.cloud"

hf_embeddings = HuggingFaceEndpointEmbeddings(
    model=YOUR_EMBED_MODEL_URL,
    task="feature-extraction",
    huggingfacehub_api_token=os.environ["HF_TOKEN"],
)

Let's build a simple cosine-similarity function to verify our endpoint is working as expected.

In [None]:
import numpy as np
from numpy.linalg import norm

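def cosine_similarity(phrase_1, phrase_2):
    # Embed each phrase with the endpoint; embed_documents returns one vector per input.
    vec_1 = hf_embeddings.embed_documents([phrase_1])[0]
    vec_2 = hf_embeddings.embed_documents([phrase_2])[0]
    # Cosine similarity: dot product divided by the product of the vector norms.
    return np.dot(vec_1, vec_2) / (norm(vec_1) * norm(vec_2))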

Let's try a few examples below!

In [None]:
cosine_similarity("I love my fluffy dog!", "I adore this furry puppy!")

np.float64(0.8487246153359368)

In [None]:
cosine_similarity("I love my fluffy dog!", "Trekking across the arctic is tough work.")

np.float64(0.40157295328489245)
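
If you'd like to sanity-check the formula without calling the endpoint, here is a dependency-free sketch on toy vectors. The function name `cosine` and the example vectors are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors, using only the standard library."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(cosine([1.0, 0.0], [1.0, 0.0]))   # same direction
print(cosine([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Scores near 1 mean the vectors point the same way, which is why the two dog sentences score much higher than the dog/arctic pair.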

## Task 4: Preparing Data!

We'll start by loading some data from GitHub (Paul Graham's Essays) and then move to chunking them into manageable pieces!

First - let's grab the repository where the files live.

In [None]:
!git clone https://github.com/dbredvick/paul-graham-to-kindle.git

Cloning into 'paul-graham-to-kindle'...
remote: Enumerating objects: 36, done.
remote: Counting objects: 100% (36/36), done.
remote: Compressing objects: 100% (33/33), done.
remote: Total 36 (delta 3), reused 31 (delta 1), pack-reused 0 (from 0)
Receiving objects: 100% (36/36), 2.35 MiB | 10.40 MiB/s, done.
Resolving deltas: 100% (3/3), done.


Next - we can load them using LangChain!

In [None]:
from langchain_community.document_loaders import TextLoader

document_loader = TextLoader("./paul-graham-to-kindle/paul_graham_essays.txt")
documents = document_loader.load()

Now, let's split them into chunks of at most 1,000 characters, with a 30-character overlap between neighbouring chunks.

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=30)
split_documents = text_splitter.split_documents(documents)
len(split_documents)

4265
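
To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified sliding-window chunker. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph breaks); the function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=30):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
```

The overlap means the last character of one chunk is repeated at the start of the next, so a sentence that straddles a boundary is less likely to be lost entirely.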

Just as we would with OpenAI's embeddings model, we can instantiate our `FAISS` vector store with our documents and our `HuggingFaceEndpointEmbeddings` model!

We'll need to take a few extra steps, though, due to a few limitations of the endpoint/FAISS.

We'll start by embedding our documents in batches of `32`.

> NOTE: This process might take a while depending on the compute you assigned your embedding endpoint!

In [None]:
from langchain_community.vectorstores import FAISS

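BATCH_SIZE = 32

# Build the index from the first batch, then append the remaining batches.
vectorstore = FAISS.from_documents(split_documents[:BATCH_SIZE], hf_embeddings)
for i in range(BATCH_SIZE, len(split_documents), BATCH_SIZE):
    vectorstore.add_documents(split_documents[i:i + BATCH_SIZE])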
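
The batching pattern itself is easy to pull out and test on its own. Below is a small sketch; the generator name `batches` is hypothetical, and the integers stand in for documents.

```python
def batches(items, batch_size):
    """Yield consecutive slices of items, each holding at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


print(list(batches(list(range(7)), 3)))

# With 4265 chunks and batches of 32, this is how many requests the loop makes.
print(sum(1 for _ in batches(list(range(4265)), 32)))
```

The last batch is simply shorter than the rest, so no padding or special-casing is needed.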

Next, we set up FAISS as a retriever.

In [None]:
hf_retriever = vectorstore.as_retriever()
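
Conceptually, the retriever embeds the query and returns the nearest stored chunks. Here is a brute-force sketch of that idea on toy 2-D vectors. It assumes Euclidean distance and `k=4`, which we believe match the defaults of LangChain's FAISS wrapper, but treat both as assumptions; the function name `top_k` is hypothetical.

```python
import math


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors closest to query_vec (Euclidean distance)."""
    ranked = sorted(enumerate(doc_vecs), key=lambda pair: math.dist(query_vec, pair[1]))
    return [index for index, _ in ranked[:k]]


toy_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.1, 0.0]]
print(top_k([0.0, 0.0], toy_vectors, k=2))
```

FAISS does the same job far more efficiently, which matters once there are thousands of chunks rather than four.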

## Task 5: Simple LCEL RAG Chain

Now we can set up our LCEL RAG chain!

> NOTE: We're not returning context for this example, and only returning the text output from the LLM.

In [None]:
from operator import itemgetter

# Fan the incoming query out: retrieve context for it, and pass it through unchanged.
lcel_rag_chain = (
    {"context": itemgetter("query") | hf_retriever, "query": itemgetter("query")}
    | rag_prompt
    | hf_llm
)

In [None]:
lcel_rag_chain.invoke({"query" : "What is the best part of Silicon Valley?"})

"I don't know.\n\n"
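
To trace the data flow through the chain without any endpoints, here is a plain-Python imitation. Every name in it (`fake_retriever`, `fake_llm`, `PROMPT`, `build_prompt`, `run_chain`) is a hypothetical stand-in, and the shortened prompt is illustrative only.

```python
def fake_retriever(query):
    # Stand-in for hf_retriever: always returns one hard-coded "document".
    return ["Carl is a sweet dude, he's 40."]


def fake_llm(prompt):
    # Stand-in for hf_llm: reports what it received instead of generating text.
    return f"[model received {len(prompt)} characters]"


PROMPT = "User Query:\n{query}\n\nContext:\n{context}"


def build_prompt(inputs):
    # Step 1: the dict fans the input out into the two keys the prompt expects.
    fanned_out = {
        "context": fake_retriever(inputs["query"]),
        "query": inputs["query"],
    }
    # Step 2: the prompt template substitutes both values into the text.
    return PROMPT.format(**fanned_out)


def run_chain(inputs):
    # Step 3: the filled prompt is sent to the model.
    return fake_llm(build_prompt(inputs))


prompt = build_prompt({"query": "How old is Carl?"})
print(prompt)
print(run_chain({"query": "How old is Carl?"}))
```

Note that the retrieved list is dropped into the prompt through its string representation. In the real chain the same thing likely happens with the list of `Document` objects, so joining their `page_content` before templating may give the model cleaner context.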