# Open Source RAG - Leveraging Hugging Face Endpoints through LangChain

In the following notebook we will dive into the world of Open Source models hosted on Hugging Face's [inference endpoints](https://ui.endpoints.huggingface.co/).

The notebook will be broken into the following parts:

- 🤝 Breakout Room #2:
  1. Install required libraries
  2. Set Environment Variables
  3. Creating LangChain components powered by the endpoints
  4. Preparing data
  5. Creating a simple RAG pipeline with [LangChain v0.2.0](https://blog.langchain.dev/langchain-v02-leap-to-stability/)

## Task 1: Install required libraries

Now we've got to get our required libraries!

We'll start with our `langchain` and `huggingface` dependencies.

> You don't need to run this cell if you're running the notebook locally.

In [29]:
!pip install langchain-huggingface

Defaulting to user installation because normal site-packages is not writeable


In [1]:
# !pip install -qU langchain-huggingface langchain-community faiss-cpu

!pip install chainlit langchain-huggingface langchain-community langchain-text-splitters langchain-core faiss-cpu tqdm python-dotenv

Defaulting to user installation because normal site-packages is not writeable


In [32]:
# Import the class once, from the maintained langchain-huggingface package
# (the langchain_community version is deprecated)
from langchain_huggingface import HuggingFaceEndpoint

In [34]:
!pip install langchain-community

Defaulting to user installation because normal site-packages is not writeable


In [35]:
import huggingface_hub
print(huggingface_hub.__version__)  # Version installed when this notebook was run: '0.27.0'
print(huggingface_hub.__file__)     # Show the path where it is installed

0.27.0
C:\Users\dabra\AppData\Roaming\Python\Python313\site-packages\huggingface_hub\__init__.py
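Rather than checking packages one at a time, a minimal sketch using the standard library's `importlib.metadata` can report every dependency at once. The `REQUIRED` list and the `installed_versions` helper are illustrative names, not part of the original notebook; missing packages are reported as `None` instead of raising.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list: distribution names as they appear in the pip install cells above
REQUIRED = [
    "langchain-huggingface",
    "langchain-community",
    "langchain-text-splitters",
    "langchain-core",
    "faiss-cpu",
    "huggingface-hub",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for pkg, ver in installed_versions(REQUIRED).items():
    print(f"{pkg:28} {ver or 'NOT INSTALLED'}")
```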


In [2]:
from IPython.display import HTML, display

def set_output_wrapping():
    display(HTML('''
    <style>
    pre {
        white-space: pre-wrap;
        word-wrap: break-word;
        max-width: 100%;
        overflow-x: hidden;
    }
    .output_area {
        white-space: pre-wrap;
        word-wrap: break-word;
        max-width: 100%;
        overflow-x: hidden;
    }
    .output_text {
        white-space: pre-wrap;
        word-wrap: break-word;
    }
    div.output {
        white-space: pre-wrap;
        word-wrap: break-word;
        max-width: 100%;
    }
    span {
        white-space: pre-wrap;
        word-wrap: break-word;
    }
    </style>
    '''))
set_output_wrapping()

## Task 2: Set Environment Variables

We'll need to set our `HF_TOKEN` so that we can send authenticated requests to our protected Inference Endpoints.

In [3]:
import os
import getpass

os.environ["HF_TOKEN"] = getpass.getpass("HuggingFace Write Token: ")
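Calling `getpass` unconditionally prompts on every run, even when `HF_TOKEN` is already exported in your shell (or loaded from a `.env` file with `python-dotenv`, which the install cell includes). A minimal sketch that prompts only when the variable is unset; `ensure_env_var` is a hypothetical helper, and `prompt_fn` is injectable so the logic can be tested without a terminal.

```python
import getpass
import os


def ensure_env_var(name, prompt, prompt_fn=getpass.getpass):
    """Return the value of `name` from the environment, prompting only if it is unset."""
    value = os.environ.get(name)
    if not value:
        value = prompt_fn(prompt)
        os.environ[name] = value  # cache it so later cells (and re-runs) reuse it
    return value


# Usage in the notebook (prompts only when HF_TOKEN is not already set):
# ensure_env_var("HF_TOKEN", "HuggingFace Write Token: ")
```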

## Task 3: Creating LangChain components powered by the endpoints

We'll wrap our endpoints in LangChain components so that, thanks to LCEL, we can compose them just like any other LCEL component!

### HuggingFaceEndpoint for LLM

We can use the `HuggingFaceEndpoint` class (originally found [here](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/llms/huggingface_endpoint.py) in `langchain_community`, and now maintained in the `langchain-huggingface` package) to power our chain. Let's look at how we would implement it.

In [4]:
YOUR_LLM_ENDPOINT_URL = "https://plqere42yvuvlaq7.us-east-1.aws.endpoints.huggingface.cloud"

In [5]:
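from langchain_huggingface import HuggingFaceEndpoint

# Wrap the dedicated Inference Endpoint as a LangChain LLM (HF_TOKEN is read from the environment)
hf_llm = HuggingFaceEndpoint(
    endpoint_url=YOUR_LLM_ENDPOINT_URL,
    task="text-generation",
    max_new_tokens=512,       # upper bound on generated tokens
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,         # near-greedy decoding for factual answers
    repetition_penalty=1.03,  # mildly discourage repeated phrases
)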


Now we can use our endpoint like we would any other LLM!

In [6]:
hf_llm.invoke("Hello, how are you?")

' I am doing well, thanks for asking. I have been busy with work and other things, but I always make time for my blog. I have a few new posts lined up, so stay tuned!\nIn the meantime, I wanted to share some of my favorite books from 2022. I know it\'s already 2023, but I\'m a bit behind on my reading list, and I wanted to give you a sneak peek into what I\'ve been enjoying lately.\nHere are my top picks from 2022:\n1. "The Seven Husbands of Evelyn Hugo" by Taylor Jenkins Reid - This book is a sweeping romance that follows the life of reclusive Hollywood star Evelyn Hugo. It\'s a beautifully written story about love, identity, and the power of storytelling.\n2. "The Last Thing He Told Me" by Laura Dave - This novel is a mystery/thriller that follows a woman who discovers her husband\'s dark secrets after his disappearance. It\'s a gripping page-turner that explores themes of marriage, family, and deception.\n3. "The Invisible Life of Addie LaRue" by V.E. Schwab - This book is a histori
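Under the hood, the wrapper turns `invoke` into an HTTP request to the endpoint. Assuming the endpoint runs Text Generation Inference (TGI), which accepts a JSON body with an `inputs` string and a `parameters` object, the request looks roughly like the sketch below. This is an illustration of the wire format, not the exact client code inside `HuggingFaceEndpoint` (which may differ by version); `build_generate_payload` and `call_endpoint` are hypothetical helpers, and nothing is sent unless you call `call_endpoint` yourself.

```python
import json
import urllib.request


def build_generate_payload(prompt, **parameters):
    """Build a TGI-style text-generation request body (assumed schema)."""
    return {"inputs": prompt, "parameters": parameters}


def call_endpoint(url, token, payload, timeout=60):
    """POST the payload with a bearer token and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Same generation settings as the hf_llm cell above
payload = build_generate_payload(
    "Hello, how are you?",
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)
print(json.dumps(payload, indent=2))

# To actually send it (requires a running endpoint and HF_TOKEN):
# call_endpoint(YOUR_LLM_ENDPOINT_URL, os.environ["HF_TOKEN"], payload)
```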

Now we can add a RAG-style prompt using Llama 3 Instruct's prompt templating!

In [7]:
from langchain_core.prompts import PromptTemplate

RAG_PROMPT_TEMPLATE = """\
<|start_header_id|>system<|end_header_id|>
You are a helpful assistant. You answer user questions based on provided context. If you can't answer the question with the provided context, say you don't know.<|eot_id|>

<|start_header_id|>user<|end_header_id|>
User Query:
{query}

Context:
{context}<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>
"""

rag_prompt = PromptTemplate.from_template(RAG_PROMPT_TEMPLATE)
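`PromptTemplate` uses Python f-string-style placeholders by default, so the exact text the model receives can be previewed with plain `str.format`. A minimal sketch reusing the template text verbatim; in the notebook, `rag_prompt.format(query=..., context=...)` should produce the same string. Note that the prompt deliberately ends with the assistant header, so the model's continuation is the assistant's reply.

```python
# Same template text as RAG_PROMPT_TEMPLATE above, filled with plain str.format
LLAMA3_RAG_TEMPLATE = """\
<|start_header_id|>system<|end_header_id|>
You are a helpful assistant. You answer user questions based on provided context. If you can't answer the question with the provided context, say you don't know.<|eot_id|>

<|start_header_id|>user<|end_header_id|>
User Query:
{query}

Context:
{context}<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>
"""

filled = LLAMA3_RAG_TEMPLATE.format(
    query="How old is Carl?",
    context="Carl is a sweet dude, he's 40.",
)
print(filled)
```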

Let's create a simple LCEL chain using our prompt template Runnable and our LLM Runnable.

In [8]:
rag_chain = rag_prompt | hf_llm

In [11]:
rag_chain.invoke({"query" : "How old is Carl?", "context" : "Carl is a sweet dude, he's 40."})

'Carl is 40 years old.'
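The `|` operator in LCEL composes Runnables so that each one's output becomes the next one's input. A toy re-implementation shows the idea; `ToyRunnable`, `toy_prompt`, and `toy_llm` are hypothetical stand-ins, not LangChain classes (the real `RunnableSequence` also handles batching, streaming, and async).

```python
class ToyRunnable:
    """Hypothetical stand-in for an LCEL Runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Composition: the output of self is fed into other; plain callables are wrapped
        other = other if isinstance(other, ToyRunnable) else ToyRunnable(other)
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for rag_prompt (dict -> string) and hf_llm (string -> string)
toy_prompt = ToyRunnable(lambda d: f"Q: {d['query']}\nC: {d['context']}")
toy_llm = ToyRunnable(lambda text: f"[model saw {len(text)} chars]")

toy_chain = toy_prompt | toy_llm
print(toy_chain.invoke({"query": "How old is Carl?", "context": "Carl is 40."}))
```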

### HuggingFaceEndpointEmbeddings

Now we can use the `HuggingFaceEndpointEmbeddings` class from `langchain_huggingface` to connect to the embedding model hosted on our Hugging Face Inference Endpoint.

In [12]:
from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings

YOUR_EMBED_MODEL_URL = "https://yij47k87rjciyxy7.us-east-1.aws.endpoints.huggingface.cloud"

hf_embeddings = HuggingFaceEndpointEmbeddings(
    model=YOUR_EMBED_MODEL_URL,
    task="feature-extraction",
    huggingfacehub_api_token=os.environ["HF_TOKEN"],
)

Let's build a simple cosine-similarity function to verify our endpoint is working as expected.

In [13]:
import numpy as np
from numpy.linalg import norm

def cosine_similarity(phrase_1, phrase_2):
    # Embed both phrases in one request, then compute dot(v1, v2) / (|v1| * |v2|)
    vec_1, vec_2 = hf_embeddings.embed_documents([phrase_1, phrase_2])
    return np.dot(vec_1, vec_2) / (norm(vec_1) * norm(vec_2))

Let's try a few examples below!

In [14]:
cosine_similarity("I love my fluffy dog!", "I adore this furry puppy!")

0.8903063446222084

In [15]:
cosine_similarity("I love my fluffy dog!", "Trekking across the arctic is tough work.")

0.6667445107282144
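The two scores are easiest to interpret against known extremes. A dependency-free sketch of the same formula on hand-made vectors confirms the range: identical directions score 1, orthogonal vectors 0, opposite vectors -1, and scaling a vector does not change the score. Even the unrelated pair above scores well above 0, so with this kind of embedding model the scores are best compared relative to each other rather than read as absolute.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


print(cosine([1.0, 0.0], [1.0, 0.0]))   # same direction
print(cosine([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
print(cosine([1.0, 2.0], [2.0, 4.0]))   # same direction, different length
```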

## Task 4: Preparing Data!

We'll start by loading some data from GitHub (Paul Graham's Essays) and then move to chunking them into manageable pieces!

First - let's grab the repository where the files live.

In [16]:
!git clone https://github.com/dbredvick/paul-graham-to-kindle.git

Cloning into 'paul-graham-to-kindle'...


Next - we can load them using LangChain!

In [17]:
from langchain_community.document_loaders import TextLoader

document_loader = TextLoader("./paul-graham-to-kindle/paul_graham_essays.txt")
documents = document_loader.load()

Now, let's split them into chunks of up to 1,000 characters, with a 30-character overlap between neighboring chunks.

In [18]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=30)
split_documents = text_splitter.split_documents(documents)
len(split_documents)

4267
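To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter first tries to break on separators (by default `"\n\n"`, then `"\n"`, then `" "`, then individual characters) and applies overlap when merging those pieces, so its overlap is approximate. `fixed_size_chunks` is a hypothetical helper for illustration only.

```python
def fixed_size_chunks(text, chunk_size, chunk_overlap):
    """Split text into windows of at most chunk_size characters overlapping by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in fixed_size_chunks(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```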

Just as we would with OpenAI's embedding model, we can instantiate our `FAISS` vector store with our documents and our `HuggingFaceEndpointEmbeddings` model!

We'll need to take a few extra steps, though: the embedding endpoint caps how many inputs it accepts in a single request, so we can't send all 4,267 chunks at once.

We'll start by embedding our documents in batches of `32`.

> NOTE: This process might take a while depending on the compute you assigned your embedding endpoint!

In [19]:
from langchain_community.vectorstores import FAISS

for i in range(0, len(split_documents), 32):
    batch = split_documents[i:i + 32]
    if i == 0:
        # The first batch creates the index; later batches are appended to it
        vectorstore = FAISS.from_documents(batch, hf_embeddings)
        continue
    vectorstore.add_documents(batch)
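The index arithmetic in the loop above (slicing `i:i + 32` plus an `i == 0` special case) can be factored into a small generic helper. A minimal sketch; `batched` here is a hypothetical local helper, although Python 3.12 added a similar `itertools.batched` that yields tuples instead of lists.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


# Usage with the notebook's objects:
# vectorstore = None
# for batch in batched(split_documents, 32):
#     if vectorstore is None:
#         vectorstore = FAISS.from_documents(batch, hf_embeddings)
#     else:
#         vectorstore.add_documents(batch)
```

With 4,267 chunks and a batch size of 32, this yields 133 full batches and one final batch of 11.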

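Alternatively, the cell below embeds each batch explicitly, which prints the returned shapes and mean-pools per-token outputs if the endpoint returns them. It rebuilds `vectorstore` from scratch, so run either this cell or the previous one, not both.

In [20]:
from langchain_community.vectorstores import FAISS
import numpy as np

BATCH_SIZE = 32
vectorstore = None

for i in range(0, len(split_documents), BATCH_SIZE):
    batch = split_documents[i:i + BATCH_SIZE]
    texts = [doc.page_content for doc in batch]
    metadatas = [doc.metadata for doc in batch]  # keep source metadata alongside each chunk
    embeddings = hf_embeddings.embed_documents(texts)

    # Mean-pool any per-token output (a list of vectors) into one fixed-length vector per text
    embeddings = [np.mean(e, axis=0) if isinstance(e[0], list) else e for e in embeddings]

    # Stack into a 2D float32 array of shape (batch_size, embedding_dim)
    embeddings = np.vstack(embeddings).astype(np.float32)
    print(f"Batch {i // BATCH_SIZE}: embeddings shape {embeddings.shape}")

    # FAISS.from_embeddings / add_embeddings expect (text, vector) pairs
    text_embedding_pairs = list(zip(texts, embeddings))

    if vectorstore is None:
        vectorstore = FAISS.from_embeddings(text_embedding_pairs, hf_embeddings, metadatas=metadatas)
    else:
        vectorstore.add_embeddings(text_embedding_pairs, metadatas=metadatas)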

Batch 0: embeddings shape (32, 768)
Batch 1: embeddings shape (32, 768)
Batch 2: embeddings shape (32, 768)
Batch 3: embeddings shape (32, 768)
Batch 4: embeddings shape (32, 768)
Batch 5: embeddings shape (32, 768)
Batch 6: embeddings shape (32, 768)
Batch 7: embeddings shape (32, 768)
Batch 8: embeddings shape (32, 768)
Batch 9: embeddings shape (32, 768)
Batch 10: embeddings shape (32, 768)
Batch 11: embeddings shape (32, 768)
Batch 12: embeddings shape (32, 768)
Batch 13: embeddings shape (32, 768)
Batch 14: embeddings shape (32, 768)
Batch 15: embeddings shape (32, 768)
Batch 16: embeddings shape (32, 768)
Batch 17: embeddings shape (32, 768)
Batch 18: embeddings shape (32, 768)
Batch 19: embeddings shape (32, 768)
Batch 20: embeddings shape (32, 768)
Batch 21: embeddings shape (32, 768)
Batch 22: embeddings shape (32, 768)
Batch 23: embeddings shape (32, 768)
Batch 24: embeddings shape (32, 768)
Batch 25: embeddings shape (32, 768)
Batch 26: embeddings shape (32, 768)
Batch 27: e
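The `np.mean(e, axis=0) if isinstance(e[0], list) else e` line in the manual-embedding cell guards against an endpoint that returns one vector per token instead of one per input. The log above shows `(32, 768)` for every batch, so this endpoint already returns pooled vectors and the branch never fires. A dependency-free sketch of the same logic; `mean_pool` and `to_fixed_length` are hypothetical helpers, and this plain unweighted average ignores attention masks, so it only approximates a model's own pooling.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one fixed-length vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(vec) != dim for vec in token_vectors):
        raise ValueError("all token vectors must have the same length")
    n = len(token_vectors)
    return [sum(vec[j] for vec in token_vectors) / n for j in range(dim)]


def to_fixed_length(embedding):
    """Pass a flat vector through unchanged; mean-pool a nested (per-token) one."""
    return mean_pool(embedding) if isinstance(embedding[0], list) else embedding


# One already-pooled vector and one per-token output (2 tokens x 3 dims)
batch = [[0.1, 0.2, 0.3], [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]]
print([to_fixed_length(e) for e in batch])
```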

Next, we set up FAISS as a retriever.

In [21]:
hf_retriever = vectorstore.as_retriever()
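With no arguments, `as_retriever()` runs a similarity search over the FAISS index and returns the top `k` documents (LangChain's default is `k=4`; pass `search_kwargs={"k": ...}` to change it). LangChain's FAISS store uses Euclidean (L2) distance by default, and for a flat index that search is equivalent to the brute-force sketch below. FAISS itself reports squared L2 distances, which rank identically, and is far faster; `l2_distance` and `top_k` are hypothetical helpers that only illustrate what "closest" means.

```python
import heapq
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def top_k(query_vec, doc_vecs, k=4):
    """Return (index, distance) pairs for the k stored vectors closest to query_vec."""
    return heapq.nsmallest(
        k,
        ((i, l2_distance(query_vec, vec)) for i, vec in enumerate(doc_vecs)),
        key=lambda pair: pair[1],
    )


# Toy 2D "embeddings" standing in for the 768-dimensional vectors in the index
docs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2], [10.0, 0.0]]
print(top_k([1.0, 1.0], docs, k=2))
```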

## Task 5: Simple LCEL RAG Chain

Now we can set up our LCEL RAG chain!

> NOTE: We're not returning context for this example, and only returning the text output from the LLM.

In [22]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

# Retrieve chunks for the query, fill the prompt, call the LLM, and return plain text
lcel_rag_chain = (
    {"context": itemgetter("query") | hf_retriever, "query": itemgetter("query")}
    | rag_prompt
    | hf_llm
    | StrOutputParser()
)
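As written, the chain pipes the retriever's list of `Document` objects directly into `{context}`, so the prompt receives their Python representation, metadata included. A common refinement is to join only the text of each chunk. A minimal sketch; `Doc` is a hypothetical stand-in for LangChain's `Document` (which also exposes `page_content`), and `format_docs` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs, separator="\n\n"):
    """Join retrieved chunks into one context string, dropping object reprs and metadata."""
    return separator.join(doc.page_content for doc in docs)


retrieved = [Doc("first chunk", {"source": "a.txt"}), Doc("second chunk")]
print(format_docs(retrieved))
```

In the notebook, the helper would slot in as `itemgetter("query") | hf_retriever | format_docs`; when a plain function is piped after a Runnable, LCEL wraps it in a `RunnableLambda` automatically.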

In [23]:
print("FAISS Index Dimension:", vectorstore.index.d)

FAISS Index Dimension: 768


In [24]:
lcel_rag_chain.invoke({"query" : "What is the best part of Silicon Valley?"})

"Based on the provided context, the best part of Silicon Valley is its ability to attract and nurture startups, particularly those that are in their early stages. The key stage in the life of a startup is when they're operating out of a small space, such as an apartment, and the defining quality of Silicon Valley is that many successful startups were started there. This suggests that the best part of Silicon Valley is its ecosystem and culture that supports innovation and entrepreneurship, rather than its physical infrastructure or buildings."

In [26]:
!pip install --upgrade chainlit pydantic

Defaulting to user installation because normal site-packages is not writeable


In [27]:
!pip install "pydantic<2"

Defaulting to user installation because normal site-packages is not writeable
Collecting pydantic<2
  Using cached pydantic-1.10.21-cp313-cp313-win_amd64.whl.metadata (155 kB)
Using cached pydantic-1.10.21-cp313-cp313-win_amd64.whl (2.2 MB)
Installing collected packages: pydantic
  Attempting uninstall: pydantic
    Found existing installation: pydantic 2.10.6
    Uninstalling pydantic-2.10.6:
      Successfully uninstalled pydantic-2.10.6
Successfully installed pydantic-1.10.21


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain 0.3.19 requires pydantic<3.0.0,>=2.7.4, but you have pydantic 1.10.21 which is incompatible.
langchain-anthropic 0.3.9 requires pydantic<3.0.0,>=2.7.4, but you have pydantic 1.10.21 which is incompatible.
langchain-cohere 0.4.2 requires pydantic<3,>=2, but you have pydantic 1.10.21 which is incompatible.
langchain-core 0.3.41 requires pydantic<3.0.0,>=2.7.4; python_full_version >= "3.12.4", but you have pydantic 1.10.21 which is incompatible.
langchain-qdrant 0.2.0 requires pydantic<3.0.0,>=2.7.4, but you have pydantic 1.10.21 which is incompatible.
langsmith 0.2.11 requires pydantic<3.0.0,>=2.7.4; python_full_version >= "3.12.4", but you have pydantic 1.10.21 which is incompatible.
pydantic-settings 2.7.1 requires pydantic>=2.7.0, but you have pydantic 1.10.21 which is incompatible.
qdrant-client 1.13.2

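> NOTE: As the resolver errors above show, `langchain`, `langchain-core`, `langsmith`, and other installed packages require `pydantic>=2.7.4`, so downgrading to `pydantic<2` breaks LangChain. Uninstalling `pydantic` does not help either, since LangChain still needs it, and in a notebook `pip uninstall` without `-y` blocks on a confirmation prompt. Instead, reinstall a compatible version:

In [25]:
!pip install "pydantic>=2.7.4,<3"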
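To confirm whether the installed `pydantic` satisfies LangChain's `pydantic>=2.7.4` requirement, a stdlib-only check is enough. This sketch compares the leading numeric parts of each version string and ignores pre-release tags; the `packaging` library's `Version` class is the more robust choice when it is available. `parse_version` and `satisfies_minimum` are hypothetical helpers.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '2.10.6' into (2, 10, 6), keeping the leading digits of each dot-separated part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies_minimum(installed, minimum):
    """True if the installed version is at least the minimum (tuple comparison)."""
    return parse_version(installed) >= parse_version(minimum)


try:
    current = version("pydantic")
    status = "OK" if satisfies_minimum(current, "2.7.4") else "TOO OLD for langchain"
    print(f"pydantic {current}: {status}")
except PackageNotFoundError:
    print("pydantic is not installed")
```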
