<a href="https://colab.research.google.com/github/AdamClarkStandke/LangChainTextInteraction/blob/main/InteractivePDF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install langchain
!pip install -qU "langchain-chroma>=0.1.2"
!pip install langchain-huggingface
!pip install langchain-community
!pip install datasets

In [2]:
from langchain_huggingface import HuggingFaceEndpoint
from langchain_chroma import Chroma
import getpass
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_huggingface import HuggingFaceEmbeddings
from datasets import load_dataset

In [3]:
# Load text dataset to play with!!! :)
dataset = load_dataset(
  "SammyTime/plaything",
  revision="main"  # tag name, or branch name, or commit hash
)
print(dataset['train'][:]['text'][:5])

['Glyndŵr University Research Online  ', 'Conference Paper  ', 'Review of unmanned aircraft system technologies to enable ', 'beyond visual line of sight (BVLOS) operations  ', 'Davies, L., Bolam, R., Vagapov, Y. and Anuchin, A.  ']


In [4]:
inference_api_key = getpass.getpass("Enter your HF Inference API Key:\n\n")

Enter your HF Inference API Key:

··········


## HuggingFaceEmbeddings and HuggingFaceEndpoint

### HuggingFaceEmbeddings

As detailed at [SentenceTransformers](https://sbert.net/index.html), HuggingFaceEmbeddings creates a sentence embedding space by using Sentence Transformers (a.k.a. SBERT). Sentence Transformers offers over 5,000 pre-trained models.

The Sentence Transformers model behind HuggingFaceEmbeddings takes the following arguments/parameters (in the LangChain wrapper, the model name is passed as `model_name` and the remaining options go through `model_kwargs`, as in the code below):

1.   model_name_or_path: If it is a filepath on disk, it loads the model from that path. If it is not a path, it first tries to download a pre-trained SentenceTransformer model. If that fails, it tries to construct a model from the Hugging Face Hub with that name.
2.   device: Device (like "cuda", "cpu", "mps", "npu") that should be used for computation.
3.   similarity_fn_name: The name of the similarity function to use. Valid options are "cosine", "dot", "euclidean", and "manhattan". If not set, it is automatically set to "cosine" if similarity or similarity_pairwise are called while model.similarity_fn_name is still None.
4.   model_kwargs: Additional model configuration parameters to be passed to the Hugging Face Transformers model.
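
To make the four `similarity_fn_name` options concrete, here is a minimal pure-Python sketch on toy vectors. The helper names are hypothetical, and negating the two distances so that "larger means more similar" is an assumption made for illustration; this is not code taken from Sentence Transformers.

```python
import math

def dot(a, b):
    """Dot product: sensitive to both direction and vector length."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: dot product of the two vectors after length normalization."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def neg_euclidean(a, b):
    """Negative Euclidean (straight-line) distance."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neg_manhattan(a, b):
    """Negative Manhattan (sum of absolute differences) distance."""
    return -sum(abs(x - y) for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

a = [3.0, 4.0]
b = [4.0, 3.0]
print("cosine:   ", cosine(a, b))
print("dot:      ", dot(a, b))
print("euclidean:", neg_euclidean(a, b))
print("manhattan:", neg_manhattan(a, b))
# On unit-length vectors, dot product and cosine similarity agree.
print("dot on normalized vectors:", dot(normalize(a), normalize(b)))
```

The last line is why normalizing embeddings matters: once every vector has unit length, ranking by dot product and ranking by cosine similarity give the same order.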



---


### HuggingFaceEndpoint

As detailed at [LangChain](https://python.langchain.com/v0.2/docs/integrations/llms/huggingface_endpoint/), HuggingFaceEndpoint interacts with the [Hugging Face Hub](https://huggingface.co/docs/hub/index), a platform with over 120k models and 20k datasets (plus Spaces, which hosts ML applications). It calls the [Serverless Endpoint API](https://huggingface.co/docs/api-inference/en/index) to reach over 150,000 publicly accessible machine learning models, or your own private models, via simple HTTP requests, with fast inference hosted on Hugging Face shared infrastructure (there is also a [dedicated Endpoint](https://huggingface.co/docs/inference-endpoints/en/index) for enterprise workloads). 🍫

After taking the course [Developing LLM Applications with LangChain](https://www.datacamp.com/courses/developing-llm-applications-with-langchain), taught by Jonathan Bennion and James Chapman, I decided to try [Retrieval-Augmented Generation (aka RAG)](https://arxiv.org/pdf/2005.11401) using LangChain. As detailed by the authors of the RAG method:

> Large pre-trained language models have been shown to store factual knowledge
in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks...[w]e explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) — models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever.

The HuggingFaceEndpoint takes in the following arguments/parameters:


1.   model: The Hugging Face repository of the pre-trained LLM to use (passed as `repo_id` in the code below). [Falcon-7B-Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) is left commented out in the code below as an alternative to Mistral-7B-Instruct-v0.2, the model actually used.
2.   max_new_tokens: The maximum number of tokens to generate. In other words, the size of the output sequence, not including the tokens in the prompt. The default is 512.
3.   top_k: The number of highest-probability vocabulary tokens to keep for top-k filtering. Is an int and the default value is None. See [top-k sampling](https://arxiv.org/pdf/1904.09751) (has no max).
4.   top_p: If set to < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation. Is a float and the default value is 0.95. See [nucleus sampling](https://arxiv.org/pdf/1904.09751) (range 0 to 1).
5.   typical_p: Typical decoding mass. See [Typical Decoding for Natural Language Generation](https://arxiv.org/abs/2202.00666) for more information. Is a float and the default value is 0.95 (range 0 to 1).
6.   temperature: The value used to modulate the logits distribution. Is a float and the default value is 0.8 (must be greater than 0, has no max).
7.   repetition_penalty: The parameter for repetition penalty. 1.0 means no penalty. See [theta in Penalized Sampling](https://arxiv.org/pdf/1909.05858.pdf) for more details. Is a float; the code below sets it to 1.2 (has no max).
8.   return_full_text: Whether to prepend the prompt to the generated text.
9.   seed: Random sampling seed.
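
As a rough illustration of how `temperature`, `top_k`, `top_p`, and `repetition_penalty` reshape the next-token distribution, here is a pure-Python sketch over a toy four-token vocabulary. The function names and the toy logits are hypothetical, and the sign handling in the repetition penalty (divide positive logits, multiply negative ones) is one common adaptation of the penalized-sampling formula, assumed here for illustration. Real inference servers do this on tensors and may differ in detail and ordering.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def apply_repetition_penalty(logits, seen_ids, penalty=1.2):
    """Make tokens that were already generated less likely (penalty 1.0 = no change)."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a 4-token vocabulary
probs_warm = softmax(logits, temperature=0.8)
probs_cold = softmax(logits, temperature=0.1)
print("temperature 0.8:", probs_warm)
print("temperature 0.1:", probs_cold)
print("top_k=2:        ", top_k_filter(probs_warm, k=2))
print("top_p=0.95:     ", top_p_filter(probs_warm, p=0.95))
print("penalized tok 0:", softmax(apply_repetition_penalty(logits, seen_ids=[0])))
```

At a temperature of 0.1, almost all of the probability mass lands on the top token, so sampling becomes close to greedy decoding.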







In [26]:
# HuggingFaceEmbeddings Parameters
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': True}
# Embedding space instantiation
embeddings = HuggingFaceEmbeddings(
  model_name=model_name,
  model_kwargs=model_kwargs,
  encode_kwargs=encode_kwargs
)
# HuggingFaceEndpoint LLM Parameters
# model = 'tiiuae/falcon-7b-instruct'
model = "mistralai/Mistral-7B-Instruct-v0.2"
max_new_tokens = 512
top_k = None
top_p = 0.95
typical_p = 0.95
temperature = 0.1
repetition_penalty = 1.2
return_full_text = True
seed = 42
# LLM model instantiation
llm = HuggingFaceEndpoint(
  repo_id=model,
  huggingfacehub_api_token=inference_api_key,
  max_new_tokens=max_new_tokens,
  top_k=top_k,
  top_p=top_p,
  typical_p=typical_p,
  temperature=temperature,
  repetition_penalty=repetition_penalty,
  return_full_text=return_full_text,
  seed=seed
)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [11]:
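# Adding documents to the Chroma vector database, using Maximal Marginal Relevance (MMR) as the search algorithm
vectorstore = Chroma.from_texts(dataset['train'][:]['text'], embedding=embeddings)
# k: number of chunks returned; fetch_k: number of candidates fetched before MMR re-ranking;
# lambda_mult: 0 favors diversity, 1 favors relevance
retriever = vectorstore.as_retriever(search_type='mmr', search_kwargs={'k': 50, 'fetch_k': 200, 'lambda_mult': 0.50})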
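
To show how `k`, `fetch_k`, and `lambda_mult` interact, here is a pure-Python sketch of MMR on toy 2-D vectors. It first shortlists the `fetch_k` candidates closest to the query, then greedily picks `k` of them, trading relevance to the query against similarity to what has already been picked. `mmr_select` is a hypothetical name and this is a simplified illustration of the idea, not the implementation Chroma or LangChain uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def mmr_select(query, candidates, k, fetch_k, lambda_mult):
    """Return the indices of k candidates chosen by Maximal Marginal Relevance."""
    # Step 1: shortlist the fetch_k candidates most similar to the query.
    shortlist = sorted(
        range(len(candidates)),
        key=lambda i: cosine(query, candidates[i]),
        reverse=True,
    )[:fetch_k]
    selected = []
    # Step 2: greedily add the candidate with the best relevance/diversity trade-off.
    while shortlist and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(shortlist, key=score)
        selected.append(best)
        shortlist.remove(best)
    return selected

query = [1.0, 0.0]
candidates = [
    [1.0, 0.10],   # 0: highly relevant
    [1.0, 0.12],   # 1: near-duplicate of 0
    [1.0, -0.60],  # 2: less relevant, but different from 0
    [0.0, 1.0],    # 3: unrelated to the query
]
print(mmr_select(query, candidates, k=2, fetch_k=3, lambda_mult=1.0))  # [0, 1]
print(mmr_select(query, candidates, k=2, fetch_k=3, lambda_mult=0.5))  # [0, 2]
print(mmr_select(query, candidates, k=2, fetch_k=2, lambda_mult=0.5))  # [0, 1]
```

With `lambda_mult=1.0` the near-duplicate is returned, with `0.5` it is swapped for a more distinct chunk, and with a `fetch_k` that is too small the distinct chunk never makes the shortlist. This is why `fetch_k` is normally set well above `k`.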

In [8]:
# Creating the chat prompt template for the Retrieval-Augmented Generation (RAG) LangChain pipeline
message = """
Answer the following question using the context provided:

Context:
{context}

Question:
{question}

Answer:
"""
prompt = ChatPromptTemplate.from_messages([('human', message)])
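
The `{context}` placeholder is filled with whatever the retriever returns. If the raw list of documents is passed in, the prompt contains the Python repr of each `Document` rather than plain text. A minimal sketch of one way to clean that up is shown here, using a stand-in `Document` dataclass and plain `str.format` so it runs without LangChain; `format_docs` and `TEMPLATE` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Document:
    """Stand-in for LangChain's Document; only page_content is modeled here."""
    page_content: str

TEMPLATE = """
Answer the following question using the context provided:

Context:
{context}

Question:
{question}

Answer:
"""

def format_docs(docs):
    """Join the text of each retrieved chunk, one chunk per line."""
    return "\n".join(doc.page_content.strip() for doc in docs)

docs = [
    Document("Fig. 1. VLOS, EVLOS and BVLOS illustrated [2].  "),
    Document("(BVLOS) Operations  "),
]
filled = TEMPLATE.format(
    context=format_docs(docs),
    question="Define VLOS EVLOS BVLOS",
)
print(filled)
```

In a LangChain chain, a plain function like this can usually be composed after the retriever, for example `{'context': retriever | format_docs, 'question': RunnablePassthrough()}`, so the model sees only the chunk text.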

In [27]:
Question = "Define VLOS EVLOS BVLOS "  #@param { type: "string" }
# Creating RAG-Langchain to link the retriever, prompt, and llm
rag_chain = ({'context': retriever, "question": RunnablePassthrough()} | prompt | llm)
# Invoking RAG-LangChain by passing in a question regarding the document
response = rag_chain.invoke(Question)
print(response)

Human: 
Answer the following question using the context provided:

Context:
[Document(page_content='Fig. 1. VLOS, EVLOS and BVLOS illustrated [2].  '), Document(page_content='(BVLOS) Operations  '), Document(page_content='uavcoach.com/inside -bvlos  '), Document(page_content='It is apparent that BVLOS capability is becoming an '), Document(page_content='surroundings. The following text reviews the BVLOS '), Document(page_content='quite a few scenarios where BVLOS could be executed '), Document(page_content='a possible way to achieve safe BVLOS applications, was to '), Document(page_content='BVLOS innovators have been posed the task of exploring the '), Document(page_content='range as there is more to BVLOS applications than merely '), Document(page_content='that the advent of BVLOS operations will herald a new '), Document(page_content='The regulations surrounding BVLOS are currently '), Document(page_content='that BVLOS could be one step closer.  '), Document(page_content='surveillanc