<img src="images/roche-logo-blue.png" alt="Roche logo" style="float: right;" width="150"/>
<p>&nbsp;</p>
<p>&nbsp;</p>

# Building a Q&A engine with LangChain and open-source LLMs
## Marek Grzenkowicz

#### September 2023

## LLMs did not appear out of nowhere

- Natural language processing (NLP)
- Neural networks
- Deep learning
- Transformer architecture
- Word and sentence embeddings
- Evaluation benchmarks and leaderboards

## Tools

- [LangChain](https://python.langchain.com/docs/get_started/introduction.html) - framework for developing applications powered by language models
- [Hugging Face Hub](https://huggingface.co/models) - repository of pre-trained language models
  - [Transformers](https://huggingface.co/docs/transformers/index) - library for downloading and running the models
- [lmsys/fastchat-t5-3b-v1.0](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0) - compact, open-source LLM from [lmsys.org](https://lmsys.org) 
- [ChromaDB](https://www.trychroma.com) - embedding database
- Jupyter Notebook with [RISE](https://github.com/damianavila/RISE) - interactive notebook environment; RISE adds a slideshow mode

## Assumptions

- Try to solve an actual business problem
- See whether an LLM application can be implemented without deep NLP knowledge
- Use an open-source model
- Run the model on a laptop instead of making API calls to a hosted model

## Why the `lmsys/fastchat-t5-3b-v1.0` model?

- 4 GB GPU: enough for a 1B-parameter model at best
- 32 GB RAM: enough for a 3B-parameter model

- [CobraMamba/mamba-gpt-3b-v3](https://huggingface.co/CobraMamba/mamba-gpt-3b-v3) claims to surpass some 12B models ([Open LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard))
  - but it is slow ([Open LLM performance leaderboard](https://huggingface.co/spaces/optimum/llm-perf-leaderboard))


- [lmsys/fastchat-t5-3b-v1.0](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0)
  - _A commercial-friendly, compact, yet powerful chat assistant_
  - Praised in [Running a Hugging Face Large Language Model (LLM) locally on my laptop](https://www.markhneedham.com/blog/2023/06/23/hugging-face-run-llm-model-locally-laptop)
  - The first model to actually generate any response on my laptop
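
The hardware limits in the first two bullets come down to simple arithmetic, because every parameter has to be stored somewhere. The rough sketch below counts weights only and ignores activations, caches and framework overhead. It assumes the usual 4 bytes per parameter for float32 and 2 bytes for float16. The slide does not say which precision it had in mind.

```python
# Rough size of the model weights alone (assumption: activations,
# caches and framework overhead are ignored).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weights_gib(n_params: float, dtype: str) -> float:
    """Approximate memory needed to hold n_params weights, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for n_params in (1e9, 3e9):
    for dtype in ("float32", "float16"):
        size = weights_gib(n_params, dtype)
        print(f"{n_params / 1e9:.0f}B params, {dtype}: {size:.1f} GiB")
```

At float32, a 1B model needs about 3.7 GiB, which is why a 4 GB GPU is the ceiling. A 3B model needs about 11.2 GiB, which fits in 32 GB of RAM with room to spare. Note that 1 GiB is 2^30 bytes, slightly more than 1 GB.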

## Some duct tape first

```python
import warnings
warnings.filterwarnings('ignore')
```

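```python
# Chroma requires SQLite >= 3.35; replace the built-in sqlite3 module with
# pysqlite3 (from the pysqlite3-binary package), which bundles a newer SQLite
# https://github.com/chroma-core/chroma/blob/main/chromadb/__init__.py#L57
import sys
__import__("pysqlite3")
sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
```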
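The swap above runs unconditionally, so it fails on machines where `pysqlite3` is not installed. A slightly more defensive sketch performs the swap only when the built-in SQLite is older than 3.35.0, the minimum that Chroma's error message asks for. It assumes that `pysqlite3` comes from the `pysqlite3-binary` package. The helper name `sqlite_is_too_old` is hypothetical.

```python
import sqlite3
import sys

# Minimum SQLite version Chroma accepts (assumption: taken from Chroma's startup error).
MIN_SQLITE = (3, 35, 0)


def sqlite_is_too_old(version: str = sqlite3.sqlite_version,
                      minimum: tuple[int, ...] = MIN_SQLITE) -> bool:
    """Compare a dotted SQLite version string such as '3.31.1' with the minimum."""
    return tuple(int(part) for part in version.split(".")) < minimum


if sqlite_is_too_old():
    try:
        # pysqlite3-binary ships its own, newer SQLite build.
        import pysqlite3  # noqa: F401
        sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
    except ImportError:
        print("SQLite is too old for Chroma; try: pip install pysqlite3-binary")
```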

```python
from langchain.llms import HuggingFacePipeline
from langchain import PromptTemplate, LLMChain
```

```python
# FastChat-T5 is an encoder-decoder (T5) model, hence the text2text-generation task
model_id, task = "lmsys/fastchat-t5-3b-v1.0", "text2text-generation"

# the model will be downloaded on first use, if not cached in ~/.cache/huggingface/hub/

model = HuggingFacePipeline.from_model_id(
    model_id=model_id,
    task=task,
    model_kwargs={
        "temperature": 0,
        "max_length": 1000
    },
)
```

```
/home/users/grzenkom/.pyenv/versions/3.10.12/envs/lang-chain-qa-env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32

You are using the legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
```
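The `temperature` setting controls how random the choice of the next token is. A value of 0 is conventionally read as "always pick the most likely token", which is greedy decoding. The sketch below is a framework-independent illustration that uses made-up scores. It is not the code that Hugging Face `transformers` runs during generation.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Return the index of the next token."""
    if temperature == 0:
        # Greedy decoding: deterministic, always the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [1.0, 3.0, 2.0]  # made-up scores for a three-token vocabulary
print(pick_token(logits, 0, random.Random(42)))  # always 1
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.5))
```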


```python
template_text = """
{question}
"""
template = PromptTemplate(template=template_text, input_variables=["question"])
llm_chain = LLMChain(prompt=template, llm=model)
```
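With its default f-string format, `PromptTemplate` fills in `{question}` much as Python's `str.format` does, so this first template sends the bare question to the model. The plain-Python approximation below skips LangChain's validation of the input variables.

```python
# Plain-Python approximation of filling the template's single input variable.
template_text = """
{question}
"""

prompt = template_text.format(question="Who is Sheryl Crow?")
print(repr(prompt))  # '\nWho is Sheryl Crow?\n'
```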

```python
llm_chain("Who is Sheryl Crow?")["text"]
```

```
'<pad> Sheryl  Crow  is  an  American  singer,  songwriter,  and  actress.  She  is  best  known  for  her  role  as  the  lead  singer  and  lead  guitarist  of  the  rock  band  The  Band wagon,  and  for  her  role  as  the  lead  singer  and  lead  guitarist  of  the  alternative  rock  band  The  Mamas  and  the  Papas.  Crow  has  also  been  a  member  of  the  band  The  Mamas  and  the  Papas  since  its  formation  in  1995.\n'
```
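The raw answer still starts with `<pad>`, the token T5 models use to start decoding, and its words are separated by doubled spaces. The hypothetical helper `clean_output` below uses only the standard library to tidy the string before display. It fixes the formatting only, not the content of the answer.

```python
import re


def clean_output(raw: str) -> str:
    """Drop the <pad> token and squeeze runs of whitespace into single spaces."""
    text = raw.replace("<pad>", "")
    return re.sub(r"\s+", " ", text).strip()


print(clean_output("<pad> Sheryl  Crow  is  an  American  singer.\n"))
# Sheryl Crow is an American singer.
```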

```python
template_text = """
Provide brief answers, use 10 words or less.
{question}
"""
template = PromptTemplate(template=template_text, input_variables=["question"])
llm_chain = LLMChain(prompt=template, llm=model)
```

```python
llm_chain("Who is Sheryl Crow?")["text"]
```

```
'<pad> Singer-songwriter'
```

```python
llm_chain("Where is Poland located?")["text"]
```

```
'<pad> Europe'
```

```python
llm_chain("What is Bialowieza Forest?")["text"]
```

```
'<pad> Bialowieza Forest is a protected forest in Poland.'
```

```python
llm_chain("What does the name 'Białowieża' mean in English?")["text"]
```

```
'<pad> "Bird of the Woods"'
```

```python
llm_chain("What's the length of the Tsar's Trail and where does it begin?")["text"]
```

```
"<pad> The Tsar's Trail is a 900 mile long trail that begins in Moscow and ends in St. Petersburg."
```

```python
# https://python.langchain.com/docs/integrations/document_loaders
from langchain.document_loaders import WikipediaLoader

loader = WikipediaLoader("Białowieża_Forest")
wiki_page = loader.load()
```

```python
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://bpn.com.pl/index.php?option=com_content&task=view&id=651&Itemid=297&lang=en")
bpn_page = loader.load()
```

```python
# https://python.langchain.com/docs/use_cases/question_answering/#step-1-load

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(wiki_page + bpn_page)
```
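`RecursiveCharacterTextSplitter` tries coarse separators first: paragraphs, then lines, then words. It falls back to a finer separator only for pieces that are still too long. The stand-alone sketch below shows a simplified version of that idea. It assumes LangChain's default separator order `"\n\n", "\n", " ", ""` and ignores chunk overlap and other details of the real implementation.

```python
def recursive_split(text: str, chunk_size: int = 500,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into chunks of at most chunk_size characters (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *finer = separators
    # The empty separator means "fall back to individual characters".
    pieces = list(text) if sep == "" else text.split(sep)
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Too long on its own: flush what we have and retry with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, tuple(finer)))
            continue
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "First paragraph about the forest.\n\nSecond, much longer paragraph about its trails."
for chunk in recursive_split(text, chunk_size=40):
    print(repr(chunk))
```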

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# https://integrations.langchain.com/embeddings
hf_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",
    model_kwargs={
        'device': 'cpu'
    },
    encode_kwargs={
        'normalize_embeddings': False
    }
)

vectorstore = Chroma.from_documents(documents=all_splits, embedding=hf_embeddings)
```
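The vector store embeds every chunk once. For each query, it returns the chunks whose vectors lie closest to the query's vector. The toy sketch below replaces the 768-dimensional `all-mpnet-base-v2` vectors with simple word counts and ranks chunks by cosine similarity. Chroma's actual index and distance metric are not reproduced here, and the three snippets are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [  # made-up snippets standing in for the real document chunks
    "The forest has several marked walking trails.",
    "European bison live in the national park.",
    "The museum opens at nine in the morning.",
]
print(top_k("Are there any walking trails?", chunks, k=1))
```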

```python
from langchain.chains import RetrievalQA

# the default chain type, "stuff", pastes all retrieved chunks into a single prompt
qa_chain = RetrievalQA.from_chain_type(model, retriever=vectorstore.as_retriever())
```
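`RetrievalQA.from_chain_type` defaults to the `"stuff"` chain type: the retrieved chunks are "stuffed" into one prompt together with the question. The hypothetical helper `build_stuff_prompt` below shows the general shape of such a prompt. Its wording is an assumption and only loosely resembles LangChain's built-in template.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Paste all retrieved chunks into one prompt, followed by the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_stuff_prompt(
    "Are there any walking trails?",
    ["First retrieved chunk.", "Second retrieved chunk."],
))
```

Because all chunks end up in one prompt, the chunk size and the number of retrieved chunks both count against the model's input length.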

```python
qa_chain({
    "query": "Provide brief answers, use 10 words or less. What does the name 'Białowieża' mean in English?"
})["result"]
```

```
"<pad> The  name  'Biaowiea'  means  'White  Tower'  in  English.\n"
```

```python
qa_chain({
    "query": "Provide brief answers, use 10 words or less. What's the length of the Tsar's Trail and where does it begin?"
})["result"]
```

```
'<pad> 4  km  long,  starts  at  Przed  Kosym  Mostem  depot.\n'
```

```python
qa_chain({
    "query": "Are there any walking trails in the Białowieża Forest?"
})["result"]
```

```
'<pad> No,  there  are  no  walking  trails  in  the  Biaowiea  Forest.\n'
```