# LangChain

In this tutorial, we use `LangChain` as a framework to build a retrieval-based question answering (QA) system.  While `LlamaIndex` is very nice, `LangChain` supports more complex workflows.

In [1]:
# Comment this out if you are using puffer or tokyo
import os
os.environ['http_proxy']  = 'http://192.41.170.23:3128'
os.environ['https_proxy'] = 'http://192.41.170.23:3128'

In [2]:
from watermark import watermark
print(watermark(packages="torch,langchain,transformers"))

torch       : 1.13.0
langchain   : 0.0.171
transformers: 4.21.3



In [3]:
from transformers.utils import logging
logging.set_verbosity(40)  # 40 == logging.ERROR: only show errors

## 1. Define model

First, we load the large language model that will be used for question answering.  Instead of `flan-t5-large`, I will use FastChat-T5 (`lmsys/fastchat-t5-3b-v1.0`), a larger 3B-parameter chat model that generally gives better answers here.

In [4]:
from langchain.llms.base import LLM
from transformers import pipeline
import torch

In [5]:
# from typing import Mapping, Optional, Any, List
# from langchain.callbacks.manager import CallbackManagerForLLMRun

# class CustomLLM(LLM):
    
#     n: int
#     model_name = "google/flan-t5-large"
#     pipeline = pipeline("text2text-generation", model=model_name, device=1, model_kwargs={"torch_dtype":torch.bfloat16})
        
#     @property
#     def _llm_type(self) -> str:
#         return "custom"
    
#     def _call(
#         self,
#         prompt: str,
#         stop: Optional[List[str]] = None,
#         run_manager: Optional[CallbackManagerForLLMRun] = None,
#     ) -> str:
#         if stop is not None:
#             raise ValueError("stop kwargs are not permitted.")
        
#         out = self.pipeline(prompt, max_length=9999)[0]["generated_text"]
#         return out[:self.n]
    
#     @property
#     def _identifying_params(self) -> Mapping[str, Any]:
#         """Get the identifying parameters."""
#         return {"n": self.n}

In [6]:
# llm=CustomLLM(n=500)

In [7]:
from langchain import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id='lmsys/fastchat-t5-3b-v1.0',
    task='text2text-generation',
    device=1,  # CUDA device id; use -1 to run on CPU
    model_kwargs={
        "max_length": 256,
        "temperature": 0,
        "torch_dtype": torch.float32,
        "repetition_penalty": 1.3,
    },
)

Device has 2 GPUs available. Provide device={deviceId} to `from_model_id` to use availableGPUs for execution. deviceId is -1 (default) for CPU and can be a positive integer associated with CUDA device id.


## 2. Define data

Here, I define a `langchain.document_loaders.DirectoryLoader` for each folder I want to read.  Note that I load multiple document types, such as `txt` and `pdf`.  Other sources, such as web pages, can be loaded in the same way with the corresponding loader.  This flexibility is one of the reasons to prefer `LangChain` over `LlamaIndex` here.

In [8]:
from langchain.document_loaders import DirectoryLoader

txt_loader = DirectoryLoader('txt', glob="**/*.txt")
pdf_loader = DirectoryLoader('pdf', glob="**/*.pdf")

In [9]:
# collect all the loaders
loaders = [pdf_loader, txt_loader]

# load the documents from every loader into one list
documents = []
for loader in loaders:
    documents.extend(loader.load())

detectron2 is not installed. Cannot use the hi_res partitioning strategy. Falling back to partitioning with another strategy.
Falling back to partitioning with ocr_only.


In [10]:
len(documents)

4
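
To make the `glob="**/*.txt"` and `glob="**/*.pdf"` patterns above more concrete, here is a minimal sketch using only the standard library. It assumes the patterns follow `pathlib` glob semantics, where `**` matches the folder itself and every subfolder. The helper `find_files` and the temporary files are hypothetical and only serve the illustration; they are not part of `LangChain`.

```python
import tempfile
from pathlib import Path


def find_files(root: Path, pattern: str) -> list[str]:
    """Return paths (relative to root) of regular files matching a glob pattern."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


# Build a throwaway folder tree so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes").mkdir()
    (root / "intro.txt").write_text("top-level file")
    (root / "notes" / "deep.txt").write_text("nested file")
    (root / "notes" / "paper.pdf").write_text("placeholder, not a real PDF")

    txt_matches = find_files(root, "**/*.txt")
    pdf_matches = find_files(root, "**/*.pdf")

print(txt_matches)  # both the top-level and the nested .txt file
print(pdf_matches)  # only the .pdf file
```

The point is that one pattern per file type picks up files at any depth, which is why a single loader per folder is enough.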

Now that we've loaded our documents, we need to split the text so that we don't run into token limits when we retrieve relevant document information.

To do so, we can use the `CharacterTextSplitter` from `LangChain`.  In this case we split the documents into chunks of about 1000 characters, with no overlap between the chunks.  The splitter cuts at a separator rather than at an exact character count, so a chunk can come out slightly longer than the target, as the warnings in the output show.

The `text_splitter` splits the documents, and the resulting chunks are stored in a variable called `texts`:

In [11]:
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

Created a chunk of size 1050, which is longer than the specified 1000
Created a chunk of size 1002, which is longer than the specified 1000
Created a chunk of size 1064, which is longer than the specified 1000
Created a chunk of size 1099, which is longer than the specified 1000
Created a chunk of size 1032, which is longer than the specified 1000
Created a chunk of size 1065, which is longer than the specified 1000
Created a chunk of size 1215, which is longer than the specified 1000
Created a chunk of size 1026, which is longer than the specified 1000
Created a chunk of size 1018, which is longer than the specified 1000
Created a chunk of size 1295, which is longer than the specified 1000
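
The warnings above can be reproduced with a small sketch of separator-based splitting. This is a simplified illustration, not `LangChain`'s actual implementation: the function `split_on_separator` is hypothetical, and it assumes a blank line (`"\n\n"`) as the separator. Pieces between separators are merged greedily up to `chunk_size`, but a single piece is never cut.

```python
def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n") -> list[str]:
    """Greedy separator-based splitter (simplified sketch, not LangChain's code)."""
    chunks: list[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            # The piece still fits, so keep growing the current chunk.
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single piece is kept whole, even if it alone exceeds chunk_size.
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Three paragraphs of 30, 120 and 40 characters, with a target size of 100.
sample = "a" * 30 + "\n\n" + "b" * 120 + "\n\n" + "c" * 40
chunk_lengths = [len(chunk) for chunk in split_on_separator(sample, chunk_size=100)]
print(chunk_lengths)
```

The 120-character paragraph contains no separator, so it is emitted as one oversized chunk. The same thing happens above whenever a paragraph in the source documents is longer than 1000 characters.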


## 3. Embeddings

Next, we define the embedding model used to embed our documents.  There are many options; here we simply use `HuggingFaceEmbeddings`.

Note: many other good embedding models exist.  For example, try `hkunlp/instructor-base`.

In [12]:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

hfemb = HuggingFaceEmbeddings()
# Note: HuggingFaceEmbeddings uses sentence-transformers under the hood

## 4. Create vector store

Next, we create the vector store, which holds the embedding of each document chunk and makes similarity search possible.  Here I use `Chroma` (the `chromadb` package), an efficient vector store.

In [13]:
from langchain.vectorstores import Chroma

persist_directory = 'db'

vectorstore = Chroma.from_documents(documents=texts, embedding=hfemb, persist_directory=persist_directory)

Using embedded DuckDB with persistence: data will be stored in: db
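
To give an intuition for what the vector store does at query time, here is a toy in-memory sketch of similarity search. It is not how `Chroma` is implemented: the three-dimensional vectors are made up for illustration, the names `cosine_similarity` and `top_k` are hypothetical, and cosine similarity is just one possible metric (the metric a real store uses depends on its configuration).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k texts whose vectors are most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up (text, embedding) pairs standing in for embedded document chunks.
toy_store = [
    ("chunk about location", [0.9, 0.1, 0.0]),
    ("chunk about programs", [0.2, 0.9, 0.1]),
    ("chunk about something unrelated", [0.0, 0.1, 0.9]),
]

query_vector = [1.0, 0.2, 0.0]  # made-up embedding of a location question
nearest = top_k(query_vector, toy_store, k=2)
print(nearest)
```

A real store does the same kind of ranking over the embeddings produced by `hfemb`, only with much higher-dimensional vectors and an index that avoids comparing against every chunk.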


## 5. Query time

Now that we have the LLM and the vector store, there are many ways to combine them.  In LangChain, such a pipeline is called a `chain`.  Here we use a simple one called `RetrievalQA`.

In [14]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever())
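
The `chain_type="stuff"` option means that all retrieved chunks are placed ("stuffed") into a single prompt together with the question. The sketch below illustrates the idea only. The template wording and the function `build_stuff_prompt` are hypothetical and are not the prompt `LangChain` actually uses.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate all retrieved chunks into one prompt (hypothetical template)."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for whatever the retriever returns.
retrieved_chunks = ["First retrieved chunk.", "Second retrieved chunk."]
stuffed_prompt = build_stuff_prompt("Where is AIT?", retrieved_chunks)
print(stuffed_prompt)
```

Because everything goes into one prompt, this approach is simple and needs a single LLM call, but the combined chunks must fit within the model's input limit. This is one reason the documents were split into chunks earlier.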

In [15]:
prompt_trick = " Try to elaborate as much as you can."

In [16]:
query = "Where is AIT?" + prompt_trick
qa.run(query)

'<pad>  AIT  is  located  just  north  of  Bangkok,  Thailand.\n'

In [17]:
query = "What is AIT?" + prompt_trick
qa.run(query)

'<pad>  The  Asian  Institute  of  Technology  (AIT)  is  an  international  English-speaking  postgraduate  institution,  focusing  on  engineering,  environment,  and  management  studies.  It  was  founded  in  1959  and  offers  rigorous  academic,  research,  and  experiential  outreach  programs  to  prepare  graduates  for  professional  success  and  leadership  roles  in  Asia  and  beyond.  AIT  has  a  global  reputation  and  emphasizes  its  global  connections,  injection  of  innovation  into  research  and  teaching,  relevance  to  industry,  and  nurturing  of  entrepreneurship.  It  operates  as  a  multicultural  community  where  a  cosmopolitan  approach  to  living  and  learning  is  the  rule.\n'

In [18]:
query = "What kind of research does AIT do?" + prompt_trick
qa.run(query)

'<pad> Based  on  the  context,  it  seems  that  AIT  does  a  wide  range  of  research.  The  research  document  provided  by  Professor  Sudip  K.  Rakshit  provides  a  sampling  of  the  work  that  AIT  is  currently  engaged  in,  which  includes  various  fields  such  as  engineering,  computer  science,  mathematics,  and  social  sciences.  These  fields  are  mentioned  to  be  areas  where  AIT  has  made  significant  advances  and  are  being  shared  with  the  outside  world  through  this  document.\n'
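
The raw answers above start with a `<pad>` token and contain doubled spaces. If a cleaner string is needed, a small post-processing step can help. This is a minimal sketch; the helper name `clean_answer` is my own and is not part of `LangChain` or `transformers`.

```python
import re


def clean_answer(raw: str) -> str:
    """Drop '<pad>' tokens and collapse runs of whitespace into single spaces."""
    text = raw.replace("<pad>", " ")
    return re.sub(r"\s+", " ", text).strip()


raw_answer = "<pad>  AIT  is  located  just  north  of  Bangkok,  Thailand.\n"
cleaned = clean_answer(raw_answer)
print(cleaned)
```

It can be applied directly to the chain output, for example `clean_answer(qa.run(query))`.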