# RAG approach to answer questions using flan-ul2 from watsonx.ai and LangChain


This notebook contains the steps and code to answer questions using the flan-ul2 foundation model from watsonx.ai and LangChain. The data describes the company policies of a fictitious company and is synthetically generated. RAG itself is covered by numerous articles on the internet and is not explained in this notebook. Run this notebook first to get a handle on the data and the RAG approach, then move on to the evaluation and chunking techniques.

## Contents
This notebook contains the following:
1. Setup of required libraries and modules
2. Data Loading, pay attention to chunk size and overlap
3. Accessing LLM from WML
4. Answering the question using RAG approach

## Install the dependencies

Before starting this step, ensure that the Watson Machine Learning (WML) service has been created and associated with this project. It might take a few minutes to install all the dependencies.

In [1]:
!pip install "ibm-watson-machine-learning>=1.0.320" 
!pip install "pydantic>=1.10.0" 
!pip install langchain 
!pip install huggingface
!pip install huggingface-hub
!pip install sentence-transformers
!pip install chromadb

Collecting pydantic>=1.10.0
  Downloading pydantic-2.4.2-py3-none-any.whl (395 kB)
Collecting typing-extensions>=4.6.1
  Downloading typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Collecting annotated-types>=0.4.0
  Downloading annotated_types-0.6.0-py3-none-any.whl (12 kB)
Collecting pydantic-core==2.10.1
  Downloading pydantic_core-2.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Installing collected packages: typing-extensions, annotated-types, pydantic-core, pydantic
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.3.0
    Uninstalling typing_extensions-4.3.0:
      Successfully uninstalled typing_extensions-4.3.0
Successfully installed annotated-types-

Collecting jsonpointer>=1.9
  Downloading jsonpointer-2.4-py2.py3-none-any.whl (7.8 kB)
Installing collected packages: typing-inspect, tenacity, sniffio, jsonpointer, exceptiongroup, marshmallow, jsonpatch, anyio, langsmith, dataclasses-json, langchain
  Attempting uninstall: tenacity
    Found existing installation: tenacity 8.0.1
    Uninstalling tenacity-8.0.1:
      Successfully uninstalled tenacity-8.0.1
Successfully installed anyio-3.7.1 dataclasses-json-0.6.1 exceptiongroup-1.1.3 jsonpatch-1.33 jsonpointer-2.4 langchain-0.0.315 langsmith-0.0.43 marshmallow-3.20.1 sniffio-1.3.0 tenacity-8.2.3 typing-inspect-0.9.0
Collecting huggingface
  Downloading huggingface-0.0.1-py3-none-any.whl (2.5 kB)
Installing collected packages: huggingface
Successfully installed huggingface-0.0.1
Collecting huggingface-hub
  Downloading huggingface_hub-0.18.0-py3-none-any.whl (301 kB)

Collecting huggingface-hub>=0.4.0
  Downloading huggingface_hub-0.17.3-py3-none-any.whl (295 kB)
Building wheels for collected packages: sentence-transformers
  Building wheel for sentence-transformers (setup.py) ... done
  Created wheel for sentence-transformers: filename=sentence_transformers-2.2.2-py3-none-any.whl size=125942 sha256=6fa551a0fe3fd629c44833472637f1f1040a3e9a10a34b9793d0fcb56b63ad0a
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/62/f2/10/1e606fd5f02395388f74e7462910fe851042f97238cbbd902f
Successfully built sentence-transformers
Installing collected packages: safetensors, regex, nltk, huggingface-hub, tokenizers, transformers, sentence-transformers
  Attempting uninstall: huggingface-hub
    Found existing installation: huggingface-hub 0.18.0
    Uninstalling huggingface-hub-0.18.0:
      Successfully uninstalled huggingface-hub-0.18

Collecting h11>=0.8
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
Collecting python-dotenv>=0.13
  Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Collecting httptools>=0.5.0
  Downloading httptools-0.6.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (428 kB)
Collecting watchfiles>=0.13
  Downloading watchfiles-0.21.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting uvloop!=0.15.0,!=0.15.1,>=0.14.0
  Downloading uvloop-0.18.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.2 MB)

## watsonx.ai API connection

Provide the IBM Cloud IAM API key used to access the foundation models through the WML endpoint.

In [1]:
import os, getpass
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your WML api key (hit enter): ")
}

Please enter your WML api key (hit enter): ········


## Project ID definition

The foundation models need a project ID, both to execute requests and to track capacity unit hours (CUH) usage against the project.

In [2]:
try:
    project_id = os.environ["PROJECT_ID"]
    
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

In [5]:
!pip install wget
import wget

filename = 'companyPolicies.txt'
url = 'https://raw.github.com/ravisrirangam/chunking_techniques/main/data/companypolicies.txt'

wget.download(url, out=filename)
print('file downloaded')

file downloaded


## Data Loading 

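LangChain is used to split the document and create chunks. The chunk size is set to 1000 characters in this notebook; the subsequent notebooks show how to determine the optimal chunking technique. Although the chunk size is set to 1000, some chunks come out longer. This is not random: `CharacterTextSplitter` splits only on its separator (a blank line, `"\n\n"`, by default) and then merges adjacent pieces up to the chunk size. A single piece that is already longer than 1000 characters is never cut, so it is kept whole and reported with a warning.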

In [6]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

loader = TextLoader(filename)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
print(len(texts))

Created a chunk of size 1624, which is longer than the specified 1000
Created a chunk of size 1885, which is longer than the specified 1000
Created a chunk of size 1903, which is longer than the specified 1000
Created a chunk of size 1729, which is longer than the specified 1000
Created a chunk of size 1678, which is longer than the specified 1000
Created a chunk of size 2032, which is longer than the specified 1000
Created a chunk of size 1894, which is longer than the specified 1000


16
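
The warnings above can be reproduced with a small sketch of separator-based splitting. This is a simplified plain-Python illustration, not LangChain's actual implementation; the function name `split_on_separator` and the toy input are hypothetical. It shows that a piece with no separator inside it ends up as one oversized chunk.

```python
def split_on_separator(text, separator="\n\n", chunk_size=1000):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is never cut, so it becomes an oversized chunk.
    """
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # may exceed chunk_size on its own
    if current:
        chunks.append(current)
    return chunks


# Toy document: two short paragraphs, one long paragraph, one short paragraph.
sample = "A" * 30 + "\n\n" + "B" * 30 + "\n\n" + "C" * 150 + "\n\n" + "D" * 20
chunks = split_on_separator(sample, chunk_size=100)
for chunk in chunks:
    note = " (longer than chunk_size)" if len(chunk) > 100 else ""
    print(f"{len(chunk)}{note}")
```

If strictly bounded chunks are needed, one option is LangChain's `RecursiveCharacterTextSplitter`, which falls back to finer separators when a piece is too long.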


## Create an embedding model, using default from Hugging Face

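The code below creates the default Hugging Face embedding model (`HuggingFaceEmbeddings()` with no arguments, which in this LangChain version loads `sentence-transformers/all-mpnet-base-v2`) and ingests the chunks into ChromaDB. Since the focus of this series is on chunking, the choice of embedding model is not explored further.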

In [7]:
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)
print('documents ingested')

documents ingested
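
To make the idea of embedding-based retrieval concrete, here is a toy sketch in plain Python. It is only an illustration under loud assumptions: the bag-of-words `embed` function stands in for a real sentence-embedding model, cosine similarity is used for ranking (not necessarily the metric ChromaDB is configured with), and the names `embed`, `cosine_similarity`, and `top_k` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real sentence embedding)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Return the k (score, chunk) pairs most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


toy_chunks = [
    "mobile phone policy for employees",
    "internet and email policy",
    "health and safety guidelines",
]
results = top_k("mobile policy", toy_chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

Note that the second result shares only the word "policy" with the query yet is still returned, because a top-k search always returns k items regardless of how weak the match is.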


## flan-ul2 creation

The code below does the following:
1. Gets the model_id
2. Creates the parameters for the model
3. Initializes the model
4. Wraps the model for use with LangChain

In [8]:
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

model_id = ModelTypes.FLAN_UL2

The decoding method is set to "greedy" to get a deterministic output. You can change it to "sample" and add the temperature, top_k and top_p parameters.

In [17]:
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 130,
    GenParams.MAX_NEW_TOKENS: 200
}
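
The difference between the two decoding methods can be sketched on a toy next-token distribution. This is a plain-Python illustration, not the watsonx.ai implementation; the function names, the toy scores, and the order in which the top_k, temperature, and top_p filters are applied are all assumptions made for this sketch.

```python
import math
import random


def greedy_pick(logits):
    """Greedy decoding: always choose the highest-scoring token."""
    return max(logits, key=logits.get)


def sample_pick(logits, temperature=1.0, top_k=None, top_p=1.0, rng=random):
    """Sampling: draw one token after top_k filtering, temperature scaling, and top_p filtering."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the k highest-scoring tokens
    # Softmax with temperature: lower temperature sharpens the distribution.
    weights = [math.exp(score / temperature) for _, score in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Nucleus filtering: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for (token, _), p in zip(ranked, probs):
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    tokens, kept_probs = zip(*kept)
    return rng.choices(tokens, weights=kept_probs, k=1)[0]


toy_logits = {"policy": 2.0, "device": 1.5, "phone": 1.0, "banana": -1.0}
print(greedy_pick(toy_logits))  # same token on every run

rng = random.Random(0)  # seeded only to make this sketch repeatable
samples = [sample_pick(toy_logits, temperature=0.7, top_k=3, top_p=0.9, rng=rng) for _ in range(5)]
print(samples)  # can vary between tokens that survive the filters
```

Greedy decoding returns the same token every time, while sampling can return any token that survives the filters, which is why it is not deterministic.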

In [18]:
from ibm_watson_machine_learning.foundation_models import Model

model = Model(
    model_id=model_id,
    params=parameters,
    credentials=credentials,
    project_id=project_id
)

In [19]:
from ibm_watson_machine_learning.foundation_models.extensions.langchain import WatsonxLLM

flan_ul2_llm = WatsonxLLM(model=model)

## Get Answer to a question on one policy using RAG

The code below abstracts away the retrieval of chunks from ChromaDB; retrieval is examined in the next section. No PromptTemplate is used, and the generated summary is incomplete. The code is for illustrative purposes, and the parameters can be modified to get the desired outcome.

In [20]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(llm=flan_ul2_llm, chain_type="stuff", retriever=docsearch.as_retriever())
query = "mobile policy"
qa.run(query)

"Yes, it helps to promote the responsible and secure use of mobile devices in line with legal and ethical standards. Every employee is expected to comprehend and abide by these guidelines. Regular reviews of the policy ensure its ongoing alignment with evolving technology and security best practices. Internet and Email Policy Our Internet and Email Policy is established to guide the responsible and secure use of these essential tools within our organization. We recognize their significance in daily business operations and the importance of adhering to principles that maintain security, productivity, and legal compliance. Acceptable Use: Company-provided internet and email services are primarily meant for job-related tasks. Limited personal use is allowed during non-work hours, provided it doesn't interfere with work responsibilities. Security: Safeguard your login credentials, avoiding the sharing of passwords. Exercise caution with email attachments and links from unknown sources. Pro

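As a rough illustration of what the "stuff" chain type does, the sketch below concatenates every retrieved chunk into a single prompt and appends the question. The function name `build_stuff_prompt`, the toy chunks, and the prompt wording are hypothetical and are not LangChain's exact template.

```python
def build_stuff_prompt(question, retrieved_chunks):
    """Concatenate all retrieved chunks into one context block, then append the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


toy_chunks = [
    "Mobile Phone Policy: mobile devices are primarily intended for work-related tasks.",
    "Internet and Email Policy: company-provided services are meant for job-related tasks.",
]
prompt = build_stuff_prompt("mobile policy", toy_chunks)
print(prompt)
```

Because every retrieved chunk is placed in the prompt, text from less relevant chunks can surface in the answer, as the Internet and Email Policy text does in the output above. The `MIN_NEW_TOKENS` setting of 130 also forces the model to keep generating after a short answer would have sufficed.
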
## Query on ChromaDB

Running a similarity search directly against ChromaDB with "mobile policy" returns 4 chunks, which is the default number of results (`k=4`) for `similarity_search`. The first two chunks match the query and are the correct ones. The next two are not relevant to the query but were returned because they had some semantic similarity to it. The LLM picked up the correct chunks and generated the summary of the content.

In [21]:
query = "mobile policy"
docs = docsearch.similarity_search(query)
print(len(docs))
for doc in docs:
    print(doc.page_content)
    print('\n\n')


4
4.	Mobile Phone Policy



The Mobile Phone Policy sets forth the standards and expectations governing the appropriate and responsible usage of mobile devices in the organization. The purpose of this policy is to ensure that employees utilize mobile phones in a manner consistent with company values and legal compliance.
Acceptable Use: Mobile devices are primarily intended for work-related tasks. Limited personal usage is allowed, provided it does not disrupt work obligations.
Security: Safeguard your mobile device and access credentials. Exercise caution when downloading apps or clicking links from unfamiliar sources. Promptly report security concerns or suspicious activities related to your mobile device.
Confidentiality: Avoid transmitting sensitive company information via unsecured messaging apps or emails. Be discreet when discussing company matters in public spaces.
Cost Management: Keep personal phone usage separate from company accounts and reimburse the company for any person