The purpose of this notebook is to explore open-source models such as "facebook/opt-125m" and "neuralmagic/Llama-2-7b-chat-quantized.w8a8". These models are relatively small and can run on my g4dn.2xlarge instance.
In addition, I compare the output of the open-source model with that of OpenAI.
The open-source model is served with vLLM.

In [1]:
!pip install transformers torch -q
!pip install langchain -q
!pip install -U langchain-community -q
!pip install python-dotenv openai -q
!pip3 install pysqlite3-binary -q
!pip install -U sentence-transformers -q

In [2]:
import boto3
import os
def download_file_from_s3(bucket_name, s3_file_key):
    # download files to local environment
    # Create an S3 client
    s3 = boto3.client('s3')
    local_file_path = s3_file_key.split('/')[-1]
    # Download the file from S3
    s3.download_file(bucket_name, s3_file_key, local_file_path)
    print(f"File {s3_file_key} downloaded from {bucket_name} to {local_file_path}")

def delete_file(file_path):
    os.remove(file_path)

In [3]:
import re
import string

_RE_COMBINE_WHITESPACE = re.compile(r"[ ]+", re.ASCII)
_RE_SHORT_LINES = re.compile("^.{1,3}\n", re.MULTILINE)
_RE_MULTILINE_BREAKS = re.compile("\n+", re.MULTILINE)
_RE_PAGE_CHAR = "\x0c"
_RE_LATIN_WHITESPACE_CHAR = re.compile("\xa0", re.ASCII)


# @markdown  - **clean_text** - clean text spaces,non-printable and line breaks
def clean_text(text):
    """Clean text from several white-space and line-breaks"""
    # replace non-breaking spaces (\xa0) with regular spaces
    text = _RE_LATIN_WHITESPACE_CHAR.sub(" ", text)
    # collapse runs of spaces into a single space
    text = _RE_COMBINE_WHITESPACE.sub(" ", text).strip()
    # drop very short lines (1-3 characters)
    text = _RE_SHORT_LINES.sub("\n", text)
    # collapse consecutive line breaks into one
    text = _RE_MULTILINE_BREAKS.sub("\n", text)
    # drop characters that are not in string.printable
    text = "".join([x for x in text if x in string.printable])

    return text.strip()

In [4]:
from langchain.embeddings.base import Embeddings
from sentence_transformers import SentenceTransformer
from typing import List

class SentenceTransformerEmbeddings(Embeddings):
    def __init__(self, model_name: str):
        self.model = SentenceTransformer(model_name)

    def embed_documents(self, documents: List[str]) -> List[List[float]]:
        return [self.model.encode(d).tolist() for d in documents]

    def embed_query(self, query: str) -> List[float]:
        return self.model.encode([query])[0].tolist()



## Load the dataset

In [5]:
# For the test data I use a few (parsed) files from s3://contract-intelligence-data/client-data/AAA/NY State Insurance/06-FRM-AR1/
# These files are of good quality

download_file_from_s3("contract-intelligence-data", "client-data/AAA/NY State Insurance/06-FRM-AR1/FRM-AR117-21-1230-2624_2024_163320/FRM-AR117-21-1230-2624_2024_163320.json")
download_file_from_s3("contract-intelligence-data", "client-data/AAA/NY State Insurance/06-FRM-AR1/FRM-AR117-21-1230-2638_2024_162334/FRM-AR117-21-1230-2638_2024_162334.json")
download_file_from_s3("contract-intelligence-data", "client-data/AAA/NY State Insurance/06-FRM-AR1/FRM-AR117-22-1252-6330_2024_16400/FRM-AR117-22-1252-6330_2024_16400.json")

download_file_from_s3("contract-intelligence-data","client-data/AAA/NY State Insurance/04-RPT-INIT/17-22-1250-8464/17-22-1250-8464.json")
download_file_from_s3("contract-intelligence-data","client-data/dragados/ol-elevated-guideway-and-stations-dmca-redacted-version.json")

File client-data/AAA/NY State Insurance/06-FRM-AR1/FRM-AR117-21-1230-2624_2024_163320/FRM-AR117-21-1230-2624_2024_163320.json downloaded from contract-intelligence-data to FRM-AR117-21-1230-2624_2024_163320.json
File client-data/AAA/NY State Insurance/06-FRM-AR1/FRM-AR117-21-1230-2638_2024_162334/FRM-AR117-21-1230-2638_2024_162334.json downloaded from contract-intelligence-data to FRM-AR117-21-1230-2638_2024_162334.json
File client-data/AAA/NY State Insurance/06-FRM-AR1/FRM-AR117-22-1252-6330_2024_16400/FRM-AR117-22-1252-6330_2024_16400.json downloaded from contract-intelligence-data to FRM-AR117-22-1252-6330_2024_16400.json
File client-data/AAA/NY State Insurance/04-RPT-INIT/17-22-1250-8464/17-22-1250-8464.json downloaded from contract-intelligence-data to 17-22-1250-8464.json
File client-data/dragados/ol-elevated-guideway-and-stations-dmca-redacted-version.json downloaded from contract-intelligence-data to ol-elevated-guideway-and-stations-dmca-redacted-version.json


In [6]:
import json
import glob
from tqdm import tqdm

def read_files(docs_dir: str):
    # recursive=True only has an effect with a "**" pattern, so it is dropped here
    files = glob.glob(os.path.join(docs_dir, "*.json"))
    print(f"Total number of docs: {len(files)}")
    return files

def compose_dataset(docs_dir: str):
    files = read_files(docs_dir)
    print(files)
    # Read & Load the Dataset
    dataset = []
    for file in tqdm(files):
        # data in json format after ocr
        with open(file) as f:
            pdoc = json.load(f)
        dataset.append(pdoc)

    return dataset

In [7]:
dataset = compose_dataset(".")  

Total number of docs: 5
['./ol-elevated-guideway-and-stations-dmca-redacted-version.json', './FRM-AR117-21-1230-2624_2024_163320.json', './17-22-1250-8464.json', './FRM-AR117-21-1230-2638_2024_162334.json', './FRM-AR117-22-1252-6330_2024_16400.json']


100%|██████████| 5/5 [00:00<00:00, 284.27it/s]


In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OpenAIEmbeddings
from langchain.chains.retrieval_qa.base import RetrievalQA
from langchain.schema import Document
from langchain_community.chat_models import ChatOpenAI

import pandas as pd

In [9]:
# The rag part, based on the one in LINT api

DEFAULT_CHUNK_SIZE = 3500  # 1400 when using facebook/opt-125m (had to reduce it to fit that model's context)
DEFAULT_CHUNK_OVERLAP = 500
EMBEDDING_MODEL = "text-embedding-ada-002"  # I still use OpenAI for embeddings
# As a next step, these could be replaced with open-source embeddings
LLM_MODEL_OPENAI = "gpt-3.5-turbo"
vector_db_path = './chroma_db'

SENTENCE_TRANSFORMER_MODEL = "multi-qa-mpnet-base-cos-v1"
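The chunk size matters because the "stuff" chain used later concatenates all `k` retrieved chunks into a single prompt. A rough back-of-the-envelope check is sketched below. It assumes about 4 characters per token (a common rule of thumb for English text, not a measured value for these documents) and a fixed prompt overhead; `estimate_prompt_tokens` and `fits_context` are hypothetical helper names, not part of any library.

```python
def estimate_prompt_tokens(chunk_size: int, k: int,
                           chars_per_token: float = 4.0,
                           overhead_tokens: int = 200) -> int:
    """Rough upper bound on prompt tokens when k chunks are stuffed into one prompt.

    chars_per_token and overhead_tokens are assumptions, not measured values.
    """
    return int(k * chunk_size / chars_per_token) + overhead_tokens


def fits_context(chunk_size: int, k: int, context_tokens: int,
                 max_new_tokens: int = 100) -> bool:
    """True if the estimated prompt plus the generation budget fits the context window."""
    return estimate_prompt_tokens(chunk_size, k) + max_new_tokens <= context_tokens


# facebook/opt-125m has a 2048-token context window; k=4 is the retriever setting used later
print(fits_context(chunk_size=3500, k=4, context_tokens=2048))  # 3500 + 200 + 100 = 3800 tokens
print(fits_context(chunk_size=1400, k=4, context_tokens=2048))  # 1400 + 200 + 100 = 1700 tokens
```

Under these assumptions a chunk size of 3500 overflows the small model's window while 1400 fits, which is consistent with the reduction noted in the comment above.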

In [10]:
from dotenv import load_dotenv, find_dotenv, dotenv_values
import openai
path_to_keys = 'keys.env'
temp = dotenv_values(path_to_keys)
openai_api_key = temp["OPENAI_API_KEY"]

### Let's put the data into Chroma DB

In [11]:
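# Chroma needs a newer SQLite than the system one, so swap in pysqlite3
# before anything imports sqlite3
__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')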

In [12]:
# !pip install hnswlib==0.7.0 -q
# !pip install chroma-hnswlib==0.7.3 -q
# !pip uninstall hnswlib chroma-hnswlib -y

In [13]:
%pip install chromadb==0.5 tiktoken -q

Note: you may need to restart the kernel to use updated packages.


In [14]:
from langchain.vectorstores import Chroma

def put_in_Chroma(doc_pages, doc_name, embedding_type="openai"):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=DEFAULT_CHUNK_SIZE, chunk_overlap=DEFAULT_CHUNK_OVERLAP)
    doc = [
        Document(page_content=clean_text(page), metadata={"page": i, "doc_name": doc_name})
        for i, page in enumerate(doc_pages)
    ]
    chunks = text_splitter.split_documents(doc)
    print('chunks: ', len(chunks))

    # Choose the embedding function
    if embedding_type == "openai":
        print("Using OpenAI embeddings")
        embeddings = OpenAIEmbeddings(model=EMBEDDING_MODEL, openai_api_key=openai_api_key)
    else:
        print("Using Sentence Transformer embeddings")
        embeddings = SentenceTransformerEmbeddings(SENTENCE_TRANSFORMER_MODEL)

    # Embed the chunks and insert them into the vector database.
    # IDs are prefixed with doc_name so chunks from different documents cannot collide.
    db = Chroma.from_documents(chunks,
                            embedding=embeddings,
                            ids=[f"{doc_name}-{i}" for i in range(len(chunks))],
                            persist_directory=vector_db_path)

    # Save vector database as persistent files in the output folder
    return db
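
To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately naive sliding-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks), so real chunk boundaries will differ. `sliding_window_chunks` is a hypothetical name used only for illustration.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each printed window starts with the last four characters of the previous one, which is the role `DEFAULT_CHUNK_OVERLAP` plays at a larger scale.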

In [15]:
# file_name = 'FRM-AR117-21-1230-2624_2024_163320'
# file_name = 'FRM-AR117-21-1230-2638_2024_162334'
# file_name = 'FRM-AR117-22-1252-6330_2024_16400'
# file_name = '17-22-1250-8464'
file_name = 'ol-elevated-guideway-and-stations-dmca-redacted-version'

In [16]:
%%time
for i in dataset:
    if i['name'] == file_name:
        doc_pages = i['text']
        break
print('pages: ', len(doc_pages))
db = put_in_Chroma(doc_pages, doc_name=file_name, embedding_type='transformer')

pages:  1344


chunks:  1448
Using Sentence Transformer embeddings




CPU times: user 1min 21s, sys: 3.51 s, total: 1min 24s
Wall time: 58.7 s


In [17]:
def get_gpt_llm():
    chat_params = {
        "model": LLM_MODEL_OPENAI, # Bigger context window
        "openai_api_key": openai_api_key,
        "temperature": 0.000001, 
    }
    llm = ChatOpenAI(**chat_params)
    return llm

def qa_retriever_openai(query, vector_db_path, file_id, k=4, embeddings_type="openai"):
    if embeddings_type == "openai":
        print("Using OpenAI Embeddings")
        embeddings = OpenAIEmbeddings(model=EMBEDDING_MODEL, openai_api_key=openai_api_key)
    else:
        print("Using Sentence Transformer Embeddings")
        embeddings = SentenceTransformerEmbeddings(SENTENCE_TRANSFORMER_MODEL)
    vectordb = Chroma(persist_directory=vector_db_path, embedding_function=embeddings)

    retriever = vectordb.as_retriever(search_kwargs={"k": k, "filter": {"doc_name": file_id}})

    qa = RetrievalQA.from_chain_type(llm=get_gpt_llm(), chain_type="stuff", 
                                    retriever=retriever, return_source_documents=True)
    res = qa({"query": query, "k": k})
    return res, retriever

In [18]:
# question = "Who are the parties?"
question = "When is this agreement entered into?"

# question = "What type of form is that?"

answer, retriever = qa_retriever_openai(
    question,
    vector_db_path="/home/ubuntu/yulia/vllm-exploratory/llm/xplore/chroma_db",
    file_id=file_name,
    k=4,
    embeddings_type="transformer",
)

Using Sentence Transformer Embeddings




In [19]:
answer

{'query': 'When is this agreement entered into?',
 'k': 4,
 'result': 'The agreement is entered into on the DMCA Effective Date.',
 'source_documents': [Document(metadata={'doc_name': 'ol-elevated-guideway-and-stations-dmca-redacted-version', 'page': 157}, page_content='In the event this Agreement is terminated under this Section 33.2, upon such termination, Project\nCo shall be released from liability that may arise in relation to the use by Contracting Authority\nor by a third party from and after the date of termination of this Agreement of the design\nConfidential Page 158\nKings Printer for Ontario Copyright 2024 This document must not be copied or reproduced in any manner without the\nwritten permission of the Ontario Infrastructure and Lands Corporation.'),
  Document(metadata={'doc_name': 'ol-elevated-guideway-and-stations-dmca-redacted-version', 'page': 308}, page_content='the Effective Date, subject to and in accordance with the provisions of this Assignment of\nSubcontract.\

In [20]:
print('Openai answer: ', answer['result'])

Openai answer:  The agreement is entered into on the DMCA Effective Date.


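The model was served from a terminal with vLLM. The command below is the one used for the Llama-2 model; the `MODEL` name in the next cell has to be the one the server is actually serving:
`vllm serve neuralmagic/Llama-2-7b-chat-quantized.w8a8 --chat-template templates/template_chatml.jinja`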
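
Before pointing LangChain at the server, it can help to confirm that it is up and which model it exposes. vLLM's OpenAI-compatible server answers `GET /v1/models`. The sketch below uses only the standard library and assumes the default port 8000; the helper names `parse_model_ids` and `list_served_models` are my own, not part of vLLM.

```python
import json
import urllib.request
from typing import List


def parse_model_ids(payload: dict) -> List[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(base_url: str = "http://localhost:8000/v1",
                       timeout: float = 5.0) -> List[str]:
    """Query the running server and return the ids of the models it serves."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


# Offline illustration with a hand-written payload in the OpenAI list format
example = {
    "object": "list",
    "data": [{"id": "neuralmagic/Llama-2-7b-chat-quantized.w8a8", "object": "model"}],
}
print(parse_model_ids(example))
```

With the server running, `list_served_models()` returns the ids that can be passed as the `model` argument of the client.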

In [27]:
inference_server_url = "http://localhost:8000/v1"

# MODEL = "facebook/opt-125m"
# MODEL = "neuralmagic/Llama-2-7b-chat-quantized.w8a8"
MODEL = "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8"
    
llm = ChatOpenAI(
    model=MODEL,
    openai_api_key="EMPTY",
    openai_api_base=inference_server_url,
    max_tokens=100,
    temperature=0,
)

In [28]:
def qa_retriever_llama(query, vector_db_path, file_id, k=4):
    # NOTE: query embeddings must come from the same model that was used to build the index
    embeddings = OpenAIEmbeddings(model=EMBEDDING_MODEL, openai_api_key=openai_api_key)
    vectordb = Chroma(persist_directory=vector_db_path, embedding_function=embeddings)

    retriever = vectordb.as_retriever(search_kwargs={"k": k, "filter": {"doc_name": file_id}})

    qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", 
                                    retriever=retriever, return_source_documents=True)
    res = qa({"query": query})
    return res, retriever

In [36]:
%%time
# question = "Who are the parties?"
question = "When is this agreement entered into?"

answer_llama, retriever = qa_retriever_llama(question, vector_db_path="/home/ubuntu/yulia/vllm-exploratory/llm/xplore/chroma_db", file_id=file_name, k=4)

CPU times: user 72.6 ms, sys: 11.5 ms, total: 84.1 ms
Wall time: 2.65 s


In [37]:
print(answer_llama)
print(answer_llama['result'])

{'query': 'When is this agreement entered into?', 'result': "I don't know the date when this agreement was entered into. The text provided does not specify the date of the agreement.", 'source_documents': [Document(metadata={'doc_name': 'ol-elevated-guideway-and-stations-dmca-redacted-version', 'page': 196}, page_content='Ontario Line Elevated Guideway and Stations Development and Master Construction Agreement\nRedacted Version\n49.11\n49.12\n49.13\n49.14\n49.15\n49.16\nCumulative Remedies\nExcept as otherwise set forth in this Agreement or any Target Price Agreement then in effect, the\nrights, powers and remedies of each Party set forth in this Agreement and any Target Price\nAgreement then in effect are cumulative and are in addition to and without prejudice to any other\nright, power or remedy that may be available to such Party under this Agreement or any Target\nPrice Agreement then in effect.\nFurther Assurance\nEach Party shall do all things, from time to time, and execute all 
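
Printing the whole result dictionary is hard to read. A small helper that lists only the document name, page, and a short preview of each retrieved chunk makes it easier to see what the retriever returned. `summarize_sources` is a hypothetical name; the sketch assumes only that each source has a `metadata` dict and a `page_content` string, as the `Document` objects in the output above do.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def summarize_sources(source_documents: Iterable[Any],
                      preview_chars: int = 60) -> List[Tuple[Any, Any, str]]:
    """Return (doc_name, page, preview) for each retrieved chunk."""
    rows = []
    for doc in source_documents:
        # collapse whitespace and line breaks so the preview fits on one line
        preview = " ".join(doc.page_content.split())[:preview_chars]
        rows.append((doc.metadata.get("doc_name"), doc.metadata.get("page"), preview))
    return rows


# Stand-in objects so the sketch runs without LangChain; in the notebook,
# pass answer_llama["source_documents"] instead.
fake_sources = [
    SimpleNamespace(metadata={"doc_name": "doc-a", "page": 3},
                    page_content="Cumulative\nRemedies  apply"),
]
for row in summarize_sources(fake_sources):
    print(row)
```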

In [61]:
text = '\n--\n'.join([i.page_content for i in answer['source_documents']])
print(text)

10 04 21
O Constant Intermittent headaches
Dffontal area.
CiNervousness 1] Anxiety Fears Ci Irritability 0 Tense
kw PREVIOUS MEDIC, GICAL HISTORY
Negative ODMOHTNOCAD Q0AC Other
Patient states he OI she D ha id not have a MVA in the past.
Patient Cdenies [7 admits iohave an indwelling metal in body. +
Pregnant: C] Yes NoZIN/A = LMP:_/_/ .
. : HAMILY sD SOCIAL MISTORY :
ONoncontributory Smoking: Hes (No Aleohel: Ges No
other: :
, MIEDICATION:
EON Medication OTylenol uuprofen
i Othe: - - .
ALLERGIES:
! fo Allergies DAllergic to.
TLYSICAL EXAMINATION REVEALID
BP__/__ Pulse BPM GtReguler 0 Irregular RR,
PAfebrile O Febrile
. SICAL UXAM
. G P|
C.walf-developed and well nourished2O year old M1 OE.AAOX, normocephelie, no
neck masses, no bruits, ear canals are clear, no oropharyngeal erythema, Heart with
normal $2 regular sounds. Lungs are clear to auscultation bilaterally, J Abdomen with
good bowel sounds without guarding, rigidity or rebound, no clubbing or cyanosis
noticed on the extremitie

In [62]:
from langchain_core.prompts.prompt import PromptTemplate

In [50]:
llm = ChatOpenAI(
    model=MODEL,
    openai_api_key="EMPTY",
    openai_api_base=inference_server_url,
    max_tokens=200,
    temperature=0,
)

In [70]:
prompt = """You are an AI assistant. Use the following context to answer the question. If you don't know the answer, say you don't know.
        Context: {context}
        Question: {question}
        Be concise and short in your response.
"""

# context = text
question = "Who are the parties?"
# question = "Where did the accident occur?"
# question = "What is the date of the accident?"
# question = "Was the denial of claim based on late notice to the carrier?"
# question = "Who is the insurer?"
# question = "What type of form is that?"

# file_name = 'FRM-AR117-22-1252-6330_2024_16400'

vector_db_path = "/home/ubuntu/yulia/vllm-exploratory/llm/xplore/chroma_db"
my_prompt = PromptTemplate(template=prompt, input_variables=["context", "question"])
embeddings = OpenAIEmbeddings(model=EMBEDDING_MODEL, openai_api_key=openai_api_key)
vectordb = Chroma(persist_directory=vector_db_path, embedding_function=embeddings)
retriever = vectordb.as_retriever(search_kwargs={"k": 4, "filter": {"doc_name": file_name}})

In [71]:
qa = RetrievalQA.from_chain_type(llm=llm, 
                                chain_type="stuff", 
                                retriever=retriever, 
                                return_source_documents=True,
                                chain_type_kwargs={"prompt": my_prompt})

In [72]:
%%time
llama_answer = qa.invoke(question)

CPU times: user 21 ms, sys: 525 μs, total: 21.6 ms
Wall time: 1.43 s


In [73]:
print(llama_answer)
print(llama_answer['result'])

{'query': 'Who are the parties?', 'result': "tool.call('extract_party')", 'source_documents': [Document(metadata={'doc_name': '17-22-1250-8464', 'page': 5}, page_content='10 04 21\nKNEES pais BoM PATIENTS ROM RIGHT LEFT\nFlexion 130 OFrom IUROM, . -\nExtension 18 _ OFrom DNROM, . 2\nOcCrepitus is appreciated Cright Oleft\nC1Bulge/ Balloting sign of patella Dis Llis not present on Ciight Cleft\nCiAnterior draws sign is Lipositive Dnegative on Lhright Deft\nOLachmen test is positive Dhegative on Diright Met\n Varus stress detecting mel is Lipositive Dnegative\nOValgus stress test detecting LCL is Upositive Dnegative\nANKLES NO} ROM PATIENTSROM RIGHT LEFT\nDorsi Flexion 20 OFrom (HROM__\nPlanter Extension 45. OFrom ROM ____\nInversion 30 DFrom (NROM_____\nEversion 20 UFromCWROM____\nCicrepitus is appreciated ORight Let\nUswelling/Edma is appreciated ORight [Left OMedial [Lateral Malleoli\nCRANINAL NERVES EXAMINATION\nDOlfaction is intact, Pupils are equal and reactive to light and accommo

In [67]:
my_prompt = PromptTemplate(template=prompt, input_variables=["context", "question"])
embeddings = OpenAIEmbeddings(model=EMBEDDING_MODEL, openai_api_key=openai_api_key)
vectordb = Chroma(persist_directory=vector_db_path, embedding_function=embeddings)
retriever = vectordb.as_retriever(search_kwargs={"k": 4, "filter": {"doc_name": file_name}})
qa = RetrievalQA.from_chain_type(llm=get_gpt_llm(), 
                                chain_type="stuff", 
                                retriever=retriever, 
                                return_source_documents=True,
                                chain_type_kwargs={"prompt": my_prompt})

In [68]:
%%time
openai_answer = qa.invoke(question)

CPU times: user 24.2 ms, sys: 440 μs, total: 24.6 ms
Wall time: 1.14 s


In [69]:
print(openai_answer)
print(openai_answer['result'])

{'query': 'Who are the parties?', 'result': 'The parties involved are the patient, the healthcare provider, and possibly emergency medical personnel.', 'source_documents': [Document(metadata={'doc_name': '17-22-1250-8464', 'page': 5}, page_content='10 04 21\nKNEES pais BoM PATIENTS ROM RIGHT LEFT\nFlexion 130 OFrom IUROM, . -\nExtension 18 _ OFrom DNROM, . 2\nOcCrepitus is appreciated Cright Oleft\nC1Bulge/ Balloting sign of patella Dis Llis not present on Ciight Cleft\nCiAnterior draws sign is Lipositive Dnegative on Lhright Deft\nOLachmen test is positive Dhegative on Diright Met\n Varus stress detecting mel is Lipositive Dnegative\nOValgus stress test detecting LCL is Upositive Dnegative\nANKLES NO} ROM PATIENTSROM RIGHT LEFT\nDorsi Flexion 20 OFrom (HROM__\nPlanter Extension 45. OFrom ROM ____\nInversion 30 DFrom (NROM_____\nEversion 20 UFromCWROM____\nCicrepitus is appreciated ORight Let\nUswelling/Edma is appreciated ORight [Left OMedial [Lateral Malleoli\nCRANINAL NERVES EXAMINA
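
To put the two answers next to each other, a small formatter is enough. `compare_answers` is a hypothetical helper; it expects result dictionaries shaped like the ones returned by `qa.invoke` and reads only their `'result'` key.

```python
from typing import Dict


def compare_answers(results: Dict[str, dict]) -> str:
    """Format the 'result' field of several QA outputs, one labelled line per model."""
    lines = []
    for label, res in results.items():
        lines.append(f"{label:>8}: {res['result']}")
    return "\n".join(lines)


# Values copied from the outputs shown above
report = compare_answers({
    "llama": {"query": "Who are the parties?", "result": "tool.call('extract_party')"},
    "openai": {"query": "Who are the parties?",
               "result": "The parties involved are the patient, the healthcare provider, "
                         "and possibly emergency medical personnel."},
})
print(report)
```

In the notebook this would be called as `compare_answers({"llama": llama_answer, "openai": openai_answer})`.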

In [None]:
db._collection.get(include=["metadatas","documents"])

In [133]:
import torch
torch.cuda.empty_cache()

--------

In [None]:
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8"
number_gpus = 1
max_model_len = 8192

sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)

tokenizer = AutoTokenizer.from_pretrained(model_id)

llm = LLM(model=model_id, tensor_parallel_size=number_gpus, max_model_len=max_model_len, dtype=torch.float16)

In [49]:
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "write a poem about waterlilies"},
]

prompts = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
outputs = llm.generate(prompts, sampling_params)

generated_text = outputs[0].outputs[0].text
print(generated_text)

Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.80s/it, est. speed input: 4.75 toks/s, output: 20.85 toks/s]

Arrr, listen close to me rhyme,

O' the waterlilies, a wondrous sight,
Growin' on the surface, shinin' bright.
Their petals like silk, a gentle hue,
A treasure to behold, for me and you.

Their stems so slender, their leaves so grand,
A haven for fish, in this watery land.
The sun shines down upon 'em, and they thrive,
A pirate's delight, where the waters survive.

Their beauty is a sight, to make ye smile,
A treasure to see, in this pirate's isle.
So let's raise our cups, and give a hearty cheer,
For the waterlilies, that grow so clear.

So hoist the sails, me hearties, and set sail,
For the waters of wonder, where these flowers prevail.
And when ye find yerself, in this watery nest,
Just remember the waterlilies, and take a pirate's rest.

Yer matey, I hope ye enjoyed me poem,
About the waterlilies, in this pirate's gloam.
So keep yer wits about ye, and yer spirits high,
And remember the waterlilies, as ye sail by!



