<a href="https://colab.research.google.com/github/afifaniks/triagerX/blob/main/notebook/chatbot_llama2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chatbot using Llama 2

In [1]:
!nvidia-smi

Tue Sep 12 20:39:23 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100 80G...  On   | 00000000:17:00.0 Off |                    0 |
| N/A   56C    P0   282W / 300W |  13047MiB / 80994MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
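import locale
# Workaround for environments (e.g. some Colab sessions) where the reported
# locale is not UTF-8, which can make `!` shell commands fail.
def getpreferredencoding(do_setlocale=True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding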

## Installing Dependencies

In [None]:
!pip install -qU transformers accelerate einops langchain xformers bitsandbytes faiss-gpu sentence_transformers pypdf

## Import Libraries

In [None]:
import torch
import transformers

from langchain.chains import ConversationalRetrievalChain
from langchain.document_loaders import WebBaseLoader, PyPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import HuggingFacePipeline
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from torch import cuda, bfloat16
from transformers import StoppingCriteria, StoppingCriteriaList

## Fetch & Configure Llama-2-7b-chat model from HuggingFace

In [None]:
import os
hf_auth = os.environ['HF_TOKEN']  # read the access token from the environment; never hard-code it

In [None]:
model_id = 'meta-llama/Llama-2-7b-chat-hf'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

# begin initializing HF items, you need an access token
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)

# enable evaluation mode to allow model inference
model.eval()

print(f"Model loaded on {device}")



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



Model loaded on cuda:0
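
As a rough illustration of why 4-bit loading helps, the sketch below estimates the memory needed for the weights alone at different precisions. It assumes roughly 7 billion parameters (inferred from the "7b" in the model name) and ignores quantization constants, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3

# Assumption: ~7e9 parameters, inferred from the "7b" in the model name.
N_PARAMS = 7e9

for label, bits in [("float32", 32), ("bfloat16", 16), ("4-bit NF4", 4)]:
    print(f"{label:>10}: {weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

The 4-bit figure is a quarter of the bfloat16 one, which is the main reason the quantized model fits comfortably on smaller GPUs.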


## Prepare HuggingFace Pipeline with LangChain

In [None]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)



In [None]:
stop_list = ['\nHuman:', '\n```\n']

stop_token_ids = [tokenizer(x)['input_ids'] for x in stop_list]
stop_token_ids = [torch.LongTensor(x).to(device) for x in stop_token_ids]
stop_token_ids

[tensor([    1, 29871,    13, 29950,  7889, 29901], device='cuda:0'),
 tensor([    1, 29871,    13, 28956,    13], device='cuda:0')]

In [None]:
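class StopOnTokens(StoppingCriteria):
    """Stop generation once the output ends with one of the stop sequences."""

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_ids in stop_token_ids:
            # Skip stop sequences longer than the tokens seen so far.
            if input_ids.shape[-1] < len(stop_ids):
                continue
            if torch.eq(input_ids[0][-len(stop_ids):], stop_ids).all():
                return True
        return False

stopping_criteria = StoppingCriteriaList([StopOnTokens()])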
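
The stopping check boils down to a suffix comparison: generation halts when the most recent token IDs equal one of the stop sequences. Here is a minimal pure-Python sketch of that logic, using plain lists instead of tensors. The helper name `ends_with_stop_sequence` and the sample token IDs are illustrative only and are not part of the notebook.

```python
from typing import Sequence

def ends_with_stop_sequence(
    generated_ids: Sequence[int],
    stop_sequences: Sequence[Sequence[int]],
) -> bool:
    """Return True if generated_ids ends with any of the stop sequences."""
    for stop_ids in stop_sequences:
        n = len(stop_ids)
        # Skip empty stop sequences and ones longer than the text so far.
        if 0 < n <= len(generated_ids) and list(generated_ids[-n:]) == list(stop_ids):
            return True
    return False

# Illustrative token IDs only (not real tokenizer output).
stops = [[13, 500, 600], [13, 700]]
print(ends_with_stop_sequence([5, 6, 13, 700], stops))
print(ends_with_stop_sequence([13, 700, 5], stops))
```

A stop sequence only matches if the generated IDs contain exactly those tokens, so any special tokens a tokenizer prepends (such as a beginning-of-sequence ID) become part of what must match.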

In [None]:
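generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # return the prompt together with the generated text
    task='text-generation',
    stopping_criteria=stopping_criteria,  # stop when a stop sequence is produced
    temperature=0.1,  # low value makes sampled output close to deterministic
    max_new_tokens=512,  # upper bound on the number of generated tokens
    repetition_penalty=1.1  # values above 1.0 discourage repeated tokens
)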

llm = HuggingFacePipeline(pipeline=generate_text)

**Upload the PDF file (`ai-ia.pdf`) to the working directory before running the next cell.**

In [None]:
loader = PyPDFLoader("ai-ia.pdf")
documents = loader.load()

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
all_splits = text_splitter.split_documents(documents)
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)
vectorstore = FAISS.from_documents(all_splits, embeddings)
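
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified character-window splitter. It is a sketch only: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph breaks and newlines, which the hypothetical `split_with_overlap` helper below does not do.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=2):
    print(chunk)
```

Each chunk repeats the last two characters of the previous one, which is the role the 20-character overlap plays in the cell above: text that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.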

In [None]:
chain = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), return_source_documents=True)
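
Conceptually, the retriever embeds the question and returns the stored chunks whose vectors are closest to it. The sketch below illustrates that idea with hand-made 3-dimensional vectors and cosine similarity. The vectors, texts, and the `top_k` helper are invented for illustration, and the index type and distance metric that FAISS actually uses may differ.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy "vector store": (text, embedding) pairs with made-up embeddings.
store = [
    ("chunk about data governance", [0.9, 0.1, 0.0]),
    ("chunk about explainability", [0.1, 0.9, 0.0]),
    ("chunk about ethics", [0.0, 0.2, 0.9]),
]
print(top_k([0.8, 0.2, 0.1], store, k=2))
```

The retrieved chunks are what the chain places in the prompt as context before the LLM produces its answer.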

## Test on Custom Document

In [None]:
def query(question: str, chat_history=None):
    # Avoid a mutable default argument; use an empty history when none is given.
    result = chain({"question": question, "chat_history": chat_history or []})
    return result["answer"]
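
For follow-up questions, earlier turns can be passed to the chain through `chat_history`. The following is a minimal sketch of keeping that history as a list of `(question, answer)` tuples. The `ChatSession` class is hypothetical, and `fake_chain` is a stand-in so the example runs without a model. It assumes the chain is callable with a dict and returns a dict containing an `"answer"` key, as in the cell above.

```python
from typing import Callable

class ChatSession:
    """Keeps (question, answer) pairs and passes them along on each call."""

    def __init__(self, chain: Callable[[dict], dict]):
        self.chain = chain
        self.history: list[tuple[str, str]] = []

    def ask(self, question: str) -> str:
        result = self.chain({"question": question, "chat_history": self.history})
        answer = result["answer"]
        # Record the turn so the next question can refer back to it.
        self.history.append((question, answer))
        return answer

# Stand-in for the real chain: reports how many earlier turns it received.
def fake_chain(inputs: dict) -> dict:
    return {"answer": f"{len(inputs['chat_history'])} earlier turn(s) seen"}

session = ChatSession(fake_chain)
print(session.ask("Who is the author?"))
print(session.ask("What is their role?"))
```

With the notebook's objects, this would be used as `ChatSession(chain)`.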

In [None]:
query("Who is Angie Radiskovic?")

' Angie Radiskovic is the Assistant Superintendent and Chief Strategy and Risk Officer at Office of the Superintendent of Financial Institutions.'

In [None]:
query("Summarize Data Governance")

" Data governance is critical for financial institutions considering the sensitive and confidential nature of financial and customer data. It involves ensuring that data is accurate, consistent, safe, and complete, which is crucial for the effective functioning of AI systems. Key aspects of data governance include establishing roles and responsibilities, defining a well-defined risk appetite, and implementing flexible policies that can adapt as an institution's adoption of AI matures. Additionally, regulatory rules should be clear about what types of third-party data can be used by financial institutions and the level of due diligence required in each case."

In [None]:
query("What are the EDGE Principles")

' The EDGE Principles are a framework for developing and deploying artificial intelligence (AI) in a responsible and ethical manner. They were developed by the EDGE Consortium, a group of organizations dedicated to promoting ethical and responsible AI development. The EDGE Principles consist of four key areas:\n\n\n [output continues as a long run of repeated \n; truncated]

In [None]:
query("What levels of explainability might AI systems have?")

" There are different levels of explainability that AI systems could have, depending on the context and purpose of the system. Here are some possible levels of explainability:\n\n1. Local Explainability: This refers to the ability to provide a detailed explanation of how a specific input or feature contributed to a particular output or decision. For example, a model that predicts credit risk might provide a local explanation for why a particular borrower was approved or denied based on their credit history and financial behavior.\n2. Global Explainability: This refers to the ability to provide a high-level overview of how a model works, including the key features and relationships that drive its predictions. For example, a model that predicts stock prices might provide a global explanation of how it uses economic indicators, market trends, and company performance to generate predictions.\n3. Model-Specific Explainability: This refers to the ability to provide a detailed explanation of 