<a href="https://colab.research.google.com/github/Squigspear/xCyberLLM/blob/main/Tutorial2/Tutorial_2_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This tutorial showcases **Retrieval-Augmented Generation (RAG)** with an LLM: a document (CCOP) is consulted whenever the model is queried. It is split into two parts: (1) the open-source Llama 2 7B model with 5-bit quantization, and (2) Microsoft Azure OpenAI.
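
To make the idea concrete before loading any model, here is a minimal sketch of the retrieve-then-prompt loop. It is not the pipeline used later in this notebook: word-overlap cosine similarity stands in for a real embedding model, and the passages in `toy_chunks` are hypothetical placeholders, not text from the CCOP.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=1):
    """Place the retrieved chunks in front of the question."""
    context = "\n".join(retrieve(question, chunks, k))
    return f"Context: {context}\nQuestion: {question}\nHelpful answer:"


# Hypothetical stand-in passages (assumption: not taken from the real document)
toy_chunks = [
    "Passwords must be rotated every ninety days.",
    "Backups are stored offsite and tested every quarter.",
    "Visitors must sign in at the front desk.",
]
print(build_prompt("How often are backups tested?", toy_chunks))
```

The LLM only ever sees the prompt that `build_prompt` returns, which is why retrieval quality matters as much as the model itself.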


**Before starting:**

1. **Click on Runtime (top) > Change runtime type > Python 3, T4 GPU**

2. **Click Connect (top right corner)**

**To run:**

**1. Click the play button (left of the cell) or press Ctrl-Enter in each cell to run it**


## Initialise notebook
Runtime: ~7 mins

In [None]:
#%cd /content/drive/MyDrive/Colab Notebooks/Hackathon

# Download the helper notebook from Google Drive
from google_drive_downloader import GoogleDriveDownloader as gdd

# File ID parsed from the file's Google Drive URL
google_fid_id = '1PFXHKUfiDupWV_K_MLmSWBxajTPvj_B4' #'1jAluS8pfwn8_yp-cyED97OdYXl4-vWJ6'
destination = 'dir/dir/utils_tutorial_2_RAG.ipynb'

# If the file is a zip archive, add the kwarg unzip=True
gdd.download_file_from_google_drive(file_id=google_fid_id, dest_path=destination)

In [None]:
%cd /content/dir/dir
%run 'utils_tutorial_2_RAG.ipynb'



---

# Running LLAMA 2 on Colab
(Open-source LLM)

## **Initialise LLM (Llama-2-7b model)**
Optional: Change model and parameters as required




In [None]:
# Make sure the model path is correct for your system!
llm = LlamaCpp(
    n_ctx=4096,                                # context window size in tokens
    model_path="llama-2-7b-chat.Q5_K_M.gguf",  # location of the model (alternative: llama-2-13b-chat.Q4_0.gguf)
    temperature=0.2,                           # sampling temperature
    max_tokens=2000,                           # max. number of tokens to generate
    top_p=0.9,                                 # nucleus sampling threshold
    top_k=30,                                  # sample only from the 30 most likely tokens
    n_gpu_layers=200,                          # number of layers to offload to GPU
    verbose=True,                              # verbose is required to pass to the callback manager
    callback_manager=callback_manager,
    n_batch=200,                               # number of tokens processed in parallel
)
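
To build intuition for `temperature`, `top_k` and `top_p`, the sketch below applies the three filters to a toy next-token distribution. It is a simplified illustration only: the order of the steps is an assumption chosen for clarity, llama.cpp's own sampler may order and implement them differently, and `filter_distribution` and `toy_logits` are hypothetical names.

```python
import math


def filter_distribution(logits, temperature=0.2, top_k=30, top_p=0.9):
    """Return {token: probability} after temperature, top-k and top-p filtering."""
    # Temperature rescales the logits: values below 1 sharpen the distribution
    scaled = {tok: value / temperature for tok, value in logits.items()}

    # Softmax, subtracting the maximum for numerical stability
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # top-k: keep only the k most likely tokens
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalise the survivors so they sum to 1 again
    kept_total = sum(p for _, p in kept)
    return {tok: p / kept_total for tok, p in kept}


# Hypothetical logits for four candidate tokens
toy_logits = {"yes": 2.0, "no": 1.5, "maybe": 0.5, "banana": -1.0}
print(filter_distribution(toy_logits))                   # low temperature
print(filter_distribution(toy_logits, temperature=1.0))  # higher temperature
```

With the notebook's low temperature only the single most likely token survives in this toy case, while at `temperature=1.0` three candidates remain. That is why a low temperature suits question answering over a fixed document.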

## **Customise prompt template**
To edit: Change qa_template accordingly.

In [None]:
qa_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Context: {context}
Question: {question}
Only return the helpful concise answer below and nothing else.
Helpful answer:"""

prompt_template = create_prompt(qa_template)
print(prompt_template)
prompt = PromptTemplate(template=prompt_template,input_variables=['context', 'question'])
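
As a rough picture of what the model receives, the sketch below fills the two placeholders using plain `str.format`. This is a stand-in for the templating step, not LangChain's actual code path, and the context and question strings are hypothetical.

```python
qa_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Context: {context}
Question: {question}
Only return the helpful concise answer below and nothing else.
Helpful answer:"""

# Hypothetical values: in the notebook the context comes from the retrieved
# document chunks and the question from the chat box
filled = qa_template.format(
    context="Backups are stored offsite and tested every quarter.",
    question="How often are backups tested?",
)
print(filled)
```

Whatever `create_prompt` adds around the template is not shown in this notebook, so it is left out here.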

## Add DocumentQA using langchain for RAG
Optional: Change embedding model accordingly

In [None]:
model_name = "sentence-transformers/all-MiniLM-L12-v2"
model_kwargs = {'device': 'cuda'}      # requires a GPU; otherwise change 'cuda' to 'cpu'
encode_kwargs = {'normalize_embeddings': True}
embedding_func = HuggingFaceEmbeddings(
    model_name = model_name,
    model_kwargs = model_kwargs,
    encode_kwargs = encode_kwargs
)
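
The `normalize_embeddings` option scales every embedding to unit length. One consequence, sketched below with hypothetical two-dimensional vectors in place of real embeddings, is that a plain dot product between normalized vectors equals the cosine similarity of the originals.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity computed from the raw vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical toy vectors standing in for sentence embeddings
u = [3.0, 4.0]
v = [4.0, 3.0]
u_hat, v_hat = l2_normalize(u), l2_normalize(v)

# Both values agree up to floating-point rounding
print(dot(u_hat, v_hat), cosine(u, v))
```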

## Launch Gradio UI

To do:
1. Upload CCOP.pdf
2. Type your question in the chatbox
3. To end, click the stop button at the top left of the cell

(Make sure the icon at the top left of the cell shows that it is running.)

If an error occurs, restart the cell by clicking stop, then play.

In [None]:
COUNT = 0
chat_history = []
qa = None

setup_gradio()
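
The helper notebook's code is not shown here, so the following is an assumption about a typical RAG ingestion step rather than a description of `setup_gradio()`. Uploaded text is usually split into overlapping chunks before embedding, so that a sentence cut at a boundary still appears whole in one chunk. `split_text` and its parameters are hypothetical names.

```python
def split_text(text, chunk_size=40, overlap=10):
    """Split text into chunks of at most chunk_size characters,
    where consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Hypothetical 100-character document
sample = "abcdefghij" * 10
pieces = split_text(sample)
print(len(pieces), [len(p) for p in pieces])
```

Larger chunks give the model more context per retrieved passage but leave room for fewer passages inside the context window.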



---

# Using AzureOpenAI as LLM
(Closed-source LLM)

## **Initialise LLM (GPT3.5 / 4)**
Optional: Change model and parameters as required. Run either the GPT-3.5 cell or the GPT-4 cell below.




## GPT3.5

In [None]:
# GPT 3.5
# Setting up API access
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"
os.environ["OPENAI_API_BASE"] = "https://dto-testing.openai.azure.com/"
# Enter the key at runtime instead of hardcoding it in the notebook
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass("Azure OpenAI API key: ")

embeddings = AzureOpenAIEmbeddings(
    deployment="DTO_embed",
    model='text-embedding-ada-002',
    chunk_size=1,  # chunk_size=1 works around a peculiarity of Azure OpenAI
)

llm_OAI = AzureOpenAI(
    temperature=0.1,               # raise the temperature for more varied output
    deployment_name="DTO_demo",    # name of the GPT-3.5 deployment on Azure
    model_name="text-davinci-003",
)
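
The `chunk_size=1` setting can be read as a batch size. Assuming it controls how many texts are sent in one embedding request (as the LangChain parameter is commonly described), the sketch below shows how a list of document chunks would be grouped. `batched` is a hypothetical helper, not a LangChain function.

```python
def batched(texts, chunk_size):
    """Group texts into consecutive batches of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)]


# Hypothetical document chunks waiting to be embedded
toy_texts = ["chunk A", "chunk B", "chunk C"]

print(batched(toy_texts, 1))  # one text per request
print(batched(toy_texts, 2))  # two texts per request where possible
```

With `chunk_size=1` every text becomes its own request, which is slower but stays within a service limit of one input per call.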

## Or GPT4


In [None]:
# GPT 4
# from langchain.chat_models.azure_openai import AzureChatOpenAI

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2023-03-15-preview"
os.environ["OPENAI_API_BASE"] = "https://ccm-3.openai.azure.com/"
# Enter the key at runtime instead of hardcoding it in the notebook
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass("Azure OpenAI API key: ")
os.environ["AZURE_ENDPOINT"] = "https://ccm-3.openai.azure.com/openai"


embeddings = AzureOpenAIEmbeddings(
    deployment="ccm-embedding",
    model='text-embedding-ada-002',
    chunk_size=1,  # chunk_size=1 works around a peculiarity of Azure OpenAI
)

llm_OAI = AzureChatOpenAI(
    temperature=0.1,
    deployment_name="ccm-gpt4",  # name of the GPT-4 deployment on Azure
    model_name="gpt-4",
)

In [None]:
qa_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Context: {context}
Question: {question}
Only return the helpful concise answer below and nothing else.
Helpful answer:"""

prompt = PromptTemplate(template=qa_template,input_variables=['context', 'question'])

## Launch Gradio UI

To do:
1. Upload CCOP.pdf
2. Type your question in the chatbox
3. To end, click the stop button at the top left of the cell

(Make sure the icon at the top left of the cell shows that it is running.)

If an error occurs, restart the cell by clicking stop, then play.

In [None]:
COUNT = 0
chat_history = []
setup_gradio_OAI()