<a href="https://colab.research.google.com/github/dcnguyen060899/RAG-LLM/blob/main/Llama_7B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Documentation for research, design and implementation

This report provides an in-depth analysis of the code for a Streamlit-based web application, focusing on its integration with language models and document retrieval systems. The application's primary objective is to offer a sophisticated interface for uploading, processing, and querying PDF documents using the LLaMa 2 7B language model. It combines Streamlit for the user interface, Hugging Face Transformers for the model, and LlamaIndex (with LangChain embeddings) for document loading, indexing, and retrieval.

### 1. Overview of the Application

The application is designed as a web interface using Streamlit, a popular framework for building data applications. It allows users to upload PDF documents, which are then processed and made searchable through natural language queries. The core of the application revolves around the integration of a language model, specifically the LLaMa 2 7B model, for processing and responding to user queries.

### 2. Key Components and Their Roles

#### 2.1 Streamlit Interface
Streamlit is utilized to create a user-friendly web interface. It handles file uploads, displays query results, and manages user interactions.

#### 2.2 Transformers and LLaMa 2 7B Model
The application uses the `transformers` library to load the chat variant of LLaMa 2 7B (`meta-llama/Llama-2-7b-chat-hf`) in 8-bit precision. This model generates the responses to user queries, drawing on text retrieved from the uploaded documents.

#### 2.3 Document Handling and Indexing
The application reads uploaded PDF documents with LlamaIndex's `SimpleDirectoryReader` and stores their embedded chunks in a `VectorStoreIndex`, which makes them searchable. This is what lets the model answer questions about user-uploaded documents.

### 3. Detailed Code Analysis

#### 3.1 Import Statements
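The code begins by importing the necessary libraries and modules:

- Streamlit for the interface.
- Transformers and PyTorch for the language model.
- PyPDF for PDF text extraction.
- LlamaIndex for the prompt wrapper, LLM wrapper, service context, and vector index.
- LangChain's `HuggingFaceEmbeddings` for embedding document chunks.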

#### 3.2 Tokenizer and Model Initialization
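The `get_tokenizer_model` function loads the tokenizer and model for `meta-llama/Llama-2-7b-chat-hf` and caches the downloaded weights on Google Drive. The model is loaded with the following settings:

- 8-bit precision (`load_in_8bit=True`).
- `torch.float16` as the data type.
- Dynamic RoPE scaling with a factor of 2.

The function is decorated with `@st.cache_resource`, so Streamlit loads the model only once instead of on every script rerun.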
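The notebook passes `torch_dtype=torch.float16` and `load_in_8bit=True` when loading the model. A back-of-the-envelope sketch shows why 8-bit loading matters on a Colab GPU: weight memory is roughly the parameter count times the bytes per parameter. The figure of about 6.7 billion parameters for Llama 2 7B is approximate. The estimate also ignores activations, the KV cache, and quantization overhead, so treat the numbers as lower bounds.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

# Approximate parameter count for Llama 2 7B (assumption, rounded)
LLAMA2_7B_PARAMS = 6.7e9

fp16 = weight_memory_gib(LLAMA2_7B_PARAMS, 2)  # float16: 2 bytes per weight
int8 = weight_memory_gib(LLAMA2_7B_PARAMS, 1)  # 8-bit: about 1 byte per weight

print(f"float16 weights: ~{fp16:.1f} GiB")
print(f"8-bit weights:   ~{int8:.1f} GiB")
```

This prints about 12.5 GiB for float16 and 6.2 GiB for 8-bit. Halving the weight footprint leaves more room for generation on a single GPU.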

#### 3.3 Streamlit UI Components
The code then sets up two Streamlit UI elements: a text area where the user can enter a system prompt, and a **Request** button that applies it.

#### 3.4 LLaMa Model and Query Wrapper
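A `HuggingFaceLLM` object wraps the loaded model and tokenizer. It uses a 4096-token context window and allows up to 256 new tokens per response. Its query wrapper, `SimpleInputPrompt("{query_str} [/INST]")`, appends the closing `[/INST]` marker of the Llama 2 chat format to each query.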
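The query wrapper `"{query_str} [/INST]"` supplies only the closing half of the Llama 2 chat template. The following minimal sketch shows how a full prompt could be assembled. It rests on two assumptions:

- The wrapper does plain `{query_str}` substitution, like the notebook's `SimpleInputPrompt`.
- A non-empty system prompt is prepended with a space in front of the wrapped query. This is how LlamaIndex's `HuggingFaceLLM` appears to combine them in the versions this notebook targets.

`format_system_prompt` and `build_prompt` are hypothetical helpers. The `[INST]` and `<<SYS>>` markers come from Llama 2's chat format.

```python
QUERY_WRAPPER = "{query_str} [/INST]"  # same template string as the notebook

def format_system_prompt(text: str) -> str:
    """Hypothetical helper: wrap a system prompt in Llama 2's opening markers."""
    return f"[INST] <<SYS>>\n{text}\n<</SYS>>\n"

def build_prompt(system_prompt: str, query: str) -> str:
    """Fill the query wrapper, then prepend the system prompt if one is set."""
    full_prompt = QUERY_WRAPPER.format(query_str=query)
    if system_prompt:
        full_prompt = f"{system_prompt} {full_prompt}"
    return full_prompt

# With no system prompt, nothing opens the [INST] turn
print(build_prompt("", "Who has Python experience?"))
# With a wrapped system prompt, the turn is complete: [INST] ... [/INST]
print(build_prompt(format_system_prompt("You are a hiring assistant."),
                   "Who has Python experience?"))
```

The first call yields `Who has Python experience? [/INST]`. That is an unbalanced template, which is worth keeping in mind when the system prompt is left empty.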

#### 3.5 System Prompt Update Function
When the **Request** button is pressed, the `update_system_prompt` function replaces the LLM's system prompt with the text the user entered. This lets the user steer how the model responds without changing the code.

#### 3.6 Embeddings and Service Context
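The application embeds document chunks with the `all-MiniLM-L6-v2` sentence-transformers model. The model is loaded through LangChain's `HuggingFaceEmbeddings` and wrapped in LlamaIndex's `LangchainEmbedding`. A `ServiceContext` bundles this embedding model, the LLM, and a chunk size of 1024. Calling `set_global_service_context` then makes it the default for all later indexing and querying.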
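`chunk_size=1024` controls how long each indexed piece of a document is before it is embedded. LlamaIndex's own splitter counts tokenizer tokens and prefers sentence boundaries. The sketch below is a deliberately simplified stand-in:

- Whitespace-separated words are treated as tokens.
- A hypothetical `overlap` parameter makes text cut at a boundary also appear at the start of the next chunk.

```python
from typing import List

def chunk_words(text: str, chunk_size: int = 1024, overlap: int = 20) -> List[str]:
    """Split text into windows of at most chunk_size words, sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks

resume = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(resume, chunk_size=4, overlap=1):
    print(chunk)
```

With ten words, a window of 4, and an overlap of 1, this produces three chunks. Each chunk starts with the last word of the previous one.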

#### 3.7 File Upload and Document Indexing
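Users can upload one or more PDF files through `st.file_uploader`. Pressing **Upload** writes each file to an upload directory. `SimpleDirectoryReader` then loads the files, and `VectorStoreIndex.from_documents` embeds and indexes them for querying.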
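The upload step only has to write each file's bytes into a folder that the document reader later scans. The stdlib-only sketch below rests on two assumptions:

- Uploads arrive as `(file_name, bytes)` pairs. Streamlit's `UploadedFile` exposes `.name` and `.getbuffer()` instead.
- The folder is dedicated to uploads, so nothing else gets indexed.

`save_uploads` and `list_pdfs` are hypothetical helpers. `os.path.basename` drops any directory components a client might put in a file name.

```python
import os
import tempfile
from typing import Iterable, List, Tuple

def save_uploads(files: Iterable[Tuple[str, bytes]], directory: str) -> List[str]:
    """Write each (name, data) pair into directory and return the saved paths."""
    os.makedirs(directory, exist_ok=True)
    saved = []
    for name, data in files:
        path = os.path.join(directory, os.path.basename(name))
        with open(path, "wb") as f:
            f.write(data)
        saved.append(path)
    return saved

def list_pdfs(directory: str) -> List[str]:
    """Return the PDF file names in directory, sorted, ignoring other files."""
    return sorted(n for n in os.listdir(directory) if n.lower().endswith(".pdf"))

with tempfile.TemporaryDirectory() as tmp:
    upload_dir = os.path.join(tmp, "uploads")
    saved = save_uploads(
        [("alice.pdf", b"%PDF-1.4"), ("../bob.pdf", b"%PDF-1.4"), ("notes.txt", b"skip")],
        upload_dir,
    )
    found = list_pdfs(upload_dir)
    print(found)  # ['alice.pdf', 'bob.pdf']
```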

#### 3.8 Streamlit Chat Interface
The chat interface is built from `st.chat_input` and `st.chat_message`. The conversation history is kept in `st.session_state.messages`, so earlier messages are redrawn each time Streamlit reruns the script.

#### 3.9 Query Handling
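Each user query is sent to a query engine created with `index.as_query_engine(streaming=True, similarity_top_k=1)`. The engine retrieves the single most similar document chunk and passes it to the LLaMa model together with the query. The model then generates an answer, which is the retrieval-augmented part of the system.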
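With `similarity_top_k=1`, only the single chunk whose embedding is most similar to the query embedding reaches the LLM. LlamaIndex's in-memory vector store ranks chunks by cosine similarity by default. The sketch below reproduces that ranking in plain Python. Made-up three-dimensional vectors stand in for the 384-dimensional `all-MiniLM-L6-v2` embeddings, and `top_k` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: Sequence[float], chunks: Dict[str, Sequence[float]],
          k: int = 1) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = [(cid, cosine(query, vec)) for cid, vec in chunks.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Toy embeddings (assumption): made-up vectors, not real model output
chunks = {
    "resume_a_chunk_0": [0.9, 0.1, 0.0],
    "resume_b_chunk_0": [0.1, 0.8, 0.3],
    "resume_c_chunk_0": [0.7, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]
print(top_k(query, chunks, k=1))
```

Only `resume_a_chunk_0` is returned here. A larger `similarity_top_k` would let one answer draw on several candidates' resumes instead of a single chunk.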

### 4. Implementation Challenges and Strategies

#### 4.1 Integration of Diverse Technologies
Combining Streamlit, Transformers, and LlamaIndex's indexing components was a significant challenge. The solution was to modularize the code and carefully manage dependencies.

#### 4.2 Efficient Document Handling
Efficiently processing and indexing PDF documents required careful planning. PyPDF handles text extraction, and LlamaIndex's vector store index provides embedding-based retrieval over the resulting chunks.

#### 4.3 User Interface Design
Creating an intuitive and responsive user interface with Streamlit was essential. The strategy involved iterative design and user feedback.

### 5. User-Driven Data Upload and Custom Retrieval

#### 5.1 Purpose and Functionality
In addition to querying general PDF documents, the application lets users upload their own data, such as resumes or candidate profiles. This feature is aimed at hiring managers and recruitment teams. They can upload a set of resumes and then use the application to find the most suitable candidates for a specific job description and set of hiring guidelines.

#### 5.2 Implementation Strategy
The application employs a two-step process to facilitate this feature:

1. **Data Upload and Processing**: Users upload documents (e.g., resumes) via the Streamlit interface. The uploaded files are processed like any other PDF. Their text is extracted, split into chunks, and embedded, and the chunks are stored in the application's vector index so they can be searched.

2. **Query-Based Retrieval-Augmented Generation (RAG)**: The core feature of the application is RAG over the uploaded documents. It works in two stages:
   - **Retrieval**: When a hiring manager enters job requirements or other criteria, the query is embedded with the same `all-MiniLM-L6-v2` model and compared against the indexed resume chunks. The most similar chunk is retrieved.
   - **Generation**: The LLaMa 2 7B model reads the query together with the retrieved text and generates a response suggesting which candidates fit best.
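After the best-matching chunks are retrieved, the query engine inserts them into a question-answering prompt and sends that prompt to LLaMa 2. The following minimal sketch shows that "stuffing" step. The template wording is hypothetical and is not LlamaIndex's built-in text-QA template, and `build_rag_prompt` is an illustrative name.

```python
from typing import List

# Hypothetical template; LlamaIndex ships its own default wording
RAG_TEMPLATE = (
    "Context information from uploaded resumes is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Using only this context, answer the query.\n"
    "Query: {query}\n"
    "Answer: "
)

def build_rag_prompt(query: str, retrieved_chunks: List[str]) -> str:
    """Join the retrieved chunk texts and fill the template with context and query."""
    context = "\n\n".join(retrieved_chunks)
    return RAG_TEMPLATE.format(context=context, query=query)

prompt = build_rag_prompt(
    "Which candidate fits a senior data engineer role?",
    ["Alice: 6 years building Spark pipelines.", "Bob: 2 years of front-end work."],
)
print(prompt)
```

Everything in this string, plus the system prompt and the 256 generated tokens, has to fit within the 4096-token `context_window` configured for `HuggingFaceLLM`.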

#### 5.3 Enhancing Candidate Selection
The application's RAG system can streamline the hiring process. Automating the initial screening of candidates against specific criteria reduces the time and effort spent on candidate selection. Retrieval compares semantic embeddings rather than exact keywords, so it can match a job description to relevant qualifications even when the wording differs. The language model then works from that retrieved text when it recommends candidates.

#### 5.4 Challenges and Solutions
- **Data Privacy and Security**: Handling personal data such as resumes necessitates stringent data privacy measures. The application must be designed with robust security protocols to ensure user data protection.
- **Accurate Information Retrieval**: The success of this feature depends on the model's ability to accurately interpret and retrieve relevant information from a diverse set of documents. Continuous testing and fine-tuning of the model are essential to improve accuracy.
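One possible privacy measure, not present in the notebook, is to strip obvious contact details from resume text before it is embedded and stored. The rough sketch below uses regular expressions. The patterns are simplistic assumptions that will miss many formats and may over-match, so this is an illustration rather than a compliance tool. `redact_contacts` is a hypothetical helper.

```python
import re

# E-mail: word characters, dots, plus signs or dashes, then @ and a dotted domain
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Phone (rough heuristic): optional +, a digit, 7 or more digits/spaces/parentheses/dashes, a final digit
PHONE_RE = re.compile(r"\+?\d[\d ()-]{7,}\d")

def redact_contacts(text: str) -> str:
    """Replace e-mail addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)

sample = "Jane Doe, jane.doe@example.com, +1 (555) 123-4567. 5 years of SQL."
print(redact_contacts(sample))  # Jane Doe, [EMAIL], [PHONE]. 5 years of SQL.
```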

#### 5.5 User Experience
To maximize user satisfaction, the application provides a clear and intuitive interface for uploading documents and entering queries. The current code does not yet collect user feedback. Adding such a mechanism would make it possible to refine the system based on user interactions and improve the effectiveness of candidate selection.

### 6. Conclusion
This extension of the application showcases its adaptability and potential in real-world scenarios like recruitment. By enabling users to upload their data and utilizing a sophisticated RAG system, the application not only serves as a query-response tool but also becomes a powerful aid in decision-making processes like hiring. The integration of user-uploaded data with the LLaMa 2 7B model's retrieval and generation capabilities marks a significant advancement in the application's utility and effectiveness.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!pip install pyngrok==4.1.1



In [None]:
!pip install transformers



In [None]:
!pip install langchain einops accelerate bitsandbytes scipy xformers sentencepiece llama-index llama_hub sentence-transformers pypdf streamlit transformers


In [None]:
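import os

# Define variable to hold llama2 weights naming
name = "meta-llama/Llama-2-7b-chat-hf"
# Read the Hugging Face access token from the environment instead of hardcoding it
# (HF_TOKEN is an assumed variable name; set it first, e.g. os.environ["HF_TOKEN"] = "...")
auth_token = os.environ.get("HF_TOKEN")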

In [None]:
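from transformers import AutoTokenizer

# Pre-download the tokenizer into the Drive cache (uses name and auth_token from the previous cell)
tokenizer = AutoTokenizer.from_pretrained(name, cache_dir='/content/drive/My Drive/LLM Deployment/LLM Deployment/', use_auth_token=auth_token)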

In [None]:
%%writefile app_test.py

import streamlit as st

# Import transformer classes for generation
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
# Import torch for datatype attributes
import torch
# Import the prompt wrapper...but for llama index
from llama_index.prompts.prompts import SimpleInputPrompt
# Import the llama index HF Wrapper
from llama_index.llms import HuggingFaceLLM
# Bring in embeddings wrapper
from llama_index.embeddings import LangchainEmbedding
# Bring in HF embeddings - need these to represent document chunks
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
# Bring in stuff to change service context
from llama_index import set_global_service_context
from llama_index import ServiceContext
# Import deps to load documents
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from pathlib import Path
import pypdf
import time
import os

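# Define variable to hold llama2 weights naming
name = "meta-llama/Llama-2-7b-chat-hf"
# Read the Hugging Face access token from the environment; the Streamlit process
# inherits environment variables set in the notebook (HF_TOKEN is an assumed name)
auth_token = os.environ.get("HF_TOKEN")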

# Cache the model and tokenizer so Streamlit loads them once, not on every rerun
@st.cache_resource
def get_tokenizer_model():
    # Create tokenizer
    tokenizer = AutoTokenizer.from_pretrained(name, cache_dir='/content/drive/My Drive/LLM Deployment/LLM Deployment/', use_auth_token=auth_token)

    # Create model
    model = AutoModelForCausalLM.from_pretrained(
        name,
        cache_dir='/content/drive/My Drive/LLM Deployment/LLM Deployment/',
        use_auth_token=auth_token,
        torch_dtype=torch.float16,
        rope_scaling={"type": "dynamic", "factor": 2},  # dynamic RoPE scaling
        load_in_8bit=True,  # 8-bit weights via bitsandbytes to reduce GPU memory
    )

    return model, tokenizer
model, tokenizer = get_tokenizer_model()

# Wrap each query in the closing half of the Llama 2 chat template
query_wrapper_prompt = SimpleInputPrompt("{query_str} [/INST]")

# Streamlit reruns the whole script on every interaction, so keep the system
# prompt in session_state; otherwise it would reset to "" on the next rerun
if 'system_prompt' not in st.session_state:
    st.session_state.system_prompt = ""

# Streamlit UI to let the user update the system prompt
user_system_prompt = st.text_area("How can I best assist you?", value="", height=100)
update_button = st.button('Request')

def format_system_prompt(text):
    # Opening half of the Llama 2 chat template, prepended to the wrapped query
    return f"[INST] <<SYS>>\n{text}\n<</SYS>>\n"

# Initialize the llm object with the stored system prompt
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    system_prompt=format_system_prompt(st.session_state.system_prompt),
    query_wrapper_prompt=query_wrapper_prompt,
    model=model,
    tokenizer=tokenizer
)

# Function to store the new system prompt and apply it to the current llm
def update_system_prompt(new_prompt):
    st.session_state.system_prompt = new_prompt
    llm.system_prompt = format_system_prompt(new_prompt)


if update_button:
    # Update the system prompt for this run and all later reruns
    update_system_prompt(user_system_prompt)
    st.success('Requested')

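# Create the embedding model once and cache it across reruns
# (all-MiniLM-L6-v2 is downloaded from Hugging Face on first use)
@st.cache_resource
def get_embeddings():
    return LangchainEmbedding(
        HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    )

embeddings = get_embeddings()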

# Create new service context instance
service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=embeddings
)

# And set the service context
set_global_service_context(service_context)

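# Store uploads in a dedicated folder: indexing "/content/" directly would also
# pick up app_test.py and any other files in the Colab working directory
UPLOAD_DIRECTORY = "/content/uploads/"
os.makedirs(UPLOAD_DIRECTORY, exist_ok=True)

st.title('PDF Upload and Query Interface')

# File uploader allows user to add one or more PDFs
uploaded_files = st.file_uploader("Upload PDF", type="pdf", accept_multiple_files=True)
upload_button = st.button('Upload')

if uploaded_files and upload_button:
    for file in uploaded_files:
        # Save each uploaded PDF to the upload directory
        with open(os.path.join(UPLOAD_DIRECTORY, file.name), "wb") as f:
            f.write(file.getbuffer())
    st.success(f"Uploaded {len(uploaded_files)} file(s) successfully.")

# SimpleDirectoryReader raises an error on a folder with no matching files
if not any(n.lower().endswith(".pdf") for n in os.listdir(UPLOAD_DIRECTORY)):
    st.info("Upload at least one PDF to start querying.")
    st.stop()

# Read the PDFs and embed their chunks into an in-memory vector index
documents = SimpleDirectoryReader(UPLOAD_DIRECTORY, required_exts=[".pdf"]).load_data()
index = VectorStoreIndex.from_documents(documents)

# Setup index query engine using LLM; only the single best-matching chunk is retrieved
query_engine = index.as_query_engine(streaming=True, similarity_top_k=1)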

# Create the main title
st.title('👔 HireMind 🧩')

# Set up session state to hold the chat history across reruns
if 'messages' not in st.session_state:
    st.session_state.messages = []

# Replay the chat history
for message in st.session_state.messages:
    st.chat_message(message['role']).markdown(message['content'])


# Create a chat input box; prompt is set when the user presses Enter
prompt = st.chat_input('Input your prompt here')

if prompt:
    st.chat_message('user').markdown(prompt)
    st.session_state.messages.append({'role': 'user', 'content': prompt})

    # With streaming=True the query returns a streaming response; str() collects
    # the generated tokens into plain text that can be rendered and stored
    answer = str(query_engine.query(prompt))

    st.chat_message('assistant').markdown(answer)
    st.session_state.messages.append({'role': 'assistant', 'content': answer})

Overwriting app_test.py


In [None]:
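import os
from pyngrok import ngrok

# Terminate open tunnels if they exist
ngrok.kill()
# Read the ngrok auth token from the environment instead of hardcoding it
# (NGROK_AUTH_TOKEN is an assumed variable name)
ngrok.set_auth_token(os.environ["NGROK_AUTH_TOKEN"])

# Set up a new tunnel to Streamlit's default port
public_url = ngrok.connect(port='8501')
print('Streamlit URL:', public_url)

!streamlit run app_test.py &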


Streamlit URL: http://673b-35-204-195-223.ngrok-free.app

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.

  You can now view your Streamlit app in your browser.

  Network URL: http://172.28.0.12:8501
  External URL: http://35.204.195.223:8501

Loading checkpoint shards: 100% 2/2 [01:22<00:00, 41.32s/it]
2023-12-01 22:23:13.654229: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-01 22:23:13.654289: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-01 22:23:13.654335: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin 