Commit
General adjustments:
- cleaned up the repository by removing and adjusting several files
- updated README.md to reflect changes from the original repository
- added db_clear.py to easily clear the entire database

main_st.py:
- removed the cache refresh button (the underlying issue is now fixed)
- cleaned up multiple parts of the code

db_build.py:
- cleaned up the code
Showing 11 changed files with 239 additions and 3,398 deletions.
```diff
@@ -1,5 +1,5 @@
 # GGML Models
-models/*.bin
+models/*

 # Data
 data/*
```
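The change broadens the ignore rule from `models/*.bin` to `models/*`, so GGUF files (and anything else dropped into `models/`) are ignored too, not just GGML `.bin` binaries. A rough Python sketch of the effect, using `fnmatch` (which only approximates real gitignore semantics; the file names below are hypothetical):

```python
from fnmatch import fnmatch

# Patterns from the updated ignore file
patterns = ["models/*", "data/*"]

def is_ignored(path: str) -> bool:
    """Rough approximation of gitignore matching for flat repo paths."""
    return any(fnmatch(path, pat) for pat in patterns)

# The old "models/*.bin" pattern would have missed GGUF files;
# the new "models/*" catches every file in the folder.
print(is_ignored("models/llama-2-7b-chat.Q4_K_M.gguf"))   # True
print(is_ignored("models/llama-2-7b.ggmlv3.q8_0.bin"))    # True
print(is_ignored("README.md"))                            # False
```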
# Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A

### Clearly explained guide for running quantized open-source LLM applications on CPUs using Llama 2, C Transformers, GGML, and LangChain

## Preface
This is a fork of Kenneth Leung's original repository, which adjusts the original code in several ways:
- A Streamlit front end makes the application more user-friendly
- Follow-up questions are now possible thanks to a memory implementation
- Users can now choose between different models
- Multiple other optimisations

**Step-by-step guide on TowardsDataScience**: https://towardsdatascience.com/running-llama-2-on-cpu-inference-for-document-q-a-3d636037a3d8
___
## Context
- Third-party commercial large language model (LLM) providers like OpenAI's GPT-4 have democratized LLM use via simple API calls.
- However, there are instances where teams require self-managed or private model deployment, for reasons such as data privacy and residency rules.
- The proliferation of open-source LLMs has opened up a vast range of options, reducing our reliance on third-party providers.
- When open-source LLMs are hosted locally, on-premise or in the cloud, dedicated compute capacity becomes a key issue. While GPU instances may seem the obvious choice, the costs can easily skyrocket beyond budget.
- This project shows how to run quantized versions of open-source LLMs on local CPU inference for document question-and-answer (Q&A).
<br><br>
![Alt text](assets/diagram_flow.png)
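The cost argument above is largely a memory question: quantization shrinks a model enough to fit comfortably in ordinary CPU RAM. A back-of-the-envelope calculation for a 7B-parameter model (figures are approximate and ignore activation and runtime overhead):

```python
PARAMS = 7e9  # Llama-2-7B has roughly 7 billion parameters

def model_size_gib(params: float, bits_per_weight: float) -> float:
    """Approximate in-memory weight size, ignoring overhead."""
    return params * bits_per_weight / 8 / 2**30

print(f"fp16 : {model_size_gib(PARAMS, 16):.1f} GiB")  # ~13.0 GiB
print(f"8-bit: {model_size_gib(PARAMS, 8):.1f} GiB")   # ~6.5 GiB
print(f"4-bit: {model_size_gib(PARAMS, 4):.1f} GiB")   # ~3.3 GiB
```

At 4-bit precision the weights fit within the RAM of a typical laptop, which is what makes CPU-only inference practical here.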
___
## Quickstart
- Ensure you have downloaded the model of your choice in GGUF format and placed it into the `models/` folder. Some examples:
    - https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF
    - https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF
- Fill the `data/` folder with the .pdf, .doc(x), or .txt files you want to ask questions about
- To build a FAISS database with information about your files, launch a terminal from the project directory and run: <br>
`python db_build.py`
- To start asking questions about your files, run: <br>
`streamlit run main_st.py`
- Choose which model to use for Q&A and adjust the parameters to your liking

![Alt text](assets/qa_output.png)
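Before running `db_build.py`, a quick sanity check of the expected layout can save a failed run. The snippet below is a hypothetical helper (not part of the repository) that counts the inputs the Quickstart expects, assuming the `models/` and `data/` folders described above:

```python
from pathlib import Path

def preflight(root: str = ".") -> dict:
    """Count GGUF models and supported documents under a project root."""
    base = Path(root)
    models_dir, data_dir = base / "models", base / "data"
    models = sorted(models_dir.glob("*.gguf")) if models_dir.is_dir() else []
    doc_exts = {".pdf", ".doc", ".docx", ".txt"}
    docs = ([p for p in data_dir.iterdir() if p.suffix.lower() in doc_exts]
            if data_dir.is_dir() else [])
    return {"models": len(models), "documents": len(docs)}

print(preflight())  # e.g. {'models': 1, 'documents': 1} once both folders are filled
```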
___
## Tools
- **LangChain**: Framework for developing applications powered by language models
- **LlamaCPP**: Python bindings for Transformer models implemented in C/C++
- **FAISS**: Open-source library for efficient similarity search and clustering of dense vectors
- **Sentence-Transformers (all-MiniLM-L6-v2)**: Open-source pre-trained transformer model that embeds text into a 384-dimensional dense vector space for tasks like clustering or semantic search
- **Llama-2-7B-Chat**: Open-source fine-tuned Llama 2 model designed for chat dialogue, leveraging publicly available instruction datasets and over 1 million human annotations
- **Poetry**: Tool for dependency management and Python packaging
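How the embedding model and FAISS fit together: each document chunk is embedded into a 384-dimensional vector, and answering a query means retrieving the stored vectors closest to the query vector. A minimal NumPy sketch of that retrieval step (FAISS performs the same search far faster and at scale; the vectors here are random stand-ins, not real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 384  # output dimension of all-MiniLM-L6-v2

# Pretend corpus: 5 document chunks, each embedded as a unit vector
chunks = rng.normal(size=(5, DIM))
chunks /= np.linalg.norm(chunks, axis=1, keepdims=True)

# Pretend query embedding: a slightly perturbed copy of chunk 3,
# so retrieval has a known correct answer
query = chunks[3] + 0.01 * rng.normal(size=DIM)
query /= np.linalg.norm(query)

# For unit vectors, cosine similarity is just the dot product
scores = chunks @ query
best = int(np.argmax(scores))
print(best)  # 3 — the chunk the query was derived from
```

In the real pipeline, the top-scoring chunks are passed to the LLM as context for answering the question.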
___
## Files and Content
- `/assets`: Images relevant to the project
- `/config`: Configuration files for the LLM application
- `/data`: Dataset used for this project (i.e., the Manchester United FC 2022 Annual Report, a 177-page PDF document)
- `/models`: Binary file of the GGUF quantized LLM model (i.e., Llama-2-7B-Chat)
- `/src`: Python code for the key components of the LLM application, namely `llm.py`, `utils.py`, and `prompts.py`
- `/vectorstore`: FAISS vector store for documents
- `db_build.py`: Python script to ingest the dataset and generate the FAISS vector store
- `db_clear.py`: Python script to clear the previously built database
- `main_st.py`: Main Python script to launch the Streamlit application
- `main.py`: Python script to launch an older version of the application in the terminal, mainly used for testing purposes
- `requirements.txt`: List of Python dependencies (and versions)
___
## References
- https://github.com/marella/ctransformers
- https://huggingface.co/TheBloke
- https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML
- https://python.langchain.com/docs/ecosystem/integrations/ctransformers
- https://ggml.ai
- https://github.com/rustformers/llm/blob/main/crates/ggml/README.md
- https://www.mdpi.com/2189676
- https://github.com/abetlen/llama-cpp-python
- https://python.langchain.com/docs/integrations/llms/llamacpp
`db_clear.py` (new file, 40 lines):
```python
# =========================
# Module: Vector DB Clear
# =========================
import os

import box
import yaml

# Import config vars
with open('config/config.yml', 'r', encoding='utf8') as ymlfile:
    cfg = box.Box(yaml.safe_load(ymlfile))


def delete_files_and_clear_content(folder_path, file_to_clear):
    try:
        # Get a list of all entries in the folder
        files = os.listdir(folder_path)

        # Loop through the list and delete each file (subfolders are left alone)
        for file in files:
            file_path = os.path.join(folder_path, file)
            if os.path.isfile(file_path):
                os.remove(file_path)
                print(f"{file} deleted successfully.")

        print(f"All files in '{folder_path}' have been deleted.")
    except FileNotFoundError:
        print(f"Folder not found at path: {folder_path}")

    # Clear the contents of the specified file. Check for existence first:
    # opening a missing file in 'w' mode would silently create it instead
    # of raising FileNotFoundError.
    if os.path.isfile(file_to_clear):
        with open(file_to_clear, 'w', encoding='utf8') as clear_file:
            clear_file.truncate(0)
        print(f"Contents of '{file_to_clear}' cleared successfully.")
    else:
        print(f"{file_to_clear} not found.")


if __name__ == "__main__":
    folder_path = cfg.DB_FAISS_PATH
    file_to_clear = os.path.join(cfg.DATA_PATH, cfg.LOG_FILE)

    delete_files_and_clear_content(folder_path, file_to_clear)
```
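The script's delete-then-truncate behaviour can be rehearsed on throwaway files before pointing it at the real vector store. The `clear` helper below is a simplified, hypothetical stand-in for `delete_files_and_clear_content` (explicit paths, no config file), run against a temporary directory:

```python
import os
import tempfile

def clear(folder_path: str, file_to_clear: str) -> None:
    """Delete every file in folder_path, then truncate file_to_clear."""
    for name in os.listdir(folder_path):
        path = os.path.join(folder_path, name)
        if os.path.isfile(path):
            os.remove(path)
    if os.path.isfile(file_to_clear):
        # Opening in 'w' mode truncates the file to zero bytes
        open(file_to_clear, 'w').close()

with tempfile.TemporaryDirectory() as tmp:
    # Fake vector store with two stale index files, plus a non-empty log
    store = os.path.join(tmp, "vectorstore")
    os.mkdir(store)
    for name in ("index.faiss", "index.pkl"):
        with open(os.path.join(store, name), "w") as f:
            f.write("stale")
    log = os.path.join(tmp, "log.txt")
    with open(log, "w") as f:
        f.write("old entries")

    clear(store, log)
    print(sorted(os.listdir(store)), os.path.getsize(log))  # [] 0
```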