rag-python

ML/LLM experiments with Llama Index to develop a personal assistant for cognitively impaired patients.

Index

Setup
- Python scripts
- Ollama
Experiments
Datasets

Setup

Python scripts

Python 3.11 is unsupported by PyTorch. The application must run with Python 3.10.
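
A script can check the interpreter version before importing heavy dependencies. The following is a minimal sketch, assuming that only the 3.10 series is acceptable as stated above; the helper name `is_supported_python` is hypothetical and not part of this repository.

```python
import sys

# Assumption: only Python 3.10.x is acceptable, as stated in this README.
REQUIRED = (3, 10)


def is_supported_python(version=None):
    """Return True when the (major, minor) pair matches REQUIRED."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) == REQUIRED


if not is_supported_python():
    # Warn instead of exiting so the check never hides other errors.
    print(f"Warning: expected Python 3.10, found {sys.version.split()[0]}")
```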

Install dependencies:

pip install -r requirements.txt
# additional dependencies
pip install pypdf fastembed chromadb accelerate streamlit langchainhub

In some scripts, the value of device must match the hardware:

auto is the default
cpu should always work, but processing will be very slow
mps to use the GPU of an Apple Silicon Mac (e.g. M1)
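
The choice between these values can also be made at runtime. The sketch below is an illustration only: the function name `resolve_device` is hypothetical, and how the scripts in this repository interpret auto is not stated here, so the sketch assumes it means "use the best backend that PyTorch reports as available".

```python
def resolve_device(requested="auto"):
    """Map a requested device name to one that is usable on this machine."""
    if requested not in {"auto", "cpu", "mps"}:
        raise ValueError(f"unknown device: {requested}")
    if requested == "cpu":
        return "cpu"
    try:
        import torch  # optional for this sketch: without it, fall back to CPU
    except ImportError:
        return "cpu"
    # torch.backends.mps only exists in recent PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    if requested == "mps":
        return "mps" if mps_ok else "cpu"
    # Assumed meaning of "auto": prefer an accelerator when one is present.
    if torch.cuda.is_available():
        return "cuda"
    return "mps" if mps_ok else "cpu"


print(resolve_device("auto"))
```

Falling back to cpu keeps a script runnable on any machine, at the cost of the long processing time mentioned above.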

Some scripts connect to HuggingFace. Set the following environment variables:

HF_HOME=path: path of the HuggingFace cache
HUGGING_FACE_HUB_TOKEN=token: HuggingFace token to download the models
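
Before starting a long download, a script can verify that both variables are set. This is a minimal sketch using only the standard library; the helper names `huggingface_settings` and `missing_settings` are hypothetical.

```python
import os


def huggingface_settings(env=None):
    """Collect the two variables named above; unset ones are reported as None."""
    env = os.environ if env is None else env
    return {
        "HF_HOME": env.get("HF_HOME"),
        "HUGGING_FACE_HUB_TOKEN": env.get("HUGGING_FACE_HUB_TOKEN"),
    }


def missing_settings(env=None):
    """Return the names of the variables that are unset or empty."""
    return [name for name, value in huggingface_settings(env).items() if not value]


for name in missing_settings():
    print(f"{name} is not set")
```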

Ollama

Ollama is required for most scripts.

Install ollama:

brew install ollama
mkdir -p ~/.ollama
# optional: store the models on an external drive
ln -s "/{PATH}/ollama" ~/.ollama/models
# start the server, then pull the models from a second terminal
ollama serve
ollama pull mistral
ollama pull llama2

Run it:

ollama serve
ollama run llama2

That should open an interactive shell to chat with Llama2.
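
The running server can also be queried from Python through Ollama's HTTP API. The sketch below uses only the standard library and assumes the server listens on its default address, `http://localhost:11434`; the helper names `build_request` and `ask` are hypothetical and not taken from the scripts in this repository.

```python
import json
import urllib.request

# Assumption: `ollama serve` listens on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama2"):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="llama2", timeout=120):
    """Send the prompt to Ollama and return the generated text."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

Calling `ask("Why is the sky blue?")` only works while `ollama serve` is running and after the model has been pulled.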

Experiments

01-use-local-knowledge: basic experiment using llama-index and llama to index and query a dataset.
02-chat-bot: experiment using ollama/llama2 + streamlit/langchain/chromadb to discuss a PDF with the LLM.
03-fine-tuning: experiment fine-tuning BERT with a dataset of reviews.
04-training-with-colab: same as 03, but using Colab.
05-create-a-bio: generate knowledge with LLMs and use the results to build the knowledge base for further iterations.
06-sentence-split: evaluates how SentenceSplitter works.
07-rag-pipeline: variation of 06-sentence-split.
08-query-chroma: test to verify how Chroma retrieves knowledge based on queries and filters.
09-refiner: utilisation of LLMs to re-rank results from the vector database.
10-keywords-extraction: methods to extract keywords (or key-phrases) from a text.
11-query-chroma-with-kw: use keywords to pre-filter the nodes returned by a query.
12-faiss: alternative indexing techniques using the FAISS library.
13-ingest-ebook: comparison between two extractors in order to parse a medical book in PDF.
14-smarter-ingest: extension of the SimpleDirectoryReader with enhanced PDF processing.
15-diagnosis: attempts to define the probability of a diagnosis based on dialogs.
16-relevant: find relevant questions to diagnose a disease.
17-better-dialogs: attempt to improve the dialogs with RAG.
18-translate-in-you-form: translate a diagnosis into a dialog directed to the patient.
19-elastic: multilevel indexing of PDFs storing the embeddings in Elasticsearch.
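
The idea behind 11-query-chroma-with-kw, pre-filtering nodes by keyword before ranking them, can be illustrated without any vector database. The sketch below is not the code of that experiment and does not use the Chroma API: the nodes, keywords, and three-dimensional embeddings are made-up toy data, and the function names are hypothetical.

```python
import math

# Toy nodes: text, hand-assigned keywords, and a made-up 3-dimensional embedding.
NODES = [
    {"text": "Dementia affects memory.",
     "keywords": {"dementia", "memory"}, "vec": [0.9, 0.1, 0.0]},
    {"text": "Captain Nemo commands the Nautilus.",
     "keywords": {"nemo", "nautilus"}, "vec": [0.0, 0.2, 0.9]},
    {"text": "Memory loss can be an early symptom.",
     "keywords": {"memory", "symptom"}, "vec": [0.8, 0.3, 0.1]},
]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query(nodes, query_vec, query_keywords, top_k=2):
    """Keep nodes sharing at least one keyword, then rank them by similarity."""
    wanted = set(query_keywords)
    candidates = [node for node in nodes if node["keywords"] & wanted]
    ranked = sorted(candidates, key=lambda n: cosine(n["vec"], query_vec), reverse=True)
    return [node["text"] for node in ranked[:top_k]]


print(query(NODES, [1.0, 0.0, 0.0], {"memory"}))
```

Filtering first means that the similarity ranking only sees nodes that mention the topic, so an unrelated node cannot enter the results through embedding similarity alone.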

Datasets

bio: the bio of a fictional woman generated by 05-create-a-bio.
bio-single-file: like bio but in a single file.
dementia-wiki-txt: an extract of the Wikipedia page about dementia.
dementia-wiki-polluted: same as dementia-wiki-txt but polluted by a sentence affirming that there exists a relation between dementia and alien kidnapping (to study hallucinations).
TwentyThousandLeaguesUnderTheSea: Twenty Thousand Leagues Under the Seas by Jules Verne. Source: https://www.gutenberg.org/
gutenberg: five books from https://www.gutenberg.org/: On the Origin of Species By Means of Natural Selection by Charles Darwin, Paradise Lost by John Milton, The Fall of the House of Usher by Edgar Allan Poe, The Republic by Plato, and Twenty Thousand Leagues under the Sea by Jules Verne.

alros/rag-python
