This project implements a Retrieval-Augmented Generation (RAG) pipeline to answer questions about food products using documents from Open Food Facts. The pipeline is built with Python and Langchain, and exposed through a FastAPI application.
The core components of the pipeline are:
- Vectorization: Documents are transformed into vector embeddings.
- Vector Storage: Vectors are stored in a Chroma vector database.
- Semantic Search: User queries are vectorized to perform a similarity search.
- Reranking: Retrieved documents are reranked to improve relevance.
- Response Generation: A Large Language Model (LLM) generates an answer based on the query and retrieved documents.
Figure: Prompt for the LLM with RAG.
Figure: Vectorization of a document.
Figure: FastAPI server.
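The flow described by the components above can be illustrated with a dependency-free toy sketch. This is not the project's implementation: the real pipeline uses Langchain, Chroma, and an LLM, whereas here a bag-of-words count vector stands in for the embedding model, a sorted list stands in for the vector store, and the reranking and LLM calls are left out. The function names (`embed`, `cosine`, `retrieve`, `build_prompt`) and the prompt wording are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=2):
    """Semantic-search step: rank documents by similarity to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_docs):
    """Generation step: assemble the prompt that would be sent to the LLM."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


docs = [
    "Olive oil: ingredients extra virgin olive oil, fat 100 g per 100 g.",
    "Orange juice: ingredients orange juice from concentrate, sugars 9 g per 100 ml.",
    "Dark chocolate: ingredients cocoa mass, sugar, cocoa butter.",
]
question = "What are the ingredients of olive oil?"
top_docs = retrieve(question, docs, top_k=2)
prompt = build_prompt(question, top_docs)
print(prompt)
```

In the real pipeline, a reranking step would sit between `retrieve` and `build_prompt`, and the resulting prompt would be passed to the LLM.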
This project uses data from Open Food Facts, a free, open, and collaborative database of food products from around the world. The data is used to build the knowledge base of the RAG pipeline, enabling it to answer a wide range of questions about food products, their ingredients, nutritional information, and more.
You will need to download the Open Food Facts dataset to use this project. You can choose from the following formats:
- CSV (for advanced users): A single CSV file containing the entire dataset.
  - Link: en.openfoodfacts.org.products.csv
  - Size: 11.20 GB (direct link)
- CSV.gz (for advanced users): A compressed archive of the CSV file. You can extract it with WinRAR, File Explorer/Windows Explorer, `gzip`, or `tar`.
  - Link: en.openfoodfacts.org.products.csv.gz
  - Size: 1.11 GB (11.20 GB uncompressed)
- Sample of the CSV for testing (recommended): A small sample of the full CSV file. You don't need to extract it.
  - Link: sample_openfoodfacts.csv (TODO)
  - Size: 0.02 GB (XX.XX GB compressed)
Once downloaded, place the data in the data directory.
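As a rough illustration of how the downloaded file could be turned into text documents for vectorization, here is a minimal sketch using only the standard library. It assumes the export is tab-separated despite its `.csv` extension and that it contains `product_name` and `ingredients_text` columns; check the header row of your download before relying on either. The helper name `rows_to_documents` and the document wording are hypothetical, not taken from the project's code.

```python
import csv
import io


def rows_to_documents(tsv_file, limit=None):
    """Turn product rows into short text documents ready for vectorization."""
    # Assumption: fields are tab-separated and not quoted.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    documents = []
    for row in reader:
        name = (row.get("product_name") or "").strip()
        if not name:
            continue  # skip products without a name
        ingredients = (row.get("ingredients_text") or "").strip()
        documents.append(
            f"Product: {name}. Ingredients: {ingredients or 'unknown'}."
        )
        if limit is not None and len(documents) >= limit:
            break
    return documents


# Small in-memory stand-in for the real file, using the same assumed layout.
sample = io.StringIO(
    "code\tproduct_name\tingredients_text\n"
    "0001\tOlive oil\textra virgin olive oil\n"
    "0002\t\twater\n"
    "0003\tDark chocolate\t\n"
)
documents = rows_to_documents(sample)
print(documents)
```

For a real file, the in-memory sample would be replaced by something like `open(path, encoding="utf-8", newline="")`, where `path` points to the file you placed in the data directory. Passing `limit` keeps memory use small when experimenting with the full 11.20 GB export.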
To get a local copy up and running follow these simple steps.
Prerequisites
- Python 3.8+
- Ollama installed and running.
Creating the virtual environment:
If a package is missing:
.venv\Scripts\activate && python -m pip install --upgrade pip && python -m pip install -r requirements.txt
1. Clone the repo
with GitHub (copies the files locally)
2. python -m venv .venv
(Creates the .venv folder) This may need to be run from CMD.
3. .venv\Scripts\activate
(Activates the virtual environment) Running it from CMD can avoid errors.
On Linux/Mac, activate the virtual environment with:
source .venv/bin/activate
If PowerShell reports an error, run the following as administrator:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned
If the problem comes from the editor's terminal, you can also dot-source the activation script:
. .venv\Scripts\activate.ps1
4. python -m pip install --upgrade pip
(Upgrades pip)
5. python -m pip install -r requirements.txt
(Installs the dependencies)
To regenerate requirements.txt automatically from the installed packages, run pip freeze > requirements.txt
To run all of the previous steps at once (in CMD or PowerShell >= 7):
python -m venv .venv && .venv\Scripts\activate && python -m pip install --upgrade pip && python -m pip install -r requirements.txt
If an error occurs (delete the .venv folder, or run):
.venv\Scripts\activate && python -m pip install --upgrade pip && python -m pip install -r requirements.txt
With pip freeze:
To run all of the previous steps at once (in CMD or PowerShell >= 7):
```python -m venv .venv && .venv\Scripts\activate && python -m pip install --upgrade pip && python -m pip install -r requirements.txt && pip freeze > requirements.txt```
If an error occurs (delete the .venv folder, or run):
```.venv\Scripts\activate && python -m pip install --upgrade pip && python -m pip install -r requirements.txt && pip freeze > requirements.txt```
6. Edit .git\info\exclude
Add: .venv
(Git then ignores changes to the .venv folder)
7. Start the FastAPI server
uvicorn api.main:app --reload
(Starts the main server file)
Linux/Mac:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
uvicorn api.main:app --reload
Then edit .git/info/exclude and add .venv.
Download Mistral Model
- Download the Mistral model with Ollama:
ollama pull mistral
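Before starting the server, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch, not part of the project: the helper name `check_prerequisites` is hypothetical, and it only checks the Python version and whether an `ollama` executable is on the PATH. It does not verify that the Ollama service is running or that the Mistral model has been pulled.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 8), executable="ollama"):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info < min_version:
        found = sys.version.split()[0]
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ is required, found {found}"
        )
    # shutil.which returns None when the executable is not on the PATH.
    if shutil.which(executable) is None:
        problems.append(f"'{executable}' was not found on PATH")
    return problems


for problem in check_prerequisites():
    print("Missing prerequisite:", problem)
```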
To start the FastAPI server, run the following command from the project's root directory:
uvicorn api.main:app --reload
The API will be available at http://127.0.0.1:8000. You can access the interactive Swagger UI documentation at http://127.0.0.1:8000/docs.
- GET /health: Health check endpoint.
- POST /query: Processes a natural language query.
- POST /upload_document: Uploads a document to the knowledge base.
- GET /documents: Retrieves information about indexed documents.
Query Example:
curl -X POST "http://127.0.0.1:8000/query" \
-H "Content-Type: application/json" \
-d '{
"query": "What are the benefits of olive oil?",
"top_k": 2
}'
Upload Document Example:
curl -X POST "http://127.0.0.1:8000/upload_document" \
-H "Content-Type: multipart/form-data" \
-F "file=@my_document.txt"
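The query request can also be issued from Python. The sketch below builds the same POST /query request as the curl example using only the standard library; the helper name `build_query_request` is hypothetical, and the commented-out part assumes the server returns a JSON body, whose exact shape is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def build_query_request(query, top_k=2, base_url=BASE_URL):
    """Build the same POST /query request as the curl example, without sending it."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("What are the benefits of olive oil?")
print(request.get_method(), request.full_url)

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes it possible to inspect the payload without a running server.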
├── api
│ └── main.py
├── data
├── uploads
├── src
│ ├── document_processor.py
│ ├── generator.py
│ ├── rag_pipeline.py
│ ├── reranker.py
│ └── vector_store.py
├── venv
├── GEMINI.md
├── requirements.txt
├── TODO.txt
└── README.md
- Add support for more document types.
- Implement a more advanced reranking model.
- Add user authentication.
- Create a web interface for easier interaction.
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.
Project Link: https://github.com/RaykeshR/RAG


