RAG Pipeline for Open Food Facts

This project implements a Retrieval-Augmented Generation (RAG) pipeline to answer questions about food products using documents from Open Food Facts.

About The Project

This project implements a Retrieval-Augmented Generation (RAG) pipeline to answer questions about food products using documents from Open Food Facts. The pipeline is built with Python and LangChain, and is exposed through a FastAPI application.

The core components of the pipeline are:

Vectorization: Documents are transformed into vector embeddings.
Vector Storage: Vectors are stored in a Chroma vector database.
Semantic Search: User queries are vectorized to perform a similarity search.
Reranking: Retrieved documents are reranked to improve relevance.
Response Generation: A Large Language Model (LLM) generates an answer based on the query and retrieved documents.
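The five steps above can be sketched end to end in plain Python. This is a toy illustration only: the bag-of-words `embed` function, the in-memory `store` list, the overlap-based `rerank`, and the `build_prompt` helper are all hypothetical stand-ins for the project's real embedding model, Chroma database, reranker, and LLM prompt.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, store, top_k):
    """Semantic search: vectorize the query and return the top_k closest documents."""
    query_vector = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def rerank(query, docs):
    """Toy reranker: prefer documents that share more distinct query terms."""
    terms = set(query.lower().split())
    return sorted(docs, key=lambda d: len(terms & set(d.lower().split())), reverse=True)


def build_prompt(query, docs):
    """Assemble the text that would be sent to the LLM (hypothetical template)."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


documents = [
    "Olive oil is rich in monounsaturated fats.",
    "Dark chocolate contains cocoa butter and sugar.",
    "Whole wheat bread is a source of fiber.",
]
store = [(doc, embed(doc)) for doc in documents]  # vectorization + vector storage
query = "Which product contains olive oil"
hits = rerank(query, search(query, store, top_k=2))
prompt = build_prompt(query, hits)
```

In the real pipeline, the final `prompt` would be passed to the LLM for response generation; the sketch stops before that step because it needs a running model.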

Figure: Prompt for the LLM with RAG.

Figure: Vectorization of a document.

Figure: FastAPI server.

Open Food Facts

This project uses data from Open Food Facts, a free, open, and collaborative database of food products from around the world. The data is used to build the knowledge base of the RAG pipeline, enabling it to answer a wide range of questions about food products, their ingredients, nutritional information, and more.

Data

You will need to download the Open Food Facts dataset to use this project. You can choose from the following formats:

CSV (for advanced users): This is a single CSV file containing the entire dataset.
- Link: en.openfoodfacts.org.products.csv
- Size: 11.20 GB (direct link)
CSV.gz (for advanced users): This is a gzip-compressed copy of the CSV file. You can extract it with WinRAR, Windows File Explorer, gzip, or tar.
- Link: en.openfoodfacts.org.products.csv.gz
- Size: 1.11 GB (11.20 GB uncompressed)
Sample of the CSV for testing (recommended): This is a small sample of the CSV file, not the entire dataset. You don't need to extract it.
- Link: sample_openfoodfacts.csv (TODO)
- Size: 0.02 GB (XX.XX GB compressed)

Once downloaded, place the data in the data directory.
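Before indexing the full dataset, it can help to peek at the first few rows. The sketch below uses only the standard library and is not part of the project; `read_rows` is a hypothetical helper, and the tab delimiter is an assumption based on how the Open Food Facts export is documented, so adjust it if your file differs.

```python
import csv
import gzip
from pathlib import Path


def read_rows(path, limit=5, delimiter="\t"):
    """Return the first `limit` rows of a CSV or CSV.gz file as dictionaries."""
    path = Path(path)
    # Choose the opener from the extension so both formats work without extraction.
    opener = gzip.open if path.suffix == ".gz" else open
    rows = []
    with opener(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            rows.append(row)
            if len(rows) >= limit:
                break  # stop early: the full file is several gigabytes
    return rows
```

For example, `read_rows("data/en.openfoodfacts.org.products.csv.gz", limit=3)` reads three products without decompressing the whole archive to disk.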

Built With

Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

Python 3.8+
Ollama installed and running.

Creating a virtual environment:

If a package is missing:

.venv\Scripts\activate && python -m pip install --upgrade pip && python -m pip install -r requirements.txt

1. Clone the repo

Using GitHub (copies the files to your machine).

2. `python -m venv .venv`

Creates the .venv folder. You may need to run this from CMD.

3. `.venv\Scripts\activate`

Activate the virtual environment (on Linux/Mac):

source .venv/bin/activate  # On Linux/Mac

This activates the virtual environment. On Windows, running the command from CMD can avoid errors. If PowerShell reports an error, run Set-ExecutionPolicy -ExecutionPolicy RemoteSigned as administrator (details).

Result:

(.venv) PS C:\Users...\Portfolio_Django>

If the problem comes from the editor, you can also run `. .venv\Scripts\activate.ps1` (activates the virtual environment).

4. `python -m pip install --upgrade pip`

(Upgrades pip.)

5. `python -m pip install -r requirements.txt`

Use pip freeze > requirements.txt to fill in requirements.txt automatically.

To run all the previous steps at once (in CMD or PowerShell >= 7):

python -m venv .venv && .venv\Scripts\activate && python -m pip install --upgrade pip && python -m pip install -r requirements.txt

If this fails, delete the .venv folder or run:

.venv\Scripts\activate && python -m pip install --upgrade pip && python -m pip install -r requirements.txt

With pip freeze:

To run all the previous steps at once (in CMD or PowerShell >= 7):
```python -m venv .venv && .venv\Scripts\activate && python -m pip install --upgrade pip && python -m pip install -r requirements.txt && pip freeze > requirements.txt```

If this fails, delete the .venv folder or run:
```.venv\Scripts\activate && python -m pip install --upgrade pip && python -m pip install -r requirements.txt && pip freeze > requirements.txt```

6. Edit .git\info\exclude

Add: .venv (so that Git ignores changes in the .venv folder)

7. Start the FastAPI server

uvicorn api.main:app --reload

(Starts the main server application.)

Linux/Mac:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
uvicorn api.main:app --reload

Then edit .git/info/exclude as described in step 6.

Download Mistral Model

Download the Mistral model with Ollama:
```
ollama pull mistral
```
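To confirm from Python that the model was pulled, you can query the local Ollama server. This is a minimal sketch, not project code: it assumes Ollama is listening on its default address `http://localhost:11434` and that its `/api/tags` endpoint returns a JSON object with a `models` list whose entries carry a `name` such as `mistral:latest`. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address


def has_model(tags_payload, name):
    """Return True if a model called `name` (with any tag) is in an /api/tags payload."""
    for model in tags_payload.get("models", []):
        # Names look like "mistral:latest"; compare only the part before the tag.
        if model.get("name", "").split(":")[0] == name:
            return True
    return False


def mistral_is_available(base_url=OLLAMA_URL):
    """Ask a running Ollama server which models are installed locally."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return has_model(json.load(response), "mistral")
```

Calling `mistral_is_available()` raises a `URLError` if Ollama is not running, which is itself a useful signal that the prerequisite is not met.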

Usage

To start the FastAPI server, run the following command from the project's root directory:

uvicorn api.main:app --reload

The API will be available at http://127.0.0.1:8000. You can access the interactive Swagger UI documentation at http://127.0.0.1:8000/docs.

API Endpoints

GET /health: Health check endpoint.
POST /query: Processes a natural language query.
POST /upload_document: Uploads a document to the knowledge base.
GET /documents: Retrieves information about indexed documents.

Query Example:

curl -X POST "http://127.0.0.1:8000/query" \
-H "Content-Type: application/json" \
-d '{
  "query": "What are the benefits of olive oil?",
  "top_k": 2
}'
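The same query can be sent from Python with the standard library. This is a sketch under assumptions: `build_query_request` and `ask` are hypothetical helper names, and the shape of the JSON response is not documented here, so the result is returned as parsed JSON without further interpretation.

```python
import json
from urllib.request import Request, urlopen

API_URL = "http://127.0.0.1:8000"  # address used in the Usage section


def build_query_request(query, top_k=2, base_url=API_URL):
    """Build the POST /query request with the same JSON body as the curl example."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query, top_k=2):
    """Send the query to a running server and return the parsed JSON response."""
    with urlopen(build_query_request(query, top_k), timeout=60) as response:
        return json.load(response)
```

With the server running, `ask("What are the benefits of olive oil?")` performs the same call as the curl command above.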

Upload Document Example:

curl -X POST "http://127.0.0.1:8000/upload_document" \
-H "Content-Type: multipart/form-data" \
-F "file=@my_document.txt"
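For a Python equivalent of the upload, the multipart body can be assembled by hand with the standard library. This is an illustrative sketch, not the project's client: `build_upload_request` is a hypothetical helper, the form field name `file` is taken from the curl example, and the `text/plain` part type is an assumption that suits `.txt` files only.

```python
import uuid
from pathlib import Path
from urllib.request import Request


def build_upload_request(path, base_url="http://127.0.0.1:8000"):
    """Build a multipart/form-data POST /upload_document request for one file."""
    path = Path(path)
    boundary = uuid.uuid4().hex  # random separator between form parts
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        "Content-Type: text/plain\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + path.read_bytes() + tail
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return Request(
        f"{base_url}/upload_document", data=body, headers=headers, method="POST"
    )
```

Passing the returned request to `urllib.request.urlopen` sends the file to a running server. A third-party HTTP client would make this shorter, but the manual version shows what curl's `-F` option builds.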

Project Structure

├── api
│   └── main.py
├── data
├── uploads
├── src
│   ├── document_processor.py
│   ├── generator.py
│   ├── rag_pipeline.py
│   ├── reranker.py
│   └── vector_store.py
├── venv
├── GEMINI.md
├── requirements.txt
├── TODO.txt
└── README.md

Roadmap

Add support for more document types.
Implement a more advanced reranking model.
Add user authentication.
Create a web interface for easier interaction.

See the open issues for a full list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Project Link: https://github.com/RaykeshR/RAG
