This project is a visual invoice parser powered by DeepSeek-VL2-tiny, a multimodal vision-language model. It extracts structured data from invoice or receipt images and outputs a clean .csv file containing relevant information like vendor, invoice number, date, line items, and totals.
- Accepts invoice images (`.jpg`, `.png`, etc.)
- Automatically extracts:
  - Vendor name
  - Invoice number
  - Invoice date
  - Purchase order or job number
  - Itemized descriptions, quantities, and prices
  - Subtotal, tax, and total
- Exports structured `.csv` output
- Offers both a Streamlit UI and script-based batch processing
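
The extracted fields listed above map naturally onto a small record type. The following is a minimal sketch, not the project's actual schema: the class names `LineItem` and `Invoice`, their attribute names, and the one-row-per-line-item layout are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LineItem:
    # Hypothetical field names; the project's real column names may differ.
    description: str
    quantity: float
    unit_price: float


@dataclass
class Invoice:
    vendor: str
    invoice_number: str
    invoice_date: str
    po_or_job_number: str = ""
    subtotal: float = 0.0
    tax: float = 0.0
    total: float = 0.0
    items: List[LineItem] = field(default_factory=list)

    def to_rows(self) -> List[Dict[str, object]]:
        """Flatten into one dict per line item, repeating the header fields."""
        header = {
            "vendor": self.vendor,
            "invoice_number": self.invoice_number,
            "invoice_date": self.invoice_date,
            "po_or_job_number": self.po_or_job_number,
            "subtotal": self.subtotal,
            "tax": self.tax,
            "total": self.total,
        }
        # An invoice with no line items still yields one (blank-item) row.
        items = self.items or [LineItem("", 0.0, 0.0)]
        return [
            {
                **header,
                "description": item.description,
                "quantity": item.quantity,
                "unit_price": item.unit_price,
            }
            for item in items
        ]
```

Repeating the header fields on every row keeps the CSV flat, so it opens cleanly in a spreadsheet at the cost of some redundancy.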

Clone the repository:

    git clone https://github.com/YOUR_USERNAME/invoice-parser.git
    cd invoice-parser

Create and activate a virtual environment:

    # Windows
    python -m venv venv
    venv\Scripts\activate

    # macOS/Linux
    python3 -m venv venv
    source venv/bin/activate

Install the dependencies:

    pip install -r requirements.txt

⚠️ Note: You'll also need to have PyTorch installed with CUDA if you want GPU acceleration. You can get the correct install command from: https://pytorch.org/get-started/locally/
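
To check whether the installed PyTorch build can actually see a GPU, a short helper along these lines may be useful. This is a sketch only: the function name `pick_device` is hypothetical and not part of the project, and falling back to `"cpu"` when PyTorch is missing is an assumption about the desired behavior.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed at all, so nothing can run on the GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Inference device: {pick_device()}")
```

If this reports `cpu` on a machine with an NVIDIA GPU, the installed PyTorch wheel is likely a CPU-only build.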
You can either:
Ensure main.py or your model loader uses:

    model = AutoModelForVision2Seq.from_pretrained("deepseek-ai/DeepSeek-VL2", trust_remote_code=True)

To launch the Streamlit UI, run:

    streamlit run app.py

Then open your browser to the link shown in the terminal, usually:

http://localhost:8501
- Upload invoice images via the UI
- See extracted data live
- Download results as a `.csv`
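
A CSV download in a web UI typically needs the file contents in memory rather than on disk. The sketch below shows one way to build them; `rows_to_csv_bytes` is a hypothetical helper, not a function from this repository, and taking the column order from the first row is an assumption.

```python
import csv
import io
from typing import Dict, List


def rows_to_csv_bytes(rows: List[Dict[str, str]]) -> bytes:
    """Serialize rows to UTF-8 CSV bytes; columns come from the first row."""
    if not rows:
        return b""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")
```

In a Streamlit app, the returned bytes could be passed as the `data` argument of `st.download_button`.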
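
For batch processing, place your invoice images in the `invoices_as_images/` folder:

    invoices_as_images/
    ├── invoice1.jpg
    ├── invoice2.png
    └── ...

Then run:

    python main.py

The results are written to `output.csv`.

- Python 3.9+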
- 8–16 GB VRAM
- 16–32 GB RAM
- Optional: NVIDIA GPU with CUDA support for faster inference
- Disk space: the model weights are several GB in size

| File | Purpose |
|---|---|
| `app.py` | Streamlit frontend |
| `main.py` | Backend script for bulk image processing |
| `csv_writer.py` | Helper for writing extracted data to CSV |
| `reader.py` | Model logic for parsing invoice images |
| `parse_output.py` | Extract structured info from model output |

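
The files above suggest a pipeline of the form: find images, run the model, parse its output, write CSV. The sketch below illustrates that flow under stated assumptions. `find_images`, `run_batch`, and the `extract_fields` callback (standing in for the model call in `reader.py` plus the parsing in `parse_output.py`) are hypothetical names, and the column list is illustrative rather than the project's real schema.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of accepted extensions
COLUMNS = ["file", "vendor", "invoice_number", "invoice_date", "total"]  # illustrative


def find_images(folder: Path) -> List[Path]:
    """Return the image files in `folder`, sorted for stable output order."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def run_batch(
    folder: Path,
    out_csv: Path,
    extract_fields: Callable[[Path], Dict[str, str]],
) -> int:
    """Parse every image in `folder` and write one CSV row per image.

    `extract_fields` stands in for the model call plus output parsing.
    Returns the number of rows written.
    """
    images = find_images(folder)
    with out_csv.open("w", newline="", encoding="utf-8") as fh:
        # Unknown keys are dropped; missing keys are written as empty cells.
        writer = csv.DictWriter(fh, fieldnames=COLUMNS, extrasaction="ignore")
        writer.writeheader()
        for image in images:
            writer.writerow({"file": image.name, **extract_fields(image)})
    return len(images)
```

Passing the extractor in as a callback keeps the file handling testable without loading the model.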
Feel free to fork the repo, open issues, or suggest enhancements via pull requests!
MIT License.
by Modin Wang