🧾 Invoice Parser

This project is a visual invoice parser powered by DeepSeek-VL2-tiny, a multimodal vision-language model. It extracts structured data from invoice or receipt images and outputs a clean .csv file containing relevant information like vendor, invoice number, date, line items, and totals.

✨ Features

- Accepts invoice images (.jpg, .png, etc.)
- Automatically extracts:
  - Vendor name
  - Invoice number
  - Invoice date
  - Purchase order or job number
  - Itemized descriptions, quantities, and prices
  - Subtotal, tax, and total
- Exports structured .csv output
- Offers both a Streamlit UI and script-based batch processing
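
The extracted fields listed above could be modeled as a small record type. This is a minimal sketch and not the project's actual schema: the class names (`LineItem`, `Invoice`), the field names, and the sample values are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class LineItem:
    """One itemized row on an invoice (hypothetical field names)."""
    description: str
    quantity: float
    unit_price: float


@dataclass
class Invoice:
    """Header fields plus line items, mirroring the feature list."""
    vendor: str
    invoice_number: str
    invoice_date: str  # kept as text, since date formats vary by vendor
    po_or_job_number: Optional[str] = None
    line_items: List[LineItem] = field(default_factory=list)
    subtotal: Optional[float] = None
    tax: Optional[float] = None
    total: Optional[float] = None


# Made-up sample values, only to show how the record is filled in.
example = Invoice(
    vendor="Acme Supply",
    invoice_number="INV-001",
    invoice_date="2024-01-15",
    line_items=[LineItem("Widget", 2, 4.50)],
    subtotal=9.00,
    tax=0.72,
    total=9.72,
)

# asdict() converts nested dataclasses into plain dicts, ready for export.
record = asdict(example)
```

Optional fields default to `None` so that an invoice missing, say, a purchase order number can still be represented.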

🚀 How to Run This Project

1. Clone the repository

git clone https://github.com/YOUR_USERNAME/invoice-parser.git
cd invoice-parser

2. Create a virtual environment (optional but recommended)

# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

⚠️ Note: For GPU acceleration, you also need a CUDA-enabled build of PyTorch. Get the install command for your platform from: https://pytorch.org/get-started/locally/
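
To confirm whether the installed PyTorch build can actually use a GPU, a quick check along these lines may help. The helper name `pick_device` is hypothetical and not part of this repository; `torch.cuda.is_available()` is the standard PyTorch call.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if a CUDA-enabled PyTorch build sees a GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed at all
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch wheel is probably a CPU-only build.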

4. Download the DeepSeek-VL2-tiny model

You can either:

A. Pull the model automatically via Hugging Face (preferred):

Ensure main.py or your model loader uses:

from transformers import AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained("deepseek-ai/deepseek-vl2-tiny", trust_remote_code=True)

B. Or place the downloaded model files in the `models/` directory if working offline.
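
One way to support both options is to prefer a populated local `models/` directory and fall back to the Hub ID otherwise. This is a sketch under assumptions: the helper `resolve_model_source` is hypothetical, and the default Hub ID `deepseek-ai/deepseek-vl2-tiny` is assumed from the model name used in this README.

```python
from pathlib import Path


def resolve_model_source(local_dir: str = "models",
                         hub_id: str = "deepseek-ai/deepseek-vl2-tiny") -> str:
    """Prefer a non-empty local directory; otherwise fall back to the Hub ID."""
    path = Path(local_dir)
    if path.is_dir() and any(path.iterdir()):
        return str(path)  # offline: load from files on disk
    return hub_id         # online: let Hugging Face download and cache the model
```

The returned string can be passed as the first argument to `from_pretrained`, which accepts either a Hub ID or a local directory path.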

🧪 Option 1: Try It With Streamlit (Visual Demo)

streamlit run app.py

Then open your browser to the link shown in the terminal, usually:
http://localhost:8501

What to do:

- Upload invoice images via the UI
- See the extracted data live
- Download the results as a .csv

⚙️ Option 2: Run From Command Line (Batch Mode)

1. Place your invoice images in the folder `invoices_as_images/`

invoices_as_images/
├── invoice1.jpg
├── invoice2.png
└── ...
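
Collecting the images from that folder might look like the sketch below. The function name `list_invoice_images` and the accepted extension set are assumptions; `main.py` may discover files differently.

```python
from pathlib import Path
from typing import List

# Assumed set of accepted extensions; extend it as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_invoice_images(folder: str = "invoices_as_images") -> List[Path]:
    """Return image files in `folder`, sorted by path for a stable order."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Comparing the lowercased suffix means files such as `scan.PNG` are picked up as well.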

2. Run the processing script:

python main.py

3. Output will be saved to:

output.csv
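
Because an invoice has several line items but a CSV is flat, one plausible layout is one row per line item with the header fields repeated. The sketch below illustrates that idea with the standard `csv` module. The column names and the helpers `flatten_invoice` and `write_csv` are hypothetical, and the real `csv_writer.py` may use a different layout.

```python
import csv
from typing import Dict, List

# Assumed column layout: invoice header fields followed by line-item fields.
HEADER_FIELDS = ("vendor", "invoice_number", "invoice_date", "total")
ITEM_FIELDS = ("description", "quantity", "unit_price")
COLUMNS = list(HEADER_FIELDS) + list(ITEM_FIELDS)


def flatten_invoice(invoice: Dict) -> List[Dict]:
    """Repeat the header fields on every line item so each row stands alone."""
    header = {k: invoice.get(k, "") for k in HEADER_FIELDS}
    items = invoice.get("line_items") or [{}]  # keep one row even without items
    return [
        {**header, **{k: item.get(k, "") for k in ITEM_FIELDS}}
        for item in items
    ]


def write_csv(invoices: List[Dict], path: str = "output.csv") -> int:
    """Write all invoices to `path` and return the number of data rows."""
    rows = [row for inv in invoices for row in flatten_invoice(inv)]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Opening the file with `newline=""` follows the `csv` module's recommendation and avoids blank lines between rows on Windows.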

🖥️ Minimum System Requirements

Python 3.9+
8–16 GB VRAM
16–32 GB RAM
Optional: NVIDIA GPU with CUDA support for faster inference
Disk space: Allow several GB for the model weights

📦 Files to Know

| File | Purpose |
| --- | --- |
| `app.py` | Streamlit frontend |
| `main.py` | Backend script for bulk image processing |
| `csv_writer.py` | Helper for writing extracted data to CSV |
| `reader.py` | Model logic for parsing invoice images |
| `parse_output.py` | Extract structured info from model output |
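
To illustrate the kind of work the output-parsing step involves, the sketch below pulls the first JSON object out of free-form model text. It rests on an assumption: that the model is prompted to answer with a JSON object, possibly surrounded by extra prose. The actual `parse_output.py` may rely on a different output format, and the helper name `extract_json_object` is hypothetical.

```python
import json
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first decodable JSON object embedded in `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        start = text.find("{", start + 1)
    return None
```

Returning `None` instead of raising lets a batch run skip an unparseable invoice and continue with the rest.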

🤝 Contributions

Feel free to fork the repo, open issues, or suggest enhancements via pull requests!

📜 License

MIT License.

by Modin Wang

ModinWang1/invoice-parser
