# 🏦 Capstone Project: AI Invoice Analyzer

AI Invoice Analyzer is an end-to-end intelligent document processing system that automatically extracts, validates, and structures key financial information from invoice images and PDFs.

The system combines OCR, a locally hosted LLM (Ollama), and rule-based validation logic to transform unstructured invoices into clean, reliable JSON outputs, all through an interactive Gradio web interface.


🚀 **Key Features**

- 📄 Supports invoice images (JPG, PNG) and PDF files
- 🔍 OCR-based text extraction using Tesseract
- 🤖 Local LLM extraction using Ollama (Llama 3.2)
- 🧠 Prompt-based structured field extraction
- ✅ Business rule validation:
  - Total ≈ Subtotal + Tax (within a 2% tolerance)
  - Due date > Invoice date
  - Invoice number sanity check (must contain alphanumeric characters)
- 📊 OCR confidence score, per-field confidence estimates, and processing metadata
- 🌐 Interactive Gradio web interface
- 🧩 Modular, extensible architecture

**Demo-video link:** https://drive.google.com/file/d/1sXG0nK2BgfmmM2CFF-Ne70hKstxMJmzv/view?usp=sharing

SECTION 1: INSTALLATION & SETUP

In [14]:
# Installing required packages
print("📦 Installing system dependencies...")
!apt-get update -qq
!apt-get install -y tesseract-ocr poppler-utils

print("\n📦 Installing Ollama...")
!curl -fsSL https://ollama.com/install.sh | sh

print("\n📦 Installing Python packages...")
!pip install -q pytesseract pdf2image Pillow gradio requests

print("\n✅ All dependencies installed successfully!")

📦 Installing system dependencies...
W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
tesseract-ocr is already the newest version (4.1.1-2.1build1).
poppler-utils is already the newest version (22.02.0-2ubuntu0.12).
0 upgraded, 0 newly installed, 0 to remove and 54 not upgraded.

📦 Installing Ollama...
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.

📦 Installing Python packages...
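The install cell prints its success message whether or not each command succeeded. A quick check that the binaries and packages are actually visible can save debugging later. The helper below is a hypothetical sketch (`check_dependencies` is not part of the project). It uses only the standard library's `shutil.which` and `importlib.util.find_spec`. It also assumes `pdftoppm` is the poppler-utils binary that pdf2image calls.

```python
import importlib.util
import shutil


def check_dependencies() -> dict:
    """Report which binaries and Python packages from the install cell are visible."""
    binaries = ["tesseract", "pdftoppm", "ollama"]  # pdftoppm ships with poppler-utils
    packages = ["pytesseract", "pdf2image", "PIL", "gradio", "requests"]

    status = {}
    for name in binaries:
        # shutil.which returns the executable's path, or None if it is not on PATH
        status[name] = shutil.which(name) is not None
    for name in packages:
        # find_spec returns None when the top-level package cannot be found
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```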

In [15]:
# Importing libraries
import os
import re
import json
import time
import subprocess
import requests
from io import BytesIO
from datetime import datetime
from typing import Dict, List, Optional, Tuple
import warnings
warnings.filterwarnings('ignore')

# Image processing
from PIL import Image
import pytesseract

# PDF handling
from pdf2image import convert_from_path, convert_from_bytes

# UI
import gradio as gr

SECTION 2: OLLAMA SETUP & MANAGEMENT

In [16]:
class OllamaManager:
    """Manage Ollama service and model"""

    def __init__(self, model_name: str = "llama3.2:3b"):
        """Initialize Ollama manager with specified model"""
        self.model_name = model_name
        self.base_url = "http://localhost:11434"
        self.process = None

    def start_service(self):
        """Start the Ollama server in the background"""
        try:
            # First, kill any existing Ollama processes so the port is free
            subprocess.run(["pkill", "ollama"], capture_output=True)
            time.sleep(1)

            print("🚀 Starting Ollama service...")
            # Keep the log handle on the instance so cleanup() can close it
            self.log_file = open('/tmp/ollama.log', 'w')
            self.process = subprocess.Popen(
                ["ollama", "serve"],
                stdout=self.log_file,
                stderr=subprocess.STDOUT,
                # Run the server in its own process group so signals aimed at the
                # notebook kernel's group (such as an interrupt) do not reach it
                preexec_fn=os.setpgrp
            )
            time.sleep(5)  # Give the server a few seconds to bind to port 11434

            # Verify it's actually running
            if self.test_connection():
                print("✅ Ollama service started and responding")
            else:
                print("⚠️  Ollama started but not responding yet")

        except Exception as e:
            print(f"⚠️  Error starting Ollama: {e}")

    def pull_model(self):
        """Pull the model; if it is already present, only changed layers are downloaded"""
        try:
            print(f"📥 Pulling model: {self.model_name}")
            print("⏳ This may take 2-3 minutes on first run...")

            result = subprocess.run(
                ["ollama", "pull", self.model_name],
                capture_output=True,
                text=True,
                timeout=300
            )

            if result.returncode == 0:
                print(f"✅ Model {self.model_name} ready!")
            else:
                print(f"⚠️  Warning: {result.stderr}")

        except subprocess.TimeoutExpired:
            print("⚠️  Model pull timeout - may need to retry")
        except Exception as e:
            print(f"⚠️  Error pulling model: {e}")

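    def test_connection(self) -> bool:
        """Return True if the Ollama API answers on /api/tags"""
        try:
            response = requests.get(f"{self.base_url}/api/tags", timeout=5)
            return response.status_code == 200
        except requests.exceptions.RequestException:
            # Connection refused, timeout, etc.: the server is not reachable yet
            return False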

    def generate(self, prompt: str, system: str = "") -> str:
        """Generate response using Ollama API"""
        try:
            url = f"{self.base_url}/api/generate"

            payload = {
                "model": self.model_name,
                "prompt": prompt,
                "system": system,
                "stream": False,
                "options": {
                    "temperature": 0.1,  # Low temperature for consistent extraction
                    "top_p": 0.9,
                }
            }

            # Generous timeout: the first request also loads the model into memory
            response = requests.post(url, json=payload, timeout=120)
            response.raise_for_status()

            result = response.json()
            return result.get("response", "")

        except requests.exceptions.Timeout:
            raise ValueError("Ollama request timeout - try a smaller model")
        except Exception as e:
            raise ValueError(f"Ollama generation failed: {str(e)}")

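    def cleanup(self):
        """Stop the Ollama server started by start_service() and close its log"""
        if self.process:
            self.process.terminate()
            self.process.wait()
            self.process = None
        log_file = getattr(self, "log_file", None)
        if log_file is not None:
            log_file.close()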
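`generate()` relies on the prompt alone to get JSON back. Ollama's `/api/generate` endpoint also accepts a `format` field that constrains the reply to JSON. Separately, `/api/tags` lists the models already available locally, which would let `pull_model()` skip the download when the model is present. The helpers below sketch both ideas. `build_generate_payload` and `model_is_available` are hypothetical names, and they operate on plain dictionaries so they can be tried without a running server.

```python
from typing import Any, Dict, List


def build_generate_payload(model: str, prompt: str, system: str = "") -> Dict[str, Any]:
    """Request body for Ollama's /api/generate with JSON-constrained output."""
    return {
        "model": model,
        "prompt": prompt,
        "system": system,
        "stream": False,
        "format": "json",  # ask Ollama to constrain the reply to valid JSON
        "options": {"temperature": 0.1, "top_p": 0.9},
    }


def model_is_available(tags_response: Dict[str, Any], model_name: str) -> bool:
    """Check a parsed /api/tags response for an already-pulled model."""
    models: List[Dict[str, Any]] = tags_response.get("models", [])
    # Entries carry a "name" such as "llama3.2:3b"; an untagged name implies ":latest"
    wanted = model_name if ":" in model_name else f"{model_name}:latest"
    return any(m.get("name") == wanted for m in models)
```

For example, `model_is_available(requests.get(f"{self.base_url}/api/tags", timeout=5).json(), self.model_name)` could gate the `ollama pull` call. Even in JSON mode the reply still needs checking, because valid JSON does not guarantee that the expected fields are present.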

SECTION 3: CONFIGURATION

In [17]:
class Config:
    """Configuration management for the invoice analyzer"""

    # Model settings - Choose one:
    # "llama3.2:3b" - Fast, good for Colab (RECOMMENDED)
    # "llama3.2:1b" - Fastest, less accurate
    # "phi3:mini" - Alternative, very fast
    # "gemma2:2b" - Google's small model
    OLLAMA_MODEL = "llama3.2:3b"

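    # OCR settings: --oem 3 = default OCR engine mode,
    # --psm 6 = assume a single uniform block of text
    TESSERACT_CONFIG = '--oem 3 --psm 6'

    # Image processing: cap on the longest side, in pixels
    # (lower values speed up OCR but can reduce accuracy)
    MAX_IMAGE_SIZE = 2000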

    # Validation thresholds
    AMOUNT_TOLERANCE = 0.02  # 2% tolerance

    # Supported formats
    SUPPORTED_IMAGE_FORMATS = ['.jpg', '.jpeg', '.png']
    SUPPORTED_PDF_FORMAT = '.pdf'
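If the notebook is reused outside Colab, it can help to override these settings without editing the class. The sketch below reads them from environment variables. The `INVOICE_*` variable names are hypothetical; nothing in the original code reads them.

```python
import os


class EnvConfig:
    """Config variant whose values can be overridden via environment variables.

    The INVOICE_* variable names are hypothetical, chosen for this sketch.
    """

    OLLAMA_MODEL = os.environ.get("INVOICE_OLLAMA_MODEL", "llama3.2:3b")
    TESSERACT_CONFIG = os.environ.get("INVOICE_TESSERACT_CONFIG", "--oem 3 --psm 6")
    MAX_IMAGE_SIZE = int(os.environ.get("INVOICE_MAX_IMAGE_SIZE", "2000"))
    AMOUNT_TOLERANCE = float(os.environ.get("INVOICE_AMOUNT_TOLERANCE", "0.02"))
```

Class attributes are evaluated once, when the class body runs. The variables must therefore be set before the cell defining the class is executed, for example with `os.environ["INVOICE_OLLAMA_MODEL"] = "llama3.2:1b"`.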

SECTION 4: PREPROCESSING MODULE

In [18]:
class ImagePreprocessor:
    """Handles image preprocessing and PDF conversion"""

    @staticmethod
    def pdf_to_images(pdf_file) -> List[Image.Image]:
        """Convert a PDF (path, uploaded file object, or bytes stream) to PIL Images"""
        try:
            # 200 DPI (pdf2image's default) balances OCR accuracy against speed
            if isinstance(pdf_file, (str, os.PathLike)):
                images = convert_from_path(os.fspath(pdf_file), dpi=200)
            elif hasattr(pdf_file, 'name'):
                images = convert_from_path(pdf_file.name, dpi=200)
            else:
                images = convert_from_bytes(pdf_file.read(), dpi=200)
            return images
        except Exception as e:
            raise ValueError(f"PDF conversion failed: {str(e)}")

    @staticmethod
    def enhance_image(image: Image.Image) -> Image.Image:
        """Enhance image quality for better OCR"""
        # Convert to RGB if needed
        if image.mode != 'RGB':
            image = image.convert('RGB')

        # Optimize size for faster processing
        width, height = image.size
        max_size = Config.MAX_IMAGE_SIZE

        if width > max_size or height > max_size:
            # Resize to max dimension
            ratio = min(max_size / width, max_size / height)
            new_size = (int(width * ratio), int(height * ratio))
            image = image.resize(new_size, Image.Resampling.LANCZOS)

        return image

    @staticmethod
    def load_image(file_path_or_bytes) -> Image.Image:
        """Load an image from a path, an uploaded file object, or a bytes stream"""
        try:
            # Uploaded file wrappers expose their temp path as .name; plain paths
            # and BytesIO streams are passed to Image.open unchanged
            source = getattr(file_path_or_bytes, 'name', file_path_or_bytes)
            image = Image.open(source)
            return ImagePreprocessor.enhance_image(image)
        except Exception as e:
            raise ValueError(f"Image loading failed: {str(e)}")

SECTION 5: OCR MODULE

In [19]:
class OCREngine:
    """OCR extraction using Tesseract"""

    @staticmethod
    def extract_text(image: Image.Image) -> str:
        """Extract text from image using Tesseract OCR"""
        try:
            text = pytesseract.image_to_string(
                image,
                config=Config.TESSERACT_CONFIG
            )
            return text.strip()
        except Exception as e:
            raise ValueError(f"OCR extraction failed: {str(e)}")

    @staticmethod
    def extract_with_confidence(image: Image.Image) -> Tuple[str, float]:
        """Extract text with overall confidence score"""
        try:
            # Get detailed OCR data
            data = pytesseract.image_to_data(
                image,
                output_type=pytesseract.Output.DICT,
                config=Config.TESSERACT_CONFIG
            )

            # Extract text
            text = pytesseract.image_to_string(image, config=Config.TESSERACT_CONFIG)

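            # Average word confidence: -1 marks boxes that contain no word.
            # float() also accepts fractional values such as '96.5', which some
            # Tesseract/pytesseract versions return (int('96.5') would raise)
            confidences = [float(conf) for conf in data['conf'] if float(conf) > 0]
            avg_confidence = sum(confidences) / len(confidences) if confidences else 0.0

            return text.strip(), avg_confidence / 100.0
        except Exception:
            # Any OCR failure is reported as empty text with zero confidence
            return "", 0.0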
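The overall OCR confidence is a plain mean of Tesseract's per-word scores. The helper below isolates that calculation. `average_confidence` is a hypothetical name, and it follows the code above in two assumptions: `-1` marks boxes without a word, and zero-confidence entries are skipped as well. Skipping zeros makes the average somewhat optimistic.

```python
from typing import Iterable, Union


def average_confidence(conf_values: Iterable[Union[str, int, float]]) -> float:
    """Mean Tesseract word confidence on a 0-1 scale, ignoring non-positive entries."""
    # Parse once with float() so both '96' and '96.5' style values work
    parsed = (float(c) for c in conf_values)
    scores = [c for c in parsed if c > 0]  # -1 = no word in this box; 0 is skipped too
    if not scores:
        return 0.0
    return sum(scores) / len(scores) / 100.0
```

For instance, the values `['-1', '96', '90.5', 88]` average to 91.5, which is returned as `0.915`.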

SECTION 6: LLM EXTRACTION MODULE (OLLAMA)

In [20]:
class LLMExtractor:
    """Extract structured data using Ollama"""

    def __init__(self, ollama_manager: OllamaManager):
        """Initialize with Ollama manager"""
        self.ollama = ollama_manager

    def create_extraction_prompt(self, ocr_text: str) -> str:
        """Create structured prompt for invoice data extraction"""
        # Truncate very long text to avoid timeout
        if len(ocr_text) > 2000:
            ocr_text = ocr_text[:2000] + "..."

        prompt = f"""Extract invoice information from this text and return ONLY a JSON object.

Invoice Text:
{ocr_text}

Return a JSON object with these exact fields:
- vendor_name (string or null)
- invoice_number (string or null)
- invoice_date (YYYY-MM-DD format or null)
- due_date (YYYY-MM-DD format or null)
- subtotal (number or null)
- tax (number or null)
- total (number or null)
- currency (string like USD, EUR, or null)

Rules:
1. Extract only what you find, use null if missing
2. Dates must be YYYY-MM-DD format
3. Numbers should be numeric values without currency symbols
4. Return ONLY valid JSON, no explanation or markdown

JSON:"""

        return prompt

    def extract_invoice_data(self, ocr_text: str) -> Dict:
        """Extract structured invoice data using Ollama"""
        try:
            system_prompt = "You are a precise data extraction system. Return only valid JSON with no additional text."

            prompt = self.create_extraction_prompt(ocr_text)
            response = self.ollama.generate(prompt, system=system_prompt)

            # Clean response
            json_text = response.strip()

            # Remove markdown code blocks if present
            json_text = re.sub(r'^```json\s*', '', json_text)
            json_text = re.sub(r'^```\s*', '', json_text)
            json_text = re.sub(r'\s*```$', '', json_text)

            # Try to find JSON in response
            json_match = re.search(r'\{.*\}', json_text, re.DOTALL)
            if json_match:
                json_text = json_match.group(0)

            extracted_data = json.loads(json_text)

            # Ollama returns no per-field confidence, so attach a fixed placeholder:
            # 0.85 for any field the model filled in, 0.0 for missing ones
            if "confidence_scores" not in extracted_data:
                extracted_data["confidence_scores"] = {
                    field: 0.85 if extracted_data.get(field) is not None else 0.0
                    for field in ["vendor_name", "invoice_number", "invoice_date",
                                  "due_date", "subtotal", "tax", "total", "currency"]
                }

            return extracted_data

        except json.JSONDecodeError as e:
            raise ValueError(f"LLM returned invalid JSON: {str(e)}\nResponse: {response[:200]}")
        except Exception as e:
            raise ValueError(f"LLM extraction failed: {str(e)}")


SECTION 7: VALIDATION MODULE

In [21]:
class InvoiceValidator:
    """Validate extracted invoice data"""

    @staticmethod
    def validate_amount_calculation(subtotal: Optional[float],
                                   tax: Optional[float],
                                   total: Optional[float]) -> Dict:
        """Validate that Total ‚âà Subtotal + Tax"""
        if subtotal is None or tax is None or total is None:
            return {
                "valid": None,
                "message": "Missing values for calculation validation"
            }

        calculated_total = subtotal + tax
        difference = abs(calculated_total - total)
        tolerance = total * Config.AMOUNT_TOLERANCE

        is_valid = difference <= tolerance

        return {
            "valid": is_valid,
            "message": f"Calculated: {calculated_total:.2f}, Actual: {total:.2f}, Diff: {difference:.2f}",
            "calculated_total": calculated_total,
            "difference": difference
        }

    @staticmethod
    def validate_dates(invoice_date: Optional[str],
                      due_date: Optional[str]) -> Dict:
        """Validate that Due Date > Invoice Date"""
        if invoice_date is None or due_date is None:
            return {
                "valid": None,
                "message": "Missing dates for validation"
            }

        try:
            inv_date = datetime.strptime(invoice_date, "%Y-%m-%d")
            d_date = datetime.strptime(due_date, "%Y-%m-%d")

            is_valid = d_date > inv_date
            days_difference = (d_date - inv_date).days

            return {
                "valid": is_valid,
                "message": f"Due date is {days_difference} days after invoice date",
                "days_difference": days_difference
            }
        except (ValueError, TypeError):  # wrong format, or a non-string value from the LLM
            return {
                "valid": False,
                "message": "Invalid date format"
            }

    @staticmethod
    def validate_invoice_number(invoice_number: Optional[str]) -> Dict:
        """Validate invoice number format"""
        if invoice_number is None:
            return {
                "valid": None,
                "message": "Invoice number not found"
            }

        # Check that it contains at least one alphanumeric character
        # (str() guards against the LLM returning the number as an int)
        has_alphanum = bool(re.search(r'[a-zA-Z0-9]', str(invoice_number)))

        return {
            "valid": has_alphanum,
            "message": "Valid format" if has_alphanum else "Invalid format"
        }

    @staticmethod
    def validate_all(extracted_data: Dict) -> Dict:
        """Run all validation checks"""
        validations = {
            "amount_calculation": InvoiceValidator.validate_amount_calculation(
                extracted_data.get("subtotal"),
                extracted_data.get("tax"),
                extracted_data.get("total")
            ),
            "dates": InvoiceValidator.validate_dates(
                extracted_data.get("invoice_date"),
                extracted_data.get("due_date")
            ),
            "invoice_number": InvoiceValidator.validate_invoice_number(
                extracted_data.get("invoice_number")
            )
        }

        # Overall status: passes unless a check explicitly failed
        # (None means "could not check" and does not count as a failure)
        all_valid = all(
            v.get("valid") is not False
            for v in validations.values()
        )

        return {
            "overall_valid": all_valid,
            "checks": validations
        }
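Here is the 2% tolerance in concrete numbers. With `AMOUNT_TOLERANCE = 0.02` and a total of 254, the allowed difference is 5.08. A subtotal of 200 plus tax of 50 differs by 4, so it passes. A total of 260 differs by 10 against an allowance of 5.2, so it fails. The standalone function below reproduces the rule with the same tri-state result. `amounts_consistent` is a hypothetical name. Basing the allowance on `abs(total)` is an assumption that also covers credit notes with negative amounts.

```python
from typing import Optional


def amounts_consistent(subtotal: Optional[float], tax: Optional[float],
                       total: Optional[float], tolerance: float = 0.02) -> Optional[bool]:
    """Tri-state version of the Total ≈ Subtotal + Tax rule (None = cannot check)."""
    if subtotal is None or tax is None or total is None:
        return None
    difference = abs((subtotal + tax) - total)
    # abs(total) keeps the allowance positive for negative (credit note) totals
    return difference <= abs(total) * tolerance
```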


SECTION 8: MAIN PIPELINE

In [22]:
class InvoiceAnalyzer:
    """Main pipeline for invoice analysis"""

    def __init__(self, ollama_manager: OllamaManager):
        """Initialize the analyzer"""
        self.preprocessor = ImagePreprocessor()
        self.ocr_engine = OCREngine()
        self.llm_extractor = LLMExtractor(ollama_manager)
        self.validator = InvoiceValidator()

    def process_invoice(self, file_input) -> Dict:
        """
        Process invoice file and return structured data

        Args:
            file_input: File path or file object (image or PDF)

        Returns:
            Dictionary with extracted data, validations, and metadata
        """
        try:
            start_time = time.time()

            # Step 1: Load and preprocess
            file_path = file_input.name if hasattr(file_input, 'name') else file_input
            file_extension = os.path.splitext(file_path)[1].lower()

            if file_extension == '.pdf':
                images = self.preprocessor.pdf_to_images(file_input)
                # Process only the first page for speed, applying the same RGB
                # conversion and resizing used for uploaded images
                image = self.preprocessor.enhance_image(images[0])
            else:
                image = self.preprocessor.load_image(file_input)

            # Step 2: OCR extraction
            ocr_text, ocr_confidence = self.ocr_engine.extract_with_confidence(image)

            if not ocr_text:
                return {
                    "success": False,
                    "error": "No text could be extracted from the image",
                    "ocr_confidence": 0.0
                }

            # Step 3: LLM extraction
            extracted_data = self.llm_extractor.extract_invoice_data(ocr_text)

            # Step 4: Validation
            validation_results = self.validator.validate_all(extracted_data)

            # Step 5: Compile results
            processing_time = time.time() - start_time

            result = {
                "success": True,
                "extracted_data": extracted_data,
                "validations": validation_results,
                "metadata": {
                    "ocr_confidence": round(ocr_confidence, 2),
                    "processing_time": round(processing_time, 2),
                    "processing_timestamp": datetime.now().isoformat(),
                    "file_type": file_extension,
                    "model_used": Config.OLLAMA_MODEL
                },
                "raw_ocr_text": ocr_text[:500] + "..." if len(ocr_text) > 500 else ocr_text
            }

            return result

        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "metadata": {
                    "processing_timestamp": datetime.now().isoformat()
                }
            }
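`process_invoice` reads only the first PDF page, so totals printed on a later page never reach the LLM. One possible extension is to OCR every page and join the text before prompting. `combine_page_texts` is a hypothetical helper sketching that idea. It keeps the same 2,000-character budget that `create_extraction_prompt` applies, so very long documents would still be cut.

```python
from typing import List


def combine_page_texts(page_texts: List[str], max_chars: int = 2000) -> str:
    """Join per-page OCR text with page markers, capped at the prompt budget."""
    parts = []
    for number, text in enumerate(page_texts, start=1):
        text = text.strip()
        if text:  # skip blank pages (e.g. empty scanned back sides)
            parts.append(f"--- Page {number} ---\n{text}")
    combined = "\n\n".join(parts)
    # Same truncation rule as the extraction prompt: cut and mark with "..."
    return combined if len(combined) <= max_chars else combined[:max_chars] + "..."
```

It could be fed with `[OCREngine.extract_text(page) for page in images]`. The trade-off is one OCR pass per page, which adds processing time for long PDFs.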


SECTION 9: GRADIO INTERFACE

In [23]:
def create_demo_interface(analyzer: InvoiceAnalyzer):
    """Create Gradio interface for the invoice analyzer"""

    def process_and_display(file):
        """Process file and format output for display"""
        if file is None:
            return "Please upload an invoice file.", "{}"

        result = analyzer.process_invoice(file)

        if not result["success"]:
            return f"❌ Error: {result.get('error', 'Unknown error')}", json.dumps(result, indent=2)

        # Format human-readable summary
        data = result["extracted_data"]
        val = result["validations"]
        meta = result["metadata"]
        checks = val["checks"]
        currency = data.get("currency") or ""

        def show(field):
            # Missing fields come back as null, so the key exists with value None
            # and data.get(field, default) would print "None"
            value = data.get(field)
            return "Not found" if value is None else value

        def status(check):
            # Tri-state result: True = passed, False = failed, None = not checkable
            valid = checks[check]["valid"]
            if valid is None:
                return "⚠️ Cannot validate"
            return "✅ Valid" if valid else "❌ Invalid"

        summary = f"""
✅ **Invoice Analysis Complete**

⏱️ **Processing Time:** {meta['processing_time']}s
🤖 **Model:** {meta['model_used']}

📋 **Extracted Information:**
- **Vendor:** {show('vendor_name')}
- **Invoice Number:** {show('invoice_number')}
- **Invoice Date:** {show('invoice_date')}
- **Due Date:** {show('due_date')}
- **Subtotal:** {show('subtotal')} {currency}
- **Tax:** {show('tax')} {currency}
- **Total:** {show('total')} {currency}

🔍 **Validation Results:**
- **Amount Calculation:** {status('amount_calculation')}
  {checks['amount_calculation']['message']}
- **Date Logic:** {status('dates')}
  {checks['dates']['message']}
- **Invoice Number:** {status('invoice_number')}

📊 **Quality Metrics:**
- **OCR Confidence:** {meta['ocr_confidence']*100:.1f}%
- **Overall Validation:** {'✅ PASSED' if val['overall_valid'] else '⚠️ NEEDS REVIEW'}
        """

        # Return both summary and full JSON
        return summary.strip(), json.dumps(result, indent=2)

    # Create Gradio interface
    interface = gr.Interface(
        fn=process_and_display,
        inputs=gr.File(
            label="Upload Invoice (JPG, PNG, or PDF)",
            file_types=[".jpg", ".jpeg", ".png", ".pdf"]
        ),
        outputs=[
            gr.Textbox(label="Analysis Summary", lines=25),
            gr.Textbox(label="Full JSON Output", lines=15)
        ],
        title="üßæ AI Invoice Analyzer (Ollama-Powered)",
        description="""
        Upload an invoice image or PDF to extract structured data automatically.

        **Powered by:** Ollama (Local LLM) + Tesseract OCR

        **Model:** """ + Config.OLLAMA_MODEL + """

        **Features:** OCR extraction, AI structuring, automatic validation

        ⚡ **Note:** The first request may take 30-60s while the model loads into memory
        """,
        examples=None,
        theme=gr.themes.Soft()
    )

    return interface
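The callback passes the upload straight to `process_invoice`. Depending on the Gradio version and the `gr.File` `type` setting, that value may be a plain path string or a temporary-file object that exposes the path as `.name`. A small normalizing helper would let the rest of the pipeline work with paths only. It is sketched here under the hypothetical name `resolve_upload_path`:

```python
import os
from typing import Any


def resolve_upload_path(file: Any) -> str:
    """Return a filesystem path for whatever gr.File handed to the callback."""
    if isinstance(file, (str, os.PathLike)):
        return os.fspath(file)  # path-style value: use it directly
    name = getattr(file, "name", None)  # tempfile-style wrapper: path lives in .name
    if isinstance(name, str):
        return name
    raise TypeError(f"Unsupported upload object: {type(file).__name__}")
```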


SECTION 10: MAIN EXECUTION

In [24]:
def main():
    """Main execution function"""

    print("üöÄ AI Invoice Analyzer with Ollama - Initializing...")
    print("=" * 60)

    # Initialize Ollama
    ollama_manager = OllamaManager(model_name=Config.OLLAMA_MODEL)

    # Start Ollama service
    ollama_manager.start_service()

    # Wait for service to be ready
    print("‚è≥ Waiting for Ollama service...")
    max_retries = 15
    for i in range(max_retries):
        if ollama_manager.test_connection():
            print("‚úÖ Ollama service is ready")
            break
        time.sleep(2)
        print(f"  Retry {i+1}/{max_retries}...")
    else:
        print("‚ùå Failed to start Ollama service")
        print("\nManual fix: Run these commands in a new cell:")
        print("!pkill ollama")
        print("!ollama serve > /tmp/ollama.log 2>&1 &")
        print("Then re-run this cell")
        return

    # Pull model
    ollama_manager.pull_model()

    # Verify service is still running after model pull
    print("\nüîç Verifying Ollama service...")
    if not ollama_manager.test_connection():
        print("‚ùå Ollama service stopped after model pull")
        print("Restarting service...")
        ollama_manager.start_service()
        time.sleep(3)

    print("\n" + "=" * 60)
    print("‚úÖ System Configuration:")
    print(f"  ‚Ä¢ OCR Engine: Tesseract")
    print(f"  ‚Ä¢ LLM: Ollama ({Config.OLLAMA_MODEL})")
    print(f"  ‚Ä¢ Max Image Size: {Config.MAX_IMAGE_SIZE}px")
    print(f"  ‚Ä¢ Service Status: {'üü¢ Running' if ollama_manager.test_connection() else 'üî¥ Stopped'}")
    print("=" * 60)

    # Initialize analyzer
    try:
        analyzer = InvoiceAnalyzer(ollama_manager)
        print("\nüåê Launching web interface...")
        print("‚ö†Ô∏è  IMPORTANT: Keep Ollama service running")
        print("   Do not run cleanup or restart until done testing")

        interface = create_demo_interface(analyzer)

        # Launch without blocking to keep service alive
        interface.launch(share=True, debug=False, prevent_thread_lock=False)

        print("\n‚úÖ Interface launched successfully!")
        print("üìù Note: Ollama service is running in background")
        print("   To stop it later, run: !pkill ollama")

    except Exception as e:
        print(f"‚ùå Error: {str(e)}")
        print("\nTroubleshooting:")
        print("1. Check Ollama status: !ps aux | grep ollama")
        print("2. Check Ollama logs: !tail /tmp/ollama.log")
        print("3. Restart service: !pkill ollama && ollama serve &")
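`main()` polls `test_connection()` every 2 seconds, up to 15 times. An alternative is exponential backoff with an overall time budget: quick retries while the server is starting, fewer requests after that. `wait_until_ready` is a hypothetical helper sketching this. It takes the check and the sleep function as parameters so it can be exercised without a server. It counts the time spent sleeping, not wall-clock time.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], max_wait: float = 30.0,
                     initial_delay: float = 0.5, max_delay: float = 4.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() with exponential backoff until it succeeds or max_wait is used up."""
    waited = 0.0
    delay = initial_delay
    while True:
        if check():
            return True
        if waited >= max_wait:
            return False
        step = min(delay, max_wait - waited)  # never sleep past the budget
        sleep(step)
        waited += step
        delay = min(delay * 2, max_delay)  # 0.5, 1, 2, 4, 4, ... seconds
```

In the notebook it might be called as `wait_until_ready(ollama_manager.test_connection, max_wait=30)`.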

In [25]:
# EXECUTION
if __name__ == "__main__":
    main()

🚀 AI Invoice Analyzer with Ollama - Initializing...
🚀 Starting Ollama service...
✅ Ollama service started and responding
⏳ Waiting for Ollama service...
✅ Ollama service is ready
📥 Pulling model: llama3.2:3b
⏳ This may take 2-3 minutes on first run...
✅ Model llama3.2:3b ready!

🔍 Verifying Ollama service...

✅ System Configuration:
  • OCR Engine: Tesseract
  • LLM: Ollama (llama3.2:3b)
  • Max Image Size: 2000px
  • Service Status: 🟢 Running

🌐 Launching web interface...
⚠️  IMPORTANT: Keep Ollama service running
   Do not run cleanup or restart until done testing
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://b44a29be5bf085090d.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)



✅ Interface launched successfully!
📝 Note: Ollama service is running in background
   To stop it later, run: !pkill ollama


In [26]:
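# Quick test without the Gradio interface (in case the interface fails).
# Note: start_service() first kills any running Ollama server, then starts a new one.

ollama_mgr = OllamaManager(Config.OLLAMA_MODEL)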
ollama_mgr.start_service()
time.sleep(3)
ollama_mgr.pull_model()

analyzer = InvoiceAnalyzer(ollama_mgr)

# Test with an image
result = analyzer.process_invoice("/content/invoice-sample.PNG")
print(json.dumps(result, indent=2))

🚀 Starting Ollama service...
✅ Ollama service started and responding
📥 Pulling model: llama3.2:3b
⏳ This may take 2-3 minutes on first run...
✅ Model llama3.2:3b ready!
{
  "success": true,
  "extracted_data": {
    "vendor_name": "Aesthetic Holiday",
    "invoice_number": "12345",
    "invoice_date": "2070-10-15",
    "due_date": "2070-10-30",
    "subtotal": null,
    "tax": null,
    "total": 250.0,
    "currency": "USD",
    "confidence_scores": {
      "vendor_name": 0.85,
      "invoice_number": 0.85,
      "invoice_date": 0.85,
      "due_date": 0.85,
      "subtotal": 0.0,
      "tax": 0.0,
      "total": 0.85,
      "currency": 0.85
    }
  },
  "validations": {
    "overall_valid": true,
    "checks": {
      "amount_calculation": {
        "valid": null,
        "message": "Missing values for calculation validation"
      },
      "dates": {
        "valid": true,
        "message": "Due date is 15 days after invoice date",
        "days_difference": 15
      },
