# Table of contents
1. [Introduction](#introduction)
2. [Data Ingestion](#data-ingestion)
3. [Data Extraction](#data-extraction)
4. [Display the output](#display-the-output)
5. [Display the output in a table](#display-the-output-in-a-table)
6. [Conclusion](#conclusion)

## Introduction
In this notebook, we extract data from a PDF invoice with the Gemini API and display the result, first as JSON and then in table format.

## Data Ingestion
We load a PDF file from the invoices directory and transform it into a base64-encoded UTF-8 string.

In [1]:
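import os

from components.data_ingestion import DataIngestion
from configs import ROOT_DIR

# Directory that holds the sample invoices
invoice_dir = ROOT_DIR / 'data' / 'invoices'
# Note: os.listdir returns entries in arbitrary order, so the index below
# may select a different file on another machine
invoice_files = os.listdir(invoice_dir)

# Pick one invoice and transform it into the payload used for extraction
file_path = invoice_dir / invoice_files[-5]
print(file_path)
data_ingestion = DataIngestion()
data = data_ingestion.transform(file_path)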

c:\Users\ravi.kumar\github\OCR\data\invoices\skonica3-4.pdf
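
The internals of `DataIngestion.transform` are not shown in this notebook. As a minimal sketch, assuming it does nothing more than read the file and base64-encode it, the step could look like the following. The helper names `pdf_to_base64` and `base64_to_bytes` are hypothetical and are not part of the `components` package.

```python
import base64
from pathlib import Path


def pdf_to_base64(path: Path) -> str:
    """Read a file as bytes and return its base64 encoding as a UTF-8 string."""
    raw_bytes = Path(path).read_bytes()
    return base64.b64encode(raw_bytes).decode("utf-8")


def base64_to_bytes(encoded: str) -> bytes:
    """Inverse operation, useful for checking that the round trip is lossless."""
    return base64.b64decode(encoded)
```

Because every PDF starts with the bytes `%PDF`, a correctly encoded file always begins with the characters `JVBER`, which makes a quick sanity check on the payload.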


## Data Extraction
We pass the encoded PDF to `OCR_Model`, which uses the Gemini API to extract the invoice data.

In [2]:
from components.model import OCR_Model
ocr_model = OCR_Model()
invoice = ocr_model.extract(data)
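
How `OCR_Model.extract` turns the Gemini reply into a Python dictionary is not shown here. One step such a wrapper often needs is parsing the model's text reply as JSON, since language models sometimes wrap JSON in a Markdown code fence. The sketch below illustrates that step only; `parse_model_reply` is a hypothetical helper, and the sample reply is hand-written rather than real API output.

```python
import json
import re


def parse_model_reply(reply_text: str) -> dict:
    """Turn a model's text reply into a dict, tolerating a Markdown code fence."""
    cleaned = reply_text.strip()
    # Remove a leading fence (three backticks plus an optional "json" tag)
    # and a trailing fence (three backticks), if present.
    cleaned = re.sub(r"^`{3}(?:json)?\s*", "", cleaned)
    cleaned = re.sub(r"\s*`{3}$", "", cleaned)
    parsed = json.loads(cleaned)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed


# Hand-written stand-in for a fenced model reply (assumption, not API output).
fence = "`" * 3
reply = fence + 'json\n{"invNo": "500019353", "currency": "EUR"}\n' + fence
print(parse_model_reply(reply))
```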

## Display the output
We display the extracted data as JSON.

In [3]:
ocr_model.display(invoice)

{'line_items': [{'ItemPosition': 1,
   'ProductCode': '11001533',
   'Description': 'RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP',
   'Quantity': 5886.0,
   'UnitPrice': 12.5,
   'ItemVatRate': 0.0,
   'TotalAmount': 18393.75},
  {'ItemPosition': 2,
   'ProductCode': '11001533',
   'Description': 'RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP',
   'Quantity': 2340.0,
   'UnitPrice': 12.5,
   'ItemVatRate': 0.0,
   'TotalAmount': 7312.5},
  {'ItemPosition': 3,
   'ProductCode': '11001533',
   'Description': 'RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP',
   'Quantity': 3060.0,
   'UnitPrice': 12.5,
   'ItemVatRate': 0.0,
   'TotalAmount': 9562.5},
  {'ItemPosition': 4,
   'ProductCode': '11001533',
   'Description': 'RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP',
   'Quantity': 3564.0,
   'UnitPrice': 12.5,
   'ItemVatRate': 0.0,
   'TotalAmount': 11137.5},
  {'ItemPosition': 5,
   'ProductCode': '11001533',
   'Description': 'RASPBERRY FRA CAR 18X250 GEN VD DES K62H
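
The implementation of `ocr_model.display` is not shown. Outside that helper, a nested dictionary like this one can be pretty-printed with the standard `json` module. The sketch below uses a hand-copied sample containing only the first line item from the output above.

```python
import json

# Sample shaped like the output above (first line item only).
invoice_sample = {
    "line_items": [
        {
            "ItemPosition": 1,
            "ProductCode": "11001533",
            "Description": "RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP",
            "Quantity": 5886.0,
            "UnitPrice": 12.5,
            "ItemVatRate": 0.0,
            "TotalAmount": 18393.75,
        }
    ]
}

# indent=2 gives one key per line; ensure_ascii=False keeps non-ASCII text readable.
pretty = json.dumps(invoice_sample, indent=2, ensure_ascii=False)
print(pretty)
```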

## Display the output in a table
We display the extracted data as tables: one for the line items and one for the invoice header fields.

In [4]:
ocr_model.display(invoice, html=True)

| ItemPosition | ProductCode | Description | Quantity | UnitPrice | ItemVatRate | TotalAmount |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 11001533 | RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP | 5886.0 | 12.5 | 0.0 | 18393.75 |
| 2 | 11001533 | RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP | 2340.0 | 12.5 | 0.0 | 7312.5 |
| 3 | 11001533 | RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP | 3060.0 | 12.5 | 0.0 | 9562.5 |
| 4 | 11001533 | RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP | 3564.0 | 12.5 | 0.0 | 11137.5 |
| 5 | 11001533 | RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP | 7146.0 | 12.5 | 0.0 | 22331.25 |
| 6 | 11001533 | RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP | 1224.0 | 12.5 | 0.0 | 3825.0 |
| 7 | 11001533 | RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP | 6120.0 | 12.5 | 0.0 | 19125.0 |
| 8 | 11001533 | RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP | 6120.0 | 12.5 | 0.0 | 19125.0 |
| 9 | 11001533 | RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP | 3060.0 | 12.5 | 0.0 | 9562.5 |
| 10 | 11001533 | RASPBERRY FRA CAR 18X250 GEN VD DES K62H40 NG PAP | 1530.0 | 12.5 | 0.0 | 4781.25 |

| Field | Value |
| --- | --- |
| suppName | Surexport UK LTD |
| invNo | 500019353 |
| invDate | 2023-09-11 |
| orderNo | 141800 141801 141802 141803 141804 141805 141799 141966 141974 142010 141929 142097 142198 142200 |
| custName | DPSLTD |
| custAddress | 57-63 CHURCH ROAD SW19 5SB WIMBLEDON United Kingdom |
| amountNet | 223481.25 |
| amountVat | 0.0 |
| amountTotal | 223481.25 |
| currency | EUR |
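
What `display(invoice, html=True)` does internally is not shown. As a dependency-free sketch of the underlying idea, a list of line-item dictionaries can be rendered as an HTML table with the standard library alone. The function name `rows_to_html` is hypothetical, and the sample rows are copied from the first two line items above.

```python
from html import escape


def rows_to_html(rows: list[dict]) -> str:
    """Render a list of same-keyed dicts as a minimal HTML table."""
    if not rows:
        return "<table></table>"
    # Column order follows the keys of the first row.
    columns = list(rows[0].keys())
    header = "".join(f"<th>{escape(str(col))}</th>" for col in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{escape(str(row.get(col, '')))}</td>" for col in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"


line_items = [
    {"ItemPosition": 1, "Quantity": 5886.0, "UnitPrice": 12.5, "TotalAmount": 18393.75},
    {"ItemPosition": 2, "Quantity": 2340.0, "UnitPrice": 12.5, "TotalAmount": 7312.5},
]
table_html = rows_to_html(line_items)
print(table_html)
```

In a notebook, passing `table_html` to `IPython.display.HTML` renders the table in the cell output. Escaping every cell value with `html.escape` keeps characters such as `<` or `&` in a description from breaking the markup.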


## Conclusion
We extracted the invoice data from the PDF with the Gemini API and displayed it as JSON and in table format.