# Table of contents
1. [Introduction](#introduction)
2. [Data Ingestion](#data-ingestion)
3. [Data Extraction](#data-extraction)
4. [Display the output](#display-the-output)
5. [Display the output in a table](#display-the-output-in-a-table)
6. [Conclusion](#conclusion)

## Introduction
In this notebook, we will extract the data from a given PDF invoice using the Gemini API and display the output in a table format.

## Data Ingestion
We will load the PDF file into the notebook and then transform it into a Base64-encoded UTF-8 string.

In [1]:
from components.data_ingestion import DataIngestion
from configs import ROOT_DIR

file_path = ROOT_DIR / 'data' / 'invoices' / 'invoice.pdf'
data_ingestion = DataIngestion()
data = data_ingestion.transform(file_path)
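
The internals of `DataIngestion.transform` are not shown in this notebook. As a rough illustration of the transformation described above, a minimal sketch using only the standard library might look like this. The function name `pdf_to_base64` is hypothetical and is not part of the `components` package.

```python
import base64
from pathlib import Path


def pdf_to_base64(path: Path) -> str:
    """Read a file as raw bytes and return its Base64 encoding as a str.

    Hypothetical helper: an assumption about what the ingestion step does,
    not the actual DataIngestion implementation.
    """
    raw_bytes = Path(path).read_bytes()
    # b64encode returns ASCII-safe bytes; decoding them as UTF-8 yields a str
    return base64.b64encode(raw_bytes).decode("utf-8")
```

Encoding the bytes as text makes the document safe to embed in a JSON request body, at the cost of roughly one third more data.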

## Data Extraction
We will use the Gemini API to extract the data from the PDF.

In [2]:
from components.model import OCR_Model
from configs import MODEL

ocr_model = OCR_Model(model=MODEL)
invoice = ocr_model.extract(data)
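
How `OCR_Model.extract` talks to Gemini is not shown here. One plausible approach, sketched under that assumption, is to send the Base64 string as inline data in a request body shaped like the one accepted by the Gemini REST `generateContent` endpoint. The function name `build_gemini_request` and the prompt text are hypothetical, and the sketch only builds the payload without sending it.

```python
def build_gemini_request(pdf_base64: str, prompt: str) -> dict:
    """Build a generateContent-style request body for a Base64-encoded PDF.

    Hypothetical helper: the real OCR_Model may build its request differently.
    """
    return {
        "contents": [
            {
                "parts": [
                    # The document itself, passed inline as Base64 text
                    {
                        "inline_data": {
                            "mime_type": "application/pdf",
                            "data": pdf_base64,
                        }
                    },
                    # The instruction telling the model what to extract
                    {"text": prompt},
                ]
            }
        ]
    }


# Example prompt (an assumption, not the prompt used in this notebook)
request_body = build_gemini_request(
    "JVBERi0xLjQ=",
    "Extract the invoice line items and totals as JSON.",
)
```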

## Display the output
We will display the extracted data as a JSON-style nested dictionary.

In [3]:
ocr_model.display(invoice)

{'line_items': [{'product_code': '13.0802HK',
   'description': 'RHP Legro Blackberry/Raspberry\nOptima PLUS',
   'quantity': '111,80 ENm3',
   'price_per_unit': '57,75',
   'vat_percent': None,
   'total_price': '6.456,45'},
  {'product_code': '19.1001N',
   'description': 'Pack up in big-bale',
   'quantity': '26 Pieces',
   'price_per_unit': '18,00',
   'vat_percent': None,
   'total_price': '468,00'},
  {'product_code': '48.3000',
   'description': 'Pallet 100x120 HT Export',
   'quantity': '26 Pieces',
   'price_per_unit': '11,00',
   'vat_percent': None,
   'total_price': '286,00'},
  {'product_code': '19.2002',
   'description': 'Customs charges',
   'quantity': '1',
   'price_per_unit': '54,00',
   'vat_percent': None,
   'total_price': '54,00'},
  {'product_code': '19.2015',
   'description': 'Transportcost Pallets FTL',
   'quantity': '1',
   'price_per_unit': '1.973,60',
   'vat_percent': None,
   'total_price': '1.973,60'}],
 'total_amount': {'total_items': 5,
  'total_tax'
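
The output above is printed as a Python dictionary (single quotes, `None`) rather than strict JSON. If strict JSON is needed, for example to store the result, the standard `json` module can serialize the same structure. The sketch below uses a single line item copied from the output as sample data; the variable name `invoice_excerpt` is only illustrative.

```python
import json

# One line item copied from the extracted output, used as sample data
invoice_excerpt = {
    "line_items": [
        {
            "product_code": "19.2002",
            "description": "Customs charges",
            "quantity": "1",
            "price_per_unit": "54,00",
            "vat_percent": None,
            "total_price": "54,00",
        }
    ]
}

# indent=2 pretty-prints; ensure_ascii=False keeps non-ASCII text readable
json_text = json.dumps(invoice_excerpt, indent=2, ensure_ascii=False)
print(json_text)
```

In the serialized text, `None` becomes `null` and all strings use double quotes.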

## Display the output in a table
We will display the extracted data in the form of a table.

In [4]:
ocr_model.display(invoice, html=True)

| field | value |
|---|---|
| line_items | 5 items: 13.0802HK, 19.1001N, 48.3000, 19.2002, 19.2015 |
| total_amount | total_items: 5, total_tax: 0,00, total_price: 9.238,05 |
| due_date | 2023-12-09 |
| payment_date | |

| product_code | description | quantity | price_per_unit | vat_percent | total_price |
|---|---|---|---|---|---|
| 13.0802HK | RHP Legro Blackberry/Raspberry Optima PLUS | 111,80 ENm3 | 57,75 | | 6.456,45 |
| 19.1001N | Pack up in big-bale | 26 Pieces | 18,00 | | 468,00 |
| 48.3000 | Pallet 100x120 HT Export | 26 Pieces | 11,00 | | 286,00 |
| 19.2002 | Customs charges | 1 | 54,00 | | 54,00 |
| 19.2015 | Transportcost Pallets FTL | 1 | 1.973,60 | | 1.973,60 |

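| field | value |
|---|---|
| total_items | 5 |
| total_tax | 0,00 |
| total_price | 9.238,05 |

| field | value |
|---|---|
| iban | NL48 INGB 0001 6905 45 |
| swift | |
| bic | INGBNL2A |
| account_number | |

| rate | amount |
|---|---|
| 0 | 0 |

| field | value |
|---|---|
| vat_number | GB125476511 |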
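
The extracted amounts are strings in European number format, with a period as the thousands separator and a comma as the decimal separator (for example `9.238,05`). Before doing arithmetic on them, they need to be converted. The sketch below is one way to do that and to check that the line totals add up to the reported total. The helper `parse_european_decimal` is hypothetical and not part of the notebook's components.

```python
from decimal import Decimal


def parse_european_decimal(text: str) -> Decimal:
    """Convert a string such as '1.973,60' into Decimal('1973.60')."""
    # Drop thousands separators, then turn the decimal comma into a point
    normalized = text.replace(".", "").replace(",", ".")
    return Decimal(normalized)


# total_price values of the five extracted line items
line_totals = ["6.456,45", "468,00", "286,00", "54,00", "1.973,60"]

computed_total = sum(
    (parse_european_decimal(value) for value in line_totals), Decimal("0")
)
reported_total = parse_european_decimal("9.238,05")

print(computed_total == reported_total)  # True
```

`Decimal` is used instead of `float` so that monetary values are compared exactly, without binary rounding errors.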


## Conclusion
We have successfully extracted the data from the PDF and displayed the output in a table format.