# Table of contents
1. [Introduction](#introduction)
2. [Data Ingestion](#data-ingestion)
3. [Data Extraction](#data-extraction)
4. [Display the output](#display-the-output)
5. [Display the output in a table](#display-the-output-in-a-table)
6. [Conclusion](#conclusion)

## Introduction
In this notebook, we will extract the data from a given PDF and display the output in a table format. We will use the Gemini API to extract the data from the PDF.

## Data Ingestion
We will load the PDF file from the `data/invoices` directory and encode its contents as a base64 string (UTF-8 text), ready to be passed to the model.

In [1]:
from components.data_ingestion import DataIngestion
from configs import ROOT_DIR

file_path = ROOT_DIR / 'data' / 'invoices' / 'PO 166939 - 204865    Summary and Detail Report.pdf'
data_ingestion = DataIngestion()
data = data_ingestion.transform(file_path)
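
The internals of `DataIngestion.transform` are not shown in this notebook. As a minimal sketch of the encoding step described above, assuming it simply reads the file bytes and base64-encodes them, something like the following would work. The function name `pdf_to_base64` is hypothetical and not part of the `components` package.

```python
import base64
from pathlib import Path


def pdf_to_base64(path) -> str:
    """Read a file as raw bytes and return its base64 encoding as a str."""
    raw = Path(path).read_bytes()
    # b64encode returns ASCII bytes; decoding as UTF-8 yields a plain string
    return base64.b64encode(raw).decode("utf-8")
```

The resulting string contains only ASCII characters, so it can be embedded safely in a JSON request body.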

## Data Extraction
We will use the Gemini API to extract the data from the PDF.

In [2]:
from components.model import OCR_Model
ocr_model = OCR_Model(model='gemini-2.0-flash-exp')
invoice = ocr_model.extract(data)
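
How `OCR_Model.extract` talks to Gemini is not shown here. One plausible approach, sketched below under that assumption, is to send the base64 PDF as an inline part of a `generateContent` request together with a text prompt. The helper name `build_gemini_request` and the prompt text are hypothetical, and the field names reflect the public REST request format as we understand it. The sketch only builds the request body and does not call the API.

```python
import json


def build_gemini_request(pdf_b64: str, prompt: str) -> dict:
    """Assemble a request body with one inline PDF part and one text part."""
    return {
        "contents": [
            {
                "parts": [
                    # Assumed shape: base64 document data plus its MIME type
                    {"inline_data": {"mime_type": "application/pdf", "data": pdf_b64}},
                    {"text": prompt},
                ]
            }
        ]
    }


# Hypothetical prompt; the prompt actually used by OCR_Model is not shown.
body = build_gemini_request("JVBERi0xLjQ=", "Extract the report fields as JSON.")
print(json.dumps(body, indent=2))
```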

## Display the output
We will display the extracted data as a JSON-like nested dictionary.

In [3]:
ocr_model.display(invoice)

{'reportDetails': {'supplierCode': 'GUIM€',
  'supplier': 'Guimera Fruits',
  'countryOfOrigin': 'Spain',
  'category': 'Stone Fruit',
  'department': '7-ISS Linton',
  'customerDPSPO': '166939',
  'issPO': '204865',
  'vehicleNo': None,
  'vessel': None,
  'haulier': None,
  'temperature': {'min': 4.2, 'max': 5.1, 'avg': 4.58},
  'recorder': 'NO',
  'expectedETA': '03/09/2024 00:00:00',
  'received': '03/09/2024 09:59:25',
  'inspectionDate': '04/09/2024 06:35:59',
  'printDate': '04/09/2024'},
 'productDetails': [{'issPalletId': '7761770',
   'supplierPalletId': '5808239',
   'ggnNumber': None,
   'harvestDate': '30/08/2024',
   'displayUntil': '10/09/2024',
   'customer': 'Tesco',
   'packhouseOrganic': 'NO',
   'plu': None,
   'minor': 3.3,
   'major': 1.67,
   'waste': 0.0,
   'qaComments': 'Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Isolated underweight punnets.',
   'brix': 16.04,
   'pressure': 0.0,
   'maturity': '5.0%',
   'totalDefects': '5.0%',
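
The output above is a Python dictionary, so missing values appear as `None`. If strict JSON is needed, a minimal sketch using the standard `json` module is shown below. The helper `to_pretty_json` is hypothetical, and the sample uses only a small subset of the values shown above.

```python
import json

# A small subset of the extracted values shown above
invoice_subset = {
    "reportDetails": {
        "supplier": "Guimera Fruits",
        "vehicleNo": None,
        "temperature": {"min": 4.2, "max": 5.1, "avg": 4.58},
    }
}


def to_pretty_json(obj) -> str:
    """Serialize to indented JSON, keeping non-ASCII characters readable."""
    return json.dumps(obj, indent=2, ensure_ascii=False)


print(to_pretty_json(invoice_subset))
```

In the serialized text, Python's `None` is written as JSON `null`.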


## Display the output in a table
We will display the extracted data in the form of a table.

In [4]:
ocr_model.display(invoice, html=True)

issPalletId,supplierPalletId,ggnNumber,harvestDate,displayUntil,customer,packhouseOrganic,plu,minor,major,waste,qaComments,brix,pressure,maturity,totalDefects,estimatedYield,rag,expectedQty,receivedQty
7761770,5808239,,30/08/2024,10/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Isolated underweight punnets.,16.04,0.0,5.0%,5.0%,100%,BLUE,80.0,80.0
7763099,7763099,,,10/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Isolated underweight punnets.,16.14,0.0,5.0%,5.0%,100%,BLUE,80.0,80.0
7763100,7763100,,,10/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,14.76,0.0,5.0%,5.0%,100%,BLUE,80.0,80.0
7763101,7763101,,,10/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Isolated underweight punnets.,16.59,0.0,5.0%,5.0%,100%,BLUE,80.0,80.0
7763102,7763102,,,10/09/2024,Tesco,NO,,5.0,3.33,2.5,Waste. Bruising. Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,13.19,0.0,10.8%,10.83%,94%,AMBER,80.0,80.0
7763103,7763103,,,10/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Underweight punnets.,15.24,0.0,5.0%,5.0%,100%,BLUE,80.0,80.0
7763104,7763104,,,10/09/2024,Tesco,NO,,5.0,3.33,2.5,Waste. Bruising. Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,15.2,0.0,10.8%,10.83%,94%,AMBER,80.0,80.0
7761771,5808238,,30/08/2024,11/09/2024,Tesco,NO,,5.0,3.33,2.5,Waste. Bruising. Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,19.29,0.0,10.8%,10.83%,94%,AMBER,80.0,80.0
7763105,7763105,,,11/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,13.85,0.0,5.0%,5.0%,100%,BLUE,80.0,80.0
7763106,7763106,,,11/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,16.56,0.0,5.0%,5.0%,100%,BLUE,80.0,80.0

0,1
supplierCode,GUIM€
supplier,Guimera Fruits
countryOfOrigin,Spain
category,Stone Fruit
department,7-ISS Linton
customerDPSPO,166939
issPO,204865
vehicleNo,
vessel,
haulier,

0,1
min,4.2
max,5.1
avg,4.58

issPalletId,supplierPalletId,ggnNumber,harvestDate,displayUntil,customer,packhouseOrganic,plu,minor,major,waste,qaComments,brix,pressure,maturity,totalDefects,estimatedYield,rag,expectedQty,receivedQty
7761770,5808239,,30/08/2024,10/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Isolated underweight punnets.,16.04,0.0,5.0%,5.0%,100%,BLUE,80,80
7763099,7763099,,,10/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Isolated underweight punnets.,16.14,0.0,5.0%,5.0%,100%,BLUE,80,80
7763100,7763100,,,10/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,14.76,0.0,5.0%,5.0%,100%,BLUE,80,80
7763101,7763101,,,10/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Isolated underweight punnets.,16.59,0.0,5.0%,5.0%,100%,BLUE,80,80
7763102,7763102,,,10/09/2024,Tesco,NO,,5.0,3.33,2.5,Waste. Bruising. Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,13.19,0.0,10.8%,10.83%,94%,AMBER,80,80
7763103,7763103,,,10/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Underweight punnets.,15.24,0.0,5.0%,5.0%,100%,BLUE,80,80
7763104,7763104,,,10/09/2024,Tesco,NO,,5.0,3.33,2.5,Waste. Bruising. Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,15.2,0.0,10.8%,10.83%,94%,AMBER,80,80
7761771,5808238,,30/08/2024,11/09/2024,Tesco,NO,,5.0,3.33,2.5,Waste. Bruising. Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,19.29,0.0,10.8%,10.83%,94%,AMBER,80,80
7763105,7763105,,,11/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,13.85,0.0,5.0%,5.0%,100%,BLUE,80,80
7763106,7763106,,,11/09/2024,Tesco,NO,,3.3,1.67,0.0,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,16.56,0.0,5.0%,5.0%,100%,BLUE,80,80

0,1
freshnessTechnology,
punnetPadType,N/A/
outer,
brand,CORE
organic,NO
plu,
doesPalletMeetSpec,YES
endCustomer,Tesco
dp,7
packhouse,

0,1
avg,
min,
max,
underweight,0%

0,1
avg,44

0,1
avg,
min,
max,

0,1
avg,0.00
min,0
max,0
undersize,0.00%

0,1
st1,0%
st2,0%
st3,0%
st4,0%
st5,0%

0,1
avg,0.0
min,0.0
max,0.0
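
The implementation of `display(..., html=True)` is not shown. As a rough sketch of how such tables could be produced with only the standard library, the helpers below render a list of flat dictionaries as an HTML table and turn a key/value mapping into two-column rows like the `0,1` tables above. Both function names are hypothetical.

```python
from html import escape


def rows_to_html(rows: list[dict]) -> str:
    """Render a list of flat dicts as an HTML table; None becomes an empty cell."""
    if not rows:
        return "<table></table>"
    columns = list(rows[0].keys())  # column order follows the first row
    head = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = []
    for row in rows:
        cells = "".join(
            f"<td>{'' if row.get(c) is None else escape(str(row.get(c)))}</td>"
            for c in columns
        )
        body.append(f"<tr>{cells}</tr>")
    return f"<table><thead><tr>{head}</tr></thead><tbody>{''.join(body)}</tbody></table>"


def mapping_to_rows(mapping: dict) -> list[dict]:
    """Turn a flat key/value mapping into two-column rows labelled '0' and '1'."""
    return [{"0": key, "1": value} for key, value in mapping.items()]


products = [{"issPalletId": "7761770", "ggnNumber": None, "brix": 16.04}]
html_table = rows_to_html(products)
temperature_table = rows_to_html(mapping_to_rows({"min": 4.2, "max": 5.1, "avg": 4.58}))
```

In a notebook, the resulting string could be rendered with `IPython.display.HTML(html_table)`.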


## Conclusion
We have successfully extracted the data from the PDF and displayed the output in a table format.