# Table of contents
1. [Introduction](#introduction)
2. [Data Ingestion](#data-ingestion)
3. [Data Extraction](#data-extraction)
4. [Display the output](#display-the-output)
5. [Display the output in a table](#display-the-output-in-a-table)
6. [Conclusion](#conclusion)

## Introduction
In this notebook, we extract data from a given PDF with the Gemini API and display the output, first as JSON and then as tables.

## Data Ingestion
We will load the PDF file into the notebook and then encode its bytes as a base64 string (UTF-8 text).

In [2]:
from components.data_ingestion import DataIngestion
from configs import ROOT_DIR

file_path = ROOT_DIR / 'data' / 'invoices' / 'PO 166939 - 204865    Summary and Detail Report.pdf'
data_ingestion = DataIngestion()
data = data_ingestion.transform(file_path)
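
The internals of `DataIngestion.transform` are not shown in this notebook. As a minimal sketch of the base64 step described above, and assuming the transform does nothing more than read the file and encode it, the standard library is enough. The function names `encode_pdf_bytes` and `pdf_to_base64` are hypothetical and are not part of `components.data_ingestion`.

```python
import base64
from pathlib import Path


def encode_pdf_bytes(raw: bytes) -> str:
    """Return the base64 encoding of raw bytes as a text string."""
    # b64encode produces ASCII-only bytes, so decoding them as UTF-8 cannot fail.
    return base64.b64encode(raw).decode("utf-8")


def pdf_to_base64(path: Path) -> str:
    """Read a PDF from disk and return its content as base64 text."""
    return encode_pdf_bytes(path.read_bytes())


def base64_to_bytes(encoded: str) -> bytes:
    """Inverse of encode_pdf_bytes, useful for checking the round trip."""
    return base64.b64decode(encoded)
```

Encoding to text makes the binary PDF safe to embed in a JSON request body, at the cost of roughly a one-third increase in size.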

## Data Extraction
We will use the Gemini API to extract the data from the PDF.

In [3]:
from components.model import OCR_Model
ocr_model = OCR_Model(model='gemini-2.0-flash-thinking-exp-01-21')
invoice = ocr_model.extract(data)
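
How `OCR_Model.extract` talks to the API is not shown here. As a rough sketch only, the block below assembles a request body in the shape the Gemini REST `generateContent` endpoint accepts for inline documents: a list of `parts` holding the base64 PDF and a text prompt. The helper name `build_extraction_request` and the prompt wording are assumptions made for illustration, and no network call is made.

```python
import json


def build_extraction_request(pdf_base64: str, prompt: str) -> dict:
    """Assemble a generateContent-style body with one inline PDF and one prompt."""
    return {
        "contents": [
            {
                "parts": [
                    # The document travels inline as base64 text with its MIME type.
                    {"inline_data": {"mime_type": "application/pdf", "data": pdf_base64}},
                    # The instruction tells the model what to extract (hypothetical wording).
                    {"text": prompt},
                ]
            }
        ]
    }


body = build_extraction_request("JVBERi0xLjQ=", "Extract the report fields as JSON.")
payload = json.dumps(body)  # serialised form that would be sent as the POST body
```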

## Display the output
We will display the extracted data in the form of JSON.

In [4]:
ocr_model.display(invoice)

{'report_header': {'supplier_code': 'GUIM€',
  'supplier': 'Guimera Fruits',
  'coo': 'Spain',
  'category': 'Stone Fruit',
  'dp': '7-ISS Linton',
  'vehicle_no': None,
  'vessel': None,
  'haulier': None,
  'temperature': 'Min 4.20; Max 5.10; Avg 4.58',
  'recorder_s': 'NO',
  'expected_eta': '03/09/2024 00:00:00',
  'received': '03/09/2024 09:59:25',
  'inspection_date': '04/09/2024 06:35:59'},
 'general_details': {'customer_dps_po': '166939', 'iss_po': '204865'},
 'product_details_summary': [{'iss_pallet': '7761770',
   'cust_pallet_id': '5808239',
   'supplier_pallet': '5808239',
   'ggn_grower': None,
   'harvest_date_display_until_end': '30/08/2024\n10/09/2024\n10/09/2024',
   'customer': 'Tesco',
   'packhouse': None,
   'organic': 'NO',
   'plu': None,
   'minor': '3.3%',
   'major': '1.67%',
   'waste': '0.00%',
   'qa_comments': 'Dry splits. Scarring. Isolated\npuncture. Isolated\ncondensation punnets. Isolated\nunderweight punnets.',
   'brix_avg_percent': '16.04',
   'pres
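
The values in the output above are either strings or `None`: percentages carry a `%` sign, timestamps are day-first, and `harvest_date_display_until_end` packs several dates into one newline-separated string. A small sketch of how such fields could be normalised before analysis follows. The helper names are hypothetical and the notebook's own components may handle this differently.

```python
from datetime import date, datetime
from typing import List, Optional


def parse_percent(value: Optional[str]) -> Optional[float]:
    """Turn a string such as '1.67%' into the float 1.67; keep None as None."""
    if value is None:
        return None
    return float(value.strip().rstrip("%"))


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a day-first timestamp such as '03/09/2024 09:59:25'."""
    if value is None:
        return None
    return datetime.strptime(value, "%d/%m/%Y %H:%M:%S")


def split_dates(value: Optional[str]) -> List[date]:
    """Split a whitespace- or newline-separated run of day-first dates."""
    if value is None:
        return []
    return [datetime.strptime(part, "%d/%m/%Y").date() for part in value.split()]
```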

## Display the output in a table
We will display the extracted data in the form of a table.

In [5]:
ocr_model.display(invoice, html=True)

0,1
supplier_code,GUIM€
supplier,Guimera Fruits
coo,Spain
category,Stone Fruit
dp,7-ISS Linton
vehicle_no,
vessel,
haulier,
temperature,Min 4.20; Max 5.10; Avg 4.58
recorder_s,NO

0,1
customer_dps_po,166939
iss_po,204865

iss_pallet,cust_pallet_id,supplier_pallet,ggn_grower,harvest_date_display_until_end,customer,packhouse,organic,plu,minor,major,waste,qa_comments,brix_avg_percent,pressure_avg_kg,maturity_percent,total_defects,est_yield,rag
7761770,5808239.0,5808239,,30/08/2024 10/09/2024 10/09/2024,Tesco,,NO,,3.3%,1.67%,0.00%,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Isolated underweight punnets.,16.04,0.0,0,5.0%,100%,BLUE 80
7763099,,7763099,,10/09/2024 10/09/2024,Tesco,,NO,,3.3%,1.67%,0.00%,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Isolated underweight punnets.,16.14,0.0,0,5.0%,100%,BLUE 80
7763100,,7763100,,10/09/2024 10/09/2024,Tesco,,NO,,3.3%,1.67%,0.00%,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,14.76,0.0,0,5.0%,100%,BLUE 80
7763101,,7763101,,10/09/2024 10/09/2024,Tesco,,NO,,3.3%,1.67%,0.00%,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Isolated underweight punnets.,16.59,0.0,0,5.0%,100%,BLUE 80
7763102,,7763102,,10/09/2024 10/09/2024,Tesco,,NO,,5.0%,3.33%,2.50%,Waste. Bruising. Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,13.19,0.0,0,10.8%,94%,AMBER 80
7763103,,7763103,,10/09/2024 10/09/2024,Tesco,,NO,,3.3%,1.67%,0.00%,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Underweight punnets.,15.24,0.0,0,5.0%,100%,BLUE 80
7763104,,7763104,,10/09/2024 10/09/2024,Tesco,,NO,,5.0%,3.33%,2.50%,Waste. Bruising. Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,15.2,0.0,0,10.8%,94%,AMBER 80
7761771,5808238.0,5808238,,30/08/2024 11/09/2024 11/09/2024,Tesco,,NO,,5.0%,3.33%,2.50%,Waste. Bruising. Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,19.29,0.0,0,10.8%,94%,AMBER 80
7763105,,7763105,,11/09/2024 11/09/2024,Tesco,,NO,,3.3%,1.67%,0.00%,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,13.85,0.0,0,5.0%,100%,BLUE 80
7763106,,7763106,,11/09/2024 11/09/2024,Tesco,,NO,,3.3%,1.67%,0.00%,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets.,16.56,0.0,0,5.0%,100%,BLUE 80

iss_pallet_id,product_details,supplier_pallet_id,customer_pallet_id,variety,grower,ggn,orchard_farm,harvest_date,size_calibre,lot_number,freshness_technology,punnet_pad_type,outer,brand,organic,plu,does_pallet_meet_spec,end_customer,dp,packhouse,inspector,expected_qty,received_qty,total_defects_percent,estimated_yield_percent,minor_defects,major_defects,defects_tot_percent,defects_fruit_total,packs_with_defects_percent,waste_tot_percent,waste_fruit_total,packs_with_waste_percent,minor_defects_tot_percent,minor_fruit_total,major_defects_tot_percent,major_fruit_total,packs_with_major_percent,box_pack_weights,weight_readings,fruit_weights,sugar_brix,brix_readings,size,size_readings,maturity_percent,pressures,other_issues,qa_comments,packs_fruits_inspected_sample_size,boxes_inspected
7761770,Apricots 20x320g Punnet,5808239.0,5808239.0,Fardao,/,/,/,30/08/2024,40/45,508164.0,,N/A /,,CORE,NO,,YES,Tesco,7.0,,Hanna.Dziuba,80.0,80.0,5.00%,100%,dry_splits 1.67%; puncture 1.67%; scarring 1.67%,scarring 1.67%,0.00%,0.0,0.00%,0.00%,0.0,0%,3.33%,2.0,1.67%,1.0,0%,average 340g; min 326g; max 367g; underweight_percent 0%,326 326 328 330 330 331 332 333 334 334 342 343 343 347 349 350 352 353 353 367,average 44g; readings 44,average_percent 16.04%; min_percent 14.90%; max_percent 17.60%,14.9 15.0 15.4 15.6 15.7 16.1 16.2 16.8 17.1 17.6,average 0.00; min 0; max 0; undersize_percent 0.00%,,st1 0%; st2 0%; st3 0%; st4 0%; st5 0%,average 0.00; min 0.00; max 0.00,,Dry splits. Scarring. Isolated puncture. Isolated condensation punnets. Isolated underweight punnets.,60.0,0.0

0,1
dry_splits,1.67%
puncture,1.67%
scarring,1.67%

0,1
scarring,1.67%

0,1
average,340g
min,326g
max,367g
underweight_percent,0%

0,1
average,44g
readings,44

0,1
average_percent,16.04%
min_percent,14.90%
max_percent,17.60%

0,1
average,0.00
min,0
max,0
undersize_percent,0.00%

0,1
st1,0%
st2,0%
st3,0%
st4,0%
st5,0%

0,1
average,0.0
min,0.0
max,0.0

0,1
waste,2.50%
bruising,2.50%
dry_splits,0.83%
puncture,0.83%
scarring,0.83%

0,1
bruising,2.50%
scarring,0.83%

0,1
average,352g
min,335g
max,375g
underweight_percent,0%

0,1
average,44g
readings,44

0,1
average_percent,19.29%
min_percent,16.30%
max_percent,21.60%

0,1
average,0.00
min,0
max,0
undersize_percent,0.00%

0,1
st1,0%
st2,0%
st3,0%
st4,0%
st5,0%

0,1
average,0.0
min,0.0
max,0.0

0,1
dry_splits,1.67%
puncture,1.67%
scarring,1.67%

0,1
scarring,1.67%

0,1
average,345g
min,324g
max,361g
underweight_percent,0%

0,1
average,44g
readings,44

0,1
average_percent,16.14%
min_percent,13.30%
max_percent,18.20%

0,1
average,0.00
min,0
max,0
undersize_percent,0.00%

0,1
st1,0%
st2,0%
st3,0%
st4,0%
st5,0%

0,1
average,0.0
min,0.0
max,0.0

0,1
puncture,1.67%
scarring,1.67%
dry_splits,1.67%

0,1
dry_splits,1.67%

0,1
average,348g
min,330g
max,380g
underweight_percent,0%

0,1
average,44g
readings,44

0,1
average_percent,14.76%
min_percent,13.60%
max_percent,16.90%

0,1
average,0.00
min,0
max,0
undersize_percent,0.00%

0,1
st1,0%
st2,0%
st3,0%
st4,0%
st5,0%

0,1
average,0.0
min,0.0
max,0.0

0,1
dry_splits,1.67%
puncture,1.67%
scarring,1.67%

0,1
scarring,1.67%

0,1
average,341g
min,324g
max,359g
underweight_percent,0%

0,1
average,44g
readings,44

0,1
average_percent,16.59%
min_percent,14.50%
max_percent,18.80%

0,1
average,0.00
min,0
max,0
undersize_percent,0.00%

0,1
st1,0%
st2,0%
st3,0%
st4,0%
st5,0%

0,1
average,0.0
min,0.0
max,0.0

0,1
waste,2.50%
bruising,2.50%
puncture,0.83%
scarring,1.67%

0,1
bruising,1.67%
dry_splits,0.83%
scarring,0.83%

0,1
average,350g
min,327g
max,379g
underweight_percent,0%

0,1
average,44g
readings,44

0,1
average_percent,13.19%
min_percent,12.20%
max_percent,14.80%

0,1
average,0.00
min,0
max,0
undersize_percent,0.00%

0,1
st1,0%
st2,0%
st3,0%
st4,0%
st5,0%

0,1
average,0.0
min,0.0
max,0.0

0,1
dry_splits,1.67%
puncture,1.67%
scarring,1.67%

0,1
scarring,1.67%

0,1
average,325g
min,313g
max,340g
underweight_percent,0%

0,1
average,44g
readings,44

0,1
average_percent,15.24%
min_percent,13.10%
max_percent,17.30%

0,1
average,0.00
min,0
max,0
undersize_percent,0.00%

0,1
st1,0%
st2,0%
st3,0%
st4,0%
st5,0%

0,1
average,0.0
min,0.0
max,0.0

0,1
waste,2.50%
bruising,2.50%
dry_splits,0.83%
puncture,0.83%
scarring,0.83%

0,1
bruising,1.67%
dry_splits,0.83%
scarring,0.83%

0,1
average,343g
min,328g
max,363g
underweight_percent,0%

0,1
average,44g
readings,44

0,1
average_percent,15.20%
min_percent,11.60%
max_percent,19.60%

0,1
average,0.00
min,0
max,0
undersize_percent,0.00%

0,1
st1,0%
st2,0%
st3,0%
st4,0%
st5,0%

0,1
average,0.0
min,0.0
max,0.0

0,1
puncture,1.67%
scarring,1.67%
dry_splits,1.67%

0,1
dry_splits,1.67%

0,1
average,343g
min,330g
max,360g
underweight_percent,0%

0,1
average,44g
readings,44

0,1
average_percent,13.85%
min_percent,12.20%
max_percent,15.50%

0,1
average,0.00
min,0
max,0
undersize_percent,0.00%

0,1
st1,0%
st2,0%
st3,0%
st4,0%
st5,0%

0,1
average,0.0
min,0.0
max,0.0

0,1
puncture,1.67%
scarring,1.67%
dry_splits,1.67%

0,1
dry_splits,1.67%

0,1
average,343g
min,330g
max,357g
underweight_percent,0%

0,1
average,44g
readings,44

0,1
average_percent,16.56%
min_percent,13.50%
max_percent,19.20%

0,1
average,0.00
min,0
max,0
undersize_percent,0.00%

0,1
st1,0%
st2,0%
st3,0%
st4,0%
st5,0%

0,1
average,0.0
min,0.0
max,0.0
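
The small two-column tables above, headed `0,1`, have the shape one gets when a flat dictionary is written out as key/value rows with positional column names, while the pallet summary is a list of records rendered as one row each. The sketch below reproduces both shapes with the standard `csv` module. It assumes nothing about how `ocr_model.display` is implemented, and the function names are hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional


def dict_to_csv(section: Dict[str, Optional[str]]) -> str:
    """Render a flat dict as a two-column table with the positional header 0,1."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([0, 1])
    for key, value in section.items():
        # Missing values become empty cells, as in the tables above.
        writer.writerow([key, "" if value is None else value])
    return buffer.getvalue()


def records_to_csv(records: List[Dict[str, Optional[str]]]) -> str:
    """Render a list of dicts as one table with one row per record."""
    # Collect column names in first-seen order across all records.
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({k: ("" if v is None else v) for k, v in record.items()})
    return buffer.getvalue()
```

For example, `dict_to_csv({'customer_dps_po': '166939', 'iss_po': '204865'})` yields the same three lines as the general details table shown earlier.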


## Conclusion
We have successfully extracted the data from the PDF and displayed the output in a table format.