**Overall objective:**

This code combines image analysis and text extraction (OCR) to identify tables in a PDF file, extract those tables as images, and convert their content into structured text.



In [None]:
!pip install opencv-python-headless
!pip install pillow
!pip install pandas
!pip install pytesseract
!pip install scikit-learn
!apt-get install -y tesseract-ocr
!apt-get install -y tesseract-ocr-fra
!pip install ultralyticsplus==0.0.28 ultralytics==8.0.43

Collecting pytesseract
  Downloading pytesseract-0.3.13-py3-none-any.whl.metadata (11 kB)
Downloading pytesseract-0.3.13-py3-none-any.whl (14 kB)
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.13
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  tesseract-ocr-eng tesseract-ocr-osd
The following NEW packages will be installed:
  tesseract-ocr tesseract-ocr-eng tesseract-ocr-osd
0 upgraded, 3 newly installed, 0 to remove and 49 not upgraded.
Need to get 4,816 kB of archives.
After this operation, 15.6 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 tesseract-ocr-eng all 1:4.00~git30-7274cfa-1.1 [1,591 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 tesseract-ocr-osd all 1:4.00~git30-7274cfa-1.1 [2,990 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/universe amd64 tesseract-ocr amd64 

In [None]:
!pip install pdf2image
!apt-get install -y poppler-utils


Collecting pdf2image
  Downloading pdf2image-1.17.0-py3-none-any.whl.metadata (6.2 kB)
Downloading pdf2image-1.17.0-py3-none-any.whl (11 kB)
Installing collected packages: pdf2image
Successfully installed pdf2image-1.17.0
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  poppler-utils
0 upgraded, 1 newly installed, 0 to remove and 49 not upgraded.
Need to get 186 kB of archives.
After this operation, 696 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 poppler-utils amd64 22.02.0-2ubuntu0.5 [186 kB]
Fetched 186 kB in 0s (944 kB/s)
Selecting previously unselected package poppler-utils.
(Reading database ... 123684 files and directories currently installed.)
Preparing to unpack .../poppler-utils_22.02.0-2ubuntu0.5_amd64.deb ...
Unpacking poppler-utils (22.02.0-2ubuntu0.5) ...
Setting up poppler-utils (22.02.0-2ubuntu0.5) ...
Processing tr

In [None]:
import cv2
import os
import pytesseract
from pytesseract import Output
from ultralyticsplus import YOLO
from pdf2image import convert_from_path
import pandas as pd

**Function 1: Initializing the YOLO model**

In [None]:
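def initialize_model():
    """Load the pretrained table-detection YOLO model and set its inference options."""
    model = YOLO('foduucom/table-detection-and-extraction')
    model.overrides['conf'] = 0.25  # Confidence threshold
    model.overrides['iou'] = 0.45  # IoU threshold used by non-maximum suppression
    model.overrides['agnostic_nms'] = False  # Suppress overlaps per class, not across classes
    model.overrides['max_det'] = 1000  # Maximum number of detections per image
    return model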
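
To make the `conf` and `iou` settings concrete, here is a minimal pure-Python sketch of confidence filtering followed by greedy non-maximum suppression. It is only an illustration: the helper names `iou` and `filter_detections` and the sample boxes are hypothetical, it ignores classes (unlike the per-class behaviour selected by `agnostic_nms = False`), and it is not the code that Ultralytics runs internally.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf=0.25, iou_thr=0.45):
    """Drop low-confidence boxes, then greedily drop boxes that overlap a better one."""
    candidates = sorted((d for d in detections if d[1] >= conf),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # Keep the box only if it does not overlap any already-kept box too much
        if all(iou(box, kept_box) <= iou_thr for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections: (box, confidence score)
detections = [
    ((0, 0, 100, 100), 0.90),
    ((10, 10, 110, 110), 0.60),    # heavily overlaps the first box
    ((200, 200, 300, 300), 0.80),
    ((400, 400, 450, 450), 0.10),  # below the confidence threshold
]
kept = filter_detections(detections)
print([score for _, score in kept])  # [0.9, 0.8]
```

Raising `conf` removes more uncertain detections, while lowering `iou` merges near-duplicate boxes more aggressively.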

**Function 2: Extracting tables as images**

Converts the PDF pages to images, detects and crops the tables, and stores the cropped images in a folder.

In [None]:
def extract_tables_from_pdf(pdf_path, output_dir):
    """Render each PDF page to a JPEG, detect tables, and save one cropped image per table."""
    os.makedirs(output_dir, exist_ok=True)
    model = initialize_model()
    table_image_paths = []

    # Render every page of the PDF as a PIL image (requires poppler-utils)
    pages = convert_from_path(pdf_path)
    print(f"{len(pages)} pages trouvées dans le PDF.")

    for page_number, page in enumerate(pages, start=1):
        # Save the page to disk so that OpenCV and YOLO read the same file
        page_image_path = os.path.join(output_dir, f'page_{page_number}.jpg')
        page.save(page_image_path, 'JPEG')

        image = cv2.imread(page_image_path)
        results = model.predict(page_image_path)

        # Each detected box is one table: crop it from the page and save it
        for idx, box in enumerate(results[0].boxes, start=1):
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            cropped_image = image[y1:y2, x1:x2]

            table_image_path = os.path.join(output_dir, f'table_page{page_number}_{idx}.jpg')
            cv2.imwrite(table_image_path, cropped_image)
            table_image_paths.append(table_image_path)
            print(f"Tableau enregistré : {table_image_path}")

    return table_image_paths
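
The crop above uses the detected box exactly as returned. A tight box can clip text that sits on the table border, so one possible refinement is to add a small margin and clamp the result to the page bounds. The sketch below assumes a hypothetical helper `pad_and_clamp_box` and an arbitrary 10-pixel margin; neither is part of the original notebook.

```python
def pad_and_clamp_box(x1, y1, x2, y2, width, height, pad=10):
    """Grow a box by `pad` pixels on each side without leaving the image."""
    x1 = max(0, int(x1) - pad)
    y1 = max(0, int(y1) - pad)
    x2 = min(width, int(x2) + pad)
    y2 = min(height, int(y2) + pad)
    return x1, y1, x2, y2


# A box close to the left and bottom edges of a 100 x 200 pixel page
print(pad_and_clamp_box(5, 20, 95, 190, width=100, height=200))  # (0, 10, 100, 200)
```

With an OpenCV image, `width` and `height` would come from `image.shape[1]` and `image.shape[0]`, and the returned values can be used directly in the slice `image[y1:y2, x1:x2]`.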

**Function 3: Extracting text from a table image**

Analyzes the images to extract the text and arrange it so that the original layout is preserved.

In [None]:
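def extract_text_from_table_image(image_path):
    """OCR a table image with Tesseract and rebuild its text, using spaces to approximate the layout."""
    img = cv2.imread(image_path)
    # Enlarge the image (+10% width, +25% height) before OCR.
    # INTER_CUBIC is used because INTER_AREA is intended for shrinking, not enlarging.
    img_resized = cv2.resize(img, (int(img.shape[1] + (img.shape[1] * 0.1)),
                                   int(img.shape[0] + (img.shape[0] * 0.25))),
                             interpolation=cv2.INTER_CUBIC)
    img_rgb = cv2.cvtColor(img_resized, cv2.COLOR_BGR2RGB)

    # English model, default engine (--oem 3), single uniform block of text (--psm 6).
    # Note: the whitelist contains no accented characters.
    custom_config = r'-l eng --oem 3 --psm 6 -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-:.$%./@& *"'
    d = pytesseract.image_to_data(img_rgb, config=custom_config, output_type=Output.DICT)
    df = pd.DataFrame(d)

    # Keep recognized words only: structural rows have conf == -1 and empty text.
    # conf is numeric, so it is compared as a number rather than to the string '-1'.
    df1 = df[(df['conf'].astype(float) != -1) & (df['text'].str.strip() != '')]

    # Process the text blocks from the top of the image to the bottom
    sorted_blocks = df1.groupby('block_num').first().sort_values('top').index.tolist()
    extracted_text = ""

    for block in sorted_blocks:
        curr = df1[df1['block_num'] == block]
        # Average width of one character in this block, in pixels
        char_w = (curr.width / curr.text.str.len()).mean()
        prev_par, prev_line, prev_left = 0, 0, 0
        text = ''

        for _, ln in curr.iterrows():
            # Start a new output line whenever the paragraph or line number changes
            if prev_par != ln['par_num']:
                text += '\n'
                prev_par = ln['par_num']
                prev_line = ln['line_num']
                prev_left = 0
            elif prev_line != ln['line_num']:
                text += '\n'
                prev_line = ln['line_num']
                prev_left = 0

            # Pad with spaces so the word starts near its original horizontal position
            added = 0
            if ln['left'] / char_w > prev_left + 1:
                added = int((ln['left']) / char_w) - prev_left
                text += ' ' * added
            text += ln['text'] + ' '
            prev_left += len(ln['text']) + added + 1
        extracted_text += text + '\n'

    print(f"Contenu extrait de l'image {image_path} :\n{extracted_text}")
    print('-----------------------------------------------------------------------------------------')
    return extracted_text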
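
The layout-preserving step can be hard to follow inside the full function, so here is the same spacing idea isolated on one line of made-up words. The sketch assumes each word is a `(left, width, text)` tuple in pixels, mirroring the `left`, `width` and `text` columns returned by Tesseract; the helper name `layout_line` and the sample values are hypothetical.

```python
from statistics import mean


def layout_line(words):
    """Rebuild one text line, padding with spaces so words land near their original columns."""
    # Average pixel width of one character, estimated from the words themselves
    char_w = mean(width / len(text) for _, width, text in words)
    line = ''
    cursor = 0  # number of characters already written on this line
    for left, _, text in words:
        # Pad only when the word starts more than one character past the cursor
        if left / char_w > cursor + 1:
            pad = int(left / char_w) - cursor
            line += ' ' * pad
            cursor += pad
        line += text + ' '
        cursor += len(text) + 1
    return line


# Three words of one table row: (left, width, text), all in pixels
words = [(0, 40, 'Item'), (200, 50, 'Total'), (400, 20, '42')]
line = layout_line(words)
print(repr(line))
print(line.index('Total'), line.index('42'))  # 20 40
```

Each word's pixel offset is divided by the average character width to get a target column, which is why cells from different rows line up when the character width is estimated consistently.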

**Main script**

In [None]:
def main(pdf_path, output_dir):
    print("Début du traitement...")
    table_images = extract_tables_from_pdf(pdf_path, output_dir)
    print(f"Nombre de tables extraites : {len(table_images)}")

    for image_path in table_images:
        extract_text_from_table_image(image_path)
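
`main` prints the extracted text but discards the value returned by `extract_text_from_table_image`. If the text needs to be kept, one option is to write it next to each table image. The sketch below uses a hypothetical helper `save_table_text` and a temporary folder for the demonstration; it is not part of the original pipeline.

```python
import tempfile
from pathlib import Path


def save_table_text(image_path, text):
    """Write OCR text next to the table image, replacing the image extension with .txt."""
    txt_path = Path(image_path).with_suffix('.txt')
    txt_path.write_text(text, encoding='utf-8')
    return txt_path


# Demonstration in a temporary folder with placeholder text
with tempfile.TemporaryDirectory() as tmp:
    demo_image = Path(tmp) / 'table_page2_1.jpg'
    saved = save_table_text(demo_image, 'placeholder table text\n')
    print(saved.name)  # table_page2_1.txt
```

Inside the loop of `main`, the call would look like `save_table_text(image_path, extract_text_from_table_image(image_path))`.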

**Usage example**

In [None]:
pdf_path = "/content/sfcr_covea_2022.PDF"  # Replace with the path to your PDF
output_dir = "tables_extracted"
os.makedirs(output_dir, exist_ok=True)

print("Début du traitement...")
table_images = extract_tables_from_pdf(pdf_path, output_dir)  # Extract the tables in the PDF as images
print(f"Nombre de tables extraites : {len(table_images)}")

Début du traitement...


  return torch.load(file, map_location='cpu'), file  # load
Ultralytics YOLOv8.0.43 🚀 Python-3.10.12 torch-2.5.1+cu121 CPU


98 pages trouvées dans le PDF.


Model summary (fused): 168 layers, 11126358 parameters, 0 gradients, 28.4 GFLOPs

image 1/1 /content/tables_extracted/page_1.jpg: 640x480 1 bordered, 879.6ms
Speed: 2.7ms preprocess, 879.6ms inference, 1.8ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page1_1.jpg


image 1/1 /content/tables_extracted/page_2.jpg: 640x480 1 borderless, 470.8ms
Speed: 1.0ms preprocess, 470.8ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page2_1.jpg


image 1/1 /content/tables_extracted/page_3.jpg: 640x480 (no detections), 455.0ms
Speed: 1.0ms preprocess, 455.0ms inference, 0.6ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_4.jpg: 640x480 (no detections), 459.5ms
Speed: 1.1ms preprocess, 459.5ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_5.jpg: 640x480 1 borderless, 469.6ms
Speed: 1.0ms preprocess, 469.6ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page5_1.jpg


image 1/1 /content/tables_extracted/page_6.jpg: 640x480 1 borderless, 465.4ms
Speed: 1.1ms preprocess, 465.4ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page6_1.jpg


image 1/1 /content/tables_extracted/page_7.jpg: 640x480 (no detections), 476.4ms
Speed: 1.0ms preprocess, 476.4ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_8.jpg: 640x480 3 bordereds, 482.0ms
Speed: 0.9ms preprocess, 482.0ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page8_1.jpg
Tableau enregistré : tables_extracted/table_page8_2.jpg
Tableau enregistré : tables_extracted/table_page8_3.jpg


image 1/1 /content/tables_extracted/page_9.jpg: 640x480 1 borderless, 522.3ms
Speed: 1.1ms preprocess, 522.3ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page9_1.jpg


image 1/1 /content/tables_extracted/page_10.jpg: 640x480 (no detections), 557.7ms
Speed: 1.0ms preprocess, 557.7ms inference, 0.7ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_11.jpg: 640x480 (no detections), 1311.9ms
Speed: 1.0ms preprocess, 1311.9ms inference, 0.7ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_12.jpg: 640x480 (no detections), 1316.3ms
Speed: 1.0ms preprocess, 1316.3ms inference, 0.7ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_13.jpg: 640x480 (no detections), 1433.8ms
Speed: 1.0ms preprocess, 1433.8ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_14.jpg: 640x480 (no detections), 468.1ms
Speed: 1.0ms preprocess, 468.1ms inference, 4.8ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_15.jpg: 640x480 2 borderlesss, 450.9ms
Speed: 0.

Tableau enregistré : tables_extracted/table_page15_1.jpg
Tableau enregistré : tables_extracted/table_page15_2.jpg


image 1/1 /content/tables_extracted/page_16.jpg: 640x480 2 borderlesss, 495.7ms
Speed: 1.1ms preprocess, 495.7ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page16_1.jpg
Tableau enregistré : tables_extracted/table_page16_2.jpg


image 1/1 /content/tables_extracted/page_17.jpg: 640x480 (no detections), 512.4ms
Speed: 0.9ms preprocess, 512.4ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_18.jpg: 640x480 1 borderless, 632.1ms
Speed: 1.0ms preprocess, 632.1ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page18_1.jpg


image 1/1 /content/tables_extracted/page_19.jpg: 640x480 1 borderless, 723.1ms
Speed: 1.0ms preprocess, 723.1ms inference, 6.0ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page19_1.jpg


image 1/1 /content/tables_extracted/page_20.jpg: 640x480 1 borderless, 470.1ms
Speed: 0.9ms preprocess, 470.1ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page20_1.jpg


image 1/1 /content/tables_extracted/page_21.jpg: 640x480 1 borderless, 456.1ms
Speed: 1.0ms preprocess, 456.1ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page21_1.jpg


image 1/1 /content/tables_extracted/page_22.jpg: 640x480 (no detections), 470.2ms
Speed: 1.1ms preprocess, 470.2ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_23.jpg: 640x480 2 borderlesss, 465.4ms
Speed: 1.1ms preprocess, 465.4ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page23_1.jpg
Tableau enregistré : tables_extracted/table_page23_2.jpg


image 1/1 /content/tables_extracted/page_24.jpg: 640x480 1 borderless, 458.9ms
Speed: 1.1ms preprocess, 458.9ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page24_1.jpg


image 1/1 /content/tables_extracted/page_25.jpg: 640x480 1 borderless, 471.3ms
Speed: 1.0ms preprocess, 471.3ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page25_1.jpg


image 1/1 /content/tables_extracted/page_26.jpg: 640x480 (no detections), 452.2ms
Speed: 1.0ms preprocess, 452.2ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_27.jpg: 640x480 (no detections), 463.4ms
Speed: 1.0ms preprocess, 463.4ms inference, 0.6ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_28.jpg: 640x480 1 borderless, 538.8ms
Speed: 1.0ms preprocess, 538.8ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page28_1.jpg


image 1/1 /content/tables_extracted/page_29.jpg: 640x480 (no detections), 734.1ms
Speed: 1.2ms preprocess, 734.1ms inference, 0.7ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_30.jpg: 640x480 (no detections), 733.2ms
Speed: 1.1ms preprocess, 733.2ms inference, 0.8ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_31.jpg: 640x480 (no detections), 741.4ms
Speed: 1.2ms preprocess, 741.4ms inference, 0.7ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_32.jpg: 640x480 (no detections), 771.3ms
Speed: 1.2ms preprocess, 771.3ms inference, 0.7ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_33.jpg: 640x480 (no detections), 531.2ms
Speed: 1.3ms preprocess, 531.2ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_34.jpg: 640x480 2 borderlesss, 472.1ms
Speed: 1.1ms pr

Tableau enregistré : tables_extracted/table_page34_1.jpg
Tableau enregistré : tables_extracted/table_page34_2.jpg


image 1/1 /content/tables_extracted/page_35.jpg: 640x480 (no detections), 451.0ms
Speed: 1.2ms preprocess, 451.0ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_36.jpg: 640x480 (no detections), 460.4ms
Speed: 1.0ms preprocess, 460.4ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_37.jpg: 640x480 (no detections), 450.2ms
Speed: 1.0ms preprocess, 450.2ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_38.jpg: 640x480 (no detections), 468.3ms
Speed: 1.0ms preprocess, 468.3ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_39.jpg: 640x480 (no detections), 456.3ms
Speed: 1.0ms preprocess, 456.3ms inference, 0.6ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_40.jpg: 640x480 (no detections), 442.7ms
Speed: 1.0ms 

Tableau enregistré : tables_extracted/table_page49_1.jpg
Tableau enregistré : tables_extracted/table_page49_2.jpg


image 1/1 /content/tables_extracted/page_50.jpg: 640x480 (no detections), 663.5ms
Speed: 1.0ms preprocess, 663.5ms inference, 0.6ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_51.jpg: 640x480 (no detections), 715.8ms
Speed: 1.2ms preprocess, 715.8ms inference, 0.6ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_52.jpg: 640x480 (no detections), 719.5ms
Speed: 1.1ms preprocess, 719.5ms inference, 0.6ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_53.jpg: 640x480 (no detections), 772.8ms
Speed: 1.1ms preprocess, 772.8ms inference, 3.1ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_54.jpg: 640x480 (no detections), 740.2ms
Speed: 1.1ms preprocess, 740.2ms inference, 0.8ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_55.jpg: 640x480 1 borderless, 458.5ms
Speed: 1.1ms pre

Tableau enregistré : tables_extracted/table_page55_1.jpg


image 1/1 /content/tables_extracted/page_56.jpg: 640x480 (no detections), 469.8ms
Speed: 1.1ms preprocess, 469.8ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_57.jpg: 640x480 (no detections), 448.8ms
Speed: 1.1ms preprocess, 448.8ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_58.jpg: 640x480 (no detections), 493.2ms
Speed: 1.1ms preprocess, 493.2ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_59.jpg: 640x480 (no detections), 444.4ms
Speed: 1.0ms preprocess, 444.4ms inference, 0.6ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_60.jpg: 640x480 3 borderlesss, 482.4ms
Speed: 1.1ms preprocess, 482.4ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page60_1.jpg
Tableau enregistré : tables_extracted/table_page60_2.jpg
Tableau enregistré : tables_extracted/table_page60_3.jpg


image 1/1 /content/tables_extracted/page_61.jpg: 640x480 1 borderless, 467.5ms
Speed: 1.0ms preprocess, 467.5ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page61_1.jpg


image 1/1 /content/tables_extracted/page_62.jpg: 640x480 (no detections), 446.0ms
Speed: 1.0ms preprocess, 446.0ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_63.jpg: 640x480 2 borderlesss, 469.5ms
Speed: 1.0ms preprocess, 469.5ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page63_1.jpg
Tableau enregistré : tables_extracted/table_page63_2.jpg


image 1/1 /content/tables_extracted/page_64.jpg: 640x480 2 borderlesss, 456.1ms
Speed: 1.1ms preprocess, 456.1ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page64_1.jpg
Tableau enregistré : tables_extracted/table_page64_2.jpg


image 1/1 /content/tables_extracted/page_65.jpg: 640x480 (no detections), 475.3ms
Speed: 1.0ms preprocess, 475.3ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_66.jpg: 640x480 1 borderless, 443.1ms
Speed: 1.1ms preprocess, 443.1ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page66_1.jpg


image 1/1 /content/tables_extracted/page_67.jpg: 640x480 (no detections), 454.5ms
Speed: 1.1ms preprocess, 454.5ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_68.jpg: 640x480 (no detections), 469.0ms
Speed: 1.0ms preprocess, 469.0ms inference, 0.6ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_69.jpg: 640x480 (no detections), 459.1ms
Speed: 1.1ms preprocess, 459.1ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_70.jpg: 640x480 (no detections), 473.1ms
Speed: 1.1ms preprocess, 473.1ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_71.jpg: 640x480 3 borderlesss, 547.1ms
Speed: 1.1ms preprocess, 547.1ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page71_1.jpg
Tableau enregistré : tables_extracted/table_page71_2.jpg
Tableau enregistré : tables_extracted/table_page71_3.jpg


image 1/1 /content/tables_extracted/page_72.jpg: 640x480 1 borderless, 718.1ms
Speed: 1.1ms preprocess, 718.1ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page72_1.jpg


image 1/1 /content/tables_extracted/page_73.jpg: 640x480 (no detections), 710.3ms
Speed: 1.1ms preprocess, 710.3ms inference, 0.7ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_74.jpg: 640x480 1 bordered, 771.3ms
Speed: 1.1ms preprocess, 771.3ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page74_1.jpg


image 1/1 /content/tables_extracted/page_75.jpg: 640x480 1 borderless, 770.7ms
Speed: 1.1ms preprocess, 770.7ms inference, 1.7ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page75_1.jpg


image 1/1 /content/tables_extracted/page_76.jpg: 640x480 1 borderless, 469.9ms
Speed: 1.0ms preprocess, 469.9ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page76_1.jpg


image 1/1 /content/tables_extracted/page_77.jpg: 640x480 (no detections), 457.9ms
Speed: 1.1ms preprocess, 457.9ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_78.jpg: 640x480 1 bordered, 463.1ms
Speed: 1.1ms preprocess, 463.1ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page78_1.jpg


image 1/1 /content/tables_extracted/page_79.jpg: 640x480 (no detections), 466.5ms
Speed: 1.1ms preprocess, 466.5ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_80.jpg: 640x480 1 borderless, 464.3ms
Speed: 1.0ms preprocess, 464.3ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page80_1.jpg


image 1/1 /content/tables_extracted/page_81.jpg: 640x480 (no detections), 461.0ms
Speed: 1.1ms preprocess, 461.0ms inference, 0.8ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_82.jpg: 640x480 (no detections), 456.1ms
Speed: 1.1ms preprocess, 456.1ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_83.jpg: 640x480 1 borderless, 468.7ms
Speed: 1.1ms preprocess, 468.7ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page83_1.jpg


image 1/1 /content/tables_extracted/page_84.jpg: 640x480 1 borderless, 468.4ms
Speed: 1.1ms preprocess, 468.4ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page84_1.jpg


image 1/1 /content/tables_extracted/page_85.jpg: 480x640 1 borderless, 465.6ms
Speed: 1.0ms preprocess, 465.6ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page85_1.jpg


image 1/1 /content/tables_extracted/page_86.jpg: 480x640 1 borderless, 452.9ms
Speed: 1.2ms preprocess, 452.9ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page86_1.jpg


image 1/1 /content/tables_extracted/page_87.jpg: 480x640 1 borderless, 467.1ms
Speed: 1.1ms preprocess, 467.1ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page87_1.jpg


image 1/1 /content/tables_extracted/page_88.jpg: 480x640 1 borderless, 461.0ms
Speed: 1.0ms preprocess, 461.0ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page88_1.jpg


image 1/1 /content/tables_extracted/page_89.jpg: 480x640 1 borderless, 470.0ms
Speed: 1.1ms preprocess, 470.0ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page89_1.jpg


image 1/1 /content/tables_extracted/page_90.jpg: 480x640 1 borderless, 473.8ms
Speed: 1.0ms preprocess, 473.8ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page90_1.jpg


image 1/1 /content/tables_extracted/page_91.jpg: 480x640 1 bordered, 1 borderless, 476.1ms
Speed: 1.0ms preprocess, 476.1ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page91_1.jpg
Tableau enregistré : tables_extracted/table_page91_2.jpg


image 1/1 /content/tables_extracted/page_92.jpg: 480x640 (no detections), 678.9ms
Speed: 1.1ms preprocess, 678.9ms inference, 0.8ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_93.jpg: 480x640 1 borderless, 712.9ms
Speed: 2.1ms preprocess, 712.9ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page93_1.jpg


image 1/1 /content/tables_extracted/page_94.jpg: 480x640 (no detections), 719.9ms
Speed: 1.1ms preprocess, 719.9ms inference, 0.8ms postprocess per image at shape (1, 3, 640, 640)

image 1/1 /content/tables_extracted/page_95.jpg: 480x640 1 borderless, 783.9ms
Speed: 1.1ms preprocess, 783.9ms inference, 1.6ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page95_1.jpg


image 1/1 /content/tables_extracted/page_96.jpg: 480x640 1 borderless, 750.3ms
Speed: 1.1ms preprocess, 750.3ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page96_1.jpg


image 1/1 /content/tables_extracted/page_97.jpg: 480x640 1 borderless, 454.8ms
Speed: 1.0ms preprocess, 454.8ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)



Tableau enregistré : tables_extracted/table_page97_1.jpg


image 1/1 /content/tables_extracted/page_98.jpg: 640x480 1 borderless, 450.7ms
Speed: 1.1ms preprocess, 450.7ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)


Tableau enregistré : tables_extracted/table_page98_1.jpg
Nombre de tables extraites : 59
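
The saved file names follow the pattern `table_page{page}_{index}.jpg`, so the list returned by `extract_tables_from_pdf` can be summarized per page without rereading the log. The following is a small sketch; the helper name `tables_per_page` is hypothetical, and the sample paths are a few entries copied from the output above.

```python
import re
from collections import Counter


def tables_per_page(table_image_paths):
    """Count the saved table images per page, based on the file naming pattern."""
    pattern = re.compile(r'table_page(\d+)_(\d+)\.jpg$')
    counts = Counter()
    for path in table_image_paths:
        match = pattern.search(path)
        if match:
            counts[int(match.group(1))] += 1
    return dict(sorted(counts.items()))


sample = [
    'tables_extracted/table_page1_1.jpg',
    'tables_extracted/table_page8_1.jpg',
    'tables_extracted/table_page8_2.jpg',
    'tables_extracted/table_page8_3.jpg',
]
print(tables_per_page(sample))  # {1: 1, 8: 3}
```

Calling `tables_per_page(table_images)` on the full list gives the number of tables detected on each page.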


In [None]:
# Convert the table images to text
for image_path in table_images:
    extract_text_from_table_image(image_path)

Contenu extrait de l'image tables_extracted/table_page1_1.jpg :

-----------------------------------------------------------------------------------------
Contenu extrait de l'image tables_extracted/table_page2_1.jpg :

AL. ACtIVILG eee  cccccccscsescseseseccscscsesssesesesssesavassssesesesssavesasssasacsesesasesasasasaeseseseuasacssassssesesasscacssssataeersasee G 
A2. RESultat de SOUSCTIPTION ..........csecesecseesesestsessesesesesesnseeseesseseststsnsesseseetsesteseasesasetstetstesetesetetseeeees LO 
A3. Resultat des INVEStISSEMENES  0.0... cccccsesesseeteessscscseseseseseseescscscssscsesesseetscscsssssesestsasssssssssasseseses QU 
A4. Resultat d@S AULFES ACTIVITES oo... ccc ccccseseescseecseseeseseescsesscsesecsesecsssessssecsestsessesesssesatsesstsecsseeases 2 

-----------------------------------------------------------------------------------------
Contenu extrait de l'image tables_extracted/table_page5_1.jpg :

            oe                            :                    er  