@@ -360,8 +360,8 @@
},
{
"type": "Title",
"element_id": "bcb94891b0d7a997ab7e28d99195ff37",
"text": "Introduction",
"element_id": "3a170066f972d25cc303a05ddc16d52c",
"text": "1 Introduction",
"metadata": {
"filetype": "application/pdf",
"languages": [
@@ -1600,9 +1600,9 @@
}
},
{
"type": "NarrativeText",
"element_id": "2f41c1732a2870b1fecd72dec1b2ff3d",
"text": "1 import layoutparser as lp 2 image = cv2 . imread ( \" image_file \" ) # load images 3 model = lp . De t e c tro n2 Lay outM odel ( 4 \" lp :// PubLayNet / f as t er _ r c nn _ R _ 50 _ F P N_ 3 x / config \" ) 5 layout = model . detect ( image )",
"type": "ListItem",
"element_id": "508a6705bb0bfb693616cc14fec5e1b9",
"text": "1 import layoutparser as lp",
"metadata": {
"filetype": "application/pdf",
"languages": [
@@ -1622,9 +1622,9 @@
}
},
{
"type": "ListItem",
"element_id": "53b448c75f1556b1f60b4e3324bd0724",
"text": "1 import layoutparser as lp",
"type": "NarrativeText",
"element_id": "c2af717e76ad68bd6da87a15a69f126a",
"text": "2 image = cv2 . imread ( \" image_file \" ) # load images 3 model = lp . De t e c tro n2 Lay outM odel ( 4 \" lp :// PubLayNet / f as t er _ r c nn _ R _ 50 _ F P N_ 3 x / config \" )",
"metadata": {
"filetype": "application/pdf",
"languages": [
@@ -2376,9 +2376,53 @@
}
}
},
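The spaced-out strings in the two elements above are the OCR'd rendering of the LayoutParser quick-start snippet from the source paper. Cleaned up, the snippet reads roughly as follows (the `import cv2` line is implied by the `cv2.imread` call in the fixture text rather than present in it):

```python
import cv2                    # implied by the cv2.imread call in the fixture text
import layoutparser as lp

image = cv2.imread("image_file")                      # load the page image
model = lp.Detectron2LayoutModel(
    "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config")  # pre-trained PubLayNet model
layout = model.detect(image)                          # run layout detection
```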
{
"type": "FigureCaption",
"element_id": "9e1c338a3371cb3df5bddc5672e6a53b",
"text": "Mode I: Showing Layout on the Original Image",
"metadata": {
"filetype": "application/pdf",
"languages": [
"eng"
],
"page_number": 9,
"data_source": {
"record_locator": {
"path": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/example-docs/layout-parser-paper.pdf"
},
"permissions_data": [
{
"mode": 33188
}
]
}
}
},
{
"type": "FigureCaption",
"element_id": "2b8903940339ed9a553e4b84107ebd40",
"text": "Mode Il: Drawing OCR'd Text at the Correspoding Position",
"metadata": {
"filetype": "application/pdf",
"languages": [
"eng"
],
"page_number": 9,
"data_source": {
"record_locator": {
"path": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/example-docs/layout-parser-paper.pdf"
},
"permissions_data": [
{
"mode": 33188
}
]
}
}
},
{
"type": "NarrativeText",
"element_id": "fadd4ad54cd14e3e4711d41a1c99f813",
"element_id": "e523a8d1116c7285345aef3a499c7819",
"text": "Fig. 3: Layout detection and OCR results visualization generated by the LayoutParser APIs. Mode I directly overlays the layout region bounding boxes and categories over the original image. Mode II recreates the original document via drawing the OCR\u2019d texts at their corresponding positions on the image canvas. In this \ufb01gure, tokens in textual regions are \ufb01ltered using the API and then displayed.",
"metadata": {
"filetype": "application/pdf",
@@ -2400,7 +2444,7 @@
},
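The two figure-caption elements and the Fig. 3 description above refer to LayoutParser's visualization APIs. A minimal sketch of the two modes, assuming the text blocks have already been filled in by an OCR agent (the file name and argument values are illustrative, not taken from the fixture):

```python
import cv2
import layoutparser as lp

image = cv2.imread("page.png")   # illustrative input image
model = lp.Detectron2LayoutModel(
    "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config")
layout = model.detect(image)

# Mode I: overlay the detected boxes and their categories on the original image
viz_boxes = lp.draw_box(image, layout, box_width=3, show_element_type=True)

# Mode II: redraw the OCR'd token text at its detected positions
# (assumes each block's .text was previously set, e.g. via lp.TesseractAgent)
viz_text = lp.draw_text(image, layout, font_size=12, with_box_on_text=True)
```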
{
"type": "NarrativeText",
"element_id": "625c9e1d41a9740f094041595f79953d",
"element_id": "5b842403fb6d891dd468a13d00295082",
"text": "can also be highly sensitive and not sharable publicly. To overcome these chal- lenges, LayoutParser is built with rich features for e\ufb03cient data annotation and customized model training.",
"metadata": {
"filetype": "application/pdf",
@@ -2422,7 +2466,7 @@
},
{
"type": "NarrativeText",
"element_id": "a3498730b5cd3fe9405fad69bcf37882",
"element_id": "eb88479f76d806ca0a07e36b53ecb4b9",
"text": "LayoutParser incorporates a toolkit optimized for annotating document lay- outs using object-level active learning [32]. With the help from a layout detection model trained along with labeling, only the most important layout objects within each image, rather than the whole image, are required for labeling. The rest of the regions are automatically annotated with high con\ufb01dence predictions from the layout detection model. This allows a layout dataset to be created more e\ufb03ciently with only around 60% of the labeling budget.",
"metadata": {
"links": [
@@ -2451,7 +2495,7 @@
},
{
"type": "NarrativeText",
"element_id": "c4ccf2cf2e7495668221cbe51534f90b",
"element_id": "4fe78489dc18583bc966cec8fe1c6ea5",
"text": "After the training dataset is curated, LayoutParser supports di\ufb00erent modes for training the layout models. Fine-tuning can be used for training models on a small newly-labeled dataset by initializing the model with existing pre-trained weights. Training from scratch can be helpful when the source dataset and target are signi\ufb01cantly di\ufb00erent and a large training set is available. However, as suggested in Studer et al.\u2019s work[33], loading pre-trained weights on large-scale datasets like ImageNet [5], even from totally di\ufb00erent domains, can still boost model performance. Through the integrated API provided by LayoutParser, users can easily compare model performances on the benchmark datasets.",
"metadata": {
"links": [
@@ -2527,9 +2571,31 @@
}
}
},
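The element above describes the two training modes (fine-tuning from pre-trained weights vs. training from scratch) in general terms. LayoutParser's own training tooling is not part of this fixture, so the following is only a hedged Detectron2 sketch of the same idea, with a hypothetical dataset name:

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("my_layout_train",)   # hypothetical registered dataset
cfg.DATASETS.TEST = ()

# Fine-tuning: initialize from existing pre-trained weights ...
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
# ... whereas training from scratch would leave cfg.MODEL.WEIGHTS empty.

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```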
{
"type": "FigureCaption",
"element_id": "70d7d0c165c33191957d4e9e017c2f87",
"text": "(b) Illustration of the recreated document with dense text structure for better OCR performance",
"metadata": {
"filetype": "application/pdf",
"languages": [
"eng"
],
"page_number": 10,
"data_source": {
"record_locator": {
"path": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/example-docs/layout-parser-paper.pdf"
},
"permissions_data": [
{
"mode": 33188
}
]
}
}
},
{
"type": "NarrativeText",
"element_id": "ebbb8c84b2a69f817c8ae7df20d72dd9",
"element_id": "e4a5e8a6208d92b11dc523988ce795d8",
"text": "Fig. 4: Illustration of (a) the original historical Japanese document with layout detection results and (b) a recreated version of the document image that achieves much better character recognition recall. The reorganization algorithm rearranges the tokens based on the their detected bounding boxes given a maximum allowed height.",
"metadata": {
"filetype": "application/pdf",
@@ -2551,7 +2617,7 @@
},
{
"type": "Title",
"element_id": "88f6e589165656eceebf898d0240e05c",
"element_id": "f791f077e8d2d8faec47792a1d576766",
"text": "4 LayoutParser Community Platform",
"metadata": {
"filetype": "application/pdf",
@@ -2573,7 +2639,7 @@
},
{
"type": "NarrativeText",
"element_id": "e9a86eb57ba5483acfeefb0e931402b1",
"element_id": "35bc3fa0635dc2c5dd9feadb4ba039b1",
"text": "Another focus of LayoutParser is promoting the reusability of layout detection models and full digitization pipelines. Similar to many existing deep learning libraries, LayoutParser comes with a community model hub for distributing layout models. End-users can upload their self-trained models to the model hub, and these models can be loaded into a similar interface as the currently available LayoutParser pre-trained models. For example, the model trained on the News Navigator dataset [17] has been incorporated in the model hub.",
"metadata": {
"links": [
@@ -2602,7 +2668,7 @@
},
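The community-hub element above mentions that the News Navigator model can be loaded through the same interface as the bundled models; a sketch, assuming the usual `lp://` config path for that model:

```python
import cv2
import layoutparser as lp

image = cv2.imread("newspaper_page.png")   # illustrative input image

# Assumed community-hub config path; check the model hub listing for the
# exact string. The loading interface matches the built-in models.
model = lp.Detectron2LayoutModel(
    "lp://NewspaperNavigator/faster_rcnn_R_50_FPN_3x/config")
layout = model.detect(image)
```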
{
"type": "NarrativeText",
"element_id": "c08c76705396fe7a65be5dff6d3bffd5",
"element_id": "42cb4161d74e8e50a46b23a9daa70586",
"text": "Beyond DL models, LayoutParser also promotes the sharing of entire doc- ument digitization pipelines. For example, sometimes the pipeline requires the combination of multiple DL models to achieve better accuracy. Currently, pipelines are mainly described in academic papers and implementations are often not pub- licly available. To this end, the LayoutParser community platform also enables the sharing of layout pipelines to promote the discussion and reuse of techniques. For each shared pipeline, it has a dedicated project page, with links to the source code, documentation, and an outline of the approaches. A discussion panel is provided for exchanging ideas. Combined with the core LayoutParser library, users can easily build reusable components based on the shared pipelines and apply them to solve their unique problems.",
"metadata": {
"filetype": "application/pdf",
@@ -2624,7 +2690,7 @@
},
{
"type": "Title",
"element_id": "53da8301ac140e0b72cdcf6a7f405918",
"element_id": "dce66025c40567da68fd7370506997e1",
"text": "5 Use Cases",
"metadata": {
"filetype": "application/pdf",
@@ -2646,7 +2712,7 @@
},
{
"type": "NarrativeText",
"element_id": "1fd6bf73b6c80f8ed034bf977fba5a67",
"element_id": "0dfe18d94b2a87db18d0a45d434b1cb0",
"text": "The core objective of LayoutParser is to make it easier to create both large-scale and light-weight document digitization pipelines. Large-scale document processing",
"metadata": {
"filetype": "application/pdf",