  • 0.10.1
  • bb58291

@jvalls-axa released this Feb 25, 2020

Security vulnerability fixed

Bump bleach from 3.1.0 to 3.1.1 in /demo/jupyter-notebook


@jvalls-axa released this Feb 19, 2020 · 15 commits to master since this release

Changes

  • New input file type: *.docx
  • New 'Table of contents' processing module
  • UI: added a button to download outputs
  • Added compatibility with PdfMiner version '20200124'
  • Improved PdfMiner extraction time by using an XML stream reader
  • New OCRs can be run through the API by extending the configuration file
  • Several bug fixes
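These notes do not say which XML stream reader Parsr adopted. As a rough illustration of why streaming helps, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree.iterparse`, assuming XML shaped like pdfminer's `pdf2txt.py -t xml` output (one character per `<text>` element). Each text line is emitted as soon as it closes, and finished elements are cleared, so memory stays bounded instead of growing with the whole document. `iter_text_lines` is a hypothetical helper, not part of Parsr or pdfminer.

```python
import io
import xml.etree.ElementTree as ET


def iter_text_lines(xml_stream):
    """Yield (page_id, line_text) pairs without building the whole XML tree.

    Assumes pdfminer-style XML: <pages><page id="..."><textbox><textline>
    <text>c</text>...</textline></textbox></page></pages>, one character per <text>.
    """
    page_id = None
    for event, elem in ET.iterparse(xml_stream, events=("start", "end")):
        if event == "start" and elem.tag == "page":
            # Attributes are already available on the start event.
            page_id = elem.get("id")
        elif event == "end" and elem.tag == "textline":
            chars = [t.text or "" for t in elem.iter("text")]
            yield page_id, "".join(chars).strip()
            elem.clear()  # release the characters already consumed
        elif event == "end" and elem.tag == "page":
            elem.clear()  # drop the finished page before reading the next one


sample = b"""<?xml version="1.0"?>
<pages>
<page id="1"><textbox id="0"><textline><text>H</text><text>i</text><text>\n</text></textline></textbox></page>
<page id="2"><textbox id="0"><textline><text>O</text><text>k</text></textline></textbox></page>
</pages>"""

for page, line in iter_text_lines(io.BytesIO(sample)):
    print(page, line)
```

In a real pipeline the `BytesIO` would be replaced by an open file handle, so the XML never has to be fully loaded.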

Breaking changes


@jvalls-axa released this Jan 24, 2020 · 120 commits to master since this release

Changes

  • Integrated new OCRs in the GUI:

    • Google Vision
    • Amazon Textract
    • Microsoft Cognitive Services
    • Abbyy
  • Updated GUI: added the official logo and fixed some cosmetic issues

  • Several bug fixes

  • Updated Readme.md

  • v0.8
  • 26a9936

@jvalls-axa released this Jan 13, 2020 · 182 commits to master since this release

Changes

  • Simple image detection using PdfMiner.
  • Allowed *.eml files as input (the message body and attachments are used to extract data).
  • The GUI can display page margins with a single switch.
  • Added a Readme in French.
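The email input item mentions using both the message body and the attachments. Assuming these are standard MIME `.eml` files, the following minimal sketch uses only Python's standard `email` package to split a message into body text and attachments. It illustrates the idea and is not Parsr's implementation; `split_eml` is a hypothetical helper, and the message is built in memory rather than read from disk.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def split_eml(raw: bytes):
    """Return (body_text, [(filename, payload_bytes), ...]) for a MIME message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Prefer the plain-text body; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    # iter_attachments() skips the candidate body part(s) and yields the rest.
    attachments = [
        (part.get_filename(), part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]
    return body, attachments


# Build a small message in memory instead of reading a real .eml file.
m = EmailMessage()
m["Subject"] = "Claim report"
m.set_content("See the attached report.")
m.add_attachment(b"%PDF-1.4 sample", maintype="application",
                 subtype="pdf", filename="report.pdf")

body, attachments = split_eml(m.as_bytes())
print(body.strip())                       # See the attached report.
print([name for name, _ in attachments])  # ['report.pdf']
```

Each attachment's bytes could then be handed to whichever input module matches its type (PDF, image, DOCX).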
  • v0.7.1
  • c8d4305

@jvalls-axa released this Dec 16, 2019 · 308 commits to master since this release

Changes

  1. Removed 'sharp' dependency from API
  2. Improved error handling
  3. Allowed Tesseract to process multi-page PDFs
  4. Fixed some JS vulnerabilities
  5. Improved Jupyter Notebook document versioning display

@jvalls-axa released this Dec 9, 2019 · 340 commits to master since this release

Changes

  1. Optimisation of images before the Tesseract scan (rotation detection & shadow removal)
  2. New input module option: Pdf.js (recommended for large PDFs)
  3. Jupyter Notebook: Added document versioning & comparison
  4. Fixed a JavaScript vulnerability
  5. Several GUI & Server bug fixes

@jvalls-axa released this Nov 21, 2019 · 469 commits to master since this release

Changes

  1. Added Jupyter Notebook
  2. Improved Headings detection module (greatly reduced false positives)
  3. Improved Table detection module
  4. Improved Paragraph detection module
  5. Improved git Readme
  6. Several GUI & Server bug fixes

@jvalls-axa released this Nov 14, 2019 · 528 commits to master since this release

Changes

  1. New List detection Module (bullet and numeric type list)
  2. Improved Link detection module for pdfMiner extractor
  3. Improved Heading detection module (font usage ratio used to detect headings)
  4. Markdown exporter updated to export tables using standard syntax instead of html syntax
  5. Improved overall output accuracy
  6. Several GUI improvements
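Plain CommonMark has no table syntax at all, so the "standard syntax" mentioned for the Markdown exporter is presumably the pipe-table extension from GitHub Flavored Markdown; that is the assumption in this minimal sketch. `to_markdown_table` is a hypothetical helper, not Parsr's exporter. It pads ragged rows, escapes literal pipes, and emits the required delimiter row after the header.

```python
def to_markdown_table(rows):
    """Render rows (the first row is the header) as a pipe table."""
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    # Pad short rows so every line has the same number of cells.
    grid = [[str(c) for c in r] + [""] * (width - len(r)) for r in rows]

    def escape(cell):
        # A bare "|" would start a new column and a newline would end the row.
        return cell.replace("|", "\\|").replace("\n", " ")

    def line(cells):
        return "| " + " | ".join(escape(c) for c in cells) + " |"

    out = [line(grid[0]), "|" + "|".join(["---"] * width) + "|"]
    out.extend(line(r) for r in grid[1:])
    return "\n".join(out)


print(to_markdown_table([["File", "Pages"], ["a.pdf", 3], ["b|c.pdf"]]))
```

Pipe tables have no notion of merged cells, so a sketch like this only covers simple grids; tables with row or column spans need either a fallback (such as HTML) or some flattening strategy.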

Dependencies

  1. Added GraphicsMagick for GUI thumbnail generation

@aarohijohal released this Oct 24, 2019 · 707 commits to master since this release

Changes

  1. Highly improved LinesToParagraph module
  2. Highly improved Headings detection module.
  3. Promotion of pdfminer as the primary PDF extractor, plus related output cleaning.
  4. Improved text redundancy/duplication detection and treatment.
  5. Leaner docker implementation for faster deploys.
  6. Several Vue UI improvements (demo/vue-viewer), including a text inspector, forward and next buttons, and more.
  7. Several bug fixes in Markdown export, including more flexible tables with rowspans and colspans.
  8. Windows deployment improvements under both bare-metal and docker flavors.
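One common source of redundant text in PDFs is the same word being drawn twice at almost the same position (for example, to simulate bold type). The notes do not describe how Parsr detects this. The following is a generic sketch, assuming each word carries a bounding box: a word is dropped when an already-kept word has the same text and their boxes overlap by at least a threshold. `Word`, `overlap_ratio`, `drop_duplicates`, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def overlap_ratio(a, b):
    """Intersection area divided by the area of the smaller box."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    if w <= 0 or h <= 0:
        return 0.0
    smaller = min((a.x1 - a.x0) * (a.y1 - a.y0), (b.x1 - b.x0) * (b.y1 - b.y0))
    return (w * h) / smaller if smaller > 0 else 0.0


def drop_duplicates(words, threshold=0.8):
    """Keep the first occurrence of each word; skip near-identical overprints."""
    kept = []
    for w in words:
        if any(k.text == w.text and overlap_ratio(k, w) >= threshold for k in kept):
            continue
        kept.append(w)
    return kept


words = [
    Word("Total", 10, 10, 40, 20),
    Word("Total", 10.5, 10, 40.5, 20),  # overprinted copy, shifted by half a unit
    Word("Total", 10, 50, 40, 60),      # same text elsewhere on the page: kept
]
print([(w.text, w.y0) for w in drop_duplicates(words)])
```

The pairwise check is quadratic in the number of words; a real implementation would likely compare only words on the same line or use a spatial index.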

@aarohijohal released this Sep 19, 2019

Changes

  • Vital bugfix related to the table extraction module
  • Word inspection mode in the Vue UI, including hierarchy highlighting
  • Made pdf2json the default extractor for pdfs