audreyt/pdf2md

pdf2md

Steps to unscramble a BullsZIP PDF that uses the KaiU font:

  1. pdf2htmlex --no-drm 1 --embed-font=0 --font-format svg FILENAME.pdf
  2. Replace the f1.svg in this directory with the generated one
  3. pdf2json -f FILENAME.pdf
  4. perl build.pl > output.json
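
The four steps above could be wrapped in a small shell function. This is only a sketch: it assumes pdf2htmlex, pdf2json and perl are on PATH and that it runs from this directory (next to build.pl and f1.svg). The names run, unscramble and DRY_RUN, and the optional second argument giving the path of the generated f1.svg, are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Sketch of steps 1-4. Assumption: run from the directory holding build.pl and f1.svg.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# unscramble PDF [GENERATED_SVG]
# GENERATED_SVG (hypothetical) is where step 1 left its f1.svg. Omit it if
# step 1 already wrote f1.svg into this directory.
unscramble() {
    pdf=$1
    generated_svg=${2:-}

    # Step 1: convert the PDF, keeping fonts as separate SVG files.
    run pdf2htmlex --no-drm 1 --embed-font=0 --font-format svg "$pdf" || return 1

    # Step 2: replace the f1.svg in this directory with the generated one.
    if [ -n "$generated_svg" ]; then
        run cp "$generated_svg" ./f1.svg || return 1
    fi

    # Step 3: extract the PDF contents as JSON.
    run pdf2json -f "$pdf" || return 1

    # Step 4: build the final JSON.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "perl build.pl > output.json"
    else
        perl build.pl > output.json
    fi
}
```

With the function defined in the current shell, setting DRY_RUN=1 before calling unscramble FILENAME.pdf prints the commands without running them, which is a cheap way to check the sequence first.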

For a sample JSON file, see pta_18238_5821809_02503.json in this directory, generated from:

http://www.tcec.gov.tw/ezfiles/3/1003/attach/64/pta_18238_5821809_02503.pdf

TODO: Generate CSV or Markdown from the JSON.

About

PDF to Markdown
