Steps to un-scramble a BullsZIP PDF using KaiU font:
- pdf2htmlex --no-drm 1 --embed-font=0 --font-format svg FILENAME.pdf
- Replace the f1.svg in this directory with the generated one
- pdf2json -f FILENAME.pdf
- perl build.pl > output.json
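The steps above could be chained in a small wrapper. This is only a sketch: the function names `build_commands` and `unscramble` and the `generated_svg` parameter are hypothetical, the commands are copied verbatim from the list, and where pdf2htmlex writes its f1.svg is left to the caller because this README does not say.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(pdf_name):
    """Return the three external commands from the steps, as argument lists."""
    return [
        ["pdf2htmlex", "--no-drm", "1", "--embed-font=0",
         "--font-format", "svg", pdf_name],
        ["pdf2json", "-f", pdf_name],
        ["perl", "build.pl"],
    ]


def unscramble(pdf_name, generated_svg, output="output.json"):
    """Run the steps in order.

    generated_svg is an assumption: the path where pdf2htmlex wrote f1.svg.
    """
    to_html, to_json, build = build_commands(pdf_name)
    subprocess.run(to_html, check=True)

    # Replace the f1.svg in this directory, unless the generated file is
    # already that file.
    target = Path("f1.svg")
    if Path(generated_svg).resolve() != target.resolve():
        shutil.copyfile(generated_svg, target)

    subprocess.run(to_json, check=True)

    # Equivalent of: perl build.pl > output.json
    with open(output, "wb") as out:
        subprocess.run(build, check=True, stdout=out)
```

Run it from this directory so that build.pl and f1.svg resolve as they do in the manual steps, e.g. `unscramble("FILENAME.pdf", "f1.svg")`.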
For a sample JSON, see pta_18238_5821809_02503.json in this directory, generated from:
http://www.tcec.gov.tw/ezfiles/3/1003/attach/64/pta_18238_5821809_02503.pdf
TODO: Take the JSON and generate CSV or MD from it.
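One possible starting point for the CSV half of this TODO, as a sketch only. It assumes the JSON is a top-level array of flat objects, one per row; that shape is a guess and should be checked against the sample file before use. The function name `json_to_csv` is hypothetical.

```python
import csv
import io
import json


def json_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text.

    Assumption: each array element is one row, keyed by column name.
    """
    rows = json.loads(json_text)
    if not isinstance(rows, list) or not all(isinstance(r, dict) for r in rows):
        raise ValueError("expected a JSON array of objects")

    # Columns appear in the order their keys are first seen.
    fields = []
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row become empty cells
    return buf.getvalue()
```

To try it on real output, read the file as UTF-8 so the non-ASCII text survives: `json_to_csv(open("output.json", encoding="utf-8").read())`.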