There are several "PDF2SVG" converters running on different platforms (Java, Python, C/C++). Although the output format is always SVG, there are many ways that it can be structured. We have used two which "run on all platforms":
PDF2SVG (AMI), Java: https://bitbucket.org/petermr/pdf2svg/wiki/Home . This is based on PDFBox 1.8 (https://pdfbox.apache.org/), which has a very thorough toolchain for extracting content from PDF.
This is the default and will be used for this project. It runs from the command line but is not yet packaged as an uber-jar.
We plan to move to PDFBox 2.0.4 but not during the CM-UCL project.
PDF2SVG (http://www.cityinthesky.co.uk/opensource/pdf2svg/): this wraps some existing libraries. It is (somewhat) easier to install than AMI-PDF2SVG and produces more compact output. However, it has not been tested for producing SVG2XML input and will not be used for production.
PDF2SVG only needs to be run once (and has been). The tables have been extracted by hand from both corpora.
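For anyone who wants to try the second converter on their own files, the "run once" idea can be sketched roughly as below. This is only an illustration, not the project's production route (which is AMI-PDF2SVG). It assumes the cityinthesky `pdf2svg` executable is on the PATH and uses its documented form `pdf2svg <in.pdf> <out.svg> [<page no>]`. The function names and the one-SVG-per-PDF naming are hypothetical choices made for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, svg_path, page=None):
    """Return the argument list for: pdf2svg <in.pdf> <out.svg> [<page no>]."""
    cmd = ["pdf2svg", str(pdf_path), str(svg_path)]
    if page is not None:
        cmd.append(str(page))
    return cmd


def pending_conversions(pdf_dir, svg_dir):
    """List (pdf, svg) pairs whose SVG output does not exist yet."""
    pdf_dir, svg_dir = Path(pdf_dir), Path(svg_dir)
    pairs = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        # Hypothetical naming scheme: foo.pdf -> foo.svg
        svg = svg_dir / (pdf.stem + ".svg")
        if not svg.exists():
            pairs.append((pdf, svg))
    return pairs


def convert_all(pdf_dir, svg_dir, page=1):
    """Convert one page of every PDF that has no SVG yet; skip the rest."""
    if shutil.which("pdf2svg") is None:
        raise RuntimeError("pdf2svg executable not found on PATH")
    Path(svg_dir).mkdir(parents=True, exist_ok=True)
    for pdf, svg in pending_conversions(pdf_dir, svg_dir):
        # check=True raises CalledProcessError if the converter fails
        subprocess.run(build_command(pdf, svg, page), check=True)
```

Calling `convert_all("pdfs", "svgs")` a second time does nothing for files that already have output, which matches the intent that conversion is a one-off step.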