Example PDF archive #302
EliotJones started this conversation in General
Replies: 2 comments
-
In #532, @BobLd provided an additional PDF repository: https://github.com/pdf-association/pdf-corpora#safedocs-issue-tracker-corpus
-
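The attached zip file contains the URLs of an additional 30,000+ PDFs (~50 GB in total). To download them on Unix/*nix or Windows Subsystem for Linux (WSL), run:
cut -d, -f1 URLsofExamplePDFs.txt | wget -i -
This takes the first comma-separated field (the URL) from each line of the list and passes the URLs to wget on standard input.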
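If you are on native Windows without WSL, or want to resume an interrupted download, here is a minimal Python sketch of the same pipeline. Like the command above, it assumes each line of URLsofExamplePDFs.txt has the URL in its first comma-separated field. The helper names (`extract_urls`, `target_name`, `download_all`), the `pdfs` output folder, the 60-second timeout and the numbered file names are illustrative choices, not part of the archive.

```python
import os
import shutil
import urllib.parse
import urllib.request


def extract_urls(lines):
    """Mimic `cut -d, -f1`: keep the text before the first comma on each line."""
    urls = []
    for line in lines:
        first_field = line.split(",", 1)[0].strip()
        if first_field:  # skip blank lines
            urls.append(first_field)
    return urls


def target_name(index, url):
    """Build a local file name; the numeric prefix avoids clashes between
    different hosts that serve files with the same name (an illustrative choice)."""
    name = os.path.basename(urllib.parse.urlparse(url).path) or "file.pdf"
    return f"{index:05d}_{name}"


def download_all(urls, dest_dir="pdfs", timeout=60):
    """Download each URL into dest_dir, skipping files that already exist."""
    os.makedirs(dest_dir, exist_ok=True)
    for index, url in enumerate(urls):
        target = os.path.join(dest_dir, target_name(index, url))
        if os.path.exists(target):
            continue  # already downloaded on a previous run
        partial = target + ".part"  # rename only after a complete download
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, open(partial, "wb") as out:
                shutil.copyfileobj(resp, out)
            os.replace(partial, target)
        except (OSError, ValueError) as exc:  # network/HTTP errors or malformed URLs
            print(f"failed: {url} ({exc})")


if __name__ == "__main__":
    list_file = "URLsofExamplePDFs.txt"  # file name taken from the command above
    if os.path.exists(list_file):
        with open(list_file, encoding="utf-8", errors="replace") as f:
            download_all(extract_urls(f))
    else:
        print(f"{list_file} not found; extract it from the attached zip first")
```

Each file is written to a `.part` name and renamed only after it finishes. If you rerun the script after an interruption, it skips completed files and retries the rest. The shell one-liner does not do this unless you add wget's `-nc` option.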
-
Since it is useful to have many PDF documents when carrying out document layout analysis, new feature development, and performance work, I'm sharing the test archive I put together during the initial development of the library.
https://drive.google.com/file/d/1C6bD4BVIc4pxT4oDEmUT1jLV_oi9NYUh/view?usp=sharing
These PDFs are useful for testing bug fixes and for profiling, since they represent a broad cross-section of producers and document types.