standardized text extraction tests #49
@PrajitR mentioned something like this a while back (#30), and I think it could be really useful, particularly in light of issues like #48, where we were inadvertently "extracting" text that wasn't actually in the original file. It might be nice to have a set of tests that should produce the exact same output regardless of the filetype, so we can be confident that each extraction method works consistently.
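
To make the idea concrete, here is a rough sketch of what such a test could look like. It is not based on the project's actual API: `extract`, `EXTRACTORS`, and the two stand-in extractors are hypothetical placeholders, and the fixtures are generated on the fly rather than checked in. It also assumes that whitespace differences between formats should be ignored, which is looser than strictly "exact same output" and may not be what we want.

```python
import json
import tempfile
from pathlib import Path

# The one piece of text every fixture file is supposed to contain.
EXPECTED = "the quick brown fox jumps over the lazy dog"


def normalize(text):
    """Collapse runs of whitespace so layout differences do not count."""
    return " ".join(text.split())


# Hypothetical stand-ins for the real per-filetype extraction methods.
def extract_txt(path):
    return Path(path).read_text(encoding="utf-8")


def extract_json(path):
    return json.loads(Path(path).read_text(encoding="utf-8"))["body"]


EXTRACTORS = {".txt": extract_txt, ".json": extract_json}


def extract(path):
    """Dispatch to an extractor based on the file extension (assumed design)."""
    return EXTRACTORS[Path(path).suffix](path)


def check_consistent(paths, expected):
    """Return {file name: normalized output} for files that do not match."""
    mismatches = {}
    for path in paths:
        got = normalize(extract(path))
        if got != normalize(expected):
            mismatches[Path(path).name] = got
    return mismatches


def build_fixtures(directory):
    """Write the same text in two formats and return the file paths."""
    directory = Path(directory)
    txt = directory / "sample.txt"
    txt.write_text("the quick brown fox\njumps over the lazy dog\n", encoding="utf-8")
    js = directory / "sample.json"
    js.write_text(json.dumps({"body": EXPECTED}), encoding="utf-8")
    return [txt, js]


with tempfile.TemporaryDirectory() as tmp:
    RESULT = check_consistent(build_fixtures(tmp), EXPECTED)
```

An empty `RESULT` means every format produced the same text. A case like #48 would show up as an entry in the dictionary, since any extra text that isn't in the original file makes the normalized output differ from `EXPECTED`.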