…d 'just work.' The recursive loop is ugly and this is the first Ruby code I've ever written, so I'm sure there are 100 things wrong with it.
This patch addresses two problems with tests:
- Various tests rely on Dir.glob ordering. This is not reliable; this patch introduces assert_directory_contains to avoid Dir.glob ordering inconsistencies.
- test_ocr_extraction relies on an exact text match from tesseract. However, the output differs with each version of tesseract. This patch instead checks that all required txt files exist and that they have a reasonable size.
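A minimal sketch of the two checks described above, not the patch's actual code. The helper bodies, the names missing_entries and files_have_reasonable_size?, and the min_bytes threshold are all assumptions made for illustration; only the name assert_directory_contains comes from the patch description.

```ruby
require "tmpdir"

# Hypothetical helper: lists the expected entries that are absent from dir.
# Membership is checked by basename, so the order Dir.glob returns does not matter.
def missing_entries(dir, expected)
  actual = Dir.glob(File.join(dir, "*")).map { |path| File.basename(path) }
  expected.sort - actual
end

# Sketch of an order-independent directory assertion (assumed behavior).
# Raises when any expected entry is missing, otherwise returns true.
def assert_directory_contains(dir, expected)
  missing = missing_entries(dir, expected)
  raise "#{dir} is missing: #{missing.join(', ')}" unless missing.empty?
  true
end

# Hypothetical helper: true when every named file exists and holds at least
# min_bytes bytes. The threshold is an assumption; the patch only says
# "reasonable size".
def files_have_reasonable_size?(dir, names, min_bytes)
  names.all? do |name|
    path = File.join(dir, name)
    File.file?(path) && File.size(path) >= min_bytes
  end
end

# Usage example with throwaway files standing in for OCR output.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "page_2.txt"), "x" * 200)
  File.write(File.join(dir, "page_1.txt"), "y" * 200)
  assert_directory_contains(dir, ["page_1.txt", "page_2.txt"])
  files_have_reasonable_size?(dir, ["page_1.txt", "page_2.txt"], 100)
end
```

Checking membership and size rather than exact contents keeps the tests stable across filesystems and tesseract versions, at the cost of not catching badly wrong OCR text.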
…tion for file. Here is the error: 'file: unrecognized option --mime-type'. The --mime option is more standard.
…y is labeled as 'tesseract-ocr'
…e to PDF