standardized text extraction tests #49

deanmalmgren · 2014-08-22T19:59:15Z

@PrajitR mentioned something like this a while back (#30 (comment)) and I think it could be really useful, particularly in light of issues like #48, where we were inadvertently "extracting" text that wasn't actually in the original file. It would be nice to have a set of tests that produce the exact same output regardless of the filetype, so we can be confident that each extraction method works consistently.
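To make the idea concrete, here is a minimal sketch of what a standardized test could look like. It is not the code from this pull request: the extractors are hypothetical stand-ins that return canned strings, and `STANDARDIZED_TEXT`, `FAKE_EXTRACTORS`, `normalize`, and `StandardizedTextMixin` are all names made up for illustration. Collapsing whitespace before comparing is also an assumption; a stricter suite could compare the raw output byte for byte.

```python
import re
import unittest

# The one piece of text every sample document is assumed to contain.
STANDARDIZED_TEXT = "the quick brown fox jumps over the lazy dog"

# Hypothetical stand-ins for real extraction methods. Each one pretends
# to extract text from a sample file of the given type and leaves
# different incidental whitespace behind.
FAKE_EXTRACTORS = {
    "txt": lambda: "the quick brown fox jumps over the lazy dog\n",
    "pdf": lambda: "the quick brown fox\njumps over the lazy dog\n\n",
    "docx": lambda: "  the quick  brown fox jumps over the lazy dog",
}


def normalize(text):
    """Collapse runs of whitespace so layout differences are ignored."""
    return re.sub(r"\s+", " ", text).strip()


class StandardizedTextMixin:
    """Shared test; each subclass only has to set ``extension``."""

    extension = None

    def test_standardized_text(self):
        extracted = FAKE_EXTRACTORS[self.extension]()
        self.assertEqual(normalize(extracted), STANDARDIZED_TEXT)


class TxtTestCase(StandardizedTextMixin, unittest.TestCase):
    extension = "txt"


class PdfTestCase(StandardizedTextMixin, unittest.TestCase):
    extension = "pdf"


class DocxTestCase(StandardizedTextMixin, unittest.TestCase):
    extension = "docx"
```

Saved as a module, this can be run with `python -m unittest`. Keeping the shared test in a mixin rather than in a `unittest.TestCase` subclass means the test runner never collects the base class itself, and adding a new filetype is a two-line subclass.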

deanmalmgren added the testing label Aug 12, 2014

deanmalmgren mentioned this pull request Aug 13, 2014

Added unicode tests for pdf parser #54

Closed

Dean Malmgren added 17 commits August 22, 2014 13:35

adapted testing framework from #54 to work in practice for .eml

9582955

ShellParserTestCase now tests for filenames with spaces to address situations in #53

8924019

added all other raw_text tests to new testing framework

de0c24f

broke out tests for cli and python separately

e577a3f

no more crappy run_functional_tests.sh script

c49ff83

strange importing problems on travis-ci

1a31711

no more running provision/debian.sh as sudo on travis

b586f5e

refactored to add some quick tests for jpeg synonym

6b81bf5

refactored tests to make it possible to test particular extraction methods (like pdfminer)

86a7df1

added test for running subprocess commands on large files (see issue #33)

be39b15

updated contributing documentation for the new test suite

016c7ac

minor doc tweaks

e99be5a

added standardized text tests for all supported file types to date

b3cf4dc

updated changelog

35fc7af

redo inheritance in test classes so we can override things in unittest.TestCase as necessary

552b3c3

managing temp files better

6e3b24a

fixed test from refactoring

67eb306

deanmalmgren pushed a commit that referenced this pull request Aug 25, 2014

Merge pull request #49 from deanmalmgren/standardized-tests

d16f784

standardized text extraction tests

deanmalmgren merged commit d16f784 into master Aug 25, 2014

deanmalmgren deleted the standardized-tests branch August 25, 2014 21:24
