Skip to content

Added plain-text comparison for tests#1180

Merged
benjats07 merged 24 commits intomainfrom
feat/plain-text-outputs-for-testing
Aug 29, 2023
Merged

Added plain-text comparison for tests#1180
benjats07 merged 24 commits intomainfrom
feat/plain-text-outputs-for-testing

Conversation

@benjats07
Copy link
Copy Markdown
Contributor

@benjats07 benjats07 commented Aug 23, 2023

This PR adds a comparison during ingest test for the content of the files in plain text (i.e.: without JSON format)

@benjats07 benjats07 marked this pull request as ready for review August 23, 2023 22:31
@cragwolfe
Copy link
Copy Markdown
Contributor

I tried modifying
https://github.com/Unstructured-IO/unstructured/blob/main/test_unstructured_ingest/expected-structured-output/local-single-file/english-and-korean.png.json
and running

./test_unstructured_ingest/test-ingest-local-single-file.sh

and got this output:

\ No newline at end of file
xargs: jq '.[].text'<{}>/Users/cragwolfe/r/unstructured/test_unstructured_ingest/expected-text-output/local-single-file: No such file or directory
xargs: jq '.[].text'<{}>/Users/cragwolfe/r/unstructured/test_unstructured_ingest/text-output/local-single-file: No such file or directory

Which makes sense because a file is a valid test path in addition to a directory. Besides fixing that, additional comments:

  • please word wrap long lines to multiple lines (say 80 characters per line) for easier diff viewing.
  • store-just-text.sh could have a more descriptive name. something like json-to-clean-text-file.sh ?
  • please add a line or two to .gitignore so the generated text files are ignored by git

@benjats07
Copy link
Copy Markdown
Contributor Author

@cragwolfe I addressed those issues

@@ -1,2 +1,4 @@
structured-output
download
test_unstructured_ingest/expected-text-output/**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the folder name it sounds like this is where the text output fixtures would go. Don't we want to track changes in git for these fixtures?

Copy link
Copy Markdown
Contributor

@cragwolfe cragwolfe Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are no text fixtures so far, only json fixtures. this PR derives the clean text from json's at runtime.

@benjats07 benjats07 enabled auto-merge (squash) August 29, 2023 22:53
Copy link
Copy Markdown
Contributor

@cragwolfe cragwolfe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely a big improvement to view partitioned text output changes! 👍

@benjats07 benjats07 merged commit 5052e6c into main Aug 29, 2023
@benjats07 benjats07 deleted the feat/plain-text-outputs-for-testing branch August 29, 2023 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants