Check that the size of the output data is proportional to the size of the input data #26

Open
andrewphilipsmith opened this issue Apr 6, 2022 · 0 comments
Labels
enhancement New feature or request

@andrewphilipsmith
Contributor

As a:
user who is processing large volumes of data
I want:
alto2txt to compare the size of the output data with the size of the input data and warn me if their ratio falls outside a given range
so that:
potential errors in the text processing can be identified.
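A minimal sketch of the requested check, assuming sizes are measured in bytes and expressed as an input/output ratio (matching the notes below). The helper `check_size_ratio` and its parameters are hypothetical and not part of alto2txt's existing API. The bounds are supplied by the caller because the acceptable range still has to be determined experimentally.

```python
import warnings


def check_size_ratio(input_bytes: int, output_bytes: int,
                     min_ratio: float, max_ratio: float) -> float:
    """Return the input/output size ratio, warning if it lies outside [min_ratio, max_ratio].

    Hypothetical helper for illustration; not part of alto2txt.
    """
    if output_bytes == 0:
        # No text was produced at all, which is almost certainly a processing error.
        warnings.warn("Output is empty: the input produced no text.")
        return float("inf")
    ratio = input_bytes / output_bytes
    if not (min_ratio <= ratio <= max_ratio):
        warnings.warn(
            f"Input/output size ratio {ratio:.1f} is outside the expected "
            f"range [{min_ratio}, {max_ratio}]."
        )
    return ratio
```

For example, `check_size_ratio(1000, 100, 5, 20)` returns `10.0` without warning. The bounds 5 and 20 are placeholders, not measured values. The sketch uses `warnings.warn` because the story asks for a warning rather than a failure. alto2txt might prefer its own logging instead.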

Notes:

  • The input/output size ratio should be calculated on the smallest practical unit, e.g. "newspaper+year", "newspaper+month" or "newspaper+issue".
  • The acceptable range of ratios will need to be determined experimentally (anecdotally, an input/output ratio of around 10:1 seems typical).
  • This should be accompanied by unit tests.
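Here is a rough sketch of computing the ratio per "newspaper+year" group, with an accompanying unit test. It assumes that per-file sizes are available as hypothetical `(newspaper_id, year, input_bytes, output_bytes)` tuples. The function names, the `paper_a` ID and the range bounds are illustrative only.

```python
import unittest
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-file record: (newspaper_id, year, input_bytes, output_bytes).
Record = Tuple[str, int, int, int]
GroupKey = Tuple[str, int]


def ratios_by_newspaper_year(records: Iterable[Record]) -> Dict[GroupKey, float]:
    """Sum input and output sizes per (newspaper, year) and return each group's ratio."""
    totals: Dict[GroupKey, List[int]] = defaultdict(lambda: [0, 0])
    for newspaper, year, in_bytes, out_bytes in records:
        totals[(newspaper, year)][0] += in_bytes
        totals[(newspaper, year)][1] += out_bytes
    # A group with no output at all gets an infinite ratio so it is always flagged.
    return {
        key: (in_total / out_total if out_total else float("inf"))
        for key, (in_total, out_total) in totals.items()
    }


def out_of_range(ratios: Dict[GroupKey, float],
                 min_ratio: float, max_ratio: float) -> List[GroupKey]:
    """Return the sorted group keys whose ratio lies outside [min_ratio, max_ratio]."""
    return sorted(k for k, r in ratios.items() if not (min_ratio <= r <= max_ratio))


class TestSizeRatios(unittest.TestCase):
    def test_flags_anomalous_group(self):
        records = [
            ("paper_a", 1850, 500, 50),
            ("paper_a", 1850, 500, 50),
            ("paper_a", 1851, 1000, 1000),  # suspiciously large output
        ]
        ratios = ratios_by_newspaper_year(records)
        self.assertEqual(ratios[("paper_a", 1850)], 10.0)
        self.assertEqual(out_of_range(ratios, 5, 20), [("paper_a", 1851)])
```

Sizes are summed within each group before dividing rather than averaging per-file ratios, so many small files cannot mask one badly processed large one. Saved as a module, the test can be run with `python -m unittest`.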

This idea came out of a meeting with @fedenanni and @thobson88. Our notes are here: https://hackmd.io/KeOzeaMYTOiF37pGUjUl7A

@andrewphilipsmith added the enhancement (New feature or request) label Apr 6, 2022
@andrewphilipsmith added this to the v0.5 milestone Jun 30, 2022