Add diachronic information to QuiVer #148

Open
4 tasks done
mweidling opened this issue Jan 23, 2023 · 3 comments

@mweidling
Collaborator

mweidling commented Jan 23, 2023

Describe the feature you'd like

Currently QuiVer displays only the latest information about a workflow. When we run a workflow A, the important metrics are saved, but they are overwritten as soon as we run workflow A again.

To measure how changes in the OCR-D software affect OCR quality and hardware statistics, we should add diachronic information to QuiVer, e.g. via a time stamp.

User story

As a developer, I need an overview of how changes in the software affect OCR quality and hardware metrics so that I can be certain that the newest contributions to OCR-D really improve the software's outcome.

Ideas we have discussed so far

How to display the information

For each available GT corpus there should be a line chart that depicts how a metric has changed over time. Each step in time (x axis) represents an ocrd_all or an ocrd_core release (clarified -> ocrd_all; see comments).
Users can choose between the different metrics and see whether a metric tends to improve or not.
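
To make the chart idea concrete, here is a minimal sketch of how the data points for one metric could be derived from a list of runs. The field names (`release`, `timestamp`, `metrics`, `cer_mean`), the release labels, and all values are made up for illustration; nothing here reflects QuiVer's actual data format.

```python
from datetime import datetime

# Hypothetical run records; field names, release labels and values are illustrative only.
runs = [
    {"release": "r2", "timestamp": "2023-02-01T10:00:00+00:00", "metrics": {"cer_mean": 0.06}},
    {"release": "r1", "timestamp": "2023-01-01T10:00:00+00:00", "metrics": {"cer_mean": 0.08}},
    {"release": "r3", "timestamp": "2023-03-01T10:00:00+00:00", "metrics": {"cer_mean": 0.05}},
]


def metric_series(runs, metric):
    """Return (release, value) pairs for one metric, ordered by run time."""
    ordered = sorted(runs, key=lambda run: datetime.fromisoformat(run["timestamp"]))
    return [
        (run["release"], run["metrics"][metric])
        for run in ordered
        if metric in run["metrics"]
    ]


def tendency(series):
    """Last value minus first value; negative means the value went down."""
    if len(series) < 2:
        return 0.0
    return series[-1][1] - series[0][1]


series = metric_series(runs, "cer_mean")
print(series)            # x axis: releases, y axis: metric values
print(tendency(series))  # below zero here, i.e. the error rate dropped
```

Whether a falling or a rising value counts as an improvement depends on the metric, so the chart would need that information per metric.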

Underlying data structure

When a GT corpus is selected, the front end uses an ID map file that points it to the right collection of JSON objects. Each OCR-D workflow that is executed on a GT corpus has a separate file that contains all of its runs, one per release.

Take the GT workspace 16_ant_simple as an example. We then have a file 16_ant_simple_minimal.json with all its benchmarking workflows, a file 16_ant_simple_selected_pages.json with all its benchmarking workflows, etc. Each executed workflow has a timestamp by which the front end can sort the individual executions and retrieve the relevant data.
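
A rough sketch of the lookup described above, under a few assumptions that are not decided yet: the ID map is a plain mapping from GT workspace ID to file names, each file holds a JSON array of run objects, and each run carries an ISO 8601 `timestamp`. The function name `load_runs` is hypothetical.

```python
import json
from datetime import datetime
from pathlib import Path

# Assumed shape of the ID map: GT workspace ID -> files with the workflow runs.
ID_MAP = {
    "16_ant_simple": [
        "16_ant_simple_minimal.json",
        "16_ant_simple_selected_pages.json",
    ],
}


def load_runs(data_dir, id_map, workspace_id):
    """Load all runs of a GT workspace, sorted chronologically per file.

    Sorting happens here on purpose, so the caller does not depend on the
    order in which the runs were written to disk.
    """
    result = {}
    for filename in id_map[workspace_id]:
        text = (Path(data_dir) / filename).read_text(encoding="utf-8")
        runs = json.loads(text)  # assumed: a JSON array of run objects
        result[filename] = sorted(
            runs, key=lambda run: datetime.fromisoformat(run["timestamp"])
        )
    return result
```

Calling `load_runs("data", ID_MAP, "16_ant_simple")` would return one chronologically sorted list per workflow file, which the front end could then feed into the chart.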

TODOs

  • clarify what our steps / increments in time are. A release of ocrd_all? A release of ocrd_core?
  • add time stamps to workflow objects
  • add single files for each GT workspace + workflow. Ideally, the data should be sorted chronologically right from the start (although the front end should not depend on that)
  • create id map file
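
For the time stamp and single-file TODOs, one possible shape of the write side is sketched below: instead of overwriting the previous result, each run gets a time stamp and is appended to the file for its GT workspace + workflow. `append_run`, the `timestamp` field, and the file layout (a JSON array per file) are assumptions for illustration, not existing QuiVer code.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_run(path, run, now=None):
    """Add a time stamp to `run` and append it to the JSON array in `path`.

    `now` can be passed in for reproducible runs; it defaults to the
    current UTC time. The input dict is not modified.
    """
    path = Path(path)
    runs = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []

    stamped = dict(run)
    stamped["timestamp"] = (now or datetime.now(timezone.utc)).isoformat()
    runs.append(stamped)

    # Keep the file chronologically sorted right from the start.
    runs.sort(key=lambda r: datetime.fromisoformat(r["timestamp"]))
    path.write_text(json.dumps(runs, indent=2), encoding="utf-8")
    return runs
```

Usage could look like `append_run("16_ant_simple_minimal.json", {"metrics": {...}})` at the end of a benchmarking run, with the metrics dict being whatever the workflow already produces.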
@paulpestov

Here is the first draft according to this description:
[Image: Workflow Runs List@2x (1)]

@mweidling
Collaborator Author

clarify what our steps / increments in time are. A release of ocrd_all? A release of ocrd_core?

According to @kba this doesn't matter much so I will opt for ocrd_all.

@cneud
Member

cneud commented Apr 26, 2023

+1 for basing this on ocrd_all.
