Currently QuiVer displays only the latest information about a workflow. When we run a workflow A, its important metrics are saved, and they are overwritten when we run workflow A again.
In order to measure how changes in the OCR-D software impact the OCR quality as well as the hardware statistics, we should introduce diachronic information to QuiVer, e.g. via a time stamp.
User story
As a developer I need an overview of how changes in the software affect the OCR quality and hardware metrics in order to be certain that the newest contributions to OCR-D really improve the software's outcome.
Ideas we have discussed so far
How to display the information
For each available GT corpus there should be a line chart that depicts how a metric has changed over time. Each step in time (x axis) represents an ocrd_all or an ocrd_core release (clarified -> ocrd_all; see comments).
Users can choose between the different metrics and can see whether a metric tends to improve or not.
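A minimal sketch of how such a tendency could be derived from the chart data, assuming one value per release in release order. The function name `tendency`, the release labels, and the `lower_is_better` flag are hypothetical and only serve as illustration; they are not part of QuiVer.

```python
from typing import List, Tuple

# Hypothetical shape: (release label, metric value) pairs, already in release order.
Series = List[Tuple[str, float]]


def tendency(series: Series, lower_is_better: bool = False) -> str:
    """Compare the first and the last value of a metric series.

    For error rates such as CER a lower value is better, so the caller
    has to say in which direction "improvement" points.
    """
    if len(series) < 2:
        return "unknown"  # a single run has no tendency
    first, last = series[0][1], series[-1][1]
    if last == first:
        return "unchanged"
    improved = last < first if lower_is_better else last > first
    return "improved" if improved else "worsened"


# Made-up values for illustration only.
cer_series = [("release-a", 0.12), ("release-b", 0.10), ("release-c", 0.08)]
print(tendency(cer_series, lower_is_better=True))
```

Comparing only the first and last point is the simplest possible choice; a real chart might rather show the change between neighbouring releases.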
Underlying data structure
When a GT corpus is selected, the front end uses an ID map file that points it to the right collection of JSON objects. Each OCR-D workflow that is executed on a GT corpus has a separate file in which all the runs per release are present.
Take the GT workspace 16_ant_simple as an example. We then have a file 16_ant_simple_minimal.json with all its benchmarking workflows, a file 16_ant_simple_selected_pages.json with all its benchmarking workflows, etc. Each executed workflow has a timestamp by which the front end can sort the single executions and retrieve the relevant data.
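The lookup and sorting described above could be sketched roughly as follows. The layout of the ID map and the field names `timestamp`, `release` and `metrics` are assumptions made for this example, as are all values; the actual format still has to be decided.

```python
import json
from datetime import datetime

# Hypothetical ID map: GT workspace ID -> files, one per workflow.
ID_MAP = {
    "16_ant_simple": [
        "16_ant_simple_minimal.json",
        "16_ant_simple_selected_pages.json",
    ]
}

# Hypothetical content of 16_ant_simple_minimal.json (deliberately unsorted).
RUNS_JSON = """[
  {"timestamp": "2023-02-01T10:00:00", "release": "release-b", "metrics": {"cer": 0.10}},
  {"timestamp": "2023-01-01T10:00:00", "release": "release-a", "metrics": {"cer": 0.12}}
]"""


def files_for(workspace_id):
    """Return the workflow files registered for a GT workspace."""
    return ID_MAP.get(workspace_id, [])


def metric_series(runs, metric):
    """Sort runs by timestamp and return (release, value) pairs for one metric."""
    ordered = sorted(runs, key=lambda r: datetime.fromisoformat(r["timestamp"]))
    return [(r["release"], r["metrics"][metric])
            for r in ordered if metric in r["metrics"]]


runs = json.loads(RUNS_JSON)
print(files_for("16_ant_simple"))
print(metric_series(runs, "cer"))
```

Sorting on read means the front end does not depend on the files being stored in chronological order.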
TODOs
clarify what our steps / increments in time are. A release of ocrd_all? A release of ocrd_core?
add time stamps to workflow objects
add single files for each GT workspace + workflow. Ideally, the data should be sorted chronologically right from the start (although the front end should not depend on that)
create id map file
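To illustrate the TODOs above, here is a rough sketch of appending a time-stamped run to a per-workspace, per-workflow file instead of overwriting the previous result. The helper `append_run`, its `now` parameter, and the `timestamp` field name are hypothetical; the values are made up.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_run(path: Path, run: dict, now: datetime = None) -> list:
    """Add a time-stamped copy of `run` to the JSON list stored at `path`."""
    runs = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    stamped = dict(run)  # do not mutate the caller's object
    stamped["timestamp"] = (now or datetime.now(timezone.utc)).isoformat()
    runs.append(stamped)
    # Keep the file in chronological order right from the start.
    runs.sort(key=lambda r: datetime.fromisoformat(r["timestamp"]))
    path.write_text(json.dumps(runs, indent=2), encoding="utf-8")
    return runs


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "16_ant_simple_minimal.json"
    append_run(target, {"metrics": {"cer": 0.10}},
               now=datetime(2023, 2, 1, tzinfo=timezone.utc))
    result = append_run(target, {"metrics": {"cer": 0.12}},
                        now=datetime(2023, 1, 1, tzinfo=timezone.utc))
    print([r["timestamp"] for r in result])
```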