This repository has been archived by the owner on Jul 20, 2021. It is now read-only.
There's a DataSet state model inherited from the days when that state was entirely managed by the Sip-Creator, and it has evolved a little to accommodate changes in the processing model.
The current states are:
INCOMPLETE
PARSING
UPLOADED
QUEUED
PROCESSING
CANCELLED
ENABLED
DISABLED
ERROR
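For reference, the current states could be captured as a simple enumeration. This is only a sketch; the hub is not written in Python and the names below just mirror the list above.

```python
from enum import Enum

class DataSetState(Enum):
    """The nine current DataSet states, as listed in the issue."""
    INCOMPLETE = "incomplete"
    PARSING = "parsing"
    UPLOADED = "uploaded"
    QUEUED = "queued"
    PROCESSING = "processing"
    CANCELLED = "cancelled"
    ENABLED = "enabled"
    DISABLED = "disabled"
    ERROR = "error"
```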
The event model now aims to reflect every change of state and other events related to a DataSet. The existing events are:
Created
Updated
Removed
StateChanged
Error (propagates the error message)
Locked
Unlocked
SourceRecordCountChanged (transient event, only used to give progress feedback in the UI)
ProcessedRecordCountChanged (transient event, only used to give progress feedback in the UI)
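The distinction between persistent and transient events could be modelled explicitly. The following is a hypothetical sketch (the event names come from the list above, everything else is invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class DataSetEvent:
    """A DataSet event; transient events only feed UI progress display."""
    name: str
    payload: dict = field(default_factory=dict)
    transient: bool = False

# hypothetical constructors for two of the events listed above
def state_changed(new_state):
    return DataSetEvent("StateChanged", {"state": new_state})

def processed_record_count_changed(count):
    # transient: not persisted, only used for UI progress feedback
    return DataSetEvent("ProcessedRecordCountChanged",
                        {"count": count}, transient=True)
```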
There are, however, a number of events or state changes (depending on the semantics of the state model) that are not captured:
when a new mapping is uploaded to the hub from the Sip-Creator
when a new set of invalid record indexes is uploaded to the hub from the Sip-Creator
when Sip-Creator hints are uploaded
when a set is being downloaded from the Sip-Creator (this could be inferred by the state changing to Locked, but it's not exactly the same)
Some of the above events, although discrete, could perhaps be viewed as one, since e.g. mappings, invalid records (and hints?) are usually connected on an abstract level. These events have an impact on other, not yet existing, states: for example a new mapping means that the set is "outdated" in some way, as the new mapping probably changes what the data and the index look like.
We should think of a better state model for the DataSet, with various states related to the different parts of the life-cycle:
the creation life-cycle (created, meta-data updated, deleted)
the publishing life-cycle (queued, processing, enabled, disabled)
the provisioning life-cycle (mapping uploaded, invalid record indexes uploaded, statistics uploaded, hints uploaded, storing source, source uploaded)
the usage life-cycle (locked, unlocked, downloading source?)
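One way to make the life-cycle grouping concrete is a lookup from state to life-cycle. This is a hypothetical sketch of the proposal above; the state names are lowercased versions of the list, and the grouping itself is still the open design question.

```python
# Hypothetical grouping of the proposed states by life-cycle.
LIFE_CYCLES = {
    "creation": {"created", "meta_data_updated", "deleted"},
    "publishing": {"queued", "processing", "enabled", "disabled"},
    "provisioning": {"mapping_uploaded", "invalid_records_uploaded",
                     "statistics_uploaded", "hints_uploaded",
                     "storing_source", "source_uploaded"},
    "usage": {"locked", "unlocked", "downloading_source"},
}

def life_cycle_of(state):
    """Return the life-cycle a state belongs to, or None if unknown."""
    for cycle, states in LIFE_CYCLES.items():
        if state in states:
            return cycle
    return None
```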
It might also make sense to consider splitting the publishing life-cycle so that it is possible to index without re-creating the cache. For this, however, the state model needs to reflect whether the cache is outdated relative to the mapping.
I think a lot of the above would get clearer if we could somehow bundle the Sip-Creator "meta-information" (mapping, invalid records, hints, statistics) and keep versions thereof. We dropped the versioning of source data for the time being as it does not effectively bring any added value and is technically not viable at the moment: a lot of the "versions" created contained identical data and were only versioned because of problems in identifying the same records.
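Bundling the meta-information and versioning it could look roughly like this. A hypothetical sketch: the bundle fields come from the list above, and deduplicating identical consecutive versions addresses the "identical data versioned anyway" problem mentioned for source data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SipCreatorBundle:
    """Hypothetical bundle of Sip-Creator meta-information."""
    mapping: str
    invalid_records: tuple
    hints: str
    statistics: str

class VersionedBundles:
    """Keeps versions of bundles, skipping identical consecutive ones."""
    def __init__(self):
        self._versions = []

    def add(self, bundle):
        # identical to the latest version: no new version is created
        if self._versions and self._versions[-1] == bundle:
            return len(self._versions) - 1
        self._versions.append(bundle)
        return len(self._versions) - 1

    def latest(self):
        return self._versions[-1] if self._versions else None
```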
I think that the state machine is the core concept orchestrating the internal and external aspects of the dataset workflow, so it is of paramount importance that we devise and refine this state machine in the context of a unit test which covers all possible situations.
I'm thinking more in terms of a state composed of a number of bits rather than an enumeration of all individual plausible states. Many of the state transitions could pay attention to only one or two of these bits, which avoids the quadratic explosion of state transitions. In other words, many of these bits cast shadows on the others.
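The "state composed of bits" idea could be sketched with independent flags, so that each transition touches only the bits it cares about. This is an illustrative sketch, not the hub's actual model; the flag names and transition rules are assumptions.

```python
from enum import Flag, auto

class SetFlags(Flag):
    """Independent aspects of a DataSet's state, as bits."""
    NONE = 0
    SOURCE_UPLOADED = auto()
    MAPPING_PRESENT = auto()
    CACHE_OUTDATED = auto()
    LOCKED = auto()
    ENABLED = auto()

def upload_mapping(flags):
    # a new mapping outdates the cache, regardless of the other bits
    return flags | SetFlags.MAPPING_PRESENT | SetFlags.CACHE_OUTDATED

def can_process(flags):
    # processing pays attention to only a few bits, avoiding the
    # quadratic explosion of enumerated state transitions
    return (SetFlags.SOURCE_UPLOADED in flags
            and SetFlags.MAPPING_PRESENT in flags
            and SetFlags.LOCKED not in flags)
```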
It should be possible to upload data before any proper mapping has been performed so that one person can upload the data and hand it over to somebody else for building a mapping.
Actually, this is already possible. If you then try to process a set with missing mappings, they are simply ignored during the run. Of course, it would be better if the interface didn't let you process at all in that case.