Optionally support a libsql connection URI which will be used to track jobs as they are processed by twine-writerd or twine-cli.
A job consists of:
A UUID to identify it
Optionally, a parent UUID
A URI to identify it: the canonical source or target URI, depending upon the processing pipeline, or simply a urn:uuid: representation of the job UUID if nothing else is suitable (workflow components may update it accordingly during processing)
Timestamps for added and updated
A status: WAITING, ACTIVE, ABORTED (by the user), COMPLETE, FAILED, ERRORS (partial failure)
A status annotation (free-text) which may be set to indicate the failure reason
If active, the cluster/instance details of the node processing the job (preserved for diagnosis once set)
Processing item x of y progress indicators (particularly for bulk ingests from filesystem sources)
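As a rough illustration of the fields listed above, the following sketch mirrors one job row as a C structure and builds the fallback urn:uuid: URI. The names job_record, job_status_name() and job_default_uri(), and all buffer sizes, are hypothetical and not part of libtwine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical status values matching the list above */
typedef enum {
	JOB_WAITING, JOB_ACTIVE, JOB_ABORTED, JOB_COMPLETE, JOB_FAILED, JOB_ERRORS
} job_status;

/* Hypothetical in-memory mirror of one job row; sizes are illustrative */
typedef struct {
	char uuid[37];        /* canonical 36-character UUID plus NUL */
	char parent[37];      /* empty string when the job has no parent */
	char uri[256];        /* source/target URI, or the urn:uuid: fallback */
	time_t added;
	time_t updated;
	job_status status;
	char annotation[256]; /* free-text, e.g. a failure reason */
	char node[64];        /* cluster/instance details, kept once set */
	int current;          /* "item x" of the progress indicator, -1 if unknown */
	int total;            /* "of y", -1 if unknown */
} job_record;

/* Map a status value to the name used in the list above */
const char *job_status_name(job_status s)
{
	switch(s)
	{
	case JOB_WAITING:  return "WAITING";
	case JOB_ACTIVE:   return "ACTIVE";
	case JOB_ABORTED:  return "ABORTED";
	case JOB_COMPLETE: return "COMPLETE";
	case JOB_FAILED:   return "FAILED";
	case JOB_ERRORS:   return "ERRORS";
	}
	return "UNKNOWN";
}

/* Build the fallback URI for a job; returns 0 on success, -1 if buf is too small */
int job_default_uri(const char *uuid, char *buf, size_t buflen)
{
	int n;

	assert(uuid != NULL && buf != NULL);
	n = snprintf(buf, buflen, "urn:uuid:%s", uuid);
	return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

Keeping the unknown-progress case as -1 in both counters lines up with the sentinel convention used by the sketched twine_job_set_progress().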
Where possible, UUIDs should be taken from the source, if it incorporates one into its identification; otherwise they should be generated on-the-fly.
A job stack should be maintained internally to libtwine in order to track parent/child relationships, rather than requiring callers to make them explicit.
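One way to picture the internal job stack is a small fixed-depth array of UUID strings, where the top entry is the implicit parent of any job created next. This is only a sketch under that assumption; job_stack, its functions, and the depth limit are hypothetical names, not libtwine API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define JOB_STACK_MAX 16   /* illustrative nesting limit */
#define JOB_UUID_LEN  37   /* 36-character UUID plus NUL */

/* Hypothetical stack of the UUIDs of jobs currently being processed */
typedef struct {
	char uuids[JOB_STACK_MAX][JOB_UUID_LEN];
	size_t depth;
} job_stack;

/* The implicit parent for a newly-created job, or NULL at top level */
const char *job_stack_parent(const job_stack *st)
{
	assert(st != NULL);
	return st->depth ? st->uuids[st->depth - 1] : NULL;
}

/* Enter a job: it becomes the parent of jobs created until it is popped */
int job_stack_push(job_stack *st, const char *uuid)
{
	assert(st != NULL && uuid != NULL);
	if(st->depth >= JOB_STACK_MAX || strlen(uuid) >= JOB_UUID_LEN)
	{
		return -1;
	}
	strcpy(st->uuids[st->depth], uuid);
	st->depth++;
	return 0;
}

/* Leave the current job; returns -1 if the stack is already empty */
int job_stack_pop(job_stack *st)
{
	assert(st != NULL);
	if(st->depth == 0)
	{
		return -1;
	}
	st->depth--;
	return 0;
}
```

With this shape, creating a job would consult job_stack_parent() and closing it would pop, so callers never pass a parent explicitly.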
As an example, an ingest of N-Quads from a file, processed with spindle-correlate, might yield the following:
A job is created in state WAITING with a newly-generated UUID and a file:/// URI
The N-Quads are parsed and the number of graphs determined; the job is updated to state ACTIVE, with progress set to 0 of number-of-graphs
For each graph that is correlated by Spindle, progress is updated, and a new child job is created in state WAITING, using the Spindle-generated UUID and URI
Once processing of the N-Quads is complete, the job status is updated to COMPLETE
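The four steps above can be traced with a toy in-memory model. This is a simulation only: sim_job and sim_ingest() are hypothetical names, no database is touched, and child jobs are merely counted rather than created.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SIM_WAITING, SIM_ACTIVE, SIM_COMPLETE } sim_status;

/* Hypothetical stand-in for the parent ingest job */
typedef struct {
	sim_status status;
	int current;          /* graphs correlated so far, -1 if unknown */
	int total;            /* number of graphs, -1 if unknown */
	int children_waiting; /* child jobs created in state WAITING */
} sim_job;

/* Walk one ingest of ngraphs graphs through the states described above */
void sim_ingest(sim_job *job, int ngraphs)
{
	int i;

	assert(job != NULL && ngraphs >= 0);
	/* 1. The job is created in state WAITING; progress is not yet known */
	job->status = SIM_WAITING;
	job->current = -1;
	job->total = -1;
	job->children_waiting = 0;
	/* 2. The N-Quads are parsed: ACTIVE, progress 0 of number-of-graphs */
	job->status = SIM_ACTIVE;
	job->current = 0;
	job->total = ngraphs;
	/* 3. Each correlated graph advances progress and adds a WAITING child */
	for(i = 0; i < ngraphs; i++)
	{
		job->current = i + 1;
		job->children_waiting++;
	}
	/* 4. Processing of the N-Quads is complete */
	job->status = SIM_COMPLETE;
}
```

Note that the parent reaches its final state while its children are still WAITING; they are picked up later by the proxy-generation step.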
As spindle-generate later processes its queue of items, it performs the following:
A job is created in state WAITING using the Spindle-generated UUID and URI; if it already exists, its parentage is preserved (thus, if the job originated from an ingest as described above, the proxy-generation step maintains the parent-child relationship, allowing for ready visualisation)
As the proxy is generated, its status is updated accordingly
With this arrangement, a small number of relatively simple SQL queries can provide progress tracking and volumetrics across a processing cluster.
Open question: how would Twine know when to preserve versus replace the parent of a job?
Perhaps it could be as simple as user action (i.e., twine-cli) taking precedence over an on-going process — thus, a queue-driven twine-writerd will only set the parent of a job if it's newly-created, whereas twine-cli will always override it. Both would create an overarching job for their processing runs, whether that's from a file or a queue.
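The proposed rule reduces to a two-input decision: the parentage mode of the run, and whether a row for the job's UUID already exists. A minimal sketch, assuming that a queue-driven twine-writerd runs in a "preserve" mode and twine-cli in a "force" mode; parent_mode and should_set_parent() are hypothetical names chosen for illustration.

```c
/* Hypothetical modes: PRESERVE for queue-driven runs, FORCE for user action */
typedef enum { PARENT_PRESERVE, PARENT_FORCE } parent_mode;

/* Returns nonzero if the job's parent should be written.
 * row_exists is nonzero when a row for the job's UUID is already present.
 */
int should_set_parent(parent_mode mode, int row_exists)
{
	if(mode == PARENT_FORCE)
	{
		/* User action always overrides any existing parentage */
		return 1;
	}
	/* Otherwise only a newly-created job receives a parent */
	return !row_exists;
}
```

Under this reading, re-ingesting a file via twine-cli re-parents existing jobs to the new run, while later queue processing leaves that parentage alone.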
Sketched interface to be implemented as part of libtwine to support this functionality:
typedef /*opaque*/ struct twine_job_struct TWINEJOB;

typedef enum
{
	TJS_WAITING,
	TJS_ACTIVE,
	TJS_ABORTED,
	TJS_COMPLETED,
	TJS_FAILED,
	TJS_ERRORS
} TWINEJOBSTATUS;

typedef enum
{
	TJP_PRESERVE,
	TJP_FORCE
} TWINEJOBPARENTAGE;

/* This is a relatively low-level libtwine API: the only side-effects are limited to
 * twine_job_create() creating or updating rows depending upon the parentage
 * mode of the current parent job and whether a row for that UUID exists or not.
 */
TWINEJOB *twine_job_create(const uuid_t uuid, const char *restrict uri, CLUSTER *restrict /*optional*/ cluster);
int twine_job_close(TWINEJOB *job);
const char *twine_job_uristr(TWINEJOB *job);
int twine_job_set_uristr(TWINEJOB *restrict job, const char *restrict uri);
/* NB: possibly require URI and librdf_uri variants of the above */
int twine_job_set_parentage(TWINEJOB *job, TWINEJOBPARENTAGE mode);
int twine_job_update(TWINEJOB *restrict job, TWINEJOBSTATUS status, const char *restrict /*optional*/ annotation);
int twine_job_set_progress(TWINEJOB *job, int /*optional*/ current, int /*optional*/ total);
/* NB: twine_job_set_progress() uses -1 as a sentinel to indicate NULL integer values;
 * these will cause the corresponding progress value to be left unchanged:
 * twine_job_set_progress(job, -1, -1); is therefore a no-op
 */
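The -1 sentinel described for twine_job_set_progress() could behave along these lines. This is a standalone sketch, not the libtwine implementation: the progress structure and progress_set() are hypothetical, and rejecting values below -1 is an added assumption.

```c
#include <stddef.h>

/* Hypothetical holder for the "x of y" progress counters */
typedef struct {
	int current;
	int total;
} progress;

/* Update either counter; -1 means "leave this value unchanged".
 * Returns 0 on success, -1 on invalid arguments (assumed behaviour).
 */
int progress_set(progress *p, int current, int total)
{
	if(p == NULL || current < -1 || total < -1)
	{
		return -1;
	}
	if(current != -1)
	{
		p->current = current;
	}
	if(total != -1)
	{
		p->total = total;
	}
	return 0;
}
```

This lets a worker bump only the current counter once the total has been established, for example progress_set(&p, 3, -1) after the third graph.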
Tracked as RESDATA-1279