Identifier for a specific document from a specific source.

Column | Type | Indexed | Nullable | FK | Default | Description |
---|---|---|---|---|---|---|
identifier | text | | | | | Identifier given to the document by the source |
source_config_id | int | | | ✓ | | SourceConfig used to harvest and ingest the document |

Other indices:
- identifier, source_config_id (unique)
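
The composite unique index makes a SUID a "get or create" lookup on the pair (identifier, source_config_id). The sketch below shows that behaviour with the standard-library `sqlite3` module. SQLite is only a stand-in for the real database, and the table names `suid` and `source_config` and the helper `get_or_create_suid` are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the SUID table (sketch only; the table
# names "suid" and "source_config" are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE source_config (id INTEGER PRIMARY KEY);
    CREATE TABLE suid (
        id INTEGER PRIMARY KEY,
        identifier TEXT NOT NULL,
        source_config_id INTEGER NOT NULL REFERENCES source_config (id),
        -- the composite "(unique)" index from the table above
        UNIQUE (identifier, source_config_id)
    );
    INSERT INTO source_config (id) VALUES (1), (2);
    """
)


def get_or_create_suid(identifier: str, source_config_id: int) -> int:
    """Return the SUID row id for this pair, inserting it only if it is new."""
    conn.execute(
        "INSERT OR IGNORE INTO suid (identifier, source_config_id) VALUES (?, ?)",
        (identifier, source_config_id),
    )
    row = conn.execute(
        "SELECT id FROM suid WHERE identifier = ? AND source_config_id = ?",
        (identifier, source_config_id),
    ).fetchone()
    return row[0]


first = get_or_create_suid("oai:example.org:123", 1)
again = get_or_create_suid("oai:example.org:123", 1)  # same pair: existing row
other = get_or_create_suid("oai:example.org:123", 2)  # other config: new row
```

The same document identifier therefore yields one SUID per SourceConfig, not one globally.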

A piece of raw data, exactly as it was given to SHARE.

Column | Type | Indexed | Nullable | FK | Default | Description |
---|---|---|---|---|---|---|
suid_id | int | ✓ | | ✓ | | SUID for this datum |
datum | text | | | | | The raw datum itself (typically a JSON or XML string) |
sha256 | text | | | | | SHA-256 hash of the datum |
harvest_jobs | m2m | | | | | List of HarvestJobs for harvester runs that found this exact datum |
ingest_jobs | m2m | | | | | List of IngestJobs that ingested this datum |

Other indices:
- suid_id, sha256 (unique)
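
Because (suid_id, sha256) is unique, re-harvesting an unchanged document should reuse the existing RawDatum and only add the new HarvestJob to `harvest_jobs`. Below is a minimal in-memory sketch of that deduplication. The class names are hypothetical, a set of job ids stands in for the m2m relation, and hashing the UTF-8 encoding of the datum is an assumption.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


def sha256_hex(datum: str) -> str:
    """Hex SHA-256 of the datum; UTF-8 encoding is an assumption."""
    return hashlib.sha256(datum.encode("utf-8")).hexdigest()


@dataclass
class RawDatum:
    suid_id: int
    datum: str
    sha256: str
    harvest_job_ids: Set[int] = field(default_factory=set)  # stands in for the m2m


class RawDatumStore:
    """Hypothetical in-memory store enforcing the (suid_id, sha256) unique index."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[int, str], RawDatum] = {}

    def store(self, suid_id: int, datum: str, harvest_job_id: int) -> RawDatum:
        key = (suid_id, sha256_hex(datum))
        raw = self.rows.get(key)
        if raw is None:  # first time this exact datum is seen for this SUID
            raw = RawDatum(suid_id=suid_id, datum=datum, sha256=key[1])
            self.rows[key] = raw
        raw.harvest_job_ids.add(harvest_job_id)  # record which run found it
        return raw


store = RawDatumStore()
a = store.store(1, '{"title": "Example"}', harvest_job_id=10)
b = store.store(1, '{"title": "Example"}', harvest_job_id=11)  # unchanged re-harvest
c = store.store(1, '{"title": "Example v2"}', harvest_job_id=12)  # changed upstream
```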

Describes one way to harvest metadata from a Source, and how to transform the result.

Column | Type | Indexed | Nullable | FK | Default | Description |
---|---|---|---|---|---|---|
source_id | int | ✓ | | ✓ | | Source to harvest from |
base_url | text | | ✓ | | | URL of the API or endpoint where the metadata is available |
earliest_date | date | | ✓ | | | Earliest date with available data |
rate_limit_allowance | int | | | | 5 | Number of requests allowed every rate_limit_period seconds |
rate_limit_period | int | | | | 1 | Length, in seconds, of the window in which rate_limit_allowance requests are allowed |
harvester_id | int | ✓ | ✓ | ✓ | | Harvester to use |
harvester_kwargs | jsonb | | ✓ | | | JSON object passed to the harvester as kwargs |
transformer_id | int | ✓ | ✓ | ✓ | | Transformer to use |
transformer_kwargs | jsonb | | ✓ | | | JSON object passed to the transformer as kwargs, along with the harvested raw data |
disabled | bool | | | | False | True if this source config should not be run automatically |
version | int | | | | 0 | Version of this source config |
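
One way to honour rate_limit_allowance and rate_limit_period is a sliding-window limiter. It allows at most `allowance` requests in any `period`-second window, so the defaults give 5 requests per second. The `RateLimiter` class below is a hypothetical sketch, not SHARE's harvester code. The clock and sleep functions are injectable so the example runs instantly with a fake clock.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Sliding window: at most `allowance` calls within any `period` seconds."""

    def __init__(self, allowance: int = 5, period: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.allowance = allowance  # e.g. SourceConfig.rate_limit_allowance
        self.period = period        # e.g. SourceConfig.rate_limit_period
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()  # times of recent requests

    def wait(self) -> None:
        """Block until one more request is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop requests that have left the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.allowance:
                self._stamps.append(now)
                return
            # Sleep until the oldest request in the window expires.
            self._sleep(self.period - (now - self._stamps[0]))


# Usage with a fake clock: the sixth request must wait one full period.
fake_now = [0.0]
limiter = RateLimiter(allowance=5, period=1.0,
                      clock=lambda: fake_now[0],
                      sleep=lambda s: fake_now.__setitem__(0, fake_now[0] + s))
for _ in range(6):
    limiter.wait()
```

A harvester would call `wait()` before each HTTP request. A token bucket would also work, but a sliding window matches the "N requests every M seconds" wording most directly.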

A Source is a place metadata comes from.

Column | Type | Indexed | Nullable | FK | Default | Description |
---|---|---|---|---|---|---|
name | text | unique | | | | Short name |
long_title | text | unique | | | | Full, human-friendly name |
home_page | text | | ✓ | | | URL of the source's home page |
icon | image | | ✓ | | | Recognizable icon for the source |
user_id | int | ✓ | | ✓ | | User with permission to submit data as this source (TODO: replace with Django permissions) |

Each row corresponds to a Harvester implementation in Python.

Column | Type | Indexed | Nullable | FK | Default | Description |
---|---|---|---|---|---|---|
key | text | unique | | | | Key that can be used to get the corresponding Harvester subclass |
date_created | datetime | | | | now | Datetime this row was created |
date_modified | datetime | | | | now (on update) | Datetime this row was last modified |

Each row corresponds to a Transformer implementation in Python.

Column | Type | Indexed | Nullable | FK | Default | Description |
---|---|---|---|---|---|---|
key | text | unique | | | | Key that can be used to get the corresponding Transformer subclass |
date_created | datetime | | | | now | Datetime this row was created |
date_modified | datetime | | | | now (on update) | Datetime this row was last modified |
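
The Harvester and Transformer tables store only a `key`, so the Python side needs some way to map a key to a subclass. One simple option is an in-process registry populated by a class decorator. The `Registry` class, the base classes and the example keys below are all hypothetical; SHARE may resolve keys differently. The `**kwargs` stand in for `harvester_kwargs` / `transformer_kwargs` from SourceConfig.

```python
from typing import Callable, Dict, Generic, Type, TypeVar

T = TypeVar("T")


class Registry(Generic[T]):
    """Maps a `key` string (as stored in the table) to the class implementing it."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._classes: Dict[str, Type[T]] = {}

    def register(self, key: str) -> Callable[[Type[T]], Type[T]]:
        def decorator(cls: Type[T]) -> Type[T]:
            if key in self._classes:  # keys are unique, like the table column
                raise ValueError(f"duplicate {self.kind} key {key!r}")
            self._classes[key] = cls
            return cls
        return decorator

    def get(self, key: str) -> Type[T]:
        try:
            return self._classes[key]
        except KeyError:
            raise LookupError(f"no {self.kind} registered under {key!r}") from None


class BaseHarvester:
    def __init__(self, **kwargs) -> None:
        self.kwargs = kwargs  # e.g. SourceConfig.harvester_kwargs


class BaseTransformer:
    def __init__(self, **kwargs) -> None:
        self.kwargs = kwargs  # e.g. SourceConfig.transformer_kwargs


harvesters: Registry[BaseHarvester] = Registry("Harvester")
transformers: Registry[BaseTransformer] = Registry("Transformer")


@harvesters.register("org.example.oai")  # hypothetical key
class ExampleOAIHarvester(BaseHarvester):
    pass


@transformers.register("org.example.oai_dc")  # hypothetical key
class ExampleOAIDCTransformer(BaseTransformer):
    pass


harvester = harvesters.get("org.example.oai")(metadata_prefix="oai_dc")
```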

Job entries to track the status of a specific harvester run.

Column | Type | Indexed | Nullable | FK | Default | Description |
---|---|---|---|---|---|---|
task_id | uuid | | ✓ | | | UUID of the Celery task running the harvester |
status | enum(int) | ✓ | | | created | Status of the harvester run, one of {created, started, failed, succeeded, rescheduled, forced, skipped, retried} |
context | text | | | | "" | A custom message or traceback describing why this job failed or was skipped |
completions | int | | | | 0 | The number of times the status has been set to succeeded |
date_started | datetime | | ✓ | | | Datetime the status was last set to started |
date_created | datetime | | | | now | Datetime this row was created |
date_modified | datetime | ✓ | | | now (on update) | Datetime this row was last modified |
source_config_id | int | ✓ | | ✓ | | SourceConfig for this harvester run |
share_version | text | | | | UNKNOWN | The commitish at the time this job was last run |
source_config_version | int | | | | | Version of the SourceConfig on the last attempted run |
start_date | date | ✓ | | | | Beginning of the date range to harvest |
end_date | date | ✓ | | | | End of the date range to harvest |
harvester_version | int | | | | | Version of the harvester on the last attempted run |

Other indices:
- source_config_id, start_date, end_date, harvester_version, source_config_version (unique)
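
The bookkeeping columns interact in a specific way. `date_started` is overwritten each time the job enters `started`, `completions` counts every transition to `succeeded`, and `context` explains a failure or skip. The unique index means one row per (config, date range, versions) combination. The dataclass below is a hypothetical sketch of those rules. Only the status names come from the table; their integer values and the method names are assumptions.

```python
import enum
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Optional, Tuple


class JobStatus(enum.IntEnum):
    # Names come from the status column; the integer values are assumed.
    created = 0
    started = 1
    failed = 2
    succeeded = 3
    rescheduled = 4
    forced = 5
    skipped = 6
    retried = 7


@dataclass
class HarvestJob:
    source_config_id: int
    start_date: date
    end_date: date
    harvester_version: int
    source_config_version: int
    status: JobStatus = JobStatus.created
    context: str = ""
    completions: int = 0
    date_started: Optional[datetime] = None

    def unique_key(self) -> Tuple[int, date, date, int, int]:
        # Same column order as the (unique) index on this table.
        return (self.source_config_id, self.start_date, self.end_date,
                self.harvester_version, self.source_config_version)

    def start(self, now: Optional[datetime] = None) -> None:
        self.status = JobStatus.started
        self.date_started = now or datetime.now(timezone.utc)  # last time started

    def fail(self, message: str) -> None:
        self.status = JobStatus.failed
        self.context = message  # message or traceback explaining the failure

    def skip(self, reason: str) -> None:
        self.status = JobStatus.skipped
        self.context = reason

    def succeed(self) -> None:
        self.status = JobStatus.succeeded
        self.completions += 1  # counts every transition to succeeded


job = HarvestJob(1, date(2017, 1, 1), date(2017, 1, 2),
                 harvester_version=1, source_config_version=3)
job.start()
job.fail("HTTP 503 from the source")
job.start()  # a retry of the same date range reuses the same row
job.succeed()
```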

Job entries to track the status of an ingest task.

Column | Type | Indexed | Nullable | FK | Default | Description |
---|---|---|---|---|---|---|
task_id | uuid | | ✓ | | | UUID of the Celery task running the ingestion |
status | enum(int) | ✓ | | | created | Status of the ingest run, one of {created, started, failed, succeeded, rescheduled, forced, skipped, retried} |
context | text | | | | "" | A custom message or traceback describing why this job failed or was skipped |
completions | int | | | | 0 | The number of times the status has been set to succeeded |
date_started | datetime | | ✓ | | | Datetime the status was last set to started |
date_created | datetime | | | | now | Datetime this row was created |
date_modified | datetime | ✓ | | | now (on update) | Datetime this row was last modified |
share_version | text | | | | UNKNOWN | The commitish at the time this job was last run |
suid_id | int | ✓ | | ✓ | | SUID of the document to ingest |
raw_id | int | ✓ | | ✓ | | The RawDatum this job will (or did) ingest |
source_config_version | int | | | | | Version of the SUID's SourceConfig on the last attempted run |
transformer_version | int | | | | | Version of the Transformer |
regulator_version | int | | | | | Version of the Regulator |
transformed_datum | text | | ✓ | | | Serialized output from the Transformer |
regulator_logs | o2m | | | | | List of RegulatorLogs for this ingestion run |
regulated_datum | text | | ✓ | | | Serialized output from the Regulator |

Other indices:
- suid_id, latest_raw_id, source_config_version, transformer_version, regulator_version (unique)
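
The unique index implies that a new IngestJob row is needed only when the raw datum or one of the three recorded versions changes. Otherwise the existing row can be reused and rerun. The sketch below models that decision with a `NamedTuple` key. The field names mirror the index; `IngestKey` and `plan_ingest` are hypothetical helpers.

```python
from typing import Dict, NamedTuple


class IngestKey(NamedTuple):
    """Mirrors the (unique) index on IngestJob."""
    suid_id: int
    latest_raw_id: int
    source_config_version: int
    transformer_version: int
    regulator_version: int


def plan_ingest(existing: Dict[IngestKey, dict], key: IngestKey) -> str:
    """Return "reuse" if a job row already covers this key, else create one."""
    if key in existing:
        return "reuse"  # same raw datum, same config/transformer/regulator versions
    existing[key] = {"status": "created"}
    return "create"


jobs: Dict[IngestKey, dict] = {}
k1 = IngestKey(suid_id=7, latest_raw_id=42, source_config_version=1,
               transformer_version=3, regulator_version=2)
first = plan_ingest(jobs, k1)
again = plan_ingest(jobs, k1)
bumped = plan_ingest(jobs, k1._replace(transformer_version=4))  # new transformer code
```

Under this reading, deploying a new Transformer version makes every SUID eligible for a fresh IngestJob while the old rows are kept as history.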

Log of a single operation that happened during regulation.

Column | Type | Indexed | Nullable | FK | Default | Description |
---|---|---|---|---|---|---|
description | text | | | | | Human-readable description of what happened |
node_id | text | | ✓ | | | ID of the node in the graph which was affected by this operation (if only one) |
rejected | bool | | | | | Whether a problem was encountered which caused ingestion to fail |
ingest_job_id | int | ✓ | | ✓ | | IngestJob this log belongs to |
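
Since `rejected` marks a problem that fails ingestion, one rejected log is enough to fail the whole IngestJob. The sketch below checks for that condition and joins the rejected entries into a single message. The helper names are hypothetical, and using the joined message as the job's `context` is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegulatorLog:
    description: str
    rejected: bool
    node_id: Optional[str] = None  # None when no single node was affected
    ingest_job_id: Optional[int] = None


def ingestion_failed(logs: List[RegulatorLog]) -> bool:
    """A single rejected log is enough to fail the ingestion run."""
    return any(log.rejected for log in logs)


def rejection_context(logs: List[RegulatorLog]) -> str:
    """Hypothetical: summarize rejected logs, e.g. for IngestJob.context."""
    return "; ".join(
        f"{log.node_id or '(graph)'}: {log.description}" for log in logs if log.rejected
    )


logs = [
    RegulatorLog("Normalized title whitespace", rejected=False, node_id="_:a1", ingest_job_id=5),
    RegulatorLog("Work has no title", rejected=True, node_id="_:a1", ingest_job_id=5),
]
failed = ingestion_failed(logs)
summary = rejection_context(logs)
```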

{Description}

Column | Type | Indexed | Nullable | FK | Default | Description |
---|---|---|---|---|---|---|
| | | ✓ | ✓ | ✓ | | |

Other indices:
- {column_name}, {column_name}, ... [(unique)]
- ...