ES scheme
Evildoor edited this page Feb 28, 2020 · 38 revisions
Currently, DKB uses Elasticsearch as its final storage; the mapping can be found here. A single index (name?) stores two types of documents: `task` and `output_dataset`. The following tables list the fields of these documents.
Columns:
- Field name
- Type: note that Elasticsearch mappings have no special definition for lists. For example, an integer and a list of integers are both mapped as "integer", and whether the field holds a single value or a list depends on what was put into it
- Source: the system from which the information is retrieved ("derivative" means that the field is not present in any source and is constructed from other fields; "service" means that the field is not part of the data and serves other purposes)
- Comment
- Value: how the field is calculated; "as-is" means the value of the field with the same name in the source
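Because Elasticsearch has no dedicated list type, a field mapped as "integer" accepts both `42` and `[1, 2, 3]`. The Python sketch below illustrates this with a hypothetical fragment of the `task` mapping (not the real DKB mapping) and a check that a value is either a scalar or a list of scalars of the mapped type. The names `TASK_PROPERTIES`, `PY_TYPES` and `fits_mapping` are illustrative assumptions, not part of DKB.

```python
# Hypothetical fragment of the "task" mapping, using a few fields from the table below.
# Note that list-valued fields are mapped exactly like scalar ones.
TASK_PROPERTIES = {
    "taskid": {"type": "integer"},
    "chain_data": {"type": "integer"},    # actually holds a list of task IDs
    "hashtag_list": {"type": "keyword"},  # actually holds a list of strings
    "status": {"type": "keyword"},
    "primary_input_deleted": {"type": "boolean"},
}

# Simplified correspondence between the Elasticsearch types above and Python types.
PY_TYPES = {
    "integer": int,
    "keyword": str,
    "boolean": bool,
}


def fits_mapping(value, es_type):
    """Return True if value is a scalar or a list of scalars of the mapped type."""
    expected = PY_TYPES[es_type]
    values = value if isinstance(value, list) else [value]
    # bool is a subclass of int in Python, so reject booleans for integer fields.
    return all(
        isinstance(v, expected) and not (expected is int and isinstance(v, bool))
        for v in values
    )
```

For example, `fits_mapping(42, "integer")` and `fits_mapping([1, 2, 3], "integer")` both return `True`, while `fits_mapping(["1"], "integer")` returns `False`. An empty list also passes, which matches Elasticsearch treating an empty array as a field with no values.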
Documents of type `task` represent the tasks that process ATLAS data.
Field name | Type | Source | Comment | Value |
---|---|---|---|---|
architecture | keyword | Oracle | ||
campaign | text | Oracle | ||
chain_data | integer | Oracle, table ATLAS_DEFT.t_production_task | A task chain is a sequence of related tasks: each task's output is used as input for the next one | List of IDs of all tasks in the chain that includes this task, constructed by a subquery (tasks after this one are omitted) |
core_count | short | Oracle | ||
ctag | keyword | Oracle | ||
description | text | Oracle | ||
end_time | date | Oracle | ||
energy_gev | integer | Oracle | ||
geometry_version | keyword | Oracle | ||
hashtag_list | keyword | Oracle | String is lowercased and split into a list | |
n_events_per_job | long | Oracle | ||
n_files_per_job | short | Oracle | ||
n_files_to_be_used | integer | Oracle | ||
output_formats | keyword | Oracle | String is split into a list | |
phys_group | text | Oracle | ||
pr_id | integer | Oracle | ||
primary_input | text | Oracle | ||
processed_events | long | Oracle | ||
project | text | Oracle | ||
requested_events | long | Oracle | ||
run_number | integer | Oracle | ||
start_time | date | Oracle | ||
status | keyword | Oracle | ||
step_id | integer | Oracle | ||
step_name | text | Oracle | ||
subcampaign | text | Oracle | ||
task_timestamp | date | Oracle | ||
taskid | integer | Oracle | ||
taskname | text | Oracle | ||
ticket_id | keyword | Oracle | ||
total_events | long | Oracle | ||
trans_home | keyword | Oracle | ||
trans_path | keyword | Oracle | ||
trans_uses | keyword | Oracle | ||
trigger_config | keyword | Oracle | ||
user_name | keyword | Oracle | ||
vo | keyword | Oracle | ||
input_bytes | long | Rucio | | as-is; -1 if the value is missing or an error occurs |
primary_input_deleted | boolean | Rucio | | False if input_bytes is successfully retrieved from the source, True otherwise |
primary_input_events | long | Rucio | | as-is |
hs06 | long | Chicago ES | ||
toths06 | long | Chicago ES | CPU resources used by the task | |
toths06_failed | long | Chicago ES | 'wasted' CPU resources | |
toths06_finished | long | Chicago ES | CPU resources the task would use in a perfect world | |
chain_id | integer | Derivative | ID of the chain's root (the first task in the chain) | derived from chain_data |
input_events | long | Derivative | ||
phys_category | keyword | Derivative | Is determined by hashtag_list and taskname | |
_update_required | boolean | Service | Marks documents that contain incomplete information about the object and thus must eventually be updated | True if the record is incomplete and should be updated, False otherwise |
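Several Value entries in the table above describe simple transformations. The following is a minimal Python sketch of some of them, under stated assumptions: hashtags are taken to be comma-separated and output formats dot-separated in Oracle (the real separators are not given here), `chain_data` is assumed to be ordered from the root task, a task with empty chain data is treated as its own root, and `get_bytes` is a hypothetical stand-in for the Rucio lookup that returns `None` when the size is unknown. All function names are illustrative, not DKB's actual code.

```python
from typing import Callable, Optional


def split_hashtags(raw: Optional[str], sep: str = ",") -> list:
    """hashtag_list: lowercase the Oracle string and split it into a list.
    The separator is an assumption."""
    if not raw:
        return []
    return [tag.strip() for tag in raw.lower().split(sep) if tag.strip()]


def split_output_formats(raw: Optional[str], sep: str = ".") -> list:
    """output_formats: split the Oracle string into a list (separator assumed)."""
    if not raw:
        return []
    return [fmt for fmt in raw.split(sep) if fmt]


def chain_id(chain_data: list, taskid: int) -> int:
    """chain_id: ID of the chain's root task.
    Assumes chain_data is ordered from the root; empty chain data means
    the task is treated as its own root (an assumption)."""
    return chain_data[0] if chain_data else taskid


def rucio_input_fields(get_bytes: Callable[[str], Optional[int]], dataset: str) -> dict:
    """input_bytes and primary_input_deleted, following the Value column.
    get_bytes is a hypothetical stand-in for a Rucio lookup."""
    try:
        size = get_bytes(dataset)
    except Exception:
        size = None  # an error is handled like a missing value
    if size is None:
        return {"input_bytes": -1, "primary_input_deleted": True}
    return {"input_bytes": size, "primary_input_deleted": False}
```

With these assumptions, `split_hashtags("MC16a, Higgs")` gives `["mc16a", "higgs"]` and `chain_id([100, 105, 110], 110)` gives `100`. Checking for `None` rather than falsiness means a successfully retrieved size of 0 bytes still yields `primary_input_deleted` = False.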
TBD