ES scheme

Marina Golosova edited this page Jun 9, 2020 · 38 revisions

Overview

Currently, DKB uses Elasticsearch as its final storage; the mapping can be found here. A single index (production_tasks or analysis_tasks) stores two types of documents: task and output_dataset. The following tables list the fields of these documents.

Columns:

  • Field name
  • Type - note that Elasticsearch's mapping has no dedicated list type: for example, an integer and a list of integers are both declared as "integer", and the field's actual contents depend on what was put into it. Some fields are stored as multiple types; in such cases the additional types are listed in brackets.
  • Source - the system from which the information is retrieved ("derivative" means that the field is not present in any source and is constructed from other fields; "service" means that the field is not part of the data and serves other purposes).
  • Comment
  • Value - how the field is calculated; "as-is" means the value of the field with the same name in the source.
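
To illustrate the note on types, here is a minimal sketch of what a mapping fragment and matching documents might look like, written as Python dictionaries. The fragment is an assumption made for illustration, not a copy of the DKB mapping: the field names come from the tables on this page, while the sub-field name "keyword", the sample values and the helper declared_types are hypothetical.

```python
# Hypothetical mapping fragment (illustrative only, not the actual DKB mapping).
mapping_fragment = {
    "properties": {
        # Declared as "integer", yet it may hold one integer or a list of them.
        "chain_data": {"type": "integer"},
        # "text (keyword)" in the tables: a text field with an additional
        # keyword sub-field (the sub-field name is an assumption).
        "taskname": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword"}},
        },
    }
}

# Both documents fit the same "integer" declaration (sample values are made up).
doc_single = {"chain_data": 101}
doc_list = {"chain_data": [101, 102, 103]}


def declared_types(field_mapping):
    """Return the main type followed by the types of any sub-fields."""
    types = [field_mapping["type"]]
    for sub_field in field_mapping.get("fields", {}).values():
        types.append(sub_field["type"])
    return types
```

With this fragment, declared_types returns the main type first and the bracketed types after it, which mirrors how the Type column is written.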

Tasks

Documents of type task represent the tasks processing ATLAS' data.

| Field name | Type | Source | Comment | Value |
|---|---|---|---|---|
| architecture | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field jedi_task_parameters |
| campaign | text (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
| chain_data | integer | Oracle, table ATLAS_DEFT.t_production_task | task chain is a sequence of related tasks: each task's output is used as input for the next one | list of ids of all tasks in the chain that includes this task, constructed by subquery (tasks after this one are omitted) |
| conditions_tags | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field jedi_task_parameters |
| core_count | short | Oracle, table ATLAS_DEFT.t_task | | extracted from source field jedi_task_parameters |
| ctag | keyword | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
| description | text | Oracle, table ATLAS_DEFT.t_prodmanager_request | | as-is |
| end_time | date | Oracle, table ATLAS_DEFT.t_production_task | | source field endtime |
| energy_gev | integer | Oracle, table ATLAS_DEFT.t_prodmanager_request | | as-is |
| geometry_version | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field jedi_task_parameters |
| hashtag_list | keyword | Oracle, table ATLAS_DEFT.t_hashtag | | aggregation of source field hashtag is lowercased and split into a list |
| n_events_per_job | long | Oracle, table ATLAS_DEFT.t_task | | extracted from source field jedi_task_parameters |
| n_files_per_job | short | Oracle, table ATLAS_DEFT.t_task | | extracted from source field jedi_task_parameters |
| n_files_to_be_used | integer | Oracle, table ATLAS_DEFT.t_production_task | | source field filestobeused |
| output_formats | keyword | Oracle, table ATLAS_DEFT.t_production_task | | as-is, but split into a list |
| phys_group | text (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
| pr_id | integer (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
| primary_input | text (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
| processed_events | long | Oracle, table ATLAS_PANDA.jedi_datasets | | sum of source's neventsused corresponding to given taskid (for input datasets) if it is not "Null", total_events otherwise |
| project | text (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
| requested_events | long | Oracle, table ATLAS_PANDA.jedi_datasets | | sum of source's nevents corresponding to given taskid (for input datasets) |
| run_number | integer (keyword) | Oracle, table ATLAS_DEFT.t_task | | extracted from source field jedi_task_parameters |
| start_time | date | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
| status | keyword | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
| step_id | integer | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
| step_name | text (keyword) | Oracle, table ATLAS_DEFT.t_step_template | | as-is |
| subcampaign | text (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
| task_timestamp | date | Oracle, table ATLAS_DEFT.t_production_task | | source field timestamp |
| taskid | integer (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
| taskname | text (keyword) | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
| ticket_id | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field jedi_task_parameters |
| total_events | long | Oracle, table ATLAS_DEFT.t_production_task | | as-is |
| trans_home | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field jedi_task_parameters |
| trans_path | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field jedi_task_parameters |
| trans_uses | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field jedi_task_parameters |
| trigger_config | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field jedi_task_parameters |
| user_name | keyword | Oracle, table ATLAS_DEFT.t_production_task | | source field username |
| vo | keyword | Oracle, table ATLAS_DEFT.t_task | | extracted from source field jedi_task_parameters |
| input_bytes | long | Rucio | | as-is, "-1" if it is missing or error occurs |
| primary_input_deleted | boolean | Rucio | | "False" if input_bytes is successfully retrieved from source, "True" otherwise |
| primary_input_events | long | Rucio | | as-is |
| hs06 | long | Chicago ES, index tasks_archive_* | | source field cputime |
| toths06 | long | Chicago ES, index jobs_archive_* | CPU resources used by the task | sum of source's hs06sec where jobstatus is "failed" or "finished" |
| toths06_failed | long | Chicago ES, index jobs_archive_* | 'wasted' CPU resources | sum of source's hs06sec where jobstatus is "failed" |
| toths06_finished | long | Chicago ES, index jobs_archive_* | CPU resources the task would use in the perfect world | sum of source's hs06sec where jobstatus is "finished" |
| chain_id | integer | Derivative | id of the chain's root (the first, initial task in it) | derived from chain_data |
| input_events | long | Derivative | | calculated from several other fields' values |
| phys_category | keyword | Derivative | physics category with which the task can be associated | determined by hashtag_list and taskname |
| _update_required | boolean | Service | marks documents that contain incomplete information about object and thus must be updated sooner or later | "True" if the record is incomplete and should be updated, "False" otherwise |
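
Some of the Value rules for tasks can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DKB implementation: the function names and the input shapes (lists of dictionaries) are hypothetical, the "Null" check is modelled on SQL SUM semantics (the sum is null when there are no non-null values), and chain_data is assumed to be ordered with the chain's root first.

```python
def total_hs06(jobs):
    """Sum hs06sec per job status, following the toths06* rows of the table."""
    failed = sum(j["hs06sec"] for j in jobs if j["jobstatus"] == "failed")
    finished = sum(j["hs06sec"] for j in jobs if j["jobstatus"] == "finished")
    return {
        "toths06_failed": failed,
        "toths06_finished": finished,
        # Jobs in any other status do not contribute.
        "toths06": failed + finished,
    }


def processed_events(input_datasets, total_events):
    """Sum neventsused over input datasets, falling back to total_events.

    Assumption: the sum counts as "Null" when no dataset has a non-null value.
    """
    used = [d["neventsused"] for d in input_datasets
            if d["neventsused"] is not None]
    if not used:
        return total_events
    return sum(used)


def chain_id(chain_data):
    """Return the id of the chain's root.

    Assumption: chain_data lists the chain's tasks with the root first.
    """
    return chain_data[0] if chain_data else None
```

For example, total_hs06 ignores a job whose jobstatus is neither "failed" nor "finished", and processed_events returns total_events when every neventsused is missing.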

Output datasets

Documents of type output_dataset represent the datasets generated by the tasks while processing ATLAS' data.

| Field name | Type | Source | Comment | Value |
|---|---|---|---|---|
| datasetname | text (keyword) | Oracle, table ATLAS_PANDA.jedi_datasets | full name of the dataset | as-is |
| bytes | long | Rucio | size of the dataset | as-is, "-1" if dataset was not found in source |
| deleted | boolean | Rucio | whether the dataset was deleted from source or not | as-is, "True" if dataset was not found in source |
| events | long | Rucio | number of events in the dataset | as-is |
| data_format | keyword | Derivative | | extracted from datasetname |
| cross_section | double | AMI | | source field crossSection |
| cross_section_ref | keyword | AMI | | source field crossSectionRef |
| gen_filt_eff | double | AMI | | source field genFiltEff |
| k_factor | double | AMI | | source field kFactor |
| me_pdf | keyword | AMI | | source field mePDF |
| process_group | keyword | AMI | | source field processGroup |
| _update_required | boolean | Service | marks documents that contain incomplete information about object and thus must be updated sooner or later | "True" if the record is incomplete and should be updated, "False" otherwise |
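
The fallback rules for the Rucio-sourced fields bytes and deleted can be sketched as follows. This is a hedged illustration rather than the DKB code: the function name and the shape of the lookup result (a dictionary with "bytes" and "deleted" keys, or None when the dataset was not found) are assumptions, and no real Rucio client call is shown.

```python
def dataset_size_fields(rucio_metadata):
    """Build the 'bytes' and 'deleted' fields from a lookup result.

    rucio_metadata is a hypothetical dict with 'bytes' and 'deleted' keys,
    or None if the dataset was not found in the source.
    """
    if rucio_metadata is None:
        # Dataset not found: size is unknown and it is treated as deleted.
        return {"bytes": -1, "deleted": True}
    # Dataset found: both values are taken as-is.
    return {
        "bytes": rucio_metadata["bytes"],
        "deleted": rucio_metadata["deleted"],
    }
```

Using -1 rather than 0 keeps "size unknown" distinguishable from a genuinely empty dataset when the field is queried later.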