ES scheme

Overview

Currently, DKB uses Elasticsearch as its final storage; the mapping can be found here. A single index (name?) stores two types of documents: task and output_dataset. The following tables list the fields of each document type.

The tables have the following columns:

Field name - name of the field in the document
Type - the Elasticsearch field type. Note that Elasticsearch mappings have no dedicated list type: an integer and a list of integers are both mapped as "integer", and whether a field holds a single value or a list depends on what was indexed into it. Some fields are indexed with multiple types; in such cases the additional types are listed in brackets.
Source - the system from which the information is retrieved ("derivative" means that the value is not present in any source and is constructed from other fields; "service" means that the field is not part of the data and serves internal purposes)
Comment - additional explanation of the field's meaning
Value - how the field's value is calculated; "as-is" means the value of the field with the same name in the source
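As an illustration of the Type column, here is a minimal Python sketch. It assumes that a field listed as "text (keyword)" is declared as an Elasticsearch multi-field with a sub-field named `keyword`, and that client code normalizes fields which may hold either a single value or a list. The names `CAMPAIGN_MAPPING`, `CHAIN_DATA_MAPPING` and `as_list` are hypothetical; the actual DKB mapping may differ.

```python
from typing import Any, List

# Hypothetical mapping fragment: a "text (keyword)" field such as `campaign`
# is indexed as text, plus an additional keyword sub-field (an Elasticsearch
# multi-field). The sub-field name "keyword" is an assumption.
CAMPAIGN_MAPPING = {
    "campaign": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword"}},
    }
}

# An "integer" field such as `chain_data` is declared exactly like a scalar
# integer field; Elasticsearch accepts either a single value or a list.
CHAIN_DATA_MAPPING = {"chain_data": {"type": "integer"}}


def as_list(value: Any) -> List[Any]:
    """Normalize a field value that may be missing, a scalar, or a list.

    Needed when reading documents back, because the mapping alone does not
    say whether a given document stores one value or several.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]
```

With such a helper, `as_list(doc.get("chain_data"))` yields a list whether the document stored a single task ID or several.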

Tasks

Documents of type task represent tasks that process ATLAS data.

Field name	Type	Source	Comment	Value
architecture	keyword	Oracle, table t_task		Extracted from source field `jedi_task_parameters`
campaign	text (keyword)	Oracle, table ATLAS_DEFT.t_production_task		as-is
chain_data	integer	Oracle, table ATLAS_DEFT.t_production_task	a task chain is a sequence of related tasks in which each task's output is used as input for the next one	list of IDs of all tasks in the chain that includes this task, constructed by a subquery (tasks after this one are omitted)
conditions_tags	keyword	Oracle, table t_task		Extracted from source field `jedi_task_parameters`
core_count	short	Oracle, table t_task		Extracted from source field `jedi_task_parameters`
ctag	keyword	Oracle, table ATLAS_DEFT.t_production_task		as-is
description	text	Oracle, table ATLAS_DEFT.t_prodmanager_request		as-is
end_time	date	Oracle, table ATLAS_DEFT.t_production_task		source field `endtime`
energy_gev	integer	Oracle, table ATLAS_DEFT.t_prodmanager_request		as-is
geometry_version	keyword	Oracle, table t_task		Extracted from source field `jedi_task_parameters`
hashtag_list	keyword	Oracle		String is lowercased and split into a list
n_events_per_job	long	Oracle, table t_task		Extracted from source field `jedi_task_parameters`
n_files_per_job	short	Oracle, table t_task		Extracted from source field `jedi_task_parameters`
n_files_to_be_used	integer	Oracle, table ATLAS_DEFT.t_production_task		source field `filestobeused`
output_formats	keyword	Oracle		String is split into a list
phys_group	text (keyword)	Oracle, table ATLAS_DEFT.t_production_task		as-is
pr_id	integer (keyword)	Oracle, table ATLAS_DEFT.t_production_task		as-is
primary_input	text (keyword)	Oracle, table ATLAS_DEFT.t_production_task		as-is
processed_events	long	Oracle, table ATLAS_PANDA.jedi_datasets		Sum of the source's `neventsused` values for the given `taskid`, or `total_events` if that sum is `Null`
project	text (keyword)	Oracle, table ATLAS_DEFT.t_production_task		as-is
requested_events	long	Oracle, table ATLAS_PANDA.jedi_datasets		Sum of the source's `nevents` values for the given `taskid`
run_number	integer (keyword)	Oracle, table t_task		Extracted from source field `jedi_task_parameters`
start_time	date	Oracle, table ATLAS_DEFT.t_production_task		as-is
status	keyword	Oracle, table ATLAS_DEFT.t_production_task		as-is
step_id	integer	Oracle, table ATLAS_DEFT.t_production_task		as-is
step_name	text (keyword)	Oracle, table ATLAS_DEFT.t_step_template		as-is
subcampaign	text (keyword)	Oracle, table ATLAS_DEFT.t_production_task		as-is
task_timestamp	date	Oracle, table ATLAS_DEFT.t_production_task		source field `timestamp`
taskid	integer (keyword)	Oracle, table ATLAS_DEFT.t_production_task		as-is
taskname	text (keyword)	Oracle, table ATLAS_DEFT.t_production_task		as-is
ticket_id	keyword	Oracle, table t_task		Extracted from source field `jedi_task_parameters`
total_events	long	Oracle, table ATLAS_DEFT.t_production_task		as-is
trans_home	keyword	Oracle, table t_task		Extracted from source field `jedi_task_parameters`
trans_path	keyword	Oracle, table t_task		Extracted from source field `jedi_task_parameters`
trans_uses	keyword	Oracle, table t_task		Extracted from source field `jedi_task_parameters`
trigger_config	keyword	Oracle, table t_task		Extracted from source field `jedi_task_parameters`
user_name	keyword	Oracle, table ATLAS_DEFT.t_production_task		source field `username`
vo	keyword	Oracle, table t_task		Extracted from source field `jedi_task_parameters`
input_bytes	long	Rucio		as-is; -1 if the value is missing or an error occurs
primary_input_deleted	boolean	Rucio		False if `input_bytes` was successfully retrieved from the source, True otherwise
primary_input_events	long	Rucio		as-is
hs06	long	Chicago ES, index tasks_archive_* (clarify this)		source field `cputime`
toths06	long	Chicago ES, index jobs_archive_* (clarify this)	CPU resources used by the task	Sum of source's `hs06sec` where `jobstatus` is `failed` or `finished`
toths06_failed	long	Chicago ES, index jobs_archive_* (clarify this)	'wasted' CPU resources	Sum of source's `hs06sec` where `jobstatus` is `failed`
toths06_finished	long	Chicago ES, index jobs_archive_* (clarify this)	CPU resources the task would have used in a perfect world, with no failed jobs	Sum of source's `hs06sec` where `jobstatus` is `finished`
chain_id	integer	Derivative	ID of the chain's root (the first task in the chain)	derived from `chain_data`
input_events	long	Derivative		Calculated from the values of several other fields
phys_category	keyword	Derivative	physics category with which the task can be associated	Determined from `hashtag_list` and `taskname`
_update_required	boolean	Service	marks documents that contain incomplete information about the object and therefore must be updated eventually	True if the record is incomplete and should be updated, False otherwise
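Several Value descriptions in the table above amount to small transformations. The following is a minimal Python sketch of some of them, written against plain values rather than live Oracle, Rucio or Chicago ES connections. The separators used for `hashtag_list` and `output_formats`, the assumption that `chain_data` lists task IDs starting from the chain's root, and all function names are hypothetical, not the actual DKB implementation.

```python
from typing import Iterable, List, Mapping, Optional, Sequence, Tuple


def split_hashtags(raw: Optional[str], sep: str = ",") -> List[str]:
    """hashtag_list: lowercase the source string and split it into a list.

    The comma separator is an assumption; surrounding whitespace and empty
    items are dropped.
    """
    if not raw:
        return []
    return [tag.strip() for tag in raw.lower().split(sep) if tag.strip()]


def split_output_formats(raw: Optional[str], sep: str = ".") -> List[str]:
    """output_formats: split the source string into a list.

    The "." separator is an assumption; empty items are dropped.
    """
    if not raw:
        return []
    return [fmt for fmt in raw.split(sep) if fmt]


def processed_events(nevents_used: Iterable[Optional[int]],
                     total_events: Optional[int]) -> Optional[int]:
    """processed_events: sum of `neventsused` for the task, else total_events.

    Mirrors SQL SUM() semantics: NULLs (None) are ignored, and if there is
    no non-null value at all the sum is NULL, so `total_events` is used.
    """
    values = [n for n in nevents_used if n is not None]
    return sum(values) if values else total_events


def chain_id(chain_data: Sequence[int]) -> Optional[int]:
    """chain_id: ID of the chain's root task.

    Assumes `chain_data` is ordered starting from the root.
    """
    return chain_data[0] if chain_data else None


def hs06_totals(jobs: Iterable[Mapping]) -> Tuple[int, int, int]:
    """Return (toths06, toths06_failed, toths06_finished) for a task's jobs.

    Each job is a mapping with `jobstatus` and `hs06sec` keys, as in the
    jobs_archive_* index. Jobs in any other status are ignored; a missing
    or null `hs06sec` counts as 0 (an assumption).
    """
    failed = finished = 0
    for job in jobs:
        hs06sec = job.get("hs06sec") or 0
        if job.get("jobstatus") == "failed":
            failed += hs06sec
        elif job.get("jobstatus") == "finished":
            finished += hs06sec
    return failed + finished, failed, finished


def rucio_input_fields(input_bytes: Optional[int]) -> dict:
    """input_bytes and primary_input_deleted from a Rucio lookup result.

    `input_bytes` is None when the value is missing or the lookup failed.
    """
    if input_bytes is None:
        return {"input_bytes": -1, "primary_input_deleted": True}
    return {"input_bytes": input_bytes, "primary_input_deleted": False}
```

Keeping each rule as a pure function makes the fallbacks (`total_events`, -1, True) easy to check in isolation before documents are written to the index.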

Output datasets

Documents of type output_dataset represent datasets produced by tasks while processing ATLAS data.

Field name	Type	Source	Comment	Value
datasetname	text (keyword)	Oracle, table ATLAS_PANDA.jedi_datasets	full name of the dataset	as-is
bytes	long	Rucio	size of the dataset	as-is; -1 if the dataset was not found in the source
deleted	boolean	Rucio	whether the dataset has been deleted from the source	as-is; True if the dataset was not found in the source
events	long	Rucio	number of events in the dataset	as-is
data_format	keyword	Derivative		extracted from `datasetname`
cross_section	double	AMI		source field `crossSection`
cross_section_ref	keyword	AMI		source field `crossSectionRef`
gen_filt_eff	double	AMI		source field `genFiltEff`
k_factor	double	AMI		source field `kFactor`
me_pdf	keyword	AMI		source field `mePDF`
process_group	keyword	AMI		source field `processGroup`
_update_required	boolean	Service	marks documents that contain incomplete information about the object and therefore must be updated eventually	True if the record is incomplete and should be updated, False otherwise
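The output dataset fields can be sketched the same way. This minimal Python example assumes that `data_format` is the fifth dot-separated component of `datasetname`, following the usual ATLAS nomenclature `project.datasetNumber.physicsShort.prodStep.dataType.tags`, optionally preceded by a Rucio scope and a colon. The function names and the keys of the Rucio metadata mapping are hypothetical; the AMI renames are taken directly from the table above.

```python
from typing import Mapping, Optional

# AMI source field -> document field, as listed in the table above.
AMI_FIELDS = {
    "crossSection": "cross_section",
    "crossSectionRef": "cross_section_ref",
    "genFiltEff": "gen_filt_eff",
    "kFactor": "k_factor",
    "mePDF": "me_pdf",
    "processGroup": "process_group",
}


def data_format(datasetname: str) -> Optional[str]:
    """data_format: the dataType component of an ATLAS dataset name.

    Assumes project.number.physicsShort.prodStep.dataType.tags; a leading
    Rucio scope ("scope:") is stripped first. Returns None for names with
    too few components.
    """
    name = datasetname.split(":", 1)[-1]
    parts = name.split(".")
    return parts[4] if len(parts) > 4 else None


def rucio_dataset_fields(meta: Optional[Mapping]) -> dict:
    """bytes, deleted and events from Rucio metadata (None if not found).

    The "bytes", "deleted" and "events" keys of `meta` are assumptions.
    """
    if meta is None:
        # Dataset not found in Rucio: size unknown, treated as deleted.
        return {"bytes": -1, "deleted": True}
    return {
        "bytes": meta.get("bytes"),
        "deleted": bool(meta.get("deleted", False)),
        "events": meta.get("events"),
    }


def ami_fields(record: Mapping) -> dict:
    """Rename AMI fields to document fields, skipping absent ones."""
    return {dst: record[src] for src, dst in AMI_FIELDS.items() if src in record}
```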
