ES scheme

Overview

Currently, DKB uses Elasticsearch as its final storage; the mapping can be found here. A single index (name?) stores two types of documents: task and output_dataset. The following tables list the fields of these documents.

Columns:

Field name
Type - note that Elasticsearch mappings have no dedicated list type: for example, an integer and a list of integers are both defined as "integer", and whether a field holds a single value or a list depends on what was put into it
Source - the system from which the information is retrieved ("derivative" means that the value is not present in any source and is constructed from other fields; "service" means that the field is not part of the data and serves other purposes)
Comment
Value - how the field is calculated; "as-is" means the value of the field with the same name in the source
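
The note on the Type column can be illustrated with a minimal sketch. The mapping fragment, the two documents and the helper `as_list` below are hypothetical and are not taken from the DKB mapping; they only show that a field declared as "integer" may hold either a single value or a list, so code reading the index should not assume one form.

```python
# Hypothetical mapping fragment: the field is declared as a plain integer.
mapping_fragment = {"properties": {"chain_data": {"type": "integer"}}}

# Both documents fit this mapping: Elasticsearch has no dedicated array type,
# so a field may contain one value or several values of the declared type.
doc_single = {"taskid": 1, "chain_data": 1}
doc_list = {"taskid": 3, "chain_data": [1, 2, 3]}


def as_list(value):
    """Normalize a field value read from a document to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


for doc in (doc_single, doc_list):
    print(doc["taskid"], as_list(doc["chain_data"]))
```

Normalizing on read keeps the consumer independent of how a particular document happened to be written.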

Tasks

Documents of type task represent the tasks that process ATLAS data.

Field name	Type	Source	Comment	Value
architecture	keyword	Oracle
campaign	text	Oracle
chain_data	integer	Oracle, table ATLAS_DEFT.t_production_task	task chain is a sequence of related tasks: each task's output is used as input for the next one	list of ids of all tasks in the chain that includes this task, constructed by subquery (tasks after this one are omitted)
core_count	short	Oracle
ctag	keyword	Oracle
description	text	Oracle
end_time	date	Oracle
energy_gev	integer	Oracle
geometry_version	keyword	Oracle
hashtag_list	keyword	Oracle		String is lowercased and split into a list
n_events_per_job	long	Oracle
n_files_per_job	short	Oracle
n_files_to_be_used	integer	Oracle
output_formats	keyword	Oracle		String is split into a list
phys_group	text	Oracle
pr_id	integer	Oracle
primary_input	text	Oracle
processed_events	long	Oracle
project	text	Oracle
requested_events	long	Oracle
run_number	integer	Oracle
start_time	date	Oracle
status	keyword	Oracle
step_id	integer	Oracle
step_name	text	Oracle
subcampaign	text	Oracle
task_timestamp	date	Oracle
taskid	integer	Oracle
taskname	text	Oracle
ticket_id	keyword	Oracle
total_events	long	Oracle
trans_home	keyword	Oracle
trans_path	keyword	Oracle
trans_uses	keyword	Oracle
trigger_config	keyword	Oracle
user_name	keyword	Oracle
vo	keyword	Oracle
input_bytes	long	Rucio		as-is; -1 if the value is missing or an error occurs
primary_input_deleted	boolean	Rucio		False if `input_bytes` is successfully retrieved from source, True otherwise
primary_input_events	long	Rucio		as-is
hs06	long	Chicago ES
toths06	long	Chicago ES	CPU resources used by the task
toths06_failed	long	Chicago ES	'wasted' CPU resources
toths06_finished	long	Chicago ES	CPU resources the task would use in a perfect world
chain_id	integer	Derivative	id of the chain's root (the first, initial task in it)	derived from `chain_data`
input_events	long	Derivative
phys_category	keyword	Derivative		Is determined by hashtag_list and taskname
_update_required	boolean	Service	marks documents that contain incomplete information about the object and must therefore be updated eventually	True if the record is incomplete and should be updated, False otherwise
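
Some of the Value rules in the table above can be sketched in code. This is only an illustration under stated assumptions, not the DKB implementation: the function names are hypothetical, the comma separator for `hashtag_list` is assumed (the table only says the string is split), a failed or empty Rucio lookup is represented here by `None`, and `chain_data` is assumed to be ordered from the chain's root to the current task.

```python
def split_hashtags(raw, sep=","):
    """hashtag_list: lowercase the string and split it into a list.

    The separator is an assumption; the source does not name it.
    """
    if not raw:
        return []
    return [tag.strip() for tag in raw.lower().split(sep) if tag.strip()]


def input_fields(rucio_bytes):
    """input_bytes and primary_input_deleted from a Rucio lookup result.

    rucio_bytes is None (an assumption of this sketch) when the value
    is missing or the lookup failed.
    """
    if rucio_bytes is None:
        return {"input_bytes": -1, "primary_input_deleted": True}
    return {"input_bytes": rucio_bytes, "primary_input_deleted": False}


def chain_id(chain_data):
    """chain_id: id of the chain's root task.

    Assumes chain_data lists task ids starting from the root.
    """
    return chain_data[0] if chain_data else None


print(split_hashtags("MC16a, Higgs"))
print(input_fields(None))
print(chain_id([10, 11, 12]))
```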
