[WIP] es/query: request_id-based derivation tasks statistics #187

mgolosova · 2018-12-05T11:20:28Z

Added a query to get hashtag-based derivation task statistics.

The query gets information:

- aggregated by output data formats;
- within formats -- aggregated by task status;
- for each format+status bucket:
  - total number of input events;
  - total size of input datasets;
  - total number of output events;
  - total size of output datasets;
  - average task walltime;
  - estimated total cpu time.
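The bucket layout above can be sketched as a nested Elasticsearch aggregation body. This only illustrates the structure and is not the query from q.json: the helper `build_stats_aggs`, the aggregation names, and the field names `status`, `input_events`, `input_bytes`, `output_events`, `output_bytes`, `walltime` and `cpu_time` are all assumptions (only `output_formats` is named in this PR), and the real query may compute some metrics with scripts rather than plain fields.

```python
def build_stats_aggs(size=100):
    """Return a nested 'aggs' body: data format -> task status -> metrics.

    All field names except 'output_formats' are hypothetical placeholders.
    """
    # Metrics computed inside every format+status bucket.
    metrics = {
        "input_events": {"sum": {"field": "input_events"}},
        "input_bytes": {"sum": {"field": "input_bytes"}},
        "output_events": {"sum": {"field": "output_events"}},
        "output_bytes": {"sum": {"field": "output_bytes"}},
        "avg_walltime": {"avg": {"field": "walltime"}},
        "total_cpu_time": {"sum": {"field": "cpu_time"}},
    }
    return {
        "formats": {
            # Outer level: one bucket per output data format.
            "terms": {"field": "output_formats", "size": size},
            "aggs": {
                "status": {
                    # Inner level: one bucket per task status.
                    "terms": {"field": "status", "size": size},
                    "aggs": metrics,
                }
            },
        }
    }
```

The returned dictionary would go under the `"aggs"` key of a search request body, next to whatever `"query"` selects the tasks of interest.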

ToDo

fix issue with negative walltime value (see q.json and deriv-stats.json);


mgolosova · 2018-12-05T12:19:23Z

Hashtag statistics for Deriv tasks

Reportedly, for derivation tasks it is more common to look up tasks by request ID than by hashtag(s).

By default, the ES "terms" aggregation returns only the top 10 buckets; to get more, "size" must be specified.
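A minimal sketch of a request body that asks for more than the default 10 buckets. The helper `formats_query`, the field name `pr_id`, and the default limit of 1000 are assumptions made for illustration; `output_formats` is the task metadata field mentioned in this PR.

```python
def formats_query(pr_id, max_formats=1000):
    """Build a search body listing output data formats for one request.

    'pr_id' is a hypothetical field name; 'max_formats' is an arbitrary
    upper bound on the number of buckets to return.
    """
    return {
        # No hits are needed, only the aggregation result.
        "size": 0,
        "query": {"term": {"pr_id": pr_id}},
        "aggs": {
            "formats": {
                # Without an explicit "size" only 10 buckets come back.
                "terms": {"field": "output_formats", "size": max_formats}
            }
        },
    }
```

Note the two different `size` keys: the top-level one limits returned documents, while the one inside `terms` limits returned buckets.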

The "data_format" field of an output dataset is artificially extended with a "general" format: "DAOD_EXOT12" becomes ["DAOD", "DAOD_EXOT12"] (see PR #102, commit 8c5ca49). For a given task this is not ideal: we get an extra format "DAOD" that does not correspond to any specific dataset yet matches all the "DAOD_*" datasets. To bypass this issue, the list of data formats can be taken from the task metadata ("output_formats" field).
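To illustrate why the extra "general" format gets in the way, here is a small sketch. It assumes the general format is simply the part before the first underscore, which is inferred from the single "DAOD_EXOT12" example and may not match the actual rule from PR #102; both helper names are hypothetical.

```python
def expand_format(data_format):
    """Mimic the extension: 'DAOD_EXOT12' -> ['DAOD', 'DAOD_EXOT12'].

    Assumption: the general format is the prefix before the first '_'.
    """
    general = data_format.split("_", 1)[0]
    if general == data_format:
        return [data_format]
    return [general, data_format]


def specific_formats(formats):
    """Drop entries that are only a general prefix of another entry."""
    return [
        fmt for fmt in formats
        if not any(other.startswith(fmt + "_") for other in formats)
    ]
```

Aggregating on the extended field would produce a "DAOD" bucket that mixes all "DAOD_*" datasets, which is what reading the formats from `output_formats` avoids.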

For some reason, the ProdSys2 DB contains tasks with `start_time` > `end_time`, so we have to check for this explicitly to get a correct result.
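The effect of such a check can be sketched on the client side. This is not the ES query from this PR: the helper `average_walltime` is hypothetical, and skipping the inconsistent tasks is only one possible policy.

```python
from datetime import datetime


def average_walltime(tasks):
    """Average walltime in seconds over tasks with consistent timestamps.

    Tasks with a missing timestamp or with start_time > end_time are
    skipped, so they cannot contribute a negative duration.
    """
    durations = []
    for task in tasks:
        start = task.get("start_time")
        end = task.get("end_time")
        if start is None or end is None or start > end:
            continue
        durations.append((end - start).total_seconds())
    if not durations:
        return None
    return sum(durations) / len(durations)
```

Without the `start > end` guard, a single inverted record could pull the average down or even make it negative, which is the symptom described in the ToDo item.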

mgolosova · 2019-01-23T12:45:38Z

The [WIP] status is there because we still don't know whether the query does what it was made for.

Initially the query's main parameter was supposed to be a hashtag (or a list of hashtags), but it was later changed to the request ID.

NOTE: output sample will be updated later, when data in ES are ready.

mgolosova self-assigned this Dec 5, 2018

mgolosova added 5 commits December 6, 2018 13:42

es/query: replace hashtag(s) condition with request ID condition.

d0beb1c

Reportedly, for derivation tasks it is more common to look up tasks by request ID than by hashtag(s).

es/query: add query to get list of output dataset formats for PR_ID.

952d7f2

es/query: fix issue with too little data formats number.

c0cbcb0

By default, the ES "terms" aggregation returns only the top 10 buckets; to get more, "size" must be specified.

es/query: fix issue with negative walltime in deriv-steps statistics.

e72c509

For some reason, the ProdSys2 DB contains tasks with `start_time` > `end_time`, so we have to check for this explicitly to get a correct result.

mgolosova changed the title ~~[WIP] Hashtag-based derivation tasks statistics~~ [WIP] Request-based derivation tasks statistics Jan 11, 2019

mgolosova changed the title ~~[WIP] Request-based derivation tasks statistics~~ [WIP] es/query: request_id-based derivation tasks statistics Jan 23, 2019

mgolosova added 2 commits January 23, 2019 13:48

es/query: rename files according to the query.

3ba05ef

Initially the query's main parameter was supposed to be a hashtag (or a list of hashtags), but it was later changed to the request ID.

[WIP] es/query: use "toths06*" fields instead of "hs06*total_events".

5648915

NOTE: output sample will be updated later, when data in ES are ready.
