# sisclab: help Miao Wang on task `D1.e`

Date: 06.01.2021

Task `D1.e`: "List Code names, sorted by how many calcjobs were run with each". I.e., find {code_name: number_of_calcjobs}.

In [1]:
from aiida import load_profile
profile = load_profile('sisclab20')

In [2]:
from aiida.orm import QueryBuilder
from aiida.orm import Code, CalcJobNode, Int
from aiida.common import LinkType

# Approach1: via QueryBuilder: ignore codes without any calcjobs

In [3]:
qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Code, with_outgoing='calcjob', edge_filters={'label': 'code'})
result = qb.distinct().all()
codes_with_calcjobs = [single_item_list[0] for single_item_list in result]
codes_with_calcjobs

[<Code: Remote code 'inpgen' on local_mac, pk: 1, uuid: 8b105672-7ebc-4cce-813c-e501d32916ee>,
 <Code: Remote code 'fleur_mpi_v0.28' on claix, pk: 5, uuid: 19ec39f2-5389-40aa-b52f-21bdc4610896>]
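
Approach 1 so far only lists the codes. To get the counts the task asks for, one option is to project one row per (code label, calcjob pk) pair and tally the rows. The following is a minimal sketch under assumptions: the query is extended with `project='id'` on the `CalcJobNode` and `project='label'` on the `Code`, `count_calcjobs_per_code` is a hypothetical helper name, and `rows` is made-up placeholder data standing in for the query result.

```python
from collections import Counter

def count_calcjobs_per_code(rows):
    """Tally rows of [code_label, calcjob_pk] into {code_label: number_of_calcjobs}."""
    # de-duplicate pairs first, in case the query returns a pair more than once
    unique_pairs = {(label, pk) for label, pk in rows}
    return dict(Counter(label for label, _ in unique_pairs))

# Placeholder rows (not real data); with AiiDA they might come from something like:
#   qb = QueryBuilder()
#   qb.append(CalcJobNode, tag='calcjob', project='id')
#   qb.append(Code, with_outgoing='calcjob', edge_filters={'label': 'code'}, project='label')
#   rows = [[label, pk] for pk, label in qb.all()]
rows = [['code_a', 101], ['code_b', 102], ['code_a', 103], ['code_a', 103]]
counts = count_calcjobs_per_code(rows)
print(counts)
```

As with any dict keyed by label, two distinct codes that share a label would be counted together here; keying by pk or uuid instead avoids that.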

Experiment: for counting calcjobs, check whether `all_nodes()` is slower than a method that returns only a list of strings, such as `all_link_labels()`.

In [4]:
code = codes_with_calcjobs[0]

In [58]:
%timeit len(code.get_outgoing(node_class=CalcJobNode).all_link_labels())

9.47 ms ± 37.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [59]:
%timeit len(code.get_outgoing(node_class=CalcJobNode).all_nodes())

9.46 ms ± 17 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Result: both variants take about 9.5 ms per call, so there is no measurable difference for this code.


# Approach2: via `Entity.objects`: all codes

In [5]:
codes = Code.objects.all()
len(codes)

18

In [6]:
# optional: prune codes with zero calcjobs via a list comprehension (equivalent to Python's filter(), but more readable)
codes_with_calcjobs = [code for code in codes if len(code.get_outgoing(node_class=CalcJobNode).all_nodes()) > 0]
codes_with_calcjobs

[<Code: Remote code 'fleur_mpi_v0.28' on claix, pk: 5, uuid: 19ec39f2-5389-40aa-b52f-21bdc4610896>,
 <Code: Remote code 'inpgen' on local_mac, pk: 1, uuid: 8b105672-7ebc-4cce-813c-e501d32916ee>]

In [7]:
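# create result with a Python dict comprehension
# caution: codes sharing a label collapse into one key (18 codes give only 14 entries below); the last one wins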
result = {code.label : len(code.get_outgoing(node_class=CalcJobNode).all_nodes()) for code in codes}
result

{'fleur_v0.28': 0,
 'inpgen_v0.28': 0,
 'inpgen_iff_0.28': 0,
 'fleur_iff_0.28': 0,
 'fleur_mpi_max_2': 0,
 'fleur_dev_mpi': 0,
 'fleur_mpi_v0.28': 22,
 'fleur_max_2': 0,
 'inpgen_max_2': 0,
 'fleur_dev_mpi_booster': 0,
 'fleur_mac_3_mpi': 0,
 'fleur_max_1.3_dev': 0,
 'fleur_mpi_max_1.3_dev': 0,
 'inpgen': 21}

Nicer output (ideas):
- Offer an option to exclude codes with zero calcjobs.
- Offer an option to output a pandas DataFrame, which renders much more nicely in a notebook (search for "dict to pandas DataFrame"; any Jupyter pandas tutorial covers it). End the cell with `result` rather than `print(result)` so the table is rendered.
- Sort the output in descending order. Dicts preserve insertion order, so when using a dict, sort before creating it.
- Add an optional pie chart or similar.
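
The first three ideas can be sketched as follows. This is only one possible shape, not a finished solution: `sort_counts` and `to_dataframe` are hypothetical helper names, the column names `'code'` and `'calcjobs'` are arbitrary choices, and the sample dict is a subset of the result shown above.

```python
def sort_counts(counts, exclude_zero=False):
    """Return a new dict sorted by count, descending; optionally drop zero counts."""
    items = list(counts.items())
    if exclude_zero:
        items = [(name, n) for name, n in items if n > 0]
    # sorted() is stable, so codes with equal counts keep their original order
    return dict(sorted(items, key=lambda item: item[1], reverse=True))

def to_dataframe(counts):
    """Convert the dict to a two-column pandas DataFrame (requires pandas)."""
    import pandas as pd  # imported lazily so sort_counts works without pandas
    return pd.DataFrame(list(counts.items()), columns=['code', 'calcjobs'])

# subset of the result dict shown above
result = {'fleur_v0.28': 0, 'fleur_mpi_v0.28': 22, 'fleur_max_2': 0, 'inpgen': 21}
sorted_result = sort_counts(result, exclude_zero=True)
print(sorted_result)
```

In a notebook, ending a cell with `to_dataframe(sorted_result)` (without `print`) should render the table. For the pie chart, pandas plotting, for example `to_dataframe(sorted_result).set_index('code').plot.pie(y='calcjobs')`, is one option if matplotlib is installed.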