Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

isolated bandsdata list and test #3817

Merged
merged 4 commits into from
Apr 11, 2020
Merged

Conversation

unkcpz
Copy link
Member

@unkcpz unkcpz commented Feb 29, 2020

Fixes #3811

  • when no ancestors found for BandsData object verdi data bands list returns NOT FOUND as formula. <<UNKNOW>> reserved for the situation when ancestor found but its formula cannot be read properly.
  • in abstractqueries::get_bands_and_parents_structure query for band after the group query to prevent item listed in empty group.
  • _extrat_formula removed from djsite::queries use the general function.

@unkcpz
Copy link
Member Author

unkcpz commented Mar 1, 2020

Emm.. I think I don't have the permissions to assign reviewers, right? Could you review it?@ltalirz

Copy link
Member

@ltalirz ltalirz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot @unkcpz for the fix!

some questions/comments from my side.

Re pre-commit failure: it's probably due to a different yapf version - aiida-core currently uses 0.28

@@ -136,7 +136,7 @@ def _extract_formula(args, akinds, asites):

# We want only the StructureData that have attributes
if akinds is None or asites is None:
return None
return '<<UNKNOWN>>'
Copy link
Member

@ltalirz ltalirz Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to understand: why do we need to introduce this custom label?
EDIT: Oh, I see, it is already in the code... Aren't you changing the meaning of <<UNKNOWN>> here, though?

Anyhow, since you now already looked into this function, if you could add a few lines of documentation to its docstring that would be great.
Have you seen whether this args object is documented anywhere else?
We're really digging in some aiida code from the "old days" here... great to clean this up a little bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you are right. I did this because I want to make as few changes as possible.
Now, I use NOT FOUND for the isolated bands, and reserve <<UNKNOW>> for what was originally designed for (the situation when formula variable is unproperly assigned).

tests/cmdline/commands/test_data.py Outdated Show resolved Hide resolved
@ltalirz
Copy link
Member

ltalirz commented Mar 3, 2020

Thanks a lot @unkcpz for working through this convoluted code!

Before I go through this line-by-line:
@giovannipizzi - in this PR, @unkcpz is removing the django-specific implementation of _extract_formula (I agree).

While we're at it, there is also a django-specific implementation of get_bands_and_parents_structure.
Do you mind giving this function a look and letting us know whether there are actual performance reasons for keeping this, or can we get rid of this as well?
To me, this code looks very complicated, so it should really be significantly faster in order to justify its existence:

def get_bands_and_parents_structure(self, args):
"""Returns bands and closest parent structure."""
from django.db.models import Q
from aiida.backends.djsite.db import models
from aiida.common.utils import grouper
from aiida.orm import BandsData
q_object = None
if args.all_users is False:
from aiida import orm
q_object = Q(user__id=orm.User.objects.get_default().id)
else:
q_object = Q()
self.query_past_days(q_object, args)
self.query_group(q_object, args)
bands_list_data = models.DbNode.objects.filter(
node_type__startswith=BandsData.class_node_type
).filter(q_object).distinct().order_by('ctime').values_list('pk', 'label', 'ctime')
entry_list = []
# the frist argument of the grouper function is the query group size.
for this_chunk in grouper(100, [(_[0], _[1], _[2]) for _ in bands_list_data]):
# gather all banddata pks
pks = [_[0] for _ in this_chunk]
# get the closest structures (WITHOUT DbPath)
structure_dict = get_closest_parents(pks, Q(node_type='data.structure.StructureData.'), chunk_size=1)
struc_pks = [structure_dict[pk] for pk in pks]
# query for the attributes needed for the structure formula
res_attr = models.DbNode.objects.filter(id__in=struc_pks).values_list('id', 'attributes')
# prepare the printout
for (b_id_lbl_date, struc_pk) in zip(this_chunk, struc_pks):
formula = self._extract_formula(struc_pk, args, {rattr[0]: rattr[1] for rattr in res_attr})
if formula is None:
continue
entry_list.append([
str(b_id_lbl_date[0]),
str(formula), b_id_lbl_date[2].strftime('%d %b %Y'), b_id_lbl_date[1]
])
return entry_list

@unkcpz
Copy link
Member Author

unkcpz commented Mar 5, 2020

@ltalirz @giovannipizzi
If I understand correctly, code support for different backend comes from historical reasons and there was no efficiency promotion. This piece of code was added e94fad1 , at that time, it seemed that all queries is done by Q function directly .
All the band query and list code would be better to remove to aiida/cmdline/command/cmd_data/cmd_bands.py where the function was called. This will keep it consistent with other similar implementations such as cmd_structure.

If it is correctly, I can make a PR to try to fixes the band query function here and check if the other place have the same problem.
my two cents

@giovannipizzi
Copy link
Member

I don't think this function is a leftover - it was moved there on purpose (I have some memory of this) because it was faster. Now, I don't know if with the current changes it's faster. Note that this touched the transitive table, i.e. now the ancestors and descendants in the QueryBuilder.

I agree that that functionality is quite specific. However, before removing, I suggest testing the Django-specific implementation vs. the generic implementation, and reporting the results here (but this should be done in a big DB).
Actually, this could be a good test for @danieleongari (see #3755) to check if this (or an adaptation of this) is faster than using the QueryBuilder? (It might be, this gets the 'closest' ancestors, while the QueryBuilder always gets everything).

@unkcpz
Copy link
Member Author

unkcpz commented Mar 10, 2020

The original purpose of this PR seems already reached. Could you merge this so I can get back to #3798 based on this PR? @ltalirz
or we just keep this PR open and wait for the report from @danielhollas ?

I have no experience at working several interrelated issues at the same time, could you tell me how to deal with this situation? :-p

@danielhollas
Copy link
Collaborator

danielhollas commented Mar 10, 2020

The original purpose of this PR seems already reached. Could you merge this so I can get back to #3798 based on this PR? @ltalirz
or we just keep this PR open and wait for the report from @danielhollas ?

Too many Daniels in the world 😆 Except for one PR, I have no connection to this project.

@unkcpz I think you wanted to mention @danieleongari, though I seem to be unable to mention him for some reason, though Github does not suggest him while typing for some reason...

@unkcpz
Copy link
Member Author

unkcpz commented Mar 10, 2020

forgive my careless @danielhollas . yes, Github does not suggest him while typing.

@codecov
Copy link

codecov bot commented Mar 22, 2020

Codecov Report

Merging #3817 into develop will decrease coverage by 0.92%.
The diff coverage is 96.42%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #3817      +/-   ##
===========================================
- Coverage    78.13%   77.21%   -0.93%     
===========================================
  Files          460      458       -2     
  Lines        33978    33767     -211     
===========================================
- Hits         26548    26072     -476     
- Misses        7430     7695     +265     
Flag Coverage Δ
#django 69.26% <32.14%> (-0.89%) ⬇️
#sqlalchemy 70.07% <64.28%> (-0.93%) ⬇️
Impacted Files Coverage Δ
aiida/backends/general/abstractqueries.py 63.70% <94.73%> (+5.44%) ⬆️
aiida/backends/djsite/queries.py 92.78% <100.00%> (+11.36%) ⬆️
aiida/engine/processes/calcjobs/tasks.py 32.36% <0.00%> (-37.35%) ⬇️
aiida/engine/daemon/execmanager.py 29.69% <0.00%> (-30.46%) ⬇️
aiida/engine/processes/calcjobs/manager.py 54.90% <0.00%> (-26.48%) ⬇️
aiida/orm/autogroup.py 70.11% <0.00%> (-16.13%) ⬇️
aiida/schedulers/plugins/direct.py 41.52% <0.00%> (-11.70%) ⬇️
aiida/parsers/plugins/arithmetic/add.py 72.41% <0.00%> (-10.35%) ⬇️
aiida/orm/nodes/process/calculation/calcjob.py 64.87% <0.00%> (-10.25%) ⬇️
aiida/manage/external/postgres.py 75.55% <0.00%> (-6.39%) ⬇️
... and 39 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 41a4e29...e90e150. Read the comment docs.

@ltalirz
Copy link
Member

ltalirz commented Mar 22, 2020

@CasperWA Just to understand - should codecov now be working on PRs from forks or not?

@CasperWA
Copy link
Contributor

@CasperWA Just to understand - should codecov now be working on PRs from forks or not?

I believe it should work. You can always check whether a codecov report was successfully uploaded by checking the CI run.

See, e.g., here.
It seems for this particular instance it tried several times to upload, and finally succeded, while for the django backend it succeded to upload right away.

Copy link
Member

@ltalirz ltalirz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot @unkcpz , and sorry for the large delay in review.

The changes look good to me, I have just some minor requests

aiida/backends/djsite/queries.py Show resolved Hide resolved
aiida/backends/djsite/queries.py Outdated Show resolved Hide resolved
aiida/backends/general/abstractqueries.py Outdated Show resolved Hide resolved
tests/cmdline/commands/test_data.py Outdated Show resolved Hide resolved
tests/cmdline/commands/test_data.py Outdated Show resolved Hide resolved
@unkcpz unkcpz requested a review from ltalirz April 11, 2020 13:30
Copy link
Member

@ltalirz ltalirz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @unkcpz !

@ltalirz ltalirz merged commit ff9e8a2 into aiidateam:develop Apr 11, 2020
@unkcpz unkcpz deleted the iso_bands_list branch April 12, 2020 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Isolated BandsData will raise exception when running verdi data bands list
5 participants