Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: allow StructureData without specified cell #5341

Merged
merged 18 commits into from
Apr 11, 2022

Conversation

ltalirz
Copy link
Member

@ltalirz ltalirz commented Feb 1, 2022

fix #5248

Prior to this change, when a user would create a StructureData without specifying a cell,
AiiDA would set a default cell

[[1,0,0], [0,1,0], [0,0,1]]

That is problematic, since this cell is not distinguishable from a valid cell.
We therefore change the default cell to [[0,0,0],[0,0,0],[0,0,0]], mirroring
e.g. how things work in ASE.

Furthermore, we remove the requirement that a cell set by the user must have
a finite 3d volume, since a cell with zero volume is an appropriate choice when storing non-periodic
structures (molecules, ...).

The only requirement we retain is that structures that are indicated as nd-periodic
must have a finite nd volume. This requirement is validated upon calling .store().

Note: Since we are not changing the default value of pbc, this means that the following will now raise:

s = StructureData().store()

When trying to store "dummy" instances of StructureData, users can use the following instead:

s = StructureData(pbc=False).store()

Since StructureData is supposed to support storing non-periodic
structures, it should also support not specifying a cell.

The default [1,1,1] was both not distinguishable from an actual cell
[1,1,1], and dangerous because it could give the impression that a
structure was ready for a periodic calculation when in fact no cell had
been specified.

The new default `None` clearly signals that no cell was specified.
A check is added that will raise when specifying periodicity without
specifying a cell.
@ltalirz
Copy link
Member Author

ltalirz commented Feb 1, 2022

This PR includes some optional changes that could be undone while still fixing #5248

  • move private _adjust_default_cell back inside the StructureData class (was only used in one place in the cli, though)
  • Don't raise when setting PBC=true on a structure with no cell. This could trip up user code (but it may also be useful in identifying issues) reverted for backwards compatibility

Once could use [0,0,0] as the default cell instead of None but that would then probably also force us to remove the logic for raising when the cell volume is zero or close to zero.

@giovannipizzi
Copy link
Member

See also my comment in the issue.
I think I prefer setting zeros, and the get_volume could actually only get the "volume" in a general sense, only for the PBC part. So return always zero (or None?) for 0D, the length for 1D, the surface for 2D and the volume for 3D PBCs. Probably one would need, however, to change the return value of get_volume to also return the dimensionality, to avoid obvious mistakes? (this could actually become a new function, get_generalized_volume, to avoid backward-incompatibilities; get_volume would return 0 for 0D, 1D, 2D, while the generalised one does the right thing for all dimensionalities, and can still be used to check a person passed e.g. three parallel vectors in 3D or two parallel vectors in 2D).

@ltalirz
Copy link
Member Author

ltalirz commented Feb 1, 2022

I forgot that if we want to forbid pbc=True on a structure without cell I also need to change the default value of pbc (at least in cases where the user does not provide a cell to the constructor).

In the interest of backwards compatibility I will remove this check, i.e. inconsistent structures with pbc=True and an unspecified cell will continue to be allowed for the time being.

@ltalirz
Copy link
Member Author

ltalirz commented Feb 1, 2022

Thanks a lot for your comments, Gio!

I think I prefer setting zeros

Note that StructureData currently explicitly considers a cell with zero-vectors invalid (it even has a volume threshold below which it considers a cell invalid).

When using None, we could still keep this behavior - when using zero-vectors we will need to change it.
Just confirming that you are suggesting to change this behavior.

get_volume would return 0 for 0D, 1D, 2D

Yes, let's not change the return value of get_cell_volume (there is no get_volume).

When calling get_cell_volume on a structure where no cell was defined, I feel it's fine to raise an exception.
If we're going to use zero-vectors as the default value instead of None, we lose this ability (unless we again define an arbitrary threshold below which a volume is considered zero).
Just saying... for me it's also fine to return a volume of 0 if a cell was not defined.

this could actually become a new function, get_generalized_volume, to avoid backward-incompatibilities; get_volume would return 0 for 0D, 1D, 2D, while the generalised one does the right thing for all dimensionalities, and can still be used to check a person passed e.g. three parallel vectors in 3D or two parallel vectors in 2D)

I believe this function already exists, it's called get_dimensionality.

@codecov
Copy link

codecov bot commented Feb 1, 2022

Codecov Report

Merging #5341 (d9bf9ec) into develop (8293e45) will increase coverage by 2.51%.
The diff coverage is 91.67%.

❗ Current head d9bf9ec differs from pull request most recent head 3660518. Consider uploading reports for the commit 3660518 to get more accurate results

@@             Coverage Diff             @@
##           develop    #5341      +/-   ##
===========================================
+ Coverage    79.63%   82.13%   +2.51%     
===========================================
  Files          517      533      +16     
  Lines        37040    38499    +1459     
===========================================
+ Hits         29492    31619    +2127     
+ Misses        7548     6880     -668     
Flag Coverage Δ
django 77.22% <91.67%> (?)
sqlalchemy 76.52% <91.67%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiida/orm/nodes/data/structure.py 83.97% <91.18%> (+0.12%) ⬆️
aiida/cmdline/commands/cmd_data/cmd_structure.py 84.83% <100.00%> (+0.11%) ⬆️
aiida/orm/nodes/data/array/xy.py 18.58% <0.00%> (-41.42%) ⬇️
aiida/cmdline/params/options/commands/setup.py 54.55% <0.00%> (-38.45%) ⬇️
aiida/cmdline/commands/cmd_setup.py 56.87% <0.00%> (-38.03%) ⬇️
aiida/orm/nodes/data/array/projection.py 16.13% <0.00%> (-32.25%) ⬇️
aiida/manage/external/postgres.py 63.10% <0.00%> (-20.23%) ⬇️
aiida/tools/archive/exceptions.py 90.48% <0.00%> (-9.52%) ⬇️
aiida/orm/utils/mixins.py 81.31% <0.00%> (-9.48%) ⬇️
aiida/cmdline/params/types/profile.py 64.45% <0.00%> (-7.29%) ⬇️
... and 462 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8293e45...3660518. Read the comment docs.

@giovannipizzi
Copy link
Member

Thanks Leo for the work on this.

In the interest of backwards compatibility I will remove this check, i.e. inconsistent structures with pbc=True and an unspecified cell will continue to be allowed for the time being.

I'm not sure I understand. What can be now done that would be a problem if we raise when pbc=[True, True, True] and no cell is set? (at least upon storing, during setting up the StructureData it's probably OK)

Note that StructureData currently explicitly considers a cell with zero-vectors invalid (it even has a volume threshold below which it considers a cell invalid).
When using None, we could still keep this behavior - when using zero-vectors we will need to change it.
Just confirming that you are suggesting to change this behavior.

My suggestion is to have a get_cell_generalized_volume() method that returns a tuple (dimensionality, generalized_volume) (the latter being the volume if dimensionality is 3, the surface is dimensionality is 2, etc.).
This function will compute the surface using the first and third vector if e.g. pbc is [True, False, True]. So if the second row is [0, 0, 0], that wouldn't give a zero generalized volume (it would still give a zero 3D volume of course, but I would switch the check to the generalized volume). And if the dimensionality is zero, then the function returns zero, and this is the only case in which having a zero volume is not an error.

Yes, let's not change the return value of get_cell_volume (there is no get_volume).

Agreed.

When calling get_cell_volume on a structure where no cell was defined, I feel it's fine to raise an exception.

I'm not sure - I think we would end up in the same situation we want to go away for KpointsData, for instance: if you call get_kpoints_mesh and the data was set as a list (or viceversa) you get an exception. But in practice it's not so nice to have to catch exceptions that depend on how the class has been setup. Better to return zero (generalised) volume for pbc=[False,False,False].

If we're going to use zero-vectors as the default value instead of None, we lose this ability (unless we again define an arbitrary threshold below which a volume is considered zero).
Just saying... for me it's also fine to return a volume of 0 if a cell was not defined.

See my suggested behaviour above.

I believe this function already exists, it's called get_dimensionality.

Yes, this would be used internally by the function so we don't duplicate code. But I feel that get_cell_generalized_volume() should return both the generalized volume and the dimensionality, otherwise it's not clear the units of the generalized volume (no units, length, surface, volume) so the value is essentially meaningless.

@ltalirz
Copy link
Member Author

ltalirz commented Feb 1, 2022

What can be now done that would be a problem if we raise when pbc=[True, True, True] and no cell is set?

User code will fail, for example if someone does

s=StructureData()
s.set_cell(...)
s.set_pbc(...)

which was legal before. This happened already in our own test base, so I do not think it is safe to assume this won't happen elsewhere.

If you prefer, and if others agree, I'd still be ok with making this breaking change here.
This change is not required to fix #5248 but it does go into the right direction as it enforces an important fundamental programming rule (namely, that a class should always be initialized to a valid state).
Note, however, that this would require us to also change the default of pbc in the constructor as follows:

  • if cell is provided by user, default to [True,True,True] as before
  • if cell is not provided by the user, default to [False,False,False]

I cannot easily estimate unintended consequences of this change.

if the dimensionality is zero, then the function returns zero, and this is the only case in which having a zero volume is not an error.

Let's discuss this when we've decided the question above since it is linked.

Better to return zero (generalised) volume for pbc=[False,False,False].

On reflection, I agree not to raise, and get_cell_volume now returns 0 for that case.

I feel that get_cell_generalized_volume() should return both the generalized volume and the dimensionality,

Did you check the implementation of get_dimensionality? It already does this.
I guess this points out that this function is awkwardly named... anyhow, that's for another time.

@ltalirz
Copy link
Member Author

ltalirz commented Feb 22, 2022

@giovannipizzi please let me know whether you'll have the time to review/respond to the comments or whether I should look for another reviewer (which is not a problem, just let me know)

@giovannipizzi
Copy link
Member

Hi @ltalirz sorry, admittedly I'm a bit swamped at the moment... If you find another reviewer that'd be great! Otherwise let me know an I'll try to prioritize checking this carefully

@ltalirz
Copy link
Member Author

ltalirz commented Feb 25, 2022

@sphuber Since @giovannipizzi is busy, would you perhaps be able to give this a look?

@sphuber
Copy link
Contributor

sphuber commented Feb 27, 2022

I would gladly take a look but It seems like @giovannipizzi had very specific comments about the changes and it doesn't seem like a PR that I can easily take over. I am not so familiar with the StructureData class and since it is used so widely, any changes to it are extra sensitive. So I think it is really important that @giovannipizzi handles this PR.

Copy link
Member

@giovannipizzi giovannipizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some little comments. The main logical change I suggest is the following: we now make it legal to set the cell to None, BUT this is only valid for pbc = (False, False, False). Note that this is NOT enforced at every function call, but ONLY in the _validate (i.e. before storing), and some methods will check and raise if this is not the case (e.g. get_dimensionality).

This still breaks backward incompatibility because now cell is not set anymore to diag([1, 1, 1]).
I hope nobody was relying on it for actual simulations; on the other hand people might have relied on it e.g. for testing... Probably this is OK to change in 2.0 (@ltalirz however mention this in the wiki page with the list of changes of 2.0).

I still have a conceptual issue that I mentioned in my original comment to the issue: this kind of improves the situation for 0D systems, but still leaves the issue for the non-PBC directions of 1D and 2D systems... Probably it's OK as fixing it properly would require more massive changes, but I would still like to point out this problem.
If we want to solve it as ASE does, one should also allow for zero size of the cell:

ase.build.surface('Al', (1, 0, 0), layers=3)

that returns Atoms(symbols='Al12', pbc=[True, True, False], cell=[4.05, 4.05, 0.0])

Anyway, I'm quite sure that this is "recent" behaviour of ASE ("recent" = after I wrote the initial class, i.e. < 7 years ;-) ).

aiida/cmdline/commands/cmd_data/cmd_structure.py Outdated Show resolved Hide resolved
aiida/orm/nodes/data/structure.py Outdated Show resolved Hide resolved
aiida/orm/nodes/data/structure.py Outdated Show resolved Hide resolved
aiida/orm/nodes/data/structure.py Outdated Show resolved Hide resolved
aiida/orm/nodes/data/structure.py Outdated Show resolved Hide resolved
aiida/orm/nodes/data/structure.py Outdated Show resolved Hide resolved
@giovannipizzi
Copy link
Member

Ah, then I remembered correctly! ASE used to have the AiiDA choice [[1,0,0], [0,1,0], [0,0,1]] as the default:
https://gitlab.com/ase/ase/-/commit/44b77b7ab14a418ce2770b51a8ccfab2a9412211
This is where the AiiDA default came from. (Just reporting for historical reasons)

giovannipizzi
giovannipizzi previously approved these changes Mar 31, 2022
Copy link
Member

@giovannipizzi giovannipizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @ltalirz, this seems good to me! :-)

Is there any documentation we need to update? (Could you please check?)
Also, can you please add a note here, for users? https://github.com/aiidateam/aiida-core/wiki/AiiDA-2.0-plugin-migration-guide

@giovannipizzi
Copy link
Member

@ltalirz will you take are of fixing the failing tests?

@ltalirz
Copy link
Member Author

ltalirz commented Apr 7, 2022

Is there any documentation we need to update? (Could you please check?)

Checked aiida-core docs and didn't find anything that needed updating;
checked aiida-tutorials and didn't find anything that needed updating

Also, can you please add a note here, for users? https://github.com/aiidateam/aiida-core/wiki/AiiDA-2.0-plugin-migration-guide

Done!

giovannipizzi
giovannipizzi previously approved these changes Apr 10, 2022
Copy link
Member

@giovannipizzi giovannipizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @ltalirz !

@giovannipizzi giovannipizzi merged commit 1d36c0c into aiidateam:develop Apr 11, 2022
unkcpz added a commit to aiidalab/aiidalab-widgets-base that referenced this pull request Oct 3, 2023
`_validate_and_fix_ase_cell` function is not needed anymore, since it add a
dummy cell to gas phase molecule. In the old AiiDA, there needed to be default
cell, otherwise StructureData would not work.
But that has been fixed so we should get rid of this login in AWB.

The fix in aiida-core was done here aiidateam/aiida-core#5341
and released in version 2.0 https://github.com/aiidateam/aiida-core/blob/main/CHANGELOG.md#v200---2022-04-27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StructureData constructor should accept ASE object with zero cell data
3 participants