Insufficient space group descriptions #416

Closed
sauliusg opened this issue Jun 9, 2022 · 73 comments
Labels
blocking-release This is a PR or issue that presently blocks the release of next version of the spec.

Comments

@sauliusg
Contributor

sauliusg commented Jun 9, 2022

Insufficient space group descriptions

@sauliusg
Contributor Author

sauliusg commented Jun 9, 2022

Space group descriptions as merged in PR #405 suffer from a serious drawback: they do not list symmetry operators, and they do not provide for other spacegroup symbols such as H-M.

Symmetry operators provide a universal means to specify symmetry. They are easier to parse than Hall symbols, and are guaranteed to be able to convey every possible set of symmetry operations for any space group and setting. The standard is hopelessly incomplete if it does not allow symmetry operators to be specified.

The H-M symbols, and the extended H-M symbols, are a human-readable description of the space group, including non-standard settings. They are easier to interpret than Hall symbols (if in doubt, read Hall1981).

The Hall symbols, in contrast, depend on ad-hoc definitions to make space group operators derivable from the symbol itself (does anybody know what 'w' is without looking it up in a table?), and need extensions to specify some settings (if they can do it at all), and the latest extensions seem to be equivalent to just specifying the same symmetry operators (this needs to be checked). Parsing them is complicated (in contrast to plain space group operators like '-x,-y,z'), and reading them is, IMHO, more difficult for humans than reading H-M symbols.

If OPTIMADE is to be kept to the bare minimum, the space group operators (to give operator matrices) and the ITC number (which gives a mathematical group identity up to isomorphism) should be standardised. If Hall symbols are included, then all the other symbols (H-M, extended H-M and possibly Schoenflies) should be included as well. Having just Hall symbols is the worst situation of all – neither easy for humans nor easy to parse (and the ITC number does not help in this case).

Since Hall symbols are already standardised, I suggest including also:

@merkys
Member

merkys commented Jun 9, 2022

I am fine with including support for H-M symbols and symmetry operators, and I volunteer to prepare PR(s) whenever the details about them are sorted out.

Citing coreCIF:

The commonly used Hermann-Mauguin symbol determines the space-group type uniquely but a given space-group type may be described by more than one Hermann-Mauguin symbol. The space-group type is best described using _space_group_IT_number. The Hermann-Mauguin symbol may contain information on the choice of basis, but not on the choice of origin. To define the setting uniquely, use _space_group_name_Hall or list the symmetry operations.

Do we need to establish our conventions on how to always choose the same H-M symbol for same space group type?

Conventions for symmetry operators in OPTIMADE have been discussed in #35, but I do not think consensus has been reached for all the details.

@rartino
Contributor

rartino commented Jun 14, 2022

(I think this discussion belongs in #35. We may want to close this issue as a duplicate and move things there?)

I'm in favor of more space group symbols, but I think we need to resolve the fundamental question of how to handle multiple "synonymous" data in some sane way so we don't end up with half of the databases providing HM symbols but no Hall symbols and vice versa.

Having just Hall symbols is the worse situation of all – neither easy for humans nor easy to parse (and ITC number does not help in this case).

My understanding from those who argued for Hall over HM is that for, e.g., spacegroup 68 you have origin choices that are often not distinguished in the HM symbol corresponding to, e.g., Hall symbols c_2_2_-1ac, -c_2a_2ac; these would both correspond to HM symbol C_c_c_a. Hence, you have to indicate origin choice "1" or "2" in some way, e.g., C_c_c_a:1 , C_c_c_a:2. Sure, one can pick a standard for this, but the risk is that people who don't understand the issue/don't know, will just put: C_c_c_a.

Regarding the non-human readable aspect, couldn't a client implementation just encode the following list: http://cci.lbl.gov/sginfo/itvb_2001_table_a1427_hall_symbols.html and auto-translate back-and-forth to HM symbols?

@vaitkus
Contributor

vaitkus commented Jun 14, 2022

After some offline discussion with @sauliusg and @merkys we came to the conclusion that there does not seem to be a widely accepted notation that would be both human-readable and unambiguous. Therefore, we suggest for now to remove/avoid all space group symbol fields altogether (Hall, HM, etc.) and rather express the symmetry using the space group IT number and a list of symmetry operations. A separate PR (issue?) will be filed on this topic.

From what I gathered at the CECAM meeting, space group number is sufficient for most current queries and additional space group information can always be derived from the symmetry operation list if needed. However, maybe there are some actual use cases that I have missed?

@rartino
Contributor

rartino commented Jun 15, 2022

@vaitkus

there does not seem to be a widely accepted notation that would be both human-readable and unambiguous.

Can you clarify how the Hall symbol is ambiguous? (I'm aware of the investigation of @sauliusg and @BobHanson that concluded that there are origin choices that cannot be represented as Hall symbols, but that is not ambiguity.) Edit: I realize your point is that the Hall symbol is not human-readable? Ok, sure, but I guess others would debate that.

Therefore, we suggest for now to remove/avoid all space group symbol fields altogether (Hall, HM, etc.) and rather express the symmetry using the space group IT number and a list of symmetry operations. A separate PR (issue?) will be filed on this topic.

Are you opposed to IT number + an optional field origin_choice on the format of http://cci.lbl.gov/sginfo/itvb_2001_table_a1427_hall_symbols.html , i.e. matching something like: -?[abc][1-3]?|[1-2]?cab|[1-2]?bca|[1-2]?ba-c|[1-2]?-cba|[1-2]?a-cb|[hr]
(+ still a separate field for symmetry operators)

That way I can more easily translate back and forth into Hall and HM symbols (by using that table) than I can via identification of the whole list of symmetry operators.
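
As a quick check of what the pattern quoted above accepts, here is a minimal Python sketch. The regular expression is copied verbatim from the comment; the function name `is_origin_choice` and the list of test strings are made up for illustration. Requiring a match on the whole string is an assumption, since the comment only says "matching something like".

```python
import re

# Pattern copied verbatim from the comment above.
ORIGIN_CHOICE = re.compile(
    r"-?[abc][1-3]?|[1-2]?cab|[1-2]?bca|[1-2]?ba-c|[1-2]?-cba|[1-2]?a-cb|[hr]"
)


def is_origin_choice(value: str) -> bool:
    """Return True if the whole string matches the proposed pattern (hypothetical helper)."""
    return ORIGIN_CHOICE.fullmatch(value) is not None


# Illustrative inputs only; they are not taken from any database.
for candidate in ["b1", "-a2", "cab", "2cab", "ba-c", "-cba", "h", "r", "1", "2"]:
    print(f"{candidate!r}: {is_origin_choice(candidate)}")
```

Note that, as written, the pattern rejects a bare "1" or "2", so a value such as the C_c_c_a:1 example mentioned earlier in the thread would need an extra alternative if plain origin-choice digits are meant to be allowed.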

@sauliusg
Contributor Author

sauliusg commented Jun 15, 2022

I am fine with including support for H-M symbols and symmetry operators, and I volunteer to prepare PR(s) whenever the details about them are sorted out.

I forgot to stress this in the original issue, but IMHO the crucial thing is to include a full list of symmetry operators for the space group. They alone make it possible to encode any possible space group or setting and to derive any symbol; they are certainly unambiguous, and in the end they are the information that is used for symmetry computations.

(I've added that update to the original issue, sorry for confusion).

@sauliusg
Contributor Author

Are you opposed to IT number + an optional field origin_choice

I'm quite strongly opposed to this combination – it does not solve the symmetry operator issue, is not sufficient, and invents yet another (as far as I know, non-standard) way to describe space groups (at least my copy of the ITC vol. A does not mention notation like '2:c' for the ITC space group number).

@rartino
Contributor

rartino commented Jun 15, 2022

@sauliusg

I forgot to stress in the original issue, but I think the crucial thing is IMHO to include a full list of symmetry operators for the space group. They alone permit to encode any possible space group or setting, derive any symbol, are for sure unambiguous and are in end the information that is used for symmetry computations.

Sure, however, for clients and servers that want to deal with symmetry in their preferred form (e.g., a database only storing the Hall symbol, and a client that always wants to display the HM symbol to the end user in a UI) it is IMO preferable if OPTIMADE standardizes on a format where conversions between the representations can, to as large a degree as possible, be done cheaply on-the-fly for large sets of entries, i.e., preferably via direct lookup tables.

While I really like to follow the CIF standard when we can, I'm not overly enthusiastic over the _space_group_symop_operation_xyz format, because it is very verbose and non-canonical.

Also, we have to avoid choosing a format that down the line causes a lot of trouble in representing non-3D periodicity, i.e., slabs, wires, and molecules; what we do needs to be extensible in that direction. The same goes for magnetism, etc.

@sauliusg
Contributor Author

sauliusg commented Jun 15, 2022

(I'm aware of the investigation of @sauliusg and @BobHanson that concluded that there are origin choices that cannot be represented as Hall symbols, but that is not ambiguity.)

I think they are ambiguous in the sense that multiple Hall symbols denote exactly the same symmetry operators. Also, the original proposal seems not to be powerful enough to capture all standard space group settings, and it was extended in multiple ways.

I'm now digging a bit into the decoding of the Hall symbols (decode-Hall-symbol, takes time, bear with me...). What I have come across, and @BobHanson seems to confirm (please correct me if I am wrong), is the following:

  • the original Hall paper [1] makes some assumptions about some standard settings, and thus not every set of symmetry operators can be represented by a Hall symbol, not even the standard ones, like operators standardised in the ITC A for 'P 31 1 2' (H-M, space group No.151). The Hall 1981 paper gives concise (Hall) symbol 'P 31 2' for this space group, which assumes origin at inversion centre; however the standard in the ITC specifies "on 2[2 1 0] at 3 1 1", and thus Hall symbols needed to be amended with origin shift [2] and later with the change-of-basis [3], in this case the Hall symbol for the standard setting will be 'P 31 2 (0 0 4)';
  • some extra centerings like S or T were proposed [2], their standardisation status is not clear to me;
  • adding change-of-basis notation creates multiple ways to construct Hall symbols. To cite [3], "The change-of-basis vector (0 0 1) could also be entered as (x, y, z+1/12)."
  • Moreover, the same symbol can be written with and without explicit axes, so even in the original Hall notation 'P 2 2', 'P 2z 2', 'P 2z 2y' and 'P 2z 2x' are all the same (the first three – for purely syntactic reasons, the fourth one – because of crystallographic symmetry considerations); even more can be generated.

Edit: I realize your point is that the Hall symbol is not human-readable? Ok, sure, but I guess others would debate that.

Debated it can be, but for the H-M symbol, 'P 21' means a twofold screw axis, and this you cannot avoid learning if you study crystallography. In Hall notation, it becomes 'P 2yb', and you can only learn what y and b mean from the original paper; it is a local ad-hoc convention. Thus, it is more likely that people will use H-M notation and not Hall, and this is indeed observed when you search the Internet for "spacegroup".

My conclusion: Hall symbols are a nice try and a useful gadget to play with, with certain utility in computer applications, but they are not to replace H-M notation for humans, nor are they suitable as a single replacement for a symmetry operator list.

Refs.:

  1. Hall, S. R. "Space-group notation with an explicit origin". Acta Crystallographica Section A, International Union of Crystallography (IUCr), 1981, 37, 517-525, DOI: https://doi.org/10.1107/s0567739481001228

  2. Sydney R. Hall, Ralf W. Grosse-Kunstleve "Concise Space-Group Symbols". 1996, URL: https://cci.lbl.gov/sginfo/hall_symbols.html [accessed: 2022-06-14T15:24+03:00]

  3. International Tables Volume B 1994, Section 1.4. "Symmetry in reciprocal space". URL: https://onlinelibrary.wiley.com/iucr/itc/B/ [accessed: 2022-06-14T15:35+03:00]

@sauliusg
Contributor Author

While I really like to follow the CIF standard when we can, I'm not overly enthusiastic over the _space_group_symop_operation_xyz format, because it is very verbose and non-canonical.

If anything is standardised, then the _space_group_symop_operation_xyz values are among the best standardised things. Just look them up in the ITC vol. A. There remain minor issues like capitalisation (x vs. X) and term order (-x+y+1/2 vs. 1/2+y-x), but these are obvious, have algebraic solutions and are easy to solve and standardise (e.g. we have a set of solutions that work fine for the COD, if you ask).

Also, we have to avoid choosing a format that down the line causes a lot of troubles in representing non-3D periodicity, i.e., slabs, wires, and molecules; what we do needs to be extensible in that direction. The same thing with magnetism, etc.

Symmetry operators allow all this and more, e.g. modulated structures.

The point is that symmetry operators are just a concise (and rather human-readable ;) ) encoding of symmetry operator matrices. One can decode them by using just a formal grammar (while decoding Hall symbols needs ad-hoc lookup tables). Also, when using them you can be pretty sure that anything you can process using matrix algebra you can also encode as an operator. Not so obvious for symbolic notation.
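
To illustrate the "formal grammar only, no lookup table" point, here is a minimal Python sketch that decodes an xyz operator string into a rotation matrix and a translation vector. The grammar it accepts (signed terms that are a rational number, an axis letter, or a rational times an axis letter) is an assumption made for this example and is not the grammar of any standard; `parse_symop` is a hypothetical name.

```python
from fractions import Fraction
import re

# One term: optional sign, optional rational coefficient, optional axis letter.
TERM = re.compile(r"([+-]?)(\d+(?:/\d+)?)?([xyz]?)")


def parse_symop(symop):
    """Decode e.g. '-x+y+1/2,-y,z' into (rotation rows, translation), using exact fractions."""
    parts = symop.replace(" ", "").lower().split(",")
    if len(parts) != 3:
        raise ValueError(f"expected three components in {symop!r}")
    rotation, translation = [], []
    for part in parts:
        if not part:
            raise ValueError(f"empty component in {symop!r}")
        row = {"x": Fraction(0), "y": Fraction(0), "z": Fraction(0)}
        shift = Fraction(0)
        pos = 0
        while pos < len(part):
            match = TERM.match(part, pos)
            sign, number, axis = match.groups()
            # Reject anything that is not a term, and unsigned terms after the first.
            if not (number or axis) or (pos > 0 and not sign):
                raise ValueError(f"cannot parse {part!r} at offset {pos}")
            value = Fraction(number) if number else Fraction(1)
            if sign == "-":
                value = -value
            if axis:
                row[axis] += value   # coefficient of x, y or z
            else:
                shift += value       # pure translation term
            pos = match.end()
        rotation.append([row["x"], row["y"], row["z"]])
        translation.append(shift)
    return rotation, translation


print(parse_symop("-x,-y,z+1/2"))
```

Because the result is plain numbers, the capitalisation and term-order variants mentioned above decode to the same value: `parse_symop("-x+y+1/2,-y,z")` and `parse_symop("1/2+Y-X, -Y, Z")` compare equal.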

@sauliusg
Contributor Author

sauliusg commented Jun 15, 2022

To sum up, I propose the following two options for the upcoming PR:

  1. If we want to keep OPTIMADE standard simple and just use the bare minimum, we standardise:
  • the symmetry operators as general position coordinates, as in -x,-y,z+1/2; should be present in responses but not necessarily searchable;
  • the ITC space group number; present in responses and searchable, identifies the space group up to isomorphism (but distinguishes chirality); has a good standardisation based on algebraic definitions. More detailed information about settings, origins and cell choices can be derived from symmetry operators.

No symbols (neither Hall nor H-M nor Schönflies) are standardised in variant (1).

  2. If we want to keep most possibilities covered, we add in addition to (1) (symops+ITC number) the following:
  • the Hermann-Mauguin symbols (original ones as in ITC and universal ones that contain the origin and cell choice information)
  • the Hall symbols, as in ITC vol. B (with explicit change-of-basis notation);
  • the Schönflies symbols;

Keep open for other notations (orbifold, fibrifold, ... see https://en.wikipedia.org/wiki/Space_group).

I am neutral as to whether (1) or (2) is selected, but symops MUST be present in any case. The (1) variant seems less work to do right now, and (2) can be added afterwards. So I would go for (1).

@vaitkus
Contributor

vaitkus commented Jun 15, 2022

There is also option 3:

  3. IT number, symops list and one of [Hall, HM, Schönflies] symbols (the same one for all databases).

This way the database implementers only have to worry about dealing with a single set of space group symbols and the conversion between different notations is delegated to the clients. Having multiple alternative representations as proposed in (2) seems to have a high risk of leading to internal contradictions.

For now, I would also lean more towards option 1, since, as mentioned by @sauliusg, the specification can always be extended in the future. I fully understand the benefits of having a relatively simple space group symbol string that can be easily used in search queries, however, the existing notations do not seem to be sufficient (or sufficiently standardized).

@rartino, does approach (1) seem reasonable for now? Alternatively, we could continue the discussion and try to agree on the use of a space group symbol notation that is not exhaustive, but covers most actual use cases. If I am not mistaken, one of the initial ideas that @merkys had was to provide Hall symbols only for entries with symop sets that are explicitly described in ITC tables (this would cover about 99% of the COD). However, it would be useful to first know what the current actual use cases are.

@BobHanson

BobHanson commented Jun 15, 2022 via email

@sauliusg
Contributor Author

sauliusg commented Jun 16, 2022

Serving up Cartesians with symmetry operations (necessarily fractional) seems odd to me.

Indeed it is odd; good point!

  • For anyone wanting to convert back from Cartesians to the original fractional coordinates (alas!)

As I understand, the fractional coordinates were discussed but not yet described.

Should we make an issue for this?

@sauliusg
Contributor Author

  • Personally, I would prefer ITA number + H-M + operations. The ITA number and H-M would both be valuable for searches

I agree. Does this mean that we lean towards the more complete solution (2) (which would also keep the currently merged Hall symbols, which should be there if we add H-M)?

@sauliusg
Contributor Author

There is also option 3:

3. IT number, symops list and **one** of [Hall, HM, Schönflies] symbols (the same one for all databases).

But what should we do if a client requests both H-M and Hall?

This way the database implementers only have to worry about dealing with a single set of space group symbols and the conversion between different notations is dedicated to the clients. Having multiple alternative representations as proposed in (2) seems to have a high risk of leading to internal contradictions.

I think resolving contradictions is relatively easy:

Mandate (suggest) the search order:

  • if symops are given, use symops;
  • else, if Hall symbol is given, use the Hall symbol to look up or derive symops;
  • else, if universal H-M symbol is given, use that symbol to look up symops in your tables;
  • else, use either H-M or ITC number to look up the space group, assume standard setting.

Of course the symbols SHOULD encode non-contradictory information (i.e. symops, ITC number and space group symbols SHOULD all point to the same space group).

Seems unambiguous?
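
The suggested search order could be sketched on the client side roughly as follows. This is only an illustration: the dictionary keys are hypothetical field names (the thread has not settled on any), and placing the plain H-M symbol before the ITC number in the last step is an arbitrary choice, since the list above does not rank those two.

```python
# Hypothetical field names, paired with whether using that field
# means the client has to assume the standard setting.
PRECEDENCE = [
    ("symops", False),
    ("hall_symbol", False),
    ("universal_hm_symbol", False),
    ("hm_symbol", True),
    ("it_number", True),
]


def choose_symmetry_source(entry):
    """Return (field name, assumes_standard_setting) for the first field present, else None."""
    for field, assumes_standard_setting in PRECEDENCE:
        if entry.get(field):
            return field, assumes_standard_setting
    return None


# Example: a response carrying only a Hall symbol and an ITC number.
print(choose_symmetry_source({"hall_symbol": "P 2yb", "it_number": 4}))
```

The sketch only picks a source; actually deriving operators from a Hall or H-M symbol still needs the parser or tables discussed in this thread.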

@merkys
Member

merkys commented Jun 16, 2022

@sauliusg

As I understand, the fractional coordinates were discussed but not yet described.

Should we make an issue for this?

There is already a PR #206 for introduction of fractional coordinates.

@sauliusg
Contributor Author

@sauliusg

As I understand, the fractional coordinates were discussed but not yet described.
Should we make an issue for this?

There is already a PR #206 for introduction of fractional coordinates.

OK then, not needed. I only search Issues, not PRs ;)

@vaitkus
Contributor

vaitkus commented Jun 16, 2022

@sauliusg

But what should we do if a client requests both H-M and Hall?

If we explicitly choose a single space group symbol notation that all databases should implement (e.g. H-M) then such a query will simply be an invalid one.

I think resolving contradictions is relatively easy:

Mandate (suggest) the search order:

  • if symops are given, use symops;
  • else, if Hall symbol is given, use the Hall symbol to look up or derive symops;
  • else, if universal H-M symbol is given, use that symbol to look up symops in your tables;
  • else, use either H-M or ITC number to look up the space group, assume standard setting.

I think that each of these fields serves a slightly different purpose and that we should not mandate any hierarchy/lookup order nor assume anything. From my point of view:

  • Symmetry operations (symops). The most complete way of describing a space group with a specific setting and origin. However, it cannot be queried in a convenient way.
  • IT number. The least specific way of describing a space group (no setting, no origin), but still conveys a lot of important information (e.g., is the molecule a racemate). It is extremely easy to query (single number in the range of 1--230).
  • H-M/Hall symbol. Probably falls somewhere in between symops and IT number with regard to describing the space group. It is much easier to read and query than symops, but may not convey certain aspects of the space group (e.g., origin). Though, according to @BobHanson it may be possible to provide a complete description using the Hall symbol (and maybe a universal H-M symbol?).

Under this approach H-M and Hall symbol have the same purpose and thus there is no need for the servers to provide them both, but only the one defined in the specification (which we have not chosen yet). A client, however, is free to implement a conversion between this and any other desired space group notation. Or do we want H-M and Hall symbol to serve different purposes?

@merkys
Member

merkys commented Jun 16, 2022

@vaitkus

If I am not mistaken, one of the initial ideas that @merkys had was to provide Hall symbols only for entries with symop sets that are explicitly described in ITC tables (this would cover about 99% of the COD).

Indeed. I believe I was echoing someone else's idea though.

@rartino
Contributor

rartino commented Jun 16, 2022

Basically, for structures that are experimental determinations (most of COD and most of ICSD), it just seems a bit odd to do all this conversion to Cartesians for delivery only to have to convert back to fractional.

There is already a PR #206 for introduction of fractional coordinates.

This was the first place where we realized that "just adding support also for fractional coordinates" was taking us in a direction where databases could choose to support either one, which would be terrible for clients. Let us just figure out how we best express a "support level" that says: "You may provide fractional coordinates in fractional_site_positions, however, if you do, you MUST also support cartesian_site_positions." Then we can use the same mechanism for, e.g., allowing the Hall symbol but in that case force symops to be given.

Does anyone see a drawback with supporting multiple fields with overlapping purposes if we add this kind of "dependency"?

If not, we just need to choose which is the "most" required field - which I think we all agree is some format that clearly can represent any list of symops.

  • For anyone wanting to convert back from Cartesians to the original fractional coordinates (alas!), truly the only way to do this easily is with the Johns-Faithful (x-1/2, y, -z) notation. No, it is not canonical, but no one would search for this anyway.

A list of symmetry operations is surely needed, but why "only" the xyz notation? I recall finding it bulky to work with when I implemented xyz <-> matrices. I'm fairly sure I would have preferred Seitz symbols.

Are there good arguments against Seitz symbols? The notation seems very standard, more normalized, and more human readable than the xyz-notation?

  • Personally, I would prefer ITA number + H-M + operations. The ITA number and H-M would both be valuable for searches, and the operations list gives us what we need to convert back to fractional coordinates and handle the symmetry properly.

Note that there are potential issues beyond searching with having a very non-canonical primary format for symmetry information. For example: let's say quite a few databases only provide symmetry info via symops. Now, a client wants to always show an H-M symbol as part of the UI. It would in that case be nice (but, sure, not absolutely crucial) if it was fairly easy (e.g., via a lookup table) to map the list of symops returned by these databases into an H-M symbol.

I also foresee finding myself fairly often in the position of asking "are these two entries I got from two different databases describing the same symmetry?", and then having to normalize the symop lists.
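
A rough Python sketch of such a normalisation, assuming the operator strings have already been decoded into (rotation, translation) pairs with exact rational translations. The function names are hypothetical, and the comparison only detects identical operator sets: two descriptions of the same space group in different settings or origins would still compare as different.

```python
from fractions import Fraction


def canonical(op):
    """Make one operator hashable and reduce its translation modulo lattice translations."""
    rotation, translation = op
    return (
        tuple(tuple(int(v) for v in row) for row in rotation),
        tuple(Fraction(t) % 1 for t in translation),
    )


def same_symmetry(ops_a, ops_b):
    """Compare two operator lists, ignoring order and whole-cell shifts."""
    return {canonical(op) for op in ops_a} == {canonical(op) for op in ops_b}


identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
twofold_b = ((-1, 0, 0), (0, 1, 0), (0, 0, -1))

# x,y,z and -x,y+1/2,-z, as one database might list them ...
db_a = [(identity, (0, 0, 0)), (twofold_b, (0, Fraction(1, 2), 0))]
# ... and the same set written as -x,y-1/2,-z, in a different order.
db_b = [(twofold_b, (0, Fraction(-1, 2), 0)), (identity, (0, 0, 0))]
# A different set: -x,y,-z without the screw translation.
db_c = [(identity, (0, 0, 0)), (twofold_b, (0, 0, 0))]

print(same_symmetry(db_a, db_b), same_symmetry(db_a, db_c))
```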

  • For 2D structures (slabs, surfaces), it might be nice to have the symmetry, but honestly I don't know that anyone cares. P1 is probably expected.

I'm pretty sure people who build databases of 2D materials care (I'm part of a collaboration that continues on that linked work...).

But, after thinking a bit about the xyz format, I think this is not an issue with that format, right? It can express any symop; it just requires unusual coefficients?

@BobHanson

BobHanson commented Jun 16, 2022 via email

@sauliusg
Contributor Author

sauliusg commented Jun 17, 2022

This was the first place where we realized that "just adding support also for fractional coordinates" was taking us in a direction where databases could choose to support either one, which would be terrible for clients.

I do not see it as terrible. Converting fractionals to cartesians is a simple task, something you can implement on a weekend. A lot of code around already does that. So I do not see any problem for clients in converting these two representations.

And, as @BobHanson pointed out, working with symmetry is much more convenient in fractionals.
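
For reference, the conversion in question is a single linear combination of the lattice vectors. A minimal sketch in plain Python, assuming the lattice is given as three row vectors a, b, c in Cartesian coordinates (the function name and the example cell are made up):

```python
def fractional_to_cartesian(frac, lattice):
    """Return f_a*a + f_b*b + f_c*c, where the rows of `lattice` are the vectors a, b, c."""
    return [sum(frac[i] * lattice[i][k] for i in range(3)) for k in range(3)]


# Illustrative cell, not taken from any real structure.
lattice = [
    [4.0, 0.0, 0.0],  # a
    [0.0, 5.0, 0.0],  # b
    [1.0, 0.0, 6.0],  # c
]
print(fractional_to_cartesian([0.5, 0.5, 0.5], lattice))  # [2.5, 2.5, 3.0]
```

The reverse direction needs the inverse of the lattice matrix, which is still a small, fixed amount of work per structure.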

Let us just figure out how we best express a "support level" that says: "You may provide fractional coordinates in fractional_site_positions, however, if you do, you MUST also support cartesian_site_positions." Then we can use the same mechanism for, e.g., allowing the Hall symbol but in that case force symops to be given.

I am against such a "MUST" and against singling out cartesian_site_positions. This essentially dictates to the database how the implementation should be done, and is geared towards one set of applications that are usually done in Cartesian coordinates. If OPTIMADE is to become universal and database-neutral, there should be no prescription on how the information needs to be expressed; it is about the communication protocol, not about the ways to represent different physical quantities.

Already the current spec, which has only Cartesians, is bad enough. For COD it would have been so much easier to implement OPTIMADE if we could just return the fractional coordinates which we have.

Now, essentially, the spec demands that we do expensive calculation on the server side for the perceived benefit of the client, and in the end it turns out that for some clients this is not just unnecessary but actually counterproductive.

To sum up:

  • IMHO, both fractional coordinates and Cartesian coordinates should be allowed on an equal footing;
  • a client should be able to request coordinates (in any form), and a database returns what it has; we can debate whether just an asymmetric unit or the whole P1 cell should be returned;
  • the client MUST convert the coordinate representation to what they need on the client side (do not load servers unnecessarily, offload as much as possible to clients!)

Shall we move the further discussion to #206?

@sauliusg
Contributor Author

sauliusg commented Jun 17, 2022

To wrap the things up – do we agree on the following position:

  • Symmetry operations (symops). The most complete way of describing a space group with a specific setting and origin. However, it cannot be queried in a convenient way.
  • Symmetry operations (symops), in general position coordinates (Johns-Faithful (x-1/2, y, -z) notation) SHOULD be supported in responses; an EBNF grammar will be written in the OPTIMADE standard (I can do this); queries MAY (but do not need to) be supported on this field;
  • IT number. The least specific way of describing a space group (no setting, no origin), but still conveys a lot of important information (e.g., is the molecule a racemate). It is extremely easy to query (single number in the range of 1--230).
  • IT number. SHOULD be supported in responses; if it is supported in responses it MUST be supported in queries
  • H-M/Hall symbol. Probably falls somewhere in between symops and IT number in regards to describing the space group. It is much easier to read and query than symops, but may not convey certain aspects of the space group (e.g., origin).
  • H-M symbols. MAY be supported in responses; if they are supported in responses they MUST be supported in queries, at least as string queries; universal H-M symbols MUST be used, a table will be provided in OPTIMADE standard;

  • Hall symbols. MAY be supported in responses; if they are supported in responses they MUST be supported in queries, at least as string queries; an EBNF grammar will be provided in OPTIMADE (again, I can do this), no table is needed since the symbols are intended to be parsed, and all necessary tables are provided in the Hall 1981 paper and in ITC vol. B.

Can we proceed with these ideas?

@sauliusg
Contributor Author

I would just say that one should never assume the standard setting.

If we do not, then having just an ITC number or a short H-M symbol in the response (which is permissible) would not allow symmetry-equivalent atoms to be computed, which would be perfectly possible with, and is indeed the intent of, the default settings specified in the ITC vol. A.

If we do not specify the default setting in the standard (by referring to the ITC vol. A), then the client will have to produce an error in a situation where it could perfectly well continue (and continuing is indeed the behaviour of most macromolecular crystallography software); e.g. specifying space group 'P 21' implies 'P 1 21 1'.

What is Jmol's behaviour if you get just a short H-M symbol ?

@merkys
Member

merkys commented Jun 17, 2022

Already the current spec, that has only cartesians, is bad enough. For COD it would have been so much easier to implement OPTIMADE if we could just return the fractional coordinates which we have.

I agree with @sauliusg on this, but the main cost for the COD is not the fractional -> cartesian conversion, but symmetry reconstruction from an asymmetric unit.

@rartino
Contributor

rartino commented Jun 17, 2022

To clarify what @merkys brought up; @sauliusg, @BobHanson when you call for a field for fractional coordinates, do you mean a field to specify the coordinates of atoms in the full unit cell (which is the proposal in #206, and which only differs from cartesian_coordinates by a matrix multiplication with the lattice vectors), or the coordinates in only the asymmetric unit, i.e., what is available in CIF? Since @BobHanson mentioned needing the symops, I think you mean the latter?

To add a new field for the asymmetric unit fractional coordinates makes a lot of sense along with all the other symmetry data fields we are discussing here. The reason we don't have it yet is because it would be meaningless without the symops or equivalent. Maybe we should drop #206 entirely and just add this? But I think in that case we need two new fields along the lines of asym_fractional_coords and asym_species_at_site?
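
To make concrete why asymmetric-unit coordinates are only meaningful together with the symops, here is a small Python sketch that expands an asymmetric unit into the full cell. It assumes operators are already available as (rotation, translation) pairs and that coordinates are exact rationals; real data would need a numerical tolerance when removing duplicates. The function name and the example sites are hypothetical.

```python
from fractions import Fraction


def expand_asymmetric_unit(sites, symops):
    """Apply every operator to every (species, position) pair, wrap images into [0, 1),
    and drop duplicate images of the same site (atoms on special positions)."""
    full_cell = []
    for species, position in sites:
        seen = set()
        for rotation, translation in symops:
            image = tuple(
                (sum(rotation[i][j] * position[j] for j in range(3)) + translation[i]) % 1
                for i in range(3)
            )
            if image not in seen:
                seen.add(image)
                full_cell.append((species, image))
    return full_cell


# Identity and inversion through the origin, i.e. the operators x,y,z and -x,-y,-z.
identity = (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0, 0, 0))
inversion = (((-1, 0, 0), (0, -1, 0), (0, 0, -1)), (0, 0, 0))

quarter = Fraction(1, 4)
sites = [
    ("C", (quarter, quarter, quarter)),                   # general position: two images
    ("O", (Fraction(0), Fraction(0), Fraction(0))),       # on the inversion centre: one image
]
for species, image in expand_asymmetric_unit(sites, [identity, inversion]):
    print(species, [str(v) for v in image])
```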

This was the first place where we realized that "just adding support also for fractional coordinates" was taking us in a direction where databases could choose to support either one, which would be terrible for clients.

I do not see it as terrible. Converting fractionals to cartesians is a simple task, something you can implement on a weekend. A lot of code around already does that. So I do not see any problem for clients in converting these two representations.

We've debated this before, so I'll keep it short here. The aim is to design the standardized fields so that clients do not need to explore, for each individual database, large numbers of overlapping fields (i.e., fields that express the same data in different ways) to find out precisely which ones that specific database supports, and then do all the conversions client-side. That is a hill I am prepared to die on: the alternative would be the end of interoperability and would make common queries between databases impossible. This position is not the same as saying databases should be forced into expensive server-side conversions; the point is that we must carefully choose our "standard" fields so the necessary server-side conversions are comparatively cheap and straightforward to implement.

Symmetry operations (symops), in general position coordinates (Jones-Faithful (x-1/2, y, -z) notation) SHOULD be supported in responses; an EBNF grammar will be written in the OPTIMADE standard (I can do this); queries MAY (but do not need to) be supported on this field;

I'm not sure about "SHOULD" level for any of the symmetry info. Are we saying databases that do not care about symmetry and just want to give Cartesian coordinates of their, e.g., huge bio-molecules are SHOULD-violating?

Also, does anyone have a link to the Jones-Faithful paper, or any other careful specification of the format? I've come up blank in my searches so far. I note that even the CIF definition of _symmetry_equiv_pos_as_xyz doesn't give a careful definition of the format, nor does it link to one.

H-M symbols MAY be supported in responses; if they are supported in responses they MUST be supported in queries, at least as string queries; universal H-M symbols MUST be used, a table will be provided in OPTIMADE standard;

Why a MUST requirement on string query support on an optional field? It isn't something we've done before. It can only be a MUST for equality string match, since a database doing on-the-fly translation from, e.g., Hall symbols cannot efficiently support partial string matching.

@BobHanson

BobHanson commented Jun 18, 2022 via email

@merkys
Member

merkys commented Jun 20, 2022

How about if someone creates a little survey that asks the general questions that have been discussed and lets us all summarize our positions on them?

This is something I have been thinking of for quite some time now, and is applicable for many other issues/PRs. I will look into technical means.

@rartino
Contributor

rartino commented Jun 20, 2022

So, is this where we are now?:

Proposed symmetry-related fields in structure entries

  • space_group_symops: List of String, a list of symmetry operators in xyz format. Options to decide between:
    • (a) Free-form grammar allowing "any" expression in any order of rational numbers, x,y,z and +/-.
    • (b) OPTIMADE has a more strict grammar to enforce a more canonical form than CIF, fixing order of numbers and x,y,z, coefficient range [0,1), etc.
    • (c) OPTIMADE enforces (b) + a sorting order of the whole list, meaning that standard settings can be identified via a lookup table.
  • space_group_it_number: Integer, the ITA space group number. If given, MUST NOT conflict with the operators given in space_group_symops.
  • space_group_hall: String, if given, space_group_it_number and space_group_symops MUST also be given. MAY use the "extended" syntax to indicate an origin shift. If given, it MUST NOT conflict with space_group_it_number and space_group_symops.
  • space_group_hm: String, optional field, but if given, ITA space group and space_group_symops MUST also be given. The OPTIMADE standard provides a list of valid H-M symbols (?). If given, it MUST NOT conflict with space_group_it_number and space_group_symops.
  • asym_fractional_coords: List of List of Float: fractional coordinates for a list of sites in the asymmetric unit. The coordinate information communicated via asym_fractional_coords, asym_species_at_site and space_group_symops MUST correspond to cartesian_site_positions and species_at_sites.
  • asym_species_at_site: List of String: name the species that sits at each site in asym_fractional_coords.

@sauliusg
Contributor Author

sauliusg commented Jun 30, 2022

Is the list of symmetry operations always unambiguous for any possible origin choice in a structure? I.e., not just any of the reasonable origins, but for any completely arbitrary choice? If not, the IT number (and H-M, and Hall symbol) could contain more information than the symmetry operations.

My understanding is the following:

yes it is, because no matter how you choose an origin and (affine) coordinate axes, there always exists a change-of-basis matrix that transforms your point coordinates from a "standard" setting to this new coordinate system, and by multiplying a "standard" symmetry operator by the change-of-basis matrix you will get a symmetry operator expression in this new basis.

However, if you want simple symmetry operator expressions with rational coefficients, not all axes and not all origins are suitable. For example, if you describe a 30 degree rotation in a Cartesian frame, you will have an irrational cos(30°) as a coefficient. Also, if you shift your origin by, say, an irrational translation π/10, you will get irrational translations (2π/10 = π/5, I guess) in your symops.
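
The effect of an origin shift on the translation part can be sketched as follows. This is an illustration only, assuming operators are stored as a 3x3 integer matrix W plus a translation w; the function name shift_origin is hypothetical. With x' = x - p, the operation x -> W x + w becomes x' -> W x' + w + (W - I) p, so an inversion picks up a translation of magnitude 2p, as guessed above.

```python
from fractions import Fraction as F


def shift_origin(W, w, p):
    """Return the translation part of (W, w) after moving the origin by p.

    With x' = x - p, the operation x -> W x + w becomes
    x' -> W x' + w + (W - I) p. The rotation part W is unchanged.
    """
    new_w = []
    for i in range(3):
        delta = sum((W[i][j] - (1 if i == j else 0)) * p[j] for j in range(3))
        new_w.append((w[i] + delta) % 1)  # reduce into [0, 1)
    return new_w


inversion = [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]
zero = [F(0), F(0), F(0)]
# A rational shift of 1/4 along x keeps the translation rational (1/2).
print(shift_origin(inversion, zero, [F(1, 4), F(0), F(0)]))
```

Passing a float such as math.pi / 10 for p instead of a Fraction yields a non-rational translation, which is the case the free-form floating point notation would have to cover.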

This brings us to the comment of @BobHanson :

BH: I do not. All I know is that one has to be ready for +1/2-x or -x+1/2
or 1/2-x or 1/2 - x. Maybe even 0.5-x (though IMHO use of decimal numbers
is inappropriate).

If we want to accommodate any origin with any precision, we need to allow arbitrary floating point numbers, e.g. X,Y,Z+3.1415926E-01.

The tables [1] say:

The change-of-basis operator V has the general form (v_x, v_y, v_z). The vectors v_x, v_y and v_z are specified by

[image: equations defining v_x, v_y and v_z in terms of r_{i,j} and t_i]

where r_{i,j} and t_i are fractions or real numbers.

(emphasis on real is mine).

Thus, the Tables seem to suppose that arbitrary floating point (aka real) numbers can be used.

However, if OPTIMADE sticks to "permissible" origins, the ones listed in the Tables, then we can get away with standardising the symmetry operation strings so that only rational numbers are permitted. We can always extend later if needed.

What is the general consensus: do we need general real translations (to specify any origin), or are we happy with only origins that can be expressed using rational translations? As far as I know, all crystallographic varieties are expressible in rational translations.

@rartino
Contributor

rartino commented Jun 30, 2022

Is the list of symmetry operations always unambiguous for any possible origin choice in a structure?

yes it is, because no matter how you choose an origin and (affine) coordinate axes, there always exists a change-of-basis matrix that transforms your point coordinates from a "standard" setting to this new coordinate system, and by multiplying a "standard" symmetry operator by the change-of-basis matrix you will get a symmetry operator expression in this new basis.

I see - smart way of looking at it, thanks for explaining.

do we need general real translations (to specify any origin), or are we happy with only origins that can be expressed using rational translations? As far as I know, all crystallographic varieties are expressible in rational translations.

From the experimental side it probably seems silly to cater for these arbitrary origin choices. Nevertheless, structures generated by random assignments of coordinates (e.g., as done by Chris Pickard and others) and ML generated structures can easily end up arbitrarily translated. It would be nice to be able to report symmetry information for these structures (e.g., to make them searchable by space group number) without being forced to shift them.

But, this takes me to a, perhaps subtle, question:

I assume we mean to allow "under-reporting" symmetry. I.e., it is ok to miss symmetry operators in symmetry_operations as long as the reported operators, ITN, H-M, and Hall are all consistent, and when replicating the atoms in asym_fractional_coords using the operations in symmetry_operations one gets the same thing as in cartesian_coords.

However, do we really mean to make it a MUST-level violation to "miss" some symmetry operations in symmetry_operations that would be possible given the ITN specified in space_group_it_number? Our discussion above seems to say this should be a violation (because otherwise symmetry_operations could be ambiguous), but especially when thinking of the "weird" symmetry operations that will be needed for arbitrary origins, I'm worried this will end up overly stringent. At the very least, maybe it would make sense to allow specifically giving only the ITN without the symmetry_operations, nor any of the other symmetry fields?

@BobHanson

BobHanson commented Jun 30, 2022 via email

@merkys
Member

merkys commented Jul 1, 2022

Any volunteers for the symmetry PR text? @sauliusg I get the impression that you intend to supply the xyz grammar at least?

@merkys said to me that he will prepare the PR; I will make the grammar (essentially I have it in my notes, so just need to typeset and test). We just need to clarify the remaining points so that we understand them in the same way.

I am a bit in over my head lately, and this issue looks quite involved. It would be better if someone else steps up and drafts the PR. Remaining issues could be discussed per-point on the PR - I like PRs on GitHub better for that, as an issue is just a linear stream of messages and PR lets splitting off discussions per topic (=line of text).

@rartino
Contributor

rartino commented Jul 1, 2022

It would be fine by me if only the ITN were allowed to be given, but then (a) no fractional coordinates should be given (as they would not be actionable), and (b) recognizing that the ITN then would be only generally useful in the context of searching, not in the context of using (F not R in FAIR).

Indeed - the reason I think it is acceptable is that only giving the ITN is "obviously" ambiguous about the origin, so this ambiguity is less confusing than for, e.g., the usual H-M symbol. But we must indeed not allow giving asym_fractional_coords without a complete list of the symmetry operations.

I assume we mean to allow "under-reporting" symmetry

OOh. I would DEFINITELY assume we mean MUST be the complete list of operators. From the CIF standard: When a list of symmetry operations is given, it must contain a complete set of coordinate representatives which generates all the operations of the space group by the addition of all primitive translations of the space group. [...]

I don't think I was clear enough on what I think we probably need to allow. Let's look at regular NaCl in the conventional cell:

  1. The full symmetry info would be: ITN=225, H-M=Fm-3m, have symmetry_operations list the 192 symmetry operations, and let asym_fractional_coords be a list of 2 coordinates, one for Na and one for Cl (which sits on an 'a' and 'b' Wyckoff position respectively).

  2. However, what if I instead say: ITN=1, H-M=P1, I only let symmetry_operations list the identity operation, and let asym_fractional_coords be a list of four coordinates for Na, and four coordinates for Cl.

I'd argue case (2) fulfills the CIF requirement, because all the information is still consistent and gives a complete representation of the atomic sites, it is just under-reporting the possible symmetry compared to case (1). I've seen many CIF files do this.

Do we mean for case (2) to be a violation of the OPTIMADE standard? I think there are good arguments that this must be allowed.
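
To make the consistency requirement concrete, here is a small sketch of replicating asymmetric-unit sites with a list of operations. It is only an illustration: the helper names are hypothetical, operators are assumed to be (W, w) pairs in fractional coordinates, and for brevity only the identity and the three face-centring translations of the F lattice are used, not the full list of 192 operations of Fm-3m.

```python
from fractions import Fraction as F


def apply_op(op, site):
    """Apply a (W, w) operation to a fractional site, wrapping into [0, 1)."""
    W, w = op
    return tuple(
        (sum(W[i][j] * site[j] for j in range(3)) + w[i]) % 1 for i in range(3)
    )


def expand(ops, asym_sites):
    """Replicate labelled asymmetric-unit sites, dropping duplicates."""
    seen = []
    for species, site in asym_sites:
        for op in ops:
            image = (species, apply_op(op, site))
            if image not in seen:
                seen.append(image)
    return seen


I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
h = F(1, 2)
# Identity plus the three face-centring translations (a subset of Fm-3m).
f_centring = [(I, (0, 0, 0)), (I, (0, h, h)), (I, (h, 0, h)), (I, (h, h, 0))]
full = expand(f_centring, [("Na", (0, 0, 0)), ("Cl", (h, h, h))])
print(len(full))  # 8 sites: four Na and four Cl

# Case (2): the same eight sites listed explicitly, with only the identity.
print(expand([(I, (0, 0, 0))], full) == full)  # True
```

Both descriptions reproduce the same set of sites in the cell; they differ only in how much symmetry is declared.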

@BobHanson

BobHanson commented Jul 1, 2022 via email

@BobHanson

BobHanson commented Oct 11, 2022 via email

@BobHanson

BobHanson commented Oct 11, 2022 via email

@BobHanson

BobHanson commented Oct 11, 2022 via email

@rartino
Contributor

rartino commented Oct 11, 2022

To summarize the above, I think we are ready for a PR here (which I have been too busy to write up myself; anyone is welcome to do it). But, in particular, please feel free to help clarify the two HM definitions.

Symmetry-related fields in structure entries

  • symmetry_operations: List of String. All the symmetry operators for the structure given in the Jones-Faithful xyz format [1] (i.e., not just the generators). A formal grammar for this field is provided in Appendix X.
  • space_group_it_number: Integer, the ITA space group number of the structure. If given, MUST correspond to the operators given in symmetry_operations (i.e., it MUST be a possible space group given those operators).
  • space_group_hall: String. The Hall symbol of the structure. If given, space_group_it_number and symmetry_operations MUST also be given. The string MUST be consistent with symmetry_operations and, if necessary, use the extended Hall symbol syntax as described in International Tables for Crystallography (2010), Vol. B, ch. 1.4, pp. 122-134 (https://doi.org/10.1107/97809553602060000761). If the extended syntax is needed, then the full matrix transformation syntax SHOULD be used, for example, P 6 (x, y, z+1/3) rather than P 6 (0 0 4).
  • space_group_hm: String. The Hermann-Mauguin symbol of the structure. If given, space_group_it_number and symmetry_operations MUST also be given. The sometimes ambiguous short form notation is used, e.g., Pnnn. Appendix Y lists all valid H-M symbols. If given, MUST be consistent with space_group_it_number and symmetry_operations.
  • space_group_hm_universal: String. The universal Hermann-Mauguin symbol of the structure. If given, space_group_hm, space_group_it_number, and symmetry_operations MUST also be given. The format is the same as for space_group_hm with, if needed, an additional specification of the origin, e.g., Pnnn:1. If given, MUST be consistent with space_group_hm, space_group_it_number, and symmetry_operations.
  • asym_fractional_coords: List of List of Float. Fractional coordinates for a list of sites in the asymmetric unit. The coordinate information communicated via asym_fractional_coords, asym_species_at_site and symmetry_operations MUST correspond to cartesian_site_positions and species_at_sites.
  • asym_species_at_site: List of String. Name the species that sits at each site in asym_fractional_coords.
  • space_group_schoenflies: String. The Schoenflies symbol of the system. If given, space_group_it_number MUST also be given, and the symbol MUST be consistent with the space group number.
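
The "if given, X MUST also be given" rules in this list could be checked mechanically. The following is a rough sketch, not part of any OPTIMADE tooling: the REQUIRES table simply transcribes the dependencies stated in the bullets above, and the function name and example entry are hypothetical.

```python
# Field dependencies as stated in the list above (transcribed by hand).
REQUIRES = {
    "space_group_hall": {"space_group_it_number", "symmetry_operations"},
    "space_group_hm": {"space_group_it_number", "symmetry_operations"},
    "space_group_hm_universal": {
        "space_group_hm",
        "space_group_it_number",
        "symmetry_operations",
    },
    "space_group_schoenflies": {"space_group_it_number"},
}


def missing_dependencies(attributes):
    """Map each given field to the required fields that are absent or null."""
    given = {key for key, value in attributes.items() if value is not None}
    problems = {}
    for field, needed in REQUIRES.items():
        if field in given and not needed <= given:
            problems[field] = sorted(needed - given)
    return problems


# Hypothetical entry that gives an H-M symbol but no operator list.
entry = {"space_group_hm": "P -1", "space_group_it_number": 2}
print(missing_dependencies(entry))  # {'space_group_hm': ['symmetry_operations']}
```

Checking that the given values are mutually consistent (for example that the operators really belong to the stated space group number) is a separate and harder step that this sketch does not attempt.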

I think we have aligned on allowing all degrees of freedom in the xyz notation, with rational coefficients. Still, just to reply to these comments:

Even CIF does not require an H-M symbol, and without an ITA number and just having operators, it is a VERY complex calculation to determine the space group, particularly for nonstandard space group settings. There are services that do this (for example, https://www.cryst.ehu.es/cgi-bin/cryst/programs/checkgr.pl?tipog=gesp) but I certainly do not know how to do it, and I would not know where to even start to duplicate that service.

If one normalizes the degrees of freedom in the x,y,z notation (fix the order of operations, order of terms, sign of coefficients etc.), then each standard setting is identified by a unique set of symmetry operations. I.e., one can identify, e.g., the Hall symbol for the standard settings via a lookup table. At least I've seen this work in practice, but perhaps there are limitations for corner cases (?) (but in the context of a UI, it would just not show an H-M symbol if it couldn't make the identification.)

I also foresee finding myself fairly often in the position of asking "are these two entries I got from two different databases describing the same symmetry?", and then having to normalize the symop lists.
This is a well known and very difficult problem. Not sure how you would "normalize" the symop lists, but I think you are really asking, "are these two structures the same but just described differently?" See https://www.cryst.ehu.es/cryst/compstru.html -- and even here the structures have to first be put into standard settings, I think. I would say this problem is out of scope. Same problem with any database (even within a single one!) delivering multiple (potentially similar) structures.

No, I did not mean the generalized problem of "are these structures 'physically' 'the same'" - indeed a very difficult problem. I really just mean: I have two OPTIMADE structures. I ask myself "are these sets of symmetry operations they specify representing the exact same symmetry?" But, this was not the most important point - if I care about this (and in this discussion it is so far just me) I will have to implement the canonicalization on the client side instead, which is fine.
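
A client-side canonicalization of this kind might look roughly like the sketch below. It assumes the operators have already been parsed into 3x4 rows [cx, cy, cz, t] with coefficients in {-1, 0, 1}; the function names are hypothetical, and the comparison only recognizes lists given in the same setting (it does not search for a change of basis).

```python
from fractions import Fraction


def canonical(op):
    """Render a 3x4 operator with a fixed term order and translations in [0, 1)."""
    parts = []
    for cx, cy, cz, t in op:
        text = ""
        for coeff, axis in ((cx, "x"), (cy, "y"), (cz, "z")):
            if coeff:
                sign = "-" if coeff < 0 else ("+" if text else "")
                text += sign + axis
        t = Fraction(t) % 1  # e.g. -1/2 and 3/2 both become 1/2
        if t:
            text += ("+" if text else "") + str(t)
        parts.append(text or "0")
    return ",".join(parts)


def same_symmetry(ops_a, ops_b):
    """Compare two operator lists irrespective of order and notation."""
    return sorted(map(canonical, ops_a)) == sorted(map(canonical, ops_b))


op = [[-1, 0, 0, Fraction(-1, 2)], [0, 1, 0, 0], [0, 0, -1, Fraction(3, 2)]]
print(canonical(op))  # -x+1/2,y,-z+1/2
```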

@ml-evs ml-evs added the blocking-release This is a PR or issue that presently blocks the release of next version of the spec. label Dec 5, 2022
@rartino
Contributor

rartino commented Dec 22, 2022

A small update based on today's web meeting.

This is still partially blocked on the precise format for symmetry_operations. I think above we were in agreement on using the Jones-Faithful notation. However, as I understand it, after @sauliusg looked into creating a grammar for that notation, he raised the question of whether it may make sense to switch to a matrix-type representation that is more directly JSON-parser-friendly.

I did argue the benefits of alternative notations simpler to parse above. Nevertheless, if the only two options are:

  • Jones-Faithful strings (without any canonicalization).
  • A list of lists of lists of numbers to represent the full numerical matrices.

I think I still come down on the side of the first one for the sake of compactness of notation. (On the other hand, a more compact but still direct representation of the matrices I could get behind, but perhaps let's not open that discussion again.)

@BobHanson

BobHanson commented Dec 22, 2022 via email

@rartino
Contributor

rartino commented Jan 11, 2023

(@BobHanson just to clarify: I post my comments in the GitHub issue/PR system. The emails you get are due to your GitHub settings to be notified about activity in threads you have participated in.)

Thanks for the link on magnetism in cif. If I understand this correctly, the point is that one can add more comma separated items in the Jones-Faithful format to describe, e.g., a spin transformation for the operation (or really any linear transformation of a vector valued property on the sites). I assume we (in similarity with mcif) would add support for such symmetry operations by standardizing additional fields, e.g., _symmetry_operations_magn.

I think the same extension is possible in mostly the same way in a matrix version of operations. But, to reiterate: I'd much prefer a reasonably compact format, which a list of, e.g., 192 JSON-encoded matrices is not. So, unless someone comes forward with a more compact (while also exact) JSON-friendly matrix representation, I vote to PR the Jones-Faithful format.

@sauliusg
Contributor Author

sauliusg commented Jan 27, 2023

Thanks, Bob, for pointing out the potential problem with magnetic structures.

Indeed, the largest number of symops in "classical" space groups would be 192 (groups No 225–228), but magnetic structures will add more. Also, modulated structures will add more operators (up to several thousands, see Stokes, 2011) with higher dimensionality (up to 6+1 dim.); quasicrystals can add at least the same, possibly even more. I think we need to be prepared for this.

While any s.g. can be represented by matrices, the J-F notation is more compact, and, unlike space group symbols, is straightforward to interpret. But the classical "-X+Y+1/2,-X+1/2,Z" notation will need a moderately complicated grammar and an ad-hoc parser.

I therefore feel that some pre-parsed form of J-F notation could be optimal for OPTIMADE. The idea is to have the symmetry operator string split into distinct grammatical components and presented as elements of the JSON array:

  1. The components "-X+Y+1/2", "-X+1/2" and "Z" would be split into string arrays ["-X", "Y", "1/2"], ["-X", "1/2"] and ["Z"], respectively;
  2. Each symmetry operator would be represented as array of arrays; e.g. the above mentioned "-X+Y+1/2,-X+1/2,Z" will be serialised as [["-X", "Y", "1/2"], ["-X", "1/2"], ["Z"]];
  3. The space group description would be an array of symmetry operators; e.g.:

for P-1:

[
   [["x"], ["y"], ["z"]], [["-x"], ["-y"], ["-z"]]
]

For P31 ("x,y,z", "-y,x-y,z+1/3", "-x+y,-x,z+2/3"):

[
   [["x"], ["y"],["z"]],
   [["-y"], ["x", "-y"], ["z", "1/3"]],
   [["-x", "y"], ["-x"], ["z", "2/3"]]
]

The remaining tokens ("-x", "2/3") will be defined using regular expressions and can be easily parsed by regexp matching. From these, the matrices are easily built. The "+" symbol between the operator components in the JSON array is implied.
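
As an illustration of how little client code this needs, here is a sketch that turns one operator in the proposed array form into a 3x4 matrix. The two regular expressions are an assumption about what the token syntax might look like (a signed axis letter, or a signed integer or fraction); they are not taken from any agreed specification, and the function names are hypothetical.

```python
import re
from fractions import Fraction

# Assumed token forms: a signed axis letter, or a signed integer/fraction.
AXIS = re.compile(r"^([+-]?)([xyz])$", re.IGNORECASE)
NUMBER = re.compile(r"^([+-]?\d+)(?:/(\d+))?$")


def component_to_row(tokens):
    """Turn e.g. ["x", "-y", "1/3"] into [1, -1, 0, Fraction(1, 3)]."""
    row = [0, 0, 0, Fraction(0)]
    for token in tokens:
        axis = AXIS.match(token)
        number = NUMBER.match(token)
        if axis:
            sign = -1 if axis.group(1) == "-" else 1
            row["xyz".index(axis.group(2).lower())] += sign
        elif number:
            row[3] += Fraction(int(number.group(1)), int(number.group(2) or 1))
        else:
            raise ValueError(f"unrecognised token: {token!r}")
    return row


def operator_to_matrix(operator):
    """Build a 3x4 matrix (rotation part | translation) from one operator."""
    return [component_to_row(component) for component in operator]


print(operator_to_matrix([["-y"], ["x", "-y"], ["z", "1/3"]]))
```

The terms within a component are simply summed, which matches the implied "+" between array elements.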

The symop lists can be either transmitted within the response, or stored on a remote server with only an href link to that list transmitted with each structure, to minimise traffic. The client in this case will have a choice to either fetch the symop list (in JSON encoding) from the server, to use a cached value for the given space group and setting (the symops shouldn't ever change ;), or to decode the space group symbol from the href itself.

The only drawback that I see in such representation is that it is unusual, which is cured by just starting to use it.

The advantages would be:

  • nearly as short as the J-F notation;
  • expandable to mag. groups, supergroups and quasicrystal symmetries; e.g. "-x1,x2+1/2,-x3,-x4" would become [["-x1"], ["x2", "1/2"],["-x3"],["-x4"]] (see COD 7036760)
  • trivially parseable from the JSON notation;
  • no need to specify the full syntax of the J-F notation in the OPTIMADE spec.; we need to only specify the syntax of the tokens, which is a regular language and done by simple regexps, like -?([xyz]|[12345]/[2346])

What is your take on that? (@BobHanson , @rartino , @merkys , @vaitkus)?

P.S. This reminds me of the LISP S-expressions... :)

@sauliusg
Contributor Author

sauliusg commented Jan 27, 2023

I assume we mean to allow "under-reporting" symmetry. I.e., it is ok to miss symmetry operators in symmetry_operations as long as the reported operators, ITN, H-M, and Hall are all consistent, and when replicating the atoms in asym_fractional_coords using the operations in symmetry_operations one gets the same thing as in cartesian_coords.

I would regard missing symmetry operators as an error, and a rather serious one.

It is one thing to check that the operators form a group, and quite another (more complicated) thing to reconstruct the group from the operators. Also, you can end up getting a subgroup of the original group if you throw out too many operators. Too bad.
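
The simpler of the two tasks, checking that a list of operators is closed under composition (modulo lattice translations), can be sketched like this. The representation (3x4 rows with translations already reduced into [0, 1)) and the function names are assumptions made for the example.

```python
from fractions import Fraction


def compose(a, b):
    """Return the 3x4 operator 'apply b, then a', with translation mod 1."""
    out = []
    for i in range(3):
        rot = [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        trans = (sum(a[i][k] * b[k][3] for k in range(3)) + a[i][3]) % 1
        out.append(rot + [trans])
    return out


def is_closed(ops):
    """True if every product of two listed operators is itself listed."""
    return all(compose(a, b) in ops for a in ops for b in ops)


# P31: x,y,z ; -y,x-y,z+1/3 ; -x+y,-x,z+2/3
p31 = [
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],
    [[0, -1, 0, 0], [1, -1, 0, 0], [0, 0, 1, Fraction(1, 3)]],
    [[-1, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 1, Fraction(2, 3)]],
]
print(is_closed(p31))      # True
print(is_closed(p31[:2]))  # False: the third operator is missing
```

Closure only shows that the list is self-consistent. As noted above, a closed list can still be a proper subgroup of the intended space group, so this check cannot detect every case of missing operators.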

I think on this point I completely support what @BobHanson said in this thread.

@merkys
Member

merkys commented Jan 27, 2023

I like the recent @sauliusg proposal. It stands somewhere between full-matrix and J-F representations which each have their own drawbacks. The proposed representation is quite concise and not too difficult to parse.

@sauliusg
Contributor Author

sauliusg commented Jan 27, 2023

1. The full symmetry info would be: ITN=225, H-M=Fm-3m, have symmetry_operations list the 192 symmetry operations, and let asym_fractional_coords be a list of 2 coordinates, one for Na and one for Cl (which sits on an 'a' and 'b' Wyckoff position respectively).

2. However, what if I instead say: ITN=1, H-M=P1, I only let symmetry_operations list the identity operation, and let asym_fractional_coords be a list of four coordinates for Na, and four coordinates for Cl.
...

Do we mean for case (2) to be a violation of the OPTIMADE standard? I think there are good arguments for that this must be allowed.

I agree that the case (2) should not violate the OPTIMADE standard, but it is reporting a different (lower symmetry) structure than the case (1), with all the consequences – more independent parameters, highly correlated parameters.

@BobHanson

Not convinced. How is defining a standard for

[["-y"], ["x", "-y"], ["z", "1/3"]]

any different from requiring a specific syntax for:

"-y,x-y,z+1/3"

?

There is no more or less information there, only punctuation. I'm pretty sure it would take fewer words in a standard to describe how to create well-formed JF strings than what it would take to describe an entirely new format.

As for the full symmetry/P1 issue, my understanding is that plenty of computational packages just go with P1 for their calculations and don't bother with symmetry constraints (particularly if they are single-point calculations). So I would guess plenty of structures would be described as "P1" that certainly could have more symmetry.

@sauliusg
Contributor Author

From the discussion today I've got impression that everyone is OK to go forward with this suggestion; I'll put it into the PR and upload.

@sauliusg
Contributor Author

sauliusg commented Jan 30, 2023

Not convinced. How is defining a standard for

[["-y"], ["x", "-y"], ["z", "1/3"]]

any different from requiring a specific syntax for:

"-y,x-y,z+1/3"

The bracketed notation, [["-y"], ["x", "-y"], ["z", "1/3"]], is a JSON array and will be parsed using any standard JSON parser, which you need anyway to parse the OPTIMADE response. The remaining strings are analyzed by regexp matches.

In contrast, the "-y,x-y,z+1/3" string is just a string from the JSON point of view, but it has complicated internal structure, and we need a new grammar (even if it is a grammar for a regular language, it will be more complex than the regexp in the first case), and a new parser. This is more complex to describe, and more complex to implement.

The structured string notation is not much shorter than a value array, so value array seems a good compromise IMHO since we re-use existing grammar(s) (of JSON, CIF2, XML or whatever carrier we use for the response).

PS

Possible (but still simplified) Regexp for the symmetry operation grammar could be:

^([-+]?[xyz]([-+]([xyz]|[1-5]\/[12346])){0,2})(,[-+]?[xyz]([-+]([xyz]|[1-5]\/[12346])){0,2}){2}$

Tested as:

echo '-y,x-y+1/2,z+1/3' | perl -lne 'if (/^([-+]?[xyz]([-+]([xyz]|[1-5]\/[12346])){0,2})(,[-+]?[xyz]([-+]([xyz]|[1-5]\/[12346])){0,2}){2}$/i) {print "OK\t", $_} else {print "FAILED\t", $_}'

This is already probably too complex for regexp, and in reality will become even longer if we want to capture cases like '1/2-x' and exclude cases like 'x+5/2' (we will basically have to list all allowed fractions, I guess).
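
For comparison, a hand-written parser for the string form might look like the sketch below. It is deliberately lenient and rests on assumptions of its own: it accepts any term order (so '1/2-x' works), it does not restrict the set of fractions (so 'x+5/2' is accepted), and it rejects decimals such as '0.5-x'. The function name parse_jf is hypothetical.

```python
import re
from fractions import Fraction

# One term: an optional sign followed by a fraction, an integer or an axis.
TERM = re.compile(r"([+-]?)(\d+/\d+|\d+|[xyz])")


def parse_jf(symop):
    """Parse text such as '-y,x-y,z+1/3' into rows [cx, cy, cz, translation]."""
    rows = []
    for part in symop.lower().replace(" ", "").split(","):
        row = [0, 0, 0, Fraction(0)]
        end = 0
        for m in TERM.finditer(part):
            # Reject gaps (unknown characters) and unsigned non-leading terms.
            if m.start() != end or (end > 0 and not m.group(1)):
                raise ValueError(f"cannot parse component {part!r}")
            end = m.end()
            sign = -1 if m.group(1) == "-" else 1
            body = m.group(2)
            if body in ("x", "y", "z"):
                row["xyz".index(body)] += sign
            else:
                row[3] += sign * Fraction(body)
        if end == 0 or end != len(part):
            raise ValueError(f"cannot parse component {part!r}")
        rows.append(row)
    if len(rows) != 3:
        raise ValueError("expected exactly three components")
    return rows


print(parse_jf("1/2-x, +1/2-y, -Z+1/2"))
```

Tightening this to a fixed list of allowed fractions, or extending it to superspace or magnetic operators, is where the extra specification effort mentioned above would go.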

@sauliusg
Contributor Author

I'm pretty sure it would take fewer words in a standard to describe how to create well-formed JF strings than what it would take to describe an entirely new format.

I experience the opposite: if we want to describe the full J-F notation, we will need a full-fledged grammar in EBNF, or a longish RE, to capture all permissible expressions and to block all unwanted expressions; and then come the superspace groups and magnetic groups with their complications. This is a long description that needs testing and a special parser for a decent implementation.

In contrast, if we go for JSON then we can simply say: "symmetry description MUST be an array with the elements satisfying the following constraints", then list the regexps that the elements MUST satisfy, and then explain the semantics, and we are done. Standard users that implement a client just need to rely on the parsed JSON (or whatever carrier format it is) and check the regexp matches using the regexp engine of their implementation platform (all platforms have the regexp subset that we will use).

PS

As I was mentioning, we do not want to be bound to JSON, but other carrier formats allow the same: CIF2 has arrays a-la JSON, and XML has nested elements, e.g.:

<symmetry>
<operator>
   <component>
       <term>-y</term>
   </component>
   <component>
      <term>x</term><term>-y</term>
   </component>
   <component>
      <term>z</term><term>1/3</term>
   </component>
</operator>
</symmetry>

and so on...

@sauliusg
Contributor Author

As for the full symmetry/P1 issue, my understanding is that plenty of computational packages just go with P1 for their calculations and don't bother with symmetry constraints (particularly if they are single-point calculations). So I would guess plenty of structures would be described as "P1" that certainly could have more symmetry.

This might be true now, but we want to represent experimental crystallographic data, computational descriptions that are in the same setting as experimental data (e.g. for comparison), and calculations that do take symmetry into account, don't we?

@merkys
Member

merkys commented Dec 20, 2023

After having #464 merged, the only lacking property from the original @sauliusg proposal is the Schönflies symbol.

@vaitkus
Contributor

vaitkus commented Jan 12, 2024

I had an offline discussion with @sauliusg, he is OK with not including the Schönflies symbol in this PR.

I will close this issue now, feel free to re-open it if you see anything else that has not been properly addressed.

@vaitkus vaitkus closed this as completed Jan 12, 2024