Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DDLm rationalisation #188

Merged
merged 9 commits into from Nov 11, 2020
Merged

DDLm rationalisation #188

merged 9 commits into from Nov 11, 2020

Conversation

jamesrhester
Copy link
Contributor

@jamesrhester jamesrhester commented Oct 16, 2020

A number of items are obsolete in DDLm:

  1. Save frames. DDLm aims to be format-agnostic. All references to save frames and ref-loops have been removed, except for import statements for which a save frame reference is currently unavoidable
  2. LOOP category. No official dictionary has ever used more than 1 level of looping, so there is no need for such attributes, which are also CIF-format-specific.
  3. Complex data types. Data types of the form 'Type1|Type2' to indicate that an item could have more than one type are not used, are complex to implement, and so have been removed. Data types of the form List(Type2,Type3,...) to indicate List objects composed of a tuple of values, each of potentially different types, have been removed. These were almost exclusively used for synthetic keys, which should be opaque. Any other current use can be changed to Array type.
  4. Allow arrays of strings. The current DDLm dictionary restricts Arrays to numeric values for no apparent reason.

If these changes are considered significant enough, we should change the major version number to 4.0. Please comment on this.

@vaitkus
Copy link
Collaborator

@vaitkus vaitkus commented Oct 16, 2020

Removal of unused element indeed makes the dictionary more readable and easier to understand. However, I do have several notes:

  1. The _atom_type_scat.versus_stol_list from the CIF_CORE dictionary is still defined as a List of Lists of two real numbers. I assume that technically this is a complex nested type that would no longer be supported after merging this PR;
  2. Although the Multiple container type is currently not used, it could potentially be applied to resolve some incompatibilities between the DDL1 and DDLm versions of the CIF_CORE dictionary. Namely, the _exptl_crystal.colour could be defined as having the (List|Single) container type to allow both the properly structured CIF2/DDLm lists, i.e. "[transluscent, pale, green]", and the currently widely used concatenated strings that are compatible with CIF1/DDL1, i.e. "clear light colourless". Of course, if backwards compatibility is not one of the goals, than this note can be disregarded.

@jamesrhester
Copy link
Contributor Author

@jamesrhester jamesrhester commented Oct 20, 2020

  1. I have updated the pull request to redefine _atom_type_scat.versus_stol_list as an Array of real numbers. Pending a syntax for unknown number of top-level Array elements, I've left the dimension unspecified.
  2. This incompatibility is unfortunate and should have been dealt with originally by defining a new DDLm dataname instead of re-using the old one. DDL1 software should flag a syntax error when trying to input the DDLm-style data. I think this dataname should be raised in a new issue, and we may need to revert the DDLm type to the original DDL1 type.

@vaitkus
Copy link
Collaborator

@vaitkus vaitkus commented Oct 28, 2020

Great, thank you for the recent changes. It was a bit unexpected to see that multidimensional arrays are supported by the DDLm (i.e. the _model_site.adp_eigen_system data item with dimensions of [3,4]) since they were previously not used. So the main difference between an Array container and a Matrix container is that an Array supports non-numeric values and the Matrix container supports tensor operations?

The human-readable definitions of the _model_site.adp_eigen_system and _atom_type_scat.versus_stol_list data items still revolve around lists. This is probably not an issue since the word 'list' can be interpreted as a generic list and not as a specific DDLm container type. The capitalised word in the _model_site.adp_eigen_system definition (<...> in the form of 4 element List. <...>), however, should probably be lowercased. Also the word "eigenvalues" seems to be slightly misspelt ("eigenvales").

I have filled a separate issue (#189) for the _exptl_crystal.colour data item as suggested. I would also like to note that the previously mentioned _atom_type_scat.versus_stol_list data item was also defined in the DDL1 as a text string (however, used much less frequently than _exptl_crystal.colour).

@jamesrhester
Copy link
Contributor Author

@jamesrhester jamesrhester commented Oct 30, 2020

Yes @vaitkus you are correct about the differences between the Array and Matrix types. Thank you for raising #189 .

I have just added to the pull request by removing _category.key_id (and the category category) as per mailing list discussion. As this requires bumping the major version number to 4.0, I thought it reasonable to remove all deprecated items at the same time. That is the 'xref' datanames and the 'Count' and 'Index' content types.

@vaitkus
Copy link
Collaborator

@vaitkus vaitkus commented Oct 30, 2020

Thank you for the answer. I have tried validating the latests version of the DDLm dictionary against itself and discovered the following additional issues:

  1. The 'save_type.contents' is still assigned the 'Multiple' container type, even though this enumeration value was recently removed. However, it seems that in this particular case the Multiple type is needed to support constructs like "Real,Integer" as provided in one of the descriptions;
  2. The LOOP category has been completely removed from the dictionary, however, it is still referenced in the DICTIONARY_VALID loop (see _dictionary_valid.application values [Dictionary Prohibited] and [Category Prohibited]);
  3. The CATEGORY category was completely removed from the dictionary and, as a result, all references to it in the DICTIONARY_VALID loop were also removed (see _dictionary_valid.application values [Dictionary Prohibited] and [Category Recommended]). However, doing this also resulted in data items from the CATEGORY_KEY (sub)category becoming allowed in these contexts. Therefore, it would be better to replace the CATEGORY category with CATEGORY_KEY in the DICTIONARY_VALID loop instead of outright removing it.

@jamesrhester
Copy link
Contributor Author

@jamesrhester jamesrhester commented Nov 2, 2020

I have updated the pull request to take into account the items @vaitkus has noted.

@jcbollinger jcbollinger merged commit 64e0f28 into COMCIFS:master Nov 11, 2020
1 check passed
@jamesrhester jamesrhester deleted the ddlm_update2 branch Oct 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants