index into a default list with more than one key value (ie create `ENUMERATION_DEFAULTS`) #399

rowlesmr · 2023-05-18T12:17:44Z

This is a first attempt at putting things into place.

The idea is to allow indexing a single default value based on an arbitrary number of keys. In the example below, neutron scattering lengths are keyed on element symbol and atomic mass number. In any case, a single value is given to each combination of keys.

The definitions need work.

.

I've put in _enumeration.def_index_ids with a default value of [ _enumeration.def_index_id ], to give backward compatibility with the current use; in the new scheme, you need to parse _enumeration.def_index_ids, and we don't need to redefine every single current default enumeration.

I've generalised _enumeration_default.index as _enumeration_default.indexes, even though may not make total sense to make it a plural. Again, in the new scheme, _enumeration_default.indexes has the default value of [ _enumeration_default.index ], to give backward compatibility with the current use

rowlesmr · 2023-05-18T12:22:22Z

An example use would be:

save_atom_type_scat.length_neutron

    _definition.id                '_atom_type_scat.length_neutron'
    _definition.update            2023-05-17
    _description.text
;
    The bound coherent scattering length for a given atom type and isotope.
;
    _name.category_id             atom_type_scat
    _name.object_id               length_neutron
    _type.purpose                 Number
    _type.source                  Assigned
    _type.container               Single
    _type.contents                Real
    _enumeration.def_index_ids     
        ['_atom_type.element_symbol' '_atom_type.mass_number']
    _units.code                   femtometres

    loop_
         _enumeration_default.indexes
         _enumeration_default.value
     [ H   . ]   -3.7390
     [ H   1 ]   -3.7406
     [ H   2 ]    6.671
     [ H   3 ]    4.792
     [ D   . ]    6.671
     [ He  . ]    3.26
     [ He  3 ]    5.74
     [ He  4 ]    3.26
     [ Li  . ]   -1.90
     [ Li  6 ]    2.0
     [ Li  7 ]   -2.22
     #...
 
save_

* _atom_type.mass_number is not yet a real data item. It's proposal lead to this change.

vaitkus · 2023-05-18T12:44:33Z

I am thinking that it might be better to create a completely new category that would be most similar to ENUMERATION_DEFAULT, but that would use the new multivalue-key approach (e.g. ENUMERATION_DEFAULT_LOOKUP, but please feel free to offer a better name). The ENUMERATION_DEFAULT would be deprecated and we could slowly migrate away from it in the dictionaries. The use of dREL code in the reference dictionary introduces some complexity that IMHO would be better avoided.

@rowlesmr , @jamesrhester what are your opinions on this?

rowlesmr · 2023-05-19T04:40:58Z

That makes some sense. The errors in the ddlm check as (I think) associated with not having the category key present, even though it's populated through dREL.

Something like:

_enumeration.def_index_ids - This is an array of datanames, that, together, form the key.
_enumeration_default_lookup.indexes - This is an array of values that form the key. The values are associated to the datanames given in _enumeration.def_index_ids.
_enumeration_default_lookup.value - This is the single value associated with a particular combination of key values.

.

The alternative would be to keep the current categories and just add _enumeration_default.indexes and _enumeration.def_index_ids as Array-type dataitems, and to change over all dictionaries at once. This would depend on how many other dictionaries use enumeration defaults.

.

More thinking, from #397 (comment)

My preference would simply be to define new data names as above, perhaps deprecating the current ones before there is too much DDLm software out there using them.

@vaitkus 's solution above is a go slowly approach, which is valid when dealing with standards: bring in a new category, apply it to scattering lengths, and move on from there. The alternate approach is to go nuclear and just define _enumeration.def_index_ids and _enumeration_default.indexes in the existing categories, and shift all of templ_enum.cif and cif_core.cif over to the new syntax. It may be a shock to the system, and I have no intuition as to how big a change this is, but it is an option.

I'm an idiot. We need @vaitkus 's solution above, as we are going to be changing the category key. But I would combine it with @jamesrhester 's preference, and deprecate the old and change over all the existing to the new.

I'll put something together.

rowlesmr · 2023-05-20T07:58:13Z

Just some other things to think about:

The atom symbol is a concatenation of element symbol and charge. If redesigning the entire system (which we're not), this could be changed to either separate element and charge or to also include the isotope number (if not natural abundance).

i) Separating the charge from the element would make the atomic mass, atomic number, radius bond*, dispersion tables, and the like much simpler. This would necessitate the requirement to index multiple keys to reference scattering factors and the like.

ii) Adding the isotope number to the _atom_type.symbol would negate the (current) need to index multiple keys. It would also vastly expand the aforementioned tables to include all the duplicate information for each isotope. You would need an isotope number for each charge state, as structures could be shared between X-ray and neutron experiments, necessitating both scattering factors and lengths.

and then (the position which should be chosen), iii) Dont't touch _atom_type.symbol, and just add _atom_type.mass_number, and a way to index on multiple keys.

* why are the bond radii of ions and neutral atoms the same? Whence did they come?

jamesrhester · 2023-05-22T06:26:42Z

My thinking is as follows: If we create a new category to supercede enumeration_default, then we maintain perfect backwards compatibility, at the expense of carrying around an extra, unnecessary category and needing to update our templ_enum.cif file. If we instead change the category key of enumeration_default to the new list-valued object, we are not maintaining compatibility, which, with judicious use of dREL, has the following consequences:

Software authors who have encoded these DDLm semantics need to update software to read our updated dictionaries
We update version of DDLm to 5.0

Either of these approaches is viable, which means the decision is a policy one. I'm inclined to go with the update of DDLm to version 5.0, and simply provide the new category key with the dREL that explains how it is derived from the old category key. My reasons for preferring this is that at this level, very few software authors are affected (probably all are reading this message) and it keeps the foundational language cleaner.

If going with the above approach, we would need to pass it by ddlm-group for policy signoff.

rowlesmr · 2023-05-22T13:36:00Z

I have no intuition as to the best approach, so I'll go with the nuclear option of just changing the existing category. If proposing that to the dlm-group, then I don't think I need to do anything more.*

I've also already updated templ_enum.cif to the new syntax (not on this branch, though).

* I can't really update the scattering length, as that also needs a new data item, and that would complicate things in this PR.

vaitkus · 2023-05-22T16:31:25Z

Either of these approaches is viable, which means the decision is a policy one. I'm inclined to go with the update of DDLm to version 5.0, and simply provide the new category key with the dREL that explains how it is derived from the old category key. My reasons for preferring this is that at this level, very few software authors are affected (probably all are reading this message) and it keeps the foundational language cleaner.

@jamesrhester As you mentioned, there seems to be two conflicting policies here. From my point of view, the proposed approach does not seem cleaner since it breaks things for DDLm software that is not capable of dynamically interpreting dREL code. Of course, the coders can still use dREL code as a reference, but it does require going back and tweaking the software. Given that, software from the cod-tools package does indeed not yet handle the ENUMERATION_DEFAULT fields when validating CIF instance files, however, dictionary validation will also be affected by this since the software would now need to know how to dynamically construct the category key from the values of another item (as can be seen by currently issued errors during PR validation). I do acknowledge, though, that such problems probably do not arise in software that properly interprets dREL.

I also understand that having two categories which serve a similar purpose may be somewhat confusing though still a much lesser evil from a programmers point of view as the handling of the new category can be implemented completely independently.

If the ddlm-group will prefer to go with the "nuclear" solution, I apologize in advance if the ddlm_validate program will break for a bit in a minor way. (It will probably be sufficient to modify the post commit check message filtering until I properly fix it).

jamesrhester · 2023-05-23T07:43:13Z

@vaitkus , if the templ_enum.cif file is updated at the same time, would that help to fix the issues cod-tools software might experience? The new category key would be present, the only part that would not be validated is the relationship with the new _enumeration.def_index_ids.

Also, if cod-tools were to check the dictionary DDLm conformance version number, it could simply refuse to check anything that is not DDLm version 4 until updated.

I am not wedded to the "nuclear" option, but the other option would mean an old, unused category that would ideally eventually be removed, leading again to a version 5 of DDLm.

vaitkus · 2023-05-23T17:00:54Z

If the templ_enum.cif file is updated at the same time, would that help to fix the issues cod-tools software might experience? The new category key would be present, the only part that would not be validated is the relationship with the new _enumeration.def_index_ids.

I think that would work. I would eventually implement the checking of the new relationships.

Also, if cod-tools were to check the dictionary DDLm conformance version number, it could simply refuse to check anything that is not DDLm version 4 until updated.

This is indeed something worth implementing since it would at least help to detect the incompatibilities. However, the underlying need to modify the code would still remain.

I am not wedded to the "nuclear" option, but the other option would mean an old, unused category that would ideally eventually be removed, leading again to a version 5 of DDLm.

I think that the accumulation of some outdated or deprecated definitions is unavoidable. These should preferably still be supported for a few intermediate versions and then eventually removed with the next major release (or even the next after that). Further more, is I correctly understand the current proposal, it still leaves some deprecated items (_enumeration.def_index_id, _enumeration_default.index) in the dictionary, with the only real difference being that they belong to the same category as the new items.

However, I do understand that this version of DDLm can still potentially be considered early in development so some less conservative changes may be preferred. Are these changes something you would like to see published in the upcoming volume G of ITC or are they aimed at a future release?

As a side note, the new propose name _enumeration_default.indexes should indeed be changed to something else since there is still a single index, it is simply potentially a multi-column (composite) one. Maybe something like "_enumeration_default.composite_index" be better? (Though still a bit of a misnomer in cases when we have only a single column.) We could go full database and call it a _enumeration_default.unique_index to clearly denote a one-to-one match.

rowlesmr · 2023-05-24T02:04:07Z

I think that the accumulation of some outdated or deprecated definitions is unavoidable. These should preferably still be supported for a few intermediate versions and then eventually removed with the next major release (or even the next after that). Further more, is I correctly understand the current proposal, it still leaves some deprecated items (_enumeration.def_index_id, _enumeration_default.index) in the dictionary, with the only real difference being that they belong to the same category as the new items.

@vaitkus makes a good argument. I say go with this. It still ends in a v5 with things removed, but is a more gradual, and officially deprecates them prior to removal.

Side note: is there anything in the CIF release notes about communicating the actual removal of deprecated dataitems?

As a side note, the new propose name _enumeration_default.indexes should indeed be changed to something else since there is still a single index, it is simply potentially a multi-column (composite) one. Maybe something like "_enumeration_default.composite_index" be better? (Though still a bit of a misnomer in cases when we have only a single column.) We could go full database and call it a _enumeration_default.unique_index to clearly denote a one-to-one match.

I am definitely not wedded to _enumeration_default.indexes. I can redo as _enumeration_default.unique_index.

jamesrhester · 2023-05-26T05:26:23Z

OK, I'm happy to go with the preference of people who are actually using this thing. Let's have a new category and leave things at DDLm version 4 for now. As this is in time for Volume G 2nd Edition, I will write it up in place of the old data names. What do we think about enumeration_defaults as the new category name, and then _enumeration_defaults.index would work fine?

ddl.dic

jamesrhester · 2023-06-05T05:09:32Z

Apart from removing the dREL I think I'm happy to merge this. What do you think @vaitkus ?

rowlesmr · 2023-06-05T06:46:07Z

I shall remove the drel. I think I put it there to avoid having to alter all the enumerations in templ_enum, but they shouldn't be too hard to convert; I'll have to find where I stashed my go at changing them.

@jamesrhester

as per comments from @jamesrhester

jamesrhester · 2023-06-05T08:24:08Z

This change can be independent of any changes in templ_enum.cif, as we haven't removed the old attributes.

vaitkus · 2023-06-05T21:13:43Z

@jamesrhester There is actually one annoying issue with _enumeration_defaults.index which I am not sure how to resolve. The problem is that _enumeration_defaults.index treats all of its values as having the save content type (Code) while in reality the data item referenced by _enumeration.def_index_ids can have a whole range of content type (i.e. the mass_number will be an integer).

Possible solutions:

We could explicitly state that the content type(s) should be inherited from the items referenced by the _enumeration.def_index_ids attribute, but we still need some kind of content type value to indicate that this should be done. Neither Implied nor ByReference content type seem to be suitable for this task, but I am not sure a new content type should be added for just this one case.

2.a Alternatively, we could continue having a single content type _enumeration_defaults.index, but then it should be clearly stated that all values of the composite key are treated as if they would be of that type (e.g. even if mass_number is an integer, key values 01 and 1 would be treated as different). However, the question then still remains which one of the content types should be chosen? Maybe Word?

2.b Same as 2a, but we also state that the values are normalised before converting them to the Word value. However, the problem here is that we do not have any rules for normalising data values of different content types.

jamesrhester · 2023-06-07T04:46:26Z

Yes, this type of indexing is something we have deliberately removed from the main dictionary, which wanted to index all categories using a single data name potentially created from several data values. We can't do that here without redesigning the entire default system.

We want a low-effort solution, as all we are talking about is looking up default values to inform calculations of derived values by dREL. This is not something that affects actual data transmission or archiving. So some thoughts:

The values used to index come from the data file, so we can't control their presentation. That means we must be able to handle caseless data values and alternative ways of writing integers.
We can control what datanames are used to perform the lookup (def_index_ids)

So I would propose simply defining an Inherited data type as follows:

"Used only with compound data values in the DDLm attribute dictionary. The data type of each value in a compound value is the same as that of the data name to which it is directly related."

rowlesmr · 2023-06-07T09:05:27Z

Would you like me to work that into this PR?

jamesrhester · 2023-06-07T09:55:12Z

Let's wait and see what @vaitkus thinks.

vaitkus · 2023-06-07T11:32:56Z

I am generally OK with having the Inherited content type as long as it is only used in the reference dictionary.

However, the relationship between the value and the related item might not always be clear to an outside reader, so the definition of Inherited type should strongly encourage to include rules on establishing such relationships into the human-readable definition of the data item. Of course, we will have no formal way of checking if the description actually includes such information, but it is still better than nothing. Let's update the PR along these lines and then see if it need any further fine-tuning.

In the next major revision we may come up with better ideas, but, as you mentioned, this would require a serious reworking of the typing system.

rowlesmr · 2023-06-07T13:22:44Z

I've added some words.

The definition of _type.container for the value Implied refers to "the DDLm Reference Dictionary". Is that different to a "DDLm Attribute Dictionary"? or even "DDLm Dictionary"? also should it be "the" or "a" dictionary? is this supposed to refer to the file ddl.dic? I'm not up to speed on this level of nuance in the terminology.

@vaitkus

from @vaitkus

rowlesmr · 2023-06-13T13:38:56Z

How can a List dataitem have a _type.contents attribute, if not Inherited?

A List is:

Ordered set of values. Elements need not be of same contents type.

I got it wrong, a List can have Implied contents if there are mixed types.

Does

Operations across arrays are equivalent to operations across elements of the Array

mean [1 2 3] * [2 4 6] = [1*2 2*4 3*6]?

vaitkus · 2023-06-13T16:15:08Z

mean [1 2 3] * [2 4 6] = [1*2 2*4 3*6]?

Not sure if you can multiply two arrays like this. My interpretation would be more like [1 2 3] * 5 = [5 10 15].

rowlesmr · 2023-06-14T22:44:38Z

Sure you can. It's element-wise multiplication (Hadamad product).

jamesrhester · 2023-06-15T03:00:32Z

Just to confirm, yes, Lists may have different content types, but only Implied and Inherited would seem to allow that to happen. There's no point defining such a heterogeneous compound data name for use in a data file, as each of the components being of a different type means that they are processed differently which means they have a different meaning, so need a separate data name.

An Array has uniform content type.

A Matrix follows the rules of matrix multiplication, and would include items meant to be used as row/column vectors.

jamesrhester · 2023-06-22T02:16:43Z

I think this is ready for merging if those few suggested changes above are incorporated, and the minor version is bumped in ddl.dic, and _dictionary.ddl_conformance in cif_core.dic changed to match.

vaitkus · 2023-06-22T08:39:09Z

@jamesrhester The last version number change already increased the minor version, that is, the version was changed from 4.0.2 to 4.1.0, so the DDL version number increase is probably not necessary. Or should we create a new version of 4.2.0 and put all the changes from this branch there?

Also, should we formally deprecate the old ENUMERATION_DEFAULT category?

jamesrhester · 2023-06-23T05:02:45Z

Yes, we should deprecate the old category. OK not to change the version number then, as long as it is different to the one used in the 3.2.0 core release.

…ngly.

vaitkus · 2023-06-23T13:51:02Z

Ok, so I did quite a few changes since the last review:

Deprecated the ENUMERATION_DEFAULT category and the _enumeration.def_index_id attribute.
Adapted the ENUMERATION_SOURCE category mechanism to work with ENUMERATION_DEFAULTS instead of ENUMERATION_DEFAULT.
Updated the version number of DDL reference dictionary. However, it seems that we did assign some of the changes to version that were already in the 3.2.0-rc branch. I corrected the DDL reference dictionary here, but a more proper fix was implemented in PR Update dictionary version numbers and changelog messages. #429. I think we should merge that PR first, sync the master branch with this branch and after a final review we should be good to go (IMHO).

rowlesmr

I think it looks all good. Just that one change on line 1002.

ddl.dic

Co-authored-by: Matthew Rowles <rowlesmr@gmail.com>

vaitkus

The validator does not yet properly handle the 'Inherited' type, but I updated the GitHub actions to ignore the warnings for now so we can start using these dictionary attributes in other feature branches.

I will most likely implement the proper handling in the upcoming few weeks.

first attempt at new dataitems

51809ae

updated update date date

3486c02

rowlesmr mentioned this pull request May 22, 2023

Add _atom_type.mass_number #400

Merged

added ENUMERATION_DEFAULTS and reverted changes to ENUMERATION_DEFAULT

13d3125

jamesrhester reviewed Jun 5, 2023

View reviewed changes

ddl.dic Outdated Show resolved Hide resolved

jamesrhester reviewed Jun 5, 2023

View reviewed changes

ddl.dic Outdated Show resolved Hide resolved

remove drel

349941c

as per comments from @jamesrhester

rowlesmr changed the title ~~index into a default list with more than one key value~~ index into a default list with more than one key value (ie create ENUMERATION_DEFAULTS) Jun 5, 2023

rowlesmr mentioned this pull request Jun 5, 2023

Add ENUMERATION_SOURCE #406

Merged

vaitkus mentioned this pull request Jun 7, 2023

units for dataitems with mixed units? #409

Closed

implement wording changes

28127e0

from @vaitkus

rowlesmr mentioned this pull request Jun 13, 2023

Use of List vs Array vs Matrix COMCIFS/Powder_Dictionary#148

Closed

Merge branch 'master' into default-list-with-multiple-keys

ab0d1f1

rowlesmr mentioned this pull request Jun 17, 2023

wishlist of all the features we would like to eventually fix/address in DDLm 5/6/7 #424

Open

Merge branch 'master' into default-list-with-multiple-keys

c93304c

vaitkus added 2 commits June 22, 2023 10:22

Change the container type to 'List'.

5a183cd

Correct the category name in the description.

d6ed27a

vaitkus added 7 commits June 23, 2023 14:59

Deprecate the ENUMERATION_DEFAULT category.

21c752d

Set the same update date for all definitions involved in the PR.

159a9e5

Fix attribute order.

e87283d

Deprecate the _enumeration.def_index_id attribute.

fff4029

Adapt the ENUMERATION_SOURCE to work with ENUMERATION_DEFAULTS.

f02091d

Change ddl.dic version to 4.2.0 and update other dictionaries accordi…

4e4d707

…ngly.

Fix layout.

d173443

jamesrhester approved these changes Jun 26, 2023

View reviewed changes

vaitkus added 2 commits June 26, 2023 11:29

Merge branch 'master' into default-list-with-multiple-keys

855adf6

Merge branch 'master' into default-list-with-multiple-keys

41509b1

rowlesmr commented Jun 26, 2023

View reviewed changes

ddl.dic Outdated Show resolved Hide resolved

Accept @rowlesmr correction.

df2de9e

Co-authored-by: Matthew Rowles <rowlesmr@gmail.com>

vaitkus approved these changes Jun 27, 2023

View reviewed changes

jamesrhester merged commit 74c022a into COMCIFS:master Jun 27, 2023
3 checks passed

vaitkus mentioned this pull request Jun 28, 2023

Impossible to index into a default list when there is more than one key value #397

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

index into a default list with more than one key value (ie create `ENUMERATION_DEFAULTS`) #399

index into a default list with more than one key value (ie create `ENUMERATION_DEFAULTS`) #399

rowlesmr commented May 18, 2023 •

edited

Loading

rowlesmr commented May 18, 2023 •

edited

Loading

vaitkus commented May 18, 2023

rowlesmr commented May 19, 2023 •

edited

Loading

rowlesmr commented May 20, 2023

jamesrhester commented May 22, 2023 •

edited

Loading

rowlesmr commented May 22, 2023

vaitkus commented May 22, 2023

jamesrhester commented May 23, 2023

vaitkus commented May 23, 2023

rowlesmr commented May 24, 2023

jamesrhester commented May 26, 2023

jamesrhester commented Jun 5, 2023

rowlesmr commented Jun 5, 2023 •

edited

Loading

jamesrhester commented Jun 5, 2023

vaitkus commented Jun 5, 2023

jamesrhester commented Jun 7, 2023

rowlesmr commented Jun 7, 2023

jamesrhester commented Jun 7, 2023

vaitkus commented Jun 7, 2023

rowlesmr commented Jun 7, 2023 •

edited

Loading

rowlesmr commented Jun 13, 2023

vaitkus commented Jun 13, 2023

rowlesmr commented Jun 14, 2023

jamesrhester commented Jun 15, 2023

jamesrhester commented Jun 22, 2023

vaitkus commented Jun 22, 2023

jamesrhester commented Jun 23, 2023

vaitkus commented Jun 23, 2023

rowlesmr left a comment

vaitkus left a comment

index into a default list with more than one key value (ie create ENUMERATION_DEFAULTS) #399

index into a default list with more than one key value (ie create ENUMERATION_DEFAULTS) #399

Conversation

rowlesmr commented May 18, 2023 • edited Loading

rowlesmr commented May 18, 2023 • edited Loading

vaitkus commented May 18, 2023

rowlesmr commented May 19, 2023 • edited Loading

rowlesmr commented May 20, 2023

jamesrhester commented May 22, 2023 • edited Loading

rowlesmr commented May 22, 2023

vaitkus commented May 22, 2023

jamesrhester commented May 23, 2023

vaitkus commented May 23, 2023

rowlesmr commented May 24, 2023

jamesrhester commented May 26, 2023

jamesrhester commented Jun 5, 2023

rowlesmr commented Jun 5, 2023 • edited Loading

jamesrhester commented Jun 5, 2023

vaitkus commented Jun 5, 2023

jamesrhester commented Jun 7, 2023

rowlesmr commented Jun 7, 2023

jamesrhester commented Jun 7, 2023

vaitkus commented Jun 7, 2023

rowlesmr commented Jun 7, 2023 • edited Loading

rowlesmr commented Jun 13, 2023

vaitkus commented Jun 13, 2023

rowlesmr commented Jun 14, 2023

jamesrhester commented Jun 15, 2023

jamesrhester commented Jun 22, 2023

vaitkus commented Jun 22, 2023

jamesrhester commented Jun 23, 2023

vaitkus commented Jun 23, 2023

rowlesmr left a comment

Choose a reason for hiding this comment

vaitkus left a comment

Choose a reason for hiding this comment

index into a default list with more than one key value (ie create `ENUMERATION_DEFAULTS`) #399

index into a default list with more than one key value (ie create `ENUMERATION_DEFAULTS`) #399

rowlesmr commented May 18, 2023 •

edited

Loading

rowlesmr commented May 18, 2023 •

edited

Loading

rowlesmr commented May 19, 2023 •

edited

Loading

jamesrhester commented May 22, 2023 •

edited

Loading

rowlesmr commented Jun 5, 2023 •

edited

Loading

rowlesmr commented Jun 7, 2023 •

edited

Loading