-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pd_diffractogram category. #39
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks fine?
I'll need to do a worked example to see how it all goes together. I really only know the block pointing the pdCIF does, and not the other general CIF linking.
Will this also need to be added to PD_CALC_COMPONENTS when that gets in?
I will create an example containing both block pointers and these pointers to demonstrate. The idea of this PR is to create the category, and then other PRs can add in child data names of |
At the most basic level I can write:
and then refer to Does this addition mean I can write:
and then refer to |
Yes, that is correct. Sorry I haven't put together an example yet, I'll do that now. |
OK, here is an example of the use of Again, I am not saying information must be separated into blocks in this way, but this corresponds to the default in the absence of any other guidance. You will note that
|
Just to get things straight in my head, with regard just one of the extra blocks:
This datablock contains information about the phase The pdCIF way of linking things is through block ids. |
also, I think that the definition of
doesn't really cut it anymore. |
Also: Currently, The equivalent datanames for diffractograms are There is a disparity here. If a phase appears in multiple diffractograms, should we not be able to loop |
Yes, this is correct. I might not say "normal way..." but more "minimum requirement for linking information between blocks in CIF".
Yes, that is correct.
OK, so the fundamental principle I'm trying to adhere to here is that the In this presentation, there will be only one value of So |
I'm not sure I like that approach. Now I need to either have a bunch of individual data blocks (which is error prone to produce), or use non-standard (non-default) CIF (and trust that software knows what to do) in order to list (for example) the QPA of some diffractograms: I'm only presenting one diffractogram per block, it's just that they reference multiple phases.
versus
|
I understand your distaste. This is why I'm assuming that the powder CIF community will specify their own preferred way of grouping results into data blocks so that pdCIF-conversant software will only have to understand that one particular way. If that way is not the "Default" schema then "Set" and "Loop" category types are not a limitation and you can loop "Set" category items as long as a category key has been assigned in the dictionary. Note that |
But there still is only one diffraction pattern or phase in the data block, it just happens to want to reference many other phases or diffraction patterns, respectively. This even negates the original use of Shouldn't we try and bake desired behaviour into the standard, and not require users to bypass it to do what they want to do? Could there be a |
Don’t understand the issue
On Nov 22, 2022, at 8:30 AM, Matthew Rowles ***@***.******@***.***>> wrote:
But there still is only one diffraction pattern or phase in the data block, it just happens to want to reference many other phases or diffraction patterns, respectively. This even negates the original use of _pd_phase_id for linking a _pd_refln_peak_id with a _pd_block_id. @briantoby<https://github.com/briantoby> any input on this?
|
So the issue as I understand it is that a diffraction pattern data block might want to point to data blocks containing all of the phases that are in this diffraction pattern, and a phase data block might want to point to all of the data blocks that contain diffractograms that include this phase. In either case restricting
We are forced to conclude that the use of I don't think that
Absolutely. The As long as the other dictionaries (powder, magnetism, incommensurate, twinning, etc.) do not change that choice of Thus the If we would like to mandate a particular, different, set of |
_pd_block_diffractogram_id and _pd_phase_block_id are already used in loops.
|
Sorry to be short, but it's bed time. From the DDL1 dictionary,
Here, the short length of the |
How's this as a summary/suggestion:
. Next: What to do with In my limited experience,
. I think this retains current capabilities and extends modern CIF linking to (at least a part of) pdCIF. @jamesrhester @briantoby ? . *Also, why is it |
Agreed that we have to do this.
Тhe way in which Set categories work is that all child data names of the set category data name (
Yes, though note that it doesn't identify a block as such, it assigns a value to
Ok, so we do the same as for Note in these solutions for
Agree.
No idea on the historical naming choices, agree on |
had to update CIF_POW date to be after the last CIF_CORE date?
Let's try that again...
All definitions and details need to be checked, as I haven't got them all. But I think (at least) all of the required data items are present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy Winter holidays and sorry for my lack of activity. Generally, my knowledge in the field of powder diffraction is currently extremely limited, but I think I got the general gist of data relationships from your discussions. Either way, an additional set of eyes with the general knowledge of DDLm does not hurt either.
Some inconsistencies that I noticed:
- Some category names were slightly incorrect which resulted in name duplication. I suggested fixed directly in the PR. Not sure why these issues were not detected by the automated check, probably due to a slightly outdated version of cif_ddlm_dic_check being used (easy fix in the future).
- The human-readable definitions of
_pd_data.diffractogram_id
,_pd_meas.diffractogram_id
and_pd_proc.diffractogram_id
all had the same text so I tried to tie it more to the categories that these items belong to. Changes suggested directly in the PR. Feel free to rephrase it in a more readable way. - The content type of the
_pd_diffractogram.id
and the types of all items that point to it should probably be changed fromCode
toText
. This is necessary because according to the human-readable definition of the_pd_diffractogram.id
data item and the associated dREL code, the value of this items can be used interchangeably with_pd_block.id
which is of theText
content type. Due to this we may run into some incompatibles down the line. - The
_definition.update
date in the definitions of the_pd_block_diffractogram.id
,PD_DATA
,PD_CALC
,PD_MEAS
,PD_PROC
should also be updated since these definitions were seemingly changed. - Some of the human-readable descriptions now contain references to outdated data names that no longer exist. Specifically, the following changes should be made:
- save_PD_BLOCK:
_pd_calib.std_external_block_id
->_pd_calib_std.external_block_id
. - save_PD_BLOCK:
_pd_phase.block_id
->pd_phase_block.id
. - save_pd_calc_component.block_id:
_pd_phase.block_id
->pd_phase_block.id
.
- save_PD_BLOCK:
Alternatively, some alias may have accidentally been left out from the new definitions.
I also have two question that may require slightly more discussion:
- If one of the intents is to be able to distribute the same dataset in various alternative forms (e.g. as a single CIF file, as multiple files of potentially different data formats) maybe it would also make sense to introduce some kind of a
_pd_dataset.uuid
data item that would allow to easily link these separate files? If I correctly understand the current approach, the_pd_phase.id
data item should generally be used for this purpose, however, I am unsure how well it would work outside of a specific context (e.g. if we encounter the same id in different files outside of an archive file, how sure can we be that they indeed describe the same dataset?). Use of a UUID would greatly reduce the probability of a id clash. The same can probably also be said for the_pd_block.id
value format since it uses a timestamp as one of its components, but it does not seem that currently this format is enforced in the descriptions of_pd_diffractogram.id
and related items (and potentially this format is not even desired for these fields due to its complexity/lack of brevity). - I would really appreciate if an example file illustrating the usage of data relationships introduced by this PR could be added (e.g. under examples/). Since several example were already presented in the discussion of this PR, maybe one of them could turned into a proper CIF file? It is not necessary to update the GitHub check to automatically validate these example files at this point, but at least having a working example would be extremely useful for testing and the general understanding of the concept.
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
As you've just shown!
I've commited these. And need to go back and change them for line length!
I'll wait for comment from @jamesrhester on this. One thing of note, the description of
The change was that the appropriate
I've changed these and found two other _pd_phase.block_id ones.
I haven't checked for these. |
Easy one first: Yes, we should work up a proper example. I've also mentioned this in #47, but more of a more general one, as we need to wait for the all the (current) additions (especially this one) to be merged to properly before we can do that (large) task.
Potentially heretical thing coming. From what I've gathered, @briantoby introduced We now have . * Yes, ** we will need to make something like |
Yes, let's make it I agree that we should work up some full examplex once we have finished a few of our other current ongoing changes. Meanwhile, hopefully the examples in this PR are sufficient to assess how the mechanism works.
As far as I can tell no such mechanism is guaranteed to be robust in all situations. For example, if calibration files are included in a data set, as they should, then they will belong to all datasets that have used the same calibrations. Therefore every data block associated with the calibrations will need to have a new UUID inserted, that is, must be changed. There was a long discussion about some sort of way of linking all data blocks in a dataset together here, from which I drew the conclusion that the only universal way is for the context to dictate aggregation. This does not invalidate the use of In any case, discussion of a
The |
@vaitkus : if you're happy, I can merge. |
Thank you for all the changes and for filing my questions as separate issues where relevant. It is indeed the better to continue the discussions there.
@rowlesmr , this is an extremely good point. Given the properties that you highlighted (case-insensitivity, no spaces) the I also noticed that the PR branch now contains the
I will of course not push on this, but IMHO having a separate examples file that describe the ongoing discussion, even if incomplete, proved quite useful during the development of the TopoCif dictionary since it made immediately clear when changes broke something that they were not supposed to affect. Anyways, I copied the example from the discussions to my machine as a separate file and with some minor changes (introduction of Multiple data blocks (e.g. I approve the PR anyways in case these questions run the risk of going too much off topic again. |
We don't want PD to be special and ignore the The intent here is to enforce the requirement that My interpretation of what you're saying is that a block containing a I think the way to do what we want to do is to just have the purpose as
|
I initially changed the dREL to properly reflect the idea that if |
Closes #21 . The current PR adds the category and child data names of
_pd_diffractogram.id
. This has the effect of linking the tabulated diffraction scan data to the concept of a diffractogram. When loops containing per-diffractogram, per-something-else information (e.g. per diffractogram per phase) are created, adding child data names of_pd_diffractogram.id
formally states that the information in those loops depends on a particular diffractogram.This is intended to be complementary to the block pointers used in pdCIF, and will work in situations where software is dictionary-aware but not pdCIF-aware.