Allow per-phase calculated intensity #3
This one is my fault. I've been thinking recently about plotting pd data from CIF, and what would be good things to be able to see. My initial idea of a solution to document the contribution from each phase is something like:
|
I think this is a good argument for the single-block CIF with _pd_phase.id. This would allow expansion by adding a new column for each phase rather than a new loop. In fact, the above is invalid unless each loop is put in a separate block, since each loop overwrites the previous data names. |
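A minimal Python sketch of why repeated loops over the same data names cannot share a block. It assumes a toy model in which a data block is a plain dict keyed by data name; this is not the behaviour of any particular CIF library. The helper `add_loop` and the values are hypothetical; only the data names come from this thread.

```python
def add_loop(block, names, rows, strict=True):
    """Add a loop to a toy data block (dict: data name -> column of values)."""
    # In strict mode, refuse any data name that is already present in the block.
    for name in names:
        if strict and name in block:
            raise ValueError(f"duplicate data name in block: {name}")
    # In lenient mode, a later loop silently replaces the earlier columns.
    for i, name in enumerate(names):
        block[name] = [row[i] for row in rows]
    return block


NAMES = ["_pd_meas_2theta_scan", "_pd_calc_intensity_net"]
phase_a = [(5.00, 4), (5.02, 4)]  # hypothetical values
phase_b = [(5.00, 2), (5.02, 3)]  # hypothetical values

strict_block = add_loop({}, NAMES, phase_a)
try:
    add_loop(strict_block, NAMES, phase_b)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

lenient_block = add_loop({}, NAMES, phase_a, strict=False)
add_loop(lenient_block, NAMES, phase_b, strict=False)
```

Either way the first phase's column cannot coexist with the second: the strict reader rejects the block, and the lenient one keeps only the last loop.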
Yes, @rowlesmr 's suggestion cannot work because you may not duplicate data names within a block. If each of the loops over |
Yeah, just noticed that. Multiple instances of a data name in a single block result in issues. A modification of my example would be something like below. Each crystalline phase belongs to only one diffraction pattern, and therefore has a unique profile. Each diffraction pattern has many phases. I think everything knows about everything else.
A more complicated example (taken from NISI.cif) is where each phase has multiple experimental patterns, and each pattern has multiple phases. In this one:
|
Maybe my previous examples were a little too complex. Here I propose the following new data names
In this one: The individual profiles know about their diffraction pattern through _pd_block_diffractogram_id. The diffraction patterns don't know about each other. Anyway, I don't really know what I'm doing here, so I'll stop for now.
|
It is not clear to me how the intensity information would be stored. As a reflection table? As I recall (perhaps incorrectly), the reflection table allows a phase id to be included, which means that the reflection table can be included in the dataset block. This seems like a cleaner way to handle things than setting up a new block structure.
OTOH, there is the need to set up n*m sets of profile descriptions (where there are n phases and m datasets). It might still be better to use a looped variable for that, where a phase ID would be included in a table by dataset (it would not be good to put them in a phase block, since the description used might vary by dataset type). This would be valuable if the definitions available for profile information were to be expanded.
Brian (T.)
On Nov 7, 2021, at 7:48 AM, rowlesmr wrote:
Maybe my previous examples were a little too complex
Here I propose the following new data names
* _pd_profile_block_id: this is the block id of the block which contains the profile information pertaining to the structure/diffraction pattern in the current block
* _pd_proc_profile_intensity_total & _pd_proc_profile_intensity_net: the intensity attributed to a certain phase, either with or without a background contribution.
In this one:
The crystal structures know about their diffraction patterns through _pd_block_diffractogram_id.
The crystal structures know about their individual profiles through _pd_profile_block_id.
The crystal structures don't know about each other.
The individual profiles know about their diffraction pattern through _pd_block_diffractogram_id.
The individual profiles of a crystal structure don't know about each other.
The individual profiles know about their crystal structure through _pd_phase_block_id.
The diffraction patterns don't know about each other.
The diffraction patterns know about their individual profiles through _pd_profile_block_id.
The diffraction patterns know about their crystal structures through _pd_phase_block_id.
Anyway, I don't really know what I'm doing here, so I'll stop for now.
`data_STR1_block
_pd_block_id STR1
loop_
_pd_block_diffractogram_id
XRAY
NEUTRON
loop_
_pd_profile_block_id
STR1_XRAY
STR1_NEUTRON
loop_
_refln_d_spacing
2.3
3.4
4.5
5.6
#other crystal structure information
data_STR2_block
_pd_block_id STR2
loop_
_pd_block_diffractogram_id
XRAY
NEUTRON
loop_
_pd_profile_block_id
STR2_XRAY
STR2_NEUTRON
loop_
_refln_d_spacing
2.35
3.45
4.55
5.65
#other crystal structure information
data_XRAY_block
_pd_block_id XRAY
loop_
_pd_phase_block_id
_pd_profile_block_id
STR1 STR1_XRAY
STR2 STR2_XRAY
loop_
_pd_meas_2theta_scan
_pd_meas_counts_total
_pd_calc_intensity_total
_pd_proc_intensity_bkg_calc
1 2 3 4
2 3 4 5
#etc
data_NEUTRON_block
_pd_block_id NEUTRON
loop_
_pd_phase_block_id
_pd_profile_block_id
STR1 STR1_NEUTRON
STR2 STR2_NEUTRON
loop_
_pd_meas_time_of_flight
_pd_proc_d_spacing
_pd_meas_counts_total
_pd_calc_intensity_total
_pd_proc_intensity_bkg_calc
1 2 3 4 5
2 3 4 5 6
#etc
data_STR1_XRAY_block
_pd_block_id STR1_XRAY
loop_
_pd_block_diffractogram_id
_pd_phase_block_id
XRAY STR1
loop_
_pd_meas_2theta_scan
_pd_proc_profile_intensity_total
1 2
2 3
#etc
data_STR1_NEUTRON_block
_pd_block_id STR1_NEUTRON
loop_
_pd_block_diffractogram_id
_pd_phase_block_id
NEUTRON STR1
loop_
_pd_proc_d_spacing
_pd_proc_profile_intensity_total
1 2
2 3
#etc
data_STR2_XRAY_block
_pd_block_id STR2_XRAY
loop_
_pd_block_diffractogram_id
_pd_phase_block_id
XRAY STR2
loop_
_pd_meas_2theta_scan
_pd_proc_profile_intensity_total
1 2
2 3
#etc
data_STR2_NEUTRON_block
_pd_block_id STR2_NEUTRON
loop_
_pd_block_diffractogram_id
_pd_phase_block_id
NEUTRON STR2
loop_
_pd_proc_d_spacing
_pd_proc_profile_intensity_total
1 2
2 3
#etc
`
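The block linkage in the example above can be sketched in Python as follows. This assumes each profile block has been reduced to a dict holding just its two pointer items; the block ids and data names are taken from the example, while the helper functions are hypothetical and not part of any CIF library.

```python
# Profile blocks from the example, reduced to their two pointer items.
PROFILE_BLOCKS = {
    "STR1_XRAY": {"_pd_block_diffractogram_id": "XRAY", "_pd_phase_block_id": "STR1"},
    "STR1_NEUTRON": {"_pd_block_diffractogram_id": "NEUTRON", "_pd_phase_block_id": "STR1"},
    "STR2_XRAY": {"_pd_block_diffractogram_id": "XRAY", "_pd_phase_block_id": "STR2"},
    "STR2_NEUTRON": {"_pd_block_diffractogram_id": "NEUTRON", "_pd_phase_block_id": "STR2"},
}


def find_profile_block(profile_blocks, phase_id, diffractogram_id):
    """Return the id of the profile block for one phase in one pattern, or None."""
    for block_id, items in profile_blocks.items():
        if (items["_pd_phase_block_id"] == phase_id
                and items["_pd_block_diffractogram_id"] == diffractogram_id):
            return block_id
    return None


def phases_in_diffractogram(profile_blocks, diffractogram_id):
    """List the phases that contribute a profile to the given pattern."""
    return sorted(
        items["_pd_phase_block_id"]
        for items in profile_blocks.values()
        if items["_pd_block_diffractogram_id"] == diffractogram_id
    )
```

Because each profile block names both its phase and its pattern, the pointer loops in the structure and pattern blocks can in principle be rebuilt from the profile blocks alone.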
|
Yes, you can store reflections from individual phases together in a single table when you include _pd_refln_phase_id
Yes, this is clunky.
Does "dataset" mean "data block containing a diffraction pattern"? If so, there would need to be a bunch more keywords, but it would cut down on the number of blocks. You would need a This would definitely mimic a reflection table, just for every point in the diffraction pattern. It could look something like:
|
I think the time has come to figure out general principles for presenting complicated data. These principles would apply to PD as well as modulated + composite and any other complex dataset. The plan is to work these out for powder by imagining complicated scenarios and making sure they work. The following is a simple summary of what I've come up with so far. Note this is all in terms of DDLm dictionaries; DDL1 could never cope properly with the demands of any reasonably complex dataset. NB The use of block pointers addresses a separate problem that needn't complicate things here. Key information:
Tasks:
As I understand it, the way in which powder would like to split things up is to have information specific to a particular phase in separate data blocks. Therefore, in DDLm terms, pd_phase is a Cif_core specifies that Now I gather that a "summary block" is desirable, where selected information found in the other blocks is collated. This would be where block pointers would be included, but it should be the case that the same information could be obtained by just reading in all of the other data blocks. In any case, the summary block would need to e.g. loop I think this all started because @rowlesmr wanted to record the contributions of each phase to the calculated diffraction pattern. In the scheme posited above, this would require a separate tabulation in each data block corresponding to a particular diffraction pattern + particular phase, as well as a tabulation of the overall fit in each data block corresponding to a particular diffraction pattern (with no phase-specific information). This may seem vaguely wasteful of space due to the repetition of the 2 theta values, but the alternative would be to define a further So my question is, does the above scheme cover all situations that you've encountered? Have I perhaps missed something else that should be separated into another data block? |
I am afraid that I do not understand the meaning of “
1. Data names in a Set category may only take a single value in a single data block.
Etc.”
So I am just not following the gist of what you are saying.
I now understand what is wanted to provide partial patterns by phase. From a logistics perspective one really wants all the partials in a single loop. What one really needs is a way to say a CIF name gets N values, not 1, for every row in the table. I think STAR might have a quoting or grouping mechanism that allows this even if CIF does not.
Brian
Sent from a powerful small device but with weak eyes.
On Nov 8, 2021, at 2:38 AM, James Hester wrote:
Data names in a Set category may only take a single value in a single data block.
|
Apologies for the lack of clarity. In DDLm dictionaries, categories are classified as
The only way to do this in a single loop in even our most flexible interpretation of the relational model is to have a separate column labelling the phase this calculated intensity belongs to. So for two phases you would have what @rowlesmr proposed:
If that is what you would prefer then we can do that. I don't understand why having the partial pattern grouped together in a separate data block with the per phase, per histogram information is less practical though. |
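A minimal Python sketch of the single-loop layout with a phase-label column. It assumes each row is a (2theta, phase id, calculated intensity) tuple; the rows, phase labels, and function names are hypothetical, chosen only to show that both the per-phase curves and the summed pattern can be recovered from such a table.

```python
from collections import defaultdict

# Hypothetical long-form rows: (2theta, phase id, calculated intensity)
rows = [
    (5.00, "a", 4.0), (5.00, "b", 2.0),
    (5.02, "a", 4.0), (5.02, "b", 3.0),
]


def split_by_phase(rows):
    """Group the rows into one {2theta: intensity} curve per phase."""
    per_phase = defaultdict(dict)
    for two_theta, phase, intensity in rows:
        per_phase[phase][two_theta] = intensity
    return dict(per_phase)


def total_pattern(rows):
    """Sum the per-phase contributions at each 2theta value."""
    total = defaultdict(float)
    for two_theta, _phase, intensity in rows:
        total[two_theta] += intensity
    return dict(total)
```

The cost of this layout is visible in `rows`: every 2theta value is written once per phase.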
What do you mean by "logistically" when wanting the partials all in one loop? If they are all in one loop, you probably don't need the complexity of linking them to the structures and diffractograms, as you could just stick it in the diffractogram block and piggyback off the linking that is already there. If each profile is in its own block, you do need to link everything, but you get the simplicity of "this block is just for that phase in that other diffractogram". In both cases, the total number of datapoints you're adding is the same, as you still need to repeat each datapoint in the measured data for each profile you want to record. I should explain my "clunky" comment. Ideally, you could have a single loop that gives columns for 2theta, meas_intensity, calc_intensity, and then one column per individual profile, but that would necessitate either repeating the profile intensity dataname in a loop, or having an arbitrary number of datanames to hold profile_1, profile_2... intensities. The clunkiness arises from having to repeat 2theta values in different loops or blocks that already exist. |
Here is my thinking on this: the goal of a loop_ structure is to bring together information that is related and shares a common ordinate. It is more difficult to relate such data when spread out over multiple loops and even harder when spread across blocks. The partial structure factor is definitely such a quantity, since in the end one probably wants to be able to see the partials superimposed or at least relate them, so I would really want them in a single loop. I really hate the idea of breaking up data across blocks that is logically and structurally linked. Now, something like 20 years after the introduction of multiple blocks for related information in pdCIF, do we yet have any software that assembles multiple blocks?
While less than ideal, here is one way to accommodate partials in the current syntax:
loop_
_pd_profile_meas_2theta_scan
_pd_profile_intensity_partials
5.00 "4 2 0"
5.02 "4 3 0"
5.04 "3 10 0"
loop_
_pd_profile_partials_phase_assignment
a b c
Another would be this
loop_
_pd_profile_meas_2theta_scan
_pd_profile_intensity_partialA
_pd_profile_intensity_partialB
_pd_profile_intensity_partialC
5.00 4 2 0
5.02 4 3 0
5.04 3 10 0
Both have their disadvantages. Then again one could get inventive with CIF syntax and do something like this:
loop_
_pd_profile_meas_2theta_scan
_pd_profile_intensity_partial[ABC]
5.00 4 2 0
5.02 4 3 0
5.04 3 10 0
Or
loop_
_pd_profile_meas_2theta_scan
_pd_profile_intensity_partials
5.00 {4 2 0}
5.02 {4 3 0}
5.04 {3 10 0}
I would argue that a goal of CIF is to keep together all the information that shares a structure (using that term from a database perspective). One would really not want to encourage partials to be tabulated with different data ranges, step sizes etc., but why not if they are in logically disconnected structures?
Brian
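A minimal Python sketch of how a reader might unpack the grouped-partials layouts above. It assumes the loop has already been parsed into (2theta, list of partials) rows and that a separate phase-assignment list gives the column order; the values are those from the examples, and the function name is hypothetical.

```python
# Order of phases, as given by the phase-assignment loop in the first example.
phase_order = ["a", "b", "c"]

# Rows as (2theta, partials), with one partial intensity per phase.
rows = [
    (5.00, [4, 2, 0]),
    (5.02, [4, 3, 0]),
    (5.04, [3, 10, 0]),
]


def partials_by_phase(rows, phase_order):
    """Turn grouped partials into one list of (2theta, intensity) per phase."""
    curves = {phase: [] for phase in phase_order}
    for two_theta, partials in rows:
        # Every row must carry exactly one value per declared phase.
        if len(partials) != len(phase_order):
            raise ValueError(f"expected {len(phase_order)} partials at {two_theta}")
        for phase, value in zip(phase_order, partials):
            curves[phase].append((two_theta, value))
    return curves
```

All partials share one set of 2theta values by construction, but the meaning of each position depends entirely on the separately stored phase order.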
On Nov 8, 2021, at 11:20 PM, rowlesmr wrote:
What do you mean by "logistically" when wanting the partials all in one loop?
If they are all in one loop, you probably don't need the complexity of linking them to the structures and diffractograms, as you could just stick it in the diffractogram block and piggyback off the linking that is already there.
If each profile is in its own block, you do need to link everything, but you get the simplicity of "this block is just for that phase in that other diffractogram".
In both cases, the total number of datapoints you're adding is the same, as you still need to repeat each datapoint in the measured data for each profile you want to record.
.
I should explain my "clunky" comment. Ideally, you could have a single loop that gives columns for 2theta, meas_intensity, calc_intensity, and then one column per individual profile, but that would either necessitate repeating the profile intensity dataname in a loop, or having an arbitrary number of datanames to hold profile_1, profile_2... intensities
The clunkiness arises from having to repeat 2theta values in different loops or blocks that already exist.
|
(Just on that, Dave Billings should be emailing you and James about what has just started in the CPD.) I've only looked at the pictures in "DDLm: A New Dictionary Definition Language". Is it possible to have vectors, where their length is defined by another data item?
|
I am reminded of a quote I read once in a database book that I've never been able to find again: the relational model is always second-best, meaning that in any given situation you can find a more efficient, streamlined way to represent data, but the relational model will still be second-best when the situation changes, while your original streamlined approach is now much worse. Anyway. Matthew's suggestion of using CIF2 vectors would be workable, with another vector defined somewhere as per one of Brian's suggestions above to give the order of phases. There is no need to define a length for a CIF2 vector. So, I've slightly expanded Matthew's example below. How does it look? Notes on the example:
|
Is it not possible to automatically define It would be easier to maintain the CIF file if I only need to write down the phases in one place. Although, I do recall from somewhere (pycifrw docs?) that row order isn't guaranteed in CIFs... |
No, the order of rows is very deliberately not significant. I understand your concerns with writing down the phases in more than one place, this is a key concern of the relational model, which aims to minimise duplication of information. The "ideal" relational approach in our case would have every separate phase in a separate data block, with no "summary block", meaning you really would only write the phases down once, and then shuffle the data around after reading it in, to match your problem of the day. |
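A minimal Python sketch of what "row order is not significant" means in practice. It assumes rows are dicts and that each row is identified by its key columns rather than by its position; the column names `two_theta`, `phase_id` and `intensity` and the values are hypothetical.

```python
def as_relation(rows, key_columns):
    """Index rows by their key columns so that row order is irrelevant."""
    relation = {}
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        # A repeated key would make the table ambiguous, so reject it.
        if key in relation:
            raise ValueError(f"duplicate key: {key}")
        relation[key] = row
    return relation


table = [
    {"two_theta": 5.00, "phase_id": "a", "intensity": 4},
    {"two_theta": 5.00, "phase_id": "b", "intensity": 2},
    {"two_theta": 5.02, "phase_id": "a", "intensity": 4},
]
shuffled = list(reversed(table))
KEYS = ("two_theta", "phase_id")
```

Here `as_relation(table, KEYS)` and `as_relation(shuffled, KEYS)` compare equal, which is why any phase ordering has to be stated explicitly rather than inferred from the order in which rows happen to be written.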
Relevant to this issue is http://comcifs.github.io/accepted/multi-block-principles. The core dictionary combined with that document and PD |
I've now drafted a document for ongoing discussion: https://github.com/COMCIFS/comcifs.github.io/blob/master/draft/powder_data_presentation.md |
How about something like this?
New data items:
|
As per previous discussions, I proposed that there would always be something like Therefore, any per-phase calculated intensity loop must be in a different (new) category, let's call it So that was a long-winded way of saying, yes, I have no objections to this proposal, as long as |
I only know enough to be dangerous, so questions: isn't that what and . or is it something like:
such that the order of values given in . or is it to do with a summary block listing all of the histograms, phases, component profiles, and the like? . Example time! component-intensities-in-a-list:
Per-phase listing
|
So this is not to do with the summary block. By having the Small point: lists (square-bracket-delimited values) are a CIF2 feature so any CIF reading software expecting CIF1 format is likely to fail rather than skipping over the value. Perhaps a more pedestrian reason for I've written out some dREL below to assure myself that not having Also, CIF allows the use of massive image arrays of numbers instead of the pure relational approach of a table of x,y positions and pixel intensity. So it is not like using an array to save space is new. I've written out some dREL showing the precise relationships between these categories. Note how dREL forces us to
It is indeed possible to write dREL for the
If we need to access the scale for a particular phase we get instead:
|
This sounds like a good reason to put it in.
I know the parser I'm fiddling around with writing for CIF1 just fails when it gets a '['. Pedestrian, but still legitimate.
so I think, strictly,
^ With this definition, overlaying
^ This definition requires that the bkg and normalisation corrections are identical for each . I think that the scale factor you're looking for should be |
Still need to add |
They are there. |
Currently the calculated intensity _pd_calc_intensity_net is for the sum of all phases. It has been suggested that seeing the calculated contribution of each phase would also be useful for plotting. The sketch of a solution involves adding a child data name of phase_id to the pd_proc category.