Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is the _type.source of SU-related data items? #434

Closed
rowlesmr opened this issue Jun 27, 2023 · 10 comments · Fixed by #447
Closed

What is the _type.source of SU-related data items? #434

rowlesmr opened this issue Jun 27, 2023 · 10 comments · Fixed by #447

Comments

@rowlesmr
Copy link
Collaborator

From #431 and continued in #433

I started replacing _type.source Related in many SU items with type sources matching that of the linked data item, as there seemed to be a mixture of different source types, and also because the general_su save frame gives everything _type.source Derived.

Then:
@vaitkus

Also, I think that the Related purpose might have been intentionally set for the SU items to show that they are related to the measurand data items. Not a 100% sure about that, but we should investigate a bit before making that change.

@jamesrhester

Because the link from SU to the item it is the su of is specified via _name.linked_item_id, the _type.source is Related instead of derived. I note that this is not actually stated anywhere in ddl.dic, so perhaps we should rethink that or improve the Related description.

I am not qualified to properly parse the description of Related, but it does seem to only refer to looped lists.

# just FYI
         Related
;
         A value or tag used in the construction of looped lists of data.
         Typically identifying an item whose unique value is the reference key
         for a loop category and/or an item which has values in common with
         those of another loop category and is considered a Link between these
         lists.
;
@jamesrhester
Copy link
Contributor

I think we can decide on a rule for this here. Blanket "Derived" would seem to be good, as we either calculate SU from multiple observations, by propagation of errors, or from the square root in the case of counting statistics. In weird cases where SU is in some sense measured we can have custom values in that definition block.

@vaitkus
Copy link
Collaborator

vaitkus commented Jun 28, 2023

From the last comment in PR #433 ([1]) I understood that we the SU items should have the same _type.source as the linked measurand item. How strongly do we want to enforce this?

I can easily update the GitHub actions to check for this (~ 2 additional lines of code). I did a trial run of this check and noticed that there are 26 Recorded SU items that import the general_su save frame which then sets their _type.source to Derived. There are several alternative workarounds for this:

  1. Rewrite those definitions to not use the general_su import by putting everything directly in the dictionary.
  2. Introduce a new importable save frame (e.g. recorded_su) which has the Recorded type source, and use it in these definitions.
  3. Explicitly specify the _type.source definition in the dictionary and import general_su with the on-duplicate-ignore flag. This would result in the _type.source from temp_attr.dic being ignored.
  4. Remove the _type.source attribute from general_su altogether and explicitly specify this property in each SU definition.

Does any of these approaches look suitable?

[1] #433 (comment)

@rowlesmr
Copy link
Collaborator Author

rowlesmr commented Jun 29, 2023

I think we can decide on a rule for this here. Blanket "Derived" would seem to be good, as we either calculate SU from multiple observations, by propagation of errors, or from the square root in the case of counting statistics. In weird cases where SU is in some sense measured we can have custom values in that definition block.

Wouldn't an SU of something Recorded also be Recorded?

I think @jamesrhester 's remarks can be interpreted as "choose Derived, unless it's Recorded".

In that case, we can have a general_su, and a recorded_su, for Single instances, and write everything out for when it isn't. Which I think looks like Option 2.

@jamesrhester
Copy link
Contributor

I agree that the SU of a Derived item is Derived. But in what sense is the SU of a Recorded also Recorded? Even if the quantity itself is observed, the SU is only (approximately) observed if multiple measurements are taken. In the case of counting statistics, the SU is Derived even if the counts are Recorded.

With that exception, as Derived means "derived from the values of other data names", and the parent Recorded data name is by definition not derived from any other data names (so the SU couldn't be, e.g. if the parent is an average), we are left with Related in order to completely avoid committing to one or the other. Or to create a new one, SU, to state that this is the SU of another data name and dodge the issue another way.

Currently there seem to be no actual practical implications that I can see from choosing Derived or Recorded, so we are to some extent pondering angels on the head of a pin.

@vaitkus
Copy link
Collaborator

vaitkus commented Jun 29, 2023

I think Related for all SU items might indeed be the way to go because:

  • It pretty much states that the interpretation of SU largely depends on the linked Measurand item.
  • The SU definition contains the _name.linked_item_id data item which is also used for linking related key items.
  • It is simple.

So currently, the systems of cross-referencing loops and linking SU to measurands are already closely linked. If we ever decide to properly separate them (see PR), then more effort can be put into defining the source of the SU item (e.g. by adding the SU source).

I do agree that the current description of the Related origin does not properly mention SU values, but this can easily be changed since that would be an extension. Currently there is a different PR (#438) aimed at clarifying the semantics of _type.source, so we might want to wait until that one is merged.

Edit: added the number of the relevant PR #438.

@rowlesmr
Copy link
Collaborator Author

Currently there is a different PR aimed at clarifying the semantics of _type.source, so we might want to wait until that one is merged.

#433? That can be ignored, as it is directly affected by this conversation. If it is Related, that PR just gets rejected.

But in what sense is the SU of a Recorded also Recorded?

I had it in my mind that Recorded meant "figured out before the experiment", probably prompted by my thinking about cell prms being Recorded and (previously) in EXPTL.

@vaitkus
Copy link
Collaborator

vaitkus commented Jun 29, 2023

#433? That can be ignored, as it is directly affected by this conversation. If it is Related, that PR just gets rejected.

Sorry, I somehow forgot to include the PR number (#438). I will also update my comment accordingly.

@rowlesmr
Copy link
Collaborator Author

rowlesmr commented Jul 1, 2023

From comments here and #438 and #433, I think it looks like we're leaning towards either Derived, or Related with an extension of the definition to encompas SUs.

Related would mean (imho) "This value means whatever it means in the context of the value of which it is the SU", and Derived means "this value was figured out somehow from an analysis of the value of which it is the SU.".

I think Related is the most general way to define them.

jamesrhester added a commit to rowlesmr/cif_core that referenced this issue Jul 5, 2023
As per discussion in COMCIFS#434.
@vaitkus
Copy link
Collaborator

vaitkus commented Jul 8, 2023

@jamesrhester, so what is the final verdict on this topic? Do we:
a) Require all SU items to have the Related source.
b) Require all SU items to have the Derived source.
c) Leave things as is and allow the users to choose which ever options seems most relevant for the given case.
d) Something else.

@jamesrhester
Copy link
Contributor

I think we had a consensus that (a) was appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants