Review units list for duplicates, errors #378

Open
amoeba opened this issue Nov 4, 2021 · 6 comments

Contributor

amoeba commented Nov 4, 2021

@earnaud over on ropensci/EML#330 reported seeing a unit with the id molePerKilogram twice in the units list but with slightly different attributes. You can see it's defined twice:

eml/eml-unitDictionary.xml

Lines 1623 to 1627 in fe77f8f

<unit id="molePerKilogram" name="molePerKilogram" unitType="amountOfSubstanceWeight"
      parentSI="molePerKilogram" multiplierToSI="1" abbreviation="mol/kg"
      udunitsSynonym="mole/kilogram">
  <description>moles per kilogram</description>
</unit>

eml/eml-unitDictionary.xml

Lines 2017 to 2020 in fe77f8f

<unit id="molePerKilogram" name="molePerKilogram" unitType="" parentSI="molePerKilogram"
      multiplierToSI="1" abbreviation="mol/kg" udunitsSynonym="mole/kilogram">
  <description>micromoles per kilogram</description>
</unit>

The first time, it's grouped with <!--amountOfSubstanceWeight--> and the second time it's grouped with <!--amountPerMass-->.

It's also listed twice in the eml-unitTypeDefinitions.xsd file:

<xs:enumeration value="molePerKilogram"/>
<xs:enumeration value="molePerKilogram"/>
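A check like the following could surface this kind of duplication automatically. It is only a minimal sketch: the `SAMPLE` document is a cut-down stand-in for eml-unitDictionary.xml built from the two snippets quoted above, the `kilogram` entry and the helper names (`local_name`, `find_duplicate_units`, `differing_fields`) are made up for illustration, and elements are matched by local name on the assumption that the real file may or may not use a namespace prefix.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Cut-down stand-in for the unit dictionary. The two molePerKilogram entries
# are the ones quoted in this issue; the kilogram entry is illustrative only.
SAMPLE = """<unitList>
  <unit id="molePerKilogram" name="molePerKilogram" unitType="amountOfSubstanceWeight"
        parentSI="molePerKilogram" multiplierToSI="1" abbreviation="mol/kg"
        udunitsSynonym="mole/kilogram">
    <description>moles per kilogram</description>
  </unit>
  <unit id="molePerKilogram" name="molePerKilogram" unitType="" parentSI="molePerKilogram"
        multiplierToSI="1" abbreviation="mol/kg" udunitsSynonym="mole/kilogram">
    <description>micromoles per kilogram</description>
  </unit>
  <unit id="kilogram" name="kilogram" unitType="mass" multiplierToSI="1">
    <description>kilograms</description>
  </unit>
</unitList>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def find_duplicate_units(xml_text):
    """Map each unit id that occurs more than once to its elements."""
    root = ET.fromstring(xml_text)
    by_id = defaultdict(list)
    for el in root.iter():
        if local_name(el.tag) == "unit" and "id" in el.attrib:
            by_id[el.attrib["id"]].append(el)
    return {uid: els for uid, els in by_id.items() if len(els) > 1}


def differing_fields(elements):
    """Return attribute names (plus 'description') whose values differ."""
    def record(el):
        rec = dict(el.attrib)
        desc = next((c for c in el if local_name(c.tag) == "description"), None)
        rec["description"] = (desc.text or "").strip() if desc is not None else ""
        return rec

    records = [record(el) for el in elements]
    keys = sorted(set().union(*records))
    return [k for k in keys if len({r.get(k) for r in records}) > 1]


duplicates = find_duplicate_units(SAMPLE)
for uid, els in duplicates.items():
    print(uid, len(els), differing_fields(els))
```

Against the real file, `SAMPLE` would be replaced by the file's contents. Reporting the differing fields alongside each duplicated id makes it easier to decide which copy to keep.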

@mbjones, @mobb: Does this seem like a mistake to you too?

@earnaud also indicates there were other issues but I haven't figured out what those are just yet.

Once fixed, we need to issue a re-release of EML and emld. emld is where the schema files are shipped.

@amoeba amoeba added the bug label Nov 4, 2021
@amoeba amoeba self-assigned this Nov 4, 2021

earnaud commented May 12, 2022

Hi,
I opened a full issue at ropensci/EML#343 and attached a units file I put together to give EML more units.
I think I could serialize this table into XML if required.

Member

mbjones commented May 12, 2022

@amoeba Yes, I think the duplication is an issue, and bummer that we missed it. It appears to me there is little functional effect because they are defined the same way (with the exception of the use of the word "micromoles" in the second description). So, I think we should eliminate one of the two, which could go out in a patch release of the spec (e.g., 2.2.1).

@earnaud In terms of completeness, we have never thought the EML spec could define all units needed by researchers, but were striving to provide shared names for the most commonly used units, and a spec that allows additional units to be added as needed. Your ticket on how to get the R EML and emld packages to fold in udunits automatically I think is best handled there, rather than as part of the spec. If there are specific units that you think should be added to the spec in a new release because they are super common, then I think proposing those as feature requests here in the spec repo makes sense. But maybe getting "everything covered by udunits" should be more of a tooling issue rather than a spec issue. What do others think?

Member

mbjones commented May 12, 2022

@amoeba I also see that the unitType is blank on a bunch of those "amountPerMass" fields, which is wrong -- it should be set to unitType="amountOfSubstanceWeight". That unitType is poorly named, and I think it would have been better named amountPerMass, but it wasn't, so I'm not sure changing it makes sense now. @mobb, do you have any input on this situation and how we should move forward?
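One way to find the affected entries would be to list every unit whose unitType is blank or does not match a defined unitType id. The sketch below assumes that unit and unitType elements can be matched by local name and that the match on ids is case-sensitive; the two `example...` units and the `check_unit_types` helper are hypothetical and exist only to exercise the check.

```python
import xml.etree.ElementTree as ET

# Illustrative input. The unitType definition and molePerKilogram are taken
# from this issue; the two "example..." units are hypothetical.
SAMPLE = """<unitList>
  <unitType id="amountOfSubstanceWeight" name="amountOfSubstanceWeight">
    <dimension name="amount"/>
    <dimension name="mass" power="-1"/>
  </unitType>
  <unit id="molePerKilogram" name="molePerKilogram" unitType="amountOfSubstanceWeight"
        parentSI="molePerKilogram" multiplierToSI="1"/>
  <unit id="exampleBlankUnit" name="exampleBlankUnit" unitType=""
        parentSI="molePerKilogram" multiplierToSI="1"/>
  <unit id="exampleTypoUnit" name="exampleTypoUnit" unitType="amountofSubstanceWeight"
        parentSI="molePerKilogram" multiplierToSI="1"/>
</unitList>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def check_unit_types(xml_text):
    """Return (ids with a blank unitType, (id, unitType) pairs with no definition)."""
    elements = list(ET.fromstring(xml_text).iter())
    defined = {el.get("id") for el in elements if local_name(el.tag) == "unitType"}
    blank, unknown = [], []
    for el in elements:
        if local_name(el.tag) != "unit":
            continue
        unit_type = (el.get("unitType") or "").strip()
        if not unit_type:
            blank.append(el.get("id"))
        elif unit_type not in defined:  # case-sensitive comparison
            unknown.append((el.get("id"), unit_type))
    return blank, unknown


blank, unknown = check_unit_types(SAMPLE)
print("blank unitType:", blank)
print("undefined unitType:", unknown)
```

Because the comparison is case-sensitive, a value such as amountofSubstanceWeight (lowercase "o") would be reported as undefined rather than silently accepted.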

Contributor

mobb commented May 12, 2022

I confess I did not spend much time on the unitType field. I found it to be somewhat overloaded, combining features of quantity and dimensionality. Further, unitType did not seem to be widely used.

BTW, EDI has recently spun up a Units Working Group, to address the future for all the content of the (now retired) LTER Unit Registry. In particular, we would like to partner with a larger org dealing with units, and come up with a way to suggest new additions to that system, and to export from it in ways that are compatible with EML. We have just begun examining a group of systems (udunits among them) for certain features. Our WG does not have a web-presence yet - contact me if you're interested in joining this effort.

Member

mbjones commented May 13, 2022

Thanks, @mobb. I'm interested, or maybe someone from our group at NCEAS might be.

Regarding unitType, it is a critical field that links the unit to a dimensional formula. For example, for amountOfSubstanceWeight:

eml/eml-unitDictionary.xml

Lines 205 to 209 in fe77f8f

<unitType id="amountOfSubstanceWeight" name="amountOfSubstanceWeight">
  <!--molesPerKilogram-->
  <dimension name="amount"/>
  <dimension name="mass" power="-1"/>
</unitType>

Any two units that share the same unitType, or whose unitTypes have identical dimensionality, measure the same kind of quantity, so values can be converted losslessly between them using the multiplierToSI factor. While we generally only annotate with unit values, it is the unitType linkage that allows us to semantically group units and determine whether they are from the same dimensional family. So it is used more behind the scenes, in inferences about units and in driving unit conversions. This is also what would allow us to automate the linkage to other unit vocabularies built from the NIST fundamental dimensions.
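As a rough illustration of that idea, the sketch below compares the dimension sets of two units' unitTypes and converts only when they match. It is not how any EML tool is known to implement this: the `exampleAmountPerMass` unitType, the `exampleMicromolePerKilogram` unit, the `mass`/`kilogram` entries, and the `load` and `convert` helpers are assumptions made for the example, and the conversion handles purely multiplicative units only (any additive offset is ignored).

```python
import xml.etree.ElementTree as ET

# amountOfSubstanceWeight and molePerKilogram come from this issue; every
# other entry here is a hypothetical example.
SAMPLE = """<unitList>
  <unitType id="amountOfSubstanceWeight" name="amountOfSubstanceWeight">
    <!--molesPerKilogram-->
    <dimension name="amount"/>
    <dimension name="mass" power="-1"/>
  </unitType>
  <unitType id="exampleAmountPerMass" name="exampleAmountPerMass">
    <dimension name="mass" power="-1"/>
    <dimension name="amount"/>
  </unitType>
  <unitType id="mass" name="mass">
    <dimension name="mass"/>
  </unitType>
  <unit id="molePerKilogram" unitType="amountOfSubstanceWeight" multiplierToSI="1"/>
  <unit id="exampleMicromolePerKilogram" unitType="exampleAmountPerMass"
        multiplierToSI="0.000001"/>
  <unit id="kilogram" unitType="mass" multiplierToSI="1"/>
</unitList>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def load(xml_text):
    """Return ({unitType id: dimension set}, {unit id: (unitType, multiplier)})."""
    dims, units = {}, {}
    for el in ET.fromstring(xml_text).iter():
        name = local_name(el.tag)
        if name == "unitType":
            # A missing power attribute is treated as power 1.
            dims[el.get("id")] = frozenset(
                (d.get("name"), float(d.get("power", "1")))
                for d in el
                if local_name(d.tag) == "dimension"
            )
        elif name == "unit":
            units[el.get("id")] = (
                el.get("unitType"),
                float(el.get("multiplierToSI", "1")),
            )
    return dims, units


def convert(value, from_id, to_id, dims, units):
    """Convert value between two units if their dimensionality matches."""
    from_type, from_mult = units[from_id]
    to_type, to_mult = units[to_id]
    if dims[from_type] != dims[to_type]:
        raise ValueError(f"{from_id} and {to_id} have different dimensionality")
    # Go through the SI parent: scale up to SI, then down to the target unit.
    return value * from_mult / to_mult


dims, units = load(SAMPLE)
print(convert(2.5, "molePerKilogram", "exampleMicromolePerKilogram", dims, units))
```

Comparing dimension sets rather than unitType ids means two differently named unitTypes with the same formula are still treated as convertible, which is the case described above. A unit with a blank unitType cannot take part in this check at all.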


earnaud commented May 13, 2022

Hi @mbjones ,

Indeed, I naively worked from the tables returned by EML::get_unitList() and didn't think to look at how the function actually worked. I will therefore turn my table into XML and review the units list with my user communities to assess which units will be the most useful.
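A table-to-XML step along those lines might look like the sketch below. The column names in `ROWS` are assumed to mirror the attribute names used in the unit dictionary, which may not match the actual table; the second row and the `rows_to_unit_list` helper are hypothetical, and the output uses unprefixed element names, so any namespace the real dictionary requires would still need to be added.

```python
import xml.etree.ElementTree as ET

# Hypothetical table rows; the column names are assumed to match the
# attribute names in the unit dictionary. The last row repeats an id.
ROWS = [
    {"id": "molePerKilogram", "unitType": "amountOfSubstanceWeight",
     "parentSI": "molePerKilogram", "multiplierToSI": "1",
     "abbreviation": "mol/kg", "description": "moles per kilogram"},
    {"id": "exampleMicromolePerKilogram", "unitType": "amountOfSubstanceWeight",
     "parentSI": "molePerKilogram", "multiplierToSI": "0.000001",
     "abbreviation": "umol/kg", "description": "micromoles per kilogram"},
    {"id": "molePerKilogram", "unitType": "",
     "parentSI": "molePerKilogram", "multiplierToSI": "1",
     "abbreviation": "mol/kg", "description": "micromoles per kilogram"},
]


def rows_to_unit_list(rows):
    """Build a <unitList> element, keeping only the first row for each id."""
    root = ET.Element("unitList")
    seen, skipped = set(), []
    for row in rows:
        if row["id"] in seen:
            skipped.append(row["id"])  # report repeats instead of emitting them
            continue
        seen.add(row["id"])
        attrs = {k: v for k, v in row.items() if k != "description"}
        attrs.setdefault("name", row["id"])  # name defaults to the id
        unit = ET.SubElement(root, "unit", attrs)
        ET.SubElement(unit, "description").text = row.get("description", "")
    return root, skipped


root, skipped = rows_to_unit_list(ROWS)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
print("skipped duplicate ids:", skipped)
```

Skipping repeated ids at serialization time avoids reintroducing the duplication this issue is about.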


4 participants