Unit parsing #8

Open
dmcclean opened this Issue · 7 comments

2 participants

@dmcclean

Following on to #7, the flip side to pretty-printing is parsing.

Unfortunately the SI unit notation and names have several ambiguities, documented (perhaps not extensively) by this article.

Nevertheless, we need not let that discourage us. We can pick conventions, report the ambiguities if they arise, disambiguate them because we know what dimension we were expecting the user to enter in that field, or use some combination of those strategies.

@bjornbm
Owner

If the expected physical dimensions are known, the ambiguities in the article do not arise (I believe)?

If necessary we could return a list of possible parses, so that the program can decide which to use or ask the user to disambiguate.

@dmcclean

Those are both good thoughts.

It may be that the ambiguities disappear entirely if you know the expected dimension, although that might be tough to prove exhaustively, especially because multiple ambiguities in the same expression could interact.
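A minimal sketch of disambiguating by expected dimension. The `Candidate` type, the three-exponent `Dim` (length, mass, time) and the function names are hypothetical and not part of dimensional; the two readings of `min` are an illustrative ambiguity, not one taken from the article:

```haskell
-- Hypothetical types for illustration only; not part of dimensional.
-- Exponents of (length, mass, time). A real version would carry all
-- seven SI base dimensions.
type Dim = (Int, Int, Int)

data Candidate = Candidate
  { description :: String
  , dimension   :: Dim
  } deriving (Eq, Show)

-- Keep only the candidate parses whose dimension matches the expected one.
disambiguate :: Dim -> [Candidate] -> [Candidate]
disambiguate expected = filter ((== expected) . dimension)

-- Two readings of "min", assuming both a minute and an inch are defined.
candidatesForMin :: [Candidate]
candidatesForMin =
  [ Candidate "minute"     (0, 0, 1)
  , Candidate "milli-inch" (1, 0, 0)
  ]

main :: IO ()
main = do
  -- A field expecting a time keeps only the minute reading.
  print (map description (disambiguate (0, 0, 1) candidatesForMin))
  -- A field expecting a length keeps only the milli-inch reading.
  print (map description (disambiguate (1, 0, 0) candidatesForMin))
```

If the filtered list still has more than one element, the ambiguity survives the dimension check and would have to be reported or resolved some other way.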

Returning a list of parses is great for my purposes. Also possibly good for a quasi-quote, since it could generate an error if the unit was ambiguous.

I've never worked on a parser for a language with this kind of ambiguity, so I will need to google it a bit. I think my strategy might be to bake the list of prefixes into the parser, but to use a map that has the definitions of the actual units in play.
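Roughly, that strategy could look like the sketch below. Everything in it is an assumption for illustration: the `Parse` type, the `parseSymbol` name, the partial prefix list and the example unit table are made up here, and the map simply stores unit names where a real table would store definitions. It uses `Data.Map` from the containers package.

```haskell
import qualified Data.Map as Map
import Data.List (stripPrefix)

-- A subset of the SI prefixes, baked into the parser: (symbol, power of ten).
prefixes :: [(String, Int)]
prefixes = [("k", 3), ("da", 1), ("d", -1), ("c", -2), ("m", -3)]

-- The units in play, supplied by the caller: symbol -> unit name.
type UnitTable = Map.Map String String

-- One possible reading of a symbol: a power of ten and a unit name.
-- (Hypothetical type, not part of dimensional.)
data Parse = Parse Int String
  deriving (Eq, Show)

-- Return every reading of a single symbol: the bare unit, if the table
-- has one, followed by every prefix + unit split that the table supports.
parseSymbol :: UnitTable -> String -> [Parse]
parseSymbol table s = bare ++ prefixed
  where
    bare = [Parse 0 name | Just name <- [Map.lookup s table]]
    prefixed =
      [ Parse e name
      | (p, e)    <- prefixes
      , Just rest <- [stripPrefix p s]
      , Just name <- [Map.lookup rest table]
      ]

-- An example table, chosen to provoke ambiguity.
exampleTable :: UnitTable
exampleTable = Map.fromList
  [ ("m", "metre"), ("min", "minute"), ("in", "inch")
  , ("d", "day"), ("cd", "candela") ]

main :: IO ()
main = mapM_ (print . parseSymbol exampleTable) ["km", "min", "cd", "xyz"]
```

With this table, `km` has a single reading, `min` and `cd` each have two (the bare unit and a prefixed one), and `xyz` has none, so the caller can tell success, ambiguity and failure apart from the length of the list.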

I'm not sure whether to try supporting concatenation-as-multiplication or to require spaces. Got any thoughts on that one?

@bjornbm
Owner

In your situation I would start by requiring spaces for the sake of simplicity. Concatenation could be a nice-to-have but can come later, or perhaps not at all if it comes at the cost of ambiguity. I am assuming no one else is levying requirements on you.
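A sketch of what requiring spaces buys, under the assumption that each factor is parsed by some function returning a list of readings. `parseProduct` and `toyFactor` are hypothetical names, and exponents and division are left out:

```haskell
-- Parse a space-separated product of units. Each whitespace-separated
-- token is one factor; mapM in the list monad yields every combination
-- of the per-factor readings.
parseProduct :: (String -> [a]) -> String -> [[a]]
parseProduct parseFactor = mapM parseFactor . words

-- A toy factor parser standing in for a real one.
toyFactor :: String -> [String]
toyFactor "m"   = ["metre"]
toyFactor "s"   = ["second"]
toyFactor "min" = ["minute", "milli-inch"]
toyFactor _     = []

main :: IO ()
main = do
  print (parseProduct toyFactor "m s")    -- one reading
  print (parseProduct toyFactor "m min")  -- two readings, one per reading of "min"
  print (parseProduct toyFactor "ms")     -- no reading: "ms" is a single unknown token
```

Because tokens are never split, `ms` cannot be read as metre times second; supporting concatenation would mean trying every way of splitting the token as well, which is where the extra ambiguity would come from.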

@dmcclean

I'm going to let this one sit. It turns out I can get everything I really need from a drop down list.

It potentially might be nice for the dimensional matrix quasiquote, but we'll see.

@bjornbm
Owner

Did you see this other guy's take on unit parsing, announced on the Haskell mailing list recently?

I haven't looked into it much, but if it is good, an easy solution could be along the lines of converting from his data type to your AnyQuantity and then promoting to a full Dimensional.

@bjornbm
Owner

Another unambiguous representation is The Unified Code for Units of Measure:

The Unified Code for Units of Measure is a code system intended to include all units of measures being contemporarily used in international science, engineering, and business. The purpose is to facilitate unambiguous electronic communication of quantities together with their units. The focus is on electronic communication, as opposed to communication between humans.

There is also the Metric Interchange Format.

@dmcclean

I like those. I think the Unified Code for Units of Measure might be the best fit.

I did happen to see the Haskell mailing list announcement you mention, but I didn't follow the link. That is interesting as it pertains to parsing, though I think that it will be better to follow a published standard.

My only problem with the Metric Interchange Format is that my problem domain of aviation is full of various legacy units. Certainly it might be nice to have parsers for both, and output to the Metric Interchange Format.

The Metric Interchange Format's decision to allow fractional exponents, so that noise densities can be expressed in the usual way, strikes me as a bit suspect, and it couldn't really be supported. A note to that effect in the documentation should be enough for most people, I would think?

Similarly the UCUM's decision to allow arbitrary units for counting things doesn't square well with dimensional...
