Consider adding documentation for how to deal with log-units #283
Comments
thanks, Bryce. maybe someone could mine the EML to see what people are already doing with log units. |
Good idea! This would take a bit of work but is fairly doable. A good stepping off point would be https://cn.dataone.org/cn/v2/query/solr/?q=attribute:*log*&fl=attribute |
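A minimal Python sketch of how that mining step might start. It assumes the DataONE Solr endpoint honors the standard Solr `wt=json` and `rows` parameters and returns `attribute` as a list of strings per document; `build_query_url` and `count_log_attributes` are hypothetical helper names, and no network request is made here. A plain substring match cannot tell logarithms from tree logs, so the counts would still need manual review.

```python
from collections import Counter
from urllib.parse import urlencode

BASE_URL = "https://cn.dataone.org/cn/v2/query/solr/"


def build_query_url(pattern="*log*", rows=100):
    """Build the Solr query URL from the comment above.

    wt=json and rows are standard Solr parameters; that this endpoint
    accepts them is an assumption.
    """
    params = {
        "q": f"attribute:{pattern}",
        "fl": "attribute",
        "wt": "json",
        "rows": rows,
    }
    return BASE_URL + "?" + urlencode(params)


def count_log_attributes(response):
    """Tally attribute names containing 'log' in a parsed Solr JSON response.

    Assumes the usual Solr layout: response -> docs -> [{"attribute": [...]}].
    """
    counts = Counter()
    for doc in response.get("response", {}).get("docs", []):
        for name in doc.get("attribute", []):
            if "log" in name.lower():
                counts[name.strip().lower()] += 1
    return counts
```

The parsed JSON body of a real response could be passed straight to `count_log_attributes` to see which naming patterns are most common.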
Thanks for the query, @amoeba. Bummer that we don't have the semantics in place to remove the attributes that are about trees. But to be perfectly correct, it seems (for values that are the result of log transforms) that they are dimensionless: But I think this recommendation fits what we see in environmental data: So we should state that the log (or ln) is dimensionless, but the attribute description can state the original unit, which no longer has meaning, because you can't subtract or add the numbers as you originally would have. |
👍 |
Hi,
I think there are some interesting points being discussed here, and I'm
trying to straighten this out in my head...
For me:
1. "Dimension" refers to the type of the (typically, physical) variable
of interest-- e.g. Mass, Length, Time, etc
2. "Measure" or "Measurement" refers to the *quantification* of a
variable of interest, that presumably is_of some "Dimension" (although this
can get murky once we depart from basic physical variables)
3. "Units" are defined to serve as standards for comparability of
"Measurements" within some "Dimension"
4. A "Measurement" becomes comparable with other "Measurements" when its
measured "Value" is expressed as a ratio to some fundamental, standard
"Unit", e.g. a "Meter" (thus, saying something is "3.14 meters" is really
like saying it is "3.14 times the length of the 1 meter standard", leaving
aside for now quantum physics and speed of light issues relative to
quantifying a meter)
5. Thus, the Dimension of a Measurement is not changed simply because
the scale for expression of its Value has been "Transformed"
algorithmically (e.g. logarithmic transform). And in this case that
transformation is reversible. The "alteration/transformation" was done on
the "Value"-- and does not impact the "Dimension" (e.g. if Meter becomes
LogMeter, Dimension remains Length)
6. The Unit, however, must be restated to provide proper interpretation
of the associated Value. (hence, a log transform on measurements expressed
in meters would have Units of LogMeter, or maybe we need to have some
"transformation Units"-- for the log/ln, trigonometric, hyperbolic, and
other potential transforms, as these transformations can be applied to most
any Unit, notwithstanding issues with ZERO or negative numbers).
This is what makes greatest sense to me. The logarithm is of the "Value"
of the "Length", not the "Length" Dimension itself.
So the statement that "values are dimensionless" is true, but values are
associated with Measurements quantifying some Dimension, that remains
unchanged.
The fundamental nature (Dimension) of a variable of interest would not
change simply because its associated measured values are transformed,
whether linear (e.g. Meters -> Feet); or non-linear (Meters -> LogMeters).
I also wanted to reiterate that we should not confuse "Dimensions" in this
context, with the use of "Dimensional Analysis" to cancel out units through
some division or multiplication process, as (and we've often discussed this
in the past) we often need to preserve the specific identity (type) of the
variable of interest in many cases. Time is a special case that often can
be cancelled out.
To circle back, I think the Dimension of LogMeter should be "Length" (my
point #6 above)
cheers,
Mark
|
Tried to summarize the Slack discussion. Something like this, for the EML documentation (feel free to edit): If an attribute is a log transform, it can be unitless ("dimensionless" is a standardUnit in EML). If it is useful to include a version of the original unit for labeling, the customUnit can reflect the original dimensions, e.g., "logMeter" or "lnPa". However, the definition for a customUnit for a transformed value (in STMML) should state that its relation to a parent is through an inverse transformation, and describe the transform, e.g., exp(x); STMML assumes simple arithmetic. |
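A rough sketch of what such a customUnit definition could look like, built with Python's standard `xml.etree.ElementTree`. The function name `make_log_unit`, the namespace URI, and the wording of the description are assumptions for illustration, not something agreed in this thread; `multiplierToSI` is deliberately left out because a log transform cannot be expressed as a simple multiplier.

```python
import xml.etree.ElementTree as ET

# Assumed STMML namespace; check it against the EML version in use.
STMML_NS = "http://www.xml-cml.org/schema/stmml-1.1"


def make_log_unit(original_unit, prefix="log", inverse="10^x"):
    """Build a dimensionless stmml:unit whose id records the original unit.

    make_log_unit is a hypothetical helper; the id style follows the
    "logMeter" / "lnPa" examples in the comment above.
    """
    ET.register_namespace("stmml", STMML_NS)
    unit_id = prefix + original_unit[0].upper() + original_unit[1:]
    unit = ET.Element(
        f"{{{STMML_NS}}}unit",
        {"id": unit_id, "name": unit_id, "unitType": "dimensionless"},
    )
    description = ET.SubElement(unit, f"{{{STMML_NS}}}description")
    description.text = (
        f"{prefix} of a value originally recorded in {original_unit}; "
        f"the stored value is a pure number. Original values are recovered "
        f"with the inverse transform {inverse}."
    )
    return unit


if __name__ == "__main__":
    print(ET.tostring(make_log_unit("Pa", prefix="ln", inverse="exp(x)"),
                      encoding="unicode"))
```

Keeping the inverse transform in the description, rather than in a numeric attribute, is one way to respect the point that STMML assumes simple arithmetic.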
This sounds good to me, though we should consider two things:
1. "dimensionless" should not remain a standardUnit in EML, as a value
can be "unit-less" (e.g. Box-Cox), but still represent a "dimension" (e.g.
Mass, Length). I recommend we revise the name of "dimensionless" to
"unit-less", to preserve the important distinction between Dimension and
Unit
2. We should remember there are some other common data transformations
aside from Log/Ln, including (primarily) SqRt, CubRt, Arcsine, Reciprocal,
Box-Cox, and Regression. So we might want to develop a general method to
accommodate such cases.
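To make the "general method" idea concrete, here is a small Python sketch that pairs several of the transformations listed above with their inverses. The names `TRANSFORMS`, `round_trip`, `box_cox`, and `inv_box_cox` are hypothetical; the arcsine entry assumes the arcsine square-root variant used for proportions, and regression-based transforms are left out because they have no fixed closed-form inverse.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inv_box_cox(y, lam):
    """Inverse of box_cox for the same lam."""
    return math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)


# Each entry maps a transform name to a (forward, inverse) pair.
TRANSFORMS = {
    "log10": (math.log10, lambda y: 10.0 ** y),
    "ln": (math.log, math.exp),
    "sqrt": (math.sqrt, lambda y: y ** 2),
    "cuberoot": (lambda x: math.copysign(abs(x) ** (1.0 / 3.0), x),
                 lambda y: y ** 3),
    # Assumed variant: arcsine of the square root of a proportion in [0, 1].
    "arcsine": (lambda p: math.asin(math.sqrt(p)),
                lambda y: math.sin(y) ** 2),
    "reciprocal": (lambda x: 1.0 / x, lambda y: 1.0 / y),
}


def round_trip(name, value):
    """Apply a named transform and then its inverse."""
    forward, inverse = TRANSFORMS[name]
    return inverse(forward(value))
```

A metadata convention would only need to record the transform name (and any parameter such as the Box-Cox lambda) plus the original unit for a data re-user to apply the matching inverse.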
|
Down-voting my own comment, above. Trying to cram all this into a single EML "unit" is a bad idea. Logs are dimensionless by definition, and a unit implies that certain operations can be performed, which is misleading. A better recommendation for describing a log measurement will be to use the annotation field. |
Comment from @mpsaloha regarding how to handle Units for TRANSFORMED DATA:
Interpretation of Units or Dimensions can be problematic after data are transformed for statistical purposes. Some transformations can be completely reversed to re-derive original values, although caution must be exercised if constants or other adjustments were made to the data beforehand.
EML should recommend a convention for expressing transformed attribute values, e.g.
Examples:
Transforms to consider for providing standardized prefixes in EML include:
Construction of an EML customized unit, as proposed above, should not be taken to indicate that the "original unit" is still associated with the transformed value. Rather, it indicates what that original unit was, for improved evaluation of data for re-use, as well as the potential for implementing a reverse transformation to re-derive the original data (although this should be done cautiously). |
@mpsaloha -
Content is in this file (second paragraph, section starting approx line 70):
If you want, put the text here and I'll add it, since I have that file out now. |
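While I am fine with clarifying the math behind the use of logs, sin, exp, and other transcendental functions, I would like us to be clear that it is not possible mathematically to take the log of a dimensioned quantity with units. The idea of a "log meter" is nonsensical mathematically. Rather, people often use a shorthand that assumes the arguments to transcendental functions have first been made dimensionless before the function is evaluated. There are numerous explanations of this on the web. Here are a couple of decent ones, the first of which is the most comprehensive, and points out that several popular internet sites like Wikipedia have promulgated mistakes in some of the math, including the use of the Taylor expansion as justification one way or the other:
- https://pubs.acs.org/doi/pdf/10.1021/ed1000476
- http://math.ucr.edu/home/baez/physics/General/logs.html
- https://math.stackexchange.com/a/238404
The math stack exchange site also trots out some of these erroneous explanations. A simple and intuitive way to show that log(10 grams) is nonsensical is to see what happens to it when expanding it. Take the definition of the log function (using base 10 log as an example, but it's true for all bases): y = log(x) if x = 10^y. Then examine the following expansion:
log(10 grams) = log(10 * 1 gram)
              = log(10) + log(gram)
              = 1 + log(gram)
From the paper linked above, to calculate log(gram) one must ask oneself "what is the exponent y (a number) to which one should raise the base b, that will yield gram(s)?" There is no such number, as gram is not a number.
The way textbooks get away with using dimensioned numbers as arguments to transcendental functions is to (implicitly) divide by a reference constant first (e.g., ln(3 m) is really ln(3 m / 1 m), so that the units cancel, which is ln(3)). All arguments to transcendental functions must be dimensionless numbers, even though sometimes people don't make that explicit.
So, if people want to make a new STMML definition for logmeter in EML as another name for dimensionless and that has unitType=dimensionless then that is fine. It would clarify that the original unit was meter. But let's not imply that the value of a transcendental function has a unit. It is a pure number, and does not have units. |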
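The reference-constant argument can be written out explicitly. The following is a worked sketch, with the symbols $x_0$ (the reference quantity) and $y$ (the stored value) introduced here for illustration only:

```latex
\begin{align*}
  y &= \log_{10}\!\left(\frac{x}{x_0}\right), \qquad x_0 = 1\,\mathrm{m}
      && \text{(the argument } x/x_0 \text{ is a pure number)} \\
  x &= x_0 \cdot 10^{\,y}
      && \text{(the inverse needs } x_0 \text{ to recover a length)} \\
  \log_{10}\!\left(\frac{x}{1\,\mathrm{cm}}\right)
    &= \log_{10}\!\left(\frac{x}{1\,\mathrm{m}}\right)
     + \log_{10}\!\left(\frac{1\,\mathrm{m}}{1\,\mathrm{cm}}\right)
     = \log_{10}\!\left(\frac{x}{1\,\mathrm{m}}\right) + 2
      && \text{(changing the reference shifts } y \text{ by a constant)}
\end{align*}
```

For $x = 3\,\mathrm{m}$ this gives $y \approx 0.477$ against a 1 m reference and $y \approx 2.477$ against a 1 cm reference. The stored value is a pure number either way, but it cannot be inverted back to a length unless the reference is recorded somewhere in the metadata.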
Hi Matt,
I agree with your point, and hopefully it is completely clear that I don't
think anybody (much less me) in this discussion has been advocating that a
log-transformed value "retains its original unit". And I think we also
agree that logarithmic values in general can have units and dimensions,
e.g. decibels, pH, and astronomical magnitude do... And those all involve
logarithmic transformations of some measured physical quantity, that is
supposed to make them unit-less according to some folks-- but apparently we
can then usefully "invent" Unit names to associate with values of
log-transformed data of specific types; aha precedents!
You suggest:
====
So, if people want to make a new STMML definition for logmeter in EML as
another name for dimensionless and that has unitType=dimensionless then
that is fine. It would clarify that the original unit was meter. But let's
not imply that the value of a transcendental function has a unit. It is a
pure number, and does not have units.
====
I think you are suggesting as a solution
unitType=logmeter
and
unitType=logmeter === unitType=dimensionless
I guess that will work since the key thing I am concerned about is knowing
those original Units, and it seems you are okay with that. I think you are
betraying some of the mathematical arguments you cite, however-- e.g. Matta
et al. or the stackexchange advocates for "dimensionless-ness" of
log-transformed data. Once you lose those Units through the dimensional
analysis necessary to "permit" taking logarithms (LOG FUNCTIONS MUST HAVE
UNIT-LESS ARGUMENTS!!), aren't they "gone"? :-)
I prefer, however, the syntax of *Log[meter]* rather than "*logmeter*", as
the latter seems to have stronger connotations that it is, well, referring
to a chimerical "log-meter"...
Use of brackets also more clearly separates the name of the transformation
from the original unit in which the data were represented.
Finally, I am still concerned about our synonymizing "*unitless*" with
"*dimensionless*". I don't think these are the same thing. "Dimensions"
describe the physical variable measured. Thus, while log-transformed
measurements of wing-length might be unit-less, I would argue they retain
their dimension of "length". If it is possible to revise EML to
accommodate this distinction, I think it would be well advised.
cheers,
Mark
…On Sun, Aug 19, 2018 at 2:39 PM, Matt Jones ***@***.***> wrote:
While I am fine with clarifying the math behind the use of logs, sin, exp,
and other transcendental functions, I would like us to be clear that it is
not possible mathematically to take the log of a dimensioned quantity with
units. The idea of a "log meter" is nonsensical mathematically. Rather,
people often use a shorthand that assumes the arguments to transcendental
functions have first been made dimensionless before the function is
evaluated. There are numerous explanations of this on the web. Here are a
couple of decent ones, the first of which is the most comprehensive, and
points out that several popular internet sites like Wikipedia have
promulgated mistakes in some of the math, including the use of the Taylor
expansion as justification one way or the other:
- https://pubs.acs.org/doi/pdf/10.1021/ed1000476
- http://math.ucr.edu/home/baez/physics/General/logs.html
- https://math.stackexchange.com/a/238404
The math stack exchange site also trots out some of these erroneous
explanations. A simple and intuitive way to show that log(10 grams) is
nonsensical is to see what happens to it when expanding it. Take the
definition of the log function (using base 10 log as an example, but its
true for all bases): y = log(x) if x = 10^y. Then examine the following
expansion:
log(10 grams) = log(10 * 1 gram)
= log(10) + log(gram)
= 1 + log(gram)
From the paper linked above, then to calculate log(gram) one must ask
yourself "what is the exponent y (a number) to which one should raise the
base b, that will yield gram(s)?" There is no such number, as gram is not
a number.
The way textbooks get away with using dimensioned numbers as arguments to
transcendental functions is to (implicitly) divide by a reference constant
first (e.g., ln(3 m) is really ln(3 m/1 m) to make the units cancel,
which is ln(3). All arguments to transcendental functions must be
dimensionless numbers, even though sometimes people don't make that
explicit.
So, if people want to make a new STMML definition for logmeter in EML as
another name for dimensionless and that has unitType=dimensionless then
that is fine. It would clarify that the original unit was meter. But let's
not imply that the value of a transcendental function has a unit. It is a
pure number, and does not have units.
|
You wrote:
This is where we might diverge. My reading has led me to think that log-transformed values are indeed dimensionless -- e.g., they no longer represent a 'length' -- and rather now represent a pure numerical value. Here's the relevant quote from the Matta et al. paper:
And also you wrote:
I think |
I believe that the logarithm of length in meters would technically be considered a level measurement, that is, of type "level" or "level difference" rather than of type "length" or of type "dimensionless". |
Hi Matt and Carl,
Carl-- thanks for finding that. But did you notice that in the link you provided they refer to decibels under "Units of Level"?
Matt-- yes, I read the Matta et al. paper and liked their argument for dismissal of the Taylor expansion as "proof" of dimensionlessness in the case of logarithms, but noted that they also never mentioned the issue of inverse transformations in the case of logarithms-- which is a common use case.
So, we don't see eye-to-eye on several things here:
- in general, what a "dimension" represents as opposed to a "unit"-- I don't think "dimensionless" is the same as "unitless"-- while a measurement value with its unit allows us to infer its dimension, the reverse is not true (a measured value with its dimension does not allow us to infer its unit-- as we well know from under-specified metadata! "Body weight of 5": dimension of Mass; units of ??)
- that if one log-transforms a set of wing-lengths (e.g. measured in cm) it becomes a pure-number, so the inverse transform of those pure-numbers are also pure-numbers (i.e. dimension (length) and unit (cm) of those measurements are irretrievably lost. Note that analysts routinely re-derive original values and their associated units from statistically transformed variables-- how is this defendable if log-transforms are "pure numbers"?)
- that pH, dB and other logarithmic scaled measurements are unitless. For example, I'd assert that 10 is unitless, but that 10dB has a unit of decibel, which is a measurement of the log ratio of amplitude of two "sounds" (air pressure levels) or other energy sources. If you want to call 'dB' (as an example) something other than a unit, maybe we need to invent a new category-- "unitless standard" for these standard names for interpreting and comparing quantitative values along some scale (which coincidentally is the primary function of those thingies we call "units").
So, regardless of what we call these, I think we should retain them somewhere in the metadata, rather than letting them drift away in pure number bliss. Also, we are promoting different notions of "dimensionlessness"-- yours having more to do with dimensional analysis, and mine more regarding semantics. E.g. if one has 100Kg of antelopes per 5Kg of Lions, I'd say the dimensions are "Mass"; whereas you (I think) would say this ratio is dimensionless. |
@mpsaloha Yeah, decibels are a particularly interesting case. Apparently decibel is technically the log ratio of any measurement, so arguably the 'units' of logarithm of length could be decibels! Wikipedia suggests (https://en.wikipedia.org/wiki/Decibel) the convention is to put the unit following decibels, so decibels of log voltage would be dBV. (Ironically, dBm apparently refers to the log of milliwatts, not meters, sorry.) Apparently the SI standard opposes this convention (https://en.wikipedia.org/wiki/Decibel#Suffixes_and_reference_values). To make this more confusing, decibels are defined differently for power-type units and "field" (now called "root-power") type measurements, where it is typical to square the values before taking the ratio (equivalently, multiplying the log by 2); see https://en.wikipedia.org/wiki/Field,_power,_and_root-power_quantities. So decibel-meters, anyone? Not sure I'm helping. pH is a little cleaner as technically it's already defined as the negative log of H+ activity, which is already defined as a dimensionless measure, so the use of logs does not imply the need for a reference scale. There is some argument that these log-scaled units are quantities we tend to think of in percentage/multiplicative terms anyway, and measure in log-scale units.... |
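The power versus root-power distinction can be checked numerically. This is a small Python sketch using the standard definitions (10 times the log10 of a power ratio, 20 times the log10 of an amplitude ratio); the function names are hypothetical and chosen only for this example.

```python
import math


def power_level_db(power, reference):
    """Level in dB of a power-type quantity relative to a reference power."""
    return 10.0 * math.log10(power / reference)


def root_power_level_db(amplitude, reference):
    """Level in dB of a root-power ("field") quantity such as voltage.

    Squaring the ratio before taking the log is the same as doubling the
    log, hence the factor of 20.
    """
    return 20.0 * math.log10(amplitude / reference)


if __name__ == "__main__":
    # 1 W expressed relative to 1 mW (the dBm convention).
    print(power_level_db(1.0, 0.001))
    # 10 V expressed relative to 1 V (the dBV convention).
    print(root_power_level_db(10.0, 1.0))
```

In both functions the argument of the logarithm is a ratio to a stated reference, which is why the suffix (m, V) matters for interpreting the number.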
Hi Carl,
Yes, these issues aren't trivial, but sometimes I feel like we are dealing
more here with Zeno's Achilles and Tortoise paradox rather than anything
else. Alternatively, are we flogging a dead something, and simply not
reaching consensus on what it should be called? :-)
Do we at least agree that qualifying log-transformed values (must be "pure
numbers"!)-- as "decibels", "dB" or "dBV", or "pH" (certainly in common
understanding at least a scale if not a reference scale?!)-- with these
standard "unit-like suffixes", can enrich our interpretation of log-based,
dimensionless, "pure numbers"?
(I'm not going to mention again the thought experiment about how an inverse
transform on log-transformed data could enable us to "regain" original
units from allegedly "pure numbers", but only if we somehow preserve the
information about those original units)
By the way, I subsequently discussed this issue with my brother-in-law who
is a math professor, and he suggested, after paragraphs about derivatives
and various scenarios involving transcendental functions, that this was
more a "kind of a philosophical problem" rather than a call for
mathematical purity. Admittedly, he is a low dimensional topologist, so
this area is not his expertise-- but he has taught advanced calculus for 30
years. He suggested the view that "units" are useful by convention, and
that we might consider adopting some (notational) convention ourselves, and
explaining it well. Which is more or less along the line of what I've been
advocating- rather than LOSING invaluable information about those raw
values being quantified in {meters, Kilograms, Counts, etc} because one can
only take their logarithms after their units are dimensionally cancelled
out and "lost"(?) to yield a dimensionless number. This approach does
call for care from the "data re-user" in determining how those values can be
algebraically combined with other measurements. But there are also some
common, straightforward use cases (e.g. again, the possibility of
reacquiring units from an inverse transform on a log value).
So even if we agree that these buggers might have "log-scale units", I
think in many cases it will also be useful to know the original units on
the measured value that was log-transformed. I've suggested some ways to
do that -- e.g. Log[Kg] seems to be a standard syntax for communicating
relevant information in a graphical axis label that 1) the values are
log-transformed and 2) the original measurements were taken in Kg. I think
this is quite a common and highly interpretable way of representing the
nature of the data values for this particular use case.
Well, sometimes it is useful to repeat arguments in different ways to
hopefully add clarity or come closer to consensus. At this point, though,
I'm not sure what or who will determine our way forward, although Matt,
you, and Margaret have certainly done the lion's share of work on the EML
revision.
thanks!
Mark
|
Summary from @mpsaloha via email: It's the 'somehow' that we want to explain in the EML documentation.
|
Mark mentioned this thread to me, so here are my 2 cents: I do not agree that a log transform of a number removes either its associated unit or its dimension. If the number is a number of something, the log of this number is still of something.
It is important since you can invert the transformation and get the original number (of something or not) back. So the unit is still the same after a log transform, but we need to find a way to save the information that the stored values in the data file are in a log scale. |
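One way to picture "saving the information" is a small record that stores the transformed number together with the log base and the original unit. This Python sketch is only an illustration; `LogValue` and its field names are hypothetical and do not correspond to any EML element.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LogValue:
    """A log-transformed value plus what is needed to invert it."""
    value: float        # the stored pure number
    base: float         # logarithm base, e.g. 10.0 or math.e
    original_unit: str  # unit of the measurement before transformation

    @classmethod
    def from_measurement(cls, magnitude, unit, base=10.0):
        """Log-transform a positive magnitude recorded in the given unit."""
        if magnitude <= 0:
            raise ValueError("log transform requires a positive magnitude")
        return cls(math.log(magnitude, base), base, unit)

    def to_measurement(self):
        """Invert the transform, returning (magnitude, original unit)."""
        return self.base ** self.value, self.original_unit


if __name__ == "__main__":
    wing = LogValue.from_measurement(2.5, "cm")
    print(wing.value, wing.to_measurement())
```

The stored `value` carries no unit of its own; the original unit travels alongside it as metadata, which is compatible with either reading of "dimensionless" in this thread.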
@brunj7 Your "equation" commits the fundamental mistakes that are outlined in the Matta et al. paper (https://pubs.acs.org/doi/pdf/10.1021/ed1000476) that I linked to in my comment above. I suggest that a deep read and understanding of that paper is required before we can make headway on this issue. I propose that we remove this issue from the EML 2.2 release given that we have not reached consensus in the last year and a half on the issue. I will bump this issue to the 3.0.0 milestone unless others object and can show a mechanism for consensus to be very quickly reached. |
related to #323 |
This came up on Slack today, what do we fill in when an attribute is a log-transform of another? The consensus was to create a custom unit called log{unit}, e.g., meter -> logmeter, that is dimensionless. I couldn't find any guidance on this in the docs so I thought we might want to add a note or two so at least a Ctrl+F for "log" or "transform" would pop something up for the curious user.