
[Discussion] Use cases for the FractionState #49

Closed
lipchev opened this issue Apr 24, 2024 · 15 comments

@lipchev
Contributor

lipchev commented Apr 24, 2024

I'm curious about the use cases you have for the FractionState and whether you intend to extend or reduce the support for Improper (i.e. non-reduced/non-normalized) fractions moving forward.

Here are some observations we can make:

  • there is currently no Improper option in the FractionState
  • if no support for an explicitly Improper state is required, the field could be replaced by the equivalent private readonly bool _hasBeenNormalized (which, if necessary, could still be used to return the original State property)
  • most operations with fractions such as Multiply and Divide currently result in a Fraction with an IsNormalized state (regardless of the input state(s))
  • an app that only uses Normalized fractions can be thought of as using an upgraded version of the double type, whereas one that operates with Improper fractions is (IMO) similar to operating on an upgraded version of the decimal type
  • Operations with Normalized fractions are generally faster than the equivalent operations on Improper fractions
  • it is possible to represent the two use cases (Fraction, NormalizedFraction) using separate types having an implicit conversion between one another (without the need for any _state fields)

Here is an example of what I mean by having decimal-like operators:

Console.WriteLine(0.10m * 0.10m); // Outputs: "0.0100" which is effectively "100/10000" (as in 10/100 * 10/100)
Console.WriteLine(0.100m / 0.10m); // Outputs: "1.0" which is effectively "10/10" (as in 100/1000 / 10/100)
@danm-de
Owner

danm-de commented Apr 28, 2024

The enum was originally intended (as a bit field) to signal states such as IsNormalized, NaN and Infinity.

  • Bit0 == 1 -> Normalized,
  • Bit0 == 0 -> Improper.

Since I didn't see any use case for NaN & Infinity at the time, it stayed at just IsNormalized. Actually, I'm still annoyed today that 0d / 0d == double.NaN instead of throwing a DivideByZeroException. This has led to many incorrect program states in the past (fail early, fail fast) - but that's another topic. However, if I had implemented it, these states would have been queryable with simple (and high-performance) bit checks (_state.HasFlag(FractionState.NaN)).
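The bit-field idea can be sketched like this (a hypothetical Python sketch; only the IsNormalized flag ever existed in the real library, the NaN/Infinity flags shown here are illustrative):

```python
from enum import IntFlag

# Hypothetical sketch of the originally intended bit field; in the real
# library only the IsNormalized flag was ever implemented.
class FractionState(IntFlag):
    Unknown = 0       # Bit0 == 0 -> improper (non-normalized)
    IsNormalized = 1  # Bit0 == 1 -> normalized
    NaN = 2           # never implemented, shown for illustration
    Infinity = 4      # never implemented, shown for illustration

state = FractionState.IsNormalized | FractionState.NaN

# The equivalent of _state.HasFlag(FractionState.NaN) as a plain bit check:
print(bool(state & FractionState.NaN))       # True
print(bool(state & FractionState.Infinity))  # False
```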

I don't see any advantage in introducing two data types (one for normalized, one for non-normalized fractions). In fact, by default, users automatically generate normalized fractions. All subsequent operations on these fractions are automatically normalized (if not, it is a bug 🥲).
The user must explicitly call the special constructor to create a non-normalized fraction. Anyone who does this has thought it through carefully beforehand (a use case would be representation in a UI, i.e. the ToString() method).

The Fraction data type is a struct - and therefore should be readonly. Every operation (regardless of whether addition, subtraction, multiplication, division, etc.) creates a new instance of this data type - and allocates memory accordingly. Therefore, in my opinion, it is completely legitimate for the results to be automatically reduced to the smallest possible numbers (--> normalized fraction). There is no definition of how a mathematical operation has to be performed internally.

The only exceptions to this rule are negation and the reciprocal. The expectation as a user would simply have been to multiply by -1 or to swap the numerator and denominator. I realize that this doesn't seem very consistent.

@danm-de
Owner

danm-de commented Apr 29, 2024

I'll close this issue - further discussion can continue in #26

@danm-de danm-de closed this as completed Apr 29, 2024
@lipchev
Contributor Author

lipchev commented May 5, 2024

* Operations with `Normalized` fractions are generally faster than the equivalent operations on `Improper` fractions

Upon further reflection, I realized that this may not be always the case. Consider this scenario:

  1. An analytical balance produces its measurements with the same precision (let's say 5 digits).
  2. We want to Average() multiple measurements

Normalized fractions would each have their own denominator - adding them together would require calculating the GCD and doing the extra operations on the four BigIntegers for each consecutive addition - with a good chance of ending up with the same Numerator/Denominator in the end.
For an "improper" (or what in this case I would probably call a "decimal") fraction we are always adding fractions with the same denominator, which would result in a pretty fast Sum() (on top of the faster construction phase).
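To illustrate the two routes (a Python sketch, with fractions.Fraction standing in for the always-normalized case and plain integer accumulation for the fixed-denominator case):

```python
import random
from fractions import Fraction

# Sketch of the scenario above: a balance reporting readings with a fixed
# 5-digit precision, i.e. every reading is raw_i / 100000.
random.seed(1)
readings = [random.randrange(1, 10**6) for _ in range(1000)]

# Normalized route: each Fraction(raw, 100000) reduces to its own
# denominator, so every addition may trigger fresh GCD work.
avg_normalized = sum(Fraction(r, 100_000) for r in readings) / len(readings)

# "Decimal-like" route: the common denominator stays fixed, so summing is
# plain integer addition, with a single reduction at the very end.
avg_fixed_denominator = Fraction(sum(readings), 100_000 * len(readings))

print(avg_normalized == avg_fixed_denominator)  # True: same value either way
```

Both routes produce the same value; they differ only in how much GCD work happens along the way.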

@danm-de
Owner

danm-de commented May 6, 2024

I would say it depends very much on the calculations that are done.

I can only speak from our experiences here: In general, the product performed better when we kept the numerators and denominators as small as possible. We have had good experiences with normalized fractions, particularly in serialization scenarios (saving to a DB) and conversion to decimal (for display in the user interface).

This of course applies to version <= 6.x.x.

@lipchev
Contributor Author

lipchev commented May 6, 2024

Yes, the multiplications are definitely going to be the least performant operations in the case of the non-normalized fractions (even the divisions are probably going to be faster). However, if we follow the division rule as well, then that would actually bring the size of the BigIntegers down:
`x * 10 / 10` would be equal to `x` exactly, and would probably perform about the same

@lipchev
Contributor Author

lipchev commented May 12, 2024

I don't see any advantage in introducing two data types (one for normalized, one for non-normalized fractions). In fact, by default, users automatically generate normalized fractions. All subsequent operations on these fractions are automatically normalized (if not, it is a bug 🥲). The user must explicitly call the special constructor to create a non-normalized fraction. Anyone who does this has thought it through carefully beforehand (a use case would be representation in a UI, i.e. the ToString() method).

Before I move on I just wanted to highlight this exception to the rule (works for both a+0 and 0+a):

  (Fraction.Zero + new Fraction(2, 2, false)).State == FractionState.Unknown // evaluates to true

As for the performance benefit - I did a POC (without splitting the types) and the benchmarks indicate that as long as there is a rule in place that constrains the unlimited expansion of the BigIntegers, the results are much better without the normalization.
I'll post a more detailed explanation later, but here is what I went with (note that this is the inverse of how the decimal does it):
1.00000g * 1000 = 1000.0mg, returning back to 1.00000 when dividing by 1000 (as if the user is switching the balance display from grams to milligrams).

Comparing the results using the BasicMathBenchmarks is a bit of a pain, so I think I'll make a complete benchmark scenario - I had a look at the slides you showcased in #26; I think I can take that as an example (calculating the duration based on a random concentration, person weight and dosage), and then perhaps a little summary (e.g. total duration, total source required). If there is ever a scenario that would benefit from working with normalized fractions, I think the "hourly stuff" has the best chance (still, I strongly doubt even this would be any faster).

@danm-de danm-de reopened this May 12, 2024
@danm-de
Owner

danm-de commented May 12, 2024

Yes, I imagine there are many use cases where constant normalization degrades performance.

We had a specific problem back then: drugs that are dosed in nanograms as a running rate on a syringe pump (e.g. 5 ng/kg/min) were problematic. When the application converted between volume flow rate (ml/h) and active ingredient dosage, the result was often an incredibly large fraction. Calculations with such large numbers were significantly slower (at least ~10 years ago) - and users noticed this as soon as the UI performed further calculations based on them.

But time goes on. I no longer work on the medication module and probably none of my colleagues have given such considerations to performance in recent years - once it's running, you don't question it anymore.

@danm-de
Owner

danm-de commented May 12, 2024

Theoretically, you don't need two data types. When creating a Fraction, the user already specifies whether the fraction should be normalized or not (default is normalize: true). It would be a pretty big breaking change - but we could change the mathematical operations so that they only perform normalization if both operands have State == FractionState.IsNormalized.

Assuming:

  • NF = non-normalized (improper) fraction
  • F = normalized fraction
  • ⊙ = mathematical operation with two operands (+, -, *, /, %)

We could introduce the following (breaking) change:

F ⊙ F = F
NF ⊙ F = NF
F ⊙ NF = NF
NF ⊙ NF = NF

There is a Reduce() function if you want to go back to normalized fractions.
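A minimal sketch of that propagation rule (hypothetical Python with made-up names; the real types are C# structs, and only multiplication is shown):

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Frac:
    num: int
    den: int
    normalized: bool  # stand-in for State == FractionState.IsNormalized

    def __mul__(self, other):
        n, d = self.num * other.num, self.den * other.den
        if self.normalized and other.normalized:  # F ⊙ F = F
            g = math.gcd(n, d)
            return Frac(n // g, d // g, True)
        return Frac(n, d, False)                  # any NF operand -> NF

    def reduce(self):
        """The escape hatch back to a normalized fraction."""
        g = math.gcd(self.num, self.den)
        return Frac(self.num // g, self.den // g, True)

f, nf = Frac(1, 2, True), Frac(2, 4, False)
print((f * f).normalized)   # True:  F ⊙ F stays normalized
print((f * nf).normalized)  # False: F ⊙ NF yields NF
print((nf * f).reduce())    # Frac(num=1, den=4, normalized=True)
```

The same flag check would apply to the other binary operators; Reduce() is the only place where an NF result returns to the normalized world.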

@lipchev
Contributor Author

lipchev commented May 12, 2024

Excellent, that's exactly how I modeled it in the POC.
However, you should note that this could be a bigger breaking change than anticipated - if the original code assumed that value * new Fraction(1234, 1000, false) would return normalized (which, btw, isn't documented anyway), then the following Equals call may be a surprise.
However, if we didn't do it that way, we wouldn't be able to benefit from NF * 2m (as an implicit operation resulting in NF).

Right, those are the exact use cases I'm considering (and trying to optimize for) - in UnitsNet, doing any type of math operation with quantities such as Mass/Volume/MassConcentration etc. often requires an intermediate conversion to a base unit and back (e.g. "1mg + 1ng" would have to go via the (SI) base unit "g") - so for most quantities (except for the Freedom Units) we are constantly doing multiplications and divisions by a factor of 10.
I was previously convinced that "normalized" = "smaller" = "faster" and wouldn't even have exposed the concept of "non-normalized" fractions. But now I'm actually considering the opposite (or at least having it as the default).
"Smaller" = "faster" is in fact only true when we're talking about BigIntegers larger than int.MaxValue - which is pretty close to the limit, so I'm not sure whether this would become a problem; frankly, it's difficult for me to tell without actually having a way to test it out.
I am almost done creating the benchmark scenario - I haven't used any extreme conversions yet, but once I'm done and have replicated the scenario with UnitsNet, that would be much easier to model.

I'll just merge my branches and show you what I got.

@danm-de
Owner

danm-de commented May 12, 2024

Thanks. However, I must apologize in advance. I'm currently busy with other topics both professionally and privately and travel a lot - so I won't always be able to respond promptly.

I assume we'll have to wait a bit until 8.0.0 is released. The above would be a big (but meaningful) breaking change.

@lipchev
Contributor Author

lipchev commented May 12, 2024

Yes, I don't think we'll manage this weekend 😄. I'll also be absent for most of the week, but will try to iron out most of what's left before I leave on Tuesday.
If you don't get the time to review #65 , and I get bored next weekend, I might try out the two-type version, I have a feeling it will make everything fit very nicely together (without breaking everyone's code).

@lipchev
Contributor Author

lipchev commented Jun 2, 2024

Given that #65 is now merged, I think the only thing missing from this feature is a factory method that exposes the private constructor call:

public static Fraction FromTerms(BigInteger numerator, BigInteger denominator) => new Fraction(false, numerator, denominator);

... is what I was thinking, but I'm not sure where the word nonreduced should go (it's currently only in the invisible XML comment 😄).

I've already mentioned the performance improvements that I worked on over the last weekend (I think it was), but I haven't had time to clean up and test it against the slightly-less-optimized-but-much-simpler implementation that could be derived from it. I've also updated the readme.md of the benchmarks, but that's not very important. Anyway, these are not blockers and could be released later (there should be no breaking changes, unless I've messed up somewhere).

As for advertising the trailing-zero and decimal-shifting logic - I'd personally hold off on that; in the benchmarks section I've only mentioned the performance aspects.
I'll only know if the logic makes actual sense once I've had the chance to test it with some actual quantities (which I won't be able to do until the end of next week). The fact that Fraction.Pow(x, 2) is not the same as x * x is worrying me quite a bit...
I wish I could have made it work in the opposite direction (i.e. reduce on division), but the tests just kept failing. Finally, as I was doing the optimized CompareTo/Equals implementation, it occurred to me that I never actually tested this type of approach for the Divide operator - perhaps that's the right way of achieving the correct shift (I do have a branch with the tests inverted, and actually testing it might have proven easier than writing this whole blog post... but yeah, I'm under no-time-for-coding restrictions right now).

Anyway, as long as we don't specify what shifting occurs when (or at least mention it as experimental), I think there are 3 ways this could go:

  1. I manage to make it work with the new idea - we move the reductions from the Multiply to the Divide
  2. I test it out and it turns out the behavior is inconsistent/not useful: we scratch the two reduction operations from the Multiply and let it rip..
  3. Well.. let's call this the happy path (where we start thinking about splitting the types) 😄

@lipchev
Contributor Author

lipchev commented Jun 10, 2024

1. I manage to make it work with the `new idea` - we move the reductions from the `Multiply` to the `Divide`

Tried it, it didn't work.

2. I test it out and it turns out the behavior is inconsistent/not useful: we scratch the two reduction operations from the `Multiply` and _**let it rip..**_

Tried a single method for Multiply / Divide that does some reduction (using the BigInteger.DivRem method instead of the GCD) - it was faster on some of the extreme cases, but slower overall.

I think it's better if we removed the reduction-stuff altogether. I don't like that it's asymmetric, the implementation of ToDecimalWithTrailingZeros is probably quite slow, and overall I'm not convinced that it's very useful / worth the performance penalty.

@lipchev
Contributor Author

lipchev commented Jun 11, 2024

I think it's better if we removed the reduction-stuff altogether. I don't like that it's asymmetric, the implementation of ToDecimalWithTrailingZeros is probably quite slow, and overall I'm not convinced that it's very useful / worth the performance penalty.

Lol, I can't believe I didn't realize this earlier - of course they should be asymmetric, just not the way we have them set up ATM:

  • 1.234 * 5.678 or (1234 * 5678) / (1000 * 1000) is very unlikely to get reduced in any significant way.
  • 1.234 / 5.678 or (1234 / 5678) * (1000 / 1000) is a whole different story- in which, I'd argue, it only makes sense to reduce the denominators

Of course, there are the benchmarks testing the multiplication of numbers such as 1.234e-10 * 5.678e+24, but I think these are only effective some of the time (the "hourly stuff" probably goes into that category as well).
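The claimed asymmetry is easy to sanity-check (a quick Python sketch, with math.gcd standing in for the BigInteger GCD):

```python
from math import gcd

# Multiplication: 1.234 * 5.678 = (1234 * 5678) / (1000 * 1000)
print(gcd(1234 * 5678, 1000 * 1000))  # 4 -> almost nothing cancels

# Division: 1.234 / 5.678 = (1234 * 1000) / (1000 * 5678); the two
# power-of-ten scale factors end up on opposite sides and cancel entirely.
print(gcd(1234 * 1000, 1000 * 5678))  # 2000 -> the whole factor of 1000 (times 2)
```

So for decimal-style operands, division is where a cheap reduction pays off, while after multiplication there is usually nothing worth reducing.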

@danm-de
Owner

danm-de commented Jul 1, 2024

I think this discussion is resolved with pull request #72. I'll close the thread.

@danm-de danm-de closed this as completed Jul 1, 2024