Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of "@@" syntax (was: Use of float numbers in Amount) #194

Open
blais opened this issue Sep 16, 2017 · 27 comments
Open

Use of "@@" syntax (was: Use of float numbers in Amount) #194

blais opened this issue Sep 16, 2017 · 27 comments
Labels
BUG This is a bug, not a feature request; something isn't working. core-ops Component: Core data structures and stream ops P2 Low priority, addressing would result in useful improvements

Comments

@blais
Copy link
Member

blais commented Sep 16, 2017

Original report by Zhuoyun Wei (Bitbucket: wzyboy, GitHub: wzyboy).


Hi,

I have always been bugged by the fact that in Fava, in almost all
transactions involving currency conversions (that is, every time I pay
with international credit cards), very long fractional
numbers are displayed in postings.

I always thought it's something about DisplayContext and did not dig
into the reasons. Today I am playing with BQL and finally it occurs to
me that float numbers, instead of Decimals, are used somewhere
internally in Beancount.

Observe the float round-off errors:

beancount> select date, narration, position, weight where account ~ "Expenses:" and narration ~ "Amazon web services";
   date         narration      position                weight
---------- ------------------- --------- ----------------------------------
[...]
2016-07-04 Amazon web services 12.06 USD  80.53000000000000000000000001 CNY
[...]
2017-05-05 Amazon web services 13.04 USD  90.13999999999999999999999999 CNY

I can see that in Beancount's docs, Martin always uses "@" syntax to
record currency conversions, like "10 USD @ 1.3 CAD". I, however,
always uses "@@" syntax to record conversions, like "10 USD @@ 65 CNY".
Because on my bank statements there are no currency exchange rates, and
only the numbers in foreign currency and local currency are provided.

Here is how the two transactions in the bean-query examples above look
like in the Beancount input files:

2016-07-04 * "Amazon web services"
  Liabilities:CMB:CreditCards                 -80.53 CNY
  Expenses:Communications:Internet            +12.06 USD @@ 80.53 CNY

2017-05-05 * "Amazon web services"
  Liabilities:CMB:CreditCards                 -90.14 CNY
  Expenses:Communications:Internet            +13.04 USD @@ 90.14 CNY

It seems that Beancoun did not use these numbers directly. Instead, it
uses the quotient of the two numbers to calculate the weight. In this
process, float round-off errors occur.

@blais
Copy link
Member Author

blais commented Sep 16, 2017

Original comment by Zhuoyun Wei (Bitbucket: wzyboy, GitHub: wzyboy).


Hmm, sorry for my misunderstanding. This may not be relevant to float round-off errors:

>>> 90.14/13.04
6.912576687116565
>>> from beancount.core.number import D
>>> D('90.14')/D('13.04')
Decimal('6.912576687116564417177914110')
>>> D('90.14')/D('13.04')*D('13.04')
Decimal('90.13999999999999999999999999')

@blais
Copy link
Member Author

blais commented Sep 16, 2017

Original comment by Zhuoyun Wei (Bitbucket: wzyboy, GitHub: wzyboy).


Maybe an optional attribute "weight" could be attached to a Posting? If @@ syntax is used, save the number after @@ in "weight" attribute. And display the saved number directly (instead of calculate it with the price) if converted currency is needed.

@blais
Copy link
Member Author

blais commented Sep 16, 2017

Original comment by Zhuoyun Wei (Bitbucket: wzyboy, GitHub: wzyboy).


It seems that if '@@' is used, the number after is discarded after the price is calculated:

https://bitbucket.org/blais/beancount/src/621cec5ed38bcd128a3502a3b5c367f283deffe2/beancount/parser/grammar.py?at=default&fileviewer=file-view-default#grammar.py-806:816

@blais
Copy link
Member Author

blais commented Sep 16, 2017

Original comment by Martin Blais (Bitbucket: blais, GitHub: blais).


It's not discarded, but rather, the per-unit price is stored, so there is a division involved. Instead of keeping two prices (per-unit, total) and to duplicate all the calculations a while ago I decided to keep only the per-unit price.

Here:
https://bitbucket.org/blais/beancount/src/621cec5ed38bcd128a3502a3b5c367f283deffe2/beancount/parser/grammar.py?at=default&fileviewer=file-view-default#grammar.py-806

What would you suggest an alternative should be?

I think perhaps if the display context handled price display better, maybe that would resolve the concern that had you look into this in the first place. IIRC it wasn't a simple issue though, has multiple ramifications.

@blais
Copy link
Member Author

blais commented Sep 17, 2017

Original comment by Zhuoyun Wei (Bitbucket: wzyboy, GitHub: wzyboy).


Sorry that was a typo. "the number after is discarded" -> "the number after @@ is discaarded". No matter which syntax (@ or @@) the input file uses, Beancount always keeps the per-unit price, and discards the total unit (if one is provided).

My suggestion is to keep both prices (per-unit, total). We add an attribute "total_price" / "weight" / "weighted" (I'm bad at naming things. Your choice.) to the Posting object. This attribute is of type "Amount" or None. If the user uses "@@" to provide the total price, use that number as "total_price", otherwise "total_price" is None. The calucation of "price" does not change.

When rendering the Posting, if "total_price" is not None, rendering the Posting with the "@@" syntax. Currently "@" syntax is always used even if user uses "@@" syntax in input files.

This way, if user uses "@@" syntax, they would not see very long fractional numbers, as "total_price" is rendered as-is, instead of calculated "price", which may be a very long fractional number.

I use both "@" and "@@" syntaxes. I use "@" for trading activities because my broker only tells me the per-unit price. I use "@@" for international creidt card transactions, because my bank only tells the total price. I think both syntaxes have different meanings and use cases. One syntax should not be converted to the other syntax and "forget about the detail of input syntax" (grammar.py, line 806).

Sorry but I am not sure if I made my ideas clear in English. I always feel bad when explaining complicated ideas to native English speaks. ;-(

@blais
Copy link
Member Author

blais commented Sep 17, 2017

Original comment by Martin Blais (Bitbucket: blais, GitHub: blais).


I understand your description. However, I don't find the solution proposed simple enough. It introduces a lot of complexity, and if lots get merged, it would have to handle that too (probably devolving to per-unit prices). After a lot has been input, the input syntax is irrelevant. Having to handle both total and per-unit costs would be a real pain when making transformations. And the lot can get transformed over time.

A more compelling compromise would be to use fractions to represent the cost number, which would result in an accurate recreation of the original amount provided by the user.

But... all of this just to avoid a display quirk? I think there are better ways to fix the display issue, rounding prices at a low enough precision should fix most of those, and keeps the representation simple.

@blais
Copy link
Member Author

blais commented Sep 17, 2017

Original comment by Zhuoyun Wei (Bitbucket: wzyboy, GitHub: wzyboy).


Hmm, it's not just for displaying issues. I mean, a user inputs "@@" and invoke "bean-report print", they see "@" instead of "@@". Something is lost in translation text -> Python objects -> text.

But I agree with you, if my proposal is too complicated to implement in code, it may not worth it...

So, does Beancount has an option to limit the length / precision of price? Having a price that is 20+ digits long is not meaningful for everyday credit card transactions. Limiting it to 4 digits after decimal point would be sufficient for most currencies in the real world. This would solve the display issues in both "bean-report print" and Fava.

@blais
Copy link
Member Author

blais commented Sep 17, 2017

Original comment by Martin Blais (Bitbucket: blais, GitHub: blais).


DisplayContext is supposed to handle it, but I think I never completed the conversion of all codes to use the price consistently when rendering.

@blais
Copy link
Member Author

blais commented Sep 17, 2017

Original comment by Zhuoyun Wei (Bitbucket: wzyboy, GitHub: wzyboy).


Fraction seems nice anyway:

>>> from beancount.core.number import D
>>> from fractions import Fraction as F
>>> D('90.14')/D('13.04')*D('13.04')
Decimal('90.13999999999999999999999999')
>>> F(D('90.14'))/F(D('13.04'))*F(D('13.04'))
Fraction(4507, 50)
>>> 4507/50
90.14
>>> D(str(float(F(D('90.14'))/F(D('13.04'))*F(D('13.04')))))
Decimal('90.14')

@blais
Copy link
Member Author

blais commented Sep 17, 2017

Original comment by Martin Blais (Bitbucket: blais, GitHub: blais).


I hadn't realized it was built-in.
Thx

@blais
Copy link
Member Author

blais commented Sep 17, 2017

Original comment by Jakob Schnitzer (Bitbucket: yagebu, GitHub: yagebu).


This kind-of looks like a duplicate of #109 to me, where I suggested the following possible solution:

It would be nice if beancount would only calculate the price number of digits required to make the transaction balance under the known tolerances.

@blais
Copy link
Member Author

blais commented Sep 17, 2017

Original comment by Zhuoyun Wei (Bitbucket: wzyboy, GitHub: wzyboy).


So there are four possible solutions for this issue:

  1. Store total price (the number after @@) if @@ is used in input file, and render total price as-is as output. (That is, don't render per-unit price at all if user inputs total price);
  2. Use fraction to store per-unit price so that precision is not lost when we calculate per-unit price from total price;
  3. Convert all rendering code to use a proper DisplayContext that limits decimal digits to a sane value, for example, 4 for most currencies in the real world;
  4. When caculate the per-unit price from total price, only preserve digits required to balance the transaction, and discard the additional digits.

As Martin says, the 1st idea may cause issues when transforming positions. So personally I am a fan of the 2rd idea. We could subclass fractions.Fraction, and override its __mul__(), __rmul__(), etc, to allow arithmetic operations with Decimals:

>>> from decimal import Decimal
>>> from fractions import Fraction
>>> from beancount.core.number import D

>>> class F(Fraction):
  2     def __mul__(self, other):
  3         if not isinstance(other, Decimal):
  4             return super().__mul__(other)
  5         else:
  6             return super().__mul__(Fraction(other))

>>> F('1.1') * D('2')
Fraction(11, 5)

The customized subclass could be used to store the value of calculated per-unit price (from total price).

Jakob's idea (4th) is also good. I am also annoyed at the 20+ digits of price in Fava.

@blais
Copy link
Member Author

blais commented Sep 17, 2017

Original comment by Martin Blais (Bitbucket: blais, GitHub: blais).


(4) is difficult, I remember looking into it and running in some complications (it's been a while, don't remember any of the details now)

@blais
Copy link
Member Author

blais commented Feb 7, 2018

Original comment by droogmic (Bitbucket: Michael Droogleever, GitHub: droogmic).


@wzyboy Do you know if any progress been made towards any of the proposed solutions. As seen in #207, this is a blocking issue to allow importers to add @@ prices to postings directly...

I do not see why option 1 would be an issue besides a lot of code needing changes, where do prices affect positions?

@blais
Copy link
Member Author

blais commented Feb 8, 2018

Original comment by Zhuoyun Wei (Bitbucket: wzyboy, GitHub: wzyboy).


@droogmic Hi, I do not think there is any progress being made on this issue, as the BDFL of this project, Martin , is currently busying with project #kid (congrats!)

As for issue #207, am I understanding correctly that you want a perfect "Beancount text file -> Python objects -> Beancount text file" conversion? I remebered once Martin said in ML that bean-report never meant to be an accurate "load and dump" tool. Some information is discarded during the convertions, '@@' syntax being one of them.

@blais
Copy link
Member Author

blais commented Feb 8, 2018

Original comment by droogmic (Bitbucket: Michael Droogleever, GitHub: droogmic).


@wzyboy Yes, an @@ round-trip would be preferable. Particularly because utils that generate pythonic entries to be printed have no way of printing the @@. I had a go at implementing your option 1, because I think that is the best way of doing it. It seems to have worked, passes all the existing unit tests, and was easier to implement than I thought it would be. See: pull request #42. I'll just wait for someone to point out why what I did won't work :P

Congrats Martin! this project must have consumed so much time which you probably no longer have :P, good luck :)

@blais
Copy link
Member Author

blais commented Feb 8, 2018

Original comment by Zhuoyun Wei (Bitbucket: wzyboy, GitHub: wzyboy).


@droogmic Hmm.. I'm no Python expert but I took a quick look at your commits. IMHO the change to data.Posting namedtuple is a risky move. A lot of code out there constructs Posting instance by positional arguments, and all those code would break...

@blais
Copy link
Member Author

blais commented Feb 8, 2018

Original comment by droogmic (Bitbucket: Michael Droogleever, GitHub: droogmic).


@wzyboy
I did a grep of all "posting("s to find all uses of it in this repo, to add the new arg as needed. I reran the unit tests and I don't think any new ones failed because of my changes. If something does break, it will be obvious why, and if code breaks because it didn't have a unit test, then at least this will encourage the creation of one.

I considered maybe overloading price with a different type, e.g. having TotalPrice = Amount, and setting price to an instance of TotalPrice or an instance of Amount as needed. Then you just need to check types when using it, but it did not seem like the easiest and cleanest way to do so.

It is a breaking change, which is never a good thing, e.g. if it breaks existing code outside this repo, but it seems to work with all the local code and achieves its goal. Ideally one would replace the namedtuple with a class that then raises a helpful error when not instantiating it with enough args, but I don't think this is a big enough project to necessitate that...

@blais
Copy link
Member Author

blais commented Feb 8, 2018

Original comment by Zhuoyun Wei (Bitbucket: wzyboy, GitHub: wzyboy).


@droogmic I was thinking about creating a new type as well, but you are right, doing type checks every time is a headache of its own.

I must confess, that I instantiated all my Posting instances by postitional arguments instead of keyword arguments, in my own importers and side projects. (A quick grep revelas at least 5) And I did not write any unit tests for them... Maybe I should change them to be instantiaed by keyword arguments, to be future-proof.

@blais
Copy link
Member Author

blais commented Feb 17, 2018

Original comment by Martin Blais (Bitbucket: blais, GitHub: blais).


The input and processed representation of the stream of transactions is distinct - by design. The idea of a perfect round-trip is already limited to a subset of the input language. Zhuoyun's comment on this is correct. Adding support for @@ roundtrip isn't really going to resolve that.

The extra complexity of keeping the total price just for display isn't worth it, maybe finding another solution - a smarter way to round on display? - would solve the display issue. I really do want to keep the schema as simple and unambiguous as it can be, and introducing an alternate field for the units means that all code operating on those numbers would have to handle both cases - that's just not going to be simple and it would break a lot of code.

Here's another idea to tinker with: perhaps storing the total price as an OPTIONAL metadata field that can be used by the rendering code might work, but it should be done in such a way that the metadata containing the total price is always optional, and asserted to be close to the per-unit number?

@blais
Copy link
Member Author

blais commented Feb 17, 2018

Original comment by Zhuoyun Wei (Bitbucket: wzyboy, GitHub: wzyboy).


Here's another idea to tinker with: perhaps storing the total price as an OPTIONAL metadata field that can be used by the rendering code might work, but it should be done in such a way that the metadata containing the total price is always optional, and asserted to be close to the per-unit number?

Storing information in metadata is a good idea IMHO. If I recall correctly, currently very few "official" code utilizes metadata field, "commodity" being one of them. Maybe we can actually take more advantage of the metadata field for storing optional data. I mean, they are optional, after all...

@blais
Copy link
Member Author

blais commented Feb 22, 2018

Original comment by droogmic (Bitbucket: Michael Droogleever, GitHub: droogmic).


The input and processed representation of the stream of transactions is distinct - by design. The idea of a perfect round-trip is already limited to a subset of the input language. Zhuoyun's comment on this is correct. Adding support for @@ roundtrip isn't really going to resolve that.

I recognise that round-trips are not sought, and this was not my goal, merely a convenient side-effect of my goal. Can I check that text -> python -> text is imperfect, but python -> text -> python is perfect? Is there a reference list of the designed input-processed distinctions?

Here's another idea to tinker with: perhaps storing the total price as an OPTIONAL metadata field that can be used by the rendering code might work, but it should be done in such a way that the metadata containing the total price is always optional, and asserted to be close to the per-unit number?

Maybe I am misunderstanding, but this suggests having duplicate data, code that needs to check the validity of some arbitrary metadata field, and the input spec would have 2 distinct ways of doing the same thing (@@ or @+optional meta). This seems like complexity to me..., and unless you remove @@ altogether, still does not let me use the beancount printer to print @@ postings.

The extra complexity of keeping the total price just for display isn't worth it, maybe finding another solution - a smarter way to round on display? - would solve the display issue. I really do want to keep the schema as simple and unambiguous as it can be, and introducing an alternate field for the units means that all code operating on those numbers would have to handle both cases - that's just not going to be simple and it would break a lot of code.

and from pull request #42:

This is a known design quirk and is intended to remain as is. One of the design objectives for the schema is to keep it as simple as possible. The extra complexity it introduces isn't worth it.

"known design quirk and is intended to remain as is", so a design decision? Fair enough, but given it worked, an alternative is not around, and I have a use for it, I'll probably just see how long I can maintain a functional fork on my end for... I don't like my solution either, at the very least a price-or-totalprice type overload would have been more faithful to the original design. I think I am trying to solve a slighty different problem to the one @wzyboy is in this issue though, as I just want to get @@ into my .beancount file for now, leaving the display issues to be solved later...

I honestly didn't expect adding totalprice the way I did to be so easy, I assume you still found what I had done too complex? The complexity of this change is probably partly a result of the functional-immutable design of the data structures, one of the few downsides I guess.

The reason I currently would prefer not having my ingested transactions getting converted to postings with a single @ is that my bank provides the totalprice in the foreign currency, and the debit or credit in my local currency as expected. Annoyingly though, it also gives a daily exchange rate which will be inconsistent with both the calculated exchange rate, and any pretty truncated decimal version that could be put in the input file. (I am not sure why this is, either inter-day fluctuating rates, or they average out multiple transactions to get as close to the real rate as possible.) There maybe is a "proper" way this should be displayed in beancount, but imo it is with @@ for the totalprice, and a meta field for the day exchange rate.

As a more contrived example, is it possible to have a situation where price per unit is not really applicable, e.g. 10 BEAN @@ 10 USD, but only because it was a bulk deal? How about using beancount to record working hours and incomes, could you have something like 6 WRKHR @@ 120 USD, but where 20USD/hr is not correct? Struggling to wrap my head around this at the moment.

@blais
Copy link
Member Author

blais commented Feb 23, 2018

Original comment by Martin Blais (Bitbucket: blais, GitHub: blais).


Re. two methods: There's no added complexity; the metadata field would only be used by the renderer, nowhere else. Validation would just be there to avoid mistakes. You'd get what you want, without affecting anything else. (This is a bit of a concession BTW, there's very little usage of metadata in the Beancount code itself, I'm afraid of its creeping uses.)

Re. "quirk": It's a design choice. Think about it: if there were two fields, ALL the operations on price (and cost) would have to be using custom operators in order to accommodate for the possible presence of either field. This would create a tremendous amount of complexity! The compromise here is that you don't have the original price, but we have a very simple schema that is easy to understand and operate on, no overloads, no custom operators, etc. Just a single number.

Re. immutable data structures: It's unfair to discount all their benefits. One improvement is to provide constructors with keyword args (I'll do it at some point) so they're easier to create and more robust to code changes.

Re. exchange rate being wrong: That's incorrect--your bank uses a rounded rate and the very small fraction is either lost or given to you. Beancount's purpose has always been to record what /actually/ happens in the transaction (otherwise I'd use rationals), and the correct data to enter is actually the rounded exchange rate with an error tolerance accepted in the transaction to allow it.

Re. prices for bulk things: You can still calculate a price per unit. The fact that you got a particular price or not is simply an artifact the circumstances of your transaction and need not be recorded in the price representation--use metadata if you need to somehow record that (I don't know of anyone doing this).

@blais
Copy link
Member Author

blais commented Feb 25, 2018

Original comment by droogmic (Bitbucket: Michael Droogleever, GitHub: droogmic).


RE: ALL the operations on price (and cost) would have to be using custom operators in order to accommodate for the possible presence of either field -
I just don't think this is that big an issue, and I know as I patched each one manually... It is simpler than the the combination of Cost-CostSpec at least. Could be mostly solved using a class?, see below.

RE: immutable data structures -
Sorry, I don't think I was clear, I agree with the benefits of immutable data structures and their manipulation with pure functions. On top of using kwargs, you could make a further concession by creating an immutable class to build on the namedtuple funtionality, with certain basic functionality built-in. Is total data-function separation important?

RE: exchange rate -
My apologies, I thought I had checked that but was being an idiot, at worst the exchange rate is off by 0.00005 (half precision).

RE: bulk rates -
Yeah, makes sense, thanks

Generally, to me it feels wrong not being able to print @@ as in pull request #207. As this only affects ingest, I can just use my hack to do imports but use the main branch for everything else. Thanks for taking the time reply, hope you find a solution, sadly I won't be looking any further for a fix for the rendering.

@blais blais added P2 Low priority, addressing would result in useful improvements core-ops Component: Core data structures and stream ops BUG This is a bug, not a feature request; something isn't working. labels May 22, 2020
@shaform
Copy link

shaform commented Sep 2, 2020

Re. "quirk": It's a design choice. Think about it: if there were two fields, ALL the operations on price (and cost) would have to be using custom operators in order to accommodate for the possible presence of either field. This would create a tremendous amount of complexity!

Why don't we instead just keep the total cost and discard the unit price in the internal representation? In this case, there would be no need for error tolerance because the rounding errors in the price does not affect the balance anyway. In addition, the OPTIONAL metadata field is not needed either.

In general, I think the price in a transaction is usually just displayed for reference, and the total cost/value is what is actually being recorded in a account. When price * unit != total cost, the account balance is typically calculated based on total cost rather than price * unit in a typical brokerage/bank/credit card account. This is evident in the case of fractional shares for stocks or currency exchange for credit card bills.

@wzyboy
Copy link
Contributor

wzyboy commented Sep 2, 2020

Why don't we instead just keep the total cost and discard the unit price in the internal representation?

That would break compatibility with a lot of existing code. This kind of design choices could only be changed during a major re-design, maybe the proposed Beancount V3 plan.

@shaform
Copy link

shaform commented Sep 2, 2020

That would break compatibility with a lot of existing code.

I think it is possible to avoid breaking compatibility by providing a getter for the original price field. Whenever the price field is accessed, it just computes the price on the fly according to the total cost and unit. In addition, we also provide a setter, and whenever the price field is set, it just computes the total cost by the passed parameter, and only the total cost would be stored. In this way, we still only keep one single value, but other programs that depend on the price field can still work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BUG This is a bug, not a feature request; something isn't working. core-ops Component: Core data structures and stream ops P2 Low priority, addressing would result in useful improvements
Projects
None yet
Development

No branches or pull requests

3 participants