Predict postings with amounts #107

Closed
sullivan-sean opened this issue Apr 15, 2021 · 3 comments

Comments

@sullivan-sean

I'm asking mostly out of curiosity about whether this is possible. I am toying with the idea of using ML to predict the amounts on postings in addition to their accounts. I appreciate all the amazing work you all have done on smart_importer so far; looking through the discussions, I saw past threads about predicting multiple accounts. A common use case for me is splitting a transaction with somebody, usually 50/50, where I would like the output postings to be:

Assets:Cash         -100
Liabilities:Friend    50
Expenses:Food         50

For me, a first-pass solution is to naively split the units evenly among the other postings:

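from beancount.core.amount import Amount
from beancount.core.data import Posting
from smart_importer.predictor import EntryPredictor


class PredictPostings(EntryPredictor):
    """Predicts posting accounts and splits the amount evenly among them."""

    weights = {"narration": 0.8, "payee": 0.5, "date.day": 0.1}

    @property
    def targets(self):
        # One target string per training transaction: its accounts, space-separated.
        return [
            " ".join(posting.account for posting in txn.postings)
            for txn in self.training_data
        ]

    def apply_prediction(self, transaction, prediction):
        # Only complete transactions that have exactly one posting so far.
        if len(transaction.postings) != 1:
            return transaction

        original_posting = transaction.postings[0]
        accounts = [
            account
            for account in prediction.split(" ")
            if account != original_posting.account
        ]
        units = original_posting.units
        # Nothing to split if no other account was predicted or the amount is missing.
        if not accounts or units is None:
            return transaction

        # Divide the Decimal number; Amount itself does not define division.
        new_units = Amount(-units.number / len(accounts), units.currency)
        new_postings = [original_posting] + [
            Posting(account, new_units, None, None, None, None) for account in accounts
        ]

        return transaction._replace(postings=new_postings)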

I do this because, right now, it seems that smart_importer can predict the accounts of the postings but not their amounts.
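One detail the naive even split glosses over is rounding: an amount such as 100 split three ways does not divide cleanly, so the new postings may not sum back to the original. The following is a minimal sketch of one way to handle that, using only the standard library. The helper name `split_evenly` and its `places` parameter are hypothetical and not part of smart_importer or beancount.

```python
from decimal import Decimal, ROUND_DOWN


def split_evenly(total, parts, places=2):
    """Split `total` into `parts` shares that sum exactly to `total`.

    Hypothetical helper for illustration; not a smart_importer API.
    """
    if parts < 1:
        raise ValueError("need at least one share")
    quantum = Decimal(1).scaleb(-places)  # Decimal("0.01") when places=2
    # Round each share toward zero, then hand the leftover to the first share.
    base = (total / parts).quantize(quantum, rounding=ROUND_DOWN)
    shares = [base] * parts
    shares[0] += total - base * parts
    return shares


shares = split_evenly(Decimal("100"), 2)
uneven = split_evenly(Decimal("100"), 3)
```

Giving the remainder to the first share is an arbitrary choice made for this sketch; any rule works as long as the shares still add up to the original amount.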

I'm wondering what the best way to also predict the amounts would be. A possible ML-based solution could be to predict a ratio in addition to an account name for each output posting: the training targets could be something like "Assets:Cash_1.0 Liabilities:Friend_0.5 Expenses:Food_0.5". I've also been looking into scikit-learn's MultiOutputRegressor, because it feels wrong to just attach the amount ratio to the account string.
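To make the ratio idea concrete, here is a rough sketch of how such target strings could be built and parsed again. It assumes ratios are taken relative to the largest absolute amount in the transaction and that the sign is dropped, which is only one possible reading of the example string. The function names `encode_target` and `decode_target` are hypothetical and do not exist in smart_importer.

```python
from decimal import Decimal


def encode_target(postings):
    """Build a target string from (account, amount) pairs.

    Assumption: each ratio is |amount| divided by the largest |amount|.
    """
    largest = max(abs(amount) for _, amount in postings)
    if largest == 0:
        raise ValueError("cannot compute ratios for an all-zero transaction")
    return " ".join(
        f"{account}_{float(abs(amount) / largest)}" for account, amount in postings
    )


def decode_target(target):
    """Parse a target string back into (account, ratio) pairs."""
    pairs = []
    for token in target.split(" "):
        # Split on the last underscore so the account name stays intact.
        account, _, ratio = token.rpartition("_")
        pairs.append((account, Decimal(ratio)))
    return pairs


example = [
    ("Assets:Cash", Decimal("-100")),
    ("Liabilities:Friend", Decimal("50")),
    ("Expenses:Food", Decimal("50")),
]
encoded = encode_target(example)
decoded = decode_target(encoded)
```

A drawback of this encoding is that every distinct ratio creates a new class label, so unusual splits would rarely be predicted; that is part of why a separate regression output may be the cleaner design.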

I'm not sure if anyone else has thought about/tried to tackle this, but if so I would love to hear your thoughts. This might be too complicated to add to the library/not worth pursuing, but I wanted to open a thread to discuss in case others share this use case and want to brainstorm.

@johannesjh
Collaborator

Regarding the shared-expenses use case: I doubt that predicted amounts make sense here. I'll share some considerations regarding your idea and ask some follow-up questions.

  • On a general note: I have personally played around with shared expenses in beancount, but I have never used these features for real-world expense sharing, and I don't think I will in the future. The problem is that I found beancount cumbersome for this: its syntax forces you to take the perspective of one specific person or entity, and it is not easy to view the same data from the perspectives of other stakeholders.
  • Specifically regarding predicted amounts: I would honestly rather enter the shares by hand. This is very little work compared to what else needs to be done in a shared-expenses scenario, which usually involves writing down the expense in the first place, agreeing on who bears which costs, calculating the owed differences, and settling these differences by paying each other. Predicted amounts might make one small subtask among all these activities easier, but the overall use case would still not be supported very well.

Follow-up questions:

  • Do you have a use case where you would really benefit from predicted amounts?
  • How would you deal with the uncertainty that the prediction could be wrong?
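As one illustration of the uncertainty question, a predictor could apply its guess only above some confidence threshold and mark the result for manual review. The sketch below is purely hypothetical: `Txn` is a reduced stand-in for beancount's Transaction, and the `confidence` score and `threshold` parameter are assumptions, since nothing in this thread says smart_importer exposes such a score. It relies only on the beancount convention that the "!" flag marks a transaction as needing attention.

```python
from collections import namedtuple

# Stand-in for beancount's Transaction, reduced to the fields used here.
Txn = namedtuple("Txn", "flag narration postings")


def apply_if_confident(txn, predicted_postings, confidence, threshold=0.9):
    """Apply a prediction only when it is confident enough.

    `confidence` is a hypothetical score between 0 and 1.
    """
    if confidence < threshold:
        # Leave the transaction untouched so the amounts are entered by hand.
        return txn
    # Mark the completed transaction with "!" so it still gets reviewed.
    return txn._replace(flag="!", postings=predicted_postings)


txn = Txn("*", "Dinner", ["Assets:Cash"])
predicted = ["Assets:Cash", "Liabilities:Friend", "Expenses:Food"]
kept = apply_if_confident(txn, predicted, confidence=0.4)
changed = apply_if_confident(txn, predicted, confidence=0.95)
```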

@johannesjh
Collaborator

long time no hear... shall we close this issue?

@johannesjh
Collaborator

(I closed this issue due to inactivity... feel free to re-open, contributions welcome!)
