Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

limit predictions to missing second postings #4

Closed
johannesjh opened this issue Feb 14, 2018 · 7 comments
Closed

limit predictions to missing second postings #4

johannesjh opened this issue Feb 14, 2018 · 7 comments
Assignees

Comments

@johannesjh
Copy link
Collaborator

the predict_postings decorator should only predict missing second postings, as opposed to predicting third and fourth postings etc as well, which does not make sense for any usecase i can think of.

@tarioch
Copy link
Collaborator

tarioch commented Feb 23, 2018

Starting to look at this, right now with a simple implementation by letting it be predicted but not adding it to the result. One thing to note is that this also points out that it never makes sense to have predict + suggestion enabled. If you predict then a second posting will be added and therefore there will be no suggestions (correctly).

So i think either you predict or you suggest but never both.

@johannesjh
Copy link
Collaborator Author

Prediction plus suggestions can be useful in the following scenario:

  • The importer produces predicted postings.
  • The user may find they need to edit the prediction because it was incorrect.
  • The UI (e.g. of fava) can enhance the editing of postings using these suggestions, e.g., by populating a dropdown list with the suggestions.

@johannesjh
Copy link
Collaborator Author

Also, I would argue that suggestions are useful no matter how many postings exist in the imported data, because the suggestions can support editing.

@tarioch
Copy link
Collaborator

tarioch commented Feb 25, 2018

I see that case, to simplify things I would only take the account of the first posting as input for the suggestions, which I think is the current implementation. So I'll just revert the changes I did on the suggestion part.

@tarioch
Copy link
Collaborator

tarioch commented Feb 25, 2018

Another thing which comes to mind. Is there a confidence for the prediction? Because if that one is very low I would rather not have a prediction.

@johannesjh
Copy link
Collaborator Author

I see that case, to simplify things I would only take the account of the first posting as input for the suggestions, which I think is the current implementation.

Yes, and that should be sufficient for most usecases. (I think that most importers will only output one posting per transaction anyway, each with the same account).

Note: The pipeline.fit step does learn from all postings in the training data, hence the conversion from Transaction to TxnPosting in predict_postings.py lines 78ff. But the subsequent pipeline.predict step only uses the first posting's data, hence no use of TxnPosting objects in self.pipeline.predict(transactions) in line 147.

@johannesjh
Copy link
Collaborator Author

Another thing which comes to mind. Is there a confidence for the prediction? Because if that one is very low I would rather not have a prediction.

Good point. The decorator could accept a parameter through which users can set a threshold.

The SVM classifier in scikit-learn can calculate probabilities, compare How to get a classifier's confidence score for a prediction in sklearn? on stackoverflow. I don't know how much this will slow down the pipeline.fit method, but I would not worry too much about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants