limit predictions to missing second postings #4

johannesjh · 2018-02-14T22:28:51Z

The predict_postings decorator should only predict a missing second posting. It should not go on to predict third, fourth, or further postings, which does not make sense for any use case I can think of.
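A minimal sketch of the requested behaviour, assuming simplified stand-in `Transaction` and `Posting` tuples rather than the real beancount types. `add_missing_second_posting` is a hypothetical helper for illustration, not part of the decorator:

```python
from typing import NamedTuple, Optional, Tuple


class Posting(NamedTuple):
    # Stand-in for a beancount posting (assumption: only the fields needed here).
    account: str
    amount: Optional[str] = None


class Transaction(NamedTuple):
    # Stand-in for a beancount transaction.
    narration: str
    postings: Tuple[Posting, ...]


def add_missing_second_posting(txn: Transaction, predicted_account: str) -> Transaction:
    """Add a predicted posting only if the transaction has exactly one posting."""
    if len(txn.postings) != 1:
        # Zero postings, or two and more: leave the transaction untouched.
        return txn
    return txn._replace(postings=txn.postings + (Posting(predicted_account),))


single = Transaction("Coffee", (Posting("Assets:Checking", "-3.50 EUR"),))
complete = Transaction(
    "Rent",
    (Posting("Assets:Checking", "-900 EUR"), Posting("Expenses:Rent", "900 EUR")),
)

print(add_missing_second_posting(single, "Expenses:Food").postings)
print(add_missing_second_posting(complete, "Expenses:Food") is complete)
```

With this rule a transaction never grows beyond two postings through prediction.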

tarioch · 2018-02-23T19:56:11Z

I'm starting to look at this. Right now I have a simple implementation that still lets the posting be predicted but does not add it to the result. One thing to note: this also shows that it never makes sense to have prediction and suggestions enabled together. If you predict, a second posting is added, and therefore there will (correctly) be no suggestions.

So I think you either predict or suggest, but never both.

johannesjh · 2018-02-25T15:50:00Z

Prediction plus suggestions can be useful in the following scenario:

1. The importer produces predicted postings.
2. The user may find they need to edit the prediction because it was incorrect.
3. The UI (e.g. of Fava) can enhance the editing of postings using these suggestions, e.g., by populating a dropdown list with the suggestions.

johannesjh · 2018-02-25T16:02:19Z

Also, I would argue that suggestions are useful no matter how many postings exist in the imported data, because the suggestions can support editing.

tarioch · 2018-02-25T18:16:51Z

I see that case. To simplify things, I would only take the account of the first posting as input for the suggestions, which I think is what the current implementation does. So I'll just revert the changes I made to the suggestion part.

tarioch · 2018-02-25T18:18:24Z

Another thing that comes to mind: is there a confidence score for the prediction? If it is very low, I would rather not have a prediction.

johannesjh · 2018-02-25T19:03:18Z

> I see that case, to simplify things I would only take the account of the first posting as input for the suggestions, which I think is the current implementation.

Yes, and that should be sufficient for most use cases. (I think most importers will only output one posting per transaction anyway, each with the same account.)

Note: The pipeline.fit step does learn from all postings in the training data, hence the conversion from Transaction to TxnPosting in predict_postings.py lines 78ff. But the subsequent pipeline.predict step only uses the first posting's data, hence no use of TxnPosting objects in self.pipeline.predict(transactions) in line 147.
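To illustrate the asymmetry described in the note, here is a rough sketch of how the two inputs could be shaped. It assumes stand-in tuples instead of the real beancount types, and `training_samples` and `prediction_inputs` are hypothetical helper names, not functions from predict_postings.py:

```python
from typing import List, NamedTuple, Tuple


class Posting(NamedTuple):
    # Stand-in for a beancount posting.
    account: str


class Transaction(NamedTuple):
    # Stand-in for a beancount transaction.
    narration: str
    postings: Tuple[Posting, ...]


class TxnPosting(NamedTuple):
    # Stand-in pair of a transaction and one of its postings.
    txn: Transaction
    posting: Posting


def training_samples(transactions: List[Transaction]) -> List[TxnPosting]:
    """Fitting side: one sample per posting, so every posting is learned from."""
    return [TxnPosting(txn, p) for txn in transactions for p in txn.postings]


def prediction_inputs(transactions: List[Transaction]) -> List[Tuple[str, str]]:
    """Prediction side: one row per transaction, using only the first posting."""
    return [(txn.narration, txn.postings[0].account) for txn in transactions]


training = [
    Transaction("Rent", (Posting("Assets:Checking"), Posting("Expenses:Rent"))),
    Transaction(
        "Groceries",
        (
            Posting("Assets:Checking"),
            Posting("Expenses:Food"),
            Posting("Expenses:Household"),
        ),
    ),
]
imported = [Transaction("Coffee", (Posting("Assets:Checking"),))]

print(len(training_samples(training)))
print(prediction_inputs(imported))
```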

johannesjh · 2018-02-25T19:10:28Z

> Another thing which comes to mind. Is there a confidence for the prediction? Because if that one is very low I would rather not have a prediction.

Good point. The decorator could accept a parameter through which users can set a threshold.

The SVM classifier in scikit-learn can calculate probabilities; compare "How to get a classifier's confidence score for a prediction in sklearn?" on Stack Overflow. I don't know how much this will slow down the pipeline.fit method, but I would not worry too much about it.
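The thresholding itself could look roughly like the sketch below. It is pure Python and assumes the per-class probabilities are already available, for example from `predict_proba` of a scikit-learn `SVC(probability=True)`, whose columns follow the order of `classes_`. The function name and the `threshold` parameter are hypothetical, not an existing option of the decorator:

```python
from typing import List, Optional, Sequence


def predictions_above_threshold(
    classes: Sequence[str],
    probabilities: Sequence[Sequence[float]],
    threshold: float,
) -> List[Optional[str]]:
    """Return the most probable class per row, or None if it is below threshold."""
    results: List[Optional[str]] = []
    for row in probabilities:
        # Index of the most probable class in this row.
        best = max(range(len(row)), key=row.__getitem__)
        results.append(classes[best] if row[best] >= threshold else None)
    return results


classes = ["Expenses:Food", "Expenses:Rent"]
probabilities = [
    [0.90, 0.10],  # confident
    [0.55, 0.45],  # too uncertain for a threshold of 0.7
    [0.20, 0.80],  # confident
]
print(predictions_above_threshold(classes, probabilities, threshold=0.7))
```

Where the result is `None`, the transaction would simply keep its single posting and no prediction would be added.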

tarioch mentioned this issue Feb 23, 2018

Only add prediction/suggestion if we have exactly 1 posting #13

Merged

tarioch added the enhancement label Feb 23, 2018

tarioch self-assigned this Feb 23, 2018

tarioch mentioned this issue Feb 27, 2018

Allow to specify confidence score #16

Closed

tarioch closed this as completed in #13 Feb 27, 2018
