
Application in recommender system #230

Closed
Tych0n opened this issue Feb 28, 2019 · 9 comments

Comments

Tych0n commented Feb 28, 2019

Hi, aksnzhy! Thank you for this library. Can you please guide me a bit? I have a dataset with four columns: transaction_count, user, item, item_colour. I want to recommend items to users based on transaction_count. I can use ALS with the transaction_count, user, and item columns, for example with the "implicit" library. But if I want to take item_colour into account, I need to use, for example, FFM. So I create an FFM-formatted file:

transaction_count user_id:value_id:1 item_id:value_id:1 item_colour_id:value_id:1

5 0:0:1 1:3:1 2:5:1
3 0:1:1 1:4:1 2:6:1
8 0:2:1 1:3:1 2:7:1

and train my model. But if I want to recommend the top-5 items (with colours) to a user, I need to create all user:item:colour combinations, score them, sort each user's item:colour variants by predicted probability, and select the best 5. The problem is that the list of all possible combinations explodes at my dimensions (users = 80000, items = 14000, colours = 5) and is impossible to operate on. Is there any hack for implementation?
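For scale, the full cross product the question describes can be checked with quick arithmetic (the dimensions are the ones given in the comment above; the per-row byte estimate is a rough assumption):

```python
# Dimensions from the thread: 80000 users, 14000 items, 5 colours.
users, items, colours = 80_000, 14_000, 5

# Scoring every user x item x colour combination:
rows = users * items * colours
print(rows)  # 5600000000 -- 5.6 billion rows

# Rough size assuming ~25 bytes per libffm-format text row (an assumption):
approx_gb = rows * 25 / 1e9
print(round(approx_gb))  # on the order of 140 GB of text just to score once
```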

aksnzhy (owner) commented Mar 1, 2019

@Tych0n I'm not sure I understand your point correctly. You want to recommend top-5 items to users, but right now xLearn can only do binary classification tasks. One possible solution is to use one-vs-all to do a multi-class classification task, then compare the probabilities calculated by xLearn and select the highest. It will only cost 5x as much as a single binary classification task.


Tych0n commented Mar 1, 2019

Yes, you got it right. I'm using 'task':'reg' with transaction_count as a continuous target, but I can certainly rescale it to a binary one-vs-all form per user. The problem is that to compare the probabilities calculated by xLearn, I need to score 14000*5 variants per user, and do that 80000 times: it ends in creating a dataset with 5.6 billion rows and then scoring it with xLearn. I think maybe I'm missing some point and doing it wrong.


aksnzhy commented Mar 2, 2019

@Tych0n If you do the job the way you describe, you are actually facing a classification problem with 14000*5 labels, which I don't think is a good idea. Instead, you can convert it to a binary task, e.g., given an item, predict the probability that the user likes it. Then you can predict the probabilities of all items for each user.
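As a toy illustration of this binary setup (not xLearn's actual pipeline; random numbers stand in for the model's predicted probabilities), the per-user top-5 selection can be done in user batches so the full score matrix never has to be held in memory at once:

```python
import numpy as np

# Toy sketch: assume we already have a predicted probability for every
# (user, item) pair. rng.random() is a stand-in for the model's predict step.
rng = np.random.default_rng(0)
n_users, n_items, k = 1000, 1400, 5

top5 = np.empty((n_users, k), dtype=np.int64)
batch = 100  # score users in batches instead of one giant matrix
for start in range(0, n_users, batch):
    scores = rng.random((batch, n_items))              # stand-in for predictions
    part = np.argpartition(-scores, k, axis=1)[:, :k]  # top-k item ids, unsorted
    order = np.take_along_axis(scores, part, axis=1).argsort(axis=1)[:, ::-1]
    top5[start:start + batch] = np.take_along_axis(part, order, axis=1)

print(top5.shape)  # (1000, 5): five ranked item ids per user
```

The same batching idea applies when the scores come from a real model: predict one slab of users at a time, keep only each user's top-k, and discard the rest.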


Tych0n commented Mar 4, 2019

@aksnzhy got your point. Correct me, please, if I'm wrong: I convert the task to binary, score all user-item pairs to get a probability for each pair, then take the top-5 items for each user. With this approach, I still face the problem of creating a 5.6-billion-row dataset to score and select from?

@BrianMiner
I have the exact same question: do you have to create all possible permutations of items for each user, score all of them, and order by probability?

@BrianMiner

@Tych0n did you ever figure out a method?


Tych0n commented Apr 16, 2019

@BrianMiner, no. I wasn't able to use this library in production for my problem; there were too many items to select recommendations from. I ended up using implicit's ALS. By the way, I tried LightFM: it showed comparable MAP@k, but still wasn't able to outperform ALS, at least in my setup.

Tych0n closed this as completed Apr 16, 2019
@BrianMiner

This must be a common issue for these types of models, though. There must be some way to pre-filter the candidates. You see FM and FFM used for CTR problems, which can be just as large an issue when you need to score so many ads * placements * versions (copy, colour, etc.) for each user.
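One common two-stage pattern for this pre-filtering (a sketch with hypothetical toy data, not code from this thread) is to let a cheap candidate generator produce a shortlist and have the expensive FFM rank only that shortlist. Here raw popularity stands in for the candidate generator; in practice it could be an ALS model or an approximate-nearest-neighbour index:

```python
from collections import Counter

# Toy transaction log: (user, item) pairs.
transactions = [
    (0, 1), (0, 2), (1, 2), (1, 3), (2, 2), (2, 4), (3, 2),
]

# Stage 1 (cheap): shortlist the 3 most popular items.
popularity = Counter(item for _, item in transactions)
candidates = [item for item, _ in popularity.most_common(3)]
print(candidates)  # [2, 1, 3] -- only these would go to the FFM scorer

# Stage 2 (expensive) would then score len(candidates) rows per user
# instead of the full item catalogue.
```

This turns the per-user scoring cost from O(items * colours) into O(shortlist * colours), at the price of possibly missing items the candidate generator never surfaces.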

@sumitsidana

Hi, you could use negative sampling to convert the problem from one-class classification to binary classification. For every positive (a transaction), sample a non-transaction. This way, you can still use this library, or FM/FFM. Do the sampling only at training time; at prediction time, consider all the items.
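A minimal sketch of that sampling scheme, with hypothetical toy data (the set of positives and the item count are made up for illustration):

```python
import random

random.seed(42)

n_items = 10
positives = {(0, 1), (0, 2), (1, 3)}  # observed (user, item) transactions

# Index which items each user has already interacted with.
seen = {}
for u, i in positives:
    seen.setdefault(u, set()).add(i)

# For every positive row, emit one sampled negative the user has NOT seen.
rows = []
for u, i in positives:
    rows.append((1, u, i))                 # label 1: real transaction
    neg = random.randrange(n_items)
    while neg in seen[u]:                  # resample until the item is unseen
        neg = random.randrange(n_items)
    rows.append((0, u, neg))               # label 0: sampled non-transaction

print(len(rows))  # 6 -- one negative per positive
```

The resulting labelled rows can then be written out in the libffm text format shown earlier in the thread and fed to a binary FM/FFM.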
