
Application in recommender system #230

Closed
Tych0n opened this issue Feb 28, 2019 · 9 comments

Comments

Tych0n commented Feb 28, 2019

Hi, aksnzhy! Thank you for this library. Can you please guide me a bit? I have a dataset with four columns: transaction_count, user, item, item_colour. I want to recommend items to users based on transaction_count. I can use ALS with the transaction_count, user, and item columns, for example with the "implicit" library. But if I want to take item_colour into account, I need to use, for example, FFM. So I create an FFM-formatted file:

transaction_count user_id:value_id:1 item_id:value_id:1 item_colour_id:value_id:1

5 0:0:1 1:3:1 2:5:1
3 0:1:1 1:4:1 2:6:1
8 0:2:1 1:3:1 2:7:1

and train my model. But if I want to recommend the top-5 items (with colours) to a user, I need to create all user:item:colour combinations, score them, sort each user's item:colour variants by predicted probability, and select the best 5. The problem is that the list of all possible combinations explodes at my dimensions (users = 80000, items = 14000, colours = 5) and is impossible to operate on. Is there any hack for implementation?
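For scale, the full cross product the question describes can be checked with quick arithmetic (the dimensions are the ones given in the comment above; the per-row byte estimate is a rough assumption):

```python
# Dimensions from the thread: 80000 users, 14000 items, 5 colours.
users, items, colours = 80_000, 14_000, 5

# Scoring every user x item x colour combination:
rows = users * items * colours
print(rows)  # 5600000000 -- 5.6 billion rows

# Rough size assuming ~25 bytes per libffm-format text row (an assumption):
approx_gb = rows * 25 / 1e9
print(round(approx_gb))  # on the order of 140 GB of text just to score once
```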

aksnzhy (owner) commented Mar 1, 2019

@Tych0n I'm not sure I understand your point correctly. You want to recommend top-5 items to users, but right now xLearn can only do binary classification tasks. One possible solution is to use one-vs-all to do a multi-class classification task, then compare the probabilities calculated by xLearn and select the highest. It will only cost 5x as much as a single binary classification task.


Tych0n commented Mar 1, 2019

Yes, you got it right. I'm using 'task':'reg' with transaction_count as a continuous target, but I can certainly rescale it to a binary one-vs-all form per user. The problem is that to compare the probabilities calculated by xLearn, I need to score 14000*5 variants per user, and do that 80000 times: it ends in creating a dataset with 5.6 billion rows and then scoring it with xLearn. I think maybe I'm missing some point and doing it wrong.


aksnzhy commented Mar 2, 2019

@Tych0n If you do the job the way you describe, you are actually facing a classification problem with 14000*5 labels, which I don't think is a good idea. Instead, you can convert it to a binary task, e.g., given an item, predict the probability that the user likes it. Then you can predict the probabilities of all items for each user.
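As a toy illustration of this binary setup (not xLearn's actual pipeline; random numbers stand in for the model's predicted probabilities), the per-user top-5 selection can be done in user batches so the full score matrix never has to be held in memory at once:

```python
import numpy as np

# Toy sketch: assume we already have a predicted probability for every
# (user, item) pair. rng.random() is a stand-in for the model's predict step.
rng = np.random.default_rng(0)
n_users, n_items, k = 1000, 1400, 5

top5 = np.empty((n_users, k), dtype=np.int64)
batch = 100  # score users in batches instead of one giant matrix
for start in range(0, n_users, batch):
    scores = rng.random((batch, n_items))              # stand-in for predictions
    part = np.argpartition(-scores, k, axis=1)[:, :k]  # top-k item ids, unsorted
    order = np.take_along_axis(scores, part, axis=1).argsort(axis=1)[:, ::-1]
    top5[start:start + batch] = np.take_along_axis(part, order, axis=1)

print(top5.shape)  # (1000, 5): five ranked item ids per user
```

The same batching idea applies when the scores come from a real model: predict one slab of users at a time, keep only each user's top-k, and discard the rest.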


Tych0n commented Mar 4, 2019

@aksnzhy got your point. Correct me, please, if I'm wrong: I convert the task to binary, score all user-item pairs to get a probability for each pair, then take the top-5 items for each user. With this approach, I still face the problem of creating a 5.6-billion-row dataset to score and select from?

@BrianMiner
I have the exact same question: do you have to create all possible permutations of items for each user, score all of them, and order by probability?

@BrianMiner

@Tych0n did you ever figure out a method?


Tych0n commented Apr 16, 2019

@BrianMiner, no. I wasn't able to use this library in production for my problem; there were too many items to select recommendations from. I ended up using implicit's ALS. By the way, I tried LightFM: it showed comparable MAP@k, but still wasn't able to outperform ALS, at least in my setup.

Tych0n closed this as completed Apr 16, 2019
@BrianMiner

This must be a common issue for these types of models, though. There must be some way to pre-filter the candidates. You see FM and FFM used for CTR problems, which can be just as large an issue when you need to score so many ads * placements * versions (copy, colour, etc.) for each user.
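One common two-stage pattern for this pre-filtering (a sketch with hypothetical toy data, not code from this thread) is to let a cheap candidate generator produce a shortlist and have the expensive FFM rank only that shortlist. Here raw popularity stands in for the candidate generator; in practice it could be an ALS model or an approximate-nearest-neighbour index:

```python
from collections import Counter

# Toy transaction log: (user, item) pairs.
transactions = [
    (0, 1), (0, 2), (1, 2), (1, 3), (2, 2), (2, 4), (3, 2),
]

# Stage 1 (cheap): shortlist the 3 most popular items.
popularity = Counter(item for _, item in transactions)
candidates = [item for item, _ in popularity.most_common(3)]
print(candidates)  # [2, 1, 3] -- only these would go to the FFM scorer

# Stage 2 (expensive) would then score len(candidates) rows per user
# instead of the full item catalogue.
```

This turns the per-user scoring cost from O(items * colours) into O(shortlist * colours), at the price of possibly missing items the candidate generator never surfaces.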

@sumitsidana

Hi, you could use negative sampling to convert the problem from one-class classification to binary classification. For every positive (a transaction), sample a non-transaction. This way, you can still use this library, or FM/FFM. Do the sampling only at training time; at prediction time, consider all the items.
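A minimal sketch of that sampling scheme, with hypothetical toy data (the set of positives and the item count are made up for illustration):

```python
import random

random.seed(42)

n_items = 10
positives = {(0, 1), (0, 2), (1, 3)}  # observed (user, item) transactions

# Index which items each user has already interacted with.
seen = {}
for u, i in positives:
    seen.setdefault(u, set()).add(i)

# For every positive row, emit one sampled negative the user has NOT seen.
rows = []
for u, i in positives:
    rows.append((1, u, i))                 # label 1: real transaction
    neg = random.randrange(n_items)
    while neg in seen[u]:                  # resample until the item is unseen
        neg = random.randrange(n_items)
    rows.append((0, u, neg))               # label 0: sampled non-transaction

print(len(rows))  # 6 -- one negative per positive
```

The resulting labelled rows can then be written out in the libffm text format shown earlier in the thread and fed to a binary FM/FFM.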
