# Weakly Supervised Recommendation Systems

Experiment steps:
 1. **User Preference Model**: Use the most *explicit* ratings to build a *rating/ranking prediction model*. This is a simple *explicit matrix factorization* model.
 2. **Generate Weak Dataset**: Use this model to *predict* a rating for every user/item pair $(u,i)$ in the *implicit feedback dataset*, producing a new *weak explicit dataset* $(u, i, r^*)$.
 3. **Evaluate**: Evaluate every model on the untouched test split of the most explicit feedback.
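The three steps can be sketched end to end in plain Python. This is a minimal illustration on hypothetical toy data, not the notebook's `utils.train_explicit`/`utils.annotate` code: it assumes a biased matrix factorization $\hat r_{ui} = \mu + b_u + b_i + p_u^\top q_i$ fitted by SGD on squared error, and the helper names (`train_explicit_mf`, `rmse_of`) are hypothetical.

```python
import math
import random


def train_explicit_mf(triples, num_users, num_items, dim=8, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Hypothetical helper: fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD on squared error."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_items)]
    b_user = [0.0] * num_users
    b_item = [0.0] * num_items
    mu = sum(r for _, _, r in triples) / len(triples)  # global mean rating

    def predict(u, i):
        return mu + b_user[u] + b_item[i] + sum(p * q for p, q in zip(P[u], Q[i]))

    data = list(triples)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            b_user[u] += lr * (err - reg * b_user[u])
            b_item[i] += lr * (err - reg * b_item[i])
            for k in range(dim):
                p, q = P[u][k], Q[i][k]
                P[u][k] += lr * (err * q - reg * p)
                Q[i][k] += lr * (err * p - reg * q)
    return predict


def rmse_of(predict, triples):
    """Root mean squared error of predict(u, i) against the given ratings."""
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in triples) / len(triples))


# hypothetical toy data: explicit (user, item, rating) triples and implicit (user, item) pairs
explicit_train = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 1.0), (1, 2, 1.0), (2, 1, 0.0), (2, 2, 1.0)]
explicit_test = [(0, 2, 1.0), (2, 0, 1.0)]
implicit_pairs = [(0, 3), (1, 1), (2, 3)]

# step 1: user preference model from explicit ratings
predict = train_explicit_mf(explicit_train, num_users=3, num_items=4)
# step 2: weak explicit dataset (u, i, r*) over the implicit pairs
weak_dataset = [(u, i, predict(u, i)) for u, i in implicit_pairs]
# step 3: evaluate on the untouched explicit test split
print(weak_dataset)
print('test RMSE: %.4f' % rmse_of(predict, explicit_test))
```

On the Steam data, the explicit triples would come from the recommend training split and the implicit pairs from the play (or purchase) dataset, while the held-out test pairs stay out of the explicit training data.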

## Explicit Model Experiments

This section contains all the experiments based on the explicit matrix factorization model.

### Explicit Review/Recommend Model

```python
import utils
from spotlight.evaluation import rmse_score

dataset_recommend_train, dataset_recommend_test, dataset_recommend_dev, dataset_play, dataset_purchase = utils.parse_steam()

# report the size of each split
for label, dataset in [('Explicit dataset (TEST)', dataset_recommend_test),
                       ('Explicit dataset (VALID)', dataset_recommend_dev),
                       ('Explicit dataset (TRAIN)', dataset_recommend_train),
                       ('Implicit dataset (PLAY)', dataset_play),
                       ('Implicit dataset (PURCHASE)', dataset_purchase)]:
    print('%s contains %s interactions of %s users and %s items' % (
          label,
          format(len(dataset.ratings), ','),
          format(dataset.num_users, ','),
          format(dataset.num_items, ',')))

# train the explicit model based on recommend feedback
model = utils.train_explicit(train_interactions=dataset_recommend_train,
                             valid_interactions=dataset_recommend_dev,
                             run_name='model_steam_explicit_recommend')

# evaluate on the held-out recommend test split
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
rmse = rmse_score(model=model, test=dataset_recommend_test)
print('-'*20)
print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))
```

```
Explicit dataset (TEST) contains 1,416 interactions of 1,781 users and 9,535 items
Explicit dataset (VALID) contains 1,416 interactions of 1,781 users and 9,535 items
Explicit dataset (TRAIN) contains 11,324 interactions of 1,781 users and 9,535 items
Implicit dataset (PLAY) contains 217,860 interactions of 1,781 users and 9,535 items
Implicit dataset (PURCHASE) contains 344,292 interactions of 1,781 users and 9,535 items
--------------------
RMSE: 0.4613
MRR: 0.0391
nDCG: 0.0698
nDCG@10: 0.0530
nDCG@5: 0.0331
MAP: 0.0343
success@10: 0.1475
success@5: 0.0758
```
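`utils.evaluate` is not shown, so how it computes these numbers is an assumption. Under the usual binary-relevance definitions, each test user gets a reciprocal rank, an nDCG@k and a success@k from the model's ranked list, and the reported MRR, nDCG@k and success@k are their means over users. A minimal sketch (the helper `ranking_metrics` is hypothetical, and MAP is omitted):

```python
import math


def ranking_metrics(ranked_items, relevant, k):
    """Per-user reciprocal rank, nDCG@k and success@k under binary relevance."""
    # reciprocal rank of the first relevant item; 0 if no relevant item is ranked
    rr = 0.0
    for pos, item in enumerate(ranked_items, start=1):
        if item in relevant:
            rr = 1.0 / pos
            break
    top_k = ranked_items[:k]
    # DCG: a relevant item at 1-based position p contributes 1 / log2(p + 1)
    dcg = sum(1.0 / math.log2(pos + 1) for pos, item in enumerate(top_k, start=1) if item in relevant)
    # ideal DCG: all relevant items packed into the first positions
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, min(len(relevant), k) + 1))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    # success@k: 1 if at least one relevant item appears in the top k
    success = 1.0 if any(item in relevant for item in top_k) else 0.0
    return rr, ndcg, success


# hypothetical user whose only test item (42) is ranked third: rr = 1/3, nDCG@5 = 0.5, success@5 = 1.0
print(ranking_metrics([7, 3, 42, 9, 1, 5], relevant={42}, k=5))

# averaging the per-user values gives MRR, nDCG@k and success@k; here MRR = (1/3 + 1/6) / 2
per_user = [ranking_metrics([7, 3, 42, 9, 1, 5], {42}, k=5),
            ranking_metrics([2, 8, 4, 6, 0, 11], {11}, k=5)]
mrr_toy = sum(m[0] for m in per_user) / len(per_user)
```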


### Mask Validation/Test Ratings in the Play Dataset

```python
# collect every (user, item) pair that appears in the test or validation split
test_interact = set(zip(dataset_recommend_test.user_ids, dataset_recommend_test.item_ids))
test_interact.update(zip(dataset_recommend_dev.user_ids, dataset_recommend_dev.item_ids))

# flag those pairs in the implicit play dataset by setting their rating to -1
for idx, (uid, iid) in enumerate(zip(dataset_play.user_ids, dataset_play.item_ids)):
    if (uid, iid) in test_interact:
        dataset_play.ratings[idx] = -1
```
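The masking above edits `dataset_play.ratings` in place. The same logic as a pure function, plus a leakage check, can be sketched as follows; `mask_held_out` and `count_leaked` are hypothetical names, not part of `utils`.

```python
def mask_held_out(user_ids, item_ids, ratings, held_out_pairs, mask_value=-1):
    """Return a copy of ratings with every held-out (user, item) pair set to mask_value."""
    return [mask_value if (u, i) in held_out_pairs else r
            for u, i, r in zip(user_ids, item_ids, ratings)]


def count_leaked(user_ids, item_ids, ratings, held_out_pairs, mask_value=-1):
    """Count rows whose (user, item) pair is held out but whose rating was not masked."""
    return sum(1 for u, i, r in zip(user_ids, item_ids, ratings)
               if (u, i) in held_out_pairs and r != mask_value)


# hypothetical toy interactions
users = [0, 0, 1, 2]
items = [5, 6, 5, 7]
ratings = [3.0, 1.0, 2.0, 4.0]
held_out = {(0, 6), (2, 7)}

masked = mask_held_out(users, items, ratings, held_out)
print(masked)  # [3.0, -1, 2.0, -1]
```

Right after the masking cell, `count_leaked(dataset_play.user_ids, dataset_play.item_ids, dataset_play.ratings, test_interact)` should return 0. Returning a copy instead of mutating keeps the original ratings available for comparison.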

### Explicit Play Model

Use the **explicit review/recommend model** trained in the previous section to annotate the **missing values** in the **play** dataset.
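`utils.annotate` is not shown. Following step 2, a minimal sketch assumes that every (u, i) pair in the implicit dataset is labelled with the explicit model's predicted rating r*. Spotlight factorization models expose `predict(user_ids, item_ids)`, which is the only interface the sketch relies on; whether `utils.annotate` treats the rows flagged with -1 differently is not shown here. `annotate_pairs` and `ToyRatingModel` are hypothetical.

```python
def annotate_pairs(user_ids, item_ids, predict):
    """Hypothetical stand-in for utils.annotate: attach a predicted rating r* to every (u, i) pair."""
    predictions = predict(user_ids, item_ids)
    return [(u, i, float(r)) for u, i, r in zip(user_ids, item_ids, predictions)]


class ToyRatingModel:
    """Hypothetical model whose predicted rating depends only on the item."""

    def __init__(self, item_scores):
        self.item_scores = item_scores

    def predict(self, user_ids, item_ids):
        return [self.item_scores[i] for i in item_ids]


toy_model = ToyRatingModel({5: 0.9, 6: 0.2, 7: 0.6})
weak_play = annotate_pairs([0, 0, 1], [5, 6, 7], toy_model.predict)
print(weak_play)  # [(0, 5, 0.9), (0, 6, 0.2), (1, 7, 0.6)]
```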

```python
# annotate the missing values in the play dataset based on the explicit recommend model
dataset_play = utils.annotate(interactions=dataset_play,
                              model=model,
                              run_name='dataset_steam_play_explicit_annotated')

# train a new explicit model on the annotated (weak) play dataset
model = utils.train_explicit(train_interactions=dataset_play,
                             valid_interactions=dataset_recommend_dev,
                             run_name='model_steam_explicit_play')

# evaluate the new model on the recommend test split
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
rmse = rmse_score(model=model, test=dataset_recommend_test)
print('-'*20)
print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))
```

```
epoch 1 start at:  Tue Apr 23 09:05:08 2019
epoch 1 end at:  Tue Apr 23 09:05:09 2019
RMSE: 0.4413
epoch 2 start at:  Tue Apr 23 09:05:09 2019
epoch 2 end at:  Tue Apr 23 09:05:10 2019
RMSE: 0.4353
epoch 3 start at:  Tue Apr 23 09:05:11 2019
epoch 3 end at:  Tue Apr 23 09:05:11 2019
RMSE: 0.4419
--------------------
RMSE: 0.4590
MRR: 0.0460
nDCG: 0.0661
nDCG@10: 0.0534
nDCG@5: 0.0418
MAP: 0.0382
success@10: 0.1242
success@5: 0.0899
```


## Implicit Model Experiments

This section contains all the experiments based on the implicit matrix factorization model.
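`utils.train_implicit_negative_sampling` is not shown, so its exact loss is an assumption. One common way to train implicit matrix factorization is to treat each observed (user, item) pair as a positive, sample a uniformly random item as a negative, and minimize a logistic loss on the dot-product scores. A minimal plain-Python sketch on hypothetical toy data (`train_implicit_mf` is a hypothetical name):

```python
import math
import random


def sigmoid(x):
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_implicit_mf(pairs, num_users, num_items, dim=8, lr=0.1, reg=0.001, epochs=100, seed=0):
    """Fit factors so observed pairs score high and uniformly sampled items score low."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_items)]
    data = list(pairs)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i in data:
            j = rng.randrange(num_items)  # negative sample: a uniformly random item
            for item, label in ((i, 1.0), (j, 0.0)):
                s = sum(a * b for a, b in zip(P[u], Q[item]))
                grad = label - sigmoid(s)  # gradient of the log-likelihood w.r.t. the score
                for k in range(dim):
                    p, q = P[u][k], Q[item][k]
                    P[u][k] += lr * (grad * q - reg * p)
                    Q[item][k] += lr * (grad * p - reg * q)

    def implicit_score(u, i):
        return sum(a * b for a, b in zip(P[u], Q[i]))
    return implicit_score


# hypothetical toy data: users 0-1 interact with items 0-1, users 2-3 with items 2-3
observed = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
implicit_score = train_implicit_mf(observed, num_users=4, num_items=4)
for u in range(4):
    print(u, [round(implicit_score(u, i), 2) for i in range(4)])
```

Uniform sampling can occasionally draw an item the user did interact with as a negative; this sketch accepts that noise rather than re-sampling. The scores are unbounded ranking scores, not ratings.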

### Implicit Review/Recommend Model

```python
import utils
from spotlight.evaluation import rmse_score

dataset_recommend_train, dataset_recommend_test, dataset_recommend_dev, dataset_play, dataset_purchase = utils.parse_steam()

# report the size of each split
for label, dataset in [('Explicit dataset (TEST)', dataset_recommend_test),
                       ('Explicit dataset (VALID)', dataset_recommend_dev),
                       ('Explicit dataset (TRAIN)', dataset_recommend_train),
                       ('Implicit dataset (PLAY)', dataset_play),
                       ('Implicit dataset (PURCHASE)', dataset_purchase)]:
    print('%s contains %s interactions of %s users and %s items' % (
          label,
          format(len(dataset.ratings), ','),
          format(dataset.num_users, ','),
          format(dataset.num_items, ',')))

# train the implicit model with negative sampling on the play feedback
model = utils.train_implicit_negative_sampling(train_interactions=dataset_play,
                                               valid_interactions=dataset_recommend_dev,
                                               run_name='model_steam_implicit')

# evaluate on the recommend test split
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
rmse = rmse_score(model=model, test=dataset_recommend_test)
print('-'*20)
print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))
```

```
Explicit dataset (TEST) contains 1,416 interactions of 1,781 users and 9,535 items
Explicit dataset (VALID) contains 1,416 interactions of 1,781 users and 9,535 items
Explicit dataset (TRAIN) contains 11,324 interactions of 1,781 users and 9,535 items
Implicit dataset (PLAY) contains 217,860 interactions of 1,781 users and 9,535 items
Implicit dataset (PURCHASE) contains 344,292 interactions of 1,781 users and 9,535 items
epoch 1 start at:  Sat Apr 20 11:06:27 2019
epoch 1 end at:  Sat Apr 20 11:06:28 2019
MRR: 0.0741
epoch 2 start at:  Sat Apr 20 11:06:31 2019
epoch 2 end at:  Sat Apr 20 11:06:32 2019
MRR: 0.0800
epoch 3 start at:  Sat Apr 20 11:06:35 2019
epoch 3 end at:  Sat Apr 20 11:06:36 2019
MRR: 0.0857
epoch 4 start at:  Sat Apr 20 11:06:39 2019
epoch 4 end at:  Sat Apr 20 11:06:40 2019
MRR: 0.0856
--------------------
RMSE: 7.5434
MRR: 0.0793
nDCG: 0.0936
nDCG@10: 0.0818
nDCG@5: 0.0681
MAP: 0.0581
success@10: 0.1951
success@5: 0.1416
```


## Popularity

```python
import utils
from popularity import PopularityModel
from spotlight.evaluation import rmse_score

dataset_recommend_train, dataset_recommend_test, dataset_recommend_dev, dataset_play, dataset_purchase = utils.parse_steam()

# report the size of each split
for label, dataset in [('Explicit dataset (TEST)', dataset_recommend_test),
                       ('Explicit dataset (VALID)', dataset_recommend_dev),
                       ('Explicit dataset (TRAIN)', dataset_recommend_train),
                       ('Implicit dataset (PLAY)', dataset_play),
                       ('Implicit dataset (PURCHASE)', dataset_purchase)]:
    print('%s contains %s interactions of %s users and %s items' % (
          label,
          format(len(dataset.ratings), ','),
          format(dataset.num_users, ','),
          format(dataset.num_items, ',')))

# fit the popularity baseline on the recommend training split
model = PopularityModel()
print('fit the model')
model.fit(interactions=dataset_recommend_train)

# evaluate the baseline on the recommend test split
print('evaluate the model')
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
# RMSE is not reported for the popularity baseline
# rmse = rmse_score(model=model, test=dataset_recommend_test)
# print('-'*20)
# print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))
```

```
Explicit dataset (TEST) contains 1,416 interactions of 1,781 users and 9,535 items
Explicit dataset (VALID) contains 1,416 interactions of 1,781 users and 9,535 items
Explicit dataset (TRAIN) contains 11,324 interactions of 1,781 users and 9,535 items
Implicit dataset (PLAY) contains 217,860 interactions of 1,781 users and 9,535 items
Implicit dataset (PURCHASE) contains 344,292 interactions of 1,781 users and 9,535 items
fit the model
evaluate the model
MRR: 0.1126
nDCG: 0.1296
nDCG@10: 0.1150
nDCG@5: 0.0946
MAP: 0.0871
success@10: 0.2437
success@5: 0.1709
```
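The `PopularityModel` class comes from a local `popularity` module that is not shown. Assuming it scores each item by its number of training interactions and gives every user the same ranking, a minimal sketch looks like this (`fit_popularity` and `recommend_popular` are hypothetical names):

```python
from collections import Counter


def fit_popularity(train_item_ids):
    """Count how many training interactions each item has."""
    return Counter(train_item_ids)


def recommend_popular(counts, num_items, seen=(), k=5):
    """Rank all items by training count (ties broken by lower item id), skipping items in `seen`."""
    seen = set(seen)
    candidates = [i for i in range(num_items) if i not in seen]
    # Counter returns 0 for items that never occur in training
    candidates.sort(key=lambda i: (-counts[i], i))
    return candidates[:k]


counts = fit_popularity([3, 1, 3, 2, 3, 1])
print(recommend_popular(counts, num_items=5, k=3))            # [3, 1, 2]
print(recommend_popular(counts, num_items=5, seen={3}, k=3))  # [1, 2, 0]
```

Excluding items the user has already seen is a design choice of this sketch; whether `PopularityModel` does so is not shown.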
