# Weakly Supervised Recommendation Systems

Experiment steps:
 1. **User Preference Model**: Use the most *explicit* ratings to train a *rating/ranking prediction model*. This is a simple *Explicit Matrix Factorization* model.
 2. **Generate Weak Dataset**: Use that model to *predict* a rating for every user/item pair $(u, i)$ in the *implicit feedback dataset*, producing a new *weak explicit dataset* $(u, i, r^*)$.
 3. **Evaluate**: Evaluate every model on the untouched test split of the most explicit feedback.
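
The steps above can be sketched on toy data. This is a minimal pure-Python illustration, not the notebook's `utils` implementation: the function names (`train_explicit_mf`, `rmse`), the hyperparameters, and the toy interactions are all assumptions made for the example.

```python
import random


def train_explicit_mf(triples, num_users, num_items, dim=4, lr=0.05,
                      reg=0.01, epochs=200, seed=0):
    """Fit user/item factors to (user, item, rating) triples with plain SGD."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_items)]
    mean = sum(r for _, _, r in triples) / len(triples)  # global rating offset

    def predict(u, i):
        return mean + sum(pu * qi for pu, qi in zip(P[u], Q[i]))

    for _ in range(epochs):
        for u, i, r in triples:
            err = r - predict(u, i)
            for k in range(dim):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return predict


def rmse(predict, triples):
    """Root mean squared error of the predictions on rated triples."""
    return (sum((r - predict(u, i)) ** 2 for u, i, r in triples) / len(triples)) ** 0.5


# Toy data (made up for illustration): 3 users, 4 items.
explicit_train = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
                  (1, 2, 1.0), (2, 1, 2.0), (2, 3, 5.0)]
implicit_pairs = [(0, 2), (1, 3), (2, 0)]  # interactions observed without a rating

# Step 1: explicit preference model.
predict = train_explicit_mf(explicit_train, num_users=3, num_items=4)

# Step 2: weak explicit dataset (u, i, r*), one predicted rating per implicit pair.
weak_dataset = [(u, i, predict(u, i)) for u, i in implicit_pairs]

# Step 3 would score any model on the held-out explicit test split; the toy data
# has no such split, so only the fit on the training triples is reported here.
print('train RMSE: %.4f' % rmse(predict, explicit_train))
print(weak_dataset)
```

A second model trained on `weak_dataset` would then be compared against the first one on the same held-out explicit ratings.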

## Explicit Model Experiments

This section contains all the experiments based on the explicit matrix factorization model.

### Explicit Rate Model

In [1]:
import utils
from spotlight.evaluation import rmse_score

dataset_recommend_train, dataset_recommend_test, dataset_recommend_dev, dataset_implicit = utils.parse_dianping()

print('Explicit dataset (TEST) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_test.ratings), ','),
          format(dataset_recommend_test.num_users, ','),
          format(dataset_recommend_test.num_items, ',')))

print('Explicit dataset (VALID) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_dev.ratings), ','),
          format(dataset_recommend_dev.num_users, ','),
          format(dataset_recommend_dev.num_items, ',')))

print('Explicit dataset (TRAIN) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_train.ratings), ','),
          format(dataset_recommend_train.num_users, ','),
          format(dataset_recommend_train.num_items, ',')))

print('Implicit dataset (READ/READING/TAG/COMMENT) contains %s interactions of %s users and %s items'%(
          format(len(dataset_implicit.ratings), ','),
          format(dataset_implicit.num_users, ','),
          format(dataset_implicit.num_items, ',')))

# train the explicit model based on recommend feedback
model = utils.train_explicit(train_interactions=dataset_recommend_train, 
                             valid_interactions=dataset_recommend_dev,
                             run_name='model_dianping_explicit_rate')

# evaluate the new model
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
rmse = rmse_score(model=model, test=dataset_recommend_test)
print('-'*20)
print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))

Explicit dataset (TEST) contains 2,679 interactions of 2,115 users and 12,890 items
Explicit dataset (VALID) contains 2,680 interactions of 2,115 users and 12,890 items
Explicit dataset (TRAIN) contains 21,433 interactions of 2,115 users and 12,890 items
Implicit dataset (READ/READING/TAG/COMMENT) contains 211,194 interactions of 2,115 users and 12,890 items
--------------------
RMSE: 0.4332
MRR: 0.0102
nDCG: 0.0204
nDCG@10: 0.0093
nDCG@5: 0.0029
MAP: 0.0067
success@10: 0.0364
success@5: 0.0087


### Remove Validation/Test Ratings

In [2]:
test_interact = set()
for (uid, iid) in zip(dataset_recommend_test.user_ids, dataset_recommend_test.item_ids):
    test_interact.add((uid, iid))

for (uid, iid) in zip(dataset_recommend_dev.user_ids, dataset_recommend_dev.item_ids):
    test_interact.add((uid, iid))

# flag implicit interactions that overlap the dev/test splits with a rating of -1
for idx, pair in enumerate(zip(dataset_implicit.user_ids, dataset_implicit.item_ids)):
    if pair in test_interact:
        dataset_implicit.ratings[idx] = -1

### Explicit Read/Reading/Tag/Comment Model

Leverage the **explicit rate model** trained in the previous section to annotate the **missing values** in the **read/reading/tag/comment** dataset.
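
A rough sketch of what such an annotation step might look like follows. The internals of `utils.annotate` are not shown in this notebook, so `annotate_weak`, `toy_predict`, and in particular the choice to drop interactions flagged with `-1` are assumptions made for illustration only.

```python
def annotate_weak(user_ids, item_ids, ratings, predict, masked_value=-1):
    """Return (user, item, predicted_rating) triples, skipping masked interactions."""
    weak = []
    for u, i, r in zip(user_ids, item_ids, ratings):
        if r == masked_value:  # flagged dev/test pair: keep it out of training (assumption)
            continue
        weak.append((u, i, predict(u, i)))
    return weak


# Hypothetical stand-in for the trained explicit model: any callable (u, i) -> rating.
def toy_predict(u, i):
    return 1.0 + 0.5 * ((u + i) % 3)


user_ids = [0, 0, 1, 2]
item_ids = [1, 2, 2, 0]
ratings = [1, -1, 1, 1]  # the second pair was flagged as overlapping the dev/test splits

weak = annotate_weak(user_ids, item_ids, ratings, toy_predict)
print(weak)
```

Skipping the flagged pairs is one way to keep held-out ratings from influencing the model trained on the weak dataset.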

In [3]:
# annotate the missing values in the implicit (read/reading/tag/comment) dataset with the explicit rate model
dataset_implicit = utils.annotate(interactions=dataset_implicit, 
                              model=model, 
                              run_name='dataset_dianping_explicit_annotated')

# train a new explicit model on the weakly annotated dataset
model = utils.train_explicit(train_interactions=dataset_implicit, 
                             valid_interactions=dataset_recommend_dev,
                             run_name='model_dianping_explicit_read')

# evaluate the new model
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
rmse = rmse_score(model=model, test=dataset_recommend_test)
print('-'*20)
print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))

epoch 1 start at:  Tue Apr 23 09:07:43 2019
epoch 1 end at:  Tue Apr 23 09:07:44 2019
RMSE: 0.4632
epoch 2 start at:  Tue Apr 23 09:07:44 2019
epoch 2 end at:  Tue Apr 23 09:07:45 2019
RMSE: 0.4592
epoch 3 start at:  Tue Apr 23 09:07:46 2019
epoch 3 end at:  Tue Apr 23 09:07:47 2019
RMSE: 0.4567
epoch 4 start at:  Tue Apr 23 09:07:47 2019
epoch 4 end at:  Tue Apr 23 09:07:48 2019
RMSE: 0.4557
epoch 5 start at:  Tue Apr 23 09:07:49 2019
epoch 5 end at:  Tue Apr 23 09:07:50 2019
RMSE: 0.4525
epoch 6 start at:  Tue Apr 23 09:07:50 2019
epoch 6 end at:  Tue Apr 23 09:07:52 2019
RMSE: 0.4505
epoch 7 start at:  Tue Apr 23 09:07:52 2019
epoch 7 end at:  Tue Apr 23 09:07:53 2019
RMSE: 0.4515
--------------------
RMSE: 0.4446
MRR: 0.0309
nDCG: 0.0609
nDCG@10: 0.0359
nDCG@5: 0.0132
MAP: 0.0228
success@10: 0.1310
success@5: 0.0386


## Implicit Model Experiments

This section contains all the experiments based on the implicit matrix factorization model.

### Implicit Model using Negative Sampling

In [3]:
import utils
from spotlight.evaluation import rmse_score

dataset_recommend_train, dataset_recommend_test, dataset_recommend_dev, dataset_implicit = utils.parse_dianping()

print('Explicit dataset (TEST) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_test.ratings), ','),
          format(dataset_recommend_test.num_users, ','),
          format(dataset_recommend_test.num_items, ',')))

print('Explicit dataset (VALID) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_dev.ratings), ','),
          format(dataset_recommend_dev.num_users, ','),
          format(dataset_recommend_dev.num_items, ',')))

print('Explicit dataset (TRAIN) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_train.ratings), ','),
          format(dataset_recommend_train.num_users, ','),
          format(dataset_recommend_train.num_items, ',')))

print('Implicit dataset (READ/READING/TAG/COMMENT) contains %s interactions of %s users and %s items'%(
          format(len(dataset_implicit.ratings), ','),
          format(dataset_implicit.num_users, ','),
          format(dataset_implicit.num_items, ',')))

# train the implicit model with negative sampling on the implicit feedback
model = utils.train_implicit_negative_sampling(train_interactions=dataset_implicit, 
                                               valid_interactions=dataset_recommend_dev,
                                               run_name='model_dianping_implicit_read2')

# evaluate the new model
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
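# NOTE: the implicit model outputs preference scores rather than ratings, so this RMSE is not comparable to the explicit models
rmse = rmse_score(model=model, test=dataset_recommend_test)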
print('-'*20)
print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))

Explicit dataset (TEST) contains 2,679 interactions of 2,115 users and 12,890 items
Explicit dataset (VALID) contains 2,680 interactions of 2,115 users and 12,890 items
Explicit dataset (TRAIN) contains 21,433 interactions of 2,115 users and 12,890 items
Implicit dataset (READ/READING/TAG/COMMENT) contains 211,194 interactions of 2,115 users and 12,890 items
epoch 1 start at:  Sat Apr 20 11:05:48 2019
epoch 1 end at:  Sat Apr 20 11:05:49 2019
MRR: 0.0455
epoch 2 start at:  Sat Apr 20 11:05:54 2019
epoch 2 end at:  Sat Apr 20 11:05:55 2019
MRR: 0.0453
--------------------
RMSE: 4.0115
MRR: 0.0559
nDCG: 0.0586
nDCG@10: 0.0474
nDCG@5: 0.0342
MAP: 0.0337
success@10: 0.1289
success@5: 0.0692


## Popularity

In [4]:
import utils
from popularity import PopularityModel
from spotlight.evaluation import rmse_score

dataset_recommend_train, dataset_recommend_test, dataset_recommend_dev, dataset_implicit = utils.parse_dianping()

print('Explicit dataset (TEST) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_test.ratings), ','),
          format(dataset_recommend_test.num_users, ','),
          format(dataset_recommend_test.num_items, ',')))

print('Explicit dataset (VALID) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_dev.ratings), ','),
          format(dataset_recommend_dev.num_users, ','),
          format(dataset_recommend_dev.num_items, ',')))

print('Explicit dataset (TRAIN) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_train.ratings), ','),
          format(dataset_recommend_train.num_users, ','),
          format(dataset_recommend_train.num_items, ',')))

print('Implicit dataset (READ/READING/TAG/COMMENT) contains %s interactions of %s users and %s items'%(
          format(len(dataset_implicit.ratings), ','),
          format(dataset_implicit.num_users, ','),
          format(dataset_implicit.num_items, ',')))

# fit the popularity baseline on the explicit (recommend) training split
model = PopularityModel()
print('fit the model')
model.fit(interactions=dataset_recommend_train)

# evaluate the new model
print('evaluate the model')
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
# rmse = rmse_score(model=model, test=dataset_recommend_test, batch_size=512)
# print('-'*20)
# print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))

Explicit dataset (TEST) contains 2,679 interactions of 2,115 users and 12,890 items
Explicit dataset (VALID) contains 2,680 interactions of 2,115 users and 12,890 items
Explicit dataset (TRAIN) contains 21,433 interactions of 2,115 users and 12,890 items
Implicit dataset (READ/READING/TAG/COMMENT) contains 211,194 interactions of 2,115 users and 12,890 items
fit the model
evaluate the model
MRR: 0.0458
nDCG: 0.0490
nDCG@10: 0.0397
nDCG@5: 0.0292
MAP: 0.0268
success@10: 0.1136
success@5: 0.0685
