# Weakly Supervised Recommendation Systems

Experiment steps:
 1. **User Preference Model**: use the most *explicit* ratings to build a *rating/ranking prediction model*; here this is a simple *Explicit Matrix Factorization* model.
 2. **Generate Weak Dataset**: use the model from step 1 to *predict* a rating for every user/item pair $(u,i)$ in the *implicit feedback dataset*, producing a new *weak explicit dataset* $(u, i, r^*)$.
 3. **Evaluate**: use the intact test split of the most explicit feedback to evaluate the performance of every model.
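The three steps above can be sketched end to end on toy data. This is a minimal sketch, not the notebook's `utils` code: it assumes a plain dot-product matrix factorization trained with SGD on squared error, and `train_explicit_mf`, `predict` and the toy triples are hypothetical names and values chosen for illustration.

```python
import math
import random


def train_explicit_mf(triples, num_users, num_items, dim=4, lr=0.05, reg=0.01, epochs=300, seed=0):
    """Hypothetical helper: fit r(u, i) ~ p_u . q_i by SGD on squared error."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_items)]
    data = list(triples)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for k in range(dim):
                pu, qi = P[u][k], Q[i][k]
                # L2-regularized gradient step on both factor vectors
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


# Step 1: train on explicit (user, item, rating) triples (toy data).
explicit_train = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
explicit_test = [(0, 2, 4.0)]  # held out: never used for training or annotation
P, Q = train_explicit_mf(explicit_train, num_users=3, num_items=3)

# Step 2: predict r* for implicit (user, item) pairs -> weak explicit dataset.
implicit_pairs = [(1, 1), (2, 0)]
weak_explicit = [(u, i, predict(P, Q, u, i)) for u, i in implicit_pairs]

# Step 3: evaluate on the held-out explicit test split (RMSE).
rmse = math.sqrt(
    sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in explicit_test) / len(explicit_test)
)
```

The key invariant is that the test pairs stay out of both the training data and the annotated implicit data, which is what the "Remove valid/test ratings" cell below enforces.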

## Explicit Model Experiments

This section contains all the experiments based on the explicit matrix factorization model.

### Explicit Rate Model

In [1]:
import utils
from spotlight.evaluation import rmse_score

dataset_recommend_train, dataset_recommend_test, dataset_recommend_dev, dataset_read, dataset_wish = utils.parse_douban()

print('Explicit dataset (TEST) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_test.ratings), ','),
          format(dataset_recommend_test.num_users, ','),
          format(dataset_recommend_test.num_items, ',')))

print('Explicit dataset (VALID) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_dev.ratings), ','),
          format(dataset_recommend_dev.num_users, ','),
          format(dataset_recommend_dev.num_items, ',')))

print('Explicit dataset (TRAIN) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_train.ratings), ','),
          format(dataset_recommend_train.num_users, ','),
          format(dataset_recommend_train.num_items, ',')))

print('Implicit dataset (READ/READING/TAG/COMMENT) contains %s interactions of %s users and %s items'%(
          format(len(dataset_read.ratings), ','),
          format(dataset_read.num_users, ','),
          format(dataset_read.num_items, ',')))

print('Implicit dataset (WISH) contains %s interactions of %s users and %s items'%(
          format(len(dataset_wish.ratings), ','),
          format(dataset_wish.num_users, ','),
          format(dataset_wish.num_items, ',')))

# train the explicit model based on recommend feedback
model = utils.train_explicit(train_interactions=dataset_recommend_train, 
                             valid_interactions=dataset_recommend_dev,
                             run_name='model_douban_explicit_rate')

# evaluate the new model
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
rmse = rmse_score(model=model, test=dataset_recommend_test)
print('-'*20)
print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))

Explicit dataset (TEST) contains 4,960 interactions of 3,886 users and 21,833 items
Explicit dataset (VALID) contains 4,960 interactions of 3,886 users and 21,833 items
Explicit dataset (TRAIN) contains 39,678 interactions of 3,886 users and 21,833 items
Implicit dataset (READ/READING/TAG/COMMENT) contains 428,597 interactions of 3,886 users and 21,833 items
Implicit dataset (WISH) contains 496,204 interactions of 3,886 users and 21,833 items
--------------------
RMSE: 0.3104
MRR: 0.0034
nDCG: 0.0059
nDCG@10: 0.0022
nDCG@5: 0.0009
MAP: 0.0018
success@10: 0.0095
success@5: 0.0027
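The ranking metrics printed above are not defined in this notebook (they come from `utils.evaluate`). A minimal sketch of common binary-relevance definitions for a single user follows; the exact normalization used by `utils.evaluate` (for example, how MAP is divided or how ties are broken) is an assumption, and the function names are hypothetical. Averaging each per-user value over all test users gives MRR, nDCG@k, MAP and success@k.

```python
import math


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item (0 if none is ranked)."""
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0


def success_at_k(ranked, relevant, k):
    """1 if any relevant item appears in the top k, else 0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance DCG@k divided by the ideal DCG@k."""
    dcg = sum(1.0 / math.log2(pos + 1)
              for pos, item in enumerate(ranked[:k], start=1) if item in relevant)
    ideal = sum(1.0 / math.log2(pos + 1) for pos in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0


def average_precision_at_k(ranked, relevant, k):
    """Mean of precision@pos over the positions of relevant hits in the top k."""
    hits, total = 0, 0.0
    for pos, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / pos
    return total / min(len(relevant), k) if relevant else 0.0


ranked = [7, 3, 9, 1, 4]   # a model's top-5 items for one user (toy)
relevant = {3, 4}          # that user's test items (toy)
rr = reciprocal_rank(ranked, relevant)            # 0.5
ap = average_precision_at_k(ranked, relevant, 5)  # ≈ 0.45
```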


## Remove Valid/Test Ratings

In [2]:
test_interact = set()
for (uid, iid) in zip(dataset_recommend_test.user_ids, dataset_recommend_test.item_ids):
    test_interact.add((uid, iid))

for (uid, iid) in zip(dataset_recommend_dev.user_ids, dataset_recommend_dev.item_ids):
    test_interact.add((uid, iid))

# mark implicit interactions that also appear in the test/dev splits with a rating of -1
for idx, (uid, iid) in enumerate(zip(dataset_read.user_ids, dataset_read.item_ids)):
    if (uid, iid) in test_interact:
        dataset_read.ratings[idx] = -1
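The cell above keeps every implicit row and only flags held-out pairs with `-1`; how `utils.annotate` and `utils.train_explicit` then treat that marker is not shown here. The sketch below replays the masking on hypothetical toy lists standing in for the `Interactions` arrays, and also shows an alternative (an assumption, not what the notebook does) that drops the overlapping rows outright.

```python
# Toy stand-ins for the Interactions arrays (hypothetical values).
test_pairs = [(0, 5), (1, 2)]
dev_pairs = [(2, 7)]
read_users = [0, 0, 1, 2, 3]
read_items = [5, 6, 2, 7, 1]
read_ratings = [1.0, 1.0, 1.0, 1.0, 1.0]

# Held-out (user, item) pairs from both evaluation splits.
held_out = set(test_pairs) | set(dev_pairs)

# Option A (mirrors the notebook cell): keep every row, mark overlaps with -1.
masked_ratings = [-1.0 if (u, i) in held_out else r
                  for u, i, r in zip(read_users, read_items, read_ratings)]
# masked_ratings == [-1.0, 1.0, -1.0, -1.0, 1.0]

# Option B (alternative): drop the overlapping rows entirely.
keep = [idx for idx, pair in enumerate(zip(read_users, read_items)) if pair not in held_out]
filtered_users = [read_users[idx] for idx in keep]
filtered_items = [read_items[idx] for idx in keep]
# keep == [1, 4]
```

Dropping rows guarantees that no downstream step can train on a test pair, whereas the `-1` marker relies on later code recognizing it.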

### Explicit Read/Reading/Tag/Comment Model

Leverage the **explicit rate model** trained in the previous section to annotate **missing values** in the **read/reading/tag/comment** dataset.
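Conceptually, annotation turns each implicit pair $(u, i)$ into a weak explicit triple $(u, i, r^*)$ with $r^*$ taken from the explicit model. The sketch below assumes the prediction is clipped to the rating scale; `annotate_pairs`, the 1 to 5 range and `toy_predict` are hypothetical, and the actual behaviour of `utils.annotate` may differ.

```python
def annotate_pairs(users, items, predict_fn, low=1.0, high=5.0):
    """Hypothetical helper: turn implicit (u, i) pairs into weak explicit triples (u, i, r*)."""
    weak = []
    for u, i in zip(users, items):
        # clip the model's prediction to the assumed rating scale [low, high]
        r_star = min(max(predict_fn(u, i), low), high)
        weak.append((u, i, r_star))
    return weak


# Stand-in for the trained explicit model's prediction function (toy).
def toy_predict(u, i):
    return 0.5 + 1.5 * u + 0.25 * i


weak = annotate_pairs([0, 1, 3], [2, 0, 4], toy_predict)
# weak == [(0, 2, 1.0), (1, 0, 2.0), (3, 4, 5.0)]
```

Set `low` and `high` to whatever scale the recommend ratings actually use.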

In [3]:
# annotate the missing values in the read dataset using the explicit rate model
dataset_read = utils.annotate(interactions=dataset_read, 
                              model=model, 
                              run_name='dataset_douban_read_explicit_annotated')

# train a new explicit model on the annotated (weak explicit) read dataset
model = utils.train_explicit(train_interactions=dataset_read, 
                             valid_interactions=dataset_recommend_dev,
                             run_name='model_douban_explicit_read')

# evaluate the new model
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
rmse = rmse_score(model=model, test=dataset_recommend_test)
print('-'*20)
print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))

epoch 1 start at:  Tue Apr 23 09:07:20 2019
epoch 1 end at:  Tue Apr 23 09:07:23 2019
RMSE: 0.3986
epoch 2 start at:  Tue Apr 23 09:07:24 2019
epoch 2 end at:  Tue Apr 23 09:07:26 2019
RMSE: 0.3733
epoch 3 start at:  Tue Apr 23 09:07:27 2019
epoch 3 end at:  Tue Apr 23 09:07:29 2019
RMSE: 0.3525
epoch 4 start at:  Tue Apr 23 09:07:30 2019
epoch 4 end at:  Tue Apr 23 09:07:32 2019
RMSE: 0.3427
epoch 5 start at:  Tue Apr 23 09:07:33 2019
epoch 5 end at:  Tue Apr 23 09:07:36 2019
RMSE: 0.3348
epoch 6 start at:  Tue Apr 23 09:07:36 2019
epoch 6 end at:  Tue Apr 23 09:07:39 2019
RMSE: 0.3357
--------------------
RMSE: 0.3271
MRR: 0.0223
nDCG: 0.0345
nDCG@10: 0.0207
nDCG@5: 0.0120
MAP: 0.0149
success@10: 0.0684
success@5: 0.0302


## Implicit Model Experiments

This section contains all the experiments based on the implicit matrix factorization model.
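An implicit model only sees positive (user, item) pairs, so training typically pairs each observed item with sampled "negative" items. The sketch below shows uniform negative sampling with a pointwise logistic loss; whether `utils.train_implicit_negative_sampling` uses this particular loss is an assumption, and the helper names are hypothetical. Uniform sampling can occasionally draw an item the user actually interacted with, which is usually tolerated as noise.

```python
import math
import random


def sample_negatives(num_positives, num_items, rng):
    """Draw one uniformly random item id per observed (user, item) pair."""
    return [rng.randrange(num_items) for _ in range(num_positives)]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pointwise_logistic_loss(pos_scores, neg_scores):
    """Mean of -log sigmoid(pos) - log(1 - sigmoid(neg)) over the batch."""
    total = sum(softplus(-p) + softplus(n) for p, n in zip(pos_scores, neg_scores))
    return total / len(pos_scores)


rng = random.Random(42)
negatives = sample_negatives(5, num_items=21833, rng=rng)  # 21,833 items as in the Douban output
loss_uninformed = pointwise_logistic_loss([0.0], [0.0])    # equals 2 * log(2)
```

The loss is low when positive pairs score high and sampled negatives score low, which pushes observed items up the ranking.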

### Implicit Model using Negative Sampling

In [4]:
import utils
from spotlight.evaluation import rmse_score

dataset_recommend_train, dataset_recommend_test, dataset_recommend_dev, dataset_read, dataset_wish = utils.parse_douban()

print('Explicit dataset (TEST) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_test.ratings), ','),
          format(dataset_recommend_test.num_users, ','),
          format(dataset_recommend_test.num_items, ',')))

print('Explicit dataset (VALID) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_dev.ratings), ','),
          format(dataset_recommend_dev.num_users, ','),
          format(dataset_recommend_dev.num_items, ',')))

print('Explicit dataset (TRAIN) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_train.ratings), ','),
          format(dataset_recommend_train.num_users, ','),
          format(dataset_recommend_train.num_items, ',')))

print('Implicit dataset (READ/READING/TAG/COMMENT) contains %s interactions of %s users and %s items'%(
          format(len(dataset_read.ratings), ','),
          format(dataset_read.num_users, ','),
          format(dataset_read.num_items, ',')))

print('Implicit dataset (WISH) contains %s interactions of %s users and %s items'%(
          format(len(dataset_wish.ratings), ','),
          format(dataset_wish.num_users, ','),
          format(dataset_wish.num_items, ',')))

# train an implicit model with negative sampling on the read dataset
model = utils.train_implicit_negative_sampling(train_interactions=dataset_read, 
                                               valid_interactions=dataset_recommend_dev,
                                               run_name='model_douban_implicit_read2')

# evaluate the new model
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
rmse = rmse_score(model=model, test=dataset_recommend_test)
print('-'*20)
print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))

Explicit dataset (TEST) contains 4,960 interactions of 3,886 users and 21,833 items
Explicit dataset (VALID) contains 4,960 interactions of 3,886 users and 21,833 items
Explicit dataset (TRAIN) contains 39,678 interactions of 3,886 users and 21,833 items
Implicit dataset (READ/READING/TAG/COMMENT) contains 428,597 interactions of 3,886 users and 21,833 items
Implicit dataset (WISH) contains 496,204 interactions of 3,886 users and 21,833 items
epoch 1 start at:  Sat Apr 20 11:12:30 2019
epoch 1 end at:  Sat Apr 20 11:12:33 2019
MRR: 0.0247
epoch 2 start at:  Sat Apr 20 11:12:45 2019
epoch 2 end at:  Sat Apr 20 11:12:48 2019
MRR: 0.0181
--------------------
RMSE: 4.2549
MRR: 0.0200
nDCG: 0.0239
nDCG@10: 0.0178
nDCG@5: 0.0127
MAP: 0.0119
success@10: 0.0543
success@5: 0.0317


## Popularity
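The `PopularityModel` imported below comes from a local `popularity` module that is not shown. A minimal sketch of a popularity baseline, assuming items are ranked by their number of training interactions and items the user already saw are excluded (both assumptions), looks like this; `ToyPopularityModel` is a hypothetical name.

```python
from collections import Counter


class ToyPopularityModel:
    """Hypothetical baseline: recommend the most-interacted items to every user."""

    def fit(self, user_ids, item_ids):
        self.counts = Counter(item_ids)  # interactions per item
        self.seen = {}
        for u, i in zip(user_ids, item_ids):
            self.seen.setdefault(u, set()).add(i)
        # most popular first; ties broken by item id for determinism
        self.ranked = sorted(self.counts, key=lambda i: (-self.counts[i], i))
        return self

    def recommend(self, user_id, k):
        seen = self.seen.get(user_id, set())
        return [i for i in self.ranked if i not in seen][:k]


model = ToyPopularityModel().fit([0, 0, 1, 2, 2, 3], [10, 11, 10, 10, 12, 11])
top_for_new_user = model.recommend(99, 2)  # [10, 11]
```

Because the ranking ignores the user, this baseline is a useful floor: any personalized model should beat it on the same test split.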

In [5]:
import utils
from popularity import PopularityModel
from spotlight.evaluation import rmse_score

dataset_recommend_train, dataset_recommend_test, dataset_recommend_dev, dataset_read, dataset_wish = utils.parse_douban()

print('Explicit dataset (TEST) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_test.ratings), ','),
          format(dataset_recommend_test.num_users, ','),
          format(dataset_recommend_test.num_items, ',')))

print('Explicit dataset (VALID) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_dev.ratings), ','),
          format(dataset_recommend_dev.num_users, ','),
          format(dataset_recommend_dev.num_items, ',')))

print('Explicit dataset (TRAIN) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_train.ratings), ','),
          format(dataset_recommend_train.num_users, ','),
          format(dataset_recommend_train.num_items, ',')))

print('Implicit dataset (READ/READING/TAG/COMMENT) contains %s interactions of %s users and %s items'%(
          format(len(dataset_read.ratings), ','),
          format(dataset_read.num_users, ','),
          format(dataset_read.num_items, ',')))

print('Implicit dataset (WISH) contains %s interactions of %s users and %s items'%(
          format(len(dataset_wish.ratings), ','),
          format(dataset_wish.num_users, ','),
          format(dataset_wish.num_items, ',')))

# fit the popularity baseline on the recommend (explicit) training split
model = PopularityModel()
print('fit the model')
model.fit(interactions=dataset_recommend_train)

# evaluate the new model
print('evaluate the model')
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
# rmse = rmse_score(model=model, test=dataset_recommend_test, batch_size=512)
# print('-'*20)
# print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))

Explicit dataset (TEST) contains 4,960 interactions of 3,886 users and 21,833 items
Explicit dataset (VALID) contains 4,960 interactions of 3,886 users and 21,833 items
Explicit dataset (TRAIN) contains 39,678 interactions of 3,886 users and 21,833 items
Implicit dataset (READ/READING/TAG/COMMENT) contains 428,597 interactions of 3,886 users and 21,833 items
Implicit dataset (WISH) contains 496,204 interactions of 3,886 users and 21,833 items
fit the model
evaluate the model
MRR: 0.0215
nDCG: 0.0253
nDCG@10: 0.0191
nDCG@5: 0.0147
MAP: 0.0134
success@10: 0.0550
success@5: 0.0344
