# Weakly Supervised Recommendation Systems

Experiment steps:
 1. **User Preference Model**: Use the most *explicit* ratings to build a *rating/rank prediction model*. This is a simple *Explicit Matrix Factorization* model.
 2. **Generate Weak Dataset**: Use the above model to *predict* a rating for every user/item pair $(u, i)$ in the *implicit feedback dataset*, building a new *weak explicit dataset* $(u, i, r^*)$.
 3. **Evaluate**: Use the untouched test split of the most explicit feedback to evaluate the performance of every model.
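
The three steps above can be sketched end to end on toy data. This is a minimal NumPy illustration, not the notebook's implementation: the helper names `fit_explicit_mf` and `predict`, the hyperparameters, and the toy ratings are all assumptions, while the real experiments use the `utils` helpers on the Goodreads data.

```python
import numpy as np

def fit_explicit_mf(users, items, ratings, num_users, num_items,
                    dim=4, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Hypothetical helper: fit user/item factors by SGD on squared error."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(num_users, dim))  # user factors
    Q = rng.normal(scale=0.1, size=(num_items, dim))  # item factors
    mu = float(np.mean(ratings))                      # global mean rating
    for _ in range(epochs):
        for u, i, r in zip(users, items, ratings):
            err = r - (mu + P[u] @ Q[i])
            p_old = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return mu, P, Q

def predict(model, users, items):
    """Predict one rating per (user, item) pair."""
    mu, P, Q = model
    return mu + np.sum(P[users] * Q[items], axis=1)

# Step 1: explicit ratings (toy data) train the rating model.
exp_users = np.array([0, 0, 1, 1, 2])
exp_items = np.array([0, 1, 0, 2, 1])
exp_ratings = np.array([5.0, 3.0, 4.0, 1.0, 2.0])
model = fit_explicit_mf(exp_users, exp_items, exp_ratings, num_users=3, num_items=3)

# Step 2: predict r* for implicit (u, i) pairs to build the weak explicit dataset.
imp_users = np.array([0, 2, 2])
imp_items = np.array([2, 0, 2])
weak_ratings = predict(model, imp_users, imp_items)
weak_dataset = list(zip(imp_users.tolist(), imp_items.tolist(), weak_ratings.tolist()))

# Step 3: evaluate on an explicit pair that was never used for training.
test_users, test_items, test_ratings = np.array([1]), np.array([1]), np.array([3.0])
rmse = float(np.sqrt(np.mean((predict(model, test_users, test_items) - test_ratings) ** 2)))
print(weak_dataset, rmse)
```

In the notebook, a second model is then trained on the weak dataset and compared against the first on the same held-out test split.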

## Explicit Model Experiments

This section contains all the experiments based on the explicit matrix factorization model.

### Explicit Rate Model

In [1]:
import utils
from spotlight.evaluation import rmse_score

dataset_recommend_train, dataset_recommend_test, dataset_recommend_dev, dataset_read, dataset_shelve = utils.parse_goodreads(
    path='/local/terrier/Collections/Recommendations/Goodreads/goodreads_interactions_comics_graphic.json.gz')

print('Explicit dataset (TEST) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_test.ratings), ','),
          format(dataset_recommend_test.num_users, ','),
          format(dataset_recommend_test.num_items, ',')))

print('Explicit dataset (VALID) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_dev.ratings), ','),
          format(dataset_recommend_dev.num_users, ','),
          format(dataset_recommend_dev.num_items, ',')))

print('Explicit dataset (TRAIN) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_train.ratings), ','),
          format(dataset_recommend_train.num_users, ','),
          format(dataset_recommend_train.num_items, ',')))

print('Implicit dataset (READ) contains %s interactions of %s users and %s items'%(
          format(len(dataset_read.ratings), ','),
          format(dataset_read.num_users, ','),
          format(dataset_read.num_items, ',')))

print('Implicit dataset (SHELVE) contains %s interactions of %s users and %s items'%(
          format(len(dataset_shelve.ratings), ','),
          format(dataset_shelve.num_users, ','),
          format(dataset_shelve.num_items, ',')))

# train the explicit model based on recommend feedback
model = utils.train_explicit(train_interactions=dataset_recommend_train, 
                             valid_interactions=dataset_recommend_dev,
                             run_name='model_goodreads_comics_explicit_rate')

# evaluate the new model
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
rmse = rmse_score(model=model, test=dataset_recommend_test)
print('-'*20)
print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))

Explicit dataset (TEST) contains 412,459 interactions of 81,172 users and 88,857 items
Explicit dataset (VALID) contains 412,460 interactions of 81,172 users and 88,857 items
Explicit dataset (TRAIN) contains 3,299,673 interactions of 81,172 users and 88,857 items
Implicit dataset (READ) contains 4,293,460 interactions of 81,172 users and 88,857 items
Implicit dataset (SHELVE) contains 6,022,657 interactions of 81,172 users and 88,857 items
--------------------
RMSE: 0.3775
MRR: 0.0381
nDCG: 0.0443
nDCG@10: 0.0318
nDCG@5: 0.0202
MAP: 0.0189
success@10: 0.1239
success@5: 0.0653
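
For reference, the ranking metrics printed above are usually defined per user from the rank positions of the held-out relevant items, then averaged over users. The sketch below shows common textbook definitions of reciprocal rank, success@k, and binary-relevance nDCG@k. It is an assumption that `utils.evaluate` follows these exact definitions, and the function names are illustrative.

```python
import numpy as np

def reciprocal_rank(ranked_items, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def success_at_k(ranked_items, relevant, k):
    """1 if at least one relevant item appears in the top k, else 0."""
    return float(any(item in relevant for item in ranked_items[:k]))

def ndcg_at_k(ranked_items, relevant, k):
    """Binary-relevance nDCG: DCG of the top k divided by the ideal DCG."""
    gains = [1.0 if item in relevant else 0.0 for item in ranked_items[:k]]
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))
    dcg = float(np.dot(gains, discounts))
    ideal_hits = min(len(relevant), k)
    idcg = float(np.sum(1.0 / np.log2(np.arange(2, ideal_hits + 2))))
    return dcg / idcg if idcg > 0 else 0.0

# One user's ranked recommendations and held-out relevant items (toy values).
ranked = [7, 3, 9, 1, 4]
relevant = {9, 4}
rr = reciprocal_rank(ranked, relevant)
s2 = success_at_k(ranked, relevant, k=2)
n5 = ndcg_at_k(ranked, relevant, k=5)
print(rr, s2, n5)
```

MRR is the mean of the reciprocal rank over all test users; success@k and nDCG@k are averaged the same way.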


### Remove all valid/test ratings

In [2]:
# collect every (user, item) pair that appears in the test or dev split
test_interact = set(zip(dataset_recommend_test.user_ids, dataset_recommend_test.item_ids))
test_interact.update(zip(dataset_recommend_dev.user_ids, dataset_recommend_dev.item_ids))

# mark those pairs as missing (-1) in the implicit dataset so their true ratings cannot leak into training
for idx, (uid, iid) in enumerate(zip(dataset_read.user_ids, dataset_read.item_ids)):
    if (uid, iid) in test_interact:
        dataset_read.ratings[idx] = -1
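
The cell above checks each interaction in a Python loop, which can be slow for millions of rows. A vectorized variant is sketched below on toy arrays. The function name `mask_held_out` and the pair-encoding trick are illustrative assumptions and are not part of `utils`.

```python
import numpy as np

def mask_held_out(user_ids, item_ids, ratings, held_user_ids, held_item_ids, num_items):
    """Return a copy of `ratings` with held-out (user, item) pairs set to -1."""
    # Encode each (user, item) pair as a single integer so pairs can be compared in bulk.
    keys = np.asarray(user_ids, dtype=np.int64) * num_items + np.asarray(item_ids, dtype=np.int64)
    held = np.asarray(held_user_ids, dtype=np.int64) * num_items + np.asarray(held_item_ids, dtype=np.int64)
    masked = np.array(ratings, dtype=np.float32, copy=True)
    masked[np.isin(keys, held)] = -1
    return masked

# Toy implicit dataset with four interactions; two of them are held out.
user_ids = np.array([0, 0, 1, 2])
item_ids = np.array([1, 2, 2, 0])
ratings = np.array([5, 3, 4, 2])
masked = mask_held_out(user_ids, item_ids, ratings,
                       held_user_ids=[0, 2], held_item_ids=[2, 0], num_items=3)
print(masked)
```

The encoding `user * num_items + item` is unique as long as every item id is smaller than `num_items`.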

### Explicit Read Model

Use the **explicit rate model** trained in the previous section to annotate the **missing values** in the **read** dataset.

In [3]:
# annotate the missing values in the read dataset based on the explicit recommend model
dataset_read = utils.annotate(interactions=dataset_read, 
                              model=model, 
                              run_name='dataset_goodreads_comics_read_explicit_annotated')

# train a new explicit model on the annotated read dataset
model = utils.train_explicit(train_interactions=dataset_read, 
                             valid_interactions=dataset_recommend_dev,
                             run_name='model_goodreads_comics_explicit_read')

# evaluate the new model
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
rmse = rmse_score(model=model, test=dataset_recommend_test)
print('-'*20)
print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))

epoch 1 start at:  Tue Apr 23 11:18:42 2019
epoch 1 end at:  Tue Apr 23 11:20:56 2019
RMSE: 0.3850
epoch 2 start at:  Tue Apr 23 11:21:01 2019
epoch 2 end at:  Tue Apr 23 11:23:16 2019
RMSE: 0.3836
epoch 3 start at:  Tue Apr 23 11:23:21 2019
epoch 3 end at:  Tue Apr 23 11:25:35 2019
RMSE: 0.3901
--------------------
RMSE: 0.3903
MRR: 0.0773
nDCG: 0.0663
nDCG@10: 0.0575
nDCG@5: 0.0487
MAP: 0.0386
success@10: 0.1709
success@5: 0.1219


## Implicit Model Experiments

This section contains all the experiments based on the implicit matrix factorization model.

### Implicit Model using Negative Sampling
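
Implicit feedback contains only positive interactions, so training needs contrasting examples. One common approach, sketched below, samples a random unseen item as a negative for each observed pair and applies a pairwise (BPR-style) update. This illustrates the general idea only: whether `utils.train_implicit_negative_sampling` uses this loss or sampling scheme is an assumption, and all names and hyperparameters here are illustrative.

```python
import numpy as np

def train_bpr(users, items, num_users, num_items,
              dim=8, lr=0.1, reg=0.01, epochs=200, seed=0):
    """Pairwise matrix factorization with uniformly sampled negatives."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(num_users, dim))  # user factors
    Q = rng.normal(scale=0.1, size=(num_items, dim))  # item factors
    seen = set(zip(users.tolist(), items.tolist()))
    for _ in range(epochs):
        for u, i in zip(users.tolist(), items.tolist()):
            # Sample an item the user has not interacted with
            # (assumes every user has at least one unseen item).
            j = int(rng.integers(num_items))
            while (u, j) in seen:
                j = int(rng.integers(num_items))
            x = P[u] @ (Q[i] - Q[j])      # score margin: positive minus negative
            g = 1.0 / (1.0 + np.exp(x))   # gradient weight of log(sigmoid(x))
            p_old = P[u].copy()
            P[u] += lr * (g * (Q[i] - Q[j]) - reg * P[u])
            Q[i] += lr * (g * p_old - reg * Q[i])
            Q[j] += lr * (-g * p_old - reg * Q[j])
    return P, Q

# Toy positive-only interactions: 3 users, 5 items.
users = np.array([0, 0, 1, 1, 2, 2])
items = np.array([0, 1, 1, 2, 3, 4])
P, Q = train_bpr(users, items, num_users=3, num_items=5)
scores_user0 = Q @ P[0]
print(scores_user0)
```

The resulting scores only order items for a user; they are not on the rating scale.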

In [3]:
import utils
from spotlight.evaluation import rmse_score

dataset_recommend_train, dataset_recommend_test, dataset_recommend_dev, dataset_read, dataset_shelve = utils.parse_goodreads(
    path='/local/terrier/Collections/Recommendations/Goodreads/goodreads_interactions_comics_graphic.json.gz')

print('Explicit dataset (TEST) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_test.ratings), ','),
          format(dataset_recommend_test.num_users, ','),
          format(dataset_recommend_test.num_items, ',')))

print('Explicit dataset (VALID) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_dev.ratings), ','),
          format(dataset_recommend_dev.num_users, ','),
          format(dataset_recommend_dev.num_items, ',')))

print('Explicit dataset (TRAIN) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_train.ratings), ','),
          format(dataset_recommend_train.num_users, ','),
          format(dataset_recommend_train.num_items, ',')))

print('Implicit dataset (READ) contains %s interactions of %s users and %s items'%(
          format(len(dataset_read.ratings), ','),
          format(dataset_read.num_users, ','),
          format(dataset_read.num_items, ',')))

print('Implicit dataset (SHELVE) contains %s interactions of %s users and %s items'%(
          format(len(dataset_shelve.ratings), ','),
          format(dataset_shelve.num_users, ','),
          format(dataset_shelve.num_items, ',')))

# train the implicit model on the read feedback using negative sampling
model = utils.train_implicit_negative_sampling(train_interactions=dataset_read, 
                                               valid_interactions=dataset_recommend_dev,
                                               run_name='model_goodreads_comics_implicit')

# evaluate the new model
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
rmse = rmse_score(model=model, test=dataset_recommend_test)
print('-'*20)
print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))

Explicit dataset (TEST) contains 412,459 interactions of 81,172 users and 88,857 items
Explicit dataset (VALID) contains 412,460 interactions of 81,172 users and 88,857 items
Explicit dataset (TRAIN) contains 3,299,673 interactions of 81,172 users and 88,857 items
Implicit dataset (READ) contains 4,293,460 interactions of 81,172 users and 88,857 items
Implicit dataset (SHELVE) contains 6,022,657 interactions of 81,172 users and 88,857 items
epoch 1 start at:  Sat Apr 20 12:26:55 2019
epoch 1 end at:  Sat Apr 20 12:31:14 2019
MRR: 0.0666
epoch 2 start at:  Sat Apr 20 12:54:06 2019
epoch 2 end at:  Sat Apr 20 12:56:43 2019
MRR: 0.0666
epoch 3 start at:  Sat Apr 20 13:19:17 2019
epoch 3 end at:  Sat Apr 20 13:21:54 2019
MRR: 0.0644
--------------------
RMSE: 3.6611
MRR: 0.0637
nDCG: 0.0522
nDCG@10: 0.0412
nDCG@5: 0.0325
MAP: 0.0256
success@10: 0.1563
success@5: 0.0950


## Popularity
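
A popularity baseline ranks items by how often they occur in the training interactions and recommends the same ranking to every user. The class below is a hypothetical stand-in written for illustration. The interface of the notebook's own `PopularityModel` (imported from the local `popularity` module) is not shown in the source, so the `fit` and `predict` signatures here are assumptions.

```python
import numpy as np

class ToyPopularityModel:
    """Illustrative baseline: an item's score is its training interaction count."""

    def fit(self, item_ids, num_items):
        # Count how many training interactions each item received.
        self.scores = np.bincount(np.asarray(item_ids), minlength=num_items).astype(np.float32)
        return self

    def predict(self, user_id, item_ids=None):
        # The score ignores the user: every user sees the same ranking.
        if item_ids is None:
            return self.scores
        return self.scores[np.asarray(item_ids)]

train_items = np.array([2, 0, 2, 1, 2, 0])
model = ToyPopularityModel().fit(train_items, num_items=4)
top = np.argsort(-model.predict(user_id=0), kind="stable")
print(top.tolist())  # [2, 0, 1, 3]
```

Because the scores are interaction counts rather than ratings, RMSE is not meaningful for this baseline.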

In [4]:
import utils
from spotlight.evaluation import rmse_score
from popularity import PopularityModel

dataset_recommend_train, dataset_recommend_test, dataset_recommend_dev, dataset_read, dataset_shelve = utils.parse_goodreads(
    path='/local/terrier/Collections/Recommendations/Goodreads/goodreads_interactions_comics_graphic.json.gz')

print('Explicit dataset (TEST) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_test.ratings), ','),
          format(dataset_recommend_test.num_users, ','),
          format(dataset_recommend_test.num_items, ',')))

print('Explicit dataset (VALID) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_dev.ratings), ','),
          format(dataset_recommend_dev.num_users, ','),
          format(dataset_recommend_dev.num_items, ',')))

print('Explicit dataset (TRAIN) contains %s interactions of %s users and %s items'%(
          format(len(dataset_recommend_train.ratings), ','),
          format(dataset_recommend_train.num_users, ','),
          format(dataset_recommend_train.num_items, ',')))

print('Implicit dataset (READ) contains %s interactions of %s users and %s items'%(
          format(len(dataset_read.ratings), ','),
          format(dataset_read.num_users, ','),
          format(dataset_read.num_items, ',')))

print('Implicit dataset (SHELVE) contains %s interactions of %s users and %s items'%(
          format(len(dataset_shelve.ratings), ','),
          format(dataset_shelve.num_users, ','),
          format(dataset_shelve.num_items, ',')))

# fit the popularity baseline on the explicit training split
model = PopularityModel()
print('fit the model')
model.fit(interactions=dataset_recommend_train)

# evaluate the new model
print('evaluate the model')
mrr, ndcg, ndcg10, ndcg_5, mmap, success_10, success_5 = utils.evaluate(interactions=dataset_recommend_test,
                                                                        model=model,
                                                                        topk=20)
# rmse = rmse_score(model=model, test=dataset_recommend_test, batch_size=512)
# print('-'*20)
# print('RMSE: {:.4f}'.format(rmse))
print('MRR: {:.4f}'.format(mrr))
print('nDCG: {:.4f}'.format(ndcg))
print('nDCG@10: {:.4f}'.format(ndcg10))
print('nDCG@5: {:.4f}'.format(ndcg_5))
print('MAP: {:.4f}'.format(mmap))
print('success@10: {:.4f}'.format(success_10))
print('success@5: {:.4f}'.format(success_5))

Explicit dataset (TEST) contains 412,459 interactions of 81,172 users and 88,857 items
Explicit dataset (VALID) contains 412,460 interactions of 81,172 users and 88,857 items
Explicit dataset (TRAIN) contains 3,299,673 interactions of 81,172 users and 88,857 items
Implicit dataset (READ) contains 4,293,460 interactions of 81,172 users and 88,857 items
Implicit dataset (SHELVE) contains 6,022,657 interactions of 81,172 users and 88,857 items
fit the model
evaluate the model
MRR: 0.0554
nDCG: 0.0458
nDCG@10: 0.0359
nDCG@5: 0.0291
MAP: 0.0233
success@10: 0.1279
success@5: 0.0814
