In [1]:
import os
import pandas as pd

## Instacart: including the item add-to-cart order for next-basket prediction

A new dataset was created with the provided preprocessing script, which now also keeps the 'add_to_cart_order' column.

The evaluation scores are compared between a model that uses the add-to-cart order and a model that does not.

Both models need to be trained, because the preprocessing script did not reproduce the dataset that was already provided, despite all attempts to make them match.

Optional preparatory steps (optional because the resulting data files are already present in the new GitHub repo):
download the original Instacart files and place them in the 'DataSource/instacart' folder
1. orders.csv
2. order_products__prior.csv
3. order_products__train.csv


Create the preprocessed files (already on GitHub):
```

# a bare '!cd' runs in its own subshell and does not persist, so chain both commands
!cd preprocess && python ./instacart.py --use_item_order

```

The result is two CSV files in the 'dataset/' folder:
1. instacart_itemorder_future.csv
2. instacart_itemorder_history.csv

and four JSON files in the 'jsondata/' folder:
1. instacart_itemorder_future.json
2. instacart_itemorder_history.json
3. instacart_itemorder_orders_future.json
4. instacart_itemorder_orders_history.json

Note that the filenames start with 'instacart_itemorder', which marks the dataset newly created for the item-order extension. This dataset is later used to train both models, with and without the item-order embedding layer.
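Before moving on, it can help to confirm that all six files listed above are actually present. The following is a minimal sketch, assuming the notebook runs from the repository root; the helper name `missing_outputs` is hypothetical and not part of the repository.

```python
import os

# File names exactly as listed above; the folder layout relative to the
# repository root is an assumption.
EXPECTED_FILES = [
    'dataset/instacart_itemorder_future.csv',
    'dataset/instacart_itemorder_history.csv',
    'jsondata/instacart_itemorder_future.json',
    'jsondata/instacart_itemorder_history.json',
    'jsondata/instacart_itemorder_orders_future.json',
    'jsondata/instacart_itemorder_orders_history.json',
]


def missing_outputs(root='.'):
    """Return the expected preprocessing outputs that are not found under root."""
    return [
        path for path in EXPECTED_FILES
        if not os.path.isfile(os.path.join(root, *path.split('/')))
    ]
```

Calling `missing_outputs()` returns an empty list when every file is in place; any returned path points at a preprocessing step that still has to be run.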

In [2]:
# Why the original 'add_to_cart_order' column matters: the row order of a basket
# in the existing data cannot naively be assumed to be the add-to-cart order.
def incorrect_order_percentage(csvfiles):
    """Percentage of items whose row position in the basket differs from 'add_to_cart_order'."""
    correct_position = 0
    incorrect_position = 0
    # loop over both the future and the history file
    for file in csvfiles:
        df = pd.read_csv(os.path.join('dataset', file))
        for _, group in df.groupby(['user_id', 'order_number']):
            # i + 1 is the 1-based row position of the item within its basket
            for i, item_pos in enumerate(group['add_to_cart_order'].astype(int)):
                if i + 1 == item_pos:
                    correct_position += 1
                else:
                    incorrect_position += 1
    total = correct_position + incorrect_position
    incorrect_percentage = (incorrect_position / total) * 100
    return incorrect_percentage

history_file = 'instacart_itemorder_history.csv'
future_file = 'instacart_itemorder_future.csv'
incorrect_perc = incorrect_order_percentage([history_file, future_file])
print('The percentage of items that would have an incorrect order assigned if original "add_to_cart_order" column was not retrieved =', incorrect_perc)

The percentage of items that would have an incorrect order assigned if original "add_to_cart_order" column was not retrieved = 19.49978211675793
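The same check can be written without explicit Python loops. The sketch below is one possible vectorized variant, assuming the same three columns ('user_id', 'order_number', 'add_to_cart_order') as above; the function name and the toy data are illustrative only and do not come from the Instacart files.

```python
import pandas as pd


def incorrect_order_percentage_vectorized(df):
    """Percentage of rows whose position in the basket differs from 'add_to_cart_order'."""
    # 1-based row position of each item within its (user_id, order_number) basket
    row_position = df.groupby(['user_id', 'order_number']).cumcount() + 1
    mismatched = row_position != df['add_to_cart_order'].astype(int)
    return mismatched.mean() * 100


# Toy data: the first basket stores its second and third items in swapped order.
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2],
    'order_number': [1, 1, 1, 1, 1],
    'add_to_cart_order': [1, 3, 2, 1, 2],
})
toy_percentage = incorrect_order_percentage_vectorized(toy)
```

On the toy frame, two of the five rows sit at a position that differs from their recorded add-to-cart order. For the real data, the history and future frames would be concatenated first so that the percentage is taken over both files, as in the loop version.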


#### Step 1. Create new folds

In [3]:
!python keyset_fold.py --dataset instacart_itemorder --fold_id 0 --use_item_order
!python keyset_fold.py --dataset instacart_itemorder --fold_id 1 --use_item_order
!python keyset_fold.py --dataset instacart_itemorder --fold_id 2 --use_item_order
!python keyset_fold.py --dataset instacart_itemorder --fold_id 3 --use_item_order
!python keyset_fold.py --dataset instacart_itemorder --fold_id 4 --use_item_order

{'item_num': 13897, 'max_addorder': 96, 'train': ['12157', '7622', '6262', '15046', '3664', '12314', '13355', '19409', '11290', '17874', '3799', '7258', '10031', '18691', '15214', '6561', '2746', '4999', '9354', '17485', '15986', '18337', '18252', '5825', '10532', '14643', '12193', '9220', '730', '8724', '15144', '582', '12496', '17132', '3876', '3681', '11320', '1623', '2827', '6032', '5190', '12497', '10666', '8401', '15503', '18614', '15960', '17062', '2672', '9868', '6137', '11330', '12590', '3115', '11220', '5187', '12439', '15172', '5265', '16391', '13884', '17471', '16502', '11594', '7697', '225', '7365', '11840', '16859', '8003', '17983', '3149', '10911', '6947', '15483', '7766', '1222', '8985', '968', '17617', '16511', '17648', '12689', '4493', '11055', '5022', '8363', '7453', '10414', '6769', '4319', '19049', '4556', '226', '8059', '3071', '10221', '7048', '8411', '7944', '5661', '9691', '13368', '14146', '5206', '1415', '12851', '11455', '11032', '10497', '13176', '1539', '7
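The printed keyset is too long to read in full. A short summary per key is easier to inspect; the sketch below assumes the keyset is stored as a single JSON object, as the printed dictionary suggests. Only the keys 'item_num', 'max_addorder' and 'train' are visible in the truncated output above, so the helper makes no assumption about any other keys, and its name `summarize_keyset` is hypothetical.

```python
import json


def summarize_keyset(path):
    """Map each keyset key to its value, or to its length when the value is a list."""
    with open(path) as f:
        keyset = json.load(f)
    return {
        key: len(value) if isinstance(value, list) else value
        for key, value in keyset.items()
    }
```

The trainer logs refer to the fold files as `../keyset/instacart_itemorder_keyset_0.json` relative to the `methods` folder, so from the repository root the call would presumably be `summarize_keyset('keyset/instacart_itemorder_keyset_0.json')`.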

#### Step 2. Training the models
---

In [4]:
os.chdir("./methods")

##### Train 5 folds with item order

In [None]:
!python dream/trainer.py --dataset instacart_itemorder --fold_id 0 --use_attention --use_item_order

In [53]:
!python dream/trainer.py --dataset instacart_itemorder --fold_id 1 --use_attention --use_item_order

../keyset/instacart_itemorder_keyset_1.json
Init training dataset
Init validation dataset
Init model
Device: cpu
Model store location: instacart_itemorder-recall20-0-1-attention-itemorder-Jun-29-2023_03-50-36.pth
Start valid: 0.0009359834366478026 {'recall10': 0.0006563786882907152, 'recall20': 0.0009359834366478026, 'recall40': 0.00179726118221879, 'ndcg5': 0.0005027740844525397, 'ndcg10': 0.0005895470967516303, 'ndcg20': 0.0007128480938263237, 'ndcg40': 0.0010627913288772106}
epoch 0 training [time: 167.39s, train loss:60.5613]
epoch 0 evaluating [time: 3.55s, valid_score (recall20): 0.103009]
valid result: 
recall10 : 0.07410857826471329    recall20 : 0.10300923883914948    recall40 : 0.14560405910015106    ndcg5 : 0.10943804681301117    ndcg10 : 0.09945566207170486    ndcg20 : 0.10146596282720566    ndcg40 : 0.11750207841396332    
Saving current best: models/instacart/instacart_itemorder-recall20-0-1-attention-itemorder-Jun-29-2023_03-50-36.pth
epoch 1 training [time: 160.79s, tra



In [54]:
!python dream/trainer.py --dataset instacart_itemorder --fold_id 2 --use_attention --use_item_order

../keyset/instacart_itemorder_keyset_2.json
Init training dataset
Init validation dataset
Init model
Device: cpu
Model store location: instacart_itemorder-recall20-0-2-attention-itemorder-Jun-29-2023_04-11-30.pth
Start valid: 0.000994531437754631 {'recall10': 0.0005268194945529103, 'recall20': 0.000994531437754631, 'recall40': 0.0020498165395110846, 'ndcg5': 0.0003022913006134331, 'ndcg10': 0.000497611821629107, 'ndcg20': 0.0007274574018083513, 'ndcg40': 0.0011537561658769846}
epoch 0 training [time: 165.56s, train loss:60.6811]
epoch 0 evaluating [time: 3.69s, valid_score (recall20): 0.096090]
valid result: 
recall10 : 0.06670506298542023    recall20 : 0.09609038382768631    recall40 : 0.13635265827178955    ndcg5 : 0.10321035236120224    ndcg10 : 0.09217000752687454    ndcg20 : 0.09477505832910538    ndcg40 : 0.11043452471494675    
Saving current best: models/instacart/instacart_itemorder-recall20-0-2-attention-itemorder-Jun-29-2023_04-11-30.pth
epoch 1 training [time: 153.29s, trai



In [55]:
!python dream/trainer.py --dataset instacart_itemorder --fold_id 3 --use_attention --use_item_order

../keyset/instacart_itemorder_keyset_3.json
Init training dataset
Init validation dataset
Init model
Device: cpu
Model store location: instacart_itemorder-recall20-0-3-attention-itemorder-Jun-29-2023_04-39-24.pth
Start valid: 0.0006741994875483215 {'recall10': 0.00044630104093812406, 'recall20': 0.0006741994875483215, 'recall40': 0.0020171517971903086, 'ndcg5': 0.00013752331142313778, 'ndcg10': 0.0003133654536213726, 'ndcg20': 0.0004023687506560236, 'ndcg40': 0.00092763063730672}
epoch 0 training [time: 170.48s, train loss:60.9876]
epoch 0 evaluating [time: 3.51s, valid_score (recall20): 0.099713]
valid result: 
recall10 : 0.07085257023572922    recall20 : 0.09971264004707336    recall40 : 0.14000153541564941    ndcg5 : 0.10840389132499695    ndcg10 : 0.09686875343322754    ndcg20 : 0.09930200129747391    ndcg40 : 0.11426171660423279    
Saving current best: models/instacart/instacart_itemorder-recall20-0-3-attention-itemorder-Jun-29-2023_04-39-24.pth
epoch 1 training [time: 169.88s, t



In [56]:
!python dream/trainer.py --dataset instacart_itemorder --fold_id 4 --use_attention --use_item_order

../keyset/instacart_itemorder_keyset_4.json
Init training dataset
Init validation dataset
Init model
Device: cpu
Model store location: instacart_itemorder-recall20-0-4-attention-itemorder-Jun-29-2023_05-00-44.pth
Start valid: 0.0005994487437419593 {'recall10': 0.00026778061874210835, 'recall20': 0.0005994487437419593, 'recall40': 0.0012006836477667093, 'ndcg5': 0.00021796928194817156, 'ndcg10': 0.00026244448963552713, 'ndcg20': 0.0004299110150896013, 'ndcg40': 0.0006630430580116808}
epoch 0 training [time: 191.82s, train loss:60.7005]
epoch 0 evaluating [time: 3.78s, valid_score (recall20): 0.103235]
valid result: 
recall10 : 0.07262606173753738    recall20 : 0.10323484987020493    recall40 : 0.1449258178472519    ndcg5 : 0.1079384982585907    ndcg10 : 0.0994139164686203    ndcg20 : 0.10167975723743439    ndcg40 : 0.11812753975391388    
Saving current best: models/instacart/instacart_itemorder-recall20-0-4-attention-itemorder-Jun-29-2023_05-00-44.pth
epoch 1 training [time: 187.97s, t



##### Train 5 folds without item order
Use the same 'instacart_itemorder' dataset as for the item-order models, since its size differs from the dataset that was originally provided, but omit the '--use_item_order' argument.
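The two training variants therefore differ only in one flag. As a sketch, the ten trainer invocations can be generated instead of typed out; the script path and flags are taken from the cells in this notebook, while the helper name `trainer_commands` is hypothetical.

```python
import sys


def trainer_commands(use_item_order, dataset='instacart_itemorder', folds=range(5)):
    """Build one dream/trainer.py command per fold, with or without '--use_item_order'."""
    commands = []
    for fold_id in folds:
        cmd = [sys.executable, 'dream/trainer.py',
               '--dataset', dataset,
               '--fold_id', str(fold_id),
               '--use_attention']
        if use_item_order:
            # the only difference between the two model variants
            cmd.append('--use_item_order')
        commands.append(cmd)
    return commands


with_order = trainer_commands(use_item_order=True)
without_order = trainer_commands(use_item_order=False)
```

Each command could then be passed to `subprocess.run(cmd, check=True)` from the `methods` folder. Running them one by one in separate cells, as done here, keeps the output of each fold visible.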

In [57]:
!python dream/trainer.py --dataset instacart_itemorder --fold_id 0 --use_attention

../keyset/instacart_itemorder_keyset_0.json
Init training dataset
Init validation dataset
Init model
Device: cpu
Model store location: instacart_itemorder-recall20-0-0-attention--Jun-29-2023_05-24-14.pth
Start valid: 0.0005408658180385828 {'recall10': 0.00016321867587976158, 'recall20': 0.0005408658180385828, 'recall40': 0.0014063701964914799, 'ndcg5': 0.0, 'ndcg10': 0.0001167338268714957, 'ndcg20': 0.0003427135234232992, 'ndcg40': 0.000724481069482863}
epoch 0 training [time: 167.55s, train loss:62.5063]
epoch 0 evaluating [time: 3.05s, valid_score (recall20): 0.099969]
valid result: 
recall10 : 0.07319536805152893    recall20 : 0.09996878355741501    recall40 : 0.14476479589939117    ndcg5 : 0.11584934592247009    ndcg10 : 0.10264968872070312    ndcg20 : 0.10257020592689514    ndcg40 : 0.11972079426050186    
Saving current best: models/instacart/instacart_itemorder-recall20-0-0-attention--Jun-29-2023_05-24-14.pth
epoch 1 training [time: 166.52s, train loss:32.8255]
epoch 1 evaluatin



In [58]:
!python dream/trainer.py --dataset instacart_itemorder --fold_id 1 --use_attention

../keyset/instacart_itemorder_keyset_1.json
Init training dataset
Init validation dataset
Init model
Device: cpu
Model store location: instacart_itemorder-recall20-0-1-attention--Jun-29-2023_05-56-35.pth
Start valid: 0.0005422802059911191 {'recall10': 6.42673549009487e-05, 'recall20': 0.0005422802059911191, 'recall40': 0.0011260382598266006, 'ndcg5': 0.00013752331142313778, 'ndcg10': 8.924322901293635e-05, 'ndcg20': 0.00035734378616325557, 'ndcg40': 0.0006392095820046961}
epoch 0 training [time: 128.86s, train loss:63.0638]
epoch 0 evaluating [time: 2.85s, valid_score (recall20): 0.103594]
valid result: 
recall10 : 0.07328449934720993    recall20 : 0.10359368473291397    recall40 : 0.14623355865478516    ndcg5 : 0.10945820808410645    ndcg10 : 0.09938158094882965    ndcg20 : 0.10186122357845306    ndcg40 : 0.11783308535814285    
Saving current best: models/instacart/instacart_itemorder-recall20-0-1-attention--Jun-29-2023_05-56-35.pth
epoch 1 training [time: 144.43s, train loss:32.8023



In [59]:
!python dream/trainer.py --dataset instacart_itemorder --fold_id 2 --use_attention

../keyset/instacart_itemorder_keyset_2.json




Init training dataset
Init validation dataset
Init model
Device: cpu
Model store location: instacart_itemorder-recall20-0-2-attention--Jun-29-2023_06-15-11.pth
Start valid: 0.0010391965042799711 {'recall10': 0.0004482304211705923, 'recall20': 0.0010391965042799711, 'recall40': 0.0012969151139259338, 'ndcg5': 0.00010898464097408578, 'ndcg10': 0.0003250420850235969, 'ndcg20': 0.0006975951255299151, 'ndcg40': 0.0007863059872761369}
epoch 0 training [time: 139.46s, train loss:62.0687]
epoch 0 evaluating [time: 2.94s, valid_score (recall20): 0.096090]
valid result: 
recall10 : 0.06670506298542023    recall20 : 0.09609038382768631    recall40 : 0.13553638756275177    ndcg5 : 0.10321035236120224    ndcg10 : 0.09209811687469482    ndcg20 : 0.09471866488456726    ndcg40 : 0.11019783467054367    
Saving current best: models/instacart/instacart_itemorder-recall20-0-2-attention--Jun-29-2023_06-15-11.pth
epoch 1 training [time: 142.91s, train loss:32.8666]
epoch 1 evaluating [time: 2.92s, valid_sc

In [60]:
!python dream/trainer.py --dataset instacart_itemorder --fold_id 3 --use_attention



../keyset/instacart_itemorder_keyset_3.json
Init training dataset
Init validation dataset
Init model
Device: cpu
Model store location: instacart_itemorder-recall20-0-3-attention--Jun-29-2023_06-33-02.pth
Start valid: 0.0015042972518131137 {'recall10': 0.0006888593779876828, 'recall20': 0.0015042972518131137, 'recall40': 0.002418856369331479, 'ndcg5': 0.0004057177866343409, 'ndcg10': 0.0006853094091638923, 'ndcg20': 0.0010079052299261093, 'ndcg40': 0.0013504477683454752}
epoch 0 training [time: 143.13s, train loss:62.7316]
epoch 0 evaluating [time: 2.79s, valid_score (recall20): 0.099277]
valid result: 
recall10 : 0.07085257023572922    recall20 : 0.09927666932344437    recall40 : 0.14200595021247864    ndcg5 : 0.10827571153640747    ndcg10 : 0.09672239422798157    ndcg20 : 0.09889645129442215    ndcg40 : 0.1147291287779808    
Saving current best: models/instacart/instacart_itemorder-recall20-0-3-attention--Jun-29-2023_06-33-02.pth
epoch 1 training [time: 145.31s, train loss:32.7880]
e



In [61]:
!python dream/trainer.py --dataset instacart_itemorder --fold_id 4 --use_attention

../keyset/instacart_itemorder_keyset_4.json
Init training dataset
Init validation dataset
Init model
Device: cpu
Model store location: instacart_itemorder-recall20-0-4-attention--Jun-29-2023_06-51-15.pth
Start valid: 0.0008187235798686743 {'recall10': 3.382492286618799e-05, 'recall20': 0.0008187235798686743, 'recall40': 0.0025447537191212177, 'ndcg5': 0.0, 'ndcg10': 5.0384493079036474e-05, 'ndcg20': 0.00037656095810234547, 'ndcg40': 0.0010309390490874648}
epoch 0 training [time: 142.11s, train loss:62.8706]
epoch 0 evaluating [time: 2.98s, valid_score (recall20): 0.102358]
valid result: 
recall10 : 0.07262606173753738    recall20 : 0.10235769301652908    recall40 : 0.1442832052707672    ndcg5 : 0.10754142701625824    ndcg10 : 0.09932277351617813    ndcg20 : 0.10141273587942123    ndcg40 : 0.11790873110294342    
Saving current best: models/instacart/instacart_itemorder-recall20-0-4-attention--Jun-29-2023_06-51-15.pth
epoch 1 training [time: 144.86s, train loss:32.8009]
epoch 1 evaluati
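The "valid result" lines in the training logs above pack all metrics into one line of `name : value` pairs. To compare validation scores across folds, such a line can be parsed into a dictionary. This is a minimal sketch that assumes the line format stays exactly as printed above; the function name is hypothetical, and the sample line is the epoch 0 result of fold 1 with item order.

```python
import re

# One metric looks like "recall20 : 0.10300923883914948"
METRIC_PATTERN = re.compile(r'(\w+) : ([0-9.eE+-]+)')


def parse_valid_result(line):
    """Turn a 'valid result' log line into a {metric_name: float} dictionary."""
    return {name: float(value) for name, value in METRIC_PATTERN.findall(line)}


sample_line = (
    "recall10 : 0.07410857826471329    recall20 : 0.10300923883914948    "
    "recall40 : 0.14560405910015106    ndcg5 : 0.10943804681301117    "
    "ndcg10 : 0.09945566207170486    ndcg20 : 0.10146596282720566    "
    "ndcg40 : 0.11750207841396332    "
)
metrics = parse_valid_result(sample_line)
```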



#### Step 3. Making predictions
----

##### Models with item order embeddings
- Pass the checkpoint of each model that was trained on the item order dataset with the item order embedding
- Whether attention is used is deduced from the checkpoint, based on the presence of the layers that attention requires
- Whether the item order embedding layer is used is set in the same way, based on whether that embedding layer is present in the loaded checkpoint
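The idea of deducing the configuration from the checkpoint can be sketched as a check on the parameter names it contains. The marker substrings `'attention'` and `'item_order'` below are assumptions chosen for illustration; the actual layer names in the DREAM checkpoints are not shown in this notebook and may differ.

```python
def detect_features(parameter_names,
                    attention_marker='attention',     # hypothetical layer-name fragment
                    item_order_marker='item_order'):  # hypothetical layer-name fragment
    """Deduce model options from the parameter names stored in a checkpoint."""
    names = list(parameter_names)
    return {
        'use_attention': any(attention_marker in name for name in names),
        'use_item_order': any(item_order_marker in name for name in names),
    }


# Illustrative parameter names only, not read from a real checkpoint.
example_names = ['encoder.embedding.weight', 'attention.query.weight',
                 'item_order_embedding.weight']
example_flags = detect_features(example_names)
```

If a checkpoint is a plain PyTorch state dict, its keys could be passed in directly, for example via `torch.load(path, map_location='cpu').keys()`. Whether the checkpoints saved here have that structure is likewise an assumption.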

In [12]:
!python dream/pred_results.py --dataset instacart_itemorder --fold_id 0 --checkpoint_path models/instacart/instacart_itemorder-recall20-0-0-attention-itemorder-Jun-29-2023_07-14-04.pth
!python dream/pred_results.py --dataset instacart_itemorder --fold_id 1 --checkpoint_path models/instacart/instacart_itemorder-recall20-0-1-attention-itemorder-Jun-29-2023_03-50-36.pth 
!python dream/pred_results.py --dataset instacart_itemorder --fold_id 2 --checkpoint_path models/instacart/instacart_itemorder-recall20-0-2-attention-itemorder-Jun-29-2023_04-11-30.pth 
!python dream/pred_results.py --dataset instacart_itemorder --fold_id 3 --checkpoint_path models/instacart/instacart_itemorder-recall20-0-3-attention-itemorder-Jun-29-2023_04-39-24.pth 
!python dream/pred_results.py --dataset instacart_itemorder --fold_id 4 --checkpoint_path models/instacart/instacart_itemorder-recall20-0-4-attention-itemorder-Jun-29-2023_05-00-44.pth 

  File "c:\Users\Dieko\OneDrive - UvA\Studie\Recommender Systems\temploc\A-Next-Basket-Recommendation-Reality-Check\methods\dream\pred_results.py", line 68
    if not os.path.exists(f'pred/{dataset.split('_')[0]}/{dataset}'):
                                                 ^
SyntaxError: f-string: unmatched '('
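The SyntaxError above is raised because line 68 of `pred_results.py` reuses single quotes inside an f-string that is itself delimited by single quotes. Python versions before 3.12 reject this. One possible fix, shown here as a minimal sketch rather than the repository's actual patch, is to delimit the f-string with double quotes:

```python
import os

dataset = 'instacart_itemorder'

# Outer double quotes let the inner split('_') keep its single quotes.
pred_dir = f"pred/{dataset.split('_')[0]}/{dataset}"

# Same existence check as in the failing line, now syntactically valid.
pred_dir_exists = os.path.exists(pred_dir)
```

For this dataset name, `pred_dir` evaluates to `pred/instacart/instacart_itemorder`.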

##### Models without item order embeddings


In [4]:
os.chdir('methods')

In [12]:
!python dream/pred_results.py --dataset instacart_itemorder --fold_id 0 --checkpoint_path models/instacart/instacart_itemorder-recall20-0-0-attention--Jun-29-2023_05-24-14.pth
!python dream/pred_results.py --dataset instacart_itemorder --fold_id 1 --checkpoint_path models/instacart/instacart_itemorder-recall20-0-1-attention--Jun-29-2023_05-56-35.pth
!python dream/pred_results.py --dataset instacart_itemorder --fold_id 2 --checkpoint_path models/instacart/instacart_itemorder-recall20-0-2-attention--Jun-29-2023_06-15-11.pth
!python dream/pred_results.py --dataset instacart_itemorder --fold_id 3 --checkpoint_path models/instacart/instacart_itemorder-recall20-0-3-attention--Jun-29-2023_06-33-02.pth
!python dream/pred_results.py --dataset instacart_itemorder --fold_id 4 --checkpoint_path models/instacart/instacart_itemorder-recall20-0-4-attention--Jun-29-2023_06-51-15.pth


../keyset/instacart_itemorder_keyset_0.json
Loading model structure and parameters from ['models/instacart/instacart_itemorder-recall20-0-0-attention--Jun-29-2023_05-24-14.pth']
^C
../keyset/instacart_itemorder_keyset_1.json
Loading model structure and parameters from ['models/instacart/instacart_itemorder-recall20-0-1-attention--Jun-29-2023_05-56-35.pth']
^C
../keyset/instacart_itemorder_keyset_2.json
Loading model structure and parameters from ['models/instacart/instacart_itemorder-recall20-0-2-attention--Jun-29-2023_06-15-11.pth']
^C


#### Step 4. Calculating performance results
---

In [11]:
os.chdir('../evaluation')

##### Get performance scores only on the instacart_itemorder dataset
1. use '--use_attention', once with and once without the '--use_item_order' argument
2. use '--datasets instacart_itemorder'

These arguments are used to find the correct prediction files in the folder.

In [12]:

!python model_performance.py --pred_folder ../methods/pred --fold_list [0,1,2,3,4] --datasets instacart_itemorder --use_attention --use_item_order

fold_id,    recall,    ndcg,    phr
0,    0.0727841720,     0.0830084087,    0.4615780005
1,    0.0699177016,     0.0797683420,    0.4525828836
2,    0.0743544337,     0.0832931354,    0.4577229504
3,    0.0739021389,     0.0825977840,    0.4644050373
4,    0.0759451047,     0.0838003653,    0.4592649704
average over folds:
basket size: 10
recall, ndcg, phr: 0.07338071016771039 0.08249360707304078 0.45911076843998966
repeat-explore ratio: 0.22764841942945263 0.7723515805705475
repeat-explore recall 0.10257629290097992 0.038228505970822176
repeat-explore phr: 0.4200849838112977 0.1556908882107235
fold_id,    recall,    ndcg,    phr
0,    0.1018950074,     0.0984167830,    0.5443330763
1,    0.0975921585,     0.0949455568,    0.5235158057
2,    0.1023777058,     0.0984676210,    0.5325109226
3,    0.1043261403,     0.0988882931,    0.5379079928
4,    0.1050892241,     0.0994207479,    0.5368799794
average over folds:
basket size: 20
recall, ndcg, phr: 0.10225604724895905 0.09802780035528

In [13]:

!python model_performance.py --pred_folder ../methods/pred --fold_list [0,1,2,3,4] --datasets instacart_itemorder --use_attention

fold_id,    recall,    ndcg,    phr
0,    0.0727841720,     0.0830016653,    0.4615780005
1,    0.0706802346,     0.0805088924,    0.4536108969
2,    0.0745417562,     0.0837136060,    0.4564379337
3,    0.0739021389,     0.0824604311,    0.4644050373
4,    0.0725379841,     0.0820824610,    0.4456437934
average over folds:
basket size: 10
recall, ndcg, phr: 0.07288925713128026 0.08235341116231823 0.4563351323567206
repeat-explore ratio: 0.22710357234644052 0.7728964276535595
repeat-explore recall 0.10173567855533794 0.0380592894219288
repeat-explore phr: 0.41731704982661755 0.15589681768364833
fold_id,    recall,    ndcg,    phr
0,    0.1029016088,     0.0985788344,    0.5438190696
1,    0.0973776039,     0.0950664514,    0.5206887690
2,    0.1017727129,     0.0984126732,    0.5337959393
3,    0.1033224480,     0.0982654476,    0.5353379594
4,    0.1050892241,     0.0993352644,    0.5368799794
average over folds:
basket size: 20
recall, ndcg, phr: 0.10209271955322882 0.097931734203807
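To put the two runs side by side, the per-fold recall values printed above can be compared directly. The sketch below copies the recall column of both outputs for basket sizes 10 and 20 and reports the fold averages, the mean paired difference, and the number of folds in which the item order model scores higher. The helper name `summarize` is hypothetical, and the comparison is purely descriptive, not a significance test.

```python
from statistics import mean

# Per-fold recall copied from the two outputs above (folds 0 to 4).
recall = {
    10: {
        'with_item_order': [0.0727841720, 0.0699177016, 0.0743544337,
                            0.0739021389, 0.0759451047],
        'without_item_order': [0.0727841720, 0.0706802346, 0.0745417562,
                               0.0739021389, 0.0725379841],
    },
    20: {
        'with_item_order': [0.1018950074, 0.0975921585, 0.1023777058,
                            0.1043261403, 0.1050892241],
        'without_item_order': [0.1029016088, 0.0973776039, 0.1017727129,
                               0.1033224480, 0.1050892241],
    },
}


def summarize(scores):
    """Average both variants and compare them fold by fold."""
    with_order = scores['with_item_order']
    without_order = scores['without_item_order']
    diffs = [a - b for a, b in zip(with_order, without_order)]
    return {
        'mean_with': mean(with_order),
        'mean_without': mean(without_order),
        'mean_diff': mean(diffs),
        'folds_improved': sum(d > 0 for d in diffs),
    }


summary = {basket_size: summarize(scores) for basket_size, scores in recall.items()}
```

The computed means reproduce the "average over folds" recall values reported by `model_performance.py`, which serves as a sanity check that the numbers were copied correctly.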