# Item Response Ranking for DINA

This notebook shows how to train and use IRR-DINA, the DINA model trained with Item Response Ranking (IRR).
Refer to the [IRR doc](../../docs/IRR.md) for more details.
First, we load the data (here we use the a0910 dataset).
Then we train an IRR-DINA model and save its parameters to a file.
Finally, we load the parameters from that file and evaluate the model on the test dataset.

In [1]:
import logging
from longling.lib.structure import AttrDict
from longling import set_logging_info
from EduCDM.IRR import pair_etl as etl, point_etl as vt_etl, extract_item

set_logging_info()

params = AttrDict(
    batch_size=256,
    n_neg=10,
    n_imp=10,
    logger=logging.getLogger(),
    hyper_params={"user_num": 4164, "knowledge_num": 123}
)
item_knowledge = extract_item("../../data/a0910/item.csv", params["hyper_params"]["knowledge_num"], params)
train_data, train_df = etl("../../data/a0910/train.csv", item_knowledge, params)
valid_data, _ = vt_etl("../../data/a0910/valid.csv", item_knowledge, params)
test_data, _ = vt_etl("../../data/a0910/test.csv", item_knowledge, params)

train_data, valid_data, test_data

reading records from ../../data/a0910/item.csv: 100%|██████████| 19529/19529 [00:00<00:00, 55368.84it/s]
rating2triplet: 100%|██████████| 17051/17051 [00:15<00:00, 1107.24it/s]


(<longling.lib.iterator.AsyncLoopIter at 0x1fce3084dc0>,
 <torch.utils.data.dataloader.DataLoader at 0x1fcd3e2df10>,
 <torch.utils.data.dataloader.DataLoader at 0x1fce305e550>)
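
The `rating2triplet` progress bar above suggests that the pair ETL turns point-wise responses into pair-wise training samples. The sketch below illustrates that general idea only: `sample_pairs` is a hypothetical helper, not an EduCDM function, and reading `n_neg` as "number of opposite-score users sampled per response" is an assumption. Imputation of unobserved responses, which `n_imp` presumably controls, is left out.

```python
import random
from collections import defaultdict


def sample_pairs(records, n_neg, seed=0):
    """Pair each response with up to n_neg opposite-score responses on the same item.

    records: iterable of (user_id, item_id, score) with score in {0.0, 1.0}.
    Returns a list of (item_id, higher_scoring_user, lower_scoring_user).
    Hypothetical illustration, not the library's sampling routine.
    """
    rng = random.Random(seed)
    # Group users by item and by score so opposite-score users are easy to find.
    by_item = defaultdict(lambda: {0.0: [], 1.0: []})
    for user_id, item_id, score in records:
        by_item[item_id][score].append(user_id)

    triplets = []
    for user_id, item_id, score in records:
        opposite = by_item[item_id][1.0 - score]
        for other in rng.sample(opposite, min(n_neg, len(opposite))):
            # Always order the pair as (correct user, incorrect user).
            pos, neg = (user_id, other) if score > 0.5 else (other, user_id)
            triplets.append((item_id, pos, neg))
    return triplets


# Toy data: item 7 has one correct and two incorrect responses, item 8 has one response.
records = [(0, 7, 1.0), (1, 7, 0.0), (2, 7, 0.0), (3, 8, 1.0)]
triplets = sample_pairs(records, n_neg=2)
print(triplets)
```

Item 8 yields no pairs here because nobody answered it incorrectly, which is the kind of gap that imputation would have to fill.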

In [2]:
train_df

        user_id  item_id  score
0          1615    12977    1.0
1           782    13124    0.0
2          1084    16475    0.0
3           593     8690    0.0
4           127    14225    1.0
...         ...      ...    ...
186044     2280     6019    0.0
186045      121        2    1.0
186046      601     5425    1.0
186047      573     2412    0.0
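
The model in the next cell is built with `4163 + 1` users and `17746 + 1` items. Assuming the ids are used as zero-based indices into tables of those sizes, every `user_id` and `item_id` in the data must be smaller than the corresponding size. The following is a minimal sketch of such a sanity check; `check_id_ranges` is a hypothetical helper, run here on the first five rows shown above.

```python
def check_id_ranges(rows, user_num, item_num):
    """Return the rows whose ids fall outside [0, user_num) or [0, item_num)."""
    return [
        (user_id, item_id, score)
        for user_id, item_id, score in rows
        if not (0 <= user_id < user_num and 0 <= item_id < item_num)
    ]


# First five rows of the train_df output above.
rows = [
    (1615, 12977, 1.0),
    (782, 13124, 0.0),
    (1084, 16475, 0.0),
    (593, 8690, 0.0),
    (127, 14225, 1.0),
]

bad_rows = check_id_ranges(rows, user_num=4163 + 1, item_num=17746 + 1)
print(bad_rows)  # an empty list means every id is in range
```

To check the full frame, the rows could be supplied with `train_df[["user_id", "item_id", "score"]].itertuples(index=False)`.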


In [3]:
from EduCDM.IRR import DINA

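cdm = DINA(
    4163 + 1,  # number of users, same as hyper_params["user_num"] (4164)
    17746 + 1,  # number of items
    123,  # number of knowledge concepts, same as hyper_params["knowledge_num"]
    ste=True
)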
cdm.train(
    train_data,
    valid_data,
    epoch=2,
)
cdm.save("IRR-DINA.params")

Epoch 0: 727it [01:00, 12.00it/s]
evaluating: 100%|██████████| 101/101 [00:00<00:00, 151.21it/s]
formatting item df: 100%|██████████| 10415/10415 [00:00<00:00, 13243.30it/s]
ranking metrics: 10415it [00:14, 718.34it/s]
Epoch 1: 100%|██████████| 727/727 [01:05<00:00, 11.05it/s]
evaluating: 100%|██████████| 101/101 [00:00<00:00, 130.40it/s]
formatting item df: 100%|██████████| 10415/10415 [00:00<00:00, 11689.41it/s]
ranking metrics: 10415it [00:15, 683.86it/s]
INFO:root:save parameters to IRR-DINA.params


[Epoch 0] Loss: 2.625543, PointLoss: 0.766550, PairLoss: 4.484537
[Epoch 0]
      ndcg@k  precision@k  recall@k      f1@k     len@k  support@k
1   1.000000     0.695919  0.486584  0.540378  1.000000      10415
3   0.894090     0.678829  0.741237  0.689378  1.906961      10415
5   0.895159     0.675855  0.796159  0.713132  2.229573      10415
10  0.894894     0.674277  0.816401  0.720339  2.423428      10415
auc: 0.856217	map: 0.884234	mrr: 0.918452	coverage_error: 3.194929	ranking_loss: 0.406348	len: 2.458569	support: 10415
[Epoch 1] Loss: 2.555724, PointLoss: 0.735666, PairLoss: 4.375782
[Epoch 1]
      ndcg@k  precision@k  recall@k      f1@k     len@k  support@k
1   1.000000     0.697552  0.487173  0.541213  1.000000      10415
3   0.895660     0.680493  0.742846  0.690920  1.906961      10415
5   0.896209     0.676374  0.796789  0.713664  2.229573      10415
10  0.895947     0.674335  0.816513  0.720412  2.423428      10415
auc: 0.859627	map: 0.884662	mrr: 0.919965	coverage_error: 3
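
The logged `Loss` values are numerically consistent with an equal-weight average of `PointLoss` and `PairLoss`. The short check below reproduces that arithmetic from the numbers printed above. The weight of 0.5 is inferred from the log, not taken from the library source, and `combined_loss` is a hypothetical name.

```python
def combined_loss(point_loss, pair_loss, weight=0.5):
    """Weighted sum of the point-wise and pair-wise losses (weight is an assumption)."""
    return weight * point_loss + (1 - weight) * pair_loss


# (PointLoss, PairLoss, logged Loss) copied from the training output above.
logged = {
    0: (0.766550, 4.484537, 2.625543),
    1: (0.735666, 4.375782, 2.555724),
}

for epoch, (point_loss, pair_loss, total) in logged.items():
    recomputed = combined_loss(point_loss, pair_loss)
    # The log is rounded to six decimals, so compare with a small tolerance.
    print(epoch, abs(recomputed - total) < 1e-6)
```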

In [6]:
cdm.load("IRR-DINA.params")
print(cdm.eval(test_data))

INFO:root:load parameters from IRR-DINA.params
evaluating: 100%|██████████| 218/218 [00:01<00:00, 146.09it/s]
formatting item df: 100%|██████████| 13682/13682 [00:01<00:00, 10267.90it/s]
ranking metrics: 13682it [00:25, 538.15it/s]


      ndcg@k  precision@k  recall@k      f1@k     len@k  support@k
1   1.000000     0.694051  0.381306  0.448280  1.000000      13682
3   0.868723     0.680298  0.672331  0.643307  2.268528      13682
5   0.872068     0.673917  0.775802  0.695325  2.981582      13682
10  0.871886     0.669289  0.846420  0.725201  3.723652      13682
auc: 0.794464	map: 0.835652	mrr: 0.888232	coverage_error: 4.901166	ranking_loss: 0.456207	len: 4.075428	support: 13682
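
To make the `precision@k` and `recall@k` columns easier to read, here is a minimal sketch of how such metrics are commonly computed for one ranked list. This is not the evaluation code used by `cdm.eval`, and `precision_recall_at_k` is a hypothetical helper. Dividing by the truncated list length rather than by `k` is also an assumption, suggested by the `len@k` column being smaller than `k`.

```python
def precision_recall_at_k(ranked_labels, k):
    """Precision and recall over the top-k entries of one ranked list.

    ranked_labels: 0/1 relevance labels, ordered by predicted score (best first).
    """
    top_k = ranked_labels[:k]
    hits = sum(top_k)
    total_positive = sum(ranked_labels)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / total_positive if total_positive else 0.0
    return precision, recall


# One toy list with three positives; the second-ranked entry is a miss.
labels = [1, 0, 1, 1]
p1, r1 = precision_recall_at_k(labels, 1)
p3, r3 = precision_recall_at_k(labels, 3)
print(p1, r1, p3, r3)
```

If the reported values are averages of per-list metrics, `f1@k` need not equal the harmonic mean of the averaged `precision@k` and `recall@k`, which would explain why the columns above do not satisfy that identity.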
