# Deterministic Inputs, Noisy “And” gate model (DINA)

This notebook shows how to train and use GDDINA.
First, we show how to load the data (here we use the a0910 dataset).
Then we show how to train a DINA model and persist its parameters.
Finally, we show how to load the parameters from the file and evaluate the model on the test dataset.

The script version can be found in [DINA.py](DINA.py).

## Data Preparation

Before processing the data, we first need to acquire the dataset, as shown in [prepare_dataset.ipynb](prepare_dataset.ipynb).

In [1]:
import ast

import pandas as pd

train_data = pd.read_csv("../../../data/a0910/train.csv")
valid_data = pd.read_csv("../../../data/a0910/valid.csv")
test_data = pd.read_csv("../../../data/a0910/test.csv")
item_data = pd.read_csv("../../../data/a0910/item.csv")

knowledge_num = 123


def code2vector(x):
    # Convert a stringified list of 1-based knowledge codes into a multi-hot vector
    vector = [0] * knowledge_num
    for k in ast.literal_eval(x):  # parses the list literal without running arbitrary code
        vector[k - 1] = 1
    return vector


item_data["knowledge"] = item_data["knowledge_code"].apply(code2vector)
item_data.drop(columns=["knowledge_code"], inplace=True)

train_data = pd.merge(train_data, item_data, on="item_id")
valid_data = pd.merge(valid_data, item_data, on="item_id")
test_data = pd.merge(test_data, item_data, on="item_id")

train_data.head(5)


|   | user_id | item_id | score | knowledge |
|---|---------|---------|-------|-----------|
| 0 | 1615 | 12977 | 1 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 1 | 507 | 12977 | 0 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 2 | 2724 | 12977 | 1 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 3 | 3804 | 12977 | 1 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 4 | 3881 | 12977 | 0 | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
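
The `knowledge` column above is a multi-hot encoding of each item's knowledge codes. As a minimal, self-contained sketch of that encoding on a toy input (the helper name `codes_to_multi_hot` and the 5-concept example are illustrative assumptions, not part of the notebook or of EduCDM):

```python
import ast


def codes_to_multi_hot(raw_codes, num_concepts):
    """Turn a string such as "[1, 3]" into a 0/1 vector of length num_concepts."""
    vector = [0] * num_concepts
    for code in ast.literal_eval(raw_codes):
        vector[code - 1] = 1  # codes are assumed to be 1-based, as in code2vector
    return vector


example = codes_to_multi_hot("[1, 3]", num_concepts=5)
print(example)  # [1, 0, 1, 0, 0]
```

Each position in the vector corresponds to one knowledge concept, so an item that requires concepts 1 and 3 has ones in the first and third positions.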


In [2]:
len(train_data), len(valid_data), len(test_data)

(241071, 33131, 71907)

In [3]:
# Transform data to torch DataLoader (i.e., batchify)
# batch_size is set to 32
import torch
from torch.utils.data import TensorDataset, DataLoader

batch_size = 32

def transform(x, y, z, k, batch_size, **params):
    # x: user ids, y: item ids, z: scores, k: knowledge vectors.
    # Each batch is yielded in the order (user_id, item_id, knowledge, score).
    dataset = TensorDataset(
        torch.tensor(x, dtype=torch.int64),
        torch.tensor(y, dtype=torch.int64),
        torch.tensor(k, dtype=torch.float32),
        torch.tensor(z, dtype=torch.float32)
    )
    return DataLoader(dataset, batch_size=batch_size, **params)


train, valid, test = [
    transform(data["user_id"], data["item_id"], data["score"], data["knowledge"], batch_size)
    for data in [train_data, valid_data, test_data]
]
train, valid, test


(<torch.utils.data.dataloader.DataLoader at 0x20c1fbdc430>,
 <torch.utils.data.dataloader.DataLoader at 0x20c1fbdf040>,
 <torch.utils.data.dataloader.DataLoader at 0x20c1fbdf700>)
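
The progress bars printed during training and testing count batches rather than rows. A quick sanity check, assuming the default `DataLoader` behaviour of keeping the final partial batch (`drop_last=False`), is to divide each split size by the batch size and round up (the `split_sizes` and `num_batches` names are illustrative):

```python
import math

batch_size = 32
# Row counts reported by len(train_data), len(valid_data), len(test_data)
split_sizes = {"train": 241071, "valid": 33131, "test": 71907}

# With drop_last=False, the last (possibly smaller) batch is still yielded
num_batches = {name: math.ceil(n / batch_size) for name, n in split_sizes.items()}
print(num_batches)  # {'train': 7534, 'valid': 1036, 'test': 2248}
```

These values match the totals shown in the training, validation, and test progress bars.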

## Training and Persistence

In [4]:
import logging
logging.getLogger().setLevel(logging.INFO)

In [5]:
from EduCDM import GDDINA

cdm = GDDINA(4164, 17747, knowledge_num)

cdm.train(train, valid, epoch=2)
cdm.save("dina.params")

Epoch 0: 100%|██████████| 7534/7534 [00:54<00:00, 139.51it/s]
evaluating: 100%|██████████| 1036/1036 [00:00<00:00, 1287.18it/s]
Epoch 1: 100%|██████████| 7534/7534 [01:02<00:00, 120.28it/s]
evaluating: 100%|██████████| 1036/1036 [00:00<00:00, 1318.61it/s]
INFO:root:save parameters to dina.params


[Epoch 0] LogisticLoss: 0.705863
[Epoch 0] auc: 0.508466, accuracy: 0.495035
[Epoch 1] LogisticLoss: 0.702710
[Epoch 1] auc: 0.517560, accuracy: 0.504724
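
For intuition about what the model is fitting, the textbook DINA rule can be sketched in a few lines: a student answers correctly with probability `1 - slip` if they master every concept the item requires, and with probability `guess` otherwise. This is a sketch of the classical discrete formulation, not of GDDINA's internals, which are not shown in this notebook; the function and variable names are illustrative assumptions.

```python
def dina_correct_probability(mastery, q_row, slip, guess):
    """Classical DINA: P(correct) = (1 - slip) if all required concepts are mastered, else guess."""
    # eta is True only when every concept required by the item (q == 1) is mastered (m == 1)
    eta = all(m == 1 for m, q in zip(mastery, q_row) if q == 1)
    return (1.0 - slip) if eta else guess


q_row = [1, 0, 1]  # the item requires concepts 1 and 3
p_master = dina_correct_probability([1, 0, 1], q_row, slip=0.1, guess=0.2)
p_partial = dina_correct_probability([1, 1, 0], q_row, slip=0.1, guess=0.2)
print(p_master, p_partial)  # 0.9 0.2
```

The "And" in the model name refers to this conjunctive rule: missing a single required concept drops the student to the guessing probability.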


## Loading and Testing

In [6]:
cdm.load("dina.params")
auc, accuracy = cdm.eval(test)
print("auc: %.6f, accuracy: %.6f" % (auc, accuracy))

INFO:root:load parameters from dina.params
evaluating: 100%|██████████| 2248/2248 [00:01<00:00, 1301.36it/s]


auc: 0.523625, accuracy: 0.509630
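
To help read these numbers, the sketch below computes both metrics on a toy example in plain Python. It assumes accuracy is taken at a 0.5 threshold and uses the pairwise-ranking definition of AUC; how `cdm.eval` computes them internally is not shown in this notebook, and the function names are illustrative.

```python
def accuracy_at_threshold(labels, scores, threshold=0.5):
    """Fraction of predictions that match the label after thresholding the score."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return hits / len(labels)


def pairwise_auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.2]
print(accuracy_at_threshold(labels, scores))  # 0.5
print(pairwise_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, so the test AUC of about 0.52 after two epochs is only slightly above chance.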
