# Deterministic Inputs, Noisy “And” gate model (DINA)

This notebook will show you how to train and use the DINA.
First, we will show how to get the data (here we use Math1 from math2015 as the dataset).
Then we will show how to train a DINA and perform the parameters persistence.
At last, we will show how to load the parameters from the file and evaluate on the test dataset.

The script version could be found in [DINA.py](DINA.ipynb)

## Data Preparation

Before we process the data, we need to first acquire the dataset which is shown in [prepare_dataset.ipynb](prepare_dataset.ipynb)

In [7]:
# Load the data from files
# all index from 0
import numpy as np
import json

data_set_list=('FrcSub','Math1', 'Math2')
data_set_name=data_set_list[2]
if data_set_name == 'FrcSub':
    read_dir='../data/frcSub/'
    sub_prob_index=[]
elif data_set_name == 'Math1':
    read_dir='../data/math1/'
    sub_prob_index=np.loadtxt(read_dir+'sub_prob_index.csv')
elif data_set_name == 'Math2':
    read_dir='../data/math2/'
    sub_prob_index=np.loadtxt(read_dir+'sub_prob_index.csv')

q_m = np.loadtxt(read_dir+'q_m.csv',  delimiter=',')
prob_num, know_num = q_m.shape[0], q_m.shape[1]

# training data
with open(read_dir+'train.json', encoding='utf-8') as file:
    train_set = json.load(file)
stu_num = int(max([x['user_id'] for x in train_set]))+1
R = -1 * np.ones(shape=(stu_num, prob_num))
for log in train_set:
    R[int(log['user_id'])-1, int(log['item_id'])-1] = log['score']

# testing data
with open(read_dir+'test.json', encoding='utf-8') as file:
    test_set = json.load(file)

In [8]:
print(train_set[0], test_set[0])

{'user_id': 0, 'item_id': 0, 'score': 1.0} {'user_id': 0, 'item_id': 15, 'score': 0.0}


In [9]:
len(train_set), len(test_set)

(62576, 15644)

## Training and Persistence

In [10]:
import logging
logging.getLogger().setLevel(logging.INFO)

In [11]:
from DINA import DINA

cdm = DINA(R, q_m, stu_num, prob_num, know_num, skip_value=-1)

cdm.train(epoch=3, epsilon=1e-3)
cdm.save("dina.params")

training: 100%|██████████| 3/3 [05:00<00:00, 100.24s/it]
INFO:root:save parameters to dina.params


## Loading and Testing

In [12]:
cdm.load("dina.params")
print('data_set_name:',data_set_name)
if len(sub_prob_index)>0:
    (obj_acc,obj_auc,obj_rmse,obj_mae),(sub_rmse,sub_mae)=cdm.eval(test_set,sub_prob_index)
    print("obj_acc: %.6f,obj_auc: %.6f,obj_rmse: %.6f, obj_mae: %.6f,\nsub_rmse: %.6f, sub_mae: %.6f"% (
        obj_acc,obj_auc,obj_rmse,obj_mae,sub_rmse,sub_mae))
else:
    obj_acc,obj_auc,obj_rmse,obj_mae=cdm.eval(test_set,sub_prob_index)
    print("obj_acc: %.6f,obj_auc: %.6f,obj_rmse: %.6f, obj_mae: %.6f" % (
        obj_acc,obj_auc,obj_rmse,obj_mae))

INFO:root:load parameters from dina.params


data_set_name: Math2


evaluating: 100%|██████████| 15644/15644 [00:00<00:00, 412781.15it/s]

obj_acc: 0.569440,obj_auc: 0.585643,obj_rmse: 0.515686, obj_mae: 0.460801,
sub_rmse: 0.472127, sub_mae: 0.412716



