# Item Response Theory with Expectation Maximization Optimization (EMIRT)

This notebook shows how to train and use EMIRT.
First, we load the data (here we use the a0910 dataset).
Then we train an EMIRT model and save its parameters to a file.
Finally, we load the parameters from the file and evaluate the model on the test dataset.

The script version can be found in [IRT.py](IRT.py).

In [1]:
# Load the data from files
import pandas as pd

train_data = pd.read_csv("../../../data/a0910/train.csv")
valid_data = pd.read_csv("../../../data/a0910/valid.csv")
test_data = pd.read_csv("../../../data/a0910/test.csv")

train_data.head(5)

Unnamed: 0,user_id,item_id,score
0,1615,12977,1
1,782,13124,0
2,1084,16475,0
3,593,8690,0
4,127,14225,1


In [2]:
len(train_data), len(valid_data), len(test_data)

(186049, 25606, 55760)

In [2]:
# number of students
stu_num = max(max(train_data['user_id']), max(test_data['user_id']))

# number of items (problems)
prob_num = max(max(train_data['item_id']), max(test_data['item_id']))
print(stu_num, prob_num)

4128 17746


In [4]:
import numpy as np

# build the score matrix from the training set (-1 marks unobserved entries)
R = -1 * np.ones(shape=(stu_num, prob_num))
R[train_data['user_id']-1, train_data['item_id']-1] = train_data['score']

# build the list of test-set records (IDs converted to 0-based indices)
test_set = []
for i in range(len(test_data)):
    row = test_data.iloc[i]
    test_set.append({'user_id':int(row['user_id'])-1, 'item_id':int(row['item_id'])-1, 'score':row['score']})
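
The matrix construction above can be checked on a small example. The following is a minimal sketch on a hypothetical toy log (the names `toy`, `toy_R`, `n_students` and `n_items` are illustrative and not part of the notebook): every cell starts at -1, and only the observed (student, item) pairs are overwritten, after shifting the 1-based IDs to 0-based indices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy log: 3 students, 4 items, 1-based IDs as in the cell above.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "item_id": [1, 3, 2, 4],
    "score":   [1, 0, 1, 0],
})
n_students, n_items = 3, 4

# Start with every cell marked as unobserved (-1), then fill in the known scores.
toy_R = -1 * np.ones(shape=(n_students, n_items))
rows = toy["user_id"].to_numpy() - 1   # 1-based ID -> 0-based row index
cols = toy["item_id"].to_numpy() - 1   # 1-based ID -> 0-based column index
toy_R[rows, cols] = toy["score"].to_numpy()

print(toy_R)
```

Cells that remain at -1 are the ones the model is expected to skip, which is why the same value is later passed as `skip_value`.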

## Training and Persistence

In [5]:
import logging
import importlib

logging.getLogger().setLevel(logging.INFO)

# The logging module provides the event logging used here; setting the level to INFO
# makes progress messages (such as saving and loading parameters) visible.

In [6]:
import EduCDM
importlib.reload(EduCDM)
from EduCDM import EMIRT

cdm = EMIRT(R, stu_num, prob_num, dim=1, skip_value=-1)  # Note: dim=1 here

cdm.train(lr=1e-3, epoch=2)
cdm.save("irt.params")


INFO:root:save parameters to irt.params
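
For intuition about what an IRT model with `dim=1` estimates, here is a minimal sketch of the textbook two-parameter logistic (2PL) response function. It is an illustration only: the function name `irt_2pl` and its arguments are hypothetical, and the formula EduCDM uses internally may differ (for example, by a scaling constant or a guessing parameter).

```python
import numpy as np

def irt_2pl(theta, a, b):
    """Probability of a correct response under a generic 2PL model.

    theta: student ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# A student whose ability equals the item difficulty has an even chance.
print(irt_2pl(theta=0.0, a=1.0, b=0.0))

# Higher ability gives a higher probability of success on the same item.
print(irt_2pl(theta=np.array([-2.0, 0.0, 2.0]), a=1.0, b=0.0))
```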


## Loading and Testing

In [11]:
cdm.load("irt.params")
rmse, mae = cdm.eval(test_set)
print("RMSE, MAE are %.6f, %.6f" % (rmse, mae))

INFO:root:load parameters from irt.params
evaluating: 100%|██████████| 55760/55760 [00:00<00:00, 1505186.61it/s]

RMSE, MAE are 0.451248, 0.382634
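
The two metrics reported above can be reproduced by hand. This sketch assumes the standard definitions of root-mean-square error and mean absolute error between true scores and predicted values; the helper `rmse_mae` and the example arrays are hypothetical and are not part of EduCDM.

```python
import numpy as np

def rmse_mae(y_true, y_pred):
    """Return (RMSE, MAE) between true scores and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    return rmse, mae

# Hypothetical true scores and predicted probabilities for four responses.
example_true = [1, 0, 0, 1]
example_pred = [0.9, 0.2, 0.4, 0.6]
rmse_val, mae_val = rmse_mae(example_true, example_pred)
print("RMSE, MAE are %.6f, %.6f" % (rmse_val, mae_val))
```

RMSE penalizes large individual errors more heavily than MAE, so RMSE is never smaller than MAE on the same predictions.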





## Incremental Training

In [14]:
new_data = [{'user_id': 0, 'item_id': 2, 'score': 0.0}, {'user_id': 1, 'item_id': 1, 'score': 1.0}]
cdm.inc_train(new_data, lr=1e-3, epoch=2)

## Evaluate User's State

In [41]:
stu_rec = np.random.randint(-1, 2, size=prob_num)
dia_state = cdm.transform(stu_rec)  # one row per input record; here shape = (1, dim)
print("user's state is " + str(dia_state))

user's state is [[-5.18242253]]
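
The random record above draws each entry from {-1, 0, 1}, where -1 matches the `skip_value` given to `EMIRT`. In practice the record would come from a student's actual responses. The following sketch shows one way to build such a vector; the helper `build_record` and its arguments are hypothetical.

```python
import numpy as np

def build_record(responses, n_items, skip_value=-1):
    """Build a response vector of length n_items.

    responses: dict mapping a 0-based item index to a score (0 or 1).
    Items that were not answered keep skip_value.
    """
    rec = np.full(n_items, skip_value, dtype=float)
    for item, score in responses.items():
        rec[item] = score
    return rec

# Hypothetical student who answered item 0 correctly and item 3 incorrectly.
rec = build_record({0: 1, 3: 0}, n_items=5)
print(rec)
```

A vector built this way with `n_items=prob_num` has the same length and value range as `stu_rec` in the cell above.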
