<a href="https://colab.research.google.com/github/Nishthajoshi-ai/DeepKnowledgeTracing/blob/main/DKT_stable.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DKT (Stable) — Minimal Loss
The idea is to follow how students answer questions, estimate how strong they are in each skill, and then recommend a mock test that focuses on their weaker areas.

Goal: provide a stable Deep Knowledge Tracing baseline with:
- Safe hyperparameters (lower LR): a small learning rate, i.e. a small step size for each training update.
- Gradient clipping: A safety limit so the model doesn’t over-correct during training.
- Robust target building via `shift` (next-step labels)
- Output bias init to empirical priors: Starting guesses that match the average performance in the data.
- Optional class weighting
- Clear evaluation: pre/post loss and AUC

The notebook implements a small end-to-end simulation with 4 skills: Ratios, Algebra, Geometry, and Probability. It simulates 120 learners who each practice 50 questions.
  * Some learners guess correctly even if they don’t know the skill (guess).
  * Some learners make mistakes even when they know it (slip).
  * When learners practice, their skill can improve a little (learning gain).
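
The slip and guess effects listed above can be sketched as a simple mixture. This is a minimal illustration, not the notebook's simulation loop: the helper name `observed_probability` is hypothetical, and the default values 0.08 (slip) and 0.18 (guess) are the ones the notebook sets later in its data-creation cell.

```python
import numpy as np

def sigmoid(x):
    """Map a real value to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def observed_probability(ability, difficulty, p_slip=0.08, p_guess=0.18):
    """Probability of an observed correct answer after slip and guess noise.

    Hypothetical helper for illustration only.
    """
    # Chance the learner truly answers correctly, given ability and difficulty.
    p_true = sigmoid(ability - difficulty)
    # Either the learner knows it and does not slip,
    # or does not know it and guesses correctly.
    return p_true * (1.0 - p_slip) + (1.0 - p_true) * p_guess

# The observed probability stays between p_guess and 1 - p_slip.
for ability in (-2.0, 0.0, 2.0):
    print(ability, round(float(observed_probability(ability, 0.0)), 3))
```

Under these assumptions, even a very weak learner is correct about 18% of the time and even a very strong learner tops out near 92%, which bounds how confident any model trained on this data can be.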

# Flow:
1. Skills/Q-matrix
2. Learner interaction sequences
3. Train a tiny DKT-like RNN (NumPy)
4. Predict per-skill mastery for a focal learner
5. Recommend a mock test based on weak skills.

# Notes:
- Simulating 4 skills and 24 items (each tagged to 1 primary skill for clarity).
- Interactions are generated for 120 learners with slip/guess + heterogeneous skill growth.

# Deep Knowledge Tracing (DKT) Simulation

This project implements a minimal Deep Knowledge Tracing (DKT) model using NumPy to simulate and track student skill mastery. The goal is to provide a clear, end-to-end example of a DKT-like model, including:

- **Synthetic Data Generation:** Creating simulated student interaction data based on defined skills, item difficulties, learner abilities, slip/guess probabilities, and learning gains.
- **Q-Matrix:** Defining the relationship between items and skills.
- **Sequence Building:** Transforming raw interaction data into sequences suitable for training a recurrent neural network (RNN).
- **NumPy RNN Model:** Implementing a simple RNN model from scratch with stable training techniques like lower learning rates and gradient clipping.
- **Training and Evaluation:** Training the DKT model on simulated data and evaluating its performance using metrics like Binary Cross-Entropy (BCE) loss and Area Under the ROC Curve (AUC).
- **Skill Mastery Prediction:** Using the trained model to predict per-skill mastery levels for a focal learner.
- **Mock Test Recommendation:** Recommending a personalized mock test based on the focal learner's predicted skill weaknesses.

This project serves as a baseline for understanding the core concepts of DKT and can be extended for more complex scenarios.

In [None]:
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score
import matplotlib.pyplot as plt

#Random number generator

rng = np.random.default_rng(7)

#Numpy print options

np.set_printoptions(precision=4, suppress=True)

# Synthetic data creation: Skills and Q-matrix

In [None]:
skills = ["Ratios", "Algebra", "Geometry", "Probability"]

n_skills = len(skills)

items = []

for s_idx, s in enumerate(skills):

  for i in range(6):

    items.append({
      "item_id": f"{s[:3].upper()}_{i+1}",
      "skill": s,
      "skill_id": s_idx,
      "difficulty": rng.normal(0, 0.6)
    })

item_df = pd.DataFrame(items)

q_matrix = pd.DataFrame([
  {"item_id": it["item_id"], **{sk: (1 if i==it["skill_id"] else 0) for i, sk in enumerate(skills)}}
  for it in items
])

n_learners = 120

steps_per_learner = 50

abilities = rng.normal(0.3, 0.6, size=(n_learners, n_skills))

p_slip = 0.08

p_guess = 0.18

learning_gain = 0.10

In [None]:
print(q_matrix)

   item_id  Ratios  Algebra  Geometry  Probability
0    RAT_1       1        0         0            0
1    RAT_2       1        0         0            0
2    RAT_3       1        0         0            0
3    RAT_4       1        0         0            0
4    RAT_5       1        0         0            0
5    RAT_6       1        0         0            0
6    ALG_1       0        1         0            0
7    ALG_2       0        1         0            0
8    ALG_3       0        1         0            0
9    ALG_4       0        1         0            0
10   ALG_5       0        1         0            0
11   ALG_6       0        1         0            0
12   GEO_1       0        0         1            0
13   GEO_2       0        0         1            0
14   GEO_3       0        0         1            0
15   GEO_4       0        0         1            0
16   GEO_5       0        0         1            0
17   GEO_6       0        0         1            0
18   PRO_1       0        0    

In [None]:
item_df

Unnamed: 0,item_id,skill,skill_id,difficulty
0,RAT_1,Ratios,0,0.000738
1,RAT_2,Ratios,0,0.179247
2,RAT_3,Ratios,0,-0.164483
3,RAT_4,Ratios,0,-0.534355
4,RAT_5,Ratios,0,-0.272802
5,RAT_6,Ratios,0,-0.594988
6,ALG_1,Algebra,1,0.036086
7,ALG_2,Algebra,1,0.804129
8,ALG_3,Algebra,1,-0.295324
9,ALG_4,Algebra,1,-0.372285


# Simulate learner interactions
Simulating learner interactions with items based on their skill abilities and item difficulties

In [None]:
# Mapping a real-valued input to a value between 0 and 1

def sigmoid(x):

  return 1.0 / (1.0 + np.exp(-x))

# Selection of random item from the item_df

def select_item_for_skill(skill_id):

  pool = item_df[item_df["skill_id"]==skill_id]

  return pool.sample(1, random_state=int(rng.integers(0, 1e9))).iloc[0]

rows = []

#Iterating learners

for L in range(n_learners):

  abil = abilities[L].copy()

  for t in range(steps_per_learner):

    #Identify skill where learner has lowest ability

    weak_skill = int(np.argmin(abil))

    # Choose the skill to attempt: 60% of the time the learner attempts an item from their weakest skill; otherwise a skill is chosen uniformly at random.

    skill_id = weak_skill if rng.uniform() < 0.6 else int(rng.integers(0, n_skills))

    row = select_item_for_skill(skill_id)

    # Difficulty of the selected item

    diff = float(row["difficulty"])

    #True probability of the learner answering correctly based on their ability in that skill and the item's difficulty

    p_true = sigmoid(abil[skill_id] - diff)

    # Observed probability of a correct response, mixing in "slip" (knows it but errs) and "guess" (does not know it but answers correctly)

    p_obs  = p_true*(1-p_slip) + (1-p_true)*p_guess

    #Determine if learner answered correctly

    correct = 1 if rng.uniform() < p_obs else 0

    rows.append({"learner_id": L, "t": t, "item_id": row["item_id"], "skill_id": skill_id, "correct": correct})

    # Update skill: ability grows only after a correct answer, with smaller gains at higher ability

    if correct:

      abil[skill_id] += learning_gain * (1 - sigmoid(abil[skill_id]))

inter_df = pd.DataFrame(rows).sort_values(["learner_id", "t"]).reset_index(drop=True)

print("Mean(correct):", inter_df["correct"].mean())


Mean(correct): 0.6106666666666667


In [None]:
inter_df

Unnamed: 0,learner_id,t,item_id,skill_id,correct
0,0,0,GEO_1,2,0
1,0,1,GEO_1,2,0
2,0,2,GEO_3,2,1
3,0,3,GEO_2,2,0
4,0,4,RAT_3,0,0
...,...,...,...,...,...
5995,119,45,RAT_5,0,0
5996,119,46,GEO_4,2,1
5997,119,47,ALG_3,1,0
5998,119,48,RAT_1,0,1


In [None]:
inter_df.to_csv("Attempted questionset.csv")

# Sequence builder (robust, shift-based next labels)
The model needs to know:
  * What the learner just did (skill, correct/wrong).
  * What the learner will face next (the next skill and whether they got it right).

The sequence builder reshapes the data into input/output pairs so the model can practice predicting the next response.

`build_sequences` therefore transforms the raw interaction data for a single learner into sequences of inputs (X), target outputs (Y), and masks (M) suitable for training a DKT model.
1. *Filtering and Sorting:* It first filters the inter_df to get only the interactions for the specified learner_id and sorts them by time (t).
2. *Creating Next-Step Labels:* It uses the .shift(-1) method to create two new columns: next_skill_id and next_correct. These columns contain the skill ID and correctness of the next interaction in the sequence. This is a crucial step for preparing the data for a sequential model like DKT, where the model learns to predict the outcome of the next interaction based on the current one.
3. *Handling Empty Sequences:* It checks if the resulting sequence is empty after dropping rows where the next step data is not available (the last interaction in a learner's sequence). If it's empty, it returns empty NumPy arrays.
4. *Building Input (X):* It creates a NumPy array X with dimensions (number of interactions in the sequence, 2 * number of skills). This array represents the input to the DKT model. For each interaction, it sets the value to 1.0 at the index corresponding to the current skill_id and correctness. The input size is 2 * n_skills because for each skill, there are two possible outcomes: correct (skill_id + n_skills) or incorrect (skill_id).
5. *Building Targets (Y) and Masks (M):* It creates two NumPy arrays, Y and M, both with dimensions (number of interactions in the sequence, number of skills).
    * Y represents the target output. For each interaction, it sets the value to next_correct at the index corresponding to the next_skill_id.
    * M is a mask. It sets the value to 1.0 at the index corresponding to the next_skill_id and 0.0 otherwise. This mask is used during training to ensure that the model's predictions are only compared to the actual outcome of the skill the learner interacted with in the next step.
6. *Returning Sequences:* The function returns the constructed X, Y, and M arrays.
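
The steps above can be traced on a tiny hand-made sequence. This is a sketch under stated assumptions: the `toy` frame with 2 skills and 4 interactions is invented for illustration, and only the `shift(-1)` labelling and the one-hot layout follow the description above.

```python
import numpy as np
import pandas as pd

n_skills = 2

# Invented toy interactions for one learner (not from the simulation).
toy = pd.DataFrame({
    "t":        [0, 1, 2, 3],
    "skill_id": [0, 1, 1, 0],
    "correct":  [1, 0, 1, 1],
})

# Next-step labels: shift(-1) pulls the following row's values up by one,
# and dropna removes the last interaction, which has no successor.
seq = (toy.sort_values("t")
          .assign(next_skill_id=lambda d: d["skill_id"].shift(-1),
                  next_correct=lambda d: d["correct"].shift(-1))
          .dropna(subset=["next_skill_id", "next_correct"]))

rows = np.arange(len(seq))
cur_k = seq["skill_id"].astype(int).to_numpy()
cur_c = seq["correct"].astype(int).to_numpy()
nxt_k = seq["next_skill_id"].astype(int).to_numpy()
nxt_c = seq["next_correct"].astype(int).to_numpy()

# Input: column skill_id for an incorrect answer, skill_id + n_skills for a correct one.
X = np.zeros((len(seq), 2 * n_skills), dtype=np.float32)
X[rows, cur_k + cur_c * n_skills] = 1.0

# Target and mask: only the skill attempted at the next step is labelled.
Y = np.zeros((len(seq), n_skills), dtype=np.float32)
M = np.zeros((len(seq), n_skills), dtype=np.float32)
Y[rows, nxt_k] = nxt_c
M[rows, nxt_k] = 1.0

print(X)
print(Y)
print(M)
```

Four interactions yield three training steps, because the final interaction has no next step to predict.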


In [None]:
def build_sequences(df, learner_id, n_skills):

  seq = (df[df['learner_id'] == learner_id]
              .sort_values(['t']) # Filtering and Sorting
              .assign(next_skill_id=lambda d: d['skill_id'].shift(-1), # Creating Next-Step skill_id
          next_correct =lambda d: d['correct'].shift(-1)) # Creating Next-Step correct
              .dropna(subset=['next_skill_id','next_correct'])) # Dropping when does not exist

  # Assign empty numpy array instead of blanks

  if seq.empty:

    return np.empty((0, 2*n_skills), dtype=np.float32), np.empty((0, n_skills), dtype=np.float32), np.empty((0, n_skills), dtype=np.float32)

  # Input sequence for DKT

  X = np.zeros((len(seq), 2*n_skills), dtype=np.float32)

  cur_k = seq['skill_id'].astype(int).to_numpy()

  cur_corr = seq['correct' ].astype(int).to_numpy()

  X[np.arange(len(seq)), cur_k + (cur_corr * n_skills)] = 1.0

  nxt_k = seq['next_skill_id'].astype(int).to_numpy()

  nxt_corr = seq['next_correct' ].astype(int).to_numpy()

  # Target & mask

  Y = np.zeros((len(seq), n_skills), dtype=np.float32)

  M = np.zeros((len(seq), n_skills), dtype=np.float32)

  Y[np.arange(len(seq)), nxt_k] = nxt_corr

  M[np.arange(len(seq)), nxt_k] = 1.0

  return X, Y, M

# Testing output for learner 0

X0, Y0, M0 = build_sequences(inter_df, learner_id=0, n_skills=n_skills)

# Index of the masked (next-step) skill at each time step

idx = M0.argmax(1)

# Average correctness of learner 0 next interactions

print("Next-step positive rate (learner 0):", Y0[np.arange(len(Y0)), idx].mean())


Next-step positive rate (learner 0): 0.53061223


# DKT model (NumPy RNN) with safe defaults & gradient clipping
Core components of the Deep Knowledge Tracing (DKT) model using NumPy, including the model's architecture, activation functions, loss function, and gradient clipping.

The RNN acts as a small memory system. Each time a learner answers, the model updates its picture of their knowledge. Broadly, a correct answer should raise the model's estimate of that skill and a wrong answer should lower it. Over time, the model learns patterns of strengths and weaknesses.

**Model Parameters:**
1. *hidden_size, input_size, output_size:* These variables define the dimensions of the different layers in the RNN.
2. *input_size and output_size:* `input_size` is `2 * n_skills` because the input encodes both correct and incorrect interactions for each skill. `output_size` is `n_skills` because the model predicts mastery for each skill.
3. *W_in, W_h, b_h:* These are the weight matrices and bias vector for the input-to-hidden and hidden-to-hidden transitions in the RNN layer. They are initialized with small random values.
4. *W_out, b_out*: These are the weight matrix and bias vector for the hidden-to-output layer. b_out will be initialized later based on skill priors.

**Activation Functions:**
1. *tanh(x):* The hyperbolic tangent function, used as the activation function in the hidden layer.
2. *dtanh(x):* The derivative of the tanh function, used during backpropagation.
3. *sigmoid_arr(x):* The sigmoid function, used to squash the output of the model to a range between 0 and 1, representing the probability of correctly answering an item for each skill.

**forward_sequence(X) Function:**

This function performs the forward pass of the RNN for a given sequence of inputs X.
1. It initializes the previous hidden state h_prev to zeros.
2. It iterates through each time step t in the input sequence X.
3. In each time step, it calculates the pre-activation value z, the hidden state h (using tanh), and the output y (using sigmoid_arr).
4. It stores the hidden states (hs), outputs (ys), and pre-activation values (zs) for each time step.
5. The current hidden state h becomes the h_prev for the next time step.
6. It returns the lists of hidden states, outputs, and pre-activation values.

**bce_loss(pred, target, mask, eps=1e-7) Function:**

This function calculates the binary cross-entropy loss between the predicted mastery levels (pred) and the true outcomes (target), using a mask to only consider the skills that were actually interacted with in the next step. Predictions are clipped to the range [eps, 1 - eps] to avoid log(0).
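
A small worked example may make the masking concrete. This is a sketch: `masked_bce` is a hypothetical stand-alone copy of the loss described above, and the prediction, target, and mask vectors are invented for illustration.

```python
import numpy as np

def masked_bce(pred, target, mask, eps=1e-7):
    """Binary cross-entropy averaged over the masked entries only."""
    pred = np.clip(pred, eps, 1 - eps)      # keep log() finite
    n = mask.sum()
    if n < 1:                                # nothing to score at this step
        return 0.0
    log_lik = target * np.log(pred) + (1 - target) * np.log(1 - pred)
    return float(-(mask * log_lik).sum() / n)

# Four skills; only skill 1 is attempted at the next step, and it is answered correctly.
pred   = np.array([0.9, 0.2, 0.5, 0.0])
target = np.array([0.0, 1.0, 0.0, 0.0])
mask   = np.array([0.0, 1.0, 0.0, 0.0])

loss = masked_bce(pred, target, mask)
print(loss)  # equals -log(0.2): only the masked skill contributes

# Changing predictions for unmasked skills leaves the loss unchanged.
other = masked_bce(np.array([0.1, 0.2, 0.99, 0.7]), target, mask)
print(other)
```

The zeros that fill the unmasked entries of the target are therefore never treated as "incorrect" labels; they are simply ignored.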

**clip_grad(G, c=5.0) Function:**

This function implements gradient clipping to prevent exploding gradients during training. If the L2 norm of the gradient G exceeds a threshold c (defaulting to 5.0), the gradient is scaled down proportionally.
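
The scaling rule can be checked on two small arrays. This is an illustrative sketch: `clip_by_norm` is a hypothetical stand-alone version of the rule described above, and the example arrays are invented.

```python
import numpy as np

def clip_by_norm(G, c=5.0):
    """Rescale G so its L2 (Frobenius) norm is at most c; leave small gradients alone."""
    n = np.linalg.norm(G)
    return G * (c / max(c, n)) if n > 0 else G

big = np.array([[30.0, 40.0]])    # norm 50, above the threshold
small = np.array([[0.3, 0.4]])    # norm 0.5, below the threshold

clipped_big = clip_by_norm(big)
clipped_small = clip_by_norm(small)

print(clipped_big, np.linalg.norm(clipped_big))
print(clipped_small, np.linalg.norm(clipped_small))
```

Because the whole array is multiplied by one scalar, the direction of the update is preserved and only its length is capped.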

In [None]:
#Model Parameters

hidden_size = 32
input_size  = 2 * n_skills
output_size = n_skills

W_in  = rng.normal(0, 0.05, size=(hidden_size, input_size))
W_h   = rng.normal(0, 0.05, size=(hidden_size, hidden_size))
b_h   = np.zeros((hidden_size,))
W_out = rng.normal(0, 0.05, size=(output_size, hidden_size))
b_out = np.zeros((output_size,))

#Activation Functions

def tanh(x): return np.tanh(x)

def dtanh(x): return 1 - np.tanh(x)**2

def sigmoid_arr(x): return 1.0 / (1.0 + np.exp(-x))

#Utility Functions

def forward_sequence(X):

  h_prev = np.zeros((hidden_size,))

  hs, ys, zs = [], [], []

  for x in X:

    z = W_in @ x + W_h @ h_prev + b_h

    h = tanh(z)

    y = sigmoid_arr(W_out @ h + b_out)

    hs.append(h); ys.append(y); zs.append(z)

    h_prev = h

  return hs, ys, zs

def bce_loss(pred, target, mask, eps=1e-7):

  pred = np.clip(pred, eps, 1-eps)

  num = np.sum(mask)

  if num < 1: return 0.0

  return -np.sum(mask * (target*np.log(pred) + (1-target)*np.log(1-pred))) / num

def clip_grad(G, c=5.0):

  n = np.linalg.norm(G)

  return G * (c / max(c, n)) if n > 0 else G


# Train split & bias init to empirical priors
Preparing the data for training and initializing the output layer bias of the DKT model based on empirical skill priors.

1. **Learners for Training and Validation:**

    * *focal_learner = 5:* Excluding a specific learner (learner_id = 5) as the focal learner for whom predictions and recommendations will be made later. The model learns general patterns of how skills improve or decline across other student interactions.
    * *all_learners:* Creates a list of all unique learner IDs in the inter_df, excluding the focal_learner.
    * *rng.shuffle(all_learners):* Randomly shuffles the list of remaining learners.
    * *val_learners = set(all_learners[:10]):* Selects the first 10 shuffled learners to be part of the validation set.
    * *train_learners = [L for L in all_learners if L not in val_learners]:* The remaining learners are assigned to the training set.

2. **compute_skill_priors(inter_df, learners) Function:**
Calculates the empirical prior probability of a correct answer for each skill based on the interactions of a given set of learners.
    * It initializes counts and ones arrays to store the total number of interactions and the number of correct interactions for each skill, respectively.
    * It iterates through the specified learners.
    * For each learner, it builds the sequences (X, Y, M) using the build_sequences function.
    * It extracts the skill IDs and correctness labels for the next step from the Y and M arrays.
    * It then iterates through each skill and counts the total number of interactions for that skill and the number of correct interactions.
    * *p_hat = (ones + 1.0) / (counts + 2.0):* Calculates the empirical prior probability of correctness for each skill. A Laplace smoothing of +1 correct and +1 incorrect is added to avoid probabilities of 0 or 1, providing a more stable estimate, especially for skills with few interactions.
    * It returns the array of skill priors (p_hat).
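
The smoothing and the later bias initialization can be illustrated with made-up counts. This is a sketch: the `counts` and `ones` values below are invented, and the point is only that setting the output bias to the logit of the prior makes the sigmoid output equal that prior when the hidden-layer contribution is zero.

```python
import numpy as np

# Invented per-skill counts: total next-step attempts and correct ones.
counts = np.array([40.0, 3.0, 0.0])
ones   = np.array([30.0, 3.0, 0.0])

# Laplace smoothing: one pseudo-correct and one pseudo-incorrect per skill.
p_hat = (ones + 1.0) / (counts + 2.0)

# Logit of the prior, used as the output bias.
b_out = np.log(p_hat / (1.0 - p_hat))

# With no contribution from the hidden layer, the prediction is sigmoid(b_out).
initial_pred = 1.0 / (1.0 + np.exp(-b_out))

print("p_hat:       ", p_hat)
print("b_out:       ", b_out)
print("initial pred:", initial_pred)
```

Note how the skill with 3 out of 3 correct gets a prior of 0.8 rather than 1.0, and the skill with no data falls back to 0.5, so the logit stays finite in both cases.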

In [None]:
#Learners for Training and Validation

focal_learner = 5

#Excluding the focal learner

all_learners = [int(x) for x in inter_df['learner_id'].unique() if x != focal_learner]

rng.shuffle(all_learners)

val_learners = set(all_learners[:10])

train_learners = [L for L in all_learners if L not in val_learners]

def compute_skill_priors(inter_df, learners):

  counts = np.zeros(output_size, dtype=np.float64)

  ones   = np.zeros(output_size, dtype=np.float64)

  for L in learners:

    X, Y, M = build_sequences(inter_df, L, n_skills)

    if len(X)==0: continue

    idxs = M.argmax(1)

    labs = Y[np.arange(len(Y)), idxs]

    for k in range(output_size):

      sel = (idxs == k)

      counts[k] += sel.sum()

      if sel.any():

        ones[k] += labs[sel].sum()

  p_hat = (ones + 1.0) / (counts + 2.0)

  return p_hat

#Execution

p_hat = compute_skill_priors(inter_df, train_learners)

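# Output bias = logit of the prior, so sigmoid(b_out) starts at p_hat
b_out = np.log(p_hat / (1.0 - p_hat))

# Shrink the output weights so initial predictions stay close to the priors
W_out *= 0.05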

#Display

print("Skill priors:", p_hat)


Skill priors: [0.6173 0.5548 0.6259 0.671 ]


# Training loop
This section implements the training loop for the DKT model in NumPy, using a lower learning rate and gradient clipping to keep training stable.

1. **Initialization:**
    * *epochs = 40:* Number of training epochs.
    * *lr = 0.01:* Sets the learning rate for the gradient descent update.
    * *print_every = 1:* Determines how often to print training and validation metrics.
2. **train_epoch() Function:**
This function performs one full training epoch.
    * *global W_in, W_h, b_h, W_out, b_out:* Declares that the function will modify the global model parameters.
    * Initializes total_loss and steps to track the training loss.
    * *rng.shuffle(train_learners):* Randomly shuffles the order of training learners for each epoch.
    * It iterates through each learner in the train_learners list.
    * For each learner, it builds the sequences (X, Y, M) using build_sequences.
    * If the sequence is not empty, it performs a forward pass using forward_sequence to get the hidden states, outputs, and pre-activation values.
    * Initializes gradients (dW_in, dW_h, db_h, dW_out, db_out) to zeros.
    * Initializes dh_next to zeros, which is used to accumulate gradients flowing back from the next time step.
    * *Backpropagation Through Time (BPTT):* It iterates backward through the time steps of the sequence (for t in reversed(range(len(X))):).
      1. Calculates the error dy at the output layer as (y - target) * mask, i.e. the prediction error restricted to the next-step skill.
      2. Calculates gradients for W_out and b_out.
      3. Calculates the gradient for the hidden state dh, including the gradient from the next time step (dh_next).
      4. Calculates the gradient for the pre-activation dz using the derivative of the tanh activation (dtanh).
      5. Calculates gradients for W_in, W_h, and b_h.
      6. Updates dh_next for the next backward step.
      7. Accumulates the BCE loss for the current time step.
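      8. Increments the steps counter.
    * After the backward pass over a sequence, it clips each gradient with clip_grad and applies a gradient-descent update with learning rate lr.
    * It returns the mean per-step loss for the epoch.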
3. **evaluate_mean_loss(learners) Function:** Calculates the average BCE loss for a given set of learners (either training or validation).
    1. It iterates through the specified learners.
    2. For each learner with a non-empty sequence, it performs a forward pass.
    3. It calculates and accumulates the BCE loss for each time step using bce_loss.
    4. Returns the average loss.
4. **next_step_auc(learners) Function:** Calculates the Area Under the ROC Curve (AUC) for the next-step predictions for a given set of learners.
    1. It iterates through the specified learners.
    2. For each learner with a non-empty sequence, it performs a forward pass.
    3. It extracts the true next-step outcomes (y_true) and the corresponding predicted probabilities (y_pred) based on the mask M.
    4. If there are at least two distinct classes in y_true (necessary for AUC calculation), it calculates and returns the roc_auc_score. Otherwise, it returns nan.
5. **Execution:**
    1. Print the mean loss before training.
    2. Enter the main training loop (for epoch in range(epochs):).
    3. In each epoch, call train_epoch() to perform the training step.
    4. If the epoch number is a multiple of print_every, calculate and print the training loss, validation loss, and validation AUC.
    5. After the training loop finishes, print the mean loss after training and the final AUC for both the training and validation sets.

Check the model before and after training. If it has learned, its loss (average error) should go down, and its AUC (how well it ranks correct responses above incorrect ones) should go up.
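
The output-layer error used in the BPTT step, (y - target) * mask, can be sanity-checked numerically. The sketch below is not part of the training loop: it compares that expression against a finite-difference gradient of the masked BCE with respect to the pre-sigmoid outputs, using invented values and a hypothetical helper `step_loss`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def step_loss(logits, target, mask):
    """Masked BCE for one time step, as a function of the pre-sigmoid outputs."""
    p = sigmoid(logits)
    log_lik = target * np.log(p) + (1 - target) * np.log(1 - p)
    return -(mask * log_lik).sum() / mask.sum()

gen = np.random.default_rng(0)
logits = gen.normal(size=4)                 # invented pre-sigmoid outputs
target = np.array([0.0, 0.0, 1.0, 0.0])     # next step: skill 2, answered correctly
mask   = np.array([0.0, 0.0, 1.0, 0.0])     # one-hot, so mask.sum() == 1

# Analytic gradient used in the backward pass.
analytic = (sigmoid(logits) - target) * mask

# Central finite differences, one coordinate at a time.
h = 1e-6
numeric = np.zeros_like(logits)
for k in range(len(logits)):
    e = np.zeros_like(logits)
    e[k] = h
    numeric[k] = (step_loss(logits + e, target, mask)
                  - step_loss(logits - e, target, mask)) / (2 * h)

print("analytic:", analytic)
print("numeric: ", numeric)
```

The two agree because the sigmoid derivative cancels against the BCE derivative, and the one-hot mask zeroes the gradient for every skill that was not attempted next.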

In [None]:
#Initialization

epochs = 40
lr = 0.01
print_every = 1

#Training

def train_epoch():

  global W_in, W_h, b_h, W_out, b_out
  total_loss = 0.0
  steps = 0
  rng.shuffle(train_learners)

  #Sequencing

  for L in train_learners:

    X, Y, M = build_sequences(inter_df, L, n_skills)

    if len(X)==0:

      continue

    #Hidden states, outputs and pre-activation values.

    hs, ys, zs = forward_sequence(X)

    #Gradients

    dW_in = np.zeros_like(W_in); dW_h  = np.zeros_like(W_h); db_h  = np.zeros_like(b_h)
    dW_out= np.zeros_like(W_out); db_out = np.zeros_like(b_out)
    dh_next = np.zeros((hidden_size,))

    # Backpropagation through time (BPTT), accumulating gradients and loss

    for t in reversed(range(len(X))):

      y = ys[t]; h = hs[t]; z = zs[t]

      target = Y[t]; mask = M[t]

      dy = (y - target) * mask
      dW_out += np.outer(dy, h)
      db_out += dy
      dh = W_out.T @ dy + dh_next
      dz = dh * dtanh(z)

      dW_in += np.outer(dz, X[t])
      dW_h  += np.outer(dz, (hs[t-1] if t>0 else np.zeros_like(h)))
      db_h  += dz
      dh_next = W_h.T @ dz

      total_loss += bce_loss(y, target, mask)

      steps += 1

    dW_in  = clip_grad(dW_in, 5.0); dW_h  = clip_grad(dW_h, 5.0); db_h  = clip_grad(db_h, 5.0)
    dW_out = clip_grad(dW_out,5.0); db_out= clip_grad(db_out,5.0)

    W_in  -= lr * dW_in
    W_h   -= lr * dW_h
    b_h   -= lr * db_h
    W_out -= lr * dW_out
    b_out -= lr * db_out

  return total_loss / max(1, steps)

def evaluate_mean_loss(learners):

  tot = 0.0; steps = 0

  for L in learners:

    X, Y, M = build_sequences(inter_df, L, n_skills)

    if len(X)==0: continue

    _, ys, _ = forward_sequence(X)

    for t in range(len(X)):

      tot += bce_loss(ys[t], Y[t], M[t])

      steps += 1

  return tot / max(1, steps)

# Area Under the ROC Curve

def next_step_auc(learners):

  y_true, y_pred = [], []

  for L in learners:

    X, Y, M = build_sequences(inter_df, L, n_skills)

    if len(X)==0: continue

    _, ys, _ = forward_sequence(X)

    idx = M.argmax(1)

    #Next step outcomes

    y_true.extend(Y[np.arange(len(Y)), idx])

    #Probabilities

    y_pred.extend(ys[t][idx[t]] for t in range(len(ys)))

  if len(set(y_true)) < 2:

    return float('nan')

  return roc_auc_score(y_true, y_pred)

#Execution

pre_train_loss = evaluate_mean_loss(train_learners)

print("Pre-train mean loss:", pre_train_loss)

for epoch in range(epochs):

  mean_loss = train_epoch()

  if (epoch+1) % print_every == 0:

    val_loss = evaluate_mean_loss(list(val_learners))

    val_auc  = next_step_auc(list(val_learners))

    print(f"Epoch {epoch+1:02d}/{epochs} | train_mean_loss={mean_loss:.4f} | val_mean_loss={val_loss:.4f} | val_AUC={val_auc:.3f}")

post_train_loss = evaluate_mean_loss(train_learners)

#Display

print("Post-train mean loss:", post_train_loss)
print("Train AUC:", next_step_auc(train_learners))
print("Val   AUC:", next_step_auc(list(val_learners)))


Pre-train mean loss: 0.6615225999638981
Epoch 01/40 | train_mean_loss=0.6623 | val_mean_loss=0.6987 | val_AUC=0.497
Epoch 02/40 | train_mean_loss=0.6625 | val_mean_loss=0.6976 | val_AUC=0.499
Epoch 03/40 | train_mean_loss=0.6622 | val_mean_loss=0.6951 | val_AUC=0.515
Epoch 04/40 | train_mean_loss=0.6619 | val_mean_loss=0.6972 | val_AUC=0.524
Epoch 05/40 | train_mean_loss=0.6616 | val_mean_loss=0.6996 | val_AUC=0.499
Epoch 06/40 | train_mean_loss=0.6616 | val_mean_loss=0.7016 | val_AUC=0.501
Epoch 07/40 | train_mean_loss=0.6610 | val_mean_loss=0.6962 | val_AUC=0.516
Epoch 08/40 | train_mean_loss=0.6609 | val_mean_loss=0.7016 | val_AUC=0.508
Epoch 09/40 | train_mean_loss=0.6605 | val_mean_loss=0.7024 | val_AUC=0.498
Epoch 10/40 | train_mean_loss=0.6600 | val_mean_loss=0.7020 | val_AUC=0.489
Epoch 11/40 | train_mean_loss=0.6590 | val_mean_loss=0.7017 | val_AUC=0.502
Epoch 12/40 | train_mean_loss=0.6591 | val_mean_loss=0.7122 | val_AUC=0.469
Epoch 13/40 | train_mean_loss=0.6586 | val_mean_

# Mastery & mock test allocation
Now that the model has an estimate of a learner’s skills, this section calculates the predicted mastery for the focal learner and uses it to recommend a mock test tailored to their weakest skills.

1. Predict mastery for the focal learner
2. Display predicted mastery
3. Mock test recommender function
4. Execute and display mock test allocation

Allocating more questions in weaker skills and fewer in stronger skills.

**Example:** If the model thinks a learner is weakest in Probability, the mock test will have more Probability questions.
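
The allocation rule can be worked through with the focal learner's mastery values shown in this notebook's output. This is a sketch, not the recommender itself: keying the values by skill name, computing the expected split, and the seed 0 are illustrative choices made here.

```python
import numpy as np

# Predicted mastery for the focal learner, keyed by skill name.
mastery = {"Algebra": 0.478, "Ratios": 0.633, "Probability": 0.701, "Geometry": 0.813}

# Skill order used for skill_id throughout the notebook.
skills = ["Ratios", "Algebra", "Geometry", "Probability"]
total_q, base_per_skill = 20, 2

# Look values up by name so each weight lines up with its skill_id.
inv = np.array([1.0 - mastery[s] for s in skills])
weights = inv / inv.sum()

remaining = total_q - base_per_skill * len(skills)

# Expected number of questions per skill (before random sampling).
expected = base_per_skill + remaining * weights
for s, e in zip(skills, expected):
    print(f"{s:12s} {e:.2f}")

# One random draw; a seeded generator keeps the draw repeatable.
gen = np.random.default_rng(0)
alloc = base_per_skill + gen.multinomial(remaining, weights)
print(dict(zip(skills, alloc.tolist())))
```

On average Algebra, the weakest skill, receives the most questions, although any single multinomial draw can deviate from the expected split.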

In [None]:
# Inputs, targets and masks for the focal learner (ID 5)

X_f, Y_f, M_f = build_sequences(inter_df, focal_learner, n_skills)

# Forward pass of the trained DKT model

_, ys_f, _ = forward_sequence(X_f) # predicted probabilities of answering correctly

# last predicted mastery scores

mastery = pd.Series(ys_f[-1], index=skills).sort_values()

display(pd.DataFrame({"Skill": mastery.index, "Predicted mastery (P(correct next))": mastery.values.round(3)}))

def recommend_mock_test(mastery_series, total_q=20):

  # "inverse mastery" or "weakness" for each skill

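  # Re-order to match `skills`: mastery_series may be sorted, but alloc below is indexed by skill_id
  inv = 1.0 - mastery_series.reindex(skills).values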

  # Weights for each skill based on their inverse mastery. Skills with lower mastery will have higher weights.

  weights = inv / inv.sum()

  # base allocation of 2 questions for each skill to ensure all skills are represented

  base = np.array([2]*n_skills)

  # number of remaining questions to allocate

  remaining = total_q - base.sum()

  # Distributing the remaining questions using a multinomial distribution. More questions to weaker skills.
  # The seeded generator `rng` keeps the draw reproducible.

  alloc = base + rng.multinomial(remaining, weights)

  selected = []

  #Iterating each skill and allocated questions

  for s_idx, n in enumerate(alloc):

    # Item selection

    pool = item_df[item_df["skill_id"]==s_idx].copy()

    # Prioritizing items with difficulty closer to 0

    pool["diff_abs"] = pool["difficulty"].abs()

    pool = pool.sort_values("diff_abs")

    chosen = list(pool.head(min(n, len(pool)))["item_id"])

    #If not enough unique items for a skill then duplicate existing items with a new version no.

    if len(chosen) < n and len(pool)>0:

      chosen += [f"{pool.iloc[i%len(pool)]['item_id']}_v{(i//len(pool))+1}" for i in range(n-len(pool))]

    selected += [(skills[s_idx], it) for it in chosen]

  return selected, dict(zip(skills, alloc))

# Executing the recommender function for a 20-question set

selected_items, allocation = recommend_mock_test(mastery, total_q=20)

#Summarising the allocation per skill

alloc_table = pd.DataFrame({"Skill": list(allocation.keys()), "Items allocated": list(allocation.values())})

# Mapping each skill to the list of item IDs

items_by_skill = pd.DataFrame(selected_items, columns=["Skill","Item ID"]).groupby("Skill")["Item ID"].apply(lambda x: ", ".join(x)).reset_index()

# Merging allocation table with the items

alloc_table = alloc_table.merge(items_by_skill, on="Skill", how="left")

#Display

display(alloc_table)


Unnamed: 0,Skill,Predicted mastery (P(correct next))
0,Algebra,0.478
1,Ratios,0.633
2,Probability,0.701
3,Geometry,0.813


Unnamed: 0,Skill,Items allocated,Item ID
0,Ratios,8,"RAT_1, RAT_3, RAT_2, RAT_5, RAT_4, RAT_6, RAT_..."
1,Algebra,3,"ALG_1, ALG_6, ALG_5"
2,Geometry,5,"GEO_3, GEO_1, GEO_6, GEO_4, GEO_2"
3,Probability,4,"PRO_4, PRO_6, PRO_5, PRO_2"
