# Python implementation of Bayesian Knowledge Tracing (BKT)

Benjamin Xie
University of Washington
bxie@uw.edu

For Codeitz study

Modifications to BKT:
* KT-IDEM: Guess and slip parameters are defined per item (not just per skill/concept), so items are not homogeneous and can differ in "difficulty". From Pardos & Heffernan 2011.
* Sequencing algorithm: An algorithm selects which items a user attempts next. From David, Segal, & Gal 2016.
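
To make the item-selection idea concrete, here is a minimal sketch that ranks items by their predicted probability of a correct response under BKT with item-level slip and guess. The selection rule (pick the item whose predicted P(correct) is closest to a `target`), the function names `p_correct` and `pick_next_item`, and the parameter values are all illustrative assumptions, not necessarily the rule used in the sequencing paper cited above.

```python
def p_correct(p_known, slip, guess):
    """Predicted probability of a correct response given item-level slip/guess."""
    return p_known * (1.0 - slip) + (1.0 - p_known) * guess


def pick_next_item(p_known, items, target=0.7):
    """
    Hypothetical selection rule (an assumption, not taken from the cited paper):
    choose the item whose predicted P(correct) is closest to `target`.

    items: dict mapping item id -> (slip, guess)
    """
    return min(items, key=lambda eid: abs(p_correct(p_known, *items[eid]) - target))


# illustrative slip/guess values for two items of the same concept
items = {'read vars': (0.05, 0.2), 'write vars': (0.1, 0.1)}
print(pick_next_item(0.2, items))  # learner unlikely to know the concept
print(pick_next_item(0.9, items))  # learner likely to know the concept
```

With these made-up numbers, the easier item is chosen for the weaker learner and the harder item for the stronger one; a different `target` would shift that boundary.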

## Terminology
* item/exercise: a task that a user attempts and is scored on. Each item maps to exactly _one_ concept. Each item has 2 parameters: slip and guess probabilities.
* user: a learner using Codeitz. A user attempts 0, 1, or many items. Each user has response data and, for each concept, a probability of knowing it.
* concept/skill: a latent construct that a user could learn. One or many items map to a concept. Each concept has 2 parameters: initial learning and transfer probabilities.

## Notes
* Code documentation roughly follows [NumPy style](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html)

## References
* David, Yossi Ben, Avi Segal, and Ya’akov (kobi) Gal. 2016. “Sequencing Educational Content in Classrooms Using Bayesian Knowledge Tracing.” In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, 354–63. ACM.
* Pardos, Zachary A., and Neil T. Heffernan. 2011. “KT-IDEM: Introducing Item Difficulty to the Knowledge Tracing Model.” In User Modeling, Adaption and Personalization, 243–54. Springer Berlin Heidelberg.

In [4]:
import numpy as np
import pandas as pd

## Constants

In [75]:
# params
INIT = 'init'
TRANSFER = 'trasfer'
SLIP = 'slip'
GUESS = 'guess'

# concepts/skills
VAR = 'variable'
CONDITIONAL = 'conditional'

# terms
CONCEPT = 'concept'
EID = "eid"

## Example Data

In [113]:
# making example concepts
c_init = pd.Series([0.05, 0.2])
c_transfer = pd.Series([0.3, 0.4])
c_concepts = pd.Series([CONDITIONAL, VAR])

df_concepts = pd.DataFrame({CONCEPT: c_concepts, INIT:c_init, TRANSFER:c_transfer})
print('df_concepts')
print(df_concepts)

# making example item (exercise)
ex_ids = pd.Series(['if/else', 'read vars', 'write vars'])
ex_slips = pd.Series([0.2, 0.05, 0.1])
ex_guesses = pd.Series([0.05, 0.2, 0.1])
ex_concepts = pd.Series([CONDITIONAL, VAR, VAR])

# item 0 should be harder, 1 should be easier, 2 in the middle
df_items = pd.DataFrame({EID: ex_ids, SLIP:ex_slips, GUESS:ex_guesses, CONCEPT:ex_concepts})
print('\ndf_items')
print(df_items)

# making example learned
k_ids = pd.Series(['alex', 'sam'])
df_learn = pd.DataFrame(columns=['uid', 'concept', 'step', 'known'])
print('\ndf_learn')
print(df_learn)

# making example responses
opp_uids = pd.Series(['alex', 'sam', 'sam'])
opp_eids = pd.Series(['read vars', 'read vars', 'write vars'])
opp_step = pd.Series([1, 1, 2])
opp_correct = pd.Series([0, 1, 0])

df_opp = pd.DataFrame({'uid':opp_uids, 'eid': opp_eids, 'step': opp_step, 'correct': opp_correct})
print('\ndf_opp')
print(df_opp) # TODO: add timestamp

df_concepts
       concept  init  trasfer
0  conditional  0.05      0.3
1     variable  0.20      0.4

df_items
          eid  slip  guess      concept
0     if/else  0.20   0.05  conditional
1   read vars  0.05   0.20     variable
2  write vars  0.10   0.10     variable

df_learn
Empty DataFrame
Columns: [uid, concept, step, known]
Index: []

df_opp
    uid         eid  step  correct
0  alex   read vars     1        0
1   sam   read vars     1        1
2   sam  write vars     2        0


In [124]:
df_concepts
df_items
df_opp
df_learn

Empty DataFrame
Columns: [uid, concept, step, known]
Index: []


## BKT functions w/ item difficulty

In [125]:
# done! (I think)
def posterior_pknown(is_correct, eid, transfer, item_params, prior_pknown):
    """
    updates BKT estimate of learner knowledge
    
    Parameters
    ----------
    is_correct: boolean
        True if response was correct
    eid: String
        exercise ID
    transfer: float
        transfer (learning) probability of the concept this item maps to
    item_params: pd.DataFrame
        slip and guess parameters for each item
    prior_pknown: float
        prior probability user learned this concept
    """
    if eid not in item_params[EID].unique():
        raise ValueError('Given exercise ID not in item parameters. EID is {}'.format(eid))

    # item-level slip and guess as scalars (KT-IDEM)
    item = item_params[item_params[EID] == eid].iloc[0]
    slip = item[SLIP]
    guess = item[GUESS]
    
    # Bayes update on the observed response
    if is_correct:
        posterior = (prior_pknown * (1.0 - slip)) / ((prior_pknown * (1.0 - slip)) + ((1.0 - prior_pknown) * guess))
    else:
        posterior = (prior_pknown * slip) / ((prior_pknown * slip) + ((1.0 - prior_pknown) * (1.0 - guess)))
    
    # account for the chance of learning the concept at this step
    return posterior + (1.0 - posterior) * transfer
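
As a sanity check on the update, the sketch below restates the same two-step computation with plain floats: a Bayes update on the observed response using the item's slip and guess, followed by the transfer step. The function name `bkt_step` and all parameter values are illustrative assumptions, not part of the notebook's data.

```python
def bkt_step(is_correct, prior, slip, guess, transfer):
    """One BKT update with item-level slip/guess (scalar inputs only)."""
    if is_correct:
        evidence = prior * (1.0 - slip) + (1.0 - prior) * guess
        posterior = prior * (1.0 - slip) / evidence
    else:
        evidence = prior * slip + (1.0 - prior) * (1.0 - guess)
        posterior = prior * slip / evidence
    # chance of learning the concept at this opportunity
    return posterior + (1.0 - posterior) * transfer


# made-up parameters: same prior and transfer, two items of different difficulty
easy = dict(slip=0.05, guess=0.3)
hard = dict(slip=0.2, guess=0.05)
for name, item in [('easy', easy), ('hard', hard)]:
    after_wrong = bkt_step(False, prior=0.5, transfer=0.3, **item)
    after_right = bkt_step(True, prior=0.5, transfer=0.3, **item)
    print(name, round(after_wrong, 3), round(after_right, 3))
```

With these numbers, a wrong answer on the hard item lowers the estimate less than a wrong answer on the easy item, and a right answer on the hard item raises it more. That asymmetry is what item-level slip and guess add over skill-level parameters.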

In [74]:
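def pknown_seq(uid, concept, exercise_seq, df_opp, concept_params, item_params):
    """
    predict sequence of probability a concept is known after each step

    Returns pd.Series of length len(exercise_seq) + 1; entry 0 is the initial
    probability for the concept, entry i+1 the estimate after exercise_seq[i].
    """
    # TODO: could filter to ensure all exercise_seq of same concept
    params = concept_params[concept_params[CONCEPT] == concept].iloc[0]
    user_opps = df_opp[df_opp['uid'] == uid]

    n_opps = len(exercise_seq)
    pk = pd.Series(np.zeros(n_opps + 1))
    pk[0] = params[INIT]
    for i, eid in enumerate(exercise_seq):
        # assumption: score the user's latest recorded response to this exercise
        is_correct = user_opps[user_opps['eid'] == eid]['correct'].iloc[-1] == 1
        update = posterior_pknown(is_correct, eid, params[TRANSFER], item_params, pk[i])
        pk[i + 1] = float(np.ravel(update)[0])  # handles a scalar or a 1-element Series
    return pk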

1    2.2
Name: init, dtype: float64
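
A minimal, self-contained sketch of tracing P(known) across a sequence of responses is shown here with plain Python lists rather than the notebook's DataFrames. The name `trace_pknown` and its `(is_correct, slip, guess)` tuple format are hypothetical, and the sketch assumes every response in the sequence belongs to the same concept.

```python
def trace_pknown(init, transfer, responses):
    """
    Return [P(known) before any response, P(known) after response 1, ...].

    responses: list of (is_correct, slip, guess) tuples, one per attempted item,
    all assumed to map to the same concept.
    """
    pk = [init]
    for is_correct, slip, guess in responses:
        prior = pk[-1]
        if is_correct:
            posterior = prior * (1 - slip) / (prior * (1 - slip) + (1 - prior) * guess)
        else:
            posterior = prior * slip / (prior * slip + (1 - prior) * (1 - guess))
        # transfer step: the learner may pick up the concept at this opportunity
        pk.append(posterior + (1 - posterior) * transfer)
    return pk


# illustrative values: a correct answer followed by an incorrect one
seq = trace_pknown(init=0.2, transfer=0.4,
                   responses=[(True, 0.05, 0.2), (False, 0.1, 0.1)])
print([round(p, 3) for p in seq])
```

Keeping the full sequence, rather than only the final estimate, makes it easy to plot or inspect how each response moved the estimate.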

## Sandbox

In [117]:
df_opp

    uid         eid  step  correct
0  alex   read vars     1        0
1   sam   read vars     1        1
2   sam  write vars     2        0


In [108]:
result = 0
eid = 'read vars'
transfer = df_concepts[df_concepts[CONCEPT]==VAR][TRANSFER]
item_params = df_items
prior_pknown = df_concepts[df_concepts[CONCEPT]==VAR][INIT]

correct= result==1
posterior = pd.Series(np.zeros(1))
slip = item_params[item_params[EID] == eid][SLIP]
guess = item_params[item_params[EID] == eid][GUESS]

In [111]:
transfer

1    0.5
Name: trasfer, dtype: float64

In [132]:
new_prior = posterior_pknown(0, 'read vars', df_concepts[df_concepts[CONCEPT]==VAR][TRANSFER], df_items, df_concepts[df_concepts[CONCEPT]==VAR][INIT])
# new_prior_2 = posterior_pknown(0, 'read vars', df_concepts[df_concepts[CONCEPT]==VAR][TRANSFER], df_items, new_prior)
# posterior_pknown(0, 'read vars', df_concepts[df_concepts[CONCEPT]==VAR][TRANSFER], df_items, new_prior_2)

1    0.424899
dtype: float64

In [116]:
eid in item_params[EID].unique()

True