CAHLR/skill-equivalency

Skill Equivalency Learning

This repo contains the datasets and code for the paper:

Li, Z., Ren, C., Li, X., & Pardos, Z.A. (2021) Learning Skill Transfer Models Across Systems. In N. Dowell, S. Joksimovic, M. Scheffel, & G. Siemens (Eds.) Proceedings of the 11th International Conference on Learning Analytics and Knowledge (LAK). ACM. Pages 354-363.

Files and folders

example.sh: two example commands to run experiments
main.py: the main method to run experiments
arguments.py: argument processing
translate.py: translation model learning
evaluate.py: evaluation logic
skill_representations: six models to represent skills as vectors
data: the datasets we release
output: the output of the two example experiments produced by running example.sh

Datasets

We release some of the datasets used in the paper in the data folder. They are either public or created by us.
Khan Academy: exercise contents web-scraped from Khan Academy and its GitHub repo.
ASSISTments: clickstream data available on ASSISTmentsData, which we further processed into skill sequence data.
Cognitive Tutor: clickstream data available from KDD Cup 2010. We used Algebra I 2008-2009 and further processed the clickstream data into skill sequence data.
Skill equivalency labels: we provide the equivalency labels between all three platforms annotated as described in the paper.

Instructions

Run experiments by calling main.py with the appropriate parameters, for example:

python main.py --src-name cog \
               --dst-name assist \
               --src-repr-model skill2vec \
               --dst-repr-model skill2vec \
               --labels-path ./data/labels_cog2assist.csv \
               --src-sequences-path ./data/sequences_cog.json \
               --src-skill2id-path ./data/skill2id_cog.json \
               --dst-sequences-path ./data/sequences_assist.json \
               --dst-skill2id-path ./data/skill2id_assist.json

For each run, src-repr-model, dst-repr-model, and labels-path are required. The remaining inputs depend on the representation models used. If a model needs content (BOW, TFIDF, content2vec, content2vec_skill2vec, TAMF), problems-path should be given; if a model involves context (skill2vec, content2vec_skill2vec, TAMF), sequences-path and skill2id-path should be specified.
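The rule above can be sketched as a small lookup. This is only an illustration: required_inputs is a hypothetical helper that is not part of this repo, and the model names are spelled as in this README, which may differ from the exact strings accepted on the command line (see arguments.py).

```python
# Model groups as listed in this README. The exact strings accepted by the
# command line are defined in arguments.py and may be spelled differently.
CONTENT_MODELS = {"BOW", "TFIDF", "content2vec", "content2vec_skill2vec", "TAMF"}
CONTEXT_MODELS = {"skill2vec", "content2vec_skill2vec", "TAMF"}


def required_inputs(model):
    """Return the input kinds a representation model needs.

    Hypothetical helper for illustration only; it is not part of the repo.
    """
    if model not in CONTENT_MODELS | CONTEXT_MODELS:
        raise ValueError(f"unknown representation model: {model}")
    needed = set()
    if model in CONTENT_MODELS:
        # Content-based models read the problem text.
        needed.add("problems-path")
    if model in CONTEXT_MODELS:
        # Context-based models read the skill sequences and the id mapping.
        needed.update({"sequences-path", "skill2id-path"})
    return needed
```

For example, required_inputs("skill2vec") returns only the two sequence-related inputs, while a model that uses both content and context, such as TAMF, needs all three.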

You can also set various hyperparameters for the skill representations and the translation model. See arguments.py for more details.

Input data format

Please note that the library requires specific input data formats. You can find examples in the data folder.
Equivalency labels: a csv file with two columns named "source" and "destination".
Problem contents: a tsv file with two columns named "skill" and "content". Each row is a problem, so many rows can share the same "skill".
Sequences: two json files, a "skill2id" file that maps each skill to an int id, and a "sequences" file that contains the actual skill sequences. The "skill2id" mapping keeps the "sequences" file from growing too large by replacing skill name strings with smaller int ids.
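As a rough sketch of these formats, the snippet below writes tiny made-up files and reads the sequences back as skill names. Everything in it is an assumption for illustration: the file names, skill names, and problem texts are invented, and the "sequences" file is assumed to be a JSON list of lists of int ids, which this README does not state. Check the files in the data folder for the real layout.

```python
import csv
import json
import os
import tempfile


def write_example_inputs(folder):
    """Write tiny made-up files in the three formats described above."""
    paths = {
        "labels": os.path.join(folder, "labels.csv"),
        "problems": os.path.join(folder, "problems.tsv"),
        "skill2id": os.path.join(folder, "skill2id.json"),
        "sequences": os.path.join(folder, "sequences.json"),
    }
    # Equivalency labels: csv with "source" and "destination" columns.
    with open(paths["labels"], "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "destination"])
        writer.writerow(["src_skill_a", "dst_skill_x"])
    # Problem contents: tsv, one row per problem; a skill may repeat.
    with open(paths["problems"], "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["skill", "content"])
        writer.writerow(["src_skill_a", "Solve 2x + 3 = 7."])
        writer.writerow(["src_skill_a", "Solve 5x - 1 = 9."])
    # Sequences: skill names replaced by int ids.
    # ASSUMPTION: the sequences file is a JSON list of lists of ids.
    skill2id = {"src_skill_a": 0, "src_skill_b": 1}
    named = [["src_skill_a", "src_skill_b", "src_skill_a"], ["src_skill_b"]]
    sequences = [[skill2id[s] for s in seq] for seq in named]
    with open(paths["skill2id"], "w") as f:
        json.dump(skill2id, f)
    with open(paths["sequences"], "w") as f:
        json.dump(sequences, f)
    return paths


def load_sequences_as_names(skill2id_path, sequences_path):
    """Map id sequences back to skill names using the skill2id file."""
    with open(skill2id_path) as f:
        skill2id = json.load(f)
    with open(sequences_path) as f:
        sequences = json.load(f)
    id2skill = {i: s for s, i in skill2id.items()}
    return [[id2skill[i] for i in seq] for seq in sequences]
```

Calling write_example_inputs on a temporary directory and then load_sequences_as_names on the two json paths should return the original named sequences, which is a quick way to check that a "skill2id" file and a "sequences" file agree with each other.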
