Duolingo Shared Task on Second Language Acquisition Modeling

This repository contains code for running the 2nd place (Spanish-to-English) and 3rd place (English-to-Spanish and French-to-English) models in the Duolingo SLAM competition. The paper describing our approach can be found here.

Acquiring the data

Download the data from here and unzip it into the "data" folder.

Running the model

To preprocess the data, run the preprocessing script on each data file. See that script's docstring for more details on getting Google SyntaxNet set up. Then run the script that generates external word-frequency features.

The model can then be trained to produce predictions on the dev set or on the test set. The language trained on (en_es, fr_en, es_en, or all) and the number of users trained on can be controlled using the --lang and --users flags.
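As a rough illustration of how the two flags might be declared, here is a minimal argparse sketch. Only the flag names (--lang, --users) and the four language codes come from this README; the function name, defaults, and help text are assumptions and may not match the actual training scripts.

```python
import argparse

# Language codes listed in the README.
LANGUAGES = ("en_es", "fr_en", "es_en", "all")


def build_parser():
    """Hypothetical parser; defaults and help strings are assumed."""
    parser = argparse.ArgumentParser(description="Train a SLAM model (sketch).")
    parser.add_argument("--lang", choices=LANGUAGES, default="all",
                        help="language track to train on")
    parser.add_argument("--users", type=int, default=None,
                        help="number of users to train on (None = all users)")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--lang", "es_en", "--users", "500"])
print(args.lang, args.users)
```

Restricting --lang with choices makes a mistyped language code fail at parse time rather than partway through training.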

Models trained on each individual language can be averaged with a model trained on all languages using the averaging script.
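The averaging step can be pictured as blending two sets of per-instance probabilities. The sketch below assumes predictions are stored as {instance_id: probability} dictionaries and that the blend is a weighted mean; the function name, the data layout, and the weight parameter are all hypothetical, and the repository's script may combine models differently.

```python
def average_predictions(single_lang, all_langs, weight=0.5):
    """Blend two {instance_id: probability} maps (assumed layout).

    `weight` is the share given to the single-language model; 0.5 is a
    plain average. Only instances present in `single_lang` are returned.
    """
    missing = set(single_lang) - set(all_langs)
    if missing:
        raise KeyError(f"{len(missing)} instance(s) missing from all-language predictions")
    return {
        instance_id: weight * prob + (1.0 - weight) * all_langs[instance_id]
        for instance_id, prob in single_lang.items()
    }


# Made-up instance IDs and probabilities, for illustration only.
es_en_only = {"inst1": 0.2, "inst2": 0.8}
all_tracks = {"inst1": 0.6, "inst2": 0.4, "inst3": 0.9}
blended = average_predictions(es_en_only, all_tracks)
```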

Testing model lesions

To test the effects of removing different feature sets, first create a pickled version of the data, which cuts down on preprocessing time across different lesions. Then run the lesion experiment, using the --lesion flag to choose which one to conduct. See the code or paper for the list of options.
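The two ideas in this step, caching preprocessed data with pickle and dropping a feature set before training, can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the function names, the "group:name" feature-key convention, and the cache layout are hypothetical.

```python
import os
import pickle


def load_or_build(cache_path, build_fn):
    """Return cached data if the pickle exists; otherwise build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data


def apply_lesion(instances, lesioned_prefixes):
    """Drop every feature whose key starts with one of the given prefixes.

    Assumes each instance is a {feature_name: value} dict and that feature
    names carry a group prefix such as "user:" (hypothetical convention).
    """
    prefixes = tuple(lesioned_prefixes)
    return [
        {name: value for name, value in features.items()
         if not name.startswith(prefixes)}
        for features in instances
    ]
```

With this layout, each lesion run would reload the same pickle and differ only in the prefixes passed to apply_lesion, so the expensive preprocessing happens once.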

The results of the lesions can be plotted using graph_lesions.r (written in R, not Python).


2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM)





