Duolingo Shared Task on Second Language Acquisition Modeling

This repository contains code for running the 2nd-place (Spanish-to-English) and 3rd-place (English-to-Spanish and French-to-English) models in the Duolingo SLAM competition. The paper describing our approach can be found here.

Acquiring the data

Download the data from here and unzip it in the "data" folder.

Running the model

To preprocess the data, run reprocess_syntax.py on each data file. See that file's docstring for details on setting up Google SyntaxNet. Then run translate_frequency.py to generate external word-frequency features.

The model can then be trained to produce predictions on the dev set using lightgbm_dev.py or on the test set using lightgbm_script.py. The language trained on (en_es, fr_en, es_en, or all) and the number of users trained on can be controlled using the --lang and --users flags.
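As a rough illustration, the command line for one run per language track might be assembled as in the sketch below. The script name and the --lang and --users flags come from the description above; the build_command helper and the user count of 1000 are hypothetical, and the values each script actually accepts should be checked in its argument parser.

```python
import sys

# Language tracks named in this README.
LANGS = ["en_es", "fr_en", "es_en", "all"]


def build_command(script, lang, users):
    """Return the argv list for one training run (hypothetical helper)."""
    if lang not in LANGS:
        raise ValueError(f"unknown language track: {lang}")
    return [sys.executable, script, "--lang", lang, "--users", str(users)]


# One dev-set command per track; 1000 is a placeholder, not a recommended value.
commands = [build_command("lightgbm_dev.py", lang, 1000) for lang in LANGS]
```

Each list in commands could then be passed to subprocess.run from the repository root.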

Models trained on each individual language can be averaged with a model trained on all languages using the average_models.py script.
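The idea of averaging can be sketched as below. This is not the content of average_models.py: the dictionary layout (instance id mapped to a predicted probability), the function name, and the equal default weighting are all assumptions made for illustration.

```python
def average_predictions(per_language, all_languages, weight=0.5):
    """Blend two sets of predicted probabilities keyed by instance id.

    `weight` is the share given to the per-language model (assumed 0.5 here).
    """
    if per_language.keys() != all_languages.keys():
        raise ValueError("both models must predict the same instances")
    return {
        key: weight * per_language[key] + (1.0 - weight) * all_languages[key]
        for key in per_language
    }


# Toy predictions with made-up instance ids.
blended = average_predictions({"a": 0.2, "b": 0.8}, {"a": 0.4, "b": 0.6})
```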

Testing model lesions

To test the effects of removing different feature sets, first run preprocess_to_pickle.py to create a pickled version of the data, which cuts down on preprocessing time across lesions. Then run run_lesion.py, using the --lesion flag to choose the lesion experiment to conduct. See the code or the paper for the list of options.
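The two ideas here, caching preprocessed data in a pickle and removing a feature set before training, can be sketched as follows. The helper names, the feature names, and the prefix-based grouping of features are hypothetical and are not taken from preprocess_to_pickle.py or run_lesion.py.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build_fn):
    """Load pickled data if the cache exists, otherwise build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data


def apply_lesion(rows, lesioned_prefixes):
    """Drop every feature whose name starts with one of the given prefixes."""
    prefixes = tuple(lesioned_prefixes)
    return [
        {name: value for name, value in row.items() if not name.startswith(prefixes)}
        for row in rows
    ]


calls = []


def expensive_preprocess():
    # Stand-in for the real preprocessing; records how often it runs.
    calls.append(1)
    return [{"user:id": "u1", "word:token": "hola", "word:freq": 3.2}]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.pkl")
    first = load_or_build(path, expensive_preprocess)   # builds and caches
    second = load_or_build(path, expensive_preprocess)  # reads the cache

# Remove the (made-up) word-level feature set before training.
lesioned = apply_lesion(second, ["word:"])
```

Because the second call reads from the pickle, the preprocessing step runs only once no matter how many lesions are tried afterwards.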

The results of the lesions can be plotted using graph_lesions.r (in R, not Python).
