Riiid! Answer Correctness Prediction

This is the repository for our final project for the Advanced Machine Learning 2020 course at Tsinghua University.

Repository Guide

├── Notebooks
│   ├── SAINT_training.ipynb            # Training code for SAINT+
│   ├── SAKT_inference.ipynb            # Inference code for SAKT
│   ├── SAINT_inference.ipynb           # Inference code for SAINT
│   └── SAINT_encoder_inference.ipynb   # Inference code for SAINT (encoder only)
└── Docs
    ├── Poster                          # Poster for our project
    └── Report.pdf                      # Final report for our project

RIIID competition dataset

train.csv
1. row_id: unique ID for the entry.
2. timestamp: time in milliseconds between this interaction and the user's first event completion.
3. user_id: ID of the user who made this interaction.
4. content_id: ID of the content (question or lecture) the user interacted with.
5. content_type_id: 0 for a question, 1 for watching a lecture.
6. task_container_id: ID of the container of questions/lectures.
7. user_answer: the user's answer, 0-3; -1 (null) for lectures.
8. answered_correctly: 1 if correct, 0 if incorrect; -1 (null) for lectures.
9. prior_question_elapsed_time: average time in milliseconds the user took per question in the previous container, ignoring lectures in between.
10. prior_question_had_explanation: whether the user saw the correct answers or an explanation after the previous container.
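The column descriptions above suggest compact dtypes for loading train.csv. The following is a minimal sketch, not necessarily what the notebooks do: the dtype choices, the `load_interactions` helper, and the three sample rows are all illustrative assumptions.

```python
import io
import pandas as pd

# Assumed dtypes, picked from the value ranges in the column list above.
TRAIN_DTYPES = {
    "row_id": "int64",
    "timestamp": "int64",
    "user_id": "int32",
    "content_id": "int16",
    "content_type_id": "int8",
    "task_container_id": "int16",
    "user_answer": "int8",
    "answered_correctly": "int8",
    "prior_question_elapsed_time": "float32",   # NaN when there is no prior question
    "prior_question_had_explanation": "boolean",  # nullable boolean
}

# Made-up rows in the train.csv layout (two questions, one lecture).
SAMPLE_TRAIN = """row_id,timestamp,user_id,content_id,content_type_id,task_container_id,user_answer,answered_correctly,prior_question_elapsed_time,prior_question_had_explanation
0,0,115,5692,0,1,3,1,,
1,56943,115,5716,0,2,2,1,37000.0,False
2,118363,115,128,1,0,-1,-1,,False
"""


def load_interactions(source):
    """Read interactions and split them into question rows and lecture rows."""
    df = pd.read_csv(source, dtype=TRAIN_DTYPES)
    questions = df[df["content_type_id"] == 0]
    lectures = df[df["content_type_id"] == 1]
    return questions, lectures


questions, lectures = load_interactions(io.StringIO(SAMPLE_TRAIN))
print(len(questions), "question rows,", len(lectures), "lecture rows")
```

For the real file, `load_interactions("train.csv")` would be the equivalent call, assuming the file sits in the working directory.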
questions.csv
1. question_id: corresponds to content_id when content_type_id == 0.
2. bundle_id: ID of the bundle the question belongs to.
3. correct_answer: the correct answer to the question, comparable with user_answer.
4. part: the TOEIC part the question belongs to.
  1. Parts 1-4 are listening tasks.
  2. Parts 5-8 are reading tasks.
5. tags: one or more tags for the question.
lectures.csv
1. lecture_id: corresponds to content_id when content_type_id == 1.
2. tag: exactly one tag per lecture.
3. part: the TOEIC part the lecture belongs to.
  1. Parts 1-4 are listening tasks.
  2. Parts 5-8 are reading tasks.
4. type_of: brief description of the purpose of the lecture, e.g. 'concept' or 'solving question'.
example_test.csv: similar to test.csv, with the two additional columns below
1. prior_group_responses: all of the user_answer entries for the previous group, in a string
2. prior_group_answers_correct: all of the answered_correctly entries for the previous group, in a string
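Because the two prior_group columns arrive as strings, they need to be parsed before use. The sketch below assumes each string is a Python-style list literal such as "[1, 0, 1]" and that rows without a value hold an empty string or NaN; `parse_prior_group` is a hypothetical helper, not a function from the notebooks.

```python
import ast


def parse_prior_group(field):
    """Convert a string like "[1, 0, 1]" into a list of ints.

    Missing values (NaN, None, or an empty string) become an empty list.
    """
    if not isinstance(field, str) or not field.strip():
        return []
    # literal_eval only evaluates literals, so it is safe on untrusted strings.
    return list(ast.literal_eval(field))


print(parse_prior_group("[1, 0, 1]"))
print(parse_prior_group(float("nan")))
```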

Data Analysis

train.csv

There are 101,230,332 entries: 99,271,300 questions and 1,959,032 lectures.
There are 13,782 unique content IDs: 13,523 questions and 259 lectures.
There are 10,000 unique containers.
There are 393,656 unique users.
Timestamp is relative to each user.
Unused columns: user_answer, row_id
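Counts like the ones above can be reproduced with a few pandas aggregations. This is a rough sketch on a made-up five-row frame; `summarize_interactions` is a hypothetical helper, and on the full train.csv the same calls should yield the figures listed here.

```python
import pandas as pd


def summarize_interactions(df):
    """Count entries, distinct content IDs, containers, and users."""
    is_lecture = df["content_type_id"] == 1
    return {
        "entries": len(df),
        "question_entries": int((~is_lecture).sum()),
        "lecture_entries": int(is_lecture.sum()),
        "unique_questions": int(df.loc[~is_lecture, "content_id"].nunique()),
        "unique_lectures": int(df.loc[is_lecture, "content_id"].nunique()),
        "unique_containers": int(df["task_container_id"].nunique()),
        "unique_users": int(df["user_id"].nunique()),
    }


# Toy data for illustration only: two users, four question rows, one lecture row.
toy = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2, 2],
        "content_id": [10, 11, 10, 10, 12],
        "content_type_id": [0, 0, 1, 0, 0],
        "task_container_id": [0, 1, 2, 0, 1],
    }
)

stats = summarize_interactions(toy)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Question and lecture IDs are counted separately because content_id is only meaningful together with content_type_id.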

questions.csv

There are 13,523 entries.
There are 9,765 bundles; each bundle has between 1 and 5 questions.
There are 188 tags; one question has no tag.
There are 8 parts: 4 Listening and 4 Reading.
Unused columns: correct_answer
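The bundle and tag figures above can be derived along the following lines. This is a sketch under assumptions: the tags column is taken to be a space-separated string of tag codes, the sample rows are invented, and `summarize_questions` is a hypothetical helper.

```python
import io
import pandas as pd

# Made-up rows in the questions.csv layout; the last question has no tag.
SAMPLE_QUESTIONS = """question_id,bundle_id,correct_answer,part,tags
0,0,0,1,51 131 162 38
1,1,1,1,131 36 81
2,2,0,5,131
3,2,2,5,
"""


def summarize_questions(q):
    """Summarize bundle sizes, tags, and the listening/reading split."""
    bundle_sizes = q.groupby("bundle_id")["question_id"].count()
    # Split "51 131 162" into ["51", "131", "162"]; missing tags become [].
    tag_lists = q["tags"].fillna("").str.split()
    all_tags = tag_lists.explode().dropna()
    return {
        "questions": len(q),
        "bundles": int(bundle_sizes.size),
        "min_bundle_size": int(bundle_sizes.min()),
        "max_bundle_size": int(bundle_sizes.max()),
        "unique_tags": int(all_tags.nunique()),
        "untagged_questions": int((tag_lists.str.len() == 0).sum()),
        "listening_questions": int((q["part"] <= 4).sum()),  # parts 1-4
        "reading_questions": int((q["part"] >= 5).sum()),    # parts 5-8
    }


# Read tags as strings so a single tag such as "131" is not parsed as a number.
questions_df = pd.read_csv(io.StringIO(SAMPLE_QUESTIONS), dtype={"tags": str})
question_stats = summarize_questions(questions_df)
print(question_stats)
```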

lectures.csv

There are 418 entries.
There are 151 different tags; each tag has between 1 and 7 lectures.
