This is the repository for our final project for the Advanced Machine Learning 2020 course at Tsinghua University.
├── Notebooks
│ ├── SAINT_training.ipynb # Training code for SAINT+
│ ├── SAKT_inference.ipynb # Inference code for SAKT
│ ├── SAINT_inference.ipynb # Inference code for SAINT
│ ├── SAINT_encoder_inference.ipynb # Inference code for SAINT (encoder only)
├── Docs
│ ├── Poster # Poster for our project
│ ├── Report.pdf # Final report for our project
- train.csv
  - row_id: unique ID of the row
  - timestamp: time in milliseconds between this user interaction and the completion of that user's first event
  - user_id: ID of the user
  - content_id: ID of the question or lecture the user interacted with; it matches question_id in questions.csv or lecture_id in lectures.csv, depending on content_type_id
  - content_type_id: 0 for a question, 1 for watching a lecture
  - task_container_id: ID of the container (batch) of questions or lectures
  - user_answer: the user's answer, from 0 to 3; -1 (null) for lectures
  - answered_correctly: -1, 0 or 1; -1 (null) for lectures
  - prior_question_elapsed_time: average time in milliseconds the user took to answer each question in the previous container, ignoring any lectures in between
  - prior_question_had_explanation: whether the user saw an explanation and the correct answer(s) after answering the previous container, ignoring any lectures in between
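- questions.csv
  - question_id: matches content_id in train.csv when content_type_id == 0
  - bundle_id: ID of the bundle the question belongs to; questions in the same bundle are served together
  - correct_answer: the correct answer to the question, comparable with user_answer
  - part: the section of the TOEIC test
    - 1-4 are listening parts.
    - 5-7 are reading parts.
  - tags: one or more tag codes for the question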
- lectures.csv
  - lecture_id: matches content_id in train.csv when content_type_id == 1
  - tag: exactly one tag code per lecture
  - part: the section of the TOEIC test
    - 1-4 are listening parts.
    - 5-7 are reading parts.
  - type_of: brief description of the lecture's purpose, e.g. 'concept' or 'solving question'
- example_test.csv: similar to test.csv; it also includes the two columns below
  - prior_group_responses: all user_answer entries from the previous group, as a string
  - prior_group_answers_correct: all answered_correctly entries from the previous group, as a string
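Because content_id points into two different tables, resolving it also needs content_type_id. The sketch below shows that lookup and a parser for the prior_group_* strings, using only the Python standard library. It is an illustration rather than the code from our notebooks: the helper names (build_lookup, content_meta, parse_prior_group) are hypothetical, the toy rows are invented, and we assume the prior_group_* strings are list literals such as "[1, 0, 1]", with an empty or missing value on rows that carry none.

```python
import ast
import csv
import io
from typing import Dict, List, Optional


def build_lookup(csv_text: str, key: str) -> Dict[int, dict]:
    """Index the rows of questions.csv or lectures.csv by their ID column."""
    return {int(row[key]): row for row in csv.DictReader(io.StringIO(csv_text))}


def content_meta(content_id: int, content_type_id: int,
                 questions: Dict[int, dict],
                 lectures: Dict[int, dict]) -> dict:
    """Resolve a train.csv content_id: type 0 -> question_id, type 1 -> lecture_id."""
    table = questions if content_type_id == 0 else lectures
    return table[content_id]


def parse_prior_group(value: Optional[str]) -> Optional[List[int]]:
    """Parse a prior_group_* string; rows without a value return None.

    Assumption (hypothetical format): the string is a list literal like "[1, 0, 1]".
    """
    if value is None or value.strip() in ("", "nan"):
        return None
    return [int(x) for x in ast.literal_eval(value)]


# Toy rows invented for illustration; they are not taken from the real files.
QUESTIONS_CSV = "question_id,bundle_id,correct_answer,part,tags\n0,0,2,1,1 2\n1,1,0,5,3\n"
LECTURES_CSV = "lecture_id,tag,part,type_of\n0,3,5,concept\n"

questions = build_lookup(QUESTIONS_CSV, "question_id")
lectures = build_lookup(LECTURES_CSV, "lecture_id")

# The same number may appear in both tables; content_type_id picks the right one.
print(content_meta(0, 0, questions, lectures)["correct_answer"])  # 2
print(content_meta(0, 1, questions, lectures)["type_of"])         # concept
print(parse_prior_group("[1, 0, 1]"))                             # [1, 0, 1]
```

On the real data, pass the contents of questions.csv and lectures.csv instead of the toy strings.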
train.csv
- There are 101,230,332 entries: 99,271,300 questions and 1,959,032 lectures.
- There are 13,782 unique content IDs: 13,523 questions and 259 lectures.
- There are 10,000 unique containers (distinct task_container_id values).
- There are 393,656 unique users.
- Timestamps are relative to each user's first event.
- Columns not useful for modeling: user_answer, row_id
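With roughly 100 million rows, the counts above are easiest to reproduce in a single streaming pass. The minimal sketch below uses csv.DictReader; summarize_train is a hypothetical helper and the three-row sample is invented. It counts unique content as (content_type_id, content_id) pairs so that questions and lectures are counted separately, matching 13,523 + 259 = 13,782.

```python
import csv
import io
from typing import Dict, TextIO


def summarize_train(f: TextIO) -> Dict[str, int]:
    """Stream train.csv row by row and collect the summary counts.

    csv.DictReader reads one row at a time, so the file is never fully in memory.
    """
    counts = {"entries": 0, "questions": 0, "lectures": 0}
    content = set()     # (content_type_id, content_id) pairs
    containers = set()  # distinct task_container_id values
    users = set()       # distinct user_id values
    for row in csv.DictReader(f):
        ctype = int(row["content_type_id"])
        counts["entries"] += 1
        counts["questions" if ctype == 0 else "lectures"] += 1
        # Key on the pair, because question and lecture IDs come from separate tables.
        content.add((ctype, int(row["content_id"])))
        containers.add(int(row["task_container_id"]))
        users.add(int(row["user_id"]))
    counts["unique_questions"] = sum(1 for t, _ in content if t == 0)
    counts["unique_lectures"] = sum(1 for t, _ in content if t == 1)
    counts["unique_content"] = len(content)
    counts["unique_containers"] = len(containers)
    counts["unique_users"] = len(users)
    return counts


# Invented sample: content_id 5 is used once as a question and once as a lecture.
TOY_TRAIN = (
    "row_id,timestamp,user_id,content_id,content_type_id,task_container_id,"
    "user_answer,answered_correctly,prior_question_elapsed_time,"
    "prior_question_had_explanation\n"
    "0,0,7,5,0,0,3,1,,\n"
    "1,1500,7,5,1,1,-1,-1,,\n"
    "2,0,8,6,0,0,1,0,,\n"
)

stats = summarize_train(io.StringIO(TOY_TRAIN))
print(stats)

# On the real file (path assumed):
# with open("train.csv", newline="") as f:
#     print(summarize_train(f))
```

A pure-Python pass over the full file is slow; loading it with pandas and explicit dtypes is the usual faster alternative, at the cost of more memory.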
questions.csv
- There are 13,523 entries.
- There are 9,765 bundles; each bundle contains between 1 and 5 questions.
- There are 188 distinct tags; one question has no tag.
- There are 7 parts: 4 listening and 3 reading.
- Columns not useful for modeling: correct_answer
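The bundle and tag statistics come from grouping questions.csv by bundle_id and splitting the tags field. The sketch below assumes tags is a space-separated list of integer codes and that the untagged question has an empty tags field; summarize_questions and parse_tags are hypothetical helpers, and the four sample rows are invented.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Optional, TextIO


def parse_tags(raw: Optional[str]) -> List[int]:
    """Split a tags field into ints; an empty or missing field means no tag."""
    return [int(t) for t in (raw or "").split()]


def summarize_questions(f: TextIO) -> Dict[str, int]:
    """Count entries, bundle sizes, distinct tags, untagged questions and parts."""
    bundle_sizes: Counter = Counter()  # bundle_id -> number of questions
    tags, parts = set(), set()
    entries = untagged = 0
    for row in csv.DictReader(f):
        entries += 1
        bundle_sizes[row["bundle_id"]] += 1
        row_tags = parse_tags(row["tags"])
        if not row_tags:
            untagged += 1
        tags.update(row_tags)
        parts.add(int(row["part"]))
    return {
        "entries": entries,
        "bundles": len(bundle_sizes),
        "max_bundle_size": max(bundle_sizes.values(), default=0),
        "min_bundle_size": min(bundle_sizes.values(), default=0),
        "tags": len(tags),
        "untagged": untagged,
        "parts": len(parts),
    }


# Invented sample: bundle 1 holds three questions, question 2 has no tag.
TOY_QUESTIONS = (
    "question_id,bundle_id,correct_answer,part,tags\n"
    "0,0,0,1,1 2\n"
    "1,1,1,2,2\n"
    "2,1,3,2,\n"
    "3,1,2,2,3 1\n"
)

q_stats = summarize_questions(io.StringIO(TOY_QUESTIONS))
print(q_stats)
```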
lectures.csv
- There are 418 entries.
- There are 151 distinct tags; each tag covers between 1 and 7 lectures.
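The per-tag lecture counts, along with a split of lectures by TOEIC section, can be computed the same way. In this sketch the helper names (toeic_section, summarize_lectures) are hypothetical and the rows are invented; parts 1-4 map to listening and every higher part to reading, following the part description above.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def toeic_section(part: int) -> str:
    """Parts 1-4 are listening; higher parts are reading."""
    if part < 1:
        raise ValueError(f"unexpected TOEIC part: {part}")
    return "listening" if part <= 4 else "reading"


def summarize_lectures(f: TextIO) -> Dict[str, object]:
    """Count lectures per tag and per TOEIC section."""
    per_tag: Counter = Counter()      # tag -> number of lectures
    per_section: Counter = Counter()  # "listening"/"reading" -> number of lectures
    for row in csv.DictReader(f):
        per_tag[int(row["tag"])] += 1
        per_section[toeic_section(int(row["part"]))] += 1
    return {
        "entries": sum(per_tag.values()),
        "tags": len(per_tag),
        "max_lectures_per_tag": max(per_tag.values(), default=0),
        "min_lectures_per_tag": min(per_tag.values(), default=0),
        "per_section": dict(per_section),
    }


# Invented sample rows.
TOY_LECTURES = (
    "lecture_id,tag,part,type_of\n"
    "1,10,5,concept\n"
    "2,10,5,solving question\n"
    "3,20,1,concept\n"
    "4,10,6,concept\n"
)

l_stats = summarize_lectures(io.StringIO(TOY_LECTURES))
print(l_stats)
```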