Synerise at ACM Twitter RecSys Challenge 2021

Implementation of our 2nd-place solution to the Twitter RecSys Challenge 2021. The goal of the competition was to predict user engagement with 1 billion tweets selected by Twitter. An additional challenge was the test phase: models were evaluated in a very constrained environment - just 1 CPU core, no GPU, and a 24-hour time limit for all predictions, which leaves about 6 ms per single tweet prediction.

The challenge focuses on the real-world task of tweet engagement prediction in a dynamic environment. It covers four engagement types: Like, Retweet, Quote, and Reply.

Approach

Getting Started

  1. Register and download the training and validation sets from the competition website.

  2. Set up a configuration file config.yaml (an illustrative example follows the option list):

    • working_dir - path where all preprocessed files will be saved
    • recsys_data - path to the directory with the uncompressed training data parts
    • validation_part - path to the uncompressed validation part
    • max_n_parts - maximum number of training parts used for training; lower it to speed up training
    • max_n_parts_in_memory - number of training parts loaded into memory at the same time; lower it to limit RAM usage
    • authors_similarity_top_N - maximum number of users considered similar to the current one
    • authors_similarity_threshold - user similarity threshold
    • validation_percentage - percentage of the validation set used for testing; the remaining part is used for finetuning
    • num_validation_chunks - number of validation chunks
    • sketch_width - sketch width for tweet sketches
    • sketch_depth - sketch depth for tweet sketches
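
A minimal config.yaml might look like the following. All paths and numeric values below are placeholders meant only to show the expected keys, not recommended settings:

    working_dir: /path/to/working_dir
    recsys_data: /path/to/training_parts
    validation_part: /path/to/validation_part
    max_n_parts: 100
    max_n_parts_in_memory: 10
    authors_similarity_top_N: 50
    authors_similarity_threshold: 0.5
    validation_percentage: 0.5
    num_validation_chunks: 10
    sketch_width: 128
    sketch_depth: 10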
  3. Finetune DistilBERT and precompute token sketches:

    python bert_finetuning.py

BERT checkpoints will be saved periodically. You can run sketch computation on any chosen checkpoint:

Prepare sketches of tokens from the checkpoint trained with the above script. Change ./distilbert_checkpoints/checkpoint-1000 to the path of the most recent checkpoint:

    python prepare_token_embeddings.py --checkpoint-path ./distilbert_checkpoints/checkpoint-1000
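
For orientation, the snippet below shows one way token embeddings could be pulled from a fine-tuned checkpoint, assuming the HuggingFace Trainer checkpoint layout suggested by the checkpoint-1000 directory name; prepare_token_embeddings.py may extract them differently (e.g. contextual embeddings):

    # Illustrative only - not the repository's implementation.
    from transformers import DistilBertModel

    checkpoint = "./distilbert_checkpoints/checkpoint-1000"
    model = DistilBertModel.from_pretrained(checkpoint)

    # The input embedding matrix holds one vector per vocabulary token;
    # these per-token vectors are what EMDE later turns into sketches.
    token_embeddings = model.get_input_embeddings().weight.detach()
    print(token_embeddings.shape)  # (vocab_size, hidden_dim)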

Apply EMDE to compute sketches:

    python emde.py
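
As a rough illustration of what a sketch is, the snippet below builds an EMDE-style density sketch by bucketing embeddings with random-hyperplane LSH, reusing the sketch_width and sketch_depth settings from config.yaml. It is a simplified stand-in, not the code in emde.py:

    # Illustrative only - a simplified EMDE-style sketch, not emde.py itself.
    import numpy as np

    def build_sketch(embeddings, sketch_width=128, sketch_depth=10, seed=0):
        """Aggregate a set of embedding vectors into a flat density sketch."""
        rng = np.random.default_rng(seed)
        dim = embeddings.shape[1]
        n_bits = int(np.ceil(np.log2(sketch_width)))
        sketch = np.zeros((sketch_depth, sketch_width), dtype=np.float32)
        for d in range(sketch_depth):
            # Random hyperplanes hash each embedding into one of sketch_width buckets.
            planes = rng.standard_normal((dim, n_bits))
            bits = (embeddings @ planes > 0).astype(np.int64)
            buckets = (bits @ (1 << np.arange(n_bits))) % sketch_width
            np.add.at(sketch[d], buckets, 1.0)
        return sketch.reshape(-1)

    # Example: sketch 20 random 768-dimensional token embeddings.
    sketch = build_sketch(np.random.randn(20, 768))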

Steps 2 and 3 can be run simultaneously.

  4. Preprocess the dataset and compute user interactions:
    python interactions_extraction.py
  5. Train the model (an illustrative model sketch follows the list):
    python train.py
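
To make the four-target setup concrete, the sketch below shows a minimal multi-head classifier over concatenated sketch/feature vectors with one output per engagement type. The architecture, input dimension, and layer sizes are assumptions for illustration only and do not reflect what train.py implements:

    # Illustrative only - placeholder architecture, not the model in train.py.
    import torch
    import torch.nn as nn

    class EngagementModel(nn.Module):
        def __init__(self, input_dim: int, hidden_dim: int = 512):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Linear(input_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
                nn.ReLU(),
            )
            # One logit per engagement type: Reply, Retweet, Quote, Like.
            self.head = nn.Linear(hidden_dim, 4)

        def forward(self, x):
            return self.head(self.backbone(x))

    model = EngagementModel(input_dim=4 * 10 * 128)   # e.g. four depth-10, width-128 sketches
    criterion = nn.BCEWithLogitsLoss()                # four independent binary targets
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)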
