This repository contains the official implementation of the Cura system from the paper "Cura: Curation at Social Media Scale".
The raw Reddit data and the checkpoint of the best model are available on Google Drive.
Check out the code from our repository using
git clone https://github.com/Azure-Vision/Curation-Modeling.git
cd Curation-Modeling
git checkout Trans
Place .env.development in the root directory.
Install the dependencies using
conda env create -f environment.yml
Then activate the environment with
conda activate cr2
Train the curation model by running
python train.py CONFIG_PATH
The configurations used in the paper "Cura: Curation at Social Media Scale" can be found at configs/subreddit_minority_no_peer_new.yml, and the configurations for the online experiment, which includes more subreddits, can be found at configs/subreddit_minority_no_peer_more_subs.yml.
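The training invocation above can also be scripted when both configurations need to be run back to back. The following is a minimal sketch, not part of the repository: the helper names `build_train_command`, `missing_configs`, and `run_training` are hypothetical, and it assumes the script is started from the repository root with the `cr2` environment active.

```python
import subprocess
import sys
from pathlib import Path

# Config paths named in this README.
PAPER_CONFIG = "configs/subreddit_minority_no_peer_new.yml"
ONLINE_CONFIG = "configs/subreddit_minority_no_peer_more_subs.yml"


def build_train_command(config_path, python=sys.executable):
    """Return the argv list equivalent to `python train.py CONFIG_PATH`."""
    return [python, "train.py", str(config_path)]


def missing_configs(config_paths, repo_root="."):
    """Return the config paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in config_paths if not (root / p).is_file()]


def run_training(config_paths, repo_root="."):
    """Hypothetical helper: train once per config, stopping on the first failure."""
    missing = missing_configs(config_paths, repo_root)
    if missing:
        raise FileNotFoundError(f"Config file(s) not found: {missing}")
    for config in config_paths:
        # check=True raises CalledProcessError if train.py exits non-zero.
        subprocess.run(build_train_command(config), cwd=repo_root, check=True)
```

Calling `run_training([PAPER_CONFIG, ONLINE_CONFIG])` would then train with the paper configuration first and the online-experiment configuration second.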
Evaluate the prediction accuracy and confidence of the curation model under different conditions: run test_model.ipynb.
Evaluate the change in prediction accuracy when the curation model receives more peer votes: run sim_new_votes.ipynb.
Perform curation on a selected subreddit given selected curators: run curation.ipynb.
Launch the interface for administrators to select curators and perform curation using
streamlit run curation_interface.py
Collect and preprocess posting and user data from the Curio app: run process_CURIO_data.ipynb.
Finetune the pretrained curation model on Curio data using
cd trained_models
mkdir finetune_CURIO_full_data
mkdir deploy_CURIO_full_data
cp subreddit_minority_no_peer_new/latest.pt finetune_CURIO_full_data/latest.pt
cd ..
python train.py configs/finetune_CURIO_full_data.yml
cp trained_models/finetune_CURIO_full_data/best.pt trained_models/deploy_CURIO_full_data/best.pt
Launch the curation model backend for Curio with
uvicorn curation_backend:app --port 5000
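The checkpoint handling in the fine-tuning step above (creating the two directories, seeding the fine-tune directory with latest.pt, and copying best.pt to the deploy directory after training) can also be sketched in Python. This is an illustrative equivalent of those shell commands, not code from the repository: the function names `prepare_finetune_dirs` and `promote_best_checkpoint` are hypothetical. Unlike plain `mkdir`, this sketch does not fail if the directories already exist.

```python
import shutil
from pathlib import Path

# Directory names taken from the shell commands in this README.
PRETRAINED_DIR = "subreddit_minority_no_peer_new"
FINETUNE_DIR = "finetune_CURIO_full_data"
DEPLOY_DIR = "deploy_CURIO_full_data"


def prepare_finetune_dirs(models_root="trained_models"):
    """Create the fine-tune and deploy directories and seed latest.pt.

    Returns the path of the copied checkpoint.
    """
    root = Path(models_root)
    finetune = root / FINETUNE_DIR
    deploy = root / DEPLOY_DIR
    # exist_ok=True is a deviation from the README's plain `mkdir`.
    finetune.mkdir(parents=True, exist_ok=True)
    deploy.mkdir(parents=True, exist_ok=True)
    src = root / PRETRAINED_DIR / "latest.pt"
    dst = finetune / "latest.pt"
    shutil.copy2(src, dst)  # raises FileNotFoundError if the checkpoint is missing
    return dst


def promote_best_checkpoint(models_root="trained_models"):
    """Copy the fine-tuned best.pt into the deploy directory after training."""
    root = Path(models_root)
    src = root / FINETUNE_DIR / "best.pt"
    dst = root / DEPLOY_DIR / "best.pt"
    shutil.copy2(src, dst)
    return dst
```

In this sketch, `prepare_finetune_dirs()` would be called before `python train.py configs/finetune_CURIO_full_data.yml`, and `promote_best_checkpoint()` after training finishes and before the backend is launched.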