SocialData is a collection of data science competition environments sourced from DrivenData. It contains 5 multi-turn sandboxed tasks in which agents develop machine learning models to solve real-world prediction problems spanning public health, infrastructure, natural language processing, and disaster response. The tasks exercise:
- Data exploration and feature engineering
- Machine learning model development
- Time series prediction
- Multi-target classification and regression
- Document summarization with LLMs
Agents are given a sandboxed environment with 1 CPU and 4 GB RAM, with access to scientific Python libraries (pandas, scikit-learn, etc.).
There are 5 environment variants, each with a train split:
| Variant | Description | Metric |
|---|---|---|
| FluVaccinePrediction | Predict H1N1 and seasonal flu vaccination probabilities | Mean ROC AUC |
| PumpItUpPrediction | Classify water pump functionality in Tanzania | F1-micro |
| DocSumTask | Summarize social science research papers | ROUGE-2 F1 |
| DengAIPrediction | Predict weekly dengue fever case counts | Mean Absolute Error |
| RichterPrediction | Predict earthquake building damage grades | F1-micro |
This is a multi-turn environment. Agents explore data, develop models, generate predictions, and submit via the submit_predictions tool. Each variant is scored with its own evaluation metric:
- FluVaccinePrediction: Mean ROC AUC across H1N1 and seasonal targets (0-1)
- PumpItUpPrediction: Micro-averaged F1 across 3 classes (0-1)
- DocSumTask: ROUGE-2 F1 score (0-1)
- DengAIPrediction: Inverted MAE (lower error = higher reward)
- RichterPrediction: Micro-averaged F1 across 3 damage grades (0-1)
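For local validation before submitting, the metrics above can be approximated with scikit-learn. The functions below are a minimal sketch and not the platform's grader: the function names are hypothetical, `rouge2_f1` uses a simplified lowercase whitespace tokenization without stemming (so it may differ from an official ROUGE implementation), and the exact transformation that turns DengAI's MAE into a reward is not specified here, so only the raw MAE is computed.

```python
from collections import Counter

import numpy as np
from sklearn.metrics import f1_score, mean_absolute_error, roc_auc_score


def mean_roc_auc(y_true, y_prob) -> float:
    """FluVaccinePrediction-style score: ROC AUC per target column, then averaged.

    Both inputs have shape (n_samples, n_targets), e.g. H1N1 and seasonal.
    """
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    per_target = [roc_auc_score(y_true[:, j], y_prob[:, j]) for j in range(y_true.shape[1])]
    return float(np.mean(per_target))


def micro_f1(y_true, y_pred) -> float:
    """PumpItUp / Richter-style score; for single-label multiclass this equals accuracy."""
    return float(f1_score(y_true, y_pred, average="micro"))


def mae(y_true, y_pred) -> float:
    """DengAI error (lower is better); the MAE-to-reward mapping is not reproduced here."""
    return float(mean_absolute_error(y_true, y_pred))


def rouge2_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-2 F1 over lowercase whitespace tokens (no stemming)."""

    def bigrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(zip(tokens, tokens[1:]))

    ref, cand = bigrams(reference), bigrams(candidate)
    overlap = sum((ref & cand).values())  # bigram matches, clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Averaging per-column AUCs mirrors the "mean across H1N1 and seasonal targets" description; it is equivalent to calling `roc_auc_score` with `average="macro"` on a two-column indicator matrix.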
Training data is mounted read-only at /orwd_data. Each competition includes:
- Training features and labels
- Test features (labels hidden)
- Data dictionaries and descriptions
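Because the file layout under /orwd_data is not described here, a cheap first step is to inventory the mount before loading anything. The sketch below assumes only that the data directory exists and contains CSV files somewhere beneath it; `inventory` and `peek_csv` are hypothetical helper names, and no specific file names are assumed.

```python
from pathlib import Path

import pandas as pd

DATA_ROOT = Path("/orwd_data")  # read-only mount point; the layout below it is not assumed


def inventory(root: Path = DATA_ROOT) -> pd.DataFrame:
    """List every file under root with its size in MB, without reading file contents."""
    rows = [
        {"path": str(p.relative_to(root)), "size_mb": p.stat().st_size / 1e6}
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    return pd.DataFrame(rows, columns=["path", "size_mb"])


def peek_csv(path: Path, n_rows: int = 5) -> pd.DataFrame:
    """Read only the first few rows of a CSV to inspect its columns cheaply."""
    return pd.read_csv(path, nrows=n_rows)
```

Checking file sizes first helps decide whether a file can be loaded whole within the 4 GB limit or should be read in chunks.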
Data is sourced from DrivenData competitions and stored on the OpenReward platform.
Each variant provides CLI tools plus a submission tool:
| Tool | Description |
|---|---|
| bash | Execute shell commands in the sandbox |
| glob | Find files by pattern |
| grep | Search file contents |
| ls | List directory contents |
| read | Read file contents |
| write | Write file contents |
| edit | Edit existing files |
| multi_edit | Make multiple edits |
| todo_write | Track task progress |
| submit_predictions | Submit predictions CSV for evaluation. Ends the episode. |
Episodes are multi-turn: agents explore data, develop and train models, generate predictions, save them to submission.csv, and submit for evaluation.
None. All evaluation is deterministic using competition-specific metrics.
Agents in SocialData work within sandboxed environments to develop ML models. The environment does not present direct safety risks.
```bibtex
@software{socialdata_openreward,
  title={SocialData: DrivenData Competition Environments for OpenReward},
  author={GeneralReasoning},
  year={2025},
  url={https://openreward.ai/GeneralReasoning/SocialData}
}
```