The purpose of this repository is to store code used to perform data management tasks in the STAR and ALICE projects.
STAR and ALICE have spent a lot of money and time designing their data management architecture for the experiment data, down to the level of the micro-dst's but the local data analyses are completely ad hoc.
Provide a local light-weight data management infrastructure that allows analyses to be more seamlessly parallelized, and the results to be easily queried and shared. The infrastructure will be generally applicable to data analysis workflows in STAR, ALICE, and other Nuclear Physics projects.
- Automated distillation of metadata from analysis data files and allow user annotations where required
- Automated capture provenance of data analysis process
- Store intermediate data and final data products in a schema-less database
- Fast sub-setting of data and metadata for high-throughput parallel analyses, including use of MapReduce frameworks such as Hadoop.
- Dan Gunter
- Keith Beattie
- Jeff Porter
- Lavanya Ramakrishnan