A trusted software framework to support and enhance scientific discovery #16

u1848757 opened this Issue Jan 21, 2015 · 0 comments


u1848757 commented Jan 21, 2015

Use Case: A trusted software framework to support and enhance scientific discovery

Contributors: Lesley Wyborn, Ryan Fraser, Ben Evans, Lutz Gross, Jens Klump

Goals and Summary

A trusted software framework is proposed to enable reliable software to be discovered, accessed and then deployed on multiple hardware environments. More specifically, this framework will enable those who generate the software, and those who fund the development of software, to gain credit for the effort, IP, time and dollars spent, and facilitate quantification of the impact of individual codes. For scientific users, the framework delivers reviewed and benchmarked scientific software with mechanisms to reproduce results.

The trusted framework will have five separate but connected components: Register, Review, Reference, Run, and Repeat.

  1. The Register component will facilitate discovery of relevant software from multiple open source code repositories.
  2. The Review component targets verification of the software, typically against a set of benchmark cases.
  3. The Reference component will link the framework to services such as Figshare or ImpactStory.
  4. The Run component will draw on information supplied during registration to instantiate the scientific code on the selected environment.
  5. The Repeat component will tap into existing provenance workflow engines that automatically capture information relating to a particular run of the software.
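
The five components above can be sketched as a small data model. This is only an illustrative sketch, not a proposed implementation: the class names `SoftwareRecord` and `Framework`, every field name, and the shape of the provenance record are assumptions made for this example. In particular, the Reference step here simply stores an identifier and does not call Figshare or ImpactStory, and the Run step returns a plan rather than launching anything.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SoftwareRecord:
    """Hypothetical registry entry; all field names are illustrative."""
    name: str
    version: str
    repository_url: str
    license: str
    dependencies: Dict[str, str]      # package name -> version requirement
    environments: List[str]           # environments the code can run on
    reviewed: bool = False            # set by the Review step
    citation_id: str = ""             # set by the Reference step
    runs: List[dict] = field(default_factory=list)  # filled by the Repeat step


class Framework:
    """Toy in-memory version of Register, Review, Reference, Run, Repeat."""

    def __init__(self) -> None:
        self.registry: Dict[str, SoftwareRecord] = {}

    def register(self, record: SoftwareRecord) -> None:
        # Register: make the software discoverable by name.
        self.registry[record.name] = record

    def review(self, name: str, benchmarks: Dict[str, Callable[[], bool]]) -> bool:
        # Review: the code is marked reviewed only if every benchmark passes.
        record = self.registry[name]
        record.reviewed = all(check() for check in benchmarks.values())
        return record.reviewed

    def reference(self, name: str, citation_id: str) -> None:
        # Reference: attach a citable identifier (assumed to be issued elsewhere).
        self.registry[name].citation_id = citation_id

    def run(self, name: str, environment: str, parameters: dict) -> dict:
        # Run: build a deployment plan from the registration metadata.
        record = self.registry[name]
        if not record.reviewed:
            raise ValueError(f"{name} has not passed review")
        if environment not in record.environments:
            raise ValueError(f"{name} is not registered for {environment}")
        plan = {
            "software": record.name,
            "version": record.version,
            "environment": environment,
            "dependencies": dict(record.dependencies),
            "parameters": dict(parameters),
        }
        # Repeat: keep a copy of the plan so the run can be reproduced later.
        record.runs.append(dict(plan))
        return plan
```

In this sketch a run is refused until the record has passed review, which is one possible way to enforce the "trusted" part of the framework; a real system might instead only warn, or record the review status in the provenance.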

Why it is important and to whom?

Recent investments in HPC, cloud and petascale data stores have dramatically increased the scale and resolution at which earth science challenges can now be tackled. These new infrastructures are highly parallelised, and fully utilising them and accessing the large volumes of earth science data now available requires a new approach to software stack engineering. The size, complexity and cost of the new infrastructures mean any software deployed has to be reliable, trusted and reusable.
Increasingly, software is available via open source repositories, but these usually only enable code to be discovered and downloaded. It is hard for a scientist to judge the suitability and quality of individual codes: there is rarely information on how and where codes can be run, what the critical dependencies are and, in particular, what the version requirements and licensing of the underlying software stack are.

It is important to scientists because it guides them rapidly to finding and choosing the most effective code for the scientific problem they are trying to solve. It is important to the funding agencies so that they can measure the impact of the codes that they are funding and judge whether they are worth maintaining. It is important to programmers because it provides mechanisms for them to benchmark their codes. For those running the major computational centers, it would help them discern which codes to trust.

Why hasn't it been solved yet?

Individual components have been built, but no one has yet combined them into a complete framework.

Actionable Outcomes

Additional Information and Links
