Use Case: HydroShare Resource Publication Example #11

Open
dtarb opened this Issue Jan 19, 2015 · 0 comments

Goals and Summary

HydroShare is a collaborative environment being developed for open sharing of hydrologic data and models (Tarboton et al., 2014a; 2014b; 2015). The goal is to enable scientists to load data and models into HydroShare, easily discover and access hydrologic data and models, retrieve them to their desktop, or perform analyses in a distributed computing environment that may include grid, cloud, or high performance computing model instances, and publish data and models as permanent digital objects supporting reproducible research.

Collaborative data analysis and publication is one use case driving the development of HydroShare (Figure 1). It extends the existing data sharing functionality of the Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI) Hydrologic Information System (HIS) (Tarboton et al., 2009) into a dynamic collaborative environment leading to the archival publication of data.

Figure 1. Collaborative data analysis and publication use case.

At (1) data are observed and then loaded (2). In the current CUAHSI Hydrologic Information System (HIS), data are loaded into an observations data model relational database on a server that publishes them using web services (Horsburgh et al., 2008; 2010). Metadata is harvested by the HIS Central catalog to support geographic and context-based data discovery. A desktop client user (3) discovers, downloads and analyzes the data, or uses them in a model. Steps 1 to 3 are supported by the existing CUAHSI HIS. HydroShare picks up from here, allowing the user to post the results (data and model) to HydroShare as resources (4). HydroShare also supports direct entry of new resources in standard formats selected to be broadly useful to the hydrology community. The user shares posted resources with colleagues (5), designating who has permission to access the resources (Couch et al., 2015). A group collaborates on refining the analysis, model or result. HydroShare tracks provenance, supporting reproducibility and transparency. After iteration, the result is finalized and submitted for publication (6). At this point the resources produced (data, model, workflow, paper) are made immutable, access is opened, and persistent identifiers (e.g., DOIs) are assigned (7).
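Steps 4 to 7 can be read as a small state machine: a resource starts private, becomes shared, and ends published and immutable. The Python sketch below illustrates that reading only. The class, method and state names are hypothetical and are not taken from the HydroShare code base, the rule that collaborators may add files is an assumption, and the identifier is a placeholder string supplied by the caller rather than a DOI minted by any real service.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class State(Enum):
    PRIVATE = "private"      # step 4: posted, visible to the owner only
    SHARED = "shared"        # step 5: visible to designated colleagues
    PUBLISHED = "published"  # steps 6-7: immutable, open, identifier assigned


@dataclass
class Resource:
    """Hypothetical resource record; not the HydroShare data model."""
    title: str
    owner: str
    files: dict = field(default_factory=dict)        # file name -> content
    collaborators: set = field(default_factory=set)  # users granted access
    state: State = State.PRIVATE
    identifier: Optional[str] = None
    provenance: list = field(default_factory=list)   # (user, action, detail)

    def _require_mutable(self):
        if self.state is State.PUBLISHED:
            raise PermissionError("published resources are immutable")

    def can_access(self, user):
        # Access is open to everyone once the resource is published.
        if self.state is State.PUBLISHED:
            return True
        return user == self.owner or user in self.collaborators

    def add_file(self, user, name, content):
        # Assumption: anyone with access may contribute before publication.
        self._require_mutable()
        if not self.can_access(user):
            raise PermissionError(f"{user} may not modify this resource")
        self.files[name] = content
        self.provenance.append((user, "add_file", name))

    def share_with(self, user):
        self._require_mutable()
        self.collaborators.add(user)
        self.state = State.SHARED
        self.provenance.append((self.owner, "share_with", user))

    def publish(self, identifier):
        # The identifier is supplied by the caller; nothing is minted here.
        self._require_mutable()
        self.state = State.PUBLISHED
        self.identifier = identifier
        self.provenance.append((self.owner, "publish", identifier))


# Illustrative walk through steps 4 to 7 with made-up users and files.
resource = Resource(title="Example analysis", owner="alice")
resource.add_file("alice", "results.csv", "placeholder contents")
resource.share_with("bob")
resource.add_file("bob", "model_notes.txt", "placeholder contents")
resource.publish("doi:PLACEHOLDER")
```

After `publish`, any further `add_file` or `share_with` call raises `PermissionError`, which mirrors the immutability described at step 7, and the `provenance` list keeps an ordered record of who did what.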

Why is it important and to whom

  • The Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI) is an organization representing 100+ universities and affiliated organizations, funded by the US National Science Foundation, to develop community infrastructure and services to advance hydrologic science. HydroShare is a software development research project to expand and enhance the data sharing capability available to the CUAHSI Hydrologic Science community. The CUAHSI water data center operates the CUAHSI data catalog and cloud based data services and will expand to include HydroShare upon completion of the HydroShare research project. Data and model publication is important to support reproducible hydrologic science, and mechanisms for citation are important to trace the evolution of ideas and provide credit. Hydrology faces the same issues as other sciences regarding reproducibility, attribution and credit.
  • Hydrology is a team sport. Advancing hydrologic understanding, and addressing the grand challenges at the center of hydrologic research, requires integration of information from multiple sources, is data and computationally intensive, and requires collaboration and working as a team/community. HydroShare is being developed as community cyberinfrastructure to support the CUAHSI hydrologic research community.
  • We need to move beyond time series (the focus of WaterML and the CUAHSI HIS) to more general hydrologic information systems that better support the data and models we use and the way we work. Data and models used by hydrologists are diverse. They include time series, geographic rasters, geographic features, multidimensional space-time data, model programs and model instances, among other things.
  • At its heart, HydroShare is a system for sharing resources and collaborating. Resources are shared and published using a Resource Data Model (Horsburgh et al., 2015) that draws upon the Open Archives Initiative's Object Reuse and Exchange standard (Lagoze et al., 2008). Resources are defined to be files and sets of files structured to represent a hydrologic process, model, or element in the hydrologic environment. We cast hydrologic datasets and models as "social objects" that can be published, collaborated around, annotated, discovered and accessed. This attracts users to the system and encourages them to open their data at the beginning rather than the end of a research project. Standard data models for resources enhance interoperability and enable tools (web apps) that operate on resources and provide value for a user. Metadata is automatically captured to the maximal extent possible, and easy-to-use interfaces allow users to add metadata while working with resources, transforming metadata gathering into a dynamic process whereby metadata grows with use. Value is also added by taking advantage of social media functionality that lets users comment on and rate resources.
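To make the idea of a resource as "files and sets of files" plus metadata more concrete, the following Python sketch builds a simplified, JSON-LD-like description in the spirit of OAI-ORE: a resource map describes an aggregation, and the aggregation lists the files it aggregates. The terms `ore:ResourceMap`, `ore:Aggregation`, `ore:describes` and `ore:aggregates` come from the ORE vocabulary, and the `dc:` keys follow Dublin Core. Everything else is an assumption made for illustration, including the function name, the URI layout, the example values and the resource type label; this is not the serialization HydroShare actually uses.

```python
import json


def build_resource_map(resource_uri, title, creator, resource_type, file_names):
    """Return a simplified ORE-style description of one resource.

    Hypothetical helper: the URI scheme ("/resourcemap", "#aggregation",
    "/files/<name>") is invented for this sketch.
    """
    return {
        "@id": resource_uri + "/resourcemap",
        "@type": "ore:ResourceMap",
        "ore:describes": {
            "@id": resource_uri + "#aggregation",
            "@type": "ore:Aggregation",
            "dc:title": title,
            "dc:creator": creator,
            "dc:type": resource_type,
            # Each file in the resource becomes an aggregated resource.
            "ore:aggregates": [
                resource_uri + "/files/" + name for name in file_names
            ],
        },
    }


# Illustrative values only; example.org is a reserved placeholder domain.
resource_map = build_resource_map(
    resource_uri="http://example.org/resource/abc123",
    title="Example time series resource",
    creator="A. Hydrologist",
    resource_type="TimeSeries",
    file_names=["observations.csv", "readme.txt"],
)
print(json.dumps(resource_map, indent=2))
```

Keeping the metadata in one structured document next to the files is what would let generic tools discover what a resource contains without opening every file.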

Why has it not been solved yet

Challenges related to the publication, citation and re-use of data include:

  • Hosting the data and supporting its use in perpetuity. The HydroShare proposal suggested that the CUAHSI Water Data Center should be the permanent repository for published HydroShare resources, but uncertainty regarding ongoing long term funding for both CUAHSI and the development team places this at risk.
  • Data Formats and Usability. The heterogeneity of published data and models makes their re-use technically challenging. HydroShare is taking the approach of publishing data in a structured standard format with metadata following a formally specified resources data model for each resource type (Tarboton et al., 2014b; Horsburgh et al., 2015). HydroShare will also support tools that operate on these standard formats to enhance the ease with which published HydroShare resources can be re-used.
  • Citation metrics and getting credit. Cited data and models, even with DOIs, are generally not included in citation services such as the Science Citation Index or Google Scholar. As a result, these products do not factor into the citation metrics commonly used as academic measures of productivity.

Actionable Outcomes

  • HydroShare is under active development, which includes the implementation of this and other use cases. HydroShare follows an open development model, using GitHub for source code management, issue tracking and community participation. The HydroShare collaborative environment is also operational, with new functionality being added frequently.

Acknowledgements

This work was supported by the National Science Foundation under collaborative grants OCI-1148453 and OCI-1148090 for the development of HydroShare (http://www.hydroshare.org). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

  • Couch, A., D. Tarboton, R. Idaszak, J. Horsburgh, H. Yi and M. Stealey, (2015), "A Flexible File Sharing Mechanism for iRODS," iRODS User Group Meeting, Chapel Hill, iRODS Consortium, 61-68, http://irods.org/wp-content/uploads/2015/09/UMG2015_P.pdf
  • Horsburgh, J. S., D. G. Tarboton, D. R. Maidment and I. Zaslavsky, (2008), "A Relational Model for Environmental and Water Resources Data," Water Resour. Res., 44: W05406, http://dx.doi.org/10.1029/2007WR006392.
  • Horsburgh, J. S., D. G. Tarboton, K. A. T. Schreuders, D. R. Maidment, I. Zaslavsky and D. Valentine, (2010), "Hydroserver: A Platform for Publishing Space-Time Hydrologic Datasets," 2010 AWRA Spring Specialty Conference Geographic Information Systems (GIS) and Water Resources VI, Orlando Florida, American Water Resources Association, Middleburg, Virginia, TPS-10-1, http://www.awra.org/tools/members/Proceedings/1003conference/doc/abs/JefferyHorsburgh_7cb420e3_6602.pdf.
  • Horsburgh, J. S., M. M. Morsy, A. M. Castronova, J. L. Goodall, T. Gan, H. Yi, M. J. Stealey and D. G. Tarboton, (2015), "Hydroshare: Sharing Diverse Environmental Data Types and Models as Social Objects with Application to the Hydrology Domain," JAWRA Journal of the American Water Resources Association, http://dx.doi.org/10.1111/1752-1688.12363.
  • Lagoze, C., H. Van de Sompel, P. Johnston, M. Nelson, R. Sanderson and S. Warner, (2008), Open Archives Initiative Object Reuse and Exchange: ORE User Guide – Primer, http://www.openarchives.org/ore/1.0/primer, accessed 10/23/2012.
  • Tarboton, D. G., J. S. Horsburgh, D. R. Maidment, T. Whiteaker, I. Zaslavsky, M. Piasecki, J. Goodall, D. Valentine and T. Whitenack, (2009), "Development of a Community Hydrologic Information System," in R. S. Anderssen, R. D. Braddock and L. T. H. Newham (eds), 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation, July, Modelling and Simulation Society of Australia and New Zealand and International Association for Mathematics and Computers in Simulation, 988-994, http://www.mssanz.org.au/modsim09/C4/tarboton_C4.pdf.
  • Tarboton, D. G., R. Idaszak, J. S. Horsburgh, J. Heard, D. Ames, J. L. Goodall, L. Band, V. Merwade, A. Couch, J. Arrigo, R. Hooper, D. Valentine and D. Maidment, (2014a), "HydroShare: Advancing Collaboration through Hydrologic Data and Model Sharing," in D. P. Ames, N. W. T. Quinn and A. E. Rizzoli (eds), Proceedings of the 7th International Congress on Environmental Modelling and Software, San Diego, California, USA, International Environmental Modelling and Software Society (iEMSs), ISBN: 978-88-9035-744-2, http://www.iemss.org/sites/iemss2014/papers/iemss2014_submission_243.pdf.
  • Tarboton, D. G., R. Idaszak, J. S. Horsburgh, J. Heard, D. Ames, J. L. Goodall, L. E. Band, V. Merwade, A. Couch, J. Arrigo, R. Hooper, D. Valentine and D. Maidment, (2014b), "A Resource Centric Approach for Advancing Collaboration Through Hydrologic Data and Model Sharing," 11th International Conference on Hydroinformatics, HIC 2014, New York City, USA, Paper 314, http://academicworks.cuny.edu/cc_conf_hic/314/.
  • Tarboton, D. G., R. Idaszak, J. S. Horsburgh, D. Ames, J. L. Goodall, L. E. Band, V. Merwade, A. Couch, R. Hooper, D. Valentine, D. Maidment, P. Dash, M. Stealey, H. Yi, T. Gan, T. Castronova, B. Miles, S. Livingston, C. Frisby (2015), "HydroShare: Advancing Hydrology through Collaborative Data and Model Sharing," iRODS User Group Meeting, Chapel Hill, https://irods.org/wp-content/uploads/2015/06/Tarboton-HydroShare.pdf.