Prov-a-thon: Practical Tools for Reproducible Science

Provenance information links datasets to the software and analysis code that created them and used them in research. It lets users trace new and ongoing uses of data, and provides rich information about the origins of data that ultimately supports reproducible research workflows. Prov-a-thon is a two-day workshop designed to advance practical approaches to incorporating provenance information into tools and workflows that are useful in earth, environmental, and archaeological research domains.
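As a rough illustration of what linking data to the code that produced it can look like, here is a minimal Python sketch. It is not the DataONE, YesWorkflow, or Whole Tale data model; the `Execution` class, the `trace_origins` helper, and the file names are all hypothetical, and the `used`/`generated` fields only loosely echo the W3C PROV relations of the same names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Execution:
    """One run of a script: what it read and what it wrote (hypothetical model)."""
    script: str
    used: list[str] = field(default_factory=list)
    generated: list[str] = field(default_factory=list)


def trace_origins(dataset: str, executions: list[Execution]) -> set[str]:
    """Return every script and dataset that `dataset` depends on, directly or indirectly."""
    origins: set[str] = set()
    pending = [dataset]
    while pending:
        current = pending.pop()
        for run in executions:
            if current in run.generated:
                # The script and its inputs are all upstream of `current`.
                for item in [run.script, *run.used]:
                    if item not in origins:
                        origins.add(item)
                        pending.append(item)
    return origins


# Hypothetical file names, for illustration only.
runs = [
    Execution("clean.R", used=["raw_counts.csv"], generated=["clean_counts.csv"]),
    Execution("model.R", used=["clean_counts.csv"], generated=["figure1.png"]),
]

origins = trace_origins("figure1.png", runs)
print(sorted(origins))
```

Tracing `figure1.png` walks back through both runs, so the result includes the two scripts, the intermediate file, and the raw input.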


  • Learn about Whole Tale/DataONE/other provenance tools and reproducibility
  • Add provenance data for rich datasets into DataONE
  • Build interest among data creators and submitters in adding provenance data
  • Organize reproducibility training and advocacy efforts in archaeology and environmental science
  • Stimulate coordination of the development/use of provenance and reproducibility tools

Background reading

Lowndes, J. S. S., B. D. Best, C. Scarborough, J. C. Afflerbach, M. R. Frazier, C. C. O’Hara, N. Jiang, and B. S. Halpern. 2017. Our path to better science in less time using open data science tools. Nature Ecology & Evolution 1:0160.

Marwick, B. 2017. Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation. Journal of Archaeological Method and Theory 24:424–450.

McPhillips, T., Song, T., Kolisnik, T., Aulenbach, S., Belhajjame, K., Bocinsky, R.K., Cao, Y., Cheney, J., Chirigati, F., Dey, S. and Freire, J., 2015. YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts. International Journal of Digital Curation, 10(1), pp.298-313.

Cao, Y., Jones, C., Cuevas-Vicenttín, V., Jones, M.B., Ludäscher, B., McPhillips, T., Missier, P., Schwalm, C., Slaughter, P., Vieglais, D. and Walker, L., 2016, June. DataONE: A Data Federation with Provenance Support (extended preprint). In International Provenance and Annotation Workshop (pp. 230-234). Springer.

Ludäscher, B., Chard, K., Gaffney, N., Jones, M., Nabrzyski, J., Stodden, V. and Turk, M., 2016. Capturing the "Whole Tale" of Computational Research: Reproducibility in Computing Environments. Science Gateways Workshop, San Diego.


Day 1: Thursday, August 31, 2017

0715 - 0800 Breakfast

0800 - 1000 Welcome and Overviews (Room: Tamaya ABC)

  • 0800 - 0825 Overview of DataONE (Bill Michener, DataONE)

  • 0825 - 0845 Overview of Provenance (Bertram Ludäscher, UIUC)

    • Different notions and uses of provenance, reproducibility
  • 0845 - 0945 Overview of the Status of Provenance Tools (Matt Jones, NCEAS)

  • 0945 - 1000 Goals of Prov-a-thon (Dave Vieglais, DataONE)

1000 - 1030 Break

1030 - 1200 Introductions and Lightning Talks (Room: Eagle AB)

  • 1030 - 1050 Around the room introductions (Amber Budden, DataONE)

  • 1050 - 1150 Lightning talks: Provenance and Reproducible Workflows (Kyle Bocinsky, Whole Tale)

    • Kyle Bocinsky, Ben Marwick, Paulina Przystupa, Matt Harris, Eric Kansa, Carl Boettiger, Emory Boose, Josh London, Jamie Afflerbach, Julie Lowndes, Peter Darch (confirmed)
  • 1150 - 1200 Agenda review (Matt Jones)

1200 - 1300 Lunch

  • Poster session featuring summer internships related to provenance (DataONE, Whole Tale)
    • Xiaoliang Jiang, Linh Hoang, Hui Lyu, Pratik Srivastava

1300 - 1445 Provenance Tools I (Room: Eagle AB)

  • 1300 - 1400 Intro to the DataONE R provenance tools (Matt)

    • R libraries: dataone, datapack, recordr
  • 1400 - 1445 Intro to YesWorkflow (Bertram)

    • YW modeling exercise (Bertram)

1445 - 1515 Break

1515 - 1700 Provenance Tools II (Room: Eagle AB)

  • 1515 - 1700 Intro to the Whole Tale web tool (Matt & Bertram)

    • Hands-on with the WT tool, including importing data from DataONE

Day 2: Friday, September 1, 2017

0715 - 0800 Breakfast

0800 - 1000 Breakout Groups: Archaeology (Room: Eagle A), Environmental Science (Room: Eagle B)

  • Environmental Science (Jones)

    • Breakout agenda planning

    • Hands-on provenance metadata writing activities, troubleshooting, usability

    • Identify future development directions (DataONE/YW/WT/rrtools/others?)

    • Discussion of barriers to reproducibility in environmental sciences

    • Planning for advocacy for reproducible research approaches in environmental science

  • Archaeology (Bocinsky)

    • Hands-on with WT/rrtools/Open Context/DataONE: building tales

    • Discussion of barriers to reproducibility in archaeology (generalizable to other disciplines; ideas below)

      • Lack of training in computational methods/reproducibility

      • Persistence of data hoarding/siloing

      • Data sensitivity & archaeological looting

      • Few “sticks” from journals/funding agencies/professional societies

      • Few “carrots” from journals/peers/tenure committees/funding agencies

  • Archaeology Goals (Bocinsky):

    • Tool assessment/usability feedback (YW/WT)

    • Identify future development directions (DataONE/YW/WT/rrtools/others?)

    • Create provenance records (DataONE)

    • Identify ways to promote reproducibility in the communities

    • Identify next steps/plans for further collaboration

    • How To Do Archaeological Science Using R book status update (Ben/Matt/Paulina/Kyle)

    • Intro and feedback on rrtools package (Ben)

    • Plan further advocacy for reproducibility in Archaeology (ideas below)

      • SAA committees (publications/curriculum)

      • SAA Events/forums/workshops?

      • Collaborations with open journals?

        • White paper on data access and reproducibility for archaeology journals?

1000 - 1030 Break

1030 - 1200 Continued Breakout Sessions

1200 - 1300 Lunch

1300 - 1445 Continued Breakout Sessions

1445 - 1515 Break

1515 - 1700 Plenary: Reproducibility and Provenance for Science

  • Report back from breakout groups (10 minutes each)

  • Moderated Discussion (Kyle & Matt):

    • Evangelism and advocacy for reproducible research in general

    • The tool landscape supporting reproducible research

  • Next steps

    • Roadmap for DataONE and Whole Tale tool development

      • How can the working group(s) help?
    • How to get buy-in from data contributors at the user level

    • How to build a community pulling towards the same reproducible research goals

    • What do we collectively want to do next?

  • Conclusion; establish a report-back mechanism

Day 3: Saturday, September 2, 2017

0730 - 0800 Breakfast

0800 - 1100 Review, Reflections, Follow-up

1100 - 1200 Lunch (boxes)

