Skip to content

Latest commit

 

History

History
138 lines (107 loc) · 15.2 KB

README.md

File metadata and controls

138 lines (107 loc) · 15.2 KB

fruits of provenance fruits of provenance

End-To-End-Provenance

Provenance (also known as pedigree or lineage) refers to the complete history of a document. In the scientific community, provenance refers to the information that describes data in sufficient detail to facilitate reproduction and enable validation of results. In the archival community, provenance refers to the chain of ownership and the transformations a document has undergone. However, in most computer systems today, provenance is an after-thought, implemented as an auxiliary indexing structure parallel to the actual data.

Our goal in this project is to design, build, and study an end-to-end system that extends all the way from original data analyses by real scientists to management and analysis of the resulting provenance in a common framework with common tools.

Software Tools

  • R Tools: Provenance tools for the R statistical language
  • Extended PROV-JSON: An extension of the W3C PROV-JSON standard for fine-grained provenance
  • CamFlow: A practical whole-system provenance capture mechanism for the Linux OS
  • Unicorn: An intrusion detection system
  • Xanthus: An automated reproducibility platform

Active Contributors

Past Contributors

  • Shay Adams, Mount Holyoke College
  • Orenna Brand, Columbia University
  • Vasco Carinhas, Universidad de Puerto Rico en Arecibo
  • Marios Dardas, Holy Cross College
  • Elizabeth Fong. Mount Holyoke College
  • Connor Gregorich-Trevor, Grinnell College
  • Nikki Hoffler, Mount Holyoke College
  • Jen Johnson, Middlebury College
  • Andy Kaldunski, Ripon College
  • Alex Liu, Amherst College
  • Khanh Ngo, Mount Holyoke College
  • Erick Oduniyi, University of Kansas
  • Miruna Oprescu, Harvard University
  • Huiyun Peng, Mount Holyoke College
  • Luis Perez, Harvard University
  • Lia Poulos, Mount Holyoke College
  • Moe Pwint Phyu, Mount Holyoke College
  • Garrett Rosenblatt, University of Rochester
  • Rose Sheehan, Mount Holyoke College
  • Sofiya Taskova, Mount Holyoke College
  • Cory Teshera-Sterne, Mount Holyoke Colege
  • Morgan Vigil, Westmont College
  • Mohammed Rahman, Indian Institutes of Technology
  • Yujia Zhou, Dickinson College

Publications

2022

  • Barbara Lerner, Emery Boose, Orenna Brand, Aaron M. Ellison, Elizabeth Fong, Matthew Lau, Khanh Ngo, Thomas Pasquier, Luis A. Perez, Margo Seltzer, Rose Sheehan and Joseph Wonsil , Making Provenance Work for You, The R Journal, Volume 14, Number 4, December 2022, pages 141-159.

2020

2019

  • Sheung Chi Chan, James Cheney, Pramod Bhatotia, Thomas Pasquier, Ashish Gehani, Hassaan Irshad, Lucian Carata, and Margo Seltzer. 2019. ProvMark: A provenance expressiveness benchmarking system. International Middleware Conference.
  • Thomas Pasquier, David Eyers, and Margo Seltzer. 2019. Visonpaper -- From Here to Provtopia. Towards Polystores that manage multiple Databases, Privacy, Security and/or Policy Issues for Heterogenous Data (POLY, co-located with VLDB 2019).

2018

2017

  • Emery Boose and Barbara Lerner. 2017. Replication of Data Analyses: Provenance in R. In Stepping in the Same River Twice: Replication in Biological Research, edited by A. Shavit and A. M. Ellison. Yale University Press.
  • Xueyuan Han, Thomas Pasquier, Tanvi Ranjan, Mark Goldstein, and Margo Seltzer. 2017. FRAPpuccino: Fault-detection through Runtime Analysis of Provenance. In Workshop on Hot Topics in Cloud Computing (HotCloud'17). USENIX.
  • Thomas Pasquier, Xueyuan Han, Mark Goldstein, Thomas Moyer, David Eyers, Margo Seltzer, and Jean Bacon. 2017. Practical Whole-System Provenance Capture. In Symposium on Cloud Computing (SoCC’17). ACM.
  • Thomas Pasquier, Matthew K. Lau, Ana Trisovic, Emery R. Boose, Ben Couturier, Mercè Crosas, Aaron M. Ellison, Valerie Gibson, Chris R. Jones, and Margo Seltzer. 2017. If these data could talk. Nature Scientific Data, 4.
  • Thomas Pasquier, Jatinder Singh, Julia Powles, David Eyers, Margo Seltzer, and Jean Bacon. 2017. Data provenance to audit compliance with privacy policy in the Internet of Things. Springer Personal and Ubiquitous Computing.

2015

2014

  • Lucian Carata, Sherif Akoush, Nikilesh Balakrishnan, Thomas Bytheway, Ripduman Sohan, Margo Seltzer, and Andy Hopper. 2014. A primer on provenance Communications of the ACM, 57, 5, Pp. 52–60.

2013

  • Michelle A Borkin, Chelsea S Yeh, Madelaine Boyd, Peter Macko, Krzysztof Z Gajos, Margo Seltzer, and Hanspeter Pfister. 2013. Evaluation of filesystem provenance visualization tools IEEE Transactions on Visualization and Computer Graphics, 19, 12, Pp. 2476–2485.
  • Peter Macko, Daniel Margo, and Margo Seltzer. 2013. Local clustering in provenance graphs In Proceedings of the 22nd ACM international conference on Information & Knowledge Management, Pp. 835–840. ACM.

2011

2010

2009

  • Daniel W Margo and Margo I Seltzer. 2009. The Case for Browser Provenance In Workshop on the Theory and Practice of Provenance.
  • Kiran-Kumar Muniswamy-Reddy, Uri Braun, David A Holland, Peter Macko, Diana L MacLean, Daniel W Margo, Margo I Seltzer, and Robin Smogor. 2009. Layering in Provenance Systems In USENIX Annual technical conference.
  • Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo I Seltzer. 2009. Making a Cloud Provenance-Aware In Workshop on the Theory and Practice of Provenance.
  • James Cheney, Stephen Chong, Nate Foster, Margo Seltzer, and Stijn Vansummeren. 2009. Provenance: a future history In Proceedings of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications, Pp. 957–964. ACM.

2008

2007

2006

  • Uri Braun, Simson Garfinkel, David A Holland, Kiran-Kumar Muniswamy-Reddy, and Margo I Seltzer. 2006. Issues in automatic provenance collection In International Provenance and Annotation Workshop, Pp. 171–183. Springer.
  • Kiran-Kumar Muniswamy-Reddy, David A Holland, Uri Braun, and Margo I Seltzer. 2006. Provenance-aware storage systems In USENIX Annual Technical Conference, General Track, Pp. 43–56.

2005

  • Jonathan Ledlie, Chaki Ng, and David A Holland. 2005. Provenance-aware sensor data storage In Data Engineering Workshops, 2005. 21st International Conference on, Pp. 1189–1189. IEEE.

Acknowledgments

R tool development started in 2013 at the Harvard Forest and was supported by NSF grants DEB-1237491, DBI-1459519, and SSI-1450277, a Charles Bullard Fellowship at Harvard University, and a faculty fellowship from Mount Holyoke College, and is a contribution from the Harvard Forest Long-Term Ecological Research (LTER) program. This project benefited greatly from earlier work by Lee Osterweil, Lori Clarke, and Alexander Wise at the University of Massachusetts Amherst.

CamFlow development started in 2014 at the University of Cambridge’s Opera Research Group under grant EPSRC EP/K011510/1. Later development was supported by Harvard University’s Center for Research on Computation and Society under NSF grant SSI-1450277. We also acknowledge the support of the University of Cambridge’s Digital Technology Group and the University of Bristol Cyber Security Group.

We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC).

Cette recherche a été financée par le Conseil de recherches en sciences naturelles et en génie du Canada (CRSNG).