Description
With @JonasIsensee, @tamasgal and @sebastianpech we discussed that smaller groups of scientists may not find it sensible to adopt large data management software such as CaosDB. It would still be great to have basic data provenance for forms of data other than .bson.
.bson and similar formats are covered satisfactorily by DrWatson, which automatically adds git info and the source file that generated them. This is not possible for e.g. figures or CSV files.
What could be possible is to have a central file, next to Project.toml, that is also .toml or .yml based and works as a dictionary. It maps unique identifiers to a set of properties, the first of which is file: the file path relative to the project's main folder. The advantage of using toml is that it is human readable and can be searched with Ctrl+F. Note that specialized parameter searches are better suited to the result of a function like collect_data and thus do not need to be considered for this functionality.
Other properties could be added, like source file used, date produced, savename of parameters used, author, git commit, etc.
All in all, this would be a good compromise: it provides data provenance for figures, CSV files, etc. without the complexity of a full data manager.