Meeting on April 17, 2018

Upcoming meeting

Time: 18:00 Paris time

Place: https://yale.zoom.us/j/809419495

Agenda item: IE data inventory/catalogue and IE database: Please have a look at the sheet 'overview' on https://docs.google.com/spreadsheets/d/1yupwhtfUiBnW5DcAzOTek03gxzH22hDzn_djzwocDGA/edit#gid=949862222 Scroll to the right for examples! Thanks! Stefan

Meeting Minutes

Irreproducibility Report: https://www.nas.org/projects/irreproducibility_report

Beware partisan propaganda offered in bad faith. NAS.org = National Association of Scholars, right wing advocacy group in US.

JIE issue

Authors submitting python code - Reid wants advice on what to offer / require.

BK: Jupyter notebook is nice because it allows you to reproduce it, but it may not work
RLu: if they posted it on GitHub then people would be able to read it right away, even if they don't have Python installed. Also, the "My Binder" service allows people to host their code and run it remotely, but it requires more than just providing a notebook, because you also need to specify what packages are required
RLi: please write that down
RLu: will make a wiki page
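
A minimal sketch of the "specify what packages are required" step, assuming the notebook follows the standard ipynb JSON layout (a top-level "cells" list whose entries have "cell_type" and "source"). The function name imported_modules and the two-cell sample notebook are hypothetical, for illustration only. The result is just a starting point for a package list such as a requirements.txt, since import names do not always match package names.

```python
import ast
import json


def imported_modules(notebook_json: str) -> list:
    """Return sorted top-level module names imported in a notebook's code cells."""
    notebook = json.loads(notebook_json)
    modules = set()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        try:
            tree = ast.parse(source)
        except SyntaxError:
            # Cells with IPython magics (e.g. %matplotlib) are not plain Python.
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return sorted(modules)


# Hypothetical notebook content, for illustration only.
example = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Documentation"]},
        {"cell_type": "code",
         "source": ["import numpy as np\n", "from pandas import DataFrame\n"]},
    ],
})
print(imported_modules(example))
```

Standard-library modules show up in the list too, so an author would still need to review it by hand before publishing.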

Guidelines for sharing Jupyter notebooks

To be specific, the files are .ipynb files. We will probably want to have 'required' and 'optional' aspects of the guidelines.

RLi: the norm is that the reader should know what the supplementary files are and what info they contain, e.g. in a spreadsheet, the first tab is documentation. What's the equivalent for Jupyter?
BK: the notebook supports inline documentation, but you still need to be able to read it.
RLu: also Zenodo allows you to archive a report / get a DOI (put that on the wiki too)
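
One possible Jupyter equivalent of "the first tab is documentation" would be a convention that the first cell is a non-empty markdown cell describing the notebook. The check below is only a sketch of how such a 'required' guideline could be tested; the convention itself and the function name has_leading_documentation are assumptions, not something decided in the meeting.

```python
import json


def has_leading_documentation(notebook_json: str) -> bool:
    """True if the notebook's first cell is a non-empty markdown cell."""
    notebook = json.loads(notebook_json)
    cells = notebook.get("cells", [])
    if not cells or cells[0].get("cell_type") != "markdown":
        return False
    source = cells[0].get("source", "")
    # The notebook format stores source as a string or a list of lines.
    if isinstance(source, list):
        source = "".join(source)
    return bool(source.strip())


# Hypothetical notebooks, for illustration only.
documented = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# What this notebook contains\n"]},
    {"cell_type": "code", "source": ["x = 1\n"]},
]})
undocumented = json.dumps({"cells": [
    {"cell_type": "code", "source": ["x = 1\n"]},
]})
print(has_leading_documentation(documented), has_leading_documentation(undocumented))
```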

Also important to determine what the journal wants to accomplish: is there the ability to run code while you're reading the article?

RLu: you could do much more, e.g. have the notebook reproduce the figures, but that is more than is required.
RLi: if the code is self-contained and it is just a link to take you out of the article, then it's only a URL and no live interactive capability is required in the article.

In the short run, the basic suggestion would be to put it on GitHub and/or Zenodo and simply have a reference to it.

NH: presumably Wiley has an opinion about it? Don't put too much trust in 3rd party services that just cropped up and could disappear.
RLi: permanence of the record is a profound issue. e.g. Large grey-literature report -- should it be part of the SI? Publishing world has not grappled with that.

BK: separate question about URLs: inline vs footnote vs reference or all three?

References should be references, i.e. they support a statement; inline links should be for interaction, though not necessarily for support.

SP: Atlantic article "The scientific paper is obsolete" https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/

Stefan's spreadsheet

reference: https://docs.google.com/spreadsheets/d/1yupwhtfUiBnW5DcAzOTek03gxzH22hDzn_djzwocDGA

Tried to structure the feedback I got last week. What it boils down to is that there's not a single inventory or database, but different stages. My idea was just to report plain-text metadata, but Brandon observed that may not actually be very useful.

Catalogs:

Type 1: Keyword-based catalog. Ideally, we have all data in a general database and the data are all linked. e.g. a person types 'Sweden' and gets all the information pertaining to Sweden.

Type 2: Data structure catalog without classifications. The "sweet spot" is where we have a catalog with ontologies for different data types, and data users need to do some work to link their data to the existing types.

Type 3: Data structure catalog with classifications. Like Type 2, but the authors do the work of linking to the types.
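
As a rough illustration of the Type 1 idea (type 'Sweden', get everything pertaining to Sweden), the sketch below builds an inverted index from keywords to dataset identifiers. The dataset identifiers, keywords, and function names are all hypothetical; a real catalog would need agreed keyword vocabularies.

```python
from collections import defaultdict


def build_keyword_index(records: dict) -> dict:
    """Map each lower-cased keyword to the set of dataset ids tagged with it."""
    index = defaultdict(set)
    for record_id, keywords in records.items():
        for keyword in keywords:
            index[keyword.lower()].add(record_id)
    return index


def search(index: dict, keyword: str) -> list:
    """Return the sorted dataset ids tagged with the keyword (case-insensitive)."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical catalog entries, for illustration only.
records = {
    "steel_flows_SE": ["Sweden", "steel"],
    "cement_stocks_NO": ["Norway", "cement"],
    "trade_SE_NO": ["Sweden", "Norway", "trade"],
}
index = build_keyword_index(records)
print(search(index, "Sweden"))
```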

Actual Databases

Type 4: metadata included, numeric data excluded

Type 5: metadata and data included

"Aspects" or "dimensions" are different entries in an n-tuple of data about something.
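
A minimal sketch of such an n-tuple, assuming three illustrative aspects (material, region, year) plus a quantity and its unit. The field names and the sample values are hypothetical and not taken from the spreadsheet. Summing the quantity over one aspect is the kind of roll-up a data cube supports.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    material: str  # aspect / dimension
    region: str    # aspect / dimension
    year: int      # aspect / dimension
    value: float   # quantity attached to this combination of aspects
    unit: str


def total_by(records, aspect: str) -> dict:
    """Sum record values grouped by one aspect (assumes a common unit)."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, aspect)] += record.value
    return dict(totals)


# Hypothetical values, for illustration only.
records = [
    Record("steel", "Sweden", 2010, 1.0, "Mt"),
    Record("steel", "Sweden", 2011, 2.0, "Mt"),
    Record("steel", "Norway", 2010, 0.5, "Mt"),
]
print(total_by(records, "region"))
```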

RLu: "measures" are the quantitative extent of dimensions in the 'data cube' context.
Example of that in RLu's paper on Sankey diagrams
BK: these are problems that other fields must have already dealt with. Incoherent talk about NCEAS
Perspective from other fields, e.g. AgMIP (http://www.agmip.org), an example of a large-scale synthesis project
Steven Kraines - early work on ontologies in JIE (2004-2005) (e.g. https://doi.org/10.1162/1088198054821690)
We are a bit stuck
Mathematics + computer science
More of a human problem than a CS problem
SP: in this MFA software I used this data structure and I will carry on so that we have something to point to and iterate on
An important question is raw data versus proxy data, e.g. steel data in ecoinvent comes from observations on one steel plant in Switzerland and gets processed into a proxy model
Same thing with IO data: it goes through dozens of transformations, but once the MRIO table is generated it is treated as "raw" data because it satisfies a balancing requirement and because the upstream transformations are not visible

Regarding aspects: we need to identify what aspects are "required" for different data types, e.g. a flow needs "at least two processes"
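
A sketch of how required aspects per data type might be checked. Only the rule that a flow needs two processes comes from the discussion; the aspect names origin_process and destination_process, the "stock" entry, and the function name missing_aspects are assumptions for illustration.

```python
# Required aspects per data type. Only the "flow needs two processes" rule
# comes from the minutes; the names and the "stock" entry are assumptions.
REQUIRED_ASPECTS = {
    "flow": {"origin_process", "destination_process"},
    "stock": {"process"},
}


def missing_aspects(data_type: str, record: dict) -> list:
    """Return the required aspects that are absent or empty in the record."""
    required = REQUIRED_ASPECTS[data_type]
    return sorted(a for a in required if record.get(a) in (None, ""))


# Hypothetical records, for illustration only.
incomplete_flow = {"origin_process": "mining", "material": "iron ore"}
complete_flow = {"origin_process": "mining", "destination_process": "smelting"}
print(missing_aspects("flow", incomplete_flow))
print(missing_aspects("flow", complete_flow))
```

Optional aspects could be handled the same way with a second mapping, which would mirror the mandatory/optional split discussed for data types.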

BK: Look at two existing ontology design patterns (both of these are "published" as conference papers in the semantic web community):
- Spatiotemporal Scopes in LCA https://geog.ucsb.edu/~jano/stscope_ontology.pdf
- Material Transformation http://www.semantic-web-journal.net/system/files/swj1120.pdf
RM: System description is the most important aspect of a datapoint. What is the point of a data point if it is not contextualized?

Work going forward

All of this is in the context of material flows:

Different mandatory and optional aspects of different data types
"Data integration": putting disconnected data from different sources into one system; figuring out how they fit together and the relationships between them.

International Society for Industrial Ecology