Binder for everybody
Shared value? Shared vision?
- Sharing knowledge, sharing our work, sharing outputs.
- Needs to be open commons, not-for-profit
- Needs to be inclusive, not just for English-speaking, high-bandwidth people
- Sustainable in the long term, so people can trust it (and it is trustworthy)
What if a (for-profit) company runs it, but it is free and fully open? What are the indicators you can use to see whom to trust and whom not to trust?
What does trust mean? "We try things and we do not trust them anymore"? It's a leap of faith. Nobody can guarantee much.
Technically trustworthy? Or socially trustworthy? It needs to be not a 'sole source'. There's no single point of failure, no single point of responsibility. Mobility of compute & content.
They set up their own instance of Zenodo; they trust it to run.
Institutional commitment & health of community - no human single point of failure.
Be careful when you buy into libraries, since some folks are 'big dollar people'. Not all library people are good, be careeeeffffuuullll.
Goal - Clear & bounded Cost model
Need absolute transparency & clarity about what is technically happening, for legal reasons. Be careful about copyright law and whatnot. Be very intentional and careful. Ephemerality of it is critical. Lots of technology rises and falls based on this.
Maybe just disallow saving & downloading?
- What is the governance model? (just the grant right now)
- Difference between Binder and Jupyter. Binder can run Jupyter and also other things. This feeds into who is part of the governance model.
Pitch is "Binder tells you in a single package, what to run, what credentials you have to run it, what needs to be pre-loaded". It's a 'protocol'. Be careful.
Binder means reproducibility. If you have all these things, you have reproducibility. Really significant concerns around Jupyter for reproducibility. You need to talk about workflow systems, and multiple systems. Maybe expect inspectability, not entirely full reproducibility. Gotta mention it here, or you aren't going to start.
Don't let the best be the enemy of the good.
Grading of repos? A binder badge that gives you a grade? Based on a set of criteria that community decides? Like LICENSE, etc.
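A minimal sketch of what such a grade could look like, assuming the criteria are just "does this file exist in the repo". Only LICENSE comes from the discussion; the other criteria, the letter scale, and the name `grade_repo` are hypothetical placeholders that the community would replace with whatever it decides on.

```python
from pathlib import Path

# Hypothetical criteria. Only "license" was named in the discussion;
# "readme" and "environment" are placeholders for community-chosen checks.
CRITERIA = {
    "license": ("LICENSE", "LICENSE.md", "LICENSE.txt"),
    "readme": ("README", "README.md", "README.rst"),
    "environment": ("environment.yml", "requirements.txt", "Dockerfile"),
}


def grade_repo(repo_dir):
    """Return (letter, missing) for a local checkout of a repository.

    A criterion passes if any one of its candidate files exists at the
    top level of the checkout. Each missing criterion drops the grade by
    one letter, bottoming out at "D".
    """
    root = Path(repo_dir)
    missing = [
        name
        for name, candidates in CRITERIA.items()
        if not any((root / c).is_file() for c in candidates)
    ]
    letters = "ABCD"
    letter = letters[min(len(missing), len(letters) - 1)]
    return letter, missing
```

Returning the list of missing criteria alongside the letter keeps the badge inspectable: a repo owner can see why they got the grade, not just the grade.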
What is a scholarly article? What's a scholarly object? That needs to be a thing in the binder ecosystem.
This is a way to give us a new way forward on scholarly records that are failing us on so many levels right now. What knowledge management should be in the 21st century. It's not fixity for its own sake or status quo sake. It's fixity to the extent the version is good. You've a responsibility for making new identifiers for things.
These objects need to be reviewable & annotatable. Just versioned is not enough. Uniquely identifiable.
What made you pick Zenodo?
- Way it ships. It integrates into GitHub (or wherever it is actively worked on). It accepts the human workflow natively. Not 'yet another place' for you to do stuff.
- Ships with niceties (for metadata standards, for example). Better room to work with. DOI is forced march to.
- ROI was best calculation.
It is CERN-based, and 'we' trust CERN. That made it better to pick Zenodo.
Would you begin running this at Caltech today?
- We'd have to have commitment from early adopters on campus who are willing to provide content. That's what you need first.
- Find out how many users are from each institution. For Overleaf, we found out how many free accounts were being used, and that let us cut a deal.
- Pick datastores and try to get binders on them. Prove that this works and there are people using it for stuff. Get the compute close to a big dataset and make it happen.
- Add support for a Zenodo DOI to start with Binder. Work with Crossref & DataCite. Pull together multiple DOIs into one big blob that we can use.
- In a data repository, click a button and you launch this. Dataverse & Invenio: just start with two data repository platforms. Give them launch Binder buttons and see how that goes.
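The "launch button from a DOI" idea in the list above could be sketched roughly like this. It assumes a launch URL of the form `<base>/v2/<provider>/<DOI>`; the base URL, the path layout, the `provider` name, and the function `launch_url` are all assumptions for illustration, not a confirmed interface.

```python
from urllib.parse import quote


def launch_url(doi, base="https://mybinder.org", provider="zenodo"):
    """Build a launch link for a record identified by a DOI.

    Assumes (hypothetically) that the Binder deployment at `base`
    accepts links shaped like <base>/v2/<provider>/<DOI>.
    """
    doi = doi.strip()
    # Accept the common ways a DOI is written and reduce to the bare form.
    for prefix in ("https://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep the "/" between DOI prefix and suffix; escape anything unusual.
    return f"{base.rstrip('/')}/v2/{provider}/{quote(doi, safe='/')}"
```

A repository platform would then only need to render this link behind a button on each record page, e.g. `launch_url("doi:10.5281/zenodo.0000000")` for a placeholder DOI.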
Groups of people to involve:
- Scholarly infrastructure folks
- Open Source Community
- Dynamic infrastructure for running this in places. Find infrastructure partners who can do this. Sciverse/Jetstream/Whatever, Center for Open Science, Zenodo. What about Microsoft / Google? (some disagreement). Maybe 'Ubiquity Press' (lots of good things about it). They rescue people from Evilsevier.
Governance? Whatever providers we use should have a commitment to diversity, make good on it, and have properly demonstrated it.
"5 people here are already using it, so we should take a more concrete look at it" rather than "here we are doing a sales call". "How many people is this going to serve?" People will build things with no plans for survival past them going out.
Everything is author focused. Everyone's bandwidth is stretched thin for what people's things are.
ACTION ITEM: Try to list how many launches are from various universities?
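One way this count might be done, as a sketch: assume each launch record carries some hostname that can be tied to an institution. The record shape (a dict with a `"host"` key), the domain-to-name mapping, and the function name are hypothetical; real launch logs may not contain this information at all.

```python
from collections import Counter


def launches_by_institution(records, domain_map):
    """Count launches per institution.

    records:    iterable of dicts with a (hypothetical) "host" field
    domain_map: {"caltech.edu": "Caltech", ...}, maintained by hand
    Hosts that match no known domain are counted under "unknown".
    """
    counts = Counter()
    for rec in records:
        host = rec.get("host", "").lower()
        for suffix, name in domain_map.items():
            # Match the domain itself or any subdomain of it.
            if host == suffix or host.endswith("." + suffix):
                counts[name] += 1
                break
        else:
            counts["unknown"] += 1
    return counts
```

Keeping an explicit "unknown" bucket shows how much of the traffic the mapping fails to attribute, which matters if the numbers are used to open a conversation with an institution.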
Carpentry should be more involved in here, in various ways. Jupyter needs to be more stable, and documentation needs to be more valued in the Jupyter community. Need to be able to teach this to students with the idea that they should be able to use this for 4 years.
Challenge is not developing lessons, it is sustaining them. And respect for them. Find the right funders, wrap all this up. Training is essential for this reason; it's going to a proven organization.
Try to adopt the GitHub model. Have a structured way of doing this. Log in with ORCID, want to be able to choose which of these 5 to run on.