# Open Science

<a name="top"></a>Outline
---

1. ['Open Science' as a concept](#openscience)
  * [What is it](#what)
  * [Why bother?](#why)
  * [Pitfalls and cons](#cons)
  * [How do I practice it?](#how)
2. [Open Access](#open access)
  * [The 'green' and the 'golden' road](#green and golden)
  * [Preprint servers](#preprint)
  * [The executable paper](#executable paper)
3. [Open Data](#open data)
  * [The FAIR principle for data](#fair)
4. [Open Source](#open source)
  * [Citable code](#citable code)
  * [Live presentation](#live)
5. [Final remarks](#fin)
  * [Questions](#questions)
  * [Evaluation](#evaluation)

<a name="openscience"></a>'Open Science' as a concept
===

<a name="what"></a>What is it?
---

Open Science is a buzzword that contains many sub-concepts from different areas of the scientific life. Common to all the sub-concepts are the goals that Open Science tries to reach with its approach:
* Transparency
* Reproducibility
* Accessibility
* Efficiency

The most commonly named sub-concepts when talking about Open Science are
* Open Access
* Open Data
* Open Source
* Open Peer Review
* Open Labbooks
* Citizen Science

This presentation is mostly concerned with the first three of them.

[top](#top)

<a name="why"></a>Why bother?
---

Making your science and data accessible and transparent involves some additional work on your part. Therefore we need some _incentives_ to make that work worth your while. Some incentives, from the more direct ones to the more ideological:
* Save money (for example for software licenses).
* Save time (understandable, transparent and accessible work is to your own benefit).
* Open tools often have a larger community.
* Larger audience -> increased chance for collaborations/citations.
* More efficient research, because more data and knowledge are available to be re-used and re-mixed.
* Money that is currently going to the shareholders of big publishers could be used to give one of YOU a permanent job in science instead.
* More 'equal' access to knowledge and opportunities, for example in developing countries.
* Ethics of 'good scientific practice' demand reproducibility and accessibility.
* The current 'reproducibility crisis' (http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970)
* Other: ...

![reproducibilitycrisis](http://www.nature.com/polopoly_fs/7.36717.1464174452!/image/reproducibility-graphic-online2.jpg_gen/derivatives/landscape_630/reproducibility-graphic-online2.jpg)

[top](#top)

<a name="cons"></a>Pitfalls and cons
---

Being open with your science is not always a win-win situation for all involved parties, and it's not all flowers and sunshine. Some counterarguments and pitfalls you need to be aware of:
* Many Open Access journals charge the _authors_ a fee to cover their costs.  
$\rightarrow$ Be wary of excessive fees that are not warranted.
* Open Access journals often have lower _impact factors_.  
$\rightarrow$ Look out for institutions that have pledged to value _quality_ over citation metrics.
* Being scooped is an imminent danger.  
$\rightarrow$ It is OK to make your data and source code accessible _after_ you published.
* But my code is too messy...   
$\rightarrow$ People will be _thankful_ that there is anything at all.
* Loss of quality control/gatekeeping.  
$\rightarrow$ The reproducibility crisis is one of the best indicators that the current approach doesn't work either and it is time to try something else.
* Other: ...

<a name="how"></a>How do I practice it?
---

* **Publish in Open Access journals** (https://doaj.org/ helps you find one).
* **Publish your data** along with the corresponding publication and _give it a license_ (for example at https://zenodo.org/).
* **Publish your code** used to analyze or (in case of numerical simulations) produce your data and _give it a license_ (for example on https://github.com/). 

A word on licenses: You can use the Creative Commons license chooser (https://creativecommons.org/choose/) to look for a license that fits your needs and attach it or link to it in your publication.  

You can explicitly allow or forbid
* sharing
* remixing
* commercial use
* (military use) - not yet in Creative Commons  

If something you publish does not have an explicit license attached to it, sharing, remixing and commercial use are forbidden by default!

[top](#top)

<a name="open access"></a>Open Access
===

> Open access (OA) refers to online research outputs that are free of all restrictions on access (e.g. access tolls) and free of many restrictions on use (e.g. certain copyright and license restrictions). Open access can be applied to all forms of published research output, including peer-reviewed and non peer-reviewed academic journal articles, conference papers, theses, book chapters, and monographs.

[Wikipedia on Open Access]

<a name="green and golden"></a>The 'green' and the 'golden' road
---

* 'Golden' open access journals publish their articles Open Access immediately (not after a certain delay).
* 'Green' open access journals endorse open self-publishing by the authors (for example on preprint or institutional servers) but have paywalls for their articles themselves.

<a name="preprint"></a>Preprint servers
---

A preprint is a version of a paper that precedes publication in a peer-reviewed journal. The preprint may persist, often as a non-typeset version available free, after a paper is published in a journal.
Two examples:
* arXiv for physics, astronomy and computer science (https://arxiv.org/)
* bioRxiv for biology (http://biorxiv.org/)

Aim for golden Open Access. If your journal of choice is not Open Access itself, check if it endorses open self-publishing (most of them do!) and take the five minutes to upload your paper on an appropriate preprint server.

[top](#top)

<a name="executable paper"></a>The executable paper
---

An executable paper is a publication that is published together with the data used to create it. The analysis that leads to the results and the scripts that create the figures are embedded in the paper, so the reader can reproduce them step by step. This is as good as it gets in terms of transparent and reproducible research. Possible tools:
* Jupyter notebooks
* R-markdown (http://rmarkdown.rstudio.com/)  

Examples:
* Full publication: http://nbviewer.jupyter.org/github/mbobra/machine-learning-with-solar-data/blob/master/cme_svm.ipynb
* Only the figures: https://github.com/yoavram/ruggedsim/blob/master/manuscript/supplementry.ipynb
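
As a rough sketch of what a single analysis cell in such a paper might look like, the following Python snippet loads the raw data and recomputes the reported numbers from it. The data values, column names and function names are made up for illustration and are not taken from the examples above.

```python
import csv
import io
import statistics

# Hypothetical raw data that would be published alongside the paper.
RAW_CSV = """sample,measurement
a,1.0
b,2.0
c,3.0
d,6.0
"""


def load_measurements(text):
    """Parse the CSV text and return the measurement column as floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["measurement"]) for row in reader]


def summarize(values):
    """Recompute the summary statistics that the text of the paper reports."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Every number quoted in the paper is derived from the raw data here,
# so a reader can re-run the cell and check it.
summary = summarize(load_measurements(RAW_CSV))
print(summary)
```

Because the numbers are computed in the document rather than typed in by hand, a change in the data or the analysis shows up in the paper as soon as the cell is re-run.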

[top](#top)

<a name="open data"></a>Open Data
===

Data itself has become recognized as a primary research output (as opposed to papers). Journals for data publishing have emerged, and more and more publications re-use freely accessible datasets in a new context to arrive at new findings.  

In this context we often talk about **Data Objects**. A data object consists of _data elements_ + _metadata_ + _an identifier_.  

To make data re-use possible, the data you publish needs to adhere to a few basic principles.

<a name="fair"></a>The FAIR principle
---

Following https://www.force11.org/group/fairgroup/fairprinciples, published data needs to be
* **F**indable
  - To be Findable any Data Object should be uniquely and persistently identifiable (DOI).
  - A Data Object should minimally contain basic machine actionable metadata that allows it to be distinguished from other Data Objects.
* **A**ccessible
  - Data is Accessible in that it can always be obtained by machines and humans.
  - Access happens through a well-defined protocol and, where necessary, upon authorization.
* **I**nteroperable
  - (Meta) data is machine-actionable (parseable).
  - (Meta) data formats utilize shared vocabularies. 
* **R**eusable
  - (Meta) data should be sufficiently well-described and rich that it can be automatically (or with minimal human effort) linked or integrated, like-with-like, with other data sources.
  - Published Data Objects should refer to their sources with rich enough metadata, a license and provenance to enable proper citation.

No matter what platform you finally use, adhering to the FAIR principle is always a good idea. It also isn't too much additional work if your data was in a machine-readable format in the first place.
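
To make the idea of a Data Object (data elements + metadata + an identifier) more concrete, here is a minimal Python sketch. The class, the list of required metadata keys and the DOI are assumptions chosen for illustration; they do not follow any particular metadata standard.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DataObject:
    """A data object: data elements + metadata + an identifier."""

    identifier: str                            # persistent identifier, e.g. a DOI
    metadata: dict                             # machine-actionable description
    data: list = field(default_factory=list)   # the data elements themselves

    def missing_metadata(self, required=("title", "creator", "license", "source")):
        """Return the required metadata keys that are absent or empty.

        The default list of keys is a hypothetical minimum, not a standard.
        """
        return [key for key in required if not self.metadata.get(key)]

    def to_json(self):
        """Serialize to JSON, a format both machines and humans can read."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


obj = DataObject(
    identifier="doi:10.0000/placeholder",  # placeholder, not a real DOI
    metadata={
        "title": "Example measurements",
        "creator": "Jane Doe",
        "license": "CC-BY-4.0",
    },
    data=[{"sample": "a", "measurement": 1.0}],
)

print(obj.missing_metadata())  # the provenance entry 'source' is still missing
print(obj.to_json())
```

A check like `missing_metadata` is cheap to run before uploading and catches the most common gap, data that is published without a license or provenance.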

[top](#top)

<a name="open source"></a>Open Source
===

> Open source promotes universal access via an open-source or free license to a product's design or blueprint, and universal redistribution of that design or blueprint.

[Wikipedia]

This often refers to software or code, but it can also be applied to, for example, Open Source plant cultures. Benefits of making your code openly accessible:
* Encourages other people to work with it.
* Valuable feedback, suggestions for improvements and bug reports.
* More collaborations and contributions.
* Connection to scientists and coders that work on the same problems that you would never have gotten to know otherwise.  

The same principles as with Open Data apply to Open Source:
* Make it findable by tagging it with keywords and hosting it at a place such as GitHub.
* Make it re-usable by including a license and author's attribution.
* Include a README file that details how to use the code.
* Document and comment your code.  

The last point in particular should be part of any software project or more involved analysis script anyway. Use the experience of publishing your code to find some bugs, clean up your codebase and add proper documentation - this will greatly help you save time in the future!
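
Part of this checklist can be automated. The following Python sketch checks whether a repository folder contains a README and a LICENSE file before you publish it. The function name and the rule that a file counts when its name without extension matches are assumptions made for this example.

```python
import tempfile
from pathlib import Path


def missing_release_files(repo_dir, required=("README", "LICENSE")):
    """Return the required files that are absent from repo_dir.

    A file counts as present if its name without the extension matches,
    so README.md and LICENSE.txt are both accepted.
    """
    present = {p.stem.upper() for p in Path(repo_dir).iterdir() if p.is_file()}
    return [name for name in required if name not in present]


# Demonstration on a temporary folder that only contains a README.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "README.md").write_text("How to use this code\n")
    result = missing_release_files(tmp)

print(result)
```

The function itself also shows the kind of short docstring that makes code easier for others to re-use.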

[top](#top)

<a name="citable code"></a>Citable Code
---

Platforms like Zenodo allow you to assign a DOI to your code. Here is the workflow:
* Log into Zenodo using your GitHub account (GitHub is much like GitLab but public by default).
* Select one of your repositories.
* Create a release of your code (freezes and tags the current status).
* Get your DOI that can be used to cite your work.
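
Once you have a DOI, it helps to offer a ready-made citation next to your code. The following Python sketch builds a BibTeX entry for a software release. The function name, the choice of the `@misc` entry type and all values in the usage example (including the DOI) are placeholders for illustration, not something Zenodo generates in this form.

```python
def bibtex_for_release(key, authors, title, version, year, doi):
    """Build a BibTeX @misc entry for one release of a piece of software."""
    author_field = " and ".join(authors)  # BibTeX separates authors with 'and'
    return (
        f"@misc{{{key},\n"
        f"  author = {{{author_field}}},\n"
        f"  title = {{{title}}},\n"
        f"  version = {{{version}}},\n"
        f"  year = {{{year}}},\n"
        f"  doi = {{{doi}}}\n"
        f"}}"
    )


entry = bibtex_for_release(
    key="doe2024analysis",
    authors=["Jane Doe", "John Smith"],   # hypothetical authors
    title="Analysis scripts",
    version="1.0.0",
    year=2024,
    doi="10.5281/zenodo.0000000",         # placeholder DOI
)
print(entry)
```

Putting such an entry into your README tells readers exactly which release to cite.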

[top](#top)

<a name="fin"></a>Final Remarks
===