# IA369Z - Best Practices - Gustavo Retuci Pinheiro

This is a best practices guide to creating reproducible research (RR) using Jupyter Notebook (JN). Here you will find tips on organizing RR, using JN, coding in Python, sharing data and code, and much more, based on the research community's knowledge and my own experience.
However, as the subject is vast, I will stick to the most important topics.

### Reproducible Research

Reproducibility in research is the ability of an experiment to be re-run, by the author or by someone else, and to yield the same results, or results that lead to the same conclusion.


In computational research, the way to make the work reproducible is to share the data, code, and documentation, and to run them in the same environment.
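
As a minimal sketch of what "the same environment" can mean in practice, the snippet below records the interpreter, operating system, and package versions so that they can be shared together with the data and code. The function name `environment_snapshot` and the package list are illustrative assumptions, not part of any standard tool.

```python
import json
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Collect interpreter, OS, and package versions as a plain dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "packages": versions,
    }


# The package names below are only examples; list the ones your project uses.
snapshot = environment_snapshot(["pip", "numpy"])
print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Saving this output next to the results gives readers a concrete target when they rebuild the environment.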

There is a common structure to follow, using popular platforms, in order to achieve good reproducibility (Fig. 1):
- Sharing/Distribution
    - GitHub, Bitbucket, ...
- Data
    - MRI, CSV, ...
- Code
    - Python, MATLAB, ...
- Documentation
    - Overleaf/LaTeX, PDF, ...
- Environments
    - Windows/Linux/macOS, Docker, ...
    
   

<img src="figures/Reproducibe_research_01.png" width="700" alt=""/>
<figcaption>Fig. 1 - Reproducible Paper: Structure and tools</figcaption>

#### Do's and Don'ts
- Try to make the research as reproducible as possible
    - Use good documentation
    - Code efficiently
    - Avoid using too many external resources
    - Use standard tools
- Documentation
    - Create a document that gives an overview of the work, explains the methods, and states the conclusions (separately from the code or, preferably, using literate programming)
    - Create a README file that explains how to set up the environment and run the code
    - Create a Workflow
    - Attach a license file to make sure the work will be used properly
- Publish your work where it will have good visibility

## Workflow

When the project depends on many software packages or many separate pieces of code, it is recommended to use a workflow management tool. This kind of tool manages the flow of data through the sections to ensure that the data is processed correctly and is available where it is needed.
However, if the project depends on only a few software packages or modules, development and reproduction are easier if a simple flowchart serves as the Workflow.
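
For a small project, the flow of data through the sections can also be written down directly in code. The sketch below is a hypothetical three-stage workflow: the stage names `load`, `clean`, and `summarize` and the helper `run_workflow` are illustrative and do not belong to any workflow management tool. Each stage receives the output of the previous one, which mirrors the boxes and arrows of a flowchart.

```python
def load():
    """Stage 1: produce the raw data (hard-coded here for illustration)."""
    return [3.0, None, 4.5, 1.5, None]


def clean(raw):
    """Stage 2: drop missing values."""
    return [x for x in raw if x is not None]


def summarize(values):
    """Stage 3: compute simple statistics."""
    return {"count": len(values), "mean": sum(values) / len(values)}


def run_workflow(source, stages):
    """Run the stages in order, passing each result to the next stage."""
    data = source()
    for stage in stages:
        data = stage(data)
        print("finished stage:", stage.__name__)
    return data


result = run_workflow(load, [clean, summarize])
print(result)
```

Keeping one function per flowchart block makes it easy to see which code corresponds to which step.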

### [draw.io](https://www.draw.io/)

draw.io is free online diagramming software for making flowcharts. Since the Workflow of a simple project can be represented as a flowchart, this online tool is a good option for drawing it.

#### Do's and Don'ts
- Use soft colors on the charts to make them pleasant to read
- Try to create a logical flow on the charts
- Keep the flowchart clean to avoid confusion
- If the code has multiple sections, indicate in the flowchart which block corresponds to each section

## Jupyter notebook

[Jupyter](http://jupyter.org/) notebook is an open-source web application that allows the creation and sharing of documents that contain live code, equations, visualizations and explanatory text.
Because of those features, it is considered a literate programming tool.


Since it has many advantages, such as a fast-growing community, good online support, and compatibility with several programming languages and with LaTeX, Jupyter Notebook is a great tool for reproducible research.

Some examples of its use are [here](https://nbviewer.jupyter.org/).


#### How to install
- Install Jupyter using [Anaconda](https://www.continuum.io/downloads) to make sure every prerequisite will be installed



#### Do's and Don'ts
- Use Python 3: Python 2.7 reached its end of life in January 2020, and current versions of most libraries no longer support it
- Set up different Python environments, especially if you want to try new libraries
    - To switch between environments, use the command "conda activate [environment_label]" in the command prompt (or "activate [environment_label]" in older Anaconda versions on Windows)
- Keep the [pip](https://pip.pypa.io/en/stable/installing/) tool up to date
- "pip install" is able to install almost all Python packages (just make sure you are in the right environment)
- Use a pattern to name the notebooks. I recommend using <b>title_mm-dd-yy_OWNERINITIALS.ipynb</b>. It will keep the files well organized inside the notebook folders
- Create a notebook for every working day, even if there are no significant changes
- Jupyter Notebook does NOT re-import modules that are already imported. If you change the content of a module, restart the kernel to make sure the latest version is imported
- If you are using multiple notebooks in your research, the simplest way to share data among them is to save the processed data in files and read those files in the next notebook
- Exporting PDF or .tex directly from notebooks might be useful, but this feature is still immature and the output will require some finishing
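
The file-based sharing between notebooks mentioned in the list above can be sketched as follows. This is a minimal example using only the standard library; the folder, the file name `processed_data.csv`, and the column names are assumptions chosen for illustration.

```python
import csv
import os
import tempfile

# Hypothetical shared folder; a temporary directory keeps the example self-contained.
data_dir = tempfile.mkdtemp()
processed_path = os.path.join(data_dir, "processed_data.csv")

# --- Last cell of the first notebook: save the processed data ---
rows = [{"subject": "s01", "volume": 12.5}, {"subject": "s02", "volume": 11.0}]
with open(processed_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["subject", "volume"])
    writer.writeheader()
    writer.writerows(rows)

# --- First cell of the next notebook: read the data back ---
# CSV stores everything as text, so numeric columns are converted explicitly.
with open(processed_path, newline="") as f:
    loaded = [
        {"subject": r["subject"], "volume": float(r["volume"])}
        for r in csv.DictReader(f)
    ]

print(loaded)
```

In a real project the path would point to a folder inside the repository, so that both notebooks and any reader find the same file.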

## Python

Python is a powerful high-level programming language that runs on many operating systems, so code can be moved between platforms.

This programming language is also widely used in the academic world. This is because it has very good online documentation, a large number of available libraries, strong data manipulation capabilities, and many plotting tools; it is also relatively easy to write and read, and it is free.

Because of those features and many others, this is the programming language I recommend for reproducible research.

[Here](https://wiki.python.org/moin/BeginnersGuide/Programmers) is a Beginner's Guide to the Python language.



#### Do's and Don'ts
- Be careful with the indentation
- Try to avoid external dependencies
- Use implicit "for loops" (for example, list comprehensions or vectorized array operations) to make the code more efficient
- Establish a pattern for data and function formats and stick to it (this avoids incompatibility between functions)
- Use the command "print" to debug your code
- Try to make the code as short as possible to avoid redundancies and inefficiencies
    - When the code starts to get big, create a module with your functions and import it into your notebook (you can also do this by importing another notebook as a module)
    - When your library is well established, you can make it public and "pip installable"
        - How to create a "pip installable" module: [link 1](https://packaging.python.org/distributing/#setup-py), [link 2](https://www.youtube.com/watch?v=NFpmPDIqQZw), [link 3](https://www.youtube.com/watch?v=J_vhfkSb9pU)
        - The main commands are: "python setup.py sdist" and "twine upload dist/*"
- Place comments throughout the code to make it easier to read
- Name the variables intuitively
- To keep track of active variables, use the command "whos"
- Use proper variable types to avoid warnings and errors
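
One possible reading of the "implicit for loops" item in the list above is to let comprehensions and built-in functions do the iteration instead of writing index-based loops by hand. The sketch below compares both styles on a small list; the variable names are illustrative. Libraries such as NumPy offer vectorized array operations in the same spirit, but they are left out here to keep the example free of external dependencies.

```python
values = [1.5, 2.0, 3.5, 4.0]

# Explicit loop: index bookkeeping and repeated append calls
squares_loop = []
for i in range(len(values)):
    squares_loop.append(values[i] ** 2)

# Implicit loop: a list comprehension expresses the same computation
squares = [v ** 2 for v in values]

# Built-ins such as sum() and max() also iterate implicitly
total = sum(squares)
largest = max(values)

print(squares, total, largest)
```

Both versions produce the same list; the comprehension is shorter and leaves less room for indexing mistakes.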

## Overleaf

[Overleaf](https://www.overleaf.com/) is an online LaTeX platform that allows collaborative editing of documents.

Overleaf is one of the best LaTeX editors available. It offers standard templates, real-time preview, many preinstalled packages, and so on.

The main advantage of this platform is that it is online, so there is no need for installation or a complicated setup. However, being online is also its main disadvantage. When the project gets big, compilation gets slower and some glitches can occur. Furthermore, if the internet connection is down, there is no access to the files.

#### Do's and Don'ts
- The free version is limited. Find another platform if your work will have more than 60 files (including images)
- Name your files with the work name and your own name. If you don't, it will be hard for the people you share the file with to identify it as yours
- Make sure you have a stable internet connection (especially if you are close to the deadline)
- Fix compilation errors as soon as they appear, because they are very hard to identify in the log file

## Distribution

The easiest way to distribute your code and keep it under version control is to use [GitHub](https://github.com). This tool allows the user to share a repository with data, code, documentation, licenses, and everything else reproducible research needs.
GitHub also has a friendly interface, both online and offline, which makes it easy to use.

However, in the free version, everything you put there becomes public. If you intend to publish your ongoing work at a conference, for example, you cannot make it public beforehand. Even after the conference has published it, you will have transferred the copyright to them. In this case, the solution is to pay for the service or to use the [Bitbucket](https://bitbucket.org/) platform, which is very similar to GitHub but offers free private projects.

Before starting your project, you should decide whether the work will be public from the beginning. If there is no need for it to be private, I recommend using GitHub because of its popularity and ease of use.

#### Do's and Don'ts
- Using Bitbucket and GitHub at the same time can be tricky, as they try to "steal" the ownership of the project
- The graphical interface is much easier to use than the command line
- If you keep your project public, the notebooks can be viewed in [nbviewer](https://nbviewer.jupyter.org/)
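
To illustrate the nbviewer item above, the sketch below builds a viewer link for a notebook stored in a public GitHub repository. It assumes nbviewer's `/github/<user>/<repo>/blob/<branch>/<path>` address scheme; the helper name `nbviewer_url`, the default branch, and the repository and file names are hypothetical.

```python
from urllib.parse import quote


def nbviewer_url(user, repo, path, branch="master"):
    """Build the nbviewer address for a notebook in a public GitHub repository."""
    # quote() escapes spaces and other special characters but keeps "/" as is.
    return "https://nbviewer.jupyter.org/github/{}/{}/blob/{}/{}".format(
        user, repo, branch, quote(path)
    )


# Hypothetical repository and file names, for illustration only
print(nbviewer_url("some-user", "some-repo", "notebooks/analysis_05-20-17_GRP.ipynb"))
```

Pasting the notebook's GitHub address into the nbviewer home page achieves the same result without any code.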

<html>
<head>
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css">
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js"></script>
</head>
<body>

<div class="container">
  <h2>Other Tips/Do's and Don'ts</h2>
  <p>Here are some other tips for using the platforms</p>
  <div class="panel-group" id="accordion">
    <div class="panel panel-default">
      <div class="panel-heading">
        <h4 class="panel-title">
          <a data-toggle="collapse" data-parent="#accordion" href="#collapse1">Jupyter Notebook</a>
        </h4>
      </div>
      
      <div id="collapse1" class="panel-collapse collapse in">
        <div>
        
        <div>- Import a notebook as a module (instructions <a href="https://github.com/GustavoRP/Notebook-as-Module">here</a>).</div>

        <div>- Trying to use the notebook to create slide presentations is very painful.</div>
        
        <div>- It is "impossible" to keep the code always organized, so periodically reorganize your notebooks.</div>
        
        <div>- Keep your test or messy notebooks in a specific folder.</div>
        
        
        </div>
      
      </div>
    </div>
    
    
    
    <div class="panel panel-default">
      <div class="panel-heading">
        <h4 class="panel-title">
          <a data-toggle="collapse" data-parent="#accordion" href="#collapse2">Python</a>
        </h4>
      </div>
      <div id="collapse2" class="panel-collapse collapse">
        <div>
        
        <div>- Import all needed modules inside the function that needs them. This ensures that the code will not break because of modules that were not imported.</div>
        <div>- Use matplotlib to show graphics.</div>
        
        <div>- Try to keep the imports in the same cell for organization.</div>
        
        <div>- Whenever possible, wrap code sections in functions. It helps save memory, because local variables are released when the function returns.</div>
        
        
      </div>
      </div>
    </div>
    <div class="panel panel-default">
      <div class="panel-heading">
        <h4 class="panel-title">
          <a data-toggle="collapse" data-parent="#accordion" href="#collapse3">General Tips</a>
        </h4>
      </div>
      <div id="collapse3" class="panel-collapse collapse">
        <div>
         
         <div>- Docker does not run on Windows 10 Home edition.</div>
         
         <div>- Do everything objectively.</div>
         
        
        
      </div>
      
      </div>
    </div>
  </div> 
</div>
    
</body>
</html>

### References

- draw.io (https://www.draw.io/)
- Jupyter (http://jupyter.org/)
- Anaconda (https://www.continuum.io/downloads)
- pip (https://pip.pypa.io/en/stable/installing/)
- Python Beginner's Guide (https://wiki.python.org/moin/BeginnersGuide/Programmers)
- Creating a "pip installable" module (https://packaging.python.org/distributing/#setup-py)
- Overleaf (https://www.overleaf.com/)
- GitHub (https://github.com)
- Bitbucket (https://bitbucket.org/)
- nbviewer (https://nbviewer.jupyter.org/)
- Notebook as a module (https://github.com/GustavoRP/Notebook-as-Module)