# Workshop on Reproducible Science with `Jupyter Notebook` and `Python` 

## Sam Fraiberger
- Data Scientist, Big Data Program
- Visiting Scholar, NYU Computer Science
- Fellow, Harvard IQSS

---

# Objectives

- Understand what a reproducible workflow is and why it is valuable
    - To work more efficiently
    - To help advance science
- Learn practical tools for creating a reproducible workflow, in particular by:
    - Familiarizing yourself with Jupyter Notebooks, and
    - Gaining some knowledge of Python
- Gain the confidence to continue improving the reproducibility of your research

# Motivation

- The sciences have a reproducibility problem. [Nature article](http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970): **many published studies cannot be reproduced**

    - Nature's survey of 1,576 researchers 
    - More than 70% of researchers have tried and failed to reproduce another scientist's experiments 
    - More than half have failed to reproduce their own experiments
 
- Science retracted a study of how canvassers can sway people's opinions about gay marriage [538 story](http://fivethirtyeight.com/features/how-two-grad-students-uncovered-michael-lacour-fraud-and-a-way-to-change-opinions-on-transgender-rights/)

- Reinhart and Rogoff (2010) controversy [New Yorker Story](https://www.newyorker.com/news/john-cassidy/the-reinhart-and-rogoff-controversy-a-summing-up)

# What is Reproducibility?

- **Documentation**: note the difference between binary files (e.g. docx) and .txt files and why text files are preferred for documentation
- **Organization**: tools to organize your projects so that you don’t have a single folder with hundreds of files
- **Automation**: the power of scripting to create automated data analyses
- **Dissemination**: publishing is not the end of your analysis; it is a way station toward your future research and the future research of others
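
As a minimal sketch of the automation idea above, the following script computes summary statistics from CSV data using only the Python standard library. The example data, the column name `score`, and the function name `summarize` are hypothetical and chosen purely for illustration. Rerunning the script on the same input gives the same result every time, which is the main advantage of a scripted analysis over one done by hand.

```python
import csv
import io
import statistics


def summarize(csv_text, column):
    """Return the count, mean, and median of a numeric column in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Hypothetical example data; a real project would read this from a file.
EXAMPLE_CSV = "subject,score\na,1.0\nb,2.0\nc,6.0\n"

if __name__ == "__main__":
    print(summarize(EXAMPLE_CSV, "score"))
```

Keeping the analysis in a function like this makes it easy to rerun when the data changes and to share with collaborators alongside the results.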

# Who should care?

- Anyone who performs computational analysis
- Anyone using Excel or Stata who would rather work more than 10 times faster and more reliably
- Anyone familiar with this situation:
![irreproducible](./images/irreproducible.png)

---

# Prerequisites
- Basic knowledge of programming, preferably in Python
- Basic knowledge of statistics

---

# Plan

### Part 1 - Working With Jupyter Notebook
### Part 2 - Data Analysis in Python

# Installing Jupyter Notebook and Python

- **Requirement**: a reasonably up-to-date browser
- Go to [Anaconda](https://www.continuum.io/anaconda), an all-in-one installer
- Choose **Python version 3.x**
- The installer includes [`Jupyter Notebook`](http://jupyter.org/), a programming environment that runs in a web browser

## Windows

[Video tutorial](https://www.youtube.com/watch?v=xxQ0mzZ8UvA)

1. Open http://continuum.io/downloads with your web browser.
1. Download the `Python 3` installer for Windows.
1. Install `Python 3` using all of the defaults for installation, but make sure to check **Make Anaconda the default Python**.

---

## Mac OS X

[Video tutorial](https://www.youtube.com/watch?v=TcSAln46u9U)

1. Open http://continuum.io/downloads with your web browser.
1. Download the `Python 3` installer for OS X.
1. Install `Python 3` using all of the defaults for installation.

---

## Linux

1. Open http://continuum.io/downloads with your web browser.
1. Download the `Python 3` installer for Linux.
1. Install `Python 3` using all of the defaults for installation, following the steps below. (Installation requires using the shell. If you aren't comfortable doing the installation yourself, stop here and request help at the workshop.)
1. Open a terminal window in the directory that contains the downloaded installer.
1. Type:

   ~~~
   bash Anaconda3-
   ~~~

   and then press tab. The name of the file you just downloaded should appear.

1. Press enter and follow the text-only prompts. When there is a colon at the bottom of the screen, press the down arrow to move down through the text. Type `yes` and press enter to approve the license. Press enter to approve the default location for the files. Type `yes` and press enter to prepend Anaconda to your `PATH` (this makes the Anaconda distribution the default `Python`).
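
Once the installation has finished on any of the platforms above, a short check like the following can confirm which `Python` is running and whether the notebook package can be found. This is only a sketch: the function name `check_installation` is hypothetical, and the assumption that Jupyter Notebook is importable under the package name `notebook` is noted in the code. If Anaconda is the default `Python`, the printed executable path will typically point inside the Anaconda installation directory.

```python
import importlib.util
import sys


def check_installation(required_major=3, packages=("notebook",)):
    """Report the running Python and whether each listed package can be found.

    The default package list assumes Jupyter Notebook is installed under the
    package name `notebook`; adjust it to match your own setup.
    """
    report = {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "python_ok": sys.version_info.major == required_major,
    }
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_installation().items():
        print(f"{key}: {value}")
```

Save the code to a file and run it with `python` from a terminal; a value of `False` for `python_ok` or `notebook` suggests that a different `Python` installation is being picked up.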

---

# Additional Resources

## `Project Jupyter`

- `Project Jupyter` [Homepage](http://jupyter.org/)
- `Project Jupyter` [Google group](https://groups.google.com/forum/#!forum/jupyter)
- `Jupyter` [documentation](https://jupyter.readthedocs.io/en/latest/)
- [GitHub](https://github.com/jupyter/help)
- Free `Project Jupyter` tutorials:
    - [Readthedocs](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/)
    - [YouTube](https://www.youtube.com/watch?v=Rc4JQWowG5I)
    
---

## `Python`

- [`Python`](https://www.python.org/)
- `Python` [documentation](https://docs.python.org/3/)
- `Python` [Google group](https://groups.google.com/forum/#!forum/comp.lang.python) (note that there are many!)
- [Stack Overflow](http://stackoverflow.com/questions/tagged/python)
- `Python` [Help](https://www.python.org/about/help/)
- Free `Python` tutorials:
    - [Google's Python tutorial](https://developers.google.com/edu/python/)
    - [Data Camp](https://www.datacamp.com/)
    - [Berkeley Institute for Data Science Python Boot Camp](https://www.youtube.com/playlist?list=PLKW2Azk23ZtSeBcvJi0JnL7PapedOvwz9)
    
---