# Programming With Python for Scientists

<div class="alert alert-success">
This notebook is a brief introduction to Programming for Scientists, focusing on Python. It is basically a quick overview, plus links to external resources.
</div>

<div class="alert alert-info">
For a top-level overview of computing in science, check out
<a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510"> Good Enough Practices in Scientific Computing</a>.
<br>
There is also the <a href="http://collections.plos.org/ten-simple-rules">10 Simple Rules</a> collection, from PLoS, that includes several useful papers sketching out guidelines.
</div>

![title](img/python.png)

## Python

<div class="alert alert-success"> 
Python is an <a href="https://en.wikipedia.org/wiki/Open-source_model">open-source</a>,
<a href="https://en.wikipedia.org/wiki/High-level_programming_language">high-level</a>,
<a href="https://en.wikipedia.org/wiki/General-purpose_programming_language">general purpose</a>,
<a href="https://en.wikipedia.org/wiki/Interpreted_language">interpreted</a>,
<a href="https://en.wikipedia.org/wiki/Programming_language">programming language</a>, and one of the most popular languages, including in science. It has a large, active user community, and it is a transferable skill that is also widely used in industry.
</div>

<div class="alert alert-info">
The official Python Website: <a href="https://www.python.org">https://www.python.org</a>
</div>

## Jupyter Notebooks

<img src="img/jupyter.png" style="width: 300px;"/>

<div class="alert alert-success">
Jupyter notebooks are a way to intermix code, outputs and plain text. 
They run in a web browser, and connect to a kernel to be able to execute code. 
</div>

<div class="alert alert-info">
You can find more information about Project Jupyter at their <a href="http://jupyter.org">website</a>.
A quick introduction to Jupyter notebooks is also available 
<a href="https://github.com/COGS108/Tutorials/blob/master/01-JupyterNotebooks.ipynb">here</a>.
</div>


## Learning Python

- [Codecademy](https://www.codecademy.com/tracks/python) is a good starting point for complete beginners.
- There is also the [Official Beginners Guide](https://wiki.python.org/moin/BeginnersGuide).
- [Whirlwind Tour](https://github.com/jakevdp/WhirlwindTourOfPython) of Python, by [Jake Vanderplas](https://github.com/jakevdp), is a nice quick overview of the language.
    - This resource is ideal if you have some experience in other languages, and want a primer on Python
- [Python Challenge](http://www.pythonchallenge.com/) is a good place for (sometimes infuriating) programming challenges. 
- [Leet Code](https://leetcode.com/) is a place for more intense technical coding questions and challenges (geared towards industry interviews).

## Choosing a Development Environment

<div class="alert alert-success"> 
Different 'development environments' - text editors, integrated development environments, and/or computational notebooks - provide support and extra tools for writing code, which can make writing code both faster and more pleasant.
</div>

At its core, Python is just text, and can be written directly in text editors such as Notepad or TextEdit. Over and above that, integrated development environments (IDEs) and enriched text editors offer a suite of tools that can help you write code.
[Syntax highlighting](https://en.wikipedia.org/wiki/Syntax_highlighting),
[hot-keys](https://en.wikipedia.org/wiki/Keyboard_shortcut),
[auto-completion](https://en.wikipedia.org/wiki/Command-line_completion),
[integrated linters](https://en.wikipedia.org/wiki/Lint_(software)),
[variable inspectors](https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/varInspector/README.html)
and [debugging tools](https://en.wikipedia.org/wiki/Debugger)
are some examples of useful features you can use in an IDE.

**IDEs** are often larger pieces of software that offer project structure management and single-line execution, as well as more sophisticated tools such as a real-time variable inspector. Some examples are:
- Eclipse
- PyCharm
- Spyder
- JupyterLab

**Text editors** are lightweight and often more flexible, with downloadable add-ons that extend their functionality towards that of a full IDE. Some examples are:
- Sublime Text
- Atom
- VS Code
- (vim)

**Jupyter Notebooks** are a bit different from both: they are 'computational notebooks' that are similar to an IDE, and excel at narrative documents, visualization and sequential story-telling.

<div class="alert alert-info">
For a more comprehensive list, see this
<a href="https://docs.python-guide.org/dev/env/">Python guide</a>
and/or this list from
<a href="https://www.datacamp.com/community/tutorials/data-science-python-ide">Datacamp</a>.
</div>

# Data Science & Scientific Computing in Python

<img src="img/scipy.png" style="width: 250px;"/>

<div class="alert alert-success">
The main collection of scientific code in Python is Scipy, which is really a superset of a large number of individual packages. 
</div>

<div class="alert alert-info">
Scipy's official homepage: <a href="https://www.scipy.org/">https://www.scipy.org/</a>
</div>

Notably, the family of packages for scientific computing / data science in Python includes:
- [numpy](http://www.numpy.org): Numerical Computing and Array Objects
- [matplotlib](https://matplotlib.org): Plotting
- [scikit learn](http://scikit-learn.org/stable/): Machine Learning
- [pandas](http://pandas.pydata.org): Data Analysis

<div class="alert alert-info">
For comprehensive, hands-on tutorials for the scientific computing ecosystem in Python, check out the <a href="https://github.com/jakevdp/PythonDataScienceHandbook">Data Science Handbook</a> by <a href="http://github.com/jakevdp">Jake Vanderplas</a>.
<br>
There is also a comprehensive set of Tutorials available as part of the <a href="https://github.com/COGS108/Tutorials">Hands On Data Science</a> class.
</div>

# Project Management & Code Organization

<div class="alert alert-success">
Following 'modular programming', in which code is sub-divided and organized into different components (functions, modules, scripts, notebooks, etc), helps keep code organized and facilitates project management and reproducible results.
</div>

<div class="alert alert-info">
More information on <a href="https://en.wikipedia.org/wiki/Modular_programming">modular programming</a>.
</div>

Scientific programming that involves data analysis is like a food processing line: 
- Functions in **modules** are the individual processing steps, like the cutting station, the fryer, etc. Modules are usually a collection of .py files that are imported by the analysis script, e.g., scipy. Quick rule of thumb: functions that are called many times should be placed in modules.


- **Scripts** collect the functions from modules and arrange them in a particular order suitable for your data, like the blueprint for the entire assembly line. Scripts can be made in stages with saved intermediate outputs, like preprocessed data. It's desirable to have a one-click-and-run processing script that will reproduce your results from beginning to end, especially in scientific research.


- **Notebooks** are for displaying the final product, including visualizations and key results. Be aware: cells in a notebook can be run out of order, which can cause confusion if you are not careful.

In addition to code organization, consider where raw and intermediate forms of data should be stored to optimize memory usage, speed, and hard drive space.
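
The split between modules and scripts described above can be sketched in a few lines. This is a minimal illustration, not a recommended layout: the file names `preprocessing.py` and `run_analysis.py`, the function names, and the data values are all hypothetical, and both parts are shown in one block so that it runs on its own.

```python
# --- contents of a hypothetical module, e.g. `preprocessing.py` ---
def remove_missing(values):
    """Drop missing (None) entries from a list of measurements."""
    return [val for val in values if val is not None]


def rescale(values):
    """Rescale values to the range 0-1 (min-max normalization)."""
    low, high = min(values), max(values)
    return [(val - low) / (high - low) for val in values]


# --- contents of a hypothetical analysis script, e.g. `run_analysis.py` ---
# In a real project this script would start with:
#     from preprocessing import remove_missing, rescale
def run_analysis(raw_data):
    """Apply the processing steps in a fixed order and return the result."""
    cleaned = remove_missing(raw_data)
    return rescale(cleaned)


if __name__ == "__main__":
    # Made-up data, for illustration only
    print(run_analysis([2.0, None, 4.0, 6.0]))
```

The reusable steps live in the module, while the script only decides the order in which they are applied to a particular dataset.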

### Some example projects:
- [neurodsp](https://github.com/voytekresearch/neurodsp): time-series analysis and simulation toolbox created by the Voytek lab, has several modules and tutorial notebooks, but no analysis pipeline.
- [fooof](https://github.com/voytekresearch/fooof): toolbox for parametrizing neural power spectra - modular and small codebase with tutorials, designed as an indexed package.
- [Cole, 2017](https://github.com/voytekresearch/Cole_2017): repo includes modules and notebooks to fully reproduce results from a publication.

# Programming Practices

Writing code that runs isn't enough. You want code that is understandable, and does what it is supposed to do, in a reasonably efficient manner.

## Style Guides

Python has a style guide - a guide to how to write Pythonic code - called 'PEP8', that you should follow. Style guides keep code consistent between people, increasing readability. There are automatic tools, called 'linters', that can help you check if your code follows style guides.  

- PEP8: https://www.python.org/dev/peps/pep-0008/
- Linting (pylint): https://pylint.readthedocs.io/en/latest/
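
As a small illustration of what a style guide changes, the sketch below shows the same (hypothetical) function written twice: once in a cramped style, kept as a comment, and once following common PEP8 conventions for naming and spacing. It covers only a few of the conventions; the guide itself is the reference.

```python
# Harder to read: cryptic names, no spacing, definition and body on one line
# def Calc(x,y):return x*y/2

# Closer to PEP8: snake_case names, spaces around operators, one statement per line
def triangle_area(base, height):
    """Return the area of a triangle."""
    return base * height / 2


area = triangle_area(3, 4)
print(area)
```

A linter such as pylint would typically flag several of the issues in the commented-out version.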

## Documentation

Document your code! Add comments explaining what your code does, and why it does it that way. The Numpy docstring standard is a good style guide for documenting code.

Numpy Docs:
- How to: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
- Example: https://github.com/numpy/numpy/blob/master/doc/example.py
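
To give a feel for the format, here is a short sketch of a function documented in the Numpy docstring style. The function `moving_average` is a made-up example, and only the most common sections (Parameters, Returns, Examples) are shown.

```python
def moving_average(values, window):
    """Compute a simple moving average.

    Parameters
    ----------
    values : list of float
        Input data, in order.
    window : int
        Number of consecutive points to average. Must be at least 1.

    Returns
    -------
    list of float
        Average of each full window; has len(values) - window + 1 entries.

    Examples
    --------
    >>> moving_average([1, 2, 3, 4], 2)
    [1.5, 2.5, 3.5]
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    # Slide the window along the data, averaging each full window
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

The docstring is available at runtime through `help(moving_average)`, so the documentation travels with the code.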

## Testing

Code testing means checking that your code does what it is supposed to do. Automated methods can run tests on demand and check that your code runs properly and provides the correct output on known test cases. Having tests can make it much easier to update code, and to run it on different systems.

There are various 'levels' of software testing, for example:
- Smoke Tests: Testing whether your code runs without breaking. 
- Unit Tests: Testing each small, independent 'unit' of the code.
- Integration Tests: Testing the combination of software modules as a group. 

Pytest:
- https://docs.pytest.org/en/latest/
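
The first two levels can be sketched with plain `assert` statements, written so that pytest could collect them. The function under test, `celsius_to_kelvin`, and the test names are hypothetical examples.

```python
def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to Kelvin."""
    return temp_c + 273.15


def test_smoke():
    # Smoke test: the function runs without raising an error
    celsius_to_kelvin(20.0)


def test_known_values():
    # Unit test: the output matches known test cases
    assert celsius_to_kelvin(0.0) == 273.15
    assert abs(celsius_to_kelvin(-273.15)) < 1e-9
```

If this were saved in a file whose name starts with `test_`, running `pytest` in that directory would find and run both test functions.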

## Profiling

Profiling means measuring the performance of code, in terms of memory use and/or computation time.
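
As a rough sketch, both kinds of measurement can be taken with the standard library alone: `timeit` for computation time and `tracemalloc` for memory. The function `build_squares` is a made-up workload, and the numbers printed will differ between machines.

```python
import timeit
import tracemalloc


def build_squares(n):
    """Return a list of the squares of the integers 0 .. n-1."""
    return [i * i for i in range(n)]


# Computation time: total seconds for 100 repeated calls
elapsed = timeit.timeit(lambda: build_squares(10_000), number=100)
print(f"100 calls took {elapsed:.4f} s")

# Memory use: peak memory traced since tracemalloc.start()
tracemalloc.start()
squares = build_squares(100_000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak traced memory: {peak / 1e6:.2f} MB")
```

Dedicated profilers give a finer breakdown, but quick measurements like these are often enough to find out whether a step is worth optimizing at all.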

## Package Management

Python code is split up into 'packages'. Managing packages is easiest with [anaconda](http://anaconda.org). 
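
Whichever tool manages your packages, it can be useful to check from within Python which versions are actually installed. The sketch below uses `importlib.metadata` from the standard library (Python 3.8 and later); the helper name `installed_version` is hypothetical, and the package names are just the ones mentioned earlier in this notebook.

```python
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print("Python:", sys.version.split()[0])
for name in ["numpy", "scipy", "pandas"]:
    print(name, installed_version(name))
```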

<br>
<br>

## Version Control

![Github](img/github.png)
<br>

<br>
<div class="alert alert-success">
Version control is a way of managing changes to code, and dealing with multiple different versions of code, including across different people. 
</div>

<div class="alert alert-info">
The most common tools for version control are
<a href="https://git-scm.com">git</a> and <a href="https://github.com">Github</a>,
tutorials for which are available at
<a href="http://swcarpentry.github.io/git-novice/">Software Carpentry</a>,
and/or in papers including this
<a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004668">Quick Introduction</a> and/or this
<a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004947">Ten Simple Rules</a>
paper.
An overview of the tools, as well as links to other resources, is also available <a href="https://github.com/COGS108/ExtraMaterials/blob/master/X1-Git.ipynb">here</a>.
</div>

<div class="alert alert-info">
Github and git are often used from the shell (terminal). If you are unfamiliar with the shell, Software Carpentry has a tutorial <a href="https://swcarpentry.github.io/shell-novice/">here</a>. <br>There are also several graphical user interfaces for using git and Github - we recommend <a href="https://www.sourcetreeapp.com">SourceTree</a>.
</div>

# Reproducible Computation

One of the most powerful aspects of computation is that precisely specified procedures can be applied repeatedly. However, using code does not guarantee that your projects are reproducible. As well as using good programming practices to keep your code organized, accurate and efficient, the following guidelines can help make sure your analysis pipelines are reproducible:
- Whenever possible, automate, and avoid manual interventions
- Use modular programming, and organize your code into functions, modules and scripts
- If you use interactive environments, like Jupyter, or integrated development environments, make sure to also collect code into scripts
- Keep track of your code environment, and the software versions that you are using
- As your project scales, create 'master scripts' that call underlying functions / scripts

The ultimate goal of creating a reproducible computational workflow is that you should be able to have a single script that can (repeatedly) run the whole project, from loading the data, through pre-processing and analysis, and re-creating all the figures. 
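
A minimal sketch of such a single script is shown below, under some loud assumptions: the stage functions (`load_data`, `preprocess`, `analyze`) are stand-ins with made-up data, and the output file name `results.json` is hypothetical. The point is the shape: every stage is called in a fixed order from one entry point, and the Python version is saved alongside the results.

```python
import json
import platform
import tempfile
from pathlib import Path


def load_data():
    """Stand-in for loading raw data (values are made up for illustration)."""
    return [1.0, 2.0, 3.0, 4.0]


def preprocess(data):
    """Stand-in preprocessing step: subtract the mean."""
    mean = sum(data) / len(data)
    return [val - mean for val in data]


def analyze(data):
    """Stand-in analysis step: variance of the (already centered) data."""
    return {"variance": sum(val ** 2 for val in data) / len(data)}


def run_all(output_dir):
    """Run every stage in order, saving results alongside environment info."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    results = analyze(preprocess(load_data()))
    # Record the Python version used, to help track the code environment
    results["python_version"] = platform.python_version()

    out_file = output_dir / "results.json"
    out_file.write_text(json.dumps(results, indent=2))
    return results


if __name__ == "__main__":
    # A temporary directory keeps this example free of side effects
    with tempfile.TemporaryDirectory() as tmp_dir:
        print(run_all(tmp_dir))
```

In a real project, each stage would typically live in its own module, and `run_all` would write to a project output directory rather than a temporary one.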

# Recommendations

<div class="alert alert-success">
Pulling this all together, the following are our recommendations for using code in a scientific context (mostly language agnostic).
</div>


## Basic Recommendations

- Software Management (Python)
    - Download and use the [anaconda distribution](https://docs.anaconda.com/anaconda/) of Python 3
- Data Organization
    - If possible / available, use standardized formats and organizational structures for storing all your data
    - And/or: At least, have and follow some internal (lab) data storage standards
- Code Organization & Version Control
    - Use git & Github
- Code Style & Documentation
    - Follow standard style guides
    - Document your code!
- Code Tests
    - Write and use 'smoke tests'
- Share Code & Data when possible
    - If you keep your code on Github, and document as you go along, this is as easy as making the repository public
- Using external libraries
    - Scope out what exists, and decide what needs to be custom, and what you will use from elsewhere
    - Try to reduce custom code
    - Have a general consensus on this within the lab, and check back in periodically, both across the lab, and by exploring available packages

<div class="alert alert-info">
For more information, check out the '10 Simple Rules' guides on <a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002802">Open Development of Scientific Software</a> and/or <a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005412">Making Software More Robust</a>.
</div>

## Advanced Recommendations

<div class="alert alert-success">
As your use of programming expands, you may need more tools and practices. Some examples of more advanced practices are listed below.
</div>

- Code Testing
    - Write unit and integration tests
- Continuous Integration
    - Use [Travis CI](https://travis-ci.org) or [CircleCI](https://circleci.com)
- Managing Software Environments
    - [Anaconda Environments](https://conda.io/docs/user-guide/tasks/manage-environments.html)
    - [Docker](https://www.docker.com)
- Profile and Optimize Code
    - For Python: code can be sped up with [Cython](http://cython.org)
    - Parallelize your code
- Formally Release Code
    - For example, through [PyPI](https://pypi.python.org/pypi) - the Python Package Index
- Run computation on the cloud

# Finally: Find Your People

Programming languages bring with them a community and a culture, or rather, many communities with many different cultures. Find the people doing similar things in the same language, and get in touch with them, whether that be locally, or through web forums, social media groups and special interest groups. 