# Developers Lightning Talk 2017-03-30

### Shaun Bell (EcoFOCI): 
- email: <shaun.bell@noaa.gov>   
- github: @shaunwbell
    

## DataScience Swiss Army Knife

I do a lot of data exploration and one-of-a-kind analysis for scientific exploration, quality control, and publication.  This involves linking the raw data that EcoFOCI acquires from field operations to data from other resources, and making both consistently available to the internal/external community.

I share directed results, pilot data analysis routines, and develop QC operating procedures for our suite of oceanographic instrumentation.  Ultimately, I do a lot of gluing together of packages/routines/programs/data to provide whatever I can for the PMEL/EcoFOCI/FOCI community.

**Tech Buzzword lately is "Data Driven Stories"**

## Existing Challenge

Data science, data analysis, plotting and synthesis of data is a rapidly expanding field.  It's easy to get settled in a workflow that is optimal for a while (with a large amount of invested time) and then not evolve to new workflows due to the spin-up time required.

Questions I hear often:
- I know Python is popular, but where do I get started?
- Is Python as good as {familiar utility} for doing {specific task}?
- How did you learn to do this?
- Can you plot or take a look at this data and let me know what you think?

## Purpose for talk

I want to introduce my skills to the PMEL community and describe the framework I start from when it comes to analysis requests.  I dabble in a wide variety of computer services / development tools and want to learn what other tools and workflows are being developed for data/scientific analysis.

I'm not limited to mucking about in Python (I play in R, JavaScript, C/Fortran, HTML, MySQL, etc.)... there are as many tools available as there are problems to solve, and this is just one instance of a currently popular and community-supported software/data/analysis stack.

## A Basic Python Framework/Installation for data science/exploration in EcoFOCI

### Python Environment/Distribution


The Anaconda scientific Python distribution, found here [Continuum Anaconda Download](https://store.continuum.io/cshop/anaconda/), comes preloaded with most of the scientific and data-oriented Python utilities that one needs to get started.  Another strength is its simple and straightforward integrated package management utility (Conda), which can manage libraries and dependencies as well as Python programs.

It is available for Windows, Mac (OS X), and Linux, can be installed locally (it can be entirely self-managed, as admin rights are not necessary), and is available for Python 2 and/or Python 3 (both flavors can be installed simultaneously on the same system without conflicts).

The Anaconda Python Distribution is designed to contain all the fundamental packages for data exploration.
The Miniconda Python Distribution is designed to only have Python and the Conda package manager.

Best of all... keeping installations in sync across multiple systems (work laptop / work desktop) is streamlined, without the need to link and compile libraries, since it's all managed by Conda.

#### Expand Conda's knowledge of updates by including other "channels"

Continuum vets the packages that they include in their update channel... consider it a "stable release" channel.  However, many useful packages are maintained by the community and are stable, or stable enough, that you want to use them.  Continuum just doesn't have the time/manpower to vet these packages.

The [conda-forge channel](https://conda-forge.github.io/), a community-managed repository hosted and built through git/GitHub, is the first additional channel I would link to (there are others that are worthwhile too, like the IOOS (Integrated Ocean Observing System) channel, which is mostly replicated in conda-forge).

From the command line, update the conda settings as follows:   
(This makes conda-forge your primary channel)

    `conda config --add channels conda-forge`

If you just wish to install a package from conda-forge without changing your channel settings (leaving the Continuum Analytics base channel as the default), then just use the -c flag:

    `conda install netcdf4 -c conda-forge` 
    
##### Looking for a newer package with feature updates

    `conda update netcdf4` 

##### Looking to keep all your packages at the most recent release

    `conda update --all`
    
##### Additional Features

Check out the conda documentation.  You can "pin" packages so that they never update.  You can easily establish multiple unique Python environments for testing, and you can build your own recipes/packages for distribution.
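
With several environments (and possibly both Python 2 and Python 3) on one machine, it is easy to lose track of which interpreter a script is actually running under.  Below is a minimal sketch of a check, assuming a conda-managed install: `CONDA_DEFAULT_ENV` is the variable conda sets when an environment is activated, while the function name `describe_interpreter` is just an illustrative choice and not part of any library.

```python
import os
import sys


def describe_interpreter():
    """Collect basic facts about the Python interpreter that is running."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Set by conda when an environment is activated; None otherwise.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    # Print one "key value" pair per line.
    for key, value in describe_interpreter().items():
        print("{:<12} {}".format(key, value))
```

Running this from each environment in turn should show a different `prefix` (and `conda_env`, when one is activated), which is a quick way to confirm that the environments really are separate.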

### Installed by default in Anaconda Python (common components of the PyData framework)
- numpy, scipy #robust numerical/math packages
- pandas       #data analysis and statistics package 
- matplotlib, basemap, cartopy #scientific plotting and geospatial graphics
- Spyder       #Integrated Development Environment (IDE) similar to Matlab
- Jupyter Notebook #program used to create this talk

### Additional Useful Python Packages
- `conda install xarray netcdf4`  #reading/writing/viewing multidimensional scientific data  
- `conda install gsw seawater`    #Oceanographic Thermodynamics Conversion Tools (TEOS-10 and EOS-80)
- `conda install pip`             #Installer for packages from the Python Package Index (PyPI) that are not on conda-forge (mostly pure Python... historically no support for required library dependencies)
- `conda install cmocean`         #Colormaps designed for oceanographic parameters [cmocean website](http://matplotlib.org/cmocean/)

**That's it!!** You're ready to start using Python for scientific/data analysis.

From your command line (and potentially gui shortcuts) you can run:
- python scripts *.py   
        `python myprogram.py`
- fire up the Spyder IDE    
        `spyder`
- fire up the Jupyter/IPython Notebook server    
        `jupyter notebook`
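
As a first script to run with `python myprogram.py`, a quick sanity check of the installation can be useful.  The sketch below uses only the standard library (Python 3.4+ is assumed for `importlib.util.find_spec`) and reports which of the packages mentioned in this talk can be found by the current interpreter.  The helper name `check_packages` and the contents of `STACK` are illustrative choices; edit the list to match your own stack.

```python
import importlib.util

# Import names for packages mentioned in this talk.  Note that the conda
# package "netcdf4" is imported in Python as "netCDF4".
STACK = [
    "numpy", "scipy", "pandas", "matplotlib",
    "xarray", "netCDF4", "gsw", "seawater", "cmocean",
]


def check_packages(names):
    """Return {import_name: True/False} without actually importing anything."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(STACK).items():
        print("{:<12} {}".format(name, "ok" if found else "MISSING"))
```

Anything reported as `MISSING` can usually be added with `conda install <package>` (using the conda package name, which may differ in case from the import name).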



### [Example using NOAA HRES OI V2 SST Data (NetCDF)](https://github.com/NOAA-PMEL/EcoFOCI_Jupyter_Notebooks/blob/master/AlaskaRegion_NOAA_HRESV2OISST_DailySummary.ipynb)

## Further Discussion, Talks and Links

Is there any desire for...
- A workshop or tutorial session on scientific data analysis or visualization in Python
- Using Jupyter/IPython Notebooks for sharing work/code/results and leveraging GitHub for more than just code

### Further Reading
- [conda myths](https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/) A blog post on what Conda is and isn't.
- [PyData](https://pydata.org/index.html) Information about the Python Open Science Data Stack
- [Jupyter Notebook](https://jupyter.readthedocs.io/en/latest/index.html) Originally IPython Notebook - Creates an environment to test code, view results, and present commentary in a clean format.
- [Github+Jupyter Notebook](https://github.com/blog/1995-github-jupyter-notebooks-3) Using GitHub with Jupyter Notebooks
- [Markdown Github Cheat Sheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet), [Markdown](https://daringfireball.net/projects/markdown/syntax) Simple language syntax used in creating the text of Jupyter Notebook documents

### Helpful links for EarthScience Analysis
- [Tour of Python Geoscience Stack](http://nbviewer.jupyter.org/format/slides/github/ocefpaf/SciPyLA_2016_tutorial/blob/master/index.ipynb#/)
- [python4oceanographers](https://ocefpaf.github.io/python4oceanographers/) Blog of relevant examples for oceanography and python utilities
- [Sea-Py](http://pyoceans.github.io/sea-py/) community organized collection of python tools for the oceanographic community
- [python4geosciences](https://github.com/kthyng/python4geosciences) Class material from Texas A&M