**WORKSHOP NAVIGATION**

---
* [A quick introduction to Python](ws01b_intro.ipynb)
* [Python string basics](ws01c_strings.ipynb)
* [A quick tour of working with files](ws01d_files.ipynb)
* **Jupyter: A flyby introduction**
* [Introduction to Github](ws01f_github.ipynb)
---

# Table of Contents
 <p><div class="lev1 toc-item"><a href="#WORKSHOP-1-/-Jupyter:-A-flyby-introduction" data-toc-modified-id="WORKSHOP-1-/-Jupyter:-A-flyby-introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>WORKSHOP 1 / Jupyter: A flyby introduction</a></div><div class="lev2 toc-item"><a href="#Jupyter-Notebooks:-What?" data-toc-modified-id="Jupyter-Notebooks:-What?-11"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Jupyter Notebooks: What?</a></div><div class="lev2 toc-item"><a href="#Jupyter-Notebooks:-Why?" data-toc-modified-id="Jupyter-Notebooks:-Why?-12"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Jupyter Notebooks: Why?</a></div><div class="lev2 toc-item"><a href="#Jupyter-Notebooks:-How?" data-toc-modified-id="Jupyter-Notebooks:-How?-13"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Jupyter Notebooks: How?</a></div><div class="lev3 toc-item"><a href="#Narratives" data-toc-modified-id="Narratives-131"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Narratives</a></div><div class="lev3 toc-item"><a href="#Plain-text-narratives-with-Markdown" data-toc-modified-id="Plain-text-narratives-with-Markdown-132"><span class="toc-item-num">1.3.2&nbsp;&nbsp;</span>Plain text narratives with Markdown</a></div><div class="lev3 toc-item"><a href="#Math-narratives-with-LaTeX" data-toc-modified-id="Math-narratives-with-LaTeX-133"><span class="toc-item-num">1.3.3&nbsp;&nbsp;</span>Math narratives with $\LaTeX$</a></div><div class="lev3 toc-item"><a href="#Code-narratives" data-toc-modified-id="Code-narratives-134"><span class="toc-item-num">1.3.4&nbsp;&nbsp;</span>Code narratives</a></div><div class="lev2 toc-item"><a href="#Jupyter-Notebooks:-When?" data-toc-modified-id="Jupyter-Notebooks:-When?-14"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Jupyter Notebooks: When?</a></div><div class="lev2 toc-item"><a href="#Resources" data-toc-modified-id="Resources-15"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Resources</a></div>

# WORKSHOP 1 / Jupyter: A flyby introduction

## Jupyter Notebooks: What?

In the last few years, Jupyter Notebooks have become one of several standard tools used in data analysis and data science. In our workshop we will make heavy use of their features, but we will begin with the basics. So what is a Jupyter Notebook?

A Jupyter Notebook can be minimally summarized as:

* an _interactive_, _executable_ **document**, that
* _mixes code_ and _data_ with **narrative** (text, technical notes, etc.), that
* is _sharable_ and _editable_.


Think of your notebook as a way to explore data while sharing the code alongside that exploration, and at the same time having a conversation (through the narrative) about the code, the data and the goal(s) you're trying to accomplish.

Jupyter is much more than described above, and certainly not the only tool out there, but it is one of the best at what it does and is getting better with each new version!

## Jupyter Notebooks: Why?

In the scientific method, we often like to break the process down into the following rough steps:

1. formulate a question (from some general observation(s) / phenomenon)
2. propose a hypothesis that might explain the phenomenon
3. develop a prediction or model
4. gather data and test predictions / model
5. refine, alter, reject, expand hypothesis (and if necessary model)
6. **[iterate between steps 3-5]**
7. develop general theories from the outcomes

The scientific method works best when there is supporting documentation of each of these steps, because when sharing the outcomes of research, we are very much concerned with making sure our work is _repeatable_ and _reproducible_. Work is _repeatable_ when we execute the same methods on the same data and get the same result. It is _reproducible_ when we execute the same methods on new or different data and get results that are consistent with our theory or hypothesis. Both are necessary for good science. 

We know that today much of our science is _data-driven_. We have large data sets with which to explore a hypothesis, and that exploration is often supported by software. There is almost no end in sight to the volume of data being collected, which increases the demand for good software that can be shared.

The power of Jupyter Notebooks lies in their flexibility to support nearly all computational and data-driven phases of the scientific method. We can use the documentation and narrative-building tools of the notebook to document our questions and hypotheses (1 and 2 above). We can even link the documentation of our hypothesis to the original research being referenced or built upon. Using the interactive programming paradigm of Jupyter (alongside the documentation) we can test computations, build predictions and models in code, and even test early versions of those models interactively, perhaps holding on to variations during iterative refinement.

## Jupyter Notebooks: How?

### Narratives
Data-driven science today can be considered a combination of explanatory narrative and computational support for that narrative. With technology like Jupyter, we can marry the two so that they coexist where it (arguably) makes sense for them to.

Jupyter supports narratives by introducing the concept of a "cell", where a cell _is a container for stuff_ ... that stuff being one of the following:

* executing code (in a language like Python),
* plain text and Markdown, 
* math via $\LaTeX$ mixed in with plain text and Markdown, or
* HTML.

What makes Jupyter different from other environments like Matlab is that cells containing executable code can be written in a variety of supported languages. While Python, R and Julia have robust support in Jupyter, other languages such as Java, JavaScript and FORTRAN can also be used in a notebook. Furthermore, cells can support dynamic visualizations in HTML and JavaScript, taking interactivity to another level. These are advanced topics and do require configuration beyond the scope of these workshops, but when it comes to flexibility, Jupyter is almost certainly on your side.
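To make the idea of a cell as a container a little more concrete, here is a minimal sketch of how a notebook might look when written out as data. It assumes the version 4 notebook file layout, and the cell contents are made up for illustration. Note that text, math and HTML all live in Markdown cells, while code lives in code cells.

```python
import json

# A minimal notebook in the (assumed) nbformat 4 layout: a list of cells,
# each tagged with a cell_type. The cell contents are illustrative only.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "## A heading\nSome *narrative* with math: $e^{i\\pi} + 1 = 0$",
        },
        {
            "cell_type": "code",
            "execution_count": None,  # None until the cell has been run
            "metadata": {},
            "outputs": [],
            "source": "print(1 + 1)",
        },
    ],
}

# List the kind of "stuff" each cell holds.
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)

# Serialized as JSON, this is roughly what an .ipynb file contains.
ipynb_text = json.dumps(notebook, indent=1)
```

Because a notebook is just structured text, it is easy to share, to keep under version control, and to process with other tools.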

### Plain text narratives with Markdown

Jupyter cells that are designed for text can use a simple markup notation called [Markdown](https://daringfireball.net/projects/markdown/) and also $\LaTeX$ for mathematical notation. Markdown is much like plain old text with a few simple notations for things like bold, italics, headings and linking.  In general, Markdown is not hard to learn and the more you use it the more natural it becomes.  The motivating philosophy behind it was to strive for readability, simplicity and unobtrusiveness.

We won't spend a lot of time here with Markdown as there are superb tutorials online to get you on your way.
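As a quick taste, the sketch below holds a few lines of Markdown source in a Python string: a heading, bold and italic text, a link and a short list. The text itself is made up for illustration; typing the same lines into a Markdown cell would render them as formatted text.

```python
# Markdown source as it would be typed into a Markdown cell.
markdown_source = """\
# A heading

Some **bold** text, some *italic* text, and a
[link to the Jupyter site](https://jupyter.org).

* a bullet
* another bullet
"""

print(markdown_source)

# Inside a notebook, a code cell can also render the same text
# (this assumes IPython, which is installed alongside Jupyter):
#     from IPython.display import Markdown, display
#     display(Markdown(markdown_source))
```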

### Math narratives with $\LaTeX$
[$\LaTeX$](http://www.latex-project.org) has been used for over three decades to produce highly professional scientific documents and books.  It is likely you own at least one textbook (or set of course notes) that was written using $\LaTeX$, especially if the subject is math, physics or computer science.  One reason you may not have learned to use it, however, is that it generally has a steep learning curve for beginners. While the final output of a $\LaTeX$ document is most certainly going to please, producing your first version may take much longer than it would with Google or Microsoft tools.

Luckily, Jupyter makes this much easier by removing the need for any complex formatting and instead just supporting Markdown, while still supporting the bulk of moderately complex $\LaTeX$ math without much of the fancy typesetting that leads beginners to cut their losses and quit before a final product can be realized.  Breathe easy, this will not be as difficult as it may seem.

If you need to build narratives that are math heavy, Jupyter support for basic mathematical notation is built in to help you build nice looking documents (with _beautifully_ typeset math) that could _theoretically_ be exported to PDF and look almost as nice as a document written in a word processing environment like Google Docs or Microsoft Word (or even full $\LaTeX$).  While admittedly lacking many of the features of a "word processor", for computational narratives the environment is compelling.

Mathematical narratives in Jupyter can be useful in many contexts, for example,

* to explain a theoretical result or show the usage of a specific mathematical technique,
* to provide a mathematical proof of a result that you are deriving,
* to show the mathematical underpinnings of an algorithm, 
* to show the formulae used in a concrete calculation or manipulation of data.


Here are a few  math examples (view the source code for this notebook to see how easily it can be done):

Suppose that,
$$
f'(x) = { \lim_{ \Delta x \rightarrow 0 }  { { f(x + \Delta x) - f(x) } \over \Delta x } }
$$

and maybe
$$
\int x^n dx = { {x^{n+1}} \over {n+1} } + C
$$
therefore we have
$$
\sum_{k=0}^\infty {t^k \over k!} = e^t
$$

and in rare cases
$$
\sum_{\substack{
   0<i<m \\
   0<j<n
  }} 
 P(i,j).
$$

But we must consider the following cases

$$ f(n) =
  \begin{cases}
    n \over 2       & \quad \text{if } n \text{ is even}\\
    -(n+1) \over 2  & \quad \text{if } n \text{ is odd}\\
  \end{cases}
$$

and in cases where we can't, we might use

$$
A_{m,n} = 
 \begin{pmatrix}
  a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
  a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
  \vdots  & \vdots  & \ddots & \vdots  \\
  a_{m,1} & a_{m,2} & \cdots & a_{m,n} 
 \end{pmatrix}.
$$



But alas, we will end with some Greek

$$\alpha, \beta, \gamma, \kappa, \Pi, \Upsilon, \Omega, \ldots$$
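Since a notebook mixes math with code, formulas like the ones above can be checked numerically in a code cell right next to them. The following is a minimal sketch; the function names, the step size `dx` and the number of series terms are illustrative choices, not part of the original examples.

```python
import math

def forward_difference(f, x, dx=1e-6):
    """Approximate f'(x) with the difference quotient from the limit definition."""
    return (f(x + dx) - f(x)) / dx

def exp_partial_sum(t, terms=30):
    """Sum t**k / k! for k = 0 .. terms-1, a truncation of the series for e**t."""
    return sum(t**k / math.factorial(k) for k in range(terms))

def f(n):
    """The piecewise function from the cases example: n/2 if even, -(n+1)/2 if odd."""
    return n // 2 if n % 2 == 0 else -(n + 1) // 2

# The derivative of x**3 at x = 2 is 12; the difference quotient gets close to it.
print(forward_difference(lambda x: x**3, 2.0))

# The truncated series and math.exp should agree to many decimal places.
print(exp_partial_sum(1.5), math.exp(1.5))

# The piecewise function alternates between non-negative and negative integers.
print([f(n) for n in range(6)])
```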

Once you get the hang of developing narratives, you might never look back and wonder how you lived without these tools.  Here are some additional resources for using $\LaTeX$:

* this [Math Wiki](https://en.wikibooks.org/wiki/LaTeX/Mathematics) for $\LaTeX$ has a nice summary of common math patterns,
* the [$\LaTeX$ Cheatsheet]() is an indispensable tool that summarizes many of the macros in a printable two-pager. Bookmark it ... it is your friend for life if you go deeper into using $\LaTeX$!

### Code narratives

The strength of Jupyter becomes apparent when you look at what it can do to help you build interactive documents that support computational work. Let's work through a real example ... where we will:

* get USGS earthquake data,
* explore this data for the highest magnitude earthquakes (let's say the top 10),
* put those earthquakes on an interactive map.

**A Data Exploration of Earthquakes**

Let's say we are interested in programmatically obtaining the largest earthquakes in 2016.

Luckily, the [USGS has an API](https://earthquake.usgs.gov/fdsnws/event/1/) to do most of the heavy lifting for us.  We simply need to create a query to the service that asks for all the earthquakes of magnitude 7 or greater in 2016, and then we'll filter them on some additional magnitude criterion on our own.
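The query itself is just a URL with a handful of parameters. As a sketch of how it could be assembled from its parts rather than typed out by hand, the helper below builds the same URL used in this example; `build_query_url` and `BASE_URL` are hypothetical names introduced here for illustration.

```python
from urllib.parse import urlencode

# Endpoint of the USGS event service used in this example.
BASE_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"

def build_query_url(start, end, min_magnitude, fmt="geojson"):
    """Assemble a query URL from a start date, end date and minimum magnitude."""
    params = {
        "format": fmt,
        "starttime": start,
        "endtime": end,
        "minmagnitude": min_magnitude,
    }
    return BASE_URL + "?" + urlencode(params)

url = build_query_url("2016-01-01", "2017-01-01", 7)
print(url)

# With the requests library, the same dictionary can be passed directly:
#     requests.get(BASE_URL, params=params)
```

Keeping the parameters in a dictionary makes it easy to change the year or the magnitude threshold without editing a long string.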

We'll first make the request, using the [documentation]() as a guide:

In [1]:
import requests
r = requests.get("https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2016-01-01&endtime=2017-01-01&minmagnitude=7")

Let's store the data in a variable named `data`:

In [2]:
data = r.json()

And quickly inspect it (it should be in [GeoJSON](http://geojson.org)) ...

In [3]:
print(data) # yup, looks like GeoJSON

{'type': 'FeatureCollection', 'metadata': {'generated': 1496342529000, 'url': 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2016-01-01&endtime=2017-01-01&minmagnitude=7', 'title': 'USGS Earthquakes', 'status': 200, 'api': '1.5.7', 'count': 16}, 'features': [{'type': 'Feature', 'properties': {'mag': 7.6, 'place': '41km SW of Puerto Quellon, Chile', 'time': 1482675747010, 'updated': 1490309522040, 'tz': -240, 'url': 'https://earthquake.usgs.gov/earthquakes/eventpage/us10007mn3', 'detail': 'https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=us10007mn3&format=geojson', 'felt': 265, 'cdi': 9.1, 'mmi': 7.8, 'alert': 'yellow', 'status': 'reviewed', 'tsunami': 1, 'sig': 1130, 'net': 'us', 'code': '10007mn3', 'ids': ',at00oiqvxe,pt16360050,us10007mn3,', 'sources': ',at,pt,us,', 'types': ',dyfi,finite-fault,general-text,geoserve,impact-link,impact-text,losspager,moment-tensor,origin,phase-data,poster,shakemap,', 'nst': None, 'dmin': 0.355, 'rms': 0.79, 'gap': 2

This data (pretty printed for readability) looks like:
```json
{
    "type": "FeatureCollection",
    "metadata": {
        "generated": 1496339846000,
        "url": "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2016-01-01&endtime=2017-01-01&minmagnitude=7",
        "title": "USGS Earthquakes",
        "status": 200,
        "api": "1.5.7",
        "count": 16
    },
    "features": [
        {
            "type": "Feature",
            "properties": {
                "mag": 7.6,
                "place": "41km SW of Puerto Quellon, Chile",
                "time": 1482675747010,
                "updated": 1490309522040,
                "tz": -240,
                "url": "https://earthquake.usgs.gov/earthquakes/eventpage/us10007mn3",
                "detail": "https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=us10007mn3&format=geojson",
                "felt": 265,
                "cdi": 9.1,
                "mmi": 7.8,
                "alert": "yellow",
                "status": "reviewed",
                "tsunami": 1,
                "sig": 1130,
                "net": "us",
                "code": "10007mn3",
                "ids": ",at00oiqvxe,pt16360050,us10007mn3,",
                "sources": ",at,pt,us,",
                "types": ",dyfi,finite-fault,general-text,geoserve,impact-link,impact-text,losspager,moment-tensor,origin,phase-data,poster,shakemap,",
                "nst": null,
                "dmin": 0.355,
                "rms": 0.79,
                "gap": 29,
                "magType": "mww",
                "type": "earthquake",
                "title": "M 7.6 - 41km SW of Puerto Quellon, Chile"
            },
            "geometry": {
                "type": "Point",
                "coordinates": [
                    -73.9413,
                    -43.4064,
                    38
                ]
            },
            "id": "us10007mn3"
        }, ...
```
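One way to produce a pretty-printed view like this is the standard library's `json` module. The sketch below uses a trimmed-down stand-in for the `data` object (only a few of the values shown above) so that it runs without a network connection.

```python
import json

# A trimmed-down stand-in for the `data` object, using values shown above.
sample = {
    "type": "FeatureCollection",
    "metadata": {"title": "USGS Earthquakes", "status": 200, "count": 16},
    "features": [{"properties": {"mag": 7.6, "nst": None}}],
}

# indent=4 puts each key on its own line, nested by four spaces per level.
pretty = json.dumps(sample, indent=4)
print(pretty)
```

Note that Python's `None` is written as `null` in JSON, which is why the two views of the data differ slightly.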

Without going into tremendous detail here, we would like to iterate over this data and grab the following information:

* the **magnitude** of the earthquake,
* the **USGS location name** of that place (in English), and 
* the **lat/lon coordinates** so we can plot on the map.

**ALL** of this information is in our `data` object, so let's get to it:

In [4]:
for f in data['features']:
    place = f['properties']['place']
    magnitude = f['properties']['mag']
    # GeoJSON coordinates are ordered [longitude, latitude, depth]
    lon, lat, depth = f['geometry']['coordinates']
    
    print("{}\t{}\t{}\t{}".format(place, magnitude, lon, lat))

41km SW of Puerto Quellon, Chile	7.6	-73.9413	-43.4064
54km E of Taron, Papua New Guinea	7.9	153.5216	-4.5049
69km WSW of Kirakira, Solomon Islands	7.8	161.3273	-10.6812
54km NNE of Amberley, New Zealand	7.8	173.054	-42.7373
175km NE of Gisborne, New Zealand	7	179.1461	-37.3586
North of Ascension Island	7.1	-17.8255	-0.0456
South Georgia Island region	7.4	-31.8766	-55.2852
110km E of Ile Hunter, New Caledonia	7.2	173.1167	-22.4765
29km SW of Agrihan, Northern Mariana Islands	7.7	145.5073	18.5429
53km NNE of Visokoi Island, South Georgia and the South Sandwich Islands	7.2	-26.9353	-56.2409
2km N of Norsup, Vanuatu	7	167.3786	-16.0429
27km SSE of Muisne, Ecuador	7.8	-79.9218	0.3819
1km E of Kumamoto-shi, Japan	7	130.7543	32.7906
Southwest of Sumatra, Indonesia	7.8	94.3299	-4.9521
88km N of Yelizovo, Russia	7.2	158.5463	53.9776
86km E of Old Iliamna, Alaska	7.1	-153.4051	59.6363


For the purposes of our demo, we need not store this information in any special way.  We'll now return to the data and our final task of plotting it on an _interactive_ map.  We'll use the library [folium](https://folium.readthedocs.io/en/latest/) to do the trick as easily as possible.

In [5]:
import folium
map_eq = folium.Map(location=[37.733795, -122.446747], zoom_start=2)

In [6]:
for f in data['features']:
    place = f['properties']['place']
    magnitude = f['properties']['mag']
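    # GeoJSON coordinates are ordered [longitude, latitude, depth]
    lon, lat, depth = f['geometry']['coordinates']

    # place the data on the map! (folium expects [latitude, longitude])
    folium.Marker([lat, lon], popup='place:{}\nmagnitude:{}'.format(place, magnitude)).add_to(map_eq)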
map_eq

Cool!
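Two details of this exploration are worth making explicit: the "top 10" ranking mentioned at the start, and the fact that GeoJSON stores positions as longitude first, latitude second, while folium expects marker locations as latitude first. The following is a minimal sketch of both steps; `top_by_magnitude` is a hypothetical helper, and the sample data is a small hand-copied subset of the output above so the block runs offline.

```python
def top_by_magnitude(features, n=10):
    """Return the n largest events as (place, magnitude, lat, lon) tuples."""
    ranked = sorted(features, key=lambda f: f["properties"]["mag"], reverse=True)
    rows = []
    for feature in ranked[:n]:
        # GeoJSON positions are [longitude, latitude, optional third value].
        lon, lat = feature["geometry"]["coordinates"][:2]
        props = feature["properties"]
        rows.append((props["place"], props["mag"], lat, lon))
    return rows

# Three events copied from the output above (depth omitted where not shown).
sample_features = [
    {"properties": {"mag": 7.6, "place": "41km SW of Puerto Quellon, Chile"},
     "geometry": {"coordinates": [-73.9413, -43.4064, 38]}},
    {"properties": {"mag": 7.9, "place": "54km E of Taron, Papua New Guinea"},
     "geometry": {"coordinates": [153.5216, -4.5049]}},
    {"properties": {"mag": 7.0, "place": "1km E of Kumamoto-shi, Japan"},
     "geometry": {"coordinates": [130.7543, 32.7906]}},
]

# Keep only the two largest events and print them in (lat, lon) order.
for place, mag, lat, lon in top_by_magnitude(sample_features, n=2):
    print(place, mag, lat, lon)
```

With the full `data` object, `top_by_magnitude(data['features'])` would return the ten largest events, each ready to be passed to a map marker as `[lat, lon]`.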

## Jupyter Notebooks: When?

Jupyter Notebooks can be used in a variety of contexts, only a few of which are mentioned below:

* documenting homework,
* documenting an algorithm or research methodology,
* writing a paper that is about code or a dataset,
* developing and documenting a research idea that explores a theoretical result backed by an implementation (code),
* exploring and documenting a dataset or an algorithm on a specific dataset,
* testing code,
* playing with code,
* and many other contexts.


## Resources
The best place to begin learning how to use Jupyter is the [Jupyter.org](https://jupyter.org) site. For inspiration, see also:

* the [gallery of Jupyter examples](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks),
* the [Earth Sciences](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks#earth-science-and-geo-spatial-data-1) notebook examples,
* another [general gallery](http://nb.bianp.net/sort/views/),
* and the many, many more you may find and explore on your own ...