In [1]:
from IPython.display import Image

In [1]:
from IPython.display import HTML

def set_css_style(css_file_path):
    """
    Read the custom CSS file and load it into Jupyter.
    Pass the file path to the CSS file.
    """

    # Use a context manager so the file is closed after reading.
    with open(css_file_path, "r") as css_file:
        styles = css_file.read()
    return HTML(styles)

set_css_style('styles/custom.css')

# Preliminaries
- Why Python?
 
- Python covered 
 
- Functionality Covered 
 
- Introduction to Jupyter


### Why Python - 1

- Python is a general-purpose programming language and a popular alternative for data analysis
  - Can be used to analyze data or to write a business application (Dropbox) or a full-fledged web application (Instagram).
    
    
- A _de facto_ favorite for working with complex data, where the majority of the time is spent preparing the data for analysis.


 - Has access to a vibrant ecosystem of statistical tools and libraries (NTS fraction of the data analysis time).
 
 
- An easy-to-learn language that provides a high level of abstraction without sacrificing functionality and robustness.


### Why Python - 2

- Batteries included 

|Domain | Python Library |
|:-------|:----------------:| 
| Astronomy | `Astropy` |
| Molecular biology |  `BioPython` | 
| Neuroscience  | `PsychoPy` |
| Seismology | `ObsPy` | 
| Quantitative finance |  `Zipline` | 
| Statistics | `Statsmodels` |
| Language and topic <br/>modeling | `Gensim` | 
|_etc_. | ... |
  
  
- A really thriving community with a rich ecosystem of contributions

### Necessary "Data Analysis" Functionality in a Programming Language?

- Retrieve and Write data`.`
 - Understand the different standard file formats such as Excel, `tsv`, `csv`, `R`, _etc_`.`
 
 
- Built-in data types and collections to store data during computation`.`
  - For example, tables or matrices`.`
  - Efficient handling of those structures to extract a subset, merge or concatenate datasets, _etc_`.`
  
  
- Conditionals and iteration`.`
  - Repeat logic on a subset of, or on all, entries in a data set`.`
  
  
- Mathematical functionality`.`
  - Compute means, rolling-window averages, _etc_`.`
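
As a rough illustration of the last two items, the sketch below uses only the Python standard library. The `temperatures` values are made up for the example, and the workshop itself relies on pandas for this kind of work.

```python
from statistics import mean

# Hypothetical measurements, invented for illustration only.
temperatures = [24.1, 25.3, 26.0, 24.8, 27.2, 26.5]

# Conditional + iteration: keep only the readings above 25 degrees.
warm = [t for t in temperatures if t > 25]

# Mathematical functionality: overall mean and a 3-value rolling mean.
overall_mean = mean(temperatures)
window = 3
rolling_means = [
    mean(temperatures[i:i + window])
    for i in range(len(temperatures) - window + 1)
]

print(warm)
print(round(overall_mean, 2))
print([round(m, 2) for m in rolling_means])
```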
 

### General Functionality in a Programming Language?

- Features that facilitate network communication`.` 

- Ability to write user interfaces`.` 

- Software engineering concepts such as classes, inheritance, _etc_`.`

- Functionality for parallel processing`.`

- Those general concepts, although useful, are not necessary for data analysis`.`

### Python Covered in This Workshop

- Python as a tool for Data Analysis`.`
  - We will not cover the functionality required to develop a full-fledged program; instead, we will focus on the functionality used when working with data`.`



- We will principally use `pandas` and will explore `matplotlib` and `seaborn` for visualizing data`.`
 - Think of pandas as Excel on steroids`.`


### Python Covered in This Workshop - Cont'd 

- Preliminaries`.`
- Introduction to Python`.`
- Introduction to Pandas`.`
- Exploring Data in pandas`.`
- Common DataFrame operations and missing data`.`
- DataFrame Groups`.`
- Combining and stacking Data in Pandas`.`
- Plotting and viz - 1`.`
- Plotting and viz - 2`.`
 

### About the Workshop

- Interactive`.`


- Short modules (~30-45 minutes each)`.`


- Presentation slides based on the Jupyter notebooks`.`
  - The Jupyter notebooks contain additional information for future reference`.`


- Practical Sessions at the end of each module (~20-25 minutes)`.`


- Last half-day reserved for`:`
  1. Catching up on previous practicals`.`
  2. Working on a full dataset provided by the instructors`.`
  3. Working on your own dataset.
    - Let us know today`!`
    

### Python versions

- There are two currently used versions of Python (Python 2  and Python 3)`.`


- Python code written for one version is not guaranteed to work in the other: some functionality introduced in version 3 is not available in version 2, and some constructs from version 2 are no longer valid syntax in version 3`.`
    - Converting Python 2 code to Python 3 usually requires only minor changes`.`


- Python 2 (most recent version is 2.7) will not be supported past 2020`.`


- We will be using Python 3 in this workshop`.`
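
Two well-known differences illustrate the point. The snippet below is a small sketch written for Python 3, with comments noting how Python 2 behaved.

```python
# print is a function in Python 3; the Python 2 statement form
# `print "hello"` is a SyntaxError in Python 3.
print("hello")

# Dividing two integers with / gives a float in Python 3
# (in Python 2 the same expression gave the integer 3).
true_division = 7 / 2

# Floor division with // is available in both versions.
floor_division = 7 // 2

print(true_division, floor_division)
```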

### Running (interpreting) Python Code

- A series of Python statements (commands) is bundled into a script (program) that, by convention, has the ".py" extension`.`
   - For example, you could bundle your analysis into a program called `compute_co-prescribed_drugs.py`.


- Python is an interpreted language`.`
  - Statements are executed successively`.`



- A Python program is run by executing it with the Python interpreter:

```bash
    python plot_distribution.py
```
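
For illustration, a script is just a text file containing statements that run from top to bottom. The sketch below shows what a small script might contain; the file name `mean_temperature.py` and the values in it are hypothetical and not part of the workshop material.

```python
# Contents of a hypothetical script, e.g. mean_temperature.py
# Run it from a terminal with: python mean_temperature.py

readings = [21.5, 22.0, 23.4]      # made-up example values

total = sum(readings)              # this statement runs first
average = total / len(readings)    # then this one

print("Mean temperature:", round(average, 1))
```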

### Working in a Read–Eval–Print Loop Environment 

- In scientific (or engineering) computing, it’s common to use the Python shell in a process referred to as a Read–Eval–Print Loop (__REPL__)`.`



- In short, you type a command, the command is evaluated, the result is printed, and you repeat the process (loop) until you complete your task`.`



- The   $~~>>>$ symbols are used to prefix the code that we type into the interpreter
  - The commands we type are called __statements__`.`
  - So the statements that follow the $~~>>>$ are what is executed
 
 
- Outputs are printed without any prefix`.`




### The interpreter 
When it is not already running a command, the Python interpreter is ready to receive a new command.

```python
>>>
```

### A First Command

```python
>>> 2+4
6

>>>
```

### More Commands

```python
>>> 2+4
6

>>> 39/4 
9.75

```

- _etc_.
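
One practical difference is worth noting: the interpreter echoes the value of an expression automatically, whereas the same line in a `.py` script produces no visible output unless it is printed. A minimal sketch:

```python
# In the REPL, typing 39/4 displays 9.75 right away.
# In a script, the bare expression is evaluated but nothing is shown:
39 / 4

# To see the value when running a script, print it explicitly.
result = 39 / 4
print(result)
```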

### What are Jupyter Notebooks

- A web-based environment that is used for programming and documenting code`.`
  - Includes pretty-printed code and output, formatted text, interactive elements, _etc_., in a single document`.`
    - Similar to `Matlab`, `Mathematica`, `SAGE`, and other notebooks`.`



- In addition to Python, Jupyter notebooks support other languages such as R, Scala, _etc_.

### Jupyter Notebook Screenshot
![](images/example_chunk_jupyyter.png)

### Why Use Jupyter Notebooks?

- Create readable analyses of interactive code, which is notoriously difficult to document`.`



- Facilitate collaboration as notebooks can be shared in a variety of formats with people with different levels of expertise`.` 
  - Notebooks can be converted into `pdf` documents, slides, shareable `HTML` pages, _etc_.
    - These slides are automatically generated from the workshop's Python Notebook`.`


- Code and ideas can be well illustrated with text and, if needed, rich media`.`


- Ideal for teaching and explaining complex ideas`.`
  - Breaking down concepts into small, well-documented chunks of code`.`

### Jupyter Notebooks
![](./images/Jupyter_Nature.png)

### Running Jupyter notebook

- After installing either Anaconda or Miniconda as described on the workshop page, launch a Jupyter notebook by typing this command from the terminal`:`

```bash
jupyter notebook
```

- The notebook should open automatically in your browser`.`

  - If it does not or if you wish to use a different browser, open a page and browse to the URL`:`
  
              http://localhost:8888
  

### Jupyter Notebooks Tree (Dashboard)
![](./images/jupyter_tree.png)

### Jupyter Notebook Behind the Scenes

- After you type the command `jupyter notebook`, the following happens:


  - A Jupyter Notebook server is automatically created on your local machine`.`
    - The Jupyter Notebook server runs only on your machine and does not require an internet connection`.`

  -  The Jupyter Notebook server opens the Jupyter notebook client, also known as the notebook user interface, in your default web browser`.`



### Create a new notebook 

- The dashboard shows the files and folders in the directory from which it was started`.`

- To create a new Python notebook select the “New” dropdown on the upper right of the screen`.`

![Jupyter Images](images/jupyter_new_notebook.png "Title")


### Components of a notebook

- A notebook is comprised of one or more cells`.`

  - Each cell can contain Python statements (__Code__) or text (__Markdown__)`.`

    - Markdown is a syntax that facilitates formatting text`.`
    - See [here](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) for a quick overview of the syntax`.` 

    ![](images/jupyter_cell.png)


### Executing  a cell

- You can run a cell (render markdown or execute code) by pressing `SHIFT+ENTER`

![Jupyter Images](images/jupyter_example.png)


### Jupyter Notebook File

- You can click/edit the temporary title (`Untitled`) to change the file name`.`
- The notebook files have the `ipynb` extension`.`
- The files are stored in a format called `JSON`.
  - They are not meant to be read or edited by hand. Therefore, you will need to open them in Jupyter to edit or run them`.`
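
To make the JSON point concrete, the sketch below parses a hand-written, heavily simplified notebook document with Python's standard `json` module. The key names (`cells`, `cell_type`, `source`) follow the version 4 notebook format; real files carry more metadata, and the cell contents here are invented for illustration.

```python
import json

# A tiny, hand-written stand-in for the contents of an .ipynb file.
notebook_text = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["### My analysis"]},
    {"cell_type": "code", "metadata": {}, "execution_count": null,
     "outputs": [], "source": ["2 + 4"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 2
}
"""

notebook = json.loads(notebook_text)

# List the type and the text of every cell.
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
for cell in notebook["cells"]:
    print(cell["cell_type"], "->", "".join(cell["source"]))
```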


### Practical 1: Getting Familiar with the Jupyter notebook.

`1.` Create a `Python 3` notebook.

  - Open a Terminal and type the following command:

```bash
     jupyter notebook
 ```
  - Create a new Python 3 file. By default, the file is created in the folder from which the notebook was started.

  - Name the file `Test_notebook` 

`2.` Change the type of the first cell to Markdown, paste the code below into it and render it by pressing __`SHIFT+ENTER`__

```
### Analysis of Experimental Data From Kaneohe Bay

This notebook plots the distribution of temperatures in `Celsius` measured at Kaneohe Bay (decimal degrees: 21.4293766,-157.7912551), over 132 consecutive days.


![Jupyter Images](images/map.png)


The data was collected by John Doe on 01/01/2017.
This notebook was created by Jane Doe on 04/01/2017.

```

`3.` Create a Code cell and paste the following code into it and execute by pressing __`SHIFT+ENTER`__

```python
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

fig, axs = plt.subplots(1,2, figsize=(15, 5))
temperatures = pd.read_csv("https://www.dropbox.com/s/920mlvt77nh3z23/experiment_john_doe.csv?dl=1")
temperatures.plot(ax=axs[0])
temperatures.plot(kind='kde', ax=axs[1])
```

`4.` Close your notebook browser page and stop the server that you started in the terminal window by typing __`CTRL+C`__.

`5.` Navigate to the folder where you created the `ipython` notebook file. Open the file using the following command:

```bash 
jupyter notebook Test_notebook.ipynb
```

`6.` It is customary to have the imports and configuration statements -- those are the first three lines in the Code cell above -- in a separate cell. In what follows, you will move the imports and configuration statements to a new `Code` cell, which you will place at the top of the notebook.


- Create a new Code cell and move into it the lines starting with `import` and `%matplotlib` from the old Code cell.

- Click on the new cell and use the up arrow in the toolbar to move it to the top of your notebook.

- Re-run all the cells to make sure everything works.
- Note that you can run each cell individually using __`SHIFT+ENTER`__, or run all of them at once by choosing `Cell -> Run All` from the top menu.


`7.` The final notebook should look as in the image below.

![](images/practical_1_output.png)