<h1><center> PPOL564 - Data Science I: Foundations </center></h1>
<h3><center> Lecture 3 <br><br><font color='grey'>Using Jupyter Notebooks</font></center></h3>

## Plan for Today:

- Ways to use/interact with **Python**
- **Kernels**
- **Using notebooks**
- **Magic Commands**
- Begin delving into object-oriented programming with Python.

# Interacting with Python

There are a number of ways we can evaluate and read Python code.

**Interactively**

- From the shell using the **REPL** ("Read, Evaluate, Print, Loop")
    - Fire up a Python kernel from your terminal
    - Alternatively, we can use an `ipython` setup, which underpins the Jupyter Notebook (inputs/outputs) and provides a sleeker user interface.
    
    
- From a **Jupyter Notebook**

> Interactive programming allows you to dynamically explore code and to make sure your code is doing what you need it to do. It offers a fantastic way to learn the programming environment and to problem solve. Moreover, as data scientists we encounter data structures of all shapes and sizes, and there is not always a "one size fits all" solution. Interactively probing the data in a setup like a Jupyter Notebook offers us both the flexibility to explore and a way of recording a narrative so that we can remember how we got there.

**Script**

- From the shell, by running a `.py` script

> The advantage of a full script is that we can build larger programs to process our data and then execute those programs from the shell. This lets us streamline data processing and analysis.

**Somewhere _In-Between_**

- We can leverage both the power of a Jupyter Kernel and the ease of working in a script using `Atom` with the `Hydrogen` extension. 
- Likewise, `Spyder` is a GUI that provides an "RStudio-like" experience when programming in Python.

# What is a Notebook and why use it?

The Jupyter Notebook is an open-source web application that allows you to **create and share documents** that contain 

- live code, 
- equations, 
- visualizations and 
- narrative text. 

Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. 

**Pros**:

- Notebooks are **ubiquitous**.
- Reproducible: they make it easy to transmit and convey results.
- We can **build code interactively** (like we do in `R`). This makes Jupyter notebooks particularly friendly when you're first learning Python.
    - This also makes `Atom` + `Hydrogen` and `Spyder` equally useful.
- Stable.

**Cons**:

- _Non-linear_: cells can be run out of sequence, e.g. we may write a code dependency _after_ the cell that first needs it.
- Spinning up a notebook takes a few steps.


### `.ipynb` is really a `JSON` file
At its core, a Jupyter notebook is a [JSON (JavaScript Object Notation) file](https://en.wikipedia.org/wiki/JSON). Let's see what the notebook we are currently using looks like:

In [1]:
!cat lecture_03-using-jupyter-notebooks.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<h1><center> PPOLS564: Foundations of Data Science </center><h1>\n",
    "<h3><center> Lecture 3 <br><br><font color='grey'>Using Jupyter Notebooks</font></center></h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plan for Today:\n",
    "\n",
    "- Ways to use/interact with **Python**\n",
    "- **Kernels**\n",
    "- **Using notebooks**\n",
    "- **Magic Commands**\n",
    "- Beginning delving into object-oriented programming with Python."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Interacting with Python\n",
    "\n",
    "There are a number of ways we can evaluate and read Python code.\n",
    "\n",
    "**Interactively**\n",
    "\n",
    "- From the shell using the **REPL** (\"Read, Evaluate, Print, Loop\")\n",
    "    - F
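Because the file is plain JSON, the standard library's `json` module can read it. The sketch below parses a tiny, hand-written notebook (its contents are hypothetical and trimmed to a few of the nbformat 4 fields visible in the output above) and counts the cells by type. It is an illustration of the file format, not how Jupyter itself loads notebooks.

```python
import json
from collections import Counter

# A tiny, hypothetical notebook written by hand in the nbformat 4 layout.
# The "\\n" becomes "\n" inside the JSON text, i.e. a newline in the cell source.
RAW_NOTEBOOK = """
{
 "cells": [
  {"cell_type": "markdown", "metadata": {}, "source": ["# Title\\n", "Some prose."]},
  {"cell_type": "code", "metadata": {}, "execution_count": 1,
   "outputs": [], "source": ["x = 1 + 1"]}
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 2
}
"""


def count_cell_types(notebook_text):
    """Parse notebook JSON text and count the cells of each type."""
    nb = json.loads(notebook_text)  # the whole notebook is a single JSON object
    return Counter(cell["cell_type"] for cell in nb["cells"])


counts = count_cell_types(RAW_NOTEBOOK)
print(counts)  # Counter({'markdown': 1, 'code': 1})

# Each cell's "source" is a list of strings; joining them recovers the text
first_cell = json.loads(RAW_NOTEBOOK)["cells"][0]
print("".join(first_cell["source"]))
```

To inspect a real notebook instead, reading the file first (e.g. `with open("lecture_03-using-jupyter-notebooks.ipynb") as f: count_cell_types(f.read())`) should work, assuming the file sits in the current working directory.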

# Initializing a Notebook

There are two primary methods for initializing a notebook. 

1. **Via the command line**
    - Go into the working directory containing your `.ipynb` notebook. 
        - e.g. `cd /Users/me/Desktop/`
    - Type `jupyter notebook`.
    - The web application will open in your default browser. 
    - From there, click on the notebook to "spin it up". The notebook will then be "running". 
    - We can close the notebook by clicking the `Quit` and `Logout` buttons on the page. 
        - `Quit` == shut down the local server (i.e. the web application connection)
        - `Logout` == log out of the web application's home page (but keep the server running)
    - We can also shut down the server by pressing `Control-C` in the console. 
    - If we accidentally close the notebook, we can reconnect to the running server using the local URL printed when the server first starts.


2. **Via the Anaconda Navigator** (requires installing an [Anaconda distribution](https://www.anaconda.com/distribution/))

    - Click on the Anaconda icon.
    - Click "Launch" on the Jupyter Notebook icon.
    - The web application will immediately fire up (also yielding a console panel much like what we saw with the command-line approach). 
    - One issue is that your working directory (i.e. where the notebook thinks you are on your computer) will be wherever Anaconda is stored (for me, it's at the very top of my file directory). Housing your projects there is suboptimal for a whole range of reasons, so we'll need to **_change the working directory_** to the location we actually want. 
        - One benefit of spinning up a Jupyter notebook via the command line is that your working directory will always be where you initialized the notebook. 
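The working directory can also be checked and changed from Python itself with the standard library's `os` module. A minimal sketch, assuming a temporary folder (created with `tempfile`) as a stand-in for your real project folder:

```python
import os
import tempfile

# Where does the kernel currently think we are?
start_dir = os.getcwd()
print("Started in:", start_dir)

# A temporary folder stands in for a real project folder
# (hypothetically something like /Users/me/Desktop/my_project).
project_dir = tempfile.mkdtemp()
os.chdir(project_dir)
moved_to = os.getcwd()
print("Now in:", moved_to)

# Go back to where we started
os.chdir(start_dir)
print("Back in:", os.getcwd())
```

In practice you would pass your project's path to `os.chdir`; the change only lasts for the current kernel session.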

---

# Kernels

A kernel is a computational engine that executes the code contained in a notebook document. A cell (or "Chunk") is a container for text to be displayed in the notebook or code to be executed by the notebook's kernel.

Though a notebook runs only one kernel at a time (we can't switch between kernels in the middle of a notebook), Jupyter works with many languages beyond Python. Here is a [list of all the kernels](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels) that you can use with a Jupyter notebook. For example, we can easily employ an [R kernel in a Jupyter notebook](https://irkernel.github.io/). Multi-language support was the notebook's original intent: "Jupyter" is a loose acronym for Julia, Python, and R.

---

# Usage


## Code Chunks

Code chunks are what we use to execute Python code (or code for whatever kernel we have running). We can also write prose in a chunk by changing its cell type from code to Markdown (recorded in the cell's `cell_type` field in the notebook's JSON).

There are **two states** of a code chunk:

- **Edit Mode**: Edit mode is indicated by a <font color ="green">green cell border</font> and a prompt showing in the editor area. When a cell is in edit mode, you can type into the cell, like a normal text editor. Enter edit mode by pressing Enter or using the mouse to click on a cell's editor area.


- **Command Mode**: Command mode is indicated by a grey cell border with a <font color = "blue">blue left margin</font>. When you are in command mode, you are able to edit the notebook as a whole, but not type into individual cells. Most importantly, in command mode, the keyboard is mapped to a set of shortcuts that let you perform notebook and cell actions efficiently. For example, if you are in command mode and you press c, you will copy the current cell - no modifier is needed. Don't try to type into a cell in command mode. Enter command mode by pressing `Esc` or using the mouse to click outside a cell's editor area.

We can **switch between Markdown and Code chunks** either 

- By using the **_drop down menu_** in the tool bar (in either mode)


- By using the **_shortcut_**:
    - Press `y` when on the cell in Command Mode to switch to a code chunk.
    - Press `m` when on the cell in Command Mode to switch to a markdown chunk

## Executing Code 
A code chunk always reflects the behavior of the kernel you're using (e.g. a Python code chunk follows Python syntax). 

**Best Practices**

- Break code chunks up! 
- Every code chunk should render some output (the aim is to be able to read what we were doing without needing to fire the notebook back up)
- Use spaces. Keep the chunk readable. Less is more.

## Using Markdown
Markdown chunks use [Markdown](https://www.markdownguide.org/) syntax and also support mathematical equations written in LaTeX. 

# Header 1
## Header 2
### Header 3
#### Header 4

Bullet points 

- _Italics_ or *Italics*
- **Bold**
- **_Both_**

Enumerated Lists
 
1. _Italics_ or *Italics*
2. **Bold**
3. **_Both_**

We can <u>underline</u> using HTML tags. Likewise, we can change <font color ="darkred">font colors</font>.

<center> center </center>

And include [hyperlinks](https://en.wikipedia.org/wiki/Grape)

Write math inline with single dollar signs (`$...$`). For example, $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$

Or stand alone, with double dollar signs (`$$...$$`):

$$\Pr(y_i=1) = \frac{1}{1+e^{-(\beta_0 + \beta_1 x_i)}}$$

We can embed images. 

![](https://ichef.bbci.co.uk/news/624/cpsprodpb/3B83/production/_108753251_trudeau-crop-3.jpg)

And videos!

In [2]:
%%HTML
<iframe width="900" height="300" frameborder="0" src="https://www.bbc.com/news/av/embed/p07n10lt/49656611"></iframe>

## Using the Shell (Command Line)

As we saw in lab, we can use the command line directly inside a notebook by preceding shell code with a `!`. This lets us streamline our coding process. 

In [3]:
!pwd # Current working directory

/Users/ericdunford/Dropbox/Georgetown/Courses/PPOL564-Foundations/lectures/lecture_03


In [4]:
!git status #checking our git status (anything we need to commit?)

On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   ../lecture_02/lecture_02_version-control.ipynb

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	[31m./[m

no changes added to commit (use "git add" and/or "git commit -a")


In [5]:
!ls #list off all the files in the current working directory.

lecture_03-using-jupyter-notebooks.ipynb
my_fib_func.py
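The `!` prefix only works inside IPython/Jupyter. In a plain `.py` script, a rough equivalent uses the standard library: `os` for the working directory and its contents, and `subprocess` to hand a command string to the shell. This is a sketch; `echo hello` is just a placeholder command.

```python
import os
import subprocess

# Rough stand-ins for `!pwd` and `!ls`
print(os.getcwd())              # like !pwd
print(sorted(os.listdir(".")))  # like !ls (but also lists hidden files)

# Hand a command string to the system shell, as `!` does, and capture its output.
result = subprocess.run("echo hello", shell=True, capture_output=True, text=True)
print(result.stdout.strip())    # hello
```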


We can also use the `%%bash` cell magic to run an entire cell as a shell script.

In [6]:
%%bash
for i in {1..10}
do
    echo $i
done

1
2
3
4
5
6
7
8
9
10


## Shortcuts

As with most user interfaces, Jupyter Notebooks have their own way of doing things, including a number of keyboard shortcuts for common tasks. 

We can view the full list of keyboard shortcuts by pressing `h` in Command Mode, or open the searchable command palette by pressing `p` (or by clicking the keyboard icon in the toolbar).

Important ones while in Command Mode:

- `a`: create a new code chunk _above_ the current one.
- `b`: create a new code chunk _below_ the current one.
- `ii`: interrupt the kernel (really useful when some code is running too long or you've accidentally started an infinite loop!)
- `y`: code mode
- `m`: markdown mode
- `shift` + `m`: merge cells (when more than one cell is highlighted)

Important ones while in Edit Mode:

- `ctrl` + `shift` + `minus`: split the cell at the cursor

---

# Magic Commands

Magic commands are special IPython commands prefixed by the `%` character. They are designed to succinctly solve various common problems in standard data analysis. 

Magic commands come in two flavors: 

- **line magics**, which are denoted by a _single_ `%` prefix and operate on a single line of input, 
- **cell magics**, which are denoted by a _double_ `%%` prefix and operate on multiple lines of input. 

To list all available magic commands:

In [7]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%python  %%python

Or consult the quick reference sheet for all available magic commands:

In [8]:
%quickref

## Useful Magic

Here are some useful magic commands that come in handy as you're working with code.

### Bookmarking
"Come back here later"

In [9]:
%bookmark Home

We'll use this bookmark below to jump back here.

### Changing working directories 

In [10]:
%cd ~/Desktop

/Users/ericdunford/Desktop


In [11]:
%pwd

'/Users/ericdunford/Desktop'

Using the bookmark to return to where we were...

In [12]:
%cd -b Home

(bookmark:Home) -> /Users/ericdunford/Dropbox/Georgetown/Courses/PPOL564-Foundations/lectures/lecture_03
/Users/ericdunford/Dropbox/Georgetown/Courses/PPOL564-Foundations/lectures/lecture_03


In [13]:
%pwd

'/Users/ericdunford/Dropbox/Georgetown/Courses/PPOL564-Foundations/lectures/lecture_03'

### Writing code to files

`%%writefile` is extremely useful when we develop some functionality that we'd like to reuse later on.

In [14]:
%%writefile my_fib_func.py
def fib(n):
    '''Fibonacci Sequence'''
    x = [0]*n
    for i in range(n):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i-2] + x[i-1]
    return x

Overwriting my_fib_func.py
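`%%writefile` is notebook-only. A rough plain-Python analogue writes the source text with `pathlib`. The sketch below assumes a temporary directory as the destination so that it doesn't overwrite the `my_fib_func.py` in your working directory:

```python
import tempfile
import textwrap
from pathlib import Path

# The same fib() source, held as a string (dedent strips the common indent)
FIB_SOURCE = textwrap.dedent('''\
    def fib(n):
        """Fibonacci Sequence"""
        x = [0]*n
        for i in range(n):
            if i == 0:
                x[i] = 0
            elif i == 1:
                x[i] = 1
            else:
                x[i] = x[i-2] + x[i-1]
        return x
''')

# Write into a throwaway directory so nothing real gets overwritten
target = Path(tempfile.mkdtemp()) / "my_fib_func.py"
target.write_text(FIB_SOURCE)
print(target.read_text())
```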


In [15]:
%ls # list files ( see our function)

lecture_03-using-jupyter-notebooks.ipynb
my_fib_func.py


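### Reading in files

`%load my_fib_func.py` replaces the contents of a cell with the code in the file and comments out the `%load` line itself, which is why the cell below begins with `# %load my_fib_func.py`.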

In [None]:
# %load my_fib_func.py
def fib(n):
    '''Fibonacci Sequence'''
    x = [0]*n
    for i in range(n):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i-2] + x[i-1]
    return x

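### Run an external file as a program

`%run` executes the file and then makes the names it defines (here, `fib`) available in the notebook's namespace.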

In [17]:
%run my_fib_func.py
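`%run` is likewise an IPython feature. A comparable standard-library tool is `runpy.run_path`, which executes a file and returns its global namespace as a dictionary (rather than injecting the names into the notebook, as `%run` does). A sketch, assuming a temporary file holding a compact version of `fib`:

```python
import runpy
import tempfile
from pathlib import Path

# Write a small script to a temporary file (a stand-in for my_fib_func.py)
script = Path(tempfile.mkdtemp()) / "my_fib_func.py"
script.write_text(
    "def fib(n):\n"
    "    x = [0] * n\n"
    "    for i in range(n):\n"
    "        x[i] = i if i < 2 else x[i-2] + x[i-1]\n"
    "    return x\n"
)

# Execute the file and pull out a name it defined
namespace = runpy.run_path(str(script))
fib = namespace["fib"]
print(fib(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```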

### Timing Code

How fast does what we wrote run?

In [18]:
%time fib(10)

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 6.2 µs


[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

How long do many runs take on average (a statistical sample)?

In [19]:
%%timeit
fib(10)

2.98 µs ± 92.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
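The standard library's `time` and `timeit` modules offer similar measurements outside a notebook. This is a sketch: `time.perf_counter` gives wall-clock time only (unlike `%time`, which also reports CPU time), and the loop count of 10,000 is an assumption, whereas `%%timeit` picks the count automatically.

```python
import statistics
import time
import timeit


def fib(n):
    """Fibonacci Sequence (same logic as my_fib_func.py, written compactly)."""
    x = [0] * n
    for i in range(n):
        x[i] = i if i < 2 else x[i-2] + x[i-1]
    return x


# Like %time: time a single call with a high-resolution clock
start = time.perf_counter()
result = fib(10)
elapsed = time.perf_counter() - start
print(f"Wall time: {elapsed * 1e6:.1f} µs -> {result}")

# Like %%timeit: 7 runs, each made of a fixed number of loops
number = 10_000
runs = timeit.repeat(lambda: fib(10), repeat=7, number=number)
per_loop = [t / number for t in runs]
print(f"{statistics.mean(per_loop) * 1e6:.2f} µs ± "
      f"{statistics.stdev(per_loop) * 1e9:.1f} ns per loop "
      f"(mean ± std. dev. of 7 runs, {number:,} loops each)")
```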


### Look up object names in the name space

In [20]:
main_dat = [1,2,3,4]
main_key = ["a","b"]
x = 5
y = 6

In [21]:
%psearch main*
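`%psearch` lists the names in the namespace that match a wildcard pattern. A rough plain-Python analogue (an approximation, not how IPython implements it) filters the names in `globals()` with the standard library's `fnmatch`:

```python
import fnmatch

main_dat = [1, 2, 3, 4]
main_key = ["a", "b"]
x = 5
y = 6

# Names in the current (global) namespace that match the shell-style pattern "main*"
matches = sorted(fnmatch.filter(list(globals()), "main*"))
print(matches)  # ['main_dat', 'main_key']
```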

### Debugging

Whenever you encounter an error or exception, just open a new notebook cell, type `%debug`, and run the cell. This opens an interactive debugger where you can test your code and inspect all variables right up to the line that threw the error. Type `n` and hit Enter to run the next line of code (the `->` arrow shows your current position). Use `c` to continue until the next breakpoint. `q` quits the debugger and code execution.

### Asking for help

Appending `?` to a magic command (or to any object) displays its documentation.

In [22]:
%%timeit?
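Outside a notebook, the standard library's `inspect` module retrieves much of what `?` displays, namely the call signature and the docstring (the built-in `help()` prints a fuller page). A sketch using a compact `fib` defined here so the block runs on its own:

```python
import inspect


def fib(n):
    '''Fibonacci Sequence'''
    x = [0] * n
    for i in range(n):
        x[i] = i if i < 2 else x[i-2] + x[i-1]
    return x


# Roughly what `fib?` shows in a notebook: the signature and the docstring
print(inspect.signature(fib))  # (n)
print(inspect.getdoc(fib))     # Fibonacci Sequence
```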

---

# Notebook Extensions

We can expand the functionality of Jupyter notebooks through extensions, which add new features that customize the notebook's user experience. For example, there are extensions for spell checking, a table of contents to ease navigation, running code in parallel, and viewing differences between notebooks under version control.

The Python package used to install notebook extensions is available here: https://github.com/ipython-contrib/jupyter_contrib_nbextensions


Using `pip` (Python's package manager, which installs from PyPI):
```
pip install jupyter_nbextensions_configurator jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextensions_configurator enable --user
```

Using `conda` (Anaconda's package manager):
```
conda install -c conda-forge jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextensions_configurator enable --user
```

Extensions are most easily enabled from the `Nbextensions` tab on the home screen that opens when you first launch Jupyter.


## Useful Extensions

- **Collapsible headings**: allows you to collapse some parts of the notebooks.
- **Notify**: sends a notification when the notebook becomes idle (for long running tasks)
- **Code folding**: folds functions, loops, and other indented code blocks (keeps things tidy)
- **nbdime**: provides tools for diffing and merging Jupyter Notebooks with git.
    - Requires installation: `pip install nbdime`

---

# Being Pythonic

- Whitespace is significant
    - Indentation demarcates code blocks. 
    - Four spaces == one level of indentation (PEP 8)
- Everything is an object
- The aim is readable code
- Updates to Python are recorded in [Python Enhancement Proposals](https://www.python.org/dev/peps/) (or PEPs)
    - When there is a change to Python, it is recorded here
    - The Python "philosophy" also lives here in its suggestions (e.g. PEP 8 on spacing)
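A quick sketch of two of these ideas: indentation marks the body of a function, and integers, strings, lists, and even functions are all instances of `object`, so a function can be stored in a dictionary and called like any other value. The names `add` and `ops` are hypothetical, chosen for illustration:

```python
def add(a, b):
    """Return the sum of a and b."""
    return a + b  # the indented line is the function's code block


# Integers, strings, lists, and functions are all objects
for thing in [5, "five", [5], add]:
    print(type(thing).__name__, isinstance(thing, object))

# ...so a function can be stored and called like any other value
ops = {"plus": add}
print(ops["plus"](2, 3))  # 5
print(add.__name__)       # add
```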

In [23]:
# Python has a sort of philosophy to it. 
import this 

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### Standard Library

Python comes with an extensive [standard library](https://docs.python.org/3/library/) and built-in functions.

In [24]:
import math
math.log(100)

4.605170185988092

In [25]:
import re
my_string = "this is a dog"
re.sub("this","That",my_string)

'That is a dog'

In [26]:
import random
random.randint(1, 10)

10

## Importing Modules

Excerpt from a [Real Python post](https://realpython.com/python-modules-packages/)

Modular programming refers to the process of breaking a large, unwieldy programming task into separate, smaller, more manageable subtasks or modules. Individual modules can then be cobbled together like building blocks to create a larger application.

There are several advantages to modularizing code in a large application:

- **Simplicity**: Rather than focusing on the entire problem at hand, a module typically focuses on one relatively small portion of the problem. If you’re working on a single module, you’ll have a smaller problem domain to wrap your head around. This makes development easier and less error-prone.

- **Maintainability**: Modules are typically designed so that they enforce logical boundaries between different problem domains. If modules are written in a way that minimizes interdependency, there is decreased likelihood that modifications to a single module will have an impact on other parts of the program. (You may even be able to make changes to a module without having any knowledge of the application outside that module.) This makes it more viable for a team of many programmers to work collaboratively on a large application.

- **Reusability**: Functionality defined in a single module can be easily reused (through an appropriately defined interface) by other parts of the application. This eliminates the need to recreate duplicate code.

- **Scoping**: Modules typically define a separate namespace, which helps avoid collisions between identifiers in different areas of a program. (One of the tenets in the Zen of Python is Namespaces are one honking great idea—let’s do more of those!)

Functions, modules and packages are all constructs in Python that promote code modularization.

In [27]:
import sys

In [28]:
import numpy as np

In [29]:
from sklearn import metrics
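The three import styles above (plain `import`, `import ... as`, and `from ... import`) can be tried with standard-library modules alone, so this sketch runs without NumPy or scikit-learn installed. The `log` function is a hypothetical helper that shows how a module's namespace keeps `math.log` from colliding with a name of our own:

```python
import math                 # whole module: its names are reached as math.<name>
import statistics as stats  # whole module under a shorter alias
from math import sqrt       # a single name copied into the current namespace


def log(message):
    """Hypothetical helper whose name matches math.log."""
    return f"[log] {message}"


# No collision: math.log lives in the math module's namespace, log in ours
print(math.log(100))  # 4.605170185988092
print(log("done"))    # [log] done
print(sqrt(16), stats.mean([1, 2, 3]))
```

Had we written `from math import *` instead, the bare names would share one namespace and whichever `log` was defined last would silently win, which is one reason star imports are discouraged.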

### Installing Modules

In [30]:
!pip install numpy

You are using pip version 18.0, however version 19.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [31]:
!conda install numpy

/bin/sh: conda: command not found
