# Introduction to Jupyter Notebooks

A Jupyter notebook is a GUI-based interactive Python session, similar to running `python3` from the command line.

The following sections briefly describe how to navigate around the notebook environment.

## Editing

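Default key bindings for working with notebook cells (in command mode; press `Esc` first):
- move up / down between cells: `arrow up`/`arrow down` or `k`/`j`
- the up and down arrow icons in the menu shift the current cell's position
- add new cell below current cell: `b`
- add new cell above current cell: `a`
- delete current cell: `dd` (`x` cuts the cell, which also removes it but keeps it available for pasting)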

### Markdown
- use `#` to mark headings, which makes sections of a notebook easy to find
- subheadings are specified with multiple `#`s, e.g.:
    - `##`
    - `###`
    - `####`

# Heading 1
## Heading 2
### Heading 3
#### Heading 4

### Checkpoints

Save your notebook frequently using the Save icon in the menu.

You can also take a snapshot of your current code and return to it later by using checkpoints:
- File --> Save and Checkpoint
- File --> Revert to Checkpoint


## Cell types

There are 3 cell types, each with its own command-mode shortcut:
- code `y`
- markdown `m`
- raw `r`

## Running cells
- run the current cell with `ctrl-enter` or the 'run' icon in the menu
    - the bracketed number (`[xx]`) to the left of an executed cell is a counter of the cells executed so far in the current session
    - it is only present for `code` cells

- a cell displays the value of the last expression it evaluates
- the session retains all variables and definitions from cells that have already been run
- reset session from Menu:  Kernel --> Restart

- clear cell output from Menu:
    1. Cell --> Current Outputs --> Clear
    1. Cell --> All Output --> Clear
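
The display and session behaviour listed above can be sketched as the contents of a single code cell. This is only an illustration: the variable names and values are made up for the example.

```python
# Contents of one hypothetical code cell (names and values are illustrative).
prices = [10.0, 12.5, 11.0]   # stays defined for later cells until the kernel restarts
total = sum(prices)

print(len(prices))            # an explicit print is always shown
total / len(prices)           # last expression: a notebook displays its value automatically
```

Running the same lines as a plain script would show only the `print` output, since the automatic display of the last expression is a feature of the interactive session.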

# Data Science with Jupyter Notebook

Below is an example code cell to visualize some simple data using two popular packages in Python. 

We'll use [NumPy](https://numpy.org/) to create some random data, and 
[Matplotlib](https://matplotlib.org) to visualize it.


In [None]:
from matplotlib import pyplot as plt
import numpy as np

# Generate 100 random values for each of x, y, and scale
x, y, scale = np.random.randn(3, 100)
fig, ax = plt.subplots()

# Plot x against y; scale sets each point's color and, via its magnitude, its size
ax.scatter(x=x, y=y, c=scale, s=np.abs(scale)*500)
ax.set(title="Some random data, created with Jupyter!")
plt.show()

## Experiment with the above cell and create different charts

For example:
- draw the points as squares
- color all the points red
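
One possible take on the two suggestions above is sketched here. It assumes the standard `marker` and `c` arguments of `Axes.scatter`; the title text and point size are arbitrary choices for the example.

```python
from matplotlib import pyplot as plt
import numpy as np

# Generate 100 random values for each of x and y
x, y = np.random.randn(2, 100)
fig, ax = plt.subplots()

# marker="s" draws squares; c="red" gives every point the same color
points = ax.scatter(x, y, marker="s", c="red", s=80)
ax.set(title="Red squares")
plt.show()
```

Other marker codes such as `"^"` (triangles) or `"x"` can be swapped in the same way.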

## Loading Datasets into Pandas

This section works with `aapl_stock.json` from Lab 4 and the `titanic` dataset.

In [None]:
import pandas as pd
import json
import matplotlib.pyplot as plt

## Checking the notebook's working directory

In [None]:
import os
os.getcwd()

### Change working directory

In [None]:
os.chdir('/home/pi/workspace/labs')

In [None]:
os.getcwd()
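
If the notebook may run on a machine where that directory does not exist, a slightly more defensive variant could look like the sketch below. The directory is the one used in this notebook; the file name `titanic.csv` is only used here to show how a path relative to the working directory resolves.

```python
import os
from pathlib import Path

labs_dir = Path("/home/pi/workspace/labs")   # path used in this notebook; adjust for your machine

if labs_dir.is_dir():
    os.chdir(labs_dir)                       # same effect as the os.chdir call above
else:
    print(f"{labs_dir} not found, staying in {Path.cwd()}")

# Relative file names are resolved against the working directory
data_file = Path.cwd() / "titanic.csv"
print(data_file)
```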

## Titanic dataset

In [None]:
# the leading ! runs curl as a shell command, downloading a local copy of titanic.csv
!curl https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv --output titanic.csv

In [None]:
!head -n 4 titanic.csv # show first 4 lines of file
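
Once the file is downloaded, it can be loaded into a DataFrame with `pd.read_csv`. The sketch below wraps that call in a small helper, `load_csv`, which is a hypothetical name introduced for this example, and checks that the file exists first.

```python
from pathlib import Path
import pandas as pd

def load_csv(csv_path):
    """Read a CSV file into a DataFrame (hypothetical helper for this example)."""
    return pd.read_csv(csv_path)

csv_path = Path("titanic.csv")   # assumes the curl download above succeeded
if csv_path.exists():
    titanic = load_csv(csv_path)
    print(titanic.shape)         # (number of rows, number of columns)
    print(titanic.head(4))       # first 4 rows, comparable to the head command above
else:
    print(f"{csv_path} not found; run the download cell first")
```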

## AAPL stock dataset


In [None]:
json_file_path = '/home/pi/workspace/labs/aapl_stock.json'

In [None]:
aapl_stock = []
with open(json_file_path, 'r') as f:
    for line in f:
        line_json = json.loads(line) # convert json string into dictionary
        aapl_stock.append(line_json) # append dictionary to list

In [None]:
aapl_stock[:2]

In [None]:
# pandas dataframe constructor https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

df = pd.DataFrame(aapl_stock, columns=['date','close_price','volume','open','high','low'])


In [None]:
type(df)

In [None]:
df.head(3)
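
As an alternative to the explicit loop, `pd.read_json` with `lines=True` can parse a file that holds one JSON object per line. The sketch below uses two made-up records instead of the real file, and it assumes the keys in each record match the column names passed to the constructor above.

```python
import io
import pandas as pd

# Two made-up records in a one-JSON-object-per-line layout; values are illustrative only.
sample = io.StringIO(
    '{"date": "2020-01-02", "close_price": 1.0, "volume": 100, "open": 0.9, "high": 1.1, "low": 0.8}\n'
    '{"date": "2020-01-03", "close_price": 1.2, "volume": 150, "open": 1.0, "high": 1.3, "low": 0.9}\n'
)

# lines=True tells pandas to parse each line as a separate JSON object
df_lines = pd.read_json(sample, lines=True)
print(df_lines.columns.tolist())
print(len(df_lines))
```

With the real data, the file path would take the place of `sample`.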

# Understanding DataFrame Objects

DataFrame constructor https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame

In [None]:
df.dtypes
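
Columns loaded from text often arrive as strings and need converting before they can be used as dates or numbers. A minimal sketch with a made-up frame, using `pd.to_datetime` and `pd.to_numeric`:

```python
import pandas as pd

# Illustrative frame; the values are made up and both columns hold strings at first.
prices = pd.DataFrame({"date": ["2020-01-02", "2020-01-03"], "volume": ["100", "150"]})
print(prices.dtypes)

prices["date"] = pd.to_datetime(prices["date"])      # strings -> datetime values
prices["volume"] = pd.to_numeric(prices["volume"])   # strings -> integers
print(prices.dtypes)
```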

# Reading and Writing Data in Pandas

https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html
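
As a small taste of what the tutorial covers, the sketch below writes a made-up DataFrame to a CSV file in a temporary directory and reads it back. The file name and data are assumptions for the example.

```python
import tempfile
from pathlib import Path
import pandas as pd

frame = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})   # made-up data

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "example.csv"
    frame.to_csv(out_path, index=False)   # index=False leaves the row labels out of the file
    restored = pd.read_csv(out_path)      # read the same file back into a new DataFrame

print(restored)
```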

# Select subset of data

https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html
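
A few of the common selection patterns from that tutorial, sketched on a made-up frame (the column names and values are illustrative):

```python
import pandas as pd

people = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [34, 19, 52], "city": ["X", "Y", "X"]}
)

ages = people["age"]                                     # single column -> Series
name_age = people[["name", "age"]]                       # list of columns -> DataFrame
over_30 = people[people["age"] > 30]                     # boolean mask keeps matching rows
names_over_30 = people.loc[people["age"] > 30, "name"]   # filter rows and pick a column in one step
print(over_30)
```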

# Create new columns of data

https://pandas.pydata.org/docs/getting_started/intro_tutorials/05_add_columns.html
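
New columns are typically derived from existing ones with element-wise arithmetic. A minimal sketch on made-up price data; the new column names `day_range` and `open_to_close_pct` are hypothetical:

```python
import pandas as pd

# Made-up values; column names follow the stock DataFrame used earlier.
quotes = pd.DataFrame(
    {"open": [10.0, 11.0], "high": [12.0, 11.5], "low": [9.5, 10.0], "close_price": [11.0, 10.5]}
)

# Arithmetic between columns is applied row by row
quotes["day_range"] = quotes["high"] - quotes["low"]
quotes["open_to_close_pct"] = (quotes["close_price"] - quotes["open"]) / quotes["open"] * 100
print(quotes)
```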

# Calculate summary statistics
https://pandas.pydata.org/docs/getting_started/intro_tutorials/06_calculate_statistics.html
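
The sketch below shows three common ways to summarize a column, using a made-up frame whose column names loosely follow the Titanic data:

```python
import pandas as pd

# Made-up values for illustration.
passengers = pd.DataFrame(
    {"sex": ["male", "female", "female", "male"], "age": [22, 30, 40, 50]}
)

mean_age = passengers["age"].mean()                     # one statistic for one column
summary = passengers["age"].describe()                  # count, mean, std, min, quartiles, max
mean_by_sex = passengers.groupby("sex")["age"].mean()   # the same statistic per group
print(summary)
print(mean_by_sex)
```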

# Manipulating Text Data
https://pandas.pydata.org/docs/getting_started/intro_tutorials/10_text_data.html
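
String methods are available on text columns through the `.str` accessor. A minimal sketch with two made-up names in a "Surname, Title. First" layout:

```python
import pandas as pd

# Made-up names for illustration.
names = pd.Series(["Smith, Mr. John", "Doe, Miss. Jane"])

lower = names.str.lower()                            # lowercase every value
surnames = names.str.split(",").str.get(0)           # text before the first comma
is_miss = names.str.contains("Miss", regex=False)    # True where the literal text appears
print(surnames)
```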