![jupyter](https://jupyter.org/assets/try/jupyter.png "Jupyter")

# Introduction to Jupyter Notebook

A talk by Jan-Tobias Sohns.



<p>Visual Information Analysis Group, TU Kaiserslautern<img src="images/via_logo.png" alt="Logo" align="right" >
</p>

Email: j_sohns12@cs.uni-kl.de

## What you will learn in this lecture:

- Jupyter notebook
    - What
    - Why
    - How
- Coding data visualization
    - Python
    - Pandas
    - Bokeh

## What is Jupyter Notebook?

Web-based application for prototyping, documenting and sharing code
<img src="images/code-and-jupyter.png" alt="Code vs. Jupyter" align="center" width=700>

## Why use Jupyter Notebook?

***Jupyter notebooks will make your life easier!*** 

- Run Python code in your web browser interactively
- Edit and re-run parts of your code in no time
- Use helpful IPython extensions for debugging + styling
- Document your code for a transparent workflow
- Share interactive Python code

The *perfect* solution for a data scientist!

## How Jupyter Notebook works

<img src="images/schema.svg" alt="Jupyter Schema" align="center" width=700>

- Running `jupyter notebook` starts the notebook server and opens a browser client
- Opening a notebook file starts a kernel, which executes the code cells

## Live demo: Startup & Cells

## Introduction to Python

- Everything in Python is an object; variables are names that refer to objects
- Variables do not need to be declared before use, and their type is not declared either

In [None]:
number = 42
pi = 3.14
word = 'towel'
print(type(number))
print(isinstance(pi, float))
type(word)
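
The two points above can be illustrated with a minimal sketch (the name `value` is only for illustration): the same name can be rebound to objects of different types, and even plain numbers are objects with methods.

```python
# A name is just a label; it can be rebound to objects of different types.
value = 42
first_type = type(value).__name__     # 'int'

value = "towel"                       # same name, now bound to a str
second_type = type(value).__name__    # 'str'

# Even plain numbers are objects that carry methods.
is_whole = (3.0).is_integer()         # True

print(first_type, second_type, is_whole)
```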

### Lists

In [None]:
item_list = ['towel', 3.14, number]
print(item_list)
item_list[1]
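
A few more list operations, as a short sketch on a stand-alone list (`items` is a hypothetical name, not part of the lecture code): indices start at 0, negative indices count from the end, and slices exclude their end index.

```python
items = ['towel', 3.14, 42]

first = items[0]        # 'towel' (indexing is zero-based)
last = items[-1]        # 42 (negative indices count from the end)
middle = items[1:3]     # [3.14, 42] (the end index is excluded)

items.append('fish')    # lists are mutable and can grow
length = len(items)     # 4

print(first, last, middle, length)
```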

### Dictionaries

In [None]:
phonebook_dictionary = {
    "John" : 938477566,
    "Jack" : 938377264,
    "Jill" : 947662781
}
print(phonebook_dictionary)
phonebook_dictionary["Jack"]
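
Beyond key lookup, dictionaries support membership tests, safe lookups and iteration. The sketch below uses a smaller stand-alone `phonebook`; the key `"Zaphod"` is a made-up example of a missing entry.

```python
phonebook = {"John": 938477566, "Jack": 938377264}

phonebook["Jill"] = 947662781        # add a new entry
has_jack = "Jack" in phonebook       # membership is tested on the keys
unknown = phonebook.get("Zaphod")    # .get returns None instead of raising KeyError

# iterate over key/value pairs (insertion order is preserved in Python 3.7+)
for name, number in phonebook.items():
    print(name, number)

names = list(phonebook)              # ['John', 'Jack', 'Jill']
```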

### Conditions

In [None]:
if number == 42 and pi != 42:
    print("Is 42 in item_list?", number in item_list)
else:
    print("No answer today")
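
A note on comparisons in conditions: `==` tests whether two values are equal, while `is` tests whether two names refer to the very same object. A minimal sketch of the difference (all names here are illustrative):

```python
a = [1, 2, 3]
b = [1, 2, 3]    # equal contents, but a separate list object
c = a            # a second name for the same object

equal_values = (a == b)    # True: same contents
same_object = (a is b)     # False: two distinct objects
alias = (a is c)           # True: both names refer to one object

# `is` is the idiomatic check against the None singleton
result = None
missing = result is None   # True

print(equal_values, same_object, alias, missing)
```

For numbers and strings, use `==` and `!=`; whether two equal numbers happen to be the same object is an implementation detail.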

### Loops

In [None]:
for item in item_list:
    print(item)

In [None]:
for x in range(4):
    if x == 2:
        continue
    print(x)

In [None]:
while x < 10:
    x += 1
    print(x)
    if x == 6:
        break

### Functions

In [None]:
def my_function(my_arg):
    print(my_arg)
    
my_function('towel')
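
Functions usually return a value rather than print it, and parameters can have default values. A small sketch, where `greet` and its arguments are hypothetical names chosen for illustration:

```python
def greet(name, greeting="Hello"):
    """Return a greeting string instead of printing it."""
    return f"{greeting}, {name}!"

default_greeting = greet("Arthur")              # uses the default greeting
custom_greeting = greet("Ford", greeting="Hi")  # overrides it by keyword

print(default_greeting)
print(custom_greeting)
```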

### Classes

In [None]:
class MyClass:
    variable = "towel"

    def my_class_function(self, my_arg):
        print("Class var:", self.variable, ", my_arg:", my_arg)

x = MyClass()
x.my_class_function(42)
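
The variable in the class above is a class variable, shared by all instances. Per-object state is usually set in `__init__`. The following sketch uses a hypothetical `Towel` class to show both side by side:

```python
class Towel:
    material = "cotton"          # class variable: shared by all instances

    def __init__(self, color):
        self.color = color       # instance variable: set per object

    def describe(self):
        return f"{self.color} {self.material} towel"

blue = Towel("blue")
red = Towel("red")
descriptions = [blue.describe(), red.describe()]
print(descriptions)
```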

### Libraries

- arrays, matrices, vectorization: ***numpy***
- tabular data, processing, analysis: ***pandas***
- machine learning: ***scikit-learn***, *tensorflow*, *pytorch*
- visualization: ***bokeh***, *altair*, *plotly*, *seaborn*, *matplotlib*

In [None]:
# GOOD: import libraries under short aliases
import numpy as np
import numpy.random as rnd

# Global variables:
data_x = np.array([1,2,3])
data_y = rnd.random(3)
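
Vectorization means that an operation is applied to a whole array at once, without writing a Python loop. A minimal sketch, assuming a small example array; the seeded generator is an optional addition that makes random numbers reproducible:

```python
import numpy as np

data_x = np.array([1, 2, 3])

# vectorized arithmetic: applied to every element, no explicit loop
squared = data_x ** 2       # array([1, 4, 9])
shifted = data_x * 2 + 1    # array([3, 5, 7])
total = data_x.sum()        # 6

# a seeded generator gives the same random numbers on every run
rng = np.random.default_rng(seed=0)
noise = rng.random(3)       # three floats in [0, 1)

print(squared, shifted, total, noise)
```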

## Introduction to Pandas

### Loading data

- from SQL: via the `sqlite3` package
- from csv / excel: with pandas

In [None]:
import pandas as pd

df = pd.read_csv('exams.csv')
df
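
Loading from SQL could look roughly like the sketch below. It builds a throwaway in-memory SQLite database so that it runs on its own; the table name `exams`, its columns and its rows are assumptions made for illustration, not the lecture's data.

```python
import sqlite3
import pandas as pd

# In-memory database with a hypothetical 'exams' table (illustration only)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE exams (id INTEGER, student_id INTEGER, grade REAL)")
con.executemany(
    "INSERT INTO exams VALUES (?, ?, ?)",
    [(0, 1, 1.7), (1, 2, 3.3), (2, 1, 2.0)],
)
con.commit()

# run a query and receive the result as a DataFrame
sql_df = pd.read_sql_query("SELECT * FROM exams", con)
con.close()

print(sql_df)
```

With a real database file, only the argument to `sqlite3.connect` changes.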

### DataFrames

In [None]:
df

In [None]:
df.dtypes # note that 'DateTime' and 'String' could not be auto-inferred

In [None]:
# convert class and student_id to string
df['class'] = df['class'].astype(str)
df['student_id'] = df['student_id'].astype(str)
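
Date columns can be converted in a similar way with `pd.to_datetime`. The sketch below uses a small stand-in DataFrame; the real `exams.csv` may use a different date format, so treat the values as assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the exam data (illustration only)
sample = pd.DataFrame({
    "date": ["2019-07-01", "2019-07-15"],
    "grade": [1.7, 2.3],
})

# parse the text column into proper datetime values
sample["date"] = pd.to_datetime(sample["date"])

is_datetime = pd.api.types.is_datetime64_any_dtype(sample["date"])
first_month = sample["date"].dt.month.iloc[0]   # 7

print(sample.dtypes)
```

Once converted, the `.dt` accessor exposes components such as year, month and weekday.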

### Viewing a DataFrame

In [None]:
df.head(3) # gives the first 3 rows

In [None]:
df.tail(2) # gives the last 2 rows

In [None]:
df.shape # gives the dimensions

In [None]:
df.columns # gives the column names

In [None]:
df

### Indexing DataFrames

In [None]:
df

In [None]:
df = df.set_index('id')
df

Selection by label:

In [None]:
df.date # column='date'

In [None]:
df['date']

In [None]:
df.loc[4,'grade'] # row label=4, column='grade'

Selection by position:

In [None]:
df.iloc[4] # row=4

In [None]:
df.iloc[4,3] # row=4, column=3 ('grade')
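
The difference between label-based and position-based selection is easiest to see when the index labels do not match the row positions. A minimal sketch with made-up ids and grades:

```python
import pandas as pd

# Hypothetical data: index labels 10, 20, 30 differ from positions 0, 1, 2
sample = pd.DataFrame(
    {"grade": [1.3, 2.7, 4.0]},
    index=pd.Index([10, 20, 30], name="id"),
)

by_label = sample.loc[20, "grade"]    # row with label 20 -> 2.7
by_position = sample.iloc[0, 0]       # first row, first column -> 1.3

print(by_label, by_position)
```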

In [None]:
df

Selection by value:

In [None]:
df[df['grade']>3.0]
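
Selection by value works through a boolean mask: the comparison yields one `True`/`False` per row, and indexing with that mask keeps the `True` rows. Conditions can be combined with `&`, `|` and `~`. The sketch uses a small hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical data (illustration only)
sample = pd.DataFrame({
    "class": ["math", "math", "physics", "physics"],
    "grade": [1.3, 3.7, 2.0, 4.0],
})

mask = sample["grade"] > 3.0    # boolean Series, one value per row

# combine conditions with & (and), | (or), ~ (not); parentheses are required
math_above_3 = sample[(sample["grade"] > 3.0) & (sample["class"] == "math")]

print(mask.tolist())
print(math_above_3)
```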

In [None]:
df

### Analysis in pandas

In [None]:
df.describe() # gives statistical summary

In [None]:
df.mean(numeric_only=True) # mean of each numeric column

In [None]:
df.groupby('class').size() # group rows by class and print the size of the groups

In [None]:
df.groupby('class').grade.mean() # average grade per class
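
Several statistics per group can be computed in one step with `agg`. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data (illustration only)
sample = pd.DataFrame({
    "class": ["math", "math", "physics"],
    "grade": [1.0, 3.0, 2.0],
})

# one row per class, one column per aggregation
summary = sample.groupby("class")["grade"].agg(["count", "mean", "min", "max"])

print(summary)
```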

## Introduction to Bokeh

Interactive, highly customizable plotting!

In [None]:
from bokeh.plotting import output_notebook, figure, show
output_notebook() # output in notebook

# create a new plot
p = figure(x_axis_label='grade', y_axis_label='credits', width=310, height=300)

# add scatter points
p.circle(source=df, x='grade', y='credits')

# display plot
show(p)

Bokeh comes with a lot of styling options, allowing you to create the plot of your choice!

In [None]:
from bokeh.models import CategoricalColorMapper
from bokeh.palettes import RdBu3

# create map from student_id to color
color_mapper = CategoricalColorMapper(factors=sorted(df.student_id.unique()), 
                                      palette=[RdBu3[2], RdBu3[0]])

# create a new plot with a title and tooltips
p = figure(title="student performance", x_axis_label='grade',
           y_axis_label='credits', width=310, height=300,
           tools="wheel_zoom,pan,hover", tooltips=[("student", "@student_id"),
           ("class", "@class"), ("date", "@date")])

# add colormapped scatter points
p.circle(source=df, x='grade', y='credits', legend_field="student_id",
         color={'field': 'student_id', 'transform': color_mapper}, 
         fill_alpha=0.8, size=10)

# change plot styling
p.legend.location = 'bottom_right'

show(p)

In [None]:
from bokeh.models import LogColorMapper, ColorBar, LogTicker
from bokeh.palettes import Viridis6 as palette
import examples

# create logarithmic map from unemployment rate to color
palette = tuple(reversed(palette))
color_mapper = LogColorMapper(palette=palette)

# prepare data
data = examples.prepare_unemployment_data()

# create plot
p = figure(title="Texas Unemployment Rates in %, 2009", 
           width=360, height=300, tools="pan,wheel_zoom,reset",
           x_axis_location=None, y_axis_location=None)

p.patches('x', 'y', source=data,
          fill_color={'field': 'rate', 'transform': color_mapper}, 
          line_color="white", line_width=0.5)

p.add_layout(ColorBar(color_mapper=color_mapper, location=(0,0)), 'right')
p.grid.grid_line_color = None

show(p)

In [None]:
show(examples.antibiotics_plot())

You will discover more Bokeh plots in the exercises.

## What we learned today:

- What Jupyter Notebook is and how to use it
- How to use Python, Pandas and Bokeh to create data visualizations

## Questions?