# Environment Basics

A crash course in the Python language and the Jupyter environment.

While Python is the programming language for this workshop, Jupyter is the interactive environment in which we will operate. The first workshop looks at language basics and built-in packages, while future workshops focus on the packages that open up many more possibilities.

This workshop starts at an introductory level and ends up with advanced (optional) techniques. To help you decide whether this workshop is for you, here is the intended audience:
- if you don't have any programming experience: please be very attentive and try to hit the ground running
- if you have some programming experience in another language but not Python: this workshop will draw the parallels between Python and other programming languages, so you can quickly translate your skills over
- if you have a lot of experience in Python: skim the content to see if there's anything relevant for you (perhaps string interpolation, `for-else`, type hints, or doctests?) in the first section, otherwise, you can learn about Jupyter
- if you are proficient in both Python and Jupyter: you might have something new to find out today (perhaps HTML markup, auto-reloading or interactivity?)

### Table of contents

- [The Python Language](#Python)
 - [Primitives](#Primitives)
 - [Variables](#Variables)
 - [Data Structures](#Data-Structures)
   - [List](#List)
   - [String](#String)
   - [Dictionary](#Dictionary)
   - [Others](#Other-Data-Structures)
   - [Iteration](#Iteration)
 - [Functions](#Functions)
 - [Conditional Statements](#Conditional-Statements)
 - [Error Handling](#Error-Handling)
 - [Documentation](#Documentation)
 - [Relevant Built-ins](#Relevant-Built-ins)
   - [Misc Functions](#Misc-Relevant-Functions)
   - [Regular Expressions](#Regular-Expressions)
   - [OS](#OS)
   - [JSON](#JSON)
   - [Date and Time](#Date-and-Time)
   - [Serialization](#Serialization)


- [The Jupyter Environment](#Jupyter)
 - [Navigation](#Navigation)
     - [Keyboard Shortcuts](#Keyboard-Shortcuts)
     - [The Interface](#Interface-Navigation)
 - [Features](#Term-Definitions)
     - [Cells](#About-Cells)
     - [Kernel](#About-the-Kernel)
     - [Notebook](#About-the-Notebook)
 - [Code Cells](#Code-Cells)
     - [Execution History](#Execution-History)
     - [Outputting Tricks](#Outputting-Tricks)
 - [Text Cells](#Text-Markup)
   - [Markdown](#Markdown)
   - [Latex](#Latex)
   - [HTML](#HTML)
 - [Magics](#Magics)
   - [Timing](#Timing)
   - [Auto-reloading](#Auto-reloading)
 - [Shell Integration](#Shell-Integration)
 - [Interactivity](#Interactivity)


- [Topics Not Covered](#Topics-Not-Covered)
- [Further Reading](#Further-Reading)

## Python

Python is a general-purpose dynamic programming language that uses significant whitespace and emphasizes readability.

A few reasons why we use it:
- friendly syntax: easy to pick up and expressive, not unnecessarily verbose
- popular: packages, tutorials and support are constantly being generated — at such a high volume that quality ones are bound to emerge
- most widely used language for scientific computation (Artificial Intelligence, Machine Learning, Data Science)
- prolific in other areas as well: backend web development, big data, embedded systems, general scripting (and a few uses in game development and interface building)


---

**HOW TO USE THIS DOCUMENT**: 
- Run a piece of code by selecting it and pressing **`ctrl+enter`**
- Click inside cells to edit code
- Press **b** to insert a new cell
 - [+ Code] button at the top left, in Colaboratory (cloud-version).

More commands will be shown in the second part of the workshop.

---

Comments are not executed:

In [1]:
# this is a comment

### Primitives

The most used Python primitives are:
 - `int`: integers
 - `float`: real numbers
 - `str`: strings (note that there is no character primitive)
 - `bool`: truth value
 - `None`: absence of a value (similar to `null` in other languages)

### Basic arithmetic

In [2]:
1 + 9

10

In [3]:
2 ** 3  # power operator

8

### Variables

Assign a value to a symbol:

In [4]:
name = 'Peter Parker'  # strings use either single or double quotes
age = 21
height = 5.84
married = False

In [5]:
age / 2

10.5

In [6]:
age // 2  # only whole part

10

**💪 Exercise**: create a variable called `favorite_number`, which stores your favorite number (as an integer), and then compute its square:

In [7]:
favorite_number = 24

In [8]:
favorite_number ** 2

576

**ℹ️ Tip**: always give descriptive names to variables, favoring readability over terseness. A descriptive name makes it easy to work with the variable and allows the programmer to more intuitively estimate what's going to happen.

---

Find out what kind of value a variable holds:

In [9]:
type(age)

int

In [10]:
type(height) is float

True

---

Convert compatible values:

In [11]:
int(2.8)

2

In [12]:
int('2')

2

In [13]:
float(2)

2.0

In [14]:
str(123)

'123'

**💪 Exercise**: check the type of `'7'` before and after conversion to `int` and `float`.

In [15]:
type('7')

str

In [16]:
type(int('7'))

int

In [17]:
type(float('7'))

float

### Data Structures

Data structures are containers for other values.

#### List

A list is a sequence of elements.

In [18]:
squares = [0, 1, 4, 9]

In [19]:
squares

[0, 1, 4, 9]

In [20]:
squares.append(25)  # add one element at the end

In [21]:
squares

[0, 1, 4, 9, 25]

In [22]:
len(squares)  # number of elements

5

In [23]:
squares[0]  # first element, zero-indexed

0

In [24]:
squares[-1]  # last element, "wrap-around"

25

In [25]:
4 in squares  # test membership

True

In [26]:
5 in squares  # the list does not contain 5

False

In [27]:
squares.index(4)  # get the index of an element

2

In [28]:
squares

[0, 1, 4, 9, 25]

In [29]:
squares[1:]  # slice: from the second element onwards

[1, 4, 9, 25]

In [30]:
squares[:-1]  # all but the last element

[0, 1, 4, 9]

In [31]:
squares + [49, 'abc', 'xyz']  # concatenate lists

[0, 1, 4, 9, 25, 49, 'abc', 'xyz']

**💪 Exercise**: create another list `cubes` containing the first three cubes. Then create another list `L` by concatenating `squares` and `cubes`. Access the element in the middle of the list:

In [32]:
cubes = [0, 1, 8]
L = squares + cubes

In [33]:
L

[0, 1, 4, 9, 25, 0, 1, 8]

In [34]:
L[len(L) // 2]

25

#### String

A string behaves much like a list, except its elements are characters.

**ℹ️ Tip**: unlike in C-like languages, Python strings are not made up of `char`s, and they are immutable.
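A minimal sketch of what immutability means in practice. The variable names `hero` and `capitalized` are chosen only for this illustration, and the `try`/`except` construct used to catch the error is covered later, under Error Handling. Assigning to a position of a string raises a `TypeError`, so a changed string has to be built as a new value:

```python
hero = 'peter'

try:
    hero[0] = 'P'  # strings do not support item assignment
except TypeError as error:
    print('cannot modify in place:', error)

# build a new string out of the pieces instead
capitalized = 'P' + hero[1:]
print(capitalized)
```

The original string is left untouched; only the new variable holds the capitalized version.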

In [35]:
name

'Peter Parker'

In [36]:
len(name)

12

In [37]:
name[0]

'P'

In [38]:
name[1:]

'eter Parker'

In [39]:
'Hello ' + name

'Hello Peter Parker'

In [40]:
'ha' * 3

'hahaha'

---

Some useful built-in functions operating on strings:

In [41]:
name.split()  # by default, it splits on whitespace

['Peter', 'Parker']

In [42]:
name.lower()

'peter parker'

In [43]:
'-'.join(['ab', 'cd', 'yz'])

'ab-cd-yz'

In [44]:
f'{name} is {age} years old'  # note the f at beginning

'Peter Parker is 21 years old'

In [45]:
'abc abcd'.replace('ab', 'X')

'Xc Xcd'

In [46]:
'  abc  '.strip()

'abc'

---

Control how many decimals a number is displayed with:

In [47]:
pi = 3.14159

In [48]:
f'π is {pi}'

'π is 3.14159'

In [49]:
f'π as a whole number {pi:.0f}'  # no decimals

'π as a whole number 3'

In [50]:
f'first decimals of π {pi:.3f}'

'first decimals of π 3.142'

Format fractions (and other numbers) as percentages:

In [51]:
per = 0.705  # 70.5%
f'as a percentage {per:.1%}'

'as a percentage 70.5%'

In [52]:
f'as a percentage {per:.4%}'

'as a percentage 70.5000%'
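The same `{value:spec}` syntax supports a few more format specifications worth knowing. The sketch below uses made-up values (`population`, `ticket_id`) purely for illustration:

```python
population = 1234567
ticket_id = 42

with_separator = f'{population:,}'  # thousands separator
zero_padded = f'{ticket_id:05d}'    # pad with zeros up to a width of 5
right_aligned = f'{"hi":>5}'        # right-align inside a field of width 5

print(with_separator)
print(zero_padded)
print(repr(right_aligned))
```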

---

**💪 Exercise**: create a string `favorite_flavor` which contains your favorite ice cream flavor and then another string containing the sentence "My favorites are: vanilla and 24.00" using `favorite_flavor` and `favorite_number` (shown with two decimals):

In [53]:
name = 'Peter Parker'

In [54]:
favorite_flavor = 'vanilla'

sentence = f'My favorites are: {favorite_flavor} and {favorite_number:.2f}'

In [55]:
sentence

'My favorites are: vanilla and 24.00'

**ℹ️ Tip**: Python strings can store any Unicode character, making it easy to work with symbols, different alphabets, or even emojis. Quotation marks can be included in a string by alternating the types of marks (e.g.: `"Trader Joe's vegetables"`), or by escaping them with a backslash (e.g.: `'Trader Joe\'s vegetables'`).

#### Dictionary

A dictionary maps keys to values. It behaves like a list whose indices (the keys) do not have to be numbers.

**ℹ️ Tip**: a dictionary has the time complexity of a hashmap, but through some [very clever engineering](https://mail.python.org/pipermail/python-dev/2012-December/123028.html), its elements keep their insertion order (guaranteed since Python 3.7).

In [56]:
superhero_ages = {
    'Ironman':   36,
    'Hulk':      38,
    'Thor':      'varies',  # does not have to be homogeneous
}

In [57]:
len(superhero_ages)

3

In [58]:
superhero_ages['Ironman']

36

In [59]:
superhero_ages['Spiderman'] = 21  # add an element

In [60]:
superhero_ages

{'Ironman': 36, 'Hulk': 38, 'Thor': 'varies', 'Spiderman': 21}

In [61]:
del superhero_ages['Hulk']  # remove an element

In [62]:
superhero_ages

{'Ironman': 36, 'Thor': 'varies', 'Spiderman': 21}

In [63]:
'Deadpool' in superhero_ages  # whether the key is in the dictionary

False

In [64]:
'Thor' in superhero_ages

True

In [65]:
21 in superhero_ages

False
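Looking up a missing key with square brackets raises a `KeyError`. As a small sketch (with a dictionary named `ages`, created here just for the example), the built-in `dict.get` method returns `None` or a fallback value instead, which is often handier than testing membership first:

```python
ages = {'Ironman': 36, 'Thor': 'varies'}

known = ages.get('Ironman')               # the key exists, its value is returned
missing = ages.get('Deadpool')            # the key is absent, None is returned
fallback = ages.get('Deadpool', 'unknown')  # the key is absent, the given default is returned

print(known, missing, fallback)
print(list(ages))  # the keys, in insertion order
```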

**💪 Exercise**: create a dictionary which contains the words `"cardinal"` and `"gold"` as keys and their translation in your mother tongue as their values (or their hex color values if your mother tongue is English):

In [66]:
color_translations = {
    'cardinal': 'rosu',
    'gold':     'aur',
}

In [67]:
color_translations

{'cardinal': 'rosu', 'gold': 'aur'}

---

#### Other Data Structures

Tuples are similar to lists, but they are immutable and are meant for heterogeneous data.

In [68]:
info = ('Los Angeles', 21, 'vanilla')  # information about a person

In [69]:
info[0]

'Los Angeles'

**ℹ️ Tip**: you can think of tuples as lightweight classes. If interested, you can also look into [named tuples](https://docs.python.org/3/library/collections.html#collections.namedtuple) and [data classes](https://realpython.com/python-data-classes/).
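To make the "lightweight class" idea concrete, here is a brief sketch using `collections.namedtuple`. The type name `Person` and its field names are invented for this example:

```python
from collections import namedtuple

# a tuple type whose positions also have names
Person = namedtuple('Person', ['city', 'age', 'flavor'])

person = Person(city='Los Angeles', age=21, flavor='vanilla')

print(person.city)  # access by name
print(person[0])    # access by position still works
print(person)
```

Named fields make the code self-documenting: `person.city` says more than `info[0]`.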

---

Sets are similar to lists, but they are unordered and contain no duplicates. Their advantage is that checking for membership takes constant time on average.

In [70]:
d = {'a': 1, 'b': 2}  # a dictionary: curly braces around key-value pairs

In [71]:
s = {2, 1, 4}  # a set: curly braces around plain values

In [72]:
2 in s

True

Converting a list into a set yields its unique elements:

In [73]:
set([1, 2, 1, 1, 3])

{1, 2, 3}

**ℹ️ Tip**: checking if an element is contained in a collection is trivial when there are 5 elements to compare against, but when there are thousands or millions of them, and you need to do many such queries, it is essential to be able to do it quickly.
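A rough way to see the difference is to time the same membership query against a list and a set, using the built-in `timeit` module. The collection size and the number of repetitions below are arbitrary choices, and the exact timings will vary from machine to machine:

```python
import timeit

numbers_list = list(range(100_000))
numbers_set = set(numbers_list)

# look for the last element, the worst case for a list
list_time = timeit.timeit(lambda: 99_999 in numbers_list, number=100)
set_time = timeit.timeit(lambda: 99_999 in numbers_set, number=100)

print(f'list: {list_time:.5f}s, set: {set_time:.5f}s')
```

The list has to be scanned element by element, while the set jumps straight to the right spot using the element's hash.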

#### Iteration

Going through each element of a collection is called an iteration.

In [74]:
squares

[0, 1, 4, 9, 25]

In [75]:
# list iteration
for sq in squares:
    print(sq)

0
1
4
9
25


In [76]:
for key in superhero_ages:
    print(key)

Ironman
Thor
Spiderman


In [77]:
for val in superhero_ages.values():
    print(val)

36
varies
21


In [78]:
# dict iteration
for name, age in superhero_ages.items():
    print(name, age)

Ironman 36
Thor varies
Spiderman 21


---

Generate sequential numbers:

In [79]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


Optionally, `range` takes `start`, `stop` and `step` arguments:

In [80]:
for i in range(4, 12, 2):  # start from 4, add 2 each time, stop when reaching 12
    print(i)

4
6
8
10


**ℹ️ Tip**: this is the pythonic equivalent of the `for(i = 0; i < n; ++i)` idiom in C-like languages.
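As a short sketch of the three arguments at work: `range` can also count downwards when given a negative `step`, and wrapping it in `list` shows all of its values at once. The variable names are only illustrative:

```python
evens = list(range(0, 10, 2))       # start at 0, stop before 10, in steps of 2
countdown = list(range(10, 0, -3))  # start at 10, stop before 0, in steps of -3

print(evens)
print(countdown)
```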

---

Iteration-related functions:

In [81]:
colors = ['red', 'green', 'blue', 'black']

In [82]:
for col in colors:
    print(col)

red
green
blue
black


In [83]:
for col in reversed(colors):  # from last to first
    print(col)

black
blue
green
red


In [84]:
for i, col in enumerate(colors):  # also give the index of each element
    print(i, col)

0 red
1 green
2 blue
3 black


In [85]:
for i, col in reversed(list(enumerate(colors))):
    print(i, col)

3 black
2 blue
1 green
0 red


In [86]:
for i, col in enumerate(reversed(colors)):
    print(i, col)

0 black
1 blue
2 green
3 red


In [87]:
for sq, col in zip(squares, colors):  # pair up elements of two lists (up to the shortest one's length)
    print(sq, col)

0 red
1 green
4 blue
9 black


In [88]:
squares  # just as a reminder

[0, 1, 4, 9, 25]

---

In [89]:
# `while` is not used as often
k = 0
while k <= 10:
    k += 1  # idiom: k++ does not exist in Python (its absence makes implementing operator overloading much easier)

In [90]:
k

11

**💪 Exercise**: print, one by one, the numbers from `100` to `110`, squared:

In [91]:
for i in range(100, 111):
    print(i**2)

10000
10201
10404
10609
10816
11025
11236
11449
11664
11881
12100


---

Multiple ways of creating a collection based on another's elements:

In [92]:
colors

['red', 'green', 'blue', 'black']

In [93]:
color_lengths = []

for col in colors:
    l = len(col)
    color_lengths.append(l)

In [94]:
color_lengths

[3, 5, 4, 5]

In [95]:
[len(col) for col in colors]  # list comprehension, equivalent but shorter

[3, 5, 4, 5]

In the next subsection we'll learn an even shorter way, using `map`.

In [96]:
{col: len(col) for col in colors}  # dict comprehension

{'red': 3, 'green': 5, 'blue': 4, 'black': 5}

**💪 Exercise**: create the list `first_letters` which contains the first letter of each `color`:

In [97]:
first_letters = [col[0] for col in colors]

In [98]:
first_letters

['r', 'g', 'b', 'b']

### Functions

A _function_ is a block of code which only runs when called. Data can be given as input and results can be returned.

In [99]:
def double(n):
    # takes an argument and returns it doubled
    return n * 2

In [100]:
double = lambda n: n * 2  # equivalent but shorter way to write simple functions

In [101]:
double(5)

10

Duck-typing allows the function to work on any kind of argument that supports multiplication:

In [102]:
double(1.2)

2.4

In [103]:
double('ha')

'haha'

In [104]:
double([1, 2, 3])

[1, 2, 3, 1, 2, 3]

**ℹ️ Tip**: it is best practice to name functions as verbs, since they perform an action.

---

Functions can take anything as arguments, even other functions.

One such special function is `map`, which takes two arguments, a function and a collection, and it applies the function to each element of the collection.

In [105]:
map(len, colors)  # get the length of each color, equivalent to definitions above

<map at 0x10f96a630>

In order to save memory and processing time, `map` uses lazy evaluation: it returns an iterator (a `map` object), which yields one element at a time as it is consumed:

In [106]:
result = map(len, colors)
for l in result:
    print(l)

3
5
4
5


Iterating over it again yields no elements, as they have all been consumed:

In [107]:
for l in result:
    print(l)

Another way to force the evaluation of the entire iterator is converting it into a list:

In [108]:
list(map(double, squares))

[0, 2, 8, 18, 50]

In [109]:
squares  # just to remember what the original list looked like

[0, 1, 4, 9, 25]

**ℹ️ Tip**: Such techniques do not noticeably impact performance when the list is made up of only 5 elements, but when there are thousands or millions of elements, it would be expensive to create the whole list up front only to then consume its elements one by one. We might not even need more than the first couple of elements. 
This applies not only when the list is very long, but also when computing each element is expensive. We will not go in depth, but look into [iterators](https://www.programiz.com/python-programming/iterator) and [generators](https://www.programiz.com/python-programming/generator) to learn more about them.
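As a small illustration of the idea, the hypothetical generator function `lazy_squares` below describes an endless sequence, yet only the elements actually requested are ever computed. `itertools.islice` is used to take the first few:

```python
from itertools import islice


def lazy_squares():
    """Yield 0, 1, 4, 9, ... forever, computing each value only on demand."""
    n = 0
    while True:
        yield n ** 2
        n += 1


# only five squares are computed, even though the sequence never ends
first_five = list(islice(lazy_squares(), 5))
print(first_five)
```

Building the same sequence as a list would be impossible, since it has no end.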

---

A function can have default arguments:

In [110]:
def repeat(s, times=3):
    # by default, it repeats 3 times
    return s * times

In [111]:
repeat('ha ')

'ha ha ha '

In [112]:
repeat('ha ', 5)

'ha ha ha ha ha '

In [113]:
repeat('ha ', times=5)  # arguments can be named

'ha ha ha ha ha '

In [114]:
def repeat_extra(s):
    # returns multiple values
    return len(s), s * 3

In [115]:
repeat_extra('ha')

(2, 'hahaha')

In [116]:
result = repeat_extra('ha')
length   = result[0]
repeated = result[1]

In [117]:
(length, repeated) = repeat_extra('ha')  # shorter way of destructuring the result

In [118]:
length

2

In [119]:
repeated

'hahaha'

In [120]:
a, b, c = [1, 2, 3]  # works everywhere

In [121]:
a

1

In [122]:
[a, b, c] = [1, 2, 3, 4]  # gives an error upon mismatch of length

ValueError: too many values to unpack (expected 3)
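When the number of elements is not known in advance, a starred name can absorb "the rest" and avoid the mismatch error. A quick sketch, with illustrative variable names:

```python
first, *rest = [1, 2, 3, 4]          # `rest` collects everything after the first element
head, *middle, tail = [1, 2, 3, 4]   # the starred name can also sit in the middle

print(first, rest)
print(head, middle, tail)
```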

**💪 Exercise**: create a `yell` function that takes a string. 
It returns the string in all caps and adds an exclamation mark at the end.

In [123]:
def yell(s):
    return s.upper() + '!'

In [124]:
yell('hello')

'HELLO!'

### Conditional Statements

Conditional statements control the flow of the program:

In [125]:
def is_even(n):
    # tell whether the argument is divisible by two
    return (n % 2 == 0)

In [126]:
if is_even(2):
    print('it works!')
else:
    print("it's broken")

it works!


Multiple tests can be sequenced using the `if-elif-else` construct:

In [127]:
x = 13
if x % 3 == 0:
    print('divisible by three')
elif x % 5 == 0:
    print('divisible by five')
else:
    print('divisible by neither')

divisible by neither


You can also do inline evaluations (similarly to the ternary operator `? :` in C-like languages):

In [128]:
size = 'big' if 2**10 > 100 else 'small'

In [129]:
size

'big'

---

Some values are truthy, some are falsy. So it is pythonic to check for a non-empty list by writing `if l` instead of `if l != []`:

In [130]:
if True:
    print('truthy')

truthy


In [131]:
if 1:
    print('truthy')

truthy


In [132]:
if 0:
    print('truthy')

In [133]:
if 1.0:
    print('truthy')

truthy


In [134]:
if []:
    print('truthy')

In [135]:
if '':
    print('truthy')

In [136]:
if None:
    print('truthy')
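The checks above can be summarized in one go with the built-in `bool` function, which reports how a value is treated in a condition. The list of sample values is just for illustration. Note the last two: a non-empty list and a non-empty string are truthy, regardless of what they contain:

```python
values = [True, 1, 0, 1.0, [], '', None, [0], 'False']

truthiness = [bool(value) for value in values]

for value, verdict in zip(values, truthiness):
    print(repr(value), 'is', 'truthy' if verdict else 'falsy')
```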

---

`break` exits the loop:

In [137]:
for x in range(4):
    print(x)

0
1
2
3


In [138]:
for x in range(4):
    if x == 2:
        break
    print(x)

0
1


`continue` skips the current iteration:

In [139]:
for x in range(4):
    if x == 2:
        continue
    print(x)

0
1
3


---

Just like with mapping, there are multiple ways to create a list based on another's elements, while applying some conditions.

In [140]:
small_squares = []
for sq in squares:
    if sq < 5:
        small_squares.append(sq)

In [141]:
small_squares

[0, 1, 4]

In [142]:
[sq for sq in squares if sq < 5]  # works in list comprehension as well

[0, 1, 4]

In [143]:
list(filter(is_even, squares))  # filter is another higher-order function, which returns just those elements that pass the predicate

[0, 4]

**ℹ️ Tip**: an `else` can also be attached to a `for` or `while` loop:

In [144]:
target = 10  # change it! make it something that is and then something that is not in the list

for x in [2, 5, 1, 4, 2]:
    if x == target:
        break

else:  # no break
    print('target not in the list')

target not in the list


---

**💪 Exercise**: print the numbers from `1` to `25`, but for each multiple of 3 print `fizz` instead, for each multiple of 5 print `buzz`, and for multiples of both print `fizzbuzz`:

In [145]:
for n in range(1, 26):
    s = ''
    if n % 3 == 0:
        s += 'fizz'
    if n % 5 == 0:
        s += 'buzz'
    
    print(s if s else n)

1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizzbuzz
16
17
fizz
19
buzz
fizz
22
23
fizz
buzz


**👾 Trivia**: if you could solve the above exercise, congratulations! [You are better](https://softwareengineering.stackexchange.com/questions/15623/fizzbuzz-really) than 99% of applicants or "a significant portion" of programmers!

---

**💪 Exercise**: update the `yell` function (renamed `maybe_yell` here) to add the exclamation mark at the end only if an optional boolean `exclaim` argument is `True`.

In [146]:
def maybe_yell(s, exclaim=False):
    s = s.upper()
    if exclaim:
        return s + '!'
    else:
        return s

In [147]:
maybe_yell('hi')

'HI'

In [148]:
maybe_yell('hi', exclaim=True)

'HI!'

### Error Handling

In [149]:
l = [1, 2, 3]

In [150]:
len(l)

3

In [151]:
l[10]  # the list does not have that many elements, thus an error is raised

IndexError: list index out of range

In [152]:
try:
    my_result = l[10]
except:
    print('an error occurred')

an error occurred


Multiple `except` clauses allow handling only some kinds of errors, which are "expected", while still being able to get alerts for unexpected ones:

In [153]:
try:
    lizt[10]
    
except IndexError:
    print('bad index')
    
except Exception as e:
    print('a different error occurred:', str(e))

a different error occurred: name 'lizt' is not defined
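A `try` statement also accepts optional `else` and `finally` clauses. The sketch below wraps a list lookup in a hypothetical helper, `safe_get`, to show when each clause runs:

```python
def safe_get(items, index, default=None):
    """Return items[index], or `default` when the index is out of range."""
    try:
        value = items[index]
    except IndexError:
        return default  # the expected failure: fall back quietly
    else:
        return value    # runs only when no exception was raised
    finally:
        print('lookup finished')  # runs in every case, even when returning


found = safe_get([1, 2, 3], 1)
not_found = safe_get([1, 2, 3], 10)
print(found, not_found)
```

Catching only `IndexError` means that any other, unexpected error would still surface instead of being silently swallowed.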


### Documentation

_Type hints_ tell the user what the function is expected to receive and to output. Python does not enforce them at runtime, but editors and static type checkers (such as `mypy`) can use them to catch mistakes early. Ultimately they are just that: hints, not hard rules.

In [154]:
def is_even(n: int) -> bool:
    return n % 2 == 0

**ℹ️ Tip**: it is best practice to prefix boolean variables with `is_`, `are_`, etc.
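To see that hints are not hard rules, here is a brief sketch: the function below is annotated as taking an `int`, yet calling it with a `float` works without complaint. The hints themselves are simply stored on the function, in its `__annotations__` attribute:

```python
def is_even(n: int) -> bool:
    return n % 2 == 0


print(is_even(4))     # matches the hint
print(is_even(4.0))   # violates the hint, but still runs
print(is_even.__annotations__)
```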

In [155]:
def is_even(n):
    """
    This docstring (a multi-line string placed first in the function) explains what the function does.
    This function returns whether the integer passed as argument is divisible by two.
    """
    return n % 2 == 0

Short _unit tests_ (doctests) show examples of usage, succinctly describing the function's behavior:

In [156]:
def is_even(n):
    """
    The examples below show usage and expected output:
    
    >>> is_even(2)
    True

    >>> is_even(5)
    False
    
    >>> is_even(1)
    False
    """
    return n % 2 == 0

The doctests can also be run, ensuring the implementation respects the desired outcome:

In [157]:
import doctest
doctest.testmod()

TestResults(failed=0, attempted=3)
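`doctest.testmod()` checks every docstring it can find. To check a single function instead, one option is `doctest.run_docstring_examples`, sketched below on a fresh copy of `is_even`. It prints a report only for failing examples, so no output means that all examples passed:

```python
import doctest


def is_even(n):
    """
    >>> is_even(2)
    True

    >>> is_even(5)
    False
    """
    return n % 2 == 0


# the second argument provides the names available to the examples
doctest.run_docstring_examples(is_even, {'is_even': is_even})
```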

**💪 Exercise**: correct the failing tests in `is_even` (above) and run the test suite again.

**ℹ️ Tip**: it is good practice to include a short description of what your function does. For more complex ones, add type information and examples of usage and expected outcome as well. In larger projects, this helps others (and yourself after a period of time) more quickly understand what the code does. We spend about ten times more time reading rather than writing code, so putting effort into making it more readable is sure to turn out valuable.

---

**💪 Exercise**: create a function that returns the minimum and maximum of the numeric values in the `superhero_ages` dictionary.

In [158]:
min_val = None
max_val = None

for v in superhero_ages.values():
    if type(v) not in [int, float]:
        continue
    if min_val is None or v < min_val:  # here, we are specifically checking against None, because `if not min_val` could also be triggered when `min_val == 0`
        min_val = v
    if max_val is None or v > max_val:
        max_val = v

In [159]:
min_val

21

In [160]:
max_val

36

In [161]:
superhero_ages

{'Ironman': 36, 'Thor': 'varies', 'Spiderman': 21}
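Since the exercise asks for a function, here is one possible way to package the same logic. It is only a sketch: the name `numeric_range` is made up for this example, and the built-in `min` and `max` replace the manual comparisons:

```python
def numeric_range(mapping):
    """Return (minimum, maximum) of the numeric values, or (None, None) if there are none."""
    numbers = [v for v in mapping.values() if type(v) in (int, float)]
    if not numbers:
        return None, None
    return min(numbers), max(numbers)


lowest, highest = numeric_range({'Ironman': 36, 'Thor': 'varies', 'Spiderman': 21})
print(lowest, highest)
```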

### Relevant Built-in Functions

Sort a list (works on any data type that supports the ordering operators; can optionally be given a `key` function):

In [162]:
sorted([9, 1, 4])  # not in-place

[1, 4, 9]
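As a short sketch of the `key` argument: the function passed as `key` is applied to each element, and the results are what gets compared. Here the color names are sorted by their length, and `reverse=True` flips the order:

```python
colors = ['red', 'green', 'blue', 'black']

by_length = sorted(colors, key=len)                     # shortest first
longest_first = sorted(colors, key=len, reverse=True)   # longest first

print(by_length)
print(longest_first)
```

Elements with equal keys (such as `'green'` and `'black'`) keep their original relative order, because Python's sort is stable.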

Read the contents of a text file:

In [163]:
with open('example_files/plain.txt') as f:
    print('file contents:', f.read())

file contents: Hello!



Aggregation functions:

In [164]:
any([True, True, False])  # returns True if at least one of the elements is True

True

In [165]:
all([True, True, False])  # returns True if every element is True

False

In [166]:
sum([1, 2, 3])  # works on any data type that implements the addition operation

6

In [167]:
max([1, 5, 2])

5

**ℹ️ Tip**: due to its polymorphism, `sum` can be used to concatenate multiple lists into a single one, by making it start with the empty list:

In [168]:
sum([
    [1, 2, 3],
    [4, 5],
    [6, 7],
], [])  # start from []

[1, 2, 3, 4, 5, 6, 7]

In fact, `any`, `all`, and `sum` are all special cases of [reduce](https://docs.python.org/3/library/functools.html#functools.reduce) (known as a _fold_, e.g. `foldl`, in functional languages). Another useful tool, especially when dealing with higher-order functions, is [partial](https://docs.python.org/3/library/functools.html#functools.partial). More useful functions are found in the [itertools](https://docs.python.org/3/library/itertools.html) module.
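To illustrate the connection, the sketch below rebuilds `sum`, `any` and `all` with `functools.reduce`, which combines the elements two at a time, starting from an initial value. The functions in the `operator` module stand in for `+`, `|` and `&`. The last lines show `partial`, which fixes some arguments of a function ahead of time:

```python
from functools import partial, reduce
import operator

total = reduce(operator.add, [1, 2, 3], 0)                       # like sum
any_true = reduce(operator.or_, [True, True, False], False)      # like any
all_true = reduce(operator.and_, [True, True, False], True)      # like all

double = partial(operator.mul, 2)  # multiplication with one argument fixed to 2

print(total, any_true, all_true, double(5))
```

Unlike the real `any` and `all`, these versions always process every element, so they are for illustration only.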

### Relevant Built-in Packages

Work with information in structured text using regular expressions:

In [169]:
import re

Create rules for replacing patterns:

In [170]:
re.sub(
    pattern=r'\d{5}',  # raw string (r'...') keeps the backslash as-is; matches any five digits
    repl='[removed]',
    string='Hi, this is funny_bunny_94 and my zip code is 90007',
)

'Hi, this is funny_bunny_94 and my zip code is [removed]'

Check if a certain pattern is present:

A rudimentary email pattern for didactic purposes. It matches one or more alphanumeric characters, followed by `@`, followed by at least three letters and then either `.com` or `.edu`.

In [171]:
if re.match(r'\w+@[a-z]{3,}\.(com|edu)', 'name94@example.com'):
    print('matched')

matched


Capture specific portions of the pattern using _named groups_:

In [172]:
match = re.match(r'(?P<username>\w+)@(?P<domain>[a-z]{3,})\.(com|edu)', 
                 'tommy@usc.edu')

In [173]:
match['username']

'tommy'

In [174]:
match['domain']

'usc'
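Keep in mind that `re.match` only looks for the pattern at the very beginning of the string. The sketch below, using a made-up sentence, contrasts it with `re.search` (first occurrence anywhere) and `re.findall` (all occurrences). The group is written as `(?:...)`, a non-capturing group, so that `findall` returns the whole matches:

```python
import re

pattern = r'\w+@[a-z]{3,}\.(?:com|edu)'
text = 'contact: tommy@usc.edu or name94@example.com'

at_start = re.match(pattern, text)      # None, the text does not start with an email
first = re.search(pattern, text)        # the first match anywhere in the text
everything = re.findall(pattern, text)  # a list with all the matches

print(at_start)
print(first.group())
print(everything)
```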

---

Sometimes we need to interact with the file system:

In [175]:
import os  # library for OS-specific functions

In [176]:
os.makedirs('example_files', exist_ok=True)  # create a folder (if it doesn't exist)

In [177]:
from pathlib import Path  # a newer library which makes dealing with files and folders a breeze

In [178]:
current_folder = Path('.')

In [179]:
# print only files in the current folder
for entry in current_folder.iterdir():
    if not entry.is_dir():
        print(entry)

index.html
.DS_Store
requirements.txt
intro.ipynb
lucky.py
readme.md
workshop-4-WIP.ipynb
.gitignore
workshop-2-WIP.ipynb
workshop-1.ipynb
appendix.ipynb
appendix.md
workshop-3-WIP.ipynb


In [180]:
current_folder / 'example_files'  # traverse using the / operator

PosixPath('example_files')
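`Path` objects can do more than traverse folders. The sketch below works inside a temporary folder (so that nothing is left behind on disk) and uses made-up file names to show creating a folder, writing and reading a text file, and inspecting parts of a path:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / 'example_files'
    folder.mkdir(exist_ok=True)        # create the folder

    file_path = folder / 'plain.txt'
    file_path.write_text('Hello!')     # create the file and write to it
    contents = file_path.read_text()   # read it back

    names = [entry.name for entry in folder.iterdir()]
    suffix = file_path.suffix          # the file extension
    exists = file_path.exists()

print(contents, names, suffix, exists)
```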

---

Besides `csv` (explored in depth in the next workshop), `JSON` (JavaScript Object Notation) is another very popular data format:

In [181]:
import json

In [182]:
with open('example_files/objects.json') as f:
    print(json.load(f))

[{'name': 'Alice', 'year': 2, 'grade': 3.9}, {'name': 'Bob', 'year': 3, 'grade': 3.8}, {'name': 'Chris', 'year': 1, 'grade': 3.85}]
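The `json` module works in the other direction as well, and on strings rather than files. In this sketch (with a shortened, made-up list of students), `json.dumps` turns Python values into JSON text and `json.loads` parses that text back:

```python
import json

students = [
    {'name': 'Alice', 'year': 2, 'grade': 3.9},
    {'name': 'Bob', 'year': 3, 'grade': 3.8},
]

text = json.dumps(students, indent=2)  # Python values to a JSON string
restored = json.loads(text)            # JSON string back to Python values

print(text)
print(restored == students)
```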


---

Working with dates and times can be very messy, but thankfully, there are nice ways to accomplish this:

In [183]:
from datetime import datetime

In [184]:
datetime.now()

datetime.datetime(2019, 2, 10, 0, 55, 40, 818968)

In [185]:
datetime(year=2018, month=2, day=28)

datetime.datetime(2018, 2, 28, 0, 0)
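Dates support arithmetic and conversion to and from text. As a brief sketch, `timedelta` represents a duration, `strftime` formats a date as a string, and `strptime` parses a string given its format. The variable names are only illustrative:

```python
from datetime import datetime, timedelta

start = datetime(year=2018, month=2, day=28)

next_day = start + timedelta(days=1)          # calendar rules are handled for us
formatted = next_day.strftime('%Y-%m-%d')     # date to text
parsed = datetime.strptime('2018-02-28', '%Y-%m-%d')  # text to date

print(formatted)
print(parsed == start)
```

Since 2018 is not a leap year, the day after February 28 is March 1.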

In [186]:
# not built-in but related, and very useful
from dateparser import parse as parse_date

In [187]:
parse_date('1 day and 2 hours ago')

datetime.datetime(2019, 2, 8, 22, 55, 41, 471784)

In [188]:
parse_date('28 feb')

datetime.datetime(2019, 2, 28, 0, 0)

In [189]:
parse_date('29 jan 2000')

datetime.datetime(2000, 1, 29, 0, 0)

---

Serialization gives a method to store (and transfer) variables that cannot be easily represented by traditional file formats:

In [190]:
import pickle  # because you store it away... programmer humor 😅

In [191]:
serialized = pickle.dumps(superhero_ages)  # get a binary representation of the dictionary

In [192]:
serialized  # you can save this as a binary file

b'\x80\x03}q\x00(X\x07\x00\x00\x00Ironmanq\x01K$X\x04\x00\x00\x00Thorq\x02X\x06\x00\x00\x00variesq\x03X\t\x00\x00\x00Spidermanq\x04K\x15u.'

In [193]:
pickle.loads(serialized)  # load the binary representation back in memory

{'Ironman': 36, 'Thor': 'varies', 'Spiderman': 21}
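To store the serialized value on disk, `pickle.dump` and `pickle.load` work directly with files opened in binary mode. The sketch below uses a temporary folder and a made-up file name, `ages.pkl`:

```python
import pickle
import tempfile
from pathlib import Path

ages = {'Ironman': 36, 'Thor': 'varies', 'Spiderman': 21}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'ages.pkl'

    with open(path, 'wb') as f:  # 'wb' = write, binary
        pickle.dump(ages, f)

    with open(path, 'rb') as f:  # 'rb' = read, binary
        restored = pickle.load(f)

print(restored)
```

Only load pickle files that come from a source you trust, since loading a pickle can execute arbitrary code.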

## Jupyter

Jupyter is an execution environment for Python, a web-based REPL (Read Eval Print Loop). It creates interactive documents which can be used to tell a story, walk through the exploration process, showcase both the code and its outcomes and display rich media such as images or animations.

### Term Definitions

A **notebook** is a text file, with the extension `ipynb`. It is composed of multiple cells, and can be executed inside a kernel.

A **cell** is a container that stores either code or text. 

The **kernel** is the Python engine in which code is executed. It stores variable values, imported libraries and other environment state data. A notebook is the document containing the pieces of code, while the kernel gives it the ability to run. There is always at most one kernel per notebook.

**👾 Trivia**: Jupyter supports more than just Python. In fact, its name comes from the original three languages: Julia, Python and R. The extension name comes from Jupyter's precursor, [IPython](https://ipython.org), and the word _notebook_.

_Note_: the cloud version (Colaboratory) supports a restricted subset of these features. The interface is also stripped down. More restrictions (time, persistence, memory) apply.

### Keyboard Shortcuts

Jupyter has a lot of functionality; this workshop presents just the most useful commands and concepts:

 - **`ctrl+s`** save

Running:
 - **`ctrl+enter`** run cell
 - **`shift+enter`** run cell and advance to next one
 - **`ii`** interrupt kernel
 - **`00`** restart kernel
 - [Edit] > [Clear all outputs]

Cell operations:
- **`esc`** exit out of edit mode
- **`z`** undo
- **`shift+z`** redo
- **`x`** cut (use as delete)
- **`c`** copy
- **`v`** paste
- **`a`** insert a new cell above
- **`b`** insert a new cell below
- **`shift+m`** merge cells
- **`ctrl + shift + -`** split a cell at the current cursor position (in edit mode)

Change cell type:
- **`m`** to text/markdown
- **`y`** to code (default when creating a cell)
- **`1`** through **`6`** to header 1 through 6

In [194]:
x = 2

In [195]:
2 + 3

5

In [196]:
import time
time.sleep(2)

### About Cells
 - Code cells are executed in the current state of the kernel and output their result. 
 - If the cell's output is `None`, nothing is displayed.
 - Text cells can be "executed" (same shortcuts) to render their formatting.


 - A cell that has not been executed has a `[ ]` before it
 - After execution, each cell is given an index, turning `[ ]` into `[n]`, where `n` is the cell's execution order
 - A cell currently in execution has `[*]` before it. Multiple cells can be queued for execution, all having `[*]` before them.


 - While writing code, `tab` completion is available.
 - When calling a function, after the opening bracket `shift+tab` shows its docstring (arguments, default values, examples, etc).
 
 
 - Multiple cells can be selected (using `shift`), and cell operations are then performed on all of them.
 - When multiple cells are being scheduled to be executed, they will run in order. Execution stops when one of them raises an error.
 - The [Run] menu (at the top) contains options to run all cells above/below selected one.
 - More Vim-inspired shortcuts are available for navigation, such as **`j`** and **`k`**.

### About the Kernel
 - A new kernel is automatically started when you open a notebook.
 - The kernel can be shut down or restarted. 
 - All cell contents persist beyond kernels, but variables are given values only in the context of a running kernel.
 
 
 - While a cell is running, the kernel is _working_, indicated by ⬤ in the top right corner.
 - After finishing execution, the kernel is _idle_, indicated by ◯ in the top right corner.
 - The kernel can be interrupted if it is stuck, or if processing takes too long. Variables are intact, just the cell that was running has been stopped.
 
 
 - To clean-up the results from all cells, use [Kernel] > [Restart Kernel and Clear Outputs]
 - To re-execute the entire notebook, use [Kernel] > [Restart Kernel and Run All Cells]

### About the Notebook

- Every 120 seconds, Jupyter will autosave your notebook into the `.ipynb_checkpoints` folder. It stores a copy of your current `.ipynb` file, without altering your current one (and with no running kernel). It can be reverted using [File] > [Revert Notebook to Checkpoint].
- Inline images can drastically increase the size of a notebook. 
- Code platforms (such as GitHub, Bitbucket, GitLab, Atlassian's Git Workflow) render notebooks natively.
- [NBViewer](https://nbviewer.jupyter.org) is an online tool which allows you to visualize any notebook without having to start an instance of Jupyter locally.

**ℹ️ Tip**: restarting the kernel and running all cells ensures that the notebook has no stale variables and is ready to share.

### Interface Interaction
 - Navigate folders and files in the [File Browser] (on the sidebar to the right, or **`ctrl+b`**).
 - Switch between open notebooks using the tabs at the top.
 - Select multiple files and move them using drag-n-drop.
 - Download/upload using drag-n-drop to/from your system's file browser.
 - Split the screen and view multiple files at once by arranging them through drag-n-drop.
 - As in any Unix system, notebooks can be safely renamed without affecting their current state.
 
 - [View] > [Show line numbers] to more easily track error lines.
 - Each cell's input and output can be collapsed by clicking the blue bar to the left of it.
 
 
 - Create a new file by opening the _File Browser_ and clicking the plus (**`+`**) sign.
 - Besides Python kernels, it also gives the option of creating _Terminal_ instances for running system commands.
 
 
 - Search all commands in the [Commands Palette] (on the sidebar to the right).
 - All commands presented above are also available in the menu bar at the top (and many more).
 
 - Jupyter can open many useful file types. Some samples are provided in the `example_files` folder:
  - plain text (`.txt`) 
  - images (`.png`, `.jpg`, etc)
  - JSON files (`.json`), non-editable but allows collapsing and expanding nodes
  - comma-separated-values (`.csv`), big files can be navigated smoothly
  - markdown files (`.md`) with a lightweight rendition of the formatting
  - Python source files (`.py`), with syntax highlighting
  - other languages such as HTML, with syntax highlighting

### Code Cells

#### Execution History

`In` and `Out` are special variables which automatically store the input and output of each executed cell (in their order of execution).

In [201]:
In[10]  # the content of tenth executed cell, as a string

'type(height) is float'

In [202]:
Out[10]  # the output of the tenth executed cell, if it had any

True
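Why `Out[10]` exists only "if it had any" output can be pictured with a toy model. The sketch below is an illustration in plain Python, not IPython's actual code: the names `history_in`, `history_out` and `run_cell` are hypothetical, and the value given to `height` is an assumption. It mirrors the fact that `In` behaves like a list of sources while `Out` behaves like a dictionary that only has entries for cells that produced a value.

```python
# Toy model of the execution history (an illustration, not IPython's actual code)
history_in = ['']   # list of cell sources; the placeholder keeps numbering starting at 1
history_out = {}    # dict: execution count -> result, only for cells that produced one
namespace = {}      # variables shared by all "cells"


def run_cell(source):
    """Run a one-line cell and record it in the history."""
    history_in.append(source)
    count = len(history_in) - 1
    try:
        result = eval(source, namespace)   # expressions produce a value
    except SyntaxError:
        exec(source, namespace)            # statements (e.g. assignments) do not
        result = None
    if result is not None:
        history_out[count] = result
    return result


run_cell('height = 1.82')            # assumed value, for illustration only
run_cell('type(height) is float')

print(history_in[2])     # type(height) is float
print(history_out[2])    # True
print(1 in history_out)  # False: the assignment produced no output
```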

There are also some shortcuts:

In [203]:
_10  # as a shortcut for Out[10]

True

In [204]:
_  # output of the most recently run cell

In [205]:
__  # output of the cell run before that

'HI!'

**👾 Trivia**: `__` is called a _dunder_ (for double under) in Python

#### Outputting Tricks

The assignment of a variable and its display can be combined into a single cell. As separate cells, they look like this:

In [206]:
a = 1 + 2

In [207]:
a

3

They can be combined because the output of a cell is the result of the last expression in it:

In [208]:
a = 1 + 2
a

3

---

The following cell evaluates the expression inside of it, a string, and shows its output, which is the same string:

In [209]:
'hello'

'hello'

The following cell, on the other hand, evaluates the expression inside of it, the `print` call, and shows its output (`None`, so nothing is displayed), while also showing the console output (the string given as argument):

In [210]:
print('hello')

hello


To better illustrate this, the following function produces both an output (the number 24) and console output (the string):

In [211]:
def custom_print(s):
    print(s)
    return 24

custom_print('hello')

hello


24
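The same distinction can be checked in plain Python. The sketch below, which assumes nothing beyond the standard library, captures what is written to the console with `contextlib.redirect_stdout` and compares it with the values that are returned.

```python
import io
from contextlib import redirect_stdout


def custom_print(s):
    print(s)     # console output (a side effect)
    return 24    # returned value (what the notebook shows as the cell's output)


buffer = io.StringIO()
with redirect_stdout(buffer):          # temporarily send console output to a buffer
    returned = custom_print('hello')
    nothing = print('hello')           # print itself returns None

console_output = buffer.getvalue()

print(repr(console_output))  # 'hello\nhello\n'
print(returned)              # 24
print(nothing)               # None
```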

---

Sometimes, we just want to run a statement for its side effects and don't care about the returned value:

In [212]:
def launch_missile():
    print('missile launched')
    success_odds = 0.8
    return success_odds

In [213]:
launch_missile()

missile launched


0.8

We can either assign it to a dummy variable (conventionally `_`):

In [214]:
_ = launch_missile()

missile launched


Or terminate the current statement using `;`:

In [215]:
launch_missile();

missile launched


---

You can change the behavior to show more than just the last expression:

In [216]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

In [217]:
name[:3]
name[-1]

'Spi'

'n'

In [218]:
InteractiveShell.ast_node_interactivity = 'last_expr'  # back to the default

### Text Cells

As illustrated throughout this document, _text_ cells can contain more than just plain text.

#### Markdown

_Markdown_ is a lightweight markup language with an intuitive, plain-text formatting syntax. Though not as powerful as other markup languages (such as HTML), it is widely used thanks to its simplicity and expressivity (GitHub readmes, Slack messages, StackOverflow posts, static site generators, project management tools).

Double-click this cell to see the source that creates these styles.


Text styles:
 - regular
 - **bold**
 - _italic_ 
 - `code` 
 - [link](https://www.example.com)
 
 
 
> this is a quote


block of code (text, not executable):
```html
<div id="greeting">hello</div>
```


Ordered and unordered lists:
 1. first
 2. second
 3. third
   - this is
   - a sublist

Headers (1`#` through 6`######`)

Below is a separation line:

---

Below is an embedded image (note the `!` before the link):

![usc logo](http://i238.photobucket.com/albums/ff58/Portergirl2311/University_of_Southern_California_s.png)

You can also embed GIFs:

![tommy flag](https://media.giphy.com/media/DpoZg6IWyRSsE/giphy.gif)

**ℹ️ Tip**: remember the format for a link like this:
 - the first part is what's displayed; it acts like a button, so it is surrounded by square brackets `[link]`
 - the second part is what it links to `(address.com)`

#### Latex

_Latex_ is the standard in scientific documents. It can be used to typeset beautiful equations such as $e^{i\pi} + 1 = 0$

**ℹ️ Tip**: the combination of Markdown and Latex is a common one. It blends quick organization with complex snippets when needed. This makes it very useful in contexts such as note taking (and not only for STEM fields). I recommend [Typora](https://typora.io) for a standalone editor and [StackEdit](https://stackedit.io/) for a web-based editor.

#### HTML

HTML is available when more complex formatting is needed:

In [219]:
from IPython.display import HTML

In [220]:
HTML('<div style="text-align: center; color: orange; font-size: 30px">Hello from HTML!</div>')
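Because `HTML` takes a plain string, the markup can be assembled in Python first. A minimal sketch: the `styled_div` helper is hypothetical (it is not part of IPython) and it uses the standard-library `html.escape` so that special characters in the text cannot break the markup.

```python
from html import escape


def styled_div(text, color='orange', size_px=30):
    """Return an HTML snippet showing `text` centered (hypothetical helper)."""
    style = f'text-align: center; color: {color}; font-size: {size_px}px'
    return f'<div style="{style}">{escape(text)}</div>'


markup = styled_div('Hello from <HTML>!')
print(markup)
```

In a notebook, passing the resulting string to `HTML(markup)` would render it like the cell above.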

Just a small example is shown in this workshop, but you can use (almost) everything from HTML in a notebook, even JS scripts:

In [221]:
HTML('''
    alert("Hello from JavaScript!")
''')

**💪 Exercise**: put `<script>` `</script>` tags  around the `alert` statement inside the string above and run the cell!

### Magics

A _magic_ is a special command for Jupyter. Line magics start with one `%` and cell magics start with `%%`.

#### Timing

Measure execution time for logging or optimization purposes.

In [222]:
from time import sleep

Measure how long the entire cell takes to run:

In [223]:
%%time
for i in range(3):
    print(i)
    sleep(.5)

0
1
2
CPU times: user 4.43 ms, sys: 2.12 ms, total: 6.55 ms
Wall time: 1.51 s
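The gap between the CPU times and the wall time comes from `sleep`: the process waits without using the processor. A rough plain-Python counterpart is sketched below with `time.perf_counter` (wall clock) and `time.process_time` (CPU time). It is an illustration of the two kinds of measurement, not how `%%time` is implemented, and the shorter sleep is chosen only to keep it quick.

```python
from time import perf_counter, process_time, sleep

wall_start = perf_counter()   # wall-clock timer
cpu_start = process_time()    # CPU time used by this process (sleeping is not counted)

for i in range(3):
    sleep(0.1)

wall_elapsed = perf_counter() - wall_start
cpu_elapsed = process_time() - cpu_start

print(f'Wall time: {wall_elapsed:.2f} s')
print(f'CPU time:  {cpu_elapsed:.2f} s')
```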


Measure how long a single line takes to run:

In [224]:
%time _=[n ** 2 for n in range(1_000_000)]

CPU times: user 279 ms, sys: 12.9 ms, total: 292 ms
Wall time: 291 ms


In [225]:
%time _=list(map(lambda n: n ** 2, range(1_000_000)));

CPU times: user 316 ms, sys: 16.8 ms, total: 333 ms
Wall time: 331 ms


Due to external environment variations, running it again might yield different results. Doing multiple trials is more robust to such noise:

In [226]:
%timeit _=[n ** 2 for n in range(1_000_000)]

258 ms ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [227]:
%timeit _=list(map(lambda n: n ** 2, range(1_000_000)))

308 ms ± 17.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
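Outside a notebook, the standard-library `timeit` module offers similar repeated measurements. A small sketch, with a smaller input size and fewer runs chosen here only to keep it quick:

```python
import timeit

# time each statement 5 times, executing it once per run
comprehension_runs = timeit.repeat(
    '[n ** 2 for n in range(10_000)]', repeat=5, number=1
)
map_runs = timeit.repeat(
    'list(map(lambda n: n ** 2, range(10_000)))', repeat=5, number=1
)

# the minimum is the measurement least affected by other activity on the machine
print(f'comprehension: {min(comprehension_runs) * 1000:.2f} ms (best of 5)')
print(f'map + lambda:  {min(map_runs) * 1000:.2f} ms (best of 5)')
```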


---

#### Auto-reloading

Continuously scan an external file and re-import it upon changes:

In [228]:
%load_ext autoreload
%autoreload 2
# first load the extension, then activate it

In [229]:
from lucky import lucky_number

In [230]:
lucky_number()

24

**💪 Exercise**: Change the number in `lucky.py`, save the file and then run the cell above again.
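For comparison, the manual counterpart in plain Python is `importlib.reload`. The sketch below is self-contained: it writes a throwaway module named `lucky_demo` (a hypothetical stand-in for `lucky.py`) into a temporary folder, edits it, and reloads it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# skip bytecode caching so the demo always re-reads the edited source file
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as folder:
    module_file = Path(folder) / 'lucky_demo.py'   # hypothetical stand-in for lucky.py
    module_file.write_text('def lucky_number():\n    return 24\n')

    sys.path.insert(0, folder)         # let Python find the new module
    importlib.invalidate_caches()
    import lucky_demo

    before = lucky_demo.lucky_number()

    # simulate editing and saving the file
    module_file.write_text('def lucky_number():\n    return 1234\n')
    importlib.reload(lucky_demo)       # re-execute the module's source
    after = lucky_demo.lucky_number()

    sys.path.remove(folder)

print(before, after)  # 24 1234
```

Unlike `%autoreload`, `importlib.reload` has to be called explicitly after each change.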

---

Pass variables between notebooks:
 - `%store x` to pass the `x` variable from the source notebook
 - `%store -r x` to assign the passed value to the variable `x` in the destination notebook

**ℹ️ Tip**: Jupyter is meant to be a complement for your IDE, not a replacement. The bulk of your processing (functions, classes, etc) should be organized in `.py` files, while notebooks should be used for importing public functions/classes, running them and inspecting the results.

---

Due to how Python modules are structured, this trick is needed in order to import from nested folders:

In [231]:
from sys import path
path.append('.')  # add the current root to the list of directories where to look for packages

In [232]:
from example_files.module import f

In [233]:
f()

4
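The effect of extending `sys.path` can be reproduced in isolation. In this sketch, the `demo_files` folder and its `module.py` are hypothetical stand-ins for `example_files/module.py`, created in a temporary directory so that the example is self-contained.

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # build <root>/demo_files/module.py
    package_dir = Path(root) / 'demo_files'
    package_dir.mkdir()
    (package_dir / 'module.py').write_text('def f():\n    return 4\n')

    sys.path.append(root)  # make <root> a place where Python looks for packages
    from demo_files.module import f
    result = f()
    sys.path.remove(root)

print(result)  # 4
```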

### Shell Integration

Direct interaction with the shell can be achieved by running system commands prepended with a bang (`!`):

In [234]:
!ls

__pycache__          intro.ipynb          workshop-2-WIP.ipynb
appendix.ipynb       lucky.py             workshop-3-WIP.ipynb
appendix.md          readme.md            workshop-4-WIP.ipynb
example_files        requirements.txt
index.html           workshop-1.ipynb


In [235]:
!echo 'Hello from shell!'

Hello from shell!


Command outputs can be assigned to Python variables:

In [236]:
n_files = !ls | wc -l

In [237]:
n_files  # does not always come in the desired format

['      13']

In [238]:
int(n_files[0].strip())  # but can be easily converted

13
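In a regular `.py` file, where the `!` syntax is not available, a similar result can be obtained with the standard-library `subprocess` module. A minimal sketch: to stay portable, the command run here is the Python interpreter itself printing a padded number (an assumption standing in for `ls | wc -l`).

```python
import subprocess
import sys

# stand-in command: ask the Python interpreter to print a padded number,
# similar to the padded output of `ls | wc -l`
completed = subprocess.run(
    [sys.executable, '-c', "print('      13')"],
    capture_output=True,  # collect stdout and stderr instead of showing them
    text=True,            # decode bytes into str
    check=True,           # raise CalledProcessError on a non-zero exit code
)

lines = completed.stdout.splitlines()  # one string per line of output
n_files = int(lines[0].strip())        # the same conversion as in the cell above

print(lines)    # ['      13']
print(n_files)  # 13
```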

A frequent use case is installing packages without leaving the notebook:

In [239]:
!pip install --upgrade pip

Requirement already up-to-date: pip in /Users/stefan/.virtualenvs/viz-workshop/lib/python3.7/site-packages (19.0.2)


### Interactivity

"Animations" can be created by clearing the output of a cell and then filling it again:

In [240]:
from IPython.display import clear_output

In [241]:
for i in range(5):
    clear_output()
    print(i)
    sleep(.5)

4


---

Besides code and text, Jupyter also supports _widgets_. They can be used as alternative input methods which also refresh on change.

_Note:_ you might have to run `!jupyter labextension install @jupyter-widgets/jupyterlab-manager` and possibly re-run Jupyter (`ctrl-C, ctrl-C` and `jupyter lab`) if this shows an error.

In [242]:
import ipywidgets
from ipywidgets import interact

In [243]:
def power(base, exp, negative):
    result = base ** exp
    if negative:
        result *= -1
    return round(result, 2)

interact(power, base=2.5, exp=3, negative=False);

interactive(children=(FloatSlider(value=2.5, description='base', max=7.5, min=-2.5), IntSlider(value=3, descri…
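Each time a widget changes, `interact` calls `power` again with the current values and shows the result. What the widget displays can therefore be previewed by calling the function directly. In this sketch, the list of widget states is made up for illustration:

```python
def power(base, exp, negative):
    result = base ** exp
    if negative:
        result *= -1
    return round(result, 2)


# each tuple mimics one position of the widgets: (base slider, exp slider, checkbox)
widget_states = [(2, 3, False), (2, 3, True), (1.5, 2, False)]

for base, exp, negative in widget_states:
    print(base, exp, negative, '->', power(base, exp, negative))

results = [power(*state) for state in widget_states]
```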

---

**ℹ️ Tip**: you might see other people/tutorials using Jupyter **Notebook**. The first version of the Jupyter environment was called _Jupyter Notebook_. We are now using the latest version, called _Jupyter Lab_, which has extra functionality. The files both versions operate on are called _Jupyter notebooks_ (or _notebooks_ for short). The extra functionality in Jupyter Lab includes tabs, split view, cell operations across notebooks, and support for multiple kinds of files, making it a more capable and efficient environment.

## Topics Not Covered

This workshop is not meant to exhaustively cover every functionality of Python. If you are curious and want to learn more, you can either learn more about the topics briefly covered today, or explore other basic topics that were not touched upon in this workshop:

- classes and inheritance
- `main` and executing
- modules and importing
- user input
- iterators
- decorators
- asynchronicity
- immutability
- debugging
- code style
- turning notebooks into presentations
- setting up Jupyter

## Further Reading
These are some of the resources that cover the important information and do so efficiently:

 - Python: 
   - [PDF cheatsheets](https://ehmatthes.github.io/pcc/cheatsheets/README.html)
   - [online cheatsheets](https://www.pythonsheets.com)
   - [interactive tutorial](https://www.codecademy.com/learn/learn-python-3)
   - [official tutorial](https://docs.python.org/3/tutorial/)
 - Jupyter: 
   - [built-in magics](https://ipython.readthedocs.io/en/stable/interactive/magics.html)
   - [built-in widgets](https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20List.html)
   - [gallery of interesting notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks#data-visualization-and-plotting)
   - [IPython options](https://ipython.readthedocs.io/en/stable/config/options/terminal.html)
   - [list of tips and tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/)
   - [contrib extensions](https://github.com/ipython-contrib/jupyter_contrib_nbextensions)
 - Regex: [official reference](https://docs.python.org/3/library/re.html)
 - Latex: [tutorial series](https://www.sharelatex.com/blog/latex-guides/beginners-tutorial.html)
 - Markdown: [GFM guide](https://guides.github.com/features/mastering-markdown/)
 - HTML: [interactive tutorial](https://www.codecademy.com/learn/learn-html)
 - JavaScript: [MDN guide](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide)
 - Command line: [interactive tutorial](https://www.codecademy.com/learn/learn-the-command-line)
 - Date formatting: [reference](http://strftime.org)
 - Number formatting: [reference](https://docs.python.org/3/library/string.html#formatspec)
 - IDE: [PyCharm](https://www.jetbrains.com/pycharm/) (free with .edu email)
 - Code Style: discussed in the [appendix](appendix.ipynb)