# 1. Environment Basics

While Python is the programming language for this workshop, Jupyter is the interactive environment in which we will operate. The first workshop looks at language basics and built-in packages, while future workshops focus on packages that open up many possibilities.

## 1a. Python

Python is a general-purpose dynamic programming language that uses significant whitespace and emphasizes readability.

Why Python?

- **In demand**: prolific not only in scientific computing but also in web development and other areas (among the top 5 most popular technologies, and the fastest-growing major programming language: [Stack Overflow 2019](https://insights.stackoverflow.com/survey/2019#most-popular-technologies))
- **Enjoyable**: easy to pick up and expressive, not unnecessarily verbose (second most loved programming language: [Stack Overflow 2019](https://insights.stackoverflow.com/survey/2019#most-loved-dreaded-and-wanted))
- **Strong community**: no shortage of quality packages and tutorials ([IEEE 2019](https://spectrum.ieee.org/computing/software/the-top-programming-languages-2019) named it the top programming language)

<img src="https://i.imgur.com/HqeQKP8.png" width=400></img>
<img src="https://i.imgur.com/K5iCwvH.png" width=450></img>

### Table of contents

- [The Python Language](#Python)
  - [Primitives](#Primitives)
  - [Variables](#Variables)
  - [Data Structures](#Data-Structures)
    - [List](#List)
    - [String](#String)
    - [Dictionary](#Dictionary)
    - [Others](#Other-Data-Structures)
    - [Iteration](#Iteration)
  - [Functions](#Functions)
  - [Conditional Statements](#Conditional-Statements)
  - [Error Handling](#Error-Handling)
  - [Documentation](#Documentation)
  - [Relevant Built-in Functions](#Relevant-Built-in-Functions)
  - [Relevant Built-in Packages](#Relevant-Built-in-Packages)
    - [Regular Expressions](#Regular-Expressions)
    - [OS](#OS)
    - [JSON](#JSON)
    - [Date and Time](#Date-and-Time)
    - [Serialization](#Serialization)

---

**HOW TO USE THIS DOCUMENT**:
- Run a code cell by selecting it and pressing **`ctrl+enter`**
- Click inside cells to edit code
- Press **b** to insert a new cell (or click [+ Code] in Colaboratory)

More commands will be shown in the second part of the workshop.

---

Comments are not executed:

In [1]:
# this is a comment

### Primitives

The most used Python primitives are:
 - `int`: integers
 - `float`: real numbers
 - `str`: strings (note that there is no character primitive)
 - `bool`: truth values (`True` or `False`)
 - `None`: absence of a value (similar to `null` in other languages)
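A quick way to see these types in action is to ask Python for the type of one example value of each (the values below are arbitrary):

```python
# one arbitrary example value per primitive type
examples = [42, 3.14, 'hello', True, None]

for value in examples:
    # repr() shows strings with their quotes; __name__ gives the bare type name
    print(repr(value), type(value).__name__)
```

Note that `None` has a type of its own, `NoneType`.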

### Basic arithmetic

In [2]:
1 + 9

10

In [3]:
2 ** 3  # power operator

8

### Variables

Assign a value to a symbol:

In [4]:
name = 'Peter Parker'  # strings use either single or double quotes
age = 21
height = 5.84
married = False

In [5]:
age / 2

10.5

In [6]:
age // 2  # floor division: keeps only the whole part (rounds down)

10
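Because `//` rounds down rather than towards zero, negative numbers can be surprising. Its companion `%` gives the remainder, and `divmod` returns both at once. A small sketch with arbitrary values:

```python
quotient = 21 // 2     # 10: rounds down
negative = -21 // 2    # -11: rounds down (towards negative infinity), not towards zero
remainder = 21 % 2     # 1: what is left after taking out 10 twos
both = divmod(21, 2)   # (10, 1): quotient and remainder together

print(quotient, negative, remainder, both)
```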

**💪 Exercise**: create a variable called `favorite_number`, which stores your favorite number (as an integer), and then compute its square:

In [7]:
favorite_number = 24

In [8]:
favorite_number ** 2

576

**ℹ️ Tip**: always give variables descriptive names; favor readability over terseness. A descriptive name makes the variable easier to work with and lets the programmer more intuitively predict what is going to happen.

**ℹ️ Tip**: make number literals more readable by:
 - omitting the leading zero (`.24 + .17` instead of `0.24 + 0.17`)
 - omitting the trailing zero when forcing float conversion (`10.` instead of `10.0`)
 - separating long numbers using `_` (`1_240_000` instead of `1240000`)

---

Find out what kind of value a variable holds:

In [9]:
type(age)

int

In [10]:
type(height) is float

True
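A common alternative to comparing `type(...)` directly is `isinstance`, which accepts a tuple of types and also accepts subclasses. Note that `bool` is a subclass of `int`, so `True` counts as an integer. A small sketch:

```python
height = 5.84

print(isinstance(height, float))         # True
print(isinstance(height, (int, float)))  # True: any of the listed types matches
print(isinstance(True, int))             # True: bool is a subclass of int
print(type(True) is int)                 # False: the exact type is bool
```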

---

Convert compatible values:

In [11]:
int(2.8)  # truncates towards zero, does not round

2

In [12]:
int('2')

2

In [13]:
float(2)

2.0

In [14]:
str(123)

'123'
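A few conversion edge cases worth knowing: `int()` truncates towards zero instead of rounding, `round()` sends exact halves to the nearest even number, and converting a string that does not look like the target type raises a `ValueError`. A sketch with arbitrary values:

```python
truncated = int(-2.8)   # -2: truncates towards zero
rounded = round(2.8)    # 3
tie = round(2.5)        # 2: ties round to the nearest even number
parsed = float('2.5')   # 2.5

try:
    int('2.5')          # '2.5' is not a valid integer literal
except ValueError as error:
    print('ValueError:', error)

print(truncated, rounded, tie, parsed)
```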

**💪 Exercise**: check the type of `'7'` before and after conversion to `int` and `float`.

In [15]:
type('7')

str

In [16]:
type(int('7'))

int

In [17]:
type(float('7'))

float

### Data Structures

Data structures are containers for other values.

#### List

A list is an ordered, mutable sequence of elements.

In [18]:
squares = [0, 1, 4, 9]

In [19]:
squares

[0, 1, 4, 9]

In [20]:
squares.append(25)  # add one element at the end

In [21]:
squares

[0, 1, 4, 9, 25]

In [22]:
len(squares)  # number of elements

5

In [23]:
squares[0]  # first element, zero-indexed

0

In [24]:
squares[-1]  # last element: negative indices count from the end

25

In [25]:
4 in squares  # test membership

True

In [26]:
5 in squares  # the list does not contain 5

False

In [27]:
squares.index(4)  # get the index of an element

2

In [28]:
squares

[0, 1, 4, 9, 25]

In [29]:
squares[1:]  # slice: from the second element onwards

[1, 4, 9, 25]

In [30]:
squares[:-1]  # all but the last element

[0, 1, 4, 9]

In [31]:
squares + [49, 'abc', 'xyz']  # concatenate lists

[0, 1, 4, 9, 25, 49, 'abc', 'xyz']
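Slices accept an optional third value, `list[start:stop:step]`, where `stop` is excluded and a negative `step` walks backwards. A sketch using the same values as `squares` above:

```python
squares = [0, 1, 4, 9, 25]

middle = squares[1:3]        # [1, 4]: indices 1 and 2, stop index 3 excluded
every_other = squares[::2]   # [0, 4, 25]: every second element
backwards = squares[::-1]    # [25, 9, 4, 1, 0]: a negative step walks backwards

print(middle, every_other, backwards)
```

Slicing always creates a new list, so `squares` itself is left unchanged.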

**💪 Exercise**: create another list `cubes` containing the first three cubes. Then create another list `L` by concatenating `squares` and `cubes`. Access the element in the middle of the list:

In [32]:
cubes = [0, 1, 8]
L = squares + cubes

In [33]:
L

[0, 1, 4, 9, 25, 0, 1, 8]

In [34]:
L[len(L) // 2]

25

#### String

A string behaves much like a list, except its elements are characters.

**ℹ️ Tip**: unlike strings in C-like languages, Python strings are not made up of `char`s (each element is itself a string of length 1), and they are immutable.
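Immutability means an existing string cannot be changed in place; operations such as slicing or `+` build a new string instead. A small sketch (the variable `hero_name` is made up so it does not clash with `name`):

```python
hero_name = 'Peter Parker'

print(type(hero_name[0]))  # <class 'str'>: a single character is just a string of length 1

try:
    hero_name[0] = 'p'     # strings cannot be modified in place
except TypeError as error:
    print('TypeError:', error)

# instead, build a new string from pieces of the old one
lowercase_first = 'p' + hero_name[1:]
print(lowercase_first)     # peter Parker
```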

In [35]:
name

'Peter Parker'

In [36]:
len(name)

12

In [37]:
name[0]

'P'

In [38]:
name[1:]

'eter Parker'

In [39]:
'Hello ' + name

'Hello Peter Parker'

In [40]:
'ha' * 3

'hahaha'

---

Some useful built-in functions operating on strings:

In [41]:
name.split()  # by default, it splits on whitespace

['Peter', 'Parker']

In [42]:
name.lower()

'peter parker'

In [43]:
'-'.join(['ab', 'cd', 'yz'])

'ab-cd-yz'

In [44]:
f'{name} is {age} years old'  # note the f at beginning

'Peter Parker is 21 years old'

In [45]:
'abc abcd'.replace('ab', 'X')

'Xc Xcd'

In [46]:
'  abc  '.strip()

'abc'

---

Control how many decimal places a number is displayed with:

In [47]:
pi = 3.14159

In [48]:
f'π is {pi}'

'π is 3.14159'

In [49]:
f'π as a whole number {pi:.0f}'  # no decimals

'π as a whole number 3'

In [50]:
f'first decimals of π {pi:.3f}'

'first decimals of π 3.142'

Format fractions (or any other number) as percentages; the value is multiplied by 100 before formatting:

In [51]:
per = 0.705  # 70.5%
f'as a percentage {per:.1%}'

'as a percentage 70.5%'

In [52]:
f'as a percentage {per:.4%}'

'as a percentage 70.5000%'
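The same format mini-language after the colon also supports thousands separators, padding and alignment. A few examples with arbitrary values:

```python
population = 1_240_000
score = 7

with_commas = f'{population:,}'     # '1,240,000'
zero_padded = f'{score:03d}'        # '007': pad with zeros to width 3
right_aligned = f'[{score:>5}]'     # '[    7]': right-align in a field of width 5
fixed_width = f'[{3.14159:8.2f}]'   # '[    3.14]': width 8, two decimals

print(with_commas, zero_padded, right_aligned, fixed_width)
```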

---

**💪 Exercise**: create a string `favorite_flavor` containing your favorite ice cream flavor, and then another string containing the sentence "My favorites are: vanilla and 24.00", built from `favorite_flavor` and `favorite_number` (shown with two decimals):

In [54]:
favorite_flavor = 'vanilla'

sentence = f'My favorites are: {favorite_flavor} and {favorite_number:.2f}'

In [55]:
sentence

'My favorites are: vanilla and 24.00'

**ℹ️ Tip**: Python strings can store any Unicode character, making it easy to work with symbols, different alphabets, or even emojis. Quotation marks can be included in a string by delimiting it with the other kind of mark (e.g.: `"Trader Joe's vegetables"`), or by escaping them with a backslash (e.g.: `'Trader Joe\'s vegetables'`).

#### Dictionary

A dictionary maps keys to values. It behaves like a list whose indices (keys) do not have to be numbers.

**ℹ️ Tip**: dictionaries are hash tables, so looking up a key takes constant time on average; through some [very clever engineering](https://mail.python.org/pipermail/python-dev/2012-December/123028.html), they also preserve insertion order (guaranteed since Python 3.7).

In [56]:
superhero_ages = {
    'Ironman':   36,
    'Hulk':      38,
    'Thor':      'varies',  # values do not have to be of the same type
}

In [57]:
len(superhero_ages)

3

In [58]:
superhero_ages['Ironman']

36

In [59]:
superhero_ages['Spiderman'] = 21  # add an element

In [60]:
superhero_ages

{'Ironman': 36, 'Hulk': 38, 'Thor': 'varies', 'Spiderman': 21}

In [61]:
del superhero_ages['Hulk']  # remove an element

In [62]:
superhero_ages

{'Ironman': 36, 'Thor': 'varies', 'Spiderman': 21}

In [63]:
'Deadpool' in superhero_ages  # whether the key is in the dictionary

False

In [64]:
'Thor' in superhero_ages

True

In [65]:
21 in superhero_ages  # membership tests check the keys, not the values

False
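Two related idioms, sketched on the same dictionary: searching the values explicitly with `.values()`, and `.get()`, which returns `None` (or a fallback of your choice) for a missing key, where `superhero_ages['Deadpool']` would raise a `KeyError`:

```python
superhero_ages = {'Ironman': 36, 'Thor': 'varies', 'Spiderman': 21}

in_values = 21 in superhero_ages.values()            # True: search the values instead of the keys
missing = superhero_ages.get('Deadpool')             # None: missing key, but no KeyError
fallback = superhero_ages.get('Deadpool', 'unknown') # 'unknown': custom fallback value

print(in_values, missing, fallback)
```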

**💪 Exercise**: create a dictionary which contains the words `"cardinal"` and `"gold"` as keys and their translation in your mother tongue as their values (or their hex color values if your mother tongue is English):

In [66]:
color_translations = {
    'cardinal': 'rosu',
    'gold':     'aur',
}

In [67]:
color_translations

{'cardinal': 'rosu', 'gold': 'aur'}

---

#### Other Data Structures

Tuples are similar to lists, but they are immutable and are typically used to group heterogeneous data (values of different kinds that belong together).

In [68]:
info = ('Los Angeles', 21, 'vanilla')  # information about a person

In [69]:
info[0]

'Los Angeles'

**ℹ️ Tip**: you can think of tuples as lightweight classes. If interested, you can also look into [named tuples](https://docs.python.org/3/library/collections.html#collections.namedtuple) and [data classes](https://realpython.com/python-data-classes/).
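A minimal named-tuple sketch, assuming hypothetical field names `city`, `age` and `flavor` for the `info` tuple above: fields can be read by name or by position, and, like any tuple, the result is immutable:

```python
from collections import namedtuple

# hypothetical record type mirroring the `info` tuple above
Person = namedtuple('Person', ['city', 'age', 'flavor'])
person = Person('Los Angeles', 21, 'vanilla')

print(person.city)  # Los Angeles: access by field name
print(person[1])    # 21: positional access still works

try:
    person.age = 22  # fields cannot be reassigned
except AttributeError:
    print('named tuples are immutable')
```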

---

Sets are similar to lists, but they are unordered and cannot contain duplicates. Their advantage is that checking for membership takes constant time on average.

In [70]:
d = {'a': 1, 'b': 2}  # key: value pairs inside braces create a dict

In [71]:
s = {2, 1, 4}  # bare values inside braces create a set

In [72]:
2 in s

True

Converting a list into a set yields its unique elements:

In [73]:
set([1, 2, 1, 1, 3])

{1, 2, 3}
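A sketch of common set operations, using two made-up teams; the printed order of the elements may vary, since sets are unordered:

```python
team_a = {'Ironman', 'Thor', 'Spiderman'}
team_b = {'Ironman', 'Thor', 'Hulk'}

union = team_a | team_b         # in either set
intersection = team_a & team_b  # in both sets
difference = team_a - team_b    # in team_a but not in team_b: {'Spiderman'}

empty = set()                   # note: {} creates an empty dict, not an empty set
print(union, intersection, difference, type(empty), type({}))
```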

**ℹ️ Tip**: checking if an element is contained in a collection is trivial when there are 5 elements to compare against, but when there are thousands or millions of them, and you need to do many such queries, it is essential to be able to do it quickly.

#### Iteration

Going through each element of a collection is called iteration.

In [74]:
squares

[0, 1, 4, 9, 25]

In [75]:
# list iteration
for sq in squares:
    print(sq)

0
1
4
9
25


In [76]:
for key in superhero_ages:
    print(key)

Ironman
Thor
Spiderman


In [77]:
for val in superhero_ages.values():
    print(val)

36
varies
21


In [78]:
# dict iteration
for hero, hero_age in superhero_ages.items():  # unpack each (key, value) pair
    print(hero, hero_age)

Ironman 36
Thor varies
Spiderman 21


---

Generate sequential numbers:

In [79]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


`range` optionally takes `start`, `stop` and `step` arguments; the `stop` value itself is never included:

In [80]:
for i in range(4, 12, 2):  # start from 4, add 2 each time, stop before reaching 12
    print(i)

4
6
8
10


**ℹ️ Tip**: this is the pythonic equivalent of the `for (i = 0; i < n; ++i)` idiom in C-like languages

---

Iteration-related functions:

In [81]:
colors = ['red', 'green', 'blue', 'black']

In [82]:
for col in colors:
    print(col)

red
green
blue
black


In [83]:
for col in reversed(colors):  # from last to first
    print(col)

black
blue
green
red


In [84]:
for i, col in enumerate(colors):  # also give the index of each element
    print(i, col)

0 red
1 green
2 blue
3 black


In [85]:
for i, col in reversed(list(enumerate(colors))):  # enumerate objects cannot be reversed directly, hence list()
    print(i, col)

3 black
2 blue
1 green
0 red


In [86]:
for i, col in enumerate(reversed(colors)):
    print(i, col)

0 black
1 blue
2 green
3 red


In [87]:
for sq, col in zip(squares, colors):  # pair up elements of two lists (up to the shortest one's length)
    print(sq, col)

0 red
1 green
4 blue
9 black


In [88]:
squares  # just as a reminder

[0, 1, 4, 9, 25]
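Two more variations, sketched with the same lists: `enumerate` can start counting from a number other than 0, and `zip` combined with `dict` pairs two lists into a dictionary (again stopping at the shorter one):

```python
colors = ['red', 'green', 'blue', 'black']
squares = [0, 1, 4, 9, 25]

for position, col in enumerate(colors, start=1):  # count from 1 instead of 0
    print(position, col)

color_to_square = dict(zip(colors, squares))  # the extra 25 has no partner and is dropped
print(color_to_square)  # {'red': 0, 'green': 1, 'blue': 4, 'black': 9}
```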

---

In [89]:
# `while` loops are used less often than `for` loops
k = 0
while k <= 10:
    k += 1  # augmented assignment; Python has no k++ operator

In [90]:
k

11

**💪 Exercise**: print, one by one, the numbers from `100` to `110`, squared:

In [91]:
for i in range(100, 111):
    print(i**2)

10000
10201
10404
10609
10816
11025
11236
11449
11664
11881
12100


---

Multiple ways of creating a collection based on another's elements:

In [92]:
colors

['red', 'green', 'blue', 'black']

In [93]:
color_lengths = []

for col in colors:
    l = len(col)
    color_lengths.append(l)

In [94]:
color_lengths

[3, 5, 4, 5]

In [95]:
[len(col) for col in colors]  # list comprehension, equivalent but shorter

[3, 5, 4, 5]

In the next section we'll learn an even shorter way, using `map`.

In [96]:
{col: len(col) for col in colors}  # dict comprehension

{'red': 3, 'green': 5, 'blue': 4, 'black': 5}
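Comprehensions also exist for sets, and the same syntax in parentheses (or directly as a function argument) gives a lazy _generator expression_ that never builds an intermediate list. A small sketch with the same `colors`:

```python
colors = ['red', 'green', 'blue', 'black']

distinct_lengths = {len(col) for col in colors}  # set comprehension: {3, 4, 5}
total_letters = sum(len(col) for col in colors)  # generator expression: 3 + 5 + 4 + 5 == 17

print(distinct_lengths, total_letters)
```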

**💪 Exercise**: create the list `first_letters`, which contains the first letter of each color in `colors`:

In [97]:
first_letters = [col[0] for col in colors]

In [98]:
first_letters

['r', 'g', 'b', 'b']

### Functions

A _function_ is a block of code which only runs when called. Data can be given as input and results can be returned.

In [99]:
def double(n):
    # takes an argument and returns it doubled
    return n * 2

In [100]:
double = lambda n: n * 2  # equivalent but shorter way to write simple functions

In [101]:
double(5)

10

Duck-typing allows the function to work on any kind of argument that supports multiplication:

In [102]:
double(1.2)

2.4

In [103]:
double('ha')

'haha'

In [104]:
double([1, 2, 3])

[1, 2, 3, 1, 2, 3]

**ℹ️ Tip**: it is best practice to name functions as verbs, since they perform an action.

**ℹ️ Tip**: a very important thing to understand is that function arguments are neither call-by-value nor call-by-reference. Read more about [argument behavior](https://jeffknupp.com/blog/2012/11/13/is-python-callbyvalue-or-callbyreference-neither/) and [copies](https://docs.python.org/3.7/library/copy.html)
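A minimal sketch of this behavior, often described as "call by sharing": the function receives a reference to the caller's object, so mutating that object is visible outside, while reassigning the parameter name is not. The function names are made up for illustration:

```python
import copy

def add_ninety_nine(items):
    items.append(99)  # mutates the very list object the caller passed in

def replace_list(items):
    items = [0]       # rebinds the local name only; the caller's list is untouched

numbers = [1, 2]
add_ninety_nine(numbers)
replace_list(numbers)
print(numbers)  # [1, 2, 99]

snapshot = copy.copy(numbers)  # shallow copy: a new list holding the same elements
add_ninety_nine(snapshot)
print(numbers, snapshot)       # [1, 2, 99] [1, 2, 99, 99]
```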

---

Functions can take anything as arguments, even other functions.

One such special function is `map`, which takes two arguments, a function and a collection, and it applies the function to each element of the collection.

In [105]:
map(len, colors)  # get the length of each color, equivalent to definitions above

<map at 0x10f96a630>

In order to save memory and processing time, `map` uses lazy evaluation: it returns an iterator (a `map` object) rather than a list, and each element is computed only when it is requested:

In [106]:
result = map(len, colors)
for l in result:
    print(l)

3
5
4
5


Iterating over it a second time yields no elements, since they have all been consumed:

In [107]:
for l in result:
    print(l)

Another way to force the evaluation of the entire iterator is to convert it into a list:

In [108]:
list(map(double, squares))

[0, 2, 8, 18, 50]

In [109]:
squares  # just to remember what the original list looked like

[0, 1, 4, 9, 25]

**ℹ️ Tip**: such techniques do not noticeably impact performance when the list has only 5 elements, but with thousands or millions of elements it would be expensive to build the whole list up front only to consume its elements one by one. We might not even need more than the first couple of elements.
This applies not only when the list is very long, but also when computing each element is expensive. We will not go in depth, but look into [iterators](https://www.programiz.com/python-programming/iterator) and [generators](https://www.programiz.com/python-programming/generator) to learn more about them.
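A generator function uses `yield` to produce values lazily, one per request. The sketch below (the name `squares_forever` is made up) describes an infinite sequence, of which only the requested elements are ever computed; `itertools.islice` takes just the first few:

```python
from itertools import islice

def squares_forever():
    """Yield 0, 1, 4, 9, ... one value at a time, without ever building a list."""
    n = 0
    while True:
        yield n * n  # execution pauses here until the next value is requested
        n += 1

squares_gen = squares_forever()
print(next(squares_gen), next(squares_gen))  # 0 1

first_five = list(islice(squares_forever(), 5))
print(first_five)  # [0, 1, 4, 9, 16]
```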

---

A function can have default arguments:

In [110]:
def repeat(s, times=3):
    # by default, it repeats 3 times
    return s * times

In [111]:
repeat('ha ')

'ha ha ha '

In [112]:
repeat('ha ', 5)

'ha ha ha ha ha '

In [113]:
repeat('ha ', times=5)  # arguments can be named

'ha ha ha ha ha '
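Default values are evaluated once, when the function is defined, not on every call. This matters for mutable defaults such as lists; the usual workaround is a `None` default. A sketch with hypothetical function names:

```python
def add_item_buggy(item, items=[]):
    items.append(item)  # the same default list object is shared between calls
    return items

add_item_buggy('a')
second = add_item_buggy('b')
print(second)  # ['a', 'b']: the 'a' from the first call is still there

def add_item(item, items=None):
    if items is None:  # create a fresh list on every call instead
        items = []
    items.append(item)
    return items

print(add_item('a'), add_item('b'))  # ['a'] ['b']
```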

In [114]:
def repeat_extra(s):
    # returns multiple values, packed into a tuple
    return len(s), s * 3

In [115]:
repeat_extra('ha')

(2, 'hahaha')

In [116]:
result = repeat_extra('ha')
length   = result[0]
repeated = result[1]

In [117]:
(length, repeated) = repeat_extra('ha')  # shorter way of destructuring the result

In [118]:
length

2

In [119]:
repeated

'hahaha'

In [120]:
a, b, c = [1, 2, 3]  # works with any iterable, not just function results

In [121]:
a

1

In [122]:
[a, b, c] = [1, 2, 3, 4]  # gives an error upon mismatch of length

ValueError: too many values to unpack (expected 3)
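Unpacking can also collect the remaining elements with a starred name, and it gives a concise way to swap two variables. A sketch with arbitrary values:

```python
first, *rest = [1, 2, 3, 4]  # first == 1, rest == [2, 3, 4]
*init, last = [1, 2, 3, 4]   # init == [1, 2, 3], last == 4

x, y = 1, 2
x, y = y, x                  # swap without a temporary variable

print(first, rest, init, last, x, y)  # 1 [2, 3, 4] [1, 2, 3] 4 2 1
```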

**💪 Exercise**: create a `yell` function that takes a string. 
It returns the string in all caps and adds an exclamation mark at the end.

In [123]:
def yell(s):
    return s.upper() + '!'

In [124]:
yell('hello')

'HELLO!'

### Conditional Statements

Conditional statements control which code runs, depending on whether a condition holds:

In [125]:
def is_even(n):
    # tell whether the argument is divisible by two
    return (n % 2 == 0)

In [126]:
if is_even(2):
    print('it works!')
else:
    print("it's broken")

it works!


Multiple tests can be sequenced using the `if-elif-else` construct:

In [127]:
x = 13
if x % 3 == 0:
    print('divisible by three')
elif x % 5 == 0:
    print('divisible by five')
else:
    print('divisible by neither')

divisible by neither


You can also do inline evaluations (similar to the ternary operator `? :` in C-like languages):

In [128]:
size = 'big' if 2**10 > 100 else 'small'

In [129]:
size

'big'

---

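Every value is either _truthy_ or _falsy_ when used as a condition: `False`, `0`, `None`, and empty strings and collections are falsy, while most other values are truthy. So it is pythonic to check for a non-empty list by writing `if l` instead of `if len(l) > 0`: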

In [130]:
if True:
    print('truthy')

truthy


In [131]:
if 1:
    print('truthy')

truthy


In [132]:
if 0:
    print('truthy')

In [133]:
if 1.0:
    print('truthy')

truthy


In [134]:
if []:
    print('truthy')

In [135]:
if '':
    print('truthy')

In [136]:
if None:
    print('truthy')

---

`break` exits the loop:

In [137]:
for x in range(4):
    print(x)

0
1
2
3


In [138]:
for x in range(4):
    if x == 2:
        break
    print(x)

0
1


`continue` skips the rest of the current iteration:

In [139]:
for x in range(4):
    if x == 2:
        continue
    print(x)

0
1
3


---

Just like with mapping, there are multiple ways to create a list from another list's elements, keeping only those that satisfy a condition.

In [140]:
small_squares = []
for sq in squares:
    if sq < 5:
        small_squares.append(sq)

In [141]:
small_squares

[0, 1, 4]

In [142]:
[sq for sq in squares if sq < 5]  # works in list comprehension as well

[0, 1, 4]

In [143]:
list(filter(is_even, squares))  # filter is another higher-order function: it keeps only the elements that pass the predicate

[0, 4]

**ℹ️ Tip**: an `else` can also be attached to a `for` or `while` loop; it runs only if the loop finished without hitting a `break`:

In [144]:
target = 10  # change it! make it something that is and then something that is not in the list

for x in [2, 5, 1, 4, 2]:
    if x == target:
        break

else:  # no break
    print('target not in the list')

target not in the list


---

**💪 Exercise**: print the numbers from `1` to `25`, but for each multiple of 3 print `fizz`, for each multiple of 5 print `buzz`, and for multiples of both print `fizzbuzz`:

In [145]:
for n in range(1, 26):
    s = ''
    if n % 3 == 0:
        s += 'fizz'
    if n % 5 == 0:
        s += 'buzz'
    
    print(s if s else n)

1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizzbuzz
16
17
fizz
19
buzz
fizz
22
23
fizz
buzz


**👾 Trivia**: if you could solve the above exercise, congratulations! [You are better](https://softwareengineering.stackexchange.com/questions/15623/fizzbuzz-really) than 99% of applicants or "a significant portion" of programmers!

---

**💪 Exercise**: write a variant of the `yell` function, `maybe_yell`, that adds the exclamation mark at the end only if an optional boolean `exclaim` argument is `True`.

In [146]:
def maybe_yell(s, exclaim=False):
    s = s.upper()
    if exclaim:
        return s + '!'
    else:
        return s

In [147]:
maybe_yell('hi')

'HI'

In [148]:
maybe_yell('hi', exclaim=True)

'HI!'

**ℹ️ Tip**: writing guard clauses that exit early, before any processing takes place, reduces indentation levels and makes the code more readable:
```python
if check_1:
    # some pre-processing
    if check_2:
        # some processing
        return all_good
    else:
        return failed_check_2
else:
    return failed_check_1
```

Equivalent implementation with fewer levels of indentation:
```python
if not check_1:
    return failed_check_1
# some pre-processing
if not check_2:
    return failed_check_2
# some processing
return all_good
```

### Error Handling

In [149]:
l = [1, 2, 3]

In [150]:
len(l)

3

In [151]:
l[10]  # the list does not have that many elements, thus an error is raised

IndexError: list index out of range

In [152]:
try:
    my_result = l[10]
except:  # a bare except catches every kind of error
    print('an error occurred')

an error occurred


Multiple `except` clauses let you handle specific, "expected" kinds of errors, while still being alerted about unexpected ones:

In [153]:
try:
    lizt[10]
    
except IndexError:
    print('bad index')
    
except Exception as e:
    print('a different error occurred:', str(e))

a different error occurred: name 'lizt' is not defined
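Your own code can signal problems with `raise`, and a `try` statement can also have an `else` block (runs only when no exception occurred) and a `finally` block (runs in every case). A sketch with a hypothetical `parse_age` helper:

```python
def parse_age(text):
    """Convert text to a non-negative int; raise ValueError otherwise."""
    parsed = int(text)  # int() itself raises ValueError for text such as 'abc'
    if parsed < 0:
        raise ValueError(f'age cannot be negative: {parsed}')
    return parsed

accepted = []
for text in ['21', '-3', 'abc']:
    try:
        value = parse_age(text)
    except ValueError as error:
        print(f'{text!r} rejected: {error}')
    else:
        accepted.append(value)  # runs only if no exception was raised
        print(f'{text!r} accepted: {value}')
    finally:
        print('finished checking', repr(text))  # runs in every case
```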


### Documentation

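_Type hints_ tell the reader what the function expects to receive and to return. They are not enforced at runtime, and the standard Python interpreter does not use them to optimize code, but editors and static type checkers (such as mypy) use them to catch mistakes. Ultimately, they are just that: hints, not hard rules.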

In [154]:
def is_even(n: int) -> bool:
    return n % 2 == 0
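Because hints are not enforced, the annotated `is_even` still runs when given a float (a static checker such as mypy would typically flag that call); the annotations are simply stored on the function. A small sketch:

```python
def is_even(n: int) -> bool:
    return n % 2 == 0

print(is_even(4.0))             # True: a float is accepted at runtime
print(is_even.__annotations__)  # {'n': <class 'int'>, 'return': <class 'bool'>}
```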

**ℹ️ Tip**: it is best practice to prefix boolean variables with `is_`, `are_`, etc.

In [155]:
def is_even(n):
    """
    This docstring (a string literal placed first in the function body) explains what the function does.
    This function returns whether the integer passed as argument is divisible by two.
    """
    return n % 2 == 0
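Unlike a `#` comment, the docstring is kept at runtime: Python stores it on the function, and it is what `help()` displays. A minimal sketch:

```python
def is_even(n):
    """Return whether the integer passed as argument is divisible by two."""
    return n % 2 == 0

print(is_even.__doc__)  # the docstring is stored on the function object
help(is_even)           # shows the signature together with the docstring
```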

Short _doctests_ (usage examples embedded in the docstring) succinctly describe the function's behavior:

In [156]:
def is_even(n):
    """
    The examples below show usage and expected output:
    
    >>> is_even(2)
    True

    >>> is_even(5)
    False
    
    >>> is_even(1)
    False
    """
    return n % 2 == 0

These examples can also be run as tests, ensuring the implementation produces the documented outcome:

In [157]:
import doctest
doctest.testmod()

TestResults(failed=0, attempted=3)

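**💪 Exercise**: break one of the examples in `is_even` (above), for instance by changing an expected output, run the test suite to see how the failure is reported, then fix it and run the suite again.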

**ℹ️ Tip**: it is good practice to include a short description of what your function does. For more complex functions, add type information and examples of usage and expected outcome as well. In larger projects, this helps others (and yourself, after a while) understand what the code does more quickly. We spend about ten times more time reading code than writing it, so effort put into readability is sure to turn out valuable.

---

**💪 Exercise**: compute the minimum and maximum of the numeric values in the `superhero_ages` dictionary.

In [158]:
min_val = None
max_val = None

for v in superhero_ages.values():
    if type(v) not in [int, float]:
        continue
    if min_val is None or v < min_val:  # here, we are specifically checking against None, because `if not min_val` could also be triggered when `min_val == 0`
        min_val = v
    if max_val is None or v > max_val:
        max_val = v

In [159]:
min_val

21

In [160]:
max_val

36

In [161]:
superhero_ages

{'Ironman': 36, 'Thor': 'varies', 'Spiderman': 21}
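The same logic can be wrapped into a reusable function, here sketched with the built-in `min`/`max` and `isinstance` instead of manual comparisons. The name `numeric_range` is made up, and returning `None` when there are no numbers is one possible design choice:

```python
def numeric_range(mapping):
    """Return (minimum, maximum) of the int/float values in `mapping`, or None if there are none."""
    numbers = [
        value for value in mapping.values()
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    ]
    if not numbers:
        return None
    return min(numbers), max(numbers)

superhero_ages = {'Ironman': 36, 'Thor': 'varies', 'Spiderman': 21}
print(numeric_range(superhero_ages))  # (21, 36)
```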

### Relevant Built-in Functions

Sort a list (works on any data type that supports the `<` comparison operator; optionally, a `key` function can be given):

In [162]:
sorted([9, 1, 4])  # not in-place

[1, 4, 9]
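`sorted` also accepts `key` (a function applied to each element before comparing) and `reverse`, while the `list.sort` method sorts in place and returns `None`. Sorting is stable, so elements with equal keys keep their original order. A sketch with the same `colors`:

```python
colors = ['red', 'green', 'blue', 'black']

by_length = sorted(colors, key=len)           # ['red', 'blue', 'green', 'black']: 'green' stays before 'black'
descending = sorted([9, 1, 4], reverse=True)  # [9, 4, 1]

numbers = [9, 1, 4]
result = numbers.sort()  # sorts the list itself, in place

print(by_length, descending, numbers, result)  # ... [1, 4, 9] None
```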

Read the contents of a text file:

In [163]:
with open('example_files/plain.txt') as f:
    print('file contents:', f.read())

file contents: Hello!
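Writing works similarly, by passing a mode such as `'w'` (create or overwrite) to `open`. The sketch below writes to a temporary folder so it does not touch `example_files`; the file name `notes.txt` is made up:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'notes.txt')  # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:  # 'w' creates or overwrites the file
        f.write('Hello!\n')
        f.write('Second line\n')

    with open(path, encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]  # iterate over the file line by line

print(lines)  # ['Hello!', 'Second line']
```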



Aggregation functions:

In [164]:
any([True, True, False])  # returns True if at least one of the elements is True

True

In [165]:
all([True, True, False])  # returns True if every element is True

False

In [166]:
sum([1, 2, 3])  # works on numbers; for lists, pass a start value (see below)

6

In [167]:
max([1, 5, 2])

5

**ℹ️ Tip**: thanks to polymorphism, `sum` can also concatenate multiple lists into a single one, by making it start from the empty list:

In [168]:
sum([
    [1, 2, 3],
    [4, 5],
    [6, 7],
], [])  # start from []

[1, 2, 3, 4, 5, 6, 7]

In fact, `any`, `all`, and `sum` are all special cases of [reduce](https://docs.python.org/3/library/functools.html#functools.reduce) (known as `foldl` in functional languages). Another useful tool, especially when working with higher-order functions, is [partial](https://docs.python.org/3/library/functools.html#functools.partial). More useful functions are found in the [itertools](https://docs.python.org/3/library/itertools.html) module.
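A brief sketch of these tools with arbitrary values: `reduce` reproducing `sum` and `any`, `partial` fixing the `base` argument of `int`, and `itertools.chain` flattening lists (which, unlike `sum(..., [])`, avoids copying the partial result at every step):

```python
import operator
from functools import partial, reduce
from itertools import chain

total = reduce(operator.add, [1, 2, 3], 0)  # ((0 + 1) + 2) + 3 == 6, like sum
any_true = reduce(lambda acc, x: acc or x, [False, True, False], False)  # True, like any

parse_binary = partial(int, base=2)  # int() with its base argument fixed to 2
value = parse_binary('101')          # 5

flat = list(chain.from_iterable([[1, 2, 3], [4, 5], [6, 7]]))  # [1, 2, 3, 4, 5, 6, 7]

print(total, any_true, value, flat)
```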

### Relevant Built-in Packages

Beyond the basic functionality that is always available, some specialized functionality resides in _packages_ or modules, which need to be `import`ed before use. Usually, you call the function from the package by prefixing with the package name and a dot.
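The common import forms, sketched with the standard `math` and `datetime` modules:

```python
import math              # whole module: call as math.sqrt(...)
from math import sqrt    # a single name: call as sqrt(...)
import datetime as dt    # module under a shorter alias

print(math.sqrt(16), sqrt(16), dt.date(2019, 2, 10))  # 4.0 4.0 2019-02-10
```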

---

Work with information in structured text using regular expressions:

In [169]:
import re

Create rules for replacing patterns:

In [170]:
re.sub(
    pattern=r'\d{5}',  # matches any five consecutive digits
    repl='[removed]',
    string='Hi, this is funny_bunny_94 and my zip code is 90007',
)

'Hi, this is funny_bunny_94 and my zip code is [removed]'

Check whether a certain pattern is present at the start of a string:

A rudimentary email pattern for didactic purposes. It matches one or more word characters (letters, digits or underscores), followed by `@`, followed by at least three lowercase letters, a dot, and either `com` or `edu`.

In [171]:
if re.match(r'\w+@[a-z]{3,}\.(com|edu)', 'name94@example.com'):
    print('matched')

matched


Capture specific portions of the pattern using _named groups_:

In [172]:
match = re.match(r'(?P<username>\w+)@(?P<domain>[a-z]{3,})\.(com|edu)',
                 'tommy@usc.edu')

In [173]:
match['username']

'tommy'

In [174]:
match['domain']

'usc'
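`re.match` only looks for the pattern at the very start of the string. A sketch contrasting it with `re.search` (first match anywhere), `re.findall` (all matches) and `re.fullmatch` (the whole string must match), using a made-up sentence; `(?:...)` is a non-capturing group, so `findall` returns whole matches:

```python
import re

text = 'Contact tommy@usc.edu or name94@example.com for details'
pattern = r'\w+@[a-z]{3,}\.(?:com|edu)'

at_start = re.match(pattern, text)                # None: match only tries position 0
first_found = re.search(pattern, text).group()    # 'tommy@usc.edu': first match anywhere
all_found = re.findall(pattern, text)             # ['tommy@usc.edu', 'name94@example.com']
strict = re.fullmatch(pattern, 'a@usc.edu!')      # None: the trailing '!' is not allowed

print(at_start, first_found, all_found, strict)
```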

---

Sometimes we need to interact with the file system

In [175]:
import os  # library for OS-specific functions

In [176]:
os.makedirs('example_files', exist_ok=True)  # create a folder (unless it already exists)

In [177]:
from pathlib import Path  # a more modern module (Python 3.4+) that makes dealing with paths and folders a breeze

In [178]:
current_folder = Path('.')

In [179]:
# print only files in the current folder
for entry in current_folder.iterdir():
    if not entry.is_dir():
        print(entry)

index.html
.DS_Store
requirements.txt
intro.ipynb
lucky.py
readme.md
workshop-4-WIP.ipynb
.gitignore
workshop-2-WIP.ipynb
workshop-1.ipynb
appendix.ipynb
appendix.md
workshop-3-WIP.ipynb


In [180]:
current_folder / 'example_files'  # join path components using the / operator

PosixPath('example_files')
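`Path` objects can also read and write files directly and find files by pattern. The sketch below works inside a temporary folder so it does not touch the real `example_files`; the folder and file names are made up:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / 'example_files'  # hypothetical folder name
    folder.mkdir(exist_ok=True)

    note = folder / 'plain.txt'
    note.write_text('Hello!', encoding='utf-8')          # create the file and write to it
    contents = note.read_text(encoding='utf-8')          # read it back
    txt_files = [p.name for p in folder.glob('*.txt')]   # files matching a pattern

print(contents, txt_files, note.suffix, note.stem)  # Hello! ['plain.txt'] .txt plain
```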

---

Besides `csv` (explored in depth in the next workshop), `JSON` (JavaScript Object Notation) is another very popular data format

In [181]:
import json

In [182]:
with open('example_files/objects.json') as f:
    print(json.load(f))

[{'name': 'Alice', 'year': 2, 'grade': 3.9}, {'name': 'Bob', 'year': 3, 'grade': 3.8}, {'name': 'Chris', 'year': 1, 'grade': 3.85}]
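The module converts in the other direction as well: `json.dumps` turns Python lists and dicts into a JSON string and `json.loads` parses one back (`json.dump` and `json.load` are the file-based counterparts). A sketch with a made-up record:

```python
import json

students = [{'name': 'Alice', 'year': 2, 'grade': 3.9}]

text = json.dumps(students, indent=2)  # Python objects -> JSON-formatted string
print(text)

restored = json.loads(text)            # JSON string -> Python objects
print(restored == students)            # True
```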


---

Working with dates and times can be very messy, but thankfully, there are nice ways to accomplish this:

In [183]:
from datetime import datetime

In [184]:
datetime.now()

datetime.datetime(2019, 2, 10, 0, 55, 40, 818968)

In [185]:
datetime(year=2018, month=2, day=28)

datetime.datetime(2018, 2, 28, 0, 0)

In [186]:
# not built-in (install it with pip install dateparser), but related and very useful
from dateparser import parse as parse_date

In [187]:
parse_date('1 day and 2 hours ago')

datetime.datetime(2019, 2, 8, 22, 55, 41, 471784)

In [188]:
parse_date('28 feb')

datetime.datetime(2019, 2, 28, 0, 0)

In [189]:
parse_date('29 jan 2000')

datetime.datetime(2000, 1, 29, 0, 0)
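The built-in `datetime` module also covers date arithmetic with `timedelta` and conversion to and from text with `strftime` and `strptime` (the format codes are listed in the date-formatting reference under Further reading). A sketch with arbitrary dates:

```python
from datetime import datetime, timedelta

deadline = datetime(year=2018, month=2, day=28)

next_day = deadline + timedelta(days=1)                # 2018-03-01 00:00:00 (2018 is not a leap year)
as_text = deadline.strftime('%Y/%m/%d')                # '2018/02/28': datetime -> string
parsed = datetime.strptime('29 01 2000', '%d %m %Y')   # string -> datetime
elapsed = deadline - parsed                            # subtracting datetimes gives a timedelta

print(next_day, as_text, parsed, elapsed.days)
```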

---

Serialization converts in-memory objects into bytes, providing a way to store (and transfer) variables that cannot easily be represented in traditional file formats:

In [190]:
import pickle  # because you store it away... programmer humor 😅

In [191]:
serialized = pickle.dumps(superhero_ages)  # get a binary representation of the dictionary

In [192]:
serialized  # you can save this as a binary file

b'\x80\x03}q\x00(X\x07\x00\x00\x00Ironmanq\x01K$X\x04\x00\x00\x00Thorq\x02X\x06\x00\x00\x00variesq\x03X\t\x00\x00\x00Spidermanq\x04K\x15u.'

In [193]:
pickle.loads(serialized)  # load the binary representation back in memory

{'Ironman': 36, 'Thor': 'varies', 'Spiderman': 21}
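In practice the bytes are usually written to a file with `pickle.dump` and read back with `pickle.load`, opening the file in binary mode. The sketch below uses a temporary folder and a made-up file name. Note that the official documentation warns to only unpickle data you trust, since loading a pickle can execute arbitrary code:

```python
import os
import pickle
import tempfile

superhero_ages = {'Ironman': 36, 'Thor': 'varies', 'Spiderman': 21}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'ages.pickle')  # hypothetical file name
    with open(path, 'wb') as f:                 # 'wb': write in binary mode
        pickle.dump(superhero_ages, f)
    with open(path, 'rb') as f:                 # 'rb': read in binary mode
        restored = pickle.load(f)

print(restored == superhero_ages)  # True
```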

## Topics left uncovered

This workshop is not meant to exhaustively cover every functionality of Python. If you are curious, you can dig deeper into the topics briefly covered today, or explore other basic topics that were not touched upon in this workshop:

- classes and inheritance
- the `__main__` entry point and running scripts
- modules and importing
- user input
- iterators
- decorators
- asynchronicity
- immutability
- debugging
- code style

## Further reading
These are some of the resources that cover the important information and do so efficiently:

 - Python: 
   - [PDF cheatsheets](https://ehmatthes.github.io/pcc/cheatsheets/README.html)
   - [online cheatsheets](https://www.pythonsheets.com)
   - [interactive tutorial](https://www.codecademy.com/learn/learn-python-3)
   - [official tutorial](https://docs.python.org/3/tutorial/)
 - Regex: [official reference](https://docs.python.org/3/library/re.html)
 - Date formatting: [reference](http://strftime.org)
 - Number formatting: [reference](https://docs.python.org/3/library/string.html#formatspec)
 - IDE: [PyCharm](https://www.jetbrains.com/pycharm/) (free with .edu email)
 - Code Style: discussed in the [appendix](appendix.ipynb)