Andreas Bollig - Freelance Data Scientist - andreas.bollig@communityredi-school.org

# Lecture 13 - Intro to Libraries + Python Standard Library

* Library (called **package** in Python): collection of modules that extends the functionality of Python
    * Examples:
        * Pandas: Computation on tables of data
        * Django: Framework for web applications
        * Matplotlib: Data visualization
* Module: File with extension .py, which contains reusable Python code. Example: `greetings.py` 

-> show `greetings.py`
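
The file itself is not reproduced in these notes. Below is a minimal sketch of what `greetings.py` could look like, reconstructed from the outputs of the cells that follow. The function bodies are an assumption, not the original file.

```python
# greetings.py (hypothetical reconstruction, not the original file)

# Top-level code like this print runs once, the first time the module is imported
print("You imported greetings")


def hello(name):
    """Return a friendly greeting."""
    return f"Hello {name}!"


def howdy(name):
    """Return a cowboy-style greeting."""
    return f"Howdy {name}!"


def whats_up(name):
    """Return a casual greeting."""
    return f"What's up {name}?"
```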

## Importing from a module

Note: The first time you import a module (or anything from it), all of its top-level code is executed! Later imports in the same session reuse the already loaded module.

In [1]:
import greetings

You imported greetings


In [2]:
greetings.hello("Anne")

'Hello Anne!'

In [6]:
greetings.whats_up('Henning')

"What's up Henning?"

In [3]:
from greetings import howdy

howdy("Paul")

'Howdy Paul!'

In [5]:
from greetings import hello
hello('Paul')

'Hello Paul!'

In [7]:
import greetings as g

g.whats_up("Jennifer")

"What's up Jennifer?"

In [None]:
from greetings import whats_up as sup

sup("Kevin")

## Module search

`greetings` can be imported because `greetings.py` is in the same folder as this notebook. What other places does the Python interpreter look when I try to import something?

1. Built-in modules (built into the Python interpreter, e.g., `sys`)
2. Modules and packages in the directories listed in `sys.path` (searched in the order of `sys.path`)

`sys.path` is initialized with

* The current directory you are in when starting the Python interpreter by typing `python` into the shell, or the directory containing the script you are running with `python my_script.py`
* Directories listed in the `PYTHONPATH` environment variable
* The default paths provided by your Python distribution (e.g. Anaconda) -> this includes the libraries you installed with `pip`
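
To see the search in action, you can ask the import system where it would find a module without actually importing it. This is a small sketch using `importlib.util.find_spec`; the module name `surely_not_installed_module_xyz` is made up for illustration.

```python
import importlib.util
import sys

# Built-in modules are compiled into the interpreter and have no .py file
print("sys" in sys.builtin_module_names)

# For other modules, find_spec reports where the import system would load them from
spec = importlib.util.find_spec("json")
print(spec.origin)

# A module that cannot be found anywhere yields None
missing = importlib.util.find_spec("surely_not_installed_module_xyz")
print(missing)
```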

In [None]:
# What is in my sys.path?

import sys

sys.path

In [None]:
# The directory I ran `jupyter notebook` in

import os

os.getcwd()

In [None]:
# I didn't specify PYTHONPATH
os.getenv("PYTHONPATH", "empty!")

In [None]:
# You can modify sys.path in your code (it's just a list of strings)

sys.path = ["my_secret_module_collection"] + sys.path

sys.path

In [None]:
# Then I can import modules from the path I added

from my_utils import add_one

add_one(1)

## Packages

A package is a directory containing a file called `__init__.py` ("dunder init dot p y")

Example:
```
sound/                          Top-level package
      __init__.py               Initialize the sound package
      formats/                  Subpackage for file format conversions
              __init__.py
              wavread.py
              wavwrite.py
              aiffread.py
              aiffwrite.py
              auread.py
              auwrite.py
              ...
      effects/                  Subpackage for sound effects
              __init__.py
              echo.py
              surround.py
              reverse.py
              ...
      filters/                  Subpackage for filters
              __init__.py
              equalizer.py
              vocoder.py
              karaoke.py
              ...
```

I created a scaffold of this folder structure in the current working directory.
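
If you want to build a similar scaffold yourself, the sketch below creates a tiny package in a temporary directory and imports from it. The package name `demo_sound` and the stub body of `echofilter` are assumptions chosen so the example does not clash with the `sound` folder used in this lecture.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # demo_sound/ and demo_sound/effects/ each need an __init__.py
    effects_dir = Path(tmp_dir, "demo_sound", "effects")
    effects_dir.mkdir(parents=True)
    Path(tmp_dir, "demo_sound", "__init__.py").touch()
    (effects_dir / "__init__.py").touch()

    # A stub module inside the subpackage
    (effects_dir / "echo.py").write_text(
        "def echofilter(samples):\n    print('Not implemented!!')\n"
    )

    # Make the temporary directory importable, then import the submodule
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    try:
        echo = importlib.import_module("demo_sound.effects.echo")
        echo.echofilter([3, 5, 7])
    finally:
        sys.path.remove(tmp_dir)
```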

In [8]:
import sound.effects.echo

sound.effects.echo.echofilter([3, 5, 7, 9, 1, 6, 3, 0])

Not implemented!!


In [None]:
from sound.effects import echo

echo.echofilter([3, 5, 7, 9, 1, 6, 3, 0])

In [None]:
from sound.effects.echo import echofilter

echofilter([3, 5, 7, 9, 1, 6, 3, 0])

## Python Package Index (PyPI -> "py p i")

* https://pypi.org/
* Central repository of Python packages (currently over 300k packages)
* Example: https://pypi.org/project/pandas/ -> look around
* Install packages with `pip install <package name>`
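
After installing, you can check from Python which version of a package you have. This is a minimal sketch using `importlib.metadata` (available since Python 3.8); the helper name `installed_version` and the second package name are made up for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("pandas"))
print(installed_version("surely-not-a-real-package-xyz"))
```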

### More about packages 

* `conda` package manager (Anaconda package repository, conda-forge, ...), `poetry`, `pipenv`, ... 
* Dependency definition files
* Virtual environments

---> Lecture 17: "Dependency Management 101"

### How do I publish my own package on PyPI?

Out of scope of this course, but you can start here: https://packaging.python.org/

## [Python Standard Library](https://docs.python.org/3/library/index.html) ("What do you mean, batteries included?")

* Python's standard library is huge. I'll pick a couple of examples that I use every day.

### Pick #1: datetime + calendar

In [9]:
from datetime import datetime, date, timedelta

In [11]:
# current date and time

now = datetime.now()

now

datetime.datetime(2021, 11, 3, 19, 32, 0, 272410)

In [12]:
# current date

today = date.today()

today

datetime.date(2021, 11, 3)

In [18]:
tomorrow = today + timedelta(days=1)

tomorrow

datetime.date(2021, 11, 4)

In [16]:
if now.day == 1:
    print("Payday, yay!")

In [19]:
tomorrow.day

4

In [None]:
if now.weekday() == 4:  # Friday -> weekday() goes from 0 to 6 for Monday to Sunday
    print("Hoooray, weekend!")

In [20]:
# Format datetimes as strings and parse datetimes from strings
# https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes

my_datetime = datetime(2021, 5, 4)

my_datetime

datetime.datetime(2021, 5, 4, 0, 0)

In [21]:
# German

my_datetime.strftime("%d.%m.%Y")

'04.05.2021'

In [22]:
# American

my_datetime.strftime("%m/%d/%Y")

'05/04/2021'

In [23]:
# ISO 8601
# example: 2021-11-03
my_datetime.strftime("%Y-%m-%d")

'2021-05-04'

In [24]:
# parse formatted datetime

datetime.strptime("05/04/2021", "%m/%d/%Y")

datetime.datetime(2021, 5, 4, 0, 0)
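
For the ISO format specifically, you do not need format codes. A short sketch using the built-in `isoformat` and `fromisoformat` methods (the latter requires Python 3.7 or newer):

```python
from datetime import date, datetime

my_datetime = datetime(2021, 5, 4)

# Date -> ISO string
iso_string = my_datetime.date().isoformat()
print(iso_string)  # 2021-05-04

# ISO string -> date
parsed = date.fromisoformat(iso_string)
print(parsed == my_datetime.date())  # True
```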

In [None]:
# (Year, calendar week, weekday) -> date
# Note, that here, weekday goes from 1 to 7 for Monday to Sunday
# New in Python 3.8!

date.fromisocalendar(2021, 18, 3)
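
The two weekday conventions are easy to mix up, so here is a small sketch that puts them side by side for the same date:

```python
from datetime import date

d = date.fromisocalendar(2021, 18, 3)  # year 2021, calendar week 18, Wednesday

print(d)                # 2021-05-05
print(d.weekday())      # 2 -> Monday is 0, Sunday is 6
print(d.isoweekday())   # 3 -> Monday is 1, Sunday is 7

# isocalendar() is the inverse of fromisocalendar()
iso_year, iso_week, iso_day = d.isocalendar()
```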

In [None]:
import calendar

# number of days in month

for i in range(12):
    year = 2010 + i
    n_days_in_month = calendar.monthrange(year, 2)[1]
    print(f"{year}: {n_days_in_month}")

More info:

* https://docs.python.org/3/library/datetime.html
* https://docs.python.org/3/library/calendar.html

Pro tip: check out python-dateutil: https://dateutil.readthedocs.io/en/stable/index.html

### Pick #2: itertools

In [26]:
import itertools

In [27]:
# all possible combinations of size n

team = ["Anne", "Paul", "Jennifer", "Kevin", "Lea", "Matthew"]

get_to_know_lunches = list(itertools.combinations(team, 2))

get_to_know_lunches

[('Anne', 'Paul'),
 ('Anne', 'Jennifer'),
 ('Anne', 'Kevin'),
 ('Anne', 'Lea'),
 ('Anne', 'Matthew'),
 ('Paul', 'Jennifer'),
 ('Paul', 'Kevin'),
 ('Paul', 'Lea'),
 ('Paul', 'Matthew'),
 ('Jennifer', 'Kevin'),
 ('Jennifer', 'Lea'),
 ('Jennifer', 'Matthew'),
 ('Kevin', 'Lea'),
 ('Kevin', 'Matthew'),
 ('Lea', 'Matthew')]
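
`itertools.combinations` ignores order, while `itertools.permutations` treats every ordering as a separate result. A small sketch with a made-up three-letter list shows the difference:

```python
import itertools

letters = ["A", "B", "C"]

# Order does not matter: ('A', 'B') appears, ('B', 'A') does not
combos = list(itertools.combinations(letters, 2))
print(len(combos))  # 3

# Order matters: both ('A', 'B') and ('B', 'A') appear
perms = list(itertools.permutations(letters, 2))
print(len(perms))  # 6
```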

More info: https://docs.python.org/3/library/itertools.html

### Pick #4: time

In [29]:
# Measure how long something takes

from time import time, sleep


def slow_func(n):
    sleep(3)
    print(n)


start = time()  # start the stopwatch, keep note of start time in variable start

slow_func("hello")  # run our function

end = time()  # stop the stopwatch

print(f"This took {round(end - start, 9)} seconds")

hello
This took 3.007156372 seconds
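
`time()` returns the wall-clock time, which can jump if the system clock is adjusted. For measuring durations, the standard library also offers `perf_counter()`. The sketch below wraps the stopwatch pattern in a helper; the names `timed` and `nap` are made up for illustration.

```python
from time import perf_counter, sleep


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = perf_counter()
    result = func(*args)
    return result, perf_counter() - start


def nap(seconds):
    sleep(seconds)
    return "done"


result, elapsed = timed(nap, 0.05)
print(f"{result} after {elapsed:.3f} seconds")
```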


### Pick #3: functools

In [31]:
from functools import cache, partial

# Note: functools.cache was only added in Python 3.9.
# On older versions, functools.lru_cache(maxsize=None) does the same job.
# You can check your version of Python as follows:

import sys

sys.version

'3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:37:25) [MSC v.1916 64 bit (AMD64)]'

In [32]:
# Cache slow computations

cached_slow_func = cache(slow_func)

start = time()
cached_slow_func("wow")
print(f"This took {round(time() - start, 1)} seconds")

wow
This took 3.0 seconds


In [33]:
start = time()
cached_slow_func("wow")
print(f"This took {round(time() - start, 1)} seconds")

This took 0.0 seconds


In [None]:
start = time()
cached_slow_func("still fast?")
print(f"This took {round(time() - start, 1)} seconds")

In [None]:
start = time()
cached_slow_func("still fast?")
print(f"This took {round(time() - start, 1)} seconds")
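
The cache is keyed by the arguments: the first call with a new argument is slow, and repeated calls with the same argument are fast. The sketch below makes this visible without waiting, by logging every real execution. It uses `lru_cache(maxsize=None)` so that it also runs on Python versions before 3.9; the function `square` is made up for illustration.

```python
from functools import lru_cache

call_log = []


@lru_cache(maxsize=None)  # same behaviour as functools.cache in Python 3.9+
def square(n):
    call_log.append(n)  # record every real (uncached) execution
    return n * n


square(4)  # computed
square(4)  # served from the cache
square(5)  # new argument, computed again

print(call_log)             # [4, 5]
print(square.cache_info())  # reports hits=1, misses=2
```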

In [35]:
# partial function application: fix some arguments now, pass the rest later


def add_number(n, m):
    print(f'n={n}, m={m}')
    return n + m


add_one = partial(add_number, 1)
add_two = partial(add_number, 2)

add_two(3)

n=2, m=3


5
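
`partial` can also fix keyword arguments, which is handy when the argument you want to pin is not the first one. A short sketch (the helper `power` is made up for illustration):

```python
from functools import partial

# Fix the keyword argument `base` of the built-in int()
parse_binary = partial(int, base=2)
print(parse_binary("101"))  # 5


def power(base, exponent):
    return base ** exponent


# Fix the second parameter by name
square = partial(power, exponent=2)
print(square(7))  # 49
```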

### Pick #4: os + shutil

Working with files and more

In [None]:
import os
import shutil

In [None]:
with open("test_file.txt", "w") as f:
    f.write("hello\n")

In [None]:
os.listdir()

In [None]:
shutil.move("test_file.txt", "new_name.txt")
os.listdir()

In [None]:
os.remove("new_name.txt")
os.listdir()
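
If you would rather not create files in your working directory while experimenting, the same steps can be run inside a temporary directory. This is a sketch that assumes nothing beyond the standard library:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    old_path = os.path.join(tmp_dir, "test_file.txt")
    new_path = os.path.join(tmp_dir, "new_name.txt")

    # Create a file
    with open(old_path, "w") as f:
        f.write("hello\n")
    after_create = os.listdir(tmp_dir)

    # Rename (move) it
    shutil.move(old_path, new_path)
    after_move = os.listdir(tmp_dir)

    # Delete it
    os.remove(new_path)
    after_remove = os.listdir(tmp_dir)

# The temporary directory is deleted automatically at the end of the with block
print(after_create, after_move, after_remove)
```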

### Pick #5: json + pickle

Serialize everything

In [36]:
import json
import pickle

In [37]:
class employee:
    def __init__(self, name, age):
        self.name = name
        self.age = age


my_employee = employee("George", 58)

# The same data as a plain dictionary (used in the JSON example below)
employee_dict = {"name": "George", "age": 58}

In [41]:
my_employee

<__main__.employee at 0x25ac70e4d30>

In [None]:
# JSON can serialize lists, dicts and primitive data types just fine

with open("employee_file.json", "w") as f:
    json.dump(employee_dict, f)

In [None]:
# !cat employee_file.json # This only works on Linux and macOS

In [None]:
# But it can't serialize custom objects

try:
    with open("employee_file.json", "w") as f:
        json.dump(my_employee, f)
except TypeError as e:
    print(e)
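
One common workaround is to convert the object to a dictionary first. The sketch below uses the built-in `vars()`, which returns an object's attribute dictionary. The class `Employee` is a stand-in for the class defined above, and this only works for objects whose attributes are themselves JSON-serializable.

```python
import json


class Employee:  # hypothetical stand-in for the class defined above
    def __init__(self, name, age):
        self.name = name
        self.age = age


george = Employee("George", 58)

# Option 1: convert to a dict yourself
as_text = json.dumps(vars(george))
print(as_text)

# Option 2: let json call vars() on anything it cannot serialize itself
same_text = json.dumps(george, default=vars)
print(same_text)
```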

In [42]:
# pickle can serialize almost anything

with open("employee_file.pickle", "wb") as f:
    pickle.dump(my_employee, f)

In [43]:
with open("employee_file.pickle", "rb") as f:
    unpickled_employee = pickle.load(f)

unpickled_employee

<__main__.employee at 0x25ac6fe0100>

In [45]:
unpickled_employee.name

'George'
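
pickle also works in memory via `pickle.dumps` and `pickle.loads`, and it handles built-in types that JSON rejects, such as dates and sets. The record below is made-up example data. Keep in mind that the pickle documentation warns to only unpickle data you trust, because loading a pickle can execute arbitrary code.

```python
import json
import pickle
from datetime import date

# Made-up example data containing types JSON cannot represent
record = {"name": "George", "hired": date(2021, 11, 3), "skills": {"python", "sql"}}

payload = pickle.dumps(record)    # bytes
restored = pickle.loads(payload)  # an equal copy of the original
print(restored == record)

try:
    json.dumps(record)
except TypeError as e:
    print(f"json refused: {e}")
```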

More info:

* https://docs.python.org/3/library/pickle.html
* https://docs.python.org/3/library/json.html

### Wrap Up

For more info about the standard library, check out the two part tour: 

* Part 1: https://docs.python.org/3/tutorial/stdlib.html
* Part 2: https://docs.python.org/3/tutorial/stdlib2.html

# Exercises

### Exercise 1

Create a Python module containing a simple function. Import the module into this notebook and execute the function.

In [None]:
from advmath import plus_two

plus_two(3)

### Exercise 2

What date is 71 days from today?

In [None]:
date.today() + timedelta(71)

### Exercise 3

Which weekday is Halloween this year? Hint: it's on October 31.

In [None]:
weekdays = [
    "Monday",
    "Tuesday",
    "Wednesday",
    "Thursday",
    "Friday",
    "Saturday",
    "Sunday",
]


halloween_weekday = datetime(2021, 10, 31).weekday()
halloween_weekday_name = weekdays[halloween_weekday]

print(f"This year, halloween is on a {halloween_weekday_name}")

### Exercise 4

Write a function 'format_date' that formats a date so it looks like the following example

`print(format_date(datetime(2021, 5, 5)))` -> `Wednesday, May 05, 2021`

In [None]:
def format_date(dt):
    return dt.strftime("%A, %b %d, %Y")


format_date(datetime(2021, 5, 5))

### Exercise 5

What if I want to serialize a dictionary into a string instead of serializing it into a file? Maybe I need this to send the JSON string to a webserver.

Research how this can be done and program an example

In [None]:
my_dict = {"name": "Michael", "occupation": "Boss"}

json.dumps(my_dict)

### Exercise 6 (Bonus)

Write a random coffee meeting generator that produces pairs of employees from the list `team` (see above) such that no two people have a random coffee meeting twice. A list of random coffee meetings that already occurred is provided.

Example output: "This week's random coffees: [('Anne', 'Kevin'), ('Paul', 'Jennifer'), ('Lea', 'Matthew')]"

Hints: 

* You can use `itertools.permutations` to create all possible permutations of a list
* You can check if a tuple is in a list like this: `("Anne", "Paul") in already_had_coffee` -> True

In [None]:
already_had_coffee = [
    ("Anne", "Paul"),
    ("Jennifer", "Lea"),
    ("Kevin", "Matthew"),
    ("Anne", "Jennifer"),
]


def random_coffee_generator(team, already_had_coffee):
    for permutation in itertools.permutations(team, 6):
        permutation = list(permutation)

        #     print(f"Trying {permutation}")

        random_coffees = []
        while len(permutation) > 0:
            participant1 = permutation.pop(0)
            participant2 = permutation.pop(0)
            random_coffees.append((participant1, participant2))

        constellation_valid = True
        for done in already_had_coffee:
            # A meeting counts regardless of the order of the two names
            if done in random_coffees or done[::-1] in random_coffees:
                constellation_valid = False

        if constellation_valid:
            return f"These are this week's random coffees: {random_coffees}"


random_coffee_generator(team, already_had_coffee)
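
The solution above always returns the first valid constellation, so it is not actually random. A possible variation, sketched here under the assumption of an even team size, shuffles the team until it finds a valid pairing. The function name `shuffled_coffees` and the parameters `rng` and `max_tries` are made up for illustration.

```python
import random

team = ["Anne", "Paul", "Jennifer", "Kevin", "Lea", "Matthew"]
already_had_coffee = [
    ("Anne", "Paul"),
    ("Jennifer", "Lea"),
    ("Kevin", "Matthew"),
    ("Anne", "Jennifer"),
]


def shuffled_coffees(team, already_had_coffee, rng=random, max_tries=1000):
    """Return a random list of new pairs, or None if no pairing was found."""
    # frozensets make ("Paul", "Anne") equal to ("Anne", "Paul")
    done = {frozenset(pair) for pair in already_had_coffee}

    for _ in range(max_tries):
        shuffled = rng.sample(team, len(team))
        # Pair up neighbours: (1st, 2nd), (3rd, 4th), ...
        pairs = list(zip(shuffled[::2], shuffled[1::2]))
        if all(frozenset(pair) not in done for pair in pairs):
            return pairs
    return None


pairs = shuffled_coffees(team, already_had_coffee)
print(f"This week's random coffees: {pairs}")
```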

# Homework

* If you didn't finish all exercises in the class, do the remaining ones as homework
* Read up on Python modules: https://docs.python.org/3/tutorial/modules.html
* Skim the TOC of the Python Standard Library reference and check out things that you find interesting: https://docs.python.org/3/library/index.html

1. Open a console
2. Run `pip install numpy pandas matplotlib`