# Useful modules in the standard library

---

Python comes with a built-in selection of modules which provide commonly used functionality. This is only a brief overview of a small subset of the available modules – you can see the full list, and find out more details about each one [here](https://docs.python.org/3/library/index.html).

## Date and time: `datetime`

The `datetime` module provides us with objects which we can use to store information about dates and times:

+ `datetime.date` is used to create dates which are not associated with a time.
+ `datetime.time` is used for times which are independent of a date.
+ `datetime.datetime` is used for objects which have both a date and a time.
+ `datetime.timedelta` objects store differences between dates or datetimes – if we subtract one datetime from another, the result will be a timedelta.
+ `datetime.timezone` objects represent time zone adjustments as offsets from UTC. This class is a subclass of `datetime.tzinfo`, which is not meant to be used directly.

Here are a few examples:

In [1]:
import datetime

# this class method creates a datetime object with the current date and time
now = datetime.datetime.today()

print(now.year)
print(now.hour)
print(now.minute)

print(now.weekday())

print(now.strftime("%a, %d %B %Y"))

long_ago = datetime.datetime(1999, 3, 14, 12, 30, 58)

print(long_ago) # remember that this calls str automatically
print(long_ago < now)

difference = now - long_ago
print(type(difference))
print(difference) # remember that this calls str automatically

2020
9
17
4
Fri, 15 May 2020
1999-03-14 12:30:58
True
<class 'datetime.timedelta'>
7732 days, 20:46:02.521399
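
The example above only uses `datetime.datetime`. As a minimal sketch of the other classes in the list, the following builds a `date`, a `time`, a `timedelta` and a fixed-offset `timezone`. The specific dates and the two-hour offset are arbitrary values chosen for illustration.

```python
import datetime

# a date without a time, and a time without a date
launch_day = datetime.date(2020, 5, 15)
alarm = datetime.time(7, 30)

# adding a timedelta to a date gives another date
one_week = datetime.timedelta(days=7)
print(launch_day + one_week)  # 2020-05-22

# subtracting one date from another gives a timedelta
gap = datetime.date(2020, 12, 25) - launch_day
print(gap.days)  # 224

# a fixed offset of two hours ahead of UTC (an arbitrary example offset)
utc_plus_two = datetime.timezone(datetime.timedelta(hours=2))
meeting = datetime.datetime(2020, 5, 15, 9, 17, tzinfo=utc_plus_two)
print(meeting.isoformat())  # 2020-05-15T09:17:00+02:00

# the same moment expressed in UTC
meeting_utc = meeting.astimezone(datetime.timezone.utc)
print(meeting_utc.hour)  # 7
```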


## Mathematical functions: `math`

The `math` module is a collection of mathematical functions. These can be used on `floats` or `integers`, but are mostly intended to be used on `floats`, and usually return `floats`. Here are a few examples:

In [2]:
import math

# These are constant attributes, not functions
math.pi
math.e

# round a float up or down
math.ceil(3.3)
math.floor(3.3)

# natural logarithm
math.log(5)
# logarithm with base 10
math.log(5, 10)
math.log10(5) # this function is slightly more accurate

# square root
math.sqrt(10)

# trigonometric functions
math.sin(math.pi/2)
math.cos(0)

# convert between radians and degrees
math.degrees(math.pi/2)
math.radians(90)

1.5707963267948966
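
To illustrate the point about return types, here is a short sketch. It shows that most functions return floats even for integer input, that `math.floor` and `math.ceil` return integers in Python 3, and that `math.isclose` is a safer way to compare floating-point results than `==`. The variable names are only for illustration.

```python
import math

# floor and ceil return integers in Python 3
floored = math.floor(3.3)
print(type(floored))  # <class 'int'>

# most other functions return a float, even for integer input
root = math.sqrt(16)
print(root)  # 4.0

# the two ways of computing a base-10 logarithm may differ in the last digits,
# so compare floats with a tolerance rather than with ==
log_a = math.log(1000, 10)
log_b = math.log10(1000)
print(math.isclose(log_a, log_b))  # True
```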

## Pseudo-random numbers: `random`

Pseudo-random number sequences are generated by some kind of predictable algorithm, but they possess enough of the properties of truly random sequences that they can be used in many applications that call for random numbers. Because pseudo-random sequences aren’t actually random, it is also possible to reproduce the exact same sequence twice. That isn’t something we would want to do by accident, but it is a useful thing to be able to do deliberately while debugging software, or in an automated test.

In Python we use the `random` module to generate pseudo-random numbers, and do a few more things which depend on randomness. The core function of the module generates a random float between 0 and 1, and most of the other functions are derived from it. Here are a few examples:

In [3]:
import random

# a random float from 0 to 1 (excluding 1)
random.random()

pets = ["cat", "dog", "fish"]
# a random element from a sequence
random.choice(pets)
# shuffle a list (in place)
random.shuffle(pets)

# a random integer from 1 to 10 (inclusive)
random.randint(1, 10)

2
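
Reproducing the same sequence deliberately, as described at the start of this section, can be sketched with `random.seed`. The seed values 42 and 7 are arbitrary. A separate `random.Random` instance is one way to keep a reproducible generator that other code cannot disturb.

```python
import random

# seeding the generator with the same value reproduces the same sequence
random.seed(42)
first_run = [random.randint(1, 10) for _ in range(5)]

random.seed(42)
second_run = [random.randint(1, 10) for _ in range(5)]

print(first_run == second_run)  # True

# independent generator objects with the same seed also agree with each other
rng_a = random.Random(7)
rng_b = random.Random(7)
print(rng_a.random() == rng_b.random())  # True
```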

## Matching string patterns: `re`

The `re` module allows us to write regular expressions. Regular expressions are a mini-language for matching strings, and can be used to find and possibly replace text.

### Regular expressions

A regular expression is a string which describes a pattern. This pattern is compared to other strings, which may or may not match it. A regular expression can contain normal characters (which are treated literally as specific letters, numbers or other symbols) as well as special symbols which have different meanings within the expression.

Here are some very simple examples:

In [4]:
# this regular expression contains no special symbols
# it won't match anything except 'cat'
"cat"

# a . stands for any single character (except the newline, by default)
# this will match 'cat', 'cbt', 'c3t', 'c!t' ...
"c.t"

# a * repeats the previous character 0 or more times
# it can be used after a normal character, or a special symbol like .
# this will match 'ct', 'cat', 'caat', 'caaaaaaaaat' ...
"ca*t"
# this will match 'sc', 'sac', 'sic', 'supercalifragilistic' ...
"s.*c"

# + is like *, but the character must occur at least once
# there must be at least one 'a'
"ca+t"

# more generally, we can use curly brackets {} to specify any number of repeats
# or a minimum and maximum
# this will match any five-letter word which starts with 'c' and ends with 't'
"c.{3}t"
# this will match any five-, six-, or seven-letter word ...
"c.{3,5}t"

# One of the uses for ? is matching the previous character zero or one times
# this will match 'http' or 'https'
"https?"

# square brackets [] define a set of allowed values for a character
# they can contain normal characters, or ranges
# if ^ is the first character in the brackets, it *negates* the contents
# the character between 'c' and 't' must be a vowel
"c[aeiou]t"
# this matches any character that *isn't* a vowel, three times
"[^aeiou]{3}"
# This matches an uppercase UCT student number
"[B-DF-HJ-NP-TV-Z]{3}[A-Z]{3}[0-9]{3}"

# we use \ to escape any special regular expression character
# this would match 'c*t'
r"c\*t"
# note that we have used a raw string, so that we can write a literal backslash

# there are also some shorthand symbols for certain allowed subsets of characters:
# \d matches any digit
# \s matches any whitespace character, like space, tab or newline
# \w matches alphanumeric characters -- letters, digits or the underscore
# \D, \S and \W are the opposites of \d, \s and \w

# we can use round brackets () to *capture* portions of the pattern
# this is useful if we want to search and replace
# we can retrieve the contents of the capture in the replace step
# this will capture whatever would be matched by .*
"c(.*)t"

# ^ and $ denote the beginning or end of a string
# this will match a string which starts with 'c' and ends in 't'
"^c.*t$"

# | means "or" -- it lets us choose between multiple options.
"cat|dog"

'cat|dog'
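
One way to experiment with the patterns above is to test them against a few strings. This sketch uses `re.fullmatch` from the `re` module, which succeeds only if the whole string fits the pattern. The test strings are made up for illustration.

```python
import re

# (pattern, text, whether we expect the whole text to match)
checks = [
    ("ca*t", "ct", True),        # zero 'a's is allowed by *
    ("ca+t", "ct", False),       # + needs at least one 'a'
    ("c.{3}t", "carat", True),   # exactly three characters between c and t
    ("https?", "https", True),   # the 's' is optional
    ("c[aeiou]t", "cbt", False), # 'b' is not a vowel
    (r"c\*t", "c*t", True),      # escaped *, matched literally
    ("cat|dog", "dog", True),    # either alternative
]

results = []
for pattern, text, expected in checks:
    matched = re.fullmatch(pattern, text) is not None
    results.append(matched == expected)
    print(pattern, text, matched)

print(all(results))  # True
```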

### Using the `re` module

Now that we have seen how to construct regular expression strings, we can start using them. The `re` module provides us with several functions which allow us to use regular expressions in different ways:

+ `search` searches for the regular expression inside a string – the regular expression will match if any substring of the string matches.
+ `match` matches a regular expression against the beginning of the string – the regular expression will only match if the string starts with a match, and any text after the match is ignored. `re.match('something', some_string)` is roughly equivalent to `re.search('^something', some_string)` (without the `re.MULTILINE` flag). To require that the whole string matches, use `fullmatch`.
+ `sub` searches for the regular expression and replaces it with the provided replacement expression.
+ `findall` searches for all matches of the regular expression within the string.
+ `split` splits a string using any regular expression as a delimiter.
+ `compile` allows us to convert our regular expression string to a pre-compiled regular expression object, which has methods analogous to the functions of the `re` module. Reusing this object can be slightly more efficient when the same expression is applied many times.

Here are some usage examples:

In [5]:
import re

# match and search are quite similar
print(re.match("c.*t", "cravat")) # this will match
print(re.match("c.*t", "I have a cravat")) # this won't
print(re.search("c.*t", "I have a cravat")) # this will

# We can use a static string as a replacement...
print(re.sub("lamb", "squirrel", "Mary had a little lamb."))
# Or we can capture groups, and substitute their contents back in.
print(re.sub("(.*) (BITES) (.*)", r"\3 \2 \1", "DOG BITES MAN"))
# count is a keyword parameter which we can use to limit replacements
print(re.sub("a", "b", "aaaaaaaaaa"))
print(re.sub("a", "b", "aaaaaaaaaa", count=1))

# Here's a closer look at a match object.
my_match = re.match("(.*) (BITES) (.*)", "DOG BITES MAN")
print(my_match.groups())
print(my_match.group(1))

# We can name groups.
my_match = re.match("(?P<subject>.*) (?P<verb>BITES) (?P<object>.*)", "DOG BITES MAN")
print(my_match.group("subject"))
print(my_match.groupdict())
# We can still access named groups by their positions.
print(my_match.group(1))

# Sometimes we want to find all the matches in a string.
print(re.findall("[^ ]+@[^ ]+", "Bob <bob@example.com>, Jane <jane.doe@example.com>"))

# Sometimes we want to split a string.
print(re.split(", *", "one,two,  three, four"))

# We can compile a regular expression to an object
my_regex = re.compile("(.*) (BITES) (.*)")
# now we can use it in a very similar way to the module
print(my_regex.sub(r"\3 \2 \1", "DOG BITES MAN"))


<re.Match object; span=(0, 6), match='cravat'>
None
<re.Match object; span=(9, 15), match='cravat'>
Mary had a little squirrel.
MAN BITES DOG
bbbbbbbbbb
baaaaaaaaa
('DOG', 'BITES', 'MAN')
DOG
DOG
{'subject': 'DOG', 'verb': 'BITES', 'object': 'MAN'}
DOG
['<bob@example.com>,', '<jane.doe@example.com>']
['one', 'two', 'three', 'four']
MAN BITES DOG
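
The difference between `match`, `search` and `fullmatch` is easy to get wrong, so here is a small sketch which compares them on a single made-up string:

```python
import re

text = "category"

# match only looks at the start of the string; trailing text is ignored
start = re.match("cat", text)
print(start.span())  # (0, 3)

# fullmatch requires the whole string to match
whole = re.fullmatch("cat", text)
print(whole)  # None

# search finds a match anywhere in the string
middle = re.search("ego", text)
print(middle.span())  # (3, 6)

# match fails here, because 'ego' is not at the start
not_at_start = re.match("ego", text)
print(not_at_start)  # None
```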


### Greedy heuristics

Regular expressions are *greedy* by default – this means that if a part of a regular expression can match a variable number of characters, it will always try to match as many characters as possible. That means that we sometimes need to take special care to make sure that a regular expression doesn’t match too much. For example:

In [6]:
# this is going to match everything between the first and last '"'
# but that's not what we want!
print(re.findall('".*"', '"one" "two" "three" "four"'))

# This is a common trick
print(re.findall('"[^"]*"', '"one" "two" "three" "four"'))

# We can also use ? after * or other expressions to make them *not greedy*
print(re.findall('".*?"', '"one" "two" "three" "four"'))

['"one" "two" "three" "four"']
['"one"', '"two"', '"three"', '"four"']
['"one"', '"two"', '"three"', '"four"']


### Functions as replacement

We can use `re.sub` to apply a function to a match instead of a string replacement. The function must take a match object as a parameter, and return a string. We can use this functionality to perform modifications which may be difficult or impossible to express as a replacement string:

In [7]:
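def swap(m):
    # the new subject is the old object, and vice versa
    # (we avoid the name 'object', which would shadow a built-in)
    new_subject = m.group("object").title()
    verb = m.group("verb")
    new_object = m.group("subject").lower()
    return "%s %s %s!" % (new_subject, verb, new_object)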

print(re.sub("(?P<subject>.*) (?P<verb>.*) (?P<object>.*)!", swap, "Dog bites man!"))

Man bites dog!
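
As a further sketch of a change that a replacement string cannot express, the following doubles every number found in a string. The function name `double` and the sample sentence are made up for illustration.

```python
import re

def double(m):
    # m.group() with no arguments returns the whole matched text
    return str(int(m.group()) * 2)

result = re.sub(r"\d+", double, "3 apples and 10 pears")
print(result)  # 6 apples and 20 pears
```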


### Flags

Regular expressions have historically tended to be applied to text line by line – newlines have usually required special handling. In Python, the text is treated as a single unit by default, but we can change this and a few other options using flags. These are the most commonly used:

+ `re.IGNORECASE` – make the regular expression case-insensitive. It is case-sensitive by default.
+ `re.MULTILINE` – make ^ and $ match the beginning and end of each line (excluding the newline at the end), as well as the beginning and end of the whole string (which is the default).
+ `re.DOTALL` – make . match any character (by default it does not match newlines).

Here are a few examples:

In [8]:
print(re.match("cat", "Cat")) # this won't match
print(re.match("cat", "Cat", re.IGNORECASE)) # this will

text = """numbers = 'one,
two,
three'
numbers = 'four,
five,
six'
not_numbers = 'cat,
dog'"""

print(re.findall("^numbers = '.*?'", text)) # this won't find anything
# we need both DOTALL and MULTILINE
print(re.findall("^numbers = '.*?'", text, re.DOTALL | re.MULTILINE))

None
<re.Match object; span=(0, 3), match='Cat'>
[]
["numbers = 'one,\ntwo,\nthree'", "numbers = 'four,\nfive,\nsix'"]
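
To see what each flag does on its own, here is a minimal sketch which applies `re.MULTILINE` and `re.DOTALL` separately to a short two-line string (the string is made up for illustration):

```python
import re

text = "first line\nsecond line"

# without MULTILINE, ^ only matches at the very start of the string
print(re.findall(r"^\w+", text))  # ['first']
# with MULTILINE, ^ also matches at the start of each line
starts = re.findall(r"^\w+", text, re.MULTILINE)
print(starts)  # ['first', 'second']

# without DOTALL, . stops at the newline
one_line = re.match(r".*", text).group()
print(one_line)  # first line
# with DOTALL, . matches the newline too
everything = re.match(r".*", text, re.DOTALL).group()
print(everything == text)  # True
```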


## Parsing CSV files: `csv`

CSV stands for comma-separated values – it’s a very simple file format for storing tabular data. Most spreadsheets can easily be converted to and from CSV format.

In a typical CSV file, each line represents a row of values in the table, with the columns separated by commas. Field values are often enclosed in double quotes, so that any literal commas or newlines inside them can be escaped:

    "one","two","three"
    "four, five","six","seven"
    
Python’s `csv` module takes care of all this in the background, and allows us to manipulate the data in a CSV file in a simple way, using a reader object:

In [9]:
import csv

# newline="" lets the csv module handle line endings itself
with open("numbers.csv", newline="") as f:
    r = csv.reader(f)
    for row in r:
        print(row)

# Similarly, we can write to a CSV file using a writer object:

with open('pets.csv', 'w', newline="") as f:
    w = csv.writer(f)
    w.writerow(['Fluffy', 'cat'])
    w.writerow(['Max', 'dog'])

['one', 'two', 'three']
['four, five', 'six', 'seven']
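
To see the quoting at work without creating any files, here is a sketch which writes rows to an in-memory buffer and reads them back. Using `io.StringIO` in place of a real file is a choice made here for illustration. The rows are the same values as in the sample file above.

```python
import csv
import io

rows = [["one", "two", "three"], ["four, five", "six", "seven"]]

# an in-memory text buffer standing in for a file
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerows(rows)

# by default, only the field which contains a comma is quoted
written = buffer.getvalue()
print(written)

# rewind the buffer and read the rows back
buffer.seek(0)
read_back = list(csv.reader(buffer))
print(read_back == rows)  # True
```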


## Writing scripts: `sys` and `argparse`

Technically speaking, any Python file can be considered a script, since it can be executed without compilation. When we call a Python program a script, however, we usually mean that it contains statements other than function and class definitions – scripts do something other than define structures to be reused.

### Scripts vs libraries

We can combine class and function definitions with statements that use them in the same file, but in a large project it is considered good practice to keep them separate: to define all our classes in library files, and import them into the main program. If we do put both classes and main program in one file, we can ensure that the program is only executed when the file is run as a script and not if it is imported from another file:

In [None]:
class MyClass:
    pass

class MyOtherClass:
    pass

if __name__ == '__main__':
    my_object = MyClass()
    # do more things

### Simple command-line parameters

When we run a program on the command line, we often want to pass in parameters, or arguments, just as we would pass parameters to a function inside our code. Unlike parameters passed to a function in Python, arguments passed to an application on the command line are separated by spaces and listed after the program name without any brackets.

The simplest way to access command-line arguments inside a script is through the `sys` module. All the arguments in order are stored in the module’s `argv` attribute. We must remember that the first argument is always the name of the script file, and that all the arguments will be provided in string format. Try saving this simple script and calling it with various arguments after the script name:

import sys

print(sys.argv)
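
Because every argument arrives as a string, numeric arguments have to be converted before use. Here is a minimal sketch of a script which adds up the numbers it is given. The function name `total` and the file name `add.py` mentioned below are hypothetical.

```python
import sys

def total(arguments):
    """Add up a list of numbers given as strings."""
    return sum(int(a) for a in arguments)

if __name__ == "__main__":
    # sys.argv[0] is the script name; the real arguments start at index 1
    script_name = sys.argv[0]
    numbers = sys.argv[1:]
    print(script_name, total(numbers))
```

If this were saved as `add.py`, running `python add.py 1 2 3` would print the script name followed by `6`.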

### Complex command-line parameters

The `sys` module is good enough when we only have a few simple arguments – perhaps the name of a file to open, or a number which tells us how many times to execute a loop. When we want to provide a variety of complicated arguments, some of them optional, we need a better solution.

The `argparse` module allows us to define a wide range of compulsory and optional arguments. A commonly used type of argument is the flag, which we can think of as equivalent to a keyword argument in Python. A flag is optional; it has a name (sometimes both a long name and a short name) and it may have a value. In Linux and macOS programs, flag names often start with a dash (long names usually start with two), and this convention is sometimes followed by Windows programs too.

Here is a simple example of a program which uses `argparse` to define two positional arguments which must be integers, a third positional argument which specifies an operation to be performed on the two numbers, and a flag to turn on verbose output:

In [None]:
import argparse
import logging

parser = argparse.ArgumentParser()
# two integers
parser.add_argument("num1", help="the first number", type=int)
parser.add_argument("num2", help="the second number", type=int)
# a string, limited to a list of options
parser.add_argument("op", help="the desired arithmetic operation", choices=['add', 'sub', 'mul', 'div'])
# an optional flag, false by default, with a short and a long name
parser.add_argument("-v", "--verbose", help="turn on verbose output", action="store_true")

opts = parser.parse_args()

if opts.verbose:
    logging.basicConfig(level=logging.DEBUG)

logging.debug("First number: %d", opts.num1)
logging.debug("Second number: %d", opts.num2)
logging.debug("Operation: %s", opts.op)

if opts.op == "add":
    result = opts.num1 + opts.num2
elif opts.op == "sub":
    result = opts.num1 - opts.num2
elif opts.op == "mul":
    result = opts.num1 * opts.num2
elif opts.op == "div":
    result = opts.num1 / opts.num2

print(result)
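
The program above has no flag which takes a value. As a sketch of that case, the following hypothetical parser accepts a `--count` flag with an integer value and a default. Passing a list to `parse_args` lets us try the parser out without a real command line. The argument names are made up for illustration.

```python
import argparse

parser = argparse.ArgumentParser(description="repeat a word")
# a compulsory positional argument
parser.add_argument("word", help="the word to repeat")
# an optional flag which takes an integer value, 1 if omitted
parser.add_argument("-n", "--count", type=int, default=1, help="number of repeats")
# an optional flag without a value, false if omitted
parser.add_argument("-u", "--upper", action="store_true", help="use uppercase")

# parse an explicit list instead of the real command line
opts = parser.parse_args(["hello", "--count", "3", "-u"])
word = opts.word.upper() if opts.upper else opts.word
output = " ".join([word] * opts.count)
print(output)  # HELLO HELLO HELLO

# omitted flags fall back to their defaults
defaults = parser.parse_args(["hello"])
print(defaults.count, defaults.upper)  # 1 False
```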