<div align=right>
<img src="img/logosmall.png" width="100px" align=right>
</div>

# The Python standard library

<div class="alert alert-warning">
Parts of this section have been adapted from copyrighted material in *Model, ML:
Bioinformatics Programming using Python (2009)*.

**Please do not distribute it!**

</div>

The Python *standard library* is — as we've seen before — a collection of Python modules that are distributed along with the Python interpreter.  Hence they are, in every sense of the word, *standard* — you can rely on them being available wherever Python is installed.

Apart from being standard, they are also of uniformly good quality, undergoing the same stringent design criteria as Python itself.  You could say that they come with a "seal of approval".

The standard library covers a lot of fields of endeavour, with the proviso that they must be of *general* interest.  So we're unlikely to encounter anything in the standard library that will parse a bioinformatics-specific file format like BAM or VCF, but we *are* likely to find tools to work with file formats of general interest like HTML, XML, CSV, etc.

That said, the standard library contains an astonishingly broad and useful collection of modules, and it's time we take a look at some of them.

We've already encountered a couple of standard library modules throughout this course — modules like `random` and `math`.  Basically, almost every time we've used the `import` statement to date, we've called upon a standard library module.

The standard library modules covered in this Notebook are not in any sense special or unusual;  they've been picked semi-randomly as examples of things that might be useful in our field.  You're strongly encouraged to spend a good while looking at the standard library documentation to get an idea of what else might lurk in there that could be very useful to you!

## The standard library documentation

Like the rest of the Python distribution, the standard library is documented exhaustively.  Python provides a number of different ways of getting at this documentation.

If you remember nothing else, remember this URL:

<http://docs.python.org>

Once you arrive at the Python documentation home, make sure that you're viewing the documentation for the right version of Python.  As of this writing, you'll be automatically redirected to documentation for the latest version of Python 3, but you can change this with the drop-down box at the top left of the page.

At the top right of the page, you'll see the word [modules](https://docs.python.org/3/py-modindex.html).  Click it, and you'll be taken straight to the standard library *module index*.  Here's the exact URL for the Python 3 module index:

<https://docs.python.org/3/py-modindex.html>

# `datetime` — working with dates and times

The `datetime` module defines classes that represent a date, a time, a combination of date and time, and a couple of others. It also defines some basic ways of manipulating instances of these classes.

The official documentation for the `datetime` module can be found here:

<https://docs.python.org/3/library/datetime.html>

>Let's take a step back and consider why dates and times are an ideal candidate for including in the standard library:

>Date- or time-related information is something every programmer has to manipulate from time to time (pun intended).  While everyone *understands* how dates and times work, such manipulations still remain painful due to the non-decimal nature of our timekeeping system.

>Because date and time manipulations are *widely required* and yet *non-trivial* enough to be annoying, they're typically the sort of thing you'd expect Python to include in its standard library.

## Classes

The `datetime` module defines several classes. Their representations are simple, consisting only of several fields which have primitive types. This allows them to define `__repr__` methods that print exactly what you would enter to create one. You’ll see such output when you enter into the interpreter an expression whose value is an instance of one of these classes. They also define `__str__` methods for the more human-oriented output used by print.  Among the classes defined in `datetime` are:

`datetime.date`
* Represents a date with attributes year, month, and day

`datetime.time`
* Represents a time with attributes hour, minute, second, microsecond, and tzinfo (time zone information)

`datetime.datetime`
* Represents a combination of date and time with the attributes of each

`datetime.timedelta`
* Represents the difference between two dates, two times, or two datetimes, with attributes days, seconds, and microseconds

`datetime.tzinfo` and `datetime.timezone`
* Represents time zone information

>Time zones are messy and complicated, and we'll ignore them here.

Instances of the classes in this module are *immutable*. There are methods to create one instance based on the values of another’s attributes, but once an instance is created its attribute values cannot be changed.

## Instance creation

As we should know by now, we have to `import` a module before we can use it in our code:

In [2]:
import datetime

Thereafter, instances of classes in the `datetime` module are created using the standard calling syntax.  Let's look at some of them:


Create a variable to represent your birthday using `datetime.date`.

>First take a note of *my* birthday in the default example below, so that you can send me a card next year.

In [3]:
# The syntax is:
# datetime.date(year, month, day)

my_birthday = datetime.date(1970, 1, 22)
print("My birthday is on", my_birthday)

My birthday is on 1970-01-22


Note that the *default* text representation of a `datetime.date` object (as returned by its `__str__()` method) is always the ISO 8601 form `YYYY-MM-DD`; it does not depend on how your computer is configured to display dates.  If you need a different representation, the `strftime()` method lets you specify one explicitly.
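
As a small sketch of requesting a specific representation, the cell below compares `str()` with `strftime()`. The variable names and the chosen format string are only illustrative.

```python
import datetime

birthday = datetime.date(1970, 1, 22)

# str() gives the ISO form, the same text that isoformat() returns
iso_text = str(birthday)

# strftime() builds a custom representation from format codes
custom_text = birthday.strftime("%d/%m/%Y")

print(iso_text)
print(custom_text)
```

Purely numeric format codes such as `%d`, `%m` and `%Y` are used here because codes that spell out names (like `%B` for the month name) depend on the locale.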

`datetime.time` can be used to create an object that represents a certain time of day without any date information.  Note that in the meta-syntax that's generally used to give the syntax of the instantiation call, square brackets denote an optional argument.  Thus, the following syntax means we can create a `datetime.time` object with whatever precision we need, down to microseconds:

In [4]:
# datetime.time(hour[, minute[, second[, microsecond[, tzinfo]]]])

noon = datetime.time(12)       # hour only
lunch = datetime.time(14, 30)  # hour and minutes

print("Noon is", noon)
print("Lunch is at", lunch)

Noon is 12:00:00
Lunch is at 14:30:00


    datetime.datetime(year, month, day[, hour[, minute[, second[, microsecond[, tzinfo]]]]])
    
    datetime.timedelta([days[, seconds[, microseconds[, milliseconds[, minutes[, hours[, weeks]]]]]]])
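
The two signatures above can be tried out as follows. This is a minimal sketch with made-up values; note how the `timedelta` arguments are normalised into days, seconds and microseconds.

```python
import datetime

# A full date and time: 9 March 2017, 20:21:43
moment = datetime.datetime(2017, 3, 9, 20, 21, 43)

# A duration; keyword arguments make the units explicit
duration = datetime.timedelta(weeks=1, hours=36, minutes=90)

print(moment)
print(duration.days, duration.seconds)
```

One week plus 36 hours plus 90 minutes is stored as 8 days and 48600 seconds (13.5 hours).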

The classes also provide methods to create instances based on the current date and/or time:

In [5]:
# datetime.date.today()

print("The date today is", datetime.date.today())

# datetime.datetime.now([tz])

print("The time right now:", datetime.datetime.now())

# datetime.datetime.combine(date, time)

The date today is 2017-03-09
The time right now: 2017-03-09 20:21:43.238478


    
That last one returns a `datetime` with the date portion taken from `date` and the time portion taken from `time`
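
A short sketch of `combine()` in action, with illustrative variable names:

```python
import datetime

meeting_day = datetime.date(2017, 3, 9)
meeting_time = datetime.time(14, 30)

# Glue a date and a time together into a single datetime
meeting = datetime.datetime.combine(meeting_day, meeting_time)

print(meeting)
```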

Instances of the various `datetime` classes have methods which can create new instances, sometimes of the same class as the original instance and sometimes of a different `datetime` class:

`my_date.replace(year, month, day)`
* Returns a new `datetime.date` with the same attribute values as `my_date`, except for those for which new values are specified (usually as keyword arguments)

`my_time.replace([hour[, minute[, second[, microsecond]]]])`
* Returns a new `datetime.time` with the same attribute values as `my_time`, except for those given new values by whichever keyword arguments are specified

`my_datetime.date()`
* Returns a new `datetime.date` with the day, month, and year of `my_datetime`

`my_datetime.time()`
* Returns a new `datetime.time` with the hour, minute, second, and microsecond of `my_datetime`

`my_datetime.replace([year[, month[, day[, hour[, minute[, second[, microsecond]]]]]]])`
* Returns a new `datetime.datetime` with the same attribute values as `my_datetime`, except for those given new values by whichever keyword arguments are specified
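
Because instances are immutable, each of these methods hands back a *new* object and leaves the original alone. A minimal sketch, with arbitrary example values:

```python
import datetime

original = datetime.datetime(2017, 3, 9, 20, 21, 43)

# replace() copies the instance, overriding only the named attributes
new_year = original.replace(month=1, day=1)

# date() and time() extract the two halves of a datetime
date_part = original.date()
time_part = original.time()

print(original)    # unchanged by the calls above
print(new_year)
print(date_part, time_part)
```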

## Operations

A number of arithmetic and comparison operators are overloaded to work with instances of the various `datetime` classes.  For instance, here are some of the arithmetic operations supported by instances of `datetime.date`:


| Operation | Result |
|-----------|--------|
| `date1 + timedelta1` | a `date` that is `timedelta1.days` days after `date1` |
| `date1 - timedelta1` | a `date` that is `timedelta1.days` days before `date1` |
| `date1 - date2` | a `timedelta` |
| `date1 < date2` | `True` if `date1` is earlier than `date2` |

The other comparison operators work as you'd expect.

The same operations are supported by instances of `datetime.datetime` with equivalent interpretations. However, `datetime.time` supports only the comparison operations.  As always, refer to the module documentation for the details!

Here are some of the operations that can be performed with instances of `datetime.timedelta`.

| Operation | Result |
|-----------|--------|
| `t1 + t2` | sum |
| `t1 - t2` | difference |
| `t1 * i`, `i * t1` | product of `t1` and integer `i` |
| `t1 // i` | floor division, as with numbers |
| `-t1` | a new `timedelta` equal to `t1 * -1` |
| `abs(t1)` | `t1` itself if `t1.days >= 0`, otherwise `-t1` |
| `==`, `!=`, `<`, `<=`, `>`, `>=` | the usual comparisons |

And once again, refer to the documentation for further details.
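
The operations in the two tables above can be sketched like this (the dates are arbitrary examples):

```python
import datetime

start = datetime.date(2017, 3, 9)
one_week = datetime.timedelta(days=7)

later = start + one_week      # date + timedelta gives a date
gap = later - start           # date - date gives a timedelta
print(later, gap.days, start < later)

half = one_week // 2          # floor division of a timedelta by an integer
backwards = -one_week         # negation
print(half, abs(backwards) == one_week)
```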

## Methods

In addition to exposing their attributes (e.g. `day`, `month`, `year`, …) through the usual dot notation, these classes also define some useful methods.  (There are many more; refer to the documentation!)

`my_date.isoweekday()`
* Returns the day of the week of `my_date` as an integer, with Monday = 1

`my_date.isocalendar()`
* Returns a tuple of `my_date`’s values with the following elements: (ISO year, ISO week number, ISO weekday)

`my_date.isoformat()`
* Returns a string representing `my_date` in the ISO standard format: `YYYY-MM-DD`

`my_time.isoformat()`
* Returns a string representing `my_time` in the ISO standard format: `HH:MM:SS.mmmmmm`, omitting the microseconds if 0

`my_datetime.isoweekday()`
* Returns the day of the week of `my_datetime` as an integer, with Monday = 1

`my_datetime.isocalendar()`
* Returns a tuple of `my_datetime`’s values with the following elements: (ISO year, ISO week number, ISO weekday)

`my_datetime.isoformat([sepchar])`
* Returns a string representing `my_datetime` in the standard format: `YYYY-MM-DDsepcharHH:MM:SS.mmmmmm`, omitting the microseconds if 0; `sepchar` is an optional one-character string that separates the date and time portions of the string (`'T'`, if omitted)
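
Here is a brief sketch of these methods applied to one example `datetime` (9 March 2017 was a Thursday):

```python
import datetime

moment = datetime.datetime(2017, 3, 9, 20, 21, 43)

weekday = moment.isoweekday()             # Monday = 1, so Thursday = 4
iso_parts = tuple(moment.isocalendar())   # (ISO year, ISO week, ISO weekday)
default_text = moment.isoformat()         # 'T' separates date and time
spaced_text = moment.isoformat(' ')       # a space as separator instead

print(weekday, iso_parts)
print(default_text)
print(spaced_text)
```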

---

## Exercises

Do you remember where you were at midnight on 1 January 2000?  (If not, consider drinking less at parties.)

Work out exactly how many seconds ago that was *right now*.

> Hint: `datetime.timedelta` instances have a method `total_seconds()`

Create a `datetime.date` variable called `last_christmas` that stores the date of last Christmas, i.e. 25 December last year.  (Don't hard-code the year;  make sure it always contains the date of *last* Christmas.)

Use `last_christmas` to create a `datetime.date` variable called `the_very_next_day` that stores the date of the day *after* last Christmas.

In [None]:
# Your code goes here

print(last_christmas, ", I gave you my heart", sep = '')
print("But", the_very_next_day, "you gave it away")

---

# Files and the filesystem

In this section we’ll look at some standard library modules for working with files and with the filesystem.

## `os` — operating system access

The `os` module contains a few values that relate to the environment, but mostly, it provides operations for manipulating files and directories. It implements many low-level operations that you are unlikely to use, but it also contains some that you will use frequently.  After all, shuffling files and directories around is part of almost any data analysis workflow!

The official documentation for the `os` module can be found here:

<https://docs.python.org/3/library/os.html>

>I'm not testing these Notebooks on a Windows computer as I write them.  I *think* all the examples in this section will work on Windows, but I can't guarantee that they will.

## Environment access

The `os` module provides a few useful variables:

`os.sep`

* The string used to separate path components: forward slash (`'/'`) on UNIX-based systems and backslash (`'\\'`) on Windows.

In [None]:
import os

print("The path separator on my system is: '", os.sep, "'", sep='')

`os.environ`
* A dictionary containing the names and values of environment variables obtained from the operating system environment from which you started the Python interpreter.  (Note that a function `os.getenv()` also exists to extract the value of just a single environment variable.)

In [None]:
print(list(os.environ.keys()))

In [None]:
os.environ['PATH']
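
Indexing `os.environ` with a name that isn't set raises a `KeyError`. A small sketch of the two gentler alternatives follows; the variable name is a made-up one that is assumed not to exist on your system.

```python
import os

# Hypothetical name, assumed not to be set in your environment
name = "NO_SUCH_VARIABLE_FOR_DEMO"

# os.environ behaves like a dictionary, so .get() returns None when missing
missing = os.environ.get(name)

# os.getenv() does the same, and also accepts a fallback value
fallback = os.getenv(name, "default")

print(missing, fallback)
```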

## Managing files and directories

`os` provides various functions that mimic the kinds of file and directory management commands available from the operating system command line — the shell prompt on Linux or OS X, or the command prompt on Windows.

>Why would you want to use Python to manipulate files and directories rather than use a purpose-built tool like the shell (e.g. by writing a shell script)?  For several reasons, but among them:

>* Writing all your code in a single language (in this case, Python) eliminates external dependencies and potential points of failure.
>* File and directory manipulation code written in Python works across platforms.

### Directories

Common directory-related functions include:

| Function | UNIX equivalent | use case |
| ---------|-----------------|----------|
| `os.getcwd()` | `pwd` | returns the current working directory |
| `os.chdir(path)` | `cd` | makes `path` the working directory |
| `os.mkdir(path)` | `mkdir` | creates a directory at `path` |
| `os.makedirs(path)` | `mkdir -p` | creates all the directories along `path` |
| `os.rmdir(path)` | `rmdir` | removes the (empty) directory at `path` |
| `os.removedirs(path)` | `rmdir -p` | removes the empty directory at `path`, then each parent directory that is left empty |

When executed from a Notebook, `os.getcwd()` will show the directory where the Jupyter process is running:

In [None]:
os.getcwd()
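
To experiment with the directory functions without touching your real files, one option is to work inside a throwaway directory. The sketch below assumes `tempfile.TemporaryDirectory()` for that purpose; the directory and file names are invented.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    # A file in the scratch directory keeps it non-empty,
    # so os.removedirs() stops climbing when it gets there
    open(os.path.join(scratch, "keep.txt"), "w").close()

    nested = os.path.join(scratch, "results", "2016", "run1")
    os.makedirs(nested)                  # creates results, 2016 and run1
    created = os.path.isdir(nested)

    os.removedirs(nested)                # removes run1, then 2016, then results
    remaining = os.listdir(scratch)

print(created, remaining)
```

Note that `os.removedirs()` only ever removes *empty* directories; it is not a recursive delete.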

### Files

Useful file-manipulation functions include the following:

| Function | UNIX equivalent | use case |
|-|-|-|
| `os.remove(path)` | `rm path` | Removes the file specified by `path` |
| `os.rename(src, dst)` | `mv src dst` | Renames the file at path `src` to `dst` |

### Directory contents

Some very useful functions in `os` deal with the contents of directories:

`os.listdir(path)`

This function simply returns a list of the names of the files and directories in the directory specified by its argument.  (Note that the list is ordered arbitrarily.)  Let's see the files present in the directory where Jupyter is running:

In [None]:
print(os.listdir('.'))
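
The file functions and `os.listdir()` can be combined in a quick experiment. This is only a sketch: it works in a temporary directory (an assumption made here to keep it harmless), and the file names are invented.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    old_path = os.path.join(scratch, "draft.txt")
    new_path = os.path.join(scratch, "final.txt")

    with open(old_path, "w") as handle:
        handle.write("some results\n")

    os.rename(old_path, new_path)
    # Sort the listing, since os.listdir() makes no promise about order
    after_rename = sorted(os.listdir(scratch))

    os.remove(new_path)
    after_remove = os.listdir(scratch)

print(after_rename, after_remove)
```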

`os.walk(path)`

This function almost deserves a section of its own. Its purpose is to produce the names of all the files and directories at path and below.  (Somewhat equivalent to the UNIX `find` command.) For each directory encountered starting at path, it produces a tuple of the form:

    (directory-path, subdirectory-names, filenames)
    
The directory-path in the tuple is a string indicating the path to the directory starting at the initial path argument. The two remaining elements of the tuple are lists; they contain directory names and filenames only, not full paths. The function has several optional parameters you can research in the documentation.

The value returned by `os.walk` is an iterable with an interesting property: each time around an iteration over the result of `os.walk`, you can remove elements from the subdirectory-names list (in place), and the walk will skip those as it descends into the directory hierarchy. This is useful for ignoring directories such as those beginning with a period or those containing some kind of configuration information that you don’t want included in the walk.

The following example demonstrates a simple use of `os.walk`. It prints the names of all the directories and filenames in an initial path and below, ignoring any directories with names starting with a period.

The function `show_in_path()` obtains the information to be printed and `show_directory_contents()` prints it. Indentation is added by `show_directory_contents()` to indicate the depth of each file and directory name.  (Depth is calculated by simply counting the number of separator characters in the path.)

In [None]:
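def show_directory_contents(dirpath, filenames, level):
    # Print the directory, then its files one level deeper
    print('    ' * level, dirpath, sep='')
    for name in filenames:
        print('    ' * (level + 1), name, sep='')

def show_in_path(startpath, ignoredots=True):
    print(startpath)
    for path, dirnames, filenames in os.walk(startpath):
        if ignoredots:
            # Prune in place (slice assignment) so that os.walk skips these
            # directories; removing items while looping over the same list
            # would silently skip some of them
            dirnames[:] = [d for d in dirnames if not d.startswith('.')]
        show_directory_contents(path[len(startpath) + 1:],       # strip startpath
                                filenames,
                                path.count(os.sep))

show_in_path('.')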

## `os.path` — manipulating paths

`os.path` is a submodule of `os` that contains functions for manipulating *path strings* (as opposed to the actual files and directories at those paths).

This bears repeating:  `os.path` is for manipulating **strings** that represent paths, not the physical files and directories residing at those paths.

`os.path` is significant enough to warrant its own separate documentation page on `python.org`, despite being "only" a submodule:

<https://docs.python.org/3/library/os.path.html>

### Path components

One of the most frequent kinds of manipulations you will do to file paths is breaking them apart into their components in various ways. The `os.path` module provides these functions for that purpose:

`os.path.split(path)`
* Returns a 2-element tuple where the first element is the directory part of the path string (if any), and the second is the filename part of the path string (if any)

In [None]:
bam_path = "/data/output/analyzed/20160122/s12553_components.bam"

os.path.split(bam_path)

`os.path.dirname(path)`

* Returns only the directory part of the path string

In [None]:
os.path.dirname(bam_path)

`os.path.basename(path)`

* Returns the file part of the path string

In [None]:
os.path.basename(bam_path)

`os.path.splitext(path)`
* Returns a 2-element tuple where the second element is the filename *extension* (if any), and the first part is the rest of the path string:

In [None]:
os.path.splitext(bam_path)

This makes more sense when applied to the result of `os.path.split()` or `os.path.basename()`:

In [None]:
dirpath, filename = os.path.split(bam_path)
basename, extension = os.path.splitext(filename)
print("Dirpath:\t{}\nBasename: \t{}\nExtension:\t{}".format(dirpath, basename, extension))

As we saw before, the function `os.listdir(dirpath)` returns a list of the names of all the files and directories that are in the directory at `dirpath`. The result often contains a lot of uninteresting files, such as editor backup files, `.pyc` files, and so on.  Hence, we often want to filter this list:

In [None]:
def filtered_directory_listing(dirpath='.', ignore_extensions=['.pyc', '.bak']):
    for filename in os.listdir(dirpath):
        if os.path.splitext(filename)[1] not in ignore_extensions:
            yield filename

print(*filtered_directory_listing(ignore_extensions=['.ipynb']), sep=', ')

Can you rewrite `filtered_directory_listing()` to use a single generator comprehension instead of a nested `for` and `if`?  Does it make it much shorter or more readable in this case?

In [None]:
# Rewrite in terms of a generator comprehension:

def filtered_directory_listing(dirpath='.', ignore_extensions=['.pyc', '.bak']):

    

### Path manipulations

Many programs deal with paths that have no or only a few directory names, but need full paths for certain purposes. The following functions expand or join paths in ways that would be difficult, or at least laborious, to program yourself:

`os.path.realpath(path)`
* Returns a canonical version of `path`, eliminating any symbolic links along the way

`os.path.expanduser(path)`
* Returns path with an initial `~` or `~username` replaced by the user’s home directory

In [None]:
os.path.expanduser('~')

`os.path.join(p1, p2, ...)`
* Returns a path formed by joining the arguments appropriately; roughly similar to the expression `os.sep.join([p1, p2, ...])` except it doesn’t add separators after arguments that already end in a separator and is therefore preferable

In [None]:
os.path.join("data", "output", "analyzed", "20160122", "s12553_components.bam")
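
The difference between `os.path.join()` and a plain string join shows up as soon as one component already ends in a separator. A minimal sketch, using invented path components:

```python
import os

# The first component already ends in a separator
parts = ["data" + os.sep, "output", "reads.bam"]

joined = os.path.join(*parts)    # separators added only where needed
naive = os.sep.join(parts)       # blindly puts a separator between all parts

print(joined)
print(naive)                     # note the doubled separator
```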

### Path information

Often, you just need to test something about a path before doing something with it. Here are three predicates and a function to get the size of a file:

`os.path.exists(path)`
* Returns `True` if there is a file or directory at `path`

`os.path.isfile(path)`
* Returns `True` if there is a (normal) file at `path`

`os.path.isdir(path)`
* Returns `True` if there is a directory at `path`

`os.path.getsize(path)`
* Returns the size in bytes of the file at path

In [None]:
os.path.exists('mydir')

In [None]:
os.mkdir('mydir')
os.path.exists('mydir')

In [None]:
os.path.isdir('mydir')

In [None]:
os.rmdir('mydir')
os.path.exists('mydir')

The following function calculates the collective size of all files in a directory at a given path, including hidden files starting in `.`, but ignoring any subdirectories.

In [None]:
def directory_size(path='.'):
    """Sum of the sizes of all files in the directory at path, including
    those beginning with a '.', and ignoring subdirectories
    
    """
    result = 0
    for item in os.listdir(path):
        full_path = os.path.join(path, item)       # listdir returns bare names
        if os.path.isfile(full_path):              # the test
            result += os.path.getsize(full_path)
    return result

def bytes_to_mb(B):
    return round(B/1024/1024, 2)

print(bytes_to_mb(directory_size()), "MB", sep='')

This function could also have been written in terms of a generator comprehension:

In [None]:
def directory_size1(path='.'):
    """Sum of the sizes of all files in the directory at path, including
    those beginning with a '.', and ignoring subdirectories
    
    """
    paths = (os.path.join(path, name) for name in os.listdir(path))
    return sum(os.path.getsize(p) for p in paths if os.path.isfile(p))

print(bytes_to_mb(directory_size1()), "MB", sep='')

Here's an example that prints a directory structure in hierarchical form using a recursive function. Each call receives just a path, and gets the entries directly below that path from a call to `os.listdir()`.

In [None]:
def dirtree(path='.', ignoredots=True, level=0):
    print_path(path, level)                        # "do something" with tree root and level
    for name in os.listdir(path):                  # repeat with the rest of the tree
        if not(ignoredots and name.startswith('.')):
            subpath = os.path.join(path, name)
            if os.path.isdir(subpath):
                dirtree(subpath, ignoredots, level + 1)

def print_path(path, level):
    print(' ' * 3 * level, path, sep='')
    
dirtree()

## `glob` — filename expansion

The `glob` standard library module reproduces the wildcard expansion operations of command-line shells.  It uses the standard command-line matching syntax, reiterated in the table below.  (Don't confuse this with the much more powerful and expressive *regular expressions*.)

The documentation for this module is here:

<https://docs.python.org/3/library/glob.html>

A quick refresher on shell wildcards (or "globbing patterns"):

| Pattern | Meaning |
|---------|---------|
| `*`     | match 0 or more characters |
| `?`     | match a single character |
| `[characters]` | match any of the characters inside the brackets |
| `[!characters]` | match any character *except* those inside the brackets |

The glob module can return results either as a list or as a generator, depending on which function you call. The generator version is important for when there are a very large number of filenames matching the pattern, only some of which need to be used. The two relevant functions are:

`glob.glob(pattern)`
* Returns a list of paths (strings) that match pattern (which doesn’t necessarily contain wildcards)

`glob.iglob(pattern)`
* Same as `glob.glob()`, but returns an iterator instead of a list

>The `glob` module's functions do not expand tildes and environment variable names in paths; for that you need `os.path.expanduser()` and `os.path.expandvars()`, respectively.

For example, to do something to each Python file in the current directory:

```python
import glob

for filename in glob.iglob('*.py'):
    print(filename)    # do something with each matching file
```

>Another standard library module — `fnmatch` — also does wildcard matching on filenames.  However, it works solely on filename *strings* and not the actual contents of directories.  It, too, can often be useful, e.g. when filtering a list of filenames returned by a function like `os.listdir()`.
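
As a small sketch of the wildcard patterns from the table, `fnmatch.filter()` can be applied to a plain list of names. The file names here are invented, and nothing is read from disk.

```python
import fnmatch

names = ["align.py", "notes.txt", "plot.py",
         "reads1.fq", "reads2.fq", "readsX.fq"]

python_files = fnmatch.filter(names, "*.py")             # * matches any run of characters
numbered_reads = fnmatch.filter(names, "reads[0-9].fq")  # [...] matches one listed character
any_reads = fnmatch.filter(names, "reads?.fq")           # ? matches exactly one character

print(python_files)
print(numbered_reads)
print(any_reads)
```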

## Comparing Files and Directories

You may be familiar with using a feature of an editor or even a separate application to compare the contents of two files or two directories, or you may have used `diff` on the UNIX command line to do so.

File comparison operations can have many practical uses in programs, too. For example, you may need to extract the differences between two versions of a program’s output, or you might need to know if the results of some BLAST queries you’ve run are any different from the previous results.

Directory comparison is similarly useful: you can compare the contents of directories that contain files produced by programs you are using and extract or report the differences.

### `filecmp` — file and directory comparison

The `filecmp` module provides powerful and easy-to-use tools to do file comparison, and is documented here:

<https://docs.python.org/3/library/filecmp.html>

Here are some of the functions it provides:

`filecmp.cmp(filepath1, filepath2)`
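* Compares the files at `filepath1` and `filepath2` and returns `True` if they appear to be equal.  By default the comparison is *shallow*: files with the same type, size, and modification time are taken to be equal without being read; pass `shallow=False` to compare the contents byte by byte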

`filecmp.cmpfiles(directorypath1, directorypath2, filepaths)`
* Compares the files in the list `filepaths` in the directory at `directorypath1` with the corresponding files in the directory at `directorypath2`, and returns a tuple `(matches, mismatches, errors)`.

  `matches`
  * A list of files in `filepaths` whose contents were the same in both directories

  `mismatches`
  * A list of files in `filepaths` that were in both directories but had different contents

  `errors`
  * A list of files in `filepaths` that were not found in both directories or that caused some kind of error when an attempt was made to read them (e.g., because of inadequate user permissions)

`filecmp.dircmp(directorypath1, directorypath2[, ignore[, hide]])`
* Creates an instance of the class `filecmp.dircmp` to compare the directories at `directorypath1` and `directorypath2`; `ignore` is a list of names to ignore and `hide` is a list of names to hide.

Instances of `filecmp.dircmp` implement the following methods that print fairly elaborate reports to `sys.stdout`:

`report()`
* Prints a comparison between the two directories

`report_partial_closure()`
* Prints a comparison of the two directories as well as of the immediate subdirectories of the two directories

`report_full_closure()`
* Prints a comparison of the two directories, all of their subdirectories, all the subdirectories of those subdirectories, and so on (i.e. recursively)

In addition, many details of the comparisons that the reporting methods print out may be accessed directly as attributes. Their names use the syllable “left” for what was found in `directorypath1` and “right” for what was found in `directorypath2`. The attributes are:

`left_list`
* The names of files and subdirectories found in `directorypath1`, not including any names that were hidden or ignored when the `dircmp` was created

`right_list`
* The names of files and subdirectories found in `directorypath2`, not including any names that were hidden or ignored when the `dircmp` was created

`common`
* The names of files and subdirectories that are in both directorypath1 and directorypath2

`left_only`
* The names of files and subdirectories that are in directorypath1 only

`right_only`
* The names of files and subdirectories that are in directorypath2 only

`common_dirs`
* The names of subdirectories that are in both directorypath1 and directorypath2

`common_files`
* The names of files that are in both directorypath1 and directorypath2

`common_funny`
* Names common to both directorypath1 and directorypath2 but that name a file in one and a directory in the other, along with names of files and directories that caused an error when an attempt to read them was made

`same_files`
* The paths to files whose contents are identical in both directorypath1 and directorypath2

`diff_files`
* The paths to files that are in both directorypath1 and directorypath2 but whose contents differ

`funny_files`
* The paths to files that are in both directorypath1 and directorypath2 but could not be compared for some reason

`subdirs`
* A dictionary that maps names in common_dirs to dircmp objects
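
To see a few of these functions and attributes together, here is a sketch that builds two small throwaway directories and compares them. The helper `write()` and all file names are invented for the illustration, and `shallow=False` is passed so that the file contents really are read.

```python
import filecmp
import os
import tempfile

def write(directory, name, text):
    """Hypothetical helper: create a small text file."""
    with open(os.path.join(directory, name), "w") as handle:
        handle.write(text)

with tempfile.TemporaryDirectory() as left, tempfile.TemporaryDirectory() as right:
    write(left, "same.txt", "ACGT\n")
    write(right, "same.txt", "ACGT\n")
    write(left, "changed.txt", "ACGT\n")
    write(right, "changed.txt", "ACGTTT\n")
    write(left, "left_only.txt", "x\n")

    # Compare a single pair of files by content
    same_content = filecmp.cmp(os.path.join(left, "same.txt"),
                               os.path.join(right, "same.txt"),
                               shallow=False)

    # Compare a named set of files across the two directories
    matches, mismatches, errors = filecmp.cmpfiles(
        left, right, ["same.txt", "changed.txt", "left_only.txt"], shallow=False)

    # Compare the directories as a whole
    comparison = filecmp.dircmp(left, right)
    left_only = comparison.left_only
    diff_files = comparison.diff_files

print(same_content)
print(matches, mismatches, errors)
print(left_only, diff_files)
```

The file that exists only on the left lands in `errors` for `cmpfiles()`, since it cannot be compared, and in `left_only` for `dircmp`.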

## `csv` — parsing and writing comma- and tab-separated files

Comma-separated values (CSV) files are a widely used informal data interchange format, especially in conjunction with spreadsheet and data-collection applications. The term actually applies to a variety of text field conventions, including tab-separated.

The exact rules for representing a field’s value are not standardized and can vary in frustrating ways from one application to another. Python’s `csv` module hides this variability from you, allowing you to focus on the real purpose of your program.

We won't cover the `csv` module in detail here, but it's well worth reading its documentation and familiarising yourself with its use.  You can find that documentation here:

<https://docs.python.org/3/library/csv.html>

Briefly, the module contains the functions `csv.reader()` and `csv.writer()`. It also includes facilities for defining other formats and for more complicated reading and writing operations, but we won’t cover those details here. Here are the basics:

`csv.reader(source[, dialect='excel'])`

* Returns an iterator for the lines of source (which can be any iterable that produces strings; if a file, it must already be open).

  >When opening a file to use with `csv.reader()`, it is necessary to provide a keyword argument in the call to `open()` for a parameter that wasn’t included in the description of the function before: `newline=''`. This allows the reader to correctly interpret newline characters inside quoted field values in a platform-independent way.

  Each step of the iterator (either one cycle of a for statement or the result returned by a call to `next()`) produces a *list* of the values in the comma-separated fields of *one line*.

  If dialect is `'excel-tab'`, reads the file as tab-separated values; otherwise, reads it as comma-separated values.

`csv.writer(destination[, dialect='excel'])`

* Returns an instance of a `csvwriter` for a file — or anything else with a write method — that is open for writing.

  If dialect is `'excel-tab'`, writes tab-separated values; otherwise, writes comma-separated values.
  
The writer provides the following methods (the “row” in their names refers to the widespread use of CSV files as representations of simple spreadsheets, with each list of fields corresponding to a spreadsheet row):

`writerow(fieldlist)`
* Writes the fields of `fieldlist` to the writer’s destination according to the writer’s dialect, followed by the dialect’s line terminator (`'\r\n'` by default)

`writerows(rowlist)`
* Does `csvwriter.writerow()` for each row in `rowlist`
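
A round trip through a writer and a reader might look like the sketch below. To keep it self-contained it writes to an in-memory `io.StringIO` buffer rather than a real file (an assumption made for the example), and the sample data is invented.

```python
import csv
import io

rows = [["sample", "reads"], ["s1", "100"], ["s2", "250"]]

# Write tab-separated values into an in-memory text buffer
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, dialect="excel-tab")
writer.writerows(rows)
text = buffer.getvalue()
print(repr(text))

# Read them back; every field comes back as a string
reader = csv.reader(io.StringIO(text, newline=""), dialect="excel-tab")
recovered = list(reader)
print(recovered)
```

With a real file, the same calls apply after `open(path, newline='')`. Remember that the reader never converts fields to numbers for you.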

### Exercise

In the `files` subdirectory is a CSV file called `numbers1.csv`, containing a couple of integers on each line (comma-separated).  Read in this file and simply print out the sum of the numbers on each line.

In [None]:
%cd files

In the `files` subdirectory there's also a CSV file called `numbers2.csv`.  This again contains a row of integers on each line, but this time in a slightly different CSV format.

Does your code also work on `numbers2.csv`?  If not, can you adapt it so it does (whilst still working on `numbers1.csv` as well)?

* [`numbers1.csv`](/edit/files/numbers1.csv)
* [`numbers2.csv`](/edit/files/numbers2.csv)

# Other interesting modules in the standard library

Here is a selection (again, rather random) of further modules you might find interesting:

**`math`** provides mathematical constants and operations such as power and log functions, trigonometric functions, and much more.

* <https://docs.python.org/3/library/math.html>

Some well-known constants:

In [None]:
import math

math.pi, math.e

**`random`** provides the facilities for introducing randomness into your code, such as:

* generating random numbers
* picking random elements from lists
* randomly sampling populations
* shuffling lists in-place

…and much more.

* <https://docs.python.org/3/library/random.html>

**`difflib`** is a large module providing powerful tools for comparing sequences, such as the lines of two text files, and reporting exactly where they differ.  It allows far more detailed comparisons than `filecmp`, which only tells you *whether* files differ.

* <https://docs.python.org/3/library/difflib.html>

**`sys`** provides information about your Python *system*, i.e. your Python environment.

* <https://docs.python.org/3/library/sys.html>

The currently loaded modules:

In [None]:
import sys

sys.modules

The list of directories where the current Python environment looks for modules:

In [None]:
sys.path

The standard input, standard output and standard error streams, useful if you're writing a standalone script:

```python
sys.stdin
sys.stdout
sys.stderr
```

The *argument vector*, i.e. the list of arguments provided to a command-line script:

```python
sys.argv
```

**`time`** provides access to the system clock.

* <https://docs.python.org/3/library/time.html>

The current time:

In [None]:
import time

time.localtime()

Sleep for 5 seconds:

In [None]:
time.sleep(5)

**`fileinput`** provides an alternative way to handle input from a file or multiple files.

>For those familiar with Perl, it provides a facility roughly equivalent to Perl's "diamond" operator “`<>`”.

* <https://docs.python.org/3/library/fileinput.html>

**`argparse`** is an extensive module providing support for parsing command-line arguments.

If you're writing your own script and it requires a non-trivial command-line interface, you should start by reading the `argparse` documentation:

* <https://docs.python.org/3/library/argparse.html>
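
To give a flavour of what `argparse` code looks like, here is a minimal sketch of a parser for a hypothetical read-counting script. The argument names are invented, and an explicit list is passed to `parse_args()` so the cell can run inside a Notebook.

```python
import argparse

parser = argparse.ArgumentParser(
    description="Count reads in a file (hypothetical example)")
parser.add_argument("input_path", help="file to read")
parser.add_argument("--min-length", type=int, default=0,
                    help="ignore reads shorter than this")

# In a real script, parse_args() with no argument reads sys.argv[1:]
args = parser.parse_args(["reads.fq", "--min-length", "50"])

print(args.input_path, args.min_length)
```

Note that the option `--min-length` becomes the attribute `min_length`, already converted to an `int`.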

**`subprocess`** provides facilities for running external (usually binary) executables from Python, and capturing and using any output they generate.

This is again a large and complex module, and you should read its documentation before attempting to use it:

* <https://docs.python.org/3/library/subprocess.html>
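
As a first taste, the sketch below runs a second Python interpreter as the external program, simply because that is guaranteed to be installed; any other executable could take its place. It assumes Python 3.7 or later for the `capture_output` argument.

```python
import subprocess
import sys

completed = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,   # collect stdout and stderr instead of printing them
    text=True,             # decode the output to str rather than bytes
    check=True,            # raise CalledProcessError on a non-zero exit status
)

output = completed.stdout.strip()
print(completed.returncode, output)
```

Passing the command as a list of arguments, rather than as one string, avoids any surprises from shell quoting.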

**`tempfile`** helps with the creation of uniquely-named temporary files (and directories!) in the "right" place for every given system:

* <https://docs.python.org/3/library/tempfile.html>
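
A minimal sketch of a temporary directory that cleans up after itself (the file name is invented):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "intermediate.txt")
    with open(path, "w") as handle:
        handle.write("scratch data\n")
    existed_inside = os.path.exists(path)

# Leaving the with block removes the directory and everything in it
existed_after = os.path.exists(path)

print(existed_inside, existed_after)
```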

---