<div align=right>
<img src="img/logosmall.png" width="100px" align=right>
</div>

In this Notebook we'll look briefly at three unrelated topics that didn't quite fit into any of the previous sections.  That doesn't mean they're unimportant, though!

# String formatting

To date we've seen how to build up a string manually by concatenating substrings with `+`, and by using `str` to convert numbers to a string representation:

In [None]:
dna = "ACTTACATGACCCAA"
print('The length of the sequence "' + dna + '" is ' + str(len(dna)) + ".")

We've also been using features of the `print` function to build output in a slightly more readable way:

In [None]:
print("The length of the sequence",
      dna.join('""'),
      "is",
      len(dna),
      end=".\n")

Even this can often lead to convoluted and hard-to-read code.  Also, we don't necessarily want to *print* a string we've just formatted.  We might want to store it, write it to a file, or handle it in a further computation.

The string type has a method `format()` which makes examples like the one above simpler and more straightforward:

In [None]:
'The length of the sequence "{}" is {}.'.format(dna, len(dna))

As you can see, our string now contains a number of placeholders called *replacement fields*, each consisting of a pair of curly braces (`{}`).  When we call the string's `format` method with arguments, each argument in turn is inserted in place of one of the replacement fields.  Note that `format()` automatically takes care of converting objects to a printable textual representation (just like `print()`).
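Because `format()` returns a new string rather than printing anything, we can store the result and work with it later.  Here is a minimal sketch; the variable name `summary` is chosen just for this illustration:

```python
dna = "ACTTACATGACCCAA"

# format() returns a new string; nothing is printed at this point.
summary = 'The length of the sequence "{}" is {}.'.format(dna, len(dna))

# The result is an ordinary string, so we can inspect or reuse it.
print(type(summary))
print(summary.upper())
```

Note that `len(dna)` is an integer, yet we did not need to call `str()` on it ourselves.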

If you put a single integer `n` between the replacement field's braces, it refers to the `n`th argument of the `format()` method (using a zero-based count):

In [None]:
'The length of the sequence "{1}" is {0}.'.format(len(dna), dna)

You can also use `format()` with keyword arguments, in which case your replacement fields can be *named*, which can help to make the whole statement even more readable (and means you don't have to worry about accidentally swapping arguments since order becomes irrelevant):

In [None]:
message = 'The length of the sequence "{seq}" is {seq_length}.'
message.format(seq=dna, seq_length=len(dna))

This is still just the tip of the iceberg.  There exists a whole formatting "mini-language" that can be used inside the braces of replacement fields to specify how the field should be formatted.  Here are just *some* examples of what's possible:

Specifying precision when printing floating point numbers:

In [None]:
from math import pi

print("{:.2f}".format(pi))
print("{:.10f}".format(pi))

Note in the above example that we use the `from ... import` syntax to import just a single item (in this case, the constant `pi`) from the standard library `math` module *into the current namespace*.  Hence, we can reference it merely as `pi` and not `math.pi`.

Text alignment:

In [None]:
print("{:<30}".format('left aligned').join(['>>>', '<<<']))
print("{:>30}".format('right aligned').join(['>>>', '<<<']))
print("{:*^30}".format('centered').join(['>>>', '<<<'])) # '*' as fill char

Base conversion of numbers:

In [None]:
print("int: {0:d};  hex: {0:x};  oct: {0:o};  bin: {0:b}".format(42))
# With prefix (0x, 0o or 0b):
print("int: {0:d};  hex: {0:#x};  oct: {0:#o};  bin: {0:#b}".format(42))

Adding a thousands separator:

In [None]:
"{:,}".format(123456789)

Even converting a floating point number into a percentage:

In [None]:
"{:.2%}".format(19 / 22)

And there's (a lot) more to it.  I would encourage you to go look at the “Format Specification Mini-Language” section in the official Python documentation:

* http://docs.python.org/3/library/string.html#formatspec

# Sets

In addition to lists, dictionaries and tuples, Python has one more basic and useful built-in data structure — the *set*.  A set is an **unordered** collection of **unique** elements.

Because Python's set is implemented behind the scenes as a hash table, operations such as membership tests, union and intersection are very efficient.  However, this means that the elements of a set, like the keys of a dictionary, have to be *hashable*.
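To see what the hashability requirement means in practice, here is a small sketch (the variable name `codons` is just for illustration).  It uses `try ... except`, which is covered in the Exceptions section later in this Notebook, so that the cell runs to completion instead of stopping at the error:

```python
# Strings and tuples are immutable and hashable, so they can be set elements.
codons = {"ACT", ("A", "C", "T")}
print(len(codons))

# Lists are mutable and therefore unhashable; adding one raises a TypeError.
try:
    codons.add(["A", "C", "T"])
except TypeError as error:
    print("Cannot add a list to a set:", error)
```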

A set is delimited by curly braces, just like a dictionary.  You're unlikely to confuse the two, since sets do not have paired elements:

In [None]:
s = {'ACT', 'CCA', 'GGT', 'TAG'}

In [None]:
print(s)

In [None]:
type(s)

A set can be created from a list (or other iterable) like so:

In [None]:
s = set(["ACT", "TAG", "GGT", "CCA"])
print(s)

Non-unique elements get "collapsed" implicitly when you create a set from a list (or other iterable) with non-unique values:

In [None]:
nucs = set("ACTTACGACTTACG")
print(nucs)

We can use this to check whether a list (or other container) has only unique elements:

In [None]:
def is_unique(ls):
    ls = list(ls)
    return len(set(ls)) == len(ls)

l1 = [1, 2, 3, 4, 5]
l2 = [1, 2, 3, 3, 5]

print(is_unique(l1))
print(is_unique(l2))

Since `{}` denotes an empty dictionary, the only way to create an empty set is by using the `set` function:

In [None]:
empty = set()
print(empty)

>Note:  You can also use the functions `list()` and `dict()` without arguments to create an empty list and dict, respectively, though it's easier just to write `[]` or `{}`.
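A quick check confirms the difference between `{}` and `set()`.  This is only a sketch; the names `empty_dict` and `empty_set` are ours:

```python
empty_dict = {}
empty_set = set()

# {} is a dictionary, not a set.
print(type(empty_dict))
print(type(empty_set))

# Both are empty, but they are different types and do not compare equal.
print(len(empty_dict), len(empty_set))
print(empty_dict == empty_set)
```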

Since sets are *unordered*, numerical indices are nonsensical, and trying to use one raises a `TypeError`:

In [None]:
s[2]

Sets are mutable — we can add elements to a set with its `add` method:

In [None]:
s.add('CAT')
print(s)

Note what happens if we try to add a duplicate element:

In [None]:
s.add('CAT')
print(s)

Python doesn't complain, but the elements of the set remain unique.

Sets support `len` and `in`, and we can iterate over sets:

In [None]:
len(s)

In [None]:
"CAT" in s

In [None]:
for element in s:
    print(element)

Python defines a range of set operators which can be used with operands of type `set`:

| operator | meaning |
|---|---|
| `<=` | "is a subset of" |
| `<` | "is a proper subset of" |
| `>=` | "is a superset of" |
| `>` | "is a proper superset of" |
| <tt>&#124;</tt> | union |
| `&` | intersection |
| `-` | set difference |
| `^` | symmetric difference |

In [None]:
s1 = {'GGT', 'TAG', 'ACT'}
s2 = {'ACT', 'CCA', 'CAT'}

In [None]:
s1 & s2

In [None]:
s1 | s2

In [None]:
s1 > s2

In [None]:
s1 - s2
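The operators from the table that we haven't tried yet behave as you would expect.  A short sketch, redefining `s1` and `s2` so that the cell is self-contained:

```python
s1 = {'GGT', 'TAG', 'ACT'}
s2 = {'ACT', 'CCA', 'CAT'}

# Symmetric difference: elements that are in exactly one of the two sets.
print(s1 ^ s2)

# {'ACT'} is contained in s1 and is strictly smaller, so both tests succeed.
print({'ACT'} <= s1)
print({'ACT'} < s1)

# A set is a subset of itself, but not a *proper* subset of itself.
print(s1 <= s1)
print(s1 < s1)
```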

Additionally, the set type has some methods that duplicate some of the functionality of set operators.  However, these methods can take any iterable (not just a set) as argument.  Here is a list of set methods;  it's not too hard to deduce what they do:

| Method | Use case |
|---|---|
| `isdisjoint(other)` | `True` if no elements in common with *other* |
| `issubset(other)` | `True` if every element also in *other* |
| `issuperset(other)` | `True` if every element of *other* also in set |
| `union(other)` | Return new set with all elements of set and *other* |
| `intersection(other)` | Return new set with elements in common between set and *other* |
| `difference(other)` | Return new set with elements of set not in *other* |

If you need more help, have a look at the official documentation:

<https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset>



In [None]:
s1.union(s2)

In [None]:
s1.union(['ACT', 'TAG'])
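The other methods in the table follow the same pattern.  In this sketch we pass lists and a tuple rather than sets, to show that any iterable is accepted:

```python
s1 = {'GGT', 'TAG', 'ACT'}

# Elements of s1 that also appear in the list.
print(s1.intersection(['ACT', 'CCA']))

# Elements of s1 that do not appear in the tuple.
print(s1.difference(('ACT', 'CCA')))

# True: s1 shares no elements with the list.
print(s1.isdisjoint(['CCA', 'CAT']))

# True: every element of s1 is in the list.
print(s1.issubset(['GGT', 'TAG', 'ACT', 'CCA']))

# True: every element of the list is in s1.
print(s1.issuperset(['GGT']))
```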

# Exceptions

We've seen (many times!) by now that when a Python program encounters an error, it "bombs out" with a *stack trace*.  This isn't very hard to provoke:

In [None]:
5 / 0

We say that Python *raises an exception*.  (Python is a very polite language;  many other languages will *throw* an exception.)

One's initial instinct is to hate stack traces;  after all, if you see one it means that *something went wrong*.

Resist this instinct!  Stack traces are your friends!  The longer and the more explicit, the better, since an explicit stack trace helps you to find out quickly and easily what went wrong. When you program, things *will* go wrong, and there's nothing worse than a silent, uncommunicative error!

It's possible to write our own exceptions (though we won't cover that in this course), and in fact many Python modules define their own exceptions just as they define their own classes and types.

We can manually raise an exception, whether built-in or user-defined, using the `raise` keyword:

In [None]:
raise RuntimeError("Something bad happened")

We can also *handle* exceptions, using Python's `try ... except` syntax.  In some languages, exception handling is something you do only rarely, but in Python it's an everyday part of programming.

Remember what happens when we try to open a filename that doesn't exist for reading?

In [None]:
fh = open("nonexistent.txt", 'r')

The `FileNotFoundError` exception is raised.

We can avoid this error by first checking whether `nonexistent.txt` exists.  The `os.path` module in the standard library (which contains loads of utility functions for working with the filesystem) has a function `os.path.exists` that we can use for this purpose:

In [None]:
import os

if os.path.exists("nonexistent.txt"):
    fh = open("nonexistent.txt", 'r')
else:
    fh = None
    
print(fh)

This is cumbersome and error-prone (since we have to write the filename twice).

The more common Python idiom is to just *let the `open` fail*, and handle the exception it raises:

In [None]:
try:
    fh = open("nonexistent.txt", 'r')
except FileNotFoundError:
    fh = None
    
print(fh)

In short, it works like this:

The block of code under the `try` statement is executed.  If an exception is raised during this execution, Python checks if there's an associated `except` statement that handles this exception.  (`except` without any named exceptions will handle any and all exceptions!)  If there is, the block under that `except` statement is executed;  if there isn't, the exception propagates just as if there were no `try` at all.

Here's another example:  the dictionary `counts` below gives counts of trinucleotides in a sequence.  Remember what happens when we try to reference a key that doesn't exist in the dictionary?
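To make that flow concrete, here is a small sketch with one `try` block and two `except` statements.  The function `safe_divide` is hypothetical, written only for this illustration;  it shows that only the `except` block matching the raised exception is executed:

```python
def safe_divide(numerator, denominator):
    """Hypothetical helper: divide, or describe what went wrong."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        # Executed only when the denominator is zero.
        return "cannot divide by zero"
    except TypeError:
        # Executed only when the operands don't support division.
        return "operands must be numbers"
    # Reached only if the try block raised no exception.
    return result

print(safe_divide(10, 4))
print(safe_divide(5, 0))
print(safe_divide("5", 2))
```

Naming the exceptions we expect, rather than using a bare `except`, means that errors we did *not* anticipate still produce a stack trace.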

In [None]:
counts = {'CGC': 1, 'ACG': 1, 'CGA': 1, 'CGT': 1, 'TAC': 1,
          'ATC': 2, 'TGA': 2, 'CTG': 1, 'GTA': 1, 'ATG': 1,
          'AAT': 1, 'GAT': 2, 'TCG': 2, 'GCT': 1}

taa_count = counts['TAA']

We could test for the existence of a key first, of course:

In [None]:
if 'TAA' in counts:
    taa_count = counts['TAA']
else:
    taa_count = 0
    
print(taa_count)

But it's more idiomatic to just *let the lookup fail*, and handle the exception:

In [None]:
try:
    taa_count = counts['TAA']
except KeyError:
    taa_count = 0
    
print(taa_count)

>In this instance, we could also have used the `get()` method of the dictionary object.  The choice is a matter of individual preference.
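For comparison, here is a sketch of the `get()` approach, using a shortened version of the `counts` dictionary so that the cell is self-contained:

```python
counts = {'ATC': 2, 'TGA': 2, 'CGC': 1}

# get() returns the value stored under the key, or the given default
# (here 0) if the key is missing. No exception is raised either way.
taa_count = counts.get('TAA', 0)
atc_count = counts.get('ATC', 0)

print(taa_count)
print(atc_count)
```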

Python programmers refer to these two idioms as “LBYL” and “EAFP”:

* LBYL (Look Before You Leap) style implies doing extensive tests before you attempt any computation to ensure that the computation will succeed.


* EAFP (it's Easier to Ask Forgiveness than Permission) style means you just go ahead and perform the computation, *letting it fail* if something goes wrong, and handling whatever errors arise.

EAFP style can lead to more readable code since you just express your computation "normally" and let it fail if it wants to.  It can also lead to fewer unintentional errors, due to the fact that you have to type fewer fiddly variable names (etc.) in your tests.  EAFP is definitely the preferred Python idiom.

There's a lot more to be said about exceptions and exception handling in Python, but this is enough to get started on!

---