# BTM2018 - Advanced Python

This session aims to cover some more advanced areas of Python that will help you write more complicated scripts and programs. This worksheet contains information about the topics, example code and some exercises. We will cover:
* List comprehensions
* Error Handling
* Classes
* REST web access (for instance to Ensembl)
* Scientific libraries

The largest part of the session will be devoted to classes and basic Object Oriented Programming in Python, which is one of the most common approaches to implementing modern software (in Python and beyond). We will start by briefly covering a few other important topics, which will be useful in the classes section, and finish with some other topics that might be generally useful.

In [None]:
import math
import sys
import pandas as pd
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## List Comprehensions

List comprehensions are a simple and efficient way to build up containers without an explicit loop. For example, if we wanted a list of square numbers a simple way to do it would be:

In [None]:
squares = []
for x in range(10):
    squares.append(x**2)

squares

The same thing can be done very simply in a single line using a list comprehension:

In [None]:
[x**2 for x in range(10)]

The general format is `[_expression_ for _var_ in _iterator_]`, where the expression depends on `var` and `iterator` is any object that can be looped over. This expression can be 

__Exercise__: Write a list comprehension to generate a list of square roots:

They can also be extended to include a conditional expression that filters which values are used to build the list. For example, if you wanted a list of cubes of odd numbers:

In [None]:
[x**3 for x in range(10) if x % 2 == 1]

This gives the same result as writing:

In [None]:
cubes = []
for x in range(10):
    if x % 2 == 1:
        cubes.append(x**3)

cubes

__Exercise__: Write a list comprehension to determine all multiples of 7 under 100

Similar syntax can be used to create sets and dictionaries. A set is an object similar to a list, but it is unordered and contains only one copy of each unique element. Sets are created using `{x1, x2, ...}` or the `set()` function and support additional methods to perform various set operations. We won't cover sets in detail; see [here](https://docs.python.org/3/library/stdtypes.html#set) for more information on all data structures.


In [None]:
# A set (Note uniqueness of elements)
set([1,2,3,1,2,4,6])

# Equivalent
{1,2,3,1,2,4,6}

# Be careful - {} creates an empty dictionary, use set() for an empty set
type({})
type(set())
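
As a brief sketch of the set operations mentioned above, the operators `|`, `&` and `-` give the union, intersection and difference of two sets. The gene names and variable names here are illustrative only and are not part of the original worksheet.

```python
# Two example sets of gene names (hypothetical values for illustration)
expressed = {'BRCA2', 'TP53', 'MYC'}
mutated = {'TP53', 'KRAS'}

# Union: elements in either set
either = expressed | mutated
# Intersection: elements in both sets
both = expressed & mutated
# Difference: elements in the first set but not the second
only_expressed = expressed - mutated

print(either, both, only_expressed, sep='\n')
```

Because sets are unordered, the order in which the elements are printed may vary between runs.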

Set comprehensions are similar to list comprehensions but use {...}:

In [None]:
{(-1)**x for x in range(10)}

Similarly dictionary comprehensions also use curly brackets but with key:value pairs:

In [None]:
{x: x**2 for x in range(10)}

Often you require a key that is different to the value, in which case the comprehension becomes more complicated. The simplest way to do this is to iterate over a list containing other containers, which lets you assign multiple iterating variables, similarly to iterating over a dictionary:

In [None]:
{key:value for key, value in [('foo', 1), ('bar', 2), ('baz', 3)]}

This sort of list can be constructed using the _zip_ function, which groups the 1st, 2nd, 3rd, ... items from each of the lists you provide:

In [None]:
keys = ['foo', 'bar', 'baz']
values = [1, 2, 3]

{k: v for k,v in zip(keys, values)}
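
To see what `zip` actually produces, it can help to convert its output to a list. This is a small sketch reusing the `keys` and `values` lists from above; the names `pairs` and `truncated` are only for illustration.

```python
keys = ['foo', 'bar', 'baz']
values = [1, 2, 3]

# zip is lazy, so wrap it in list() to see the tuples it produces
pairs = list(zip(keys, values))

# If the inputs differ in length, zip stops at the shortest one
truncated = list(zip(keys, [1, 2]))

print(pairs)
print(truncated)
```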

A similar construct can also be used for list comprehensions over multiple variables:

In [None]:
height = [4,7,8,4]
width = [7,6,2,8]
depth = [12,6,3,4]

volume = [h * w * d for h,w,d in zip(height, width, depth)]
volume

__Exercise__: Create a list of sequence error rates and then turn it into a dictionary with the names as keys. The expected values are:

{'read1': 0.16666666666666666,
 'read2': 0.12962962962962962,
 'read3': 0.0625,
 'read4': 0.5416666666666666}

In [None]:
seq_names = ['read1', 'read2', 'read3', 'read4']
seqs = ['acgtacgtacgatcgatcgattggctatgc',
        'cagctagctagctattcttcggataacacacatttacactatcggactacgatc',
        'acattcttctcttttttttttgcgagctatcgatcgatatcgagtcga',
        'gatcgatcgatcgtgacgtctagc']
seq_error_count = [5, 7, 3, 13]


Finally, you might intuitively expect that using round brackets would create a tuple comprehension, however when we try it we find it doesn't:

In [None]:
(math.factorial(x) for x in range(10000))

What has actually been created is a generator, which is a special type of iterator that only yields the next value when it is required, which can make it much more efficient. Generators are another very useful feature that we don't have time to cover; you can find out more at https://docs.python.org/3/howto/functional.html#generators (the rest of the page is useful too). For a simple example of how powerful they can be, think carefully about the computation you just did and how long it took. Doing the same in a list comprehension is unlikely to be a good idea!
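
The lazy behaviour can be seen by asking the generator for values one at a time with the builtin `next()` function. This is a minimal sketch; the variable names are illustrative.

```python
import math

# Creating the generator is instant: no factorials are computed yet
factorials = (math.factorial(x) for x in range(10000))

# Values are only computed when requested, one at a time
first = next(factorials)   # factorial(0)
second = next(factorials)  # factorial(1)
third = next(factorials)   # factorial(2)
fourth = next(factorials)  # factorial(3)

print(first, second, third, fourth)
```

Only four factorials have been calculated at this point; the remaining values would be computed only if something continued to iterate over `factorials`.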

It is possible to create a tuple using a comprehension, but it requires calling the _tuple_ constructor function on another comprehension:

In [None]:
tuple([x**2 for x in range(10)])
tuple(x**3 for x in range(10))



## Error Handling

When you start to write longer scripts and programs it is important to be able to deal with unexpected events and recover from potential errors. This can be achieved through `try ... except` blocks, which allow you to write code that runs if an error occurs:

In [None]:
x = []
for i in (1,5,3,7,9,0,3,5):
    try:
        x.append(1/i)
    except:
        x.append('NA')
x

The except clause fired when attempting to calculate `1/0` raised a `ZeroDivisionError`, which would otherwise have stopped the script. You will have probably encountered various other exceptions during the course, for example a `NameError` if you used a variable that doesn't exist or a `TypeError` if you attempted to do something a type doesn't support (e.g. adding a number to a string). There are a number of builtin exceptions that handle common errors during code execution, as well as `SyntaxError`, which alerts you to incorrectly written code.

The `try ... except` system can be used to catch specific errors and deal with different errors separately using `except ErrorClass`. These clauses are checked in order, and a bare `except` matches any error, so it should always be last.

In [None]:
for i in (0 , 'str', True):
    try:
        1/i
    except ZeroDivisionError:
        print("Don't divide by 0")
    except TypeError:
        print("You can't divide by {}".format(i))

In the final case we've encountered Python's ability to treat one type as another on the fly: it doesn't necessarily make sense to divide by `True`, but boolean values behave as numbers (`True` as 1 and `False` as 0). This makes a lot of things simpler but occasionally leads to unintended results.
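
The reason dividing by `True` works is that `bool` is a subclass of `int` in Python. A short sketch (variable names are illustrative):

```python
# bool is a subclass of int, so True and False behave as 1 and 0 in arithmetic
is_int = isinstance(True, int)
total = True + True
ratio = 1 / True

print(is_int, total, ratio)
```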

__Exercise__: Add `try .. except` blocks to this code to handle possible errors.

In [None]:
print('Divide the previous value by the current one')
while True:
    current_input = input('Enter a number (or q to quit): ')
    if current_input == 'q':
        break
    
    current_input = float(current_input)
    print(last_input/current_input)
    last_input = current_input

An `else` block can be attached to `try ... except`, which is executed after the `try` block if no exception occurs:

In [None]:
number = input('Type an integer:')
try:
    number = int(number)
except Exception as err:
    print('Bad input: {}'.format(number))
else:
    print(number + 1)

Finally the error object can be passed to `except` blocks using the syntax `except ErrorClass as var`, which allows you to do things like output the error message or raise the error again after some clean up code:

In [None]:
for i in (1,2,'3',4):
    try:
        print(i ** 2)
    except TypeError as e:
        print(e)
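
The other use mentioned above, raising the error again after some clean up, can be sketched as follows. The function `parse_count` and the `log` list are hypothetical, standing in for whatever clean up a real program would need; a bare `raise` inside an `except` block re-raises the exception currently being handled.

```python
log = []  # stands in for real clean-up work, e.g. closing a file

def parse_count(text):
    """Convert text to an int, recording failures before re-raising."""
    try:
        return int(text)
    except ValueError as err:
        # Clean up or record what happened...
        log.append('failed to parse {!r}: {}'.format(text, err))
        # ...then re-raise the same exception for the caller to handle
        raise

try:
    parse_count('ten')
except ValueError as err:
    print('Caller saw:', err)

print(log)
```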

It is also possible to `raise` exceptions yourself, for instance if a function doesn't receive the correct input:

In [None]:
def chi_squared(observed, probs):
    if not len(observed) == len(probs):
        raise ValueError('The number of classes of observed results and expected probabilities must be the same')
    elif abs(sum(probs) - 1) > 0.0001:
        raise ValueError('Probabilities must sum to approximately 1')

    n = len(observed)
    total = sum(observed)

    # Expected counts under the given probabilities
    expected = [probs[i] * total for i in range(n)]
    # Chi-squared statistic: sum of (observed - expected)^2 / expected
    chi = sum([(observed[i] - expected[i])**2 / expected[i] for i in range(n)])
    return chi

This allows you to catch errors that wouldn't cause an exception but would mean incorrect output (for instance the probabilities not summing to 1), or to catch errors early before they cause an exception and add additional information to the error message (the `probs` and `observed` lists not being the same length).

The `assert condition, 'optional message'` statement allows you to do a similar thing on a single line:

In [None]:
for i in (1, 2.0, complex(0,-1),'4'):
    assert isinstance(i, (int, float, complex)), 'input is {} not numeric'.format(type(i))
    print('{}^2 = {}'.format(i, i**2))

__Exercise__: Write a function to calculate the n<sup>th</sup> [triangle number](https://en.wikipedia.org/wiki/Triangular_number), checking for incorrect input (e.g. not an integer) using assert or raise, then add `try ... except` blocks to the loop so that it prints either the result or the error message for each input.

In [None]:
def triangle(n):
    pass

for i in (1,2,3,'a string', 4.0, 5.3, True, [1]):
    print(triangle(i))

## Classes

Object Oriented Programming (sometimes abbreviated OOP) is a style of programming based on the manipulation of objects, which represent real life things (e.g. a protein) or conceptual units (e.g. a type of dataset) and have both data and methods to process it associated with them. It aims to encourage programs that relate conceptually to real life and are easy to understand. Classes allow you to define your own objects, which you can then create instances of in your program. The best way to learn is through examples, so we will build up some simple classes to represent biological sequences.

We will start off with an RNA sequence. There are fewer exercises in this section, but at each stage you can experiment with the topic using a new code block in this notebook, a fresh notebook or the Python interpreter.

The basic syntax to define a class is very simple:

In [None]:
class RNA:
    """An RNA sequence"""
    pass

We have now created a class called `RNA`, with a docstring giving a brief description of it (this is what appears when you use the `help` function). However, the class doesn't really do anything yet, so let's add sequence data to it. We can do this by adding a method to the class, which has the same syntax as declaring a function. Methods with two leading and trailing underscores (such as `__init__`) are special, with predefined roles (for more information on the use of underscores in names see [here](https://hackernoon.com/understanding-the-underscore-of-python-309d1a029edc)). The `__init__` method is called when an instance of the class is created and defines the variables used to initialise it:

In [None]:
class RNA:
    """An RNA sequence"""
    def __init__(self, seq):
        # Make sequence lower case
        seq = str(seq).lower()

        # Set the sequence
        self.seq = seq

RNA('actgcthg')

We can now create an RNA instance using the syntax `ClassName(...)`, which calls the `__init__` method. In our case this assigns the given sequence to the instance variable `seq`, which is then associated with the `RNA` object we've created.
You will have noticed that the `self` variable passed to the `__init__` method is not given when we call `RNA(seq)`. This is because the first argument of an instance method, named `self` by convention, is implicitly assigned the instance of the class, allowing you to modify instance variables and call methods on the instance. In our `__init__` method we used this to assign the sequence: `self.seq = seq`. Classes can also have [class methods and static methods](https://julien.danjou.info/guide-python-static-class-abstract-methods/), which receive the class itself (rather than an instance) or no implicit first argument respectively, and so have access to different bits of data. They are defined using [decorators](https://hackernoon.com/decorators-in-python-8fd0dce93c08). Both of these are more advanced topics than we will cover here.
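
Although class and static methods are beyond the scope of this session, a minimal sketch may help show how they differ from ordinary methods. The `Read` class, its method names and the FASTA-style input are all hypothetical and exist only for this illustration.

```python
class Read:
    """A hypothetical sequencing read, used only to illustrate method types"""

    def __init__(self, seq):
        self.seq = seq.lower()

    @classmethod
    def from_fasta_record(cls, record):
        """Alternative constructor: receives the class itself as `cls`"""
        header, seq = record.strip().split('\n', 1)
        return cls(seq.replace('\n', ''))

    @staticmethod
    def is_valid_base(base):
        """Utility with no implicit first argument"""
        return base.lower() in {'a', 'c', 'g', 't'}

read = Read.from_fasta_record('>read1\nACGT\nTTGA\n')
print(read.seq)
print(Read.is_valid_base('N'))
```

Both methods are called on the class rather than on an instance: the class method builds a new `Read`, while the static method is simply a function that lives in the class namespace.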

Classes have two kinds of variables: class variables and instance variables. Both types can be accessed using `instance.var` syntax. We have just encountered an instance variable, `seq`, which is a bit of data associated with a specific instance of the class:

In [None]:
rna1 = RNA('aaa')
rna2 = RNA('ccc')

rna1.seq
rna2.seq

Class variables, on the other hand, are shared by all members of the class and are defined outside class methods, without using the `self.x` syntax. Let's add an `alphabet` variable to store the allowed bases for the class and use it to check the input:

In [None]:
class NucleotideError(Exception):
    """Error for a sequence containing a character outside the allowed alphabet"""
    pass

class RNA:
    """An RNA sequence"""
    alphabet = ('a', 'c', 'g', 'u')
    
    def __init__(self, seq):
        # Make sequence lower case
        seq = str(seq).lower()

        for base in seq:
            if not base in self.alphabet:
                raise NucleotideError('{} is not in the alphabet {}'.format(base, self.alphabet))
                
        # Set the sequence
        self.seq = seq

Now if we try to create an RNA molecule with a bad sequence an error is thrown:

In [None]:
RNA('acugacuHBHacugac')

As you can see we `raise` a new error type, `NucleotideError`, if the sequence is invalid. You can define your own errors using the class system, inheriting from the `Exception` base class or one of its derivatives (we will cover inheritance in more depth shortly). An important point to remember is that an exception will also match any of its parent exceptions in a `try ... except` block, so you must check for more specific exceptions first.

__Exercise__: Alter this code so the `SpecialTypeError` is caught correctly.

In [None]:
class SpecialTypeError(TypeError):
    pass

try:
    raise SpecialTypeError

except TypeError:
    print('Vague Error Message')
    
except SpecialTypeError:
    print('Useful Debugging Message')

except:
    print('Nondescript Error')

Looking back to general classes, there are other useful special methods including:
* `__str__` - called when you convert an object to a string (including when printing it)
* Similar functions `__int__`, `__float__` etc.
* `__repr__` - the 'canonical' string representation, which should ideally recreate the object if evaluated
* `__len__` - used by the builtin `len()` function to return an object's length

Alongside [many more](http://www.diveintopython3.net/special-method-names.html). We will implement `__str__`, `__repr__` and `__len__`:

In [None]:
class RNA:
    """An RNA sequence"""
    alphabet = ('a', 'c', 'g', 'u')

    def __init__(self, seq):
        # Make sequence lower case
        seq = str(seq).lower()


        # Check all bases are part of the sequence alphabet
        for base in seq:
            if not base in self.alphabet:
                raise NucleotideError('{} is not in the alphabet {}'.format(base, self.alphabet))

        # Set the sequence
        self.seq = seq

    def __str__(self):
        return "5'-{}-3'".format(self.seq)

    def __repr__(self):
        return '{}({!r})'.format(type(self).__name__, self.seq)

    def __len__(self):
        return len(self.seq)

In [None]:
rna = RNA('aucgaucguac')
rna
print(rna, len(rna), sep='\n')

We now have a simple container for RNA sequences, but it doesn't really do anything yet beyond slightly fancy printing. For that we can add a normal method. In this case we'll add a method to calculate the molecular weight of the molecule, also adding another class variable to store the base weights. This completes the RNA class:

#### RNA Class

In [None]:
class RNA:
    """An RNA sequence"""
    alphabet = ('a', 'c', 'g', 'u')

    # Molecular weights from thermofisher
    base_weights = {'a':329.2, 'c':305.2, 'g':345.2, 'u':306.2}

    def __init__(self, seq):
        # Make sequence lower case
        seq = str(seq).lower()


        # Check all bases are part of the sequence alphabet
        for base in seq:
            if not base in self.alphabet:
                raise NucleotideError('{} is not in the alphabet {}'.format(base, self.alphabet))

        # Set the sequence
        self.seq = seq

    def __str__(self):
        return "5'-{}-3'".format(self.seq)

    def __repr__(self):
        return '{}({!r})'.format(type(self).__name__, self.seq)

    def __len__(self):
        return len(self.seq)

    def molecular_weight(self):
        """Calculate the (approximate) molecular weight of the molecule, excluding potential end phosphate"""
        weight = 0
        for base in self.seq:
            weight += self.base_weights[base]

        return weight

__Potential Exercises__:
* Add another method (gc content? simple alignment?)
* Add an \_\_iter\_\_ method to allow looping through the sequence

Since these may take some time and research don't spend too long on them, you can always come back later if you find it interesting.

### Inheritance
Inheritance is a very useful feature of classes that allows you to easily define new variations on an existing class. We've already seen a basic example in the error that we defined, which inherited from the `Exception` class. The basic syntax is:

```
class child_class(parent_class):
    pass
```
A class inherits all the methods and class variables of its parent class unless you explicitly override them.

Let's use our `RNA` class to define a class for ssDNA; the only real change we'll need to make is to the alphabet and molecular weights.

__Bonus Exercise__: Change the `RNA` class such that we don't even need to change the alphabet (hint: we list the possible bases somewhere else)

#### ssDNA Class

In [None]:
class ssDNA(RNA):
    """A ssDNA sequence"""
    alphabet = ('a', 'c', 'g', 't')

    # Molecular weights from thermofisher
    base_weights = {'a':313.2, 'c':289.2, 'g':329.2, 't':304.2}

    def __init__(self, seq):
        super().__init__(seq)

This also demonstrates the `super()` function, which gives access to the parent class and its methods. We use it to call the `RNA.__init__()` method, but note that it still uses the DNA alphabet, not the RNA one, because we used `self.alphabet` in the RNA method, which points to the DNA alphabet in our new class.
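
The way `self.attribute` resolves to the child's value, even inside a method defined on the parent, can be seen in a stripped-down sketch. The `Parent` and `Child` classes here are hypothetical and unrelated to the sequence classes.

```python
class Parent:
    label = 'parent'

    def describe(self):
        # self.label is looked up on the instance's own class first
        return 'I am a {}'.format(self.label)

class Child(Parent):
    label = 'child'

    def describe(self):
        # Call the parent implementation, then extend its result
        return super().describe() + ' (extended)'

print(Parent().describe())
print(Child().describe())
```

Even though `Child.describe` runs the code written in `Parent.describe`, the label used is the one defined on `Child`.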

ssDNA is very similar to RNA, and was consequently fairly trivial to implement. Let's see what changes we need to make for dsDNA. Initially this seems simple: the alphabet and weights are the same as ssDNA, so we can trivially inherit from that:

In [None]:
class dsDNA(ssDNA):
    """A dsDNA sequence"""
    def __init__(self, seq):
        super().__init__(seq)

print(dsDNA('actcagcgat'))

That doesn't really seem a good representation of dsDNA though; it might be nice to see the reverse strand too. This can be done by overriding the `__str__` method. Similarly the molecular weight will only account for one strand, so that will need to be changed too.

__Exercise__: Modify the template below to do this

In [None]:
class dsDNA(ssDNA):
    """A dsDNA sequence"""
    def __init__(self, seq):
        super().__init__(seq)

Here's one possible solution, using another class variable to store Watson-Crick pairing (which is constant between molecules) and calculating the complement as we initialise instances, because it is useful in both methods:

#### dsDNA Class

In [None]:
class dsDNA(ssDNA):
    """A dsDNA sequence"""
    base_complements = {'a':'t', 'c':'g', 'g':'c', 't':'a'}
    def __init__(self, seq):
        super().__init__(seq)
        self.complement = ''.join([self.base_complements[base] for base in self.seq])

    def __str__(self):
        forward = super().__str__()
        reverse = "3'-{}-5'".format(self.complement)
        return '{}\n   {}   \n{}'.format(forward, '|'*len(self), reverse)

    def molecular_weight(self):
        forward_strand = super().molecular_weight()
        reverse_strand = 0
        for base in self.complement:
            reverse_strand += self.base_weights[base]

        return forward_strand + reverse_strand
    
print(dsDNA('actcagcgat'))
print(dsDNA('actcagcgat').molecular_weight())

This is possibly an overly elaborate way of printing dsDNA, depending on what your program is for, but it shows the utility of overriding parent methods. Both `__str__` and `molecular_weight` again make use of `super()` to call the equivalent `ssDNA` method as a starting point to build on, which is often a useful way to avoid writing the same function twice.

We have now built up three classes to represent nucleotide sequences, which could easily be extended to perform various other functions. __Exercise__: How might this system be extended to include a protein class? Could it inherit from any of the existing three or would a new `sequence` base class be needed? Try implementing this.

## Database Access

It is possible to use Python to programmatically interface with many types of database, including many biological ones. There are various packages that are designed to help with this, for instance the [sqlite3](https://docs.python.org/3/library/sqlite3.html) package for SQLite databases, or you can write your own API. When working with online databases, such as Ensembl, the `requests` package is invaluable, allowing you to send HTTP requests and process the responses. The code below is a simple example I have used to interface with the [Ensembl RESTful web API](http://rest.ensembl.org/), which demonstrates both the `requests` library and a real use of classes.

In [None]:
#!/usr/bin/env python3
# Commented Module Docstring
#"""
#Module providing a REST client class and associated functions, aimed primarily\
#at accessing the Ensembl REST API. It is based on the example client found at\
#https://github.com/Ensembl/ensembl-rest/wiki/Example-Python-Client.
#"""
import time
import requests


class RestClient:
    """
    REST API client to perform rest requests to the indicated server.\
    Rate limiting is performed by waiting 'pause' seconds between requests.
    """

    def __init__(self, server, pause):
        self.server = server
        self.pause = pause
        self.last_time = time.time()

    def rate_limit(self):
        """
        Check the time since last request and limit request rate if needed
        """
        delta_t = time.time() - self.last_time
        if delta_t < self.pause:
            time.sleep(self.pause - delta_t)

    def rest(self, request, header=None, data=None):
        """Make a REST get request to the server"""
        if header is None:
            header = {'Content-Type': 'text/plain'}

        # Perform rate limiting
        self.rate_limit()

        # Make request
        if data is None:
            response = requests.get(self.server + request,
                                    headers=header)

        else:
            response = requests.post(self.server + request,
                                     headers=header,
                                     data=data)

        self.last_time = time.time()

        # Check response
        if not response.ok:
            if 'Retry-After' in response.headers:
                # Header values are strings, so convert before adding the extra wait
                time.sleep(float(response.headers['Retry-After']) + 2)
                return self.rest(request, header, data)
            else:
                response.raise_for_status()

        return response

I have only used it for simple tasks and so constructed the request strings manually, but it would be possible to add additional methods to perform common Ensembl lookup tasks. Here's an example of my usage, collecting information about genes using their IDs:

In [None]:
ids = ['ENSG00000139618', 'ENSG00000141510', 'ENSG00000234861']

rest = RestClient('http://rest.ensembl.org', 1)

data = '{ "ids" : ["' + '", "'.join(set(ids)) + '"] }'
lookup = rest.rest("/lookup/id",
                   header={"Content-Type":"application/json",
                               "Accept" : "application/json"},
                   data=data)

lookup = lookup.json()
result = pd.DataFrame.from_dict(lookup)
result
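
The request body above is assembled by string concatenation. As an alternative sketch (not the approach used in the original client), the standard library `json` module can serialise a dictionary directly, which avoids quoting mistakes. The variable name `payload` is illustrative, and this snippet makes no network request.

```python
import json

ids = ['ENSG00000139618', 'ENSG00000141510', 'ENSG00000234861']

# Build the POST body from a dictionary; sorting makes the order reproducible
payload = json.dumps({'ids': sorted(set(ids))})
print(payload)

# The string parses back to the same structure
parsed = json.loads(payload)
print(parsed['ids'])
```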

Similar setups, along with some error checking, can be used to perform a range of other lookups using the [Ensembl RESTful guide](http://rest.ensembl.org/) to determine how to construct the different queries. __Exercise__: Implement a method for the ID lookup I demonstrated.

Let's look in more detail at the different methods and how they work together:

```
def __init__(self, server, pause):
    self.server = server
    self.pause = pause
    self.last_time = time.time()
```
Only one special method is implemented, `__init__`, since I have not needed any other functionality.  This method is simple, setting the given `server` and time to wait between requests (`pause`), which should be set high enough to prevent making too many requests in a short time and being timed out by the server (0.1 has worked fine for me). Finally the `last_time` variable, which will store the time of the last request, is set to the current time.

__Exercise__: Add a `__str__` and/or `__repr__` special method

```
def rate_limit(self):
    """
    Check the time since last request and limit request rate if needed
    """
    delta_t = time.time() - self.last_time
    if delta_t < self.pause:
        time.sleep(self.pause - delta_t)
```
The next method is also fairly straightforward, checking the difference between the current time and the time of the last request and pausing the program if not enough time has elapsed. It would be possible to do more complex rate limiting, using the rate information returned by the server (see the [Ensembl example client](https://github.com/Ensembl/ensembl-rest/wiki/Example-Python-Client)). For a small number of queries and as a simple example pausing between queries is sufficient.

The next method, `rest`, defines the main request action. We'll look at it in smaller sections. First we check that a header has been provided and substitute a reasonable default if not. We would want to perform some error checking if we were making a more robust client. We then perform the rate limiting action.

```
def rest(self, request, header=None, data=None):
    """Make a REST get request to the server"""
    if header is None:
        header = {'Content-Type': 'text/plain'}

    # Perform rate limiting
    self.rate_limit()
```
Next we make the actual request to the server, using the `requests` package. We decide between a `get` or `post` request based on the presence of additional data to be sent. Again this could be extended using `try ... except` blocks to deal with server errors, potentially moving the rate limiting action to be in response to these.

```
    # Make request
    if data is None:
        response = requests.get(self.server + request,
                                headers=header)

    else:
        response = requests.post(self.server + request,
                                 headers=header,
                                 data=data)
```
We then record the time of the request to perform the next `rate_limit`:

```
    self.last_time = time.time()
```
And finally we check the response to see if the request was successful. If we receive a rate limit request we respond by waiting the required time (plus two seconds to prevent occasional errors I encountered) and then retry the request. Other errors are raised and passed to the user to deal with. Again more sophisticated error checking could be implemented here.

```
    # Check response
    if not response.ok:
        if 'Retry-After' in response.headers:
            # Header values are strings, so convert before adding the extra wait
            time.sleep(float(response.headers['Retry-After']) + 2)
            return self.rest(request, header, data)
        else:
            response.raise_for_status()

    return response
```
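
The wait calculation can be pulled out into a small helper to make it easier to check in isolation. This is only a sketch: `retry_delay` is a hypothetical function, a plain dictionary stands in for the response headers, and it assumes the `Retry-After` value is a number of seconds (HTTP also permits a date in this header, which is not handled here).

```python
def retry_delay(headers, extra=2.0):
    """Return the seconds to wait before retrying, or None if no wait was requested.

    `headers` is any mapping of header names to string values; `extra` is a
    safety margin added on top of the server's requested delay.
    """
    if 'Retry-After' not in headers:
        return None
    # Convert the string to a number first, then add the margin
    return float(headers['Retry-After']) + extra

print(retry_delay({'Retry-After': '3'}))
print(retry_delay({'Content-Type': 'application/json'}))
```

Header values arrive as strings, so the conversion to `float` has to happen before the extra two seconds are added.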

This gives a simple example of a class and database client that can be used for real research work, albeit one that could be made much more robust and feature rich if you wanted to use it heavily. __Bonus Exercise for the Enthusiastic__: try adding some error checking features

## Scientific Libraries

We have seen a few ways to write scientific programs and scripts in these sessions. However, in many cases someone else will have already written and extensively tested code to do the thing you're trying to do. In these cases it is generally best to use the established library, which will have been widely tested and is less likely to contain bugs (as well as usually being easier and quicker for you). We have already encountered a couple of examples; for instance the `pandas` library for data frames and the `requests` library for HTTP. There are packages to read and process most common biological files (e.g. [pyvcf](https://pyvcf.readthedocs.io/en/latest/) for VCF files). It is always worth checking to see if someone else has already done your work for you!

In addition to this there is the [Biopython](https://biopython.org/) collection of packages that covers a range of biological functions, including reading and manipulating sequences, phylogenetic trees and population genetics. Indeed it includes a ready made class that is a more advanced version of the sequence classes we made earlier, with various additional useful methods:

In [None]:
from Bio.Seq import Seq
# Note: the Bio.Alphabet module was removed in Biopython 1.78, so sequences
# are now created without an alphabet argument
my_dna = Seq("AGTACATCTATCATGACTGGT")
my_protein = Seq("AGTDCALTWGT")
my_dna
my_protein

my_dna.count('ATG')
my_dna.complement()
my_dna.translate()

This is just a small taster of the wide array of packages available that might be useful to you; it is definitely useful to research the most common packages used in your area if you find yourself using Python frequently. Some generally useful starting points are:
* The builtin [logging](https://docs.python.org/3/library/logging.html) module, which helps debug programs in a more organised way
* The [SciPy](https://www.scipy.org/) stack (Numpy, Scipy, ...) - covers advanced numerical and statistical operations
* The [SciKit](https://www.scipy.org/scikits.html) collections for various other scientific computing stuff, including machine learning
* [Matplotlib](https://matplotlib.org/) for basic plotting
  * [Seaborn](https://seaborn.pydata.org/) for pretty plotting
  * [Plotly](https://plot.ly/python/) for interactive plots
* [Biopython](https://biopython.org/)
* [Jupyter](http://jupyter.org/) to make notebooks (like this one) to present your work