# Intermediate Python

Learning Objectives:
- Learn some advanced features of Python
- Explain when to use each advanced feature

---

Topics
- Packages and installation
- Importing modules
- Environments
- Documentation
- Testing
- Logging
- Command line arguments
- Object Oriented Programming
- Generators
- Decorators
- Best Practices

---

## Packages

[PyPI](https://pypi.python.org/pypi) hosts more than 123,358 packages!

### Package installation with Anaconda Navigator

You can use the Anaconda Navigator GUI to install packages.

__Guided demo__

### Package installation with Conda

Many packages are available through conda, but not all.

    conda update conda
    conda install biopython

(Note that there is also Bioconda for installing bioinformatics programs. We will see more in a future seminar.)

### Package installation with pip

To install the latest version of “SomeProject”:

    pip install 'SomeProject'

To install a specific version:

    pip install 'SomeProject==1.4'

To install greater than or equal to one version and less than another:

    pip install 'SomeProject>=1,<2'

To upgrade an already installed package to the latest version:

    pip install --upgrade SomeProject

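NOTE: You can also install directly from a Jupyter notebook. Commands that ask for confirmation, such as `conda install` or `pip uninstall`, need the `-y` option to answer "yes", because the notebook cannot respond to the prompt.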
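After an install, it can help to confirm which version actually landed in the active environment. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or later). The helper name `installed_version` is a hypothetical one chosen for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# 'SomeProject' is the placeholder name used above; replace it with a real package.
print(installed_version("SomeProject"))
```

Returning `None` instead of raising keeps the check easy to use in a notebook cell.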

#### Installing from VCS

Install a project from VCS in “editable” mode. For a full breakdown of the syntax, see pip’s section on VCS Support.

    pip install -e git+https://git.repo/some_pkg.git#egg=SomeProject          # from git
    pip install -e hg+https://hg.repo/some_pkg.git#egg=SomeProject            # from mercurial
    pip install -e svn+svn://svn.repo/some_pkg/trunk/#egg=SomeProject         # from svn
    pip install -e git+https://git.repo/some_pkg.git@feature#egg=SomeProject  # from a branch


To install every package listed in a requirements file (`requirements.txt`):

    pip install -r requirements.txt

More information: https://packaging.python.org/installing/

---

## Importing Modules

[Source](https://www.blog.pythonlibrary.org/2016/03/01/python-101-all-about-imports/)


Utilizing modules or packages in python is a two-step process.

1. They must be installed on the computer you want to use them on; this only has to be done once, but occasional updates are necessary.
1. You must import a module or package in every script that uses it.

We will see some advice on importing in the PEP8 guidelines. PyCharm also assists users with organizing imports. 


A regular import, quite possibly the most popular kind, goes like this:

```python
import sys
```
    
All you need to do is use the word “import” and then specify what module or package you want to actually import. The nice thing about import though is that it can also import multiple packages at once:

```python
import os, sys, time
```

While this is a space-saver, it goes against the [Python Style Guide’s recommendations](https://www.python.org/dev/peps/pep-0008/#imports) of putting each import on its own line.

Sometimes when you import a module, you want to rename it. Python supports this quite easily:

```python
import sys as system
print(system.platform)
```

This piece of code simply renames our import to “system”. We can call all of the module's methods the same way as before, but with the new name. There are also certain submodules that have to be imported using dot notation:

```python
import urllib.error
```

You won’t see these very often, but they’re good to know about.
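The import forms above can be combined in one short script. This sketch uses only standard-library modules and puts one import on each line, as PEP 8 recommends.

```python
import os
import sys as system          # renamed import
import urllib.error           # submodule imported with dot notation

# The renamed module behaves exactly like the original.
print(system.platform)

# Names inside a submodule are reached through the full dotted path.
print(urllib.error.URLError)

# os was imported under its own name.
print(os.getcwd())
```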

## Environment

We will use the Anaconda Navigator to create and run a new environment.

Manage environments in Anaconda

How to start a new Jupyter notebook in an environment

__DEMO__
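Once a notebook is running, a quick way to check which environment it is using is to look at the interpreter path. This is a small sketch using only the standard library. The helper name `describe_environment` is made up for illustration.

```python
import sys


def describe_environment():
    """Collect basic facts about the interpreter running this code."""
    return {
        "executable": sys.executable,   # path to the python binary in use
        "prefix": sys.prefix,           # root folder of the active environment
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


for key, value in describe_environment().items():
    print(key, ":", value)
```

If the printed prefix does not point at the environment you created, the notebook was probably started from a different one.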

__On Your Own (OYO)__

Python 2 vs. Python 3 and the `six` package

http://book.pythontips.com/en/latest/targeting_python_2_3.html

## Documentation

### Docstrings

[Source](http://www.pythonforbeginners.com/basics/python-docstrings)

Python documentation strings (or docstrings) provide a convenient way of associating documentation with Python modules, functions, classes, and methods. 

An object's docstring is defined by including a string constant as the first statement in the object's definition.

Like a comment, it is written in the source code to document a specific segment of code.

Unlike conventional source code comments, the __docstring should describe what the function does, not how__.

All functions should have a docstring. Unlike comments, docstrings are kept at run time, so a program can inspect them, for instance in an interactive help system, or as metadata. Docstrings can be accessed through the `__doc__` attribute on objects.

What should a Docstring look like?
- The docstring should begin with a capital letter and end with a period.
- The first line should be a short description.
- Don't write the name of the object.
- If there are more lines in the documentation string, the second line should be blank, visually separating the summary from the rest of the description.
- The following lines should be one or more paragraphs describing the object’s calling conventions, its side effects, etc.

```python
%%writefile my_module.py
def my_function():
    """
    Do nothing, but document it.

    No, really, it doesn't do anything.
    """
    pass
```

Let's see what this looks like when we print it:

```python
from my_module import my_function
my_function.__doc__
```

```python
my_function?
```

```python
import my_module

help(my_module)
```
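The guidelines above can be seen in a slightly more realistic docstring: a one-line summary, a blank line, then a paragraph about behaviour. The function `gc_content` is a hypothetical example written for this sketch. `inspect.getdoc` is the standard-library call that returns a docstring with its indentation cleaned up.

```python
import inspect


def gc_content(sequence):
    """Return the GC percentage of a DNA sequence.

    The sequence is compared case-insensitively. An empty sequence
    returns 0.0 instead of raising an error.
    """
    sequence = sequence.lower()
    if not sequence:
        return 0.0
    gc = sequence.count("g") + sequence.count("c")
    return gc * 100.0 / len(sequence)


# The first line of the docstring is the short summary.
summary_line = inspect.getdoc(gc_content).splitlines()[0]
print(summary_line)
print(gc_content("ATGC"))
```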

__OYO__:

You can use [__sphinx__](http://www.sphinx-doc.org/en/stable/) to generate documentation based upon docstrings.

## Testing

A doctest embeds example calls and their expected output directly in a docstring:

```python
%%writefile unnecessary_math.py
"""
Module showing how doctests can be included with source code
Each '>>>' line is run as if in a python shell, and counts as a test.
The next line, if not '>>>' is the expected output of the previous line.
If anything doesn't match exactly (including trailing spaces), the test fails.
"""

def multiply(a, b):
    """
    >>> multiply(4, 3)
    12
    >>> multiply('a', 3)
    'aaa'
    """
    return a * b
```
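The doctests in a file can be run from the command line with `python -m doctest -v unnecessary_math.py`. The sketch below shows one way to do the same from Python, using the `doctest` module's finder and runner classes on a copy of `multiply` so the example is self-contained.

```python
import doctest


def multiply(a, b):
    """
    >>> multiply(4, 3)
    12
    >>> multiply('a', 3)
    'aaa'
    """
    return a * b


# Collect the examples from the docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(multiply, "multiply", globs={"multiply": multiply}):
    runner.run(test)

# The summary reports how many examples were attempted and how many failed.
results = runner.summarize(verbose=False)
print(results)
```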

The same checks written as pytest test functions:

```python
%%writefile test_um_pytest.py
from unnecessary_math import multiply

def test_numbers_3_4():
    assert multiply(3, 4) == 12

def test_strings_a_3():
    assert multiply('a', 3) == 'aaa'
```

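The same idea using the standard library's `unittest` module. It is written to a separate file so that it does not overwrite the pytest version; pytest can also collect `unittest.TestCase` classes.

```python
%%writefile test_um_unittest.py
import unittest
from unnecessary_math import multiply

class TestUM(unittest.TestCase):
    def test_numbers_3_4(self):
        self.assertEqual(multiply(3, 4), 12)
```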

Test unnecessary_math with pytest

    To run tests : pytest              test_um_pytest.py
              or : python -m pytest    test_um_pytest.py

    Verbose (-v) : pytest -v           test_um_pytest.py
              or : python -m pytest -v test_um_pytest.py

```python
!pytest test_um_pytest.py
```

```python
!pytest -v test_um_pytest.py
```

### Test driven development (TDD)

- Start by writing the most basic test for the beginning of your project. For example: loading a file.
- Run the test and confirm that it fails.
- Write just enough code to pass your test.
- Confirm that the test passes.
- Write another test for the next small piece of your project or code.
- Write just enough code to pass the new test.
- Repeat until your code is done.
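The cycle above might look like this for one small feature. This is only a sketch: `count_bases` and both test names are hypothetical, and the comments describe the order in which the pieces would have been written.

```python
def count_bases(sequence):
    """Count how many times each base occurs in a sequence."""
    counts = {}
    for base in sequence.lower():
        counts[base] = counts.get(base, 0) + 1
    return counts


# Step 1: this test was written first and failed while count_bases did not exist.
def test_counts_single_base():
    assert count_bases("aaa") == {"a": 3}


# Step 2: the next small test, added after the first one passed.
def test_counts_are_case_insensitive():
    assert count_bases("AaTt") == {"a": 2, "t": 2}


# pytest would discover these automatically; here they are called directly.
test_counts_single_base()
test_counts_are_case_insensitive()
print("all tests passed")
```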

## Logging

Basic logging example:

```python
import logging

logging.basicConfig(filename='example.log', level=logging.DEBUG)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
```

Logging across multiple source code files or modules:

```python
%%writefile mylib.py
import logging

def do_something():
    logging.info('Doing something')
    print("hello")
```

```python
# myapp.py
import logging
import mylib

def main():
    logging.basicConfig(filename='myapp.log', level=logging.INFO)
    logging.info('Started')
    mylib.do_something()
    logging.info('Finished')

if __name__ == '__main__':
    main()
```
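When several modules log at once, a common refinement is to give each module its own named logger with `logging.getLogger`, so the origin of each message shows up in the output. The sketch below assumes a made-up logger name, `mylib_example`, and writes to an in-memory stream instead of a file so the result is easy to inspect.

```python
import io
import logging

# Each module asks for its own named logger instead of using the root logger.
logger = logging.getLogger("mylib_example")  # hypothetical name; __name__ is typical
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger

# Send records to an in-memory stream so the result is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s: %(message)s"))
logger.addHandler(handler)

logger.debug("Hidden, because DEBUG is below the INFO threshold")
logger.info("Doing something")

print(stream.getvalue())
```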

For more: https://awesome-python.com/#logging

## Command line Arguments

    import argparse

    parser = argparse.ArgumentParser()
    parser.parse_args()

```python
#!/usr/bin/env python

# Import the modules used here
import argparse
import logging


# Gather our code in a main() function
def main(args, loglevel):
    logging.basicConfig(format="%(levelname)s: %(message)s", level=loglevel)

    # TODO Replace this with your actual code.
    print("Hello there.")
    logging.info("You passed an argument.")
    logging.debug("Your argument: %s", args.argument)


# Standard boilerplate to call the main() function to begin the program.
if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Does a thing to some stuff.",
        epilog="As an alternative to the command line, "
               "params can be placed in a file, one per line, and "
               "specified on the command line like '%(prog)s @params.conf'.",
        fromfile_prefix_chars='@')
    # TODO Specify your real parameters here.
    parser.add_argument("argument",
                        help="pass ARG to the program",
                        metavar="ARG")
    parser.add_argument("-v",
                        "--verbose",
                        help="increase output verbosity",
                        action="store_true")
    args = parser.parse_args()

    # Set up logging
    if args.verbose:
        loglevel = logging.DEBUG
    else:
        loglevel = logging.INFO

    # Call main() only when the file is run as a script
    main(args, loglevel)
```
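Inside a notebook there is no real command line, so `parse_args()` would see the notebook's own arguments. One way to experiment, sketched below, is to pass the arguments as an explicit list. The file name `input.fasta` is made up for this demonstration.

```python
import argparse

parser = argparse.ArgumentParser(description="Does a thing to some stuff.")
parser.add_argument("argument", help="pass ARG to the program", metavar="ARG")
parser.add_argument("-v", "--verbose", action="store_true",
                    help="increase output verbosity")

# Passing a list stands in for what would normally come from sys.argv[1:].
args = parser.parse_args(["input.fasta", "--verbose"])

print(args.argument)
print(args.verbose)
```

The parsed values are ordinary attributes on `args`, named after the arguments.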

Python script template with argparse in it: https://gist.github.com/burkesquires/2bab01406597312a2ef0cc74128df89f

You can also check out [Python Fire](https://github.com/google/python-fire) from Google, which generates command line interfaces (CLIs) from Python objects.

---

## Object Oriented Programming

[Source](https://www2.unil.ch/phylo/teaching/python/lecture3.pdf)

Object-oriented programming: a programming paradigm based on the concept of “objects”, which are data structures containing data (or attributes) and methods (or procedures).

### Objects in python

We have already used objects.

    n = 12                          # n is an object of type integer
    s = 'ACAGATC'                   # s is an object of type string
    l = [12, 'A', 21121, 'ACCAT']   # l is an object of type list

These objects

- contain data (the number 12, the string 'ACAGATC', ...)
- can be modified/manipulated

        s.count('A')
        s.lower()
        l.append('121')
   
    
### Extending data types

Python standard data types

For most simple programs, we can usually survive well with standard Python data types. This includes
- numbers, strings
- tuples, lists, sets, dictionaries

__Defining your own data types__

It might be useful, though, to create your own data types: objects built to your own specifications and organised in the way convenient to you.
This is done through object definitions known as classes.
                        
#### Class vs object

Implementation vs instantiation

The class is the definition of a particular kind of object in terms of its component features and how it is constructed or implemented in the code.
The object is a specific instance of the thing which has been made according to the class definition. Everything that exists in Python is an object.


#### Classes vs functions

Functions _do_ specific things, classes _are_ specific things that can also _do_ specific things.

Classes can have methods, which are functions that are associated with a particular class, and do things associated with the thing that the class is - but if all you want is to do something, a function is all you need.


#### OOP in Python

A common principle of OOP is that a class definition
- makes available certain functionality
- hides internal information about how a specific class is implemented

This is called encapsulation and information hiding.

Python is, however, quite permissive: you can access any element of an object if you know how.
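A short sketch of what "permissive" means in practice. Leading underscores signal that an attribute is internal, but nothing stops outside code from reaching it. The attribute names `_seq` and `__checksum` are hypothetical choices for this example.

```python
class Sequence:
    def __init__(self, seq):
        self._seq = seq             # single underscore: internal by convention
        self.__checksum = len(seq)  # double underscore: the name is mangled

    def get_seq(self):
        return self._seq


s = Sequence("atcg")
print(s.get_seq())             # the intended, public way in
print(s._seq)                  # still reachable; the underscore is only a convention
print(s._Sequence__checksum)   # the mangled name is reachable too
```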
                        

### An example

__A sequence object__

We need to store data:

- species name
- sequence in DNA and amino-acid
- protein name
- length of the sequences
- percentage of GC
- ...

We need to be able to manipulate the data using methods:
- add/remove nucleotide or amino-acid (and update the other data dependent on it)
- translate DNA to amino-acid and inversely
- print the sequence in various ways
- calculate some characteristics
- ...
                      
          

### Class definition

    class Sequence:
        # some statements
        
        
- It is common practice to save each class in its own file, then import it with `from Sequence import Sequence`.

### Inheritance

    class MultipleSeqAl(Sequence):
        # some statements

- inheriting methods from a superclass
- classes can have more than one superclass
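A minimal sketch of a class with more than one superclass. The class names `Annotated` and `AnnotatedSequence` are invented for this illustration. The method resolution order (`__mro__`) shows where Python looks for a method, in order.

```python
class Sequence:
    def describe(self):
        return "a sequence"


class Annotated:
    def annotate(self):
        return "with annotations"


# A class can list more than one superclass, separated by commas.
class AnnotatedSequence(Sequence, Annotated):
    pass


record = AnnotatedSequence()
print(record.describe(), record.annotate())

# Superclasses are searched left to right, before the built-in object class.
print([cls.__name__ for cls in AnnotatedSequence.__mro__])
```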
                        

### Class functions

Providing object capabilities

- functions are defined within the construction of a class
- defined in the same way as ordinary functions (indented within the class code block)
- accessed from the variable representing the object via ’dot’ syntax 

        name = mySequence.getName()

- getName() knows which Sequence object to use when fetching the name

The first argument is special: it is the object the function is called on (`self`).

```python
class Sequence:

    def getName(self):
        # Assumes self.name was set elsewhere, e.g. in a constructor
        return self.name

    def getCapitalisedName(self):
        name = self.getName()
        if name:
            return name.capitalize()
        else:
            return name
```

Remarks on functions

__Order of functions__

- order of functions does not matter
- if a function definition appears more than once, the last one replaces the previous ones

        class MultipleSeqAl(Sequence):
            def getMSA(self):
                pass  # function implementation

            def getSequenceIdentity(self):
                pass  # function implementation

Using subclasses
- call specific functions as usual: msa.getMSA()
- can also call msa.getName() from Sequence directly because of inheritance
- however, .getMSA() cannot be accessed from a Sequence object
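The three points about subclasses can be checked with a small runnable sketch. The method bodies here are placeholders invented for the example; only the class and method names come from the text above.

```python
class Sequence:
    def __init__(self, name):
        self.name = name

    def getName(self):
        return self.name


class MultipleSeqAl(Sequence):
    def getMSA(self):
        # Placeholder body; a real class would return the alignment.
        return "alignment of " + self.name


msa = MultipleSeqAl("globins")
print(msa.getMSA())     # defined on the subclass
print(msa.getName())    # inherited from Sequence

seq = Sequence("hbb")
try:
    seq.getMSA()        # the superclass knows nothing about subclass methods
except AttributeError as error:
    print("AttributeError:", error)
```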

__Attributes__
- variables tied to the object
- attributes hold information useful for the object and its functions
- e.g. associate a variable storing sequence name in Sequence objects

__Object attributes__
- specific to a particular object
- defined inside class functions
- use `self` to access them

__Class attributes__
- available to all instances of a class
- defined outside all function blocks
- usually used for variables that do not change
- accessed directly using the variable name
- bare function names are also class attributes

__Examples of class attributes__
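A minimal sketch of the difference between an attribute defined on the class and one created through `self`. The attribute name `alphabet` is a hypothetical choice for this example.

```python
class Sequence:
    # Class attribute: defined outside any function, shared by all instances.
    alphabet = "acgt"

    def __init__(self, name):
        # Object attribute: created through self, specific to this object.
        self.name = name


first = Sequence("seq1")
second = Sequence("seq2")

print(first.name, second.name)            # each object has its own name
print(first.alphabet is second.alphabet)  # both see the same class attribute

# Changing the class attribute is visible through every instance.
Sequence.alphabet = "acgtn"
print(first.alphabet, second.alphabet)
```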

#### Object life cycle

__Birth, life and death__
- creation of object handled in a special function called constructor
- removal is handled by a function called destructor
- Python has automatic garbage collection, usually no need to define a destructor

__Class constructor__
- called whenever the corresponding object is created
- uses a special name: `__init__`
- first argument is the object itself (i.e. self)
- any other arguments you need to create the object
- good idea to introduce a key that uniquely identifies objects of a given class
                        

When to create attributes
- attributes can be created in any class function (or directly on the object)
- by convention, most of them are created in the constructor, either directly or through a call to a function
- set an attribute to None if it cannot be set at object creation
- constructors are inherited by subclasses
                      

```python
class Sequence:

    # A class attribute. It is shared by all instances of this class.
    sequence_type = "sequence"

    # Basic initializer, this is called when this class is instantiated.
    # Note that the double leading and trailing underscores denote objects
    # or attributes that are used by python but that live in user-controlled
    # namespaces. Methods (or objects or attributes) like: __init__, __str__,
    # __repr__ etc. are called magic methods (or sometimes called dunder methods).
    # You should not invent such names on your own.

    def __init__(self, string):

        # Assign the argument (string) to the instance's seq attribute
        self.seq = string

        # Initialize the source through the property setter defined below
        self.source = ""

    # A class method is called with the class (not an instance) as its first
    # argument, so it can only use class attributes such as sequence_type.
    @classmethod
    def get_sequence_type(cls):
        return cls.sequence_type

    # A property is just like a getter.
    # It exposes the sequence length as a read-only attribute named length.
    @property
    def length(self):
        return len(self.seq)

    # A property is just like a getter.
    # It exposes the internal _source attribute under the name source.
    @property
    def source(self):
        return self._source

    # This allows the property to be set
    @source.setter
    def source(self, source):
        self._source = source
```

```python
# Instantiate a class
seq1 = Sequence("atcg")
seq1.source = "NIH"

seq2 = Sequence("ttaggg")
seq2.source = "UTSW"

# seq1 and seq2 are instances of type Sequence, or in other words: they are Sequence objects

# Read the attributes of each instance
print(seq1.source)
print(seq1.seq)
print(seq2.source)
print(seq2.seq)
```

### Inheritance example

```python
# A subclass that inherits from Sequence
class DNA_Sequence(Sequence):

    sequence_type = "DNA"

    def __init__(self, seq, gc_percentage=0.0):
        self.gc = gc_percentage
        self.adapter = False
        super().__init__(seq)

    # And its own method as well
    def get_gc(self):
        return self.gc
```

```python
dna = DNA_Sequence('atgc', 0.5)
dna.source = "NIH"

print(dna.get_gc())
print(dna.source)
```

To take advantage of modularization by file, you could place the classes above in their own files, say `sequence.py` and `dna_sequence.py`.

To import functions or classes from other files, use the following format:

    from <filename-without-extension> import <function-or-class>

    # dna_sequence.py
    from sequence import Sequence

#### Advantages and Disadvantages of Object-Oriented Programming (OOP)
[Source](https://www.saylor.org/site/wp-content/uploads/2013/02/CS101-2.1.2-AdvantagesDisadvantagesOfOOP-FINAL.pdf)

Some of the advantages of object-oriented programming include:
1. Improved software-development productivity: modularity, extensibility, and reusability
2. Improved software maintainability
3. Faster development: Reuse enables faster development
4. Lower cost of development: Reuse of software also lowers the cost of development
5. Higher-quality software: More time for verification 


Disadvantages of object-oriented programming include:
1. Steep learning curve
2. Larger program size: OOP programs typically involve more lines of code than procedural programs
3. Slower programs
4. Not suitable for all types of problems: There are problems that lend themselves well to functional-programming style, logic-programming style, or procedure-based programming style, and applying object-oriented programming in those situations will not result in efficient programs.  

## Decorators

[Source](http://book.pythontips.com/en/latest/decorators.html)

Decorators are functions which modify the functionality of another function.

Decorators let you execute code before and after a function.

```python
from functools import wraps

def logit(func):
    @wraps(func)
    def with_logging(*args, **kwargs):
        print(func.__name__ + " was called")
        return func(*args, **kwargs)
    return with_logging

@logit
def addition_func(x):
    """Do some math."""
    return x + x
```

```python
result = addition_func(4)
# Output: addition_func was called
```
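The example above runs code only before the function. A decorator that runs code both before and after might look like the sketch below, which times a call. The decorator name `timed` is invented for this illustration.

```python
import time
from functools import wraps


def timed(func):
    """Report how long the wrapped function takes to run."""
    @wraps(func)
    def with_timing(*args, **kwargs):
        start = time.perf_counter()            # runs before the function
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start  # runs after the function
        print("{} took {:.6f} seconds".format(func.__name__, elapsed))
        return result
    return with_timing


@timed
def addition_func(x):
    """Do some math."""
    return x + x


print(addition_func(4))
```

Because of `@wraps`, the decorated function keeps its original name and docstring.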

For more info: http://book.pythontips.com/en/latest/decorators.html

## Python Standard Library

- `datetime`: enables Python to work natively with dates and times
- `glob`: returns a list of files matching a pattern, such as a given extension
- `itertools`: advanced functions operating on lists and iterable objects
- `multiprocessing`: enables Python to run multiple processes
- `os`: lets you interface with the underlying operating system that Python is running on (Windows, Mac or Linux) (http://www.pythonforbeginners.com/os/python-system-administration)
- `shutil`: high-level file (shell) operations
- `subprocess`: enables Python to run command line statements, like using "!" in a Jupyter notebook
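A short sketch that touches several of these modules at once. The file names are made up, and everything happens in a temporary folder that is deleted at the end.

```python
import glob
import os
import shutil
import subprocess
import sys
import tempfile
from datetime import date

# Work in a throwaway folder so nothing on disk is left behind.
workdir = tempfile.mkdtemp()

# Create two small files; the names are made up for this demonstration.
for name in ("a.fasta", "b.txt"):
    with open(os.path.join(workdir, name), "w") as handle:
        handle.write(">example\natcg\n")

# shutil: copy a file.  glob: list files by extension.
shutil.copy(os.path.join(workdir, "a.fasta"), os.path.join(workdir, "c.fasta"))
fasta_files = sorted(os.path.basename(path)
                     for path in glob.glob(os.path.join(workdir, "*.fasta")))
print(fasta_files)

# subprocess: run a command (here, the current Python interpreter).
completed = subprocess.run([sys.executable, "-c", "print('hello')"],
                           capture_output=True, text=True)
print(completed.stdout.strip())

# datetime: work with dates natively.
days_between = date(2020, 1, 31) - date(2020, 1, 1)
print(days_between.days)

shutil.rmtree(workdir)
```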

---

# Advanced DNA class

Source: http://www.mularoni.com/python_course/advanced.html

```python
%%writefile DNA_class.py
class DNA:

    """Class representing DNA as a string sequence."""

    # 'n' (unknown base) pairs with itself, so cleaned sequences can be complemented
    basecomplement = {'a': 't', 'c': 'g', 't': 'a', 'g': 'c', 'n': 'n'}

    standard = { 'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
                 'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
                 'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
                 'ttg': 'L', 'tcg': 'S', 'tag': '*', 'tgg': 'W',

                 'ctt': 'L', 'cct': 'P', 'cat': 'H', 'cgt': 'R',
                 'ctc': 'L', 'ccc': 'P', 'cac': 'H', 'cgc': 'R',
                 'cta': 'L', 'cca': 'P', 'caa': 'Q', 'cga': 'R',
                 'ctg': 'L', 'ccg': 'P', 'cag': 'Q', 'cgg': 'R',

                 'att': 'I', 'act': 'T', 'aat': 'N', 'agt': 'S',
                 'atc': 'I', 'acc': 'T', 'aac': 'N', 'agc': 'S',
                 'ata': 'I', 'aca': 'T', 'aaa': 'K', 'aga': 'R',
                 'atg': 'M', 'acg': 'T', 'aag': 'K', 'agg': 'R',

                 'gtt': 'V', 'gct': 'A', 'gat': 'D', 'ggt': 'G',
                 'gtc': 'V', 'gcc': 'A', 'gac': 'D', 'ggc': 'G',
                 'gta': 'V', 'gca': 'A', 'gaa': 'E', 'gga': 'G',
                 'gtg': 'V', 'gcg': 'A', 'gag': 'E', 'ggg': 'G'
                 }

    def __init__(self, s="", name=""):
        """Create DNA instance initialized to string s."""
        self.seq = s.lower()
        self.seq = self.cleandna(self.seq)
        self.len = len(self.seq)
        self.name = name

    def cleandna(self, s):
        """Return dna only composed by letters ['a', 'c', 'g', 't', 'n']."""
        new_sequence = ""
        nucleotides = ['a', 'c', 'g', 't', 'n']
        for c in s:
            if c not in nucleotides:
                continue
            new_sequence += c
        return new_sequence

    def getname(self):
        """Return the name of the sequence."""
        if self.name:
            return self.name
        else:
            return 'unknown'

    def getsequence(self):
        """Return the dna sequence."""
        return self.seq

    def setname(self, name):
        """Set the name of the sequence."""
        self.name = name

    def setsequence(self, s):
        """Set the sequence content."""
        # Clean the new sequence the same way the constructor does
        self.seq = self.cleandna(s.lower())
        self.len = len(self.seq)

    def transcribe(self):
        """Return as rna string."""
        return self.seq.replace('t', 'u')

    def reverse(self):
        """Return dna string in reverse order."""
        letters = list(self.seq)
        letters.reverse()
        return ''.join(letters)

    def complement(self):
        """Return the complementary dna string."""
        comp = ''
        letters = list(self.seq)
        for base in letters:
            comp += self.basecomplement[base]
        return comp

    def reversecomplement(self):
        """Return the reverse complement of the dna string."""
        revcomp = ''
        letters = list(self.seq)
        letters.reverse()
        for base in letters:
            revcomp += self.basecomplement[base]
        return revcomp

    def gc_percentage(self):
        """Return the percentage of dna composed of G+C."""
        s = self.seq
        if not s:
            return 0.0  # avoid dividing by zero for an empty sequence
        gc = s.count('g') + s.count('c')
        return gc * 100.0 / len(s)

    def translate(self, frame=1):
        """ translate a DNA like cDNA sequence to a protein
            possible frames 1,2,3,-1,-2,-3
        """
        possibleframe = (1, 2, 3, -1, -2, -3)
        if frame not in possibleframe:
            frame = 1  # First frame
        if frame < 0:
            cdna = self.reversecomplement()
            frame = abs(frame) - 1
        else:
            cdna = self.seq
            frame = frame - 1
        code = self.standard
        prot = ""
        i = frame  # Starting frame
        while i <= len(cdna) - 3:  # While there are at least 3 letters
            prot += code.get(cdna[i:i+3], "?")
            i += 3
        return prot
```

```python
# This is how we load the class
from DNA_class import DNA

dir(DNA)
# Output (names with double underscores omitted):
# ['basecomplement', 'cleandna', 'complement', 'gc_percentage', 'getname', 'getsequence',
#  'reverse', 'reversecomplement', 'setname', 'setsequence', 'standard', 'transcribe', 'translate']

# Now let's use it!
dna1 = DNA('CGACAAGGATTAGTAGTTTAC', 'mydna1')
dna2 = DNA('gcctgaaattgcgcgc')
dna1.getname()
# 'mydna1'

dna2.getname()
# 'unknown'

dna1.getsequence()
# 'cgacaaggattagtagtttac'

dna2.getsequence()
# 'gcctgaaattgcgcgc'

dna3 = DNA()
dna3.getsequence()
# ''

dna3.getname()
# 'unknown'

dna3.setsequence('gcCVnn % tgacKLtcg')
dna3.setname('dna3')
dna3.getname()
# 'dna3'

dna3.getsequence()
# 'gccnntgactcg'


dna = DNA('CGACAAGGATTAGTAGTTTAC', 'mydna')
dna.getsequence()
# 'cgacaaggattagtagtttac'

dna.transcribe()
# 'cgacaaggauuaguaguuuac'

dna.reverse()
# 'catttgatgattaggaacagc'

dna.complement()
# 'gctgttcctaatcatcaaatg'

dna.reversecomplement()
# 'gtaaactactaatccttgtcg'

dna.gc_percentage()
# 38.095238095238095

dna.translate()    # default frame = 1
# 'RQGLVVY'
dna.translate(1)   # frame 1
# 'RQGLVVY'
dna.translate(2)   # frame 2
# 'DKD**F'
dna.translate(3)   # frame 3
# 'TRISSL'
dna.translate(-1)  # frame complement 1
# 'VNY*SLS'
dna.translate(-2)  # frame complement 2
# '*TTNPC'
dna.translate(-3)  # frame complement 3
# 'KLLILV'
```

---


My thanks to Ryan Dale, Brenden Jeffrey, and Philip Macmenamin for advice on what to include in this seminar.

---

# Resources

- [Python 3 Reference card](https://dzone.com/refcardz/core-python)
- [Awesome python](http://awesome-python.com)
- https://pythontips.com/2013/09/01/best-python-resources/

- http://book.pythontips.com/en/latest/ (intermediate python)
- http://www.codeconquest.com/blog/the-50-best-websites-to-learn-python/

Intermediate and advanced Python Resources:
- https://jeffknupp.com/blog/2014/06/18/improve-your-python-python-classes-and-object-oriented-programming/
- https://gist.github.com/kalefranz/94370b293f8c1a693b278f64381faf07
- http://intermediatepythonista.com
- http://www.shahmoradi.org/ECL2017S/lecture/11-python-advanced-decorator-class
- http://www.davekuhlman.org/python_201.html
- https://uoftcoders.github.io/studyGroup/lessons/python/intermediate/lesson/
- https://www.quora.com/What-are-the-advanced-topics-in-python
- https://stackabuse.com/object-oriented-programming-in-python/
- https://jeffknupp.com/blog/2018/10/11/write-better-python-functions/
- https://www.tutorialspoint.com/python/python_classes_objects.htm