## Installing modules

Up to now, every Python library we've needed, such as Pandas, NumPy, and Matplotlib, came included with our Anaconda Python distribution. However, there are many useful Python libraries that are not included in Anaconda Python by default, or are not directly accessible via the Anaconda package manager.

In class we'll review three ways to install modules:

### Installing modules using the Anaconda GUI

1. Run the Anaconda Navigator program
2. Select the "Environments" tab on the left
3. Select the environment you want to install a package into -- "base" by default
4. Select "All" in the package pull-down menu in the right pane
5. Search for the package of interest -- e.g. "biopython"
6. Click the checkbox next to each package you wish to install and then select the "Apply" button

### Installing modules using the `conda` command line tool

1. Search for the package of interest -- `conda search biopython`
2. Install the package of interest -- `conda install biopython`

### Installing modules using pip

1. Search for packages of interest on [PyPI](https://pypi.org/), e.g. search for "gff". (The `pip search` command no longer works against PyPI, so use the website instead.)
2. Install via the `pip install` command -- e.g. `pip install gffpandas`
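
Whichever route you use, it can be handy to confirm from within Python that the module is importable and which version was picked up. The following is a minimal sketch using only the standard library; the helper name `module_status` is made up for this illustration. Note that the import name (`Bio`) and the name used by the installer (`biopython`) can differ.

```python
import importlib.util
from importlib import metadata


def module_status(module_name, dist_name=None):
    """Report whether a module can be imported and, if known, its version.

    module_name is the import name (e.g. "Bio"); dist_name is the name used
    by the installer (e.g. "biopython"), which may differ from the import name.
    """
    found = importlib.util.find_spec(module_name) is not None
    version = None
    if found:
        try:
            version = metadata.version(dist_name or module_name)
        except metadata.PackageNotFoundError:
            version = None  # e.g. standard-library modules have no distribution
    return found, version


print(module_status("json"))              # standard library, always importable
print(module_status("Bio", "biopython"))  # found only if Biopython is installed
```

If the first element of the returned tuple is `False`, the install most likely went into a different environment than the one running your notebook.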



## Introducing Biopython

Biopython is a library that contains a wide variety of functions and classes for working with bioinformatics data of various kinds. Nucleotide and protein sequence information is particularly well supported, but Biopython has tools for a wide variety of tasks, such as running automated database searches over the internet, working with 3D structural data, running population genetic simulations, etc. Today we'll focus primarily on working with sequence data and associated metadata.

In [None]:
import Bio  # base library, this is a check to see if we installed it correctly

### How do I start to learn a new library?

1. Find the documentation and look for a tutorial
2. Read, test, and extend code examples illustrating how the library works
3. Learn how to use API documentation effectively
4. Learn how to query Python objects in an interactive session
5. Read the source code

We'll illustrate all of these steps today as we start to get acquainted with Biopython.

## CLASS TODO
1. Find the Biopython home page
2. Find the link to the Biopython documentation
3. Go to the API (application programming interface) documentation

### Creating Seq objects

In [None]:
from Bio.Seq import Seq

In [None]:
s1 = Seq("ATGCGCGATGA")

In [None]:
s1

In [None]:
s1[0]  # indexing similar to strings

In [None]:
s1[0:6]  # slicing similar to strings

### Python tools for introspection -- type, dir

In [None]:
type(s1)  # Seq objects are string-like, but are not strings

In [None]:
dir(s1)  # when applied to an object dir gives all the attributes associated with an object

In [None]:
[i for i in dir(s1) if not i.startswith("_")]  # get the attributes, hiding the "dunders"

In [None]:
# we can even wrap this up in a function
def object_attributes(o):
    return [i for i in dir(o) if not i.startswith("_")]

In [None]:
object_attributes(s1)

In [None]:
# We can even write a couple of functions to automatically query methods and attributes

import types

def methods_of(o):
    methods = [i for i in dir(o) if (not i.startswith("_")) and (type(getattr(o,i)) == types.MethodType)]
    return methods

def attributes_of(o):
    attribs = [i for i in dir(o) if (not i.startswith("_")) and (type(getattr(o,i)) != types.MethodType)]
    return attribs

In [None]:
methods_of(s1)

In [None]:
attributes_of(s1)
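
The `type(...) == types.MethodType` test used in `methods_of` only recognizes bound methods written in Python; methods of built-in types such as `str` have a different type and would be missed. A more general variant, sketched here, uses `callable` instead and also reports the first line of each docstring. `describe_methods` is a hypothetical helper name, and the demo uses a plain string so that it runs without Biopython.

```python
import inspect


def describe_methods(o):
    """Map each public callable attribute of o to the first line of its docstring."""
    summary = {}
    for name in dir(o):
        if name.startswith("_"):
            continue  # hide the "dunders" and private names
        attr = getattr(o, name)
        if callable(attr):
            doc = inspect.getdoc(attr) or ""
            summary[name] = doc.splitlines()[0] if doc else ""
    return summary


# Demonstrated on a plain string; the same call should work on a Seq object
for name, doc in describe_methods("ATGCGCGATGA").items():
    print(f"{name:15s} {doc}")
```

Calling `describe_methods(s1)` gives a quick one-line summary of each method before you turn to the full API documentation.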

## CLASS TODO

1. Find the Bio.Seq page in API docs
2. Skim the documentation for the non-dunder methods to get a sense of what sort of built-in functionality Seq objects have

### Examples of methods on Bio.Seq objects

In [None]:
s1.complement()

In [None]:
s1.reverse_complement()

In [None]:
s1.transcribe()

In [None]:
s1.translate()
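
To get a feel for what the first three of these methods compute, here is a plain-Python sketch operating on ordinary strings. It is not Biopython's implementation and it only handles the unambiguous bases A, C, G, and T; the function names simply mirror the `Seq` method names.

```python
# Map each DNA base to its Watson-Crick partner (unambiguous bases only)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna):
    """Return the complementary strand, read in the same direction."""
    return dna.translate(COMPLEMENT)


def reverse_complement(dna):
    """Return the complementary strand, read 5' to 3'."""
    return complement(dna)[::-1]


def transcribe(dna):
    """Return the RNA version of a coding-strand DNA sequence (T -> U)."""
    return dna.replace("T", "U").replace("t", "u")


dna = "ATGCGCGATGA"
print(complement(dna))          # TACGCGCTACT
print(reverse_complement(dna))  # TCATCGCGCAT
print(transcribe(dna))          # AUGCGCGAUGA
```

Note that `s1` has 11 bases, which is not a multiple of three, so `s1.translate()` may emit a warning about a partial codon at the end of the sequence.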

### Parsing sequence records from a FASTA file

## CLASS TODO

1. Read the Bio.SeqIO.parse docs and short examples

In [None]:
from Bio import SeqIO

# use a for loop to iterate over FASTA records in a file
for rec in SeqIO.parse("../data/covid-S-and-E.fsa", format="fasta"):
    print(rec.name)

In [None]:
# use a list comprehension to get all the fasta records out of a file and store them in a list
recs = [rec for rec in SeqIO.parse("../data/covid-S-and-E.fsa","fasta")]

In [None]:
recs

In [None]:
len(recs)

In [None]:
recs[0]

In [None]:
type(recs[0])

## CLASS TODO

1. Find the SeqRecord page in API docs
2. What are the non-method attributes associated with SeqRecords?
3. What are the methods associated with SeqRecords?

In [None]:
recs[0].seq

In [None]:
recs[0].name

In [None]:
recs[0].description
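
For FASTA input, the record name is the first word of the `>` header line and the description is the whole header line. The sketch below shows roughly how a FASTA parser arrives at those fields. It is a simplified stand-in, not Biopython's parser: `FastaRecord` and `parse_fasta` are hypothetical names, and the toy records are made up.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for SeqRecord, holding only three fields
FastaRecord = namedtuple("FastaRecord", ["name", "description", "seq"])


def parse_fasta(handle):
    """Yield one FastaRecord per '>' header found in an open text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield _make_record(header, chunks)


def _make_record(header, chunks):
    # The name is the first whitespace-delimited word of the header line
    words = header.split()
    name = words[0] if words else ""
    return FastaRecord(name, header, "".join(chunks))


# Made-up records, used only to exercise the parser
example = io.StringIO(">seq1 first toy record\nATGC\nGGTA\n>seq2 second toy record\nTTGA\n")
for rec in parse_fasta(example):
    print(rec.name, len(rec.seq), rec.description)
```

Like `SeqIO.parse`, this is a generator: records are produced one at a time, which is why wrapping the call in a list is needed to keep them all in memory.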

## Parsing records from a GenBank file

In [None]:
filename = "../data/NC_045512.gb"
covidrecs = [rec for rec in SeqIO.parse(filename, format="genbank")]

In [None]:
covidrecs

In [None]:
len(covidrecs)

In [None]:
covidref = covidrecs[0]

In [None]:
covidref.name, covidref.description

In [None]:
len(covidref.features)

In [None]:
covidref.features[0]

In [None]:
type(covidref.features[0])

## CLASS TODO

1. Read the SeqFeature docs
2. What are the non-method attributes associated with SeqFeatures?
3. What are the methods associated with SeqFeatures?

In [None]:
covidref.features[0].location.start, covidref.features[0].location.end

In [None]:
covidref.features[0].qualifiers

In [None]:
covidref.features[1].qualifiers

In [None]:
covidref.features[2].qualifiers

In [None]:
covidref.features[2].location

In [None]:
covidref.features[40].qualifiers

In [None]:
genefeatures = [ftr for ftr in covidref.features if ftr.type == "gene"]

In [None]:
len(genefeatures)
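
Before filtering on one feature type, it can help to see which types are present and how many of each there are. A minimal sketch with `collections.Counter` follows; the `toy_features` list is a made-up stand-in that models only the `.type` attribute, so the example runs without the GenBank file.

```python
from collections import Counter
from types import SimpleNamespace

# Hypothetical stand-ins for SeqFeature objects; only the .type attribute is modeled
toy_features = [
    SimpleNamespace(type="source"),
    SimpleNamespace(type="gene"),
    SimpleNamespace(type="CDS"),
    SimpleNamespace(type="gene"),
    SimpleNamespace(type="CDS"),
]

# Count how many features there are of each type
type_counts = Counter(ftr.type for ftr in toy_features)
print(type_counts.most_common())
```

With the real record, the same idea would be `Counter(ftr.type for ftr in covidref.features)`.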

In [None]:
ftrsbyname = {ftr.qualifiers["gene"][0]: ftr for ftr in genefeatures}

In [None]:
type(ftrsbyname)

In [None]:
ftrsbyname.keys()

In [None]:
ftrsbyname["S"].location

In [None]:
ftrsbyname["S"].extract(covidref)
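
Conceptually, extracting a feature amounts to slicing the parent sequence with the feature's coordinates, and reverse-complementing the result if the feature lies on the minus strand. Here is a plain-Python sketch of that idea. It is not Biopython's implementation and it handles only simple, single-span locations; `extract_simple` and the toy coordinates are invented for illustration. As with Python slices, `start` is zero-based and `end` is exclusive.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def extract_simple(parent, start, end, strand=1):
    """Return parent[start:end], reverse-complemented when strand is -1."""
    piece = parent[start:end]
    if strand == -1:
        piece = piece.translate(COMPLEMENT)[::-1]
    return piece


genome = "GGGATGAAATAGCCC"                       # made-up 15-base sequence
print(extract_simple(genome, 3, 12))              # ATGAAATAG
print(extract_simple(genome, 3, 12, strand=-1))   # CTATTTCAT
```

Keep in mind that coordinates written in a GenBank file are one-based and inclusive, whereas the `start` reported by a Biopython location follows the zero-based Python convention, so the two differ by one.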