# An introduction to solving biological problems with Python - Additional Reading


## Tuples

- Can contain any number of items
- Can contain different types of items
- __Cannot__ be altered once created (they are immutable)
- Items have a defined order

A tuple is created by using round brackets around the items it contains, with commas separating the individual elements.

In [3]:
a = (123, 54, 92) # tuple of 3 integers
b = () # empty tuple
c = ("Ala",) # tuple of a single string (note the trailing ",")
d = (2, 3, False, "Arg", None) # a tuple of mixed types

print(a)
print(b)
print(c)
print(d)

(123, 54, 92)
()
('Ala',)
(2, 3, False, 'Arg', None)


You can of course use variables in tuples and other data structures:

In [5]:
x = 1.2
y = -0.3
z = 0.9
t = (x, y, z)

print(t)

(1.2, -0.3, 0.9)


Tuples can be _packed_ and _unpacked_ with a convenient syntax. The number of variables used to unpack the tuple must match the number of elements in the tuple.

In [None]:
t = 2, 3, 4 # tuple packing
print('t is', t)
x, y, z = t # tuple unpacking
print('x is', x)
print('y is', y)
print('z is', z)
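
As a minimal sketch of the matching rule above (the names `point` and `message` are illustrative and not part of the course material), unpacking a three-item tuple into only two variables raises a `ValueError`:

```python
point = (2, 3, 4)  # a tuple of three items

try:
    x, y = point   # only two names for three items
except ValueError as err:
    message = str(err)
    print("Unpacking failed:", message)
```

The same error is raised in the opposite case, when there are more variables than items.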

Unlike lists, tuples cannot be altered in place once they have been created. If you try to do so, you'll get an error (a `TypeError`).

In [None]:
t = (123, 54, 92, 87, 33)
print(t)
t[1] = 4
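
The failed assignment can be caught to show the error type. A sketch, assuming the usual workaround of building a new tuple from slices when a changed copy is needed (the name `updated` is illustrative):

```python
t = (123, 54, 92, 87, 33)

try:
    t[1] = 4                      # tuples do not support item assignment
except TypeError as err:
    print("Cannot modify a tuple:", err)

# Build a new tuple instead; the original is left untouched
updated = t[:1] + (4,) + t[2:]
print(updated)                    # (123, 4, 92, 87, 33)
print(t)                          # (123, 54, 92, 87, 33)
```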

You can convert between tuples and lists with the <tt>tuple</tt> and <tt>list</tt> functions. Note that these create a new collection with the same items, and leave the original unaffected.

In [None]:
a = (1, 4, 9, 16)     # A tuple of numbers
b = ['G','C','A','T'] # A list of characters

print(a)
print(b)

l = list(a)   # Make a list based on a tuple 
print(l)

t = tuple(b)  # Make a tuple based on a list
print(t)

## Sets

- Sets contain unique elements, i.e. no repeats are allowed
- The elements in a set do not have an order
- Sets cannot contain elements which can be internally modified (e.g. lists and dictionaries)
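
The last point can be illustrated with a short sketch: immutable items such as tuples can be stored in a set, whereas adding a list raises a `TypeError`. The names `codons` and `error_message` are illustrative only.

```python
codons = set()
codons.add(("A", "U", "G"))       # a tuple is immutable, so this works

try:
    codons.add(["U", "A", "A"])   # a list can be modified, so it is rejected
except TypeError as err:
    error_message = str(err)
    print("Cannot add a list to a set:", error_message)

print(codons)
```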

In [None]:
l = [1, 2, 3, 2, 3] # list of 5 values
s = set(l) # set of 3 unique values
print(s)
e = set() # empty set
print(e)

Sets are very similar to lists and tuples, and you can use many of the same operators and functions. However, they are **inherently unordered**, so they don't have an index, and they can only contain _unique_ values, so adding a value already in the set has no effect.

In [None]:
s = set([1, 2, 3, 2, 3])
print(s)
print("number in set:", len(s))
s.add(4)
print(s)
s.add(3)
print(s)

You can remove specific elements from the set.

In [None]:
s = set([1, 2, 3, 2, 3])
print(s)
s.remove(3)
print(s)

You can do all the expected logical operations on sets, such as taking the union or intersection of two sets with the <tt>|</tt> (_or_) and <tt>&</tt> (_and_) operators.

In [None]:
s1 = set([2, 4, 6, 8, 10])
s2 = set([4, 5, 6, 7])

print("Union:", s1 | s2)
print("Intersection:", s1 & s2)
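
Two further set operators work in the same way: `-` gives the difference and `^` the symmetric difference. A brief sketch using the same two sets (the result names are illustrative):

```python
s1 = set([2, 4, 6, 8, 10])
s2 = set([4, 5, 6, 7])

only_in_s1 = s1 - s2     # items in s1 that are not in s2
in_one_only = s1 ^ s2    # items in exactly one of the two sets

print("Difference:", only_in_s1)
print("Symmetric difference:", in_one_only)
```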

## Exercise 1.2.3

1. Given the protein sequence "MPISEPTFFEIF", split the sequence into its component amino acid codes and use a set to establish the unique amino acids in the protein and print out the result.

## Functions

- [Function definition syntax](#Function-definition-syntax)
- [Exercises 2.1.1](#Exercises-2.1.1)
- [Return value](#Return-value)
- [Exercises 2.1.2](#Exercises-2.1.2)
- [Function arguments](#Function-arguments)
- [Exercises 2.1.3](#Exercises-2.1.3)
- [Variable scope](#Variable-scope)

## Function basics

We have already seen a number of functions built into Python that let us do useful things with strings, collections, numbers and so on. Examples are `print()` and `len()`; the latter is passed some kind of sequence object and returns the length of that sequence.

This is the general form of a function; it takes some input _arguments_ and returns some output based on the supplied arguments.

The arguments to a function, if any, are supplied in parentheses and the result of the function _call_ is the result of evaluating the function.


In [None]:
x = abs(-3.0)
print(x)

l = len("ACGGTGTCAA")
print(l)

As well as using Python's built-in functions, you can write your own. Functions are a nice way to **encapsulate some code that you want to reuse** elsewhere in your program, rather than repeating the same bit of code multiple times. They also provide a way to name some coherent block of code and allow you to structure a complex program.

## Function definition syntax

Functions are defined in Python using the `def` keyword followed by the name of the function. If your function takes some arguments (input data) then you can name these in parentheses after the function name. If your function does not take any arguments you still need some empty parentheses. Here we define a simple function named `sayHello` that prints a line of text to the screen:

In [None]:
def sayHello():
    print('Hello world!')

Note that the code block for the function (just a single print line in this case) is indented relative to the `def`. The above definition just declares the function in an abstract way and nothing will be printed when the definition is made. To actually use a function you need to invoke it (call it) by using its name and a pair of round parentheses:

In [None]:
sayHello() # Call the function to print 'Hello world'

If required, a function may be written so it accepts input. Here we specify a variable called `name` in the brackets of the function definition and this variable is then used by the function. Although the input variable is referred to inside the function the variable does not represent any particular value. It only takes a value if the function is actually used in context.

In [None]:
def sayHello(name):
    print('Hello', name)

When we call (invoke) this function we specify a specific value for the input. Here we pass in the value `User`, so the name variable takes that value and uses it to print a message, as defined in the function. 

In [None]:
sayHello('User')  # Prints 'Hello User'

When we call the function again with a different input value we naturally get a different message. Here we also illustrate that the input value can also be passed-in as a variable (text in this case).

In [None]:
text = 'Mary'
sayHello(text)     # Prints 'Hello Mary'

A function may also generate output that is passed back or returned to the program at the point at which the function was called. For example here we define a function to do a simple calculation of the square of input (`x`) to create an output (`y`):

In [None]:
def square(x):
    y = x*x
    return y

Once the `return` statement is reached the operation of the function will end, and anything on the return line will be passed back as output. Here we call the function on an input number and catch the output value as result. Notice how the names of the variables used inside the function definition are separate from any variable names we may choose to use when calling the function.
  

In [None]:
number = 7
result = square(number) # Call the square() function which returns a result
print(result)           # Prints: 49

The function `square` can be used from now on anywhere in your program as many times as required on any (numeric) input values we like.

In [None]:
print(square(1.2e-3))   # Prints: 1.4399999999999998e-06

A function can accept multiple input values, otherwise known as arguments. These are separated by commas inside the brackets of the function definition. Here we define a function that takes two arguments and performs a calculation on both, before sending back the result.


In [None]:
def calcFunc(x, y):
    z = x*x + y*y
    return z


result = calcFunc(1.414, 2.0)
print(result)  #  5.999396

Note that this function does not check that x and y are valid forms of input. For the function to work properly we assume they are numbers. Depending on how this function is going to be used, appropriate checks could be added.
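
One possible form such a check could take is sketched below. The function name `calcFuncChecked` and the error message are assumptions made for illustration; the test relies on `numbers.Real` from the standard library, which covers both integers and floats.

```python
from numbers import Real

def calcFuncChecked(x, y):
    # Hypothetical variant of calcFunc that rejects non-numeric input
    for value in (x, y):
        if not isinstance(value, Real):
            raise TypeError("calcFuncChecked expects numbers, got " + repr(value))
    z = x*x + y*y
    return z

print(calcFuncChecked(3, 4))

try:
    calcFuncChecked("3", 4)
except TypeError as err:
    print("Rejected:", err)
```

Whether to raise an error, return `None` or attempt a conversion depends on how the function will be used.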

Functions can be arbitrarily long and can perform very complex operations. However, to make a function reusable, it is often better to assign it a single responsibility and a descriptive name.
Let's now define a function to calculate the [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) between two vectors:

In [None]:
def calcDistance(vec1, vec2):    
    dist = 0
    for i in range(len(vec1)):
        delta = vec1[i] - vec2[i]
        dist += delta*delta
    dist = dist**(1/2) # square-root
    return dist

For the record, the [preferred way to calculate a square root](https://docs.python.org/3/library/math.html#math.sqrt) is to use the `sqrt()` function from the `math` module of the standard library:
```python
import math
math.sqrt(x)
```
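
Applied to the distance calculation, this might look like the sketch below. The name `calcDistanceSqrt` is chosen here only to keep it apart from `calcDistance`; the loop itself is unchanged.

```python
import math

def calcDistanceSqrt(vec1, vec2):
    # Same algorithm as calcDistance, but using math.sqrt for the final step
    dist = 0
    for i in range(len(vec1)):
        delta = vec1[i] - vec2[i]
        dist += delta*delta
    return math.sqrt(dist)

print(calcDistanceSqrt((0, 0), (3, 4)))  # 5.0
```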

Let's experiment a little with our function.

In [None]:
w1 = ( 23.1, 17.8, -5.6 )
w2 = ( 8.4, 15.9, 7.7 )
calcDistance( w1, w2 )

Note that the function is general and handles any two vectors (irrespective of their representation) as long as their dimensions are compatible:

In [None]:
calcDistance( ( 1, 2 ), ( 3, 4 ) ) # dimension: 2

In [None]:
calcDistance( [ 1, 2 ], [ 3, 4 ] ) # vectors represented as lists

In [None]:
calcDistance( ( 1, 2 ), [ 3, 4 ] ) # mixed representation

## Exercises 2.1.1

- a. Calculate the mean
    - Write a function that takes 2 numerical arguments and returns their mean. Test your function on some examples.
    - Write another function that takes a list of numbers and returns the mean of all the numbers in the list.
- b. Write a function that takes a single DNA sequence as an argument and estimates the molecular weight of this sequence. Test your function using some example sequences. The following table gives the weight of each (single-stranded) nucleotide in g/mol:

<table>
    <tr><th>DNA Residue</th><th>Weight</th></tr>
    <tr><td>A</td><td>331</td></tr>
    <tr><td>C</td><td>307</td></tr>
    <tr><td>G</td><td>347</td></tr>
    <tr><td>T</td><td>306</td></tr>
</table>


- c. If the sequence passed contains base `N`, use the mean weight of the other bases as the weight of base `N`.

## Return value

There can be more than one `return` statement in a function, although typically there is only one, at the bottom. Consider the following function to get some text to say whether a number is positive or negative. It has three return statements: the first two return statements pass back text strings but the last, which would be reached if the input value were zero, has no explicit return value and thus passes back the Python `None` object. Any function code after this final return is ignored. 
The `return` keyword immediately exits the function: no more of the code in that function is run once it has returned, because program flow passes back to the call site.

In [None]:
def getSign(value):
    
    if value > 0:
        return "Positive"
    
    elif value < 0:
        return "Negative"
    
    return # implicit 'None'

    print("Hello world") # execution does not reach this line
    
print("getSign( 33.6 ):", getSign( 33.6 ))
print("getSign( -7 ):", getSign( -7 ))
print("getSign( 0 ):", getSign( 0 ))

All of the examples of functions so far have returned only single values, however it is possible to pass back more than one value via the `return` statement. In the following example we define a function that takes two arguments and passes back three values. The return values are really passed back inside a single tuple, which can be caught as a single collection of values. 

In [None]:
def myFunction(value1, value2):
    
    total = value1 + value2
    difference = value1 - value2
    product = value1 * value2
    
    return total, difference, product

values = myFunction( 3, 7 )  # Grab output as a whole tuple
print("Results as a tuple:", values)

x, y, z = myFunction( 3, 7 ) # Unpack tuple to grab individual values
print("x:", x)
print("y:", y)
print("z:", z)
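
Some built-in functions return their results in the same way. As a small illustration, `divmod()` passes back the quotient and the remainder of an integer division as a tuple, which can be kept whole or unpacked:

```python
both = divmod(17, 5)                 # keep the result as a single tuple
print(both)                          # (3, 2)

quotient, remainder = divmod(17, 5)  # or unpack it into two variables
print(quotient, remainder)           # 3 2
```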

## Exercises 2.1.2

a. Write a function that counts the number of each base found in a DNA sequence. Return the result as a tuple of 4 numbers representing the counts of each base `A`, `C`, `G` and `T`.

b. Write a function to return the reverse-complement of a nucleotide sequence.

## Function arguments

### Mandatory arguments

The arguments we have passed to functions so far have all been _mandatory_: if we do not supply them, or if we supply the wrong number of arguments, Python will raise an error (also called an exception):

In [None]:
def square(number):
    # one mandatory argument
    y = number*number
    return y

In [None]:
square(2)

**Mandatory arguments are assumed to come in the same order as the arguments in the function definition**, but you can also opt to specify the arguments using the argument names as _keywords_, supplying the values corresponding to each keyword with a `=` sign.

In [None]:
square(number=3)

In [None]:
def repeat(seq, n):
    # two mandatory arguments
    result = ''
    for i in range(0,n):
        result += seq
    return result

print(repeat("CTA", 3))
print(repeat(n=4, seq="GTT"))

<div class="alert-warning">**NOTE** Unnamed (positional) arguments must come before named arguments, even if they look to be in the right order.</div>

In [None]:
print(repeat(seq="CTA", n=3))
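
A call that breaks this rule is rejected as a syntax error before any code runs, so it cannot be wrapped in `try`/`except` directly. The sketch below works around that by handing the offending call to the built-in `compile()` as a string, purely to display the error; the names `bad_call` and `error_text` are illustrative.

```python
bad_call = 'repeat(seq="CTA", 3)'   # positional argument after a keyword argument

try:
    compile(bad_call, "<example>", "eval")   # only parses the text, nothing is called
except SyntaxError as err:
    error_text = err.msg
    print("SyntaxError:", error_text)
```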

### Arguments with default values
Sometimes it is useful to give some arguments a default value that the caller can override, but which will be used if the caller does not supply a value for this argument. We can do this by assigning some value to the named argument with the `=` operator in the function definition.

In [None]:
def runSimulation(nsteps=1000):
    print("Running simulation for", nsteps, "steps")

runSimulation(500)
runSimulation()

<div class="alert-warning">**CAVEAT**: default arguments are defined once and keep their state between calls. This can be a problem for *mutable* objects:</div>

In [None]:
def myFunction(parameters=[]):
    parameters.append( 100 )
    print(parameters)
    
myFunction()
myFunction()
myFunction()
myFunction([])
myFunction([])
myFunction([])

To sidestep this, avoid modifying *mutable* default arguments, for example by making the argument mandatory:

In [None]:
def myFunction(parameters):
    # one mandatory argument without default value
    parameters.append( 100 )
    print(parameters)
    
my_list = []
myFunction(my_list)
myFunction(my_list)
myFunction(my_list)
my_new_list = []
myFunction(my_new_list)
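
Another widely used idiom keeps the argument optional by using `None` as the default and creating the list inside the function. This is a sketch rather than part of the original example; the `return` statement and the names `first` and `second` are added here only so the result can be inspected.

```python
def myFunction(parameters=None):
    # None is a placeholder: a fresh list is created on every call
    if parameters is None:
        parameters = []
    parameters.append(100)
    print(parameters)
    return parameters

first = myFunction()
second = myFunction()   # starts from a new empty list, not the previous one
```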

### Position of mandatory arguments
Arrange function arguments so that *mandatory* arguments come first:

In [None]:
def runSimulation(initialTemperature, nsteps=1000):
    # one mandatory argument followed by one with default value
    print("Running simulation starting at", initialTemperature, "K and doing", nsteps, "steps")
    
runSimulation(300, 500)
runSimulation(300)

As before, no positional argument can appear after a keyword argument, and all required arguments must still be provided.

In [None]:
runSimulation( nsteps=100, initialTemperature=300 )

In [None]:
runSimulation( initialTemperature=300 )

In [None]:
runSimulation( nsteps=100 ) # Error: missing required argument 'initialTemperature'

In [None]:
runSimulation( nsteps=100, 300 ) # Error: positional argument follows keyword argument

Keyword names must match those declared in the function definition:

In [None]:
runSimulation( initialTemperature=300, numSteps=100 ) # Error: unexpected keyword argument 'numSteps'

A function cannot be defined with mandatory arguments after default ones; doing so is a syntax error:

In [None]:
def badFunction(nsteps=1000, initialTemperature):
    pass

## Exercises 2.1.3

Extend your solution to the previous exercise, which estimated the weight of a DNA sequence, so that it can also calculate the weight of an RNA sequence. Use an optional argument to specify the molecule type, defaulting to DNA. The weights of RNA residues are:

<table>
    <tr><th>RNA Residue</th><th>Weight</th></tr>
    <tr><td>A</td><td>347</td></tr>
    <tr><td>C</td><td>323</td></tr>
    <tr><td>G</td><td>363</td></tr>
    <tr><td>U</td><td>324</td></tr>
</table>


## Variable scope

Every variable in Python has a _scope_ in which it is defined. Variables defined at the outermost level are known as _globals_ (although typically only for the current module). In contrast, variables defined within a function are local, and cannot be accessed from the outside.

In [None]:
def mathFunction(x, y):
    math_func_result = ( x + y ) * ( x - y )
    return math_func_result

In [None]:
answer = mathFunction( 4, 7 )
print(answer)

In [None]:
answer = mathFunction( 4, 7 )
print(math_func_result)

Generally, variables defined in an outer scope are also visible inside functions, but you should be careful manipulating them as this can lead to confusing code. Python will actually raise an error (`UnboundLocalError`) if you assign to a global variable inside a function without declaring it as global, because the assignment makes the name local to the function. Instead, it is a good idea to avoid using global variables and, for example, to pass any necessary variables as parameters to your functions.

In [None]:
counter = 1
def increment(): 
    print(counter)
    counter += 1

increment()
print(counter)
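
To see which error is raised without stopping the notebook, the call can be wrapped in `try`/`except`. This is a sketch of the same situation; the name `error_name` is illustrative.

```python
counter = 1

def increment():
    counter += 1   # the assignment makes 'counter' local, but it has no value yet

try:
    increment()
except UnboundLocalError as err:
    error_name = type(err).__name__
    print(error_name + ":", err)

print(counter)     # the global variable is unchanged: 1
```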

If you really want to do this, there is a way round this using the `global` statement. Any variable which is changed or created inside of a function is local, if it hasn't been declared as a global variable. To tell Python that we want to use the global variable, we have to explicitly state this by using the keyword `global`.

In [None]:
counter = 1
def increment(): 
    global counter
    print(counter)
    counter += 1

increment()
print(counter)

<div class="alert-warning">**NOTE** It is normally better to avoid global variables and to pass values as arguments to functions instead.</div>

In [None]:
def increment(counter): 
    return counter + 1

counter = 0
counter = increment( counter ) 
print(counter)

## Exercises and Modules

- [Exercises 2.2.1](#Exercises-2.2.1)
- [Exercises 2.2.2](#Exercises-2.2.2)
- [Exercises 2.2.3](#Exercises-2.2.3)
- [Modules](#Modules)
- [Exercises 2.2.4](#Exercises-2.2.4)

## Exercises 2.2.1

### Translate DNA sequence into protein sequence

Write a function that translates a DNA sequence into a protein, a sequence of amino acids. The function should take 2 arguments, a DNA sequence and a dictionary that defines the standard genetic code.

For mapping RNA codons to amino acids you can use the dictionary `standardGeneticCode` defined below. Notice that it only maps strings in upper case, so make sure that `codon` is in upper case before you look it up. You can convert a codon to upper case with the string method `upper()`. Notice also that it maps RNA codons and not DNA ones.

First, loop over the sequence to extract every three bases until the end or until a stop codon, using either a `for` loop or a `while` loop.

Then convert the DNA into an RNA sequence by replacing all T bases with U. Make sure that the codon corresponds to an amino acid. Convert the RNA codon into an amino acid using the dictionary provided and return the protein sequence as a list of amino acids.

In [None]:
standardGeneticCode = { 
          'UUU':'Phe', 'UUC':'Phe', 'UCU':'Ser', 'UCC':'Ser',
          'UAU':'Tyr', 'UAC':'Tyr', 'UGU':'Cys', 'UGC':'Cys',
          'UUA':'Leu', 'UCA':'Ser', 'UAA': None, 'UGA': None,
          'UUG':'Leu', 'UCG':'Ser', 'UAG': None, 'UGG':'Trp',
          'CUU':'Leu', 'CUC':'Leu', 'CCU':'Pro', 'CCC':'Pro',
          'CAU':'His', 'CAC':'His', 'CGU':'Arg', 'CGC':'Arg',
          'CUA':'Leu', 'CUG':'Leu', 'CCA':'Pro', 'CCG':'Pro',
          'CAA':'Gln', 'CAG':'Gln', 'CGA':'Arg', 'CGG':'Arg',
          'AUU':'Ile', 'AUC':'Ile', 'ACU':'Thr', 'ACC':'Thr',
          'AAU':'Asn', 'AAC':'Asn', 'AGU':'Ser', 'AGC':'Ser',
          'AUA':'Ile', 'ACA':'Thr', 'AAA':'Lys', 'AGA':'Arg',
          'AUG':'Met', 'ACG':'Thr', 'AAG':'Lys', 'AGG':'Arg',
          'GUU':'Val', 'GUC':'Val', 'GCU':'Ala', 'GCC':'Ala',
          'GAU':'Asp', 'GAC':'Asp', 'GGU':'Gly', 'GGC':'Gly',
          'GUA':'Val', 'GUG':'Val', 'GCA':'Ala', 'GCG':'Ala', 
          'GAA':'Glu', 'GAG':'Glu', 'GGA':'Gly', 'GGG':'Gly'}

## Exercises 2.2.2

### Calculate the GC content of a DNA sequence

Write a function that calculates the GC content of a DNA sequence by re-using the code written for Exercises 1.4.2 yesterday.

## Exercises 2.2.3

### Extract the list of all overlapping sub-sequences

Write a function that extracts a list of overlapping sub-sequences for a given window size from a given sequence. Do not forget to test it on a given DNA sequence.

## Modules

So far we have been writing Python code in files as executable scripts without knowing that they are also modules, from which we can call the functions defined in them.

A module is a file containing Python definitions and statements. The file name is the module name with the suffix `.py` appended. Create a file called `my_first_module.py` in the current directory with the following contents:

In [None]:
def say_hello(user):
    print('hello', user, '!')

Now start the Python interpreter from the directory in which you created the `my_first_module.py` file and import the `say_hello` function from this module with the following commands:

```bash
python3
Python 3.5.2 (default, Jun 30 2016, 18:10:25) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from my_first_module import say_hello
>>> say_hello('Anne')
hello Anne !
>>>
```

A module called `my_first_module.py` is already stored in the course directory. To import it into this notebook, run the cell below. If you edit this file to change the code or add another function, you will have to restart the notebook, using the restart-kernel button in the menu bar, for the changes to take effect.

In [None]:
from my_first_module import say_hello
say_hello('Anne')

A module can contain executable statements as well as function definitions. These statements are intended to initialize the module. They are executed only the first time the module name is encountered in an import statement. 
They are also run if the file is executed as a script.

Do comment out these executable statements if you do not wish to have them executed when importing your module.
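
A common alternative to commenting the statements out is to guard them with a check on the special variable `__name__`, which Python sets to `"__main__"` only when the file is run as a script. A minimal sketch, reusing `say_hello` from `my_first_module.py`:

```python
def say_hello(user):
    print('hello', user, '!')

if __name__ == "__main__":
    # Runs when the file is executed as a script, but not on import
    say_hello('Anne')
```

With this guard in place, `from my_first_module import say_hello` defines the function without printing anything.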

For more information about modules, see https://docs.python.org/3/tutorial/modules.html.

## Exercises 2.2.4
### Calculate GC content along the DNA sequence
Combine the two functions written above to calculate the GC content of each overlapping sliding window along a DNA sequence from start to end.

From the two files you wrote, import the functions written for Exercises 2.2.2 and 2.2.3.
The new function should take two arguments, the DNA sequence and the size of the sliding window, and re-use the functions previously written to calculate the GC content of a DNA sequence and to extract the list of all overlapping sub-sequences. It returns a list of GC% along the DNA sequence.

## BioPython

- [Working with sequences](#Working-with-sequences)
- [Connecting with biological databases](#Connecting-with-biological-databases)
- [Exercises 2.4.1](#Exercises-2.4.1)

## Using a third-party library, BioPython

Biopython tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html

The goal of Biopython is to make it as easy as possible to use Python for bioinformatics by creating high-quality, reusable modules and classes. Biopython features include parsers for various Bioinformatics file formats (BLAST, Clustalw, FASTA, Genbank,...), access to online services (NCBI, Expasy,...), interfaces to common and not-so-common programs (Clustalw, DSSP, MSMS...), a standard sequence class, various clustering modules, a KD tree data structure etc. and even documentation.

## Working with sequences

We can create a sequence by defining a `Seq` object from a string. `Bio.Seq.Seq()` takes a string as input and converts it into a `Seq` object. We can print the sequences, individual residues and lengths, and use other functions to get summary statistics.

In [None]:
# Creating sequence
from Bio.Seq import Seq
my_seq = Seq("AGTACACTGGT")
print(my_seq)
print(my_seq[10])
print(my_seq[1:5])
print(len(my_seq))
print(my_seq.count( "A" ))

We can use functions from `Bio.SeqUtils` to get an idea about a sequence:

In [None]:
# Calculate the GC content and the molecular weight
from Bio.SeqUtils import GC, molecular_weight
print(GC( my_seq ))
print(molecular_weight( my_seq ))

One-letter protein sequence codes can be converted into three-letter codes using the `seq3` utility:

In [None]:
from Bio.SeqUtils import seq3
print(seq3( my_seq ))

Alphabets define how strings are treated as sequence objects. The `Bio.Alphabet` module defines the available alphabets for Biopython, and `Bio.Alphabet.IUPAC` provides basic definitions for DNA, RNA and proteins. Note that `Bio.Alphabet` was removed in Biopython 1.78, so the cells that use it apply to earlier versions.

In [None]:
from Bio.Alphabet import IUPAC
my_dna = Seq("AGTACATGACTGGTTTAG", IUPAC.unambiguous_dna)
print(my_dna)
print(my_dna.alphabet)

In [None]:
my_dna.complement()

In [None]:
my_dna.reverse_complement()

In [None]:
my_dna.translate()

### Parsing sequence file format: FASTA files

Sequence files can be parsed and read the same way we read other files. 

In [None]:
with open( "data/glpa.fa" ) as fileObj:
    print(fileObj.read())
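
To make the idea concrete, here is a sketch of how FASTA records could be collected with plain Python. The function `read_fasta` and the two short records in `example` are made up for illustration (the first reuses the start of the protein sequence written out later in this section), and `io.StringIO` stands in for an open file so that the example does not depend on `data/glpa.fa`.

```python
import io

def read_fasta(handle):
    # Hypothetical helper: map each record ID to its full sequence
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue                          # skip blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]        # ID is the first word after '>'
            records[name] = []
        elif name is not None:
            records[name].append(line)        # sequence may span several lines
    return {key: "".join(parts) for key, parts in records.items()}

example = io.StringIO(">seq1 first example record\nMYGKIIFVLL\nLSEIVSISAS\n>seq2\nACGT\n")
records = read_fasta(example)
for name in records:
    print(name, records[name])
```

This handles only the simplest cases; a dedicated parser also keeps descriptions and copes with other file formats.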

Biopython provides specific functions to allow parsing/reading sequence files. 

In [None]:
# Reading FASTA files
from Bio import SeqIO

# Open the file in a 'with' block so that it is closed automatically
with open("data/glpa.fa") as fileObj:
    for protein in SeqIO.parse(fileObj, 'fasta'):
        print(protein.id)
        print(protein.seq)

Sequence objects can be written into files using file handles with the function `SeqIO.write()`. We need to provide the name of the output sequence file and the sequence file format. 

In [None]:
# Writing FASTA files
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC

sequence = 'MYGKIIFVLLLSEIVSISASSTTGVAMHTSTSSSVTKSYISSQTNDTHKRDTYAATPRAHEVSEISVRTVYPPEEETGERVQLAHHFSEPEITLIIFG'

fileObj = open( "mySeqFile.fa", "w")
  
seqObj = Seq(sequence, IUPAC.protein)
proteinObjs = [SeqRecord(seqObj, id="MYID", description='my description'),]

SeqIO.write(proteinObjs, fileObj,  'fasta')

fileObj.close()

with open( "mySeqFile.fa" ) as fileObj: # read back the file written above
    print(fileObj.read())

## Connecting with biological databases

Sequences can be searched and downloaded from public databases. 

In [None]:
# Read FASTA file from NCBI GenBank
from Bio import Entrez

Entrez.email = 'A.N.Other@example.com'
socketObj = Entrez.efetch(db="protein", rettype="fasta", id="71066805")
dnaObj = SeqIO.read(socketObj, "fasta")
socketObj.close()

print(dnaObj.description)
print(dnaObj.seq)

In [None]:
# Read SWISSPROT record
from Bio import ExPASy

socketObj = ExPASy.get_sprot_raw('HBB_HUMAN')
proteinObj = SeqIO.read(socketObj, "swiss")
socketObj.close()

print(proteinObj.description)
print(proteinObj.seq)

## Exercises 2.4.1

- Retrieve a FASTA file named `data/sample.fa` and answer the following questions:
  - How many sequences are in the file?
  - What are the IDs and the lengths of the longest and the shortest sequences?
  - Create a new object that contains only sequences longer than 500 bp. What is the average length of these sequences?
  - Calculate and print the percentage GC content of each of the sequences.
  - Write the newly created sequence object into a FASTA file named `sample.long.fa`