# Python 101 – TailorDev Commons

This notebook is part of the [Introduction to programming with Python](https://commons.tailordev.fr/software-carpentry/lessons/introduction-to-programming-with-python.html) lesson developed by [TailorDev](https://tailordev.fr).

The first part, which introduces the language and the notion of variables, is usually done in the Python shell.

## Operations on variables

### Exercise 1.1

Compute the area of a circle of 10 cm diameter using two variables, `pi` and `D` (the diameter).

In [102]:
D = 10
pi = 3.14
area = pi * (D/2) ** 2
print(area)

78.5
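
The solution above approximates pi as `3.14`. As a minimal sketch, assuming you are allowed to use the standard library, the `math` module provides a more precise constant. The variable names below are illustrative and not part of the exercise.

```python
import math

diameter_cm = 10
radius_cm = diameter_cm / 2

# math.pi carries many more digits than the hand-typed 3.14
area_cm2 = math.pi * radius_cm ** 2
print(round(area_cm2, 2))  # 78.54
```

The result differs from `78.5` only because of the extra digits of pi.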


### Exercise 1.2

Generate a 100-nucleotide-long poly-A sequence without typing the `A` key multiple times.

In [103]:
polyA = 'A' * 100
print(polyA)

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA


## Working with lists

In [105]:
# create a list named `peptide` with three items
peptide = ['PRO', 'GLY', 'ALA']

In [106]:
# list indexes are 0-based (first = 0, last = number of items - 1)
first_aa = peptide[0]

In [107]:
peptide[2]

'ALA'

In [108]:
# create a second list
second_peptide = ['ARG', 'ASN', 'ASP']

In [109]:
# the sum of two (or more) lists is a new list with all items of all lists
peptide + second_peptide

['PRO', 'GLY', 'ALA', 'ARG', 'ASN', 'ASP']

In [110]:
# lists have a function (also known as a method) `append()` that adds
# a new item to the end of an existing list
peptide.append('ARG')

In [111]:
# in Jupyter, the value of the last expression in a cell is displayed
# automatically. In a Python script, you have to use `print(peptide)`.
peptide

['PRO', 'GLY', 'ALA', 'ARG']

In [122]:
peptide = ['PRO', 'GLY', 'ALA']
peptide

['PRO', 'GLY', 'ALA']

In [123]:
peptide.append('ARG')

In [124]:
peptide

['PRO', 'GLY', 'ALA', 'ARG']

In [125]:
# `pop()` is another function that can be called on a list. It
# removes the last element of the list and returns it.
# This function modifies the list.
peptide.pop()

'ARG'

In [126]:
peptide

['PRO', 'GLY', 'ALA']

In [127]:
# You can also remove an element at a specific index,
# but you cannot pass several indexes at once:
peptide.pop([0, 1])

TypeError: 'list' object cannot be interpreted as an integer

In [128]:
# You can only pass one index to the `pop()` function:
peptide.pop(2)

'ALA'

In [129]:
peptide

['PRO', 'GLY']
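
Since `pop()` accepts a single index, removing several items takes another approach. The sketch below shows two possibilities under the assumption that you know the indexes in advance; the names `residues`, `first_try` and `second_try` are made up for this illustration.

```python
residues = ['PRO', 'GLY', 'ALA', 'ARG']

# Option 1: delete a contiguous range of indexes with a slice.
first_try = list(residues)   # work on a copy
del first_try[0:2]           # removes indexes 0 and 1
print(first_try)             # ['ALA', 'ARG']

# Option 2: pop arbitrary indexes one at a time, highest index first,
# so that earlier removals do not shift the positions still to remove.
second_try = list(residues)
for index in sorted([0, 2], reverse=True):
    second_try.pop(index)
print(second_try)            # ['GLY', 'ARG']
```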

In [130]:
# You can get information about a function by calling `help()`
help(peptide.pop)

Help on built-in function pop:

pop(...) method of builtins.list instance
    L.pop([index]) -> item -- remove and return item at index (default last).
    Raises IndexError if list is empty or index is out of range.



In [131]:
# This works on variables too
help(peptide)

Help on list object:

class list(object)
 |  list() -> new empty list
 |  list(iterable) -> new list initialized from iterable's items
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /

In [132]:
peptide

['PRO', 'GLY']

In [133]:
peptide[1]

'GLY'

In [134]:
peptide

['PRO', 'GLY']

In [136]:
# It is not possible to "pop" an element that does not exist
peptide.pop(2)

IndexError: pop index out of range

In [137]:
# Here we create a list of lists, which is a way to represent
# a matrix in Python
crds = [[0.4, 1.2, 9.3], [0.5, 1.8, 9.1]]

In [138]:
len(crds)

2

In [140]:
# We can access a sub-list and print its length
len(crds[1])

3

In [141]:
# This is the first sub list
crds[0]

[0.4, 1.2, 9.3]

In [143]:
# We can create a variable with the content of this first sub list
xyz = crds[0]

In [144]:
xyz

[0.4, 1.2, 9.3]

In [145]:
xyz[0]

0.4

In [146]:
crds[0][0]

0.4
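
One detail worth knowing about `xyz = crds[0]`: it does not copy the sub-list, it gives a second name to the same list. The short sketch below illustrates this, and uses `list()` to make an independent copy; `xyz_copy` is a name introduced here for the example.

```python
crds = [[0.4, 1.2, 9.3], [0.5, 1.8, 9.1]]

xyz = crds[0]              # same list object as crds[0], not a copy
xyz[0] = 0.0
print(crds[0])             # [0.0, 1.2, 9.3]

xyz_copy = list(crds[1])   # independent copy of the second sub-list
xyz_copy[0] = 0.0
print(crds[1])             # [0.5, 1.8, 9.1]
```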

In [147]:
# Jupyter tip: prefix a variable or function name with `?` to display
# its help page directly in Jupyter. It opens a panel at the bottom
# of the page
?peptide

In [148]:
?peptide.pop

## Loops

In [149]:
peptide = ['PRO', 'GLY', 'ALA', 'ARG', 'ASN']

In [150]:
# Loop over the content of a list, `aa` is a variable that will
# get the value of a list item on each iteration
for aa in peptide:
    print(aa)

PRO
GLY
ALA
ARG
ASN


In [154]:
# The loop is useful because you do not have to repeat the
# following code multiple times:
aa = peptide[0]
aa

'PRO'

In [155]:
aa = peptide[1]
aa

'GLY'

In [156]:
aa = peptide[2]
aa

'ALA'

In [159]:
# The three cells above repeat the same operation again and
# again. Prefer a `for` loop instead:
for aa in peptide:
    print(aa)

PRO
GLY
ALA
ARG
ASN


In [160]:
# Pay attention to the indentation. In Python, this is really
# important. In some other languages, we use `{` and `}` to
# define the content of a for loop. In Python, we use indentation
# levels (indentation is the number of spaces from the beginning
# of a line)
for aa in peptide:
    # inside the loop, will be called for each iteration of the loop
    aa = 'AA'

# outside the loop, will be called once the loop is over
print(peptide)

['PRO', 'GLY', 'ALA', 'ARG', 'ASN']
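
The list is unchanged above because `aa = 'AA'` only rebinds the loop variable. If the goal is to modify the list itself, one possible approach (a sketch, not the only way) is to assign through the index. The built-in `enumerate()` yields the index together with the item.

```python
peptide = ['PRO', 'GLY', 'ALA', 'ARG', 'ASN']

for index, aa in enumerate(peptide):
    # assigning to peptide[index] changes the list itself
    peptide[index] = aa.lower()

print(peptide)  # ['pro', 'gly', 'ala', 'arg', 'asn']
```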


In [162]:
a = ["1 PRO", "2 GLY"]
for i in a:
    # `split()` is a string method that splits a string into a
    # list of strings
    print(i.split())

['1', 'PRO']
['2', 'GLY']


In [164]:
peptide = []
lines = ["1 PRO", "2 GLY", "3 ASP", "4 MET"]
for line in lines:
    sp = line.split()  # with no argument, splits on whitespace
    aa = sp[1]
    peptide.append(aa)
    print(peptide)

['PRO']
['PRO', 'GLY']
['PRO', 'GLY', 'ASP']
['PRO', 'GLY', 'ASP', 'MET']
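
The same parsing can keep the residue number as well. This sketch assumes every line holds exactly two whitespace-separated fields, so the result of `split()` can be unpacked into two variables; `numbers` is a list added for the illustration.

```python
lines = ["1 PRO", "2 GLY", "3 ASP", "4 MET"]
numbers = []
peptide = []
for line in lines:
    # unpacking only works if split() returns exactly two items
    number, aa = line.split()
    numbers.append(int(number))  # convert the text '1' to the integer 1
    peptide.append(aa)

print(numbers)  # [1, 2, 3, 4]
print(peptide)  # ['PRO', 'GLY', 'ASP', 'MET']
```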


In [165]:
?line.split

## Comparison operators

In [166]:
# assignment: put the value `5` into the variable named `a`
a = 5

In [167]:
# `==` tests for equality
# Question: is `a` equal to `5`?
a == 5

True

In [54]:
a == "foo"

False

In [168]:
# Question: is `a` NOT equal to `2`? 
a != 2

True

In [169]:
# Question: is `a` greater than `2`?
a > 2

True

In [170]:
# Question: is `5` greater than `2`?
5 > 2

True

In [171]:
# Question: is `a` greater than or equal to `6`?
a >= 6

False

In [172]:
a = 5
a >= 5

True

In [175]:
# Question: is `a` less than `5`?
a < 5

False

In [176]:
# Question: is `a` less than or equal to `5`?
a <= 5

True

In [177]:
# Recap of the comparison operators: ==, !=, >, >=, <, <=

## Control flow statements (1)

In [178]:
sequence = 'atgc'
size = len(sequence)
if size == 4:
    print('Sequence length match')

Sequence length match


In [181]:
sequence = 'atgc'
size = len(sequence)
# if the condition is met (that is, `size` is equal to `5`),
# then execute the code inside the `if` statement (the code
# indented below the `if`). Here `size` is 4, so nothing is printed.
if size == 5:
    print('Sequence length match')

In [183]:
sequence = 'atgc'
size = len(sequence)
# if `size` is equal to `4`, then print "Sequence length match"
# otherwise (`else`), print "Sequence does not match"
if size == 4:
    print('Sequence length match')
else:
    print('Sequence does not match')

Sequence length match


In [184]:
sequence = 'atgc'
size = len(sequence)
if size == 5:
    print('Sequence length match')
else:
    print('Sequence does not match')

Sequence does not match


In [185]:
print(size == 4)

True


In [186]:
if True:
    print('True !')
else:
    print('False')

True !


In [187]:
if 0:
    print('😎')
else:
    print('🤔')

🤔


In [188]:
if 3:
    print('😎')
else:
    print('🤔')

😎
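
The last two cells work because `if` accepts any value, not only `True` and `False`. As a small illustration, the built-in `bool()` shows how Python interprets a few values: zero and empty containers count as false, most other values count as true. The list of sample values is arbitrary.

```python
values = [0, 3, -1, '', 'atgc', [], ['PRO']]
truthiness = []
for value in values:
    truthiness.append(bool(value))
    print(repr(value), bool(value))

print(truthiness)  # [False, True, True, False, True, False, True]
```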


## Control flow statements (2)

In [189]:
apple = "blue"
if apple == "green":
    print('My apple is green !')
elif apple == "blue":
    print('My apple is blue !')
else:
    print('My apple is not green, but maybe red.')

My apple is blue !


In [190]:
sequence = 'atgc'
for nt in sequence:
    if nt == 'a':
        print('found A')
    elif nt == 't':
        print('found T')
    else:
        print('No A or T found')
        

found A
found T
No A or T found
No A or T found


### Exercise 2.1

Given the following sequence, calculate the ratio of each nucleotide type it contains:

In [191]:
seq = 'ATGCTCGCGGCGCTAGCTACTAGCTAGCA'
seq_len = len(seq)
a = t = g = c = 0
for nt in seq:
    if nt == 'A':
        a += 1  # a += 1 is the same as a = a + 1
    elif nt == 'T':
        t += 1
    elif nt == 'G':
        g += 1
    elif nt == 'C':
        c += 1

for counter in [a, t, c, g]:
    print(counter / seq_len)

0.20689655172413793
0.20689655172413793
0.3103448275862069
0.27586206896551724
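
The values above are printed in the order A, T, C, G, without labels. As an alternative sketch, the string method `count()` can replace the manual counters, and a dictionary (a data structure not covered in this notebook) can keep each ratio next to its nucleotide. The name `ratios` is introduced for this example.

```python
seq = 'ATGCTCGCGGCGCTAGCTACTAGCTAGCA'
ratios = {}
for nt in 'ATGC':
    # count() returns the number of occurrences of nt in seq
    ratios[nt] = seq.count(nt) / len(seq)
    print(nt, round(ratios[nt], 3))
```

This scans the sequence once per nucleotide, which is fine for short sequences like this one.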


## Working with files

In [194]:
# this may fail because `myseq.fasta` must already exist in
# your Jupyter workspace
f = open('./myseq.fasta')

In [195]:
# Prefixing a command with `!` tells Jupyter to execute a
# Bash (shell) command instead of Python code.
# The command below creates a fasta file:
!echo '>foo\nATGC' > myseq.fasta

In [199]:
!cat myseq.fasta

>foo
ATGC


In [200]:
!ls

1crn2.fasta  1crn.fasta  myseq.fasta  python-101.ipynb


In [205]:
# When you open a file, you must close it!
# This is common to all programming languages, even if it
# might be hidden from you (see next cell for instance)
f = open('myseq.fasta')
lines = f.readlines()
f.close()

# `\n` is a special character to denote the end of a line
print(lines)

['>foo\n', 'ATGC\n']


In [206]:
# `with` takes care of closing the file at the end ;-)
with open('./myseq.fasta') as f:
    lines = f.readlines()
    for line in lines:
        # `strip()` is a string method that removes leading and
        # trailing whitespace (including `\n`)
        print(line.strip())

>foo
ATGC
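
The file can also be created from Python instead of the shell. The sketch below writes the same two lines and reads them back by looping over the file object directly. It uses a temporary directory (an assumption made here so that nothing in your workspace is overwritten), and `path` and `stripped` are illustrative names.

```python
import os
import tempfile

# Hypothetical location: a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'myseq.fasta')

# mode 'w' opens the file for writing (and creates it if needed)
with open(path, 'w') as f:
    f.write('>foo\nATGC\n')

stripped = []
with open(path) as f:
    for line in f:  # a file object yields its lines one by one
        stripped.append(line.strip())

print(stripped)  # ['>foo', 'ATGC']
```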


### Exercise 3.1

Download the crambin sequence in **fasta format** from the
[PDB](http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=FASTA&compression=NO&structureId=1CRN).

Save this file in your working directory and store the amino acid sequence in a `seq` variable.

In [207]:
!wget "http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=FASTA&compression=NO&structureId=1CRN" -O 1crn.fasta


--2017-03-15 09:54:05--  http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=FASTA&compression=NO&structureId=1CRN
Resolving www.rcsb.org (www.rcsb.org)... 128.6.70.10
Connecting to www.rcsb.org (www.rcsb.org)|128.6.70.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘1crn.fasta’

1crn.fasta              [ <=>                  ]      76  --.-KB/s   in 0s     

2017-03-15 09:54:06 (4.15 MB/s) - ‘1crn.fasta’ saved [76]



In [208]:
!ls

1crn2.fasta  1crn.fasta  myseq.fasta  python-101.ipynb


In [209]:
!cat 1crn.fasta

>1CRN:A|PDBID|CHAIN|SEQUENCE
TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN


In [210]:
with open('./1crn.fasta') as f:
    lines = f.readlines()
    sequence = lines[1].strip()
print(sequence)

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
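
Reading `lines[1]` works here because the whole sequence sits on one line. Fasta files often wrap the sequence over several lines, so a slightly more general sketch joins every line that is not a header. The wrapped record below is a hypothetical rearrangement of the same sequence, written as a list of lines to keep the example self-contained.

```python
# Hypothetical record: same sequence, wrapped over two lines
lines = ['>1CRN:A|PDBID|CHAIN|SEQUENCE\n',
         'TTCCPSIVARSNFNVCRLPGTPE\n',
         'AICATYTGCIIIPGATCPGDYAN\n']

sequence = ''
for line in lines:
    # header lines start with '>', every other line is sequence
    if not line.startswith('>'):
        sequence += line.strip()

print(sequence)
print(len(sequence))  # 46
```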
