# Introduction to Python

In these classes we will learn the basics of the programming language Python.  
Python can be easily installed, along with many useful scientific libraries, with [Anaconda](https://www.anaconda.com/). Binaries are available for Windows, macOS, and Linux.  
In these classes we will make use of the [Jupyter Notebook](https://jupyter.org/) or [Jupyter Lab](https://github.com/jupyterlab/jupyterlab).

Once both Python and Jupyter are installed, retrieve the notebooks by cloning this [repository](https://github.com/batterio/intro_ipython_notebook).

To start Jupyter, type in the terminal:  
    `jupyter notebook`  
or  
    `jupyter lab`

This will open a webpage in your browser. Open the notebooks folder and run (by double clicking on it) the introduction.ipynb file.

The material can also be browsed online at [nbviewer.ipython.org](https://nbviewer.jupyter.org/github/batterio/intro_ipython_notebook/blob/master/notebooks/index.ipynb).

## Index

* [Run Python on the shell](#Run-Python-on-the-shell)
* [Syntax](#Syntax)
* [Simple operations](#Simple-operations)
* [Variables](#Variables)
* [Data structure: string](#Data-structure%3A-string)
* [Exercise 1](#Exercise-1)
* [Data structure: list](#Data-structure%3A-list)
* [Data structure: tuple](#Data-structure%3A-tuple)
* [Data structure: dictionary](#Data-structure%3A-dictionary)
* [Exercise 2](#Exercise-2)

## Run Python on the shell
[back to top](#Index)

We'll first run Python on the shell. As a convention, text that begins with a `#` is a comment and should not be typed.

    $> python
    
    Python 3.7.1 (default, Oct 22 2018, 11:21:55) 
    [GCC 8.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
    >>> # This is a comment and it will not be executed
    >>>

To exit Python, type `exit()` or press the keys CTRL and D together (on Windows, CTRL and Z followed by Enter).

## Syntax 
[back to top](#Index)

Unlike languages like C/C++ or Perl, which use braces to define blocks, Python uses line indentation to define a block. The number of spaces in the indentation is variable, but all statements within the block must be indented the same amount.

In [2]:
# The second block in this example will generate an error
if 1 < 10:
    print("1 is smaller than 10")
    print("10 is bigger than 1")
else:
    print("1 is bigger than 10")
   print("10 is smaller than 1")

IndentationError: unindent does not match any outer indentation level (<tokenize>, line 7)
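For comparison, here is a minimal sketch of the same example with every block indented consistently (four spaces, the usual convention). It should run without an error and execute only the first branch. The variable `message` is introduced here purely for illustration.

```python
# Same comparison as above, with every line of a block indented by four spaces
if 1 < 10:
    message = "1 is smaller than 10"
    print(message)
    print("10 is bigger than 1")
else:
    message = "1 is bigger than 10"
    print(message)
    print("10 is smaller than 1")
```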

## Simple operations
[back to top](#Index)

The following conventions are used in the code examples: comments are shown in blue, the result of a command in green, and errors in red.

In [3]:
# We can use python as a calculator
3 + 4

7

In [4]:
# Division
8 / 2

4.0

In [5]:
# Product
4 * 2

8

In [6]:
# Power
4 ** 2

16

In [7]:
8 + 4 * 3

20

In [8]:
(8 + 4) * 3

36
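The division cell above returns `4.0` because `/` always produces a floating point number in Python 3. The short sketch below shows two related operators that the cells above do not cover, `//` (floor division) and `%` (remainder), together with one more precedence example.

```python
# True division always gives a float
print(8 / 2)    # 4.0
print(7 / 2)    # 3.5

# Floor division rounds down to the nearest whole number
print(7 // 2)   # 3

# The modulo operator gives the remainder of the division
print(7 % 2)    # 1

# Precedence: ** is evaluated before * and /, which come before + and -
print(2 + 3 * 2 ** 2)   # 14
```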

Numbers can be:

* integer: **int()**
* floating point: **float()**
* complex: **complex()**

In [9]:
float(2)

2.0

In [10]:
int(3.2)

3

In [11]:
int('3')

3
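A few more conversions may help to see how the three numeric types behave. This is only an illustrative sketch; the name `z` is an assumption made for this example.

```python
# int() truncates towards zero when given a float
print(int(3.9))      # 3
print(int(-3.9))     # -3

# float() and int() can parse strings that look like numbers
print(float("3.2"))  # 3.2
print(int("42"))     # 42

# complex() builds a complex number from a real and an imaginary part
z = complex(2, 3)
print(z)               # (2+3j)
print(z.real, z.imag)  # 2.0 3.0

# A string that does not represent an integer raises ValueError
try:
    int("3.2")
except ValueError as error:
    print("ValueError:", error)
```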

## Variables
[back to top](#Index)

Defining variables in Python is very easy. Python variables don't have types, but their values do. So you can bind a variable to an integer at one point in your program and then rebind it to a string at another point.

In [12]:
# We can bind a variable to an integer...
a = 8
print(a)

8


In [13]:
a + 4

12

In [14]:
c = a + 4
print(c)

12


In [15]:
# ...or we can bind the same variable to a string...
a = "Hello"
print(a)

Hello


In [16]:
a + " World!"

'Hello World!'

In [17]:
# ...but we can't mix different types
4 + " is a number" 

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [18]:
"4" + " is a number" 

'4 is a number'

In [19]:
str(4) + " is a number" 

'4 is a number'
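Besides converting with `str`, a number can be placed inside a string with a formatted string literal (f-string, available from Python 3.6) or with `str.format`. The sketch below compares the three approaches; the names `n`, `message_a`, `message_b` and `message_c` are chosen only for this example.

```python
n = 4

# Explicit conversion, as in the cell above
message_a = str(n) + " is a number"

# Formatted string literal: the expression in braces is converted for us
message_b = f"{n} is a number"

# str.format fills the placeholder {} with its argument
message_c = "{} is a number".format(n)

print(message_a)                                       # 4 is a number
print(message_b == message_a, message_c == message_a)  # True True
```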

The command `type` shows the type of the object passed as argument.

In [20]:
n = 5
type(n)

int

In [21]:
s = "Hello"
type(s)

str

In [22]:
import sys
type(sys)

module
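Because the type belongs to the value and not to the name, `type` reports something different after a variable is rebound. A minimal sketch (note that `print(type(a))` shows `<class 'int'>`, while a notebook cell that ends with `type(a)` displays just `int`):

```python
a = 8
print(type(a))             # <class 'int'>

a = "Hello"
print(type(a))             # <class 'str'>

# isinstance() checks whether a value is of a given type
print(isinstance(a, str))  # True
print(isinstance(a, int))  # False
```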

Another very useful command is `dir`, especially when you use Python in interactive mode. This command returns a list of the attributes of the object passed as argument.

In [23]:
x = "this is a string"
dir(x)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


Using Jupyter you can explore the attributes of an object by pressing `TAB` after the dot.

In [None]:
x.

All the elements of this list are methods or attributes of the string type. Now, how can we know how to use a particular method? In this case it is very useful to use the `.__doc__` attribute or the `help` command.
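As a small sketch of the `.__doc__` route, the code below first filters the output of `dir` down to the public names and then prints the documentation string of one method. It uses a list comprehension, which is not covered in this notebook, and the name `public_names` is an assumption made for this example.

```python
x = "this is a string"

# Keep only the public names (those that do not start with an underscore)
public_names = [name for name in dir(x) if not name.startswith("_")]
print(public_names[:5])

# The documentation of a method is stored in its __doc__ attribute
print(x.count.__doc__)
```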

Let's say we are interested in the methods `count` and `replace`. What we can do is:

In [25]:
help(x.count)

Help on built-in function count:

count(...) method of builtins.str instance
    S.count(sub[, start[, end]]) -> int
    
    Return the number of non-overlapping occurrences of substring sub in
    string S[start:end].  Optional arguments start and end are
    interpreted as in slice notation.



In [26]:
x.count("i")

3

In [27]:
help(x.replace)

Help on built-in function replace:

replace(old, new, count=-1, /) method of builtins.str instance
    Return a copy with all occurrences of substring old replaced by new.
    
      count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.
    
    If the optional argument count is given, only the first count occurrences are
    replaced.



In [28]:
x.replace("i", "X")

'thXs Xs a strXng'

`count` could be useful to calculate the GC content of a DNA sequence, while `replace` could be useful to convert DNA to RNA (T -> U).
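The two uses just mentioned could look roughly like this. The sequence `"ATGCGC"` is a made-up example, and the built-in `len`, which returns the length of the string, is only introduced later in this notebook.

```python
# Hypothetical example sequence, chosen only for illustration
dna = "ATGCGC"

# GC content: fraction of the bases that are G or C
gc_content = (dna.count("G") + dna.count("C")) / len(dna)
print(gc_content)

# DNA to RNA: replace every T with U
rna = dna.replace("T", "U")
print(rna)   # AUGCGC
```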

## Data structure: string
[back to top](#Index)

A string, by definition, is a sequence of characters, like "012345ABCDE". Python recognizes as a string everything that is delimited by quotation marks `" "` or `' '`.

In [29]:
dna1 = "gattaca"
dna2 = "acattag"
dna1 == dna2

False

In [30]:
dna1 != dna2

True

Strings have indices, so we can refer to every position of a string with its corresponding index. Keep in mind that Python, like many other languages, **starts counting from 0**!  
You can also select a slice of a string by defining an interval in which the first index is included but the last one is not: in `[n:m]`, `n` is included but `m` is not. Negative indices count from the end of the string.

```
   0   1   2   3   4   5   6
 +---+---+---+---+---+---+---+
 | g | a | t | t | a | c | a |
 +---+---+---+---+---+---+---+
  -7  -6  -5  -4  -3  -2  -1
```

In [31]:
dna1[1]

'a'

In [32]:
dna1[1:3]

'at'

In [33]:
dna1[-1]

'a'

In [34]:
len(dna1)

7

In [35]:
"c" in dna1

True
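A few more slices on the same sequence may make the "first index included, last index excluded" rule easier to remember. This is only a sketch; the final part shows that a string cannot be modified in place.

```python
dna1 = "gattaca"

print(dna1[:3])    # gat   (from the start up to, but not including, index 3)
print(dna1[3:])    # taca  (from index 3 to the end)
print(dna1[-3:])   # aca   (the last three characters)
print(dna1[::2])   # gtaa  (every second character)

# Strings are immutable: assigning to a position raises TypeError
try:
    dna1[0] = "c"
except TypeError as error:
    print("TypeError:", error)
```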

In [36]:
# 'find' and 'index' are two useful methods to extract an index from a string
help(dna1.index)

Help on built-in function index:

index(...) method of builtins.str instance
    S.index(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found, 
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Raises ValueError when the substring is not found.



In [37]:
help(dna1.find)

Help on built-in function find:

find(...) method of builtins.str instance
    S.find(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.



In [38]:
dna1.index('f')

ValueError: substring not found

In [39]:
dna1.find('f')

-1

In [40]:
dna1.find("ta")

3
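Which of the two methods to use depends on how you want to handle a missing substring. The sketch below wraps `find` in a small helper and then does the same with `index` and `try`/`except`. The function `locate` is hypothetical and written only for this example; functions and exception handling are not covered in this notebook.

```python
dna1 = "gattaca"

def locate(sequence, motif):
    """Return the first index of motif in sequence, or None if it is absent."""
    position = sequence.find(motif)
    if position == -1:
        return None
    return position

print(locate(dna1, "ta"))   # 3
print(locate(dna1, "f"))    # None

# 'index' raises ValueError instead of returning -1
try:
    dna1.index("f")
except ValueError:
    print("'f' is not in the sequence")
```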

In [41]:
dna1.upper()

'GATTACA'

In [42]:
dna1.count("a")

3

In [43]:
dna1.replace("a", "U")

'gUttUcU'

## Exercise 1
[back to top](#Index)

Try to write code that transforms the RNA sequence `"UUgGAagaGcuuACUUag"` into DNA and then calculates its GC content.

Tips:

* Assign the sequence to a variable
* Make all the nucleotides uppercase (or lowercase)
* Replace all the 'U' with 'T'
* Count the number of 'C' and 'G' and divide it by the length of the sequence

[Solution](solutions.ipynb#Exercise-1)

## Data structure: list
[back to top](#Index)

A list is an ordered, array-like collection of objects that are not necessarily of the same type. The elements of a list are enclosed between two square brackets `[` and `]`.

In [44]:
ecoRI = "gaattc"
bamHI = "ggatcc"
hindIII = "aagctt"
enzymes = [ecoRI, bamHI, hindIII]

print(enzymes) 

['gaattc', 'ggatcc', 'aagctt']


In [45]:
# A list can also contain another list
my_list = [100, 'bio', enzymes]

print(my_list)

[100, 'bio', ['gaattc', 'ggatcc', 'aagctt']]


In [46]:
# You can access one element of the list with its index
my_list[0]

100

In [47]:
# Remember that python starts counting from 0!
my_list[3]

IndexError: list index out of range

In [48]:
# You can get the last element of the list with the index -1
my_list[-1]

['gaattc', 'ggatcc', 'aagctt']

In [49]:
my_list[-1] == my_list[2]

True

In [50]:
# You can find out how many elements are in the list with 'len'
len(my_list)

3

In [51]:
# With ':' you can select part of the list
my_list[1:3]

['bio', ['gaattc', 'ggatcc', 'aagctt']]

In [52]:
my_list[1:]

['bio', ['gaattc', 'ggatcc', 'aagctt']]

In [53]:
my_list[:]

[100, 'bio', ['gaattc', 'ggatcc', 'aagctt']]
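The full slice `[:]` returns a new list with the same elements (a shallow copy), whereas a plain assignment only gives the same list a second name. A minimal sketch, in which the names `alias` and `copied` are assumptions made for this example:

```python
enzymes = ["gaattc", "ggatcc", "aagctt"]

alias = enzymes       # a second name for the same list
copied = enzymes[:]   # a new list holding the same elements

alias.append("agatct")

print(len(enzymes))   # 4, the change made through 'alias' is visible here
print(len(copied))    # 3, the copy is unaffected
print(alias is enzymes, copied is enzymes)   # True False
```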

In [54]:
type(my_list[0])

int

In [55]:
type(my_list[2])

list

In [56]:
# The command 'range' creates a sequence of numbers (in Python 3 a lazy range object, not a list)
range(5)

range(0, 5)
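The output `range(0, 5)` is a range object that produces its numbers on demand. Wrapping it in `list` shows the values it contains, as in this short sketch:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]

# range(start, stop, step): stop is excluded, like in slices
print(list(range(2, 10, 3)))   # [2, 5, 8]

# A range supports 'len' and 'in' without being converted to a list
print(len(range(5)))           # 5
print(3 in range(5))           # True
```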

In [57]:
# The command 'split' returns a list of subsequences
seq = "atg-gct-tta"
seq.split("-")

['atg', 'gct', 'tta']

In [58]:
# The command 'list' returns a list of all the characters of a string
my_list = list("atga")
my_list

['a', 't', 'g', 'a']

In [59]:
# With the command 'append' you can add an element at the end of a list.
my_list.append('c')
my_list

['a', 't', 'g', 'a', 'c']

In [60]:
# The command 'insert' adds something in a specific position of the list
my_list.insert(1, 'a')
my_list

['a', 'a', 't', 'g', 'a', 'c']

In [61]:
# The command 'pop' removes and returns the element of the list at a given index
my_list.pop(1)

'a'

In [62]:
my_list

['a', 't', 'g', 'a', 'c']

In [63]:
# The command 'sort' orders a list
my_list.sort()
my_list

['a', 'a', 'c', 'g', 't']
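Note that `sort` changes the list in place and returns `None`. When the original order should be kept, the built-in `sorted` returns a new sorted list instead. A minimal sketch, using a separate list `bases` so that `my_list` is not touched:

```python
bases = ["t", "g", "a", "c"]

ordered = sorted(bases)   # new list; 'bases' keeps its order
print(ordered)            # ['a', 'c', 'g', 't']
print(bases)              # ['t', 'g', 'a', 'c']

result = bases.sort()     # sorts in place and returns None
print(result)             # None
print(bases)              # ['a', 'c', 'g', 't']
```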

In [64]:
# The command 'reverse' reverses the order of the list
my_list.reverse()
my_list

['t', 'g', 'c', 'a', 'a']

In [65]:
# It is possible to turn a list into a string with the command 'join'.
my_string = "".join(my_list)
my_string

'tgcaa'
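`split` and `join` are the inverse of each other: the string on which `join` is called becomes the separator between the elements. The sketch below reuses the sequence from the `split` cell; the names `codons` and `numbers` are assumptions made for this example.

```python
seq = "atg-gct-tta"

codons = seq.split("-")
print(codons)              # ['atg', 'gct', 'tta']

print("".join(codons))     # atggcttta
print("-".join(codons))    # atg-gct-tta

# join only accepts strings, so other types must be converted first
numbers = [1, 2, 3]
print(",".join(str(n) for n in numbers))   # 1,2,3
```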

## Data structure: tuple 
[back to top](#Index)

A tuple is an immutable list. The elements of a tuple are enclosed between two parentheses `(` and `)`.

In [66]:
my_tuple = ("gttc", 5, [4, "a"])
my_tuple

('gttc', 5, [4, 'a'])

In [67]:
type(my_tuple)

tuple

In [68]:
my_tuple[1] = 3

TypeError: 'tuple' object does not support item assignment

In [69]:
# The only public methods available on tuples are 'count' and 'index'
dir(my_tuple)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'count',
 'index']

In [70]:
my_tuple.count("gttc")

1

In [71]:
my_tuple.index("gttc")

0
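Two further properties of tuples are worth a short sketch: they can be unpacked into separate variables, and immutability applies to the tuple itself, not to a mutable object stored inside it. The names `sequence`, `length`, `extras` and `single` are chosen only for this example.

```python
my_tuple = ("gttc", 5, [4, "a"])

# Unpacking assigns each element to its own name
sequence, length, extras = my_tuple
print(sequence, length)      # gttc 5

# The tuple cannot be changed, but the list stored inside it can
my_tuple[2].append("b")
print(my_tuple)              # ('gttc', 5, [4, 'a', 'b'])

# A tuple with a single element needs a trailing comma
single = ("gttc",)
print(type(single), len(single))   # <class 'tuple'> 1
```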

## Data structure: dictionary
[back to top](#Index)

A dictionary is one of the most useful tools in Python. It is an associative array: **{key: value}**.  
The elements of a dictionary are enclosed between two curly brackets `{` and `}`. The key and the value of a dictionary entry are linked by `:`.  
A key must be immutable (more precisely, hashable): strings, numbers and tuples of immutable values can be used as keys, while lists cannot.
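The requirement on keys can be checked directly. In the sketch below a string and a tuple both work as keys, while a list is rejected with a `TypeError`. The dictionary `positions` and its contents are hypothetical and serve only as an illustration.

```python
positions = {}

positions["ecoRI"] = "gaattc"          # a string as key
positions[("chr1", 100)] = "gaattc"    # a tuple as key (hypothetical data)
print(positions[("chr1", 100)])        # gaattc

# A list is mutable and therefore cannot be used as a key
try:
    positions[["chr1", 100]] = "gaattc"
except TypeError as error:
    print("TypeError:", error)
```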

In [72]:
my_dict = {'hindIII':'aagctt', 'ecoRI':'gaattc', 'bamHI':'ggatcc'}
print(my_dict)

{'hindIII': 'aagctt', 'ecoRI': 'gaattc', 'bamHI': 'ggatcc'}


In [73]:
# 'keys' returns a view of all the keys of the dictionary
my_dict.keys()

dict_keys(['hindIII', 'ecoRI', 'bamHI'])

In [74]:
# 'values' returns a view of all the values of the dictionary
my_dict.values()

dict_values(['aagctt', 'gaattc', 'ggatcc'])

In [75]:
# 'items' returns a view of the (key, value) pairs of the dictionary, each pair being a tuple
my_dict.items()

dict_items([('hindIII', 'aagctt'), ('ecoRI', 'gaattc'), ('bamHI', 'ggatcc')])
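In Python 3 the objects returned by `keys`, `values` and `items` are views: they can be converted with `list`, they follow later changes to the dictionary, and they can be looped over. The sketch below uses a separate dictionary `sites` so that `my_dict` is left unchanged; `for` loops are not covered in this notebook.

```python
sites = {'hindIII': 'aagctt', 'ecoRI': 'gaattc', 'bamHI': 'ggatcc'}

# Wrap a view in list() to get a real list
names = list(sites.keys())
print(names)                        # ['hindIII', 'ecoRI', 'bamHI']

# A view reflects changes made to the dictionary afterwards
keys_view = sites.keys()
sites['xhoI'] = 'ctcgag'
print(len(names), len(keys_view))   # 3 4

# Loop over the (key, value) pairs
for name, site in sites.items():
    print(name, site)
```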

In [76]:
# Given a key, you can quickly look up the corresponding value
my_dict['ecoRI']

'gaattc'

In [77]:
# This is how you add an element to a dictionary...
my_dict['BhlII'] = 'agatct'
print(my_dict)

{'hindIII': 'aagctt', 'ecoRI': 'gaattc', 'bamHI': 'ggatcc', 'BhlII': 'agatct'}


In [78]:
# ...and this is how you delete one
del my_dict['bamHI']
my_dict.keys()

dict_keys(['hindIII', 'ecoRI', 'BhlII'])

In [79]:
# What happens if you ask for a key that doesn't exist?
my_dict['Xho1']

KeyError: 'Xho1'

In [80]:
# You can check if the key exists with 'in' or 'get'
'ecoRI' in my_dict

True

In [81]:
my_dict.get('Xho1', "The key doesn't exist!")

"The key doesn't exist!"

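A common use of `get` with a default value is counting. The sketch below counts how often each base occurs in a sequence without ever raising a `KeyError`. The dictionary `counts` is an assumption made for this example, and the `for` loop is not covered in this notebook.

```python
dna1 = "gattaca"

counts = {}
for base in dna1:
    # Use 0 when the base has not been seen yet, then add one
    counts[base] = counts.get(base, 0) + 1

print(counts)           # {'g': 1, 'a': 3, 't': 2, 'c': 1}
print("n" in counts)    # False
```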
## Exercise 2
[back to top](#Index)

Try to write code that reverses the sequence `'ACTCGAACGTGTGTCGTTCGGGATTACG'`.

Tips:

* Assign the sequence to a variable
* Create a list with the characters of the sequence
* Reverse the list
* Concatenate the elements of the list together to form a string

[Solution](solutions.ipynb#Exercise-2)