# Introduction to Python Programming Notebook

 * Alisson Hayasi da Costa
 * Leonardo Utida
 * Franciele Grego Esteves
 * Daiane Belgini
 * Renato Augusto Corrêa dos Santos
 * Nathalia Graf Grachet
 * Henrique Frajacomo
 * Bruna Zamith Santos

# 1. Brief Introduction on Concepts

To understand the Python language well, we first need to introduce some concepts. They are presented here in a simplified way, but they are complex and deserve in-depth study.

### Class
- Represents an  _entity_ (physical or logical) in the real world.
- In general, it contains features (attributes) and methods to manipulate them.
- Defines a general behaviour for a category of objects.
    - **Example:** We can create a class named **```Flower```** that has the attributes: petal width and petal length.

### Object
- Instance of a specific class.
- Contains all attributes and methods defined in its class.
    - **Example:** In the **```Flower```** class example above, we can instantiate the object **```orchid```** with petal width of 4cm and petal length of 3cm.
    
### Method
- Procedure or function that performs an operation over the data of the object.
     - **Example:** Still using the **```Flower```** class, we can have a method that takes the petal length and width and calculates the petal's area.
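
The three concepts above can be sketched together in a few lines. This is only an illustration: the attribute names `petal_width` and `petal_length`, the method name `petal_area`, and the rectangle approximation of the area are assumptions made for this example.

```python
class Flower:
    """A class: describes what every flower object has and can do."""

    def __init__(self, petal_width, petal_length):
        # Attributes: the features stored in each object
        self.petal_width = petal_width
        self.petal_length = petal_length

    def petal_area(self):
        # Method: operates on the object's own data.
        # Assumption: the petal is approximated by a rectangle.
        return self.petal_width * self.petal_length


# An object: one specific instance of the Flower class (sizes in cm)
orchid = Flower(petal_width=4, petal_length=3)
print(orchid.petal_area())  # 12
```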
     

### INDENTATION!
- Indentation defines the overall structure of the code: it determines which lines belong to which block.
- In Python, it is given by leading whitespace at the start of a line, typically 4 spaces per level (many editors insert them when you press **```Tab```**).
- Indentation is a **very important** concept in Python! Make correct and consistent use of it in your code, especially after declaring conditionals and loops (which will be covered later).
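
A minimal sketch of how indentation delimits blocks; the variable names `temperature` and `message` are made up for illustration. The lines indented under `if` and `else` belong to those branches, while the unindented `print` runs in either case.

```python
temperature = 30

if temperature > 25:
    # Indented: runs only when the condition is true
    message = "warm"
else:
    # Indented: runs only when the condition is false
    message = "cool"

# Not indented: runs regardless of the branch taken
print(message)  # warm
```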

## Python and Data
- In Python, all types of data (primitives or not) and data structures are objects of a class.
- Roughly, we can say that the **class** of an object is its **type**.
- The **type** defines the nature of the data.

# 2. Variables

**Variables** are memory spaces capable of storing **data** that have specific **types**, for example: integer numbers, real numbers or text (**```string```**).

A variable is declared (and initialized) by writing its name followed by a value assignment, using the operator `=`.

Just like this: `<variable_name> = <value>`

In [None]:
i = 2018
a = 3

In [None]:
i

2018

The data types in Python are diverse, and can be divided into **primitive types** (basic types) and **more complex data structures**.

Starting with the **primitive types** we have:

| Type     | Value                               | Description                                          |
|----------|-------------------------------------|------------------------------------------------------|
| int      | integer number                      | Positive or negative discrete numbers                |
| float    | real number / floating point number | Decimal numbers                                      |
| str      | text (string)                       | Set of characters that expresses textual information |
| complex  | complex number                      | Complex numbers of the form $x + yj$                 |
| bool     | True/False                          | Boolean values                                       |
| NoneType | None                                | Null value                                           |


Every **data type** has a domain of values that it can store and operations associated with its type.

In [None]:
i = 2018
f = 3.14
s = "Biology!"
c = 3+4j
b = True
n = None

In [None]:
n

In [None]:
s

'Biology!'

When a value (of a certain **type**) is assigned to a variable, the variable takes on the **type** of that value.

In [None]:
a=True
print(type(a))
b = 2018
print(type(b))
c = 3.14
print(type(c))

<class 'bool'>
<class 'int'>
<class 'float'>


It's important to remember that assigning a value to a variable overwrites whatever was assigned to it previously, and the variable's type can change along with the value.

In [None]:
b = 2018
print(b)
print(type(b))
b = "Overwrite as string"
print(b)
print(type(b))
b = 3.14
print(b)
print(type(b))

2018
<class 'int'>
Overwrite as string
<class 'str'>
3.14
<class 'float'>


It's also important to notice the difference between integers and real numbers. Even when the decimal part is zero (`3.0`), the type is still `float`:

In [None]:
c = 3.0

In [None]:
print(type(c))

<class 'float'>


In [None]:
c = 3

In [None]:
print(type(c))

<class 'int'>


Also notice that decimals are separated with **periods, not commas**, and that thousands are written without any separator:

In [None]:
30000000.15

30000000.15

## 2.1 Naming Variables

Names of variables (and of other elements of the language) should be descriptive.

There are also some rules to follow. Variable names:

 - Must start with a letter or an _underscore_ (`_`), never with a number
     - **Example**: ```pi = 3.141592```
 - Can contain _underscores_ and numbers
     - **Example**: ```_pi1_ = 3.141592```
 - Can't have spaces (use an _underscore_ wherever you would use a space)
     - **Example**: ```first_message = "Hello World!!!"```
 - Can't be reserved words (words the language keeps for its own use)
     - **Example**: ```in```
 - Can have accented characters in Python 3 (but this practice is not recommended!)
 - Are ```case sensitive```: uppercase and lowercase characters mean different things, so `year` and `Year` are two different variables.
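
As a rough check of the rules above, the standard library can tell us whether a name is acceptable. The sketch below uses `str.isidentifier()` and `keyword.iskeyword()`; the candidate names are arbitrary examples.

```python
import keyword

# Valid identifiers: start with a letter or underscore, no spaces
print("pi".isidentifier())              # True
print("_pi1_".isidentifier())           # True
print("1st_message".isidentifier())     # False: starts with a number
print("first message".isidentifier())   # False: contains a space

# Reserved words look like identifiers but cannot be used as names
print(keyword.iskeyword("in"))          # True

# Case sensitivity: these are two different variables
year = 2018
Year = 2019
print(year, Year)                       # 2018 2019
```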

In [None]:
k   = 2018 # Not very descriptive
ano = 2018 # Ideal ("ano" is Portuguese for "year")

You can also add comments to your code:
- Using the character ```#```: everything after the hash sign on that line is ignored by Python.
- Using ```''' comment '''```: text between the two sets of triple quotes is technically a string literal, not a comment, but when it is not assigned to anything it has no effect, so it is often used for multi-line comments.
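
A short sketch of both styles; the variable `year` is just an example.

```python
# This whole line is a comment
year = 2018  # A comment can also follow code on the same line

'''
This triple-quoted text is a string literal that is never assigned
or used, so it has no effect on the program.
'''

print(year)  # 2018
```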

## 2.2. Converting data types

In some cases, you may have an integer or a real number stored as a _string_, or a number that you want to convert to a _string_. In Python, these conversions are simple to perform.

- The function ```str(x)``` converts the parameter ```x``` given to a string.
- The function ```int(x)``` converts the parameter ```x``` given to an int.
- The function ```float(x)``` converts the parameter ```x``` given to a float.


In [None]:
um = '1'
print(um, type(um))
print(int(um), type(int(um)))

1 <class 'str'>
1 <class 'int'>


In [None]:
um = 1
print(um ,type(um))
print(str(um), type(str(um)))

1 <class 'int'>
1 <class 'str'>


In [None]:
pi = 3.1415
print(pi, type(pi))
print(int(pi), type(int(pi)))

3.1415 <class 'float'>
3 <class 'int'>
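
The cells above do not show `float()` or a conversion that fails. A short sketch with made-up values: `int()` truncates a float rather than rounding it, and a string that does not look like an integer raises `ValueError`.

```python
length_text = "3.5"

length = float(length_text)   # string -> float
print(length, type(length))   # 3.5 <class 'float'>

print(int(3.99))              # truncates the decimal part: 3
print(round(3.99))            # rounds to the nearest integer: 4

try:
    int(length_text)          # "3.5" is not a valid integer literal
except ValueError as error:
    print("Conversion failed:", error)
```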


## 2.3. Variables and time

Python code is executed line by line (or cell by cell, if you are running a Jupyter Notebook).

The content of a variable can change over time, since every assignment overwrites the last known value.

In [None]:
repetition_DNA = 7 * "ATC"

In [None]:
repetition_DNA

'ATCATCATCATCATCATCATC'

In [None]:
repetition_DNA = 5 * "ATC"

In [None]:
repetition_DNA

'ATCATCATCATCATC'

# 3. Arithmetic Operations

You can use various arithmetic operations with the different number types shown. The table below lists the available operations and their operators:

| Operation                      | Operator        |
|--------------------------------|-----------------|
| Addition                       | ```+```         |
| Subtraction                    | ```-```         |
| Multiplication                 | ```*```         |
| Division                       | ```/```         |
| Integer Division               | ```//```        |
| Modulo (remainder of division) | ```%```         |
| Power                          | ```**```        |

In [None]:
sum_ = 2 + 3.2

In [None]:
sum_

5.2

In [None]:
subtraction = 3 - 4

In [None]:
subtraction

-1

In [None]:
multiplication = 5 * 11

In [None]:
multiplication

55

In [None]:
division = 13 / 5.2

In [None]:
division

2.5

In [None]:
integerDivision = 13 // 5

In [None]:
integerDivision

2

In [None]:
mod = 103.5 % 10

In [None]:
mod

3.5

In [None]:
power = 2 ** 10

In [None]:
power

1024

The order of evaluation of these operators is the same as in mathematics. You can also use parentheses to define the order in which operations are evaluated.

In [None]:
result = -(1 * 2 * 3) + (4 ** (1/2))

In [None]:
result

-4.0
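
A small sketch of how precedence changes a result; the numbers are arbitrary. Note in particular that the power operator is applied before a leading minus sign.

```python
print(2 + 3 * 4)     # multiplication first: 14
print((2 + 3) * 4)   # parentheses first: 20
print(-2 ** 2)       # power before the minus sign: -4
print((-2) ** 2)     # 4

# Integer division and modulo are related: a == (a // b) * b + (a % b)
a, b = 13, 5
print(a // b, a % b)            # 2 3
print((a // b) * b + (a % b))   # 13
```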

## Exercises

Considering the arithmetic operations presented in this section:

1. Write code that calculates the number of peptide bonds of a protein with 20 amino acids.

2. Write code that calculates the expected number of amino acid residues in the translated protein, for a coding sequence of size 130.

# Extra

1. Create the objects $num1=1450$ and $num2=198$. Add them together using their object names.

2. Divide each of these values by 2 and store the results in two other objects ($res1$ and $res2$). What are the values in the new objects?

3. Calculate $\frac{1.78}{2} + \frac{5.43}{3}$.

4. Calculate the expression $\frac{(x+y)\times h}{k+a+g} + u$, assuming that $x=9$, $y=27$, $h=6$, $k=1$, $a=2$, $g=3$ and $u=5$.

# 4. Logical, Boolean and Comparison Operations

## 4.1. Logical variables

Logical variables are used when we want to store a simple content in a variable: `true` or `false`.

In [None]:
resultado = True

In [None]:
aprovado = False

In [None]:
resultado

True

In [None]:
aprovado

False

## 4.2. Logical, Boolean and comparison operations


Logical (`Boolean`) and comparison operations are useful for comparing numbers, results of arithmetic operations and even _strings_. They are the operations that return true (**```True```**) or false (**```False```**).


The table below presents the comparison operations and the Boolean operations, along with their operators.


| Type of operation | Operation        | Operator        | 
| ---------- |-----------------|---------------------|
| Comparison / relational | Not equal to    | ```!=```        |
| Comparison / relational | Greater than    | ```>```         |
| Comparison / relational | Greater than or equal to | ```>=```        |
| Comparison / relational | Less than       | ```<```         |
| Comparison / relational | Less than or equal to | ```<=```        |
| Comparison / relational | Equal to        | ```==```        |
| Boolean / logical | True            | ```True```      |
| Boolean / logical | False           | ```False```     |
| Boolean / logical | Negation        | ```not```       |
| Boolean / logical | And             | ```and```       |
| Boolean / logical | Or              | ```or```        |

These operators are very useful when working with conditional structures (`if-then-else`, covered later), which are used when the code needs to decide which path to follow.

### Truth Table

The results of Boolean (logical) operations can be described in a table called a truth table.

|    A    |    B    |    NOT A    |    A AND B    |    A OR B    |
|---------|---------|-------------|---------------|--------------|
| False   | False   |    True     |     False     |     False    |
| False   | True    |    True     |     False     |     True     |
| True    | False   |    False    |     False     |     True     |
| True    | True    |    False    |     True      |     True     |   

In [None]:
1 < 4 < 6

True

In [None]:
5 >= 4

True

In [None]:
2 == 1

False

Using the logical operation `or`:

In [None]:
2 < 2 or 5 > 1

True

Using the logical operation `not`:

In [None]:
not(1 != 1)

True

Using the logical operation `and`:

In [None]:
2 > 1 and 1 <= 2

True

In [None]:
2 < 1 and 1 <= 2

False

Character strings (`strings`, detailed later) can also be compared:

In [None]:
'actcacactaac' == 'actcacactaac'

True

In [None]:
'actc' == 'actcacactaac'

False

## 4.3. Logical expressions

Logical and relational operators can be combined in more complex expressions.

In [None]:
True or False and not True

True

In [None]:
True or False and False

True

In [None]:
True or False

True

In [None]:
True

True

In [None]:
# Example mixing the two types of operators
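
One possible example mixing comparison and logical operators; the variable names and values are hypothetical. Comparisons are evaluated first, then `not`, then `and`, and finally `or`, which is also why `True or False and not True` evaluates to `True`.

```python
gc_content = 0.52   # hypothetical GC fraction of a sequence
seq_length = 900    # hypothetical sequence length in nucleotides

# Comparisons run first, then `not`, then `and`, then `or`
keep = seq_length >= 300 and not gc_content > 0.65 or seq_length % 3 == 0
print(keep)      # True

# The same expression with the implicit grouping written out
explicit = ((seq_length >= 300) and (not (gc_content > 0.65))) or (seq_length % 3 == 0)
print(explicit)  # True
```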

### Exercises

Considering the operations presented in this section:

3. Given two variables that store the lengths of a coding nucleotide sequence (CDS) and of a protein (integer values), check whether the size of the CDS is consistent with the length of the translated protein (provide additional commands).

```python
compr_proteina = 30
compr_cds = 90
compr_proteina2 = 50
compr_cds2 = 160
```

# 5. Conditional Structures

Often our code should run a specific set of operations only when a condition holds. In Python, we use the conditional structure ```if-then-else``` to control the program's flow of execution and determine which actions should be taken given a condition.

The syntax for the  ```if-then-else``` is pretty simple.
```python
if(<condition>): # if the condition is true, then...
    <code>
else: # else...
    <code>
```

In [None]:
seq_dna1 = 'atcgactgactgaaacac'
seq_dna2 = 'agtcaggagag'

In [None]:
if(seq_dna1 < seq_dna2):
    print(seq_dna1)
else:
    print(seq_dna2)

agtcaggagag
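
Note that `<` and `>` on strings compare them alphabetically (character by character), not by length. If the goal is to compare sequence lengths, `len()` is needed. A sketch with the same two sequences plus two short made-up ones:

```python
seq_dna1 = 'atcgactgactgaaacac'
seq_dna2 = 'agtcaggagag'

# Alphabetical comparison: both start with 'a', then 't' comes after 'g'
print(seq_dna1 > seq_dna2)             # True

# Length comparison: 18 characters versus 11
print(len(seq_dna1), len(seq_dna2))    # 18 11

# The two kinds of comparison can disagree
print('gg' > 'acgt')                   # True: 'g' comes after 'a'
print(len('gg') > len('acgt'))         # False: 2 is not greater than 4
```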


When several conditions need to be checked, we can chain several ```if-then-else``` structures in the format ```if-elif-else```. The keyword ```elif``` expresses the idea of "else, if". In code:

```python
if(<condition>):
    <code>
elif(<condition>):
    <code>
elif(<condition>):
    <code>
else:
    <code>
```

In [None]:
if(seq_dna1 > seq_dna2):
    print(seq_dna1)
elif(seq_dna1 < seq_dna2):
    print(seq_dna2)
else:
    print(seq_dna1, seq_dna2)

atcgactgactgaaacac


In [None]:
if(seq_dna1 > seq_dna2):
    print(seq_dna1)

atcgactgactgaaacac


If there's only one condition that needs to be checked, with no need for an ```else```, you can declare the structure as just ```if <condition>```, as in the cell above.

# 6. Sequences

Sequences are collections of objects arranged in positional order. The position of every object (element) is determined by its **index**.

We'll cover indexing in more detail further ahead.

Python comes with a variety of sequence types ready for use, such as:

1. Strings
2. Lists
3. Tuples

Because these structures are all sequences, they share a common set of operations, just as the numeric types do. With these simple structures alone, you can solve a huge number of problems.
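
A minimal sketch of that shared behaviour, using the same four letters stored as a string, a list and a tuple (the variable names are arbitrary):

```python
text = "ATGC"                   # string
items = ["A", "T", "G", "C"]    # list
fixed = ("A", "T", "G", "C")    # tuple

# The same operations work on all three sequence types
print(len(text), len(items), len(fixed))          # 4 4 4
print(text[0], items[0], fixed[0])                # A A A
print("G" in text, "G" in items, "G" in fixed)    # True True True
print(text[1:3], items[1:3], fixed[1:3])          # TG ['T', 'G'] ('T', 'G')
```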

## 6.1. Strings

_Strings_ are Python's resource for storing textual information in general. (In Python 3, arbitrary collections of bytes, such as the contents of an image file, are handled by the separate `bytes` type.)

In Bioinformatics, _strings_ are important, since they are used to represent DNA, RNA and protein sequences.

It's possible to declare a string using single quotation marks (`' '`) or double quotation marks (`" "`). This allows you to use double quotation marks inside your text if you have declared it with single ones, and vice-versa.

In [None]:
cat = 'Felis catus'

In [None]:
dog  = "Canis lupus familiaris"

In [None]:
cat

'Felis catus'

In [None]:
dog

'Canis lupus familiaris'

Notice that a whitespace counts as a character of a *string*. Let's use the `len()` function to count the number of characters in a *string* that contains a whitespace.

In [None]:
len("I walk")

6

In [None]:
cat

'Felis catus'

The content of strings can be accessed character by character using a number, the **index**, which represents the position.

**IMPORTANT**: The indexes start at 0, and not at 1!!!

In [None]:
cat[0]

'F'

In [None]:
cat[5]

' '

### 6.1.1 _Slice_ of sequences

Because _strings_ are sequences, we can access not only specific characters but also a range of them.

For that, we use ```[]``` and insert the index of the character that we want to access. To specify a range, we insert the index of the first element and then the index where the range stops (this element is not included), separated by ``:``, known in Python as the _slice_ operator.

As the first element's index in a sequence is 0, we can access values from `zero` to `sequence length - 1`.

The _slice_ operator can be used in three different patterns.

| Pattern                | Description                                              |
|------------------------|----------------------------------------------------------|
| ```sequence[i]```     | Returns the element at index ```i``` | 
| ```sequence[i:j]```   | Returns a subsequence starting at index ```i``` and ending just before index ```j``` |
| ```sequence[i:j:k]``` | Returns a subsequence starting at index ```i``` and ending just before index ```j```, in steps of ```k``` | 

In [None]:
human = 'Homo sapiens'

In [None]:
human[0]

'H'

In [None]:
human[-1]

's'

In [None]:
human[:6]

'Homo s'

In [None]:
human[7:]

'piens'

In [None]:
human[2:7]

'mo sa'

In [None]:
human[::2]

'Hm ain'
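
Two details are worth a short sketch: the end index of a slice is not included, and a negative step walks the sequence backwards. The DNA string below is a made-up example.

```python
seq = 'atgcgt'

print(seq[0:3])                     # indices 0, 1 and 2: 'atg'
print(seq[3:6])                     # indices 3, 4 and 5: 'cgt'
print(seq[0:3] + seq[3:6] == seq)   # True: the two slices do not overlap

print(seq[-3:])                     # last three characters: 'cgt'
print(seq[::-1])                    # reversed sequence: 'tgcgta'
```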

An important point is that _strings_ in Python are immutable: their characters cannot be changed in place.

In [None]:
dog[4]

's'

In [None]:
dog[4]='s'

The assignment above fails with a `TypeError`. To change the text, we would have to create a new `string` containing the correct value, or assign a new value to the variable `dog`.
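
A sketch of one way to do that: catch the error raised by the in-place assignment, then build a new string from slices. The replacement character `'S'` is arbitrary.

```python
dog = "Canis lupus familiaris"

try:
    dog[4] = 'S'                  # strings do not support item assignment
except TypeError as error:
    print("Not allowed:", error)

# Build a new string from slices and reassign the variable
dog = dog[:4] + 'S' + dog[5:]
print(dog)                        # CaniS lupus familiaris
```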

### 6.1.2 Sequence Operators
Besides the _slice_ operator presented previously, many other operators can be used to manipulate sequences. We can, for example, check whether a character or number is in a sequence, concatenate sequences, or check whether two sequences (or parts of them) are equal.



| Operation        | Operator        | Syntax            | Description        |
|-----------------|-----------------|--------------------|------------------|
| "In"            | ```in```        | x ```in``` seq     | Checks whether `x` is in the sequence `seq`. Returns `True` if it is and `False` otherwise |
| "Not in"        | ```not in```    | x ```not in``` seq | Inverse of the "In" operation |
| Concatenation    | ```+```         | seq1 ```+``` seq2 | Concatenates two sequences _seq1_ and _seq2_. Returns the result of the concatenation |
| Multiplication   | ```*```         | seq ```*``` x     | Repeats the sequence `seq` `x` times (`x` being an integer number) | 

In [None]:
seq1 = 'atgcgatctagca'

In [None]:
seq2 = 'tcgatcgagtgcata'

_string_ concatenation

In [None]:
seq_results = seq1 + seq2

In [None]:
seq_results

'atgcgatctagcatcgatcgagtgcata'

A common task in biology is the search for shorter sequences inside a DNA sequence.

For example, we can search for the pattern `atc` in a longer sequence:

In [None]:
seq_dna1 = 'atcatcatc'
seq_dna2 = 'atc'

In [None]:
if(seq_dna2 in seq_dna1):
    print("Found!")

Found!


In [None]:
seq_dna3 = 'ggg'


In [None]:
if(seq_dna3 in seq_dna1):
    print("Found!")
else:
    print("Not Found!")

Not Found!



_string_ multiplication

In [None]:
seq_1 = "atcgatcga"

In [None]:
seq_1 * 7

'atcgatcgaatcgatcgaatcgatcgaatcgatcgaatcgatcgaatcgatcgaatcgatcga'

#### Composition

Composition (string formatting) can be used in cases where concatenating multiple _strings_ is not practical.

For example, when we want to insert a value inside a longer _string_.

In [2]:
user = "Joao"

In [3]:
"%s, your DNA was sequenced today." % user

'Joao, your DNA was sequenced today.'

The symbol `%s` is called a 'placeholder' (position marker) and marks the spot where the _string_ (`Joao`) is substituted.

| Marker  | Type              |
|-----------|-------------------|
| %d        | Integer  |
| %s        | Strings           |
| %f        | Floats  |
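
A short sketch combining the three markers in one string; the values `n_reads` and `gc_content` are invented for illustration. When there is more than one marker, the values are passed as a tuple, and `%.2f` limits a float to two decimal places.

```python
user = "Joao"
n_reads = 1500          # hypothetical integer value
gc_content = 0.51834    # hypothetical float value

message = "%s: %d reads, GC content %.2f" % (user, n_reads, gc_content)
print(message)  # Joao: 1500 reads, GC content 0.52
```

Python 3.6 and later also offer f-strings, a different syntax with the same result here: `f"{user}: {n_reads} reads, GC content {gc_content:.2f}"`.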

### 6.1.3. Built-in methods for sequences

Besides the basic operators, there are built-in functions and methods that extend what can be done with sequences:


| Function / Method | Description                               |
|-----------------|-----------------------------------------|
| ```len(s)```    | Returns the size of sequence ```s```        |
| ```min(s)```    | Returns the smallest element in the sequence ```s``` |
| ```max(s)```    | Returns the biggest element in the sequence ```s``` |
| ```s.index(x)```| Returns the index of the first occurrence of ```x``` in ```s```|
| ```s.count(x)```| Returns the number of occurrences of ```x``` in ```s```|

In [None]:
seq_results

'atgcgatctagcatcgatcgagtgcata'

In [None]:
len(seq_results)

28

To count the number of adenines (or of any other base) in a sequence, we can use the `count()` method:

In [None]:
seq_results.count('a')

8

In [None]:
min(seq_results)

'a'

In [None]:
max(seq_results)

't'
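
The `index()` method from the table is not used in the cells above. A brief sketch on the same sequence; for strings, both `index()` and `count()` also accept a pattern longer than one character.

```python
seq_results = 'atgcgatctagcatcgatcgagtgcata'

print(seq_results.index('g'))     # first 'g' is at index 2
print(seq_results.index('atc'))   # first 'atc' starts at index 5
print(seq_results.count('atc'))   # 'atc' occurs 3 times
print(seq_results.count('t'))     # 7 thymines
```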

### 6.1.4 Methods for _strings_

Besides the sequence methods, _strings_ also have their own specific methods.

The most commonly used are:

| Method                             | Description                                                                      |
|------------------------------------|----------------------------------------------------------------------------------|
| ```s.title()```                    | Returns a copy of the _string_ with all its words starting with capital letters | 
| ```s.lower()```                    | Returns a copy of the _string_ with all its characters in lowercase             | 
| ```s.upper()```                    | Returns a copy of the _string_ with all its characters in uppercase             |
| ```s.replace(old_s, new_s)```      | Returns a copy of the _string_ with all the occurrences of ```old_s``` substituted by ```new_s``` |
| ```s.split(sep)```                 | Returns a list of _strings_, splitting ```s``` at each occurrence of the delimiter ```sep``` (at any whitespace when ```sep``` is omitted) |

In [None]:
butterfly = 'Danaus plexippus'

In [None]:
butterfly.title()

'Danaus Plexippus'

In [None]:
butterfly.lower()

'danaus plexippus'

In [None]:
butterfly.upper()

'DANAUS PLEXIPPUS'

In [None]:
butterfly.replace('plexippus', 'chrysippus')

'Danaus chrysippus'

In [None]:
butterfly.split()

['Danaus', 'plexippus']
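
A sketch of `split()` with an explicit delimiter, and of chaining methods. The semicolon-separated `record` is a made-up example, not a real file format.

```python
record = "Danaus plexippus;Nymphalidae;Lepidoptera"

# Split at every ';' instead of at whitespace
fields = record.split(";")
print(fields)   # ['Danaus plexippus', 'Nymphalidae', 'Lepidoptera']

# Each method returns a new string, so calls can be chained
label = fields[0].lower().replace(" ", "_")
print(label)    # danaus_plexippus

# The original string is unchanged
print(record)   # Danaus plexippus;Nymphalidae;Lepidoptera
```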

## 6.2. Lists

A **list** is an ordered, mutable (modifiable) collection of data of any **type**, with no fixed length. A **list** can store integers, floats, strings or even other lists and sequences!

Thanks to this, lists are one of the most versatile structures in Python. Just like _strings_, they are **sequences**, so all the sequence operations already presented apply to them.

In [None]:
lista = ['Danaus affinis', 'Danaus chrysippus', 'Danaus eresimus', 23, 6.93, ['Python', 2017, 2018]]
lista

['Danaus affinis',
 'Danaus chrysippus',
 'Danaus eresimus',
 23,
 6.93,
 ['Python', 2017, 2018]]

In [None]:
len(lista)

6

In [None]:
lista[1]

'Danaus chrysippus'

In [None]:
lista[-1]

['Python', 2017, 2018]

In [None]:
lista[1:3]

['Danaus chrysippus', 'Danaus eresimus']

In [None]:
lista[3:-1]

[23, 6.93]

In [None]:
lista[-1] = "Danaus plexippus"
lista

['Danaus affinis',
 'Danaus chrysippus',
 'Danaus eresimus',
 23,
 6.93,
 'Danaus plexippus']

### 6.2.1 Frequently used methods in Lists

Lists are a very common structure (if not the most common) in Python for various applications. Naturally, they come with many methods that make them even more useful.

The table below presents the most common methods used when working with lists.

| Method                             | Description                                                              |
|------------------------------------|--------------------------------------------------------------------------|
| ```l.append(x)```                  | Appends the element ```x``` to the end of the list ```l```               | 
| ```l.insert(i, x)```               | Inserts the element ```x``` at index ```i``` of the list ```l```         | 
| ```l.pop(i)```                     | Removes the element at index ```i``` of the list ```l``` and returns it. If no index is given, the last element of ```l``` is removed and returned |
| ```l.remove(x)```                  | Removes the first occurrence of the element ```x``` from the list ```l``` |
| ```l.extend(l2)```                 | Extends the list ```l``` with the elements of the list ```l2```          |
| ```l.sort()```                     | Sorts the list ```l``` in place, in ascending order |
| ```l.count(x)```                   | Counts the number of occurrences of ```x``` in the list ```l``` |  

In [None]:
butterflies = ['Danaus affinis', 'Danaus chrysippus']
butterflies

['Danaus affinis', 'Danaus chrysippus']

In [None]:
butterflies.append('Danaus eresimus')
butterflies

['Danaus affinis', 'Danaus chrysippus', 'Danaus eresimus']

In [None]:
butterflies.insert(0,'Morpho Menelaus')
butterflies

['Morpho Menelaus', 'Danaus affinis', 'Danaus chrysippus', 'Danaus eresimus']

In [None]:
x = butterflies.pop()
x

'Danaus eresimus'

In [None]:
butterflies.remove('Danaus affinis')
butterflies

['Morpho Menelaus', 'Danaus chrysippus']

In [None]:
butterflies.extend(['Danaus affinis', 'Danaus eresimus'])
butterflies

['Morpho Menelaus', 'Danaus chrysippus', 'Danaus affinis', 'Danaus eresimus']

In [None]:
butterflies.sort()
butterflies

['Danaus affinis', 'Danaus chrysippus', 'Danaus eresimus', 'Morpho Menelaus']

In [None]:
butterflies.count('Danaus')

0

In [None]:
butterflies.count('Danaus affinis')

1

### 6.2.2. Simple Statistics in Lists

Some basic statistics can be computed on lists using Python's built-in functions.

In [None]:
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [None]:
max(numbers)

10

In [None]:
min(numbers)

0

In [None]:
sum(numbers)

55
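
The same built-in functions are enough to compute a mean. The sketch below also uses the standard-library `statistics` module, which is not part of the original cells; the list name `numbers` is arbitrary.

```python
import statistics

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Mean from the built-ins: total divided by the number of elements
mean = sum(numbers) / len(numbers)
print(mean)                          # 5.0

# The statistics module provides ready-made functions
print(statistics.mean(numbers))
print(statistics.median(numbers))
```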

# 7. Dictionaries

Dictionaries are collections of the “_mapping_” type, that is, they hold ```<key:value>``` relationships. They are collections organized by **keys**, where every **key** has an associated value. Elements in a dictionary are accessed via a **key** and not by index, as happens in sequences.

Dictionaries are also mutable, just like lists.

In [None]:
bases = {'A':'Adenine', 'T':'Thymine', 'C':'Cytosine', 'G':'Guanine'}
bases

{'A': 'Adenine', 'T': 'Thymine', 'C': 'Cytosine', 'G': 'Guanine'}

In [None]:
bases['G']

'Guanine'

In [None]:
bases.keys()

dict_keys(['A', 'T', 'C', 'G'])

In [None]:
bases.values()

dict_values(['Adenine', 'Thymine', 'Cytosine', 'Guanine'])
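
Because dictionaries are mutable, keys can be added and removed after creation. A minimal sketch on the same `bases` dictionary; the `'U'` entry and the `'unknown'` default are additions made only for this example.

```python
bases = {'A': 'Adenine', 'T': 'Thymine', 'C': 'Cytosine', 'G': 'Guanine'}

# Add a new key and value
bases['U'] = 'Uracil'
print(len(bases))                  # 5

# Check whether a key exists before using it
print('U' in bases)                # True
print('N' in bases)                # False

# get() returns a default instead of raising KeyError for a missing key
print(bases.get('N', 'unknown'))   # unknown

# Remove a key
del bases['U']
print(list(bases.keys()))          # ['A', 'T', 'C', 'G']
```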