# **Introduction to Python Programming Notebook**
---

###### ${By \ Matheus\ Scaketti\ \& \ Ubiratan\ Batista*}$
###### ${Reviewed \ by \ Raissa\ Melo}$

\*Adapted from the material developed for the Brazilian Python Workshop for Biological Data, 2021 edition. [[GitHub Repository]](https://github.com/SantosRAC/BrazilianWorkshopPythonForBioData_Zuvanov_etal_2021)

# **Introduction to Python**

Python is an easy-to-learn and powerful language. It can be used in various fields and is compatible with most platforms.

**What will you learn today?** By the end of this session, you'll be able to harness the language's potential to create diverse programs/routines. *Your imagination is the limit.*

Key points that will be covered:

*   Language peculiarities
*   Variables/Objects
*   Arithmetic operators
*   Logical operators
*   Flow control tools
*   Sequences


---

# **1. Important Concepts**

Every language has its own characteristics, advantages, and disadvantages. Before discussing any specific topic, it's important to learn some concepts that will help you write correct code.

## **1.1. Classes**

Classes are a way to organize similar data and functionalities together.

*   Represents an entity (real or logical) from the real world.
*   Can contain attributes and functions that manipulate the class's data.

After their creation, it's possible to generate instances (objects) of the same class depending on the intended purpose. For example:

*   We can create a class called **```Flower```** and define attributes such as leaf ***length*** and ***width***. Additionally, we can define functions that can be called to perform a specific action, such as **```area_calculation()```**.

  <img src='https://raw.githubusercontent.com/brazilpythonws/Updates_BrazilianWorkshopPythonForBiologicalData/main/5thBrazilianWorkshopPythonForBioData_2022/Class_Notebooks/Day1_2022_colab/GIFs/ClassObjects.gif'>
  
  
## **1.2. Objects**

*   Represents an instance of a specific class.
*   Contains all the attributes and functions defined in the class.
*   Example:
    *   Just as we created the **```Flower```** class, we can create an object called **```orchid```** with attributes: length and width equal to 4 cm and 3 cm, respectively.
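The **```Flower```** example can be sketched in code. This notebook presents classes only conceptually, so the block below is a minimal illustration: it assumes the "area" is approximated as length × width (a rectangle), a simplification chosen only for this example.

```python
class Flower:
    def __init__(self, length, width):
        # Attributes: leaf length and width (in cm)
        self.length = length
        self.width = width

    def area_calculation(self):
        # Rectangle approximation (assumption for illustration only)
        return self.length * self.width


# orchid is an object (instance) of the Flower class
orchid = Flower(4, 3)
print(orchid.length, orchid.width)   # 4 3
print(orchid.area_calculation())     # 12
```

Every new object created with `Flower(...)` gets its own attribute values, but all of them share the same `area_calculation()` method.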

## **1.3. Functions**

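*   Functions are reusable blocks of code that perform a specific task. Functions defined inside a class are called **methods** and usually operate on the object's data (e.g., **```area_calculation()```**).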

## **1.4. Indentation**

Code indentation is a way to improve code readability and organization, defining its structure better.

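Unlike many programming languages, which use braces or keywords to delimit blocks, Python requires code to be indented correctly because indentation defines its structure. Incorrect indentation causes an error (`IndentationError`).

In Python, a block is indented with spaces or with the **```tab```** key; the recommended convention (PEP 8) is 4 spaces per level. Whichever you choose, be consistent within the same block.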
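A small illustration of how indentation marks which lines belong to a block. The temperature value and threshold are hypothetical, chosen only for the example; `if` itself is explained later in this notebook.

```python
temperature = 38.5  # hypothetical measurement

if temperature > 37.5:
    # These two lines are indented, so they belong to the if block
    status = "fever"
    print("Fever detected")

# This line is not indented, so it always runs
print("Measurement finished")
```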

## **1.5. Comments**

An important practice for better code documentation is adding comments and explanations of what is being done in the code. It's important to use comments so that other people or even yourself know what is being executed.

In Python, there are two ways to create comments:
* Using the ```#``` character: everything after the character on that line is ignored during code execution.
* Using triple quotes, ```''' comment '''``` (or ```""" comment """```): strictly speaking, this creates a string that is not stored anywhere, so it has no effect on execution. It is useful for multi-line comments and for documenting functions (docstrings).
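Both styles of comment can be seen in this short sketch (the GC content value is just an example):

```python
# This is a single-line comment: Python ignores everything after '#'
gc_content = 0.58  # a comment can also follow code on the same line

'''
This triple-quoted text is a string that is not assigned
to any variable, so it has no effect when the code runs.
'''

print(gc_content)  # 0.58
```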

---

#   **Variables/Objects**

**Variables/Objects** are memory spaces capable of storing **data** that have a specific **type**, such as **```integers```**, **```real numbers```**, or **```strings```**.

A variable is declared (and initialized) by writing its name, followed by the assignment operator `=` and the value to be stored.

For example: `<variable_name> = <value>`


In [None]:
obj_a = obj_b = obj_c = obj_d = 5

In [None]:
print(obj_a)
print(obj_b)
print(obj_c)
print(obj_d)

5
5
5
5


The data types in Python are diverse and can be divided into **primitive types** (basic types) and **more complex data structures**.

First, in **primitive types**, we have:

| Type    | Value | Description|
|---------|-------|----------|
| int     | Integer number | Positive or negative whole numbers |
| float   | Real number / Floating-point number | Numbers with a fractional (decimal) part |
| string (`str`) | Text | Sequence of characters expressing textual information|
| complex | Complex numbers | Complex numbers in the form $x+yj$, where $j$ is the imaginary unit |
| boolean (`bool`) | True or False | Logical / Boolean values |
| null (`NoneType`) | None | Represents the absence of a value |

# **Object of type ```int``` (Integer)**:
*   Examples of values of type **int**:
    * 4
    * 5
    * -12
    * 4594

# **Object of type ```float```**:
Floating-point objects in Python use **```.```** to separate the integer part from the fractional part.
*   Examples of values of type **float**:
    * 0.345
    * 10.005
    * -0.312
    * 4.0

*   Okay, I think we have an error here. How can the value 4 be both **int** and **float** at the same time?

    *   **Response**: 4 and 4.0 have the same numeric value, but they are written differently. The **float** version has a fractional part (e.g., **```.0```**), so Python stores it as a different type.

# **Object of type ```string``` (Text)**:
In Python, to represent a value of type **string**, enclose the value in either single quotes (**```''```**) or double quotes (**```""```**).
*   Examples of values that are **string**:
    * 'Biology'
    * '3.14'
    * "Hello"

Note that **'3.14'** is different from just **3.14**, as we are defining the value as a set of characters that form text.
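The difference becomes clear when the two values are inspected and combined. A quick check:

```python
text_value = '3.14'   # a string: a sequence of characters
number_value = 3.14   # a float: a number

print(type(text_value))            # <class 'str'>
print(type(number_value))          # <class 'float'>
print(text_value == number_value)  # False: different types are not equal

print(text_value + text_value)      # 3.143.14 (text is joined)
print(number_value + number_value)  # 6.28 (numbers are added)
```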

# **Naming Objects**:

Object names should be descriptive, follow a consistent pattern, and represent what the variable is storing.

Additionally, there are some rules for their creation:

 * The name should clearly represent the stored data:
     * **Correct**: ```year = 2021```
     * **Incorrect**: ```k = 2021```
 * It should start with a letter, not a number:
     * **Correct**: ```pi = 3.141592```
     * **Incorrect**: ```3pi = 3.141592```
 * It can contain an _underscore_ (`_`) or numbers. Unlike numbers, the _underscore_ can be used at the beginning:
     * **Example**: ```_pi1_ = 3.141592```
 * It cannot contain spaces (use the _underscore_ to represent spaces):
     * **Example**: ```first_message = "Hello world!!!"```
 * You cannot use words reserved by the language (keywords), which have a special meaning in Python:
     * **Example**: ```in```
 * Object names in Python 3 can contain accented characters (but this practice is **not recommended!**)
 * Python 3 is ```case sensitive```, meaning uppercase and lowercase characters have distinct meanings (`year` and `Year` are different names).
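These rules can be checked directly in Python. The sketch below uses the standard `keyword` module and the string method `isidentifier()`, which reports whether a text is a valid name:

```python
import keyword

print(keyword.iskeyword('in'))          # True: 'in' is reserved
print('year'.isidentifier())            # True: valid name
print('_pi1_'.isidentifier())           # True: may start with an underscore
print('3pi'.isidentifier())             # False: cannot start with a number
print('first message'.isidentifier())   # False: contains a space

# Case sensitivity: these are two different variables
Year = 2020
year = 2021
print(Year, year)  # 2020 2021
```

Note that `isidentifier()` does not reject keywords, which is why `keyword.iskeyword()` is checked separately.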


In [None]:
i = 2018
f = 3.14
s = "Biology!"
c = 3+4j
b = True
n = None

In [None]:
n

In [None]:
s

'Biology!'

When you assign a value of a certain **type** to a variable, the variable takes on that **type**. You can check it with the ```type()``` function:

In [None]:
a=True
print(type(a))
b = 2018
print(type(b))
c = 3.14
print(type(c))

<class 'bool'>
<class 'int'>
<class 'float'>


It is important to emphasize that the assignment of a value to a variable at a certain point in the code overwrites any previous assignment to the same variable.

In [None]:
b = 2018
print(b)
print(type(b))
b = "Sobrescreve como string"
print(b)
print(type(b))
b = 3.14
print(b)
print(type(b))

2018
<class 'int'>
Sobrescreve como string
<class 'str'>
3.14
<class 'float'>


Highlight: integers and floating-point numbers are different types in Python. Even if there is only a `zero (0)` after the decimal point, the type is still considered **float**:

In [None]:
c = 3.0

In [None]:
print(type(c))

<class 'float'>


In [None]:
c = 3

In [None]:
print(type(c))

<class 'int'>


Also note that in Python the decimal separator is **a period, not a comma** (as in Portuguese). Furthermore, thousands **are not separated by commas**:

In [None]:
30000000.15

30000000.15
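If you want to make large numbers easier to read, Python 3.6+ accepts underscores inside numeric literals. A comma, on the other hand, does something completely different: it builds a tuple (a sequence type covered later).

```python
large_number = 30_000_000.15   # underscores are ignored by Python
print(large_number)            # 30000000.15
print(large_number == 30000000.15)  # True

wrong = 30,000     # a comma does NOT separate thousands: this creates a tuple
print(wrong)       # (30, 0)
print(type(wrong)) # <class 'tuple'>
```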

# **Data Type Conversion**

Sometimes we have a value of type ```int```, ```float```, or ```string```, but we need it represented as a different data type. In Python, these conversions can be done very simply using built-in functions.

* The function ```str(x)``` converts the parameter ```x``` into a **string** variable.
* The function ```int(x)``` converts the parameter ```x``` into an **int** variable.
* The function ```float(x)``` converts the parameter ```x``` into a **float** variable.


In [None]:
um = '1'
print(um, type(um))
print(int(um), type(int(um)))

1 <class 'str'>
1 <class 'int'>


In [None]:
um = 1
print(um, type(um))
print(str(um), type(str(um)))

1 <class 'int'>
1 <class 'str'>


In [None]:
pi = 3.1415
print(pi, type(pi))
print(int(pi), type(int(pi)))

3.1415 <class 'float'>
3 <class 'int'>
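As the last cell shows, `int()` does not round: it simply drops the fractional part. A few more conversions worth knowing, sketched below; note that a string containing a decimal point cannot be passed directly to `int()`.

```python
truncated = int(3.99)        # 3: the fractional part is dropped, not rounded
negative = int(-3.99)        # -3: truncation goes toward zero
rounded = round(3.99)        # 4: round() rounds to the nearest integer
from_text = float('3.14')    # 3.14: a numeric string can be converted to float
print(truncated, negative, rounded, from_text)

conversion_failed = False
try:
    int('3.14')              # raises ValueError
except ValueError:
    conversion_failed = True
print(conversion_failed)     # True

print(int(float('3.14')))    # 3: convert to float first, then to int
```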


# **Composition**

Composition (string formatting) is useful when it is not simple or practical to join different types of data into a _string_, for example, when we want to show the value of a specific variable within a _larger string_.

In [None]:
user1 = "John"
user2 = "Maria"
day = 12

In [None]:
"%s, your DNA was sequenced today." % user1

print("%s e %s, your DNA was sequenced on day %d." % (user1, user2, day))

John e Maria, your DNA was sequenced on day 12.


The symbol `%s` is called a placeholder and indicates that we will replace it with a specific _string_ (e.g., `John`).


| Placeholder  | Type              |
|-----------|-------------------|
| %d        | Integers  |
| %s        | Strings           |
| %f        | Floating-point (decimal) numbers  |
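Placeholders can also control how numbers are shown, and modern Python offers other ways to compose strings. A short comparison, with a hypothetical GC content value:

```python
user1 = "John"
day = 12
gc_content = 0.58333   # hypothetical value for illustration

# %-formatting: %.2f shows a float with 2 decimal places
summary = "%s, your GC content is %.2f" % (user1, gc_content)
print(summary)          # John, your GC content is 0.58

# f-strings (Python 3.6+): expressions go directly inside the braces
message = f"{user1}, your DNA was sequenced on day {day}."
print(message)          # John, your DNA was sequenced on day 12.
print(f"GC content: {gc_content:.2f}")   # GC content: 0.58

# str.format() produces the same result as the f-string above
print("{}, your DNA was sequenced on day {}.".format(user1, day))
```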

---

#   **Arithmetic Operators**

You can use various arithmetic operations to perform calculations and create formulas. The table below shows the possible operations and their characters in the Python language:

| Operation        | Character        |
|-----------------|------------------|
| Addition            | ```+```          |
| Subtraction       | ```-```          |
| Multiplication   | ```*```          |
| Division         | ```/```          |
| Integer Division | ```//```         |
| Modulus (remainder of division) | ```%```          |
| Exponentiation     | ```**```         |



## **3.1. Order of Operations**
A very important piece of information when writing formulas in Python is the order in which operations are typed. Just as in mathematics, each operation has a precedence, so some operations are executed before others. The table below shows these priorities:

| Priority | Character(s)     |
|------------|------------------|
| 1st        | ```()```         |
| 2nd        | ```**```         |
| 3rd        | ```* /  // %```  |
| 4th        | ```+ -```        |

In [None]:
addition = 2 + 3.2
addition

5.2

In [None]:
subtraction = 3 - 4
subtraction

-1

In [None]:
multiplication = 5 * 11
multiplication

55

In [None]:
division = 13 / 5.2
division

2.5

In [None]:
integer_division = 13 // 5.2
integer_division

2.0

In [None]:
mod = 103.5 % 10
mod

3.5

In [None]:
power = 2 ** 10
power

1024

When an expression combines several operators, Python follows the priorities in the table above: first, everything inside parentheses is evaluated; then exponentiation (`**`); then multiplication, division, integer division, and modulus; and finally addition and subtraction. Operators with the same priority are evaluated from left to right, except `**`, which groups from right to left.
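A few expressions, chosen here only for illustration, show these rules in action. The comments trace each step:

```python
with_parentheses = (1 + 3) ** 2 / 8 + 1   # 4 ** 2 = 16, 16 / 8 = 2.0, 2.0 + 1 = 3.0
without_parentheses = 1 + 3 ** 2 / 8 + 1  # 3 ** 2 = 9, 9 / 8 = 1.125, 1 + 1.125 + 1 = 3.125
negative_square = -2 ** 2                 # ** runs before the minus sign: -(2 ** 2) = -4
left_to_right = 10 - 4 - 3                # same priority: (10 - 4) - 3 = 3
right_to_left = 2 ** 3 ** 2               # ** groups from the right: 2 ** 9 = 512

print(with_parentheses, without_parentheses)          # 3.0 3.125
print(negative_square, left_to_right, right_to_left)  # -4 3 512
```

When in doubt, adding parentheses makes the intended order explicit and the formula easier to read.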

# ***Math***

Understanding how to create functions and complex scripts is important. However, we don't always need to reinvent the wheel: Python lets you import ready-made definitions and functions into your code through **modules**. An important module is **```math```**, which provides mathematical functions for real (non-complex) numbers and makes many programming tasks easier.

Importing a module is simple: add a line of code in the format `import [module_name]`. After that, you can call the desired function from the module with `[module_name].[function_name]`. To make the code easier to write and read, you can also define an "alias" for the module when importing it: `import [module_name] as [alias]`.

Some functions that can be useful are found in the table below:

| Function/Variable | Description        |
|------------     |------------------|
| sqrt(x)  | Returns the square root of ```x``` |
| ceil(x)  | Returns the ceiling of `x`, which is the smallest integer greater than or equal to `x`      |
| floor(x) | Returns the floor of `x`, which is the largest integer less than or equal to `x`      |
| pi       | The constant `pi` (approximately 3.141593); a value, not a function  |

In [None]:
import math as mt

In [None]:
mt.sqrt(100)

10.0

In [None]:
mt.floor(23.5)

23

In [None]:
mt.ceil(23.5)

24

In [None]:
mt.pi

3.141592653589793
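Two details that may be useful, sketched below: `floor()` and `ceil()` behave differently from `int()` for negative numbers, and `from math import ...` lets you use selected names without the module prefix. The circle radius is an arbitrary example value.

```python
import math as mt
from math import sqrt, pi   # import only the names you need

print(mt.floor(-23.5))  # -24: floor always goes down
print(mt.ceil(-23.5))   # -23: ceil always goes up
print(int(-23.5))       # -23: int() simply drops the fractional part

radius = 2.0
area = pi * radius ** 2   # pi and sqrt can be used without the "mt." prefix
print(area)
print(sqrt(16))           # 4.0
```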

---

# Exercises

1. Create the objects $num1=1450$ and $num2=198$. Add these numbers using the object names.

2. Divide each of these values by 2 and store the results in two other objects ($res1$ and $res2$). What are the values of the new objects?

3. Perform the following calculation: $\frac{1.78}{2} + \frac{5.43}{3}$.

4. Calculate the expression $\frac{(x+y)\times h}{k+a+g} + u$, assuming $x=9$, $y=27$, $h=6$, $k=1$, $a=2$, $g=3$, and $u=5$.

In [None]:
num1 = 1450
num2 = 198

res1 = num1/2
res2 = num2/2

print(res1)
print(res2)

725.0
99.0


In [None]:
#The next two cells are suggested additions/changes to the first set of exercises, as a way to include biological data in the class exercises.

# Exercises

1. Create the objects $num1=1450$ and $num2=198$. Add these numbers using the object names.

2. Perform the following calculation: $\frac{1.78}{2} + \frac{5.43}{3}$.

3. Calculate the expression $\frac{(x+y)\times h}{k+a+g} + u$, assuming $x=9$, $y=27$, $h=6$, $k=1$, $a=2$, $g=3$, and $u=5$.

4. In a population of 1000 diploid individuals, the Y locus has a frequency of 60% for allele A and 40% for allele a for a certain characteristic. Count the occurrences of each allele (A and a) in the population and store them in the objects `allele_dominant` (dominant allele) and `allele_recessive` (recessive allele).

5. A double-stranded DNA sequence has 120 base pairs, and its C-G content is 58.33%. Calculate the numbers of nucleotides A, T, C, and G in this sequence and store them in the integer variables `qtdA`, `qtdT`, `qtdC`, and `qtdG`.

In [None]:
#1
num1 = 1450
num2 = 198
soma = num1 + num2
print(soma)

In [None]:
#2
calculo = (1.78/2) + (5.43/3)
print(calculo)

In [None]:
#3
calculo = (((9+27) * 6) / (1 + 2 + 3)) + 5
print(calculo)

41.0


In [None]:
#4
allele_dominant = (1000 * 0.6) * 2
allele_recessive = (1000 * 0.4) * 2
print(allele_dominant)
print(allele_recessive)

1200.0
800.0


In [None]:
#5
qtdC = round((120 * 0.5833))
qtdG = qtdC
qtdA = round((240 - (qtdC + qtdG))/2)
qtdT = qtdA
print(qtdC)
print(qtdG)
print(qtdA)
print(qtdT)

70
70
50
50


# **Logical, Boolean, and Comparison Operations**

## 4.1. Logical Variables

Logical (Boolean) variables store one of two values: `True` or `False` (note the capital first letter).

In [None]:
result = True

In [None]:
approved = False

In [None]:
result

True

In [None]:
approved

False

## 4.2. Logical, Boolean, and Comparison Operators



Logical, `Boolean`, and comparison operations are useful for comparing numbers, results of arithmetic operations, and even _strings_. They are operations that return true (**```True```**) or false (**```False```**).

The table below presents comparison operations, Boolean operations, as well as their operators.


| Type of Operation | Operation        | Operator        |
| ---------- |-----------------|---------------------|
| Comparison / relational | Not Equal       | ```!=```        |
| Comparison / relational | Greater Than       | ```>```         |
| Comparison / relational | Greater Than or Equal To| ```>=```        |
| Comparison / relational | Less Than       | ```<```         |
| Comparison / relational | Less Than or Equal To| ```<=```        |
| Comparison / relational | Equal To        | ```==```        |
| Boolean / Logical | True     | ```True```      |
| Boolean / Logical | False           | ```False```     |
| Boolean / Logical | Negation         | ```not```       |
| Boolean / Logical | And               | ```and```       |
| Boolean / Logical | Or              | ```or```        |

These operators are very useful when working with conditional structures (`if-then-else` - which will be discussed later), used when code needs to decide which path to take.

### **Truth Table**

The results of Boolean (logical) operations can be described in a table called the Truth Table.

|    A    |    B    |    NOT A    |    A AND B    |    A OR B    |
|---------|---------|-------------|---------------|--------------|
| False   | False   |    True     |     False     |     False    |
| False   | True    |    True     |     False     |     True     |
| True    | False   |    False    |     False     |     True     |
| True    | True    |    False    |     True      |     True     |   
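The truth table can also be generated by Python itself. This sketch uses nested `for` loops (explained later in this notebook) to evaluate `not`, `and`, and `or` for every combination of A and B, in the same order as the table:

```python
print("A", "B", "NOT A", "A AND B", "A OR B")
rows = []
for a in [False, True]:
    for b in [False, True]:
        row = (a, b, not a, a and b, a or b)
        rows.append(row)
        print(*row)
```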



In [None]:
1 < 4 < 6

True

In [None]:
5 >= 4

True

In [None]:
2 == 1

False

Using the logical operation `or`:

In [None]:
2 < 2 or 5 > 1

True

Using the logical operation `not`:

In [None]:
not(1 != 1)

True

Using the logical operation `and`:

In [None]:
2 > 1 and 1 <= 2

True

In [None]:
2 < 1 and 1 <= 2

False

Strings (detailed further below) can also be compared:

In [None]:
'actcacactaac' == 'actcacactaac'

True


In [None]:
'actc' == 'actcacactaac'

False
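Some further properties of string comparison, shown with short example sequences: comparison is case-sensitive, and `<` and `>` follow alphabetical (lexicographic) order.

```python
same_case = 'ACTC' == 'actc'            # False: comparison is case-sensitive
normalized = 'ACTC'.lower() == 'actc'   # True: convert before comparing
alphabetical = 'abc' < 'abd'            # True: compared character by character
contains = 'actc' in 'actcacactaac'     # True: substring test with 'in'

print(same_case, normalized, alphabetical, contains)
```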

## **Logical Expressions**

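Logical and relational operators can be combined in more complex expressions. In these expressions, `not` is evaluated first, then `and`, and finally `or`. The cells below simplify the first expression step by step until only `True` remains.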

In [None]:
True or False and not True

True

In [None]:
True or False and False

True

In [None]:
True or False

True

In [None]:
True

True

In [None]:
#Example mixing the two types of operators
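The cell above only contains a placeholder comment. A possible example mixing relational and logical operators, using hypothetical values for a sequence's GC content and length, could look like this:

```python
gc_content = 0.58      # hypothetical value
sequence_length = 120  # hypothetical value

# Relational operators (>, >=, <) are evaluated before the logical ones
is_candidate = gc_content > 0.5 and sequence_length >= 100
print(is_candidate)  # True

is_short_or_at_rich = sequence_length < 50 or not gc_content > 0.5
print(is_short_or_at_rich)  # False
```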

### Exercises


Considering the operations presented in this section:

1. Given two objects that store the lengths of a coding nucleotide sequence (CDS) and of a protein (integer numeric values), check whether the CDS length matches the length of the translated protein, remembering that each amino acid is encoded by a codon of 3 nucleotides. Write the commands needed to perform the check.

```python
protein_length = 30
cds_length = 90
protein_length_2 = 50
cds_length_2 = 160
```

In [None]:
# Exercise
protein_length = 30
cds_length = 90
protein_length_2 = 50
cds_length_2 = 160

print((cds_length/3) == protein_length)
print((cds_length_2/3) == protein_length_2)

True
False




---



# **Flow Control Tools**

A computer program executes a sequence of pre-defined instructions, one after another, to perform a specific task. However, just like in everyday life, some tasks require a slightly more elaborate scenario for the final goal to be achieved, and commonly, we are talking about decision-making. In programming, some structures are key to changing the direction of a program's instructions based on conditions that need to be considered and satisfied.


## **Conditional Structure**



Let's consider a situation where we need to assign names to a group of objects based on their physical characteristics. These classifications are often used in characterizing biological groups in various areas such as landscaping, botany, taxonomy, etc.

In [19]:
import IPython
IPython.display.Image(url='https://raw.githubusercontent.com/brazilpythonws/Updates_BrazilianWorkshopPythonForBiologicalData/main/5thBrazilianWorkshopPythonForBioData_2022/Class_Notebooks/Day1_2022_colab/GIFs/Gif1_EN_HighQuality.gif')

A very common way to make decisions is to assess a situation and determine what to do based on the current state. Just as in life, in programming, there is a structure capable of determining which path to take if a situation is true or false.

<img src="https://drive.google.com/uc?id=19wWEcfkhG_8QgOUjji4jtr_YYNNqyX1I" width="300">

In Python, the ```if-then-else``` conditional structure is written with the keywords `if` and `else` (there is no `then` keyword). It controls the execution flow and determines which actions should be taken based on a condition.

The syntax for ```if-then-else``` is simple.


```
if(<condition>): # If the condition is true, then...
    <code>
else: # if not...
    <code>
```

In [None]:
height = 1.70

In [None]:
if(height > 1.90):
  print('You are a tall person')
else:
  print('You are a short person')

You are a short person



In many cases, a single *true* or *false* check is not enough, and we need to check more conditions. To do this, we can chain conditions using the ```if-elif-else``` structure. The keyword ```elif``` is short for "else, if". We can use this structure to refine the previous example. The same logic can also be written by nesting one ```if-else``` inside another, as in the second cell below.

In [None]:
if(height > 1.90):
  print('You are a tall person')
elif(height > 1.50):
  print('You are an average height person')
else:
  print('You are a short person')

You are an average height person


In [None]:
if(height > 1.90):
  print('You are a tall person')
else:
  if(height > 1.50):
    print('You are an average height person')
  else:
    print('You are a short person')

You are an average height person


If nothing needs to be executed when the condition is false, you can omit the ```else``` block from the structure.
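A minimal sketch of an `if` without `else`, reusing the height example with a different (arbitrary) value:

```python
height = 1.95
message = 'Height not classified'   # default, kept when the condition is False

if height > 1.90:
    message = 'You are a tall person'   # runs only when the condition is True

print(message)   # You are a tall person
```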

## **Loops (Repetitions)**
Often, we need to perform the same action repeatedly to achieve a final goal. For example, collecting the same information a defined number of times.

In [16]:
IPython.display.Image(url='https://raw.githubusercontent.com/brazilpythonws/Updates_BrazilianWorkshopPythonForBiologicalData/main/5thBrazilianWorkshopPythonForBioData_2022/Class_Notebooks/Day1_2022_colab/GIFs/Gif2_EN_HighQuality.gif')

The same happens in programming. However, when an operation must be performed several times, we don't need to duplicate the code. For example, suppose that for some reason your machine cannot perform multiplication. How could we perform a simple calculation like **"```10*5```"**? Simple: add the value ```5``` ```10``` times, or the value ```10``` ```5``` times.

However, it's not pleasant to type ```10+10+10...``` or ```5+5+5...```. Fortunately, in programming, there are structures that allow you to execute a command *n* times.


## **Statement ```for```**

The ```for``` statement is used to iterate over a sequence of elements, such as a string, a list, or a tuple. Unlike in some other languages (such as C), the ```for``` statement in Python does not require an explicit iteration step or stopping condition: it simply goes through each element of the sequence.

To solve our problem, we can use the ```range(x)``` function, which generates the integers from ```0``` up to ```x - 1``` (for example, ```range(5)``` produces 0, 1, 2, 3, 4):


In [None]:
multiplic_through_sum = 0

# 5 * 10
for i in range(5):
  multiplic_through_sum += 10

multiplic_through_sum

50
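The `for` statement can also go through the characters of a string directly, which is handy for DNA sequences. The sketch below counts G and C bases in a short example sequence, and also shows `range(start, stop, step)`:

```python
dna = 'ATGCGCTA'   # example sequence
gc_count = 0

for base in dna:   # base takes each character of the string in turn
    if base == 'G' or base == 'C':
        gc_count += 1

print(gc_count)    # 4

# range(start, stop, step): from 1 up to (but not including) 10, in steps of 2
odd_numbers = list(range(1, 10, 2))
print(odd_numbers)  # [1, 3, 5, 7, 9]
```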

## **Statement ```while```**

Unlike the previous statement, the ```while``` statement is not used for iteration over a sequence. The statement is used to execute repeated commands while a certain condition is true.

In [18]:
IPython.display.Image(url='https://raw.githubusercontent.com/brazilpythonws/Updates_BrazilianWorkshopPythonForBiologicalData/main/5thBrazilianWorkshopPythonForBioData_2022/Class_Notebooks/Day1_2022_colab/GIFs/Gif3_EN_HighQuality.gif')

Following the same idea as our problem:

In [None]:
multiplic_through_sum = 0
i = 1

# 5 * 10
while(i <= 5):
  multiplic_through_sum += 10
  i += 1

multiplic_through_sum

50
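`while` is especially useful when we don't know in advance how many repetitions are needed. In this sketch, with hypothetical numbers, a bacterial population doubles every generation, and we count how many generations it takes to reach at least 10,000 cells:

```python
population = 100   # hypothetical initial number of cells
generations = 0

while population < 10000:   # repeat while the condition is True
    population *= 2         # the population doubles each generation
    generations += 1

print(generations, population)  # 7 12800
```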

---

#   **Sequences**

Sequences are collections of objects ordered by position. The position of each element is determined by its **index**.

Python has various types of sequences that can be used, such as:
* Strings
* Tuples
* Lists

Because these structures are all sequences, they share a set of common operations, which are enough to solve many problems.

# **Operations on Sequences**
To manipulate sequences, there are functions that allow operations on these sequences.

| Operation        | Operator/Function        | Syntax            | Description        |
|-----------------|-----------------|--------------------|------------------
| "In"            | ```in```        | x ```in``` seq     | Checks if `x` is contained in the sequence `seq`. Returns `True` if true and `False` otherwise.|
| "Not in"        | ```not in```    | x ```not in ``` seq | Inverse operation to the "In" operation.|
| Concatenation    | ```+```         | seq1 ```+``` seq2 | Concatenates two sequences _seq1_ and _seq2_. Returns the result of the concatenation. |
| Multiplication   | ```*```         | seq ```*``` x          | Repeats the sequence `seq` `x` times (`x` must be an integer). |
| Find Index| `index()` | ```seq.index(x)```| Returns the index of the first occurrence of `x` in the sequence ```seq```|
| Count Total Elements| len() |```len(s)```    | Returns the size of the sequence `s` |
| Count Specific Elements| `count()` | ```seq.count(x)```| Returns the number of occurrences of `x` in the sequence `seq`|




In [None]:
seq1 = 'atgcgatctagca'
seq2 = 'tcgatcgagtgcata'

In [None]:
concatenated_seq = seq1 + seq2
concatenated_seq

'atgcgatctagcatcgatcgagtgcata'

In [None]:
seq1 = 'atcatcatc'
seq2 = 'atc'

In [None]:
if(seq2 in seq1):
    print("Found!")

Found!


In [None]:
seq3 = 'aac'

In [None]:
if(seq3 in seq1):
    print("Found!")
else:
    print("Not Found!")

Not Found!


In [None]:
people_seq = ['John', 'Maria', 'Lucas']

In [None]:
people_seq.index('Maria')

1

Notice that the index returned for this example may differ from what you expect. In Python, indices start at 0. Therefore, even though `'Maria'` is in the second position of the sequence, its index is `1`.

In [None]:
len(people_seq)

3

In [None]:
people_seq.count('John')

1
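The cells above do not cover repetition with `*` or the `not in` operator from the table. A short sketch with a CAG repeat (an example sequence):

```python
repeat = 'CAG' * 3
print(repeat)                  # CAGCAGCAG
print('TTT' not in repeat)     # True
print(repeat.count('CAG'))     # 3
print(repeat.index('AGC'))     # 1

codons = ['ATG', 'TAA'] * 2    # * also works on lists
print(codons)                  # ['ATG', 'TAA', 'ATG', 'TAA']
```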

## Exercises

Considering the arithmetic operations presented in this section:

1. Write code to calculate the number of peptide bonds for a protein with 20 amino acids.

2. Write code to calculate the expected number of amino acid residues in the translated protein for a coding sequence of size 330.

In [None]:
peptBonds = len('MKNKFKTQEELVNHLKTVGF') - 1
peptBonds

19

In [None]:
# Each codon (3 nucleotides) encodes one amino acid residue
cds_length = 330
aaResidues = cds_length // 3
aaResidues

110

# **Sequence Slices**

A very useful feature of sequences is that the programmer can select a specific sub-sequence from a set of elements. To do this, square brackets `[]` are used with the index position in the sequence. For sub-sequences, `:` is used together with the start and end positions.

The index of the first element in the sequence is 0. **Yes, 0, not 1**: this is **important information**. Therefore, valid indices go from `0` up to `sequence size - 1`.

| Pattern                 | Description
|------------------------|----------------------------------------------------------|
| `seq[i]`     | Returns the value of the element at index `i` |
| `seq[i:j]`   | Returns the sub-sequence from index `i` up to, but not including, index `j`|
| `seq[i:j:k]` | Returns the sub-sequence from index `i` up to, but not including, index `j`, with a step of `k`|

In [None]:
people_seq = ['John', 'Maria', 'Lucas', 'Roberto']

In [None]:
people_seq[0]

'John'

In [None]:
people_seq[:2]

['John', 'Maria']

In [None]:
people_seq[2:]

['Lucas', 'Roberto']

In [None]:
people_seq[1:3]

['Maria', 'Lucas']

In [None]:
people_seq[::2]

['John', 'Lucas']
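Slices work the same way on strings, and negative indices count from the end of the sequence. A sketch with a short example DNA sequence:

```python
dna = 'ATGCGTTAG'   # example sequence

start_codon = dna[0:3]   # 'ATG': indices 0, 1, 2 (index 3 is not included)
last_codon = dna[-3:]    # 'TAG': negative indices count from the end
last_base = dna[-1]      # 'G': the last element
reversed_dna = dna[::-1] # 'GATTGCGTA': a step of -1 reverses the sequence

print(start_codon, last_codon, last_base, reversed_dna)
```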

# **Strings**

Strings are Python's type for storing textual information, from a single character to long texts.

In bioinformatics, strings are important and can be used to represent DNA, RNA, and protein sequences.


**Remember:** To declare a string, you can use single quotes (' ') or double quotes (" "). Because there are two kinds of quotes, you can use one inside the other, for example, to create text that contains quotes to print on the screen.

In addition to the basic sequence methods, strings also have specific functions that can be used. The table below shows the most common ones.

| Method                             | Description
|------------------------------------|----------------------------------------------------------------------------------|
| `s.title()`                    | Returns a copy of the _string_ with the first character of each word in uppercase and the rest in lowercase.|
| `s.lower()`                    | Returns a copy of the _string_ with all characters in lowercase.      |
| `s.upper()`                    | Returns a copy of the _string_ with all characters in uppercase.                |
| `s.replace(old_s, new_s)`      | Returns a copy of the string with all occurrences of `old_s` replaced by `new_s` |
| `s.split(sep)`                 | Returns a list of strings using `sep` as the delimiter; if `sep` is omitted, splits on whitespace    |



In [None]:
butterfly = 'Danaus plexippus'


In [None]:
butterfly.title()

'Danaus Plexippus'

In [None]:
butterfly.lower()

'danaus plexippus'

In [None]:
butterfly.upper()

'DANAUS PLEXIPPUS'

In [None]:
butterfly.replace('plexippus', 'chrysippus')

'Danaus chrysippus'

In [None]:
butterfly.split()

['Danaus', 'plexippus']
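These methods combine well for simple sequence tasks. The sketch below, using a short example sequence, normalizes case, computes GC content with `count()`, transcribes DNA to RNA with `replace()`, and splits a comma-separated list of codons:

```python
seq = 'atgcgcatta'.upper()
print(seq)                          # ATGCGCATTA

gc = (seq.count('G') + seq.count('C')) / len(seq)
print(gc)                           # 0.4

rna = seq.replace('T', 'U')         # transcription: T becomes U
print(rna)                          # AUGCGCAUUA

codons = 'ATG,TAA,TGA'.split(',')   # explicit separator
print(codons)                       # ['ATG', 'TAA', 'TGA']

print("It's a gene")                # double quotes allow a single quote inside
```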



---



# **Lists**

A **list** is an ordered collection of data of any **data type**, with a variable size and elements that can be changed. You can store a **list** of integers, floating-point numbers, strings, or even other lists and sequences!

Because of this flexibility, lists are one of Python's most versatile data structures. Like _strings_, they are **sequences**, so all the sequence operations presented earlier also work on them.

# **Common Methods in Lists**

Since lists are used in many applications, they also have specific methods that help you take full advantage of them. The table below lists the most commonly used methods:

| Method                             | Description
|------------------------------------|--------------------------------------------------------------------------|
| `l.append(x)`   | Appends element `x` to list `l`           |
| `l.insert(i, x)`| Inserts element `x` at index `i` in list `l`|
| `l.pop(i)`      | Removes the element at index `i` from list `l` and returns it. If `i` is omitted, the last element of `l` is removed and returned. |
| `l.remove(x)`   | Removes the first occurrence of element `x` from list `l` (raises a `ValueError` if `x` is not in the list) |
| `l.extend(l2)`  | Extends list `l` with the elements of list `l2`            |
| `l.sort()`      | Sorts list `l` in ascending order (in place)          |

In [None]:
# To create a new list, we put several strings or numbers within square brackets, separated by commas:
# Each individual item in a list is called an element
# The first element of a list is always at index zero;

# Examples of lists:

protein = []
protein = ['ALA', 'LYS', 'GLY', 'GLU', 'ALA']
apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
conserved_sites = [24, 56, 132]

In [None]:
list1 = ['Danaus affinis', 'Danaus chrysippus', 'Danaus eresimus', 23, 6.93, ['Python', 2017, 2018]]
list1

['Danaus affinis',
 'Danaus chrysippus',
 'Danaus eresimus',
 23,
 6.93,
 ['Python', 2017, 2018]]

In [None]:
# To add another element to the end of an existing list, we can use the append() method.
# append() modifies the original list in place

list1.append('Danaus eresimus')
list1

['Danaus affinis',
 'Danaus chrysippus',
 'Danaus eresimus',
 23,
 6.93,
 ['Python', 2017, 2018],
 'Danaus eresimus']

In [None]:
list1.insert(0,'Morpho Menelaus')
list1

['Morpho Menelaus',
 'Danaus affinis',
 'Danaus chrysippus',
 'Danaus eresimus',
 23,
 6.93,
 ['Python', 2017, 2018],
 'Danaus eresimus']

In [None]:
list1.extend(['Danaus affinis', 'Danaus eresimus'])
list1

['Morpho Menelaus',
 'Danaus affinis',
 'Danaus chrysippus',
 'Danaus eresimus',
 23,
 6.93,
 ['Python', 2017, 2018],
 'Danaus eresimus',
 'Danaus affinis',
 'Danaus eresimus']

In [None]:
list1.remove(23)
list1.remove(6.93)
list1.remove(['Python', 2017, 2018])
list1

['Morpho Menelaus',
 'Danaus affinis',
 'Danaus chrysippus',
 'Danaus eresimus',
 'Danaus eresimus',
 'Danaus affinis',
 'Danaus eresimus']

In [None]:
x = list1.pop()
print(x)
list1

Danaus eresimus


['Morpho Menelaus',
 'Danaus affinis',
 'Danaus chrysippus',
 'Danaus eresimus',
 'Danaus eresimus',
 'Danaus affinis']

In [None]:
list1.sort()
list1

['Danaus affinis',
 'Danaus affinis',
 'Danaus chrysippus',
 'Danaus eresimus',
 'Danaus eresimus',
 'Morpho Menelaus']

In [None]:
list1.count('Danaus affinis')

2

In [None]:
list1.count('Danaus')

0
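`list1.count('Danaus')` returns 0 because a list's `count()` looks for elements that are exactly equal to `'Danaus'`, not for text inside elements. To count species of the genus, one option is a loop with the string method `startswith()`. The species list below is a standalone example:

```python
species = ['Danaus affinis', 'Danaus affinis', 'Danaus chrysippus', 'Morpho menelaus']

print(species.count('Danaus'))   # 0: no element is exactly 'Danaus'

danaus_total = 0
for name in species:
    if name.startswith('Danaus'):   # checks the beginning of each string
        danaus_total += 1

print(danaus_total)   # 3
```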

# **Basic Statistics on Lists**

Python also has built-in functions that compute some basic statistics on lists.

In [None]:
numbers_seq = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [None]:
max(numbers_seq)

10

In [None]:
min(numbers_seq)

0

In [None]:
len(numbers_seq)

11

In [None]:
sum(numbers_seq)

55
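There is no built-in `mean()` function, but the mean can be computed from `sum()` and `len()`. The standard `statistics` module also provides ready-made functions such as `mean()` and `median()`. A short sketch:

```python
import statistics

numbers_seq = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

mean_value = sum(numbers_seq) / len(numbers_seq)
print(mean_value)                        # 5.0

print(statistics.mean(numbers_seq))      # 5
print(statistics.median(numbers_seq))    # 5
```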

# **Dictionaries**

Dictionaries are "mapping" type collections, written as `<key: value>` relationships. These collections are organized by **key** and for each **key** there is an associated value. Elements in dictionaries are accessed by their **key**, not by their index, as is done in the case of other sequences.

Dictionaries are also mutable (allow changes), just like lists.

In [None]:
bases = {'A':'Adenine', 'T':'Thymine', 'C':'Cytosine', 'G':'Guanine'}
bases

{'A': 'Adenine', 'T': 'Thymine', 'C': 'Cytosine', 'G': 'Guanine'}

In [None]:
bases['G']

'Guanine'

In [None]:
bases.keys()

dict_keys(['A', 'T', 'C', 'G'])

In [None]:
bases.values()

dict_values(['Adenine', 'Thymine', 'Cytosine', 'Guanine'])
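Because dictionaries are mutable, new keys can be added at any time, and a dictionary works well as a lookup table. The sketch below builds the complementary strand of a short example DNA sequence, and also shows `in`, `get()`, and `items()`:

```python
complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
complement['N'] = 'N'            # add a new key (N = unknown base)

print('A' in complement)         # True: 'in' checks the keys
print(complement.get('X', '?'))  # ?: get() returns a default for missing keys

dna = 'ATGC'
complementary = ''
for base in dna:
    complementary += complement[base]   # look up each base
print(complementary)             # TACG

for key, value in complement.items():   # iterate over key-value pairs
    print(key, '->', value)
```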