# Python Basics

## Why Python?

There are many programming languages, each with different strengths. Programmers usually end up using several languages, depending on what they want to achieve. In bioinformatics, we are often interested in manipulating "text", such as DNA and protein sequences.

Python is therefore a good choice for bioinformatics analysis because:
* It has consistent syntax
* It has built-in libraries that one can use
* It allows for easy manipulation of text


## Data Types

Data types help *classify* or *categorize* data. They describe what kind of *value* an item holds. Named items that store values are called *variables*.

<img src="https://drive.google.com/uc?id=1fiIM9_ydwnK8gYn1mi40QDEiljkZkKy3">

Variables can have any name and any value. For example:

A numerical variable is defined as:

```
numerical_var = 5
```
A "string", or character variable, is defined as:
```
string_var = "Hello World"
```
Values of different data types can be combined in one variable using a list, for example like this:
```
my_list = [1, 1.34, "Hello World!"]
```
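If you are ever unsure which data type a variable holds, Python's built-in function type() will tell you. The short sketch below reuses the three variables defined above; type() is shown here only as an optional check.

```python
numerical_var = 5
string_var = "Hello World"
my_list = [1, 1.34, "Hello World!"]

# type() reports the data type of the value stored in a variable
print(type(numerical_var))  # <class 'int'>   (a whole number)
print(type(string_var))     # <class 'str'>   (a string)
print(type(my_list))        # <class 'list'>  (a list)

# A number with a decimal point has its own data type
print(type(1.34))           # <class 'float'>
```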

##### Detour 1: Print Function

In Python you can use print() to print, or display, a specified message or value on the screen.

We can print a message like this:

```
print("Hi there")
```

We can even print a numerical value:
```
print(193.39)
```

Also, we can print variables created before. We just have to enter the name of the variable, like this:
```
print(string_var)
```




Try printing the values of all variables assigned so far, as well as your own message. We have started this problem for you. 

Note 1: each print statement should be started on a new line or in a new cell.

Note 2: To execute the code, press Shift + Enter, or press the play button in the cell.

In [1]:
# ANSWER
# Assign the variables first: above they were only shown in a text cell, not run in a code cell
numerical_var = 5 
string_var = "Hello World"
my_list = [1, 1.34, "Hello World!"]
# print the variables
print(numerical_var)
print(string_var)
print(my_list)
print('This is my own message - Good job today!')



5
Hello World
[1, 1.34, 'Hello World!']
This is my own message - Good job today!


##### Detour 2: Comments
Something very useful in programming is the "comment".

In Python, everything that follows the character "#" on a line is ignored when the program runs. This is often used for commenting, or annotating, code, or for preventing certain parts of the code from running (for example, parts that are no longer needed).

Here is an example of how comments work:

In [1]:
print('This is line 1')
#print('This is line 2')
print('This is line 3')

This is line 1
This is line 3


As you can see from the cell above, a commented line changes color, which lets us spot comments quickly.

We will be using comments to give you instructions and hints.
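Comments do not have to occupy a whole line. As a small sketch (the variable name gc_bases is made up for illustration), a "#" placed after some code leaves that code running and turns only the rest of the line into a comment:

```python
# A full-line comment: Python ignores this entire line
gc_bases = 3  # An inline comment: the assignment still runs, this note is ignored
# gc_bases = 100   <- this line is commented out, so the value stays 3
print(gc_bases)  # displays 3
```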

Try it out yourself! Comment out and uncomment the lines below.

In [None]:
print('Comment me')
print('Hello World')
#print('Uncomment me')

Comment me
Hello World


### Exercises on Data Types

Create a variable called **my_dna** and assign as its value the string: **ATGCGTA**


In [2]:
# Write your code here
# ANSWER
my_dna = 'ATGCGTA'

Now, let's create a numerical variable called **my_dna_length** and assign as its value the number of nucleotides in my_dna

In [3]:
# Write your code here
# ANSWER
my_dna_length = 7

##### Detour 3: Function len( )

While it is easy to count the length of my_dna by hand, manual counting gets more difficult as the number of characters increases.
There is an easier way to do this in Python! We use a function called:
```
len()
```
We will describe functions like this one later in more detail, but in practice len() will count every character in a string or every item in a list. For example:
```
len("ABC")
```
will count how many characters constitute "ABC".
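As a quick illustration (using made-up example values rather than the exercise variables), len() counts the characters in a string, including spaces, and the items in a list:

```python
print(len("ABC"))             # 3 characters
print(len("A B"))             # 3, because the space counts as a character
print(len([10, 20, 30, 40]))  # 4 items in the list

short_seq = "GATTACA"         # hypothetical example sequence
seq_length = len(short_seq)   # the result can be stored in a variable
print(seq_length)             # 7
```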

Try to get the number of nucleotides for my_dna using len( ).

In [4]:
# Place your code here
# ANSWER
len(my_dna)

7

## Operators

Scientists constantly have to manipulate, modify and interpret their data. To do this with programming, we can use arithmetic symbols and operators.

We can use them for truth value testing, comparisons, data type conversions, and many other things.

<img src="https://drive.google.com/uc?id=1-73goDs7Igl3jAfwmsOyy4vXHdbX78il">

Arithmetic and operator symbols work with different data types, but the same symbol might do different things depending on whether the variable is a string or an integer, for example.
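To see this in action, the sketch below applies the same symbols to numbers and to strings. The values are arbitrary examples, and int() is Python's built-in function for converting a value to a whole number.

```python
# + adds numbers, but joins (concatenates) strings
print(2 + 3)         # 5
print("2" + "3")     # 23

# * multiplies numbers, but repeats a string
print(2 * 3)         # 6
print("AT" * 3)      # ATATAT

# Comparison operators give back a boolean (True or False)
print(2 == 3)        # False
print(2 < 3)         # True

# Data type conversion: int() turns the string "2" into the number 2
print(int("2") + 3)  # 5
```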

### Exercises on Operators

In [5]:
# Here we assigned some variables that you will need
x = 3
y = 7
true_bool = True
false_bool = False
str_1 = "Hello"
str_2 = "World"

Use operators to show if true_bool is equal to false_bool, and do some calculations with x and y.


Hint: Refer to the diagram above to find the operation you need to use.


In [6]:
# Place your code here
# ANSWER
print(true_bool == false_bool)
print(x+y)
print(x-y)
print(x*y)

False
10
-4
21


Now, let's use operators for strings. The operators + and * can also be used with strings. For * you will need to use a number. Try it out, and ask for help if you are stuck using + and * with strings.

In [7]:
# Place your code here
# ANSWER
print(str_1+str_2)
print(str_1*4)

HelloWorld
HelloHelloHelloHello


To conclude the Exercises on Operators:


Print the variable my_dna three times.

In [8]:
# Your code here
# ANSWER
my_dna*3

'ATGCGTAATGCGTAATGCGTA'

Create a variable "my_dna_2" with the value "CATCGGGTA" and print the concatenation of my_dna with my_dna_2

*Hint: concatenation is the string version for adding*

In [9]:
# Your code here
# ANSWER
my_dna_2 = 'CATCGGGTA'
print(my_dna + my_dna_2)

ATGCGTACATCGGGTA


On the screen show the following message: 
```
My dna sequence is ATGCGTA and it has length 7
```
There are multiple ways to do this!

*Hint 1: you can use previously assigned variables*

*Hint 2: when using + to concatenate, you can only join the same data type together*

In [10]:
# Your code here
# ANSWER
# str() changes the number 7 to a string so that it can be joined to the rest of the message
print('My dna sequence is ' + my_dna + ' and it has length ' + str(my_dna_length))


My dna sequence is ATGCGTA and it has length 7
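Since there are multiple ways to build this message, here are two alternatives as a sketch; neither is required for the exercise. The first passes several values to print() separated by commas, and the second uses an f-string (available in Python 3.6 and later). The variables are assigned again here so the example runs on its own.

```python
my_dna = 'ATGCGTA'
my_dna_length = len(my_dna)

# Option 1: print() accepts several values separated by commas.
# It puts a space between them and handles the number for us.
print('My dna sequence is', my_dna, 'and it has length', my_dna_length)

# Option 2: an f-string inserts the values of the variables placed inside { }
message = f'My dna sequence is {my_dna} and it has length {my_dna_length}'
print(message)
```

Both options display the same line as the answer above, without needing str().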


## Functions

Functions are blocks of code that perform a specific task. They typically do one single task, and ideally they do that one task well! We already introduced you to some functions, like print() and len().

There are two types of functions: built-in functions and functions we define ourselves. Built-in functions are the ones that come with Python; we can use them but not modify them (for example print() and len()). The most common built-in functions are listed [here](https://docs.python.org/3/library/functions.html).

A cool thing about programming is that we can create our own functions to serve our needs. To create a function in Python, we use the following format:

In [11]:
# This is called defining a function
def say(): # say is the name of the function that we will later use to "call" the function
    greeting = 'Hello'
    print(greeting)

To "execute" or run a function, we write the name of the function followed by round brackets. When the code runs, the function performs its task. For example:

In [5]:
say()

Hello


Functions can take in *parameters* and do something with them. To do this, we specify the name of the parameter inside the parentheses, like this:

In [18]:
def say_my_name(name):
    greeting = 'Hello my name is ' + name
    print(greeting)

To execute with specific parameters, we input the values of the parameters. We can use the same function with different values for its parameters:

In [19]:
# You can do this:
say_my_name('Karla')
say_my_name('Arjana')

Hello my name is Karla
Hello my name is Arjana


In [20]:
# Or you can do this:
say_my_name(name = 'Karla')
say_my_name(name = 'Arjana')

Hello my name is Karla
Hello my name is Arjana


Now try printing your name!

In [21]:
# Your code here
# ANSWER
say_my_name('Mariel')
say_my_name(name='Sarah')

Hello my name is Mariel
Hello my name is Sarah


Sometimes it is useful to have a function that returns a value. For this, we use "return" at the end of the function. A returned value can then be used outside the function, in contrast to a printed value which can only be viewed.

In [22]:
def summation(var1, var2):
    s = var1 + var2
    return s


summation(1, 4) # If the function is called on the last line of the cell, the returned value is displayed automatically.
# If the function is called within other code, you need print(summation(1, 4)) to see the returned value.

5

You can save the returned value as a new variable, like this:

In [30]:
s = summation(13,48)
print(s)

61
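To make the difference between printing and returning concrete, the sketch below compares two small functions. The function names are made up for this illustration. A function without "return" hands back the special value None, so there is nothing useful to save in a variable.

```python
def add_and_print(a, b):
    # Displays the sum on the screen, but does not return it
    print(a + b)

def add_and_return(a, b):
    # Hands the sum back to the code that called the function
    return a + b

printed = add_and_print(1, 4)    # displays 5; printed is None
returned = add_and_return(1, 4)  # displays nothing; returned is 5

print(printed)       # None
print(returned)      # 5
print(returned * 2)  # 10, because a returned value can be used in further calculations
```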


If the operator inside a function is valid for different data types, that one function can be used with all of them. For example, "summation(var1,var2)" can be used for numbers, strings, and even lists.

In [33]:
print(summation(4,3.5))
print(summation('hello','world'))
list_1 = [1,2,3]
print(summation(list_1,[6,7,8,'hello']))

7.5
helloworld
[1, 2, 3, 6, 7, 8, 'hello']


### Exercises on Functions

Create a function called up_seq_down that does the following:

1. Receives as parameter a DNA sequence
2. Creates variable called upstream with value AAA
3. Creates variable called downstream with value GGG
4. Concatenates upstream, the DNA sequence (input) and downstream
5. Returns the concatenated sequence

Next, run your function with some different DNA sequences as input.

*Hint: Your output should look something like this*
```
AAATCGAGTGCACTCGGG
```


In [40]:
# Create your function here
# ANSWER
def up_seq_down(dna_seq):
    upstream = 'AAA'
    downstream = "GGG"
    
    seq_concat = upstream + dna_seq + downstream
    
    return(seq_concat)

up_seq_down('GCATCGA')


'AAAGCATCGAGGG'

Below is another form of the answer, which prints the concatenated sequence instead of returning it:

In [12]:
# Create your function here
# ANSWER
def up_seq_down(dna_seq):
    upstream = 'AAA'
    downstream = "GGG"
    
    print (upstream + dna_seq + downstream)
    
up_seq_down('GCATCGA')

AAAGCATCGAGGG


## Files

Python is especially good for working with text files. It is easy to open, write, read, save, or edit the content of a text file.

You can learn more about files [here](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files) (optional).


To access the content of an existing file, or to create a file from scratch, we use the function "open( )" and specify what we intend to do with the file. We can, for example, write ("w") a file like this:
```
f = open('my_file.txt','w')
f.write('My first file\n')
f.write('This is line two\n')
f.write('And line three')
f.close()
```
Above, we created a text file called "my_file.txt", assigned it to the variable f, and wrote to it using the function "write( )" (the "\n" starts a new line). Finally, we used f.close( ) to tell the computer we are done using the file, which makes sure everything we wrote is saved in it.
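A common alternative, shown here as an optional sketch, is to open the file inside a "with" block, which closes the file automatically when the indented block ends. The file name and the temporary folder below are assumptions, chosen so the example does not overwrite anything in your notebook folder.

```python
import os
import tempfile

# Hypothetical file location: the system's temporary folder
path = os.path.join(tempfile.gettempdir(), 'with_example.txt')

# The file is closed automatically at the end of the indented block
with open(path, 'w') as f:
    f.write('My first file\n')
    f.write('This is line two\n')

# Reading works the same way, with 'r' instead of 'w'
with open(path, 'r') as f:
    txt = f.read()

print(txt)
print(f.closed)  # True: no f.close() was needed
```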

### Exercises on Files

Let's try to recreate the example above, creating our first file.

In [13]:
# Place your code here
# ANSWER

f = open('my_first_file.txt','w')
f.write('Line 1 \n')
f.close()

Now that we have created a file, we can try to read and print its content! To do so, we will use the function "read( )" as follows:
```
f = open('my_file.txt','r')
txt = f.read()
f.close()
print(txt)
```

Use the template above to read and print the content of the file you created in the previous step.

In [16]:
# Place your code here
# ANSWER

f = open('my_first_file.txt','r')
txt = f.read()
f.close()
print(txt)

Line 1 



We can also use the function "readlines( )" to create a list containing each line of the file.
```
f = open('my_file.txt','r')
txt = f.readlines()
f.close()
print(txt)
```
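The difference between read() and readlines() is easiest to see side by side. This sketch first writes a small example file (the file name and the temporary location are assumptions for illustration) and then reads it back both ways.

```python
import os
import tempfile

# Hypothetical example file, placed in the system's temporary folder
path = os.path.join(tempfile.gettempdir(), 'read_example.txt')

f = open(path, 'w')
f.write('line one\n')
f.write('line two\n')
f.close()

f = open(path, 'r')
as_string = f.read()     # the whole file as one string
f.close()

f = open(path, 'r')
as_list = f.readlines()  # a list with one string per line
f.close()

print(as_string)
print(as_list)           # ['line one\n', 'line two\n']
print(len(as_list))      # 2, the number of lines in the file
```

Note that each item returned by readlines() still ends with the "\n" character.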

There is a file called "sequences.fasta" available to you in the same folder as this notebook.
Open and read the file, then print its content using read() or readlines().

In [17]:
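# Place your code here
# ANSWER
# Note: this answer reads my_first_file.txt; to read the FASTA file, replace the file name with 'sequences.fasta'
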
f = open('my_first_file.txt','r')
txt = f.readlines()
f.close()
print(txt)

['Line 1 \n']


## Loops

Loops are a way to iteratively go over a list or string (or any other data type that is iterable) and do the same thing to each item. (For more info see [here](https://www.learnpython.org/en/Loops)).

In [18]:
# Run this cell to see how loops work
prime_nums = [2, 3, 5, 7]
for prime in prime_nums:
    print(prime)

2
3
5
7


In [19]:
# Another example of a for loop, for even numbers
even_nums = [2, 4, 6, 8]
for even in even_nums:
    print(even)

2
4
6
8
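Loops work on strings as well as lists: each pass through the loop handles one character. As a sketch relevant to sequences, the example below walks through the DNA string from the earlier exercise one nucleotide at a time. The counter variable is an illustrative addition.

```python
my_dna = 'ATGCGTA'
count = 0  # keeps track of how many nucleotides we have seen

for nucleotide in my_dna:
    print(nucleotide)  # displays one letter per line
    count = count + 1  # add one for each pass through the loop

print(count)           # 7, the same value as len(my_dna)
```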


Next, go through a list of the numbers from 1 to 10 and display their square.

In [20]:
# Place your code here.
# ANSWER
numb = [1,2,3,4,5,6,7,8,9,10]

for square in numb:
    print(square**2)

1
4
9
16
25
36
49
64
81
100


Another valid answer for this exercise uses the function "range()". It works well for longer sequences of numbers, for example squaring the numbers from 1 to 100. Note that range(1,101) starts at 1 and stops just before 101.

See below
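Before running the full answer, it can help to look at what range() generates on a smaller scale. In this sketch, list() is used only to display the numbers, and append() adds an item to the end of a list; both are built into Python.

```python
# range(1, 6) produces 1, 2, 3, 4, 5 (the stop value 6 is not included)
small_range = list(range(1, 6))
print(small_range)  # [1, 2, 3, 4, 5]

# Squaring each number in the range and collecting the results in a list
squares = []
for n in range(1, 6):
    squares.append(n**2)

print(squares)      # [1, 4, 9, 16, 25]
```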

In [21]:
# Place your code here.
# ANSWER

for n in range(1,101):
    print(n**2)

1
4
9
16
25
36
49
64
81
100
121
144
169
196
225
256
289
324
361
400
441
484
529
576
625
676
729
784
841
900
961
1024
1089
1156
1225
1296
1369
1444
1521
1600
1681
1764
1849
1936
2025
2116
2209
2304
2401
2500
2601
2704
2809
2916
3025
3136
3249
3364
3481
3600
3721
3844
3969
4096
4225
4356
4489
4624
4761
4900
5041
5184
5329
5476
5625
5776
5929
6084
6241
6400
6561
6724
6889
7056
7225
7396
7569
7744
7921
8100
8281
8464
8649
8836
9025
9216
9409
9604
9801
10000


# References/Resources
[Python Documentation](https://www.python.org/doc/)

[Operators](https://docs.python.org/3/)

Topics and explanations in this notebook were inspired by [yourgenome.org](https://www.yourgenome.org).

Some exercises in this notebook were inspired by [Python for biologists](https://pythonforbiologists.com/upcoming-workshops/introduction-to-python-for-biologists-online-course-13th-24th-july-2020)





# About this Notebook
This notebook was created by Karla Godinez-Macias and Arjana Begzati for STARTneuro at UC San Diego.