# Python Basics

## Why Python?

Many programming languages exist, each with different strengths, and programmers often use several of them depending on what they want to achieve. In bioinformatics, we are often interested in manipulating "text", such as DNA and protein sequences.

Python is therefore a good choice for bioinformatics analysis because:
* It has consistent syntax
* It has built-in libraries that one can use
* It allows for easy manipulation of text


## Data Types

Data types *classify* or *categorize* data: they describe what kind of *value* an item holds. The named items that store these values are called *variables*.

<img src="https://drive.google.com/uc?id=1fiIM9_ydwnK8gYn1mi40QDEiljkZkKy3">

A variable can be given almost any name (made of letters, digits, and underscores, and not starting with a digit) and can hold any value. For example:

A numerical variable is defined as:

```
numerical_var = 5
```
A "string", or character variable, is defined as:
```
string_var = "Hello World"
```
A combination of different data types assigned to one variable can be achieved using a list, for example like this:
```
my_list = [1, 1.34, "Hello World!"]
```
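If you are unsure which data type Python has given a value, the built-in `type()` function can tell you. The short sketch below is only an illustration: it reuses the variables defined above, and `float_var` is a hypothetical extra variable added for this example. It uses print(), which is explained in the detour directly below.

```python
# Variables from the examples above
numerical_var = 5
string_var = "Hello World"
my_list = [1, 1.34, "Hello World!"]

# float_var is a hypothetical variable, added only for this illustration
float_var = 1.34

# type() reports the data type of a value
print(type(numerical_var))  # <class 'int'>
print(type(float_var))      # <class 'float'>
print(type(string_var))     # <class 'str'>
print(type(my_list))        # <class 'list'>
```

Note that whole numbers (int) and decimal numbers (float) are two different numerical data types.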

##### Detour 1: Print Function

In Python you can use print() to print, or display, a specified message or value on the screen.

We can print a message like this:

```
print("Hi there")
```

We can even print a numerical value:
```
print(193.39)
```

Also, we can print variables created before. We just have to enter the name of the variable, like this:
```
print(string_var)
```
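print() can also take several values at once, separated by commas. A minimal sketch, reusing the variables from the Data Types section (they are assigned again here so that the example runs on its own):

```python
# Variables from the Data Types section, assigned again for this example
numerical_var = 5
string_var = "Hello World"

# Values separated by commas are printed on one line with a space in between
print(string_var, numerical_var)           # Hello World 5
print("The number is", numerical_var)      # The number is 5
```

This works even when the values have different data types, because print() converts each one to text before displaying it.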




Try printing the values of all variables assigned so far, as well as your own message. We have started this problem for you. 

Note 1: each print statement should be started on a new line or in a new cell.

Note 2: To execute the code, press Shift + Enter, or press the play button in the cell.

In [1]:
# ANSWER
# assign the variables from above since we only showed them above in a text cell and didn't assign them in a coding cell 
numerical_var = 5 
string_var = "Hello World"
my_list = [1, 1.34, "Hello World!"]
# print the variables
print(numerical_var)
print(string_var)
print(my_list)
print('This is my own message - Good job today!')

5
Hello World
[1, 1.34, 'Hello World!']
This is my own message - Good job today!


##### Detour 2: Comments
"Comments" are very useful in programming.

In Python, everything that follows the character "#" on a line is ignored when the program runs. This is often used for commenting, or annotating, code, or for preventing certain parts of the code from running (for example, parts that are no longer needed).

Here is an example of how comments work:

In [None]:
print('This is line 1')
#print('This is line 2')
print('This is line 3')

This is line 1
This is line 3


As you can see from the cell above, when commenting a line of code, the line changes color, allowing us to quickly spot comments. 
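A comment does not have to take up a whole line. The sketch below shows a comment placed after code on the same line; the variable name `gc_count` and its value are made up for this illustration.

```python
# A comment on its own line usually describes the code that follows
gc_count = 3  # a comment can also follow code on the same line

# print('This line is commented out, so it does not run')
print(gc_count)  # 3
```

Only the text to the right of "#" is ignored, so `gc_count = 3` still runs.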

We will be using comments to give you instructions and hints.

Try it out yourself! Comment out one of the lines below, and uncomment the line that is already commented.

In [None]:
print('Comment me')
print('Hello World')
#print('Uncomment me')

Comment me
Hello World


### Exercises on Data Types

Create a variable called **my_dna** and assign as its value the string: **ATGCGTA**


In [7]:
# Write your code here
# ANSWER
my_dna = 'ATGCGTA'

Now, let's create a numerical variable called **my_dna_length** and assign as its value the number of nucleotides in my_dna.

In [None]:
# Write your code here
# ANSWER
my_dna_length = 7

##### Detour 3: Function len( )

While it is easy to count the length of my_dna by hand, manual counting gets harder as the number of characters increases.
There is an easier way to do this in Python! We use a function called:
```
len()
```
We will describe functions like this one later in more detail, but in practice len() will count every character in a string or every item in a list. For example:
```
len("ABC")
```
will count how many characters constitute "ABC".
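Here is a slightly longer sketch of len() on a string and on a list. The variable `example_seq` and its value are hypothetical and used only for this illustration.

```python
# example_seq is a made-up sequence for this illustration
example_seq = "GATTACA"
print(len(example_seq))      # 7

# Spaces count as characters too
print(len("Hello World"))    # 11

# For a list, len() counts the items, not the characters inside them
my_list = [1, 1.34, "Hello World!"]
print(len(my_list))          # 3
```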

Try to get the number of nucleotides for my_dna using len( ).

In [None]:
# Place your code here
# ANSWER
len(my_dna)

7

## Operators

Scientists constantly have to manipulate, modify and interpret their data. To do this with programming, we can use arithmetic symbols and operators.

We can use them for truth value testing, comparisons, data type conversions, and many other things.
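As a rough sketch of these uses, the example below shows a few comparisons, a truth value test, and two data type conversions. The variable names (`read_count`, `threshold`, and so on) and their values are made up for this illustration.

```python
# Hypothetical values, used only for this illustration
read_count = 12
threshold = 10

# Comparisons give back a boolean: True or False
print(read_count > threshold)    # True
print(read_count == threshold)   # False
print(read_count != threshold)   # True

# Truth value testing: an empty string counts as False, a non-empty one as True
print(bool(""))      # False
print(bool("ATG"))   # True

# Data type conversion: str() makes a string, int() makes a whole number
count_as_text = str(read_count)
text_as_number = int("42")
print(count_as_text + " reads")  # 12 reads
print(text_as_number + 1)        # 43
```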

<img src="https://drive.google.com/uc?id=1-73goDs7Igl3jAfwmsOyy4vXHdbX78il">

Arithmetic and operator symbols work with different data types, but the same symbol can do different things depending on the data type, for example whether the variable is a string or an integer.
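To illustrate how the same symbol can behave differently, here is a small sketch that applies + and * first to numbers and then to strings. The names `codon_length` and `codon_count` and the sequences are assumptions made for this example.

```python
# Hypothetical values, used only for this illustration
codon_length = 3
codon_count = 4

# With numbers, + adds and * multiplies
print(codon_length + codon_count)   # 7
print(codon_length * codon_count)   # 12

# With strings, + joins them and * repeats them
print("ATG" + "TAA")                # ATGTAA
print("ATG" * codon_count)          # ATGATGATGATG
```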

### Exercises on Operators

In [3]:
# Here we assigned some variables that you will need
x = 3
y = 7
true_bool = True
false_bool = False
str_1 = "Hello"
str_2 = "World"

Use operators to show if true_bool is equal to false_bool, and do some calculations with x and y.


Hint: Refer to the diagram above to find the operation you need to use.


In [4]:
# Place your code here
# ANSWER
print(true_bool == false_bool)
print(x+y)
print(x-y)
print(x*y)

False
10
-4
21


Now, let's use operators for strings. The operators + and * can also be used with strings. For * you will need to use a number. Try using + and * with strings, and ask for help if you get stuck.

In [5]:
# Place your code here
# ANSWER
print(str_1+str_2)
print(str_1*4)

HelloWorld
HelloHelloHelloHello


To conclude the Exercises on Operators:


Print the variable my_dna three times.

In [8]:
# Your code here
# ANSWER
print(my_dna*3)

ATGCGTAATGCGTAATGCGTA

Create a variable "my_dna_2" with the value "CATCGGGTA" and print the concatenation of my_dna with my_dna_2

*Hint: concatenation is the string version of adding*

In [9]:
# Your code here
# ANSWER
my_dna_2 = 'CATCGGGTA'
print(my_dna + my_dna_2)

ATGCGTACATCGGGTA


On the screen show the following message: 
```
My dna sequence is ATGCGTA and it has length 7
```
There are multiple ways to do this!

*Hint 1: you can use previously assigned variables*

*Hint 2: when using + to concatenate, you can only join the same data type together*
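To see why Hint 2 matters, the sketch below tries to join a string and a number with +. Python raises a `TypeError`, which is caught here with try/except (a construct not covered in this notebook, used only so that the example keeps running). The message text is made up for this illustration.

```python
# Joining a string and a number with + does not work
try:
    message = "length " + 7
except TypeError as error:
    print("Python raised a TypeError:", error)

# Converting the number to a string first fixes the problem
message = "length " + str(7)
print(message)  # length 7
```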

In [None]:
# Your code here
# ANSWER
print('My dna sequence is ' + my_dna + ' and it has length ' + str(my_dna_length)) # str() converts the number 7 to a string so that it can be joined to the rest of the message

My dna sequence is ATGCGTA and it has length 7
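Since there are multiple ways to produce this message, here are two possible alternatives, offered as a sketch rather than the expected answer: passing several values to print() separated by commas, and using an f-string (available in Python 3.6 and later). The variable `message` is introduced only for this example.

```python
# Variables from the exercises above, assigned again so the example runs on its own
my_dna = "ATGCGTA"
my_dna_length = len(my_dna)

# Option 1: commas; print() converts each value to text and joins them with spaces
print("My dna sequence is", my_dna, "and it has length", my_dna_length)

# Option 2: an f-string; values inside {} are inserted into the text
message = f"My dna sequence is {my_dna} and it has length {my_dna_length}"
print(message)
```

Neither option needs str(), because no + is used to join a string and a number.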


## References/Resources
[Python Documentation](https://www.python.org/doc/)

[Operators](https://docs.python.org/3/)

Topics and explanations in this notebook were inspired by [yourgenome.org](https://www.yourgenome.org).

Some exercises in this notebook were inspired by [Python for biologists](https://pythonforbiologists.com/upcoming-workshops/introduction-to-python-for-biologists-online-course-13th-24th-july-2020).





## About this Notebook
This notebook was created by Karla Godinez-Macias and Arjana Begzati for STARTneuro at UC San Diego.