# Resources for learning Python:
1. Introducing [PYTHON TUTOR](https://pythontutor.com/)
    - particularly helpful for investigating how variables are stored in memory
<br></br>2. We will work through [Rosalind problems](https://rosalind.info/problems/tree-view/) and others in class
    - The benefit of using Rosalind problems: if you are stuck there are many resources, and they give a preamble for each question.
    - Plus the structure of the questions is that they build on each other (as shown in the tree diagram above) and are relevant to bioinformatics:

        1. Introductory Problem - not genomics, but useful
        2. String Problem - not genomics
        3. Rosalind DNA problem
        4. Rosalind RNA problem

<br>

I often give extra material in these notebooks that will not be covered during class time, but will be similar to other problems that are covered. I give out the extra material because I know that everyone's comfort level with certain concepts is different. Depending on how much extra material I have created, it will be placed in separate notebooks or cells (the end of this notebook) that are labeled "extra material" and you do not have to work through them unless you want to do them!

# Summary of this notebook: 
* Python has objects, including variables. EVERYTHING IS AN OBJECT IN PYTHON.
* Currently, we are treating Python as a procedural language - it is evaluated top to bottom.
* Python has built-in data types. These include:

   * __Numeric__: integers, float, complex
   * __Text:__ str
   * __Sequence:__ list, tuple, range
   * __Mapping:__ dict
   * __NoneType__
   * __Boolean__ <br></br>

* Each of these data types has associated **methods** that can be used to manipulate that particular type of data (also: **attributes**, but we'll worry about those later)   
* Strings as a text data type
  * __why you should be outraged that strings have methods__, but you will also accept it.<br></br>

* Poke into lists a bit just to demonstrate a comparison

# Quick facts about Python
* Python is an interpreted language. For now we run it one statement at a time, top to bottom (procedural style); it also supports object-oriented (OOP) and functional programming.
* Uses __WHITESPACE__ as part of syntax. Indentation must be consistent within a block; the convention is 4 spaces per level. 
* Variables
* Everything in Python is an object with dedicated methods (and attributes)
* Example Data types: Integers, floats, STRINGS, lists
* Commenting your code
* Slicing
* There are conventions for how to 'properly' write out Python Code. At the very beginning of your Python journey, I don't want you to get overwhelmed by small details BUT it is always good practice to start your programming by following conventions. Therefore, here are the [Python conventions](https://www.python.org/dev/peps/pep-0008/).
* [Two states of every programmer](https://www.reddit.com/r/ProgrammerHumor/comments/9bbect/what_are_your_current_state_and_dont_lie_loool/)


# Why Python? 
---
### "Hello, JAX!" in three common programming languages: 
#### in C: 
```
#include <stdio.h>
int main(int argc, char ** argv)
{
    printf("Hello, JAX!\n");
}
```
#### in Java:
```
public class Hello
{
    public static void main(String argv[])
    {
        System.out.println("Hello, JAX!");
    }
}
```
#### in Python:
```
print("Hello, JAX!")
```

# Major Programming Components
---
There are **five** major components that we will need when we are programming:
  
1. Data manipulation - variable assignment, data types, functions, methods, math etc.
2. User interaction (Input/Output) 
3. Repeats (loops: repeat commands while a condition is true) 
4. Decisions (conditionals: choose what to run based on a condition)
5. libraries/modules/encapsulation 

How are we going to use Python to accomplish the five goals above? Let's look at Python **Syntax** for some clues. 

# Aside (if time): What is Pseudocode? 
-----------------------
## A great way to break down problems

Pseudocode can be highly sophisticated (and is used as a design tool when building programs that have multiple parts and multiple contributors, sometimes across multiple languages), or it can be used as a way to focus on the structure of the answer to the problem. 

Check out this [Pseudocode 101 resource](https://www.povertyactionlab.org/sites/default/files/research-resources/rr_datacleaning_Pseudocode.pdf) for more detailed information. 

We haven't learned how to code yet, but your pseudocode should still serve its purpose (working out the structure of a solution) and draw on the five components above. It will likely involve the following: 

* Break down the problem into smaller chunks
* Explain how you would solve the individual smaller chunks
* If possible, include as many coding details as you know 
    *  If relevant: explain *why* you think your code is broken
 
Here is an example (taken from the document linked above) that is similar to a question we will see later on: 

__Computing a Quiz Average: Pseudo-code a routine to calculate your quiz average.__

Get number of quizzes as a parameter
1. Initialize "running_sum_total" and "count" variables to 0
2. while count < number of quizzes
    * 2.1 get quiz grade
    * 2.2 add quiz grade to "running_sum_total"
    * 2.3 increment count
3. compute average of sum over number of quizzes
4. return average
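
To connect the pseudocode to real Python, here is a minimal sketch of the same routine. The function name `quiz_average` and the example grades are my own made-up choices, and I am assuming the grades arrive as a list with at least one entry (the pseudocode only says "get quiz grade"). We have not covered functions or loops yet, so treat this as a preview rather than something you need to understand fully right now.

```python
def quiz_average(quiz_grades):
    """Return the average of a list of quiz grades (hypothetical helper)."""
    number_of_quizzes = len(quiz_grades)    # "get number of quizzes"
    running_sum_total = 0                   # step 1: initialize both variables to 0
    count = 0
    while count < number_of_quizzes:        # step 2: repeat while quizzes remain
        quiz_grade = quiz_grades[count]     # 2.1 get quiz grade
        running_sum_total += quiz_grade     # 2.2 add quiz grade to the running sum
        count += 1                          # 2.3 increment count
    average = running_sum_total / number_of_quizzes   # step 3: compute the average
    return average                          # step 4: return average


example_grades = [80, 90, 100]  # made-up grades for illustration
print(quiz_average(example_grades))  # 90.0
```

Notice that each numbered step of the pseudocode became one or two lines of code.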

# NOW WE ACTUALLY CODE!
 ____________________________________
# Variables and assignment

* What is a variable? 
  * **name that indicates where in memory we 'parked' a value**
  * any data type: integers, floats, strings, lists, dictionaries <br></br>
      
* Variable names have restrictions in Python
  * They can contain letters, underscores, numbers
  * They are case sensitive
  * They cannot contain symbols such as % or ^
  * They can begin with an underscore (_ or __), but you shouldn't do this
      * Beginning a variable name with an underscore has a special meaning in Python (we'll see this much later)
  * They cannot begin with a number <br></br>

* How do we assign in Python? 
  * assignment operator is = <br></br>

* ***With assignment, you are creating a reference to an object***
  - Everything in Python is an **object** and you don't need to define or ''declare'' variables in advance - this is different from some other languages and can have crucial implications for data manipulation.
  - **Variables serve as pointers to those objects**. Whenever you assign a value to a variable, you are pointing to something in memory. <br></br>
        
Let's test this with the following code. As we work through code examples, I strongly encourage you to predict what you think will happen and then see if your prediction was correct. Crucially, if it wasn't correct, why wasn't it? USE [PYTHON TUTOR](https://pythontutor.com/) TO HELP YOU UNDERSTAND WHAT IS HAPPENING. 

__We'll use the built-in print() function__ to [help us debug](https://www.reddit.com/r/ProgrammerHumor/comments/9b6e3v/give_this_movie_a_name/).

In [None]:
# EXTRA MATERIAL/PRACTICE

# -------------------------------------------
# Hashed out lines are not interpreted by Python - they are ignored. So you can use them to write instructions or relevant information 
# to your future self or anyone else who uses your script. 
# --------------------------------------------

# Examples of variables and how/where they are stored. Copy and paste this cell into "python tutor" and we can see line
# by line what is happening with memory
p=100
print("Here is the initial value of p: ", p)
n=p
print("Here is the value of n: ", n)
# print(type(p))

# We can see where the value is stored using id()
print('---location of the two variables-----')
print(id(p))
print(id(n))
print('-------------------------------------')

#TAKE HOME: change the value associated with p and we're changing where p 'points' in memory
p=50
print("Here is the new value of p: ", p, "and it is located here:", id(p))
print("We have changed the value of p, so now what is the value of n: ", n, "and it is located here: ", id(n))
# the id() values show that the two variables no longer point to the same memory slot
# ----------

# Common programming trick: If we want to preserve a value while still allowing two variables to point at the same memory slot, 
# introduce a third, intermediate variable to temporarily hold the value.
print('-------------------------------------')
q=n
n=p
print("Now the value of n is:", n)
# confirm memory slots
print('---location of the p, n, and q variables -----')
print(id(p))
print(id(n))
print(id(q))

## Variables:
* Variables -- where we store a value. A good analogy is a label for your value.
* The values that variables point to have types: integer, floating point, string
     - Python is considered a 'strongly typed' language, so if you type the following into a cell, it won't convert the string (some languages like JavaScript or Perl would implicitly convert one of the operands); it will give you an error instead:

In [None]:
m='5'
n=5
print(m+n)

# Can we fix the above error? How? 
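
One possible fix, sketched below, is to convert one of the two values explicitly so that both sides of `+` have the same type. Which conversion you want depends on whether you mean arithmetic or text joining. The names `numeric_sum` and `joined_text` are just illustrative.

```python
m = '5'   # a string
n = 5     # an integer

# Option 1: convert the string to an integer, then add numerically
numeric_sum = int(m) + n
print(numeric_sum)    # 10

# Option 2: convert the integer to a string, then concatenate the text
joined_text = m + str(n)
print(joined_text)    # 55
```

Python makes you choose; it will not guess which of the two you meant.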

### [Rosalind Problem](https://rosalind.info/problems/ini2/)
**Problem 2B1**

__Given:__ Two positive integers a and b, each less than 1000.

__Return:__ The integer corresponding to the square of the hypotenuse of the right triangle whose legs have lengths a and b.
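
If you get stuck, here is a minimal sketch of one way to approach it, using the Pythagorean theorem (hypotenuse squared = a squared + b squared). The values of `a` and `b` are example inputs I picked, and `hypotenuse_squared` is just an illustrative name. Try it yourself first!

```python
a = 3   # example leg length
b = 5   # example leg length

# ** is the exponent operator in Python
hypotenuse_squared = a**2 + b**2
print(hypotenuse_squared)   # 34
```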

In [None]:
# EXTRA MATERIAL for practice similar to the Rosalind problem above
# -------------------

# You can also do simple math in python: 
WaterOxygens=1
WaterHydrogens = 2
OxygenMass = 15.9994 # standard atomic weight
HydrogenMass = 1.00794

#calculating the mass of a water molecule
#H2O= 2*H*MassOfHydrogen +1*O*MassOfOxygen
WaterMass = WaterOxygens*OxygenMass+WaterHydrogens*HydrogenMass
print("The Molecular weight of a molecule of Water is: ",WaterMass)

# ----------

# there is also division and moduli with and without floating numbers
a= 4.0
b = 3.0
print("a/b = ", round(a/b,1))

# what does % mean in the context of division? 
print("a%b = ", a%b)
# it gives the remainder of the division. This is useful for when we want even or odd numbers etc. Keep an eye out for this in loops!

print(" ~~~~~~~")

# ----------

# We can convert floating points to integers and back. 
print(" 4/3 = ", int(a)/int(b))
print(" 4%3 = ", int(a)%int(b))
# have we changed the value of 'a' and 'b'? 

# how about now? 
a = 4/2
print("a/(b+b) = ", a/(b+b))
# in Python 3, / always returns a float (4/2 is 2.0), and mixing integers and floats in an expression also gives a float. 

#??round

# Strings!
## Facts about strings (data type)

* We care a lot about strings in Biology because we care a lot about _**sequence data**_ like this:

    **5’-ATGCAGTACCTTA-3’**

* Strings are **immutable**. What do you think that means?
    * Despite the fact that strings are immutable, they have methods! Grrrr. This should not be true (why not?), but it is.
    * Other **immutable** data types such as integers and floats also have methods. 

### Methods
* variable_name.method
    * variable_name. followed by the Tab key (don't type the word tab, just press the Tab key after the dot and a list of methods associated with the data type of that variable will appear) 

### Indexing
* Like most computer programming languages (with a few exceptions, like R and Matlab), we begin counting at 0 (not 1).
    * This is true for strings as well. So if we wanted to know what letter was at position X, we could find that out (see example in next cell).
    * Slicing relies on counting elements. Lower bound included, upper bound excluded. 

In [1]:
# DON'T RUN THIS CODE YET - FIRST PREDICT WHAT YOU THINK IT WILL DO

# Assignment operator works like this (oh, and look! I am commenting my code): 
my_DNA='ATGCTGA'

#print(id(my_DNA))
# the string ATGCTGA is now stored in memory as the variable my_DNA and can be accessed by typing my_DNA like so:


print(my_DNA)
print("*********")
# note that what we did above was use a built-in function called **print()** and provided it with an **argument** called my_DNA

# we can ask about individual elements in the string. We'll discuss slicing next. 

# what do we expect to be printed out by the following? 

print(my_DNA[5])
print("___________")
print(my_DNA[3])

# What does immutable mean? It is a bit tricky. 
# The following is allowed:
# We can change the value of the variable my_DNA that we assigned above like so:
my_DNA ='ATG'
#print(id(my_DNA)) # is this located at the same place? 

# The following will cause an exception to be raised.
# Why? 
# Because you can't modify the individual elements of the string itself once they have been set but you CAN modify the variable
# pointing to the string. In simple terms: immutable objects can't be changed once they are set. However **the variable pointing to them
# can be changed so that it points to a different value(s)**

#my_DNA[2]="T"

print(my_DNA)

ATGCTGA
*********
G
___________
C
ATG


### [Rosalind Problem](https://rosalind.info/problems/dna/)
**Problem 2B2**

A string is simply an ordered collection of symbols selected from some alphabet and formed into a word; the length of a string is the number of symbols that it contains. An example of a length 21 DNA string (whose alphabet contains the symbols 'A', 'C', 'G', and 'T') is **ATGCTTCAGAAAGGTCTTACG**.

__Given:__ A DNA string `s` of length at most 1000 nt.

__Return:__ Four integers (separated by spaces) counting the respective number of times that the symbols 'A', 'C', 'G', and 'T' occur in `s`.
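
Here is a minimal sketch of one possible approach, using the `.count()` string method on the example string from the problem description. The variable name `counts` is my own choice, and this is only one of several reasonable ways to solve it.

```python
s = "ATGCTTCAGAAAGGTCTTACG"   # the example DNA string from the text above

# count each symbol in the order A, C, G, T
counts = [s.count("A"), s.count("C"), s.count("G"), s.count("T")]

# the * unpacks the list so print() separates the numbers with spaces
print(*counts)   # 6 4 5 6
```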

Think about how you would convert one variable into another? This is a common problem and there is a common solution...

### [Rosalind Problem](https://rosalind.info/problems/rna/)
**Problem 2B3**

An RNA string is a string formed from the alphabet containing 'A', 'C', 'G', and 'U'.
Given a DNA string `t` corresponding to a coding strand, its transcribed RNA string `u` is formed by replacing all occurrences of 'T' in t with 'U' in `u`.

__Given:__ A DNA string `t` having length at most 1000 nt.

__Return:__ The transcribed RNA string of `t`.
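
A minimal sketch of one way to do this uses the `.replace()` string method. The short DNA string here is just an example I made up. Because strings are immutable, `.replace()` does not change `t`; it hands back a new string that we store in `u`.

```python
t = "ATGCTGA"              # example DNA string (coding strand)

u = t.replace("T", "U")    # build a NEW string with every T swapped for U
print(u)                   # AUGCUGA
print(t)                   # ATGCTGA  (the original is unchanged)
```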

## Slicing Strings
* To extract a part of a string, we use []
    * _remember that we start counting at 0 (not 1)_
* The arguments in the slice give you control over how the variable is sliced.
    * [lower:upper:increment], [l:u:i] 
    * [lower bound, inclusive:upper bound, exclusive]
* Important methods when slicing strings include:
    * .count()
    * .find() <-- takes single string as argument and then returns the location, the index, where it first occurs. 
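
To make the `[lower:upper:increment]` pattern concrete, here is a short sketch using the sequence from the top of the Strings section. The variable names are illustrative only. Predict each result before running it!

```python
dna = "ATGCAGTACCTTA"    # 13 characters, so the indices run from 0 to 12

first_codon = dna[0:3]   # lower bound included, upper bound excluded
print(first_codon)       # ATG

the_rest = dna[3:]       # leaving out the upper bound means "to the end"
print(the_rest)          # CAGTACCTTA

every_second = dna[::2]  # increment of 2: indices 0, 2, 4, ...
print(every_second)      # AGATCTA

backwards = dna[::-1]    # a negative increment walks the string in reverse
print(backwards)         # ATTCCATGACGTA

print(dna.count("A"))    # 4   (how many times "A" occurs)
print(dna.find("C"))     # 3   (index of the first "C")
print(dna.find("N"))     # -1  (.find() returns -1 when nothing is found)
```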

## Concatenation 
What happens for the following: 
1. 5+5
2. 5.0+5
3. int(5.0)+5
4. "5"+"5"

This is operator overloading, a fancy term for the fact that Python treats different data types differently when they are passed as arguments to the same operator (in this case +).

In [None]:
print("5.0+5 =",5.0+5)
print("five+five =","5"+str(5))
print('5+5=',int("5")+5)
# the chant: Everything in Python is an object!
print(my_DNA*5)
print(my_DNA+my_DNA)

## String Methods
There are **methods** and **attributes** that are associated with strings. 

You can only use methods (functions) with the data type that they are associated with. String methods manipulate the string data type. You can check Python's documentation for the [full list of string methods](https://docs.python.org/3.6/library/stdtypes.html#string-methods).

Or, even more straightforward, you can use the tab after the '.' to discover the methods associated with a type of data.

Particularly useful methods include: **find, lower, upper, replace, rstrip, strip, count**

Tangentially, we can **turn other types of data into strings** by using the str() function. This is a feature that we will use really really often and usually in the circumstance of wanting to print a concatenated version of output.

The fact that **STRINGS ARE IMMUTABLE BUT YET THEY HAVE METHODS** will make you feel uncomfortable. Sorry. 
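
Here is a small sketch that may ease the discomfort. String methods never change the original string; they build and return a brand new one. The variable names are made up for illustration.

```python
lower_seq = "atgcag"

upper_seq = lower_seq.upper()   # .upper() returns a NEW string
print(upper_seq)                # ATGCAG
print(lower_seq)                # atgcag  (the original string is untouched)

# the two names point at two different objects in memory
print(lower_seq is upper_seq)   # False

# integers are immutable too, and they also have methods
print((5).bit_length())         # 3  (5 is 101 in binary)
```

If you want to keep the changed version, you have to assign the result to a variable, just like we did with `upper_seq`.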

# EXTRA MATERIAL 
This exercise is similar to Rosalind Questions that we have seen today.

Here is a sequence of DNA: ATGCAGTCCAGCG
1. How can we determine length of the sequence? 
2. How can we determine how many As or CGs or GCs are in the sequence?
3. What is the location of the first T in the sequence?
4. What if we had two strings (second string: GTCCAGCTTTCGTT) and we wanted to concatenate them except omit the first character of each?
    - You can assume the strings will be at least of length 1 (so no strings of length 0).
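
Try the questions yourself first. When you want to check your work, here is a sketch of one possible set of answers using `len()`, `.count()`, `.find()`, and slicing. The variable names are my own choices.

```python
seq1 = "ATGCAGTCCAGCG"
seq2 = "GTCCAGCTTTCGTT"

# 1. length of the sequence
seq1_length = len(seq1)
print(seq1_length)        # 13

# 2. how many As, CGs, and GCs
a_count = seq1.count("A")
cg_count = seq1.count("CG")
gc_count = seq1.count("GC")
print(a_count, cg_count, gc_count)   # 3 1 2

# 3. location (index) of the first T, counting from 0
first_t = seq1.find("T")
print(first_t)            # 1

# 4. concatenate the two strings, omitting the first character of each
joined = seq1[1:] + seq2[1:]
print(joined)             # TGCAGTCCAGCGTCCAGCTTTCGTT
```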