# Python primer II, GAIN Summer 2021, day 3. 

## Topics to cover
- lists
- conditional statements: `if`, `else`, and `elif`
- `for` loops
- working with files (input/output)
- introduction to libraries (`import`, `sys`)

## Helpful materials:
- Haddock and Dunn Chapters 9 and 10 PDFs (Haddock_Dunn_Chap9.pdf, Haddock_Dunn_Chap10.pdf)
- [Python for biologists](https://pythonforbiologists.com/introduction) tutorial sections:

  - [printing and manipulating text](https://pythonforbiologists.com/printing-and-manipulating-text)
  - [lists and loops](https://pythonforbiologists.com/lists-and-loops)
  - [working with files](https://pythonforbiologists.com/working-with-files)
<p>&nbsp;</p>

# 1. Lists
Lists store ordered sequences of information, and you will use them often. They are convenient because they can be used with loops to execute the same block of code on each element.

Hardcoding lists within your scripts is good for learning, although once you start working with real data you will learn to build lists quickly on the fly. To build a list, enclose its comma-separated elements in square brackets. A list can contain a mixture of data types, although you will usually work with lists of a single type (strings, integers, or floats).


In [2]:
NameList = ['Jim', 'Bob', 'Amy', 'Beth']    # list of strings
NumList = [9, 28, 18, 83, 85]   # list of integers

List elements are accessed by their indices, with index 0 referring to the first element. Negative indices count backward from the end of the list, so index -1 refers to the last element.

In [3]:
List = ['a', 'b', 'c', 'd', 'e']
print(List[-1])  # will print e
print(List[-3:-1])  # will print ['c', 'd']

e
['c', 'd']


Specifying ranges of elements (called slicing) is done using `:`, with indices corresponding to the boundaries between elements. For example, `List[1:4]` returns the elements that lie between boundary 1 and boundary 4 (see page 159 of Haddock and Dunn for an explanation).

In [None]:
print(List[0:3])  # will print ['a', 'b', 'c']
print(List[1:4])  # will print ['b', 'c', 'd']
print(List[-3:])  # will print ['c', 'd', 'e']
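A quick way to check the boundary model is to compare slice lengths, and to split a list at one boundary and rejoin the pieces. In this sketch the name `Letters` is just an example, and `len()` returns the number of elements in a list.

```python
Letters = ['a', 'b', 'c', 'd', 'e']

# A slice [start:stop] contains stop - start elements
print(len(Letters[1:4]))  # 3

# Cutting at the same boundary gives two pieces that rejoin into the original
print(Letters[:2] + Letters[2:])  # ['a', 'b', 'c', 'd', 'e']

# Negative boundaries count from the end: [-2:] is the last two elements
print(Letters[-2:])  # ['d', 'e']
```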

Note that information can be extracted from strings in the same way. One difference is that strings cannot be modified in place: you cannot change a portion of a string, but you can change portions of a list.



In [None]:
Names = "TomJoshTrevor"
print(Names[0:3]) # prints 'Tom'
print(Names[7:]) # prints 'Trevor'
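To make that difference concrete, here is a small sketch (the list `Friends` is just an example). Assigning to an element of a list works, while assigning to a slice of a string raises a `TypeError`; the `try`/`except` lines, which this primer has not covered yet, simply catch that error so the cell keeps running. The usual workaround is to build a new string from slices of the old one.

```python
Names = "TomJoshTrevor"
Friends = ['Tom', 'Josh', 'Trevor']

# Lists can be changed in place: replace the second element
Friends[1] = 'Jim'
print(Friends)  # ['Tom', 'Jim', 'Trevor']

# Strings cannot: assigning to part of a string raises a TypeError
try:
    Names[0:3] = "Bob"
except TypeError as err:
    print("TypeError:", err)

# Instead, build a new string from pieces of the old one
NewNames = "Bob" + Names[3:]
print(NewNames)  # BobJoshTrevor
```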

## Useful functions for working with lists:

`list()` converts a string into a list of its individual characters. This is useful because lists are easy to iterate through, e.g., by using `for`.

In [None]:
NumString="1234533324555434343"
NumList=list(NumString)    

`.split()` splits a string at a specified delimiter and returns a list of the pieces. This is common when you have a line of data (tab- or comma-delimited) and you want to make that line into a list that can be worked with efficiently.

In [4]:
Temp="65,76,77,77,65,67,65,45,45,90,91,91"
List_Temp= Temp.split(",")
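One detail worth checking: the pieces returned by `.split()` are strings, even when they look like numbers, so they need converting (here with `int()`) before arithmetic. Splitting a tab-delimited line works the same way with `"\t"` as the delimiter; the line stored in `Row` is a made-up example.

```python
Temp = "65,76,77,77,65,67,65,45,45,90,91,91"
List_Temp = Temp.split(",")
print(List_Temp[0:3])  # ['65', '76', '77']  (strings, not integers)

# Convert an element with int() before doing arithmetic on it
First = int(List_Temp[0])
print(First + 1)  # 66

# A tab-delimited line splits the same way, using "\t" as the delimiter
Row = "Jim\t28\tlabrador"
Fields = Row.split("\t")
print(Fields)  # ['Jim', '28', 'labrador']
```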

`.join()` joins the elements of a list of strings into a single string. The delimiter, if used, is the string written before `.join`; use an empty string (`''`) for no delimiter.


In [5]:
Bases=['A', 'G', 'G', 'C', 'TTT', 'ATC']
''.join(Bases) # no delimiter, creates 'AGGCTTTATC'
','.join(Bases) # comma delimiter, creates 'A,G,G,C,TTT,ATC'
String=','.join(Bases[0:4]) # comma delimiter, sends to variable String

Note that strings and lists store information in a similar manner (see Haddock and Dunn, p. 164).

`range()` generates a sequence of integers based on start, stop, and step (interval) values; wrapping it in `list()` turns that sequence into a list. The simplest use is to make a list of integers with specified start and stop points. Notice that the list will end ONE integer before the stop point.


In [6]:
RanList = list(range(0,9))   
print(RanList) # [0, 1, 2, 3, 4, 5, 6, 7, 8]

[0, 1, 2, 3, 4, 5, 6, 7, 8]
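`range()` also accepts a single value (the stop point, with the start defaulting to 0) and an optional third value, the step between integers. A few examples (the variable names are arbitrary):

```python
Up_To_Five = list(range(5))          # start defaults to 0
By_Fives = list(range(0, 20, 5))     # third value is the step
Count_Down = list(range(10, 0, -2))  # a negative step counts down

print(Up_To_Five)  # [0, 1, 2, 3, 4]
print(By_Fives)    # [0, 5, 10, 15]
print(Count_Down)  # [10, 8, 6, 4, 2]
```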


`.append()` adds an element to the end of a list.


In [7]:
Breeds=['labrador', 'golden', 'flatcoat', 'chesapeake']
Breeds.append('curlycoat') # adds 'curlycoat' to the end of Breeds

Joining two lists together is very simple: just use `+`.

In [11]:
List1 = ["a", "b" , "c"]
List2 = [1, 2, 3]
List3 = List1 + List2
print(List3)

['a', 'b', 'c', 1, 2, 3]


`del` (a statement rather than a function) removes specified elements, by index or by slice, from a list.

In [12]:
del Breeds[:2]  # removes the first two elements
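Putting `.append()` and `del` together, the sketch below shows the list after each step. Besides slices, `del` also accepts a single index; `-1` refers to the last element.

```python
Breeds = ['labrador', 'golden', 'flatcoat', 'chesapeake']
Breeds.append('curlycoat')
print(Breeds)  # ['labrador', 'golden', 'flatcoat', 'chesapeake', 'curlycoat']

del Breeds[:2]   # remove a slice: the first two elements
print(Breeds)    # ['flatcoat', 'chesapeake', 'curlycoat']

del Breeds[-1]   # remove a single element by index: the last one
print(Breeds)    # ['flatcoat', 'chesapeake']
```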

`.reverse()` reverses a list. **Note, this method doesn't return a new list (it returns `None`); it reverses the existing list in place.**


In [13]:
Bases=['A', 'G', 'G', 'T', 'T', 'T']
Bases.reverse()
print(Bases) # ['T', 'T', 'T', 'G', 'G', 'A']

['T', 'T', 'T', 'G', 'G', 'A']
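Because `.reverse()` returns `None`, storing its result is a common mistake. The sketch below shows what happens; the names `Result` and `Other` are only for illustration.

```python
Bases = ['A', 'G', 'G', 'T', 'T', 'T']
Result = Bases.reverse()   # reverses Bases in place; the return value is None
print(Result)  # None
print(Bases)   # ['T', 'T', 'T', 'G', 'G', 'A']

# A common mistake: assigning the result back replaces your list with None
Other = ['C', 'C', 'A']
Other = Other.reverse()
print(Other)   # None
```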


Reversing strings is a bit less straightforward, because strings have no `.reverse()` method. The expression `Seq[::-1]` looks funny, but it does the job of returning a reversed copy of the string `Seq`. You will learn more about the meaning of this syntax if you study 'slicing' in more detail.
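A short sketch of reversing a string, using the DNA string from the `for` loop section as example data. The slice `[::-1]` leaves out the start and stop positions and uses a step of -1, so it walks through the string backward. The second approach reaches the same result with list tools from this section: `list()`, `.reverse()`, and `''.join()`.

```python
Seq = "ATCGGGAAACC"

# Slicing with a step of -1 returns a reversed copy of the string
RevSeq = Seq[::-1]
print(RevSeq)  # CCAAAGGGCTA

# Same result using list tools: split into characters, reverse, join back
Bases = list(Seq)
Bases.reverse()
print(''.join(Bases))  # CCAAAGGGCTA
```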
<p>&nbsp;</p>

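# 2. Conditional statements

Conditional statements (`if`, `elif`, and `else`) run a block of code only when a condition is true. Conditions are usually written with the comparison operators below.

| Operators | Meaning |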
|---------- | ---------- |
|==  | Equal To |
|>   | Greater Than |
|>=  | Greater Than or Equal To |
|<   | Less Than |
|<=  | Less Than or Equal To |
|!=  | Not Equal |
<p>&nbsp;</p>

Logical operators, listed below, are also useful in conditional statements. In the table, `A` and `B` stand for any two conditions.

| Operator | True if |
|---------- | ---------- |
|A and B  | both A and B are true |
|A or B  | A is true, B is true, or both are true |
|not A  | A is false |
|(not A) or B | A is false, or B is true (or both) |
|not (A or B)| both A and B are false |
<p>&nbsp;</p>

### `if` comes before the condition being tested, and the code that runs when the condition is true must be indented:


In [14]:
X = 4
if (X > 3):
    print("%d is greater than 3" % (X))

4 is greater than 3


### `elif` (short for "else if") tests additional conditions after the initial `if`, and `else` runs when none of the preceding conditions is true

In [15]:
Y = 3
if (Y > 3):
    print("%d is greater than 3" % (Y))
elif (Y == 3):
    print("%d equals 3" % (Y))
else:
    print("%d is less than 3" % (Y))

3 equals 3
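The logical operators from the table above can combine comparisons into a single condition. A small sketch, reusing values like those in the examples above (`Both`, `Either`, and `Neither` are arbitrary names):

```python
X = 4
Y = 3

Both = X > 3 and Y > 3          # False, because Y > 3 is False
Either = X > 3 or Y > 3         # True, because X > 3 is True
Neither = not (X > 3 or Y > 3)  # False, because at least one comparison is True
print(Both, Either, Neither)    # False True False

if X > 3 and Y == 3:
    print("X is greater than 3 and Y equals 3")
```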


# 3. `for`

### `for` can be used with lists, dictionaries, and even strings. Unlike the conditional statements above, `for` is used to loop (or iterate) through a data structure, executing the same block of code on each item. Python uses indentation in an inflexible manner to set apart the code inside `for` loops (other languages often use curly brackets, with indentation optional). **Once a loop is initiated, the code within the loop must be indented.** Incorrect indentation is one of the most common sources of errors in Python code.
<p>&nbsp;</p>

### Below is an example of using `for` to loop through a string. The code below should print each base in the DNA string on its own line of output.

In [16]:
DNA = "ATCGGGAAACC"
for Seq in DNA: 
    print(Seq)

A
T
C
G
G
G
A
A
A
C
C


### You will often use `for` to loop through lists. The syntax is similar to the example above. Let's make a list of numbers and loop through it.

In [17]:
RanList = list(range(0,100))   
for Num in RanList:
    if Num%10==0:
        print ("multiple of 10: ", Num)

multiple of 10:  0
multiple of 10:  10
multiple of 10:  20
multiple of 10:  30
multiple of 10:  40
multiple of 10:  50
multiple of 10:  60
multiple of 10:  70
multiple of 10:  80
multiple of 10:  90


Notice that the code above does several things:
- First, it makes a list of integers from 0 through 99 (remember that `range()` stops one integer before the stop point).
- It then loops through that list with `for`. The statement `for Num in RanList` loops over an existing list, assigning each element in turn to a loop variable so that it can be referred to within the loop. You can name this variable whatever you want; here I used `Num`.
- Inside the loop, a conditional statement prints something only if its condition is met.
- `%` is the modulo operator. It returns the *remainder* of division, so 10 % 10 = 0 and 20 % 10 = 0, but 5 % 10 = 5.

### While looping through data structures, we will often want to use a counter to keep track of how far we have gone, how far we should go, or how many times we have encountered a particular object. The `+=` operator from last week is very helpful for this.


In [18]:
num_a = 0
num_c = 0
num_g = 0
num_t = 0
DNA_seq = "ATCGGGAAACCTTAAGCTAAA"
for base in DNA_seq:
    if (base=="A"):
        num_a += 1
    elif (base=="C"):
        num_c += 1
    elif (base=="G"):
        num_g += 1
    else:
        num_t += 1

print("There are %d A bases" % (num_a))        
print("There are %d C bases" % (num_c))        
print("There are %d G bases" % (num_g))        
print("There are %d T bases" % (num_t))        

There are 9 A bases
There are 4 C bases
There are 4 G bases
There are 4 T bases
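As a cross-check, strings have a built-in `.count()` method that counts occurrences of a substring. It also counts each base explicitly, whereas the `else` branch above would count any other character (such as an `N` for an unknown base) as a `T`.

```python
DNA_seq = "ATCGGGAAACCTTAAGCTAAA"

# Count each base directly with the string method .count()
for base in "ACGT":
    print("There are %d %s bases" % (DNA_seq.count(base), base))
```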


### Here is an example of simply incrementing a counter to keep track of how far you have gone through a list. If you start the counter at zero, it will track the correct list indices (because the index of the first list element is 0).


In [19]:
List=['1','2','3','4','5','6','7','8','9','10']
CTR = 0
for Num in List:
    print("List index is :", CTR)
    CTR += 1    

List index is : 0
List index is : 1
List index is : 2
List index is : 3
List index is : 4
List index is : 5
List index is : 6
List index is : 7
List index is : 8
List index is : 9
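Python's built-in `enumerate()` function can handle this bookkeeping for you: it gives the loop both the index and the element, so no separate counter is needed. A sketch using the same list:

```python
List = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

# enumerate() yields (index, element) pairs, starting at index 0
for CTR, Num in enumerate(List):
    print("List index is :", CTR, "and the element is :", Num)
```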


# 4. Working with Files

For almost every task you attempt with Python, you will need to 1) open and read data from existing text files; and 2) write the products of your code to new text files. Sometimes you will work with one file at a time, while other tasks will involve reading and writing to very large numbers of files. As hopefully you will see here, Python makes all of the above fairly straightforward. 
<p>&nbsp;</p>

## I. Input

Input involves two steps:

- assigning the name of the file (based on its location) to a variable, and opening a connection to the file (creating a file object with `open()`)
- reading the contents of the file (for example, with `.read()`, or by looping over the file object)
<p>&nbsp;</p>

### `open()`, along with the `'r'` (read) argument, can be used to open a connection (also called a file handle) to files stored on your computer.

<p>&nbsp;</p>

### This can be done by 'hardcoding' the name of a file or files into your code:

If the file is in your working directory:

In [None]:
IN_file=open('data.txt', 'r')

Of course, you can also use an absolute path:


In [None]:
IN_file=open('/working/parchman/data.txt', 'r')

### Perhaps more usefully, file names can be supplied as command line arguments. This is often advantageous because the same script can be used to process different files, or different sets of files, without editing the script. Let's walk through how to do this below, while also previewing the use of Python libraries (modules).
<p>&nbsp;</p>

Command line arguments can be accessed from your code using `sys.argv`. Once you have imported the `sys` library, `sys.argv` is a list of the command line arguments. **Important:** the first element, `sys.argv[0]`, is the name of the script itself, so the `[1:]` slice below skips it.

In [None]:
import sys
for Arg in sys.argv[1:]:       
    print(Arg)

If you placed just the code above in a script named `argtest.py` and executed it as below:

    $ python argtest.py Lebron AD Westbrook

you should see Lebron, AD, and Westbrook printed on consecutive lines.

From here, you can see that using the `open()` function to make file objects from the file names listed in `sys.argv` is an efficient way to access files in your code. In most cases, this strategy will be more efficient and useful than hardcoding file names into your scripts.



    import sys
    IN = open(sys.argv[1], 'r') 
    
If you provide a file name as the first argument, it is stored in `sys.argv[1]`, the second element of the list (remember that list indexing in Python begins with zero). So if you saved the code above as `read_file.py` and ran the command below, the file `data.txt` would be opened and assigned to `IN`.

    python3 read_file.py data.txt

<p>&nbsp;</p>

### There are a number of ways you can read data from a file object.

**What you will most often want to do is loop over the file object to read each line one at a time from the file. In other words, we will run all of our commands on the first row of the file, then we will run all of our commands on the second row, and so on. This is memory efficient, fast, and leads to simple code:**


In [None]:
for Line in IN:
    print (Line)
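Looping is not the only option. As a sketch, two other common ways to read a file are `.read()`, which returns the whole file as one string, and `.readlines()`, which returns a list with one string per line; both load the entire file into memory at once, which is why looping is preferred for very large files. To make the example self-contained, it first writes a small file with the hypothetical name `example.txt` (writing is covered in the Output section below), and it uses the handle names `OUT_demo` and `IN_demo` so that `IN` is left untouched.

```python
# Write a small example file so this sketch runs on its own
OUT_demo = open("example.txt", "w")
OUT_demo.write("line one\nline two\nline three\n")
OUT_demo.close()

# .read() returns the entire file as a single string
IN_demo = open("example.txt", "r")
Contents = IN_demo.read()
IN_demo.close()
print(Contents)

# .readlines() returns a list of lines, each still ending in "\n"
IN_demo = open("example.txt", "r")
Lines = IN_demo.readlines()
IN_demo.close()
print(Lines)  # ['line one\n', 'line two\n', 'line three\n']
```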

**Once you start processing files one line at a time, you will realize that line ending characters (`\n`) often get in the way, and are most effectively dealt with by removing them right off the bat. We can use the `.strip()` method to do this, as below.**


In [None]:
for Line in IN:
    Line = Line.strip("\n")  # remove the line ending from each line, not from the file object
    print(Line)

### Another way of doing this, following Haddock and Dunn:


In [None]:
IN_Name = "data.txt"
IN = open(IN_Name, 'r')
LineNumber = 0  ## setting to 0 to count lines while looping through the file. 

for Line in IN:
    Line = Line.strip('\n') #### critical, removing line ending
    print(LineNumber, ":", Line)

    LineNumber += 1 ## incrementing LineNumber to count runs through loop

IN.close() #closing IN, see below.
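Stripping the line ending is usually the first step before splitting each line into fields. A sketch combining `open()`, `.strip()`, and `.split()` on a small tab-delimited file; the file name `samples.txt`, its contents, and the handle names `OUT_demo` and `IN_demo` are all made up for the example (the file is written first so the sketch runs on its own).

```python
# Write a small tab-delimited example file (hypothetical data)
OUT_demo = open("samples.txt", "w")
OUT_demo.write("sample1\t12\n")
OUT_demo.write("sample2\t30\n")
OUT_demo.close()

IN_demo = open("samples.txt", "r")
Total = 0
for Line in IN_demo:
    Line = Line.strip("\n")    # remove the line ending first
    Fields = Line.split("\t")  # then split the line into a list of strings
    print(Fields[0], Fields[1])
    Total += int(Fields[1])    # convert text such as "12" to an integer before adding
IN_demo.close()
print("Total:", Total)  # Total: 42
```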

## II. Output

Opening a file for output and writing to that file (as opposed to printing the output to the terminal using `print`) works similarly to the examples above and also uses `open()`. The second argument to `open()` sets the mode: "r" (read), "w" (write, which creates the file or overwrites it if it already exists), or "a" (append, which adds to the end of an existing file). For writing output from your code, we will use "w" or "a".


In [None]:
OUT = open("outfile.txt", "w")
OUT.write("Here is the data my python code processed \n")
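The difference between the "w" and "a" modes is easy to see by writing to the same file more than once. A sketch, using a throwaway file `modes_demo.txt` and a separate handle name `Demo` so that `OUT` is left untouched:

```python
# "w" creates the file, or empties it first if it already exists
Demo = open("modes_demo.txt", "w")
Demo.write("first line\n")
Demo.close()

# "a" adds to the end of whatever the file already contains
Demo = open("modes_demo.txt", "a")
Demo.write("second line\n")
Demo.close()

Demo = open("modes_demo.txt", "r")
After_Append = Demo.read()
Demo.close()
print(After_Append)   # first line, then second line

# Opening with "w" again erases the previous contents
Demo = open("modes_demo.txt", "w")
Demo.write("only line\n")
Demo.close()

Demo = open("modes_demo.txt", "r")
After_Rewrite = Demo.read()
Demo.close()
print(After_Rewrite)  # only line
```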

The `.write()` method above works similarly to `print`: you can write strings of text, variables, and even variables processed by functions. A few examples are below. There are two important ways in which `.write()` differs from `print`. First, `.write()` only accepts strings, so numbers and other values must be converted to strings (for example with `%` formatting, as below). Second, while `print` automatically adds a line ending to each statement, `.write()` does not, so you will need to add line endings (`\n`) yourself.


In [None]:
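VarA = 10          # example integer
VarB = 2.5         # example float
Line = "ATCGGG"    # example string
VarName = 42       # example integer

OUT.write("Here is the data my python code processed \n") # string of text, note line ending added
OUT.write("Data: %d %f \n" % (VarA, VarB)) # string and variables
OUT.write("Data %s \n" % (Line))  # %s inserts a string
OUT.write("%d\n" % (VarName))     # %d inserts an integer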

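A string variable can be written directly (here with a line ending added using `+`), but `.write()` raises a `TypeError` if you pass it a list or a number, so convert those to strings first (for example with `.join()` or `str()`).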

In [None]:
Name = "Jim"            # example string variable
OUT.write(Name + "\n")  # write the string followed by a line ending
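To write a list, first turn it into a single string. A sketch, assuming a throwaway file `lists_demo.txt` and reusing the example lists from the start of this primer: `.join()` handles a list of strings, while integers must first be converted with `str()`.

```python
NameList = ['Jim', 'Bob', 'Amy', 'Beth']
NumList = [9, 28, 18, 83, 85]

OUT_lists = open("lists_demo.txt", "w")
# OUT_lists.write(NameList) would raise a TypeError, because .write() needs a string
OUT_lists.write("\t".join(NameList) + "\n")  # a list of strings can be joined directly

NumStrings = []
for Num in NumList:
    NumStrings.append(str(Num))  # str() converts each integer to a string
OUT_lists.write(",".join(NumStrings) + "\n")
OUT_lists.close()

# Read the file back to check what was written
IN_lists = open("lists_demo.txt", "r")
Written = IN_lists.read()
IN_lists.close()
print(Written)
```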

Finally, if you only need to send output to a single file, you can also control output with `print` by using the Unix redirect operator (`>`), which sends everything the script prints to a file:

    $ python myscript.py > output.txt

## III. Closing file connections 

It may take some experience before you realize that closing file connections when you are done with them is good form. While learning and writing straightforward scripts, you may not encounter problems when you fail to close file handles, but that will change as you ramp up what you are doing (for example, text written with `.write()` may not appear in the file until the file is closed). Nonetheless, get in the habit of doing this now, and try not to forget. It is simple with the `.close()` method (e.g., `OUT.close()`). While you are learning Python, you will commonly want to place these commands towards the end of your scripts.
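A minimal sketch of closing a file, using a throwaway file name `closing_demo.txt`. File objects have a `.closed` attribute you can check. As an alternative not covered elsewhere in this primer, a `with` block closes the file automatically when the indented block ends, even if an error occurs inside it.

```python
OUT_final = open("closing_demo.txt", "w")
OUT_final.write("done\n")
OUT_final.close()
print(OUT_final.closed)  # True

# Alternative: a with block closes the file automatically when the block ends
with open("closing_demo.txt", "r") as IN_final:
    Text = IN_final.read()
print(IN_final.closed)   # True
print(Text)
```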
