## Learning objectives


1. Understanding *print* syntax

2. Using basic Python data types and assignment

3. The basics of lists

4. Understanding mutability

5. If / else statements and for loops

6. Interacting with files in Python

---




## Printing with Python


Let's start with something simple. The print function allows us to send information to the terminal as standard output. That means that we can use it for giving information to the user or to downstream programs in our pipeline. Let's see what it does.




In [0]:
print('Hello world')

What happens if we want to print something other than a string? Let's see what happens when we print an integer. What about a float?




In [0]:
print(1)
print(1.0 / 3.0)

Python can render all of its basic data types and structures in string format. In addition, any Python class that defines the special method `__str__` can be printed. So, let's see what a printed list looks like.




In [0]:
a = ['Hello', 'world']
print(a)
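To make the `__str__` hook concrete, here is a minimal sketch. The `Sample` class and its attributes are hypothetical and exist only for this illustration.

```python
# Hypothetical class, used only to illustrate the __str__ hook.
class Sample:
    def __init__(self, name, reads):
        self.name = name
        self.reads = reads

    def __str__(self):
        # print() calls str() on its arguments, which in turn calls this method
        return 'Sample {} with {} reads'.format(self.name, self.reads)

s1 = Sample('A1', 1200)
print(s1)
sample_text = str(s1)
```

Without a `__str__` method, printing the object would still work, but it would show a generic description of the object instead of something readable.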

Now, what happens when we have more than one thing we want to print? The print function can handle that as well. Any number of objects can be passed to print.




In [0]:
print('Hello', 'world')

What if we want to combine the multiple elements into a single string before printing it? We have a few options. Strings can be combined using addition.




In [0]:
print('Hello' + ' world')

We can also use something called string formatting. String formatting comes in two varieties, old and new styles. The old variety requires us to know exactly what kind of variable is being substituted into the string.




In [0]:
print('Hello %s' % ('world'))
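As a rough illustration of why the old style needs the variable type, the sketch below uses a different conversion code for each kind of value. The variable names are made up for this example.

```python
name = 'world'
count = 3
ratio = 1.0 / 3.0

# %s inserts a string, %d an integer, and %.2f a float rounded to two decimals
old_style = 'Hello %s, %d times, ratio %.2f' % (name, count, ratio)
print(old_style)
```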

The second string formatting approach uses placeholders that specify either implicitly (in order) or explicitly which value from the list of arguments to substitute in.




In [0]:
print('{} {}'.format('Hello', 'world'))
print('{0} {1}'.format('Hello', 'world'))

We can also reuse arguments with this second approach.




In [0]:
print('He{0}{0}{1} w{1}r{0}d'.format('l', 'o'))

Finally, we can direct the output of the print function to different destinations. For example, we can send it to standard error, a separate stream that also shows up in the terminal but is not passed along as standard input to downstream programs. It is usually used for printing error and status messages.




In [0]:
import sys

print('Hello world', file=sys.stderr)

We can also print to a filestream, which we'll talk about a little later.




In [0]:
fs = open('hello_world.txt', 'w') # Don't worry about what this means yet
print('Hello world', file=fs)
fs.close()

You should now have a file called 'hello_world.txt' containing that text.


### Exercise:


Now, given the following variables, try printing 'do wa ditty ditty dum ditty do' with the format method. Can you do it by reusing arguments?




In [0]:
do = 'do'
wa = 'wa'
ditty = 'ditty'
dum = 'dum'

Your answer:




In [0]:
print("{0} {1} {2} {2} {3} {2} {0}".format(do, wa, ditty, dum))

## Looking at basic Python data types


Let's take a look at the basic Python data types.




In [0]:
# Integer
i = 7

# Integer converted to a string
i_as_s = str(i)

# Float
f = 5.689

# String converted to a float
s_as_r = float("5.689")

# String
s = "This is a string"

# Boolean
truthy = True
falsy = False

print(i, type(i))
print(i_as_s, type(i_as_s))
print(f, type(f))
print(s_as_r, type(s_as_r))
print(s, type(s))
print(truthy, type(truthy))
print(falsy, type(falsy))

One particularly handy string method is `split`. It breaks the string up based on whatever argument is passed and returns a list with all of the resulting substrings. With no argument, it splits on any run of whitespace (spaces, tabs, and newlines).




In [0]:
s = 'our test string'
print(s.split())
print(s.split('t'))
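The difference between the default behaviour and an explicit separator matters for tab-delimited data. A small sketch, using a made-up line of text:

```python
line = 'gene1\t100  200\n'

# No argument: any run of whitespace counts as a single separator
whitespace_fields = line.split()

# Explicit tab: only tabs separate fields; other whitespace is kept as-is
tab_fields = line.split('\t')

print(whitespace_fields)
print(tab_fields)
```

Note that the explicit version leaves the trailing newline attached to the last field, which is why it is common to strip the line before splitting it.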

The reverse of this is `join`. This puts the string in between all elements in a list of strings.




In [0]:
s = ' '
L = ['A', 'B', 'C']
print(s.join(L))
s = '_'
print(s.join(L))

### Exercise:


What happens when you combine different variable types? What type do you get? Use the function ```type``` to see.




In [0]:
# e.g. print(type(1+1))
print(type(1+1.0))
print(type(1+True))
print(type(1.0+True))

## Altering variables in place


When performing operations on variables, we can put the results in a new variable, but it is often much more useful to alter an existing variable. Let's start with a simple example of a counter.




In [0]:
# Let's initialize our counter with a value of zero
counter = 0

# Let's perform a loop 5 times and count the loops
for i in range(5):
    # We can increment our counter this way:
    # counter = counter + 1
    # but an easier way is to implicitly include the 'counter +' portion like so
    counter += 1
    # This is called an 'assignment operator'

    print(counter)

A variety of operators have an assignment form like this:


- +=  Addition
- -=  Subtraction
- *=  Multiplication
- /=  Division
- **=  Raising the variable to the power of a value
- %=  Computes the modulus of the variable and a value
- //=  Computes floor (integer) division
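The operators listed above can be tried one after another on a single variable. This is only an illustrative sequence; the starting value is arbitrary.

```python
x = 7
x += 3    # 10
x -= 4    # 6
x *= 5    # 30
x /= 4    # 7.5 (true division always gives a float)
x **= 2   # 56.25
x %= 10   # 6.25
x //= 2   # 3.0 (floor division of a float is still a float)
print(x)
```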


### Exercise:


Can you perform the following calculation with one line per operation using only assignment operators?




In [0]:
# ((5 + 10) / 3) ** 2
a = 0
a += 10
a += 5
a /= 3
a **= 2

## Lists


Lists can contain any kind of data object and each entry in a list can be different. However, the convention is to use a single object type per list. Once created, lists can be altered by changing, adding, or deleting entries.


Create a list from data




In [0]:
L = [1, 2, 3]

Add an entry




In [0]:
L.append(4)
print(L)

Add two lists




In [0]:
L2 = L + [5, 6]
print(L2)

In order to access data in a list, we can tell it what *index*, or list position, we want to see. As with all sequences in Python, the first entry in a list has an index of zero. So, the last entry is at index length(list) - 1, or ```len(L) - 1```.


We can also use special syntax in indexing lists. To get a range of values, we can use a colon to indicate the first position through the first excluded position. Thus, ```L[1:3]``` gives us list entries 1 and 2. We can also use implicit values in specifying ranges. If we leave out the start index, it is assumed to be zero. If we leave out the stop index, it is assumed to be the length of the list. So ```L[:3]``` would return the first three values from the list. When a range is used to index a list, a list of the requested entries is returned. This indexing is the same as is used for strings.
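The indexing rules described here can be checked with a short sketch. It uses a separate, made-up list called `letters` so that `L` is left untouched for the cells that follow.

```python
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[0])                  # first entry
print(letters[len(letters) - 1])   # last entry
print(letters[1:3])                # entries 1 and 2
print(letters[:3])                 # first three entries
print(letters[3:])                 # entry 3 through the end
print('hello'[1:3])                # strings slice the same way
```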


Change an entry




In [0]:
L[1] = 7
print(L)

Remove an entry




In [0]:
del L[0]
print(L)

Pull an entry out of the list




In [0]:
v = L.pop(0)
print(L)
print(v)

We can also delete a specific value from the list. This finds the first time the value appears and removes it.




In [0]:
L.remove(3)
print(L)

If the value doesn't exist, we get an error.




In [0]:
L.remove(0)

### Exercise:


Given the following grocery list, can you sort the items into the appropriate fruits, vegetables, and meats lists? Can you do this using ```pop```? What about using indexing and list addition?




In [0]:
groceries = ['apple', 'pear', 'broccoli', 'lettuce', 'watermelon', 'turkey', 'beef', 'carrots']
fruits = []
vegetables = []
meats = []

## Your code
fruits.append(groceries.pop(0))
fruits.append(groceries.pop(0))
vegetables.append(groceries.pop(0))
vegetables.append(groceries.pop(0))
fruits.append(groceries.pop(0))
meats.append(groceries.pop(0))
meats.append(groceries.pop(0))
vegetables.append(groceries.pop(0))
print('Fruits:', fruits)
print('Vegetables:', vegetables)
print('Meats:', meats)

# or

groceries = ['apple', 'pear', 'broccoli', 'lettuce', 'watermelon', 'turkey', 'beef', 'carrots']
fruits = []
vegetables = []
meats = []

fruits += groceries[:2]
fruits += groceries[4:5]
vegetables += groceries[2:4]
vegetables += groceries[7:8]
meats += groceries[5:7]
print('Fruits:', fruits)
print('Vegetables:', vegetables)
print('Meats:', meats)

## Mutability


Let's take a quick look at mutability.


Mutability just means that we can change the value of a data object *in place*. Of all of the data objects we've talked about so far, only lists are mutable. When we assign a new value to a variable that already exists, we aren't changing the old value; we're actually creating a new object and giving it the old name. This is especially clear when we replace an old value with one that takes up a different amount of memory.


![](./fig/mutability.png)




In [0]:
a = 1
a = 2
print(a)

In [0]:
a = 'a'
a = 'b'
print(a)

Let's see what happens when we try to assign a different letter in a string.




In [0]:
a = 'abc'
print(a[1])

a[1] = 'd'
print(a)

This raises a `TypeError` because strings are immutable: we're trying to overwrite only part of the string, rather than replacing it. If we want to do this replacement, we'll need to create a new string.




In [0]:
a = 'abc'
a = a[:1] + 'd' + a[2:]
print(a)
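For contrast, here is a small sketch of what mutability means for a list. The variable names are made up for this example. Two names that point to the same list both see an in-place change, whereas rebuilding a string leaves the original untouched.

```python
original = [1, 2, 3]
alias = original          # both names refer to the same list object
alias[1] = 7              # change the list in place
print(original)           # the change is visible through either name

text = 'abc'
other = text
other = other[:1] + 'd' + other[2:]   # builds a new string; 'text' is untouched
print(text, other)
```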

## Conditional statements


Conditional statements give you the ability to adapt your code to a variety of situations depending on what is encountered. Conditional statements take the form of *if else* statements. An *if* statement can handle anything that can be evaluated to true or false.




In [0]:
# The '==' means 'is equal to'. '=' means 'assign this value to'. Don't get them mixed up!
if 1 == 1:
    print('true')
else:
    print('false')

For example, all integers evaluate to true except zero. Same for all floats. All strings are true except for empty strings. All lists are true except for empty lists. To see whether something will be true or false, you can simply convert it into a boolean and look at its value.




In [0]:
print(1, bool(1))

print(0, bool(0))

print(1.0, bool(1.0))

print(0.0, bool(0.0))

print('abc', bool('abc'))

print('', bool(''))

print(['a', 'b', 'c'], bool(['a', 'b', 'c']))

print([], bool([]))

If we have multiple things to test, we can create nested if statements.




In [0]:
i = 5
if i > 0:
    if i < 10:
        print('true')

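However, we have an easier way to do this. Python has the logical operators *and*, *or*, and *not*. (The symbols *&*, *|*, and *^* are bitwise operators; they also work on booleans, with *^* acting as xor, but the keywords are the usual choice in conditions.)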




In [0]:
i = 5
if i > 0 and i < 10:
    print('true')
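A short sketch of the keyword operators `and`, `or`, and `not`, plus the chained comparison that Python also accepts. The variable names are only for illustration.

```python
i = 5

in_range = i > 0 and i < 10    # both conditions must hold
outside = i < 0 or i > 10      # at least one condition must hold
not_zero = not i == 0          # 'not' inverts the result of the comparison
chained = 0 < i < 10           # equivalent to: 0 < i and i < 10

print(in_range, outside, not_zero, chained)
```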

If we want to test multiple conditions sequentially, we can nest another ```if``` inside the ```else``` block or, better yet, use ```elif```.




In [0]:
i = 15
if i < 0:
    print('small')
elif i > 10:
    print('large')
else:
    print('medium')

### Exercise:


Can you create a conditional statement that prints 'true' for even numbers between 2 and 20, inclusive?




In [0]:
# Hint: you can use the modulus operator (%) for an easy odd/even test
i = 16
if i >= 2 and i <= 20 and i % 2 == 0:
    print('true')

# or

print(i >= 2 and i <= 20 and i % 2 == 0)

## For loops


There are lots of times when we need to cycle through information, such as going through items in a list or numbers in a range. A *for* loop acts on something called an *iterator*. An *iterator* will return information each time it is called, such as in a *for* loop. All you need to know about iterators now is that each time information is requested from them, they will pass the next value until they reach the end of their queue of information. While many things can act as an iterator, we are going to look at two kinds of iterators, a list, and *range*.




In [0]:
L = ['a', 'b', 'c']

# As we go through the list, each value will be placed in 'i'
for i in L:
    print(i)
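What the *for* loop does behind the scenes can be sketched by hand with the built-in functions `iter` and `next`. This is only an illustration; in practice the loop does this for us.

```python
letters = ['a', 'b', 'c']

it = iter(letters)   # ask the list for an iterator
first = next(it)     # each call hands back the next value
second = next(it)
third = next(it)
print(first, second, third)

# One more next(it) would raise StopIteration,
# which is how a for loop knows when to stop.
```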

*range* creates an iterator that can step by some integer value from a start value up to, but not including, a stop value. If we give it a single value, start defaults to zero, step defaults to one, and the value is the stop. Two values are taken as the start and stop, and three are start, stop, and step.




In [0]:
for i in range(5):
    print(i)

In [0]:
for i in range(1, 5):
    print(i)

In [0]:
for i in range(1, 6, 2):
    print(i)
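A compact way to see which values a *range* produces is to convert it to a list. The sketch below repeats the three calls used in the loops above.

```python
one_arg = list(range(5))          # stop only
two_args = list(range(1, 5))      # start and stop
three_args = list(range(1, 6, 2)) # start, stop, and step

print(one_arg)
print(two_args)
print(three_args)
```

In every case the stop value itself is left out of the result.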

### Exercise:


Can you write a for loop that prints numbers from 9 to 0?




In [0]:
for i in range(10)[::-1]:
    print(i)

# or

for i in range(9, -1, -1):
    print(i)

## While loops


While we're at it, let's look at another kind of loop, the `while` loop. This is a loop that executes until a conditional statement becomes False. Be careful, this is an easy way to get your code stuck in an infinite loop. Let's look at a simple way to use this loop.




In [0]:
i = 0
while i < 10:
    print(i)
    i += 1

Before each execution of the code in the loop, Python checks if the conditional statement is still True. It's possible that the loop would never execute if the statement was false to begin with.


It's also possible to break out of loops, both `for` and `while`, before they're finished. The `break` command exits a loop immediately. So, if you have some make-or-break condition, so to speak, you can jump out of your loop, for example if you only want to read the first several lines of a file.




In [0]:
i = 0
for line in range(100): #let's pretend this range function is lines of our file
    print(line)
    if i >= 10:
        break
    i += 1

We can also use `continue` to skip the remaining code in the loop body and jump straight to the next pass through the loop. Again, this works for both kinds of loops. It is useful for avoiding deeply nested, heavily indented if statements, for example when skipping a file header.




In [0]:
for i in range(20):
    if i < 10:
        continue
    print(i)

### Exercise


Add conditional statements into this code using `continue` and `break` to only print out numbers 10-20.




In [0]:
i = 0
while i < 100:
    print(i)
    i += 1

## Reading files


Let's put these types together and write a simple program to read the first line of a text file. To do this, we will need a way to interact with files outside of Python. The most basic way of doing this is the 'file handle'. This is a data object that is created to allow us to look at a specific chunk of data on the hard drive.


The way we create a file handle is with the ```open``` command. This takes one required argument, the name of the file we want to interact with, and has several optional arguments, of which we are only going to learn about one right now, the 'mode' argument. Mode tells the interpreter how we want to interact with the file and defaults to ```'r'``` (read), meaning we can view the file but cannot change anything about it.




In [0]:
f = open('data/t_data.ctab')
print(f)

The file handle we've created has several methods (functions built in to a data class). One is the ```readline``` method, which returns the text from the current position up to and including the next newline character. On a freshly opened file, that is the first line.




In [0]:
l = f.readline()
print(l)

This advances the file handle to looking at the start of the next line of text in the file, so each time we call ```readline``` we will get the next line of text. When we're finished with our file handle, we can call the ```close``` method (although this will occur automatically when your script ends).
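The way ```readline``` advances through a file can be sketched without a real file by using `io.StringIO`, which behaves like a file handle but lives in memory. The text content here is made up for the example.

```python
import io

# An in-memory stand-in for a two-line text file
handle = io.StringIO('first line\nsecond line\n')

line1 = handle.readline()   # reads up to and including the first newline
line2 = handle.readline()   # picks up where the previous call stopped
line3 = handle.readline()   # empty string: the end of the file was reached

print(repr(line1), repr(line2), repr(line3))
handle.close()
```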


We can also use the ```write``` method to add data to the file.




In [0]:
f.write('test')

Now why didn't that work? If you recall, we did not specify a value for 'mode', so it defaulted to read-only (```mode='r'```). If we use ```mode='w'``` (write), this will create a new empty file to write to. But be careful: if the file already exists, it will be overwritten. We could also use ```mode='a'``` (append), which will create a new file if it doesn't exist, but append new data to the end of the file if it already exists. When writing, we need to be sure to close the file handle so that all of the data is actually written to the file.




In [0]:
# First let's close our file handle
f.close()

# Now let's open a new file
f = open('test.txt', mode='w')
f.write('test')

# Alternatively, you could use a print command
# print('test', file=f)

f.close()

We should now have a file containing our text in it.
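To compare write and append modes side by side, here is a minimal sketch. It writes to a hypothetical file in a temporary directory so that nothing real is overwritten, and it uses the `with` statement, which has not been covered here but closes the file handle automatically at the end of the indented block.

```python
import os
import tempfile

# Hypothetical file name inside a scratch directory
path = os.path.join(tempfile.mkdtemp(), 'append_demo.txt')

with open(path, mode='w') as out:   # 'w' creates or overwrites the file
    out.write('first\n')

with open(path, mode='a') as out:   # 'a' adds to the end of the file
    out.write('second\n')

with open(path) as handle:          # the default mode 'r' is read-only
    contents = handle.read()

print(contents, end='')
```

Opening the same path with ```mode='w'``` a second time would have discarded the first line instead of keeping it.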


### Exercise:


Can you write code to open the file you just created and print its contents?




In [0]:
for line in open('test.txt'):
    print(line, end='') # the 'end' argument prevents appending a newline character on the print output

## A script to analyze a file


Let's put these new skills into action and write a simple script to analyze some trait of a text file.


Ideally we would write this script to take in data from standard input and write to standard output so it could work in a pipeline. We would do this by using the standard input file handle just like any other file.




In [0]:
import sys

f = sys.stdin

Next, we need some way to look at each line in turn. We've already seen the ```readline``` method.  File handles can be used as iterators and act as if each request is a call of ```readline```. In a for loop when the end of the file is reached, the for loop finishes.




In [0]:
# Because we can't stream data in a Jupyter notebook, we'll open a file instead
f = open('data/t_data.ctab')

# This will iterate over each line of the file
for line in f:
    # Next, we want to skip the header line, which we know starts with 't_id'
    if line.startswith('t_id'):
        continue

    # Next, we want to create a list out of the line, splitting it into individual fields
    # We want to take off the newline character before splitting the line
    fields = line.rstrip('\n\r').split()
    
    # The start and end coordinates of each transcript are in columns 3 and 4 (counting from zero).
    # Let's find the size of each transcript
    print(int(fields[4]) - int(fields[3]))

f.close()

Note that we needed to convert the coordinates from fields into integers before performing a mathematical operation on them. Using the print function, we sent the results to standard output, making it suitable for use in a pipeline.
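Because the loop only needs something that yields lines, the same logic can be tried on any file-like object, including standard input. The sketch below uses `io.StringIO` with made-up rows; the column names and values are invented for illustration and only mirror the assumption that start and end sit in columns 3 and 4.

```python
import io

# Made-up rows, not real data; only the column positions matter here
fake_file = io.StringIO(
    't_id\tchr\tstrand\tstart\tend\n'
    '1\tchr1\t+\t100\t250\n'
    '2\tchr1\t-\t400\t1000\n'
)

sizes = []
for line in fake_file:
    # Skip the header line
    if line.startswith('t_id'):
        continue
    fields = line.rstrip('\n\r').split()
    # Convert the text fields to integers before subtracting
    sizes.append(int(fields[4]) - int(fields[3]))

print(sizes)
```

Replacing `fake_file` with `sys.stdin` would, under the same assumptions about the column layout, let the loop read data piped in from another program.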


### Exercise:


Can you find the maximum number of exons in data/t_data.ctab?




In [0]:
max_exons = 0
for line in open('data/t_data.ctab'):
    if line.startswith('t_id'):
        continue
    exons = int(line.split('\t')[6])
    if exons > max_exons:
        max_exons = exons
print(max_exons)