# Variables in and out of functions

**Rule: Variables/data in functions should go in through arguments, come out via `return`.**

In [1]:
# Correct way to write this function
def addOne(number):
    return number + 1

In [2]:
# Works as expected, takes straight value as argument
addOne(2)

3

In [3]:
# Names of variables passed as arguments don't need to match the parameter name in the function definition
# Here, the value of n is assigned to the local variable 'number' inside the function

n = 4
addOne(n)

5

In [4]:
# Gives an error
# 'number' is a local variable, not accessible outside the function
print(number)

NameError: name 'number' is not defined

In [5]:
# Now we have a global variable 'number' and a local variable 'number'
# Again the local variable remains inside the function
# Existence of a global 'number' doesn't matter - there is no name conflict

number = 7
new_number = addOne(number)
print(number, new_number)

7 8


In [6]:
# WRONG way to write a function - the function body refers to the global variable 'count'
# The function also fails to use the argument passed to it

count = 0

def addOne_wrong(number):
    return count + 1

addOne_wrong(count)

1

In [7]:
# Function produces an unexpected answer: it ignores its argument and returns the global count + 1
c = 10
addOne_wrong(c)

1

## Passing lists as function arguments

It gets confusing fast! A list exists as a specific object in your computer's memory, and a single list object can have multiple names, including a global name and a local name.

In [8]:
# Lists work differently inside functions
# You can't use this function to create a new list with the appended value:
# append() changes the list in place and returns None
# In this case, the same list gets two names: a global name (my_list) and a local name (a_list)
# But it's the same list.


def run_append(a_list, an_item):
    return a_list.append(an_item)

my_list = [1, 2, 3, 4] # Create a list, assign to a global variable

new_list = run_append(my_list, 5)
print(new_list, my_list) # 'new_list' is None (append returns None), while my_list has been changed


# print(a_list) # This would raise an error - 'a_list' is a local variable name

None [1, 2, 3, 4, 5]
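
One way to check that two names point at the same list object is the `is` operator. The sketch below is illustrative only; the names `append_in_place`, `alias`, and `duplicate` are made up for this example and are not part of the class code.

```python
def append_in_place(a_list, an_item):
    a_list.append(an_item)  # changes the caller's list object
    return a_list           # returns that same object, not a copy

my_list = [1, 2, 3, 4]
returned = append_in_place(my_list, 5)

alias = my_list            # a second name for the same object
duplicate = list(my_list)  # a new list object with the same contents

print(returned is my_list)    # True: same object
print(alias is my_list)       # True: same object
print(duplicate is my_list)   # False: different object
print(duplicate == my_list)   # True: equal contents
```

`is` compares object identity, while `==` compares contents, which is why the copy is equal to the original without being the same object.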


In [9]:
# If you don't want to change the original list, make a copy inside the function

def run_append_correct(a_list, an_item):
    another_list = list(a_list) # first make a copy of the list
    another_list.append(an_item) # now run append on the copy
    return another_list # return the copy

my_list = [1, 2, 3, 4]
new_list = run_append_correct(my_list, 5)
print(new_list, my_list) # works as expected: the original list is unchanged

[1, 2, 3, 4, 5] [1, 2, 3, 4]


In [10]:
# If lists can be both global and local, why doesn't this work?
# The function creates a list, but that list only has a local name and is never returned,
# so it is inaccessible outside the function

def create_list(item1, item2):
    local_list = [item1, item2]

create_list(1, 2)
print(local_list) # raises an error

NameError: name 'local_list' is not defined
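
Following the rule at the top of these notes, a likely fix is to send the list out with `return` and give it a name in the calling code. A minimal sketch (the global name `new_list` is just an example):

```python
def create_list(item1, item2):
    local_list = [item1, item2]
    return local_list  # hand the list object back to the caller

new_list = create_list(1, 2)  # the returned list now has a global name
print(new_list)  # [1, 2]
```

The local name `local_list` still disappears when the function ends, but the list object itself survives because `new_list` now refers to it.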

# Coding strategies:

## 1) Decide what your input and output are

Write code to translate DNA into protein. Input is a DNA sequence (as a string), output is a protein sequence (as a string).

Example from ps4/2 Histograms: Your task is to produce a histogram of gene lengths of genes in the human genome.

**Input**: Gene lengths in bp - more specifically, a list of integers representing gene lengths in bp

**Output**: A histogram - a set of bins with reasonable boundaries showing the number of data points that fall within each pair of boundaries. More specifically, a list of integers representing bin counts

## 2) Outline steps to get there/break into tasks:

1. Assign DNA sequence to variable.

2. Move through the string three bases at a time

3. For each codon, look up amino acid symbol and add to protein sequence.

4. End when we reach a stop codon.

Tasks for the histogram example: 1) Determine bin boundaries/widths. 2) Count the number of genes that fall in each bin.
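
The two histogram tasks could be sketched as two small functions. This is one possible approach, not the problem set solution: it assumes equal-width bins, a list whose values are not all identical, and made-up gene lengths. The function names `make_bin_edges` and `count_in_bins` are hypothetical.

```python
def make_bin_edges(values, n_bins):
    """Task 1: equal-width bin boundaries spanning min to max."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_bins
    return [lowest + i * width for i in range(n_bins + 1)]

def count_in_bins(values, edges):
    """Task 2: count how many values fall in each bin."""
    n_bins = len(edges) - 1
    width = edges[1] - edges[0]
    counts = [0] * n_bins
    for value in values:
        index = int((value - edges[0]) / width)
        index = min(index, n_bins - 1)  # the maximum value goes in the last bin
        counts[index] += 1
    return counts

gene_lengths = [500, 1200, 1500, 3000, 4500, 10500]  # made-up lengths in bp
edges = make_bin_edges(gene_lengths, 4)
counts = count_in_bins(gene_lengths, edges)
print(edges)   # [500.0, 3000.0, 5500.0, 8000.0, 10500.0]
print(counts)  # [3, 2, 0, 1]
```

Both functions take their data in through arguments and send results out with `return`, so each can be tested on its own with a short list.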

## 3) Work inside out: small steps/functions to larger ones


Extract codons from a string:

- Use a for loop and slices to get three bases at a time.
- Stop the for loop when you encounter a stop codon.

Look up codons in a genetic code dictionary and add the amino acid symbol:

- Write a genetic code dictionary.
- Look up entries using the extracted codon.
- Append/concatenate the amino acid symbol to the protein string.
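
The first of these small steps, extracting codons, might look like the sketch below. The function name `extract_codons` and the example sequence are made up for illustration; the three stop codons are those of the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def extract_codons(dna):
    codons = []
    # Step through the string three bases at a time; the len(dna) - 2 limit
    # skips any incomplete codon left over at the end
    for start in range(0, len(dna) - 2, 3):
        codon = dna[start:start + 3]
        if codon in STOP_CODONS:
            break  # stop the loop at the first stop codon
        codons.append(codon)
    return codons

print(extract_codons("ATGGCCTGAAAA"))  # ['ATG', 'GCC']
```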

## Pseudocode

Pseudocode is a little stricter than an outline: write out your code in plain language, but arrange it roughly how you would lay out the real code, including indentation.

**Pseudocode for opening files**

```
Create empty lists for gene names, lengths, exon counts, and gene types 

For each line in file:
    Strip the trailing newline
    Split the line into list elements at the tabs & assign to variables
    Calculate transcript length
    Append name, length, exon count, and gene type to their lists, converting the values to integers as needed
```
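
As a rough illustration, the pseudocode above could translate into Python like this. The column layout (name, start, end, exon count, gene type) and the two data rows are assumptions invented for this sketch, since the real file format is not shown here. An in-memory `io.StringIO` object stands in for an opened file so the example runs on its own.

```python
import io

# Hypothetical tab-separated layout: name, start, end, exon count, gene type
example_file = io.StringIO(
    "geneA\t100\t1100\t3\tprotein_coding\n"
    "geneB\t2000\t2500\t1\tlncRNA\n"
)

names, lengths, exon_counts, gene_types = [], [], [], []

for line in example_file:
    line = line.rstrip("\n")                                # strip the trailing newline
    name, start, end, n_exons, gene_type = line.split("\t") # split at the tabs
    length = int(end) - int(start)                          # assumed length calculation
    names.append(name)
    lengths.append(length)
    exon_counts.append(int(n_exons))
    gene_types.append(gene_type)

print(names)        # ['geneA', 'geneB']
print(lengths)      # [1000, 500]
print(exon_counts)  # [3, 1]
```

Notice that each indented line of pseudocode maps onto roughly one line of Python, which is the point of laying pseudocode out like code.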

**Pseudocode for a function to translate DNA**

```
def translate_dna(dna_sequence):

    define a genetic code dictionary
    initialize empty string for protein sequence
    for every three bases (one codon) in the dna sequence:
        if the codon is a stop codon, stop
        look up the corresponding amino acid using the genetic code
        append that amino acid to protein sequence

    return protein sequence
```
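
Turning that pseudocode into Python might look like the following sketch. To keep it short, the dictionary holds only a handful of codons from the standard genetic code (a full table has 64 entries), with `*` used here as a marker for stop codons; any codon missing from this partial table would raise a `KeyError`.

```python
def translate_dna(dna):
    # Partial genetic code, for illustration only
    genetic_code = {
        "ATG": "M", "GCC": "A", "AAA": "K", "TGG": "W",
        "TAA": "*", "TAG": "*", "TGA": "*",
    }
    protein = ""
    for start in range(0, len(dna) - 2, 3):
        codon = dna[start:start + 3]
        amino_acid = genetic_code[codon]
        if amino_acid == "*":
            break  # end translation at the first stop codon
        protein += amino_acid
    return protein

print(translate_dna("ATGGCCAAATGGTAA"))  # MAKW
```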

## Create placeholder functions with `pass`
I forgot to mention this in class, but **DELETE** `pass` when your function is complete. `pass` does nothing, so a function whose body is still just `pass` runs without an error but only returns `None`.

In [11]:
# Sometimes it's easier to list out the functions you'll need, then fill them in later.
# 'pass' will let you create incomplete function definitions that don't raise an error when you run the cell.
# DELETE the pass when the function is complete.


def mean(values):
    pass

def std(values):
    pass

def draw_random_sample(mean, sd):
    pass

def compare_sample_means(sample1, sample2):
    pass
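
Once the outline is in place, each placeholder gets filled in and its `pass` removed. Here is a sketch for the first two functions. Whether `std` should divide by `n - 1` (sample standard deviation, used here) or by `n` is an assumption; use whichever the assignment asks for.

```python
import math

def mean(values):
    return sum(values) / len(values)

def std(values):
    # Sample standard deviation: divides by n - 1 (an assumption, see above)
    m = mean(values)
    squared_diffs = [(v - m) ** 2 for v in values]
    return math.sqrt(sum(squared_diffs) / (len(values) - 1))

print(mean([2, 4, 6]))  # 4.0
print(std([2, 4, 6]))   # 2.0
```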

## Use small examples to test your code! And comment your code!!

In the genetic code/translate DNA function, we first wrote and tested code to grab codons from a DNA sequence. Once that was working, we used it in a function.
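
A small test can be as simple as a few `assert` statements on inputs short enough to check by hand. The sketch below uses a hypothetical helper named `grab_codons`; it is not the code from class, just an example of the idea.

```python
def grab_codons(dna):
    # Step through the string three bases at a time, keeping only complete codons
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]

# Tiny inputs where the right answer is obvious
assert grab_codons("ATGGCC") == ["ATG", "GCC"]
assert grab_codons("ATGGC") == ["ATG"]   # leftover 'GC' is not a full codon
assert grab_codons("") == []             # empty input gives an empty list
print("all small tests passed")
```

Edge cases such as a sequence whose length is not a multiple of three are much easier to spot on a five-base string than on a whole gene.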

# Rolling your own data structures

1. So far, we've been using simple lists to hold our data. For example, we read in a file of data with rows and columns and save each column as its own list. This makes sense when each column is a fundamentally different thing.

2. But sometimes we want more complex data structures. With time series data, it can make more sense to keep the rows together. Or you may have a gene network, which is useful to represent as a dictionary of lists: each entry is a gene, with a list of its interacting neighbors.

3. Often we don't need complex data structures; simple ones will do. In this class, many of our first examples could be implemented just as easily with basic data types. But the point here is to practice *building* your own complex data structures from simple elements - a list of dictionaries of lists, etc. Think of something like an address book - a dictionary of lists is a good way to hold multiple types of information about each dictionary entry.
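
As an illustration of these ideas, here is a sketch of a gene network stored as a dictionary of lists, followed by a time series stored as a list of dictionaries. All gene names and values are made up.

```python
# Hypothetical gene network: each key is a gene, each value lists its neighbors
network = {
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneA"],
    "geneC": ["geneA"],
}

print(network["geneA"])  # ['geneB', 'geneC']

# Add a new interaction between geneB and geneC, recorded in both directions
network["geneB"].append("geneC")
network["geneC"].append("geneB")

# Count the neighbors of each gene
degrees = {gene: len(neighbors) for gene, neighbors in network.items()}
print(degrees)  # {'geneA': 2, 'geneB': 2, 'geneC': 2}

# A list of dictionaries keeps each row of a time series together
time_series = [
    {"time": 0, "value": 1.5},
    {"time": 1, "value": 2.0},
]
print(time_series[1]["value"])  # 2.0
```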

In [12]:
# Making a list of lists is easy - just make a list of your elements
# For example, three data points with xyz coordinates:

xyz = [[10, 0, 2], [3, 8, 7], [6, 5, 3]]

In [13]:
xyz # look at the output, bracket and comma structure show it's a list of lists

[[10, 0, 2], [3, 8, 7], [6, 5, 3]]

In [14]:
xyz[0] # first element in the list xyz, which is itself a list

[10, 0, 2]

In [15]:
xyz[0][1] # read left to right - result of first indexing is a list, to which second index is applied

0

In [16]:
# How do you get 'columns'? NOT like this:
xyz[:][2]

# Why is this wrong? The slice [:] just returns a copy of the whole list of lists,
# so the second index [2] grabs the third inner list (the third row), not the third column

[6, 5, 3]

In [17]:
# List comprehensions to access columns - builds a list of y values
# To access the middle (y) coordinates:

y = [row[1] for row in xyz]

In [18]:
y

[0, 8, 5]

In [19]:
# The one-liner list comprehension above is equivalent to this for loop:

y = []
for row in xyz:
    y.append(row[1])
