# Practical Python Programming for Biologists
Author: Dr. Daniel Pass | www.CompassBioinformatics.com

---

# Functions

Functions are essential for keeping our code in reusable blocks, improving readability, and generally reducing the amount of typing and repetition. If you decide to change a piece of code, it's much more work if it exists in several different places rather than in one reusable chunk. Keeping it in one place also makes unexpected behaviour less likely, because copies that should be identical can no longer drift apart.

Functions allow us to keep a set of instructions under a single name and call it whenever needed.

Basically, anything we've already learned how to do can be put in a function. It's like whole mini-programs within a program.

A function is created with ```def``` and is called in the same way as built-in functions that we've already seen, such as ```len()```. We define its parameters within the parentheses ```()```, and can then use them as variables inside the function.

## A basic function

In [None]:
def function_to_check_if_pass(measurement):  # starting with def -- defining a function
    if measurement > 75:
        print("Over Threshold!")

Run that code and see what happens. Not very exciting, is it? That's because you've only ***defined*** the function, but you haven't called it.

Now let's actually use the function:

In [None]:
function_to_check_if_pass(80)
function_to_check_if_pass(95)
function_to_check_if_pass(60)

Over Threshold!
Over Threshold!


As you can see, the value we put in the ```()``` becomes the variable within the function. But we only get two responses back from three calls: 60 is not over 75, so the ```if``` block is skipped and nothing is printed. Let's change that:

In [None]:
def better_function_to_check_if_pass(measurement):
    if measurement > 75:
        print("Over Threshold!")
    else:
        print("Below Threshold")

In [None]:
better_function_to_check_if_pass(80)
better_function_to_check_if_pass(95)
better_function_to_check_if_pass(60)

Over Threshold!
Over Threshold!
Below Threshold


## Exercise - Function Bugfix

I think we're all bored of writing code to calculate GC%! Here's a badly made function to do our favourite task of calculating the GC content of a DNA sequence. Try to make it work to output the result for both of the given sequences, and we'll never need to write it again, as we can always reuse this function!

First write the code to call the function, and then see where the errors point you

In [None]:
def gc_content(sequence):
    gc_count = (sequence.count("G"), sequence.count("C"))
    gc_percentage = (gc_count / len(sequence)) * 100
    print(round(gc_percentage, 2))

# Test the function on these sequences
gene1 = "ATCGGTA"
gene2 = "GATTACA"


42.86
28.57


---

## Better functions

So we know how to make a function, and how to call it. But so far we have used it as a data dead-end. Information goes into the function, but nothing comes out, which means it's hard to make useful pipelines. This is where the ```return``` statement comes in.

Using ```return``` you tell the function what data to give back to the main script. Here let's do some calculations with fish breeding tanks.

In [None]:
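# Function to calculate tank volume from a radius
# (the formula is pi * radius squared * 2, i.e. a cylinder with a height of 2, using 3.14 for pi)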
def calc_tank_volume(radius):
  volume = (3.14 * radius ** 2) * 2
  return volume

In [None]:
# Radius of tanks in metres
radius = 0.5
small_tank_volume = calc_tank_volume(radius)
print("Small Tank with radius", radius, "is", small_tank_volume, "metres cubed")

# large tank
radius = 1.5
large_tank_volume = calc_tank_volume(radius)
print("Large Tank with radius", radius, "is", large_tank_volume, "metres cubed")

Small Tank with radius 0.5 is 1.57 metres cubed
Large Tank with radius 1.5 is 14.13 metres cubed


Here we've called the function twice, and each call returned a value calculated from the radius we passed in. Note that we don't need to store the returned value in a variable; we could also use the function call directly inside a print command:

In [None]:
# Same as above!
print(calc_tank_volume(7))
print(calc_tank_volume(3.5))

307.72
76.93
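
To see why ```return``` matters, it can help to compare a function that only prints with one that returns its result. The sketch below uses two hypothetical helper names, `print_volume` and `return_volume`, which are not part of the course code; both reuse the same formula as `calc_tank_volume()`.

```python
# Hypothetical helpers for illustration only (same formula as calc_tank_volume)
def print_volume(radius):
    volume = (3.14 * radius ** 2) * 2
    print(volume)            # shows the value on screen, but gives nothing back

def return_volume(radius):
    volume = (3.14 * radius ** 2) * 2
    return volume            # hands the value back to the caller

printed = print_volume(1)    # prints 6.28
returned = return_volume(1)  # prints nothing, but stores 6.28

print(printed)               # None: a function without return gives back None
print(returned)              # 6.28
print(returned * 2)          # a returned value can be used in further calculations
```

Only the returned value can be passed on to the next step of a script, which is why most of the functions from here on use ```return``` rather than ```print()```.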


**Exercise:** Create a new function that returns the supplier based on the tank volume
1. Make the function that accepts a number and returns the supplier
2. Test the function with each volume in the list
3. **Extension:** What would happen if the volume was too large for any option? Make sure you have an output for that!
4. **Extension:** Combine your function and the `calc_tank_volume()` function above so that you can get the output by providing the radius, not the volume

**Suppliers:**

| Volume (m3) | Supplier |
|-------------|----------|
| 0-15        | Amazon   |
| 15-45       | Pets4U   |
| 45-900      | Seaworld |

In [None]:
# Define your function


In [None]:
# Run your tests
# volumes of tanks to buy [1.2, 17, 3.5, 26]
# radii of tanks to buy [0.2, 2.4, 19, 6]


---
Hopefully you can see how our defined functions work exactly like built-in functions such as ```sum()```, ```max()```, ```len()``` etc.

Let's make it more complex, as we can pass multiple parameters to functions. How many fish can we fit in our tanks? Note how we can pass multiple variables to the function, but can choose what to return at the end.

In [None]:
def calc_fish_number(radius, fish):
  volume = (3.14 * radius ** 2) * 2

  if fish == "Guppy":
    total_fish = 17 * volume
  elif fish == "Tuna":
    total_fish = 3 * volume

  return total_fish

# Tests - arguments are the radius in metres and the fish species
print("Number of Guppies manageable:", calc_fish_number(0.5, "Guppy"))
print("Number of Tuna manageable:", calc_fish_number(1.5, "Tuna"))

Number of Guppies manageable: 26.69
Number of Tuna manageable: 42.39


Ah yes, we all know that small guppies can live at 17 per m3 but tuna only 3 per m3 (of course!)
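
One thing to watch in `calc_fish_number()`: if `fish` is neither "Guppy" nor "Tuna", `total_fish` is never created and Python raises an error at the ```return``` line. One possible way to guard against that is sketched below. The function name `calc_fish_number_safe`, the `FISH_PER_M3` dictionary and the choice to return `None` for an unknown species are all assumptions made for this illustration.

```python
# Stocking densities (fish per m3) copied from the example code above
FISH_PER_M3 = {"Guppy": 17, "Tuna": 3}

def calc_fish_number_safe(radius, fish):
    volume = (3.14 * radius ** 2) * 2

    if fish not in FISH_PER_M3:
        # Assumption: report the problem and return None rather than crash
        print("Unknown fish:", fish)
        return None

    return FISH_PER_M3[fish] * volume

print(calc_fish_number_safe(0.5, "Guppy"))
print(calc_fish_number_safe(0.5, "Shark"))
```

Keeping the densities in a dictionary also means a new species can be added without writing another ```elif```.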

**Default function values**

We can also give and return multiple variables, and even set default values. Note the three values being returned and the three variables they are later assigned to. Default values are assigned in the function definition with ```=```, and are used if no value is passed for that parameter (see the second function call).

Try changing the variables and function calls to test the function parameters. I have split it into three code blocks to demonstrate the different parts of the code, but in reality it all runs together as one script.

In [None]:
# Create function
                                         ## Note the default assigned values
def calculate_fish_population(fish_count, tank_length=1, tank_width=1, tank_height=1):
    # Do some calculations
    tank_volume = tank_length * tank_width * tank_height
    water_volume = tank_volume * 0.8  # Consider 80% usable water volume
    fish_density = fish_count / water_volume
    max_fish_capacity = water_volume * 200

    if fish_density <= 0.5:
        population_level = 'Low'
    elif fish_density <= 1.0:
        population_level = 'Medium'
    else:
        population_level = 'High'

    # Returning three outputs
    return fish_density, population_level, max_fish_capacity

In [None]:
## Our test data
fish_count = 150
tank_length = 2.2
tank_width = 0.8
tank_height = 0.5

# Print the return values
print(calculate_fish_population(fish_count, tank_length, tank_width, tank_height))

# Print the Standard tank size (using default values)
print(calculate_fish_population(fish_count))


(213.06818181818176, 'High', 140.80000000000004)
(187.5, 'High', 160.0)
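
Default values can also be overridden one at a time by naming the parameter in the call (a keyword argument). The sketch below uses a shortened copy of the function so that it runs on its own; the name `fish_density_in_tank` is hypothetical, and only the density is returned.

```python
# Shortened, hypothetical version of calculate_fish_population for illustration
def fish_density_in_tank(fish_count, tank_length=1, tank_width=1, tank_height=1):
    water_volume = tank_length * tank_width * tank_height * 0.8
    return fish_count / water_volume

# All defaults used: a 1 x 1 x 1 tank
print(fish_density_in_tank(150))

# Only the height is changed; length and width keep their defaults
print(fish_density_in_tank(150, tank_height=2))

# Keyword arguments can be given in any order
print(fish_density_in_tank(150, tank_height=2, tank_length=3))
```

Naming the arguments makes a call easier to read, and avoids mistakes when several parameters are plain numbers that are easy to mix up.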


Just printing the output of the function to the screen isn't very useful!

We could keep those values together in a single variable, or assign them separately to new variable names using tuple unpacking:

In [None]:
# Custom tank size
density, population, capacity = calculate_fish_population(fish_count, tank_length, tank_width, tank_height)

# Output data
print("Fish Density:", round(density, 2), "fish per unit volume")
print("Population Level:", population)
print("Max Fish Capacity:", capacity, "units")

if fish_count > capacity:
  print("Too many fish for this tank")
else:
  print("Tank is appropriate size")

Fish Density: 213.07 fish per unit volume
Population Level: High
Max Fish Capacity: 140.80000000000004 units
Too many fish for this tank
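
The other option is to keep all the returned values together in one variable. A function that returns several values really returns a single tuple, so the results can be picked out by position. The sketch below uses a small stand-in function, `tank_summary`, which is hypothetical and only there so that the example runs on its own.

```python
# Hypothetical stand-in that returns two values, in the style of calculate_fish_population
def tank_summary(fish_count, water_volume):
    density = fish_count / water_volume
    capacity = water_volume * 200
    return density, capacity

results = tank_summary(150, 0.8)   # one variable holds both values
print(results)                     # (187.5, 160.0)
print(results[0])                  # 187.5 - first item, the density

# A tuple cannot be changed; convert it to a list if you need to edit the values
as_list = list(results)
print(as_list)                     # [187.5, 160.0]
```

Unpacking into named variables is usually easier to read, but indexing is handy when you only need one of the returned values.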


---

### Better String Formatting for cleaner code

We are at a point now where we are creating and outputting multiple variables at once, and it can become tiresome to write print commands that contain lots of outputs. Let's use our last function as an example. This is a confusing line to read:



In [None]:
print("Tank of size", str(tank_length) + " x " + str(tank_width) + " x " + str(tank_height), "produced capacity", int(capacity), "at fish density",  round(density, 2))

Tank of size 2.2 x 0.8 x 0.5 produced capacity 140 at fish density 213.07


Using the ```.format()``` method we can make that cleaner by putting the variables at the end:

In [None]:
print("Tank of size {} x {} x {} produced capacity {} at fish density {}".format(tank_length, tank_width, tank_height, int(capacity), round(density,2)))

Tank of size 2.2 x 0.8 x 0.5 produced capacity 140 at fish density 213.07


This depends on all the variables being in the correct order, but you can also tag them with variable names and then call them in any order:

In [None]:
print("Capacity is {cap} at fish density {den} | Note: Tank size {l} x {w} x {h} \nDONT FORGET: Height is {h}\n".format(l=tank_length, w=tank_width, h=tank_height, cap=int(capacity), den=round(density,2)))

Capacity is 140 at fish density 213.07 | Note: Tank size 2.2 x 0.8 x 0.5 
DONT FORGET: Height is 0.5



A useful feature since Python 3.6 is the f-string: putting the letter f before the opening quote of a string (e.g., f"...") lets you embed variables and expressions that you've already defined directly inside the string. Be aware that f-strings only exist from Python 3.6 onwards, so code using them will not run on older versions.

See how with the ```f``` we don't have to keep closing and reopening quotes and adding commas ```(" ",)``` when printing, so the line looks cleaner.

None of this is essential Python, but it makes your code easier to read and reduces the chance of errors!

In [None]:
print("Tank of size {tank_length} x {tank_width} x {tank_height} produced capacity {capacity} at fish density {density}")
print(f"Tank of size {tank_length} x {tank_width} x {tank_height} produced capacity {round(capacity,0)} at fish density {round(density,2)}")

Tank of size {tank_length} x {tank_width} x {tank_height} produced capacity {capacity} at fish density {density}
Tank of size 2.2 x 0.8 x 0.5 produced capacity 141.0 at fish density 213.07
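
F-strings (and ```.format()```) also accept a format specification after a colon inside the braces, which can replace many calls to ```round()```. A minimal sketch, with the example values typed in directly rather than taken from the function:

```python
# Example values copied from the function output earlier in this section
density = 213.06818181818176
capacity = 140.80000000000004

# .2f = fixed-point with 2 decimal places, .0f = no decimal places
print(f"Capacity {capacity:.0f} at fish density {density:.2f}")

# The same specification works with .format()
print("Capacity {:.0f} at fish density {:.2f}".format(capacity, density))
```

Both lines print `Capacity 141 at fish density 213.07`. Note that the format specification rounds the number (141), whereas ```int()``` in the earlier examples cuts off the decimals (140).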


## Exercise - Function writing

### 1. Sample count function
We have a number of river water sites that we want to investigate, but the microscopy service will only take the job if we have at least 55 samples total!

We have collected samples from the following locations, with the number of sites and replicates at each:
```
River Wye | 20 sites | 5 replicates
River Taf | 5 sites | 6 replicates
River Aber | 11 sites | 5 replicates
River Iago | 2 sites | 18 replicates
```


1. Write a function named ```big_enough``` that returns True or False, and call it with the number of sites and replicates for each river
2. Put the sets of data into a list and use a loop to output the results
3. Write a better output print to make more sense!
4. Extension: Put the input data into a dictionary so you can test with just the river name i.e. ```print(big_enough('Wye'))```


In [None]:
# Write your function here:

# Testing your function - should return True or False
print(big_enough(20, 5))

True
False
True


### 2. DNA complement function
Create a function named ```complement``` that takes a DNA sequence as input and returns its complement strand (bases swapped for their complement, but in the same order)

    ```A <--> T```
    ```C <--> G```

 Extension: There is no built-in named method for reversing a string in Python (something that libraries and modules can solve), but there is a simple piece of code that can achieve it. Test this line, and then see if you can use it to create a full reverse-complement output: ```reverse_string = "My string"[::-1]```

In [None]:
# Your function
def complement():

# Testing your function
for_seq = "ATGCGCATGCTAGCTAG"


Forward sequence:	 ATGCGCATGCTAGCTAG
Complement sequence:	 TACGCGTACGATCGATC
