# Workshop 3

## User Defined Functions

As well as the functions built in to Python, sometimes we want to write our own functions.

There are two main reasons for this:
* If you want to repeat the same task lots of times, it is easier to use a function than to copy and paste the same piece of code.
* When you are writing longer scripts, it is easier to read code if it is divided into functions.

A function usually has an input and an output.  We define a function using the `def` keyword.

In [1]:
def myfunction(mystring):
    newstring = "%s is my string" % mystring
    return newstring

When you run this code, Python loads the function into its memory, but doesn't actually run the function - it is just stored ready to use.
The input for the function above is `mystring` and the output is `newstring`.

The `return` value of a function is another name for its output. You can assign the output of a function to a variable.

In [2]:
x = myfunction('hello')
print (x)

hello is my string


In [3]:
y = 'goodbye'
x = myfunction(y)
print (x)

goodbye is my string


To define functions, we need to remember the concept of `arguments` as the input to functions.

The variable names you use when you pass arguments to a function do not need to be the same as the argument names inside the function.  This can be slightly confusing.

In [4]:
def another_function(x, y):
    tot = x + y
    return (tot)

In [5]:
another_function(1, 2)

3

In [6]:
v = 1
w = 2
another_function(v, w)

3

In [7]:
v = 1
w = 2
another_function(x=v, y=w)

3

Just like for built-in functions, the function decides which argument is which based on either their order or the names given inside the brackets when you call the function.
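With addition the order of the arguments makes no difference to the result, so here is a minimal sketch using a hypothetical `subtract` function (not part of the workshop files), where the order does matter:

```python
# Hypothetical helper for illustration: the result depends on which
# value ends up as x and which as y.
def subtract(x, y):
    return x - y

by_position = subtract(5, 2)   # matched by order: x=5, y=2
by_name = subtract(y=2, x=5)   # matched by name, so the order is free
swapped = subtract(2, 5)       # matched by order: x=2, y=5

print(by_position, by_name, swapped)
```

The first two calls give the same answer because `x` and `y` receive the same values in both; only the third call swaps them.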

**Exercise** 

Write a function which returns the mean of two numbers and call the function in Python.

*Bonus: Write and call a function which returns the mean of any number of integers, using a for loop and a list as an argument.*

Inside a function we can combine several steps.

For example, this function checks how many lines are inside a file.

In [8]:
def countLines(infile):
    lines = open(infile).readlines()
    x = 0
    for line in lines:
        x += 1
    return (x)

In [9]:
countLines('lines1.txt')

4

In [10]:
countLines('lines2.txt')

5
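As an aside, the counting loop can be replaced by `len`, which gives the number of items in a list. The sketch below assumes a hypothetical function name, `count_lines_short`, and writes its own throwaway file so that it can run without `lines1.txt`. It also opens the file using `with`, which closes the file automatically.

```python
import os
import tempfile

# Hypothetical shorter version of countLines, for illustration only.
def count_lines_short(infile):
    # "with" closes the file automatically once the block ends
    with open(infile) as handle:
        lines = handle.readlines()
    # len gives the number of items in the list, i.e. the number of lines
    return len(lines)

# Write a small temporary file so the example can run on its own
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
tmp.write("first\nsecond\nthird\n")
tmp.close()

n_lines = count_lines_short(tmp.name)
print(n_lines)

# Tidy up the temporary file
os.remove(tmp.name)
```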

We can extend `countLines` to instead return True if the number of lines is even or False if the number of lines is odd.

In [11]:
def countLinesOddEven(infile):
    lines = open(infile).readlines()
    x = 0
    for line in lines:
        x += 1
    if x % 2 == 0:
        iseven = True
    else:
        iseven = False
    return (iseven)

In [12]:
countLinesOddEven('lines1.txt')

True

In [13]:
countLinesOddEven('lines2.txt')

False

**Exercise**
* Modify the function to return the total number of characters in the file (you can use the `len` function to find the number of characters in a string).

*Bonus: Instead, return a list of the lengths of all the lines in the file*

It's also possible to define functions which have no arguments, or functions which return nothing.  These are sometimes useful.

For example, if you always need the same list of column names for your file, you can use a function to get them each time.

In [14]:
def getColumnNames():
    return (["name", "start_position", "end_position", "strand"])

In [15]:
x = getColumnNames()
print (x)

['name', 'start_position', 'end_position', 'strand']


If you do something to the output file inside the function you may not need to return anything.

In [16]:
def removeSpaces(infile, outfile):
    inf = open(infile).readlines()
    out = open(outfile, "w")
    for line in inf:
        out.write(line.replace(" ", ""))
    out.close()

In [17]:
removeSpaces('removespaces.txt', "nospaces.txt")

If you look at the file `nospaces.txt` you will see the result of running this function.
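A function without a `return` statement still gives something back when it is called: the special value `None`. A minimal sketch, using a hypothetical `say_hello` function:

```python
# Hypothetical function for illustration: it prints a message
# but has no return statement.
def say_hello(name):
    print("hello %s" % name)

# The call still runs the function, but there is no output to store,
# so result holds None.
result = say_hello("Katy")
print(result)
```

This is worth remembering if a variable unexpectedly contains `None`: the function you called may be missing a `return`.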

## Pandas

We will now briefly look at a specific Python module - `pandas` - for dealing with dataframes (tables of data), as it is often useful.

The convention is to rename `pandas` as `pd` when we import it.

In [18]:
import pandas as pd

Pandas can be used to read, write and parse tab or comma delimited tables.

These are text files where the columns are separated by either tabs or commas.

We can read a table using the `pd.read_csv` function.

In [19]:
comma_delim = pd.read_csv("commadelim.txt")

In Jupyter, if you just type the name of a `pandas` dataframe it will display it nicely.

In [20]:
comma_delim

Unnamed: 0,Name,Age,Location
0,Katy,33,Cambridge
1,Mary,75,London
2,Bob,44,Oxford


For a tab delimited table we have to add an extra argument, to tell Python that the table is tab delimited.

`\t` represents a tab character in Python.

In [21]:
tab_delim = pd.read_csv("tabdelim.txt", sep="\t")

In [22]:
tab_delim

Unnamed: 0,Name,Age,Location
0,Katy,33,Cambridge
1,Mary,75,London
2,Bob,44,Oxford


We can easily sort tables in pandas with the `sort_values` method.

In [23]:
tab_delim = tab_delim.sort_values('Age')

In [24]:
tab_delim

Unnamed: 0,Name,Age,Location
0,Katy,33,Cambridge
2,Bob,44,Oxford
1,Mary,75,London
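By default `sort_values` sorts from smallest to largest and gives back a new table rather than changing the original, which is why the result is assigned back to `tab_delim` above. The sketch below builds a small table directly in Python (the `people` variable is an assumption for illustration, so no input file is needed) and sorts it from largest to smallest using `ascending=False`.

```python
import pandas as pd

# Illustrative table built in Python instead of being read from a file
people = pd.DataFrame({"Name": ["Katy", "Mary", "Bob"],
                       "Age": [33, 75, 44],
                       "Location": ["Cambridge", "London", "Oxford"]})

# ascending=False sorts from the largest value to the smallest
oldest_first = people.sort_values("Age", ascending=False)

sorted_names = list(oldest_first["Name"])
original_names = list(people["Name"])  # the original table is unchanged

print(sorted_names)
print(original_names)
```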


It's also easy to delete or add a row or column.

We can delete rows or columns using the `drop` method.  It refers to rows as axis 0 and columns as axis 1.

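We access rows using the row label (the index) shown on the left side of the table.  This label stays with its row when the table is sorted, so it is not always the same as the row's position.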

In [25]:
tab_delim = tab_delim.drop(0, axis=0)

In [26]:
tab_delim

Unnamed: 0,Name,Age,Location
2,Bob,44,Oxford
1,Mary,75,London


We access columns using the column names.

In [27]:
tab_delim = tab_delim.drop('Name', axis=1)

In [28]:
tab_delim

Unnamed: 0,Age,Location
2,44,Oxford
1,75,London
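If the axis numbers are hard to remember, `drop` also accepts the keywords `index` (for rows) and `columns` (for columns). A short sketch, again using an illustrative `people` table built in Python rather than the workshop files:

```python
import pandas as pd

# Illustrative table built in Python instead of being read from a file
people = pd.DataFrame({"Name": ["Katy", "Mary", "Bob"],
                       "Age": [33, 75, 44],
                       "Location": ["Cambridge", "London", "Oxford"]})

# Same effect as people.drop(0, axis=0): remove the row labelled 0
without_row = people.drop(index=0)

# Same effect as people.drop('Name', axis=1): remove the Name column
without_col = people.drop(columns="Name")

remaining_rows = list(without_row.index)
remaining_cols = list(without_col.columns)

print(remaining_rows)
print(remaining_cols)
```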


To add a column or row, we can use a list.

For columns, we give the column name we want to add in square brackets after the table variable name.

In [29]:
tab_delim['Name'] = ['Bob', 'Mary']

In [30]:
tab_delim

Unnamed: 0,Age,Location,Name
2,44,Oxford,Bob
1,75,London,Mary


For rows, we do the same but we add `.loc` before the first square bracket.

In [None]:
tab_delim.loc[0] = [33, 'Cambridge', 'Katy']

In [None]:
tab_delim
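Two details of `.loc` are easy to miss: the values in the list must be in the same order as the columns, and if the row label already exists that row is overwritten instead of a new one being added. A minimal sketch on an illustrative `people` table (an assumption for this example, laid out like `tab_delim` before the `Name` column was added back):

```python
import pandas as pd

# Illustrative table with row labels 2 and 1
people = pd.DataFrame({"Age": [44, 75],
                       "Location": ["Oxford", "London"]},
                      index=[2, 1])

# Label 0 does not exist yet, so a new row is added at the bottom
people.loc[0] = [33, "Cambridge"]

# Label 1 already exists, so that row is overwritten
people.loc[1] = [76, "London"]

row_labels = list(people.index)
mary_age = people.loc[1, "Age"]

print(row_labels)
print(mary_age)
```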

To write a table to file, we use the table name plus the `to_csv` method.

In [None]:
tab_delim.to_csv("mytable.tsv", sep="\t")

Regardless of the input file format, the output file will be comma delimited unless we specify `sep="\t"`.
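By default `to_csv` also writes the row labels as a first column with no heading, and `read_csv` names such a column `Unnamed: 0` when the file is read back in. Passing `index=False` leaves the row labels out. The sketch below uses an illustrative `people` table and keeps the output as text in memory (`to_csv` returns a string when no filename is given) so that no files are created.

```python
import io
import pandas as pd

# Illustrative table built in Python instead of being read from a file
people = pd.DataFrame({"Name": ["Katy", "Mary"], "Age": [33, 75]})

# Default: the row labels are written as an extra first column
with_index = people.to_csv(sep="\t")

# index=False leaves the row labels out of the output
without_index = people.to_csv(sep="\t", index=False)

# Read both versions back in to compare the column names
cols_with = list(pd.read_csv(io.StringIO(with_index), sep="\t").columns)
cols_without = list(pd.read_csv(io.StringIO(without_index), sep="\t").columns)

print(cols_with)
print(cols_without)
```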

**Exercise**
* Make a tab delimited table in a text editor and save it in the folder with your notebook.  Make sure there are column headings and at least one column is numerical (integers or floats).
* Read the table into Python
* Sort the table by the numerical column
* Add a new column in Python
* Add a new row in Python
* Delete a column
* Add a column
* Output the new table to a text file.

*Bonus: Try to make an additional column with a transformation applied to the numerical column, e.g. add 1 to all the numbers and put this in a new column.*