### Week 1 - Python Basics

Covers:

- Importing modules
- Reading / writing files
- Building functions
- Docstrings & commenting
- Styling & good practices

#### Importing modules

Imports should always go at the top of the script and be ordered consistently, typically alphabetically within groups.

A sensible way to group them is to separate system, 3rd party and local package imports:




In [None]:
# system
import math
import os
import sys
from time import sleep

# 3rd party installed
import numpy as np
import pandas as pd

# local packages
import week1_example_script as example

In [None]:
# functions imported from other scripts may be called
example.say_hello()

#### Reading & writing files

Use the Python `open()` function to open files. The same function is used for both reading and writing, depending on the mode given:


`open(filename, mode)`


In [None]:
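# open file and read the whole contents as a single string
f = open('./example_file.tsv', 'r')
file_contents = f.read()
f.close()  # always close the file once finished with it

file_contents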

In [None]:
# read specific line
f = open('./example_file.tsv', 'r')
f_lines = f.readlines() # read in all lines and store in f_lines variable
f.close()

# read just the 6th line
f_lines[5]

In [None]:
# loop over the file line by line, allows operations on each line as it is read
f = open('./example_file.tsv', 'r')
for line in f:
    if 'A2M' in line:
        print(line)
f.close()

In [None]:
# use of with statement automatically closes file
with open('./example_file.tsv') as f:
    # read file into a Pandas dataframe
    df = pd.read_csv(f, sep='\t')

print(df)
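
When pandas is not needed, the same `with` pattern works for plain line-by-line reading. The sketch below is self-contained: it first writes a small temporary tab-separated file (the file contents, including the `A2M` row, are invented purely for illustration) and then reads it back, stripping the trailing newline from each line before splitting on tabs.

```python
import os
import tempfile

# create a small example file so the snippet runs on its own
# (the contents here are made up for illustration)
tmp_dir = tempfile.mkdtemp()
example_path = os.path.join(tmp_dir, 'example_file.tsv')

with open(example_path, 'w') as f:
    f.write('gene\tchrom\n')
    f.write('A2M\t12\n')
    f.write('BRCA1\t17\n')

matching_rows = []

with open(example_path, 'r') as f:
    for line in f:
        # each line ends with '\n', strip it before splitting on tabs
        fields = line.rstrip('\n').split('\t')
        if fields[0] == 'A2M':
            matching_rows.append(fields)

# the file is closed automatically on leaving the with block
print(matching_rows)
print(f.closed)
```

Stripping the newline first avoids the blank line that `print(line)` would otherwise add after every row.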


In [None]:
# open a new file, use 'w' mode to write to file
f = open('./output/new_example_file.txt', 'w')
f.write('line1\n')
f.close()

In [None]:
# open file in 'a' mode to append to file, else it will be overwritten
f = open('./output/new_example_file.txt', 'a')
f.write('line2\n')
f.close()
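
The write and append steps above can also be done with the `with` statement, so the file is closed even if an error occurs part way through. This is a minimal sketch that writes to a temporary directory (an assumption made so it does not depend on an existing `./output/` folder).

```python
import os
import tempfile

# temporary directory used so the example does not touch real files
output_dir = tempfile.mkdtemp()
output_path = os.path.join(output_dir, 'new_example_file.txt')

# 'w' creates the file, or overwrites it if it already exists
with open(output_path, 'w') as f:
    f.write('line1\n')

# 'a' adds to the end of the existing file instead of overwriting
with open(output_path, 'a') as f:
    f.write('line2\n')

# read the file back to check both lines are present
with open(output_path, 'r') as f:
    written_lines = f.read().splitlines()

print(written_lines)
```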

In [None]:
# many packages have their own functions for writing e.g. Pandas

df.to_csv('./output/saved_df.tsv', sep='\t')
df.to_excel('./output/saved_df.xlsx')

#### Building Functions

Functions in Python are defined with the `def` keyword. A function is a block of code that runs when it is called; it can be passed arguments and can return results.

In [None]:
def read_file(file_to_open):
    with open(file_to_open) as f:
        df = pd.read_csv(f, sep='\t')

    return df

In [None]:
df = read_file('./example_file.tsv')
print(df)

Key things for a good function:

- does one 'thing'
- sensibly named, often useful to begin with a verb (e.g. read, get, split etc.)
- verbose names are better than abbreviated
- use when doing the same thing repeatedly (e.g. a calculation or filtering data etc.)
- has a docstring

In [59]:
def get_sample_id(name):
    if '_' in name:
        sample_id = name.split('_')[0]
        print(f'Sample id is: {sample_id}')
        return sample_id
    else:
        print('Given name does not contain "_"')
        return


In [60]:
name = "sample1_abc_123"
sample_id = get_sample_id(name)


Sample id is: sample1


In [62]:
name = 'somethingElse'
sample_id = get_sample_id(name)
print(sample_id)

Given name does not contain "_"
None
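
As a possible variation on the function above, the sketch below separates returning a result from printing it, and adds a hypothetical `delimiter` argument with a default value (this argument is not part of the original function). The caller then decides how to handle a `None` result.

```python
def get_sample_id(name, delimiter='_'):
    """Returns the part of name before the first delimiter, or None."""
    if delimiter not in name:
        return None

    return name.split(delimiter)[0]


sample_names = ['sample1_abc_123', 'somethingElse']

for sample_name in sample_names:
    sample_id = get_sample_id(sample_name)

    # check explicitly for None before using the result
    if sample_id is None:
        print(f'No sample id found in {sample_name}')
    else:
        print(f'Sample id is: {sample_id}')

# the default delimiter can be overridden by keyword
hyphen_id = get_sample_id('sample2-xyz', delimiter='-')
print(hyphen_id)
```

Keeping the printing outside the function means it does one 'thing' and can be reused where no output is wanted.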


#### Docstrings & comments

Use these to make code more understandable and readable: pairs of triple quotes for docstrings and \# for comments.

Code should be written to be readable on its own, but comments can help for more complex functions and / or to explain the rationale for why something is being done, e.g.:

In [None]:
def read_file(file_to_open):
    """
    Reads in given file to Pandas dataframe

    Args:
        - file_to_open (str): path to file to read in
    
    Returns:    
        - df (df): DataFrame of data in given file
    """
    with open(file_to_open) as f:
        df = pd.read_csv(f, sep='\t')

    return df
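
To show a docstring and a comment working together, here is a small sketch using a hypothetical helper, `get_file_extension()` (not part of the course scripts). The docstring says what the function does, the comment explains why a particular approach was chosen, and the docstring can be read back through `__doc__` (this is also what `help()` displays).

```python
def get_file_extension(file_name):
    """
    Returns the extension of the given file name

    Args:
        - file_name (str): name of file, e.g. 'data.tsv'

    Returns:
        - extension (str): text after the last '.', or '' if none
    """
    if '.' not in file_name:
        return ''

    # rsplit with maxsplit=1 splits on the last '.' only, so names
    # such as 'sample.vcf.gz' give 'gz' rather than 'vcf.gz'
    extension = file_name.rsplit('.', 1)[1]

    return extension


# the docstring is stored on the function itself
print(get_file_extension.__doc__)
print(get_file_extension('sample.vcf.gz'))
```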

#### Styling & Good Practices

PEP8 style guide: http://www.python.org/dev/peps/pep-0008/

Easiest is to use a linter extension for your IDE, e.g. Pylint for VSCode, which will automatically highlight most issues

Things to follow:

- variables, functions, methods: `lower_case_with_underscores`
- classes: `ClassesAsCamelCase` (first letter capitalised, called CapWords in PEP8)
- **No single character variables** except in very short blocks, e.g.:

&nbsp;&nbsp;&nbsp;&nbsp;`for i in range(0, 10):`

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`print(i)`

- better to be more verbose in naming to make it easier to read

- limit line length: comments / docstrings to 72 characters, code to 79 (doesn't have to be strict if it makes the code less readable)
- long strings can be split in different ways, preferred is to use parentheses:

&nbsp;&nbsp;&nbsp;&nbsp;`long_string = (`
    
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`"this is a really really really really really really really "`

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`"really really really really really long string"`

&nbsp;&nbsp;&nbsp;&nbsp;`)`

- formatting strings using f-strings to add in variables: https://www.python.org/dev/peps/pep-0498/

```
name = 'John'
formatted_string = f"His name is {name}"
```

- adding rulers in the IDE makes the line length limits easier to follow
- add single line docstrings for obvious functions (e.g. a simple `find_sqrt()`), multi-line docstrings with args / returns / outputs for more complex ones; can also include a short usage example if appropriate
- prefer code being readable over lots of comments
- using `__name__` global variable:
    - `if __name__ == "__main__":` is used to define what to run when the script is called directly
    - typically all code should be in functions, then the functions called in this section
    - https://docs.python.org/3/library/__main__.html

- generally should **not** use globals, i.e. variables defined outside of functions (in global scope)
- these are bad practice as they can cause conflicts and unexpected side effects, and make it difficult to identify errors
- the exception is constants, which should be `ALWAYS_CAPITALISED`
- constants are variables that should not be changed once defined (Python does not enforce this, the capitalised name is a convention); they are normally defined in a file then imported (e.g. for tokens):

&nbsp;&nbsp;&nbsp;&nbsp;config.py (avoid naming the file `token.py`, as this would shadow the standard library `token` module):

&nbsp;&nbsp;&nbsp;&nbsp;```AUTH_TOKEN = 'XXXXXX'```

&nbsp;&nbsp;&nbsp;&nbsp;my_script.py:

&nbsp;&nbsp;&nbsp;&nbsp;```from config import AUTH_TOKEN```

- consistent use of spaces between lines and functions etc.
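
Several of the points above can be seen together in one short script. This is only an illustrative sketch: `CircleShape`, `format_area_message()` and `DECIMAL_PLACES` are hypothetical names chosen for the example, not taken from any of the scripts linked below.

```python
"""Hypothetical example script showing the layout described above."""
import math

# constant: capitalised and defined once at module level
DECIMAL_PLACES = 2


class CircleShape:
    """Simple circle, class name written in CapWords."""

    def __init__(self, radius):
        self.radius = radius

    def calculate_area(self):
        """Returns the area of the circle."""
        return math.pi * self.radius ** 2


def format_area_message(circle_shape):
    """Returns a message describing the area of the given circle."""
    area = circle_shape.calculate_area()

    # long string split across lines using parentheses
    area_message = (
        f"Circle with radius {circle_shape.radius} "
        f"has area {area:.{DECIMAL_PLACES}f}"
    )

    return area_message


def main():
    """Entry point, all work is done through function calls."""
    unit_circle = CircleShape(radius=1)
    print(format_area_message(unit_circle))


if __name__ == "__main__":
    # only runs when the script is called directly, not when imported
    main()
```

Because everything lives in functions, another script can import `format_area_message` without triggering the `print` in `main()`.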


Example (relatively) short scripts:
- https://github.com/eastgenomics/eggd_generate_bed/blob/master/resources/home/dnanexus/generate_bed.py
- https://github.com/eastgenomics/dx_job_monitor/blob/main/dx_job_monitor.py
- https://github.com/eastgenomics/hermes/blob/main/hermes.py
- https://github.com/eastgenomics/athena/blob/master/bin/coverage_stats_single.py
- https://github.com/Addy81/BroadWork/blob/master/bin/2.hap.py-analysis.ipynb


#### Useful resources
- common useful functions with worked examples: https://www.w3schools.com/python/python_ref_functions.asp
- StackOverflow, W3schools, learnpython.org, docs.python.org

Books:
- Learn Python the Hard Way
    - Lots of exercises, teaches Python through doing lots of coding practice
    - https://www.valeacademy.org.uk/documents/download/5e7e0291-cfa4-4f7e-84af-64cf0a01017d.pdf
- Think Python
    - Separate chapters covering core concepts, some examples & mini projects to practice
    - https://greenteapress.com/wp/think-python/
- Automate the Boring Stuff:
    - Lots of projects, good for learning how to put things together
    - https://automatetheboringstuff.com/
    