# Practical Python Programming for Biologists
Author: Dr. Daniel Pass | www.CompassBioinformatics.com

---

# Your code, on the command line

To make your Python code runnable from the command line there are a few main steps. Any code you have written or used this week should run equally well on a local computer as in a Jupyter notebook; it mainly needs to be told where the Python interpreter is.

### Stages for creating an independent Python program

1. **Open a new file for your script**: Write your Python code in a file with a .py extension - it is not technically essential but good practice.

2. **Define the shebang line**: The first line of your script must be a shebang line, which tells the system that the file is Python and which interpreter to use. This ensures that the correct Python version is invoked when running the script.

For example, `#!/usr/bin/python` or `#!/usr/bin/env python3`.

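The path will vary depending on your computer. Typing ```which python3``` (or ```which python```) on the command line will show you the path to use. The `#!/usr/bin/env python3` form avoids hard-coding the path, because `env` looks up `python3` on your PATH.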

3. **Write your code!**: Now's the time to put all your code in there. Once you're finished, save and exit.

4. **Add executable permissions**: If you are working on a Unix/Linux system (macOS is Unix-based), you need to make your Python script executable by giving it the appropriate permissions. Use the chmod command to set the executable permission:

        chmod +x script.py

5. **Run the script**: In the terminal, move to the directory where your Python script is located and run it with ```./script.py```. If you have set up the shebang line correctly you shouldn't need to specify python, but you can do so if required: ```python script.py```

        ./script.py
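If it helps to see what each step does, the same five stages can also be carried out from Python itself. This is a minimal sketch assuming a Unix-like system (Linux or macOS); the file name `hello_script.py` is hypothetical. Instead of a hard-coded shebang it uses `sys.executable`, the path of the interpreter currently running, which plays the same role as the output of `which python3`.

```python
import os
import stat
import subprocess
import sys

# Step 1: choose a file name with a .py extension (hypothetical name)
script_path = "hello_script.py"

# Step 2: the shebang points at the interpreter running this code
shebang = "#!" + sys.executable

# Step 3: write the code into the file and save it
with open(script_path, "w") as handle:
    handle.write(shebang + "\n")
    handle.write('print("Hello from the command line!")\n')

# Step 4: roughly the equivalent of `chmod +x hello_script.py`
# (add execute permission for owner, group and others)
current_mode = os.stat(script_path).st_mode
os.chmod(script_path, current_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# Step 5: run it as ./hello_script.py, without naming python explicitly
result = subprocess.run(["./" + script_path], capture_output=True, text=True)
print(result.stdout)
```

Because the shebang is read by the operating system, step 5 works without putting `python` in front of the file name.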

### Exercise - Run your code

Take the code below and turn it into a command line program. Follow the steps above to create and run your code **on your own computer**:
1. Create a new text file with the shebang line at the top
2. Add the code that you want to run, save & exit
3. Make the code executable
4. Run the script!

Let's reuse the fish population function from yesterday. Use this as your code:

In [None]:
#!/Your/Shebang/Line/Not/This/One/Obviously

def calc_fish_number(radius, fish):
  volume = (3.14 * radius ** 2) * 2

  if fish == "Guppy":
    total_fish = 17 * volume
  elif fish == "Tuna":
    total_fish = 3 * volume

  return total_fish

# Tests - Value = radius in metres & fish per m3
print("Number of Guppies manageable:", calc_fish_number(0.5, "Guppy"))
print("Number of Tuna manageable:", calc_fish_number(1.5, "Tuna"))

#### Working with graphical outputs
When making graphical outputs you may not have the matplotlib library installed, or the environment set up to display the graph on screen. To install matplotlib (if required), run ```pip install matplotlib``` on the command line, as we did in the notebook.

To adapt code written for notebooks to the command line, replace ```plt.show()``` with ```plt.savefig("my_plot.png")```. This saves the figure as a file instead of displaying it on screen, which is what you need when your environment has no graphical display (for example, most servers).
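A minimal sketch of a command-line-friendly plot, assuming matplotlib is installed. It selects the non-interactive `"Agg"` backend so no screen is needed, then calls `savefig` where a notebook would call `show`. The plotted values are illustrative only, computed with the same formula as the fish function above.

```python
import os

import matplotlib

# Non-interactive backend: draws to files only, so no display is required
matplotlib.use("Agg")

import matplotlib.pyplot as plt

# Illustrative data: guppies per tank, using the fish function's formula (17 per m3, 2 m deep)
radii = [0.5, 1.0, 1.5, 2.0]
guppies = [17 * (3.14 * r ** 2) * 2 for r in radii]

plt.plot(radii, guppies, marker="o")
plt.xlabel("Tank radius (m)")
plt.ylabel("Number of guppies")

# Instead of plt.show(), write the figure to a file
plt.savefig("my_plot.png")
plt.close()

print("Saved:", os.path.exists("my_plot.png"))
```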


---

# Argparse

The argparse module in Python provides a powerful way to handle command-line arguments and options. It allows you to define the command-line interface for your script, specify the expected arguments and parameters, and automatically generate help messages.

It may not be something you want to do (you could instead edit all the variables inside your code), but it is a way to make your code run like a stand-alone program whose settings are easy to change.

Here is a basic template covering the three main stages of creating parameter-controlled code (remember that it cannot be run in the Jupyter environment).

In [None]:
import argparse, sys

# Initialise the parser class
parser = argparse.ArgumentParser(description='Description of your script')

# Define some options/arguments/parameters
parser.add_argument('-i', '--input', help='Path to input file')
parser.add_argument('-o', '--output', help='Path to output file', default='my_output.txt')

# This line checks if the user gave no arguments, and if so then print the help
parser.parse_args(args=None if sys.argv[1:] else ['--help'])

# Collect the inputted arguments into a Namespace object (access each value as args.input, args.output)
args = parser.parse_args()

print(args)
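One way to experiment with a parser inside a notebook is to pass `parse_args()` a list of strings, which it then uses instead of the real command line. This sketch rebuilds the template's parser and feeds it a hand-written argument list; `vars()` converts the resulting Namespace into a plain dictionary if you prefer that form.

```python
import argparse

# Same parser as in the template above
parser = argparse.ArgumentParser(description='Description of your script')
parser.add_argument('-i', '--input', help='Path to input file')
parser.add_argument('-o', '--output', help='Path to output file', default='my_output.txt')

# Pass a list of strings instead of reading the real command line (sys.argv)
args = parser.parse_args(['-i', 'co1_sequences.fasta'])

print(args.input)   # co1_sequences.fasta
print(args.output)  # my_output.txt (the default, since -o was not given)
print(vars(args))   # the same values as a dictionary
```

Each list element corresponds to one word you would have typed after the script name in the terminal.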

#### Exercise: Make a command line program

Let's combine the two. We will put the argparse code above the fish code in one script and run it with different parameters. The only change we are making is replacing the hard-coded values in the code with the argparse values from the command line.

1. Copy this code into a new script and run it with the -h parameter. We have not defined -h ourselves, but it is built into argparse.
2. Run the code a few times with different numbers and fish to test it.

Extension: What happens if you type a fish that is not in the code? Add a more useful output than the error, e.g. "That fish is not in the database".


In [None]:
import argparse, sys

###############################
#### Reading parameters in ####
###############################

# Initialise the parser class
parser = argparse.ArgumentParser(description='Test how many fish you can hold in your tank')

# Define some options/arguments/parameters
parser.add_argument('-f', '--fish', help='What fish to test?')
parser.add_argument('-r', '--radius', type=float, help='Radius of the tank in metres', default=1.0)

# Show the help if no arguments were given, then collect the arguments into a Namespace object
parser.parse_args(args=None if sys.argv[1:] else ['--help'])
args = parser.parse_args()

##############################
##### The actual script ######
##############################

def calc_fish_number(radius, fish):
  volume = (3.14 * radius ** 2) * 2

  if fish == "Guppy":
    total_fish = 17 * volume
  elif fish == "Tuna":
    total_fish = 3 * volume

  return total_fish

# Run the calculation with the radius (metres) and fish supplied on the command line
print("Number of", args.fish, "manageable:", calc_fish_number(args.radius, args.fish))
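One possible starting point for the extension is argparse's `choices` parameter, which restricts an argument to a fixed list of values. This is a sketch of that alternative, not the only solution: with `choices`, argparse itself prints an error (listing the allowed fish) and exits, so an unknown fish never reaches `calc_fish_number`. Lists are passed to `parse_args()` here so it can be tried in a notebook; `'Shark'` is just an example of an invalid value.

```python
import argparse

parser = argparse.ArgumentParser(description='Test how many fish you can hold in your tank')

# choices= makes argparse reject any fish that is not in this list
parser.add_argument('-f', '--fish', choices=['Guppy', 'Tuna'], help='What fish to test?')
parser.add_argument('-r', '--radius', type=float, help='Radius of the tank in metres', default=1.0)

# A valid fish is accepted as normal
args = parser.parse_args(['-f', 'Tuna', '-r', '1.5'])
print(args.fish, args.radius)  # Tuna 1.5

# An invalid fish makes argparse print an error message and exit (raising SystemExit)
exit_code = None
try:
    parser.parse_args(['-f', 'Shark'])
except SystemExit as exit_error:
    exit_code = exit_error.code
print("argparse stopped with exit code", exit_code)
```

If you would rather print your own message, such as "That fish is not in the database", you can instead check the fish inside `calc_fish_number` itself.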

## Extension Exercise - Building the parameters

Let's make our own sequence length and GC% calculator (you can write the code yourself or use the one below).

1. In a new script combine the code below with the first argparse code (including input/output) above and test it with the ```co1_sequences.fasta``` file
2. Add an additional argument for "minimum length" and set the default to 900.
3. Modify the script to only print sequences longer than the minimum length (default 900).
4. Test your new parameter by printing only sequences over 1000.

Note that this uses the Biopython module, which you may need to install first with ```pip install biopython```.

In [None]:
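from Bio import SeqIO

# args comes from the argparse template above (args.input is the FASTA file path)
for seq_record in SeqIO.parse(args.input, 'fasta'):
    seq_len = len(seq_record)
    sequence = seq_record.seq.upper()  # count lower-case bases as well
    GC = (sequence.count("G") + sequence.count("C")) / seq_len * 100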
    print("Sequence", seq_record.id, "has length", seq_len, "and GC of", str(round(GC, 2)) + "%")
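The GC% line is simple arithmetic: (number of G + number of C) / sequence length × 100. Here is a plain-Python sketch of the same calculation on an ordinary string, so you can check it without Biopython or a FASTA file. The function name `gc_percent` is our own; it also guards against empty sequences, which would otherwise cause a division by zero.

```python
def gc_percent(sequence):
    """Return the GC content of a DNA sequence (a string) as a percentage."""
    sequence = sequence.upper()  # treat lower-case bases the same as upper-case
    if len(sequence) == 0:
        return 0.0  # avoid dividing by zero for an empty sequence
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence) * 100


print(round(gc_percent("ATGCGCAT"), 2))  # 4 of 8 bases are G or C -> 50.0
```

Inside the Biopython loop you could call it as `gc_percent(str(seq_record.seq))`.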


---

## Subprocess - Including other programs in your python code

The ability to run external command-line tools is crucial for data analysis in bioinformatics, where many packages are stand-alone programs that you will want to incorporate into your workflows. The ```subprocess``` library allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.

The point here is that some programs you want to use only exist in other languages, or are faster than a Python equivalent. Python makes it easy (ish) to combine them.

! ***NOTE: These examples will not work unless you have the relevant files available or programs installed. They are just for demonstration*** !

First, let's look at running a basic command to make a directory. Afterwards, check your local files to see that it was created.

In [None]:
import subprocess

command = "mkdir test_directory"
subprocess.run(command, shell=True)
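`subprocess.run()` returns a `CompletedProcess` object, and its `returncode` tells you whether the command succeeded (0 means success). This sketch assumes a Unix-like system, where `mkdir -p` creates the directory without complaining if it already exists, so the cell can be re-run safely. It then confirms the result from Python with `os.path.isdir`.

```python
import os
import subprocess

# Passing a list avoids the shell; -p means "no error if the directory already exists"
result = subprocess.run(["mkdir", "-p", "test_directory"])

# A return code of 0 means the command succeeded
print("Return code:", result.returncode)
print("Directory exists:", os.path.isdir("test_directory"))
```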

Often you will also want to capture the output of the command you are running, which is what ```capture_output=True``` does. We also include the ```text=True``` parameter so that the output comes back as a normal Python string; without it the output is returned as raw bytes (try removing that parameter to check).

In [None]:
import subprocess

command = "ls -l *"
process = subprocess.run(command, shell=True, capture_output=True, text=True)
print(process.stdout)
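To see the difference `text=True` makes without depending on `ls`, this sketch runs a tiny Python one-liner as the child program (`sys.executable` is the interpreter running your code, so it exists on any system). Without `text=True`, `stdout` is a `bytes` object; with it, `stdout` is a `str`.

```python
import subprocess
import sys

# A portable child program: the current Python interpreter printing one line
child = [sys.executable, "-c", "print('hello')"]

raw = subprocess.run(child, capture_output=True)                # stdout as bytes
decoded = subprocess.run(child, capture_output=True, text=True)  # stdout as str

print(type(raw.stdout))      # <class 'bytes'>
print(type(decoded.stdout))  # <class 'str'>
print(raw.stdout.decode().strip() == decoded.stdout.strip())  # True
```

Calling `.decode()` on the bytes gives the same text, which is essentially what `text=True` does for you.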

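We can also pass the command and its arguments together as a list. Each element is handed to the program as a separate argument, in order, and no shell is involved, so ```shell=True``` is not needed.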

In [None]:
command = ["grep", "^KW", "/content/am181037.embl"]
process = subprocess.run(command, capture_output=True, text=True)
print(process.stdout)
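If you already have a command written as a single string, the standard library's `shlex.split()` can turn it into the list form shown above, splitting on spaces the way a Unix shell would while keeping quoted arguments together.

```python
import shlex

# Split a command string into the list form used by subprocess.run()
command_string = "grep ^KW /content/am181037.embl"
command = shlex.split(command_string)
print(command)  # ['grep', '^KW', '/content/am181037.embl']

# Quoted text containing spaces stays together as one argument
print(shlex.split('echo "two words"'))  # ['echo', 'two words']
```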

The commands above only print their output to the screen, but most bioinformatics programs write their results to output files. Here is a more complex real-world command, where `sequences` is a variable we created earlier in our code that holds the path to our query file.

- Note that this will not run unless you have blastp installed and the nr.00 database available

In [None]:
# sequences holds the path to our query file; a list command must NOT use shell=True
command = ['blastp', '-query', sequences, '-out', 'blout.txt', '-db', 'nr.00']
process = subprocess.run(command, capture_output=True, text=True)
print(process.stdout)
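Because programs like blastp write their results to a file rather than the screen, it is worth checking afterwards that the file was actually created before you try to read it. Since blastp may not be installed, this sketch uses a small Python one-liner as a stand-in for the external program; the file name `demo_output.txt` is hypothetical.

```python
import os
import subprocess
import sys

output_file = "demo_output.txt"  # hypothetical output file name

# Stand-in for an external tool that writes its results to a file
writer_code = f"open({output_file!r}, 'w').write('result line\\n')"
process = subprocess.run([sys.executable, "-c", writer_code], capture_output=True, text=True)

# Only read the results if the program succeeded and the file exists
if process.returncode == 0 and os.path.exists(output_file):
    with open(output_file) as handle:
        print(handle.read())
else:
    print("The program did not produce", output_file)
```

The same check applies to the blastp command: after it finishes, look for `blout.txt` before parsing it.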

Running an external program within your script can be messy and cause errors. It is good practice to have your code test whether something worked, rather than just assuming it did.

The ```try``` and ```except``` block works a bit like if/else, but specifically for catching errors when they occur, which is exactly what we want here.

In [None]:
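import subprocess

# Example 4: Handling errors and exceptions
command = "invalid_command"
try:
    process = subprocess.run(command, capture_output=True, text=True)
    process.check_returncode()  # Raises CalledProcessError if the exit status is non-zero
except FileNotFoundError:
    # Raised when the program itself cannot be found, as with this invalid command
    print("Command not found:", command)
except subprocess.CalledProcessError as e:
    # Raised when the program ran but exited with an error status
    print("An error occurred:", e.stderr)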
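`subprocess.run()` also accepts `check=True`, which performs the same test as `check_returncode()` in one step. So that the `CalledProcessError` branch is actually reached, this sketch uses a small Python one-liner as a stand-in for a program that runs but fails with exit status 3.

```python
import subprocess
import sys

# Stand-in program: writes an error message and exits with status 3 (non-zero means failure)
command = [sys.executable, "-c", "import sys; sys.stderr.write('something went wrong'); sys.exit(3)"]

exit_status = None
try:
    # check=True raises CalledProcessError automatically on a non-zero exit status
    subprocess.run(command, capture_output=True, text=True, check=True)
except subprocess.CalledProcessError as e:
    exit_status = e.returncode
    print("Exit status:", e.returncode)
    print("Error message:", e.stderr)
```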

Note: You may see ```subprocess.Popen()``` in some code. ```subprocess.Popen()``` provides more flexibility and control over the spawned processes, while ```subprocess.run()``` offers a simpler and more convenient way to execute commands and retrieve their output. We will stick with ```subprocess.run()```, and that is where we finish this session!
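For recognising `Popen` when you meet it, here is a minimal sketch of the same kind of call done with it, again using a Python one-liner as a portable child program. `Popen` starts the process and returns immediately; `communicate()` then waits for it to finish and collects its output, which is roughly what `run()` does for you in one step.

```python
import subprocess
import sys

# Popen starts the process and returns straight away
with subprocess.Popen(
    [sys.executable, "-c", "print('hi from Popen')"],
    stdout=subprocess.PIPE,
    text=True,
) as proc:
    # communicate() waits for the process to finish and returns (stdout, stderr)
    output, _ = proc.communicate()

print(output.strip())
print("Return code:", proc.returncode)
```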