# Error Mitigation

------
### Learning Objectives:
+ Common errors in BASH coding
+ Errors in conda environments

## Common Errors in BASH Coding
----

One of the largest barriers to learning to interact with bioinformatics software and data is error mitigation. Becoming fluent in a coding language is challenging, but you can get there by piecing together code from various sources and testing each chunk before combining them into more complex code. Interpreting an error and deciding what to do next is often harder. Sometimes there is no indication that anything went wrong, and when a message is thrown it often doesn't identify exactly where the error occurred or what steps you can take to fix it.
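Even when a command prints little or nothing, the shell records whether it succeeded. As a minimal sketch (the path `no_such_directory_...` is a made-up placeholder, not part of the lesson data), the special variable `$?` holds the exit status of the most recent command: 0 means success and any other value means failure.

```shell
# Hypothetical path that does not exist, used only to trigger an error.
missing="no_such_directory_$$"

# Run a command that fails. Its message is hidden so only the status is inspected.
# If ls fails, the part after || runs and stores the exit status of ls.
status=0
ls "$missing" 2>/dev/null || status=$?

if [ "$status" -ne 0 ]; then
    echo "ls failed with exit status $status"
else
    echo "ls succeeded"
fi
```

Checking the status this way can be useful after a long command that scrolls past without an obvious error message.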


Below is a list of some common errors in BASH coding so you can get a feel for what the errors look like and how to mitigate the issue.

- Wrong path
- Typo
- Space in a file name
- Missing required flag or argument
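The list above includes spaces in file names. A minimal sketch of that error, using a throwaway directory and a made-up file name `my reads.fasta` (neither is part of the lesson data): without quotes the shell splits the name at the space and passes two separate arguments, while quotes or a backslash keep it as one name.

```shell
# Work in a temporary directory so nothing in the lesson folders is touched.
demo_dir=$(mktemp -d)
printf '>seq1\nACGT\n' > "$demo_dir/my reads.fasta"   # hypothetical file name

# Unquoted: ls receives "my" and "reads.fasta" as two names, and neither exists.
unquoted_status=0
( cd "$demo_dir" && ls my reads.fasta 2>/dev/null ) || unquoted_status=$?
echo "unquoted exit status: $unquoted_status"

# Quoted: the whole name, space included, is passed as one argument.
quoted_output=$( cd "$demo_dir" && ls "my reads.fasta" )
echo "quoted: $quoted_output"

# Escaped: a backslash before the space has the same effect as quoting.
escaped_output=$( cd "$demo_dir" && ls my\ reads.fasta )
echo "escaped: $escaped_output"

# Clean up the temporary directory.
rm -r "$demo_dir"
```

Tab autocompletion inserts the backslash for you, which is one more reason to use it when typing file names.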

### Incorrect Paths

Let's start with the wrong path. All of our sequence files are in the directory `sequences/`. What happens when you look for files matching the pattern `*.fasta` in our home directory?

In [None]:
%%bash

# Wrong path
ls *.fasta 

In a Jupyter notebook you get a red shaded box and the warning *no such file or directory*; in the terminal you get a similar warning without the red shaded box. This warning is ambiguous: nothing in it tells you that the path was incorrect, and many other mistakes throw the same message.
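When `ls` reports that nothing was found, one way to narrow things down is to search downward from where you are. The sketch below builds a throwaway directory that imitates the `sequences/` layout with a made-up file `example.fasta` (both are assumptions for illustration, not the lesson data), reproduces the failure, and then uses `find` to locate the files.

```shell
# Build a small stand-in for the lesson layout in a temporary directory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sequences"
printf '>seq1\nACGT\n' > "$demo_dir/sequences/example.fasta"   # hypothetical file

# 1. The wrong-path command fails: no FASTA files sit at the top level.
top_level_status=0
( cd "$demo_dir" && ls *.fasta 2>/dev/null ) || top_level_status=$?
echo "ls *.fasta exit status: $top_level_status"

# 2. Search every subdirectory for matching files to see where they really are.
found=$( cd "$demo_dir" && find . -name "*.fasta" )
echo "find reports: $found"

# 3. Repeat the listing with the directory that find reported.
listed=$( cd "$demo_dir" && ls sequences/*.fasta )
echo "ls with the right path: $listed"

# Clean up the temporary directory.
rm -r "$demo_dir"
```

Here `find` prints `./sequences/example.fasta`, which shows that the files live one directory below the place where `ls` was run.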


### Incorrect Spelling

Here we will use the correct path, but there will be a typo in our file name pattern or the file won't exist.

In [None]:
%%bash

# Typo (incorrect case)
ls gcp_research_workflow/*.Fasta


In [None]:
%%bash

# Typo (incorrect spelling)

ls gcp_research_workflow/*.fasto

In [None]:
%%bash

# Typo (incorrect spelling) and directory does not exist

ls sequence/*.fasta

In [None]:
%%bash

# File does not exist

ls *.fasta

In these cases an error is thrown, but it isn't clear what the problem is, and you will have to do some investigating to figure out the issue. The commands in the examples above were simple, so diagnosing the problem is relatively straightforward.
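If you suspect the case of an extension is the problem, a case-insensitive search can confirm it. A minimal sketch, assuming a made-up file `sample.fasta` in a throwaway directory: `find -name` matches case exactly, while `find -iname` ignores case.

```shell
# Temporary directory with one hypothetical FASTA file.
demo_dir=$(mktemp -d)
printf '>seq1\nACGT\n' > "$demo_dir/sample.fasta"

# Case-sensitive pattern with the wrong case: nothing matches.
exact=$( find "$demo_dir" -name "*.Fasta" )
echo "case-sensitive matches: ${exact:-none}"

# Same pattern, but -iname ignores upper/lower case differences.
loose=$( find "$demo_dir" -iname "*.Fasta" )
echo "case-insensitive matches: $loose"

# Clean up the temporary directory.
rm -r "$demo_dir"
```

The second search shows the file name as it is actually spelled, so you can correct the original command.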

<div class="alert alert-block alert-warning">
    <i class="fa fa-question-circle-o" aria-hidden="true"></i>
    <b>TEST YOUR SKILLS</b> 
      <p>Practice your skills in the code block below</p>
    <div style="background-color: white ; color:black; padding: 3px;">In the next two code blocks the code is a little more complex. Can you diagnose the errors?<br><br> Run the #FLASHCARD code block to see the answer.</div>
    
</div>

In [None]:
%%bash

cat assembly_test/SRR18241034_1.fastq | sed -n '2~4p' | head -10000 | grep -o . | sort | grep 'n' | wc -l

## Here you get a count of 0, but there are actually 114 Ns in the first 10,000 reads. Why is 0 returned?


In [None]:
# FLASHCARD
from IPython.display import IFrame
IFrame("quiz_files/quiz7-1.html", width=600, height=350)

In [None]:
%%bash


ls gcp_research_workflow/*.fastq.gz | while read i; do 
   echo $x
   zcat $x | sed -n '2~4p' | head -10000 | grep -o "ATG" | wc -l
done

## Here you get a red shaded box and an answer is returned. The correct answer is shown below. Where is the typo?

# gcp_research_workflow/SRR1039508_1.chr20.fastq.gz
# 10093
# gcp_research_workflow/SRR1039508_2.chr20.fastq.gz
# 10034
# gcp_research_workflow/SRR1039509_1.chr20.fastq.gz
# 10322
# gcp_research_workflow/SRR1039509_2.chr20.fastq.gz
# 10110
# gcp_research_workflow/SRR1039512_1.chr20.fastq.gz
# 10173
# gcp_research_workflow/SRR1039512_2.chr20.fastq.gz
# 10277
# gcp_research_workflow/SRR1039513_1.chr20.fastq.gz
# 10448
# gcp_research_workflow/SRR1039513_2.chr20.fastq.gz
# 10353

In [None]:
# FLASHCARD
from IPython.display import IFrame
IFrame("quiz_files/quiz7-2.html", width=600, height=350)

We started with typos because these are the most common mistakes we diagnose in our workshops, especially with regard to case. 

One way to prevent typos, at least when typing filenames, is to use the `tab` key to autocomplete a filename. In your terminal window (or a code chunk window) type `ls g` and then use the `tab` key to autocomplete the name of the directory.

Now you should see `ls gcp_research_workflow/`. Add the text `all_` to the end of the command and press the `tab` key again. The file `all_counts.txt` will be filled in automatically.

Using the `tab` autocomplete can save time, but more importantly it can prevent typos and save even more time trying to diagnose a vague error caused by a typo.

### Incorrect or Missing Flags

Another issue that can occur is a missing flag or argument in commands where flags or arguments are required. For our test example you will need to ensure that you have the conda environment `test_env` loaded. 

In [None]:
%%bash

# Missing argument
fastqc

If you run this command in the terminal, an error is thrown but there is no information about how to mitigate it. In the Jupyter notebook there is a bit more information than in the terminal, but you need to read through a bit of text before it indicates that an argument is missing.

When typing commands quickly it's often the case that you will misspell or incorrectly indicate the flag that you're trying to use. Luckily in this case the software will throw an error identifying the flag that is unrecognized so you can easily fix the problem. 

In [None]:
%%bash

# incorrect flag asking for 4 threads
fastqc -threads 4 gcp_research_workflow/*.fastq
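When a flag is rejected, the tool's own help text is the quickest place to confirm the exact spelling. The following is a minimal sketch, assuming `fastqc` is installed in the active environment and prints its options with `--help`. If the tool is not on the PATH, the block only prints a reminder.

```shell
tool="fastqc"   # the tool used in this lesson

if command -v "$tool" >/dev/null 2>&1; then
    # Show only the help lines that mention threads to check how the flag is spelled.
    "$tool" --help 2>&1 | grep -i "thread" || echo "no help lines mention threads"
    help_checked="yes"
else
    # command -v found nothing, so the tool is not available in this shell.
    echo "$tool is not on the PATH. Is the conda environment active?"
    help_checked="no"
fi
```

Filtering the help text with `grep` keeps the output short; drop the `grep` to read the full help page.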

## Errors in Conda Environments
---

In submodule 6 we needed to create two separate conda environments for assembly and annotation. I explained that this was due to a conflict in the dependencies of software used in each environment. 

Let's try to install prokka in your assembly environment and see what error is thrown. Remember to activate your assembly environment in the terminal using `conda activate assembly`. This command will take about 10-15 minutes to run.

`conda install -c bioconda prokka=1.14.6`

You can see that the error message is pretty long and indicates that there are conflicts in the dependencies needed for Python v. 3.9 and prokka v. 1.14.6. Can you see the names of the dependencies (packages) causing the error?

One way to mitigate this would be to try all versions of Python and prokka until you find a combination with compatible dependencies, but this is time-consuming. It's easier to silo the tasks into their own environments, where you can use the desired version of each piece of software.
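Before installing into an existing environment, it can help to ask conda what it would do without changing anything. The sketch below uses conda's `--dry-run` option for that. The `RUN_SOLVE` switch is a hypothetical guard added here because the dependency check can be slow; it is not part of conda or of this lesson.

```shell
pkg="prokka=1.14.6"            # package and version from the text above
env_name="assembly"            # environment name from the text above
run_solve="${RUN_SOLVE:-no}"   # hypothetical switch: set RUN_SOLVE=yes to run the slow check

if [ "$run_solve" = "yes" ] && command -v conda >/dev/null 2>&1; then
    # --dry-run reports the planned changes (or the conflicts) without modifying the environment.
    conda install --dry-run -n "$env_name" -c bioconda "$pkg" || echo "dry run reported a problem"
    solve_attempted="yes"
else
    # Print the command instead of running it.
    echo "Skipped. To check $pkg against the '$env_name' environment, run:"
    echo "  conda install --dry-run -n $env_name -c bioconda $pkg"
    solve_attempted="no"
fi
```

If the dry run reports conflicts, that is a sign the package belongs in its own environment rather than in `assembly`.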

## Final Thoughts on Error Mitigation
----
The best way to diagnose an error thrown when using software is to use the help page for that software and remind yourself of the syntax of the command. Again, this is one reason why saving code to BASH scripts is helpful: you can copy and edit the command as you need it in new projects.


## Continuing Bioinformatic Education 
----
This lesson introduced how to interact with your data through the terminal interface using the BASH coding language. We also covered software installation and completed a simple assembly workflow. These foundational skills are the first steps toward performing bioinformatic analyses on genomic data.

To build on these skills, it might be helpful to keep practicing BASH coding with these other training modules.

[Methylation Sequencing Analysis](https://github.com/NIGMS/MethylSeqUH)

[RNA-seq Transcriptome Assembly](https://github.com/NIGMS/rnaAssemblyMDI)
