## What to consider

There is a tradeoff between the *quantity* and the *quality* of the reads that remain after quality control. This program
attempts to balance the two, retaining as many reads as possible at the highest achievable quality.

In this case, both **trimming** and **maxEE** (the maximum number of expected errors allowed per read) are the
parameters that we consider.

### How to do this:

1. Perform "obvious" trimming
    + Iterate over the per-position average quality scores and trim the read ends that fall below a threshold (default: 20)

2. Create a graph of amplicon size (x-axis) and the average sum of the expected errors (y-axis)
    + This will be done by truncating the amplicons to given lengths and calculating the sum of the expected errors at each length
    + Find a threshold
    + It will also be done separately for the forward and reverse reads

3. The number of reads retained above a certain threshold will be determined
    + This allows us to estimate the total number of reads retained

4. The next step will be to determine whether raising the max expected error threshold is worthwhile.
    + Can a significant number of additional reads be retained with a slight increase of the **maxEE** parameters?
    + Plot maxEE (x-axis) vs number of reads retained (y-axis)
    + Find a threshold
    + Do for both forward and reverse reads
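
Steps 2 to 4 all rely on the expected errors of a read, which are commonly defined as the sum of the per-base error
probabilities, `10 ** (-Q / 10)` for a Phred score `Q`. The following is a minimal sketch of those steps, not this
program's implementation. It assumes that reads are given as lists of integer Phred scores, that reads shorter than the
chosen length are dropped, and that a read is retained when its expected errors do not exceed **maxEE**. The function
names and the toy data are hypothetical.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities for one read (integer Phred scores)."""
    return sum(10 ** (-q / 10) for q in quals)


def mean_ee_by_length(reads, lengths):
    """Average expected errors when reads are truncated to each length.

    Assumption: reads shorter than a given length are skipped for that length.
    """
    result = {}
    for length in lengths:
        ees = [expected_errors(r[:length]) for r in reads if len(r) >= length]
        result[length] = sum(ees) / len(ees) if ees else None
    return result


def reads_retained(reads, length, max_ee):
    """Count reads at least `length` long whose truncated expected errors are <= max_ee."""
    return sum(
        1
        for r in reads
        if len(r) >= length and expected_errors(r[:length]) <= max_ee
    )


# Hypothetical toy data: three reads of Phred scores
reads = [
    [30, 30, 30, 20, 10],
    [40, 40, 40, 40, 40],
    [20, 20, 10, 10, 10],
]

# Step 2: length (x-axis) vs. average expected errors (y-axis)
ee_curve = mean_ee_by_length(reads, lengths=[3, 5])

# Steps 3 and 4: maxEE (x-axis) vs. number of reads retained (y-axis)
retained_curve = {m: reads_retained(reads, 5, m) for m in (0.1, 0.2, 0.5)}
print(retained_curve)  # {0.1: 1, 0.2: 2, 0.5: 3}
```

Running the same functions separately on the forward and the reverse reads gives one pair of curves per read direction.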

## Performing "Obvious" Trimming

Current industry standards involve looking at a bar plot of average quality scores and picking trim values at the
positions where the average quality drops noticeably toward either end of the read. Here we start with the same concept
in order to reduce the search space.

The trim values (the indices at which trimming is done) are determined by searching outward from the center index and
taking the first position at which the average quality score falls below a threshold. The search starts at the center
because the highest quality scores are expected to be in the middle of the reads.

It is also important to note that scores, as output in FASTQ files, are on a scale from 0 to 42 as per Phred quality
score standards.
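
To show how these scores relate to the per-position averages used for trimming, the sketch below decodes FASTQ quality
strings and averages them by position. It assumes the common Phred+33 ASCII encoding, where the score is the character
code minus 33. The function names `decode_phred` and `mean_quality_per_position` and the toy quality strings are
hypothetical and not part of this program.

```python
def decode_phred(quality_string, offset=33):
    """Convert a FASTQ quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def mean_quality_per_position(quality_strings):
    """Average Phred score at each position across reads of possibly unequal length."""
    longest = max(len(q) for q in quality_strings)
    totals = [0] * longest
    counts = [0] * longest
    for qs in quality_strings:
        for i, score in enumerate(decode_phred(qs)):
            totals[i] += score
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Hypothetical quality lines: '!' is Q0, '5' is Q20, 'I' is Q40
quals = ["5II5", "!II"]
avg_scores = mean_quality_per_position(quals)
print(avg_scores)  # [10.0, 40.0, 40.0, 20.0]
```

A score `Q` corresponds to an error probability of `10 ** (-Q / 10)`, so the default threshold of 20 marks positions where
roughly 1 in 100 base calls is expected to be wrong.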

Here is a step-by-step process of how this form of trimming is performed.

In [21]:
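# Step 1: set the threshold value
threshold = 20

# Step 2: determine the middle index (the list holds the average quality score at each position)
avg_scores = [15, 18, 18, 30, 30, 40, 30, 30, 19, 17, 15]

mid_index = len(avg_scores) // 2

# Step 3: travel from the center to the left and find the first average score below the threshold
trim_left_index = 0  # default: nothing to trim if no score falls below the threshold
current_index = mid_index
while current_index >= 0:
    # if the value at the current index is below the threshold
    if avg_scores[current_index] < threshold:
        # keep the previously visited index (the last one at or above the threshold)
        trim_left_index = current_index + 1
        break
    current_index -= 1

# Step 4: similar to step 3, travel from the center to the right to find the right index
trim_right_index = len(avg_scores) - 1  # default: nothing to trim on the right
current_index = mid_index
while current_index < len(avg_scores):
    if avg_scores[current_index] < threshold:
        trim_right_index = current_index - 1
        break
    current_index += 1

# Step 5: the left and right trim sites are collected in a tuple. For the list in this example, the first scores
# below the threshold are at indices 2 and 8, so the retained region runs from index 3 through index 7: (3, 7)
trim_sites = (trim_left_index, trim_right_index)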
