# ufits test

These notes record observations made while working through the UFITS GitHub [README.md][link1] and [tutorial][link2] pages.

[link1]:https://github.com/nextgenusfs/ufits/blob/master/README.md
[link2]:https://github.com/nextgenusfs/ufits/blob/master/docs/tutorial.md



# pipeline

- programs were installed on an Amazon EC2 instance
- initial setup follows commands similar to those on the [linux installation instructions page][ubuntu_install]
- instructions below assume an instance is already running and no programs have been installed yet

[ubuntu_install]: https://github.com/nextgenusfs/ufits/blob/master/docs/ubuntu_install.md

## install programs

This should look very similar to the installation instructions provided on the [UFITS Github page][ubuntu_install]. I have comments on a few of these installs (and the problems I encountered), which are described after the code is executed. I put an asterisk '\***' next to the steps where I ran into a problem.

[ubuntu_install]: https://github.com/nextgenusfs/ufits/blob/master/docs/ubuntu_install.md

### comments about ufits installation:
1. The install produces no 'test_data' folder; the test data are downloaded manually below.
2. The __ufits__ program is installed in a hidden LinuxBrew subdirectory:  
    /home/ubuntu/.linuxbrew/bin/ufits
3. The USEARCH install seems to require both version 8 and 9. Both were downloaded and named separately. See explanation in pipeline below.
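
A quick way to confirm where each executable ended up is to ask the shell's search path. The sketch below is only an illustration in Python; the names `usearch8` and `usearch9` are the symlink names used later in these notes, and nothing here is part of ufits itself.

```python
import shutil


def locate_tools(names=("ufits", "usearch8", "usearch9")):
    """Return {name: absolute path or None} for each executable name."""
    # shutil.which searches PATH the same way the shell would.
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

On the setup described above, `ufits` would be expected to resolve to the LinuxBrew path listed in item 2.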

## Download necessary datasets (.fastq files and databases)
Because the LinuxBrew approach fails to install the test_data directory, you have to download these files manually. We'll download just the single Ion Torrent file to test the program, along with all of the databases except COI.

## Run UFITS

### Part 1: processing Ion Torrent data

Unfortunately, no barcodes are provided in the LinuxBrew install, so to follow along with the Pre-processing data example in the [tutorial][tutorial_page], we have to create the 'barcodes.fa' file that the example refers to. That's done in the code below, followed by the actual command to execute the 'ufits ion' step.

[tutorial_page]: https://github.com/nextgenusfs/ufits/blob/master/docs/tutorial.md

The resulting output suggests that 5 of 6 barcodes successfully amplified: all but BC_33 (which was the one used to illustrate barcode matching in the tutorial!?)
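
For anyone who prefers to script the creation of 'barcodes.fa', here is a minimal Python sketch of a FASTA writer. The function name and the barcode names and sequences in the usage example are placeholders I made up for illustration; they are not the real barcodes for this run and must be replaced with your own.

```python
from pathlib import Path


def write_fasta(records, path):
    """Write {name: sequence} to a FASTA file and return the record count."""
    lines = []
    for name, seq in records.items():
        seq = seq.strip().upper()
        # Reject empty sequences and anything outside A/C/G/T/N.
        if not seq or set(seq) - set("ACGTN"):
            raise ValueError(f"invalid sequence for {name!r}: {seq!r}")
        lines.append(f">{name}\n{seq}\n")
    Path(path).write_text("".join(lines))
    return len(lines)


if __name__ == "__main__":
    # PLACEHOLDER names and sequences only; substitute the real barcodes.
    placeholder_barcodes = {
        "BC_example_1": "ACGTACGTAC",
        "BC_example_2": "TGCATGCATG",
    }
    write_fasta(placeholder_barcodes, "barcodes.fa")
```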

### Part 2: clustering data
This was the first instance in which an error popped up despite having all the necessary data. It appears that the default USearch8 argument from the ufits-OTU_cluster.py script leads to an error. For example:

Leads to the following error message:

Instead, I tried explicitly specifying the version of USEARCH I've been using, as follows:

Interestingly, specifying the '-u' path allows USEARCH to run just fine, but a new error pops up in the UPARSE clustering function:

I wondered if there was simply some discrepancy between running USEARCH 9.0 vs. 8.0... This is why I ended up downloading version 8.0 in addition to the 9.0 version I installed initially. Setting up paths and symlinks named usearch8 and usearch9 then allowed me to specify which version of USEARCH to use, depending on which version each ufits python script needed.
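
The symlink arrangement can be sketched in Python as below. This is an illustration of the idea rather than the exact commands I ran: the function name is mine, and the binary paths in the usage note are hypothetical and depend on where you saved each USEARCH download.

```python
import os
from pathlib import Path


def link_versions(binaries, bin_dir):
    """Create or refresh symlinks in bin_dir from {link_name: binary path}."""
    bin_dir = Path(bin_dir)
    bin_dir.mkdir(parents=True, exist_ok=True)
    created = {}
    for link_name, target in binaries.items():
        link = bin_dir / link_name
        # Remove a stale link or file so the symlink can be recreated.
        if link.is_symlink() or link.exists():
            link.unlink()
        os.symlink(Path(target).resolve(), link)
        created[link_name] = link
    return created
```

A call might look like `link_versions({"usearch8": "/path/to/usearch8_binary", "usearch9": "/path/to/usearch9_binary"}, Path.home() / "bin")`, with `bin_dir` being a directory that is already on your PATH.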

Interestingly, if I run usearch8 with the same command as above:

Boom. Works like a charm! Well almost... notice right at the end of the table we fail to get any of our OTUs to pass the VSEARCH chimera filtering step.

I'm still unclear why that is; my guess is that the test data I downloaded from your GitHub page for this tutorial isn't the same as what you used in your example. The fact that BC_33 didn't match any of the ion.test.fastq reads makes me think this is the case... In any event, I think it's very important to note that USEARCH 9.0 fails at this step, while switching to v8.0 solves the problem.  

Nevertheless, you can proceed with this test data if you skip the chimera filtering step as follows (all I'm doing is removing the '--uchime_ref ITS2' portion of the command most recently executed):

Sure enough, we get the same number of reads passing filters and the same number of OTUs called, but this time a proportion of the reads actually make their way into the OTU table, so we can continue with the tutorial. Nice.
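
If you build the command in a script, dropping the chimera option is a small list operation. The Python sketch below only illustrates that edit: the first list element is a placeholder standing in for the clustering command and its other arguments, and the helper name is my own.

```python
def drop_option(argv, option):
    """Return a copy of argv without `option` and the single value after it."""
    out = []
    skip_next = False
    for token in argv:
        if skip_next:
            # This token is the value belonging to the removed option.
            skip_next = False
            continue
        if token == option:
            skip_next = True
            continue
        out.append(token)
    return out


# Placeholder command; only '-u' and '--uchime_ref ITS2' come from the notes.
base = ["<cluster command and other arguments>", "-u", "usearch8",
        "--uchime_ref", "ITS2"]
print(drop_option(base, "--uchime_ref"))
# → ['<cluster command and other arguments>', '-u', 'usearch8']
```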

### Part 3: filtering OTU table

Because the LinuxBrew install didn't include the specific data, I got to a point in the day where I didn't bother testing the filter command with the '-b mock' flag. That should be simple enough. I proceeded using the command where you specify the percent index-bleed across all samples:

No issues here:

### Part 4: Assigning Taxonomy to OTUs

There's a tiny issue with the instructions in the tutorial page, specifically with the script used as the example in the "Assigning Taxonomy to your OTUs" section... Here's the code you've provided as the example:

And here's the example code that is suggested from the output of the previous script:

The difference? In the tutorial example, no fasta file flag is specified (the '-i' flag, which is for the OTU table, is followed by a fasta file), and no OTU table is supplied. It took me a while to catch that error, and I was excited to think that all I had to do was follow the instructions the program suggested instead of the online tutorial. I also realized a while later that the README.md page illustrates the various possible approaches __differently__, and I think correctly. So I entered this:

And I got this:

Which made me cry a bit... But that's when I realized you need multiple versions of USEARCH to get this all to pull together, or at least a version at 8.1 or greater. It was staring me in the face in the [README.md][link1] file, but unfortunately it's not explicit in the [tutorial][link2]. It's also perhaps unnecessarily tricky that you can use a lower version for some of the early steps of the program but not the later ones.  

Perhaps at very least in the tutorial example you can be explicit about which versions will or will not work for that step, in addition to the information you provide in the README.md page.  

Once I realized I needed an updated version of USEARCH, I installed usearch9 and entered the following code:

[link1]: https://github.com/nextgenusfs/ufits/blob/master/README.md
[link2]: https://github.com/nextgenusfs/ufits/blob/master/docs/tutorial.md

And I was sorely disappointed to realize this was giving me the same error message... How is v9.0 not above 8.1??
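
To make the puzzle concrete, here is a small Python sketch of a plain numeric version comparison. It is not how ufits checks the version (I haven't confirmed what the script actually does), and the banner string in the usage example is an assumed format. Under a numeric "at least 8.1" rule, 9.0 passes, which suggests the script is testing something else.

```python
import re


def parse_version(text):
    """Extract (major, minor[, build]) as integers from a version string."""
    match = re.search(r"v?(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def at_least(version_text, minimum_text):
    """True if version_text is numerically >= minimum_text."""
    # Tuples compare element by element, so (9, 0) > (8, 1).
    return parse_version(version_text) >= parse_version(minimum_text)


print(at_least("9.0", "8.1"))  # → True
print(at_least("8.0", "8.1"))  # → False
# Assumed banner format, for illustration only:
print(at_least("usearch v8.1.1756_i86linux32", "8.1"))  # → True
```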

Perhaps the problem has something to do with the default taxonomy method. I tried specifying just the usearch method:

Boo. Same error message. Wasn't expecting that.

Last ditch effort was to install exactly the version that was described by the error message: v8.1.1756

Then try running the taxonomy script once more, explicitly specifying v8.1:

And it totally worked. Well sort of. The script ran, but because (again) my test data set probably wasn't the same one you used in your tutorial example, this was somewhat expected.  

And that's totally silly that version 9 won't work, but version 8.1 will:

And the results of that taxonomy table: