# Analysis through QIIME 1

In [1]:
!validate_mapping_file.py -o exp12_mapping_file_format_results/ -m mapping_file_cohousing2018.tsv

### Preparing raw Illumina sequences prior to demultiplexing and quality filtering
Before running the split_libraries_fastq.py script, you need to have the mapping file, the sequence reads (fastq.gz), and the barcode reads (fastq.gz).

### Extracting barcodes
The following is entered through the terminal.
Backslashes are used so you do not have to enter everything on one line in the terminal.
#### Before running the code
Make a dedicated folder to hold everything (e.g., pellet-removal-qiime-analysis). When you download the sequences, they will be in two gz files nested within multiple folders. Move the gz files into the dedicated folder, and put the metadata file there as well, so that everything is easier to work with.
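The folder setup described above could also be scripted. The following is a minimal sketch, assuming the download is a directory tree with the gz files somewhere inside it; the function name `collect_gz_files` and both path arguments are hypothetical and are not part of QIIME.

```python
import shutil
from pathlib import Path


def collect_gz_files(download_root, dest_dir):
    """Move every .gz file found under download_root into dest_dir.

    Hypothetical helper for illustration only; not part of QIIME.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing first, so moving files
    # does not disturb the directory walk.
    for gz in sorted(Path(download_root).rglob("*.gz")):
        if gz.parent.resolve() == dest.resolve():
            continue  # already in the dedicated folder
        target = dest / gz.name
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        shutil.move(str(gz), str(target))
        moved.append(target)
    return moved
```

Calling `collect_gz_files("downloads", "pellet-removal-qiime-analysis")` would flatten the nested download into the dedicated folder; the mapping file still has to be copied there by hand.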

In [5]:
!extract_barcodes.py \
-f PT-pool-2_S1_L001_R1_001.fastq.gz \
-c barcode_single_end \
--bc1_len 12 \
-o processed_seqs
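To make the effect of `-c barcode_single_end` with `--bc1_len 12` more concrete, here is a conceptual sketch of what happens to a single read: the first 12 bases (and their quality scores) become the barcode record, and the remainder becomes the read record. This is only an illustration under that assumption, not QIIME's actual implementation, and the function name `split_barcode` is hypothetical.

```python
def split_barcode(seq, qual, bc_len=12):
    """Split one read into (barcode, remaining read), each with its qualities.

    Conceptual illustration only; extract_barcodes.py does this on fastq files.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) < bc_len:
        raise ValueError("read is shorter than the barcode length")
    barcode = (seq[:bc_len], qual[:bc_len])
    read = (seq[bc_len:], qual[bc_len:])
    return barcode, read


# A made-up 18-base read: 12 barcode bases followed by 6 read bases.
barcode, read = split_barcode("ACGTACGTACGT" + "TTGGCC", "I" * 18)
print(barcode[0])  # ACGTACGTACGT
print(read[0])     # TTGGCC
```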

### Processing Illumina Data - Demultiplexing and quality filtering through the split_libraries_fastq.py script
This will demultiplex and quality filter our data.
In our case, the reads are reverse barcoded, so the barcodes are not treated as Golay barcodes and you have to set --barcode_type to 12 (the barcode length). This step takes about 30 minutes.

In [6]:
!split_libraries_fastq.py \
-o split_libraries_output/ \
-i processed_seqs/reads.fastq \
-b processed_seqs/barcodes.fastq \
-m pcos_mapping.tsv \
--barcode_type 12

### Working with demultiplexed sequences - Clustering the sequences using a de novo OTU approach
The pick_de_novo_otus.py script takes a sequence file and performs all processing steps through building the OTU table.
It produces an OTU mapping file (pick_otus.py), a representative set of sequences (FASTA file from pick_rep_set.py), a sequence alignment file (FASTA file from align_seqs.py), a taxonomy assignment file (from assign_taxonomy.py), a filtered sequence alignment (from filter_alignment.py), a phylogenetic tree (Newick file from make_phylogeny.py), and a biom-formatted OTU table (from make_otu_table.py).

In [7]:
#Creating OTUs through the de novo method.
#-a makes it run in parallel
#-O is the number of jobs to start (here 4, one per core)
#-i is the filtered, demultiplexed sequence file that we want to make OTUs out of
#--output_dir is the folder that you want to output the OTUs to
!pick_de_novo_otus.py \
-i split_libraries_output/seqs.fna \
--output_dir uclust_otu/ \
-a \
-O 4

In [8]:
# Summarizing the results from the OTUs
!biom summarize-table -i uclust_otu/otu_table.biom -o uclust_otu/otu_table_summary.txt

In [10]:
#Filtering out any OTUs that rarely show up across the samples
#I have 95 total samples (this includes the 2 controls)
#Since we want to filter out any OTUs that do not show up in at least 25% of the samples, we set
#-s to the number of samples that equals 25% of the total samples (0.25 * 95 = 23.75, rounded up to 24)
#-s is the minimum number of samples an OTU must be observed in for that OTU to be retained [default: 0]
#-i is the input OTU table (biom file)
#-o is the output path for the filtered OTU table
!filter_otus_from_otu_table.py \
-s 24 \
-i uclust_otu/otu_table.biom \
-o uclust_otu/uclust_otu_filter_25_percent/otu_table_filtered_s24.biom
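The value passed to `-s` can be worked out rather than counted by hand. The sketch below is a small illustration, assuming the threshold is "at least the given percentage of samples, rounded up"; the function name `min_samples_for_percent` is hypothetical. It uses integer arithmetic so that rounding does not depend on floating-point behavior.

```python
def min_samples_for_percent(n_samples, percent):
    """Smallest whole number of samples that is at least `percent` % of n_samples."""
    if n_samples < 0 or not 0 <= percent <= 100:
        raise ValueError("n_samples must be >= 0 and percent within 0..100")
    # Integer ceiling of n_samples * percent / 100.
    return (n_samples * percent + 99) // 100


print(min_samples_for_percent(95, 25))  # 24
```

With 95 samples and a 25% cutoff this gives 24, which matches the `-s 24` used in the cell above.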

In [11]:
#Summarizing our results (biom table)
#Link: http://biom-format.org/documentation/summarizing_biom_tables.html
#This explains how to actually use it.
#biom summarize-table -h
#-i is the input file
#-o is the output file path (a text file)
#Compare the summaries from before and after the 25% minimum-sample filter
!biom summarize-table \
-i uclust_otu/uclust_otu_filter_25_percent/otu_table_filtered_s24.biom \
-o uclust_otu/uclust_otu_filter_25_percent/otu_table_summary.txt

In [12]:
#Filtering out singletons
#Singletons are OTUs that are observed only once across all samples
#-n 2 discards all OTUs that are observed fewer than 2 times (i.e., singletons)
!filter_otus_from_otu_table.py \
    -i uclust_otu/uclust_otu_filter_25_percent/otu_table_filtered_s24.biom \
    -o uclust_otu/uclust_otu_filter_25_percent/otu_singleton_filter/otu_table_no_singletons.biom \
    -n 2
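The two filters used above differ in what they count: `-s` looks at how many samples an OTU appears in, while `-n` looks at the OTU's total count across all samples. The toy sketch below illustrates that difference on a plain dictionary; it is not how filter_otus_from_otu_table.py is implemented, and the function name `filter_otus` and the example table are made up for illustration.

```python
def filter_otus(table, min_count=0, min_samples=0):
    """Keep OTUs whose total count and sample prevalence meet both minimums.

    `table` maps an OTU id to a list of per-sample counts (toy stand-in
    for a biom table).
    """
    kept = {}
    for otu_id, counts in table.items():
        total = sum(counts)                          # compared against -n
        n_present = sum(1 for c in counts if c > 0)  # compared against -s
        if total >= min_count and n_present >= min_samples:
            kept[otu_id] = counts
    return kept


toy = {
    "OTU_1": [5, 0, 3, 2],  # total 10, present in 3 samples
    "OTU_2": [1, 0, 0, 0],  # total 1, a singleton
    "OTU_3": [0, 2, 0, 0],  # total 2, present in 1 sample
}

print(sorted(filter_otus(toy, min_count=2)))    # ['OTU_1', 'OTU_3']
print(sorted(filter_otus(toy, min_samples=2)))  # ['OTU_1']
```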

In [13]:
#Summarizing our results (biom table)
#Link: http://biom-format.org/documentation/summarizing_biom_tables.html
#This explains how to actually use it.
!biom summarize-table \
    -i uclust_otu/uclust_otu_filter_25_percent/otu_singleton_filter/otu_table_no_singletons.biom \
    -o uclust_otu/uclust_otu_filter_25_percent/otu_singleton_filter/otu_table_summary.txt