In [464]:
%reload_ext watermark
%matplotlib inline
import os

from metapool.metapool import *
from metapool import (validate_plate_metadata, assign_emp_index, make_sample_sheet,
                      KLSampleSheet, parse_prep, validate_and_scrub_sample_sheet, generate_qiita_prep_file)
%watermark -i -v -iv -m -h -p metapool,sample_sheet,openpyxl -u

Last updated: 2021-09-30T01:26:05.461006-07:00

Python implementation: CPython
Python version       : 3.9.7
IPython version      : 7.27.0

metapool    : 0+untagged.93.gcec558b
sample_sheet: 0.12.0
openpyxl    : 3.0.8

Compiler    : Clang 11.1.0 
OS          : Darwin
Release     : 18.7.0
Machine     : x86_64
Processor   : i386
CPU cores   : 8
Architecture: 64bit

Hostname: CFAs-MacBook-Pro.local

re        : 2.2.1
matplotlib: 3.4.3
seaborn   : 0.11.2
numpy     : 1.21.2
pandas    : 1.3.3



# Knight Lab Amplicon Sample Sheet and Mapping (preparation) File Generator 

### What is it?

This Jupyter Notebook allows you to automatically generate sample sheets for amplicon sequencing. 


### Here's how it should work.

You'll start out with a **basic plate map** (platemap.tsv), which just links each sample to its appropriate row and column.

You can use this google sheet template to generate your plate map:

https://docs.google.com/spreadsheets/d/1xPjB6iR3brGeG4bm2un4ISSsTDxFw5yME09bKqz0XNk/edit?usp=sharing

Next you'll automatically assign EMP barcodes in order to produce a **sample sheet** (samplesheet.csv) that can be used in combination with the rest of the sequence processing pipeline. 

**Please designate what kind of amplicon sequencing you want to perform:**

In [465]:
seq_type = '16S'
#options are ['16S', '18S', 'ITS']
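
A typo in `seq_type` is easy to miss, so a small guard can help. The sketch below is only an illustration: `VALID_SEQ_TYPES` and `check_seq_type` are hypothetical names that are not part of metapool, and the allowed values are taken from the comment in the cell above.

```python
# Hypothetical helper (not part of metapool): fail early on an unsupported value.
VALID_SEQ_TYPES = ('16S', '18S', 'ITS')


def check_seq_type(seq_type):
    """Return seq_type unchanged if it is one of the supported options."""
    if seq_type not in VALID_SEQ_TYPES:
        raise ValueError('seq_type must be one of %s, got %r'
                         % (', '.join(VALID_SEQ_TYPES), seq_type))
    return seq_type


seq_type = check_seq_type('16S')
```

The comparison is case-sensitive, so `'16s'` would be rejected rather than silently accepted.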

## Step 1: read in plate map

**Enter the correct path to the plate map file**. This will serve as the plate map for relating all subsequent information.

In [466]:
plate_map_fp = './test_data/amplicon/compressed-map.tsv'

if not os.path.isfile(plate_map_fp):
    print("Problem! %s is not a path to a valid file" % plate_map_fp)

**Read in the plate map**. It should look something like this:

```
Sample	Row	Col	Blank
GLY_01_012	A	1	False
GLY_14_034	B	1	False
GLY_11_007	C	1	False
GLY_28_018	D	1	False
GLY_25_003	E	1	False
GLY_06_106	F	1	False
GLY_07_011	G	1	False
GLY_18_043	H	1	False
GLY_28_004	I	1	False
```
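
Before loading the file, it can be useful to confirm that the header matches the example above. This is a minimal sketch using only the standard library; `EXPECTED_COLUMNS` and `missing_plate_map_columns` are hypothetical names, and treating these four columns as required is an assumption based on the example, not a documented metapool rule.

```python
import csv
import io

# Columns shown in the example above; treating them as required is an assumption.
EXPECTED_COLUMNS = ['Sample', 'Row', 'Col', 'Blank']


def missing_plate_map_columns(handle):
    """Return the expected columns that are absent from a tab-separated plate map."""
    reader = csv.DictReader(handle, delimiter='\t')
    header = reader.fieldnames or []
    return [column for column in EXPECTED_COLUMNS if column not in header]


# An in-memory stand-in for a plate map file.
example = io.StringIO('Sample\tRow\tCol\tBlank\nGLY_01_012\tA\t1\tFalse\n')
print(missing_plate_map_columns(example))
```

An empty list means every expected column was found.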

In [467]:
# open the file in a context manager so the handle is closed after reading
with open(plate_map_fp, 'r') as f:
    plate_df = read_plate_map_csv(f)

plate_df.head()

Unnamed: 0,Sample,Row,Col,Blank,Project Plate,Project Name,Compressed Plate Name,Well
0,X00180471,A,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,A1
1,X00180199,C,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,C1
2,X00179789,E,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,E1
3,X00180201,G,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,G1
4,X00180464,I,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,I1


## Step 2: check for duplicate sample IDs

Duplicate sample IDs cause problems downstream. Make sure each sample has a unique name.

In [468]:
try:
    assert(len(set(plate_df['Sample'])) == len(plate_df['Sample']))
except AssertionError as e:
    prev = ''
    for sample in sorted(plate_df['Sample']):
        if sample == prev:
            print('\nDuplicates:')
            # show every row that shares this sample name
            print(plate_df.loc[plate_df['Sample'] == prev,])

        prev = sample
    print('\n\nWarning! Some sample names are duplicated! Please update the plate map to fix the duplicates')
    raise e
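
The same check can be expressed by counting names directly. The sketch below uses a plain list in place of the `Sample` column, and `find_duplicates` is a hypothetical helper name.

```python
from collections import Counter


def find_duplicates(sample_names):
    """Return the names that occur more than once, in sorted order."""
    counts = Counter(sample_names)
    return sorted(name for name, count in counts.items() if count > 1)


names = ['X00180471', 'X00180199', 'X00180471']
print(find_duplicates(names))
```

With a DataFrame, `plate_df[plate_df['Sample'].duplicated(keep=False)]` should select the same offending rows in one step.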

# Assign barcodes according to primer plate

This portion of the notebook will assign a barcode to each sample according to the primer plate number.

As inputs, it requires:
1. A plate map dataframe (from previous step)
2. Preparation metadata for the plates. Most importantly, we need the `Primer Plate #` so we know which **EMP barcodes** to assign to each plate.

The workflow then:
1. Joins the preparation metadata with the plate metadata.
2. Assigns indices per sample

## Enter and validate the plating metadata

- In general you will want to update all the fields, but the most important ones are the `Primer Plate #` and the `Plate Position`. `Primer Plate #` determines which EMP barcodes will be used for this plate. `Plate Position` determines the physical location of the plate.
- If you are plating fewer than four plates, remove the metadata for the unused plates by deleting the text between the curly braces.
- For missing fields, write NA between the single quotes, for example `'NA'`.
- To enter a plate, copy and paste the contents from the plates below.
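
Since `Primer Plate #` and `Plate Position` matter most, a quick pre-check for them can catch omissions before validation. This is a sketch only: `KEY_FIELDS` and `missing_key_fields` are hypothetical names, and this is not how `validate_plate_metadata` works internally.

```python
# Hypothetical pre-check, not metapool's validate_plate_metadata.
KEY_FIELDS = ('Plate Position', 'Primer Plate #')


def missing_key_fields(plates):
    """Map each plate's list index to the key fields that are absent or empty."""
    problems = {}
    for i, plate in enumerate(plates):
        missing = [field for field in KEY_FIELDS
                   if not str(plate.get(field, '')).strip()]
        if missing:
            problems[i] = missing
    return problems


plates = [
    {'Plate Position': '1', 'Primer Plate #': '1'},
    {'Plate Position': '2'},  # 'Primer Plate #' was forgotten here
]
print(missing_key_fields(plates))
```

An empty dictionary means every plate has both fields filled in.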

In [469]:
_metadata = [
    {
        # top left plate
        'Plate Position': '1',
        'Primer Plate #': '1',
        
        'Sample Plate': 'THDMI_UK_Plate_2',
        'Project_Name': 'THDMI UK',

        'Plating': 'SF',
        'Extraction Kit Lot': '166032128',
        'Extraction Robot': 'Carmen_HOWE_KF3',
        'TM1000 8 Tool': '109379Z',
        'Primer Date': '2021-08-17', # yyyy-mm-dd
        'MasterMix Lot': '978215',
        'Water Lot': 'RNBJ0628',
        'Processing Robot': 'Echo550',
        'Original Name': ''
    },
    {
        # top right plate
        'Plate Position': '2',
        'Primer Plate #': '2',
        
        'Sample Plate': 'THDMI_UK_Plate_3',
        'Project_Name': 'THDMI UK',

        'Plating':'AS',
        'Extraction Kit Lot': '166032128',
        'Extraction Robot': 'Carmen_HOWE_KF4',
        'TM1000 8 Tool': '109379Z',
        'Primer Date': '2021-08-17', # yyyy-mm-dd
        'MasterMix Lot': '978215',
        'Water Lot': 'RNBJ0628',
        'Processing Robot': 'Echo550',
        'Original Name': ''
    },
    {
        # bottom left plate
        'Plate Position': '3',
        'Primer Plate #': '3',
        
        'Sample Plate': 'THDMI_UK_Plate_4',
        'Project_Name': 'THDMI UK',

        'Plating':'MB_SF',
        'Extraction Kit Lot': '166032128',
        'Extraction Robot': 'Carmen_HOWE_KF3',
        'TM1000 8 Tool': '109379Z',
        'Primer Date': '2021-08-17', # yyyy-mm-dd
        'MasterMix Lot': '978215',
        'Water Lot': 'RNBJ0628',
        'Processing Robot': 'Echo550',
        'Original Name': ''
    },
    {
        # bottom right plate
        'Plate Position': '4',
        'Primer Plate #': '4',
        
        'Sample Plate': 'THDMI_US_Plate_6',
        'Project_Name': 'THDMI US',

        'Plating':'AS',
        'Extraction Kit Lot': '166032128',
        'Extraction Robot': 'Carmen_HOWE_KF4',
        'TM1000 8 Tool': '109379Z',
        'Primer Date': '2021-08-17', # yyyy-mm-dd
        'MasterMix Lot': '978215',
        'Water Lot': 'RNBJ0628',
        'Processing Robot': 'Echo550',
        'Original Name': ''
    },
]

plate_metadata = validate_plate_metadata(_metadata)
plate_metadata

Unnamed: 0,Plate Position,Primer Plate #,Sample Plate,Project_Name,Plating,Extraction Kit Lot,Extraction Robot,TM1000 8 Tool,Primer Date,MasterMix Lot,Water Lot,Processing Robot,Original Name
0,1,1,THDMI_UK_Plate_2,THDMI UK,SF,166032128,Carmen_HOWE_KF3,109379Z,2021-08-17,978215,RNBJ0628,Echo550,
1,2,2,THDMI_UK_Plate_3,THDMI UK,AS,166032128,Carmen_HOWE_KF4,109379Z,2021-08-17,978215,RNBJ0628,Echo550,
2,3,3,THDMI_UK_Plate_4,THDMI UK,MB_SF,166032128,Carmen_HOWE_KF3,109379Z,2021-08-17,978215,RNBJ0628,Echo550,
3,4,4,THDMI_US_Plate_6,THDMI US,AS,166032128,Carmen_HOWE_KF4,109379Z,2021-08-17,978215,RNBJ0628,Echo550,


The `Plate Position` and `Primer Plate #` allow us to figure out which wells are associated with each of the EMP barcodes.
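
To make the well lookup concrete, here is a sketch of how a well on the compressed 384-well plate could be split into a plate position and a 96-well primer plate well. For plate position 1, the tables in this notebook show compressed wells A1, C1 and E1 lining up with EMP primer plate wells A1, B1 and C1. The layout assumed for positions 2 to 4 (top right, bottom left, bottom right of each 2x2 block of wells) is inferred from the comments in the metadata cell and is an assumption, as is the hypothetical helper name `split_compressed_well`; metapool's own lookup may differ.

```python
import string

ROWS_384 = string.ascii_uppercase[:16]  # rows A through P of a 384-well plate
ROWS_96 = string.ascii_uppercase[:8]    # rows A through H of a 96-well plate


def split_compressed_well(well):
    """Return (plate position, 96-well name) for a 384-well name such as 'C1'."""
    row_index = ROWS_384.index(well[0])
    col_index = int(well[1:]) - 1
    # Assumed quadrant layout: 1 = top left, 2 = top right,
    # 3 = bottom left, 4 = bottom right of each 2x2 block of wells.
    position = 1 + 2 * (row_index % 2) + (col_index % 2)
    return position, '%s%d' % (ROWS_96[row_index // 2], col_index // 2 + 1)


for well in ['A1', 'C1', 'E1', 'A2', 'B1', 'B2']:
    print(well, split_compressed_well(well))
```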

In [470]:
if plate_metadata is not None:
    plate_df = assign_emp_index(plate_df, plate_metadata, seq_type).reset_index()

    plate_df.head()
else:
    print('Error: Please fix the errors in the previous cell')

As you can see in the table below, the resulting table is now associated with the corresponding EMP barcodes (`Golay Barcode`, `Forward Primer Linker`, etc.), and the plating metadata (`Primer Plate #`, `Primer Date`, `Water Lot`, etc.).

In [471]:
plate_df.head()

Unnamed: 0,index,Sample,Row,Col,Blank,Project Plate,Project Name,Compressed Plate Name,Well,Plate Position,...,Original Name,Plate,EMP Primer Plate Well,Name,Illumina 5prime Adapter,Golay Barcode,Forward Primer Pad,Forward Primer Linker,515FB Forward Primer (Parada),Primer For PCR
0,0,X00180471,A,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,A1,1,...,,1,A1,515rcbc0,AATGATACGGCGACCACCGAGATCTACACGCT,AGCCTTCGTCGC,TATGGTAATT,GT,GTGYCAGCMGCCGCGGTAA,AATGATACGGCGACCACCGAGATCTACACGCTAGCCTTCGTCGCTA...
1,1,X00180199,C,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,C1,1,...,,1,B1,515rcbc12,AATGATACGGCGACCACCGAGATCTACACGCT,CGTATAAATGCG,TATGGTAATT,GT,GTGYCAGCMGCCGCGGTAA,AATGATACGGCGACCACCGAGATCTACACGCTCGTATAAATGCGTA...
2,2,X00179789,E,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,E1,1,...,,1,C1,515rcbc24,AATGATACGGCGACCACCGAGATCTACACGCT,TGACTAATGGCC,TATGGTAATT,GT,GTGYCAGCMGCCGCGGTAA,AATGATACGGCGACCACCGAGATCTACACGCTTGACTAATGGCCTA...
3,3,X00180201,G,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,G1,1,...,,1,D1,515rcbc36,AATGATACGGCGACCACCGAGATCTACACGCT,GTGGAGTCTCAT,TATGGTAATT,GT,GTGYCAGCMGCCGCGGTAA,AATGATACGGCGACCACCGAGATCTACACGCTGTGGAGTCTCATTA...
4,4,X00180464,I,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,I1,1,...,,1,E1,515rcbc48,AATGATACGGCGACCACCGAGATCTACACGCT,TGATGTGCTAAG,TATGGTAATT,GT,GTGYCAGCMGCCGCGGTAA,AATGATACGGCGACCACCGAGATCTACACGCTTGATGTGCTAAGTA...


# Combine plates (optional)

If you would like to combine existing plates with these samples, enter the path to their corresponding sample sheets and mapping (preparation) files below. Otherwise you can skip to the next section.

- Each entry is a pair: the path to a sample sheet followed by the path to its mapping (preparation) file.

In [472]:
files = [
    # uncomment the line below and point to the correct filepaths to combine with previous plates
    # ['test_output/amplicon/2021_08_17_THDMI-4-6_samplesheet.csv', 'test_output/amplicon/2021-08-01-515f806r_prep.tsv'],
]
sheets, preps = [], []

for sheet, prep in files:
    sheets.append(KLSampleSheet(sheet))
    preps.append(parse_prep(prep))
    
if len(files):
    print('%d pair(s) of files loaded' % len(files))

# Make Sample Sheet

This workflow takes the pooled sample information and writes an Illumina sample sheet that can be given directly to the sequencing center or processing pipeline. Note that, as of writing, `bcl2fastq` does not support error correction in Golay barcodes, so the sample sheet is used to generate a mapping (preparation) file but not to demultiplex sequences. Demultiplexing takes place in [Qiita](https://qiita.ucsd.edu).

As inputs, this notebook requires:
1. A plate map DataFrame (from previous step)

The workflow:
1. Formats sample names to be bcl2fastq-compatible.
2. Formats sample data.
3. Sets values for sample sheet fields and formats the sample sheet.
4. Writes the sample sheet to a file.

## Step 1: Format sample names to be bcl2fastq-compatible

bcl2fastq accepts *only* alphanumeric characters, hyphens, and underscores in sample names. We'll replace all other characters
with underscores and add the bcl2fastq-compatible names to the DataFrame.
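
The rule above can be sketched with a regular expression. This is an illustration, not metapool's `bcl_scrub_name`: the helper name `scrub_name` is hypothetical, and collapsing each run of disallowed characters into a single underscore (rather than one underscore per character) is an assumption.

```python
import re

# Anything that is not a letter, digit, hyphen or underscore.
_DISALLOWED = re.compile(r'[^0-9A-Za-z_-]+')


def scrub_name(name):
    """Replace each run of disallowed characters with one underscore."""
    return _DISALLOWED.sub('_', str(name))


for raw in ['X00180471', 'sample 1.a', 'GLY_01-012']:
    print(raw, '->', scrub_name(raw))
```

Names that already satisfy the rule pass through unchanged, which matches the table below where `sample sheet Sample_ID` equals `Sample`.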

In [473]:
plate_df['sample sheet Sample_ID'] = plate_df['Sample'].map(bcl_scrub_name)

plate_df.head()

Unnamed: 0,index,Sample,Row,Col,Blank,Project Plate,Project Name,Compressed Plate Name,Well,Plate Position,...,Plate,EMP Primer Plate Well,Name,Illumina 5prime Adapter,Golay Barcode,Forward Primer Pad,Forward Primer Linker,515FB Forward Primer (Parada),Primer For PCR,sample sheet Sample_ID
0,0,X00180471,A,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,A1,1,...,1,A1,515rcbc0,AATGATACGGCGACCACCGAGATCTACACGCT,AGCCTTCGTCGC,TATGGTAATT,GT,GTGYCAGCMGCCGCGGTAA,AATGATACGGCGACCACCGAGATCTACACGCTAGCCTTCGTCGCTA...,X00180471
1,1,X00180199,C,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,C1,1,...,1,B1,515rcbc12,AATGATACGGCGACCACCGAGATCTACACGCT,CGTATAAATGCG,TATGGTAATT,GT,GTGYCAGCMGCCGCGGTAA,AATGATACGGCGACCACCGAGATCTACACGCTCGTATAAATGCGTA...,X00180199
2,2,X00179789,E,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,E1,1,...,1,C1,515rcbc24,AATGATACGGCGACCACCGAGATCTACACGCT,TGACTAATGGCC,TATGGTAATT,GT,GTGYCAGCMGCCGCGGTAA,AATGATACGGCGACCACCGAGATCTACACGCTTGACTAATGGCCTA...,X00179789
3,3,X00180201,G,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,G1,1,...,1,D1,515rcbc36,AATGATACGGCGACCACCGAGATCTACACGCT,GTGGAGTCTCAT,TATGGTAATT,GT,GTGYCAGCMGCCGCGGTAA,AATGATACGGCGACCACCGAGATCTACACGCTGTGGAGTCTCATTA...,X00180201
4,4,X00180464,I,1,False,THDMI_10317_PUK2,THDMI_10317,THDMI_10317_UK2-US6,I1,1,...,1,E1,515rcbc48,AATGATACGGCGACCACCGAGATCTACACGCT,TGATGTGCTAAG,TATGGTAATT,GT,GTGYCAGCMGCCGCGGTAA,AATGATACGGCGACCACCGAGATCTACACGCTTGATGTGCTAAGTA...,X00180464


## Step 2: Format the sample sheet data

This step formats the data columns appropriately for the sample sheet, using the values we've calculated previously.

The newly created `bcl2fastq`-compatible names will be in the `Sample_ID` and `Sample_Name` columns. The original sample names will be in the Description column.

Modify `lanes` to indicate which lanes this pool will be sequenced on.

The `Project Name` and `Project Plate` columns will be placed in the `Sample_Project` and `Sample_Plate` columns, respectively.

`sequencer` is important for making sure the i5 index is in the correct orientation for demultiplexing. `HiSeq4000`, `HiSeq3000`, `NextSeq`, and `MiniSeq` all require reverse-complemented i5 index sequences. If you enter one of these exact strings for `sequencer`, the i5 sequence will be reverse-complemented for you.

`HiSeq2500`, `MiSeq`, and `NovaSeq` will not revcomp the i5 sequence.
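
The orientation rule can be sketched as follows. The sequencer names come from the text above; `REVCOMP_SEQUENCERS`, `reverse_complement` and `orient_i5` are hypothetical names used for illustration and are not metapool's API.

```python
# Sequencers that, per the text above, need a reverse-complemented i5 index.
REVCOMP_SEQUENCERS = {'HiSeq4000', 'HiSeq3000', 'NextSeq', 'MiniSeq'}

_COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_i5(i5, sequencer):
    """Return the i5 index in the orientation assumed for this sequencer."""
    if sequencer in REVCOMP_SEQUENCERS:
        return reverse_complement(i5)
    return i5


print(orient_i5('AACG', 'HiSeq4000'))
print(orient_i5('AACG', 'MiSeq'))
```

The match on the sequencer name is exact, so a value such as `'hiseq4000'` would leave the index unchanged in this sketch.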

In [474]:
sequencer = 'HiSeq4000'
lanes = [1]

metadata = {
    'Bioinformatics': [
        {
         'Sample_Project': 'THDMI_10317',
         'QiitaID': '10317',
         'BarcodesAreRC': 'False',
         'ForwardAdapter': '',
         'ReverseAdapter': '',
         'HumanFiltering': 'True'
        },
    ],
    'Contact': [
        {
         'Sample_Project': 'THDMI_10317',
         # non-admin contacts who want to know when the sequences
         # are available in Qiita
         'Email': 'yoshiki@compy.com,ilike@turtles.com'
        },
    ],
    'Chemistry': 'Amplicon',
    'Assay': 'TruSeq HT',
}

sheet = make_sample_sheet(metadata, plate_df, sequencer, lanes)

sheet.Settings['Adapter'] = 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCA'
sheet.Settings['AdapterRead2'] = 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT'



Check for any possible errors in the sample sheet

In [475]:
sheet = validate_and_scrub_sample_sheet(sheet)

Add the other sample sheets

In [476]:
if len(sheets):
    sheet.merge(sheets)

## Step 3: Write the sample sheet to file

In [477]:
# write sample sheet as .csv
sample_sheet_fp = './test_output/amplicon/2021_08_17_THDMI-4-6_samplesheet16S.csv'

if os.path.isfile(sample_sheet_fp):
    print("Warning! This file exists already.")



In [478]:
with open(sample_sheet_fp,'w') as f:
    sheet.write(f)
    
!head -n 30 {sample_sheet_fp}
!echo ...
!tail -n 15 {sample_sheet_fp}

[Header],,,,,,,,,,
IEMFileVersion,4,,,,,,,,,
Date,2021-09-30,,,,,,,,,
Workflow,GenerateFASTQ,,,,,,,,,
Application,FASTQ Only,,,,,,,,,
Assay,TruSeq HT,,,,,,,,,
Description,,,,,,,,,,
Chemistry,Amplicon,,,,,,,,,
,,,,,,,,,,
[Reads],,,,,,,,,,
151,,,,,,,,,,
151,,,,,,,,,,
,,,,,,,,,,
[Settings],,,,,,,,,,
ReverseComplement,0,,,,,,,,,
Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA,,,,,,,,,
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT,,,,,,,,,
,,,,,,,,,,
[Data],,,,,,,,,,
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,Sample_Project,I5_Index_ID,index2,Well_description,Lane
X00180471,X00180471,THDMI_10317_PUK2,A1,515rcbc0,AGCCTTCGTCGC,THDMI_10317,,,,1
X00180199,X00180199,THDMI_10317_PUK2,C1,515rcbc12,CGTATAAATGCG,THDMI_10317,,,,1
X00179789,X00179789,THDMI_10317_PUK2,E1,515rcbc24,TGACTAATGGCC,THDMI_10317,,,,1
X00180201,X00180201,THDMI_10317_PUK2,G1,515rcbc36,GTGGAGTCTCAT,THDMI_10317,,,,1
X00180464,X00180464,THDMI_10317_PUK2,I1,515rcbc48,TGATGTGCTAAG,THDMI_10317,,,,1
X00179796,X0017979

# Create a mapping (preparation) file for Qiita

In [479]:
output_filename = 'test_output/amplicon/2021-08-01-515f806r_prep.tsv'

In [480]:
qiita_df = generate_qiita_prep_file(plate_df, seq_type)
qiita_df.head()

Unnamed: 0,sample_name,barcode,primer,project_name,well_id,primer_plate,plating,extractionkit_lot,extraction_robot,tm1000_8_tool,...,orig_name,well_description,pcr_primers,center_name,run_center,platform,target_subfragment,target_gene,sequencing_meth,library_construction_protocol
0,X00180471,AGCCTTCGTCGC,GTGYCAGCMGCCGCGGTAA,THDMI_10317,A1,1,SF,166032128,Carmen_HOWE_KF3,109379Z,...,X00180471,THDMI_UK_Plate_2.,FWD:GTGYCAGCMGCCGCGGTAA; REV:GGACTACNVGGGTWTCTAAT,UCSDMI,UCSDMI,Illumina,V4,16S rRNA,Sequencing by synthesis,"Illumina EMP protocol 515fbc, 806r amplificati..."
1,X00180199,CGTATAAATGCG,GTGYCAGCMGCCGCGGTAA,THDMI_10317,C1,1,SF,166032128,Carmen_HOWE_KF3,109379Z,...,X00180199,THDMI_UK_Plate_2.,FWD:GTGYCAGCMGCCGCGGTAA; REV:GGACTACNVGGGTWTCTAAT,UCSDMI,UCSDMI,Illumina,V4,16S rRNA,Sequencing by synthesis,"Illumina EMP protocol 515fbc, 806r amplificati..."
2,X00179789,TGACTAATGGCC,GTGYCAGCMGCCGCGGTAA,THDMI_10317,E1,1,SF,166032128,Carmen_HOWE_KF3,109379Z,...,X00179789,THDMI_UK_Plate_2.,FWD:GTGYCAGCMGCCGCGGTAA; REV:GGACTACNVGGGTWTCTAAT,UCSDMI,UCSDMI,Illumina,V4,16S rRNA,Sequencing by synthesis,"Illumina EMP protocol 515fbc, 806r amplificati..."
3,X00180201,GTGGAGTCTCAT,GTGYCAGCMGCCGCGGTAA,THDMI_10317,G1,1,SF,166032128,Carmen_HOWE_KF3,109379Z,...,X00180201,THDMI_UK_Plate_2.,FWD:GTGYCAGCMGCCGCGGTAA; REV:GGACTACNVGGGTWTCTAAT,UCSDMI,UCSDMI,Illumina,V4,16S rRNA,Sequencing by synthesis,"Illumina EMP protocol 515fbc, 806r amplificati..."
4,X00180464,TGATGTGCTAAG,GTGYCAGCMGCCGCGGTAA,THDMI_10317,I1,1,SF,166032128,Carmen_HOWE_KF3,109379Z,...,X00180464,THDMI_UK_Plate_2.,FWD:GTGYCAGCMGCCGCGGTAA; REV:GGACTACNVGGGTWTCTAAT,UCSDMI,UCSDMI,Illumina,V4,16S rRNA,Sequencing by synthesis,"Illumina EMP protocol 515fbc, 806r amplificati..."


In [481]:
qiita_df.set_index('sample_name', verify_integrity=True).to_csv(output_filename, sep='\t')

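Add the previous mapping (preparation) files

In [482]:
if len(preps):
    # `prep` is only the loop variable from the loading cell; merge into qiita_df instead
    qiita_df = qiita_df.append(preps, ignore_index=True)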

In [483]:
!head -n 5 {output_filename}

sample_name	barcode	primer	project_name	well_id	primer_plate	plating	extractionkit_lot	extraction_robot	tm1000_8_tool	primer_date	mastermix_lot	water_lot	processing_robot	sample_plate	linker	orig_name	well_description	pcr_primers	center_name	run_center	platform	target_subfragment	target_gene	sequencing_meth	library_construction_protocol
X00180471	AGCCTTCGTCGC	GTGYCAGCMGCCGCGGTAA	THDMI_10317	A1	1	SF	166032128	Carmen_HOWE_KF3	109379Z	2021-08-17	978215	RNBJ0628	Echo550	THDMI_UK_Plate_2	GT	X00180471	THDMI_UK_Plate_2.	FWD:GTGYCAGCMGCCGCGGTAA; REV:GGACTACNVGGGTWTCTAAT	UCSDMI	UCSDMI	Illumina	V4	16S rRNA	Sequencing by synthesis	Illumina EMP protocol 515fbc, 806r amplification of 16S rRNA V4
X00180199	CGTATAAATGCG	GTGYCAGCMGCCGCGGTAA	THDMI_10317	C1	1	SF	166032128	Carmen_HOWE_KF3	109379Z	2021-08-17	978215	RNBJ0628	Echo550	THDMI_UK_Plate_2	GT	X00180199	THDMI_UK_Plate_2.	FWD:GTGYCAGCMGCCGCGGTAA; REV:GGACTACNVGGGTWTCTAAT	UCSDMI	UCSDMI	Illumina	V4	16S rRNA	Sequencing by synthesis	Illumina EMP prot