Ensuring to find the correct file after Demultiplexing my 16S amplicon raw dataset with combinatorial dual indexes cutadapt command #776
How many barcodes are in barcodes_fwd.fasta and barcodes_rev.fasta, respectively? |
Thanks for your question. It was 160 (Set B: 96, and the rest in Set C).

Kit used: NextFlex Rapid XP kit
Forward primer: GTGCCAGCMGCCGCGGTAA

Two files have been received.

head -n 4 sample2_1_L001_R1_001.fastq

How many barcodes are in barcodes_fwd.fasta and barcodes_rev.fasta, respectively?

Two 96 plates, B and C. For example barcodes, see below:

In Plate B, the barcode combination is as follows:

In Plate C, the barcode combination is as follows:

I combined plates B and C and created barcodes_fwd.fasta and barcodes_rev.fasta files for each sample number. This means that plate B starts with LIB1_sample_001 and ends with LIB1_sample_096, while plate C starts with LIB1_sample_097 and ends with LIB1_sample_160.

barcodes_fwd.fasta

barcodes_rev.fasta
|
I still don’t have a clear picture of how your data is structured. I believe you need to understand this yourself before you can proceed. In particular, you need to figure out where the index sequences are. First, to be explicit: There is a difference between unique dual indexing and combinatorial indexing. Unless you have reliable information that combinatorial indexing was used, it is more likely that unique dual indexing was done.
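To make the distinction concrete, here is a minimal sketch of how the two schemes assign index pairs to samples. The index names are hypothetical placeholders, not sequences from any kit; only the pairing logic matters.

```python
from itertools import product

# Hypothetical index names; a real kit ships its own i7/i5 sequences.
i7_indexes = ["i7_A", "i7_B", "i7_C"]
i5_indexes = ["i5_A", "i5_B", "i5_C"]


def unique_dual_pairs(i7, i5):
    """Unique dual indexing: each i7 is used with exactly one i5."""
    if len(i7) != len(i5):
        raise ValueError("unique dual indexing needs equally long index lists")
    return list(zip(i7, i5))


def combinatorial_pairs(i7, i5):
    """Combinatorial indexing: every i7 may be combined with every i5."""
    return list(product(i7, i5))


udi = unique_dual_pairs(i7_indexes, i5_indexes)
cdi = combinatorial_pairs(i7_indexes, i5_indexes)
print(len(udi), "unique dual pairs:", udi)
print(len(cdi), "combinatorial pairs:", cdi)
```

With unique dual indexes, a read carrying an unexpected pair such as i7_A with i5_B can be recognized as an error and discarded. With combinatorial indexing, the same pair may belong to a real sample, so it cannot be told apart from a mix-up.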
I am not familiar with it, but the manual for version 2 of that kit talks about Unique Dual Indices:
According to the same manual, the UDIs need to be bought separately, so it would still be possible to use combinatorial indexing, but that would be against Illumina’s own advice:
Your first read looks like this:
Do all reads in the first file contain CGCTGCTC+GATCTGCC? What about the second file? |
Hello there,

Do all reads in the first file contain CGCTGCTC+GATCTGCC? What about the second file?

Checking LIB1_L2_1.fq for sequence CGCTGCTC+GATCTGCC...

I had given a bad example; here is the right one, from L001_1.fastq. In the string 'TTACCGCTGTGCCAGCAGCCGCGGTAA' from L001_1, the first eight nucleotides (TTACCGCT) are the barcode, and the rest is the forward primer.

@A00783:1516:HWGV2DRX3:1:2145:3992:12289 1:N:0:CGCTGCTC+GATCTGCC
@A00783:1516:HWGV2DRX3:1:2146:4408:26099 1:N:0:CGCTGCTC+GATCTGCC
@A00783:1516:HWGV2DRX3:1:2147:20889:26741 1:N:0:CGCTGCTC+GATCTGCC
@A00783:1516:HWGV2DRX3:1:2151:12608:26271 1:N:0:CGCTGCTC+GATCTGCC

L001_1.fastq

In the string 'TTACCGCTGGACTACAGGGGTATCTAAT' from L001_1, the first eight nucleotides (TTACCGCT) are the barcode, and the rest is the reverse primer.

@A00783:1516:HWGV2DRX3:1:2116:7536:34084 1:N:0:CGCTGCTC+GATCTGCC
@A00783:1516:HWGV2DRX3:1:2117:22815:14325 1:N:0:CGCTGCTC+GATCTGCC
@A00783:1516:HWGV2DRX3:1:2121:20509:32659 1:N:0:CGCTGCTC+GATCTGCC
|
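The two checks discussed above can be sketched in plain Python. The first tallies the index pair at the end of each header. The second splits a read into an inline barcode and tests the remainder against a degenerate primer. This is only an illustration, not part of cutadapt. It assumes strict four-line FASTQ records, Illumina-style headers ending in the index pair, and an inline barcode of exactly eight nucleotides as described in the post. All function names are made up for this example, and the quality strings in the demo are placeholders.

```python
from collections import Counter

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def index_pair_from_header(header):
    """Return the text after the last ':' of a FASTQ header line."""
    return header.rstrip().rsplit(":", 1)[-1]


def tally_index_pairs(fastq_lines):
    """Count index pairs over the header line of each four-line record."""
    counts = Counter()
    for line_number, line in enumerate(fastq_lines):
        if line_number % 4 == 0:
            counts[index_pair_from_header(line)] += 1
    return counts


def primer_matches(primer, sequence):
    """True if sequence starts with primer, honoring IUPAC wildcards."""
    if len(sequence) < len(primer):
        return False
    return all(base in IUPAC[code] for code, base in zip(primer, sequence))


def split_inline_barcode(read, primer, barcode_length=8):
    """Return (barcode, primer_found) for a read with an inline barcode."""
    barcode, rest = read[:barcode_length], read[barcode_length:]
    return barcode, primer_matches(primer, rest)


# Demo on two headers and the read prefix quoted in the post.
read = "TTACCGCTGTGCCAGCAGCCGCGGTAA"
records = [
    "@A00783:1516:HWGV2DRX3:1:2145:3992:12289 1:N:0:CGCTGCTC+GATCTGCC",
    read, "+", "I" * len(read),  # placeholder quality string
    "@A00783:1516:HWGV2DRX3:1:2146:4408:26099 1:N:0:CGCTGCTC+GATCTGCC",
    read, "+", "I" * len(read),  # placeholder quality string
]
pair_counts = tally_index_pairs(records)
barcode, primer_found = split_inline_barcode(read, "GTGCCAGCMGCCGCGGTAA")
print(pair_counts.most_common(5))
print(barcode, primer_found)
```

On a real file, an open file handle can be passed to tally_index_pairs in place of the list. If a single pair accounts for nearly all reads, the header indexes probably identify the library as a whole, and the per-sample information would then have to come from the inline barcodes.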
Do you know which 230 barcode combinations are possible? Then you look for only those in the following way:
If you do not know which combinations are possible, just use the 230 biggest files. That may not be entirely correct, but it may be good enough. |
Here is some of the output generated by the code above: