-G does not behave as expected, i.e. as -g for R2. #692

Open
ryandward opened this issue Apr 5, 2023 · 3 comments
@ryandward

Using -G on read 2 does not seem to give the same results as using -g with the order of the reads swapped.

# -G: look for the adapters in read 2
❯ cutadapt \
  --action=none \
  -G ^file:adapter_file.fasta \
  -j 12 --no-indels --error-rate=0 \
  -o demultiplexed_{name}_R1.fastq.gz \
  -p demultiplexed_{name}_R2.fastq.gz \
  read1.fastq.gz \
  read2.fastq.gz

With this command, every read pair is written to the files that have unknown in place of {name}.

However, when I change -G to -g and swap the positions of read1 and read2, demultiplexing works as expected.

# -g (lowercase): look for the adapters in the first input file, which is now read2
❯ cutadapt \
  --action=none \
  -g ^file:adapter_file.fasta \
  -j 12 --no-indels --error-rate=0 \
  -o demultiplexed_{name}_R1.fastq.gz \
  -p demultiplexed_{name}_R2.fastq.gz \
  read2.fastq.gz \
  read1.fastq.gz

❯ conda --version
conda 22.11.1
❯ cutadapt --version
4.3

@marcelm
Owner

marcelm commented Apr 6, 2023

This may be easy to overlook, but it is documented in the section about demultiplexing:

Paired-end demultiplexing always uses the adapter matches of the first read to decide where a read should be written. If adapters for read 2 are given (-A/-G), they are detected and removed as normal, but these matches do not influence where the read pair is written.

To demultiplex using a barcode that is located on read 2, you can “cheat” and swap the roles of R1 and R2 for both the input and output files:

cutadapt -e 1 -g ^file:barcodes.fasta -o trimmed-{name}.2.fastq.gz -p trimmed-{name}.1.fastq.gz input.2.fastq.gz input.1.fastq.gz

If you do this in a script or pipeline, it may be a good idea to add a comment to clarify that this reversal of R1 and R2 is intended.
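For a script or pipeline, the swap described above could be wrapped in a small helper so that the reversal is documented in one place. This is only a sketch: the function name `demux_on_r2`, its parameters and the output naming are made up for illustration, and the cutadapt options are the ones from the documentation example above.

```shell
# Hypothetical helper (name and parameters are assumptions, not part of cutadapt).
# Demultiplexes on a barcode at the 5' end of R2 by passing R2 as the first read.
demux_on_r2() {
  barcodes=$1   # FASTA file with the barcodes
  r1=$2         # original read 1 file
  r2=$3         # original read 2 file
  prefix=$4     # prefix for the output file names

  # NOTE: the reversal of R1 and R2 is intended.
  # -o receives the first input (here R2) and -p the second (here R1),
  # so the output names are swapped as well and keep their correct labels.
  cutadapt -e 1 \
    -g "^file:${barcodes}" \
    -o "${prefix}_{name}_R2.fastq.gz" \
    -p "${prefix}_{name}_R1.fastq.gz" \
    "$r2" "$r1"
}
```

It would be called as `demux_on_r2 barcodes.fasta input.1.fastq.gz input.2.fastq.gz trimmed`. Because the output names are swapped together with the inputs, no renaming should be needed afterwards.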

I wonder how to improve this. I think the first step is to print a warning when trying to demultiplex and no adapters were provided for R1.

@ryandward
Author

Thanks for the clarification. I finally did wind up cheating by swapping the reads, which is totally okay by the way. I think your proposal for a warning makes a lot of sense.

My script will then have to swap the reads back from [2,1] to [1,2] in a downstream process, but it's not at all broken.
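In case it helps anyone else, the swap back can be as simple as exchanging the _R1/_R2 labels of each output pair. A rough sketch, assuming the files were written with the demultiplexed_{name}_R1/_R2 naming from my second command above; the function name `swap_read_labels` is just something I made up:

```shell
# Hypothetical clean-up step: exchange the _R1 and _R2 labels of files that were
# written while the roles of the reads were reversed.
swap_read_labels() {
  dir=$1   # directory containing the demultiplexed files
  for r1 in "$dir"/demultiplexed_*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    [ -e "$r2" ] || continue                 # skip files without a mate
    tmp="${r1}.swap.tmp"
    # Three-way rename through a temporary name so neither file is overwritten.
    mv -- "$r1" "$tmp"
    mv -- "$r2" "$r1"
    mv -- "$tmp" "$r2"
  done
}
```

Running `swap_read_labels .` once after cutadapt finishes relabels every pair in the current directory. Running it a second time would swap them back, so it should only be called once per output directory.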

Cheers.

@marcelm
Owner

marcelm commented Apr 6, 2023

I’ll also see whether I can make it so that you can specify {name2} as a template variable. That would then explicitly declare that you want to use the matches on R2 for demultiplexing. Then no file swapping is necessary.
