Perform CoV Cross-validation (hold-out) #48

rcedgar · 2020-04-24T14:42:42Z

From Artem by email: "[T]he difference in number of reads mapping between CoV+ control libraries and mammal transcriptomes is very large."

I suspect this may be misleading. If I understand correctly, CoV+ datasets have known infections by known coronaviruses, mostly (all?) Cov-19, but we are looking for incidental infections by novel coronaviruses, which by definition are not in the pan-genome. Possibly, the number of viral transcripts will tend to be much lower with an incidental infection. Certainly, a novel virus will have lower identity to the pan-genome, and more uneven coverage of it, than a positive control with a Cov-19 infection.

To model what we might see in production, CoV+ datasets should be mapped to a pan-genome from which the genome of the known infection (Cov-19) has been excluded, along with its genes and close relatives. This is hold-out validation, also called cross-validation.

A hold-out pan-genome can be constructed using usearch as follows:

usearch -usearch_global pan_genome.fa \
-db cov19_genome.fa \
-strand both \
-id 0.95 \
-uc hits.uc \
-notmatched holdout_pan_genome.fa

Here, 0.95 is the identity threshold: all pan-genome sequences with >=95% identity to cov19_genome.fa are removed from the reference. hits.uc is a tab-separated file with one record for each pan-genome sequence, indicating whether it matched or not.
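
As a sanity check on the hold-out step, hits.uc can be summarized to see how many pan-genome sequences were removed and at what identity. The sketch below is not part of the original workflow. It assumes the usual 10-column .uc layout (record type in column 1, with H for a hit and N for no hit; percent identity in column 4; query label in column 9), and the function name summarize_uc is hypothetical.

```python
from collections import Counter


def summarize_uc(lines):
    """Count hit (H) and no-hit (N) records in .uc output.

    Assumes the 10-column tab-separated .uc layout:
    column 1 = record type, column 4 = percent identity (H records),
    column 9 = query label (here, a pan-genome sequence).
    """
    counts = Counter()
    removed = []  # (query label, percent identity) for matched sequences
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        record_type, pct_id, query_label = fields[0], fields[3], fields[8]
        if record_type == "H":
            # Matched the held-out genome, so it is excluded from the reference.
            counts["matched"] += 1
            removed.append((query_label, float(pct_id)))
        elif record_type == "N":
            # No hit at the threshold, so it stays in the hold-out pan-genome.
            counts["unmatched"] += 1
    return counts, removed
```

For example, calling summarize_uc on an open handle to hits.uc returns the two counts plus the list of removed sequences. The "unmatched" count should equal the number of sequences written to holdout_pan_genome.fa.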

ababaian · 2020-04-24T16:20:32Z

This is very well outlined. @victorlin #27 this is a perfect example of the application of the script you're working on.

rcedgar added the Bioinformatics label Apr 24, 2020

rcedgar mentioned this issue Apr 24, 2020

Hard Optimize bowtie2 alignment parameters #35

Closed

ababaian changed the title ~~Testing on Cov+ datasets~~ Perform CoV Cross-validation (hold-out) Apr 24, 2020

rcedgar self-assigned this Apr 27, 2020

rcedgar closed this as completed May 4, 2020
