New workshop data set #42
Thread datacarpentry/wrangling-genomics#111 brings up the concern that
Here is an example: SRA accession SRR2584858. These are 2x101 bp paired-end reads sequenced from E. coli on the HiSeq 2000 platform and published in 2016. With the same reference genome as before, the variant calling workflow produces 4 SNV calls and a single INDEL call from this data set.
Note: I haven't checked this data for poly-N content. That matters because poly-N reads are relevant to the file search and redirection sections in the shell-genomics module.
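For anyone who wants to run that poly-N check, here is a rough sketch in Python. It assumes plain 4-line FASTQ records and treats a run of 10 or more `N` bases as "poly-N"; both the threshold and the helper name `count_poly_n_reads` are my own assumptions, not something taken from the lessons.

```python
import re
from typing import Iterable, Tuple


def count_poly_n_reads(lines: Iterable[str], min_run: int = 10) -> Tuple[int, int]:
    """Return (reads_with_poly_n, total_reads) for 4-line FASTQ records.

    min_run is an assumed threshold: the minimum number of consecutive
    N bases for a read to count as poly-N.
    """
    pattern = re.compile("N{%d,}" % min_run)
    hits = 0
    total = 0
    for i, line in enumerate(lines):
        # In a 4-line FASTQ record, the second line holds the sequence.
        if i % 4 == 1:
            total += 1
            if pattern.search(line.strip()):
                hits += 1
    return hits, total


# Example usage (the file name is hypothetical):
# import gzip
# with gzip.open("SRR2584858_1.fastq.gz", "rt") as handle:
#     hits, total = count_poly_n_reads(handle)
#     print(hits, "of", total, "reads contain a poly-N run")
```

If the count comes back as zero, the data set would probably not work for the exercises that search for runs of N, so it is worth checking both read files before swapping it in.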
Referenced this issue on Apr 16, 2018
The paper is here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4988878/
See this issue for a detailed description of new clean and messy spreadsheets, including a list of all the changes made to make the data messy, which could be added to the instructor notes or to the solution of a challenge.