Data Sets #2043
I appreciate your quick response, thank you.
I am looking for a large data set to performance-test ADAM on the analytics infrastructure that I have built. This setup is going to be used by biotech companies. Can you recommend data sets that I can use? I am thinking of cancer data.
You might consider using one of the standard human genome references made available from the Genome In a Bottle project. I have been using NA12878 for some of my own performance benchmarking of ADAM. You can obtain the whole genome sequencing data from
The 95GB BAM file "NA12878_phased_possorted_bam.bam" contains the sequencing records. Although this isn't "cancer data", it is a whole genome sequencing run of the human genome.