For example, using the Metalign default training database (199807 genomes) and running
python MakeStreamingDNADatabase.py ${trainingFiles} ${outputDir}/${cmashDatabase} -n ${numHashes} -k 60 -v
python MakeStreamingPrefilter.py ${outputDir}/${cmashDatabase} ${outputDir}/${prefilterName} 30-60-10
produces these uncompressed files:
16G Mar 22 03:39 cmash_db_n1000_k60.h5
9.3G Mar 22 08:07 cmash_db_n1000_k60_30-60-10.bf
6.9G Mar 22 04:34 cmash_db_n1000_k60.tst
whereas the gzipped versions are only:
4.6G Mar 22 03:39 cmash_db_n1000_k60.h5.gz
3.6G Mar 22 08:07 cmash_db_n1000_k60_30-60-10.bf.gz
3.6G Mar 22 04:34 cmash_db_n1000_k60.tst.gz
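That is roughly 2-3.5x compression: about 1.9x for the .tst file, 2.6x for the .bf prefilter, and 3.5x for the .h5 database.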
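The ratios can be checked directly from the sizes in the listings. The sketch below only does that arithmetic; the sizes are the rounded `ls -h` values, so the results are approximate.

```python
# File sizes in GB as shown by `ls -h` (uncompressed, gzipped).
sizes = {
    "cmash_db_n1000_k60.h5": (16.0, 4.6),
    "cmash_db_n1000_k60_30-60-10.bf": (9.3, 3.6),
    "cmash_db_n1000_k60.tst": (6.9, 3.6),
}


def compression_ratio(uncompressed: float, compressed: float) -> float:
    """How many times smaller the compressed file is."""
    return uncompressed / compressed


ratios = {name: compression_ratio(u, c) for name, (u, c) in sizes.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.1f}x")

# Combined footprint of all three files.
total_u = sum(u for u, _ in sizes.values())
total_c = sum(c for _, c in sizes.values())
print(f"total: {total_u:.1f}G -> {total_c:.1f}G "
      f"({compression_ratio(total_u, total_c):.1f}x)")
```

Taken together, the three files shrink from about 32.2G to 11.8G, roughly 2.7x overall.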
Would need to either modify MakeStreamingDNADatabase.py and MakeStreamingPrefilter.py to detect compressed training data and decompress it in the script or (better yet)
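The first option, detecting compressed training data and decompressing it inside the scripts, could look roughly like the following. This is a minimal sketch that assumes the scripts read each training file through ordinary Python file objects. `open_maybe_compressed` is a hypothetical helper, not part of CMash, and wiring it into MakeStreamingDNADatabase.py would depend on how that script actually parses its input.

```python
import bz2
import gzip
import os
import tempfile
from typing import TextIO

# Leading "magic" bytes that identify gzip and bzip2 files.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def open_maybe_compressed(path: str) -> TextIO:
    """Hypothetical helper (not part of CMash): open a training file for
    text reading and decompress it on the fly if it is gzip or bzip2.

    The format is sniffed from the first bytes of the file rather than from
    its extension, so a gzipped FASTA without a .gz suffix still works.
    No decompressed copy is written to disk.
    """
    with open(path, "rb") as fh:
        head = fh.read(3)
    if head.startswith(GZIP_MAGIC):
        return gzip.open(path, "rt")
    if head.startswith(BZIP2_MAGIC):
        return bz2.open(path, "rt")
    return open(path, "r")


if __name__ == "__main__":
    # Write the same tiny FASTA record plain, gzipped, and bzip2-compressed,
    # then check that all three read back identically.
    record = ">seq1\nACGTACGT\n"
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "genome.fa")
        gz = os.path.join(tmp, "genome.fa.gz")
        bz = os.path.join(tmp, "genome.fa.bz2")
        with open(plain, "w") as fh:
            fh.write(record)
        with gzip.open(gz, "wt") as fh:
            fh.write(record)
        with bz2.open(bz, "wt") as fh:
            fh.write(record)
        for path in (plain, gz, bz):
            with open_maybe_compressed(path) as fh:
                assert fh.read() == record
        print("all formats read identically")
```

Checking the magic bytes instead of trusting the file extension is the more forgiving choice when a training list mixes compressed and uncompressed genomes. Because decompression happens as the file is read, the uncompressed copies never need to exist on disk.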