HoarfrostLab Datasets Index

A curated collection of datasets published under HoarfrostLab.

Dataset	Link
`GRIMM`	link

To recreate Stratified data:

Download the files from the links in links.txt
Run code/scripts/id_mapping.py to get chunks for Swissprot/Uniref50 id mapping
Run code/scripts/combine_data.py to get the swissprot_full.tsv
Run the appropriate "create_data_splits_{}.py" script to get the desired data splits: either the biological pseudo-k-fold (custom) or the scikit k-fold (scikit). The custom version is not a true k-fold, but it preserves UniRef50 clusters. It must be used so that 5 independent models are trained and then run for inference either individually or as an ensemble.
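
The idea behind a cluster-preserving split can be sketched as follows. This is a minimal illustration only, not the code in create_data_splits_custom.py: the function name `cluster_preserving_splits`, the greedy balancing strategy, and the example ids are all assumptions made for this sketch. It shows the property described above, namely that all sequences from one UniRef50 cluster land in the same split.

```python
from collections import defaultdict


def cluster_preserving_splits(seq_to_cluster, n_splits=5):
    """Assign sequence ids to n_splits groups without splitting any cluster.

    seq_to_cluster maps a sequence id to its cluster id (hypothetical input
    format; the repository's real files may be organised differently).
    """
    # Collect the members of each cluster.
    clusters = defaultdict(list)
    for seq_id, cluster_id in seq_to_cluster.items():
        clusters[cluster_id].append(seq_id)

    splits = [[] for _ in range(n_splits)]
    # Greedy balancing (an assumption of this sketch): place the largest
    # clusters first, each into whichever split is currently smallest.
    ordered = sorted(clusters, key=lambda c: (-len(clusters[c]), c))
    for cluster_id in ordered:
        smallest = min(range(n_splits), key=lambda i: len(splits[i]))
        splits[smallest].extend(sorted(clusters[cluster_id]))
    return splits


if __name__ == "__main__":
    # Made-up ids purely for illustration.
    example = {f"P{i:03d}": f"UniRef50_C{i % 7}" for i in range(30)}
    for index, members in enumerate(cluster_preserving_splits(example), start=1):
        print(f"split_{index}: {len(members)} sequences")
```

Because whole clusters are kept together, the resulting splits are generally unequal in size, which is one reason such a scheme is only a pseudo-k-fold.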

For nucleotides:

Run id_mapping.py again for EMBL_CDS ids
Run add_nuc_to_swissprot.py to combine files into new swissprot
Run download.py to grab the sequences from ENA and write them to FASTA (courtesy: Soumya)
Run create_nuc_splits.py to reformat ids in split_[1-5] into nuc_split_[1-5]
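
The last step, reformatting split ids into their nucleotide counterparts, amounts to a lookup through the id mapping. The sketch below is a hedged illustration and not the contents of create_nuc_splits.py: the function `remap_split_ids`, the dictionary-based mapping, and the placeholder ids are assumptions, and the real script may handle one-to-many mappings or file I/O differently.

```python
def remap_split_ids(split_ids, protein_to_cds):
    """Translate protein ids in one split into nucleotide (EMBL_CDS) ids.

    protein_to_cds is a hypothetical dict built from the id mapping output.
    Ids without a mapping are returned separately instead of being dropped
    silently, so they can be inspected.
    """
    nuc_ids = []
    missing = []
    for protein_id in split_ids:
        cds_id = protein_to_cds.get(protein_id)
        if cds_id is None:
            missing.append(protein_id)
        else:
            nuc_ids.append(cds_id)
    return nuc_ids, missing


if __name__ == "__main__":
    # Placeholder ids, not real accessions.
    mapping = {"PROT_A": "CDS_1", "PROT_B": "CDS_2"}
    nuc_split, unmapped = remap_split_ids(["PROT_A", "PROT_C", "PROT_B"], mapping)
    print(nuc_split)
    print(unmapped)
```

Keeping the order of the input split and reporting unmapped ids makes it easy to check that each nuc_split_[1-5] lines up with its split_[1-5] source.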


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HoarfrostLab Datasets Index

About

Uh oh!

Releases

Packages

Languages

License

Hoarfrost-Lab/GRIMM

Folders and files

Latest commit

History

Repository files navigation

HoarfrostLab Datasets Index

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages