diseasy

Ideas

Compare text vs. semantic similarity
Bake-off various language distance metrics
Try various distance methods too
Compare to random
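
As a starting point for the bake-off, here is a minimal sketch that uses only the Python standard library. The three metrics, the function names, and the `METRICS` table are illustrative assumptions, not the code in this repository. Semantic (embedding-based) methods could be added to the same table as long as they take two strings and return a score.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Token-set overlap: size of intersection over size of union."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine_bow(a, b):
    """Cosine similarity between bag-of-words count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def char_ratio(a, b):
    """Character-level similarity from difflib, between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical registry: name -> function(str, str) -> float
METRICS = {"jaccard": jaccard, "cosine_bow": cosine_bow, "char_ratio": char_ratio}


def bake_off(a, b):
    """Score one pair of descriptions with every registered metric."""
    return {name: fn(a, b) for name, fn in METRICS.items()}
```

Calling `bake_off("small eye", "small eye size")` returns one score per metric, which makes it easy to tabulate methods side by side.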

Can you use text comparison methods to find similarities between human diseases and zebrafish phenotypes?

Or do you need a very custom mapping via ontologies? The original design was to do something like this. It is a good idea to first run a bunch of comparisons and then decide whether something new needs to be developed.

Download a bunch of Python-based text comparison libraries
Start some simple bake-offs
Figure out how to do the random model
Which are our gold standards?

Q: What does failure look like? A: Random is indistinguishable from real diseases

Q: What does success look like? A: Gold standards are found
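
One way to make the failure and success criteria concrete is a permutation test: score each disease against its gold-standard phenotype, then repeatedly shuffle the phenotypes and count how often the random pairing scores as well as the real one. The sketch below is only an assumption about how the random model could work. The function names are hypothetical, and token Jaccard stands in for whichever metric is being tested.

```python
import random
import re


def jaccard(a, b):
    """Placeholder metric: token-set overlap between two strings."""
    sa = set(re.findall(r"[a-z0-9]+", a.lower()))
    sb = set(re.findall(r"[a-z0-9]+", b.lower()))
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0


def mean_score(pairs, score):
    """Average score over a list of (disease, phenotype) string pairs."""
    return sum(score(a, b) for a, b in pairs) / len(pairs)


def permutation_test(pairs, score=jaccard, trials=1000, seed=0):
    """Return (real_mean, p).

    p is the fraction of shuffled pairings whose mean score is at least
    the real mean, with a +1 correction so p is never exactly zero.
    A large p means random is indistinguishable from real.
    """
    rng = random.Random(seed)
    left = [a for a, _ in pairs]
    right = [b for _, b in pairs]
    real = mean_score(pairs, score)
    hits = 0
    for _ in range(trials):
        shuffled = right[:]
        rng.shuffle(shuffled)
        if mean_score(list(zip(left, shuffled)), score) >= real:
            hits += 1
    return real, (hits + 1) / (trials + 1)
```

With real gold-standard pairs, a small `p` would count as success for that metric, and a `p` near 1 would be the failure case described above.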

Clustering

Compare human diseases vs. human diseases
Compare zf phenotypes vs. zf phenotypes
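
A within-set comparison amounts to building an all-vs-all similarity matrix and grouping items that score above some cutoff. The following is a rough sketch under that assumption, not this repository's actual clustering code. The function names and the threshold are hypothetical, and token Jaccard is a placeholder for any text or semantic metric.

```python
import re


def jaccard(a, b):
    """Placeholder metric: token-set overlap between two strings."""
    sa = set(re.findall(r"[a-z0-9]+", a.lower()))
    sb = set(re.findall(r"[a-z0-9]+", b.lower()))
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0


def similarity_matrix(items, score=jaccard):
    """Symmetric all-vs-all matrix; the diagonal is set to 1.0."""
    n = len(items)
    mat = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mat[i][j] = mat[j][i] = score(items[i], items[j])
    return mat


def single_linkage(mat, threshold):
    """Group indices connected by a chain of pairs scoring >= threshold."""
    parent = list(range(len(mat)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(mat)):
        for j in range(i + 1, len(mat)):
            if mat[i][j] >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(mat)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

The same two functions apply to either set: pass a list of human disease descriptions for the human vs. human comparison, or a list of zebrafish phenotype descriptions for the zf vs. zf comparison.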

textcompare1.py

Install conda

pip3 install nltk scikit-learn transformers torch fasttext

curl https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz --output cc.en.300.bin.gz
gunzip cc.en.300.bin.gz

textcompare2.py

pip3 install -U sentence-transformers

works

textcompare3.py

pip3 install tensorflow tensorflow_hub

works

