HiG2Vec: hierarchical representations of Gene Ontology and genes in the Poincaré ball

Bioinformatics 2021 [Paper]

Installation

Clone this repository, create the conda environment, and build the Cython extension:

$ git clone https://github.com/JaesikKim/HiG2Vec.git
$ cd HiG2Vec
$ conda env create -n hig2vec -f environment.yml
$ conda activate hig2vec
$ python setup.py build_ext --inplace 

Corpus

Gene Ontology (GO) and Gene Ontology Annotation files are available from the official Gene Ontology website (http://geneontology.org/).

Preprocessing

Transitive closure of GO

$ python data/transitive_closure.py -dset data/GO.tsv
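For readers unfamiliar with the term, the transitive closure of a hierarchy adds every indirect ancestor relation implied by the direct edges. The following is a minimal sketch of that idea, not the repository's `transitive_closure.py`; the function name and the (child, parent) edge layout are assumptions made for illustration, and the actual column layout of `data/GO.tsv` is not specified here.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return every (descendant, ancestor) pair implied by direct (child, parent) edges."""
    # Assumed input: an iterable of (child, parent) pairs forming a hierarchy.
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    closure = set()
    for start in list(parents):
        # Iterative depth-first walk up the hierarchy from each term.
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure


# Toy hierarchy with hypothetical term names: C -> B -> A and D -> B.
edges = [("C", "B"), ("B", "A"), ("D", "B")]
closure = transitive_closure(edges)
for descendant, ancestor in sorted(closure):
    print(descendant, ancestor)
```

In the toy hierarchy, the closure contains the three direct edges plus the two implied pairs (C, A) and (D, A).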

Train

$ ./run_embedding.sh

Evaluation

$ python evalGO/link_prediction.py -dset evalGO/GO_samples.txt -model result/hig2vec.pth -distfn poincare
$ python evalGO/reconstruction.py -model result/hig2vec.pth -eval data/GO_closure.tsv -distfn poincare
$ python evalGO/level_prediction.py -dset evalGO/level_samples.txt -model result/hig2vec.pth -fout evalGO/level_output.txt 
$ python evalGene/binary_prediction_NN.py -dset evalGene/STRING_samples_binary.csv -model result/hig2vec.pth -fout evalGene/binary_output.txt
$ python evalGene/multilabel_prediction_NN.py -dset evalGene/STRING_samples_multilabel.csv -model result/hig2vec.pth
$ python evalGene/score_prediction_NN.py -dset evalGene/STRING_samples_score.csv -model result/hig2vec.pth -fout evalGene/score_output.txt

GO and gene embeddings

Pre-trained HiG2Vec embeddings in 200 and 1000 dimensions (GOonly, Human, Mouse, and Yeast) are available at the [Download Link].

Python code for usage

import torch

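# Load the downloaded checkpoint on the CPU; it is a dict with the two keys used below.
model = torch.load("HiG2Vec.pth", map_location="cpu")
# 'objects' lists the embedded GO terms and genes; 'embeddings' holds their vectors.
objects, embeddings = model['objects'], model['embeddings']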
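Once loaded, the embeddings can be compared with the Poincaré distance, the metric selected by `-distfn poincare` in the evaluation commands. The following is a minimal sketch in plain Python rather than the repository's own distance code. It assumes each vector lies strictly inside the unit ball and that `objects` and `embeddings` share the same ordering; the identifiers and coordinates below are hypothetical stand-ins for the loaded data.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincaré ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Clamp the denominator away from zero for points near the boundary.
    denom = max(1.0 - sq_norm_u, eps) * max(1.0 - sq_norm_v, eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


# Toy stand-ins for model['objects'] and model['embeddings'] (hypothetical values).
objects = ["GO:0000001", "GO:0000002"]
embeddings = [[0.0, 0.0], [0.5, 0.0]]

# Map each identifier to its row so vectors can be looked up by name.
index = {name: i for i, name in enumerate(objects)}
d = poincare_distance(embeddings[index["GO:0000001"]],
                      embeddings[index["GO:0000002"]])
print(d)
```

With the real checkpoint, a row of the `embeddings` tensor can be converted with `.tolist()` before being passed to this function.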

Dependencies

  • python 3 with numpy
  • pytorch >= 2.0.0
  • scikit-learn >= 1.2.1
  • pandas
  • tqdm
  • cython >= 0.29.33

Citation

@article{10.1093/bioinformatics/btab193, 
  author = {Kim, Jaesik and Kim, Dokyoon and Sohn, Kyung-Ah},
  title = "{HiG2Vec: hierarchical representations of Gene Ontology and genes in the Poincaré ball}",
  journal = {Bioinformatics},
  year = {2021}
}

License

The software code is released under the MIT license, and the pre-trained HiG2Vec embeddings (GOonly, Human, Mouse, and Yeast) are released under the CC0 license.
