Gene2vec: Neural word embeddings of genetic data

Overview

Gene2vec adapts the Word2vec model to amino acid sequence data, aiming to learn quasi-syntactic and semantic relationships from it. Word2vec extends the continuous Skip-gram model and allows for precise vector representations of semantic and syntactic word relationships. These representations are also additively compositional, so vector arithmetic can be performed on words. Mikolov et al. illustrate this by noting that the vector resulting from ("Madrid" - "Spain" + "France") is closer to the vector for "Paris" than to that of any other word.
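The vector arithmetic described above can be sketched with a handful of hand-picked vectors. This is only a toy illustration: the 2-D values in `toy_vectors` and the helper names `cosine` and `analogy` are assumptions made up for this example, whereas real Word2vec embeddings are learned from a corpus and typically have hundreds of dimensions.

```python
import math

# Toy 2-D vectors chosen by hand for illustration only (not learned embeddings).
# The first coordinate loosely encodes "which country", the second "capital vs. country".
toy_vectors = {
    "Madrid": [1.0, 3.0],
    "Spain": [1.0, 1.0],
    "Paris": [3.0, 3.0],
    "France": [3.0, 1.0],
    "Berlin": [5.0, 3.0],
    "Germany": [5.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("Madrid", "Spain", "France", toy_vectors))  # prints "Paris"
```

Excluding the three query words from the candidates follows the usual convention for analogy queries; without it, one of the inputs is often its own nearest neighbour.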

We demonstrate that such relationships can be constructed from amino acid sequences by using them to perform rudimentary protein classification.
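Training a Skip-gram model on amino acid sequences first requires deciding what counts as a "word" and what counts as its context. This README does not say how Gene2vec does this, so the sketch below rests on an assumption: it splits a sequence into overlapping k-mers and pairs each k-mer with its neighbours inside a fixed window. The function names `kmers` and `skipgram_pairs`, the choices `k=3` and `window=1`, and the example sequence are all hypothetical.

```python
def kmers(sequence, k=3):
    """Split a sequence into overlapping k-mers (an assumed tokenization)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def skipgram_pairs(tokens, window=2):
    """Build (center, context) pairs of the kind a Skip-gram model trains on."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Arbitrary string of amino acid letters, used only as an example input.
sequence = "MKTAYIAK"
tokens = kmers(sequence, k=3)
pairs = skipgram_pairs(tokens, window=1)

print(tokens)     # ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK']
print(pairs[:3])  # [('MKT', 'KTA'), ('KTA', 'MKT'), ('KTA', 'TAY')]
```

Pairs like these would then be fed to a Skip-gram trainer. Non-overlapping k-mers or single residues are equally plausible tokenizations, and the choice affects vocabulary size and how much context each token sees.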

See the report for more info.
