This repository has been archived by the owner on Jan 2, 2019. It is now read-only.

Replace Cortical fingerprints with non-proprietary word vectors #9

Open
ahirner opened this issue Feb 21, 2016 · 0 comments

ahirner commented Feb 21, 2016

The essence of the Cortical API is just mapping words into fixed-length sparse bit vectors. You can get the same functionality with dense vectors. The best-known algorithms are implemented, for example, in Gensim, which allows for subsequent:

  • clustering to discover common types of documents
  • (approximate) nearest neighbor search to form recommendations for similar tables
  • semantic search (e.g. "doctors +

The main "secret sauce" is efficient matrix decomposition of term frequencies around the word in focus (original paper by Mikolov et al., 2013; good explanation on Quora).

Many pre-trained word vectors exist for different corpora (Wikipedia, news articles, etc.). It is therefore feasible to load such a dictionary once, serve it from our own server, and avoid the dependency on Cortical. This includes basic operations such as averaging over a bag of words.
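
As a rough illustration of the bag-of-words averaging and similarity lookup mentioned above, here is a minimal sketch in plain Python. The tiny `EMBEDDINGS` dictionary, the `TABLES` data and all function names are hypothetical stand-ins; in practice the vectors would be loaded once from a pre-trained file rather than written by hand, and an approximate nearest-neighbor index would likely replace the brute-force loop.

```python
import math

# Hypothetical 3-dimensional toy vectors; real pre-trained vectors
# typically have a few hundred dimensions and are loaded from disk.
EMBEDDINGS = {
    "doctor":   [0.9, 0.1, 0.0],
    "nurse":    [0.8, 0.2, 0.1],
    "hospital": [0.7, 0.3, 0.0],
    "budget":   [0.0, 0.9, 0.3],
    "tax":      [0.1, 0.8, 0.4],
}


def average_vector(words, embeddings):
    """Average the vectors of all known words; return None if none are known."""
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_words, documents, embeddings):
    """Rank document names by cosine similarity between averaged vectors."""
    query = average_vector(query_words, embeddings)
    if query is None:
        return []
    scored = []
    for name, words in documents.items():
        doc_vec = average_vector(words, embeddings)
        if doc_vec is not None:
            scored.append((cosine(query, doc_vec), name))
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical "tables", each described by a bag of words.
TABLES = {
    "health_staff": ["doctor", "nurse", "hospital"],
    "public_finance": ["budget", "tax"],
}

ranking = most_similar(["doctor"], TABLES, EMBEDDINGS)
print(ranking)
```

Words missing from the dictionary are simply skipped here; whether that is acceptable depends on how much of the vocabulary the chosen pre-trained vectors cover.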

A ready-made server implementation is available from 3Top: https://github.com/3Top/word2vec-api
If we need more sophisticated NLP with syntactic parsing, e.g. to disambiguate words depending on their context, we can extend the API with this library.

ahirner self-assigned this Feb 21, 2016
ahirner assigned ahirner and unassigned ahirner Mar 16, 2016
ahirner added this to the Architecture Freeze milestone Mar 16, 2016