Check and explore other document representations and distance metric #9

@amartyaamp

Our current representation uses averaged Word2Vec. Other word or document embeddings may improve the search results, for example:

  • Fasttext (averaged)
  • MOE (averaged)
  • StarSpace (averaged)
  • GloVe (averaged)
  • Universal Sentence Encoder
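
For reference, the "averaged" representation for any of the word-level models above can be sketched as below. This is a minimal illustration, not the project's actual code: the `average_embedding` helper and the toy 3-dimensional vectors are made up here, and a real run would load the vectors from a trained model instead.

```python
from typing import Dict, List, Sequence


def average_embedding(
    tokens: Sequence[str], vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Return the mean of the vectors of all in-vocabulary tokens.

    Out-of-vocabulary tokens are skipped; if no token is known,
    a zero vector of length `dim` is returned.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy vectors, invented for illustration only (hypothetical values).
TOY_VECTORS = {
    "search": [1.0, 0.0, 2.0],
    "engine": [3.0, 2.0, 0.0],
}

# "unknownword" is not in the vocabulary, so it is ignored.
doc_vec = average_embedding(["search", "engine", "unknownword"], TOY_VECTORS, dim=3)
print(doc_vec)
```

Swapping the embedding model then only changes where `vectors` comes from, which should keep the comparison between models fair.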

Along with these, we need to look at different distance metrics that can be used:

  • Manhattan
  • WMD-relax (relaxed Word Mover's Distance; pure WMD is too slow)
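
To make the two metrics concrete, here is a rough pure-Python sketch. Manhattan distance works on the averaged document vectors, while the relaxed WMD works on the token lists directly. The relaxed variant below follows the usual lower-bound idea (each word sends all of its weight to its nearest word in the other document, and the larger of the two directions is taken). It does not use the API of any particular WMD package; the function names and toy vectors are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance, used as the word-to-word travel cost."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _one_sided(
    src: Sequence[str], dst: Sequence[str], vectors: Dict[str, List[float]]
) -> float:
    """Cost of moving every word in `src` to its nearest word in `dst`."""
    src_known = [t for t in src if t in vectors]
    dst_known = {t for t in dst if t in vectors}
    if not src_known or not dst_known:
        return math.inf  # nothing to compare against
    total = len(src_known)
    return sum(
        (count / total) * min(euclidean(vectors[w], vectors[u]) for u in dst_known)
        for w, count in Counter(src_known).items()
    )


def relaxed_wmd(
    doc_a: Sequence[str], doc_b: Sequence[str], vectors: Dict[str, List[float]]
) -> float:
    """Relaxed WMD: the tighter (larger) of the two one-sided lower bounds."""
    return max(_one_sided(doc_a, doc_b, vectors), _one_sided(doc_b, doc_a, vectors))


# Toy 2-dimensional vectors, invented for illustration only.
TOY = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [0.0, 1.0]}

print(manhattan([1.0, 2.0, 3.0], [2.0, 0.0, 3.0]))
print(relaxed_wmd(["a"], ["b"], TOY))
```

The nested minimum makes this quadratic in the number of distinct words per document pair, so a vectorised or library implementation would be needed for realistic latency numbers.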

We need to set a threshold by comparing each, or at least most, of them with the current approach.
Accuracy, latency, and training speed are the main concerns here.
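
One possible way to run that comparison is a small harness that takes any search function and reports a hit rate and mean query latency over a labelled query set. This is only a sketch under assumptions: the `evaluate` function, the hit-rate-at-k measure, and the toy data are hypothetical and not part of the current code base. Training time is not covered here and would have to be timed separately.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple


def evaluate(
    search: Callable[[str], List[str]],
    labelled_queries: Sequence[Tuple[str, str]],
    k: int = 5,
) -> Dict[str, float]:
    """Measure hit rate at k and mean latency of `search`.

    `labelled_queries` holds (query, expected_document_id) pairs.
    `search` returns document ids ranked best first.
    """
    if not labelled_queries:
        raise ValueError("need at least one labelled query")
    hits = 0
    elapsed = 0.0
    for query, expected_id in labelled_queries:
        start = time.perf_counter()
        results = search(query)
        elapsed += time.perf_counter() - start
        if expected_id in results[:k]:
            hits += 1
    n = len(labelled_queries)
    return {"hit_rate_at_k": hits / n, "mean_latency_s": elapsed / n}


# Toy stand-in for a real search backend (hypothetical data).
def toy_search(query: str) -> List[str]:
    index = {"reset password": ["doc1", "doc7"], "refund": ["doc3"]}
    return index.get(query, [])


report = evaluate(
    toy_search,
    [("reset password", "doc7"), ("refund", "doc9")],
    k=2,
)
print(report)
```

Running the same harness once per embedding and metric combination gives directly comparable numbers against the averaged Word2Vec baseline.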

Metadata


Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
