Added new gensim node embedder and refactored similarity to support different backends #91

Merged
eugeniashurko merged 8 commits into master from feature/gensim-embedder-similarity-extension
Sep 28, 2021

@eugeniashurko eugeniashurko commented Sep 13, 2021

New features and major updates:

  • Added the GensimNodeEmbedder interface for wrapping gensim embedding models (in particular, added an implementation of Poincaré embedding).

  • Refactored SimilarityProcessor to support multiple backends (through dependency injection), in particular:

    • Added the SimilarityIndex interface for backend-specific similarity indices
    • Added the FaissSimilarityIndex interface for the faiss similarity index
    • Added the ScikitLearnSimilarityIndex interface for the sklearn similarity index based on sklearn.neighbors.KDTree and sklearn.neighbors.BallTree
    • Added a new similarity metric to ScikitLearnSimilarityIndex, based on sklearn.neighbors.BallTree, corresponding to the Poincaré distance
    • Changed the SimilarityProcessor API (it now takes a SimilarityIndex as input)
  • Moved EmbeddingPipeline into bluegraph.downstream.pipelines
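
To illustrate the dependency-injection design described above, here is a minimal, dependency-free sketch. It is not BlueGraph's actual API: the method names `add`, `search` and `get_neighbors`, the `BruteForceIndex` class, and the constructor arguments are all assumptions made for illustration, and a brute-force scan stands in for the `sklearn.neighbors.BallTree` backend mentioned in the list. The only part taken as given is the standard Poincaré distance formula, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))).

```python
import math
from abc import ABC, abstractmethod


def poincare_distance(u, v):
    """Poincaré distance between two points strictly inside the unit ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


class SimilarityIndex(ABC):
    """Hypothetical backend-agnostic interface (method names are assumptions)."""

    @abstractmethod
    def add(self, vectors):
        """Store a list of vectors in the index."""

    @abstractmethod
    def search(self, query, k):
        """Return (distances, positions) of the k nearest stored vectors."""


class BruteForceIndex(SimilarityIndex):
    """Illustrative stand-in backend: linear scan with a pluggable distance."""

    def __init__(self, distance):
        self.distance = distance
        self.vectors = []

    def add(self, vectors):
        self.vectors.extend(vectors)

    def search(self, query, k):
        # Score every stored vector, then keep the k closest.
        scored = sorted(
            (self.distance(query, v), i) for i, v in enumerate(self.vectors))
        top = scored[:k]
        return [d for d, _ in top], [i for _, i in top]


class SimilarityProcessor:
    """Hypothetical processor: the index is injected, not built internally."""

    def __init__(self, index, point_ids):
        self.index = index
        self.point_ids = list(point_ids)

    def get_neighbors(self, vector, k=1):
        distances, positions = self.index.search(vector, k)
        return [(self.point_ids[p], d) for p, d in zip(positions, distances)]


# Usage: the processor only depends on the SimilarityIndex interface.
index = BruteForceIndex(poincare_distance)
index.add([[0.0, 0.0], [0.5, 0.0], [0.0, 0.9]])
processor = SimilarityProcessor(index, ["root", "a", "b"])
neighbors = processor.get_neighbors([0.1, 0.0], k=2)
```

Because the processor never names a concrete backend, swapping the brute-force scan for a faiss- or sklearn-backed index would only require another `SimilarityIndex` subclass.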

Minor fixes and updates:

  • Updated the Neo4j backend (gds.beta.shortestPath.yens.stream becomes gds.shortestPath.yens.stream in newer versions of the GDS plugin)
  • Bugfixes in core
  • Updated docs
  • Added a tutorial notebook with embedding pipelines
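
The Neo4j procedure rename mentioned in the list above can be handled with a small version check. The sketch below is an assumption about how one might do it, not the code in this PR: the function name `yens_stream_procedure` and its `stable_since` parameter are hypothetical, and the cutoff version must be supplied by the caller (the value in the usage lines is a placeholder, not a confirmed GDS release).

```python
def yens_stream_procedure(gds_version, stable_since):
    """Pick the Yen's k-shortest-paths procedure name for a GDS version.

    gds_version: version string such as "1.4.1".
    stable_since: (major, minor) tuple from which the non-beta name exists;
        this cutoff is an assumption supplied by the caller.
    """
    installed = tuple(int(part) for part in gds_version.split(".")[:2])
    if installed >= stable_since:
        return "gds.shortestPath.yens.stream"
    return "gds.beta.shortestPath.yens.stream"


# Usage with a placeholder cutoff of (1, 5).
old_name = yens_stream_procedure("1.4.1", stable_since=(1, 5))
new_name = yens_stream_procedure("1.6.0", stable_since=(1, 5))
```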

@eugeniashurko eugeniashurko requested a review from MFSY September 13, 2021 15:38

codecov-commenter commented Sep 13, 2021

Codecov Report

Merging #91 (c835e71) into master (00bb7b3) will increase coverage by 0.01%.
The diff coverage is 87.64%.

@@            Coverage Diff             @@
##           master      #91      +/-   ##
==========================================
+ Coverage   81.25%   81.26%   +0.01%     
==========================================
  Files          44       47       +3     
  Lines        3723     3886     +163     
==========================================
+ Hits         3025     3158     +133     
- Misses        698      728      +30     

| Impacted Files | Coverage Δ |
|---|---|
| bluegraph/backends/neo4j/analyse/paths.py | 80.55% <ø> (ø) |
| bluegraph/backends/neo4j/io.py | 91.53% <ø> (ø) |
| bluegraph/downstream/data_structures.py | 78.37% <ø> (-7.74%) ⬇️ |
| bluegraph/core/io.py | 64.02% <33.33%> (-0.11%) ⬇️ |
| bluegraph/backends/gensim/embed/embedders.py | 84.09% <84.09%> (ø) |
| bluegraph/downstream/similarity.py | 87.73% <87.95%> (-4.42%) ⬇️ |
| bluegraph/downstream/pipelines.py | 88.78% <88.78%> (ø) |
| bluegraph/backends/gensim/__init__.py | 100.00% <100.00%> (ø) |
| bluegraph/backends/neo4j/embed/embedders.py | 90.99% <100.00%> (+0.16%) ⬆️ |
| bluegraph/backends/params.py | 100.00% <100.00%> (ø) |
| ... and 6 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 00bb7b3...c835e71.

@eugeniashurko eugeniashurko merged commit fece005 into master Sep 28, 2021
@eugeniashurko eugeniashurko deleted the feature/gensim-embedder-similarity-extension branch November 10, 2021 13:15