This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Block Filtering and Block Purging after Vector Based Blocking #10
Comments
Hello, the proper way to use Vector Based Blocking is presented here: https://pyjedai.readthedocs.io/en/latest/tutorials/pyTorchWorkflow.html
Vector Based Blocking generates a dictionary of ids that correspond to candidate matches. Therefore, at the end of vector-based blocking, you'll get either this dictionary or a graph similar to the one produced by entity matching. FAISS also returns distance/similarity scores, avoiding the need for an additional entity matching step. Check out the tutorial, and if you have any questions, I'm happy to help.
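To make the "dictionary of ids" concrete, here is a minimal plain-Python sketch of how such a mapping could be flattened into candidate pairs. The ids and the `candidate_pairs` helper are made up for illustration; this is not pyJedAI's API or its actual output.

```python
# Hypothetical output shape: entity id -> set of candidate neighbour ids.
# The values below are invented for illustration, not produced by pyJedAI.
candidates = {0: {3, 4}, 1: {3}, 2: {4, 5}}


def candidate_pairs(candidates):
    """Flatten an id -> set-of-ids mapping into sorted, de-duplicated pairs."""
    pairs = set()
    for source_id, neighbour_ids in candidates.items():
        for neighbour_id in neighbour_ids:
            if source_id == neighbour_id:
                continue  # skip self-matches
            # Order each pair so (a, b) and (b, a) collapse into one entry.
            pairs.add((min(source_id, neighbour_id), max(source_id, neighbour_id)))
    return sorted(pairs)


pairs = candidate_pairs(candidates)
```

Each pair is one candidate match, which is why no separate block-level step is needed before scoring or clustering.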
Hi Nikoletos,

Code:
from pyjedai.vector_based_blocking import EmbeddingsNNBlockBuilding
blocks, g = emb.build_blocks(data,
from pyjedai.clustering import ConnectedComponentsClustering, UniqueMappingClustering

Results:
Method name: Embeddings-NN Block Building
Method name: Unique Mapping Clustering
What I suggest you do is start experimenting with:
and then with the clustering:
or you can even check the Optuna tutorial here: https://pyjedai.readthedocs.io/en/latest/tutorials/Optuna.html
Hi, I have tried vector based blocking with sentence transformers and FAISS, and the resulting blocks are a dict mapping indices to sets of indices.
I can't proceed with Block Filtering and Block Purging on this output.
It does not support cardinality, presumably because other methods such as QGramsBlocking return a dict of {'key': datamodel.Block} items.
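A rough sketch of the mismatch I mean, in plain Python. `ToyBlock` and `total_cardinality` are hypothetical stand-ins written for this example, not pyJedAI's `datamodel.Block` or its purging code; the point is only that a step expecting block objects fails on plain sets.

```python
class ToyBlock:
    """Hypothetical stand-in for a block object; NOT pyJedAI's datamodel.Block."""

    def __init__(self, entity_ids):
        self.entity_ids = set(entity_ids)

    def cardinality(self):
        # Number of pairwise comparisons inside one block (single-dataset case).
        n = len(self.entity_ids)
        return n * (n - 1) // 2


def total_cardinality(blocks):
    """Sum comparisons over a {key: block} mapping.

    Assumes every value exposes a .cardinality() method.
    """
    return sum(block.cardinality() for block in blocks.values())


# Token-style blocks: key -> block object.
token_blocks = {"smith": ToyBlock([0, 1, 2]), "john": ToyBlock([1, 3])}
# Vector-style blocks: id -> plain set of ids (invented values).
vector_blocks = {0: {3, 4}, 1: {3}}

token_total = total_cardinality(token_blocks)

try:
    total_cardinality(vector_blocks)
    vector_error = None
except AttributeError as exc:
    # A plain set has no .cardinality(), so the same step cannot run.
    vector_error = exc
```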