Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Vespa Node's and indexing with additional fields #13356

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

gokturkDev
Copy link

@gokturkDev gokturkDev commented May 8, 2024

Description

This PR adds support for indexing nodes to vespa with custom fields other than text and metadata.
Issue #13319

Motivation

A schema in Vespa has many fields that can be searched from and referenced in ranking profiles. This PR enables users to define the values for their vespa fields with a VespaNode.

Type of Change

  • [✓] New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • [ ✓] Added new unit/integration tests

Suggested Checklist:

  • [✓ ] I have performed a self-review of my own code
  • [ ✓] I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • [ ✓] My changes generate no new warnings
  • [ ✓] I have added tests that prove my fix is effective or that my feature works
  • [ ✓] New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

Notes

Development decisions

  • Tests now try to get the local running vespa instance rather than deploying each time when tests run. This is done to make TDD possible. A caveat is to check the locally running vespa instances to avoid making accidental changes to other instances
  • Nodes are added and deleted before and after each test. This is to avoid conflicts between tests and versioning issues
  • When adding a node, an isinstance check checks for whether the passed in nodes are Vespa and if so add the vespa_fields to the fields in entry
  • When querying, the fields of the hits are checked for length. If its bigger than 5, its assumed that custom fields have been added and its treated as a vespa node. This check could be made safer

Future Considerations

Note to maintainers

Please don't merge this before @thomasht86 reviews and confirms the changes

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label May 8, 2024
Copy link
Contributor

@thomasht86 thomasht86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! 💯
Could you perhaps add a small section to the VespaIndexDemo.ipynb, demonstrating that one can obtain node.vespa_fields as well?

node = metadata_dict_to_node(metadata)
text = response_fields.get("body", "")
node.set_content(text)
node = self._vespa_hit_to_node(hit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a nice refactor! 🥇

Comment on lines +26 to +30
try:
# Try getting the local instance if available
return VespaVectorStore(url="http://localhost", application_package=app_package, deployment_target="local")
except ConnectionError:
return VespaVectorStore(application_package=app_package, deployment_target="local")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better, as this allows for faster development, as you said 👍

Comment on lines +98 to +100
# I am not sure what to name this, so I decided a name that highlights the difference with hybrid_template
# This needs to be renamed to something more meaningful
with_fields_template = ApplicationPackage(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good for now, and we can improve template naming once we get a better feel for use cases.

@gokturkDev
Copy link
Author

Hi great you liked it! I am overwhelmed with work on my side rn so can't say for sure but will try if I get some free time on my hands 🤟



class VespaNode(TextNode):
vespa_fields: dict
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we need a new object class for this? I wonder if this could just be a metadata field?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants