Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

starter step that's explicitly using index #1453

Closed
wants to merge 1 commit into from

Conversation

mpollmeier
Copy link
Contributor

No description provided.

@mpollmeier mpollmeier requested a review from ml86 October 11, 2021 13:46
/**
* Shorthand for `cpg.method.fullNameExact(fullName)`, but first looks up the fullname in the index.
* */
def methodFullNameExact(value: String): Traversal[Method] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.... then we'd have to introduce this for each field. Any specific reason why we're doing this? I'd rather have it consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

We wouldn't need it for all fields, but only those which are indexed. The only indexed property is FULL_NAME and it's used on a few node types (method, namespaceBlock, typeDecl, tpe).

So yes, for consistency we should have it for all of the above, and it would probably make sense to define the index as part of the schema, and generate the NodeTypeStarters as part of the CodeGen.

The implementation will likely look a bit different with ShiftLeftSecurity/overflowdb#222, but the user facing API will probably be quite similar.

Let me know your thoughts - I'll digest the idea and might make those changes to schema/codegen later this week.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go a different route:

cpg.method should return an object that knows about indices, instead of a stupid iterator; e.g. it could implement some new trait like HasFastIndices.

Then, traversal filters (like e.g. .fullName on a traversal of methods) should check whether their parent traversal isInstanceOf[HasFastIndices] and if so, ask it whether the relevant key ("FULL_NAME") is indexed.

There is no need to change any API with that approach.

The entire dance is done eagerly and once per traversal construction, just like the regex dance (.name(string) checks whether the string is equivalent to non-regex; if so, it uses nameExact logic, otherwise it eagerly precompiles the regex and allocates a matcher that can be reused, such that later traversal filtering is allocation-free).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes agreed, that's a good model - it's what you mentioned in the standup as well.

Two ideas for optimisations:
Prerequisite: the schema contains the information about which properties are indexed, and we generate the NodeTypeStarters - both would make sense either way IMO.

  1. Then we can do that 'dance' only for those (few) indexed properties. This won't make a big different for runtime performance, but the implementation might end up easier to read.

  2. Given that we decide if we want to use an index dynamically at runtime anyway, we might as well compare the cardinalities. I.e. if we have more METHOD nodes than nodes with FULL_NAME properties, it would still make sense to start with the METHODs.

Note that (for historic reasons, this goes back all the way to cpg.json), we define properties separate from nodes. And for other historic reasons (tinkerpop), indices are created on a per-property basis, i.e. not on a (nodeType, property) basis.
I guess it would make sense to change at least the latter. We could do so in the current version of odb already, to make the diff to the new layout smaller. Thoughts?

@mpollmeier
Copy link
Contributor Author

superseded

@mpollmeier mpollmeier closed this Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants