unique properties / indexing #45
Comments
It would indeed be very useful to enable the creation of constraints / indexes. This could help to accelerate the creation of edges between nodes that were added earlier, which slows down a lot when there are a lot of nodes with the same label:
See comment below for an update. Original comment content: You can create a unique index, which will prevent the creation of duplicate values but will not speed up matching, by creating an immutable function that casts to json over text
and using that immutable function in the create index command
|
Here are some links, just for reference: the list of Cypher Improvement Proposals (CIP). As for indexes and constraints in openCypher, there are already some PRs for their addition, but I think they are "blocked" by another CIP. CIP: PR: |
I tried using id() instead of using properties for matching, but this was even slower (I noticed using an additional For this, I rewrote
to
where I obtained the ids (the ones above are made up) using
|
There is a (1000 times faster) workaround for the creation of edges between vertices that were added earlier. The workaround consists of inserting the data directly into the underlying tables that are used by Apache AGE. See comment below for an update. Original comment content: First, the underlying ids of the vertices are retrieved:
A single edge is then created using the cypher function to make sure the underlying tables are created correctly (the id values are made up)
Indices on start_id and end_id are then created in the underlying table for the edges of a certain type, to speed up the inserts in the
All other edges are created by direct insertion, first in the specific edge table (using executemany):
Where Secondly, the edges are also directly inserted into the
Where Any thoughts on this workaround? |
When the edges are inserted in the specific edge table, they are automatically added to the First, the underlying ids of the vertices are retrieved:
A single edge is then created using the cypher function to make sure the underlying tables are created correctly (the id values are made up)
All other edges are created by direct insertion in the specific edge table (using executemany):
Where |
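The direct-insertion step described above could look roughly like the following. This is only a sketch under assumptions, not the original code from the comment (which is not preserved here): the helper names `edge_insert_sql`, `edge_rows` and `insert_edges` are hypothetical, the graph and label names and the id values are made up, and it assumes a DB-API cursor (such as psycopg2's) whose `executemany` uses `%s` placeholders. It also assumes the `id` column of the edge table is filled in by the table itself, so only `start_id`, `end_id` and `properties` are supplied.

```python
import json
import re

# Only plain identifiers are accepted, since the names are pasted into SQL.
_SIMPLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def edge_insert_sql(graph: str, edge_label: str) -> str:
    """Build the parameterized INSERT for one edge table (schema = graph, table = label)."""
    for name in (graph, edge_label):
        if not _SIMPLE_NAME.match(name):
            raise ValueError(f"unsupported name: {name!r}")
    return (
        f'INSERT INTO "{graph}"."{edge_label}" '
        "(start_id, end_id, properties) VALUES (%s, %s, %s)"
    )


def edge_rows(edges):
    """Turn (start_id, end_id, properties_dict) triples into text parameters.

    Assumption: ids and properties are sent as text and coerced by PostgreSQL
    to the column types of the edge table.
    """
    return [(str(start), str(end), json.dumps(props)) for start, end, props in edges]


def insert_edges(cursor, graph: str, edge_label: str, edges) -> None:
    """Insert all edges with a single executemany call on a DB-API cursor."""
    cursor.executemany(edge_insert_sql(graph, edge_label), edge_rows(edges))
```

Because this writes to AGE's internal tables directly, it bypasses whatever checks the cypher function performs, so it seems sensible to verify the result with a MATCH query afterwards.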
Hello, a graph name is a schema and a label name is a table. Id and properties are columns in the vertex tables. Id, start_id, end_id, and properties are columns in the edge tables. Use agtype_access_operator(properties, key) to get a property value. Knowing all that, you can use Postgres' standard DDL to implement constraints, indices and unique values.
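As a minimal sketch of that mapping, the helper below builds the DDL for a unique index on one property of a label table. The function name `unique_property_index_sql`, the index naming scheme, and the graph, label and key names are hypothetical; passing the key as an agtype string literal and qualifying the function with `ag_catalog` are assumptions. Whether PostgreSQL accepts the expression in an index depends on the function being marked IMMUTABLE in the AGE version in use, which is the point the following comments discuss.

```python
import re

# Only plain identifiers are accepted, since the names are pasted into SQL.
_SIMPLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    """Reject anything that is not a simple identifier."""
    if not _SIMPLE_NAME.match(name):
        raise ValueError(f"unsupported name: {name!r}")
    return name


def unique_property_index_sql(graph: str, label: str, key: str) -> str:
    """Build CREATE UNIQUE INDEX DDL: graph = schema, label = table, key = property."""
    graph, label, key = _check(graph), _check(label), _check(key)
    return (
        f'CREATE UNIQUE INDEX "{label}_{key}_unique" '
        f'ON "{graph}"."{label}" '
        # Assumption: the property key is given as an agtype string literal.
        f"(ag_catalog.agtype_access_operator(properties, '\"{key}\"'::agtype));"
    )


print(unique_property_index_sql("graph_name", "LabelA", "id"))
```

The statement is returned as a string so that it can be inspected before being run with any PostgreSQL client.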
|
This is great, thank you! Since indices require an immutable function, an additional function will still need to be created for them. When I create a
and use it in an index with
the creation of vertices with the same id will be prevented
but the index will still not be used when trying to match vertices with a specific id:
Is there a way to use indices when matching? |
It might be safe to change
Per Postgres' documentation:
IMMUTABLE indicates that the function cannot modify the database and always returns the same result when given the same argument values; that is, it does not do database lookups or otherwise use information not directly present in its argument list. If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value.
STABLE indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements. This is the appropriate selection for functions whose results depend on database lookups, parameter variables (such as the current time zone), etc. (It is inappropriate for AFTER triggers that wish to query rows modified by the current command.) Also note that the current_timestamp family of functions qualify as stable, since their values do not change within a transaction.
The access operator works for agtype lists and maps. It does not perform any database lookup; it just extracts a value from the first passed-in parameter. |
Indices cannot currently be used while matching. Some refactoring will be needed to allow the planner to recognize opportunities where the indices can be used. |
Hello, the commit 57e11a3 in master should have resolved this issue and will be available in the next release. |
Thank you for the update! Is it possible to explain in more detail what issue was resolved? I tried creating a unique index as described in #45 (comment), but a Sequential scan is still executed when I use a I pulled the latest changes, did a |
Hi @pdpotter, sorry for the confusion. AGE now supports constraints, but the MATCH clause does not yet support using index scans. Constraints now work, and the updating clauses (SET, REMOVE, etc.) work with constraints and no longer break indices. The patch 6279c10 supports GIN indices. So if you create an index on a label's properties and place the quals in the {} in the MATCH clause, such as |
See comment below for an update. Original comment content: Wow, this is fantastic. It is now possible to create relations quickly (~10 000/s on my local VM) using simple queries (with executemany) in the form of
```
SELECT * FROM cypher('graph_name', $$
MATCH (d:LabelA {id: 1}), (r:LabelB {id: 2})
CREATE (d)-[:Relation {prop: 'value'}]->(r)
$$) as (a agtype)
```
After simply adding GIN indexes
For this specific use case, it would of course be more disk space efficient to only index the |
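The two ingredients mentioned above, a GIN index on a label's properties and a MATCH ... CREATE query with the quals inside the {}, can be generated along these lines. This is a hedged sketch: `gin_index_sql` and `create_edge_sql` are hypothetical helper names, the index name is made up, and the edge is created without properties to keep the string building simple. As a later comment in this thread reports, it is worth checking that the relations were actually created.

```python
import re

# Only plain identifiers are accepted, since the names are pasted into SQL/Cypher.
_SIMPLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    """Reject anything that is not a simple identifier."""
    if not _SIMPLE_NAME.match(name):
        raise ValueError(f"unsupported name: {name!r}")
    return name


def gin_index_sql(graph: str, label: str) -> str:
    """Build DDL for a GIN index on the properties column of a label table."""
    graph, label = _check(graph), _check(label)
    return (
        f'CREATE INDEX "{label}_properties_gin" '
        f'ON "{graph}"."{label}" USING gin (properties);'
    )


def create_edge_sql(graph, start_label, start_id, end_label, end_id, edge_label) -> str:
    """Build a cypher() call that matches two vertices by id and links them."""
    graph = _check(graph)
    start_label, end_label = _check(start_label), _check(end_label)
    edge_label = _check(edge_label)
    cypher = (
        # The quals live inside {} so that, per the thread, the GIN index can apply.
        f"MATCH (d:{start_label} {{id: {int(start_id)}}}), "
        f"(r:{end_label} {{id: {int(end_id)}}}) "
        f"CREATE (d)-[:{edge_label}]->(r)"
    )
    return f"SELECT * FROM cypher('{graph}', $$ {cypher} $$) as (a agtype)"


print(gin_index_sql("graph_name", "LabelA"))
print(create_edge_sql("graph_name", "LabelA", 1, "LabelB", 2, "Relation"))
```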
Commit 379983b includes some improvements which are relevant here:
Very nice work, thank you @JoshInnis! |
It looks like I have been a bit too enthusiastic in my previous comments.
Property constraints
When using GIN indices on properties and creating relations using these indices, I didn't check if the relations were actually added. Unfortunately, they were not. After adding a GIN index and adding enough vertices so the index is used, a match query doesn't return any results. E.g.,
A match query using property constraints returns 0 results.
Query plan:
Where clause
When using the WHERE clause, it is not the GIN index that is being used, but the unique index that was added to prevent duplicate entries by executing
The WHERE clause does give a correct result:
Query plan:
When using the WHERE clause for creating relations, the performance decreases when adding a lot of relations (when adding relations in batches of 5000, the first batch achieves ~5000 it/s, while the ninth batch achieves ~500 it/s) using something similar to
Test suite
When having a new look at the tests (https://github.com/apache/incubator-age/blob/master/regress/expected/index.out#L299-L336) to check if I was making mistakes, I found out some of the index related tests might have some issues:
Thank you so much for making Apache AGE better and better with each commit. I'm sorry if I caused any confusion with my previous comments. |
Hi @pdpotter, GIN indices support a subset of the JsonB operators: https://www.postgresql.org/docs/11/gin-builtin-opclasses.html These operators are not usable in the cypher WHERE clause. Currently the only place they can be used is in the property constraint field in the MATCH clause. The WHERE clause is compatible with the comparison operators that the cypher command currently has. The GIN operators still need to be added to the cypher command's WHERE clause |
For an article about this, you can check out https://bitnine.net/blog-postgresql/postgresql-internals-jsonb-type-and-its-indexes/ It's a bit dated and about AgensGraph rather than AGE, but if you replace instances of the '->>' operator with the '.' operator, the information is still accurate |
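To make the distinction above concrete, the sketch below builds the same lookup in both forms: once with the qual in the property constraint field of the MATCH clause, and once with a WHERE clause. The helper names `match_by_property_constraint` and `match_by_where` are hypothetical, as are the graph, label and key names. Per this thread, only the first form is a candidate for a GIN index, while the second form was observed to use the unique index instead.

```python
import re

# Only plain identifiers are accepted, since the names are pasted into Cypher.
_SIMPLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    """Reject anything that is not a simple identifier."""
    if not _SIMPLE_NAME.match(name):
        raise ValueError(f"unsupported name: {name!r}")
    return name


def match_by_property_constraint(graph: str, label: str, key: str, value: int) -> str:
    """Qual placed inside {} in the MATCH clause."""
    graph, label, key = _check(graph), _check(label), _check(key)
    cypher = f"MATCH (n:{label} {{{key}: {int(value)}}}) RETURN n"
    return f"SELECT * FROM cypher('{graph}', $$ {cypher} $$) as (n agtype)"


def match_by_where(graph: str, label: str, key: str, value: int) -> str:
    """Same lookup expressed with a WHERE clause and a comparison operator."""
    graph, label, key = _check(graph), _check(label), _check(key)
    cypher = f"MATCH (n:{label}) WHERE n.{key} = {int(value)} RETURN n"
    return f"SELECT * FROM cypher('{graph}', $$ {cypher} $$) as (n agtype)"


print(match_by_property_constraint("graph_name", "LabelA", "id", 1))
print(match_by_where("graph_name", "LabelA", "id", 1))
```

Prefixing either statement with EXPLAIN in a PostgreSQL client is one way to see which index, if any, the planner picks.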
As #228 has been fixed, it is now possible to use GIN indices for adding relations.
Adding relations using a query like
is now over 30 times faster (~300/s on my local VM) than without the GIN indices. I tried improving the performance by creating indices using
but Another attempt using a specialized index created by
didn't seem to work because the index wasn't used when executing As using GIN indices is ~30 times slower than the method described in #45 (comment), I think I'm going to keep using this other method for initial data import. |
For onlookers: within the Rust driver I've implemented methods that provide unique indexes and property constraints. As of now, it can be treated as a summary of this discussion |
Just wondering if this issue is resolved? Or, is there more that needs to be done that this issue needs to stay open? It is a bit difficult to tell from the correspondence. |
I wouldn't say it's resolved. Fact: there are currently mechanisms that allow a client to create indexes / constraints. On the other hand, there is neither good documentation on this nor agreement on whether constraints should also be included within cypher-like methods. |
I will try to see if I can get others engaged that can help. |
Any update on indexing side of things? Any pointers on enabling the cypher queries to use specific property indexes? Other than the GIN index, nothing seems to be working. |
In #1000, a patch that is still being worked on is mentioned that would allow GIN indices to be used in WHERE clauses as well. |
In the update_entity_tuple() function, when we call table_tuple_update() and assign the returned value to the result variable, the buffer variable receives the value of 0. Made a workaround so that the original value isn't lost.
Hello, it would be great if we could use unique keys in nodes and edges. In Neo4j it's called Node Key and it can be used like this:
CREATE CONSTRAINT constraint_name ON (n:Person) ASSERT (n.firstname) IS NODE KEY
CREATE CONSTRAINT constraint_name ON (n:Person) ASSERT (n.firstname, n.surname) IS NODE KEY
CREATE CONSTRAINT constraint_name IF NOT EXISTS ON (n:Person) ASSERT (n.firstname, n.surname) IS NODE KEY
CREATE CONSTRAINT constraint_with_provider ON (n:Label) ASSERT (n.prop1) IS NODE KEY OPTIONS {indexProvider: 'native-btree-1.0'}
DROP CONSTRAINT constraint_name
DROP CONSTRAINT missing_constraint_name IF EXISTS
I've tried to use Postgres queries, but since the properties column type is not json/jsonb I got an error: