Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

postgres: poor performance on simple recursive queries #765

Open
tucnak opened this issue Feb 12, 2019 · 2 comments
Open

postgres: poor performance on simple recursive queries #765

tucnak opened this issue Feb 12, 2019 · 2 comments
Labels

Comments

@tucnak
Copy link

@tucnak tucnak commented Feb 12, 2019

Postgres driver performs badly on certain simple recursive queries. Herein are two Gizmo queries (a trivial select by-type and a recursive walk over the tree) over n=8258 nodes.

Toy dataset (n-quads, 2mb compress) used in this example: knols.nq.zip

//// postgres: cayley -d postgres -a "..." http
//// bolt: cayley -d bolt -a knols.db http
//// memstore: cayley -d memstore http -i knols.nq
//
// postgres: 5ms
// bolt:     14ms
// memstore: 4ms
g.V()
 .Has("<rdf:type>", "<tas:Critique>")
 .Count() // 8258

// postgres: 4.2s (849x) sic!
// bolt:     204ms (14x)
// memstore: 60ms (15x)
var child_critique = g.M()
 .Out("<tas:text>")
 .Out("<tas:critique>")

g.V()
 .Has("<rdf:type>", "<tas:Knol>")
 .FollowRecursive(child_critique)
 .Count() // 8258

Received results:

I0212 00:43:39.167161   88149 shape.go:205] shape: (path.iteratorBuilder)(0x4484270)
I0212 00:43:39.167180   88149 shape.go:209] optimized: (path.iteratorBuilder)(0x4484270)
I0212 00:43:39.182679   88149 and_optimize.go:187] And: 5 Root: 10 Total Cost: 40 Best: 4611686018427387904
I0212 00:43:39.182699   88149 and_optimize.go:187] And: 5 Root: 12 Total Cost: 65568411 Best: 40
I0212 00:43:39.182705   88149 and_optimize.go:195] And: 5 Choosing: 10 Best: 40
I0212 00:43:39.182723   88149 and_optimize.go:92] 5 become 15
I0212 00:43:39.190905   88149 iterate.go:76] {
  "UID": 7,
  "Name": "Recursive",
  "Type": "recursive",
  "Iterators": [
    {
      "UID": 6,
      "Name": "HasA(subject)",
      "Type": "hasa",
      "Size": 8,
      "Iterators": [
        {
          "UID": 15,
          "Name": "And",
          "Type": "and",
          "Size": 8,
          "Exact": true,
          "Iterators": [
            {
              "UID": 10,
              "Name": "LinksTo(object)",
              "Type": "linksto",
              "Size": 8,
              "Exact": true,
              "Iterators": [
                {
                  "UID": 9,
                  "Name": "Fixed([f4122d376d3bd58b458c0913ffc8beb22a4b0bc4])",
                  "Type": "fixed",
                  "Size": 1,
                  "Exact": true
                }
              ]
            },
            {
              "UID": 12,
              "Name": "LinksTo(predicate)",
              "Type": "linksto",
              "Size": 17169,
              "Exact": true,
              "Iterators": [
                {
                  "UID": 11,
                  "Name": "Fixed([1f822143c9f0cd043e31684ded5dd95982520224])",
                  "Type": "fixed",
                  "Size": 1,
                  "Exact": true
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
I0212 00:43:43.757395   88149 iterate.go:87] {
  "UID": 7,
  "Type": "recursive",
  "ContainsCost": 240,
  "NextCost": 17,
  "Size": 0,
  "ExactSize": false,
  "Next": 0,
  "Contains": 0,
  "ContainsNext": 0,
  "SubIts": [
    {
      "UID": 6,
      "Type": "hasa",
      "ContainsCost": 240,
      "NextCost": 6,
      "Size": 8,
      "ExactSize": false,
      "Next": 9,
      "Contains": 0,
      "ContainsNext": 0,
      "SubIts": [
        {
          "UID": 15,
          "Type": "and",
          "ContainsCost": 4,
          "NextCost": 5,
          "Size": 8,
          "ExactSize": true,
          "Next": 9,
          "Contains": 0,
          "ContainsNext": 0,
          "SubIts": [
            {
              "UID": 10,
              "Type": "linksto",
              "ContainsCost": 2,
              "NextCost": 3,
              "Size": 8,
              "ExactSize": true,
              "Next": 10,
              "Contains": 0,
              "ContainsNext": 8,
              "SubIts": [
                {
                  "UID": 9,
                  "Type": "fixed",
                  "ContainsCost": 1,
                  "NextCost": 1,
                  "Size": 1,
                  "ExactSize": true,
                  "Next": 0,
                  "Contains": 0,
                  "ContainsNext": 0,
                  "SubIts": null
                }
              ]
            },
            {
              "UID": 12,
              "Type": "linksto",
              "ContainsCost": 2,
              "NextCost": 3,
              "Size": 17169,
              "ExactSize": true,
              "Next": 0,
              "Contains": 8,
              "ContainsNext": 0,
              "SubIts": [
                {
                  "UID": 11,
                  "Type": "fixed",
                  "ContainsCost": 1,
                  "NextCost": 1,
                  "Size": 1,
                  "ExactSize": true,
                  "Next": 0,
                  "Contains": 0,
                  "ContainsNext": 0,
                  "SubIts": null
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Expected results:

dennwc [12:45 AM]
Is it for Postgres? It looks weird.
linksto -> fixed should be optimized away
Those linksto(fixed) in the tree mean that it does the lookup+scan manually, instead of using WHERE.

Output of cayley version or commit hash: acf0ab2

Cayley version: 0.7.5

Environment details:

Backend database: postgres 10.5

@dennwc

@tucnak

This comment has been minimized.

Copy link
Author

@tucnak tucnak commented Feb 12, 2019

I've also tested cockroach, and got the following result:

2019/02/12 04:20:15 n = 10
2019/02/12 04:20:15
2019/02/12 04:20:15 [find n critiques] started
2019/02/12 04:20:15 found 10
2019/02/12 04:20:15 elapsed: 28.113888ms
2019/02/12 04:20:15
2019/02/12 04:20:15 [find n knols + depth=0] started
2019/02/12 04:20:16 found 8
listening...

In the case (a) cockroach is 2x faster than postgres, but later it fails to load (b) completely. I suspect that cockroach driver's implementation is probably based off postgres driver, and could be lagging behind.

@dennwc dennwc added the bug label Feb 18, 2019
@dennwc

This comment has been minimized.

Copy link
Member

@dennwc dennwc commented Feb 18, 2019

As mentioned in Slack discussion, this is due to a bug in query optimizer - it won't let SQL to run a full optimization pass for recursive queries.

Fixing the issue allows recursive queries to be optimized, but a set of output SQL queries turns out to be slower then the scan from Cayley side.

So this will require more work to implement properly - need to add a few more optimizers to SQL.

@dennwc dennwc added performance and removed bug labels Jun 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.