Sum over facets is incorrect #4160

Closed
campoy opened this issue Oct 11, 2019 · 3 comments
Labels
area/facets Issues related to facet handling, querying, etc. kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate/work on it.

Comments

@campoy
Contributor

campoy commented Oct 11, 2019

What version of Dgraph are you using?

master

Have you tried reproducing the issue with the latest release?

yes

What is the hardware spec (RAM, OS)?

n/a

Steps to reproduce the issue (command/config used to run Dgraph).

Given the dataset generated by this mutation:

{
  set {
    _:a <name> "Anne" .
    _:b <name> "Brian" .
    
    _:jp <name> "Jurassic Park" .
    _:ij <name> "Indiana Jones" .
    
    _:a <rated> _:jp (rating=5) .
    _:a <rated> _:ij (rating=2) .
    _:b <rated> _:ij (rating=2) .
  }
}

If you run the following request:

{
  q(func: has(rated)) {
    name
    rated @facets(r as rating)
    partial_sum: sum(val(r))
  }
      
  sum() {
    total_sum: sum(val(r))
  }
}

Expected behaviour and actual result.

I'd expect partial_sum to be 7 for Anne and 2 for Brian, then total_sum would be 9.

Instead, the result is as follows:

{
  "data": {
    "q": [
      {
        "name": "Anne",
        "rated": [
          {
            "rated|rating": 5
          },
          {
            "rated|rating": 2
          }
        ],
        "partial_sum": 9
      },
      {
        "name": "Brian",
        "rated": [
          {
            "rated|rating": 2
          }
        ],
        "partial_sum": 4
      }
    ],
    "sum": [
      {
        "total_sum": 9
      }
    ]
  }
}

I have a theory about why we're getting these weird numbers.

Variables attach values to a UID, but that is not the right behavior here: the value of the variable should be attached neither to the UID of the person nor to the UID of the movie, but to the combination of both, linked by the predicate.

You can see the artifact by requesting val(r) on every node that has a name.

{
  var(func: has(rated)) {
    rated @facets(r as rating)
  }
      
  sum(func: has(name)) {
    name
    val(r)
  }
}

returns

{
  "data": {
    "sum": [
      {
        "name": "Jurassic Park",
        "val(r)": 5
      },
      {
        "name": "Indiana Jones",
        "val(r)": 4
      },
      {
        "name": "Anne"
      },
      {
        "name": "Brian"
      }
    ]
  }
}

This proves that the variable r has been attached to the movie UIDs by adding all of the values in the facets pointing to them.

Once we understand this, it makes sense that the sum of the ratings for Anne is 9 instead of 7, as it's the sum of the ratings for the two movies. Same goes for the ratings for Brian being 4 instead of 2.
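
To make the arithmetic concrete, here is a small Python sketch of the theory above. It is only a toy model of what I think is happening, not Dgraph's actual code: the names `bind_facet_by_target` and `partial_sums` are made up for illustration, and names stand in for UIDs.

```python
from collections import defaultdict

# Edges from the mutation above: (person, movie, rating facet).
EDGES = [
    ("Anne", "Jurassic Park", 5),
    ("Anne", "Indiana Jones", 2),
    ("Brian", "Indiana Jones", 2),
]

def bind_facet_by_target(edges):
    """Toy model of `r as rating`: values are keyed by the target UID only,
    and values that land on the same UID are added together."""
    r = defaultdict(int)
    for _src, dst, rating in edges:
        r[dst] += rating
    return dict(r)

def partial_sums(edges, r):
    """Toy model of `sum(val(r))` under each person: add up val(r) of
    every movie that person rated."""
    sums = defaultdict(int)
    for src, dst, _rating in edges:
        sums[src] += r[dst]
    return dict(sums)

r = bind_facet_by_target(EDGES)
observed = partial_sums(EDGES, r)
total = sum(r.values())

print(r)         # {'Jurassic Park': 5, 'Indiana Jones': 4}
print(observed)  # {'Anne': 9, 'Brian': 4}
print(total)     # 9
```

Under this model the per-person sums come out as 9 and 4, matching the query result, while the total is still 9 because each facet value is counted exactly once in the map.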

Fixing this might be complicated, as it might imply making facet variables work as a map from <uid, uid> to value rather than from uid to value.
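
As a rough illustration of that idea, the sketch below keys the variable by the (source, target) pair instead of a single UID. This is a hypothetical model in Python, not a proposal for the actual data structure: `bind_facet_by_edge` and `partial_sums_by_edge` are invented names, and names stand in for UIDs.

```python
from collections import defaultdict

# Edges from the mutation above: (person, movie, rating facet).
EDGES = [
    ("Anne", "Jurassic Park", 5),
    ("Anne", "Indiana Jones", 2),
    ("Brian", "Indiana Jones", 2),
]

def bind_facet_by_edge(edges):
    """Hypothetical alternative: key each facet value by the
    (source UID, target UID) pair, so no two edges collide."""
    return {(src, dst): rating for src, dst, rating in edges}

def partial_sums_by_edge(r):
    """Sum the facet values of the edges leaving each source node."""
    sums = defaultdict(int)
    for (src, _dst), rating in r.items():
        sums[src] += rating
    return dict(sums)

r_by_edge = bind_facet_by_edge(EDGES)
expected = partial_sums_by_edge(r_by_edge)
expected_total = sum(r_by_edge.values())

print(expected)        # {'Anne': 7, 'Brian': 2}
print(expected_total)  # 9
```

With the pair as the key, the per-person sums are 7 and 2 and the total stays at 9, which is what I expected from the original query.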

@campoy campoy added kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate/work on it. area/facets Issues related to facet handling, querying, etc. labels Oct 11, 2019
@MichelDiz
Contributor

MichelDiz commented Dec 3, 2019

I think this query can work around it, but I'm not sure it covers all scenarios.

{
  var(func: has(rated)) {
    rated {
      ~rated @facets(r as rating)
    }
  }
   partial(func: uid(r), orderdesc: val(r)) {
    name
    partial_rated_sum : val(r)
  }

  sum() {
    total_sum: sum(val(r))
  }

}

Result

{
  "data": {
    "partial": [
      {
        "name": "Anne",
        "partial_rated_sum": 7
      },
      {
        "name": "Brian",
        "partial_rated_sum": 2
      }
    ],
    "sum": [
      {
        "total_sum": 9
      }
    ]
  }
}
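
My reading of why this works, sketched as a toy Python model (an assumption about the behavior, not Dgraph internals; `bind_facet_over_reverse_edge` is a made-up name and names stand in for UIDs): walking `~rated` from each movie makes the person the target of the edge, so the facet values accumulate on the person UIDs instead of the movie UIDs.

```python
from collections import defaultdict

# Edges from the mutation above: (person, movie, rating facet).
EDGES = [
    ("Anne", "Jurassic Park", 5),
    ("Anne", "Indiana Jones", 2),
    ("Brian", "Indiana Jones", 2),
]

def bind_facet_over_reverse_edge(edges):
    """Toy model of `~rated @facets(r as rating)`: each edge is walked
    from the movie back to the person, so the value is keyed by the
    person (the target of the reverse edge) and summed per person."""
    r = defaultdict(int)
    for person, _movie, rating in edges:
        r[person] += rating
    return dict(r)

r_reverse = bind_facet_over_reverse_edge(EDGES)
reverse_total = sum(r_reverse.values())

print(r_reverse)      # {'Anne': 7, 'Brian': 2}
print(reverse_total)  # 9
```

The variable is still a map from a single UID to a value, so this only helps when the per-person total is what you want.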

@mileung

mileung commented Mar 30, 2020

I am having the same issue with facet variables being scoped improperly and causing incorrect aggregation values. Is there an ETA on when this will be fixed?

@minhaj-shakeel
Contributor

GitHub issues have been deprecated.
This issue has been moved to Discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.
