has(predicate) captures nodes which had the predicate deleted #2574

Closed
slawo opened this issue Sep 7, 2018 · 7 comments · Fixed by #2585
Labels: kind/bug (Something is broken.)


slawo commented Sep 7, 2018

Bug report

Desktop (please complete the following information):

  • OS: Linux, Docker image dgraph/dgraph:v1.0.8
  • Dgraph Version: v1.0.8
  • RAM available: 8GB with "lru_mb 4096" for each Dgraph Server instance
  • Storage Type: Cloud Storage
  • Dgraph Client: Ratel

If containerized (please complete the following information):

  • Main Container Software: Kubernetes v1.11.1
  • Dgraph TAG Version: v1.0.8 1dd8376

Describe the bug

When using has(predicate) in queries, we get unexpected results. After some debugging I found that many nodes that do not have the predicate are included in the results.

All the ones I have identified previously had the predicate, and it was later deleted.

This happens with dates; I have not tested other predicate types.

To Reproduce
Steps to reproduce the behavior:

  • Create a bunch of nodes with dates.
  • Delete the dates from those nodes.
  • Query with has(date_predicate).
  • Observe the results: nodes whose predicate was previously deleted are still returned.
{
  broken(func: has(obj.date_ended)) {
    uid
    obj.date_ended
  }
}

returns


      {
        "uid": "0x5184"
      },
      {
        "uid": "0x618c"
      },
      {
        "uid": "0x6a65"
      },
      {
        "uid": "0x9c37"
      },
      {
        "uid": "0x9d28"
      },

...
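
For anyone triaging a similar response, the affected nodes are the entries that come back with a uid but without the predicate itself. A minimal sketch for separating them, assuming the JSON payload has the query block name ("broken") at the top level; the helper name split_has_results and the uid 0xabc are made up for illustration:

```python
import json


def split_has_results(response_json, block, predicate):
    """Split a query block into nodes that carry `predicate` and uids that do not."""
    nodes = json.loads(response_json).get(block, [])
    present = [n for n in nodes if predicate in n]
    missing = [n["uid"] for n in nodes if predicate not in n]
    return present, missing


# Sample payload shaped like the output above; 0xabc is a hypothetical healthy node.
sample = json.dumps({
    "broken": [
        {"uid": "0x5184"},
        {"uid": "0x618c"},
        {"uid": "0xabc", "obj.date_ended": "2018-08-14T11:56:46Z"},
    ]
})

present, missing = split_has_results(sample, "broken", "obj.date_ended")
print(missing)  # uids returned by has() even though the predicate is absent
```

This only helps to spot the stale uids in a has() response; it does nothing for queries that rely on not has().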

Expected behavior

Nodes that no longer have the predicate should not be returned.

Additional context

This is a fresh install of Dgraph, with all data added today through the dgo driver using api.NQuad.

@manishrjain manishrjain added the kind/bug Something is broken. label Sep 7, 2018
Contributor

MichelDiz commented Sep 7, 2018

Duplicate of #2212.
Should be fixed by 999a4e0.
Related to #2484.

"This is a bug with has queries where empty PL's are not ignored." (pawanrawal)

@slawo In this case, are you using @reverse?

Author

slawo commented Sep 7, 2018

It seems related, but it happens in v1.0.8, so I did not bother linking it.

Also, I have not tried this on other predicate types.

The issue is quite important because I use has() to filter data. In most cases involving dates I can get away with le(predicate, CURRENT_DATE) instead, but in some cases I need not has(), and there I don't believe I can work around the issue.

@MichelDiz
Contributor

The point is that it should already have been fixed.

A sample mutation, written the way you actually use it, would help.
Mostly we need to know whether you are using @reverse in your schema on the predicate you query with "has".
We need to know why this is happening, since we already have a fix in 999a4e0.

In any case, going by your report, there seems to be an 80% chance that this is a duplicate.

@manishrjain
Contributor

We changed the way Posting List Deletions are happening, so maybe that's causing it. Needs investigation.

@manishrjain manishrjain self-assigned this Sep 12, 2018
manishrjain added a commit that referenced this issue Sep 12, 2018
manishrjain added a commit that referenced this issue Sep 13, 2018
Generate the full posting list based on the read timestamp, so we can determine if it is empty or not. If empty, then has should not pick up the UID. Similarly, use Btree to pick up posting lists from memory, also considering whether they have a valid posting keeping transactional properties intact.

This fixes #2574.
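
The idea in this commit message can be illustrated with a toy model. This is a sketch only, not Dgraph's actual Go implementation: the PostingList class, the (predicate, uid) dictionary standing in for the key space, and the function names are all assumptions made for illustration, and the Btree lookup of in-memory lists is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class PostingList:
    """Toy posting list: an append-only log of (commit_ts, op, value) deltas."""
    deltas: list = field(default_factory=list)

    def add(self, commit_ts, value):
        self.deltas.append((commit_ts, "set", value))

    def delete_all(self, commit_ts):
        # Models a `<uid> <predicate> * .` delete.
        self.deltas.append((commit_ts, "del_all", None))

    def values_at(self, read_ts):
        """Replay deltas committed at or before read_ts and return the live values."""
        live = set()
        for commit_ts, op, value in sorted(self.deltas, key=lambda d: d[0]):
            if commit_ts > read_ts:
                break
            if op == "set":
                live.add(value)
            else:
                live.clear()
        return live


def has_keys_only(index, predicate):
    """Reported behaviour: any existing key counts, even if its list is empty."""
    return sorted(uid for (pred, uid) in index if pred == predicate)


def has_at(index, predicate, read_ts):
    """Behaviour described in the fix: skip lists that are empty at read_ts."""
    return sorted(
        uid
        for (pred, uid), plist in index.items()
        if pred == predicate and plist.values_at(read_ts)
    )


index = {
    ("obj.date_ended", 0x1): PostingList(),
    ("obj.date_ended", 0x2): PostingList(),
}
index[("obj.date_ended", 0x1)].add(1, "2018-08-14T11:56:46Z")
index[("obj.date_ended", 0x2)].add(1, "2018-08-14T11:56:46Z")
index[("obj.date_ended", 0x1)].delete_all(2)  # predicate removed from 0x1

print(has_keys_only(index, "obj.date_ended"))      # [1, 2]: deleted node still listed
print(has_at(index, "obj.date_ended", read_ts=3))  # [2]: empty list is skipped
print(has_at(index, "obj.date_ended", read_ts=1))  # [1, 2]: snapshot before the delete
```

Evaluating emptiness at the read timestamp is what keeps the transactional behaviour intact in this model: a reader whose snapshot predates the delete still sees the node.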
@manishrjain
Contributor

This should now be fixed. Can you test, @slawo ?

Contributor

MichelDiz commented Sep 13, 2018

Hey @manishrjain, I ran a test with the image 'dgraph/dgraph:v1.0.9-rc1' and it is working as expected.

{
  "set": [
    {
      "obj.date_ended": "2018-08-14T11:56:46Z"
    },
    {
      "obj.date_ended": "2018-08-14T11:56:46Z"
    },
    {
      "obj.date_ended": "2018-08-14T11:56:46Z"
    },
    {
      "obj.date_ended": "2018-08-14T11:56:46Z"
    },
    {
      "obj.date_ended": "2018-08-14T11:56:46Z"
    }
  ]
}

{
  delete {
    <0x1> <obj.date_ended> * .
    <0x4> <obj.date_ended> * .
    <0x7> <obj.date_ended> * .
    <0x8> <obj.date_ended> * .
    <0x9> <obj.date_ended> * .
  }
}

I also tried deleting everything on those nodes:

{
  delete {
    <0x1> * * .
    <0x4> * * .
    <0x7> * * .
    <0x8> * * .
    <0x9> * * .
  }
}

@manishrjain
Contributor

Thanks, @MichelDiz !

dna2github pushed a commit to dna2fork/dgraph that referenced this issue Jul 19, 2019