Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix graphql resolve off empty store #4410

Merged
merged 1 commit into from Oct 7, 2021

Conversation

dwsutherland
Copy link
Member

These changes close cylc/cylc-uiserver#247

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.py and
    conda-environment.yml.
  • Appropriate tests are included (unit and/or functional).
  • Already covered by existing tests.
  • Does not need tests (why?).
  • Appropriate change log entry included.
  • No change log entry required (why? e.g. invisible to users).
  • (master branch) I have opened a documentation PR at cylc/cylc-doc/pull/XXXX.
  • (7.8.x branch) I have updated the documentation in this PR branch.
  • No documentation update required.

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying reproducing the issue today with no success on both master and on this branch @dwsutherland 😅 can't tell if it was due to some other update in master, if today my workflows were not stopped quickly enough, or something else.

But I trust you'll know whether this change makes sense or not. It probably needs a test - if doable. But doesn't break anything for the command CLI or UI when starting/stopping workflows, so +1.

Thanks!

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dwsutherland - can you add a comment to the code, to explain how we end up here? Also, can existing tests be tweaked to cover this too?

@dwsutherland
Copy link
Member Author

@dwsutherland - can you add a comment to the code, to explain how we end up here? Also, can existing tests be tweaked to cover this too?

I don't think I've got to the bottom of this.. investigating further..

@kinow
Copy link
Member

kinow commented Sep 16, 2021

@dwsutherland are you able to reproduce the issue on master? I haven't been able after syncing Flow & UIS. I wonder if it got fixed somehow perhaps?

@dwsutherland
Copy link
Member Author

dwsutherland commented Sep 17, 2021

@dwsutherland are you able to reproduce the issue on master? I haven't been able after syncing Flow & UIS. I wonder if it got fixed somehow perhaps?

Yes, I can.. I can do it with just one workflow, and the minimum subscription appears to be:

subscription GscanSubscriptionQuery {
  deltas (stripNull: true) {
    updated {
      workflow {
        id
      }
    }
  }
}

Not this:

subscription GscanSubscriptionQuery {
  deltas (stripNull: false) {
    updated {
      workflow {
        id
      }
    }
  }
}

and not this:

subscription GscanSubscriptionQuery {
  deltas (stripNull: true) {
    added {
      workflow {
        id
      }
    }
  }
}

I think this is because there is some mixing between delta-stores of the shutdown delta (produced by the Scheduler):

{'added': {'edges': {}, 'families': {}, 'family_proxies': {}, 'jobs': {}, 'tasks': {}, 'task_proxies': {}, 'workflow': }, 'updated': {'edges': {}, 'families': {}, 'family_proxies': {}, 'jobs': {}, 'tasks': {}, 'task_proxies': {}, 'workflow': }, 'pruned': {'edges': [], 'families': [], 'family_proxies': [], 'jobs': [], 'tasks': [], 'task_proxies': []}, 'id': 'sutherlander|five/run1', 'shutdown': True}

and the stopping delta (produced by the UIS)

{'added': {'edges': {}, 'families': {}, 'family_proxies': {}, 'jobs': {}, 'tasks': {}, 'task_proxies': {}, 'workflow': }, 'updated': {'edges': {}, 'families': {}, 'family_proxies': {}, 'jobs': {}, 'tasks': {}, 'task_proxies': {}, 'workflow': stamp: "sutherlander|five/run1@1631839177.7392"
id: "sutherlander|five/run1"
name: "five/run1"
status: "stopped"
host: ""
port: 0
owner: "sutherlander"
api_version: 0
}, 'pruned': {'edges': [], 'families': [], 'family_proxies': [], 'jobs': [], 'tasks': [], 'task_proxies': []}, 'id': 'sutherlander|five/run1'}

I couldn't figure out why it was even trying to resolve id on the shutdown delta (or the other way round; stopping delta resolve off the shutdown delta)..

It's clear they must be setting and resolving off a common object somewhere.. (probably the stripping, since I can only reproduce with it turned on)

And the randomness is just the UIS scan proximity to the shutdown delta (closer they are, more likely it is to happen).. This explains why it only happens with updates (not added)

@dwsutherland
Copy link
Member Author

So I will push a proper fix to this PR when I find out exactly what is is.

@kinow
Copy link
Member

kinow commented Sep 17, 2021

Got it @dwsutherland ! I thought it was fixed since I couldn't reproduce it anymore on the UI with five. Once you have pushed your new changes I will try with your example workflows and review again 👍 Thanks!

@dwsutherland
Copy link
Member Author

Actually I've found I can cause this issue without the stopping delta being produced.. And can reproduce it all the time. So a few things to work out..

@kinow
Copy link
Member

kinow commented Sep 27, 2021

Note: just saw this error on the branch for fixing an enum issue #4409

[I 2021-09-28 11:03:34.627 CylcHubApp log:189] 200 POST /user/kinow/cylc/graphql (kinow@::1) 19.54ms
ERROR:graphql.execution.utils:Traceback (most recent call last):
  File "/home/kinow/Development/python/workspace/cylc-uiserver/venv/lib/python3.8/site-packages/promise/promise.py", line 87, in try_catch
    return (handler(*args, **kwargs), None)
  File "/home/kinow/Development/python/workspace/cylc-uiserver/venv/lib/python3.8/site-packages/graphql/execution/executor.py", line 531, in <lambda>
    lambda resolved: complete_value(
  File "/home/kinow/Development/python/workspace/cylc-uiserver/venv/lib/python3.8/site-packages/graphql/execution/executor.py", line 571, in complete_value
    return complete_object_value(
  File "/home/kinow/Development/python/workspace/cylc-uiserver/venv/lib/python3.8/site-packages/graphql/execution/executor.py", line 722, in complete_object_value
    return execute_fields(  # type: ignore
  File "/home/kinow/Development/python/workspace/cylc-uiserver/venv/lib/python3.8/site-packages/graphql/execution/executor.py", line 244, in execute_fields
    result = resolve_field(
  File "/home/kinow/Development/python/workspace/cylc-uiserver/venv/lib/python3.8/site-packages/graphql/execution/executor.py", line 367, in resolve_field
    return complete_value_catching_error(
  File "/home/kinow/Development/python/workspace/cylc-uiserver/venv/lib/python3.8/site-packages/graphql/execution/executor.py", line 475, in complete_value_catching_error
    return complete_value(exe_context, return_type, field_asts, info, path, result)
  File "/home/kinow/Development/python/workspace/cylc-uiserver/venv/lib/python3.8/site-packages/graphql/execution/executor.py", line 546, in complete_value
    return complete_nonnull_value(
  File "/home/kinow/Development/python/workspace/cylc-uiserver/venv/lib/python3.8/site-packages/graphql/execution/executor.py", line 743, in complete_nonnull_value
    raise GraphQLError(
graphql.error.base.GraphQLError: Cannot return null for non-nullable field Workflow.id.

Just commenting because I thought the issue had been fixed on master by one of the last commits, but it's still broken there 👍

@dwsutherland
Copy link
Member Author

Just commenting because I thought the issue had been fixed on master by one of the last commits, but it's still broken there 👍

Not sure I understand.. because this is the PR that's suppose to fix it.

@kinow
Copy link
Member

kinow commented Sep 28, 2021

Not sure I understand.. because this is the PR that's suppose to fix it.

Sorry for the confusing comment @dwsutherland . Last week I couldn't reproduce the bug on master and thought it had been fixed by some commit to cylc-flow, but since I saw the exception today again it means master is still broken and this PR will hopefully fix it 😬

@dwsutherland
Copy link
Member Author

Last week I couldn't reproduce the bug on master

That's the thing, and why I haven't push a final fix yet.. There's an underlying issue causing the randomness, like a race condition.. I think I know where it is.. but need to investigate more.

@dwsutherland
Copy link
Member Author

Two issues here:

  1. The empty workflow is not being stripped at the deltas.updated (or deltas.added) level..

And protobuf will return and empty string without stripping, which is why we only get this error with stripping (as it sets it to None)..

  1. The delta queue is being processed asynchronously (we are using an async gen), so two deltas close together from the same workflow can interfere with each other (the last delta will replace the current delta-store, and so previous deltas' resolvers will read for this)

So the stopped delta is so close that it wipes out the shutdown delta's store, hence why we sometimes don't get the error.
However, we do want the async behavior for multiple workflows, but not for deltas from the same workflow, they need to be serial.

I will try to address the first one in this PR, but create a new issue for the second point.

@kinow
Copy link
Member

kinow commented Oct 7, 2021

Great detective work @dwsutherland !!! 👏

However, we do want the async behavior for multiple workflows, but not for deltas from the same workflow, they need to be serial.

Sounds right to me.

I will try to address the first one in this PR, but create a new issue for the second point.

Thanks

@dwsutherland
Copy link
Member Author

dwsutherland commented Oct 7, 2021

I've pushed the fix.. But not quite sure how to write a test for it really.
It would be used by a GraphQL call that resolves off and empty store and strips empties... So maybe
(the block is being run, but the conditions just aren't being met by the current set of calls)

@hjoliver hjoliver merged commit 4c4e836 into cylc:master Oct 7, 2021
@MetRonnie MetRonnie modified the milestones: cylc-8.0.0, cylc-8.0b3 Oct 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Cannot return null for non-nullable field Workflow.id" when stopping multiple workflows
4 participants