
exception: "ancestor argument should match namespace" during /_ah/pipeline/output #42

Open
NickFranceschina opened this issue Jun 17, 2015 · 5 comments

@NickFranceschina

The root of this particular pipeline starts out in the default namespace; its immediate children each switch to a different namespace (one per customer), and those continue to spawn further pipeline children within that namespace. Something goes wrong when the pipeline tries to finalize and roll back up to the top: the slot_key being used as the ancestor in the BarrierIndex query carries the default namespace ("" from our root pipeline), but the BarrierIndex query itself runs in the child pipeline's namespace ("1", because new tasks by default take on the namespace of the task that spawned them), and so it hits BadArgumentError.

This is my best guess at what is happening. I don't understand the code well enough to attempt a fix yet, though I'm going to try.

0.1.0.2 - - [17/Jun/2015:09:59:23 -0700] "POST /_ah/pipeline/output HTTP/1.1" 500 0 "http://1.tasks.some-app-id.appspot.com/_ah/pipeline/run" "AppEngine-Google; (+http://code.google.com/appengine)" "1.tasks.some-app-id.appspot.com" ms=15 cpu_ms=2 queue_name=notify task_name=0992829243632586978 instance=0 app_engine_release=1.9.22 
ancestor argument should match namespace ("''" != "'1'")
Traceback (most recent call last):
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~some-app-id/tasks:1.385065318471689233/pipeline/pipeline.py", line 2647, in post
    use_barrier_indexes=self.request.get('use_barrier_indexes') == 'True')
  File "/base/data/home/apps/s~some-app-id/tasks:1.385065318471689233/pipeline/pipeline.py", line 1538, in notify_barriers
    barrier_index_list = query.fetch(max_to_notify)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 2161, in fetch
    return list(self.run(limit=limit, offset=offset, **kwargs))
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 2080, in run
    iterator = raw_query.Run(**kwargs)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 1681, in Run
    itr = Iterator(self.GetBatcher(config=config))
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 1660, in GetBatcher
    return self.GetQuery().run(_GetConnection(), query_options)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 1534, in GetQuery
    group_by=self.__group_by)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 105, in positional_wrapper
    return wrapped(*args, **kwds)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_query.py", line 1934, in __init__
    ancestor=ancestor)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 105, in positional_wrapper
    return wrapped(*args, **kwds)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_query.py", line 1729, in __init__
    (ancestor.name_space(), namespace))
BadArgumentError: ancestor argument should match namespace ("''" != "'1'")
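The failing check can be modeled in plain Python. This is a toy reconstruction of the validation for illustration, not the actual SDK source: the datastore query constructor compares the ancestor key's namespace against the query's namespace and raises when they differ.

```python
# Toy reconstruction of the datastore's ancestor/namespace check
# (an illustration only, not the real google.appengine code).
class BadArgumentError(Exception):
    pass


def validate_ancestor(ancestor_namespace, query_namespace):
    """Raise if the ancestor key's namespace differs from the query's."""
    if ancestor_namespace != query_namespace:
        raise BadArgumentError(
            'ancestor argument should match namespace ("%r" != "%r")'
            % (ancestor_namespace, query_namespace))


# The root pipeline's slot key lives in the default namespace (''),
# but the barrier query runs in the child task's namespace ('1'):
try:
    validate_ancestor('', '1')
except BadArgumentError as e:
    print(e)  # ancestor argument should match namespace ("''" != "'1'")
```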
@NickFranceschina
Author

For now I'm forcing use_barrier_indexes = False so that it uses the legacy code path; at least that lets me get things deployed.

@soundofjw
Contributor

Good call using use_barrier_indexes for now - this is a potentially tricky one.

If we move to fix this, the change may be surprising behavior for others.
We use namespaces as well, and I think this change should be OK from my perspective, because we don't manipulate namespaces within the pipeline: for us, a pipeline starts entirely in one namespace and stays in that namespace via appengine_config.namespace_manager_default_namespace_for_request.

With all of that noted, I want to make sure I understand the problem:

The datastore is complaining because you're using an ancestor query where the namespace of the ancestor key does not match the namespace passed to the query (which defaults to the empty string). (https://cloud.google.com/appengine/docs/python/ndb/queryclass)

I believe the fix may be as simple as changing this line in notify_barriers:

_BarrierIndex.all(cursor=cursor, keys_only=True)

to:

_BarrierIndex.all(cursor=cursor, keys_only=True, namespace=ancestor_key.namespace())

@NickFranceschina
Author

When you say "defaults to empty," I don't think that's what is happening. Instead, I believe the query is defaulting to the namespace of the task that invoked it (the child task, in namespace "1", called /output, which triggered BarrierHandler), while the ancestor key comes from the root pipeline's namespace (which is ''). That's how it looks from the last line of the trace:

(ancestor.name_space(), namespace)) ---> ("''" != "'1'")

When I grab the string keys from the headers and build datastore Keys out of them, the ancestor contains the root pipeline's ID, but that ID doesn't exist in the current namespace (even in the console, if you open the barrier record's detail view, you can't click through to the ancestor because it doesn't exist).

Guessing, given how you described your normal use of namespaces, the key path is just being generated from a list of kind/id pairs on the assumption that they all live in the same namespace. In my case the top-level kind/id doesn't, which makes that key technically invalid.
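A toy model of what's being described (my assumption of the mechanics, not the SDK's real Key class): a key carries a single namespace for its whole kind/id path, so a path built from string elements alone silently adopts the current namespace, even when the root entity in the path actually lives elsewhere.

```python
from collections import namedtuple

# Simplified stand-in for a datastore key: one namespace plus a path
# of (kind, id) pairs. Illustration only, not the real Key class.
Key = namedtuple('Key', ['namespace', 'path'])


def key_from_path(*pairs, **kwargs):
    """Build a key, adopting the 'current' namespace by default.

    '1' here stands in for the child task's current namespace.
    """
    namespace = kwargs.get('namespace', '1')
    return Key(namespace, tuple(pairs))


# The barrier's ancestor path includes the root pipeline's ID, but the
# whole key is stamped with the child's namespace, so the root
# (kind, id) pair resolves to nothing in namespace '1'.
barrier_ancestor = key_from_path(('_PipelineRecord', 'root-id'),
                                 ('_SlotRecord', 'slot-id'))
print(barrier_ancestor.namespace)  # -> 1 ('root-id' only exists in '')
```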

I can probably re-engineer our stuff to kick off one pipeline per namespace, but let me know if you think you can get it working. Thanks!

@soundofjw
Contributor

Obviously, long term you don't want to kick off one pipeline per namespace permanently: if you aren't yielding child pipelines, you lose the major benefits of pipeline fan-out, abort, and success handling.

You are also correct about the default namespace: it is the namespace of the process that created the task, for any of the pipeline tasks (complete, fanout, abort, etc.).

A much more comfortable solution, now that I'm seeing the larger scope here, would be to keep ALL pipeline entities in the namespace of the root pipeline. Then, if you need namespace switching, you would achieve it explicitly per pipeline.

This also simplifies the answer to "How do I find a pipeline with a given pipeline_id?".
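One way to make that explicit per-pipeline switching cheap to write is a small context manager that restores the previous namespace afterwards. A minimal sketch: the `_FakeNamespaceManager` below is a stand-in for `google.appengine.api.namespace_manager` (included only so the example is self-contained outside App Engine); on App Engine you would use the real module.

```python
from contextlib import contextmanager


class _FakeNamespaceManager(object):
    """Minimal stand-in for google.appengine.api.namespace_manager."""

    def __init__(self):
        self._ns = ''  # default namespace

    def get_namespace(self):
        return self._ns

    def set_namespace(self, ns):
        self._ns = ns


namespace_manager = _FakeNamespaceManager()


@contextmanager
def temporary_namespace(ns):
    """Run a block in namespace `ns`, then restore the previous one."""
    previous = namespace_manager.get_namespace()
    namespace_manager.set_namespace(ns)
    try:
        yield
    finally:
        namespace_manager.set_namespace(previous)


# Pipeline records stay in the root namespace ('') while a child
# does its datastore work inside a customer namespace:
with temporary_namespace('1'):
    pass  # work in namespace '1'
print(namespace_manager.get_namespace())  # -> '' (restored)
```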

One paradigm I use a lot is class inheritance for pipelines, with a common setup function that prepares any shared variables. This works well for larger pipeline chains, like the ones I believe you have.

from google.appengine.api import namespace_manager
from pipeline import Pipeline

class MyRootPipeline(Pipeline):
    def setup(self, **kwargs):
        """Perform setup for MyRootPipeline and all derivative pipes."""
        # Get pipeline information, and do common setup.
        self.namespace = kwargs.get('namespace', None)
        self.kwargs = kwargs.copy()  # changes to kwargs shouldn't affect our copy

        if self.namespace:
            # Sets the namespace for the current HTTP request.
            namespace_manager.set_namespace(self.namespace)

    def run(self, **kwargs):
        # Do your setup.
        self.setup(**kwargs)

        # Do stuff.

        # Yield a child with the same kwargs, plus any additional args.
        kwargs['namespace'] = "other_namespace"
        yield ChildPipeline(child_wants_candy=True, **kwargs)

class ChildPipeline(MyRootPipeline):
    """Subclassed from MyRootPipeline for the common setup procedure."""

    def run(self, child_wants_candy, **kwargs):
        # Performs setup, switches namespace, ...
        self.setup(**kwargs)

        # Stuff and things in the new namespace.

This won't work today: the pipeline library would first need to explicitly apply the configured namespace to yielded child pipelines, callback tasks, and so on. That's the support I'd consider targeting for this issue.

I hope this all makes sense!!! 🐻

@NickFranceschina
Author

Funny: that's pretty much exactly our pipeline subclass paradigm as well:

class MigrateFiles(MigrationPipeline):
    def run(self):
        self.setup()
        ...

As for our existing structure, we don't need to run a single pipeline across namespaces; we just had it set up that way, and we could definitely restructure it to be one pipeline per namespace. But as you said, long term it would be better not to have to think about it, and it would indeed be easier to figure out where the pipeline records are (and clear out old ones) if they were always stored in the default namespace.

So yeah, this all makes sense... and I really appreciate your input!
