Skip to content

Conversation

@gsakkis
Copy link
Contributor

@gsakkis gsakkis commented May 11, 2021

Problem description

Most tiledb functions and classes accept an optional ctx parameter. This is less than ideal for the following reasons:

ctx = tiledb.Ctx()

dom = tiledb.Domain(
    *[
        tiledb.Dim(name="str_dim", domain=(None, None), dtype=np.bytes_, ctx=ctx),
        tiledb.Dim(name="int64_dim", domain=(0, 100), tile=10, dtype=np.int64, ctx=ctx),
        tiledb.Dim(
            name="float64_dim",
            domain=(-100.0, 100.0),
            tile=10,
            dtype=np.float64,
            ctx=ctx,
        ),
    ],
    ctx=ctx
)

att = tiledb.Attr(name="a", ctx=ctx, dtype=np.int64)
schema = tiledb.ArraySchema(
    ctx=ctx, domain=dom, attrs=(att,), sparse=True, capacity=10000
)
tiledb.SparseArray.create(path, schema)
  • Error-prone
    In the example above ctx is not passed to the tiledb.SparseArray.create call, most likely an accidental omission. In this particular example it doesn't matter because the default global Ctx instance is equal to tiledb.Ctx() but it could matter if ctx was initialized with a different Config. Such accidental omissions happen in several places in tiledb.dataframe_.

  • Global variable
    A global Ctx instance can be initialized and used by default via tiledb.default_ctx(). This solves the above verbosity and error-proneness issues at the cost of introducing a global variable. In addition to the usual drawbacks of global variables, the global Ctx can be initialized only once in a program's lifetime.


Proposal

This PR introduces the following enhancements:

  • Adds a new tiledb.ctx(config=None) context manager for specifying a tiledb.Ctx within a block of code. This Ctx is both returned by the context manager (for explicit usage) and set as "global" within the scope of the block (for implicit usage).
  • Replaces the global variable default with a ContextVar (introduced in Python 3.7 and available as 3rd party backport for 3.6). In short, a ContextVar can be thought of as a generalization of global or thread-local variables that also plays well with asynchronous (async/await) code.
  • Removes the (long deprecated) tiledb.initialize_ctx function.

Once/if this is accepted, a following PR will clean up the redundant and/or verbose explicit ctx usage with appropriate tiledb.ctx() context blocks.

@gsakkis gsakkis requested a review from ihnorton May 11, 2021 10:49
@ihnorton
Copy link
Member

ihnorton commented May 11, 2021

I like this very much for usage in the test blocks. I have an old prototype of this on my "post Python 3" list, so thanks!

However, I'm a bit hesitant about encouraging it as a general pattern, because contexts should be created rarely. Contexts are relatively expensive: each one has a separate thread pool and cache, and recreating a context in an interactive session could be detrimental to performance (because the caches will be invalidated). The pattern I'm worried about is:

with tiledb.ctx(...):
  with tiledb.open(...):
    ..

with tiledb.ctx(...):
  with tiledb.open(...):
    ...

I also think the Ctx/ctx overlap could be confusing; what do you think about ctx -> scope_ctx? (soft suggestion - please discuss if you don't like).

@ihnorton
Copy link
Member

Relatedly: I believe the only reason I restricted updating tiledb.default_ctx was that TBB could only be safely initialized once per process lifetime, so another option here would be to also remove that restriction now that we don't use TBB. That would allow a user to call tiledb.default_ctx again if really needed, which I think would improve the UX for most interactive use-cases.

cc @stavrospapadopoulos @Shelnutt2 @MargrietGroenendijk

@gsakkis
Copy link
Contributor Author

gsakkis commented May 11, 2021

The pattern I'm worried about is:

with tiledb.ctx(...):
  with tiledb.open(...):
    ..

with tiledb.ctx(...):
  with tiledb.open(...):
    ...

Presumably one would(should) use this only if the second context uses a different config from the first, in which case the extra cost of a new ctx is unavoidable. Besides one can already create explicitly contexts with ctx = tiledb.Ctx(), not sure why the context manager would be more prone to suboptimal usage.

I also think the Ctx/ctx overlap could be confusing; what do you think about ctx -> scope_ctx? (soft suggestion - please discuss if you don't like).

No strong opinion on naming, scope_ctx is fine; thanks!

Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion on naming, scope_ctx is fine

I think making the distinction from Ctx is sufficient for my other usage concern. Please make this change, and otherwise this LGTM (pending any refactoring to use it internally). Thank you!

@gsakkis gsakkis force-pushed the gsk/contextvars branch from 09009e8 to 7b1fd6c Compare May 12, 2021 08:27
Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants