Skip to content

Made new xarray context behavior to register multiple tables from a dataset#166

Merged
alxmrs merged 8 commits into
mainfrom
xarray-context
May 17, 2026
Merged

Made new xarray context behavior to register multiple tables from a dataset#166
alxmrs merged 8 commits into
mainfrom
xarray-context

Conversation

@eparkle
Copy link
Copy Markdown
Collaborator

@eparkle eparkle commented Apr 29, 2026

This should account for the differing dimensions between variables

Copy link
Copy Markdown
Owner

@alxmrs alxmrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good! Here are my thoughts on design.

Comment thread xarray_sql/sql.py
return self


def _group_vars_by_dims(ds: xr.Dataset) -> dict[tuple[str, ...], list[str]]:
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see a few unit tests for this method, even though it is private.

Comment thread xarray_sql/sql.py Outdated

return self

def from_dataset_multi(
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level feedback: I think we should roll this into from_dataset and it should be the "default" behavior. Since we are an experimental project, I'm OK with changing the public API.

I think @ghostiee-11 or @ahuang11 might have a few ideas about what good names would be for these kwargs, i.e. what is most conventional for Xarray. So far, these LGTM.

Comment thread xarray_sql/sql.py Outdated

def from_dataset_multi(
self,
dataset_name: str,
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be table_name IMO

Comment thread xarray_sql/sql.py Outdated
self,
dataset_name: str,
input_table: xr.Dataset,
dim_aliases: dict[tuple[str, ...], str] | None = None,
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like "aliases", but "dim" is a bit unclear; I'm not sure exactly what these should be called. Maybe "dim_group_aliases"? I'm curious what @ahuang11 and @ghostiee-11 think, too.

Comment thread xarray_sql/sql.py Outdated
dataset_name: A name for the dataset, used as a prefix for all
registered table names.
input_table: An xarray Dataset. Variables may have differing dimensions.
dim_aliases: Optional mapping from dimension tuples to custom table
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that we have an aliases override.

Comment thread xarray_sql/sql.py Outdated
suffix = dim_aliases.get(dims, "_".join(dims))
table_name = f"{dataset_name}_{suffix}"
sub_ds = input_table[var_names]
self.from_dataset(table_name, sub_ds, chunks)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this pattern, so much so that I think in the new version that we could turn the existing from_dataset into a private method (_from_dataset). Then, in a method like this, we could have a big if or return early logic to tell if there is more than one group of dims.

@alxmrs alxmrs merged commit 15231d3 into main May 17, 2026
12 checks passed
@alxmrs alxmrs deleted the xarray-context branch May 17, 2026 23:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants