forked from pydata/xarray

Merge branch 'main' into speedup-dt-accesor
* main:
  Introduce Grouper objects internally (pydata#7561)
  [skip-ci] Add cftime groupby, resample benchmarks (pydata#7795)
  Fix groupby binary ops when grouped array is subset relative to other (pydata#7798)
  adjust the deprecation policy for python (pydata#7793)
  [pre-commit.ci] pre-commit autoupdate (pydata#7803)
  Allow the label run-upstream to run upstream CI (pydata#7787)
  Update asv links in contributing guide (pydata#7801)
  Implement DataArray.to_dask_dataframe() (pydata#7635)
  `ds.to_dict` with data as arrays, not lists (pydata#7739)
  Add lshift and rshift operators (pydata#7741)
  Use canonical name for set_horizontalalignment over alias set_ha (pydata#7786)
  Remove pandas<2 pin (pydata#7785)
  [pre-commit.ci] pre-commit autoupdate (pydata#7783)
dcherian committed May 6, 2023
2 parents cce3df8 + fde773e commit 993ad42
Showing 28 changed files with 1,133 additions and 422 deletions.
57 changes: 57 additions & 0 deletions .github/workflows/upstream-dev-ci.yaml
@@ -6,6 +6,7 @@ on:
pull_request:
branches:
- main
types: [opened, reopened, synchronize, labeled]
schedule:
- cron: "0 0 * * *" # Daily “At 00:00” UTC
workflow_dispatch: # allows you to trigger the workflow run manually
@@ -41,6 +42,7 @@ jobs:
&& (
(github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
|| needs.detect-ci-trigger.outputs.triggered == 'true'
|| contains( github.event.pull_request.labels.*.name, 'run-upstream')
)
defaults:
run:
@@ -92,3 +94,58 @@ jobs:
uses: xarray-contrib/issue-from-pytest-log@v1
with:
log-path: output-${{ matrix.python-version }}-log.jsonl

mypy-upstream-dev:
name: mypy-upstream-dev
runs-on: ubuntu-latest
needs: detect-ci-trigger
if: |
always()
&& (
contains( github.event.pull_request.labels.*.name, 'run-upstream')
)
defaults:
run:
shell: bash -l {0}
strategy:
fail-fast: false
matrix:
python-version: ["3.10"]
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0 # Fetch all history for all branches and tags.
- name: Set up conda environment
uses: mamba-org/provision-with-micromamba@v15
with:
environment-file: ci/requirements/environment.yml
environment-name: xarray-tests
extra-specs: |
python=${{ matrix.python-version }}
pytest-reportlog
conda
- name: Install upstream versions
run: |
bash ci/install-upstream-wheels.sh
- name: Install xarray
run: |
python -m pip install --no-deps -e .
- name: Version info
run: |
conda info -a
conda list
python xarray/util/print_versions.py
- name: Install mypy
run: |
python -m pip install mypy --force-reinstall
- name: Run mypy
run: |
python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report
- name: Upload mypy coverage to Codecov
uses: codecov/codecov-action@v3.1.3
with:
file: mypy_report/cobertura.xml
flags: mypy
env_vars: PYTHON_VERSION
name: codecov-umbrella
fail_ci_if_error: false
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -16,7 +16,7 @@ repos:
files: ^xarray/
- repo: https://github.com/charliermarsh/ruff-pre-commit
# Ruff version.
rev: 'v0.0.261'
rev: 'v0.0.263'
hooks:
- id: ruff
args: ["--fix"]
1 change: 1 addition & 0 deletions asv_bench/asv.conf.json
@@ -30,6 +30,7 @@
// determined by looking for tools on the PATH environment
// variable.
"environment_type": "conda",
"conda_channels": ["conda-forge"],

// timeout in seconds for installing any dependencies in environment
// defaults to 10 min
67 changes: 54 additions & 13 deletions asv_bench/benchmarks/groupby.py
@@ -18,23 +18,29 @@ def setup(self, *args, **kwargs):
"c": xr.DataArray(np.arange(2 * self.n)),
}
)
self.ds2d = self.ds1d.expand_dims(z=10)
self.ds2d = self.ds1d.expand_dims(z=10).copy()
self.ds1d_mean = self.ds1d.groupby("b").mean()
self.ds2d_mean = self.ds2d.groupby("b").mean()

@parameterized(["ndim"], [(1, 2)])
def time_init(self, ndim):
getattr(self, f"ds{ndim}d").groupby("b")

@parameterized(["method", "ndim"], [("sum", "mean"), (1, 2)])
def time_agg_small_num_groups(self, method, ndim):
@parameterized(
["method", "ndim", "use_flox"], [("sum", "mean"), (1, 2), (True, False)]
)
def time_agg_small_num_groups(self, method, ndim, use_flox):
ds = getattr(self, f"ds{ndim}d")
getattr(ds.groupby("a"), method)().compute()
with xr.set_options(use_flox=use_flox):
getattr(ds.groupby("a"), method)().compute()

@parameterized(["method", "ndim"], [("sum", "mean"), (1, 2)])
def time_agg_large_num_groups(self, method, ndim):
@parameterized(
["method", "ndim", "use_flox"], [("sum", "mean"), (1, 2), (True, False)]
)
def time_agg_large_num_groups(self, method, ndim, use_flox):
ds = getattr(self, f"ds{ndim}d")
getattr(ds.groupby("b"), method)().compute()
with xr.set_options(use_flox=use_flox):
getattr(ds.groupby("b"), method)().compute()

def time_binary_op_1d(self):
(self.ds1d.groupby("b") - self.ds1d_mean).compute()
@@ -115,15 +121,21 @@ def setup(self, *args, **kwargs):
def time_init(self, ndim):
getattr(self, f"ds{ndim}d").resample(time="D")

@parameterized(["method", "ndim"], [("sum", "mean"), (1, 2)])
def time_agg_small_num_groups(self, method, ndim):
@parameterized(
["method", "ndim", "use_flox"], [("sum", "mean"), (1, 2), (True, False)]
)
def time_agg_small_num_groups(self, method, ndim, use_flox):
ds = getattr(self, f"ds{ndim}d")
getattr(ds.resample(time="3M"), method)().compute()
with xr.set_options(use_flox=use_flox):
getattr(ds.resample(time="3M"), method)().compute()

@parameterized(["method", "ndim"], [("sum", "mean"), (1, 2)])
def time_agg_large_num_groups(self, method, ndim):
@parameterized(
["method", "ndim", "use_flox"], [("sum", "mean"), (1, 2), (True, False)]
)
def time_agg_large_num_groups(self, method, ndim, use_flox):
ds = getattr(self, f"ds{ndim}d")
getattr(ds.resample(time="48H"), method)().compute()
with xr.set_options(use_flox=use_flox):
getattr(ds.resample(time="48H"), method)().compute()


class ResampleDask(Resample):
@@ -132,3 +144,32 @@ def setup(self, *args, **kwargs):
super().setup(**kwargs)
self.ds1d = self.ds1d.chunk({"time": 50})
self.ds2d = self.ds2d.chunk({"time": 50, "z": 4})


class ResampleCFTime(Resample):
def setup(self, *args, **kwargs):
self.ds1d = xr.Dataset(
{
"b": ("time", np.arange(365.0 * 24)),
},
coords={
"time": xr.date_range(
"2001-01-01", freq="H", periods=365 * 24, calendar="noleap"
)
},
)
self.ds2d = self.ds1d.expand_dims(z=10)
self.ds1d_mean = self.ds1d.resample(time="48H").mean()
self.ds2d_mean = self.ds2d.resample(time="48H").mean()


@parameterized(["use_cftime", "use_flox"], [[True, False], [True, False]])
class GroupByLongTime:
def setup(self, use_cftime, use_flox):
arr = np.random.randn(10, 10, 365 * 30)
time = xr.date_range("2000", periods=30 * 365, use_cftime=use_cftime)
self.da = xr.DataArray(arr, dims=("y", "x", "time"), coords={"time": time})

def time_mean(self, use_cftime, use_flox):
with xr.set_options(use_flox=use_flox):
self.da.groupby("time.year").mean()
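
The toggle exercised by these benchmarks is ordinary user-facing API. A minimal sketch of switching the flox-accelerated groupby path on and off with ``xr.set_options`` (the dataset below is illustrative, and the ``use_flox=True`` path assumes flox is installed):

    import numpy as np
    import xarray as xr

    # 1000 points falling into 10 groups via the "b" coordinate
    ds = xr.Dataset(
        {"a": ("x", np.random.randn(1000))},
        coords={"b": ("x", np.arange(1000) % 10)},
    )

    with xr.set_options(use_flox=False):  # plain xarray groupby path
        expected = ds.groupby("b").mean()

    with xr.set_options(use_flox=True):  # flox-accelerated path
        actual = ds.groupby("b").mean()

    assert expected.identical(actual)  # same result either way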
5 changes: 4 additions & 1 deletion ci/min_deps_check.py
@@ -29,7 +29,7 @@
"pytest-timeout",
}

POLICY_MONTHS = {"python": 24, "numpy": 18}
POLICY_MONTHS = {"python": 30, "numpy": 18}
POLICY_MONTHS_DEFAULT = 12
POLICY_OVERRIDE: dict[str, tuple[int, int]] = {}
errors = []
@@ -109,6 +109,9 @@ def metadata(entry):
(3, 6): datetime(2016, 12, 23),
(3, 7): datetime(2018, 6, 27),
(3, 8): datetime(2019, 10, 14),
(3, 9): datetime(2020, 10, 5),
(3, 10): datetime(2021, 10, 4),
(3, 11): datetime(2022, 10, 24),
}
)

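As a rough illustration of what the bumped 30-month window implies (approximate arithmetic only, not the actual logic of ``min_deps_check.py``):

    from datetime import datetime, timedelta

    POLICY_MONTHS = {"python": 30, "numpy": 18}
    py39_release = datetime(2020, 10, 5)  # from the table above
    # approximate a month as 30 days
    drop_after = py39_release + timedelta(days=30 * POLICY_MONTHS["python"])
    print(drop_after.date())  # roughly 2023-03: earliest Python 3.9 could be dropped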
1 change: 1 addition & 0 deletions doc/api.rst
@@ -632,6 +632,7 @@ DataArray methods
DataArray.from_iris
DataArray.from_series
DataArray.to_cdms2
DataArray.to_dask_dataframe
DataArray.to_dataframe
DataArray.to_dataset
DataArray.to_dict
6 changes: 3 additions & 3 deletions doc/contributing.rst
@@ -829,17 +829,17 @@ Running the performance test suite
Performance matters and it is worth considering whether your code has introduced
performance regressions. *xarray* is starting to write a suite of benchmarking tests
using `asv <https://github.com/spacetelescope/asv>`__
using `asv <https://github.com/airspeed-velocity/asv>`__
to enable easy monitoring of the performance of critical *xarray* operations.
These benchmarks are all found in the ``xarray/asv_bench`` directory.
To use all features of asv, you will need either ``conda`` or
``virtualenv``. For more details please check the `asv installation
webpage <https://asv.readthedocs.io/en/latest/installing.html>`_.
webpage <https://asv.readthedocs.io/en/stable/installing.html>`_.
To install asv::
pip install git+https://github.com/spacetelescope/asv
python -m pip install asv
If you need to run a benchmark, change your directory to ``asv_bench/`` and run::
2 changes: 1 addition & 1 deletion doc/getting-started-guide/installing.rst
@@ -86,7 +86,7 @@ Minimum dependency versions
Xarray adopts a rolling policy regarding the minimum supported version of its
dependencies:

- **Python:** 24 months
- **Python:** 30 months
(`NEP-29 <https://numpy.org/neps/nep-0029-deprecation_policy.html>`_)
- **numpy:** 18 months
(`NEP-29 <https://numpy.org/neps/nep-0029-deprecation_policy.html>`_)
4 changes: 4 additions & 0 deletions doc/user-guide/computation.rst
@@ -63,6 +63,10 @@ Data arrays also implement many :py:class:`numpy.ndarray` methods:
arr.round(2)
arr.T
intarr = xr.DataArray([0, 1, 2, 3, 4, 5])
intarr << 2 # only supported for int types
intarr >> 1
.. _missing_values:

Missing values
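A runnable version of the snippet added above (pydata#7741); the shift amounts are illustrative, and the operators are only defined for integer-typed arrays:

    import xarray as xr

    intarr = xr.DataArray([0, 1, 2, 3, 4, 5])
    print(intarr << 2)  # like multiplying by 4 -> [0, 4, 8, 12, 16, 20]
    print(intarr >> 1)  # like floor-dividing by 2 -> [0, 0, 1, 1, 2, 2]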
15 changes: 13 additions & 2 deletions doc/whats-new.rst
@@ -22,19 +22,26 @@ v2023.05.0 (unreleased)

New Features
~~~~~~~~~~~~
- Added new method :py:meth:`DataArray.to_dask_dataframe`, which converts a DataArray into a dask DataFrame (:issue:`7409`).
By `Deeksha <https://github.com/dsgreen2>`_.
- Add support for lshift and rshift binary operators (``<<``, ``>>``) on
  :py:class:`xr.DataArray` of type :py:class:`int` (:issue:`7727`, :pull:`7741`).
By `Alan Brammer <https://github.com/abrammer>`_.
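
A minimal sketch of the new :py:meth:`DataArray.to_dask_dataframe` (pydata#7635); the array, name, and chunk size below are illustrative:

    import numpy as np
    import xarray as xr

    da = xr.DataArray(
        np.arange(6.0), dims="time", name="temperature"
    ).chunk({"time": 3})

    ddf = da.to_dask_dataframe()  # a dask.dataframe.DataFrame
    print(ddf.compute())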


Breaking changes
~~~~~~~~~~~~~~~~

- Adjust the deprecation policy for Python to once again align with NEP-29 (:issue:`7765`, :pull:`7793`).
By `Justus Magin <https://github.com/keewis>`_.

Deprecations
~~~~~~~~~~~~


Bug fixes
~~~~~~~~~

- Fix groupby binary ops when the grouped array is a subset relative to the other array (:issue:`7797`).
By `Deepak Cherian <https://github.com/dcherian>`_.

Documentation
~~~~~~~~~~~~~
@@ -102,6 +109,10 @@ New Features
- Added ability to save ``DataArray`` objects directly to Zarr using :py:meth:`~xarray.DataArray.to_zarr`.
  (:issue:`7692`, :pull:`7693`).
By `Joe Hamman <https://github.com/jhamman>`_.
- Keyword argument ``data='array'`` to both :py:meth:`xarray.Dataset.to_dict` and
  :py:meth:`xarray.DataArray.to_dict` will now return data as the underlying array type. Python lists are returned for ``data='list'`` or ``data=True``. Supplying ``data=False`` only returns the schema without data, and ``encoding=True`` additionally returns the encoding dictionary for the underlying variable.
  (:issue:`1599`, :pull:`7739`).
By `James McCreight <https://github.com/jmccreight>`_.
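
A minimal sketch of the new ``data`` keyword (pydata#7739); the dataset is illustrative:

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"a": ("x", np.arange(3))})

    as_arrays = ds.to_dict(data="array")       # values kept as numpy arrays
    as_lists = ds.to_dict(data="list")         # plain Python lists (same as data=True)
    schema_only = ds.to_dict(data=False)       # structure and metadata, no data
    with_encoding = ds.to_dict(encoding=True)  # adds each variable's encoding dict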

Breaking changes
~~~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion requirements.txt
@@ -4,4 +4,4 @@

numpy >= 1.21
packaging >= 21.3
pandas >= 1.4, <2
pandas >= 1.4
