forked from pydata/xarray

Merge branch 'main' into speedup-dt-accesor
* main:
  Introduce Grouper objects internally (pydata#7561)
  [skip-ci] Add cftime groupby, resample benchmarks (pydata#7795)
  Fix groupby binary ops when grouped array is subset relative to other (pydata#7798)
  adjust the deprecation policy for python (pydata#7793)
  [pre-commit.ci] pre-commit autoupdate (pydata#7803)
  Allow the label run-upstream to run upstream CI (pydata#7787)
  Update asv links in contributing guide (pydata#7801)
  Implement DataArray.to_dask_dataframe() (pydata#7635)
  `ds.to_dict` with data as arrays, not lists (pydata#7739)
  Add lshift and rshift operators (pydata#7741)
  Use canonical name for set_horizontalalignment over alias set_ha (pydata#7786)
  Remove pandas<2 pin (pydata#7785)
  [pre-commit.ci] pre-commit autoupdate (pydata#7783)
dcherian committed May 6, 2023
2 parents cce3df8 + fde773e commit 993ad42
Showing 28 changed files with 1,133 additions and 422 deletions.
57 changes: 57 additions & 0 deletions .github/workflows/upstream-dev-ci.yaml
@@ -6,6 +6,7 @@ on:
pull_request:
branches:
- main
types: [opened, reopened, synchronize, labeled]
schedule:
- cron: "0 0 * * *" # Daily “At 00:00” UTC
workflow_dispatch: # allows you to trigger the workflow run manually
@@ -41,6 +42,7 @@ jobs:
&& (
(github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
|| needs.detect-ci-trigger.outputs.triggered == 'true'
|| contains( github.event.pull_request.labels.*.name, 'run-upstream')
)
defaults:
run:
@@ -92,3 +94,58 @@ jobs:
uses: xarray-contrib/issue-from-pytest-log@v1
with:
log-path: output-${{ matrix.python-version }}-log.jsonl

mypy-upstream-dev:
name: mypy-upstream-dev
runs-on: ubuntu-latest
needs: detect-ci-trigger
if: |
always()
&& (
contains( github.event.pull_request.labels.*.name, 'run-upstream')
)
defaults:
run:
shell: bash -l {0}
strategy:
fail-fast: false
matrix:
python-version: ["3.10"]
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0 # Fetch all history for all branches and tags.
- name: Set up conda environment
uses: mamba-org/provision-with-micromamba@v15
with:
environment-file: ci/requirements/environment.yml
environment-name: xarray-tests
extra-specs: |
python=${{ matrix.python-version }}
pytest-reportlog
conda
- name: Install upstream versions
run: |
bash ci/install-upstream-wheels.sh
- name: Install xarray
run: |
python -m pip install --no-deps -e .
- name: Version info
run: |
conda info -a
conda list
python xarray/util/print_versions.py
- name: Install mypy
run: |
python -m pip install mypy --force-reinstall
- name: Run mypy
run: |
python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report
- name: Upload mypy coverage to Codecov
uses: codecov/codecov-action@v3.1.3
with:
file: mypy_report/cobertura.xml
flags: mypy
env_vars: PYTHON_VERSION
name: codecov-umbrella
fail_ci_if_error: false
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -16,7 +16,7 @@ repos:
files: ^xarray/
- repo: https://github.com/charliermarsh/ruff-pre-commit
# Ruff version.
rev: 'v0.0.261'
rev: 'v0.0.263'
hooks:
- id: ruff
args: ["--fix"]
1 change: 1 addition & 0 deletions asv_bench/asv.conf.json
@@ -30,6 +30,7 @@
// determined by looking for tools on the PATH environment
// variable.
"environment_type": "conda",
"conda_channels": ["conda-forge"],

// timeout in seconds for installing any dependencies in environment
// defaults to 10 min
67 changes: 54 additions & 13 deletions asv_bench/benchmarks/groupby.py
@@ -18,23 +18,29 @@ def setup(self, *args, **kwargs):
"c": xr.DataArray(np.arange(2 * self.n)),
}
)
self.ds2d = self.ds1d.expand_dims(z=10)
self.ds2d = self.ds1d.expand_dims(z=10).copy()
self.ds1d_mean = self.ds1d.groupby("b").mean()
self.ds2d_mean = self.ds2d.groupby("b").mean()

@parameterized(["ndim"], [(1, 2)])
def time_init(self, ndim):
getattr(self, f"ds{ndim}d").groupby("b")

@parameterized(["method", "ndim"], [("sum", "mean"), (1, 2)])
def time_agg_small_num_groups(self, method, ndim):
@parameterized(
["method", "ndim", "use_flox"], [("sum", "mean"), (1, 2), (True, False)]
)
def time_agg_small_num_groups(self, method, ndim, use_flox):
ds = getattr(self, f"ds{ndim}d")
getattr(ds.groupby("a"), method)().compute()
with xr.set_options(use_flox=use_flox):
getattr(ds.groupby("a"), method)().compute()

@parameterized(["method", "ndim"], [("sum", "mean"), (1, 2)])
def time_agg_large_num_groups(self, method, ndim):
@parameterized(
["method", "ndim", "use_flox"], [("sum", "mean"), (1, 2), (True, False)]
)
def time_agg_large_num_groups(self, method, ndim, use_flox):
ds = getattr(self, f"ds{ndim}d")
getattr(ds.groupby("b"), method)().compute()
with xr.set_options(use_flox=use_flox):
getattr(ds.groupby("b"), method)().compute()

def time_binary_op_1d(self):
(self.ds1d.groupby("b") - self.ds1d_mean).compute()
@@ -115,15 +121,21 @@ def setup(self, *args, **kwargs):
def time_init(self, ndim):
getattr(self, f"ds{ndim}d").resample(time="D")

@parameterized(["method", "ndim"], [("sum", "mean"), (1, 2)])
def time_agg_small_num_groups(self, method, ndim):
@parameterized(
["method", "ndim", "use_flox"], [("sum", "mean"), (1, 2), (True, False)]
)
def time_agg_small_num_groups(self, method, ndim, use_flox):
ds = getattr(self, f"ds{ndim}d")
getattr(ds.resample(time="3M"), method)().compute()
with xr.set_options(use_flox=use_flox):
getattr(ds.resample(time="3M"), method)().compute()

@parameterized(["method", "ndim"], [("sum", "mean"), (1, 2)])
def time_agg_large_num_groups(self, method, ndim):
@parameterized(
["method", "ndim", "use_flox"], [("sum", "mean"), (1, 2), (True, False)]
)
def time_agg_large_num_groups(self, method, ndim, use_flox):
ds = getattr(self, f"ds{ndim}d")
getattr(ds.resample(time="48H"), method)().compute()
with xr.set_options(use_flox=use_flox):
getattr(ds.resample(time="48H"), method)().compute()


class ResampleDask(Resample):
@@ -132,3 +144,32 @@ def setup(self, *args, **kwargs):
super().setup(**kwargs)
self.ds1d = self.ds1d.chunk({"time": 50})
self.ds2d = self.ds2d.chunk({"time": 50, "z": 4})


class ResampleCFTime(Resample):
def setup(self, *args, **kwargs):
self.ds1d = xr.Dataset(
{
"b": ("time", np.arange(365.0 * 24)),
},
coords={
"time": xr.date_range(
"2001-01-01", freq="H", periods=365 * 24, calendar="noleap"
)
},
)
self.ds2d = self.ds1d.expand_dims(z=10)
self.ds1d_mean = self.ds1d.resample(time="48H").mean()
self.ds2d_mean = self.ds2d.resample(time="48H").mean()


@parameterized(["use_cftime", "use_flox"], [[True, False], [True, False]])
class GroupByLongTime:
def setup(self, use_cftime, use_flox):
arr = np.random.randn(10, 10, 365 * 30)
time = xr.date_range("2000", periods=30 * 365, use_cftime=use_cftime)
self.da = xr.DataArray(arr, dims=("y", "x", "time"), coords={"time": time})

def time_mean(self, use_cftime, use_flox):
with xr.set_options(use_flox=use_flox):
self.da.groupby("time.year").mean()
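
The toggle exercised by these benchmarks is ordinary user-facing API. A minimal sketch of switching the flox-accelerated groupby path on and off with ``xr.set_options`` (the dataset below is illustrative, and the ``use_flox=True`` path assumes flox is installed):

    import numpy as np
    import xarray as xr

    # 1000 points falling into 10 groups via the "b" coordinate
    ds = xr.Dataset(
        {"a": ("x", np.random.randn(1000))},
        coords={"b": ("x", np.arange(1000) % 10)},
    )

    with xr.set_options(use_flox=False):  # plain xarray groupby path
        expected = ds.groupby("b").mean()

    with xr.set_options(use_flox=True):  # flox-accelerated path
        actual = ds.groupby("b").mean()

    assert expected.identical(actual)  # same result either way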
5 changes: 4 additions & 1 deletion ci/min_deps_check.py
@@ -29,7 +29,7 @@
"pytest-timeout",
}

POLICY_MONTHS = {"python": 24, "numpy": 18}
POLICY_MONTHS = {"python": 30, "numpy": 18}
POLICY_MONTHS_DEFAULT = 12
POLICY_OVERRIDE: dict[str, tuple[int, int]] = {}
errors = []
@@ -109,6 +109,9 @@ def metadata(entry):
(3, 6): datetime(2016, 12, 23),
(3, 7): datetime(2018, 6, 27),
(3, 8): datetime(2019, 10, 14),
(3, 9): datetime(2020, 10, 5),
(3, 10): datetime(2021, 10, 4),
(3, 11): datetime(2022, 10, 24),
}
)

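As a rough illustration of what the bumped 30-month window implies (approximate arithmetic only, not the actual logic of ``min_deps_check.py``):

    from datetime import datetime, timedelta

    POLICY_MONTHS = {"python": 30, "numpy": 18}
    py39_release = datetime(2020, 10, 5)  # from the table above
    # approximate a month as 30 days
    drop_after = py39_release + timedelta(days=30 * POLICY_MONTHS["python"])
    print(drop_after.date())  # roughly 2023-03: earliest Python 3.9 could be dropped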
1 change: 1 addition & 0 deletions doc/api.rst
@@ -632,6 +632,7 @@ DataArray methods
DataArray.from_iris
DataArray.from_series
DataArray.to_cdms2
DataArray.to_dask_dataframe
DataArray.to_dataframe
DataArray.to_dataset
DataArray.to_dict
6 changes: 3 additions & 3 deletions doc/contributing.rst
@@ -829,17 +829,17 @@ Running the performance test suite
Performance matters and it is worth considering whether your code has introduced
performance regressions. *xarray* is starting to write a suite of benchmarking tests
using `asv <https://github.com/spacetelescope/asv>`__
using `asv <https://github.com/airspeed-velocity/asv>`__
to enable easy monitoring of the performance of critical *xarray* operations.
These benchmarks are all found in the ``xarray/asv_bench`` directory.
To use all features of asv, you will need either ``conda`` or
``virtualenv``. For more details please check the `asv installation
webpage <https://asv.readthedocs.io/en/latest/installing.html>`_.
webpage <https://asv.readthedocs.io/en/stable/installing.html>`_.
To install asv::
pip install git+https://github.com/spacetelescope/asv
python -m pip install asv
If you need to run a benchmark, change your directory to ``asv_bench/`` and run::
2 changes: 1 addition & 1 deletion doc/getting-started-guide/installing.rst
@@ -86,7 +86,7 @@ Minimum dependency versions
Xarray adopts a rolling policy regarding the minimum supported version of its
dependencies:

- **Python:** 24 months
- **Python:** 30 months
(`NEP-29 <https://numpy.org/neps/nep-0029-deprecation_policy.html>`_)
- **numpy:** 18 months
(`NEP-29 <https://numpy.org/neps/nep-0029-deprecation_policy.html>`_)
4 changes: 4 additions & 0 deletions doc/user-guide/computation.rst
@@ -63,6 +63,10 @@ Data arrays also implement many :py:class:`numpy.ndarray` methods:
arr.round(2)
arr.T
intarr = xr.DataArray([0, 1, 2, 3, 4, 5])
intarr << 2 # only supported for int types
intarr >> 1
.. _missing_values:

Missing values
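A runnable version of the snippet added above (pydata#7741); the shift amounts are illustrative, and the operators are only defined for integer-typed arrays:

    import xarray as xr

    intarr = xr.DataArray([0, 1, 2, 3, 4, 5])
    print(intarr << 2)  # like multiplying by 4 -> [0, 4, 8, 12, 16, 20]
    print(intarr >> 1)  # like floor-dividing by 2 -> [0, 0, 1, 1, 2, 2]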
15 changes: 13 additions & 2 deletions doc/whats-new.rst
@@ -22,19 +22,26 @@ v2023.05.0 (unreleased)

New Features
~~~~~~~~~~~~
- Added new method :py:meth:`DataArray.to_dask_dataframe`, which converts a DataArray into a dask DataFrame (:issue:`7409`).
By `Deeksha <https://github.com/dsgreen2>`_.
- Add support for lshift and rshift binary operators (``<<``, ``>>``) on
  :py:class:`xr.DataArray` of type :py:class:`int` (:issue:`7727`, :pull:`7741`).
By `Alan Brammer <https://github.com/abrammer>`_.
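
A minimal sketch of the new :py:meth:`DataArray.to_dask_dataframe` (pydata#7635); the array, name, and chunk size below are illustrative:

    import numpy as np
    import xarray as xr

    da = xr.DataArray(
        np.arange(6.0), dims="time", name="temperature"
    ).chunk({"time": 3})

    ddf = da.to_dask_dataframe()  # a dask.dataframe.DataFrame
    print(ddf.compute())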


Breaking changes
~~~~~~~~~~~~~~~~

- Adjust the deprecation policy for Python to once again align with NEP-29 (:issue:`7765`, :pull:`7793`).
By `Justus Magin <https://github.com/keewis>`_.

Deprecations
~~~~~~~~~~~~


Bug fixes
~~~~~~~~~

- Fix groupby binary ops when the grouped array is a subset relative to the other array (:issue:`7797`).
By `Deepak Cherian <https://github.com/dcherian>`_.

Documentation
~~~~~~~~~~~~~
@@ -102,6 +109,10 @@ New Features
- Added ability to save ``DataArray`` objects directly to Zarr using :py:meth:`~xarray.DataArray.to_zarr`.
  (:issue:`7692`, :pull:`7693`).
By `Joe Hamman <https://github.com/jhamman>`_.
- Keyword argument ``data='array'`` to both :py:meth:`xarray.Dataset.to_dict` and
  :py:meth:`xarray.DataArray.to_dict` will now return data as the underlying array type. Python lists are returned for ``data='list'`` or ``data=True``. Supplying ``data=False`` only returns the schema without data, and ``encoding=True`` additionally returns the encoding dictionary for the underlying variable.
  (:issue:`1599`, :pull:`7739`).
By `James McCreight <https://github.com/jmccreight>`_.
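
A minimal sketch of the new ``data`` keyword (pydata#7739); the dataset is illustrative:

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"a": ("x", np.arange(3))})

    as_arrays = ds.to_dict(data="array")       # values kept as numpy arrays
    as_lists = ds.to_dict(data="list")         # plain Python lists (same as data=True)
    schema_only = ds.to_dict(data=False)       # structure and metadata, no data
    with_encoding = ds.to_dict(encoding=True)  # adds each variable's encoding dict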

Breaking changes
~~~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion requirements.txt
@@ -4,4 +4,4 @@

numpy >= 1.21
packaging >= 21.3
pandas >= 1.4, <2
pandas >= 1.4
