WIP Spark 56763 sarutak 3.5 restore additional functionality r2 #55886

Draft
holdenk wants to merge 20 commits into apache:branch-3.5 from holdenk:SPARK-56763-sarutak-3.5-restore-additional-functionality-r2

holdenk commented May 14, 2026

What changes were proposed in this pull request?

This is a rebase of https://github.com/apache/spark/pull/55740/changes on top of the PPA and Docker fix.

This re-enables the R doc build and Python 3.8.

For type testing to continue to work on Python 3.8, it changes how we fall back when the torch import fails, given that torch no longer supports 3.8.
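
For reviewers who want a concrete picture of the fallback, here is a minimal sketch of a guarded optional import. It is an illustration only, not the code in this PR: `optional_import`, `have_torch`, and `require_torch` are hypothetical names, and the actual guards in `ml/connect/classification.py` and `ml/torch/data.py` may look different.

```python
import importlib
from typing import Any, Optional


def optional_import(module_name: str) -> Optional[Any]:
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package lands here instead of crashing the importing module.
        return None


# Hypothetical module-level guard: torch may be absent on Python 3.8.
torch = optional_import("torch")
have_torch = torch is not None


def require_torch() -> Any:
    """Fail at call time with a clear message instead of at import time."""
    if torch is None:
        raise ImportError("torch is required for this feature but is not installed")
    return torch
```

With a guard along these lines, a test runner can import the module and skip torch-dependent tests when `have_torch` is false, rather than failing during collection.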

Why are the changes needed?

Our R version floats, and various changes in R 4.4 have broken CI. Similarly, many of our Python dependencies float, which broke MyPy type checking.

Note: I plan to follow up with a separate PR to pin our R version (in this branch) back to 4.3, but for now let's fix it. We can also pin to 4.4 if people prefer, but I do want to pin the R version eventually.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Base image build workflow passes on GitHub Actions.
  • docker build dev/infra succeeds locally.

Was this patch authored or co-authored using generative AI tooling?

Kiro CLI / Opus 4.6

sarutak and others added 12 commits May 10, 2026 14:46
### What changes were proposed in this pull request?
Add `apt-get update` before `apt-get install` for R-related dev libraries to avoid stale package index causing 404 errors.

### Why are the changes needed?
The `apt-get install` for R dev dependencies (libtiff5-dev, libharfbuzz-dev, etc.) is in a separate RUN layer from the earlier `apt-get update`, so when the package index becomes stale (packages are superseded on the Ubuntu archive), the install fails with 404.

### Does this PR introduce *any* user-facing change?
No.

### How was this patch tested?
CI.

### Was this patch authored or co-authored using generative AI tooling?
No.

Primitive functions (e.g., min, max, sum) do not have environments and
attempting to set one via environment<- has no effect. Since R 4.4.0,
this operation emits a deprecation warning, which causes test failures
when running with options(warn = 2).

Add is.primitive() guards in both processClosure and cleanClosure so
that primitive functions are handled without attempting to access or
modify their environment.

Pin Werkzeug==2.1.2 in Dockerfile to maintain compatibility with
markupsafe==2.0.1 used in the workflow lint step.

Pin ragg==1.2.5 in the workflow before pkgdown installation because
ragg 1.5.x requires libwebp which is not available in the Docker
image, and its configure script fails to find freetype2 headers.

…763-sarutak-3.5-restore-additional-functionality-r2
sfc-gh-hkarau and others added 2 commits May 14, 2026 20:35
- Fix PEP 585 dict[K,V] syntax in plan.py (runtime TypeError on 3.8)
- Add grpcio/protobuf stack for python3.8 in Dockerfile
- Guard unconditional torch imports in ml/connect/classification.py and
  ml/torch/data.py so missing torch fails gracefully instead of crashing
  the test runner
- Restore python3.8 to default executables in python/run-tests.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
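
As a rough illustration of the PEP 585 item in the commit above: on Python 3.8, subscripting the built-in `dict` at runtime raises `TypeError: 'type' object is not subscriptable`, while `typing.Dict` works on 3.8 and later. The sketch below assumes a fix of that general shape; `ColumnCounts` and `count_columns` are hypothetical names and are not taken from `plan.py`.

```python
from typing import Dict, List

# Evaluated at import time, so it must be valid on Python 3.8.
# Writing `dict[str, int]` here would raise TypeError on 3.8.
ColumnCounts = Dict[str, int]


def count_columns(names: List[str]) -> ColumnCounts:
    """Count how often each column name appears."""
    counts: ColumnCounts = {}
    for name in names:
        counts[name] = counts.get(name, 0) + 1
    return counts
```

For example, `count_columns(["a", "b", "a"])` returns `{"a": 2, "b": 1}` on any supported Python version.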

holdenk commented May 14, 2026

CC @sarutak & @gaogaotiantian & @zhengruifeng

holdenk commented May 14, 2026

Oh also CC @devin-petersohn
