
[SPARK-56889][INFRA][FOLLOWUP] Restore python3.10 in dev/infra/Dockerfile#55946

Closed
zhengruifeng wants to merge 1 commit into apache:master from zhengruifeng:restore-infra-dockerfile-python310


@zhengruifeng
Contributor

What changes were proposed in this pull request?

This is a partial revert of #55914 (SPARK-56889), restricted to dev/infra/Dockerfile. It restores the file to its state at 0a0d31bea00~1, switching the system Python used in the base CI image from python3.11 back to python3.10. The other 20 files changed by #55914 are kept as-is.
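
A single-file partial revert of this kind can be sketched with `git checkout <commit>~1 -- <path>`. The PR does not say which command was actually used, so the following is only an illustration in a throwaway repository. The file contents, commit messages, and `other.txt` are hypothetical stand-ins for the real multi-file change.

```shell
#!/bin/sh
# Throwaway demo of a single-file partial revert.
# All contents here are hypothetical; nothing touches a real Spark checkout.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit 1: baseline state of both files.
mkdir -p dev/infra
echo "python3.10" > dev/infra/Dockerfile
echo "v1" > other.txt
git add -A
git commit -q -m "baseline"

# Commit 2: a change that touches both files (stand-in for the multi-file change).
echo "python3.11" > dev/infra/Dockerfile
echo "v2" > other.txt
git commit -q -am "upgrade"

# Partial revert: restore only one path to its state before commit 2.
git checkout HEAD~1 -- dev/infra/Dockerfile
git commit -q -m "Restore dev/infra/Dockerfile"

cat dev/infra/Dockerfile   # the restored file
cat other.txt              # left as changed by commit 2
```

In a Spark checkout, the analogous command would presumably be `git checkout 0a0d31bea00~1 -- dev/infra/Dockerfile`, followed by a commit.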

Why are the changes needed?

The Build / Cache base image workflow has been failing on every branch since #55914 was merged (2026-05-16): https://github.com/apache/spark/actions/runs/26010261434/job/76449371588

The job aborts at the first Build and push step (building the base ./dev/infra/ image) with:

RUN add-apt-repository ppa:deadsnakes/ppa
  ...
  ModuleNotFoundError: No module named 'pyparsing'
ERROR: process "/bin/sh -c add-apt-repository ppa:deadsnakes/ppa" did not complete successfully: exit code: 1

This breaks the cache build for all downstream image jobs, since the base layer is shared. Restoring the previous Dockerfile is the smallest change that unblocks the cache workflow while a forward fix is investigated.
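
The traceback indicates that `add-apt-repository` runs as a Python program that ends up importing `pyparsing`. A quick way to check for this condition inside an image is to test whether the interpreter can import the module. This is a minimal sketch: the helper name `can_import` is hypothetical, and it assumes that `python3` on `PATH` is the interpreter `add-apt-repository` uses.

```shell
#!/bin/sh
# Report whether the given interpreter can import a module.
# Exit status 0 means importable; non-zero means the import failed.
can_import() {
    interpreter=$1
    module=$2
    "$interpreter" -c "import $module" >/dev/null 2>&1
}

# Assumption: python3 on PATH is the system interpreter in question.
if can_import python3 pyparsing; then
    echo "pyparsing is importable by python3"
else
    echo "pyparsing is NOT importable by python3"
fi
```

Running this as a step just before `add-apt-repository` would show whether the module is already missing at that point in the build.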

Does this PR introduce any user-facing change?

No. This is a CI-only change. PySpark itself is unaffected; only the system Python inside the test image changes.

How was this patch tested?

This PR re-runs Build / Cache base image against the restored Dockerfile.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (model: claude-opus-4-7)

@zhengruifeng zhengruifeng marked this pull request as ready for review May 18, 2026 03:56
@zhengruifeng
Contributor Author

maybe we don't need this image in master/4.x, but let me restore it first

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for reporting, but shall we fix the root cause?

According to the report, we can install python3-pyparsing before add-apt-repository ppa:deadsnakes/ppa, can't we?

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you, @zhengruifeng and @LuciferYang. I made an alternative follow-up.

@dongjoon-hyun
Member

dongjoon-hyun commented May 18, 2026

It seems that the alternative isn't enough. Let's merge this to recover CIs.

dongjoon-hyun pushed a commit that referenced this pull request May 18, 2026
…file

Closes #55946 from zhengruifeng/restore-infra-dockerfile-python310.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 3608538)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Member

Merged to master/4.x.

For the CIs, I'll keep monitoring the effective Python version.

@dongjoon-hyun
Member

Thank you again, @zhengruifeng and @LuciferYang.

@zhengruifeng zhengruifeng deleted the restore-infra-dockerfile-python310 branch May 19, 2026 00:35
zhengruifeng pushed a commit that referenced this pull request May 19, 2026
…ng from dockerfile

### What changes were proposed in this pull request?

Remove `python3-pyparsing` from `dev/infra/Dockerfile`.

### Why are the changes needed?

This line was added in #55956 as an attempt to fix the image build issue caused by removing Python 3.10.

This was not the correct fix. CI failed because we somehow uninstalled `pyparsing` from the system python3. We mitigated the issue later in #55946. We should not keep this line around, since it could confuse people in the future.

This specific Dockerfile is only used by Spark 3.5, so we should just retire the file after Spark 3.5 reaches EOL. We don't have to upgrade the Python 3.10 part (we didn't touch the Python 3.9 part of the file); we can keep it as it is and delete it in the future. Since we are using the Dockerfile for Spark 3.5, we probably shouldn't remove Python 3.10, because it's actually supported in Spark 3.5.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI should work.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #55965 from gaogaotiantian/remove-pyparsing.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
zhengruifeng pushed a commit that referenced this pull request May 19, 2026
…ng from dockerfile

(cherry picked from commit eeecea4)
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>