Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Jan 1, 2026

related to #6428

Summary by CodeRabbit

Release Notes

  • New Features
    • Added support for the PYTHONIOENCODING environment variable, allowing configuration of text I/O encoding and error handling behavior.
    • Enhanced encoding support with ASCII codecs alongside UTF-8 encodings.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 1, 2026

📝 Walkthrough

Walkthrough

This pull request adds support for the PYTHONIOENCODING environment variable, enabling customization of standard input/output stream encoding and error handling. Changes include environment variable parsing, new settings fields, VM initialization updates for ASCII and UTF-8 encoding registration, and corresponding module aliasing.

Changes

Cohort / File(s) Summary
Spell Check Dictionary
.cspell.dict/python-more.txt
Added PYTHONIOENCODING token to recognized dictionary entries.
Encoding Module Registration
crates/vm/Lib/core_modules/encodings_ascii.py
New module that references the standard ASCII encoding implementation; used for frozen stdlib mode.
VM Initialization & Alias Resolution
crates/vm/src/vm/mod.rs
Renamed import_utf8_encodings to import_ascii_utf8_encodings; now registers both ASCII and UTF-8 encodings with conditional module names (standard or frozen variants). Updated resolve_frozen_alias() to map encodings_asciiencodings.ascii and encodings_utf_8encodings.utf_8. Enhanced TextIOWrapper creation to use stdio_encoding and stdio_errors settings; stderr continues using backslashreplace.
Settings Configuration
crates/vm/src/vm/setting.rs
Added public fields stdio_encoding: Option<String> and stdio_errors: Option<String> to Settings struct; updated Default implementation.
Environment Variable Parsing
src/settings.rs
Added parsing of PYTHONIOENCODING environment variable in parse_opts(), supporting both encoding:errors and encoding formats; populates corresponding Settings fields when env vars are not ignored.

Sequence Diagram

sequenceDiagram
    participant User as User/Environment
    participant Parser as Settings Parser
    participant Settings as Settings Struct
    participant VM as VM Initialization
    participant Encodings as Encoding Registry
    participant Stdio as TextIOWrapper

    User->>Parser: PYTHONIOENCODING=utf-8:strict
    Parser->>Settings: Parse & populate stdio_encoding, stdio_errors
    Settings->>VM: Initialize with Settings
    VM->>Encodings: import_ascii_utf8_encodings()
    Encodings->>Encodings: Register ASCII codec (encodings_ascii)
    Encodings->>Encodings: Register UTF-8 codec (encodings_utf_8)
    VM->>Stdio: Create stdin/stdout TextIOWrapper
    Stdio->>Stdio: Apply stdio_encoding & stdio_errors from Settings
    VM->>Stdio: Create stderr TextIOWrapper
    Stdio->>Stdio: Use hardcoded backslashreplace error handling
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • enhance error codec and exit code #6602: Modifies VM stdio/TextIOWrapper initialization and stderr error handling in the same file (crates/vm/src/vm/mod.rs), suggesting overlapping functionality or coordination needed.

Suggested reviewers

  • ShaharNaveh

Poem

🐰 A bunny hops through code so fine,
With encodings now in perfect line!
UTF-8, ASCII, errors too—
PYTHONIOENCODING sees it through! 🌈
Stdio flows where settings guide,
With Python's quirks bunny-approved! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'Fix stdio encoding' accurately reflects the main changes across multiple files that implement stdio encoding support, including adding PYTHONIOENCODING environment variable processing and propagating encoding/error settings throughout the VM initialization and stdio TextIOWrapper construction.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4df238 and 9d7b2b7.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_tarfile.py is excluded by !Lib/**
📒 Files selected for processing (5)
  • .cspell.dict/python-more.txt
  • crates/vm/Lib/core_modules/encodings_ascii.py
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/setting.rs
  • src/settings.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • src/settings.rs
  • crates/vm/src/vm/setting.rs
  • crates/vm/src/vm/mod.rs
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • crates/vm/Lib/core_modules/encodings_ascii.py
🪛 Flake8 (7.3.0)
crates/vm/Lib/core_modules/encodings_ascii.py

[error] FileNotFoundError: [Errno 2] No such file or directory: 'crates/vm/Lib/core_modules/encodings_ascii.py'

(E902)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
🔇 Additional comments (8)
.cspell.dict/python-more.txt (1)

181-181: LGTM!

The addition of PYTHONIOENCODING to the spell-check dictionary is correct and properly positioned alphabetically.

src/settings.rs (1)

301-316: LGTM!

The PYTHONIOENCODING parsing correctly handles both encoding:errors and encoding formats, with proper empty-string checks to avoid setting empty values. The logic correctly respects the ignore_environment setting.

crates/vm/src/vm/setting.rs (2)

115-118: LGTM!

The new stdio_encoding and stdio_errors fields are properly typed, documented, and positioned alongside related stdio settings.


202-203: LGTM!

The default initialization to None is correct, allowing the VM to use default encoding behavior when PYTHONIOENCODING is not set.

crates/vm/src/vm/mod.rs (3)

341-347: LGTM! Correct Python stdio behavior.

The stdio initialization correctly propagates stdio_encoding and stdio_errors from settings. The special handling for stderr (always using backslashreplace) matches Python's standard behavior, as stderr should never fail on encoding errors.


1029-1030: LGTM!

The alias mappings correctly resolve the core_modules names to their standard library equivalents, maintaining consistency with the conditional module naming in import_ascii_utf8_encodings.


264-291: Conditional module paths verified and working correctly.

The freeze-stdlib conditional logic is properly implemented. Both code paths are correctly configured:

  • With freeze-stdlib enabled: Uses dotted names (encodings.ascii, encodings.utf_8) from the full stdlib
  • Without freeze-stdlib: Uses underscored names (encodings_ascii, encodings_utf_8) via symlinks in core_modules that point back to the actual encoding implementations

The implementation properly registers both ASCII and UTF-8 codecs for all configurations.

crates/vm/Lib/core_modules/encodings_ascii.py (1)

1-1: This is a symbolic link and follows the intended pattern in the codebase.

The file crates/vm/Lib/core_modules/encodings_ascii.py is a symbolic link pointing to ../../../../Lib/encodings/ascii.py, which is a tracked file in the repository. This is a standard and intentional symlink pattern used in core_modules for module aliasing (see also codecs.py, copyreg.py, encodings_utf_8.py). The symlink target does not resolve in the current sparse checkout, but this is expected—the target files exist in the repository and will be accessible in a full checkout.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone marked this pull request as ready for review January 1, 2026 01:51
@youknowone youknowone merged commit e10dc94 into RustPython:main Jan 1, 2026
23 of 24 checks passed
@youknowone youknowone deleted the stdio-encoding branch January 1, 2026 05:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant