fix(dbt): specify utf-8 encoding when opening files #4503

Merged
mobuchowski merged 1 commit into OpenLineage:main from hcthakur2004:fix/dbt-encoding-4502 on Apr 29, 2026

Conversation

@hcthakur2004
Contributor

Fixes #4502

Copilot AI review requested due to automatic review settings, April 28, 2026 21:25
@hcthakur2004 requested a review from a team as a code owner, April 28, 2026 21:25
@boring-cyborg (bot) added the area:documentation, area:integration/common, and language:python labels, Apr 28, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses Windows-specific Unicode errors in the dbt integration by making file encoding explicit (UTF-8) when reading/writing dbt artifacts and structured log files.

Changes:

  • Specify encoding="utf-8" for dbt log file creation/opening in the structured logs processor.
  • Specify encoding="utf-8" when opening dbt manifest/metadata JSON and YAML files in the local artifact processor.
  • Add a new root-level markdown file (issue_encoding.md) containing the issue description.
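A minimal sketch of the pattern these changes apply, not the actual OpenLineage code: the helpers `write_artifact` and `read_artifact` and the file name `manifest.json` are hypothetical and only illustrate passing `encoding="utf-8"` to `open()` so that the result no longer depends on the platform's default encoding.

```python
import json
import os
import tempfile


def write_artifact(path, data):
    # Hypothetical helper: write JSON with an explicit encoding instead of
    # relying on the platform default.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)


def read_artifact(path):
    # Hypothetical helper: read the file back with the same explicit encoding.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "manifest.json")
    original = {"description": "Umsätze für Zürich ✓"}
    write_artifact(path, original)
    roundtrip = read_artifact(path)
```

With both sides pinned to UTF-8, the non-ASCII text should round-trip unchanged regardless of the locale the process runs under.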

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `issue_encoding.md` | Adds a markdown note describing the encoding issue and proposed fix. |
| `integration/common/src/openlineage/common/provider/dbt/structured_logs.py` | Opens/creates the dbt log file using UTF-8 to avoid platform-default encoding issues. |
| `integration/common/src/openlineage/common/provider/dbt/local.py` | Reads JSON/YAML artifacts using UTF-8 to avoid platform-default encoding issues. |

Comment thread issue_encoding.md Outdated
Comment on lines +1 to +11
### Description
In `integration/common/src/openlineage/common/provider/dbt/structured_logs.py` and `integration/common/src/openlineage/common/provider/dbt/local.py`, the `open()` function is used without specifying an `encoding` argument.

This relies on the platform's default encoding (which is `cp1252` on Windows) and can cause `UnicodeDecodeError` or `UnicodeEncodeError` when reading/writing logs or manifest files that contain non-ASCII characters.

### Proposed fix
Add `encoding="utf-8"` to all `open()` calls in the `dbt` provider module.

### Location of missing encodings
- `integration/common/src/openlineage/common/provider/dbt/local.py` (lines 201, 236)
- `integration/common/src/openlineage/common/provider/dbt/structured_logs.py` (lines 918, 921)
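The failure modes named in the description can be reproduced without touching the file system. The sketch below is illustrative only: it uses `str.encode` and `bytes.decode` with Python's `cp1252` codec to stand in for an `open()` call that falls back to that encoding, and the sample characters are arbitrary choices.

```python
# UTF-8 bytes for "é" are valid cp1252 bytes, so decoding succeeds but
# yields the wrong text (mojibake) instead of raising.
mojibake = "é".encode("utf-8").decode("cp1252")

# UTF-8 bytes for "Á" include 0x81, which Python's cp1252 codec leaves
# undefined, so decoding raises UnicodeDecodeError.
try:
    "Á".encode("utf-8").decode("cp1252")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# "✓" (U+2713) has no cp1252 representation, so encoding raises
# UnicodeEncodeError.
try:
    "✓".encode("cp1252")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
```

Note that the first case produces no exception at all, which is why a mismatched default encoding can also corrupt text silently rather than fail loudly.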

Copilot AI Apr 28, 2026

This file looks like a copied issue description and doesn’t appear to be used by the codebase or referenced from docs. If it was only added for PR context, consider removing it (or moving it into the appropriate documentation location) to avoid adding an untracked root-level artifact to releases/packages.

Signed-off-by: Harish Chandra Thakur <hcthakur2004@email.com>
@hcthakur2004 force-pushed the fix/dbt-encoding-4502 branch from 6c71fa5 to 6a0a449, April 28, 2026 21:33
@codecov-commenter

Codecov Report

❌ Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.83%. Comparing base (67a9391) to head (6a0a449).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...openlineage/common/provider/dbt/structured_logs.py | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4503   +/-   ##
=======================================
  Coverage   72.83%   72.83%           
=======================================
  Files          21       21           
  Lines        2271     2271           
=======================================
  Hits         1654     1654           
  Misses        617      617           

@mobuchowski merged commit acfa50e into OpenLineage:main on Apr 29, 2026
23 checks passed
@mobuchowski
Member

Thanks @hcthakur2004

creazyfrog added a commit to creazyfrog/OpenLineage that referenced this pull request May 1, 2026
…ns facet move

- Update _get_schema() in generated Python file to reference schema version 1-2-0
- Move dataQualityAssertions from inputFacets to facets in dbt test snapshots
- Update _schemaURL in snapshots from 1-1-0 to 1-2-0
- Rebase on upstream/main to incorporate upstream dbt fixes (OpenLineage#4499-OpenLineage#4503)

Signed-off-by: Rohit Sharma <rohitcse.gec@gmail.com>

Development

Successfully merging this pull request may close these issues.

Specify encoding when opening files in dbt integration

4 participants