Conversation

avirajsingh7
Collaborator

@avirajsingh7 avirajsingh7 commented Sep 25, 2025

Context

  • Transformed documents containing Indian languages (e.g., Hindi) were displaying as garbled text (सà¥�…) after being uploaded to object storage.
  • Root cause: uploaded files were missing explicit UTF-8 charset in their Content-Type metadata.
  • Many clients defaulted to ISO-8859-1 (Latin-1), which cannot represent Devanagari or other Indian scripts, causing mojibake.
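The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not code from the PR: the sample string and the helper name `encodable_as_latin1` are assumptions chosen for illustration, and Python's `latin-1` codec stands in for a client that falls back to ISO-8859-1 when no charset is declared.

```python
# Illustrative only: reproduce the mojibake described above.
original = "संदेश"  # hypothetical Hindi sample text, not taken from the PR

# The stored object holds UTF-8 bytes (three bytes per Devanagari code point).
utf8_bytes = original.encode("utf-8")

# A client that assumes ISO-8859-1 maps every byte to one Latin-1 character,
# so each Devanagari character turns into three unrelated symbols.
garbled = utf8_bytes.decode("latin-1")

# Decoding the same bytes with the declared charset recovers the text.
recovered = utf8_bytes.decode("utf-8")


def encodable_as_latin1(text: str) -> bool:
    """Return True if every character in text exists in ISO-8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(repr(garbled))
    print(recovered == original)
    print(encodable_as_latin1(original))
```

The bytes themselves are never corrupted in this scenario; only the client's interpretation is wrong, which is why declaring the charset in the Content-Type metadata is enough to fix the display.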

Changes Introduced

Updated content_type_map to include charset=utf-8 for markdown formats

Summary by CodeRabbit

  • Bug Fixes
    • Markdown downloads/exports now set Content-Type to “text/markdown; charset=utf-8” to ensure correct encoding and rendering.
    • Non-markdown outputs now default to “text/plain”, avoiding misleading content-type headers.
    • Standardized response headers improve compatibility with editors, viewers, and APIs, reducing garbled characters and formatting issues when opening or sharing generated files.

coderabbitai bot commented Sep 25, 2025

Walkthrough

The content_type_map in execute_job was reduced to a single explicit mapping: "markdown" → "text/markdown; charset=utf-8". Previous explicit mappings for "text" and "html" were removed, causing non-markdown formats to default to "text/plain".
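The behaviour in this walkthrough can be sketched as follows. The actual code in backend/app/core/doctransform/service.py is not shown in this conversation, so this is an approximation under assumptions: only the name `content_type_map` and the two Content-Type strings come from the PR, while `resolve_content_type` and its `target_format` parameter are hypothetical names used for illustration.

```python
# Sketch only: approximates the mapping described in the walkthrough.
# The real logic lives inside execute_job and may be structured differently.
content_type_map = {
    "markdown": "text/markdown; charset=utf-8",
}


def resolve_content_type(target_format: str) -> str:
    """Hypothetical helper: look up the Content-Type for an output format.

    Any format without an explicit entry falls back to "text/plain".
    """
    return content_type_map.get(target_format, "text/plain")


if __name__ == "__main__":
    for fmt in ("markdown", "text", "html"):
        print(fmt, "->", resolve_content_type(fmt))
```

Under this sketch the "text/plain" fallback carries no charset parameter, so whether non-markdown outputs with non-ASCII content render correctly would still depend on each client's default.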

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Content-Type mapping update: backend/app/core/doctransform/service.py | Simplified content_type_map: only "markdown" explicitly maps to "text/markdown; charset=utf-8"; removed explicit mappings for "text" and "html", making non-markdown formats default to "text/plain". |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nibbled the map of types so fine,
Left markdown with charset—tastes divine.
The others now take the plain-text trail,
Light as thistle seeds on a breeze-swept gale.
Hop hop! Headers aligned, neat and bright—
A tidy burrow for bytes tonight.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly summarizes the primary change of enforcing UTF-8 content type to address encoding issues in transformed document uploads, aligning directly with the modifications made to the content_type_map in execute_job without including unnecessary details. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 410c57d and 85f8d17.

📒 Files selected for processing (1)
  • backend/app/core/doctransform/service.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/core/doctransform/service.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4724a2b and 410c57d.

📒 Files selected for processing (1)
  • backend/app/core/doctransform/service.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: AI Platform CI
backend/app/core/doctransform/service.py

[error] 1-1: Black formatting hook failed: file reformatted by Black. 1 file reformatted; please commit changes. Run 'pre-commit run --all-files' again or 'black backend/app/core/doctransform/service.py' to format.

codecov bot commented Sep 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@avirajsingh7 avirajsingh7 added the ready-for-review and bug (Something isn't working) labels Sep 25, 2025
@avirajsingh7 avirajsingh7 merged commit 58939a8 into main Sep 25, 2025
3 checks passed
@avirajsingh7 avirajsingh7 deleted the hotfix/content_type branch September 25, 2025 11:05
4 participants