@avirajsingh7 commented Sep 25, 2025

Context

  • Transformed documents containing Indian languages (e.g., Hindi) were displaying as garbled text (सà¥�…) after being uploaded to object storage.
  • Root cause: uploaded files were missing explicit UTF-8 charset in their Content-Type metadata.
  • Many clients defaulted to ISO-8859-1 (Latin-1), which cannot represent Devanagari or other Indian scripts, causing mojibake.
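
The failure mode described above can be reproduced in a few lines. This is a minimal sketch rather than code from the project: the sample string and variable names are illustrative assumptions. It encodes Hindi text as UTF-8, decodes the bytes as ISO-8859-1 the way a client might when no charset is declared, and checks that the stored bytes themselves are undamaged.

```python
# Illustrative sketch only; the sample text and all names are assumptions.

# The word "Hindi" in Devanagari, written with escapes so the source stays ASCII.
original = "\u0939\u093f\u0928\u094d\u0926\u0940"

# What gets uploaded: the UTF-8 byte sequence (3 bytes per Devanagari code point).
utf8_bytes = original.encode("utf-8")

# What a client displays if it assumes ISO-8859-1: each byte becomes one character.
garbled = utf8_bytes.decode("iso-8859-1")

# The bytes are intact, so undoing the wrong decode restores the text.
recovered = garbled.encode("iso-8859-1").decode("utf-8")

# ISO-8859-1 has no code points for Devanagari, so encoding directly fails.
try:
    original.encode("iso-8859-1")
    latin1_can_encode = True
except UnicodeEncodeError:
    latin1_can_encode = False

print(len(original), len(utf8_bytes), len(garbled))
print(recovered == original, latin1_can_encode)
```

Because the data in storage is still valid UTF-8, the problem is one of labeling: declaring the charset in the metadata is enough, and the files do not need to be re-encoded.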

Changes Introduced

Updated content_type_map to include charset=utf-8 for markdown formats

Summary by CodeRabbit

  • Bug Fixes
    • Markdown downloads/exports now set Content-Type to “text/markdown; charset=utf-8” to ensure correct encoding and rendering.
    • Non-markdown outputs now default to “text/plain”, avoiding misleading content-type headers.
    • Standardized response headers improve compatibility with editors, viewers, and APIs, reducing garbled characters and formatting issues when opening or sharing generated files.
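
To illustrate why the charset parameter matters on the receiving side, here is a hedged sketch of how a client might pick a decoding from the Content-Type value. The helper name `charset_from_content_type` and the ISO-8859-1 fallback are assumptions that model the client behavior described in this PR; real clients vary. The parsing uses Python's standard `email.message.Message`.

```python
from email.message import Message


def charset_from_content_type(header_value: str, fallback: str = "iso-8859-1") -> str:
    """Return the charset declared in a Content-Type value, or a fallback.

    Hypothetical helper: the fallback models a client that assumes Latin-1
    when no charset parameter is present.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lowercased charset parameter,
    # or None when the header carries no charset.
    return msg.get_content_charset() or fallback


print(charset_from_content_type("text/markdown; charset=utf-8"))
print(charset_from_content_type("text/markdown"))
```

With the explicit parameter the client no longer has to guess; without it, the outcome depends entirely on the client's default.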

coderabbitai bot commented Sep 25, 2025

Walkthrough

The content_type_map in execute_job was reduced to a single explicit mapping: "markdown" → "text/markdown; charset=utf-8". Previous explicit mappings for "text" and "html" were removed, causing non-markdown formats to default to "text/plain".
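
The mapping described in this walkthrough might look roughly like the sketch below. Only the "markdown" entry and the "text/plain" default come from the description above; the helper name `resolve_content_type` and its parameter are hypothetical, and the actual code in execute_job may be structured differently.

```python
# Hypothetical reconstruction; only the mapping entry and the default
# value are taken from the PR description.
content_type_map = {
    "markdown": "text/markdown; charset=utf-8",
}


def resolve_content_type(target_format: str) -> str:
    """Look up the Content-Type for a transformed document.

    The function name is an assumption made for illustration.
    """
    # Any format without an explicit entry falls back to plain text.
    return content_type_map.get(target_format, "text/plain")


print(resolve_content_type("markdown"))
print(resolve_content_type("html"))
```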

Changes

  • Cohort / File(s): Content-Type mapping update (backend/app/core/doctransform/service.py)
  • Summary: Simplified content_type_map: only "markdown" explicitly maps to "text/markdown; charset=utf-8"; removed explicit mappings for "text" and "html", making non-markdown formats default to "text/plain".

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nibbled the map of types so fine,
Left markdown with charset—tastes divine.
The others now take the plain-text trail,
Light as thistle seeds on a breeze-swept gale.
Hop hop! Headers aligned, neat and bright—
A tidy burrow for bytes tonight.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title succinctly summarizes the primary change of enforcing UTF-8 content type to address encoding issues in transformed document uploads, aligning directly with the modifications made to the content_type_map in execute_job without including unnecessary details.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 410c57d and 85f8d17.

📒 Files selected for processing (1)
  • backend/app/core/doctransform/service.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/core/doctransform/service.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

@coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 4724a2b and 410c57d.

📒 Files selected for processing (1)
  • backend/app/core/doctransform/service.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: AI Platform CI
backend/app/core/doctransform/service.py

[error] 1-1: Black formatting hook failed: file reformatted by Black. 1 file reformatted; please commit changes. Run 'pre-commit run --all-files' again or 'black backend/app/core/doctransform/service.py' to format.

codecov bot commented Sep 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

@avirajsingh7 added the ready-for-review and bug labels Sep 25, 2025
@avirajsingh7 merged commit 58939a8 into main Sep 25, 2025
3 checks passed
@avirajsingh7 deleted the hotfix/content_type branch September 25, 2025 11:05