feat(tutor): accepts docx files #28

Merged
sandragjacinto merged 2 commits into main from tutor-more-files on Jun 13, 2025

Conversation

@sandragjacinto
Collaborator

Description

Why?

How?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My code is tested.
  • I have updated the documentation accordingly.

sandragjacinto requested a review from Copilot on June 13, 2025 13:43
Contributor

Copilot AI left a comment

Pull Request Overview

The PR introduces support for extracting text from DOCX files, refactors file content extraction into a unified get_file_content function with pluggable extractors, and adds corresponding tests and a new dependency.

  • Refactor get_file_content to use a content-type→extractor mapping and add _extract_docx_content
  • Update endpoint formatting to handle already-decoded content strings
  • Add tests for DOCX, PDF, TXT, unsupported, and empty file scenarios and include python-docx in dependencies
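The content-type→extractor mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `FakeUpload`, `EXTRACTORS`, `DOCX_TYPE`, `_extract_txt_content`, and the `ValueError` raised for unsupported types are assumptions made for the example. Only the names `get_file_content` and `_extract_docx_content` and the use of python-docx come from the PR itself.

```python
import asyncio
import io
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict


@dataclass
class FakeUpload:
    """Hypothetical stand-in for an uploaded file: a content type plus raw bytes."""

    content_type: str
    data: bytes

    async def read(self) -> bytes:
        return self.data


async def _extract_txt_content(file: FakeUpload) -> str:
    """Decode a plain-text upload as UTF-8 (hypothetical helper)."""
    return (await file.read()).decode("utf-8")


async def _extract_docx_content(file: FakeUpload) -> str:
    """Join the paragraph text of a DOCX upload using python-docx."""
    # Imported lazily so the rest of the sketch runs without python-docx installed.
    from docx import Document

    document = Document(io.BytesIO(await file.read()))
    return "\n".join(paragraph.text for paragraph in document.paragraphs)


DOCX_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

# Assumed registry: each supported content type maps to one async extractor.
EXTRACTORS: Dict[str, Callable[[FakeUpload], Awaitable[str]]] = {
    "text/plain": _extract_txt_content,
    DOCX_TYPE: _extract_docx_content,
}


async def get_file_content(file: FakeUpload) -> str:
    """Dispatch to the extractor registered for the file's content type."""
    extractor = EXTRACTORS.get(file.content_type)
    if extractor is None:
        # Assumption: the real code may signal unsupported types differently.
        raise ValueError(f"Unsupported content type: {file.content_type}")
    return await extractor(file)


text = asyncio.run(get_file_content(FakeUpload("text/plain", b"hello")))
```

With this shape, supporting another format would only require writing one extractor and adding one entry to the mapping.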

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

File | Description
--- | ---
src/app/services/tutor/utils.py | Refactored file extraction with extractor mapping, added DOCX support
src/app/api/api_v1/endpoints/tutor.py | Updated formatting to use string content directly
src/app/tests/services/tutor/test_utils.py | Added tests for DOCX, PDF, TXT, unsupported, and empty files
pyproject.toml | Added python-docx dependency

Comments suppressed due to low confidence (2)

src/app/tests/services/tutor/test_utils.py:118

  • [nitpick] Opening a real file in tests can lead to environment dependencies. Use an in-memory stream (e.g., BytesIO(b"")) to simulate an empty file for more reliable tests.
file = open("test.empty.txt", "rb"),
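A minimal sketch of the in-memory alternative suggested here, assuming the test only needs a readable binary stream. The helper name `make_empty_upload` is hypothetical, and how the stream is passed to the code under test depends on the project's own fixtures.

```python
from io import BytesIO


def make_empty_upload() -> BytesIO:
    """Return a zero-byte, in-memory binary stream standing in for an empty file."""
    return BytesIO(b"")


# The stream behaves like a file opened in "rb" mode, but touches no disk.
stream = make_empty_upload()
content = stream.read()
```

Because nothing is read from the working directory, the test no longer depends on a `test.empty.txt` file being present.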

src/app/services/tutor/utils.py:76

  • [nitpick] Internal helper functions like _extract_pdf_content could use a brief docstring to explain their purpose and expected behavior for maintainability.
async def _extract_pdf_content(file) -> str:

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@sandragjacinto sandragjacinto merged commit 62818d5 into main Jun 13, 2025
3 checks passed