@lpi-tn commented Nov 20, 2025

This pull request makes the data models in hal.py, oapen.py, and ted.py more robust by allowing many fields to be optional. This improves compatibility with incomplete or missing data from external sources and prevents validation errors when fields are absent.

HAL model improvements:

  • Changed several fields in the Doc class to be Optional, including authFullName_s, language_s, docType_s, producedDate_tdate, and publicationDate_tdate, to handle missing data gracefully.
  • Made the nextCursorMark field in the HALModel class optional.
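A minimal sketch of what the relaxed HAL models might look like, assuming Pydantic. The field names come from this PR. Their types, the `SolrResponse` wrapper and the sample payload are assumptions for illustration, not the actual contents of hal.py. Each field is `Optional[...]` with a `None` default, so a document that omits it still validates.

```python
from typing import List, Optional

from pydantic import BaseModel


class Doc(BaseModel):
    # Field names are from the PR; the types are assumptions
    # (multi-valued *_s fields as lists, *_tdate fields as ISO date strings).
    authFullName_s: Optional[List[str]] = None
    language_s: Optional[List[str]] = None
    docType_s: Optional[str] = None
    producedDate_tdate: Optional[str] = None
    publicationDate_tdate: Optional[str] = None


class SolrResponse(BaseModel):
    # Hypothetical wrapper mirroring a Solr-style "response" object.
    docs: List[Doc] = []


class HALModel(BaseModel):
    response: SolrResponse
    # Optional: assumed absent when the request does not use cursor paging.
    nextCursorMark: Optional[str] = None


# Made-up payload with only one field per document and no cursor mark.
payload = {"response": {"docs": [{"docType_s": "ART"}]}}
model = HALModel(**payload)
```

Missing keys now come back as `None` instead of raising a validation error, so downstream code has to check for `None` before using them.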

OAPEN model improvements:

  • Updated most fields in CheckSum, Bitstream, Metadatum, and OapenModel classes to be Optional, ensuring the model can handle absent or incomplete fields from the OAPEN data source.
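The class names below match the PR, but every field is hypothetical. They are loosely modeled on DSpace-style item JSON and are not taken from oapen.py. The sketch assumes Pydantic and shows how nested optional models combine with a consumer that skips incomplete entries rather than failing on them. `pdf_links` is a hypothetical helper.

```python
from typing import List, Optional

from pydantic import BaseModel


class CheckSum(BaseModel):
    # Hypothetical fields; every one may be missing in the source data.
    value: Optional[str] = None
    checkSumAlgorithm: Optional[str] = None


class Bitstream(BaseModel):
    name: Optional[str] = None
    mimeType: Optional[str] = None
    retrieveLink: Optional[str] = None
    checkSum: Optional[CheckSum] = None


class Metadatum(BaseModel):
    key: Optional[str] = None
    value: Optional[str] = None
    language: Optional[str] = None


class OapenModel(BaseModel):
    uuid: Optional[str] = None
    name: Optional[str] = None
    metadata: Optional[List[Metadatum]] = None
    bitstreams: Optional[List[Bitstream]] = None


def pdf_links(item: OapenModel) -> List[str]:
    """Collect retrieve links of PDF bitstreams, skipping incomplete entries."""
    return [
        b.retrieveLink
        for b in item.bitstreams or []
        if b.mimeType == "application/pdf" and b.retrieveLink
    ]


# Made-up item: one usable PDF bitstream, one bitstream with almost no fields.
item = OapenModel(
    name="Some book",
    bitstreams=[
        {"mimeType": "application/pdf", "retrieveLink": "/bitstreams/1/retrieve"},
        {"name": "cover.jpg"},
    ],
)
```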

TED model improvements:

  • Added Optional typing to many fields in Paragraph, Translation, TEDData, and TEDModel to support cases where data may be missing. Also imported Optional and List for proper type hinting.
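When every level of a nested model is optional, consumers have to handle `None` at each level. The following sketch assumes Pydantic and keeps the PR's class names. The nesting (`data` → `translation` → `paragraphs` → `text`) and the `transcript` helper are hypothetical and are not taken from ted.py.

```python
from typing import List, Optional

from pydantic import BaseModel


class Paragraph(BaseModel):
    text: Optional[str] = None  # hypothetical field


class Translation(BaseModel):
    paragraphs: Optional[List[Paragraph]] = None


class TEDData(BaseModel):
    translation: Optional[Translation] = None


class TEDModel(BaseModel):
    data: Optional[TEDData] = None


def transcript(model: TEDModel) -> str:
    """Join paragraph text, treating every missing level as empty."""
    translation = model.data.translation if model.data is not None else None
    paragraphs = (translation.paragraphs if translation is not None else None) or []
    return "\n".join(p.text for p in paragraphs if p.text)


# A made-up payload with one empty paragraph in the middle.
sample = TEDModel(
    data={"translation": {"paragraphs": [{"text": "Hello"}, {}, {"text": "world"}]}}
)
```

This design moves the failure point. Validation no longer rejects partial responses, so empty or missing data must be handled where the model is used.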

@lpi-tn requested review from Copilot and jmsevin November 20, 2025 10:52

Copilot AI left a comment

Pull Request Overview

This pull request enhances the robustness of Pydantic data models across three source model files by making numerous fields optional. This change allows the models to gracefully handle incomplete or missing data from external APIs (HAL, OAPEN, and TED), preventing validation errors when fields are absent.

  • Made critical fields optional in HAL, OAPEN, and TED models to handle missing data
  • Added proper type imports (Optional, List) for Python typing
  • Set default values to None for optional fields to maintain backward compatibility
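The `None` defaults do more than preserve compatibility. In Pydantic v2, an annotation like `Optional[str]` only allows the value to be null. The key must still be present unless the field also has a default. Pydantic v1 implicitly defaulted such fields to `None`. The sketch below contrasts the two declarations on `nextCursorMark`. The class names and the `accepts_missing` helper are illustrative only.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError
from pydantic.version import VERSION as PYDANTIC_VERSION

PYDANTIC_V2 = PYDANTIC_VERSION.startswith("2")


class NullableOnly(BaseModel):
    # Accepts None, but under Pydantic v2 the key must still be present.
    nextCursorMark: Optional[str]


class NullableWithDefault(BaseModel):
    # Accepts None and may also be omitted entirely.
    nextCursorMark: Optional[str] = None


def accepts_missing(model_cls) -> bool:
    """Return True if model_cls validates a payload without the field."""
    try:
        model_cls(**{})
        return True
    except ValidationError:
        return False
```

Under Pydantic v2, only `NullableWithDefault` accepts a response with no `nextCursorMark` key. That is the case these changes are meant to handle.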

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

Files and descriptions:

  • welearn_datastack/data/source_models/ted.py: Added Optional typing and List imports; made nested model fields optional throughout TED data structures
  • welearn_datastack/data/source_models/oapen.py: Converted most fields in CheckSum, Bitstream, Metadatum, and OapenModel classes to optional with None defaults
  • welearn_datastack/data/source_models/hal.py: Added Optional import; made author, language, document type, and date fields optional in Doc and HALModel classes

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@lpi-tn requested a review from samonaisi November 20, 2025 13:15
@lpi-tn merged commit 7c8e7a3 into main Nov 20, 2025
7 checks passed
@lpi-tn deleted the Fix/pydantic branch November 20, 2025 13:21
