fix: update document collection logic to use batch documents #73

lpi-tn · 2025-11-19T10:56:12Z

This pull request updates the document collection logic in the DocumentHubCollector workflow to ensure that each corpus plugin processes only its relevant documents. The main change improves data handling accuracy by passing the correct set of documents to each collector.

Data extraction logic improvement:

Updated the call to corpus_collector.run in document_collector.py to use batch_docs[corpus_name] instead of welearn_documents, ensuring each corpus processes only its own documents.
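
To illustrate the change described above, here is a minimal sketch of the per-corpus dispatch. Only corpus_collector.run, batch_docs, corpus_name and welearn_documents come from this PR; the Doc record, the EchoCollector plugin, and the group_by_corpus and collect helpers are hypothetical stand-ins, and the way batch_docs is built here is an assumption rather than the actual code in document_collector.py.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical stand-in for a document record.
    url: str
    corpus_name: str


class EchoCollector:
    """Hypothetical corpus plugin that reports which documents it received."""

    def run(self, documents):
        return [doc.url for doc in documents]


def group_by_corpus(documents):
    # Assumption: batch_docs maps each corpus name to that corpus's documents.
    batch_docs = defaultdict(list)
    for doc in documents:
        batch_docs[doc.corpus_name].append(doc)
    return dict(batch_docs)


def collect(welearn_documents, collectors):
    batch_docs = group_by_corpus(welearn_documents)
    results = {}
    for corpus_name in batch_docs:
        corpus_collector = collectors[corpus_name]
        # Before the fix (per the description): corpus_collector.run(welearn_documents)
        # After the fix: each collector receives only its own corpus's documents.
        results[corpus_name] = corpus_collector.run(batch_docs[corpus_name])
    return results


# Usage with made-up corpus names and URLs.
docs = [Doc("a", "corpus_one"), Doc("b", "corpus_two"), Doc("c", "corpus_one")]
collectors = {"corpus_one": EchoCollector(), "corpus_two": EchoCollector()}
results = collect(docs, collectors)
```

With the pre-fix call, both collectors in this sketch would have received all three documents; with the fix, the corpus_one collector receives "a" and "c" and the corpus_two collector receives only "b".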

Copilot

Pull Request Overview

This PR fixes a data processing bug where all corpus plugins were incorrectly receiving the complete set of documents instead of their specific subset. The change ensures each corpus collector processes only its relevant documents, improving data handling accuracy.

Key Changes:

Updated the document filtering logic to pass corpus-specific documents to each collector

fix: update document collection logic to use batch documents

4e22190

lpi-tn requested review from Copilot, jmsevin and sandragjacinto November 19, 2025 10:56

Copilot AI reviewed Nov 19, 2025

sandragjacinto approved these changes Nov 19, 2025

lpi-tn merged commit 02a1150 into main Nov 19, 2025
7 checks passed

lpi-tn deleted the Fix/mix-corpus-collectors branch November 19, 2025 14:51
