lpi-tn commented on Jun 13, 2025

This pull request introduces a new Category model and integrates it into the Corpus model to allow categorization of corpus data. It also includes database migrations, test updates, and SQL scripts to populate the new category field.

Database changes:

Code updates:

Test updates:

  • Updated various test files (test_document_classifier.py, test_document_vectorizer.py, test_qdrant_syncronizer.py, test_retrieve_data_from_database.py, test_url_sanitary_crawler.py) to include the Category model and ensure category_id is properly set in test data.

lpi-tn requested review from Copilot and sandragjacinto on June 13, 2025 13:48
Copilot AI left a comment

Pull Request Overview

This PR adds a new Category model and integrates it into the Corpus model for categorization, includes migrations and SQL scripts to backfill data, and updates tests across multiple modules to insert and reference Category.

  • Introduced category table and category_id FK on corpus (Alembic migration + SQL populate script)
  • Added Category ORM model and category_id field on Corpus
  • Updated tests in several packages to create Category entries and assign category_id
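
The backfill described in the first bullet can be illustrated with a minimal sketch. It is not the PR's actual SQL script: it uses an in-memory SQLite database through Python's standard `sqlite3` module, and the column names (`title`, `source_name`) and the source-to-category mapping are assumptions made up for the example.

```python
import sqlite3
import uuid

# In-memory database standing in for the real one (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE category (id TEXT PRIMARY KEY, title TEXT NOT NULL UNIQUE);
    CREATE TABLE corpus (
        id TEXT PRIMARY KEY,
        source_name TEXT NOT NULL,
        category_id TEXT REFERENCES category (id)  -- nullable until backfilled
    );
    INSERT INTO corpus (id, source_name) VALUES ('c1', 'source_a'), ('c2', 'source_b');
    """
)

# Hypothetical mapping from existing corpora to their new categories.
source_to_category = {"source_a": "category_test0", "source_b": "category_test1"}

with conn:  # one transaction: commit on success, roll back on error
    for source_name, title in source_to_category.items():
        # Create the category if it does not exist yet.
        conn.execute(
            "INSERT OR IGNORE INTO category (id, title) VALUES (?, ?)",
            (str(uuid.uuid4()), title),
        )
        # Point the existing corpus row at its category.
        conn.execute(
            "UPDATE corpus SET category_id = "
            "(SELECT id FROM category WHERE title = ?) WHERE source_name = ?",
            (title, source_name),
        )

# Count corpora that were left without a category after the backfill.
missing = conn.execute(
    "SELECT COUNT(*) FROM corpus WHERE category_id IS NULL"
).fetchone()[0]
category_count = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]
```

Checking that `missing` is zero before tightening the column to NOT NULL is one way to make sure no corpus row is left uncategorized.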

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| welearn_datastack/data/db_models.py | Defined Category model and added category_id to Corpus |
| alembic/versions/89920abb7ff8_add_category.py | Migration to create category table and add FK on corpus |
| sql/89920abb7ff8_populate_corpus_category.sql | SQL CTEs to insert categories and update existing corpora |
| tests/url_sanitary_crawler/test_url_sanitary_crawler.py | Inserted Category in setup and set category_id |
| tests/test_retrieve_data_from_database.py | Added Category creation and usage in retrieval tests |
| tests/qdrant_syncronizer/test_qdrant_syncronizer.py | Updated setup to include Category |
| tests/document_vectorizer/test_document_vectorizer.py | Added Category in test fixtures and assigned to Corpus |
| tests/document_collector_hub/test_nodes/test_extract_n_collect_docs.py | Included Category in setup |
| tests/document_classifier/test_document_classifier.py | Added Category creation and assigned category_id |

Comments suppressed due to low confidence (7)

tests/url_sanitary_crawler/test_url_sanitary_crawler.py:42

  • There's a typo in the test string: "categroy_test0" should be "category_test0".
        self.category_name = "categroy_test0"

tests/document_collector_hub/test_nodes/test_extract_n_collect_docs.py:15

  • This import is unused and unrelated to your models. Please remove the sympy.integrals.meijerint_doc.category import.
from sympy.integrals.meijerint_doc import category

tests/document_collector_hub/test_nodes/test_extract_n_collect_docs.py:111

  • There's a typo in the test string: "categroy_test0" should be "category_test0".
        self.category_name = "categroy_test0"

tests/document_classifier/test_document_classifier.py:11

  • This import is unused and conflicts with your Category model. Please remove the sympy.integrals.meijerint_doc.category import.
from sympy.integrals.meijerint_doc import category

tests/document_classifier/test_document_classifier.py:45

  • There's a typo in the test string: "categroy_test0" should be "category_test0".
        self.category_name = "categroy_test0"

tests/document_classifier/test_document_classifier.py:46

  • uuid4() is not imported; this will raise a NameError. Either use uuid.uuid4() or add from uuid import uuid4.
        self.category_id = uuid4()
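
A minimal sketch of the second fix suggested above (importing `uuid4` directly). The helper name `make_category_fields` is hypothetical and only mirrors the two assignments from the test setup; it is not code from the PR.

```python
from uuid import UUID, uuid4  # importing uuid4 avoids the NameError


def make_category_fields() -> dict:
    """Hypothetical helper mirroring the two assignments in the test setup."""
    return {"category_name": "category_test0", "category_id": uuid4()}


fields = make_category_fields()
```

The alternative is to keep `import uuid` and call `uuid.uuid4()`; both produce a random version 4 UUID.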

welearn_datastack/data/db_models.py:41

  • [nitpick] Consider adding a relationship property on Corpus (e.g., category: Mapped["Category"] = relationship()) to enable ORM navigation from Corpus to Category.
    category_id: Mapped[UUID] = mapped_column(

lpi-tn requested a review from jmsevin on June 13, 2025 14:02
lpi-tn merged commit c9e768d into main on Jun 13, 2025
7 checks passed
lpi-tn deleted the Feature/add-category-for-sources branch on June 13, 2025 14:17