Feature/add category for sources #42
Merged
This pull request introduces a new `Category` model and integrates it into the `Corpus` model to allow categorization of corpus data. It also includes database migrations, test updates, and SQL scripts to populate the new category field.

Database changes:
- `alembic/versions/89920abb7ff8_add_category.py`: Added a new `category` table and a `category_id` column to the `corpus` table, with a foreign key linking `corpus.category_id` to `category.id`.
- `sql/89920abb7ff8_populate_corpus_category.sql`: Added SQL scripts to populate the `category` table and update `corpus.category_id` based on `source_name`.

Code updates:
- `welearn_datastack/data/db_models.py`: Introduced the `Category` model and added a `category_id` field to the `Corpus` model with a foreign key constraint.

Test updates:
- Updated test files (`test_document_classifier.py`, `test_document_vectorizer.py`, `test_qdrant_syncronizer.py`, `test_retrieve_data_from_database.py`, `test_url_sanitary_crawler.py`) to include the `Category` model and ensure `category_id` is properly set in test data. [1] [2] [3] [4] [5]
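
The schema change and the populate step described above can be sketched roughly as follows. This is a minimal illustration only: it uses the standard-library `sqlite3` module with an in-memory database in place of the project's actual database, Alembic migration, and SQL script. The `title` column, the source names, the category names, and all function names are hypothetical placeholders, not taken from the PR.

```python
import sqlite3

# Hypothetical mapping from corpus.source_name to a category title.
# The real values live in the PR's SQL script and are not reproduced here.
SOURCE_TO_CATEGORY = {
    "example_source_a": "example_category_1",
    "example_source_b": "example_category_1",
    "example_source_c": "example_category_2",
}


def build_demo_db() -> sqlite3.Connection:
    """Create a reduced stand-in for the pre-migration corpus table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE corpus (id INTEGER PRIMARY KEY, source_name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO corpus (source_name) VALUES (?)",
        [(name,) for name in SOURCE_TO_CATEGORY],
    )
    return conn


def add_category(conn: sqlite3.Connection) -> None:
    """Mimic the migration: new category table plus corpus.category_id FK."""
    # 'title' is an assumed column name for illustration.
    conn.execute(
        "CREATE TABLE category (id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE)"
    )
    # Nullable column, so existing corpus rows remain valid until populated.
    conn.execute(
        "ALTER TABLE corpus ADD COLUMN category_id INTEGER REFERENCES category(id)"
    )


def populate_category(conn: sqlite3.Connection, mapping: dict[str, str]) -> None:
    """Mimic the populate script: fill category, then set corpus.category_id."""
    for title in sorted(set(mapping.values())):
        conn.execute("INSERT INTO category (title) VALUES (?)", (title,))
    for source_name, title in mapping.items():
        conn.execute(
            "UPDATE corpus "
            "SET category_id = (SELECT id FROM category WHERE title = ?) "
            "WHERE source_name = ?",
            (title, source_name),
        )
    conn.commit()


if __name__ == "__main__":
    conn = build_demo_db()
    add_category(conn)
    populate_category(conn, SOURCE_TO_CATEGORY)
    query = (
        "SELECT corpus.source_name, category.title "
        "FROM corpus JOIN category ON category.id = corpus.category_id "
        "ORDER BY corpus.source_name"
    )
    for row in conn.execute(query):
        print(row)
```

Adding `category_id` as a nullable column first and filling it in a separate step mirrors the split between the migration file and the populate script: existing rows stay valid after the schema change, and the data update can be run and checked on its own.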