Add Bag-of-Words converter by Irozuku · Pull Request #358 · DashAISoftware/DashAI

Irozuku · 2025-10-27T20:45:47Z

This pull request adds a new Bag-of-Words text preprocessing converter to the backend, enabling text data to be transformed into token frequency columns using scikit-learn's CountVectorizer.

New Bag-of-Words Converter Implementation:

Added BagOfWordsConverter class in scikit_learn/bag_of_words.py, which uses scikit-learn's CountVectorizer to convert text into a Bag-of-Words representation. It supports customizable hyperparameters for max features, lowercase conversion, stop word removal, and n-gram bounds.

(1,2) n-grams

Irozuku added 4 commits October 27, 2025 17:41

feat: add Bag-of-Words converter and integrate into initial components

684da9b

refactor: simplify parameter naming and improve docstring formatting in BagOfWordsConverter

bd749eb

refactor: remove unnecessary module docstring from BagOfWordsConverter

92d6914

Merge branch 'develop' into feat/bow-converter

059f6cd

cristian-tamblay approved these changes Oct 29, 2025

cristian-tamblay merged commit 8539141 into develop Oct 29, 2025
18 checks passed

cristian-tamblay deleted the feat/bow-converter branch October 29, 2025 16:22

cristian-tamblay merged 4 commits into develop from feat/bow-converter
