Modernize code with sklearn pipelines and NumPy vectorization #21

roshanData · 2025-05-20T17:08:23Z

This PR modernizes the codebase with:

1. Scikit-learn pipelines for data processing
2. NumPy vectorization for token extraction
3. Improved code organization with OOP principles
4. Comprehensive docstrings and type hints
5. Backward compatibility for existing users

The changes improve maintainability and performance while ensuring compatibility with modern ML practices.
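
To make the first two items concrete, here is a minimal sketch of what a scikit-learn pipeline and a vectorized token extraction might look like. It is not taken from this PR's diff: the function names `extract_tokens` and `build_pipeline`, the step names, and the choice of `CountVectorizer` with `LogisticRegression` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def extract_tokens(tokens, mask):
    """Hypothetical helper: keep the tokens whose mask entry is True.

    Boolean indexing replaces an explicit Python loop over the tokens.
    """
    tokens = np.asarray(tokens, dtype=object)
    mask = np.asarray(mask, dtype=bool)
    return tokens[mask].tolist()


def build_pipeline():
    """Hypothetical factory: chain text vectorization and a classifier.

    The estimators here are placeholders; the PR's actual steps may differ.
    """
    return Pipeline([
        ("vectorize", CountVectorizer()),
        ("classify", LogisticRegression(max_iter=1000)),
    ])
```

Under these assumptions, `build_pipeline().fit(texts, labels)` trains both steps in one call, and `extract_tokens(tokens, mask)` returns the selected tokens as a plain list.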



roshanData closed this May 21, 2025

roshanData reopened this May 21, 2025
