New transformer #238

Closed
wants to merge 5 commits into from
Conversation

Boorinio
Contributor

@Boorinio Boorinio commented Jul 9, 2022

Hello,
This is my first PR, and I want to start off by saying I am really happy I discovered this project!

This PR includes a small fix for the TextNormalizer class, which according to the docs should be categorical. It also includes a new transformer that shuffles the words of a given string. I used the transformer in a personal project, and it is useful for cases where the order of the words is not important (for example, product titles that need to be classified into categories).
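As a rough illustration of the word-shuffling idea described above, here is a minimal sketch in plain PHP. It is not the code from this PR: `shuffleWords` is a hypothetical helper, and splitting on whitespace is an assumption about how words are delimited. The sample title is the one used as an example later in this thread.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the PR's WordOrderRandomizer): return the
 * whitespace-separated words of $text in a random order.
 */
function shuffleWords(string $text): string
{
    // Split on runs of whitespace and drop empty tokens.
    // Assumption: words are delimited by whitespace only.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // shuffle() reorders the array in place.
    shuffle($words);

    return implode(' ', $words);
}

// The same three words come back, in some random order.
echo shuffleWords('Red cotton blanket'), PHP_EOL;
```

The output differs from run to run, so no expected output is shown; only the set of words is guaranteed to be preserved.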

Have a nice day, I hope this helps!

@github-actions
Contributor

github-actions bot commented Jul 9, 2022

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

@Boorinio
Contributor Author

Boorinio commented Jul 9, 2022

I have read the CLA Document and I hereby sign the CLA

Member

@andrewdalpino andrewdalpino left a comment

Hey nice PR @Boorinio! I'm struggling to think of the use cases for this Transformer though. Can you help me?

src/Transformers/TextNormalizer.php (review comment outdated and resolved)
WordOrderRandomizer is now compatible with all datasets.
@Boorinio
Contributor Author

> Hey nice PR @Boorinio! I'm struggling to think of the use cases for this Transformer though. Can you help me?

Hey, thanks for the reply!
It's not a widely used technique, since in most NLP problems the order of the words is really important. But as I mentioned in my initial comment, there are problems where the order of the words should be disregarded, for example product titles that we want to classify into specific categories ("Red cotton blanket" is matched to blankets, and "Blanket red cotton" should be matched to the same category). I opened this PR because I used this transformer in a personal project, but I do understand that it might be too specific. If that's the case, you can close this PR :)

@DrDub
Contributor

DrDub commented Sep 5, 2022

This transformer will make sense when we have RNNs. For the bag-of-words classifiers currently available, which already ignore word order, it will not make much sense.
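To make the bag-of-words point concrete, the sketch below counts word occurrences for two orderings of the same title and compares the results. It is plain PHP with a hypothetical `bagOfWords` helper; lower-casing and whitespace tokenization are assumptions for illustration and are not taken from the library's own vectorizers.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: map each lower-cased word to its number of
 * occurrences. Word order plays no part in the result.
 *
 * @return array<string, int>
 */
function bagOfWords(string $text): array
{
    // Assumption: lower-case the text and split on whitespace.
    $words = preg_split('/\s+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $counts = array_count_values($words);

    // Sort by word so that two bags can be compared directly.
    ksort($counts);

    return $counts;
}

$a = bagOfWords('Red cotton blanket');
$b = bagOfWords('Blanket red cotton');

var_dump($a === $b); // bool(true)
```

Because both orderings produce the same counts, shuffling the words first would not change what a model built on such counts sees.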

@Boorinio, care to share which classifier you were using this transformer with?

@andrewdalpino
Member

I'm struggling to think of use cases for this. Maybe they will become more apparent in the future, but for now I think we should put this in the Extras package. @Boorinio, would you mind submitting a PR to the Extras repo?

https://github.com/RubixML/Extras

@Boorinio
Contributor Author

Boorinio commented Nov 6, 2022

Hey, thanks for the responses, and sorry for the long delay. Closing this one!

@Boorinio Boorinio closed this Nov 6, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Nov 6, 2022