Request : Apply Lemma / stemming in FeaturizeText options #5281
Comments
Hi, @ErwanL08. Unfortunately, there's no option for lemmatization or stemming in ML.NET, so I will mark this issue as a feature request so that we can take it into account when planning future features. In the meantime, there are a couple of options you can explore:
|
This feature is very important, and I'm impatient to see it in the awesome ML.NET. NLP is essential today, and I hope it will be given serious attention. |
I agree, there should be a direct lemmatizer/stemmer. By default, the FeaturizeText transform uses unigrams (one word) + bigrams (two words) + tricharactergrams (three-letter n-grams). The default tricharactergrams give a good part of the gains of a full stemmer. For example, it will extract the same tricharactergram from related forms of a word. The word embedding transform can also help. The fastTextWikipedia300D model in particular has a large vocabulary and already has word vectors for runner/running/runs, which will be in similar positions in the embedding space. All this said, the world is moving towards transformer networks like BERT. |
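To make the tricharactergram point concrete, here is a minimal, language-neutral sketch in Python. It is not ML.NET's implementation: the helper `char_trigrams` is a hypothetical name, and the sketch ignores casing options and any boundary markers a real featurizer may add. It only shows that runner/running/runs share a three-letter n-gram, which is why character trigrams can recover part of what a stemmer provides.

```python
def char_trigrams(word: str) -> set[str]:
    """Return the set of three-character n-grams in a word (illustrative helper)."""
    w = word.lower()
    return {w[i:i + 3] for i in range(len(w) - 2)}


words = ["runner", "running", "runs"]

# Map each word to its own set of character trigrams.
grams = {w: char_trigrams(w) for w in words}

# Trigrams present in every inflected form act like a crude shared "stem" feature.
shared = set.intersection(*grams.values())

print(shared)  # {'run'}
```

The shared feature is only approximate: irregular forms such as "ran" have no trigram in common with "running", which is where a real lemmatizer would still help.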
I totally agree with @AniaBerthelot. If ML.NET had an up-to-date .NET version of a stemmer / lemmatizer, the framework would be so awesome 👍 |
I would also like to see lemmatization support built into ML.NET. |
@luisquintanilla for prioritization. |
Hi, @luisquintanilla. Is there any chance of getting this feature in the near future? Currently I rely on spaCy (Python) for text data preprocessing, but for my current C# project I really need to stick with ML.NET to avoid dependencies on libraries like Python.NET. |
Hi @AlbelTec Thanks for your question. Our current NLP solutions are focused on deep learning, Text Classification and Sentence Similarity being a few examples. As a result, there are no immediate plans to work on lemmatization / stemming at this time. That being said, would you be willing to share your use case and scenario? As we get more feedback on the topic we can think about where this fits in our future roadmap. In the meantime, I would take a look at Antonio's comment above as a potential workaround. |
Hi @luisquintanilla,
|
I have been experimenting with this as well. This seems like a good addition to ML.NET, since there are a lot of upvotes for this feature and a built-in lemmatization feature would be convenient for .NET developers. |
Hi
![image](https://user-images.githubusercontent.com/16559628/86459192-bddcea00-bd26-11ea-8274-d3ad23a55eeb.png)
First, thank you for all the work done. I know that FeaturizeText applies NLP preprocessing such as stop-word removal for a specific language:
But is there a way to apply lemmatization / stemming in this function?