TokenizeChar produces a vector of keys.
TokenizeWords produces a vector of strings.
I have to add MapValueToKey to TokenizeWords in order to apply ProduceNgrams to it.
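A minimal sketch of the word pipeline being described, assuming the ML.NET 1.x API names (`TokenizeIntoWords`, `MapValueToKey`, `ProduceNgrams`) and an illustrative `Text` input column; the column names here are made up for the example:

```csharp
using Microsoft.ML;

public class TextRow
{
    public string Text { get; set; }
}

public static class WordNgramDemo
{
    public static void Main()
    {
        var mlContext = new MLContext();
        var data = mlContext.Data.LoadFromEnumerable(new[]
        {
            new TextRow { Text = "the quick brown fox" },
        });

        // Word tokenization emits a vector of strings, so an explicit
        // MapValueToKey has to sit between it and ProduceNgrams, which
        // expects a vector of keys.
        var pipeline = mlContext.Transforms.Text
            .TokenizeIntoWords("Tokens", "Text")
            .Append(mlContext.Transforms.Conversion.MapValueToKey("Keys", "Tokens"))
            .Append(mlContext.Transforms.Text.ProduceNgrams("Ngrams", "Keys"));

        var transformed = pipeline.Fit(data).Transform(data);
    }
}
```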
This is intentional and as designed. The set of possible characters (in the sense of possible values of `char`) is finite and well defined. This allows character tokenization to forego being a trained transform, if we like, and skip the step of building dictionaries or hashing, which remains a requirement for words. The set of possible words is not as finite and easy to enumerate, and therefore we need further processing to turn them into something useful for, say, n-gram processing to take hold of. So, in the process of taking text to terms to keys, with characters we are able to skip a step and end up with a far more efficient pipeline; good for us! But we can't skip that step with words.
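For contrast, a sketch of the character route, again assuming the ML.NET 1.x name `TokenizeIntoCharactersAsKeys` and the same illustrative `Text` column: because every `char` value is already a key from a known, finite set, the output feeds `ProduceNgrams` directly, with no `MapValueToKey` and no dictionary to train in between.

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// Characters come out as keys directly, so ProduceNgrams can consume
// them with no intermediate MapValueToKey step.
var charPipeline = mlContext.Transforms.Text
    .TokenizeIntoCharactersAsKeys("CharKeys", "Text")
    .Append(mlContext.Transforms.Text.ProduceNgrams("CharNgrams", "CharKeys"));
```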