Add support for generating the word cloud from array-like of labels #271
I'm not sure what you mean by "under the hood" here. You need to provide the counts to the wordcloud in some way. Can you maybe give an example for your use case?
Yes, I'm proposing for wordcloud to take care of this in the case of array-like input (which is a common case). Notice that `WordCloud().generate_from_text` is, substantially, implemented in a similar way.
Basically, I'm doing multi-label classification, and averaging predictions over the test set for visualization purposes. So I run inference on a list of samples, collect the results in a list of lists (each sublist stores the predicted labels for a single sample in order of decreasing confidence), flatten the outer list, count the number of occurrences of each label, and build the WordCloud. This could also be applied to a multi-class classification problem for recommender systems, where one of the use cases is to explore the top-3/top-5/top-N predictions over the test set.
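The pipeline described above can be sketched in a few lines with the standard library; the label names and predictions here are made up purely for illustration, and the final wordcloud call is shown as a comment since it needs the `wordcloud` package installed:

```python
from collections import Counter
from itertools import chain

# Predicted labels per test sample, each sublist ordered by
# decreasing confidence (made-up data for illustration).
predictions = [
    ["cat", "dog", "bird"],
    ["dog", "cat", "fish"],
    ["dog", "bird", "cat"],
]

# Flatten the outer list and count occurrences of each label.
counts = Counter(chain.from_iterable(predictions))
print(dict(counts))  # {'cat': 3, 'dog': 3, 'bird': 2, 'fish': 1}

# The resulting frequencies could then be fed to the word cloud, e.g.:
#   WordCloud().generate_from_frequencies(counts)
```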
Ah, so a list with repetitions. You can either do " ".join(array) and pass it to generate_from_text, or call pandas value_counts on it and pass the result to generate_from_frequencies.
Sent from phone. Please excuse spelling and brevity.
I recommend the second, as that bypasses the tokenization: you already know what the tokens are supposed to be.
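Both suggested routes can be sketched as follows; the `labels` array is made up for illustration, and the actual `WordCloud` calls are left as comments so the snippet does not depend on having the `wordcloud` package installed:

```python
from collections import Counter

labels = ["a", "b", "c", "a"]  # made-up label array for illustration

# Option 1: join into a single string and let wordcloud tokenize it.
text = " ".join(labels)

# Option 2 (recommended above, since it bypasses tokenization):
# count the labels yourself and pass the frequencies directly.
# With pandas this would be pd.Series(labels).value_counts();
# collections.Counter gives the same counts without the dependency.
frequencies = Counter(labels)

# With the wordcloud package installed, these would be used as:
#   WordCloud().generate_from_text(text)
#   WordCloud().generate_from_frequencies(frequencies)
print(text)               # a b c a
print(dict(frequencies))  # {'a': 2, 'b': 1, 'c': 1}
```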
Of course, but there is an overhead in both cases (and, frankly speaking, in my pipeline as well): creating/splitting a potentially large string in the first case. Going back to the original question :): would you like to see this kind of input supported by wordcloud?
I'm wary of adding too many interfaces. You can also implement value_counts in your own code in three lines, which is exactly the code you'd add to wordcloud:

```python
d = defaultdict(int)
for word in array:
    d[word] += 1
```

What's the problem with adding those to your code?
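For completeness, the three-line loop above is equivalent to `collections.Counter` from the standard library; a small self-contained check with a made-up input array:

```python
from collections import Counter, defaultdict

array = ["a", "b", "c", "a"]  # made-up input for illustration

# The defaultdict loop from the comment above...
d = defaultdict(int)
for word in array:
    d[word] += 1

# ...produces the same counts as collections.Counter.
assert d == Counter(array)
print(dict(d))  # {'a': 2, 'b': 1, 'c': 1}
```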
No problems at all, it is already implemented in such a way :). I was just wondering whether this is a common enough case. Thank you very much for the feedback! Closing as wontfix.
I.e. something like `generate_from_array(array)`, where `array` is supposed to be an array-like with labels: `('a', 'b', 'c', 'a')` / `['a', 'b', 'c', 'a']` / `np.array([1, 2, 3, 2])`. The counting is meant to run under the hood (using `collections.Counter`, for example). Please let me know if you would be interested in having this feature. If so, I'll work on the implementation.

P.S. Thank you for the great tool :)
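The proposed method might have looked roughly like the sketch below. Note this is purely hypothetical: the feature was closed as wontfix, so `generate_from_array` is not part of the wordcloud API, and the function here only returns the frequency dict that the real method would have passed along:

```python
from collections import Counter

def generate_from_array(array):
    """Hypothetical helper matching the proposal: count label
    occurrences under the hood. Labels are stringified so that
    numeric inputs like np.array([1, 2, 3, 2]) also work."""
    frequencies = Counter(str(label) for label in array)
    # In the proposed wordcloud method this would end with:
    #   return self.generate_from_frequencies(frequencies)
    return dict(frequencies)

print(generate_from_array(("a", "b", "c", "a")))  # {'a': 2, 'b': 1, 'c': 1}
```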