Dictionary input to count_words(....) #954

TobyRoseman · 2018-08-09T01:53:58Z

The docstring for turicreate.text_analytics.count_words(...) contains the following example:

# Run count_words with dictionary input
>>> sa = turicreate.SArray([{'alice bob': 1, 'Bob alice': 0.5},
                           {'a dog': 0, 'a dog cat': 5}])
>>> turicreate.text_analytics.count_words(sa)
dtype: dict
Rows: 2
[{'bob': 1.5, 'alice': 1.5}, {'a': 5, 'dog': 5, 'cat': 5}]

However, the actual output is:

dtype: dict
Rows: 2
[{'bob': 2, 'alice': 2}, {'cat': 1, 'dog': 2, 'a': 2}]

It determines each count by looking only at how often the word occurs in the keys. However, the docstring (both the example and the description) claims it should sum the values of the keys that contain the word.
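To make the difference concrete, here is a minimal pure-Python sketch of the two behaviors on a single dictionary. It is not turicreate's implementation: the function names `sum_values_per_word` and `count_keys_per_word` are hypothetical, and tokenization is assumed to be lowercasing plus whitespace splitting. How a word repeated inside one key should be treated is not specified in this issue; the sketch counts it once per key.

```python
from collections import defaultdict


def sum_values_per_word(d):
    """Documented behavior: add each key's value to every word in that key."""
    totals = defaultdict(float)
    for key, value in d.items():
        # Assumption: lowercase + whitespace split; a word counts once per key.
        for word in set(key.lower().split()):
            totals[word] += value
    return dict(totals)


def count_keys_per_word(d):
    """Reported behavior: count the keys containing each word, ignoring values."""
    counts = defaultdict(int)
    for key in d:
        for word in set(key.lower().split()):
            counts[word] += 1
    return dict(counts)


rows = [{'alice bob': 1, 'Bob alice': 0.5},
        {'a dog': 0, 'a dog cat': 5}]

documented = [sum_values_per_word(r) for r in rows]
reported = [count_keys_per_word(r) for r in rows]
```

On the two rows from the example, the first function reproduces the documented totals and the second reproduces the counts actually returned, which suggests the values are simply being ignored.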

This is a bug, since things are not working as described, but a bigger question is: why do we want to support this use case? Why do we want to be able to tokenize strings/keys and add up their values for each token?

nickjong · 2018-10-19T22:50:40Z

Yeah, I'm not sure I understand the use case, but the documented behavior makes the most sense given the shape of the API. If you treat each key as an atomic unit, then there's no real need for the API (and there's no notion of "word" involved). If you just count the number of keys containing each word, then the dictionary values are irrelevant and the inputs should just be lists.

dhivyaaxim · 2019-10-29T11:59:36Z

@TobyRoseman Please assign this ticket to me as I would like to try fixing it and contribute. Thanks in advance!

TobyRoseman added the bug, question, and text analytics labels Aug 9, 2018

znation added the toolkits label Oct 17, 2018

nickjong added the p3 label Oct 19, 2018

TobyRoseman assigned dhivyaaxim Oct 29, 2019

dhivyaaxim mentioned this issue Nov 4, 2019

Dictionary input to count_words #2558

Merged

dhivyaaxim mentioned this issue Nov 13, 2019

Wrong behavior for nan converting from float to int #2557

Merged

guihao-liang closed this as completed in #2558 Nov 19, 2019
