incorrect result while running on large dataset #134

un-lock-me · 2022-02-01T22:27:34Z

Hello,

I am trying out your tool and have run into a weird bug. I would really appreciate it if you could share your thoughts on this issue. I have a dataset of, let's say, 1000 instances (some positive, some negative, and the rest neutral). When I run the tool on the CSV file, only a portion of each category is labeled correctly!
For example, "Great place" is labeled positive, but "GREAT!" is labeled neutral. And if I remove the "Great place" instance from the dataset, then "Great" is labeled positive!

So, I tried different scenarios to track down the bug, and the only conclusion I could reach is that it stops working correctly as the number of samples increases. But I don't understand why.

I tried another scenario as well. I ran the code on the CSV file and saved the results back to the CSV file. Then, right after the CSV labeling finished, I passed just "GREAT!" to the model. It was labeled neutral again! (If I pass "GREAT!" before running the model on the CSV file, it is labeled positive.) This more or less confirms what I said earlier.
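
The scenario above can be turned into a small check. The following is only a sketch, not the code from the original pipeline: it scores every text once with a shared scorer (as a batch run would) and once with a freshly built scorer, then reports any text whose score differs. The helper names `vader_scorer` and `order_dependent_texts` are made up for illustration; the only VADER calls used are `SentimentIntensityAnalyzer()` and `polarity_scores(text)["compound"]`, and the sketch assumes the `vaderSentiment` package is installed.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str], float]


def vader_scorer() -> Scorer:
    """Build a scorer backed by VADER (assumes `pip install vaderSentiment`)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]


def order_dependent_texts(
    texts: Sequence[str], make_scorer: Callable[[], Scorer]
) -> List[Tuple[str, float, float]]:
    """Return (text, batch_score, fresh_score) for texts whose score changes.

    `make_scorer` is a hypothetical factory that returns a new scorer on
    each call; pass `vader_scorer` to test VADER itself.
    """
    shared = make_scorer()
    # Score everything in order with one shared scorer, like a CSV run.
    batch_scores = [shared(text) for text in texts]

    mismatches = []
    for text, batch_score in zip(texts, batch_scores):
        # Score the same text again with a brand-new scorer.
        fresh_score = make_scorer()(text)
        if fresh_score != batch_score:
            mismatches.append((text, batch_score, fresh_score))
    return mismatches
```

Calling `order_dependent_texts(["Great place", "GREAT!"], vader_scorer)` returns an empty list if scoring each text is independent of what was scored before it. In that case the pipeline around the scorer (CSV parsing, column selection, how results are written back) is the more likely place to look.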

Could you please tell me what the reason might be? The code seems very straightforward, so I don't know why this is happening.

Thanks in advance @cjhutto

cjhutto · 2022-02-02T14:19:34Z

Hi @un-lock-me ... this does seem strange, indeed. 1000 instances should be extremely easy for VADER (I and others routinely use it for files with thousands and millions of records). Would you mind sharing a sample of the structure of the CSV file and your pipeline/code to show how you are parsing and processing the CSV file?

Siddharth-Latthe-07 · 2024-06-27T05:01:41Z

@un-lock-me, the VADER module is lexicon- and rule-based: it looks up the sentiment valence of the words in a text and returns a compound score between -1 and +1. The sentiment output can therefore differ depending on whether words are sent to the model individually ("Great", "place") or as a phrase ("Great place"). Apart from this, the difference in the output for the word "great" is sort of related to how the model processes a word together with symbols: "Great!" and "Great" might get different sentiment scores even though the word is the same, because VADER can treat punctuation such as exclamation marks as an intensity cue.
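
To compare such variants directly, a minimal sketch like the one below could be used. It assumes the `vaderSentiment` package is installed and uses the compound-score cutoffs suggested as typical in the VADER README (positive at 0.05 and above, negative at -0.05 and below, neutral in between). The function names `label_from_compound` and `compare_variants` are hypothetical and only serve the illustration.

```python
from typing import Dict, Sequence, Tuple


def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, +1] to a coarse label.

    The default threshold follows the typical values from the VADER
    README; adjust it if your labeling scheme differs.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def compare_variants(variants: Sequence[str]) -> Dict[str, Tuple[float, str]]:
    """Score each variant on its own and return {text: (compound, label)}."""
    # Imported lazily so label_from_compound works without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    results = {}
    for text in variants:
        compound = analyzer.polarity_scores(text)["compound"]
        results[text] = (compound, label_from_compound(compound))
    return results
```

For example, `compare_variants(["Great", "Great!", "GREAT!", "Great place"])` shows the compound score and label for each variant side by side, which makes it easy to see whether punctuation or phrasing changes the result.
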
Hope this helps
Thanks.
