This repository has been archived by the owner on Dec 14, 2023. It is now read-only.

Verify that Chinese stopwords get removed, add some more stopwords if needed #172

Closed
pypt opened this issue Jul 10, 2017 · 3 comments

Comments

@pypt
Contributor

pypt commented Jul 10, 2017

Hey Natalie,

Chinese support (#169) works fine, but it seems like some stopwords are still present in the word cloud:

https://dashboard.mediacloud.org/#query/["北京市"]/[{}]/["2017-6-25"]/["2017-7-9"]/[{"uid":1,"name":"北京市","color":"e14c11"}]

Many terms from the word cloud look like stopwords to me:

[Screenshot, 2017-07-10 17:31:30]

Does this Google Translate screenshot make sense? If so, maybe it would be worth finding more stopword lists and merging them into our current one?
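A minimal sketch of the merging idea, in Python. It assumes each stopword list is a plain iterable of strings and that the Chinese text has already been segmented into words. The function names and the sample lists are hypothetical and are not Media Cloud's actual code. `年`, `月` and `日` are the characters for "year", "month" and "day".

```python
# Hypothetical helpers for illustration; not taken from the Media Cloud code base.
from typing import Iterable, List, Set


def merge_stopword_lists(*lists: Iterable[str]) -> Set[str]:
    """Union several stopword lists, stripping whitespace and skipping blank entries."""
    merged: Set[str] = set()
    for words in lists:
        for word in words:
            word = word.strip()
            if word:
                merged.add(word)
    return merged


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Keep only tokens that are not stopwords, preserving their original order."""
    return [token for token in tokens if token not in stopwords]


# Hypothetical sample data: a "current" list and an extra list to merge in.
current_list = ["的", "了", "在"]
extra_list = ["年", "月", "日", "的 "]  # the trailing space is stripped, so "的" is not duplicated

stopwords = merge_stopword_lists(current_list, extra_list)
filtered = remove_stopwords(["北京市", "年", "的", "新闻"], stopwords)
```

Using a set means duplicates across the source lists collapse on their own, and each membership check during filtering runs in constant time on average.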

@pypt
Contributor Author

pypt commented Jul 11, 2017

Pushed #173.

@pypt
Contributor Author

pypt commented Jul 11, 2017

@natlungfy, are you sure you've added everything you wanted to the updated list in #173?

I still see "year", "month", and "day" in the dashboard word cloud for https://dashboard.mediacloud.org/#query/["北京市"]/[{}]/["2017-6-25"]/["2017-7-9"]/[{"uid":3,"name":"北京市","color":"e14c11"}]:

[Screenshot, 2017-07-11 14:16:56]

(By the way, if you add the magic phrase "fixes #issue_number" to your PR's description, the PR will be linked to this issue, and the issue will be closed automatically once the PR is merged.)

@pypt
Contributor Author

pypt commented Jul 11, 2017

Deployed, seems to work now!
