Conflicting Entries in separate (country) files #1276

Closed
Luinquid opened this issue Apr 24, 2023 · 2 comments

@Luinquid

I was thinking of using some lists for a project, when I noticed some domains being mentioned in multiple files.

Sadly, there are not only duplicates but also conflicting classifications:

http://livetv.sx/: 2 conflicts
lists/fr.csv:http://livetv.sx/,MMED,Media sharing,2022-09-26,Community Member,blocked streaming website
lists/it.csv:http://livetv.sx/,FILE,File-sharing,2017-04-12,,Site reported to be blocked by AGCOM - Italian Autority on Communication

http://rutracker.org/: 2 conflicts
lists/md.csv:http://rutracker.org/,HOST,Hosting and Blogging Platforms,2014-04-15,citizenlab,
lists/ru.csv:http://rutracker.org/,FILE,File-sharing,2014-04-15,citizenlab,largest torrent sharing service
lists/ua.csv:http://rutracker.org/,HOST,Hosting and Blogging Platforms,2014-04-15,citizenlab,

http://www.hkfront.org/: 2 conflicts
lists/cn.csv:http://www.hkfront.org/,HUMR,Human Rights Issues,2019-09-27,Netalitica,HK independence
lists/hk.csv:http://www.hkfront.org/,MILX,Terrorism and Militants,2020-07-19,Netalitica,separatism

https://cabar.asia/: 2 conflicts
lists/kg.csv:https://cabar.asia/,HUMR,Human Rights Issues,2022-09-16,School of peacemaking,
lists/tj.csv:https://cabar.asia/,NEWS,News Media,2023-01-30,CIPI,

https://chaturbate.com/: 2 conflicts
lists/lb.csv:https://chaturbate.com/,PORN,Pornography,2020-07-24,Netalitica,
lists/pl.csv:https://chaturbate.com/,COMT,Communication Tools,2021-12-21,Netaltica,

https://curia.europa.eu/: 2 conflicts
lists/hu.csv:https://curia.europa.eu/,IGO,Intergovernmental Organizations,2021-12-29,Netalitica,Court of Justice of the European Union
lists/pl.csv:https://curia.europa.eu/,GOVT,Government,2022-06-10,Netaltica,Court of Justice of the European Union

https://ec.europa.eu/: 2 conflicts
lists/hu.csv:https://ec.europa.eu/,IGO,Intergovernmental Organizations,2021-12-29,Netalitica,The European Commission
lists/pl.csv:https://ec.europa.eu/,GOVT,Government,2022-06-10,Netaltica,The European Commission

https://mediazona.by/: 2 conflicts
lists/by.csv:https://mediazona.by/,POLR,Political Criticism,2021-03-05,community member,
lists/ru.csv:https://mediazona.by/,NEWS,News Media,2023-03-10,test-lists.ooni.org contribution,

https://politobzor.net/: 2 conflicts
lists/kz.csv:https://politobzor.net/,NEWS,News Media,2019-11-15,Netalitica,Russian journal with focus on politics
lists/pl.csv:https://politobzor.net/,POLR,Political Criticism,2022-06-10,Netaltica,medium with focus on politics
lists/ru.csv:https://politobzor.net/,NEWS,News Media,2019-11-15,Netalitica,focus on politics

https://steamcommunity.com/: 2 conflicts
lists/cn.csv:https://steamcommunity.com/,GAME,Gaming,2022-08-19,test-lists.ooni.org contribution,
lists/kz.csv:https://steamcommunity.com/,MMED,Media sharing,2019-11-11,Netalitica,reportedly blocked in KZ

https://www.coe.int/: 2 conflicts
lists/hu.csv:https://www.coe.int/,IGO,Intergovernmental Organizations,2021-12-29,Netalitica,Council of Europe
lists/pl.csv:https://www.coe.int/,GOVT,Government,2022-06-10,Netaltica,Council of Europe

https://www.europarl.europa.eu/: 2 conflicts
lists/hu.csv:https://www.europarl.europa.eu/,IGO,Intergovernmental Organizations,2021-12-29,Netalitica,European Parliament
lists/pl.csv:https://www.europarl.europa.eu/,GOVT,Government,2022-06-10,Netaltica,European Parliament
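The report above looks like the output of a small script run over lists/*.csv. Below is a minimal sketch of such a check in Python. It assumes each row follows the column order seen in the entries above (url, category code, category description, date added, source, notes) and that a header row, if present, starts with "url". The helper name find_category_conflicts is hypothetical and is not part of the test-lists repository. It groups each URL's entries by category code and keeps only URLs with more than one distinct code. This appears to be how the counts above were computed: rutracker.org is listed in three files but with only two codes.

```python
import csv
from collections import defaultdict
from pathlib import Path


def find_category_conflicts(list_dir):
    """Return URLs that appear with more than one distinct category code.

    Hypothetical helper. Assumes rows are: url, category_code,
    category_description, date_added, source, notes.
    Result shape: {url: {category_code: [file names using that code]}}
    """
    # url -> category_code -> list of CSV file names using that code
    seen = defaultdict(lambda: defaultdict(list))
    for path in sorted(Path(list_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as fh:
            for row in csv.reader(fh):
                # Skip blank/short lines and a header row (assumed to start with "url").
                if len(row) < 2 or row[0] == "url":
                    continue
                url, code = row[0].strip(), row[1].strip()
                seen[url][code].append(path.name)
    # Same code in several files is a plain duplicate; only keep real disagreements.
    return {url: dict(codes) for url, codes in seen.items() if len(codes) > 1}


if __name__ == "__main__":
    # "lists" matches the directory used in the paths above.
    conflicts = find_category_conflicts("lists")
    for url, codes in sorted(conflicts.items()):
        print(f"{url}: {len(codes)} distinct categories")
        for code, files in sorted(codes.items()):
            print(f"  {code}: {', '.join(sorted(files))}")
```

Grouping by category code, rather than just counting files, separates intentional multi-country duplicates (the same code in several files) from actual classification disagreements.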

Possible improvements are:

@sloncocs self-assigned this on Apr 25, 2023
@sloncocs
Collaborator

Hi @Luinquid! Thank you for your interest in the test lists!

Indeed, the same domains are sometimes included in multiple country-specific test lists, but in most cases this is done on purpose: these are domains that are relevant for testing in several countries but not worldwide. Keeping them out of the global test list avoids cluttering it, since the global list is tested by all OONI Probe users in all countries.

For example, in #1274 you can see that we added https://sputniknews.lat/ to 19 different country-specific test lists because it is a Spanish-language media outlet that targets countries in Latin America. We could add it to the global test list instead, so it would be tested in all countries including these 19, but that would take testing capacity away from other countries where this domain is not relevant (e.g., non-Spanish-speaking countries), and that capacity could be critical for some countries where we do not have many measurements.

Regarding the categorization, thank you for highlighting this! I will create a separate issue and link it here.

@sloncocs
Collaborator

Follow-up: inconsistent categorization should be addressed as part of this issue: ooni/backend#611
