Proposal: Category "fake science" / malicious journals #720

Open
pascalwhoop opened this issue Jul 19, 2018 · 7 comments
pascalwhoop commented Jul 19, 2018

This project mirrors a list that was taken down recently. The idea is to help researchers steer clear of fake journals. It could be very useful for academic institutions that want to make sure their researchers aren't tricked into publishing in such fake journals.
Obviously this requires good crowdsourcing to ensure the listed domains are actually fake and not legitimate but small journals.
Kicking off a discussion to see what others think.

I think this repo is a great place to host this. You have the reputation, experience, and toolchain to manage such a list efficiently and publicly. A recent study by my university found that 5% of all German researchers have been tricked at least once, and that several thousand researchers worldwide have been fooled by these journals.

I'd be happy to turn that linked list into an initial hosts file, but I'd like to make sure somehow that these are actually all fake, and I'm not yet sure how that could easily be achieved.

welcome bot commented Jul 19, 2018

Hello! Thank you for opening your first issue in this repo. It’s people like you who make these host files better!

StevenBlack (Owner) commented:

Hi @pascalwhoop, that's a very interesting idea. Thanks!

katrinleinweber commented:

Nice :-) Should the list be maintained here then, or over at @stop-predatory-journals?

I wonder whether it would be possible to generate the journals & publishers lists as machine-readable hosts files (one or two), and auto-generate the website from them?

pascalwhoop (Author) commented:

@katrinleinweber you can definitely generate the lists (hosts -> csv -> website) automatically, as long as the base list follows some strict pattern. The rest can be done with grep and sed. I wrote a script that did most of the work from the csv files to hosts files, but the csv files are a bit messy, so I didn't continue.
More importantly, how do we ensure that these lists are "true"? I imagine there is a gradient between predatory journals and just really unpopular / unimportant ones. What would be a good "in or out" criterion? Alternatively, we could have 3 categories with increasing levels of "probably evil". Universities could then manage these and handle them differently: a yellow warning, a red warning, and finally a complete block of the host from within their network.
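As an illustration of the csv -> hosts step, here is a minimal Python sketch. The column name "url" and the 0.0.0.0 sink address are assumptions for illustration; the actual column layout of the stop-predatory-journals CSV files may differ.

```python
#!/usr/bin/env python3
"""Sketch: turn a journals CSV into hosts-file lines.

Assumes one column (here called "url") holds the journal's address;
the real CSV may name its columns differently.
"""
import csv
from urllib.parse import urlparse


def csv_to_hosts(csv_path, url_column="url"):
    hosts = set()
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            raw = (row.get(url_column) or "").strip()
            if not raw:
                continue
            # urlparse only finds a hostname when a scheme is present
            if "//" not in raw:
                raw = "http://" + raw
            host = urlparse(raw).hostname
            if host:
                hosts.add(host)
    return sorted("0.0.0.0 " + h for h in hosts)
```

Using a real CSV parser instead of grep/sed sidesteps most of the messiness, since quoted fields and stray separators are handled for you.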

I will contact my university's network administrators and see what they have set up in terms of infrastructure. Hosts files are a good start for plain DNS blocking, but there may be other approaches that are a bit more complex yet gentler, like the "this is malware, continue anyway?" page that Chrome sometimes displays. One could have an internally hosted application that says "this is a bad journal known to trick people, continue anyway?" and, if the researcher chooses to continue, forwards them to the actual website.

katrinleinweber commented:

More importantly, how do we ensure that these lists are "true"?

In whatever way @stop-predatory-journals is currently using. See stop-predatory-journals/stop-predatory-journals.github.io#1 (comment) for example. That's also why I think a hosts file should either be maintained there, or auto-generated from their source.

What exactly is wrong with their CSV files? I imagine they can be cleaned up so that they lend themselves to being processed automatically with sed.


pascalwhoop commented Aug 7, 2018

I was having trouble handling this line, for example:
https://github.com/stop-predatory-journals/stop-predatory-journals.github.io/blob/master/_data/journals.csv#L361

also lines 404 and 477
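Rows like these often defeat line-oriented grep/sed pipelines when a field contains a quoted comma; a proper CSV parser copes fine. A tiny sketch (the actual contents of those lines are not shown here, so the quoted-comma case is just an illustrative guess):

```python
import csv
import io

# A quoted field containing a comma breaks naive line splitting on ",",
# but csv.reader parses it back into a single field.
sample = 'name,url\n"Journal of A, B and C",http://fake.example\n'
rows = list(csv.reader(io.StringIO(sample)))
```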

spirillen (Contributor) commented Nov 20, 2018

After reading up on this project's goal, I must admit it's a good idea, but how would you tell the merely greedy sites apart from the hoax sites?

As I understand this repo, it's not directed against greedy actors, as github.com (Microsoft) would otherwise have been added to the hosts file.
