New filter: website resembles username #450
Sounds like a good idea.
Actually, having assigned this to myself, I've just realised this isn't currently possible. We only check one of username/title/body/summary at a time, so at no point does the check code have access to both.
@ArtOfCode- Wouldn't it be possible to schedule the username check before the body check and save the username temporarily, so you can access it in the body check?
@magisch Possibly. Would have to look at that.
Sounds messy; I'd probably be against that. It'd be better to just make a new reason-method type that takes all parts of the post at once.
@Undo1 Agree.
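A rough sketch of what a "whole-post" reason-method type could look like. Everything here is hypothetical: `Post`, `whole_post_check` and `run_whole_post_checks` are illustrative names, not SmokeDetector's actual code, and the placeholder check is just a substring test.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Post:
    """Hypothetical container holding every field a check might need."""
    title: str
    body: str
    username: str
    summary: str = ""


# A whole-post check sees all fields at once and returns (matched, reason).
WholePostCheck = Callable[[Post], Tuple[bool, str]]
WHOLE_POST_CHECKS: List[WholePostCheck] = []


def whole_post_check(func: WholePostCheck) -> WholePostCheck:
    """Decorator that registers a check needing more than one post field."""
    WHOLE_POST_CHECKS.append(func)
    return func


@whole_post_check
def username_in_body(post: Post) -> Tuple[bool, str]:
    # Placeholder logic: case-insensitive substring test, skipping very short names.
    name = post.username.replace(" ", "").lower()
    if len(name) > 3 and name in post.body.lower():
        return True, "Username appears in body"
    return False, ""


def run_whole_post_checks(post: Post) -> List[str]:
    """Run every registered whole-post check and collect the reasons that fired."""
    reasons = []
    for check in WHOLE_POST_CHECKS:
        matched, reason = check(post)
        if matched:
            reasons.append(reason)
    return reasons
```

Existing single-field checks could stay as they are; only checks that genuinely need two fields would be registered this way, which avoids storing the username temporarily between separate checks.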
That looks awesome. We already have BeautifulSoup (version 4, I think?), and that TLD library is tiny. Do you have the code for this in a branch somewhere?
On Tue, Feb 21, 2017, 7:34 AM A Wegner wrote:
Are there other test cases to run against? Right now I am using the following tests:

```python
# (link, username) pairs to compare
checks = [
    ("http://www.price-buy.com/", "Price Buy"),
    ("https://thebestparkourgear.com/backpack-for-parkour/", "TheBestParkourGear"),
    ("httl://bestonwardticket.com", "Best onward Ticket"),
    ("https://i.stack.imgur.com/eS6WQ.jpg", "Best onward Ticket"),
    ("www.stackoverflow.com", "Andy"),
    ("www.stackoverflow.notarealtld", "Andy"),
    ("stackoverflow.notarealtld", "Andy"),
    ("http://stackoverflow.notarealtld", "Andy"),
    ("httl://stackoverflow.notarealtld", "Andy"),
]
```
I get the following results:

```
SIMILAR: (1.0) => Name: Price Buy, Domain: http://www.price-buy.com/
SIMILAR: (1.0) => Name: TheBestParkourGear, Domain: https://thebestparkourgear.com/backpack-for-parkour/
SIMILAR: (1.0) => Name: Best onward Ticket, Domain: httl://bestonwardticket.com
NOT SIMILAR: (0.0952380952381) => Name: Best onward Ticket, Domain: https://i.stack.imgur.com/eS6WQ.jpg
NOT SIMILAR: (0.117647058824) => Name: Andy, Domain: www.stackoverflow.com
NOT SIMILAR: (0.117647058824) => Name: Andy, Domain: www.stackoverflow.notarealtld
NOT SIMILAR: (0.117647058824) => Name: Andy, Domain: stackoverflow.notarealtld
NOT SIMILAR: (0.0) => Name: Andy, Domain: http://stackoverflow.notarealtld
NOT SIMILAR: (0.0) => Name: Andy, Domain: httl://stackoverflow.notarealtld
```
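The comment doesn't say how the score is computed. Some of the numbers above are consistent with Python's `difflib.SequenceMatcher(...).ratio()` on lowercased strings with punctuation removed: "andy" vs "stackoverflow" shares one character, giving 2·1/(4+13) = 2/17 ≈ 0.1176, and "bestonwardticket" vs "imgur" gives 2/21 ≈ 0.0952. The sketch below assumes that metric. It also swaps in a naive `urllib.parse` split (take the label left of the last dot) instead of the tld library, so it gets multi-part suffixes like `.co.uk` wrong and may score the `http://`/`httl://` `notarealtld` rows differently from the results above. `domain_name`, `normalize` and `similarity` are hypothetical names.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlsplit


def domain_name(link: str) -> str:
    """Return the label just left of the last dot, e.g. 'price-buy' for www.price-buy.com.

    Naive assumption: the public suffix is a single label, so 'example.co.uk'
    yields 'co'. A public-suffix-aware library such as tld avoids that.
    """
    if "://" not in link:
        link = "http://" + link  # urlsplit only fills in the host when a scheme is present
    host = urlsplit(link).hostname or ""
    labels = [part for part in host.split(".") if part]
    return labels[-2] if len(labels) >= 2 else ""


def normalize(text: str) -> str:
    # Lowercase and keep only letters and digits ("Price Buy" -> "pricebuy").
    return re.sub(r"[^a-z0-9]", "", text.lower())


def similarity(username: str, link: str) -> float:
    a, b = normalize(username), normalize(domain_name(link))
    if not a or not b:
        return 0.0  # SequenceMatcher would report 1.0 for two empty strings
    return SequenceMatcher(None, a, b).ratio()
```

Where to put the SIMILAR / NOT SIMILAR cut-off is a separate decision; the results above only show that true matches land at 1.0 and the unrelated pairs stay near 0.1.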
It's a little messier than I thought it'd be, and it does require adding a library to Smokey (<https://pypi.python.org/pypi/tld>), but it works. My tests have been pretty simple so far: I've only passed in the domain, not the entire body of the post. Doing that will require an HTML parser (likely BeautifulSoup), so that would need to be included too.
What I need:

- The OK to include at least one new library: tld (<https://pypi.python.org/pypi/tld>). If we don't already include BeautifulSoup, we also need to include that for parsing the links out of the body.
- More test cases, so I can add them here and make sure I'm not missing any other cases.
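To check the entire body rather than a single domain, the links first have to be pulled out of the post's HTML. A minimal sketch using the standard library's `html.parser`, swapped in for BeautifulSoup so it runs without extra packages; with bs4 the rough equivalent would be `[a["href"] for a in BeautifulSoup(body, "html.parser").find_all("a", href=True)]`. It assumes the body is HTML and only collects `<a href>` targets, not image `src` attributes. `LinkCollector` and `links_in_body` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in an HTML post body."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attrs is a list of (name, value).
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_in_body(body_html: str) -> List[str]:
    parser = LinkCollector()
    parser.feed(body_html)
    parser.close()
    return parser.links
```

Each collected link could then be scored against the username individually, flagging the post if any link scores high enough.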
Good job! Here's another true positive (TP) from today: https://metasmoke.erwaysoftware.com/post/58200
@Glorfindel83 HyperText Testing Language
No, no branch yet. I've been testing alternatives all morning, though, and am ready to implement. However, this raises another point of discussion: I've opened a separate issue because it will affect more than just this change. Related issue: #538
@Glorfindel83, yes it does. That
Closed with 2860085
E.g. for these kinds of spam posts, which go undetected quite often or
https://metasmoke.erwaysoftware.com/post/52946
https://metasmoke.erwaysoftware.com/post/52841
https://metasmoke.erwaysoftware.com/post/51936
Procedure: replace the spaces in the username with `\W?` and check whether any link in the post matches the resulting pattern.
Some users have 3-character usernames, which could trigger the filter by accident. Maybe this check should only apply to usernames above a certain length.
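The procedure above can be sketched with Python's `re` module. The minimum length of 4 is a hypothetical stand-in for "a certain length", and `website_resembles_username` is an illustrative name, not an existing SmokeDetector function. Note that `\W` does not match underscores, so a link containing `price_buy` would not match "Price Buy".

```python
import re
from typing import Iterable

MIN_USERNAME_LENGTH = 4  # hypothetical threshold; the issue only says "a certain length"


def username_pattern(username: str) -> re.Pattern:
    # Escape each word and allow one optional non-word character between words,
    # so "Price Buy" matches "pricebuy", "price-buy" and "price.buy".
    words = username.split()
    return re.compile(r"\W?".join(re.escape(word) for word in words), re.IGNORECASE)


def website_resembles_username(username: str, links: Iterable[str],
                               min_length: int = MIN_USERNAME_LENGTH) -> bool:
    # Skip short usernames, which are likely to match links by accident.
    if len(username.replace(" ", "")) < min_length:
        return False
    pattern = username_pattern(username)
    return any(pattern.search(link) for link in links)
```

This substring approach is stricter than a similarity score: "Andy" vs stackoverflow.com is a clean miss rather than a low number, but near-misses such as a misspelled domain won't match at all.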