New filter: website resembles username #450

Glorfindel83 · 2017-01-10T12:13:20Z

E.g. for these kinds of spam posts, which go undetected quite often:
https://metasmoke.erwaysoftware.com/post/52946
https://metasmoke.erwaysoftware.com/post/52841
https://metasmoke.erwaysoftware.com/post/51936

Procedure: replace spaces in the username with \W? and check whether the post contains a link that matches the resulting pattern.
Some users have 3-character usernames, which could trigger the filter by accident. Maybe this should only apply to usernames above a certain length.
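
A minimal sketch of that procedure, for illustration only. The function name, the `links` argument (a list of URLs already extracted from the post) and the length cutoff of 5 are all assumptions; the proposal above only says "a certain length".

```python
import re

MIN_NAME_LENGTH = 5  # hypothetical cutoff, not a value from this thread


def username_in_links(username, links, min_length=MIN_NAME_LENGTH):
    """Return True if any link contains the username, matching spaces loosely."""
    name = username.strip()
    if len(name) < min_length:
        # Short names are skipped to avoid accidental matches
        return False
    # Escape each word, then join the words with \W? so that a space in the
    # username matches zero or one non-word character in the link.
    pattern = r"\W?".join(re.escape(word) for word in name.split())
    regex = re.compile(pattern, re.IGNORECASE)
    return any(regex.search(link) for link in links)
```

With this sketch, "Price Buy" matches http://www.price-buy.com/ because the hyphen is consumed by \W?, while a short name such as "Andy" is never checked at all.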

magisch · 2017-01-10T12:27:46Z

Sounds like a good idea

ArtOfCode- · 2017-01-10T13:26:52Z

Actually, having assigned this to myself, I've just realised this isn't currently possible. We only check one of username/title/body/summary at a time, so there's no point at which the check code has access to both the username and the body.

magisch · 2017-01-11T08:21:48Z

@ArtOfCode- Wouldn't it be possible to schedule the Username check before the body check and save the username temporarily so you can access it in the body check?

ArtOfCode- · 2017-01-11T12:12:18Z

@magisch Possibly. Would have to look at that.

Undo1 · 2017-01-11T16:38:59Z

Sounds messy, I'd probably be against that. It'd be better to just make a new reason-method type that takes all parts of the post at once.
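
One way to picture a check type that receives the whole post, as a rough sketch. The `Post` class, `run_checks` and the reason strings are hypothetical names made up for this example and are not SmokeDetector's actual code.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical container for the parts of a post
    title: str
    body: str
    username: str


def run_checks(post, part_checks, post_checks):
    """Run per-part checks on each string and whole-post checks on the Post."""
    reasons = []
    for reason, check in part_checks:
        # A per-part check only ever sees one string at a time
        for part in ("title", "body", "username"):
            if check(getattr(post, part)):
                reasons.append(f"{reason} in {part}")
    for reason, check in post_checks:
        # A whole-post check can compare parts against each other
        if check(post):
            reasons.append(reason)
    return reasons


def username_in_body(post):
    """Example whole-post check: the squashed username appears in the body."""
    squashed = post.username.replace(" ", "").lower()
    return bool(squashed) and squashed in post.body.lower()
```

The point of the design is that nothing has to be stashed between checks: the username and body arrive together in one call.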

ghost · 2017-01-12T03:17:53Z

@Undo1 agree

AWegnerGitHub · 2017-02-21T14:34:10Z

Are there other test cases to run against? Right now I am using the following tests:

checks = [
	("http://www.price-buy.com/", "Price Buy"),
	("https://thebestparkourgear.com/backpack-for-parkour/", "TheBestParkourGear"),
	("httl://bestonwardticket.com", "Best onward Ticket"),
	("https://i.stack.imgur.com/eS6WQ.jpg", "Best onward Ticket"),
	("www.stackoverflow.com", "Andy"),
	("www.stackoverflow.notarealtld", "Andy"),
	("stackoverflow.notarealtld", "Andy"),
	("http://stackoverflow.notarealtld", "Andy"),
	("httl://stackoverflow.notarealtld", "Andy"),
]

I get the following results:

SIMILAR: (1.0) => Name: Price Buy, Domain: http://www.price-buy.com/
SIMILAR: (1.0) => Name: TheBestParkourGear, Domain: https://thebestparkourgear.com/backpack-for-parkour/
SIMILAR: (1.0) => Name: Best onward Ticket, Domain: httl://bestonwardticket.com
NOT SIMILAR: (0.0952380952381) => Name: Best onward Ticket, Domain: https://i.stack.imgur.com/eS6WQ.jpg
NOT SIMILAR: (0.117647058824) => Name: Andy, Domain: www.stackoverflow.com
NOT SIMILAR: (0.117647058824) => Name: Andy, Domain: www.stackoverflow.notarealtld
NOT SIMILAR: (0.117647058824) => Name: Andy, Domain: stackoverflow.notarealtld
NOT SIMILAR: (0.0) => Name: Andy, Domain: http://stackoverflow.notarealtld
NOT SIMILAR: (0.0) => Name: Andy, Domain: httl://stackoverflow.notarealtld

It's a little messier than I thought it'd be, and does require a library be added to Smokey, but it works. My tests have been pretty simple so far. I've only passed the domain, not the entire body of the text. Doing that will require an HTML parser (likely BeautifulSoup), so that'd need to be included too.
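
For readers who want to reproduce scores of this kind, here is a rough standard-library sketch. It is an assumption, not the code used above: the thread does not say which similarity measure was used, the 0.9 threshold is invented, and `domain_label` naively takes the second-to-last host label instead of using the tld library, so it would mishandle suffixes such as co.uk.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

SIMILARITY_THRESHOLD = 0.9  # hypothetical; no cutoff is stated in this thread


def _normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def domain_label(url):
    """Naively pick the label just before the last dot of the host name."""
    # urlparse only recognises the host when the URL contains "//"
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host


def similarity(username, url):
    """Ratio in [0, 1] between the squashed username and the domain label."""
    return SequenceMatcher(
        None, _normalize(username), _normalize(domain_label(url))
    ).ratio()


def is_similar(username, url, threshold=SIMILARITY_THRESHOLD):
    return similarity(username, url) >= threshold
```

Under these assumptions "Price Buy" against http://www.price-buy.com/ scores 1.0, and "Andy" against www.stackoverflow.com stays far below the threshold.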

What I need:

- The OK to include at least one new library: tld (https://pypi.python.org/pypi/tld). If we don't already include BeautifulSoup, we also need to include that for parsing the links out of the body.
- More test cases, so I can add them here and make sure I'm not missing any other cases.

Undo1 · 2017-02-21T14:56:50Z

That looks awesome. We already have beautifulsoup (4, I think?), and that TLD library is tiny. Have the code for this in a branch somewhere?

Glorfindel83 · 2017-02-21T14:57:36Z

Good job! Here's another TP from today: https://metasmoke.erwaysoftware.com/post/58200
Also, one of your test cases has an httl:// scheme. I don't know that scheme.

ArtOfCode- · 2017-02-21T15:28:26Z

@Glorfindel83 HyperText Testing Language

AWegnerGitHub · 2017-02-21T19:13:26Z

No, no branch yet. I've been testing alternatives all morning, though, and am ready to implement. However, this brings up another point of discussion. I've opened another issue because it will impact more than just this change.

Related issue: #538

AWegnerGitHub · 2017-02-21T20:40:32Z

@Glorfindel83, yes it does. That httl is from https://metasmoke.erwaysoftware.com/post/51936

AWegnerGitHub · 2017-02-23T18:34:56Z

Closed with 2860085

ArtOfCode- self-assigned this Jan 10, 2017

ArtOfCode- removed their assignment Jan 10, 2017

angussidney added the "type: feature request" and "area: spamchecks" labels Jan 23, 2017

AWegnerGitHub mentioned this issue Feb 21, 2017

Discussion: Spam methods need multiple parts of a post #538

Closed

AWegnerGitHub mentioned this issue Feb 22, 2017

This implements username like website; adjusts method calls to accept *args #539

Merged

AWegnerGitHub closed this as completed Feb 23, 2017
