INVALID domains catched by PyFunceble (dead-hosts) #674

Closed
funilrys opened this issue Jun 19, 2018 · 16 comments

@funilrys
Contributor

Hi Steven @StevenBlack,

I hope that you're doing well!

I just wanted to mention the following entries, which were marked as INVALID by PyFunceble while testing with dead-hosts.

Here they are:

0.0.0.0 _thums.ero-advertising.com

0.0.0.0 o_thus.ero-advertising.com

About _thums.ero-advertising.com

Please note the presence of _ which is illegal in domain names.

About o_thus.ero-advertising.com

Please note the presence of _ which is illegal in domain names.

@StevenBlack
Owner

Thanks @funilrys fixed!

@Rikk

Rikk commented Jul 6, 2018

I don't think underscores are completely invalid. At least DNS resolves it. Luckily, http://o_thus.ero-advertising.com/ just redirects to another blocked subdomain.

https://stackoverflow.com/questions/2180465/can-domain-name-subdomains-have-an-underscore-in-it
https://stackoverflow.com/questions/7111881/what-are-the-allowed-characters-in-a-subdomain

@lightswitch05
Contributor

lightswitch05 commented Jul 6, 2018

@Rikk thanks for those links. I've been running my discovered domains through a pretty strict regex filter. I didn't think about it much at the time, but it could be dropping a bunch of discovered domains that would work for advertising or tracking without me knowing about it. I think I'll play around with the filter some this weekend so that it serves more as a symbol blacklist than a strict whitelist.
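To make the difference concrete, here is a minimal sketch of the two approaches. It is not my actual filter: the function names and the `FORBIDDEN` character set are assumptions picked for illustration.

```python
import re

# Strict "LDH" label: letters, digits and hyphens only, 1-63 characters,
# with no leading or trailing hyphen.
LDH_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

# Hypothetical blacklist of symbols that should never appear in an entry.
FORBIDDEN = set(" \t/\\:@#?\"'<>")


def passes_strict_whitelist(domain: str) -> bool:
    """Accept only names whose labels are all plain LDH labels."""
    labels = domain.rstrip(".").split(".")
    return len(domain) <= 253 and all(LDH_LABEL.match(label) for label in labels)


def passes_symbol_blacklist(domain: str) -> bool:
    """Accept anything with sane label lengths and no forbidden symbol."""
    labels = domain.rstrip(".").split(".")
    # Lengths are counted in characters here, which is only an approximation
    # for non-ASCII names.
    if len(domain) > 253 or any(not 1 <= len(label) <= 63 for label in labels):
        return False
    return not any(ch in FORBIDDEN for ch in domain)
```

With this sketch, `o_thus.ero-advertising.com` is dropped by the strict whitelist but kept by the symbol blacklist.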

Reading up on Punycode it says:

While the Domain Name System (DNS) technically supports arbitrary sequences of octets in domain name labels, the DNS standards recommend the use of the LDH subset of ASCII conventionally used for host names, and require that string comparisons between DNS domain names should be case-insensitive.

This is an interesting example: http://www.са.com. The URL looks fine, but it has a Unicode character in it. Most browsers will display it differently in the URL bar, but as far as the DNS lookup goes, it's perfectly valid. At the moment, my list would drop that domain since it doesn't pass the strict filter.
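For reference, a small sketch of what such a name looks like once it is converted to the ASCII form that is actually looked up. It relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules, and it assumes that both letters of the middle label are Cyrillic (U+0441, U+0430); `to_ascii` is just a helper name chosen for this example.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII (Punycode) form."""
    # The built-in "idna" codec encodes each non-ASCII label as "xn--...".
    return domain.encode("idna").decode("ascii")


# Cyrillic "es" and "a" written as escapes so the code points are explicit.
print(to_ascii("www.\u0441\u0430.com"))  # www.xn--80a7a.com
```

The ASCII form passes a strict LDH filter, so one option would be to convert names before filtering rather than loosening the filter itself.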

@funilrys
Contributor Author

funilrys commented Jul 10, 2018

Hello @Rikk @lightswitch05, and thanks for reopening this interesting and eternal discussion, which has to be settled and clarified one day.

Unless you can point me to another technical paper that is clear, not obscure, and not a Stack Overflow link, we are going to keep this as it is.

Indeed, I did not take that decision out of the blue. I first searched by myself and read those links (along with others), but it was not clear to me. So I took some of my free time to read the RFCs related to domain names and host names, and those were sometimes obscure about the presence of underscores.

Finally, I consulted many people, including Steven @StevenBlack, Mitchell @mitchellkrogza, and PhD students/researchers/professors from CISPA and my university, who were able to give a first-semester student some time to help him understand the problem and how he might solve it.
As a result of those awesome talks about domain names, host names and DNS resolution, I ended up marking those domains as INVALID.

But as I said, if you point me to a specialist, the RFC writers, or a technical paper that gives us a clear and indisputable statement about this, I'm ready to rewrite the regular expression that catches those invalid domains.

In the meantime, they are marked as INVALID today. But because Dead-Hosts keeps track of all INVALID and INACTIVE domains over time, if the algorithm or logic behind PyFunceble changes (including the validation of those INVALID domains), a simple diff of the clean.list or a lookup of the list (from the desired repository) will let any maintainer know whether they are still INVALID, so that they can reintroduce them into their list.
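As a rough sketch of that kind of diff (not how Dead-Hosts itself does it): given two snapshots of a flagged list, a set difference shows which entries are no longer flagged. The line format handled here (hosts-style or bare domains, with `#` comments) and the function names are assumptions.

```python
def load_domains(lines):
    """Collect domains from hosts-style lines, skipping blanks and comments."""
    domains = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Accept both "0.0.0.0 example.com" and bare "example.com" lines.
        domains.add(line.split()[-1].lower())
    return domains


def no_longer_flagged(old_snapshot, new_snapshot):
    """Return the domains present in the old snapshot but not in the new one."""
    return sorted(load_domains(old_snapshot) - load_domains(new_snapshot))
```

A maintainer could feed it the lines of an older and a newer copy of the list and review the result before reintroducing anything.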

Some other discussions/comments about PyFunceble or Funceble, the older sister:
#412 #669 (comment)

@Rikk

Rikk commented Jul 10, 2018

You don't need a PhD thesis, and I don't need to read RFCs. You just need to test the address on your browser: if it loads and you are thrown into a bad site, then I would say it is valid and desired to keep it blocked.

@funilrys
Contributor Author

funilrys commented Jul 10, 2018

@Rikk We are then talking about uncertainty, as your DNS can choose to resolve it or not. The idea is to have a global overview. Also, one of the links you shared covers, for example, the technical case of Java, which shows that it is not a good idea to have an underscore in a subdomain. So for me it's still INVALID, as it's not always resolvable.
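To illustrate why a single lookup is not a global answer, here is a minimal sketch of a resolution check. It only reports what the resolver used by the current machine says; the `resolves` helper and its injectable `resolver` parameter are assumptions made for the example, not part of PyFunceble.

```python
import socket


def resolves(name, resolver=socket.getaddrinfo):
    """Return True if the given resolver returns at least one address."""
    try:
        return len(resolver(name, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the lookup failed; UnicodeError: the name could not be
        # encoded for the lookup (for example an over-long label).
        return False
```

Running the same check through different resolvers or networks may give different answers for the same name, which is the uncertainty I am talking about.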

@Rikk

Rikk commented Jul 10, 2018

I prefer talking about this hosts file and what it should or should not block. If a certainly harmful domain exists and works for a lot of people, my vote is for it to be blocked.
This theoretical talk about unclear standards looks like no more than an excuse to exclude these domains from the blacklist. It feels like an anti-virus company saying something absurd like "It's not good practice to create programs that infect users' computers, therefore we won't block it" or "We don't think this virus will affect many people, so we won't do anything against it."

@mitchellkrogza
Contributor

@funilrys and I have done numerous tests regarding underscores in host names, and I never had any success getting Bind9 to co-operate with me. Your post, @Rikk, mentioning http://o_thus.ero-advertising.com/ led me to dig into this more deeply. I now have a definitive answer on underscores.

YES Bind9 can and does support underscores in host names.

A normal Bind9 master zone is configured as follows

zone "abuse.co.za" {
	type master;
	file "/var/lib/bind/abuse.co.za.hosts";
	};

Adding any underscore in a host name like mx_5 IN A 1.1.1.1 results in Bind failing to load the zone with the following error.

bad owner name (check-names)

A change is needed in the way the zone is configured as follows (telling bind to ignore checking names)

zone "abuse.co.za" {
	type master;
	file "/var/lib/bind/abuse.co.za.hosts";
	check-names ignore;
	};

Now I can add an entry mx_5 IN A 1.1.1.1 and an nslookup returns

nslookup mx_5.abuse.co.za
Server:  google-public-dns-a.google.com
Address:  8.8.8.8

Non-authoritative answer:
Name:    mx_5.abuse.co.za
Address:  1.1.1.1

And another host name, beginning with and containing an underscore:

nslookup _mx_5.abuse.co.za
Server:  google-public-dns-a.google.com
Address:  8.8.8.8

Non-authoritative answer:
Name:    _mx_5.abuse.co.za
Address:  1.1.1.1
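One reason lookups like these can succeed is that, on the wire, a DNS name is only a sequence of length-prefixed labels (RFC 1035, section 3.1), so an underscore travels like any other byte. The sketch below is not taken from Bind; `encode_qname` is a name chosen for illustration, and it handles ASCII names only.

```python
def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels, ending with the root."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label length out of range: {label!r}")
        out.append(len(raw))  # one length octet ...
        out += raw            # ... followed by the label's octets
    out.append(0)  # the zero-length root label terminates the name
    return bytes(out)


print(encode_qname("mx_5.abuse.co.za"))
# b'\x04mx_5\x05abuse\x02co\x02za\x00'
```

Nothing in this encoding treats `_` specially; the restriction comes from name-checking policies such as check-names, not from the message format.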

So .... very interesting for me as my past efforts in confirming this failed but clearly it can be done.

@mitchellkrogza
Contributor

mitchellkrogza commented Jul 17, 2018

An update after some extensive testing with nslookup, dig, Nginx and Let's Encrypt. @funilrys @StevenBlack

[Image: abuse-results]

@StevenBlack
Owner

StevenBlack commented Jul 17, 2018

Hey guys, just so we're clear: I don't care about presently invalid domains.

If I was writing malware, I would certainly use tactics to "disappear" a domain. That merely takes TTL time for DNS to propagate. I could then "reappear" the domain relatively quickly, anytime I want to strike.

So I don't want to scrub any domains the curators haven't scrubbed by whatever process they use to maintain their lists.

@funilrys
Contributor Author

We are clear about that and I don't know about others but as I mentioned before elsewhere here, it's not your role to clean what comes to you 😸

Basically, this whole discussion (as it has continued, not as the two of us started it) is about how to efficiently detect invalid domains!

But again I hope that it's clear for everyone reading this: This discussion (as continued) is not about how Steven @StevenBlack should work/clean his awesome compilation!

@Rikk

Rikk commented Jul 17, 2018

Invalid domains that work? Invalid in archaic theories, not in reality.
@StevenBlack I think the two subdomains excluded are examples of domains that should be readded to the list.

@funilrys
Contributor Author

@StevenBlack please lock this discussion, as it is taking us (you and me) into an unconstructive loop where we may have to repeat ourselves.

@Rikk did you read the discussions I linked previously?

This discussion (as continued) has nothing to do with @StevenBlack, as he only distributes a compilation. The rest is the responsibility of the curators (again, we are repeating ourselves).

@mitchellkrogza if Steven @StevenBlack locks this, please feel free to post future test results in a new issue on PyFunceble's repository.

Cheers,
Nissar

@mitchellkrogza
Contributor

Lock it, +1, as we are already discussing this on Keybase.

@Rikk

Rikk commented Jul 18, 2018

By the simple fact that commit 26d74f7 was issued due to this topic and excluded the above subdomains (which exist and are online), my talk is completely on-topic. Your tests are what is off-topic 😒. If your Python library is failing to understand underscores in subdomains, then the place for that discussion is obviously not here.

@funilrys
Contributor Author

funilrys commented Jul 18, 2018

@Rikk this issue was not posted out of nowhere.

Steven @StevenBlack and I had a private discussion about the allowed characters in domain and host names, and its conclusions turn out to be wrong in light of @mitchellkrogza's tests. I agree that underscores are "acceptable", but they are still invalid in some places, as you can't purchase and use a certificate with them or run Bind9 with them without disabling the name-checking subsystem. In other words, they are tolerated but still invalid at the time we are talking.

Despite the fact that those domains are formally invalid, PyFunceble will be corrected in the coming hours or days to include underscored domain or host names in its tests (along with other fixes to the invalidation procedure).

Please also take into consideration that I never create such an issue about PyFunceble results without a private discussion with the list maintainers.

In conclusion, as Linus Torvalds said: Talk is cheap...

Let's close this discussion and start to code, fix issues, send pull requests, test and/or hack!

If you want to continue this discussion, please move it to private; we are being unconstructive right now... We can be reached on Keybase.

Repository owner locked and limited conversation to collaborators Jul 18, 2018