INVALID domains catched by PyFunceble (dead-hosts) #674
Comments
Thanks @funilrys fixed! |
I don't think underscores are completely invalid. At least DNS resolves them. Luckily, http://o_thus.ero-advertising.com/ just redirects to another blocked subdomain. https://stackoverflow.com/questions/2180465/can-domain-name-subdomains-have-an-underscore-in-it |
@Rikk thanks for those links. I've been running my discovered domains through a pretty strict regex filter. I didn't think about it much at the time, but it could be dropping a bunch of discovered domains that would work for advertising or tracking without me knowing about it. I think I'll play around with the filter some this weekend so that it serves more as a symbol blacklist than a strict whitelist. Reading up on Punycode it says:
This is an interesting example: http://www.са.com - the URL looks fine, but it has a Unicode character in it. Most browsers will display it differently in the URL bar, but as far as the DNS lookup goes it's perfectly valid. At the moment, my list would drop that domain since it doesn't pass the strict filter. |
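The difference between a strict whitelist filter and a looser one that tolerates underscores, together with the Punycode form of a Unicode name, can be sketched in Python. This is only an illustration: the patterns `STRICT_LABEL` and `RELAXED_LABEL` and the helpers `to_ascii` and `passes` are hypothetical names, not the filter actually used for this list or by PyFunceble, and Python's built-in `idna` codec follows the older IDNA 2003 rules.

```python
import re

# Hypothetical strict pattern: letters, digits and hyphens only,
# no leading or trailing hyphen, 1 to 63 characters per label.
STRICT_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")

# Hypothetical relaxed pattern: same rules, but underscores are tolerated.
RELAXED_LABEL = re.compile(r"^(?!-)[a-z0-9_-]{1,63}(?<!-)$")


def to_ascii(hostname: str) -> str:
    """Convert a possibly Unicode hostname to its ASCII (Punycode) form."""
    return hostname.encode("idna").decode("ascii").lower()


def passes(hostname: str, label_pattern: re.Pattern) -> bool:
    """Return True if every dot-separated label matches the given pattern."""
    try:
        ascii_name = to_ascii(hostname)
    except UnicodeError:
        # Raised for empty or over-long labels, or names the codec rejects.
        return False
    labels = ascii_name.rstrip(".").split(".")
    return len(labels) >= 2 and all(label_pattern.match(label) for label in labels)


# Compare both patterns on names mentioned in this thread.
for name in ("ero-advertising.com", "o_thus.ero-advertising.com", "www.са.com"):
    print(name, to_ascii(name), passes(name, STRICT_LABEL), passes(name, RELAXED_LABEL))
```

In this sketch the Unicode example passes even the strict pattern once it has been converted to its `xn--` form, so only the underscore names are affected by the choice of pattern.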
Hello @Rikk @lightswitch05, and thanks for reopening this interesting and eternal discussion, which has to be stated and clarified one day. Unless you can point me to another technical paper that is clear, not obscure, and not a Stack Overflow link, we are going to keep it as it is. Indeed, I did not make that decision out of the blue. I first searched by myself and read those links before (along with others), but it was not clear to me. So I took some of my free time to read the RFCs related to domain names and host names, and those were sometimes obscure about the presence of underscores. Finally, I consulted many people, including Steven @StevenBlack, Mitchell @mitchellkrogza, and other PhD students/researchers/professors from CISPA and my university, who were able to give a first-semester student some time to help him understand the problem and how he might solve it. But as I said, if you point me to a specialist, the RFC writers, or a technical paper that gives us a clear and indisputable statement about that, I'm ready to rewrite my regular expression that catches those invalid domains. By the way, they are marked as Some other discussions/comments about PyFunceble or Funceble, the older sister: |
You don't need a PhD thesis, and I don't need to read RFCs. You just need to test the address in your browser: if it loads and you are thrown onto a bad site, then I would say it is valid and should stay blocked. |
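A scripted counterpart to this manual browser check might look like the sketch below. It only asks whatever resolver the local system is configured with, so the answer can differ from one machine or network to another, and a successful lookup says nothing about whether a page actually loads. The helper name `resolves` and its injectable `resolver` parameter are assumptions made for illustration.

```python
import socket
from typing import Callable


def resolves(hostname: str, resolver: Callable = socket.getaddrinfo) -> bool:
    """Return True if the resolver returns at least one address for hostname."""
    try:
        return len(resolver(hostname, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the lookup failed; UnicodeError: the name could not be encoded.
        return False
```

Calling `resolves("o_thus.ero-advertising.com")` performs a live lookup with the system resolver; passing a different `resolver` callable makes it possible to compare answers or to test the function without network access.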
@Rikk We are then talking about uncertainty, as your DNS can choose to resolve it or not. The idea is to have a global overview. Also, one of the links you shared covers, for example, the technical case of Java, which shows us that it is not a good idea to have an underscore in a subdomain. So for me it's still |
I prefer talking about this hosts file and what it should block or not. If a clearly harmful domain exists and works for a lot of people, my vote is for it to be blocked. |
@funilrys and I have done numerous tests regarding underscores in host names, and I never had any success getting Bind9 to co-operate with me. Following your post, @Rikk: YES, Bind9 can and does support underscores in host names. A normal Bind9 master zone is configured as follows
Adding any underscore in a host name like
A change is needed in the way the zone is configured as follows (telling bind to ignore checking names)
Now I can add an entry
And another host name beginning and containing an underscore
So .... very interesting for me as my past efforts in confirming this failed but clearly it can be done. |
An update after some extensive testing with nslookup, dig, Nginx and Let's Encrypt. @funilrys @StevenBlack |
Hey guys, just so we're clear: I don't care about presently invalid domains. If I was writing malware, I would certainly use tactics to "disappear" a domain. That merely takes TTL time for DNS to propagate. I could then "reappear" the domain relatively quickly, anytime I want to strike. So I don't want to scrub any domains the curators haven't scrubbed by whatever process they use to maintain their lists. |
We are clear about that and I don't know about others but as I mentioned before elsewhere here, it's not your role to clean what comes to you 😸 Basically, this whole discussion (as it was continued not as we both started) is about how to efficiently detect invalid domains! But again I hope that it's clear for everyone reading this: This discussion (as continued) is not about how Steven @StevenBlack should work/clean his awesome compilation! |
Invalid domains that work? Invalid in archaic theories, not in reality. |
@StevenBlack please lock this discussion, as it takes us (you and me) into an unconstructive loop where we may have to repeat ourselves. @Rikk did you read the discussions I linked previously? This discussion (as continued) has nothing to do with @StevenBlack, as he only distributes a compilation. The rest is the responsibility of the curators (again, we are repeating ourselves). @mitchellkrogza if Steven @StevenBlack locks this, please feel free to post future test results in a new issue on PyFunceble's repository. Cheers, |
Lock it +1 as we already discussed this on Keybase |
By the simple fact that commit 26d74f7 was issued due to this topic and excluded the above subdomains (which exist and are online), my talk is completely on-topic. Off topic are your tests 😒. If your Python library is failing to understand underscores in subdomains, then the place for discussion is obviously not here. |
@Rikk this issue was not posted thoughtlessly. Steven @StevenBlack and I had a private discussion about the characters allowed in domain and host names, and our conclusion turns out to be wrong in light of @mitchellkrogza's tests. I agree that underscores are "acceptable", but they are still invalid in some contexts: you can't purchase and use a certificate for such names, and you can't run Bind9 with them without disabling the name-checking subsystem. In other words, they are tolerated but still invalid at the time we are talking. Despite the fact that those domains are legally invalid, PyFunceble will be corrected in the coming hours or days to include underscored domain or host names in its tests (along with other fixes to the invalidation procedure). Please also take into consideration that I never create such an issue about PyFunceble results without a private discussion with the list maintainers. In conclusion, as Linus Torvalds said: Talk is cheap... Let's close this discussion and start coding, fixing issues, sending pull requests, testing and/or hacking! If you want to continue this discussion, please move to private; we are being unconstructive right now... We can be reached on Keybase. |
Hi Steven @StevenBlack,
I hope that you're right!
I just wanted to mention the following, which were marked as INVALID by PyFunceble testing with dead-hosts.
Here they are:
hosts/data/StevenBlack/hosts
Line 884 in 2246df0
hosts/data/StevenBlack/hosts
Line 810 in 2246df0
About _thums.ero-advertising.com
Please note the presence of _ which is illegal in domain names.
About o_thus.ero-advertising.com
Please note the presence of _ which is illegal in domain names.