What is considered logging? (in reference to Cloudflare) #128
Good question! Just as a reference, here is what I'm doing on my server: https://dns.seby.io/stats.html All this really shows is how the server and the clients are behaving. I'm pretty sure that it's impossible to identify someone from these graphs. This is the only data I have. I use it to see how popular the service is and whether I need to take manual action (e.g. when the graphs drop to 0 and stay there, or skyrocket because someone is abusing the service). From my graphs I could get aggregate data on the following:
I don't consider this logging, but I'm technically logging some information, so maybe I should remove the no-logging label too? I don't know. It depends on an individual's threat model. Maybe we should define logging such that if it's possible to identify a unique user or query from the logs, it's logging; otherwise it's non-logging? That definition still doesn't help much though. For Cloudflare, I think they may use unique identifiers to determine unique users in the 24-hour period. Then after 24h they just increment the "Number of unique users" counter. I don't know, I'm speculating. I do think they are pushing the no-logging envelope a bit though. @jedisct1 What do you think?
There has never been a formal definition of a non-logging resolver, but this is a very important topic, and something that we should define all together. Logging the client IP address, even temporarily, should probably clear the 'non-logging' bit immediately. Now, what about logging queries and responses? Even without client IP addresses, this can leak sensitive information. While a unique sequence of queries does not reveal the client IP, it reveals when that device is online. More importantly, DNS queries, even to nonexistent names, reveal information about the network, what software is being used and more. For example, queries for Another issue is that when a query for a nonexistent name is made, operating systems can be configured to retry using the "default" domain (or even a set of domains, e.g. with the While the first query doesn't reveal much information about the identity of the client, the second does. A third issue, similar to the previous one, is browser autocompletion, which can also trigger the default suffix. So that search queries can end up as queries for Unfortunately, this information is already public. Sensors recording queries and responses sent to authoritative servers are everywhere. Companies such as Cisco and Farsight log everything they see and sell access to their databases. This data is stored forever. There are also many free services doing the same. This is very useful for security and marketing purposes. Even data sent to a resolver that doesn't log may end up in these databases, because the sensors are placed between the resolvers and the authoritative servers, not between the client and the authoritative servers. So, the consensus in the DNS community, maybe as a way to downplay the fact that DNSSEC doesn't provide any confidentiality, or that names can be brute-forced, has always been that "DNS data should be considered public".
If we agree with that, maybe the definition of "doesn't log" can just be "doesn't log the client IP, even temporarily".
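To make the default-suffix leak described above concrete, here is a minimal sketch of how an unqualified name can turn into several queries. It is an illustration only: `candidate_names`, the ordering (name as typed first, then each suffix) and the example names are assumptions, and real stub resolvers differ in ordering and thresholds.

```python
def candidate_names(name, search_domains):
    """Return the names a stub resolver might query, in order.

    Assumption: the name is tried as typed first, then retried with each
    configured suffix. Real resolvers differ in ordering and thresholds.
    """
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no suffixes.
        return [name]
    candidates = [name + "."]
    candidates += [f"{name}.{suffix.strip('.')}." for suffix in search_domains]
    return candidates


# Hypothetical example: an internal-only name typed on a corporate laptop.
queries = candidate_names("wiki-internal", ["corp.example.com"])
for q in queries:
    print(q)
```

The first candidate says little about who is asking; the second carries the organization's domain, which is the identifying part.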
Thank you both so much for your responses, I really appreciate the open discussion we're having. I think this topic goes beyond just Cloudflare, and it was not my intention to single them out. In terms of what is considered logging, I think there are at least 3 instances that we're dealing with:
Which raises the question: at what point does it become too much? I agree with @jedisct1 about this:
For example, testing-secret-internal-project.bankofamerica.com could also be found by things like:
(sorry @jedisct1 no fish rubbing at GitHub yet.) So in that sense I would agree with "IP logging is considered logging". However, when we look at the list of what Cloudflare logs, I do believe there is more to worry about than just queries and responses. And that's where I would love to get your input, @jedisct1 and @publicarray. You see, if the query is public data but the IP address isn't, one could argue:
However, if we look at that list, I don't think that statement applies anymore. Thank you guys again, I hope we can continue this conversation.
The information Cloudflare logs doesn't seem to be enough to passively link queries to users, so the Maybe they make a rough estimate based on the number of queries, and the fact that on average, a user makes Or maybe they temporarily use client IP addresses, independently from the payloads they send and receive, for throttling and DoS mitigation. That can be implemented at any layer, but a firewall rule that prevents a single client IP from sending tons of queries in a short time fits in this category. Using client IP addresses that way is probably fine and should not void the "non-logging" flag.
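As a rough sketch of what "using client IP addresses independently from the payloads" could look like, here is an in-memory token-bucket limiter keyed by client IP. This is not how Cloudflare or any particular resolver does it; the class name, the parameters and the `expire` helper are assumptions for illustration. The point is that the limiter never sees the query, writes nothing to disk, and forgets idle clients.

```python
import time


class PerClientRateLimiter:
    """Token-bucket limiter keyed by client IP (hypothetical sketch).

    State is held in memory only and never sees the DNS payload.
    """

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self.buckets = {}  # client_ip -> (tokens, last_seen)

    def allow(self, client_ip):
        now = self.clock()
        tokens, last = self.buckets.get(client_ip, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_ip] = (tokens - 1, now)
            return True
        self.buckets[client_ip] = (tokens, now)
        return False

    def expire(self, max_idle_sec):
        """Forget clients that have been idle, so addresses are not retained."""
        now = self.clock()
        self.buckets = {
            ip: (tokens, last)
            for ip, (tokens, last) in self.buckets.items()
            if now - last <= max_idle_sec
        }


# Usage: decide per packet, before the DNS payload is even parsed.
limiter = PerClientRateLimiter(rate_per_sec=5, burst=10)
accepted = limiter.allow("192.0.2.1")
```

Calling `expire` periodically keeps the table from turning into a long-lived record of who used the service.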
Rather than speculating, maybe @vavrusa can clarify what exactly gets logged and what
Hi, I am the product manager for the 1.1.1.1 team. I can see why this can be confusing. We don't store anything that can actually tell us how many unique users we have for the public DNS resolver. We do internally sometimes make rough estimates based on the number of queries. Here's what we actually log:
We will work on making this clearer in our privacy policy.
Thanks a lot for chiming in and for the clarification, Mohd! So, shall we define "non-logging" as "doesn't log or use the client IP address, except for rate limiting, and without correlation with DNS queries"? What do you think? The "non-logging" bit is important, if only because by default, dnscrypt-proxy ignores resolvers not having that bit set (and we probably shouldn't change that).
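For context on why the bit matters in practice, here is a small sketch of a client that only accepts resolvers advertising the non-logging property. The flag values follow the DNS stamp property bits (DNSSEC = 1, no logs = 2, no filter = 4); the `Resolver` type, the resolver names and the `usable_resolvers` function are hypothetical, and this is not dnscrypt-proxy's actual code.

```python
from dataclasses import dataclass

# Property bits as used in DNS stamps.
PROP_DNSSEC = 1 << 0
PROP_NO_LOG = 1 << 1
PROP_NO_FILTER = 1 << 2


@dataclass(frozen=True)
class Resolver:
    name: str
    props: int


def usable_resolvers(resolvers, require_nolog=True):
    """Keep only resolvers that advertise the non-logging bit, if required."""
    if not require_nolog:
        return list(resolvers)
    return [r for r in resolvers if r.props & PROP_NO_LOG]


# Hypothetical catalog entries, for illustration only.
catalog = [
    Resolver("example-a", PROP_DNSSEC | PROP_NO_LOG),
    Resolver("example-b", PROP_DNSSEC),
]
selected = [r.name for r in usable_resolvers(catalog)]
print(selected)
```

With a filter like this enabled by default, clearing the bit on a resolver effectively removes it from most users' candidate lists, which is why the definition needs to be agreed on.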
"non-logging" as "doesn't log or use the client IP address, except for rate limiting" Yes. IMO, that's fair.
Yes I’m happy with that 👍
@irtefa could you please confirm that the end of the sentence applies to Cloudflare too? "doesn't log or use the client IP address, except for rate limiting, and without correlation with DNS queries". Then as far as my opinion goes, I'm good with it too, as my only concern left was the one @jedisct1 mentioned here: #128 (comment)
That's correct. We may use IP addresses for rate limiting, but we don't log them. Furthermore, they are not associated with DNS queries.
How about changing "log" to "retain"?
For DoH resolvers, even things like User-Agent + ASN might be enough to identify users. So changing "client IP address" to "user identifiable information" might be better. The Mozilla DoH resolver policy covers this nicely: https://wiki.mozilla.org/Security/DOH-resolver-policy
Hi!
I tried visiting the wiki here on GitHub, but I can't find what your policy is regarding logging.
I'm asking because I have some concerns about Cloudflare being under the "no logging" label.
According to their website they log this:
This seems like enough information to identify someone.
I do understand they remove the IP address, as seen here:
My point here is this: the reason people worry about their IP address being logged is because it is considered 'identifying information'.
However, if you look at that list above, there are several things in there that can identify someone easily.
Which they actually admit to being able to do in the bold section here:
If they can identify unique users, and keep all the information above (some of it permanently), my suggestion is to reconsider putting them under "no-logging".
Regardless, I trust your opinion.
Source: https://developers.cloudflare.com/1.1.1.1/commitment-to-privacy/privacy-policy/privacy-policy/