Improve data collection practices #3895
Comments
andrewdavidwong added the task, privacy, and business labels on May 12, 2018
andrewdavidwong added this to the Documentation/website milestone on May 12, 2018
marmarek
May 12, 2018
Member
We don't have full control over this. The logs we have mostly contain what anyone on the network path could observe (the fact that a given IP address connects to the Qubes updates server). Also, we have no direct control over what mirror providers store. Even mirrors.kernel.org does not have a privacy policy, let alone the other mirrors.
Following our distrust in infrastructure, even if we publish such rules, there is no reason to believe that they are respected. As explained there, there are multiple areas that we don't control, and multiple ways in which even the things we somewhat control could be taken over. For example, the service provider we use for the updates server could silently take snapshots of the memory and disk of that machine and we'd never know.
Providing a statement that we don't really have the means to keep up with would be irresponsible on our part.
I propose not publishing any statement about what data we keep and instead adding an FAQ entry explaining why we don't have one. We could also hint to users that if they want to hide their IP address when downloading updates, there is an option to use Tor.
andrewdavidwong
May 13, 2018
Member
> We don't have full control over this. The logs we have mostly contain what anyone on the network path could observe (the fact that a given IP address connects to the Qubes updates server). Also, we have no direct control over what mirror providers store. Even mirrors.kernel.org does not have a privacy policy, let alone the other mirrors.
> Following our distrust in infrastructure, even if we publish such rules, there is no reason to believe that they are respected. As explained there, there are multiple areas that we don't control, and multiple ways in which even the things we somewhat control could be taken over. For example, the service provider we use for the updates server could silently take snapshots of the memory and disk of that machine and we'd never know.
If I understand correctly, the argument is: Since we cannot guarantee that third parties will handle user data carefully, we should not bother handling user data carefully either. I don't think that's the right way to think about this. What matters is not whether we have total control over the data, but what we do with the data that we do have control over. In other words, we should be responsible for what we control, regardless of whether other entities are responsible for what they control. The fact that other entities could be irresponsible does not absolve us of our own responsibility.
Think about it this way: If a user asks, "What do you (i.e., the Qubes team) do to protect the data about me that you collect from the updates servers?" then the truthful answer will have to be "nothing" (or "close to nothing"). If a user asks, "Why don't you encrypt my IP address before storing it?" then our answer is, "Because someone else could store your IP address unencrypted." But that doesn't really make sense. Even if we can't solve the entire problem, we can at least refrain from exacerbating it.
> Providing a statement that we don't really have the means to keep up with would be irresponsible on our part.
That's not what I'm proposing. I'm proposing that we set a policy for what we do control that we can keep up with.
It's fine to say, "Look, there are all sorts of ways in which this data could be intercepted before it gets into our hands, but once it gets into our hands, we try to handle it with care."
andrewdavidwong
May 13, 2018
Member
We distrust the infrastructure for good reason, but insofar as we are part of the infrastructure, we should try to be trustworthy. There are some systems that we could, in principle, run in a trustworthy way ourselves (e.g., hosting the website and the mailing lists), but doing so would be prohibitively expensive (in both time and money), so we offload it to the distrusted infrastructure. If improving our data collection practices with respect to the Qubes update servers would also be prohibitively expensive, then it would be consistent to adopt the position that users would be better served by us spending our time on things that we know will significantly benefit their privacy and security, rather than on strengthening the one link we control in a very weak chain. It's an empirical question whether this is the case. It's also conceivable that we could become the weak link in the chain (e.g., if we do nothing while other parts of the infrastructure are compelled by law to adopt better privacy practices) or that we have a measure of control in the selection of the other links we associate with (e.g., choosing to use privacy-respecting service providers).
andrewdavidwong commented May 12, 2018
From QubesOS/qubes-doc#649 (comment):
I think it's very important that we improve our practices in this area. Thankfully, we don't have to handle much data about users, but with respect to the data about users that we do handle, we should:
I'm assigning this to the "Documentation/website" milestone, but this primarily applies to the data we collect from the Qubes update servers. Since we don't host the Qubes website ourselves, we don't have any access to or control over the data generated when people visit the website.
CC: @rootkovska, @marmarek, @woju, @mfc