Set Google Analytics #121
Comments
jordanful
commented
Apr 7, 2013
|
Surprised to hear this isn't in place already. +1 |
|
I'm a little surprised to see no one objecting: having Google get a feed of everyone who visits the site (or downloads the software), data which they may sell or release even without a court order, is a kind of yucky result. ... At the same time GitHub (and to a lesser extent, SourceForge) have exactly that position now and we don't even get useful stats out of the deal. So at least from my perspective, opposing it now would be principle without purpose. If in the future we move onto our own hosting and can substantially improve privacy for the site's viewers, we should seriously revisit this. Stats are useful but local Webalizer stats cover a lot of the most important analytics, so I don't think we need to compromise privacy... but right now privacy isn't an option, we only have this Hobson's choice, so if someone is actually going to look at the stats, we might as well have them. |
jrmithdobbs
commented
Apr 9, 2013
|
Please no. Privacy invasion by GitHub and SourceForge is already enough. No need to add more for no benefit. |
|
I agree with gmaxwell. Also, visitors are free to protect their privacy if they want to. Almost all websites have either social network integration or Google Analytics these days. Nevertheless, I also think that it makes sense to drop Google Analytics in the future if we can handle this ourselves. In the meantime, I think that these statistics are valuable both to improve the content of the website and to give us more sources of information to understand the evolution of Bitcoin all over the world. |
|
Given the recent rise in Google search volume for Bitcoin, many sites have seen a drastic increase in their audience. weusecoins traffic started to go crazy and an important part of it comes from bitcoin.org (actually much more than I expected). That makes me think we also need to measure existing traffic, both to anticipate when GitHub will drop us due to bandwidth consumption and to know what configuration we will need to replace it. So I'm going to set this up tomorrow, unless a core developer disagrees before then. |
|
I'm no core dev, but as a user I'm against it. As gmaxwell says, statistics are useful but you don't hand them over just like that. Use common open source tools instead. |
|
@schildbach: Open source tools are not something we can use with GitHub hosting. So it's Analytics or nothing right now (and we can drop it in the future if we go with dedicated hosting). In that circumstance, and considering my last points, would you still disagree with using Analytics until we have something better? |
|
I disagree. bitcoin.org should not collect information it doesn't need, and allowing a third party to collect this information is even worse. The site runs fine without it. |
|
I agree with protecting users' privacy. And it is the opposite of what we are doing right now by using GitHub and SourceForge. And migrating such a high-traffic website to a dedicated server to better protect user privacy without having a good idea of its existing load seems a bad idea to me. So if we really care about privacy, I think that we should do it right. And I think that Analytics can help us to get there. And as I exposed, |
|
(I'm suspending any plan to set up Google Analytics until we have a better consensus) |
saracen
commented
Apr 12, 2013
|
As discussed on IRC with saivann, my vote would be for using CloudFlare. Not only do they provide analytics, but they would also help shield github from any rise in traffic. |
|
Update: GitHub won't penalize our account unless we serve huge binary files. So that removes many concerns and lowers the priority of this issue. The privacy concerns remain though, so if we consider changing hosting plan later, enabling Analytics or Cloudflare for a limited time could be helpful to evaluate the migration costs and resources. Meanwhile, I won't change anything unless we have a better consensus. Cloudflare has better privacy policies, but it would take developers' time to set up and test Cloudflare in the DNS. So maybe not the most urgent thing to do right now. |
|
+1 to using Analytics. For those of you worried about privacy, go look at how it works. The Analytics cookies are served off a different domain to all other Google cookies. They aren't linkable with personal data. |
|
I just found that Google Analytics has a feature to anonymize IP addresses, which prevents Google from storing the last bits of visitors' IP addresses. Using that feature, I think we can have a good compromise between user privacy and useful statistics. |
|
@saivann Can you link to a description of this? it sounds interesting. |
|
@gmaxwell: here it is: https://support.google.com/analytics/answer/2763052?hl=en&uls=en Concretely, it seems as simple as using _anonymizeIp() in the tracking code. |
|
I don't see how it's possible to obscure the visitor's IP from Google unless there is a proxy running on the webserver to pass the analytics data back. |
|
Google technically receives the IP, but they state the IP is anonymized in memory before anything is written to disk. So it's about trusting Google to do what they say, just like we would trust Cloudflare to respect their privacy policies if bitcoin.org was hosted on a dedicated server like the Foundation website. "Only after this anonymization process is the request written to disk for processing. If the IP anonymization method is used, then at no time is the full IP address written to disk as all anonymization happens in memory nearly instantaneously after the request has been received." |
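To make the idea concrete, here is a minimal Python sketch of what this kind of in-memory anonymization could look like. It assumes the last octet of an IPv4 address (and the lower 80 bits of an IPv6 address) is set to zero before anything is stored; the `anonymize_ip` name is hypothetical and this is not Google's actual code.

```python
import ipaddress


def anonymize_ip(addr: str) -> str:
    """Return addr with its low-order bits zeroed (illustrative only)."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        # Assumption: drop the last octet (lower 8 bits) of an IPv4 address.
        return str(ipaddress.IPv4Address(int(ip) & 0xFFFFFF00))
    # Assumption: keep the top 48 bits of an IPv6 address, zero the lower 80.
    mask = ((1 << 48) - 1) << 80
    return str(ipaddress.IPv6Address(int(ip) & mask))


# The full address only ever exists in memory; the truncated form is
# what a logger would write to disk.
stored = anonymize_ip("203.0.113.77")
```

Note that the full address still reaches the server in this scheme, so the guarantee depends entirely on the operator doing the truncation before any write.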
|
I don't like the idea of trusting Google... |
|
@gmaxwell: Absolutely. I can see your point, but GitHub can already do that. Cloudflare can also do that for the Foundation website even if we don't use their analytics tool. And to some extent, even a dedicated server business would do the same thing under subpoena. At this point, we would need to host the website as a Tor hidden service if we really want to get this level of privacy. And I'm not sure that it's all that relevant, given that we don't collect any personal information. |
|
@saivann With our own service (even in the case of dedicated hosting) we have standing to fight a subpoena in court. Under current US case law we do not have standing to fight a subpoena when the data is stored by a third party. We can also choose not to log historical data. In any case, I'm not even trying to argue this here. As I said before, I think there are enough other reasons why this isn't a big deal (nor do I consider subpoena a major threat; I'm generally more concerned about data leaks). I just don't want to see it justified on the basis of a misunderstanding. |
|
I am not seeing a coherent threat model here. If you're worried about NSLs then there's no regular subpoena for you to fight in court, and you do not have the legal resources to take on fighting an NSL. Google does have those resources and has taken them on. If you're worried about leaks due to dishonest people, you have that issue with whoever does have access to server logs regardless of who they work for. Let us assume that the documentation is correct and the lower 8 bits of the IP are scrambled before being written to disk. Then there are no raw logs to subpoena. But even if there were, IPs are not really that reliable for identifying people. Otherwise there would be no need for cookies. And as I already pointed out, the Analytics cookie is on a different domain to all others. It's not joinable. Knowing how people find our website and what they look at when they're there is a no-brainer, it's obviously useful information that can help us build a better website. |
Google complies with thousands of such orders per year, with no opportunity provided to the impacted party to fight the order. If you are self-hosting you always have an opportunity, if not the resources. Even ignoring NSLs, Google freely complies with regular blanket subpoenas. For example, a friend of mine in law enforcement was recently involved in the theft of several $1000 bills and they requested and received dragnet search results for queries on the subject.
There is a difference in sheer number. P(A or B) >= P(A).
Not being reliable doesn't make it not a privacy leak.
It is joinable with high probability subject to a little data mining. Cookie A visits sites A, B, Q and at similar times Cookie B visits sites A, B, Q. What is the probability that the holders of these cookies are one and the same? Almost 1. I'm all for saying that the harms/consequences here are small compared to the other problems and the marginal harm is worth the result. But if you're going to keep arguing that there is no harm I'm going to keep telling you that you're wrong. You're doing your position no service by defending it using such weak arguments. And even if Analytics used astonishing cryptographic magic to prevent any actual privacy leak, it is widely perceived to be a privacy leak by people who care. Perception matters too.
Absolutely, but both of those questions can be answered with basic log analysis and don't require third party trackers. |
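For illustration, the two questions mentioned above (how visitors find the site and which pages they look at) could be answered from a self-hosted server's access log roughly as follows. This is a minimal Python sketch, assuming log lines in the widely used "combined" format. The `summarize` name, the regular expression and the sample lines are all hypothetical, and it presumes the kind of dedicated hosting discussed in this thread rather than GitHub hosting.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed "combined" log format:
# host ident user [time] "METHOD path PROTO" status bytes "referrer" "agent"
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "[^"]*"$'
)


def summarize(lines):
    """Count requested paths and referring hosts; skip unparsable lines."""
    pages, referrers = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue
        pages[m["path"]] += 1
        host = urlparse(m["referrer"]).netloc  # empty for "-" (no referrer)
        if host:
            referrers[host] += 1
    return pages, referrers


# Made-up sample lines, not real traffic.
SAMPLE_LINES = [
    '203.0.113.0 - - [12/Apr/2013:10:00:00 +0000] "GET /en/ HTTP/1.1" 200 5120 "http://www.weusecoins.com/" "Mozilla/5.0"',
    '198.51.100.0 - - [12/Apr/2013:10:00:05 +0000] "GET /en/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.0 - - [12/Apr/2013:10:01:00 +0000] "GET /en/download HTTP/1.1" 200 7300 "https://www.google.com/search?q=bitcoin" "Mozilla/5.0"',
]

pages, referrers = summarize(SAMPLE_LINES)
```

Nothing here leaves the server, and the log can be truncated or discarded once the counts are computed.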
|
Yeah, I don't really want to justify it. Adding Google Analytics adds another third party, that's correct. My point is that no matter what we do, we will always have at least one third party that is able to log data we don't log ourselves, whether under subpoena or because of malicious employees. So the "subpoena" thing is not exactly accurate: we should not think we can really protect against this without drastic measures like Tor. Just as an example, the current Foundation website is behind Cloudflare's DDoS protection. That means Cloudflare has tons of detailed logs. That doesn't mean it's a good thing, but that gives us a relative point of view. Personally, @gmaxwell's suggestion of using internal analytics tools on a dedicated server also seems the best privacy approach to me. But meanwhile, with our current setup, I consider Google Analytics with IP anonymization a good compromise. And we don't prevent visitors who seek real privacy from surfing the web anonymously. @tcatm said he was against logging any data at all in his last comment. With luke-jr's and Mike Hearn's comments, we are still stuck with 4 OK vs. 4 Not OK. |
|
This debate is becoming a bit huge for such a feature, IMO. So unless jrmithdobbs, tcatm, luke-jr or schildbach change their position, Google Analytics won't be implemented. Meanwhile, I close this request. Thanks all for your opinions. |
saivann commented Apr 6, 2013
I am asking permission to set up Google Analytics on bitcoin.org.
I would create a dedicated Google account for this task and make the password available to the core developers. I think that there is a lot of instructive statistics and information that we can get from it.