Enable Google analytics #1077
Cobra-Bitcoin
added some commits
Oct 2, 2015
|
Technical and concept ACK. I think @saivann needs to comment on what the lawyer told him last year about cookies regarding the privacy policy we had written. For the technical review, I verified the JS code provided is the same code as shown on Google's website here (with our particular tracking ID added). I do think that this means Google could conceivably load or replace content on our pages, but I'm not personally concerned about that. |
|
I tend to see this in NACK territory. Would |
|
@jonasschnelli you may want to read #853 where we dropped our self-hosted stats system because it couldn't deal with the crazy redirect traffic we received. I think we're hoping that Google Analytics has already implemented all the code in their system to deal with garbage data like that, whereas we just don't have the manpower to maintain our own statistics system to the same quality. In addition, potential advertisers may be more likely to trust Analytics data than stats from our logs (which we could fake). Regarding privacy, I also dislike allowing third parties to track visitors (especially large organizations like Google who have access to big data to run correlations on), but the site needs revenue to pay for its server costs (at the very least) and so we need to prove the value of the site to potential advertisers. |
|
@Cobra-Bitcoin According to the lawyer I spoke with, we need a formal privacy policy if we enable Google Analytics (the current text on the site does not meet the definition of a privacy policy, and there are additional requirements beyond just writing the page), otherwise the site wouldn't be compliant with the law (since we would track cookies here, it's not just EU laws). For the record, while I personally find Google Analytics to be very useful and even submitted it in the past, this was previously (and repeatedly) rejected by a fair number of contributors. |
|
How can we ensure that google (or someone who has compromised or ordered them to do so) cannot replace the content (including links to the downloads)? If this were done would there be a way to load it only in a hidden iframe so that it would not have access to the page content (e.g. merely tracking and not tracking and manipulation)? Would it be possible to not emit this to clients which send the do not track header? |
|
@jonasschnelli Just to add to what @harding said: A lot of our visitors come from Google, so they already have a good idea of who is visiting bitcoin.org. @saivann Can you email me all the requirements? I'll work on making a more formal privacy policy. I'm also assuming we would need a cookie disclaimer to comply with EU laws. @gmaxwell I understand your concerns, but I don't think we can prevent that risk without significantly reducing the quality of our tracking. We already trust businesses like our hosting provider, domain registrar, SSL CA, GitHub, and more to behave properly. As long as the binaries continue to be signed, I think risks like these are something we can live with. It might be possible to not emit this to clients which send the do not track header. It would have to be done through JavaScript though as the site doesn't have any other way of dynamically modifying content. |
|
Forgive my pedantry, but for the record we don't trust GitHub any more AFAIK (provided there's no remotely-exploitable bug in git). That's because, as of #918, the tip of each branch must be GPG signed by an authorized committer in order for it to be automatically built. On the other hand, you can add a second hosting provider to that trust list because we use a separate VPS from a separate provider to build the site web content. And, of course, that trust list should probably mention the several of us who have commit access or even direct server access. I think we've demonstrated trustworthiness in the past, but I doubt we're any more able to resist legal or illegal compulsion than Google, and we too can have our computers compromised. |
|
@Cobra-Bitcoin Just sent you the information. Re third-party risk: In general we try to reduce that risk. I don't think we can use the fact that we already trust third parties as an argument to make it worse; we have disabled embedded YouTube and locked GitHub out from updating pages for that reason. I think the real question is whether Google Analytics provides enough value to justify that risk. I am personally sort of neutral here. Re signed binaries: It is worth noting that signatures are very rarely verified by users, so I think this also isn't a practical argument for Google Analytics. |
|
@Cobra-Bitcoin how would you like to proceed? If this were a regular feature request, I'd say that it had too much opposition and I would suggest closing it. On the other hand, this is yours and @theymos's domain, so I think the final decision is up to you two. If you're working on implementing some of the suggestions above to limit the security and privacy downsides, let me know and I'll tag this as Needs More Info so it doesn't appear in my regular queue of issues needing attention. Thanks. |
|
@harding I'm working on implementing some of the suggestions above. I'll make a new pull request for that once I'm done since it involves some significant changes to the UI. Thanks to everyone for their feedback and thoughts. I'm closing this pull request now. |
Cobra-Bitcoin
closed this
Oct 7, 2015
Cobra-Bitcoin
deleted the
Cobra-Bitcoin:google-analytics branch
Oct 9, 2015
|
A bit late, but I'm happy the decision turned out to not do this. Google Analytics is a giant, opaque, spying apparatus, and a central point of failure of the web. |
mattrybin
commented
Oct 12, 2015
|
Would a possible solution be to track, let's say 20% of the visitors? Currently without a tracking system bitcoin.org:
The possible downsides to tracking are:
- The possibility for Google to change content.
- Longer loading time.

Possible solution

With 7 million people per quarter, bitcoin.org has high enough traffic to be able to extract a percentage from the total pool and still say with 99% confidence that the number represents the whole pool. I do not know if 20% would provide a 99% confidence level, but somebody should be able to calculate that if the need to be precise exists: |
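The confidence question raised above can be estimated with the standard margin-of-error formula for a proportion. The following is only a rough sketch: it assumes the tracked visitors are a simple random sample of all visitors, uses the worst-case proportion p = 0.5, and applies a finite population correction. The function name `margin_of_error` is illustrative and not from the thread.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(population: int, sample: int,
                    confidence: float = 0.99, p: float = 0.5) -> float:
    """Worst-case margin of error for a proportion estimated from a sample."""
    # Two-sided critical value of the normal distribution (about 2.576 for 99%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of a proportion; p = 0.5 gives the largest value.
    standard_error = sqrt(p * (1 - p) / sample)
    # Finite population correction, because the sample is a large share of the pool.
    fpc = sqrt((population - sample) / (population - 1))
    return z * standard_error * fpc


visitors = 7_000_000             # quarterly visitors quoted in the comment
sampled = int(visitors * 0.20)   # the proposed 20% sample
print(f"margin of error at 99% confidence: {margin_of_error(visitors, sampled):.4%}")
```

Under these assumptions the margin comes out on the order of a tenth of a percentage point, so statistical precision is unlikely to be the limiting factor. Junk traffic that skews the sample is a separate problem that this formula does not address.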
|
I agree that tracking can be useful. But why off-site via Google? What is wrong with analyzing Apache's access log? I think you can also extend Apache's log format (add preferred browser language, etc.). Do we really need browser window sizes and the like, given all the downsides? Your three bullet points are achievable with Apache log parsing through Webalizer or similar. |
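As a minimal sketch of the log-parsing approach suggested here, the snippet below counts successful page requests from lines in the Combined Log Format. It assumes the server writes that format unchanged; the sample lines, addresses, and the `count_page_hits` name are made up for illustration.

```python
import re
from collections import Counter

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_page_hits(lines):
    """Count requests per path, keeping only lines that parse and returned 200."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match["status"] == "200":
            hits[match["path"]] += 1
    return hits


# Hypothetical log lines, not real bitcoin.org traffic.
sample_log = [
    '203.0.113.5 - - [12/Oct/2015:10:00:00 +0000] "GET /en/ HTTP/1.1" 200 5120 '
    '"https://www.google.com/" "Mozilla/5.0"',
    '203.0.113.9 - - [12/Oct/2015:10:00:02 +0000] "GET /en/download HTTP/1.1" 200 7340 '
    '"-" "Mozilla/5.0"',
    '198.51.100.7 - - [12/Oct/2015:10:00:03 +0000] "GET /missing HTTP/1.1" 404 312 '
    '"-" "curl/7.43.0"',
    '203.0.113.5 - - [12/Oct/2015:10:00:05 +0000] "GET /en/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0"',
]

print(count_page_hits(sample_log).most_common())
```

A counter like this does nothing about bots or redirect traffic, which is the limitation raised elsewhere in the thread.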
|
Nginx log parsing is what we did previously and I even wrote a script for [...] The thing is bitcoin.org was targeted by junk traffic which made this data [...] This crossed the limits of what could be easily filtered as obvious bot [...] Cobra expressed hope that Google Analytics would be better at filtering [...] |
mattrybin
commented
Oct 12, 2015
|
Another solution would be to make it time-based instead: include Google Analytics only during the last full week of each month to sample the traffic. |
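The time-based idea could be sketched as follows. This assumes "last full week" means the last Monday-to-Sunday week that lies entirely within the month, which the comment does not specify; the function names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def last_full_week(year: int, month: int) -> tuple[date, date]:
    """Return (Monday, Sunday) of the last week lying entirely inside the month."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    # date.weekday(): Monday == 0 ... Sunday == 6.
    days_back_to_sunday = (last_day.weekday() + 1) % 7
    week_end = last_day - timedelta(days=days_back_to_sunday)
    week_start = week_end - timedelta(days=6)
    return week_start, week_end


def analytics_enabled(today: date) -> bool:
    """True only while `today` falls inside the month's last full week."""
    start, end = last_full_week(today.year, today.month)
    return start <= today <= end


start, end = last_full_week(2015, 10)
print(start, end, analytics_enabled(date(2015, 10, 20)))
```

Because the site is static, a check like this would most likely run at build time to decide whether the tracking snippet is included in the generated pages.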
|
Hello, how about Piwik along with our own referral spam blacklist? |
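A referral spam blacklist of the kind mentioned here can be illustrated independently of any analytics package. The sketch below does not use Piwik's API; the blacklist entries and the `is_spam_referrer` name are placeholders, and it assumes referrers are matched by domain, including subdomains.

```python
from urllib.parse import urlparse

# Placeholder entries; a real list would be maintained by the site operators.
REFERRER_BLACKLIST = {"spam.example", "junk-traffic.example"}


def is_spam_referrer(referrer: str, blacklist=REFERRER_BLACKLIST) -> bool:
    """True if the referrer's host is a blacklisted domain or one of its subdomains."""
    host = (urlparse(referrer).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blacklist)


visits = [
    ("/en/", "https://www.google.com/search?q=bitcoin"),
    ("/en/", "http://www.spam.example/buy-now"),
    ("/en/download", "-"),  # no referrer recorded
]

# Keep only visits whose referrer is not on the blacklist.
clean = [(path, ref) for path, ref in visits if not is_spam_referrer(ref)]
print(clean)
```

Matching on the parsed hostname rather than on a raw substring avoids discarding legitimate referrers that merely contain a blacklisted name somewhere in their URL.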
|
@wilbns That's cool. No idea if it would actually work well in our case (although it can't be worse than server logs). That said, the privacy policy would remain a prerequisite to using Piwik if it tracks cookies and/or if we need to log full IP addresses. |
Cobra-Bitcoin commented Oct 2, 2015
Adds Google Analytics to the site so that visitor numbers and behavior can be accurately measured. This is necessary as Bitcoin.org is currently looking to work with new advertising partners to help fund the site's expenses.