
Enable Google analytics #1077

Closed
wants to merge 2 commits into
from

8 participants
Contributor

Cobra-Bitcoin commented Oct 2, 2015

Adds Google Analytics to the site so that visitor numbers and behavior can be accurately measured. This is necessary as Bitcoin.org is currently looking to work with new advertising partners to help fund the site's expenses.

Cobra-Bitcoin added some commits Oct 2, 2015

Update Privacy Policy
Change the privacy policy to mention Google Analytics data collection
and usage.
Add Google Analytics
Adds the Google Analytics tracking code to the site.
Contributor

harding commented Oct 2, 2015

Technical and concept ACK. I think @saivann needs to comment on what the lawyer told him last year about cookies regarding the privacy policy we had written.

For the technical review, I verified the JS code provided is the same code as shown on Google's website here (with our particular tracking ID added). I do think that this means Google could conceivably load or replace content on our pages, but I'm not personally concerned about that.

Contributor

jonasschnelli commented Oct 2, 2015

I tend to see this in NACK territory.
Allowing Google to track bitcoin.org visitors is something I personally dislike.

Would webalizer or a similar non-third-party and fully-opensourced solution not be enough?

Contributor

harding commented Oct 2, 2015

@jonasschnelli you may want to read #853 where we dropped our self-hosted stats system because it couldn't deal with the crazy redirect traffic we received. I think we're hoping that Google Analytics has already implemented all the code in their system to deal with garbage data like that, whereas we just don't have the manpower to maintain our own statistics system to the same quality.

In addition, potential advertisers may be more likely to trust Analytics data than stats from our logs (which we could fake).

Regarding privacy, I also dislike allowing third parties to track visitors (especially large organizations like Google who have access to big data to run correlations on), but the site needs revenue to pay for its server costs (at the very least) and so we need to prove the value of the site to potential advertisers.

Contributor

saivann commented Oct 2, 2015

@Cobra-Bitcoin According to the lawyer I spoke with, we need a formal privacy policy if we enable Google Analytics (the current text on the site does not meet the definition of a privacy policy, and there are requirements beyond just writing the page). Otherwise the site wouldn't be compliant with the law (since we would track cookies here, it's not just EU laws).

For the record, while I personally find Google Analytics to be very useful and even submitted it in the past, this was previously (and repeatedly) rejected by a fair amount of contributors.

Contributor

gmaxwell commented Oct 2, 2015

How can we ensure that google (or someone who has compromised or ordered them to do so) cannot replace the content (including links to the downloads)?

If this were done would there be a way to load it only in a hidden iframe so that it would not have access to the page content (e.g. merely tracking and not tracking and manipulation)?

Would it be possible to not emit this to clients which send the do not track header?
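A minimal sketch of the hidden-iframe idea, assuming a hypothetical separate origin (`https://stats.example.org/track.html`) that serves nothing but the Analytics snippet. A `sandbox` attribute without `allow-same-origin` gives the frame an opaque origin, so its scripts cannot read or modify the parent page (including the download links), and without `allow-top-navigation` they cannot redirect it either. The trade-off is that the frame only learns what the parent passes it (here, just the page path), and cookie-based visitor tracking is likely to be degraded from an opaque origin.

```typescript
// Hypothetical origin that serves only the Analytics snippet.
const TRACKER_URL = "https://stats.example.org/track.html";

/**
 * Builds markup for an invisible, sandboxed iframe that reports a page view.
 * "allow-scripts" lets the tracker run; leaving out "allow-same-origin" and
 * "allow-top-navigation" keeps it from touching or redirecting the parent page.
 */
function trackerIframe(pagePath: string): string {
  const src = `${TRACKER_URL}?page=${encodeURIComponent(pagePath)}`;
  return (
    `<iframe src="${src}" sandbox="allow-scripts" ` +
    `width="0" height="0" style="display:none" aria-hidden="true"></iframe>`
  );
}
```

For example, `trackerIframe("/en/download")` yields an iframe whose `src` ends in `?page=%2Fen%2Fdownload`; how the tracker page forwards that path to Analytics is left out of this sketch.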

Contributor

Cobra-Bitcoin commented Oct 4, 2015

@jonasschnelli Just to add to what @harding said: A lot of our visitors come from Google, so they already have a good idea of who is visiting bitcoin.org.

@saivann Can you email me all the requirements? I'll work on making a more formal privacy policy. I'm also assuming we would need a cookie disclaimer to comply with EU laws.

@gmaxwell I understand your concerns, but I don't think we can prevent that risk without significantly reducing the quality of our tracking. We already trust businesses like our hosting provider, domain registrar, SSL CA, GitHub, and more to behave properly. As long as the binaries continue to be signed, I think risks like these are something we can live with.

It might be possible to not emit this to clients which send the do not track header. It would have to be done through JavaScript though as the site doesn't have any other way of dynamically modifying content.
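Since the check has to happen in the browser, it could be a small guard run before the Analytics snippet is injected. This is a sketch, assuming the page reads `navigator.doNotTrack` and passes its value in; `shouldLoadAnalytics` is a hypothetical helper name.

```typescript
/**
 * Decides whether to inject the Analytics snippet, given the browser's
 * Do Not Track value (for example, navigator.doNotTrack).
 * "1" is the standard opt-out value; some older Firefox releases reported "yes".
 * Anything else (null, "0", "unspecified") is treated as no opt-out.
 */
function shouldLoadAnalytics(dnt: string | null | undefined): boolean {
  return dnt !== "1" && dnt !== "yes";
}
```

On the page this would be used as `if (shouldLoadAnalytics(navigator.doNotTrack)) { /* inject snippet */ }`. Some older browsers exposed the setting under other property names, so a real check may need to consult more than one. Visitors with the header set would simply be missing from the counts.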

Contributor

harding commented Oct 4, 2015

Forgive my pedantry, but for the record we don't trust GitHub any more AFAIK (provided there's no remotely-exploitable bug in git). That's because, as of #918, the tip of each branch must be GPG signed by an authorized committer in order for it to be automatically built. On the other hand, you can add a second hosting provider to that trust list because we use a separate VPS from a separate provider to build the site web content.
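The #918 rule can be pictured as a small gate. This is an illustration only, not the site's actual build script: it assumes the tip's signature status and signing-key fingerprint are read from git (for example with `git log -1 --format='%G? %GF' <branch>`) and compared against a hypothetical allowlist of committer fingerprints.

```typescript
// Hypothetical allowlist of authorized committer key fingerprints (placeholders).
const AUTHORIZED_FINGERPRINTS = new Set<string>([
  "0000000000000000000000000000000000000001",
  "0000000000000000000000000000000000000002",
]);

/**
 * Takes one line of `%G? %GF` output for the branch tip and decides whether
 * it may be built automatically. "G" is a good signature and "U" a good
 * signature whose key validity is unknown; since trust here comes from the
 * allowlist, both are accepted. Every other status (bad, none, ...) is rejected.
 */
function mayAutoBuild(gitOutput: string): boolean {
  const [status, fingerprint = ""] = gitOutput.trim().split(/\s+/);
  if (status !== "G" && status !== "U") return false;
  return AUTHORIZED_FINGERPRINTS.has(fingerprint.toUpperCase());
}
```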

And, of course, that trust list should probably mention the several of us who have commit access or even direct server access. I think we've demonstrated trustworthiness in the past, but I doubt we're any more able to resist legal or illegal compulsion than Google, and we too can have our computers compromised.

@harding harding referenced this pull request Oct 4, 2015

Closed

Redesign Idea #1078

Contributor

saivann commented Oct 4, 2015

@Cobra-Bitcoin Just sent you the information.

Re third-party risk: in general we try to reduce that risk, and I don't think we can use the fact that we already trust third parties as an argument to make it worse. We disabled embedded YouTube videos and locked GitHub out from updating pages for that reason. I think the real question is whether Google Analytics provides enough value to justify that risk. I am personally sort of neutral here.

Re signed binaries: it is worth noting that signatures are very rarely checked. I think this also isn't a practical argument for Google Analytics.

Contributor

harding commented Oct 7, 2015

@Cobra-Bitcoin how would you like to proceed? If this were a regular feature request, I'd say that it had too much opposition and I would suggest closing it. On the other hand, this is yours and @theymos's domain, so I think the final decision is up to you two.

If you're working on implementing some of the suggestions above to limit the security and privacy downsides, let me know and I'll tag this as Needs More Info so it doesn't appear in my regular queue of issues needing attention.

Thanks.

Contributor

Cobra-Bitcoin commented Oct 7, 2015

@harding I'm working on implementing some of the suggestions above. I'll make a new pull request for that once I'm done since it involves some significant changes to the UI.

Thanks to everyone for their feedback and thoughts. I'm closing this pull request now.

@Cobra-Bitcoin Cobra-Bitcoin deleted the Cobra-Bitcoin:google-analytics branch Oct 9, 2015

Contributor

laanwj commented Oct 11, 2015

A bit late, but I'm happy the decision turned out to not do this. Google Analytics is a giant, opaque, spying apparatus, and a central point of failure of the web.

Would a possible solution be to track, let's say 20% of the visitors?

Currently, without a tracking system, bitcoin.org:

  • Has a much harder time attracting advertisers, since it can't provide the data they need.
  • Can't adapt if bitcoin.org were receiving 50% of its traffic from China, because it doesn't know which languages to focus on.
  • Can't tell whether changes to the content actually increase the time people spend on the site.

The possible downsides to tracking are:

  • The possibility that Google could change content.
  • Longer loading times.

Possible solution
I would say that the reward is higher than the risk, but one way to hedge the risk would be to load Google Analytics for only 20% of visitors.
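The 20% idea could look like the sketch below, assuming the sampling decision is made in the browser before the Analytics snippet is injected. `inSample` and `estimateTotal` are hypothetical helpers; the 0.2 rate and the 7 million visitors per quarter come from this comment.

```typescript
/**
 * Decides whether a visitor falls into the analytics sample.
 * `u` should be uniform in [0, 1), e.g. Math.random(). To keep the decision
 * stable for a returning visitor it would have to be stored somewhere
 * (an assumption, not shown here).
 */
function inSample(u: number, rate: number): boolean {
  return u < rate;
}

/** Scales a count observed in the sample back up to an estimate for all visitors. */
function estimateTotal(sampledCount: number, rate: number): number {
  if (rate <= 0 || rate > 1) throw new RangeError("rate must be in (0, 1]");
  return Math.round(sampledCount / rate);
}
```

With a 0.2 rate, 1,400,000 sampled visitors scale back to an estimate of 7,000,000. Note that this limits Google's view to roughly a fifth of visitors, but the content-replacement risk remains for the visitors who are sampled.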

With 7 million visitors per quarter, bitcoin.org has enough traffic to sample a percentage of the total pool and still say, with 99% confidence, that the numbers represent the whole pool.

I don't know whether a 20% sample would provide a 99% confidence level, but somebody should be able to calculate that if precision matters:
https://en.wikipedia.org/wiki/Confidence_interval
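For a share-of-traffic question (for example, what fraction of visitors come from China), the calculation is the normal-approximation margin of error for a proportion. This sketch assumes the sampled visitors are a simple random sample and uses the worst case p = 0.5; `marginOfError` is a hypothetical helper.

```typescript
// Two-sided z value for a 99% confidence level.
const Z_99 = 2.576;

/**
 * Half-width of the normal-approximation confidence interval for a
 * proportion p estimated from a random sample of n visitors.
 * The finite-population correction is ignored; it would only narrow the
 * interval, so the result is conservative.
 */
function marginOfError(p: number, n: number, z: number = Z_99): number {
  return z * Math.sqrt((p * (1 - p)) / n);
}

// Worst case p = 0.5, with 20% of 7 million quarterly visitors sampled.
const sampleSize = 0.2 * 7000000;
console.log(marginOfError(0.5, sampleSize)); // about 0.00109
```

Under those assumptions, a 20% sample pins any such proportion down to within roughly ±0.11 percentage points at 99% confidence, so sample size is unlikely to be the limiting factor; biased or junk traffic would matter far more.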

Contributor

jonasschnelli commented Oct 12, 2015

I agree that tracking can be useful, but why off-site via Google? What is wrong with analyzing Apache's access log? I think you can also extend Apache's log format (to add the preferred browser language, etc.). Do we really need browser window sizes and the like, at the cost of all the downsides?

Your three bullet points are achievable with Apache log parsing through webalizer or similar.
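As a rough illustration of the log-based approach, assume Apache is configured to append the `Accept-Language` request header as an extra quoted field at the end of each combined-format line (mod_log_config's `%{Accept-Language}i` specifier can log a request header). The hypothetical helpers below then tally visitors' preferred languages, which speaks to the "which language to focus on" point.

```typescript
/**
 * Extracts the primary language tag from the last quoted field of a log line,
 * e.g. "zh-CN,zh;q=0.9,en;q=0.8" -> "zh".
 * Assumes %{Accept-Language}i is the final field of the log format.
 */
function primaryLanguage(logLine: string): string | null {
  const match = logLine.match(/"([^"]*)"\s*$/);
  if (!match || match[1] === "" || match[1] === "-") return null;
  // Browsers normally list the most preferred language first;
  // drop any ";q=" weight and the region subtag.
  const first = match[1].split(",")[0].split(";")[0].trim();
  const primary = first.split("-")[0].toLowerCase();
  return primary === "" || primary === "*" ? null : primary;
}

/** Counts log lines per primary language, skipping lines without one. */
function tallyLanguages(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    const lang = primaryLanguage(line);
    if (lang !== null) counts.set(lang, (counts.get(lang) || 0) + 1);
  }
  return counts;
}
```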

Contributor

saivann commented Oct 12, 2015

Nginx log parsing is what we did previously, and I even wrote a script for it which was used to provide public stats (https://github.com/saivann/bitcoinstats) and which would filter obvious bots and DoS (note: not DDoS).

The thing is, bitcoin.org was targeted by junk traffic which made this data baseless. Tons of hacked websites were redirected at bitcoin.org, and there was other suspicious traffic on top of casual DDoS attacks.

This went beyond what could be easily filtered as obvious bot traffic without constant work, and the stats became useless.
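A minimal sketch of the kind of "obvious junk" filtering described here, not the actual logic of the bitcoinstats script: drop requests whose User-Agent identifies a bot, and drop every request from an IP address that exceeds a per-window request threshold (a crude DoS heuristic). The `LogRecord` shape, the pattern, and the threshold are hypothetical.

```typescript
interface LogRecord {
  ip: string;
  userAgent: string;
}

// Hypothetical pattern for self-identified crawlers and scripts.
const BOT_UA = /bot|crawler|spider|curl|wget/i;

/**
 * Removes obvious bots and every request from any IP that sent more than
 * maxPerIp requests in this batch (assumed to cover one fixed time window).
 */
function filterObviousJunk(records: LogRecord[], maxPerIp: number): LogRecord[] {
  const perIp = new Map<string, number>();
  for (const r of records) perIp.set(r.ip, (perIp.get(r.ip) || 0) + 1);
  return records.filter(
    (r) => !BOT_UA.test(r.userAgent) && (perIp.get(r.ip) || 0) <= maxPerIp
  );
}
```

Rules this simple are exactly what redirected traffic from many hacked sites defeats: each source looks like an ordinary browser sending a handful of requests, which is why keeping the filters useful required constant work.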

Cobra expressed hope that Google Analytics would be better at filtering this data with their large-scale traffic analysis and algorithms, but there's no guarantee that it would be good enough.

Another solution would be to make it time-based instead: include Google Analytics only during the last full week of every month to sample the traffic.
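One way to read "the last full week of the month", assuming weeks run Monday to Sunday in UTC: the sampling window is the last Monday-to-Sunday span that lies entirely inside the month. `isInLastFullWeek` is a hypothetical helper the page could call before injecting the snippet.

```typescript
/**
 * True if `d` falls in the last Monday-to-Sunday week that lies entirely
 * within its month (all computed in UTC).
 */
function isInLastFullWeek(d: Date): boolean {
  const year = d.getUTCFullYear();
  const month = d.getUTCMonth();
  // Day 0 of the following month is the last day of this month.
  const lastDay = new Date(Date.UTC(year, month + 1, 0));
  // getUTCDay(): 0 = Sunday ... 6 = Saturday, so this is the last Sunday of the month.
  const lastSunday = lastDay.getUTCDate() - lastDay.getUTCDay();
  const monday = lastSunday - 6;
  const day = d.getUTCDate();
  return day >= monday && day <= lastSunday;
}
```

In October 2015, for instance, the window would be the 19th through the 25th. A fixed weekly window is simple and keeps Google off the site three weeks out of four, but any end-of-month effects in the traffic would bias the sample.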

Contributor

wbnns commented Oct 12, 2015

Hello, how about Piwik along with our own referral spam blacklist?
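Piwik's own configuration is not shown here; this sketch only illustrates the matching logic a self-maintained referral spam blacklist would need. The blacklist entries are placeholders, and a referrer matches if its host is a listed domain or any subdomain of one.

```typescript
// Hypothetical blacklist entries; a real list would be maintained over time.
const REFERRER_BLACKLIST = ["spam-referrer.example", "buttons-for-website.example"];

/** Extracts the lowercase host from an absolute URL, or null if there is none. */
function referrerHost(referrer: string): string | null {
  const m = /^[a-z][a-z0-9+.-]*:\/\/(?:[^@\/?#]*@)?([^\/?#:]+)/i.exec(referrer);
  return m ? m[1].toLowerCase() : null;
}

/** True if the referrer's host is a blacklisted domain or one of its subdomains. */
function isReferralSpam(referrer: string, blacklist: string[] = REFERRER_BLACKLIST): boolean {
  const host = referrerHost(referrer);
  if (host === null) return false;
  return blacklist.some((d) => host === d || host.endsWith("." + d));
}
```

Matching on the dot-prefixed suffix avoids false positives such as `notspam-referrer.example`, which merely ends with the same characters.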

Contributor

saivann commented Oct 12, 2015

@wilbns That's cool. No idea if it would actually work well in our case (although it can't be worse than server logs). That said, the privacy policy would remain a prerequisite to using Piwik if it tracks cookies and/or if we need to log full IP addresses.
