
Notes for developers

bgrzyscc edited this page Sep 5, 2013 · 1 revision

As you should know after reading README.md, this script is not accurate; it is only a kind of simulation. This is because the Google Analytics Reporting API offers no way to return data about a single visit [1]. Read more about it in the Google Reporting API documentation. Also, don't miss the great Google Analytics Query Explorer 2, and keep the Dimensions & Metrics Reference close at hand.

When exporting data from the API, keep in mind that each query is limited to 7 dimensions.
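
One way to stay within that limit is to split the wanted dimensions into batches that all repeat a few shared key dimensions, so the results can be joined afterwards. The sketch below only illustrates the idea: `batch_dimensions` and the example dimension list are assumptions and are not taken from this script. The key dimensions are the ones used for indexing later on this page.

```python
MAX_DIMENSIONS = 7  # per-query limit mentioned above


def batch_dimensions(key_dims, extra_dims, limit=MAX_DIMENSIONS):
    """Split extra_dims into batches of at most `limit` dimensions.

    Every batch starts with key_dims so that rows from different
    queries can later be matched on the same key.
    """
    room = limit - len(key_dims)
    if room <= 0:
        raise ValueError("key dimensions leave no room for extra dimensions")
    return [key_dims + extra_dims[i:i + room]
            for i in range(0, len(extra_dims), room)]


# Hypothetical example: three key dimensions plus six extra ones.
keys = ["ga:latitude", "ga:longitude", "ga:hour"]
extras = ["ga:browser", "ga:operatingSystem", "ga:country",
          "ga:city", "ga:language", "ga:screenResolution"]

for batch in batch_dimensions(keys, extras):
    print(",".join(batch))
```

With three key dimensions there is room for four extra dimensions per query, so the six extra dimensions above end up in two queries.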

On the Piwik side, we use 3 tables:

  • log_visit
  • log_link_visit_action
  • log_action

Once you know what you can grab from the Google API, it becomes clear that you have to simulate users' actions to populate Piwik's database. The approach taken in this script is to collect each visit's info by merging a series of dimensions (dimensions.DVALS).
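
The merging idea can be sketched roughly as follows: rows that come from separate queries but share the same key dimensions are folded into one record. The plain-dict row format, `merge_rows` and the sample values are hypothetical; the script itself works with its own Visit objects and dimensions.DVALS.

```python
def merge_rows(key_dims, *result_sets):
    """Fold rows from several queries into one record per key.

    Rows are plain dicts mapping dimension names to values (an
    assumption made for this sketch).
    """
    merged = {}
    for rows in result_sets:
        for row in rows:
            key = tuple(row[d] for d in key_dims)
            merged.setdefault(key, {}).update(row)
    return merged


KEY_DIMS = ("ga:latitude", "ga:longitude", "ga:hour")

# Two hypothetical result sets, each carrying one extra dimension.
browsers = [{"ga:latitude": "52.2297", "ga:longitude": "21.0122",
             "ga:hour": "09", "ga:browser": "Firefox"}]
countries = [{"ga:latitude": "52.2297", "ga:longitude": "21.0122",
              "ga:hour": "09", "ga:country": "Poland"}]

visits = merge_rows(KEY_DIMS, browsers, countries)
print(visits[("52.2297", "21.0122", "09")])
```

Note that several real visits can share the same key, in which case this naive merge cannot tell them apart. That is one reason why the result is only a simulation.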

General algorithm

  1. Create ActionManager

    Fetch data from the API (dimensions: ga:pagePath,ga:pageTitle and metrics: ga:pageviews) and create an Action object for each pagePath. The resulting ActionManager object is stored in the global variable action_manager and used later in the Simulate actions step.

  2. Create VisitHashGenerator

  3. Main loop

    Export the period of time day by day (while currentdate <= enddate).

    Export day:
    • Create VisitSimulator
      Fetch the number of visits for the current day and create an equal number of Visit objects.
    • Initialize simulator
      Populate visits with basic dimensions and index them by visit["ga:latitude"], visit["ga:longitude"], visit["ga:hour"].
    • Update the simulator with a series of dimensions from the DVALS dict. Also update it with landing and exit pages.
      Be careful when making changes to the DVALS dict. It's easy to break the results.
    • Finalize
      Convert Google Analytics fields and values into Piwik's.
    • Simulate actions
      Populate log_link_visit_action. The actions are generated semi-randomly and are not reliable.
  4. Update visit actions
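
The day-by-day loop and the indexing step can be sketched as below. This is a simplified illustration, not the script's code: `daterange`, `index_visits`, `export_day` and `fake_fetch` are hypothetical names, visits are plain dicts, and the API call is replaced by a stand-in that returns fixed rows.

```python
from collections import defaultdict
from datetime import date, timedelta


def daterange(start, end):
    """Yield every day from start to end inclusive (while current <= end)."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def index_visits(visits):
    """Group visits by (latitude, longitude, hour)."""
    index = defaultdict(list)
    for visit in visits:
        key = (visit["ga:latitude"], visit["ga:longitude"], visit["ga:hour"])
        index[key].append(visit)
    return index


def export_day(day, fetch_rows):
    """Create one visit per fetched row and index the visits.

    fetch_rows is a hypothetical stand-in for the API call; it returns
    a list of dicts holding the basic dimensions of each visit.
    """
    visits = [dict(row) for row in fetch_rows(day)]
    return index_visits(visits)


def fake_fetch(day):
    # Fixed sample data used instead of a real API request.
    return [
        {"ga:latitude": "52.2297", "ga:longitude": "21.0122", "ga:hour": "09"},
        {"ga:latitude": "52.2297", "ga:longitude": "21.0122", "ga:hour": "09"},
        {"ga:latitude": "50.0647", "ga:longitude": "19.9450", "ga:hour": "14"},
    ]


for day in daterange(date(2013, 9, 1), date(2013, 9, 3)):
    index = export_day(day, fake_fetch)
    print(day.isoformat(), len(index), "index keys")
```

Keeping a list of visits per key makes it visible when more than one visit falls under the same location and hour.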

Resources

[1] Actually, there is a way, but it does not fit our use case. It depends on adding a custom variable with a visitor ID to the Google Analytics JavaScript snippet. See http://cutroni.com/blog/2011/05/05/merging-google-analytics-with-your-data-warehouse/.