# Google Analytics with Python

Begin by using the [setup wizard](https://console.developers.google.com/start/api?id=analyticsreporting.googleapis.com&credential=client_key) to create a new project in the Google APIs console. Download your `client_secrets.json` into the project directory.

**Note:** `client_secrets.json` should never be committed to version control! For your convenience, it is listed in `.gitignore`.

You'll also need the `View ID`, which you can get from the Google Analytics console menu under `Admin>View>View Settings>Basic Settings`, among other places.

#### Querying the API

Google offers a unified client for its APIs, `google-api-python-client`. The documentation can be a little spotty, but it works similarly to other cloud provider clients in Python (e.g. boto3 for Amazon). The lifecycle is roughly:

1. `import` it
2. Establish a resource
    `apiclient` lets you do this with `apiclient.discovery.build()` or `.build_from_document()`.
    In this way we go from the general client for all Google APIs to the specific client for Google Analytics.
3. Interact with the resource
    Creating an instance of the Google Analytics client exposes methods that map to the RESTful API endpoints. In this tutorial we're going to interact with just one: `.batchGet`.
    
The file `./bald_query.py`, transcribed below, is an example of this lifecycle.
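A script along these lines might look like the following sketch. It is modeled on Google's Python quickstart (listed in the references) and assumes that `client_secrets.json` is a service-account key. The function names match the ones imported later in this tutorial, but the bodies and the `main` helper are assumptions, not a copy of the original file.

```python
"""Sketch of a minimal Reporting API v4 query script (assumed layout)."""
import json
import sys

SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = './client_secrets.json'  # assumed: a service-account key file


def initialize_analyticsreporting():
    """Steps 1 and 2: import the general client, then build the Analytics resource."""
    # Imported inside the function so the module still loads when the
    # client libraries are not installed. `googleapiclient` is the current
    # package name; `apiclient` is its legacy alias.
    from googleapiclient.discovery import build
    from oauth2client.service_account import ServiceAccountCredentials

    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        KEY_FILE_LOCATION, SCOPES)
    return build('analyticsreporting', 'v4', credentials=credentials)


def get_report(analytics, body):
    """Step 3: send the request body to reports:batchGet and return the parsed JSON."""
    return analytics.reports().batchGet(body=body).execute()


def main(argv):
    """Hypothetical entry point: query one view and pretty-print the response."""
    view_id = argv[1]
    analytics = initialize_analyticsreporting()
    body = {'reportRequests': [{'viewId': view_id}]}
    print(json.dumps(get_report(analytics, body), indent=2))


# Only run when called as `python bald_query.py <view id>`.
if __name__ == '__main__' and len(sys.argv) == 2:
    main(sys.argv)
```

Keeping `get_report` as a thin wrapper that takes the request body as an argument means the same function can be reused for richer queries later on.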

#### Running the example script

You can run `bald_query.py` from Jupyter using a terminal instance. With a view id of `001` you call it like so:

    python bald_query.py 001
    
If everything is set up correctly, you should see some JSON with data about your view.

Let's drill down on `get_report`:

Here we can see that `get_report` is just calling `analytics.reports().batchGet()`. This is a method exposed by the Google Analytics client, which maps onto the following API endpoint:

    POST https://analyticsreporting.googleapis.com/v4/reports:batchGet
    
The Google Analytics client handles authentication, but we still need to supply data for the HTTP POST request. In particular, the specification for this endpoint requires a [ReportRequest object](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet#ReportRequest) with a view id. That's why we're calling this a "bald" query -- it's as empty as we can make it.

The [batchGet docs](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet) have an API explorer that you can use to build more complex queries.
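As a rough illustration of what a more complex query could look like, the sketch below builds a request body with optional metrics, dimensions, and a date range. The helper `build_report_body` is hypothetical (it is not part of the Google client), and `ga:sessions` and `ga:country` are just example field names; the keys `metrics`, `dimensions`, and `dateRanges` follow the ReportRequest specification linked above.

```python
def build_report_body(view_id, metrics=(), dimensions=(),
                      start_date=None, end_date=None):
    """Hypothetical helper: return a request body for reports:batchGet."""
    request = {'viewId': view_id}
    if metrics:
        # Each metric is an object with an `expression` such as 'ga:sessions'.
        request['metrics'] = [{'expression': m} for m in metrics]
    if dimensions:
        # Each dimension is an object with a `name` such as 'ga:country'.
        request['dimensions'] = [{'name': d} for d in dimensions]
    if start_date and end_date:
        request['dateRanges'] = [{'startDate': start_date, 'endDate': end_date}]
    return {'reportRequests': [request]}


# The "bald" query: nothing but a view id.
bald = build_report_body('001')

# A richer query: sessions per country over the last seven days.
richer = build_report_body(
    '001',
    metrics=['ga:sessions'],
    dimensions=['ga:country'],
    start_date='7daysAgo',
    end_date='today',
)
```

Either dictionary can be passed as the second argument to `get_report`.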

#### Parsing the response

Let's move on to parsing the JSON response. First, `import` the functions from `bald_query.py`:

In [None]:
from bald_query import get_report, initialize_analyticsreporting

We also want the supporting libraries and configuration constants:

In [None]:
from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials


SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = './client_secrets.json'

And we're ready to go!

In [None]:
VIEW_ID = "<insert your view id>"
analytics = initialize_analyticsreporting()
response = get_report(analytics, {
    'reportRequests': [
        {
            'viewId': VIEW_ID,
        }
    ]
})

In this next part we're going to use `pandas` to print this JSON response as a tabular dataset. `pandas` ships with JSON normalization utilities, and we'll use those here to flatten this response.

In [None]:
import pandas as pd
from pandas import json_normalize  # top-level since pandas 1.0; the pandas.io.json path is deprecated

def parse_data(response):
    reports = response['reports'][0]
    headers = reports.get('columnHeader')
    columnHeader = headers.get('dimensions', [])
    metricHeader = headers.get('metricHeader', {}).get('metricHeaderEntries', [])
    
    columns = list(columnHeader)  # copy, so appending below does not mutate the response
    for metric in metricHeader:
        columns.append(metric['name'])

    data = json_normalize(reports['data']['rows'])
    if headers.get('dimensions'):
        data_dimensions = pd.DataFrame(data['dimensions'].tolist())
    data_metrics = pd.DataFrame(data['metrics'].tolist())
    data_metrics = data_metrics.applymap(lambda x: x['values'])
    data_metrics = pd.DataFrame(data_metrics[0].tolist())
    if headers.get('dimensions'):
        result = pd.concat([data_dimensions, data_metrics], axis=1, ignore_index=True)
    else:
        result = data_metrics
    result.columns = columns
    return result

df = parse_data(response)
df

This parser may look complicated, but it's simply a chain of `append` and `get` calls with a few `pandas` methods mixed in. Let's break it down:

1. Get the zeroth report

2. Get the headers

3. Flatten the headers

4. Parse the rows

**Note:** Here we use if/else control flow to handle the cases where the response doesn't include `dimensions` headers.

5. Finally, apply our headers
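To make the five steps concrete, here is a sketch that walks through them on a small hand-written response. The sample values are invented for illustration, and plain Python lists stand in for the `pandas` objects so that each intermediate result is easy to inspect.

```python
# A made-up response with one dimension and two metrics (values are invented).
sample_response = {
    'reports': [{
        'columnHeader': {
            'dimensions': ['ga:country'],
            'metricHeader': {'metricHeaderEntries': [
                {'name': 'ga:sessions', 'type': 'INTEGER'},
                {'name': 'ga:users', 'type': 'INTEGER'},
            ]},
        },
        'data': {'rows': [
            {'dimensions': ['Canada'], 'metrics': [{'values': ['12', '9']}]},
            {'dimensions': ['France'], 'metrics': [{'values': ['7', '5']}]},
        ]},
    }]
}

# 1. Get the zeroth report
report = sample_response['reports'][0]

# 2. Get the headers
headers = report.get('columnHeader', {})
dimension_names = headers.get('dimensions', [])
metric_entries = headers.get('metricHeader', {}).get('metricHeaderEntries', [])

# 3. Flatten the headers into one list of column names
columns = list(dimension_names)
for entry in metric_entries:
    columns.append(entry['name'])

# 4. Parse the rows: dimension values first, then the metric values
#    of the first date range (rows without dimensions are handled by .get)
table = []
for row in report['data']['rows']:
    values = list(row.get('dimensions', []))
    values.extend(row['metrics'][0]['values'])
    table.append(values)

# 5. Finally, apply our headers
records = [dict(zip(columns, values)) for values in table]
```

Note that the API returns metric values as strings, so a numeric conversion is usually the next step. Passing `table` and `columns` to `pd.DataFrame(table, columns=columns)` gives the same table as a `DataFrame`.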

This parser (adapted from [here](https://stackoverflow.com/a/49359989/10553976)) lets us create a `pandas` object from our Google Analytics responses. It's purpose-built, brittle, and it could probably use some optimizations. But once we have a `pandas.DataFrame` object we can use it for any number of data wrangling, dataviz, and data analysis tasks. We can also export it to a file as simply as:

In [None]:
df = parse_data(response)
df.to_csv("my_google_analytics_data.csv")

The "correct" way to write the parser depends on the data and on what you need to do with the response data. However, you can get very far indeed with `.get` and `.append` calls on their own, and even further with a little bit of `pandas` knowledge mixed in.
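As one possible variation, the sketch below uses nothing but `.get` and `.append` and tolerates responses that have no reports or no rows, which would raise a `KeyError` in a parser that indexes `['data']['rows']` directly. The function name `response_to_records` is hypothetical, and like the parser above it reads only the first date range of each row.

```python
def response_to_records(response):
    """Hypothetical helper: flatten a batchGet response into a list of dicts."""
    records = []
    for report in response.get('reports', []):
        header = report.get('columnHeader', {})
        names = list(header.get('dimensions', []))
        for entry in header.get('metricHeader', {}).get('metricHeaderEntries', []):
            names.append(entry.get('name'))

        # A report with no matching data may carry no 'rows' key at all.
        for row in report.get('data', {}).get('rows', []):
            values = list(row.get('dimensions', []))
            for date_range in row.get('metrics', [])[:1]:  # first date range only
                for value in date_range.get('values', []):
                    values.append(value)
            records.append(dict(zip(names, values)))
    return records
```

A list of dictionaries like this can be handed straight to `pd.DataFrame(records)` when a tabular view is needed.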

#### Conclusion

In this tutorial you learned how to pass queries to the Google Analytics API and how to parse the responses. You learned about building specific resource clients from Google's general-purpose API client. You learned about the `batchGet` API endpoint and how to build custom queries for it, and you saw an example of how to parse its responses. You also learned a little bit about tidying up JSON responses with `pandas`.

References:

[Reporting API v4](https://developers.google.com/analytics/devguides/reporting/core/v4/)

[Hello Analytics; Python Quickstart by Google](https://developers.google.com/analytics/devguides/reporting/core/v4/quickstart/service-py)

[Reference docs by googleapis@github.com](https://googleapis.github.io/google-api-python-client/docs/epy/index.html)