Source Google Analytics Data API: slicer updated, unit tests added #21169

Merged
merged 17 commits into from
Jan 11, 2023
Changes from 13 commits
Original file line number Diff line number Diff line change
@@ -626,7 +626,7 @@
   documentationUrl: https://docs.airbyte.com/integrations/sources/google-analytics-v4
   icon: google-analytics.svg
   sourceType: api
-  releaseStage: alpha
+  releaseStage: beta
 - name: Google Directory
   sourceDefinitionId: d19ae824-e289-4b14-995a-0632eb46d246
   dockerRepository: airbyte/source-google-directory

@@ -28,5 +28,5 @@ COPY source_google_analytics_data_api ./source_google_analytics_data_api
 ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
 ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

-LABEL io.airbyte.version=0.1.0
+LABEL io.airbyte.version=0.1.1
 LABEL io.airbyte.name=airbyte/source-google-analytics-data-api

@@ -5,11 +5,13 @@

 from setuptools import find_packages, setup

-MAIN_REQUIREMENTS = ["airbyte-cdk~=0.1", "google-analytics-data==0.11.2", "PyJWT==2.4.0", "cryptography==37.0.4", "requests==2.28.1"]
+MAIN_REQUIREMENTS = ["airbyte-cdk~=0.16", "PyJWT==2.4.0", "cryptography==37.0.4", "requests==2.28.1"]

 TEST_REQUIREMENTS = [
+    "freezegun",
     "pytest~=6.1",
     "pytest-mock~=3.6.1",
+    "requests-mock~=1.9",
     "source-acceptance-test",
 ]


@@ -264,19 +264,15 @@ def stream_slices(
         else:
             start_date = self.config["date_ranges_start_date"]

-        timedelta: int = self.config["window_in_days"]
-
         while start_date <= today:
-            end_date: datetime.date = start_date + datetime.timedelta(days=timedelta)
-            if timedelta > 1 and end_date > today:
-                end_date: datetime.date = start_date + datetime.timedelta(days=timedelta - (end_date - today).days)
-
             if self._stop_iteration:
                 return

-            yield {"startDate": utils.date_to_string(start_date), "endDate": utils.date_to_string(end_date)}
-
-            start_date: datetime.date = end_date + datetime.timedelta(days=1)
+            yield {
+                "startDate": utils.date_to_string(start_date),
+                "endDate": utils.date_to_string(min(start_date + datetime.timedelta(days=self.config["window_in_days"] - 1), today)),
+            }
+            start_date += datetime.timedelta(days=self.config["window_in_days"])


 class GoogleAnalyticsDataApiMetadataStream(GoogleAnalyticsDataApiAbstractStream):
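
The rewritten loop in `stream_slices` can be read in isolation as the sketch below. It is a standalone approximation, not the connector's code: the function name `date_slices` is hypothetical, `today` is passed in rather than computed, and `isoformat()` stands in for `utils.date_to_string`, which is assumed here to produce `YYYY-MM-DD`.

```python
import datetime
from typing import Dict, Iterator


def date_slices(start_date: datetime.date, today: datetime.date, window_in_days: int) -> Iterator[Dict[str, str]]:
    """Yield consecutive, non-overlapping date windows that never extend past `today`."""
    step = datetime.timedelta(days=window_in_days)
    one_day = datetime.timedelta(days=1)
    while start_date <= today:
        # Both ends are inclusive, so a window of N days ends N - 1 days after it starts.
        end_date = min(start_date + step - one_day, today)
        yield {"startDate": start_date.isoformat(), "endDate": end_date.isoformat()}
        # The next window starts exactly N days later, so there is no gap and no overlap.
        start_date += step


slices = list(date_slices(datetime.date(2022, 12, 20), datetime.date(2023, 1, 1), 5))
for s in slices:
    print(s)
```

By contrast, the removed loop set the end date to `start_date + window_in_days` and then started the next slice one day after that end date, so each slice appears to have covered `window_in_days + 1` days.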

@@ -7,12 +7,6 @@
     "required": ["property_id", "date_ranges_start_date"],
     "additionalProperties": true,
     "properties": {
-      "property_id": {
-        "type": "string",
-        "title": "Property ID",
-        "description": "A Google Analytics GA4 property identifier whose events are tracked. Specified in the URL path and not the body",
-        "order": 1
-      },
       "credentials": {
         "order": 0,
         "type": "object",
@@ -33,7 +27,6 @@
                 "title": "Client ID",
                 "type": "string",
                 "description": "The Client ID of your Google Analytics developer application.",
-                "airbyte_secret": true,
                 "order": 1
               },
               "client_secret": {
@@ -76,12 +69,19 @@
                 "examples": [
                   "{ \"type\": \"service_account\", \"project_id\": YOUR_PROJECT_ID, \"private_key_id\": YOUR_PRIVATE_KEY, ... }"
                 ],
-                "airbyte_secret": true
+                "airbyte_secret": true,
+                "order": 1
               }
             }
           }
         ]
       },
+      "property_id": {
+        "type": "string",
+        "title": "Property ID",
+        "description": "A Google Analytics GA4 property identifier whose events are tracked. Specified in the URL path and not the body",
+        "order": 1
+      },
       "date_ranges_start_date": {
         "type": "string",
         "title": "Start Date",
@@ -93,7 +93,7 @@
         "order": 3,
         "type": "string",
         "title": "Custom Reports",
-        "description": "A JSON array describing the custom reports you want to sync from Google Analytics. See <a href=\"https://docs.airbyte.com/integrations/sources/google-analytics-v4#data-processing-latency\">the docs</a> for more information about the exact format you can use to fill out this field."
+        "description": "A JSON array describing the custom reports you want to sync from Google Analytics. See <a href=\"https://docs.airbyte.com/integrations/sources/google-analytics-v4/#custom-reports\">the docs</a> for more information about the exact format you can use to fill out this field."
       },
       "window_in_days": {
         "type": "integer",

@@ -0,0 +1,30 @@
+#
+# Copyright (c) 2022 Airbyte, Inc., all rights reserved.
+#
+
+import requests
+from freezegun import freeze_time
+from source_google_analytics_data_api.authenticator import GoogleServiceKeyAuthenticator
+
+
+@freeze_time("2023-01-01 00:00:00")
+def test_token_rotation(requests_mock):
+    credentials = {
+        "client_email": "client_email",
+        "private_key": "-----BEGIN PRIVATE KEY-----\nMIIBVQIBADANBgkqhkiG9w0BAQEFAASCAT8wggE7AgEAAkEA3slcXL+dA36ESmOi\n1xBhZmp5Hn0WkaHDtW4naba3plva0ibloBNWhFhjQOh7Ff01PVjhT4D5jgqXBIgc\nz9Gv3QIDAQABAkEArlhYPoD5SB2/O1PjwHgiMPrL1C9B9S/pr1cH4vPJnpY3VKE3\n5hvdil14YwRrcbmIxMkK2iRLi9lM4mJmdWPy4QIhAPsRFXZSGx0TZsDxD9V0ZJmZ\n0AuDCj/NF1xB5KPLmp7pAiEA4yoFox6w7ql/a1pUVaLt0NJkDfE+22pxYGNQaiXU\nuNUCIQCsFLaIJZiN4jlgbxlyLVeya9lLuqIwvqqPQl6q4ad12QIgS9gG48xmdHig\n8z3IdIMedZ8ZCtKmEun6Cp1+BsK0wDUCIF0nHfSuU+eTQ2qAON2SHIrJf8UeFO7N\nzdTN1IwwQqjI\n-----END PRIVATE KEY-----\n",
+        "client_id": "client_id"
+    }
+    authenticator = GoogleServiceKeyAuthenticator(credentials)
+
+    auth_request = requests_mock.register_uri(
+        "POST",
+        authenticator._google_oauth2_token_endpoint,
+        json={"access_token": "bearer_token", "expires_in": 3600}
+    )
+
+    authenticated_request = authenticator(requests.Request())
+    assert auth_request.call_count == 1
+    assert auth_request.last_request.qs.get("assertion") == ['eyj0exaioijkv1qilcjhbgcioijsuzi1niisimtpzci6imnsawvudf9pzcj9.eyjpc3mioijjbgllbnrfzw1hawwilcjzy29wzsi6imh0dhbzoi8vd3d3lmdvb2dszwfwaxmuy29tl2f1dggvyw5hbhl0awnzlnjlywrvbmx5iiwiyxvkijoiahr0chm6ly9vyxv0adiuz29vz2xlyxbpcy5jb20vdg9rzw4ilcjlehaioje2nzi1mzq4mdasimlhdci6mty3mjuzmtiwmh0.u1gpfmncrtlsy_ujxpc2iazpvdzb6eq4mobq3xez5v6gqtj0xgou__c6neu9d7qvb8h0jkynggsfibkoci_g7a']
+    assert auth_request.last_request.qs.get("grant_type") == ["urn:ietf:params:oauth:grant-type:jwt-bearer"]
+    assert authenticator._token.get("expires_at") == 1672534800
+    assert authenticated_request.headers.get("Authorization") == "Bearer bearer_token"
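
The `expires_at == 1672534800` assertion above is the frozen clock (2023-01-01 00:00:00 UTC, i.e. 1672531200) plus the mocked `expires_in` of 3600 seconds. The sketch below shows one way such expiry bookkeeping can work. It is not the `GoogleServiceKeyAuthenticator` implementation, which this page does not show: the `CachedToken` class, its injected `fetch` callable and its `clock` parameter are all hypothetical stand-ins so the example runs without network access.

```python
import time
from typing import Callable, Dict, List, Optional


class CachedToken:
    """Hypothetical helper: caches an access token and refreshes it once it has expired."""

    def __init__(self, fetch: Callable[[], Dict], clock: Callable[[], float] = time.time):
        self._fetch = fetch  # stands in for the POST to the OAuth2 token endpoint
        self._clock = clock  # injectable so tests can control "now"
        self._token: Optional[Dict] = None

    def get(self) -> str:
        now = int(self._clock())
        if self._token is None or now >= self._token["expires_at"]:
            response = self._fetch()
            # Store an absolute expiry time derived from the relative `expires_in`.
            self._token = {
                "access_token": response["access_token"],
                "expires_at": now + response["expires_in"],
            }
        return self._token["access_token"]


calls: List[int] = []


def fake_fetch() -> Dict:
    calls.append(1)
    return {"access_token": "bearer_token", "expires_in": 3600}


now = [1672531200]  # 2023-01-01 00:00:00 UTC
cache = CachedToken(fake_fetch, clock=lambda: now[0])
first = cache.get()           # first use: fetches a token
expires_at_after_first = cache._token["expires_at"]
second = cache.get()          # still valid: served from the cache
now[0] += 3600                # advance the fake clock to the expiry time
third = cache.get()           # expired: fetches again
```

Injecting the clock plays the same role here that `freeze_time` plays in the test: it makes the expiry arithmetic deterministic.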

@@ -9,6 +9,7 @@
 from unittest.mock import MagicMock

 import pytest
+from freezegun import freeze_time
 from source_google_analytics_data_api.source import GoogleAnalyticsDataApiBaseStream

 json_credentials = """
@@ -271,3 +272,34 @@ def test_backoff_time(patch_base_class):
     stream = GoogleAnalyticsDataApiBaseStream(authenticator=MagicMock(), config=patch_base_class["config"])
     expected_backoff_time = None
     assert stream.backoff_time(response_mock) == expected_backoff_time
+
+
+@freeze_time("2023-01-01 00:00:00")
+def test_stream_slices():
+    config = {"date_ranges_start_date": datetime.date(2022, 12, 29), "window_in_days": 1}
+    stream = GoogleAnalyticsDataApiBaseStream(authenticator=None, config=config)
+    slices = list(stream.stream_slices(sync_mode=None))
+    assert slices == [
+        {"startDate": "2022-12-29", "endDate": "2022-12-29"},
+        {"startDate": "2022-12-30", "endDate": "2022-12-30"},
+        {"startDate": "2022-12-31", "endDate": "2022-12-31"},
+        {"startDate": "2023-01-01", "endDate": "2023-01-01"},
+    ]
+
+    config = {"date_ranges_start_date": datetime.date(2022, 12, 28), "window_in_days": 2}
+    stream = GoogleAnalyticsDataApiBaseStream(authenticator=None, config=config)
+    slices = list(stream.stream_slices(sync_mode=None))
+    assert slices == [
+        {"startDate": "2022-12-28", "endDate": "2022-12-29"},
+        {"startDate": "2022-12-30", "endDate": "2022-12-31"},
+        {"startDate": "2023-01-01", "endDate": "2023-01-01"},
+    ]
+
+    config = {"date_ranges_start_date": datetime.date(2022, 12, 20), "window_in_days": 5}
+    stream = GoogleAnalyticsDataApiBaseStream(authenticator=None, config=config)
+    slices = list(stream.stream_slices(sync_mode=None))
+    assert slices == [
+        {"startDate": "2022-12-20", "endDate": "2022-12-24"},
+        {"startDate": "2022-12-25", "endDate": "2022-12-29"},
+        {"startDate": "2022-12-30", "endDate": "2023-01-01"},
+    ]
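
The three expected lists in `test_stream_slices` share the same invariants: slices start at the configured start date, follow each other without gaps or overlaps, span at most `window_in_days` days, and end on the frozen "today". A minimal sketch of a checker for those invariants follows. The helper `check_slices` is hypothetical and is not part of the connector or its test suite.

```python
import datetime
from typing import Dict, List


def check_slices(slices: List[Dict[str, str]], start: datetime.date, today: datetime.date, window: int) -> None:
    """Raise AssertionError unless the slices tile [start, today] in windows of at most `window` days."""
    parse = datetime.date.fromisoformat
    expected_start = start
    for item in slices:
        slice_start, slice_end = parse(item["startDate"]), parse(item["endDate"])
        assert slice_start == expected_start, f"gap or overlap before {slice_start}"
        assert slice_start <= slice_end <= today, f"bad end date {slice_end}"
        assert (slice_end - slice_start).days + 1 <= window, f"slice at {slice_start} is too long"
        expected_start = slice_end + datetime.timedelta(days=1)
    # After the last slice, the next expected start must lie beyond today.
    assert expected_start > today, "slices stop before today"


check_slices(
    [
        {"startDate": "2022-12-28", "endDate": "2022-12-29"},
        {"startDate": "2022-12-30", "endDate": "2022-12-31"},
        {"startDate": "2023-01-01", "endDate": "2023-01-01"},
    ],
    start=datetime.date(2022, 12, 28),
    today=datetime.date(2023, 1, 1),
    window=2,
)
```

A check of this kind complements the literal expected lists: it would also catch an off-by-one such as a slice spanning `window + 1` days.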
65 changes: 46 additions & 19 deletions docs/integrations/sources/google-analytics-v4.md
@@ -6,28 +6,21 @@ This connector supports GA4 properties through the [Analytics Data API v1](https

 ## Prerequisites

-* JSON credentials for the service account that has access to Google Analytics. For more details check [instructions](https://support.google.com/analytics/answer/1009702#zippy=%2Cin-this-article)
+* JSON credentials for the service account that has access to Google Analytics. For more details check [instructions](https://support.google.com/analytics/answer/1009702)
+* OAuth 2.0 credentials for the service account that has access to Google Analytics
 * Property ID
-* Date Range Start Date
+* Start Date
 * Custom Reports (Optional)
 * Data request time increment in days (Optional)

-## Custom reports
-
-* Support for multiple custom reports
-* Custom reports in format `[{"name": "<report-name>", "dimensions": ["<dimension-name>", ...], "metrics": ["<metric-name>", ...]}]`
-* Custom report format when using segments and / or filters `[{"name": "<report-name>", "dimensions": ["<dimension-name>", ...], "metrics": ["<metric-name>", ...], "segments": ["<segment-id-or-dynamic-segment-v3-format]", filter: "<filter-definition-v3-format>"}]`
-* When using segments, make sure you add the `ga:segment` dimension.
-* Custom reports: [Dimensions and metrics explorer](https://ga-dev-tools.web.app/dimensions-metrics-explorer/)
-
 ## Step 1: Set up Source

 ### Create a Service Account

 First, you need to select existing or create a new project in the Google Developers Console:

 1. Sign in to the Google Account you are using for Google Analytics as an admin.
-2. Go to the [Service accounts page](https://console.developers.google.com/iam-admin/serviceaccounts).
+2. Go to the [Service Accounts](https://console.developers.google.com/iam-admin/serviceaccounts) page.
 3. Click `Create service account`.
 4. Create a JSON key file for the service user. The contents of this file will be provided as the `credentials_json` in the UI when authorizing GA after you grant permissions \(see below\).

@@ -42,18 +35,51 @@ Use the service account email address to [add a user](https://support.google.com

 ### Property ID

-Specify the Property ID as set [here](https://analytics.google.com/analytics/web/a54907729p153687530/admin/property/settings)
+To determine a Google Analytics 4 [Property ID](https://developers.google.com/analytics/devguides/reporting/data/v1/property-id#what_is_my_property_id)

+### Step 2: Set up the Google Analytics connector in Airbyte

-## Step 2: Set up the source connector in Airbyte
+**For Airbyte Cloud:**

-Set the required fields in the Google Analytics Data API connector page such as the JSON credentials, property ID,
-custom reports, date ranges start date, data request time increment in days.
+1. [Login to your Airbyte Cloud](https://cloud.airbyte.io/workspaces) account.
+2. In the left navigation bar, click **Sources**. In the top-right corner, click **+ new source**.
+3. On the source setup page, select **Google Analytics 4 (GA4)** from the Source type dropdown and enter a name for this connector.
+4. Click `Authenticate your account` by selecting Oauth or Service Account for Authentication.
+5. Log in and Authorize the Google Analytics account.
+6. Enter the **Property ID** whose events are tracked.
+7. Enter the **Start Date** from which to replicate report data in the format YYYY-MM-DD.
+8. Enter the **Custom Reports (Optional)** a JSON array describing the custom reports you want to sync from Google Analytics.
+9. Enter the **Data request time increment in days (Optional)**. The bigger this value is, the faster the sync will be, but the more likely that sampling will be applied to your data, potentially causing inaccuracies in the returned results. We recommend setting this to 1 unless you have a hard requirement to make the sync faster at the expense of accuracy. The minimum allowed value for this field is 1, and the maximum is 364.
+
 ## Supported sync modes

 The Google Analytics source connector supports the following [sync modes](https://docs.airbyte.com/cloud/core-concepts#connection-sync-modes):
-- Full Refresh
-- Incremental
+
+- [Full Refresh - Overwrite](https://docs.airbyte.com/understanding-airbyte/glossary#full-refresh-sync)
+- [Full Refresh - Append](https://docs.airbyte.com/understanding-airbyte/connections/full-refresh-append)
+- [Incremental - Append](https://docs.airbyte.com/understanding-airbyte/connections/incremental-append)
+- [Incremental - Deduped History](https://docs.airbyte.com/understanding-airbyte/connections/incremental-deduped-history)
+
+## Supported Streams
+
+This connector outputs the following incremental streams:
+
+* [daily_active_users](https://developers.google.com/analytics/devguides/reporting/data/v1/rest/v1beta/properties/runReport)
+* [devices](https://developers.google.com/analytics/devguides/reporting/data/v1/rest/v1beta/properties/runReport)
+* [four_weekly_active_users](https://developers.google.com/analytics/devguides/reporting/data/v1/rest/v1beta/properties/runReport)
+* [locations](https://developers.google.com/analytics/devguides/reporting/data/v1/rest/v1beta/properties/runReport)
+* [pages](https://developers.google.com/analytics/devguides/reporting/data/v1/rest/v1beta/properties/runReport)
+* [traffic_sources](https://developers.google.com/analytics/devguides/reporting/data/v1/rest/v1beta/properties/runReport)
+* [website_overview](https://developers.google.com/analytics/devguides/reporting/data/v1/rest/v1beta/properties/runReport)
+* [weekly_active_users](https://developers.google.com/analytics/devguides/reporting/data/v1/rest/v1beta/properties/runReport)
+
+## Custom reports
+
+* Support for multiple custom reports
+* Custom reports in format `[{"name": "<report-name>", "dimensions": ["<dimension-name>", ...], "metrics": ["<metric-name>", ...]}]`
+* Custom report format when using segments and / or filters `[{"name": "<report-name>", "dimensions": ["<dimension-name>", ...], "metrics": ["<metric-name>", ...], "segments": ["<segment-id-or-dynamic-segment-v3-format]", filter: "<filter-definition-v3-format>"}]`
+* When using segments, make sure you add the `ga:segment` dimension.
+* Custom reports: [Dimensions and metrics explorer](https://ga-dev-tools.web.app/dimensions-metrics-explorer/)

 ## Rate Limits & Performance Considerations \(Airbyte Open-Source\)

@@ -63,13 +89,14 @@ The Google Analytics source connector supports the following [sync modes](https:

 # Reports

-The reports are custom by setting the dimensions and metrics required. To support Incremental sync, the `uuid` field is
-added by default to any report. There are 8 default reports. To add more reports, you need to specify the `custom reports` field.
+The reports are custom by setting the dimensions and metrics required. To support Incremental sync, the `date` dimension is
+added by default to all reports. There are 8 default reports. To add more reports, you need to specify the `custom reports` field.

 ## Changelog

 | Version | Date | Pull Request | Subject |
 |:--------|:-----------|:---------------------------------------------------------|:---------------------------------------------------|
+| 0.1.1 | 2023-01-10 | [21169](https://github.com/airbytehq/airbyte/pull/21169) | Slicer updated, unit tests added |
 | 0.1.0 | 2023-01-08 | [20889](https://github.com/airbytehq/airbyte/pull/20889) | Improved config validation, SAT |
 | 0.0.3 | 2022-08-15 | [15229](https://github.com/airbytehq/airbyte/pull/15229) | Source Google Analytics Data Api: code refactoring |
 | 0.0.2 | 2022-07-27 | [15087](https://github.com/airbytehq/airbyte/pull/15087) | fix documentationUrl |
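
The "Custom reports" section of the documentation diff above describes the field as a JSON array of objects with `name`, `dimensions` and `metrics`. The sketch below builds one such value and checks it for those keys. It is illustrative only: `parse_custom_reports` is a hypothetical helper rather than the connector's own validation, and the report name, the `country` dimension and the `activeUsers` metric are example values chosen for this sketch.

```python
import json
from typing import Any, Dict, List

REQUIRED_KEYS = ("name", "dimensions", "metrics")


def parse_custom_reports(raw: str) -> List[Dict[str, Any]]:
    """Parse a Custom Reports string and check that each report has the documented keys."""
    reports = json.loads(raw)
    if not isinstance(reports, list):
        raise ValueError("custom reports must be a JSON array")
    for report in reports:
        if not isinstance(report, dict):
            raise ValueError("each custom report must be a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in report]
        if missing:
            raise ValueError(f"custom report is missing keys: {missing}")
    return reports


# The connector config takes the reports as a string, so serialize the array first.
raw = json.dumps(
    [{"name": "users_by_country", "dimensions": ["country"], "metrics": ["activeUsers"]}]
)
reports = parse_custom_reports(raw)
print(reports[0]["name"])
```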