
🎉 New Source: Amplitude #3664

Merged
merged 14 commits into from
Jun 8, 2021

yevhenii-ldv
Contributor

What

closes #1457

How

Describe the solution

Pre-merge Checklist

  • Run integration tests
  • Publish Docker images

Recommended reading order

  1. test.java
  2. component.ts
  3. the rest

@yevhenii-ldv
Contributor Author

yevhenii-ldv commented May 27, 2021

/test connector=source-amplitude

🕑 source-amplitude https://github.com/airbytehq/airbyte/actions/runs/882536391
❌ source-amplitude https://github.com/airbytehq/airbyte/actions/runs/882536391


MAIN_REQUIREMENTS = [
    "airbyte-cdk~=0.1",
]
Contributor

Are all requirements included here? E.g., pendulum and requests are imported in api.py.

Contributor Author

These libraries are installed as dependencies of airbyte-cdk.

# Unable to use 'state_path' because Amplitude returns an error when specifying a date in the future.
# state_path: "integration_tests/abnormal_state.json"
cursor_paths:
  events: [ "event_time" ]
Contributor

There are two more incremental streams. Should cursor paths be set for them as well?

Contributor Author

In general, it is not necessary to specify "cursor_paths" here if it is already specified in the directory. However, there is a bug, and we must specify the cursor for at least one stream for the test to work.
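For context, `cursor_paths` tells the acceptance test, per stream, where that stream's cursor value sits inside the emitted state. The following is a minimal sketch of that lookup, assuming a state object shaped like `{"events": {"event_time": "..."}}`; the helper `cursor_from_state` and the sample state are hypothetical, and this is not the acceptance-test framework's own code.

```python
from typing import Any, Dict, List, Optional


def cursor_from_state(state: Dict[str, Any], stream: str, cursor_path: List[str]) -> Optional[Any]:
    """Hypothetical helper: walk `cursor_path` inside the state stored for `stream`."""
    node: Any = state.get(stream)
    for key in cursor_path:
        if not isinstance(node, dict) or key not in node:
            return None  # no cursor recorded for this stream
        node = node[key]
    return node


# Mirrors `cursor_paths: events: [ "event_time" ]` from the config above.
cursor_paths = {"events": ["event_time"]}
state = {"events": {"event_time": "2021-05-27 10:00:00"}}  # made-up sample state

for stream, path in cursor_paths.items():
    print(stream, cursor_from_state(state, stream, path))
```

A stream with no entry in the state simply yields `None` here, which is one reason a test might need at least one stream's cursor to be declared explicitly.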

        yield from respose_data.get(self.name, [])

    def path(self, **kwargs) -> str:
        return f"/{self.api_version}/{self.name}"
Contributor

I think the version should be part of the base URL.

Contributor Author

This is inconvenient because different streams use different API versions.

Contributor

Suggested change
        return f"/{self.api_version}/{self.name}"
    @property
    def url_base(self) -> str:
        return f"https://amplitude.com/api/{self.api_version}/"

Contributor

WDYT about this?
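One way to reconcile the suggestion with per-stream versions: keep `api_version` as a class attribute each stream overrides, build `url_base` from it, and return a relative path. The sketch below uses plain Python classes instead of the airbyte-cdk `HttpStream` base; the class names, the `endpoint` attribute, and the endpoint paths (`cohorts`, `export`) are assumptions for illustration, as is joining base and path with `urljoin`.

```python
from urllib.parse import urljoin


class AmplitudeStreamSketch:
    """Hypothetical stand-in for the connector's AmplitudeStream base class."""

    api_version: int = 2  # assumed default; streams override it as needed
    endpoint: str = ""  # hypothetical attribute holding the relative endpoint

    @property
    def url_base(self) -> str:
        # The version lives in the base URL, as suggested in the review.
        return f"https://amplitude.com/api/{self.api_version}/"

    def path(self, **kwargs) -> str:
        # Relative path (no leading slash) so urljoin keeps the versioned base.
        return self.endpoint

    def url(self) -> str:
        return urljoin(self.url_base, self.path())


class CohortsSketch(AmplitudeStreamSketch):
    endpoint = "cohorts"
    api_version = 3  # "All Cohorts" uses version 3


class EventsSketch(AmplitudeStreamSketch):
    endpoint = "export"  # assumed endpoint for Export Events (inherits version 2)


print(CohortsSketch().url())  # https://amplitude.com/api/3/cohorts
print(EventsSketch().url())  # https://amplitude.com/api/2/export
```

A leading slash in `path` would defeat this design: `urljoin("https://amplitude.com/api/3/", "/cohorts")` yields `https://amplitude.com/cohorts`, dropping the version.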


class Cohorts(AmplitudeStream):
    primary_key = "id"
    api_version = 3
Contributor

Why are several versions used?

Contributor Author

I followed the Amplitude documentation, which uses different API versions for different streams:

  1. All Cohorts - version 3
  2. Active User Counts - version 2
  3. Export Events - version 2

Contributor Author

@yevhenii-ldv yevhenii-ldv left a comment

I removed the Events stream from the tests because it returns too much data. When I ran it locally, the test passed only if start_date was at most 1 or 2 days in the past.
Otherwise, the test fails due to a timeout or insufficient memory (since it stores the data for comparison).

@yevhenii-ldv
Contributor Author

yevhenii-ldv commented May 28, 2021

/test connector=source-amplitude

🕑 source-amplitude https://github.com/airbytehq/airbyte/actions/runs/885439055
✅ source-amplitude https://github.com/airbytehq/airbyte/actions/runs/885439055

@vitaliizazmic
Contributor

LGTM, only a small change.

@yevhenii-ldv
Contributor Author

yevhenii-ldv commented Jun 1, 2021

/test connector=source-amplitude

🕑 source-amplitude https://github.com/airbytehq/airbyte/actions/runs/895528115
✅ source-amplitude https://github.com/airbytehq/airbyte/actions/runs/895528115

ykurochkin added 2 commits June 1, 2021 12:45
…/new-amplitude-connector

Conflicts:
	docs/integrations/connector-health.md
	tools/bin/ci_credentials.sh
@yevhenii-ldv
Contributor Author

yevhenii-ldv commented Jun 1, 2021

/test connector=source-amplitude

🕑 source-amplitude https://github.com/airbytehq/airbyte/actions/runs/895562197
✅ source-amplitude https://github.com/airbytehq/airbyte/actions/runs/895562197

@htrueman
Contributor

htrueman commented Jun 1, 2021

LGTM

Contributor

@sherifnada sherifnada left a comment

Looks great! A few questions.

        return params


class Events(IncrementalAmplitudeStream):
Contributor

Are events returned in ascending order of the cursor field? If so, why don't we checkpoint using checkpoint_interval? This would allow the connector to resume where it left off even if it failed halfway through the sync.

Contributor

Same question for the other streams, BTW.

Contributor

Also, this would happen automatically if we overrode the stream_slices method. Since you are already doing time slicing here, it's a good fit for stream slices, which keep track of state automatically.
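The time-slicing idea can be sketched without the CDK: split the range from the saved cursor (or `start_date`) up to now into fixed windows and yield one slice per window, so state can be recorded after each slice completes. The one-day window, the `start`/`end` keys, and the name `daily_slices` are assumptions for illustration, not the connector's actual implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterator


def daily_slices(
    start: datetime, end: datetime, step: timedelta = timedelta(days=1)
) -> Iterator[Dict[str, str]]:
    """Hypothetical slicer: yield consecutive [start, end) windows covering the range."""
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)  # last window may be shorter than `step`
        yield {"start": cursor.isoformat(), "end": window_end.isoformat()}
        cursor = window_end


start = datetime(2021, 6, 1, tzinfo=timezone.utc)
end = datetime(2021, 6, 3, 12, tzinfo=timezone.utc)
for time_slice in daily_slices(start, end):
    print(time_slice)
```

In a CDK stream, logic like this would presumably sit in an overridden `stream_slices`, with each slice's bounds fed into the request parameters.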

Contributor Author

I used state_checkpoint_interval. It was set to 100 for all incremental streams, but I have now adjusted it: 10 for all of them, and 1000 for Events (since it has a lot of entries).
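What a checkpoint interval buys can be shown with a simplified read loop: after every N records, the latest cursor value is emitted as state, so a failed sync can resume from the last checkpoint instead of starting over. This is a hypothetical, framework-free sketch of the idea, not the airbyte-cdk's implementation, and it relies on records arriving in ascending cursor order, which is exactly what the reviewer asks about.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def read_with_checkpoints(
    records: Iterable[Dict[str, Any]], cursor_field: str, interval: int
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield ("RECORD", record) items plus a ("STATE", state) item every `interval` records."""
    state: Dict[str, Any] = {}
    count = 0
    for record in records:
        yield "RECORD", record
        # Assumes ascending cursor order, so the latest value is safe to persist.
        state = {cursor_field: record[cursor_field]}
        count += 1
        if count % interval == 0:
            yield "STATE", dict(state)
    if count % interval != 0:
        yield "STATE", dict(state)  # final checkpoint for the trailing partial batch


records = [{"event_time": f"2021-06-0{i} 00:00:00"} for i in range(1, 6)]
messages = list(read_with_checkpoints(records, "event_time", 2))
states = [payload for kind, payload in messages if kind == "STATE"]
print(states)
```

A small interval checkpoints often at the cost of more state messages; a large one, such as the 1000 chosen for Events, emits fewer messages but may re-read more records after a failure.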

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        response_data = response.json().get("data", [])
        if response_data:
            series = response_data["series"][0]
Contributor

Why are we taking the first element of the series list? Can you add a comment explaining it? I don't think the API docs make it obvious that there will be only one record: https://developers.amplitude.com/docs/dashboard-rest-api#average-session-length

Contributor Author

Added comment
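To make the `series[0]` question concrete, here is a sketch of the parsing step against an assumed response shape in which `data.series` is a list holding one list of values aligned with the dates in `data.xValues`. That shape, the output field names `date` and `length`, and the function name are assumptions for illustration; whether `series` can ever hold more than one entry is what the requested comment should document.

```python
from typing import Any, Dict, Iterator, Mapping


def parse_average_session_length(payload: Mapping[str, Any]) -> Iterator[Dict[str, Any]]:
    """Hypothetical parser: pair each date in xValues with the value at the same index in series[0]."""
    data = payload.get("data", {})
    if not data:
        return
    # Assumption: a single series is returned, so only the first entry is read.
    series = data["series"][0]
    for date, value in zip(data["xValues"], series):
        yield {"date": date, "length": value}


sample = {"data": {"series": [[120.5, 98.0]], "xValues": ["2021-06-01", "2021-06-02"]}}
print(list(parse_average_session_length(sample)))
```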

ykurochkin added 2 commits June 4, 2021 16:41
…/new-amplitude-connector

Conflicts:
	airbyte-integrations/builds.md
	docs/SUMMARY.md
@yevhenii-ldv
Contributor Author

yevhenii-ldv commented Jun 4, 2021

/test connector=source-amplitude

🕑 source-amplitude https://github.com/airbytehq/airbyte/actions/runs/906457173
✅ source-amplitude https://github.com/airbytehq/airbyte/actions/runs/906457173

@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Jun 8, 2021
@yevhenii-ldv
Contributor Author

yevhenii-ldv commented Jun 8, 2021

/publish connector=connectors/source-amplitude

🕑 connectors/source-amplitude https://github.com/airbytehq/airbyte/actions/runs/917468801
✅ connectors/source-amplitude https://github.com/airbytehq/airbyte/actions/runs/917468801

@yevhenii-ldv
Contributor Author

yevhenii-ldv commented Jun 8, 2021

/test connector=source-amplitude

🕑 source-amplitude https://github.com/airbytehq/airbyte/actions/runs/917487410
✅ source-amplitude https://github.com/airbytehq/airbyte/actions/runs/917487410

@yevhenii-ldv yevhenii-ldv merged commit a90e5f0 into master Jun 8, 2021
@yevhenii-ldv yevhenii-ldv deleted the ykurochkin/new-amplitude-connector branch June 8, 2021 07:36
Successfully merging this pull request may close these issues.

New Source: Amplitude
5 participants