Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 New Source: Sentry #6975

Merged
merged 23 commits into from
Nov 2, 2021
Merged

🎉 New Source: Sentry #6975

merged 23 commits into from
Nov 2, 2021

Conversation

koji-m
Copy link
Contributor

@koji-m koji-m commented Oct 12, 2021

What

How

Added support for the following streams:

  • Events
  • Issues
  • ProjectDetail (for check_connection only)

Recommended reading order

  1. bootstrap.md
  2. spec.json
  3. Schemas under schemas/*.json
  4. source.py
  5. streams.py

Test Runs

Integration Test

screenshot 2021-10-12 23 33 24

Unit Test

screenshot 2021-10-12 21 46 03

Pre-merge Checklist

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/SUMMARY.md
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions
  • Connector added to connector index like described here

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here

@CLAassistant
Copy link

CLAassistant commented Oct 12, 2021

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Oct 12, 2021
@harshithmullapudi
Copy link
Contributor

@koji-m I will take a look at it tomorrow.

"organization": config["organization"],
"project": config["project"],
}
return [Events(**stream_args), Issues(**stream_args)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProjectDetail is not added here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added ProjectDetail and Projects.

Comment on lines 22 to 28
project_detail_stream = ProjectDetail(
authenticator=TokenAuthenticator(token=config["auth_token"]),
hostname=config.get("hostname", self.DEFAULT_HOST),
organization=config["organization"],
project=config["project"],
)
next(project_detail_stream.read_records(sync_mode=SyncMode.full_refresh))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any simple way to test if the token is valid ? Which is more generic ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I added Projects stream for endpoint /api/0/projects/ and changed to use Projects stream to test if the token is valid.

Comment on lines 28 to 31
if "next" in response.links and "results" in response.links["next"] and "cursor" in response.links["next"]:
if response.links["next"]["results"] == "true":
return {"cursor": response.links["next"]["cursor"]}
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use try catch here and comment here on what can go wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed to handle errors with try-catch, and also added comment.



class Events(SentryStream):
primary_key = "id"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can keep this for SentryStream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved primary_key variable to SentryStream class.

@harshithmullapudi
Copy link
Contributor

Also can you run ./gradlew format

@koji-m
Copy link
Contributor Author

koji-m commented Oct 15, 2021

I've ran ./gradlew format.

screenshot 2021-10-16 1 26 10

Comment on lines 240 to 249
"required": [
"avatar",
"dateCreated",
"id",
"isEarlyAdopter",
"name",
"require2FA",
"slug",
"status"
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove these required keys

Comment on lines 114 to 132
"required": [
"authors",
"commitCount",
"data",
"dateCreated",
"dateReleased",
"deployCount",
"firstEvent",
"lastCommit",
"lastDeploy",
"lastEvent",
"newGroups",
"owner",
"projects",
"ref",
"shortVersion",
"url",
"version"
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove these required keys

Comment on lines 4 to 45
"required": [
"allowedDomains",
"avatar",
"color",
"dataScrubber",
"dataScrubberDefaults",
"dateCreated",
"defaultEnvironment",
"digestsMaxDelay",
"digestsMinDelay",
"features",
"firstEvent",
"hasAccess",
"id",
"isBookmarked",
"isInternal",
"isMember",
"isPublic",
"latestRelease",
"name",
"options",
"organization",
"platform",
"platforms",
"processingIssues",
"relayPiiConfig",
"resolveAge",
"safeFields",
"scrapeJavaScript",
"scrubIPAddresses",
"securityToken",
"securityTokenHeader",
"sensitiveFields",
"slug",
"status",
"storeCrashReports",
"subjectPrefix",
"subjectTemplate",
"team",
"teams",
"verifySSL"
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove these required keys. Same across the schemas

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed required-key fields from all of the schemas.


# Source
class SourceSentry(AbstractSource):
DEFAULT_HOST = "sentry.io"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed here since we have already used default in the config? can't we directly get this from config?

Copy link
Contributor Author

@koji-m koji-m Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to set default value in code. I've fixed.

Comment on lines 36 to 37
except KeyError:
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any change with this we enter a loop because is not possible to update the cursor? If it's expected to always have the values raise an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't update the cursor parameter in the request_params method, so I fixed it to do so.
And as it's expected to always have the results and cursor values in link header, I fixed to raise error if there is no values for that.

Comment on lines 51 to 52
class Events(SentryStream):
def __init__(self, organization: str, project: str, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add docs to each endpoint, see example:

class CustomersCart(IncrementalCartStream):
"""
Docs: https://developers.cart.com/docs/rest-api/restapi.json/paths/~1customers/get
"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added docs to each endpoint.

Copy link
Member

@marcosmarxm marcosmarxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments, but overall great work! @harshithmullapudi execute tests and makes sure it's passing CI (format and tests)

@avida
Copy link
Contributor

avida commented Oct 29, 2021

would lean towards simplification here and remove the need for the xWithHash() classes entirely by refactoring serialize() like so:

def serialize(value) -> str: 
     """
     Serialize python object composed of primitive types.
     The resultant serialized string is order agnostic, such that the ordering of dicts and lists at any level of nesting doesn't affect the return value as long as data content is equal.
     """ 
     if isinstance(value, Mapping): 
         return json.dumps({k: serialize(v) for k, v in value.items()}, sort_keys=True)
     if isinstance(value, List): 
         return json.dumps(sorted([serialize(v) for v in value]))
     return str(value)

That's how it was before Ive replaced it with those xWithHash classes to correctly log records with detailed_logger fixture. Changing it back would break

Do we rely on differentiation in order of lists at all that this would break?

No, different order records should not fail test. We had such defect in SAT #5735 but it was fixed .

If the order in the list of response makes sense, my change suggestion is invalid. If so, I have to think of other ways

No, it should work with any items order.

@koji-m you can find acceptance_tests_logs directory after SAT run with detaild log on each case. For test_sequential_reads you gonna need add those lines to print records from first and second run: https://github.com/airbytehq/airbyte/pull/7440/files#diff-9578c930b7e368f14eaeccbae67aa1808f87f2ea4d1b1d8202694fd6cda1348dR25

Dont forget to rebuild SAT image or just run SAT with command line: python -m pytest -p source_acceptance_test.plugin

Could you try it out and share detailed logs with failed case?

@avida
Copy link
Contributor

avida commented Oct 29, 2021

@koji-m Created an issue #7487 will work on it

@Phlair
Copy link
Contributor

Phlair commented Oct 29, 2021

That's how it was before Ive replaced it with those xWithHash classes to correctly log records with detailed_logger fixture. Changing it back would break

@avida why would this break logging records? I must be missing some understanding there (pointing me at relevant PR/code might be easiest answer 😄)

@avida
Copy link
Contributor

avida commented Oct 29, 2021

@avida why would this break logging records? I must be missing some understanding there (pointing me at relevant PR/code might be easiest answer smile)

@Phlair there is function in detailed_logger that dumps well formatted json into log:

logger.info(json.dumps(list(l), indent=1))

serialized raw string would be a pain to read or parse with diff tools. Btw Ive created PR #7491 where I replaced serialize function (cause we actually dont need to serialize objects, just to compare) . Please check it out.

@koji-m
Copy link
Contributor Author

koji-m commented Oct 29, 2021

@avida Thank you for your advice and how to isolate the problem. I tried to confirm difference between first and second read. But response data is so large to compare, so before that I tried small test as follow:

import json
import functools
from typing import List, Mapping

@functools.total_ordering
class DictWithHash(dict):

    _hash: str = None

    def __hash__(self):
        if not self._hash:
            self._hash = hash(json.dumps({k: serialize(v) for k, v in self.items()}, sort_keys=True))
        return self._hash

    def __lt__(self, other):
        return hash(self) < hash(other)

    def __eq__(self, other):
        return hash(self) == hash(other)

def serialize(value) -> str:
    """Simplify comparison of nested dicts/lists"""
    if isinstance(value, Mapping):
        return DictWithHash(value)
    if isinstance(value, List):
        return sorted([serialize(v) for v in value])
    return str(value)


first = {
    "organization": {
        "features": [
              "performance-tag-page",
              "issue-percent-filters",
        ]
   }
}

second = {
    "organization": {
        "features": [
             "issue-percent-filters",
             "performance-tag-page",
         ]
   }
}

print(f'first {hash(serialize(first))}')
print(f'second {hash(serialize(second))}')

This show different hash value as follow:

first 2562171057633847383
second 4625636057198759702

I think the two values ​​must be the same. There may be some mistakes in the content being verified, but I would be grateful if you could confirm it.

@avida
Copy link
Contributor

avida commented Oct 29, 2021

@koji-m Yes, it should be the same. Please check out my PR #7491, Ive added your case to unittests

@koji-m
Copy link
Contributor Author

koji-m commented Oct 29, 2021

I checked the test case, it looks good to me.

@keu
Copy link
Contributor

keu commented Oct 29, 2021

ok, to add to everything above, the original serialize didn't have this issue and was using simple technic that most of pytest plugin use to compare, serialize and pprint will show much better diff than raw dicts. @avida added file logger that supposed to be used manually for detailed investigation, that is why objects there is not serialized.

@harshithmullapudi
Copy link
Contributor

@koji-m is this resolved and are test cases passing in local ?

@koji-m
Copy link
Contributor Author

koji-m commented Nov 1, 2021

Not yet resolved, but if #7491 is merged, I would assume the test case will pass. I'll let you know when the test is complete.

@avida
Copy link
Contributor

avida commented Nov 1, 2021

@koji-m #7491 merged, please check it out

@koji-m
Copy link
Contributor Author

koji-m commented Nov 1, 2021

Thank you @avida .
@harshithmullapudi I merged master and passed source-acceptance-test in local. Would you check it out again?

@karinakuz
Copy link
Contributor

@harshithmullapudi harshithmullapudi merged commit c689594 into airbytehq:master Nov 2, 2021
@unity
Copy link

unity commented Nov 2, 2021

OneSignal is now a duplicate source, prevents usage.

- sourceDefinitionId: bb6afd81-87d5-47e3-97c4-e2c2901b1cf8

airbyte-server      | 2021-11-02 15:10:39 ERROR i.a.s.e.UncaughtExceptionMapper(toResponse):22 - {workspace_app_root=/tmp/workspace/server/logs} - Uncaught exception
airbyte-server      | java.lang.IllegalArgumentException: Multiple records have the name: OneSignal

@koji-m
Copy link
Contributor Author

koji-m commented Nov 2, 2021

Sorry for the omission in the confirmation when merging master into this PR. I'll thoroughly check for conflicts.
#7565

@koji-m koji-m deleted the 5692-new-source-sentry branch November 3, 2021 03:24
lmossman pushed a commit that referenced this pull request Nov 3, 2021
* add Events, Issues, ProjectDetail stream

* add P/R number

* add SUMMARY entry

* add docs/integrations/README.md entry

* add source_definitions.yaml entry

* add connector JSON definition

* add builds.md entry

* SentryStream keeps primary_key

* add Projects stream

* change stream for connection checking

* handling errors with try-catch in next_page_token function

* remove required key field from schemas

* remove DEFAULT_HOST

* raise error if link header don't have mandatory field

* fix unit test for streams

* update cursor for pagination

* add docs to each endpoint

* add hostname property to invalid_config

* fix schema

* add hostname to sample_config
schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
* add Events, Issues, ProjectDetail stream

* add P/R number

* add SUMMARY entry

* add docs/integrations/README.md entry

* add source_definitions.yaml entry

* add connector JSON definition

* add builds.md entry

* SentryStream keeps primary_key

* add Projects stream

* change stream for connection checking

* handling errors with try-catch in next_page_token function

* remove required key field from schemas

* remove DEFAULT_HOST

* raise error if link header don't have mandatory field

* fix unit test for streams

* update cursor for pagination

* add docs to each endpoint

* add hostname property to invalid_config

* fix schema

* add hostname to sample_config
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation bounty community connectors/source/sentry
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

New Source: Sentry