Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Framework v0.1 #2634

Merged
merged 5 commits into from
Apr 7, 2021
Merged

Framework v0.1 #2634

merged 5 commits into from
Apr 7, 2021

Conversation

sherifnada
Copy link
Contributor

@sherifnada sherifnada commented Mar 29, 2021

closes #2647
closes #2651

Alpha version of the source framework.

Main additions:

Current features of HTTP stream:

  • auth:
    • API-token based auth
  • two kinds of out-of-the-box rate limiting: static and dynamic (determined at runtime potentially by inspecting response)
  • support for generating N requests per stream e.g: for making one request per day like in exchangeratesapi
  • support for arbitrary nesting of child streams. Just pass child=<stream> and everything (mostly) wires up correctly.
  • ability to customize all request parameters based on context (parent streams, current request context, etc..)

Reading Order

  1. source_stripe/source.py for an example source (this could replace the singer Stripe source)
  2. abstract_source.py for how things all come together
  3. streams/core.py for the core stream interface
  4. streams/http.py
  5. auth/core.py and auth/simple.py for a basic idea of the oauth stuff
  6. streams/rate_limiting.py
  7. everything else

@sherifnada sherifnada marked this pull request as draft March 29, 2021 07:55
@sherifnada sherifnada changed the base branch from master to mt-integration-fmk March 29, 2021 07:55
@sherifnada sherifnada changed the title wip sdk wip santa Mar 31, 2021
@sherifnada sherifnada changed the title wip santa Framework v0.1 Mar 31, 2021
@sherifnada sherifnada marked this pull request as ready for review April 1, 2021 00:15
@auto-assign auto-assign bot requested review from cgardens and davinchia April 1, 2021 00:15
Copy link
Contributor

@jrhizor jrhizor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

State handling is a bit confusing
The responsibilities of each method for updating/outputting doesn't feel obvious while reading through the code.

Typing
There are a lot of Dicts everywhere and it makes it hard to read and very comment-dependent to understand the code. I feel like a lot of this could be improved by aliasing (configs, different JSON-style dicts, etc).

Performance
How does the performance of this compare to the singer Stripe source? I'd like to make sure the out-of-the-box behavior is comparable.

airbyte-integrations/connectors/source-oracle/README.md Outdated Show resolved Hide resolved
:param response:
:return: An iterable containing the parsed response
"""
yield [response]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels strange to need to override these different methods when really we're trying to go from requests.Response -> Iterable[Dict].

Maybe we could use types to show how these are intended to chain together?

"""
return response.status_code == 429 or 500 <= response.status_code < 600

def backoff_time(self, response: requests.Response) -> Optional[float]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should specify behavior for a backoff instead of just allowing time? Like attempt number, response, and then allow calculating time? Then someone could do exp. backoff scaled their own way for example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it feels like this should be a Duration equivalent or milliseconds instead of seconds-level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in person. Sounds like this will move towards custom decorators/handlers instead of these function overrides.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw this might be a lot cleaner if we can get this merged: litl/backoff#122

def get_updated_state(self, current_state: Mapping[str, Any], latest_record: Mapping[str, Any]):
"""
Inspect the latest record extracted from the data source and the current state object and return an updated state object.
It is safe to mutate the input state object and return it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you want to allow that? We shouldn't encourage that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

especially taking into account that Mapping is immutable (for modification there is MutableMapping)
also return type is missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the alternative is to copy the input state every time (or to create a new state object in this method) which seems not great for performance?

Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool! added a few thoughts. i'm going to just leave a comment since you are not lacking for reviewers on this one!


class IncrementalStream(Stream, ABC):
@property
def cursor_field(self) -> Union[str, List[str]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

@property
def cursor_field(self) -> Union[str, List[str]]:
"""
Override to return the name of the default cursor field used by this stream e.g: an API entity might always use created_at as the cursor field.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this method be called default_cursor_field? it sounds like this is only populated if the source is setting the cursor field if i'm reading the comment correctly. or maybe call this method source_defined_cursor and then the method be low is just has_source_defined_cursor and default impl can just check if this method returns null / empty array

Copy link
Contributor

@keu keu Apr 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is either cursor_field or not, what makes it default_cursor_field? If there is default_cursor_field then there should be actual/current_cursor_field ?
also what is the case when we need additional property source_defined_cursor?
isn't cursor_field self sufficient? i.e. it is either null/[] or str/[str]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cursor_field in the general case could be chosen by a user. For example when replicating data from a SQL database, you could choose to replicate based on created_at or updated_at. In this case, because the cursor is configurable by the user, source_defined_cursor would return False even though the cursor field is defined.

Copy link
Contributor

@keu keu Apr 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think base abstraction should include this logic of setting custom cursor field
  2. I think the name source_defined_cursor could be confusing, maybe something like cursor_field_overriden or has_custom_cursor_field?
  3. If we have cursor_field and default_cursor_field, wouldn't source_defined_cursor implementation just compare if one is equal to another? Also I still don't see the case when the stream might need this information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Added this
  2. Agreed with you. Unfortunately currently this is at the protocol level so changing this is a heavy lift because it would change all connectors, the UI, and the core API. To make this better I defaulted all HTTP streams to True on this field so the user doesn't need to think about this (most HTTP sources have a static cursor field anyways unlike e.g: databases)

@property
def cursor_field(self) -> Union[str, List[str]]:
"""
Override to return the name of the default cursor field used by this stream e.g: an API entity might always use created_at as the cursor field.
Copy link
Contributor

@keu keu Apr 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is either cursor_field or not, what makes it default_cursor_field? If there is default_cursor_field then there should be actual/current_cursor_field ?
also what is the case when we need additional property source_defined_cursor?
isn't cursor_field self sufficient? i.e. it is either null/[] or str/[str]

@sherifnada sherifnada changed the base branch from mt-integration-fmk to master April 7, 2021 07:09
@sherifnada sherifnada merged commit 94093f0 into master Apr 7, 2021
@sherifnada sherifnada deleted the sherif/fork-magibyte branch April 7, 2021 07:21
@michel-tricot
Copy link
Contributor

🥇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

santa: correctly populate catalog for incremental streams santa: HTTP Abstraction stress test
6 participants