fetch specs from definitions directly #7293

Merged
lmossman merged 15 commits into master from lmossman/fetch-specs-from-definitions
Oct 25, 2021

@lmossman lmossman commented Oct 22, 2021

What

Resolves #7136, though it takes a slightly different approach from the one described in the issue, as explained below.

Refactors the SpecFetcher to take in Source and Destination Definitions and attempt to fetch the spec from those structs directly, before falling back to the scheduler job client.

How

Source and Destination definitions are almost always retrieved from the database before the spec fetcher is called, so this change takes advantage of that behavior by allowing those definitions to be passed into the spec fetcher, where it attempts to read the spec from them directly.

Another advantage of this is that the logic for resolving the docker image name from a source/destination definition is now centralized in the spec fetcher and is only performed when the definition doesn't contain a spec, reducing some repeated code.
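
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Spec`, `Definition`, `SchedulerClient`, and `imageName` are hypothetical stand-ins, and the `repository:tag` image name format is an assumption.

```java
import java.util.Optional;

// Hypothetical, simplified stand-ins; none of these are the real Airbyte classes.
final class SpecFetcherSketch {

  /** Stand-in for ConnectorSpecification. */
  static final class Spec {
    final String description;

    Spec(final String description) {
      this.description = description;
    }
  }

  /** Stand-in for a source or destination definition; spec may be null. */
  static final class Definition {
    final String dockerRepository;
    final String dockerImageTag;
    final Spec spec;

    Definition(final String dockerRepository, final String dockerImageTag, final Spec spec) {
      this.dockerRepository = dockerRepository;
      this.dockerImageTag = dockerImageTag;
      this.spec = spec;
    }
  }

  /** Stand-in for the scheduler job client, which would start a container to read the spec. */
  interface SchedulerClient {
    Spec fetchSpec(String dockerImage);
  }

  private final SchedulerClient schedulerClient;

  SpecFetcherSketch(final SchedulerClient schedulerClient) {
    this.schedulerClient = schedulerClient;
  }

  Spec getSpec(final Definition definition) {
    // Prefer the spec already stored on the definition.
    return Optional.ofNullable(definition.spec)
        // Only resolve the image name and call the client when no spec is stored.
        .orElseGet(() -> schedulerClient.fetchSpec(imageName(definition)));
  }

  // Assumed "repository:tag" format, resolved in one place only.
  static String imageName(final Definition definition) {
    return definition.dockerRepository + ":" + definition.dockerImageTag;
  }

  public static void main(final String[] args) {
    final SpecFetcherSketch fetcher =
        new SpecFetcherSketch(image -> new Spec("fetched by running " + image));
    final Definition withSpec = new Definition("example/source", "0.1.0", new Spec("stored on definition"));
    final Definition withoutSpec = new Definition("example/source", "0.1.0", null);
    System.out.println(fetcher.getSpec(withSpec).description);
    System.out.println(fetcher.getSpec(withoutSpec).description);
  }
}
```

Because the image name is only built inside the fallback branch, callers that already hold a definition with a spec never touch the scheduler client.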

Recommended reading order

  1. SpecFetcher.java
  2. Other non-test classes, which use the SpecFetcher
  3. Test classes

@lmossman lmossman changed the title from "Lmossman/fetch specs from definitions" to "fetch specs from definitions directly" Oct 22, 2021
@lmossman
Contributor Author

I'm not sure why the Build steps are failing; both the Connectors Base and Platform builds finish successfully for me locally.

Looks like the Platform build failure is related to this:

 gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.7m -c regex_3/_regex.c -o build/temp.linux-x86_64-3.7/regex_3/_regex.o
    unable to execute 'gcc': No such file or directory
    error: command 'gcc' failed with exit status 1

which looks like it is also occurring in this PR: #7288 (comment)

Contributor

@benmoriceau benmoriceau left a comment

Looks good overall, but I lack the context and knowledge of this app to feel confident approving it.

// and
// specific to a job id, allowing us to do this.
// attempts ids are monotonically increasing starting from
// 0 and specific to a job id, allowing us to do this.
Contributor

Nit: merge the 2 comment lines.

final SynchronousResponse<ConnectorSpecification> response = specFetcher.getSpecJobResponse(sourceDefinition);

assertEquals(ConfigType.GET_SPEC, response.getMetadata().getConfigType());
assertEquals(Optional.empty(), response.getMetadata().getConfigId());
Contributor

It is really up to you, especially on an existing test, and I don't want to sound too pushy about it. You could use AssertJ here: Assertions.assertThat(response.getMetadata().getConfigId()).isEmpty()

Contributor Author

@lmossman lmossman Oct 25, 2021

@benmoriceau Does using assertJ here result in any meaningful benefit? I'm a little bit hesitant to change this test to use assertJ because the jupiter assertions are used across all of our other tests, and it seems slightly worse to me to use 2 different assertion libraries across the codebase.

I think if we make a decision as an org to switch to using assertJ going forward, then I would be happy to make this change. Maybe we can try to discuss it during a company meeting?

Contributor

There is a clear benefit when testing exceptions. It also provides better output and a better interface for built-in types such as Optional or collections. This article shows some examples of the benefits: https://annaduldiier.medium.com/assertj-vs-junit-483b7d6dc997 even if some parts of the article are debatable (like the auto-completion).

I don't think that having 2 assertion frameworks is an issue, especially since in this case the static methods are different, so there is a very small probability of mistaking one for the other.

Contributor

@cgardens cgardens Oct 25, 2021

@benmoriceau this seems worth bringing up with the team in the weekly platform sync! help familiarize people with it and get some buy in!

now,
now,
true,
null);
Contributor

Nit: could you add named variables for the null and true values? It makes it easier to review.

Contributor Author

Added these named variables to the new static method that I added to the SynchronousJobMetadata class for generating this mocked object 👍
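
As a rough illustration of the named-variable suggestion, a static factory for placeholder job metadata might look like the sketch below. `JobMetadataSketch`, its fields, and `mock` are hypothetical names chosen for this example; they are not the actual `SynchronousJobMetadata` API.

```java
import java.nio.file.Path;
import java.time.Instant;
import java.util.Optional;
import java.util.UUID;

// Hypothetical stand-in for job metadata; field names are assumptions.
final class JobMetadataSketch {

  final UUID id;
  final String configType;
  final Optional<UUID> configId;
  final long createdAt;
  final long endedAt;
  final boolean succeeded;
  final Path logPath;

  JobMetadataSketch(final UUID id,
                    final String configType,
                    final Optional<UUID> configId,
                    final long createdAt,
                    final long endedAt,
                    final boolean succeeded,
                    final Path logPath) {
    this.id = id;
    this.configType = configType;
    this.configId = configId;
    this.createdAt = createdAt;
    this.endedAt = endedAt;
    this.succeeded = succeeded;
    this.logPath = logPath;
  }

  /** Builds placeholder metadata for a result that did not come from a real job. */
  static JobMetadataSketch mock(final String configType) {
    final long now = Instant.now().toEpochMilli();
    // Named locals document what each otherwise-opaque literal means at the call site.
    final UUID configId = null; // no stored config is tied to this result
    final boolean succeeded = true; // nothing ran, so nothing could fail
    final Path logPath = null; // no container ran, so there is no log file
    return new JobMetadataSketch(
        UUID.randomUUID(), configType, Optional.ofNullable(configId), now, now, succeeded, logPath);
  }

  public static void main(final String[] args) {
    final JobMetadataSketch metadata = mock("GET_SPEC");
    System.out.println(metadata.configType + " succeeded=" + metadata.succeeded);
  }
}
```

Keeping the literals in one shared factory also means tests and production code build the placeholder the same way.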

Contributor

@cgardens cgardens left a comment

Looks great! I have some comments below. lmk if you want to follow up on anything. If it all makes sense feel free to merge it after you've addressed them all. A very good way to end your first week!

return getSpecFromJob(getSpecJobResponse(destinationDefinition));
}

public SynchronousResponse<ConnectorSpecification> getSpecJobResponse(final StandardSourceDefinition sourceDefinition) throws IOException {
Contributor

how annoying! I see that usage in SchedulerHandler that is causing this. can you add a todo that says when we have moved the spec into the db as a required field we should get rid of the need for this?

in case it's not clear, for all jobs that require spinning up a docker container, we use this SynchronousResponse struct to help pass through logs and metadata about what happened in the job. Once we can guarantee that for spec we will not be spinning up a docker container, we can just remove this part from the code path.

Contributor Author

As you pointed out, the main reason I added this was because the SourceDefinitionSpecificationRead struct returned by the SchedulerHandler method (which is in turn returned by the ConfigurationApi) contains a required jobInfo field, so the method I added mocked out that job info when just getting the spec from the db.

Therefore it seems to me that as part of the change to make the spec field required/guaranteed on the db struct, we should also remove the jobInfo field from the SourceDefinitionSpecificationRead struct. Does this align with what you are suggesting here?

Contributor

i actually think the code you've written here is good. i'm not suggesting we change anything. just want to add a comment saying that we want to get rid of this part of the public interface in the future.

you're right, we will want to remove jobinfo from SourceDefinitionSpecificationRead, but we should wait to do it until we have the guarantee that we will only ever fetch the spec from the db. the reality is that while it is still possible to pull specs from the docker image, it is still helpful to keep the jobinfo because it includes logs to give clues as to what the problems are!

Contributor Author

Completely agree! I meant that we should only remove jobInfo once we have that guarantee, and I agree that it makes sense to keep it for now 👍
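
To make the trade-off in this thread concrete, here is a small sketch of a response wrapper that carries logs on the container path and a placeholder on the stored-spec path. `SynchronousResponseSketch`, `fromJob`, and `fromStoredValue` are hypothetical names for illustration only, not the real `SynchronousResponse` class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a response that pairs an output with job information.
final class SynchronousResponseSketch<T> {

  final T output;
  final boolean ranContainer;
  final List<String> logLines;

  private SynchronousResponseSketch(final T output, final boolean ranContainer, final List<String> logLines) {
    this.output = output;
    this.ranContainer = ranContainer;
    this.logLines = logLines;
  }

  /** Container path: a job ran, so its logs are kept to help diagnose problems. */
  static <T> SynchronousResponseSketch<T> fromJob(final T output, final List<String> logLines) {
    return new SynchronousResponseSketch<>(
        output, true, Collections.unmodifiableList(new ArrayList<>(logLines)));
  }

  /** Stored path: nothing ran, so the job information is a placeholder with no logs. */
  static <T> SynchronousResponseSketch<T> fromStoredValue(final T output) {
    return new SynchronousResponseSketch<>(output, false, Collections.<String>emptyList());
  }

  public static void main(final String[] args) {
    final List<String> logs = new ArrayList<>();
    logs.add("pulling image");
    logs.add("spec written");
    System.out.println(fromJob("spec", logs).logLines.size());
    System.out.println(fromStoredValue("spec").logLines.size());
  }
}
```

While both paths exist, callers get the same shape either way, which is why the job information only becomes removable once the container path is gone.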

private final SynchronousSchedulerClient schedulerJobClient;

public SpecFetcher(final SynchronousSchedulerClient schedulerJobClient) {
this.schedulerJobClient = schedulerJobClient;
}

public ConnectorSpecification execute(final String dockerImage) throws IOException {
public ConnectorSpecification getSpec(final String dockerImage) throws IOException {
Contributor

it looks like the only reason we need to keep this method is 2 usages.

  1. ServerApp.java line 222, which we know we are going to get rid of when we get rid of the file migrations
  2. DockerImageValidator - left a comment below for how we might be able to remove this usage.

can we add a todo, reminding us that we want to kill this method?

Contributor

the reason i'm so focused on this method is because it has a different behavior that is non-obvious from the caller's point of view. the other methods all will try to find the spec in the definition itself. this one will never do that and will always start calling scheduler clients. so i want to advertise loudly that it is different and shouldn't be used. in fact, we should mark it as deprecated.
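
One way to advertise that difference, sketched under assumptions: annotate the image-name overload with `@Deprecated` and explain the behavior in its Javadoc. The class and the `SchedulerClient` interface here are hypothetical stand-ins, not the code that was merged.

```java
// Hypothetical stand-in showing how the image-name overload could be flagged.
final class DeprecatedGetSpecSketch {

  /** Stand-in for the scheduler job client. */
  interface SchedulerClient {
    String fetchSpec(String dockerImage);
  }

  private final SchedulerClient schedulerClient;

  DeprecatedGetSpecSketch(final SchedulerClient schedulerClient) {
    this.schedulerClient = schedulerClient;
  }

  /**
   * Fetches the spec by docker image name.
   *
   * @deprecated unlike the definition-based lookups, this never reads a stored spec and
   *             always calls the scheduler client.
   */
  // TODO: remove this method once its remaining callers are gone.
  @Deprecated
  String getSpec(final String dockerImage) {
    return schedulerClient.fetchSpec(dockerImage);
  }

  public static void main(final String[] args) {
    final DeprecatedGetSpecSketch fetcher =
        new DeprecatedGetSpecSketch(image -> "spec for " + image);
    System.out.println(fetcher.getSpec("example/source:0.1.0"));
  }
}
```

With the annotation in place, callers outside the class get a compiler warning, which makes the non-obvious behavior harder to miss.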

Contributor

cgardens commented Oct 23, 2021

Oh. as for the build, I'm a little unsure what's going on. It looks like it's failing on something related to python. I just merged a PR that should totally remove the platform build's need for python. So maybe try rebasing on master and seeing if that helps? I'd like to avoid python oddities if we can just make them irrelevant instead 😉

@cgardens cgardens added area/platform issues related to the platform and removed area/core labels Oct 23, 2021
@lmossman lmossman force-pushed the lmossman/fetch-specs-from-definitions branch from 57e5316 to b4438be Compare October 25, 2021 17:38
@lmossman lmossman temporarily deployed to more-secrets October 25, 2021 17:39 Inactive
@lmossman lmossman temporarily deployed to more-secrets October 25, 2021 17:58 Inactive
@lmossman lmossman temporarily deployed to more-secrets October 25, 2021 19:11 Inactive
@lmossman lmossman temporarily deployed to more-secrets October 25, 2021 19:48 Inactive
@lmossman lmossman temporarily deployed to more-secrets October 25, 2021 22:02 Inactive
Contributor

@cgardens cgardens left a comment

lgtm!

@lmossman lmossman temporarily deployed to more-secrets October 25, 2021 22:35 Inactive
@lmossman lmossman merged commit 5d2b5dc into master Oct 25, 2021
@lmossman lmossman deleted the lmossman/fetch-specs-from-definitions branch October 25, 2021 23:33
schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
* try fetching specs from definitions first

* refactor specFetcher and update tests

* run gradle format

* format again

* fix comment formatting

* fix test

* merge comment lines into single line

* move duplicate job metadata mocking logic to shared static method

* add todo

* formatting

* use local var and clone

* run gw format

* add todo

* skip spec fetcher in docker image validator and update todos

* run gw format
Development

Successfully merging this pull request may close these issues.

Add a new SynchronousSchedulerClient that pulls the spec from the db
3 participants