
Lightweight export/import of configurations scoped by workspace Id #5598

Merged: 11 commits from chris/scope-export-import into master on Aug 30, 2021

Conversation

ChristopheDuong
Contributor

@ChristopheDuong ChristopheDuong commented Aug 24, 2021

What

Closes #5124

How


  • Export only the configurations related to a workspace id from the config database (not the job database).
  • Import configurations into a workspace, tweaking configuration ids as needed when there are conflicts. This step is done in two API calls, going through a staging resource file first before actually importing.
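The conflict handling described in the second bullet can be sketched as follows. This is an illustrative sketch only; the class and method names are stand-ins, not the actual ConfigDumpImporter API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Illustrative sketch: when an imported configuration id collides with an id
// already present in the destination workspace, mint a new id and remember the
// mapping, so repeated references to the same id are rewritten consistently.
public class IdRemapper {

  private final Map<UUID, UUID> remapped = new HashMap<>();

  // existingIds: ids already taken in the destination workspace
  public UUID resolve(final UUID importedId, final Set<UUID> existingIds) {
    if (!existingIds.contains(importedId)) {
      return importedId; // no conflict, keep the original id
    }
    // reuse the same replacement for repeated references to the same id
    return remapped.computeIfAbsent(importedId, id -> UUID.randomUUID());
  }
}
```

The important property is that the remapping is stable: every configuration that referenced the old id ends up pointing at the same new id.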

Recommended reading order

  1. airbyte-api/src/main/openapi/config.yaml
  2. airbyte-server/src/main/java/io/airbyte/server/ConfigDumpExporter.java
  3. airbyte-server/src/main/java/io/airbyte/server/ConfigDumpImporter.java

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g. a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors, run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • docs/SUMMARY.md
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions
  • Connector added to connector index as described here

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to GitHub CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g. a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors, run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions
  • Connector version bumped as described here

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to GitHub CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here

Connector Generator

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed.

@github-actions github-actions bot added area/api Related to the api area/platform issues related to the platform area/documentation Improvements or additions to documentation labels Aug 24, 2021
Contributor

@sherifnada sherifnada left a comment


Not having tests for these new methods/classes makes me super nervous. Could we add those?

airbyte-api/src/main/openapi/config.yaml (resolved)
.collect(Collectors.toList());
writeConfigsToArchive(parentFolder, ConfigSchema.SOURCE_CONNECTION.name(), sourceConnections.stream().map(Jsons::jsonNode));

final Map<UUID, StandardSourceDefinition> sourceDefinitionMap = new HashMap<>();
Contributor

This method is formidable. Can we break it up a little? Are there any DRY opportunities here?

Contributor Author

Yes, I just got the basic functionality working; I'll refactor parts and add tests now.

return dump;
}

private void exportConfigsDatabase(Path parentFolder, UUID workspaceId) throws IOException, JsonValidationException, ConfigNotFoundException {
Contributor

At a high level, it feels like there should be an export(List<UUID> workspaceIds) instead of having two different implementations, one scoped by workspace. Would that make sense here, or am I missing something?

Contributor Author

This export function scoped by workspace id exports the configurations related to a workspace and then "cascades" down to the other configurations that they refer to or depend on.

The second export, without a scope, dumps everything regardless of links/relations between configurations and thus covers more configurations than this export(UUID) or an export(List<UUID>) function would.
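The "cascade" described above can be sketched as a reachability traversal from the workspace id to every configuration it references, directly or transitively. The in-memory dependency map below is a stand-in for the real config database queries, used only to illustrate the idea.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of a scoped export: starting from one workspace id,
// collect every configuration reachable through references, rather than
// dumping the whole config database.
public class ScopedExport {

  // dependencies: config id -> ids of the configs it refers to / depends on
  public static Set<String> collect(final String workspaceId,
                                    final Map<String, List<String>> dependencies) {
    final Set<String> toExport = new LinkedHashSet<>();
    final Deque<String> frontier = new ArrayDeque<>();
    frontier.push(workspaceId);
    while (!frontier.isEmpty()) {
      final String id = frontier.pop();
      if (toExport.add(id)) {
        for (final String dep : dependencies.getOrDefault(id, List.of())) {
          frontier.push(dep); // cascade down to referenced configurations
        }
      }
    }
    return toExport;
  }
}
```

A full dump, by contrast, would simply return every id in the database, including configurations unreachable from any workspace.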

final Stream<T> configs = readConfigsFromArchive(sourceRoot, configSchema);
if (!dryRun) {
switch (configSchema) {
case STANDARD_SOURCE_DEFINITION -> importStandardSourceDefinitionsIntoWorkspace(configs);
Contributor

How are we going to handle this on cloud now?

Contributor Author

I still need to introduce a wrapped version of this that changes the behavior in cloud.

Contributor Author

For the moment, I'm thinking about adding a different ConfigDumpImporter that ignores all standard definitions in the short term (since we don't allow custom connectors yet).
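A minimal sketch of that idea, assuming the importer can filter archive entries by their config type name before importing (the class name and string-based dispatch here are illustrative, not the actual ConfigDumpImporter internals):

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: a cloud-flavored importer could drop all
// STANDARD_*_DEFINITION entries before importing, since custom connector
// definitions are not allowed there yet; connections and other workspace
// configs pass through unchanged.
public class DefinitionFilteringImporter {

  public static List<String> filterConfigTypes(final List<String> configTypes) {
    return configTypes.stream()
        .filter(type -> !(type.startsWith("STANDARD_") && type.endsWith("_DEFINITION")))
        .collect(Collectors.toList());
  }
}
```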


// 1. Unzip source
Archives.extractArchive(archive.toPath(), sourceRoot);

// TODO: Auto-migrate archive?
Contributor

I have thought about the archive auto-migration feature. It will be a medium-size project. We need to write version-specific importers that can understand imports of different versions.

Issue here: #5682

// We sort the directories because we want to process SOURCE_CONNECTION after
// STANDARD_SOURCE_DEFINITION and DESTINATION_CONNECTION after STANDARD_DESTINATION_DEFINITION
// so that we can identify which definitions should not be upgraded to the latest version
directories.sort(Comparator.reverseOrder());
Contributor

Since we don't update the definition in this method, the sorting here seems unnecessary.

Contributor Author

We update definitions when it is a new definition to create.

This should be done before importing the connector, as the import checks that the definitions exist first.
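The ordering trick from the quoted snippet can be checked in isolation: sorting the archive directory names with Comparator.reverseOrder() happens to place each definition type before its connection type, because "STANDARD_" compares lexicographically greater than both "SOURCE_" and "DESTINATION_".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Demonstrates why reverse lexicographic order yields the intended processing
// order: STANDARD_SOURCE_DEFINITION sorts before SOURCE_CONNECTION, and
// STANDARD_DESTINATION_DEFINITION before DESTINATION_CONNECTION, so the
// importer sees definitions before the connections that reference them.
public class ImportOrder {

  public static List<String> sorted(final List<String> directories) {
    final List<String> copy = new ArrayList<>(directories);
    copy.sort(Comparator.reverseOrder());
    return copy;
  }
}
```

Note this relies on the current ConfigSchema naming; a renamed config type could silently break the ordering, which is an argument for an explicit ordering rather than a lexicographic one.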

@ChristopheDuong ChristopheDuong merged commit 314a747 into master Aug 30, 2021
@ChristopheDuong ChristopheDuong deleted the chris/scope-export-import branch August 30, 2021 09:00

Successfully merging this pull request may close these issues.

Scope import / export by workspace id
4 participants