Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement flyway migration for config database normalization + tests #8563

Merged
merged 11 commits into from
Dec 21, 2021

Conversation

subodh1810
Copy link
Contributor

Issue : #8438

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/SUMMARY.md
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Connector Generator

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed.

@subodh1810 subodh1810 self-assigned this Dec 6, 2021
@subodh1810 subodh1810 temporarily deployed to more-secrets December 6, 2021 23:08 Inactive
Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted a couple minor alterations to the schema, but once those are done, I think you are good to go! Very nice!

@@ -49,6 +49,10 @@
}
}

public static <T> T convertValue(final Object object, final Class<T> klass) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a javadoc comment explaining what context this should be used in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

"name" varchar(256) not null,
"configuration" jsonb not null,
"actor_type" actor_type not null,
"tombstone" bool not null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for all of the tombstones, can we set a default value of false here? when we create these objects the common case is that it is not tombstoned yet.

"name" varchar(256) not null,
"docker_repository" varchar(256) not null,
"docker_image_tag" varchar(256) not null,
"documentation_url" varchar(256) not null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should allow this one to be null.

"updated_at" timestamptz(35) not null,
constraint "actor_oauth_parameter_pkey"
primary key ("id")
);
create table "public"."airbyte_configs"(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after this migration we can drop this table, right? totally fine if we don't do it in this migration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was thinking cleaning up the table as part of a separate migration. I thought it would be more safe

"configuration" jsonb not null,
"actor_type" actor_type not null,
"tombstone" bool not null,
"created_at" timestamptz(35) not null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for all of the created_at and updated_at can we have it default to the current timestamp, please? setting that default helps avoid a bunch of manual input that would otherwise just be the current timestamp anyway.

create unique index "state_pkey" on "public"."state"(
"id" asc,
"connection_id" asc
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connection_operation is missing in this schema_dump for some reason.

foreign key ("connection_id")
references "public"."connection" ("id");
create index "actor_actor_definition_id_idx" on "public"."actor"("actor_definition_id" asc);
create index "actor_actor_type_idx" on "public"."actor"("actor_type" asc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure how to think about this index. it's just going to bucket into one of two categories? what query is this making more efficient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for the other two actor_type indexes. i don't think they will usually get used. usually for really low cardinality fields it doesn't make sense to do an index. this SO article does an okay job explaining why.

.set(createdAt, OffsetDateTime.ofInstant(configWithMetadata.getCreatedAt(), ZoneOffset.UTC))
.set(updatedAt, OffsetDateTime.ofInstant(configWithMetadata.getUpdatedAt(), ZoneOffset.UTC))
.execute();
createAndPopulateConnectionOperation(ctx, !connectionOperationCreated, configWithMetadata);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah here's the problem. in the schema_dump i noticed this table doesn't exist. it is because it is never created because the creation of the table is happening in here. those two concerns need to be separated. the table should be created, even if there are no operations in the initial migration.

final Field<OffsetDateTime> createdAt = DSL.field("created_at", SQLDataType.TIMESTAMPWITHTIMEZONE.nullable(false));
final Field<OffsetDateTime> updatedAt = DSL.field("updated_at", SQLDataType.TIMESTAMPWITHTIMEZONE.nullable(false));
final Field<JSONB> configBlob = DSL.field("config_blob", SQLDataType.JSONB.nullable(false));
final Result<Record> results = ctx.select(asterisk()).from(DSL.table("airbyte_configs")).where(configType.eq(airbyteConfigType.name())).fetch();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth noting that we are making an assumption here that each table we are pulling can fully fit in memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think it might cause problems and needs change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a great point charles, it might if the database has a ton of connections, sources or destinations.

I'll be curious to test this. I think we should test how this performs with a database with ten thousands connections to be safe. I don't expect the other resources to be used with the same level of scale.

createAndPopulateState(ctx);
}

private static void createEnums(final DSLContext ctx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think i saw the enums in the schema dump fwiw.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some reason the automatic schema dump command ./gradlew :airbyte-db:lib:dumpConfigsSchema doesnt create the enums. Added them manually

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any luck on getting these to populate automatically?

@subodh1810 subodh1810 temporarily deployed to more-secrets December 7, 2021 23:27 Inactive
@subodh1810 subodh1810 temporarily deployed to more-secrets December 7, 2021 23:34 Inactive
@subodh1810 subodh1810 temporarily deployed to more-secrets December 7, 2021 23:36 Inactive
constraint "actor_definition_pkey"
primary key ("id")
);
create table "public"."actor_oauth_parameter"(
Copy link
Contributor

@davinchia davinchia Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you figure out how to get jooq to add enums to the schema dump?

@subodh1810 subodh1810 temporarily deployed to more-secrets December 12, 2021 19:05 Inactive
@subodh1810 subodh1810 temporarily deployed to more-secrets December 14, 2021 17:29 Inactive
"schedule" jsonb null,
"manual" bool not null,
"resource_requirements" jsonb null,
"created_at" timestamptz(35) not null default cast(current_timestamp as timestamp with time zone),
Copy link
Contributor

@cgardens cgardens Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work on getting this default to populate!

@subodh1810 subodh1810 temporarily deployed to more-secrets December 15, 2021 09:41 Inactive
@subodh1810 subodh1810 temporarily deployed to more-secrets December 21, 2021 10:25 Inactive
* implement new database config persistence

* make new persistence compatible with applications

* update tests

* do not update createdAt timestamp

* review comments + incorporate changes from bootloader

* address review comments

* fixed test + remove unused method

* fix archive handler test

* handle null value for tombstone

* add logs for assert migration

* final review comments
@github-actions github-actions bot added area/platform issues related to the platform area/server labels Dec 21, 2021
@subodh1810 subodh1810 temporarily deployed to more-secrets December 21, 2021 13:00 Inactive
@subodh1810 subodh1810 merged commit 8654c4a into master Dec 21, 2021
@subodh1810 subodh1810 deleted the config-database-migration-implementation branch December 21, 2021 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/platform issues related to the platform area/server
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants