Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 🎉 Source Airtable: migrate to the Metadata API for dynamic schema generation #20846

Merged
merged 12 commits into from
Jan 9, 2023

Conversation

bazarnov
Copy link
Collaborator

@bazarnov bazarnov commented Dec 22, 2022

What

Resolves:

How

  • implemented Metadata API approach to providing dynamic JSON schema generation
  • removed base_id and tables properties from input config, now these properties are dynamically fetched and passed to each stream
  • general refactor of the code + best practices + HTTP error handling

🚨 User Impact 🚨

Users should re-setup the connector and redo the sync all over again, since it's a major change.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here

@bazarnov
Copy link
Collaborator Author

bazarnov commented Dec 28, 2022

/test connector=connectors/source-airtable

🕑 connectors/source-airtable https://github.com/airbytehq/airbyte/actions/runs/3794730590
✅ connectors/source-airtable https://github.com/airbytehq/airbyte/actions/runs/3794730590
Python tests coverage:

Name                          Stmts   Miss  Cover
-------------------------------------------------
source_airtable/helpers.py       19      0   100%
source_airtable/__init__.py       2      0   100%
source_airtable/streams.py       91      1    99%
source_airtable/source.py        37      1    97%
-------------------------------------------------
TOTAL                           149      2    99%
	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       140      5    96%   87, 93, 238, 242-243
	 source_acceptance_test/conftest.py                     208     92    56%   36, 42-44, 49, 54, 77, 83, 89-91, 110, 115-117, 123-125, 131-132, 137-138, 143, 149, 158-167, 173-178, 193, 217, 248, 254, 262-267, 275-280, 288-301, 306-312, 319-330, 337-353
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              402    115    71%   53, 58, 93-104, 109-116, 120-121, 125-126, 308, 346-363, 376-387, 391-396, 402, 435-440, 478-485, 528-530, 533, 598-606, 618-621, 626, 682-683, 689, 692, 728-738, 751-776
	 source_acceptance_test/tests/test_incremental.py       158     14    91%   52-59, 64-77, 240
	 source_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 source_acceptance_test/utils/json_schema_helper.py     107     13    88%   30-31, 38, 41, 65-68, 96, 120, 192-194
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1603    336    79%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:386: Backward compatibility tests are disabled for version 0.1.3.
======================== 27 passed, 2 skipped in 37.52s ========================

@bazarnov bazarnov requested review from a team and removed request for grubberr, davydov-d, darynaishchenko and roman-yermilov-gl January 4, 2023 09:41
@evantahler evantahler requested a review from a team January 4, 2023 19:59
@bazarnov
Copy link
Collaborator Author

bazarnov commented Jan 6, 2023

@alafanechere
UPDATE:
The customers need just to Refresh Source Schema, which brings a new format of tables and streams, and uncheck the Reset affected streams (recommended) while saving the replication settings to keep the old data in place and produce new streams and catalog, which will replace the old one, with old streams format. But that's because I'm using a new Personal API Token with selected scopes, which was not available previously for all users, now it is.

The actual connection should also be refreshed with the new Personal API TOKEN acquired using Airtable UI because the whole mechanism of how Airtable does the authentication was changed.
So the general recommendation is to set another new connection from Airtable to the destination, to have a new way of data replication, and migrate gracefully in the future.

@bazarnov
Copy link
Collaborator Author

bazarnov commented Jan 6, 2023

/test connector=connectors/source-airtable

🕑 connectors/source-airtable https://github.com/airbytehq/airbyte/actions/runs/3855170365
✅ connectors/source-airtable https://github.com/airbytehq/airbyte/actions/runs/3855170365
Python tests coverage:

Name                                Stmts   Miss  Cover
-------------------------------------------------------
source_airtable/schema_helpers.py      24      0   100%
source_airtable/__init__.py             2      0   100%
source_airtable/streams.py             91      1    99%
source_airtable/source.py              33      1    97%
-------------------------------------------------------
TOTAL                                 150      2    99%
	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       141      5    96%   87, 93, 239, 243-244
	 source_acceptance_test/conftest.py                     211     95    55%   36, 42-44, 49, 54, 77, 83, 89-91, 110, 115-117, 123-125, 131-132, 137-138, 143, 149, 158-167, 173-178, 193, 217, 248, 254, 262-267, 275-285, 293-306, 311-317, 324-335, 342-358
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              402    115    71%   53, 58, 93-104, 109-116, 120-121, 125-126, 308, 346-363, 376-387, 391-396, 402, 435-440, 478-485, 528-530, 533, 598-606, 618-621, 626, 682-683, 689, 692, 728-738, 751-776
	 source_acceptance_test/tests/test_incremental.py       160     14    91%   56-63, 68-81, 244
	 source_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 source_acceptance_test/utils/json_schema_helper.py     107     13    88%   30-31, 38, 41, 65-68, 96, 120, 192-194
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1609    339    79%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:386: Backward compatibility tests are disabled for version 0.1.3.
======================== 27 passed, 2 skipped in 34.64s ========================

@alafanechere
Copy link
Contributor

Thanks @bazarnov for all these details.
@erica-airbyte / @YowanR I think we can merge this because:

  • It's increasing the stability: more accurate schema, less normalization errors
  • It's providing a more complete set of features: auto-discovery of available bases / tables
  • This is an alpha connector

@erica-airbyte shall we hold this while you prepare a customer outreach?

@bazarnov
Copy link
Collaborator Author

bazarnov commented Jan 6, 2023

About this one:

It's increasing the stability: more accurate schema, less normalization errors

I've also tested sync with normalization after this change, works as expected.

@YowanR
Copy link
Contributor

YowanR commented Jan 6, 2023

I would agree with that statement @alafanechere! This helps makes Airtable better and might open more opportunities in the future as well. @bazarnov Let's wait for @erica-airbyte to comment though to make sure that comms are handled before merging!

@erica-airbyte
Copy link
Contributor

@alafanechere I want to make sure I am understanding correctly what I am telling customers:

Once this update goes out the customer connection will break but they will need to first update their API token due to how Airtable updated their authentication then Refresh Source Schema to bring in the new format of tables and streams, and uncheck the Reset affected streams (recommended) while saving the replication settings?

Is that right?

If so I will reach out today. When do we want to deploy this?

@alafanechere
Copy link
Contributor

@bazarnov could you confirm Erica's statement? ☝️ You're more aware than I do 😄

@bazarnov
Copy link
Collaborator Author

bazarnov commented Jan 6, 2023

@alafanechere @erica-airbyte

I confirm, correct. Please follow the public docs https://airtable.com/developers/web/guides/personal-access-tokens to create new tokens before the docs are updated.

The scopes we need to set for new tokens are:

  • data.records:read
  • data.recordComments:read
  • schema.bases:read

Also, "All Workspaces & Bases" should be selected (there is a point for this in a dropdown menu, once clicked), if we need to fetch all the data available, or specific ones if something special is needed.

Enterprise Scopes are not supported yet, since we're limited in account functionality due to the subscription plan.

@erica-airbyte
Copy link
Contributor

Ok thanks! Can we plan on having this go out next Tuesday at 9AM PST so customer know when they will need to be ready?

@erica-airbyte
Copy link
Contributor

erica-airbyte commented Jan 6, 2023

All comms have been sent out to customers and I gave them Tuesday the 10th around 9AM PST that this will go out.
https://docs.google.com/spreadsheets/d/1hvZ1tabJFPHVtDwR0LSURPMLCwqCkHefe1Vda0hdqGc/edit#gid=0

@alafanechere
Copy link
Contributor

@erica-airbyte @bazarnov I'll merge it on Tuesday 9 am PST and it will get automatically deployed to cloud.

@bazarnov
Copy link
Collaborator Author

bazarnov commented Jan 7, 2023

I'll also adjust the docs till Monday to provide clear instructions for authorization with new format. Thanks @alafanechere

@alafanechere
Copy link
Contributor

alafanechere commented Jan 9, 2023

/publish connector=connectors/source-airtable

🕑 Publishing the following connectors:
connectors/source-airtable
https://github.com/airbytehq/airbyte/actions/runs/3876661459


Connector Did it publish? Were definitions generated?
connectors/source-airtable

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 9, 2023 18:57 — with GitHub Actions Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 9, 2023 18:57 — with GitHub Actions Inactive
@alafanechere alafanechere merged commit 98ee1c2 into master Jan 9, 2023
@alafanechere alafanechere deleted the baz/source-airtable-to-beta branch January 9, 2023 21:37
@erica-airbyte
Copy link
Contributor

This is still going out tomorrow right at 9AM PST right?

@erica-airbyte
Copy link
Contributor

Hey team, it seems like the update did go out yesterday instead of today (Tuesday the 10th) and they can not update their current connector they have to create a whole new connector with the new API token.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/airtable
Projects
None yet
8 participants