-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CAT: Validate connector documentation #34380
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
|
||
Now that you have set up the GitHub source connector, check out the following GitHub tutorials: | ||
|
||
- [Creating PAT](https:github.com/how-to-create-a-pat) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need another check for valid URL links? At least assertion for status_code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling that this change might impact some of our existing connectors beyond what you tested, can we make sure to communicate to the team on how to fix these quickly if they fail standard validation?
Can we try to test this with a couple more certified connectors because we should try to be very diligent about how we validate.
Overall nice work so far, but I have a few things I flagged especially when we get very specific to some of the description validations
...te-integrations/bases/connector-acceptance-test/connector_acceptance_test/tests/test_core.py
Outdated
Show resolved
Hide resolved
"For Airbyte Cloud:", | ||
"For Airbyte Open Source:", | ||
"Supported sync modes", | ||
"Supported Streams", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might not need to be required and we can relax this restriction. We have some cases like source-airtable
which really functions more like a DB table hence we call it Supported Tables:
https://raw.githubusercontent.com/airbytehq/airbyte/master/docs/integrations/sources/airtable.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with that, airbyte protocol names it stream so docs should use this naming. Some users can be confused what they need streams or tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right athatconsistent naming scheme is better. No need to address this part.
I did a check and the three sources that use Supported tables
are my-hours
, airtable
, and dv-360
. I think it is acceptable for these connector docs to fail and to be changed to Supported streams
to pass tests.
connector_name, | ||
"Prerequisites", | ||
"Setup guide", | ||
f"Set up {connector_name}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required and where? I did a quick look of google ads
and while it is in the docs, its not a header and just in the text?
Am I missing something as to why this would pass the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs criteria here, cell E35
...te-integrations/bases/connector-acceptance-test/connector_acceptance_test/tests/test_core.py
Outdated
Show resolved
Hide resolved
...te-integrations/bases/connector-acceptance-test/connector_acceptance_test/tests/test_core.py
Outdated
Show resolved
Hide resolved
...te-integrations/bases/connector-acceptance-test/connector_acceptance_test/tests/test_core.py
Show resolved
Hide resolved
return heading.replace("Step 1: ", "") | ||
if "Step 2: " in heading: | ||
return heading.replace("Step 2: ", "") | ||
return heading |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we stop after Step 2? I presume that many connectors will have steps beyond 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, really a good case with steps, as for example netsuite.md has more than 1 steps, we can try to strict compare step 1 and 2 as it is in connector standard template and skip others "Step" headers if they are after "Step 1:" header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see other message about whether we need to test this
process = Thread(target=validate_docs_links, args=[link]) | ||
process.start() | ||
threads.append(process) | ||
[process.join() for process in threads] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add some sort of timeout or failsafe for this? I don't think we want to hold up a CAT run if for example some URLs end up hanging. Maybe 30 seconds (per thread) is a safe amount of time to wait and exit out if it's not fulfilled by then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
...te-integrations/bases/connector-acceptance-test/connector_acceptance_test/tests/test_core.py
Show resolved
Hide resolved
f"Documentation structure doesn't follow standard template. docs is not full.\nMissing headers: {template_headings[template_headings_index:]}" | ||
) | ||
|
||
def test_docs_descriptions(self, docs_path: str, connector_documentation: str, connector_metadata: dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we test this some additional certified connectors like source-greenhouse
, source-airtable
and a few others. I'm a little concerned how many connectors will fail this validation since it's quite specific to the exact text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sure, I will test it with some more certified connectors and let you know the results. My previous tests, unfortunately , show differences in the actual and expected docs structure, but it was reasonable, errors like typos, incorrect order of heading etc were found using this test suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay sounds good thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is my test results for some connectors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have access to this document. Please share with me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, please let me know if you don't have access
Yes, this test is really strict and most connectors might fail, but this test suite currently works as TestConnectorAttributes, it doesn't work unless |
…d steps, added timeout for test_validate_links
…ql lower than 400
Regarding a plan for the team how we will handle failures after this pr will be merged. I will create an issue with a list of cerified connectors and we will add a config for this test in acceptance-config.yaml for these connectors and fix docs in case test are failed. Now this test is not mandatory and will work only if config is provided in acceptance-config.yaml and ql > 400. Connectors without provided config will not be affected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please share the test document with me. But I think we can move forward with the test changes
What
resolved: #33750 #33749
How
Added
TestConnectorDocumentation
with:test_prerequisites_content
- test gets required fields from actual connector spec and check if title is in section under Prerequisites header. Special case is credentials. There are a lot of ways how to describe this step, so for credentials test searches for credentials keywords like auth, account etc.test_docs_structure
- test gets all heading from connector docs and check for the right order based on our standard template. If required heading is not in actual docs or not on a right order test will fail.test_docs_descriptions
- test gets headings that have specific description according our standard template and validates content after this heading to next one line by line. If actual documentation does have differences test will fail.test_validate_links
- retrieves all links from documentation file and makes a get request. if response status code in [200, 403, 401] it's a valid links, other links will be collected as invalid links and showed in assertation.TestConnectorDocumentation is not required for now.
Recommended reading order
airbyte-integrations/bases/connector-acceptance-test/connector_acceptance_test/tests/test_core.py
airbyte-integrations/bases/connector-acceptance-test/connector_acceptance_test/conftest.py
airbyte-integrations/bases/connector-acceptance-test/connector_acceptance_test/config.py
airbyte-integrations/bases/connector-acceptance-test/connector_acceptance_test/tests/doc_templates/
- folder with standard templates to compare with actual docs.Manualy tested with:
source-google-ads
source-github
source-faker