Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAT: Validate connector documentation #34380

Merged
merged 14 commits into from
Feb 8, 2024

Conversation

darynaishchenko
Copy link
Collaborator

@darynaishchenko darynaishchenko commented Jan 19, 2024

What

resolved: #33750 #33749

How

Added TestConnectorDocumentation with:

  • test_prerequisites_content - test gets required fields from actual connector spec and check if title is in section under Prerequisites header. Special case is credentials. There are a lot of ways how to describe this step, so for credentials test searches for credentials keywords like auth, account etc.
  • test_docs_structure - test gets all heading from connector docs and check for the right order based on our standard template. If required heading is not in actual docs or not on a right order test will fail.
  • test_docs_descriptions - test gets headings that have specific description according our standard template and validates content after this heading to next one line by line. If actual documentation does have differences test will fail.
  • test_validate_links - retrieves all links from documentation file and makes a get request. if response status code in [200, 403, 401] it's a valid links, other links will be collected as invalid links and showed in assertation.

TestConnectorDocumentation is not required for now.

Recommended reading order

  1. airbyte-integrations/bases/connector-acceptance-test/connector_acceptance_test/tests/test_core.py
  2. airbyte-integrations/bases/connector-acceptance-test/connector_acceptance_test/conftest.py
  3. airbyte-integrations/bases/connector-acceptance-test/connector_acceptance_test/config.py
  4. airbyte-integrations/bases/connector-acceptance-test/connector_acceptance_test/tests/doc_templates/ - folder with standard templates to compare with actual docs.

Manualy tested with:

  • source-google-ads
  • source-github
  • source-faker

Copy link

vercel bot commented Jan 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Feb 5, 2024 6:45pm

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Jan 19, 2024

Now that you have set up the GitHub source connector, check out the following GitHub tutorials:

- [Creating PAT](https:github.com/how-to-create-a-pat)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need another check for valid URL links? At least assertion for status_code

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling that this change might impact some of our existing connectors beyond what you tested, can we make sure to communicate to the team on how to fix these quickly if they fail standard validation?

Can we try to test this with a couple more certified connectors because we should try to be very diligent about how we validate.

Overall nice work so far, but I have a few things I flagged especially when we get very specific to some of the description validations

"For Airbyte Cloud:",
"For Airbyte Open Source:",
"Supported sync modes",
"Supported Streams",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not need to be required and we can relax this restriction. We have some cases like source-airtable which really functions more like a DB table hence we call it Supported Tables:

https://raw.githubusercontent.com/airbytehq/airbyte/master/docs/integrations/sources/airtable.md

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with that, airbyte protocol names it stream so docs should use this naming. Some users can be confused what they need streams or tables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right athatconsistent naming scheme is better. No need to address this part.

I did a check and the three sources that use Supported tables are my-hours, airtable, and dv-360. I think it is acceptable for these connector docs to fail and to be changed to Supported streams to pass tests.

connector_name,
"Prerequisites",
"Setup guide",
f"Set up {connector_name}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required and where? I did a quick look of google ads and while it is in the docs, its not a header and just in the text?

Am I missing something as to why this would pass the test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs criteria here, cell E35

return heading.replace("Step 1: ", "")
if "Step 2: " in heading:
return heading.replace("Step 2: ", "")
return heading
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we stop after Step 2? I presume that many connectors will have steps beyond 2.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, really a good case with steps, as for example netsuite.md has more than 1 steps, we can try to strict compare step 1 and 2 as it is in connector standard template and skip others "Step" headers if they are after "Step 1:" header.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see other message about whether we need to test this

process = Thread(target=validate_docs_links, args=[link])
process.start()
threads.append(process)
[process.join() for process in threads]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add some sort of timeout or failsafe for this? I don't think we want to hold up a CAT run if for example some URLs end up hanging. Maybe 30 seconds (per thread) is a safe amount of time to wait and exit out if it's not fulfilled by then

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

f"Documentation structure doesn't follow standard template. docs is not full.\nMissing headers: {template_headings[template_headings_index:]}"
)

def test_docs_descriptions(self, docs_path: str, connector_documentation: str, connector_metadata: dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test this some additional certified connectors like source-greenhouse, source-airtable and a few others. I'm a little concerned how many connectors will fail this validation since it's quite specific to the exact text

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure, I will test it with some more certified connectors and let you know the results. My previous tests, unfortunately , show differences in the actual and expected docs structure, but it was reasonable, errors like typos, incorrect order of heading etc were found using this test suite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay sounds good thank you

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is my test results for some connectors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have access to this document. Please share with me

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, please let me know if you don't have access

@darynaishchenko
Copy link
Collaborator Author

I have a feeling that this change might impact some of our existing connectors beyond what you tested, can we make sure to communicate to the team on how to fix these quickly if they fail standard validation?

Yes, this test is really strict and most connectors might fail, but this test suite currently works as TestConnectorAttributes, it doesn't work unless connector_documentation is added to acceptance-test-config file.

@darynaishchenko
Copy link
Collaborator Author

Regarding a plan for the team how we will handle failures after this pr will be merged. I will create an issue with a list of cerified connectors and we will add a config for this test in acceptance-config.yaml for these connectors and fix docs in case test are failed. Now this test is not mandatory and will work only if config is provided in acceptance-config.yaml and ql > 400. Connectors without provided config will not be affected.

Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please share the test document with me. But I think we can move forward with the test changes

@darynaishchenko darynaishchenko merged commit 8107081 into master Feb 8, 2024
25 checks passed
@darynaishchenko darynaishchenko deleted the daryna/cat/validate-source-docs branch February 8, 2024 16:04
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 21, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add CAT scenario testing all required fields are listed under the Prerequisites section
5 participants