[ISSUE #34755] do not propagate parameters on InlineSchemaLoader #34853

Merged
merged 4 commits into from
Feb 7, 2024


maxi297
Contributor

@maxi297 maxi297 commented Feb 5, 2024

What

Addresses #34755

How

During component transformation, exclude type InlineSchemaLoader from the propagation.
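A minimal sketch of the idea, assuming a much-simplified traversal rather than the CDK's actual implementation: parameters declared under `$parameters` are copied into nested typed components unless the component's type is in an exclusion set. Only `_PROPAGATION_EXCLUSION_TYPES` and `InlineSchemaLoader` come from this PR; the function name `propagate` and the sample `stream` are hypothetical.

```python
from typing import Any, Dict, Mapping

# Name taken from this PR; everything else in this sketch is hypothetical.
_PROPAGATION_EXCLUSION_TYPES = {"InlineSchemaLoader"}
PARAMETERS_KEY = "$parameters"


def propagate(component: Mapping[str, Any], parent_parameters: Mapping[str, Any]) -> Dict[str, Any]:
    """Copy parent parameters into a component and its typed children (simplified)."""
    result = dict(component)

    # Untyped dicts and excluded types are returned untouched.
    if "type" not in result or result["type"] in _PROPAGATION_EXCLUSION_TYPES:
        return result

    # Parameters defined on the component override those inherited from the parent.
    parameters = {**parent_parameters, **result.pop(PARAMETERS_KEY, {})}

    # Recurse into nested dicts before adding the parameters to this component.
    for key, value in list(result.items()):
        if isinstance(value, dict):
            result[key] = propagate(value, parameters)

    # Explicitly set fields win over propagated parameters.
    for key, value in parameters.items():
        result.setdefault(key, value)
    if parameters:
        result[PARAMETERS_KEY] = dict(parameters)
    return result


stream = {
    "type": "DeclarativeStream",
    "$parameters": {"name": "everything"},
    "schema_loader": {
        "type": "InlineSchemaLoader",
        "schema": {"type": "object", "properties": {}},
    },
    "retriever": {"type": "SimpleRetriever"},
}
propagated = propagate(stream, {})
```

In this sketch the `retriever` receives `name`, while the `schema_loader` and the schema nested inside it are left exactly as written in the manifest.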

🚨 User Impact 🚨

Migration

TL;DR: nothing needs to be done to update the sources currently affected by this.

Running the following command, we can see that only three connectors currently have both $parameters and InlineSchemaLoader:

connectors% grep -l 'InlineSchemaLoader' `grep -l '$parameters' source-*/source_*/manifest.yaml`
source-coingecko-coins/source_coingecko_coins/manifest.yaml
source-gnews/source_gnews/manifest.yaml
source-news-api/source_news_api/manifest.yaml
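For environments without `grep`, the same search can be sketched in Python. This is an illustrative equivalent, not something that was run for this PR; the function name `manifests_with_both` is hypothetical, and the glob pattern mirrors the one in the command above.

```python
from pathlib import Path
from typing import List


def manifests_with_both(connectors_dir: Path) -> List[Path]:
    """Return manifests that mention both '$parameters' and 'InlineSchemaLoader'."""
    matches = []
    for manifest in sorted(connectors_dir.glob("source-*/source_*/manifest.yaml")):
        text = manifest.read_text(encoding="utf-8")
        # Plain substring checks, matching the literal strings used with grep.
        if "$parameters" in text and "InlineSchemaLoader" in text:
            matches.append(manifest)
    return matches
```

Calling `manifests_with_both(Path("."))` from the connectors directory would list the matching manifests.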

source-coingecko-coins

I was very surprised to see that source-coingecko-coins did not have the issue when running docker run -v $(pwd)/secrets:/data airbyte/source-coingecko-coins:0.1.0 discover --config /data/config.json. However, inspecting the Docker image with docker run --rm -it --entrypoint bash airbyte/source-coingecko-coins:0.1.0 shows that the contents of the image do not match the current code:

bash-5.1# ls -al
total 28
drwxr-xr-x    3 root     root          4096 Nov 18  2022 .
drwxr-xr-x    1 root     root          4096 Nov 18  2022 ..
-rw-r--r--    1 root     root           140 Nov 18  2022 __init__.py
-rw-r--r--    1 root     root          1906 Nov 18  2022 coingecko_coins.yaml
drwxr-xr-x    2 root     root          4096 Nov 18  2022 schemas
-rw-r--r--    1 root     root           490 Nov 18  2022 source.py
-rw-r--r--    1 root     root          1392 Nov 18  2022 spec.yaml

coingecko_coins.yaml is old enough to rely on $options instead of $parameters. At that point in time, either propagation wasn't implemented or InlineSchema was excluded.

source-gnews

The same logic as for source-coingecko-coins applies to source-gnews, as we can see from this file structure:

bash-5.1# ls -al source_gnews/
total 44
drwxr-xr-x    3 root     root          4096 Jan 17  2023 .
drwxr-xr-x    1 root     root          4096 Jan 17  2023 ..
-rw-r--r--    1 root     root           241 Jan 17  2023 __init__.py
drwxr-xr-x    2 root     root          4096 Jan 17  2023 __pycache__
-rw-r--r--    1 root     root          5439 Jan 17  2023 gnews.yaml
-rw-r--r--    1 root     root           471 Jan 17  2023 source.py
-rw-r--r--    1 root     root          8245 Jan 17  2023 spec.yaml
-rw-r--r--    1 root     root           969 Jan 17  2023 wait_until_midnight_backoff_strategy.py

source-news-api

source-news-api is affected by the propagation, but in a non-breaking way, as name, primary_key and path seem to be ignored by jsonschema.Draft7Validator.check_schema(...):

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {...},
    "name": "everything",
    "primary_key": "publishedAt",
    "path": "/everything",
    "$parameters": {
        "name": "everything",
        "primary_key": "publishedAt",
        "path": "/everything"
    }
}

Hence, even though the current schema didn't seem to cause any issues, it'll be cleaned up.

@maxi297 maxi297 requested a review from a team as a code owner February 5, 2024 15:38

@octavia-squidington-iii octavia-squidington-iii added the CDK Connector Development Kit label Feb 5, 2024
@ambirdsall
Contributor

Small note on the PR title, since that eventually ends up as the subject line of the merge commit: je ne crois pas que «propage» soit le bon mot ("I don't think 'propage' is the right word") 😄

The change is small and straightforward and everything looks good based on reading the code changes and PR description; I'll test it out a bit locally before I smash the approve button, though.

@maxi297 maxi297 changed the title [ISSUE #34755] do not propage parameters on InlineSchemaLoader [ISSUE #34755] do not propagate parameters on InlineSchemaLoader Feb 5, 2024
@maxi297
Contributor Author

maxi297 commented Feb 5, 2024

Small note on the PR title, since that eventually ends up as the subject line of the merge commit: je ne crois pas que «propage» soit le bon mot ("I don't think 'propage' is the right word") 😄

Your French is good enough for the title to say "propage"

@@ -74,6 +74,8 @@
"SimpleRetriever.partition_router": "CustomPartitionRouter",
}

+_PROPAGATION_EXCLUSION_TYPES = {"InlineSchemaLoader"}  # propagation of extra parameters leads to invalid JSON schemas
Contributor Author

@maxi297 maxi297 Feb 5, 2024

I was wondering if we should use InlineSchemaLoader.schema()["properties"]["type"]["enum"], so that if the enum changes, the dependency on it is explicit and this gets updated as well.

Contributor Author

This would mean a wider range of impact, though. For example, spec.connection_specification would be affected by this. I don't see any parameters for it either, and it seems fine to say there would never be any, for the same reason we want to exclude InlineSchemaLoader.schema, so I would be fine with this.

@ambirdsall
Contributor

ambirdsall commented Feb 6, 2024

UPDATE: I was running the tests without the --use-local-cdk flag, which is why my test runs didn't reflect the updated local code. Adding that flag makes the migrated code pass the schema validation; there were no test failures that were not already present in master.

While this does seem to be an improvement to the propagation logic, it doesn't actually solve the test failures that led to the PR being made. I tested by cherry-picking these commits to a test branch along with amb/inline-spec-and-schema-files-for-source-activecampaign, which has the migrated connector whose inlined schema was failing the schema validation CAT; after adding the new propagation logic, I still see an identical error message (copying the TestDiscovery.test_streams_have_valid_json_schemas[inputs0] error output from each test branch and then running pbpaste | md5 generated identical checksums).

@@ -103,7 +105,7 @@ def propagate_types_and_parameters(
 propagated_component["type"] = found_type

 # When there is no resolved type, we're not processing a component (likely a regular object) and don't need to propagate parameters
-if "type" not in propagated_component:
+if "type" not in propagated_component or propagated_component["type"] in _PROPAGATION_EXCLUSION_TYPES:
Contributor

A minor thought I had here is that we actually don't want to propagate things into the schema field of the InlineSchemaLoader, as opposed to the whole schema loader.

That is because the schema field is not actually a component at all. The same goes for other components: for something like a JSON body field, we wouldn't want to accidentally inject all the params there.

I did some snooping and I think the original check is actually not doing what we want, because even the InlineSchemaLoader.schema field has a type field, set to object, so this condition never triggers. I haven't thoroughly vetted this beyond re-running tests, but I'd suggest changing the condition to: if "type" not in propagated_component or propagated_component.get("type") == "object":.

That way the InlineSchemaLoader can still receive parameters, but it doesn't pass them to schema. If this is indeed accurate, then I prefer this solution because it means less one-off logic.
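The effect of the suggested condition can be sketched as follows. This is a simplified illustration under assumptions, not the CDK code: the function name `propagate_skipping_json_schema` and the sample `loader` are hypothetical, and only the condition itself is taken from the comment above.

```python
from typing import Any, Dict, Mapping


def propagate_skipping_json_schema(
    component: Mapping[str, Any], parent_parameters: Mapping[str, Any]
) -> Dict[str, Any]:
    """Propagate parameters, treating dicts typed "object" as JSON schemas (simplified)."""
    result = dict(component)

    # Condition suggested in the review: an untyped dict or a JSON schema
    # object is not a component, so nothing is propagated into it.
    if "type" not in result or result.get("type") == "object":
        return result

    parameters = {**parent_parameters, **result.pop("$parameters", {})}
    for key, value in list(result.items()):
        if isinstance(value, dict):
            result[key] = propagate_skipping_json_schema(value, parameters)
    for key, value in parameters.items():
        result.setdefault(key, value)
    if parameters:
        result["$parameters"] = dict(parameters)
    return result


loader = {
    "type": "InlineSchemaLoader",
    "schema": {"type": "object", "properties": {"title": {"type": "string"}}},
}
propagated = propagate_skipping_json_schema(loader, {"name": "everything"})
```

Here the loader itself receives `name`, while the nested `schema` dict is returned unchanged, so no component type has to be special-cased.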

Contributor Author

This makes sense to me. Let me dig down and see if there are edge cases for that solution

Contributor Author

It feels like it makes sense. I added a comment to clarify the scope of this check.

@maxi297
Contributor Author

maxi297 commented Feb 7, 2024

@ambirdsall I'm not sure I'm buying that. Let's do three scenarios:

  • from master
  • from amb/inline-spec-and-schema-files-for-source-activecampaign
  • from amb/inline-spec-and-schema-files-for-source-activecampaign with the change here

We can define a couple of diffs from that. The two most important are between master and the others. So:

  • between master and amb/inline-spec-and-schema-files-for-source-activecampaign
    [image]
  • between master and amb/inline-spec-and-schema-files-for-source-activecampaign with the change here
    [image]

If the issue is still there, I would assume that it is on master as well. However, I don't see this error in the weekly run. Can you add logging to make sure you are running with the versions you assume you are working with? If I get bullet-point repro steps, I can try to reproduce or pinpoint the issue.

Contributor

@brianjlai brianjlai left a comment

:shipit:

@maxi297 maxi297 merged commit 3d9f70f into master Feb 7, 2024
22 checks passed
@maxi297 maxi297 deleted the issue-34755/do-not-propagate-on-inlineschemaloader branch February 7, 2024 20:41
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 21, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024