Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor and clean up json avro schema converter #9363

Merged
merged 19 commits into from
Jan 9, 2022

Conversation

tuliren
Copy link
Contributor

@tuliren tuliren commented Jan 9, 2022

What

  • Merge record schemas within a type union.
  • Rename methods and add Javadoc for clarity.
  • Clean up test case.

@github-actions github-actions bot added the area/connectors Connector related issues label Jan 9, 2022
@tuliren tuliren temporarily deployed to more-secrets January 9, 2022 00:42 Inactive
@tuliren tuliren temporarily deployed to more-secrets January 9, 2022 01:09 Inactive
@tuliren tuliren temporarily deployed to more-secrets January 9, 2022 01:21 Inactive
@tuliren tuliren temporarily deployed to more-secrets January 9, 2022 08:01 Inactive
@tuliren tuliren temporarily deployed to more-secrets January 9, 2022 08:03 Inactive
@tuliren tuliren temporarily deployed to more-secrets January 9, 2022 08:10 Inactive
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Jan 9, 2022
@tuliren tuliren temporarily deployed to more-secrets January 9, 2022 09:53 Inactive
obtainPaths(arrayPath, arrayNode.get(i), jsonNodePathMap);
}
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method proactively calculate all paths to generate the namespaces. However, this is not necessary. The namespaces can be computed while travesing the Json schema.

@@ -1216,5 +1168,226 @@
"identifier": ["151", "152", "true", "{\"id\":153}"],
"_airbyte_additional_properties": null
}
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above case is trying to test objects with same names. However, it had lots of irrelevant fields. Those fields have been removed to make the test case easier to read.

@tuliren tuliren temporarily deployed to more-secrets January 9, 2022 10:44 Inactive
@tuliren
Copy link
Contributor Author

tuliren commented Jan 9, 2022

/test connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/1674473764
✅ connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/1674473764
No Python unittests run

@tuliren
Copy link
Contributor Author

tuliren commented Jan 9, 2022

/test connector=connectors/destination-gcs

🕑 connectors/destination-gcs https://github.com/airbytehq/airbyte/actions/runs/1674473849
✅ connectors/destination-gcs https://github.com/airbytehq/airbyte/actions/runs/1674473849
No Python unittests run

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 9, 2022 17:40 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 9, 2022 17:40 Inactive
@tuliren tuliren merged commit 22ef236 into master Jan 9, 2022
@tuliren tuliren deleted the liren/avro-namespace-refactoring branch January 9, 2022 18:56
@@ -161,6 +161,149 @@ This is not supported in Avro schema. As a compromise, the converter creates a u
}
```

If the Json array has multiple object items, these objects will be recursively merged into one Avro record. For example, the following Json array expects two different objects, each with a different `id` field.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this doc be in undertanding airbyte? or is it more appropriately scoped for the S3/blob storage connectors docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This avro doc was previously inline in the S3 and other blob storage connector docs. But as it gets longer and longer, I don't want to duplicate this doc in multiple places anymore. And In the future, more connectors using Avro / Parquet staging files will be related to this topic. So I think keeping everything here is the way to go. All relevant connector docs alreadt have links to this doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that the wording is slightly off. I have updated the doc in a second Avro PR: #9367.

tuliren added a commit that referenced this pull request Jan 9, 2022
tuliren added a commit that referenced this pull request Jan 12, 2022
)

* Support array field with empty items specification

* Remove all exceptions

* Format code

* Bump connector versions

* Bump bigquery versions

* Update docs

* Remove unused code

* Update doc for PR #9363

* Update doc about defaulting all improperly typed fields to string

* Ignore bigquery

* Update version and doc

* Update doc

* Bump version in seed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants