Add CRC32C checksum based validation in GCS Avro to Cloud Spanner pipeline #3509
Summary of Changes
Hello, I'm Gemini Code Assist! Here's a summary to help you and other reviewers quickly get up to speed.
This pull request enhances data integrity validation in the GCS Avro to Cloud Spanner pipeline. It adds support for CRC32C checksums as an alternative validation mechanism when MD5 checksums are unavailable. The import can then still verify data consistency with a widely used checksum algorithm, reducing the risk that data corrupted during transfer goes undetected.
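For readers unfamiliar with the two checksums, here is a minimal sketch of the fallback described above. It is not taken from this pull request: the class name `ChecksumSketch` and its methods are hypothetical, and it assumes the whole file fits in memory. It relies on two facts about GCS object metadata: the `crc32c` value is the base64 encoding of the 4-byte big-endian checksum, and some objects (composite objects, for example) carry no MD5 hash.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.zip.CRC32C;

/** Hypothetical helper for illustration; the pipeline's real classes are not shown here. */
final class ChecksumSketch {
  private ChecksumSketch() {}

  /** CRC32C of the data, base64-encoded as 4 big-endian bytes (the form GCS reports). */
  static String crc32cBase64(byte[] data) {
    CRC32C crc = new CRC32C(); // available since Java 9
    crc.update(data, 0, data.length);
    // ByteBuffer is big-endian by default; the checksum fits in the low 32 bits.
    byte[] bigEndian = ByteBuffer.allocate(4).putInt((int) crc.getValue()).array();
    return Base64.getEncoder().encodeToString(bigEndian);
  }

  /** MD5 digest of the data, base64-encoded (the form GCS reports). */
  static String md5Base64(byte[] data) {
    try {
      return Base64.getEncoder().encodeToString(MessageDigest.getInstance("MD5").digest(data));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 digest unavailable", e);
    }
  }

  /** Validates against MD5 when one is recorded, otherwise falls back to CRC32C. */
  static boolean isValid(byte[] data, String expectedMd5, String expectedCrc32c) {
    if (expectedMd5 != null && !expectedMd5.isEmpty()) {
      return md5Base64(data).equals(expectedMd5);
    }
    return crc32cBase64(data).equals(expectedCrc32c);
  }

  public static void main(String[] args) {
    byte[] data = "123456789".getBytes(StandardCharsets.UTF_8);
    String crc = crc32cBase64(data);
    // No MD5 recorded, so validation uses the CRC32C value.
    System.out.println(crc + " valid=" + isValid(data, null, crc));
  }
}
```

A streaming variant would call `update` per buffer instead of loading the file at once.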
Codecov Report
❌ Patch coverage is
@@             Coverage Diff              @@
##               main    #3509      +/-   ##
============================================
- Coverage     52.49%   52.45%   -0.05%
- Complexity     5790     5814      +24
============================================
  Files          1057     1062       +5
  Lines         63968    64198     +230
  Branches       7052     7101      +49
============================================
+ Hits          33582    33673      +91
- Misses        28110    28240     +130
- Partials       2276     2285       +9
bharadwaj-aditya
left a comment
There should be changes to the export pipeline to provide this as well.
It would not make sense to export CRC32C checksum for all exports. Should I introduce a parameter like below and use it?
adityatulasi-google
left a comment
Responded to open comments. Please take another look.
adityatulasi-google
left a comment
Probably last iteration before making the changes.
bharadwaj-aditya
left a comment
Please add the comment as suggested around the reasoning for the structure of the code.
LGTM overall
Please make this an enum input with MD5 as the default.
adityatulasi-google
left a comment
Added support for CRC32C in Export pipeline as well.
force-pushed from 331c384 to d14c4fc
adityatulasi-google
left a comment
Cannot add default value to Enums in v1 templates.
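One way to still treat MD5 as the default when the template parameter itself cannot declare one is to resolve the value in code. The sketch below only illustrates that idea: the enum `ChecksumAlgorithm` and the method `fromParameter` are hypothetical names, not the ones used in this pull request.

```java
import java.util.Locale;

/** Hypothetical enum for illustration; the PR's actual parameter type is not shown here. */
enum ChecksumAlgorithm {
  MD5,
  CRC32C;

  /**
   * Resolves a raw parameter value. A missing or blank value falls back to MD5;
   * an unrecognized value throws IllegalArgumentException so the job fails fast.
   */
  static ChecksumAlgorithm fromParameter(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      return MD5;
    }
    return valueOf(raw.trim().toUpperCase(Locale.ROOT));
  }
}
```

Rejecting unknown values keeps a mistyped parameter from silently falling back to MD5.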
bharadwaj-aditya
left a comment
It seems the job does not work with incorrect parameters, so marking LGTM.
force-pushed from d14c4fc to 9c696c3
…eline when MD5 is missing
force-pushed from 9c696c3 to 7579f87
adityatulasi-google
left a comment
Fixed spotless errors. PTAL.