Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove LocalSpannerIO #1429

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Abacn
Copy link
Contributor

@Abacn Abacn commented Apr 11, 2024

Fixes #1362

Action item of #1424 (comment)

Affected PRs are notified: #1392 #1376

Target to merge once Template upgrade to Beam 2.56.0

@Abacn Abacn requested a review from a team as a code owner April 11, 2024 20:10
@Abacn Abacn requested review from darshan-sj and shreyakhajanchi and removed request for a team April 11, 2024 20:10
@Abacn Abacn marked this pull request as draft April 11, 2024 20:12
Copy link

codecov bot commented Apr 11, 2024

Codecov Report

Attention: Patch coverage is 0% with 13 lines in your changes are missing coverage. Please review.

Project coverage is 41.64%. Comparing base (24409db) to head (db2d527).
Report is 15 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1429      +/-   ##
============================================
+ Coverage     40.21%   41.64%   +1.43%     
+ Complexity     2796     2779      -17     
============================================
  Files           738      729       -9     
  Lines         42790    41637    -1153     
  Branches       4577     4457     -120     
============================================
+ Hits          17207    17340     +133     
+ Misses        24104    22818    -1286     
  Partials       1479     1479              
Components Coverage Δ
spanner-templates 59.04% <0.00%> (+2.17%) ⬆️
spanner-import-export 65.59% <0.00%> (ø)
spanner-live-forward-migration 70.96% <ø> (+7.82%) ⬆️
spanner-live-reverse-replication 48.42% <ø> (+5.46%) ⬆️
spanner-bulk-migration 77.69% <ø> (+7.63%) ⬆️
Files Coverage Δ
...ogle/cloud/teleport/spanner/ApplyDDLTransform.java 0.00% <0.00%> (ø)
...google/cloud/teleport/spanner/ExportTransform.java 17.76% <0.00%> (ø)
...com/google/cloud/teleport/spanner/ReadDialect.java 0.00% <0.00%> (ø)
.../cloud/teleport/spanner/ReadInformationSchema.java 0.00% <0.00%> (ø)
...google/cloud/teleport/templates/SpannerToText.java 0.00% <0.00%> (ø)
...leport/templates/SpannerVectorEmbeddingExport.java 0.00% <0.00%> (ø)
...le/cloud/teleport/spanner/TextImportTransform.java 44.67% <0.00%> (ø)
...d/teleport/templates/common/SpannerConverters.java 75.91% <0.00%> (ø)
...google/cloud/teleport/spanner/ImportTransform.java 26.29% <0.00%> (ø)

... and 5 files with indirect coverage changes

@damccorm
Copy link
Contributor

I want to chime in here: I think this is the right change to make. It dramatically simplifies our infrastructure and it is generally a much better practice to have a single source of truth, rather than have 2 different similar but not quite the same IOs. More urgently, we need to fix the conflict problem mentioned in #1362 - it is not reasonable to expect the release manager to dedicate special time to this IO every release, which is what the current setup requires.

However, I also recognize that this fork was done for a reason, and this change introduces challenges. Waiting on Beam changes to roll out is slower and means it takes fixes longer to roll out. In the short term, merging this would also regress #1392 and #1376. I'm not sure how urgent those are, but it would certainly be a cost. In general, I'd probably prefer we at least merge this alongside the next Beam release/templates upgrade to avoid that problem.

Moving forward, I think we have a few options:

  1. Merge this PR with the next Beam release (preferred IMO)
  2. Find another way to fix [Bug]: LocalSpannerIO package namespace conflict with Beam's SpannerIO  #1362 so that it doesn't require release manager intervention to upgrade the Beam version every time.
  3. Do nothing - IMO this is a bad option though because of the maintenance burden it imposes, and if we do this we'd likely at least require the spanner team to be more involved in Beam version upgrades

@Abacn
Copy link
Contributor Author

Abacn commented Apr 12, 2024

Currently test fail with error

Error message from worker: java.lang.IllegalArgumentException: Unknown spanner type FLOAT32 

needs upcoming Beam release to include apache/beam#30893

@manitgupta
Copy link
Member

I agree that this something we need to get done. This is something we wanted to go to towards the later half of the year. LocalSpannerIO got added in the first place due to Beam's release schedule being slower than the release velocity the Spanner team desires in launching new features to the customers.

@damccorm / @Abacn - How do other teams handle this today? I would like to understand if there is a fundamental reason why Spanner needs to this differently, or if other services do not encounter similar release velocity issues.

Can you please let me know what support is needed from the Spanner team to get this through?

There are some implementation gaps between SpannerIO in Beam and LocalSpannerIO, which need to be covered to ensure we don't create a parity difference between the two.
@darshan-sj Can you please evaluate?

@darshan-sj
Copy link
Contributor

In LocalSpannerAccessor line number - 44 to 56, there are some retry settings which are set for ExecuteStreamingSQL method. These are required for Databoost feature to work properly. They are not present in Apache Beam and need to be moved to SpannerAccessor in Apache Beam.
I would suggest going with Option 2 from @damccorm 's comment for now. Then evaluate the diff properly and remove LocalSpannerIO once the diff is migrated to Beam and that version is consumed.

Regarding Beam's release schedule, @damccorm / @Abacn How do other teams deal with this issue? Apache Beam takes 2+ months for release.

@damccorm
Copy link
Contributor

How do other teams handle this today? I would like to understand if there is a fundamental reason why Spanner needs to this differently, or if other services do not encounter similar release velocity issues.

I think this is somewhat of an unsolved problem. Releasing Beam more often may eventually be the answer, but we're not ready for that yet. Existing approaches generally are:

  1. Something close to what you are doing
  2. Deal with the slow release cycle

In general, IOs tend to be mostly stable for teams, so (2) is acceptable for most. It sounds like that isn't true for your team - what is driving your velocity needs here? (I recognize that faster releases are just generally better, but I'm also trying to understand if there are things beyond that, e.g. a big feature, etc...).

Teams will also sometimes go with option (2), and then temporarily fork the code if an urgent issue comes up

Can you please let me know what support is needed from the Spanner team to get this through?

I think we need to drive one of the options above - either:

  1. Merge this PR with the next Beam release - this would also require fixing the gaps @darshan-sj called out in Beam head (which I think we should do no matter what)

or

  1. Find another way to fix [Bug]: LocalSpannerIO package namespace conflict with Beam's SpannerIO  #1362 so that it doesn't require release manager intervention to upgrade the Beam version every time.

I'm actually fine with either - I think (1) is less painful long term, but I'm ok with your team making that tradeoff if you want to go with option (2). If you decide to go that route, I'd ask your team to drive that fix forward.


In the very short term: lets make sure that the local Spanner IO is totally in sync with the one at Beam head so that the next release upgrade is painless.

@Abacn
Copy link
Contributor Author

Abacn commented Apr 15, 2024

Note that the real issue here is #1362 - LocalSpannerIO is under org.apache.beam.io.spanner namespace. This has two issues

It is fine to add some logic in DataflowTemplate to make use of its faster release schedule than Beam. Actually this is one of the advantage of the Template. However, the code should under "com.google.cloud.teleport" namespace

I kindly remember that originally it used the same namespace of Beam SpannerIO because some modification is needed to access package private methods of the SpannerIO. It is not a valid workaround by hacking java access modifiers. If needed, we can have a LocalSpannerIO, as a wrapper of SpannerIO, but it needs to be designed in a way that work under Templates' own namespace.

@manitgupta
Copy link
Member

Thank you @damccorm and @Abacn for your points. I largely agree with the points you have mentioned here. Specifically -

I'm actually fine with either - I think (1) is less painful long term, but I'm ok with your team making that tradeoff if you want to go with option (2). If you decide to go that route, I'd ask your team to drive that fix forward.

Option 2 is -

Find another way to fix [Bug]: LocalSpannerIO package namespace conflict with Beam's SpannerIO #1362 so that it doesn't require release manager intervention to upgrade the Beam version every time.

One of the possible solutions is suggested by @Abacn -

It is fine to add some logic in DataflowTemplate to make use of its faster release schedule than Beam. Actually this is one of the advantage of the Template. However, the code should under "com.google.cloud.teleport" namespace

This looks like a reasonable approach, but will require some investment to make the changes and fix the namespace conflicts as well as redesign LocalSpannerIO in a way that it can extend SpannerIO.

I am not yet sure what approach we should take here - it seems to me that Spanner needs to have an internal discussion with its feature teams to align on the following -

  1. Why do Spanner features need to go in more frequently compared to other services? What do we lose if we wait (Option 1)?
  2. Is there a technical solve (similar to one suggested by @Abacn) that can solve this problem without taking a hit on the release velocity?

I will discuss this internally and get back to you. I like option 2 but I don't think it falls under the "short term" category.


For the very short term -

In the very short term: lets make sure that the local Spanner IO is totally in sync with the one at Beam head so that the next release upgrade is painless.

@darshan-sj - Can you do the following -

  1. Evaluate the diff between SpannerIO and LocalSpannerIO. If the diff is only the retry settings which are added to LocalSpannerIO, go ahead and create the PR on beam so that they are both in sync.
  2. If there is additional differences identified, reach out to the Spanner internal feature team that created the diff in LocalSpannerIO and ask them to take up this work.

@damccorm
Copy link
Contributor

Thanks @manitgupta this plan SGTM

@Abacn Abacn marked this pull request as ready for review May 16, 2024 20:25
@Abacn
Copy link
Contributor Author

Abacn commented May 16, 2024

As dataflow template now upgraded to Beam 2.56.0, the upstream Beam now also included the newer Spanner API

https://github.com/GoogleCloudPlatform/DataflowTemplates/commits/main/v1/src/main/java/org/apache/beam/sdk/io/gcp/spanner hasn't been changed since Apr 10 (the latest commit was for Upgrade Beam version to 2.55.1) so we are save to move forward

@Abacn
Copy link
Contributor Author

Abacn commented May 16, 2024

R: @manitgupta @damccorm

@damccorm
Copy link
Contributor

Technically this looks good to me, and I agree we need to do something here. I think it is up to the Spanner folks on whether they want to move forward with this or a different direction. I'll note that my strong recommendation would be to unfork because of the tech debt forking inevitably leads to.

I will discuss this internally and get back to you. I like option 2 but I don't think it falls under the "short term" category.

@manitgupta what was the outcome of this conversation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: LocalSpannerIO package namespace conflict with Beam's SpannerIO
4 participants