
Question about the use of crdb.Execute() function. #70

Closed
georgysavva opened this issue May 20, 2020 · 5 comments
georgysavva commented May 20, 2020

The crdb package has this function:

cockroach-go/crdb/tx.go

Lines 95 to 102 in 73ffeee

func Execute(fn func() error) (err error) {
    for {
        err = fn()
        if err == nil || !errIsRetryable(err) {
            return err
        }
    }
}

From the docs it's clear that it should be used to retry single-statement (implicit transaction) operations.
But I don't quite understand whether my app should use it if I can't predict whether the 16K result buffer is enough for me in all situations. Let me explain:
My application will extract a reasonable number of rows (up to 100, e.g. via limits), and it's not going to stream the data somewhere else. So it scans the data from all rows into an array locally and returns it as a whole.
But I can't be sure that some batch won't exceed the 16K limit (for example, because of long texts in some column).
To protect myself from transaction contention errors, I see two solutions here:

  1. I could wrap all my single-statement calls to CockroachDB in crdb.Execute() (see the sketch after this list).
  2. Or it's better to increase the result buffer so the results always fit within the limit and CockroachDB never starts to stream. If I then see transaction contention errors in the logs, it would mean I should either increase the limit again or investigate why my app extracts that much data and restrict it.
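
For reference, a minimal sketch of option 1, assuming the v1 import path and database/sql; the profiles table, its columns, and the getProfiles helper are hypothetical:

package storage

import (
    "context"
    "database/sql"

    "github.com/cockroachdb/cockroach-go/crdb"
)

// getProfiles runs a single SELECT (an implicit transaction) and retries it
// via crdb.Execute if CockroachDB returns a retryable error.
func getProfiles(ctx context.Context, db *sql.DB, limit int) ([]string, error) {
    var names []string
    err := crdb.Execute(func() error {
        names = names[:0] // reset in case the statement is retried
        rows, err := db.QueryContext(ctx,
            "SELECT name FROM profiles ORDER BY id LIMIT $1", limit)
        if err != nil {
            return err
        }
        defer rows.Close()
        for rows.Next() {
            var name string
            if err := rows.Scan(&name); err != nil {
                return err
            }
            names = append(names, name)
        }
        return rows.Err()
    })
    return names, err
}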

This might be unrelated to this repository and perhaps should be asked in the main Slack channel instead.

georgysavva commented
Hey. Any update on this?)

rafiss commented Jun 29, 2020

I think it would be best to always make sure that the batch won't exceed the 16K limit. What kind of data are you working with? Is there a max size for each row? If so, the safest thing would be to only load as many rows as will fit, assuming each row has the max size.
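
For example, a tiny sketch of that calculation (the buffer and row sizes are placeholders; it assumes you know a hard upper bound on row size):

package main

import "fmt"

// maxRowsPerBatch returns how many rows of at most maxRowSize bytes
// fit into a result buffer of bufferSize bytes.
func maxRowsPerBatch(bufferSize, maxRowSize int) int {
    if maxRowSize <= 0 {
        return 0
    }
    return bufferSize / maxRowSize
}

func main() {
    const bufferSize = 16 * 1024 // default 16 KiB result buffer
    const maxRowSize = 500       // assumed hard cap per row, in bytes
    fmt.Println(maxRowsPerBatch(bufferSize, maxRowSize)) // 32 rows per LIMIT
}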

Asking in the Slack channel might be a good idea too -- dealing with a limited-size buffer is probably something that has come up for others as well.

georgysavva commented
The type of data that I am working with is something like user profiles in a social app. Rows contain a bunch of text columns whose size can be limited, and maybe a few JSON columns with unstructured data that also shouldn't grow large. So I guess, yes, it's possible to calculate the max size of each row and use pagination limits with the buffer size in mind. For example, if I know that the max row size is 500B and I need to select up to 50 rows, the result size will be 25KB, which exceeds the default buffer limit of 16KB, so I would need to increase it to 32KB, right?
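
If it helps, a minimal sketch of that option, assuming the results_buffer_size connection parameter and the sql.defaults.results_buffer_size cluster setting described in the CockroachDB docs (the DSN and the 32768-byte value are just examples and worth double-checking):

package main

import (
    "database/sql"
    "log"

    _ "github.com/lib/pq" // Postgres-wire driver commonly used with CockroachDB
)

func main() {
    // Assumption: the buffer can be raised per connection via the
    // results_buffer_size connection parameter; 32768 bytes covers
    // 50 rows * 500 B with some headroom.
    dsn := "postgresql://root@localhost:26257/app?sslmode=disable&results_buffer_size=32768"
    db, err := sql.Open("postgres", dsn)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Alternatively, raise the default for all new connections in the
    // cluster (assumed setting name; requires admin privileges).
    if _, err := db.Exec(
        "SET CLUSTER SETTING sql.defaults.results_buffer_size = 32768",
    ); err != nil {
        log.Fatal(err)
    }
}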

rafiss commented Jul 8, 2020

That math sounds good to me. :)

georgysavva commented
Thanks for helping me to figure this out!
