Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redownload CRDB if previous download got killed in the middle #112

Merged
merged 1 commit into from
Sep 10, 2021

Conversation

ZhouXing19
Copy link
Contributor

@ZhouXing19 ZhouXing19 commented Aug 25, 2021

Resolve #102

This commit is to add a flock to the local file when downloading CRDB binary.
If one process finds the localfile for CRDB binary already exists, but in unfinished
mode and can be flocked, it means the previous download process was killed
before it finished. In this case, this process removes the existing local file
and redownload CRDB binary.

Tests are added in TestFlockOnDownloadedCRDB() at testserver_test.go.
Note that test does not fully simulate the downloading gets killed halfway scenario.
A recording with two downloading processes in CLI is here

@ZhouXing19 ZhouXing19 requested a review from rafiss August 25, 2021 20:51
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ZhouXing19 ZhouXing19 force-pushed the fix-download-102 branch 5 times, most recently from 7b19f1c to dc3de6d Compare August 28, 2021 01:21
@ZhouXing19 ZhouXing19 changed the title Redownload if checking localfile times out Redownload CRDB if previous download got killed halfway Aug 28, 2021
@ZhouXing19 ZhouXing19 force-pushed the fix-download-102 branch 2 times, most recently from eed4db8 to edfb4ec Compare September 1, 2021 16:42
@ZhouXing19 ZhouXing19 marked this pull request as draft September 1, 2021 16:54
@ZhouXing19 ZhouXing19 marked this pull request as ready for review September 1, 2021 16:58
Copy link
Contributor

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ZhouXing19)


testserver/binaries.go, line 56 at r1 (raw file):

	if _, err := localFileLock.TryLock(); err != nil {
		return err
	}

i think it would be safest to add

defer func() { _ = localFileLock.Unlock() }()

testserver/binaries.go, line 73 at r1 (raw file):

	}

	if err := localFileLock.Close(); err != nil {

nit: i think localFileLock.Unlock() would be a bit more readable


testserver/binaries.go, line 84 at r1 (raw file):

var muslRE = regexp.MustCompile(`(?i)\bmusl\b`)

func GetDownloadResponse() (*http.Response, error) {

nit: add a comment to the function now that it's public


testserver/binaries.go, line 164 at r1 (raw file):

		localFileLock := flock.New(localFile)
		// If there's a process downloading the binary, local file cannot be flocked.
		canBeLocked, err := localFileLock.TryLock()

nit: instead of canBeLocked, the name locked is more understandable


testserver/binaries.go, line 165 at r1 (raw file):

		// If there's a process downloading the binary, local file cannot be flocked.
		canBeLocked, err := localFileLock.TryLock()
		if err == nil {

nit: the error handling is more readable like this:

if err != nil {
  return "", err
}
if locked {
  // If local file can be locked ...
  // ... rest of the code
}

testserver/binaries.go, line 169 at r1 (raw file):

				// If local file can be locked, it means the previous download was
				// killed half way. Delete local file and re-download.
				log.Printf("Find half-way failed download, delete and re-download")

nit: change to "previous download failed in the middle, deleting and re-downloading"


testserver/binaries.go, line 171 at r1 (raw file):

				log.Printf("Find half-way failed download, delete and re-download")
				if err := os.Remove(localFile); err != nil {
					log.Printf("failed to remove half-way failed download %s: %s", localFile, err)

nit: change to "failed to remove partial download %s: %v"


testserver/binaries.go, line 186 at r1 (raw file):

	err = downloadFile(response, localFile, tc)
	if err != nil {
		if err.Error() != stoppedInMiddle {

nit: change to if errors.Is(err, stoppedInMiddleError)

matching the error message isn't reliable. see also https://pkg.go.dev/errors#Is


testserver/testserver.go, line 149 at r1 (raw file):

	ts, err := NewTestServer(opts...)
	if err != nil {
		if err.Error() != stoppedInMiddle {

nit: change to if errors.Is(err, stoppedInMiddleError)

matching the error message isn't reliable. see also https://pkg.go.dev/errors#Is


testserver/testserver.go, line 235 at r1 (raw file):

}

const stoppedInMiddle = "download stopped in middle"

nit: this should be:

var stoppedInMiddleError error = errors.New("download stopped in middle")`

testserver/testserver.go, line 269 at r1 (raw file):

		cockroachBinary, err = downloadLatestBinary(&serverArgs.testConfig)
		if err != nil {
			if err.Error() == stoppedInMiddle {

nit: change to if errors.Is(err, stoppedInMiddleError)

matching the error message isn't reliable. see also https://pkg.go.dev/errors#Is


testserver/testserver_test.go, line 220 at r1 (raw file):

// two goroutines, the second goroutine waits for the first goroutine to
// finish downloading the CRDB binary into a local file.
func testFlockWithDownloadPassing(

nice test!


testserver/testserver_test.go, line 271 at r1 (raw file):

	// First NewDBForTest process to download the CRDB binary to the local file,
	// but killed in the middle.
	testserver.NewDBForTest(t, testserver.StopDownloadInMiddleOpt())

is there a way that the test can verify that this first download actually failed. (or at least verify that the DB didnt start up)

@ZhouXing19 ZhouXing19 changed the title Redownload CRDB if previous download got killed halfway Redownload CRDB if previous download got killed in the middle Sep 9, 2021
@ZhouXing19 ZhouXing19 force-pushed the fix-download-102 branch 3 times, most recently from 569ba1f to 851618e Compare September 9, 2021 23:40
Copy link
Contributor Author

@ZhouXing19 ZhouXing19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @ZhouXing19)


testserver/binaries.go, line 56 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i think it would be safest to add

defer func() { _ = localFileLock.Unlock() }()

Nice catch, thanks!


testserver/binaries.go, line 73 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: i think localFileLock.Unlock() would be a bit more readable

Done.


testserver/binaries.go, line 84 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: add a comment to the function now that it's public

Done.


testserver/binaries.go, line 164 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: instead of canBeLocked, the name locked is more understandable

Done.


testserver/binaries.go, line 165 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: the error handling is more readable like this:

if err != nil {
  return "", err
}
if locked {
  // If local file can be locked ...
  // ... rest of the code
}
</blockquote></details>

Done.
___
*[testserver/binaries.go, line 169 at r1](https://reviewable.io/reviews/cockroachdb/cockroach-go/112#-MjBc17G0XCriq-PThbQ:-MjBuVGr425rAX28-bUt:b-896fix) ([raw file](https://github.com/cockroachdb/cockroach-go/blob/a625026cbb15edb20aedc5b0081561d9310f7bee/testserver/binaries.go#L169)):*
<details><summary><i>Previously, rafiss (Rafi Shamim) wrote…</i></summary><blockquote>

nit: change to "previous download failed in the middle, deleting and re-downloading"
</blockquote></details>

Done.
___
*[testserver/binaries.go, line 171 at r1](https://reviewable.io/reviews/cockroachdb/cockroach-go/112#-MjBcF3b8QqDtC6SvNu2:-MjBuYCH0NLriqJV_7ql:b5u0s93) ([raw file](https://github.com/cockroachdb/cockroach-go/blob/a625026cbb15edb20aedc5b0081561d9310f7bee/testserver/binaries.go#L171)):*
> Done.
Done.
___
*[testserver/binaries.go, line 186 at r1](https://reviewable.io/reviews/cockroachdb/cockroach-go/112#-MjBdFpA3xRbZVynnHOw:-MjBuZQx6XjsMko9_1lc:b-896fix) ([raw file](https://github.com/cockroachdb/cockroach-go/blob/a625026cbb15edb20aedc5b0081561d9310f7bee/testserver/binaries.go#L186)):*
<details><summary><i>Previously, rafiss (Rafi Shamim) wrote…</i></summary><blockquote>

nit: change to `if errors.Is(err, stoppedInMiddleError)`

matching the error message isn't reliable. see also https://pkg.go.dev/errors#Is

</blockquote></details>

Done.
___
*[testserver/testserver.go, line 149 at r1](https://reviewable.io/reviews/cockroachdb/cockroach-go/112#-MjBd2Zr-TFNkGHsz6BV:-MjBub1fDFB6eEuTbF56:b-896fix) ([raw file](https://github.com/cockroachdb/cockroach-go/blob/a625026cbb15edb20aedc5b0081561d9310f7bee/testserver/testserver.go#L149)):*
<details><summary><i>Previously, rafiss (Rafi Shamim) wrote…</i></summary><blockquote>

nit: change to `if errors.Is(err, stoppedInMiddleError)`

matching the error message isn't reliable. see also https://pkg.go.dev/errors#Is
</blockquote></details>

Done.
___
*[testserver/testserver.go, line 235 at r1](https://reviewable.io/reviews/cockroachdb/cockroach-go/112#-MjBcdLB-iYkm00dFvsk:-MjBucnI-AnQOsf6rGDD:b-896fix) ([raw file](https://github.com/cockroachdb/cockroach-go/blob/a625026cbb15edb20aedc5b0081561d9310f7bee/testserver/testserver.go#L235)):*
<details><summary><i>Previously, rafiss (Rafi Shamim) wrote…</i></summary><blockquote>

nit: this should be:

var stoppedInMiddleError error = errors.New("download stopped in middle")`

</blockquote></details>

Done.
___
*[testserver/testserver.go, line 269 at r1](https://reviewable.io/reviews/cockroachdb/cockroach-go/112#-MjBdTvnA0biL7ZQdRLn:-MjBudRx9tTh44WIJHmu:b-896fix) ([raw file](https://github.com/cockroachdb/cockroach-go/blob/a625026cbb15edb20aedc5b0081561d9310f7bee/testserver/testserver.go#L269)):*
<details><summary><i>Previously, rafiss (Rafi Shamim) wrote…</i></summary><blockquote>

nit: change to `if errors.Is(err, stoppedInMiddleError)`

matching the error message isn't reliable. see also https://pkg.go.dev/errors#Is
</blockquote></details>

Done.
___
*[testserver/testserver_test.go, line 271 at r1](https://reviewable.io/reviews/cockroachdb/cockroach-go/112#-MjBeBBz5h7Fzpvtg6C3:-MjBufxW0WGEMcKlkVay:bvcjhhn) ([raw file](https://github.com/cockroachdb/cockroach-go/blob/a625026cbb15edb20aedc5b0081561d9310f7bee/testserver/testserver_test.go#L271)):*
<details><summary><i>Previously, rafiss (Rafi Shamim) wrote…</i></summary><blockquote>

is there a way that the test can verify that this first download actually failed. (or at least verify that the DB didnt start up)
</blockquote></details>

Added [this part](https://github.com/cockroachdb/cockroach-go/pull/112/files#diff-1f98793725685c02c97226566316ce6041e8222c01dc6e1d4c3a625dd31b9962R154-R157) to ensure that the ts is truly stopped. It's a bit hard to verify if the download is actually failed, since `NewDBForTest()` doesn't return any info of the testserver.


<!-- Sent from Reviewable.io -->

Copy link
Contributor

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @ZhouXing19)


testserver/binaries.go, line 56 at r4 (raw file):

	localFileLock := flock.New(filePath)

	defer func() { _ = localFileLock.Unlock() }()

nit: this defer should be moved so it's after the error check for TryLock (below the next if-statement)


testserver/binaries.go, line 174 at r4 (raw file):

		if err != nil {
			return "", err
		} else {

nit: we don't need the else since the previous if statement short-circuits

the reason i suggest this is because stylistically, it's more readable if the main logic of the function is indented as little as possible


testserver/testserver.go, line 152 at r4 (raw file):

			t.Fatal(err)
		} else {
			// If the testserver is intentionally killed in the middle,

nit: i think it would be more readable if the conditions were inverted, and we avoid the unneeded else:

if errors.Is(err, errStoppedInMiddle) {
  // If the testserver is intentionally killed ...
  return nil, func() { /* ... */ }
}
t.Fatal(err)

@ZhouXing19 ZhouXing19 force-pushed the fix-download-102 branch 2 times, most recently from 86222ab to f60658f Compare September 10, 2021 15:24
Copy link
Contributor Author

@ZhouXing19 ZhouXing19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @ZhouXing19)


testserver/binaries.go, line 56 at r4 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: this defer should be moved so it's after the error check for TryLock (below the next if-statement)

Done.


testserver/binaries.go, line 174 at r4 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: we don't need the else since the previous if statement short-circuits

the reason i suggest this is because stylistically, it's more readable if the main logic of the function is indented as little as possible

That makes lots of sense, thanks! Done.


testserver/testserver.go, line 152 at r4 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: i think it would be more readable if the conditions were inverted, and we avoid the unneeded else:

if errors.Is(err, errStoppedInMiddle) {
  // If the testserver is intentionally killed ...
  return nil, func() { /* ... */ }
}
t.Fatal(err)

Done.

Copy link
Contributor

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! nice work

Resolve cockroachdb#102

This commit is to add a flock to the local file when downloading CRDB binary.
If such localfile exists but can be flocked, it means  the previous download
process was killed before it finished. In this case, remove the existing local file
and redownload CRDB binary.
@ZhouXing19 ZhouXing19 merged commit fb73350 into cockroachdb:master Sep 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

testserver: if killed during download, future invocations will not work
3 participants