Cache corruption on various Pkg servers #126

Open
fredrikekre opened this issue May 17, 2021 · 17 comments

@fredrikekre
Member

$ for region in eu-central us-east1 us-east2 us-west; do
> curl -fsSL https://${region}.pkg.julialang.org/package/12aac903-9f7c-5d81-afc2-d9565ea332ae/83aa52e9121f1cf30e7dd170b66889f9d37ecf61 | sha256sum 
> done
89e49debfc51ef09f27cf69ac7c2d927b98667f964db254e0dbe6c290aec2a1b  -
a373504da76c1a87631d96b466c49c150c774efe10954daf9fa15055d42fb59c  -
a373504da76c1a87631d96b466c49c150c774efe10954daf9fa15055d42fb59c  -
a373504da76c1a87631d96b466c49c150c774efe10954daf9fa15055d42fb59c  -

Relevant log extract:

┌ Info: [2021-05-16 04:32:02] downloading resource
│   server = https://us-east.storage.juliahub.com
│   resource = /package/12aac903-9f7c-5d81-afc2-d9565ea332ae/83aa52e9121f1cf30e7dd170b66889f9d37ecf61
└ @ PkgServer /app/src/resource.jl:422
┌ Info: [2021-05-16 04:32:02] download complete
│   server = https://us-east.storage.juliahub.com
│   resource = /package/12aac903-9f7c-5d81-afc2-d9565ea332ae/83aa52e9121f1cf30e7dd170b66889f9d37ecf61
│   elapsed = 0.4293828010559082
└ @ PkgServer /app/src/resource.jl:481
┌ Info: [2021-05-16 04:32:02] Moving
│   temp_file = /app/storage/temp/package/12aac903-9f7c-5d81-afc2-d9565ea332ae/83aa52e9121f1cf30e7dd170b66889f9d37ecf61.inprogress
│   new_path = /app/storage/cache/package/12aac903-9f7c-5d81-afc2-d9565ea332ae/83aa52e9121f1cf30e7dd170b66889f9d37ecf61
│   filesize(temp_file) = 50805
└ @ PkgServer /app/src/resource.jl:84
┌ Info: [2021-05-16 04:32:02] Deleting and pruning
│   temp_file = /app/storage/temp/package/12aac903-9f7c-5d81-afc2-d9565ea332ae/83aa52e9121f1cf30e7dd170b66889f9d37ecf61.inprogress
└ @ PkgServer /app/src/resource.jl:91
┌ Error: [2021-05-16 04:32:02] dl_task abnormal termination!
└ @ PkgServer /app/src/resource.jl:402
┌ Error: [2021-05-16 04:32:02] file size mismatch
│   content_length = 125646
│   actual = 50805
└ @ PkgServer /app/src/resource.jl:572
@staticfloat
Member

Strange. Looks like something went wrong during the copy, perhaps. But I don't see any errors.

@fredrikekre
Member Author

fredrikekre commented Jun 30, 2021

This happened again with /package/fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc/8fd6e824d5fe2fbf45d19b17cb1c72895b5ceff1

Edit: Same log behavior as before

┌ Info: [2021-06-29 16:25:24] downloading resource
│   server = https://us-east.storage.juliahub.com
│   resource = /package/fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc/8fd6e824d5fe2fbf45d19b17cb1c72895b5ceff1
└ @ PkgServer /app/src/resource.jl:422
┌ Info: [2021-06-29 16:25:25] download complete
│   server = https://us-east.storage.juliahub.com
│   resource = /package/fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc/8fd6e824d5fe2fbf45d19b17cb1c72895b5ceff1
│   elapsed = 0.433337926864624
└ @ PkgServer /app/src/resource.jl:481
┌ Info: [2021-06-29 16:25:25] Moving
│   temp_file = /app/storage/temp/package/fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc/8fd6e824d5fe2fbf45d19b17cb1c72895b5ceff1.inprogress
│   new_path = /app/storage/cache/package/fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc/8fd6e824d5fe2fbf45d19b17cb1c72895b5ceff1
│   filesize(temp_file) = 1602524
└ @ PkgServer /app/src/resource.jl:84
┌ Info: [2021-06-29 16:25:25] Deleting and pruning
│   temp_file = /app/storage/temp/package/fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc/8fd6e824d5fe2fbf45d19b17cb1c72895b5ceff1.inprogress
└ @ PkgServer /app/src/resource.jl:91
┌ Error: [2021-06-29 16:25:25] dl_task abnormal termination!
└ @ PkgServer /app/src/resource.jl:402
┌ Error: [2021-06-29 16:25:25] file size mismatch
│   content_length = 2279999
│   actual = 1602524
└ @ PkgServer /app/src/resource.jl:572

@racinmat

racinmat commented Jun 30, 2021

I noticed the corruption happened for AWS.jl version 1.36.0, but it has been manually fixed.

@racinmat

It also happened with /package/bdcacae8-1622-11e9-2a5c-532679323890/e9f52dd5b33bba1b825bdb69b72844e81285c2c1 (LoopVectorization).

@fredrikekre
Member Author

This seems to happen quite a lot. These are from yesterday for example:

$ gzip -cdk 2021-06-30-pkgserver.log.gz | grep "file size mismatch"
┌ Error: [2021-06-30 12:03:43] file size mismatch
┌ Error: [2021-06-30 13:38:20] file size mismatch
┌ Error: [2021-06-30 15:46:45] file size mismatch
┌ Error: [2021-06-30 19:19:51] file size mismatch
┌ Error: [2021-06-30 19:36:03] file size mismatch
┌ Error: [2021-06-30 19:48:15] file size mismatch
┌ Error: [2021-06-30 21:19:04] file size mismatch

@fredrikekre
Member Author

Mostly on eu-central, but also on e.g. us-east2:

$ gzip -cdk 2021-06-30-pkgserver.log.gz | grep "file size mismatch"
┌ Error: [2021-06-30 08:03:05] file size mismatch
┌ Error: [2021-06-30 20:00:56] file size mismatch
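
The grep above keeps only the header line of each record, so the sizes are lost. A minimal sketch, assuming the log layout shown in the extracts earlier in this thread (a `┌` header, `│   key = value` fields, a `└` footer), that keeps `content_length` and `actual` with each hit. The function names are hypothetical and this is not PkgServer code:

```python
import re

# Layout assumed from the log extracts quoted in this thread.
HEADER = re.compile(r"^┌ (\w+): \[([^\]]+)\] (.*)$")
FIELD = re.compile(r"^│\s+(.+?) = (.*)$")

SAMPLE = """
┌ Error: [2021-05-16 04:32:02] dl_task abnormal termination!
└ @ PkgServer /app/src/resource.jl:402
┌ Error: [2021-05-16 04:32:02] file size mismatch
│   content_length = 125646
│   actual = 50805
└ @ PkgServer /app/src/resource.jl:572
"""


def parse_records(lines):
    """Yield one dict per complete log record (header, fields, footer)."""
    record = None
    for line in lines:
        line = line.rstrip("\n")
        header = HEADER.match(line)
        field = FIELD.match(line)
        if header:
            record = {
                "level": header.group(1),
                "time": header.group(2),
                "message": header.group(3),
                "fields": {},
            }
        elif record is None:
            continue  # text outside any record
        elif field:
            record["fields"][field.group(1)] = field.group(2)
        elif line.startswith("└"):
            yield record
            record = None


def size_mismatches(lines):
    """Yield (time, expected_bytes, actual_bytes) for each size mismatch."""
    for rec in parse_records(lines):
        fields = rec["fields"]
        if rec["message"] != "file size mismatch":
            continue
        if "content_length" in fields and "actual" in fields:
            yield rec["time"], int(fields["content_length"]), int(fields["actual"])


for when, expected, actual in size_mismatches(SAMPLE.splitlines()):
    print(when, expected, actual)
```

For a compressed log, the same functions could be fed from `gzip.open(path, "rt", encoding="utf-8")`, which yields lines lazily.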

@staticfloat
Member

Hmmm, file size mismatch generally means that the amount we transmitted to the client doesn't match what we expected to send, not that the amount we received from the storage server doesn't match.

@fredrikekre
Member Author

Right, so the grepping there might be wrong. But at least every cache corruption I have seen and fixed has had that in the logs. Perhaps the cache becomes corrupted if the transfer to the client is aborted for some reason?

@StefanKarpinski
Collaborator

@staticfloat: is it reading the files directly into the cache location? If so, that's bad: it should fully download a file to a temporary location and verify its Tar.tree_hash value before moving the file into the cache.
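
The download-verify-move pattern described here can be sketched as follows. This is an illustration only, not PkgServer's implementation: it is written in Python rather than Julia, `store_verified` and its parameters are hypothetical, and a SHA-256 digest plus an optional byte count stand in for the `Tar.tree_hash` check. It also assumes the temporary file lives on the same filesystem as the cache, so that the final rename is atomic.

```python
import hashlib
import os
import tempfile


def store_verified(chunks, expected_sha256, cache_path, expected_size=None):
    """Write chunks to a temp file, verify it, then move it into the cache.

    Returns True if the file was promoted, False if verification failed.
    SHA-256 is a stand-in here; the thread proposes checking the tree hash.
    """
    cache_dir = os.path.dirname(cache_path) or "."
    os.makedirs(cache_dir, exist_ok=True)
    digest = hashlib.sha256()
    size = 0
    # Same directory as the target, so os.replace is a same-filesystem rename.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".inprogress")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
                digest.update(chunk)
                size += len(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        size_ok = expected_size is None or size == expected_size
        if size_ok and digest.hexdigest() == expected_sha256:
            os.replace(tmp_path, cache_path)  # readers never see a partial file
            return True
        return False
    finally:
        # After a successful replace the temp path no longer exists.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

With this shape, a truncated transfer such as the 50805-of-125646-byte case in the first log extract would fail the check and never reach the cache path.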

@StefanKarpinski
Collaborator

Did this get resolved?

@racinmat

I haven't observed this error for a few weeks, but maybe that's just a coincidence. I'll keep watching how it behaves.

@racinmat

I have observed it at this URL, for Tables.jl version 1.5.0: https://pkg.julialang.org/package/bd369af6-aec1-5ad0-b16a-f7cc5008161c/d0c690d37c73aeb5ca063056283fde5585a41710

@StefanKarpinski
Collaborator

The version I downloaded just now was complete and correct (correct tree hash).

@fredrikekre
Member Author

Yeah, I removed the bad cached version.

@fredrikekre
Member Author

fredrikekre commented Feb 4, 2022

Happened again (us-west2 this time though):

$ curl -fsSL https://eu-central.pkg.julialang.org/package/3b182d85-2403-5c21-9c21-1e1f0cc25472/344bf40dcab1073aca04aa0df4fb092f920e4011 | sha256sum 
d77d43daa270150a19dd5b3b99eb408765c8282434f22cecf0cffdbc61d145a6  -

$ curl -fsSL https://us-west2.pkg.julialang.org/package/3b182d85-2403-5c21-9c21-1e1f0cc25472/344bf40dcab1073aca04aa0df4fb092f920e4011 | sha256sum 
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  -

# Nuked the cache here...

$ curl -fsSL https://us-west2.pkg.julialang.org/package/3b182d85-2403-5c21-9c21-1e1f0cc25472/344bf40dcab1073aca04aa0df4fb092f920e4011 | sha256sum 
d77d43daa270150a19dd5b3b99eb408765c8282434f22cecf0cffdbc61d145a6  -
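
One detail in the output above: `e3b0c442...b855` is the SHA-256 digest of empty input, so `sha256sum` received no bytes from us-west2. That suggests an empty response body, or a request that failed without producing output, rather than a truncated tarball. A small sketch of the check (the helper name `looks_empty` is hypothetical):

```python
import hashlib

# SHA-256 of zero bytes of input.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()


def looks_empty(digest):
    """Return True if a hex digest is the SHA-256 of empty input."""
    return digest.strip().lower() == EMPTY_SHA256


# Digests copied from the us-west2 output above, before and after the cache was cleared.
before = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
after = "d77d43daa270150a19dd5b3b99eb408765c8282434f22cecf0cffdbc61d145a6"
print(looks_empty(before), looks_empty(after))
```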

@KristofferC
Member

How does it end up corrupted in the cache in the first place? Shouldn't it be verified before it is put in there?

@fredrikekre
Member Author

Again:

$ for prefix in eu-central us-west us-east sa in kr sg au; do
> echo ${prefix}
> curl -fsSL ${prefix}.pkg.julialang.org/package/ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a/0aafd5121c6e1b6a83bd3bb341da45f058225a9b | sha256sum -
> done
eu-central
1be8b291c6e54caaa998a28ba62b6ca7a794ec57a67451f0569918b0dbdd6a96  -
us-west
1be8b291c6e54caaa998a28ba62b6ca7a794ec57a67451f0569918b0dbdd6a96  -
us-east
1be8b291c6e54caaa998a28ba62b6ca7a794ec57a67451f0569918b0dbdd6a96  -
sa
1be8b291c6e54caaa998a28ba62b6ca7a794ec57a67451f0569918b0dbdd6a96  -
in
14363f099eed8a59ed77069e720db705fee7c4543cac654cfd1a09fc6cab1204  -          <----- 
kr
1be8b291c6e54caaa998a28ba62b6ca7a794ec57a67451f0569918b0dbdd6a96  -
sg
1be8b291c6e54caaa998a28ba62b6ca7a794ec57a67451f0569918b0dbdd6a96  -
au
1be8b291c6e54caaa998a28ba62b6ca7a794ec57a67451f0569918b0dbdd6a96  -
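
The comparison above can be automated with a majority vote over the per-server digests. A minimal sketch, using the digests from this run as input. The function name is hypothetical, and agreeing with the majority is only a heuristic: the authoritative check is still the tree hash of the tarball contents.

```python
from collections import Counter

GOOD = "1be8b291c6e54caaa998a28ba62b6ca7a794ec57a67451f0569918b0dbdd6a96"
BAD = "14363f099eed8a59ed77069e720db705fee7c4543cac654cfd1a09fc6cab1204"

# Server prefix -> SHA-256 digest, as printed by the loop above.
OBSERVED = {
    "eu-central": GOOD,
    "us-west": GOOD,
    "us-east": GOOD,
    "sa": GOOD,
    "in": BAD,
    "kr": GOOD,
    "sg": GOOD,
    "au": GOOD,
}


def find_outliers(digests):
    """Return the servers whose digest differs from the most common one."""
    if not digests:
        return []
    # On a tie, Counter keeps the digest that was seen first.
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(server for server, d in digests.items() if d != majority)


print(find_outliers(OBSERVED))
```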
