PkgServer synchronization (Pkg Server version of General is delayed relative to Git clone of General) #16777
Comments
The Pkg client should fall back to fetching the package version directly from GitHub, no? |
The issue here is that the pkg client downloads the out-of-date "latest" registry directly from the pkg server, and then finds no v0.2.6 during version resolution.
I'm not sure how frequently the storage server updates, but there's still a time gap here. |
@staticfloat How frequently does the Pkg server update its copy of the General registry? |
It runs in a loop continuously pulling the registry and updating things. So it’s generally mostly up-to-date but not instantaneous. |
Depending on a point version that you just published in CI seems like kind of a corner case. You have to wait for registration to go through as well, how is waiting for it to get into storage servers any different? |
The difference is that I received a merged notification/email from General and retriggered the CI, and it still failed. Yes, this is an edge case for CI only, and it would be totally fine to do the rest of the work later. It just makes the pipeline less smooth than it was; I usually retrigger the relevant CI when I see the notification. Or should we unset PkgServer in CI? |
It would be nice to get some concrete measurements on this. If the delay between "PR merged in General" to "new version is available from the Pkg server" is 5 minutes, then I think that's no big deal. But if the delay is e.g. 30 minutes, I think that would be annoying and should be fixed. @johnnychen94 How long was the delay for you?
That would deprive the community of some useful telemetry statistics. I'd rather be able to keep using Pkg Server in CI. |
I checked it again after 15 minutes and it failed; then I went away to do other work, and now it works. I'll report if I get more data. It usually takes about 8-15 minutes for an incremental update on my storage server in LAN, but I've also observed 40 minutes on the BFSU mirror. The current loop in gen_static.jl iterates over all packages and all versions, and most of the time is wasted on untarring existing versions only to get |
Personally, I think that 15 minutes is too slow. It would be good to get this delay down to 5 minutes or less, in my opinion. It seems like the only issue here is updating the registry, right? If you have an updated registry, but the tarball of the code is not in the Pkg Server, then you'll just fall back to downloading the tarball from Git. The issue here is that the registry itself is not up to date. Could we have two loops in the Pkg Server, running in parallel side by side? One loop does just the registry. It repeatedly updates the registry. The other loop does everything else. |
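To make the distinction in this comment concrete, here is a minimal sketch of why the tarball fallback does not help when the registry itself is stale. It is an illustration only, not how the Pkg client is implemented; `install`, the `registry` dictionary, and the two fetcher callables are all hypothetical names.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical fetcher type: returns tarball bytes, or None when the
# source does not have the requested package version.
Fetcher = Callable[[str, str], Optional[bytes]]


def install(name: str, version: str,
            registry: Dict[str, Set[str]],
            from_pkg_server: Fetcher,
            from_git: Fetcher) -> bytes:
    """Resolve against the local registry copy, then fetch with fallback."""
    # Version resolution only sees what the (possibly stale) registry lists.
    if version not in registry.get(name, set()):
        raise LookupError(f"{name} v{version} is not in the registry")
    # The registry knows the version: try the Pkg server first.
    data = from_pkg_server(name, version)
    if data is None:
        # Tarball not mirrored yet, so fall back to the Git source.
        data = from_git(name, version)
    if data is None:
        raise LookupError(f"{name} v{version} not found on any source")
    return data
```

With an up-to-date registry, a missing tarball is recovered by the second fetcher; with a stale registry, the function fails before any download is attempted, which is the failure mode reported in this issue.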
Just to elaborate, in the pre-PkgServer days, the delay was essentially zero. As soon as a pull request was merged, you only had to wait a few seconds before updating your registry, since the registry was a Git repo, and GitHub updates Git repos within a matter of seconds. So in my opinion, going from a delay of less than one minute to a delay of greater than or equal to fifteen minutes is a clear regression. |
I just tested and for me it was less than 1 minute. I don't think we can do better than that, we have to let stuff propagate through the system. |
1 minute is definitely fine. I would say anything less than or equal to 5 minutes is fine. |
Personally, I think this can be a good idea in practice. But just to be clear, IIUC it's deliberately designed to update |
It's curious that for Fredrik the update took 1 minute, but for Johnny it took more than 15 minutes. It might be helpful to collect more samples. Perhaps someone could write a script that routinely pulls the registry from the Pkg server, and pulls a list of recently merged PRs from General, and loops through the recently merged PRs (starting with the most recently merged PR) and goes backwards in time until it finds the most recent PR that is included in the registry provided by Pkg server. If we automate that process, we can collect a lot of samples and figure out how common it is for the Pkg server registry to be more than 5 minutes delayed. |
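A rough sketch of the core of such a script follows. It covers only the backwards walk and the resulting lag estimate; fetching the merged PRs from General and the registry from the Pkg server is left to the caller. `MergedPR`, `newest_included`, `registry_lag`, and the `is_in_registry` callback are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class MergedPR:
    number: int
    package: str
    version: str
    merged_at: datetime


def newest_included(prs: Sequence[MergedPR],
                    is_in_registry: Callable[[str, str], bool]
                    ) -> Optional[MergedPR]:
    """Walk merged PRs from newest to oldest; return the first one whose
    package version is present in the registry served by the Pkg server."""
    for pr in sorted(prs, key=lambda p: p.merged_at, reverse=True):
        if is_in_registry(pr.package, pr.version):
            return pr
    return None


def registry_lag(prs: Sequence[MergedPR],
                 is_in_registry: Callable[[str, str], bool],
                 now: datetime) -> timedelta:
    """Lower bound on the delay: the age of the oldest merged PR that is
    newer than the newest included one. Zero if nothing is missing."""
    included = newest_included(prs, is_in_registry)
    missing = [p for p in prs
               if included is None or p.merged_at > included.merged_at]
    if not missing:
        return timedelta(0)
    return now - min(p.merged_at for p in missing)
```

Running `registry_lag` on a schedule and recording the results would give the samples needed to see how often the delay exceeds 5 minutes.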
Yeah I realize now my "two loops" suggestion breaks that promise. Maybe best to look for other solutions. |
It just seems like a bad user experience that if a bug fix is registered and merged in General, now you have to wait an unknown period of time before the bug fix is accessible to you? At the very least, it would be useful to be able to figure out how recent your registry is. Does the Pkg server expose any endpoint that would let me get e.g. the UTC timestamp corresponding to the last time when the Pkg server cloned the registry? So at least I can get a sense of how recent my Pkg server registry is. |
@johnnychen94, are you running a modified version of the |
E.g. do you have |
The 15 minutes I observed is an approximate time in CI and not using my local storage server.
Oops, looks like I didn't set this correctly. My local storage server and BFSU mirror, however, do run a refactored-version of I'll make a script as @DilumAluthge suggested and give a further report |
The following is what I've collected since last 12 hours on
available time: the local time when I get a new hash from pkg/storage server via There are 26 new "discontinuous" commits recorded, but only 19 of them were picked up by the storage server. script: https://gist.github.com/johnnychen94/98fde55fc341d0c967f8f5ef2a48956a |
That's great Johnny, we should track this over time somehow. |
I've just fixed some issues with the Korean storage server; please keep track of the latency of Registry -> https://kr.storage.julialang.org over the next couple of days. I think it should be much better than in the past. One remaining design reason why the registry updates may be slow sometimes is that the storage server does not advertise the new registry hash until it has downloaded and stored all new resources; i.e. it doesn't advertise a registry until it can serve everything referenced by that registry. It's possible we may want to change that in order to expedite registry service, but I'm not 100% sure. In any case, let's see what the user experience is like with the current design, but with fewer bugs. :) |
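The advertising rule described in this comment can be illustrated with a small sketch. The storage server code is not public, so this is not its implementation; `advertised_registry` and its parameters are hypothetical, and resources are modeled as plain strings.

```python
from typing import Dict, Optional, Sequence, Set


def advertised_registry(registries: Sequence[str],
                        required: Dict[str, Set[str]],
                        stored: Set[str]) -> Optional[str]:
    """Return the newest registry hash whose resources are all stored.

    `registries` is ordered oldest to newest, `required` maps each
    registry hash to the resources it references, and `stored` is the
    set of resources the server can already serve.
    """
    for reg_hash in reversed(registries):
        # Advertise a registry only if everything it references is present.
        if required[reg_hash] <= stored:
            return reg_hash
    return None
```

Under this rule, one slow resource in the newest registry holds the advertised hash at an older registry until that resource has been stored.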
Tracking
The first three records seem like a warm-up. Feel free to close this issue when you think it is stable. Just curious, is there any public access to the build script? |
The storage server code is not public and is substantially more complex than the simple static server script. The premise, as outlined in the original Pkg design issue, is that different entities provide independent storage services, which are treated uniformly by the pkg servers. @staticfloat and I have talked about exposing new |
Also, great to see those fast update times! That's what I had always imagined this should be like. It should continue to be like that going forward. |
Would it be possible to eventually open source the storage server code? |
That is not something we're planning. The storage servers are built and maintained by Julia Computing, offered as a free service to the community. A large part of their functionality is interacting with proprietary systems like GitHub and GitLab to get resources and AWS/S3 for persistence, and those capabilities are also key features of JC's JuliaTeam product offering. If anyone else wants to build and maintain a storage service, they should absolutely do so; the protocol is very simple. I do think that we should have an open source script that mirrors the Julia Computing storage servers and serves them statically. That will act as a backup in case the JC storage servers go down. I'm realizing now that since the storage servers are JC proprietary they probably should not be called {us-east,kr}.storage.julialang.org but should instead be named {us-east,kr}.storage.juliahub.com. @staticfloat, how hard would it be to change their host names? |
Not hard.
|
That makes sense to me! |
Based on the fix @JeffFessler mentioned (thanks!), I was able to get this to work by just adding the environment variable PS, without this fix I'm still getting the sync issue more than 36 hours later. |
Just experienced this too with a 5hr+ delay today. |
Yeah, it's a public resource and sometimes very large artifacts get submitted which take a long time to get processed, preventing updates for a while. If you want to see things immediately, you can do |
The delays in the registry updates have caused about ten users of PySR (which has SymbolicRegression.jl as its backend) to raise GitHub issues or email me, despite me pinning an issue with the What happens is: I usually update the Julia backend, wait for it to merge into the registry, and then update the PyPI package (Python package server). PyPI is updated instantly, and my tests pass because the Julia GitHub action uses the git-based registry, but the Julia default registry can sometimes take more than a day, so any user (who doesn't use Julia regularly) who updates PySR sees an error about Julia not being able to find the updated backend. |
The original design of the Pkg/Storage servers is that they only provide registry versions for which they hold the complete package and artifact data, so a fallback like this should not be implemented on the PkgServer side. Currently, the pkg client talks to the pkg server to update its registry. Now I think this issue can be perfectly fixed by adding a registry server which only serves the General registry, so that
An officially hosted registry server also solves the trust issue with 3rd-party pkg servers: the pkg client queries the SHA and URL from the official registry server, downloads from a 3rd-party pkg server, and verifies the downloaded data.
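The query, download, and verify flow proposed here might look roughly like the sketch below. It assumes the "SHA" is a SHA-256 digest of the raw downloaded bytes, which the thread does not specify. `fetch_verified`, `lookup`, and `download` are hypothetical names, not part of any existing Pkg or PkgServer API.

```python
import hashlib
from typing import Callable, Tuple


def fetch_verified(resource: str,
                   lookup: Callable[[str], Tuple[str, str]],
                   download: Callable[[str], bytes]) -> bytes:
    """Get (sha, url) from a trusted registry server, download from an
    untrusted third-party server, and verify before returning the data."""
    expected_sha, url = lookup(resource)   # trusted: official registry server
    data = download(url)                   # untrusted: 3rd-party pkg server
    # Assumption: the digest is SHA-256 over the raw bytes.
    actual_sha = hashlib.sha256(data).hexdigest()
    if actual_sha != expected_sha:
        raise ValueError(f"hash mismatch for {resource}: got {actual_sha}")
    return data
```

Because the expected hash comes only from the official server, a third-party mirror can be slow or unavailable, but it cannot substitute different content without the check failing.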
|
To try to avoid out of date registry errors, following the instructions at: JuliaRegistries/General#16777 (comment)
…. Remove once synced. See JuliaRegistries/General/issues/16777
I am seeing a problem that looks like it is related, with a six hour delay between the registration of the package and now when adding the package fails: https://discourse.julialang.org/t/registered-package-invisible/67533/2 |
I made a Discourse post that summarizes this issue and provides the workaround for users that need immediate access to new packages and new versions. https://discourse.julialang.org/t/general-registry-delays-and-a-workaround/67537?u=dilumaluthge |
With some recent upgrades to the StorageServer, this should now be pretty much fixed. Please shout out if you experience PkgServer registry delays, as they should be eliminated now. We have added some client-side configuration that can be used to communicate to the PkgServer if you would like a more bleeding-edge or conservative registry, see this issue for more detail: JuliaPackaging/PkgServer.jl#144 |
With the pkg server enabled by default since Julia 1.5, there's an issue: for a short period after a release PR is merged in General, the new version is still unavailable from the storage server. Because users don't know whether the pkg server has synced the commit, this frequently breaks CI tests.
One such example: once ImageMorphology v0.2.6 was added to General, I immediately retriggered the CI in JuliaImages/Images.jl#895, and CI on nightly failed because it couldn't find ImageMorphology v0.2.6. In this case, the PR merged notification is a lie to developers 😂
cc: @staticfloat @StefanKarpinski