USN database downloads are interrupted when the site is redeployed #36
I think a quick solution for this could be to increase one of the timeout settings. I will do some testing with this setting. |
@WillMoggridge if a client is downloading a copy of the USN database, will new deployments of the USN website need to wait for that client to finish its download? I want to make sure that someone can't prevent us from publishing new USNs by simply repeatedly downloading the USN database in a loop. |
@tyhicks They will not be able to block a new release. Once a container is set for termination, no new connections can be made to it, but the existing downloads will be allowed to finish. There will also be a time limit for existing connections, which we set and can tweak. While we are waiting for existing connections to close, the new containers will start up and serve the new site. |
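As a point of reference, a hedged sketch of how such a drain window could be tuned in Kubernetes; the deployment name (usn-site) and the 600-second value are illustrative assumptions, not taken from this thread:

# Hypothetical: give in-flight downloads up to 10 minutes to finish
# before old pods are killed during a redeploy
kubectl patch deployment usn-site --type merge \
    -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":600}}}}'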
@WillMoggridge that sounds like the perfect solution |
@WillMoggridge Hi! Any update here? This is fairly urgent to get corrected since it affects Landscape users. |
This appears to be happening literally every minute. Is something causing the containers to be continually recycled at the moment? You can test simply with a rate-limited download (see the sketch below); it continually disconnects at a semi-random interval between 60-120 seconds:
2018-03-22 15:32:53-- https://usn.ubuntu.com/usn-db/database.pickle.bz2
To be clear, this is causing major problems for Landscape users with slow enough connections (150KB/s-300KB/s), who can never successfully download the 15MB pickle file. (Currently deployed versions do not attempt to resume the download, although resuming is still not always an ideal solution, since the database may change out from under them if it is regenerated.) |
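A minimal sketch of such a test, assuming a wget download throttled to the speeds mentioned above (the 200k value is illustrative, not from the original comment):

# Throttle the download so it runs long enough to hit the mid-transfer disconnects
wget --limit-rate=200k https://usn.ubuntu.com/usn-db/database.pickle.bz2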
I was able to verify this and I also made sure that a new deployment of the USN website wasn't happening at the same time. We'll need @nottrobin or @WillMoggridge to investigate this. |
The solution should now be pushed live; I wanted to check in and see whether any improvements have been seen, but it sounds like the situation is still lacking. We will investigate why those drops are happening. Separately, we have been talking with IS, who are working on a high-priority ticket (RT#109653) to build a new full caching layer. We hope this will be a full solution to these problems, and it is progressing well. |
Just confirming that as of right now the drops are still happening |
I want to update you that I am still looking into a fix for the timeouts. I am talking with IS and continuing to investigate. |
Still seeing this issue:
2018-04-07 14:03:46-- https://usn.ubuntu.com/usn-db/database.pickle.bz2 |
I am no longer seeing disconnects at exactly every 2 minutes; sometimes I do, and other times it takes longer. But I still always see it eventually. I wanted to try different IPs, but I can't find a way to make wget/curl use a specific IP to see if there is a difference between them.
Today from 162.213.33.205:
2018-04-09 15:08:57-- https://usn.ubuntu.com/usn-db/database.pickle.bz2
2018-04-09 15:16:36 (10.0 KB/s) - Connection closed at byte 4684437. Retrying. |
Curl has a --resolve switch or something similar that you can use to give a specific IP address for a given name:port pair. Thanks |
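For example, a sketch of that usage (the IP is one of the addresses mentioned above; the 443 mapping assumes HTTPS):

# Pin usn.ubuntu.com to a specific backend IP for this request
curl --resolve usn.ubuntu.com:443:162.213.33.205 \
     --output /dev/null https://usn.ubuntu.com/usn-db/database.pickle.bz2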
Thanks for the tip, that works great using --resolve. I'm seeing drops roughly every 5 minutes from a couple of different IPs (.20, .207), sometimes up to 10 minutes but more commonly 5. I won't post further updates on which IPs take how long, as I don't see a specific pattern; I just wanted to make the point that it now seems more variable than before -- previously it was reliably every ~2 minutes, now it is usually every 5-10 minutes and occasionally a bit longer. |
Ha, I love the blackhole tip, thanks! |
I'm also seeing the same behaviour that Trent reports. I ran this curl command with a 20k speed limit multiple times this morning and they all failed:

curl https://usn.ubuntu.com/usn-db/database.pickle.bz2 --output /dev/null --limit-rate 20k

  % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
 33 15.1M   33 5284k    0     0  16300      0  0:16:17  0:05:32  0:10:45  12444
 40 15.1M   40 6302k    0     0  16297      0  0:16:17  0:06:36  0:09:41  15308
 67 15.1M   67 10.2M    0     0  16319      0  0:16:16  0:10:57  0:05:19  18969
 12 15.1M   12 1888k    0     0  16385      0  0:16:12  0:01:58  0:14:14  14254
 10 15.1M   10 1637k    0     0  16277      0  0:16:18  0:01:43  0:14:35  14361
 16 15.1M   16 2563k    0     0  16307      0  0:16:16  0:02:41  0:13:35  20317
 63 15.1M   63 9834k    0     0  16295      0  0:16:17  0:10:18  0:05:59  20778

The last 4 tests failed at: Wed Apr 11 09:09:49 EDT 2018 |
I can confirm that this is indeed the upstream USN server going away and being replaced by another server instance. I ran 4 parallel instances in a loop of {1..10} to download the database.pickle file, and all 4, when started, reached the same physical server upstream. At around the 3-4 minute mark in this case, all 4 instances went down, and the next iteration of the loop continued, reaching another server entirely; again, all reached the same hostname, but a different host than in the previous loop.
Look carefully at X-Hostname in each loop, and you'll see that it's changing when it gets dropped. You can see this by executing the following:
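A minimal sketch of such a check, assuming the site returns an X-Hostname response header as described above:

# Print the X-Hostname header once per iteration to watch the backend change
for i in {1..10}; do
    curl -sI https://usn.ubuntu.com/usn-db/database.pickle.bz2 | grep -i '^x-hostname'
done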
We could add resume code here, using the Content-Length of the remote resource and feeding that into curl with '-C -'. Here's some sample code I've written that does exactly this:
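A minimal sketch of that approach (the author's actual script is not shown here; the output filename is an assumption):

# POC: resume the download with '-C -' until the local file matches the
# remote Content-Length
URL=https://usn.ubuntu.com/usn-db/database.pickle.bz2
OUT=database.pickle.bz2

# Ask the server for the full size of the remote resource
LENGTH=$(curl -sI "$URL" | tr -d '\r' | awk 'tolower($1) == "content-length:" {print $2}')
[ -n "$LENGTH" ] || exit 1

# Keep resuming until the local file reaches the expected size; note this
# does not handle the database being regenerated mid-download
while [ "$(stat -c %s "$OUT" 2>/dev/null || echo 0)" -lt "$LENGTH" ]; do
    curl -C - -o "$OUT" "$URL"
done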
Update: cleaner curl resume/download code. This correctly restarts and resumes when the upstream/remote server goes away and is replaced by another nginx instance, so the download does not "fail" or get truncated, but continues until 100%. This is purely POC code to demonstrate how this could be addressed in Landscape's use of curl itself to download the pickle file. |
I'm confused; are you proposing that we modify all clients that consume this data? Thanks |
https://askubuntu.com/questions/1012806/landscape-error-downloading-usn-pickle-from-https-usn-ubuntu-com-usn-db-data
If a client is downloading the USN database and a new version of the site is deployed, the client will see an error and the download will fail.
You can reproduce this issue by downloading the database-all.pickle file:
$ curl -o /dev/null https://usn.ubuntu.com/usn-db/database-all.pickle
After the download begins, immediately rebuild the site (be sure to click the "clean" box). The curl command will fail once the "Deploy to Kubernetes" stage of the deployment job starts:
curl: (56) GnuTLS recv error (-9): A TLS packet with unexpected length was received.