Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vaultwarden taking 200 seconds to time out icon requests #2118

Closed
siberx opened this issue Nov 26, 2021 · 12 comments · Fixed by #2125
Closed

Vaultwarden taking 200 seconds to time out icon requests #2118

siberx opened this issue Nov 26, 2021 · 12 comments · Fixed by #2125
Labels
enhancement New feature or request

Comments

@siberx
Copy link

siberx commented Nov 26, 2021

I have just got a new vaultwarden instance set up to test out, and things are mostly working correctly. I'm encountering an issue with icon downloads however, that substantially impacts application performance.

As far as I understand, the default icon download timeout is supposed to be 10 seconds, but my instance appears to be taking exactly 200 seconds to time out icon requests (a representative log snippet below). Since the timeout is so long, all Rocket threads get blocked and the application hangs until the process completes (which takes a long time on startup or whenever the icon cache misses expire).

I have tried manually setting the ICON_CACHE_TIMEOUT environment variable to something explicit (5 seconds) and this has no effect; requests still take 200 seconds to time out.

The database is imported from a Keepass XML export and has about 160 entries. My current docker run command is below:
docker run -d --name vaultwarden -v /storage/store/docker-configs/vw-data:/data/ -p 8124:80 -p 3012:3012 -e WEBSOCKET_ENABLED=true -e ICON_DOWNLOAD_TIMEOUT=5 vaultwarden/server:latest
And I'm running this on a Fedora server using nginx as a reverse proxy to front the vaultwarden instance with Letsencrypt certificates. This part otherwise appears to be working fine, and I'm able to connect to the instance via web interface or mobile app and websockets are even working correctly.

Any idea what could be causing this suspiciously specific and overly long 200-second timeout for icon downloads?

Log snippet (note the chunk of requests at 21:48:08 followed by no activity while all 10 threads block, then another chunk at 21:50:58

[2021-11-26 21:48:08.742][request][INFO] GET /icons/www.nedro.nl/icon.png,
[2021-11-26 21:48:08.742][response][INFO] GET /icons/<domain>/icon.png (icon) => 200 OK,
[2021-11-26 21:48:08.742][request][INFO] GET /icons/www.newpower99.com/icon.png,
[2021-11-26 21:48:08.743][request][INFO] GET /icons/ontariotelescope.com/icon.png,
[2021-11-26 21:48:08.743][response][INFO] GET /icons/<domain>/icon.png (icon) => 200 OK,
[2021-11-26 21:48:08.743][request][INFO] GET /icons/www.pishop.ca/icon.png,
[2021-11-26 21:48:08.743][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. bennvenn.myshopify.com,
[2021-11-26 21:48:08.744][response][INFO] GET /icons/<domain>/icon.png (icon) => 200 OK,
[2021-11-26 21:48:08.744][request][INFO] GET /icons/www.pixiv.net/icon.png,
[2021-11-26 21:48:08.745][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. www.astrobin.com,
[2021-11-26 21:48:08.745][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. www.aztecharmory.com,
[2021-11-26 21:48:08.745][response][INFO] GET /icons/<domain>/icon.png (icon) => 200 OK,
[2021-11-26 21:48:08.746][request][INFO] GET /icons/www.quadrigacx.com/icon.png,
[2021-11-26 21:48:08.747][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. www.astronomik.com,
[2021-11-26 21:48:08.747][response][INFO] GET /icons/<domain>/icon.png (icon) => 200 OK,
[2021-11-26 21:48:08.747][request][INFO] GET /icons/www.reddit.com/icon.png,
[2021-11-26 21:50:58.851][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. www.canadianpetconnection.ca,
[2021-11-26 21:50:58.851][response][INFO] GET /icons/<domain>/icon.png (icon) => 200 OK,
[2021-11-26 21:50:58.852][request][INFO] GET /icons/keepass.info/icon.png,
[2021-11-26 21:50:58.858][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. sso.epost.ca,
[2021-11-26 21:50:58.858][response][INFO] GET /icons/<domain>/icon.png (icon) => 200 OK,
[2021-11-26 21:50:58.859][request][INFO] GET /icons/www.repalcementlaptopkeys.com/icon.png,
[2021-11-26 21:50:58.864][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. www.canakit.com,
[2021-11-26 21:50:58.864][response][INFO] GET /icons/<domain>/icon.png (icon) => 200 OK,
[2021-11-26 21:50:58.864][request][INFO] GET /icons/www.robotshop.com/icon.png,
[2021-11-26 21:50:58.873][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. www.pishop.ca,
[2021-11-26 21:50:58.874][response][INFO] GET /icons/<domain>/icon.png (icon) => 200 OK,
[2021-11-26 21:50:58.874][request][INFO] GET /icons/www.rockymountainatvmc.com/icon.png,
[2021-11-26 21:50:58.874][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. www.quadrigacx.com,
[2021-11-26 21:50:58.874][response][INFO] GET /icons/<domain>/icon.png (icon) => 200 OK,
[2021-11-26 21:50:58.875][request][INFO] GET /icons/www.tealyra.ca/icon.png,
[2021-11-26 21:50:58.876][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. www.newpower99.com,
[2021-11-26 21:50:58.876][response][INFO] GET /icons/<domain>/icon.png (icon) => 200 OK,
[2021-11-26 21:50:58.876][request][INFO] GET /icons/www.schiit.com/icon.png,
[2021-11-26 21:50:58.880][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. defense-prod05.conferdeploy.net,
[2021-11-26 21:50:58.880][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. privatedivision.com,
@siberx
Copy link
Author

siberx commented Nov 26, 2021

Additional information:
Once the process eventually completes, it seems like I only end up with miss entries in the icon_cache folder anyway and the icons don't show up despite all the 200 OK responses in the logs for at least some of the queries.

@siberx
Copy link
Author

siberx commented Nov 26, 2021

And one more additional log chunk (cleaned up to remove websocket debug entries inline with these) after deliberately deleting just a couple icon_cache miss files and enabling debug and extended logging:

[2021-11-26 22:14:12.583][request][INFO] GET /icons/abetterrouteplanner.com/icon.png
[2021-11-26 22:14:12.583][request][INFO] GET /icons/aliexpress.com/icon.png
[2021-11-26 22:14:52.645][reqwest::connect][DEBUG] starting new connection: https://abetterrouteplanner.com/
[2021-11-26 22:14:52.645][reqwest::connect][DEBUG] starting new connection: https://aliexpress.com/
[2021-11-26 22:14:56.963][request][INFO] GET /alive
[2021-11-26 22:14:56.964][response][INFO] GET /alive (alive) => 200 OK
[2021-11-26 22:15:12.590][request][INFO] GET /images/fa-globe.png
[2021-11-26 22:15:12.591][response][INFO] GET /<p..> [10] (web_files) => 200 OK
[2021-11-26 22:15:17.662][reqwest::connect][DEBUG] starting new connection: http://abetterrouteplanner.com/
[2021-11-26 22:15:17.662][reqwest::connect][DEBUG] starting new connection: http://aliexpress.com/
[2021-11-26 22:15:22.662][vaultwarden::api::icons][DEBUG] [get_icon_url]: Trying with www. prefix 'www.abetterrouteplanner.com'
[2021-11-26 22:15:22.662][vaultwarden::api::icons][DEBUG] [get_icon_url]: Trying with www. prefix 'www.aliexpress.com'
[2021-11-26 22:15:42.674][reqwest::connect][DEBUG] starting new connection: https://www.abetterrouteplanner.com/
[2021-11-26 22:15:42.680][reqwest::connect][DEBUG] starting new connection: https://www.aliexpress.com/
[2021-11-26 22:15:57.088][request][INFO] GET /alive
[2021-11-26 22:15:57.089][response][INFO] GET /alive (alive) => 200 OK
[2021-11-26 22:16:07.694][reqwest::connect][DEBUG] starting new connection: http://www.abetterrouteplanner.com/
[2021-11-26 22:16:07.700][reqwest::connect][DEBUG] starting new connection: http://www.aliexpress.com/
[2021-11-26 22:16:32.714][reqwest::connect][DEBUG] starting new connection: https://abetterrouteplanner.com/
[2021-11-26 22:16:32.721][reqwest::connect][DEBUG] starting new connection: https://aliexpress.com/
[2021-11-26 22:16:37.714][vaultwarden::api::icons][DEBUG] error sending request for url (https://abetterrouteplanner.com/favicon.ico): operation timed out
[2021-11-26 22:16:37.721][vaultwarden::api::icons][DEBUG] error sending request for url (https://aliexpress.com/favicon.ico): operation timed out
[2021-11-26 22:16:57.213][request][INFO] GET /alive
[2021-11-26 22:16:57.213][response][INFO] GET /alive (alive) => 200 OK
[2021-11-26 22:16:57.732][reqwest::connect][DEBUG] starting new connection: http://abetterrouteplanner.com/
[2021-11-26 22:16:57.741][reqwest::connect][DEBUG] starting new connection: http://aliexpress.com/
[2021-11-26 22:17:02.732][vaultwarden::api::icons][DEBUG] error sending request for url (http://abetterrouteplanner.com/favicon.ico): operation timed out
[2021-11-26 22:17:02.732][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. abetterrouteplanner.com
[2021-11-26 22:17:02.733][response][INFO] GET /icons/<domain>/icon.png (icon) => 200 OK
[2021-11-26 22:17:02.741][vaultwarden::api::icons][DEBUG] error sending request for url (http://aliexpress.com/favicon.ico): operation timed out
[2021-11-26 22:17:02.741][vaultwarden::api::icons][ERROR] Error downloading icon: Empty response or unable find a valid icon. aliexpress.com
[2021-11-26 22:17:02.742][response][INFO] GET /icons/<domain>/icon.png (icon) => 200 OK

@siberx
Copy link
Author

siberx commented Nov 27, 2021

I have discovered that this issue manifests when DNS is not working/available from the docker instance running vaultwarden. In my case, I have pihole set up as a DNS resolver running as another docker container on the same host, and due to how the docker network bridging/masquerading works, DNS queries were successfully reaching pihole but were getting returned from an unexpected interface (the docker host bridge interface rather than the external local network interface the query was sent to) which caused the DNS querier on the vaultwarden instance to reject the responses silently.

Altering my pihole deployment as per the below link to ensure it only binds to port 53 on the external host interface resolved the nonfunctional DNS in my vaultwarden container, and by extension "worked around" the above bug because DNS queries now succeed and the icons come in correctly or time out quickly.
https://discourse.pi-hole.net/t/solve-dns-resolution-in-other-containers-when-using-docker-pihole/31413

I would still like to know why this pathologically slow timeout occurs in the nonfunctional DNS case; ideally vaultwarden's icon requests should time out in the expected time regardless of cause (10 seconds by default) in this scenario, preventing broken DNS from resulting in a hung web interface for many minutes as all threads are blocked with queries that will never succeed. Of course, in this scenario you can avert the issue by disabling the icon lookups entirely, but this requires additional steps after container deployment (resulting in poor initial user experience that might discourage a new user from continuing to use vaultwarden) and will also result in poor behaviour any time a transient failure in DNS resolution occurs with no obvious indication to the user as to why their instance is hanging.

@BlackDex
Copy link
Collaborator

First, glad that you have found the solution.
Further, this is kinda hard to tackle. There is no timeout for dns resolving to be configured since the host functions are used to resolve DNS and the host determines the timeout also.

I briefly thought about setting a flag that if one resolving took way longer to skip the rest afterwards, but that also could cause false positive if there is one broken authoritative resolver which then breaks all the rest of the following queries.

I actually think that this is ok, since you had DNS resolving issues which is why Vaultwarden took a long time resolving. I do not think that it is the task of Vaultwarden to mitigate and prepare for broken systems of configurations, that could be endless.

Since after you have resolved your dns resolving issue Vaultwarden seems to work as it should, i do not think we should change anything on the Vaultwarden side.

@siberx
Copy link
Author

siberx commented Nov 28, 2021

It seems like the way the timeout is handled currently is by directly passing the configured parameter into the reqwest client builder, and it appears that reqwest is only using that timeout for the actual server query and not for delays caused during any implicit lookups that are performed beforehand. I'm not a Rust expert but it might be possible to control the DNS timeout by using the trust-dns resolver instead of the default behaviour of getaddrinfo (this is an option in reqwest from what I can see) or alternatively by running the reqwest query in an async context wrapped by tokio::time's Timeout functionality or similar so that it can be aborted after a specified delay regardless of the step it's on. Maybe it's also possible to run these queries in their own thread or threads separate from the actual Rocket ones so they don't block other behaviour while they're happening, which would also permit killing the threads if they run too long (assuming operations like that aren't too expensive in Rust).

I don't have enough knowledge in this domain to say how much work would be involved with something like this, so I'll leave it up to the experts if this is something worth investigating. At least from the user's perspective, it's somewhat unexpected that setting a specific icon download timeout doesn't actually limit the query to the specified value but instead some potentially much larger value based on the particulars of their host environment. It would be a better experience if the icon request were wrapped in such a way that it timed out after the specified number of seconds no matter the cause, as it's not really important to the user whether the query fails due to DNS, missing route, server unresponsive, or any number of other causes.

For my own instance since I now have DNS working I'm no longer directly affected by this behaviour, so this is mostly just input/feedback on potential improvements at this point. Feel free to leave this open if there's interest in investigating further or closing if it will be left with the current behaviour.

@BlackDex
Copy link
Collaborator

I do agree on user experience. Though misconfiguration of the host and dns which is taking a long time could be a nice and good indicator to start. And, users also want to see those icons, if we cut off to quickly and not show icons at all the world be bad use experience also.

I did looked at trust-dns, though i also saw issues regarding the use of that in different platforms. Maybe they are solved right now that i haven't checked.

There is an async branch of Vaultwarden which is using tokio and a general timeout could be added there maybe.

Regarding threads, that is currently already there case, Rocket starts with an x amount of default threads and you can configure that. Though creating 160+ threads in your case could have been an issue also when suggesting to just create threads for all requests. Maybe the default is a bit low for your system, but on low powered devices it is probably better to be on the safe side. Just change ROCKET_WORKERS to an higher value for more threads to be able to start.

@siberx
Copy link
Author

siberx commented Nov 28, 2021

My point about threads was to have a small number (1-4 maybe) in a separate thread pool dedicated specifically to performing the icon request lookups (which are potentially way longer-running than generating a typical http response for a bitwarden client), leaving the "standard" pool of Rocket threads for serving web requests from the client as designed. The behaviour noted here is only really an issue because it permits long-running icon queries to block and cause normal web client and API requests from the bitwarden clients to hang; if you had a thread or two slowly churning away on (and failing) icon queries in the background, it would have effectively no impact on the user, especially since they're not going to receive their icons anyway in this configuration so it really doesn't matter how long it takes.

As long as the threads intended to provide the "web server" portions of vaultwarden as running within Rocket can also get hung up issuing their own long-running web client requests (for icons in this case), it won't really matter how many threads you throw into the main Rocket thread pool for large enough vaultwarden databases with many entries/URLs requiring icon lookups, it will just delay the problem a bit. Either having some wrapped timeout mechanism as you mention in the async branch or having a separate thread pool for icon requests would do the trick I think, though.

@BlackDex
Copy link
Collaborator

I do agree on the blocking part regarding fetching icons and the rest of the api/server.
Lets see what i can test locally both regarding the blocking and async branches.

@BlackDex
Copy link
Collaborator

A quick test.
main branch: It blocks fully if there are multiple icon requests running, then no other requests are handled at all.
async branch: It still allows requests to other pages (as it should) but reqwest timesout during DNS, probably because of the threaded gethostbyaddr which is used by default.

It looks like the trust-dns has been updated a lot and re-implemented as a feature into reqwest since the last time i really looked at this. Ill see what happens if i enable that feature in both main and async branches.

@BlackDex BlackDex added the enhancement New feature or request label Nov 28, 2021
@BlackDex
Copy link
Collaborator

Small update:
I tested the trust-dns feature for reqwest on both the async and main branch.
On the async branch is seems to solve the issues, on the main branch it does still block a bit, but less and instead of DNS request timeing-out they do resolve now compared to the non-trust-dns enabled version, which just stopped DNS resolving and no icon was fetched at all.

Now, i am willing to enable this feature which seems to help a lot. But i am a bit hesitant into adding a separate thread-pool for the icon endpoint right now. The main reason is, i do not know when @dani-garcia wants to release the async version as the main version. Because that would solve the rest.

I probably need to add trust-dns-resolver because we do some resolving our selfs, which then also improves the performance and less blocking in the end, but that needs a bit more testing.

@dani-garcia
Copy link
Owner

My current plan is to make a new release 1.24 tomorrow or during this week, which will be the last release of the old bitwardenrs images. Then potentially merging the async branch in 1.25 or 1.26.

Knowing that I don't think it makes sense to develop a specific solution for the sync version when it's not a widespread issue and it's gonna be obsolete soon, we can add the trust-dns feature if you see an improvement with it enabled though.

@BlackDex
Copy link
Collaborator

That's what i thought indeed. I'll see if i have some time tomorrow or the day after to add it and charge the custom dns lookups to also use the trust-dns-resolver.

BlackDex added a commit to BlackDex/vaultwarden that referenced this issue Dec 1, 2021
- Enabled trust-dns feature which seems to help a bit when DNS is
causing long timeouts. Though in the blocking version it is less visible
then on the async branch.
- Updated crates
- Removed some redundant code
- Updated javascript/css libraries

Resolves dani-garcia#2118
Resolves dani-garcia#2119
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants