hsts: add support for Strict-Transport-Security #5896
I just found the email talking about this feature on the mailing list and also had a quick look into the wiki.
Would be incredible if supported from the get-go, we probably could have function just like
This is great @bagder, thanks for taking the time to look into it.
What would be a scenario that
Should the write callback get signaled somehow when the last entry is passed to it?
I see, makes sense.
Yes, I think this would be great. Example: to close the stream that is being used to write the data to.
If possible, the callback could receive the index of the item currently being processed, and the total amount of items, or any other data structure that would convey the same information.
A first shot on read/write callbacks is now provided in this PR as well. The wiki description of the funtions is updated to match the code (and there are initial manpages in the branch). This still lacks test cases for the callbacks. I'll appreciate all and any feedback I can get on this.
DeepCode's analysis on #1c65de found:
I took a quick look at this proposal. It appears that it is based on the idea that the HSTS cache is loaded and saved sequentially from a flat file at library initiation and teardown. I'm not sure this scales - especially if HSTS finally gains widespread traction.
I'm not opposed to a simple flatfile default, but you should consider the alternatives. (A database doesn't have to be massive - SQLite, BerkeleyDB, or even a purpose-built backend might be used.)
In any case, I suggest thinking about scalablitly and the (implict) use models. I think that at a minimum, the API should support a database back end, and that the cache load should happen when a cache miss happens, not in library init. This also includes providing a hostname in an eviction callback, not just "write whatever is left in the cache" at library teardown.
Sorry if I'm missing something - I saw this pop up, but am too time-limited to dive deeply into the code.
Nonetheless, I hope these comments are useful.
 Yes, that's a very large number to make the point. Consider a library user that's a search engine. Or even 10M or 1M.
Thanks for good feedback and excellent questions. I think this also shows that it is a good idea to ship this feature marked as experimental to start with, to let users try things out as they are now and we can polish and improve things according to what users want and experience.
If you use that feature, yes. A library user can deal with it however it likes.
I'm positive most applications using libcurl will not populate the cache with millions of entries even if/when HSTS is used very widely, since very few applications work with that amount of random sites in a browser-like fashion.
If we want to build the system to actually have a huge hsts database, then yes we'd need something different. The current design is made with a small dataset in mind.
Perhaps. But I rather see an actual use for this option before we go ahead and add more bells and whistles.
There's nothing in neither the file format nor in the API for the callbacks that says anything about how the search is to be done. If someone experiences slow HSTS handling I don't think we will object to performance enhancements. I don't think we need to optimize for such a future hypothetical case now but take it as it comes.
Then you're probably asking for an API that lets the application maintain the database and does the entire lookup (because I can't see a future where libcurl bundles some sort of database). That's not provided here but I wouldn't be against it if someone actually provides a viable use case where this is necessary/beneficial.
We could of course perhaps consider such an "asking API" instead of the current load/save API to better allow applications to scale to large amounts of hsts entries.
What is "the entire cache" for you? I would imagine a typical command line use case have a very small cache of just a bunch of hosts so reading "the entire cache" is very fast and small operation. Separate caches for separate invokes. Having two different curl command line tools using the same file on disk will just make the second overwrite the first with the existing functionality.
Standard coding practice: first we deliver something basic that works. If there's a need, we improve it over time. If not, basic is fine.
That's a fair idea. I don't think the current API dictates when the cache gets read so we can work on tweaking that.
- enable in the build (configure) - header parsing - host name lookup - unit tests for the above - CI build - CURL_VERSION_HSTS bit - curl_version_info support - curl -V output - curl-config --features - CURLOPT_HSTS_CTRL - man page for CURLOPT_HSTS_CTRL - curl --hsts (sets CURLOPT_HSTS_CTRL and works with --libcurl) - man page for --hsts - save cache to disk - load cache from disk - CURLOPT_HSTS - man page for CURLOPT_HSTS - added docs/HSTS.md - fixed --version docs - adjusted curl_easy_duphandle