Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: provide meta-data about all easy options #5365

Closed
wants to merge 2 commits into from

Conversation

@bagder
Copy link
Member

@bagder bagder commented May 9, 2020

This is a generated table with information about all existing options
that curl_easy_setopt() might accept. The list contains the option name,
the option value and the option type.

The purpose is to provide detailed enough information to allow for
example libcurl bindings to get option information at run-time about
what options that exist and what of argument they expect.

@bagder
Copy link
Member Author

@bagder bagder commented May 9, 2020

@jeroen, here's a first shot at providing the meta-data that we discussed today.

This currently just makes an array with all known options. Does it contain the necessary information you'd like? What kind of API would your dream scenario provide from libcurl to aid your binding work?

@jeroen
Copy link
Contributor

@jeroen jeroen commented May 9, 2020

@bagder thanks for picking this up.

Maybe it is better to expose this as an iterator rather than an array. I think an array would technically break the ABI when it changes length from version to version?

I think ideally, this would give options from the linked version of libcurl, rather than the headers at compile time. For example when the user upgrades libcurl after compiling the bindings, or when the user has a newer version of libcurl that we target on the build server that compiles the bindings.

I think my dream scenario would involve the following APIs:

typedef enum {
  CURLOT_LONG,    /* long */
  CURLOT_OFF_T,   /* curl_off_t */
  CURLOT_OBJECT,  /* void * */
  CURLOT_STRING,  /* char * */
  CURLOT_SLIST,   /* struct curl_slist * */
  CURLOT_FUNCTION /* function pointer */
} curl_easytype;

typedef struct {
  const char *name;
  CURLoption id;
  curl_easytype type;
} curl_option_info;

/* Lookup options at runtime */
const curl_option_info curl_option_info_get_by_name (const char *name);
const curl_option_info curl_option_info_get_by_id (CURLoption id);

/* Iterates over available options */
curl_option_info *curl_option_info_next(const curl_option_info *x);

Maybe all of these things could be static so that we don't have to worry about free'ing them.

I did not entirely invent this of course. Ffmpeg's libav have similar api's for listing codes and filters (ffmpeg may be compiled with any possible combination of hundreds of codes and filters). E.g. the docs for avfilter show:

@bagder
Copy link
Member Author

@bagder bagder commented May 9, 2020

Thanks! I like those; simple and easy. I'll get back with an update soon.

@bagder bagder force-pushed the bagder/curl-options-types branch from 5e05f4f to d912612 May 9, 2020
@bagder
Copy link
Member Author

@bagder bagder commented May 9, 2020

This iteration provides this API:

typedef enum {
  CURLOT_LONG,    /* long */
  CURLOT_OFF_T,   /* curl_off_t */
  CURLOT_OBJECT,  /* void * */
  CURLOT_STRING,  /* char * */
  CURLOT_SLIST,   /* struct curl_slist * */
  CURLOT_FUNCTION /* function pointer */
} curl_easytype;

struct curl_easyoption {
  const char *name;
  CURLoption id;
  curl_easytype type;
};

const struct curl_easyoption *curl_easy_option_by_name(const char *name);
const struct curl_easyoption *curl_easy_option_by_id (CURLoption id);
const struct curl_easyoption *
curl_easy_option_next(const struct curl_easyoption *prev);

I've been pondering on marking options that pass pointers to callbacks with some special marker instead of just CURLOT_OBJECT, but I'm not sure it helps anyone.

Are there any options that are insufficiently detailed with these provided types?

This API can of course still introduce other option types in the future (like the blob one in #5357)

@jeroen
Copy link
Contributor

@jeroen jeroen commented May 9, 2020

Are there any options that are insufficiently detailed with these provided types?

It may be useful distinguish long types that are either a number (like CURLOPT_INFILESIZE), or ENUM value (like CURLOPT_USE_SSL), or boolean (CURLOPT_VERBOSE).

Exposing this information might allow for better type checking in the bindings.

@dfandrich
Copy link
Collaborator

@dfandrich dfandrich commented May 9, 2020

I'm a bit worried about the extensibility of this API. There's no safe way to extend curl_easyoption with more fields in the future without potentially breaking clients (especially a client compiled against a newer libcurl but linked at run-time against an older one). There are a few approaches we could take to make that possible, but we should make sure we think about it now.

Also, it seems like curl_option_info_next() is only going to be useful if the application can start from the first one, but there's no way to get that.

@jeroen
Copy link
Contributor

@jeroen jeroen commented May 9, 2020

There's no safe way to extend curl_easyoption with more fields in the future without potentially breaking clients (especially a client compiled against a newer libcurl but linked at run-time against an older one).

Well yes like everything the ABI will be backward compatible, not forward compatible. I don't think that is a concern, the same will hold for any new API or symbol. One should be able to upgrade libcurl and expect things will keep working. Obviously build servers need to be conservative when choosing the target versions of libcurl and OS.

Also, it seems like curl_option_info_next() is only going to be useful if the application can start from the first one, but there's no way to get that.

If you pass NULL as argument it will start a new iterator at the first one. And when it returns NULL it means you've reached the end. I should have mentioned this (but I think it's sort of obvious).

@dfandrich
Copy link
Collaborator

@dfandrich dfandrich commented May 9, 2020

It probably happens more often than you realize that applications are run with an older libcurl than on which they were compiled. It surely can happen and we should make sure that we don't crash such applications when it does. libcurl is for the most part, backwards and forwards compatible, and any new APIs should be, too.

@bagder
Copy link
Member Author

@bagder bagder commented May 10, 2020

Right, I didn't explain exactly how to use the functions yet. I figured we should agree somewhat first before I spend time on that

@dfandrich 's comment:

There's no safe way to extend curl_easyoption with more fields in the future

We can fix that by for example

  1. returning an opaque handle and provide access functions to get the contents of the fields. Alternatively, we can use the method we do elsewhere and
  2. have a 'generation' member in the struct so that the caller knows what fields to expect to be present. Or a third option:
  3. we add an opaque pointer to the struct now, that we can use the future for additional access functions if we come up with other data to return...

@jeroen's comment:

It may be useful distinguish long types that are either a number (like CURLOPT_INFILESIZE), or ENUM value (like CURLOPT_USE_SSL), or boolean (CURLOPT_VERBOSE).

Or even "bitmask" which is also used for some.

Not a bad idea. The only little thing that it made me think of, is that curl_easy_setopt() takes a 'long' even for the enums and booleans. It makes us able to extend options in the future without breaking anything. For example, in the future we could make CURLOPT_VERBOSE accept a 2 as well, not just being a boolean. We need to consider that, if we let curl_easytype get a boolean type: it might change to a long or an enum in the future!

@bagder
Copy link
Member Author

@bagder bagder commented May 10, 2020

In order to generate the include/curl/typecheck-gcc.h header, I need to mark up the curl.h header much more...

@jeroen
Copy link
Contributor

@jeroen jeroen commented May 10, 2020

For example, in the future we could make CURLOPT_VERBOSE accept a 2 as well, not just being a boolean. We need to consider that, if we let curl_easytype get a boolean type: it might change to a long or an enum in the future!

You are right. Perhaps we could treat boolean as a special case of enum with values 0 and 1.

But if the API always provides the current correct type to the bindings, there shouldn't be any issue I think? The bindings lookup the type for CURLOPT_BOOLEAN in the current version of libcurl at runtime so it doesn't matter if this has changed from previous versions?

What I am really after of course is some way to do input validation for the option values that the user is trying to set.

Another way to go about this is if you would include a field (e.g. long[]) to identify the valid values is for this option. For example for CURLOPT_VERBOSE it would be [0, 1] and or CURLOPT_USE_SSL it currently includes [CURL_HTTP_VERSION_1_0, CURL_HTTP_VERSION_1_1, CURL_HTTP_VERSION_2, CURL_HTTP_VERSION_2_0, CURL_HTTP_VERSION_2_PRIOR_KNOWLEDGE, CURL_HTTP_VERSION_2TLS, CURL_HTTP_VERSION_3, CURL_HTTP_VERSION_NONE].

@dfandrich
Copy link
Collaborator

@dfandrich dfandrich commented May 10, 2020

@bagder
Copy link
Member Author

@bagder bagder commented May 10, 2020

ping @weltling, as someone who's worked with the PHP/CURL binding. Is this kind of meta-data something you or your fellow PHP peeps think would be good in your work and is there anything we should improve?

@bagder
Copy link
Member Author

@bagder bagder commented May 10, 2020

I'm a little afraid of letting this meta information include details that either second-guesses what the code will do (and thus introducing duplicated logic that might not even be a perfect duplicate) or doing something that steps on the API/ABI promises or allowances we have.

For this reason, I don't think this meta-data should contain range information. It will make it error-prone and complicated since the range might not be a simple one that is the same for everyone.

If we say an option is a boolean in one version, someone might then do something that will depend on the option being a boolean and thus break if we change it to long in the next release - even if we document that we can do it and that there's this risk. Upsetting users or tricking them into making mistakes is never good. But sure, there's quite a large number of long-accepting options that really are booleans... can we do it in a way that doesn't mislead anyone?

I think a binding is better off letting libcurl do the range and value checks and return the proper return code when reaching outside the boundaries.

@jzakrzewski
Copy link
Contributor

@jzakrzewski jzakrzewski commented May 11, 2020

Edit: I guess I have just basically duplicated #5365 (comment) ;)

What I'm missing (well - it's more like a "I wish I had") is an indication that a particular option is meant to be a constant / set of flags. It doesn't have to go into the details but once I have a failing tests for my wrapper it would be beneficial to know if I can just put another "long" option there or do I have to actually create a type.

@bagder
Copy link
Member Author

@bagder bagder commented May 11, 2020

Based on @jzakrzewski's comment I took a pass over all existing long-accepting options and I found the ones below that don't take a regular "range of values" (where the smallest range is 0-1):

  • CURLOPT_SSLVERSION
  • CURLOPT_PROXY_SSLVERSION
  • CURLOPT_TIMECONDITION
  • CURLOPT_NETRC
  • CURLOPT_HTTP_VERSION
  • CURLOPT_RTSP_REQUEST
  • CURLOPT_POSTREDIR

(did I miss any?)

These are options for which a user really cannot easily guess what values to pass to them without reading the docs and using the supplied extra symbols for those values.

Would it be beneficial to mark such options specially as accepting long but in need some extra magic for the value?

@jeroen
Copy link
Contributor

@jeroen jeroen commented May 11, 2020

What about e.g. https://curl.haxx.se/libcurl/c/CURLOPT_USE_SSL.html ?

Would it be beneficial to mark such options specially as accepting long but in need some extra magic for the value?

Yes that would be helpful. Then we can do something in the bindings that distinguishes this to make sure the user understands that this is not a regular number.

@jzakrzewski
Copy link
Contributor

@jzakrzewski jzakrzewski commented May 11, 2020

If my list is correct you're missing:

  • CURLOPT_FTP_FILEMETHOD
  • CURLOPT_FTPSSLAUTH
  • CURLOPT_HTTPAUTH
  • CURLOPT_IPRESOLVE
  • CURLOPT_PROXYAUTH
  • CURLOPT_PROXYTYPE
  • CURLOPT_SSH_AUTH_TYPES
  • CURLOPT_SSL_OPTIONS
  • CURLOPT_USE_SSL
  • CURLOPT_GSSAPI_DELEGATION
  • CURLOPT_HEADEROPT
  • CURLOPT_KRBLEVEL
  • CURLOPT_PROXY_TLSAUTH_TYPE
  • CURLOPT_TLSAUTH_TYPE

This is at least what is implemented as some enums/flags in the piece of software I maintain. It may not be exactly what curl regards as such.

@bagder
Copy link
Member Author

@bagder bagder commented May 11, 2020

If my list is correct you're missing

Man, I wasn't very good at this! Your three last ones are however strings: CURLOPT_KRBLEVEL, CURLOPT_PROXY_TLSAUTH_TYPE and CURLOPT_TLSAUTH_TYPE so not exactly in the same category.

Hang on for an updated take that exposes this info.

@bagder bagder force-pushed the bagder/curl-options-types branch from 0fb8e55 to 2e4847f May 11, 2020
@bagder
Copy link
Member Author

@bagder bagder commented May 11, 2020

Now it provides these types:

typedef enum {
  CURLOT_LONG,    /* long (a range of values) */
  CURLOT_VALUES,  /* long (a defined set or bitmask) */
  CURLOT_OFF_T,   /* curl_off_t */
  CURLOT_OBJECT,  /* void * */
  CURLOT_STRING,  /* char * */
  CURLOT_SLIST,   /* struct curl_slist * */
  CURLOT_FUNCTION /* function pointer */
} curl_easytype;
@jzakrzewski
Copy link
Contributor

@jzakrzewski jzakrzewski commented May 11, 2020

Man, I wasn't very good at this! Your three last ones are however strings: CURLOPT_KRBLEVEL, CURLOPT_PROXY_TLSAUTH_TYPE and CURLOPT_TLSAUTH_TYPE so not exactly in the same category.

Yeah, I just maintain some comlicated piece of software that exposes many curl options to the users. Almost like a language binding. We've mapped some string options to enumerations, so when I pulled the definitions I got the extra stuff also ;)

Now it provides these types: ...

I'd be happy with that :)

@jeroen
Copy link
Contributor

@jeroen jeroen commented May 11, 2020

@jzakrzewski where do you maintain the list of string options to enumerations, I would be interested to steal those.

@jzakrzewski
Copy link
Contributor

@jzakrzewski jzakrzewski commented May 11, 2020

@jeroen Unfortunately in a bit of two-decades-old proprietary spaghetti c++. If that was just a mapping I'd be happy to share it...

@bagder
Copy link
Member Author

@bagder bagder commented May 11, 2020

Now there's also a CURLOT_CBPTR that identifies the options used to pass pointers to callbacks:

typedef enum {
  CURLOT_LONG,    /* long (a range of values) */
  CURLOT_VALUES,  /*      (a defined set or bitmask) */
  CURLOT_OFF_T,   /* curl_off_t (a range of values) */
  CURLOT_OBJECT,  /* pointer (void *) */
  CURLOT_STRING,  /*         (char * to zero terminated buffer) */
  CURLOT_SLIST,   /*         (struct curl_slist *) */
  CURLOT_CBPTR,   /*         (void * passed as-is to a callback) */
  CURLOT_FUNCTION /* function pointer */
} curl_easytype;
@bagder bagder force-pushed the bagder/curl-options-types branch from 674c62a to bbd111d May 11, 2020
@bagder
Copy link
Member Author

@bagder bagder commented May 11, 2020

We still don't have enough information here to generate typecheck-gcc.h entirely from this set of meta-data. If we ignore the callback prototypes for now, what the existing typecheck header has that isn't yet in this curl.h edit is more detailed OBJECTPOINT info about these:

  • CURLOPT_CURLU - CURLU *
  • CURLOPT_HTTPPOST - curl_httppost *
  • CURLOPT_MIMEPOST - curl_mime *
  • CURLOPT_SHARE - CURLSH *
  • CURLOPT_STDERR - FILE *
  • CURLOPT_STREAM_DEPENDS - CURL *
  • CURLOPT_STREAM_DEPENDS_E - CURL *

Worth providing?

@jeroen
Copy link
Contributor

@jeroen jeroen commented May 12, 2020

I'm not sure, the OBJECTPOINT types don't have an obvious representation in the bindings. I don't see a use for providing those right now.

@bagder
Copy link
Member Author

@bagder bagder commented May 14, 2020

BTW, there's a new type setopts coming in #5357: struct curl_blob *, which is a way to pass "binary blobs" to curl: data with a size that can also ask curl to copy the entire thing.

@bagder
Copy link
Member Author

@bagder bagder commented May 20, 2020

The gcc curlcheck_off_t_option() check returns 1 for both CURLOT_OFF_T and CURLOT_BLOB options. That is probably because the gcc header hasn't been synced yet.

Ack yes, that was a bug. Should be fixed in master now. Thanks!

The types for CURLOPT_COPYPOSTFIELDS, CURLOPT_ERRORBUFFER, CURLOPT_POSTFIELDS, are marked as CURLOT_OBJECT but I was expecting CURLOT_STRING here. But perhaps these are special char *buf because you could set CURLOPT_POSTFIELDSIZE? But I suppose they also could be treated as string fields.

CURLOPT_COPYPOSTFIELDS and CURLOPT_POSTFIELDS are not considered "strings" because they are not necessarily C strings. You can provide a C string to them and libcurl can then figure out the length of that with strlen() but you can also point to a binary memory buffer and set the size and then the data is not a string. So I consider them somewhat special.

CURLOPT_ERRORBUFFER sets an output buffer rather than an input one. I think that also makes it somewhat different than the other strings as they're all inputs to libcurl. Even if ERRORBUFFER will in fact create a C string within that buffer.

@bagder bagder force-pushed the bagder/curl-options-types branch from a12d16b to f0da98d May 20, 2020
@jeroen
Copy link
Contributor

@jeroen jeroen commented May 20, 2020

I also tested the curl_easy_option_by_name and curl_easy_option_by_id APIs and everything is working as expected.

@jeroen
Copy link
Contributor

@jeroen jeroen commented May 20, 2020

I see you have added the ability to the configure script to disable this API. Is there a macro in the public headers that I can use in the bindings to test if this API is available? So that we can do something like:

#include <curl/curl.h>

#ifdef CURL_HAVE_GETOPTIONS
/* code using new api */
#else 
/* Fallback implementation here */
#endif

Or would there be a better way to detect the API? I cannot use autoconf feature-testing in the R bindings.

@weltling
Copy link
Contributor

@weltling weltling commented May 22, 2020

@bagder thanks for the ping. This is definitely an interesting approach. Right now all the features are checked while the PHP binding is compiled. Having the feature checks at runtime is more flexible of course, but hopefully the old way stays for a while, too.

/cc @cmb69 @nikic

@jeroen
Copy link
Contributor

@jeroen jeroen commented May 22, 2020

@weltling I'm interested in how you are checking the features at compile time. Can you point me to the code that does this?

@cmb69
Copy link
Contributor

@cmb69 cmb69 commented May 22, 2020

@jeroen, I don't think there is such code yet, but it might be used to make PHP's curl_setopt() (and related functions) more dynamic. OTOH, most PHP users are certainly using older libcurl versions (even Ubuntu bionic is only on libcurl 7.58.0), and PHP master currently supports libcurl 7.29.0+, so it will take some time until PHP could rely on this dynamic information being available.

@bagder
Copy link
Member Author

@bagder bagder commented May 22, 2020

@jeroen wrote:

I see you have added the ability to the configure script to disable this API. Is there a macro in the public headers that I can use in the bindings to test if this API is available?

No. We have a bunch of existing ways to strip out functionality from curl that isn't detectable by users and this would just be another one in that colllection. The users who typically want these things disabled are those who want minimum footprint versions in for example embedded devices and presumably not using any bindings and in control of their own environment.

If you run a libcurl with this API disabled, it won't expose any options at all and then I figure it makes sense that there are no options provided or offered.

@cmb69 wrote:

it will take some time until PHP could rely on this dynamic information being available

Of course. But we can't let that stand in the way for us improving things (and it can possibly work as a motivator for users to upgrade to later versions). How do we want to do this for the next 20 years to make things smoother for everyone involved - at least once they upgrade into a version that supports it.

@bagder bagder force-pushed the bagder/curl-options-types branch from f0da98d to 3d54a17 May 23, 2020
@bagder bagder force-pushed the bagder/curl-options-types branch from 3d54a17 to 5f3ffd3 Jun 3, 2020
@bagder bagder marked this pull request as ready for review Jun 3, 2020
@bagder bagder changed the title [WIP] provide meta-data about all easy options API: provide meta-data about all easy options Jun 3, 2020
bagder added a commit that referenced this pull request Jun 18, 2020
 const struct curl_easyoption *curl_easy_option_by_name(const char *name);

 const struct curl_easyoption *curl_easy_option_by_id (CURLoption id);

 const struct curl_easyoption *
 curl_easy_option_next(const struct curl_easyoption *prev);

The purpose is to provide detailed enough information to allow for
example libcurl bindings to get option information at run-time about
what easy options that exist and what arguments they expect.

Closes #5365
@bagder bagder force-pushed the bagder/curl-options-types branch from 5f3ffd3 to 82fb7cc Jun 18, 2020
bagder added a commit that referenced this pull request Jul 12, 2020
 const struct curl_easyoption *curl_easy_option_by_name(const char *name);

 const struct curl_easyoption *curl_easy_option_by_id (CURLoption id);

 const struct curl_easyoption *
 curl_easy_option_next(const struct curl_easyoption *prev);

The purpose is to provide detailed enough information to allow for
example libcurl bindings to get option information at run-time about
what easy options that exist and what arguments they expect.

Closes #5365
@bagder bagder force-pushed the bagder/curl-options-types branch from 82fb7cc to c2a396e Jul 12, 2020
bagder added a commit that referenced this pull request Aug 11, 2020
 const struct curl_easyoption *curl_easy_option_by_name(const char *name);

 const struct curl_easyoption *curl_easy_option_by_id (CURLoption id);

 const struct curl_easyoption *
 curl_easy_option_next(const struct curl_easyoption *prev);

The purpose is to provide detailed enough information to allow for
example libcurl bindings to get option information at run-time about
what easy options that exist and what arguments they expect.

Closes #5365
@bagder bagder force-pushed the bagder/curl-options-types branch from c2a396e to 0d3a20a Aug 11, 2020
@bagder
Copy link
Member Author

@bagder bagder commented Aug 24, 2020

Any final words or suggestions, or are we ready to see this merged?

bagder added 2 commits Aug 26, 2020
 const struct curl_easyoption *curl_easy_option_by_name(const char *name);

 const struct curl_easyoption *curl_easy_option_by_id (CURLoption id);

 const struct curl_easyoption *
 curl_easy_option_next(const struct curl_easyoption *prev);

The purpose is to provide detailed enough information to allow for
example libcurl bindings to get option information at run-time about
what easy options that exist and what arguments they expect.

Assisted-by: Jeroen Ooms
Closes #5365
To allow disabling of the curl_easy_option APIs in a build.
@bagder bagder force-pushed the bagder/curl-options-types branch from 0d3a20a to 85d51af Aug 26, 2020
@bagder bagder closed this in 6ebe63f Aug 27, 2020
bagder added a commit that referenced this pull request Aug 27, 2020
To allow disabling of the curl_easy_option APIs in a build.

Closes #5365
@bagder bagder deleted the bagder/curl-options-types branch Aug 27, 2020
@mback2k
Copy link
Member

@mback2k mback2k commented Aug 28, 2020

Why were the AppVeyor failures on this PR ignored? This error is now showing up in all recent builds since this was merged:
easygetopt.obj : error LNK2019: unresolved external symbol _Curl_easyopts_check referenced in function _lookup
https://ci.appveyor.com/project/curlorg/curl/builds/34856624/job/t2cgoalqra89q05h

Update: Just saw that @bagder is already working on it via #5877. Please consider the AppVeyor CI stable at the moment and take a closer look at CI failures in the future. The only unstable CI at the moment is the Windows CI on Azure Pipelines.

@bagder
Copy link
Member Author

@bagder bagder commented Aug 28, 2020

Our CI set is virtually never stable these days (all-green builds are very rare), which makes it too easy to overlook occasional red builds. I have no excuse in this case, I simply must've overlooked that one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

8 participants
You can’t perform that action at this time.