reuri - URI Manipulation for Tcl
package require reuri ?0.13?
reuri get uri ?part ?defaultVal??
reuri extract uri ?part ?defaultVal??
reuri exists uri part
reuri set variable part value
reuri valid uri
?? reuri context uri script
?? reuri resolve uri
?? reuri absolute uri
reuri encode query|queryval|path|path2|host|userinfo|fragment|awssig value
reuri decode value
reuri query op uri ?arg …?
reuri path op uri ?arg …?
reuri normalize uri
reuri::query get query ?param ?-default default? ?-index index??
reuri::query values query param
reuri::query add variable param value
reuri::query exists query param
reuri::query set variable ?param value …?
reuri::query unset variable ?param …?
reuri::query names query
?? reuri::query reorder variable params
reuri::query new params|?param value …?
reuri::query encode params|?param value …?
reuri::query decode query
reuri::path get path ?index?
reuri::path exists path index
reuri::path join ?segment …?
?? reuri::path set variable index value
?? reuri::path resolve path
Commands marked up as “?? reuri::foo” are not yet implemented, or only partially implemented.
This package allows efficient manipulation of URI strings from Tcl. A fast parser is used to extract the parts from the URI string representation and cache them in a custom Tcl_ObjType. Subsequent accesses read or update the existing internal parsed data, and updating the string representation recompiles the URI string representation from the parsed parts.
This package serves a similar purpose to the uri module of tcllib, but is much faster. Where splitting a typical http url using the tcllib module might take half a millisecond typically, parsing and extracting a part with this package is on the order of 100 times faster, which matters if hundreds of URLs have to be manipulated in the time budget of serving a hit on a website.
The reuri query and reuri::query commands operate on the query portion of the URI assuming it follows the HTTP scheme query format (parameters separated by &, names separated from values with =).
While this package is in the 0 major version series this API is not stable and will change in backwards-incompatible ways. Starting with a 1 series release the API will preserve backwards compatibility within a major version number sequence. USING THE 0 SERIES VERSIONS IN PRODUCTION CODE IS NOT RECOMMENDED.
-
reuri get uri ?part ?defaultVal??
Return the fully decoded URL part from uri or throw an exception if Guri can’t be parsed or doesn’t have the specified part and no defaultVal was specified. If part isn’t defined in uri (but is a valid part name) and a defaultVal is given, that will be returned instead. If part isn’t specified, return a dictionary containing all parts. See PARTS below for valid parts, and EXCEPTIONS for the exceptions that might be thrown. -
reuri extract uri ?part ?defaultVal??
Return the encoded URL part from uri or throw an exception if uri can’t be parsed or doesn’t have the specified part and no defaultVal was specified. If part isn’t defined in uri (but is a valid part name) and a defaultVal is given, that will be returned instead. If part isn’t specified, return a dictionary containing all parts. See PARTS below for valid parts, and EXCEPTIONS for the exceptions that might be thrown. -
reuri exists uri part
Return true if uri contains part, false otherwise. -
reuri set variable part value
Replace the part of the URI value contained in variable with the value (which must parse correctly as that part), returning the updated uri value stored in variable. If value is an empty string, unset the part. -
reuri valid uri
Return true if uri is a syntactically valid URI and can be parsed by this package. Note that this will return false for some values of uri that are accepted by the other commands in this package - it applies a strict interpretation of the RFC rules whereas the other commands accept unambiguous but non-compliant input. In particular Unicode characters in the range 0x80 to 10FFFF are accepted on input to the other commands but rejected here. -
reuri context uri script
Evaluate script in the current call frame but treat uri as the context for resolving any relative references that occur while processing script. The return state of script is transparently propagated to the return state of this command, including exceptions and return codes like break and continue. This command can be nested, with each successive call creating an inner scope with a reference URI that is resolved relative to its context. -
reuri resolve uri
Return a URI by resolving the possibly-relative uri in the context of any resolve::uri context calls on the callstack. -
reuri absolute uri
Return true if uri is absolute (ie. not a relative URI). -
reuri encode query|queryval|path|path2|host|userinfo|fragment|awssig value
Percent-encode the UTF-8 representation of value, suitable for inclusion as component of the part given by query, path or host, etc. queryval applies the slightly different rules for parameter values, query applies the rules for parameter names. path2 differs from path in that it permits “:” un-encoded, that is, segment-nz-nc vs segment in RFC 3986 Section 3.2.2. awssig uses the rules required for calculating AWS v4 signatures. -
reuri decode value
Percent decode value, the inverse of reuri encode. For compatibility with other implementations, “+
” is replaced with a space and invalid percent-encoded sequences are transcribed as-is. -
reuri query op uri ?arg …?
Equivalent to calling reuri::query op ?arg …? on the query portion of uri. -
reuri path op uri ?arg …?
Equivalent to calling reuri::path op ?arg …? on the path portion of uri. -
reuri normalize uri
Return a canonical representation of uri, as described in RFC3986. -
reuri::query get query ?param ?-default default? ?-index index??
Retrieve the value for the named param in the query part. If param occurs multiple times in the query part, returns the value for the last instance, unless -index is given, in which case it returns the value(s) named by index (see INDEX SYNTAX for details on index). If param doesn’t exist and default is supplied, return that in its place, otherwise throw the REURI PARAM_NOT_SET exception. If param is not supplied, return all of the query parameters as a list of alternating parameter names and their values, in the order they appear in query (similar to a dictionary but multiple instances of the same param name can occur). -
reuri::query values query param
Return a list of all of the values for param in the query, in the order they appear in the URI. -
reuri::query add variable param value
Append the query param with value to the query part contained in the variable variable. If param already exists in the query part, append the new instance, leaving existing instances in place. -
reuri::query exists query param
Return true if param exists in query. -
reuri::query set variable ?param value …?
Replace any instances of param in the query part stored in variable with a single instance containing value. -
reuri::query unset variable ?param …?
Remove all instances of each param in variable. -
reuri::query names query
Return a list of all of the param names that appear in query, in the order they appear. If multiple instances occur then the result contains multiple instances in the corresponding positions. -
reuri::query reorder variable names
Reorder the query part so that the params occur in the order given by names, with any that weren’t specified in names occuring after all those that were. If any duplicate param names exist on the URI, all their instances are placed in the position given in names, with the instances preserving their relative positions. -
reuri::query new params|?param value …?
Create a new query with the supplied params (as a list with pairs of param and value), or in the param and value arguments and return a properly formatted query string. -
reuri::query encode params|?param value …?
Percent-encode the supplied params (as a list with pairs of param and value), or in the param and value arguments and return a properly formatted query string. If the supplied parameter set is empty then the result is a blank string, otherwise it will have a “?” character prefixed. Parameters with an empty corresponding value are encoded without an “=”. -
reuri::query decode query
Decode query into a list of params and their values. A single leading “?” character will be stripped off if it exists and is unencoded. (Inverse of reuri::query encode) -
reuri::path get path ?index?
Return decoded elements of a path. See INDEX SYNTAX for details on index. If index names elements outside of the range (before the start or after the end of the path), then empty values are returned for those elements. If index is omitted, return a list of all the elements (equivalent to specifying “0..end” for the index). If index specifies a list or a range, the result is a list of the elements, otherwise just the element value. -
reuri::path exists path index
True if index refers to a path element in range. If index specifies a range or list of indices, return a list with a boolean corresponding to each named element. See INDEX SYNTAX for details on index. -
reuri::path set variable index value
-
reuri::path split path
Deprecated - use reuri::path get path instead. -
reuri::path join ?segment …?
Produce a properly encoded URI path part given the list of segments. -
reuri::path resolve path
Return path resolved in the context of all the URIs on the callstack from reuri context calls.
For commands that take an index parameter, a superset of the index style provided by lindex is supported.
An index may be a plain positive or negative integer in decimal, hex (0x*), octal (0o*, 0*) or binary (0b*) formats, in which case it counts from the first element at 0. If it is the string “end” it names the last element in the list. “end” followed by a positive or negative integer counts from the last element, “end-1” is the second last element, “end+2” is two beyond the end of the list.
An index may also be a range of two such values separated by “..”, in which case it names the range of elements starting from the index on the left to the index on the right, inclusive. So “0..end” is all of the elements, and “end..1” is all but the first, in reverse order.
An index may also be a list of comma separated indices or index ranges, which names each element from each of the indices or index ranges. So “1,2,4” refers to the second, third and fifth elements, and “end..0,1,-1” refers to all of the elements in reverse order, plus the second, and -1 th element (before the start of the list).
The named parts of URIs that are accessible in this package are:
- scheme
The part before the first:
likehttp
inhttp://github.com
- userinfo
The userinfo part:admin:secret
inhttp://admin:secret@example.com
- host
The host part of the authority section:google.com
inhttps://google.com
,127.0.0.1
inhttp://127.0.0.1:8080
,::1
inhttp://[::1]:8080
, and/tmp/myserv.80
inhttp://[/tmp/myserv.80]/foo
. This last example isn’t valid by RFC 3986 but is one of the common ways to refer to the socket in HTTP-over-unix sockets. - port
The numeric port number portion of the authority section,1234
inhttp://localhost:1234/foo
. - path
/foo/bar
inhttp://localhost/foo/bar?quux=123
. - query
thing=foo&other=bar
inhttp://localhost/path?thing=foo&other=bar#frag
- fragment
endbit
inhttp://localhost/foo?x=y&a=b#endbit
Exceptions with errorcodes matching the following patterns will be thrown by these commands if the supplied URIs aren’t valid, the specified part isn’t valid, or some other assumption was violated:
- REURI PARSE_ERROR str offset
Parsing the supplied value str failed at character offset offset. - REURI INVALID_PART part
The specified part isn’t a valid part (possibly for this type of URI). - REURI PART_NOT_SET part
The requested part is not defined in the supplied URI value, and no default was provided. - REURI PARAM_NOT_SET param
The requested param is not defined in the query part of the supplied URI value, and no default was provided. - REURI BAD_OFFSET param offset
The specified offset wasn’t valid for param in the supplied URI. - REURI CONFLICT
The requested update could not be performed because it would move the state represented by the uri out of the range that can be serialised (like setting a host when the uri has a relative path). - REURI UNBALANCED_PARAMS
The supplied set of parameters isn’t even. Each parameter name must have a matching value.
A C API is exposed via the stubs mechanism:
int Reuri_URIObjGetPart(interp, uriPtr, part, defaultPtr, valuePtrPtr)
int Reuri_URIObjSetPart(interp, uriPtr, part, valuePtr)
int Reuri_URIObjSetParam(interp, uriPtr, paramPtr, valuePtr, paramMode)
TODO: complete
- Tcl_Interp *interp
A Tcl interpreter, which will have its result updated with and exception thrown, if supplied. Can be NULL in which case the exception info is discarded. - Tcl_Obj *uriPtr
A pointer to the URI value. Must be unshared for those calls that change its value. - enum reuri_part part
One of REURI_SCHEME, REURI_USERINFO, REURI_HOST, REURI_PORT, REURI_PATH, REURI_QUERY or REURI_FRAGMENT. - Tcl_Obj *defaultPtr
A pointer to the default value to be returned if the requested element isn’t present on the URI. Can be NULL if no default value is desired. - Tcl_Obj **valuePtrPtr
Location where the requested element will be stored. - Tcl_Obj *valuePtr
The new value to store for the specified element. - Tcl_Obj *paramPtr
The name of the query parameter. - enum reuri_param_mode paramMode
One of REURI_PARAM_ADD or REURI_PARAM_REPLACE.
Extract the host and port from a URI, with a scheme-defined default port:
proc connect uri {
switch [reuri get $uri scheme] {
http {set default_port 80}
https {set default_port 443}
default {error "Unsupported scheme \"[reuri get $uri scheme]\""}
}
socket [reuri get $uri host] [reuri get $uri port $default_port]
}
Update query parameters from a dictionary:
proc mergeparams {uri patch} {
foreach {k v} $patch {
reuri query set uri $k $v
}
return $uri
}
Retrieve the host and store each resolved address as a query param in c using the stubs API:
int ResolveAddrCmd(ClientData cdata, Tcl_Interp *interp, int objc, Tcl_Obj *const objv[])
{
int code = TCL_OK, rc;
Tcl_Obj *host = NULL;
Tcl_Obj *uri = NULL;
Tcl_Obj *paramname = NULL;
Tcl_Obj *paramval = NULL;
struct addrinfo *res = NULL;
struct addrinfo *addr = NULL;
if (objc != 2) {
Tcl_WrongNumArgs(interp, 1, objv, "uri")
code = TCL_ERROR;
goto finally;
}
/* We're going to modify the uri object's value,
* so ensure it is unshared:
*/
uri = objv[1];
if (Tcl_IsShared(uri))
uri = Tcl_DuplicateObj(uri);
Tcl_IncrRefCount(uri);
/* Retrieve the host portion from the URI: */
code = Reuri_URIObjGetPart(interp, uri, REURI_HOST, &host);
if (TCL_OK != code) goto finally;
rc = getaddrinfo(Tcl_GetString(host), NULL, NULL, &res);
if (rc) {
Tcl_SetObjResult(interp, Tcl_ObjPrintf(interp,
"Resolve error: %s", gai_strerror(rc)));
code = TCL_ERROR;
goto finally;
}
Tcl_IncrRefCount(paramname = Tcl_NewStringObj("addr", 4));
for (addr=res; addr; addr=addr->ai_next) {
const char *addrstr[NI_MAXHOST];
/* Format the numeric address as a string, whatever its
* address family
*/
rc = getnameinfo(addr->ai_addr, addr->ai_addrlen,
host, sizeof(host), NULL, 0, NI_NUMERICHOST);
if (rc) {
Tcl_SetObjResult(interp,
Tcl_NewStringObj("Could not format address", -1));
goto finally;
}
if (paramval) Tcl_DecrRefCount(paramval);
Tcl_IncrRefCount(paramval = Tcl_NewStringObj(addrstr, -1));
/*
* Append another &addr= param to the query part to store
* this address. REURI_PARAM_ADD signals that we want
* to add another instance of &addr= to the URI, not replace
* any that are there already.
*/
code = Reuri_URIObjSetParam(interp, uri,
paramname, paramval, REURI_PARAM_ADD);
if (TCL_OK != code) goto finally;
}
/* Return the changed uri value: */
Tcl_SetObjResult(interp, uri);
finally:
if (res) {
freeaddrinfo(res);
res = NULL;
}
if (uri) {
Tcl_DecrRefCount(uri);
uri = NULL;
}
if (paramname) {
Tcl_DecrRefCount(paramname);
paramname = NULL;
}
if (paramval) {
Tcl_DecrRefCount(paramval);
paramval = NULL;
}
return code;
}
This package aims to conform to RFC 3986, with the optional addition of recognising HTTP-over-unix sockets style URLs like http://[/tmp/myserv.80]/foo?bar=baz
.
Please report any bugs to the github issue tracker: https://github.com/cyanogilvie/reuri/issues
The uri module of tcllib.
- Implement http://[/tmp/mysock.80]/foo style unix domain sockets support
- Implement reuri set
- Implement reuri valid
- Implement reuri context
- Implement reuri resolve
- Implement reuri absolute
- Implement reuri decode
- Implement reuri::query get
- Implement reuri::query values
- Implement reuri::query add
- Implement reuri::query exists
- Implement reuri::query set
- Implement reuri::query unset
- Implement reuri::query names
- Implement reuri::query reorder
- Implement reuri::path split
- Implement reuri::path get
- Implement reuri::path set
- Implement reuri::path join
- Implement reuri::path resolve
- Implement URLPattern style matching: https://developer.mozilla.org/en-US/docs/Web/API/URL\_Pattern\_API (in jitclib?)
This package is Copyright 2023 Cyan Ogilvie, and is made available under the same license terms as the Tcl Core