Align uri_parser's output with CRIs #13827

Open
chrysn opened this issue Apr 7, 2020 · 6 comments
Labels
Area: CoAP Area: Constrained Application Protocol implementations Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Type: tracking The issue tracks and organizes the sub-tasks of a larger effort

Comments

@chrysn
Member

chrysn commented Apr 7, 2020

Description

Work on a format to replace link-format is going on around the CoRAL format, and chances are CRIs are the first part of that that can become stable. CRIs are a CBOR based representation of the information in a URI (largely compatible, and where it's not it's in areas that implementations often don't get right anyway), and especially suitable when CoAP requests are later built from it.

The data structure that uri_parser produces is almost aligned with the CRI information model, and I think it'd be convenient to align them to the point where such a struct can be used as an internal representation of a CRI. Then, CRIs from CoRAL documents could be preprocessed into uri_parser_result, and requests built from them.

The current discrepancies are:

  • host is currently in text form; CRIs have either a text DNS name or a binary representation of the IP literal. Parsing this early would make sense as it makes the later use easier, but might be problematic with the uri_parser_result notion of using the original URI as immutable backing store (and even if mutation is allowed, which might make sense, the text form is shorter than the binary form for some addresses).
    • CRIs don't have a zone identifier; that's OK, and the internal representation would have one.
  • port is numeric in CRI; could be changed easily.
    • CRIs allow omitting the port; as long as we only convert internal representations to CRIs where we know the scheme, that can be handled at conversion time.
  • path and query are delimited by their delimiting characters '/' and '&'. CRIs can contain almost arbitrary text (including '/' in path components and '&' in query components), which would be percent-escaped in URIs -- but we can choose not to support such URIs at all (the current uri_parser doesn't, as it'd need to percent-decode them for mapping into CoAP) for starters. When converting a CRI, that might need to be taken in mutably to allow replacing the CBOR characters with the agreed-on delimiters. (Long-term, being able to express all CRIs would be nice, especially because we do proxying with it, but let's take this step by step.)
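
To make the host and port points above concrete, here is a rough sketch of what a CRI-aligned authority could look like. All names (`cri_authority_t`, `cri_parse_port`, `cri_authority_from_text`) are hypothetical and not part of uri_parser; it uses POSIX `inet_pton` as a stand-in for whatever address parser the platform offers, and it ignores zone identifiers entirely.

```c
#include <arpa/inet.h>   /* inet_pton, INET6_ADDRSTRLEN (POSIX) */
#include <sys/socket.h>  /* AF_INET6 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CRI-style host: a DNS name kept as a slice of the
 * original URI, or an IP literal parsed into binary form. */
typedef enum { CRI_HOST_NAME, CRI_HOST_IPV6 } cri_host_type_t;

typedef struct {
    cri_host_type_t type;
    const char *name;   /* CRI_HOST_NAME only; not NUL-terminated */
    size_t name_len;
    uint8_t ipv6[16];   /* CRI_HOST_IPV6 only */
    bool has_port;      /* CRIs may omit the port */
    uint16_t port;
} cri_authority_t;

/* Parse a decimal port from a text slice; returns 0 on success. */
int cri_parse_port(const char *s, size_t len, uint16_t *out)
{
    if (len == 0 || len > 5) {
        return -1;
    }
    uint32_t v = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9') {
            return -1;
        }
        v = v * 10 + (uint32_t)(s[i] - '0');
    }
    if (v > UINT16_MAX) {
        return -1;
    }
    *out = (uint16_t)v;
    return 0;
}

/* Fill the authority from the text slices a URI parser found.
 * A bracketed host is parsed early into binary; anything else stays
 * a text slice. Pass port_len == 0 if the URI had no port. */
int cri_authority_from_text(cri_authority_t *a,
                            const char *host, size_t host_len,
                            const char *port, size_t port_len)
{
    assert(a != NULL && host != NULL);
    memset(a, 0, sizeof(*a));
    if (host_len >= 2 && host[0] == '[' && host[host_len - 1] == ']') {
        char buf[INET6_ADDRSTRLEN];
        size_t n = host_len - 2;
        if (n >= sizeof(buf)) {
            return -1;
        }
        memcpy(buf, host + 1, n);   /* copy so the source stays immutable */
        buf[n] = '\0';
        if (inet_pton(AF_INET6, buf, a->ipv6) != 1) {
            return -1;              /* also rejects zone identifiers */
        }
        a->type = CRI_HOST_IPV6;
    }
    else {
        a->type = CRI_HOST_NAME;
        a->name = host;
        a->name_len = host_len;
    }
    if (port_len > 0) {
        if (cri_parse_port(port, port_len, &a->port) != 0) {
            return -1;
        }
        a->has_port = true;
    }
    return 0;
}
```

Note that the 16-byte binary address lives in the struct rather than in the backing URI, which sidesteps the immutability concern at the cost of a larger result struct.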

Useful links

Next steps

I'd keep this around as a tracking issue while uri_parser is being developed during proxy development; when actual CRI support is added, it can be closed in that PR.

@chrysn chrysn added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Type: tracking The issue tracks and organizes the sub-tasks of a larger effort Area: CoAP Area: Constrained Application Protocol implementations labels Apr 7, 2020
@kb2ma
Member

kb2ma commented Apr 8, 2020

Thanks for the pointer to CRIs. Has there been any discussion of using CRIs in CoAP itself? For example, there might be a Proxy-Cri option. Or is this work intended for the payload, like link format?

@chrysn
Member Author

chrysn commented Apr 8, 2020 via email

@chrysn
Member Author

chrysn commented Dec 18, 2021

Just a brief update (no work active here but CRIs are changing):

  • Hosts are now in list form as well (dot separated segments). Convenient for turning them into DNS requests (no ASCII string parsing any more at all), provided there's a good DNS interface that does not rely on dots being there (which the POSIX getaddrinfo does, but we don't have to replicate that).
    This doesn't introduce any new fields of work, because we already need some trickery to get space for the v6 address.
  • Paths and queries support a lot more now (using the new PET mechanism), but these are inexpressible in CoAP, so we can stick with that and not much changes.
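
To illustrate why the list form of hosts is convenient for DNS: a DNS query name (RFC 1035) is itself a sequence of length-prefixed labels, so the segments can be copied over without any dot parsing. A minimal sketch, where `cri_label_t` and `cri_host_to_qname` are made-up names and not an existing RIOT or DNS API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one host name segment (not NUL-terminated). */
typedef struct {
    const char *ptr;
    size_t len;
} cri_label_t;

/* Encode the segments as a DNS QNAME: each label is prefixed with its
 * length, and the name ends with the zero-length root label.
 * Returns the number of bytes written, or -1 on error. */
int cri_host_to_qname(const cri_label_t *labels, size_t count,
                      uint8_t *out, size_t out_len)
{
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = labels[i].len;
        if (len == 0 || len > 63) {
            return -1;              /* DNS labels are 1..63 bytes */
        }
        /* length byte + label + room for the final root label */
        if (out_len - pos < len + 2) {
            return -1;
        }
        out[pos++] = (uint8_t)len;
        memcpy(&out[pos], labels[i].ptr, len);
        pos += len;
    }
    if (out_len - pos < 1) {
        return -1;
    }
    out[pos++] = 0;                 /* root label terminates the name */
    if (pos > 255) {
        return -1;                  /* overall DNS name length limit */
    }
    return (int)pos;
}
```

For the segments `example` and `org` this writes 13 bytes: `07 "example" 03 "org" 00`.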

@chrysn
Member Author

chrysn commented Apr 15, 2022

Before I start API sketching here I'd like to collect what CRI handling for RIOT might be able to do, sketching use cases:

  • URI parsing: Given a buffer that contains a text URI, parse it into something that can be populated into a CoAP request. (The URI may also be a URI reference, in which case the base might be ... a different CoAP request?).

    Example: User input on a console needs to be placed into a CoAP request (coap get coap://host/path)

  • CRI parsing: Same but with already serialized CRIs.

    Example: Resource discovery returns a CRI reference (eg. equivalent to directory/ in a response to a multicast CoAP request received from [fe80::42]:61616)

  • CoAP request handling: A server (as around gcoap: Add file server #14397 (comment)) receives a request and needs access to the parts of the request's URI that have not been "used up" by the resource dispatch. (For other, primarily non-CRI, purposes, e.g. returning Location-* or full URIs, the handler also needs to access the full requested CRI).

    Example: A server in gcoap: Add file server #14397 is attached to the gCoAP server at /fs/. When a request comes in carrying the CoAP options for /fs/mtd0/firmware1.bin?sha1sum, the handler would like to have mtd0/firmware1.bin?sha1sum conveniently at hand (where different users may have different ideas of what "convenient" here means; those considering the string "mtd0/firmware1.bin?sha1sum" to be convenient might reconsider when they learn that actual percent signs inside the file name, which are legal in the path, would be percent encoded in such a string).

  • CRI producing: A server has some knowledge about where something is located, and needs to produce a CRI for it.

    Example: The .well-known/core resource needs to produce /fs/ or the CRI binary equivalent thereof from the path names registered in the server, depending on the client's Accept value.
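
The percent sign caveat from the request handling example can be sketched as follows: when path segments (as they arrive in Uri-Path options) are rendered back into URI text, some bytes have to be escaped. `uri_append_segment` is a hypothetical helper, and it conservatively escapes everything outside the RFC 3986 "unreserved" set, which is more than strictly required but always valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* RFC 3986 "unreserved" characters never need escaping. */
static bool is_unreserved(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Append "/" plus one path segment to out at *pos, percent-encoding
 * every byte that is not unreserved. Keeps out NUL-terminated and
 * updates *pos only on success. Returns 0 on success, -1 if the
 * buffer is too small. */
int uri_append_segment(char *out, size_t out_len, size_t *pos,
                       const char *seg, size_t seg_len)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t p = *pos;

    assert(p <= out_len);
    if (out_len - p < 2) {          /* '/' plus terminating NUL */
        return -1;
    }
    out[p++] = '/';
    for (size_t i = 0; i < seg_len; i++) {
        unsigned char c = (unsigned char)seg[i];
        size_t need = is_unreserved(c) ? 1 : 3;
        if (out_len - p < need + 1) {
            return -1;
        }
        if (need == 1) {
            out[p++] = (char)c;
        }
        else {
            out[p++] = '%';
            out[p++] = hex[c >> 4];
            out[p++] = hex[c & 0x0f];
        }
    }
    out[p] = '\0';
    *pos = p;
    return 0;
}
```

With the segments `mtd0` and `100%.bin`, the resulting string is `/mtd0/100%25.bin`, so a handler that receives the string form has to decode it again before using it as a file name.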

The hard part about unifying these is that they're all present in memory already but in different forms (CoAP option serialization, parts of socket endpoints, CRIs, URIs), and that even if we limited ourselves to the subset of CRIs where all strings are in contiguous memory (which is, essentially, URIs with no percent encoding), the incompatible delimitations mean that a unified CRI that zero-copies will either carry around a rather large list of start and end positions of components (like, 10 x 2 x size_t for up to 5 path components and 5 query components), or needs to behave in a driver-like fashion. (The third alternative is to rewrite the data into a consistent structure, but (a) the original data may be const, and (b) there may not be enough contiguous memory to store the full CRI).
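
For a feeling of what the "list of start and end positions" variant would look like, here is a sketch of a zero-copy index over an unescaped path-and-query string. The type and function names (`cri_span_t`, `cri_index_t`, `cri_index`) and the limits of 5 + 5 components are assumptions taken from the numbers above, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CRI_MAX_PATH  5   /* assumed limits, matching the 10 x 2 x size_t estimate */
#define CRI_MAX_QUERY 5

typedef struct {
    size_t start;   /* offset of first byte */
    size_t end;     /* offset one past the last byte */
} cri_span_t;

typedef struct {
    const char *base;               /* backing store, never modified */
    cri_span_t path[CRI_MAX_PATH];
    cri_span_t query[CRI_MAX_QUERY];
    uint8_t path_count;
    uint8_t query_count;
} cri_index_t;

/* Record where each path and query component starts and ends.
 * Assumes no percent encoding is present. Empty components are kept
 * as empty spans. Returns 0 on success, -1 on too many components. */
int cri_index(cri_index_t *idx, const char *s, size_t len)
{
    memset(idx, 0, sizeof(*idx));
    idx->base = s;

    bool in_query = false;
    size_t i = (len > 0 && s[0] == '/') ? 1 : 0;   /* skip leading slash */
    size_t start = i;

    for (; i <= len; i++) {
        bool at_end = (i == len);
        char c = at_end ? '\0' : s[i];
        bool delim = at_end ||
                     (!in_query && (c == '/' || c == '?')) ||
                     (in_query && c == '&');
        if (!delim) {
            continue;
        }
        if (in_query) {
            if (idx->query_count == CRI_MAX_QUERY) {
                return -1;
            }
            idx->query[idx->query_count].start = start;
            idx->query[idx->query_count].end = i;
            idx->query_count++;
        }
        else {
            if (idx->path_count == CRI_MAX_PATH) {
                return -1;
            }
            idx->path[idx->path_count].start = start;
            idx->path[idx->path_count].end = i;
            idx->path_count++;
        }
        if (!at_end && c == '?') {
            in_query = true;        /* everything after '?' is query */
        }
        start = i + 1;
    }
    return 0;
}
```

On a 32-bit target the two span arrays alone take 80 bytes, which is the "rather large" cost mentioned above; narrower offset types would shrink that at the price of a length limit.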

@chrysn
Member Author

chrysn commented Sep 22, 2022

I'm currently leaning towards starting small and providing a CRIish interface to parsed URIs first; more can then still be done through a driver model.

When processing URIs, this would only accept "easy" URIs, and reliably refuse those it can't process correctly. This would put us in about the same league of features as the current URI parser (no percent encoding etc.) but without the conversion errors. Unsupported URIs would include not only those inexpressible in CoAP (like any that use escaped allowed delimiters) but also those that are expressible in CoAP but hard to translate. Practically speaking, that's probably the relevant subset.
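
A sketch of what "reliably refuse" could mean in the simplest case, assuming that "easy" is defined as "contains no percent encoding at all" (that definition and the names `uri_check_t` and `uri_check_easy` are my assumptions for illustration). The point is that unsupported input gets a distinct error instead of being mistranslated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    URI_CHECK_OK = 0,
    URI_CHECK_UNSUPPORTED = -1,   /* valid URI, but outside the easy subset */
} uri_check_t;

/* Accept only URIs without any '%'. This also refuses escapes that
 * could in principle be translated, and IPv6 zone identifiers; both
 * are deliberate in this sketch. */
uri_check_t uri_check_easy(const char *uri, size_t len)
{
    assert(uri != NULL || len == 0);
    if (len > 0 && memchr(uri, '%', len) != NULL) {
        return URI_CHECK_UNSUPPORTED;
    }
    return URI_CHECK_OK;
}
```

A caller would run this before parsing and report the unsupported case to the user rather than sending a request to the wrong resource.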

@Teufelchen1
Contributor

chrysn: Work on a format to replace link-format is going on around the CoRAL format, and chances are CRIs are the first part of that that can become stable

I just edited your links from version 3 to version 14 of CRIs. Can you give an opinion if it can be considered "stable"? What is the next step here?

4 participants