- Version: 1.1
- Author: Norman Walsh
- Date: 28 July 2011
(Are you anxious and impatient? Do you want to just see it work? Then you might prefer to skip down to the slides example).
The REST endpoint library is a set of XQuery modules designed to make the development and deployment of RESTful web services on MarkLogic server easier. It consists of two parts, an XML vocabulary for describing web service endpoints and a library module.
The XML vocabulary is used to write declarative descriptions of the endpoints. These description include the mapping of URI parts to parameters, additional parameters, and conditions that must be met in order for the incoming request to match.
The library module contains functions that simplify:
- A URL rewriter for mapping incoming requests to endpoints
- The parsing of parameters within the endpoint itself
The XML vocabulary has been designed so that same description can be used for both the rewriter and the endpoint. One motivation for this approach was to assure that the same code is used for both choosing a rewrite and validating an endpoint; this minimizes the possibility of semantic drift between the two halves of the task.
The REST endpoint markup is designed to facilitate writing services in MarkLogic server. To that end, it provides a single, declarative style for describing the URI, parameters, and other aspects of an "endpoint". The URI rewriter uses a list of these endpoint descriptions to dispatch to the right main module. The main module in turn uses the same description to validate its parameters.
In other words, by writing a single description you get a rewriter for free and validated, typed parameters in a single function call.
The simplest example just maps the URI to a module:
<request uri="/" endpoint="/default.xqy"/>
If the incoming request is for "/", it is rewritten to "/default.xqy".
The uri
is a regular expression, so you could also support requests
for index.html
like this:
<request uri="^/(index\.html)?$" endpoint="/default.xqy"/>
If all your rewrites were that simple, you'd hardly need a library to simplify them. A more common sort of example is one that involves translating parts of a URI into parameters. Suppose, for example, that we have an endpoint for displaying slides from a presentation. The endpoint needs the name of the slide deck to display and the number of the slide. We could just slap those on as parameters, but getting URIs right is an important part of building a REST interface.
Instead, we want to expose them through URIs of this form:
/slides/
deck/
number. So the third slide of the "uc11" deck would
be referenced with the URI /slides/uc11/3
. That endpoint can be described
as follows:
<request uri="^/slides/(.+?)/(\d+)$" endpoint="/slides.xqy">
<uri-param name="deck">$1.xml</uri-param>
<uri-param name="num">$2</uri-param>
</request>
As before, the rewriter starts by comparing the request URI with the
specified uri
regex. If it doesn't match, we don't have to go
further. But if request URI is /slides/uc11/3
, then it does match.
For each uri-param
, the rewriter will construct a parameter with
the specified name using fn:replace()
to compute the value. So the
deck
parameter will have the value uc11.xml
because
fn:replace('/slides/uc11/3', '^/slides/(.+?)/(\d+)$', '$1.xml')
is
uc11.xml
and the num
parameter will have the value 3
.
These parameters are passed to the specified endpoint. You can think of this as a rewrite of the request URI to
/slides.xqy?decl=uc11.xml&num=3
although that isn't exactly what happens for subtle technical reasons related to POST requests that aren't important right now.
So far so good. But inside the slides.xqy
module, we've got to decode
all those parameters. In the simplest cases, like this one, it's not too
hard to call xdmp:get-request-field()
for each parameter. But as endpoints
become more complicated with optional parameters and repeated parameters it
quickly becomes quite tedious.
Since we have a description of the endpoint, we can let a single library
function deal with all the complexity for us. Assuming we have the relevant
request
element in $request
, this simple call
$params := rest:parse-options($request)
will return a map of the parameters. That still may not look too compelling, but we can immediately begin to expose additional benefits "for free".
The first benefit you get is error detection. If someone manages to call this
endpoint with the wrong options, the rest:parse-options()
method will automatically raise an error. This means you (a) don't have to write
the error checking code and (2) don't have to worry about bugs that might be
introduced by failing to check for those errors.
Another problem you may already have encountered in dealing with endpoint parameters is their type. Consider this simple implementation of the slides endpoint:
let $deck := xdmp:get-request-field("deck")
let $slideno := xdmp:get-request-field("num")
return
doc($deck)/slides/slide[$slideno]
Even if you get the right document with $deck
, you're still going to
get a type error because $slideno
is a string. Rather than trying to
remember to do all of the type casting in your endpoint, and
addressing the errors potentially raised by those casts, we can
augment the description:
<request uri="^/slides/(.+?)/(\d+)$" endpoint="/slides.xqy"> <uri-param name="deck">$1.xml</uri-param> <uri-param name="num" as="decimal">$2</uri-param> </request>
Now consider the analagous simple implementation:
let $params := rest:parse-options($request)
let $deck := map:get($params, "deck")
let $slideno := map:get($params, "num")
return
doc($deck)/slides/slide[$slideno]
This code will work because $slideno
will be a decimal. As before, you
get error detection for free. If someone manages to call this endpoint with a
num
parameter that isn't a decimal, the rest:parse-options()
call will
raise the error for you.
You aren't limited to just parameters parsed from the URI. Suppose we augment
the slides.xqy
module to support a "theme
" parameter. We can simply add that
to the description:
<request uri="^/slides/(.+?)/(\d+)$" endpoint="/slides.xqy">
<uri-param name="deck">$1.xml</uri-param>
<uri-param name="num" as="decimal">$2</uri-param>
<param name="theme"/>
</request>
Now the rewriter and the endpoint will both allow a theme
parameter.
You can make the theme parameter required:
<param name="theme" required="true"/>
in which case a request URI that doesn't have a theme
will not match in the rewriter, and an
attempt to call the endpoint without a theme
will raise an error.
Alternatively, you can provide a default value:
<param name="theme" default="hires"/>
You can also mark
a parameter as repeatable. Without stretching our running "slides" example too far,
let's say you wanted to allow a css
parameter to specify additional stylesheets
for a particular slide. You might want to allow more than one, so you could add
a css
parameter like this:
<param name="css" repeatable="true"/>
In the rewriter, this would allow any number of css
parameters. In the endpoint,
there'd be a single css
key in the parameters map but its value would be a list.
To recap: if you don't make any declarations about a parameter beyond its name, then it's optional but may occur at most once; if you say it's required it must occur at least once; and if you say it's repeatable, it may occur more than once.
For parameters that must be numbers, booleans, or other atomic types, the
as
attribute is all you need. But suppose you want to limit the value to
a discrete set of possibilities?
A parameter like theme
, for example, might be a string but might accept only
a few possible values. You can do that with the values
attribute:
<param name="theme" values="hires|lowres|mobile|bw" default="hires"/>
With this declaration, the theme
must be exactly one of "hires", "lowres",
"mobile", or "bw".
Finally, to round out the options on parameter handling, you can also specify
aliases. This option is useful if you decide to change the names of parameters
on the server but don't want to break existing clients. Suppose, for example,
that you extend your slides framework so that it accepts stylesheets in languages
other than CSS. Then you might want to rename the css
parameter to style
.
If you specify the parameter this way:
<param name="style" alias="css" repeatable="true"/>
then the endpoint will see a parameter named style
even if the user specifies
css
in the request.
The real motivating factor for this feature however, is to support a, uh, feature
of jQuery. If, to continue the above example, you use jQuery's
AJAX features to send an array of style parameters to the endpoint, even if you specify
the key "style
" in your JSON object, jQuery will send the parameter with the
name "style[]
". It automatically adds square brackets to indicate that an array
is being passed. You can work around that by making style[]
an alias:
<param name="style" alias="css|style[]" repeatable="true"/>
Like a list of values, a list of aliases can be separated by vertical bars.
We have one more trick up our sleeves with respect to parameters. Sometimes it's
useful to be able to perform the sort of match and replace operations on parameter
values that we can perform on parts of the URI. Suppose, for example, that you've
got a parameter that will contain an internet media type, you can extract part of
that value using match
:
<param name="format" match="^application/(.*?)(\+xml)?$">$1</param>
This will translate the format=application/xslt+xml
to format=xslt
.
You can use this feature to rename parameters as well:
<param name="lang" from="format" match="^application/(.*?)(\+xml)?$">$1</param>
This will translate the format=application/xslt+xml
to lang=xslt
.
There's no facility for more complex processing, combining multiple parameters and such.
N.B. If you combine matching in parameters with validation, make sure that you validate against the transformed value. For example, this parameter will never match:
<param name="test" values="foo|bar" match="^(.+)$">baz-$1</param>
Instead, write it this way:
<param name="test" values="baz-foo|baz-bar" match="^(.+)$">baz-$1</param>
In other words, the value that is validated is the transformed value.
There's no rule in the URI rewriter that says only a single request can be specified
for a given endpoint. In the rules we've looked at so far for the "slides" example,
we're only matching a request for a single slide. Suppose we wanted another rule for
presenting the "title page" slide. One reasonable format for that URI would be to simply
leave off the trailing slide number. So /slides/uc11
would display the title page
slide. We can support that with two rules:
<options xmlns="http://marklogic.com/appservices/rest">
<request uri="^/slides/(.+?)/(\d+)$" endpoint="/slides.xqy">
<uri-param name="slides">$1.xml</uri-param>
<uri-param name="num" as="decimal">$2</uri-param>
</request>
<request uri="^/slides/(.+?)/?$" endpoint="/slides.xqy">
<uri-param name="slides">$1.xml</uri-param>
</request>
</options>
If the request doesn't have a number, it won't match the first request, but will
match the second. In this case, you simply have to make sure that the /slides.xqy
endpoint
uses a request
that will validate all of the possible rewrites.
To simplify earlier examples, I omitted the <options>
element. In practice the rewriter
will have many request
s to consider and they're grouped in an options
element.
A request
that doesn't specify any verbs only matches HTTP GET
requests. If you
want to match other verbs, simply list them:
<request uri="^/slides/(.+?)/?$" endpoint="/slides.xqy">
<uri-param name="slides">$1.xml</uri-param>
<http method="GET"/>
<http method="POST"/>
</request>
This request will match (and validate) if the request method is either an HTTP GET
or
an HTTP POST
.
Adding support for POST
is a good place to talk about adding conditions. Just because
you want to display slides for anyone doesn't mean you want anyone to be able to change
them.
You can add additional conditions, either in the body of the request
, in which case they
apply to all verbs, or within a particular verb:
<request uri="^/slides/(.+?)/?$" endpoint="/slides.xqy">
<uri-param name="slides">$1.xml</uri-param>
<http method="GET"/>
<http method="POST">
<auth>
<privilege>http://example.com/privs/editor</privilege>
<kind>execute</kind>
</auth>
</http>
</request>
With this request
, only users with the specifed execute privilege can POST
to
that URI. If a user without that privilege attempts to post, this request won't match
and control will fall through to the next request. In this way, you can provide fallbacks
if you wish.
In a rewriter, failing to match a condition causes the request not to match. In an endpoint, failing to match a condition raises an error.
As shown in the example above, the auth
condition tests for the specified privilege:
<auth> <privilege>privilege-uri</privilege> <kind>kind</kind> </auth>
The privilege can be any specified execute or URI privilege. If unspecified,
kind
defaults to execute
.
When a user agent requests a URI, it can also specify the kinds of responses that it is
able to accept. These are specified in terms of media types. You can specify the
media type(s) that are acceptable with the accept
header:
<accept>application/json</accept>
A request
that specifies the accept
shown above will only match user agent
requests that can accept JSON responses.
You can also match on the user agent string:
<user-agent>ELinks</user-agent>
A request
that specifies the user-agent
shown above will only match user agents
that identify as the ELinks browser.
The function
condition gives you the ability to test for arbitrary
conditions. By specifying the namespace, localname, and module of a
function, you can execute arbitrary code:
<function ns="http://example.com/module" apply="my-function" at="utils.xqy"/>
A request
that specifies the function
shown above will only match requests
for which the specified function returns true
. The function will be passed
the URI string and the function
condition element.
An and
condition must contain only conditions. It returns true if
and only if all of its child conditions return true
.
<and>
...conditions...
</and>
There is no guarantee that conditions will be evaluated in any particular order or that all conditions will be evaluated.
If more than one condition is present at the top level in a request
, they
are treated as they occurred in an and
.
An or
condition must contain only conditions. It returns true if and
only if at least one of its child conditions return true
.
<or>
...conditions...
</or>
There is no guarantee that conditions will be evaluated in any particular order or that all conditions will be evaluated.
By default, the rewriter and endpoint parser will reject any request that
includes additional, user-specified parameters. Sometimes you may want to
write an endpoint that allows arbitrary parameters specified by the user.
This behavior is controlled by the user-params
attribute. It can be specified
on the http
method, the request
, or even the options
element.
It can have the following values:
forbid
, the default, forbids additional parametersignore
, discards themallow
, allows themallow-dups
, allows them even if they duplicateuri-param
parameters
The REST endpoint library is accessed by importing it:
import module namespace rest="http://marklogic.com/appservices/rest"
at "/MarkLogic/appservices/utils/rest.xqy";
It contains the functions described in the sections below.
Because it's convenient to reuse the endpoint description in both the rewriter and the individual modules, it's often convenient to describe them in a separate module, like this:
xquery version "1.0-ml";
module namespace endpoints="http://example.com/ns/endpoints";
declare namespace rest="http://marklogic.com/appservices/rest";
declare default function namespace "http://www.w3.org/2005/xpath-functions";
declare option xdmp:mapping "false";
declare variable $endpoints:ENDPOINTS as element(rest:options)
:= <options xmlns="http://marklogic.com/appservices/rest">
... your endpoint descriptions go here ...
</options>;
declare function endpoints:options()
as element(rest:options)
{
$endpoints:ENDPOINTS
};
declare function endpoints:request(
$module as xs:string)
as element(rest:request)?
{
($endpoints:ENDPOINTS/rest:request[@endpoint = $module])[1]
};
The examples below follow this pattern, but you're not required to do so.
If you prefer, you can simply put the options
in your rewriter and individual
request
elements in your endpoints.
The rest:rewrite()
function is used in the URL rewriter to map the incoming
request to the endpoint that supports it. A typical rewriter.xqy
module looks
like this:
xquery version "1.0-ml";
import module namespace rest="http://marklogic.com/appservices/rest"
at "/MarkLogic/appservices/utils/rest.xqy";
import module namespace endpoints="http://example.com/ns/endpoints"
at "endpoints.xqy";
declare default function namespace "http://www.w3.org/2005/xpath-functions";
declare option xdmp:mapping "false";
let $rewrite := rest:rewrite(endpoints:options())
return
if (empty($rewrite))
then
$uri
else
$rewrite
The default, one-argument form of rest:rewrite()
takes all of the relevant
parameters (URI, HTTP request method, accept headers, and user parameters) from
the environment. There are other entry points that provide more control if that's
desirable.
The rest:process-request()
function is used in the endpoint main
module to parse the incoming request against the options. It returns a
map that contains all of the parameters as typed values. Processing
the request also checks all of the assocaited conditions and will
raise an error if any condition is not met.
A typical main module looks like this:
xquery version "1.0-ml";
import module namespace rest="http://marklogic.com/appservices/rest"
at "/MarkLogic/appservices/utils/rest.xqy";
import module namespace endpoints="http://example.com/ns/endpoints"
at "endpoints.xqy";
declare default function namespace "http://www.w3.org/2005/xpath-functions";
declare option xdmp:mapping "false";
try {
let $params := rest:process-request(endpoints:request('/thismodule.xqy'))
return
...your endpoint code goes here...
} catch ($e) {
rest:report-error($e)
}
If the request is processed successfully, you know that all of the conditions have
been met and the $params
map contains all of the parameters. If not, an error
will occur which you can catch and process. See Handling errors.
Proper functioning of the REST endpoint library depends on the correctness of the
descriptions used. The rest:check-options()
method takes an options
node and returns
a report of the problems found. If this function does not return an empty sequence,
you have made a mistake and the library will not perform reliably.
Proper functioning of the REST endpoint library depends on the correctness of the
descriptions used. The rest:check-request()
method takes a request
node and returns
a report of the problems found. If this function does not return an empty sequence,
you have made a mistake and the library will not perform reliably.
The rest:report-error()
function is a convenience function for transforming
error:error
nodes into HTML. See Handling errors.
The REST endpoint library exposes a few additional, low-level functions. These may be useful in more complex applications that need to perform sophisticated processing on requests or implement their own rewriting strategies.
Unlike, rest:rewrite()
which returns the rewrite URI, rest:matching-request()
returns
the request
element that matches.
Note that the rewrite URI is composed from both the request
element and the request
environment, including additional parameters, so it's not possible to construct the rewrite
URI solely from the request
.
The rest:test-request-method()
function tests a request
against
the xdmp:get-request-method()
. It returns an empty sequence if the
test passes and raises an error otherwise.
The rest:test-conditions()
function tests all of the
conditions of a request. It returns an empty sequence
if the test passes and raises an error otherwise.
The rest:get-acceptable-types()
function returns a list of media types.
These are the media types that are the intersection of what the endpoint description
claims it can produce and what the user agent claimed it could accept.
They're returned in preference order.
The rest:get-raw-query-params()
function returns a map of all the
query parameters. This does not include the parameters that would be
derived from matching the URI string. The parameters returned by this
function are all strings, they have not been type checked.
The REST endpoint library includes a rest:report-error()
function
that performs a simple translation of MarkLogic Server error markup to
HTML. You can invoke it in a module to report errors:
try {
let $params := rest:process-request($request)
return
...the non-error case...
} catch ($e) {
rest:report-error($e)
}
If the user agent making the request accepts text/html
, a simple HTML-formatted
response is returned. Otherwise, the raw error XML is returned.
You can also use this function in an error handler to process all of the errors for a particular application.
The URL rewriter translates the requested URI into a new URI for dispatching within the server. The user agent making the request is totally unaware of this translation. As REST APIs mature and expand, it's sometimes useful to respond to a request by telling the user agent to reissue the request at a new URI. This is called redirection.
For example, suppose we decide to change the URI pattern for slides from
/slides/
deck/
number to /presentations/
deck/
number. If someone
makes a request for a /slides/
URI, we want to use redirection to tell the
user agent they're using to reissue the request to the equivalent /presentations/
URI.
(Browser users can tell this has happened because the URI in their address bar
will change; this means if they bookmark the URI, they'll be bookmarking the new,
correct URI, not the old, incorrect one.)
You can support redirects by adding a redirect.xqy
module like this one to
your application:
xquery version "1.0-ml";
import module namespace rest="http://marklogic.com/appservices/rest"
at "/MarkLogic/appservies/utils/rest.xqy";
declare default function namespace "http://www.w3.org/2005/xpath-functions";
declare option xdmp:mapping "false";
declare variable $request as element(rest:request)
:= <request xmlns="http://marklogic.com/appservices/rest" user-params="allow">
<param name="__ml_redirect__" as="string" required="true"/>
</request>;
try {
let $params := rest:process-request($request)
let $rparams := for $key in map:keys($params)
where $key != "__ml_redirect__"
return
for $value in map:get($params, $key)
return
concat($key,"=",$value)
let $ruri := map:get($params, "__ml_redirect__")
return
if (empty($rparams))
then
$ruri
else
concat($ruri, "?", string-join($rparams, "&"))
} catch ($e) {
rest:report-error($e)
}
and then using request
elements like this to perform the redirect:
<request uri="^/slides/(.*)$" endpoint="/redirect.xqy" user-params="allow">
<uri-param name="__ml_redirect__">/presentations/$1</uri-param>
</request>
You can employ as many redirects as you want through the same redirect.xqy
module
by changing the value of the __ml_redirect__
parameter. (In the unlikely event that
your application needs a parameter with that name, simple change it to something else
in both the module and each request.)
One of the nice things about having a declarative syntax for endpoint descriptions is the ability to interrogate those definitions for other purposes. For example, one could imagine automating some aspects of unit testing based on the ability to find the description for an endpoint.
You can implement this using the REST endpoint library by supporting the
OPTIONS method. Here's a very simple options.xqy
module that will return
the request
associated with a particular URI.
xquery version "1.0-ml";
import module namespace rest="http://marklogic.com/appservices/rest"
at "rest.xqy";
import module namespace endpoints="http://example.com/ns/endpoints"
at "endpoints.xqy";
declare default function namespace "http://www.w3.org/2005/xpath-functions";
declare option xdmp:mapping "false";
declare variable $request as element(rest:request)
:= <request xmlns="http://marklogic.com/appservices/rest"
uri="^(.*)$" endpoint="/options.xqy" user-params="allow">
<uri-param name="__ml_options__">$1</uri-param>
<http method="OPTIONS"/>
</request>;
try {
let $params := rest:process-request($request)
let $ruri := map:get($params, "__ml_options__")
return
<options xmlns="http://marklogic.com/appservices/rest">
{ if ($ruri = "/")
then
endpoints:options()/rest:request
else
rest:matching-request(endpoints:options(), $ruri, "GET")
}
</options>
} catch ($e) {
rest:report-error($e)
}
Note that if the request URI is /
, this module will return the entire
options
element, exposing the complete set of endpoints. For consistency, even
a single request is therefore wrapped in an options
node.
Because a single description can match different HTTP methods, possibly with different
parameters, when the URI is not /
, the request is treated as a GET
for the purposes
of finding the request
.
You can use it by adding the following request to the end of your options:
<request uri="^(.+)$" endpoint="/options.xqy" user-params="allow">
<uri-param name="__ml_options__">$1</uri-param>
<http method="OPTIONS"/>
</request>
Obviously, if some earlier request directly supports OPTIONS
then it will have
priority for that resource.
Translating the options
node dynamically into something more widely used, such as
WADL, is an exercise left to the reader for the moment.