Query Carto datasets with a Socrata-style SODA2 API. Depends heavily on node-soda2-parser, which is a light wrapper around the node-sqlparser AST parser.
soda-carto is a reverse proxy meant to simulate a Socrata-style SODA2 API in front of a Carto API. To do that, it has to (1) translate the inbound SODA request into a Carto request, and (2) translate the Carto response into a response format you'd expect from a SODA API. 90% of the work necessary was on translating the inbound request.
Carto expects a pure PostgreSQL/PostGIS query, and SODA requests are almost
SQL to begin with: the SODA API consists of querystring parameters like
$select, $where, $group, $limit, etc. By combining those values, you can easily construct a SQL query. The SODA API also allows "simple filters," such as ?district=northinstead of$where=district = 'north'. It also supports a $query` parameter where you can write pure SQL, but that isn't supported by
soda-carto under the assumption that it's not commonly used.
In order to translate these combinations of querystrings into a Carto request (basically a PostgreSQL/PostGIS query), soda-carto first delegates to node-soda2-parser, which constructs a standard SQL query from the querystring parts, then parses it to an abstract syntax tree (AST) using node-sqlparser. This logic resides in its own package so that other layers can sit on top of it (such as soda-geoservices), but that may be more trouble than it's worth.
soda-carto applies a few changes to the parsed AST:
- Insert the table name as the
FROMclause - Add a default limit of
1000if none is specified - Select the geometry as GeoJSON if the output format is json, and as Well-known Text if the output format is csv
- Rename the geometry field, since Socrata datasets support arbitrary geometry
field names and in carto, they're always called
the_geom - Convert Socrata's convex_hull, within_box, within_circle, and within_polygon functions to standard PostGIS
soda-carto then uses node-soda2-parser's stringify function to convert the
touched-up AST back to SQL. This logic is all wrapped by handler.js, which
handles an inbound HTTP request, translates it to a Carto/SQL query (as
described above), then executes that query on a Carto API and passes on the
response. It also does a few other minor things:
- If the desired output format is json, it returns only the
rowsproperty of the Carto response, to match the response format you'd expect from a SODA API - It uses a lookup file (
datasets.yml) to match Socrata dataset IDs (ie.sd09-aox1) to the corresponding Carto table (this file must be populated manually)
The last layer is serverless, which uses a simple
YAML configuration file to deploy AWS Lambda
functions. serverless.yml defines the routes and delegates the handling to
handler.js.
To run this yourself, git clone the repository and npm install the
dependencies. You'll also need to install serverless using npm install --global serverless.
You can change the CARTO_DOMAIN inside serverless.yml, then
use serverless deploy to deploy the function to AWS Lambda. To view the logs,
check CloudWatch or use serverless logs -f soda.
There is also a local web server provided by server.js, but there isn't full
parity between that and what AWS Lambda would show; we need to rewire
server.js to simply delegate to handler.js.
The below table outlines the specific features identified in the SODA2
docs. Features designated auto are
already PostgreSQL-compliant, so their support can be taken for granted.
Features marked supported have custom "translation" code written to make them
PostgreSQL/Carto-compliant. Features marked "todo" should be possible, but
require translation code to be written. Those marked "not supported" are likely
due to a shortcoming of the node-sqlparser library. Each feature should have a
test written.
| category | feature | example | support |
|---|---|---|---|
| $select | all fields | $select=* |
auto |
| $select | comma-separated fields | $select=foo, bar |
auto |
| $select | aliases | $select=foo AS bar |
auto |
| $select | operators | $select=foo * 2 AS double_foo |
auto |
| $select | count | $select=count(*) AS count |
auto |
| $select | sum | $select=sum(foo) AS total |
auto |
| $select | average | $select=avg(foo) |
auto |
| $select | minimum | $select=min(foo) |
auto |
| $select | maximum | $select=max(foo) |
auto |
| $select | date truncation (y/ym/ymd) | $select=date_trunc_ym(datetime) AS month |
todo |
| $select | convex_hull | $select=convex_hull(location) |
supported |
| $select | case | $select=case(type = 'A', 'Full', type = 'B', 'Partial') |
supported |
| $select | extent | $select=extent(location) |
todo |
| $select | simplify | $select=simplify(location, 0.001) |
todo |
| $select | number of vertices | $select=num_points(location) |
todo |
| $select | distance in meters | $select=distance_in_meters(location, 'POINT(-122.334540 47.59815)') AS range |
todo |
| $select | concatenate strings | $select=foo |
|
| $where | equality expression | $where=foo = 'bar' |
auto |
| $where | AND operator | $where=foo = 'bar' AND baz = 2 |
auto |
| $where | parentheses | $where=foo = 'bar' AND (baz = 2 OR baz = 3) |
auto |
| $where | operators | $where=end - start < 3 |
auto |
| $where | simple filters | foo=bar&baz=1 |
supported |
| $where | simple filters with $where clause | foo=bar&$where=baz = 1` | supported |
| $where | quotes within strings | $where=foo = 'bob''s burgers' |
not supported |
| $where | quotes within simple filters | $foo=bob's burgers |
supported |
| $where | boolean fields | $where=foo = true |
auto |
| $where | boolean fields short-hand | $where=foo |
auto |
| $where | in function | $where=foo in ('bar', 'baz') |
auto |
| $where | not in function | $where=foo not in ('bar', 'baz') |
auto |
| $where | between | $where=foo between '100' and '200' |
auto |
| $where | not between | $where=foo not between '100' and '200' |
not supported |
| $where | intersects | $where=intersects(location, 'POINT (-12.3, 45.6)') |
supported |
| $where | starts with | $where=starts_with(title, 'chief') |
todo |
| $where | within box | $where=within_box(location, 47.5, -122.3, 47.5, -122.3) |
supported |
| $where | within circle | $where=within_circle(location, 47.59815, -122.33454, 500) |
supported |
| $where | within polygon | $where=within_polygon(location, 'MULTIPOLYGON (((-87.637714 41.887275, -87.613681 41.886892, -87.625526 41.871555, -87.637714 41.887275)))') |
supported |
| $where | record updated | $where=:updated_at > '2017-02-19' |
not supported |
| $order | order by | $order=foo |
auto |
| $order | order by with direction | $order=foo DESC |
auto |
| $group | group by | $group=foo |
auto |
| $group | group by truncated date | $group=date_trunc_ym(datetime) |
todo |
| $having | having | $select=count(*) AS count&$group=bar&$having=count > 20 |
todo |
| $limit | limit | $limit=10 |
auto |
| $limit | default limit of 1000 | $select=foo |
supported |
| $offset | offset | $offset=5 |
auto |
| $q | full-text search | $q=foo |
todo |
| $q | stemming | $q=users |
not supported |
| $q | multiple words are ANDed | $q=test user |
todo |
| $query | soql query | $query=SELECT * WHERE foo = 'bar' |
todo |
| $query | sub-query | $query=SELECT city_feature, COUNT(*) AS count GROUP BY city_feature | > SELECT COUNT(city_feature) AS num_types, SUM(count) AS total_features |
not supported |
| $jsonp | jsonp callback | $jsonp=callback |
todo |