PostgreSQL 14 preview - 一些SQL标准函数调用parser 增强 - OVERLAPS , OVERLAY , SUBSTRING , SIMILAR , NORMALIZE , AT TIME ZONE , POSITION ...
digoal
2020-11-07
PostgreSQL , SQL
Improve our ability to regurgitate SQL-syntax function calls.
The SQL spec calls out nonstandard syntax for certain function calls,
for example substring() with numeric position info is supposed to be
spelled "SUBSTRING(string FROM start FOR count)". We accept many
of these things, but up to now would not print them in the same format,
instead simplifying down to "substring"(string, start, count).
That's long annoyed me because it creates an interoperability
problem: we're gratuitously injecting Postgres-specific syntax into
what might otherwise be a perfectly spec-compliant view definition.
However, the real reason for addressing it right now is to support
a planned change in the semantics of EXTRACT() a/k/a date_part().
When we switch that to returning numeric, we'll have the parser
translate EXTRACT() to some new function name (might as well be
"extract" if you ask me) and then teach ruleutils.c to reverse-list
that per SQL spec. In this way existing calls to date_part() will
continue to have the old semantics.
To implement this, invent a new CoercionForm value COERCE_SQL_SYNTAX,
and make the parser insert that rather than COERCE_EXPLICIT_CALL when
the input has SQL-spec decoration. (But if the input has the form of
a plain function call, continue to mark it COERCE_EXPLICIT_CALL, even
if it's calling one of these functions.) Then ruleutils.c recognizes
COERCE_SQL_SYNTAX as a cue to emit SQL call syntax. It can know
which decoration to emit using hard-wired knowledge about the
functions that could be called this way. (While this solution isn't
extensible without manual additions, neither is the grammar, so this
doesn't seem unmaintainable.) Notice that this solution will
reverse-list a function call with SQL decoration only if it was
entered that way; so dump-and-reload will not by itself produce any
changes in the appearance of views.
This requires adding a CoercionForm field to struct FuncCall.
(I couldn't resist the temptation to rearrange that struct's
field order a tad while I was at it.) FuncCall doesn't appear
in stored rules, so that change isn't a reason for a catversion
bump, but I did one anyway because the new enum value for
CoercionForm fields could confuse old backend code.
Possible future work:
* Perhaps CoercionForm should now be renamed to DisplayForm,
or something like that, to reflect its more general meaning.
This'd require touching a couple hundred places, so it's not
clear it's worth the code churn.
* The SQLValueFunction node type, which was invented partly for
the same goal of improving SQL-compatibility of view output,
could perhaps be replaced with regular function calls marked
with COERCE_SQL_SYNTAX. It's unclear if this would be a net
code savings, however.
Discussion: https://postgr.es/m/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
扩展的SQL语法如下
+-- reverse-listing of various special function syntaxes required by SQL
+create view tt201v as
+select
+ extract(day from now()) as extr,
+ (now(), '1 day'::interval) overlaps
+ (current_timestamp(2), '1 day'::interval) as o,
+ 'foo' is normalized isn,
+ 'foo' is nfkc normalized isnn,
+ normalize('foo') as n,
+ normalize('foo', nfkd) as nfkd,
+ overlay('foo' placing 'bar' from 2) as ovl,
+ overlay('foo' placing 'bar' from 2 for 3) as ovl2,
+ position('foo' in 'foobar') as p,
+ substring('foo' from 2 for 3) as s,
+ substring('foo' similar 'f' escape '#') as ss,
+ substring('foo' from 'oo') as ssf, -- historically-permitted abuse
+ trim(' ' from ' foo ') as bt,
+ trim(leading ' ' from ' foo ') as lt,
+ trim(trailing ' foo ') as rt;
+select pg_get_viewdef('tt201v', true);
+ pg_get_viewdef
+-----------------------------------------------------------------------------------------------
+ SELECT date_part('day'::text, now()) AS extr, +
+ ((now(), '@ 1 day'::interval) OVERLAPS (CURRENT_TIMESTAMP(2), '@ 1 day'::interval)) AS o,+
+ (('foo'::text) IS NORMALIZED) AS isn, +
+ (('foo'::text) IS NFKC NORMALIZED) AS isnn, +
+ NORMALIZE('foo'::text) AS n, +
+ NORMALIZE('foo'::text, NFKD) AS nfkd, +
+ OVERLAY('foo'::text PLACING 'bar'::text FROM 2) AS ovl, +
+ OVERLAY('foo'::text PLACING 'bar'::text FROM 2 FOR 3) AS ovl2, +
+ POSITION(('foo'::text) IN ('foobar'::text)) AS p, +
+ SUBSTRING('foo'::text FROM 2 FOR 3) AS s, +
+ SUBSTRING('foo'::text SIMILAR 'f'::text ESCAPE '#'::text) AS ss, +
+ "substring"('foo'::text, 'oo'::text) AS ssf, +
+ TRIM(BOTH ' '::text FROM ' foo '::text) AS bt, +
+ TRIM(LEADING ' '::text FROM ' foo '::text) AS lt, +
+ TRIM(TRAILING FROM ' foo '::text) AS rt;
+(1 row)
+
执行一下以上view, 把alias column name去掉, 就可以看到对应的function call.
postgres=> select TRIM(TRAILING FROM ' foo '::text);
rtrim
-------
foo
(1 row)
postgres=> select POSITION(('foo'::text) IN ('foobar'::text)) ;
position
----------
1
(1 row)
at time zone用法
explain (costs off)
select * from tmptz where f1 at time zone 'utc' = '2017-01-18 00:00';
- QUERY PLAN
--------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
Seq Scan on tmptz
- Filter: (timezone('utc'::text, f1) = 'Wed Jan 18 00:00:00 2017'::timestamp without time zone)
+ Filter: ((f1 AT TIME ZONE 'utc'::text) = 'Wed Jan 18 00:00:00 2017'::timestamp without time zone)
(2 rows)
https://www.postgresql.org/docs/devel/functions-datetime.html
In addition to these functions, the SQL OVERLAPS operator is supported:
(start1, end1) OVERLAPS (start2, end2)
(start1, length1) OVERLAPS (start2, length2)
This expression yields true when two time periods (defined by their endpoints) overlap, false when they do not overlap. The endpoints can be specified as pairs of dates, times, or time stamps; or as a date, time, or time stamp followed by an interval. When a pair of values is provided, either the start or the end can be written first; OVERLAPS automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant. This means for instance that two time periods with only an endpoint in common do not overlap.
SELECT (DATE '2001-02-16', DATE '2001-12-21') OVERLAPS
(DATE '2001-10-30', DATE '2002-10-30');
Result: true
SELECT (DATE '2001-02-16', INTERVAL '100 days') OVERLAPS
(DATE '2001-10-30', DATE '2002-10-30');
Result: false
SELECT (DATE '2001-10-29', DATE '2001-10-30') OVERLAPS
(DATE '2001-10-30', DATE '2001-10-31');
Result: false
SELECT (DATE '2001-10-30', DATE '2001-10-30') OVERLAPS
(DATE '2001-10-30', DATE '2001-10-31');
Result: true
https://www.postgresql.org/docs/devel/functions-string.html
text IS [NOT] [form] NORMALIZED → boolean
Checks whether the string is in the specified Unicode normalization form. The optional form key word specifies the form: NFC (the default), NFD, NFKC, or NFKD. This expression can only be used when the server encoding is UTF8. Note that checking for normalization using this expression is often faster than normalizing possibly already normalized strings.
U&'\0061\0308bc' IS NFD NORMALIZED → t
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.