Skip to content

Latest commit

 

History

History
219 lines (164 loc) · 10.7 KB

20201107_04.md

File metadata and controls

219 lines (164 loc) · 10.7 KB

PostgreSQL 14 preview - 一些SQL标准函数调用parser 增强 - OVERLAPS , OVERLAY , SUBSTRING , SIMILAR , NORMALIZE , AT TIME ZONE , POSITION ...

作者

digoal

日期

2020-11-07

标签

PostgreSQL , SQL


背景

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=40c24bfef92530bd846e111c1742c2a54441c62c

Improve our ability to regurgitate SQL-syntax function calls.     
    
The SQL spec calls out nonstandard syntax for certain function calls,    
for example substring() with numeric position info is supposed to be    
spelled "SUBSTRING(string FROM start FOR count)".  We accept many    
of these things, but up to now would not print them in the same format,    
instead simplifying down to "substring"(string, start, count).    
That's long annoyed me because it creates an interoperability    
problem: we're gratuitously injecting Postgres-specific syntax into    
what might otherwise be a perfectly spec-compliant view definition.    
However, the real reason for addressing it right now is to support    
a planned change in the semantics of EXTRACT() a/k/a date_part().    
When we switch that to returning numeric, we'll have the parser    
translate EXTRACT() to some new function name (might as well be    
"extract" if you ask me) and then teach ruleutils.c to reverse-list    
that per SQL spec.  In this way existing calls to date_part() will    
continue to have the old semantics.    
    
To implement this, invent a new CoercionForm value COERCE_SQL_SYNTAX,    
and make the parser insert that rather than COERCE_EXPLICIT_CALL when    
the input has SQL-spec decoration.  (But if the input has the form of    
a plain function call, continue to mark it COERCE_EXPLICIT_CALL, even    
if it's calling one of these functions.)  Then ruleutils.c recognizes    
COERCE_SQL_SYNTAX as a cue to emit SQL call syntax.  It can know    
which decoration to emit using hard-wired knowledge about the    
functions that could be called this way.  (While this solution isn't    
extensible without manual additions, neither is the grammar, so this    
doesn't seem unmaintainable.)  Notice that this solution will    
reverse-list a function call with SQL decoration only if it was    
entered that way; so dump-and-reload will not by itself produce any    
changes in the appearance of views.    
    
This requires adding a CoercionForm field to struct FuncCall.    
(I couldn't resist the temptation to rearrange that struct's    
field order a tad while I was at it.)  FuncCall doesn't appear    
in stored rules, so that change isn't a reason for a catversion    
bump, but I did one anyway because the new enum value for    
CoercionForm fields could confuse old backend code.    
    
Possible future work:    
    
* Perhaps CoercionForm should now be renamed to DisplayForm,    
or something like that, to reflect its more general meaning.    
This'd require touching a couple hundred places, so it's not    
clear it's worth the code churn.    
    
* The SQLValueFunction node type, which was invented partly for    
the same goal of improving SQL-compatibility of view output,    
could perhaps be replaced with regular function calls marked    
with COERCE_SQL_SYNTAX.  It's unclear if this would be a net    
code savings, however.    
    
Discussion: https://postgr.es/m/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu    

扩展的SQL语法如下

+-- reverse-listing of various special function syntaxes required by SQL    
+create view tt201v as    
+select    
+  extract(day from now()) as extr,    
+  (now(), '1 day'::interval) overlaps    
+    (current_timestamp(2), '1 day'::interval) as o,    
+  'foo' is normalized isn,    
+  'foo' is nfkc normalized isnn,    
+  normalize('foo') as n,    
+  normalize('foo', nfkd) as nfkd,    
+  overlay('foo' placing 'bar' from 2) as ovl,    
+  overlay('foo' placing 'bar' from 2 for 3) as ovl2,    
+  position('foo' in 'foobar') as p,    
+  substring('foo' from 2 for 3) as s,    
+  substring('foo' similar 'f' escape '#') as ss,    
+  substring('foo' from 'oo') as ssf,  -- historically-permitted abuse    
+  trim(' ' from ' foo ') as bt,    
+  trim(leading ' ' from ' foo ') as lt,    
+  trim(trailing ' foo ') as rt;    
    
    
+select pg_get_viewdef('tt201v', true);    
+                                        pg_get_viewdef                                             
+-----------------------------------------------------------------------------------------------    
+  SELECT date_part('day'::text, now()) AS extr,                                               +    
+     ((now(), '@ 1 day'::interval) OVERLAPS (CURRENT_TIMESTAMP(2), '@ 1 day'::interval)) AS o,+    
+     (('foo'::text) IS NORMALIZED) AS isn,                                                    +    
+     (('foo'::text) IS NFKC NORMALIZED) AS isnn,                                              +    
+     NORMALIZE('foo'::text) AS n,                                                             +    
+     NORMALIZE('foo'::text, NFKD) AS nfkd,                                                    +    
+     OVERLAY('foo'::text PLACING 'bar'::text FROM 2) AS ovl,                                  +    
+     OVERLAY('foo'::text PLACING 'bar'::text FROM 2 FOR 3) AS ovl2,                           +    
+     POSITION(('foo'::text) IN ('foobar'::text)) AS p,                                        +    
+     SUBSTRING('foo'::text FROM 2 FOR 3) AS s,                                                +    
+     SUBSTRING('foo'::text SIMILAR 'f'::text ESCAPE '#'::text) AS ss,                         +    
+     "substring"('foo'::text, 'oo'::text) AS ssf,                                             +    
+     TRIM(BOTH ' '::text FROM ' foo '::text) AS bt,                                           +    
+     TRIM(LEADING ' '::text FROM ' foo '::text) AS lt,                                        +    
+     TRIM(TRAILING FROM ' foo '::text) AS rt;    
+(1 row)    
+    

执行一下以上view, 把alias column name去掉, 就可以看到对应的function call.

postgres=> select TRIM(TRAILING FROM ' foo '::text);
 rtrim 
-------
  foo
(1 row)


postgres=> select POSITION(('foo'::text) IN ('foobar'::text)) ;
 position 
----------
        1
(1 row)

at time zone用法

 explain (costs off)    
 select * from tmptz where f1 at time zone 'utc' = '2017-01-18 00:00';    
-                                           QUERY PLAN                                                
--------------------------------------------------------------------------------------------------    
+                                             QUERY PLAN                                                  
+-----------------------------------------------------------------------------------------------------    
  Seq Scan on tmptz    
-   Filter: (timezone('utc'::text, f1) = 'Wed Jan 18 00:00:00 2017'::timestamp without time zone)    
+   Filter: ((f1 AT TIME ZONE 'utc'::text) = 'Wed Jan 18 00:00:00 2017'::timestamp without time zone)    
 (2 rows)    

https://www.postgresql.org/docs/devel/functions-datetime.html

In addition to these functions, the SQL OVERLAPS operator is supported:

(start1, end1) OVERLAPS (start2, end2)    
(start1, length1) OVERLAPS (start2, length2)    

This expression yields true when two time periods (defined by their endpoints) overlap, false when they do not overlap. The endpoints can be specified as pairs of dates, times, or time stamps; or as a date, time, or time stamp followed by an interval. When a pair of values is provided, either the start or the end can be written first; OVERLAPS automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant. This means for instance that two time periods with only an endpoint in common do not overlap.

SELECT (DATE '2001-02-16', DATE '2001-12-21') OVERLAPS    
       (DATE '2001-10-30', DATE '2002-10-30');    
Result: true    
SELECT (DATE '2001-02-16', INTERVAL '100 days') OVERLAPS    
       (DATE '2001-10-30', DATE '2002-10-30');    
Result: false    
SELECT (DATE '2001-10-29', DATE '2001-10-30') OVERLAPS    
       (DATE '2001-10-30', DATE '2001-10-31');    
Result: false    
SELECT (DATE '2001-10-30', DATE '2001-10-30') OVERLAPS    
       (DATE '2001-10-30', DATE '2001-10-31');    
Result: true    

https://www.postgresql.org/docs/devel/functions-string.html

text IS [NOT] [form] NORMALIZED → boolean    

Checks whether the string is in the specified Unicode normalization form. The optional form key word specifies the form: NFC (the default), NFD, NFKC, or NFKD. This expression can only be used when the server encoding is UTF8. Note that checking for normalization using this expression is often faster than normalizing possibly already normalized strings.

U&'\0061\0308bc' IS NFD NORMALIZED → t    

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

digoal's wechat