<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>6.9. Regular Expression Functions — Presto 0.192 Documentation</title>
<link rel="stylesheet" href="../_static/presto.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '../',
VERSION: '0.192',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="top" title="Presto 0.192 Documentation" href="../index.html" />
<link rel="up" title="6. Functions and Operators" href="../functions.html" />
<link rel="next" title="6.10. Binary Functions and Operators" href="binary.html" />
<link rel="prev" title="6.8. String Functions and Operators" href="string.html" />
</head>
<body role="document">
<div class="header">
<h1 class="heading"><a href="../index.html">
<span>Presto 0.192 Documentation</span></a></h1>
<h2 class="heading"><span>6.9. Regular Expression Functions</span></h2>
</div>
<div class="topnav">
<p class="nav">
<span class="left">
« <a href="string.html">6.8. String Functions and Operators</a>
</span>
<span class="right">
<a href="binary.html">6.10. Binary Functions and Operators</a> »
</span>
</p>
</div>
<div class="content">
<div class="section" id="regular-expression-functions">
<h1>6.9. Regular Expression Functions</h1>
<p>All of the regular expression functions use the <a class="reference external" href="http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html">Java pattern</a> syntax,
with a few notable exceptions:</p>
<ul class="simple">
<li>When using multi-line mode (enabled via the <code class="docutils literal"><span class="pre">(?m)</span></code> flag),
only <code class="docutils literal"><span class="pre">\n</span></code> is recognized as a line terminator. Additionally,
the <code class="docutils literal"><span class="pre">(?d)</span></code> flag is not supported and must not be used.</li>
<li>Case-insensitive matching (enabled via the <code class="docutils literal"><span class="pre">(?i)</span></code> flag) is always
performed in a Unicode-aware manner. However, context-sensitive and
locale-sensitive matching is not supported. Additionally, the
<code class="docutils literal"><span class="pre">(?u)</span></code> flag is not supported and must not be used.</li>
<li>Surrogate pairs are not supported. For example, <code class="docutils literal"><span class="pre">\uD800\uDC00</span></code> is
not treated as <code class="docutils literal"><span class="pre">U+10000</span></code> and must be specified as <code class="docutils literal"><span class="pre">\x{10000}</span></code>.</li>
<li>Boundaries (<code class="docutils literal"><span class="pre">\b</span></code>) are incorrectly handled for a non-spacing mark
without a base character.</li>
<li><code class="docutils literal"><span class="pre">\Q</span></code> and <code class="docutils literal"><span class="pre">\E</span></code> are not supported in character classes
(such as <code class="docutils literal"><span class="pre">[A-Z123]</span></code>) and are instead treated as literals.</li>
<li>Unicode character classes (<code class="docutils literal"><span class="pre">\p{prop}</span></code>) are supported with
the following differences:<ul>
<li>All underscores in names must be removed. For example, use
<code class="docutils literal"><span class="pre">OldItalic</span></code> instead of <code class="docutils literal"><span class="pre">Old_Italic</span></code>.</li>
<li>Scripts must be specified directly, without the
<code class="docutils literal"><span class="pre">Is</span></code>, <code class="docutils literal"><span class="pre">script=</span></code> or <code class="docutils literal"><span class="pre">sc=</span></code> prefixes.
Example: <code class="docutils literal"><span class="pre">\p{Hiragana}</span></code></li>
<li>Blocks must be specified with the <code class="docutils literal"><span class="pre">In</span></code> prefix.
The <code class="docutils literal"><span class="pre">block=</span></code> and <code class="docutils literal"><span class="pre">blk=</span></code> prefixes are not supported.
Example: <code class="docutils literal"><span class="pre">\p{InMongolian}</span></code></li>
<li>Categories must be specified directly, without the <code class="docutils literal"><span class="pre">Is</span></code>,
<code class="docutils literal"><span class="pre">general_category=</span></code> or <code class="docutils literal"><span class="pre">gc=</span></code> prefixes.
Example: <code class="docutils literal"><span class="pre">\p{L}</span></code></li>
<li>Binary properties must be specified directly, without the <code class="docutils literal"><span class="pre">Is</span></code> prefix.
Example: <code class="docutils literal"><span class="pre">\p{NoncharacterCodePoint}</span></code></li>
</ul>
</li>
</ul>
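<p>The following queries sketch a few of the rules above. The input strings are
made up for illustration, and the result comments show what the rules as stated
imply rather than captured output:</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="c1">-- (?i) enables case-insensitive matching</span>
<span class="k">SELECT</span> <span class="n">regexp_like</span><span class="p">(</span><span class="s1">'PRESTO'</span><span class="p">,</span> <span class="s1">'(?i)presto'</span><span class="p">);</span> <span class="c1">-- true</span>
<span class="c1">-- (?m) lets ^ match after each \n; chr(10) supplies the newline</span>
<span class="k">SELECT</span> <span class="n">regexp_extract_all</span><span class="p">(</span><span class="s1">'a1'</span> <span class="o">||</span> <span class="n">chr</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span> <span class="o">||</span> <span class="s1">'b2'</span><span class="p">,</span> <span class="s1">'(?m)^[a-z]'</span><span class="p">);</span> <span class="c1">-- [a, b]</span>
<span class="c1">-- a category is given directly, without a prefix</span>
<span class="k">SELECT</span> <span class="n">regexp_extract_all</span><span class="p">(</span><span class="s1">'abc 123 def'</span><span class="p">,</span> <span class="s1">'\p{L}+'</span><span class="p">);</span> <span class="c1">-- [abc, def]</span>
<span class="c1">-- a script is given directly, without Is, script= or sc=</span>
<span class="k">SELECT</span> <span class="n">regexp_like</span><span class="p">(</span><span class="s1">'ひらがな'</span><span class="p">,</span> <span class="s1">'\p{Hiragana}'</span><span class="p">);</span> <span class="c1">-- true</span>
</pre></div>
</div>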
<dl class="function">
<dt id="regexp_extract_all">
<code class="descname">regexp_extract_all</code><span class="sig-paren">(</span><em>string</em>, <em>pattern</em><span class="sig-paren">)</span> → array&lt;varchar&gt;</dt>
<dd><p>Returns the substring(s) matched by the regular expression <code class="docutils literal"><span class="pre">pattern</span></code>
in <code class="docutils literal"><span class="pre">string</span></code>:</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="n">regexp_extract_all</span><span class="p">(</span><span class="s1">'1a 2b 14m'</span><span class="p">,</span> <span class="s1">'\d+'</span><span class="p">);</span> <span class="c1">-- [1, 2, 14]</span>
</pre></div>
</div>
</dd></dl>
<dl class="function">
<dt>
<code class="descname">regexp_extract_all</code><span class="sig-paren">(</span><em>string</em>, <em>pattern</em>, <em>group</em><span class="sig-paren">)</span> → array&lt;varchar&gt;</dt>
<dd><p>Finds all occurrences of the regular expression <code class="docutils literal"><span class="pre">pattern</span></code> in <code class="docutils literal"><span class="pre">string</span></code>
and returns the <a class="reference external" href="http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#gnumber">capturing group number</a> <code class="docutils literal"><span class="pre">group</span></code>:</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="n">regexp_extract_all</span><span class="p">(</span><span class="s1">'1a 2b 14m'</span><span class="p">,</span> <span class="s1">'(\d+)([a-z]+)'</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span> <span class="c1">-- ['a', 'b', 'm']</span>
</pre></div>
</div>
</dd></dl>
<dl class="function">
<dt id="regexp_extract">
<code class="descname">regexp_extract</code><span class="sig-paren">(</span><em>string</em>, <em>pattern</em><span class="sig-paren">)</span> → varchar</dt>
<dd><p>Returns the first substring matched by the regular expression <code class="docutils literal"><span class="pre">pattern</span></code>
in <code class="docutils literal"><span class="pre">string</span></code>:</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="n">regexp_extract</span><span class="p">(</span><span class="s1">'1a 2b 14m'</span><span class="p">,</span> <span class="s1">'\d+'</span><span class="p">);</span> <span class="c1">-- 1</span>
</pre></div>
</div>
</dd></dl>
<dl class="function">
<dt>
<code class="descname">regexp_extract</code><span class="sig-paren">(</span><em>string</em>, <em>pattern</em>, <em>group</em><span class="sig-paren">)</span> → varchar</dt>
<dd><p>Finds the first occurrence of the regular expression <code class="docutils literal"><span class="pre">pattern</span></code> in
<code class="docutils literal"><span class="pre">string</span></code> and returns the <a class="reference external" href="http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#gnumber">capturing group number</a> <code class="docutils literal"><span class="pre">group</span></code>:</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="n">regexp_extract</span><span class="p">(</span><span class="s1">'1a 2b 14m'</span><span class="p">,</span> <span class="s1">'(\d+)([a-z]+)'</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span> <span class="c1">-- 'a'</span>
</pre></div>
</div>
</dd></dl>
<dl class="function">
<dt id="regexp_like">
<code class="descname">regexp_like</code><span class="sig-paren">(</span><em>string</em>, <em>pattern</em><span class="sig-paren">)</span> → boolean</dt>
<dd><p>Evaluates the regular expression <code class="docutils literal"><span class="pre">pattern</span></code> and determines if it is
contained within <code class="docutils literal"><span class="pre">string</span></code>.</p>
<p>This function is similar to the <code class="docutils literal"><span class="pre">LIKE</span></code> operator, except that the
pattern only needs to be contained within <code class="docutils literal"><span class="pre">string</span></code>, rather than
needing to match all of <code class="docutils literal"><span class="pre">string</span></code>. In other words, this performs a
<em>contains</em> operation rather than a <em>match</em> operation. You can match
the entire string by anchoring the pattern using <code class="docutils literal"><span class="pre">^</span></code> and <code class="docutils literal"><span class="pre">$</span></code>.
The following unanchored pattern matches because <code class="docutils literal"><span class="pre">2b</span></code> occurs within the string:</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="n">regexp_like</span><span class="p">(</span><span class="s1">'1a 2b 14m'</span><span class="p">,</span> <span class="s1">'\d+b'</span><span class="p">);</span> <span class="c1">-- true</span>
</pre></div>
</div>
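<p>To illustrate the difference, the same check can be anchored. In this sketch
(the inputs are illustrative), the anchored pattern fails against the longer
string but succeeds when the whole string has the required shape:</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="c1">-- anchored: the whole string must be digits followed by 'b'</span>
<span class="k">SELECT</span> <span class="n">regexp_like</span><span class="p">(</span><span class="s1">'1a 2b 14m'</span><span class="p">,</span> <span class="s1">'^\d+b$'</span><span class="p">);</span> <span class="c1">-- false</span>
<span class="k">SELECT</span> <span class="n">regexp_like</span><span class="p">(</span><span class="s1">'2b'</span><span class="p">,</span> <span class="s1">'^\d+b$'</span><span class="p">);</span> <span class="c1">-- true</span>
</pre></div>
</div>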
</dd></dl>
<dl class="function">
<dt id="regexp_replace">
<code class="descname">regexp_replace</code><span class="sig-paren">(</span><em>string</em>, <em>pattern</em><span class="sig-paren">)</span> → varchar</dt>
<dd><p>Removes every instance of the substring matched by the regular expression
<code class="docutils literal"><span class="pre">pattern</span></code> from <code class="docutils literal"><span class="pre">string</span></code>:</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="n">regexp_replace</span><span class="p">(</span><span class="s1">'1a 2b 14m'</span><span class="p">,</span> <span class="s1">'\d+[ab] '</span><span class="p">);</span> <span class="c1">-- '14m'</span>
</pre></div>
</div>
</dd></dl>
<dl class="function">
<dt>
<code class="descname">regexp_replace</code><span class="sig-paren">(</span><em>string</em>, <em>pattern</em>, <em>replacement</em><span class="sig-paren">)</span> → varchar</dt>
<dd><p>Replaces every instance of the substring matched by the regular expression
<code class="docutils literal"><span class="pre">pattern</span></code> in <code class="docutils literal"><span class="pre">string</span></code> with <code class="docutils literal"><span class="pre">replacement</span></code>. <a class="reference external" href="http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#cg">Capturing groups</a> can be
referenced in <code class="docutils literal"><span class="pre">replacement</span></code> using <code class="docutils literal"><span class="pre">$g</span></code> for a numbered group or
<code class="docutils literal"><span class="pre">${name}</span></code> for a named group. A dollar sign (<code class="docutils literal"><span class="pre">$</span></code>) may be included in the
replacement by escaping it with a backslash (<code class="docutils literal"><span class="pre">\$</span></code>):</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="n">regexp_replace</span><span class="p">(</span><span class="s1">'1a 2b 14m'</span><span class="p">,</span> <span class="s1">'(\d+)([ab]) '</span><span class="p">,</span> <span class="s1">'3c$2 '</span><span class="p">);</span> <span class="c1">-- '3ca 3cb 14m'</span>
</pre></div>
</div>
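<p>As a further sketch, a named group and an escaped dollar sign might be used
as follows. The group name <code class="docutils literal"><span class="pre">num</span></code> and the input strings are chosen for
illustration only, and the result comments follow from the rules described above:</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="c1">-- keep only the digits captured by the named group "num"</span>
<span class="k">SELECT</span> <span class="n">regexp_replace</span><span class="p">(</span><span class="s1">'1a 2b 14m'</span><span class="p">,</span> <span class="s1">'(?&lt;num&gt;\d+)[ab] '</span><span class="p">,</span> <span class="s1">'${num} '</span><span class="p">);</span> <span class="c1">-- '1 2 14m'</span>
<span class="c1">-- \$ inserts a literal dollar sign; $1 then refers to group 1</span>
<span class="k">SELECT</span> <span class="n">regexp_replace</span><span class="p">(</span><span class="s1">'5 10'</span><span class="p">,</span> <span class="s1">'(\d+)'</span><span class="p">,</span> <span class="s1">'\$$1'</span><span class="p">);</span> <span class="c1">-- '$5 $10'</span>
</pre></div>
</div>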
</dd></dl>
<dl class="function">
<dt id="regexp_split">
<code class="descname">regexp_split</code><span class="sig-paren">(</span><em>string</em>, <em>pattern</em><span class="sig-paren">)</span> → array&lt;varchar&gt;</dt>
<dd><p>Splits <code class="docutils literal"><span class="pre">string</span></code> using the regular expression <code class="docutils literal"><span class="pre">pattern</span></code> and returns an
array. Trailing empty strings are preserved:</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="n">regexp_split</span><span class="p">(</span><span class="s1">'1a 2b 14m'</span><span class="p">,</span> <span class="s1">'\s*[a-z]+\s*'</span><span class="p">);</span> <span class="c1">-- [1, 2, 14, ]</span>
</pre></div>
</div>
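<p>The trailing empty strings are easier to see with a simpler delimiter. In this
sketch (the input is illustrative), the string ends with two commas, so the result
should end with two empty strings; <code class="docutils literal"><span class="pre">cardinality</span></code> makes the count explicit:</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="n">regexp_split</span><span class="p">(</span><span class="s1">'a,b,,'</span><span class="p">,</span> <span class="s1">','</span><span class="p">);</span> <span class="c1">-- [a, b, , ]</span>
<span class="k">SELECT</span> <span class="n">cardinality</span><span class="p">(</span><span class="n">regexp_split</span><span class="p">(</span><span class="s1">'a,b,,'</span><span class="p">,</span> <span class="s1">','</span><span class="p">));</span> <span class="c1">-- 4</span>
</pre></div>
</div>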
</dd></dl>
</div>
</div>
<div class="bottomnav">
<p class="nav">
<span class="left">
« <a href="string.html">6.8. String Functions and Operators</a>
</span>
<span class="right">
<a href="binary.html">6.10. Binary Functions and Operators</a> »
</span>
</p>
</div>
<div class="footer" role="contentinfo">
</div>
</body>
</html>