Skip to content

Latest commit

 

History

History
417 lines (325 loc) · 14.8 KB

docs.md

File metadata and controls

417 lines (325 loc) · 14.8 KB

sqlite-regex Documentation

A full reference to every function and module that sqlite-regex offers.

As a reminder, sqlite-regex follows semver and is pre v1, so breaking changes are to be expected.

API Reference

regexp()

An implementation of the REGEXP() operator for SQLite, described here:

The REGEXP operator is a special syntax for the regexp() user function. No regexp() user function is defined by default and so use of the REGEXP operator will normally result in an error message. If an application-defined SQL function named "regexp" is added at run-time, then the "X REGEXP Y" operator will be implemented as a call to "regexp(Y,X)". >

SQL Language Expressions, on sqlite.org

This can be used with the text regexp pattern or regexp(pattern, text) syntax. See the regex crate documentation for allowed syntax/features in the regex pattern string.

select regexp('[abc]', 'a'); -- 1
select regexp('[abc]', 'x'); -- 0

select 'a' regexp '[abc]'; -- 1
select 'x' regexp '[abc]'; -- 0


--

regex(pattern)

Creates a regex "object" with the given pattern, using SQLite's pointer passing interface. Useful when caching regex patterns in heavy queries that use sqlite-regex table functions, like regex_split() or regex_find_all().

Note that the return value will appear to be NULL because of SQLite pointer passing interface. To debug, use regex_print() to print the pattern string of a regex object.

select regex('[abc]'); -- NULL, but is still a regex "object"
select regex("[abc"); -- Errors with 'Error parsing pattern as regex: ...'

select regex_print(regex('[abc]')); -- '[abc]'

regex_print(regex)

Prints the pattern of a regex object created with regex().

select regex_print(regex('[abc]')); -- '[abc]'
--

regex_valid(pattern)

Returns 1 if the given pattern is a valid regular expression, 0 otherwise.

select regex_valid('abc'); -- 1
select regex_valid('[abc]'); -- 1
select regex_valid('[abc'); -- 0
select regex_valid(''); -- 1
--

regex_find(pattern, text)

Find and return the text of the given pattern in the string, or NULL otherwise. Errors if pattern is not legal regex. Based on Regex.find().

select regex_find(
  '[0-9]{3}-[0-9]{3}-[0-9]{4}',
  'phone: 111-222-3333'
);
-- '111-222-3333'

select * from regex_find_all(pattern, text)

Find all instances of a pattern in the given text. Based on Regex.find_iter().

The returned columns:

  • rowid: The 0-based index of the match.
  • start: The 0-based index of the starting character of the match inside the text.
  • end: The 0-based index of the ending character of the match inside the text.
  • match: The full string match.

For faster results, wrap the pattern with the regex() function for caching.

select rowid, *
from regex_find_all(
  regex('\b\w{13}\b'),
  'Retroactively relinquishing remunerations is reprehensible.'
);
/*
┌───────┬───────┬─────┬───────────────┐
│ rowid │ start │ end │     match     │
├───────┼───────┼─────┼───────────────┤
│ 0     │ 0     │ 13  │ Retroactively │
│ 1     │ 14    │ 27  │ relinquishing │
│ 2     │ 28    │ 41  │ remunerations │
│ 3     │ 45    │ 58  │ reprehensible │
└───────┴───────┴─────┴───────────────┘

regex_capture(pattern, text, group)

Returns the text of the capture group with the specific group index or name, or NULL otherwise. Errors if pattern is not legal regex. Based on Regex.captures().

If group is a number, then the N-th capture group is returned, where 0 refers to the entire match, 1 refers to the first left-most capture group in the match, 2 the second, and so on. If the provided group number "overflows', then NULL is returned.

select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  0
);
-- "'Citizen Kane' (1941)"

select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  1
);
-- "Citizen Kane"


select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  2
);
-- "1941"

select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  3
);
-- NULL

If group is a string, then the value of the capture group with the same name is returned. If there is no matching capture group with the name, or the group was not captured, then NULL is returned.

select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  'title'
);
-- "Citizen Kane"


select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  'year'
);
-- "1941"

select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  'not_exist'
);
-- NULL

Note that there is a version of regex_capture() that only have two parameters: captures and group. This can only be used with the regex_captures table function, with the special captures column like so:

select
  regex_capture(captures, 'title')      as title,
  regex_capture(captures, 'year')       as year,
  regex_capture(captures, 'not_exist')  as not_exist
from regex_captures(
  regex("'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)"),
  "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931)."
);
/*
┌──────────────────┬──────┬───────────┐
│      title       │ year │ not_exist │
├──────────────────┼──────┼───────────┤
│ Citizen Kane     │ 1941 │           │
│ The Wizard of Oz │ 1939 │           │
│ M                │ 1931 │           │
└──────────────────┴──────┴───────────┘
*/

select * from regex_captures(pattern, text)

Returns all non-overlapping capture groups in the given text. Similar to regex_find_all, but allows for extracting capture information. Must use with the regex_capture function to extract capture group values. Based on Regex.captures_iter().

The returned columns:

  • rowid: The 0-based index of the match. 0 is the entire match, 1 the first matching capture group, 2 the second, etc.
  • captures: A special value that's meant to be passed into regex_capture(). Will appear NULL through direct access.

For faster results, wrap the pattern with the regex() function for caching.

select
  rowid,
  captures,
  regex_capture(captures, 0)        as "0",
  regex_capture(captures, 1)        as "1",
  regex_capture(captures, 2)        as "2",
  regex_capture(captures, 3)        as "3"
from regex_captures(
  regex("'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)"),
  "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931)."
);
/*
┌───────┬──────────┬───────────────────────────┬──────────────────┬──────┬───┐
│ rowid │ captures │             0             │        1         │  2   │ 3 │
├───────┼──────────┼───────────────────────────┼──────────────────┼──────┼───┤
│ 0     │          │ 'Citizen Kane' (1941)     │ Citizen Kane     │ 1941 │   │
│ 1     │          │ 'The Wizard of Oz' (1939) │ The Wizard of Oz │ 1939 │   │
│ 2     │          │ 'M' (1931)                │ M                │ 1931 │   │
└───────┴──────────┴───────────────────────────┴──────────────────┴──────┴───┘
*/
select
  rowid,
  captures,
  regex_capture(captures, 'title')  as title,
  regex_capture(captures, 'year')  as year,
  regex_capture(captures, 'blah')  as blah
from regex_captures(
  regex("'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)"),
  "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931)."
);
/*
┌───────┬──────────┬──────────────────┬──────┬──────┐
│ rowid │ captures │      title       │ year │ blah │
├───────┼──────────┼──────────────────┼──────┼──────┤
│ 0     │          │ Citizen Kane     │ 1941 │      │
│ 1     │          │ The Wizard of Oz │ 1939 │      │
│ 2     │          │ M                │ 1931 │      │
└───────┴──────────┴──────────────────┴──────┴──────┘
*/

regex_replace(pattern, text, replacement)

Replace the first instance of pattern inside text with the given replacement text. Supports the replacment string syntax. Based on Regex.replace()

select regex_replace(
  '[^01]+',
  '1078910',
  ''
);
-- '1010'

select regex_replace(
  '(?P<last>[^,\s]+),\s+(?P<first>\S+)',
  'Springsteen, Bruce',
  '$first $last'
);
-- 'Bruce Springsteen'

regex_replace_all(pattern, text, replacement)

Replace all instance of pattern inside text with the given replacement text. Supports the replacment string syntax. Based on Regex.replace_all()

select regex_replace_all(
  'dog',
  'cat dog mouse dog',
  'monkey'
)
-- 'cat monkey mouse monkey'

select * from regex_split(pattern, text)

Split the given text on each instance of the given pattern. Based on Regex.split().

The returned columns:

  • rowid: The 0-based index of the split item.
  • item: The individual split item, as text.

For faster results, wrap the pattern with the regex() function for caching.

select rowid, *
from regex_split(
  regex('[ \\t]+'),
  'a b \t  c\td    e'
);
/*
┌───────┬──────┐
│ rowid │ item │
├───────┼──────┤
│ 0     │ a    │
│ 1     │ b    │
│ 2     │ c    │
│ 3     │ d    │
│ 4     │ e    │
└───────┴──────┘
*/

regexset(pattern1, patern2, ...)

Creates a regexset "object" with the given pattern, using SQLite's pointer passing interface. Required when using regexset_is_match and regexset_matches. Based on RegexSet.

Note that the return value will appear to be NULL because of SQLite pointer passing interface. To debug, use regexset_print() to print the pattern string of a regex object.

select regexset(
  "bar",
  "foo",
  "barfoo"
);
-- NULL, but is still a regexset "object"

select regexset("[abc"); --errors

select regexset_print(regexset('abc', 'xyz')); -- '["abc","xyz"]'

regexset_print()

Prints the patterns of a regexset object created with regexset().

select regexset_print(regexset('abc', 'xyz')); -- '["abc","xyz"]'

regexset_is_match(regexset, text)

Returns 1 if any of the patterns in regexset matches text. Based on RegexSet.is_match().

select regexset_is_match(
  regexset(
    "bar",
    "foo",
    "barfoo"
  ),
  'foobar'
); -- 1

select regexset_is_match(
  regexset(
    "bar",
    "foo",
    "barfoo"
  ),
  'xxx'
); -- 0

select * from regexset_matches(regexset, text)

Returns all the matching patterns inside regexset found inside text. Note that this doesn't return rows for each of the matches themselves, only if there was at least 1 match for each patten. Based on RegexSet.matches().

select
  key,
  pattern
from regexset_matches(
  regexset(
    '\w+',
    '\d+',
    '\pL+',
    'foo',
    'bar',
    'barfoo',
    'foobar'
  ),
  'foobar'
);
/*
┌─────┬─────────┐
│ key │ pattern │
├─────┼─────────┤
│ 0   │ \w+     │
│ 2   │ \pL+    │
│ 3   │ foo     │
│ 4   │ bar     │
│ 6   │ foobar  │
└─────┴─────────┘
*/

regex_version()

Returns the semver version string of the current version of sqlite-regex.

select regex_version();
-- "v0.1.0"

regex_debug()

Returns a debug string of various info about sqlite-regex, including the version string, build date, and commit hash.

select regex_debug();
/*
Version: v0.0.0-alpha.4
Source: 85fd18bea80c42782f35975351ea3760d4396eb6
*/