Skip to content

Arseni1919/Learning_Regexes

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 

Repository files navigation

Regular Expressions: Regexes in Python

For now, you’ll focus predominantly on one function, re.search().

re.search(<regex>, <string>)

re.search(<regex>, <string>) scans <string> looking for the first location where the pattern <regex> matches. If a match is found, then re.search() returns a match object. Otherwise, it returns None.

>>> s = 'foo123bar'

>>> # One last reminder to import!
>>> import re

>>> re.search('123', s)
<_sre.SRE_Match object; span=(3, 6), match='123'>

Python Regex Metacharacters

In a regex, a set of characters specified in square brackets ([]) makes up a character class. This metacharacter sequence matches any single character that is in the class, as demonstrated in the following example:

>>> s = 'foo123bar'
>>> re.search('[0-9][0-9][0-9]', s)
<_sre.SRE_Match object; span=(3, 6), match='123'>
>>> re.search('[0-9][0-9][0-9]', 'foo456bar')
<_sre.SRE_Match object; span=(3, 6), match='456'>

>>> re.search('[0-9][0-9][0-9]', '234baz')
<_sre.SRE_Match object; span=(0, 3), match='234'>

>>> re.search('[0-9][0-9][0-9]', 'qux678')
<_sre.SRE_Match object; span=(3, 6), match='678'>

>>> print(re.search('[0-9][0-9][0-9]', '12foo34'))
None

The dot (.) metacharacter matches any character except a newline, so it functions like a wildcard:

>>> s = 'foo123bar'
>>> re.search('1.3', s)
<_sre.SRE_Match object; span=(3, 6), match='123'>

>>> s = 'foo13bar'
>>> print(re.search('1.3', s))
None

In the first example, the regex 1.3 matches '123' because the '1' and '3' match literally, and the . matches the '2'. Here, you’re essentially asking, “Does s contain a '1', then any character (except a newline), then a '3'?” The answer is yes for 'foo123bar' but no for 'foo13bar'.

Metacharacters That Match a Single Character

[] - Specifies a specific set of characters to match.

>>> re.search('ba[artz]', 'foobarqux')
<_sre.SRE_Match object; span=(3, 6), match='bar'>
>>> re.search('ba[artz]', 'foobazqux')
<_sre.SRE_Match object; span=(3, 6), match='baz'>

A character class can also contain a range of characters separated by a hyphen (-), in which case it matches any single character within the range. For example, [a-z] matches any lowercase alphabetic character between 'a' and 'z', inclusive:

>>> re.search('[a-z]', 'FOObar')
<_sre.SRE_Match object; span=(3, 4), match='b'>

[0-9] matches any digit character:

>>> re.search('[0-9][0-9]', 'foo123bar')
<_sre.SRE_Match object; span=(3, 5), match='12'>

[0-9a-fA-F] matches any hexadecimal digit character:

>>> re.search('[0-9a-fA-f]', '--- a0 ---')
<_sre.SRE_Match object; span=(4, 5), match='a'>

You can complement a character class by specifying ^ as the first character, in which case it matches any character that isn’t in the set. In the following example, [^0-9] matches any character that isn’t a digit:

>>> re.search('[^0-9]', '12345foo')
<_sre.SRE_Match object; span=(5, 6), match='f'>

If a ^ character appears in a character class but isn’t the first character, then it has no special meaning and matches a literal '^' character:

>>> re.search('[#:^]', 'foo^bar:baz#qux')
<_sre.SRE_Match object; span=(3, 4), match='^'>

What if you want the character class to include a literal hyphen character? You can place it as the first or last character or escape it with a backslash ():

>>> re.search('[-abc]', '123-456')
<_sre.SRE_Match object; span=(3, 4), match='-'>
>>> re.search('[abc-]', '123-456')
<_sre.SRE_Match object; span=(3, 4), match='-'>
>>> re.search('[ab\-c]', '123-456')
<_sre.SRE_Match object; span=(3, 4), match='-'>

If you want to include a literal ']' in a character class, then you can place it as the first character or escape it with backslash:

>>> re.search('[]]', 'foo[1]')
<_sre.SRE_Match object; span=(5, 6), match=']'>
>>> re.search('[ab\]cd]', 'foo[1]')
<_sre.SRE_Match object; span=(5, 6), match=']'>

Other regex metacharacters lose their special meaning inside a character class:

>>> re.search('[)*+|]', '123*456')
<_sre.SRE_Match object; span=(3, 4), match='*'>
>>> re.search('[)*+|]', '123+456')
<_sre.SRE_Match object; span=(3, 4), match='+'>

dot (.) - Specifies a wildcard.

>>> re.search('foo.bar', 'fooxbar')
<_sre.SRE_Match object; span=(0, 7), match='fooxbar'>

>>> print(re.search('foo.bar', 'foobar'))
None
>>> print(re.search('foo.bar', 'foo\nbar'))
None

\w \W - Match based on whether a character is a word character.

\w matches any alphanumeric word character. Word characters are uppercase and lowercase letters, digits, and the underscore (_) character, so \w is essentially shorthand for [a-zA-Z0-9_]:

>>> re.search('\w', '#(.a$@&')
<_sre.SRE_Match object; span=(3, 4), match='a'>
>>> re.search('[a-zA-Z0-9_]', '#(.a$@&')
<_sre.SRE_Match object; span=(3, 4), match='a'>

\W is the opposite. It matches any non-word character and is equivalent to [^a-zA-Z0-9_]:

>>> re.search('\W', 'a_1*3Qb')
<_sre.SRE_Match object; span=(3, 4), match='*'>
>>> re.search('[^a-zA-Z0-9_]', 'a_1*3Qb')
<_sre.SRE_Match object; span=(3, 4), match='*'>

\d \D - Match based on whether a character is a decimal digit.

\d matches any decimal digit character. \D is the opposite. It matches any character that isn’t a decimal digit:

>>> re.search('\d', 'abc4def')
<_sre.SRE_Match object; span=(3, 4), match='4'>

>>> re.search('\D', '234Q678')
<_sre.SRE_Match object; span=(3, 4), match='Q'>

\d is essentially equivalent to [0-9], and \D is equivalent to [^0-9].

Credits

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages