# Regular Expressions: Regexes in Python

### Basics

In [3]:
# Without regex
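# To check whether a string contains a fixed substring, we can use:

# Method-1: the `in` operator returns True or False
s = "foo123bar"
print('123' in s)

# Method-2: str.find() returns the starting index, or -1 if not found
print(s.find('123'))

# Method-3: str.index() returns the starting index, or raises ValueError if not found
print(s.index('123'))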

True
3
3


But there are cases where these methods won't work:
- For example, rather than searching for a fixed substring like '123', suppose you wanted to determine whether a string contains any three consecutive decimal digit characters, as in the strings 'foo123bar', 'foo456bar', '234baz', and 'qux678'.

This is where regular expressions come into the picture.

Regular expressions in Python are available through the `re` module:

`import re`

`re.search(<regex>, <string>)`

`re.search(<regex>, <string>)` scans `<string>` looking for the first location where the pattern `<regex>` matches. If a match is found, then `re.search()` returns a match object. Otherwise, it returns `None`.
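Because `re.search()` returns either a match object or `None`, it helps to check the result before using it. The sketch below assumes a small helper, `find_first`, which is a hypothetical name used only for illustration. It relies on the match object's `group()` and `span()` methods.

```python
import re

def find_first(pattern, text):
    """Hypothetical helper: return (matched text, span), or None if nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return None
    # group() gives the matched text, span() gives (start, end) indices
    return match.group(), match.span()

print(find_first('123', 'foo123bar'))  # ('123', (3, 6))
print(find_first('123', 'foobar'))     # None
```

Calling `.group()` directly on a failed search would raise an `AttributeError`, which is why the `None` check comes first.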

In [5]:
s = "foo123bar"

import re
re.search('123',s) # returns None if no match found

<re.Match object; span=(3, 6), match='123'>

### Python Regex Metacharacters
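The real power of regex matching appears when `<regex>` contains **special** characters called **metacharacters**. These have a special meaning to the regex engine and greatly extend what a search can express.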

In [7]:
# Let's find the strings that contain three consecutive decimal digits
s1 = 'foo456bar'
s2 = '234baz'
s3 = 'qux678'
s4 = '12foo34'

pattern = '[0-9][0-9][0-9]'
print(re.search(pattern,s1))
print(re.search(pattern,s2))
print(re.search(pattern,s3))
print(re.search(pattern,s4))

<re.Match object; span=(3, 6), match='456'>
<re.Match object; span=(0, 3), match='234'>
<re.Match object; span=(3, 6), match='678'>
None


**Character(s)** -------|------- **Meaning**
```

    .   -------- Matches any single character except newline

    ^   -------- ∙ Anchors a match at the start of a string
                 ∙ Complements a character class

    $   -------- Anchors a match at the end of a string

    *   -------- Matches zero or more repetitions

    +   -------- Matches one or more repetitions

    ?   -------- ∙ Matches zero or one repetition
                 ∙ Specifies the non-greedy versions of *, +, and ?
                 ∙ Introduces a lookahead or lookbehind assertion
                 ∙ Creates a named group

    {}  -------- Matches an explicitly specified number of repetitions

    \   -------- ∙ Escapes a metacharacter of its special meaning
                 ∙ Introduces a special character class
                 ∙ Introduces a grouping backreference

    []  -------- Specifies a character class

    |   -------- Designates alternation

    ()  -------- Creates a group

    : # = !  --- Designate a specialized group

    < > -------- Creates a named group
```
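A few of the metacharacters in the table can be tried out right away. The following is a minimal sketch: the variable names are chosen only for illustration, and the patterns are written as raw strings (`r'...'`) so that Python passes backslashes through to the regex engine unchanged.

```python
import re

# [0-9]{3} is a shorter way to write [0-9][0-9][0-9]
three_digits = re.search(r'[0-9]{3}', 'foo456bar')

# ^ anchors the match at the start of the string, $ at the end
starts_with_foo = re.search(r'^foo', 'foobar')
ends_with_foo = re.search(r'foo$', 'foobar')

# An unescaped . is a wildcard; \. matches only a literal dot
wildcard = re.search(r'a.c', 'abc')
literal_dot = re.search(r'a\.c', 'abc')

print(three_digits.group())   # 456
print(starts_with_foo)        # a match object for 'foo' at span (0, 3)
print(ends_with_foo)          # None
print(wildcard.group())       # abc
print(literal_dot)            # None
```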

### [ ] : Specifies a specific set of characters to match.
A character class matches any single character from the set inside the brackets. For example, `b[ae]d` matches 'b', then either 'a' or 'e', then 'd', so both **bed** and **bad** match.

In [8]:
s1 = "bed"
s2 = "bad"
print(re.search('b[ae]d',s1))
print(re.search('b[ae]d',s2))

<re.Match object; span=(0, 3), match='bed'>
<re.Match object; span=(0, 3), match='bad'>


In [14]:
# A character class can also contain ranges such as a-z or 0-9
re.search('[a-z]','F11-key')  # Returns the first matching occurrence

<re.Match object; span=(4, 5), match='k'>

In [15]:
# Place character classes side by side to match a sequence of characters
re.search('[0-9][0-9]', 'foo123bar')

<re.Match object; span=(3, 5), match='12'>

In [18]:
# Applying the complement
re.search('[^0-8]', '12349foo')  # Matches any character other than the digits 0-8

<re.Match object; span=(4, 5), match='9'>

In [21]:
# If a ^ character appears in a character class but isn’t the first character, then it has no special meaning and matches a literal '^' character:
re.search('[#:^]', 'foo^bar:baz#qux')

<re.Match object; span=(3, 4), match='^'>

In [23]:
re.search(r'[ab\-c]', '123-456')  # Escape the hyphen with \ to match a literal '-' (raw string keeps the backslash intact)

<re.Match object; span=(3, 4), match='-'>

In [24]:
# Other regex metacharacters lose their special meaning inside a character class:
re.search('[)*+|]', '123*456')

<re.Match object; span=(3, 4), match='*'>
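The character-class rules above can be combined. The sketch below, with variable names chosen only for illustration, shows several ranges in one class, a hyphen placed first or last so it is taken literally without escaping, and a complement class that skips the listed characters.

```python
import re

# Several ranges can be combined in one class
any_letter = re.search('[a-zA-Z]', 'F11-key')

# A hyphen placed first or last in the class is a literal '-'
hyphen_first = re.search('[-ab]', '123-456')
hyphen_last = re.search('[ab-]', '123-456')

# ^ complements the class only when it is the first character
not_hash_or_colon = re.search('[^#:]', '#:foo')

print(any_letter.group())          # F
print(hyphen_first.span())         # (3, 4)
print(hyphen_last.span())          # (3, 4)
print(not_hash_or_colon.group())   # f
```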

### dot (.) : Specifies a wildcard.
The . metacharacter matches any single character except a newline:

In [31]:
print(re.search('foo.bar', 'fooxbar'))  # Matches exactly one character
print(re.search('foo.bar', 'fooxwebar'))
print(re.search('foo.bar', 'foobar'))
print(re.search('foo.bar', 'foo\nbar'))

<re.Match object; span=(0, 7), match='fooxbar'>
None
None
None
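The last case fails because the dot skips newlines by default. If the dot should match a newline as well, the `re` module provides the `re.DOTALL` flag (also available as the inline flag `(?s)`). A minimal sketch, with variable names chosen only for illustration:

```python
import re

text = 'foo\nbar'

# By default the dot does not match a newline
default_match = re.search('foo.bar', text)

# With re.DOTALL the dot matches any character, including a newline
dotall_match = re.search('foo.bar', text, re.DOTALL)

# The inline flag (?s) at the start of the pattern has the same effect
inline_match = re.search('(?s)foo.bar', text)

print(default_match)         # None
print(dotall_match.span())   # (0, 7)
print(inline_match.span())   # (0, 7)
```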


### \w ,\W  : Match based on whether a character is a word character.
- `\w` matches any word character. Word characters are uppercase and lowercase letters, digits, and the underscore (_) character, so for ASCII text `\w` is essentially shorthand for `[a-zA-Z0-9_]`. For `str` patterns in Python 3, `\w` also matches Unicode letters and digits unless the `re.ASCII` flag is set.
- `\W` is the opposite. It matches any non-word character and, for ASCII text, is equivalent to `[^a-zA-Z0-9_]`

In [36]:
# \w (written as a raw string so Python does not treat \w as a string escape)
print(re.search(r'\w', '#(.a$@&'))
print(re.search('[a-zA-Z0-9_]', '#(.a$@&'))

<re.Match object; span=(3, 4), match='a'>
<re.Match object; span=(3, 4), match='a'>


In [38]:
# \W (written as a raw string so Python does not treat \W as a string escape)
print(re.search(r'\W', 'a_1*3Qb'))
print(re.search('[^a-zA-Z0-9_]', 'a_1*3Qb'))

<re.Match object; span=(3, 4), match='*'>
<re.Match object; span=(3, 4), match='*'>
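The equivalence between `\w` and `[a-zA-Z0-9_]` holds for ASCII text. The sketch below shows where the two can differ in Python 3: with `str` patterns, `\w` also accepts accented letters unless `re.ASCII` is passed. The sample text and variable names are assumptions chosen only for illustration, and `+` (one or more repetitions, from the metacharacter table) is used to grab a whole run of word characters.

```python
import re

# 'na\u00efve_1!' is the word "naive" spelled with a diaeresis on the i
text = 'na\u00efve_1!'

# Default (Unicode) matching: the accented letter counts as a word character
unicode_match = re.search(r'\w+', text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the run stops before the accent
ascii_match = re.search(r'\w+', text, re.ASCII)

print(unicode_match.group())   # naïve_1
print(ascii_match.group())     # na
```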
