<h1>Learning regex</h1>

In [1]:
import re

In [9]:
s = 'foo123bar'

re.search('123', s) # -> returns a match object

<_sre.SRE_Match object; span=(3, 6), match='123'>

<h2>Python regex metacharacters</h2>

The <b>[ ]</b> character class: Matches any single character that is in the class

In [12]:
re.search('[0-9][0-9][0-9]', s)

<_sre.SRE_Match object; span=(3, 6), match='123'>

In [15]:
re.search('ba[artz]', 'foobarqux')

<_sre.SRE_Match object; span=(3, 6), match='bar'>

In [17]:
re.search('ba[artz]', 'foobazqux')

<_sre.SRE_Match object; span=(3, 6), match='baz'>

In [18]:
# matching hexadecimal characters
re.search('[0-9a-fA-F]', '---a0---')

<_sre.SRE_Match object; span=(3, 4), match='a'>

<br/>

The dot <b>(.)</b> metacharacter: Matches any character except a newline

In [14]:
re.search('[a-z].[0-9]', s)

<_sre.SRE_Match object; span=(1, 4), match='oo1'>

In [33]:
re.search('foo.bar', 'fooxbar')

<_sre.SRE_Match object; span=(0, 7), match='fooxbar'>

In [37]:
re.search('foo.bar', 'foo\nbar')
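The cell above produces no output because the dot does not match the newline. As a small sketch, the standard `re.DOTALL` flag (not used elsewhere in this notebook) lifts that restriction:

```python
import re

# By default the dot does not match '\n', so the search fails
no_match = re.search('foo.bar', 'foo\nbar')
print(no_match)

# With re.DOTALL the dot matches any character, including '\n'
m = re.search('foo.bar', 'foo\nbar', re.DOTALL)
print(m.span())
```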

<br/>

The complement <b>(^)</b> character: Matches a character that is not in a set. If the ^ character appears in a character class but isn't the first character, then it has no special meaning and matches a literal '^' character.

In [21]:
re.search('[^0-9]', '12345foo')

<_sre.SRE_Match object; span=(5, 6), match='f'>
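To illustrate the second half of the rule above, here is a minimal sketch (the sample string is made up for illustration) in which ^ is not the first character of the class, so it is treated as a literal caret:

```python
import re

# '^' is not first inside the brackets, so it is an ordinary character here
m = re.search('[#:^]', 'foo^bar:baz#qux')
print(m.group(), m.span())
```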

<br/>

<h2>Regex metacharacters in the right position</h2>

In [22]:
re.search('[-abc]', '123-456')

<_sre.SRE_Match object; span=(3, 4), match='-'>

In [24]:
re.search('[abc-]', '123-456')

<_sre.SRE_Match object; span=(3, 4), match='-'>

In [28]:
re.search(r'[ab\-c]', '123-456')

<_sre.SRE_Match object; span=(3, 4), match='-'>

In [29]:
re.search('[]]', 'foo[1]')

<_sre.SRE_Match object; span=(5, 6), match=']'>

In [31]:
re.search(r'[ab\]cd]', 'foo[1]')

<_sre.SRE_Match object; span=(5, 6), match=']'>

<h3>Other metacharacters lose their special meaning inside a character class.</h3>

In [32]:
re.search('[)*+|]', '123*456')

<_sre.SRE_Match object; span=(3, 4), match='*'>

<br/>

The <b>\w</b> metacharacter: Matches any <b>alphanumeric</b> word character (a letter, a digit, or an underscore).

The <b>\W</b> metacharacter: Matches any <b>non-word</b> character, that is, any character \w does not match.

In [40]:
# word characters [a-zA-Z0-9_]: uppercase and lowercase letters, digits and underscores
re.search(r'\w', '#(.a$@&)')

<_sre.SRE_Match object; span=(3, 4), match='a'>

In [39]:
re.search(r'\W', '#(.a$@&)')

<_sre.SRE_Match object; span=(0, 1), match='#'>

<br/>

The <b>\d</b> metacharacter: Matches any <b>decimal</b> digit.

The <b>\D</b> metacharacter: Matches any character that is <b>not a decimal</b> digit.

In [43]:
# [0-9]
re.search(r'\d', 'abc4def')

<_sre.SRE_Match object; span=(3, 4), match='4'>

In [44]:
# [^0-9]
re.search(r'\D', '234Q678')

<_sre.SRE_Match object; span=(3, 4), match='Q'>

<br/>

The <b>\s</b> metacharacter: Matches any whitespace character.

The <b>\S</b> metacharacter: Matches any non-whitespace character.

<p style="background: #26abff; padding: 0.5rem 1rem; color: #fafafa; border-radius: 5px; font-weight: bold;">"\n" is considered whitespace when using <b>\s</b></p>

In [45]:
re.search(r'\s', 'rashid mohammed')

<_sre.SRE_Match object; span=(6, 7), match=' '>

In [125]:
re.search(r'\S', 'rashid moh')

<_sre.SRE_Match object; span=(0, 1), match='r'>

In [47]:
re.search(r'[\w\d\s]', '---3---')

<_sre.SRE_Match object; span=(3, 4), match='3'>
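The note above says that "\n" counts as whitespace for \s. A short sketch, using made-up sample strings, checks a space, a tab and a newline in turn:

```python
import re

# Each separator sits at index 3, between 'foo' and 'bar'
separators = (' ', '\t', '\n')
spans = [re.search(r'\s', 'foo' + ch + 'bar').span() for ch in separators]

for ch, span in zip(separators, spans):
    print(repr(ch), span)
```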

<br/>

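The <b>backslash ( \ )</b> can be tricky: it is the escape character both in Python string literals and in the regex itself, so matching a literal backslash takes extra care.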

In [48]:
s = r'foo\bar'

In [None]:
re.search('\\', s)
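The cell above shows no output because the call fails: Python turns `'\\'` into a single backslash, and the regex engine then sees an incomplete escape. A minimal sketch that catches the resulting `re.error` (the `failed` flag is only there for illustration):

```python
import re

s = r'foo\bar'

try:
    re.search('\\', s)   # the regex engine receives a lone backslash
    failed = False
except re.error as exc:
    failed = True
    print('re.error:', exc)
```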

In [51]:
# The right way to go about it
re.search('\\\\', s)

<_sre.SRE_Match object; span=(3, 4), match='\\'>

In [52]:
# a much cleaner way is to use a raw string
re.search(r'\\', s)

<_sre.SRE_Match object; span=(3, 4), match='\\'>

<br/>

<h2>Anchors: ^ or \A</h2>

Anchors are zero-width matches: they consume no characters. An anchor dictates a particular location in the search string where a match must occur.

In [53]:
re.search('^foo', 'foobar') # - foo should be at the beginning of the string

<_sre.SRE_Match object; span=(0, 3), match='foo'>

In [197]:
print(re.search(r'^foo', 'barfoo'))

None


In [55]:
re.search(r'\Afoo', 'foobar')

<_sre.SRE_Match object; span=(0, 3), match='foo'>

In [56]:
re.search(r'\Afoo', 'barfoo')
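In the examples above ^ and \A behave identically. One difference, sketched here with the standard `re.MULTILINE` flag (an addition for illustration, not used elsewhere in this notebook): under that flag ^ also matches at the start of each line, while \A still matches only at the start of the string.

```python
import re

text = 'bar\nfoo'

plain = re.search(r'^foo', text)                      # 'foo' is not at the string start
multi = re.search(r'^foo', text, re.MULTILINE)        # '^' now matches after the newline
strict = re.search(r'\Afoo', text, re.MULTILINE)      # '\A' ignores the flag

print(plain)
print(multi.span())
print(strict)
```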

<br>

<h2>Anchors: $ or \Z</h2>

When the regex parser encounters <b>$ or \Z</b>, the parser's current position must be at the end of the search string for it to match. Whatever precedes <b>$ or \Z</b> must constitute the end of the search string.

In [249]:
re.search('bar$', 'foobar')

<_sre.SRE_Match object; span=(3, 6), match='bar'>

In [59]:
print(re.search('bar$', 'barfoo'))

None


In [60]:
re.search(r'bar\Z', 'foobar')

<_sre.SRE_Match object; span=(3, 6), match='bar'>

In [71]:
print(re.search(r'bar\Z', 'barfoo'))

None


As a special case, $ (but not \Z) also matches just before a single newline at the end of the search string

In [62]:
re.search('bar$', 'foobar\n')

<_sre.SRE_Match object; span=(3, 6), match='bar'>
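For contrast, a short sketch running both anchors against the same string with a trailing newline:

```python
import re

with_dollar = re.search(r'bar$', 'foobar\n')    # matches just before the final newline
with_z = re.search(r'bar\Z', 'foobar\n')        # \Z only matches at the very end

print(with_dollar.span())
print(with_z)
```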

<br/>

<h2>Anchors: \b</h2>

\b asserts that the regex parser's current position must be at the beginning or end of a word. A word consists of a sequence of <b>alphanumeric</b> characters or underscores ([a-zA-Z0-9_]), the same set as the \w character class.

In [79]:
re.search(r'\bfoo', 'foobar')

<_sre.SRE_Match object; span=(0, 3), match='foo'>

In [84]:
re.search(r'\bbar', 'foo bar')

<_sre.SRE_Match object; span=(4, 7), match='bar'>

In [107]:
re.search(r'\bhome', 'rashidwent*home')

<_sre.SRE_Match object; span=(11, 15), match='home'>

In [94]:
re.search(r'\bhome\b', 'rashid went*home')

<_sre.SRE_Match object; span=(12, 16), match='home'>
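Because the underscore counts as a word character, \b does not fire between `_` and a letter. A small sketch with a made-up sample string collects the start index of every stand-alone `bar`:

```python
import re

text = 'bar barbar foo_bar bar.'

# Only occurrences with a word boundary on both sides are reported
starts = [m.start() for m in re.finditer(r'\bbar\b', text)]
print(starts)
```

Only the first and last `bar` qualify; the ones inside `barbar` and `foo_bar` are surrounded by word characters.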


<br/>

<h2>Anchors: \B</h2>

\B is the opposite of \b: it anchors a match to a location that isn't a word boundary.

In [92]:
print(re.search(r'\Bfoo\B', 'foo'))

None


In [93]:
print(re.search(r'\Bfoo\B', '.foo.'))

None


In [100]:
print(re.search(r'\Bfoo\B', 'barfoobaz'))

<_sre.SRE_Match object; span=(3, 6), match='foo'>


In [102]:
print(re.search(r'\Bfoo\B', 'afooa'))

<_sre.SRE_Match object; span=(1, 4), match='foo'>


<br/>

<h2>Quantifiers</h2>

A quantifier metacharacter immediately follows a portion of <code>regex</code> and indicates how many times that portion must occur for the match to succeed.

<b>( * )</b> matches zero or more repetitions of the preceding regex.

In [108]:
re.search('foo-*bar', 'foobar')

<_sre.SRE_Match object; span=(0, 6), match='foobar'>

In [109]:
re.search('foo-*bar', 'foo-bar')

<_sre.SRE_Match object; span=(0, 7), match='foo-bar'>

In [110]:
# you'll probably encounter the regex .* in a python program at some point.
re.search('foo.*bar', '# foo $qux@grault % bar #')

<_sre.SRE_Match object; span=(2, 23), match='foo $qux@grault % bar'>

In [115]:
re.match('foo[1-9]*bar', 'foobar')

<_sre.SRE_Match object; span=(0, 6), match='foobar'>

In [116]:
re.match('foo[1-9]*bar', 'foo42bar')

<_sre.SRE_Match object; span=(0, 8), match='foo42bar'>

In [117]:
print(re.search('foo[1-9]+bar', 'foobar'))

None


In [118]:
re.match('foo[1-9]?bar', 'foobar')

<_sre.SRE_Match object; span=(0, 6), match='foobar'>

In [119]:
print(re.match('foo[1-9]?bar', 'foo43bar'))

None


<br/>

<b>( + )</b> matches one or more repetitions of the preceding regex.

In [111]:
print(re.search('foo-+bar', 'foobar'))

None


In [112]:
re.search('foo-+bar', 'foo-bar')

<_sre.SRE_Match object; span=(0, 7), match='foo-bar'>

<br/>

**( ? )** matches 0 or 1 repetitions of the preceding regex.

In [113]:
re.search('foo-?bar', 'foobar')

<_sre.SRE_Match object; span=(0, 6), match='foobar'>

In [114]:
re.search('foo-?bar', 'foo--bar')

<br/>

<b>Non greedy versions of quantifiers</b>

**( *? )**, **( +? )**, **( ?? )**

In [126]:
s = '%<foo> <bar> <baz>%'
re.search('<.*>', s)

<_sre.SRE_Match object; span=(1, 18), match='<foo> <bar> <baz>'>

In [127]:
# to get the first >, use the non-greedy sequence *?
re.search('<.*?>', s)

<_sre.SRE_Match object; span=(1, 6), match='<foo>'>

In [128]:
re.search('<.+?>', s)

<_sre.SRE_Match object; span=(1, 6), match='<foo>'>

In [131]:
re.search('ba?', 'baaa')

<_sre.SRE_Match object; span=(0, 2), match='ba'>

In [135]:
re.search('ba??', 'baaaa') # the non-greedy version of ? matches 0 occurrences

<_sre.SRE_Match object; span=(0, 1), match='b'>
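The difference between greedy and non-greedy matching is easy to see when collecting every match. A brief sketch using `re.findall` on the same sample string:

```python
import re

s = '%<foo> <bar> <baz>%'

lazy = re.findall('<.*?>', s)    # stops at the first '>' each time
greedy = re.findall('<.*>', s)   # runs to the last '>' in the string

print(lazy)
print(greedy)
```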

<br/>

**{m}**

Matches exactly m repetitions of the preceding regex. This is similar to * or + but it specifies exactly how many times the preceding regex must occur for a match to succeed.

In [138]:
print(re.search('x-{3}x', 'x--x')) # fewer than 3 dashes

None


In [137]:
print(re.search('x-{3}x', 'x---x'))

<_sre.SRE_Match object; span=(0, 5), match='x---x'>


<br/>

**{m,n}**

Matches any number of repetitions of the preceding regex from m to n, inclusive.

In [146]:
for i in range(1, 6):
    s = f"x{'-' * i}x"
    print(f'{i} {s:10}', re.search('x-{2,4}x', s))

1 x-x        None
2 x--x       <_sre.SRE_Match object; span=(0, 4), match='x--x'>
3 x---x      <_sre.SRE_Match object; span=(0, 5), match='x---x'>
4 x----x     <_sre.SRE_Match object; span=(0, 6), match='x----x'>
5 x-----x    None


If you omit all of m, n and the comma, the curly braces no longer function as metacharacters. **{}** matches just the literal string **'{}'**.
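A quick sketch of that special case, with a made-up sample string in which the braces appear literally:

```python
import re

# '{}' carries no repeat count, so it is matched as two literal characters
m = re.search('x{}y', 'x{}y')
print(m.group())
```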

The non-greedy version, **{m,n}?**, matches as few repetitions as possible.

In [150]:
re.search('a{3,5}', 'aaaaaa')

<_sre.SRE_Match object; span=(0, 5), match='aaaaa'>

In [152]:
re.search('a{3,5}?', 'aaaaa')

<_sre.SRE_Match object; span=(0, 3), match='aaa'>

<br/>

<h2>Grouping Constructs and Backreferences</h2>

In [153]:
re.search('(bar)', 'foo bar baz')

<_sre.SRE_Match object; span=(4, 7), match='bar'>

In [154]:
re.search('bar', 'foobarbaz')

<_sre.SRE_Match object; span=(3, 6), match='bar'>

In [156]:
re.search('(bar)', 'barfoobaz')

<_sre.SRE_Match object; span=(0, 3), match='bar'>

<br/>

<h3>Treating a group as a unit</h3>

A quantifier metacharacter that follows a group operates on the entire subexpression specified in the group as a single unit.

In [157]:
re.search('(bar)+', 'foo bar boaz')

<_sre.SRE_Match object; span=(4, 7), match='bar'>

In [159]:
re.search('(bar)*', 'foo boaz')

<_sre.SRE_Match object; span=(0, 0), match=''>

In [160]:
re.search('(bar)+', 'foo barbarbar bar baz')

<_sre.SRE_Match object; span=(4, 13), match='barbarbar'>

In [161]:
# take a look at a more complicated example
re.search('(ba[rz]){2,4}(qux)?', 'bazbarbazqux')

<_sre.SRE_Match object; span=(0, 12), match='bazbarbazqux'>

In [162]:
re.search('(ba[rz]){2,4}(qux)?', 'barbar')

<_sre.SRE_Match object; span=(0, 6), match='barbar'>

In [163]:
re.search(r'(foo(bar)?)+(\d\d\d)?', 'foofoobar')

<_sre.SRE_Match object; span=(0, 9), match='foofoobar'>

In [164]:
re.search(r'(foo(bar)?)+(\d\d\d)?', 'foofoobar123')

<_sre.SRE_Match object; span=(0, 12), match='foofoobar123'>

<br/>

<h3>Capturing Groups</h3>

**m.groups()**

Returns a tuple containing all the captured groups from a regex match.

In [503]:
m = re.search(r'(\w+),(\w+),(\w+)', 'foo,quux,baz')
m

<_sre.SRE_Match object; span=(0, 12), match='foo,quux,baz'>

In [506]:
m.groups()

('foo', 'quux', 'baz')

<br/>

**m.group(n)**

With one argument, returns a single captured match.

In [167]:
m.group(1)

'foo'

In [168]:
m.group(2)

'quux'

In [169]:
m.group(3)

'baz'

With multiple arguments, returns a tuple containing the specified captured matches in the given order.

In [170]:
m.groups()

('foo', 'quux', 'baz')

In [507]:
m.group(2, 3)

('quux', 'baz')
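Two further details of `m.group()`, sketched on the same match: group 0 (or no argument at all) is the entire match, and multiple arguments can be given in any order.

```python
import re

m = re.search(r'(\w+),(\w+),(\w+)', 'foo,quux,baz')

whole = m.group(0)        # the entire match
default = m.group()       # same as m.group(0)
reordered = m.group(3, 1) # groups come back in the order requested

print(whole)
print(default)
print(reordered)
```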

<br/>

<h3>Backreferences</h3>

In [508]:
regex = r'(?P<group1>\w+),(?P<group2>\w+),(?P=group2)'
m = re.search(regex, 'foo,bar,bar')
m

<_sre.SRE_Match object; span=(0, 11), match='foo,bar,bar'>
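A backreference matches the same text that an earlier group captured. The sketch below reuses the regex above, reads the named groups back by name, and adds a numbered backreference (`\1`) as a further illustration; the extra sample strings are made up.

```python
import re

regex = r'(?P<group1>\w+),(?P<group2>\w+),(?P=group2)'

# Named groups can be read back by name
m = re.search(regex, 'foo,bar,bar')
names = (m.group('group1'), m.group('group2'))
print(names)

# The third field must repeat the second one exactly, so this fails
mismatch = re.search(regex, 'foo,bar,baz')
print(mismatch)

# Numbered form: \1 refers to the first capturing group
numbered = re.search(r'(\w+),\1', 'foo,foo')
print(numbered.group())
```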

<br/>

<h3>Non capturing groups: (?:<code>regex</code>)</h3>

In [236]:
m = re.search(r'(\w+),(?:\w+),(\w+)', 'foo,quux,baz')
m.groups()

('foo', 'baz')

<br/>

<h3>Conditional match</h3>

```python
(?(<n>)<yes-regex>|<no-regex>)

(?(<name>)<yes-regex>|<no-regex>)
```

In [237]:
regex = r'^(###)?foo(?(1)bar|baz)'

In [238]:
re.search(regex, '###foobar')

<_sre.SRE_Match object; span=(0, 9), match='###foobar'>

In [239]:
re.search(regex, '###foobaz')

In [241]:
re.search(regex, 'foobaz')

<_sre.SRE_Match object; span=(0, 6), match='foobaz'>
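The cells above can be summarized in one loop. This sketch adds a fourth candidate, `'foobar'`, to cover every combination: when group 1 (`###`) matched, the regex requires `bar`; otherwise it requires `baz`.

```python
import re

regex = r'^(###)?foo(?(1)bar|baz)'

results = {}
for candidate in ('###foobar', '###foobaz', 'foobar', 'foobaz'):
    results[candidate] = bool(re.search(regex, candidate))
    print(f'{candidate:10}', results[candidate])
```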

<br/>

In [245]:
regex = r'^(?P<ch>\W)?foo(?(ch)(?P=ch)|)$'

In [246]:
re.search(regex, 'foo')

<_sre.SRE_Match object; span=(0, 3), match='foo'>

In [252]:
re.search(regex, '#foo#')

<_sre.SRE_Match object; span=(0, 5), match='#foo#'>

In [253]:
re.search(regex, "#foo@")

<br/>


<h2>Lookahead and Lookbehind Assertions</h2>

Lookahead and lookbehind assertions determine the success or failure of a regex match in Python based on what is just behind (to the left) or ahead (to the right) of the parser's current position in the search string.

Even though they contain parentheses and perform grouping, they don't capture what they match.

```python
(?=<lookahead_regex>)
```

In [254]:
re.search('foo(?=[a-z])', 'foobar')

<_sre.SRE_Match object; span=(0, 3), match='foo'>

In [255]:
re.search('foo(?=[a-z])', 'foo123')

What is unique about a lookahead is that the portion of the search string that matches <lookahead_regex> isn't consumed, and it isn't part of the returned match object.

In [256]:
re.search('foo(?=[a-z])', 'foobar')

<_sre.SRE_Match object; span=(0, 3), match='foo'>

Compare that to a similar example that matches the same letter with a plain character class instead of a lookahead.

In [257]:
re.search('foo[a-z]', 'foobar')

<_sre.SRE_Match object; span=(0, 4), match='foob'>

This time, the regex consumes the 'b', and it becomes part of the eventual match.

In [305]:
m = re.search('foo(?=[a-z])([a-z])(?P<ch>.)', 'foobes')
m.groups()

('b', 'e')

In [283]:
m = re.search('foo(?=[a-z])(?P<ch>.)', 'foobar')
m.groups()

('b',)
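The examples above cover only the positive lookahead. As a hedged sketch of two related forms from Python's `re` syntax that this notebook does not show, `(?<=...)` is a positive lookbehind and `(?!...)` is a negative lookahead; the sample strings are made up.

```python
import re

# Positive lookbehind: 'bar' must be preceded by 'foo' (which is not consumed)
behind_ok = re.search(r'(?<=foo)bar', 'foobar')
behind_no = re.search(r'(?<=qux)bar', 'foobar')

# Negative lookahead: 'foo' must NOT be followed by a lowercase letter
ahead_ok = re.search(r'foo(?![a-z])', 'foo123')
ahead_no = re.search(r'foo(?![a-z])', 'foobar')

print(behind_ok.span(), behind_no)
print(ahead_ok.span(), ahead_no)
```

As with the lookahead, the text inside each assertion is checked but never becomes part of the match.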

<br/>

In [501]:
samples = [[1,1,1,1,1], [0,1,1,1,0], [0,1,1,1,1],[0,1,1,1,1]]

In [472]:
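def consecutive_ones(array):
    """Return (length, indexes) for the longest run of 1s in array."""
    runs = {}  # run length -> indexes of the most recent run of that length
    indexes = []
    consecutive = 0

    for i, value in enumerate(array):
        if value == 1:
            indexes.append(i)
            consecutive += 1
        elif value == 0:
            # a 0 ends the current run: record it and start over
            runs[consecutive] = indexes
            consecutive = 0
            indexes = []

    # record the run that reaches the end of the array
    runs[consecutive] = indexes

    return max(runs.items(), key=lambda k: k[0])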

In [473]:
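def find_area(two_d):
    """Count rows whose longest run of 1s covers the shortest such run.

    Returns as soon as the count equals that shortest run length; if the
    rows run out first, the function falls through and returns None.
    """
    size = 0
    consecutives = [consecutive_ones(array) for array in two_d]
    # the shortest of the per-row longest runs, with its column indexes
    shortest_length, indices = min(consecutives, key=lambda k: k[0])

    for _, row_indices in consecutives:
        if size == shortest_length:
            return size

        if set(indices).issubset(row_indices):
            size += 1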

In [474]:
find_area(samples)

3