# Non-Capturing Groups

> Sometimes we need a group for structure but have no interest in extracting the text it matches, i.e. capturing the text matched inside the parentheses. A common example is **alternation**.

Let's consider an example where we want to find the strings `i love cats` or `i love dogs` in the given text.

In [1]:
import re

In [2]:
txt = """
i love cats
i love dogs
"""

In [3]:
pattern = re.compile("i love (cats|dogs)")

# group 0 = the entire match, e.g. "i love cats"
# group 1 = the text matched by (cats|dogs)

In [4]:
pattern.findall(txt)

['cats', 'dogs']

In [5]:
for match in pattern.finditer(txt):
    print("Complete regex match (default):", match.group(0))
    print("Match captured by 1st group:", match.group(1))

Complete regex match (default): i love cats
Match captured by 1st group: cats
Complete regex match (default): i love dogs
Match captured by 1st group: dogs


As we can see, `findall` returns only the captured part (`cats` or `dogs`) rather than the complete sentences.

To get the whole match while still grouping the alternatives, we make the group **non-capturing** with the syntax `(?:pattern)`.

In [6]:
pattern = re.compile("i love (?:cats|dogs)")

In [7]:
pattern.findall(txt)

['i love cats', 'i love dogs']

In [8]:
for match in pattern.finditer(txt):
    print(match)
    

<re.Match object; span=(1, 12), match='i love cats'>
<re.Match object; span=(13, 24), match='i love dogs'>


In [9]:
for match in pattern.finditer(txt):
    print(match.group(0))
    

i love cats
i love dogs


In [10]:
for match in pattern.finditer(txt):
    print(match.group(1))
    

IndexError: no such group

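> With the non-capturing syntax the pattern matches the same text as before, but the group's text is no longer stored, and the intent (grouping only) is explicit, which makes the regex easier to maintain. Note that a non-capturing group has no number, so it cannot be referenced with `match.group(1)` or a backreference.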
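As a small sketch of the point above, the compiled pattern's `groups` attribute reports how many capturing groups it contains, and non-capturing groups are skipped when groups are numbered. The `mixed` pattern and its sample sentence are made up for illustration.

```python
import re

capturing = re.compile(r"i love (cats|dogs)")
non_capturing = re.compile(r"i love (?:cats|dogs)")

# Number of capturing groups in each pattern
print(capturing.groups)      # 1
print(non_capturing.groups)  # 0

# Hypothetical pattern mixing both kinds: (?:i|we) gets no number,
# so (cats|dogs) is group 1.
mixed = re.compile(r"(?:i|we) love (cats|dogs)")
m = mixed.search("we love dogs")
print(m.group(1))  # dogs
```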

In [4]:
# Just practice from made easy

In [1]:
# Here is one such example:
import re
string = '1234 56789'

re.findall(r'(\d)+', string)  # a repeated group keeps only its last iteration

['4', '9']

In [2]:
re.search(r'(\d)+', string).groups()  # using search

('4',)
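The outputs above show only `'4'` and `'9'` because a capturing group that is repeated by a quantifier keeps just the text of its last iteration. A minimal sketch of the difference, using the same string: moving the `+` inside the group captures the whole run of digits instead.

```python
import re

string = '1234 56789'

# Quantifier outside the group: only the last digit of each run is kept
last_digit = re.findall(r'(\d)+', string)

# Quantifier inside the group: the whole run is captured
whole_run = re.findall(r'(\d+)', string)

# Nested groups show both at once: (whole run, last digit)
both = re.findall(r'((\d)+)', string)

print(last_digit)  # ['4', '9']
print(whole_run)   # ['1234', '56789']
print(both)        # [('1234', '4'), ('56789', '9')]
```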

In [None]:
# Non-capturing group syntax:
#
#     (?:...)
#
# It looks slightly similar to the syntax for named groups:
#
#     (?P<name>...)
#
# Don't confuse the two.
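To make the contrast concrete, here is a short sketch comparing a named group with a non-capturing one. The group name `pet` is an arbitrary choice for this example. A named group still captures (it has both a name and a number), while a non-capturing group stores nothing.

```python
import re

# Named group: captured, reachable by name or by number
named = re.compile(r"i love (?P<pet>cats|dogs)")
m_named = named.search("i love dogs")
print(m_named.group("pet"))  # dogs
print(m_named.group(1))      # dogs
print(m_named.groupdict())   # {'pet': 'dogs'}

# Non-capturing group: nothing is stored
plain = re.compile(r"i love (?:cats|dogs)")
m_plain = plain.search("i love dogs")
print(m_plain.groups())      # ()
print(m_plain.groupdict())   # {}
```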


#comparison


In [3]:
re.findall(r'(?:\d)+', string)  # with a non-capturing group

['1234', '56789']

In [5]:
# So the group is part of the pattern, but its result is not output.
re.findall(r'\d+', string)  # when the RE has no groups, findall
                            # returns the entire match

['1234', '56789']

In [7]:
string  = '123123 = Alex, 123123123 = Danny, 123123123123 = Mike, 456456 = rick, 121212 = John, 132132 = Luis,' 
# We want to pull out all names whose ID ends in one or more repeats of 123
re.findall(r'(?:123)+ = (\w+),', string)   # three instances

['Alex', 'Danny', 'Mike']
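One caveat worth sketching: `findall` is not anchored, so the pattern above also matches an ID that merely ends in `123`. The data below is hypothetical (the name `Sam` and the ID `456123` are made up for the example); adding a word boundary `\b` in front requires the whole ID to consist of repeats of `123`.

```python
import re

# Hypothetical data: the first ID only ends in 123
string = '456123 = Sam, 123123 = Alex, 121212 = John,'

# Unanchored: the trailing "123" of 456123 is enough to match
loose = re.findall(r'(?:123)+ = (\w+),', string)

# With \b the repeats of 123 must start at the beginning of the ID
strict = re.findall(r'\b(?:123)+ = (\w+),', string)

print(loose)   # ['Sam', 'Alex']
print(strict)  # ['Alex']
```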

In [9]:
#Another example
string = '1*1*1*1*22222  1*1*3333  2*1*2*1*222  1*2*2*2*333 3*3*3*444'
re.findall( r'(?:1\*){2,}\d+', string)

['1*1*1*1*22222', '1*1*3333']
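For comparison, a quick sketch of what happens if the same group is written as a capturing group. The pattern matches the same two substrings, but `findall` then returns the group (its last iteration) rather than the whole match.

```python
import re

string = '1*1*1*1*22222  1*1*3333  2*1*2*1*222  1*2*2*2*333 3*3*3*444'

# Capturing: findall returns the last iteration of the repeated group
with_capture = re.findall(r'(1\*){2,}\d+', string)

# Non-capturing: findall returns the entire match
without_capture = re.findall(r'(?:1\*){2,}\d+', string)

print(with_capture)     # ['1*', '1*']
print(without_capture)  # ['1*1*1*1*22222', '1*1*3333']
```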

In [10]:
# Non-capturing groups don't just affect the findall method;
# they also affect the search and match methods.

# BE CAREFUL WITH SYNTAX

In [None]:
# ?:   correct!
# :?   incorrect!

In [12]:
string = '1234 56789'

match = re.search(r'(?:\d)+', string)  # correct syntax
print(match.groups())

()


In [13]:
string = '1234 56789'

match = re.search(r'(:?\d)+', string)  # :? is incorrect: a capturing group with an optional colon
print(match.groups())

('4',)
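The reason the typo still runs without error is that `(:?\d)` is a valid pattern: an ordinary capturing group containing an optional literal colon followed by a digit. A small sketch on a made-up string with colons makes the difference visible.

```python
import re

sample = ':1:2'  # hypothetical input containing colons

# Typo: capturing group, each iteration is "optional colon + digit"
typo = re.search(r'(:?\d)+', sample)

# Intended: non-capturing group around a single digit
correct = re.search(r'(?:\d)+', sample)

print(typo.group(0), typo.groups())        # :1:2 (':2',)
print(correct.group(0), correct.groups())  # 1 ()
```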


In [None]:
# Summary:
#
# When we capture a group, its value is stored (for match.group() and
# backreferences) and output by findall; a non-capturing group only
# groups the pattern.


# Backreferences - Using captured groups inside other operations

In [14]:
# Backreferencing is making a reference to a captured group
# within the same regular expression.
# Syntax and example:


In [15]:
re.search(r'(\w+) \1','Merry Merry Christmas')  #Looking for repeated words

<re.Match object; span=(0, 11), match='Merry Merry'>

In [16]:
re.search(r'(\w+) \1','Merry Merry Christmas').groups()

('Merry',)

In [None]:
# \1 just references the first group
# within the regular expression:
#
#     r'(\w+) \1'
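Backreferences also work with named groups, using the `(?P=name)` syntax. A minimal sketch of the same repeated-word search, where the group name `word` is an arbitrary choice for this example:

```python
import re

# (?P<word>...) defines the group, (?P=word) refers back to it
m = re.search(r'(?P<word>\w+) (?P=word)', 'Merry Merry Christmas')

print(m.group(0))       # Merry Merry
print(m.group('word'))  # Merry
```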

In [17]:
re.findall(r'(\w+) \1','Happy Happy Holidays. Merry Christmas Christmas')   #Want to look for repeated words

['Happy', 'Christmas']

In [18]:
re.findall(r'(\w+) \1','Merry Merry Christmas Christmas Merry Merry Christmas')

['Merry', 'Christmas', 'Merry']
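Two follow-ups, sketched under the assumption that we want whole repeated words only. First, without word boundaries the pattern can match inside words (the sentence `this is fine` is a made-up example). Second, a backreference can also be used in the replacement string of `re.sub`, for instance to collapse repeated words.

```python
import re

# Pitfall: "this is" contains "is is", so the unbounded pattern matches
partial = re.findall(r'(\w+) \1', 'this is fine')

# \b on both sides restricts the match to whole words
bounded = re.findall(r'\b(\w+) \1\b', 'this is fine')

# In the replacement string, \1 inserts the captured word once
collapsed = re.sub(r'\b(\w+) \1\b', r'\1', 'Merry Merry Christmas Christmas')

print(partial)    # ['is']
print(bounded)    # []
print(collapsed)  # Merry Christmas
```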