# REGEX
## (Regular Expressions)
![](img/regex_cover.png)

Data is not always organized, formatted, or structured in a uniform way.

An important part of a _Data Scientist_'s job is cleaning the data **(Data Cleaning)**.

Techniques such as **Regex** exist for this purpose.

Regular expressions are sequences of characters that define search patterns.

# [LET'S GO!](https://regex101.com/)

The following links can help: [RegEx101](https://regex101.com/) and [RegEx Python guide](https://docs.python.org/3/howto/regex.html)

In [102]:
import re

text_to_search = '''
abcdefghijklmnopqurtuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890
Ha HaHa ?Ha
MetaCharacters (Need to be escaped):
. ^ $ * + ? { } [ ] \ | ( )
hegoigaritaonandia*com
hegoigaritaonandia.com
hegoigaritaonandiacom
hegoigaritaonandia hola
321--555-4321
123.555.1234
123*555*1234
800-555-1234
900-555-1234
9005551234
900055501234
Mr. Scha2fer
Mr Smith
Ms Davis
Mrs. Robinson
Mr. T
Mr. ()

cat
mat
pat
bat 
at
'''


![image.png](attachment:image.png)

## We use raw strings so that the text is taken literally:

### `print(r'\tTabulador')`

In [103]:
print('Tabulador sin raw string: \tTabulador')
print(r'Tabulador con raw string: \tTabulador')

Tabulador sin raw string: 	Tabulador
Tabulador con raw string: \tTabulador
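
A quick way to see what the `r` prefix changes is to compare the two strings directly. This is a minimal sketch; the variable names `normal` and `raw` are hypothetical and not part of the notebook.

```python
normal = '\tTab'   # \t is interpreted as a single tab character
raw = r'\tTab'     # the backslash and the "t" stay as two separate characters

print(len(normal))      # 4
print(len(raw))         # 5
print(raw == '\\tTab')  # True
```

This matters for regex because patterns are full of backslashes, and a raw string passes them to `re` untouched.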


### Searching for the pattern `abc` in the text

To do this we:
- use `re.compile()` to define the pattern we want to search for
- call the `finditer()` method to search for the pattern in our text
- iterate over the matches it returns

In [104]:
mi_string = 'Hola mundo'
mi_string[0]

'H'

In [105]:
pattern = re.compile(r'abc')

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")



<re.Match object; span=(1, 4), match='abc'>
Match found: abc at positions 1-4
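
The `start()` and `end()` values are ordinary string indices, so slicing the searched text with them gives back the matched substring. A minimal sketch on a shorter sample string (`sample` is a hypothetical value, not part of the notebook):

```python
import re

sample = 'xxabcxxabc'
pattern = re.compile(r'abc')

# span() returns the (start, end) pair of each match
spans = [match.span() for match in pattern.finditer(sample)]
print(spans)  # [(2, 5), (7, 10)]

# slicing with those indices recovers the matched text
pieces = [sample[start:end] for start, end in spans]
print(pieces)  # ['abc', 'abc']
```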


### Keep in mind that when we specify a pattern, it is searched for literally.
For example, if we look for the letters in a different order...

In [106]:
new_pattern = re.compile(r'cba')
new_matches = new_pattern.finditer(text_to_search)

for match in new_matches:
    print(match)  # nothing is printed: there are no matches, so the loop body never runs
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


## Metacharacters
These are the characters that have a special meaning inside a pattern instead of matching themselves:
- `. ^ $ * + ? { } [ ] \ | ( )`, as listed in `text_to_search`

To match them literally, we have to "escape" them.

In [107]:
# As you can see, the unescaped dot matches practically every character (everything except newlines).
pattern = re.compile('.')
matches = pattern.finditer(text_to_search)

for match in matches:
    print(match) 

<re.Match object; span=(1, 2), match='a'>
<re.Match object; span=(2, 3), match='b'>
<re.Match object; span=(3, 4), match='c'>
<re.Match object; span=(4, 5), match='d'>
<re.Match object; span=(5, 6), match='e'>
<re.Match object; span=(6, 7), match='f'>
<re.Match object; span=(7, 8), match='g'>
<re.Match object; span=(8, 9), match='h'>
<re.Match object; span=(9, 10), match='i'>
<re.Match object; span=(10, 11), match='j'>
<re.Match object; span=(11, 12), match='k'>
<re.Match object; span=(12, 13), match='l'>
<re.Match object; span=(13, 14), match='m'>
<re.Match object; span=(14, 15), match='n'>
<re.Match object; span=(15, 16), match='o'>
<re.Match object; span=(16, 17), match='p'>
<re.Match object; span=(17, 18), match='q'>
<re.Match object; span=(18, 19), match='u'>
<re.Match object; span=(19, 20), match='r'>
<re.Match object; span=(20, 21), match='t'>
<re.Match object; span=(21, 22), match='u'>
<re.Match object; span=(22, 23), match='v'>
<re.Match object; span=(23, 24), match='w'>
<re.M

#### To escape them, put a backslash (`\`) in front of them

In [108]:
pattern = re.compile(r'\.')  # raw string: '\.' in a normal string is an invalid escape sequence
matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(115, 116), match='.'>
Match found: . at positions 115-116
<re.Match object; span=(184, 185), match='.'>
Match found: . at positions 184-185
<re.Match object; span=(252, 253), match='.'>
Match found: . at positions 252-253
<re.Match object; span=(256, 257), match='.'>
Match found: . at positions 256-257
<re.Match object; span=(327, 328), match='.'>
Match found: . at positions 327-328
<re.Match object; span=(359, 360), match='.'>
Match found: . at positions 359-360
<re.Match object; span=(372, 373), match='.'>
Match found: . at positions 372-373
<re.Match object; span=(378, 379), match='.'>
Match found: . at positions 378-379
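
When the text to find is a fixed string, `re.escape()` can add the backslashes for us. The sketch below is only an illustration; `literal`, `candidates`, `strict` and `loose` are hypothetical names.

```python
import re

literal = 'hegoigaritaonandia.com'
escaped = re.escape(literal)
print(escaped)  # hegoigaritaonandia\.com

candidates = ['hegoigaritaonandia.com', 'hegoigaritaonandia*com']

# escaped dot: only the real dot matches
strict = [c for c in candidates if re.fullmatch(escaped, c)]
print(strict)  # ['hegoigaritaonandia.com']

# unescaped dot: it matches any character, including "*"
loose = [c for c in candidates if re.fullmatch(literal, c)]
print(loose)  # ['hegoigaritaonandia.com', 'hegoigaritaonandia*com']
```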


To search for a web address:

In [109]:
pattern = re.compile(r'hegoigaritaonandia\.com')

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(166, 188), match='hegoigaritaonandia.com'>
Match found: hegoigaritaonandia.com at positions 166-188


What makes regex really interesting is not finding one specific web address or phrase, but finding general patterns in text.

The file `snippets.txt` lists the main regular-expression tokens used to match text.
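
`snippets.txt` is not reproduced here, so the following is only a sketch of the shorthand classes that Python's `re` module provides (`\d`, `\w`, `\s` and the negated `\D`), applied to a hypothetical `sample` string:

```python
import re

sample = 'Tel: 555-1234 ok'

digits = re.findall(r'\d', sample)       # one digit per match
words = re.findall(r'\w+', sample)       # runs of letters, digits or underscore
spaces = re.findall(r'\s', sample)       # whitespace characters
non_digits = re.findall(r'\D+', sample)  # runs of anything that is not a digit

print(digits)      # ['5', '5', '5', '1', '2', '3', '4']
print(words)       # ['Tel', '555', '1234', 'ok']
print(spaces)      # [' ', ' ']
print(non_digits)  # ['Tel: ', '-', ' ok']
```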

In [110]:
pattern = re.compile(r'\S') # matches any non-whitespace character

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(1, 2), match='a'>
Match found: a at positions 1-2
<re.Match object; span=(2, 3), match='b'>
Match found: b at positions 2-3
<re.Match object; span=(3, 4), match='c'>
Match found: c at positions 3-4
<re.Match object; span=(4, 5), match='d'>
Match found: d at positions 4-5
<re.Match object; span=(5, 6), match='e'>
Match found: e at positions 5-6
<re.Match object; span=(6, 7), match='f'>
Match found: f at positions 6-7
<re.Match object; span=(7, 8), match='g'>
Match found: g at positions 7-8
<re.Match object; span=(8, 9), match='h'>
Match found: h at positions 8-9
<re.Match object; span=(9, 10), match='i'>
Match found: i at positions 9-10
<re.Match object; span=(10, 11), match='j'>
Match found: j at positions 10-11
<re.Match object; span=(11, 12), match='k'>
Match found: k at positions 11-12
<re.Match object; span=(12, 13), match='l'>
Match found: l at positions 12-13
<re.Match object; span=(13, 14), match='m'>
Match found: m at positions 13-14
<re.Match object; sp

## Anchors

Anchors do not match any character themselves; they match positions, which constrains where a match can occur.

Word boundary `\b`: the position between a word character (`\w`) and a non-word character (space, tab, newline, punctuation), or the start or end of the string.

In [111]:
pattern = re.compile(r'Ha\b')
#\b assert position at a word boundary

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(66, 68), match='Ha'>
Match found: Ha at positions 66-68
<re.Match object; span=(71, 73), match='Ha'>
Match found: Ha at positions 71-73
<re.Match object; span=(75, 77), match='Ha'>
Match found: Ha at positions 75-77


Non-word boundary `\B`: the opposite.

It returns only the second `Ha` of `HaHa`, because there is no word boundary in front of it.

In [112]:
pattern = re.compile(r'\BHa')

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(71, 73), match='Ha'>
Match found: Ha at positions 71-73
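
To compare the two anchors side by side, here is a small sketch on the same `Ha HaHa ?Ha` line taken in isolation, so the positions differ from the ones above. The helper `spans` is a hypothetical convenience function.

```python
import re

sample = 'Ha HaHa ?Ha'

def spans(pattern):
    """Return the (start, end) span of every match of pattern in sample."""
    return [m.span() for m in re.finditer(pattern, sample)]

print(spans(r'\bHa'))  # [(0, 2), (3, 5), (9, 11)]  boundary before Ha
print(spans(r'Ha\b'))  # [(0, 2), (5, 7), (9, 11)]  boundary after Ha
print(spans(r'\BHa'))  # [(5, 7)]                   no boundary before Ha
```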


### `^` matches only at the start of the string

In [113]:
sentence = 'Start a sentence and then bring it to an end'

In [114]:
pattern = re.compile(r'^Start')

matches = pattern.finditer(sentence)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(0, 5), match='Start'>
Match found: Start at positions 0-5


### `$` matches only at the end of the string

In [115]:
pattern = re.compile(r'end$')

matches = pattern.finditer(sentence)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(41, 44), match='end'>
Match found: end at positions 41-44
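
By default these anchors refer to the whole string. If the text has several lines, the `re.MULTILINE` flag makes `^` and `$` apply to every line. A minimal sketch with a hypothetical `lines` string:

```python
import re

lines = 'Start one\nStart two\nthe end\nanother end'

start_default = re.findall(r'^Start', lines)
start_multiline = re.findall(r'^Start', lines, re.MULTILINE)
end_default = re.findall(r'end$', lines)
end_multiline = re.findall(r'end$', lines, re.MULTILINE)

print(start_default)    # ['Start']
print(start_multiline)  # ['Start', 'Start']
print(end_default)      # ['end']
print(end_multiline)    # ['end', 'end']
```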


## TIME FOR ACTION

Next, let's try to extract the phone numbers.

As we can see in the text, the phone numbers all follow the same structure:
- 3 digits
- a punctuation mark
- 3 digits
- a punctuation mark
- 4 digits

In [132]:
# write your code
pattern = re.compile(r"\d{3}\W\d{3}\W\d{4}")  # raw string so \d and \W reach the regex engine untouched

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(249, 261), match='123.555.1234'>
Match found: 123.555.1234 at positions 249-261
<re.Match object; span=(262, 274), match='123*555*1234'>
Match found: 123*555*1234 at positions 262-274
<re.Match object; span=(275, 287), match='800-555-1234'>
Match found: 800-555-1234 at positions 275-287
<re.Match object; span=(288, 300), match='900-555-1234'>
Match found: 900-555-1234 at positions 288-300


### We open `fake_info.txt` to start working

In [117]:
with open('data/fake_info.txt', 'r') as f:
    contents = f.read()

Suppose we only want the phone numbers whose parts are separated by a dot or a hyphen

In [133]:
# write your code
pattern = re.compile(r'\d\d\d[-\.]\d\d\d[-\.]\d\d\d\d')

matches = pattern.finditer(contents)

numeros_telefono = []
for match in matches:
    # print(match.group())
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")
    # remove both separators the pattern accepts, not only the hyphen
    numeros_telefono.append(match.group().replace("-", "").replace(".", ""))
numeros_telefono

Match found: 615-555-7164 at positions 12-24
Match found: 800-555-5669 at positions 102-114
Match found: 560-555-5153 at positions 191-203
Match found: 900-555-9340 at positions 281-293
Match found: 714-555-7405 at positions 378-390
Match found: 800-555-6771 at positions 467-479
Match found: 783-555-4799 at positions 557-569
Match found: 516-555-4615 at positions 647-659
Match found: 127-555-1867 at positions 740-752
Match found: 608-555-4938 at positions 831-843
Match found: 568-555-6051 at positions 917-929
Match found: 292-555-1875 at positions 1005-1017
Match found: 900-555-3205 at positions 1093-1105
Match found: 614-555-1166 at positions 1182-1194
Match found: 530-555-2676 at positions 1273-1285
Match found: 470-555-2750 at positions 1359-1371
Match found: 800-555-6089 at positions 1443-1455
Match found: 880-555-8319 at positions 1530-1542
Match found: 777-555-8378 at positions 1618-1630
Match found: 998-555-7385 at positions 1701-1713
Match found: 800-555-7100 at positions 1794-

['6155557164',
 '8005555669',
 '5605555153',
 '9005559340',
 '7145557405',
 '8005556771',
 '7835554799',
 '5165554615',
 '1275551867',
 '6085554938',
 '5685556051',
 '2925551875',
 '9005553205',
 '6145551166',
 '5305552676',
 '4705552750',
 '8005556089',
 '8805558319',
 '7775558378',
 '9985557385',
 '8005557100',
 '9035558277',
 '1965555674',
 '9005555118',
 '9055551630',
 '2035553475',
 '8845558444',
 '9045558559',
 '8895557393',
 '1955552405',
 '3215559053',
 '1335551711',
 '9005555428',
 '7605557147',
 '3915556621',
 '9325557724',
 '6095557908',
 '8005558810',
 '1495557657',
 '1305559709',
 '1435559295',
 '9035559878',
 '5745553194',
 '4965557533',
 '2105553757',
 '9005559598',
 '8665559844',
 '6695557159',
 '1525557417',
 '8935559832',
 '2175557123',
 '7865556544',
 '7805552574',
 '9265558735',
 '8955553539',
 '8745553949',
 '8005552420',
 '9365556340',
 '3725559809',
 '8905555618',
 '6705553005',
 '5095555997',
 '7215555632',
 '9005553567',
 '1475556830',
 '5825553426',
 '40055517

In [138]:
# write your code
# the first [*.-]* accepts zero or more separators, so 321--555-4321 also matches
pattern = re.compile(r'\d+[*.-]*\d+[*.-]\d+')

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(235, 248), match='321--555-4321'>
Match found: 321--555-4321 at positions 235-248
<re.Match object; span=(249, 261), match='123.555.1234'>
Match found: 123.555.1234 at positions 249-261
<re.Match object; span=(262, 274), match='123*555*1234'>
Match found: 123*555*1234 at positions 262-274
<re.Match object; span=(275, 287), match='800-555-1234'>
Match found: 800-555-1234 at positions 275-287
<re.Match object; span=(288, 300), match='900-555-1234'>
Match found: 900-555-1234 at positions 288-300


## Character sets
They let us narrow the search down to a specific set of characters.

#### CAREFUL! Character sets often cause confusion: a set matches exactly one character from those listed, never a sequence of them.

In [141]:
# Find all the numbers that start with a round hundred:
# 800 or 900

pattern = re.compile(r'[89]00\D\d\d\d\D\d\d\d\d') # [89] is a character set that matches either "8" or "9"

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(275, 287), match='800-555-1234'>
Match found: 800-555-1234 at positions 275-287
<re.Match object; span=(288, 300), match='900-555-1234'>
Match found: 900-555-1234 at positions 288-300
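
To make the warning above concrete: `[89]` consumes a single character, so `[89]00` accepts `800` or `900` but not `8900`. A minimal sketch using `fullmatch()` on hypothetical inputs:

```python
import re

pattern = re.compile(r'[89]00')

# fullmatch() succeeds only if the whole string fits the pattern
results = {text: bool(pattern.fullmatch(text)) for text in ('800', '900', '8900', '700')}
print(results)  # {'800': True, '900': True, '8900': False, '700': False}
```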


## Hyphens do not only match that literal character; inside a set they also let us define ranges

For example, to show every digit between 1 and 5 in the whole text

In [140]:
pattern = re.compile(r'[1-5]')

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(55, 56), match='1'>
Match found: 1 at positions 55-56
<re.Match object; span=(56, 57), match='2'>
Match found: 2 at positions 56-57
<re.Match object; span=(57, 58), match='3'>
Match found: 3 at positions 57-58
<re.Match object; span=(58, 59), match='4'>
Match found: 4 at positions 58-59
<re.Match object; span=(59, 60), match='5'>
Match found: 5 at positions 59-60
<re.Match object; span=(235, 236), match='3'>
Match found: 3 at positions 235-236
<re.Match object; span=(236, 237), match='2'>
Match found: 2 at positions 236-237
<re.Match object; span=(237, 238), match='1'>
Match found: 1 at positions 237-238
<re.Match object; span=(240, 241), match='5'>
Match found: 5 at positions 240-241
<re.Match object; span=(241, 242), match='5'>
Match found: 5 at positions 241-242
<re.Match object; span=(242, 243), match='5'>
Match found: 5 at positions 242-243
<re.Match object; span=(244, 245), match='4'>
Match found: 4 at positions 244-245
<re.Match object; span=(245, 246), m

### To match uppercase and lowercase letters, just put both ranges together.


In [142]:
pattern = re.compile(r'[a-zA-Z]')

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(1, 2), match='a'>
Match found: a at positions 1-2
<re.Match object; span=(2, 3), match='b'>
Match found: b at positions 2-3
<re.Match object; span=(3, 4), match='c'>
Match found: c at positions 3-4
<re.Match object; span=(4, 5), match='d'>
Match found: d at positions 4-5
<re.Match object; span=(5, 6), match='e'>
Match found: e at positions 5-6
<re.Match object; span=(6, 7), match='f'>
Match found: f at positions 6-7
<re.Match object; span=(7, 8), match='g'>
Match found: g at positions 7-8
<re.Match object; span=(8, 9), match='h'>
Match found: h at positions 8-9
<re.Match object; span=(9, 10), match='i'>
Match found: i at positions 9-10
<re.Match object; span=(10, 11), match='j'>
Match found: j at positions 10-11
<re.Match object; span=(11, 12), match='k'>
Match found: k at positions 11-12
<re.Match object; span=(12, 13), match='l'>
Match found: l at positions 12-13
<re.Match object; span=(13, 14), match='m'>
Match found: m at positions 13-14
<re.Match object; sp

## Important
Placing the `^` symbol as the first character inside the brackets `[]` means we do **NOT** want the characters listed in the set.

In this case, running the cell shows only digits, spaces, newlines, and punctuation characters.

**The set is negated**

In [143]:
pattern = re.compile(r'[^a-zA-Z]')

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(0, 1), match='\n'>
Match found: 
 at positions 0-1
<re.Match object; span=(27, 28), match='\n'>
Match found: 
 at positions 27-28
<re.Match object; span=(54, 55), match='\n'>
Match found: 
 at positions 54-55
<re.Match object; span=(55, 56), match='1'>
Match found: 1 at positions 55-56
<re.Match object; span=(56, 57), match='2'>
Match found: 2 at positions 56-57
<re.Match object; span=(57, 58), match='3'>
Match found: 3 at positions 57-58
<re.Match object; span=(58, 59), match='4'>
Match found: 4 at positions 58-59
<re.Match object; span=(59, 60), match='5'>
Match found: 5 at positions 59-60
<re.Match object; span=(60, 61), match='6'>
Match found: 6 at positions 60-61
<re.Match object; span=(61, 62), match='7'>
Match found: 7 at positions 61-62
<re.Match object; span=(62, 63), match='8'>
Match found: 8 at positions 62-63
<re.Match object; span=(63, 64), match='9'>
Match found: 9 at positions 63-64
<re.Match object; span=(64, 65), match='0'>
Match found: 0 at pos

## Searching for patterns in text
Suppose we want to collect the words ending in `at`, except **bat**.
We specify that the character before `at` must not be a `b`, nor whitespace (otherwise the standalone `at` would be matched together with the newline in front of it).

In [148]:
pattern = re.compile(r'[^b\s]at')

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(384, 387), match='cat'>
Match found: cat at positions 384-387
<re.Match object; span=(388, 391), match='mat'>
Match found: mat at positions 388-391
<re.Match object; span=(392, 395), match='pat'>
Match found: pat at positions 392-395
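
The `\s` inside the negated set is easy to overlook. The sketch below, on a hypothetical one-line `words` string, compares the pattern with and without it:

```python
import re

words = 'cat mat pat bat at'

# without \s, the space before the standalone "at" counts as "not b"
without_space_rule = re.findall(r'[^b]at', words)
print(without_space_rule)  # ['cat', 'mat', 'pat', ' at']

# with \s in the negated set, whitespace is rejected as well
with_space_rule = re.findall(r'[^b\s]at', words)
print(with_space_rule)  # ['cat', 'mat', 'pat']
```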


## Ranges `{}`
As we can see in snippets.txt, braces let us state how many times the preceding element must repeat: `{n}` means exactly n times and `{m,n}` means between m and n times.

Going back to the phone number example, here is another way to write the pattern

In [149]:
pattern = re.compile(r'\d{3}\D\d{3}\D\d{4}')

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(249, 261), match='123.555.1234'>
Match found: 123.555.1234 at positions 249-261
<re.Match object; span=(262, 274), match='123*555*1234'>
Match found: 123*555*1234 at positions 262-274
<re.Match object; span=(275, 287), match='800-555-1234'>
Match found: 800-555-1234 at positions 275-287
<re.Match object; span=(288, 300), match='900-555-1234'>
Match found: 900-555-1234 at positions 288-300


In [150]:
pattern = re.compile(r'\d{2,4}\D\d{2,4}\D\d{2,4}')

matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(240, 252), match='555-4321\n123'>
Match found: 555-4321
123 at positions 240-252
<re.Match object; span=(253, 265), match='555.1234\n123'>
Match found: 555.1234
123 at positions 253-265
<re.Match object; span=(266, 278), match='555*1234\n800'>
Match found: 555*1234
800 at positions 266-278
<re.Match object; span=(279, 291), match='555-1234\n900'>
Match found: 555-1234
900 at positions 279-291
<re.Match object; span=(292, 305), match='555-1234\n9005'>
Match found: 555-1234
9005 at positions 292-305
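
The matches above run across line breaks because `\D` means "anything that is not a digit", and a newline qualifies. One possible fix, sketched here on a hypothetical two-line `sample`, is to list the allowed separators explicitly:

```python
import re

sample = '555-4321\n123.555.1234'

# \D accepts the newline, so the match spills onto the next line
loose = re.findall(r'\d{2,4}\D\d{2,4}\D\d{2,4}', sample)
print(loose)  # ['555-4321\n123']

# an explicit separator set keeps every match on one line
strict = re.findall(r'\d{2,4}[-.*]\d{2,4}[-.*]\d{2,4}', sample)
print(strict)  # ['123.555.1234']
```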


In [151]:
# This example works because we know exactly which pattern is repeated.

pattern1 = re.compile(r'Mr\.')

matches = pattern1.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


# This is not giving us what we want. It only returns the sequence Mr.

<re.Match object; span=(325, 328), match='Mr.'>
Match found: Mr. at positions 325-328
<re.Match object; span=(370, 373), match='Mr.'>
Match found: Mr. at positions 370-373
<re.Match object; span=(376, 379), match='Mr.'>
Match found: Mr. at positions 376-379


## `?` operator
It makes the preceding element optional (0 or 1 occurrences). This way `Mr` is matched whether or not a period follows it.

In [152]:
# Now every Mr appears, whether or not it has a period
pattern2 = re.compile(r'Mr\.?')

matches = pattern2.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(325, 328), match='Mr.'>
Match found: Mr. at positions 325-328
<re.Match object; span=(338, 340), match='Mr'>
Match found: Mr at positions 338-340
<re.Match object; span=(356, 358), match='Mr'>
Match found: Mr at positions 356-358
<re.Match object; span=(370, 373), match='Mr.'>
Match found: Mr. at positions 370-373
<re.Match object; span=(376, 379), match='Mr.'>
Match found: Mr. at positions 376-379


In [154]:
# Here we also capture the name that follows the title
pattern3 = re.compile(r'Mr\.?\s\w+') # The + quantifier requires one or more occurrences

matches = pattern3.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")

# That is why "Mr. ()" is not printed: no word character follows the space

<re.Match object; span=(325, 337), match='Mr. Scha2fer'>
Match found: Mr. Scha2fer at positions 325-337
<re.Match object; span=(338, 346), match='Mr Smith'>
Match found: Mr Smith at positions 338-346
<re.Match object; span=(370, 375), match='Mr. T'>
Match found: Mr. T at positions 370-375


## Now for real
To show everything, we use the `*` quantifier (zero or more occurrences)

In [155]:
pattern4 = re.compile(r'Mr\.?\s\w*')

matches = pattern4.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(325, 337), match='Mr. Scha2fer'>
Match found: Mr. Scha2fer at positions 325-337
<re.Match object; span=(338, 346), match='Mr Smith'>
Match found: Mr Smith at positions 338-346
<re.Match object; span=(370, 375), match='Mr. T'>
Match found: Mr. T at positions 370-375
<re.Match object; span=(376, 380), match='Mr. '>
Match found: Mr.  at positions 376-380
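
The three quantifiers seen so far differ only in how many repetitions they accept. A compact sketch on a hypothetical `sample` string:

```python
import re

sample = 'a ab abb'

optional = re.findall(r'ab?', sample)      # ? : zero or one "b"
one_or_more = re.findall(r'ab+', sample)   # + : at least one "b"
zero_or_more = re.findall(r'ab*', sample)  # * : any number of "b", including none

print(optional)      # ['a', 'ab', 'ab']
print(one_or_more)   # ['ab', 'abb']
print(zero_or_more)  # ['a', 'ab', 'abb']
```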


## Grouping `()`
Continuing with the example, to match all the Mr, Ms and Mrs entries we can group the alternatives with the `|` (or) operator

In [156]:
pattern4 = re.compile(r'(Mr|Ms|Mrs)\.?\s\w*')

matches = pattern4.finditer(text_to_search)

for match in matches:
    print(match)
    print(f"Match found: {match.group()} at positions {match.start()}-{match.end()}")


<re.Match object; span=(325, 337), match='Mr. Scha2fer'>
Match found: Mr. Scha2fer at positions 325-337
<re.Match object; span=(338, 346), match='Mr Smith'>
Match found: Mr Smith at positions 338-346
<re.Match object; span=(347, 355), match='Ms Davis'>
Match found: Ms Davis at positions 347-355
<re.Match object; span=(356, 369), match='Mrs. Robinson'>
Match found: Mrs. Robinson at positions 356-369
<re.Match object; span=(370, 375), match='Mr. T'>
Match found: Mr. T at positions 370-375
<re.Match object; span=(376, 380), match='Mr. '>
Match found: Mr.  at positions 376-380
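
Parentheses also capture what they match, so each group can be read separately with `group(n)`. The sketch below adds a second group around the name; that extra group and the `titles` sample are assumptions for illustration, not part of the notebook.

```python
import re

titles = 'Mr. Scha2fer\nMr Smith\nMs Davis\nMrs. Robinson\nMr. T'

# group 1: the title, group 2: the name (hypothetical second group)
pattern = re.compile(r'(Mr|Ms|Mrs)\.?\s(\w*)')

pairs = [(m.group(1), m.group(2)) for m in pattern.finditer(titles)]
print(pairs)
# [('Mr', 'Scha2fer'), ('Mr', 'Smith'), ('Ms', 'Davis'), ('Mrs', 'Robinson'), ('Mr', 'T')]
```

`group(0)`, or `group()` with no argument, still returns the whole match, as in the cells above.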
