# Regex Tutorial
* Learning common regex patterns and ways to extract information from text

In [None]:
import re

### Extracting phone numbers

In [None]:
text = '''My name is Wellington! My phone number is (17)99999-9999, 17999999999, call me if you have any questions about NLP'''

pattern = r'\(\d{2}\)\d{5}-\d{4}|\d{11}'

matches = re.findall(pattern, text)
matches

['(17)99999-9999', '17999999999']
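
The alternation above lists the two formats as separate branches. As a minimal sketch, assuming the parentheses and the hyphen are each simply optional (an assumption, not something the cell above states), both formats can be folded into one pattern and each match reduced to its digits. The `normalize_phone` helper is hypothetical and only for illustration.

```python
import re

text = 'My phone number is (17)99999-9999, 17999999999'

# Assumption: the parentheses around the area code and the hyphen are optional.
pattern = r'\(?\d{2}\)?\d{5}-?\d{4}'

def normalize_phone(raw):
    """Hypothetical helper: strip every non-digit character from a match."""
    return re.sub(r'\D', '', raw)

matches = re.findall(pattern, text)
normalized = [normalize_phone(m) for m in matches]

print(matches)
print(normalized)
```

The optional-character version is looser than the alternation: it would also accept mixed forms such as `(1799999-9999`, so the explicit alternation is the safer choice when only the two exact formats should match.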

### Extracting titles

In [None]:
text = '''Note 1 - First paragraph
João Fonseca was expected to face a very tough opening match against Perricard, the reigning champion of the tournament. And the first set bore out in practice what was expected of this duel between two young players, aged 19 and 22 respectively, on the court of the Swiss event.

Note 2 - Second paragraph
With a powerful, fast serve, Perricard put 78% of his first serves in play. João Fonseca also found his serve, which made for an even contest with no breaks. In the tie-break, the Brazilian went up 6-3, let his opponent level the score and, on his fourth set point, closed it out with an ace to seal the set 7/6 (6).

Note 3 - Third paragraph
In the second set, João Fonseca took off. He went up 1-0 and then broke his opponent's serve for the first time in the match. Unlike the previous set, which was marked by balance, in this one the Rio native was superior and clinched the victory when serving for the match.'''

pattern = r'Note \d - ([^\n]+)'

re.findall(pattern, text)

['First paragraph', 'Second paragraph', 'Third paragraph']
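
The pattern above matches `Note N - ...` anywhere in the text and only for single-digit numbers. A possible variant, sketched here on a made-up sample text, anchors the match to the start of a line with `re.MULTILINE` and uses `\d+` so that multi-digit note numbers are also accepted.

```python
import re

# Made-up sample: one body line mentions "Note 2 - ..." mid-sentence.
text = '''Note 1 - First paragraph
Body text that mentions Note 2 - not a title.

Note 10 - Tenth paragraph
More body text.'''

# With re.MULTILINE, ^ and $ match at the start and end of every line.
# \d+ accepts note numbers with more than one digit.
pattern = r'^Note \d+ - (.+)$'

titles = re.findall(pattern, text, flags=re.MULTILINE)
print(titles)
```

Without the `^` anchor, the mid-sentence `Note 2 - not a title.` would be reported as a title as well.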

### Extracting specific information

In [None]:
text = '''The gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.
In previous quarter i.e fy2020 Q4 it was $3 billion.
FY2020 Q5'''

pattern = r'FY(\d{4} Q[1-4])'

matches = re.findall(pattern, text, flags=re.IGNORECASE)
matches

['2021 Q1', '2020 Q4']
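
`FY2020 Q5` is skipped because `[1-4]` only accepts valid quarters, and `fy2020 Q4` is found thanks to `re.IGNORECASE`. If the year and the quarter are needed as separate values, one option is named groups with `re.finditer`. The group names `year` and `quarter` below are illustrative choices, not part of the original pattern.

```python
import re

text = '''The gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.
In previous quarter i.e fy2020 Q4 it was $3 billion.
FY2020 Q5'''

# Named groups (illustrative names) split the period into year and quarter.
pattern = re.compile(r'FY(?P<year>\d{4}) Q(?P<quarter>[1-4])', flags=re.IGNORECASE)

# finditer yields match objects, so each group can be read by name.
periods = [(int(m.group('year')), int(m.group('quarter')))
           for m in pattern.finditer(text)]
print(periods)
```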

### Extracting numbers

In [None]:
text = '''The gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.
In previous quarter i.e fy2020 Q4 it was $3 billion.
FY2020 Q5'''

pattern = r'\$([0-9\.]+)'

matches = re.findall(pattern, text)
matches

['4.85', '3']
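
The character class `[0-9\.]+` accepts any run of digits and dots, so an amount at the end of a sentence would keep the trailing period. A stricter sketch, using a made-up sentence to show the difference, requires digits on both sides of an optional decimal point.

```python
import re

# Made-up sentence where an amount is followed directly by a period.
text = 'It cost $4.85 billion, up from $3.'

# Loose: any mix of digits and dots after the dollar sign.
loose = re.findall(r'\$([0-9\.]+)', text)

# Strict: digits, optionally followed by a dot and more digits.
strict = re.findall(r'\$(\d+(?:\.\d+)?)', text)

print(loose)
print(strict)
```

The `(?:...)` group is non-capturing, so `re.findall` still returns only the whole amount.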

In [None]:
text = '''The gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.
In previous quarter i.e fy2020 Q4 it was $3 billion.
FY2020 Q5'''

pattern = r'FY(\d{4} Q[1-4])[^\$]+\$([0-9\.]+)'

matches = re.search(pattern, text)
matches

<re.Match object; span=(46, 65), match='FY2021 Q1 was $4.85'>

In [None]:
matches.groups()

('2021 Q1', '4.85')
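
`re.search` stops at the first match, and without `re.IGNORECASE` the lowercase `fy2020 Q4` would not match at all. As a rough sketch, the same pattern can be passed to `re.findall` with the flag to collect every (quarter, amount) pair. The `cost_by_quarter` dictionary is an illustrative name, and parsing the amounts as floats is an assumption about how the values will be used.

```python
import re

text = '''The gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.
In previous quarter i.e fy2020 Q4 it was $3 billion.
FY2020 Q5'''

pattern = r'FY(\d{4} Q[1-4])[^\$]+\$([0-9\.]+)'

# With several groups in the pattern, findall returns one tuple per match.
pairs = re.findall(pattern, text, flags=re.IGNORECASE)

# Assumption: every captured amount parses as a float.
cost_by_quarter = {quarter: float(amount) for quarter, amount in pairs}
print(cost_by_quarter)
```

One caveat: `[^\$]+` also crosses line breaks, so a quarter mentioned without an amount would be paired with the next dollar figure that appears later in the text.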