# Regex (regular expressions)

https://docs.python.org/3/library/re.html

Regular expressions provide a flexible way to search for or match string patterns
in text, including patterns too complex for plain string methods. A single
expression, commonly called a regex, is a string written in the regular expression
language. Python's built-in `re` module applies regular expressions to strings.

**Functions:**

- `findall`: returns a list containing all non-overlapping matches.
- `search`: returns a Match object for the first match anywhere in the string, or `None` if there is no match. If there is more than one match, only the first occurrence is returned.
- `split`: returns a list where the string has been split at each match.
- `sub`: replaces matches with a string (all matches by default) and returns the new string.

**The Match object** has properties and methods used to retrieve information about the search, and the result:

- `.span()` returns a tuple containing the start and end positions of the match.
- `.string` returns the string passed into the function
- `.group()` returns the part of the string where there was a match

Cheat sheet: https://cheatography.com/davechild/cheat-sheets/regular-expressions/

Tester: https://regex101.com

In [1]:
import re

## findall
re.findall()

In [2]:
alphanumeric = "4298fsfsDFGHv012rvv21v9"

#variable with a mixture of letters and numbers

In [9]:
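re.findall("[A-Za-z]", alphanumeric)

#we use findall to pull out all letters (lower and upper case) in variable alphanumeric
#[A-Za-z] is used rather than [A-z], because [A-z] also matches the punctuation characters between Z and a in ASCII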

['f', 's', 'f', 's', 'D', 'F', 'G', 'H', 'v', 'r', 'v', 'v', 'v']
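
A range such as `[A-z]` covers every ASCII character from `A` to `z`, which includes the characters `[`, `\`, `]`, `^`, `_` and the backtick. The sketch below compares it with `[A-Za-z]`; the `sample` string is made up for illustration and is not part of the notebook.

```python
import re

# Hypothetical sample containing punctuation that sits between 'Z' and 'a' in ASCII
sample = "abc_XYZ^[1]"

# The wide range also picks up '_', '^', '[' and ']'
wide = re.findall("[A-z]", sample)

# Two explicit ranges match letters only
strict = re.findall("[A-Za-z]", sample)

print(wide)
print(strict)
```

For `alphanumeric` above both patterns give the same list, because that string contains only letters and digits.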

In [4]:
text = "Sian sian@google.com"
pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'

# pattern = one or more letters, digits, or the special characters . _ % + -
# followed by an @ sign, then one or more letters, digits, dots, or hyphens,
# then a literal dot and an ending of 2 to 4 letters (.com/.de/.info)
# the pattern only lists A-Z, so lower case letters match only when the case is ignored

#findall with a known pattern can be used to pull pertinent information out of a text value

In [7]:
regex = re.compile(pattern, flags=re.IGNORECASE) #ignore the case of A-Z
regex.findall(text)

#we defined a text
#we defined the pattern
#we define a new variable that holds the result of re.compile(pattern)
#now we apply the findall method of that compiled pattern to text
#we don't need re.findall because the compiled pattern object has its own findall method

['sian@google.com']
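
As a rough check of the compile step, the sketch below shows that calling `findall` on a compiled pattern and calling `re.findall` with the same flags give the same result, and that the match disappears without `re.IGNORECASE`. The second address in `sample_text` is invented for the example.

```python
import re

pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'

# Hypothetical text; the second address is made up for illustration
sample_text = "Sian sian@google.com, Kosta KOSTA@example.org"

# Compile once, then reuse the compiled object
regex = re.compile(pattern, flags=re.IGNORECASE)
compiled_result = regex.findall(sample_text)

# Same search without compiling first
direct_result = re.findall(pattern, sample_text, flags=re.IGNORECASE)

# Without the flag the pattern only accepts upper case letters
case_sensitive = re.findall(pattern, sample_text)

print(compiled_result)
print(direct_result)
print(case_sensitive)
```

Compiling is mainly a convenience when the same pattern is applied to many strings.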

In [11]:
my_string = "Kosta likes climbing. Kosta is a great TA so he also loves data"

In [12]:
# return all occurrences of 'Kosta' using re.findall()

re.findall("Kosta ", my_string)

#the trailing space is part of the pattern, so each match includes it; the pattern "Kosta" without the space would also find both occurrences

['Kosta ', 'Kosta ']

## substitute/replace
re.sub()

In [14]:
# use re.sub() to replace "TA" by "Triceratops Alligator"

my_string = re.sub("TA", "Triceratops Alligator", my_string)
my_string

'Kosta likes climbing. Kosta is a great Triceratops Alligator so he also loves data'
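
By default `re.sub` replaces every match and is case sensitive, which is why only the upper case "TA" changed and "Kosta" and "data" were left alone. A minimal sketch of the `count` and `flags` arguments follows; the variable names and the replacement strings "Sian" and "X" are chosen only for this illustration.

```python
import re

sentence = "Kosta likes climbing. Kosta is a great TA so he also loves data"

# Replace every occurrence (default behaviour)
replace_all = re.sub("Kosta", "Sian", sentence)

# Replace only the first occurrence
replace_first = re.sub("Kosta", "Sian", sentence, count=1)

# With IGNORECASE, "TA" also matches the "ta" inside "data"
ignore_case = re.sub("TA", "X", "a great TA loves data", flags=re.IGNORECASE)

print(replace_all)
print(replace_first)
print(ignore_case)
```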

## search
re.search()

In [15]:
x = re.search("ove", my_string)
print(x)

<re.Match object; span=(73, 76), match='ove'>


In [16]:
x = re.search(r"\bT\w+", my_string)
print(x.span())

#\b is a word boundary, T is a capital T, and \w+ is one or more word characters (letters, digits, underscore)
#the output gives the start and end positions of the match

(39, 50)


In [17]:
print(x.group())

#.group() returns the text that the pattern matched, here the word starting with T

Triceratops
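
`re.search` returns `None` when nothing matches, so calling `.span()` or `.group()` on the result directly would raise an `AttributeError` in that case. One possible way to guard against this is sketched below; `first_match` is a hypothetical helper written for this example, not part of the `re` module.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.group()

sentence = "Kosta is a great Triceratops Alligator"

found = first_match(r"\bT\w+", sentence)    # a word starting with capital T exists
missing = first_match(r"\bZ\w+", sentence)  # no word starts with capital Z

print(found)
print(missing)
```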


## split
re.split()

In [18]:
multiples= "ear       hand  foot knee"

In [19]:
#use split with \s+ to split the passed text around runs of whitespace
#the pattern is a raw string so that \s is passed to the regex engine unchanged
re.split(r'\s+', multiples)

['ear', 'hand', 'foot', 'knee']
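
Two details of `re.split` are worth a quick look: the `maxsplit` argument limits the number of splits, and whitespace at the start or end of the text produces empty strings in the result. The sketch below uses a made-up `padded` string to show both, with `str.split()` for comparison.

```python
import re

# Hypothetical text with leading and trailing whitespace
padded = "  ear   hand foot "

# Leading and trailing whitespace yield empty strings at both ends
with_empties = re.split(r"\s+", padded)

# maxsplit=1 stops after the first split; strip() removes the outer whitespace first
first_split = re.split(r"\s+", padded.strip(), maxsplit=1)

# str.split() with no arguments drops the empty strings by itself
plain_split = padded.split()

print(with_empties)
print(first_split)
print(plain_split)
```

For plain whitespace `str.split()` is often enough; `re.split` becomes useful when the separator is a more general pattern.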