### 00 A RegEx, or Regular Expression
- A regular expression is a sequence of characters that forms a search pattern.
- RegEx can be used to check if a string contains the specified search pattern.

In [1]:
import re

#Search the string to see if it starts with "The" and ends with "Spain":
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)

if x:
  print("YES! We have a match!")
else:
  print("No match")

YES! We have a match!


#### RegEx Functions
- The re module offers a set of functions that allows us to search a string for a match:

| Function | Description |
| --- | --- |
| `findall` | Returns a list containing all matches |
| `search` | Returns a Match object if there is a match anywhere in the string |
| `split` | Returns a list where the string has been split at each match |
| `sub` | Replaces one or many matches with a string |
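
Of the four functions listed, `sub()` is the only one not demonstrated elsewhere in these notes, so here is a minimal sketch of it. The replacement character `_` and the variable names are illustrative choices, not part of the original notebook.

```python
import re

txt = "The rain in Spain"

# Replace every whitespace character with an underscore.
replaced_all = re.sub(r"\s", "_", txt)
print(replaced_all)   # The_rain_in_Spain

# The optional count argument limits how many matches are replaced.
replaced_two = re.sub(r"\s", "_", txt, count=2)
print(replaced_two)   # The_rain_in Spain
```

`sub()` returns a new string; the original `txt` is left unchanged.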

#### Metacharacters
- Metacharacters are characters with a special meaning:

| Character | Description | Example |
| --- | --- | --- |
| `[]` | A set of characters | `"[a-m]"` |
| `\` | Signals a special sequence (can also be used to escape special characters) | `r"\d"` |
| `.` | Any character (except newline character) | `"he..o"` |
| `^` | Starts with | `"^hello"` |
| `$` | Ends with | `"planet$"` |
| `*` | Zero or more occurrences | `"he.*o"` |
| `+` | One or more occurrences | `"he.+o"` |
| `?` | Zero or one occurrences | `"he.?o"` |
| `{}` | Exactly the specified number of occurrences | `"he.{2}o"` |
| `\|` | Either or | `"falls\|stays"` |
| `()` | Capture and group | |
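
To see several of these metacharacters working together, here is a small sketch. The sample sentence and the date pattern are assumptions made up for illustration; they also give `()` a concrete example, which the table leaves blank.

```python
import re

txt = "hello planet, the date is 2024-05-17"

# ^ anchors the pattern to the start of the string.
starts_with_hello = bool(re.search(r"^hello", txt))
print(starts_with_hello)   # True

# . matches any single character, so "he..o" matches "hello".
dotted = re.findall(r"he..o", txt)
print(dotted)              # ['hello']

# {4} and {2} require exact counts; each () captures one part as a group.
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", txt)
year, month, day = date.groups()
print(year, month, day)    # 2024 05 17

# | chooses between alternatives.
either = re.findall(r"planet|date", txt)
print(either)              # ['planet', 'date']
```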

#### Special Sequences

A special sequence is a `\` followed by one of the characters in the list below, and has a special meaning. The `r` prefix in the examples makes sure the pattern is treated as a "raw string", so the backslash reaches the regex engine unchanged.

| Character | Description | Example |
| --- | --- | --- |
| `\A` | Returns a match if the specified characters are at the beginning of the string | `r"\AThe"` |
| `\b` | Returns a match where the specified characters are at the beginning or at the end of a word | `r"\bain"`, `r"ain\b"` |
| `\B` | Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word | `r"\Bain"`, `r"ain\B"` |
| `\d` | Returns a match where the string contains digits (numbers from 0-9) | `r"\d"` |
| `\D` | Returns a match where the string contains a character that is NOT a digit | `r"\D"` |
| `\s` | Returns a match where the string contains a white space character | `r"\s"` |
| `\S` | Returns a match where the string contains a character that is NOT white space | `r"\S"` |
| `\w` | Returns a match where the string contains any word characters (characters from a to Z, digits from 0-9, and the underscore _ character) | `r"\w"` |
| `\W` | Returns a match where the string contains a character that is NOT a word character | `r"\W"` |
| `\Z` | Returns a match if the specified characters are at the end of the string | `r"Spain\Z"` |

In [2]:
import re

text = "The quick brown fox jumps over the lazy dog."
pattern = r"\b\w+\b"  # Matches whole words

matches = re.findall(pattern, text)
print(matches)  # Output: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
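
The difference between `\b` and `\B` is easier to see side by side. The sketch below uses an invented sample sentence (not from the original notebook) and also exercises `\d` and `\A`.

```python
import re

txt = "The rain in Spain costs 25 euros"

# \bain: "ain" at the START of a word (no word here begins with "ain").
at_word_start = re.findall(r"\bain", txt)
print(at_word_start)    # []

# ain\b: "ain" at the END of a word ("rain" and "Spain").
at_word_end = re.findall(r"ain\b", txt)
print(at_word_end)      # ['ain', 'ain']

# \Bain: "ain" that is NOT at the start of a word.
not_at_start = re.findall(r"\Bain", txt)
print(not_at_start)     # ['ain', 'ain']

# \d+ collects runs of one or more digits.
numbers = re.findall(r"\d+", txt)
print(numbers)          # ['25']

# \A only matches at the very beginning of the string.
starts_with_the = bool(re.search(r"\AThe", txt))
print(starts_with_the)  # True
```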


#### 001 The findall() Function
- The findall() function returns a list containing all matches.
- The list contains the matches in the order they are found.
- If no matches are found, an empty list is returned:

In [7]:
#Print a list of all matches:
import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)                        # prints the list of matches
print(len(x))                   # prints the number of items in the list

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)                        # prints an empty list

['ai', 'ai']
2
[]


#### 002 The search() Function
- The search() function searches the string for a match, and returns a Match object if there is a match.
- If there is more than one match, only the first occurrence of the match will be returned.
- If no matches are found, the value None is returned:

In [17]:
import re

txt = "The rain in Spain"
x = re.search(r"\s", txt)       # raw string avoids an invalid-escape warning

print("1st white space character is located in position:", x.start())

1st white space character is located in position: 3
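
Because `search()` returns `None` when nothing matches, it is usually worth checking the result before calling methods on it. The following is a minimal sketch; the patterns and variable names are illustrative assumptions.

```python
import re

txt = "The rain in Spain"

# Look for a word that starts with an upper-case "S".
match = re.search(r"\bS\w+", txt)
if match:
    print(match.span())    # (12, 17): start and end positions of the match
    print(match.group())   # Spain: the text that was matched

# A pattern that does not occur returns None instead of a Match object.
missing = re.search(r"Portugal", txt)
print(missing)             # None
```

Calling `missing.start()` here would raise an `AttributeError`, which is why the `if match:` guard is the common idiom.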



#### The split() Function
- The split() function returns a list where the string has been split at each match:

In [19]:
#Split at each white-space character
import re

txt = "The rain in Spain"
x = re.split(r'\s', txt)        # raw string avoids an invalid-escape warning
print(x)

['The', 'rain', 'in', 'Spain']
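
`split()` also accepts an optional `maxsplit` argument, and the choice between `\s` and `\s+` matters when words are separated by more than one space. A short sketch, using made-up input strings:

```python
import re

txt = "The rain in Spain"

# maxsplit=1 splits only at the first match.
first_split = re.split(r"\s", txt, maxsplit=1)
print(first_split)    # ['The', 'rain in Spain']

# With repeated spaces, \s produces empty strings between the matches...
single = re.split(r"\s", "The  rain")
print(single)         # ['The', '', 'rain']

# ...while \s+ treats each run of whitespace as one separator.
collapsed = re.split(r"\s+", "The  rain")
print(collapsed)      # ['The', 'rain']
```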



In [9]:
import re

text = "The quick brown fox jumps over the lazy dog."
pattern = r"\b\w+\b"  # Matches whole words

match = re.search(pattern, text)
if match:
    print(match.group())  # Output: The

The


#### Sets
A set is a set of characters inside a pair of square brackets [] with a special meaning:

| Set | Description |
| --- | --- |
| `[arn]` | Returns a match where one of the specified characters (a, r, or n) is present |
| `[a-n]` | Returns a match for any lower case character, alphabetically between a and n |
| `[^arn]` | Returns a match for any character EXCEPT a, r, and n |
| `[0123]` | Returns a match where any of the specified digits (0, 1, 2, or 3) are present |
| `[0-9]` | Returns a match for any digit between 0 and 9 |
| `[0-5][0-9]` | Returns a match for any two-digit number from 00 to 59 |
| `[a-zA-Z]` | Returns a match for any character alphabetically between a and z, lower case OR upper case |
| `[+]` | In sets, `+`, `*`, `.`, `\|`, `()`, `$`, `{}` have no special meaning, so `[+]` means: return a match for any + character in the string |
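
A few of these sets can be tried out with `findall()`. This is a sketch only; the sample strings are invented for the example.

```python
import re

txt = "The rain in Spain at 08:45"

# [arn]: every a, r, or n in the string, in order of appearance.
arn = re.findall(r"[arn]", txt)
print(arn)        # ['r', 'a', 'n', 'n', 'a', 'n', 'a']

# [^arn]: every character EXCEPT a, r, and n.
not_arn = re.findall(r"[^arn]", "rain")
print(not_arn)    # ['i']

# [0-5][0-9]: two-digit numbers from 00 to 59.
two_digit = re.findall(r"[0-5][0-9]", txt)
print(two_digit)  # ['08', '45']

# Inside a set, + is a literal plus sign, not a quantifier.
plus = re.findall(r"[+]", "1+1=2")
print(plus)       # ['+']
```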

