<h1>START...Searching, Manipulating and Processing Text with REGEX</h1>

<h2>Overview and purpose</h2><br>
In this START workshop, you will learn the basics of working with regex. Regex is a key tool for searching, manipulating and processing text and can be used across various programming languages and software.

<h2>Important: creating a copy of this notebook</h2><br>
Please don't use my colab notebook to follow this session (or you will make changes to our shared document!). You can create a copy of this notebook by clicking File --> Save a Copy in Drive. This will create a copy of the notebook in your own Google Drive.

<h2>What is Regex?</h2><br>
A regex - or 'regular expression' - is a sequence of symbols and characters that specifies a pattern to be searched for within a piece of text. 

![regexsnippet2.PNG](attachment:regexsnippet2.PNG)<br>
(<i>Snapshot from regex101.com</i>)

Regex use cases and compatibility:
- Searching textual data (e.g. conducting research, data extraction, validating user input)
- Substituting/rearranging pieces of data (e.g. tidying data, editing copy)
- Splitting text (e.g. generating tokens, creating structured data)
- Text feature engineering (e.g. NLP processes)
- Compatible with many programming languages (e.g. Python, R, Java, JavaScript, C++), databases (e.g. MySQL, Oracle) and software (e.g. EditPad, MS Excel with VBA)
- Regular expressions are written in much the same way whichever software/language you use to implement them, although some features differ between regex 'flavours'!

<h2>Regex in Python and the .findall() function</h2>

We will be creating regular expressions with the <b>Python</b> programming language. We can use the Python function `.findall()` to specify our regular expression and return matches, just as we did in the above Regex101 example.

In [1]:
# Import regex library to Python
import re

In [None]:
# Use findall() to find anything but letters in our 'Call 999' phrase
# The 1st argument specifies the regex, the 2nd is the text to be searched
re.findall("[^a-zA-Z]", "Call 999...The cat is on the roof!")

<h2>Literals</h2>
When our regular expression matches the exact text we want to match, this is known as a <b>literal</b>.

In [None]:
# We can use the literal 'cat' to match with 'cat' from our phrase
re.findall("cat", "cat1, cat2")

In [None]:
# Literals can include any characters (letters/digits/symbols/punctuation)

In [None]:
re.findall("Call 999", "Call 999...The cat is on the roof!")

In [None]:
re.findall("Call 911", "Call 999...The cat is on the roof!")

<i>Activities: literals</i><br>

In [None]:
# Change the ? in the regex argument to find 'fox'.
# Remember to click 'Run' to check your answer!
re.findall("?", "The quick brown fox jumped over the lazy cat.")

In [None]:
# Change the ? in the regex argument to find '123'.
re.findall("?", "abc123def456")

<h2>Wildcards</h2><br>
The wildcard, implemented with the dot (.) symbol, can be used to match with <i>any</i> single character (e.g. a letter, digit, whitespace, symbol etc.).

In [None]:
re.findall(".", "Everything !")

In a regex, the dot acts as a wildcard rather than as a literal period (full stop). If you want to match a period, you need to <i>escape</i> the dot by placing a backslash before it.

In [None]:
# We haven't escaped the dot here, so we will match with everything
re.findall(".", "Full. Stops.")

In [None]:
# We've now escaped the dot, so we will match only full-stops (periods)
re.findall("\.", "Full. Stops.")

Let's write a regular expression that matches:<br>
<b>Dog.<br>
123.<br>
---.<br></b>
<br>
But <b>doesn't</b> match:<br>
<b>abc1</b>


In [None]:
# Replace the ? with a suitable regular expression
re.findall("?", "Dog. 123. ---. abc1" )

<h2>Alternation</h2><br>
Alternation, implemented with the pipe (|) symbol, allows us to 'alternate' between the characters we want to match (i.e. we can match either the characters before or after the pipe).

In [None]:
# Here we are matching for either 'cat' or 'roof'
re.findall("cat|roof", "Call 999...The cat is on the roof!")

In [None]:
# Here we are matching for either 'a', 'b' or 'c'. 
re.findall("a|b|c", "Call 999...The cat is on the roof!")

In [None]:
# What might happen in the next example?
re.findall("sea|seaweed", "I love the sea, but I hate seaweed")
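
When two alternatives could both match at the same position, the order they are written in matters. The short sketch below, which assumes the same sentence as the cell above (the variable names are purely illustrative), compares the two orderings:

```python
import re

text = "I love the sea, but I hate seaweed"

# 'sea' is listed first, so it wins even where 'seaweed' would also fit
short_first = re.findall("sea|seaweed", text)

# Listing the longer alternative first lets it match in full
long_first = re.findall("seaweed|sea", text)

print(short_first)
print(long_first)
```

If one alternative is a prefix of another, placing the longer one first is usually the safer choice.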

<h2>Character classes</h2><br>
Character classes (or 'character sets'), implemented with square brackets ([]), can be used to match any of the characters you specify. For example, for the regex [aeo], I am matching for the letters 'a', 'e' and 'o'; similarly, [135] matches for the digits 1, 3 and 5.

In [None]:
# Here we are matching for the letters 'a', 'b'  and/or 'c'.
re.findall("[abc]", "Call 999...The cat is on the roof!")

In [None]:
# Note that literals do not work inside character classes because the letters are treated as separate characters.
re.findall("[cat]", "Call 999...The cat is on the roof!")

In [None]:
# Character sets are useful for matching text even where there are incorrect or alternative spellings.
re.findall("sep[ae]rate", "The spellings 'separate' and 'seperate' are often mixed up")

In [None]:
# Character classes used in this way only match single characters
re.findall("[abc][abc]", "ab, bc, ac, ad, cb")

<i>Activities: character sets</i><br>

In [None]:
# Replace the ? to add a regex which matches for the vowels in the sentence
re.findall("?", "Not every vowel is in this sentence!")

In [None]:
# Replace the ? to add a regex that matches for all words rhyming with 'at'
re.findall("?", "The cat was sat on the mat chasing a rat.")

<h3>Negated character classes</h3><br>
Negated character classes, implemented with the caret (^) symbol, specify the characters you DON'T want to match for. The caret needs to be placed at the start of the character set.

In [None]:
# Here we are matching for any single character BUT vowel letters
re.findall("[^aeiou]", "No vowels please!")

In [None]:
# Note that the ^ needs to be at the start of the character set or it is treated like any other character
re.findall("[aei^ou]", "Rain, snow, ^, hail, sun!")

We have the (nonsensical) phrase "log dog hog fog".

Let's write a regular expression that matches:<br>
log<br>
dog<br>
hog<br>
<br>
But <b>doesn't</b> match:<br>
fog

In [None]:
# Replace the ? with a suitable regular expression using a negated character class
re.findall("?", "log dog hog fog" )

<h2>Ranges</h2><br>
Ranges, implemented using the hyphen (-), allow us to specify a range of characters rather than typing out all options separately. They need to be placed inside a character class like so: [1-5].

In [None]:
# Here we match any letter that is 'a', 'b' or 'c' using the range a-c
re.findall("[a-c]", "Call 999...The cat is sat on the roof!")

In [None]:
# Ranges can be used with either letters or digits (because they have order)
re.findall("[1-5]", "123456789")

In [None]:
# Here we match any single character that is a lower-case letter
print(re.findall("[a-z]", "a1b2c3"))

# The same for upper-case letters
print(re.findall("[A-Z]", "A1B2C3"))

In [None]:
# Or we can match any single character that is either a lower- or upper-case letter
re.findall("[A-Za-z]", "abcABC123")

In [None]:
# And we can match any single digit using the range 0-9
re.findall("[0-9]", "abcde12345")

In [None]:
# Let's write a regex that matches any single capital letter or digit
re.findall("[A-Z0-9]", "Call 999...The cat is sat on the roof!")

In [None]:
# Let's write a regex that matches any single punctuation mark or whitespace (using ranges!)
re.findall("?", "Call 999...The cat is sat on the roof!")

<h2>Shorthand character classes</h2><br>
Shorthand classes represent a <i>predefined</i> set of characters (often common ranges or characters). They are implemented using a backslash (\) followed by a given letter. Typical shorthand classes include:<br>
- <b>\w</b> ('word character') - any single character that is a letter, digit or underscore; shorthand for the character class [a-zA-Z0-9_]<br>
- <b>\d</b> ('digit character') - any single digit; shorthand for class [0-9]<br>
- <b>\s</b> ('whitespace character') - any single whitespace character (e.g. a space, tab, return or line break); shorthand for class [ \t\r\n\f\v]
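
As a rough check of the equivalences listed above, the sketch below runs each shorthand against a made-up sample string (the string and variable names are assumptions for illustration only). The `r` prefix makes each pattern a raw string, so Python passes the backslash through to the regex unchanged.

```python
import re

# Illustrative sample containing a space, a tab and a newline
sample = "Room 42,\tfloor_3 \n"

# \d and [0-9] find the same digits in this ASCII sample
digits_short = re.findall(r"\d", sample)
digits_class = re.findall("[0-9]", sample)

# \w covers letters, digits and the underscore
word_chars = re.findall(r"\w", sample)

# \s covers the space as well as the tab and newline
spaces = re.findall(r"\s", sample)

print(digits_short, digits_class)
print(word_chars)
print(spaces)
```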

In [None]:
# Here we are matching any single character that is punctuation (incl. whitespace)
# Compare this to the solution for the question two cells above!
re.findall("[^\w]", "Call 999...The cat is sat on the roof!")

In [None]:
# In this example we also negate the whitespace character
re.findall("[^\w\s]", "Call 999...The cat is sat on the roof!")

<i>Activity: shorthand character classes</i><br>
Replace the ? in the cell below to write a regex that matches for the following three phrases:<br>
4 limbs<br>2 hands<br>1 heart<br><br>But <b>doesn't</b> match the following:<br>Four limbs<br>Two hands<br>One heart<br><br>Hint: you don't need to use the square brackets here!

In [None]:
re.findall("\d\s\w\w\w\w\w", "4 limbs, 2 hands, 1 heart, Four limbs, Two hands, One heart")

<h3>Negated shorthand character classes</h3><br>
These specify the shorthand character classes you DON'T want to match for. They are implemented just like shorthand classes but using a capital (rather than lower-case) letter. For example:<br>
- <b>\W</b> ('non-word character') - any single character that is <i>not</i> a word character<br>
- <b>\D</b> ('non-digit character') - any single character that is <i>not</i> a digit character<br>
- <b>\S</b> ('non-whitespace character') - any single character that is <i>not</i> a whitespace

In [None]:
# Here we match for any single character that is not whitespace
re.findall("\S", "Just the letters")

In [None]:
# What do you think the output will be?
re.findall("\d\D\s", "11 5G 789 9  ")

In [None]:
# What about this example?
re.findall(".\d\S", "123, abc, !-.")

<h2>Quantifiers</h2><br>Quantifiers allow you to specify the quantity of characters you want to match, avoiding repetition.

<i>Fixed quantifiers</i><br>
Fixed quantifiers, implemented using the curly brackets {}, require you to specify the number of times you want to match (either as an exact number or a range).

In [None]:
# Here we match for exactly 3 a's
re.findall("a{3}", "a aa aaa")

In [None]:
# And we can use special characters alongside quantifiers
re.findall("\d{5}", "1 12 123 1234 12345 123456")

In [None]:
# Specify ranges by separating boundaries with a comma
re.findall("[A-Z]{2,3}", "A BC DEF")

<i>Activity</i>:<br>
Let's write a regex using a quantifier that matches the following:<br>
CHEEESE<BR>CHEEEESE<BR>CHEEEESY<BR><br>But <b>doesn't match</b>:<br>CHEESE<br>CHEESY

In [None]:
re.findall("?", "CHEEESE, CHEEEESE, CHEEEESY, CHEESE, CHEESY")

<i>The optional quantifier</i><br>
The 'optional' quantifier, implemented with a question mark (?), specifies that the character in the regex is <i>optional</i>, meaning that it can appear either 0 or 1 times.

In [None]:
# Here we are matching for the word 'colour' with or without a u
re.findall("colou?r", "UK spelling: colour; US spelling: color")

In [None]:
# To match the actual question mark, remember to 'escape' the metacharacter using a backslash
re.findall("[\w]{3,4}\?", "Who? What? When?")

<i>Activity</i>:<br>
Replace the ? in the cell below to write a regex that matches for the following:<br>
1 time more?<br>
2 times more?<br>
3 times more?<br><br>
But <b>doesn't</b> match with:<br> 4 times more.
<br><br>Hint: remember to 'escape' the question mark character!

In [None]:
re.findall("?", "1 time more? 2 times more? 3 times more? 4 times more.")

<i>Plural quantifiers</i>
- The asterisk (*) symbol, also an 'optional' quantifier, specifies that the character can appear either 0 <b> or more</b> times.<br>
- The plus (+) symbol, a 'non-optional' quantifier, specifies that the character must appear <b>once or more</b>.

In [None]:
# Matches numbers starting & ending with 1, regardless of whether there are 0's in the middle (or how many)
re.findall("10*1", "11 10 101 10001")

In [None]:
# Matches for numbers starting & ending with 1 as long as there is 1 or more 0's in the middle
re.findall("10+1", "11 10 101 10001")
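
To see the three single-symbol quantifiers side by side, here is a small sketch that applies `?`, `*` and `+` to the same zero in the pattern. The test string is a slight variation on the one above, chosen only for illustration.

```python
import re

numbers = "11 101 1001 10001"

zero_or_one = re.findall("10?1", numbers)   # at most one 0 in the middle
zero_or_more = re.findall("10*1", numbers)  # any number of 0's, including none
one_or_more = re.findall("10+1", numbers)   # at least one 0 in the middle

print(zero_or_one)
print(zero_or_more)
print(one_or_more)
```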

<i>Activity</i>:<br>
Let's replace the ? in the cell below with a regex that can match the following strings:<br>aaaabcc<br>aabbbbc<br>aacc<br><br>But <b>doesn't</b> match:<br>a<br>ab

In [None]:
re.findall("?", "aaaabcc aabbbbc aacc a ab")

<h2>Groups</h2>

<h3>Capture groups</h3><br>
We can extract information for further processing by defining groups of characters and <i>capturing</i> them. Capture groups are implemented by using the parentheses (). Any substring we put inside of the parentheses will be captured as a group. 

In [None]:
# Here we match and store all single digits in a separate group
re.findall("(\d)", "123")

In [None]:
# I want to extract only the HUM codes, but only the digits (not 'HUM')
# Matching for HUM followed by the digits does not work:
re.findall("HUM\d{3}", "HUM001, LAW001, SCI001, HUM111")

In [None]:
# However, we can capture the digits in a group and thereby not store the HUM characters
re.findall("HUM(\d{3})", "HUM001, LAW001, SCI001, HUM111")

<i>Activity</i><br>
We have a set of 5 files but only want to keep those in the correct format. Using the capture group mechanism, replace the ? with a regex that will keep the following files:<br>
doc_1.pdf<br>doc_2.pdf<br>doc_3.pdf<br><br>As well as keeping these files, we want to store them <b>without their extensions (.pdf)</b>, so that they will be stored in the format 'doc_1', 'doc_2' and so on.<br><br>At the same time, we don't want to keep the following files:<br>new_doc_4.pdf<br>doc_5.csv

In [None]:
re.findall("?", "My files: doc_1.pdf, doc_2.pdf, doc_3.pdf, new_doc_4.pdf, doc_5.csv")

In [None]:
# We can use multiple capture groups in the same expression
re.findall("Initials: (\w+), Age: (\d{1,3})", "Initials: AA, Age: 10")

In [None]:
# If there are multiple matches in the test string, another item ('tuple') will be added to the outputted list
re.findall("Initials: (\w+), Age: (\d{1,3})", "Initials: AA, Age: 10; Initials: BB, Age: 9")
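
Because each match comes back as a tuple, the captured values can be unpacked for further processing. The following is a minimal sketch, assuming we want to build a dictionary of ages keyed by initials (the dictionary is an illustration, not part of the workshop exercises):

```python
import re

records = "Initials: AA, Age: 10; Initials: BB, Age: 9"

# Each match comes back as a tuple: (initials, age)
matches = re.findall(r"Initials: (\w+), Age: (\d{1,3})", records)

# Captured text is always a string, so convert each age to an integer
ages = {initials: int(age) for initials, age in matches}

print(matches)
print(ages)
```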

<i>Activities: capture groups</i>

In [None]:
# Replace the ? with a regex that extracts the domain name from the website
re.findall("?", "swimming.com, triathlon.com, cycling.com")

In [None]:
# Replace the ? with a regex that captures only the lower-case letters at the end of each word
re.findall("?", "LOTSF2find 1IT3Kthe FDEWAWE8hidden 321NCASmessage")

<h3>The .sub() function</h3>

The `.sub()` function replaces matched substrings with a new string. 

In [None]:
# Replaces '999' with '911'
re.sub("999", "911", "Call 999...The cat is sat on the roof!")

In [None]:
# By default, the .sub() function replaces the string for every match
re.sub("aparent", "apparent", "It was aparent that they weren't coming, but aparently it was because of the traffic.")
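
If you only want to replace some of the matches, `re.sub()` accepts an optional `count` argument, and the related `re.subn()` function also reports how many replacements were made. A short sketch using the sentence from the cell above (variable names are illustrative):

```python
import re

sentence = "It was aparent that they weren't coming, but aparently it was because of the traffic."

# Default behaviour: every match is replaced
replace_all = re.sub("aparent", "apparent", sentence)

# count=1 stops after the first replacement
replace_first = re.sub("aparent", "apparent", sentence, count=1)

# subn returns a tuple: (new_string, number_of_replacements)
_, n_replaced = re.subn("aparent", "apparent", sentence)

print(replace_first)
print(n_replaced)
```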

<h2>Backreferences</h2><br>
Backreferencing is a mechanism to access the text captured in your groups. You are 'referring' back to the substring you have captured. Backreferencing is implemented by using the backslash (\) followed by the number of the capture group you want to access, for example: \1.<br><br>
<b>Note</b>: because '\' is a special character, you will need to escape it (just as we did in previous examples). You can either put <i>another</i> backslash beforehand, or place an 'r' at the start of your substitution text. For this workbook, we will be using the 'r' method.
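
To illustrate why the escaping matters, the sketch below compares three ways of writing the same substitution text in Python. The variable names are hypothetical and the example is only meant to show how Python reads each string before the regex engine sees it.

```python
import re

plain = "\1"      # Python reads this as a single control character (an octal escape)
raw = r"\1"       # a backslash followed by '1', which re.sub reads as 'group 1'
doubled = "\\1"   # the same two characters, written with an escaped backslash

print(len(plain), len(raw), raw == doubled)

# With the raw string, the captured 'keep!' is reinserted and 'change!' is dropped
result = re.sub("(keep!)change!", raw, "keep!change!")
print(result)
```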

In [None]:
# Here we store and reuse 'keep!' using a backreference (\1) but alter the word 'change!'
re.sub("(keep!)change!", r"\1NEW!", "keep!change!")

In [None]:
re.sub("Initials: (\w+), Age: (\d{1,3})", r"INITS: \1, YO: \2", "Initials: AA, Age: 10")

In [None]:
re.sub("Initials: (\w+), Age: (\d{1,3})", r"INITS: \1, YO: \2", "Initials: AA, Age: 10; Initials: BB, Age: 9")

<i>Activities: backreferencing</i>

In [None]:
# Replace the ?'s to write a regular expression and substitution that will swap the two words around
re.sub("?", r"?", "goodbye, hello")

In [None]:
# Replace the ?'s with a regular expression & substitution that removes the square brackets around digits (but not letters)
re.sub("\[(\d)\]", r"\1", "[1] [two] [3] [four] [5]")

<h2>Project: tidy OCR-generated text & export to CSV</h2><br>
For this task, you will be using an extract from a historic newspaper (Aberdeen Evening Express in 1886) which records mortality rates across British cities (original image below).

![imageregexhistoric.PNG](attachment:imageregexhistoric.PNG)

The above image has been converted to text using OCR, but this text is messy and requires pre-processing before it can be analysed. The text currently reads:
    
    `MORTALITY STATISTICS. The Beeistrar-General reports ths annual rate mortality last week in the towns of England .'and Wales averaged 18*5 per 1000. The rates tbe various towns ware Birkenhead, 16; Birmingham, 15; Blackburn, 16; Bolton, 18 ; Bradford, 16 ; Brighton, 14 Bristol, 20; Cardiff, 25 ; Derby, 12 ; Halifax, 20; Biddersfield, 21 ; Hull, 19 ; Leeds, 22 ; Leicester, 18 ; London, 17; Manchester,15 ; Norwich, 24; Nottingham, 21; Oldham, 18 ; Plymouth, 19 Portsmouth, 15 ; Preston, 23 ; Salford, 14 ; Sheffield, 16 ; Sunderland, 18; Wolverhampton, 30. The rate in Edinburgh was 14 ;in Glasgow, 23 ; and in Dublin. 22.`
    
 Your task is to tidy this data and prepare it for exporting to CSV. This involves several steps.

<b> Step 1: Remove unnecessary text at the start of the article</b><br>We only want to keep the names of the towns/cities and their associated mortality rates, so we do not need any of the text prior to 'Birkenhead'.<br>
    
    

 In the cell below, remove the ?'s and replace with a regex needed to remove the unnecessary opening text. You are allowed to use literals for this exercise, and remember that there are always multiple solutions with regexes!

In [2]:
# This is just our textual data storing the newspaper article
data = "MORTALITY STATISTICS. The Beeistrar-General reports ths annual rate mortality last week in the towns of England .'and Wales averaged 18*5 per 1000. The rates tbe various towns ware Birkenhead, 16; Birmingham, 15; Blackburn, 16; Bolton, 18 ; Bradford, 16 ; Brighton, 14 Bristol, 20; Cardiff, 25 ; Derby, 12 ; Halifax, 20; Biddersfield, 21 ; Hull, 19 ; Leeds, 22 ; Leicester, 18 ; London, 17; Manchester,15 ; Norwich, 24; Nottingham, 21; Oldham, 18 ; Plymouth, 19 Portsmouth, 15 ; Preston, 23 ; Salford, 14 ; Sheffield, 16 ; Sunderland, 18; Wolverhampton, 30. The rate in Edinburgh was 14 ;in Glasgow, 23 ; and in Dublin. 22. "

# We are going to find and keep ALL text from 'Birkenhead'
# Replace the ?'s with a regex that will match for everything from 'Birkenhead'
data = ' '.join(re.findall("Birkenhead.*", data))
print(data)

Birkenhead, 16; Birmingham, 15; Blackburn, 16; Bolton, 18 ; Bradford, 16 ; Brighton, 14 Bristol, 20; Cardiff, 25 ; Derby, 12 ; Halifax, 20; Biddersfield, 21 ; Hull, 19 ; Leeds, 22 ; Leicester, 18 ; London, 17; Manchester,15 ; Norwich, 24; Nottingham, 21; Oldham, 18 ; Plymouth, 19 Portsmouth, 15 ; Preston, 23 ; Salford, 14 ; Sheffield, 16 ; Sunderland, 18; Wolverhampton, 30. The rate in Edinburgh was 14 ;in Glasgow, 23 ; and in Dublin. 22. 


<b>Step 2: Remove unnecessary text from the end of the article<br></b>For the sake of this exercise, we are only interested in English locations, so we want to remove everything after 'Wolverhampton, 30.' 

In the cell below, replace the ?'s with a regex which can remove the unnecessary closing text.

In [3]:
# Replace the ?'s in the sub() function to remove all of the text after 'Wolverhampton, 30'.
data = re.sub("\..*", ".", data)
print(data)

Birkenhead, 16; Birmingham, 15; Blackburn, 16; Bolton, 18 ; Bradford, 16 ; Brighton, 14 Bristol, 20; Cardiff, 25 ; Derby, 12 ; Halifax, 20; Biddersfield, 21 ; Hull, 19 ; Leeds, 22 ; Leicester, 18 ; London, 17; Manchester,15 ; Norwich, 24; Nottingham, 21; Oldham, 18 ; Plymouth, 19 Portsmouth, 15 ; Preston, 23 ; Salford, 14 ; Sheffield, 16 ; Sunderland, 18; Wolverhampton, 30.
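
The pattern in the cell above works because the first full stop left in the text is the one after 'Wolverhampton, 30'. As one possible alternative (a sketch on a shortened stand-in for the article text, not the workshop's own solution), a capture group can anchor the cut to the last entry we want to keep:

```python
import re

# Shortened stand-in for the end of the article text
sample = "Sunderland, 18; Wolverhampton, 30. The rate in Edinburgh was 14 ;in Glasgow, 23 ; and in Dublin. 22. "

# Capture the last entry we want to keep, then match everything after it
trimmed = re.sub(r"(Wolverhampton, 30\.).*", r"\1", sample)
print(trimmed)
```

This version would still work if an earlier full stop crept into the text, at the cost of relying on a literal place name.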


<b>Step 3: Remove whitespace between the commas and mortality rates</b><br>In MOST cases there is an unneeded space between the comma and mortality rate. These spaces should be removed to structure the data for CSV export.<br> 
    
    
In the cell below, replace the ?'s to swap the commas and spaces for just commas using regex. Be careful: sometimes there is NO space between the comma and the mortality rate (for example Manchester,15), so you'll need a regex that can match for either a comma and space OR just a comma!

In [6]:
# Replace the ?'s in the sub() function to remove the whitespace between commas and mortality rates
# Note: sometimes in the current extract there is NO space between the comma & mortality rate (e.g. Manchester,15)
data = re.sub(", ", ",", data)
print(data)

Birkenhead,16; Birmingham,15; Blackburn,16; Bolton,18 ; Bradford,16 ; Brighton,14 Bristol,20; Cardiff,25 ; Derby,12 ; Halifax,20; Biddersfield,21 ; Hull,19 ; Leeds,22 ; Leicester,18 ; London,17; Manchester,15 ; Norwich,24; Nottingham,21; Oldham,18 ; Plymouth,19 Portsmouth,15 ; Preston,23 ; Salford,14 ; Sheffield,16 ; Sunderland,18; Wolverhampton,30.
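
There are several ways to handle the 'comma with or without a space' requirement. One option, sketched here on a shortened sample of the data, is the optional quantifier, which treats the space after the comma as optional:

```python
import re

# Shortened sample: two entries with a space after the comma, one without
sample = "London, 17; Manchester,15 ; Norwich, 24"

# ', ?' matches a comma followed by zero or one space
tidied = re.sub(", ?", ",", sample)
print(tidied)
```

Entries that already have no space are matched and replaced with an identical comma, so they are left looking the same.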


<b>Step 4: Add missing semi-colons & remove unnecessary whitespace</b><br>
The next step will involve 2 changes:
- We want there to be no spaces before or after the semi-colons, so the text should read 'Cardiff,25;Derby,12' and so on. 
- We also need to add in missing semi-colons. There are two places where there is no semi-colon separating the locational data: between Brighton and Bristol and between Plymouth and Portsmouth. These separators between the locations will be essential when we convert the text to CSV (to indicate new rows). 

Tip: the second requirement is a little more tricky! As an example, "Plymouth,19 Portsmouth,15" should become "Plymouth,19;Portsmouth,15". There are a number of ways to do this, but it might be a good place to practice your capture groups!

In [12]:
# Replace the ?'s to add missing semi-colons
# In the same expression, remove the whitespace before and after any semi-colons
data = re.sub("(\d+)[ ;]+", r"\1;", data)
print(data)

Birkenhead,16;Birmingham,15;Blackburn,16;Bolton,18;Bradford,16;Brighton,14;Bristol,20;Cardiff,25;Derby,12;Halifax,20;Biddersfield,21;Hull,19;Leeds,22;Leicester,18;London,17;Manchester,15;Norwich,24;Nottingham,21;Oldham,18;Plymouth,19;Portsmouth,15;Preston,23;Salford,14;Sheffield,16;Sunderland,18;Wolverhampton,30.
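
The single expression above does both jobs at once. If that feels dense, the same result can be reached in two smaller passes. The following is a sketch on a shortened sample, and the names `step_a` and `step_b` are purely illustrative:

```python
import re

# Shortened sample with one missing semi-colon (between Brighton and Bristol)
sample = "Bolton,18 ; Bradford,16 ; Brighton,14 Bristol,20; Cardiff,25"

# Pass 1: drop any spaces around the existing semi-colons
step_a = re.sub(r" *; *", ";", sample)

# Pass 2: a space straight after a digit now marks a missing separator
step_b = re.sub(r"(\d) ", r"\1;", step_a)

print(step_a)
print(step_b)
```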


<b>Step 5: Exchange semi-colons for newlines</b><br>We will also convert our semi-colons to newlines to indicate the start of a new row (in a spreadsheet).

In [13]:
# Swap semi-colons for new line (\n character)
data = re.sub(";", "\n", data)
print(data)

Birkenhead,16
Birmingham,15
Blackburn,16
Bolton,18
Bradford,16
Brighton,14
Bristol,20
Cardiff,25
Derby,12
Halifax,20
Biddersfield,21
Hull,19
Leeds,22
Leicester,18
London,17
Manchester,15
Norwich,24
Nottingham,21
Oldham,18
Plymouth,19
Portsmouth,15
Preston,23
Salford,14
Sheffield,16
Sunderland,18
Wolverhampton,30.


<b>Step 6: Export to CSV!</b><br>Our data is now tidy and consistent, and is therefore ready to export as CSV (so that it can be analysed with Excel, Google Sheets etc.). You do not need to do anything here, but if you're working in your own Jupyter notebook, you can run this cell to create a .csv file (it will be stored in the same folder as the notebook and entitled 'mortalitydata.csv').

In [14]:
with open('mortalitydata.csv', 'w') as out:   # here we are writing and saving a 'mortalitydata' .csv file
    out.write(data)
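
To check that the tidied text really behaves like CSV, it can be parsed with Python's built-in `csv` module. The sketch below uses a shortened stand-in for `data` and reads it from memory rather than from the saved file. Removing the final full stop is an extra step assumed here, not part of the workshop steps above; without it, converting '30.' to an integer would fail.

```python
import csv
import io

# A shortened stand-in for the tidied data
data = "Birkenhead,16\nBirmingham,15\nWolverhampton,30."

# Strip the full stop left over from the article's final sentence
clean = data.rstrip(".")

# Parse the text as CSV without touching the file system
rows = list(csv.reader(io.StringIO(clean)))

# Convert each rate from text to an integer
rates = {town: int(rate) for town, rate in rows}
print(rates)
```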

<h2>Workshop summary<br></h2>



![regexmeme3.jpg](attachment:regexmeme3.jpg)


Thank you for joining this workshop! We have covered the following:

- What is Regex? Where can we use it?
- Literals
- Alternation
- Character classes (incl. negated character classes)
- Ranges
- Shorthand classes (incl. negated shorthand character classes)
- Quantifiers
- Capture groups
- Backreferences

Further suggested topics:
- Non-capture groups
- Anchors
- Lookaheads and lookbehinds

<h2>Rights and authorship</h2><br>
This notebook was produced by Dr. Grace Di Méo for the workshop 'Start...Searching, Manipulating and Processing Text with REGEX', held in March 2023 and organised by Southampton Digital Humanities. 

This notebook is released under a CC-BY license.