<div style="text-align: center;" >
<h1 style="margin-top: 0.2em; margin-bottom: 0.1em;">Regular Expressions</h1>
<h4 style="margin-top: 0.7em; margin-bottom: 0.3em; font-style:italic">Why, how, and what the $@!* are they?</h4>
</div>
<br>

***If you're looking at this notebook online (GitHub), the images won't load correctly. To view them, please download/pull the notebook along with the 'sources' folder and view the notebook on your local machine.***

In [2]:
import re
import numpy as np

## Extracting Text

Sometimes when working with webscraping or text processing (to name but a few topics where regexes come in handy) we would like to extract some specific text from a larger corpus of text data. Imagine for example a data set of posts where you would like to extract all email addresses in an automated fashion without having to go through each post yourself.<br>
Since we cannot just type 'get me all email addresses' and have Python do the rest, we need to come up with a more sophisticated solution. That's where regexes come into play. Using regexes we can instruct our machine to look for certain patterns inside our data (e.g. the data set of posts) and extract all passages that match these patterns. Looking at our data set for example we could try to find a pattern that includes all email addresses but nothing else and pass that as an instruction to our machine.<br>
Since such patterns can be quite complex and coming up with a solution yourself can be taxing, let's first go over some basics and easy examples to get you started.

In [14]:
string = 'Today is a wonderful day. Temperature lies at around 23°C or 75°F. For more info refer to www.weather.icss or contact us at weather@icss.icss.'

### Literal Characters

Literal characters are, well, just that... literal characters. There's not really much more to say about them. Of course, there are some applications where they come in handy but most of the time a regular expression gets the job done as well and might also be more useful. Still... they have their uses and one of the most relevant benefits of literal characters is definitely that they are EASY!

In [15]:
re.search('Today', string)

<re.Match object; span=(0, 5), match='Today'>

In [16]:
re.search('today', string)

Suppose we want to extract the temperature from our string. We want to do so in one go, meaning that we want to retrieve the temperature in °C as well as °F at the same time. Using literal characters we will reach an impasse since the two temperature measures differ from each other.

In [17]:
re.search('°', string)

<re.Match object; span=(55, 56), match='°'>

First of all we face the problem that [`re.search`](https://docs.python.org/3/library/re.html) stops at the first match. It scans the string, returns the first match it finds, and does not look any further, so all later matches go unreported.<br >
So we need to use another method to achieve our goal. Here let's try the [`re.findall`](https://docs.python.org/3/library/re.html) method.<br >
In case you would like to look up all methods available refer to the `re` library's [documentation](https://docs.python.org/3/library/re.html).

In [None]:
re.findall('°', string)

Ok, so now we get two '°'-character matches. The [`re.findall`](https://docs.python.org/3/library/re.html) method, unlike [`re.search`](https://docs.python.org/3/library/re.html), does not stop at the first match but returns every non-overlapping match in our string. Unlike [`re.search`](https://docs.python.org/3/library/re.html), though, it does not return the position of each match inside the string but just a list containing the matches as strings.<br >
Let's see if we can retrieve both temperature measures this way.

In [18]:
re.findall('23°', string)

['23°']

In [None]:
re.findall('75°', string)

Still no luck. The problem is that we are using literal characters. We are literally searching for '23°' and '75°' in our string. Since these two strings are not identical, we only get one temperature measure per query as result. We need some way to leave literal characters behind and go one step more abstract. We want to tell the machine: 'Find every sub string in our string that refers to a temperature measurement'. For this we need regexes.
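As a short aside on the difference between `re.search` and `re.findall` described above: if you need every match *and* its position, `re.finditer` is one option. The following is just a minimal sketch using the same `string` as before; the variable names `degree_matches` and `degree_positions` are made up for this illustration.

```python
import re

string = 'Today is a wonderful day. Temperature lies at around 23°C or 75°F. For more info refer to www.weather.icss or contact us at weather@icss.icss.'

# re.finditer yields one Match object per non-overlapping match
degree_matches = list(re.finditer('°', string))

# Each Match object knows where it was found (span) and what it matched (group)
degree_positions = [(m.span(), m.group()) for m in degree_matches]
print(degree_positions)
```

The first entry has the same span that `re.search` reported above; the second entry is the match that `re.search` never got to.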

### Regular Expressions

Well, I hope you see why we might need something other than literal characters in order to extract the patterns we are looking for sometimes.<br>
Enter regular expressions!<br>

Regular expressions are different compared to literal characters in the way that they do not only specify exactly one specific case but rather a pattern, that matches any case that is related to this pattern. Depending on how we define our regular expression the cases that are matched by our pattern can be more/less abstract.<br>

Let's take a look at a very simple example:

In [None]:
re.findall('[0-9]°', string)

We are slowly getting there. Now we find two matches instead of one in our string. The problem is that we are telling the machine: 'find all mentions of a digit (characters from 0-9) followed by a "°"'. This returns only the last digit before the '°' and the '°' itself. If we were dealing with a winter day where temperatures lie between -9° and +9°, this approach would be sufficient (ignoring the sign). For all other days where the temperature reaches double digits it is not enough. So we need to further specify the type of string(s) we would like to match.<br>
Let's think about how temperatures are commonly represented as text. Most of the time we will find two digits followed by '°' and then either 'C' or 'F' depending on what kind of measurement is used. For the time being let's ignore negative and three-digit temperatures.<br>
So we need to include instructions that make sure that two digits followed by '°' are matched. This can be done using curly braces `{}`. We position the curly braces right after the block where we instruct `re` to match digits (characters from 0-9) `[0-9]`. Inside the curly braces we specify how many characters matching the pattern specified in the square brackets we wish to match. In our case `{2}`.

In [None]:
re.findall('[0-9]{2}°', string)

Perfect! We now match both temperature measures and as a result of our [`re.findall`](https://docs.python.org/3/library/re.html) query we get a list of both measurements. Sadly we cannot tell which of the two measurements is °C and which is °F. So we need to specify that our temperature (two digits followed by '°') is followed by a letter. As with the specification of digits above, we can use square brackets to specify that we would like to match letters. This is done by using the following syntax `[a-z]`. That will match a single occurrence of a letter from the alphabet.

In [None]:
re.findall('[0-9]{2}°[a-z]', string)

Hmmm, what could have gone wrong? We implemented the specification that '°' should be followed by a single letter correctly. The problem is that `re.findall` along with most other `re` methods is case sensitive. Instead of telling `re` to match 'C' and 'F' (both upper case letters) we told `re` to match any letter from a-z but in lower case. To remedy this we simply convert the content of the second square brackets to upper case.

In [None]:
re.findall('[0-9]{2}°[A-Z]', string)

Et voila... We have extracted both temperatures from our string and retained enough information to tell what kind of scale each of the two measurements is using.<br>
Since we of course are quite perfectionist, we do not only want our code to do what it is supposed to do but also want it to look as good as possible. In our case this means that we want our code to be as easily understandable as possible. The two specifications inside our square brackets for example (`[0-9]` and `[A-Z]`) can be coded in a more intuitive manner. This brings us to the topic of so called 'Wildcards'.

#### Wildcards

Wildcards are essentially characters that can be used to substitute some other regex patterns. We can use `[\d]` for example to replace our `[0-9]` pattern, where `\d` stands for any digit character. There are many other wildcards and much more detailed explanations of what exactly they are, so do not take this as a complete definition. The most important wildcards are listed below, but in case you want to dive in deeper check [this](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Cheatsheet) out.

![Character Wildcards](sources/regex_character_wildcards.png)
![Occurrence and Positional Wildcards](sources/regex_additional_wildcards.png)

If we now incorporate the information about wildcards from the above table into our initial `re` query, we can restructure it to look as follows.

In [None]:
re.findall(r'[\d]{2}°[\w]', string)

What we tell `re` to match here is pretty much the same as above: 'Match all sub strings where two digits are followed by "°" and a single word character (upper/lower case letters, digits, and the underscore `_`)'.<br>
Notice that here we do not use `\W` (an upper case letter) but stick with a lower case `w`. This is due to the fact that wildcards work a little differently than literal patterns. `\W` for example does not match upper case letters but is the exact negation of `\w`. This means that it will match anything not matched by `\w`, that is, any character except upper/lower case letters, digits, and the underscore.
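To make the relationship between `\w` and `\W` a bit more tangible, here is a small sketch. The string `sample` is a made-up example and not part of the notebook's data; it just contains a letter, a digit, an underscore, a space, and a '°'.

```python
import re

sample = 'snake_case 42°C'  # hypothetical example string

# \w matches letters, digits, and the underscore
word_chars = re.findall(r'\w', sample)

# \W matches everything that \w does not match
non_word_chars = re.findall(r'\W', sample)

print(word_chars)
print(non_word_chars)
```

Every character of `sample` ends up in exactly one of the two lists, which is what "exact negation" means here.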

Let's have a look at a more complex string in order to elaborate a bit further on wildcards. For this we define a new string.

In [4]:
difficult_string = '''The Flaming Mountains are said to be one of, if not, the hottest places on earth, with temperatures of up to +80°C in summer. 
The Eastern Antarctic Plateau on the other hand is one of the coldest places on earth, reaching up to -94°C at times. Still, this seems like peanuts when compared to absolute zero (-273.15°C), 
not to be confused with 0°C.
'''

We now face a more difficult task. Not only do we have measurements above and below zero but there is also one measurement containing decimal places. To correctly extract those temperatures we need to 'upgrade' our regex.

In [None]:
re.findall(r'[\d]{2}°[\w]', difficult_string)

We see that our previous regex is pretty useless. We lose the information about whether a measurement is above or below 0°C, we also get a wrong match for absolute zero (-273.15°C), and '0°C' is not matched at all. Let's try to improve our use of wildcards. First we try to match temperatures no matter how many digits they have.

In [None]:
re.findall(r'[\d]+°[\w]', difficult_string)

Instead of specifying the number of digits we want to match using curly braces (`{2}`) we now use the `+` wildcard. This signifies that we want to match each occurrence of one or more digit(s) followed by '°' and a letter. As you can see, this matches '0°C' as well as the other temperatures. Still, absolute zero is not matched correctly. The problem here is that absolute zero includes a decimal point ('.'). This is not matched by our regex so far.<br>
To match something like that we will need to introduce a conditional statement. We want to tell `re`: 'Match anything where a number (single/double/triple digit) is either followed by a decimal point, another number and then "°"... or directly followed by "°"...'.

In [None]:
# short hands on



As you can see, the regex is getting quite complicated by now. Imagine what you have to do to extract phone numbers with different country codes or email addresses from different providers...<br>
Let's break it down. The regex can be broken into two parts. First `[\d]+°[\w]` and second `[\d]+.[\d]+°[\w]`, which are divided by `|`. The pipe `|` signals an or-statement. So we can translate the expression into: 'match either one or more digits followed by "°"... or match one or more digits followed by a dot (decimal point), one or more digits, and "°"...'.<br>
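Put together, the two parts described above can be sketched as follows. This simply joins the two sub-patterns from the text with `|`; the variable name `temperatures` is only chosen for this illustration, and `difficult_string` is repeated so the snippet runs on its own.

```python
import re

difficult_string = '''The Flaming Mountains are said to be one of, if not, the hottest places on earth, with temperatures of up to +80°C in summer.
The Eastern Antarctic Plateau on the other hand is one of the coldest places on earth, reaching up to -94°C at times. Still, this seems like peanuts when compared to absolute zero (-273.15°C),
not to be confused with 0°C.
'''

# Left of the pipe: digits directly followed by '°' and a word character
# Right of the pipe: digits, any single character, digits, '°', and a word character
temperatures = re.findall(r'[\d]+°[\w]|[\d]+.[\d]+°[\w]', difficult_string)
print(temperatures)
```

At each position `re` tries the part left of the pipe first and only falls back to the right-hand part if the left one fails, which is why absolute zero is now picked up with its decimal places.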
It is confusing in the beginning, yes, but sadly there is no 'easy' way. You will have to struggle through some regexes to get the hang of them. In case you would like to do some quick testing try [regex101.com](https://regex101.com/).<br>
Now we still do not have the sign for the respective temperatures. Let's see how you would deal with the problem. Try to modify the regex in such a way that the sign (+/-) in front of the temperatures is also matched:

In [None]:
# Modify the regex from the code cell above


In [None]:
# Solution in the 'solved' notebook

Alright! I'm sure you did an awesome job and everything works perfectly!<br>
In case you are looking at the version of the notebook without solutions just refer to the one containing the model solutions if you need a refresher or tip.

Now that we have dealt with the `\w` and `\d` wildcards let's move on to another one: the `.`. The `.` is probably the most relaxed wildcard you will find. Relaxed in the sense of: it will match every character there is (except, by default, a line break). So let's say you are looking for the sign of the temperatures in our example above. You might have modified the regex in such a way that it matches either a `+` or a `-`. This is of course perfectly valid. It is probably the most precise solution since it will not match anything other than a `+` or a `-`.<br>
In some cases though it is not as easy as in our example above. Sometimes you do not know what character you want to match or you are just a bit lazy and don't care to specify each exact scenario. In that case the `.` wildcard really shines. Let's just look at an example:<br>

In [3]:
re.findall(r'.[\d]+°.|.[\d]+\.[\d]+°.', difficult_string)


There are multiple things happening here. First of all we include a `.` to match any sign that might precede our temperatures. Then we specify like above that we want to match one or more digits followed by `°` and then a `.` again to match any kind of scale there might be. Who knows, maybe there are some fancy scales in use somewhere like `°%` or whatever. In the second part we do the same thing again. First comes the sign, then one or more digits followed by a decimal point (note that something has changed here though... more on it in a sec), then we look for more digits followed by `°` and finally any kind of scale symbol.<br>
Now... the part looking for the decimal point has changed. Do you spot it?<br>
Instead of just using `.` which would match anything we use `\.`. This is a special mechanic called `escaping`. Let me show you an example before we go into detail what exactly escaping is and what it is used for.

In [None]:
toy_string = 'The scientific subject 13G8° was reported to withstand temperatures of up to 99.54°C.'

Using the `.` wildcard, we can create a pretty simple regex to match the temperature.

In [None]:
re.findall(r'[\d]+.[\d]+°.', toy_string)

The regex matches the temperature but it also matches the test subject `13G8°`. That is of course not what we want. We need a way to match only the literal character `.` but how do we do that if `.` is a wildcard and matches every character?<br>
Well, we can use escaping. The escape character `\` (backslash) basically does nothing else than to disable the wildcard function of a given wildcard. Instead of matching all characters `\.` will only match the literal character `.`. The same goes for `\+`, `\$`, `\^`, and all others. Putting the backslash in front of a given wildcard character will enable you to match the literal character. Note that `\d` or `\w` are different. Here the `d` and `w` are not wildcards themselves but are turned into such by the preceding `\`.<br>
So let's put that to the test:

In [None]:
re.findall(r'[\d]+\.[\d]+°.', toy_string)

Voilà... See that we only changed one character in the whole regex. Ok, we didn't change it but inserted it: the `\` in front of the `.` that is meant to match the decimal point. The result is a perfect match of our temperature without the nuisance of also matching the test subject 13G8°.<br>
Of course you will have noticed that the examples here are kind of toy examples. They are pretty simple and you might not come into situations where there is a conveniently fancy-named, totally real, test subject 13G8° coincidentally appearing in your text just to mess up your regex(es). Still, all the stuff being explained here will have its uses at some point, provided you ever need to work with regexes, and you definitely will if you plan to finish ICSS.
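If you want to search for a longer piece of literal text that happens to contain wildcard characters, escaping every one of them by hand gets tedious. The standard library offers `re.escape` for that. The following is only a sketch: `literal` and `lookalike` are names and values made up for this illustration, while `string` is the weather string from the beginning of the notebook.

```python
import re

string = 'Today is a wonderful day. Temperature lies at around 23°C or 75°F. For more info refer to www.weather.icss or contact us at weather@icss.icss.'

literal = 'www.weather.icss'           # text we want to find exactly as written
lookalike = 'wwwXweatherYicss'         # hypothetical string without any dots

# re.escape puts a backslash in front of characters with a special regex meaning
escaped_pattern = re.escape(literal)

print(re.findall(escaped_pattern, string))     # finds the URL in our string
print(re.findall(literal, lookalike))          # unescaped dots match 'X' and 'Y'
print(re.findall(escaped_pattern, lookalike))  # escaped dots only match real dots
```

The unescaped pattern happily matches the lookalike because each `.` stands for any character, while the escaped version does not.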

I have already hinted at the `^` and `$` wildcards above but we are yet to see what they can do. First a quick explanation. The `^` wildcard signifies that you are looking for the start of a string while the `$` is the exact opposite and looks for the end of a string.<br>
This is super helpful if you want to pluck some sub string out of a greater whole. Let's look at a string, in this case a 'normal' sentence, and let's extract the first word and find out whether the sentence is a question, an exclamation, or just a statement.

In [4]:
sentence_string = 'Hey, how are you doing?'

re.findall(r'^[\w]+', sentence_string)

['Hey']

In [5]:
re.findall(r'.$', sentence_string)

['?']

We see that the sentence is a question ending in `?` and that the first word is `Hey`. That by itself is not very impressive but try leaving out the `^` for example and see how your output changes...

In [6]:
re.findall(r'[\w]+', sentence_string)

['Hey', 'how', 'are', 'you', 'doing']

Still not super impressive but we are building up to something a little more advanced. If you have been able to follow along with ease so far, it's going to get a bit more challenging, and in case you were struggling until now, try to stay focused. The next sub chapter is definitely important since it makes a lot of stuff possible. And don't be discouraged... everything is documented here and you can revisit it whenever you feel the need.

### Some more Advanced Stuff

Now that you have learned the basics of regexes like wildcards and escaping, it is time to look into some more advanced stuff.

#### Lookarounds

Sometimes you want to match a sub string that is followed or preceded by something. In that case this and the next sub chapter are exactly what you need. First we will have a look at the so called lookahead.<br>

##### Lookahead

Generally speaking the syntax for this is **X(?=Y)**. You are looking for **X** which is ***followed*** by **Y**. So `.(?=$)`, using the example from above, would be the lookahead equivalent of finding the last character of your string.

In [7]:
re.findall(r'.(?=$)', sentence_string)

['?']

Now, for a more social sciency example:<br>
Have a look at the following email address string. Try to extract the 'name' of the sender. So, try to match the part of the whole email address that precedes the `@` character. For that task you could ignore the lookahead and use some other way of getting the result but try to achieve it with a lookahead.

In [11]:
email_string = 'nospam@forsure.to'

In [13]:
# Get the 'nospam' sub string preceding the '@' character using a lookahead
re.findall(r'[^@]+(?=@)',email_string)

['nospam']

In [None]:
# Solution in the 'solved' notebook

If you need a little help just have a look at the notebook containing the solutions. In case you have some questions about it let me know ;)

##### Lookbehind

Same as with the lookahead it is also possible to look for some sub string that immediately ***follows*** a character or pattern. This is called a lookbehind. The syntax for this follows the pattern of **(?<=Y)X**. Again, **X** denotes the sub string you are actually looking for and **Y** signifies the sub string that **X** ***follows***. So, like with the example of matching the last character of our sentence, we can use the lookbehind to find the first word of our sentence with `(?<=^)[\w]+`. Translated into human language this means: 'Give me one or more word characters directly following each other (so no whitespace) but only those word characters that directly follow the start (`^`) of the string'.

In [9]:
re.findall(r'(?<=^)[\w]+', sentence_string)

['Hey']

Again, I would like to ask you to try it out yourself. Try and match the domain name (the part after the `@` character but without the domain suffix `.to`). So, basically try to match `forsure` without anything else.

In [None]:
# Match 'forsure' using a lookbehind

In [None]:
# Solution in the 'solved' notebook

The cool part about lookarounds is that they also work if you have a list of email addresses.

In [None]:
emails_string = 'bucksinyoursleep@money.cc, girlfriendshatethistrick@omg.xx, yougothacked@WARNING.oo, bucksinyoursleep@money.cc'

re.findall(r'(?<=@)[\w]+', emails_string)

You see that the output gives you all domain names for the fake email addresses. You can also use the lookarounds to extract all the user names of the different senders. As you can see, one of them appears twice. Using [`numpy.unique()`](https://numpy.org/doc/stable/reference/generated/numpy.unique.html) we can remove all duplicate entries and get a sorted array of all unique user names from our list of fake email addresses. That is just a little example of what you can do with it and I have had to do stuff like that multiple times: extracting Twitter handles of mentioned users, counting their appearances, or just listing all unique users...

In [None]:
user_names = re.findall(r'[\w]+(?=@)', emails_string)
print(user_names)
print(np.unique(user_names))
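The 'counting their appearances' use case mentioned above could be sketched with `collections.Counter` from the standard library. This is just one possible way to do it, not necessarily the approach used elsewhere in the course; `user_counts` is a name chosen for this illustration.

```python
import re
from collections import Counter

emails_string = 'bucksinyoursleep@money.cc, girlfriendshatethistrick@omg.xx, yougothacked@WARNING.oo, bucksinyoursleep@money.cc'

# Same lookahead as above: word characters that are followed by '@'
user_names = re.findall(r'[\w]+(?=@)', emails_string)

# Counter maps each distinct user name to the number of times it appears
user_counts = Counter(user_names)
print(user_counts.most_common())
```

`most_common()` returns the user names ordered from most to least frequent, so the sender that appears twice comes first.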

You might have noticed that up until now we have used only single characters or the start of a string as Y in our lookarounds. That is, of course, fine and might get you through some of the troubles ahead but you can also use the lookarounds to match something preceded or followed by a whole expression.<br>
You might have a data set of some email addresses where you are especially interested in those addresses of a specific domain. Maybe you have some data set containing short instant messages sent by some users you can identify by their email address. Look at the following example:

In [None]:
instant_messages_string = 'joy@mistery.com: I love your style today!; stern@business.com: Numbers are plummiting!; mad@business.com: We need to act right now!; ...'

Here you have one message sent from the `mistery` domain and two messages that were sent by users of the `business` domain. Maybe we are researching communications between employees of the same company. They might be corresponding with each other using their work email addresses, which are hosted by the same domain (like for example with students of the University of Constance... we all have the iconic ...@uni-konstanz.de email). You can use a lookbehind to extract the messages sent by those users:

In [None]:
re.findall(r'(?<=@business.com:)[a-zA-Z\s!?\.]*', instant_messages_string)

Let's have a closer look at the code above. First we define the lookbehind. We want `[a-zA-Z\s!?\.]*` that follows `@business.com:`. Now what does the stuff after the lookbehind mean? Try to think about it for a moment and then we will take a look at the solution.

Well, `[a-zA-Z\s!?\.]*` actually means: match all lower case and upper case letters (`a-zA-Z`); additionally we would like to match `\s`, which denotes white space characters, and finally we also want to match the punctuation at the end of a sentence, `!?\.`. See how we escaped the `.` at the end of the expression to match the literal `.` character. Last but not least we use the asterisk `*` to ensure that we match any number of the specified characters, including 0.<br>
Can you think of some situations where this regex might fail or something that does not look very nice at the moment?

In [None]:
# Give me some input


Let's say we have some texts with punctuation, specifically commas, inside our instant messages. The regex above will not match those. How might we solve that problem?<br>
You might have noticed that there is some unnecessary whitespace (whitespace that does not separate words, for example) at the start of our output strings (`'_Numbers are plummiting!'`). That does not look nice and it might cause some problems further down the line, but there is a simple way of removing unnecessary whitespace. Just use the [`str.strip()`](https://www.w3schools.com/python/ref_string_strip.asp) method on your string.

In [None]:
re.findall(r'(?<=@business.com:)[a-zA-Z\s!?\.]*', instant_messages_string)[0].strip()

In [None]:
# A bit simplified
messages = re.findall(r'(?<=@business.com:)[a-zA-Z\s!?\.]*', instant_messages_string) # assign the messages to an object

first_message = messages[0] # assign the first message to an object using list indices (0 for first element in list)

first_message.strip() # apply the str.strip() method to the first message of our instant_messages_string

In [None]:
# If you want to apply it to each message in the string
for i in messages: # loop over messages (for each i (=item) in messages)
    print(i.strip()) # print i(=item).strip()
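Coming back to the question about commas raised above: one possible approach (certainly not the only one) is to stop listing allowed characters and instead match everything that is *not* the separator. The sketch below assumes that `;` only ever appears between messages, which holds for our toy data but may not hold for real data. The string `comma_messages_string` is a made-up variation of the example above.

```python
import re

# Hypothetical variation of the instant messages, now with commas inside the texts
comma_messages_string = 'stern@business.com: Numbers, sadly, are down!; mad@business.com: Act now.; joy@mistery.com: Hi!'

# [^;]* matches any run of characters that are not a semicolon.
# The dot in the lookbehind is escaped so it only matches a literal '.'
raw_messages = re.findall(r'(?<=@business\.com:)[^;]*', comma_messages_string)

# Strip the leading whitespace from every message in one go
clean_messages = [message.strip() for message in raw_messages]
print(clean_messages)
```

The list comprehension does the same job as the loop above, it just collects the stripped messages in a new list instead of printing them one by one.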

#### Negative Lookarounds

Now, there is also something called negative lookarounds. They work a bit differently than the normal lookarounds.

##### Negative Lookahead

We start with the so called negative lookahead. Basically, it lets you match something as long as it is ***not followed*** by some specific expression. The syntax goes like this: **X(?!Y)**. It means you want to match the sub string **X** as long as it is ***not followed*** by sub string **Y**. Using the email addresses from above again take a look at the following example:

In [None]:
emails_string = 'bucksinyoursleep@money.cc, girlfriendshatethistrick@omg.xx, yougothacked@WARNING.oo, bucksinyoursleep@money.cc'

re.findall(r'[\w]+@(?!money.cc)', emails_string)

We are effectively trying to match each user name (including `@`) that is not followed by `money.cc`. So we are extracting all user names of emails not belonging to the `money` domain. Think about the effect of moving the position of the `@` character in our regex into the **Y** part of our negative lookahead (`[\w]+(?!@money.cc)`).<br>
What might happen?

In [None]:
# What happens if we move the '@' symbol in our regex
#re.findall(r'[\w]+(?!@money.cc)', emails_string)
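In case you would rather read the result than run the cell, here is a sketch of what happens. The key point is backtracking: when `[\w]+` has grabbed `bucksinyoursleep` and the negative lookahead fails, `re` simply gives back one character and tries again. The variable name `moved_at_matches` is only used for this illustration.

```python
import re

emails_string = 'bucksinyoursleep@money.cc, girlfriendshatethistrick@omg.xx, yougothacked@WARNING.oo, bucksinyoursleep@money.cc'

# '@' now sits inside the negative lookahead instead of in front of it
moved_at_matches = re.findall(r'[\w]+(?!@money.cc)', emails_string)
print(moved_at_matches)
```

Instead of skipping the `money` addresses, the regex returns the truncated `bucksinyourslee` (which is followed by `p`, not by `@money.cc`) and, since nothing ties the match to an `@` anymore, it also returns every domain name and suffix.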

So you see that in this specific case it is a bit easier to leave the `@` character to appear in the output. Sometimes you might have to make a choice like that. What is easier? Coming up with a regex that does everything exactly the way you want it to or might it be faster to write a passable regex and remove unwanted sub strings (like the white spaces above or the `@` character in this example) afterwards.<br>
In this case for example we can get rid of the `@` character very easily using the [`str.replace()`](https://www.w3schools.com/python/ref_string_replace.asp) method:

In [None]:
user_names_unclean = re.findall(r'[\w]+@(?!money.cc)', emails_string) # assign the 'unclean' usernames to an object

for i in user_names_unclean: # loop over the user_names_unclean object
    print(i.replace('@', '')) # use the str.replace() method to replace the '@' character with nothing ''

##### Negative Lookbehind

The negative lookbehind is to the lookbehind what the negative lookahead is to the standard lookahead. We are basically trying to match everything that is ***not preceded*** by some specific sub string. The syntax looks as follows: **(?<!Y)X**. Here we want to match **X** as long as it is ***not preceded*** by **Y**. Let's say for example that we are interested in the suffixes (`.com, .de, ...`) of our email domains but we are a bit picky and do not want the suffix of a specific domain.

In [None]:
re.findall(r'(?<!money)\.[\w]{2,3}', emails_string)

You can see that `.cc` is not part of our result. That would be the suffix of our fictitious `money` domain. Before you go on try to explain to yourself what exactly the regex in the code cell above does.

In [None]:
# What does the regex do?


We are matching every occurrence of a `.` immediately followed by either 2 or 3 word characters (`[\w]`: letters, digits, or the underscore) but we are only doing so as long as the suffix is not preceded by `money`. We are using 2 or 3 because there are some suffixes like `.de` that are only two characters long while others like `.com` span three characters.<br>
Think about what would happen if we moved the `\.` into the **Y** part of our negative lookbehind (`(?<!money\.)[\w]{2,3}`).

In [None]:
# What happens if we move '\.' into the negative lookbehind
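If you want to check your prediction, here is a sketch on a shortened, made-up string (`short_emails_string` is not part of the notebook's data) so that the output stays readable.

```python
import re

short_emails_string = 'a@money.cc, b@omg.xx'  # hypothetical, shortened example

# '\.' now sits inside the negative lookbehind
moved_dot_matches = re.findall(r'(?<!money\.)[\w]{2,3}', short_emails_string)
print(moved_dot_matches)
```

The suffix `cc` is still excluded because it is preceded by `money.`, but without the `\.` in front of `[\w]{2,3}` nothing anchors the match to a suffix anymore, so chunks of the domain names are returned as well.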


Again, it is a bit easier to phrase the regex like above even though we receive output where each suffix is preceded by a `.`. Actually, in this case that might even be the desired outcome. I mean with the suffix it kind of makes sense to include the `.`.<br>
In case you don't want that you can just remove it in an additional step using [`str.replace()`](https://www.w3schools.com/python/ref_string_replace.asp). Or you could rework your regex of course... be my guest!

In [None]:
suffixes_unclean = re.findall(r'(?<!money)\.[\w]{2,3}', emails_string) # assign the 'unclean' suffixes to an object

for i in suffixes_unclean: # loop over the suffixes_unclean object
    print(i.replace('.', '')) # use the str.replace() method to replace the '.' character with nothing ''

### Work-Out Area

Now that you have made it through all the regex stuff above, let's put your skills to the test using a real world example you might even run into during your Social and Economic Data Science studies.<br>

Remember that you can use [regex101.com](https://regex101.com/) in case you wanna try out some regex stuff and get some feedback on why exactly the defined regex works like it does.<br>

Ok, take a look at the following data. You are looking at a sample of tweets posted during the CDC Whistleblower campaign in the USA.

In [None]:
with open('sources/tweets.txt', 'r', encoding="utf-8") as f:
    data = f.read()
# no explicit f.close() needed: the with-block closes the file automatically

In [None]:
data

#### Tasks

- Create a list of user handles mentioned in the data
- Go on and create a list that contains the different urls appearing in the data
- Last of all let's have a look at the hashtags. Create a list containing those

As you might have noticed, a lot of `|` characters appear in the data. I added those to make it easier for you to tell the individual tweets apart from each other. You do not necessarily need them to solve the tasks.

In [None]:
# List of user handles


In [None]:
# List of urls


In [None]:
# List of hashtags
