## Intro to Regex
There's way more you can do with regular expressions than what's shown here, but this should serve as an intro.
Regex is all about matching text patterns quickly and concisely. We can use regexes in many of the string methods listed above.
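As a minimal sketch of regexes inside ordinary string functions, here are a few Base Julia calls that accept a `Regex`. The sample string `s` is made up for illustration, and the `+` in the last pattern ("one or more") is explained further down.

```julia
s = "Here's some numbers: 483 4318"

# occursin: does the pattern appear anywhere in the string?
has_digit = occursin(r"[0-9]", s)           # true

# replace: swap every match for something else
masked = replace(s, r"[0-9]" => "#")        # "Here's some numbers: ### ####"

# match: returns a RegexMatch (or nothing); .match holds the matched text
m = match(r"[0-9]+", s)
first_number = m.match                      # "483"

println(has_digit)
println(masked)
println(first_number)
```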


In [2]:
string2 = "Here's some numbers: 483 4318 1593 053 4942 120 503 294. That sure is a lot of numbers"


"Here's some numbers: 483 4318 1593 053 4942 120 503 294. That sure is a lot of numbers"

In [3]:
# We create a regex literal by putting the letter 'r' right before a string
# Square brackets form a character class: it matches any ONE of the characters inside them
regex = r"[eyz]"

println("type = $(typeof(regex))")
println("first instance of e, y, or z in string is $(findfirst(regex, string2))")


type = Regex
first instance of e, y, or z in string is 2:2
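To make the "any one of these characters" idea concrete, here's a small comparison (the second input, "xyz", is just an illustrative string): a bracketed class matches a single character, while the same letters without brackets must appear as an exact sequence.

```julia
string2 = "Here's some numbers: 483 4318 1593 053 4942 120 503 294. That sure is a lot of numbers"

# [eyz]: any ONE of e, y, or z
class_hit = findfirst(r"[eyz]", string2)     # 2:2, the 'e' in "Here's"

# eyz: the literal three-character sequence, which never occurs here
sequence_hit = findfirst(r"eyz", string2)    # nothing

# the class matches whichever listed character comes first
y_hit = findfirst(r"[eyz]", "xyz")           # 2:2, the 'y'

println(class_hit, " ", sequence_hit, " ", y_hit)
```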


In [4]:
# Specify a range of characters with a hyphen, e.g. [a-zA-Z]. You can adjust the range as you wish, like [a-e]
println(findfirst(r"[a-zA-Z]", string2))

# this prints the first occurrence of any lowercase character followed by 'm'
println(findfirst(r"[a-z][m]", string2))


1:1
9:10
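A few small follow-ups, sketched with a made-up input where noted: ranges are case-sensitive, a class holding a single character like [m] behaves exactly like the bare character, and a narrower range such as [a-e] only accepts those letters.

```julia
string2 = "Here's some numbers: 483 4318 1593 053 4942 120 503 294. That sure is a lot of numbers"

# [a-z] is lowercase only, so the uppercase 'H' at index 1 is skipped
lower_only = findfirst(r"[a-z]", string2)      # 2:2

# [m] contains only 'm', so r"[a-z]m" finds the same match as r"[a-z][m]"
with_class = findfirst(r"[a-z][m]", string2)   # 9:10, "om" in "some"
without_class = findfirst(r"[a-z]m", string2)  # 9:10 as well

# [a-e] accepts only the letters a through e ("abcxyz" is an illustrative input)
narrow = findall(r"[a-e]", "abcxyz")           # [1:1, 2:2, 3:3]

println(lower_only)
println(with_class == without_class)
println(narrow)
```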


In [5]:
# It works with numbers too! This finds an '8' followed by any two digits. Notice how it skips over the '8' in "483" and "4318", since neither is followed by two more digits
string2 = "Here's some numbers: 483 4318 1593 0853 49442 120 503 41294 544444. That sure is a lot of numbers"

result = findfirst(r"8[0-9][0-9]", string2)
println(result)
println(string2[result])

37:39
853
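The same search can be done in one step with `match`, which returns a `RegexMatch` holding both the matched text and its starting index. A sketch, reusing the cell's `string2`:

```julia
string2 = "Here's some numbers: 483 4318 1593 0853 49442 120 503 41294 544444. That sure is a lot of numbers"

m = match(r"8[0-9][0-9]", string2)

# match returns nothing when there is no match, so check before using it
if m !== nothing
    println(m.match)    # the matched text: "853"
    println(m.offset)   # index of the first matched character: 37
end
```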


In [10]:
# The "." character matches any single character (except a newline, by default), so if you want to search for an actual period, you have to escape it
println(findall(r"..e.s", string2))

# Escape it with a backslash: r"\." matches only a literal period
period_range = findfirst(r"\.", string2)
println(period_range)
println("match found: " * string2[period_range])


UnitRange{Int64}[2:6, 15:19, 93:97]
67:67
match found: .
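A tiny side-by-side, using a made-up version string, shows why the escape matters: an unescaped dot happily matches the first character it sees.

```julia
version = "v1.2"

any_char = findfirst(r".", version)      # 1:1, "." matched the 'v'
literal_dot = findfirst(r"\.", version)  # 3:3, only the actual period

println(any_char, " ", literal_dot)

# the same idea with findall: count the periods in a sentence
periods = length(findall(r"\.", "One. Two. Three."))   # 3
println(periods)
```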


In [11]:
println(string2)

# curly braces {} specify how many times the preceding element must repeat

# for example, this is meant to grab every 5-digit number that starts with '4'
results = findall(r"4[0-9]{4}", string2)
for r in results
    println(string2[r])
end

Here's some numbers: 483 4318 1593 0853 49442 120 503 41294 544444. That sure is a lot of numbers
49442
41294
44444
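Besides an exact count like {4}, braces also take a range: {n,m} means between n and m repetitions and {n,} means at least n. A sketch on an illustrative string of 4s:

```julia
fours = "4 44 444 4444"

# between 2 and 3 '4's; matching is greedy and non-overlapping
two_to_three = findall(r"4{2,3}", fours)   # [3:4, 6:8, 10:12]

# 2 or more '4's
two_or_more = findall(r"4{2,}", fours)     # [3:4, 6:8, 10:13]

for r in two_or_more
    println(fours[r])
end
```

Note that the lone trailing '4' of "4444" is left over in the first search: the greedy {2,3} already used three of them.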


In [8]:
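# Uh oh! That also picked up the run of '4's inside 544444
# A ^ placed first inside square brackets negates the class, so [^0-9] matches any character that is NOT a digit
# Requiring a non-digit before and after excludes matches that are part of a longer number (the surrounding characters become part of the match)

results = findall(r"[^0-9]4[0-9]{4}[^0-9]", string2)
for r in results
    println(string2[r])
end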

 49442 
 41294 
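The [^0-9] trick has two side effects: the neighbouring characters end up inside the match, and a number at the very start or end of the string is missed because there is no character there to match. An alternative sketch swaps in \b, PCRE's word-boundary assertion (not used elsewhere in this tutorial), which checks for a boundary without consuming any characters:

```julia
string2 = "Here's some numbers: 483 4318 1593 0853 49442 120 503 41294 544444. That sure is a lot of numbers"

# \b matches the position between a word character and a non-word character,
# so it rejects the 4s inside 544444 without including the spaces in the match
bounded = findall(r"\b4[0-9]{4}\b", string2)
for r in bounded
    println(string2[r])   # 49442, then 41294 (no surrounding spaces)
end

# a lone number has nothing before or after it, which defeats [^0-9] but not \b
lone = "41294"
with_not = findall(r"[^0-9]4[0-9]{4}[^0-9]", lone)   # empty
with_boundary = findall(r"\b4[0-9]{4}\b", lone)      # [1:5]
println(isempty(with_not), " ", with_boundary)
```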


Next up: special tokens such as \s (https://www.pcre.org/current/doc/html/pcre2syntax.html#SEC4), the Kleene star * and its sibling +, and the end-of-string marker $ (https://www.pcre.org/current/doc/html/pcre2syntax.html#SEC9).

In [6]:
smolString = "\n a"

# \s is a special token that matches any single whitespace character [as well as sarcasm]
# why use it instead of an actual space? Because it matches more than just spaces.
# Here's a list of the things it actually looks for:

#=
A space character
A tab character
A carriage return character
A new line character
A vertical tab character
A form feed character
=#


println(findfirst(r"\s", smolString))


1:1
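To answer the "why not just type a space?" question with the cell's own `smolString`: a literal space only finds the space, while \s stops at the newline first. The other test string is made up, and \S (uppercase, meaning any non-whitespace character) is a standard PCRE token not shown in the original cell.

```julia
smolString = "\n a"

space_hit = findfirst(r" ", smolString)       # 2:2, skips past the newline
whitespace_hit = findfirst(r"\s", smolString) # 1:1, the newline counts as whitespace

# tab, space, carriage return and newline are all picked up
all_ws = findall(r"\s", "a\tb c\r\n")         # [2:2, 4:4, 6:6, 7:7]

# \S is the opposite: the first character that is NOT whitespace
first_non_ws = findfirst(r"\S", smolString)   # 3:3, the 'a'

println(space_hit, " ", whitespace_hit, " ", all_ws, " ", first_non_ws)
```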


In [27]:
string2 = "Here's some numbers: 5 4445 483 4318 1593 053 4942 120 503 294. That sure is a lot of numbers"

# "*" matches 0 or more occurrences of the preceding element
# "+" matches 1 or more

range1 = findfirst(r"[4-6]*5", string2)
println("$range1, \"$(string2[range1])\"")
    
range2 = findfirst(r"[4-6]+5", string2)
println("$range2, \"$(string2[range2])\"")



22:22, "5"
24:27, "4445"
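The first result is only "5" because * is happy with zero repetitions: the lone 5 satisfies [4-6]*5 before the search ever reaches 4445. A smaller sketch on an illustrative string makes the difference, and the empty-match gotcha, easier to see:

```julia
s = "5 4445"

star = findfirst(r"[4-6]*5", s)   # 1:1, zero [4-6] characters followed by the lone 5
plus = findfirst(r"[4-6]+5", s)   # 3:6, at least one [4-6] is required before the 5

# with nothing else in the pattern, * can succeed by matching zero characters,
# giving an empty range rather than nothing
empty_hit = findfirst(r"[4-6]*", "abc")

println(star, " ", plus, " ", isempty(empty_hit))
```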


In [43]:
another_string = "ok ok ok"

# the $ character anchors the match to the end of the string
ok_range = findfirst(r"ok", another_string)
println(ok_range)

ok_range = findfirst(r"ok$", another_string)
println(ok_range)


1:2
7:8
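Its counterpart is ^ outside of square brackets, which anchors to the start of the string (inside brackets, as in the [^0-9] example earlier, a leading ^ means "not"). A sketch on the same `another_string`, plus one made-up input:

```julia
another_string = "ok ok ok"

every_ok = findall(r"ok", another_string)    # [1:2, 4:5, 7:8]
first_ok = findall(r"^ok", another_string)   # [1:2], only at the start
last_ok = findall(r"ok$", another_string)    # [7:8], only at the end

# using both anchors demands that the WHOLE string match the pattern
whole_match = occursin(r"^ok$", "ok")              # true
not_whole = occursin(r"^ok$", another_string)      # false

println(every_ok, " ", first_ok, " ", last_ok, " ", whole_match, " ", not_whole)
```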


## That's all! 
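Want to learn more? Here's a link to the full PCRE2 syntax reference for those interested in becoming regex wizards, though this tutorial should be plenty for now:

https://www.pcre.org/current/doc/html/pcre2syntax.html#SEC9

Lastly, regex isn't Julia-specific. Julia's Regex type is built on the PCRE2 library, which is why the links here point to PCRE2 documentation, and other languages like C#, Java, and Python support regular expressions as well, though the exact syntax varies slightly between flavors.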