# Strings in Julia


- Each character of a string is of type Char (not String), so a character cannot be compared directly with a string: 'a' == "a" evaluates to false.

In [13]:
phrase = "This is a string object"

"This is a string object"

In [44]:
typeof(phrase)

String

In [47]:
typeof('a')

Char
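
The distinction between Char and String can be sketched as follows. The variable names `c` and `s` are chosen only for this illustration and do not appear in the original notebook.

```julia
c = 'a'    # single quotes create a Char
s = "a"    # double quotes create a String

char_equals_string = (c == s)          # false: a Char never equals a String
converted_equals   = (string(c) == s)  # true: convert the Char to a String first
element_equals     = (s[1] == c)       # true: indexing a String yields a Char
```

Converting with `string` (or comparing against an indexed element) is one way to make the comparison meaningful.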

In [18]:
length(phrase)

23

You can get a particular character of a string using an index. Indices start at 1.

In [20]:
phrase[1]

'T'
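
As a small sketch of character indexing, the keyword `end` (not used in the original notebook) refers to the last index, which avoids computing the length by hand.

```julia
phrase = "This is a string object"

first_char  = phrase[1]        # 'T': indices start at 1
last_char   = phrase[end]      # 't': `end` is the last index
second_last = phrase[end - 1]  # 'c': arithmetic on `end` is allowed
```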

A substring can be selected using a UnitRange or an array of integers.

In [33]:
typeof(1:5)

UnitRange{Int64}

In [34]:
phrase[1:5]

"This "

In [35]:
phrase[[1,2,3,4,5]]

"This "

### Splitting a String

split(a,b)

- a: String to be split
- b: Char/String to be used as separator (it can be an empty String or a single space)
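
The separator options can be sketched as follows. Two forms here go beyond what the text above lists and should be read as additions: calling `split` without a separator (which splits on whitespace) and the `limit` keyword (which caps the number of pieces).

```julia
by_string = split("lxmls_monitor_team", "_")           # String separator
by_char   = split("lxmls_monitor_team", '_')           # Char separator, same result
on_spaces = split("The house is big")                  # no separator: split on whitespace
first_two = split("lxmls_monitor_team", "_"; limit=2)  # at most two pieces

# split returns SubString values; broadcast String over them to get plain Strings
as_strings = String.(by_char)
```

The pieces are views into the original string, so converting them with `String.` is only needed when an array of String is required.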

In [48]:
split(phrase,"s")

4-element Array{SubString{String},1}:
 "Thi"         
 " i"          
 " a "         
 "tring object"

In [49]:
split("lxmls_monitor_team","_")

3-element Array{SubString{String},1}:
 "lxmls"  
 "monitor"
 "team"   

In [52]:
split("The house is big"," ")

4-element Array{SubString{String},1}:
 "The"  
 "house"
 "is"   
 "big"  

### Joining Strings

join(a,b)

- a: Array of Strings to be joined
- b: String to be inserted between consecutive elements
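
Since `join` is the inverse of `split`, the two can be combined to swap one separator for another. The sketch below also uses the three-argument form of `join`, which is not covered in the original text: its last argument is the separator placed before the final element.

```julia
parts  = split("lxmls_monitor_team", "_")
spaced = join(parts, " ")   # "lxmls monitor team"

# Three-argument form: ", " between elements, " and " before the last one
listed = join(["red", "green", "blue"], ", ", " and ")  # "red, green and blue"
```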


In [57]:
join(["Make a", "single sentence"], " ")

"Make a single sentence"

In [58]:
join(["Make a", "single sentence"], "==")

"Make a==single sentence"

## Regular expressions

#### Regex functions

A regular expression is written as r"regexp", where regexp is the pattern. This creates an object of type Regex.

#### ismatch and match functions

- **ismatch(regex, String)**
    - Checks whether the string contains a match for the regular expression and returns true or false. In Julia 1.0 and later this function is called **occursin**.

- **match(regex, String, ind)**
    - Searches the string for the first match of the regular expression, starting at index **ind**, and returns a **RegexMatch** object (or nothing if there is no match).
    - The matched substring can then be accessed using **.match**.

- **matchall(regex, String)**
    - Returns an array containing all the substrings that match the regular expression. It was removed in Julia 1.0, where the same result is obtained by collecting the matches from **eachmatch**.

- **eachmatch(regex, String)**
    - Returns an iterator over the **RegexMatch** objects for all matches in the string.

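Let us consider two regular expressions: the first matches any sequence of characters ending in dog, and the second matches a word followed by a space and the word dog.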

In [46]:
reg_exp = r".*dog"
reg_exp2 = r"\w+ dog"

r"\w+ dog"

In [47]:
typeof(reg_exp)

Regex

In [48]:
phrase = "The dog went to the park, the other dog went stayed home."

"The dog went to the park, the other dog went stayed home."

Let us begin with the basic **ismatch(regex, String)**

In [49]:
ismatch(reg_exp, phrase)

true

If we want to know the first part of the phrase that matches the regular expression, we can use match:

In [61]:
match(reg_exp2, phrase, 1)

RegexMatch("The dog")

In [62]:
match(reg_exp, phrase, 1)

RegexMatch("The dog went to the park, the other dog")

In [63]:
aux = match(reg_exp, phrase, 1)

RegexMatch("The dog went to the park, the other dog")

In [64]:
typeof(aux)

RegexMatch

In [65]:
aux.match

"The dog went to the park, the other dog"

In [66]:
aux.regex

r".*dog"
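
The examples above always start the search at index 1 and always find a match. As a sketch of the other two cases, the block below starts the search later in the phrase and also handles a pattern that does not match. The pattern `r"cat"` and the names `later`, `no_match` and `result` are made up for this illustration.

```julia
phrase = "The dog went to the park, the other dog went stayed home."
reg_exp2 = r"\w+ dog"

# Start searching at index 8, just past the first "The dog"
later = match(reg_exp2, phrase, 8)
later_text  = later.match    # "other dog"
later_start = later.offset   # 31: index where the match begins

# When nothing matches, match returns `nothing` instead of a RegexMatch
no_match = match(r"cat", phrase)
result = no_match === nothing ? "no match" : no_match.match
```

Checking for `nothing` before accessing `.match` avoids an error when the pattern is absent.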

Let us try **matchall(regex, String)**

In [67]:
matchall(reg_exp, phrase)

1-element Array{SubString{String},1}:
 "The dog went to the park, the other dog"

In [76]:
matchall(reg_exp2, phrase)

2-element Array{SubString{String},1}:
 "The dog"  
 "other dog"

The regular expression `reg_exp2` looks for every place where the word `dog` is found and selects the preceding word (no matter how many letters it has) as well as the word dog.
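
The outputs in this notebook come from an older Julia release. On Julia 1.0 or later, where `ismatch` and `matchall` are no longer available, the same two checks can be sketched with `occursin` and `eachmatch`. The names `found` and `all_matches` are chosen only for this illustration.

```julia
# Assumes Julia 1.0 or later
phrase = "The dog went to the park, the other dog went stayed home."
reg_exp2 = r"\w+ dog"

found = occursin(reg_exp2, phrase)   # stands in for ismatch(reg_exp2, phrase)

# stands in for matchall(reg_exp2, phrase): collect the matched substrings
all_matches = [m.match for m in eachmatch(reg_exp2, phrase)]
```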

Sometimes we want to loop over the different matches for a given String and regular expression. The **eachmatch(regex, String)** function is handy for this.
  

In [79]:
typeof(eachmatch(reg_exp2, phrase))

Base.RegexMatchIterator

In [81]:
eachmatch(reg_exp2, phrase)

Base.RegexMatchIterator(r"\w+ dog","The dog went to the park, the other dog went stayed home.",false)

In [95]:
for m in eachmatch(reg_exp2, phrase)
    print("match: ", m.match, " , begin pos: ",m.offset, "\n")
end

match: The dog , begin pos: 1
match: other dog , begin pos: 31
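
Instead of printing inside the loop, the matches can also be gathered into an array for later use. This is a minimal sketch; the name `found_at` is hypothetical.

```julia
phrase = "The dog went to the park, the other dog went stayed home."
reg_exp2 = r"\w+ dog"

# One (matched text, start index) tuple per match
found_at = [(m.match, m.offset) for m in eachmatch(reg_exp2, phrase)]

for (text, pos) in found_at
    println("match: ", text, " , begin pos: ", pos)
end
```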
