# Chapter-7 Strings
This notebook contains the sample source code explained in the book *Hands-On Julia Programming, Sambit Kumar Dash, 2021, bpb Publications. All Rights Reserved*.

In [1]:
using Pkg
pkg"activate ."
pkg"instantiate"

[32m[1m  Activating[22m[39m environment at `C:\Bahiyaa\Julia-Assignment\Chapter 07\Project.toml`


## 7.1 Introduction

A string can be thought of as a collection of characters. For a detailed treatment, please refer to the book chapter.

## 7.2 String

Simple examples of strings, written with several kinds of string literal.

In [2]:
string = "This is a string"

"This is a string"

In [3]:
string = """ 
        This is a preformatted 
        "string" """

" \nThis is a preformatted \n\"string\" "

In [4]:
b = "Bahi"
c = "zoe"
d = "200"

string = "$b owes $c $d dollars"

"Bahi owes zoe 200 dollars"

In [5]:
string = "This is a \"quoted\\  ' string"

"This is a \"quoted\\  ' string"

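Interpolation also accepts full expressions in parentheses, and a `raw` literal switches escape processing off. The sketch below is illustrative only; the names `price`, `qty`, `label`, and `path` are assumptions and do not come from the book.

```julia
price = 20
qty = 10

# $(...) interpolates the value of an arbitrary expression.
total = "Total: $(price * qty) dollars"
println(total)

# A backslash before $ produces a literal dollar sign.
label = "Unit price: \$$price"
println(label)

# raw"..." keeps backslashes as written, which is handy for Windows paths.
path = raw"C:\temp\new"
println(path)
```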
## 7.3 String Methods

Strings are immutable: they cannot be modified in place. String methods combine or operate on strings and return either an attribute of a string or a new string derived from the original.
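As a minimal sketch of what immutability means in practice (the variable names are illustrative assumptions), assigning to an index of a `String` raises an error, while a string function returns a new value and leaves the original untouched.

```julia
s = "julia"

# String has no setindex! method, so in-place assignment fails.
err = try
    s[1] = 'J'
catch ex
    ex
end
println(typeof(err))

# A string function returns a new string instead.
t = uppercasefirst(s)
println(s)   # the original is unchanged
println(t)
```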

### Comparisons

In [6]:
e1 = "bcd"
e2 = "efg"
e1 < e2

true

In [7]:
e2 > e1

true

In [8]:
e1 = "bcd"
e2 = "efg"
e1 == e2

false

In [9]:
e1 === e2

false

### Iteration

Strings can be iterated as collections of characters. However, only indices that fall on character boundaries are valid.

In [10]:
e = "Bahiyaa"
for b in e
    println(b)
end

B
a
h
i
y
a
a


In [11]:
e[1], e[2], e[3], e[4], e[5] 

('B', 'a', 'h', 'i', 'y')

In [12]:
e[begin], e[begin+2], e[end-1], e[end]

('B', 'h', 'a', 'a')

In [13]:
e = "\u2200 a \u2203 b"

"∀ a ∃ b"

In [14]:
length(e)

7

In [15]:
sizeof(e)

11

In [16]:
e[1]

'∀': Unicode U+2200 (category Sm: Symbol, math)

In [17]:
e[4]

' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
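Indices that fall inside a multi-byte character are rejected. The following sketch, which uses an illustrative variable `s`, lists the valid indices with `eachindex`, checks one with `isvalid`, and shows the error raised by an invalid index.

```julia
s = "∀ a ∃ b"

# Byte indices at which a character starts.
valid = collect(eachindex(s))
println(valid)            # [1, 4, 5, 6, 7, 10, 11]

# Index 2 lies inside the three-byte encoding of '∀'.
println(isvalid(s, 2))    # false

err = try
    s[2]
catch ex
    ex
end
println(typeof(err))      # StringIndexError
```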

In [18]:
for b in e
    println(b)
end

∀
 
a
 
∃
 
b


In [19]:
j, l = firstindex(e), lastindex(e)
while j <= l
    println(e[j])
    j = nextind(e, j)
end

∀
 
a
 
∃
 
b


### Split and Concatenate

Both sets of operations return a newly defined string. The old string is not modified. 
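A minimal sketch of this behaviour, with illustrative variable names: `split` returns `SubString` values that refer back to the original text, `String` turns one of them into an independent copy, and `join` builds a fresh string. The source string stays as it was throughout.

```julia
sentence = "alpha,beta,gamma"

parts = split(sentence, ',')
piece = parts[2]
println(typeof(piece))     # SubString{String}

# Convert the view into a standalone String when one is needed.
owned = String(piece)
println(typeof(owned))     # String

joined = join(parts, " + ")
println(joined)
println(sentence)          # unchanged
```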

In [85]:
string = "This is a String"
string[1:7]

"This is"

In [86]:
string[1:7]*string[end-6:end]

"This is String"

In [87]:
repeat("B:-", 6)

"B:-B:-B:-B:-B:-B:-"

In [88]:
"B:="^5

"B:=B:=B:=B:=B:="

In [89]:
join(["1", "2", "3", "4", "5"])

"12345"

In [90]:
join(["Bahiyaa", "zoe", "ram", "harry"], ", ", " and ")

"Bahiyaa, zoe, ram and harry"

In [91]:
string = "This is a\nString\n"
chomp(string)

"This is a\nString"

In [92]:
chop("january")

"januar"

In [93]:
chop("january", head=2, tail=3)

"nu"

In [94]:
e = "\u2200 x \u2203 y"
ee = split(e)

4-element Vector{SubString{String}}:
 "∀"
 "x"
 "∃"
 "y"

In [30]:
e = "\u2200,x,\u2203,y"
ee = split(e, ',', limit=2)

2-element Vector{SubString{String}}:
 "∀"
 "x,∃,y"

In [95]:
e = "\u2200,x,\u2203,y"
ee = rsplit(e, ',', limit=3)

3-element Vector{SubString{String}}:
 "∀,x"
 "∃"
 "y"

In [96]:
lpad("string", 20, "q")

"qqqqqqqqqqqqqqstring"

In [97]:
rpad("string", 10, "e")

"stringeeee"

In [98]:
strip("     string 789  ")

"string 789"

In [99]:
strip(" {b}     string 789  ", ['{', 'b', '}', ' '])

"string 789"

In [100]:
strip("     string 789  bbb") do y
    return y == ' ' || y == 'b'
end

"string 789"

### Case Conversion

In [101]:
uppercase("bahiyaa")

"BAHIYAA"

In [102]:
lowercase("bahiyaa")

"bahiyaa"

In [103]:
titlecase("hands on programming in julia")

"Hands On Programming In Julia"

In [104]:
uppercasefirst("bahiyaa")

"Bahiyaa"

In [105]:
lowercasefirst("Bahiyaa")

"bahiyaa"

### Match and Replace

In [106]:
string = "Introduction to Julia"
startswith(string, "Intro")

true

In [107]:
endswith(string, "Julia")

true

In [108]:
contains(string, "to")

true

In [109]:
occursin("to", string)

true

In [110]:
u = findfirst("o", "Introduction to Julia")
while u !== nothing 
    println(u)
    u = findnext("o", "Introduction to Julia", u.stop+1)
end

5:5
11:11
15:15


In [111]:
findlast("o", "Introduction to Julia")

15:15

In [112]:
replace("Introduction to Julia", "o"=>"b")

"Intrbductibn tb Julia"

#### Regular Expressions

Regular expressions are a language for describing text patterns. Readers are advised to consult a dedicated text on the topic for a detailed understanding.

In [123]:
rx = Regex("a.a")

r"a.a"

In [133]:
n = match(rx, "abracadabra")

RegexMatch("aca")

In [134]:
n.match

"aca"

In [145]:
n = match(rx, "abracadabra", 4)

RegexMatch("aca")

In [149]:
rx = Regex("a(.)a")
n = match(rx, "abracadabra")
n.captures

1-element Vector{Union{Nothing, SubString{String}}}:
 "c"

In [150]:
rx = Regex("a(?<key>.)a")
n = match(rx, "abracadabra")
n.captures

1-element Vector{Union{Nothing, SubString{String}}}:
 "c"

In [151]:
n["key"]

"c"

In [152]:
rx = r"a.a"
n = eachmatch(rx, "abracadabra", overlap=true)

Base.RegexMatchIterator(r"a.a", "abracadabra", true)

In [153]:
collect(n)

2-element Vector{RegexMatch}:
 RegexMatch("aca")
 RegexMatch("ada")

In [154]:
n = eachmatch(rx, "abracadabra", overlap=false)

Base.RegexMatchIterator(r"a.a", "abracadabra", false)

In [155]:
collect(n)

1-element Vector{RegexMatch}:
 RegexMatch("aca")
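The pieces above can be combined. The sketch below is an illustration under assumed inputs (the date pattern and sample sentences are not from the book): it reads named captures, handles the no-match case, and reorders groups with a substitution string.

```julia
rx = r"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})"

m = match(rx, "Released on 2021-03-24.")
if m !== nothing
    # Named groups are read by name; parse converts the captured text to Int.
    year = parse(Int, m["year"])
    month = parse(Int, m["month"])
    println((year, month, m.offset))
end

# match returns nothing when the pattern is absent.
missing_date = match(rx, "no date here")
println(missing_date === nothing)

# Numbered back-references in a substitution string reorder the groups.
reordered = replace("2021-03-24", rx => s"\3/\2/\1")
println(reordered)
```

Checking the result against `nothing` before touching `m.captures` or `m["..."]` avoids an error on input that does not match.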

## 7.4 Encodings

`String` objects are stored internally in the UTF-8 encoding. They can, however, be converted to and from other Unicode encoding forms such as UTF-16 and UTF-32.
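To make the difference between the encoding forms concrete, the following sketch counts code units for one short string in each form. The sample string is an assumption chosen so that it contains a one-byte character, a three-byte character, and a character outside the Basic Multilingual Plane.

```julia
s = "a\u2200\U1F600"   # 'a', '∀', and an emoji

println(length(s))                     # 3 characters
println(ncodeunits(s))                 # 8 UTF-8 code units (1 + 3 + 4 bytes)
println(length(transcode(UInt16, s)))  # 4 UTF-16 code units (1 + 1 + surrogate pair)
println(length(transcode(UInt32, s)))  # 3 UTF-32 code units

# A round trip through UTF-16 restores the original text.
roundtrip = transcode(String, transcode(UInt16, s))
println(roundtrip == s)
```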

In [156]:
v = "\u2200 a \u2203 b"

"∀ a ∃ b"

In [157]:
transcode(UInt16, v)

7-element Vector{UInt16}:
 0x2200
 0x0020
 0x0061
 0x0020
 0x2203
 0x0020
 0x0062

In [158]:
transcode(UInt8, v)

11-element Base.CodeUnits{UInt8, String}:
 0xe2
 0x88
 0x80
 0x20
 0x61
 0x20
 0xe2
 0x88
 0x83
 0x20
 0x62

In [159]:
transcode(UInt32, v)

7-element Vector{UInt32}:
 0x00002200
 0x00000020
 0x00000061
 0x00000020
 0x00002203
 0x00000020
 0x00000062

In [160]:
transcode(String, transcode(UInt16, v))

"∀ a ∃ b"

### Some Useful Functions

In [161]:
isascii("∀ x ∃ y"), isascii("abcd ef")

(false, true)

In [162]:
iscntrl('a'), iscntrl('\x1')

(false, true)

In [164]:
isdigit('a'), isdigit('7')

(false, true)

In [165]:
isxdigit('a'), isxdigit('x')

(true, false)

In [166]:
isletter('1'), isletter('a')

(false, true)

In [167]:
isnumeric('1'), isnumeric('௰') # '௰' is the Tamil numeral for ten

(true, true)

In [168]:
isuppercase('A'), islowercase('a')

(true, true)

In [169]:
isspace('\n'), isspace('\r'), isspace(' '), isspace('\x20')

(true, true, true, true)

## 7.5 Character Arrays

If you need to manipulate a string character by character, it may be best to convert the `String` into a `Vector{Char}`.

In [170]:
collect("∀ x ∃ y")

7-element Vector{Char}:
 '∀': Unicode U+2200 (category Sm: Symbol, math)
 ' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
 'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
 ' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
 '∃': Unicode U+2203 (category Sm: Symbol, math)
 ' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
 'y': ASCII/Unicode U+0079 (category Ll: Letter, lowercase)
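A short sketch of the round trip, with illustrative variable names: the vector is edited by character position and then converted back with the `String` constructor.

```julia
original = "∀ x ∃ y"
chars = collect(original)

# Index 3 is the third character here, not the third byte.
chars[3] = 'z'
modified = String(chars)
println(modified)

reversed = String(reverse(chars))
println(reversed)

println(original)   # the source string is unchanged
```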

## 7.6 Custom Strings

If the Unicode-based `String` type does not meet all your needs, you may have to implement your own string type derived from `AbstractString`. If the character code you plan to use does not map to a UTF-8 `Char`, you can create your own character type derived from `AbstractChar`. The `LegacyStrings.jl` package has some sample implementations of such string types for reference.

In [171]:
eltype("abcd")

Char

The subsequent command may take many minutes to complete if your environment has never been updated. 

In [172]:
]add LegacyStrings

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `C:\Bahiyaa\Julia-Assignment\Chapter 07\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Bahiyaa\Julia-Assignment\Chapter 07\Manifest.toml`


In [173]:
using LegacyStrings

In [174]:
v = ASCIIString("abcd")

"abcd"

In [175]:
ncodeunits(v)

4

In [176]:
codeunit(v)

UInt8

In [177]:
v16 = UTF16String(transcode(UInt16, "abcd\0"))

"abcd"

In [178]:
codeunit(v16)

UInt16

In [179]:
typeof(v16)

UTF16String

In [180]:
ncodeunits(v16)

4

Both `UTF16String` and `ASCIIString` behave like collections of `Char`, while internally they store their data as 16-bit and 8-bit code units respectively. Hence, a string type derived from `AbstractString` does not necessarily need its own `AbstractChar` type.

In [182]:
eltype(v), eltype(v16)

(Char, Char)
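The same idea can be sketched without `LegacyStrings`. The type below, `Latin1Text`, is a hypothetical name invented for this illustration and is not part of any package. It stores one byte per character in Latin-1, yields ordinary `Char` values, and defines only the methods that the `AbstractString` documentation lists as required (`ncodeunits`, `codeunit`, `isvalid`, `iterate`); treat it as a sketch rather than a complete implementation.

```julia
# Hypothetical string type: one Latin-1 byte per character.
struct Latin1Text <: AbstractString
    bytes::Vector{UInt8}
end

Base.ncodeunits(s::Latin1Text) = length(s.bytes)
Base.codeunit(::Latin1Text) = UInt8
Base.codeunit(s::Latin1Text, i::Integer) = s.bytes[i]

# Every in-range index starts a character, because each character is one byte.
Base.isvalid(s::Latin1Text, i::Int) = 1 <= i <= length(s.bytes)

# Latin-1 byte values coincide with Unicode code points U+0000 to U+00FF.
function Base.iterate(s::Latin1Text, i::Int = 1)
    i > length(s.bytes) && return nothing
    return (Char(s.bytes[i]), i + 1)
end

word = Latin1Text(UInt8[0x63, 0x61, 0x66, 0xe9])   # "café" in Latin-1

println(length(word))               # characters, via the generic fallback
println(word[4])                    # a regular Char
println(eltype(word))
println(ncodeunits(word))           # 4 bytes in Latin-1
println(ncodeunits(String(word)))   # 5 bytes once converted to UTF-8
```

Generic functions such as `length`, indexing, and conversion with `String` come from the `AbstractString` fallbacks, which is why the element type is still `Char` even though the storage is not UTF-8.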