# I/O

## Topics
- Strings and characters
- Text manipulation
- Working with text files

## Strings and characters
A string is a sequence of zero or more characters, usually enclosed in double quotes:
```julia
"this is a string"
```

In some cases it is also useful to use the equivalent triple double quotes, which let you put double quotes inside the string without escaping them:
```julia
"""this is also a string, but it supports double quotes "just" like this."""
```

Characters are enclosed in single quotes, e.g., `'a'`. Be aware that a character is not the same as a string of length 1!
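
A quick way to see the difference is to compare the types directly. This is a minimal sketch; the variable names `c` and `s` are just illustrative:

```julia
c = 'a'              # single quotes: a Char
s = "a"              # double quotes: a String that happens to have length 1

typeof(c)            # Char
typeof(s)            # String
length(s)            # 1
c == s               # false: a Char never compares equal to a String
string(c) == s       # true once the Char is converted into a String
```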

## Indexing strings
To extract characters from a string, index into it with square brackets (indices start at 1):

```julia
s = "abcdefghijklmnopqrstuvwxyz"
s[1]
# 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
```

You can also extract a substring using range indexing:

```julia
s[1:3]
# "abc"
```

Note how `s[1]` and `s[1:1]` do not give the same result: the first gives a character, the second a (sub)string!
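
The types make the difference explicit. A small sketch reusing the alphabet string from above (the other names are illustrative):

```julia
s = "abcdefghijklmnopqrstuvwxyz"

ch  = s[1]        # 'a' (a Char)
sub = s[1:1]      # "a" (a String)

typeof(ch)        # Char
typeof(sub)       # String
ch == sub         # false
sub[1] == ch      # true: indexing the substring gives the Char back
s[24:26]          # "xyz": a range can select any contiguous slice
```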

## String interpolation 
Often you want to use the results of Julia expressions inside strings. This is done with the string interpolation machinery: the syntax is `$(x)` for an expression `x`.

Note that for a simple variable it is enough to just use `$var` without the parentheses.

The direct conversion of a value of another type into a string is done with the `string()` function.

```julia
x = 42
"The value of x is $x"
# "The value of x is 42"
```

```julia
"The value of 2 + 2 is $(2+2)"
# "The value of 2 + 2 is 4"
```

```julia
string(x)
# "42"
```

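Interpolation and `string()` can build the same text: `string()` accepts any number of arguments and concatenates their string representations. A small comparison, with illustrative variable names:

```julia
x = 42

via_interp = "x = $x, x squared = $(x^2)"
via_string = string("x = ", x, ", x squared = ", x^2)

via_interp == via_string      # true
via_interp                    # "x = 42, x squared = 1764"
literal = "price: \$x"        # \$ escapes the dollar sign, so no interpolation happens
```
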
## Splitting and joining strings
You can stick strings together (a process called *concatenation*) using the multiplication operator `*`:

```julia
"first part " * "second part!"
# "first part second part!"
```

Note: you might be more familiar with the `+` sign, which is used for concatenation in many other languages.

Julia chose multiplication instead because, *mathematically*, it makes more sense to express "sticking together" as multiplication.
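
Besides `*`, Base Julia provides `join()`, which glues a collection of strings together with a separator, and `split()`, which does the reverse. A minimal sketch with made-up words:

```julia
words = ["first", "second", "third"]

joined = join(words, ", ")     # "first, second, third"
parts  = split(joined, ", ")   # back to the three words (as SubStrings)

parts == words                 # true: SubStrings compare equal to Strings
"ab" * "cd" * "ef"             # "abcdef": * chains just like multiplication
```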

### Special addition for math nerds
While `*` may seem like a surprising choice to users of languages that provide `+` for string concatenation, this use of `*` has precedent in mathematics, particularly in abstract algebra.

In mathematics, `+` usually denotes a commutative operation, where the order of the operands does not matter. An example of this is matrix addition, where `A + B == B + A` for any matrices `A` and `B` that have the same shape. In contrast, `*` typically denotes a noncommutative operation, where the order of the operands does matter. An example of this is matrix multiplication, where in general `A * B != B * A`. As with matrix multiplication, string concatenation is noncommutative: `greet * whom != whom * greet`. As such, `*` is a more natural choice for an infix string concatenation operator, consistent with common mathematical use.
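
The noncommutativity argument is easy to check in the REPL; here `greet` and `whom` are given example values. The sketch also shows `^`, which repeats a string, in keeping with the multiplication analogy:

```julia
greet = "Hello, "
whom  = "world"

greet * whom                  # "Hello, world"
whom * greet                  # "worldHello, "
greet * whom != whom * greet  # true: the order of the operands matters
"ab"^3                        # "ababab": repetition, like a power of a product
```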

## Specialized strings
Sometimes strings come with a specialization, marked by a prefix of one or more characters placed immediately before the opening double quote:
- `r" "` indicates a regular expression
- `v" "` indicates a version string
- `raw" "` indicates a raw string literal (`$` and backslashes are taken literally, so there is no interpolation and no errors with `$`)
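
The three built-in prefixes can be tried out directly. A small sketch (the pattern, version numbers, and text are arbitrary examples):

```julia
re = r"jul(ia)?"                 # Regex literal
occursin(re, "I like julia")     # true

ver = v"1.6.0"                   # VersionNumber literal
typeof(ver)                      # VersionNumber
ver < v"1.10"                    # true: versions compare component by component...
"1.6.0" < "1.10"                 # false: ...while plain strings compare character by character

p = raw"Price: $x\n"             # neither $x nor \n is interpreted
length(p)                        # 11
```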

Furthermore, some packages extend this feature even further. For example, the [LaTeXStrings](https://github.com/stevengj/LaTeXStrings.jl) package, supported by many plotting libraries, provides an `L" "` prefix:
```julia
using LaTeXStrings   # the package has to be installed first
label = L"This is a LaTeX string $\sum_i x^2$"
```
which gives access to the full arsenal of mathematical symbols through LaTeX syntax.

## Advanced: `printf()`
If you are deeply attached to C-style `printf()` functionality, you can use the equivalent Julia macro after first issuing `using Printf` (macros are called by prefacing their name with the `@` sign):

```julia
using Printf
@printf("pi = %0.20f", float(pi))
# pi = 3.14159265358979311600
```

Or you can create a new string with the `@sprintf` macro, which returns the formatted text instead of printing it:

```julia
s = @sprintf("pi = %0.20f", float(pi))
# "pi = 3.14159265358979311600"
```

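A few more format specifiers, as a sketch: the numbers and text are arbitrary, and only common C-style flags are used (width and precision, `-` for left alignment, `0` for zero padding):

```julia
using Printf

# %5.2f: width 5, two decimals; %-6s: left-aligned in 6 columns; %03d: zero-padded to 3 digits
s = @sprintf("%5.2f|%-6s|%03d", float(pi), "ab", 7)
s == " 3.14|ab    |007"                    # true

@printf("%d items cost %.2f\n", 3, 9.5)    # prints: 3 items cost 9.50
```
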
## List of string manipulation functions
There are lots of functions for testing and changing strings:

- `length(str)` number of characters in str
- `sizeof(str)` size of str in bytes (can exceed `length(str)` for non-ASCII text)
- `startswith(strA, strB)` does strA start with strB?
- `endswith(strA, strB)` does strA end with strB?
- `isascii(str)` does str contain only ASCII characters?
- `all(isdigit, str)` does str consist only of the digits 0-9?
- `all(ispunct, str)` does str consist only of punctuation?
- `all(isspace, str)` does str consist only of whitespace characters?
- `all(isxdigit, str)` does str consist only of hexadecimal digits?
- `uppercase(str)` return a copy of str converted to uppercase
- `lowercase(str)` return a copy of str converted to lowercase
- `titlecase(str)` return a copy of str with the first character of each word converted to uppercase
- `chop(str)` return a copy with the last character removed
- `chomp(str)` return a copy with a trailing newline removed, if there is one
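
A handful of these functions in action. The example strings are made up; note how `length()` and `sizeof()` disagree for non-ASCII text:

```julia
greek = "αβγ"
length(greek)                   # 3 characters
sizeof(greek)                   # 6 bytes: each of these letters takes 2 bytes in UTF-8
isascii(greek)                  # false

startswith("poem.txt", "poem")  # true
endswith("poem.txt", ".txt")    # true
all(isdigit, "2024")            # true
all(isxdigit, "ff0A")           # true

uppercase("julia")              # "JULIA"
titlecase("hello julia world")  # "Hello Julia World"
chop("julia!")                  # "julia"
chomp("a line\n")               # "a line"
chomp("a line")                 # "a line" (no trailing newline, nothing removed)
```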


## Reading from files
The standard approach for getting information from a text file is to use the `open()`, `read()` (or a relative such as `readlines()`), and `close()` functions.

### Open
Let's see how they work in action. First we need to open the file. In Julia, a file can be opened in the following modes:
- `"r"`: reading (the default)
- `"w"`: writing (creates the file, or destroys everything already in it)
- `"a"`: appending (creates the file if needed and adds to its end)

Let's open the file `poem.txt` in the `data/` folder (here reached through the relative path `../data/poem.txt`). The syntax is
```julia
open(filename, mode)
```

```julia
f = open("../data/poem.txt", "r")
# IOStream(<file ../data/poem.txt>)
```

### Read
Then we can read the content with a function made for this, `readlines()`, which returns all lines of the file. Let's also remember to close the stream once we are done.

```julia
lines = readlines(f)
close(f)
```

Now the content is inside a vector of strings called `lines`.

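Alternatively, we could have used the `eachline()` function, which turns a source into an iterator. This lets us process a file one line at a time. Here we also use the `do` block form of `open()`, which closes the file automatically at the end of the block: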

```julia
open("../data/poem.txt") do f
    for line in eachline(f)
        println(line)
    end
end
```

Note: apologies to all literature enthusiasts; this was the only short poem containing the word "julia" that I could find.
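
If `poem.txt` is not available on your machine, the following self-contained sketch writes a small stand-in file to a temporary path (its contents are invented) and reads it back with both `readlines()` and `eachline()`:

```julia
# Hypothetical stand-in for ../data/poem.txt, written to a temporary file
path = tempname()
write(path, "first line\nsecond line\nthird line\n")

# Read everything at once; the do block closes the file automatically
lines = open(path, "r") do f
    readlines(f)               # newline characters are stripped
end

# Stream the file one line at a time; eachline(path) closes the file when done
n_chars = sum(length, eachline(path))

rm(path)                       # clean up the temporary file
```

Here `lines` ends up as `["first line", "second line", "third line"]` and `n_chars` as 31.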

### Writing
Writing to a file is a similar process. Instead of printing to the screen, we pass an open file handle as the first argument of our printing function:

```julia
open("../data/example.txt", "w") do f
    println(f, "My very own line of text.")
end
```
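
The mode determines what happens to existing content. This sketch writes to a temporary file (rather than `../data/example.txt`), then appends to it, and reads the result back:

```julia
path = tempname()               # temporary file used instead of ../data/example.txt

open(path, "w") do f            # "w": create the file, or wipe an existing one
    println(f, "My very own line of text.")
end

open(path, "a") do f            # "a": keep the content and add to the end
    println(f, "A second line, appended.")
end

content = read(path, String)    # the whole file as a single String
rm(path)
```

After running it, `content` holds both lines, each terminated by a newline.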

## Writing and reading arrays to and from a file
In practice, array data is rarely written out line by line like this. There is a much easier way via `using DelimitedFiles`!

In many cases, we are reading data stored in some fixed array format. The most common is the CSV (Comma Separated Values) format. The `DelimitedFiles` module provides a convenient pair of functions for reading and writing such delimited data: `readdlm()` and `writedlm()`.

Syntax is 
```julia
readdlm(filename, delimiter)
```
and
```julia
writedlm(filename, array, delimiter)
```
The `delimiter` is a character, e.g., `','` for CSV. If it is omitted, `writedlm()` uses a TAB, while `readdlm()` treats any run of whitespace as a separator.

```julia
arr = rand(3,5)
# 3×5 Array{Float64,2}:
#  0.568929  0.575358  0.789489  0.699436  0.64208
#  0.990878  0.253297  0.653797  0.277166  0.680533
#  0.288649  0.250189  0.9892    0.213586  0.740666
```

```julia
using DelimitedFiles
writedlm("../data/my_arr.txt", arr, ',')
```

```julia
arr2 = readdlm("../data/my_arr.txt", ',')
# 3×5 Array{Float64,2}:
#  0.568929  0.575358  0.789489  0.699436  0.64208
#  0.990878  0.253297  0.653797  0.277166  0.680533
#  0.288649  0.250189  0.9892    0.213586  0.740666
```
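
A round trip can be checked without touching the `../data/` folder. This sketch uses a small integer matrix and a temporary file, and passes the element type `Int` to `readdlm()` explicitly:

```julia
using DelimitedFiles

A = [1 2 3; 4 5 6]
path = tempname()                 # temporary file instead of ../data/

writedlm(path, A, ',')            # one row per line, columns separated by ','
text = read(path, String)         # raw file contents: 1,2,3 then 4,5,6 on the next line

B = readdlm(path, ',', Int)       # ask for Int elements instead of the default guess
B == A                            # true
rm(path)
```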

## Advanced: Regular expressions
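Julia has Perl-compatible regular expressions (regexes), as provided by the [PCRE](http://www.pcre.org/) library. Regular expressions describe patterns of text and are used to search strings for them. For example, the pattern below matches a line that is empty, contains only whitespace, or starts (after optional whitespace) with a `#` comment: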

```julia
r = r"^\s*(?:#|$)"
# r"^\s*(?:#|$)"
```

```julia
typeof(r)
# Regex
```

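To check whether a regex matches a string, use `match()`. It returns a `RegexMatch` object describing the match, or `nothing` if there is no match (so the first example below prints nothing). For a plain `true`/`false` answer, use `occursin(regex, string)`.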

```julia
match(r"^\s*(?:#|$)", "not a comment")
# (no output: the result is nothing)

match(r"^\s*(?:#|$)", "# a comment")
# RegexMatch("#")
```
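
Beyond a simple match, `RegexMatch` objects expose the matched text and any capture groups, and regexes also work with `occursin()` and `replace()`. A sketch with made-up input strings:

```julia
# Capture groups are the parenthesized parts of the pattern
m = match(r"(\w+)@(\w+)\.com", "contact: julia@example.com")
m.match           # "julia@example.com"
m.captures        # ["julia", "example"]

# A yes/no test with the comment pattern from above
occursin(r"^\s*(?:#|$)", "   # indented comment")   # true

# Replace every match; in s"..." the groups are referred to as \1, \2, ...
swapped = replace("2024-01-15", r"(\d+)-(\d+)-(\d+)" => s"\3/\2/\1")   # "15/01/2024"
```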

For more, see the Julia manual section on [Regular Expressions](https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions-1).