# Working with Files


Bioinformatics pipelines often involve manipulating text files. As an example,
we are going to parse a very simple FASTA file.

There is a FASTA file in the data folder of this repo:

In [None]:
using JuliaForBioinformatics
repo_path = pathof(JuliaForBioinformatics)

You can use `joinpath` and `abspath` to construct paths that work on all
operating systems:

In [None]:
data_path = abspath(repo_path, "..", "..", "data")

In [None]:
fasta_file = joinpath(data_path, "O43521.fasta")
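As a small sketch of how such paths can be taken apart again, `basename` and `splitext` recover the file name and its extension. The relative path below is only an illustration and does not need to exist on disk; the variable names are made up for this example.

```julia
# Illustrative relative path; joinpath inserts the separator of the current OS.
example_path = joinpath("data", "O43521.fasta")

# basename drops the directory part and keeps only the file name.
file_name = basename(example_path)

# splitext separates the name from its extension (the dot stays with the extension).
name, extension = splitext(file_name)
```

Here `name` could be used, for instance, as a default label for the sequences read from the file.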

You can use `open` with the
[`do` syntax](https://docs.julialang.org/en/v1/manual/functions/#Do-Block-Syntax-for-Function-Arguments-1)
to read or write a file in Julia. The file is closed automatically when the
block ends:

In [None]:
open(fasta_file, "r") do file
    for line in eachline(file)
        println(line)
    end
end
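The same `do` pattern works for writing. As a minimal sketch, the following writes a two-line FASTA record into a temporary directory and reads it back. The file name `example.fasta`, the record name, and the short sequence are made up for illustration.

```julia
# Hypothetical example: the file name and the record content are illustrative only.
lines = mktempdir() do dir
    path = joinpath(dir, "example.fasta")
    # "w" opens the file for writing; it is closed when the do block ends.
    open(path, "w") do io
        println(io, ">example_record")
        println(io, "MAKQPSDV")
    end
    # readlines returns the lines without their trailing newline characters.
    readlines(path)
end
```

Using `mktempdir` with a `do` block keeps the example from leaving files behind, since the temporary directory is removed once the block finishes.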

#### Exercise 1

Write a function that reads the FASTA file into a dictionary mapping each
sequence/isoform UniProt identifier, i.e. the field between the `|`
characters, to its sequence.

**Hint!** You can use the following functions:

In [None]:
split("1 2 3", ' ')

In [None]:
startswith("Hello world!", 'H')

In [None]:
strip("  Hello world!  ")

and [string concatenation](https://docs.julialang.org/en/v1/manual/strings/#man-concatenation-1):

In [None]:
"Hello " * "world!"

In [None]:
# function read_fasta(...)
#     ...
# end

## Regex

You can also use [regular expressions](https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions-1).
They are very useful for parsing text files.

In [None]:
line = ">sp|O43521|B2L11_HUMAN Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11 PE=1 SV=1"

In [None]:
regex = r"^>\w+\|(\w+)\|"

In [None]:
m = match(regex, line)

In [None]:
if m !== nothing
    println(m[1])
end
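When a header has several fields of interest, named capture groups can make the code easier to read. The sketch below is one possible variant, not the only way to parse a UniProt header: the group names `db`, `accession` and `entry` are chosen here for illustration, and the character class `[\w-]+` assumes that isoform identifiers may contain a hyphen (for example `O43521-2`).

```julia
header = ">sp|O43521|B2L11_HUMAN Bcl-2-like protein 11 OS=Homo sapiens OX=9606 GN=BCL2L11 PE=1 SV=1"

# (?<name>...) defines a named group; the group names are illustrative choices.
header_regex = r"^>(?<db>\w+)\|(?<accession>[\w-]+)\|(?<entry>\w+)"

hm = match(header_regex, header)

# match returns nothing when the pattern is not found, so check before indexing.
if hm !== nothing
    println(hm[:db], " ", hm[:accession], " ", hm[:entry])
end
```

Named groups are indexed with a symbol (`hm[:accession]`), while positional indexing (`hm[2]`) still works for the same match.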

## String interpolation

You can [interpolate values](https://docs.julialang.org/en/v1/manual/strings/#string-interpolation-1),
single variables or the result of more complex expressions, into
strings using `$`:

In [None]:
a = 1
b = 2
"$a + $b is $(a + b)"
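Interpolation is handy for building messages or file lines from parsed values. In the sketch below, the variable names and the short sequence are hypothetical; a backslash before `$` keeps the dollar sign literal.

```julia
accession = "O43521"
sequence = "MAKQPSDV"  # hypothetical toy sequence, used only for illustration

# $(...) interpolates the result of an arbitrary expression.
msg = "$accession has $(length(sequence)) residues"

# \$ escapes the dollar sign, so no interpolation takes place here.
literal = "Write \$accession to interpolate the variable"
```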

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*