<p style='text-align: center'><a href=https://www.biozentrum.uni-wuerzburg.de/cctb/research/supramolecular-and-cellular-simulations/>Supramolecular and Cellular Simulations</a> (Prof. Fischer)<br>Center for Computational and Theoretical Biology - CCTB<br>Faculty of Biology, University of Würzburg</p>

<p style='text-align: center'><br><br>We are looking forward to your comments and suggestions. Please send them to: <br><br></p>
    
 <p style='text-align: center'>   <a href="mailto:andreas.kuhn@uni.wuerzburg.de">andreas.kuhn@uni.wuerzburg.de</a> or <a href="mailto:sabine.fischer@uni.wuerzburg.de">sabine.fischer@uni.wuerzburg.de</a></p>

<h1><p style='text-align: center'> Introduction to Julia </p></h1>


## File management

This notebook gives you a short introduction to managing paths, creating/deleting folders and importing/exporting data with Julia.

### 1. Paths or where are we? 

A path is the address of a file or folder on a hard drive of your device. Paths can be given either in absolute terms or relative to other paths. The function `pwd()` returns the absolute path of the working directory of the current Julia session, which for a notebook is usually the folder that contains the notebook.

In [None]:
path_notebook = pwd()

Relative paths are always given in relation to the absolute path of the current notebook/Julia session (the return value of `pwd()`). It is possible to change this path, but we strongly advise you not to do so and to always keep `pwd()` at its default value.

If you want to get the contents of a folder, you can use the `readdir(some_path)` function with a specific path as argument or without an argument as `readdir()`. In the second case, the absolute path of the current Julia session is used.

The `readdir()` function returns the names of all folders and files inside the given path as a Vector of `Strings`. These names are relative to the folder that was listed.

In [None]:
readdir(path_notebook)  # absolute path

As `path_notebook` was set to the path of this notebook, `readdir()` without an argument does exactly the same.  

In [None]:
readdir()              # relative path    
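The following is a small sketch, not part of the original course material, of how the names returned by `readdir()` relate to full paths. It assumes Julia 1.4 or newer for the keyword `join = true`; the folder and file names are made up for illustration, and everything happens in a temporary folder created with `mktempdir()`.

```julia
# Work in a temporary folder so that nothing in the notebook folder is touched.
demo_dir = mktempdir()
mkpath(joinpath(demo_dir, "example_folder"))      # hypothetical folder name
touch(joinpath(demo_dir, "example_file.txt"))     # hypothetical (empty) file

entry_names = readdir(demo_dir)                # names only, sorted alphabetically
entry_paths = readdir(demo_dir, join = true)   # names joined with demo_dir

folders = filter(isdir, entry_paths)           # keep only the folders
files   = filter(isfile, entry_paths)          # keep only the files

println(entry_names)
println(basename.(folders))
println(basename.(files))
```

The joined paths can be passed directly to functions like `isdir()` or `isfile()`, which would not find the bare names unless `pwd()` happens to be the listed folder.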

As you can see, an absolute path is quite long, annoying to deal with and depends on your machine and operating system. Therefore, if possible, you should always work with relative paths, as they are shorter and easily portable.

If you want to create a folder, you can use the `mkpath()` function which creates folder(s) at the given path. 

In [None]:
mkpath("test_folder1")
mkpath("test_folder2/sub_folder/data")

In [None]:
readdir()

The return value of `readdir()` has changed because you have created the folders `test_folder1` and `test_folder2` inside your notebook folder. To see what's inside `test_folder1`, you can use its relative path as argument for `readdir()`.

In [None]:
readdir("test_folder1")

`test_folder1` is empty. 

In [None]:
readdir("test_folder2/")

In [None]:
readdir("test_folder2/sub_folder/")

`test_folder2` contains the folder `sub_folder`, which itself contains the folder `data`.

#### Note: The slash or backslash question
If you are using this notebook on Windows, `pwd()` has probably returned paths similar to this one: `"C:\\Users\\hanswurst\\sfw_stuff\\Julia_course\\Part_7_File_Management"` with `\\` instead of `/`. Therefore, you could use double backslashes instead of slashes in your code as well. On Windows this would work perfectly fine, but as soon as you use the same notebook on Linux or Mac, everything would break. Therefore, we strongly advise you to always use slashes `/`, as they work everywhere.
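As an addition to the advice above, here is a minimal sketch of how Julia can assemble paths for you. The functions `joinpath()`, `splitpath()`, `isabspath()` and `abspath()` belong to Julia's standard filesystem functions (`splitpath()` assumes Julia 1.1 or newer); the folder names are the ones used earlier in this notebook, and no folder is created or changed here.

```julia
# joinpath() inserts the separator that is correct for the current operating system.
data_path = joinpath("test_folder2", "sub_folder", "data")
println(data_path)

# splitpath() is the inverse operation and returns the individual parts.
println(splitpath(data_path))

# abspath() turns a relative path into an absolute one (based on pwd()).
println(isabspath(data_path))
println(isabspath(abspath(data_path)))
```

With this approach the separator never appears in your code at all, which is another way to keep a notebook portable.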



If you want to delete a file or an empty folder, you can use the `rm()` function. 

In [None]:
rm("test_folder1")

If you try to delete a folder that is not empty, you get an error message.

In [None]:
rm("test_folder2")

To remove a folder together with all files/subfolders inside, you can add the keyword argument `recursive = true`.

Note: You should only use this if you know exactly what you are doing and where you are. If you are in the wrong folder, it can happen very easily that you delete half of your hard drive in an instant.



In [None]:
rm("test_folder2",recursive = true)
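One possible safeguard against such accidents is sketched below. The helper `rm_inside_pwd()` is hypothetical and not part of Julia; it only assumes the standard functions `relpath()`, `abspath()` and `splitpath()`. It refuses to delete anything that does not lie inside `pwd()`, and the demonstration runs inside a temporary folder.

```julia
# Hypothetical helper: delete `path` recursively, but only if it lies inside pwd().
function rm_inside_pwd(path)
    relative = relpath(abspath(path), pwd())
    parts = splitpath(relative)
    # "." is pwd() itself, a leading ".." means the target is outside of pwd()
    if relative == "." || first(parts) == ".."
        error("refusing to delete $(abspath(path)): not inside $(pwd())")
    end
    rm(path, recursive = true)
end

# Try it out in a temporary folder, so that no real data is at risk.
cd(mktempdir()) do
    mkpath("test_folder2/sub_folder/data")
    rm_inside_pwd("test_folder2")
    println(readdir())                    # the folder is gone

    try
        rm_inside_pwd("..")               # outside of pwd(): rejected
    catch err
        println("Blocked: ", err.msg)
    end
end
```

This is only a sketch under the stated assumptions. It does not protect you from deleting the wrong folder inside `pwd()`, so the warning above still applies.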

![title](oh_shit.webp)

In [None]:
readdir()

Note: There are a lot more [functions regarding the filesystem in Julia](https://docs.julialang.org/en/v1/base/file/), which provide a functionality almost as powerful as the Linux command line.

### 2. Import/Export data

When exporting/importing data, there are essentially two questions to ask: 

#### Should the data be human readable  ? 

#### Will the data be used outside of Julia with other languages/programs ? 


###  2.1 Serialization 

If the answer to both questions is `no`, Julia has a very useful package for this purpose in its standard library called `Serialization`.

In [None]:
using Serialization

This package provides two functions `serialize()` and `deserialize()`. 

`serialize()` expects two arguments. The first argument is the name of the file (its future path) that you want to create, and the second argument is the Julia object you want to save. You can save almost any Julia object this way. It does not matter if it is a `String`, an `array`, a `dictionary` or an arbitrarily more complicated nested object like `array of arrays, figures ...`. `serialize()` creates a Julia binary file at the given path. By convention, such files use the ending `.jls`.


In [None]:
matrix1 = rand(100,100)

In [None]:
serialize("matrix1.jls",matrix1)  
readdir()

To import the created `jls` file, you can use `deserialize()`. This function takes only one argument, the path of the file you want to import and returns the restored Julia object. 

In [None]:
matrix1_restored = deserialize("matrix1.jls")
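To check that nothing gets lost on the way, a round trip can be sketched as follows. The dictionary `experiment` and its contents are invented for illustration, and the file is written to a temporary folder instead of the notebook folder.

```julia
using Serialization

# A nested object: a dictionary that holds a vector, a string and a matrix.
experiment = Dict("values" => [1.5, 2.5, 3.5],
                  "label"  => "hypothetical measurement",
                  "matrix" => rand(3, 3))

path = joinpath(mktempdir(), "experiment.jls")   # temporary location
serialize(path, experiment)
experiment_restored = deserialize(path)

# The restored object has the same type and the same content as the original.
println(typeof(experiment_restored) == typeof(experiment))
println(experiment_restored == experiment)
```

Both comparisons should print `true`: the nested structure and the element types survive the export and import without any extra work.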

`.jls` files can only be read by Julia itself and cannot be opened by programs like a text editor or Excel. But this comes with the advantage of superior speed: `Serialization` can be around 10-1000 times faster than text-based methods to import/export data. Therefore, if you work on your data in Julia exclusively and want to share it only with people that have/know Julia, `Serialization` is the way to go.

##### Comment: 
 Serialization is mainly used in the internals of Julia to exchange objects between different processes, Julia sessions and so on. Therefore, it is optimized for maximum performance but not necessarily for backwards compatibility. The developers have promised that files written by an earlier 1.x version of Julia remain readable in later 1.x versions, but if at some point a new x.0 version of Julia comes out, there might be issues importing old stored data into the new Julia version.
But as the past has shown, even packages that are focused on long-term storage are not safe from [compatibility issues](https://discourse.julialang.org/t/jld-jl-vs-jld2-jl/15287). Therefore, we believe that `Serialization` is currently the best and most likely the fastest option to save huge chunks of data in Julia.

###  2.2 Human readable data

Human-readable implies that the exported data can be opened using various programs like text editors, Excel, and more, and can be easily understood by humans. The most common method to achieve this involves converting data into `Strings` and then storing them in a `.txt` file. Different methods and standards exist for this purpose (`JSON`, `JSON3`, `BSON`, ...), each with its own applications and limitations. While it's generally advisable to avoid such formats due to slower speeds and increased error susceptibility compared to serialization, there are situations where you may need to import data from external sources in text form. If your data conforms to one of the mentioned formats, you can consider yourself lucky, install the respective package and import the file with the corresponding import function. However, this isn't always the case.


Hence, this course will introduce you to a more general approach in Julia for importing and exporting data in `.txt` files. While the primary focus is on data import, for the sake of completeness, you will also learn how to export data.

#### 2.2.1 Export Data

To transform any variable into a `String`, you can use a function that you already know: `print()`

But this time, you have to combine it with another function `open()`, which opens/creates files. 

The `print()` function has an optional argument at the beginning, which specifies the target. The default value is the standard output/command line, which you have been using so far without noticing.  But in this case, you want to change it to said file. 

The `open()` function takes two arguments: the first is the path of the file and the second is the mode, given as a `String`: `"w"` for write, `"r"` for read or `"a"` for append.

In [None]:
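file = open("matrix3.txt","w")      # open (or create) the file for writing
print(file,matrix1); close(file)    # write the matrix, then close the file so that everything is stored on disk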
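The three modes can be compared in a small sketch. The file name `modes.txt` is made up and the file lives in a temporary folder; the explicit `close()` calls are there because a file opened this way stays open until it is closed.

```julia
path = joinpath(mktempdir(), "modes.txt")   # hypothetical file in a temporary folder

file = open(path, "w")        # "w": create the file or overwrite its content
print(file, "first")
close(file)                   # closing makes sure everything is written to disk

file = open(path, "a")        # "a": keep the content and append at the end
print(file, " second")
close(file)

file = open(path, "r")        # "r": read only
content = read(file, String)
close(file)

println(content)              # first second
```

Opening the same file with `"w"` a second time would have replaced `first` instead of extending it.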

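You can also use another syntax construct that is often used when providing functions as arguments to functions, the `do` construct: 

``` julia
function1(argument1,...) do name
    
    function2(name,argument2,...)
    
end
    
```

which is essentially the same as passing an anonymous function as the first argument to `function1`:

``` julia
function1(name -> function2(name,argument2,...), argument1,...)
    
  
```

In the case of `open()`, the block is called with the opened file as `name`, and the file is closed automatically afterwards.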

In [None]:
open("matrix4.txt","w") do file
    print(file,matrix1)
end
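To make the `do` construct less mysterious, the sketch below writes the same data twice: once with a `do` block and once with an explicit anonymous function as the first argument of `open()`. The file name and the vector `numbers` are invented for illustration.

```julia
path = joinpath(mktempdir(), "do_block.txt")   # hypothetical file name
numbers = [1, 2, 3]

# Version 1: do block
open(path, "w") do file
    print(file, numbers)
end
text_do = read(path, String)

# Version 2: the same call with an explicit anonymous function as first argument
open(file -> print(file, numbers), path, "w")
text_anonymous = read(path, String)

println(text_do == text_anonymous)
println(text_do)
```

In both versions `open()` closes the file on its own once the function has finished, so no `close()` is needed.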

You can `print` different datatypes with this method into a file: 

In [None]:
rockets_dic = Dict("SpaceX" =>"Starship", "NASA" =>"SLS","ULA" => "AtlasV")
number_vec = collect(1:50)
hansi = "hansi geht in die hütten" 

open("hansi.txt","w") do file
    print(file,hansi)
end

open("rockets.txt","w") do file
    print(file,rockets_dic)
end

open("numbers.txt","w") do file
    print(file,number_vec)
end

Now five new `.txt` files should have been created inside the folder of this notebook. You can use `Notepad`, `JupyterLab` or many other programs to inspect their contents.

If you want to import the printed data back into Julia, you can use the `read()` function together with `open()`.

In [None]:
open("hansi.txt","r") do file
    read(file,String)
end 

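But the more convenient way to do that is to use the function `readline()`, which takes the path to a file as argument and returns the first line of that file. As each of the files created above consists of a single line, this is all you need here.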

In [None]:
hansi_restored = readline("hansi.txt")

`readline()` always returns a `String`. If the original Julia object was a `String` , everything works fine, but you will run into problems if the restored data was not a string originally.
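For files with more than one line, the difference between the reading functions matters. The following sketch uses an invented two-line file in a temporary folder and compares `readline()`, `readlines()` and `read()`, all of which are part of Julia's Base.

```julia
path = joinpath(mktempdir(), "two_lines.txt")   # hypothetical file
write(path, "first line\nsecond line")

println(readline(path))        # only the first line
println(readlines(path))       # all lines as a Vector of Strings
println(read(path, String))    # the whole file as one String
```

Whatever function is used, the result consists of `Strings`, so the conversion problem described above stays the same.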

In [None]:
number_vec_restored_string = readline("numbers.txt")

This object, which was an array of type `Vector{Int64}` before, is now a `String`. How can you transform it back into a `Vector{Int64}`? One possible solution to this problem is the `parse()` function.


In [None]:
parse(Int64,number_vec_restored_string[2])

In [None]:
parse(Int64,number_vec_restored_string[29:30])

In [None]:
parse(Int64,number_vec_restored_string)

The problem is that `parse()` only works for strings that solely consist of a number. Therefore, parsing single numeric types like `Int64` or `Float64` can be done easily, but not composite data types like `array` or `dicts`, which also include other characters like `,`, `[`, ... in their respective `String` representations.
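If you would rather test whether a string is parseable than run into an error, Base also offers `tryparse()`, which returns `nothing` on failure. This is a small sketch with made-up input strings and not part of the original course material.

```julia
println(tryparse(Int64, "42"))                  # 42
println(tryparse(Float64, "3.14"))              # 3.14
println(tryparse(Int64, "[1, 2, 3]"))           # nothing
println(isnothing(tryparse(Int64, "4 2")))      # true
```

This does not make the bracketed string any more parseable, but it allows code to react to unexpected input without stopping.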

Now the "fun" part starts and you have to transform the string to something that is "parseable".

Here is one possible way to make the string parseable: 

1. Cut off the brackets at the start and end 

2. Transform the `String` into an array of `Strings` with the `split()` function, where `,` is the symbol that marks a "splitting point" for the `split()` function

3. Apply the broadcasted `parse.()` version of the `parse() `function with the `.` operator to every entry of the array of `Strings`    

In [None]:
numb_vec_restored1 = parse.(Int64, split(number_vec_restored_string[2:end-1],","))
println(numb_vec_restored1)

Overall this procedure is lengthy, annoying and needs to be changed for every datatype. For example, it wouldn't be possible to parse a 2D array in this way.  

This raises the obvious question: Is there a better way to do this?

`Julia` offers a powerful alternative parsing technique that is part of one of the language's core features, called [metaprogramming](https://docs.julialang.org/en/v1/manual/metaprogramming/). It essentially means that "Julia represents its own code as a data structure of the language itself". Or, in more understandable terms, Julia itself can read and understand Julia source code. This is a rather rare feature among programming languages, as most of the time the "read and understand source code" part is handled by a separate unit called the compiler/interpreter and not by the language itself. This opens the door to many very advanced programming patterns like live creation and execution of Julia source code during runtime, macros, ...

You will not use such advanced features here, except for one small part: Julia can treat a `String` not only as a `String` but also as source code, provided that its content is valid Julia code.

The `Meta.parse()` function transforms a `String` into a `Julia` expression, which can be evaluated with `eval()`. 

In [None]:
test_expression = "z = 100"
eval(Meta.parse(test_expression))
z

In [None]:
z == 100
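To see that parsing and evaluating really are two separate steps, you can inspect the object that `Meta.parse()` returns. The expression string below is an arbitrary example; `head` and `args` are the two fields of Julia's `Expr` type.

```julia
expression = Meta.parse("1 + 2 * 3")

println(typeof(expression))    # Expr
println(expression.head)       # call
println(expression.args)       # the function (+) followed by its arguments
println(eval(expression))      # 7
```

Nothing is computed until `eval()` is called; before that, the code is just data that can be inspected like any other Julia object.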

This is obviously a very complicated way to create a variable `z` with content `100`. But this feature also offers a nice alternative for parsing a `String`, if the `String` represents correct Julia source code. In this case, the `String` `number_vec_restored_string` is correct source code for the definition of an array. Therefore, it can be evaluated: 

In [None]:
number_vec_restored2 = eval(Meta.parse(number_vec_restored_string))
number_vec_restored2

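With this syntax you can import everything you have printed to a file with `print`, as long as the printed text is valid Julia source code. Keep in mind that `eval()` executes whatever code the file contains, so only use it on files you trust.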

In [None]:
rocket_string = readline("rockets.txt")

In [None]:
eval(Meta.parse(rocket_string))

In [None]:
matrix_string = readline("matrix4.txt")

In [None]:
eval(Meta.parse(matrix_string))
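The approach has a limit that is easy to overlook: the printed text must itself be valid source code. The sketch below uses made-up example data and the Base functions `sprint()` and `repr()` to show that this holds for a dictionary, but not for a plain `String` written with `print()`.

```julia
word = "hello world"                       # hypothetical example data
rockets = Dict("NASA" => "SLS")

# print() writes the bare characters of a String, repr() writes it with quotes.
println(sprint(print, word))               # hello world
println(repr(word))                        # "hello world"

# Only the repr() version is valid Julia source code for a String.
println(eval(Meta.parse(repr(word))) == word)

# For the dictionary, the printed text is already valid source code.
println(eval(Meta.parse(sprint(print, rockets))) == rockets)

# The bare text is read as code: here it is not even a valid expression.
try
    Meta.parse(sprint(print, word))
catch err
    println("Parsing failed: ", typeof(err))
end
```

For text like the content of `hansi.txt`, reading the file as a `String` is therefore the right tool, and `eval(Meta.parse())` is reserved for data whose printed form is Julia code.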

##### Final Note: Even though you now know various ways to write/read data from .txt files in Julia, you should never do any of that unless you are forced to by powers out of your control.  

###  2.3 Human readable and usable data -> DataFrames.jl and CSV.jl

But what is the right way to save data in a human readable, exchangeable and usable way without needing to do all the tedious stuff in the previous chapter ?  
The packages `DataFrames` and `CSV` provide solutions. 

In [None]:
using DataFrames, CSV

For scientific applications, numerical data is usually shared in the `.csv` (comma separated values) format. Therefore, if you want to share your data, it is best to provide it as `.csv` files. The `CSV` package provides a simple interface to import csv formatted data into Julia and to export data from Julia. In this course, this comes with the limitation/benefit that you format your data as a `DataFrame` object from the `DataFrames` package. 

A `DataFrame` is essentially a table, where each column is a `Vector` of any type. The `DataFrames` package additionally provides a lot of data analysis features, which you will get to know in the next lesson. 

If you want to save a vector as a `.csv` file you need to convert it into a `DataFrame` first. 

In [None]:
number_vec = collect(10:10:120)
Number_df = DataFrame(number = number_vec)

A column can be accessed by the name of the DataFrame followed by a `.` and the name of the column.

In [None]:
Number_df.number
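The dot syntax is one of several equivalent ways to look into a `DataFrame`. The sketch below uses an invented two-column table and assumes that the `DataFrames` package is installed; `names()` and `size()` are handy for a first overview.

```julia
using DataFrames

df = DataFrame(number = [10, 20, 30], label = ["a", "b", "c"])   # hypothetical data

println(df.number)              # column by name with the dot syntax
println(df[!, "number"])        # the same column, selected by a String
println(df[2, :label])          # a single cell: row 2 of column label
println(names(df))              # all column names
println(size(df))               # (number of rows, number of columns)
```

Selecting a column by a `String` is useful when a column name is not a valid Julia identifier, for example when it starts with a digit.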

Then we can use the `CSV.write()` function to save our DataFrame at the given path. 

In [None]:
CSV.write("numbers.csv",Number_df)

The function `CSV.File(path)` returns a table object that can be transformed back into a DataFrame with the `DataFrame()` function.

In [None]:
numbers_table = CSV.File("numbers.csv")

In [None]:
numbers_df = DataFrame(numbers_table)

##### Comment: This might be the first time that you encounter an operator (in this case `.`) that has different effects depending on the context. It was first introduced as a way to apply a function elementwise on a collection. Second, it is a way to access subobjects (`Vectors`) of a more complex object (`DataFrame`), and here we used it to call specific functions from a package (`CSV.write`). Don't worry, this might be confusing at first but is pretty common in programming. You get used to it very fast!

You can use `DataFrame` and `CSV.File` nested. The `CSV.File` function also has some useful additional keyword arguments like `header` which specifies which row should be read as the head of the table.

In [None]:
DataFrame(CSV.File("DrugScreen1.csv", header = 1)) 

Note: If not all columns are displayed, you can change the number of characters displayed per line in a Jupyter notebook with the command `ENV["COLUMNS"] = 250`. The default value is 100, which can be increased to a number that fits your screen well.
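Besides nesting `DataFrame` and `CSV.File`, the `CSV` package also provides `CSV.read()`, which takes the target type as second argument. The sketch below assumes that `CSV` and `DataFrames` are installed, uses an invented file in a temporary folder and additionally shows the keyword `delim` for files that use another separator than the comma.

```julia
using CSV, DataFrames

path = joinpath(mktempdir(), "numbers_demo.csv")    # hypothetical file name
original_df = DataFrame(number = collect(10:10:120))

CSV.write(path, original_df)
restored_df = CSV.read(path, DataFrame)   # same result as DataFrame(CSV.File(path))

println(restored_df == original_df)
println(eltype(restored_df.number))       # the column type is detected automatically

# The same round trip with a semicolon as separator.
CSV.write(path, original_df, delim = ';')
restored_semicolon = CSV.read(path, DataFrame, delim = ';')
println(restored_semicolon == original_df)
```

Semicolon separated files are common in regions where the comma is used as decimal mark, so `delim` is one of the first keywords to check when an import looks wrong.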

In [None]:
list_3d = DataFrame(CSV.File("list3D.csv"))

Normally, the data type of a column is determined automatically during the import. In the case of `list_3d`, this did not work properly, as the data type is not a primitive Julia data type. Therefore, it was interpreted as a `String`. We can use the same metaprogramming functions `eval(Meta.parse())` to convert the strings into valid Julia data types.

In [None]:
# We have to use the broadcasted versions of the functions with . as we want to parse every single entry and not the whole column. 
list_3d."3" = eval.(Meta.parse.(list_3d."3"))
list_3d."5" = eval.(Meta.parse.(list_3d."5"))
list_3d

##  Short  Summary
- Julia can navigate through paths on your hard drive and create/load/delete folders and files
- Use `Serialization` to import and export data when only using Julia
- Use `DataFrames` and the `CSV` package to save and share your data in `.csv` files
- If you are forced to import data from `.txt` files:
    1. Check if the data is provided in one of the existing parsing formats (`JSON`, `JSON3`, `BSON`, ...) 
    2. If not use the metaprogramming features of Julia to parse the data
    3. Be a good person and transform the data into a Dataframe/`.csv` when exporting

## Exercises

All exercises in this course are divided into three different difficulty categories: <span style="color:green">easy</span>, <span style="color:orange">medium</span> and <span style="color:red">hard</span>. <span style="color:green">Easy</span> exercises should be solvable solely with the contents of the respective notebook. <span style="color:orange">Medium</span> exercises often require the transfer of known concepts to new problems. Therefore, it might be necessary to look up some old notebooks or to use your creativity and curiosity to combine seemingly unrelated stuff. <span style="color:red">Hard</span> exercises take this concept one step further and might require you to use additional resources like the official documentation, Google, StackOverflow, ...

### <p style='color: green'>easy</p>


1. Get the number of files in the current working directory.


2. Count the number of .txt files in the current working directory.


3. Create a subfolder named "data" inside the folder of the notebook.


4. Create and save a vector of Float64 as a `.jls` file inside the folder "data". 


5. Load the `.jls` file and delete it afterwards. 

### <p style='color: orange'>medium - Make sure that the previously created folder data is empty </p> 


6. Create 20 vectors that each contain 30 ascending numbers like ` vec1 = [1,...,30] vec2 = [31,...,60],...`. Save each vector as a separate `.jls` file in the folder "data". 

Hint: You do not need to give each vector a name. 

7. Write and execute a function called import_Vectors that imports all vectors from the folder "data" and creates one big vector called HUGE that contains all the numbers in ascending order.   

8. Apply the sine function to HUGE and plot it as a scatter plot with CairoMakie. Save the plot in the data folder. 

9. Make sure that the import_Vectors function still works, even though there is now a plot file in the data folder as well. 

10. Import the dataset drugScreen3 from the file DrugScreen3.csv. Check that the import worked correctly. If not, adjust the import parameters and modify the columns. 


### <p style='color: red'>hard</p>

11. Create a new folder called weights. Copy the file  Weights.csv to that folder, only using Julia. Import the data set "Weights" (Weights.csv). Check that the import worked correctly. If not, adjust the import parameters. Plot a histogram of the weights and save it as a PDF in a new subfolder in weights called plots.

12. Import the file AliceInWonderland.txt and count the number of words in the text. 
