<div style="color: #8b1538; font-size: 38px">Julia 1.0 Programming</div>

<div class="alert alert-block alert-success">
    Run Python cells from the Julia kernel
</div>

In [1]:
macro python_str(s) open(`python`,"w",stdout) do io; print(io, s); end; end

@python_str (macro with 1 method)

# Variables, naming conventions and comments

> As in Python, there is no need to declare the type

In [2]:
x, x2, x3 = 7, Int8(7), 0.5
s = "Julia"

println("Type of x: $(typeof(x))")
println("Type of x2: $(typeof(x2))")
println("Type of x3: $(typeof(x3))")
println("Type of s: $(typeof(s))")

Type of x: Int64
Type of x2: Int8
Type of x3: Float64
Type of s: String


<br>

### Suppress output - add ;

In [3]:
s;

<br>

### Stylish print

In [4]:
printstyled("I love Julia!", color=:red, bold=true)

[31m[1mI love Julia![22m[39m

<br>

### Overflow behavior

In [5]:
x = 10^19  # exceeds typemax(Int64), so the result wraps around
x2 = big(10)^19

println("x: $(x) & type: $(typeof(x))\nx2: $(x2) & type: $(typeof(x2))")

x: -8446744073709551616 & type: Int64
x2: 10000000000000000000 & type: BigInt
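
For comparison, Python integers have arbitrary precision and never wrap around. The sketch below emulates the Int64 wraparound seen in Julia; `wrap_int64` is a hypothetical helper written for this illustration, not a standard function.

```python
def wrap_int64(n):
    """Interpret n modulo 2**64 as a signed 64-bit integer (two's complement)."""
    n &= (1 << 64) - 1          # keep the low 64 bits
    if n >= 1 << 63:            # values with the sign bit set are negative
        n -= 1 << 64
    return n

x = wrap_int64(10**19)   # emulates Julia's Int64 result for 10^19
x2 = 10**19              # plain Python int, comparable to big(10)^19 in Julia
print("x: {} & x2: {}".format(x, x2))
```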


<br>

### Elementary mathematical functions and operations

In [6]:
x += 2; x2 -= 3; x3 /= 2;
print("$(x), $(x2), $(x3)")

-8446744073709551614, 9999999999999999997, 0.25

In [7]:
x++  # the ++ operator does not exist in Julia

LoadError: syntax: incomplete: premature end of input

<br>

### String

In [8]:
s = "Julia"

s_start = s[1:3]  # equivalent to s[begin:3] (begin in indexing requires Julia 1.4+)
s_end = s[3:end]

println("Start $(s_start) & End $(s_end)")

Start Jul & End lia


In [9]:
python"""
s = "Julia!"
s_start, s_end = s[:3], s[3:]

print("Start {} & End {}".format(s_start, s_end))
"""

Start Jul & End ia!


<div class="alert alert-block alert-warning">
    A String "A" is different from a Char 'A'
</div>

In [10]:
s, c = "A", 'A'

print("s = $s is a $(typeof(s)), c = $c is a $(typeof(c)) & ")
printstyled("s == c $(s == c)", color=:red, bold=true)

s = A is a String, c = A is a Char & s == c false

- \$s inside a string is replaced by the value of s, here A
- \$(s == c) inside a string is replaced by its computed value, here false

<br>

> String concatenation

In [11]:
s = "hello" * " world"
s2 = string("hello", " world")
s3 = string(s, " with Julia")

print("s: $s, s2: $s2 & s3: $s3")

s: hello world, s2: hello world & s3: hello world with Julia

In [12]:
python"""
s = "abc" + "def"
print("s: {}".format(s))
"""

s: abcdef


<br>

> Replace

In [13]:
s = "Julia"
s = replace(s, "u" => "o")
print("s (replace): $s")

s (replace): Jolia

In [14]:
python"""
s = "Julia"
s = s.replace("u", "o")
print("s (replace): {}".format(s))
"""

s (replace): Jolia


<br>

> Split

In [15]:
s = "hello world"
s = split(s, " ")  # returns an array of SubString{String}
print("s (split): $s")

s (split): SubString{String}["hello", "world"]

In [16]:
python"""
s = "hello world"
s = s.split(" ")
print("s (split): {}".format(s))
"""

s (split): ['hello', 'world']


<br>

> Substring

In [17]:
s = "hello world"
debut = SubString(s, 1, 5)
fin = SubString(s, 7)  # or SubString(s, 7, length(s))

print("Given s = $s, the first 5 characters are $debut & the last 5 are $fin")

Given s = hello world, the first 5 characters are hello & the last 5 are world

<br>

> Useful functions

- Index range of a given word **j** in s; only the first occurrence is returned

In [18]:
index = findfirst("hello", s)
index2 = findfirst("Julia", s)  # returns nothing because the word does not occur in s

print("Index of hello $index & of Julia $index2 in $s")

Index of hello 1:5 & of Julia nothing in hello world

- Check whether a given word **j** occurs at least once in s

In [19]:
occursin("hello", s)

true

In [20]:
occursin("Julia", s)

false

- Reverse & uppercase

In [21]:
println(reverse(s))
println(uppercase(s))

dlrow olleh
HELLO WORLD
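
A possible Python counterpart of these string helpers, as a sketch. Note that `str.find` returns a 0-based start index (or -1 when the word is absent) rather than an index range or `nothing`.

```python
s = "hello world"

index = s.find("hello")      # 0-based start index of the first occurrence
index2 = s.find("Julia")     # -1 when the word is absent
present = "hello" in s       # counterpart of occursin("hello", s)

print(index, index2, present)
print(s[::-1])               # reverse
print(s.upper())             # uppercase
```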


> Endswith

In [22]:
s = "yersinia.fasta"

if endswith(s, ".fasta")
    print("$s is a fasta file")
end

yersinia.fasta is a fasta file

In [23]:
python"""
s = "yersinia.fasta"
if s.endswith(".fasta"):
    print("{} is a fasta file".format(s))
"""

yersinia.fasta is a fasta file


<br>

### Formatting numbers and strings

In [24]:
using Printf

@printf("%d\n", 7)
@printf("%f\n", pi)
@printf("%0.2f", pi)  # round

7
3.141593
3.14

In [25]:
python"""
import math
print("{}".format(math.pi))
print("{:.2f}".format(math.pi))  # round
"""

3.141592653589793
3.14


<br>

### Ranges and array

In [26]:
for i in 1:5  # from 1 to 5 (inclusive), step 1 by default
    print("$i ")
end
println()

for i in 1:2:5  # from 1 to 5 (inclusive), step 2
    print("$i ")
end

1 2 3 4 5 
1 3 5 

In [27]:
python"""
for i in range(1, 6):  # from 1 to 6 (exclusive), step 1 by default
    print("{} ".format(i), end="")
print()

for i in range(1, 6, 2):  # from 1 to 6 (exclusive), step 2
    print("{} ".format(i), end="")
"""

1 2 3 4 5 
1 3 5 

<br>

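> Set up a range macro for a Python-like syntax

In [28]:
macro range(debut, fin, args...)
    # Return a range expression; esc() lets the caller pass variables as well as literals
    if isempty(args)
        return esc(:(($debut):($fin)))
    end
    return esc(:(($debut):($(args[1])):($fin)))
end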

@range (macro with 1 method)

In [29]:
for i in @range(1, 5)  # from 1 to 5 (inclusive), step 1 by default
    print("$i ")
end
println()

for i in @range(1, 5, 2)  # from 1 to 5 (inclusive), step 2
    print("$i ")
end

1 2 3 4 5 
1 3 5 

<br>

> Vector - 1 dimensional array

In [30]:
arr = [100, 25, 37]  # array of Int
show(arr)

[100, 25, 37]

In [31]:
arr = Any[100, 0.5, "Julia"]  # array of type Any
show(arr)

Any[100, 0.5, "Julia"]

<div class="alert alert-block alert-warning">
    Indexing starts at 1 in Julia
</div>

In [32]:
arr = Array{Int64}(undef, 2)  # uninitialized array: contents are arbitrary memory, not random numbers
show(arr)

[140267067354832, 140266642576464]

<br>

> Push

In [33]:
arr = Int64[]
push!(arr, 66)
show(arr)

[66]

In [34]:
python"""
l = []
l.append(66)
print(l)
"""

[66]


<br>

> Pop - remove the last element

In [35]:
pop!(arr)
show(arr)

Int64[]

In [36]:
python"""
l = [66]
l.pop()  # removes the last element in place, like pop! in Julia
print(l)
"""

[]


<br>

> Initialize an array from a range

In [37]:
arr = collect(1:7)
show(arr)

[1, 2, 3, 4, 5, 6, 7]

In [38]:
python"""
l = list(range(1,8))
print(l)
"""

[1, 2, 3, 4, 5, 6, 7]


<br>

In [39]:
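macro collect(debut, fin, args...)
    # Python-like: fin is excluded; esc() lets the caller pass variables as well as literals
    if isempty(args)
        return esc(:(collect(($debut):($fin) - 1)))
    end
    return esc(:(collect(($debut):($(args[1])):($fin) - 1)))
end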

@collect (macro with 1 method)

In [40]:
arr, arr2 = @collect(1,8), @collect(1, 8, 2)

([1, 2, 3, 4, 5, 6, 7], [1, 3, 5, 7])

<br>

> Access element by index

In [41]:
debut, fin = arr[1], arr[end]  # arr[begin] is equivalent to arr[1]
print("First element is $debut & last is $fin")

First element is 1 & last is 7

In [42]:
python"""
l = list(range(1,8))
debut, fin = l[0], l[-1]
print("First element is {} & last is {}".format(debut, fin))
"""

First element is 1 & last is 7


<br>

> Important functions

In [43]:
arr_type, arr_dim, arr_len = eltype(arr), ndims(arr), length(arr)

print("Type $arr_type \nLength $arr_len \nDimensions $arr_dim")

Type Int64 
Length 7 
Dimensions 1

<br>

> Join

In [44]:
arr_s = join(arr, " ")
arr_s

"1 2 3 4 5 6 7"

Python cannot join a list of integers directly; each element must first be converted to a string

In [45]:
python"""
l = list(range(1,8))
l = " ".join(str(ele) for ele in l)
print("{}".format(l))
"""

1 2 3 4 5 6 7


<br>

> Some common functions for arrays

- Concatenate 

In [46]:
a, b = [1, 7], [100, 200, 300]
append!(a, b)
show(a)

[1, 7, 100, 200, 300]

- Remove the element at index i

In [47]:
supp = splice!(a, 3)
supp, a

(100, [1, 7, 200, 300])

- Check whether an array contains a given element

In [48]:
in(200, a), in(100, a)

(true, false)

- Sort

In [49]:
a = [16, 7, 2, 23, 2, 1]

b = sort(a); println("Sorted copy: $b, a is unchanged: $a")  # like sorted() in Python
sort!(a); println("Sorted in place: $a")  # like a.sort() in Python

Sorted copy: [1, 2, 2, 7, 16, 23], a is unchanged: [16, 7, 2, 23, 2, 1]
Sorted in place: [1, 2, 2, 7, 16, 23]
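
A Python counterpart, as a sketch: `sorted()` returns a new list and leaves the original untouched, while `list.sort()` sorts in place and returns `None`.

```python
a = [16, 7, 2, 23, 2, 1]

b = sorted(a)   # new sorted list, a is left untouched
print("Sorted copy: {} & original: {}".format(b, a))

a.sort()        # sorts a in place
print("Sorted in place: {}".format(a))
```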


- Deep copy

As in Python, after a = b both a & b point to the same object in memory; deepcopy creates an independent copy instead

In [50]:
a = [1,2,4,6]
b = deepcopy(a)
b[end] = 0
println("a = $a & b = $b")

a = [1, 2, 4, 6] & b = [1, 2, 4, 0]
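
The same distinction in Python, as a small sketch: plain assignment creates an alias, `copy.deepcopy` creates an independent object. The name `alias` is introduced only for this illustration.

```python
import copy

a = [1, 2, 4, 6]
alias = a                 # same object: a change through alias is visible in a
b = copy.deepcopy(a)      # independent copy
b[-1] = 0
alias[0] = 100

print("a = {} & b = {}".format(a, b))
print(alias is a, b is a)
```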


<br>

###  Dates and times

In [51]:
start_time = time()
println("Execution time: $(time()-start_time)")

Execution time: 0.00049591064453125


To measure the execution time of a function, the **elapsed** macro can be used as follows: **@elapsed func()**

In [52]:
import Dates
Dates.Time(Dates.now())

12:53:54.452
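
A rough Python equivalent of the two timing cells, as a sketch. `time.perf_counter` is used here because it is intended for measuring short durations; the summed range is arbitrary work chosen for the example.

```python
import time
from datetime import datetime

start_time = time.perf_counter()
total = sum(range(100000))            # some work to time
elapsed = time.perf_counter() - start_time

print("Execution time: {}".format(elapsed))
print(datetime.now().time())          # current time of day
```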

<br>

# Functions 

> Multiple return values

Returns a tuple

In [53]:
function carre_cube(x)
    return x^2, x^3
end

carre, cube = carre_cube(2)

(4, 8)

<br>

> Function taking an arbitrary number of arguments

- x positional argument
- args tuple of the remaining arguments

In [54]:
function power(x, args...)  # x is a positional argument
    if isempty(args)
        return x^2  # squares x by default
    end
    return x^args[1]
end

x = 2
carre, cube = power(x), power(x, 3)

print("Given x = $x, x^2 = $carre & x^3 = $cube")

Given x = 2, x^2 = 4 & x^3 = 8
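
In Python the same pattern is written with `*args`, which also collects the extra positional arguments in a tuple. A minimal sketch:

```python
def power(x, *args):
    """Return x squared by default, or x raised to the first extra argument."""
    if not args:
        return x ** 2
    return x ** args[0]

x = 2
carre, cube = power(x), power(x, 3)
print("x = {}, x^2 = {} & x^3 = {}".format(x, carre, cube))
```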

<br>

> Restrict the kind of parameters

- x positional argument
- pow optional argument, 2 by default

In [55]:
function power_integer(x::Int, pow::Int=2)  # x & pow must be Int; pow defaults to 2
    return x^pow
end

x = 2
carre, cube = power_integer(x), power_integer(x, 3)

println("Given x = $x, x^2 = $(carre) & x^3 = $cube")

Given x = 2, x^2 = 4 & x^3 = 8


In [56]:
power_integer(x, 4.5)

MethodError: MethodError: no method matching power_integer(::Int64, ::Float64)
Closest candidates are:
  power_integer(::Int64, !Matched::Int64) at In[55]:2
  power_integer(::Int64) at In[55]:2
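
Python does not enforce type annotations at run time, so a comparable restriction has to be checked by hand. The sketch below assumes an explicit `isinstance` test raising `TypeError`; it only approximates Julia's behaviour and is not dispatch.

```python
def power_integer(x, pow=2):
    """Raise x to pow; both arguments must be int."""
    if not isinstance(x, int) or not isinstance(pow, int):
        raise TypeError("power_integer expects int arguments")
    return x ** pow

print(power_integer(2), power_integer(2, 3))

try:
    power_integer(2, 4.5)
except TypeError as err:
    print("TypeError: {}".format(err))
```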

<br>

> One-line function - compact syntax

In [57]:
power_integer(x::Int, pow::Int=2) = x^pow

carre, cube = power_integer(x), power_integer(x, 3)
println("Given x = $x, x^2 = $(carre) & x^3 = $cube")

Given x = 2, x^2 = 4 & x^3 = 8


<br>

> Map, filter and list comprehensions

- map(func, coll)

Let **func** be a function (often anonymous) applied in turn to each element of the collection **coll**

In [58]:
l = [1, 2, 3]
l2 = map(x -> x*10, l)

print("Given l = $l & times ten = $l2")

Given l = [1, 2, 3] & times ten = [10, 20, 30]

In [59]:
l = @collect(-5, 15, 3)
cubes = map(x -> power(x, 3), l)

print("Given l = $l & cubes = $cubes")

Given l = [-5, -2, 1, 4, 7, 10, 13] & cubes = [-125, -8, 1, 64, 343, 1000, 2197]

- filter(func, coll)

Let **func** be a boolean function (often anonymous) tested on each element of the collection **coll**; only the elements for which it returns true are kept

In [60]:
l = @collect(-1, 7)
even = filter(n -> iseven(n), l)

print("The even elements of $l are $even")

The even elements of [-1, 0, 1, 2, 3, 4, 5, 6] are [0, 2, 4, 6]

- List comprehension - as in Python

In [61]:
carre = [power(x, 2) for x in @collect(-5, 15, 3)]
cubes = [power(x, 3) for x in @collect(-5, 15, 3)]

print("List of squares: $carre & list of cubes: $cubes")

List of squares: [25, 4, 1, 16, 49, 100, 169] & list of cubes: [-125, -8, 1, 64, 343, 1000, 2197]
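
For reference, the three constructs look like this in Python. This is a sketch: `map` and `filter` return lazy iterators there, hence the `list(...)` calls.

```python
l = list(range(-5, 15, 3))

cubes = list(map(lambda x: x ** 3, l))                     # map(func, coll)
even = list(filter(lambda n: n % 2 == 0, range(-1, 7)))    # filter(func, coll)
carre = [x ** 2 for x in l]                                # list comprehension

print(cubes, even, carre)
```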

<br>

# Control flow

> Try catch exceptions

In [62]:
l = []
try
    pop!(l)
catch ex
    println(typeof(ex)) 
    showerror(stdout, ex)
end

ArgumentError
ArgumentError: array must be non-empty

In [63]:
python"""
l = []
try:
    l[-1]
except IndexError:
    print("Empty list")
"""

Empty list


<br>

In [64]:
arr = @collect(1, 4)

for index in eachindex(arr)
    println("$index: $(arr[index])")
end

1: 1
2: 2
3: 3

<br>

# Collection types

> Matrices

- Matrix of size (2,2)

In [65]:
m = [1 2; 3 4]

2×2 Array{Int64,2}:
 1  2
 3  4

<br>

- Matrix of size (2,5) filled with random numbers between 0 and 1

In [66]:
 m = rand(2, 5)

2×5 Array{Float64,2}:
 0.814988  0.964051  0.0977152  0.124034  0.658863
 0.427918  0.336311  0.0934309  0.589955  0.925075

<br>

- Number of dimensions, rows & columns

In [67]:
dim = ndims(m)
nrows, ncols = size(m)  # nrows = size(m, 1) & ncols = size(m, 2) 

print("m is a matrix with $dim dimensions, $nrows rows & $ncols columns")

m is a matrix with 2 dimensions, 2 rows & 5 columns

<br>

- Slices

Select the 2nd column

In [68]:
col2 = m[:, 2];  # m[1:end, 2]

Select the 2nd row

In [69]:
row2 = m[2, :];

Set every element of the 2nd row to 0

In [70]:
m[2, :] .= 0  # m[2, :] = [0 0 0 0 0]
m

2×5 Array{Float64,2}:
 0.814988  0.964051  0.0977152  0.124034  0.658863
 0.0       0.0       0.0        0.0       0.0

Transpose **m'** of **m** (m' is the adjoint, which equals the transpose for real matrices)

In [71]:
m = [1 2; 3 4]
m'

2×2 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
 1  3
 2  4

<br>

- Concatenation

In [72]:
a, b = [1 2; 3 4], [5 6; 7 8]

([1 2; 3 4], [5 6; 7 8])

Horizontal concatenation - concatenation along the 2nd dimension (columns)

In [73]:
c = hcat(a, b)  # c = [a b]

2×4 Array{Int64,2}:
 1  2  5  6
 3  4  7  8

Vertical concatenation - concatenation along the first dimension (rows)

In [74]:
c = vcat(a, b)  # c = [a; b]

4×2 Array{Int64,2}:
 1  2
 3  4
 5  6
 7  8

<br>

> Tuples

In [75]:
x, x2 = (1, 2, 3), (1, "a", 0.5)
print("x = $x - $(typeof(x)) & x2 = $x2 - $(typeof(x2))")

x = (1, 2, 3) - Tuple{Int64,Int64,Int64} & x2 = (1, "a", 0.5) - Tuple{Int64,String,Float64}

In [76]:
seq = Tuple("ATGCCGCGAT")

('A', 'T', 'G', 'C', 'C', 'G', 'C', 'G', 'A', 'T')

In [77]:
for nucleotide in seq
    print("$nucleotide ")
end

A T G C C G C G A T 

In [78]:
t = (nourriture="tarte", item=["flour", "salt", "pepper"], salade="concombre")

println("t = $t")
print("item = $(t.item)")

t = (nourriture = "tarte", item = ["flour", "salt", "pepper"], salade = "concombre")
item = ["flour", "salt", "pepper"]

<br>

> Dictionaries

In [79]:
dico = Dict("France" => "Paris", "Russia" => "Moscow", "South Africa" => ["Cape Town", "Pretoria"])

Dict{String,Any} with 3 entries:
  "South Africa" => ["Cape Town", "Pretoria"]
  "France"       => "Paris"
  "Russia"       => "Moscow"

Dict{String,Any} because all keys are of type String, whereas the values are either String or Array

In [80]:
dico2 = Dict(
    "France" => "Paris", "Russia" => "Moscow", 
    "South Africa" => Dict("Administrative" => "Pretoria", "Législative" => "Cape Town")
)

Dict{String,Any} with 3 entries:
  "South Africa" => Dict("Administrative"=>"Pretoria","Législative"=>"Cape Town…
  "France"       => "Paris"
  "Russia"       => "Moscow"

In [81]:
pays = ["France", "Russia", "South Africa"]
capital = ["Paris", "Moscow", Dict("Administrative" => "Pretoria", "Législative" => "Cape Town")]

dico3 = Dict(zip(pays, capital))

Dict{String,Any} with 3 entries:
  "South Africa" => Dict("Administrative"=>"Pretoria","Législative"=>"Cape Town…
  "France"       => "Paris"
  "Russia"       => "Moscow"

<br>

In [82]:
dico2["Russia"]

"Moscow"

In [83]:
haskey(dico2, "Russia"), haskey(dico2, "US")

(true, false)

In [84]:
get(dico2, "Russia", "null"), get(dico2, "US", nothing)

("Moscow", nothing)

Russia is present so Moscow is returned, but US is absent so the default value nothing is returned

In [85]:
dico2["France"] = "Moscow"
dico2

Dict{String,Any} with 3 entries:
  "South Africa" => Dict("Administrative"=>"Pretoria","Législative"=>"Cape Town…
  "France"       => "Moscow"
  "Russia"       => "Moscow"

<br>

In [86]:
for pays in keys(dico2)
    print("$pays  ")
end

South Africa  France  Russia  

In [87]:
for capital in values(dico2)
    print("$capital  ")
end

Dict("Administrative" => "Pretoria","Législative" => "Cape Town")  Moscow  Moscow  

In [88]:
for (pays, capital) in dico2
    println("$pays: $capital")
end

South Africa: Dict("Administrative" => "Pretoria","Législative" => "Cape Town")
France: Moscow
Russia: Moscow


<br>

In [89]:
python"""
dico = {'France': "Paris", 'Russia': "Moscow", 'South Africa': ["Cape Town", "Pretoria"]}

pays = ["France", "Russia", "South Africa"]
capital = ["Paris", "Moscow", {"Administrative": "Pretoria", "Législative": "Cape Town"}]
dico2 = dict(zip(pays, capital))

print("{}\n{}".format(dico, dico2), end="\n\n")

print("KEYS")
for pays in dico2.keys():
    print("{}  ".format(pays), end="")

print("\n\nVALUES")
for capital in dico2.values():
    print("{}  ".format(capital), end="")

print("\n\nITEMS")
for pays, capital in dico2.items():
    print("{}: {}".format(pays, capital))
"""

{'France': 'Paris', 'Russia': 'Moscow', 'South Africa': ['Cape Town', 'Pretoria']}
{'France': 'Paris', 'Russia': 'Moscow', 'South Africa': {'Administrative': 'Pretoria', 'Législative': 'Cape Town'}}

KEYS
France  Russia  South Africa  

VALUES
Paris  Moscow  {'Administrative': 'Pretoria', 'Législative': 'Cape Town'}  

ITEMS
France: Paris
Russia: Moscow
South Africa: {'Administrative': 'Pretoria', 'Législative': 'Cape Town'}


<br>

> Sets - as in Python

In [90]:
s, s2 = Set([1, 1, 25]), Set([7, 25])
push!(s, 1); push!(s, 13)
s, s2

(Set([25, 13, 1]), Set([7, 25]))

In [91]:
println("Union: $(union(s, s2))")
println("Intersect: $(intersect(s, s2))")
println("Diff (s s2): $(setdiff(s, s2)) & Diff (s2 s): $(setdiff(s2, s))")

Union: Set([7, 25, 13, 1])
Intersect: Set([25])
Diff (s s2): Set([13, 1]) & Diff (s2 s): Set([7])
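
The Python counterpart, as a sketch using the set operators `|`, `&` and `-`:

```python
s, s2 = {1, 1, 25}, {7, 25}
s.add(1)     # already present: the set is unchanged
s.add(13)

print("Union: {}".format(s | s2))
print("Intersect: {}".format(s & s2))
print("Diff (s s2): {} & Diff (s2 s): {}".format(s - s2, s2 - s))
```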


<br>

# More on Types, Methods and Modules

In [92]:
a, b = 1, 1.5
c = a + b

print("a: $a - $(typeof(a)) & b: $b - $(typeof(b))\nc: $c - $(typeof(c))")

a: 1 - Int64 & b: 1.5 - Float64
c: 2.5 - Float64

Implicitly, the **+** operator first calls **promote**, which converts a & b to a common type (here Float64), then performs the addition

In [93]:
promote(a, b)

(1.0, 1.5)

<br>

- Defining your own types

In [94]:
struct UnmutablePoint
    x::Float64
    y::Float64
    z::Float64
end
p = UnmutablePoint(1, 4, 2)  # the Int arguments are converted to Float64

println("$p is of type $(typeof(p))")
p.x = 255

UnmutablePoint(1.0, 4.0, 2.0) is of type UnmutablePoint


ErrorException: setfield! immutable struct of type UnmutablePoint cannot be changed

In [95]:
mutable struct Point
    x::Float64
    y::Float64
    z::Float64
end
p = Point(1, 4, 2)  # the Int arguments are converted to Float64
p.x = 255

println("$p is of type $(typeof(p))")

Point(255.0, 4.0, 2.0) is of type Point


mutable indicates that the field values of a Point can be modified after construction
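
A comparable distinction can be sketched in Python with dataclasses, assuming `frozen=True` as the stand-in for an immutable struct. Unlike Julia, the dataclass does not convert the Int arguments to floats.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class UnmutablePoint:      # fields cannot be reassigned after construction
    x: float
    y: float
    z: float

@dataclass
class Point:               # fields can be reassigned
    x: float
    y: float
    z: float

p = Point(1, 4, 2)
p.x = 255                  # allowed

q = UnmutablePoint(1, 4, 2)
try:
    q.x = 255
except FrozenInstanceError:
    print("UnmutablePoint cannot be changed")
print(p, q)
```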

<br>

Comparing values & objects

- Values

In [96]:
5 == 5, 5 == 5.0, isequal(5, 5), isequal(5, 5.0)

(true, true, true, true)

- Objects

In [97]:
5 === 5, 5 === 5.0

(true, false)

<br>

- Parametric types and methods

Parametric type: x & y can be of any type T (Int, Float64, String, etc.), but both must share the same type

In [98]:
mutable struct Point2{T}
    x::T
    y::T
end

p, p2, p3 = Point2(2, 5), Point2(2.5, 5.5), Point2("Hello", "World")

(Point2{Int64}(2, 5), Point2{Float64}(2.5, 5.5), Point2{String}("Hello", "World"))

T can however be restricted to the subtypes of a given type, here **Real**

In [99]:
mutable struct Point3{T <: Real}
    x::T
    y::T
end

p, p2 = Point3(2, 5), Point3(2.5, 5.5)
print("$p & $p2")
Point3("Hello", "World")

Point3{Int64}(2, 5) & Point3{Float64}(2.5, 5.5)

MethodError: MethodError: no method matching Point3(::String, ::String)

<br>

# Metaprogramming in Julia

In [100]:
dump(:(2 + 3 * 4))

Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol +
    2: Int64 2
    3: Expr
      head: Symbol call
      args: Array{Any}((3,))
        1: Symbol *
        2: Int64 3
        3: Int64 4
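
Python exposes a similar view of parsed code through its `ast` module. The sketch below parses the same expression; the exact text printed by `ast.dump` varies between Python versions, so no output is shown.

```python
import ast

tree = ast.parse("2 + 3 * 4", mode="eval")
expr = tree.body            # outer node: the addition

print(ast.dump(expr))
print(type(expr.op).__name__, type(expr.right.op).__name__)
```

As in the Julia dump, the multiplication is nested inside the addition because it binds more tightly.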


<br>

- Macros

    - **Functions**: take input values and return computed values at run time
    - **Macro**: takes input expressions and returns modified expressions at parse time

In [101]:
function fname()  # code taking input values
    # code returning computed values
end

macro mname()  # code taking input expressions
    # code returning modified expression
end

@mname (macro with 1 method)

In [102]:
macro assert(ex)
    :($ex ? nothing : error("Assertion failed: ", $(string(ex)) ))  # ternary operator
end

@assert(1 == 1.0), @assert(1 == 42)

ErrorException: Assertion failed: 1 == 42

1. 1 == 1.0, the condition is **true** so nothing is returned

2. 1 == 42, the condition is **false** so the error is raised with the message

In [103]:
macroexpand(Main, :(@assert 1 == 42))

:(if 1 == 42
      Main.nothing
  else
      Main.error("Assertion failed: ", "1 == 42")
  end)

<br>

# I/O, Networking, and Parallel Computing

- Basic input and output

In [104]:
#n = readline()
#print("$n")

In [105]:
write(stdout, "Julia")

Julia

5

5 is the number of bytes written to the output stream

<br>

- **Reading a file** - fasta file

In [106]:
tmp = "aaa "
strip(tmp)

"aaa"

In [107]:
function read_fasta(fichier)
    seq = Array{String, 1}()
    
    open(fichier, "r") do filin
        tmp = ""
        for line in readlines(filin)
            if !startswith(line, ">")
                tmp *= strip(line)
            else
                if length(tmp) > 0
                    push!(seq, tmp)
                end
                tmp = ""
            end
        end
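        # NOTE: the sequence after the last header is never pushed; add push!(seq, tmp) here to keep it
    end  # the do block closes the file automatically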
    
    return seq
end

sequences = read_fasta("./data/fasta/test_sequences.fasta");
println(sequences[1])
print(length(sequences))

TGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGTGTTGTGGTTAATAACCGCAGCAATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGGAAAGCGCA
11

<br>

In [108]:
python"""
def read_fasta(fichier):
    seq = []
    
    with open(fichier, "r") as filin:
        tmp = ""
        for line in filin:
            if not line.startswith(">"):
                tmp += line.strip()
            else:
                if len(tmp) > 0:
                    seq.append(tmp)
                tmp = ""
    return seq

sequences = read_fasta("./data/fasta/test_sequences.fasta")
print(sequences[0])
print(len(sequences))
"""

TGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGTGTTGTGGTTAATAACCGCAGCAATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGGAAAGCGCA
11
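
Both readers push a sequence only when the next header line is met, so the record following the last header is not returned. A minimal sketch of a variant that also keeps it; `read_fasta_all` and the small file written below are made up for this illustration.

```python
import os
import tempfile

def read_fasta_all(fichier):
    """Return every sequence of a fasta file, including the last record."""
    seq, tmp = [], ""
    with open(fichier, "r") as filin:
        for line in filin:
            if line.startswith(">"):
                if tmp:
                    seq.append(tmp)
                tmp = ""
            else:
                tmp += line.strip()
    if tmp:                 # flush the record that follows the last header
        seq.append(tmp)
    return seq

# Tiny made-up file, only for illustration
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as f:
    f.write(">seq1\nATGC\nGGTA\n>seq2\nTTAA\n")
    path = f.name

demo = read_fasta_all(path)
os.remove(path)
print(demo)
```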


<br>

<br>

Reading a file with a generator

In [109]:
function load_fasta(fichier)
   
    open(fichier, "r") do filin
        seq = ""
        
        Channel() do channel  # equivalent to a generator in Python (yield)
            for line in readlines(filin)
                if !startswith(line , ">")
                    seq *= strip(line)
                else
                    if length(seq) > 0
                        put!(channel, seq)
                    end
                    seq = ""
                end
            end
        end
        
    end  # the do block closes the file automatically
end

sequences = load_fasta("./data/fasta/test_sequences.fasta")

Channel{Any}(sz_max:0,sz_curr:1)

To iterate over the sequences

In [110]:
#for seq in sequences
#    println(seq)
#end

<br>

In [111]:
python"""
def load_fasta(fichier):
    
    with open(fichier, "r") as filin:
        seq = ""
        for line in filin:
            if not line.startswith(">"):
                seq += line.strip()
            else:
                if len(seq) > 0:
                    yield seq
                seq = ""

sequences = load_fasta("./data/fasta/test_sequences.fasta")
"""

<br>

<br>

- **Writing to a file** - fasta file

In [112]:
function fasta_format(text, width=80)  # Split text to respect fasta format
    seq = Array{String, 1}()

    for i in 1:width:length(text)
        if i+width > length(text)
            push!(seq, text[i:end])
        else
            push!(seq, text[i:i+width-1])
        end
    end
    
    return join(seq, "\n")
end


function save_fasta(sequences)
    
    open("./data/fasta/test_write_julia.fasta", "w") do filout
        tmp = 0
        for seq in sequences
            write(filout, ">Seq_$tmp\n")
            write(filout, fasta_format(seq))
            write(filout, "\n")
            tmp += 1
        end
    end  # the do block closes the file automatically
    
end

sequences = load_fasta("./data/fasta/test_sequences.fasta")
save_fasta(sequences)

<br>

In [113]:
python"""
def load_fasta(fichier):
    
    with open(fichier, "r") as filin:
        seq = ""
        for line in filin:
            if not line.startswith(">"):
                seq += line.strip()
            else:
                if len(seq) > 0:
                    yield seq
                seq = ""


def fasta_format(text, width=80):
    seq = [text[i:i+width] for i in range(0, len(text), width)]
    return "\n".join(seq)


def save_fasta(sequences):
    with open("./data/fasta/test_write_python.fasta", "w") as filout:
        for i, seq in enumerate(sequences):
            filout.write(">Seq_{}\n".format(i))
            filout.write("{}\n".format(fasta_format(seq)))


sequences = load_fasta("./data/fasta/test_sequences.fasta")
save_fasta(sequences)
"""

<br>

<br>

- **CSV files** - DelimitedFiles package

In [114]:
using DelimitedFiles

In [142]:
csv = DelimitedFiles.readdlm("./data/csv/transferrine.csv", ',', header=true)

(Any["1A8E" "Homo sapiens" "1998-03-24" 329; "1A8F" "Homo sapiens" "1998-03-25" 329; "1AIV" "Gallus gallus" "1997-04-28" 686], AbstractString["PDB_ID" "Source" "Deposit_Date" "Length"])

    - The header

In [143]:
header = csv[2]  # the header

1×4 Array{AbstractString,2}:
 "PDB_ID"  "Source"  "Deposit_Date"  "Length"

    - The data

In [144]:
data = csv[1]  # the data

3×4 Array{Any,2}:
 "1A8E"  "Homo sapiens"   "1998-03-24"  329
 "1A8F"  "Homo sapiens"   "1998-03-25"  329
 "1AIV"  "Gallus gallus"  "1997-04-28"  686

In [118]:
data[3,:]  # the 3rd row

4-element Array{Any,1}:
    "1AIV"
    "Gallus gallus"
    "1997-04-28"
 686

In [119]:
data[:,1]  # the 1st column, i.e. PDB ID

3-element Array{Any,1}:
 "1A8E"
 "1A8F"
 "1AIV"

In [120]:
mat = [data[:,1] data[:,3:end]]  # build a matrix from the 1st column and the last 2 columns

3×3 Array{Any,2}:
 "1A8E"  "1998-03-24"  329
 "1A8F"  "1998-03-25"  329
 "1AIV"  "1997-04-28"  686

<br>

In [121]:
struct Transferrine
    pdb_id::String
    source::String
    deposit_date::String
    length::Int
end

t1 = Transferrine(data[1,:]...)  # the splat operator ... unpacks the row into separate arguments
t2 = Transferrine(data[2,:]...)

t1, t1.pdb_id, t2

(Transferrine("1A8E", "Homo sapiens", "1998-03-24", 329), "1A8E", Transferrine("1A8F", "Homo sapiens", "1998-03-25", 329))
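
In Python, the role of the splat operator is played by `*`. A sketch with a `namedtuple` standing in for the struct; the row is the first data row of the CSV, typed in by hand.

```python
from collections import namedtuple

Transferrine = namedtuple("Transferrine", ["pdb_id", "source", "deposit_date", "length"])

row = ["1A8E", "Homo sapiens", "1998-03-24", 329]
t1 = Transferrine(*row)    # * unpacks the list into positional arguments

print(t1, t1.pdb_id)
```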

<br>

In [145]:
function save_csv(fichier, header, data)
    
    open(fichier, "w") do filout
        header = join(tuple(header...), ",")
        write(filout, header, "\n")
        for i in 1:size(data)[1]
            row = join(tuple(data[i,:]...), ",")
            write(filout, row, "\n")
        end
    end
    
end

save_csv("./data/csv/csv_write_julia.csv", header, data)

<br>

In [149]:
python"""
import csv

def load_csv(fichier):
    data = {'PDB_ID': [], 'Source': [], 'Deposit_Date': [], 'Length': []}

    with open(fichier) as filin:
        f_reader = csv.DictReader(filin)
        for row in f_reader:
            for key in data:
                data[key].append(row[key])
    
    return data

data = load_csv("./data/csv/transferrine.csv")
print(data)


def save_csv(fichier, data):
    
    with open(fichier, "w") as filout:
        fields = list(data.keys())
        f_writer = csv.DictWriter(filout, fieldnames=fields)
        f_writer.writeheader()
    
        for i in range(len(data['PDB_ID'])):
            tmp = {
                'PDB_ID': data['PDB_ID'][i],
                'Source': data['Source'][i],
                'Deposit_Date': data['Deposit_Date'][i],
                'Length': data['Length'][i]
            }
            f_writer.writerow(tmp)

save_csv("./data/csv/csv_write_python.csv", data)
"""

{'PDB_ID': ['1A8E', '1A8F', '1AIV'], 'Source': ['Homo sapiens', 'Homo sapiens', 'Gallus gallus'], 'Deposit_Date': ['1998-03-24', '1998-03-25', '1997-04-28'], 'Length': ['329', '329', '686']}


<br>

<br>

- **Using DataFrames**

In [128]:
using DataFrames

In [129]:
df = DataFrame(Col1 = 1:4, Col2 = [MathConstants.e, pi, sqrt(2), 42], 
    Col3 = [true, false, true, false])

| Row | Col1 | Col2 | Col3 |
|-----|------|------|------|
|     | Int64 | Float64 | Bool |
| 1 | 1 | 2.71828 | 1 |
| 2 | 2 | 3.14159 | 0 |
| 3 | 3 | 1.41421 | 1 |
| 4 | 4 | 42.0 | 0 |


    - Column selection
    
Selecting a column with a single integer index (df[2]) is deprecated, but it still works in the DataFrames version used here

In [143]:
show(df[2])

[2.718281828459045, 3.141592653589793, 1.4142135623730951, 42.0]

│   caller = top-level scope at In[143]:1
└ @ Core In[143]:1


In [144]:
show(df.Col2)  # df[!,:Col2]

[2.718281828459045, 3.141592653589793, 1.4142135623730951, 42.0]

    - Row selection

In [148]:
df[1,:]

| Row | Col1 | Col2 | Col3 |
|-----|------|------|------|
|     | Int64 | Float64 | Bool |
| 1 | 1 | 2.71828 | 1 |


In [151]:
df[1:2,:Col2]  # select the 2nd column of the first two rows

2-element Array{Float64,1}:
 2.718281828459045
 3.141592653589793

In [153]:
df[1:2,[:Col2, :Col3]]  # select the 2nd & 3rd columns of the first two rows

| Row | Col2 | Col3 |
|-----|------|------|
|     | Float64 | Bool |
| 1 | 2.71828 | 1 |
| 2 | 3.14159 | 0 |


    - Head and tail - head and tail are deprecated; use first and last instead

In [157]:
first(df, 2)

| Row | Col1 | Col2 | Col3 |
|-----|------|------|------|
|     | Int64 | Float64 | Bool |
| 1 | 1 | 2.71828 | 1 |
| 2 | 2 | 3.14159 | 0 |


In [159]:
last(df, 2)

| Row | Col1 | Col2 | Col3 |
|-----|------|------|------|
|     | Int64 | Float64 | Bool |
| 1 | 3 | 1.41421 | 1 |
| 2 | 4 | 42.0 | 0 |


    - Useful methods

In [169]:
names(df), eltype.(eachcol(df))

(["Col1", "Col2", "Col3"], DataType[Int64, Float64, Bool])

In [170]:
describe(df)

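| Row | variable | mean | min | median | max | nunique | nmissing | eltype |
|-----|----------|------|-----|--------|-----|---------|----------|--------|
|     | Symbol | Float64 | Real | Float64 | Real | Nothing | Nothing | DataType |
| 1 | Col1 | 2.5 | 1.0 | 2.5 | 4.0 |  |  | Int64 |
| 2 | Col2 | 12.3185 | 1.41421 | 2.92994 | 42.0 |  |  | Float64 |
| 3 | Col3 | 0.5 | 0.0 | 0.5 | 1.0 |  |  | Bool |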


    - CSV save & read - packages DataFrames & CSV

In [150]:
using CSV
using DataFrames

In [151]:
csv = CSV.read("./data/csv/transferrine.csv", delim=",", header=true)

| Row | PDB_ID | Source | Deposit_Date | Length |
|-----|--------|--------|--------------|--------|
|     | String | String | Date… | Int64 |
| 1 | 1A8E | Homo sapiens | 1998-03-24 | 329 |
| 2 | 1A8F | Homo sapiens | 1998-03-25 | 329 |
| 3 | 1AIV | Gallus gallus | 1997-04-28 | 686 |


In [161]:
show(csv.PDB_ID), show(csv[!,:PDB_ID])
csv[1, [:PDB_ID, :Length]]

["1A8E", "1A8F", "1AIV"]["1A8E", "1A8F", "1AIV"]

| Row | PDB_ID | Length |
|-----|--------|--------|
|     | String | Int64 |
| 1 | 1A8E | 329 |


In [177]:
combine(csv -> size(csv, 1), groupby(csv, :Source, sort=false, skipmissing=true))

| Row | Source | x1 |
|-----|--------|----|
|     | String | Int64 |
| 1 | Homo sapiens | 2 |
| 2 | Gallus gallus | 1 |
2,Gallus gallus,1


<br>

In [184]:
CSV.write("./data/csv/csv_write_julia.csv", csv, delim=",", header=true)

"./data/csv/csv_write_julia.csv"

<br>

# Performance tips

    - Free memory

In [188]:
csv = nothing 

    - Array operation

In [191]:
x = [1, 2, 3, 4]
x .^ 2  # broadcasting: concise and typically at least as fast as the comprehension below
[ele^2 for ele in x];