<h1>High performance computing with Julia for data analysis</h1>


<p>
This notebook explains the basic concepts of Julia, a high-performance programming language for data analysis.
Data types, operators, functions, vectors, arrays, and data frames are covered.<br>
This project relies heavily on the official documentation:<br>
<a href="https://docs.julialang.org/en/v1/" target=_blank>docs.julialang.org </a>
</p>

<p>
"Scientific computing has traditionally required the highest performance, yet domain experts have largely moved to slower dynamic languages for daily work. We believe there are many good reasons to prefer dynamic languages for these applications, and we do not expect their use to diminish. Fortunately, modern language design and compiler techniques make it possible to mostly eliminate the performance trade-off and provide a single environment productive enough for prototyping and efficient enough for deploying performance-intensive applications. The Julia programming language fills this role: it is a flexible dynamic language, appropriate for scientific and numerical computing, with performance comparable to traditional statically-typed languages."
</p>

<p>
The use cases, pros, and cons of Julia are discussed here:<br>
<a href="https://www.datacamp.com/blog/what-is-julia-used-for">
datacamp.com/blog/what-is-julia-used-for
</a>
</p>

# Packages

<p>
import loads a package, but its names must then be qualified with the package name, as in:<br>
DataFrames.DataFrame.<br>
With using, the exported names are available directly, so DataFrame alone is enough.
</p>
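The difference can be sketched with two standard-library packages. The choice of Statistics and LinearAlgebra, and the variable names, are only for illustration.

```julia
import Statistics      # loads the package; names must be qualified
using LinearAlgebra    # loads the package and brings exported names into scope

# Qualified call: the package name is required after import
average = Statistics.mean([345, 567, 454])

# Unqualified call: norm is exported by LinearAlgebra
vector_length = norm([3, 4])

println(average)
println(vector_length)
```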

In [1239]:
import Statistics
import DataFrames
import LinearAlgebra
using CSV

Test: <b>Does package import work?</b>

In [1240]:
Statistics.mean([345, 567, 454])

455.3333333333333

### Current working directory

In [1241]:
# current_working_directory = pwd()
# Change current working directory like in Bash
# cd(path)

In [1242]:
## Julia basics

### Comments

#### One line comments

In [1243]:
# This is a one line comment

In [1244]:
#=

This is a multiline comment.
Multiline comments are very useful.

=#

# Coding

### Print

In [1245]:
println("No one can say where the bones of Machiavelli rest, but modern Florence has decreed him a stately cenotaph in Santa Croce, by the side of her most famous sons.")

No one can say where the bones of Machiavelli rest, but modern Florence has decreed him a stately cenotaph in Santa Croce, by the side of her most famous sons.


### Store info in a variable

<p>
<a href = "https://docs.julialang.org/en/v1/manual/variables/">
Stylistic Conventions
</a>

"While Julia imposes few restrictions on valid names, it has become useful to adopt the following conventions:
<ul>
<li>Names of variables are in lower case.</li>
<li>Word separation can be indicated by underscores ('_'), but use of underscores is discouraged unless the name would be hard to read otherwise.</li>
<li>Names of Types and Modules begin with a capital letter and word separation is shown with upper camel case instead of underscores.</li>
<li>Names of functions and macros are in lower case, without underscores."</li>
</ul>
</p>

In [1246]:
y = 100
println(y)

100


In [1247]:
# multiply

y_3 = y*3
println(y_3)

300


#### Adding two objects

In [1248]:
dm = 11

dt = 11 + 4
println(dt)

dm += 4
println(dm)

15
15


#### Some operators

<p>
The comparison operators work as in most other languages.
</p>


In [1249]:
4 > 5

false

In [1250]:
43 == 43

true

In [1251]:
45 >= 23

true

<p>Not equal</p>

In [1252]:
7 != 3

true

#### Not with inversion

In [1253]:
5 == 5

true

In [1254]:
~(5 == 5)

false

In [1255]:
#### And operator

In [1256]:
(1 == 1)

true

In [1257]:
(3 > 4)

false

In [1258]:
(1 == 1) & (3 > 4)

false

#### Or operator

In [1259]:
(1 == 1) | (3 > 4)

true
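Besides & and |, Julia also has the short-circuit forms && and ||, which skip the right-hand side when the result is already decided. A minimal sketch follows; the helper check and the vector calls are hypothetical names used only to record what gets evaluated.

```julia
calls = String[]

# Hypothetical helper: records its label, then returns the given Bool
function check(label, result)
    push!(calls, label)
    return result
end

# & evaluates both sides
r1 = check("a", false) & check("b", true)

# && stops after the left side because the result is already false
r2 = check("c", false) && check("d", true)

println(r1, " ", r2)
println(calls)
```

The recorded labels show that "d" was never evaluated, which matters when the right-hand side is expensive or only valid if the left-hand side holds.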

#### In operator

In [1260]:
ad = (50, 100, 1965)

50 in ad

true

#### XOR gate

<p>
XOR gate (sometimes EOR, or EXOR and pronounced as Exclusive OR) is a digital logic gate that gives a true <br>
(1 or HIGH) output when the number of true inputs is odd.
</p>

In [1261]:
(5 != 5) 

false

In [1262]:
(5 > 5)

false

In [1263]:
(6 == 6)

true

In [1264]:
(8 > 5)

true

#### Number of true inputs is odd

<p>One true input.</p>

In [1265]:
(5 != 5)  ⊻ (5 > 5) ⊻ (6 == 6)

true

<p>Three true inputs.</p>

In [1266]:
(5 != 5)  ⊻ (5 > 5) ⊻ (6 == 6) ⊻ (8 > 5) ⊻ (4 < 9)

true

In [1267]:
#### Number of true inputs is even (2)

In [1268]:
(5 != 5)  ⊻ (5 > 5) ⊻ (6 == 6) ⊻ (8 > 5)

false

#### The pipe operator

In [1269]:

ad |> (t -> sum(t))

2115
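Several steps can be chained with the pipe operator. The following sketch reuses the tuple from above; the variable names are made up for illustration.

```julia
ad = (50, 100, 1965)

# Each function receives the result of the previous step
digits_in_sum = ad |> sum |> string |> length

# Anonymous functions can be used for steps that need extra arguments
rounded_mean = ad |> (t -> sum(t) / length(t)) |> (m -> round(m, digits=1))

println(digits_in_sum)
println(rounded_mean)
```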

### Calculating

In [1270]:
#=  
Acceleration
a = (v - u) / t
=#

v=500
u=150
t=20

# Parentheses are required: division binds tighter than subtraction
a = (v - u) / t
println("Acceleration: ", a)

Acceleration: 17.5


In [1271]:
#=
Density
p=m/v
=#

m=133
v=987
p = m / v
println("Density: ", p)

Density: 0.1347517730496454


In [1272]:
#= 
Newton’s Second Law
f=m*a
=#

f=m*a
println("Force: ", f)

Force: 2327.5


In [1273]:
#=
Kinetic energy
e=1/2 * m*v^2
=#


e=1/2 * m*v^2
println("Kinetic energy: ", e)

Kinetic energy: 6.47822385e7
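The formulas in this section depend on operator precedence: / and * bind tighter than + and -, and ^ binds tighter than *. A small sketch with the values used above; the function name acceleration is an assumption and not part of the notebook.

```julia
# Hypothetical helper for a = (v - u) / t
acceleration(v, u, t) = (v - u) / t

with_parentheses = acceleration(500, 150, 20)
without_parentheses = 500 - 150 / 20   # parsed as 500 - (150 / 20)

# ^ is applied before *, so this is 0.5 * m * (v^2)
kinetic_energy = 1/2 * 133 * 987^2

println(with_parentheses)
println(without_parentheses)
println(kinetic_energy)
```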


### Data types

<a href="https://docs.julialang.org/en/v1/manual/types/">doc</a>

<p>
"Type systems have traditionally fallen into two quite different camps: static type systems, where every program expression must have a type computable before the execution of the program, and dynamic type systems, where nothing is known about types until run time, when the actual values manipulated by the program are available. Object orientation allows some flexibility in statically typed languages by letting code be written without the precise types of values being known at compile time. The ability to write code that can operate on different types is called polymorphism. All code in classic dynamically typed languages is polymorphic: only by explicitly checking types, or when objects fail to support operations at run-time, are the types of any values ever restricted.
</p>
    
<p>
Julia's type system is dynamic, but gains some of the advantages of static type systems by making it possible to indicate that certain values are of specific types. This can be of great assistance in generating efficient code, but even more significantly, it allows method dispatch on the types of function arguments to be deeply integrated with the language. "
</p>
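The method dispatch mentioned in the quote can be illustrated with a small sketch. The function describe is a hypothetical example; Julia picks the method whose type annotation matches the argument.

```julia
# One function name, several methods chosen by argument type
describe(x::Integer) = "an integer"
describe(x::AbstractFloat) = "a floating-point number"
describe(x::AbstractString) = "a string"
describe(x) = "something else"   # fallback for all other types

println(describe(110))
println(describe(9.87))
println(describe("Julia"))
println(describe([1, 2]))
```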


In [1274]:
int_i = 110
print(typeof(int_i))

Int64

In [1275]:
float_f = 9.87
print(typeof(float_f))

Float64

In [1276]:
bool_b = false
println(typeof(bool_b))

Bool


In [1277]:
string_s = "Type systems have traditionally fallen into two quite different camps: static type systems."
println(typeof(string_s))

String


### Mixing up data types

In [1278]:
# Ohm's law: V = I * R

i = 3
r = 4

volt_1 = i * r

println(volt_1)
println(typeof(volt_1))



12
Int64


In [1279]:
# Int64 * Float64 = Float64
i_2= 3.2

volt_2 = i_2 * r

print(volt_2)
println(typeof(volt_2))

12.8Float64


In [1280]:
dynamic = "Julia's type system is dynamic," 
advantages = "but gains some of the advantages of static type systems."

dynamic_advantages = dynamic * advantages

"Julia's type system is dynamic,but gains some of the advantages of static type systems."

#### Using the $ sign in a string to interpolate a variable

In [1281]:
println("Explain the type system of Julia? $dynamic $advantages")

Explain the type system of Julia? Julia's type system is dynamic, but gains some of the advantages of static type systems.


In [1282]:
# frequency = velocity * wavelength

velocity = "2.9"
wavelength = 0.6

#frequency = velocity * wavelength
# Throws an error: MethodError: no method matching *(::String, ::Float64)

velocity_str = "2.9"
wavelength_str = "0.6 nm"

frequency_2 = velocity_str * wavelength_str

print(frequency_2)
print(typeof(frequency_2))

2.90.6 nmString

### Converting data types

#### Float to int

In [1283]:
t = 3.44
println(typeof(t))

# d = Int64(t)
# println(typeof(d))
# Throws an error: InexactError: Int64(3.44)

u = 3.0
println(u)
u_2 = Int64(u)
println("Data type:", typeof(u_2), ":", u_2)

Float64
3.0
Data type:Int64:3


In [1284]:
# Using the convert func
ttt= convert(Int64, u)

3
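Int64(3.44) fails because the value has a fractional part. When dropping the fraction is intended, the rounding functions accept a target type as their first argument. A short sketch with made-up variable names:

```julia
t = 3.44

nearest = round(Int, t)    # nearest integer
down = floor(Int, t)       # towards minus infinity
up = ceil(Int, t)          # towards plus infinity
cut = trunc(Int, -3.7)     # towards zero

println((nearest, down, up, cut))
```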

#### Int to float

In [1285]:
z = 12
println(typeof(z), ": ",z)

k = Float64(z)
println(typeof(k),": ", k)

Int64: 12
Float64: 12.0


#### Int to Str

In [1286]:
n = 23
mm = string(n)
println(typeof(mm), ": ", mm)

String: 23


#### Str to int

In [1287]:
jj = parse(Int64, mm)
print(typeof(jj), ": ", jj)

Int64: 23
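parse also works for floats, which resolves the earlier MethodError for a String multiplied by a Float64. When the text may not be a valid number, tryparse returns nothing instead of throwing an error. A minimal sketch:

```julia
velocity = parse(Float64, "2.9")
wavelength = 0.6
frequency = velocity * wavelength

# tryparse returns nothing when the text is not a valid number
bad = tryparse(Float64, "0.6 nm")
good = tryparse(Int64, "23")

println(frequency)
println(bad === nothing)
println(good)
```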

## String manipulation in Julia

<p>
String literals use double quotes, and their length is not limited.<br>
Single quotes are reserved for single characters (type Char), so a multi-character text in single quotes throws an error.
</p>

In [1288]:
machi_lesson_1 = "All states, all powers, that have held and hold rule over men have been and are either republics or principalities."

println(machi_lesson_1)
print(typeof(machi_lesson_1))

All states, all powers, that have held and hold rule over men have been and are either republics or principalities.
String

<p>
A regular double-quoted string can span several lines; the line breaks become part of the string.
</p>

In [1289]:
dom_3 ="
We have in Italy, for example, the Duke of Ferrara, who could not have withstood the 
attacks of the Venetians in ’84, 
nor those of Pope Julius in ’10, unless he had been long established in his dominions."

println(dom_3)


We have in Italy, for example, the Duke of Ferrara, who could not have withstood the 
attacks of the Venetians in ’84, 
nor those of Pope Julius in ’10, unless he had been long established in his dominions.


<p>
Triple-quoted strings keep line breaks and strip the common leading indentation:<br>
"These types of strings have special behaviors in Julia which are helpful to create long blocks of text.<br>
Triple-quoted strings are useful to use in codes that are indented because they recognize new lines."
</p>

<a href="https://www.geeksforgeeks.org/quoted-interpolated-and-escaped-strings-in-julia/">geeks</a>
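The indentation handling can be sketched with a short function. The function name lesson is hypothetical; the point is that the text can be indented with the surrounding code without the spaces ending up in the string.

```julia
function lesson()
    # The four spaces of indentation are removed from every line
    return """
    Principalities are either hereditary,
    or they are new.
    """
end

text = lesson()
print(text)
```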

In [1290]:
machi_lesson_2 = """
                Principalities are either hereditary, in which the family has been long established; 
                or they are new. 
                The new are either entirely new, as was Milan to Francesco Sforza, or they are, as it were, 
                members annexed to the hereditary state of the prince who has acquired them, as was the kingdom of Naples 
                to that of the King of Spain. 

                """

"Principalities are either hereditary, in which the family has been long established; \nor they are new. \nThe new are either entirely new, as was Milan to Francesco Sforza, or they are, as it were, \nmembers annexed to the hereditary state of the prince who has acquired them, as was the kingdom of Naples \nto that of the King of Spain. \n\n"

#### Split string into elements

In [1291]:
louis = "King Louis was brought into Italy by the ambition of the Venetians."
split(louis)

12-element Vector{SubString{String}}:
 "King"
 "Louis"
 "was"
 "brought"
 "into"
 "Italy"
 "by"
 "the"
 "ambition"
 "of"
 "the"
 "Venetians."

### String concatenation
with *

In [1292]:
dom_1 = "Such dominions thus acquired are either accustomed to live under a prince, or to live in freedom"

semi ="; "

dom_2 = "and are acquired either by the arms of the prince himself, or of others, or else by fortune or by ability."

concat_1 = dom_1*semi*dom_2

println(concat_1)
print(typeof(concat_1))

Such dominions thus acquired are either accustomed to live under a prince, or to live in freedom; and are acquired either by the arms of the prince himself, or of others, or else by fortune or by ability.
String

In [1293]:
"The Duke of Ferrara " * "who could not have withstood the" * "attacks of the Venetians in ’84"

"The Duke of Ferrara who could not have withstood theattacks of the Venetians in ’84"

#### Repeat

In [1294]:
"The Duke of Ferrara " ^ 3

"The Duke of Ferrara The Duke of Ferrara The Duke of Ferrara "

### String interpolation
with a $ sign inserts the value of a variable into the string.

In [1295]:
str_var_italy="Italy"
str_var_duke="Duke of Ferrara"
str_var_pope="Pope Julius"


dom_3 ="
We have in $str_var_italy, for example, the $str_var_duke, who could not have withstood the 
attacks of the Venetians in ’84, 
nor those of $str_var_pope in ’10, unless he had been long established in his dominions."

println(dom_3)
    


We have in Italy, for example, the Duke of Ferrara, who could not have withstood the 
attacks of the Venetians in ’84, 
nor those of Pope Julius in ’10, unless he had been long established in his dominions.


In [1296]:
old_feudal_family = "Collona"

"Principalities are either hereditary, like the $old_feudal_family."

"Principalities are either hereditary, like the Collona."

<p>
An expression wrapped in $( ) is evaluated and its result is inserted into the string.
</p>


In [1297]:

peasants_tax=15200
merchants_tax=44001
sum_taxes = peasants_tax + merchants_tax
println(sum_taxes)
println(typeof(sum_taxes))

str_var_duke_2="""The Duke of Ferrara collected $peasants_tax gold coins from the peasants and $merchants_tax gold coins from the merchants,
                which makes in sum $(peasants_tax + merchants_tax) gold coins."""



59201
Int64


"The Duke of Ferrara collected 15200 gold coins from the peasants and 44001 gold coins from the merchants,\nwhich makes in sum 59201 gold coins."

### Indexing strings
<p>
or grabbing characters from a string by position.<br>
Unlike in Python, the index starts at 1, as in R.<br>
The end keyword refers to the last index, similar to negative indices in Python.
</p>


In [1298]:
states_1="I say at once there are fewer difficulties in holding hereditary states."
println(states_1)

first_char=states_1[1]
print(first_char)

last_char=states_1[end]


I say at once there are fewer difficulties in holding hereditary states.
I

'.': ASCII/Unicode U+002E (category Po: Punctuation, other)

#### Slicing strings with colons
returns several characters at once.

In [1299]:
first_8_chars=states_1[1:8]
println(first_8_chars)

second_word=states_1[3:5]
println(second_word)

I say at
say


<p>
Select last chars from the end.
</p>

In [1300]:
last_two_words_1=states_1[end-17:end]
println(last_two_words_1)

hereditary states.


#### length() of a string

In [1301]:
len_states_1=length(states_1)

72
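length counts characters, while indexing with [] uses byte positions. For pure ASCII text such as states_1 both agree, but for text with characters like ’ or Ü they differ. A minimal sketch; the example word is an assumption chosen to show the effect.

```julia
word = "Über"

n_chars = length(word)            # number of characters
n_bytes = ncodeunits(word)        # number of bytes in UTF-8
second_index = nextind(word, 1)   # byte index of the second character

# Character-based helpers avoid invalid byte indices
first_two = first(word, 2)
last_two = last(word, 2)
chars = collect(word)             # Vector{Char}, indexable by character position

println((n_chars, n_bytes, second_index))
println(first_two, " ", last_two, " ", chars[2])
```

Here word[2] would throw a StringIndexError, because byte 2 lies inside the two-byte character Ü.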

In [1302]:
states_1[len_states_1-17:len_states_1]

"hereditary states."

### String functions

In [1303]:
uppercase("king louis")

"KING LOUIS"

In [1304]:
lowercase("KING LOUIS")

"king louis"

In [1305]:
titlecase("king louis")

"King Louis"

In [1306]:
# replace("King Louis", "Louis" => "Francis")  # pairs are written with =>, not ->

In [1307]:
rom = "King Louis yielded the Romagna to Alexander and the kingdom to Spain to avoid war."

findfirst("Spain", rom)

64:68

In [1308]:
occursin("to", rom)

true
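A few more string functions from Base, sketched on the same sentence. replace expects a Pair written with =>; the variable names are made up for illustration.

```julia
rom = "King Louis yielded the Romagna to Alexander and the kingdom to Spain to avoid war."

renamed = replace(rom, "Louis" => "Francis")   # a Pair old => new
starts = startswith(rom, "King")
ends = endswith(rom, "war.")
n_to = count("to", rom)    # number of non-overlapping matches

println(renamed)
println((starts, ends, n_to))
```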

### Slicing order data
for an armory merchant of the princes of Italy.

In [1309]:
order_1="customer: Duke of Ferrara | customer_id: 1215 | order: broad sword | quantity: 5 | order_id: 34002"

"customer: Duke of Ferrara | customer_id: 1215 | order: broad sword | quantity: 5 | order_id: 34002"

In [1310]:
customer_name=order_1[10:26]
println(customer_name)

order_id=order_1[end-5:end]
println(order_id)


 Duke of Ferrara 
 34002
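Fixed positions such as [10:26] break as soon as a name has a different length. As an alternative sketch, the record can be split at its separators and stored in a dictionary; the variable names are assumptions, and the separators " | " and ": " are taken from the string above.

```julia
order = "customer: Duke of Ferrara | customer_id: 1215 | order: broad sword | quantity: 5 | order_id: 34002"

fields = Dict{String,String}()
for part in split(order, " | ")
    # limit=2 splits only at the first ": "
    key, value = split(part, ": "; limit=2)
    fields[key] = value
end

customer_name = fields["customer"]
quantity = parse(Int64, fields["quantity"])

println(customer_name)
println(quantity)
```

Unlike the slices above, the extracted values carry no leading or trailing spaces.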


### Vectors

In [1311]:
xt = [1,2,3]

3-element Vector{Int64}:
 1
 2
 3

In [1312]:
Vector{Float64}([1,2,3])

3-element Vector{Float64}:
 1.0
 2.0
 3.0

In [1313]:
100:110

100:110

In [1314]:
c=1:2:110
println(c)

1:2:109


<p>
Repeat each element 2 times (inner) and the whole sequence 3 times (outer).
</p>

In [1315]:
repeat(xt, inner=2, outer=3)


18-element Vector{Int64}:
 1
 1
 2
 2
 3
 3
 1
 1
 2
 2
 3
 3
 1
 1
 2
 2
 3
 3

<p>
Some vector functions.
</p>

In [1316]:
q = [456, 12, 678, 756]
sort(q)

4-element Vector{Int64}:
  12
 456
 678
 756

In [1317]:
qq = [334, 45, 667, 9234]
reverse(qq)

4-element Vector{Int64}:
 9234
  667
   45
  334

In [1318]:
reverse!(qq)

4-element Vector{Int64}:
 9234
  667
   45
  334

In [1319]:
qqe = [12, 12, 12, 13, 13, 222, 222]
unique(qqe)


3-element Vector{Int64}:
  12
  13
 222

## Arrays

is the general term for data organized along one or more dimensions.

<ul>
<li>Vector: 1 dim</li>
<li>Matrix: 2 dims</li>
<li>Array: any number of dims, including more than 2</li>
</ul>

<p>
An overview of arrays is given in the doc:<br>
<a href="https://docs.julialang.org/en/v1/manual/arrays/">julialang.org</a>
</p>


<p>
Arrays are built into core Julia, and the array library "derives its performance from the compiler".<br>
This differs from Python, where arrays come from the external NumPy library.<br>
</p>

<p>
Constructing a one-dim array or vector.
</p>

In [1320]:
tax_collection=[1200, 14001, 13010, 8900]

4-element Vector{Int64}:
  1200
 14001
 13010
  8900

In [1321]:
println(eltype(tax_collection))
println(typeof(tax_collection))
println(size(tax_collection))

Int64
Vector{Int64}
(4,)


<p>
Omitting the commas creates a matrix with one row.
</p>

In [1322]:
tax_collection_2=[1200 14001 13010 8900]

1×4 Matrix{Int64}:
 1200  14001  13010  8900

In [1323]:
println(eltype(tax_collection_2))
println(typeof(tax_collection_2))
println(size(tax_collection_2))

Int64
Matrix{Int64}
(1, 4)


<p>
typeof returns the type of the data structure and the type of its elements at once.
</p>

In [1324]:
tax_debtors=["Ferrara", "Argento", "Emilia–Romagna","Comacchio"]
println(typeof(tax_debtors))

Vector{String}


<p>
Bad practice:<br>
mixing element types, although it is possible.<br>
The result is a Vector{Any}, which reduces execution speed, and Julia is all about speed.
</p>

In [1325]:
mixed_up=["Ferrara", 1000, true]


3-element Vector{Any}:
     "Ferrara"
 1000
 true

In [1326]:
print(typeof(mixed_up))

Vector{Any}
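Whether a literal ends up with a concrete element type can be checked directly. In this sketch, numbers of different types are promoted to a common type, while unrelated types fall back to Any; the variable names are made up.

```julia
ints_and_floats = [1, 2.5, 3]         # promoted to a common numeric type
mixed = ["Ferrara", 1000, true]       # no common concrete type

println(eltype(ints_and_floats))
println(eltype(mixed))

# isconcretetype tells whether the element type is a single concrete type
println(isconcretetype(eltype(ints_and_floats)))
println(isconcretetype(eltype(mixed)))
```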

<p>
Semicolons concatenate vertically, so two vectors are joined into one longer vector.
</p>

In [1327]:
t = [[1, 2, 3, 4]; [5, 6, 7, 8]]

8-element Vector{Int64}:
 1
 2
 3
 4
 5
 6
 7
 8

<p>
Creating two-dimensional arrays, or matrices.<br>
The dimensions of the array are determined by the syntax.<br>
The same eight values can be arranged as a 4x2 or a 2x4 matrix.<br>
This is important for matrix multiplication.
</p>

In [1328]:
tti = [[1, 2, 3, 4] [5, 6, 7, 8]]


4×2 Matrix{Int64}:
 1  5
 2  6
 3  7
 4  8

In [1329]:
print(typeof(tti))
println(size(tti))
println(tti)


Matrix{Int64}(4, 2)
[1 5; 2 6; 3 7; 4 8]


In [1330]:
qq = [[3, 3, 4];; [8 ,3, 9]]

3×2 Matrix{Int64}:
 3  8
 3  3
 4  9

In [1331]:
sdf = [1 2 3 4 ; 5 6 7 8]

2×4 Matrix{Int64}:
 1  2  3  4
 5  6  7  8

In [1332]:
print(typeof(sdf))
println(size(sdf))
println(sdf)

Matrix{Int64}(2, 4)
[1 2 3 4; 5 6 7 8]


In [1333]:
trt = [[1, 2, 3, 4] ;; [5, 6, 7, 8]]

4×2 Matrix{Int64}:
 1  5
 2  6
 3  7
 4  8
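The bracket syntax has function equivalents that can be easier to read. A short sketch with hcat, vcat, permutedims, and reshape from Base; the variable names are assumptions.

```julia
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

side_by_side = hcat(a, b)              # same as [a b], a 4×2 matrix
stacked = vcat(a, b)                   # same as [a; b], an 8-element vector
two_rows = permutedims(side_by_side)   # 2×4 matrix, rows and columns swapped
reshaped = reshape(stacked, 4, 2)      # fills the matrix column by column

println(size(side_by_side), " ", size(stacked), " ", size(two_rows))
```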

<p>
Concatenating elements.
</p>

In [1334]:
g =  [1:4, 5:9]
# No concat

2-element Vector{UnitRange{Int64}}:
 1:4
 5:9

In [1335]:
gt =  [1:4 ;5:9]

9-element Vector{Int64}:
 1
 2
 3
 4
 5
 6
 7
 8
 9

In [1336]:
gtz = [10:11, 100:101]

println(gtz)
println(typeof(gtz))

UnitRange{Int64}[10:11, 100:101]
Vector{UnitRange{Int64}}


In [1337]:
gto = [[10:11] [12:13] [14:15]]

1×3 Matrix{UnitRange{Int64}}:
 10:11  12:13  14:15

### Creating arrays of specific types

In [1338]:
d = Int64[5, 95, 17, 343, 2393]



5-element Vector{Int64}:
    5
   95
   17
  343
 2393

<p>
A Float64 array converts the integers to floats.
</p>

<p>
The reverse does not work for an Int64 array:<br>
i = Int64[5, 9.5, 7.1, 10, 343, 2393]<br>
InexactError: Int64(9.5)
</p>

In [1339]:
i = Float64[5, 9.5, 7.1,10, 343, 2393]

6-element Vector{Float64}:
    5.0
    9.5
    7.1
   10.0
  343.0
 2393.0

In [1340]:
i22 = Float64[5 9.5 7.11 0 343 2393]

1×6 Matrix{Float64}:
 5.0  9.5  7.11  0.0  343.0  2393.0

### Empty arrays

In [1341]:
ar_zeros = zeros(Float64, 5)

5-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0

In [1342]:
ar_zeros[1] = 6666
println(ar_zeros)

[6666.0, 0.0, 0.0, 0.0, 0.0]


In [1343]:
ar_zeros[2:4] = [100.0, 101.0, 103.0]
println(ar_zeros)

[6666.0, 100.0, 101.0, 103.0, 0.0]


### Manipulating arrays

<p>
with functions. By convention, the names of functions that modify their arguments end with an !.
The function push! appends elements to the array.
</p>
 


In [1344]:
p = [234, 4555, 668]

push!(p, 900000000)

println(p)
println(typeof(p))

[234, 4555, 668, 900000000]
Vector{Int64}


In [1345]:
push!(p, 777, 999)

6-element Vector{Int64}:
       234
      4555
       668
 900000000
       777
       999

In [1346]:
append!(p,  [1666, 1777, 1888])

9-element Vector{Int64}:
       234
      4555
       668
 900000000
       777
       999
      1666
      1777
      1888

In [1347]:
op = [222, 333, 444, 555, 666]
println(op)

remove_last_e = op[1:end-1]
println(remove_last_e)
println(op)

# pop! removes the last element from the original array and returns it
get_last_e_with_pop = pop!(op)
println(get_last_e_with_pop)
println(op)

[222, 333, 444, 555, 666]
[222, 333, 444, 555]
[222, 333, 444, 555, 666]
666
[222, 333, 444, 555]
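Other in-place functions from Base follow the same ! convention. A brief sketch on a fresh copy of the vector; first_e and sorted_copy are made-up names.

```julia
op = [222, 333, 444, 555, 666]

first_e = popfirst!(op)      # removes and returns the first element
pushfirst!(op, 111)          # adds an element at the front
deleteat!(op, 2)             # removes the element at index 2
insert!(op, 2, 999)          # inserts 999 at index 2

sorted_copy = sort(op)       # no !: op itself stays unchanged
println(op)

sort!(op, rev=true)          # with !: op is sorted in place, descending
println(op)
```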


In [1348]:
pwd

pwd (generic function with 1 method)

#### Simple array functions

In [1349]:
turk = ["The", "entire",  "monarchy", "of",  "the",  "Turk",  "is" , "governed", "by",  "one",  "lord."]

11-element Vector{String}:
 "The"
 "entire"
 "monarchy"
 "of"
 "the"
 "Turk"
 "is"
 "governed"
 "by"
 "one"
 "lord."

In [1350]:
turk_sorted = sort(turk)

println(turk)
println(turk_sorted)

["The", "entire", "monarchy", "of", "the", "Turk", "is", "governed", "by", "one", "lord."]
["The", "Turk", "by", "entire", "governed", "is", "lord.", "monarchy", "of", "one", "the"]


In [1351]:
n_23 = [889, 23, 23, 23, 12122, 4556, 78, 78]

n_23_sorted = sort(n_23)

println(n_23)
println(n_23_sorted)

[889, 23, 23, 23, 12122, 4556, 78, 78]
[23, 23, 23, 78, 78, 889, 4556, 12122]


In [1352]:
println(n_23)
println(reverse(n_23))

[889, 23, 23, 23, 12122, 4556, 78, 78]
[78, 78, 4556, 12122, 23, 23, 23, 889]


In [1353]:
unique(n_23)

5-element Vector{Int64}:
   889
    23
 12122
  4556
    78

### Scalar vector operations

<p>
The . before the + sign signals an element-wise operation.<br>
The scalar is added to each element in the vector.<br>
</p>


In [1354]:
nn = [1, 2, 3, 4, 5,6, 7, 8, 9, 10]

add_scalar = 5
# adding needs the .+ syntax
nn_sum = nn .+ add_scalar

println(nn)
println(nn_sum)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[6, 7, 8, 9, 10, 11, 12, 13, 14, 15]


In [1355]:
println(typeof(nn))
println(nn)

Vector{Int64}
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [1356]:
difference = nn .- 6

println(nn)
println(difference)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]


In [1357]:
scalar = 3
# for vector * scalar the dot is optional
product = nn * scalar

println(nn)
println(product)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[3, 6, 9, 12, 15, 18, 21, 24, 27, 30]


In [1358]:
quotient = nn ./ 2

println(nn)
println(quotient)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
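The dot syntax is not limited to arithmetic operators. As a sketch, it also applies to function calls, and the @. macro adds the dots to a whole expression.

```julia
nn = [1, 2, 3, 4]

roots = sqrt.(nn)                 # dot after the function name
squares_plus_one = nn .^ 2 .+ 1   # dots on every operator
same_with_macro = @. nn^2 + 1     # @. adds the dots automatically

println(roots)
println(squares_plus_one)
```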


### Matrix operations

<p>
The . before the +, * or - sign signals an element-wise operation on matrices or arrays; without the dot, * is matrix multiplication as defined in linear algebra.<br>
Linear algebra is fundamental for machine learning.<br>
With matrix operations, coefficients or weights stored in one table are multiplied with variables in another table.<br>
The outputs are raw predictions.<br>
The differences between these predictions and the actual values are the errors.<br>
The errors are in turn minimized, for example with the gradient descent algorithm.<br>
The goal is to find the combination of weights with the lowest error.
</p>

<a href = "https://www.mathsisfun.com/algebra/matrix-introduction.html" target=_blank>mathsisfun</a>
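The chain of weights, predictions, errors, and optimization can be sketched in a few lines. All numbers, the learning rate, and the helper mse are made up for illustration; this is one gradient descent step on a mean squared error, not a complete training routine.

```julia
# Made-up data: 3 observations, 2 variables
X = [1.0 2.0; 3.0 4.0; 5.0 6.0]
y = [-1.0, -2.0, -4.0]
w = [0.5, -1.0]                       # initial weights

# Hypothetical helper: mean squared error of the predictions X * w
mse(X, y, w) = sum((X * w .- y) .^ 2) / length(y)

predictions = X * w                   # matrix-vector product
errors = predictions .- y             # element-wise difference
error_before = mse(X, y, w)

# One gradient descent step with an arbitrary learning rate
learning_rate = 0.01
gradient = 2 / length(y) * (X' * errors)
w_new = w .- learning_rate .* gradient
error_after = mse(X, y, w_new)

println(error_before)
println(error_after)
```

Repeating the step moves the weights further towards the combination with the lowest error, provided the learning rate is small enough.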







In [1359]:
ff = [2 3]
println(typeof(ff))
println(size(ff))

tt = [4, 5]
println(typeof(tt))
println(size(tt))

tt .* ff

Matrix{Int64}
(1, 2)
Vector{Int64}
(2,)


2×2 Matrix{Int64}:
  8  12
 10  15

In [1360]:
tt .+ ff

2×2 Matrix{Int64}:
 6  7
 7  8

#### Matrix multiplication

In [1361]:
rr = [6 9; 5 7]

println(rr)
println(ff)

[6 9; 5 7]
[2 3]


In [1362]:
rr .* ff

2×2 Matrix{Int64}:
 12  27
 10  21

In [1363]:
rr .+ ff

2×2 Matrix{Int64}:
 8  12
 7  10

In [1364]:
ee = [4 6 2 ;  6 7 8]

println(ee)
println(ff)

[4 6 2; 6 7 8]
[2 3]


In [1365]:
display(*(ff , ee))

1×3 Matrix{Int64}:
 26  33  28

In [1366]:
ww = [[3, 3, 4] ;; [8, 3, 9]]

3×2 Matrix{Int64}:
 3  8
 3  3
 4  9

In [1367]:
*(ww, ee)

3×3 Matrix{Int64}:
 60  74  70
 30  39  30
 70  87  80

In [1368]:
ww * ee

3×3 Matrix{Int64}:
 60  74  70
 30  39  30
 70  87  80

#### Broadcasting

In [1369]:
A = [1 2];  
B = [5 6; 7 8; 9 10; 11 12]; 
broadcast(+, A, B)


4×2 Matrix{Int64}:
  6   8
  8  10
 10  12
 12  14

#### Dot product operations

<a href="https://www.mathsisfun.com/algebra/vectors-dot-product.html">
www.mathsisfun.com
</a>

In [1370]:
println(cos(abs(0)))
println(cos(abs(1)))
println(cos(abs(30)))
println(cos(abs(90)))


1.0
0.5403023058681398
0.15425144988758405
-0.4480736161291702


#### cosd: It returns the calculated cosine of the specified value in degrees.

In [1371]:
cosd(abs(59.5))

0.5075383629607042

In [1372]:
fg=abs(10)
gh=abs(13)

dot_prod_1 = fg*gh*cosd(59.5)
println(round(dot_prod_1))

66.0


In [1373]:
#a · b = -6 × 5 + 8 × 12
dot_prod_2 = -6*5 + 8*12
println(dot_prod_2)

66


#### With Julia

In [1374]:
LinearAlgebra.dot([8,-6], [12,5])

66

In [1375]:
LinearAlgebra.dot(ww, ee)

188
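The two ways of computing the dot product above are linked by the angle between the vectors. As a sketch, the angle of about 59.5 degrees can be recovered from the vectors themselves with norm and acosd.

```julia
using LinearAlgebra

a = [8, -6]
b = [12, 5]

dot_ab = a ⋅ b                            # ⋅ is the infix form of dot
cos_angle = dot_ab / (norm(a) * norm(b))  # a · b = |a| |b| cos(angle)
angle_deg = acosd(cos_angle)              # angle in degrees

println(dot_ab)
println(round(angle_deg, digits=1))
```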

## Conditional Control flows

<p>
are fundamental to coding.
</p>

<p>
"Computer programs also make decisions, using Boolean expressions (true/false) inside conditionals (if/else). <br>
Thanks to conditionals, programs can respond differently based on different inputs and parameters." (Khan Academy)
</p>

<p>
Operators == and >
</p>

#### Other operators


<a href="https://www.geeksforgeeks.org/operators-in-julia/">
operators-in-julia
</a>

In [1376]:
100 == 2000

false

In [1377]:
100 != 2000

true

In [1378]:
100 <= 100

true

In [1379]:
a=234

println(typeof(a) == Int64)
println(typeof(a) == Float64)

true
false


In [1380]:
100 & 50

32

In [1381]:
is_duke = true

if is_duke
    # no colon here!
    println("How can I serve you, my duke!")
end

How can I serve you, my duke!


In [1382]:
is_duke = false

if is_duke
    # no colon here!
    println("How can I serve you, my duke!")
else    
    println("What do you want, beggar?")
end

What do you want, beggar?


In [1383]:
treasure = 70

if treasure > 80
    println("Buy the diadem for Catherina Sforza!")
    
else
    println("Buy flowers for Catherina Sforza!")
        
end

Buy flowers for Catherina Sforza!


In [1384]:
treasure = 67

if treasure > 80
    println("Buy the diadem for Catherina Sforza!")
    
elseif treasure > 60
    # reached only if treasure <= 80, so a treasure of exactly 80 is covered too
    println("Buy a silver necklace for Catherina Sforza!")
    
else
    println("Buy flowers for Catherina Sforza!")
    
end

Buy a silver necklace for Catherina Sforza!
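Short conditions can also be written more compactly. A sketch of the ternary operator, chained comparisons, and && as a guard; the variable names are made up.

```julia
treasure = 67

# Ternary operator: condition ? value_if_true : value_if_false
present = treasure > 80 ? "diadem" : "flowers"

# Comparisons can be chained as in mathematical notation
in_middle_range = 60 < treasure <= 80

# && can guard a statement: the right side runs only if the left is true
in_middle_range && println("Buy a silver necklace for Catherina Sforza!")
```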


## Functions

<p>Functions can be user-defined, for example to convert kilograms to pounds:<br>
lb = kg * 2.2046</p>

In [1385]:
# func declaration
function kilo_to_pound(kg)
    # function body
    lb = kg  * 2.2046
    # return statement
    return round(lb, digits=2)
end

kilo_to_pound (generic function with 1 method)

In [1386]:
pound_2=kilo_to_pound(4)
println(pound_2)

8.82


In [1387]:
pound_2=kilo_to_pound(412.34)
println(pound_2)

909.04
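Short functions can be defined in one line, and keyword arguments with defaults make the rounding adjustable. In this sketch the name kilo_to_pound_kw is hypothetical, chosen so that it does not replace the function above.

```julia
# Compact one-line definition with a keyword argument and a default value
kilo_to_pound_kw(kg; digits=2) = round(kg * 2.2046, digits=digits)

default_digits = kilo_to_pound_kw(4)
one_digit = kilo_to_pound_kw(4, digits=1)

# An anonymous function applied to every element with map
pounds = map(kg -> kilo_to_pound_kw(kg), [1, 2, 4])

println(default_digits)
println(one_digit)
println(pounds)
```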


In [1388]:

function buy_presents_2(treasure)
    

    if treasure > 80
        println("Buy the diadem for Catherina Sforza!")

    elseif treasure > 60
        # reached only if treasure <= 80, so a treasure of exactly 80 is covered too
        println("Buy a silver necklace for Catherina Sforza!")

    else
        println("Buy flowers for Catherina Sforza!")

    end
end


buy_presents_2 (generic function with 1 method)

In [1389]:
buy_presents_2(76)

Buy a silver necklace for Catherina Sforza!


In [1390]:
buy_presents_2(12)

Buy flowers for Catherina Sforza!


In [1391]:
buy_presents_2(200)

Buy the diadem for Catherina Sforza!


### Multiple arguments

<p>
Formula for velocity:<br>
In its simplest form, it is<br>
V = d / t<br>
with the distance d = X2 - X1
</p>

In [1392]:
function velocity_33(X1, X2, t)
    # V = d / t with d = X2 - X1; the parentheses are required
    V = (X2 - X1) / t
    return round(V, digits=2)
end

velocity_33 (generic function with 1 method)

In [1393]:
velocity_33_1 = velocity_33(100, 80, 10)
println(velocity_33_1)

-2.0


In [1394]:
velocity_33_2 = velocity_33(777.34, 12.5, 13)
println(velocity_33_2)

-58.83


### Funcs with broadcasting

In [1395]:
function fahrenheit2celsius(temp)
    return (temp - 32) * 5/9
end
temps_f = [345, 3543, 21]
# The function is written for a single value; the dot broadcasts it over the array
temps_c = fahrenheit2celsius.(temps_f)


3-element Vector{Float64}:
  173.88888888888889
 1950.5555555555557
   -6.111111111111111

### Broadcasting with multiple arguments

In [1396]:
x1_1 = [777.34, 35, 1000]
x2_1 = [12.5, 25, 500]
t_1 = [13, 10, 100]

velocity_33.(x1_1, x2_1, t_1)

3-element Vector{Float64}:
 -58.83
  -1.0
  -5.0

## Data Frames

In [1397]:
rand(Int, 2)

2-element Vector{Int64}:
 1013866888796304765
 3863243386836771372

In [1398]:
rand(Int64)

4774458712049649936

In [1399]:
rand(1:10, 10)

10-element Vector{Int64}:
 10
  5
  7
  9
  1
  7
  5
  4
  7
  3

In [1400]:
medici_treasure = DataFrames.DataFrame(
    
    Gold_Ducats=rand(200:600, 5),
    Silver_Coins=rand(200:1000, 5),
    Jewels=rand(1:20, 5),
    Villages=["Bergamo", "Arcumeggia", "Monza", "Angera", "Pavia"],
    Subject_to_Taxation = [true, true, true, false, true]
)

println(medici_treasure)

5×5 DataFrame
 Row │ Gold_Ducats  Silver_Coins  Jewels  Villages    Subject_to_Taxation
     │ Int64        Int64         Int64   String      Bool
─────┼────────────────────────────────────────────────────────────────────
   1 │         522           913       4  Bergamo                    true
   2 │         260           265      17  Arcumeggia                 true
   3 │         598           469      11  Monza                      true
   4 │         327           678       4  Angera                    false
   5 │         573           300      17  Pavia                      true


### Names & size

In [1401]:
# colnames
println(names(medici_treasure))

["Gold_Ducats", "Silver_Coins", "Jewels", "Villages", "Subject_to_Taxation"]


In [1402]:
println(size(medici_treasure))

(5, 5)
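A few more inspection helpers from DataFrames.jl, sketched on a small table with fixed values so the results are reproducible. The column Total_Coins and the variable names are made up for illustration.

```julia
using DataFrames

treasure = DataFrame(
    Villages = ["Bergamo", "Monza", "Pavia"],
    Gold_Ducats = [522, 598, 573],
    Silver_Coins = [913, 469, 300],
)

n_rows = nrow(treasure)
n_cols = ncol(treasure)

# A new column is added by assigning a vector of matching length
treasure.Total_Coins = treasure.Gold_Ducats .+ treasure.Silver_Coins

summary_table = describe(treasure)   # per-column summary statistics
println(summary_table)
```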


### Import a CSV file

In [1403]:
breast_cancer=CSV.File("files/breast-cancer-wisconsin.csv")
println(breast_cancer)

CSV.File("files/breast-cancer-wisconsin.csv"):
Size: 698 x 11
Tables.Schema:
 Symbol("1000025")  Int64
 Symbol("5")        Int64
 Symbol("1")        Int64
 Symbol("1_1")      Int64
 Symbol("1_2")      Int64
 Symbol("2")        Int64
 Symbol("1_3")      String3
 Symbol("3")        Int64
 Symbol("1_4")      Int64
 Symbol("1_5")      Int64
 Symbol("2_1")      Int64


### Convert csv files into data frames

In [1404]:
using DataFrames
breast_cancer_df = DataFrame(breast_cancer)

println(first(breast_cancer_df,5))
# comparable to head in R or Python

5×11 DataFrame
 Row │ 1000025  5      1      1_1    1_2    2      1_3      3      1_4    1_5    2_1
     │ Int64    Int64  Int64  Int64  Int64  Int64  String3  Int64  Int64  Int64  Int64
─────┼─────────────────────────────────────────────────────────────────────────────────
   1 │ 1002945      5      4      4      5      7  10           3      2      1      2
   2 │ 1015425      3      1      1      1      2  2            3      1      1      2
   3 │ 1016277      6      8      8      1      3  4            3      7      1      2
   4 │ 1017023      4      1      1      3      2  1            3      1      1      2
   5 │ 1017122      8     10     10      8      7  10           9      7      1      4


<p>Bank file</p>

In [1405]:
bank=CSV.File("files/bank.csv")
bank_df=DataFrame(bank)
println(first(bank_df, 5))

5×17 DataFrame
 Row │ age    job          marital   education  default  balance  housing  loan     contact   day    month    duration  campaign  pdays  previous  poutcome  y
     │ Int64  String15     String15  String15   String3  Int64    String3  String3  String15  Int64  String3  Int64     Int64     Int64  Int64     String7   String3
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │    30  unemployed   married   primary    no          1787  no       no       cellular     19  oct            79         1     -1         0  unknown   no
   2 │    33  services

### Subsetting data frames

In [1406]:
bank_df[1,:]

Row,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
Unnamed: 0_level_1,Int64,String15,String15,String15,String3,Int64,String3,String3,String15,Int64,String3,Int64,Int64,Int64,Int64,String7,String3
1,30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown,no


In [1407]:
first(bank_df[:, 1:3], 2)

Row,age,job,marital
Unnamed: 0_level_1,Int64,String15,String15
1,30,unemployed,married
2,33,services,married


In [1408]:
bank_df[2, 1:3]

Row,age,job,marital
Unnamed: 0_level_1,Int64,String15,String15
2,33,services,married


In [1409]:
first(bank_df[:, "job"], 2)

2-element PooledArrays.PooledVector{String15, UInt32, Vector{UInt32}}:
 "unemployed"
 "services"

In [1410]:
first(bank_df[:, ["job", "marital"]], 2)

Row,job,marital
Unnamed: 0_level_1,String15,String15
1,unemployed,married
2,services,married


In [1411]:
bank_df[end-3:end,["education", "job", "marital"]]

Row,education,job,marital
Unnamed: 0_level_1,String15,String15,String15
1,tertiary,self-employed,married
2,secondary,technician,married
3,secondary,blue-collar,married
4,tertiary,entrepreneur,single


In [1412]:
first(bank_df.education, 3)

3-element PooledArrays.PooledVector{String15, UInt32, Vector{UInt32}}:
 "primary"
 "secondary"
 "tertiary"

In [1413]:
bank_df.education[6]

"tertiary"

In [1414]:
bank_df.education[5:10]

6-element PooledArrays.PooledVector{String15, UInt32, Vector{UInt32}}:
 "secondary"
 "tertiary"
 "tertiary"
 "secondary"
 "tertiary"
 "primary"

In [1415]:
bank_df[[12, 44, 85], 1:5]

Row,age,job,marital,education,default
Unnamed: 0_level_1,Int64,String15,String15,String15,String3
1,43,admin.,married,secondary,no
2,32,technician,married,tertiary,no
3,37,management,married,tertiary,no
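
<p>
The selectors used above differ in whether they copy the data. A minimal sketch on a hypothetical toy data frame (toy_df and its values are made up for illustration and are not taken from bank.csv): df[:, col] returns a copy of the column, while df[!, col] and df.col return the stored column itself.
</p>

```julia
using DataFrames

# Hypothetical toy data, not taken from bank.csv
toy_df = DataFrame(age = [30, 33, 35], job = ["unemployed", "services", "management"])

copied = toy_df[:, "age"]   # `:` returns a copy of the column
stored = toy_df[!, "age"]   # `!` returns the stored column without copying

copied[1] = 99              # changes only the copy
stored[2] = 77              # changes toy_df itself

println(toy_df.age)         # [30, 77, 35]
```

<p>
Use the copying form when the result will be modified independently of the data frame.
</p>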


### Sort and unique

In [1416]:
bank_df_job = sort(bank_df, "job")

bank_df_job[1:3, 1:5]

Row,age,job,marital,education,default
Unnamed: 0_level_1,Int64,String15,String15,String15,String3
1,43,admin.,married,secondary,no
2,37,admin.,single,tertiary,no
3,53,admin.,married,secondary,no


In [1417]:
bank_df_job_2 = sort(bank_df, "job", rev=true)

bank_df_job_2[1:3, 1:5]

Row,age,job,marital,education,default
Unnamed: 0_level_1,Int64,String15,String15,String15,String3
1,41,unknown,single,tertiary,no
2,37,unknown,married,unknown,no
3,52,unknown,married,secondary,no


In [1418]:
unique(bank_df.job)

12-element Vector{String15}:
 "unemployed"
 "services"
 "management"
 "blue-collar"
 "self-employed"
 "technician"
 "entrepreneur"
 "admin."
 "student"
 "housemaid"
 "retired"
 "unknown"

In [1419]:
unique(bank_df.education)

4-element Vector{String15}:
 "primary"
 "secondary"
 "tertiary"
 "unknown"

### Some descriptive statistics

<p>
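describe summarizes every column: mean, min, median, max, number of missing values, and element type.<br>
For string columns only min and max (in alphabetical order) are filled; mean and median stay empty.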
</p>
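
<p>
describe also accepts the names of the statistics to compute. A minimal sketch on a hypothetical toy data frame (toy_df and its values are made up for illustration and are not taken from bank.csv):
</p>

```julia
using DataFrames

# Hypothetical toy data, not taken from bank.csv
toy_df = DataFrame(age = [30, 33, 35], job = ["unemployed", "services", "management"])

# Request only selected statistics by passing their names as symbols
summary_df = describe(toy_df, :mean, :std, :nmissing)

println(summary_df)
```

<p>
The result is itself a data frame with one row per column of toy_df, so it can be subset like any other data frame.
</p>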



In [1420]:
using Statistics

In [1421]:
describe(bank_df[:, 1:9])

Row,variable,mean,min,median,max,nmissing,eltype
Unnamed: 0_level_1,Symbol,Union…,Any,Union…,Any,Int64,DataType
1,age,41.1701,19,39.0,87,0,Int64
2,job,,admin.,,unknown,0,String15
3,marital,,divorced,,single,0,String15
4,education,,primary,,unknown,0,String15
5,default,,no,,yes,0,String3
6,balance,1422.66,-3313,444.0,71188,0,Int64
7,housing,,no,,yes,0,String3
8,loan,,no,,yes,0,String3
9,contact,,cellular,,unknown,0,String15


In [1422]:
describe(breast_cancer_df)

Row,variable,mean,min,median,max,nmissing,eltype
Unnamed: 0_level_1,Symbol,Union…,Any,Union…,Any,Int64,DataType
1,1000025,1071810.0,61634,1171710.0,13454352,0,Int64
2,5,4.41691,1,4.0,10,0,Int64
3,1,3.13754,1,1.0,10,0,Int64
4,1_1,3.2106,1,1.0,10,0,Int64
5,1_2,2.80946,1,1.0,10,0,Int64
6,2,3.21777,1,2.0,10,0,Int64
7,1_3,,1,,?,0,String3
8,3,3.4384,1,3.0,10,0,Int64
9,1_4,2.86963,1,1.0,10,0,Int64
10,1_5,1.59026,1,1.0,10,0,Int64


In [1423]:
mean(breast_cancer_df[:, "5"])

4.416905444126074

In [1424]:
std(breast_cancer_df[:, "5"])

2.8176733983653137

In [1425]:
minimum(breast_cancer_df[:, "5"])

1

In [1426]:
maximum(breast_cancer_df[:, "5"])

10

In [1427]:
sum(breast_cancer_df[:, "5"])

3083

### Column assignment

<p>
The examples here assign computed values as new columns.<br>
The first divides a column by 2; the second standardizes the balance (subtract the mean, divide by the standard deviation).
</p>

In [1428]:
breast_cancer_df[:, "5 divided by 2"] = breast_cancer_df[:, "5"] ./ 2
println(names(breast_cancer_df))

["1000025", "5", "1", "1_1", "1_2", "2", "1_3", "3", "1_4", "1_5", "2_1", "5 divided by 2"]


In [1429]:
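mean_balance = mean(bank_df.balance)
std_balance = std(bank_df.balance)

# z-score: subtract the mean and divide by the standard deviation, element by element
standardized_balance = (bank_df.balance .- mean_balance) ./ std_balance

# Assigning to a column name that does not exist yet adds a new column
bank_df[:, "standardized_balance"] = standardized_balance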

println(names(bank_df))

println(bank_df[1:5, ["job", "education", "balance", "standardized_balance"]])

["age", "job", "marital", "education", "default", "balance", "housing", "loan", "contact", "day", "month", "duration", "campaign", "pdays", "previous", "poutcome", "y", "standardized_balance"]
5×4 DataFrame
 Row │ job          education  balance  standardized_balance
     │ String15     String15   Int64    Float64
─────┼───────────────────────────────────────────────────────
   1 │ unemployed   primary       1787             0.121058
   2 │ services     secondary     4789             1.11852
   3 │ management   tertiary      1350            -0.0241417
   4 │ management   tertiary      1476             0.0177238
   5 │ blue-collar  secondary        0            -0.472701


### Renaming columns

In [1430]:
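# rename returns a renamed copy and leaves bank_df unchanged.
# The result is overwritten by the next assignment, so "education" keeps its name below.
renamed_bank_df = rename(bank_df, ["education" => "education_level"])
# rename! renames the column in place and returns bank_df itself.
renamed_bank_df = rename!(bank_df, ["marital" => "married_or_not_?"])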

first(renamed_bank_df, 2)

Row,age,job,married_or_not_?,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y,standardized_balance
Unnamed: 0_level_1,Int64,String15,String15,String15,String3,Int64,String3,String3,String15,Int64,String3,Int64,Int64,Int64,Int64,String7,String3,Float64
1,30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown,no,0.121058
2,33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure,no,1.11852
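
<p>
The difference between rename and rename! can be checked on a small example. A minimal sketch on a hypothetical toy data frame (toy_df, copy_df and the new column names are made up for illustration):
</p>

```julia
using DataFrames

# Hypothetical toy data, not taken from bank.csv
toy_df = DataFrame(marital = ["married", "single"], education = ["primary", "tertiary"])

# rename returns a renamed copy; toy_df keeps its column names
copy_df = rename(toy_df, "education" => "education_level")
println(names(toy_df))    # still contains "education"
println(names(copy_df))   # contains "education_level"

# rename! changes the column names of toy_df itself
rename!(toy_df, "marital" => "marital_status")
println(names(toy_df))    # now contains "marital_status"
```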


In [1431]:
describe(bank_df)[18,:]

Row,variable,mean,min,median,max,nmissing,eltype
Unnamed: 0_level_1,Symbol,Union…,Any,Union…,Any,Int64,DataType
18,standardized_balance,-4.7149500000000005e-18,-1.5735,-0.325175,23.1806,0,Float64
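
<p>
The mean of the standardized column is practically zero (about -4.7e-18), which is what standardization should produce. A minimal sketch of the same check on hypothetical values (the balance vector below is made up for illustration), using only the Statistics standard library:
</p>

```julia
using Statistics

# Hypothetical balances, not taken from bank.csv
balance = [1787, 4789, 1350, 1476, 0]

# z-score: subtract the mean, then divide by the sample standard deviation
z = (balance .- mean(balance)) ./ std(balance)

# After standardization the mean is approximately 0 and the standard deviation approximately 1
println(isapprox(mean(z), 0; atol = 1e-12))
println(isapprox(std(z), 1))
```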


<p>Normalization between 0 and 1</p>

In [1432]:
# x_new = (x_i - x_min) / (x_max - x_min)
# x_new falls between 0 and 1
min_balance = minimum(bank_df.balance)
max_balance = maximum(bank_df.balance)

normalized_balance = (bank_df.balance .- min_balance) ./ (max_balance - min_balance)

bank_df[:, "normalized_balance"] = normalized_balance 

println(names(bank_df))


["age", "job", "married_or_not_?", "education", "default", "balance", "housing", "loan", "contact", "day", "month", "duration", "campaign", "pdays", "previous", "poutcome", "y", "standardized_balance", "normalized_balance"]


In [1433]:
describe(bank_df)[19,:]

Row,variable,mean,min,median,max,nmissing,eltype
Unnamed: 0_level_1,Symbol,Union…,Any,Union…,Any,Int64,DataType
19,normalized_balance,0.063565,0.0,0.0504289,1.0,0,Float64
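
<p>
describe confirms that the normalized column runs from 0.0 to 1.0. The same computation can be wrapped in a function. A minimal sketch; min_max_normalize is a hypothetical helper name and the values are made up for illustration:
</p>

```julia
# Hypothetical helper: rescales a numeric vector to the range [0, 1]
function min_max_normalize(x)
    lo, hi = extrema(x)              # smallest and largest value
    return (x .- lo) ./ (hi - lo)    # element-wise rescaling
end

# Hypothetical balances, not taken from bank.csv
balance = [1787, 4789, 1350, 1476, 0]
normalized = min_max_normalize(balance)

println(extrema(normalized))         # (0.0, 1.0)
```

<p>
If all values are equal, the denominator is zero and the result is NaN, so a constant column needs separate handling.
</p>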


### Filtering

<p>
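Syntax:<br>
df = filter(row -> row.column == 1000, df)<br>
filter keeps the rows for which the function returns true and returns them as a new data frame.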
</p>
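
<p>
The same selection can be written in more than one way. A minimal sketch on a hypothetical toy data frame (toy_df and its values are made up for illustration and are not taken from bank.csv), comparing the row-wise form with a column-wise predicate and a boolean mask:
</p>

```julia
using DataFrames

# Hypothetical toy data, not taken from bank.csv
toy_df = DataFrame(age = [30, 59, 44, 67],
                   job = ["unemployed", "blue-collar", "services", "retired"])

# Predicate receives a whole row
by_row = filter(row -> row.age > 50, toy_df)

# Predicate receives only the value of the age column
by_column = filter(:age => age -> age > 50, toy_df)

# Boolean mask built by broadcasting the comparison over the column
by_mask = toy_df[toy_df.age .> 50, :]

# All three approaches select the same rows
println(by_row == by_column && by_column == by_mask)   # true
```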

In [1434]:
println(minimum(bank_df.age))
println(maximum(bank_df.age))

19
87


In [1435]:
bank_df_age_filter_over_50 = filter(row -> row.age > 50, bank_df)

bank_df_age_filter_over_50[1:4,1:4]

Row,age,job,married_or_not_?,education
Unnamed: 0_level_1,Int64,String15,String15,String15
1,59,blue-collar,married,secondary
2,56,technician,married,secondary
3,55,blue-collar,married,primary
4,67,retired,married,unknown


In [1436]:
bank_df_age_filter_equal_44 = filter(row -> row.age == 44, bank_df)
bank_df_age_filter_equal_44[1:4,1:4]

Row,age,job,married_or_not_?,education
Unnamed: 0_level_1,Int64,String15,String15,String15
1,44,services,single,secondary
2,44,entrepreneur,married,secondary
3,44,admin.,married,secondary
4,44,technician,single,secondary


In [1437]:
bank_df_age_filter_not_50 = filter(row -> row.age != 50, bank_df)
bank_df_age_filter_not_50[1:4,1:4]

Row,age,job,married_or_not_?,education
Unnamed: 0_level_1,Int64,String15,String15,String15
1,30,unemployed,married,primary
2,33,services,married,secondary
3,35,management,single,tertiary
4,30,management,married,tertiary


In [1438]:
filter(row -> row.age == 50, bank_df_age_filter_not_50)
# Empty result: the filtered data frame no longer contains a row with age 50

Row,age,job,married_or_not_?,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y,standardized_balance,normalized_balance
Unnamed: 0_level_1,Int64,String15,String15,String15,String3,Int64,String3,String3,String15,Int64,String3,Int64,Int64,Int64,Int64,String7,String3,Float64,Float64


In [1439]:
filter(row -> row.education == "primary", bank_df)[1:5, 1:7]

Row,age,job,married_or_not_?,education,default,balance,housing
Unnamed: 0_level_1,Int64,String15,String15,String15,String3,Int64,String3
1,30,unemployed,married,primary,no,1787,no
2,43,services,married,primary,no,-88,yes
3,25,blue-collar,single,primary,no,-221,yes
4,55,blue-collar,married,primary,no,627,yes
5,78,retired,divorced,primary,no,229,no


In [1440]:
edu=filter(row -> row.education == "primary", bank_df)

unique(edu.education)

1-element Vector{String15}:
 "primary"