# Creating Our First Julia App

We will learn to perform exploratory data analysis with Julia. In the process, we'll take a
look at RDatasets, a package that provides access to over 700 learning datasets. We'll load
one of them, the Iris flowers dataset, and we'll manipulate it using standard data analysis
functions. Then we'll look more closely at the data by employing common visualization
techniques. And finally, we'll see how to persist and (re)load our data.

But, in order to do that, first we need to revisit and take a look at some of the language's
most important building blocks.

We will cover the following topics in this chapter:

- Declaring variables (and constants)
- Working with Strings of characters and regular expressions
- Numbers and numeric types
- Our first Julia data structures—Tuple, Range, and Array
- Exploratory data analysis using the Iris flower dataset: RDatasets and core Statistics
- Quick data visualization with Gadfly
- Saving and loading tabular data with CSV and Feather
- Interacting with MongoDB databases

# Technical requirements

To install a specific version of a package, run the following in the Pkg REPL mode (entered by typing ] at the julia> prompt):

In [2]:
# pkg> add PackageName@vX.Y.Z

# Defining variables

We have seen in the previous chapter how to use the REPL in order to execute
computations and have the result displayed back to us. Julia even lends a helping hand by
setting up the ans variable, which automatically holds the last computed value.

But, if we want to write anything but the most trivial programs, we need to learn how to
define variables ourselves. In Julia, a variable is simply a name associated with a value. There
are very few restrictions for naming variables, and the names themselves have no semantic
meaning (the language will not treat variables differently based on their names, unlike say
Ruby, where a name that is all caps is treated as a constant).

Variable names are case-sensitive, meaning that ANSWER and answer (and Answer
and aNsWeR) are completely different things.
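
As a small sketch of this, the four names from the paragraph above can each hold a different value; the values themselves are arbitrary and chosen only for illustration:

```julia
# Four different names, four independent variables.
answer = 42
ANSWER = "forty-two"
Answer = 42.0
aNsWeR = '4'

println(answer, " ", ANSWER, " ", Answer, " ", aNsWeR)
```

Assigning to one of them leaves the other three untouched.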

Emojis also work, if your terminal supports them:
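
A minimal sketch, assuming a terminal or notebook that can display these characters (the names and values are arbitrary, picked only for illustration):

```julia
🍕 = "pizza"      # an emoji as a variable name
δ = 0.001         # in the REPL, type \delta and press Tab to get δ

println("Lunch today: ", 🍕)
println("Tolerance: ", δ)
```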

The only explicitly disallowed names for variables are Julia's reserved keywords
(do, end, try, catch, if, and else, plus a few more):

In [4]:
do = 3

LoadError: syntax: invalid "do" syntax

Attempting to access a variable that hasn't been defined will result in an error:
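
The following sketch triggers that error on purpose and catches it so that we can inspect it. The name `not_defined_yet` is hypothetical; any name that was never assigned would behave the same way:

```julia
# `not_defined_yet` is a hypothetical name that is never assigned anywhere.
err = try
    not_defined_yet
catch e
    e
end

println(typeof(err))   # the error type is UndefVarError
println(err.var)       # the name that could not be found
```

Outside of a `try` block, the same access would simply stop execution with an UndefVarError.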

It's true that the language does not impose many restrictions, but a set of code style
conventions is always useful—and even more so for an open source language. The Julia
community has distilled a set of best practices for writing code. In regard to naming
variables, the names should be lowercase and in just one word; word separation can be
done with underscores (_), but only if the name would be difficult to read without them.
For example, myvar versus total_length_horizontal.

# Constants

Constants are variables that, once declared, can't be changed. They are declared by
prefixing them with the const keyword:

In [6]:
const a = 3.2

3.2

Importantly, in Julia a constant constrains the type of its value rather than the value
itself. It is a bit too early to discuss types in Julia, so for now it suffices to say that a type
represents what kind of value we're dealing with. For instance, "abc" (within double
quotes) is of type String, 'a' (within single quotes) is of type Char, and 1000 is of type
Int (because it's an integer). Thus, in Julia, unlike most other languages, we can change the
value assigned to a constant as long as the type remains the same, although Julia 1.7 (the
version used here) prints a warning when we do. For instance, we can rebind a to another
floating-point value:

In [7]:
a = 3.9

3.9

Assigning a value of a different type, however, is rejected:

In [8]:
a = 1

LoadError: invalid redefinition of constant a

# Why are constants important?

It's mostly about performance. Constants can be especially useful as global values. Because
global variables are long-lived and can be modified at any time and from any location in
your code, the compiler has a hard time optimizing code that uses them. If we tell the
compiler that the value is constant, and thus that its type won't change, the performance
problem can be optimized away.
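
To make this concrete, here is a minimal sketch that contrasts a constant global with a plain one. The names `TAX_RATE`, `tax_rate`, `with_const`, and `with_global` are hypothetical and exist only for this illustration:

```julia
# Hypothetical names, chosen only for this illustration.
const TAX_RATE = 0.2   # declared constant: its type is known to stay Float64
tax_rate = 0.2         # plain global: could be rebound to a value of any type

with_const(price) = price * (1 + TAX_RATE)
with_global(price) = price * (1 + tax_rate)

# Both functions compute the same result; they differ only in how much
# the compiler can assume about the global they read.
println(with_const(100.0) == with_global(100.0))
```

If you are curious, running `@code_warntype with_global(100.0)` in the REPL should show that the result type cannot be inferred, whereas the `with_const` version is inferred as Float64.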

Global values in Julia, like in other languages,
must be avoided whenever possible. Besides performance issues, they can
create subtle bugs that are hard to catch and understand. Also, keep in
mind that, since Julia allows changing the value of a constant, accidental
modification becomes possible.

# Comments

Common programming wisdom says the following:
"Code is read much more often than it is written, so plan accordingly."

Code comments are a powerful tool that makes programs easier to understand later on.
In Julia, comments are marked with the # sign. A single-line comment starts with a #, and
everything that follows it, until the end of the line, is ignored by the compiler.
Multiline comments are enclosed between #= and =#. Everything between the opening and
the closing comment tags is also ignored by the compiler. Here is an example:
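
The sketch below uses both comment styles; the variable names are arbitrary and chosen only for illustration:

```julia
# This is a single-line comment.
radius = 2.0  # a comment can also follow code on the same line

#=
This is a multiline comment.
Nothing in here is executed, not even this assignment:
radius = 100.0
=#

area = pi * radius^2
println(area)
```

Because the assignment inside the #= ... =# block is ignored, the area is computed with a radius of 2.0.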

# Strings

A string represents a sequence of characters. We can create a string by enclosing the
corresponding sequence of characters between double quotes:

In [9]:
str = "WTF"

"WTF"

If the string also includes quotes, we can escape these by prefixing them with a backslash \:

In [10]:
str = "W\"T"

"W\"T"

# Triple-quoted strings

However, escaping can get messy, so there's a much better way of dealing with this: using
triple quotes """...""".

Within triple quotes, it is no longer necessary to escape double quotes. However, make
sure that a double quote inside the string and the closing triple quotes are separated, or
else the compiler will get confused:

In [11]:
"""My name is j"n """

"My name is j\"n "

The triple quotes come with some extra special powers when used with multiline text. First,
if the opening """ is followed by a newline, this newline is stripped from the string. Also,
whitespace is preserved but the string is dedented to the level of the least-indented line:

In [16]:
"""
         Hello
    Look
 Here      """

"        Hello\n   Look\nHere      "

In [17]:
print(ans)

        Hello
   Look
Here      

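The string was dedented by one space, which is the indentation of the least-indented line
(Here). The relative indentation of the other lines is preserved.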
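
A more typical use, sketched below with a made-up JSON fragment, indents the whole block together with the closing delimiter so that the text lines up with the surrounding code:

```julia
# The four spaces of indentation are removed from every line because the
# line with the closing delimiter is indented by the same amount.
snippet = """
    {
      "name": "Iris"
    }
    """

print(snippet)
```

The double quotes around name and Iris need no escaping, and the two extra spaces in front of "name" survive the dedent.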

# Concatenating strings

Two or more strings can be joined together (concatenated) to form a single string by using
the star * operator:

In [4]:
"WTF" * " Jarvis" * "\n What do you want from me :facepalm:"

"WTF Jarvis\n What do you want from me :facepalm:"

Alternatively, we can invoke the string function, passing in all the words we want to
concatenate:

In [6]:
string("What ", "Do ", "you want?")

"What Do you want?"

In [7]:
first_name = "javid"
last_name = "norouzi"

"norouzi"

In [8]:
print(first_name * " " * last_name)

javid norouzi

However, again, we need to be careful when dealing with types (types are central to Julia,
so this will be a recurring topic). The * operator only concatenates strings and characters:

In [9]:
id = 82973874


82973874

In [10]:
print(first_name * " " * id)

LoadError: MethodError: no method matching *(::String, ::Int64)
Closest candidates are:
  *(::Any, ::Any, ::Any, ::Any...) at /opt/julia-1.7.1/share/julia/base/operators.jl:655
  *(::T, ::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} at /opt/julia-1.7.1/share/julia/base/int.jl:88
  *(::Union{AbstractChar, AbstractString}, ::Union{AbstractChar, AbstractString}...) at /opt/julia-1.7.1/share/julia/base/strings/basic.jl:260
  ...

Performing the concatenation by invoking the string function does work even if not all
the arguments are strings:

In [12]:
string(first_name, " ", last_name, " ", id)

"javid norouzi 82973874"

Thus, string has the added advantage that it automatically converts its parameters to
strings. The following example works too:

In [13]:
string(2, " And ", 3)

"2 And 3"

# Interpolating strings

When creating longer, more complex strings, concatenation can be noisy and error-prone.
For such cases, we're better off using the $ symbol to perform variable interpolation into
strings:

In [14]:
"$first_name $last_name $id"

"javid norouzi 82973874"

More complex expressions can be interpolated by wrapping them into $(...):

In [16]:
"$(uppercase(first_name)) , $(lowercase(last_name))"

"JAVID , norouzi"

Just like the string function, interpolation takes care of converting the values to strings:
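
A short sketch with an integer, a floating-point number, and a computed expression (the variable names are hypothetical, used only for this illustration):

```julia
n_items = 3
unit_price = 4.5

# An Int, a Float64, and the result of an expression are all converted for us.
message = "Bought $n_items items at $unit_price each, total $(n_items * unit_price)"
println(message)

# A literal dollar sign has to be escaped with a backslash.
println("Total: \$$(n_items * unit_price)")
```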

# Manipulating strings

Strings can be treated as a list of characters, so we can index into them—that is, access the
character at a certain position in the word:

In [17]:
str = "Strings can be treated as a list of characters, so we can index into them—that is, access the
character at a certain position in the word:"

"Strings can be treated as a list of characters, so we can index into them—that is, access the\ncharacter at a certain position in the word:"

In [18]:
str[10]

'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

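Indexing in Julia is 1-based, which means that the first character of a string is found at
index 1. Julia also supports arrays with arbitrary indices, which makes it possible, for
example, to start numbering at 0. However, arbitrary indexing is a more advanced
feature that we won't cover here. If you are curious, you can check the
official documentation at https://docs.julialang.org/en/v1/devdocs/offset-arrays/.

A substring can be extracted by indexing with a range: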

In [19]:
str[10:20]

"an be treat"

It is important to notice that indexing with a single value returns a Char, while indexing
with a range returns a String (remember, for Julia these are two completely different
things):

In [20]:
str[10]

'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

In [21]:
typeof(ans)

Char

In [22]:
str[10:10]

"a"

In [23]:
typeof(ans)

String
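
The same distinction can be sketched on a shorter string; `word` and the other names below are hypothetical, introduced only for this illustration:

```julia
word = "Julia"

first_char = word[1]      # a single index gives a Char
last_char  = word[end]    # `end` stands for the last index
prefix     = word[1:3]    # a range gives a String

println(first_char, " is a ", typeof(first_char))
println(prefix, " is a ", typeof(prefix))
```

Note that `word[1:1]` would still be a String of length one, not a Char.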

# Unicode and UTF-8

In Julia, string literals are encoded using UTF-8. UTF-8 is a variable-width encoding,
meaning that not all characters are represented using the same number of bytes. For
example, ASCII characters are encoded using a single byte—but other characters can use up
to four bytes. This means that not every byte index into a UTF-8 string is necessarily a valid
index for a corresponding character. If you index into a string at such an invalid byte index,
an error will be thrown. Here is what I mean:

str = "Søren Kierkegaard was a Danish Philosopher"