# Chapter 1
# The Groundwork: Julia's Environment

# Introduction

Julia is a fairly young programming language. In 2009, three developers (Stefan Karpinski, Jeff Bezanson, and Viral Shah) in the Applied Computing group at MIT, under the supervision of Prof. Alan Edelman, started working on a project that led to Julia. In February 2012, Julia was presented publicly and became open source. The source code is available on GitHub (https://github.com/JuliaLang/julia).

# This chapter covers the following topics:


- How is Julia different?
- Setting up Julia's environment
- Using Julia's shell and REPL
- Using Jupyter notebooks
- Package management
- Parallel computation 
- Multiple dispatch
- Language interoperability

# 1.  How is Julia different?

With advances in compiler techniques and language design, it is possible to eliminate the trade-off between performance and dynamic prototyping. What scientific computing needed was a good dynamic language like Python with performance like C. Julia was created to meet that need: it is a general purpose programming language designed around the requirements of scientific and technical computing, with performance comparable to C/C++ and an environment as productive for prototyping as a high-level dynamic language such as Python.

#### The key features offered by Julia are:

- A general purpose high-level dynamic programming language designed to be effective for numerical and scientific computing
- A Low-Level Virtual Machine (LLVM) based Just-in-Time (JIT) compiler that enables Julia to approach the performance of statically-compiled languages like C/C++

### Features and advantages of Julia can be summarized as follows:


- It's designed for distributed and parallel computation.
- Julia provides an extensive library of mathematical functions with great numerical accuracy.
- Julia gives the functionality of multiple dispatch. Multiple dispatch refers to using many combinations of argument types to define function behaviors.
- The PyCall package enables Julia to call Python functions from its code, and MATLAB.jl does the same for MATLAB. Functions and libraries written in C can also be called directly, without any need for APIs or wrappers.
- Julia provides powerful shell-like capabilities for managing other processes in the system.
- User-defined types in Julia are as fast and compact as built-ins.
- Data analysis makes great use of vectorized code to gain performance benefits. Julia eliminates the need to vectorize code to gain performance. De-vectorized code written in Julia can be as fast as vectorized code.
- It uses lightweight “green” threading also known as tasks or coroutines,
  cooperative multitasking, or one-shot continuations.
- Julia has a powerful type system. The conversions provided are elegant and extensible.
- It has efficient support for Unicode.
- It has facilities for metaprogramming and Lisp-like macros.
- It has a built-in package manager (Pkg).
- Julia provides efficient, specialized and automatic generation of code for different argument types.
- It's free and open source with an MIT license.
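As a small illustration of the point about de-vectorized code, the sketch below computes a sum of squares twice: once with an explicit loop and once in vectorized form. The function names `sumsq_loop` and `sumsq_vectorized` are made up for this example, and no timing claims are made here; the sketch only shows that both styles are ordinary Julia and give the same answer.

```julia
# Sum of squares written two ways; both give the same result.
function sumsq_loop(xs)
    total = zero(eltype(xs))
    for x in xs                      # explicit, de-vectorized loop
        total += x * x
    end
    return total
end

sumsq_vectorized(xs) = sum(xs .^ 2)  # vectorized form using broadcasting

xs = collect(1.0:1000.0)
println(sumsq_loop(xs) ≈ sumsq_vectorized(xs))   # prints true
```

To compare the speed of the two versions on your own machine, each call can be prefixed with the `@time` macro.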

# 2.  Setting up the environment

Julia is available free. It can be downloaded from its website at the following address: 
http://julialang.org/downloads/. 
The website also has exhaustive documentation, examples, and links to tutorials and community.

### Installing Julia (Mac)

Users with Mac OS X need to click on the downloaded .dmg file to run the disk image. After that, drag the app icon into the Applications folder. You may be asked whether you want to continue, because the file was downloaded from the Internet and so is not considered secure. Click on continue if it was downloaded from the official Julia language website.

Julia can also be installed using homebrew on the Mac as follows:

     brew update
     
     brew tap staticfloat/julia
     
     brew install julia


The installation is complete. To check that the installation was successful, type the following in the Terminal:


    julia --version

This gives you the installed Julia version.

# 3. Using REPL

A Read-Eval-Print Loop (REPL) is an interactive language shell for testing out pieces of code. Julia provides an interactive shell with a Just-in-Time compiler at the backend. We type an input on one line; it is compiled and evaluated, and the result is printed on the next line.


<img src="Untitled.png" alt="(Screenshot of wikipedia table of WA EVD cases)">

The benefit of using the REPL is that we can test out our code for possible errors. Also, it is a good environment for beginners. We can type in the expressions and press Enter to evaluate.

Typing ? in the language shell changes the prompt to help mode (`help?>`). Asking for help on `help` itself gives the following:

    ? help

    search: schedule Channel hasfield @threadcall AbstractChannel searchsortedlast



**Welcome to Julia 1.4.2.** The full manual is available at

```
https://docs.julialang.org/
```

as well as many great tutorials and learning resources:

```
https://julialang.org/learning/
```

For help on a specific function or macro, type `?` followed by its name, e.g. `?cos`, or `?@time`, and press enter. Type `;` to enter shell mode, `]` to enter package mode.


To clear the screen, press Ctrl + L. To exit the REPL, press Ctrl + D or type the following:

     julia> exit()

# 4. Using Jupyter Notebook

Data science and scientific computing are privileged to have an amazing interactive tool called Jupyter Notebook. With Jupyter Notebook you can write and run code in an interactive web environment that can also display visualizations, images, and videos. It makes testing equations and prototyping a lot easier. It supports over 40 programming languages and is completely open source.

Jupyter notebooks are extensively used for coding machine-learning algorithms, statistical modeling and numerical simulation, and data munging.

Jupyter Notebook is implemented in Python, but you can run code in any of the 40 languages provided you have the corresponding kernel installed.

It is highly recommended to install Anaconda if you are new to Python and data science. Commonly used packages for data science, numerical, and scientific computing including Jupyter notebook come bundled with Anaconda making it the preferred way to set up the environment. Instructions can be found at 

https://www.continuum.io/downloads.


Jupyter is present in the Anaconda package, but you can check if the Jupyter package is up to date by typing in the following:

    conda install jupyter


To check if Jupyter is installed properly, type the following in the Terminal:

     jupyter --version

It should print the Jupyter version if it is installed.

Now, to use Julia with Jupyter we need the IJulia package. This can be installed using Julia's package manager.
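A minimal sketch of that installation step follows. It assumes a working internet connection, since `Pkg.add` downloads the package. The call to `notebook()` is left commented out because it launches Jupyter and blocks the session; depending on your setup, IJulia may first offer to install its own copy of Jupyter.

```julia
using Pkg
Pkg.add("IJulia")     # install IJulia, which also registers a Julia kernel with Jupyter

using IJulia
# notebook()          # optional: launch the Jupyter notebook server from within Julia
```

Once the package is installed, Jupyter started from the Terminal should list the Julia kernel as well.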

After installing IJulia, we can create a new notebook by selecting Julia under the Notebooks section in Jupyter.


<img src="Untitled Copy.png" alt="(Screenshot of wikipedia table of WA EVD cases)">

To get the latest version of all your packages, in Julia's shell type the following:

     using Pkg
     Pkg.update()


Popular editors such as Atom and Sublime have a plugin for Julia. Atom has language-julia and Sublime has Sublime-IJulia, both of which can be downloaded from their package managers.

# 5. Package management

Julia provides a built-in package manager. Using Pkg we can install libraries written in Julia. External libraries can be compiled from source or installed with the operating system's standard package manager.

A list of registered packages is maintained at http://pkg.julialang.org.

# Pkg.status() – package status

Pkg.status() is a function that prints a list of the currently installed packages with a summary. This is handy when you need to know whether the package you want to use is installed.

In [54]:
using Pkg
Pkg.status()

Status `~/.julia/environments/v1.4/Project.toml`
  [c52e3926] Atom v0.12.17
  [336ed68f] CSV v0.6.2
  [a93c6f00] DataFrames v0.21.4
  [31c24e10] Distributions v0.23.5
  [38e38edf] GLM v1.3.9
  [09f84164] HypothesisTests v0.10.0
  [7073ff75] IJulia v1.21.2
  [6deec6e2] IndexedTables v0.13.0
  [e5e0dc1b] Juno v0.8.2
  [91a5bcdd] Plots v0.29.9
  [438e738f] PyCall v1.91.4
  [d330b81b] PyPlot v2.9.0
  [ce6b1742] RDatasets v0.6.9
  [60ddc479] StatPlots v0.9.2
  [2913bbd2] StatsBase v0.33.0
  [f3b207a7] StatsPlots v0.14.6


Pkg.status() is expected to return a valid list of the installed packages. The packages it lists are registered versions managed by Pkg.


In [55]:
Pkg.installed()

└ @ Pkg /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.4/Pkg/src/Pkg.jl:531


Dict{String,VersionNumber} with 16 entries:
  "CSV"             => v"0.6.2"
  "Distributions"   => v"0.23.5"
  "Atom"            => v"0.12.17"
  "HypothesisTests" => v"0.10.0"
  "StatsPlots"      => v"0.14.6"
  "StatPlots"       => v"0.9.2"
  "Juno"            => v"0.8.2"
  "PyCall"          => v"1.91.4"
  "StatsBase"       => v"0.33.0"
  "IJulia"          => v"1.21.2"
  "Plots"           => v"0.29.9"
  "PyPlot"          => v"2.9.0"
  "RDatasets"       => v"0.6.9"
  "IndexedTables"   => v"0.13.0"
  "DataFrames"      => v"0.21.4"
  "GLM"             => v"1.3.9"

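Pkg.installed() can also be used to return all the installed packages with their versions, as a dictionary. On Julia 1.4 and later it is deprecated and prints a warning when called; Pkg.dependencies() is the newer way to query this information.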

# Pkg.add() – adding packages

Julia's package manager is declarative and intelligent. You only have to tell it what you want; it works out which packages and versions to install and resolves any dependencies.

We can also use Pkg.add(package_name) to add packages and Pkg.rm(package_name) to remove packages. 

In [56]:
#  Pkg.add(package_name)     # to add packages.

In [57]:
#  Pkg.rm(package_name)      # to remove packages.

# Working with unregistered packages

Frequently, we would like to use packages created by our team members, or published by someone on Git, that are not among the registered packages of Pkg. Julia allows this because packages are hosted in Git repositories and can be fetched using any mechanism supported by Git. Older Julia versions used Pkg.clone for this; on Julia 1.x the repository URL is passed to Pkg.add (or to Pkg.develop, to get an editable copy).

In [58]:
# Pkg.add(Pkg.PackageSpec(url="git://example.com/path/unofficialPackage/Package.jl.git"))

# Pkg.update() – package update

It's good to have updated packages. Julia, which is under active development, has its packages frequently updated and new functionalities are added.


To update all of the packages, type the following:

In [59]:
Pkg.update()

   Updating registry at `~/.julia/registries/General`
   Updating git-repo `https://github.com/JuliaRegistries/General.git`
   Updating `~/.julia/environments/v1.4/Project.toml`
 [no changes]
   Updating `~/.julia/environments/v1.4/Manifest.toml`
 [no changes]


# 6. Parallel computation using Julia

Advances in modern computing have led to multi-core CPUs, and sometimes several systems are combined into a cluster capable of performing a task that a single system could not perform alone, or could perform only in an undesirable amount of time.

Julia's parallel processing environment is based on message passing, which allows a program to run on multiple processes in separate memory domains.


Julia's parallel programming paradigm is built on the following:
  - Remote references 
  - Remote calls


A remote call is a request by one process to run a function, with specific arguments, on another (possibly the same) process. A remote reference is an object that can be used from any process to refer to an object stored on a particular process; it is a construct used in most distributed object systems. The two work together: a remote call returns immediately with a remote reference to its result, and the caller later uses that reference to wait for and retrieve the value.
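The relationship between a remote call and a remote reference can be sketched with the Distributed standard library. To keep the example runnable without extra workers, the call targets process 1, which is the current process; with workers available, any worker id could be used instead.

```julia
using Distributed   # standard library providing remotecall and fetch

# Ask process 1 (the current process, always present) to compute 2 + 3.
# remotecall returns immediately with a remote reference (a Future).
ref = remotecall(+, 1, 2, 3)

# fetch waits for the computation and retrieves the value behind the reference.
result = fetch(ref)
println(result)   # 5
```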

By default, Julia runs as a single process. To start Julia with multiple worker processes, use the following:


     julia -p n

where n is the number of worker processes. Alternatively, extra processes can be created from a running session with addprocs(n), after loading the Distributed standard library with using Distributed. It is advisable to set n equal to the number of CPU cores in the system.
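Adding workers from a running session might look like the following sketch. The worker count of 2 and the function name `square` are arbitrary choices for illustration; in Julia 1.x the function for adding workers is `addprocs`, from the Distributed standard library.

```julia
using Distributed

addprocs(2)                   # start two local worker processes
println(nworkers())           # number of workers now available

# Define a function on every process, then map it over a range in parallel.
@everywhere square(x) = x^2
results = pmap(square, 1:5)
println(results)              # [1, 4, 9, 16, 25]

rmprocs(workers())            # shut the workers down again
```

`pmap` hands each element to whichever worker is free, so it suits tasks where each call does a meaningful amount of work.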

# 7. Julia's key feature – multiple dispatch


A function is an object that maps a tuple of arguments to a return value using some expression. When the function object is unable to return a value, it throws an exception. For different types of arguments, the same conceptual function can have different implementations.

For example, we can have a function to add two floating point numbers and another function to add two integers. But conceptually, we are only adding two numbers.

Julia makes it easy to provide different implementations of the same concept. A function does not need to be defined all at once. It can be defined piece by piece, one combination of argument types at a time, with different behavior attached to each combination. The definition of one of these behaviors is called a method.
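The idea of one conceptual function with several methods can be sketched as follows. The name `combine` is hypothetical and chosen only for this illustration.

```julia
# One conceptual function with two methods, one per argument-type combination.
combine(x::Int, y::Int) = "integers: $(x + y)"
combine(x::Float64, y::Float64) = "floats: $(x + y)"

println(combine(1, 2))               # integers: 3
println(combine(1.5, 2.5))           # floats: 4.0
println(length(methods(combine)))    # 2
```

Julia selects the method from the types of both arguments at the time of the call.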

# Methods in multiple dispatch

Choosing which method to run based on the number and types of all of a function's arguments is called multiple dispatch. Multiple dispatch is used by all of Julia's standard functions and operators; most of them have many methods defining their behavior for the various possible combinations of argument types and count. A method is restricted to take certain types of arguments using the :: type-assertion operator:


In [60]:
f(x::Float64, y::Float64) = x + y


f (generic function with 2 methods)

In [61]:
f(10.0, 14.0)

24.0
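When no method matches the argument types, the call throws an exception. The sketch below uses a fresh, hypothetical function `g` with a single method so that the result does not depend on definitions made earlier in the session.

```julia
g(x::Float64, y::Float64) = x + y     # hypothetical function with a single method

println(g(10.0, 14.0))                # 24.0

# No method accepts two Ints, so the call raises a MethodError.
try
    g(10, 14)
catch err
    println(err isa MethodError)      # true
end
```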

# Ambiguities – method definitions

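Sometimes function behaviors are defined in such a way that there isn't a unique most specific method to apply for a certain set of arguments. In Julia 1.x, a call with such arguments raises a MethodError that reports the ambiguity (early versions of Julia only printed a warning and picked a method arbitrarily). To avoid the ambiguity, we should define a method that handles such cases explicitly. The cells below show the related case of adding a more general method: Julia always picks the most specific method that matches.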

In [62]:
f(x::Number, y::Number) = 2x + 2y

f (generic function with 2 methods)

In [63]:
f(24.0, 4.0)

28.0

In [64]:
f(10,11)

42
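The cells above add a more general method, so no ambiguity arises in them. A genuinely ambiguous pair of methods might look like the following sketch, where `h` is a hypothetical function. On Julia 1.x the ambiguous call is reported as a MethodError, and defining a method for the overlapping case resolves it.

```julia
h(x::Float64, y) = "first argument is a Float64"
h(x, y::Float64) = "second argument is a Float64"

println(h(1.0, 2))      # first argument is a Float64
println(h(1, 2.0))      # second argument is a Float64

# Both methods match (Float64, Float64) and neither is more specific.
ambiguous = try
    h(1.0, 2.0)
    false
catch err
    err isa MethodError
end
println(ambiguous)      # true

# Adding a method for the overlapping case resolves the ambiguity.
h(x::Float64, y::Float64) = "both arguments are Float64"
println(h(1.0, 2.0))    # both arguments are Float64
```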

# 8.  Facilitating language interoperability

Although Julia can be used to write most kinds of code, there are mature libraries for numerical and scientific computing that we would like to exploit. These libraries can be in C, Fortran, or Python. Julia makes it easy to reuse existing code written in these languages by making calls to C, Fortran, or Python functions simple and efficient.
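For C, the built-in `ccall` is enough. The sketch below calls `strlen` from the C standard library; it assumes a Unix-like system (Linux or macOS) where that symbol can be found without naming a library explicitly.

```julia
# Call the C standard library's strlen directly; no wrapper code is needed.
# Arguments: function name, return type, tuple of argument types, then the values.
n = ccall(:strlen, Csize_t, (Cstring,), "Julia")
println(Int(n))   # 5
```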

Importing Python code can be beneficial and is sometimes needed, especially for data science, because Python already has an extensive collection of machine learning and statistical libraries, such as scikit-learn and pandas. To use Python in Julia, we require PyCall.jl. To add PyCall.jl do the following:

In [65]:
Pkg.add("PyCall")

  Resolving package versions...
   Updating `~/.julia/environments/v1.4/Project.toml`
 [no changes]
   Updating `~/.julia/environments/v1.4/Manifest.toml`
 [no changes]


PyCall contains a macro @pyimport that facilitates importing Python packages and provides Julia wrappers for all of the functions and constants therein, including automatic conversion of types between Julia and Python.


# Calling Python code in Julia

The @pyimport macro automatically makes the appropriate type conversions to Julia types in most scenarios, based on a runtime inspection of the Python objects. Finer control over these type conversions is available through lower-level functions. Using pycall in scenarios where the return type is known can help improve performance, both by eliminating the overhead of runtime type inference and by providing more type information to the Julia compiler:

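-  pycall(function::PyObject, returntype::Type, args...): calls the given Python function with the given arguments and converts the return value to returntype.

-  pyimport(s): imports the Python module s (a string or symbol) and returns a reference to it as a PyObject.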
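A short sketch of both functions follows. It assumes PyCall is installed and configured with a working Python, and it uses Python's standard `math` module so that no extra Python packages are needed. Attribute access with a dot (`math.sqrt`) relies on PyCall 1.90 or later.

```julia
using PyCall

math = pyimport("math")          # import Python's math module as a PyObject

# Calling through the module converts the result to a Julia type automatically.
println(math.sqrt(16.0))         # 4.0

# pycall with an explicit return type skips the runtime type inspection.
y = pycall(math.sqrt, Float64, 16.0)
println(y)                       # 4.0
```

The second form is mainly useful in performance-sensitive code where the return type is known in advance.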

# References:

  - Anshul Joshi - Julia for Data Science
  - http://julialang.org/ 
  - https://github.com/JuliaLang 
  - https://github.com/JuliaStats