# <center> Introduction to the Julia Language</center>

### <center> Paul Stey</center>
### <center> Brown University — Center for Computation \& Visualization — `ccv.brown.edu`</center>

<center><img src=ccv-logo-square.png height="300" width="300"></center>

Repo: `https://github.com/brown-ccv/ccv-bootcamp-julia`

JupyterHub: `https://ccv.jupyter.brown.edu`

## Outline 

1. Why Julia? 
2. Variables and Basic Types
    * Type hierarchy, Concrete vs Abstract types
3. Functions
    * Overloading 
    * Multiple dispatch
4. Control Flow
    * Iteration and conditional branching
5. Composite Types
6. Parallel Programming in Julia
7. DataFrame Type


## Two-Language Problem
  - Dynamic languages (e.g., Python, R, Matlab) tend to be slow
  - Lower-level, compiled languages (e.g., C, C++, Fortran) are very fast, but are more time-consuming to write and debug 

## The Standard Approach
   - Write core algorithms in lower-level language
   - Then wrap that in higher-level language using some interface language/package (e.g., Cython, Rcpp)

## What's the Problem?
  1. Higher barrier to entry for package authors
  2. Package authors must know (at least) two languages
  3. Creates a sharp divide between package authors and package users

# <center> Meet Julia! </center>




## Julia is:
1. Dynamic technical computing language
2. Multi-paradigm (functional, imperative, "OO"-ish)
3. Just-in-time compiled using LLVM
4. Under active development (stable version at the time of writing: 1.9.0)
5. Fast!
6. Fun!
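The sketch below illustrates "just-in-time compiled using LLVM" by printing the LLVM IR that Julia generates. `add1` is a hypothetical name. It assumes `InteractiveUtils` can be loaded, which is always the case in the REPL. A single generic method gets a separate compiled specialization for each argument type it is called with. The IR depends on your Julia version and CPU, so no output is shown.

```julia
using InteractiveUtils   # provides @code_llvm (already loaded in the REPL)

add1(x) = x + 1          # one generic method...

# ...compiled to different code for each argument type it is called with
@code_llvm debuginfo=:none add1(1)     # specialization for Int64
@code_llvm debuginfo=:none add1(1.0)   # specialization for Float64
```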

## History of Julia

* Started in 2009 as the PhD thesis project of Jeff Bezanson at MIT
* Jeff has been collaborating with Stefan Karpinski and Viral Shah
* First official release in 2012
* Influenced by: C, Python, R, Ruby, Lisp
* Specifically designed to be as fast as C and as expressive as Python or Ruby

## Julia Basics
- Similar to Python and R in that:
  * Very flexible
  * Has elements of imperative, functional, and object-oriented programming
  * Functions are first-class citizens 
    + (e.g., pass as arguments, return from other functions)
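Here is a small sketch of what "first-class" means in practice. The names `apply_twice` and `make_adder` are hypothetical helpers. The first takes a function as an argument; the second returns a new function (a closure).

```julia
# Pass a function as an argument
apply_twice(f, x) = f(f(x))

# Return a function from another function (a closure that remembers `n`)
make_adder(n) = x -> x + n

add3 = make_adder(3)
apply_twice(add3, 10)       # 16
map(add3, [1, 2, 3])        # [4, 5, 6]
```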
 


## Julia Basics (cont.)
- Differs from Python and R in that:
  * Compiled, not interpreted
  * Not object-oriented in the classical sense
  * No classes with private data and private methods
  * No inheritance of fields: concrete types cannot be subtyped (only abstract types can have subtypes)
  * Designed with parallelism in mind
  * Multiple dispatch
  * Excellent for generic programming
  * Metaprogramming
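Two points from this list are easiest to see in code. First, generic programming: one function definition works for any element type that supports `zero` and `+`. Second, metaprogramming: Julia code can be captured as data and evaluated later. `total` is a hypothetical name used only for illustration.

```julia
# Generic programming: one definition, many element types
function total(xs)
    acc = zero(eltype(xs))
    for x in xs
        acc += x
    end
    return acc
end

total([1, 2, 3])          # 6 (Int)
total([0.5, 0.25])        # 0.75 (Float64)
total([1//2, 1//3])       # 5//6 (Rational)

# Metaprogramming: a quoted expression is an ordinary value
ex = :(1 + 2 * 3)
ex.head                   # :call
eval(ex)                  # 7
```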

## Base Language Data Types

1. Primitives
  * Any numeric type you can imagine: `Int64`, `Float64`, `BigInt`, `Complex`, `Irrational`, `Rational`
  * Many abstract types: `Any`, `Real`, `Number`, `Integer`
  * `String` and `Char`
2. Container Types
  * `Array` (vectors, matrices, _N_-dimensional arrays)
  * `Set`
  * `Dict`
  * `Tuple` and `NamedTuple`
3. Composite Types
  * `struct`
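The sketch below writes one literal for several of the types listed above. The values are arbitrary, and the exact integer width (`Int64`) assumes a 64-bit system.

```julia
i = 42                          # Int64 on 64-bit systems
x = 3.14                        # Float64
q = 1 // 3                      # Rational{Int64}
z = 2 + 3im                     # Complex{Int64}
big2 = BigInt(2)^100            # arbitrary-precision integer
s, c = "potato", 'p'            # String and Char
nt = (name = "julia", version = 1)   # NamedTuple
d = Dict("a" => 1)              # Dict{String, Int64}

# Concrete values also belong to the abstract types above them
q isa Real, z isa Number, i isa Integer    # (true, true, true)
```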

In [None]:
# Simple function that computes Fibonacci numbers

function fib(n)
    nums = ones(Int, n)
    for i = 3:n
        nums[i] = nums[i - 1] + nums[i - 2]
    end 
    return nums[n]
end     

In [None]:
fib(5)


# Ways to Use Julia

1. REPL
2. `.jl` "script" file
3. Jupyter Notebook
4. VS Code with the Julia extension
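For option 2, a "script" is just a plain text file of Julia code. In the minimal sketch below, the file name `hello.jl` is hypothetical. The script reads an optional command-line argument from the built-in `ARGS` vector. Run it from a terminal with `julia hello.jl Paul`, or from the REPL with `include("hello.jl")`.

```julia
# hello.jl: greet the name given on the command line, or "world" if none is given
name = isempty(ARGS) ? "world" : ARGS[1]
println("Hello, $name")
```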


# Numeric Variables

In [None]:
a = 42           # use `=` for variable assignment

In [None]:
typeof(a)        # typeof() function gives us the type of a variable

In [None]:
b = 3.14        

typeof(b)

In [None]:
c = UInt(37)

typeof(c)

# String and Character Variables

In [None]:
s = "hello, world"

typeof(s)

In [None]:
c = 'f'

typeof(c)

In [None]:
length(s)            # get string length

## More String Operations

In [None]:
w = "potato soup"

w[3]                      # index into a string (returns a `Char`)

In [None]:
w[1:6]                    # slice of a string

In [None]:
w[8:end]                  # use `end` for final element

In [None]:
dinner = "cheesy " * w    # string concatenation with `*`

### Even More String Operations

In [None]:
string("foo", "bar")      # string concatenation with string()

In [None]:
print("I love $a")   # string interpolation with `$`

# Hierarchical Types and Shared Behavior

![](images/type_system.png)
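One way to explore the hierarchy without the diagram is to walk upward with `supertype` until reaching `Any`. `supertype_chain` is a hypothetical helper written for this sketch.

```julia
# Collect the chain of supertypes from `T` up to `Any`
function supertype_chain(T::Type)
    chain = Any[T]
    S = T
    while S !== Any
        S = supertype(S)
        push!(chain, S)
    end
    return chain
end

supertype_chain(Int64)      # [Int64, Signed, Integer, Real, Number, Any]
supertype_chain(Float16)    # [Float16, AbstractFloat, Real, Number, Any]

isconcretetype(Int64)       # true: values can have this exact type
isabstracttype(Real)        # true: only a node in the hierarchy
```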

## Testing for Subtype-ness

In [None]:
UInt128 <: Integer

In [None]:
Float16 <: AbstractFloat

In [None]:
Real <: Rational

In [None]:
Rational <: Real

# Collection Types

1. `Array` (and `Vector` and `Matrix`)
2. `Tuple` (and `NamedTuple`)
3. `Set`
4. `Dict`

## The `Array` Type

In [None]:
a = [4, 3, 1]             # create 1-dimensional Array (i.e., Vector)

In [None]:
typeof(a)

In [None]:
B = [4 5 6; 1 2 3]       # create 2-dimensional Array (i.e., Matrix)

In [None]:
typeof(B)

## Useful `Array` Constructors

1. `zeros()`
2. `ones()`
3. `falses()` and `trues()`
4. `Array{T}(undef, dims)` (uninitialized)
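The `Array` constructor takes an element type and `undef`. This allocates memory without initializing it, so the contents are arbitrary until you write to them. A short sketch, with sizes chosen arbitrarily:

```julia
# Uninitialized 2x3 matrix of Float64; overwrite every element in place
m = Array{Float64}(undef, 2, 3)
fill!(m, 0.5)

v = Vector{Int}(undef, 4)   # `Vector{T}` is an alias for `Array{T, 1}`
f = fill(7, (2, 2))         # allocate and fill with a given value
```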

In [None]:
a = zeros(5)             # allocate array of zeros

In [None]:
a = zeros(Int, (3, 2))  # allocate matrix of zeros

In [None]:
b = ones((2, 3, 2))     # allocate 3-dimensional array of ones

### More Useful `Array` Constructors

In [None]:
c = falses(3)              # allocate vector of falses

In [None]:
d = trues((2, 4))          # allocate matrix of trues

In [None]:
v = rand(3)                # allocate vector of uniform U(0, 1) random numbers

In [None]:
u = randn(2)              # allocate vector of std normal N(0, 1) random numbers

# Indexing and Slicing `Arrays`

  * Use familiar `[]` notation to index and slice
  * Indexing is 1-based (like R, Matlab, Lua)
  * Can index with integers, ranges, and booleans
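Besides single integers and ranges, you can index with a vector of integers or with a Boolean mask of the same length. A small sketch, reusing the vector of strings from the next cell:

```julia
a = ["cat", "dog", "fish", "shoe"]

a[[1, 4]]                      # vector of integer indices: ["cat", "shoe"]
a[[true, false, true, false]]  # Boolean mask of the same length: ["cat", "fish"]
a[end:-1:1]                    # range with a negative step: reversed copy
```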

In [None]:
a = ["cat", "dog", "fish", "shoe"]      # create vector of strings

In [None]:
a[1]                                   # get first element

In [None]:
a[end]                                 # get last element

In [None]:
a[2:3]                                 # get second through third elements

### Slicing and Indexing with Booleans

In [None]:
v = randn(5)                          # vector of draws from N(0, 1)

In [None]:
v .> 0                                # which elements of v > 0      

In [None]:
v[v .> 0]                             # get elements of v > 0

## Indexing and Slicing 2-D `Array`

  * Uses common `[row, column]` notation
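Julia stores matrices in column-major order. A single ("linear") index therefore walks down each column before moving to the next. The sketch below uses a made-up integer matrix. Loops that follow memory order put the row index in the inner loop.

```julia
m = [1 2 3;
     4 5 6]

m[2, 1]      # row 2, column 1: 4
m[2]         # linear index 2 is also row 2, column 1: 4
vec(m)       # elements in memory order: [1, 4, 2, 5, 3, 6]

# Columns in the outer loop, rows in the inner loop: visits memory in order
order = Int[]
for j in axes(m, 2), i in axes(m, 1)
    push!(order, m[i, j])
end
order        # [1, 4, 2, 5, 3, 6]
```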

In [None]:
m = ["apple" "kiwi" "pear"; 
     "soup" "rice" "steak"]

In [None]:
m[1, 3]             # get element in first row, third column

In [None]:
m[:, 2]             # get all of second column

In [None]:
m[1, :]             # get first row


### Accessing Last Element of Array

* Can use `end` keyword to get last element in an array

In [None]:
a = ["cat", "dog", "fish", "shoe"]  

a[end]                      

In [None]:
a[end - 1]                 # get second-to-last element

## Adding Elements to an Array

* Arrays in Julia are mutable types
* We can use the `push!()` function to add an element to an array
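`push!` belongs to a family of in-place functions. By Julia convention, a trailing `!` in a function name signals that the function modifies its argument. A short sketch of a few related ones:

```julia
v = [23, 345, 4245]

append!(v, [1, 2])    # add every element of another collection: [23, 345, 4245, 1, 2]
pop!(v)               # remove and return the last element: 2
pushfirst!(v, 0)      # add to the front: [0, 23, 345, 4245, 1]
```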

In [None]:
v = [23, 345, 4245]

In [None]:
push!(v, 999)

In [None]:
v

# The `Tuple` Type
  * Fixed-length
  * Immutable
  * Can hold heterogeneous data
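Tuples are often unpacked with destructuring assignment. They are also how a function returns several values at once. `min_max` is a hypothetical helper for this sketch.

```julia
t = (3.14, "soup")
typeof(t)              # Tuple{Float64, String}

x, y = t               # destructuring assignment
length(t)              # 2

# "Returning several values" really returns one tuple
min_max(v) = (minimum(v), maximum(v))
lo, hi = min_max([4, 3, 1])   # lo = 1, hi = 4
```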

In [None]:
t = (3.14, "soup")             # create a tuple with 2 elements

In [None]:
t[2]

In [None]:
t[1] = 17                      # ERROR

## The `NamedTuple` Type


In [None]:
z = (firstname = "paul", lastname = "stey", id = "37")

In [None]:
typeof(z)

In [None]:
z.firstname                 # get element by its name

In [None]:
z.id

In [None]:
z[1]                       # get element by index

# The `Set` Type

In [None]:
s = Set([4, 5, 6])

In [None]:
t = Set([5, 7, 11, 4, 6])

In [None]:
union(s, t)

## More Operations on `Set` Types

In [None]:
intersect(s, t)                   # get set intersection

In [None]:
symdiff(s, t)                     # get symmetric difference of sets

## Even More Operations on `Set` Types

In [None]:
s ∩ t                   # set intersection

In [None]:
s ∪ t                   # union of sets

In [None]:
s ⊆ t                  # check if s is subset of t

In [None]:
issubset(s, t)         # same as above

# The `Dict` Type

  * Has "keys" and "values"
  * "associative" container type
  * Similar to `dict` in Python, `HashMap` in Rust, and `std::unordered_map` in C++
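A few everyday `Dict` operations are shown below. The first checks for a key. The second looks up a key with a fallback value. The third removes an entry, and the loop iterates over `key => value` pairs. Iteration order is not guaranteed. The keys and values are illustrative.

```julia
d = Dict("lee" => 21, "jones" => 32)

haskey(d, "lee")            # true
get(d, "park", 0)           # "park" is missing, so the default 0 is returned
delete!(d, "jones")         # remove a key-value pair

for (name, val) in d        # iterate over key => value pairs
    println("$name: $val")
end
```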

In [None]:
d = Dict("lee" => 21, "jones" => 32)

In [None]:
d["lee"]

In [None]:
d["smith"] = 82

In [None]:
d["kim"] = 12   

# Defining Functions

  * Two syntaxes: the conventional `function ... end` block and compact mathematical (assignment) notation

In [None]:
function add_one(n) 
    res = n + 1
    return res 
end

In [None]:
add_one(41)

In [None]:
f(n) = 2n + 1                

In [None]:
f(41)

## Functions with Multiple Arguments

In [None]:
function add_vals(a, b) 
    c = a + b
    return c
end 

In [None]:
add_vals(137, 42)

In [None]:
g(x, y) = 2x + 3y

In [None]:
g(137, 42)

## Default Arguments

In [None]:
function say_hi(name = "world")
    println("Hello, $name")
end 

In [None]:
say_hi("Paul")

In [None]:
say_hi()

## Multiple Dispatch
  * Conceptually similar to function overloading
  * A function can have several methods
    - The method that is dispatched at run time depends on the number and types of all the function's arguments
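What distinguishes multiple dispatch from the single dispatch of class-based languages is that *every* argument's type takes part in choosing the method, not just the first ("receiver") argument. `combine_kinds` is a hypothetical function whose result depends on both argument types:

```julia
combine_kinds(a::Int, b::Int)     = "both Int"
combine_kinds(a::Int, b::Float64) = "Int, then Float64"
combine_kinds(a::Float64, b::Int) = "Float64, then Int"

combine_kinds(1, 2)      # "both Int"
combine_kinds(1, 2.0)    # "Int, then Float64"
combine_kinds(1.0, 2)    # "Float64, then Int"
```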

In [None]:
methods(say_hi)

### Adding Method to `say_hi()`

In [None]:
function say_hi(name1, name2)
    println("Hello, $name1 and $name2")
end

In [None]:
say_hi("Mary", "Paul")

In [None]:
methods(say_hi)

# Specifying Return Type
  * We can (optionally) specify the type returned by our functions
  * Julia will `convert` the return value to that type if possible, and throw an error otherwise
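Here is a sketch of both outcomes, using a hypothetical function `half`. A result of `2.0` converts cleanly to `Int`. A result of `1.5` cannot be converted, so Julia throws an `InexactError`.

```julia
function half(n)::Int
    return n / 2          # `/` always produces a float
end

half(4)                   # 2.0 is converted to the Int 2

try
    half(3)               # 1.5 has no exact Int representation
catch err
    println(typeof(err))  # InexactError
end
```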

In [None]:
function add_two(n)::Int8            # this function will return an Int8 type
    res = n + 2
    return res
end 

In [None]:
print(add_two(12))

In [None]:
typeof(add_two(12))

In [None]:
typeof(add_two(15.0))

# Argument Types

  * Beauty of multiple dispatch
    - Can use type of arguments to determine the method that gets used
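Type annotations may also name abstract types from the hierarchy. When several methods match, Julia picks the most specific one, and an unannotated method acts as a fallback. `kind` is a hypothetical function for illustration.

```julia
kind(x::Integer) = "an integer"
kind(x::Real)    = "some other real number"
kind(x)          = "not a real number"      # fallback for any type

kind(3)          # "an integer" (Integer is more specific than Real)
kind(2.5)        # "some other real number"
kind(1 // 2)     # "some other real number" (Rational <: Real)
kind("potato")   # "not a real number"
```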

In [None]:
function do_math(a::Int, b::Int)                   # when args are ints, we add them
    a + b
end 

In [None]:
function do_math(a::Float64, b::Float64)           # when args are floats, we exponentiate
    a^b
end 

In [None]:
function do_math(a::String, b::String)             # when args are strings, celebrate!! 
    println("$a and $b love math!! Yaaayyyyy!!!")
end 

## Argument Types (cont.)

In [None]:
do_math(4, 2)

In [None]:
do_math(4.1, 2.1)

In [None]:
do_math("carlos", "paul")

In [None]:
methods(do_math)

# Branching with `if` and `else`

In [None]:
a = 17

if a < 0
    println("a is negative!")
end

In [None]:
if a < 0
    println("a is negative!")
else 
    println("a is non-negative!")
end

### Branching with `if`, `else`, and `elseif`

In [None]:
b = 2.2

if b < 0
    println("b is negative")
elseif b < 1000
    println("b is a small non-negative number")
else
    println("b is a large number")
end

# The `for` Loop

In [None]:
for i in 1:5
    println(i)
end

In [None]:
for i = 1:5               # alternate syntax, identical behavior
    println(i)
end

## More `for` Looping

In [None]:
items = ["dog", "cat", "shoe", "potato"]        # can iterate over arbitrary types

for thing in items
    println(thing)
end

In [None]:
for thing in items
    if thing == "dog" || thing == "cat"
        println("we love $(thing)s")
    end
end

## Exiting `for` Loop Early

In [None]:
some_things = [1.8, 4, pi, "potato", 23, 10_000_000]

In [None]:
for x in some_things
    if !isa(x, Number)
        println("Oh, no!!! $x is not numeric!")
        break
    else
        println(x)
    end 
end

## Skipping an Iteration

In [None]:
nums = 1:10

for a in nums
    if iseven(a)
        continue
    end
    println(a) 
end

# `while` Loop
  * Same `break` and `continue` keywords for exiting and skipping
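A common pattern is `while true` with a `break` once some condition holds. The sketch wraps the loop in a hypothetical function, `halvings_below_one`. That way the loop variables are local, which also avoids global-scope surprises when the code runs as a script.

```julia
# Count how many halvings it takes for `x` to drop below 1
function halvings_below_one(x)
    steps = 0
    while true
        x /= 2
        steps += 1
        if x < 1
            break
        end
    end
    return steps
end

halvings_below_one(100.0)   # 7, since 100 / 2^7 = 0.78125
```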

In [None]:
a = 13

while a > 0
    println(a)
    a -= 1
end

### Skipping Iterations in `while` Loop

In [None]:
a = 12

while a > 0
    a -= 1
    if iseven(a)
        continue
    end
    println(a)
end

<center><h1>Challenge Problem 1</h1></center>


The Collatz Conjecture is a mathematical conjecture named after Lothar Collatz, who first proposed it in 1937. It's a simple sequence that starts with any positive integer. Here's how it works:

1. Start with any positive integer $n$.
2. If $n$ is even, divide it by $2$ to get $n/2$.
3. If $n$ is odd, multiply it by $3$ and add $1$ to get $3n + 1$.
4. Repeat the process indefinitely (or until you arrive at 1) for the resulting values.

The Collatz Conjecture states that no matter what number you start with, eventually you will reach 1. Despite its simplicity, the conjecture is unsolved and is a famous problem in the field of number theory. 

Write a function that generates Collatz sequences. Our function should accept a single argument called `start`, which represents the starting value. Our function should return a vector representing the Collatz sequence generated by that starting value.

**Hint**: You will likely want to use a `while` loop, and do some branching with `if` and `else`. Also recall you can use `push!()` to append to an array in Julia.

In [None]:
function collatz_sequence(start) 
    # put more code here...


end 

In [None]:
# When our function is correct, these three tests should pass (i.e., be silent)

@assert [1] == collatz_sequence(1)
@assert [5, 16, 8, 4, 2, 1] == collatz_sequence(5)
@assert [10, 5, 16, 8, 4, 2, 1] == collatz_sequence(10)

# User-Defined Types

1. Users can define composite types (i.e., `struct`s)
2. A `struct` can be mutable or immutable
3. Immutable `struct`s can often be stored on the stack or inline in arrays; mutable ones are generally allocated on the heap
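The storage difference can be checked with `isbitstype`. An immutable struct whose fields are all concrete "plain data" is a bits type, so arrays of it store elements inline. A mutable struct never is, because each instance has its own identity. `Point` and `Counter` are hypothetical types for this sketch.

```julia
struct Point            # immutable, concrete field types
    x::Float64
    y::Float64
end

mutable struct Counter  # mutable
    count::Int
end

isbitstype(Point)       # true
isbitstype(Counter)     # false

pts = [Point(0, 0), Point(1, 2)]   # elements stored inline in the array

c = Counter(0)
c.count += 1            # fields of a mutable struct can be updated
```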

In [None]:
struct Foo
    bar
    baz::Int
    qux::Float64
end

In [None]:
foo = Foo("Hello, world.", 23, 1.5)

In [None]:
typeof(foo)

In [None]:
fieldnames(Foo)

## Accessing Fields of a `struct`

In [None]:
foo.bar

In [None]:
foo.baz

In [None]:
foo.qux

## Immutable by Default

In [None]:
foo.bar = "Will this work?"  # Nope!

In [None]:
Foo((), 23.5, 1)            # ERROR: second field must be an Int, and 23.5 cannot be converted to Int

## Creating Mutable `struct`

In [None]:
mutable struct Bar
    baz
    qux::Float64
end


In [None]:
bar = Bar("Hello", 1.5);
println(bar)

bar.qux = 2.0
println(bar)

bar.baz = 1//2
println(bar)

## Constructors for a `struct`

In [None]:
struct Square 
    height::Int
    width::Int
    area::Int
    
    function Square(h, w)
        a = h * w
        
        return new(h, w, a)
    end 
end 



In [None]:
sq = Square(5, 3)

## Operator Overloading

+ Can overload the base language's operators (e.g., `+`, `*`, `/`)
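Operators in Julia are ordinary functions: the infix `1 + 2` is the call `+(1, 2)`. Overloading an operator just means adding a method to that function. An alternative to `import Base.+` is to qualify the name as `Base.:+` in the definition. `Vec2` is a hypothetical type for this sketch.

```julia
+(1, 2)                      # 3, same as 1 + 2

struct Vec2
    x::Float64
    y::Float64
end

# Add methods to Base's operators without an `import` statement
Base.:+(a::Vec2, b::Vec2) = Vec2(a.x + b.x, a.y + b.y)
Base.:*(k::Real, v::Vec2) = Vec2(k * v.x, k * v.y)

p = Vec2(1, 2) + Vec2(3, 4)  # Vec2(4.0, 6.0)
2 * p                        # Vec2(8.0, 12.0)
```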

In [None]:
struct Circle{T}
    x::T
    y::T
    radius::Int
    
    function Circle(x_coord::T, y_coord::T, rad::Int) where T <: Real
        return new{T}(x_coord, y_coord, rad)
    end
end 

### Operator Overloading (cont.)

In [None]:
import Base.+ 

function +(c1::Circle, c2::Circle)

    x_new = (c1.x + c2.x) / 2
    y_new = (c1.y + c2.y) / 2
    
    rad = c1.radius + c2.radius
    
    res = Circle(x_new, y_new, rad)
    
    return res 
end


In [None]:
# Test our new method for +() function

c1 = Circle(2, 3, 4)
c2 = Circle(1, 2, 5)

In [None]:
typeof(c1)

In [None]:
c1 + c2

# Parallel Programming with Julia

* Parallel programming is easy and powerful in Julia!!
* Can do shared-memory parallelism using threads
* Some parallelism is automatic in Julia (e.g., dense linear algebra calls a multithreaded BLAS library)
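The automatic parallelism in matrix multiplication comes from the BLAS library Julia links against (OpenBLAS by default). BLAS manages its own thread pool, separate from Julia's threads. Here is a quick check, assuming Julia 1.6 or later, where `BLAS.get_num_threads` is available:

```julia
using LinearAlgebra

BLAS.get_num_threads()      # threads BLAS will use for operations like A * B
# BLAS.set_num_threads(4)   # adjust if needed (4 is only an example value)
```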


In [None]:
n = 10_000

A = randn((n, n))
B = randn((n, n))

C = A * B              # dense matrix multiplication is multithreaded by default (via BLAS)

## Using `Threads` Module

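* The `Threads` module is built into Julia's standard library (`Base.Threads`)
* Julia uses a single thread unless told otherwise: set the `JULIA_NUM_THREADS` environment variable (typically to the number of cores on your machine), or start Julia with the `--threads` (`-t`) flag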
  + For example, I have an M2 Max chip with 12 cores, so in my `~/.zshrc` file, I have the following:
  ```export JULIA_NUM_THREADS=12```
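Besides `@threads` loops, `Threads.@spawn` starts a task that the scheduler may run on any available thread. Below is a minimal sketch of a chunked reduction; `threaded_sum` is a hypothetical helper. Each task only reads its own chunk and returns a value, so no two threads write to shared state.

```julia
# Split `xs` into about `ntasks` chunks, sum each chunk on its own task,
# then combine the partial sums
function threaded_sum(xs; ntasks = Threads.nthreads())
    chunk_len = cld(length(xs), ntasks)
    tasks = map(Iterators.partition(xs, chunk_len)) do chunk
        Threads.@spawn sum(chunk)
    end
    return sum(fetch.(tasks))
end

threaded_sum(1:1_000_000)   # 500000500000, same as sum(1:1_000_000)
```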

In [None]:
using Base.Threads       # load Threads module

In [None]:
nthreads()               # display number of available threads

## Using `@threads` Macro

In [None]:
function fill_with_thread_id() 
    
    n = 100_000_000
    vec = zeros(Int, n)
    
    @threads for i in 1:n 
        # @threads macro tells julia we want our loop iterations to
        # be handled by different threads to distribute the workload 
        vec[i] = Threads.threadid()
    end
    
    return vec
end 

In [None]:
thread_ids = fill_with_thread_id()

<center><h1>Challenge Problem 2</h1></center>


Write a function that, when given a matrix, `X`, of integers, will return a new matrix, `Y`, whose elements `Y[i, j]` will be boolean values indicating whether or not the element `X[i, j]` is prime. Note that you can use the `isprime()` function in the Primes.jl package. Since the `X` matrix is large, and the task is "embarrassingly parallel", let's try using the `@threads` macro for multi-threading to make this fast!

**HINT:** Julia uses column-major ordering of elements; bear this in mind to shave off some run time!

In [None]:
using Pkg

Pkg.add("Primes")

In [None]:
using Primes
using Random

Random.seed!(137)                     # set seed for reproducibility

A = rand(1:100_000, 20_000, 20_000)   # matrix of random integers from 1 to 100,000

function prime_status(mat)
    #
    # your code goes here 
    #  

end

In [None]:
@time prime_status(A)

# DataFrames in Julia

* The DataFrames.jl package implements this important data structure

In [None]:
Pkg.add("DataFrames")

In [None]:
using DataFrames

In [None]:
df = DataFrame(v = rand(1:1000, 10), 
               x = randn(10), 
               y = rand(['a', 't', 'g', 'c'], 10), 
               z = rand(["foo", "bar", "baz"], 10))

## Accessing Elements of DataFrame

In [None]:
df[1, 2]                # get element from first row, second column

In [None]:
df[2:4, :z]             # get elements from rows 2 to 4 in column `z`

In [None]:
df[:, :x]              # get all of column `x`

### Accessing Columns


In [None]:
df[:, [:v, :y]]         # get columns `v` and `y`

In [None]:
df[:, Not(:z)]         # get all columns except `z`

## Reading from CSV File


In [None]:
Pkg.add("CSV")

In [None]:
using CSV

In [None]:
arrests_df = CSV.read("data/pvd_arrests_2021-10-03.csv", DataFrame)

## Summarizing DataFrame

In [None]:
describe(arrests_df)

## Split-Apply-Combine Pattern

In [None]:
Pkg.add("StatsBase")

In [None]:
using StatsBase

arrests_gdf = groupby(arrests_df, :from_state)  # create grouped DataFrame

combine(arrests_gdf, :age => median)              # apply function group-wise

<center><h1>Challenge Problem 3</h1></center>


Write a function, `count_chars()`, that takes an array of characters and returns a dictionary counting how many times each character appears in the array.

**Hint:** There are several ways to accomplish this. You could use some `if` and `else` combination inside a `for` loop to check whether a key is in the dictionary, but you might instead consider using the `get()` function, which allows us to specify a default value when a key is not in the dictionary. Either approach is perfectly fine.


In [None]:
using Random


Random.seed!(137)

v = rand(['a', 't', 'g', 'c'], 500_000_000)

function count_chars(chars_vec)
    #
    # your code goes here
    #
        
    
end 

In [None]:
@time count_chars(v)

# <center>Thank you!</center>