In [1]:
using Plots
gr()

Plots.GRBackend()

## A. Data Table choices

Machine learning is all about finding patterns in data, so it is very reasonable to start with data.

In [2]:
# Some Data (try your own)
x = [5,6.5,7,8]
y = [10.1, 19.9, 30.1, 40.3]
plot(x,y,
    label="Y", line=(7,:green), marker=(10,0.8,:red), xlims=(0,10), ylims=(0,50),
    xlabel="X",ylabel="Y")
    

### A.1. Just a matrix please. (No labels, no extras, simple.)

In [3]:
data1 = [x y]

4×2 Array{Float64,2}:
 5.0  10.1
 6.5  19.9
 7.0  30.1
 8.0  40.3
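
A plain matrix carries no column names, so the variables are recovered by position. A minimal sketch in plain Julia (no packages), with the data repeated so the block runs on its own:

```julia
x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]
data1 = [x y]            # horizontal concatenation: one column per variable

nrows, ncols = size(data1)
xcol = data1[:, 1]       # first column (the x values)
ycol = data1[:, 2]       # second column (the y values)
row2 = data1[2, :]       # second observation as a vector
```

The price of this simplicity is that the reader has to remember which column is which.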

### A.2. Data Frames: Inspired by the R universe.

In [4]:
using DataFrames
data2 = DataFrame(X=x,Y=y) # Upper Case X and Y are labels (not data)

| Row | X   | Y    |
|-----|-----|------|
| 1   | 5.0 | 10.1 |
| 2   | 6.5 | 19.9 |
| 3   | 7.0 | 30.1 |
| 4   | 8.0 | 40.3 |


In [5]:
data2[1]

4-element DataArrays.DataArray{Float64,1}:
 5.0
 6.5
 7.0
 8.0

In [6]:
#Pkg.add("CSV")
using CSV
CSV.write("data.csv", data2)



CSV.Sink(    CSV.Options:
        delim: ','
        quotechar: '"'
        escapechar: '\\'
        null: ""
        dateformat: dateformat"yyyy-mm-dd", IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1), "data.csv", 8, true, String["X", "Y"], false)

In [7]:
;cat data.csv


"X","Y"
5.0,10.1
6.5,19.9
7.0,30.1
8.0,40.3


### A.3. Indexed Tables (Treat data like array indices, knows type information)

In [8]:
# Pkg.add("IndexedTables")
using  IndexedTables.Table
using IndexedTables
data3 = Table(Columns(X=x),Columns(Y=y))

X   │ Y
────┼─────
5.0 │ 10.1
6.5 │ 19.9
7.0 │ 30.1
8.0 │ 40.3

In [9]:
data3[6.5]

(Y = 19.9)
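
The lookup `data3[6.5]` uses a value of X as the index. As a rough analogy only (this is not a description of how IndexedTables is implemented), a `Dict` in plain Julia gives the same key-to-value behavior. The name `lookup` is made up for this sketch:

```julia
x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]

# Hypothetical stand-in for an indexed table: keys are X values, values are Y values
lookup = Dict(zip(x, y))

y_at_6_5 = lookup[6.5]            # the Y stored under the key 6.5
has_6_0  = haskey(lookup, 6.0)    # 6.0 is not one of the X values
```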

In [10]:
typeof.([data1,data2,data3])

3-element Array{DataType,1}:
 Array{Float64,2}
 DataFrames.DataFrame
 IndexedTables.IndexedTable{NamedTuples._NT_Y{Float64},Tuple{Float64},IndexedTables.Columns{NamedTuples._NT_X{Float64},NamedTuples._NT_X{Array{Float64,1}}},IndexedTables.Columns{NamedTuples._NT_Y{Float64},NamedTuples._NT_Y{Array{Float64,1}}}}

### A.4. JuliaDB (Lots of bells and whistles, many files, parallelism, ...)

In [11]:
#Pkg.add("JuliaDB")
using JuliaDB:DTable
using JuliaDB

In [12]:
data4 = distribute(data3, 1) 

DTable with 4 rows in 1 chunks:

X   │ Y
────┼─────
5.0 │ 10.1
6.5 │ 19.9
7.0 │ 30.1
8.0 │ 40.3

In [13]:
data5 = loadfiles(["data.csv"], usecache=false)

Metadata for 0 / 1 files can be loaded from cache.
Reading 1 csv files totalling 44 bytes...


DTable with 4 rows in 1 chunks:

  │ X    Y
──┼──────────
1 │ 5.0  10.1
2 │ 6.5  19.9
3 │ 7.0  30.1
4 │ 8.0  40.3

In [14]:
typeof(data4)

JuliaDB.DTable{NamedTuples._NT_X{Float64},NamedTuples._NT_Y{Float64}}

In [15]:
data4[1:2]

an empty DTable


In [16]:
select(data4,1=>i->i≥7) 

DTable with 1 chunks:

X   │ Y
────┼─────
7.0 │ 30.1
8.0 │ 40.3
...

In [17]:
filter(t->(t[1]>30),data4) # t is the data part of a row, here (Y = ...), so this keeps rows with Y > 30

DTable with 1 chunks:

X   │ Y
────┼─────
7.0 │ 30.1
8.0 │ 40.3
...

### A.5 IterableTables

In [18]:

#using IterableTables, DataTables, TypedTables # haven't investigated  much but looks very nice

## B. Simple Line Fitting

[So why is it called "Regression" anyway?](http://blog.minitab.com/blog/statistics-and-quality-data-analysis/so-why-is-it-called-regression-anyway) Galton's original meaning is not quite what the word means today.

B.1 Linear Regression function

In [19]:
b, w =  linreg(x,y)

(-42.45733333333333, 10.197333333333333)

In [20]:
plot()
plot(x,y,
    label="Y", line=(4,:blue), marker=(3,0.8,:blue), xlims=(0,10), ylims=(0,50),
    xlabel="X",ylabel="Y")
plot!(x->w*x+b,xlims=(minimum(x)-.5,maximum(x)+.5), line=(4,:red), label="best fit line")
plot!(x->w*x+b, x ,marker=(3,0.8,:red), label="" )
for i = 1:length(x)
    plot!([x[i],x[i]],[y[i],w*x[i]+b],line=(4,:green))
end
plot!(legend=:topleft)

Mathematically equivalent Approaches <br>
B.2 Linear Algebra Least Squares

In [21]:
A = [ones(x) x]

4×2 Array{Float64,2}:
 1.0  5.0
 1.0  6.5
 1.0  7.0
 1.0  8.0

In [22]:
A'A

2×2 Array{Float64,2}:
  4.0   26.5 
 26.5  180.25

In [23]:
A\y 

2-element Array{Float64,1}:
 -42.4573
  10.1973

In [24]:
(A'A)\A'y  # normal equations: usually not recommended, since A'A squares the condition number of A

2-element Array{Float64,1}:
 -42.4573
  10.1973

In [25]:
q,r = qr(A)
r\(q'y)

2-element Array{Float64,1}:
 -42.4573
  10.1973

In [26]:
[length(x) sum(x); sum(x) x⋅x] \ [ sum(y) ; x⋅y ] # (A'A)\A'y

2-element Array{Float64,1}:
 -42.4573
  10.1973
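
The remark about the normal equations can be made concrete: forming `A'A` squares the condition number of `A`, which is why `A\y` or QR is usually preferred. The sketch below assumes a current Julia 1.x rather than the 0.6 session shown here (so `ones(length(x))` replaces `ones(x)`, and `qr` returns a factorization object) and uses only the standard library:

```julia
using LinearAlgebra

x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]
A = [ones(length(x)) x]                  # design matrix: a column of ones, then x

β_backslash = A \ y                      # least squares solve
β_normal    = (A' * A) \ (A' * y)        # normal equations
F           = qr(A)
β_qr        = F.R \ (Matrix(F.Q)' * y)   # thin Q (4×2), then a triangular solve

κ_A   = cond(A)                          # 2-norm condition number of A
κ_AtA = cond(A' * A)                     # equals κ_A^2 up to rounding
```

For this tiny, well-behaved problem all three answers agree; the squared condition number starts to matter when the columns of `A` are nearly dependent.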

B.3 Basic Formula

In [27]:
w = cov(x,y)/var(x) # same as (x.-mean(x))⋅(y.-mean(y))/sum(abs2,x.-mean(x))
b = mean(y)-w*mean(x)
b,w

(-42.45733333333333, 10.197333333333333)
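
The closed form follows from setting both partial derivatives of the squared error to zero. With $\bar x$ and $\bar y$ denoting the sample means (notation introduced here), a short derivation:

$$
\begin{aligned}
L(w,b) &= \sum_{i=1}^{n} (w x_i + b - y_i)^2 \\
\frac{\partial L}{\partial b} = 2\sum_{i=1}^{n} (w x_i + b - y_i) = 0
  &\;\Longrightarrow\; b = \bar y - w\,\bar x \\
\frac{\partial L}{\partial w} = 2\sum_{i=1}^{n} x_i\,(w x_i + b - y_i) = 0
  &\;\Longrightarrow\; w = \frac{\sum_{i}(x_i-\bar x)(y_i-\bar y)}{\sum_{i}(x_i-\bar x)^2}
  = \frac{\operatorname{cov}(x,y)}{\operatorname{var}(x)}
\end{aligned}
$$

The last step substitutes $b = \bar y - w\,\bar x$ into the second equation; the factors of $1/(n-1)$ in the covariance and the variance cancel.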

In [28]:
@which linreg(x,y) # essentially uses the above formula
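
`linreg` was part of Base in the Julia 0.6 session used here and is no longer in Base in Julia 1.x. A minimal stand-in built from the formula in B.3 might look like this; the name `simple_linreg` is made up for this sketch and is not a library function:

```julia
using Statistics

# Hypothetical helper: returns (intercept, slope), the same order as `b, w = linreg(x, y)`
function simple_linreg(x, y)
    w = cov(x, y) / var(x)        # slope
    b = mean(y) - w * mean(x)     # intercept
    return b, w
end

x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]
b, w = simple_linreg(x, y)        # approximately (-42.4573, 10.1973)
```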

B.4 optimization (think machine learning) via the package Optim.jl

In [29]:
using Optim   # Julia all the way down
loss(bw) = sum(abs2,bw[2]*x.+bw[1]-y) # uglyish
optimize(loss,[0.0,0.0]).minimizer

2-element Array{Float64,1}:
 -42.4573
  10.1973
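
Because this loss is quadratic in `(b, w)`, its gradient and Hessian are available in closed form, and a single Newton step from any starting point lands on the minimizer. This is shown only to illustrate what an optimizer is searching for; it is not a description of what `optimize` does internally. The sketch assumes Julia 1.x and the standard library, and the names `lossgrad` and `hessian` are introduced here:

```julia
using LinearAlgebra

x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]
A = [ones(length(x)) x]

loss(bw)     = sum(abs2, A * bw - y)      # same loss as above, written with A
lossgrad(bw) = 2 * A' * (A * bw - y)      # gradient of the loss
hessian      = 2 * A' * A                 # constant, since the loss is quadratic

bw0 = [0.0, 0.0]
bw1 = bw0 - hessian \ lossgrad(bw0)       # one Newton step
```

A general-purpose optimizer cannot assume a quadratic loss, which is why it needs many iterations where this special case needs one.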

B.5 optimization with the package JuMP <br>
Note: not every Julia function can be used inside @objective or @NLobjective,
though that would be the goal. See [linear and quadratic objective JuMP notes](http://www.juliaopt.org/JuMP.jl/0.18/refexpr.html) and [nonlinear JuMP notes](http://www.juliaopt.org/JuMP.jl/0.18/nlp.html#syntax-notes).

In [30]:
Pkg.add("Ipopt")

[1m[36mINFO: [39m[22m[36mPackage Ipopt is already installed
[39m[1m[36mINFO: [39m[22m[36mMETADATA is out-of-date — you may not have the latest version of Ipopt
[39m[1m[36mINFO: [39m[22m[36mUse `Pkg.update()` to get the latest versions of your packages
[39m

In [54]:
using JuMP, Ipopt
n = length(x)
m = Model(solver=IpoptSolver(print_level=0))
@variable(m,w)
@variable(m,b)
@objective(m, Min, sum((w*x[i]+b-y[i])^2 for i in 1:n))
#@objective(m, Min,   sum(abs2,  w*x+b-y))
solve(m)
println( " b = ", getvalue(b), " w = ", getvalue(w))

 b = -42.457333333333715 w = 10.19733333333339


B.6 Generalized Linear Models <br>
the very fancy statistical thing

In [32]:
#Pkg.add("GLM")
using GLM # Generalized Linear Models

In [33]:
lm(@formula(Y~X), data2)

DataFrames.DataFrameRegressionModel{GLM.LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: Y ~ 1 + X

Coefficients:
             Estimate Std.Error  t value Pr(>|t|)
(Intercept)  -42.4573    9.9622 -4.26184   0.0509
X             10.1973   1.48405   6.8713   0.0205


The Estimate column above gives b (the Intercept row) and w (the X row), matching the earlier sections.
We assume at the start that X is known without error, that b, w, σ are unknown, and that
the real Y is distributed like `b + w*X + σ*randn()`,
with the Y we have being samples from this distribution.

Under these assumptions, if we repeated the fit on many fresh samples of Y, the estimates of b and w would be normally distributed, with standard deviations estimated by the Std.Error column.

The third column is just the ratio of column 1 to column 2, which normalizes each estimate. Because σ is itself estimated from the data, this ratio follows a Student t distribution with n-2 = 2 degrees of freedom under the hypothesis that the true value is 0; for large n this is close to a standard normal.

When the probability column is less than .05, we can reject the hypothesis that the intercept/slope is 0 at the 5 percent significance level. What does this mean? It means we feel pretty good that the intercept or slope is really nonzero. If the probability is higher than .05 we cannot reject the null hypothesis, meaning that 0 could have been possible for the intercept/slope. Here the slope (0.0205) passes the test and the intercept (0.0509) narrowly does not. In particular, a slope of 0 would say that the dependent variable does not really depend on X after all.
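
The Std.Error, t value and Pr(>|t|) columns can be reproduced by hand from the least squares fit. The sketch below assumes Julia 1.x and the standard library; it does not call GLM. The p-value line relies on a closed form of the Student t distribution that holds only for 2 degrees of freedom, which happens to be the case here (n - 2 = 2):

```julia
using LinearAlgebra

x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]
n = length(x)
A = [ones(n) x]

β  = A \ y                              # estimates: [b, w]
r  = y - A * β                          # residuals
s2 = sum(abs2, r) / (n - 2)             # unbiased estimate of the noise variance σ^2
se = sqrt.(s2 .* diag(inv(A' * A)))     # standard errors of b and w
t  = β ./ se                            # t values: column 1 divided by column 2

# Two-sided p-values. Valid ONLY for 2 degrees of freedom; other sample
# sizes need a general t-distribution routine.
p = 1 .- abs.(t) ./ sqrt.(2 .+ t .^ 2)
```

With this data the results agree with the table printed by `lm` to the digits shown there.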

## C. Stochastic Gradient Descent

In [34]:
loss(w,b,i) =(w*x[i]+b-y[i])^2  # loss due to point i
Dloss(w,b,i) = 2*(w*x[i]+b-y[i])*[x[i];1]

Dloss (generic function with 1 method)
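
In formulas, with $\ell_i$ denoting the loss due to point $i$ and $\eta$ the steplength, `Dloss` and the stochastic gradient update are:

$$
\ell_i(w,b) = (w x_i + b - y_i)^2, \qquad
\nabla \ell_i(w,b) = 2\,(w x_i + b - y_i)\begin{pmatrix} x_i \\ 1 \end{pmatrix}, \qquad
\begin{pmatrix} w \\ b \end{pmatrix} \leftarrow \begin{pmatrix} w \\ b \end{pmatrix} - \eta\,\nabla \ell_i(w,b)
$$

Each step draws the index $i$ uniformly at random, so on average the step points along the gradient of the full loss $\sum_i \ell_i$, scaled by $1/n$.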

In [35]:
w,b = 0.0, 0.0
for t=1:100000
    η = .002  # there seems to be an art to picking these steplengths
    i = rand(1:4)
    d = Dloss(w,b,i)
    w -= η * d[1]
    b -= η * d[2]  
end
 println(b," ",w)   

-42.49592100446783 10.040047957689831


In [59]:
loss(w,b,i) =(w*χ[i]+b-y[i])^2  # loss due to point i
Dloss(w,b,i) = 2*(w*χ[i]+b-y[i])*[χ[i];1]

Dloss (generic function with 1 method)

In [62]:
μ = mean(x)
σ = std(x)
χ = (x.-μ)/σ   # standardized x: mean 0, standard deviation 1

w,b = 0.0, 0.0
for t=1:100000
    η = .02  # a larger steplength than the .002 used with the raw x above
    i = rand(1:4)
    d = Dloss(w,b,i)
    w -= η * d[1]
    b -= η * d[2]
    ## fancier update rules such as Adam could replace the two lines above
end
println(b-w*μ/σ," ",w/σ)  # map (w,b) back to the original x scale


-42.76911028054944 10.238718795569065
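
One way to see why standardizing helps: the curvature of the summed loss is governed by `A'A`, where `A` has the columns `x` (or `χ`) and ones. Its condition number measures how stretched the loss surface is, and a badly stretched surface forces small steplengths. A sketch in Julia 1.x with the standard library; the names `A_raw` and `A_std` are introduced here:

```julia
using LinearAlgebra, Statistics

x = [5, 6.5, 7, 8]
χ = (x .- mean(x)) ./ std(x)         # standardized x, as above

A_raw = [x ones(length(x))]          # columns ordered (w, b) to match Dloss
A_std = [χ ones(length(x))]

κ_raw = cond(A_raw' * A_raw)         # roughly 1.8e3 for this data
κ_std = cond(A_std' * A_std)         # 4/3: the matrix is diag(3, 4) up to rounding
```

After standardizing, the two parameters are nearly decoupled, so one steplength suits both directions.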


## D. Knet

In [36]:
#Pkg.add("Knet")
using Knet

In [37]:
predict(w,x) = w[2]*x .+ w[1]
loss(w,x,y) = sum(abs2, y - predict(w,x)) 

loss (generic function with 2 methods)

In [38]:
lossgradient = grad(loss)

(::gradfun) (generic function with 1 method)
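
`grad` returns a function that computes the gradient of `loss` with respect to its first argument by automatic differentiation. A quick sanity check for any such gradient is to compare it against finite differences. The sketch below does not call Knet; `analytic_grad` and `fd_grad` are names made up for this illustration:

```julia
predict(w, x) = w[2] .* x .+ w[1]
loss(w, x, y) = sum(abs2, y .- predict(w, x))

# Hand-derived gradient of the loss with respect to w = [intercept, slope]
function analytic_grad(w, x, y)
    r = y .- predict(w, x)                  # residuals
    return [-2 * sum(r), -2 * sum(r .* x)]
end

# Central finite differences, one coordinate at a time
function fd_grad(w, x, y; h = 1e-6)
    g = zeros(length(w))
    for i in eachindex(w)
        e = zeros(length(w))
        e[i] = h
        g[i] = (loss(w + e, x, y) - loss(w - e, x, y)) / (2h)
    end
    return g
end

x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]
w = [1.0, 2.0]
g_exact = analytic_grad(w, x, y)
g_fd    = fd_grad(w, x, y)                  # should agree closely with g_exact
```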

In [39]:
function train(w, data; lr=.1)
    p=1
    for (x,y) in data
        println("This is pass $p")
        p+=1
        dw = lossgradient(w, x, y)
        for i in 1:length(w)
            w[i] -= lr * dw[i]
        end
    end
    return w
end

train (generic function with 1 method)

In [40]:
train([0.0,0.0],zip(x,y),lr=.01) # not enough data

This is pass 1
This is pass 2
This is pass 3
This is pass 4


2-element Array{Float64,1}:
 0.79688
 5.16277
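
These numbers can be reproduced without Knet by writing the per-point gradient by hand. Doing so also shows why the result is poor: a single pass over four points takes only four small steps and ends far from the least squares answer. A sketch in plain Julia; `pointgrad` is a name made up here:

```julia
x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]

# Gradient of (y - (w[2]*x + w[1]))^2 with respect to w, for one point (x, y)
pointgrad(w, x, y) = -2 * (y - (w[2] * x + w[1])) .* [1.0, x]

w  = [0.0, 0.0]
lr = 0.01
for (xi, yi) in zip(x, y)       # one pass: one update per data point
    w .-= lr .* pointgrad(w, xi, yi)
end
w                               # approximately [0.79688, 5.16277]
```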

In [41]:
data = [(x[i],y[i]) for i=1:4]

4-element Array{Tuple{Float64,Float64},1}:
 (5.0, 10.1)
 (6.5, 19.9)
 (7.0, 30.1)
 (8.0, 40.3)

In [67]:
function train2(w, data; lr=.1)
    for t in 1:10000
        (x,y) = data[rand(1:4)]       # pick one random data point
        dw = lossgradient(w, x, y)
        for i=1:length(w)
            w[i] -= lr * dw[i]        # plain SGD step
        end
        # a fancier update rule such as Adam could replace the plain step above
    end
    return w
end

train2 (generic function with 1 method)

In [68]:
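train2([0.0;0.0],data,lr=.01) # errors in this session, apparently because the loss(w,b,i) of In [59] was run after In [37] and overwrote loss(w,x,y), so y ends up used as an index

LoadError: ArgumentError: invalid index: 19.9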

## E. TensorFlow

In [70]:
#Pkg.add("TensorFlow")
using TensorFlow
session = Session()

Session(Ptr{Void} @0x0000000122d41c30)

In [71]:
W = TensorFlow.Variable(randn())
b = TensorFlow.Variable(randn())

TensorFlow.Variables.Variable{Float64}(<Tensor node_4:1 shape=() dtype=Float64>, <Tensor node_4/Assign:1 shape=unknown dtype=Float64>)

In [79]:
X = placeholder(Float32)
@tf Y = multiply(X,W).+b
Y_obs = placeholder(Float32)

<Tensor placeholder_8:1 shape=unknown dtype=Float32>

In [73]:
Loss=sum( (Y.-Y_obs).^2 )

<Tensor reduce_2:1 shape=unknown dtype=Float64>

In [74]:
optimizer = TensorFlow.train.GradientDescentOptimizer(1e-3)
minimizer = TensorFlow.train.minimize(optimizer, Loss)

<Tensor Group_3:1 shape=unknown dtype=Any>

In [75]:
run(session, global_variables_initializer())
for i in 1:20000
    run(session, minimizer, Dict(X=>x, Y_obs=>y))
end

In [76]:
run(session, [b, W])

2-element Array{Float64,1}:
 -41.7336
  10.0909
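
The values returned are close to, but not at, the least squares solution (about -42.457 and 10.197 from section B). Plain gradient descent with this steplength converges slowly along the flat direction of the loss. The sketch below repeats the same kind of iteration in plain Julia, without TensorFlow, starting from zeros rather than from `randn()` (an assumption made so the run is reproducible); the function name `gd` is made up here:

```julia
x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]

# Full-batch gradient descent on sum((W*x + b - y)^2) with steplength η
function gd(x, y; η = 1e-3, iters = 20000)
    W, b = 0.0, 0.0
    for _ in 1:iters
        r = W .* x .+ b .- y           # residuals at the current (W, b)
        W -= η * 2 * sum(r .* x)       # step along -∂loss/∂W
        b -= η * 2 * sum(r)            # step along -∂loss/∂b
    end
    return b, W
end

b20k,  W20k  = gd(x, y)                     # still short of the least squares solution
b200k, W200k = gd(x, y; iters = 200000)     # many more iterations close the gap
```

Standardizing `x` first, as in section C, is another way to speed this up without changing the optimizer.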

In [80]:
visualize()