In [None]:
using Plots
#gr()
pyplot()
#plotly()

## A. Data Table choices

Machine learning is all about finding patterns in data, so it is very reasonable to start with data.

In [None]:
# Some Data (try your own)
x = [5,6.5,7,8]
y = [10.1, 19.9, 30.1, 40.3]
# plot(x,y,
#     label="Y", line=(7,:green), marker=(10,0.8,:red), xlims=(0,10), ylims=(0,50),
#     xlabel="X",ylabel="Y")
    

### A.1. Just a matrix please. (No labels, no extras, simple.)

In [None]:
data1 = [x y]

### A.2. Data Frames: Inspired by the R universe.

In [None]:
using DataFrames
data2 = DataFrame(X=x,Y=y) # Upper Case X and Y are labels (not data)

In [None]:
data2[1]   # first column (X)

In [None]:
#Pkg.add("CSV")
using CSV
CSV.write("data.csv", data2)

In [None]:
;cat data.csv

### A.3. Indexed Tables (Treat data like array indices, knows type information)

In [None]:
# Pkg.add("IndexedTables")
using IndexedTables.Table
using IndexedTables
data3 = Table(Columns(X=x),Columns(Y=y))

In [None]:
data3[6.5]   # look up the row whose index X equals 6.5

In [None]:
typeof.([data1,data2,data3])

### A.4. JuliaDB (Lots of bells and whistles, many files, parallelism, ...)

In [None]:
#Pkg.add("JuliaDB")
using JuliaDB:DTable
using JuliaDB

In [None]:
data4 = distribute(data3, 1) 

In [None]:
data5 = loadfiles(["data.csv"], usecache=false)

In [None]:
typeof(data4)

In [None]:
data4[1:2]

In [None]:
select(data4,1=>i->i≥7) 

In [None]:
filter(t->(t[1]>30),data4) 
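For comparison, the same kind of row selection works on the plain matrix from A.1 using a Boolean mask. This sketch assumes (our reading of the two JuliaDB cells, not something the notebook states) that `select` keeps the rows with X ≥ 7 and `filter` keeps the rows with Y > 30.

```julia
# Same data as above, stored as a plain 4×2 matrix (column 1: X, column 2: Y)
x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]
data1 = [x y]

# Rows whose X is at least 7 (analogous to the select cell)
xge7 = data1[data1[:, 1] .>= 7, :]

# Rows whose Y exceeds 30 (analogous to the filter cell)
ygt30 = data1[data1[:, 2] .> 30, :]

println(xge7)
println(ygt30)
```

The matrix version has no names or index, so you have to remember which column is which; that bookkeeping is what the table types above provide.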

### A.5. IterableTables

In [None]:
#using IterableTables, DataTables, TypedTables # haven't investigated much, but looks very nice

## B. Simple Line Fitting

[So why is it called "Regression" anyway?](http://blog.minitab.com/blog/statistics-and-quality-data-analysis/so-why-is-it-called-regression-anyway) Galton's original meaning is not quite what the word means today.

### B.1. Linear regression function

In [None]:
b, w =  linreg(x,y)

In [None]:
plot(x,y,
    label="Y", line=(4,:blue), marker=(3,0.8,:blue), xlims=(0,10), ylims=(0,50),
    xlabel="X",ylabel="Y")
# best-fit line, plus its values at the data points
plot!(x->w*x+b,xlims=(minimum(x)-.5,maximum(x)+.5), line=(4,:red), label="best fit line")
plot!(x->w*x+b, x ,marker=(3,0.8,:red), label="" )
# residuals: vertical segments from each point to the line (label="" keeps them out of the legend)
for i = 1:length(x)
    plot!([x[i],x[i]],[y[i],w*x[i]+b],line=(4,:green), label="")
end
plot!(legend=:topleft)
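The green segments are the residuals, the vertical distances between each data point and the line. A short plain-Julia sketch (the names `r`, `sse` and `sse_shifted` are just illustrative) computes them for the least-squares line and checks two properties that hold for any least-squares fit with an intercept: the residuals sum to zero and are orthogonal to x.

```julia
# Data from section A
x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]

# Least-squares slope and intercept, written with plain sums so the cell does not
# depend on where mean/cov live (Base in Julia 0.6, the Statistics stdlib in 1.x)
n  = length(x)
xm = sum(x) / n
ym = sum(y) / n
w  = sum((x .- xm) .* (y .- ym)) / sum((x .- xm) .^ 2)
b  = ym - w * xm

# Residuals: signed vertical distances from each point to the line
r = y .- (w .* x .+ b)

# Sum of squared residuals: the quantity the best-fit line minimizes
sse = sum(abs2, r)

# Move the line up by 1 and recompute
sse_shifted = sum(abs2, y .- (w .* x .+ (b + 1.0)))

println("residuals = ", r)
println("sum of residuals = ", sum(r), ",  sum of residuals .* x = ", sum(r .* x))
println("SSE = ", sse, ",  SSE with b shifted by 1 = ", sse_shifted)
```

Because the residuals sum to zero, shifting the intercept by 1 adds exactly n = 4 to the sum of squares, since the sum of (r_i - 1)^2 equals the sum of r_i^2, minus twice the sum of r_i, plus n.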

Mathematically equivalent approaches follow.

### B.2. Linear algebra least squares

In [None]:
A = [ones(length(x)) x]   # design matrix: a column of ones (intercept) next to the x values

In [None]:
A'A

In [None]:
A\y 

In [None]:
(A'A)\A'y  # normal equations: fine here, but usually not recommended, since forming A'A squares the condition number

In [None]:
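q,r = qr(A)   # thin QR factorization: A = q*r with r upper triangular
r\(q'y)       # solve the triangular system r*[b; w] = q'y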

In [None]:
[length(x) sum(x); sum(x) x⋅x] \ [ sum(y) ; x⋅y ] # (A'A)\A'y
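Written out with sums, the normal equations $A^\top A\,(b, w)^\top = A^\top y$ solved in the cell above are the following 2×2 system, shown in general and then with the data from section A plugged in ($n = 4$, $\sum x_i = 26.5$, $\sum x_i^2 = 180.25$, $\sum y_i = 100.4$, $\sum x_i y_i = 712.95$):

$$
\begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}
\begin{pmatrix} b \\ w \end{pmatrix}
=
\begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix}
\qquad\Longrightarrow\qquad
\begin{pmatrix} 4 & 26.5 \\ 26.5 & 180.25 \end{pmatrix}
\begin{pmatrix} b \\ w \end{pmatrix}
=
\begin{pmatrix} 100.4 \\ 712.95 \end{pmatrix}
$$

The determinant of this matrix is $4 \cdot 180.25 - 26.5^2 = 18.75$, so the system has a unique solution.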

### B.3. Basic formula

In [None]:
w = cov(x,y)/var(x) # same as (x.-mean(x))⋅(y.-mean(y))/sum(abs2,x.-mean(x))
b = mean(y)-w*mean(x)
b,w
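As a worked example with the data from section A: $\bar x = 6.625$, $\bar y = 25.1$, $\sum_i (x_i-\bar x)(y_i-\bar y) = 47.8$ and $\sum_i (x_i-\bar x)^2 = 4.6875$. The $1/(n-1)$ factors in `cov` and `var` cancel, so

$$
w = \frac{\operatorname{cov}(x,y)}{\operatorname{var}(x)} = \frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sum_i (x_i-\bar x)^2} = \frac{47.8}{4.6875} \approx 10.197,
\qquad
b = \bar y - w\,\bar x \approx 25.1 - 10.197 \times 6.625 \approx -42.457
$$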

In [None]:
@which linreg(x,y) # essentially uses the above formula

### B.4. Optimization (think machine learning) via the package Optim.jl

In [None]:
using Optim   # Julia all the way down
loss(bw) = sum(abs2, bw[2]*x .+ bw[1] .- y)   # bw = [b, w]; sum of squared residuals
optimize(loss, [0.0,0.0]).minimizer            # no gradient supplied, so Optim falls back to Nelder-Mead
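Called without a gradient, `optimize` uses a derivative-free method, so its minimizer is only approximate. The least-squares line is exactly the point where the gradient of this loss vanishes. A plain-Julia sketch of that gradient (the names `sqloss`, `sqloss_grad` and `bw_star` are hypothetical, not part of Optim) lets you check this; the same gradient could also be handed to a gradient-based method.

```julia
x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]

# Same loss as in the Optim cell, with bw = [b, w]
sqloss(bw) = sum(abs2, bw[2] .* x .+ bw[1] .- y)

# Hand-derived gradient with respect to (b, w), using residuals r_i = w*x_i + b - y_i:
#   d/db = 2 Σ r_i,   d/dw = 2 Σ r_i x_i
function sqloss_grad(bw)
    r = bw[2] .* x .+ bw[1] .- y
    return [2 * sum(r), 2 * sum(r .* x)]
end

# Closed-form least-squares solution (the B.3 formula, written with plain sums)
xm = sum(x) / length(x)
ym = sum(y) / length(y)
wls = sum((x .- xm) .* (y .- ym)) / sum((x .- xm) .^ 2)
bw_star = [ym - wls * xm, wls]

println("gradient at the least-squares fit: ", sqloss_grad(bw_star))   # zero up to rounding
println("gradient at [0, 0]:                ", sqloss_grad([0.0, 0.0]))
```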

### B.5. Optimization with the package JuMP

Note that not every Julia function can be used inside `@objective` or `@NLobjective`, although that would be the goal. See the [JuMP notes on linear and quadratic objectives](http://www.juliaopt.org/JuMP.jl/0.18/refexpr.html) and the [JuMP notes on nonlinear syntax](http://www.juliaopt.org/JuMP.jl/0.18/nlp.html#syntax-notes).

In [None]:
#Pkg.add("Ipopt")

In [None]:
using JuMP, Ipopt
n = length(x)
m = Model(solver=IpoptSolver(print_level=0))
@variable(m,w)
@variable(m,b)
@objective(m, Min, sum((w*x[i]+b-y[i])^2 for i in 1:n))
#@objective(m, Min,   sum(abs2,  w*x+b-y))
solve(m)
println("b = ", getvalue(b), ", w = ", getvalue(w))

### B.6. Generalized Linear Models

The very fancy statistical thing.

In [None]:
#Pkg.add("GLM")
using GLM # Generalized Linear Models

In [None]:
lm(@formula(Y~X), data2)

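The first column gives the estimates: the `(Intercept)` row is b and the `X` row is w, the same values as above.

The statistics in the other columns assume that X is known without error, that b, w and σ are unknown, and that each observed Y is a sample from

$$Y = b + wX + \sigma\varepsilon, \qquad \varepsilon \sim N(0,1),$$

i.e. the real Y is distributed like `b + w*X + σ*randn()`.

Under these assumptions, if we collected new data and refit many times, the fitted b and w would be normally distributed around their true values, with the standard deviations estimated in the second column.

The third column is the ratio of column 1 to column 2, which standardizes each estimate. Because σ is itself estimated from the data, under the null hypothesis that the coefficient is 0 this ratio follows a t distribution with n - 2 degrees of freedom (here 2) rather than exactly a standard normal; the two agree only for large n.

When the probability column is less than .05, we can reject the hypothesis that the intercept/slope is 0 at the 5 percent significance level. What does this mean? An estimate this far from 0 would be unlikely if the true value were 0, so we feel pretty good about our intercept and slope. If the probability is higher than .05, we cannot reject the null hypothesis, meaning that 0 for the intercept/slope remains plausible. In particular, if a 0 slope cannot be ruled out, the data give no statistical evidence that the dependent variable depends on X after all.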
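The second and third columns can be reproduced by hand from the standard formulas for simple linear regression: $\hat\sigma^2 = \mathrm{SSE}/(n-2)$, $\mathrm{se}(w) = \hat\sigma/\sqrt{S_{xx}}$ and $\mathrm{se}(b) = \hat\sigma\sqrt{1/n + \bar x^2/S_{xx}}$, where $S_{xx} = \sum_i (x_i-\bar x)^2$. The sketch below is plain Julia (no GLM); it leaves out the p-values, which would need the t-distribution CDF from a package such as Distributions.jl.

```julia
x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]
n = length(x)

# Least-squares fit (the B.3 formula with plain sums)
xm  = sum(x) / n
ym  = sum(y) / n
Sxx = sum((x .- xm) .^ 2)
w   = sum((x .- xm) .* (y .- ym)) / Sxx
b   = ym - w * xm

# Residual standard deviation: n - 2 degrees of freedom because b and w were fitted
sse  = sum(abs2, y .- (b .+ w .* x))
σhat = sqrt(sse / (n - 2))

# Standard errors of the intercept and the slope (column 2)
se_b = σhat * sqrt(1 / n + xm^2 / Sxx)
se_w = σhat / sqrt(Sxx)

# t statistics: estimate / standard error (column 3)
t_b = b / se_b
t_w = w / se_w

println("intercept: ", b, "  se = ", se_b, "  t = ", t_b)
println("slope:     ", w, "  se = ", se_w, "  t = ", t_w)
```

With this data it gives t ≈ 6.87 for the slope and t ≈ -4.26 for the intercept, which you can compare against the `lm` table.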

## C. Stochastic Gradient Descent

In [None]:
loss(w,b,i) = (w*x[i]+b-y[i])^2              # loss due to point i
Dloss(w,b,i) = 2*(w*x[i]+b-y[i])*[x[i];1]    # gradient of loss(w,b,i) with respect to [w; b]

In [None]:
w,b = 0.0, 0.0
η = .002  # there seems to be an art to picking these step lengths
for t=1:100000
    i = rand(1:length(x))   # pick one data point at random
    d = Dloss(w,b,i)
    w -= η * d[1]
    b -= η * d[2]
end
println(b," ",w)
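Each pass through the loop above is one stochastic gradient step on a single randomly chosen term of the loss. With $r_i = w x_i + b - y_i$:

$$
\ell_i(w,b) = r_i^2, \qquad
\nabla \ell_i(w,b) = 2 r_i \begin{pmatrix} x_i \\ 1 \end{pmatrix}, \qquad
\begin{pmatrix} w \\ b \end{pmatrix} \leftarrow \begin{pmatrix} w \\ b \end{pmatrix} - \eta\, \nabla \ell_i(w,b)
$$

Since $i$ is drawn uniformly from $\{1,\dots,n\}$, the expected step is $\eta/n$ times the gradient of the full loss $\sum_i \ell_i$, so the iterates drift toward the least-squares solution, while still jittering around it because η stays fixed.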

In [None]:
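# same loss, but in terms of the standardized data χ (defined in the next cell)
loss(w,b,i) = (w*χ[i]+b-y[i])^2              # loss due to point i
Dloss(w,b,i) = 2*(w*χ[i]+b-y[i])*[χ[i];1]    # gradient with respect to [w; b]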

In [None]:
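μ = mean(x)
σ = std(x)
χ = (x .- μ) ./ σ   # standardized x: mean 0, standard deviation 1

w,b = 0.0, 0.0
η = .01  # there seems to be an art to picking these step lengths
for t=1:100000
    i = rand(1:length(x))
    d = Dloss(w,b,i)
    w -= η * d[1]
    b -= η * d[2]
    # fancier update rules such as Adam could be used instead
end
println(b-w*μ/σ," ",w/σ)   # intercept and slope for the original (unstandardized) x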
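The last line undoes the standardization. Since $\chi = (x-\mu)/\sigma$, a line fitted in terms of $\chi$ can be rewritten in terms of $x$:

$$
\hat y = w\chi + b = w\,\frac{x-\mu}{\sigma} + b = \frac{w}{\sigma}\,x + \left(b - \frac{w\mu}{\sigma}\right)
$$

so the slope for the original data is $w/\sigma$ and the intercept is $b - w\mu/\sigma$, exactly the two numbers printed. Standardizing puts both coordinates on a similar scale, which is plausibly why the larger step η = .01 still works here.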


## D. Knet

In [None]:
#Pkg.add("Knet")
using Knet

In [None]:
predict(w,x) = w[2]*x .+ w[1]                # w = [intercept, slope]
loss(w,x,y) = sum(abs2, y - predict(w,x))    # replaces the 3-argument loss from section C

In [None]:
lossgradient = grad(loss)

In [None]:
function train(w, data; lr=.1)
    p=1
    for (x,y) in data
        println("This is pass $p")
        p+=1
        dw = lossgradient(w, x, y)
        for i in 1:length(w)
            w[i] -= lr * dw[i]
        end
    end
    return w
end

In [None]:
train([0.0,0.0],zip(x,y),lr=.01) # only 4 pairs, i.e. 4 gradient steps: not nearly enough to converge

In [None]:
data = [(x[i],y[i]) for i=1:4]

In [None]:
function train2(w, data; lr=.1)
    for t in 1:10000
        (x,y) = data[rand(1:length(data))]   # one random data point per step
        dw = lossgradient(w, x, y)
        for i in 1:length(w)
            w[i] -= lr * dw[i]
        end
        #update(w, lossgradient(w,x,y), adam())
    end
    return w
end

In [None]:
train2([0.0;0.0],data,lr=.01) 

## E. TensorFlow

Jon Malmaud explains why Julia+TensorFlow is way better than Python+TensorFlow
[https://www.youtube.com/watch?v=MaCf1PtHEJo](https://www.youtube.com/watch?v=MaCf1PtHEJo)

Why use the Julia API when you can use the Python API? <br>
At 3:20 he gives three reasons: <br>
1. Julia's multiple dispatch system <br>
2. Julia's macro system <br>
3. Julia's JIT compiler

Namespace doesn't carry over. <br>
TensorFlow models can be defined in Julia and sent over to Python. <br>
Native Julia while loops look like Julia, not some weird TensorFlow thing. <br>
New operations are imported automatically, no waiting. <br>

Nico Jimenez says TensorFlow is too low level to be useful in some ways and too high level to be useful in other ways:
[http://nicodjimenez.github.io/2017/10/08/tensorflow.html](http://nicodjimenez.github.io/2017/10/08/tensorflow.html)


In [1]:
# fresh kernel (note In [1]), so the data is redefined here
x = [5,6.5,7,8]
y = [10.1, 19.9, 30.1, 40.3]

4-element Array{Float64,1}:
 10.1
 19.9
 30.1
 40.3

In [3]:
using TensorFlow

In [4]:
session=Session()

2017-11-02 11:11:03.716863: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-02 11:11:03.716894: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.


Session(Ptr{Void} @0x0000000121ac59d0)

In [5]:
@tf X=placeholder(Float32)

<Tensor X:1 shape=unknown dtype=Float32>

In [6]:
@tf W=get_variable([], Float32)

TensorFlow.Variables.Variable{Float32}(<Tensor W:1 shape=() dtype=Float32>, <Tensor W/Assign:1 shape=unknown dtype=Float32>)

In [7]:
 @tf b=get_variable([], Float32)

TensorFlow.Variables.Variable{Float32}(<Tensor b:1 shape=() dtype=Float32>, <Tensor b/Assign:1 shape=unknown dtype=Float32>)

In [8]:
 @tf Y=X.*W + b

<Tensor Y:1 shape=unknown dtype=Float32>

In [9]:
 @tf Y_obs=placeholder(Float32)

<Tensor Y_obs:1 shape=unknown dtype=Float32>

In [10]:
 @tf Loss=reduce_sum((Y.-Y_obs).^2)

<Tensor Loss:1 shape=unknown dtype=Float32>

In [11]:
optimizer=train.GradientDescentOptimizer(1e-3)

GradientDescentOptimizer(α=0.001)

In [12]:
minimizer=train.minimize(optimizer, Loss)

<Tensor Group:1 shape=unknown dtype=Float32>

In [14]:
run(session, global_variables_initializer())
for i in 1:20000
    run(session, minimizer, Dict(X=>[5,6.5,7,8], Y_obs=>[10.1,19.9,30.1,40.3]))
end

In [15]:
run(session, [b, W])

2-element Array{Float32,1}:
 -41.7249
  10.0896
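These numbers are close to, but not the same as, the least-squares values from section B (b ≈ -42.457, w ≈ 10.197). A plain-Julia sketch of the same full-batch gradient descent (loss Σ(w·x_i + b - y_i)², step 1e-3, 20000 steps) suggests the gap is just slow convergence: because x is not centered, the loss surface is a long, narrow valley and progress along it is slow. The starting point (0, 0) is an assumption, since the notebook does not show how TensorFlow initialized `W` and `b`; the function name `gd` is hypothetical.

```julia
x = [5, 6.5, 7, 8]
y = [10.1, 19.9, 30.1, 40.3]

# Full-batch gradient descent on Loss = Σ (w*x_i + b - y_i)^2, mirroring the
# TensorFlow graph above. Starting from (0, 0) is an assumption.
function gd(x, y; α = 1e-3, steps = 20000)
    b, w = 0.0, 0.0
    for k in 1:steps
        r  = w .* x .+ b .- y   # residuals
        gb = 2 * sum(r)         # ∂Loss/∂b
        gw = 2 * sum(r .* x)    # ∂Loss/∂w
        b -= α * gb
        w -= α * gw
    end
    return b, w
end

b_gd, w_gd = gd(x, y)
println("gradient descent: b = ", b_gd, ", w = ", w_gd)
```

From that starting point it ends near b ≈ -41.72, w ≈ 10.09, very close to the TensorFlow output; running more steps, or standardizing x as in section C, brings it to the least-squares answer.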

In [16]:
visualize()

ERROR:tensorflow:TensorBoard attempted to bind to port 6006, but it was already in use
ERROR:tensorflow:TensorBoard attempted to bind to port 6007, but it was already in use


In [17]:
visualize()

ERROR:tensorflow:TensorBoard attempted to bind to port 6008, but it was already in use
ERROR:tensorflow:TensorBoard attempted to bind to port 6009, but it was already in use
