# Concise Implementation of Linear Regression

Deep learning has witnessed a Cambrian explosion of sorts over the past decade. The sheer number of techniques, applications, and algorithms by far surpasses the progress of previous decades. This is due to a fortuitous combination of multiple factors, one of which is the availability of powerful free tools offered by a number of open source deep learning frameworks. Theano (Bergstra et al., 2010), DistBelief (Dean et al., 2012), and Caffe (Jia et al., 2014) arguably represent the first generation of such frameworks that found widespread adoption. In contrast to earlier (seminal) works like SN2 (Simulateur Neuristique) (Bottou and Le Cun, 1988), which provided a Lisp-like programming experience, modern frameworks offer automatic differentiation and the convenience of Julia. These frameworks allow us to automate and modularize the repetitive work of implementing gradient-based learning algorithms.

In Section 3.4, we relied only on (i) tensors for data storage and linear algebra; and (ii) automatic differentiation for calculating gradients. In practice, because data iterators, loss functions, optimizers, and neural network layers are so common, modern libraries implement these components for us as well. In this section, we will show you how to implement the linear regression model from Section 3.4 concisely by using high-level APIs of deep learning frameworks.

## Generating the Dataset

For this example, we will work in low dimensions for succinctness. The following code snippet generates 1000 examples with 2-dimensional features drawn from a standard normal distribution.
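Written out, the generator implemented in the next cell can be summarized as follows. The symbols $n$, $d$, and $\boldsymbol{\epsilon}$ are notation introduced here for illustration; $\mathbf{X}$, $\mathbf{w}$, $b$, and $\mathbf{y}$ mirror the variable names in the code, and the noise standard deviation of 0.01 is read off from `Normal(0,0.01)`.

$$\mathbf{y} = \mathbf{X}\mathbf{w} + b + \boldsymbol{\epsilon}, \qquad \mathbf{X} \in \mathbb{R}^{n \times d}, \quad X_{ij} \sim \mathcal{N}(0, 1), \quad \epsilon_i \sim \mathcal{N}(0, 0.01^2)$$

Here $n = 1000$ is the number of examples and $d = 2$ is the number of features.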

In [1]:
using Distributions

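# Generate y = Xw + b + noise, with standard normal features and N(0, 0.01²) noise.
function synthetic_data(w::Vector{<:Real}, b::Real, num_example::Int)
    X = rand(Normal(0, 1), num_example, length(w))  # num_example × length(w) design matrix
    y = X * w .+ b
    y += rand(Normal(0, 0.01), size(y))             # additive Gaussian noise
    return X', y                                    # features transposed: one example per column
end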

synthetic_data (generic function with 1 method)

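We generate the labels from a ground truth linear model with weights `[2, -3.4]` and bias `4.2`, corrupted by Gaussian noise with standard deviation 0.01. Later, we can check our estimated parameters against these ground truth values.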

In [2]:
features,labels = synthetic_data([2, -3.4],4.2,1000)

([0.5006953714114035 1.4278165842789703 … 2.6686629253301724 -0.5470407231072809; 1.1013609444195784 -0.8059628155679628 … 0.6777961748000622 1.9434748518640463], [1.4652014793459702, 9.806428240909254, 0.3565326564176827, 3.194371535492145, -0.9134768516893775, 3.1923506353899906, 3.921252857303777, 3.1402874931026026, -1.450671265814563, 1.0619568653777844  …  1.304614157970065, 5.653518363524242, -1.9093483481046456, 6.55398498746768, 8.175983252559547, 7.631529225782028, 6.4959758221852555, 7.392017066440783, 7.236007570818834, -3.5025616296615185])

Let’s have a look at the first entry.

In [3]:
println("features:$(features[:,1])")
println("label:$(labels[1])")

features:[0.5006953714114035, 1.1013609444195784]
label:1.4652014793459702
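As a quick sanity check on the generator, the sketch below subtracts the noiseless prediction from the labels and confirms that what remains looks like the injected noise. It is an illustration under assumptions rather than the code used above: `synthetic_data_base` is a hypothetical stand-in that uses Base's `randn` instead of Distributions so that the snippet runs without extra packages, and the seed is arbitrary.

```julia
using Random, Statistics

Random.seed!(0)  # arbitrary seed, only to make the check repeatable

# Hypothetical stand-in for synthetic_data, built on Base's randn
function synthetic_data_base(w::Vector{<:Real}, b::Real, n::Int)
    X = randn(n, length(w))              # n × d standard normal features
    y = X * w .+ b .+ 0.01 .* randn(n)   # linear model plus N(0, 0.01²) noise
    return X', y                         # features as d × n, labels as a length-n vector
end

true_w, true_b = [2, -3.4], 4.2
features, labels = synthetic_data_base(true_w, true_b, 1000)

# Residuals with respect to the noiseless model; only the noise should remain
residuals = labels .- (features' * true_w .+ true_b)

println("size(features) = ", size(features))
println("residual std ≈ ", std(residuals))
```

The residual standard deviation should come out close to 0.01, which is the scale of the noise added to the labels.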


## Reading the Dataset

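Rather than writing our own iterator, we wrap the features and labels in a `DataLoader` from MLUtils, which shuffles the examples and serves them in minibatches. To build some intuition, let’s inspect the first minibatch of data. Each minibatch of features provides us with both its size and the dimensionality of input features. Because examples are stored as columns, its shape is (number of features, batch size). Likewise, our minibatch of labels will have a matching length given by `batchsize`.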

In [4]:
using MLUtils
train_loader = DataLoader((features,labels),batchsize=10,shuffle=true)
X,y = first(train_loader)
println("X shape:$(size(X))")
println("y shape:$(size(y))")

X shape:(2, 10)
y shape:(10,)
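To make the loader's behavior more concrete, here is a minimal sketch of shuffled minibatching written with Base Julia only. It is not how MLUtils implements `DataLoader`; `minibatches` is a hypothetical helper, and the random placeholder arrays merely share the layout of `features` and `labels` (one example per column).

```julia
using Random

# Hypothetical helper: collect (X, y) minibatches with examples along the last dimension
function minibatches(features::AbstractMatrix, labels::AbstractVector, batchsize::Int;
                     rng=Random.default_rng())
    idx = shuffle(rng, 1:length(labels))   # visit the examples in random order
    # Cut the shuffled indices into chunks and slice out the matching columns and labels
    return [(features[:, i], labels[i]) for i in Iterators.partition(idx, batchsize)]
end

features = randn(2, 1000)   # placeholder data: 2 features × 1000 examples
labels = randn(1000)

batches = minibatches(features, labels, 10)
X, y = first(batches)
println("number of batches: ", length(batches))
println("X shape: ", size(X), ", y shape: ", size(y))
```

With 1000 examples and a batch size of 10, one pass over the data yields 100 minibatches, each holding a 2 × 10 feature block and 10 labels.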
