# Concise Implementation of Linear Regression

Deep learning has witnessed a Cambrian explosion of sorts over the past decade. The sheer number of techniques, applications and algorithms by far surpasses the progress of previous decades. This is due to a fortuitous combination of multiple factors, one of which is the availability of powerful free tools offered by a number of open source deep learning frameworks. Theano (Bergstra et al., 2010), DistBelief (Dean et al., 2012), and Caffe (Jia et al., 2014) arguably represent the first generation of such frameworks that found widespread adoption. In contrast to earlier (seminal) works like SN2 (Simulateur Neuristique) (Bottou and Le Cun, 1988), which provided a Lisp-like programming experience, modern frameworks offer automatic differentiation and the convenience of Julia. These frameworks allow us to automate and modularize the repetitive work of implementing gradient-based learning algorithms.

In Section 3.4, we relied only on (i) tensors for data storage and linear algebra; and (ii) automatic differentiation for calculating gradients. In practice, because data iterators, loss functions, optimizers, and neural network layers are so common, modern libraries implement these components for us as well. In this section, we will show you how to implement the linear regression model from Section 3.4 concisely by using high-level APIs of deep learning frameworks.

## Generating the Dataset

For this example, we will work in low dimensions for succinctness. The following code snippets generate 1000 examples with 2-dimensional features drawn from a standard normal distribution. Each label is a linear function of the features plus a small amount of Gaussian noise.

In [1]:
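using Distributions

# Generate y = Xw + b + noise, where the noise has standard deviation 0.01.
function synthetic_data(w::Vector{<:Real}, b::Real, num_example::Int)
    X = rand(Normal(0, 1), (num_example, length(w)))  # one example per row
    y = X * w .+ b
    y += rand(Normal(0, 0.01), size(y))
    return X', y  # transpose the features so that each column holds one example
end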

synthetic_data (generic function with 1 method)
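
In symbols, the function above can be read as sampling from the following model. This is a restatement of the code, not an additional assumption; $n$ denotes the number of examples and $d$ the number of features, and both symbols are introduced here only for illustration.

$$\mathbf{y} = \mathbf{X}\mathbf{w} + b + \boldsymbol{\epsilon}, \qquad \mathbf{X} \in \mathbb{R}^{n \times d}, \quad X_{ij} \sim \mathcal{N}(0, 1), \quad \epsilon_i \sim \mathcal{N}(0, 0.01^2)$$

The second argument of `Normal` is the standard deviation, so the noise variance is $0.01^2$.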

The call below uses the ground truth parameters $\mathbf{w} = [2, -3.4]^\top$ and $b = 4.2$. Later, we can check our estimated parameters against these ground truth values.

In [2]:
features,labels = synthetic_data([2, -3.4],4.2,1000)

([0.14619589632160165 1.3906239203897983 … -0.19059142703720552 0.14345235980909807; 1.9135127909065464 0.2650503275663784 … 0.8634588460090279 0.3644827666846965], [-2.012942409291826, 6.084187903685514, 1.814663421623127, 12.026125408555579, 2.2966439611326224, -1.7288451861623593, 6.0209413829411815, 0.3864627570256396, 7.1530735946813016, 3.1171856744911026  …  5.458526207305779, 1.1371184528082483, 6.59211298956371, 10.489637581895897, 3.811671968815091, 3.194875126840119, 6.25205350737315, 11.77415713545804, 0.8749169028874869, 3.243621500894337])

Let’s have a look at the first entry.

In [3]:
println("features:$(features[:,1])")
println("label:$(labels[1])")

features:[0.14619589632160165, 1.9135127909065464]
label:-2.012942409291826
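
As a quick sanity check, the label of any example should equal the noiseless prediction up to the small noise term. The sketch below regenerates a comparable dataset with `randn` from Julia's standard library instead of Distributions.jl (an assumption made only to keep the snippet dependency-free), so its numbers will differ from the output above. The names `true_w`, `true_b`, `prediction`, and `residual` are illustrative.

```julia
using Random, LinearAlgebra

Random.seed!(42)                       # arbitrary seed, for repeatability only
true_w, true_b = [2.0, -3.4], 4.2      # the ground truth parameters used above
X = randn(2, 1000)                     # one example per column
y = X' * true_w .+ true_b .+ 0.01 .* randn(1000)

# The first label should match the noiseless prediction up to the noise term.
prediction = dot(true_w, X[:, 1]) + true_b
residual = y[1] - prediction
println("residual: $residual")
```

The residual should be on the order of the noise standard deviation, 0.01.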


## Reading the Dataset

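Rather than writing our own iterator, we can call `DataLoader` from MLUtils.jl to read the data in minibatches. Setting `shuffle=true` reshuffles the examples before each pass over the data. To build some intuition, let’s inspect the first minibatch of data. The shape of a minibatch of features tells us both the dimensionality of the input features and the minibatch size, in that order, because each column holds one example. Likewise, our minibatch of labels has a matching length given by `batchsize`.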

In [4]:
using MLUtils
train_loader = DataLoader((features,labels),batchsize=10,shuffle=true)
X,y = first(train_loader)
println("X shape:$(size(X))")
println("y shape:$(size(y))")

X shape:(2, 10)
y shape:(10,)
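
Conceptually, a shuffled data loader permutes the example indices and slices them into consecutive chunks. The following is a rough, dependency-free sketch of that idea under the column-per-example layout used above. It is not how MLUtils implements `DataLoader`, and the names `perm` and `batches` are illustrative.

```julia
using Random

X = randn(2, 1000)        # stand-in features, one example per column
y = randn(1000)           # stand-in labels
batchsize = 10

# Draw a random order of example indices, then cut it into chunks of `batchsize`.
perm = shuffle(1:size(X, 2))
batches = [(X[:, idx], y[idx]) for idx in Iterators.partition(perm, batchsize)]

Xb, yb = first(batches)
println("number of minibatches: $(length(batches))")
println("X shape:$(size(Xb))")
println("y shape:$(size(yb))")
```

Unlike this sketch, which materializes every minibatch up front, a data loader typically produces minibatches lazily as they are requested.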


## Defining the Model

For standard operations, we can use a framework’s predefined layers, which allow us to focus on the layers used to construct the model rather than worrying about their implementation. Recall the architecture of a single-layer network as described in Fig. 3.1.2. The layer is called fully connected, since each of its inputs is connected to each of its outputs by means of a matrix-vector multiplication.

In Flux, a fully connected layer is created with `Dense`. A `Dense(2 => 1)` layer has two inputs (the two features) and a single output, that is, one neuron.

In [5]:
using Flux
model = Dense(2=>1)

Dense(2 => 1)       # 3 parameters
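
The parameter count of 3 reported above comes from two weights and one bias. A hand-written sketch of the affine map that a fully connected layer with two inputs, one output, and no activation function computes could look like the following. The names `W`, `b`, and `dense` are illustrative, and this is not Flux's implementation.

```julia
W = randn(1, 2)           # weights: one row per output, one column per input
b = zeros(1)              # bias: one entry per output

dense(x) = W * x .+ b     # affine map applied to each column of x

X = randn(2, 10)          # a minibatch of 10 examples with 2 features each
yhat = dense(X)
println("output shape:$(size(yhat))")
println("parameters: $(length(W) + length(b))")
```

Because each column of the input is one example, the output has one column per example and one row per output neuron.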

## Defining the Loss Function