# Concise Implementation of Softmax Regression
Just as high-level deep learning frameworks made it easier to implement linear regression (see Section 3.5), they are similarly convenient here.

## Defining the Model

As in Section 3.5, we construct our fully connected layer using the built-in `Dense` layer. We use a `flatten` layer to reshape arbitrarily shaped input into a matrix-shaped output, preserving the size of the last dimension.

In [1]:
using Flux
# Flatten each 28×28 image, then map its 784 features to 10 class logits.
model = Chain(Flux.flatten, Dense(28 * 28 => 10))

Chain(
  Flux.flatten,
  Dense(784 => 10),                     # 7_850 parameters
) 
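
To make the reshaping concrete, here is a minimal sketch in plain Julia of what a flatten step does to a batch of images. The helper name `flatten_sketch` and the batch size of 256 are illustrative assumptions and not part of Flux; the helper simply collapses every dimension except the last into one.

```julia
# Hypothetical helper mirroring the behavior described above: collapse all
# dimensions except the last (the batch dimension) into a single one.
flatten_sketch(x::AbstractArray) = reshape(x, :, size(x)[end])

# A dummy batch of 256 grayscale 28×28 images (width × height × batch).
x = rand(Float32, 28, 28, 256)
flat = flatten_sketch(x)

println(size(flat))  # (784, 256)
```

Each column of `flat` holds one image as a 784-element vector, which is the shape the `Dense(784 => 10)` layer expects.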


Computing the softmax and its logarithm together, rather than exponentiating the logits first and taking the log afterwards, avoids both overflow and underflow. We will want to keep the conventional softmax function handy in case we ever want to evaluate the probabilities output by our model. But instead of passing softmax probabilities into our loss function, we pass the logits and compute the softmax and its log all at once inside the cross-entropy loss function, which applies techniques such as the “LogSumExp trick”.

In [2]:
# Cross-entropy computed directly from the logits, without an explicit softmax.
loss(model, x, y) = Flux.logitcrossentropy(model(x), y)

loss (generic function with 1 method)
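
The effect of the “LogSumExp trick” can be illustrated with a small sketch in plain Julia. The function names `naive_logsoftmax`, `stable_logsoftmax`, and `stable_crossentropy` are hypothetical and written for this illustration; this is not Flux's internal implementation, only one common way to obtain the same numerical stability by subtracting the largest logit before exponentiating.

```julia
# Naive log-softmax: exponentiate first, then normalize and take the log.
naive_logsoftmax(z) = log.(exp.(z) ./ sum(exp.(z)))

# Stable variant: subtract the largest logit before exponentiating, so the
# largest exponent is exp(0) = 1 and nothing overflows.
function stable_logsoftmax(z)
    m = maximum(z)
    shifted = z .- m
    return shifted .- log(sum(exp.(shifted)))
end

# Cross-entropy for a single example with a one-hot label vector y.
stable_crossentropy(z, y) = -sum(y .* stable_logsoftmax(z))

z = Float32[1000, 999, -1000]   # extreme logits chosen to trigger overflow
y = Float32[1, 0, 0]            # one-hot label for the first class

println(naive_logsoftmax(z))        # contains NaN because exp(1000) overflows
println(stable_logsoftmax(z))       # all entries are finite
println(stable_crossentropy(z, y))  # finite loss for the same logits
```

If probabilities are needed, they can be recovered from the stable result with `exp.(stable_logsoftmax(z))`, which sums to one up to rounding.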

## Training

Next we train our model. We use Fashion-MNIST images, flattened to 784-dimensional feature vectors.

In [4]:
using MLUtils
using MLDatasets

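# Accept the dataset download prompt automatically.
ENV["DATADEPS_ALWAYS_ACCEPT"] = true
mnist_train, mnist_test = FashionMNIST(:train), FashionMNIST(:test)
features = mnist_train.features
# One-hot encode the integer class labels 0 through 9.
labels = Flux.onehotbatch(mnist_train.targets, 0:9)

# Iterate over the training set in mini-batches of 256 examples.
train_loader = DataLoader((features, labels), batchsize=256)
num_epochs = 10
# Record the loss over the full training set after each epoch.
loss_volume = map(1:num_epochs) do i
    for data in train_loader
        # One gradient-descent step on the current mini-batch.
        Flux.train!(loss, model, [data], Descent())
    end
    loss(model, features, labels)
end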

10-element Vector{Float32}:
 0.6156508
 0.54177076
 0.50940657
 0.48994175
 0.47646806
 0.46637794
 0.45843133
 0.4519492
 0.44652238
 0.44188702