# Convolutional Sentiment Classification Network
### Ref: https://github.com/denizyuret/nn4nlp-code/blob/ilker/cnn-knet/05-cnn-knet/cnn-class.jl

# Imports

In [1]:
using Knet
using Random, Statistics, Printf

# Data Pre-Processing

We use the Stanford Sentiment Treebank dataset without the tree
information. First, we initialize the word->id and tag->id
collections and insert the padding symbol `<pad>` and the
unknown-word symbol `<unk>` into the word->id collection.

In [2]:
wdict, tdict = Dict(), Dict()
w2i(x) = get!(wdict, x, 1+length(wdict))
t2i(x) = get!(tdict, x, 1+length(tdict))
PAD = w2i("<pad>");
UNK = w2i("<unk>");
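
To see how `get!` grows the mapping, here is a minimal sketch on a toy dictionary. The names `toydict` and `toy_w2i` are illustrative and not part of the notebook. `get!` returns the stored id when the key exists and otherwise inserts the default, so ids are assigned in order of first appearance.

```julia
# Toy vocabulary; `toydict` and `toy_w2i` are hypothetical names for illustration.
toydict = Dict{String,Int}()
toy_w2i(x) = get!(toydict, x, 1 + length(toydict))

pad = toy_w2i("<pad>")                    # first key inserted
unk = toy_w2i("<unk>")                    # second key inserted
ids = toy_w2i.(["the", "movie", "the"])   # the repeated word reuses its id
```

Since the two special symbols are inserted first, they receive the two lowest ids.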

In the data files, each line holds a sentiment label and a sentence,
separated by `|||` (with a space on either side).

In [3]:
function readdata(file)
    instances = []
    for line in eachline(file)
        y, x = split(line, " ||| ")
        y, x = t2i(y), w2i.(split(x))
        push!(instances, (x, [y]))
    end
    return instances
end

readdata (generic function with 1 method)
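
The parsing step inside `readdata` can be tried on a single line. The line below is made up for illustration and is not taken from the dataset.

```julia
# A made-up line in the "label ||| sentence" format; not taken from the dataset.
line = "3 ||| a charming film"

label, sentence = split(line, " ||| ")   # label string and sentence text
tokens = split(sentence)                 # whitespace tokenization
```

Because `t2i` maps the label string to an id in order of first appearance, the stored tag ids need not equal the numeric labels written in the file.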

After reading the training data, we redefine ```w2i``` so that it no
longer inserts new words into the vocabulary and instead maps unseen
words to `<unk>`. We also redefine ```t2i``` as a plain lookup, and then
read the validation data.

In [4]:
trn = readdata("/Users/emrecanacikgoz/Desktop/Comp442/data/classes/train.txt")
w2i(x) = get(wdict, x, UNK)
t2i(x) = tdict[x]
nwords, ntags = length(wdict), length(tdict)
dev = readdata("/Users/emrecanacikgoz/Desktop/Comp442/data/classes/test.txt")

2210-element Vector{Any}:
 ([6067, 74, 2, 466], [3])
 ([242, 105, 534, 173, 7, 907, 7, 9, 621, 7  …  150, 29, 3138, 5, 22, 151, 1335, 7, 4561, 36], [1])
 ([2, 235, 1772, 1069, 29, 114, 2403, 319, 18, 12  …  2, 3315, 18, 136, 981, 326, 1077, 173, 294, 36], [2])
 ([3, 341, 665, 197, 496, 856, 323, 9, 11012, 5774  …  413, 90, 11074, 9, 8756, 6046, 40, 9, 2894, 36], [3])
 ([4431, 18, 1069, 2848, 40, 1258, 17, 6332, 36], [2])
 ([1696, 896, 1105, 694, 783, 254, 70, 13564, 2, 18, 9, 677, 7, 4957, 5, 13295, 96, 151, 4912, 36], [1])
 ([3567, 580, 122, 22, 6135, 141, 18, 2, 178, 9  …  136, 12, 44, 1750, 105, 527, 7, 3703, 136, 36], [1])
 ([372, 19, 2209, 3334, 136, 603, 36], [1])
 ([334, 335, 40, 336, 337, 1070, 22, 3125, 582, 3962, 40, 3610, 599, 36], [1])
 ([162, 5, 22, 341, 257, 305, 1580, 29, 4006, 17, 2162, 5784, 17, 248, 36], [2])
 ([360, 298, 3473, 166, 17874, 5, 355, 2987, 3132, 17, 2973, 5574, 36], [1])
 ([308, 5078, 5, 310, 3052, 846, 907, 7, 9, 599  …  350, 9, 1229, 29, 9, 365, 29, 9,
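
The effect of switching from `get!` to `get` can be sketched with a small frozen vocabulary. The names `vocab`, `unk_id` and `frozen_w2i` are hypothetical stand-ins for `wdict`, `UNK` and the redefined `w2i`.

```julia
# Hypothetical toy vocabulary standing in for `wdict` after the training data is read.
vocab = Dict("<pad>" => 1, "<unk>" => 2, "good" => 3, "film" => 4)
unk_id = vocab["<unk>"]

frozen_w2i(x) = get(vocab, x, unk_id)   # `get` never inserts, unlike `get!`

ids = frozen_w2i.(split("good unseenword film"))
```

The unseen word maps to the id of `<unk>`, and the vocabulary size stays fixed, so the embedding matrix can be sized once from `nwords`.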

# Model

We now develop the convolutional sentiment classification model.
It is a stack of five consecutive operations: word embedding,
1-dimensional convolution, max-pooling, ReLU activation, and a linear
prediction layer. First, we define our network,

In [5]:
mutable struct CNN
    embedding
    conv1d
    linear
end

Then, we implement the forward propagation and loss calculation,

In [6]:
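function (model::CNN)(x)
    # Pad sentences shorter than the convolution window with PAD.
    windowsize = size(model.conv1d.w, 2)
    if windowsize > length(x)
        x = vcat(x, [PAD for i = 1:windowsize-length(x)])
    end
    emb = model.embedding(x)            # T×E matrix of word vectors
    T, E = size(emb); B = 1             # one sentence per batch
    emb = reshape(emb, 1, T, E, B)      # 4-D input for conv4: (1, T, E, B)
    # Max over the time dimension, then ReLU.
    hidden = relu.(maximum(model.conv1d(emb), dims=2))
    hidden = reshape(hidden, size(hidden,3), B)   # nfilters×B
    output = model.linear(hidden)       # ntags×B scores
end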
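
The convolution, max-pooling and ReLU steps of the forward pass can be illustrated without Knet. The following is a plain-Julia sketch under assumed toy sizes. It slides each filter over the sentence without flipping the kernel (cross-correlation), which for learned weights differs from a true convolution only in how the weights are ordered. It is not how `conv4` is implemented.

```julia
# Plain-Julia sketch of convolution over time, max-pooling, and ReLU.
# All names and sizes here are illustrative assumptions.
T, E, F, K = 5, 4, 3, 2          # sentence length, embedding size, filters, window size
emb = reshape(collect(1.0:T*E), T, E) ./ (T * E)   # stand-in for embedded words
w = ones(K, E, F)                # filter f has weights w[:, :, f]
w[:, :, 2] .= -1.0               # give the second filter negative weights
b = zeros(F)

# One output per window position and filter: T - K + 1 positions.
conv = [sum(emb[t:t+K-1, :] .* w[:, :, f]) + b[f] for t in 1:T-K+1, f in 1:F]

pooled = vec(maximum(conv, dims=1))   # max over time: one value per filter
hidden = max.(pooled, 0.0)            # ReLU
```

Max-pooling over time turns a sentence of any length into a vector with one entry per filter, which is what lets a fixed-size linear layer follow.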

In [7]:
(model::CNN)(x,y) = nll(model(x),y)
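
For a single sentence, the quantity that `nll` returns is the negative log-probability of the gold tag under a softmax over the scores. The following plain-Julia sketch illustrates that formula with made-up scores. It is not Knet's implementation.

```julia
# Plain-Julia sketch of the negative log-likelihood of one instance.
# `scores` stands in for the model output; the values are made up.
scores = [1.0, 2.0, 0.5, -1.0, 0.0]    # one unnormalized score per class
gold = 2                               # index of the correct class

m = maximum(scores)
logZ = m + log(sum(exp.(scores .- m))) # numerically stable log-sum-exp
logprobs = scores .- logZ              # log-softmax
loss = -logprobs[gold]
```

The loss shrinks as the score of the gold class grows relative to the others.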

To make the network work, we need to implement the ```Embedding```,
```Linear``` and ```Conv``` layers,

In [8]:
mutable struct Embedding; w; end
(layer::Embedding)(x) = layer.w[x, :]
Embedding(vocabsize::Int, embedsize::Int) = Embedding(
    param(vocabsize, embedsize))

Embedding
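
The lookup `layer.w[x, :]` is ordinary row indexing. A toy example with a plain matrix (sizes chosen only for illustration):

```julia
# Toy embedding matrix: 4 vocabulary entries, 3 dimensions (sizes are illustrative).
w = reshape(collect(1.0:12.0), 4, 3)   # row i is the vector of word id i
x = [2, 4, 2]                          # a sentence as word ids

emb = w[x, :]                          # T×E matrix, one row per token
```

Repeated word ids select the same row, so the first and third rows of `emb` are identical here.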

In [9]:
mutable struct Linear; w; b; pdrop; end
Linear(inputsize::Int, outputsize::Int, pdrop=0) = Linear(
    param(outputsize, inputsize),
    param0(outputsize, 1), 
    pdrop)
(layer::Linear)(x) = layer.w * dropout(x, layer.pdrop) .+ layer.b
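
Dropout with rescaling ("inverted dropout") can be sketched as follows. `toy_dropout` is a hypothetical helper written for illustration, and we only assume that Knet's `dropout` behaves similarly during training.

```julia
using Random

# Sketch of inverted dropout; `toy_dropout` is a hypothetical helper, not Knet's `dropout`.
function toy_dropout(x, p; rng=Random.default_rng())
    p == 0 && return x
    keep = rand(rng, size(x)...) .>= p      # keep each element with probability 1 - p
    return x .* keep ./ (1 - p)             # rescale so the expected value is unchanged
end

x = ones(1000)
y = toy_dropout(x, 0.2; rng=MersenneTwister(1))
```

With `p = 0.2`, roughly 80% of the entries survive and each survivor is scaled by `1 / 0.8 = 1.25`.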

In [10]:
mutable struct Conv; w; b; pdrop; end
Conv(embedsize::Int, nfilters::Int, windowsize::Int, pdrop=0) = Conv(
    param(1, windowsize, embedsize, nfilters),
    param0(1, 1, nfilters, 1), 
    pdrop)
(layer::Conv)(x) = conv4(layer.w, dropout(x, layer.pdrop); stride=1, padding=0) .+ layer.b
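
With `stride=1` and `padding=0`, a window of size `K` fits into a sentence of length `T` at `T - K + 1` positions. The general arithmetic is sketched below, where `conv_length` is a hypothetical helper and not a Knet function.

```julia
# Number of window positions for a 1-D convolution; `conv_length` is a hypothetical helper.
conv_length(T, K; stride=1, padding=0) = fld(T + 2 * padding - K, stride) + 1

long  = conv_length(10, 3)   # 10-token sentence, window 3
exact = conv_length(3, 3)    # sentence exactly as long as the window
short = conv_length(2, 3)    # sentence shorter than the window
```

A sentence shorter than the window leaves no valid position, which is why the forward pass pads such sentences with `PAD` up to the window size.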

# Training

We initialize our model,

In [15]:
EMBEDSIZE = 64
WINSIZE = KERNELSIZE = 3
NFILTERS = 64
model = CNN(
    Embedding(nwords, EMBEDSIZE),
    Conv(EMBEDSIZE, NFILTERS, KERNELSIZE, 0.2),
    Linear(NFILTERS, ntags, 0.2))

CNN(Embedding(P(Matrix{Float32}(18280,64))), Conv(P(Array{Float32, 4}(1,3,64,64)), P(Array{Float32, 4}(1,1,64,1)), 0.2), Linear(P(Matrix{Float32}(5,64)), P(Matrix{Float32}(5,1)), 0.2))
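
The array shapes printed above give the number of trainable parameters directly. The arithmetic is spelled out below, with variable names chosen only for this illustration.

```julia
# Parameter count from the array shapes shown in the output above.
embedding_params = 18280 * 64            # Embedding weights
conv_params      = 1 * 3 * 64 * 64 + 64  # Conv filters + biases
linear_params    = 5 * 64 + 5            # Linear weights + biases
total_params     = embedding_params + conv_params + linear_params
```

The embedding matrix accounts for 1,169,920 of the 1,182,597 parameters.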

We implement a validation procedure which computes the average loss and
the accuracy over an entire data split.

In [16]:
function validate(data)
    loss = correct = 0
    for (x,y) in data
        ŷ = model(x)
        loss += nll(ŷ,y)
        correct += argmax(Array(ŷ))[1] == y[1]
    end
    return loss/length(data), correct/length(data)
end

validate (generic function with 1 method)
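
The prediction step in `validate` relies on `argmax` returning a `CartesianIndex` for a matrix. A small sketch with made-up scores:

```julia
# Made-up scores for one sentence: a 5×1 matrix, as produced for batch size 1.
scores = reshape([0.1, 2.3, -0.4, 0.0, 1.1], 5, 1)
gold = [2]

idx = argmax(scores)             # CartesianIndex(row, column)
predicted = idx[1]               # row index = predicted tag id
correct = predicted == gold[1]   # Bool; adding it to a counter adds 0 or 1
```

Indexing the `CartesianIndex` with `[1]` picks the row, which is the tag id because the scores are stored as one column per sentence.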

Finally, here is the training loop:

In [17]:
function train(nepochs=20)
    for epoch=1:nepochs
        progress!(adam(model, shuffle(trn)))

        trnloss, trnacc = validate(trn)
        @printf("iter %d: trn loss/sent=%.4f, trn acc=%.4f\n",
                epoch, trnloss, trnacc)

        devloss, devacc = validate(dev)
        @printf("iter %d: dev loss/sent=%.4f, dev acc=%.4f\n",
                epoch, devloss, devacc)
    end
end

train (generic function with 2 methods)

In [18]:
train(3)

┣████████████████████┫ [100.00%, 8544/8544, 05:12/05:12, 27.41i/s] 
┣████████████████████┫ [100.00%, 8544/8544, 11:38/11:38, 12.24i/s] 
┣████████████████████┫ [100.00%, 8544/8544, 08:46/08:46, 16.23i/s] 


iter 1: trn loss/sent=1.1760, trn acc=0.5176
iter 1: dev loss/sent=1.3664, dev acc=0.4077
iter 2: trn loss/sent=0.8119, trn acc=0.7556
iter 2: dev loss/sent=1.3445, dev acc=0.4041
iter 3: trn loss/sent=0.4609, trn acc=0.8895
iter 3: dev loss/sent=1.4448, dev acc=0.4140
