
Review of a paper: A Convolutional Neural Network for Modelling Sentences


A Convolutional Neural Network for Modelling Sentences

Summarization 1

| Idx | Contents |
| --- | --- |
| Topic | We describe a convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) that we adopt for the semantic modelling of sentences. |
| Dataset | TREC dataset, movie reviews dataset, Twitter sentiment dataset |
| Github | Keras implementation |
| Conclusion | We test the DCNN in four experiments: small-scale binary and multi-class sentiment prediction, six-way question classification, and Twitter sentiment prediction by distant supervision. The network achieves excellent performance in the first three tasks and a greater than 25% error reduction in the last task with respect to the strongest baseline. |
| Analysis 1 | (figure) Accuracy of sentiment prediction in the movie reviews dataset. The first four results are reported from Socher et al. (2013b). The baselines NB and BiNB are Naive Bayes classifiers with, respectively, unigram features and unigram and bigram features. SVM is a support vector machine with unigram and bigram features. RecNTN is a recursive neural network with a tensor-based feature function; it relies on external structural features given by a parse tree and performs best among the RecNNs. |
| Analysis 2 | (figure) |
| Analysis 3 | (figure) |

Summarization 2

Abstract

  • We test the DCNN in four experiments:
    • small-scale binary and multi-class sentiment prediction,
    • six-way question classification, and
    • Twitter sentiment prediction by distant supervision.
  • The network achieves excellent performance in the first three tasks and a greater than 25% error reduction in the last task with respect to the strongest baseline.

1. Introduction

  • We define a CNN architecture and apply it to the semantic modelling of sentences.

  • Multiple layers of convolutional and dynamic pooling operations induce a structured feature graph over the input sentence.

    • see Figure 1 of the paper
  • We experiment with the network in four settings.

    • The first two experiments involve predicting the sentiment of movie reviews (Socher et al., 2013b).
    • The network outperforms other approaches in both the binary and the multi-class experiments
  • The third experiment involves the categorisation of questions into six question types in the TREC dataset.
    • The network matches the accuracy of other state-of-the-art models that are based on large sets of engineered features and hand-coded knowledge resources
  • The fourth experiment involves predicting the sentiment of Twitter posts using distant supervision

2. Background

  • relevant models
    • NBoW (Neural Bag-of-Words)
    • RecNN (Recursive Neural Network)
    • RNN (Recurrent Neural Network)
    • Convolution
    • TDNN (Time-Delay Neural Network)
  • limitations
    • The range of the feature detectors is limited to the span m of the weights. Increasing m or stacking multiple convolutional layers of the narrow type makes the range of the feature detectors larger; at the same time, it also exacerbates the neglect of the margins of the sentence and increases the minimum size s of the input sentence required by the convolution (the narrow/wide contrast is sketched below).
    • The max pooling operation has some disadvantages too. It cannot distinguish whether a relevant feature in one of the rows occurs just once or multiple times, and it forgets the order in which the features occur.
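To make the narrow/wide contrast concrete, here is a minimal NumPy sketch; the function names and toy values are mine, not from the paper or the Keras implementation. A narrow convolution of a length-s input with a span-m filter produces s − m + 1 values and requires s ≥ m, while the wide convolution adopted in the paper zero-pads the input so that every word, including the ones at the margins, is reached by all m weights, producing s + m − 1 values.

```python
import numpy as np

def narrow_conv_1d(s_vec, w):
    # Narrow 1-D convolution as a sliding dot product (filter flip
    # omitted for clarity): output length len(s_vec) - m + 1,
    # requires len(s_vec) >= m.
    m = len(w)
    return np.array([s_vec[i:i + m] @ w for i in range(len(s_vec) - m + 1)])

def wide_conv_1d(s_vec, w):
    # Wide 1-D convolution: zero-pad by m - 1 on both sides so margin
    # words get the same weight coverage as interior words; output
    # length len(s_vec) + m - 1, with no minimum input length.
    m = len(w)
    padded = np.concatenate([np.zeros(m - 1), s_vec, np.zeros(m - 1)])
    return np.array([padded[i:i + m] @ w for i in range(len(s_vec) + m - 1)])

sentence = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # toy length-5 "sentence", s = 5
weights = np.array([0.5, 1.0, 0.5])             # filter of span m = 3

print(narrow_conv_1d(sentence, weights).shape)  # (3,) = s - m + 1
print(wide_conv_1d(sentence, weights).shape)    # (7,) = s + m - 1
```

Stacking narrow layers shrinks the feature map by m − 1 columns per layer, which is exactly the growing-minimum-size problem noted above; the wide form avoids it.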

3. Convolutional Neural Networks with Dynamic k-Max pooling

  • We model sentences using a convolutional architecture that alternates wide convolutional layers with dynamic pooling layers given by dynamic k-max pooling.
    • The resulting architecture is the DCNN.

Model composition

  • Wide convolution
  • k-max pooling
  • Dynamic k-max pooling
  • Non-linear feature function
  • Multiple feature maps
  • Folding
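As a concrete reference for three of the components above, here is a hedged NumPy sketch; the function names and toy values are mine. k-max pooling keeps the k largest activations of a row while preserving their original left-to-right order, the dynamic pooling parameter follows the paper's formula k_l = max(k_top, ceil((L − l)/L · s)), and folding sums each pair of adjacent rows of a feature map.

```python
import math
import numpy as np

def k_max_pooling(row, k):
    # Keep the k largest values of the row, preserving their
    # original left-to-right order (as described in the paper).
    top_idx = np.sort(np.argsort(row)[-k:])
    return row[top_idx]

def dynamic_k(l, L, s, k_top):
    # Dynamic pooling parameter: k_l = max(k_top, ceil((L - l) / L * s)),
    # where l is the current convolutional layer (1-based), L the total
    # number of convolutional layers, s the input sentence length, and
    # k_top the fixed k of the topmost layer.
    return max(k_top, math.ceil((L - l) / L * s))

def folding(F):
    # Folding: sum every pair of adjacent rows of the d x s feature
    # map F, halving the number of rows (assumes d is even).
    return F[0::2] + F[1::2]

row = np.array([0.2, 1.5, -0.3, 0.9, 2.1, 0.1, 1.1])
print(k_max_pooling(row, 3))               # [1.5 2.1 1.1] -- order kept
print(dynamic_k(l=1, L=3, s=18, k_top=3))  # 12
print(dynamic_k(l=2, L=3, s=18, k_top=3))  # 6
```

With L = 3 convolutional layers, k_top = 3, and a sentence of length s = 18, this yields k_1 = 12 and k_2 = 6, reproducing the worked example in the paper.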

4. Properties of the Sentence Model

  • We describe some of the properties of the sentence model based on the DCNN.

Properties

  • Word and n-gram order
    • for most applications and in order to learn fine-grained feature detectors, it is beneficial for a model to be able to discriminate whether a specific n-gram occurs in the input.
    • Likewise, it is beneficial for a model to be able to tell the relative position of the most relevant n-grams.
    • The network is designed to capture these two aspects.
  • Induced feature graph
    • Some sentence models use internal or external structure to compute the representation for the input sentence.
    • In a DCNN, the convolution and pooling layers induce an internal feature graph over the input.
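To show how the alternating layers induce such a graph, below is a toy end-to-end forward pass under my own simplifying assumptions (a single feature map per layer, random weights, no bias, folding omitted); it is a sketch of the mechanics, not the paper's full model. Each wide convolution widens the feature map by m − 1 columns and links neighbouring positions, and each dynamic k-max pooling narrows it again, so nodes in higher layers aggregate ever-wider spans of the input sentence.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def wide_conv_rows(F, W):
    # Row-wise wide convolution: F is d x s, W holds one span-m filter
    # per row (d x m); the result is d x (s + m - 1).
    d, m = W.shape
    s = F.shape[1]
    out = np.zeros((d, s + m - 1))
    for r in range(d):
        padded = np.concatenate([np.zeros(m - 1), F[r], np.zeros(m - 1)])
        out[r] = [padded[i:i + m] @ W[r] for i in range(s + m - 1)]
    return out

def k_max_rows(F, k):
    # k-max pooling applied to each row, keeping original order.
    return np.stack([row[np.sort(np.argsort(row)[-k:])] for row in F])

def dcnn_forward(E, filters, k_top):
    # E: d x s matrix of word embeddings (one column per word).
    # filters: one d x m weight matrix per convolutional layer.
    s, L = E.shape[1], len(filters)
    F = E
    for l, W in enumerate(filters, start=1):
        F = wide_conv_rows(F, W)
        k = k_top if l == L else max(k_top, math.ceil((L - l) / L * s))
        F = np.tanh(k_max_rows(F, k))  # non-linearity after pooling
    return F  # d x k_top summary of the sentence

d, s = 4, 18
E = rng.standard_normal((d, s))
filters = [rng.standard_normal((d, m)) for m in (7, 5, 3)]
print(dcnn_forward(E, filters, k_top=3).shape)  # (4, 3)
```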