In [None]:
library(ggplot2)
library(repr)
library(reshape2)

## Feature Selection

First we check how much feature selection affects the final results.
Because the MLP forces us to apply feature selection, we need this comparison
to detect possible inconsistencies later.

For this we need the results of handcrafted features with and without
feature selection for the classifiers:
+ Decision Tree
+ Logistic Regression
+ Naive Bayes
+ SVM

We collect the following metrics, both including monoclass
lemmas and filtering them out:
+ Accuracy
+ Macro Precision
+ Macro Recall

After that we also collect Cohen's kappa score for each classifier
vs. the ground truth, but only for lemmas with more than one class.
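
As a reference for how these metrics are defined, here is a minimal sketch that computes them from a confusion matrix. The toy labels `truth` and `pred` are hypothetical and not taken from the experiment data; the CSV files loaded below already contain the computed values, so this is only an illustration of the definitions, not the code that produced them.

```r
# Toy labels (hypothetical, not taken from the experiment data).
truth <- factor(c('a', 'a', 'b', 'b', 'c', 'c'), levels = c('a', 'b', 'c'))
pred  <- factor(c('a', 'b', 'b', 'b', 'c', 'a'), levels = c('a', 'b', 'c'))

# Confusion matrix: rows are true classes, columns are predicted classes.
cm <- table(truth, pred)
n <- sum(cm)

accuracy <- sum(diag(cm)) / n

# Per-class precision and recall; a class that never appears gets a score of 0.
precision <- ifelse(colSums(cm) > 0, diag(cm) / colSums(cm), 0)
recall <- ifelse(rowSums(cm) > 0, diag(cm) / rowSums(cm), 0)

# Macro averages weight every class equally, regardless of its frequency.
macro_precision <- mean(precision)
macro_recall <- mean(recall)

# Cohen's kappa: observed agreement corrected by the agreement expected by chance.
p_observed <- accuracy
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_score <- (p_observed - p_expected) / (1 - p_expected)
```

Note that when a lemma has a single class, `p_expected` equals 1 and the kappa denominator becomes 0, which is one plausible reason to restrict the kappa analysis to lemmas with more than one class.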

In [None]:
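# stringsAsFactors = TRUE keeps the text columns as factors (the default before R 4.0.0);
# the levels() relabelling in the cells below relies on that.
exp.df <- read.csv('./data/handcrafted_vs_feature_selection.csv', stringsAsFactors = TRUE)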
exp.df.nomono <- exp.df[exp.df$num_classes > 1,]

exp.df.metrics <- melt(exp.df, id.vars = c('classifier', 'representation', 'lemma', 'corpus'),
                       measure.vars = c('accuracy', 'macro_precision', 'macro_recall'))
exp.df.metrics.nomono <- melt(exp.df.nomono, id.vars = c('classifier', 'representation', 'lemma', 'corpus'),
                              measure.vars = c('accuracy', 'macro_precision', 'macro_recall'))
exp.df.kappa <- melt(exp.df.nomono, id.vars = c('classifier', 'representation', 'lemma', 'corpus'),
                     measure.vars = c('kappa_score'))
exp.df.kappa <- exp.df.kappa[(exp.df.kappa$corpus == 'sensem.test') | (exp.df.kappa$corpus == 'semeval.test'),]

### Base metrics comparing features for all lemmas

In [None]:
levels(exp.df.metrics$representation) <- c('All Handcrafted Features', 'Top 10000 Handcrafted Features')
levels(exp.df.metrics$corpus) <- c('Semeval Test Set', 'Semeval Train Set', 'SenSem Test Set', 'SenSem Train Set')
exp.df.metrics$corpus <- factor(exp.df.metrics$corpus, levels = rev(levels(exp.df.metrics$corpus)))
levels(exp.df.metrics$variable) <- c('Accuracy', 'Precision\nMacro Average', 'Recall\nMacro Average')
levels(exp.df.metrics$classifier) <- c('Decision Tree', 'Logistic Regression', 'Naive Bayes', 'SVM')

In [None]:
options(repr.plot.width=8, repr.plot.height=8)

g <- ggplot(exp.df.metrics, aes(x = variable, y = value, fill = corpus))
g <- g + geom_boxplot() + stat_boxplot(geom = 'errorbar')
g <- g + facet_grid(classifier ~ representation)
g <- g + theme(legend.position = 'bottom', legend.title = element_blank(),
               axis.title.x = element_blank(), axis.title.y = element_blank())
ggsave('./plots/handcrafted_vs_feature_selection_metrics.png', plot = g, width = 8, height = 8)

### Base metrics comparing features for lemmas with more than 1 class

In [None]:
levels(exp.df.metrics.nomono$representation) <- c('All Handcrafted Features', 'Top 10000 Handcrafted Features')
levels(exp.df.metrics.nomono$corpus) <- c('Semeval Test Set', 'Semeval Train Set', 'SenSem Test Set',
                                          'SenSem Train Set')
exp.df.metrics.nomono$corpus <- factor(exp.df.metrics.nomono$corpus,
                                       levels = rev(levels(exp.df.metrics.nomono$corpus)))
levels(exp.df.metrics.nomono$variable) <- c('Accuracy', 'Precision\nMacro Average', 'Recall\nMacro Average')
levels(exp.df.metrics.nomono$classifier) <- c('Decision Tree', 'Logistic Regression', 'Naive Bayes', 'SVM')

In [None]:
options(repr.plot.width=8, repr.plot.height=8)

g <- ggplot(exp.df.metrics.nomono, aes(x = variable, y = value, fill = corpus))
g <- g + geom_boxplot() + stat_boxplot(geom = 'errorbar')
g <- g + facet_grid(classifier ~ representation)
g <- g + theme(legend.position = 'bottom', legend.title = element_blank(),
               axis.title.x = element_blank(), axis.title.y = element_blank())
ggsave('./plots/handcrafted_vs_feature_selection_metrics_no_monoclass.png', plot = g, width = 8, height = 8)

### Kappa comparing features for lemmas with more than one sense

In [None]:
levels(exp.df.kappa$representation) <- c('All Handcrafted Features', 'Top 10000 Handcrafted Features')
levels(exp.df.kappa$corpus) <- c('Semeval Test Set', 'Semeval Train Set', 'SenSem Test Set', 'SenSem Train Set')
exp.df.kappa$corpus <- factor(exp.df.kappa$corpus, levels = rev(levels(exp.df.kappa$corpus)))
levels(exp.df.kappa$variable) <- c('Cohen\'s Kappa Score\nvs. Ground Truth')
levels(exp.df.kappa$classifier) <- c('Decision Tree', 'Logistic Regression', 'Naive Bayes', 'SVM')

In [None]:
options(repr.plot.width=8, repr.plot.height=8)

g <- ggplot(exp.df.kappa, aes(x = variable, y = value, fill = corpus))
g <- g + geom_boxplot() + stat_boxplot(geom = 'errorbar')
g <- g + facet_grid(classifier ~ representation)
g <- g + theme(legend.position = 'bottom', legend.title = element_blank(),
               axis.title.x = element_blank(), axis.title.y = element_blank())
g <- g + ylim(c(-1.0, 1.0))
ggsave('./plots/handcrafted_vs_feature_selection_kappa.png', plot = g, width = 8, height = 8)

## Representations

Having found no difference when using feature selection, we show the general metric
results for all the classifiers, that is:
+ Baseline
+ Decision Tree
+ Logistic Regression
+ MLP
+ Naive Bayes
+ SVM

For this we use the following representations:
+ Feature selection of handcrafted features
+ Hashed features with only positive values
+ Hashed features with positive and negative values (not valid with Naive Bayes)

The first boxplot will have the representations as columns and the classifiers as rows
with the following metrics:
+ Accuracy
+ Macro Precision
+ Macro Recall
+ PMFC
+ RMLFC

This plot will show which representation is best (or whether there is any difference at all).
We then use that visual information to select a representation and to discard the
algorithms that visibly perform worse.
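
To make the difference between the two hashed representations concrete, here is a minimal sketch of the hashing trick with and without signed values. Everything in it is an assumption made for illustration: `string_hash` is a simple hypothetical polynomial hash, and `hash_features`, `n_buckets` and the toy feature names are not the ones used to build the experiment data.

```r
# Hypothetical string hash (polynomial rolling hash), used only for illustration.
string_hash <- function(s, seed = 31) {
  h <- 0
  for (code in utf8ToInt(s)) {
    h <- (h * seed + code) %% 2147483647
  }
  h
}

# Map named feature values into a fixed-size vector.
# With signed = TRUE a second hash decides whether each value is added or subtracted,
# so the resulting vector can contain negative entries.
hash_features <- function(features, n_buckets = 8, signed = FALSE) {
  vec <- numeric(n_buckets)
  for (name in names(features)) {
    bucket <- (string_hash(name) %% n_buckets) + 1
    sign <- if (signed && (string_hash(name, seed = 37) %% 2 == 1)) -1 else 1
    vec[bucket] <- vec[bucket] + sign * features[[name]]
  }
  vec
}

# Toy handcrafted features for one instance (hypothetical names).
features <- c('lemma=run' = 1, 'pos=VERB' = 1, 'prev=to' = 1)

hashed_positive <- hash_features(features, signed = FALSE)
hashed_signed <- hash_features(features, signed = TRUE)
```

The positive-only variant always yields non-negative vectors, while the signed variant may not. This is presumably why the signed representation is not valid with Naive Bayes, whose multinomial form expects non-negative feature values.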

In [None]:
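# stringsAsFactors = TRUE keeps the text columns as factors (the default before R 4.0.0);
# the levels() relabelling in the cells below relies on that.
exp.df <- read.csv('./data/experiment0_general_metrics.csv', stringsAsFactors = TRUE)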
exp.df.nomono <- exp.df[exp.df$num_classes > 1,]

exp.df.metrics <- melt(exp.df, id.vars = c('classifier', 'representation', 'lemma', 'corpus'),
                       measure.vars = c('accuracy', 'macro_precision', 'macro_recall', 'pmfc', 'rmlfc'))
exp.df.metrics.nomono <- melt(exp.df.nomono, id.vars = c('classifier', 'representation', 'lemma', 'corpus'),
                              measure.vars = c('accuracy', 'macro_precision', 'macro_recall', 'pmfc', 'rmlfc'))
exp.df.kappa <- melt(exp.df.nomono, id.vars = c('classifier', 'representation', 'lemma', 'corpus'),
                     measure.vars = c('kappa_score'))
exp.df.kappa <- exp.df.kappa[(exp.df.kappa$corpus == 'sensem.test') | (exp.df.kappa$corpus == 'semeval.test'),]

### Base metrics comparing representations for all lemmas

In [None]:
levels(exp.df.metrics$representation) <- c('Top 10000 Handcrafted Features', 'Hashing with All Positive Features',
                                           'Hashing with Negative Features')
levels(exp.df.metrics$corpus) <- c('Semeval Test Set', 'Semeval Train Set', 'SenSem Test Set', 'SenSem Train Set')
exp.df.metrics$corpus <- factor(exp.df.metrics$corpus, levels = rev(levels(exp.df.metrics$corpus)))
levels(exp.df.metrics$variable) <- c('Accuracy', 'Precision\nMacro Average', 'Recall\nMacro Average',
                                     'PMFC', 'RMLFC')
levels(exp.df.metrics$classifier) <- c('Baseline', 'Decision Tree', 'Logistic\nRegression',
                                       'MLP', 'Naive Bayes', 'SVM')

In [None]:
options(repr.plot.width=9, repr.plot.height=8)

g <- ggplot(exp.df.metrics, aes(x = variable, y = value, fill = corpus))
g <- g + geom_boxplot() + stat_boxplot(geom = 'errorbar')
g <- g + facet_grid(classifier ~ representation)
g <- g + theme(legend.position = 'bottom', legend.title = element_blank(),
               axis.title.x = element_blank(), axis.title.y = element_blank(),
               axis.text.x=element_text(angle=45, vjust=0.5))
ggsave('./plots/experiment0_representations_comparison.png', plot = g, width = 9, height = 8)

### Base metrics comparing representations for lemmas with more than 1 class

In [None]:
levels(exp.df.metrics.nomono$representation) <- c('Top 10000 Handcrafted Features',
                                                  'Hashing with All Positive Features',
                                                  'Hashing with Negative Features')
levels(exp.df.metrics.nomono$corpus) <- c('Semeval Test Set', 'Semeval Train Set', 'SenSem Test Set',
                                          'SenSem Train Set')
exp.df.metrics.nomono$corpus <- factor(exp.df.metrics.nomono$corpus,
                                       levels = rev(levels(exp.df.metrics.nomono$corpus)))
levels(exp.df.metrics.nomono$variable) <- c('Accuracy', 'Precision\nMacro Average', 'Recall\nMacro Average',
                                            'PMFC', 'RMLFC')
levels(exp.df.metrics.nomono$classifier) <- c('Baseline', 'Decision Tree', 'Logistic\nRegression',
                                              'MLP', 'Naive Bayes', 'SVM')

In [None]:
options(repr.plot.width=9, repr.plot.height=8)

g <- ggplot(exp.df.metrics.nomono, aes(x = variable, y = value, fill = corpus))
g <- g + geom_boxplot() + stat_boxplot(geom = 'errorbar')
g <- g + facet_grid(classifier ~ representation)
g <- g + theme(legend.position = 'bottom', legend.title = element_blank(),
               axis.title.x = element_blank(), axis.title.y = element_blank(),
               axis.text.x=element_text(angle=45, vjust=0.5))
ggsave('./plots/experiment0_representations_comparison_no_monoclass.png', plot = g, width = 9, height = 8)

### Kappa comparing representations for lemmas with more than one class

In [None]:
levels(exp.df.kappa$representation) <- c('Top 10000 Handcrafted Features', 'Hashing with All Positive Features',
                                         'Hashing with Negative Features')
levels(exp.df.kappa$corpus) <- c('Semeval Test Set', 'Semeval Train Set', 'SenSem Test Set', 'SenSem Train Set')
exp.df.kappa$corpus <- factor(exp.df.kappa$corpus, levels = rev(levels(exp.df.kappa$corpus)))
levels(exp.df.kappa$variable) <- c('Cohen\'s Kappa Score\nvs. Ground Truth')
levels(exp.df.kappa$classifier) <- c('Baseline', 'Decision Tree', 'Logistic\nRegression',
                                     'MLP', 'Naive Bayes', 'SVM')

In [None]:
options(repr.plot.width=9, repr.plot.height=8)

g <- ggplot(exp.df.kappa, aes(x = variable, y = value, fill = corpus))
g <- g + geom_boxplot() + stat_boxplot(geom = 'errorbar')
g <- g + facet_grid(classifier ~ representation)
g <- g + theme(legend.position = 'bottom', legend.title = element_blank(),
               axis.title.x = element_blank(), axis.title.y = element_blank())
g <- g + ylim(c(-1.0, 1.0))
ggsave('./plots/experiment0_kappa_representations_comparison.png', plot = g, width = 9, height = 8)

## Representation selection and classifiers comparison

Having compared the representations, we check whether any of them is visually better than the others. If
no representation shows a real improvement, we go with the one that keeps things simplest (for now),
that is, Hashed All Positive Features.

After selecting the final representation, we compare the classifiers to select the ones
to work with. Before this, we filter out the classifiers that visibly perform worse
in the previous plots (baseline and naive_bayes).

This is done using two plots:
+ A boxplot showing the different metrics of each classifier side by side.
+ A heatmap showing the average kappa values, comparing each classifier against all the others.
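
The pairwise comparison behind the heatmap can be sketched as follows. The toy predictions and the `cohen_kappa` helper are hypothetical; only the column names `t1`, `t2` and `kappa_score` mirror the CSV loaded below, which already contains the precomputed values.

```r
# Cohen's kappa between two label vectors of the same length.
cohen_kappa <- function(a, b) {
  classes <- union(a, b)
  cm <- table(factor(a, levels = classes), factor(b, levels = classes))
  n <- sum(cm)
  p_observed <- sum(diag(cm)) / n
  p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2
  (p_observed - p_expected) / (1 - p_expected)
}

# Toy predictions for a single lemma (hypothetical values).
predictions <- list(
  ground_truth  = c('a', 'a', 'b', 'b', 'c', 'c'),
  decision_tree = c('a', 'b', 'b', 'b', 'c', 'a'),
  svm           = c('a', 'a', 'b', 'b', 'c', 'a')
)

# Every ordered pair of taggers, including each tagger against itself.
taggers <- names(predictions)
pairs <- expand.grid(t1 = taggers, t2 = taggers, stringsAsFactors = FALSE)
pairs$kappa_score <- mapply(
  function(x, y) cohen_kappa(predictions[[x]], predictions[[y]]),
  pairs$t1, pairs$t2
)
```

With real data, the same computation would be repeated per lemma and the scores averaged for each pair of classifiers and corpus before plotting.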

In [None]:
selected_representation <- 'hashed'

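# stringsAsFactors = TRUE keeps the text columns as factors (the default before R 4.0.0);
# the levels() relabelling in the cells below relies on that.
exp.df <- read.csv('./data/experiment0_general_metrics.csv', stringsAsFactors = TRUE)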
exp.df <- exp.df[exp.df$representation == selected_representation,]
exp.df <- exp.df[(exp.df$classifier != 'baseline') & (exp.df$classifier != 'naive_bayes'),]
exp.df.nomono <- exp.df[exp.df$num_classes > 1,]
exp.df.kappa.heatmap <- read.csv('./data/experiment0_kappa_interclassifier.csv', stringsAsFactors = TRUE)

exp.df.metrics <- melt(exp.df, id.vars = c('classifier', 'lemma', 'corpus'),
                       measure.vars = c('accuracy', 'macro_precision', 'macro_recall', 'pmfc', 'rmlfc'))
exp.df.metrics.nomono <- melt(exp.df.nomono, id.vars = c('classifier', 'lemma', 'corpus'),
                              measure.vars = c('accuracy', 'macro_precision', 'macro_recall', 'pmfc', 'rmlfc'))

### Classifiers comparison for all lemmas

In [None]:
levels(exp.df.metrics$corpus) <- c('Semeval Test Set', 'Semeval Train Set', 'SenSem Test Set', 'SenSem Train Set')
exp.df.metrics$corpus <- factor(exp.df.metrics$corpus, levels = rev(levels(exp.df.metrics$corpus)))
levels(exp.df.metrics$variable) <- c('Accuracy', 'Precision\nMacro Average', 'Recall\nMacro Average',
                                     'PMFC', 'RMLFC')
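# Baseline and Naive Bayes were filtered out above, but their (now unused) factor levels remain,
# so all six labels are still needed; facet_wrap drops the empty levels by default.
levels(exp.df.metrics$classifier) <- c('Baseline', 'Decision Tree', 'Logistic Regression', 'MLP',
                                       'Naive Bayes', 'SVM')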

In [None]:
options(repr.plot.width=8, repr.plot.height=3.5)

g <- ggplot(exp.df.metrics, aes(x = variable, y = value, fill = corpus))
g <- g + geom_boxplot() + stat_boxplot(geom = 'errorbar')
g <- g + facet_wrap(~ classifier, nrow = 1)
g <- g + theme(legend.position = 'bottom', legend.title = element_blank(),
               axis.title.x = element_blank(), axis.title.y = element_blank(),
               axis.text.x = element_text(angle=45, vjust=0.5))
ggsave('./plots/experiment0_classifiers_comparison.png', plot = g, width = 8, height = 3.5)

### Classifiers comparison for lemmas with more than 1 sense

In [None]:
levels(exp.df.metrics.nomono$corpus) <- c('Semeval Test Set', 'Semeval Train Set', 'SenSem Test Set',
                                          'SenSem Train Set')
exp.df.metrics.nomono$corpus <- factor(exp.df.metrics.nomono$corpus,
                                       levels = rev(levels(exp.df.metrics.nomono$corpus)))
levels(exp.df.metrics.nomono$variable) <- c('Accuracy', 'Precision\nMacro Average', 'Recall\nMacro Average',
                                            'PMFC', 'RMLFC')
# As above, the unused Baseline and Naive Bayes levels are still present and need labels.
levels(exp.df.metrics.nomono$classifier) <- c('Baseline', 'Decision Tree', 'Logistic Regression', 'MLP',
                                              'Naive Bayes', 'SVM')

In [None]:
options(repr.plot.width=8, repr.plot.height=3.5)

g <- ggplot(exp.df.metrics.nomono, aes(x = variable, y = value, fill = corpus))
g <- g + geom_boxplot() + stat_boxplot(geom = 'errorbar')
g <- g + facet_wrap(~ classifier, nrow = 1)
g <- g + theme(legend.position = 'bottom', legend.title = element_blank(),
               axis.title.x = element_blank(), axis.title.y = element_blank(),
               axis.text.x=element_text(angle=45, vjust=0.5))
ggsave('./plots/experiment0_classifiers_comparison_no_monoclass.png', plot = g, width = 8, height = 3.5)

### Kappa inter-classifier heatmap for lemmas with more than one sense for hashed representation

In [None]:
levels(exp.df.kappa.heatmap$corpus) <- c('Semeval', 'SenSem')
exp.df.kappa.heatmap$corpus <- factor(exp.df.kappa.heatmap$corpus,
                                      levels = rev(levels(exp.df.kappa.heatmap$corpus)))
exp.df.kappa.heatmap$t1 <- factor(exp.df.kappa.heatmap$t1,
                                  levels = c('ground_truth', 'decision_tree', 'log', 'mlp_5000', 'svm'))
levels(exp.df.kappa.heatmap$t1) <- c('Ground\nTruth', 'Decision\nTree', 'Logistic\nRegression', 'MLP', 'SVM')
exp.df.kappa.heatmap$t2 <- factor(exp.df.kappa.heatmap$t2,
                                  levels = c('ground_truth', 'decision_tree', 'log', 'mlp_5000', 'svm'))
levels(exp.df.kappa.heatmap$t2) <- c('Ground\nTruth', 'Decision\nTree', 'Logistic\nRegression', 'MLP', 'SVM')

In [None]:
options(repr.plot.width=8, repr.plot.height=4)

g <- ggplot(exp.df.kappa.heatmap, aes(x = t1, y = t2))
g <- g + geom_tile(aes(fill = kappa_score), colour = "white")
g <- g + geom_text(aes(label = round(kappa_score, 2)), colour = "white", size = 3.5)
g <- g + scale_fill_gradient(low="steelblue", high="black", name="Kappa Score")
g <- g + facet_grid(~ corpus)
g <- g + ylim(rev(levels(exp.df.kappa.heatmap$t2)))
g <- g + theme(legend.position='none',
               axis.title.x = element_blank(), axis.title.y = element_blank(),
               axis.text.y = element_text(hjust=0.5))
g <- g + labs(x="Classifier")
ggsave('./plots/experiment0_interclassifier_kappa.png', plot = g, width = 8, height = 4)