# Experiment 4
TODO: redo `read_brat_dataset` following the photo on my phone.


In this experiment, we aim to solve the counting task using QuaNet, a deep learning architecture for quantification that predicts class prevalence values. It takes as input: (i) class prevalence values estimated by a classifier; (ii) the posterior probabilities $\Pr(y \mid x)$ of the positive class (since QuaNet is binary) for each document $x$; and (iii) embedded document representations.

We chose QuaNet because, as stated in the detailed overview of the [LeQua challenge](https://ceur-ws.org/Vol-3180/paper-146.pdf), it outperforms other methods in the **binary quantification task on raw documents**. However, the methods provided by `QuaPy` are interchangeable black boxes, so any of them can be swapped in to test which performs best on the given task.

In [1]:
from experiment_4_code import *

We first load the *Cabrio* and *Villata* dataset, constructing a dictionary that, for each abstract, contains the graph structure of its arguments, reconstructed using the *.ann* files provided with the raw documents.
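As a rough illustration of how a graph can be rebuilt from a *.ann* file, the sketch below parses BRAT standoff lines: text-bound annotations (`T…`) become arguments and relation lines (`R…`) become edges. The function name `parse_ann`, the returned dictionary layout, and the label names in the usage example are assumptions made for illustration; this is not the notebook's `read_brat_dataset`.

```python
from collections import defaultdict


def parse_ann(ann_text):
    """Parse BRAT standoff annotations into arguments, relations and a graph.

    Hypothetical helper for illustration only.
    """
    arguments = {}  # annotation id -> {"type": ..., "text": ...}
    relations = []  # (relation type, source id, target id)
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T") and len(fields) >= 3:
            # Text-bound annotation: "T1<TAB>Type start end<TAB>covered text"
            ann_type = fields[1].split(" ", 1)[0]
            arguments[ann_id] = {"type": ann_type, "text": fields[2]}
        elif ann_id.startswith("R") and len(fields) >= 2:
            # Relation: "R1<TAB>Type Arg1:T2 Arg2:T1"
            rel_type, arg1, arg2 = fields[1].split()[:3]
            relations.append(
                (rel_type, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
            )
    # Adjacency list: source argument -> [(relation type, target argument)]
    graph = defaultdict(list)
    for rel_type, src, dst in relations:
        graph[src].append((rel_type, dst))
    return {
        "arguments": arguments,
        "relations": relations,
        "graph": dict(graph),
        "n_arguments": len(arguments),
        "n_relations": len(relations),
    }


sample = (
    "T1\tClaim 0 10\tsome claim\n"
    "T2\tPremise 11 30\tsome premise\n"
    "R1\tSupport Arg1:T2 Arg2:T1\n"
)
parsed = parse_ann(sample)
print(parsed["n_arguments"], parsed["n_relations"])
```

The `n_arguments` and `n_relations` keys mirror the fields the notebook reads from each item; other annotation kinds (events, attributes, notes) are ignored in this sketch.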

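For this experiment, we chose the label to be the number of relations in each document (the variant based on the number of arguments is kept commented out in the code). We then drop every label that occurs in 4 or fewer samples, to limit class imbalance.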
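The filtering step can be sketched with a `Counter`, as below. The helper name `drop_rare_labels` and its parameters are hypothetical; the default `min_samples=5` mirrors the `> 4` condition used in the notebook's filtering cell.

```python
from collections import Counter


def drop_rare_labels(items, label_key="n_relations", min_samples=5):
    """Keep only the items whose label occurs at least `min_samples` times.

    Hypothetical helper; `items` is a list of dicts such as those
    used throughout this notebook.
    """
    counts = Counter(item[label_key] for item in items)
    return [item for item in items if counts[item[label_key]] >= min_samples]


# Toy data: label 1 occurs five times, label 2 only four times.
toy = [{"n_relations": 1}] * 5 + [{"n_relations": 2}] * 4
kept = drop_rare_labels(toy)
print(Counter(item["n_relations"] for item in kept))
```

Note that the counts are computed on the split being filtered, so train and test can end up with different label sets.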

In [2]:
train_set = read_brat_dataset('../data/train/neoplasm_train') + read_brat_dataset('../data/dev/neoplasm_dev')
test_set = read_brat_dataset('../data/test/glaucoma_test') + read_brat_dataset('../data/test/neoplasm_test') + read_brat_dataset('../data/test/mixed_test')

In [3]:
arguments_counts_train, relations_counts_train = compute_dataset_statistics(train_set, dataset_name="train")
arguments_counts_test, relations_counts_test = compute_dataset_statistics(test_set, dataset_name="test")

- Train set statistics:

Number of samples with a certain number of arguments:
+---------------+-------------+
|   # Arguments |   # Samples |
+===============+=============+
|             0 |           4 |
+---------------+-------------+
|             1 |         133 |
+---------------+-------------+
|             2 |         108 |
+---------------+-------------+
|             3 |          72 |
+---------------+-------------+
|             4 |          44 |
+---------------+-------------+
|             5 |          22 |
+---------------+-------------+
|             6 |          11 |
+---------------+-------------+
|             7 |           3 |
+---------------+-------------+
|             8 |           1 |
+---------------+-------------+
|            10 |           1 |
+---------------+-------------+
|            13 |           1 |
+---------------+-------------+

Number of samples with a certain number of relations:
+---------------+-------------+
|   # Relations |   # Samples |
+===============+=============+
|             0 |           7 |
+-

In [21]:
train_set = [item for item in train_set if relations_counts_train.get(item['n_relations'], 0) > 4]
test_set = [item for item in test_set if relations_counts_test.get(item['n_relations'], 0) > 4]

In [22]:
arguments_counts_train, relations_counts_train = compute_dataset_statistics(train_set, dataset_name="train")
arguments_counts_test, relations_counts_test = compute_dataset_statistics(test_set, dataset_name="test")

- Train set statistics:

Number of samples with a certain number of arguments:
+---------------+-------------+
|   # Arguments |   # Samples |
+===============+=============+
|             1 |         132 |
+---------------+-------------+
|             2 |         107 |
+---------------+-------------+
|             3 |          71 |
+---------------+-------------+
|             4 |          43 |
+---------------+-------------+
|             5 |          22 |
+---------------+-------------+
|             6 |           8 |
+---------------+-------------+

Number of samples with a certain number of relations:
+---------------+-------------+
|   # Relations |   # Samples |
+===============+=============+
|             0 |           5 |
+---------------+-------------+
|             1 |          18 |
+---------------+-------------+
|             2 |          57 |
+---------------+-------------+
|             3 |          70 |
+---------------+-------------+
|             4 |          82 |
+---------------+-------------+
|             5 |          80 |
+-

In [6]:
display_file_info(train_set, filename=None)

File 21224783 - 4 relations - 1 arguments
Text: There are very few randomized controlled studies on exercise in cancer patients. Consequently, there are no guidelines available with regard to the exercises that can be recommended and difficulties are encountered in the clinical practice as to which exercise is more suitable to the patients. The purpose of this study was to investigate the impact of pilates exercises on physical performance, flexibility, fatigue, depression and quality of life in women who had been treated for breast cancer. Randomized controlled trial. Out patient group, Department of Physical Medicine and Rehabilitation and Medical Oncology Department, University Hospital. Fifty-two patients with breast cancer were divided into either pilates exercise (group 1) and control group (group 2). Patients in Group 1 performed pilates and home exercises and patients in group 2 performed only home exercises. Pilates exercise sessions were performed three times a week for a per

We now construct the `LabelledCollection` objects required by `QuaPy`, from which we obtain the *train* and *test* collections. Note that this split is not correct yet: the test set should be built from the other files provided by *Cabrio* and *Villata*. Finally, we create a `Dataset` object and tokenize it.

In [19]:
# Shared class list: the union of the labels seen in either split, in sorted order.
classes = sorted(set(relations_counts_train) | set(relations_counts_test))

# Label = number of relations per document.
train_collection = qp.data.LabelledCollection(
    [data['text'] for data in train_set],
    [data['n_relations'] for data in train_set],
    classes=classes)

test_collection = qp.data.LabelledCollection(
    [data['text'] for data in test_set],
    [data['n_relations'] for data in test_set],
    classes=classes)

# Labels actually present in each collection.
print(list(set(train_collection.labels)))
print(list(set(test_collection.labels)))

# Alternative labelling: number of arguments per document.
# train_collection = qp.data.LabelledCollection(
#     [data['text'] for data in train_set],
#     [data['n_arguments'] for data in train_set],
#     classes=sorted(set(arguments_counts_train) | set(arguments_counts_test)))

# test_collection = qp.data.LabelledCollection(
#     [data['text'] for data in test_set],
#     [data['n_arguments'] for data in test_set],
#     classes=sorted(set(arguments_counts_train) | set(arguments_counts_test)))

[np.int64(0), np.int64(1), np.int64(2), np.int64(3), np.int64(4), np.int64(5), np.int64(6), np.int64(7), np.int64(8)]
[np.int64(0), np.int64(1), np.int64(2), np.int64(3), np.int64(4), np.int64(5), np.int64(6), np.int64(7), np.int64(8), np.int64(9)]


In [8]:
indexer = qp.data.preprocessing.IndexTransformer(min_df=1)
abs_dataset = Dataset(train_collection, test_collection)

index(abs_dataset, indexer, inplace=True)

indexing: 100%|████████████████████████████████████████████████████████████████████| 383/383 [00:00<00:00, 6471.10it/s]
indexing: 100%|████████████████████████████████████████████████████████████████████| 288/288 [00:00<00:00, 5646.76it/s]


At this point, we train a simple CNN for the task. Note that this part might be replaced with the baseline in future experiments.

In [9]:
# train the text classifier:
qp.environ['SAMPLE_SIZE'] = 10

cnn_module = CNNnet(abs_dataset.vocabulary_size, abs_dataset.training.n_classes)
cnn_classifier = NeuralClassifierTrainer(cnn_module, device='cpu')
cnn_classifier.fit(*abs_dataset.training.Xy)

[NeuralNetwork running on cpu]


  self.net.load_state_dict(torch.load(checkpoint))
[CNNnet] training epoch=42 tr-loss=0.01774 tr-acc=100.00% tr-macroF1=100.00% patience=1/10 val-loss=2.17926 val-acc=16.


training ended by patience exhasted; loading best model parameters in ../checkpoint/classifier_net.dat for epoch 32
performing one training pass over the validation set...
[done]


<quapy.classification.neural.NeuralClassifierTrainer at 0x20b848109b0>

Next, we train `QuaNet`.

In [14]:
# train QuaNet (alternatively, we can set fit_classifier=True and let QuaNet train the classifier)
qp.environ['SAMPLE_SIZE'] = 1
quantifier = QuaNet(cnn_classifier, device='cpu')
quantifier.fit(abs_dataset.training, fit_classifier=False)

QuaNetModule(
  (lstm): LSTM(110, 64, batch_first=True, dropout=0.5, bidirectional=True)
  (dropout): Dropout(p=0.5, inplace=False)
  (ff_layers): ModuleList(
    (0): Linear(in_features=168, out_features=1024, bias=True)
    (1): Linear(in_features=1024, out_features=512, bias=True)
  )
  (output): Linear(in_features=512, out_features=10, bias=True)
)




ValueError: a must be greater than 0 unless no samples are taken
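The message of the `ValueError` above matches the one NumPy's `np.random.choice` raises when asked to draw from a population of size zero. A cheap check, offered as an assumption rather than a confirmed diagnosis, is whether the declared class list contains labels that never occur in the training collection: the label sets printed earlier show label 9 only in the test collection, while the QuaNet output layer has 10 units. The sample size of 1 set just before fitting is another candidate worth checking. The helper `classes_without_samples` below is hypothetical and uses plain Python only.

```python
def classes_without_samples(labels, classes):
    """Return the declared classes that have no example among `labels`.

    Hypothetical diagnostic helper, not part of QuaPy.
    """
    present = set(labels)
    return [c for c in classes if c not in present]


# Mirrors the printed label sets: training labels 0..8, declared classes 0..9.
train_labels = [0, 1, 2, 3, 4, 5, 6, 7, 8]
declared_classes = list(range(10))
print(classes_without_samples(train_labels, declared_classes))
```

In the notebook, the same check could be run on `train_collection.labels` against the class list passed to `LabelledCollection`.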

Finally, we evaluate the accuracy of our model by sampling one document at a time and inferring its class distribution (hence the name *Single Sample* for this experiment). The `argmax` of the predicted distribution is taken as the model's output.
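The single-sample scoring described above can be sketched as follows. This is not the notebook's `evaluate_accuracy`; the helper names `argmax_label` and `single_sample_accuracy` are hypothetical, and the prevalence vectors are toy values standing in for the quantifier's per-document output.

```python
def argmax_label(prevalences, classes):
    """Return the class whose estimated prevalence is highest."""
    best = max(range(len(prevalences)), key=lambda i: prevalences[i])
    return classes[best]


def single_sample_accuracy(predicted_prevalences, true_labels, classes):
    """Fraction of documents whose argmax class equals the true label."""
    hits = sum(
        argmax_label(p, classes) == y
        for p, y in zip(predicted_prevalences, true_labels)
    )
    return hits / len(true_labels)


# Toy example: one prevalence vector per single-document sample.
classes = [0, 1, 2]
predicted = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
true = [1, 2]
print(single_sample_accuracy(predicted, true, classes))
```

In case of ties, `max` returns the first maximal index, so the lowest class index wins.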

In [None]:
evaluate_accuracy(abs_dataset, quantifier, set_type='train')
evaluate_accuracy(abs_dataset, quantifier, set_type='test')

The results on the test set are poor. We might:
- Tune the model, as we only used the default configuration. This *must* be tested, as the method might work better with minimal effort;
- Explore other quantification techniques instead of QuaPy. If we determine that another method is better suited, we might consider conducting other experiments, such as the third one, differently. Please take a look at the [LeQua](https://ceur-ws.org/Vol-3180/paper-146.pdf) paper; the UniOviedo team should have obtained the best result on the *T2B* task, the one with raw documents and multiple classes.