# Protesta CLEF 2020 Notebook

This notebook contains the code for our participation in the CLEF 2019 Lab ProtestNews shared task.

## Installation

To get started, install `protesta` from the GitHub repository.

In [1]:
!rm -rf ./protest-clef-2020/
!git clone https://github.com/anbasile/protest-clef-2020 && cd protest-clef-2020 && pip install -e .

Cloning into 'protest-clef-2020'...
remote: Enumerating objects: 129, done.
remote: Counting objects: 100% (129/129), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 129 (delta 66), reused 66 (delta 30), pack-reused 0
Receiving objects: 100% (129/129), 47.13 KiB | 9.43 MiB/s, done.
Resolving deltas: 100% (66/66), done.
Obtaining file:///content/protest-clef-2020
Collecting nlp>=0.4
  Downloading https://files.pythonhosted.org/packages/09/e3/bcdc59f3434b224040c1047769c47b82705feca2b89ebbc28311e3764782/nlp-0.4.0-py3-none-any.whl (1.7MB)
     |████████████████████████████████| 1.7MB 5.3MB/s 
Collecting transformers>=3.0
  Downloading https://files.pythonhosted.org/packages/19/22/aff234f4a841f8999e68a7a94bdd4b60b4cebcfeca5d67d61cd08c9179de/transformers-3.3.1-py3-none-any.whl (1.1MB)
     |████████████████████████████████| 1.1MB 39.3MB/s 
Collecting typer>=0.3.1
  Downloading https://files.pythonhosted.org/packages/90/34/d138832f69454

## Data

Please note that the data has not been released publicly. If you have the training data, then create a folder organized as follows:

In [2]:
"""
$ tree /content/drive/My\ Drive/protesta-data/task3/

task3/
├── dev.tsv
├── protest.py
├── protest.py.lock
├── test.tsv
└── train.tsv

0 directories, 5 files
"""
# Run this cell to mount your Google Drive.
from google.colab import drive
drive.mount('/content/drive/')
# fine-tuned LM -> protest-model/
# input -> protesta-data/task3
# output -> protest-predictions-and-models
!ls /content/drive/My\ Drive/protest-predictions-and-models
!ls /content/drive/My\ Drive/protesta-data/task3

Mounted at /content/drive/
tagger_bert-base-uncased_False.tar.gz
china_test.data  test.tagger-protest-model-False  train.tsv.ftfy
dev.tsv		 test.tagger-protest-model-True   train.tsv.original
protest.py	 test.tsv
protest.py.lock  train.tsv
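
Before training, it can help to confirm that the data folder has the layout shown above. The following is a minimal sketch and not part of `protesta`: the helper name `missing_files` is hypothetical, and treating every file from the tree listing (except the lock file) as required is an assumption.

```python
from pathlib import Path

# File names taken from the tree listing above; treating all of them
# as required is an assumption, not something protesta documents.
EXPECTED_FILES = ("train.tsv", "dev.tsv", "test.tsv", "protest.py")


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    data_dir = "/content/drive/My Drive/protesta-data/task3"
    missing = missing_files(data_dir)
    print("missing:", missing if missing else "none")
```

An empty list means all four files were found in the folder.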


## CLI

Our code comes with a fully featured CLI: type `protesta` to see the available commands.

In [4]:
!protesta

Usage: protesta [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install
                                  completion for
                                  the specified
                                  shell.

  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion
                                  for the
                                  specified shell,
                                  to copy it or
                                  customize the
                                  installation.

  --help                          Show this
                                  message and
                                  exit.


Commands:
  evaluate
  fit
  predict
  serve


## Train

The following cells train four models:
- a tagger using `bert-base-uncased`
- a tagger using `bert-base-uncased` with a CRF layer on top
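- a tagger using `protest-bert` (the fine-tuned model stored in `protest-model/`)
- a tagger using `protest-bert` (the fine-tuned model stored in `protest-model/`) with a CRF layer on top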
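
The four configurations are the cross product of two encoders and the `--crf-decoding` switch. As a rough sketch, the snippet below only assembles the command lines used in the following cells; it does not call `protesta`. The helpers `fit_command` and `output_dir` are hypothetical, the local directory name `protest-model` is taken from the commands in this notebook, and the `outputs/tagger_<model>_<crf>` pattern is inferred from the paths passed to `protesta predict`, not from documentation.

```python
import shlex
from itertools import product

DATA_DIR = "/content/drive/My Drive/protesta-data/task3/"
MODELS = ("bert-base-uncased", "protest-model")


def fit_command(model, crf, data_dir=DATA_DIR):
    """Build the `protesta fit` command line for one configuration."""
    args = ["protesta", "fit", "tagger", model, data_dir]
    if crf:
        args.append("--crf-decoding")
    # shlex.join quotes the data path, which contains a space
    return shlex.join(args)


def output_dir(model, crf):
    """Output directory pattern inferred from the predict cells (an assumption)."""
    return f"outputs/tagger_{model}_{crf}"


for model, crf in product(MODELS, (False, True)):
    print(fit_command(model, crf), "->", output_dir(model, crf))
```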

In [5]:
# Copy the fine-tuned BERT model into the working directory.
!cp -r /content/drive/My\ Drive/protest-model/ ./

### Model 1

In [None]:
!protesta fit tagger bert-base-uncased /content/drive/My\ Drive/protesta-data/task3/

2020-10-14 21:38:56.927012: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-14 21:38:58.944983: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-10-14 21:38:58.945398: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-14 21:38:58.946008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-10-14 21:38:58.946046: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-14 21:38:59.173541: I tensorflow/stream_executor/pl

In [None]:
!protesta predict outputs/tagger_bert-base-uncased_False /content/drive/My\ Drive/protesta-data/task3/test.tsv
!protesta predict outputs/tagger_bert-base-uncased_False /content/drive/My\ Drive/protesta-data/task3/china_test.data

In [None]:
!mv /content/drive/My\ Drive/protesta-data/task3/test.tagger-bert-base-uncased-False ./task3_test.predict
!mv /content/drive/My\ Drive/protesta-data/task3/china_test.tagger-bert-base-uncased-False ./china_test.predict
!zip /content/drive/My\ Drive/protest-predictions-and-models/bert-base-uncased-False.zip ./task3_test.predict ./china_test.predict
!tar -czvf /content/drive/My\ Drive/protest-predictions-and-models/tagger_bert-base-uncased_False.tar.gz outputs/tagger_bert-base-uncased_False
!rm -rf outputs/ task3_test.predict ./china_test.predict
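
The model archiving step can also be done from Python. Below is a minimal sketch using the standard-library `tarfile` module, where the `w:gz` mode writes a gzip-compressed archive so that the contents match the `.tar.gz` extension. The helper name `archive_dir` is hypothetical and not part of `protesta`.

```python
import tarfile
from pathlib import Path


def archive_dir(src_dir, dest_archive):
    """Write src_dir into a gzip-compressed tar archive at dest_archive."""
    src = Path(src_dir)
    with tarfile.open(dest_archive, "w:gz") as tar:
        # Store entries under the directory's own name rather than its full path.
        tar.add(src, arcname=src.name)
    return dest_archive
```

For Model 1, this would be called with `outputs/tagger_bert-base-uncased_False` as `src_dir` and the target `.tar.gz` path on Google Drive as `dest_archive`.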

### Model 2

In [None]:
!protesta fit tagger bert-base-uncased /content/drive/My\ Drive/protesta-data/task3/ --crf-decoding

In [None]:
!protesta predict outputs/tagger_bert-base-uncased_True /content/drive/My\ Drive/protesta-data/task3/test.tsv
!protesta predict outputs/tagger_bert-base-uncased_True /content/drive/My\ Drive/protesta-data/task3/china_test.data

In [None]:
!mv /content/drive/My\ Drive/protesta-data/task3/test.tagger-bert-base-uncased-True ./task3_test.predict
!mv /content/drive/My\ Drive/protesta-data/task3/china_test.tagger-bert-base-uncased-True ./china_test.predict
!zip /content/drive/My\ Drive/protest-predictions-and-models/bert-base-uncased-True.zip ./task3_test.predict ./china_test.predict
!tar -czvf /content/drive/My\ Drive/protest-predictions-and-models/tagger_bert-base-uncased_True.tar.gz outputs/tagger_bert-base-uncased_True
!rm -rf outputs/ ./task3_test.predict ./china_test.predict

### Model 3

In [None]:
!protesta fit tagger protest-model /content/drive/My\ Drive/protesta-data/task3/

In [None]:
!protesta predict outputs/tagger_protest-model_False /content/drive/My\ Drive/protesta-data/task3/test.tsv
!protesta predict outputs/tagger_protest-model_False /content/drive/My\ Drive/protesta-data/task3/china_test.data

In [None]:
!mv /content/drive/My\ Drive/protesta-data/task3/test.tagger-protest-model-False ./task3_test.predict
!mv /content/drive/My\ Drive/protesta-data/task3/china_test.tagger-protest-model-False ./china_test.predict
!zip /content/drive/My\ Drive/protest-predictions-and-models/protest-model-False.zip ./task3_test.predict ./china_test.predict
!tar -czvf /content/drive/My\ Drive/protest-predictions-and-models/tagger_protest-model_False.tar.gz outputs/tagger_protest-model_False
!rm -rf outputs/ task3_test.predict ./china_test.predict

### Model 4

In [None]:
!protesta fit tagger protest-model /content/drive/My\ Drive/protesta-data/task3/ --crf-decoding

In [None]:
!protesta predict outputs/tagger_protest-model_True /content/drive/My\ Drive/protesta-data/task3/test.tsv
!protesta predict outputs/tagger_protest-model_True /content/drive/My\ Drive/protesta-data/task3/china_test.data

In [None]:
!mv /content/drive/My\ Drive/protesta-data/task3/test.tagger-protest-model-True ./task3_test.predict
!mv /content/drive/My\ Drive/protesta-data/task3/china_test.tagger-protest-model-True ./china_test.predict
!zip /content/drive/My\ Drive/protest-predictions-and-models/protest-model-True.zip ./task3_test.predict ./china_test.predict
!tar -czvf /content/drive/My\ Drive/protest-predictions-and-models/tagger_protest-model_True.tar.gz outputs/tagger_protest-model_True
!rm -rf outputs/ task3_test.predict ./china_test.predict
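
The `mv` commands in the four sections above suggest that predictions are written next to the input file, with the extension replaced by `tagger-<model>-<crf>`. The sketch below encodes that naming pattern. It is inferred from this notebook's commands rather than from `protesta` documentation, and the helper `prediction_path` is hypothetical.

```python
from pathlib import Path


def prediction_path(input_file, model, crf):
    """Guess where predictions for input_file land (pattern inferred, not documented)."""
    source = Path(input_file)
    # e.g. test.tsv -> test.tagger-<model>-<crf>, in the same folder as the input
    return source.with_name(f"{source.stem}.tagger-{model}-{crf}")


data_dir = Path("/content/drive/My Drive/protesta-data/task3")
for model in ("bert-base-uncased", "protest-model"):
    for crf in (False, True):
        for name in ("test.tsv", "china_test.data"):
            print(prediction_path(data_dir / name, model, crf))
```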