# NER_200

**Mount your Google Drive**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Install utils:

First, you'll need to enable GPUs for the notebook:

* Navigate to Edit→Notebook Settings

* Select GPU from the Hardware Accelerator drop-down

In [None]:
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil
!pip install psutil
!pip install humanize
import psutil
import humanize
import os
import GPUtil as GPU
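GPUs = GPU.getGPUs()
# Colab provides at most one GPU, and it is not guaranteed to be available.
gpu = GPUs[0] if GPUs else None
def printm():
    # Report free system RAM and the memory used by this process.
    process = psutil.Process(os.getpid())
    print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available), " | Proc size: " + humanize.naturalsize(process.memory_info().rss))
    if gpu is None:
        # Avoid an IndexError when the runtime has no GPU attached.
        print("No GPU detected. Check Edit→Notebook Settings.")
        return
    print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil*100, gpu.memoryTotal))
printm()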


**Clone the BioBERT GitHub repository**

In [None]:
!git clone https://github.com/dmis-lab/biobert.git

Install the requirements for BioBERT

This takes about 15 minutes, and you may encounter the following error:

`ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = ...`

You can continue anyway.

In [None]:
!pip install -r biobert/requirements.txt

**Install CUDA**

The next code cell prompts you with a yes/no question. Answer 'y' to continue.

This takes about 3 minutes.

In [None]:
!sudo dpkg -i /content/drive/MyDrive/NER_200/cuda-repo-ubuntu1704_9.0.176-1_amd64.deb
!sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
!sudo apt-get update
!sudo apt-get install cuda-9-0

# **Fine tuning**

*If you only want to evaluate a model, skip to the Evaluate section.*

**Configure the paths**

Make sure the correct NER input directory is selected. For example, to fine-tune on all HunFlair cell line corpora combined, set `NER_DIR` to:

`'/content/drive/MyDrive/NER_200/ner_inputs/HunFlair_NER_cell_lines/cell_line_all_combined'`

Place your pretrained BioBERT weights in the biobert_pretrained_weights folder.

In [None]:
import os
os.environ['BIOBERT_DIR']= '/content/drive/MyDrive/NER_200/biobert_pretrained_weights'
os.environ['NER_DIR'] = '/content/drive/MyDrive/NER_200/ner_inputs/HunFlair_NER_cell_lines/cell_line_all_combined'
os.environ['OUTPUT_DIR'] = '/content/drive/MyDrive/NER_200/ner_outputs'
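
Before launching a long run, it can help to confirm that the configured folders contain the files the command expects. The sketch below is one way to do that. The helper name `missing_biobert_files` is hypothetical, and it assumes the pretrained weights are stored as a TensorFlow checkpoint whose `.index` file sits next to `vocab.txt` and `bert_config.json`.

```python
import os

def missing_biobert_files(biobert_dir, ckpt_name="model.ckpt-1000000"):
    """Return the expected file names that are absent from biobert_dir.

    Hypothetical helper: ckpt_name is the checkpoint prefix used in the
    run_ner.py command; adjust it for other BioBERT releases.
    """
    # Assumption: a TensorFlow checkpoint always ships an ".index" file.
    expected = ["vocab.txt", "bert_config.json", ckpt_name + ".index"]
    return [name for name in expected
            if not os.path.isfile(os.path.join(biobert_dir, name))]

# Read the paths configured in the cell above (empty string if unset).
biobert_dir = os.environ.get("BIOBERT_DIR", "")
ner_dir = os.environ.get("NER_DIR", "")

if not os.path.isdir(ner_dir):
    print("NER_DIR does not exist:", ner_dir)

missing = missing_biobert_files(biobert_dir)
print("Missing from BIOBERT_DIR:", missing if missing else "nothing")
```

An empty list means the three files were found; anything else names what still needs to be copied to Drive.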

**Run_ner.py**

Run the next command to create your own fine-tuned model. To change the batch size or sequence length, edit the values in run_ner.py. You can change the number of epochs in the command below. To also evaluate the model, set the `--do_eval` flag to true; the model is then evaluated at the last global step. Your model will be saved in the ner_outputs folder. Remember to change the number in the filename model.ckpt-1000000 if you have another version of BioBERT.

***Please note, this may take several hours. The corpora all_chemical and all_disease are too big for the free version of Google Colab.***

In [None]:
!python biobert/run_ner.py --do_train=true --do_eval=false --vocab_file=$BIOBERT_DIR/vocab.txt --bert_config_file=$BIOBERT_DIR/bert_config.json --init_checkpoint=$BIOBERT_DIR/model.ckpt-1000000 --num_train_epochs=12.0 --data_dir=$NER_DIR --output_dir=$OUTPUT_DIR
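
If you are unsure which number belongs in `model.ckpt-1000000`, the folder contents can be listed programmatically. This is a minimal sketch with a hypothetical helper, `find_checkpoint_prefixes`; it assumes the weights follow the usual TensorFlow naming, where each checkpoint has a file called `model.ckpt-<step>.index`.

```python
import os
import re

def find_checkpoint_prefixes(model_dir):
    """Return prefixes such as 'model.ckpt-1000000', sorted by step number.

    Hypothetical helper; assumes TensorFlow-style 'model.ckpt-<step>.index' files.
    """
    pattern = re.compile(r"^(model\.ckpt-(\d+))\.index$")
    found = []
    for name in os.listdir(model_dir):
        match = pattern.match(name)
        if match:
            # Store the numeric step first so sorting is by step, not by text.
            found.append((int(match.group(2)), match.group(1)))
    return [prefix for _, prefix in sorted(found)]

# Only list the folder if BIOBERT_DIR is set and exists.
biobert_dir = os.environ.get("BIOBERT_DIR")
if biobert_dir and os.path.isdir(biobert_dir):
    print(find_checkpoint_prefixes(biobert_dir))
```

The prefix it prints is the value to use after `$BIOBERT_DIR/` in `--init_checkpoint`.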


# **Evaluate**

If you want to evaluate a model, make sure you have downloaded the correct model and placed it in the ner_outputs folder. Configure the other paths accordingly.

In [None]:
import os
os.environ['BIOBERT_DIR']= '/content/drive/MyDrive/NER_200/biobert_pretrained_weights'
os.environ['NER_DIR'] = '/content/drive/MyDrive/NER_200/ner_inputs/HunFlair_NER_cell_lines/cell_line_all_combined'
os.environ['OUTPUT_DIR'] = '/content/drive/MyDrive/NER_200/ner_outputs'
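
To confirm that a fine-tuned model is actually present in the output folder before evaluating, you can inspect the folder's checkpoint record. The sketch below assumes the model was saved in the usual TensorFlow layout, where a small text file named `checkpoint` holds a line of the form `model_checkpoint_path: "model.ckpt-<step>"`. The helper name `latest_checkpoint_name` is hypothetical.

```python
import os
import re

def latest_checkpoint_name(output_dir):
    """Return the checkpoint named in output_dir/checkpoint, or None if absent.

    Hypothetical helper; assumes TensorFlow's plain-text 'checkpoint' file.
    """
    record_path = os.path.join(output_dir, "checkpoint")
    if not os.path.isfile(record_path):
        return None
    with open(record_path, encoding="utf-8") as handle:
        for line in handle:
            # The first matching line names the most recent checkpoint.
            match = re.match(r'^model_checkpoint_path:\s*"(.+)"\s*$', line)
            if match:
                return match.group(1)
    return None

# Report what the configured OUTPUT_DIR contains (empty string if unset).
output_dir = os.environ.get("OUTPUT_DIR", "")
print("Checkpoint recorded in OUTPUT_DIR:", latest_checkpoint_name(output_dir))
```

A result of `None` suggests the model files were not copied into the folder, or that the `checkpoint` file was left out of the download.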

**Run evaluation**

Remember to change the number in the filename model.ckpt-1000000 if you have another version of BioBERT.


In [None]:
!python biobert/run_ner.py --do_train=false --do_eval=true --vocab_file=$BIOBERT_DIR/vocab.txt --bert_config_file=$BIOBERT_DIR/bert_config.json --init_checkpoint=$BIOBERT_DIR/model.ckpt-1000000 --num_train_epochs=12.0 --data_dir=$NER_DIR --output_dir=$OUTPUT_DIR
