Introduction

TopoBERT is a toponym recognition module based on a one-dimensional Convolutional Neural Network (CNN) and Bidirectional Encoder Representations from Transformers (BERT).

The structure of the model is shown in the figure below:

Performance

The model is trained on CoNLL-2003 and evaluated on Harvey2017.

Results on Harvey2017, compared with other popular models:

Model	Precision	Recall	F1-score
Stanford NER (broad location)	0.729	0.440	0.548
spaCy NER (broad location)	0.461	0.304	0.366
BiLSTM-CRF	0.703	0.600	0.649
DM_NLP	0.729	0.680	0.703
NeuroTPR	0.787	0.678	0.728
TopoBERT	0.898	0.835	0.865
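
The F1-score column is the harmonic mean of precision and recall. As a quick check, the sketch below recomputes it for two rows of the table. The function name `f1_score` is only illustrative, and because the published precision and recall are themselves rounded, a recomputed value can differ from the table in the third decimal place.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for two rows of the table above.
rows = {
    "NeuroTPR": (0.787, 0.678),
    "TopoBERT": (0.898, 0.835),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```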

Requirements

geojson 2.5.0
matplotlib 3.4.3
nltk 3.6.5
numpy 1.21.2
pandas 1.3.3
regex 2021.9.30
scikit-learn 1.0
scipy 1.7.1
seaborn 0.11.2
seqeval 1.2.2
tokenizers 0.10.3
torch 1.9.1+cu102
torchvision 0.10.1+cu102
tqdm 4.62.3
transformers 4.11.2
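
To compare an existing environment against the pinned versions above, a minimal sketch using the standard library's `importlib.metadata` might look like this. The helper name `check_versions` and the choice to report mismatches rather than raise are assumptions, and only a few of the pins are shown.

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the pinned versions listed above.
PINNED = {
    "numpy": "1.21.2",
    "seqeval": "1.2.2",
    "tokenizers": "0.10.3",
    "torch": "1.9.1+cu102",
    "transformers": "4.11.2",
}


def check_versions(pinned):
    """Map each package name to (wanted, installed or None, versions match)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_versions(PINNED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: wanted {wanted}, found {installed} ({status})")
```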

How to deploy

Clone the source code and place it in a path that your project can access.
Download the pretrained models, unzip the file, and place the folder, with its original name, in the pretrained_models folder.
Install the required dependencies listed above.
The module is now ready to use.
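
The steps above can be sanity-checked from Python before importing the package. The sketch below is a hypothetical helper, not part of TopoBERT: `prepare_topobert` is an invented name, and the model folder name `topobert_cnn1d` is taken from the repository's `pretrained_models` directory and may differ for other downloads.

```python
import sys
from pathlib import Path


def prepare_topobert(repo_dir, model_name="topobert_cnn1d"):
    """Put repo_dir on sys.path and report whether the model folder exists.

    Hypothetical helper: it assumes the layout
    <repo_dir>/pretrained_models/<model_name>.
    """
    repo = Path(repo_dir).resolve()
    if str(repo) not in sys.path:
        sys.path.insert(0, str(repo))  # makes `import topo_bert` resolvable
    return (repo / "pretrained_models" / model_name).is_dir()


if __name__ == "__main__":
    # Replace "." with the directory you cloned the source code into.
    if not prepare_topobert("."):
        print("Pretrained model folder not found; check the unzip location.")
```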

Use case

from topo_bert import *  # Adjust the import to the path where you placed the downloaded files

test_text = """HarveyStorm over Austin TX at 8: 00 AM CDT via Weather Underground"""
current_geoparser = TopoBERT()
result = current_geoparser.predict(test_text)
print(result)

The demo prints the following output:

{
	'combined_addresses': ['Austin', 'TX'],
	'address_result': ['Austin', 'TX'],
	'full_address': 'Austin TX',
	'org_result': [{
		'word': 'HarveyStorm',
		'tag': 'B-ORG',
		'confidence': 0.9983394145965576
	}, {
		'word': 'over',
		'tag': 'O',
		'confidence': 0.9998631477355957
	}, {
		'word': 'Austin',
		'tag': 'B-LOC',
		'confidence': 0.9995130300521851
	}, {
		'word': 'TX',
		'tag': 'B-LOC',
		'confidence': 0.9928538203239441
	}, {
		'word': 'at',
		'tag': 'O',
		'confidence': 0.9999804496765137
	}, {
		'word': '8',
		'tag': 'O',
		'confidence': 0.9999505281448364
	}, {
		'word': ':',
		'tag': 'O',
		'confidence': 0.9999704360961914
	}, {
		'word': '00',
		'tag': 'O',
		'confidence': 0.99994957447052
	}, {
		'word': 'AM',
		'tag': 'O',
		'confidence': 0.9463351368904114
	}, {
		'word': 'CDT',
		'tag': 'B-MISC',
		'confidence': 0.5280879735946655
	}, {
		'word': 'via',
		'tag': 'O',
		'confidence': 0.9999630451202393
	}, {
		'word': 'Weather',
		'tag': 'B-ORG',
		'confidence': 0.9993113279342651
	}, {
		'word': 'Underground',
		'tag': 'I-ORG',
		'confidence': 0.9984622001647949
	}]
}
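
The `org_result` list holds one entry per token with a BIO-style tag, so consecutive `B-`/`I-` tokens can be merged into entity spans and low-confidence tags can be flagged. The sketch below assumes only the dictionary layout shown in the demo output. `group_entities`, `low_confidence`, and the 0.9 threshold are illustrative choices, not part of TopoBERT's API.

```python
def group_entities(tokens):
    """Merge B-/I- tagged tokens into entity spans."""
    entities = []
    current = None  # entity being extended, if any
    for tok in tokens:
        tag = tok["tag"]
        if tag.startswith("B-"):
            current = {"type": tag[2:], "text": tok["word"]}
            entities.append(current)
        elif tag.startswith("I-") and current and current["type"] == tag[2:]:
            current["text"] += " " + tok["word"]
        else:
            current = None  # "O", or an I- tag with no matching open entity
    return entities


def low_confidence(tokens, threshold=0.9):
    """Return non-O tokens whose confidence falls below the threshold."""
    return [t for t in tokens if t["tag"] != "O" and t["confidence"] < threshold]


# A few tokens copied from the demo output above (confidences truncated).
sample = [
    {"word": "Austin", "tag": "B-LOC", "confidence": 0.9995},
    {"word": "TX", "tag": "B-LOC", "confidence": 0.9928},
    {"word": "CDT", "tag": "B-MISC", "confidence": 0.5280},
    {"word": "via", "tag": "O", "confidence": 0.9999},
    {"word": "Weather", "tag": "B-ORG", "confidence": 0.9993},
    {"word": "Underground", "tag": "I-ORG", "confidence": 0.9984},
]

print(group_entities(sample))
print(low_confidence(sample))
```

In this sample, `Austin` and `TX` remain separate spans because both carry a `B-LOC` tag, while `Weather` and `Underground` are merged into one `ORG` span.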

How to train your own model

You can train your own model with the code below:

model_args_used = {
    "--cuda": "use GPU",
    "--pretrained_model": "bert-large-cased",
    "--num_of_labels": 12,
    "--model_hidden_layer_size": 1024,
    "--no_hidden_layers": 24,
    "--dropout": 0.1,
    "--out-channel": 16,
    "--freeze-bert": False,
    "--verbose": "whether to output the test results"
}

exp_train_config = {
    "--task_name": "bert_geoparsing",
    "--toponym_only": False,
    "--random_seed": 42,
    "--use_gpu": 1,
    "--train_data_type": "conll",
    "--validate_data_type": "conll",
    "--test_data_type": "conll",
    "--train_data_dir": "Put your own file absolute path here",
    "--validate_data_dir": "Put your own file absolute path here",
    "--test_data_dir": "Put your own file absolute path here",
    "--train_data_file": "train.txt",
    "--validate_data_file": "test.txt",
    "--test_data_file": "test.txt",
    "--is_validate": 1,
    "--is_test": 1,
    "--output_dir": "./outputs",
    "--cache_dir": "./cache",
    "--bert_model": "bert-large-cased",
    "--do_lower_case": False,
    "--max_seq_length": 128,
    "--training_epoch": 50,
    "--train_batch_size": 32,
    "--test_batch_size": 32,
    "--learning_rate": 5e-5,
    "--warm_up_proportion": 0.1,
    "--weight_decay": 0.01,
    "--adam_epsilon": 1e-8,
    "--max_grad_norm": 1.0,
    "--num_grad_accum_steps": 1,
    "--loss_scale": 0
}

# BertCNN1DNer and TopoBertModelTrainer must already be imported from the TopoBERT source files.
model = BertCNN1DNer(model_config=model_args_used)
current_trainer = TopoBertModelTrainer(model, train_config=exp_train_config)
current_trainer.train()
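
Several of the training settings interact: batch size, gradient accumulation, and epoch count fix the number of optimizer updates, and `--warm_up_proportion` is conventionally a fraction of that total. The sketch below works through the arithmetic under that convention. How `TopoBertModelTrainer` actually schedules warm-up is not shown here, the helper name `schedule_steps` is hypothetical, and the dataset size of 10,000 examples is a made-up figure for illustration.

```python
import math


def schedule_steps(num_examples, batch_size, grad_accum_steps, epochs, warmup_proportion):
    """Return (total optimizer updates, warm-up updates) for a training run."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    total_updates = updates_per_epoch * epochs
    warmup_updates = round(total_updates * warmup_proportion)
    return total_updates, warmup_updates


total, warmup = schedule_steps(
    num_examples=10_000,    # hypothetical training-set size
    batch_size=32,          # --train_batch_size
    grad_accum_steps=1,     # --num_grad_accum_steps
    epochs=50,              # --training_epoch
    warmup_proportion=0.1,  # --warm_up_proportion
)
print(f"total updates: {total}, warm-up updates: {warmup}")
```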
