# Word2Vec Tokenizer in BERT Demo
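A demonstration of the Word2Vec-style (fastText) tokenizer with BERT for semantic segmentation in the semantic recommendation engine of the Advanced Air Mobility Data Reasoning Fabric. The model classifies each input statement as negative, positive, or neutral.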

Install `simpletransformers`, import the BERT model and its tokenizer, and load both from the saved checkpoint.

In [5]:
!pip install simpletransformers

Collecting simpletransformers
  Using cached simpletransformers-0.63.6-py3-none-any.whl (249 kB)
Collecting datasets
  Using cached datasets-2.0.0-py3-none-any.whl (325 kB)
Collecting streamlit
  Using cached streamlit-1.8.1-py2.py3-none-any.whl (10.1 MB)
Collecting seqeval
  Using cached seqeval-1.2.2-py3-none-any.whl
Collecting transformers>=4.6.0
  Using cached transformers-4.17.0-py3-none-any.whl (3.8 MB)
Collecting sentencepiece
  Using cached sentencepiece-0.1.96-cp36-cp36m-macosx_10_6_x86_64.whl (1.2 MB)
Collecting tokenizers
  Using cached tokenizers-0.11.6.tar.gz (221 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting wandb>=0.10.32
  Using cached wandb-0.12.11-py2.py3-none-any.whl (1.7 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Using cached huggingface_hub-0.4.0-py3-none-any.whl (67 kB)
Collecting filelock
  Using cached file

In [2]:
import torch
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import logging
import numpy as np

# Log INFO messages, but only warnings and above from transformers
logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

train_gpu = torch.cuda.is_available()
print('GPU Available:', train_gpu)

# Model arguments (num_train_epochs and train_batch_size only affect training)
model_args = ClassificationArgs(num_train_epochs=2,
                                reprocess_input_data=True,
                                overwrite_output_dir=True,
                                train_batch_size=1,
                                )

# Load the saved 3-class BERT checkpoint, using the GPU only when one is available
model = ClassificationModel('bert', '/Users/kamranhussain/Documents/GitHub/bert-model-eval/outputs/checkpoint-7702-epoch-1/', num_labels=3, args=model_args, use_cuda=train_gpu)

GPU Available: False


Load the fastText tokenizer and define a helper that classifies a statement as negative, positive, or neutral. Then request input from the user.

In [4]:
import fasttext
token = fasttext.load_model('/Users/kamranhussain/Documents/GitHub/bert-model-eval/Tokenizers/fasttexttokenizer.bin')
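`token.get_word_vector` returns a fastText embedding, which is built from character n-grams rather than from BERT's WordPiece vocabulary. As a simplified, hypothetical sketch (not fastText's actual implementation): fastText pads a word with `<` and `>`, takes its character n-grams (lengths 3 to 6 by default for unsupervised models), and, for a word outside its vocabulary, averages the n-gram vectors. The names `char_ngrams` and `compose_vector` and the toy lookup table are illustrative assumptions; real fastText hashes n-grams into a fixed number of buckets of trained vectors.

```python
import numpy as np


def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of `word`, padded with fastText-style boundary markers."""
    padded = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def compose_vector(word, table, dim=4, minn=3, maxn=6):
    """Average the n-gram vectors of `word`, as for an out-of-vocabulary word.

    `table` is a hypothetical dict from n-grams to vectors standing in for
    fastText's trained embeddings; missing n-grams get a seeded random vector
    so the sketch is reproducible (real fastText uses hashed buckets instead).
    """
    grams = char_ngrams(word, minn, maxn)
    if not grams:
        return np.zeros(dim)
    vectors = []
    for gram in grams:
        if gram not in table:
            seed = sum(ord(c) for c in gram)  # toy seed, not fastText's hash
            table[gram] = np.random.default_rng(seed).standard_normal(dim)
        vectors.append(table[gram])
    return np.mean(vectors, axis=0)


# The whole phrase is treated as one "word", so some n-grams span the space
print(char_ngrams("heavy rain", 3, 3))
print(compose_vector("heavy rain", {}).shape)
```

Because `get_word_vector` treats its whole argument as a single word, a phrase such as `heavy rain` produces n-grams that cross the space; the fastText Python API also provides `get_sentence_vector` for multi-word text.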



In [5]:
def get_result(statement):
    # model.predict takes a list of strings and returns (predictions, raw_outputs)
    result = model.predict([statement])
    print(result)
    # Pick the class with the highest raw score for the single input
    pos = int(np.argmax(result[1][0]))
    sentiment_dict = {0:'negative',1:'positive',2:'neutral'}
    print(sentiment_dict[pos])
    return sentiment_dict[pos]

# Pass the raw phrase: model.predict expects strings, not the NumPy vector
# returned by token.get_word_vector()
sentiment = get_result(input("Input a phrase for Validation: "))
print("The input data was classified as:", sentiment)

Passing `token.get_word_vector(...)` to `get_result` instead fails while converting the input to features, with:

ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).
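This `ValueError` comes from handing a NumPy array to `model.predict`, which expects a list of strings. A small guard such as the hypothetical `as_text_batch` below (not part of simpletransformers) can catch that mistake before the model is called; it is a sketch that assumes `predict` should only ever receive raw text.

```python
def as_text_batch(statement):
    """Return a list of strings suitable for model.predict, or raise TypeError.

    Hypothetical helper: accepts a single string or a list/tuple of strings.
    """
    if isinstance(statement, str):
        return [statement]
    if isinstance(statement, (list, tuple)) and all(isinstance(s, str) for s in statement):
        return list(statement)
    raise TypeError(
        f"expected str or list of str, got {type(statement).__name__}; "
        "pass the raw phrase, not token.get_word_vector(...)"
    )


print(as_text_batch("heavy rain"))
```

Inside `get_result`, `model.predict(as_text_batch(statement))` would then take the place of `model.predict([statement])`, turning a confusing tokenizer error into an explicit type error.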

In [9]:
get_result("heavy rain")

INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.


(array([0]), array([[ 4.99420118, -3.72706962, -1.971609  ]]))
negative


'negative'

In [13]:
get_result("High visibility, clear throughout the day")

INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.


(array([1]), array([[-2.09089732,  4.58306503, -0.95916444]]))
positive


'positive'

In [11]:
get_result("Slightly foggy with acceptable visibility")

INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.


(array([2]), array([[-1.08907032, -2.8768549 ,  2.93425846]]))
neutral


'neutral'
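The second array printed by `get_result` holds one raw score per class. Assuming these are unnormalized logits, as the mix of positive and negative values suggests, a softmax turns them into per-class probabilities, and an argmax over each row gives the same labels as `get_result`. The sketch below reuses the three raw outputs shown above; `SENTIMENT` and `softmax` are names introduced for illustration.

```python
import numpy as np

# Same index-to-label mapping as sentiment_dict in get_result
SENTIMENT = {0: "negative", 1: "positive", 2: "neutral"}

# Raw outputs copied from the three runs above (assumed to be logits)
raw_outputs = np.array([
    [4.99420118, -3.72706962, -1.971609],    # "heavy rain"
    [-2.09089732, 4.58306503, -0.95916444],  # "High visibility, clear throughout the day"
    [-1.08907032, -2.8768549, 2.93425846],   # "Slightly foggy with acceptable visibility"
])


def softmax(logits):
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


probs = softmax(raw_outputs)
labels = [SENTIMENT[int(i)] for i in probs.argmax(axis=1)]
for label, p in zip(labels, probs.max(axis=1)):
    print(f"{label}: {p:.3f}")
```

Applied to live predictions, `softmax(result[1])` would give a confidence for every input in a batch rather than only the first row that `get_result` inspects.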

In [20]:
!pip list

Package                      Version
---------------------------- -------------------
absl-py                      1.0.0
aiohttp                      3.8.1
aiosignal                    1.2.0
altair                       4.2.0
argon2-cffi                  21.3.0
argon2-cffi-bindings         21.2.0
asttokens                    2.0.5
astunparse                   1.6.3
async-timeout                4.0.2
attrs                        21.4.0
backcall                     0.2.0
backports.zoneinfo           0.2.1
beautifulsoup4               4.10.0
bleach                       4.1.0
blinker                      1.4
cachetools                   5.0.0
certifi                      2021.10.8
cffi                         1.15.0
charset-normalizer           2.0.12
click                        8.0.4
cycler                       0.11.0
datasets                     2.0.0
debugpy                      1.6.0
decorator                    5.1.1
defusedxml                   0.7.1
dil