<a href="https://colab.research.google.com/github/chewzzz1014/fyp/blob/master/ner/src/train_ner_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Train NER Models

In [2]:
# mount drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
!mkdir spacy_ner_data

mkdir: cannot create directory ‘spacy_ner_data’: File exists


In [4]:
import json
import random
from sklearn.model_selection import train_test_split
import spacy
from spacy.tokens import DocBin

# Load JSON data
with open('/content/drive/MyDrive/FYP/Implementation/Resume Dataset/40_resumes_annotated.json', "r") as f:
    data = json.load(f)

def remove_overlapping_entities(entities):
    """Remove overlapping entities from the list."""
    entities = sorted(entities, key=lambda x: x[0])  # Sort by start position
    non_overlapping = []
    last_end = -1
    for start, end, label in entities:
        if start >= last_end:  # Only add if there's no overlap with the previous entity
            non_overlapping.append((start, end, label))
            last_end = end
    return non_overlapping

# Function to convert JSON data to Spacy's DocBin format
def convert_to_spacy_format(data):
    nlp = spacy.blank("en")  # Load a blank Spacy model
    doc_bin = DocBin()  # Container for our docs

    for item in data:
        text = item['data']['Text']  # Full document text
        entities = []

        for annotation in item['annotations'][0]['result']:
            start = annotation['value']['start']
            end = annotation['value']['end']
            label = annotation['value']['labels'][0]  # Entity label
            entities.append((start, end, label))

        entities = remove_overlapping_entities(entities)  # Remove overlapping entities
        # Create a Spacy doc and add entities to it
        doc = nlp.make_doc(text)
        spans = [doc.char_span(start, end, label=label) for start, end, label in entities]
        # Filter out None spans if Spacy can't align the character indices with tokens
        spans = [span for span in spans if span is not None]
        doc.ents = spans  # Assign entities to the doc
        doc_bin.add(doc)

    return doc_bin

# Split data into train and test sets
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

# Convert train and test sets to Spacy format
train_doc_bin = convert_to_spacy_format(train_data)
test_doc_bin = convert_to_spacy_format(test_data)

# Save the train and test data to .spacy files
train_doc_bin.to_disk("spacy_ner_data/train_data.spacy")
test_doc_bin.to_disk("spacy_ner_data/test_data.spacy")

## Spacy NER

In [5]:
# create base_config.cfg and paste the config generated from spacy widget
# update train and test file path
!touch base_config.cfg

In [7]:
# generate config.cfg from base_config.cfg
!python -m spacy init fill-config base_config.cfg config.cfg

[38;5;2m✔ Auto-filled config with all values[0m
[38;5;2m✔ Saved config[0m
config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy


In [19]:
!python -m spacy download en_core_web_lg

Collecting en-core-web-lg==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl (587.7 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m587.7/587.7 MB[0m [31m801.1 kB/s[0m eta [36m0:00:00[0m
Installing collected packages: en-core-web-lg
Successfully installed en-core-web-lg-3.7.1
[38;5;2m✔ Download and installation successful[0m
You can now load the package via spacy.load('en_core_web_lg')
[38;5;3m⚠ Restart to reload dependencies[0m
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [15]:
# train model using hyperparameters set in config.cfg
# trained model in output/ dir
!python -m spacy train config.cfg --output ./output

[38;5;4mℹ Saving to output directory: output[0m
[38;5;4mℹ Using CPU[0m
[1m
[38;5;2m✔ Initialized pipeline[0m
[1m
[38;5;4mℹ Pipeline: ['tok2vec', 'ner'][0m
[38;5;4mℹ Initial learn rate: 0.0005[0m
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    238.65    0.00    0.00    0.00    0.00
  3     100       2129.88   9813.94   18.94   21.25   17.09    0.19
  6     200        433.86   5971.17   27.32   32.56   23.53    0.27
  9     300        174.71   4187.37   32.45   28.57   37.54    0.32
 12     400        655.81   3933.13   30.71   26.71   36.13    0.31
 15     500        834.10   2505.04   27.80   19.36   49.30    0.28
 18     600        112.18   1961.11   31.86   27.24   38.38    0.32
 21     700         97.26   1453.16   34.12   30.72   38.38    0.34
 25     800        109.06   1275.19   32.26   25.66   43.42    0.32
 28     900         98.32    981.77   30.19  

In [17]:
# evaluate trained model performance
# store output and visualization into result/ dir
!python -m spacy evaluate output/model-best spacy_ner_data/test_data.spacy -dp output

[38;5;4mℹ Using CPU[0m
[1m

TOK     100.00
NER P   30.00 
NER R   40.34 
NER F   34.41 
SPEED   2476  

[1m

                 P        R        F
NAME        100.00    62.50    76.92
PHONE       100.00   100.00   100.00
SKILL        24.81    36.84    29.65
WORK PER     48.39    75.00    58.82
COMPANY      14.29     5.88     8.33
JOB          53.85    33.33    41.18
STUDY PER    50.00    50.00    50.00
DEG          33.33    50.00    40.00
UNI          55.56    71.43    62.50
LOC         100.00    75.00    85.71

<IPython.core.display.HTML object>
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/spacy/__main__.py", line 4, in <module>
    setup_cli()
  File "/usr/local/lib/python3.10/dist-packages/spacy/cli/_util.py", line 87, in setup_cli
    co

In [18]:
# make prediction
import spacy
resume_text = 'dot net developer robert smith phone 123 456 78 99 email infoqwikresumecom website wwwqwikresumecom linkedin linkedincomqwikresume address 1737 marshville road alabama objective dot net developer seven years experience design development webbased windows based applications using net technologies hands experience phases software development life cycle sdlc like requirement gathering analysis architectural detail design documentation development testing implementation using agile methodologies like scrum xp test driven environment skills programming languages net technologies c sql plsql web forms win forms web technologies scripting html dhtml css xmlangular jsbootstrap ajax toolkit jquery javascript telerikkendo ui database ms sql server operating systems windows 8 packages ms office amp visio ms frontpage iis version control tools git hub vss tfs methodologies agile oops scrum soa reporting crystal reports net ms sql server reporting services ssrs work experience dot net developer charter communications june 2015 present assisting developing architectural design functional specifications involving analysis designing coding implementation application developing dynamic web page implemented creatively implemented design requirements using client side scripting language technologies assisting agile software development management activities respond unpredictability iterative sprints designing developing web application migrating project mvc architecture using mvc 3 separate internal representations information involved developing telerik kendo ul controls building application enabling focus value generating development tasks involving development presentation logic gui aspnet pages dot net developer abc corp 2011 2015 coded updated maintained computer programs assisted developers prepare high level technical design documents performed code enhancements assist performance analysis provided technical guidance programming standards team trained users team coordinated client amp offshore tearn meet project objectives participated backlog grooming meeting work client remove barriers team worked individual developer also manage offshore team free resume template copyright qwikresumecom usage guidelines'
nlp = spacy.load("output/model-best")
doc = nlp(resume_text)

print(doc.ents)

for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")

(robert smith, 123 456 78 99, infoqwikresumecom, alabama, agile, scrum, c, sql, plsql, scripting, html, css, jquery, ui, database, ms sql, windows, git, tfs, agile, oops, scrum, soa, dot net developer, june 2015 present assisting, mvc, mvc, gui, aspnet, abc corp, 2011 2015)
robert smith: NAME
123 456 78 99: PHONE
infoqwikresumecom: EMAIL
alabama: LOC
agile: SKILL
scrum: SKILL
c: SKILL
sql: SKILL
plsql: SKILL
scripting: SKILL
html: SKILL
css: SKILL
jquery: SKILL
ui: SKILL
database: SKILL
ms sql: SKILL
windows: SKILL
git: SKILL
tfs: SKILL
agile: SKILL
oops: SKILL
scrum: SKILL
soa: SKILL
dot net developer: JOB
june 2015 present assisting: WORK PER
mvc: SKILL
mvc: SKILL
gui: SKILL
aspnet: SKILL
abc corp: COMPANY
2011 2015: WORK PER


In [19]:
from spacy import displacy
displacy.render(doc, style="ent", jupyter=True)

In [None]:
# download trained model

## Flair NER

In [5]:
!pip install flair



In [6]:
import spacy
from spacy.tokens import DocBin
import os

def convert_spacy_to_flair(input_file, output_file):
    """
    Convert SpaCy binary format to Flair's CoNLL format.

    Args:
        input_file (str): Path to SpaCy binary file (.spacy)
        output_file (str): Path to output file for Flair format
    """
    # Load spaCy model
    nlp = spacy.blank("en")

    # Load the DocBin
    doc_bin = DocBin().from_disk(input_file)
    docs = list(doc_bin.get_docs(nlp.vocab))

    with open(output_file, 'w', encoding='utf-8') as f:
        for doc in docs:
            tokens = [(t.text, t.ent_iob_, t.ent_type_) for t in doc]

            # Write tokens in CoNLL format
            for token in tokens:
                text, iob, ent_type = token

                # Convert spaCy IOB to CoNLL format
                if iob == 'O':
                    tag = 'O'
                else:
                    tag = f'{iob}-{ent_type}' if ent_type else 'O'

                # Write line: token and NER tag
                f.write(f'{text} {tag}\n')

            # Empty line between sentences
            f.write('\n')

def convert_spacy_json_to_flair(input_file, output_file):
    """
    Convert SpaCy JSON format to Flair's CoNLL format.

    Args:
        input_file (str): Path to JSON file with SpaCy annotations
        output_file (str): Path to output file for Flair format
    """
    import json

    nlp = spacy.blank("en")

    with open(input_file, 'r', encoding='utf-8') as f:
        training_data = json.load(f)

    with open(output_file, 'w', encoding='utf-8') as f:
        for example in training_data:
            text = example['text']
            ents = example.get('entities', [])

            # Create a spaCy doc
            doc = nlp(text)

            # Add entities to doc
            spans = []
            for start, end, label in ents:
                span = doc.char_span(start, end, label=label)
                if span is not None:
                    spans.append(span)
            doc.ents = spans

            # Convert to CoNLL format
            tokens = [(t.text, t.ent_iob_, t.ent_type_) for t in doc]

            for token in tokens:
                text, iob, ent_type = token
                if iob == 'O':
                    tag = 'O'
                else:
                    tag = f'{iob}-{ent_type}' if ent_type else 'O'
                f.write(f'{text} {tag}\n')

            f.write('\n')

# Example usage for JSON format
flair_train_json = "flair_train.txt"
flair_test_json = "flair_test.txt"

convert_spacy_to_flair('/content/spacy_ner_data/train_data.spacy', flair_train_json)
convert_spacy_to_flair('/content/spacy_ner_data/test_data.spacy', flair_test_json)

In [24]:
import spacy
from spacy.training import Corpus

# !python -m spacy download de_core_news_sm
nlp = spacy.load("de_core_news_sm")
corpus = Corpus("/content/spacy_ner_data/train_data.spacy")

data = corpus(nlp)

# Flair supports BIO and BIOES, see https://github.com/flairNLP/flair/issues/875
def rename_biluo_to_bioes(old_tag):
    new_tag = ""
    try:
        if old_tag.startswith("L"):
            new_tag = "E" + old_tag[1:]
        elif old_tag.startswith("U"):
            new_tag = "S" + old_tag[1:]
        else:
            new_tag = old_tag
    except:
        pass
    return new_tag


def generate_corpus():
    corpus = []
    n_ex = 0
    for example in data:
        n_ex += 1
        text = example.text
        doc = nlp(text)
        tags = example.get_aligned_ner()
        # Check if it's an empty list of NER tags.
        if None in tags:
            pass
        else:
            new_tags = [rename_biluo_to_bioes(tag) for tag in tags]
            for token, tag in zip(doc,new_tags):
                row = token.text +' '+ token.pos_ +' ' +tag + '\n'
                corpus.append(row)
            corpus.append('\n')
    return corpus

def write_file(filepath):
    with open(filepath, 'w', encoding='utf-8') as f:
        corpus = generate_corpus()
        f.writelines(corpus)

def main():
    write_file('flair_train.txt')

main()

In [25]:
from flair.data import Corpus
from flair.datasets import ColumnCorpus

# Define columns for CoNLL (0: word, 1: label)
columns = {0: 'text', 1: 'pos', 2: 'ner'}

# Set data folder and file names
data_folder = './'
train_file = 'flair_train.txt'
test_file = 'flair_test.txt'

# Load the corpus
corpus: Corpus = ColumnCorpus(data_folder, columns,
                              train_file=train_file,
                              test_file=test_file)

2024-10-30 12:29:20,803 Reading data from .
2024-10-30 12:29:20,809 Train: flair_train.txt
2024-10-30 12:29:20,811 Dev: None
2024-10-30 12:29:20,813 Test: flair_test.txt
2024-10-30 12:29:21,944 No dev split found. Using 10% (i.e. 3 samples) of the train split as dev data


In [26]:
# create NER tagger
from flair.embeddings import WordEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger

embeddings = StackedEmbeddings([
                WordEmbeddings('glove'),
                WordEmbeddings('en-crawl')
            ])

tagger = SequenceTagger(hidden_size=256,
                         embeddings=embeddings,
                         tag_dictionary=corpus.make_tag_dictionary(tag_type="ner"),
                         tag_type='ner',
                         use_crf=True)

2024-10-30 12:29:53,831 SequenceTagger predicts: Dictionary with 3 tags: O, <START>, <STOP>


  tag_dictionary=corpus.make_tag_dictionary(tag_type="ner"),


In [28]:
# train flair ner model
from flair.trainers import ModelTrainer
from flair.training_utils import EvaluationMetric

trainer = ModelTrainer(tagger, corpus)

trainer.train(
    base_path='flair_output/',
    learning_rate=0.001,
    mini_batch_size=32,
    max_epochs=100,
    patience=3,
    embeddings_storage_mode='gpu',
    use_amp=True,  # Use mixed precision training
    train_with_dev=False
)

2024-10-30 12:33:35,385 ----------------------------------------------------------------------------------------------------
2024-10-30 12:33:35,389 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings(
      'glove'
      (embedding): Embedding(400001, 100)
    )
    (list_embedding_1): WordEmbeddings(
      'en-crawl'
      (embedding): Embedding(1000001, 300)
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=400, out_features=400, bias=True)
  (rnn): LSTM(400, 256, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=512, out_features=3, bias=True)
  (loss_function): ViterbiLoss()
  (crf): CRF()
)"
2024-10-30 12:33:35,392 ----------------------------------------------------------------------------------------------------
2024-10-30 12:33:35,396 Corpus: 29 train + 3 dev + 8 test sentences
2024-10-30 12:33:35,408 -------------------------------------

  scaler = torch.cuda.amp.GradScaler(enabled=use_amp and flair.device.type != "cpu")


2024-10-30 12:35:18,566 epoch 1 - iter 1/1 - loss 0.00000792 - time (sec): 103.09 - samples/sec: 119.58 - lr: 0.001000 - momentum: 0.000000
2024-10-30 12:35:18,574 ----------------------------------------------------------------------------------------------------
2024-10-30 12:35:18,579 EPOCH 1 done: loss 0.0000 - lr: 0.001000


100%|██████████| 1/1 [00:00<00:00,  2.29it/s]

2024-10-30 12:35:19,046 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:35:19,050  - 0 epochs without improvement
2024-10-30 12:35:19,054 ----------------------------------------------------------------------------------------------------





2024-10-30 12:37:00,303 epoch 2 - iter 1/1 - loss 0.00000808 - time (sec): 101.25 - samples/sec: 121.76 - lr: 0.001000 - momentum: 0.000000
2024-10-30 12:37:00,306 ----------------------------------------------------------------------------------------------------
2024-10-30 12:37:00,311 EPOCH 2 done: loss 0.0000 - lr: 0.001000


100%|██████████| 1/1 [00:00<00:00,  2.33it/s]

2024-10-30 12:37:00,779 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:37:00,784  - 1 epochs without improvement
2024-10-30 12:37:00,788 ----------------------------------------------------------------------------------------------------





2024-10-30 12:38:45,246 epoch 3 - iter 1/1 - loss 0.00000745 - time (sec): 104.46 - samples/sec: 118.02 - lr: 0.001000 - momentum: 0.000000
2024-10-30 12:38:45,256 ----------------------------------------------------------------------------------------------------
2024-10-30 12:38:45,260 EPOCH 3 done: loss 0.0000 - lr: 0.001000


100%|██████████| 1/1 [00:00<00:00,  2.26it/s]

2024-10-30 12:38:45,732 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:38:45,736  - 2 epochs without improvement
2024-10-30 12:38:45,739 ----------------------------------------------------------------------------------------------------





2024-10-30 12:40:33,934 epoch 4 - iter 1/1 - loss 0.00000808 - time (sec): 108.19 - samples/sec: 113.94 - lr: 0.001000 - momentum: 0.000000
2024-10-30 12:40:33,942 ----------------------------------------------------------------------------------------------------
2024-10-30 12:40:33,951 EPOCH 4 done: loss 0.0000 - lr: 0.001000


100%|██████████| 1/1 [00:00<00:00,  2.25it/s]

2024-10-30 12:40:34,426 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:40:34,429  - 3 epochs without improvement
2024-10-30 12:40:34,432 ----------------------------------------------------------------------------------------------------





2024-10-30 12:42:18,597 epoch 5 - iter 1/1 - loss 0.00000760 - time (sec): 104.16 - samples/sec: 118.36 - lr: 0.001000 - momentum: 0.000000
2024-10-30 12:42:18,605 ----------------------------------------------------------------------------------------------------
2024-10-30 12:42:18,614 EPOCH 5 done: loss 0.0000 - lr: 0.001000


100%|██████████| 1/1 [00:00<00:00,  1.40it/s]

2024-10-30 12:42:19,372 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:42:19,379  - 4 epochs without improvement (above 'patience')-> annealing learning_rate to [0.0005]
2024-10-30 12:42:19,383 ----------------------------------------------------------------------------------------------------





2024-10-30 12:44:02,708 epoch 6 - iter 1/1 - loss 0.00000776 - time (sec): 103.32 - samples/sec: 119.32 - lr: 0.000500 - momentum: 0.000000
2024-10-30 12:44:02,718 ----------------------------------------------------------------------------------------------------
2024-10-30 12:44:02,726 EPOCH 6 done: loss 0.0000 - lr: 0.000500


100%|██████████| 1/1 [00:00<00:00,  1.50it/s]

2024-10-30 12:44:03,437 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:44:03,445  - 1 epochs without improvement
2024-10-30 12:44:03,448 ----------------------------------------------------------------------------------------------------





2024-10-30 12:45:49,348 epoch 7 - iter 1/1 - loss 0.00000792 - time (sec): 105.90 - samples/sec: 116.42 - lr: 0.000500 - momentum: 0.000000
2024-10-30 12:45:49,362 ----------------------------------------------------------------------------------------------------
2024-10-30 12:45:49,373 EPOCH 7 done: loss 0.0000 - lr: 0.000500


100%|██████████| 1/1 [00:00<00:00,  1.44it/s]

2024-10-30 12:45:50,106 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:45:50,114  - 2 epochs without improvement
2024-10-30 12:45:50,120 ----------------------------------------------------------------------------------------------------





2024-10-30 12:47:34,047 epoch 8 - iter 1/1 - loss 0.00000760 - time (sec): 103.92 - samples/sec: 118.62 - lr: 0.000500 - momentum: 0.000000
2024-10-30 12:47:34,059 ----------------------------------------------------------------------------------------------------
2024-10-30 12:47:34,064 EPOCH 8 done: loss 0.0000 - lr: 0.000500


100%|██████████| 1/1 [00:00<00:00,  2.20it/s]

2024-10-30 12:47:34,552 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:47:34,557  - 3 epochs without improvement
2024-10-30 12:47:34,559 ----------------------------------------------------------------------------------------------------





2024-10-30 12:49:16,434 epoch 9 - iter 1/1 - loss 0.00000824 - time (sec): 101.87 - samples/sec: 121.01 - lr: 0.000500 - momentum: 0.000000
2024-10-30 12:49:16,444 ----------------------------------------------------------------------------------------------------
2024-10-30 12:49:16,450 EPOCH 9 done: loss 0.0000 - lr: 0.000500


100%|██████████| 1/1 [00:00<00:00,  2.23it/s]

2024-10-30 12:49:16,934 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:49:16,938  - 4 epochs without improvement (above 'patience')-> annealing learning_rate to [0.00025]
2024-10-30 12:49:16,942 ----------------------------------------------------------------------------------------------------





2024-10-30 12:51:04,196 epoch 10 - iter 1/1 - loss 0.00000792 - time (sec): 107.25 - samples/sec: 114.94 - lr: 0.000250 - momentum: 0.000000
2024-10-30 12:51:04,210 ----------------------------------------------------------------------------------------------------
2024-10-30 12:51:04,222 EPOCH 10 done: loss 0.0000 - lr: 0.000250


100%|██████████| 1/1 [00:00<00:00,  1.75it/s]

2024-10-30 12:51:04,831 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:51:04,835  - 1 epochs without improvement
2024-10-30 12:51:04,837 ----------------------------------------------------------------------------------------------------





2024-10-30 12:52:46,170 epoch 11 - iter 1/1 - loss 0.00000776 - time (sec): 101.33 - samples/sec: 121.66 - lr: 0.000250 - momentum: 0.000000
2024-10-30 12:52:46,179 ----------------------------------------------------------------------------------------------------
2024-10-30 12:52:46,186 EPOCH 11 done: loss 0.0000 - lr: 0.000250


100%|██████████| 1/1 [00:00<00:00,  2.26it/s]

2024-10-30 12:52:46,658 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:52:46,661  - 2 epochs without improvement
2024-10-30 12:52:46,671 ----------------------------------------------------------------------------------------------------





2024-10-30 12:54:30,668 epoch 12 - iter 1/1 - loss 0.00000840 - time (sec): 104.00 - samples/sec: 118.54 - lr: 0.000250 - momentum: 0.000000
2024-10-30 12:54:30,681 ----------------------------------------------------------------------------------------------------
2024-10-30 12:54:30,698 EPOCH 12 done: loss 0.0000 - lr: 0.000250


100%|██████████| 1/1 [00:00<00:00,  1.39it/s]

2024-10-30 12:54:31,461 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:54:31,468  - 3 epochs without improvement
2024-10-30 12:54:31,473 ----------------------------------------------------------------------------------------------------





2024-10-30 12:56:15,058 epoch 13 - iter 1/1 - loss 0.00000760 - time (sec): 103.58 - samples/sec: 119.02 - lr: 0.000250 - momentum: 0.000000
2024-10-30 12:56:15,068 ----------------------------------------------------------------------------------------------------
2024-10-30 12:56:15,078 EPOCH 13 done: loss 0.0000 - lr: 0.000250


100%|██████████| 1/1 [00:00<00:00,  1.43it/s]

2024-10-30 12:56:15,822 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:56:15,828  - 4 epochs without improvement (above 'patience')-> annealing learning_rate to [0.000125]
2024-10-30 12:56:15,831 ----------------------------------------------------------------------------------------------------





2024-10-30 12:57:59,145 epoch 14 - iter 1/1 - loss 0.00000776 - time (sec): 103.31 - samples/sec: 119.33 - lr: 0.000125 - momentum: 0.000000
2024-10-30 12:57:59,154 ----------------------------------------------------------------------------------------------------
2024-10-30 12:57:59,163 EPOCH 14 done: loss 0.0000 - lr: 0.000125


100%|██████████| 1/1 [00:00<00:00,  2.30it/s]

2024-10-30 12:57:59,625 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:57:59,628  - 1 epochs without improvement
2024-10-30 12:57:59,631 ----------------------------------------------------------------------------------------------------





2024-10-30 12:59:43,283 epoch 15 - iter 1/1 - loss 0.00000745 - time (sec): 103.65 - samples/sec: 118.94 - lr: 0.000125 - momentum: 0.000000
2024-10-30 12:59:43,291 ----------------------------------------------------------------------------------------------------
2024-10-30 12:59:43,300 EPOCH 15 done: loss 0.0000 - lr: 0.000125


100%|██████████| 1/1 [00:00<00:00,  2.33it/s]

2024-10-30 12:59:43,759 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 12:59:43,762  - 2 epochs without improvement
2024-10-30 12:59:43,765 ----------------------------------------------------------------------------------------------------





2024-10-30 13:01:29,602 epoch 16 - iter 1/1 - loss 0.00000824 - time (sec): 105.84 - samples/sec: 116.48 - lr: 0.000125 - momentum: 0.000000
2024-10-30 13:01:29,611 ----------------------------------------------------------------------------------------------------
2024-10-30 13:01:29,618 EPOCH 16 done: loss 0.0000 - lr: 0.000125


100%|██████████| 1/1 [00:00<00:00,  2.21it/s]

2024-10-30 13:01:30,105 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 13:01:30,108  - 3 epochs without improvement
2024-10-30 13:01:30,113 ----------------------------------------------------------------------------------------------------





2024-10-30 13:03:14,400 epoch 17 - iter 1/1 - loss 0.00000808 - time (sec): 104.28 - samples/sec: 118.22 - lr: 0.000125 - momentum: 0.000000
2024-10-30 13:03:14,413 ----------------------------------------------------------------------------------------------------
2024-10-30 13:03:14,421 EPOCH 17 done: loss 0.0000 - lr: 0.000125


100%|██████████| 1/1 [00:00<00:00,  1.35it/s]

2024-10-30 13:03:15,204 DEV : loss 1.9422483887865383e-07 - f1-score (micro avg)  0.0
2024-10-30 13:03:15,208  - 4 epochs without improvement (above 'patience')-> annealing learning_rate to [6.25e-05]
2024-10-30 13:03:15,211 ----------------------------------------------------------------------------------------------------
2024-10-30 13:03:15,214 learning rate too small - quitting training!
2024-10-30 13:03:15,216 ----------------------------------------------------------------------------------------------------
2024-10-30 13:03:15,218 Saving model ...





2024-10-30 13:03:52,671 Done.
2024-10-30 13:03:52,673 ----------------------------------------------------------------------------------------------------
2024-10-30 13:03:52,676 Testing using last state of model ...


100%|██████████| 1/1 [00:04<00:00,  4.02s/it]

2024-10-30 13:03:56,729 
Results:
- F-score (micro) 0.0
- F-score (macro) 0.0
- Accuracy 0.0

By class:
              precision    recall  f1-score   support

       SKILL     0.0000    0.0000    0.0000     266.0
         JOB     0.0000    0.0000    0.0000      21.0
        WORK     0.0000    0.0000    0.0000      20.0
     COMPANY     0.0000    0.0000    0.0000      17.0
        NAME     0.0000    0.0000    0.0000       8.0
         UNI     0.0000    0.0000    0.0000       7.0
       PHONE     0.0000    0.0000    0.0000       6.0
         DEG     0.0000    0.0000    0.0000       6.0
         LOC     0.0000    0.0000    0.0000       4.0
       STUDY     0.0000    0.0000    0.0000       2.0

   micro avg     0.0000    0.0000    0.0000     357.0
   macro avg     0.0000    0.0000    0.0000     357.0
weighted avg     0.0000    0.0000    0.0000     357.0

2024-10-30 13:03:56,730 ----------------------------------------------------------------------------------------------------





{'test_score': 0.0}

In [32]:
# evaluate model
from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Load the trained model
model = SequenceTagger.load('flair_output/best-model.pt')

# Evaluate the model on the test set
result = model.evaluate(corpus.test, gold_label_type='ner', mini_batch_size=32)

# Print the results
# print("Evaluation Loss:", eval_loss)
print(result.detailed_results)  # This will print the precision, recall, and F1-score per entity type

FileNotFoundError: [Errno 2] No such file or directory: '/root/.flair/models/best-model.pt'

In [16]:
# make prediction
import flair
model = SequenceTagger.load('flair_output/final-model.pt')
resume_text = 'dot net developer robert smith phone 123 456 78 99 email infoqwikresumecom website wwwqwikresumecom linkedin linkedincomqwikresume address 1737 marshville road alabama objective dot net developer seven years experience design development webbased windows based applications using net technologies hands experience phases software development life cycle sdlc like requirement gathering analysis architectural detail design documentation development testing implementation using agile methodologies like scrum xp test driven environment skills programming languages net technologies c sql plsql web forms win forms web technologies scripting html dhtml css xmlangular jsbootstrap ajax toolkit jquery javascript telerikkendo ui database ms sql server operating systems windows 8 packages ms office amp visio ms frontpage iis version control tools git hub vss tfs methodologies agile oops scrum soa reporting crystal reports net ms sql server reporting services ssrs work experience dot net developer charter communications june 2015 present assisting developing architectural design functional specifications involving analysis designing coding implementation application developing dynamic web page implemented creatively implemented design requirements using client side scripting language technologies assisting agile software development management activities respond unpredictability iterative sprints designing developing web application migrating project mvc architecture using mvc 3 separate internal representations information involved developing telerik kendo ul controls building application enabling focus value generating development tasks involving development presentation logic gui aspnet pages dot net developer abc corp 2011 2015 coded updated maintained computer programs assisted developers prepare high level technical design documents performed code enhancements assist performance analysis provided technical guidance programming standards team trained users team coordinated client amp offshore tearn meet project objectives participated backlog grooming meeting work client remove barriers team worked individual developer also manage offshore team free resume template copyright qwikresumecom usage guidelines'
sentence = flair.data.Sentence(resume_text)

model.predict(sentence)

print(sentence.to_tagged_string())

2024-10-30 11:56:27,350 SequenceTagger predicts: Dictionary with 16 tags: O, PROPN, ADV, NOUN, NUM, X, VERB, ADJ, DET, ADP, AUX, CCONJ, PUNCT, PRON, <START>, <STOP>
Sentence[278]: "dot net developer robert smith phone 123 456 78 99 email infoqwikresumecom website wwwqwikresumecom linkedin linkedincomqwikresume address 1737 marshville road alabama objective dot net developer seven years experience design development webbased windows based applications using net technologies hands experience phases software development life cycle sdlc like requirement gathering analysis architectural detail design documentation development testing implementation using agile methodologies like scrum xp test driven environment skills programming languages net technologies c sql plsql web forms win forms web technologies scripting html dhtml css xmlangular jsbootstrap ajax toolkit jquery javascript telerikkendo ui database ms sql server operating systems windows 8 packages ms office amp visio ms frontpage i