<a href="https://colab.research.google.com/github/dimaknyaz/DataProcessing/blob/main/Spacy_NER.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# spaCy NER model

Source: https://www.machinelearningplus.com/nlp/training-custom-ner-model-in-spacy/

See also (spaCy's own training guide): https://spacy.io/usage/training/#ner

In [None]:
# Load spaCy model and check if it has NER
import spacy
nlp = spacy.load('en_core_web_sm')

# Make sure this includes 'ner'
nlp.pipe_names

['tagger', 'parser', 'ner']

In [None]:
# Perform default NER on a vacancy
vacancy_text = (
    'You know better than anyone how to bind other people to you, people for whom you can mean something. A great new assignment for jobseekers and the right match for their sourcing issue for clients. If you are also curious about market developments, would you like to hear more about projects within the industry and are you able to translate this information into opportunities for Brunel, then a role as a Sales Consultant is perfect for you! About this position As a Sales Consultant you always have something to do. Your main goal is to make the best match between clients and candidates, and that involves a lot. Your work does not stop at finding and connecting both parties. You are also responsible for expanding and maintaining your own network of candidates and clients. That means that you are in constant contact with both parties. Keeping an overview and keeping different balls in the air is no problem for you. Your focus area will be on specialists and organizations within the Northern Netherlands Industry. Together with your team you operate the fields: Maintenance & Asset Management, Industrial Automation, Supply Chain & Logistics and Innovation & Development. Your colleagues are all commercial, enterprising and ambitious. Together we aim for the best result and recognisability in the market.  About you In short, you have a mega palette of tasks and responsibilities. It is therefore important that you can keep an overview. And we ask more of you. So you have at least: A college degree from a commercial or technical-related study At least 1 year of experience in sales and preferably in job placement A lot of ambition to grow in your profession A representative attitude Strongly developed communication skills and a lot of persuasiveness And a valid driver\'s license B.  What we offer Give a little, take a little. You bring your expertise with you, and we provide a salary that matches it. '
    'You will also receive a laptop, smartphone and company car from us. And you have the chance to win interesting bonuses! We also arrange that you get a discount on the gym, cultural trips, insurance and your pension premium. And we don\'t take it overnight either with regard to your professional growth. You follow a tailor-made training program from the outset that is provided by renowned institutes. About us Don\'t be surprised if you hear a colleague in the office talking about how our location in Singapore handles certain matters, or if your supervisor makes a call in fluent German. We work from 44 countries around the world and we are proud of that! We would never have grown this big if we didn\'t just go for the best match between professional and company every day. And you are an indispensable link in this. So do you step into Brunel\'s world? Apply immediately!'
)

doc = nlp(vacancy_text)
for ent in doc.ents:
    print(ent.text, ent.label_)

Brunel PERSON
the Northern Netherlands Industry ORG
Maintenance & Asset Management ORG
Innovation & Development ORG
At least 1 year DATE
Strongly ORG
Singapore GPE
German NORP
44 CARDINAL
Brunel PERSON


In [None]:
# Get the NER pipeline
ner = nlp.get_pipe('ner')

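# Add the new label to the existing NER component
ner.add_label('SKILL')

# Continue training from the pretrained weights instead of reinitialising them
optimizer = nlp.resume_training()

# Keep the transition names so the reloaded model can be checked against them later
move_names = list(ner.move_names)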


# Disable pipeline components we don't need to change
pipe_exceptions = ['ner', 'trf_wordpiecer', 'trf_tok2vec']
unaffected_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
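
As a quick check of the filter above, the sketch below applies the same list comprehension to the pipeline names printed earlier in this notebook. The `demo_pipe_names` list is hard-coded here as an assumption so the snippet runs without loading a model, and the `demo_` names are chosen so the notebook's own variables are not overwritten.

```python
# Pipeline names as printed by nlp.pipe_names earlier (hard-coded assumption)
demo_pipe_names = ['tagger', 'parser', 'ner']

# Components that should stay enabled during training
demo_pipe_exceptions = ['ner', 'trf_wordpiecer', 'trf_tok2vec']

# Everything else is disabled while the NER weights are updated
demo_unaffected = [pipe for pipe in demo_pipe_names if pipe not in demo_pipe_exceptions]
print(demo_unaffected)
```

Names in the exception list that are not present in the pipeline (here the two `trf_` entries) have no effect on the result.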

In [None]:
# Function to train NER pipeline to recognize skills
import random
import time
from spacy.util import minibatch, compounding
from pathlib import Path


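k = 30                                # number of passes over the training data
sizes = compounding(1.0, 4.0, 1.001)  # batch-size schedule: starts at 1, grows by 0.1% per batch, capped at 4
drop = 0.35                           # dropout rate applied during updates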


def train_ner(train_data, iterations=k, sizes=sizes, drop=drop, optimizer=optimizer):
    with nlp.disable_pipes(*unaffected_pipes):

        for i in range(iterations):
            print('Iteration', i+1)
            start = time.time()

            # Shuffle examples before every iteration
            random.shuffle(train_data)
            losses = {}

            # Batch up the examples using spaCy's minibatch
            batches = minibatch(train_data, size=sizes)

            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(
                            texts,       # batch of texts
                            annotations, # batch of annotations
                            sgd=optimizer,
                            drop=drop,    # dropout - make it harder to memorise data
                            losses=losses,
                        )
            
            end = time.time()
            print('Time:', end-start)
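
To see what the batch-size schedule does, here is a minimal pure-Python sketch of a compounding generator and of how a minibatcher might consume it. It is an illustration written for this notebook, assuming each batch takes the integer part of the next schedule value; `compounding_sketch` and `batch_sizes_sketch` are hypothetical names, not spaCy functions.

```python
import itertools


def compounding_sketch(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop.

    Assumes start <= stop, as in compounding(1.0, 4.0, 1.001) above.
    """
    curr = float(start)
    while True:
        yield min(curr, stop)
        curr *= compound


def batch_sizes_sketch(n_items, schedule):
    """Return the batch sizes used to consume n_items, one schedule value per batch."""
    used = []
    remaining = n_items
    while remaining > 0:
        size = min(int(next(schedule)), remaining)
        used.append(size)
        remaining -= size
    return used


schedule = compounding_sketch(1.0, 4.0, 1.001)
first_values = list(itertools.islice(schedule, 1000))

# Index of the first schedule value whose integer part reaches 2
first_size_two = next(i for i, v in enumerate(first_values) if int(v) >= 2)
print('Batch size stays at 1 for the first', first_size_two, 'batches')

# Batch sizes for one pass over 10 items with a fresh schedule
print(batch_sizes_sketch(10, compounding_sketch(1.0, 4.0, 1.001)))
```

Because the `sizes` generator is created once and passed as a default argument, it keeps advancing across iterations rather than restarting at 1 on every pass. That would be consistent with the per-iteration times reported later in the notebook falling as the batches grow.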

In [None]:
# From here we will start training the model with our data
# The training file can be found at https://github.com/dimaknyaz/DataProcessing/blob/main/DataFiles/training_set_1.txt
import pickle
with open("training_set_1.txt", "rb") as fp:
    data = pickle.load(fp)

data[0:10]

[(' if you are also curious about market developments, would you like to hear more about projects within the industry and are you able to translate this information into opportunities for brunel, then a role as a sales consultant is perfect for you',
  {'entities': [(210, 226, 'SKILL')]}),
 (' together with your team you operate the fields: maintenance & asset management, industrial automation, supply chain & logistics and innovation & development',
  {'entities': [(63, 79, 'SKILL')]}),
 (' so you have at least: a college degree from a commercial or technical related study at least 1 year of experience in sales and preferably in job placement a lot of ambition to grow in your profession a representative attitude strongly developed communication skills and a lot of persuasiveness and a valid driver license b',
  {'entities': [(236, 259, 'SKILL')]}),
 (' so you have at least: a college degree from a commercial or technical related study at least 1 year of experience in sales and preferab
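
Each training example pairs a text with character offsets (start inclusive, end exclusive) and a label. The sketch below copies one example from the output above and slices out the annotated span as a sanity check; `entity_spans` is a hypothetical helper, not part of the original notebook.

```python
# One example copied from the data[0:10] output above
demo_example = (
    ' together with your team you operate the fields: maintenance & asset management, '
    'industrial automation, supply chain & logistics and innovation & development',
    {'entities': [(63, 79, 'SKILL')]},
)


def entity_spans(text, annotations):
    """Return (span_text, label) for every annotated entity, checking the offsets."""
    spans = []
    for start, end, label in annotations['entities']:
        # Offsets are character positions: start inclusive, end exclusive
        if not 0 <= start < end <= len(text):
            raise ValueError(f'Offsets ({start}, {end}) fall outside the text')
        spans.append((text[start:end], label))
    return spans


print(entity_spans(*demo_example))
```

Running a check like this over the whole list before training is one way to catch offsets that drifted during preprocessing.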

In [None]:
len(data)

7183

In [10]:
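import math

# Index boundaries at 5% and 95% of the data
x = math.floor(len(data) * .05)
y = math.floor(len(data) * .95)

# Train on the first 5% of the examples and test on the last 5%
train_data = data[0:x]
test_data = data[y:]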
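
For the 7183 examples reported by `len(data)`, the slicing above works out as follows. The sketch uses a stand-in list of integers instead of the real data (an assumption so that it runs on its own) and `demo_` names to avoid overwriting the notebook's variables.

```python
import math

# Stand-in for the real data: only the length matters here
demo_data = list(range(7183))

demo_x = math.floor(len(demo_data) * .05)
demo_y = math.floor(len(demo_data) * .95)

demo_train = demo_data[0:demo_x]
demo_test = demo_data[demo_y:]
demo_unused = len(demo_data) - len(demo_train) - len(demo_test)

print('train:', len(demo_train), 'test:', len(demo_test), 'unused:', demo_unused)
```

With these boundaries about 90% of the examples are used for neither training nor testing, so the training slice could be widened if more data is wanted.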

In [8]:
# Finally we can train the model
train_ner(train_data)

Iteration 0
Time: 17.99898362159729
Iteration 1
Time: 16.74249577522278
Iteration 2
Time: 8.21052360534668
Iteration 3
Time: 8.1687753200531
Iteration 4
Time: 6.083845615386963
Iteration 5
Time: 5.700453758239746
Iteration 6
Time: 5.141088962554932
Iteration 7
Time: 4.528369426727295
Iteration 8
Time: 4.4867024421691895
Iteration 9
Time: 4.4661595821380615
Iteration 10
Time: 4.50952410697937
Iteration 11
Time: 4.595851421356201
Iteration 12
Time: 4.6535608768463135
Iteration 13
Time: 4.679269075393677
Iteration 14
Time: 4.6867969036102295
Iteration 15
Time: 4.759816646575928
Iteration 16
Time: 4.674414157867432
Iteration 17
Time: 4.784733295440674
Iteration 18
Time: 5.500452041625977
Iteration 19
Time: 5.186835289001465
Iteration 20
Time: 5.072646856307983
Iteration 21
Time: 5.093194484710693
Iteration 22
Time: 5.142500877380371
Iteration 23
Time: 5.092230796813965
Iteration 24
Time: 5.181259870529175
Iteration 25
Time: 5.189377307891846
Iteration 26
Time: 5.223664045333862
Iteration 2

In [11]:
# Test if the NER model works
for line in test_data:
    doc = nlp(line[0])

    print(line[0])
    for ent in doc.ents:
        print(' > ', ent)

 you will work closely with the other healthcare technology project leader, information manager, the healthcare managers and team managers, the colleagues of espria
 >  healthcare technology
   initiating and leading projects with the aim of implementing one or more healthcare technology applications
 >  healthcare technology
 with our flexible, modular prefab construction method, we are uniquely able to translate wishes into practical and innovative solutions that we can realize
 >  innovative solutions
 you work closely with the project manager and you are a project team together with the production manager
 >  project manager
 commercial attitude and communication and social skills
 >  commercial attitude
 with our flexible, modular prefab construction method, we are uniquely able to translate wishes into practical and innovative solutions that we can realize
 >  innovative solutions
 job requirements mbo + / hbo working & thinking level excellent command of the dutch language in wo
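
Printing the predictions gives a qualitative impression only. A rough sketch of a quantitative check is exact-match precision, recall and F1 over `(start, end, label)` spans. In the notebook the predicted spans could be built from `ent.start_char`, `ent.end_char` and `ent.label_` for each `ent` in `doc.ents`; the spans below are made-up values for illustration, and `span_prf` is a hypothetical helper.

```python
def span_prf(gold_spans, predicted_spans):
    """Exact-match precision, recall and F1 over sets of (start, end, label) spans."""
    gold = set(gold_spans)
    predicted = set(predicted_spans)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical spans for one sentence: two gold skills, one found exactly, one spurious prediction
demo_gold = [(63, 79, 'SKILL'), (81, 102, 'SKILL')]
demo_predicted = [(63, 79, 'SKILL'), (0, 9, 'SKILL')]
print(span_prf(demo_gold, demo_predicted))
```

Exact matching is strict: a prediction that overlaps a gold span but differs by one character counts as both a false positive and a false negative.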

In [12]:
# Test NER on same vacancy as before

doc = nlp(vacancy_text)
for ent in doc.ents:
    print(ent.text, ent.label_)

Sales Consultant SKILL


In [24]:
# Split vacancy into sentences
# ...
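
The cell above is left unfinished in the notebook. Since the training examples are lowercased, sentence-like fragments, one possible approach is to split a vacancy on sentence-ending punctuation and lowercase each piece before passing it to the model. The regex splitter below is an assumption for illustration, not the author's method, and `split_sentences_sketch` is a hypothetical helper.

```python
import re


def split_sentences_sketch(text, lowercase=True):
    """Split text after '.', '!' or '?' followed by whitespace; optionally lowercase."""
    pieces = re.split(r'(?<=[.!?])\s+', text.strip())
    sentences = [piece for piece in pieces if piece]
    if lowercase:
        sentences = [sentence.lower() for sentence in sentences]
    return sentences


# Short sample built from sentences that appear in the first vacancy
demo_sample = (
    "We work from 44 countries around the world and we are proud of that! "
    "So do you step into Brunel's world? Apply immediately!"
)
for sentence in split_sentences_sketch(demo_sample):
    print(sentence)
```

A rule this simple will also split after abbreviations such as "e.g.", so a proper sentence segmenter may be preferable for real vacancies.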

In [23]:
# Test on other vacancy
vacancy_text = """project leader healthcare technology care group meander we are looking for an experienced and creative project leader healthcare technology. are you the project leader with knowledge of the implementation of  healthcare technology solutions? are you enthusiastic about contributing to the development of the digital skills of employees in our organization? then we are looking for you! project leader healthcare technology at zorggroep meander at zorggroep meander, more than 1,500 employees offer care, treatment and rehabilitation in the east groningen region. zorggroep meander is active at various locations. we have four residential care centers, four residential service centers, a rehabilitation center, district nursing and day care / day treatment in east groningen. we are there for the vulnerable elderly in the home situation and, when that is no longer possible, in an environment that is as at home as possible. the best care is provided by enthusiastic employees. together we provide a working environment where everyone enjoys working. so that our people can be proud and energetic. the use of  technology and digitization can make an important contribution to various social developments, such as living longer and safely at home, integrated care and labor market shortages. we use this to guarantee good services to clients in the future and to continue to improve the working conditions of our employees. zorggroep meander sees the importance of the correct application of  technology both in the care centers and at the home of the clients. zorggroep meander therefore wants to further shape its digital strategy. a development agenda for  technology is part of this. our vision is that the success of the use of  technology and digitization stands or falls with the support and use of it by healthcare workers and clients. 
that is why we are committed to the development of 'digital skills' of healthcare employees and we would like to introduce both clients and healthcare employees to the benefits of healthcare technology in an accessible way. our aim is to raise awareness of the importance of healthcare technology among healthcare workers and clients, while at the same time preparing our organization in this area for the future. you work from the policy and development department. you will work closely with the other healthcare technology project leader, information manager, the healthcare managers and team managers, the colleagues of espria. your location is veendam, but you work throughout our work area. who are you? the project manager profession holds no secrets for you. you are able to achieve the agreed results in the set time and within budget. you are able to supervise processes. you are a real bridge builder, customer oriented, representative and have good social skills. as a project manager you will focus on:   implementing the vision on healthcare technology as part of the organizational strategy.   initiating and leading projects with the aim of implementing one or more healthcare technology applications. examples are in4cure, video calling / image care and night sensors.   contributing to the implementation of the plan “digital skills in healthcare” with the aim of increasing the digital skills of our employees and involving them in the added value of healthcare technology.   ensuring the transfer and embedding of new applications in the existing ict / iv management organization.   in collaboration with the information manager and your colleague project leader care technology, give these projects and this vision a place in the total digital program and digital strategy of zorggroep meander.   participant in regional projects and collaborations focused on healthcare technology. 
required competencies: communicative, enthusiastic, courageous, result oriented with an eye for details, analytical, proactive, organization sensitive, dares to address people about behavior and creativity. job requirements you have at least an hbo education, supplemented with a qualification in project management. you have several years of work experience as a project manager and have a track record in managing complex projects. you are a good change manager. you have a clear vision of developments in health care technology / digitization in health care and welfare, especially in the vvt \u200b\u200bsector. you have excellent oral and written expression skills and can communicate at all levels in the organization. what do you get from us? if you come to work with us, we will offer you a fixed term employment contract for the duration of one year with a salary in accordance with the vvt \u200b\u200bfwg 60 collective labor agreement. you will also receive a year end bonus and accrue pension with the zorg & welzijn pension fund. we think it is important to pay attention to your health and job satisfaction. that is why you can use our extensive package of benefits, for example by purchasing and bicycle and laptop via gross / net arrangement. we also offer you the opportunity to develop by, for example, following training courses related to your area  of interest. the opportunity to work in an organization on the move, where you can add value with your entrepreneurship and innovations in the field of digitization and technology in healthcare. interested? do you recognize yourself in our ideal candidate? then we would be happy to talk to you! we ask you to apply via the button. want to know more first? please contact bart meems  on telephone number  137 12 457. obtaining references can be part of the procedure. the submission of a recent vog is a condition for an appointment. the closing date of the vacancy is december 31, 2020. 
the first round of interviews is likely to take place on january 13. in connection with measures concerning the coronavirus, any job interviews will probably take place via teams . the vacancy is presented both internally and externally, with internal candidates having priority if they are equally suitable. contract duration 12 months part time hours: 24 per week type of employment: part time, fixed term work schedule: mon fri education: hbo  """

doc = nlp(vacancy_text)
for ent in doc.ents:
    print(ent.text, ent.label_)

digital skills SKILL
customer oriented SKILL


In [None]:
# Save the model


# Output directory
from pathlib import Path
output_dir = Path('/content/')

# Saving the model to the output directory
if not output_dir.exists():
    output_dir.mkdir()
nlp.meta['name'] = 'ner_skills'  # rename model
nlp.to_disk(output_dir)
print("Saved model to", output_dir)

Saved model to /content
Loading from /content


In [None]:
# Download saved files
!zip -r /content/model.zip /content/

from google.colab import files
files.download('/content/model.zip')

  adding: content/ (stored 0%)
  adding: content/.config/ (stored 0%)
  adding: content/.config/gce (stored 0%)
  adding: content/.config/logs/ (stored 0%)
  adding: content/.config/logs/2021.01.20/ (stored 0%)
  adding: content/.config/logs/2021.01.20/17.27.27.315162.log (deflated 54%)
  adding: content/.config/logs/2021.01.20/17.27.07.888058.log (deflated 54%)
  adding: content/.config/logs/2021.01.20/17.27.43.241792.log (deflated 53%)
  adding: content/.config/logs/2021.01.20/17.26.49.689206.log (deflated 91%)
  adding: content/.config/logs/2021.01.20/17.27.22.039013.log (deflated 87%)
  adding: content/.config/logs/2021.01.20/17.27.42.676144.log (deflated 55%)
  adding: content/.config/.last_survey_prompt.yaml (stored 0%)
  adding: content/.config/.last_update_check.json (deflated 22%)
  adding: content/.config/.last_opt_in_prompt.yaml (stored 0%)
  adding: content/.config/active_config (stored 0%)
  adding: content/.config/configurations/ (stored 0%)
  adding: content/.config/conf
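
The listing above shows that zipping `/content/` also picks up Colab's `.config` directory. One way to avoid that, assuming the model is written to a dedicated subdirectory, is to archive only that directory with `shutil.make_archive`. The temporary directory and the stand-in `meta.json` file below are placeholders so the sketch runs anywhere; in the notebook the directory would be wherever `nlp.to_disk` wrote the model.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def archive_model_dir(model_dir, archive_base):
    """Zip only the contents of model_dir; returns the path of the created .zip file."""
    archive_path = shutil.make_archive(str(archive_base), 'zip', root_dir=str(model_dir))
    return Path(archive_path)


with tempfile.TemporaryDirectory() as workspace:
    workspace = Path(workspace)

    # Placeholder model directory with a stand-in file (a real model has more files)
    demo_model_dir = workspace / 'ner_skills'
    demo_model_dir.mkdir()
    (demo_model_dir / 'meta.json').write_text('{"name": "ner_skills"}')

    # Keep the archive outside the model directory so it does not include itself
    demo_zip = archive_model_dir(demo_model_dir, workspace / 'model')

    with zipfile.ZipFile(demo_zip) as archive:
        archived_names = archive.namelist()

print(archived_names)
```

Writing the archive next to the model directory rather than inside it keeps the zip from containing an earlier copy of itself on repeated runs.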


In [None]:
# Loading the model
!unzip -q /content/model.zip

print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
assert nlp2.get_pipe("ner").move_names == move_names