# TaskMatch Demo

A short demonstration of TaskMatch, a tool for efficiently and accurately extracting task information from job ad texts.

# Simple Startup

In [2]:
from pprint import pprint
import nltk
from tqdm.auto import tqdm
import time

import sys
sys.path.insert(0, "/path/to/TaskMatch/") # TODO: edit!

from TaskMatch import TaskMatch

In [3]:
# load TaskMatch - performs setup steps
TM = TaskMatch()

INIT
Preparing embeddings...


Batches:   0%|          | 0/295 [00:00<?, ?it/s]

Setting up pipeline...
Finished.


# Extracting Task IDs from Job Ad Texts

Let's take a sample job ad:

In [4]:
job = """
Junior Full Stack Software Developer

Description
Develops software solutions by studying information needs; conferring with users; studying systems flow, data usage, and work processes; investigating problem areas; following the software development lifecycle.
Determines operational feasibility by evaluating analysis, problem definition, requirements, solution development, and proposed solutions.
Documents and demonstrates solutions by developing documentation, flowcharts, layouts, diagrams, charts, code comments and clear code.
Supports and develops software developers by providing advice, coaching and educational opportunities.
Other duties as required.

About Us
We are a small team of dedicated professionals that work to support the business objectives of our company as well as developing innovative software solutions for other companies in our industry. If you join our team, you will have the opportunity to work with a wide variety of technologies in a fast-paced development environment that caters to innovation and efficiency as opposed to rigid processes and ingrained mentalities. If you like to code, can follow other people’s code, can work in a team, and for a team, we say come and talk to us (mention ‘verko’ in your cover letter). We believe our company is a nice place to work and grow your skills, where working smart is appreciated as much as working hard.
Requirements:

Experience Requirements
Bachelor’s degree in computer science, MIS, other related field or relevant experience.
Experience with the Microsoft .NET technology stack (C#, MVC, Web API, Web Forms, etc.)
Experience with JavaScript frameworks (ReactJS, Node.js preferred).
Experience with relational databases (MS SQL preferred)
Experience with code versioning tools, such as Git.
Experience with modern software design patterns, debugging and refactoring.
Familiarity with continuous integration and automated build products like Team City and Azure DevOps
Geographical Requirements
Applicants from Glastonbury/Hartford CT and the vicinity will be favored.
Applicants from outside of New England states will not be considered.
"""
print(job)



To extract task IDs from the job, simply call:

In [5]:
tasks = TM.get_tasks(job)
pprint(tasks, width=120)

[('16363', 'Identify operational requirements for new systems to inform selection of technological solutions.'),
 ('16987', 'Prepare documentation or presentations, including charts, photos, or graphs.'),
 ('9583', 'Assign duties to other staff and give instructions regarding work methods and routines.')]
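Each match in the output above is a `(task ID, task statement)` tuple, with the ID given as a string. As a minimal sketch of working with that shape, the snippet below copies the three tuples from the output and splits them into a list of IDs and a lookup dictionary. The names `task_ids` and `task_lookup` are illustrative and not part of TaskMatch.

```python
# Matches copied from the output above; in the notebook this is `tasks`.
tasks = [
    ("16363", "Identify operational requirements for new systems to inform selection of technological solutions."),
    ("16987", "Prepare documentation or presentations, including charts, photos, or graphs."),
    ("9583", "Assign duties to other staff and give instructions regarding work methods and routines."),
]

# Keep only the IDs, preserving the order in which they were returned.
task_ids = [task_id for task_id, _ in tasks]

# Map each ID to its task statement for quick lookups.
task_lookup = dict(tasks)

print(task_ids)
print(task_lookup["16987"])
```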


## Great! But let's look under the hood

Before matching to task IDs, TaskMatch first identifies candidate sentences. To do this, a classifier model identifies which segments of the job ad text are potentially task statements. Going back to the example:

In [6]:
candidates = ["({}) ".format(i+1)+x.strip() for i, x in enumerate(TM.get_candidates(job))]
pprint(candidates, width=150)

['(1) Junior Full Stack Software Developer\n'
 '\n'
 'Description\n'
 'Develops software solutions by studying information needs; conferring with users; studying systems flow, data usage, and work processes; '
 'investigating problem areas; following the software development lifecycle.',
 '(2) Determines operational feasibility by evaluating analysis, problem definition, requirements, solution development, and proposed solutions.',
 '(3) Documents and demonstrates solutions by developing documentation, flowcharts, layouts, diagrams, charts, code comments and clear code.',
 '(4) Supports and develops software developers by providing advice, coaching and educational opportunities.',
 '(5) Other duties as required.',
 '(6) Experience with modern software design patterns, debugging and refactoring.']


In [7]:
len(nltk.sent_tokenize(job))

16

So we see that six candidates are identified out of the 16 "sentences" in the job ad. From these six candidates, TaskMatch narrows the result down to the three matched tasks shown earlier.
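To make the two-stage idea concrete, here is a toy sketch of "filter candidates, then match to reference statements". It is not TaskMatch's implementation: a hard-coded verb list stands in for the classifier model, and word-overlap (Jaccard) similarity stands in for whatever matching TaskMatch performs internally. The names `ACTION_VERBS`, `is_candidate`, `best_match`, and `toy_get_tasks` are all hypothetical, as is the threshold value.

```python
import re

# Reference statements copied from matches shown earlier in this demo.
REFERENCE_TASKS = {
    "16987": "Prepare documentation or presentations, including charts, photos, or graphs.",
    "9583": "Assign duties to other staff and give instructions regarding work methods and routines.",
}

# Hypothetical stand-in for the candidate classifier: a tiny list of opening verbs.
ACTION_VERBS = {"develops", "determines", "documents", "supports"}

def tokens(text):
    """Return the set of lowercase words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def is_candidate(segment):
    """Stage 1 (toy): keep segments whose first word is a known action verb."""
    words = segment.split()
    return bool(words) and words[0].lower() in ACTION_VERBS

def best_match(segment, threshold=0.05):
    """Stage 2 (toy): return the ID of the most similar reference statement, or None."""
    seg = tokens(segment)
    scored = []
    for task_id, statement in REFERENCE_TASKS.items():
        ref = tokens(statement)
        scored.append((len(seg & ref) / len(seg | ref), task_id))
    score, task_id = max(scored)
    return task_id if score >= threshold else None

def toy_get_tasks(segments):
    """Run both toy stages and return (task ID, statement) tuples."""
    matches = []
    for segment in segments:
        if not is_candidate(segment):
            continue
        task_id = best_match(segment)
        if task_id is not None:
            matches.append((task_id, REFERENCE_TASKS[task_id]))
    return matches

segments = [
    "Documents and demonstrates solutions by developing documentation, flowcharts, layouts, diagrams, charts, code comments and clear code.",
    "Experience with relational databases (MS SQL preferred)",
    "Applicants from outside of New England states will not be considered.",
]
toy_matches = toy_get_tasks(segments)
print(toy_matches)
```

Only the first segment passes the toy filter, and it overlaps most with the documentation statement. A real classifier and a real similarity model would of course behave very differently on edge cases.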

## Batch Processing

What if we want to process many job ads at once? Use our batch processing function.

In [8]:
# read in the first 100 job ads, one per line
# (note: `tasks` holds the raw ad texts here, not task matches)
with open("/path/to/data_sample", 'r') as f:
    tasks = [x.strip() for x in f.readlines()[:100]]

Let's look at a couple of samples.

In [9]:
tasks[1]

"Company: US0064 Sysco St. Louis, LLC Zip Code: 63301 Minimum Level of Education: High School or Equivalent Minimum Years of Experience: 0-1 Years Employment Type: Full Time Travel Percentage: 0 JOB SUMMARYThis is a warehouse position responsible for operating an electric pallet jack, or forklift, to select the correct products, labeling product using Sysco Order Selection (SOS) label technology, palletizing product to build customer orders and delivering the product to the dock safely and efficiently. This position requires working 6:00 p.m. until end-of-shift with all products accurately selected and loaded. Overtime hours and working weekends and holidays are required to successfully fill customers' orders. The job requires working in areas with temperature and humidity variations based on local weather conditions, and on selecting the environment (dry, cooler, freezer).RESPONSIBILITIES + Hand select orders within various warehouse environments of fluctuating temperatures, including

In [10]:
tasks[99]

'To be fully engaged in providing No Harm / Quality, Customer Experience, and Stewardship by working as a team member and maintaining a positive attitude. Support the Core Laboratory Supervisor and Technologists (no patient testing or running of Quality Control). Manage Core Laboratory supply inventory. PRIMARY ACCOUNTABILITIES - Complies with policies/processes/procedures to provide quality services with < 2 errors per year - Manages daily workload in a productive manner. - Demonstrates fiscal responsibility as measured by volume adjusted budget as measured by a variance of no more than 5% over volume adjusted budget. QUALIFICATIONS · High school diploma or equivalent. · Must be willing and able to work with bio-hazardous/toxic materials following OSHA guidelines. · Excellent human relations skills. · Excellent organizational skills. · Excellent communication skills. · Strong computer skills. · Demonstrate high school math skills. · Must be able to sit for extended periods (2-3 hours)

Let's first see how long it would take to process these texts sequentially.

In [None]:
start = time.time()
for t in tqdm(tasks):
    res = TM.get_tasks(t)
end = time.time() - start
print(end)

And now using our batch processing...

In [12]:
start = time.time()
res = TM.get_tasks_batch(tasks)
end = time.time() - start
print(end)

1.4743261337280273


Much faster! The batch call processes all 100 ads in about 1.5 seconds.
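The two timing cells repeat the same start/stop pattern, which can be wrapped in a small helper. The sketch below uses hypothetical stand-in functions (`process_one`, `process_batch`) so that it runs without TaskMatch installed; in the notebook, one would pass `TM.get_tasks` and `TM.get_tasks_batch` instead. It uses `time.perf_counter`, which is intended for measuring short durations, rather than `time.time`.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical stand-ins for a per-item call and a batch call.
def process_one(text):
    return text.upper()

def process_batch(texts):
    return [text.upper() for text in texts]

texts = ["first ad", "second ad", "third ad"]

sequential, t_seq = timed(lambda: [process_one(t) for t in texts])
batched, t_batch = timed(process_batch, texts)

# A speed comparison is only meaningful if both paths produce the same output.
assert sequential == batched
print(f"sequential: {t_seq:.6f}s, batch: {t_batch:.6f}s")
```

Whether the sequential and batch paths of TaskMatch return identical matches is an assumption worth checking on real data before relying on the speedup.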

Finally, the matched tasks for our two examples:

In [13]:
pprint(res[1], width=150)

[('12019', 'Transport metal ingots to storage areas, using forklifts.'),
 ('11083', 'Supply, operate, or maintain personal protective equipment.'),
 ('13176', 'Perform equipment safety checks prior to departure.'),
 ('9667', 'Attend company meetings to exchange product information and coordinate work activities with other departments.'),
 ('9583', 'Assign duties to other staff and give instructions regarding work methods and routines.'),
 ('21881', 'Write and revise safety regulations and codes.')]


In [14]:
pprint(res[99], width=150)

[('20982', 'Place orders for laboratory equipment and supplies.')]
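Judging by `res[1]` and `res[99]`, the batch result appears to hold one list of matches per ad, so a natural next step is counting how often each task ID occurs across ads. The sketch below does this with `collections.Counter` on three result lists copied from the outputs in this demo; `res_sample` and `task_counts` are illustrative names.

```python
from collections import Counter

# Result lists copied from the outputs shown in this demo (one list per ad).
res_sample = [
    [("16363", "Identify operational requirements for new systems to inform selection of technological solutions."),
     ("16987", "Prepare documentation or presentations, including charts, photos, or graphs."),
     ("9583", "Assign duties to other staff and give instructions regarding work methods and routines.")],
    [("12019", "Transport metal ingots to storage areas, using forklifts."),
     ("11083", "Supply, operate, or maintain personal protective equipment."),
     ("13176", "Perform equipment safety checks prior to departure."),
     ("9667", "Attend company meetings to exchange product information and coordinate work activities with other departments."),
     ("9583", "Assign duties to other staff and give instructions regarding work methods and routines."),
     ("21881", "Write and revise safety regulations and codes.")],
    [("20982", "Place orders for laboratory equipment and supplies.")],
]

# Count how many ads mention each task ID.
task_counts = Counter(task_id for ad in res_sample for task_id, _ in ad)

print(task_counts.most_common(3))
```

In the notebook, the same expression can be applied to `res` directly to summarize all 100 ads.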


# That's all!