## Task 1

### Split sentences in the document
First, import the spaCy module and load the model.
To split the document into sentences, create a custom pipeline component and add it to nlp.

In [1]:
import spacy
import pickle
activated = spacy.prefer_gpu()
nlp = spacy.load('en_core_web_lg')

In [132]:
# Custom pipe that separates the document into sentences:
# the token following a newline token starts a new sentence
def set_custom_boundaries(doc):
    for token in doc[:-1]:
        if token.text in ['\n', '\n\n']:
            doc[token.i+1].is_sent_start = True
    return doc
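# spaCy v2 style: the function itself is passed to add_pipe.
# It runs before the parser so the parser respects these boundaries.
nlp.add_pipe(set_custom_boundaries, before="parser")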
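
The boundary rule itself can be tried without spaCy. The sketch below is only an illustration on a plain list of token strings: the helper names `mark_sentence_starts` and `group_sentences` are hypothetical, and unlike spaCy it drops the newline tokens instead of keeping them at the end of the preceding sentence.

```python
def mark_sentence_starts(tokens):
    """Return one flag per token; True means the token opens a new sentence."""
    starts = [False] * len(tokens)
    if tokens:
        starts[0] = True  # the first token always opens a sentence
    # Same rule as the custom pipe: the token after a newline token starts a sentence.
    for i, tok in enumerate(tokens[:-1]):
        if tok in ("\n", "\n\n"):
            starts[i + 1] = True
    return starts


def group_sentences(tokens):
    """Group tokens into sentences using the flags, dropping whitespace tokens."""
    sentences = []
    for tok, is_start in zip(tokens, mark_sentence_starts(tokens)):
        if is_start:
            sentences.append([])
        if tok.strip():  # skip the newline tokens themselves
            sentences[-1].append(tok)
    return [" ".join(s) for s in sentences if s]


# Toy input, assumed to be already tokenized
tokens = ["Roles", "\n", "Role", "1", "\n\n", "Work", "with", "marketing"]
sentences = group_sentences(tokens)
print(sentences)
```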

In [133]:
resume = nlp(open("Sample/Roles.txt", mode = 'r').read())
for i, sent in enumerate(resume.sents):
    print(i, sent.text)

0 Roles

1 Role 1

2 Understand customer needs and develop high-quality presentations, proposals and software demonstrations that speak to these needs – building a consensus for change on a multi-stakeholder basis

3 Work with the Enterprise sales team to build, manage and maintain customer relationships

4 Provide pre-sales technical assistance to members of the sales team to ensure proper technical and business fit

5 Work with marketing on new product development

6 Present professional and personalised demos

7 Need to ensure prospective customers are aligned technically with the sales process

8 Need to understand and demonstrate the solution and how to integrate to other business systems

9 Role 2

10 Must Have:


11 Experience shaping the BI strategy from C-Level to Technical developers.

12 Extensive delivery of platform within a Business Intelligence and Analytics function.

13 Communication with stakeholders on all levels.

14 Understanding and experience within KPIs, identif

### Split phrases from the sentence

Split each sentence into phrases based on the spaCy dependency tree. The algorithm is as follows:

1. Find each token whose part-of-speech tag is "VERB".
2. Find the verb phrase by following the syntactic dependency relations.
3. Separate the conjunctions, so that each conjunct gets its own phrase.
4. Normalize each phrase (lemmatize its first token).
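
Steps 1 to 3 can be sketched on a hand-built toy tree, without spaCy. This is a simplified illustration and not the notebook's implementation: the `Node` class, the helper names, and the restriction to `dobj` and `conj` edges are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a dependency-tree token."""
    text: str                      # head word together with its modifiers
    dep: str = "ROOT"              # relation to the parent node
    children: List["Node"] = field(default_factory=list)


def conj_chain(node):
    """Return the node plus everything linked to it through 'conj' edges."""
    chain = [node]
    for child in node.children:
        if child.dep == "conj":
            chain.extend(conj_chain(child))
    return chain


def verb_phrases(verb):
    """Pair every coordinated verb with each of its coordinated objects."""
    phrases = []
    for v in conj_chain(verb):
        for obj in (c for c in v.children if c.dep == "dobj"):
            for o in conj_chain(obj):
                phrases.append(f"{v.text} {o.text}")
    return phrases


# Toy tree for: "understand customer needs and develop high-quality
# presentations, proposals and software demonstrations"
demos = Node("software demonstrations", "conj")
proposals = Node("proposals", "conj", [demos])
presentations = Node("high-quality presentations", "dobj", [proposals])
develop = Node("develop", "conj", [presentations])
needs = Node("customer needs", "dobj")
understand = Node("understand", "ROOT", [needs, develop])

result = verb_phrases(understand)
print("\n".join(result))
```

Each verb in the coordination is paired with each object in its own coordination, which is why one input sentence can yield several short phrases.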

An example sentence and its dependency tree are shown below. After that, the file is read, the document is split into phrases, and the result is saved as a pickle file.

In [2]:
# Split VP phrases from the sentence
def split_VP_phrases(doc):
    print("Base ===== ", doc.text)
    res = []
    root = doc[:].root
    
    def check_conj(rt):
        for token in rt.rights:
            if token.dep_ not in ['cc', 'conj', 'punct']:
                return False
            if token.dep_ == 'conj':
                break
        return True
    
    def split(rt):   
        if rt.pos_ == 'PUNCT' or rt.pos_ == 'CCONJ' or rt.pos_ == 'SYM':
            return []
        ret = ['']
        if not (rt.pos_ == 'VERB' and rt.dep_ not in ['xcomp', 'ccomp']):
            for token in rt.lefts:
                left_ret = split(token)
                if len(left_ret) == 0:
                    continue
                ret_ = [x + y + " " for x in ret for y in left_ret]
                ret = ret_
        
        ret_ = [x + rt.text for x in ret]
        ret = []
     
        flag = False
        for token in rt.rights:
            right_ret = split(token)
           
            if len(right_ret) == 0:
                continue
            
            if token.dep_ == 'conj' and rt.pos_ == token.pos_ and check_conj(rt):
               
                if not flag:
                    for sent in right_ret:
                      
                        right = doc[token.left_edge.i : token.i+1].text
                        if right in sent:
                            ret.extend([sent.replace(right, x) for x in ret_])
                    flag = True
                if token.pos_ != 'VERB':
                    ret.extend(right_ret)
                ret_ = ret
               
            else:
                ret = [x + " " + y for x in ret_ for y in right_ret]
                ret_ = ret
        ret = ret_
      
        if rt.pos_ == 'VERB' and rt.dep_ not in ['xcomp', 'ccomp']:
            res.extend(ret)
            if rt.dep_ == 'conj' and rt.head.pos_ == 'VERB' and check_conj(rt.head):
                return ret
            else:
                return []
        else:
            if rt.i == root.i:
                res.extend(ret)
            return ret
        
    split(root)
    
    # Normalize the sentence
    res_ = []
    for sent in res:
        sent = nlp(sent)
        if len(sent) < 2:
            continue    
        # Lemmatize only the first token; count=1 leaves later repeats of the same word untouched
        res_.append(sent.text.replace(sent[0].text, sent[0].lemma_, 1).strip())
    print('\n'.join(res_))
    return res_
      
ex = nlp("Understand customer needs and develop high-quality presentations, proposals and software demonstrations that speak to these needs - building a consensus for change on a multi-stakeholder basis")
from spacy import displacy
displacy.render(ex, style = "dep", jupyter = True)
ret = split_VP_phrases(ex)

Base =====  Understand customer needs and develop high-quality presentations, proposals and software demonstrations that speak to these needs - building a consensus for change on a multi-stakeholder basis
speak to a consensus for change on a multi stakeholder basis
develop high quality presentations
develop proposals
develop software demonstrations
understand customer needs


In [244]:
phrases = []
role = nlp(open("Sample/Roles.txt", mode = 'r').read())
for sent in role.sents:
    phrases.extend(split_VP_phrases(nlp(sent.text.strip())))

# Save the phrases
with open('role_phrases.pickle', 'wb') as f:
    pickle.dump(phrases, f)

Base =====  Roles

Base =====  Role 1
role 1
Base =====  Understand customer needs and develop high-quality presentations, proposals and software demonstrations that speak to these needs – building a consensus for change on a multi-stakeholder basis
speak to these needs –
develop high quality presentations
develop proposals
develop software demonstrations
build a consensus for change on a multi stakeholder basis
understand customer needs
Base =====  Work with the Enterprise sales team to build, manage and maintain customer relationships
maintain customer relationships
manage customer relationships
build customer relationships
work with the Enterprise sales team
Base =====  Provide pre-sales technical assistance to members of the sales team to ensure proper technical and business fit
ensure proper technical business fit
provide sales technical assistance to members of the sales team
Base =====  Work with marketing on new product development
work with marketing on new product development

experience in two more of the transformation practice areas implementation use metrics
Base =====  Broad understanding of how businesses operate and compete and an understanding of the relationship between major functional areas of the product-centric enterprise (design, plan, procure, manufacture, distribute, service).
distribute service
manufacture service
procure service
broad understanding of an understanding of the relationship between major functional areas of the product centric enterprise design
broad understanding of an understanding of the relationship between major functional areas of the product centric enterprise plan
Base =====  Ability to analyse financial and operational data and synthesize findings in common business language.
synthesize findings in common business language
analyse financial data
analyse operational data
Base =====  Experience in using Design Thinking Principles to structure a customer’s thinking and flush out innovative concepts.
flush out innovative 
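
The saved phrases can be read back with `pickle.load`. The round trip below is a minimal sketch that uses placeholder phrases and a temporary directory, both chosen for the example, so that it runs without the notebook's files.

```python
import os
import pickle
import tempfile

# Placeholder data standing in for the list built by split_VP_phrases
phrases = ["understand customer needs", "develop proposals"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "role_phrases.pickle")

    # Write in binary mode, as in the cell above
    with open(path, "wb") as f:
        pickle.dump(phrases, f)

    # Read back in binary mode
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == phrases)
```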

In [246]:
phrases = []
resume = nlp(open("Sample/Resume.txt", mode = 'r').read())
for sent in resume.sents:
    phrases.extend(split_VP_phrases(nlp(sent.text.strip())))

with open('resume_phrases.pickle', 'wb') as f:
    pickle.dump(phrases, f)

Base =====  Resume

Base =====  Consulting customer lead, Company 1 – San Jose (CR), Toronto (CA) January 2018 - May 2018
consulting customer lead Company 1 – San Jose CR Toronto CA January 2018 May 2018
Base =====  Identified processes automation portfolio totalling 50 FTE for 2019 and long-term potential of 200+ FTE
total 50 FTE for 2019 long term potential of 200 FTE
identify processes automation portfolio
Base =====  Oversaw agile delivery of four opportunities, working with offshore technical teams and business
work with offshore technical teams
work with business
agile delivery of four opportunities
Base =====  Architected reusable framework to integrate into SAP saving significant cost and reducing timelines
reduce timelines
save significant cost
integrate into
reusable framework
Base =====  Developed demand generation and assessment approach for Smart Automation CoE
demand generation approach for Smart Automation CoE
assessment approach for Smart Automation CoE
Base =====  Prog