# Some useful NLP methods with spaCy
Author : Nasser-eddine MONIR <br>
Article : from realpython.com

**Libraries**

In [20]:
import spacy 
import warnings
warnings.filterwarnings("ignore")

In [21]:
nlp = spacy.load('en_core_web_sm')

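**Lemmatization** : reducing a word to its base (dictionary) form, the lemma, while ensuring that the reduced form is still a valid word. The `-PRON-` lemma in the output is the placeholder that spaCy v2 assigns to pronouns.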

In [22]:
text = 'Juliana is helping me on Natural Language Processing methods'
doc = nlp(text)
for token in doc:
    print(token, "-->", token.lemma_, token.tag_)

Juliana --> Juliana NNP
is --> be VBZ
helping --> help VBG
me --> -PRON- PRP
on --> on IN
Natural --> Natural NNP
Language --> Language NNP
Processing --> Processing NNP
methods --> method NNS


**POS tagging** : the process of assigning a part-of-speech (POS) tag to each token depending on its usage in the sentence <br>
(Noun, Pronoun, Adjective, Verb, Adverb, Preposition, Conjunction, Interjection)

In [23]:
text = 'Juliana is helping me on Natural Language Processing methods'
doc = nlp(text)
for token in doc:
    print("token = ", token, " | tag = ", token.tag_)  # also available: token.pos_, spacy.explain(token.tag_)

token =  Juliana  | tag =  NNP
token =  is  | tag =  VBZ
token =  helping  | tag =  VBG
token =  me  | tag =  PRP
token =  on  | tag =  IN
token =  Natural  | tag =  NNP
token =  Language  | tag =  NNP
token =  Processing  | tag =  NNP
token =  methods  | tag =  NNS


**tag_** lists the fine-grained part of speech <br>
**pos_** lists the coarse-grained part of speech
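
To read a tag without memorizing the tag set, `spacy.explain` returns a short description of a label. Here is a small sketch using the fine-grained tags printed above. It only needs the spaCy package, not a loaded model:

```python
import spacy

# Fine-grained tags taken from the output above
tags = ["NNP", "VBZ", "VBG", "PRP", "IN", "NNS"]

# spacy.explain looks each label up in spaCy's built-in glossary
# and returns None for labels it does not know
descriptions = {tag: spacy.explain(tag) for tag in tags}

for tag, description in descriptions.items():
    print(tag, "-->", description)
```

The same call works for coarse-grained `pos_` values, dependency labels and entity labels.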

**Visualization of dependency parsing (or named entities)**

In [24]:
from spacy import displacy
text = 'Juliana Ferran is helping me on Natural Language Processing methods'
doc = nlp(text)
# displacy.serve(doc, style='dep')  # alternative: serve the visualization from a local web server
displacy.render(doc, style='dep', jupyter=True)

**Rule-Based Matching** : matching sequences of tokens against hand-written patterns, built either from the token text (in the spirit of regular expressions) or from grammatical features (e.g. POS tags)

In [25]:
# Example: extracting a full name
from spacy.matcher import Matcher
text = 'Juliana Ferran is helping me on Natural Language Processing methods'
doc = nlp(text)
matcher = Matcher(nlp.vocab)


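def extract_full_name(doc):
    # Two consecutive proper nouns, e.g. first name + last name
    pattern = [{'POS': 'PROPN'}, {'POS': 'PROPN'}]
    # spaCy v2 signature; in spaCy v3 this becomes matcher.add('FULL_NAME', [pattern])
    matcher.add('FULL_NAME', None, pattern)
    matches = matcher(doc)
    # Return only the first match; later matches are ignored
    for match_id, start, end in matches:
        span = doc[start:end]
        return span.text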

extract_full_name(doc)

'Juliana Ferran'
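
`extract_full_name` stops at the first match. The sketch below shows one way to collect every match instead. It rests on a few assumptions that are not in the notebook: a blank English pipeline (`spacy.blank("en")`) so no model is required, the lexical attribute `IS_TITLE` in place of `POS`, the spaCy v3 form `matcher.add(name, [pattern])`, and a hypothetical helper named `extract_title_spans`.

```python
import spacy
from spacy.matcher import Matcher
from spacy.util import filter_spans

# Assumption: a blank English pipeline (tokenizer only), so no model download is needed
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# One or more consecutive title-cased tokens; IS_TITLE is a lexical attribute,
# so it works without a tagger (unlike a POS-based pattern)
matcher.add("TITLE_RUN", [[{"IS_TITLE": True, "OP": "+"}]])


def extract_title_spans(doc):
    """Hypothetical helper: return every maximal run of title-cased tokens."""
    spans = [doc[start:end] for _, start, end in matcher(doc)]
    # The "+" operator yields overlapping candidates;
    # filter_spans keeps the longest non-overlapping ones
    return [span.text for span in filter_spans(spans)]


doc = nlp("Juliana Ferran is helping me on Natural Language Processing methods")
names = extract_title_spans(doc)
print(names)
```

Title-casing is a cruder signal than POS tags: this pattern also picks up "Natural Language Processing", which is not a person's name.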

### Parsing 

**Dependency Parsing** is the process of extracting the dependency parse of a sentence to represent its grammatical structure. It defines the dependency relationship between headwords and their dependents. The head of a sentence has no dependency and is called the root of the sentence. The verb is usually the head of the sentence.

In [26]:
text = 'Juliana Ferran is helping me on Natural Language Processing methods'
doc = nlp(text)
for token in doc:
    print(token.text, token.tag_, token.head.text, token.dep_)

Juliana NNP Ferran compound
Ferran NNP helping nsubj
is VBZ helping aux
helping VBG helping ROOT
me PRP helping dobj
on IN helping prep
Natural NNP Language compound
Language NNP Processing compound
Processing NNP methods compound
methods NNS on pobj
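
The head relations printed above already define a tree. As an illustration, the sketch below rebuilds that tree in plain Python from the printed rows, without calling spaCy. Keying tokens by their text is an assumption that only holds because no word is repeated in this sentence.

```python
from collections import defaultdict

# (token, head, dependency label) triples copied from the output above
rows = [
    ("Juliana", "Ferran", "compound"),
    ("Ferran", "helping", "nsubj"),
    ("is", "helping", "aux"),
    ("helping", "helping", "ROOT"),
    ("me", "helping", "dobj"),
    ("on", "helping", "prep"),
    ("Natural", "Language", "compound"),
    ("Language", "Processing", "compound"),
    ("Processing", "methods", "compound"),
    ("methods", "on", "pobj"),
]

# The root is the token labelled ROOT (spaCy lists it as its own head)
root = next(token for token, head, dep in rows if dep == "ROOT")

# Group dependents under their head, skipping the root's self-reference
children = defaultdict(list)
for token, head, dep in rows:
    if dep != "ROOT":
        children[head].append(token)


def show(token, depth=0):
    """Print the tree top-down, one indentation level per dependency arc."""
    print("  " * depth + token)
    for child in children[token]:
        show(child, depth + 1)


show(root)
```

On a parsed `Doc`, `token.head` and `token.children` expose the same relations directly.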


**Shallow Parsing or Chunking** : grouping adjacent tokens into phrases (chunks) on the basis of their POS tags

In [27]:
# noun phrase detection 
text = 'Juliana Ferran is helping me on Natural Language Processing methods'
doc = nlp(text)
for chunk in doc.noun_chunks:
    print(chunk)

Juliana Ferran
me
Natural Language Processing methods


In [28]:
# verb phrase detection 
import textacy  # requires textacy (pip3 install textacy)

text = 'Juliana Ferran will help me on Natural Language Processing methods'
# optional verb, any number of adverbs, then one or more verbs
pattern = r'(<VERB>?<ADV>*<VERB>+)'
doc = textacy.make_spacy_doc(text,lang='en_core_web_sm')
# Note: pos_regex_matches is deprecated in newer textacy releases in favor of token-pattern matching
verb_phrases = textacy.extract.pos_regex_matches(doc, pattern)

for chunk in verb_phrases:
    print(chunk)

will help
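
The same verb-phrase idea can be expressed with spaCy's own `Matcher`, without textacy. The sketch below makes several assumptions: it uses the spaCy v3 API (`Doc(..., pos=...)` and `matcher.add(name, [pattern])`), and the POS tags are written by hand instead of predicted, so it runs without a model. "will" is tagged `AUX` here, as newer English models typically do for modals, so the pattern starts with an optional `AUX` where the regex above has an optional `VERB`.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.util import filter_spans

nlp = spacy.blank("en")  # tokenizer and vocab only, no tagger

# Assumption: coarse POS tags written by hand for illustration,
# not predicted by a model
words = ["Juliana", "Ferran", "will", "help", "me", "on",
         "Natural", "Language", "Processing", "methods"]
pos = ["PROPN", "PROPN", "AUX", "VERB", "PRON", "ADP",
       "PROPN", "PROPN", "PROPN", "NOUN"]
doc = Doc(nlp.vocab, words=words, pos=pos)

# Token-pattern counterpart of a verb-phrase regex:
# optional auxiliary, any number of adverbs, one or more verbs
pattern = [
    {"POS": "AUX", "OP": "?"},
    {"POS": "ADV", "OP": "*"},
    {"POS": "VERB", "OP": "+"},
]
matcher = Matcher(nlp.vocab)
matcher.add("VERB_PHRASE", [pattern])

# Keep only the longest non-overlapping matches
spans = [doc[start:end] for _, start, end in matcher(doc)]
verb_phrases = [span.text for span in filter_spans(spans)]
print(verb_phrases)
```

With a loaded model, `doc = nlp(text)` would replace the hand-built `Doc`, and the result would then depend on the tags the model predicts.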



**Named Entity Recognition (NER)** is the process of locating named entities in unstructured text and then classifying them into pre-defined categories, such as person names, organizations, locations, monetary values, percentages, time expressions, etc. 

In [31]:
text = 'Juliana Ferran is helping me on Natural Language Processing methods inside IDMC'
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_, spacy.explain(ent.label_))

displacy.render(doc, style='ent', jupyter=True) 
# Note: this model often misclassifies entities (e.g. IDMC is labelled NORP in the output); a more reliable approach still needs to be found

Juliana Ferran PERSON People, including fictional
IDMC NORP Nationalities or religious or political groups
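
One possible way to deal with such misclassifications is to add hand-written entity rules. The sketch below assumes spaCy v3 (where the rule-based component is added by name with `nlp.add_pipe("entity_ruler")`), a blank pipeline so that only the rule produces entities, and that IDMC should be labelled `ORG`. That label is an assumption made for the example and is not stated in the notebook.

```python
import spacy

# Assumption: blank pipeline, so only the hand-written rule produces entities
nlp = spacy.blank("en")

# spaCy v3 style: add the rule-based component by its registered name
ruler = nlp.add_pipe("entity_ruler")
# Assumption: IDMC is treated as an organization for this example
ruler.add_patterns([{"label": "ORG", "pattern": "IDMC"}])

doc = nlp("Juliana Ferran is helping me on Natural Language Processing methods inside IDMC")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

In a pipeline that also contains a statistical `ner` component, adding the ruler with `before="ner"` lets the rules take precedence over the model's predictions.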
