# Matching
The Matcher finds words and phrases using rules that describe their token attributes. Rules can refer to token annotations (such as the text or part-of-speech tags) as well as lexical attributes like **Token.is_punct**. Applying the matcher to a Doc gives you access to the matched tokens in context. For more, see the [Matcher API reference](https://spacy.io/api/matcher/). To build and test patterns interactively, use the [rule-based matcher demo](https://demos.explosion.ai/matcher).



In [None]:
# Create the patterns: each pattern is a list of dicts, one dict per token
pattern_1 = [{'LOWER': 'hello'}, {'LOWER': 'world'}]
pattern_2 = [{'LOWER': 'hello'}, {'IS_PUNCT': True}, {'LOWER': 'world'}]
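
A pattern can also mark a token as optional with the `OP` key, which lets one pattern cover both the plain and the punctuated form. The following is a minimal sketch, not part of the original notebook: the key name `HelloWorldOptionalPunct`, the sample sentence, and the use of a blank English pipeline are assumptions made for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough here, because LOWER and
# IS_PUNCT are lexical attributes that need no trained model.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# 'OP': '?' makes the punctuation token optional (zero or one occurrence).
optional_punct = [
    {'LOWER': 'hello'},
    {'IS_PUNCT': True, 'OP': '?'},
    {'LOWER': 'world'},
]
matcher.add('HelloWorldOptionalPunct', [optional_punct])

doc = nlp("Hello world! Hello, world!")
matched_texts = [doc[start:end].text for _, start, end in matcher(doc)]
print(matched_texts)
```

Whether to use one pattern with an optional token or two separate patterns is mostly a readability choice; separate patterns make it easier to see which variant fired.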

# **Rule-Based Matching**
Compared to using regular expressions on raw text, spaCy's rule-based matcher works on tokens, so it also gives access to the tokens within the document and their relationships. This means we can easily access and analyze the surrounding tokens, merge spans into single tokens, or add entries to the named entities in **doc.ents**. For more, see the [rule-based matching guide](https://spacy.io/usage/rule-based-matching).
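
For contrast, here is roughly what a regular expression over the raw string gives you: the matched text and its character offsets, with no notion of tokens. This is a sketch only; the sample sentence and the regex are illustrative assumptions, not taken from the notebook.

```python
import re

text = "Printing 'Hello-World' is a common first exercise."

# "hello", exactly one character that is neither a word character nor
# whitespace, then "world"; case-insensitive.
greeting = re.compile(r"hello[^\w\s]world", re.IGNORECASE)

m = greeting.search(text)
matched_text = m.group()
char_start, char_end = m.span()

# Only character offsets are available; finding the neighbouring words
# would require splitting the string by hand.
print(matched_text, char_start, char_end)
```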




In [None]:
import spacy
nlp=spacy.load('en_core_web_sm')

In [None]:
from spacy.matcher import Matcher
matcher=Matcher(nlp.vocab)

In [None]:
# Add a match rule to the matcher object. A match rule consists of:
# 1) an ID key
# 2) a list of one or more patterns
# Only pattern_2 is registered here, so only the punctuated form will match.
matcher.add('Hello World', [pattern_2])
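
Because the second argument is a list, several patterns can share one ID key. The sketch below registers both patterns under the same key and looks the key up again from each match ID; the sample sentence and the blank pipeline are assumptions made so the snippet runs without a trained model.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # assumption: tokenizer-only pipeline
matcher = Matcher(nlp.vocab)

pattern_1 = [{'LOWER': 'hello'}, {'LOWER': 'world'}]
pattern_2 = [{'LOWER': 'hello'}, {'IS_PUNCT': True}, {'LOWER': 'world'}]

# Both patterns are stored under the single key 'Hello World'.
matcher.add('Hello World', [pattern_1, pattern_2])

doc = nlp("He typed hello world and then HELLO, WORLD.")
results = [
    (nlp.vocab.strings[match_id], doc[start:end].text)
    for match_id, start, end in matcher(doc)
]
print(results)
```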

In [None]:
# Create a document
doc = nlp(" Hello World nor ld' are the first two printed words for most of the programers, printing 'Hello—World' most for beginners")

### Finding the matches

In [None]:
find_matches=matcher(doc)
print(find_matches)

[(8585552006568828647, 20, 23)]


In [None]:
for match_id,start,end in find_matches:
  string_id =nlp.vocab.strings[match_id]
  span=doc[start:end]
  print(match_id,string_id,start,end,span.text)

8585552006568828647 Hello World 20 23 Hello—World
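
Since each match is just a pair of token indices, the same indices can be used to look at the surrounding tokens or to register the match as an entity in `doc.ents`. A minimal sketch under stated assumptions: the label `GREETING`, the two-token context window, the sample sentence, and the blank pipeline are all hypothetical choices for illustration.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span

nlp = spacy.blank("en")  # assumption: no trained model needed for this demo
matcher = Matcher(nlp.vocab)
matcher.add('Hello World', [[{'LOWER': 'hello'}, {'IS_PUNCT': True}, {'LOWER': 'world'}]])

doc = nlp("Most beginners start by printing Hello, world on the screen")
matches = matcher(doc)

window = 2  # hypothetical context size, in tokens
contexts = []
for match_id, start, end in matches:
    left = doc[max(start - window, 0):start]
    right = doc[end:min(end + window, len(doc))]
    contexts.append((left.text, doc[start:end].text, right.text))

# Register every match as an entity with the illustrative label "GREETING".
greeting_spans = [Span(doc, start, end, label="GREETING") for _, start, end in matches]
doc.ents = list(doc.ents) + greeting_spans

print(contexts)
print([(ent.text, ent.label_) for ent in doc.ents])
```

With a trained pipeline, `doc.ents` may already contain entities that overlap a match; spaCy rejects overlapping entity spans, so those cases would need filtering first.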


# **Phrase Matching**
If we need to match large terminology lists, we can use the PhraseMatcher and create Doc objects instead of token patterns, which is much more efficient overall. The Doc patterns can contain single or multiple tokens.

In [None]:
import spacy
nlp=spacy.load('en_core_web_sm')

In [None]:
from spacy.matcher import PhraseMatcher
matcher= PhraseMatcher(nlp.vocab)

In [None]:
phrase_list=["Barack Obama", "Angela Merkel", "Washington, D.C."]

In [None]:
phrase_patterns=[nlp(text) for text in phrase_list]

In [None]:
phrase_patterns

[Barack Obama, Angela Merkel, Washington, D.C.]

In [None]:
type(phrase_patterns[0])

spacy.tokens.doc.Doc

In [None]:
# spaCy v3 signature: pass the Doc patterns as a single list
matcher.add("TerminologyList", phrase_patterns)

In [None]:
doc_3 = nlp(" German Chancellor Angela Merkel and US President Barack Obama "
      " converse in the Oval Office inside the White House in Washington, D.C. ")

In [None]:
find_matches = matcher(doc_3)
print(find_matches)

[(3766102292120407359, 3, 5), (3766102292120407359, 8, 10), (3766102292120407359, 21, 24)]


In [None]:
for match_id,start,end in find_matches:
  string_id =nlp.vocab.strings[match_id]
  # Slice doc_3 (the document that was matched), not the earlier doc
  span=doc_3[start:end]
  print(match_id,string_id,start,end,span.text)

3766102292120407359 TerminologyList 3 5 Angela Merkel
3766102292120407359 TerminologyList 8 10 Barack Obama
3766102292120407359 TerminologyList 21 24 Washington, D.C.
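
By default the PhraseMatcher compares the exact token text, so differently cased mentions are missed. A possible variant, sketched here under assumptions, passes `attr="LOWER"` to match case-insensitively and builds the patterns with `nlp.make_doc`, which only tokenizes and skips the rest of the pipeline. The sample sentence and the blank pipeline are illustrative and not part of the original notebook.

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # assumption: tokenizer-only pipeline for the demo

# attr="LOWER" compares lowercased token text instead of the verbatim text.
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")

terms = ["Barack Obama", "Angela Merkel"]
# make_doc only runs the tokenizer, which is all the PhraseMatcher needs here.
patterns = [nlp.make_doc(term) for term in terms]
matcher.add("TerminologyList", patterns)

doc = nlp("angela merkel met BARACK OBAMA in Washington")
found = [doc[start:end].text for _, start, end in matcher(doc)]
print(found)
```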
