
# Rule-Based Matching with spaCy

In [2]:
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")

matcher = Matcher(nlp.vocab)
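# Pattern: a number with a unit suffix (e.g. "24.51c"), a slash, then a unit (e.g. "kWh")
pattern = [{"TEXT": {"REGEX": "(?i)[0-9]+(?:[,.][0-9]+)?[ckwh]+"}},
           {"ORTH": "/"}, {"LOWER": {"REGEX": "(?i)^[ckwh]+$"}}]
# List-of-patterns form of add(), accepted by spaCy 2.2.2+ and required by spaCy 3
matcher.add("Usage", [pattern])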

doc = nlp("Peak Usage 409 24.51c/kWh $100.25")

matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]
    span = doc[start:end]
    print(match_id, string_id, start, end, span.text)

17478281289916104085 Usage 3 6 24.51c/kWh
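
The two regular expressions in the pattern can be checked on their own with the standard `re` module. The sketch below is only an approximation of what the Matcher does: it assumes `REGEX` is applied with search semantics and that `LOWER` compares against the lowercased token text. The names `AMOUNT`, `UNIT` and `find_usage` are made up for this illustration, and the token list is the first six tokens implied by the match offsets printed above.

```python
import re

# The two expressions from the Matcher pattern, compiled with the stdlib.
AMOUNT = re.compile(r"(?i)[0-9]+(?:[,.][0-9]+)?[ckwh]+")
UNIT = re.compile(r"(?i)^[ckwh]+$")

# Tokens 0-5 of the sample sentence, as implied by the match offsets (3, 6).
tokens = ["Peak", "Usage", "409", "24.51c", "/", "kWh"]


def find_usage(tokens):
    """Return (start, end) pairs for every amount, '/', unit triple."""
    hits = []
    for i in range(len(tokens) - 2):
        first, slash, unit = tokens[i:i + 3]
        # Assumption: LOWER is emulated by lowercasing the token text.
        if AMOUNT.search(first) and slash == "/" and UNIT.search(unit.lower()):
            hits.append((i, i + 3))
    return hits


matches = find_usage(tokens)
print(matches)  # [(3, 6)]
```

The bare number `409` is rejected because the first expression requires at least one of the letters `c`, `k`, `w`, `h` after the digits.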


In [5]:
# Matcher.py
# Import the required modules
import spacy
from spacy.matcher import Matcher  # token-based rule matcher
# Load the small English pipeline 'en_core_web_sm'
nlp = spacy.load("en_core_web_sm")
# Create a Matcher that shares the pipeline's vocabulary
matcher = Matcher(nlp.vocab)
# Pattern: the noun "computer" (any casing) followed by any token that is not a verb
pattern = [{'LOWER': 'computer', 'POS': 'NOUN'},
           {'POS': {'NOT_IN': ['VERB']}}]
# Register the pattern under the name "Matching"
# (list-of-patterns form of add(), accepted by spaCy 2.2.2+ and required by spaCy 3)
matcher.add("Matching", [pattern])
# Convert the input text into a Doc object
doc = nlp("""Computer programming is the process of writing instructions that get executed by computers. The instructions, also known as code, 
             are written in a programming language which the computer can understand and use to perform a task or solve a problem. 
             Basic computer programming involves the analysis of a problem and development of a logical sequence of instructions to solve it. 
             There can be numerous paths to a solution and the computer programmer seeks to design and code that which is most efficient. 
             Among the programmer’s tasks are understanding requirements, determining the right programming language to use, designing or architecting the solution, 
             coding, testing, debugging and writing documentation so that the solution can be easily understood by other programmers.Computer programming is at 
             the heart of computer science. It is the implementation portion of software development, application development and software engineering efforts, 
             transforming ideas and theories into actual, working solutions.""")
# Call the matcher on the Doc; it returns (match_id, start, end) tuples
matches = matcher(doc)
# Print each match with its name, token offsets, and text
for match_id, start, end in matches:
    # Look up the string name behind the hash ID
    string_id = nlp.vocab.strings[match_id]
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)

6895354335150655416 Matching 0 2 Computer programming
6895354335150655416 Matching 47 49 computer programming
6895354335150655416 Matching 78 80 computer programmer
6895354335150655416 Matching 136 138 Computer programming
6895354335150655416 Matching 144 146 computer science


In [7]:
# PhraseMatcher.py
# Import the required modules
import spacy
from spacy.matcher import PhraseMatcher
# Load the small English pipeline 'en_core_web_sm'
nlp = spacy.load('en_core_web_sm')
# Create the PhraseMatcher; attr='LOWER' makes matching case-insensitive
matcher = PhraseMatcher(nlp.vocab, attr='LOWER')
# The phrases to be matched
terminology_list = ["Machine Learning", "Hidden Structure",
                    "Unlabeled Data"]
# Convert each phrase to a Doc with nlp.make_doc, which runs only the tokenizer and is therefore faster
patterns = [nlp.make_doc(text) for text in terminology_list]
# Register the patterns without a callback
# (list form of add(), accepted by spaCy 2.2.2+ and required by spaCy 3)
matcher.add("Phrase Matching", patterns)
# Convert the input text into a Doc object
doc = nlp("""Supervised machine learning algorithms can apply what has been learned in the past to new data using labeled examples to predict future events. 
Starting from the analysis of a known training dataset, the learning algorithm produces an inferred function to make predictions about the output values. 
The system is able to provide targets for any new input after sufficient training. The learning algorithm can also compare its output with the correct, 
intended output and find errors in order to modify the model accordingly. In contrast, unsupervised machine learning algorithms are used when the information 
used to train is neither classified nor labeled. Unsupervised learning studies how systems can infer a function to describe a hidden structure from unlabeled data. 
The system doesn’t figure out the right output, but it explores the data and can draw inferences from datasets to describe hidden structures from unlabeled data. 
Semi-supervised machine learning algorithms fall somewhere in between supervised and unsupervised learning, since they use both labeled and unlabeled data 
for training – typically a small amount of labeled data and a large amount of unlabeled data. The systems that use this method are able to considerably improve 
learning accuracy. Usually, semi-supervised learning is chosen when the acquired labeled data requires skilled and relevant resources in order to 
train it / learn from it. Otherwise, acquiring unlabeled data generally doesn’t require additional resources.""")
# Call the matcher on the Doc; it returns (match_id, start, end) tuples
matches = matcher(doc)
# Print each match with its name, token offsets, and text
for match_id, start, end in matches:
    # Look up the string name behind the hash ID
    string_id = nlp.vocab.strings[match_id]
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)

11356100181062323261 Phrase Matching 1 3 machine learning
11356100181062323261 Phrase Matching 96 98 machine learning
11356100181062323261 Phrase Matching 126 128 hidden structure
11356100181062323261 Phrase Matching 129 131 unlabeled data
11356100181062323261 Phrase Matching 159 161 unlabeled data
11356100181062323261 Phrase Matching 166 168 machine learning
11356100181062323261 Phrase Matching 184 186 unlabeled data
11356100181062323261 Phrase Matching 202 204 unlabeled data
11356100181062323261 Phrase Matching 252 254 unlabeled data


In [8]:
# EntityRuler.py
# Import the required modules
import spacy
from spacy.pipeline import EntityRuler
# Create a blank English pipeline (tokenizer only, no trained components)
nlp = spacy.blank('en')
# Convert the input text into a Doc object
doc = nlp("The First Estate included the clergy (church leaders), the Second Estate included the nobles, and the Third Estate included the commoners. The Third Estate paid most of the taxes, while the nobility lived lives of luxury and got all the high-ranking jobs.")
# Print the entities found; a blank pipeline has no entity recognizer, so the list is empty
print([(ent.text, ent.label_) for ent in doc.ents])

# spaCy 2 style: build the EntityRuler here, then add it to the pipeline below
# (in spaCy 3, ruler = nlp.add_pipe("entity_ruler") replaces both steps)
ruler = EntityRuler(nlp)
# Define the patterns: one exact phrase and one token-based pattern
patterns = [{"label": "NOUN", "pattern": "church"},
            {"label": "ORG",
             "pattern": [{"LOWER": "the"},
                         {"LOWER": {"IN": ["first", "second", "third"]}},
                         {"ORTH": "Estate"}]}]
# Add the patterns to the ruler
ruler.add_patterns(patterns)
# Add the ruler to the pipeline as a new component
nlp.add_pipe(ruler)
# Process the same text again, now with the EntityRuler in the pipeline
doc = nlp("The First Estate included the clergy (church leaders), the Second Estate included the nobles, and the Third Estate included the commoners. The Third Estate paid most of the taxes, while the nobility lived lives of luxury and got all the high-ranking jobs.")
# Print the entities found after adding the EntityRuler
print([(ent.text, ent.label_) for ent in doc.ents])

[]
[('The First Estate', 'ORG'), ('church', 'NOUN'), ('the Second Estate', 'ORG'), ('the Third Estate', 'ORG'), ('The Third Estate', 'ORG')]


# spaCy for French


---



In [9]:
from spacy.lang.fr import French

nlp = French()

# Process the text
doc = nlp(
    "En 1990, plus de 60 % de la population d'Asie orientale vivait dans une pauvreté extrême. "
    "Actuellement c'est moins de 4 %."
)

# Iterate over the tokens in the doc
for token in doc:
    print(token)
    # Check whether the token resembles a number and is not the last token
    if token.like_num and token.i + 1 < len(doc):
        # Get the next token in the document
        next_token = doc[token.i + 1]
        # Check whether the next token's text is "%"
        if next_token.text == "%":
            print("Pourcentage trouvé :", token.text)

En
1990
,
plus
de
60
Pourcentage trouvé : 60
%
de
la
population
d'
Asie
orientale
vivait
dans
une
pauvreté
extrême
.
Actuellement
c'
est
moins
de
4
Pourcentage trouvé : 4
%
.
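
The number-followed-by-`%` check above can also be expressed as a Matcher rule, in the same style as the English cells earlier in the notebook. This is a minimal sketch, assuming spaCy 2.2.2 or later (for the list form of `add`); the rule name `"PERCENT"` is made up for this illustration.

```python
from spacy.lang.fr import French
from spacy.matcher import Matcher

# Blank French pipeline: the tokenizer and lexical attributes are enough here.
nlp = French()

matcher = Matcher(nlp.vocab)
# A number-like token immediately followed by a literal "%" token.
matcher.add("PERCENT", [[{"LIKE_NUM": True}, {"ORTH": "%"}]])

doc = nlp(
    "En 1990, plus de 60 % de la population d'Asie orientale vivait dans une pauvreté extrême. "
    "Actuellement c'est moins de 4 %."
)

# Collect the text of every matched span.
percentages = [doc[start:end].text for _, start, end in matcher(doc)]
print(percentages)
```

Because the pattern itself requires a following `%` token, no manual index arithmetic (and no bounds check at the end of the document) is needed.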


In [None]:
!python -m spacy download fr_core_news_sm

In [21]:
from spacy.lang.fr import French

nlp = French()

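# Process the text
doc = nlp(
    "En 1990, plus de 60 % de la population d'Asie orientale vivait dans une pauvreté extrême. "
    "Actuellement c'est moins de 4 %."
)

# Iterate over the tokens in the doc
for token in doc:
    print(token)
    # Check whether the token resembles a number and is not the last token
    if token.like_num and token.i + 1 < len(doc):
        # Get the next token in the document
        next_token = doc[token.i + 1]
        # Check whether the next token's text is "%"
        if next_token.text == "%":
            print("Pourcentage trouvé :", token.text)

print("*"*80)
doc = nlp("Importer un modèle LUIS et ajouter des intentions")
# French() is a blank pipeline (tokenizer only), so no tagger or parser has run:
# token.pos and token.dep are integer IDs (pos_ and dep_ give the string labels) and are 0 here.
# A trained pipeline such as fr_core_news_sm is needed to get real tags.
for token in doc:
    print(token.text, token.pos, token.dep)

# In this run, with no trained lemmatizer, lemma_ is just the token text
for token in doc:
    print(token, token.lemma_)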

En
1990
,
plus
de
60
Pourcentage trouvé : 60
%
de
la
population
d'
Asie
orientale
vivait
dans
une
pauvreté
extrême
.
Actuellement
c'
est
moins
de
4
Pourcentage trouvé : 4
%
.
********************************************************************************
Importer 0 0
un 0 0
modèle 0 0
LUIS 0 0
et 0 0
ajouter 0 0
des 0 0
intentions 0 0
Importer Importer
un un
modèle modèle
LUIS LUIS
et et
ajouter ajouter
des des
intentions intentions


In [26]:
import spacy
from spacy import displacy
from spacy.lang.fr import French

# Blank French pipeline: tokenizer only, so pos_, tag_, ent_type_ and doc.ents stay empty below
nlp = French()

doc = nlp("Demain je travaille à la maison.")
for token in doc:
    print("{0}\t{1}\t{2}\t{3}\t{4}\t{5}\t{6}\t{7}\t{8}".format(
        token.text,
        token.idx,
        token.lemma_,
        token.is_punct,
        token.is_space,
        token.shape_,
        token.pos_,
        token.tag_,
        token.ent_type_
    ))

for ent in doc.ents:
    print(">", ent.text, ent.label_)


Demain	0	Demain	False	False	Xxxxx			
je	7	je	False	False	xx			
travaille	10	travaille	False	False	xxxx			
à	20	à	False	False	x			
la	22	la	False	False	xx			
maison	25	maison	False	False	xxxx			
.	31	.	True	False	.			


In [None]:
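import spacy
from spacy import displacy

# The dependency visualizer needs a parsed Doc, so load a trained English pipeline
# instead of reusing the blank French one from the previous cell
nlp = spacy.load("en_core_web_sm")
doc = nlp("I live in Guwahati, Assam")
displacy.render(doc, style="dep")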

Natural language input

FirstSeed Calendar has a powerful natural-language processing engine that lets you almost "speak" to the app to enter a new event or reminder. So you can type **"Dinner at 7 pm tonight"** or **"Remind me to call John at noon"**. Or better yet, you can use Siri's voice dictation to literally create new events or reminders simply by speaking.

Natural language input is currently available in English, German and Japanese. More languages will be supported in future versions.