In [1]:
import spacy
nlp = spacy.load("en_core_web_sm")

In [2]:
sentences = [
  'The food we had yesterday was delicious',
  'My time in Italy was very enjoyable',
  'I found the meal to be tasty',
  'The internet was slow.',
  'Our experience was suboptimal'
]

### We will split our sentences so as to obtain the aspect (e.g. food) and its expression (e.g. delicious)

For each token in our sentences, we can inspect its dependency relation and its POS (Part-Of-Speech) tag using spaCy's dependency parser:
https://spacy.io/usage/linguistic-features
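As a rough illustration of how dependency labels can pair an aspect with its expression, the sketch below works on tokens written out by hand as `(text, dep, head, pos)` tuples. The tokens are copied from the parse of "The internet was slow." printed by the cell below, so the example runs without spaCy. It assumes a slightly stricter rule than the one used later in this notebook: the aspect is a `NOUN` labelled `nsubj`, and the expression is an `ADJ` labelled `acomp` that shares the same head.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    text: str
    dep: str   # dependency label, as in spaCy's token.dep_
    head: int  # index of the head token (the root points to itself)
    pos: str   # coarse POS tag, as in spaCy's token.pos_


# "The internet was slow." transcribed from the parser output in this notebook
SENTENCE = [
    Token("The", "det", 1, "DET"),
    Token("internet", "nsubj", 2, "NOUN"),
    Token("was", "ROOT", 2, "AUX"),
    Token("slow", "acomp", 2, "ADJ"),
    Token(".", "punct", 2, "PUNCT"),
]


def aspect_pair(tokens: list[Token]) -> Optional[tuple[str, str]]:
    """Return (aspect, expression) when a noun subject and an adjectival
    complement attach to the same head, else None."""
    # Map each head index to the noun subject attached to it.
    subjects = {t.head: t.text for t in tokens if t.dep == "nsubj" and t.pos == "NOUN"}
    for t in tokens:
        if t.dep == "acomp" and t.pos == "ADJ" and t.head in subjects:
            return subjects[t.head], t.text
    return None


print(aspect_pair(SENTENCE))
```

Requiring a shared head avoids pairing a subject with an adjective from an unrelated clause, at the cost of missing constructions that do not use `acomp`.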

In [3]:
for sentence in sentences:
  doc = nlp(sentence)
  for token in doc:
    # text, dependency label, head text, head POS, token POS, children
    print(token.text, token.dep_, token.head.text, token.head.pos_, token.pos_, [child for child in token.children])

The det food NOUN DET []
food nsubj was AUX NOUN [The, had]
we nsubj had VERB PRON []
had relcl food NOUN VERB [we, yesterday]
yesterday npadvmod had VERB NOUN []
was ROOT was AUX AUX [food, delicious]
delicious acomp was AUX ADJ []
My poss time NOUN PRON []
time nsubj was AUX NOUN [My, in]
in prep time NOUN ADP [Italy]
Italy pobj in ADP PROPN []
was ROOT was AUX AUX [time, enjoyable]
very advmod enjoyable ADJ ADV []
enjoyable acomp was AUX ADJ [very]
I nsubj found VERB PRON []
found ROOT found VERB VERB [I, be]
the det meal NOUN DET []
meal nsubj be AUX NOUN [the]
to aux be AUX PART []
be ccomp found VERB AUX [meal, to, tasty]
tasty acomp be AUX ADJ []
The det internet NOUN DET []
internet nsubj was AUX NOUN [The]
was ROOT was AUX AUX [internet, slow, .]
slow acomp was AUX ADJ []
. punct was AUX PUNCT []
Our poss experience NOUN PRON []
experience nsubj was AUX NOUN [Our]
was ROOT was AUX AUX [experience, suboptimal]
suboptimal acomp was AUX ADJ []


Below is an example of a dependency visualization for one sentence:

https://spacy.io/usage/visualizers

In [4]:
import spacy
from spacy import displacy


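doc = nlp("The food we had yesterday was delicious")
# style="dep" draws the dependency tree; style="ent" would only highlight named entities.
# Inside Jupyter, displacy.render(doc, style="dep", jupyter=True) shows it inline instead.
displacy.serve(doc, style="dep")

Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.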


Using these linguistic features, in particular the POS tags, we will extract adjectives as the expressions of sentiment.

In [5]:
for sentence in sentences:
  doc = nlp(sentence)
  descriptive_term = ''
  for token in doc:
    if token.pos_ == 'ADJ':
      descriptive_term = token
  print(sentence)
  print(descriptive_term)

The food we had yesterday was delicious
delicious
My time in Italy was very enjoyable
enjoyable
I found the meal to be tasty
tasty
The internet was slow.
slow
Our experience was suboptimal
suboptimal


As you can see, intensifiers such as "very" are missing, because only the adjective itself was kept and its adverbs were skipped. We will extract them using the `children` property.
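In spaCy, `token.children` holds the tokens whose head is `token`. The sketch below rebuilds that relation from plain head indices, using the parse of "My time in Italy was very enjoyable" transcribed from the dependency output earlier in this notebook, and then prepends any `ADV` children to the adjective. The `(text, head, pos)` tuple layout is an assumption made so the example runs without spaCy.

```python
# Each token: (text, head_index, pos); the root points to itself.
# Parse of "My time in Italy was very enjoyable" transcribed from the output above.
TOKENS = [
    ("My", 1, "PRON"),
    ("time", 4, "NOUN"),
    ("in", 1, "ADP"),
    ("Italy", 2, "PROPN"),
    ("was", 4, "AUX"),
    ("very", 6, "ADV"),
    ("enjoyable", 4, "ADJ"),
]


def children(tokens, index):
    """Indices of the tokens whose head is `index` (the root's self-link is skipped)."""
    return [i for i, (_, head, _) in enumerate(tokens) if head == index and i != index]


def describe_adjectives(tokens):
    """Each adjective prefixed with its adverb children, e.g. 'very enjoyable'."""
    phrases = []
    for i, (text, _, pos) in enumerate(tokens):
        if pos != "ADJ":
            continue
        adverbs = [tokens[c][0] for c in children(tokens, i) if tokens[c][2] == "ADV"]
        phrases.append(" ".join(adverbs + [text]))
    return phrases


print(children(TOKENS, 4))
print(describe_adjectives(TOKENS))
```

`children(TOKENS, 4)` returns the indices of "time" and "enjoyable", matching the `[time, enjoyable]` list printed for "was" in the spaCy output.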

In [6]:
for sentence in sentences:
  doc = nlp(sentence)
  descriptive_term = ''
  for token in doc:
    if token.pos_ == 'ADJ':
      prepend = ''
      for child in token.children:
        if child.pos_ != 'ADV':
          continue
        prepend += child.text + ' '
      descriptive_term = prepend + token.text
  print(sentence)
  print(descriptive_term)

The food we had yesterday was delicious
delicious
My time in Italy was very enjoyable
very enjoyable
I found the meal to be tasty
tasty
The internet was slow.
slow
Our experience was suboptimal
suboptimal


We will put this into a list of dictionaries, pairing each description with its aspect: the noun that is the nominal subject (`nsubj`) of the sentence.

In [7]:
aspects = []
for sentence in sentences:
  doc = nlp(sentence)
  descriptive_term = ''
  target = ''
  for token in doc:
    if token.dep_ == 'nsubj' and token.pos_ == 'NOUN':
      target = token.text
    if token.pos_ == 'ADJ':
      prepend = ''
      for child in token.children:
        if child.pos_ != 'ADV':
          continue
        prepend += child.text + ' '
      descriptive_term = prepend + token.text  
    
  aspects.append({'aspect': target,'description': descriptive_term})
print(aspects)

[{'aspect': 'food', 'description': 'delicious'}, {'aspect': 'time', 'description': 'very enjoyable'}, {'aspect': 'meal', 'description': 'tasty'}, {'aspect': 'internet', 'description': 'slow'}, {'aspect': 'experience', 'description': 'suboptimal'}]


### Using TextBlob for sentiment extraction

TextBlob is a library that provides out-of-the-box sentiment analysis. It takes a bag-of-words approach: it keeps a list of words such as "good", "bad" and "excellent", each with a sentiment score attached. It can also pick up modifiers (such as "not") and intensifiers (such as "very") that affect the sentiment score.
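To make the bag-of-words idea concrete, here is a minimal lexicon-based scorer. It is a toy sketch, not TextBlob's implementation: the lexicon values, the 1.3 multiplier for "very" and the -0.5 factor for negations are assumptions chosen for illustration.

```python
# Toy lexicon: word -> polarity in [-1, 1] (illustrative values, not TextBlob's).
LEXICON = {"good": 0.7, "delicious": 1.0, "enjoyable": 0.5, "slow": -0.3}
# Intensifiers scale the polarity of the next sentiment word (assumed factor).
INTENSIFIERS = {"very": 1.3}
# Negations flip and dampen the polarity of the next sentiment word (assumed factor).
NEGATIONS = {"not", "never"}
NEGATION_FACTOR = -0.5


def polarity(text: str) -> float:
    """Average polarity of the lexicon words in `text`; 0.0 when none are found."""
    scores = []
    modifier = 1.0
    for word in text.lower().replace(".", " ").split():
        if word in INTENSIFIERS:
            modifier *= INTENSIFIERS[word]
        elif word in NEGATIONS:
            modifier *= NEGATION_FACTOR
        elif word in LEXICON:
            # Apply pending modifiers, clamp to [-1, 1], then reset them.
            scores.append(max(-1.0, min(1.0, LEXICON[word] * modifier)))
            modifier = 1.0
        else:
            # Any other word breaks the link between a modifier and its target.
            modifier = 1.0
    return sum(scores) / len(scores) if scores else 0.0


for phrase in ["delicious", "very enjoyable", "not good", "slow", "tasty"]:
    print(phrase, round(polarity(phrase), 2))
```

Because "tasty" is absent from this toy lexicon, it scores 0.0, just like a genuinely neutral word.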

In [8]:
from textblob import TextBlob
for aspect in aspects:
  aspect['sentiment'] = TextBlob(aspect['description']).sentiment
print(aspects)

[{'aspect': 'food', 'description': 'delicious', 'sentiment': Sentiment(polarity=1.0, subjectivity=1.0)}, {'aspect': 'time', 'description': 'very enjoyable', 'sentiment': Sentiment(polarity=0.65, subjectivity=0.78)}, {'aspect': 'meal', 'description': 'tasty', 'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)}, {'aspect': 'internet', 'description': 'slow', 'sentiment': Sentiment(polarity=-0.30000000000000004, subjectivity=0.39999999999999997)}, {'aspect': 'experience', 'description': 'suboptimal', 'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)}]


Looking at the results, we notice that the adjectives "tasty" and "suboptimal" are treated as neutral (polarity 0.0). They seem not to be part of TextBlob's dictionary, so they are not picked up.
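A score of 0.0 therefore mixes two cases: words that are genuinely neutral and words the lexicon simply does not know. One way to tell them apart, sketched below with a hypothetical toy lexicon (not TextBlob's dictionary), is to return `None` when no word of the description is covered, so such aspects can be flagged for review instead of being reported as neutral.

```python
from typing import Optional

# Hypothetical toy lexicon (word -> polarity); "okay" stands in for a truly neutral word.
LEXICON = {"delicious": 1.0, "enjoyable": 0.5, "slow": -0.3, "okay": 0.0}


def polarity_or_unknown(description: str) -> Optional[float]:
    """Mean polarity of the known words, or None if no word is in the lexicon."""
    known = [LEXICON[w] for w in description.lower().split() if w in LEXICON]
    return sum(known) / len(known) if known else None


# Descriptions extracted earlier in the notebook, plus one neutral word.
for description in ["delicious", "tasty", "slow", "suboptimal", "okay"]:
    score = polarity_or_unknown(description)
    print(description, "UNKNOWN" if score is None else score)
```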

TextBlob also lets us train a `NaiveBayesClassifier` with a very simple syntax that anyone can follow; we will use it to improve our sentiment analysis.
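For intuition about what such a classifier learns, here is a small multinomial naive Bayes written from scratch with add-one (Laplace) smoothing. It is a sketch, not TextBlob's implementation (TextBlob's classifier builds on NLTK and uses its own feature extractor). It reuses the four consistently labelled training pairs from the next cell, and it lowercases tokens so that "Slow" and "slow" count as the same word, which is a design choice of this sketch.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and drop periods; a deliberately naive tokenizer."""
    return text.lower().replace(".", " ").split()


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative sketch)."""

    def __init__(self, train: list[tuple[str, str]]):
        self.label_counts = Counter(label for _, label in train)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for text, label in train:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = sum(self.label_counts.values())

    def log_score(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(word | label) over known words."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.label_counts[label] / self.total)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        return score

    def classify(self, text: str) -> str:
        return max(self.label_counts, key=lambda label: self.log_score(text, label))


train = [
    ("Slow internet.", "negative"),
    ("Delicious food", "positive"),
    ("Suboptimal experience", "negative"),
    ("Very enjoyable time", "positive"),
]
cl = TinyNaiveBayes(train)
for text in ["Delicious food.", "Very slow internet.", "suboptimal", "enjoyable"]:
    print(text, "->", cl.classify(text))
```

With so few examples, a word seen only once in a class already decides the label; text made only of unseen words falls back to the class priors.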

In [9]:
from textblob.classifiers import NaiveBayesClassifier
# Train the NaiveBayesClassifier
train = [
  ('Slow internet.', 'negative'),
  ('Delicious food', 'positive'),
  ('Suboptimal experience', 'negative'),
  ('Very enjoyable time', 'positive'),
  ('delicious food.', 'neg')  # inconsistent label: this adds a third class, 'neg'
]
cl = NaiveBayesClassifier(train)
# And then we try to classify some sample sentences.
blob = TextBlob("Delicious food. Very Slow internet. Suboptimal experience. Enjoyable food.", classifier=cl)
for s in blob.sentences:
  print(s)
  print(s.classify())

Delicious food.
positive
Very Slow internet.
negative
Suboptimal experience.
negative
Enjoyable food.
positive


We now redo the classification of our aspects using the trained model.

In [10]:
from textblob import TextBlob
for aspect in aspects:
  blob = TextBlob(aspect['description'], classifier=cl)  
  aspect['sentiment'] = blob.classify()
print(aspects)

[{'aspect': 'food', 'description': 'delicious', 'sentiment': 'neg'}, {'aspect': 'time', 'description': 'very enjoyable', 'sentiment': 'positive'}, {'aspect': 'meal', 'description': 'tasty', 'sentiment': 'negative'}, {'aspect': 'internet', 'description': 'slow', 'sentiment': 'negative'}, {'aspect': 'experience', 'description': 'suboptimal', 'sentiment': 'negative'}]


## Exercise

Add the sentences 'life is good' and 'The match was fantastic'.

In [19]:
sentences = [
  'The food we had yesterday was delicious',
  'My time in Italy was very enjoyable',
  'I found the meal to be tasty',
  'The internet was slow.',
  'Our experience was suboptimal',
  'life is good',
  'The match was fantastic'
]

Create a list of dictionaries pairing each noun with its adjectives.

In [20]:
aspects = []
for sentence in sentences:
  doc = nlp(sentence)
  descriptive_term = ''
  target = ''
  for token in doc:
    if token.dep_ == 'nsubj' and token.pos_ == 'NOUN':
      target = token.text
    if token.pos_ == 'ADJ':
      prepend = ''
      for child in token.children:
        if child.pos_ != 'ADV':
          continue
        prepend += child.text + ' '
      descriptive_term = prepend + token.text  
    
  aspects.append({'aspect': target,'description': descriptive_term})
print(aspects)

[{'aspect': 'food', 'description': 'delicious'}, {'aspect': 'time', 'description': 'very enjoyable'}, {'aspect': 'meal', 'description': 'tasty'}, {'aspect': 'internet', 'description': 'slow'}, {'aspect': 'experience', 'description': 'suboptimal'}, {'aspect': 'life', 'description': 'good'}, {'aspect': 'match', 'description': 'fantastic'}]


Use TextBlob to extract the polarity of each adjective.

In [21]:
from textblob import TextBlob
for aspect in aspects:
  aspect['sentiment'] = TextBlob(aspect['description']).sentiment
print(aspects)

[{'aspect': 'food', 'description': 'delicious', 'sentiment': Sentiment(polarity=1.0, subjectivity=1.0)}, {'aspect': 'time', 'description': 'very enjoyable', 'sentiment': Sentiment(polarity=0.65, subjectivity=0.78)}, {'aspect': 'meal', 'description': 'tasty', 'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)}, {'aspect': 'internet', 'description': 'slow', 'sentiment': Sentiment(polarity=-0.30000000000000004, subjectivity=0.39999999999999997)}, {'aspect': 'experience', 'description': 'suboptimal', 'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)}, {'aspect': 'life', 'description': 'good', 'sentiment': Sentiment(polarity=0.7, subjectivity=0.6000000000000001)}, {'aspect': 'match', 'description': 'fantastic', 'sentiment': Sentiment(polarity=0.4, subjectivity=0.9)}]


Train a NaiveBayesClassifier on training sentences.

In [22]:
# Train the NaiveBayesClassifier
train = [
  ('Slow internet.', 'negative'),
  ('Delicious food', 'positive'),
  ('Suboptimal experience', 'negative'),
  ('Very enjoyable time', 'positive'),
  ('delicious food.', 'neg')  # inconsistent label: this adds a third class, 'neg'
]
cl = NaiveBayesClassifier(train)
# And then we try to classify some sample sentences.
blob = TextBlob("Delicious food. Very Slow internet. Suboptimal experience. Enjoyable food.", classifier=cl)
for s in blob.sentences:
  print(s)
  print(s.classify())

Delicious food.
positive
Very Slow internet.
negative
Suboptimal experience.
negative
Enjoyable food.
positive


Test our model on the sentences from the start.

In [23]:
for aspect in aspects:
  blob = TextBlob(aspect['description'], classifier=cl)  
  aspect['sentiment'] = blob.classify()
print(aspects)

[{'aspect': 'food', 'description': 'delicious', 'sentiment': 'neg'}, {'aspect': 'time', 'description': 'very enjoyable', 'sentiment': 'positive'}, {'aspect': 'meal', 'description': 'tasty', 'sentiment': 'negative'}, {'aspect': 'internet', 'description': 'slow', 'sentiment': 'negative'}, {'aspect': 'experience', 'description': 'suboptimal', 'sentiment': 'negative'}, {'aspect': 'life', 'description': 'good', 'sentiment': 'negative'}, {'aspect': 'match', 'description': 'fantastic', 'sentiment': 'negative'}]


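We notice that the model misclassified the new sentences: "good" and "fantastic" are labelled negative. Neither word appears in the training data, so the classifier has no direct evidence about them.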
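A quick way to see why these words are misclassified is to check which words of each description never occur in the training sentences. The sketch below does this with a naive whitespace tokenizer (an assumption, not TextBlob's tokenizer), both with exact case and after lowercasing.

```python
# Training sentences and extracted descriptions copied from the cells above.
train = [
    ("Slow internet.", "negative"),
    ("Delicious food", "positive"),
    ("Suboptimal experience", "negative"),
    ("Very enjoyable time", "positive"),
    ("delicious food.", "neg"),
]
descriptions = ["delicious", "very enjoyable", "tasty", "slow",
                "suboptimal", "good", "fantastic"]


def words(text: str, lowercase: bool) -> set[str]:
    """Whitespace tokens with periods removed, optionally lowercased."""
    text = text.replace(".", "")
    return set((text.lower() if lowercase else text).split())


def unseen_words(lowercase: bool) -> dict[str, list[str]]:
    """Map each description to its words absent from the training vocabulary."""
    vocab = set().union(*(words(text, lowercase) for text, _ in train))
    result = {}
    for d in descriptions:
        missing = sorted(words(d, lowercase) - vocab)
        if missing:
            result[d] = missing
    return result


print("exact case:", unseen_words(lowercase=False))
print("lowercased:", unseen_words(lowercase=True))
```

With exact-case matching, "slow" and "suboptimal" look unknown even though "Slow" and "Suboptimal" are in the training data, and lowercase "delicious" only occurs in the example labelled `neg`. If the classifier matches words case-sensitively, that would be consistent with the outputs above; lowercasing both training and test text is a cheap thing to try.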

Let's try adding training sentences and retraining the model.

In [24]:
# Train the NaiveBayesClassifier
train = [
  ('Slow internet.', 'negative'),
  ('Delicious food', 'positive'),
  ('Suboptimal experience', 'negative'),
  ('Very enjoyable time', 'positive'),
  ('delicious food.', 'neg'),  # inconsistent label: this adds a third class, 'neg'
  ('fast computer.', 'positive')
]
cl = NaiveBayesClassifier(train)
# And then we try to classify some sample sentences.
blob = TextBlob("Delicious food. Very Slow internet. Suboptimal experience. Enjoyable food.", classifier=cl)
for s in blob.sentences:
  print(s)
  print(s.classify())

Delicious food.
positive
Very Slow internet.
negative
Suboptimal experience.
negative
Enjoyable food.
positive


Test the new model.

In [25]:
for aspect in aspects:
  blob = TextBlob(aspect['description'], classifier=cl)  
  aspect['sentiment'] = blob.classify()
print(aspects)

[{'aspect': 'food', 'description': 'delicious', 'sentiment': 'neg'}, {'aspect': 'time', 'description': 'very enjoyable', 'sentiment': 'positive'}, {'aspect': 'meal', 'description': 'tasty', 'sentiment': 'positive'}, {'aspect': 'internet', 'description': 'slow', 'sentiment': 'positive'}, {'aspect': 'experience', 'description': 'suboptimal', 'sentiment': 'positive'}, {'aspect': 'life', 'description': 'good', 'sentiment': 'positive'}, {'aspect': 'match', 'description': 'fantastic', 'sentiment': 'positive'}]


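### The new sentences are now classified correctly

Note, however, that "slow" and "suboptimal" have flipped to positive and "delicious" is still labelled `neg`, so the improvement is only partial.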