# Rule based aspect extraction

Based on [A Rule-Based Approach to Aspect Extraction from Product Reviews](https://aclanthology.org/W14-5905) (Poria et al., 2014), we implement the rules as 8 separate functions and apply them to the reviews in series.

Many other approaches and libraries exist. I chose this one because of its simplicity (short implementation time) and the control it gives me over the extraction.

* Aspect-based opinion mining focuses on the extraction of aspects (or product features) from opinionated text

* **Explicit** aspects directly name the target of the opinion
  - e.g. I love the *touchscreen* of my phone but the *battery life* is so short

* Aspects can also be expressed indirectly through an **implicit aspect clue** (IAC)
  - e.g. This is the best phone one could have. It is *lightweight*, *sleek* and *attractive*. I found it very *user-friendly* and *easy to manipulate*
  - `lightweight` -> weight
  - `sleek` and `attractive` -> appearance
  - `user-friendly` -> interface
  - `easy to manipulate` -> functionality

* Detect explicit aspects and IACs from opinionated documents.
* Map IACs to their respective aspect **categories**.
* IACs can be single words (`sleek`) or multi-word expressions (`easy to manipulate`), and can have different parts of speech (POS): adjectives, nouns, verbs.
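
Once an IAC has been detected, mapping it to its category can be as simple as a dictionary lookup. A minimal sketch, assuming a hypothetical `IAC_TO_CATEGORY` table built only from the examples above (neither the table nor the `map_iac` helper comes from the paper or from SenticNet):

```python
from typing import Optional

# Hypothetical lookup table built from the examples above; a real system
# would need a much larger, curated IAC lexicon.
IAC_TO_CATEGORY = {
    "lightweight": "weight",
    "sleek": "appearance",
    "attractive": "appearance",
    "user-friendly": "interface",
    "easy to manipulate": "functionality",
}

def map_iac(iac: str) -> Optional[str]:
    """Return the aspect category of an implicit aspect clue, or None if it is unknown."""
    # normalize case and surrounding whitespace so that "Sleek " matches "sleek"
    return IAC_TO_CATEGORY.get(iac.strip().lower())

print(map_iac("Sleek"))               # appearance
print(map_iac("easy to manipulate"))  # functionality
print(map_iac("heavy"))               # None
```

A real lexicon would also have to cope with inflected forms (for instance by looking up lemmas) and with clues that belong to more than one category.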

The proposed aspect parser is based on two general sets of rules:
1. Rules for sentences that have a subject-verb relation.
2. Rules for sentences that do not have a subject-verb relation.

Import the NLP library and load a pre-trained English pipeline. Since we do not need named entities for this task, we exclude the `EntityRecognizer` (`ner`) component from the spaCy pipeline.

In [None]:
import spacy
nlp = spacy.load("en_core_web_md", exclude=["ner"])

Load the SenticNet lexicon from a local CSV file; its `Aspect` column holds the lexicon entries.

In [None]:
import pandas as pd
senticnet = pd.read_csv("senticnet.csv")

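Import the `Aspect` data structure (from the local `aspect_estraction` module) used to store each extracted aspect, the rule that produced it and its sentiment.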

In [None]:
from aspect_estraction import Aspect

Load the VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis tool. It is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.

In [None]:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # the VADER lexicon is not bundled with NLTK
sia = SentimentIntensityAnalyzer()


In [None]:
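def explore(doc):
    '''Print, for each token: text, dependency label, coarse POS, fine-grained tag, children, head and index'''
    for t in doc:
        print(t, t.dep_, t.pos_, t.tag_, [c for c in t.children], t.head, t.i)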

In [None]:
def senticnet_search(query):
    '''Return True if the list of words {query}, joined by "_" and lowercased, is in the SenticNet lexicon'''
    processed_query = ("_".join(query)).lower()
    return not senticnet[senticnet["Aspect"] == processed_query].empty
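
`senticnet_search` joins the query words with underscores and lowercases them, so multi-word concepts are matched as SenticNet-style keys such as `easy_to_use`. The same idea without pandas, as a sketch that assumes a small in-memory set of concepts (the entries below are illustrative placeholders, not taken from the real lexicon; `concept_key` and `in_lexicon` are hypothetical helpers):

```python
from typing import Iterable, Set

# Illustrative concepts only; the notebook reads the real entries from senticnet.csv
CONCEPTS: Set[str] = {"battery_life", "easy_to_use", "lightweight"}

def concept_key(words: Iterable[str]) -> str:
    """Build a SenticNet-style key: the words joined by '_' and lowercased."""
    return "_".join(words).lower()

def in_lexicon(words: Iterable[str], concepts: Set[str] = CONCEPTS) -> bool:
    """Set-based equivalent of senticnet_search."""
    return concept_key(words) in concepts

print(concept_key(["Battery", "life"]))   # battery_life
print(in_lexicon(["easy", "to", "use"]))  # True
print(in_lexicon(["use", "easy"]))        # False: the key depends on word order
```

Because the key is order-sensitive, `rule2` tries both orders when it looks up a verb together with one of its dependents. Loading the `Aspect` column into a Python set once (e.g. `set(senticnet["Aspect"])`) would also turn each lookup into a hash-based membership test instead of a boolean mask over the whole DataFrame.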


### Subject Noun Rule:
* **Trigger**: the token is a syntactic subject
* **Behavior**: if a token *h* is in a subject-noun (`nsubj`) relationship with a word *t*:
    - if *t* has any adverbial or adjective modifier and the modifier exists in SenticNet, then *t* is extracted as an aspect.

In [None]:
def rule1(doc):
    '''
    Aspect extraction following Subject Noun Rule 1.
    If the head t of a subject has an adjectival or adverbial modifier,
    t is the aspect (t must not be a stop word and must be in SenticNet).
    '''
    for token in doc:
        if token.dep_ in ["nsubj", "nsubjpass"]:
            for child in token.head.children:
                if child.dep_ in ["amod", "advmod"] and not token.head.is_stop:
                    # the SenticNet lookup is done on t itself, not on its modifier
                    if senticnet_search([token.head.lemma_]):
                        return Aspect(aspect=token.head.lemma_, rule=1)

In [None]:
doc = nlp("This mp3 player also costs a lot less than the ipod.")
rule1(doc)

* if the sentence does not have an auxiliary verb (e.g. `is`, `was`, `would`, `should`, `could`), then:
    - if the verb t is modified by an adjective or an adverb, or is in an adverbial clause modifier relation with another token, then both h and t are extracted as aspects. In the example below, battery is in a subject relation with lasts, and lasts is modified by the adjective modifier little, hence both the aspects last and battery are extracted.
      - *The battery lasts little.*
    - if t has a direct object relation with a token n, the POS of n is `Noun` and n exists in SenticNet, then n is extracted as an aspect. In the example below, like is in a direct object relation with lens, so the aspect lens is extracted.
      - *I like the lens of this camera.*
    - if t has a direct object relation with a token n, the POS of n is `Noun` and n does not exist in SenticNet, then n is extracted as an aspect. In addition, if another token n1 is connected to n through any dependency relation in the parse tree and the POS of n1 is `Noun`, then n1 is also extracted as an aspect. In the example below, like is in a direct object relation with beauty, which is connected to screen via a preposition relation, so the aspects screen and beauty are extracted.
      - *I like the beauty of the screen.*
    - if t is in an open clausal complement relation with a token t1, then the aspect t-t1 is extracted if t-t1 exists in the opinion lexicon. If t1 is connected to a token t2 whose POS is `Noun`, then t2 is extracted as an aspect. In the example below, like and comment are in an open clausal complement relation, and comment is connected to camera through a preposition relation. The POS of camera is `Noun`, hence camera is extracted as an aspect.
      - *I would like to comment on the camera of this phone.*

In [None]:
def rule2(doc):
    '''
    Aspect extraction following Subject Noun Rule 2.
    If the sentence has no auxiliary verb and t (the head of the subject h):
      - has an adjectival/adverbial modifier or an adverbial clause -> h is the aspect
      - has a NOUN direct object n that is in SenticNet -> n is the aspect
      - has a NOUN direct object n not in SenticNet -> n plus the nouns attached
        to n through a preposition form the aspect
    Independently of auxiliaries, if t has an open clausal complement t1:
      - if t1 + one of its children is in SenticNet -> that pair is the aspect
      - otherwise the nouns among the grandchildren of t1 form the aspect
    '''
    # check once whether the sentence contains any auxiliary verb
    aux_presence = any(t.pos_ == "AUX" for t in doc)
    for token in doc:
        if token.dep_ in ["nsubj", "nsubjpass"]:
            for child in token.head.children:
                if child.dep_ in ["amod", "advmod", "advcl"] and not aux_presence and not token.is_stop:
                    return Aspect(aspect=token.lemma_, rule=2)
                if child.dep_ == "dobj" and child.pos_ == "NOUN" and not aux_presence and not child.is_stop:
                    if senticnet_search([child.lemma_]):
                        return Aspect(aspect=child.lemma_, rule=2)
                    # nouns attached to the object through a preposition ("beauty of the screen")
                    prep_nouns = [cococ.lemma_
                                  for coc in child.children if coc.pos_ == "ADP"
                                  for cococ in coc.children if cococ.pos_ == "NOUN"]
                    return Aspect(aspect=" ".join([child.lemma_] + prep_nouns), rule=2)

                if child.dep_ == "xcomp" and not child.is_stop:
                    # t1 together with one of its children, in either order, found in SenticNet
                    for coc in child.children:
                        if senticnet_search([child.lemma_, coc.lemma_]) or senticnet_search([coc.lemma_, child.lemma_]):
                            return Aspect(aspect=" ".join([child.lemma_, coc.lemma_]), rule=2)
                    # otherwise fall back to the nouns among the grandchildren of t1
                    tmp = [cococ.lemma_ for coc in child.children
                           for cococ in coc.children if cococ.pos_ == "NOUN"]
                    if tmp:
                        return Aspect(aspect=" ".join(tmp), rule=2)

In [None]:
doc = nlp("The battery lasts little.")
print(f"EXAMPLE 1: {doc}")
print(rule2(doc))

doc = nlp("I like the lens of this camera.")
print(f"EXAMPLE 2: {doc}")
print(rule2(doc))

doc = nlp("I like the beauty of the screen.")
print(f"EXAMPLE 3: {doc}")
print(rule2(doc))

doc = nlp("I would like to comment on the camera of this phone.")
print(f"EXAMPLE 4: {doc}")
print(rule2(doc))
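
The direct-object sub-cases of rule 2 can be followed without loading a spaCy model. The sketch below uses a hypothetical `Tok` record and a hand-written dependency parse of *I like the beauty of the screen.* (an assumption: `en_core_web_md` may parse it differently), and a plain Python set stands in for SenticNet. As in `rule2`, only nouns reached through a preposition are collected.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Tok:
    lemma: str
    pos: str   # coarse POS tag, spaCy-style
    dep: str   # dependency label, spaCy-style
    head: int  # index of the head token; the root points to itself

def children(toks: List[Tok], i: int) -> List[int]:
    """Indices of the tokens whose head is token i (the root is not its own child)."""
    return [j for j, t in enumerate(toks) if t.head == i and j != i]

def dobj_aspects(toks: List[Tok], lexicon: Set[str]) -> List[str]:
    """Direct-object sub-cases of Subject Noun Rule 2 on a hand-parsed sentence."""
    for t in toks:
        if t.dep != "nsubj":
            continue
        for c in children(toks, t.head):
            obj = toks[c]
            if obj.dep == "dobj" and obj.pos == "NOUN":
                if obj.lemma in lexicon:
                    return [obj.lemma]
                # n is not in the lexicon: also collect the nouns attached to n via a preposition
                extra = [toks[g].lemma
                         for p in children(toks, c) if toks[p].pos == "ADP"
                         for g in children(toks, p) if toks[g].pos == "NOUN"]
                return [obj.lemma] + extra
    return []

# Hand-written parse of "I like the beauty of the screen." (assumed, not produced by spaCy)
sentence = [
    Tok("I", "PRON", "nsubj", 1),
    Tok("like", "VERB", "ROOT", 1),
    Tok("the", "DET", "det", 3),
    Tok("beauty", "NOUN", "dobj", 1),
    Tok("of", "ADP", "prep", 3),
    Tok("the", "DET", "det", 6),
    Tok("screen", "NOUN", "pobj", 4),
    Tok(".", "PUNCT", "punct", 1),
]
print(dobj_aspects(sentence, lexicon=set()))       # ['beauty', 'screen']
print(dobj_aspects(sentence, lexicon={"beauty"}))  # ['beauty']
```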

- A copula is the relation between the complement of a copular verb and the copular verb. If the token t is in a copula relation with a copular verb and the copular verb exists in the implicit aspect lexicon, then t is extracted as an aspect term. In the example below, expensive is extracted as an aspect.
  - *The car is expensive.*

In [None]:
def rule3(doc):
    '''
    Subject Noun Rule 3.
    Copular sentence (AUX head) with an adjectival complement t -> t is the aspect
    '''
    for token in doc:
        if token.dep_ in ["nsubj", "nsubjpass"]:
            for child in token.head.children:
                if child.dep_ in ["acomp"] and token.head.pos_ == "AUX" and not child.is_stop:
                    # TODO: the implicit aspect lexicon check from the paper is not implemented yet
                    return Aspect(aspect=child.lemma_, rule=3)

In [None]:
doc = nlp("The car is expensive.")
rule3(doc)

- If the token t is in a copula relation with a copular verb and the POS of h is Noun, then h is extracted as an explicit aspect. In the example below, camera is extracted as an aspect.
  - *The camera is nice.*

In [None]:
def rule4(doc):
    '''
    Subject Noun Rule
    Sentence with auxiliary verb (copula) and token as complement and a Noun -> noun is aspect
    '''
    for token in doc:
        if token.dep_ in ["nsubj", "nsubjpass"]:
            for child in token.head.children:
                if child.dep_ in ["acomp"] and token.head.pos_ == "AUX" and token.pos_ == "NOUN" and not token.is_stop:
                    return Aspect(aspect=token.lemma_, rule=4)

In [None]:
doc = nlp("The camera is nice.")
print(rule4(doc))
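
Rules 3 and 4 fire on the same copular pattern (a subject whose head is an auxiliary, plus an `acomp` complement) and differ only in which token they keep. A sketch on a hand-written parse of *The camera is nice.*, using a hypothetical `Tok` record in place of spaCy tokens (the parse is an assumption); like `rule3`, it skips the implicit aspect lexicon check:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Tok:
    lemma: str
    pos: str
    dep: str
    head: int  # index of the head token; the root points to itself

def children(toks: List[Tok], i: int) -> List[int]:
    """Indices of the tokens whose head is token i."""
    return [j for j, t in enumerate(toks) if t.head == i and j != i]

def copula_aspects(toks: List[Tok]) -> Tuple[Optional[str], Optional[str]]:
    """(implicit, explicit) aspects of a copular sentence, following rules 3 and 4."""
    for t in toks:
        # the subject must hang off an auxiliary (copular) verb
        if t.dep != "nsubj" or toks[t.head].pos != "AUX":
            continue
        for c in children(toks, t.head):
            if toks[c].dep == "acomp":
                implicit = toks[c].lemma                         # rule 3: the complement
                explicit = t.lemma if t.pos == "NOUN" else None  # rule 4: the noun subject
                return implicit, explicit
    return None, None

# Hand-written parse of "The camera is nice." (assumed, not produced by spaCy)
sentence = [
    Tok("the", "DET", "det", 1),
    Tok("camera", "NOUN", "nsubj", 2),
    Tok("be", "AUX", "ROOT", 2),
    Tok("nice", "ADJ", "acomp", 2),
    Tok(".", "PUNCT", "punct", 2),
]
print(copula_aspects(sentence))  # ('nice', 'camera')
```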

- If the token t is in a copula relation with a copular verb, the copular verb is connected to a token t1 through any dependency relation and t1 is a verb, then both t1 and t are extracted as implicit aspect terms, as long as they exist in the implicit aspect lexicon. In the example below, lightweight is in a copula relation with is, and lightweight is connected to the word carry by an open clausal complement relation. Here, both lightweight and carry are extracted as aspects.
  - *The phone is very lightweight to carry.*

In [None]:
def rule5(doc):
    '''
    Subject Noun Rule
    Copular sentence (AUX head) with complement t -> t plus the verbs attached to t form the aspect
    '''
    for token in doc:
        if token.dep_ in ["nsubj", "nsubjpass"]:
            for child in token.head.children:
                if child.dep_ in ["acomp"] and token.head.pos_ == "AUX" and not child.is_stop:
                    # TODO: the implicit aspect lexicon check from the paper is not implemented yet
                    tmp = " ".join(
                        [child.lemma_]+[coc.lemma_ for coc in child.children if coc.pos_ == "VERB"])
                    return Aspect(aspect=tmp, rule=5)


In [None]:
doc = nlp("The phone is very lightweight to carry.")
rule5(doc)
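
Rule 5 extends the copular pattern with the verbs attached to the complement. A sketch on a hand-written parse of *The phone is very lightweight to carry.* (hypothetical `Tok` record, assumed parse). Unlike `rule5`, it accepts an optional set standing in for the implicit aspect lexicon, so the "as long as they exist in the implicit aspect lexicon" condition can be seen at work:

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Tok:
    lemma: str
    pos: str
    dep: str
    head: int  # index of the head token; the root points to itself

def children(toks: List[Tok], i: int) -> List[int]:
    """Indices of the tokens whose head is token i."""
    return [j for j, t in enumerate(toks) if t.head == i and j != i]

def copula_verb_aspects(toks: List[Tok], lexicon: Optional[Set[str]] = None) -> List[str]:
    """Rule 5: the copular complement t plus the verbs attached to it.
    If a lexicon is given, only the terms found in it are kept."""
    for t in toks:
        if t.dep != "nsubj" or toks[t.head].pos != "AUX":
            continue
        for c in children(toks, t.head):
            if toks[c].dep == "acomp":
                terms = [toks[c].lemma] + [toks[v].lemma for v in children(toks, c)
                                           if toks[v].pos == "VERB"]
                if lexicon is not None:
                    terms = [w for w in terms if w in lexicon]
                return terms
    return []

# Hand-written parse of "The phone is very lightweight to carry." (assumed)
sentence = [
    Tok("the", "DET", "det", 1),
    Tok("phone", "NOUN", "nsubj", 2),
    Tok("be", "AUX", "ROOT", 2),
    Tok("very", "ADV", "advmod", 4),
    Tok("lightweight", "ADJ", "acomp", 2),
    Tok("to", "PART", "aux", 6),
    Tok("carry", "VERB", "xcomp", 4),
    Tok(".", "PUNCT", "punct", 2),
]
print(copula_verb_aspects(sentence))                           # ['lightweight', 'carry']
print(copula_verb_aspects(sentence, lexicon={"lightweight"}))  # ['lightweight']
```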

### Non-Subject Noun Rules

- if an adjective or adverb h is in an infinitival or open clausal complement relation (`ccomp`, `xcomp`) with a token t and h exists in the implicit aspect lexicon, then h is extracted as an aspect. In the example below, big is extracted as an aspect as it is connected to hold through a clausal complement relation.
    - *Very big to hold.*

In [None]:
def rule6(doc):
    '''
    NO Subject Noun Rule
    Sentence with adjective or adverb h in infinitival or open clausal complement -> if h in IAC lexicon -> h aspect
    '''
    for token in doc:
        if token.pos_ in ["ADJ", "ADV"]:
            for child in token.children:
                if child.dep_ in ["ccomp", "xcomp"] and not token.is_stop:
                    # TODO: the implicit aspect lexicon check from the paper is not implemented yet
                    return Aspect(aspect=token.lemma_, rule=6)

In [None]:
doc = nlp("Very big to hold.")
rule6(doc)

- if a token h is connected to a noun t through a prepositional relation, then both h and t are extracted as aspects. In the example below, sleekness is extracted as an aspect.
    - *Love the sleekness of the player.*

In [None]:
def rule7(doc):
    '''
    NO Subject Noun Rule
    h token connected to noun t through preposition -> h+t aspect
    '''
    for token in doc:
        for child in token.children:
            if child.dep_ == "prep":
                for child_of_child in child.children:
                    if child_of_child.pos_ == "NOUN" and not token.is_stop:
                        return Aspect(aspect=f"{token.lemma_} {child_of_child.lemma_}", rule=7)

In [None]:
doc = nlp("Love the sleekness of the player.")
rule7(doc)
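
Rule 7 pairs any token with the nouns it reaches through a preposition. A sketch on a hand-written parse of *Love the sleekness of the player.* (hypothetical `Tok` record, assumed parse):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Tok:
    lemma: str
    pos: str
    dep: str
    head: int  # index of the head token; the root points to itself

def children(toks: List[Tok], i: int) -> List[int]:
    """Indices of the tokens whose head is token i."""
    return [j for j, t in enumerate(toks) if t.head == i and j != i]

def prep_aspects(toks: List[Tok]) -> List[Tuple[str, str]]:
    """Rule 7: every token h linked to a noun t through a preposition gives the pair (h, t)."""
    pairs = []
    for h_idx, h in enumerate(toks):
        for p in children(toks, h_idx):
            if toks[p].dep == "prep":
                for n in children(toks, p):
                    if toks[n].pos == "NOUN":
                        pairs.append((h.lemma, toks[n].lemma))
    return pairs

# Hand-written parse of "Love the sleekness of the player." (assumed)
sentence = [
    Tok("love", "VERB", "ROOT", 0),
    Tok("the", "DET", "det", 2),
    Tok("sleekness", "NOUN", "dobj", 0),
    Tok("of", "ADP", "prep", 2),
    Tok("the", "DET", "det", 5),
    Tok("player", "NOUN", "pobj", 3),
    Tok(".", "PUNCT", "punct", 0),
]
print(prep_aspects(sentence))  # [('sleekness', 'player')]
```

Unlike `rule7`, which returns after the first match, this version collects every pair, which matters for sentences with several prepositional phrases.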

- if a token h is in a direct object relation (`dobj`) with a token t, then t is extracted as an aspect. In the example below, mention is in a direct object relation with price, hence price is extracted as an aspect.
    - *Not to mention the price of the phone.*

In [None]:
def rule8(doc):
    '''
    NO Subject Noun Rule
    h token connected with direct object with t -> t aspect
    '''
    for token in doc:
        for child in token.children:
            if child.dep_ == "dobj" and not child.is_stop:
                return Aspect(aspect=child.lemma_, rule=8)

In [None]:
doc = nlp("Not to mention the price of the phone.")
rule8(doc)

### Additional rules

At this point I am not applying these two additional rules in `extract_aspect_sentence`; the functions below only print what they would add.

- For each aspect term extracted above, if an aspect term h is in a coordination or conjunct relation with another token t, then t is also extracted as an aspect. In the example below, amazing is first extracted as an aspect term. As amazing is in a conjunct relation with easy, use is also extracted as an aspect.
    - *The camera is amazing and easy to use.*

In [None]:
def add_rule1(token):
    '''
    Additional Rule 1.
    Takes care of the cases where the aspect is hidden behind a conjunction:
    prints the open clausal complements (xcomp) of the tokens in conjunct relation with {token}
    '''
    print([coc for c in token.children if c.dep_ in ["conj"]
           for coc in c.children if coc.dep_ == "xcomp"])

In [None]:
doc = nlp("The camera is amazing and easy to use.")
print(rule3(doc))
add_rule1(doc[3])
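
A sketch of additional rule 1 on a hand-written parse of *The camera is amazing and easy to use.* (hypothetical `Tok` record, assumed parse). It returns the conjunct itself, as the rule states, plus the conjunct's open clausal complement, which is what `add_rule1` prints and what lets *use* be extracted:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Tok:
    lemma: str
    pos: str
    dep: str
    head: int  # index of the head token; the root points to itself

def children(toks: List[Tok], i: int) -> List[int]:
    """Indices of the tokens whose head is token i."""
    return [j for j, t in enumerate(toks) if t.head == i and j != i]

def expand_conjuncts(toks: List[Tok], aspect_idx: int) -> List[str]:
    """Tokens in conjunct relation with an extracted aspect, plus their xcomp children."""
    extra = []
    for c in children(toks, aspect_idx):
        if toks[c].dep == "conj":
            extra.append(toks[c].lemma)
            extra += [toks[x].lemma for x in children(toks, c) if toks[x].dep == "xcomp"]
    return extra

# Hand-written parse of "The camera is amazing and easy to use." (assumed)
sentence = [
    Tok("the", "DET", "det", 1),
    Tok("camera", "NOUN", "nsubj", 2),
    Tok("be", "AUX", "ROOT", 2),
    Tok("amazing", "ADJ", "acomp", 2),
    Tok("and", "CCONJ", "cc", 3),
    Tok("easy", "ADJ", "conj", 3),
    Tok("to", "PART", "aux", 7),
    Tok("use", "VERB", "xcomp", 5),
    Tok(".", "PUNCT", "punct", 2),
]
print(expand_conjuncts(sentence, 3))  # ['easy', 'use']
```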

- A noun compound modifier of an NP is any noun that modifies the head noun. If t is extracted as an aspect and t has a noun compound modifier h, then the aspect h-t is extracted and t is removed from the aspect list. In the example below, chicken and casserole are in a noun compound modifier relation, so only chicken casserole is extracted as an aspect.
  - *We ordered the chicken casserole, but what we got were a few small pieces of chicken, all dark meat and on the bone.*

In [None]:
def add_rule2(token):
    '''
    Additional Rule 2. 
    Takes care of compound nouns: prints the noun compound modifiers of {token} followed by {token}
    '''
    print([c for c in token.children if c.dep_ == "compound" and c.pos_=="NOUN"]+[token])

In [None]:
doc = nlp("We loved the chicken casserole.")
print(rule2(doc))

add_rule2(doc[4])
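
Additional rule 2 replaces an aspect with its compound form instead of adding a second aspect. A sketch on a hand-written parse of *We loved the chicken casserole.* (hypothetical `Tok` record, assumed parse), where `aspect_indices` are the positions of the already extracted aspects:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Tok:
    lemma: str
    pos: str
    dep: str
    head: int  # index of the head token; the root points to itself

def children(toks: List[Tok], i: int) -> List[int]:
    """Indices of the tokens whose head is token i."""
    return [j for j, t in enumerate(toks) if t.head == i and j != i]

def merge_compounds(toks: List[Tok], aspect_indices: List[int]) -> List[str]:
    """Replace each aspect t by 'h t' when t has noun compound modifiers h."""
    merged = []
    for t in aspect_indices:
        mods = [toks[c].lemma for c in children(toks, t)
                if toks[c].dep == "compound" and toks[c].pos == "NOUN"]
        # t alone is dropped: only the compound form is kept
        merged.append(" ".join(mods + [toks[t].lemma]))
    return merged

# Hand-written parse of "We loved the chicken casserole." (assumed)
sentence = [
    Tok("we", "PRON", "nsubj", 1),
    Tok("love", "VERB", "ROOT", 1),
    Tok("the", "DET", "det", 4),
    Tok("chicken", "NOUN", "compound", 4),
    Tok("casserole", "NOUN", "dobj", 1),
    Tok(".", "PUNCT", "punct", 1),
]
print(merge_compounds(sentence, [4]))  # ['chicken casserole']
```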

In [None]:
SubjectRule = {"rule1": rule1, "rule2": rule2, "rule3": rule3,
               "rule4": rule4, "rule5": rule5}
NoSubjectRule = {"rule6": rule6, "rule7": rule7, "rule8": rule8}

def extract_aspect_sentence(doc):
    '''Extract the aspects from a sentence and attach the sentence-level VADER compound score to each'''
    has_subject = any(token.dep_ in ["nsubj", "nsubjpass"] for token in doc)
    # sentences with a subject use rules 1-5, the others rules 6-8
    rules = SubjectRule if has_subject else NoSubjectRule
    aspects = []
    for rule in rules.values():
        if a := rule(doc):
            if a not in aspects:
                a.sentiment = sia.polarity_scores(doc.text).get('compound')
                aspects.append(a)
    return aspects

In [None]:
from spacy.language import Language

def get_aspects(text: str, nlp: Language):
    '''Split {text} into sentences and extract the aspects of each one'''
    doc = nlp(text)
    aspects = []
    for sent in doc.sents:
        aspects += extract_aspect_sentence(sent)
    return aspects

In [None]:
get_aspects("I brought it because I thought it would make my house smell like a Christmas tree but the smell is very dull and I have to leave it lit for a very long time to get even a modest smell in the house from it. This was my first buying this brand of candle and expected a stronger scent based off of what people told me. I smelled other candles at Walmart from Yankee and they were stronger so I think it might just be this scent.", nlp)

In [None]:
get_aspects("This candle has NO SCENT at all. The worst candle I have ever purchased. I'm never buying a Yankee candle again. First time purchase by this brand, and the last.", nlp)