### Rule-Based Aspect Extraction and Sentiment Analysis
Aspect terms are extracted with part-of-speech tagging and dependency parsing (spaCy library). Done by Anood.

In [3]:
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_sm')

#### Rule 1: Extract adjective-noun pairs (adjectives that modify a noun, typically placed directly before it).
Output for each sentence is {'aspect term': [list of adjectives]}

In [2]:
sentences1 = ['The root causes were identified as follows: Root Cause for Sub-Problem 1: Inadequate procedural guidance and unclear coordination between applicable proceduers',
            'This ultimately created an environment that promulgated a human error-likely environment.” More specifically, the RCE team determined that the environment consisted of poor communication, lack of engineering leadership, too much reliance on vendor designs, time pressure, and distractions. ',
            'Also, equipment problems due to aging have led to an increasingly negative trend in the station’s Deficient Critical Component Backlog Orders. ',
             'Mr. Baldwin stated the deficient performance was caused by maintenance procedural inadequacy which allowed work to proceed with the relay energized.'
            ]

extracted_aspects = []

for sentence in sentences1:
    doc = nlp(sentence)
    noun_adj_pairs = {}
    for token in doc:
        if token.pos_ != 'NOUN':
            continue
        # Adjectives attached to this noun in the dependency tree
        # (stored as spaCy tokens, which is why they print without quotes)
        adj = [child for child in token.children if child.pos_ == 'ADJ']
        if adj:
            noun_adj_pairs[token.text] = adj
    print(sentence)
    print(noun_adj_pairs, "\n")
    # Keep only sentences that produced at least one pair
    if noun_adj_pairs:
        extracted_aspects.append(noun_adj_pairs)


The root causes were identified as follows: Root Cause for Sub-Problem 1: Inadequate procedural guidance and unclear coordination between applicable proceduers
{'guidance': [Inadequate, procedural], 'coordination': [unclear], 'proceduers': [applicable]} 

This ultimately created an environment that promulgated a human error-likely environment.” More specifically, the RCE team determined that the environment consisted of poor communication, lack of engineering leadership, too much reliance on vendor designs, time pressure, and distractions. 
{'environment': [human, likely], 'communication': [poor], 'reliance': [much]} 

Also, equipment problems due to aging have led to an increasingly negative trend in the station’s Deficient Critical Component Backlog Orders. 
{'trend': [negative]} 

Mr. Baldwin stated the deficient performance was caused by maintenance procedural inadequacy which allowed work to proceed with the relay energized.
{'performance': [deficient], 'inadequacy': [procedural]}
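
One detail of the Rule 1 loop: the result dictionary is keyed on the noun's text, so if the same noun occurs more than once in a sentence and more than one occurrence has adjective modifiers, the later entry replaces the earlier one ('environment' appears three times in the second sentence, for instance). A minimal sketch of an accumulating alternative follows. It uses plain (noun, adjective) string pairs rather than spaCy tokens, and the helper name `collect_pairs` and the sample pairs are hypothetical:

```python
from collections import defaultdict


def collect_pairs(pairs):
    """Group (noun, adjective) pairs by noun, keeping every distinct adjective."""
    grouped = defaultdict(list)
    for noun, adjective in pairs:
        if adjective not in grouped[noun]:
            grouped[noun].append(adjective)
    return dict(grouped)


# Hypothetical pairs: the same noun is modified in two different places.
sample_pairs = [
    ("environment", "human"),
    ("communication", "poor"),
    ("environment", "stressful"),
]

# Keyed assignment, as in the Rule 1 loop: the second 'environment' wins.
overwritten = {}
for noun, adjective in sample_pairs:
    overwritten.update({noun: [adjective]})

print(overwritten)
print(collect_pairs(sample_pairs))
```

Whether overwriting or accumulating is preferable depends on how the pairs are used downstream; accumulating keeps more evidence for the sentiment step.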

In [5]:
# Visualize the dependency parse of the first sentence
displacy.render(nlp(sentences1[0]), style='dep', jupyter=True)

#### Rule 2: Extract noun-adjective pairs (where the adjective comes after the noun)
Output for each sentence is {'aspect term': [list of adjectives]}

In [6]:
sentences2 =['Operator training was also poor with respect to how to cope with the conditions with which the operators at Fukushima were faced and availability of portable equipment was inadequate.',
            'This backlog of orders is reported on a weekly basis and was rated as Yellow, or deficient, in 70 of the 78 weeks between June 2012 and December 2013.',
            'the Plant Health Committee meeting the Appendix R Program Manager reported that this fire protection program health was Red'
           ]

for sentence in sentences2:
    doc = nlp(sentence)
    noun_adj_pairs = {}
    for token in doc:
        adj = []
        noun = ""
        if (token.dep_ == 'ROOT') or (token.dep_ == 'conj') or (token.pos_ == 'AUX'):
            for child in token.children:
                if (child.pos_ == 'NOUN') or (child.pos_ == 'PROPN'):  # pos_ is the string tag; pos is an integer ID
                    noun = child.text
                    continue
                if (child.pos_ == 'ADJ') or (child.text in ['Red','Yellow']):
                    adj.append(child.text)
        if noun and adj:
            noun_adj_pairs.update({noun:adj})
    if len(noun_adj_pairs) != 0:
        print(sentence)
        print(noun_adj_pairs, '\n')
        extracted_aspects.append(noun_adj_pairs)


Operator training was also poor with respect to how to cope with the conditions with which the operators at Fukushima were faced and availability of portable equipment was inadequate.
{'training': ['poor'], 'availability': ['inadequate']} 

the Plant Health Committee meeting the Appendix R Program Manager reported that this fire protection program health was Red
{'health': ['Red']} 



In [7]:
displacy.render(nlp(sentences2[0]), style='dep', jupyter=True)

#### Rule 3: Extract aspect terms that are mentioned after "lack of"
Output for each sentence is {'aspect term': ['lack']}

In [8]:
sentences3 = ['This ultimately created an environment that promulgated a human error-likely environment.” More specifically, the RCE team determined that the environment consisted of poor communication, lack of engineering leadership, too much reliance on vendor designs, time pressure, and distractions.',
             'Other failures were due to lack of or inadequate regular inspections, maintenance (cleaning) or replacement and due to lack of a thorough understanding by operators of the infrequently used system.',
             'Mr. Wardell replied that the root cause of the safety system functional failure was due to lack of clear standards for risk '
            ]
             
for sentence in sentences3:
    doc = nlp(sentence)
    noun_adj_pairs = {}
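    for token in doc:
        adj = []
        noun = ""
        # Match the token governed by "lack" (normally the preposition "of")
        if token.head.text.lower() == 'lack':
            for child in token.children:
                # The noun attached to that token is taken as the aspect term
                if child.pos_ == 'NOUN':
                    noun = child.text
                    adj.append(token.head.text)
        if noun and adj:
            noun_adj_pairs.update({noun: adj})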
    print(sentence)
    print(noun_adj_pairs, '\n') 
    if len(noun_adj_pairs) != 0:
        extracted_aspects.append(noun_adj_pairs)

This ultimately created an environment that promulgated a human error-likely environment.” More specifically, the RCE team determined that the environment consisted of poor communication, lack of engineering leadership, too much reliance on vendor designs, time pressure, and distractions.
{'leadership': ['lack']} 

Other failures were due to lack of or inadequate regular inspections, maintenance (cleaning) or replacement and due to lack of a thorough understanding by operators of the infrequently used system.
{'maintenance': ['lack'], 'understanding': ['lack']} 

Mr. Wardell replied that the root cause of the safety system functional failure was due to lack of clear standards for risk 
{'standards': ['lack']} 
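
The Rule 3 traversal can be easier to follow on a hand-built tree. The sketch below mirrors the rule's logic on "lack of clear standards", assuming the parse attaches "of" to "lack" and "standards" to "of". The `Node` class is a hypothetical stand-in for a spaCy token that exposes only the attributes the rule reads; unlike spaCy, its root has no head rather than pointing at itself:

```python
from dataclasses import dataclass, field


@dataclass(eq=False, repr=False)
class Node:
    """Hypothetical stand-in for a spaCy token: text, POS tag, head, children."""
    text: str
    pos_: str
    head: "Node" = None
    children: list = field(default_factory=list)

    def attach(self, child):
        """Make `child` a dependent of this node and return it."""
        child.head = self
        self.children.append(child)
        return child


def lack_of_aspects(nodes):
    """Return {noun: ['lack']} for nouns hanging off a token headed by 'lack'."""
    found = {}
    for node in nodes:
        if node.head is not None and node.head.text.lower() == "lack":
            for child in node.children:
                if child.pos_ == "NOUN":
                    found[child.text] = [node.head.text]
    return found


# Assumed parse: lack -> of -> standards -> clear
lack = Node("lack", "NOUN")
of = lack.attach(Node("of", "ADP"))
standards = of.attach(Node("standards", "NOUN"))
clear = standards.attach(Node("clear", "ADJ"))

result = lack_of_aspects([lack, of, clear, standards])
print(result)  # {'standards': ['lack']}
```

Only "of" has "lack" as its head, so the rule looks one level further down and picks up the noun attached to the preposition.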



In [9]:
displacy.render(nlp(sentences3[-1]), style='dep', jupyter=True)

#### Total list of extracted aspects using all three rules above

In [10]:
extracted_aspects

[{'guidance': [Inadequate, procedural],
  'coordination': [unclear],
  'proceduers': [applicable]},
 {'environment': [human, likely], 'communication': [poor], 'reliance': [much]},
 {'trend': [negative]},
 {'performance': [deficient], 'inadequacy': [procedural]},
 {'training': ['poor'], 'availability': ['inadequate']},
 {'health': ['Red']},
 {'leadership': ['lack']},
 {'maintenance': ['lack'], 'understanding': ['lack']},
 {'standards': ['lack']}]
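
The printed list mixes two value types: Rule 1 stores spaCy token objects (shown without quotes), while Rules 2 and 3 store plain strings. A minimal sketch of flattening the list into uniform lowercase (aspect, modifier) string pairs is given below. `flatten_aspects` is a hypothetical helper, and `FakeToken` is a stand-in for a spaCy token, used only so the example runs without a model:

```python
class FakeToken:
    """Hypothetical stand-in for a spaCy token; str() returns its text."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


def flatten_aspects(aspect_dicts):
    """Turn [{aspect: [modifiers]}, ...] into a list of lowercase string pairs."""
    pairs = []
    for aspect_dict in aspect_dicts:
        for aspect, modifiers in aspect_dict.items():
            for modifier in modifiers:
                # str() handles both token-like objects and plain strings
                pairs.append((aspect.lower(), str(modifier).lower()))
    return pairs


sample = [
    {"guidance": [FakeToken("Inadequate"), FakeToken("procedural")]},
    {"health": ["Red"]},
    {"standards": ["lack"]},
]
flat = flatten_aspects(sample)
print(flat)
```

A flat list of string pairs is convenient for loading into a table or for lexicon lookups that expect lowercase strings.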

#### Sentiment Analysis of extracted aspects

Download the opinion lexicon (a list of positive and negative words; Hu and Liu, KDD-2004) from: https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

In [12]:
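# Lexicon files start with a header of ';' comment lines followed by a blank line.
# Skip both, and keep the words in sets for fast membership tests.
with open('positive-words.txt', encoding="ISO-8859-1") as f:
    lines = f.read().splitlines()
    pos_words = {x for x in lines if x and not x.startswith(';')}

with open('negative-words.txt', encoding="ISO-8859-1") as f:
    lines = f.read().splitlines()
    neg_words = {x for x in lines if x and not x.startswith(';')}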

for noun_adj_pairs in extracted_aspects:
    for key,values in noun_adj_pairs.items():
        for adj in values:
            adj = str(adj).lower()
            if adj in neg_words:
                print('aspect term:', key)
                print('sentiment:', 'negative')
                print('safety traits:', [], '\n')
            elif adj in pos_words:
                print('aspect term:', key)
                print('sentiment:', 'positive')
                print('safety traits:', [], '\n')

aspect term: guidance
sentiment: negative
safety traits: [] 

aspect term: coordination
sentiment: negative
safety traits: [] 

aspect term: communication
sentiment: negative
safety traits: [] 

aspect term: trend
sentiment: negative
safety traits: [] 

aspect term: performance
sentiment: negative
safety traits: [] 

aspect term: training
sentiment: negative
safety traits: [] 

aspect term: availability
sentiment: negative
safety traits: [] 

aspect term: leadership
sentiment: negative
safety traits: [] 

aspect term: maintenance
sentiment: negative
safety traits: [] 

aspect term: understanding
sentiment: negative
safety traits: [] 

aspect term: standards
sentiment: negative
safety traits: [] 
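
The loop above prints one block per matching adjective, so an aspect with two lexicon hits would be printed twice, and aspects whose adjectives appear in neither list (for example 'health': ['Red']) are skipped. A minimal sketch that returns one record per aspect instead is shown below, under stated assumptions: the small word sets are illustrative stand-ins for the downloaded lexicon, and `label_aspects` and the 'mixed' and 'unknown' labels are hypothetical choices, not part of the original analysis:

```python
def label_aspects(aspect_dicts, pos_words, neg_words):
    """Return one {'aspect', 'sentiment'} record per extracted aspect."""
    records = []
    for aspect_dict in aspect_dicts:
        for aspect, modifiers in aspect_dict.items():
            words = {str(m).lower() for m in modifiers}
            has_neg = bool(words & neg_words)
            has_pos = bool(words & pos_words)
            if has_neg and has_pos:
                sentiment = "mixed"
            elif has_neg:
                sentiment = "negative"
            elif has_pos:
                sentiment = "positive"
            else:
                sentiment = "unknown"  # no modifier found in either list
            records.append({"aspect": aspect, "sentiment": sentiment})
    return records


# Illustrative stand-ins for the lexicon files, not their real contents.
demo_pos = {"clear", "thorough"}
demo_neg = {"poor", "inadequate", "lack"}

demo_aspects = [
    {"guidance": ["Inadequate", "procedural"]},
    {"health": ["Red"]},
    {"leadership": ["lack"]},
]
labels = label_aspects(demo_aspects, demo_pos, demo_neg)
for record in labels:
    print(record)
```

Keeping the 'unknown' cases visible makes it easier to spot domain-specific terms, such as color ratings, that a general-purpose lexicon does not cover.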

