## Greedy Sentiment Transformation
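11/13/16 - Use a greedy method that replaces each adjective or adverb whose sentiment agrees with the review's rating by a WordNet antonym, or negates it with "not" when no antonym exists.
Uses the IMDB movie review dataset folder (http://ai.stanford.edu/~amaas/data/sentiment/).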
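The greedy rule can be tried without NLTK or the IMDB data. The sketch below is a minimal stand-in, not the notebook's implementation: `GREEDY_LEXICON` and `GREEDY_ANTONYMS` are hypothetical toy tables playing the roles of VADER word scores and WordNet antonyms, and the tokens are tagged by hand instead of by `nltk.pos_tag`. A word is flipped only when it is an adjective/adverb and its polarity has the same sign as `score - 5`.

```python
# Toy stand-ins (hypothetical values) for VADER word scores and WordNet antonyms
SENTIMENT_TAGS = {'JJ', 'JJR', 'JJS', 'RB', 'RBR', 'RBS'}
GREEDY_LEXICON = {'good': 0.5, 'bad': -0.5, 'brilliant': 0.6, 'boring': -0.4}
GREEDY_ANTONYMS = {'good': ['bad'], 'bad': ['good'], 'boring': ['interesting']}


def greedy_flip(tagged_tokens, score):
    """Flip adjectives/adverbs whose polarity agrees with the review rating (1-10)."""
    out = []
    for word, tag in tagged_tokens:
        polarity = GREEDY_LEXICON.get(word.lower(), 0.0)
        # score > 5 means a positive review and score < 5 a negative one, so a
        # positive product means the word reinforces the review's sentiment
        if tag in SENTIMENT_TAGS and polarity * (score - 5) > 0:
            antonyms = GREEDY_ANTONYMS.get(word.lower(), [])
            # First antonym if there is one, otherwise negate the word
            out.append(antonyms[0] if antonyms else 'not ' + word)
        else:
            out.append(word)
    return ' '.join(out)


tokens = [('a', 'DT'), ('good', 'JJ'), (',', ','), ('brilliant', 'JJ'), ('film', 'NN')]
print(greedy_flip(tokens, score=9))  # a bad , not brilliant film
print(greedy_flip(tokens, score=2))  # a good , brilliant film
```

With a negative rating (`score=2`) the positive words are left alone, because only words that agree with the review's polarity are flipped.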

In [1]:
from sentiment_utils import *
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk import word_tokenize
from nltk.corpus import wordnet as wn



In [4]:
# Penn Treebank tags for adjectives and adverbs, the only words considered for flipping
ADJ_ADV_TAGS = ['JJ', 'JJR', 'JJS', 'RB', 'RBR', 'RBS']


def greedy_transform_func(filename, review, score):
    """Greedily flip each adjective/adverb whose sentiment agrees with the review's rating."""
    sentiment_analyzer = SentimentIntensityAnalyzer()
    tagged_review = nltk.pos_tag(word_tokenize(review))
    transformed_review = []
    for word, tag in tagged_review:
        if tag not in ADJ_ADV_TAGS:
            transformed_review.append(word)
            continue
        # 1) Score the word on its own with VADER (compound score in [-1, 1]).
        word_sentiment = sentiment_analyzer.polarity_scores(word)['compound']
        # 2) Flip it only if its polarity agrees with the review's: ratings above 5
        #    are positive and below 5 negative, so the product is then positive.
        if word_sentiment * (score - 5) > 0:
            # 3) Use the first antonym (alphabetically), or negate with "not" if there is none.
            # TODO: score each antonym's sentiment instead of taking the first one?
            antonyms = get_antonyms(word, tag)
            if len(antonyms) == 0:
                transformed_review.append('not ' + word)
            else:
                transformed_review.append(antonyms[0])
        else:
            transformed_review.append(word)
    return " ".join(transformed_review)


def get_antonyms(word, word_pos):
    """Return the WordNet antonyms of `word` for an adjective/adverb Penn Treebank tag."""
    pos_dict = {'JJ': wn.ADJ, 'JJR': wn.ADJ, 'JJS': wn.ADJ,
                'RB': wn.ADV, 'RBR': wn.ADV, 'RBS': wn.ADV}
    all_antonyms = []
    for syn in wn.synsets(word, pos=pos_dict[word_pos]):
        for lemma in syn.lemmas():
            if lemma.antonyms():
                # WordNet joins the words of multi-word lemmas with underscores
                all_antonyms.append(lemma.antonyms()[0].name().replace('_', ' '))
    # Deduplicate, and sort so that antonyms[0] does not depend on set ordering
    return sorted(set(all_antonyms))
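`get_antonyms` looks the tag up in a six-entry dictionary, so any other tag raises a `KeyError`. A more defensive mapping is sketched below as a hypothetical helper, `penn_to_wordnet`. It hard-codes the letters NLTK uses for `wn.ADJ` (`'a'`) and `wn.ADV` (`'r'`) so the example runs without NLTK, and returns `None` for every other tag, including the particle tag `RP`, which a naive "starts with R" test would misclassify.

```python
# Hypothetical helper: map a Penn Treebank tag to a WordNet POS letter.
# 'a' and 'r' are the values of nltk.corpus.wordnet.ADJ and .ADV respectively.
WORDNET_ADJ = 'a'
WORDNET_ADV = 'r'


def penn_to_wordnet(tag):
    """Return the WordNet POS for an adjective/adverb tag, or None for any other tag."""
    if tag.startswith('JJ'):    # JJ, JJR, JJS
        return WORDNET_ADJ
    if tag.startswith('RB'):    # RB, RBR, RBS (not RP particles or WRB)
        return WORDNET_ADV
    return None


for tag in ['JJ', 'JJS', 'RBR', 'RP', 'NN']:
    print('%s -> %s' % (tag, penn_to_wordnet(tag)))
```

Inside `get_antonyms`, `pos_dict[word_pos]` could then be replaced by `penn_to_wordnet(word_pos)`, returning an empty list early when the result is `None`.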

Greedy: test on one review

In [105]:
# Transform a single positive validation review (index 551) with the greedy method
for i, (filename, review, score) in enumerate(imdb_sentiment_reader(dataset_type='val', sentiment='pos')):
    if i != 551:
        continue
    print("Original review: ")
    print(review)
    print("Transformed review:")
    transformed = greedy_transform_func(filename, review, score)
    print(transformed)
    break

Original review: 
Natile Portman and Susan Sarandon play off of each other like a symphony in this coming of age story about a young girl, who is sentenced to life as the daughter of one of the nuttest women you will ever encounter. Sarandon has this ability, call it talent if you will, to play some of the most off-beat characters and bring their humanity to forefront of any film she makes. As the mother of this obviously brilliant and muture beyond her years young girl, Sarandon alternates between being the mom and being the child with the ease of a ballet dancer. More importantly she does it with strength and flare without stomping all over Portman's portrayal of the daughter. The question is always asked when we deconstruct the film plot, who changes? This film is certainly about the daughter, but if you look close at the dreams and sacrifices that Mom makes you come to understand that she changes in step with her daughter. I am willing to bet this makes all of us in the audience ch
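The cell above walks the reader until it reaches index 551. Assuming `imdb_sentiment_reader` returns an ordinary Python iterator (its implementation in `sentiment_utils` is not shown here), `itertools.islice` can fetch a single item directly. `nth` follows the recipe of the same name in the `itertools` documentation; `fake_reader` is a made-up stand-in for the real reader.

```python
from itertools import islice


def nth(iterable, n, default=None):
    """Return the n-th item (0-based) of an iterable, or `default` if it is shorter."""
    return next(islice(iterable, n, None), default)


# Stand-in for imdb_sentiment_reader: a generator of (filename, review, score) tuples
fake_reader = (('%d_8.txt' % k, 'review %d' % k, 8) for k in range(1000))
print(nth(fake_reader, 551))  # ('551_8.txt', 'review 551', 8)
```

Under that assumption, the selection could become `filename, review, score = nth(imdb_sentiment_reader(dataset_type='val', sentiment='pos'), 551)`.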

## Rule-Based Sentiment Transformation

In [2]:
ADJ_TAGS = ['JJ', 'JJR', 'JJS']
ADV_TAGS = ['RB', 'RBR', 'RBS']
ADJ_ADV_TAGS = ADJ_TAGS + ADV_TAGS


def newWord(word):
    """Flip a (token, tag) pair: first WordNet antonym, else prefix it with "not"."""
    ant = get_antonyms(word[0], word[1])
    if len(ant) == 0:
        return "not " + word[0]
    else:
        return ant[0]


def rule_based_trans_func(filename, review, score):
    sentiment_analyzer = SentimentIntensityAnalyzer()
    tagged_review = nltk.pos_tag(word_tokenize(review))
    transformed_review = []

    def agrees(sentiment):
        # A word agrees with the review when its polarity has the same sign as (score - 5)
        return sentiment * (score - 5) > 0

    i = 0
    appended = False  # True when word1 of the current pair was already emitted

    # Scan the review two tokens at a time
    while i < (len(tagged_review) - 1):
        word1 = tagged_review[i]
        word2 = tagged_review[i+1]

        word1_sentiment = sentiment_analyzer.polarity_scores(word1[0])['compound']
        word2_sentiment = sentiment_analyzer.polarity_scores(word2[0])['compound']
        gate = ((word1[1] in ADJ_ADV_TAGS or word2[1] in ADJ_ADV_TAGS)
                and (agrees(word1_sentiment) or agrees(word2_sentiment)))
        if gate:

            # (adverb, adj/adv) special case: flip only the word(s) that agree
            if word1[1] in ADV_TAGS and word2[1] in ADJ_ADV_TAGS:
                w1 = newWord(word1) if agrees(word1_sentiment) else word1[0]
                w2 = newWord(word2) if agrees(word2_sentiment) else word2[0]
                if not appended: transformed_review.append(w1)
                transformed_review.append(w2)

            # "not"/"never" in front: drop the negation
            elif word1[0].lower() in ("not", "never"):
                transformed_review.append(word2[0])

            # "but"/"yet" + adj/adv: turn the contrast into "and" and flip the modifier
            # (parenthesized so the tag check applies to "but" as well as "yet")
            elif word1[0].lower() in ("but", "yet") and word2[1] in ADJ_ADV_TAGS:
                if not appended: transformed_review.append("and")
                transformed_review.append(newWord(word2))

            # General case: flip each adj/adv whose sentiment agrees with the review
            else:
                w1 = word1[0]
                w2 = word2[0]
                if agrees(word1_sentiment) and word1[1] in ADJ_ADV_TAGS:
                    w1 = newWord(word1)
                if agrees(word2_sentiment) and word2[1] in ADJ_ADV_TAGS:
                    w2 = newWord(word2)
                if not appended: transformed_review.append(w1)
                transformed_review.append(w2)

            # If word2 is an adverb modifying a following adjective (and word1 agreed),
            # advance by one so (word2, word3) is reprocessed as an (adverb, adjective)
            # pair; `appended` keeps word2 from being emitted twice
            if i != len(tagged_review) - 2:
                word3 = tagged_review[i+2]
                if word2[1] in ADV_TAGS and word3[1] in ADJ_TAGS and agrees(word1_sentiment):
                    i += 1
                    appended = True
                else:
                    i += 2
                    appended = False
            else:
                i += 2
                appended = False

        else:
            # Nothing to flip: copy the pair, skipping word1 if it was already emitted
            if not appended: transformed_review.append(word1[0])
            transformed_review.append(word2[0])
            i += 2
            appended = False

    # An odd number of tokens leaves the last one unpaired; keep it
    if i == len(tagged_review) - 1:
        transformed_review.append(tagged_review[-1][0])

    return " ".join(transformed_review)
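Stripped of the look-ahead and the `appended` bookkeeping, each `(word1, word2)` pair in `rule_based_trans_func` passes through a gate and then four rules in order. The sketch below isolates that per-pair decision as a hypothetical `rewrite_pair` function. `PAIR_LEXICON` and `PAIR_ANTONYMS` are made-up stand-ins for VADER scores and WordNet antonyms, so the outputs illustrate the rules rather than real model behavior.

```python
# Per-pair rules of rule_based_trans_func without the look-ahead/`appended` logic.
# PAIR_LEXICON and PAIR_ANTONYMS are hypothetical stand-ins for VADER and WordNet.
ADJ = {'JJ', 'JJR', 'JJS'}
ADV = {'RB', 'RBR', 'RBS'}
MODIFIERS = ADJ | ADV

PAIR_LEXICON = {'good': 0.5, 'bad': -0.5, 'love': 0.6, 'dull': -0.4}
PAIR_ANTONYMS = {'good': 'bad', 'bad': 'good', 'dull': 'lively'}


def flip(word):
    """Toy antonym lookup with the same "not" fallback as newWord."""
    return PAIR_ANTONYMS.get(word.lower(), 'not ' + word)


def rewrite_pair(pair, score):
    """Return the output tokens for one ((word1, tag1), (word2, tag2)) pair."""
    (w1, t1), (w2, t2) = pair
    agree1 = PAIR_LEXICON.get(w1.lower(), 0.0) * (score - 5) > 0
    agree2 = PAIR_LEXICON.get(w2.lower(), 0.0) * (score - 5) > 0
    # Gate: at least one modifier, and at least one word agreeing with the review
    if not ((t1 in MODIFIERS or t2 in MODIFIERS) and (agree1 or agree2)):
        return [w1, w2]
    if t1 in ADV and t2 in MODIFIERS:                     # (adverb, adj/adv)
        return [flip(w1) if agree1 else w1, flip(w2) if agree2 else w2]
    if w1.lower() in ('not', 'never'):                    # drop the negation
        return [w2]
    if w1.lower() in ('but', 'yet') and t2 in MODIFIERS:  # contrast becomes "and"
        return ['and', flip(w2)]
    return [flip(w1) if agree1 and t1 in MODIFIERS else w1,
            flip(w2) if agree2 and t2 in MODIFIERS else w2]


print(rewrite_pair((('very', 'RB'), ('good', 'JJ')), score=8))  # ['very', 'bad']
print(rewrite_pair((('but', 'CC'), ('good', 'JJ')), score=8))   # ['and', 'bad']
print(rewrite_pair((('not', 'RB'), ('love', 'VBP')), score=8))  # ['love']
print(rewrite_pair((('a', 'DT'), ('dull', 'JJ')), score=2))     # ['a', 'lively']
```

The order matters: because the (adverb, adj/adv) rule comes first, a pair such as "not good" keeps its "not" and only flips "good"; the negation is dropped only when the following word is not a modifier.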

In [6]:
# Transform a single positive validation review (index 550) with the rule-based method
for i, (filename, review, score) in enumerate(imdb_sentiment_reader(dataset_type='val', sentiment='pos')):
    if i != 550:
        continue
    print("Original review: ")
    print(review)
    print("Transformed review:")
    transformed = rule_based_trans_func(filename, review, score)
    print(transformed)
    break

Original review: 
My one-line summary hints that this is not a good film, but this is not true. I did enjoy the movie, but was probably expecting too much.<br /><br />Adele, who is solidly portrayed by Susan Sarandon, did not come off as a very likable character. She is flighty and irresponsible to what would be an unforgivable degree were it not for the tremendous love she has for her daughter. This is the one thing she knows how to do without fail. Adele's daughter, Anna, is a sad girl who is so busy making up for her mother's shortcomings that she does not seem to be only 14-17 years old. This, of course, makes Natalie Portman the perfect choice to play Anna since she never seems to be 14- 17 years old either. Portman pulls this role off with such ease that you almost forget that she has not been making movies for 20 years. Yet, even with the two solid leads, Wayne Wang never seems to quite draw the audience in as he did with The Joy Luck Luck and even more so with Smoke. Though I h

In [76]:
sentiment_analyzer = SentimentIntensityAnalyzer()
sentiment_analyzer.polarity_scores("Brilliant")['compound']


0.5859
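For a single word, VADER's compound score is the word's lexicon valence squashed into [-1, 1] by x / sqrt(x² + α), with α = 15 by default, and `polarity_scores` rounds it to four decimals. The sketch below reimplements only that normalization step (`vader_normalize` is a hypothetical name) and ignores VADER's other heuristics such as negation, intensifiers, capitalization and punctuation. A valence of 2.8 reproduces the 0.5859 above, which suggests, but does not confirm, that 2.8 is the lexicon entry for "brilliant".

```python
import math


def vader_normalize(valence, alpha=15):
    """Map an unbounded valence sum into [-1, 1] the way VADER's compound score does."""
    norm = valence / math.sqrt(valence * valence + alpha)
    # Clamp to [-1, 1]; the formula already stays inside this range
    return max(-1.0, min(1.0, norm))


print(round(vader_normalize(2.8), 4))  # 0.5859
```

Because the curve saturates, even strongly positive single words score well below 1.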

In [72]:
print(transformed)

Brilliant and moving performances by Tom Courtenay and Peter Finch


In [7]:
train_reader = imdb_sentiment_reader(dataset_type='train', sentiment='both')
test_reader = imdb_sentiment_reader(dataset_type='val', sentiment='both')
default_evaluator = DefaultEvaluator(verbose=True)
baseline_runner = ExperimentRunner(train_reader, test_reader, rule_based_trans_func, 
                               evaluator=default_evaluator, verbose=True)
baseline_runner.run_experiment()

Building positive bigram list...
Now on: 0
Now on: 1000
Now on: 2000
Now on: 3000
Now on: 4000
Now on: 5000
Now on: 6000
Now on: 7000
Now on: 8000
Now on: 9000
Now on: 10000
Now on: 11000
Now on: 12000
Building negative bigram list...
Now on: 0
Now on: 1000
Now on: 2000
Now on: 3000
Now on: 4000
Now on: 5000
Now on: 6000
Now on: 7000
Now on: 8000
Now on: 9000
Now on: 10000
Now on: 11000
Now on: 12000
Now evaluating: 0
Current mean score: nan


  ret = ret.dtype.type(ret / rcount)


IndexError: list index out of range