# Sentiment Classification & How To "Frame Problems" for a Neural Network

by Andrew Trask

- **Twitter**: @iamtrask
- **Blog**: http://iamtrask.github.io

### What You Should Already Know

- neural networks, forward and back-propagation
- stochastic gradient descent
- mean squared error
- and train/test splits

### Where to Get Help if You Need It

- Re-watch previous Udacity lectures
- Use the recommended course reading material: [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning) (40% off with code **traskud17**)
- Shoot me a tweet @iamtrask


### Tutorial Outline:

- Intro: The Importance of "Framing a Problem"


- Curate a Dataset
- Developing a "Predictive Theory"
- **PROJECT 1**: Quick Theory Validation


- Transforming Text to Numbers
- **PROJECT 2**: Creating the Input/Output Data


- Putting it all together in a Neural Network
- **PROJECT 3**: Building our Neural Network


- Understanding Neural Noise
- **PROJECT 4**: Making Learning Faster by Reducing Noise


- Analyzing Inefficiencies in our Network
- **PROJECT 5**: Making our Network Train and Run Faster


- Further Noise Reduction
- **PROJECT 6**: Reducing Noise by Strategically Reducing the Vocabulary


- Analysis: What's going on in the weights?

# Lesson: Curate a Dataset

In [62]:
def pretty_print_review_and_label(i):
    print(labels[i] + "\t:\t" + reviews[i][:80] + "...")

# What we know!
with open('reviews.txt', 'r') as g:
    reviews = [line.rstrip('\n') for line in g]

# What we WANT to know!
with open('labels.txt', 'r') as g:
    labels = [line.rstrip('\n').upper() for line in g]

In [63]:
len(reviews)

25000

In [69]:
reviews[5]

'this film lacked something i couldn  t put my finger on at first charisma on the part of the leading actress . this inevitably translated to lack of chemistry when she shared the screen with her leading man . even the romantic scenes came across as being merely the actors at play . it could very well have been the director who miscalculated what he needed from the actors . i just don  t know .  br    br   but could it have been the screenplay  just exactly who was the chef in love with  he seemed more enamored of his culinary skills and restaurant  and ultimately of himself and his youthful exploits  than of anybody or anything else . he never convinced me he was in love with the princess .  br    br   i was disappointed in this movie . but  don  t forget it was nominated for an oscar  so judge for yourself .  '

In [68]:
labels[5]

'NEGATIVE'

# Lesson: Develop a Predictive Theory

In [9]:
print("labels.txt \t : \t reviews.txt\n")
pretty_print_review_and_label(2137)
pretty_print_review_and_label(12816)
pretty_print_review_and_label(6267)
pretty_print_review_and_label(21934)
pretty_print_review_and_label(5297)
pretty_print_review_and_label(4998)

labels.txt 	 : 	 reviews.txt

NEGATIVE	:	this movie is terrible but it has some good effects .  ...
POSITIVE	:	adrian pasdar is excellent is this film . he makes a fascinating woman .  ...
NEGATIVE	:	comment this movie is impossible . is terrible  very improbable  bad interpretat...
POSITIVE	:	excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more...
NEGATIVE	:	if you haven  t seen this  it  s terrible . it is pure trash . i saw this about ...
POSITIVE	:	this schiffer guy is a real genius  the movie is of excellent quality and both e...
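The previews above suggest a theory: words such as "excellent" and "terrible" track the label. One quick way to check it is to count how often each word occurs under each label. The following is only a minimal sketch under stated assumptions: `toy_reviews`, `toy_labels`, and `count_words_by_label` are hypothetical names, the data is just the six truncated previews printed above, and words are split on whitespace.

```python
from collections import Counter

# The six review previews printed above (truncated exactly as shown).
toy_reviews = [
    "this movie is terrible but it has some good effects .",
    "adrian pasdar is excellent is this film . he makes a fascinating woman .",
    "comment this movie is impossible . is terrible  very improbable  bad interpretat",
    "excellent episode movie ala pulp fiction .  days   suicides . it doesnt get more",
    "if you haven  t seen this  it  s terrible . it is pure trash . i saw this about",
    "this schiffer guy is a real genius  the movie is of excellent quality and both e",
]
toy_labels = ["NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE"]

def count_words_by_label(reviews, labels):
    """Return a dict mapping each label to a Counter of word frequencies."""
    counts = {}
    for review, label in zip(reviews, labels):
        # str.split() with no arguments splits on any run of whitespace.
        counts.setdefault(label, Counter()).update(review.split())
    return counts

counts = count_words_by_label(toy_reviews, toy_labels)
# A Counter returns 0 for words it has never seen.
print(counts["POSITIVE"]["excellent"], counts["NEGATIVE"]["excellent"])  # 3 0
print(counts["POSITIVE"]["terrible"], counts["NEGATIVE"]["terrible"])    # 0 3
```

On these six previews the two words separate the labels perfectly, which is encouraging but far too small a sample to prove anything. The same function could be run on the full `reviews` and `labels` lists.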


In [70]:
import re

def separate_words(text, min_length=3):
    # Split on runs of non-word characters and keep the lowercased tokens
    # that are strictly longer than min_length characters.
    splitter = re.compile(r"\W+")
    return [s.lower() for s in splitter.split(text) if len(s) > min_length]

# Relate the input to the label by counting words, using one list of
# positive words and one list of negative words.
positive_dic = ['excellent', 'inspire', 'best', 'better', 'brilliant']
negative_dic = ['terrible', 'boring', 'disappointed']

# For each review, count its positive and negative words; whichever count
# is larger is the predicted sentiment.
# match_count is the number of predictions that agree with the label.
match_count = 0
for i, sentence in enumerate(reviews[0:10]):
    positive_num = 0
    negative_num = 0
    words_list = separate_words(sentence, 3)
    for word in words_list:
        if word in positive_dic:
            positive_num += 1
        if word in negative_dic:
            negative_num += 1
    # Word counts for this review
    print(positive_num, negative_num, '\n')
    if positive_num > negative_num and labels[i] == 'POSITIVE':
        match_count += 1
    if positive_num < negative_num and labels[i] == 'NEGATIVE':
        match_count += 1

print(match_count)




0 0 

1 0 

1 0 

1 1 

3 0 

0 1 

0 0 

2 0 

1 0 

1 1 

4
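The loop above only counts agreements, so reviews whose counts tie (for example `0 0` or `1 1`) can never be scored as correct. A sketch of the same word-count rule packaged as a function, with ties reported separately, might look like the following. The names `predict_sentiment` and `evaluate` are hypothetical, the sample texts are fragments of the previews shown earlier, and plain whitespace splitting stands in for the regex tokenizer.

```python
# Same word lists as above, stored as sets for fast membership tests.
POSITIVE_WORDS = {"excellent", "inspire", "best", "better", "brilliant"}
NEGATIVE_WORDS = {"terrible", "boring", "disappointed"}

def predict_sentiment(text):
    """Return 'POSITIVE', 'NEGATIVE', or None when the two counts tie."""
    words = text.lower().split()
    positive_num = sum(word in POSITIVE_WORDS for word in words)
    negative_num = sum(word in NEGATIVE_WORDS for word in words)
    if positive_num > negative_num:
        return "POSITIVE"
    if negative_num > positive_num:
        return "NEGATIVE"
    return None  # tie: the word lists give no signal

def evaluate(reviews, labels):
    """Return (correct, ties, total) for the word-count rule."""
    correct = 0
    ties = 0
    for review, label in zip(reviews, labels):
        prediction = predict_sentiment(review)
        if prediction is None:
            ties += 1
        elif prediction == label:
            correct += 1
    return correct, ties, len(labels)

# Fragments of the review previews printed earlier, with their labels.
sample_reviews = [
    "this movie is terrible but it has some good effects .",
    "adrian pasdar is excellent is this film . he makes a fascinating woman .",
    "it is pure trash .",
]
sample_labels = ["NEGATIVE", "POSITIVE", "NEGATIVE"]

correct, ties, total = evaluate(sample_reviews, sample_labels)
print(correct, ties, total)  # 2 1 3
```

Separating ties from wrong predictions shows whether the rule is mistaken or simply silent, which hints at whether the word lists need to grow.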


In [17]:
print(dismatch_count)

0


In [40]:
positive_dic = ['excellent','inspire','airport']
if 'excellent' in positive_dic:
    print("y")

y


In [58]:
import re
def separate_words(text, min_length=3):
    # Keep lowercased tokens strictly longer than min_length characters.
    splitter = re.compile(r"\W+")
    return [s.lower() for s in splitter.split(text) if len(s) > min_length]

a = 'airport    starts as a brand new luxury    plane is loaded up with valuable paintings  such belongi'
l = separate_words(a, 2)
print(l)

['airport', 'starts', 'brand', 'new', 'luxury', 'plane', 'loaded', 'with', 'valuable', 'paintings', 'such', 'belongi']
