# Broad Goal - Actionable Machine Learning

- Build a data-driven machine learning pipeline to solve an actual business problem.
- Build an intuition for machine learning methods when you approach a new problem.
- Give you a starting point for exploring more advanced NLP models.
- Suggest project extensions for those interested in advanced learning.

# Outline of the Tutorial

This workshop has 8 sections:

1. Identifying a problem which ML can help solve.
2. Approaching the problem in a data-driven manner. Formulating it as an ML problem.
3. Data? Do we collect it all?
4. Splitting the problem into sub-parts.
5. Several ways to solve these sub-parts.
6. What further can we do? What kind of model will be useful for this?
7. Overview and Conclusions
8. Proposed extensions for those interested.

# Section 1. The problem at hand.

Consider a hypothetical firm, XYZ Tech, which makes charging cables, mobile phone cases, and adapters. This is a crowded market, and they want to track people's reviews to improve their products. However, the large number of reviews makes this hard. The CEO calls you in as a data scientist to help solve this problem.

# Section 2. A Data driven approach to this problem.

## Data = Values of Variables.


### Think about variables of interest.

Some of the important things the company would like to keep track of include:

- Common complaints.
- Deal breakers for people.
- Things people like about the product.
- New feature requests.
- All of the above for competitors' products.

## Proposed product to address this problem - An automated review analysis pipeline.

What features might be good to have so that the output of the pipeline can be used to make business decisions?


- Ques) Negative reviews are more important for the company than positive ones. Can we identify the negative ones?
- Ans) Yes. Sentiment analysis methods let you categorize a review as positive or negative.

- Ques) Reviews are often page-long rants. What can we do?
- Ans) Let's automatically summarize reviews, i.e. give a 1-2 line summary of each review, so that they can be read at a glance.

- Ques) It would be good to have some structure to all of these. Can we categorize them?
- Ans) Yes! We can use classification to divide them into categories.

- Ques) Can we get something more numeric or visual to compare?
- Ans) Yes, ML can also help us visualize the content!

## Formulating this as a machine learning problem.

- Supervised, Unsupervised, or a blend of the two.

- For the most part, we try to cast our problem as a supervised machine learning problem, as these techniques are well developed and understood. They also tend to perform better, especially since the start of the deep learning era. Curious about why? We'll come to it.

### Let's begin with a very quick refresher

One good way to think about machine learning pipelines is that they take in some information, and convert it into more meaningful information.

<img src="Slides/Images/machine_learning_intro.png"></img>

For example:

- Apple's FaceID takes a picture of a face and tells you whether it matches a specific person's, i.e. identification.

- Amazon's Recommendation Engine takes in your purchase and web browsing history and returns products you may want to buy.

To create such systems, we usually resort to Supervised machine learning systems. Such a system takes in "Pairs of X,y data". In the above example, X corresponds to all the data we have collected on a user, and y corresponds to the product they ended up buying. By training a model using such data, given the data on a new user, we can begin to make predictions about what product they might be willing to buy!

### So, let us try to design our Review Analysis Pipeline as a supervised machine learning problem.

Given a new review on Amazon, we would like to do the following:-

<ol>
<li> Tell if it is a positive or a negative review (Sentiment Analysis/Polarity Prediction).
<li> For a negative review, summarize it (Sequence Summarization).
<li> For all the negative reviews, identify groups of reviews (Clustering).
<li> For each group, identify a representative sample for the management to read.
</ol>

As we proceed to build this pipeline, you will see that there are a lot of design decisions we need to make in our machine learning pipeline. One purpose of this tutorial is to lay these out for you, to help you get an overview of how to go about tackling a new problem using a machine learning approach.

### Let's talk about how to get the data for this.

#### Data for Sentiment Analysis

We want pairs of data (X,y pairs), which look like this:-

X = review of a product, y = whether it was a positive or a negative review. 

For example:-<br>
("superb: I use it for my small business.", Positive).<br>
("Broke within one month of use, I suggest against purchasing this charger", Negative).
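As a minimal sketch, the two example pairs above can be held in Python as a list of tuples and then split into inputs `X` and targets `y`. The numeric encoding (Negative as 0, Positive as 1) is an assumption made here for illustration; it matches the convention used by the code later in this notebook.

```python
# The two (review, label) pairs from the example above.
pairs = [
    ("superb: I use it for my small business.", "Positive"),
    ("Broke within one month of use, I suggest against purchasing this charger", "Negative"),
]

# Assumed encoding for this sketch: Negative -> 0, Positive -> 1.
label_to_id = {"Negative": 0, "Positive": 1}

# Split the pairs into the inputs X and the targets y.
X = [review for review, _ in pairs]
y = [label_to_id[label] for _, label in pairs]

print(X[0], "->", y[0])
print(len(X), len(y))
```

Keeping `X` and `y` as parallel lists means that position `i` in one always refers to the same review as position `i` in the other.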

#### Data for Sequence Summarization

X = review of a product, y = summary of the review.

For example:-<br>
X= "Not an "ultimate guide": Firstly,I enjoyed the format and tone of the book (how the author addressed the reader). However, I did not feel that she imparted any insider secrets that the book promised to reveal. If you are just starting to research law school, and do not know all the requirements of admission, then this book may be a tremendous help. If you have done your homework and are looking for an edge when it comes to admissions, I recommend some more topic-specific books. For example, books on how to write your personal statment, books geared specifically towards LSAT preparation (Powerscore books were the most helpful for me), and there are some websites with great advice geared towards aiding the individuals whom you are asking to write letters of recommendation. Yet, for those new to the entire affair, this book can definitely clarify the requirements for you."<br>

y= "Great for starting out, but if you've done your homework look for something more specific."


Similarly, we want data for clustering of reviews, and identifying representative members of the cluster (3 and 4 above). These will be discussed later in greater detail as collecting data for them is not as straightforward. As a foreshadowing, I can tell you that using unsupervised machine learning, a lot can be done without collecting pairs of data. These are meant to give you an insight into what you can do, if you cannot collect pairs of X,y data due to difficulty/cost! Stay tuned!

# Section 3: Getting the data

In practice, data collection is governed not only by the problem at hand, but by factors like:

- What data is already publicly available, and how can it be re-purposed for our problem?
- How much will it cost to collect the data we need? Is there a way to collect data that solves a part or a version of the problem we care about, and would that save a lot of money?
- Are there any privacy concerns in collecting the data we need?

As a rule of thumb, start by looking at what is already available for free online. Chances are, there is an existing dataset that you can re-purpose for the problem you care about. Even if it does not fit completely, it can be used for pilot testing before you go all in on collecting data for your own purpose.

### 3.1 Where to look for already existing data?

Google it! Google also recently released a dedicated dataset search tool: https://toolbox.google.com/datasetsearch

For the purpose of sentiment analysis, there are several datasets that are already available online. The one which is closest to our purpose, is the Amazon Reviews Dataset.

### 3.2 Amazon Reviews Dataset

This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014.

This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). 

### 3.3 Re-formatting the dataset

For the most part, a big chunk of the time of anyone working with machine learning goes into re-formatting the dataset to put it in a form which can be fed into a machine learning pipeline.

In this section, I am going to go over some of the different formats in which the data is often seen being stored. This includes:

- As a combined text file for both X and y.
- As two separate text files for the X (reviews) and y (their corresponding positive/negative labels).
- As .csv files, which can be opened in Excel.
- In numeric formats you cannot open and read directly, for example as NumPy arrays or SciPy sparse matrices (used very often).

##### As a combined Text file for X and y

In [10]:
# Open a text file in Python, then read and store its contents.
# The with-block closes the file automatically.

with open('data/train.ft.txt','r') as f:
    content = f.readlines()

In [7]:
# Let's see the first 5 lines.

for c in content[:5]:
    print(c.rstrip())
    print('\n')

__label__2 Stuning even for the non-gamer: This sound track was beautiful! It paints the senery in your mind so well I would recomend it even to people who hate vid. game music! I have played the game Chrono Cross but out of all of the games I have ever played it has the best music! It backs away from crude keyboarding and takes a fresher step with grate guitars and soulful orchestras. It would impress anyone who cares to listen! ^_^


__label__2 The best soundtrack ever to anything.: I'm reading a lot of reviews saying that this is the best 'game soundtrack' and I figured that I'd write a review to disagree a bit. This in my opinino is Yasunori Mitsuda's ultimate masterpiece. The music is timeless and I'm been listening to it for years now and its beauty simply refuses to fade.The price tag on this is pretty staggering I must say, but if you are going to buy any cd for this much money, this is the only one that I feel would be worth every penny.


__label__2 Amazing!: This soundtrack 

Here, `__label__2` corresponds to "Positive" and `__label__1` to "Negative".
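To make the line format concrete, here is a small sketch that splits one such line into a numeric label and the review text. `parse_line` is a hypothetical helper introduced only for this illustration; the mapping of `__label__1` to 0 and `__label__2` to 1 follows the convention used by the reformatting code in this section.

```python
def parse_line(line):
    """Split one '__label__K review text' line into (label_id, text)."""
    # Everything before the first space is the tag, the rest is the review.
    tag, _, text = line.rstrip().partition(" ")
    # '__label__2' -> '2' -> 1 (Positive); '__label__1' -> '1' -> 0 (Negative).
    label_id = int(tag.split("__")[-1]) - 1
    return label_id, text


sample = "__label__2 Amazing!: This soundtrack"
label, text = parse_line(sample)
print(label, "|", text)
```

Using `partition` on the first space keeps the review text exactly as written, including any repeated spaces inside it.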

##### Reformatting the data into separate text files for X and y

In [8]:
# Write the reviews and their labels to two parallel files, one entry per line.
with open('data/train_reviews.txt','w') as f, open('data/train_labels.txt','w') as f2:
    for c in content:
        c = c.rstrip()
        words = c.split()
        label = words[0]                 # e.g. '__label__2'
        review = ' '.join(words[1:])     # everything after the label
        print(review,file=f)
        print(label,file=f2)

In [20]:
chunk_1 = "Absolute love this charger's"
chunk_2 = " advertisement!"
chunk_3 = " Apart from it, not too special."
chunk_4 = " In fact, I'd say it's overpriced, and cheaper alternatives do a better."

##### Reformatting into csv files

In [18]:
import gensim
word2vec_path = 'data/GoogleNews-vectors-negative300.bin'
model2 = gensim.models.KeyedVectors.load_word2vec_format(word2vec_path, binary=True)

In [19]:
import csv

In [21]:
f = open('data/train.ft.txt','r')
content = f.readlines()
f.close()

data_rows = []
for c in content:
    c = c.strip()
    words = c.split()
    label = int(words[0].split('__')[-1])-1
    text = ' '.join(words[1:])
    row_data = [text,label]
    data_rows.append(row_data)
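# newline='' is what the csv module expects; without it, some platforms write blank rows.
with open("data/train_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data_rows)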

In [24]:
f = open('data/test.ft.txt','r')
content = f.readlines()
f.close()

data_rows = []
for c in content:
    c = c.strip()
    words = c.split()
    label = int(words[0].split('__')[-1])-1
    text = ' '.join(words[1:])
    row_data = [text,label]
    data_rows.append(row_data)
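# newline='' is what the csv module expects; without it, some platforms write blank rows.
with open("data/test_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data_rows)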
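As a quick check that the CSV format round-trips, the sketch below writes two made-up rows to a temporary file and reads them back with `csv.reader`. The rows and the file name `demo_data.csv` are invented for illustration; the real files written above are much larger.

```python
import csv
import os
import tempfile

# Two made-up (text, label) rows; note the comma inside the second review.
rows = [["Great cable and it charges fast", 1], ["Stopped working, after a week", 0]]

path = os.path.join(tempfile.mkdtemp(), "demo_data.csv")

# Write the rows; the csv module quotes any field that contains a comma.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read them back. Every field comes back as a string, so the label is cast to int.
with open(path, newline="", encoding="utf-8") as f:
    loaded = [(text, int(label)) for text, label in csv.reader(f)]

print(loaded)
```

The comma inside the second review survives because the writer quotes that field, which is the main reason to prefer the `csv` module over joining strings by hand.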

##### Reformatting into numpy arrays

In [27]:
import numpy as np

f = open('data/train.ft.txt','r')
content = f.readlines()
f.close()

all_reviews = []
Y_Train = np.zeros(len(content))
for i in range(len(content)):
    c = content[i]
    c = c.rstrip()
    words = c.split(' ')
    Y_Train[i] = int(words[0].split('__')[-1])-1
    review = ' '.join(words[1:])
    all_reviews.append(review)

In [43]:
# from sklearn import feature_extraction
# tfidf_transformer = feature_extraction.text.TfidfTransformer

In [44]:
# X_Train = tfidf_transformer.fit_transform(X)

# np.save('data/X_Train.npy', X_Train.toarray())
# np.save('data/Y_Train.npy', Y_Train)

# X_t = vectorize.fit_transform(all_reviews_test)
# tfidf_transformer = TfidfTransformer()
# X_Test = tfidf_transformer.fit_transform(X_t)

# X_Train_arr = np.load('X_Train.npy')
# Y_Train = np.load('Y_Train.npy')

In [45]:
f = open('data/test.ft.txt','r')
content_test = f.readlines()
f.close()

all_reviews_test = []
Y_Test = np.zeros(len(content_test))
for i in range(len(content_test)):
    c = content_test[i]
    c = c.rstrip()
    words = c.split(' ')
    Y_Test[i] = int(words[0].split('__')[-1])-1
    review = ' '.join(words[1:])
    all_reviews_test.append(review)

In [47]:
from sklearn import feature_extraction

In [51]:
from sklearn.feature_extraction.text import CountVectorizer,TfidfTransformer

In [52]:
vectorize=CountVectorizer(max_df=0.95, min_df=0.005)
tfidf_transformer = TfidfTransformer()

In [53]:
X = vectorize.fit_transform(all_reviews)
X_Train = tfidf_transformer.fit_transform(X)

In [54]:
X_t = vectorize.transform(all_reviews_test)
X_Test = tfidf_transformer.transform(X_t)

In [55]:
np.save('data/X_Train.npy', X_Train.toarray())
np.save('data/Y_Train.npy', Y_Train)

np.save('data/X_Test.npy', X_Test.toarray())
np.save('data/Y_Test.npy', Y_Test)

In [57]:
X_Train.shape

(3600000, 1281)

In [58]:
X_Test.shape

(400000, 1281)
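One practical caveat: assuming 8-byte floats, a dense array of shape (3600000, 1281) needs roughly 37 GB (3,600,000 x 1,281 x 8 bytes), so calling `.toarray()` before saving may not fit in memory on many machines. A possible alternative, sketched here on a tiny made-up matrix, is to keep the data sparse and store it with `scipy.sparse.save_npz`. The file name `X_demo.npz` is hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# A tiny made-up matrix standing in for the TF-IDF features: mostly zeros.
dense = np.array([[0.0, 0.5, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.7, 0.0, 0.0]])
X_sparse = sparse.csr_matrix(dense)

# Save and reload without ever building the dense array on disk.
path = os.path.join(tempfile.mkdtemp(), "X_demo.npz")
sparse.save_npz(path, X_sparse)
X_loaded = sparse.load_npz(path)

print(X_loaded.nnz, "stored values out of", dense.size)
```

scikit-learn estimators such as `MultinomialNB` accept sparse matrices directly, so the dense conversion is often not needed at all.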

In [None]:
import numpy as np
X_Train = np.load('data/X_Train.npy')

In [None]:
Y_Train = np.load('data/Y_Train.npy')

In [None]:
X_Test = np.load('data/X_Test.npy')


In [None]:
Y_Test = np.load('data/Y_Test.npy')

# Section 4: Training models for sentiment analysis

### From Sentences to Numbers: How Decision Boundaries relate to real world problems

<img src="pictures/hyperplane.png"></img>

#### Ever looked at such graphs and wondered how they relate to sentences or images?

Problem - Computers only understand numbers. This text written here too is stored in computers in a numeric format. For instance, this sentence is seen by a computer as this numeric code - 

80 114 111 98 108 101 109 32 45 32 67 111 109 112 117 116 101 114 115 32 111 110 108 121 32 117 110 100 101 114 115 116 97 110 100 32 110 117 109 98 101 114 115 46 32 84 104 105 115 32 116 101 120 116 32 119 114 105 116 116 101 110 32 104 101 114 101 32 116 111 111 32 105 115 32 115 116 111 114 101 100 32 105 110 32 99 111 109 112 117 116 101 114 115 32 105 110 32 97 32 110 117 109 101 114 105 99 32 102 111 114 109 97 116 46 
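The numbers above are character codes, and you can reproduce them yourself. A minimal sketch using Python's built-in `ord` and `chr`:

```python
sentence = "Problem - Computers only understand numbers."

# ord() gives the numeric code of a single character.
codes = [ord(ch) for ch in sentence]
print(codes[:7])  # [80, 114, 111, 98, 108, 101, 109], i.e. "Problem"

# chr() goes the other way, so the text can be recovered from the numbers.
decoded = "".join(chr(code) for code in codes)
print(decoded == sentence)  # True
```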


So, anything we deal with must be converted into numbers! We convert any kind of data (text, images, and even audio) into numbers. And numbers can be put on a graph; that is how the dots in these figures correspond to data.

The most important question that then comes up is dimensionality: how many numbers do we use to describe each item?


### Embedding

In machine learning parlance, converting information into such numbers is called "embedding it in space".

<img src="pictures/embedding.png"></img>

Here, we do a 2D embedding. So, every dog is described by 2 numbers. These could be any 2 measurements, such as height, body length, or tail length.

We can also describe them using the color of their eyes. But then, we'd need to express that color in numbers. How would we do that? ;)

But the important point is that there are infinitely many ways to embed them. And the "right one" is the one which helps us solve the task at hand. For example, if we want to divide dogs into tall and short ones, then we need just 1 dimension: their height. If we want to divide them into fat and non-fat ones, we need their weight and height. Presumably, above a certain weight/height ratio we can call them fat.
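The dog example can be sketched in a few lines of NumPy. All measurements and both thresholds below are invented for illustration; the point is only that the first task uses one dimension while the second combines two.

```python
import numpy as np

# Hypothetical measurements, one row per dog: (height in cm, weight in kg).
dogs = np.array([
    [25.0, 6.0],
    [60.0, 30.0],
    [55.0, 45.0],
])
height, weight = dogs[:, 0], dogs[:, 1]

# Tall vs. short needs only one dimension. The 50 cm cut-off is an assumption.
is_tall = height > 50.0

# The second task needs both dimensions, combined as a weight/height ratio.
# The 0.7 kg/cm threshold is likewise an assumption.
ratio = weight / height
is_heavy_for_height = ratio > 0.7

print(is_tall)
print(is_heavy_for_height)
```

The same three dogs are separated differently depending on which embedding the task calls for.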

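Similarly, for our sentences, we have a 1281-dimensional embedding right now.

What does each axis represent?

A particular word in the vocabulary! The value along that axis is the word's TF-IDF weight in the review, and it is zero when the word is absent.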
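A small sketch of what these axes look like, using the same `CountVectorizer` and `TfidfTransformer` classes as earlier in this notebook, but on two made-up reviews and with default settings instead of the `max_df`/`min_df` filters:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Two made-up reviews, invented for illustration.
reviews = [
    "great charger great price",
    "charger broke",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(reviews)          # word counts per review
tfidf = TfidfTransformer().fit_transform(counts)    # re-weighted counts

vocab = vectorizer.vocabulary_                      # maps word -> axis (column) index
print(sorted(vocab))                                # one axis per distinct word
print(tfidf.shape)                                  # (number of reviews, number of words)

row0 = tfidf.toarray()[0]
print(row0[vocab["broke"]])                         # "broke" is absent from the first review
```

Each review becomes one point in a space with as many axes as there are words, and words that a review does not contain contribute a zero on their axis.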

### 4.1 SVM classifier

### 4.1.1 What is an SVM (in short)?

### Learn to solve the problem for the hardest cases, and you will automatically learn the easier ones.

For sentences, if you can make a system that can understand sarcastic reviews and say they are negative, it will be an easy job for the classifier to identify generic negative reviews.

<img src="pictures/hyperplane.png"></img>

In [None]:
from scipy import sparse
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import f1_score
from sklearn.metrics import make_scorer
from sklearn.metrics import classification_report

In [None]:
clf = SVC(max_iter=-1,C=0.01,class_weight='balanced')

In [None]:
vectorize=CountVectorizer(max_df=0.95, min_df=0.005)
tfidf_transformer = TfidfTransformer()

X = vectorize.fit_transform(all_reviews)
X_Train = tfidf_transformer.fit_transform(X)

X_t = vectorize.transform(all_reviews_test)
X_Test = tfidf_transformer.transform(X_t)

clf = SVC(max_iter=-1,C=0.01,class_weight='balanced')

clf.fit(X_Train, Y_Train)

predictions = clf.predict(X_Test)

accuracy = np.sum(predictions == Y_Test)/len(Y_Test)
print("Accuracy of SVM classifier", accuracy)
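A note on running time: the scikit-learn documentation warns that the fit time of `SVC` scales at least quadratically with the number of samples, and suggests `LinearSVC` or `SGDClassifier` for large datasets. The sketch below swaps in `LinearSVC` on six made-up reviews to show the same fit/predict workflow. It is an illustration under those assumptions, not the configuration used above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Six made-up reviews with labels (1 = positive, 0 = negative).
train_reviews = [
    "love this charger works great",
    "great case fits perfectly",
    "excellent cable fast charging",
    "broke after a week terrible",
    "terrible adapter stopped working",
    "awful cable waste of money",
]
train_labels = [1, 1, 1, 0, 0, 0]

# TfidfVectorizer combines the counting and TF-IDF steps in one object.
vectorizer = TfidfVectorizer()
X_small = vectorizer.fit_transform(train_reviews)

# A linear SVM: it learns one weight per word plus an intercept.
svm = LinearSVC(C=1.0)
svm.fit(X_small, train_labels)

new_reviews = ["great charger", "terrible adapter"]
preds = svm.predict(vectorizer.transform(new_reviews))
print(list(zip(new_reviews, preds)))
```

A linear model also keeps one interpretable weight per word, which makes it easier to see which words push a review towards "negative".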

### 4.2 Naive Bayes Classifier

Let's play a game to understand the intuition behind the Naive Bayes classifier.

I will reveal this review in chunks, and with each chunk you need to enter your response in this poll.

The remarkable thing about this charger is that its company can sell a dummy through good marketing. It's absolutely useless.

In [16]:
# chunk_1

But wait, let's not get ahead of ourselves here. We know people mostly write negative reviews; positive ones are rarely given. But then, the word "remarkable" is not used very often. Maybe we should give it more importance.

In [17]:
# chunk_1 + chunk_2

In [18]:
# chunk_1 + chunk_2 + chunk_3

In [19]:
# chunk_1 + chunk_2 + chunk_3 + chunk_4

- Intuition 1: The individual contributions of the chunks to the positive/negative probability can be combined to get the overall probability.
- Intuition 2: People usually tend to review only the products they had a bad experience with; good products get fewer reviews.
- Intuition 3: Certain words are used more or less often than others, so we should account for this too.
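These intuitions can be sketched by hand in plain Python. The three training reviews below are made up, and the add-one smoothing is an assumption of this sketch (it is a common choice, but not the only one). The score of a class is the log of its prior (how common the class is) plus the sum of per-word log-probabilities (how typical each word is for that class).

```python
import math
from collections import Counter

# Three made-up training reviews.
train = [
    ("great charger love it", "pos"),
    ("broke quickly useless charger", "neg"),
    ("useless cable bad quality", "neg"),
]

# Prior: how many reviews of each class have we seen?
class_counts = Counter(label for _, label in train)

# Likelihood: how often does each word occur within each class?
word_counts = {"pos": Counter(), "neg": Counter()}
for text, label in train:
    word_counts[label].update(text.split())
vocab = {word for counts in word_counts.values() for word in counts}


def log_score(text, label):
    """log P(label) + sum of log P(word | label), with add-one smoothing."""
    total = sum(word_counts[label].values())
    score = math.log(class_counts[label] / len(train))
    for word in text.split():
        if word in vocab:  # words never seen in training are ignored
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
    return score


def predict(text):
    """Return the class with the higher score."""
    return max(("pos", "neg"), key=lambda label: log_score(text, label))


print(predict("useless charger"))
print(predict("love it"))
```

Because the per-word terms are simply added, each chunk of a review shifts the score independently of the others, which is what the reveal-in-chunks game is meant to convey.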

In [27]:
from sklearn.naive_bayes import MultinomialNB
classifnb = MultinomialNB()
classifnb.fit(X_Train, Y_Train)

predictions_nb=classifnb.predict(X_Test)

accuracy_nb = np.sum(predictions_nb == Y_Test)/len(Y_Test)
print(accuracy_nb)

0.842855


Quick note - Why "naive"? The classifier assumes that, given the class, every feature (word) contributes independently of all the others.

### 4.3 Mean word2vec + Neural Network

Word2Vec: Giving every word an embedding.
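Before loading the real 300-dimensional vectors, the idea of a "mean word2vec" representation can be sketched with a toy dictionary. The 3-dimensional vectors below are invented for illustration and `mean_vector` is a hypothetical helper; real word2vec vectors are learned from large text corpora.

```python
import numpy as np

# Hypothetical 3-dimensional word vectors, invented for illustration.
toy_vectors = {
    "great": np.array([0.9, 0.1, 0.0]),
    "charger": np.array([0.0, 0.2, 0.8]),
    "broke": np.array([-0.8, 0.1, 0.1]),
}


def mean_vector(review, vectors):
    """Average the vectors of the known words; return None if no word is known."""
    known = [vectors[word] for word in review.split() if word in vectors]
    if not known:
        return None
    return np.mean(known, axis=0)


# "xyz" has no vector, so only "great" and "charger" are averaged.
rep = mean_vector("great charger xyz", toy_vectors)
print(rep)
print(mean_vector("xyz", toy_vectors))
```

Whatever the length of the review, the result is a single fixed-size vector, which is what lets a plain feed-forward network consume it.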

In [29]:
import numpy as np
import torch
import pickle

N = 3599973

import gensim
word2vec_path = 'data/GoogleNews-vectors-negative300.bin'
model2 = gensim.models.KeyedVectors.load_word2vec_format(word2vec_path, binary=True)

In [31]:
f = open('data/train.ft.txt','r')
content = f.readlines()
f.close()

all_reviews = []
Y_Train = np.zeros(len(content))
for i in range(len(content)):
    c = content[i]
    c = c.rstrip()
    words = c.split(' ')
    Y_Train[i] = int(words[0].split('__')[-1])-1
    review = ' '.join(words[1:])
    all_reviews.append(review)

In [32]:
counter = 0
np_wordvecs=np.zeros((N,300))
np_labels=np.zeros(N)
ids = []
for i in range(len(content)):
    if i % 100000 == 0:
        print(i)
    c = content[i]
    words = c.split(' ')
    label = int(words[0].split('__')[-1])-1
    count = 0
    rep_sum = np.zeros(300)
    for word in words[1:]:
        # model2.vocab is the gensim < 4.0 API; gensim 4 uses model2.key_to_index.
        if word in model2.vocab:
            word_rep = model2[word]
            rep_sum += word_rep
            count+=1
    # Only average (and keep) reviews with at least one word in the word2vec
    # vocabulary; dividing before this check would divide by zero.
    if count > 0:
        review_rep = rep_sum/float(count)
        np_wordvecs[counter] = review_rep
        np_labels[counter] = label
        ids.append(i)      # remember which line of the file this row came from
        counter += 1

X_Train = np_wordvecs
Y_Train = np_labels
np.save('X_Train_w2vec.npy', X_Train)
np.save('Y_Train_w2vec.npy', Y_Train)

0
100000




200000
300000
400000
500000
600000
700000
800000
900000
1000000
1100000
1200000
1300000
1400000
1500000
1600000
1700000
1800000
1900000
2000000
2100000
2200000
2300000
2400000
2500000
2600000
2700000
2800000
2900000
3000000
3100000
3200000
3300000
3400000
3500000


In [33]:
X_Train = np.load('X_Train_w2vec.npy')
Y_Train = np.load('Y_Train_w2vec.npy')

In [34]:
tensor_x=torch.from_numpy(X_Train).float()
tensor_y=torch.from_numpy(Y_Train).long()

In [35]:
# Code in file nn/two_layer_net_optim.py
import torch
from torch.autograd import Variable
import numpy as np
import torch.optim as optim
losses=[]
# N is the number of training examples (the whole set is used as one batch);
# D_in is input dimension; H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = N, 300, 100, 2

# Wrap the training tensors. Since PyTorch 0.4, Variable simply returns a
# Tensor, so the tensors could also be used directly.
x = Variable(tensor_x)
y = Variable(tensor_y,requires_grad=False)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
          torch.nn.Linear(D_in, H),
          torch.nn.ReLU(),
          torch.nn.Linear(H, D_out),
        )
loss_fn = torch.nn.CrossEntropyLoss()  # use a Classification Cross-Entropy loss
# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which parameters it should update.
learning_rate = 1e-2
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(100):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute, print, and record the loss. loss is a 0-dim tensor, so it
    # cannot be indexed; .item() converts it to a Python float.
    loss = loss_fn(y_pred, y)
    print(t, loss.data)
    losses.append(loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable weights
    # of the model)
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its parameters
    optimizer.step()

# import matplotlib.pyplot as plt
# plt.plot(np.asarray(losses))
# plt.show()



0 tensor(0.6815)




1 tensor(0.6866)
2 tensor(0.6882)
3 tensor(0.6801)
4 tensor(0.6697)
5 tensor(0.6569)
6 tensor(0.6435)
7 tensor(0.6271)
8 tensor(0.6106)
9 tensor(0.5926)
10 tensor(0.5739)
11 tensor(0.5562)
12 tensor(0.5388)
13 tensor(0.5233)
14 tensor(0.5095)
15 tensor(0.4969)
16 tensor(0.4866)
17 tensor(0.4778)
18 tensor(0.4699)
19 tensor(0.4628)
20 tensor(0.4569)
21 tensor(0.4523)
22 tensor(0.4488)
23 tensor(0.4464)
24 tensor(0.4441)
25 tensor(0.4407)
26 tensor(0.4374)
27 tensor(0.4360)
28 tensor(0.4346)
29 tensor(0.4314)
30 tensor(0.4285)
31 tensor(0.4272)
32 tensor(0.4251)
33 tensor(0.4223)
34 tensor(0.4207)
35 tensor(0.4196)
36 tensor(0.4175)
37 tensor(0.4157)
38 tensor(0.4148)
39 tensor(0.4135)
40 tensor(0.4118)
41 tensor(0.4106)
42 tensor(0.4099)
43 tensor(0.4088)
44 tensor(0.4074)
45 tensor(0.4065)
46 tensor(0.4059)
47 tensor(0.4053)
48 tensor(0.4044)
49 tensor(0.4036)
50 tensor(0.4029)
51 tensor(0.4025)
52 tensor(0.4021)
53 tensor(0.4017)
54 tensor(0.4012)
55 tensor(0.4006)
56 tensor(0.3999)
5

In [36]:
torch.save(model, 'meanwordvec_model.pt')

In [142]:
model = torch.load('meanwordvec_model.pt')

In [38]:
# len(test_np_labels)

In [39]:
# np.sum(test_np_labels)

In [41]:
f = open('data/test.ft.txt','r')
test_content = f.readlines()
f.close()

In [42]:
N2 = 399998

In [43]:
counter = 0
test_np_wordvecs=np.zeros((N2,300))
test_np_labels=np.zeros(N2)
test_ids = []
for i in range(len(test_content)):
    c = test_content[i]
    words = c.split(' ')
    label = int(words[0].split('__')[-1])-1
    count = 0
    rep_sum = np.zeros(300)
    for word in words[1:]:
        # model2.vocab is the gensim < 4.0 API; gensim 4 uses model2.key_to_index.
        if word in model2.vocab:
            word_rep = model2[word]
            rep_sum += word_rep
            count+=1
    # Only average (and keep) reviews with at least one known word;
    # dividing before this check would divide by zero.
    if count > 0:
        review_rep = rep_sum/float(count)
        test_np_wordvecs[counter] = review_rep
        test_np_labels[counter] = label
        test_ids.append(i)     # line number of this row in the test file
        counter += 1



In [44]:
X_Test = test_np_wordvecs
Y_Test = test_np_labels

In [45]:
len(test_np_labels)

399998

In [None]:
np.sum(Y_Test)/len(Y_Test)

In [46]:
np.save('X_Test_w2vec.npy', X_Test)
np.save('Y_Test_w2vec.npy', Y_Test)

In [47]:
tensor_x_test=torch.from_numpy(X_Test).float()
tensor_y_test=torch.from_numpy(Y_Test).long()

x_test_tensor=Variable(tensor_x_test)
predictions = model(x_test_tensor)
np_preds=predictions.data.numpy()

In [48]:
# The predicted class is the index of the largest score in each row.
# With K=1, the argsort slice keeps only that top-scoring class.
K=1
k=np.argsort(np_preds,axis=1)[:,-K:]

In [49]:
gt = Y_Test.astype(int)

In [50]:
accuracy = np.sum(k[:,0] == gt)/len(gt)

In [52]:
print('Accuracy is',accuracy)

Accuracy is 0.8266991334956675
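The accuracy computation above boils down to "pick the class with the highest score, then count matches". A minimal sketch on three made-up score rows (the scores and labels are invented for illustration):

```python
import numpy as np

# Made-up network outputs: one row per review, one column per class (0 = negative, 1 = positive).
scores = np.array([
    [2.0, -1.0],
    [0.3, 0.9],
    [-0.5, 0.1],
])
gt = np.array([0, 1, 0])           # made-up ground-truth labels

pred = scores.argmax(axis=1)       # index of the largest score in each row
accuracy = (pred == gt).mean()     # fraction of rows where prediction == label

print(pred)
print(accuracy)
```

For a top-1 prediction, `argmax` gives the same result as sorting the scores and keeping the last column.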


#### Let's look at some of the errors made by our model.

In [53]:
true_mask = k[:,0] == gt

In [59]:
printed = -1
id_label = {'0':'Negative','1':'Positive'}
for i in range(len(true_mask)):
    if not true_mask[i]:
        printed += 1
        if printed % 8000 ==0: 
            # i indexes the kept reviews (labels and predictions); test_id is
            # the matching line number in the original test file.
            test_id = test_ids[i]
            print(true_mask[i],test_id,i)
            print('Actual Sentiment: %s'%id_label[str(int(test_np_labels[i]))])
            print('Predicted Sentiment: %s'%id_label[str(int(k[i,0]))])
            print('\n')
            words = test_content[test_id].split(' ')
            print(' '.join(words[1:]))

False 9 9
Actual Sentiment: Negative
Predicted Sentiment: Positive


Not an "ultimate guide": Firstly,I enjoyed the format and tone of the book (how the author addressed the reader). However, I did not feel that she imparted any insider secrets that the book promised to reveal. If you are just starting to research law school, and do not know all the requirements of admission, then this book may be a tremendous help. If you have done your homework and are looking for an edge when it comes to admissions, I recommend some more topic-specific books. For example, books on how to write your personal statment, books geared specifically towards LSAT preparation (Powerscore books were the most helpful for me), and there are some websites with great advice geared towards aiding the individuals whom you are asking to write letters of recommendation. Yet, for those new to the entire affair, this book can definitely clarify the requirements for you.

False 45758 45757
Actual Sentiment: Negative
Pre

#### Conclusions from comparing different models

So, as you can see, the performance of the three models here was not very different. But how is this possible? Shouldn't the neural network beat everything else?

Well, it could. Maybe, maybe not.

The larger point I am trying to make here is not to rush to a complicated neural network unless it is needed. The binary classification task of positive vs. negative is in fact an easy task. A simple, specialized model like Naive Bayes seems to do just fine: it reached an accuracy of about 0.84 here, against about 0.83 for the mean word2vec network. So, we should probably just stick to a simple model which is fast, easy, and interpretable.

# Section 5: Negative review summarization

# Section 6: Overview and Conclusions

# Section 7: Proposed extensions

- Named entity extraction
- Identifying feature requests
- Exploring what our networks have learned