<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Decoding-data-to-plain-text" data-toc-modified-id="Decoding-data-to-plain-text-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Decoding data to plain text</a></span></li><li><span><a href="#Steps-requested" data-toc-modified-id="Steps-requested-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Steps requested</a></span></li></ul></div>

# Decoding data to plain text

In [1]:
from __future__ import absolute_import, division, print_function
import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

In [2]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(
    path='imdb.npz',
    num_words=None,
    skip_top=0,
    maxlen=None,
    seed=113,
    start_char=1,
    oov_char=2,
    index_from=3
)

word_index = tf.keras.datasets.imdb.get_word_index()

In [3]:
type(x_train)

numpy.ndarray

In [4]:
x_train.shape

(25000,)

In [5]:
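# Shift every word rank by 3 to match index_from=3 used in load_data,
# which leaves indices 0-3 free for the reserved tokens below.
word_index = {k: (v + 3) for k, v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  # unknown
word_index["<UNUSED>"] = 3

# Reverse mapping: integer index -> word.
index_word = {value: key for key, value in word_index.items()}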
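
To see what the offset in the cell above does, here is a minimal sketch on a made-up vocabulary. The names `toy_word_index` and `toy_index_word` and the three ranks are hypothetical and only stand in for the real output of `get_word_index()`; nothing is downloaded.

```python
# Hypothetical vocabulary standing in for get_word_index() (illustration only).
toy_word_index = {"the": 1, "film": 2, "great": 3}

# Same shift as above: ranks move up by 3, freeing 0-3 for reserved tokens.
toy_word_index = {word: rank + 3 for word, rank in toy_word_index.items()}
toy_word_index["<PAD>"] = 0
toy_word_index["<START>"] = 1
toy_word_index["<UNK>"] = 2
toy_word_index["<UNUSED>"] = 3

# Reverse mapping: integer index -> word.
toy_index_word = {index: word for word, index in toy_word_index.items()}

# An encoded "review": <START>, three known words, <UNK>, and an index
# that is not in the mapping at all (decoded as '?').
encoded = [1, 4, 5, 6, 2, 99]
decoded = " ".join(toy_index_word.get(i, "?") for i in encoded)
print(decoded)  # <START> the film great <UNK> ?
```

Without the shift, rank 1 ("the") would collide with the `<START>` marker, which is why the mapping has to be offset before it is inverted.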

In [6]:
def decode_review(encoded_array):
    """Turn a sequence of word indices back into text; unknown indices become '?'."""
    return ' '.join([index_word.get(i, '?') for i in encoded_array])

The next two cells extract the reviews the way I would do it myself. In the section "Steps requested" I repeat the extraction as the assignment asks.

In [7]:
negative_reviews = [decode_review(x) for y, x in zip(y_train, x_train) if y == 0]

In [8]:
positive_reviews = [decode_review(x) for y, x in zip(y_train, x_train) if y == 1]

This is an example of a **negative review** from the corpus:

In [9]:
negative_reviews[-1]

"<START> as a big fan of the original film it's hard to watch this show the garish set decor and harshly lighted sets rob any style from this remake the mood is never there instead it has the look and feel of so many television movies of the seventies crenna is not a bad choice as walter neff but his snappy wardrobe and swank apartment don't fit the mood of the original or make him an interesting character he does his best to make it work but samantha egger is a really bad choice the english accent and california looks can't hold a candle to barbara stanwick's velvet voice and sex appeal lee j cobb tries mightily to fashion barton keyes but even his performance is just gruff without style br br it feels like the tv movie it was and again reminds me of what a remarkable film the original still is"

This is an example of a **positive review** from the corpus:

In [10]:
positive_reviews[-1]

"<START> six degrees had me hooked i looked forward to it coming on and was totally disappointed when men in trees replaced it's time spot i thought it was just on hiatus and would be back early in 2007 what happened all my friends were really surprised it ended we could relate to the characters who had real problems we talked about each episode and had our favorite characters there wasn't anybody on the show i didn't like and felt the acting was superb i alway like seeing programs being taped in cities where you can identify the local areas i for one would like to protest the canceling of this show and ask you to bring it back and give it another chance give it a good time slot don't keep moving it from this day to that day and advertise it so people will know it is on"

The assignment talks about positions, and since the variables are called `*_index`, I assume they should hold the positions of the reviews within the training list. These can be computed as follows:

In [11]:
positive_index = [i for i, s in enumerate(y_train) if s == 1]
negative_index = [i for i, s in enumerate(y_train) if s == 0]
positive_index[4]

10

In [12]:
negative_index[4]

7

# Steps requested


Although the cells above already fulfil the requirements (in a way I find cleaner and more idiomatic Python), this section extracts the same information in the way the assignment requests.

In [13]:
negative_index = np.where(y_train == 0)[0]
positive_index = np.where(y_train == 1)[0]
positive_index[-1]

24998

In [14]:
negative_index[-1]

24999
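
As a quick sanity check, the list-comprehension approach and the `np.where` approach should select the same positions. The sketch below shows this on a made-up label array; `toy_labels` is hypothetical and not taken from the dataset.

```python
import numpy as np

# Hypothetical labels: 1 = positive review, 0 = negative review.
toy_labels = np.array([1, 0, 0, 1, 0, 1])

# Approach 1: plain Python, collects positions into a list.
positive_list = [i for i, label in enumerate(toy_labels) if label == 1]

# Approach 2: np.where returns a tuple with one array per dimension,
# so [0] picks the array of positions along the only axis.
positive_where = np.where(toy_labels == 1)[0]

print(positive_list)   # [0, 3, 5]
print(positive_where)  # [0 3 5]
```

The only practical difference is the return type: a Python list in the first case and a NumPy integer array in the second. Both can be used to index into `x_train` one element at a time.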

In [15]:
positive_example = decode_review(x_train[positive_index[4]])
negative_example = decode_review(x_train[negative_index[4]])

One example of a **positive review**:

In [16]:
positive_example

"<START> french horror cinema has seen something of a revival over the last couple of years with great films such as inside and switchblade romance bursting on to the scene maléfique preceded the revival just slightly but stands head and shoulders over most modern horror titles and is surely one of the best french horror films ever made maléfique was obviously shot on a low budget but this is made up for in far more ways than one by the originality of the film and this in turn is complimented by the excellent writing and acting that ensure the film is a winner the plot focuses on two main ideas prison and black magic the central character is a man named carrère sent to prison for fraud he is put in a cell with three others the quietly insane lassalle body building transvestite marcus and his retarded boyfriend daisy after a short while in the cell together they stumble upon a hiding place in the wall that contains an old journal after translating part of it they soon realise its magica

One example of a **negative review**:

In [17]:
negative_example

"<START> the hamiltons tells the story of the four hamilton siblings teenager francis cory knauf twins wendell joseph mckelheer darlene mackenzie firgens the eldest david samuel who is now the surrogate parent in charge the hamilton's move house a lot franics is unsure why is unhappy with the way things are the fact that his brother's sister kidnap imprison murder people in the basement doesn't help relax or calm francis' nerves either francis know's something just isn't right when he eventually finds out the truth things will never be the same again br br co written co produced directed by mitchell altieri phil flores as the butcher brothers who's only other film director's credit so far is the april fool's day 2008 remake enough said this was one of the 'films to die for' at the 2006 after dark horrorfest or whatever it's called in keeping with pretty much all the other's i've seen i thought the hamiltons was complete total utter crap i found the character's really poor very unlikabl