Text Embedding - IMDB dataset
=============================
---
Introduction au Deep Learning  (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020  

## Reviews analysis :

The objective is to guess whether our own new film reviews are **positive or negative**.  
For this, we will use our previously saved model.

What we're going to do:

 - Prepare the data
 - Retrieve our saved model
 - Evaluate the result


## Step 1 - Init python stuff

In [None]:
import numpy as np

import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.datasets.imdb as imdb

import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
import pandas as pd

import os,sys,h5py,json,re

from importlib import reload

sys.path.append('..')
import fidle.pwk as ooo

ooo.init()

## Step 2 - Preparing the data
### 2.1 - Our reviews :

In [None]:
reviews = [ "This film is particularly nice, a must see.",
             "Some films are classics and cannot be ignored.",
             "This movie is just abominable and doesn't deserve to be seen!"]

### 2.2 - Retrieve dictionaries

In [None]:
with open('./data/word_index.json', 'r') as fp:
    word_index = json.load(fp)
    index_word = {index:word for word,index in word_index.items()} 
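
As a minimal sketch of what these two dictionaries do, the example below uses a tiny hypothetical vocabulary (`toy_word_index` is an assumption for illustration, not the content of `word_index.json`). It assumes the convention used in the rest of this notebook: 0 is padding, 1 is `<start>` and 2 stands for an unknown word.

```python
# Hypothetical toy vocabulary; the real one is loaded from ./data/word_index.json
toy_word_index = {'<pad>': 0, '<start>': 1, '<unknown>': 2, 'film': 3, 'nice': 4}

# Invert the mapping: index -> word
toy_index_word = {index: word for word, index in toy_word_index.items()}

# Words missing from the vocabulary fall back to index 2 (unknown)
unknown_index = toy_word_index.get('abominable', 2)

# Indices missing from the inverse mapping fall back to '?'
known_word   = toy_index_word.get(4, '?')
unknown_word = toy_index_word.get(99, '?')

print(unknown_index, known_word, unknown_word)
```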

### 2.3 - Clean, index and pad

In [None]:
max_len    = 256
vocab_size = 10000


nb_reviews = len(reviews)
x_data     = []

# ---- For all reviews
for review in reviews:
    # ---- First index must be <start>
    index_review=[1]
    # ---- For all words
    for w in review.split(' '):
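        # ---- Clean it (lowercased, assuming the vocabulary entries are lowercase)
        w_clean = re.sub(r"[^a-zA-Z0-9]", "", w).lower()
        # ---- Not empty ?
        if len(w_clean)>0:
            # ---- Get the index of the cleaned word (2 = unknown word)
            w_index = word_index.get(w_clean,2)
            # ---- Index outside the vocabulary : treat it as unknown
            if w_index>=vocab_size : w_index=2
            # ---- Add the index
            index_review.append(w_index)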
    # ---- Add the indexed review
    x_data.append(index_review)    

# ---- Padding
x_data = keras.preprocessing.sequence.pad_sequences(x_data, value   = 0, padding = 'post', maxlen  = max_len)
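
The clean, index and pad steps can also be sketched as one small function in plain Python. This is only an illustration under stated assumptions: `encode_review` and the toy vocabulary are hypothetical names, and the manual padding stands in for `pad_sequences`. One difference to keep in mind: this sketch keeps the first `max_len` indices, whereas `pad_sequences` truncates from the start of the sequence by default.

```python
import re

def encode_review(text, word_index, vocab_size, max_len):
    """Turn a raw review into a fixed-length list of word indices."""
    indices = [1]                                    # 1 = <start>
    for w in text.split(' '):
        # Keep letters and digits only, in lowercase
        w_clean = re.sub(r"[^a-zA-Z0-9]", "", w).lower()
        if not w_clean:
            continue
        w_index = word_index.get(w_clean, 2)         # 2 = unknown word
        if w_index >= vocab_size:                    # outside the vocabulary
            w_index = 2
        indices.append(w_index)
    indices = indices[:max_len]                      # keep at most max_len indices
    return indices + [0] * (max_len - len(indices))  # pad at the end with 0

# Hypothetical toy vocabulary, for illustration only
toy_word_index = {'this': 10, 'film': 11, 'is': 12, 'nice': 13, 'rare': 50}

encoded = encode_review("This film is nice, rare!", toy_word_index,
                        vocab_size=20, max_len=8)
print(encoded)   # [1, 10, 11, 12, 13, 2, 0, 0]
```

Here `rare` has index 50, which is outside the toy vocabulary of size 20, so it is encoded as 2 like any unknown word.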

### 2.4 - Have a look

In [None]:
def translate(x):
    return ' '.join( [index_word.get(i,'?') for i in x] )

for i in range(nb_reviews):
    imax=np.where(x_data[i]==0)[0][0]+5
    print('\nText review      :',    reviews[i])
    print(f'x_data[{i}]        :', list(x_data[i][:imax]), '(...)')
    print( 'Translation      :', translate(x_data[i][:imax]), '(...)')
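
The display loop above looks for the first padding value with `np.where(...)[0][0]`, which raises an `IndexError` if a review fills all `max_len` positions and contains no 0. A slightly more defensive sketch is shown below; `first_padding_position` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np

def first_padding_position(sequence):
    """Return the position of the first 0, or the sequence length if there is none."""
    zeros = np.where(np.asarray(sequence) == 0)[0]
    return int(zeros[0]) if len(zeros) > 0 else len(sequence)

padded   = [1, 14, 22, 9, 0, 0, 0, 0]   # toy padded review
unpadded = [1, 14, 22, 9]               # toy review with no padding at all

print(first_padding_position(padded))
print(first_padding_position(unpadded))
```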

## Step 3 - Bring back the model

In [None]:
model = keras.models.load_model('./run/models/best_model.h5')

## Step 4 - Predict

In [None]:
y_pred   = model.predict(x_data)

#### And the winner is :

In [None]:
for i in range(nb_reviews):
    print(f'\n{reviews[i]:<70} =>',('NEGATIVE' if y_pred[i][0]<0.5 else 'POSITIVE'),f'({y_pred[i][0]:.2f})')
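
The decision rule used above can be isolated as a small function. This sketch assumes that the model returns one score between 0 and 1 per review, with higher values meaning a more positive review; `label_from_score` and the toy scores are hypothetical and are not real model outputs.

```python
def label_from_score(score, threshold=0.5):
    """Map a score in [0, 1] to a sentiment label."""
    return 'NEGATIVE' if score < threshold else 'POSITIVE'

# Made-up scores for illustration only
toy_scores = [0.91, 0.50, 0.07]
toy_labels = [label_from_score(s) for s in toy_scores]

for s, l in zip(toy_scores, toy_labels):
    print(f'{s:.2f} => {l}')
```

A score of exactly 0.5 is labeled POSITIVE, as in the cell above.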
