We will use the "IMDB movie review sentiment classification dataset".

Dataset description: https://keras.io/api/datasets/imdb/

This is a dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive / negative). The reviews have been preprocessed, and each review is encoded as a list of word indices (integers). For convenience, words are indexed by their overall frequency in the dataset, so that, for example, the integer "3" encodes the 3rd most frequent word in the data.
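To make the frequency-based encoding concrete, here is a minimal pure-Python sketch on a toy corpus. The helper `build_frequency_index` and the sample sentences are hypothetical and are not part of Keras; the real dataset also reserves a few low indices for padding, start-of-sequence and unknown words, which this sketch ignores.

```python
from collections import Counter

def build_frequency_index(texts):
    """Map each word to its frequency rank (1 = most frequent word)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ranked, start=1)}

# Hypothetical toy corpus, used only for illustration.
toy_reviews = [
    "the film was great",
    "the film was bad",
    "the plot was great",
]
word_index = build_frequency_index(toy_reviews)
encoded = [[word_index[w] for w in review.split()] for review in toy_reviews]
print(word_index)
print(encoded)
```

Frequent words such as "the" receive small integers, which is why limiting the vocabulary with `num_words` keeps the most common words.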

In [1]:
import numpy
import keras
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM, Dropout
# import layers and helpers from the public tensorflow.keras API rather than private modules
from tensorflow.keras.layers import Embedding, Conv1D, MaxPooling1D, Flatten
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences
# fix random seed for reproducibility
numpy.random.seed(7)

In [31]:
db=imdb.load_data()

In [32]:
top_words = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)

In [33]:
len(X_train)


25000

In [34]:
y_train

array([1, 0, 0, ..., 0, 1, 0], dtype=int64)

In [35]:
max_review_length = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)

In [36]:
X_train.shape


(25000, 500)

We will use the Embedding layer, which is defined as the first hidden layer of the network. It takes 3 arguments:

input_dim: the size of the vocabulary in the text data

output_dim: the size of the vector space in which each word will be embedded

input_length: the length of the input sequences; for example, if your documents contain 100 words each, this is 100
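Conceptually, an Embedding layer is a trainable lookup table with `input_dim` rows and `output_dim` columns. The sketch below imitates only the forward lookup in plain Python with random values; the function `embed` and the toy sizes are illustrative assumptions, not the Keras implementation.

```python
import random

random.seed(7)
input_dim, output_dim, input_length = 10, 3, 5   # toy sizes (assumptions)

# One row of `output_dim` values per vocabulary index.
table = [[random.uniform(-0.05, 0.05) for _ in range(output_dim)]
         for _ in range(input_dim)]

def embed(sequence):
    """Replace every integer index by its row in the lookup table."""
    for idx in sequence:
        if not 0 <= idx < input_dim:
            raise ValueError(f"index {idx} is not in [0, {input_dim})")
    return [table[idx] for idx in sequence]

vectors = embed([7, 2, 2, 0, 0])          # one padded sequence of length 5
print(len(vectors), len(vectors[0]))      # 5 3
```

Every input value must be an integer in `[0, input_dim)`; anything else has no row to look up.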

In [45]:
# create the model
embedding_vector_length = 32

model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.build(X_train.shape) 
print(model.summary())
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=6, batch_size=64)

Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 module_wrapper_30 (ModuleWr  (25000, 500, 32)         160000    
 apper)                                                          
                                                                 
 module_wrapper_31 (ModuleWr  (25000, 500, 32)         3104      
 apper)                                                          
                                                                 
 module_wrapper_32 (ModuleWr  (25000, 250, 32)         0         
 apper)                                                          
                                                                 
 lstm_10 (LSTM)              (25000, 100)              53200     
                                                                 
 dense_10 (Dense)            (25000, 1)                101       
                                                     

<keras.callbacks.History at 0x1ad8892dd90>
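The parameter counts and sequence lengths in the summary above can be reproduced by hand. The short check below uses the standard formulas for Embedding, Conv1D, LSTM and Dense layers; the variable names are chosen for illustration.

```python
top_words, embedding_dim, seq_len = 5000, 32, 500
filters, kernel_size, pool_size, lstm_units = 32, 3, 2, 100

# Embedding: one vector of size embedding_dim per vocabulary index.
embedding_params = top_words * embedding_dim
# Conv1D: kernel_size * input_channels weights per filter, plus one bias per filter.
conv_params = kernel_size * embedding_dim * filters + filters
# MaxPooling1D has no weights; it only divides the sequence length by pool_size.
pooled_len = seq_len // pool_size
# LSTM: 4 gates, each with input weights, recurrent weights and a bias.
lstm_params = 4 * (lstm_units * (filters + lstm_units) + lstm_units)
# Dense(1): one weight per LSTM unit plus one bias.
dense_params = lstm_units * 1 + 1

print(embedding_params, conv_params, pooled_len, lstm_params, dense_params)
# 160000 3104 250 53200 101
```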

In [46]:

# evaluation
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 87.40%


## A simple example of the Embedding layer

In [82]:
docs = ['Well done!',
		'Good work',
		'Great effort',
		'nice work',
		'Excellent!',
		'Weak',
		'Poor effort!',
		'not good',
		'poor work',
		'Could have done better.']

In [83]:
labels = [1,1,1,1,1,0,0,0,0,0]

In [84]:
vocab_size = 50

In [85]:
encoded_docs = [one_hot(d, vocab_size) for d in docs]

In [86]:
print(encoded_docs)

[[48, 38], [37, 37], [13, 26], [41, 37], [4], [43], [35, 26], [32, 37], [35, 37], [9, 34, 38, 8]]
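Note that `one_hot` does not build a vocabulary: it hashes each word into the range `[1, vocab_size)`, so two different words can receive the same integer. In the output above, "good" and "work" both map to 37. The sketch below illustrates the idea with a deterministic MD5 hash; the function `hashed_encoding` and its simple tokenizer are assumptions for illustration, and to my understanding Keras uses Python's built-in `hash` by default, so the actual integers will differ.

```python
import hashlib
import re

def hashed_encoding(text, vocab_size):
    """Encode a text as integers in [1, vocab_size) by hashing each word."""
    words = re.findall(r"[a-z']+", text.lower())

    def stable_hash(word):
        return int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)

    # Index 0 is left free so that it can be used for padding.
    return [stable_hash(w) % (vocab_size - 1) + 1 for w in words]

docs = ["Well done!", "Good work", "nice work", "not good"]
encoded = [hashed_encoding(d, 50) for d in docs]
print(encoded)
```

The same word always receives the same index, but with a small `vocab_size` collisions between different words become likely.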


In [87]:
max_length = 4
padded_docs = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
print(padded_docs)

[[48 38  0  0]
 [37 37  0  0]
 [13 26  0  0]
 [41 37  0  0]
 [ 4  0  0  0]
 [43  0  0  0]
 [35 26  0  0]
 [32 37  0  0]
 [35 37  0  0]
 [ 9 34 38  8]]
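With `padding='post'`, zeros are appended after the encoded words until every row has `max_length` entries. Here is a minimal pure-Python sketch of that behaviour; the helper `pad_post` is hypothetical, and it mimics the Keras default of dropping items from the front of sequences that are too long (`truncating='pre'`).

```python
def pad_post(sequences, maxlen, value=0):
    """Right-pad with `value`; keep the last `maxlen` items of long sequences."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]      # drop items from the front if too long
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded

result = pad_post([[48, 38], [4], [9, 34, 38, 8], [1, 2, 3, 4, 5]], maxlen=4)
print(result)
# [[48, 38, 0, 0], [4, 0, 0, 0], [9, 34, 38, 8], [2, 3, 4, 5]]
```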


We are now ready to define our Embedding layer as part of our model.

The Embedding layer has a vocabulary of 50 and an input length of 4. We will choose a small embedding space of 8 dimensions.

The model is a simple binary classifier. Importantly, the output of the Embedding layer will be 4 vectors of 8 dimensions each, one per word. We flatten this (the Flatten layer) into a single 32-element vector to pass it to the Dense output layer.
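The layer sizes stated above can be checked with a few lines of arithmetic. The variable names are chosen for illustration only.

```python
vocab_size, embedding_dim, max_length = 50, 8, 4

embedding_params = vocab_size * embedding_dim   # one 8-d vector per index
flatten_size = max_length * embedding_dim       # 4 vectors of 8 values each
dense_params = flatten_size * 1 + 1             # 32 weights plus one bias
total_params = embedding_params + dense_params

print(embedding_params, flatten_size, dense_params, total_params)
# 400 32 33 433
```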

In [88]:
model = Sequential()
model.add(Embedding(vocab_size, 8, input_length=max_length))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.build(padded_docs.shape) 
# summarize the model
print(model.summary())


Model: "sequential_17"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 module_wrapper_45 (ModuleWr  (10, 4, 8)               400       
 apper)                                                          
                                                                 
 module_wrapper_46 (ModuleWr  (10, 32)                 0         
 apper)                                                          
                                                                 
 dense_17 (Dense)            (10, 1)                   33        
                                                                 
Total params: 433
Trainable params: 433
Non-trainable params: 0
_________________________________________________________________
None


In [89]:
import numpy as np
labels=np.array(labels)
model.fit(padded_docs, labels,epochs=50, verbose=0)

<keras.callbacks.History at 0x1ad81d85160>

In [90]:
loss, accuracy = model.evaluate(padded_docs, labels, verbose=0)
print('Accuracy: %f' % (accuracy*100))

Accuracy: 100.000000


### Assignment: word2vec on a dataset of your choice, then apply the IMDB LSTM with CNN model

In [1]:
import numpy
import keras
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM, Dropout
# import layers and helpers from the public tensorflow.keras API rather than private modules
from tensorflow.keras.layers import Embedding, Conv1D, MaxPooling1D, Flatten
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences

import pandas as pd



In [2]:
tweets_train=pd.read_csv("train_data.csv")
tweets_test=pd.read_csv("test_data.csv")

#tweets

In [3]:
tweets_train=tweets_train[750000:800000] # this slice was chosen so that it contains both sentiment 0 and 1
tweets_train.shape

(50000, 2)

In [4]:

tweets_test.shape

(359, 2)

In [5]:
tweets_train.head()

Unnamed: 0,sentence,sentiment
750000,i just paid for car insurance,0
750001,i am not no but i have been a brazi soccer fan...,0
750002,is laid in bed and can t move coz of my sunburn,0
750003,stop doing what,0
750004,maths and french homework,0


### Preprocessing and vectorization of train_data

In [6]:
tweets_train = tweets_train[tweets_train['sentence'] != ''].reset_index(drop=True)
tweets_train

Unnamed: 0,sentence,sentiment
0,i just paid for car insurance,0
1,i am not no but i have been a brazi soccer fan...,0
2,is laid in bed and can t move coz of my sunburn,0
3,stop doing what,0
4,maths and french homework,0
...,...,...
49995,are you ever too old for jelly or banana custa...,1
49996,ooooo i would love to come over and spread som...,1
49997,girl i saw pete wentz on made on mtv today pur...,1
49998,quot i am happiness times infinity million quo...,1


In [7]:
# collect the tweet texts as a plain Python list
tweets_list = tweets_train["sentence"].tolist()

tweets_list

['i just paid for car insurance',
 'i am not no but i have been a brazi soccer fangirl forever sa soccer fangirl not so much',
 'is laid in bed and can t move coz of my sunburn',
 'stop doing what',
 'maths and french homework',
 'i was hoping for more from just poor lonely did get stuck at the border no hypeman lame',
 's too early i have that zombie look about me today like my brains still asleep and only basic motor skills are functioning at the moment',
 'ooo those guys sound good currently working through your mix tape list i have to admit to only having a couple of the songs',
 'blur are by far the best band ive seen live i want to go again',
 'that totally sucks you have the worst luck with toys',
 'make sure u come early coz yesterday all session times were sold out',
 'i miss snow i feel like i m in a box',
 'woke up after cycling at am today s my relaxing day i think that tomorrow im gonna leaving for my grandma s house i miss her',
 'days of boredom coming up nicks going on 

In [8]:
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

processed_tweets_list=[]
for tweet in tweets_list:
    processed_tweet = tweet.lower()
    # strip URLs and @mentions first, while their punctuation is still intact
    processed_tweet = re.sub(r'http[s]?://\S+', ' ', processed_tweet)
    processed_tweet = re.sub(r'\bwww\.\S+', ' ', processed_tweet)
    processed_tweet = re.sub(r'@\S+', ' ', processed_tweet)
    # keep letters only, then collapse runs of whitespace
    processed_tweet = re.sub(r'[^a-z]', ' ', processed_tweet)
    processed_tweet = re.sub(r'\s+', ' ', processed_tweet).strip()
    processed_tweets_list.append(processed_tweet)

all_tweets_words=[]
lemmatizer=WordNetLemmatizer()
stop_words = set(stopwords.words('english'))  # build the set once, not once per word
for tweet in processed_tweets_list:
    words = word_tokenize(tweet)
    # drop stop words and non-alphabetic tokens, then lemmatize what remains
    words = [lemmatizer.lemmatize(word) for word in words if word not in stop_words and word.isalpha()]
    all_tweets_words.append(words)
all_tweets_words

[['paid', 'car', 'insurance'],
 ['brazi', 'soccer', 'fangirl', 'forever', 'sa', 'soccer', 'fangirl', 'much'],
 ['laid', 'bed', 'move', 'coz', 'sunburn'],
 ['stop'],
 ['math', 'french', 'homework'],
 ['hoping', 'poor', 'lonely', 'get', 'stuck', 'border', 'hypeman', 'lame'],
 ['early',
  'zombie',
  'look',
  'today',
  'like',
  'brain',
  'still',
  'asleep',
  'basic',
  'motor',
  'skill',
  'functioning',
  'moment'],
 ['ooo',
  'guy',
  'sound',
  'good',
  'currently',
  'working',
  'mix',
  'tape',
  'list',
  'admit',
  'couple',
  'song'],
 ['blur', 'far', 'best', 'band', 'ive', 'seen', 'live', 'want', 'go'],
 ['totally', 'suck', 'worst', 'luck', 'toy'],
 ['make',
  'sure',
  'u',
  'come',
  'early',
  'coz',
  'yesterday',
  'session',
  'time',
  'sold'],
 ['miss', 'snow', 'feel', 'like', 'box'],
 ['woke',
  'cycling',
  'today',
  'relaxing',
  'day',
  'think',
  'tomorrow',
  'im',
  'gon',
  'na',
  'leaving',
  'grandma',
  'house',
  'miss'],
 ['day', 'boredom', 'comi

In [9]:
import gensim.downloader
from gensim.models import Word2Vec
from gensim.test.utils import datapath
from gensim.models.word2vec import PathLineSentences

In [22]:
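def propreDocWV(corpus_lemetized, vector_size=10):
    # train a small Word2Vec model on the corpus, then average the word vectors of each document
    model = Word2Vec(sentences=corpus_lemetized, vector_size=vector_size, window=3, negative=10, min_count=0, workers=2, seed=14)
    M=[]
    for doc in corpus_lemetized:
        v=np.zeros(vector_size)
        for word in doc:
            v+=model.wv[word]
        if len(doc) > 0:  # an empty document keeps the zero vector instead of dividing by zero
            v=v/len(doc)
        M.append(v)
    return M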
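The function above represents each document by the mean of its word vectors. Below is a minimal sketch of the same averaging, written without gensim so that it runs anywhere. The toy vectors, the name `mean_vector`, and the choice to return a zero vector for empty or fully out-of-vocabulary documents are assumptions made for illustration.

```python
def mean_vector(tokens, vectors, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) iterates over the vector components column by column.
    return [sum(col) / len(known) for col in zip(*known)]

# Hypothetical 2-d word vectors, used only for illustration.
toy_vectors = {"good": [1.0, 0.0], "work": [0.0, 1.0]}
avg = mean_vector(["good", "work"], toy_vectors, dim=2)
unseen = mean_vector(["unseen"], toy_vectors, dim=2)
empty = mean_vector([], toy_vectors, dim=2)
print(avg, unseen, empty)
# [0.5, 0.5] [0.0, 0.0] [0.0, 0.0]
```

Passing the same `vectors` mapping for training and test documents keeps both sets in one embedding space, whereas training a separate word2vec model per set produces vectors that are not comparable.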


In [23]:
import numpy as np
vectors_x_train=propreDocWV(all_tweets_words )
vectors_x_train

  v=v/len(doc)


[array([ 0.6136945 , -1.56757909, -0.71847992, -0.30494433, -2.21955061,
         1.16360666,  0.70754535,  0.63668569,  0.13929293,  0.91335736]),
 array([ 0.09363706, -1.14923077, -0.33710334, -0.22519355, -1.95884847,
         0.95419337,  0.5928699 ,  0.39407328,  0.28001025,  0.57742669]),
 array([ 0.5706891 , -1.16636932, -0.45110393, -0.16407705, -2.44429569,
         0.9768643 ,  1.38849605,  0.3960963 , -0.00470814,  1.15066411]),
 array([ 0.74913853, -2.23480797, -0.95124143, -0.07684651, -2.99777532,
         1.82201731,  1.4648149 ,  1.08056045,  0.50776786,  1.07535136]),
 array([ 0.46159844, -1.61173252, -0.5295833 , -0.49651379, -2.77554925,
         1.44281252,  0.89448732,  0.48155541,  0.37834621,  1.18847255]),
 array([ 0.51709726, -1.44302201, -0.54963631, -0.21295907, -2.38289683,
         1.10277738,  1.25630184,  0.70788367,  0.27363492,  0.79550207]),
 array([ 0.40217836, -1.51585056, -1.14099944, -0.24909839, -2.64120317,
         1.046898  ,  1.29566493,  0.54

In [76]:
# collect the sentiment labels as a plain Python list
vectors_y_train = tweets_train['sentiment'].tolist()

vectors_y_train

[0,
 0,
 0,
 0,
 0,
 ...


In [77]:
vectors_y_train.count(1)

32941

### Preprocessing and vectorization of test_data

In [78]:
#Removing neutrals from the test_csv

tweets_test = tweets_test[tweets_test['sentiment'] > -1].reset_index(drop=True)


In [35]:
# collect the test tweet texts as a list of strings
tweets_test['sentence'] = tweets_test['sentence'].astype('str')
tweets_list_test = tweets_test['sentence'].tolist()

tweets_list_test

['i loooooooovvvvvveee my kindle not that the dx is cool but the is fantastic in its own right',
 'reading my kindle love it lee childs is good read',
 'ok first assesment of the kindle it fucking rocks',
 'you ll love your kindle i ve had mine for a few months and never looked back the new big one is huge no need for remorse',
 'fair enough but i have the kindle and i think it s perfect',
 'no it is too big i m quite happy with the kindle',
 'fuck this economy i hate aig and their non loan given asses',
 'jquery is my new best friend',
 'loves twitter',
 'how can you not love obama he makes jokes about himself',
 'i firmly believe that obama pelosi have zero desire to be civil it s a charade and a slogan but they want to destroy conservatism',
 'house correspondents dinner was last night whoopi barbara amp sherri went obama got a standing ovation',
 'watchin espn jus seen this new nike commerical with a puppet lebron sh t was hilarious lmao',
 'dear nike stop with the flywire that shi

In [36]:
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

processed_tweets_list_test=[]
for tweet in tweets_list_test:
    processed_tweet_test = tweet.lower()
    # strip URLs and @mentions first, while their punctuation is still intact
    processed_tweet_test = re.sub(r'http[s]?://\S+', ' ', processed_tweet_test)
    processed_tweet_test = re.sub(r'\bwww\.\S+', ' ', processed_tweet_test)
    processed_tweet_test = re.sub(r'@\S+', ' ', processed_tweet_test)
    # keep letters only, then collapse runs of whitespace
    processed_tweet_test = re.sub(r'[^a-z]', ' ', processed_tweet_test)
    processed_tweet_test = re.sub(r'\s+', ' ', processed_tweet_test).strip()
    processed_tweets_list_test.append(processed_tweet_test)

all_tweets_words_test=[]
lemmatizer=WordNetLemmatizer()
stop_words = set(stopwords.words('english'))  # build the set once, not once per word
for tweet in processed_tweets_list_test:
    words = word_tokenize(tweet)
    # drop stop words and non-alphabetic tokens, then lemmatize what remains
    words = [lemmatizer.lemmatize(word) for word in words if word not in stop_words and word.isalpha()]
    all_tweets_words_test.append(words)
all_tweets_words_test

[['loooooooovvvvvveee', 'kindle', 'dx', 'cool', 'fantastic', 'right'],
 ['reading', 'kindle', 'love', 'lee', 'child', 'good', 'read'],
 ['ok', 'first', 'assesment', 'kindle', 'fucking', 'rock'],
 ['love',
  'kindle',
  'mine',
  'month',
  'never',
  'looked',
  'back',
  'new',
  'big',
  'one',
  'huge',
  'need',
  'remorse'],
 ['fair', 'enough', 'kindle', 'think', 'perfect'],
 ['big', 'quite', 'happy', 'kindle'],
 ['fuck', 'economy', 'hate', 'aig', 'non', 'loan', 'given', 'ass'],
 ['jquery', 'new', 'best', 'friend'],
 ['love', 'twitter'],
 ['love', 'obama', 'make', 'joke'],
 ['firmly',
  'believe',
  'obama',
  'pelosi',
  'zero',
  'desire',
  'civil',
  'charade',
  'slogan',
  'want',
  'destroy',
  'conservatism'],
 ['house',
  'correspondent',
  'dinner',
  'last',
  'night',
  'whoopi',
  'barbara',
  'amp',
  'sherri',
  'went',
  'obama',
  'got',
  'standing',
  'ovation'],
 ['watchin',
  'espn',
  'jus',
  'seen',
  'new',
  'nike',
  'commerical',
  'puppet',
  'lebron',

In [37]:
vectors_x_test=propreDocWV(all_tweets_words_test)
vectors_x_test

[array([-0.00163558,  0.04066331,  0.00211927, -0.02721975,  0.04844733,
        -0.03346215, -0.00543478,  0.04260352, -0.03257908, -0.05257192]),
 array([ 0.01966142,  0.00941234,  0.02874738, -0.03091909,  0.00656022,
        -0.02962932,  0.04188734,  0.01768171,  0.0065023 , -0.03430595]),
 array([ 0.01572508,  0.01628324,  0.03782013,  0.02133737,  0.00191237,
        -0.00623306,  0.05328319, -0.02073501, -0.02179542, -0.01317546]),
 array([ 0.01828805, -0.01968385,  0.02291916, -0.0276415 ,  0.0127972 ,
        -0.0016218 ,  0.00551999,  0.0235882 , -0.01368805,  0.00452693]),
 array([-0.02236329,  0.00885324,  0.00937928,  0.00207432,  0.03335678,
         0.00799575,  0.00612557,  0.06287029,  0.00719243, -0.01795185]),
 array([-0.02453067, -0.00840564,  0.04281795,  0.00537786, -0.0200921 ,
        -0.00913443,  0.01555002,  0.05933105, -0.00374633, -0.04692368]),
 array([ 0.04411334,  0.0418305 ,  0.00043773, -0.03385943,  0.03843012,
         0.02678623,  0.00758228,  0.01

In [79]:
# collect the test sentiment labels as a plain Python list
vectors_y_test = tweets_test['sentiment'].tolist()

len(vectors_y_test)

359

### Model implementation

In [191]:
len(vectors_x_train)

50000

In [192]:
vectors_y_train=np.array(vectors_y_train)
vectors_y_train

array([0., 0., 0., ..., 1., 1., 1.], dtype=float32)

In [193]:
vectors_y_test=np.array(vectors_y_test)
vectors_y_test

array([1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1., 0., 1., 0., 1.,
       0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0.,
       0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 1., 0., 1., 1., 1., 0.,
       1., 1., 0., 0., 1., 1., 1., 0., 1., 1., 1., 0., 0., 1., 0., 1., 1.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0.,
       1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 1.,
       0., 0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0.,
       0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1.,
       1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 1.,
       1., 1., 0., 1., 1., 0., 1., 1., 1., 1., 1., 1., 0., 0., 1., 1., 1.,
       1., 1., 1., 0., 0., 0., 1., 0., 1., 1., 1., 1., 1., 0., 0., 1., 1.,
       0., 0., 0., 0., 0.

In [194]:
max_review_length = 40
X_train = sequence.pad_sequences(vectors_x_train, maxlen=max_review_length,padding='post')
X_test = sequence.pad_sequences(vectors_x_test, maxlen=max_review_length,padding='post')

In [195]:
X_train[60,:]

array([ 1,  0,  0,  0,  1,  0,  2,  1, -1,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0])

In [196]:
X_train = X_train.astype("float32") 
X_test = X_test.astype("float32") 

vectors_y_train=vectors_y_train.astype("float32") 
vectors_y_test=vectors_y_test.astype("float32") 

In [210]:
# create the model
embedding_vector_length = 32
top_words=100
model = Sequential()

model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.build(X_train.shape) 
print(model.summary())
#model.fit(X_train, vectors_y_train, validation_data=(X_test, vectors_y_test), epochs=4, batch_size=64)

Model: "sequential_49"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 module_wrapper_110 (ModuleW  (50000, 40, 32)          3200      
 rapper)                                                         
                                                                 
 module_wrapper_111 (ModuleW  (50000, 40, 32)          3104      
 rapper)                                                         
                                                                 
 module_wrapper_112 (ModuleW  (50000, 20, 32)          0         
 rapper)                                                         
                                                                 
 lstm_38 (LSTM)              (50000, 100)              53200     
                                                                 
 dense_40 (Dense)            (50000, 1)                101       
                                                     

In [211]:
model.fit(X_train, vectors_y_train,epochs=25, batch_size=64)

Epoch 1/25


InvalidArgumentError:  indices[56,8] = -1 is not in [0, 100)
	 [[node sequential_49/module_wrapper_110/embedding_34/embedding_lookup
 (defined at C:\Users\Hp\anaconda3\lib\site-packages\tensorflow\python\keras\layers\embeddings.py:191)
]] [Op:__inference_train_function_86765]

Errors may have originated from an input operation.
Input Source operations connected to node sequential_49/module_wrapper_110/embedding_34/embedding_lookup:
In[0] sequential_49/module_wrapper_110/embedding_34/embedding_lookup/85493:	
In[1] sequential_49/module_wrapper_110/embedding_34/Cast (defined at C:\Users\Hp\anaconda3\lib\site-packages\tensorflow\python\keras\layers\embeddings.py:190)

Operation defined at: (most recent call last)
>>>   File "C:\Users\Hp\anaconda3\lib\runpy.py", line 197, in _run_module_as_main
>>>     return _run_code(code, main_globals, None,
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\runpy.py", line 87, in _run_code
>>>     exec(code, run_globals)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
>>>     app.launch_new_instance()
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\traitlets\config\application.py", line 846, in launch_instance
>>>     app.start()
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 677, in start
>>>     self.io_loop.start()
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\tornado\platform\asyncio.py", line 199, in start
>>>     self.asyncio_loop.run_forever()
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\asyncio\base_events.py", line 596, in run_forever
>>>     self._run_once()
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\asyncio\base_events.py", line 1890, in _run_once
>>>     handle._run()
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\asyncio\events.py", line 80, in _run
>>>     self._context.run(self._callback, *self._args)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 457, in dispatch_queue
>>>     await self.process_one()
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 446, in process_one
>>>     await dispatch(*args)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 353, in dispatch_shell
>>>     await result
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 648, in execute_request
>>>     reply_content = await reply_content
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 353, in do_execute
>>>     res = shell.run_cell(code, store_history=store_history, silent=silent)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell
>>>     return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2901, in run_cell
>>>     result = self._run_cell(
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2947, in _run_cell
>>>     return runner(coro)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\IPython\core\async_helpers.py", line 68, in _pseudo_sync_runner
>>>     coro.send(None)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3172, in run_cell_async
>>>     has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3364, in run_ast_nodes
>>>     if (await self.run_code(code, result,  async_=asy)):
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3444, in run_code
>>>     exec(code_obj, self.user_global_ns, self.user_ns)
>>> 
>>>   File "C:\Users\Hp\AppData\Local\Temp/ipykernel_9644/1209497260.py", line 1, in <module>
>>>     model.fit(X_train, vectors_y_train,epochs=25, batch_size=64)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\engine\training.py", line 1216, in fit
>>>     tmp_logs = self.train_function(iterator)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\engine\training.py", line 878, in train_function
>>>     return step_function(self, iterator)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\engine\training.py", line 867, in step_function
>>>     outputs = model.distribute_strategy.run(run_step, args=(data,))
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\engine\training.py", line 860, in run_step
>>>     outputs = model.train_step(data)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\engine\training.py", line 808, in train_step
>>>     y_pred = self(x, training=True)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\engine\sequential.py", line 373, in call
>>>     return super(Sequential, self).call(inputs, training=training, mask=mask)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\engine\functional.py", line 451, in call
>>>     return self._run_internal_graph(
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\engine\functional.py", line 589, in _run_internal_graph
>>>     outputs = node.layer(*args, **kwargs)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\keras\engine\functional.py", line 1502, in call
>>>     return getattr(self._module, self._method_name)(*args, **kwargs)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1044, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "C:\Users\Hp\anaconda3\lib\site-packages\tensorflow\python\keras\layers\embeddings.py", line 191, in call
>>>     out = embedding_ops.embedding_lookup_v2(self.embeddings, inputs)
>>> 
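The error arises because an Embedding layer expects integer word indices in `[0, input_dim)`, whereas `X_train` holds averaged word2vec values that `pad_sequences` cast to integers (its default dtype is `int32`), which yields entries such as -1. Because training stopped at the first batch, the evaluation in the next cell presumably measures an untrained model. One possible way to prepare input for the Embedding layer is sketched below: it maps tokens to integer indices instead of averaging vectors. The names `build_vocab` and `encode`, the reserved indices and the toy tokens are assumptions, not the author's method.

```python
from collections import Counter

PAD, OOV = 0, 1   # reserved indices (an assumption of this sketch)

def build_vocab(tokenized_docs, max_words):
    """Give the most frequent words the indices 2, 3, ... up to max_words - 1."""
    counts = Counter(w for doc in tokenized_docs for w in doc)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i for i, w in enumerate(ranked[:max_words - 2], start=2)}

def encode(doc, vocab, maxlen):
    """Turn tokens into indices, truncate to maxlen, then right-pad with PAD."""
    ids = [vocab.get(w, OOV) for w in doc][:maxlen]
    return ids + [PAD] * (maxlen - len(ids))

# Hypothetical tokenized tweets, used only for illustration.
train_docs = [["paid", "car", "insurance"],
              ["miss", "snow", "feel", "like", "box"],
              ["laid", "bed", "move", "coz", "sunburn"],
              ["miss", "car"]]
vocab = build_vocab(train_docs, max_words=100)
X = [encode(doc, vocab, maxlen=6) for doc in train_docs]
unseen = encode(["unseen", "car"], vocab, maxlen=6)
print(X[0])
print(unseen)
```

With inputs built this way, every index is non-negative and below `max_words`, so an Embedding layer created with `input_dim=max_words` can look each one up.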

In [212]:
loss, accuracy = model.evaluate(X_test,vectors_y_test, verbose=0)
print('Accuracy: %f' % (accuracy*100))

Accuracy: 49.303621
