### Installing the dependencies
Because this notebook lives inside the TCC (undergraduate thesis) project, only hyperas needs to be installed. If the other dependencies are missing, install them from the **requirements.txt** file at the root of the application.

In [1]:
!pip install hyperas




### Importing the dependencies

In [2]:
import os
import itertools
import pandas as pd
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa
from sklearn.model_selection import train_test_split
from tensorflow.python.keras.preprocessing.sequence import pad_sequences
from tensorflow.python.keras.models import Model, Sequential
from tensorflow.python.keras.layers import Input, Embedding, LSTM, Lambda, Conv1D, Dense, Dropout, Activation, Bidirectional
from tensorflow.python.keras import backend as k
from tensorflow.python.keras.layers import Lambda, Reshape, dot
from hyperas import optim
from hyperas.distributions import choice, uniform
from hyperopt import Trials, STATUS_OK, tpe

### Defining the default paths

In [3]:
DATA_FILES_INDEX_VECTORS_PATH = os.path.join(os.path.dirname(os.path.abspath("")), "data", "processed/index_vectors")
DATA_FILES_EMBEDDING_MATRICES_PATH = os.path.join(os.path.dirname(os.path.abspath("")), "data", "processed/embedding_matrices")

### Loading the index vectors
These are the vectors fed to the input of the Siamese neural network, where each value is **the index of the corresponding row in the embedding matrix**. Both the embedding matrix and the index vectors are stored by the application after the data-structuring step. For each word embedding used, one file was created per dataset variant, holding the index vectors for: raw, without stopwords (sw), and without stopwords and with lemmatization (sw_lemmatization).
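
To make the lookup concrete, here is a minimal sketch in plain Python of how an index vector selects rows of an embedding matrix. The tiny 2-dimensional matrix, the index values, and the use of row 0 for padding are assumptions made for illustration; the real matrices in this project are 300-dimensional and loaded from `.npy` files.

```python
# Toy embedding matrix (hypothetical values): row i holds the vector of the word with index i.
# Row 0 is assumed to be the padding row.
embedding_matrix = [
    [0.0, 0.0],  # index 0: padding
    [0.1, 0.9],  # index 1
    [0.7, 0.3],  # index 2
]

# An index vector as fed to the network: two padding positions, then words 2 and 1.
index_vector = [0, 0, 2, 1]

# An embedding layer performs this row lookup for every position of the vector.
embedded_phrase = [embedding_matrix[i] for i in index_vector]
print(embedded_phrase)
```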

In [4]:
def data():
    ### INTERNAL FUNCTIONS ###
    
    def load_index_vector_dataframe(filename):
        dataframe = pd.read_csv(filename)

        for q in ['phrase1', 'phrase2']:
            dataframe[q + '_n'] = dataframe[q].apply(lambda x: [int(i) for i in x.replace('[', '').replace(']', '').split(',')])

        return dataframe

    def split_data_train(train_dataframe):
        x_phrases = train_dataframe[['phrase1_n', 'phrase2_n']]

        train_dataframe.label = pd.Categorical(train_dataframe.label)
        train_dataframe['label'] = train_dataframe.label.cat.codes
        y_labels = train_dataframe['label']

        return x_phrases, y_labels
    
    def split_and_zero_padding(dataframe, max_seq_length):
        # Split into the left and right inputs of the Siamese network
        dataset = {'left': dataframe['phrase1_n'], 'right': dataframe['phrase2_n']}

        # Zero padding on the left; sequences longer than max_seq_length are cut at the end
        for side in ('left', 'right'):
            dataset[side] = pad_sequences(dataset[side], padding='pre', truncating='post', maxlen=max_seq_length)

        return dataset
    
    ### INTERNAL FUNCTIONS ###


    ### INDEX VECTOR FILES ###

    DATA_FILES_INDEX_VECTORS_PATH = os.path.join(os.path.dirname(os.path.abspath("")), "data", "processed/index_vectors")
    
    # Word2vec Google News
    raw_w2v_GN = os.path.join(DATA_FILES_INDEX_VECTORS_PATH, "training-raw-w2v_GN.csv")
    sw_w2v_GN = os.path.join(DATA_FILES_INDEX_VECTORS_PATH, "training-sw-w2v_GN.csv")
    sw_lemma_w2v_GN = os.path.join(DATA_FILES_INDEX_VECTORS_PATH, "training-sw-lemmatization-w2v_GN.csv")
    
    # Word2vec Wikipedia
    
    # Glove Wikipedia + Gigaword
    
    # Glove Common Crawl
    
    ### INDEX VECTOR FILES ###


    ### MAX_SEQ_LENGTH OPTIONS ###

    MAX_SEQ_LENGTH_RAW = (17, 10, 14)
    MAX_SEQ_LENGTH_SW = (9, 5, 7)
    MAX_SEQ_LENGTH_SW_LEMMA = (9, 5, 7)

    max_seq_length = MAX_SEQ_LENGTH_RAW[0]

    ### MAX_SEQ_LENGTH OPTIONS ###


    # Load the index vectors with pandas
    train_dataframe = load_index_vector_dataframe(raw_w2v_GN)

    # Separate the network inputs (phrase pairs) from the labels
    x_phrases, y_labels = split_data_train(train_dataframe)

    # 90/10 split between training and test, stratified by label
    x_train, x_test, y_train, y_test = train_test_split(x_phrases, y_labels, test_size=0.1, random_state=0, stratify=train_dataframe['label'])

    # Split the TRAINING inputs into the left and right sides of the subnetworks and left-pad them with zeros (returns ndarrays)
    x_train = split_and_zero_padding(x_train, max_seq_length)

    # Split the TEST (prediction) inputs into the left and right sides of the subnetworks and left-pad them with zeros (returns ndarrays)
    x_test = split_and_zero_padding(x_test, max_seq_length)

    # Convert the labels to NumPy arrays
    y_train = y_train.values
    y_test = y_test.values

    return x_train, y_train, x_test, y_test
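
The parsing and padding steps used in `data()` can be illustrated without Keras. The sketch below is a plain-Python stand-in (the helper names `parse_index_vector` and `pad_pre_truncate_post` are hypothetical) that mirrors what `pad_sequences(..., padding='pre', truncating='post')` does to a single sequence: zeros are added on the left, and sequences that are too long lose their last elements.

```python
def parse_index_vector(text):
    # Turn the CSV cell "[4, 8, 15]" into [4, 8, 15], as load_index_vector_dataframe does
    return [int(i) for i in text.replace('[', '').replace(']', '').split(',')]


def pad_pre_truncate_post(sequence, max_seq_length):
    # truncating='post': keep only the first max_seq_length items
    truncated = sequence[:max_seq_length]
    # padding='pre': fill the missing positions with zeros on the left
    return [0] * (max_seq_length - len(truncated)) + truncated


short_phrase = pad_pre_truncate_post(parse_index_vector("[4, 8, 15]"), 5)
long_phrase = pad_pre_truncate_post([1, 2, 3, 4, 5, 6], 4)
print(short_phrase)  # [0, 0, 4, 8, 15]
print(long_phrase)   # [1, 2, 3, 4]
```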

### Building the Siamese network model
Builds the Siamese network model, using a bidirectional *LSTM (Long Short-Term Memory)* architecture for the internal subnetworks.

In [5]:
def create_model(x_train, y_train, x_test, y_test):
    
    ### INTERNAL FUNCTIONS ###

    # Manhattan distance
    def define_manhattan_model(shared_model, max_seq_length):
        def calculate_manhattan_distance(left_output, right_output):          
            def __exponent_neg_manhattan_distance(left, right):
                return k.exp(-k.sum(k.abs(left - right), axis=1, keepdims=True))

            manhattan_distance = Lambda(
                function=lambda x: __exponent_neg_manhattan_distance(x[0], x[1]),
                output_shape=lambda x: (x[0][0], 1)
            )([left_output, right_output])

            return manhattan_distance

        # The visible layer
        left_input = Input(shape=(max_seq_length,), dtype='int32')
        right_input = Input(shape=(max_seq_length,), dtype='int32')

        # Pack it all up into a Manhattan Distance model
        malstm_distance = calculate_manhattan_distance(shared_model(left_input), shared_model(right_input))
        manhattan_model = Model(inputs=[left_input, right_input], outputs=[malstm_distance])

        return manhattan_model

    # Cosine distance (1 - cosine similarity)
    def define_cosine_model(shared_model, max_seq_length):
        def calculate_cosine_distance(left_output, right_output):
            # dot(..., normalize=True) yields the cosine similarity; 1 - similarity is the cosine distance
            cos_similarity = dot([left_output, right_output], axes=1, normalize=True)
            cos_similarity = Reshape((1,))(cos_similarity)
            cos_distance = Lambda(lambda x: 1 - x)(cos_similarity)

            return cos_distance

        left_input = Input(shape=(max_seq_length,))
        right_input = Input(shape=(max_seq_length,))

        cosine_distance = calculate_cosine_distance(shared_model(left_input), shared_model(right_input))
        cosine_model = Model(inputs=[left_input, right_input], outputs=[cosine_distance])

        return cosine_model

    # Euclidean distance
    def define_euclidean_model(shared_model, max_seq_length):
        def calculate_euclidean_distance(vects):
            x, y = vects
            sum_square = k.sum(k.square(x - y), axis=1, keepdims=True)

            return k.sqrt(k.maximum(sum_square, k.epsilon()))

        def dist_output_shape(shapes):
            shape1, shape2 = shapes

            return (shape1[0], 1)

        left_input = Input(shape=(max_seq_length,))
        right_input = Input(shape=(max_seq_length,))

        euclidean_distance = Lambda(
            calculate_euclidean_distance,
            output_shape=dist_output_shape
        )([shared_model(left_input), shared_model(right_input)])
        euclidean_model = Model(inputs=[left_input, right_input], outputs=[euclidean_distance])

        return euclidean_model
    
    def choose_optimizer(option):
        lr_choice = {{choice([0.001, 0.01, 0.1])}}
        clipnorm_choice = {{choice([0, 0.75, 1.5, 2])}}
        
        if option == 'Adadelta':
            return tf.keras.optimizers.Adadelta(learning_rate=lr_choice, rho=0.95, epsilon=1e-07, name='Adadelta', clipnorm=clipnorm_choice)
        elif option == 'Adamax':
            return tf.keras.optimizers.Adamax(learning_rate=lr_choice, beta_1=0.9, beta_2=0.999, epsilon=1e-07, name="Adamax", clipnorm=clipnorm_choice)
        elif option == 'Adagrad':
            return tf.keras.optimizers.Adagrad(learning_rate=lr_choice,  initial_accumulator_value=0.1,epsilon=1e-07, name='Adagrad', clipnorm=clipnorm_choice)
        elif option == 'SGD':
            return tf.keras.optimizers.SGD(learning_rate=lr_choice, momentum=0.0, nesterov=False, name='SGD', clipnorm=clipnorm_choice)
        elif option == 'RMSprop':
            return tf.keras.optimizers.RMSprop(learning_rate=lr_choice, rho=0.9, momentum=0.0, epsilon=1e-07, centered=False, name='RMSprop', clipnorm=clipnorm_choice)
        else:
            return tf.keras.optimizers.Adam(learning_rate=lr_choice, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False, name='Adam', clipnorm=clipnorm_choice)
    
    ### INTERNAL FUNCTIONS ###


    ### MAX_SEQ_LENGTH OPTIONS ###

    MAX_SEQ_LENGTH_RAW = [17, 10, 14]
    MAX_SEQ_LENGTH_SW = [9, 5, 7]
    MAX_SEQ_LENGTH_SW_LEMMA = [9, 5, 7]

    max_seq_length = MAX_SEQ_LENGTH_RAW[0]

    ### MAX_SEQ_LENGTH OPTIONS ###


    ### EMBEDDING FILES ###
    
    DATA_FILES_EMBEDDING_MATRICES_PATH = os.path.join(os.path.dirname(os.path.abspath("")), "data", "processed/embedding_matrices")
    
    # Word2vec Google News
    raw_w2v_GN = os.path.join(DATA_FILES_EMBEDDING_MATRICES_PATH, "raw-w2v_GN.npy")
    sw_w2v_GN = os.path.join(DATA_FILES_EMBEDDING_MATRICES_PATH, "sw-w2v_GN.npy")
    sw_lemma_w2v_GN = os.path.join(DATA_FILES_EMBEDDING_MATRICES_PATH, "sw-lemmatization-w2v_GN.npy")

    # Word2vec Wikipedia

    # Glove Wikipedia + Gigaword

    # Glove Common Crawl

    ### EMBEDDING FILES ###

    # Parameter and hyperparameter definitions and choices
    embedding_dim = 300
    embedding_matrix = np.load(raw_w2v_GN)
    
    kernel_initializer_choice = {{choice([tf.keras.initializers.VarianceScaling(scale=1.0, mode='fan_in', distribution='truncated_normal',seed=1), tf.keras.initializers.glorot_normal(seed=1)])}}
    kernel_regularizer_l1_choice = {{choice([0.001, 0.01])}}
    kernel_regularizer_l2_choice = {{choice([0.001, 0.01, 0.1])}}
    bias_regularizer_l2_choice = {{choice([0.001, 0.01, 0.1])}}
    activity_regularizer_l2_choice = {{choice([0.001, 0.01, 0.1])}}
    
    optimizer_choice_option = {{choice(['Adadelta', 'Adamax', 'Adagrad', 'SGD', 'RMSprop', 'Adam'])}}
    optimizer_choice = choose_optimizer(optimizer_choice_option)
    loss_choice = {{choice([tf.keras.losses.BinaryCrossentropy(), tf.keras.losses.MeanSquaredError(), tfa.losses.ContrastiveLoss()])}}
    
    # Shared model used by both subnetworks, since they are identical
    shared_model = Sequential()
    shared_model.add(
        Embedding(
            len(embedding_matrix),
            embedding_dim,
            weights=[embedding_matrix],
            input_shape=(max_seq_length,),
            trainable=False
        )
    )
    
    shared_model.add(Dropout({{uniform(0, 1)}}))
    shared_model.add(Bidirectional(LSTM(
        {{choice([64, 128, 256])}},
        kernel_initializer=kernel_initializer_choice,
        kernel_regularizer=tf.keras.regularizers.l1_l2(l1=kernel_regularizer_l1_choice, l2=kernel_regularizer_l2_choice),
        bias_regularizer=tf.keras.regularizers.l2(bias_regularizer_l2_choice),
        activity_regularizer=tf.keras.regularizers.l2(activity_regularizer_l2_choice),
        activation={{choice(['softsign', 'tanh'])}},
        recurrent_activation={{choice(['sigmoid', 'hard_sigmoid'])}},
        dropout={{uniform(0, 1)}},
        recurrent_dropout={{uniform(0, 1)}}
    )))
    shared_model.add(Activation({{choice(['relu', 'selu', 'elu'])}}))
    shared_model.add(Dense(1, activation={{choice(['sigmoid', 'hard_sigmoid'])}}))

    # Build the left and right sides of the subnetworks from shared_model, and define the similarity measure used in the merge layer at the network output
    model = define_manhattan_model(shared_model, max_seq_length)

    # Compile the Siamese network model
    model.compile(loss=loss_choice, optimizer=optimizer_choice, metrics=['accuracy'])

    # Train the network
    training_history = model.fit(
        [x_train['left'], x_train['right']],
        y_train,
        batch_size={{choice([16, 32, 64, 128])}},
        epochs={{choice([15, 20, 25])}},
        verbose=2,
        validation_split=0.2
    )
    
    return {'loss': -(np.amax(training_history.history['val_accuracy'])), 'status': STATUS_OK, 'model': model}
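
The three merge functions defined inside `create_model` each reduce the two subnetwork outputs to a single score. As a rough check of what they compute, the sketch below re-implements them in plain Python for one pair of vectors. It is an illustration only: the function names are hypothetical, the inputs are assumed to be non-zero vectors, and `1e-7` stands in for the Keras backend epsilon.

```python
import math


def exp_neg_manhattan(left, right):
    # exp(-sum(|l - r|)): 1.0 for identical vectors, approaching 0 as they diverge
    return math.exp(-sum(abs(l - r) for l, r in zip(left, right)))


def cosine_distance(left, right):
    # 1 - cosine similarity; assumes neither vector is all zeros
    dot_product = sum(l * r for l, r in zip(left, right))
    norms = math.sqrt(sum(l * l for l in left)) * math.sqrt(sum(r * r for r in right))
    return 1 - dot_product / norms


def euclidean_distance(left, right, epsilon=1e-7):
    sum_square = sum((l - r) ** 2 for l, r in zip(left, right))
    # The floor at epsilon avoids taking the square root of exactly zero
    return math.sqrt(max(sum_square, epsilon))


print(exp_neg_manhattan([1.0, 2.0], [1.0, 2.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```

Note the different orientations: the Manhattan-based score grows with similarity, while the cosine and Euclidean outputs grow with dissimilarity.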

### Running Hyperas
Runs the functions above with the basic Hyperas code, returning the best model.
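
Hyperas's `optim.minimize` searches for the hyperparameters that minimize the `loss` value returned by `create_model`, which is why that function returns the negated best validation accuracy. The sketch below shows the idea in plain Python; the per-trial accuracy values are invented for illustration and are not results from this notebook.

```python
# Hypothetical validation accuracies per epoch for three trials (not real results)
trials = {
    'trial_a': [0.50, 0.55, 0.53],
    'trial_b': [0.60, 0.58, 0.62],
    'trial_c': [0.49, 0.51, 0.50],
}

# Same objective as create_model: the negated maximum validation accuracy
losses = {name: -max(val_accuracy) for name, val_accuracy in trials.items()}

# Minimizing the loss selects the trial with the highest validation accuracy
best_trial = min(losses, key=losses.get)
print(best_trial, losses[best_trial])
```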

In [7]:
def try_hyperas():
    try:
        best_run, best_model, space = optim.minimize(model=create_model,
                                                     data=data,
                                                     algo=tpe.suggest,
                                                     max_evals=5,
                                                     trials=Trials(),
                                                     eval_space=True,
                                                     notebook_name='hyperas_optimization',
                                                     return_space=True)
        
        return best_run, best_model, space
    
    except Exception as e:
        # Python 3 exceptions have no `.message` attribute; printing the exception is enough
        print(e)
        
        return None, None, None 
        
best_run, best_model, space = try_hyperas()

>>> Imports:
#coding=utf-8

try:
    import os
except:
    pass

try:
    import itertools
except:
    pass

try:
    import pandas as pd
except:
    pass

try:
    import numpy as np
except:
    pass

try:
    import tensorflow as tf
except:
    pass

try:
    import tensorflow_addons as tfa
except:
    pass

try:
    from sklearn.model_selection import train_test_split
except:
    pass

try:
    from tensorflow.python.keras.preprocessing.sequence import pad_sequences
except:
    pass

try:
    from tensorflow.python.keras.models import Model, Sequential
except:
    pass

try:
    from tensorflow.python.keras.layers import Input, Embedding, LSTM, Lambda, Conv1D, Dense, Dropout, Activation, Bidirectional
except:
    pass

try:
    from tensorflow.python.keras import backend as k
except:
    pass

try:
    from tensorflow.python.keras.layers import Lambda, Reshape, dot
except:
    pass

try:
    from hyperas import optim
except:
    pass

try:
    from hyperas.distributions import choic

Epoch 1/20                                                                                                                                                                                
810/810 - 231s - loss: 0.4850 - accuracy: 0.5008 - val_loss: 0.4727 - val_accuracy: 0.4968                                                                                                

Epoch 2/20                                                                                                                                                                                
810/810 - 223s - loss: 0.4647 - accuracy: 0.5008 - val_loss: 0.4451 - val_accuracy: 0.4968                                                                                                

Epoch 3/20                                                                                                                                                                                
810/810 - 224s - loss: 0.4379 - accuracy: 0.5008 - val_loss: 0.

1620/1620 - 377s - loss: 0.3510 - accuracy: 0.4829 - val_loss: 0.3562 - val_accuracy: 0.4937                                                                                              

Epoch 3/20                                                                                                                                                                                
1620/1620 - 372s - loss: 0.3588 - accuracy: 0.4914 - val_loss: 0.3493 - val_accuracy: 0.4898                                                                                              

Epoch 4/20                                                                                                                                                                                
1620/1620 - 371s - loss: 0.3668 - accuracy: 0.4993 - val_loss: 0.3452 - val_accuracy: 0.4943                                                                                              

Epoch 5/20                                                    

Epoch 4/20                                                                                                                                                                                
810/810 - 191s - loss: 0.3429 - accuracy: 0.5022 - val_loss: 0.3346 - val_accuracy: 0.5000                                                                                                

Epoch 5/20                                                                                                                                                                                
810/810 - 200s - loss: 0.3398 - accuracy: 0.4980 - val_loss: 0.3296 - val_accuracy: 0.4779                                                                                                

Epoch 6/20                                                                                                                                                                                
810/810 - 199s - loss: 0.3463 - accuracy: 0.5068 - val_loss: 0.

810/810 - 212s - loss: 0.2797 - accuracy: 0.4038 - val_loss: 0.2817 - val_accuracy: 0.4017                                                                                                

Epoch 6/15                                                                                                                                                                                
810/810 - 216s - loss: 0.2788 - accuracy: 0.4058 - val_loss: 0.2773 - val_accuracy: 0.3981                                                                                                

Epoch 7/15                                                                                                                                                                                
810/810 - 211s - loss: 0.2763 - accuracy: 0.4008 - val_loss: 0.2771 - val_accuracy: 0.3904                                                                                                

Epoch 8/15                                                    

Epoch 12/20                                                                                                                                                                               
405/405 - 100s - loss: 0.2708 - accuracy: 0.3923 - val_loss: 0.2753 - val_accuracy: 0.3917                                                                                                

Epoch 13/20                                                                                                                                                                               
405/405 - 98s - loss: 0.2701 - accuracy: 0.3877 - val_loss: 0.2753 - val_accuracy: 0.3923                                                                                                 

Epoch 14/20                                                                                                                                                                               
405/405 - 97s - loss: 0.2687 - accuracy: 0.3848 - val_loss: 0.2

### Best results
The best hyperparameters found, together with the loss and metric values returned by evaluate for the trained model.

As explained in this [link](https://stackoverflow.com/questions/44843581/what-is-the-difference-between-model-fit-an-model-evaluate-in-keras), the difference between the three methods is:

- *fit()*: is for training the model with the given inputs (and corresponding training labels);

- *evaluate()*: is for evaluating the already trained model using the validation (or test) data and the corresponding labels. Returns the loss value and metrics values for the model;

- *predict()*: is for the actual prediction. It generates output predictions for the input samples.

Therefore hyperas only evaluates the trained model; it does not predict outputs for given input data. In this work, those predictions would be values between 0 and 1, where the closer a value is to 1, the more similar the two text instances are, indicating that they were possibly written by the same author. Otherwise they are dissimilar and come from different authors.
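
As a sketch of what that prediction step could look like once scores are available, the snippet below turns similarity scores into same-author decisions. The scores and the 0.5 cutoff are assumptions made for illustration; the notebook itself does not define a threshold.

```python
# Hypothetical similarity scores, such as predict() might return for three phrase pairs
similarity_scores = [0.91, 0.12, 0.57]

# Assumed decision threshold; not defined anywhere in the original notebook
THRESHOLD = 0.5

# True means the pair is treated as possibly written by the same author
same_author = [score >= THRESHOLD for score in similarity_scores]
print(same_author)
```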

In [8]:
if best_run is None:
    print("It was not possible to find the best model hyperparameters!")
else:
    x_train, y_train, x_test, y_test = data()
    
    print("Evaluation of best performing model:")
    print(best_model.evaluate([x_test['left'], x_test['right']], y_test))
    
    print("Best performing model chosen hyperparameters:")
    print(best_run)
    
    # TODO: add code here that saves the best hyperparameters to a .CSV file

Evaluation of best performing model:
[0.35383737087249756, 0.503333330154419]
Best performing model chosen hyperparameters:
{'Activation': 'elu', 'Dropout': 0.8897104326405169, 'Dropout_1': 0.5461499863940397, 'Dropout_2': 0.05753063853734175, 'LSTM': 128, 'activation': 'tanh', 'batch_size': 16, 'clipnorm_choice': 2, 'epochs': 20, 'kernel_initializer_choice': <tensorflow.python.keras.initializers.initializers_v2.VarianceScaling object at 0x00000191871D7940>, 'kernel_regularizer_l1_choice': 0.01, 'loss_choice': <tensorflow_addons.losses.contrastive.ContrastiveLoss object at 0x0000019187276FD0>, 'lr_choice': 0.1, 'lr_choice_1': 0.01, 'lr_choice_2': 0.1, 'lr_choice_3': 0.001, 'optimizer_choice_option': 'Adamax', 'recurrent_activation': 'sigmoid', 'recurrent_activation_1': 'sigmoid'}
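
The cell above leaves saving the best hyperparameters to a CSV file as a pending step. One possible way to do it, sketched with only the standard library, is shown below. The helper name `save_hyperparameters`, the file name, and the temporary directory are hypothetical, and the sample dictionary is a small subset of the values printed above.

```python
import csv
import os
import tempfile


def save_hyperparameters(best_run, csv_path):
    # One header row with the hyperparameter names, one row with their values.
    # str() keeps non-primitive values (e.g. initializer objects) writable.
    with open(csv_path, 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=sorted(best_run))
        writer.writeheader()
        writer.writerow({name: str(value) for name, value in best_run.items()})


# Subset of the printed dictionary, used only as sample input
sample_best_run = {'LSTM': 128, 'batch_size': 16, 'optimizer_choice_option': 'Adamax'}
csv_path = os.path.join(tempfile.mkdtemp(), 'best_hyperparameters.csv')
save_hyperparameters(sample_best_run, csv_path)

# Read the file back to confirm what was stored (all values come back as strings)
with open(csv_path, newline='', encoding='utf-8') as csv_file:
    saved_rows = list(csv.DictReader(csv_file))
print(saved_rows)
```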
