<a href="https://colab.research.google.com/github/crodier1/machine_learning_deep_learning/blob/main/English_to_Spanish_Translator_Seq2Seq_NLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook translates short English words and phrases into Spanish. To start translating:
1.   Run all cells
2.   Scroll to the bottom
3.   Enter a word or short phrase to translate

This is a sequence-to-sequence (Seq2Seq) NLP model built in TensorFlow. For the model's source code, see: [English to Spanish Translator Source Code - Seq2Seq - NLP](https://colab.research.google.com/drive/10-b9Yx7oii0jo2biN14MEnbe7e1-FajP?usp=sharing)

In [12]:
!pip install gdown

import json
import os
import pickle

import gdown
import numpy as np
import tensorflow as tf
from keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import tokenizer_from_json



In [13]:
if not os.path.exists("/content/spanish_docs"):
  folder_id = '1vG7uOU0W6dqpjLceeFOTqt_31RJ9eHsG'
  download_url = f'https://drive.google.com/drive/folders/{folder_id}'

  !gdown --folder {download_url} -O /content/

In [14]:
max_len_input = 6
max_len_target = 10
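
The two lengths above bound what the model can handle: inputs are padded or truncated to 6 token ids, and decoding produces at most 10 output tokens. As a rough illustration, the sketch below mimics in plain NumPy what `pad_sequences` does with its default settings (zeros added at the front, extra ids dropped from the front). The helper `pad_pre` and the sample ids are hypothetical and not part of this notebook.

```python
import numpy as np

max_len_input = 6  # same value as in the cell above

def pad_pre(token_ids, maxlen):
    """Left-pad with zeros, or keep only the last `maxlen` ids (hypothetical helper)."""
    kept = list(token_ids)[-maxlen:]            # drop ids from the front if too long
    padded = np.zeros(maxlen, dtype="int32")    # 0 is the padding id
    if kept:
        padded[maxlen - len(kept):] = kept      # right-align the real ids
    return padded

print(pad_pre([5, 3], max_len_input))                    # short phrase: zeros in front
print(pad_pre([1, 2, 3, 4, 5, 6, 7, 8], max_len_input))  # long phrase: first two ids dropped
```

One consequence, assuming the defaults are in effect: for a phrase longer than six tokens, only the last six reach the encoder.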

In [15]:
with open('/content/spanish_docs/tokenizer.json') as file:
    tokenizer_json = json.load(file)

# Create a new tokenizer from the JSON
tokenizer_inputs = tokenizer_from_json(tokenizer_json)

In [16]:
with open('/content/spanish_docs/word2idx_outputs.pkl', 'rb') as f:
    word2idx_outputs = pickle.load(f)

In [17]:
with open('/content/spanish_docs/indx2word_trans.pkl', 'rb') as f:
    indx2word_trans = pickle.load(f)
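
The two pickled dictionaries are used in opposite directions: `word2idx_outputs` maps a Spanish token to its integer id, and `indx2word_trans` maps an id back to a token. Presumably one is the inverse of the other. The toy example below shows that relationship with a made-up vocabulary; the entries are assumptions, not the real contents of the pickle files.

```python
# Toy vocabulary; the real mapping lives in word2idx_outputs.pkl
toy_word2idx = {"<sos>": 1, "<eos>": 2, "hola": 3, "adiós": 4}

# Invert the mapping so that ids can be turned back into words
toy_idx2word = {idx: word for word, idx in toy_word2idx.items()}

ids = [3, 4]
print([toy_idx2word[i] for i in ids])  # ['hola', 'adiós']
```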

In [18]:
encoder_model = tf.keras.models.load_model('/content/spanish_docs/encoder_model.h5')
decoder_model = tf.keras.models.load_model('/content/spanish_docs/decoder_model.h5')



In [19]:
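def decode_sequence(input_seq):
  # Encode the input phrase into the states that seed the decoder
  states_value = encoder_model.predict(input_seq, verbose=0)

  # Start decoding from the <sos> (start-of-sentence) token
  target_seq = np.zeros((1, 1))
  target_seq[0, 0] = word2idx_outputs['<sos>']
  eos = word2idx_outputs['<eos>']

  output_sentence = []
  for _ in range(max_len_target):
    output_tokens, h, c = decoder_model.predict([target_seq] + states_value, verbose=0)

    # Greedy choice: take the most probable next token
    idx = np.argmax(output_tokens[0, 0, :])

    # Stop once the end-of-sentence token is predicted
    if eos == idx:
      break

    # Index 0 is skipped (presumably the padding id)
    if idx > 0:
      output_sentence.append(indx2word_trans[idx])

    # Feed the predicted token and updated states into the next step
    target_seq[0, 0] = idx
    states_value = [h, c]

  return ' '.join(output_sentence)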
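
`decode_sequence` performs greedy decoding: at each step it feeds the previous token and the recurrent states (`h`, `c`) to the decoder, takes the highest-scoring token, and stops at `<eos>` or after `max_len_target` steps. The sketch below reproduces only that control flow with a stand-in for the decoder, so it runs without the saved models. `fake_decoder_step`, the toy vocabulary, and the scripted outputs are all hypothetical.

```python
import numpy as np

toy_word2idx = {"<sos>": 1, "<eos>": 2, "hasta": 3, "la": 4, "vista": 5}
toy_idx2word = {i: w for w, i in toy_word2idx.items()}
vocab_size = 6          # ids 0..5, with 0 reserved for padding
max_len_target = 10

# Scripted "model": maps the previous token id to the next one.
next_token = {1: 3, 3: 4, 4: 5, 5: 2}

def fake_decoder_step(prev_idx, state):
    """Stand-in for decoder_model.predict: returns token scores and a new state."""
    scores = np.zeros(vocab_size)
    scores[next_token[prev_idx]] = 1.0
    return scores, state + 1  # the state here is just a step counter

def greedy_decode(initial_state=0):
    prev_idx = toy_word2idx["<sos>"]
    state = initial_state
    words = []
    for _ in range(max_len_target):
        scores, state = fake_decoder_step(prev_idx, state)
        idx = int(np.argmax(scores))          # greedy choice
        if idx == toy_word2idx["<eos>"]:      # stop at end-of-sentence
            break
        if idx > 0:                           # skip the padding id
            words.append(toy_idx2word[idx])
        prev_idx = idx                        # feed the prediction back in
    return " ".join(words)

print(greedy_decode())  # hasta la vista
```

Greedy decoding keeps a single hypothesis per step. That is cheap, but it can miss translations that a wider search such as beam search would find.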

In [20]:
def translate():
  phrase = input("What word would you like to translate from English to Spanish? ")
  # Convert the text to token ids, then pad to the encoder's input length
  phrase_sequences = tokenizer_inputs.texts_to_sequences([phrase])
  phrase_encoded = pad_sequences(phrase_sequences, maxlen=max_len_input)
  return phrase, phrase_encoded

In [22]:
while True:
  phrase, phrase_encoded = translate()
  translation = decode_sequence(phrase_encoded)
  print('-')
  print('Input:', phrase)
  print('Translation:', translation)

  # Any answer that does not start with "n" continues the loop
  ans = input('Continue? [y/n] ')
  if ans and ans.lower().startswith('n'):
    break

What word would you like to translate from English to Spanish? Hello
-
Input: Hello
Translation: hola.
Continue? [y/n] goodbye
What word would you like to translate from English to Spanish? goodbye
-
Input: goodbye
Translation: hasta la vista.
Continue? [y/n] n
