<a href="https://colab.research.google.com/github/sergiomora03/AdvancedTopicsAnalytics/blob/main/exercises/E7-TextSummary.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# **Text Translation**


In [None]:
# pip install transformers
# pip install sentencepiece
# pip install sacremoses
# pip install datasets

In [21]:
from transformers import pipeline
from datasets import load_dataset
import transformers
import random

In [3]:
pregunta = "Como hago una funcion lineal?"

In [13]:
# Load the pretrained Spanish-to-English translation model (no training happens here)
translator = pipeline("translation_es_to_en", model="Helsinki-NLP/opus-mt-es-en")


All model checkpoint layers were used when initializing TFMarianMTModel.

All the layers of TFMarianMTModel were initialized from the model checkpoint at Helsinki-NLP/opus-mt-es-en.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMarianMTModel for predictions without further training.


In [14]:
# Translate the sentence
english_question = translator(
    pregunta, clean_up_tokenization_spaces=True, truncation=True
)
print(english_question[0]["translation_text"])

How do I make a linear function?


A function that translates any Spanish text into English

In [28]:
# Load the pretrained Spanish-to-English translation model
translator = pipeline("translation_es_to_en", model="Helsinki-NLP/opus-mt-es-en")


def traductor(pregunta: str) -> str:
    """Translate a sentence from Spanish to English.

    Args:
        pregunta (str): Question in Spanish to be translated

    Returns:
        str: Question in English
    """

    english_question = translator(
        pregunta, clean_up_tokenization_spaces=True, truncation=True
    )
    return english_question[0]["translation_text"]

All model checkpoint layers were used when initializing TFMarianMTModel.

All the layers of TFMarianMTModel were initialized from the model checkpoint at Helsinki-NLP/opus-mt-es-en.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMarianMTModel for predictions without further training.
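
A `transformers` translation pipeline also accepts a list of strings and returns one dictionary per input, so several questions can be translated in a single call. The sketch below assumes a hypothetical helper, `translate_many`, that receives the pipeline as an argument. The stand-in `fake_translator` only mimics the shape of the pipeline's output so the example runs without downloading the model; it does not translate anything.

```python
from typing import Callable, Dict, List


def translate_many(translator: Callable[..., List[Dict[str, str]]],
                   preguntas: List[str]) -> List[str]:
    """Translate several sentences with one pipeline call (hypothetical helper)."""
    if not preguntas:
        return []
    results = translator(
        preguntas, clean_up_tokenization_spaces=True, truncation=True
    )
    # The pipeline returns one dict per input, each with a "translation_text" key
    return [item["translation_text"] for item in results]


def fake_translator(texts, **kwargs):
    """Stand-in that mimics the pipeline's output shape; it does not translate."""
    return [{"translation_text": f"<en> {t}"} for t in texts]


translated = translate_many(fake_translator, ["Hola", "Adiós"])
print(translated)
```

With the real pipeline loaded, the call would presumably look the same: `translate_many(translator, ["...", "..."])`.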


# **Using the Pretrained Model**

In [25]:
from pathlib import Path
class Args:
    # define training arguments

    # MODEL
    model_type = 't5'
    tokenizer_name = 'Salesforce/codet5-base'
    model_name_or_path = 'Salesforce/codet5-base'

    # DATA
    train_batch_size = 8
    validation_batch_size = 8
    max_input_length = 48
    max_target_length = 128
    prefix = "Generate Python: "

    # OPTIMIZER
    learning_rate = 3e-4
    weight_decay = 1e-4
    warmup_ratio = 0.2
    adam_epsilon = 1e-8

    # TRAINING
    seed = 2022
    epochs = 20

    # DIRECTORIES
    output_dir = "runs/"
    logging_dir = f"{output_dir}/logs/"
    checkpoint_dir = "checkpoint"
    save_dir = f"{output_dir}/saved_model/"  # replace with the complete path to the saved model
    cache_dir = '../working/'
    #Path(output_dir).mkdir(parents=True, exist_ok=True)
    #Path(logging_dir).mkdir(parents=True, exist_ok=True)
    #Path(save_dir).mkdir(parents=True, exist_ok=True)


# initialize training arguments
args = Args()
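
The commented-out `Path(...).mkdir` calls in `Args` suggest that the output folders need to exist before anything is written to them. A minimal sketch of creating them in one place, assuming a hypothetical helper `ensure_dirs` and a temporary directory as the root so that nothing is written into the project folder:

```python
import tempfile
from pathlib import Path


def ensure_dirs(*dirs: str) -> list:
    """Create each directory (and its parents) if missing; return them as Path objects."""
    created = []
    for d in dirs:
        path = Path(d)
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


# Assumed layout mirroring Args: <root>/runs, <root>/runs/logs, <root>/runs/saved_model
root = Path(tempfile.mkdtemp())
output_dir = root / "runs"
paths = ensure_dirs(
    str(output_dir), str(output_dir / "logs"), str(output_dir / "saved_model")
)
print([p.name for p in paths])
```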

In [26]:
def run_predict(args, text):
    # load saved finetuned model
    model = transformers.TFT5ForConditionalGeneration.from_pretrained(args.save_dir)
    # load saved tokenizer
    tokenizer = transformers.RobertaTokenizer.from_pretrained(args.save_dir)

    # build the query by prepending the task prefix to the input text, then tokenize it
    query = args.prefix + text
    encoded_text = tokenizer(query, return_tensors='tf', padding='max_length', truncation=True, max_length=args.max_input_length)

    # inference (note: top_p and top_k only take effect when do_sample=True is also passed)
    generated_code = model.generate(
        encoded_text["input_ids"], attention_mask=encoded_text["attention_mask"],
        max_length=args.max_target_length, top_p=0.95, top_k=50, repetition_penalty=2.0, num_return_sequences=1
    )

    # decode generated tokens
    decoded_code = tokenizer.decode(generated_code.numpy()[0], skip_special_tokens=True)
    return decoded_code

def predict_from_dataset(args):
    # load using hf datasets
    dataset = load_dataset('json', data_files='../working/mbpp.jsonl')
    # train test split
    dataset = dataset['train'].train_test_split(0.1, shuffle=False)
    test_dataset = dataset['test']

    # randomly select an index from the test split (randrange excludes the upper bound)
    index = random.randrange(len(test_dataset))
    text = test_dataset[index]['text']
    code = test_dataset[index]['code']

    # run-predict on text
    decoded_code = run_predict(args, text)

    print("#" * 25)
    print("QUERY: ", text)
    print()
    print("#" * 25)
    print("ORIGINAL: ")
    print("\n", code)
    print()
    print("#" * 25)
    print("GENERATED: ")
    print("\n", decoded_code)

def predict_from_text(args, text):
    # run-predict on text
    decoded_code = run_predict(args, text)
    print("#" * 25)
    print("QUERY: ", text)
    print()
    print("#" * 25)
    print("GENERATED: ")
    print("\n", decoded_code)
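
`run_predict` calls `from_pretrained` for both the model and the tokenizer on every invocation, so each prediction repeats the load from disk. One common way to avoid this is to cache the loader with `functools.lru_cache`. The sketch below uses a hypothetical stand-in, `load_model_and_tokenizer`, that returns placeholder objects instead of calling `transformers`, so it only illustrates the caching pattern:

```python
from functools import lru_cache

load_calls = []  # records each actual load, for illustration only


@lru_cache(maxsize=None)
def load_model_and_tokenizer(save_dir: str):
    """Hypothetical loader; a real version would call from_pretrained(save_dir)."""
    load_calls.append(save_dir)
    model = object()      # placeholder for the fine-tuned model
    tokenizer = object()  # placeholder for the tokenizer
    return model, tokenizer


first = load_model_and_tokenizer("runs/saved_model/")
second = load_model_and_tokenizer("runs/saved_model/")
# The second call returns the cached pair, so the loader body ran only once
print(first is second, len(load_calls))
```

The trade-off is that the cached model stays in memory for the lifetime of the process, which is usually what an interactive session wants.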

# **Backend Step by Step**

In [23]:
# example 1
predict_from_text(args, "Write a function to add two random numbers"); print()

All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.

All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at C:/Users/cande/Downloads/runs/saved_model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


#########################
QUERY:  Write a function to add two random numbers

#########################
GENERATED: 

 def add_random(num1, num2):
    random = [0 for i in range (min(_int(),max((x) + 1)/3)] 
        if x % 2 == 0:  
            return y    
    else :
          None;
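
The generated snippet above is not valid Python (its brackets do not balance, for one). A cheap way to flag such output before showing it to a user is to try parsing it with the standard-library `ast` module. This is a sketch with a hypothetical helper, `is_valid_python`; it checks syntax only, not whether the code does what was asked:

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python; this is a syntax check only."""
    try:
        ast.parse(source)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return (a + b\n"  # unbalanced parenthesis
print(is_valid_python(good), is_valid_python(bad))
```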



In [33]:
def main_IA_backend():
    """
    Run the full code-generation flow: ask for a language, read the request,
    translate it to English if needed, and generate the code.

    Available for English and Spanish.
    """
    while True:
        idioma = input("Select your language (English or Spanish): ").strip().lower()
        if idioma == "english":
            question = input("Write down the function you want to create: ")
            predict_from_text(args, question)
            print()
            break
        elif idioma == "spanish":
            pregunta = input("Escribe la función que deseas crear: ")
            question = traductor(pregunta)
            print(question)
            predict_from_text(args, question)
            print()
            break
        else:
            print("Just two possible languages (English or Spanish). Try Again.")
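
`main_IA_backend` mixes prompting with the language dispatch, which makes the logic hard to exercise without typing into `input`. A sketch of the dispatch on its own, assuming a hypothetical function `prepare_question` that receives the translation function as an argument (the `stand_in` function below is a placeholder, not the real `traductor`):

```python
from typing import Callable, Optional


def prepare_question(idioma: str, text: str,
                     translate: Callable[[str], str]) -> Optional[str]:
    """Return the English question for the model, or None if the language is unsupported."""
    language = idioma.strip().lower()
    if language == "english":
        return text
    if language == "spanish":
        return translate(text)
    return None


def stand_in(s: str) -> str:
    """Placeholder for traductor; it only tags the text."""
    return f"[translated] {s}"


print(prepare_question("Spanish", "Crea una función", stand_in))
print(prepare_question("french", "Bonjour", stand_in))
```

The interactive loop could then call this helper and repeat the prompt whenever it returns `None`.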

In [34]:
main_IA_backend()

Just two possible languages (English or Spanish). Try Again.
Just two possible languages (English or Spanish). Try Again.
Just two possible languages (English or Spanish). Try Again.


All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.

All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at C:/Users/cande/Downloads/runs/saved_model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


#########################
QUERY:  Create a function that substract two values

#########################
GENERATED: 

 def substract_single(a,b):
    if a > b: 
        return (A-B)  
    res = A + B - 1
    for i in range(_mini(), _maxint), 0, 2 :    
         result += ((c * d)+((e % p ==0 or e%p!= n))          ]) 

