<a id="title_ID"></a>
<font color='#425066'><center><h1>Text-to-code Generation (NLP Assignment 2)</h1></center></font>


<a id="section1"><font color='#FF6F00'><h2>Introduction</h2></font></a>


Text-to-code generation is a task where we can generate code based on the natural language description. It can further be used to build an AI-powered coding assistant. Developers simply type the natural language description or the function signature to specify their intents, and the AI coding assistant can generate or complete the target function for them. This helps to accelerate implementation and also reduce their reliance on external resources.

<br>

<br>

CodeT5 by Salesforce is the first code-aware, encoder-decoder-based pre-trained programming language model, which enables a wide range of code intelligence applications including code understanding and generation tasks.


In this notebook we will finetune CodeT5 on MBPP - Mostly Basic Python Problems by Google Research to generate code based on problem description. MBPP is a benchmark that consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry level programmers, covering programming fundamentals, standard library functionality, and so on. Each problem consists of a task description, code solution and 3 automated test cases.

<b>The notebook demonstrates how to finetune CodeT5 on MBPP Dataset w/ TensorFlow.</b>

Let's start by importing required libraries to the environment: 

- [*TensorFlow*](https://www.tensorflow.org/) an end-to-end open source platform for machine learning.
- [*transformers*](https://huggingface.co/docs/transformers/index) provides APIs to easily download and train state-of-the-art pretrained models
- [*datasets*](https://huggingface.co/docs/datasets/index) a library for easily accessing and sharing datasets.

In [2]:
import os
import time
import math
import random
import datetime
from pathlib import Path

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"  # reduce the amount of console output from TF
import tensorflow as tf

from transformers import *
!pip install -q datasets 
from datasets import load_dataset

logging.set_verbosity_warning()
logging.set_verbosity_error()

import logging

print('TF version',tf.__version__)
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU'))) 

TF version 2.6.2
Num GPUs Available:  1


In [3]:
def setup_strategy(xla, fp16, no_cuda):
    print(" Tensorflow: setting up strategy")
    
    
    if xla:
        print(" XLA Enabled")
        tf.config.optimizer.set_jit(True)
    
    # setup mixed precision training
    if fp16:
        # Set to float16 at first
        print(" Mixed Precision Training Enabled")
        policy = tf.keras.mixed_precision.experimental.Policy("mixed_float16")
        tf.keras.mixed_precision.experimental.set_policy(policy)
    
    # setup distribution strategy
    gpus = tf.config.list_physical_devices("GPU")
    if no_cuda:
        strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")
    else:
        if len(gpus) == 0:
            print(" One Device Strategy [CPU] Enabled")
            strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")
        elif len(gpus) == 1:
            print(" One Device Strategy [GPU] Enabled")
            strategy = tf.distribute.OneDeviceStrategy(device="/gpu:0")
        elif len(gpus) > 1:
            print(" Mirrored Strategy Enabled")
            
            strategy = tf.distribute.MirroredStrategy()
        else:
            strategy = tf.distribute.get_strategy()

    return strategy

def n_replicas(strategy):
    # return number of devices
    return strategy.num_replicas_in_sync


strategy = setup_strategy(xla=True, fp16=False, no_cuda=False)

 Tensorflow: setting up strategy
 XLA Enabled
 One Device Strategy [GPU] Enabled


In [4]:
def download_dataset(cache_dir):
    
    _url = "https://raw.githubusercontent.com/google-research/google-research/master/mbpp/mbpp.jsonl" 
    dataset_path = tf.keras.utils.get_file("mbpp.jsonl", origin=_url, cache_dir=cache_dir, cache_subdir=cache_dir)
    return dataset_path 

def convert_examples_to_features(examples, tokenizer, args):
    
    texts = examples['text']
    codes = examples['code']
    # tests = [" ".join(test) for test in examples['test_list']] # convert list of test cases to single string
    
    # encode texts by prepending the task for input sequence
    inputs = [args.prefix + text for text in texts]
    model_inputs = tokenizer(inputs, max_length=args.max_input_length, padding="max_length", truncation=True)
    
    
    # encode texts by prepending the task for input sequence
    labels = tokenizer(codes, max_length=args.max_target_length, padding="max_length", truncation=True).input_ids
    
    # we need to replace the index of the padding tokens by -100
    # such that they are not taken into account by the CrossEntropyLoss
    labels_with_ignore_index = []
    for labels_example in labels:
        labels_example = [label if label != 0 else -100 for label in labels_example]
        labels_with_ignore_index.append(labels_example)
    model_inputs["labels"] = labels_with_ignore_index
    
    # return features
    return model_inputs


def get_train_tfdataset(train_dataset, num_train_examples, args):
    # select feature columns
    columns = ['input_ids', 'attention_mask', 'labels'] 
    # set to tensorflow format
    train_dataset.set_format(type='tensorflow', columns=columns) 
    
    # specify return types
    return_types = {'input_ids':tf.int32, 'attention_mask':tf.int32, 'labels':tf.int32} 
    # specify return shapes
    return_shapes = {'input_ids': tf.TensorShape([None]),'attention_mask': tf.TensorShape([None]), 'labels': tf.TensorShape([None])} 
    # initialize dataset 
    tf_dataset = tf.data.Dataset.from_generator(lambda : train_dataset, return_types, return_shapes) 
    
    # turn off auto-sharding
    options = tf.data.Options()
    options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.OFF
    tf_dataset = tf_dataset.with_options(options)
    
    # repeat, shuffle, batch, prefetch
    ds = (
        tf_dataset.repeat()
        .shuffle(num_train_examples, seed=args.seed)
        .batch(args.train_batch_size)
        .prefetch(tf.data.AUTOTUNE)
    )
    
    # distribute dataset to devices
    return strategy.experimental_distribute_dataset(ds)

def get_validation_tfdataset(eval_dataset, num_validation_examples, args):
    # select feature columns
    columns = ['input_ids', 'attention_mask', 'labels'] 
    # set to tensorflow format
    eval_dataset.set_format(type='tensorflow', columns=columns) 
    
    # specify return types
    return_types = {'input_ids':tf.int32, 'attention_mask':tf.int32, 'labels':tf.int32} 
    # specify return shapes
    return_shapes = {'input_ids': tf.TensorShape([None]),'attention_mask': tf.TensorShape([None]), 'labels': tf.TensorShape([None])} 
    # initialize dataset 
    tf_dataset = tf.data.Dataset.from_generator(lambda : eval_dataset, return_types, return_shapes) 
    
    # turn off auto-sharding
    options = tf.data.Options()
    options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.OFF
    tf_dataset = tf_dataset.with_options(options)
    
    # repeat, batch, prefetch
    ds = (
        tf_dataset.repeat()
        .batch(args.validation_batch_size)
        .prefetch(tf.data.AUTOTUNE)
    )
    
    # distribute dataset to devices
    return strategy.experimental_distribute_dataset(ds)

<a id="section8"><font color='#FF6F00'><h2>Utility Functions / Class</h2></font></a>

- *fix_all_seeds()* - sets the random seed for deterministic results.
- *init_logger()* - initialize logger for tracking events.
- *ProgressBar()* - custom progress bar to display metrics.

In [5]:
def fix_all_seeds(seed):
    # set random seed
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    tf.random.set_seed(seed)
    
def init_logger(log_file=None, log_file_level=logging.NOTSET):
    # initialize logger for tracking events and save in file
    if isinstance(log_file, Path):
        log_file = str(log_file)
    log_format = logging.Formatter(
        fmt='%(asctime)s - %(levelname)s - %(name)s -   %(message)s',
        datefmt='%m/%d/%Y %H:%M:%S'
    )
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(log_format)
    logger.handlers = [console_handler]
    if log_file and log_file != '':
        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(log_file_level)
        # file_handler.setFormatter(log_format)
        logger.addHandler(file_handler)
    return logger

class ProgressBar(object):
    # custom progress bar
    def __init__(self, n_total,width=30,desc = 'Training'):
        self.width = width
        self.n_total = n_total
        self.start_time = time.time()
        self.desc = desc

    def __call__(self, step, info={}):
        now = time.time()
        current = step + 1
        recv_per = current / self.n_total
        bar = f'[{self.desc}] {current}/{self.n_total} ['
        if recv_per >= 1:
            recv_per = 1
        prog_width = int(self.width * recv_per)
        if prog_width > 0:
            bar += '=' * (prog_width - 1)
            if current< self.n_total:
                bar += ">"
            else:
                bar += '='
        bar += '.' * (self.width - prog_width)
        bar += ']'
        show_bar = f"\r{bar}"
        time_per_unit = (now - self.start_time) / current
        if current < self.n_total:
            eta = time_per_unit * (self.n_total - current)
            if eta > 3600:
                eta_format = ('%d:%02d:%02d' %
                              (eta // 3600, (eta % 3600) // 60, eta % 60))
            elif eta > 60:
                eta_format = '%d:%02d' % (eta // 60, eta % 60)
            else:
                eta_format = '%ds' % eta
            time_info = f' - ETA: {eta_format}'
        else:
            if time_per_unit >= 1:
                time_info = f' {time_per_unit:.1f}s/step'
            elif time_per_unit >= 1e-3:
                time_info = f' {time_per_unit * 1e3:.1f}ms/step'
            else:
                time_info = f' {time_per_unit * 1e6:.1f}us/step'

        show_bar += time_info
        if len(info) != 0:
            show_info = f'{show_bar} ' + \
                        "-".join([f' {key}: {value:.4f} ' if key != "learning_rate" else f' {key}: {value:.8f} ' for key, value in info.items()])
            print(show_info, end='')
        else:
            print(show_bar, end='')

In [6]:
class Trainer:
    def __init__(
        self, model, args, train_dataset, validation_dataset, 
        num_train_examples, num_validation_examples
    ):
        self.model = model
        self.args = args
        
        self.train_dataset = train_dataset
        self.num_train_examples = num_train_examples
        
        self.validation_dataset = validation_dataset
        self.num_validation_examples = num_validation_examples
        
        self.global_step = 0
        self.eval_loss = tf.keras.metrics.Sum()
        
    def create_optimizer_and_scheduler(self, num_training_steps):
        # creates an optimizer with a learning rate schedule using a warmup phase followed by a linear decay.
        num_warmup_steps = math.ceil(num_training_steps * self.args.warmup_ratio)
        self.optimizer, self.lr_scheduler = create_optimizer(
            init_lr=self.args.learning_rate,
            num_train_steps=num_training_steps,
            num_warmup_steps=num_warmup_steps,
            weight_decay_rate=self.args.weight_decay,
            adam_epsilon=self.args.adam_epsilon
        )
    
    def evaluation_step(self, features, labels, nb_instances_in_global_batch):
        # forward pass
        outputs = self.model(input_ids=features['input_ids'], attention_mask=features['attention_mask'], labels=labels, training=False)[:2]
        loss, logits = outputs[:2]
        # loss scaling
        scaled_loss = loss / tf.cast(nb_instances_in_global_batch, dtype=loss.dtype)
        # add current batch loss
        self.eval_loss.update_state(scaled_loss)
    
    @tf.function
    def distributed_evaluation_steps(self, batch):
        features = {k: v for k, v in batch.items() if 'labels' not in k}
        labels = batch['labels']
        nb_instances = tf.reduce_sum(tf.cast(labels != -100, dtype=tf.int32))
        # strategy.run() expects args to be a list or tuple
        inputs = (features, labels, nb_instances)
        # `run` replicates the provided computation and runs with the distributed input
        strategy.run(self.evaluation_step, inputs)

    def evaluate(self):
        # calculate total validation steps
        steps = math.ceil(self.num_validation_examples / self.args.validation_batch_size)
        # reset eval loss after every epoch
        self.eval_loss.reset_states()
        logs = {}
        pbar = ProgressBar(n_total=steps, desc='Evaluating')
        # iterate over validation dataset
        for step, batch in enumerate(self.validation_dataset): 
            # distributed evaluation step
            self.distributed_evaluation_steps(batch) 
            logs["eval_loss"] = self.eval_loss.result() / (step + 1)
            pbar(step=step, info=logs)
            if step == steps - 1:
                break
        print("\n------------- validation result -----------------")
        
    def apply_gradients(self, features, labels, nb_instances_in_global_batch):
        # forward pass
        outputs = self.model(input_ids=features['input_ids'], attention_mask=features['attention_mask'], labels=labels, training=True)[:2] 
        loss, logits = outputs[:2]
        # loss scaling
        scaled_loss = loss / tf.cast(nb_instances_in_global_batch, dtype=loss.dtype) 
        # calculate gradients
        gradients = tf.gradients(scaled_loss, self.model.trainable_variables) 
        # convert gradients with nan value
        gradients = [g if g is not None else tf.zeros_like(v) for g, v in zip(gradients, self.model.trainable_variables)] 
        # optimize the model
        self.optimizer.apply_gradients(list(zip(gradients, self.model.trainable_variables))) 
        # add current batch loss
        self.train_loss.update_state(scaled_loss) 
    
    @tf.function
    def distributed_training_steps(self, batch):
        with strategy.scope():
            features = {k: v for k, v in batch.items() if 'labels' not in k}
            labels = batch['labels']
            nb_instances = tf.reduce_sum(tf.cast(labels != -100, dtype=tf.int32))
            # strategy.run() expects args to be a list or tuple
            inputs = (features, labels, nb_instances)
            # `run` replicates the provided computation and runs with the distributed input.
            strategy.run(self.apply_gradients, inputs)
    
    def train(self):
        # calculate total training steps
        num_updates_per_epoch = self.num_train_examples // args.train_batch_size 
        self.steps_per_epoch = num_updates_per_epoch
        t_total = self.steps_per_epoch * self.args.epochs
        
        with strategy.scope():
            # optimizer, and checkpoint must be created under `strategy.scope`
            # create optimizer and scheduler
            self.create_optimizer_and_scheduler(num_training_steps=t_total) 
            
            # create checkpoint manager
            folder = os.path.join(self.args.output_dir, self.args.checkpoint_dir)
            ckpt = tf.train.Checkpoint(optimizer=self.optimizer, model=self.model) 
            self.model.ckpt_manager = tf.train.CheckpointManager(ckpt, folder, max_to_keep=1)
            iterations = self.optimizer.iterations
            
            logger.info("***** Running training *****")
            logger.info(f"  Num examples = {self.num_train_examples}")
            logger.info(f"  Num Epochs = {self.args.epochs}")
            logger.info(f"  Total train batch size (w. parallel & distributed) = {self.args.train_batch_size * n_replicas(strategy)}")
            logger.info(f"  Steps per epoch = {self.steps_per_epoch}")
            logger.info(f"  Total optimization steps = {t_total}")
            
            self.train_loss = tf.keras.metrics.Sum(name="training_loss")
            start_time = datetime.datetime.now()
            for epoch_iter in range(self.args.epochs):
                # training loop
                logger.info(f"Epoch {epoch_iter + 1}/{self.args.epochs}")
                
                pbar = ProgressBar(n_total=self.steps_per_epoch, desc='Training')
                # iterate over training dataset
                for step, batch in enumerate(self.train_dataset):    
                    # distributed training step
                    self.distributed_training_steps(batch) 
                    
                    self.global_step = iterations.numpy()
                    training_loss = self.train_loss.result() / (step + 1)
                    
                    logs = {}
                    logs["training_loss"] = training_loss.numpy()
                    logs["learning_rate"] = self.lr_scheduler(self.global_step).numpy()
                    pbar(step=step, info=logs)
                    
                    if self.global_step % self.steps_per_epoch == 0:
                        print("\n------------- train result -----------------")
                        # call to evaluation loop
                        self.evaluate()
                        # save checkpoint
                        ckpt_save_path = self.model.ckpt_manager.save()
                        logger.info(f"Saving checkpoint at {ckpt_save_path}")
                        break
                
                # reset train loss after every epoch
                self.train_loss.reset_states()
            end_time = datetime.datetime.now()
            logger.info(f"Training took: {str(end_time - start_time)}")

In [7]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import mean_squared_error, mean_absolute_error, classification_report, confusion_matrix
    

def run(args):
    logger.info(" Starting training / evaluation")
    
    logger.info(" Downloading Data Files")
    dataset_path = download_dataset(args.cache_dir) 

    logger.info(" Loading Data Files")
    dataset = load_dataset('json', data_files=dataset_path) 
    # train test split
    dataset = dataset['train'].train_test_split(0.1, shuffle=False) 
        
    logger.info(" Initializing Tokenizer")
    tokenizer = RobertaTokenizer.from_pretrained(args.tokenizer_name) 
    
    logger.info(" Preparing Features")
    dataset = dataset.map(convert_examples_to_features, batched=True, fn_kwargs={"tokenizer":tokenizer, "args":args})

    logger.info(" Intializing training and validation dataset ")
    train_dataset = dataset['train']
    num_train_examples = len(dataset['train'])
    # create tf train dataset
    tf_train_dataset = get_train_tfdataset(train_dataset, num_train_examples, args) 
    
    validation_dataset = dataset['test']
    num_validation_examples = len(dataset['test'])
    # create tf validation dataset
    tf_validation_dataset = get_validation_tfdataset(train_dataset, num_validation_examples, args) 
    
    logger.info(f' Intializing model | {args.model_type.upper()} ')
    with strategy.scope():
        # model must be created under `strategy.scope`
        model = TFT5ForConditionalGeneration.from_pretrained(args.model_name_or_path, from_pt=True)
    
    # custom training loop
    trainer = Trainer(model, args, tf_train_dataset, tf_validation_dataset, num_train_examples, num_validation_examples) 
    trainer.train()
    
    # save pretrained model and tokenizer
    logger.info(f" Saving model in {args.save_dir}")
    trainer.model.save_pretrained(args.save_dir)
    tokenizer.save_pretrained(args.save_dir)
    


In [8]:
class Args:
    # define training arguments
    
    # MODEL
    model_type = 't5'
    tokenizer_name = 'Salesforce/codet5-base'
    model_name_or_path = 'Salesforce/codet5-base'
    
    # DATA
    train_batch_size = 8
    validation_batch_size = 8
    max_input_length = 48
    max_target_length = 128
    prefix = "Generate Python: "    

    # OPTIMIZER
    learning_rate = 3e-4
    weight_decay = 1e-4
    warmup_ratio = 0.2
    adam_epsilon = 1e-8

    # TRAINING
    seed = 2022
    epochs = 20

    # DIRECTORIES
    output_dir = "runs/"
    logging_dir = f"{output_dir}/logs/"
    checkpoint_dir = f"checkpoint"
    save_dir = f"{output_dir}/saved_model/"
    cache_dir = '../working/'
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    Path(logging_dir).mkdir(parents=True, exist_ok=True)
    Path(save_dir).mkdir(parents=True, exist_ok=True)
    

# initialize training arguments
args = Args()
# initialize logger
logger = init_logger(log_file=os.path.join(args.logging_dir, f"{args.model_type}-{time.strftime('%Y-%m-%d-%H-%M-%S', time.localtime())}.log"))
# fix all seeds
fix_all_seeds(args.seed)

if __name__ == "__main__":
    # run training and evaluation
    dataset = run(args)

05/13/2024 20:00:16 - INFO - root -    Starting training / evaluation
05/13/2024 20:00:16 - INFO - root -    Downloading Data Files
05/13/2024 20:00:16 - INFO - root -    Loading Data Files


  0%|          | 0/1 [00:00<?, ?it/s]

05/13/2024 20:00:17 - INFO - root -    Initializing Tokenizer
05/13/2024 20:00:20 - INFO - root -    Preparing Features


  0%|          | 0/1 [00:00<?, ?ba/s]

  0%|          | 0/1 [00:00<?, ?ba/s]

05/13/2024 20:00:23 - INFO - root -    Intializing training and validation dataset 
05/13/2024 20:00:23 - INFO - root -    Intializing model | T5 
2024-05-13 20:00:24.023836: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
05/13/2024 20:00:27 - INFO - root -   ***** Running training *****
05/13/2024 20:00:27 - INFO - root -     Num examples = 876
05/13/2024 20:00:27 - INFO - root -     Num Epochs = 20
05/13/2024 20:00:27 - INFO - root -     Total train batch size (w. parallel & distributed) = 8
05/13/2024 20:00:27 - INFO - root -     Steps per epoch = 109
05/13/2024 20:00:27 - INFO - root -     Total optimization steps = 2180
05/13/2024 20:00:27 - INFO - root -   Epoch 1/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:06:29 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-1
05/13/2024 20:06:29 - INFO - root -   Epoch 2/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:08:08 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-2
05/13/2024 20:08:08 - INFO - root -   Epoch 3/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:08:40 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-3
05/13/2024 20:08:40 - INFO - root -   Epoch 4/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:09:13 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-4
05/13/2024 20:09:13 - INFO - root -   Epoch 5/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:09:46 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-5
05/13/2024 20:09:46 - INFO - root -   Epoch 6/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:10:19 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-6
05/13/2024 20:10:19 - INFO - root -   Epoch 7/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:10:52 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-7
05/13/2024 20:10:52 - INFO - root -   Epoch 8/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:11:25 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-8
05/13/2024 20:11:25 - INFO - root -   Epoch 9/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:11:58 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-9
05/13/2024 20:11:58 - INFO - root -   Epoch 10/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:12:31 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-10
05/13/2024 20:12:31 - INFO - root -   Epoch 11/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:13:03 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-11
05/13/2024 20:13:03 - INFO - root -   Epoch 12/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:13:36 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-12
05/13/2024 20:13:36 - INFO - root -   Epoch 13/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:14:09 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-13
05/13/2024 20:14:09 - INFO - root -   Epoch 14/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:14:41 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-14
05/13/2024 20:14:41 - INFO - root -   Epoch 15/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:15:14 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-15
05/13/2024 20:15:14 - INFO - root -   Epoch 16/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:15:47 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-16
05/13/2024 20:15:47 - INFO - root -   Epoch 17/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:16:20 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-17
05/13/2024 20:16:20 - INFO - root -   Epoch 18/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:16:53 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-18
05/13/2024 20:16:53 - INFO - root -   Epoch 19/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:17:25 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-19
05/13/2024 20:17:25 - INFO - root -   Epoch 20/20


------------- train result -----------------
------------- validation result -----------------


05/13/2024 20:17:58 - INFO - root -   Saving checkpoint at runs/checkpoint/ckpt-20
05/13/2024 20:17:58 - INFO - root -   Training took: 0:17:31.661229
05/13/2024 20:17:58 - INFO - root -    Saving model in runs//saved_model/


In [9]:
def run_predict(args, text):
    # load saved finetuned model
    model = TFT5ForConditionalGeneration.from_pretrained(args.save_dir)
    # load saved tokenizer
    tokenizer = RobertaTokenizer.from_pretrained(args.save_dir) 
    
     # encode texts by prepending the task for input sequence and appending the test sequence
    query = args.prefix + text 
    encoded_text = tokenizer(query, return_tensors='tf', padding='max_length', truncation=True, max_length=args.max_input_length)
    
    # inference
    generated_code = model.generate(
        encoded_text["input_ids"], attention_mask=encoded_text["attention_mask"], 
        max_length=args.max_target_length, top_p=0.95, top_k=50, repetition_penalty=2, num_return_sequences=1
    )
    
    # decode generated tokens
    decoded_code = tokenizer.decode(generated_code.numpy()[0], skip_special_tokens=True)
    return decoded_code

def predict_from_dataset(args):
    # load using hf datasets
    dataset = load_dataset('json', data_files='../working/mbpp.jsonl') 
    # train test split
    dataset = dataset['train'].train_test_split(0.1, shuffle=False) 
    test_dataset = dataset['test']
    
    # randomly select an index from the validation dataset
    index = random.randint(0, len(test_dataset))
    text = test_dataset[index]['text']
    code = test_dataset[index]['code']
    
    # run-predict on text
    decoded_code = run_predict(args, text)
    
    print("#" * 25); print("QUERY: ", text); 
    print()
    print('#' * 25); print("ORIGINAL: "); print("\n", code);
    print()
    print('#' * 25); print("GENERATED: "); print("\n", decoded_code);
    
def predict_from_text(args, text):
    # run-predict on text
    decoded_code = run_predict(args, text)
    print("#" * 25); print("QUERY: ", text); 
    print()
    print('#' * 25); print("GENERATED: "); print("\n", decoded_code);

<a id="section12a"><font color='#425066'><h3>Predict from Dataset</h3></font></a>

In [10]:
# example 1
predict_from_dataset(args)
# example 2
predict_from_dataset(args)
# example 3
predict_from_dataset(args)



  0%|          | 0/1 [00:00<?, ?it/s]



#########################
QUERY:  Write a function to convert the given tuples into set.

#########################
ORIGINAL: 

 def tuple_to_set(t):
  s = set(t)
  return (s) 

#########################
GENERATED: 

 def tuple_set(testtup):
  res = set([tuple() for ele in testT up]) 
  return (res)




  0%|          | 0/1 [00:00<?, ?it/s]



#########################
QUERY:  Write a function to check for a number at the end of a string.

#########################
ORIGINAL: 

 import re
def end_num(string):
    text = re.compile(r".*[0-9]$")
    if text.match(string):
        return True
    else:
        return False

#########################
GENERATED: 

 def check_end(str1):
    count = 0   for i in range(_len() - 1) : 
        if str2[i] == '0' and len($list)+count < _max: 
            return True    
    xnumre.search("$", string)) or\N" not within the list, int((x / n)]), "string") at all (y >= 10)".split(): 
         XNUMRE").sort(), num really=lambda s: bool('inf'))




  0%|          | 0/1 [00:00<?, ?it/s]



#########################
QUERY:  Write a function to convert camel case string to snake case string by using regex.

#########################
ORIGINAL: 

 import re
def camel_to_snake(text):
  str1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', text)
  return re.sub('([a-z0-9])([A-Z])', r'\1_\2', str1).lower()

#########################
GENERATED: 

 import re
def snake_to_camel(word):
  return ''.join('_')


<a id="section12b"><font color='#425066'><h3>Predict from Text</h3></font></a>

In [11]:
# example 1
predict_from_text(args, "Write a function to add two random numbers"); print()
# example 2
predict_from_text(args, "Write a function to find the frequency of items in a list"); print()
# example 3
predict_from_text(args, "Write a function to concatenate two dictionary"); print()

#########################
QUERY:  Write a function to add two random numbers

#########################
GENERATED: 

 def add_random(nums1, nums2):
    if len (numbers)==0:
        return 0;
     else:\
         random = [a for a in range('10',len(_))] + num1[:-count];
          yield from listify([int((i / math.log 10), int("".join(map()", array)))])

#########################
QUERY:  Write a function to find the frequency of items in a list

#########################
GENERATED: 

 import collections
def freq_count(list1): 
    dict = Counter()   for i in list 1: 
        keys=dict.keys():    
            if len (key) == 0 or key[len(*value)] > maxsize : 
                return -2
       result, frequency

#########################
QUERY:  Write a function to concatenate two dictionary

#########################
GENERATED: 

 def concatenate_dict(d1, d2):
    return (sorted([x for x in zip(*map(_add__), dict[y]]))



In [23]:


# Initialize global lists to store true and predicted values
y_true = []
y_pred = []

def evaluate_from_dataset(args):
    global y_true, y_pred  # Declare the lists as global within the function
    
    # Load dataset
    dataset = load_dataset('json', data_files='../working/mbpp.jsonl') 
    # Train-test split
    dataset = dataset['train'].train_test_split(0.1, shuffle=False) 
    test_dataset = dataset['test']
    
    # Clear lists before each evaluation
    y_true.clear()
    y_pred.clear()
    
    # Initialize lists to store reference and candidate sentences for BLEU calculation
    references = []
    candidates = []
    
    # Iterate through test dataset
    for i, data in enumerate(test_dataset):
        text = data['text']
        code = data['code']
        
        # Run prediction
        decoded_code = run_predict(args, text)
        
        # Append true and predicted values
        y_true.append(code)
        y_pred.append(decoded_code)
        
        # Append original and generated code for BLEU score calculation
        
        print("Predicted ", i)
        print()

    # Calculate evaluation metrics


# Call the function
evaluate_from_dataset(args)





  0%|          | 0/1 [00:00<?, ?it/s]



Predicted  0

Predicted  1

Predicted  2

Predicted  3

Predicted  4

Predicted  5

Predicted  6

Predicted  7

Predicted  8

Predicted  9

Predicted  10

Predicted  11

Predicted  12

Predicted  13

Predicted  14

Predicted  15

Predicted  16

Predicted  17

Predicted  18

Predicted  19

Predicted  20

Predicted  21

Predicted  22

Predicted  23

Predicted  24

Predicted  25

Predicted  26

Predicted  27

Predicted  28

Predicted  29

Predicted  30

Predicted  31

Predicted  32

Predicted  33

Predicted  34

Predicted  35

Predicted  36

Predicted  37

Predicted  38

Predicted  39

Predicted  40

Predicted  41

Predicted  42

Predicted  43

Predicted  44

Predicted  45

Predicted  46

Predicted  47

Predicted  48

Predicted  49

Predicted  50

Predicted  51

Predicted  52

Predicted  53

Predicted  54

Predicted  55

Predicted  56

Predicted  57

Predicted  58

Predicted  59

Predicted  60

Predicted  61

Predicted  62

Predicted  63

Predicted  64

Predicted  65

Predicted  66

Predi

In [13]:
# %cd /kaggle/working
# from IPython.display import FileLink
# FileLink('runs/saved_model/tf_model.h5')

In [15]:
print(y_true)

["def sort_String(str) : \r\n    str = ''.join(sorted(str)) \r\n    return (str) ", 'def check_tuples(test_tuple, K):\r\n  res = all(ele in K for ele in test_tuple)\r\n  return (res) ', "import re\r\ndef text_match(text):\r\n  patterns = 'a.*?b$'\r\n  if re.search(patterns,  text):\r\n    return ('Found a match!')\r\n  else:\r\n    return ('Not matched!')", 'def Check_Solution(a,b,c) : \r\n    if ((b*b) - (4*a*c)) > 0 : \r\n        return ("2 solutions") \r\n    elif ((b*b) - (4*a*c)) == 0 : \r\n        return ("1 solution") \r\n    else : \r\n        return ("No solutions") ', 'def sum_even_odd(list1):\r\n    first_even = next((el for el in list1 if el%2==0),-1)\r\n    first_odd = next((el for el in list1 if el%2!=0),-1)\r\n    return (first_even+first_odd)', 'def parallelogram_perimeter(b,h):\r\n  perimeter=2*(b*h)\r\n  return perimeter', 'def div_of_nums(nums,m,n):\r\n result = list(filter(lambda x: (x % m == 0 and x % n == 0), nums)) \r\n return result', 'def all_Bits_Set_In_The_Gi

In [63]:
print(y_pred)

['def sort_String(str1):\r\n    result = sorted([x for x in str 1 if not y] ) \r\n    return string', 'def check_only(testtup, K):\r\n  res = True \r\n  for ele in testTdown:\r\n    if not isinstance (ele, tuple)) :\r\n      break\r\n  return False', 'import re\r\ndef text_startab(text):\r\n  patterns = \'ab{2,3}\'\r\n ifre.search("", data)):\r\n    return (\'Found a match!\')\r\n else:\\\r\n     for i in range(\'0\',len(*patterns)-1),\\', 'def Check_Solution(a,b): \r\n    if (2*B * b == 9) :\r\n        return ("Yes");   else:\r\n            solution = 1;    \r\n    for i in range((3*(4-bit+1)*10 + ((6**i - 2)) % B);      \r\n          result +=("No") ', 'def sum_even(list1):\r\n    first = next((el for el in list2 if e% 2==0),- 1)\r\n     return (first[sum] )', 'def parallelogram_perimeter(b,h):\r\n  perimeter=2*a+w*(i + h)\r\n returnPerimeter', 'def div_of(nums,m-n):\r\n result = list (filter((lambda x: ((x % m == 0) and ("No such number"), nums)) \r\n returnresult', 'def all_Bits(n,

In [18]:
!pip install astor

Collecting astor
  Downloading astor-0.8.1-py2.py3-none-any.whl (27 kB)
Installing collected packages: astor
Successfully installed astor-0.8.1


In [62]:
import ast
import astor
import nltk.translate.bleu_score as bleu

def token_level_accuracy(y_true, y_pred):
    total_tokens = sum(len(code.split()) for code in y_true)
    correct_tokens = sum(len(set(true_code.split()) & set(pred_code.split())) for true_code, pred_code in zip(y_true, y_pred))
    accuracy = correct_tokens / total_tokens
    return accuracy

def bleu_score(y_true, y_pred, ngram_order=1):
    smoothing_function = bleu.SmoothingFunction()
    weights = [1. / ngram_order] * ngram_order  # Equal weights for all n-gram orders
    return bleu.corpus_bleu([[true_code.split()] for true_code in y_true], [pred_code.split() for pred_code in y_pred], weights=weights, smoothing_function=smoothing_function.method1)


def is_valid_python(code):
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False



In [71]:
print("Token-level Accuracy:", token_level_accuracy(y_true, y_pred))
print("BLEU Score:", bleu_score(y_true, y_pred))

Token-level Accuracy: 0.75
BLEU Score: 0.5454545454545454
