## Overview

Codey models are text-to-code models from Google AI, trained on a large code-related dataset. They can generate code-related responses for scenarios such as writing functions, writing unit tests, debugging, and autocompleting code. This notebook shows how to use Code-bison and Codechat-bison to generate unit tests, debug code, and document code.

The notebook is structured as follows:

1. We will explain how to use the Code-bison Python SDK to write unit tests for functions.
2. We will explain how to use the Codechat-bison Python SDK to debug code.
3. We will explain how to use the Code-bison Python SDK to generate documentation and comments for code blocks.

* Author: [leip@](https://moma.corp.google.com/person/leip)
* Date: 09/22/23

## Prep Work

In [None]:
# @title Install Libraries
import sys

if 'google.colab' in sys.modules:
    ! pip install google-cloud-aiplatform
    ! pip install google-cloud-discoveryengine
    from google.colab import auth as google_auth
    google_auth.authenticate_user()

In [None]:
# @title Initialize Vertex AI

import vertexai
from vertexai.language_models import CodeGenerationModel

VERTEX_API_PROJECT = 'certain-haiku-391918'  # replace with your own Google Cloud project ID
VERTEX_API_LOCATION = 'us-central1'

vertexai.init(project=VERTEX_API_PROJECT, location=VERTEX_API_LOCATION)
code_generation_model = CodeGenerationModel.from_pretrained("code-bison@001")

## Unit Tests with Code-bison

In [None]:
# @title Set Up Unit Test Prompt and Invoke Code-bison with Parameters and Prompt


prefix = """You're an expert Python programmer and great at writing unit tests in Python. please write unit test for the code below:

class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def get_area(self):
        return self.width * self.height

    def set_width(self, width):
        self.width = width

    def set_height(self, height):
        self.height = height

"""
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 512
}

response = code_generation_model.predict(
    prefix=prefix, **parameters
)

print(response.text)

```python
import unittest

from rectangle import Rectangle


class TestRectangle(unittest.TestCase):

    def test_get_area(self):
        """Test that the get_area() method returns the correct area."""
        rectangle = Rectangle(10, 20)
        self.assertEqual(rectangle.get_area(), 200)

    def test_set_width(self):
        """Test that the set_width() method sets the width correctly."""
        rectangle = Rectangle(10, 20)
        rectangle.set_width(30)
        self.assertEqual(rectangle.width, 30)

    def test_set_height(self):
        """Test that the set_height() method sets the height correctly."""
        rectangle = Rectangle(10, 20)
        rectangle.set_height(40)
        self.assertEqual(rectangle.height, 40)


if __name__ == '__main__':
    unittest.main()
```
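
The reply arrives as Markdown with the code inside a fenced block, and the generated tests import `Rectangle` from a module named `rectangle` that does not exist in this notebook. The sketch below shows one way to pull the code out of the reply and run it in-process. The helper names `extract_code` and `run_generated_tests`, and the shortened `sample_response` standing in for `response.text`, are assumptions made for illustration and are not part of the Vertex AI SDK.

```python
import re
import sys
import types
import unittest

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown


class Rectangle:
    """Same class that was sent to the model in the prompt."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def get_area(self):
        return self.width * self.height

    def set_width(self, width):
        self.width = width

    def set_height(self, height):
        self.height = height


def extract_code(response_text):
    """Return the body of the first fenced block, or the raw text if none is found."""
    pattern = FENCE + r"(?:python)?\s*\n(.*?)" + FENCE
    match = re.search(pattern, response_text, flags=re.DOTALL)
    return match.group(1) if match else response_text


def run_generated_tests(response_text):
    """Execute generated test code against Rectangle and return the unittest result."""
    # The generated code does `from rectangle import Rectangle`, so register a
    # throwaway module under that name instead of writing rectangle.py to disk.
    module = types.ModuleType("rectangle")
    module.Rectangle = Rectangle
    sys.modules["rectangle"] = module

    # A non-"__main__" name keeps the trailing unittest.main() guard from firing.
    namespace = {"__name__": "generated_tests"}
    exec(extract_code(response_text), namespace)  # only run code you have reviewed

    # Collect every TestCase subclass that the generated code defined.
    suite = unittest.TestSuite()
    loader = unittest.TestLoader()
    for obj in namespace.values():
        if (isinstance(obj, type) and issubclass(obj, unittest.TestCase)
                and obj is not unittest.TestCase):
            suite.addTests(loader.loadTestsFromTestCase(obj))
    return unittest.TextTestRunner(verbosity=0).run(suite)


# Hypothetical stand-in for `response.text`, shortened to one test case.
sample_response = "\n".join([
    FENCE + "python",
    "import unittest",
    "",
    "from rectangle import Rectangle",
    "",
    "",
    "class TestRectangle(unittest.TestCase):",
    "",
    "    def test_get_area(self):",
    "        rectangle = Rectangle(10, 20)",
    "        self.assertEqual(rectangle.get_area(), 200)",
    "",
    "",
    "if __name__ == '__main__':",
    "    unittest.main()",
    FENCE,
])

result = run_generated_tests(sample_response)
print(result.testsRun, result.wasSuccessful())
```

Because `exec` runs whatever the model produced, it is safer to read the generated code first, or to write it to a file and run it in an isolated environment.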


## Debug Code with Codechat-bison

In [None]:
# @title Initialize Codechat-bison

from vertexai.language_models import CodeChatModel

code_chat_model = CodeChatModel.from_pretrained("codechat-bison@001")
chat = code_chat_model.start_chat()

In [None]:
# @title Set Up Debugging Prompt

message_1 = """You're an expert Pytorch programmer and great at debugging issues in Pytorch. Tell me how to fix this error - 'ValueError: expected sequence of length 43 at dim 1 (got 37)' I got from the code below:

from datasets import load_dataset
import evaluate
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

raw_datasets = load_dataset("glue", "mnli")

model_checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)


def preprocess_function(examples):
    return tokenizer(examples["premise"], examples["hypothesis"], truncation=True)


tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint)

args = TrainingArguments(
    f"distilbert-finetuned-mnli",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    num_train_epochs=3,
    weight_decay=0.01,
)

metric = evaluate.load("glue", "mnli")


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    return metric.compute(predictions=predictions, references=labels)


trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation_matched"],
    compute_metrics=compute_metrics,
)
trainer.train()
"""

In [None]:
# @title Invoke Codechat-bison with Parameters and Prompt
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 2048
}

response_1 = chat.send_message(message_1, **parameters)

print(response_1.text)

The error message is telling you that the input to the model is not the correct size. The model expects an input of shape (batch_size, sequence_length), but the input you are providing is of shape (batch_size, 37).

To fix this, you need to make sure that the input to the model is the correct size. In this case, you can do this by changing the `truncation=True` parameter in the `preprocess_function` to `truncation=False`. This will tell the tokenizer to not truncate the input, even if it is longer than the maximum sequence length.

Once you have made this change, you should be able to run your code without any errors.
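
The suggestion above is worth verifying before applying it. This `ValueError` typically arises when a batch contains token sequences of different lengths (here 43 and 37) that cannot be stacked into one rectangular tensor, and the usual remedy is to pad each batch to a common length. In Hugging Face Transformers that is commonly handled by `DataCollatorWithPadding`. The sketch below illustrates the idea in plain Python; `pad_batch` and the pad ID of 0 are hypothetical choices for illustration, not part of any library.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad every token-ID list to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # The attention mask marks real tokens with 1 and padding with 0.
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask


# Two made-up examples with the lengths from the error message.
batch = [list(range(1, 44)), list(range(1, 38))]
print([len(seq) for seq in batch])   # ragged: cannot form a single 2-D tensor

padded, mask = pad_batch(batch)
print([len(seq) for seq in padded])  # rectangular after padding
```

Padding per batch rather than to one global maximum keeps batches of short sentences small, which is the main reason dynamic padding is usually preferred.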


In [None]:
# @title Set Up Follow-Up Debugging Question Prompt

message_2 = """Ok. I fixed that error. I am getting this new error - "RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`". How to fix this error?
"""

In [None]:
# @title Invoke Codechat-bison with Parameters and Follow-Up Debugging Question Prompt

response_2 = chat.send_message(message_2, **parameters)

print(response_2.text)

The error message is telling you that there is a problem with the CUDA memory allocation. This could be caused by a number of things, such as not having enough free memory on your GPU, or a problem with the CUDA driver.

To fix this error, you can try the following:

* Make sure that you have enough free memory on your GPU.
* Try restarting your computer.
* Try updating your CUDA driver.
* Try reinstalling PyTorch.

If you are still having problems, you can contact the PyTorch support team for help.


## Code Documentation Generation

In [None]:
# @title Set Up Document Generation Prompt and Invoke Code-bison with Parameters and Prompt


prefix = """You're great at writing documents for python code. Write comments for this block of python code line by line: [

import cmath

a = 1
b = 5
c = 6

d = (b**2) - (4*a*c)

sol1 = (-b-cmath.sqrt(d))/(2*a)
sol2 = (-b+cmath.sqrt(d))/(2*a)

print('The solution are {0} and {1}'.format(sol1,sol2))
]

"""
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 512
}

response = code_generation_model.predict(
    prefix=prefix, **parameters
)

print(response.text)

* Import the `cmath` module.
* Define the variables `a`, `b`, and `c`.
* Calculate the discriminant `d`.
* Calculate the solutions `sol1` and `sol2`.
* Print the solutions to the console.
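
The reply above summarizes the code as a bullet list rather than commenting each line as the prompt asked. For comparison, here is a hand-written sketch of what line-by-line comments could look like for the same snippet. The comments are an illustration and not model output, and the printed message is corrected to the plural "solutions".

```python
import cmath  # complex-math functions, so a negative discriminant still works

# Coefficients of the quadratic equation a*x**2 + b*x + c = 0.
a = 1
b = 5
c = 6

# Discriminant: its sign determines whether the roots are real or complex.
d = (b**2) - (4*a*c)

# Quadratic formula; cmath.sqrt returns a complex number even for real roots.
sol1 = (-b - cmath.sqrt(d)) / (2*a)
sol2 = (-b + cmath.sqrt(d)) / (2*a)

# Report both roots.
print('The solutions are {0} and {1}'.format(sol1, sol2))
```

If the model keeps returning a summary, asking it to "return the original code with a comment above each line" is one prompt variation that may be worth trying.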
