# Using CLDK to generate JUnit tests

In this tutorial, we will use CLDK to generate a JUnit test for all the methods in a Java application.

By the end of this tutorial, you will have a JUnit test for all the methods in a Java application. You'll be able to explore some of the benefits of using CLDK to perform fast and easy program analysis and build an LLM-based test generator.

You will learn how to do the following:

<ol>
<li> Create a new instance of the CLDK class.
<li> Create an analysis object over the Java application.
<li> Iterate over all the files in the project.
<li> Iterate over all the classes in the file.
<li> Iterate over all the methods in the class.
<li> Get the code body of the method.
<li> Initialize the Tree-sitter utilities for the class file content.
<li> Sanitize the class for analysis.
</ol>
Next, we will write a couple of helper methods to:

<ol>
<li> Format the instruction for the given focal method and class.
<li> Prompt the local model on Ollama.
<li> Use CLDK to go through an application and generate unit test cases for each method.
</ol>

## Prerequisites

Before we get started, let's make sure you have the following installed:

<ol>
<li> Python 3.11 or later
<li> Ollama 0.3.4 or later
<li> Java 11 or later
</ol>
We will use Ollama to spin up a local Granite model that will act as our LLM for this tutorial.

### Prerequisite 1: Install ollama

If you don't have Ollama installed, download and install it from here: [Ollama](https://ollama.com/download).
Once Ollama is installed, start the server and make sure it is running. You can then download the latest version of the Granite 8b Instruct model by running `ollama pull granite-code:8b-instruct`.
Other Granite versions are available, but this tutorial uses the Granite 8b Instruct model. If you prefer a different version, replace `8b-instruct` with any of the other [versions](https://ollama.com/library/granite-code/tags).
Let's make sure the model is downloaded by running the following command:

In [None]:
%%bash
ollama run granite-code:8b-instruct "Write a python function to print 'Hello, World!'"

### Prerequisite 2: Install ollama Python SDK

In [None]:
pip install ollama

### Prerequisite 3: Install CLDK
CLDK is available on GitHub at github.com/IBM/codellm-devkit. You can install it by running the following command:

In [None]:
pip install git+https://github.com/IBM/codellm-devkit.git

### Get the sample Java application
For this tutorial, we will use Apache Commons CLI. You can download the source code to a temporary directory by running the following command:

In [None]:
%%bash
wget https://github.com/apache/commons-cli/archive/refs/tags/rel/commons-cli-1.7.0.zip -O /tmp/commons-cli-1.7.0.zip && unzip -o /tmp/commons-cli-1.7.0.zip -d /tmp

The project will now be extracted to `/tmp/commons-cli-rel-commons-cli-1.7.0`. We'll remove these files later, so don't worry about the location.

### Building a JUnit test generator using CLDK and Granite Code Instruct Model
Now that we have all the prerequisites installed, let's start building a JUnit test generator using CLDK and the Granite Code Instruct Model.

Writing unit tests is tedious and often takes significant developer effort. Various tools exist for automated test generation, such as EvoSuite, which uses evolutionary algorithms to generate test cases. However, the generated test cases often do not look natural, and developers tend not to add them to their test suites. Large Language Models (LLMs), by contrast, are trained on developer-written code and therefore have a better affinity for generating natural code that is more readable and maintainable. In this exercise, we will show how to leverage LLMs to generate test cases with the help of CLDK.

For simplicity, we will cover only certain aspects of test generation and provide some context to the LLM to improve the quality of the test cases. In this exercise, we will generate a unit test for a non-private method of a Java class, providing the focal method body and the signatures of all the constructors of the class so that the LLM can understand how to create an object of the focal class during the setup phase of the tests. We will also ask the LLM to generate ```N``` test cases, where ```N``` is the cyclomatic complexity of the focal method. The intuition is that one test may not be sufficient to cover a fairly complex method, and the cyclomatic complexity score gives some guidance on how many are needed.

(Step 1) First, we will import all the necessary libraries.

In [None]:
import ollama
from cldk import CLDK
from cldk.analysis import AnalysisLevel

(Step 2) Second, we will form the prompt for the model, which will include all the constructor signatures and the body of the focal method.

In [None]:
def format_inst(focal_method_body, focal_method, focal_class, constructor_signatures, language):
    """
    Format the instruction for the given focal method and class.
    """
    inst = f"Question: Can you generate junit tests with @Test annotation for the method `{focal_method}` in the class `{focal_class}` below. Only generate the test and no description.\n"
    inst += 'Use the constructor signatures to form the object if the method is not static. Generate the code under ``` code block.'
    inst += "\n"
    inst += f"```{language}\n"
    inst += f"public class {focal_class} " + "{\n"
    inst += f"{constructor_signatures}\n"
    inst += f"{focal_method_body} \n" 
    inst += "}\n"  # newline so the closing code fence starts on its own line
    inst += "```\n"
    inst += "Answer:\n"
    return inst

(Step 3) Third, use Ollama to call the LLM (in this case, Granite 8b).

In [None]:
def prompt_ollama(message: str, model_id: str = "granite-code:8b-instruct") -> str:
    """Prompt local model on Ollama"""
    response_object = ollama.generate(model=model_id, prompt=message, options={"temperature":0.2})
    return response_object["response"]
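
Because the prompt asks the model to put its answer in a fenced code block, the raw response usually needs a small post-processing step before the test can be saved or compiled. The following is a minimal sketch of that step. `extract_code` is a hypothetical helper, not part of CLDK or the ollama SDK, and it assumes the model wraps its answer in a triple-backtick fence, falling back to the full response otherwise.

```python
import re

# Three backticks, built programmatically so this example stays easy to embed.
FENCE = "`" * 3


def extract_code(llm_output: str) -> str:
    """Return the body of the first fenced code block in llm_output.

    If no complete fenced block is found, return the whole response stripped
    of surrounding whitespace.
    """
    # Opening fence, optional language tag, newline, then the lazily matched body.
    pattern = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(llm_output)
    if match:
        return match.group(1).strip()
    return llm_output.strip()


# Example with a hand-written response standing in for real model output.
sample = "\n".join([
    "Here is the test:",
    FENCE + "java",
    "@Test",
    "public void testFlatten() {}",
    FENCE,
])
print(extract_code(sample))
```

In the loop from Step 4, this could be applied to `llm_output` before printing or writing it to a file.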

(Step 4) Fourth, collect all the information needed for each method. We go through all the classes in the application and, for each class, collect the signatures of all its constructors. If no constructor is present, we add the signature of the default constructor. Then, we go through all the non-private methods of the class and formulate the prompt using the constructor and method information. Finally, we use the prompt to call the LLM and get the final output.

In [None]:
# Create a new instance of the CLDK class
cldk = CLDK(language="java")
# Create an analysis object over the java application. Provide the application path.
analysis = cldk.analysis(project_path="/tmp/commons-cli-rel-commons-cli-1.7.0", analysis_level=AnalysisLevel.symbol_table)

# For simplicity, we run the test generation for a single class and method. One can remove that filter to run this code for the entire application
qualified_class_name = 'org.apache.commons.cli.GnuParser'
method_signature = 'flatten(Options, String[], boolean)'

# Go through all the classes in the application
for class_name in analysis.get_classes():

    if class_name == qualified_class_name:
        class_details  = analysis.get_class(qualified_class_name=class_name)
        focal_class_name = class_name.split('.')[-1]

        # Generate test cases for non-interface and non-abstract classes
        if not class_details.is_interface and 'abstract' not in class_details.modifiers:
            
            # Get all constructor signatures
            constructor_signatures = ''
            
            for method in analysis.get_methods_in_class(qualified_class_name=class_name):
                method_details = analysis.get_method(qualified_class_name=class_name, qualified_method_name=method)
                
                if method_details.is_constructor:
                    constructor_signatures += method_details.signature + '\n'
            
            # If no constructor present, then add the signature of the default constructor
            if constructor_signatures=='':
                constructor_signatures = f'public {focal_class_name}() ' + '{}'
            
            # Go through all the methods in the class
            for method in analysis.get_methods_in_class(qualified_class_name=class_name):
                
                if method==method_signature:
                    # Get the method details
                    method_details = analysis.get_method(qualified_class_name=class_name, qualified_method_name=method)
                    
                    # Generate test cases for non-private methods
                    if 'private' not in method_details.modifiers and not method_details.is_constructor:
                        
                        # Gather the information needed for the prompt: focal method body, focal method name, focal class name, and constructor signatures
                        prompt = format_inst(focal_method_body=method_details.declaration+method_details.code,
                                            focal_method=method.split('(')[0],
                                            focal_class=focal_class_name,
                                            constructor_signatures=constructor_signatures,
                                            language='Java')
                        
                        print(f"Instruction:\n{prompt}\n")
                        print("Generating test case . . .\n")
                        
                        # Prompt the local model on Ollama
                        llm_output = prompt_ollama(
                            message=prompt
                        )
                        
                        # Print the LLM output
                        print(f"LLM Output:\n{llm_output}")
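
The sample project was extracted under `/tmp` with the note that the files would be removed later. A minimal cleanup sketch follows, assuming the archive and directory paths from the download step above. `remove_sample_project` is a hypothetical helper name introduced here for illustration.

```python
import shutil
from pathlib import Path


def remove_sample_project(paths):
    """Delete each existing file or directory in paths.

    Returns the list of paths that were actually removed; paths that do not
    exist are skipped silently.
    """
    removed = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(str(path))
        elif path.is_file():
            path.unlink()
            removed.append(str(path))
    return removed


# Paths taken from the download step of this tutorial.
removed = remove_sample_project([
    "/tmp/commons-cli-rel-commons-cli-1.7.0",
    "/tmp/commons-cli-1.7.0.zip",
])
print(f"Removed: {removed}")
```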