## Test Generation with CLDK

In this tutorial, we will use CLDK to implement a simple unit test generator for Java. You'll explore some of the benefits of using CLDK to perform quick and easy program analysis and build an LLM-based test generator. By the end of this tutorial, you will have implemented such a tool and generated a JUnit test case for a Java application.

Specifically, you will learn how to perform the following tasks on the application under test to create LLM prompts for test generation:

1. Create a new instance of the CLDK class.
2. Create an analysis object for the Java application under test.
3. Iterate over all files in the application.
4. Iterate over all classes in a file.
5. Iterate over all methods in a class.
6. Get the code body of a method.
7. Get the constructors of a class.

Let's get started by installing the required dependencies.

In [None]:
%%bash
python3 -m venv .venv
source .venv/bin/activate
pip install -U -r requirements.txt

## Let's set up our LLM

We'll be using OpenRouter, so we'll load the API key from the environment variable `OPENROUTER_API`.

In [1]:
## Import API keys
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv(dotenv_path=os.getenv("PWD") + "/.env", override=True)

print("API keys loaded successfully.")
print(f"API_KEY: {os.getenv('OPENROUTER_API')[:3]}...{os.getenv('OPENROUTER_API')[-3:]}")

API keys loaded successfully.
API_KEY: sk-...906
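
Note that the cell above raises a `TypeError` if `OPENROUTER_API` is not set, because `os.getenv` then returns `None`. A minimal sketch of a more forgiving way to print a masked key is shown here; `mask_secret` is a hypothetical helper name introduced for illustration and is not part of any library.

```python
import os
from typing import Optional


def mask_secret(value: Optional[str], visible: int = 3) -> str:
    """Return a masked form of a secret that is safe to print (hypothetical helper)."""
    if not value:
        # The environment variable is missing or empty
        return "<not set>"
    if len(value) <= 2 * visible:
        # Too short to show both ends without revealing the whole value
        return "*" * len(value)
    return f"{value[:visible]}...{value[-visible:]}"


print(f"API_KEY: {mask_secret(os.getenv('OPENROUTER_API'))}")
```

With this variant, a missing key shows up as `<not set>` in the output instead of a traceback.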


#### Let's create a simple prompting function

This function will take a prompt and return the response from the OpenRouter API.

In [2]:
from openai import OpenAI


def prompt(message: str) -> str:
    """
    Send a message to the LLM through OpenRouter and return its response.
    """
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.getenv("OPENROUTER_API"),  # OpenRouter API key
    )
    completion = client.chat.completions.create(
        model="meta-llama/llama-3.2-3b-instruct:free", messages=[{"role": "user", "content": message}]
    )

    return completion.choices[0].message.content

def test_prompt():
    """
    Test function to check if the prompt function works correctly.
    """
    test_message = "What is the capital of France?"
    response = prompt(test_message)
    
    assert "Paris" in response, f"Expected response to contain 'Paris', but got '{response}'"

test_prompt()

## Generate unit tests for methods in a Java application

We'll start by downloading Apache Commons CLI for this tutorial.

In [None]:
%%bash
COMMONS=commons-cli-1.7.0  
wget https://github.com/apache/commons-cli/archive/refs/tags/rel/$COMMONS.zip -O $COMMONS.zip && \
unzip -o $COMMONS.zip && \
rm -f $COMMONS.zip 

Next, let's create another helper function that formulates the test-generation prompt for a method in a Java application.

In [3]:
def format_inst(
    focal_method_body, focal_method, focal_class, constructor_signatures, language
):
    """
    Format the LLM instruction for the given focal method and class.
    """
    inst = f"Can you generate junit tests with @Test annotation for the method `{focal_method}` in the class `{focal_class}` below. Only generate the test and no description.\n"
    inst += "Use the constructor signatures to form the object if the method is not static. Generate the code under ``` code block."
    inst += "\n"
    inst += f"```{language}\n"
    inst += f"public class {focal_class} " + "{\n"
    inst += f"{constructor_signatures}\n"
    inst += f"{focal_method_body} \n"
    inst += "}\n"
    inst += "```\n"
    return inst
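
Since the instruction asks the model to place its answer in a fenced code block, the response can be post-processed to keep only the code. The sketch below is one possible way to do that with a regular expression. `extract_code_blocks` is a hypothetical helper introduced here, and it assumes the model actually follows the fencing instruction.

```python
import re

# Build the fence marker programmatically to keep this example easy to embed
FENCE = "`" * 3


def extract_code_blocks(llm_output: str) -> list:
    """Return the contents of all fenced code blocks found in an LLM response."""
    # Opening fence, optional language tag, newline, lazily matched body, closing fence
    pattern = re.compile(FENCE + r"[^\n`]*\n(.*?)" + FENCE, re.DOTALL)
    return [body.strip() for body in pattern.findall(llm_output)]


# Usage with a made-up response, for illustration only
sample = "Here is the test:\n" + FENCE + "java\n@Test\npublic void t() {}\n" + FENCE + "\nDone."
print(extract_code_blocks(sample))
```

If the model returns no fenced block at all, the helper returns an empty list, so the caller can decide whether to retry or to fall back to the raw response.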

Let's initialize CLDK with Java as the language.

In [4]:
from cldk import CLDK

cldk = CLDK(language="java")

#### Generate analysis artifacts

##### What is CLDK analysis?
CLDK uses [CodeAnalyzer](https://github.com/codellm-devkit/codeanalyzer-java) (built with [WALA](https://github.com/wala/WALA) and [JavaParser](https://github.com/javaparser/javaparser)) as the Java analysis engine. CLDK supports different analysis levels: 1) symbol table, 2) call graph, 3) system dependency graph.

The analysis level can be selected using the `AnalysisLevel` enumerated type. For this example, we select the symbol-table analysis level, with CodeAnalyzer as the default analysis engine.

> **NOTE:** If the next cell throws an error `CalledProcessError`, make sure you have a working Java installation! See the [**CLDK Documentation**](https://codellm-devkit.info/installing/#java-analysis) for how to set this up.

##### How to create an analysis object?

To create an analysis object, we call `cldk.analysis(...)` with the following parameters:
- `project_path`: The path to the project to be analyzed.
- `analysis_level`: The analysis level to be used. This can be one of the following: 
  - `AnalysisLevel.SYMBOL_TABLE`: For analyzing the symbol tables of the application with the analysis engine's JavaParser.
  - `AnalysisLevel.CALL_GRAPH`: To build the call graph of the application with the analysis engine's WALA.


In [5]:
# Setup analysis object
analysis = cldk.analysis(
    project_path="commons-cli-rel-commons-cli-1.7.0", #  <-- the path to the project we downloaded a few cells ago.
    analysis_level="symbol table",  # <-- This is the default, no need to specify it explicitly.
)

> **NOTE:** This will take a few seconds to run, as it will analyze the entire project. 
> The analysis pipeline involves the following steps:
>   1. **Dependency Resolution**: Maven or Gradle is used to resolve the dependencies of the project and download them to a local directory.
>   2. **Parsing**: The JavaParser library is used to parse the Java source code files and build an abstract syntax tree (AST) representation of the code.
>   3. **Type Resolution**: The JavaParser library is used to resolve the types of the variables and methods in the code, which is necessary for building the symbol table and call graph.
>   4. **Symbol Table Construction**: The symbol table is constructed from the AST, which includes information about the classes, methods, and variables in the code.
>   5. **Call Graph Construction**: The call graph is constructed using the WALA library, which analyzes the control flow of the program and builds a graph representation of the method calls. (*Not executed this time because we set `analysis_level="symbol table"`*)

### Putting it all together

Now that we have the analysis object, we can generate the test cases:

We go through all the classes in the application, and for each class, 
   1. We collect the signatures of its constructors. 
   2. If a class has no constructors, we add the signature of the default constructor. 
   3. We go through each non-private method of the class and formulate the prompt using the constructor and the method information. 

Finally, we use the prompt to call the LLM to generate test cases and get the LLM response. 

> **NOTE:** For the sake of simplicity, we run the test generation on a single class and method, but this filter can be removed to run the code over the entire application.

In [6]:
focal_class = "org.apache.commons.cli.GnuParser"
focal_method = "flatten(Options, String[], boolean)"

In [7]:
# -----
# Import the model classes for type hints (optional but recommended)
from cldk.models.java import JType, JCallable
# -----

# Go through all the classes in the application
for class_name in analysis.get_classes():
    #  ^^^^^^^^^^^
    #  This will return a list of all the classes in the application.
    if class_name == focal_class:
        print(f"Class: {class_name}")
        class_details: JType = analysis.get_class(qualified_class_name=class_name)
        #  ^^^^^
        #  JType captures the class details, including its fields, modifiers, callables, etc.
        focal_class_name = class_name.split(".")[-1]

        # Skip interfaces and abstract classes
        if not class_details.is_interface and "abstract" not in class_details.modifiers:
            # `is_interface` is True when the type is an interface, and False otherwise.

            # Get all constructor signatures
            constructor_signatures = ""

            for method in analysis.get_methods_in_class(
                                 # ^^^^^^^^^^^^^^^^^^^^
                                 # This will return a list of all the methods in the class.
                qualified_class_name=class_name
            ):
                method_details: JCallable = analysis.get_method(
                    qualified_class_name=class_name, qualified_method_name=method
                )

                if method_details.is_constructor:
                    # ^^^^^^^^^^^^^^
                    # We can find if a method is a constructor with this field in JCallable
                    constructor_signatures += method_details.signature + "\n"

            # If no constructor present, then add the signature of the default constructor
            if constructor_signatures == "":
                constructor_signatures = f"public {focal_class_name}() " + "{}"

            # Go through all the methods in the class
            for method in analysis.get_methods_in_class(
                qualified_class_name=class_name
            ):
                if method == focal_method:
                    # Get the method details
                    method_details: JCallable = analysis.get_method(
                        qualified_class_name=class_name, qualified_method_name=method
                    )

                    # Generate test cases for non-private methods
                    if (
                        "private" not in method_details.modifiers
                        and not method_details.is_constructor
                    ):

                        # Gather all the information needed for the prompt, which are focal method body, focal method name, focal class name, and constructor signature
                        instruction = format_inst(
                            focal_method_body=method_details.declaration
                            + method_details.code,
                            focal_method=method.split("(")[0],
                            focal_class=focal_class_name,
                            constructor_signatures=constructor_signatures,
                            language="java",
                        )

                        # Print the instruction
                        print(f"Instruction:\n{instruction}\n")
                        print(
                            "Generating test case; this may take a few seconds to a few minutes depending on where the model is hosted...\n"
                        )

                        # Prompt the LLM through OpenRouter
                        llm_output = prompt(message=instruction)

                        # Print the LLM output
                        print(f"LLM Output:\n{llm_output}")

Class: org.apache.commons.cli.GnuParser
Instruction:
Can you generate junit tests with @Test annotation for the method `flatten` in the class `GnuParser` below. Only generate the test and no description.
Use the constructor signatures to form the object if the method is not static. Generate the code under ``` code block.
```java
public class GnuParser {
public GnuParser() {}
protected String[] flatten(final Options options, final String[] arguments, final boolean stopAtNonOption){
    final List<String> tokens = new ArrayList<>();
    boolean eatTheRest = false;
    for (int i = 0; i < arguments.length; i++) {
        final String arg = arguments[i];
        if ("--".equals(arg)) {
            eatTheRest = true;
            tokens.add("--");
        } else if ("-".equals(arg)) {
            tokens.add("-");
        } else if (arg.startsWith("-")) {
            final String opt = Util.stripLeadingHyphens(arg);
            if (options.hasOption(opt)) {
                tokens.add(arg);
  