# Using CLDK to validate code translation

In this tutorial, we will use CLDK to validate translated code.

By the end of this tutorial, you will have a very light-weight approach for validating code translated from Java to Python. You'll be able to explore some of the benefits of using CLDK to perform fast and easy program analysis.

You will learn how to do the following:

<ol>
<li> Create a new instance of the CLDK class.
<li> Create an analysis object over the Java application.
<li> Iterate over all the files in the project.
<li> Iterate over all the classes in the file.
<li> Iterate over all the methods in the class.
<li> Get the code body of the method.
<li> Initialize the treesitter utils for the class file content.
<li> Sanitize the class for analysis.
</ol>

Next, we will write a few helper functions to:

<ol>
<li> Format the instruction for the given focal method and class.
<li> Prompt the local model on Ollama.
<li> Use CLDK to analyze code and get context information for translating code.
</ol>

## Prerequisites

Before we get started, let's make sure you have the following installed:

<ol>
<li> Python 3.11 or later
<li> Ollama 0.3.4 or later
<li> Java 11 or later
</ol>

We will use Ollama to spin up a local Granite model that will act as our LLM for this tutorial.
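Before installing anything, a quick check of what is already present can help. The following is a minimal sketch, not part of CLDK: `check_prerequisites` is a hypothetical helper that only tests the running Python version and whether `ollama` and `java` executables are on the `PATH`. It does not verify the Ollama or Java version numbers listed above.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Report which tutorial prerequisites appear to be present.

    Hypothetical helper for illustration: it checks the interpreter version
    and looks for executables on PATH, but not their version numbers.
    """
    return {
        "python>=3.11": sys.version_info >= (3, 11),
        "ollama on PATH": shutil.which("ollama") is not None,
        "java on PATH": shutil.which("java") is not None,
    }


# Print one line per prerequisite
for name, ok in check_prerequisites().items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

Anything reported as missing can be installed by following the prerequisite steps below.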

### Prerequisite 1: Install ollama

If you don't have Ollama installed, please download and install it from here: [Ollama](https://ollama.com/download).
Once you have Ollama, start the server and make sure it is running. For this tutorial, we will use the latest version of the Granite 8b Instruct model; the `ollama run` command below downloads the model automatically if it is not already available locally.
There are other Granite versions available. If you prefer to use a different version, you can replace `8b-instruct` with any of the other [versions](https://ollama.com/library/granite-code/tags).
Let's make sure the model is downloaded and responds by running the following command:

In [None]:
%%bash
ollama run granite-code:8b-instruct "Write a python function to print 'Hello, World!'"

### Prerequisite 2: Install ollama Python SDK

In [None]:
pip install ollama

### Prerequisite 3: Install CLDK
CLDK is available on GitHub at github.com/IBM/codellm-devkit.git. You can install it by running the following command:

In [None]:
pip install git+https://github.com/IBM/codellm-devkit.git

### Step 1: Get the sample Java application
For this tutorial, we will use Apache Commons CLI. You can download the source code to a temporary directory by running the following command:

In [None]:
%%bash
wget https://github.com/apache/commons-cli/archive/refs/tags/rel/commons-cli-1.7.0.zip -O /tmp/commons-cli-1.7.0.zip && unzip -o /tmp/commons-cli-1.7.0.zip -d /tmp

The project will now be extracted to `/tmp/commons-cli-rel-commons-cli-1.7.0`. We'll remove these files later, so don't worry about the location.

### Translating Java code to Python and building lightweight validation logic
Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code translation. In our recent [paper](https://dl.acm.org/doi/10.1145/3597503.3639226) published at ICSE'24, we found that LLM-based code translation is very promising. In this example, we will walk through the steps of translating each Java class to Python and checking various properties of the translated code, such as the number of methods, the number of fields, the formal arguments, etc.

(Step 1) First, we will import all the necessary libraries.

In [None]:
from cldk.analysis.python.treesitter import PythonSitter
from cldk.analysis.java.treesitter import JavaSitter
import ollama
from cldk import CLDK
from cldk.analysis import AnalysisLevel

(Step 2) Second, we will form the prompt for the model, which will include the body of the Java class after removing all the comments.

In [None]:
def format_inst(code, focal_class, language):
    """
    Format the instruction for the given focal method and class.
    """
    inst = f"Question: Can you translate the Java class `{focal_class}` below to Python and generate under code block (```)?\n"

    inst += "\n"
    inst += f"```{language}\n"
    inst += code
    inst += "```" if code.endswith("\n") else "\n```"
    inst += "\n"
    return inst

(Step 3) Create a function to call the LLM. There are various ways to achieve that. However, for illustrative purposes, we use ollama, a library to communicate with models downloaded locally.

In [None]:
def prompt_ollama(message: str, model_id: str = "granite-code:8b-instruct") -> str:
    """Prompt local model on Ollama"""
    response_object = ollama.generate(model=model_id, prompt=message)
    return response_object["response"]
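The instruction asks the model to answer inside a code block, so the raw response usually mixes prose with a fenced snippet. Passing only the fenced code to a parser tends to give cleaner results. The following is a minimal sketch of that step; `extract_code_block` is a hypothetical helper (it is not part of CLDK or the Ollama SDK), and it assumes the model wraps its answer in triple-backtick fences.

```python
import re

# Matches a fenced block: opening fence, optional language tag, body, closing fence
_FENCE_RE = re.compile(r"```[ \t]*([A-Za-z0-9_+-]*)[ \t]*\r?\n(.*?)```", re.DOTALL)


def extract_code_block(response: str, language: str = "python") -> str:
    """Return the body of a fenced code block found in an LLM response.

    Hypothetical helper for illustration. Preference order:
    1. the first block tagged with `language`,
    2. the first fenced block of any kind,
    3. the response unchanged when it contains no fence.
    """
    blocks = _FENCE_RE.findall(response)
    if not blocks:
        return response
    for tag, body in blocks:
        if tag.lower() == language:
            return body
    return blocks[0][1]


sample = "Sure, here it is:\n```python\nclass GnuParser:\n    pass\n```\nHope this helps."
print(extract_code_block(sample))
```

With this in place, the string returned by `prompt_ollama` could be passed through `extract_code_block` before it is analyzed.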

(Step 4) Translate each class in the application and check certain properties of the translated code, such as (a) the number of translated methods and (b) the number of translated fields.

In [None]:
# Create a new instance of the CLDK class
cldk = CLDK(language="java")

# Create an analysis object over the java application
analysis = cldk.analysis(project_path="/tmp/commons-cli-rel-commons-cli-1.7.0", analysis_level=AnalysisLevel.symbol_table)

# For simplicity, we run the code translation for a single class. One can remove that filter to run this code for the entire application
qualified_class_name = 'org.apache.commons.cli.GnuParser'

# Go through all the classes in the application
for class_name in analysis.get_classes():
    
    if class_name==qualified_class_name:
        # Get the location of the Java class
        class_path = analysis.get_java_file(qualified_class_name=class_name)
        
        # Read the file content; skip the class if its source file cannot be located
        if not class_path:
            continue
        with open(class_path, 'r', encoding='utf-8', errors='ignore') as f:
            class_body = f.read()
        
        # Sanitize the file content by removing comments.
        tree_sitter_utils = cldk.tree_sitter_utils(source_code=class_body)
        sanitized_class = JavaSitter().remove_all_comments(source_code=class_body)

        inst = format_inst(code=sanitized_class, language='java', focal_class=class_name.split('.')[-1])

        print(f"Instruction:\n{inst}\n")
        print("Translating Java code to Python . . .\n")
        translated_code = prompt_ollama(
                    message=inst)
        
        print(f"Translated Python code: {translated_code}")
        py_cldk = PythonSitter()
        all_methods = py_cldk.get_all_methods(module=translated_code)
        all_functions = py_cldk.get_all_functions(module=translated_code)
        all_fields = py_cldk.get_all_fields(module=translated_code)
        
        # Compare the number of translated callables (methods + functions) with the Java method count
        translated_count = len(all_methods) + len(all_functions)
        if translated_count != len(analysis.get_methods_in_class(qualified_class_name=class_name)):
            print(f'Number of translated methods does not match in class {class_name}')
        else:
            print(f'Number of translated methods in class {class_name} is {translated_count}')
        if all_fields:
            if len(all_fields) != len(analysis.get_class(qualified_class_name=class_name).field_declarations):
                print(f'Number of translated fields does not match in class {class_name}')
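Matching counts can still hide a renamed or dropped method, so a name-level comparison is a natural next check. The following is a minimal sketch under two assumptions: the plain method names have already been collected from both sides (for example, from the CLDK results above), and the model either keeps the Java name or converts it to snake_case. Both `camel_to_snake` and `missing_methods` are hypothetical helpers, not CLDK APIs.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a Java-style camelCase name to snake_case.

    Simple heuristic: insert an underscore before each capital letter that
    follows a lowercase letter or digit. Runs of capitals (acronyms) are
    kept together, which may not match every naming style.
    """
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def missing_methods(java_names, python_names):
    """Return the Java method names with no counterpart in the Python code.

    A Java name counts as translated when the Python side contains either
    the same name or its snake_case form.
    """
    available = set(python_names)
    return sorted(
        name
        for name in java_names
        if name not in available and camel_to_snake(name) not in available
    )


# Illustrative names only; real input would come from the analysis results
print(missing_methods(["flatten", "processOption"], ["flatten"]))
```

An empty list means every Java method name has a plausible Python counterpart; any name that is returned is worth inspecting by hand.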