In [None]:
!pip install ollama

Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code translation. In our recent paper [https://dl.acm.org/doi/10.1145/3597503.3639226], published at ICSE'24, we found that LLM-based code translation is very promising. In this example, we walk through the steps of translating each Java class in an application to Python and checking various properties of the translated code, such as the number of methods, the number of fields, and the formal arguments.

(Step 1) First, we import all the necessary libraries.

In [None]:
from cldk.analysis.python.treesitter import PythonSitter
from cldk.analysis.java.treesitter import JavaSitter
import ollama
from cldk import CLDK
from cldk.analysis import AnalysisLevel

(Step 2) Second, we will form the prompt for the model, which will include the body of the Java class after removing all the comments and the import statements.

In [None]:
def format_inst(code, focal_class, language):
    """
    Format the translation instruction for the given class.
    `code` is the class source and `language` is the tag placed on its code fence.
    """
    inst = f"Question: Can you translate the Java class `{focal_class}` below to Python and generate under code block (```)?\n"

    inst += "\n"
    inst += f"```{language}\n"
    inst += code
    inst += "```" if code.endswith("\n") else "\n```"
    inst += "\n"
    return inst
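The step above mentions removing import statements before building the prompt. As a minimal sketch of that part, assuming imports are written one per line, a regular expression can drop them. The helper name `strip_java_imports` is hypothetical and is not part of CLDK.

```python
import re

# Hypothetical helper (not a CLDK API): matches single-line Java imports,
# including `import static ...` and wildcard imports.
_IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?[\w.]+(?:\.\*)?\s*;\s*$")


def strip_java_imports(source: str) -> str:
    """Return `source` with single-line Java import statements removed."""
    kept = [line for line in source.splitlines() if not _IMPORT_RE.match(line)]
    # Preserve the trailing newline if the input had one
    return "\n".join(kept) + ("\n" if source.endswith("\n") else "")


java_src = (
    "package demo;\n"
    "import java.util.List;\n"
    "import static java.lang.Math.*;\n"
    "\n"
    "public class A {\n"
    "    int x;\n"
    "}\n"
)
cleaned = strip_java_imports(java_src)
print(cleaned)
```

The result can be passed as the `code` argument of `format_inst`. Imports split across several lines are not handled by this sketch.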

(Step 3) Create a function to call the LLM. There are various ways to do this; for illustrative purposes, we use ollama, a library for communicating with locally downloaded models.

In [None]:
def prompt_ollama(message: str, model_id: str = "granite-code:8b-instruct") -> str:
    """Prompt local model on Ollama"""
    response_object = ollama.generate(model=model_id, prompt=message)
    return response_object["response"]
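Because the prompt asks the model to answer inside a code block, the reply usually wraps the Python code in a Markdown fence, sometimes with explanatory text around it. The sketch below shows one way to pull out the fenced code before analyzing it. The helper name `extract_code_block` is hypothetical, and the fallback of returning the whole reply when no fence is found is an assumption, not something the original notebook specifies.

```python
import re

FENCE = "`" * 3  # the Markdown code-fence marker

# An opening fence with an optional language tag, then the body up to the next fence
_BLOCK_RE = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_block(response: str) -> str:
    """Return the body of the first fenced block, or the whole reply if none is found."""
    match = _BLOCK_RE.search(response)
    return match.group(1) if match else response


reply = "Sure, here it is:\n" + FENCE + "python\nclass A:\n    pass\n" + FENCE + "\nHope this helps."
code_only = extract_code_block(reply)
print(code_only)
```

A typical use would be `extract_code_block(prompt_ollama(message=...))`, so that only Python source reaches the parser.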

(Step 4) Translate each class in the application (provide the application path as an environment variable, ```JAVA_APP_PATH```) and check certain properties of the translated code, such as (a) the number of translated methods and (b) the number of translated fields.

In [None]:
import os

# Create a new instance of the CLDK class
cldk = CLDK(language="java")
# Create an analysis object over the Java application. The application path is read from the JAVA_APP_PATH environment variable
analysis = cldk.analysis(project_path=os.environ["JAVA_APP_PATH"], analysis_level=AnalysisLevel.symbol_table)
# Go through all the classes in the application
for class_name in analysis.get_classes():
    # Get the location of the Java class
    class_path = analysis.get_java_file(qualified_class_name=class_name)
    # Skip classes whose source file cannot be located
    if not class_path:
        continue
    # Read the file content
    with open(class_path, 'r', encoding='utf-8', errors='ignore') as f:
        class_body = f.read()
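    # Sanitize the file content by removing comments.
    sanitized_class = JavaSitter().remove_all_comments(source_code=class_body)
    # Build the prompt from Step 2 (using the simple class name) and ask the model for the translation
    inst = format_inst(code=sanitized_class, focal_class=class_name.split(".")[-1], language="java")
    translated_code = prompt_ollama(message=inst, model_id="granite-code:20b-instruct")
    # Analyze the translated Python code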
    py_cldk = PythonSitter()
    all_methods = py_cldk.get_all_methods(module=translated_code)
    all_functions = py_cldk.get_all_functions(module=translated_code)
    all_fields = py_cldk.get_all_fields(module=translated_code)
    # Compare the number of methods and fields between the Java class and its translation
    if len(all_methods) + len(all_functions) != len(analysis.get_methods_in_class(qualified_class_name=class_name)):
        print(f'Number of translated methods does not match in class {class_name}')
    if len(all_fields) != len(analysis.get_class(qualified_class_name=class_name).field_declarations):
        print(f'Number of translated fields does not match in class {class_name}')
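As an independent cross-check on the counts above, the translated Python can also be inspected with the standard-library `ast` module. This is a sketch of an alternative, not the CLDK approach used in the cell above. The helper name `count_python_members` is hypothetical. It assumes the translation is syntactically valid Python (otherwise `ast.parse` raises `SyntaxError`), looks only at top-level classes, and counts only class-level assignments as fields, so attributes set inside `__init__` are not counted.

```python
import ast


def count_python_members(source: str) -> dict:
    """Count methods, module-level functions, and class-level fields in `source`."""
    tree = ast.parse(source)
    counts = {"methods": 0, "functions": 0, "fields": 0}
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in tree.body:
        if isinstance(node, func_types):
            counts["functions"] += 1
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, func_types):
                    counts["methods"] += 1
                elif isinstance(item, ast.AnnAssign):
                    # Annotated class attribute, e.g. `label: str = "p"`
                    counts["fields"] += 1
                elif isinstance(item, ast.Assign):
                    # Plain class attribute, e.g. `dims = 2`
                    counts["fields"] += len(item.targets)
    return counts


sample = (
    "class Point:\n"
    "    dims = 2\n"
    "    label: str = 'p'\n"
    "    def __init__(self, x):\n"
    "        self.x = x\n"
    "    def norm(self):\n"
    "        return abs(self.x)\n"
    "\n"
    "def origin():\n"
    "    return Point(0)\n"
)
counts = count_python_members(sample)
print(counts)
```

If these counts disagree with the Java side, that flags a class for manual review; it does not by itself mean the translation is wrong, because Java fields are often translated into instance attributes.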