In [None]:
!pip install ollama

# Using CLDK to explain Java methods

In this tutorial, we will use CLDK to generate a natural-language summary (an explanation) for every method in a Java application.

By the end of this tutorial, you will have a code summary for every method in a Java application. Along the way, you will see how CLDK makes program analysis fast and easy, and how to build an LLM-based code summary generator on top of it.

You will learn how to do the following:

<ol>
<li> Create a new instance of the CLDK class.
<li> Create an analysis object over the Java application.
<li> Iterate over all the files in the project.
<li> Iterate over all the classes in the file.
<li> Iterate over all the methods in the class.
<li> Get the code body of the method.
<li> Initialize the treesitter utils for the class file content.
<li> Sanitize the class for analysis.
</ol>
Next, we will write a couple of helper methods to:

<ol>
<li> Format the instruction for the given focal method and class.
<li> Prompt the local model on Ollama.
<li> Print the instruction and LLM output.
</ol>

## Prerequisites

Before we get started, let's make sure you have the following installed:

<ol>
<li> Python 3.11 or later
<li> Ollama 0.3.4 or later
</ol>
We will use ollama to spin up a local Granite model that will act as our LLM for this tutorial.

### Prerequisite 1: Install ollama

If you don't have ollama installed, please download and install it from here: [Ollama](https://ollama.com/download).
Once you have ollama, start the server and make sure it is running.
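If you're on Linux or WSL with systemd, you can check that the server is running with the following command. On macOS, `systemctl` is not available; there the server is typically started by launching the Ollama app or by running `ollama serve`.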

In [None]:
systemctl status ollama

If not, you may have to start the server manually. You can do this by running the following command:

In [None]:
systemctl start ollama

Once ollama is up and running, you can download the latest version of the Granite 8b Instruct model. Other Granite versions are available, but this tutorial uses Granite 8b Instruct. If you prefer a different version, replace `8b-instruct` with any of the other [versions](https://ollama.com/library/granite-code/tags) in the following command:

In [None]:
ollama pull granite-code:8b-instruct

Let's make sure the model is downloaded by running the following command:

In [None]:
ollama run granite-code:8b-instruct "Write a python function to print 'Hello, World!'"

### Prerequisite 2: Install ollama Python SDK

In [None]:
pip install ollama

### Prerequisite 3: Install CLDK
CLDK is available on GitHub at github.com/IBM/codellm-devkit.git. You can install it by running the following command:

In [None]:
pip install git+https://github.com/IBM/codellm-devkit.git

### Step 1: Get the sample Java application
For this tutorial, we will use Apache Commons CLI. You can download the source code to a temporary directory by running the following command:

In [None]:
wget https://github.com/apache/commons-cli/archive/refs/tags/rel/commons-cli-1.7.0.zip -O /tmp/commons-cli-1.7.0.zip && unzip -o /tmp/commons-cli-1.7.0.zip -d /tmp

The project will now be extracted to `/tmp/commons-cli-rel-commons-cli-1.7.0`. We'll remove these files later, so don't worry about the location.
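
If `wget` or `unzip` is not available on your machine, the same download can be sketched with the Python standard library. The helper name `fetch_and_extract` is hypothetical and not part of CLDK; it is only one possible way to get the same directory layout.

```python
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url: str, dest_dir: str) -> Path:
    """Download a zip archive from `url` and unpack it into `dest_dir`."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local archive after the last path segment of the URL.
    archive_name = Path(urllib.parse.urlparse(url).path).name
    archive_path = dest / archive_name
    urllib.request.urlretrieve(url, archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
    return dest
```

Calling `fetch_and_extract("https://github.com/apache/commons-cli/archive/refs/tags/rel/commons-cli-1.7.0.zip", "/tmp")` should leave the archive and the extracted project in `/tmp`, as the shell command above does.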

### Generate code summary
Code summarization, or code explanation, is the task of converting code written in a programming language into natural language. It has several
benefits, such as understanding code without reading every detail and documenting code for easier maintenance. To do this, one needs to
understand how the code is structured and use that knowledge to generate the summary with an AI-based approach. In this
example, we will use a Large Language Model (LLM), specifically Granite 8B, an open-source model built by IBM. We will show how a developer can use
CLDK to expose various parts of the code through API calls, without implementing time-intensive program analyses from scratch.

Step 1: Add all the necessary imports

In [None]:
from pathlib import Path
import ollama
from cldk import CLDK
from cldk.analysis import AnalysisLevel

Step 2: Formulate the LLM prompt. The prompt can be tailored to various needs. In this case, we show a simple example that generates a summary for each
method in a Java class.

In [None]:
def format_inst(code, focal_method, focal_class, language):
    """
    Format the instruction for the given focal method and class.
    """
    inst = f"Question: Can you write a brief summary for the method `{focal_method}` in the class `{focal_class}` below?\n"

    inst += "\n"
    inst += f"```{language}\n"
    inst += code
    inst += "```" if code.endswith("\n") else "\n```"
    inst += "\n"
    return inst
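
To see what the model will actually receive, it can help to print one prompt by hand. The sketch below repeats the helper so that it runs on its own; the only change is that the code fence is built from a hypothetical `FENCE` constant, so that the snippet nests cleanly inside Markdown. The sample class is made up for illustration.

```python
FENCE = "`" * 3  # three backticks, built this way so the snippet nests inside Markdown


def format_inst(code, focal_method, focal_class, language):
    """Same logic as the helper above, repeated so this snippet runs on its own."""
    inst = f"Question: Can you write a brief summary for the method `{focal_method}` in the class `{focal_class}` below?\n"
    inst += "\n"
    inst += f"{FENCE}{language}\n"
    inst += code
    # Close the fence on its own line whether or not the code ends with a newline.
    inst += FENCE if code.endswith("\n") else "\n" + FENCE
    inst += "\n"
    return inst


# Hypothetical input: a tiny class with one method and no trailing newline.
sample_code = "public class Foo {\n    public void bar() {}\n}"
prompt = format_inst(sample_code, "bar()", "Foo", "java")
print(prompt)
```

The printed prompt starts with the question, followed by the class wrapped in a `java` code fence. The conditional on the last fence keeps the closing backticks on their own line in both cases.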

Step 3: Create a function to call the LLM. There are various ways to do this. For illustrative purposes, we use ollama, a library for communicating with locally downloaded models.

In [None]:
def prompt_ollama(message: str, model_id: str = "granite-code:8b-instruct") -> str:
    """Prompt local model on Ollama"""
    response_object = ollama.generate(model=model_id, prompt=message)
    return response_object["response"]
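
A full run sends one request per method, so it can be useful to exercise the rest of the pipeline without a running server. The following is a minimal sketch, assuming a hypothetical factory `make_prompter` that accepts any callable with the same keyword arguments as `ollama.generate`; `fake_generate` is a stand-in backend, not part of ollama.

```python
from typing import Callable, Optional


def make_prompter(
    generate: Optional[Callable[..., dict]] = None,
    model_id: str = "granite-code:8b-instruct",
) -> Callable[[str], str]:
    """Return a function that sends a prompt to `generate` and returns the text."""
    if generate is None:
        # Imported lazily so the sketch can be tried without ollama installed.
        import ollama

        generate = ollama.generate

    def prompt(message: str) -> str:
        response = generate(model=model_id, prompt=message)
        return response["response"]

    return prompt


# Stand-in backend (hypothetical): reports the prompt length instead of calling a model.
def fake_generate(model: str, prompt: str) -> dict:
    return {"response": f"[{model}] received {len(prompt)} characters"}


dry_run = make_prompter(generate=fake_generate)
print(dry_run("Summarize this method."))
```

Calling `make_prompter()` with no arguments falls back to `ollama.generate`, which matches the behavior of `prompt_ollama` above.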

Step 4: Create an object of CLDK and provide the programming language of the source code.

In [None]:
# Create a new instance of the CLDK class
cldk = CLDK(language="java")

Step 5: CLDK supports different analysis engines: Codeanalyzer (built using WALA and Javaparser), Treesitter, and CodeQL (future). Codeanalyzer is
the default analysis engine. CLDK also supports different analysis levels: (a) symbol table, (b) call graph, (c) program dependency graph, and
(d) system dependency graph. The analysis level can be selected using the ```AnalysisLevel``` enum. In this example, we will generate summaries of all the methods
of an application, so we select the symbol table level.

In [None]:
# Create an analysis object over the java application
analysis = cldk.analysis(project_path="/tmp/commons-cli-rel-commons-cli-1.7.0", analysis_level=AnalysisLevel.symbol_table)
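
Before prompting the model, it can be helpful to know how many methods, and therefore how many LLM calls, the run will involve. The sketch below relies only on the attributes used later in this tutorial (`type_declarations` and `callable_declarations`); the helper name `count_callables` and the stand-in objects are hypothetical and are not real CLDK types.

```python
from types import SimpleNamespace


def count_callables(symbol_table) -> dict:
    """Map each type name to the number of callables declared in it."""
    counts = {}
    for class_file in symbol_table.values():
        for type_name, type_declaration in class_file.type_declarations.items():
            counts[type_name] = len(type_declaration.callable_declarations)
    return counts


# Stand-in objects that mimic only the attributes used above (not real CLDK types).
fake_table = {
    "Foo.java": SimpleNamespace(
        type_declarations={
            "Foo": SimpleNamespace(
                callable_declarations={"bar()": None, "baz()": None, "dummy(String)": None}
            )
        }
    )
}
print(count_callables(fake_table))  # {'Foo': 3}
```

With the real analysis object, the equivalent call would be `count_callables(analysis.get_symbol_table())`.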

Step 6: Iterate over all the class files and create the prompt. In this case, we want to provide a customized Java class in the prompt. For instance,

```
package com.ibm.org;
import A.B.C.D;
...
public class Foo {
    // code comment
    public void bar() {
        int a;
        a = baz();
        // do something
    }

    private int baz() {
        // do something
    }

    public String dummy(String a) {
        // do something
    }
}
```
Given the above class, let's say we want to generate a summary for the ```bar``` method. To understand what it does, we include its callee, which in this case is ```baz```, in the prompt. We also remove imports, comments, etc. All of this is done with a single call to the ```sanitize_focal_class``` API, which uses Treesitter to analyze the code. Once the input code has been sanitized, we call the ```format_inst``` method to create the LLM prompt, which is then passed to the ```prompt_ollama``` method to generate the summary using the LLM.
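
The selection rule described above can be illustrated with a toy example. This is not how CLDK implements ```sanitize_focal_class```; it is a sketch on plain dictionaries with a hypothetical helper, `keep_focal_and_callees`. It assumes that methods which are neither the focal method nor one of its direct callees can be left out of the prompt, which this tutorial does not state explicitly.

```python
def keep_focal_and_callees(methods: dict, callees: dict, focal: str) -> dict:
    """Keep the focal method and its direct callees; drop every other method."""
    keep = {focal, *callees.get(focal, ())}
    # Iterating over `methods` preserves the original declaration order.
    return {name: body for name, body in methods.items() if name in keep}


# Toy stand-in for the Foo class above: method name -> source text.
methods = {
    "bar": "public void bar() { int a; a = baz(); }",
    "baz": "private int baz() { }",
    "dummy": "public String dummy(String a) { }",
}
# Direct call relationships, written by hand for this illustration.
callees = {"bar": ["baz"]}

selected = keep_focal_and_callees(methods, callees, focal="bar")
print(list(selected))  # ['bar', 'baz']
```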

In [None]:
# Iterate over all the files in the project
for file_path, class_file in analysis.get_symbol_table().items():
    class_file_path = Path(file_path).absolute().resolve()
    # Iterate over all the classes in the file
    for type_name, type_declaration in class_file.type_declarations.items():
        # Iterate over all the methods in the class
        for method in type_declaration.callable_declarations.values():
            # Read the source of the file that declares the method
            code_body = class_file_path.read_text()
    
            # Initialize the treesitter utils for the class file content
            tree_sitter_utils = cldk.tree_sitter_utils(source_code=code_body)
    
            # Sanitize the class for analysis
            sanitized_class = tree_sitter_utils.sanitize_focal_class(method.declaration)
    
            # Format the instruction for the given focal method and class
            instruction = format_inst(
                code=sanitized_class,
                focal_method=method.declaration,
                focal_class=type_name,
                language="java"
            )
    
            # Prompt the local model on Ollama
            llm_output = prompt_ollama(
                message=instruction,
                model_id="granite-code:8b-instruct",
            )
    
            # Print the instruction and LLM output
            print(f"Instruction:\n{instruction}")
            print(f"LLM Output:\n{llm_output}")
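
The download step noted that the sample files would be removed later. One way to do that from Python is sketched below; the helper name `remove_paths` is hypothetical, and the paths in the usage note are the ones used by the download command earlier in this tutorial.

```python
import shutil
from pathlib import Path


def remove_paths(*paths: str) -> None:
    """Delete each path if it exists: directories recursively, files directly."""
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            shutil.rmtree(path)
        elif path.exists():
            path.unlink()
```

For this tutorial, the call would be `remove_paths("/tmp/commons-cli-1.7.0.zip", "/tmp/commons-cli-rel-commons-cli-1.7.0")`. Paths that do not exist are skipped, so running it twice is harmless.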