In [None]:
!pip install ollama

Code summarization, or code explanation, is the task of converting code written in a programming language into natural language. It has several benefits,
such as making code understandable without reading every implementation detail and producing documentation that eases maintenance. Doing this well requires
understanding how the code is structured and using that knowledge to generate the summary with an AI-based approach. In this example, we use a Large Language
Model (LLM), specifically Granite 8B, an open-source model built by IBM. We show how a developer can use CLDK to expose various parts of the code through its
APIs without implementing time-intensive program analyses from scratch.

Step 1: Add all the necessary imports.

In [None]:
from pathlib import Path
import ollama
from cldk import CLDK
from cldk.analysis import AnalysisLevel

Step 2: Formulate the LLM prompt. The prompt can be tailored to different needs. Here we show a simple example that asks for a summary of each
method in a Java class.

In [None]:
def format_inst(code, focal_method, focal_class, language):
    """
    Format the instruction for the given focal method and class.
    """
    inst = f"Question: Can you write a brief summary for the method `{focal_method}` in the class `{focal_class}` below?\n"

    inst += "\n"
    inst += f"```{language}\n"
    inst += code
    inst += "```" if code.endswith("\n") else "\n```"
    inst += "\n"
    return inst

Step 3: Create a function to call the LLM. There are various ways to do this; for illustration, we use ollama, a library for communicating with locally downloaded models.

In [None]:
def prompt_ollama(message: str, model_id: str = "granite-code:8b-instruct") -> str:
    """Prompt local model on Ollama"""
    response_object = ollama.generate(model=model_id, prompt=message)
    return response_object["response"]
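
Because `prompt_ollama` needs a running Ollama server and a downloaded model, it can be convenient to make the generation call injectable while developing the rest of the pipeline. The following is a minimal sketch, not part of CLDK or ollama: `prompt_model` and `fake_generate` are hypothetical names, and the only assumption about the backend is that it accepts `model` and `prompt` keyword arguments and returns a mapping with a `"response"` key, which is how `ollama.generate` is used above.

```python
from typing import Callable, Mapping


def prompt_model(
    message: str,
    generate: Callable[..., Mapping[str, str]],
    model_id: str = "granite-code:8b-instruct",
) -> str:
    """Send `message` to any backend that mimics the generate call used above."""
    if not message.strip():
        raise ValueError("prompt must not be empty")
    response_object = generate(model=model_id, prompt=message)
    # Trim surrounding whitespace so summaries print cleanly
    return response_object["response"].strip()


def fake_generate(model: str, prompt: str) -> dict:
    """Hypothetical stand-in for a real model, intended for dry runs only."""
    return {"response": f"[{model}] received {len(prompt)} characters\n"}


print(prompt_model("Summarize this method.", fake_generate))
```

Passing `ollama.generate` as the `generate` argument should then behave like `prompt_ollama`, apart from the whitespace trimming.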

Step 4: Create a CLDK object and specify the programming language of the source code.

In [None]:
# Create a new instance of the CLDK class
cldk = CLDK(language="java")

Step 5: CLDK supports several analysis engines: Codeanalyzer (built using WALA and Javaparser), Treesitter, and CodeQL (future). Codeanalyzer is the
default analysis engine. CLDK also supports different analysis levels: (a) symbol table, (b) call graph, (c) program dependency graph, and
(d) system dependency graph. The analysis level can be selected using the ```AnalysisLevel``` enum. In this example, we generate summaries for all the methods
of an application. To select the application location, set the environment variable ```JAVA_APP_PATH```.

In [None]:
import os

# Create an analysis object over the Java application located at $JAVA_APP_PATH
analysis = cldk.analysis(project_path=os.environ["JAVA_APP_PATH"], analysis_level=AnalysisLevel.symbol_table)
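
Since the application location comes from an environment variable, it can help to validate it before starting the analysis, so that a missing or mistyped path fails early with a clear message. The helper below is a sketch using only the standard library; `resolve_app_path` is a hypothetical name and not a CLDK function.

```python
import os
from pathlib import Path


def resolve_app_path(env_var: str = "JAVA_APP_PATH") -> Path:
    """Return the directory named by `env_var`, or raise a descriptive error."""
    raw_value = os.environ.get(env_var)
    if not raw_value:
        raise EnvironmentError(f"Set {env_var} to the root of the Java application")
    app_path = Path(raw_value).expanduser().resolve()
    if not app_path.is_dir():
        raise NotADirectoryError(f"{env_var} does not point to a directory: {app_path}")
    return app_path
```

The returned path could then be handed to `cldk.analysis` as `project_path`, converted with `str(...)` if a plain string is expected.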

Step 6: Iterate over all the class files and create the prompt. In this case, we want to provide a customized Java class in the prompt. For instance,

```
package com.ibm.org;
import A.B.C.D;
...
public class Foo {
 // code comment
 public void bar(){ 
    int a;
    a = baz();
    // do something
    }
 private int baz()
 {
    // do something
 }
 public String dummy (String a)
 {
    // do something
 }
}
```
Given the above class, suppose we want to generate a summary for the ```bar``` method. To help the model understand what it does, we include its callee, ```baz``` in this case, in the prompt. We also remove imports, comments, and similar noise. All of this is done with a single call to the ```sanitize_focal_class``` API, which uses Treesitter to analyze the code. Once the input code has been sanitized, we call ```format_inst``` to create the LLM prompt, which is then passed to ```prompt_ollama``` to generate the summary.
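
To give a rough feel for the import and comment removal, here is a toy approximation based on regular expressions. It is not how ```sanitize_focal_class``` works (that API relies on Treesitter and also handles the focal method and its callees); `strip_imports_and_comments` is a hypothetical helper, and the regex approach is naive enough to break on `//` inside string literals.

```python
import re


def strip_imports_and_comments(java_source: str) -> str:
    """Toy approximation: drop package/import lines and comments from Java source."""
    # Remove /* ... */ block comments, including Javadoc
    without_block = re.sub(r"/\*.*?\*/", "", java_source, flags=re.DOTALL)
    # Remove // line comments (naive: would also cut "//" inside string literals)
    without_line = re.sub(r"//[^\n]*", "", without_block)
    kept_lines = []
    for line in without_line.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines, including those left by removed comments
        if stripped.startswith(("package ", "import ")):
            continue  # skip package and import declarations
        kept_lines.append(line.rstrip())
    return "\n".join(kept_lines) + "\n"


sample = """package com.ibm.org;
import A.B.C.D;

public class Foo {
    // code comment
    public void bar() {
        int a;
        a = baz(); /* call the helper */
    }
}
"""

print(strip_imports_and_comments(sample))
```

A syntax-tree-based approach avoids these pitfalls because it removes comment and import nodes rather than matching text.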

In [None]:
# Iterate over all the files in the project
for file_path, class_file in analysis.get_symbol_table().items():
    class_file_path = Path(file_path).absolute().resolve()
    # Iterate over all the classes in the file
    for type_name, type_declaration in class_file.type_declarations.items():
        # Iterate over all the methods in the class
        for method in type_declaration.callable_declarations.values():
            # Read the full source of the file that contains the method
            code_body = class_file_path.read_text()
    
            # Initialize the treesitter utils for the class file content
            tree_sitter_utils = cldk.tree_sitter_utils(source_code=code_body)
    
            # Sanitize the class for analysis
            sanitized_class = tree_sitter_utils.sanitize_focal_class(method.declaration)
    
            # Format the instruction for the given focal method and class
            instruction = format_inst(
                code=sanitized_class,
                focal_method=method.declaration,
                focal_class=type_name,
                language="java"
            )
    
            # Prompt the local model on Ollama
            llm_output = prompt_ollama(
                message=instruction,
                model_id="granite-code:8b-instruct",
            )
    
            # Print the instruction and LLM output
            print(f"Instruction:\n{instruction}")
            print(f"LLM Output:\n{llm_output}")
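
Printing is fine for a quick look, but for documentation purposes it may be more useful to keep the summaries in a structure keyed by class and method. The sketch below shows one possible shape; `collect_summaries` is a hypothetical helper, the example prompts are made up, and a stub replaces the model so the snippet runs without Ollama.

```python
import json
from typing import Callable, Dict, Iterable, Tuple


def collect_summaries(
    prompts: Iterable[Tuple[str, str, str]],
    summarize: Callable[[str], str],
) -> Dict[str, Dict[str, str]]:
    """Group summaries as {class name: {method declaration: summary}}."""
    summaries: Dict[str, Dict[str, str]] = {}
    for class_name, method_declaration, instruction in prompts:
        summaries.setdefault(class_name, {})[method_declaration] = summarize(instruction)
    return summaries


# Hypothetical inputs and a stub summarizer, so the sketch runs without a model
example_prompts = [
    ("Foo", "public void bar()", "Question: ... bar ..."),
    ("Foo", "private int baz()", "Question: ... baz ..."),
]
result = collect_summaries(
    example_prompts,
    summarize=lambda text: f"stub summary ({len(text)} chars)",
)
print(json.dumps(result, indent=2))
```

In the loop above, the tuples would be built from `type_name`, `method.declaration`, and `instruction`, with `prompt_ollama` passed as `summarize`.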