# LLM and SampleTrimmer

> This tutorial describes two important classes for processing LLM-generated code (known as "samples"). The `LLM` class defines how to access the LLM, while the `SampleTrimmer` class trims the unneeded parts of the code using Python's abstract syntax tree (`ast`) module.

## LLM class

The `LLM` class defines how to access the LLM. The user can either deploy an LLM locally on their own device or server, or use an LLM API. In both cases, the user should create a child class of `LLM` (i.e., extend `LLM`) and override the `draw_sample` function.

### Initialization of the user-defined LLM class

The `LLM` class takes a keyword argument `auto_trim`, whose default value is `True`. With this setting, whether the user chooses a code completion model (such as StarCoder or CodeLlama-Python) or a chat model (such as the GPT or Llama series), the “useful part” of the response is identified automatically, and surrounding descriptions and truncated code are discarded. Unless there is a specific reason to change it, please **always leave it at the default**.

### Implementation of the draw_sample function

The `draw_sample` function determines how the generated content is obtained from the LLM and returns it as a `str` **(feel free to return the LLM's answer as-is, even if it contains extraneous descriptions, as they will be trimmed automatically by our trimmer)**. Here, we show a brief example of using an LLM API.
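The sketch below illustrates the subclassing pattern described above. The exact signature of the library's `LLM` base class is not shown in this tutorial, so the snippet defines a small stand-in `LLM` class to stay self-contained. The names `MyApiLLM`, `send_request`, and `fake_backend` are hypothetical, and the single `prompt` argument of `draw_sample` is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Callable


class LLM(ABC):
    """Stand-in for the library's `LLM` base class (assumed shape, for illustration only)."""

    def __init__(self, auto_trim: bool = True):
        # Leave `auto_trim` at its default so responses are trimmed automatically.
        self.auto_trim = auto_trim

    @abstractmethod
    def draw_sample(self, prompt: str) -> str:
        """Return the raw text generated by the LLM for `prompt`."""


class MyApiLLM(LLM):
    """Hypothetical user-defined class that forwards prompts to an LLM API."""

    def __init__(self, send_request: Callable[[str], str], auto_trim: bool = True):
        super().__init__(auto_trim=auto_trim)
        # `send_request` is any callable that sends a prompt and returns the reply text.
        self._send_request = send_request

    def draw_sample(self, prompt: str) -> str:
        # Return the reply unchanged; extra descriptions are left for the trimmer.
        return self._send_request(prompt)


def fake_backend(prompt: str) -> str:
    """Placeholder for a real API call, so the sketch runs offline."""
    return "OK, this is the generated code:\n\ndef f(x):\n    return x\n"


llm = MyApiLLM(send_request=fake_backend)
sample = llm.draw_sample("Complete the function f.")
print(sample)
```

In real use, `LLM` would be imported from the library rather than redefined, and `fake_backend` would be replaced by a function that calls the chosen API provider or local model.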

## SampleTrimmer class

The following examples demonstrate how `SampleTrimmer` works.

## Tutorial

In [1]:
```python
from llm4ad.base import SampleTrimmer
```

Below is an example of the response content from an LLM.

In [2]:
```python
llm_response_content = '''\
OK, this is the generated code:

def my_function(arr):
    """This is an example function."""
    max = np.max(arr)
    min = np.min(arr)
    result = max / min
    return result
    
This function aims to calculate the ...
'''
```

In our pipeline, we only want the informative part, i.e., the code for the heuristic. So we can trim the redundant parts ("OK, this is ...", "This function aims to ...") of the generated content by using `SampleTrimmer.auto_trim`. The `auto_trim` function automatically identifies whether a response comes from an instruct model (e.g., GPT-3.5) or a completion model (e.g., StarCoder), and performs the corresponding operations to trim the code.

The trimmed result consists of the **function body** and any **descriptions** that follow it (don't worry about the content after the function body, as it can be removed easily).

In [3]:
```python
trimmed_response_content = SampleTrimmer.auto_trim(llm_response_content)
print(trimmed_response_content)
```

```
    """This is an example function."""
    max = np.max(arr)
    min = np.min(arr)
    result = max / min
    return result

This function aims to calculate the ...
```
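To illustrate why the text after the function body is easy to remove, here is a minimal sketch based on Python's `ast` module. It is not the library's implementation. The helper name `strip_trailing_text` and the `_wrapper` function are hypothetical. The idea is to wrap the body in a dummy function header, drop trailing lines until the source parses, and then keep only the lines that belong to that function.

```python
import ast


def strip_trailing_text(body: str) -> str:
    """Keep only the lines of `body` that form a valid, indented function body."""
    lines = body.splitlines()
    while lines:
        # Wrap the body in a dummy header so it can be parsed as a function.
        source = "def _wrapper():\n" + "\n".join(lines) + "\n"
        try:
            tree = ast.parse(source)
        except SyntaxError:
            # The tail is not valid Python (e.g. prose), so drop the last line and retry.
            lines.pop()
            continue
        wrapper = tree.body[0]
        # `end_lineno` counts the dummy header line, so subtract 1 to index `lines`.
        lines = lines[: wrapper.end_lineno - 1]
        break
    return "\n".join(lines) + "\n" if lines else ""


trimmed = (
    '    """This is an example function."""\n'
    "    max = np.max(arr)\n"
    "    min = np.min(arr)\n"
    "    result = max / min\n"
    "    return result\n"
    "\n"
    "This function aims to calculate the ...\n"
)
body_only = strip_trailing_text(trimmed)
print(body_only)
```

The sketch assumes the body is indented and that Python 3.8 or later is used (for `end_lineno`). It only parses the code and never executes it, so undefined names such as `np` are not a problem.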



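Convert the trimmed response content (a `str`) to a `Program` instance by providing a template program. The template supplies the imports and the function signature, and the trimmed function body replaces the template's body. In the output below, the docstring and the trailing description no longer appear.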

In [4]:
```python
template_program = '''\
import numpy as np

def func(arr):
    return arr
'''

program = SampleTrimmer.sample_to_program(trimmed_response_content, template_program)
print(str(program))
```

```
import numpy as np

def func(arr):
    max = np.max(arr)
    min = np.min(arr)
    result = max / min
    return result
```
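The idea of filling a template with a generated body can be sketched as follows. This is an illustration under stated assumptions, not the library's `sample_to_program`. The helper `splice_body` is hypothetical and returns a plain string rather than a `Program` instance. It also assumes the body has already been cleaned (no docstring, no trailing text) and that the template's function body starts on its own line.

```python
import ast


def splice_body(body: str, template: str) -> str:
    """Replace the body of the first function in `template` with `body`."""
    tree = ast.parse(template)
    func = next(node for node in tree.body if isinstance(node, ast.FunctionDef))
    template_lines = template.splitlines()
    # Everything before the first body statement: imports plus the `def` header.
    header = template_lines[: func.body[0].lineno - 1]
    # Anything in the template after the function is kept unchanged.
    footer = template_lines[func.end_lineno:]
    new_source = "\n".join(header + body.rstrip("\n").splitlines() + footer) + "\n"
    # Check that the spliced result is still valid Python before returning it.
    ast.parse(new_source)
    return new_source


template = (
    "import numpy as np\n"
    "\n"
    "def func(arr):\n"
    "    return arr\n"
)
body = (
    "    max = np.max(arr)\n"
    "    min = np.min(arr)\n"
    "    result = max / min\n"
    "    return result\n"
)
spliced = splice_body(body, template)
print(spliced)
```

Keeping the template's header means the generated function always has the name and arguments the rest of the pipeline expects, regardless of what the LLM called its own function.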


