# Run Inference on StarCoder2-7B

## References
1. [StarCoder2-7B's Hugging Face repository](https://huggingface.co/bigcode/starcoder2-7b)

## Install and Load Dependencies

In [None]:
!pip install accelerate

In [None]:
import torch
from datetime import datetime
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

## Import Model and Tokenizer

In [None]:
checkpoint = "bigcode/starcoder2-7b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Load the weights in fp16 to roughly halve memory use relative to fp32
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.float16)

In [None]:
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
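
As a rough cross-check on the reported footprint, the size of the weights can be estimated from the parameter count and the bytes per parameter. The sketch below assumes a nominal 7 billion parameters (taken from the model name, not a measured count) and ignores buffers, activations, and the KV cache. `estimate_weight_memory_mb` is a hypothetical helper, not part of `transformers`.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def estimate_weight_memory_mb(num_params: int, dtype: str = "float16") -> float:
    """Approximate weight size in MB (1 MB = 1e6 bytes, as in the cell above)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


# Assumption: nominal parameter count implied by the "7B" in the model name.
nominal_params = 7_000_000_000

for dtype in ("float32", "float16"):
    size_mb = estimate_weight_memory_mb(nominal_params, dtype)
    print(f"{dtype}: ~{size_mb:,.0f} MB")
```

If the measured footprint is far from the fp16 estimate, the model was probably loaded in a different dtype than intended.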

## Create Test Prompt

In [None]:
prompt = """
Write a script in Python that implements a circular queue and creates one with user input. Give an example of how the script works and document the code with appropriate comments and docstrings wherever necessary. 
Generate a response only for the prompt that is given and nothing else. 
"""

## Run Inference with `model.generate()`

In [None]:
%%time
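# Tokenize with an attention mask and move the tensors to the device the model was loaded on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, pad_token_id=tokenizer.eos_token_id)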
print(tokenizer.decode(outputs[0]))

## Run Inference with Pipeline

In [None]:
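def generateOutputWithModelPipeline(model, tokenizer, prompt, temperature=0.7, max_new_tokens=256):
    """Sample one completion for `prompt` with a text-generation pipeline.

    Prints the generation time (pipeline construction is not included) and
    returns the pipeline output, a list of dicts with a 'generated_text' key.
    """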
    textGenerator = pipeline(
        task="text-generation", 
        model=model, 
        tokenizer=tokenizer,
        pad_token_id=tokenizer.eos_token_id
    )
    
    start = datetime.now()
    
    output = textGenerator(
        prompt,
        do_sample=True,
        max_new_tokens=max_new_tokens, 
        temperature=temperature, 
        top_k=50, 
        top_p=0.9,
        num_return_sequences=1
    )
    
    stop = datetime.now()
    
    totalTimeToPrompt = stop - start
    print(f"Execution Time : {totalTimeToPrompt}")
    
    return output
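
The `top_k=50` and `top_p=0.9` arguments above restrict which tokens can be sampled at each step. The following is a simplified, pure-Python illustration of that idea on a toy distribution. It is not the `transformers` implementation, and `top_k_top_p_filter` is a hypothetical name used only here.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.9):
    """Keep the top_k most likely tokens, then the smallest prefix of them whose
    renormalised cumulative probability reaches top_p, and renormalise the result."""
    # Step 1 (top-k): keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)

    # Step 2 (top-p): walk down the ranking until enough probability mass is covered.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / total
        if cumulative >= top_p:
            break

    # Step 3: renormalise so the surviving probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {token: p / norm for token, p in kept}


# Toy next-token distribution (made-up numbers for illustration).
toy_probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(top_k_top_p_filter(toy_probs, top_k=3, top_p=0.8))
```

In this toy case `d` is dropped by top-k and `c` by top-p, so sampling only ever chooses between `a` and `b`. Lower values of either parameter make the output more conservative.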

In [None]:
output = generateOutputWithModelPipeline(model, tokenizer, prompt)
print(output[0]['generated_text'])

## Test Pipeline with Custom Input

In [None]:
language = "Python"
code = """
def test_euler_method(self):
    def dydx(x, y):
        return x + y

    x0, y0 = 0, 1
    x_end = 1
    h = 0.01
    x, y = differential_equations.euler_method(dydx, x0, y0, x_end, h)
    expected = 2 * np.exp(x_end) - x_end - 1  # Analytical solution of dy/dx = x + y with y(0) = 1
    self.assertAlmostEqual(y[-1], expected, places=2)
"""

In [None]:
template = f"""
Given a script file in {language}, generate documentation for each function. For each function in the script, document its name, arguments, return values, and a brief explanation of its logic. Ensure that code within comments is not parsed and documented. Generate nothing other than what is asked.

Strictly use the following format for each function:

## Function Name: `function_name`

### Arguments
* `arg1` (type): Description of argument 1.
* `arg2` (type): Description of argument 2.
* ...

### Return Values
* `return_value1` (type): Description of return value 1.
* `return_value2` (type): Description of return value 2.
* ...

### Explanation of Function Logic:
1. Brief explanation of the function logic step by step.
2. ...
3. ...

-----------------------------------------------------------------------------

Here is the function I want you to generate documentation for:

{code}
"""

In [None]:
template

In [None]:
output = generateOutputWithModelPipeline(model, tokenizer, template)
generatedText = output[0]['generated_text']
response = generatedText[len(template):].strip()
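
Slicing by `len(template)` assumes the generated text begins with the prompt, character for character. A slightly more defensive variant checks that assumption first. `strip_prompt` is a hypothetical helper written for this notebook, shown as a minimal sketch.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the completion, dropping the echoed prompt if it is present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # The prompt was not echoed verbatim, so leave the text intact rather than cut blindly.
    return generated_text.strip()


# Stand-in strings; in the notebook these would be `generatedText` and `template`.
print(strip_prompt("Q: hi\nA: hello", "Q: hi\n"))
```

If the prompt is not echoed exactly, this returns the full text instead of silently cutting into the completion.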

In [None]:
print(generatedText)

In [None]:
print(response)
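
To judge whether a response actually follows the requested format, the headings from the template can be checked mechanically. This is a minimal sketch: the patterns mirror the headings in the template above, and `missing_sections` is a hypothetical helper, not an established evaluation method.

```python
import re

# One pattern per heading that the template asks the model to produce.
REQUIRED_HEADINGS = (
    r"^## Function Name: `[^`]+`",
    r"^### Arguments",
    r"^### Return Values",
    r"^### Explanation of Function Logic",
)


def missing_sections(response: str) -> list:
    """Return the heading patterns that do not appear at the start of any line."""
    return [
        pattern
        for pattern in REQUIRED_HEADINGS
        if not re.search(pattern, response, flags=re.MULTILINE)
    ]


# Hand-written example of a well-formed response (not model output).
sample = """## Function Name: `test_euler_method`

### Arguments
* `self` (TestCase): The test case instance.

### Return Values
* None

### Explanation of Function Logic:
1. Defines the derivative and checks the numerical result.
"""
print(missing_sections(sample))
```

An empty list means every required heading was found. A non-empty list names the sections the model skipped.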

## Remarks

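I'm sure StarCoder2 is great in its own right as a code-completion model. However, it is a base model rather than an instruction-tuned one, and for my use case, which depends on following prompts like the ones above, it simply won't cut it.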