# Run Inference on StarCoder2-7B

## Install and Load Dependencies

In [1]:
!pip install accelerate



In [2]:
import torch
from datetime import datetime
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline



## Load Model and Tokenizer

In [3]:
checkpoint = "bigcode/starcoder2-7b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Load the weights in fp16 to halve memory use relative to fp32; device_map="auto" (via accelerate) places them on the available device(s)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.float16)

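Loading in fp16 matters mostly for GPU memory. As a rough back-of-the-envelope sketch, weight memory is about the parameter count times the bytes per parameter. The 7-billion figure below is an assumption taken from the model name rather than a measured value, and the estimate ignores activations and the key/value cache used during generation.

```python
def estimate_weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: roughly 7 billion parameters, inferred from the "7b" in the checkpoint name.
ASSUMED_PARAMS = 7e9

fp32_gb = estimate_weight_memory_gb(ASSUMED_PARAMS, bytes_per_param=4)
fp16_gb = estimate_weight_memory_gb(ASSUMED_PARAMS, bytes_per_param=2)

print(f"fp32 weights: ~{fp32_gb:.0f} GB")
print(f"fp16 weights: ~{fp16_gb:.0f} GB")
```

Under this assumption, fp16 needs about half the weight memory of fp32, which is why the load call sets `torch_dtype=torch.float16`.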

## Create Test Prompt

In [39]:
prompt = """
Write a script in Python that implements a circular queue and creates one with user input. Give an example of how the script works and document the code with appropriate comments and docstrings wherever necessary. 
Generate a response only for the prompt that is given and nothing else. 
"""

## Run Inference with `model.generate()`

In [30]:
%%time
# Call the tokenizer directly so the attention mask is returned alongside the input IDs
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Pass the attention mask (via **inputs) and an explicit pad token to avoid the generation warnings
outputs = model.generate(**inputs, max_new_tokens=256, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))



Question: Write a script in Python that implements a circular queue and creates one with user input.
Answer:

```
class CircularQueue:
    def __init__(self, size):
        self.size = size
        self.queue = [None] * size
        self.front = 0
        self.rear = 0

    def enqueue(self, data):
        if self.is_full():
            print("Queue is full")
            return
        self.queue[self.rear] = data
        self.rear = (self.rear + 1) % self.size

    def dequeue(self):
        if self.is_empty():
            print("Queue is empty")
            return
        data = self.queue[self.front]
        self.front = (self.front + 1) % self.size
        return data

    def is_empty(self):
        return self.front == self.rear

    def is_full(self):
        return (self.rear + 1) % self.size == self.front

    def print_queue(self):
        if self.is_empty():
            print("Queue is empty")
            return
        print("Queue:")
        for i in range(self.front, sel
```

## Run Inference with Pipeline

In [8]:
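def generateOutputWithModelPipeline(model, tokenizer, prompt, temperature=0.7, max_new_tokens=256):
    """Generate text for `prompt` with a text-generation pipeline and print the elapsed time."""
    pipe = pipeline(
        task="text-generation",
        model=model,
        tokenizer=tokenizer
    )

    # Time only the generation call, not the pipeline construction
    start = datetime.now()

    output = pipe(
        prompt,
        do_sample=True,  # sample from the distribution instead of decoding greedily
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        top_k=50,  # consider only the 50 most likely tokens at each step
        top_p=0.95,  # then keep the smallest set covering 95% of the probability mass
        num_return_sequences=1
    )

    stop = datetime.now()

    totalTimeToPrompt = stop - start
    print(f"Execution Time : {totalTimeToPrompt}")

    return output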
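
The sampling arguments passed to the pipeline (`temperature`, `top_k`, `top_p`) can be illustrated on a toy distribution. The sketch below is a simplified, pure-Python illustration of the idea and not the transformers implementation; the helper names `softmax_with_temperature` and `top_k_top_p_filter` are hypothetical and exist only for this example.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Keep the top_k most likely tokens, then the smallest prefix reaching top_p mass."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalise over the top-k survivors before applying the nucleus cut-off
    k_total = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_total
        if cumulative >= top_p:
            break
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a four-token vocabulary
probs = softmax_with_temperature(toy_logits, temperature=0.7)
print(top_k_top_p_filter(probs, top_k=3, top_p=0.95))
```

Sampling then draws the next token from the filtered, renormalised distribution, so low-probability tokens never get picked.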

In [41]:
output = generateOutputWithModelPipeline(model, tokenizer, prompt, max_new_tokens=1024)
print(output[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Execution Time : 0:01:06.005802

Write a script in Python that implements a circular queue and creates one with user input. Give an example of how the script works and document the code with appropriate comments and docstrings wherever necessary. 
Generate a response only for the prompt that is given and nothing else. 

For example: 

```
Enter the size of the queue: 3
Queue is created.
Enter a command:
a
Enter the value to be added: 1
1 is added.
Enter a command:
a
Enter the value to be added: 2
2 is added.
Enter a command:
a
Enter the value to be added: 3
3 is added.
Enter a command:
a
Enter the value to be added: 4
Queue is full.
Enter a command:
p
1 is popped.
Enter a command:
p
2 is popped.
Enter a command:
p
3 is popped.
Enter a command:
p
Queue is empty.
Enter a command:
a
Enter the value to be added: 4
Queue is full.
Enter a command:
a
Enter the value to be added: 5
Queue is full.
Enter a command:
p
Queue is empty.
Enter a command:
q

```

The queue is implemented as an array a
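
The pipeline output above starts with the prompt itself, because `generated_text` contains the input followed by the completion. A minimal sketch for separating the two is shown below; `strip_prompt` is a hypothetical helper, and the strings are short stand-ins rather than real model output. The text-generation pipeline also accepts `return_full_text=False`, which should have a similar effect.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the completion if the generated text begins with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # Leave the text untouched when the prompt is not an exact prefix
    return generated_text


# Stand-in values for illustration only
example_prompt = "Write a queue.\n"
example_generated = example_prompt + "class Queue: ..."

completion = strip_prompt(example_generated, example_prompt)
print(completion)
```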

## Test Pipeline with Custom Input

In [4]:
language = "Python"
code = """
import typer
from yaspin import yaspin

from utils import *

app = typer.Typer()
spinner = yaspin()


@app.command()
def generate(
    path: str,
    model: str = typer.Argument(None),
    outputFile: str = 'output'
):
    '''
    Typer command to generate the documentation for an input script file.

    Arguments:
    filePath (string) - path to the input file
    model (string) - string to specify which model to use when generating documentation
    '''

    if model is None:
        model = modelUtils.getModelChoice()

    print(f"Arguments specified:")
    print(f"File Path: {path}")
    print(f"Model: {model}\n")

    functions = parserUtils.extractFunctionsAsList(path)

    chain = modelUtils.setupLangChain(model)

    print("Generating model outputs...")
    # spinner.start()

    modelOutputs = []

    for function in functions:
        stream = chain.stream({'function': function})
        outputString = ''
        
        for chunk in stream:                
            # Print each chunk to the terminal
            print(chunk, end='')
            # Concatenate each chunk to the output string
            outputString+= chunk
        
        modelOutputs.append(outputString)

    # spinner.stop()

    fileUtils.writeOutputToMarkdownFile(outputFile, modelOutputs, title=f"Documentation for `{path}`")


@app.command()
def easter():
    '''
    An easter egg for the keen eyed open-source contributor.
    '''
    print("This is a test command! If you're here, that means you've discovered something pretty cool in our code-base.\nCongrats and thanks for using our app!")


if __name__ == "__main__":
    app()
"""

In [5]:
template = f"""
Given a script file in {language}, generate its documentation for each function. For each function in the script, document its name, arguments, return values, and a brief explanation of its logic.

Strictly use the following format for each function:

## Function Name: `function_name`

### Arguments
* `arg1` (type): Description of argument 1.
* `arg2` (type): Description of argument 2.
* ...

### Return Values
* `return_value1` (type): Description of return value 1.
* `return_value2` (type): Description of return value 2.
* ...

### Explanation of Function Logic:
1. Brief explanation of the function logic step by step.
2. ...
3. ...

Ensure that code within comments is not parsed and documented. Generate nothing else than what is asked.

{code}
"""

In [6]:
template

'\nGiven a script file in Python, generate its documentation for each function. For each function in the script, document its name, arguments, return values, and a brief explanation of its logic.\n\nStrictly use the following format for each function:\n\n## Function Name: `function_name`\n\n### Arguments\n* `arg1` (type): Description of argument 1.\n* `arg2` (type): Description of argument 2.\n* ...\n\n### Return Values\n* `return_value1` (type): Description of return value 1.\n* `return_value2` (type): Description of return value 2.\n* ...\n\n### Explanation of Function Logic:\n1. Brief explanation of the function logic step by step.\n2. ...\n3. ...\n\nEnsure that code within comments is not parsed and documented. Generate nothing else than what is asked.\n\n\nimport typer\nfrom yaspin import yaspin\n\nfrom utils import *\n\napp = typer.Typer()\nspinner = yaspin()\n\n\n@app.command()\ndef generate(\n    path: str,\n    model: str = typer.Argument(None),\n    outputFile: str = \'output

In [None]:
output = generateOutputWithModelPipeline(model, tokenizer, template, max_new_tokens=2048)

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


In [None]:
print(output[0]["generated_text"])
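
Since the template asks for a strict `## Function Name:` heading per function, one way to sanity-check the result is to compare those headings against the functions actually defined in the script. The sketch below assumes the output follows the heading format from the template; the helper names and the sample strings are hypothetical stand-ins, not real model output.

```python
import ast
import re


def function_names(source: str) -> set:
    """Collect the names of all functions defined in a Python source string."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def documented_names(markdown: str) -> set:
    """Collect names from headings of the form: ## Function Name: `name`."""
    return set(re.findall(r"^## Function Name: `([^`]+)`", markdown, flags=re.MULTILINE))


def missing_functions(source: str, markdown: str) -> set:
    """Functions present in the source but absent from the generated documentation."""
    return function_names(source) - documented_names(markdown)


# Hand-written stand-ins for the script and the generated documentation
sample_source = "def generate(path):\n    return path\n\ndef easter():\n    pass\n"
sample_markdown = "## Function Name: `function_name`\n\n## Function Name: `generate`\n"

missing = missing_functions(sample_source, sample_markdown)
print(missing)
```

Because `generated_text` also contains the prompt, the template's own `function_name` placeholder matches the heading pattern too; it is harmless here since only names found in the source are reported as missing.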