In [1]:
%load_ext autoreload
%autoreload 2

import logging
import os
import sys
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Callable

from autodev.code_functions import ExplainCodeFunction, CodeFunction, PotentialProblemsFunction
from autodev.llm import LLMType, StreamingLLM

log = logging.getLogger("main")
logging.basicConfig(format='%(levelname)-5s %(asctime)-15s %(name)s:%(funcName)s - %(message)s', stream=sys.stdout, level=logging.INFO)



In [2]:
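def read_code_snippets() -> Dict[str, str]:
    """Read every file in data/code_snippets into a dict keyed by file name."""
    code_snippets = {}
    root = Path("data") / "code_snippets"
    for path in root.iterdir():
        # Skip sub-directories; only regular files are treated as snippets.
        if path.is_file():
            code_snippets[path.name] = path.read_text()
    return code_snippets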


code_snippets = read_code_snippets()
log.info(f"{len(code_snippets)} code snippets read")

INFO  2023-05-23 11:21:23,967 main:<module> - 2 code snippets read


In [3]:
model_types = [LLMType.HUGGINGFACE_STARCHAT, LLMType.OPENAI_CHAT_GPT4]
code_fn_factories: List[Callable[[StreamingLLM], CodeFunction]] = [ExplainCodeFunction, PotentialProblemsFunction]

responses = defaultdict(lambda: defaultdict(dict))
for model_type in model_types:
    model = model_type.create_streaming_llm()
    model_name = model_type.name
    for code_fn_factory in code_fn_factories:
        code_fn = code_fn_factory(model)
        code_fn_name = code_fn.__class__.__name__
        for code_snippet_name, code_snippet in code_snippets.items():
            log.info(f"Applying {code_fn_name} to {code_snippet_name} using {model_name}")
            response = code_fn.apply(code_snippet, stream=sys.stdout)
            responses[code_fn_name][code_snippet_name][model_name] = response


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Xformers is not installed correctly. If you want to use memorry_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


INFO  2023-05-23 11:22:07,027 main:<module> - Applying ExplainCodeFunction to segmentation.py using HUGGINGFACE_STARCHAT


Setting `pad_token_id` to `eos_token_id`:49155 for open-end generation.



This code is a Python implementation of the Viterbi algorithm for sequence segmentation. The Viterbi algorithm is used to find the most likely sequence of hidden states (in this case, the separator values) that results in a sequence of observed events (in this case, the integer values in the sequence).

The code takes in a sequence of integers and a separator value as input, and returns a list of lists where each sublist represents a segmented sequence. The code first initializes a list result to store the segmented sequences, a list curr to store the current segment, and a for loop to iterate over the sequence.

In each iteration of the loop, the code checks if the current integer in the sequence is equal to the separator value. If it is, it adds the current segment to the result list, resets the current segment to an empty list, and increments the separator value by 1. This ensures that each segmented sequence in the result list has a unique separator value.

If the current integer 

Setting `pad_token_id` to `eos_token_id`:49155 for open-end generation.



This is a Python function that takes two arguments, x and y, and returns their sum. The function is named add, and it is defined using the def keyword. The function body is indented, and contains a single return statement that adds the two arguments.<|end|>
INFO  2023-05-23 11:22:22,578 main:<module> - Applying PotentialProblemsFunction to segmentation.py using HUGGINGFACE_STARCHAT


Setting `pad_token_id` to `eos_token_id`:49155 for open-end generation.



<|assistant|>
Here are some potential problems with the code:

1. The function signature is not PEP 484 compliant. The type annotations are not enforced at runtime, so they do not provide any type safety or other benefits.
2. The function is not type-annotated, so it is not self-documenting. This makes it harder for other developers to understand and use the function.
3. The function has a confusing name. The name "segmentation" suggests that it is splitting a sequence into segments, but it actually creates subsequences based on a separator value. The name "split_by_value" or "group_by_value" would be more clear.
4. The function does not handle edge cases, such as empty lists or lists with only one element. This can cause the function to return incorrect results or raise exceptions.
5. The function does not check for invalid input, such as non-numeric separator values. This can cause the function to return incorrect results or raise exceptions.
6. The function does not have any tests 

Setting `pad_token_id` to `eos_token_id`:49155 for open-end generation.



<|assistant|>
Here are some potential problems with the code:

1. Function name conflicts: If another function in the same scope has the same name as the function, it will override the original function definition. This can cause errors if the original function was intended to perform a different operation.

2. Unintended side effects: Functions can have unintended side effects, such as modifying global variables or modifying the arguments passed in. This can cause bugs that are hard to find and fix.

3. Difficult to maintain: Functions can be difficult to maintain and modify over time. If the function needs to be changed, it can be difficult to determine the exact impact of each change.

4. Lack of encapsulation: Functions can be difficult to encapsulate, which means that they cannot be easily reused in other parts of the code. This can make it difficult to write modular and reusable code.

5. Performance issues: Functions can have a negative impact on performance if they are called 



This piece of code defines a function called "segmentation" that takes in a list of integers called "seq" and an integer called "separator" as input arguments. The function returns a list of lists of integers as its output.

The "segmentation" function processes the input sequence "seq" and splits it into sub-sequences (sub-lists) whenever the "separator" integer is encountered. It appends each sub-list to a list called "result".

Here's how the code works:

1. Initialize an empty list "result" that will store the final list of lists.
2. Initialize an empty list "curr" that will temporarily store sub-sequences.
3. Iterate through each item in the input sequence "seq" using a for loop, with index "i" and "item" representing the current item in the sequence.
4. Inside the loop, check if the current item is equal to the separator value. If it is, then append the current "curr" sub-list to the "result" list and reset "curr" to an empty list.
5. If the current item is not equal to the separ
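
For reference when comparing the explanations above, here is a minimal sketch of a function that matches the last description of segmentation.py. The snippet file itself is not shown in this notebook and the final step of the explanation is cut off, so two points are assumptions rather than the original code: that items not equal to the separator are appended to "curr", and that the trailing sub-list is appended to "result" after the loop.

```python
from typing import List


def segmentation(seq: List[int], separator: int) -> List[List[int]]:
    """Split seq into sub-lists at each occurrence of separator (sketch only)."""
    result: List[List[int]] = []
    curr: List[int] = []
    for item in seq:
        if item == separator:
            # Separator found: close the current sub-list and start a new one.
            result.append(curr)
            curr = []
        else:
            # Assumption: any other item is collected into the current sub-list.
            curr.append(item)
    # Assumption: the sub-list after the last separator is emitted as well.
    result.append(curr)
    return result


print(segmentation([1, 2, 0, 3, 0, 4], separator=0))  # [[1, 2], [3], [4]]
```

Read against this sketch, the function is a plain single-pass split: it involves no hidden states or probabilities, and the separator value is never modified.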

In [4]:
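from html import escape


def format_responses(responses: Dict[str, Dict[str, Dict[str, str]]]) -> str:
    """Render the responses as HTML, grouped by code function, snippet and model."""
    parts = []
    for code_fn_name, code_fn_responses in responses.items():
        parts.append(f"<h2>{escape(code_fn_name)}</h2>")
        for code_snippet_name, code_snippet_responses in code_fn_responses.items():
            parts.append(f"<h3>{escape(code_snippet_name)}</h3>")
            # Escape snippet and response text so that code and special tokens
            # such as <|assistant|> are shown literally rather than parsed as HTML.
            parts.append(f"<pre>{escape(code_snippets[code_snippet_name])}</pre>")
            for model_name, response in code_snippet_responses.items():
                parts.append(f"<h4>{escape(model_name)}</h4>")
                parts.append(f"<pre>{escape(response)}</pre>")
    return "".join(parts)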

from IPython.display import HTML, display
display(HTML(format_responses(responses)))