# Converting Code with Open-Source Models

### Big Code Models Leaderboard
https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard    
- Look at the base model (filter)
- And then "All Models"
- Python and C++

| all | base | instruction-tuned | EXT external-evaluation | 
| --- | --- | --- | --- |

| T | Model | Win Rate | humaneval-python | java | javascript | cpp | 
| --- | --- | --- | --- | --- | --- | --- | 
| EXT | Qwen2.5-Coder-32B-Instruct | 59.17 | 83.2 | 73.69 | 76.05 | 81.95 |    
      
EXT: the benchmark was run externally (prefer the models that do not carry this tag).  
However, Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen), so it is used below despite the EXT tag.  

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "write a quick sort algorithm."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)  # show the generated answer

Ask Hugging Face to run this model for you and give you an endpoint that you can call remotely from your code.  
https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct  
In the "Deploy" dropdown, select "Inference Endpoints (dedicated)" (I assume this has since been renamed to "HF Inference Endpoint").  

I was unable to set this up because the lower-cost instances were not available, and $8 an hour is expensive, so I did not try it.  

Instead, I tried the new "Inference Providers" option to see if it works. However, it looks like another solution is provided in the community section.

In [None]:
from huggingface_hub import InferenceClient

client = InferenceClient(
	provider="together",
	api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx"
)

messages = [
	{
		"role": "user",
		"content": "What is the capital of France?"
	}
]

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-32B-Instruct", 
	messages=messages, 
	max_tokens=500,
)

print(completion.choices[0].message)

In [None]:
# SambaNova   
from huggingface_hub import InferenceClient

client = InferenceClient(
	provider="sambanova",
	api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx"
)

messages = [
	{
		"role": "user",
		"content": "What is the capital of France?"
	}
]

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-32B-Instruct", 
	messages=messages, 
	max_tokens=500,
)

print(completion.choices[0].message)

# How to evaluate the performance of a Gen AI solution?      
There are two different kinds of performance metrics:  
- Model Centric or Technical
- Business-centric or Outcome Metrics

## Model-centric or Technical Metrics  
These are the metrics that data scientists live and breathe by, because we can optimize our models against them and they measure the model's performance in a very immediate way.  
- Loss (e.g. cross entropy loss)
- Perplexity
- Accuracy
- Precision, Recall, F1
- AUC-ROC
      
Easiest to optimize with

### Loss  
Loss measures how poorly an LLM has performed at its task. It is the quantity that training tries to reduce.  
Cross-entropy loss is the most frequently used.  
The model is given an input sequence of tokens and tries to predict the next token. You have the real next token, so you can calculate the loss.  
The model does not actually predict a single next token. It gives you a probability distribution over the possible next tokens.  
We look up the probability that the model assigned to the token that actually came next, and that probability is the basis for the cross-entropy loss.  
(We take the negative log of that probability. A probability of one would be a perfect answer: the model said there was a 100% likelihood that the next token was exactly the token that turned out to come next. The negative log of one is zero, so a perfect answer has zero loss.)
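
The calculation described above can be sketched in a few lines. This is only an illustration: the prompt, the three-token vocabulary and its probabilities are made up, and `cross_entropy_loss` is a hypothetical helper, not part of any library.

```python
import math

def cross_entropy_loss(probs, true_token):
    """Negative log of the probability the model gave to the token that actually came next."""
    return -math.log(probs[true_token])

# Made-up next-token distribution after the prompt "The capital of France is"
next_token_probs = {"Paris": 0.80, "Lyon": 0.15, "London": 0.05}

confident = cross_entropy_loss(next_token_probs, "Paris")   # true token had high probability
unsure = cross_entropy_loss(next_token_probs, "London")     # true token had low probability
perfect = cross_entropy_loss({"Paris": 1.0}, "Paris")       # probability of one

print(f"{confident:.3f}")  # 0.223
print(f"{unsure:.3f}")     # 2.996
print(perfect == 0)        # True
```

The lower the probability the model gave to the real next token, the larger the loss. During training this value is typically averaged over many tokens.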

### Perplexity   

Perplexity is e raised to the power of the cross-entropy loss. A perplexity of one means that the model is completely confident and correct in its results.  
100% accuracy with 100% certainty gives a perplexity of one. A perplexity of two means the model gives the right token a 50% probability on average, and a perplexity of four means 25%.  
A higher perplexity gives you a sense of how many tokens the model is effectively choosing between, if all of them were equally likely, in order to predict the next token.
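
A minimal sketch of that relationship, assuming the loss uses the natural log and is averaged over the sequence. The probability lists are made up, and `perplexity` is a hypothetical helper.

```python
import math

def perplexity(true_token_probs):
    """exp of the average cross-entropy loss over a sequence of predictions."""
    losses = [-math.log(p) for p in true_token_probs]
    return math.exp(sum(losses) / len(losses))

# Each list holds the probability the model gave to the correct token at each step
always_right = perplexity([1.0, 1.0, 1.0])
coin_flip = perplexity([0.5, 0.5, 0.5])
one_in_four = perplexity([0.25, 0.25, 0.25])

print(f"{always_right:.2f}")  # 1.00
print(f"{coin_flip:.2f}")     # 2.00
print(f"{one_in_four:.2f}")   # 4.00
```

Read this way, a perplexity of four behaves like picking between four equally likely tokens at every step.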

## Business-centric or Outcome Metrics    
These resonate most with a business audience and ultimately reflect the problem that they are asking you to solve.  

- KPIs tied to business objectives
- ROI
- Improvements in time, cost or resources
- Customer satisfaction
- Benchmark comparisons
     
Most tangible impact
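
For the code-conversion project in this notebook, one natural outcome metric is the speedup of the generated C++ over the original Python. The sketch below assumes both programs print a line of the form `Execution Time: <number> seconds`, as the sample programs in this notebook do. The helper names and the sample outputs are made up for illustration.

```python
import re

def execution_seconds(program_output):
    """Pull the number out of a line such as 'Execution Time: 0.123456 seconds'."""
    match = re.search(r"Execution Time:\s*([0-9.]+)\s*seconds", program_output)
    return float(match.group(1)) if match else None

def speedup(python_output, cpp_output):
    """How many times faster the C++ run was, or None if a timing is missing or zero."""
    python_time = execution_seconds(python_output)
    cpp_time = execution_seconds(cpp_output)
    if not python_time or not cpp_time:
        return None
    return python_time / cpp_time

# Made-up outputs, for illustration only
python_output = "Result: 3.141592658589\nExecution Time: 8.000000 seconds"
cpp_output = "Result: 3.141592658589\nExecution Time: 0.250000 seconds"

print(speedup(python_output, cpp_output))  # 32.0
```

A speedup only counts as an improvement if the two result lines also match, so in practice both would be compared.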

## Challenges    
     
- A code tool that automatically adds docstrings/comments
- A code gen tool that writes unit test cases
- A code generator that writes trading code to buy and sell equities in a simulated environment, based on a given API

In [1]:
# imports

import os
import io
import sys
import json
import requests
from dotenv import load_dotenv
from openai import OpenAI
import google.generativeai
import anthropic
from IPython.display import Markdown, display, update_display
import gradio as gr
import subprocess, re   

from huggingface_hub import login, InferenceClient
from transformers import AutoTokenizer

In [2]:
# environment

load_dotenv()
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')
os.environ['ANTHROPIC_API_KEY'] = os.getenv('ANTHROPIC_API_KEY', 'your-key-if-not-using-env')
# os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 'your-key-if-not-using-env')

hf_token = os.environ['HF_TOKEN']

In [3]:
# initialize

openai = OpenAI()
claude = anthropic.Anthropic()   

login(hf_token, add_to_git_credential=True)

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


In [4]:
OPENAI_MODEL = "gpt-4o-mini"
CLAUDE_MODEL = "claude-3-5-sonnet-20240620"

In [5]:
system_message = "You are an assistant that reimplements Python code in high performance for windows with Intel CPU. "
system_message += "Respond only with C++ code; use comments sparingly and do not provide any explanation other than occasional comments. "
system_message += "The C++ response needs to produce an identical output in the fastest possible time. Keep implementations of random number generators identical so that results match exactly."

In [6]:
def user_prompt_for(python):
    user_prompt = "Rewrite this Python code in C++ with the fastest possible implementation that produces identical output in the least time. "
    user_prompt += "Respond only with C++ code; do not explain your work other than a few comments. "
    user_prompt += "Pay attention to number types to ensure no int overflows. Remember to #include all necessary C++ packages such as iomanip.\n\n"
    user_prompt += python
    return user_prompt

In [7]:
def messages_for(python):
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_prompt_for(python)}
    ]

In [8]:
# write to a file called optimized.cpp

def write_output(cpp):
    code = cpp.replace("```cpp","").replace("```","")
    with open("optimized.cpp", "w") as f:
        f.write(code)
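
The `replace` calls above work when the reply is nothing but a fenced block. If a model adds a sentence before or after the code, a regular expression can pull out just the fenced part. This is a hedged sketch, not the notebook's method: `extract_code` is a hypothetical helper, and it assumes the reply contains at most one fenced block of interest.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example stays easy to read

def extract_code(reply):
    """Return the contents of the first fenced block, or the whole reply if there is none."""
    # Opening fence, optional language tag up to the end of the line, then the body
    pattern = re.escape(FENCE) + r"[^\n]*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, flags=re.DOTALL)
    return match.group(1).strip() if match else reply.strip()

reply = f"Here is the code:\n{FENCE}cpp\nint main() {{ return 0; }}\n{FENCE}\nHope it helps."
print(extract_code(reply))  # int main() { return 0; }
```

Falling back to the whole reply keeps the behaviour unchanged for models that answer with bare code.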

In [9]:
pi = """
import time

def calculate(iterations, param1, param2):
    result = 1.0
    for i in range(1, iterations+1):
        j = i * param1 - param2
        result -= (1/j)
        j = i * param1 + param2
        result += (1/j)
    return result

start_time = time.time()
result = calculate(100_000_000, 4, 1) * 4
end_time = time.time()

print(f"Result: {result:.12f}")
print(f"Execution Time: {(end_time - start_time):.6f} seconds")
"""

In [10]:
python_hard = """# Be careful to support large number sizes

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    value = seed
    while True:
        value = (a * value + c) % m
        yield value
        
def max_subarray_sum(n, seed, min_val, max_val):
    lcg_gen = lcg(seed)
    random_numbers = [next(lcg_gen) % (max_val - min_val + 1) + min_val for _ in range(n)]
    max_sum = float('-inf')
    for i in range(n):
        current_sum = 0
        for j in range(i, n):
            current_sum += random_numbers[j]
            if current_sum > max_sum:
                max_sum = current_sum
    return max_sum

def total_max_subarray_sum(n, initial_seed, min_val, max_val):
    total_sum = 0
    lcg_gen = lcg(initial_seed)
    for _ in range(20):
        seed = next(lcg_gen)
        total_sum += max_subarray_sum(n, seed, min_val, max_val)
    return total_sum

# Parameters
n = 10000         # Number of random numbers
initial_seed = 42 # Initial seed for the LCG
min_val = -10     # Minimum value of random numbers
max_val = 10      # Maximum value of random numbers

# Timing the function
import time
start_time = time.time()
result = total_max_subarray_sum(n, initial_seed, min_val, max_val)
end_time = time.time()

print("Total Maximum Subarray Sum (20 runs):", result)
print("Execution Time: {:.6f} seconds".format(end_time - start_time))
"""
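
The prompt asks the model to pay attention to number types. A quick check with the LCG constants from `python_hard` suggests why: the intermediate product `a * value + c` does not fit in 32 bits, although it fits comfortably in an unsigned 64-bit integer. The variable names below are illustrative.

```python
# Constants copied from the lcg() function in python_hard
a, c, m = 1664525, 1013904223, 2**32

# value is always reduced modulo m, so its largest possible value is m - 1
largest_intermediate = a * (m - 1) + c

print(largest_intermediate > 2**32 - 1)    # True: overflows a 32-bit unsigned integer
print(largest_intermediate <= 2**64 - 1)   # True: fits in a 64-bit unsigned integer
print(largest_intermediate.bit_length())   # 53
```

A C++ translation that stores the product in a 32-bit type would therefore not obviously reproduce the Python output, which is one reason the prompt insists on checking number types.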

In [11]:
def stream_gpt(python):    
    stream = openai.chat.completions.create(model=OPENAI_MODEL, messages=messages_for(python), stream=True)
    reply = ""
    for chunk in stream:
        fragment = chunk.choices[0].delta.content or ""
        reply += fragment
        yield reply.replace('```cpp\n','').replace('```','')

In [12]:
def stream_claude(python):
    result = claude.messages.stream(
        model=CLAUDE_MODEL,
        max_tokens=2000,
        system=system_message,
        messages=[{"role": "user", "content": user_prompt_for(python)}],
    )
    reply = ""
    with result as stream:
        for text in stream.text_stream:
            reply += text
            yield reply.replace('```cpp\n','').replace('```','')

In [13]:
# using inference providers - QWEN
def stream_code_qwen(python):
    messages = messages_for(python)
    client = InferenceClient(
    	provider="sambanova",
    	api_key=hf_token
    )
    stream = client.chat.completions.create(
    	model="Qwen/Qwen2.5-Coder-32B-Instruct", 
    	messages=messages, 
    	max_tokens=500,
    	stream=True
    )
    result = ""
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            result += chunk.choices[0].delta.content
            yield result

In [28]:
# using inference providers
def stream_code_inference(python, model):
    messages = messages_for(python)
    client = InferenceClient(
    	provider="sambanova",
    	api_key=hf_token
    )
    stream = client.chat.completions.create(
    	model= model, # "Qwen/Qwen2.5-Coder-32B-Instruct", 
    	messages=messages, 
    	max_tokens=2000,  # 500 can cut off longer C++ programs; 2000 matches stream_claude
    	stream=True
    )
    result = ""
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            result += chunk.choices[0].delta.content
            yield result

In [44]:
def optimize(python, model):
    if model=="GPT":
        result = stream_gpt(python)
    elif model=="Claude":
        result = stream_claude(python)
    elif model=="CodeQwen":
        # stream_code_inference generalizes stream_code_qwen, so a single call is enough
        result = stream_code_inference(python, "Qwen/Qwen2.5-Coder-32B-Instruct")
    else:
        raise ValueError("Unknown model")
    for stream_so_far in result:
        yield stream_so_far

In [29]:
def select_sample_program(sample_program):
    if sample_program=="pi":
        return pi
    elif sample_program=="python_hard":
        return python_hard
    else:
        return "Type your Python program here"

In [31]:
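def execute_python(code):
    # Run the code in its own namespace so that functions defined at the top level
    # of the program can call each other, and capture everything it prints.
    output = io.StringIO()
    previous_stdout = sys.stdout  # in Jupyter this is not sys.__stdout__
    try:
        sys.stdout = output
        exec(code, {"__name__": "__main__"})
    finally:
        sys.stdout = previous_stdout
    return output.getvalue()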

In [32]:
def execute_cpp(code):
    write_output(code)
    try:
        compile_result = subprocess.run(compiler_cmd[2], check=True, text=True, capture_output=True)
        run_cmd = ["./optimized"]
        run_result = subprocess.run(run_cmd, check=True, text=True, capture_output=True)
        return run_result.stdout
    except subprocess.CalledProcessError as e:
        return f"An error occurred:\n{e.stderr}"

In [33]:
import platform

VISUAL_STUDIO_2022_TOOLS = "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\Common7\\Tools\\VsDevCmd.bat"
VISUAL_STUDIO_2019_TOOLS = "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\Common7\\Tools\\VsDevCmd.bat"
GCC_COMPILER = "C:\\msys64\\ucrt64\\bin\\g++.exe"

simple_cpp = """
#include <iostream>

int main() {
    std::cout << "Hello";
    return 0;
}
"""

def run_cmd(command_to_run):
    try:
        run_result = subprocess.run(command_to_run, check=True, text=True, capture_output=True)
        return run_result.stdout if run_result.stdout else "SUCCESS"
    except Exception:
        # Any failure (missing compiler, non-zero exit code) counts as "did not work"
        return ""

def c_compiler_cmd(filename_base):
    my_platform = platform.system()
    my_compiler = []

    try:
        with open("simple.cpp", "w") as f:
            f.write(simple_cpp)
            
        if my_platform == "Windows":
            
            if os.path.isfile(VISUAL_STUDIO_2022_TOOLS):
                if os.path.isfile("./simple.exe"):
                    os.remove("./simple.exe")
                compile_cmd = ["cmd", "/c", VISUAL_STUDIO_2022_TOOLS, "&", "cl", "simple.cpp"]
                if run_cmd(compile_cmd):
                    if run_cmd(["./simple.exe"]) == "Hello":
                        my_compiler = ["Windows", "Visual Studio 2022", ["cmd", "/c", VISUAL_STUDIO_2022_TOOLS, "&", "cl", f"{filename_base}.cpp"]]

            if os.path.isfile(GCC_COMPILER):
                if os.path.isfile("./simple.exe"): 
                    os.remove("./simple.exe")
                compile_cmd = ["g++", "simple.cpp", "-o", "simple"]
                if run_cmd(compile_cmd):
                    if run_cmd(["./simple"]) == "Hello":
                        my_compiler = ["Windows", "GCC (g++)", ["g++", f"{filename_base}.cpp", "-o", f"{filename_base}" ]] 
        
            if not my_compiler:
                if os.path.isfile(VISUAL_STUDIO_2019_TOOLS):
                    if os.path.isfile("./simple.exe"):
                        os.remove("./simple.exe")
                    compile_cmd = ["cmd", "/c", VISUAL_STUDIO_2019_TOOLS, "&", "cl", "simple.cpp"]
                    if run_cmd(compile_cmd):
                        if run_cmd(["./simple.exe"]) == "Hello":
                            my_compiler = ["Windows", "Visual Studio 2019", ["cmd", "/c", VISUAL_STUDIO_2019_TOOLS, "&", "cl", f"{filename_base}.cpp"]]
    
            if not my_compiler:
                my_compiler=[my_platform, "Unavailable", []]
                
        elif my_platform == "Linux":
            if os.path.isfile("./simple"):
                os.remove("./simple")
            compile_cmd = ["g++", "simple.cpp", "-o", "simple"]
            if run_cmd(compile_cmd):
                if run_cmd(["./simple"]) == "Hello":
                    my_compiler = ["Linux", "GCC (g++)", ["g++", f"{filename_base}.cpp", "-o", f"{filename_base}" ]]
    
            if not my_compiler:
                if os.path.isfile("./simple"):
                    os.remove("./simple")
                compile_cmd = ["clang++", "simple.cpp", "-o", "simple"]
                if run_cmd(compile_cmd):
                    if run_cmd(["./simple"]) == "Hello":
                        my_compiler = ["Linux", "Clang++", ["clang++", f"{filename_base}.cpp", "-o", f"{filename_base}"]]
        
            if not my_compiler:
                my_compiler=[my_platform, "Unavailable", []]
    
        elif my_platform == "Darwin":
            if os.path.isfile("./simple"):
                os.remove("./simple")
            compile_cmd = ["clang++", "-Ofast", "-std=c++17", "-march=armv8.5-a", "-mtune=apple-m1", "-mcpu=apple-m1", "-o", "simple", "simple.cpp"]
            if run_cmd(compile_cmd):
                if run_cmd(["./simple"]) == "Hello":
                    my_compiler = ["Macintosh", "Clang++", ["clang++", "-Ofast", "-std=c++17", "-march=armv8.5-a", "-mtune=apple-m1", "-mcpu=apple-m1", "-o", f"{filename_base}", f"{filename_base}.cpp"]]
    
            if not my_compiler:
                my_compiler=[my_platform, "Unavailable", []]
    except:
        my_compiler=[my_platform, "Unavailable", []]
        
    if my_compiler:
        return my_compiler
    else:
        return ["Unknown", "Unavailable", []]


In [34]:
c_compiler_cmd("simple")  # pass the base name; the function appends ".cpp" itself

['Windows', 'GCC (g++)', ['g++', 'simple.cpp', '-o', 'simple']]

In [35]:
css = """
.python {background-color: #306998;}
.cpp {background-color: #050;}
"""

In [49]:
compiler_cmd = c_compiler_cmd("optimized")

with gr.Blocks(css=css) as ui:
    gr.Markdown("## Convert code from Python to C++")
    with gr.Row():
        python = gr.Textbox(label="Python code:", value=python_hard, lines=10)
        cpp = gr.Textbox(label="C++ code:", lines=10)
    with gr.Row():
        with gr.Column():
            sample_program = gr.Radio(["pi", "python_hard"], label="Sample program", value="python_hard")
            model = gr.Dropdown(["GPT", "Claude", "CodeQwen", ], label="Select model", value="GPT")
        with gr.Column():
            architecture = gr.Radio([compiler_cmd[0]], label="Architecture", interactive=False, value=compiler_cmd[0])
            compiler = gr.Radio([compiler_cmd[1]], label="Compiler", interactive=False, value=compiler_cmd[1])
    with gr.Row():
        convert = gr.Button("Convert code")
    with gr.Row():
        python_run = gr.Button("Run Python")
        if compiler_cmd[1] != "Unavailable":
            cpp_run = gr.Button("Run C++")
        else:
            cpp_run = gr.Button("No compiler to run C++", interactive=False)
    with gr.Row():
        python_out = gr.TextArea(label="Python result:", elem_classes=["python"])
        cpp_out = gr.TextArea(label="C++ result:", elem_classes=["cpp"])

    sample_program.change(select_sample_program, inputs=[sample_program], outputs=[python])
    convert.click(optimize, inputs=[python, model], outputs=[cpp])
    python_run.click(execute_python, inputs=[python], outputs=[python_out])
    cpp_run.click(execute_cpp, inputs=[cpp], outputs=[cpp_out])

ui.launch(inbrowser=True)

