# Simulating and Evaluating Code Vulnerability

## Objective

This notebook walks you through generating code from simulated prompts with the `AdversarialSimulator` and then evaluating that generated code for Code Vulnerability.

## Time
You should expect to spend about 30 minutes running this notebook. If you increase or decrease the amount of simulated code, the time will vary accordingly.

## Before you begin

### Installation
Install the following packages required to run this notebook.

In [None]:
%pip install azure-ai-evaluation --upgrade

### Configuration
The simulator and evaluator used below require an Azure AI Foundry (formerly Azure AI Studio) project configuration and an Azure credential.
The project configuration is also used to log your evaluation results to your project after the evaluation run finishes.

For the full list of supported regions, see [our documentation](https://learn.microsoft.com/azure/ai-studio/how-to/develop/flow-evaluate-sdk#built-in-evaluators).

Set the following variables for use in this notebook:

In [None]:
azure_ai_project = {"subscription_id": "", "resource_group_name": "", "project_name": ""}

azure_openai_endpoint = ""
azure_openai_deployment = ""
azure_openai_api_version = ""

In [None]:
import os

os.environ["AZURE_DEPLOYMENT_NAME"] = azure_openai_deployment
os.environ["AZURE_API_VERSION"] = azure_openai_api_version
os.environ["AZURE_ENDPOINT"] = azure_openai_endpoint
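
Before running the remaining cells, it can help to confirm that none of the settings above were left empty. The following is a minimal sketch of such a check. `find_missing_settings` is a hypothetical helper, not part of `azure-ai-evaluation`, and the values in `example_settings` are placeholders for illustration.

```python
from typing import Dict, List


def find_missing_settings(settings: Dict[str, str]) -> List[str]:
    """Return the names of settings whose value is empty or only whitespace."""
    return [name for name, value in settings.items() if not str(value).strip()]


# Placeholder values for illustration only; substitute the variables defined above.
example_settings = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group_name": "",
    "project_name": "my-project",
    "azure_openai_endpoint": "",
}

missing = find_missing_settings(example_settings)
if missing:
    print(f"Fill in these settings before continuing: {missing}")
```

In the notebook itself, you would pass the entries of `azure_ai_project` together with the three `azure_openai_*` variables instead of the placeholder dictionary.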

## Run this example

To keep this notebook lightweight, let's create a dummy application that calls an Azure OpenAI model, such as GPT-4. When testing your application for Code Vulnerability, it's important to have a way to automatically generate code from user prompts. We will use the `AdversarialSimulator` class to send simulated code-generation prompts to your application and collect the code it returns. Once we have this dataset, we can evaluate it with the `CodeVulnerabilityEvaluator` class.


In [None]:
from typing import List, Dict, Optional

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.ai.evaluation import evaluate
from azure.ai.evaluation import CodeVulnerabilityEvaluator
from azure.ai.evaluation.simulator import AdversarialSimulator, AdversarialScenario
from openai import AzureOpenAI

credential = DefaultAzureCredential()


async def code_vuln_completion_callback(
    messages: Dict, stream: bool = False, session_state: Optional[str] = None, context: Optional[Dict] = None
) -> dict:
    deployment = os.environ.get("AZURE_DEPLOYMENT_NAME")
    endpoint = os.environ.get("AZURE_ENDPOINT")
    # Reuse the module-level credential rather than creating a new one on every call
    token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
    # Get a client handle for the model
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        api_version=os.environ.get("AZURE_API_VERSION"),
        azure_ad_token_provider=token_provider,
    )
    # Call the model
    try:
        completion = client.chat.completions.create(
            model=deployment,
            messages=[
                {
                    "role": "user",
                    "content": messages["messages"][0]["content"],
                }
            ],
            max_tokens=800,
            temperature=0.7,
            top_p=0.95,
            frequency_penalty=0,
            presence_penalty=0,
            stop=None,
            stream=False,
        )
        formatted_response = completion.to_dict()["choices"][0]["message"]
    # If the model call fails for any reason, fall back to a placeholder reply
    except Exception:
        formatted_response = {
            "content": "I don't know",
            "role": "assistant",
            "context": {"key": {}},
        }
    messages["messages"].append(formatted_response)
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": context,
    }
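
To see the request and response shapes without calling Azure OpenAI, the callback contract can be exercised with a stub. This is a minimal sketch: `echo_callback` is a hypothetical stand-in for the model call, and the input shape (a dictionary with a `"messages"` list of `role`/`content` entries) is an assumption inferred from how the callback above indexes `messages`.

```python
import asyncio
from typing import Dict, Optional


async def echo_callback(
    messages: Dict,
    stream: bool = False,
    session_state: Optional[str] = None,
    context: Optional[Dict] = None,
) -> dict:
    # Hypothetical stand-in for the model call: reply with a canned assistant message.
    prompt = messages["messages"][0]["content"]
    reply = {"role": "assistant", "content": f"# stub response for: {prompt}"}
    messages["messages"].append(reply)
    # Return the same four keys as the real callback above.
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": context,
    }


# Assumed input shape, for illustration only.
sample_input = {"messages": [{"role": "user", "content": "Write a function that runs a shell command."}]}
stub_result = asyncio.run(echo_callback(sample_input))
print(stub_result["messages"][-1]["role"])
```

`asyncio.run` works when this is run as a script. Inside a notebook, where an event loop is already running, use `await echo_callback(sample_input)` instead.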

## Testing your application for Code Vulnerability

When building your application, you want to verify that your generative AI application does not produce vulnerable code. The following example pairs an `AdversarialSimulator` with a code vulnerability scenario to prompt your model for code that may or may not contain vulnerabilities.

In [None]:
simulator = AdversarialSimulator(azure_ai_project=azure_ai_project, credential=credential)

code_vuln_scenario = AdversarialScenario.ADVERSARIAL_CODE_VULNERABILITY

The simulator call below generates a dataset in which each query is a simulated user prompt and each response is the code generated by the LLM.

In [None]:
outputs = await simulator(
    scenario=code_vuln_scenario,
    max_conversation_turns=1,
    max_simulation_results=1,
    target=code_vuln_completion_callback,
)

In [None]:
from pprint import pprint
from azure.ai.evaluation.simulator._utils import JsonLineChatProtocol
from pathlib import Path

with Path("adv_code_vuln_eval.jsonl").open("w") as file:
    file.write(JsonLineChatProtocol(outputs[0]).to_eval_qr_json_lines())
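
It is worth inspecting the JSON Lines file before evaluating it. The sketch below reads such a file back into a list of dictionaries. `read_jsonl` is a hypothetical helper, and the sample row with `query` and `response` fields is an assumption based on the query/response description above; check the keys in your own file.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def read_jsonl(path: Path) -> List[Dict]:
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    with path.open("r", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical sample row written to a temporary file, for illustration only.
sample_path = Path(tempfile.mkdtemp()) / "sample.jsonl"
sample_path.write_text(
    json.dumps({"query": "example prompt", "response": "print('hello')"}) + "\n",
    encoding="utf-8",
)

rows = read_jsonl(sample_path)
print(len(rows), sorted(rows[0].keys()))
```

To inspect the real dataset, call `read_jsonl(Path("adv_code_vuln_eval.jsonl"))`.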

Now that we have our dataset, we can evaluate it for code vulnerability. The `CodeVulnerabilityEvaluator` class takes the dataset and detects whether the generated code contains vulnerabilities. Let's use the `evaluate()` API to run the evaluation and log the results to our Azure AI Foundry project.

In [None]:
code_vuln_eval = CodeVulnerabilityEvaluator(azure_ai_project=azure_ai_project, credential=credential)

result = evaluate(
    data="adv_code_vuln_eval.jsonl",
    evaluators={"code_vulnerability": code_vuln_eval},
    # Optionally provide your AI Foundry project information to track your evaluation results in your Azure AI Foundry project
    azure_ai_project=azure_ai_project,
)

pprint(result)
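
Once the evaluation finishes, you may want a quick count of how many responses were flagged. The following is a minimal sketch under stated assumptions: `count_flagged` is a hypothetical helper, and both the sample rows and the column name are illustrative guesses rather than documented output. Inspect the keys of the per-row results returned by `evaluate()` to find the actual column name.

```python
from typing import Dict, List


def count_flagged(rows: List[Dict], column: str) -> int:
    """Count rows whose value in `column` is truthy."""
    return sum(1 for row in rows if row.get(column))


# Hypothetical rows and column name, for illustration only.
label_column = "outputs.code_vulnerability.code_vulnerability_label"
sample_rows = [
    {label_column: True},
    {label_column: False},
    {label_column: True},
]

flagged = count_flagged(sample_rows, label_column)
print(f"{flagged} of {len(sample_rows)} responses flagged")
```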