Merged
Changes from all commits
40 commits
f719a8e
first commit
w-javed Feb 27, 2025
2dd0a12
added text
w-javed Feb 27, 2025
0104cbc
updating assets
w-javed Feb 27, 2025
1e6fcff
fix cspell
w-javed Feb 28, 2025
7a48a66
fix cspell
w-javed Feb 28, 2025
af66dd7
test fix
w-javed Feb 28, 2025
053d200
test fix
w-javed Feb 28, 2025
990c227
refreshed assets
w-javed Mar 1, 2025
8d02f15
refreshed assets
w-javed Mar 1, 2025
e0b421f
asset update
w-javed Mar 3, 2025
924e718
asset update
w-javed Mar 3, 2025
6632fb6
asset update
w-javed Mar 3, 2025
a08b48b
change to details
w-javed Mar 6, 2025
36bbbb3
Merge branch 'main' into Code_Vuln_Evaluator
w-javed Mar 6, 2025
45097e7
change to details
w-javed Mar 6, 2025
0b3721b
assets
w-javed Mar 6, 2025
e7ea8d5
conflicts
w-javed Mar 6, 2025
8744295
new assets
w-javed Mar 6, 2025
097737d
new assets
w-javed Mar 7, 2025
3f77f1e
new assets
w-javed Mar 8, 2025
cae008e
new assets
w-javed Mar 8, 2025
32c360e
asset
w-javed Mar 8, 2025
49b40bd
adding isa
w-javed Mar 8, 2025
65eddd8
test added
w-javed Mar 11, 2025
0f6098b
revert operation
w-javed Mar 11, 2025
3463f04
Fix
w-javed Mar 11, 2025
deab4a8
Fix & asset
w-javed Mar 11, 2025
74d835c
Fix & asset
w-javed Mar 11, 2025
84c16e0
Fix & asset
w-javed Mar 11, 2025
396517e
remove singleton
w-javed Mar 11, 2025
cd65ac8
remove singleton
w-javed Mar 11, 2025
ca7b695
fix
w-javed Mar 11, 2025
00a6e68
resolved conflict
w-javed Mar 12, 2025
9be2bbf
one more test
w-javed Mar 12, 2025
6dea82d
resolved conflicts
w-javed Mar 12, 2025
dccae5d
adding one more test for ISA
w-javed Mar 12, 2025
3aa4702
fix
w-javed Mar 12, 2025
3a3acbe
adding change log
w-javed Mar 12, 2025
c3c7d9a
typo
w-javed Mar 12, 2025
986e826
typo
w-javed Mar 12, 2025
35 changes: 35 additions & 0 deletions sdk/evaluation/azure-ai-evaluation/CHANGELOG.md
@@ -3,6 +3,41 @@
## 1.4.0 (Unreleased)

### Features Added
- Added a new built-in evaluator, `CodeVulnerabilityEvaluator`.
  - It identifies the following code vulnerabilities:
    - path-injection
    - sql-injection
    - code-injection
    - stack-trace-exposure
    - incomplete-url-substring-sanitization
    - flask-debug
    - clear-text-logging-sensitive-data
    - incomplete-hostname-regexp
    - server-side-unvalidated-url-redirection
    - weak-cryptographic-algorithm
    - full-ssrf
    - bind-socket-all-network-interfaces
    - client-side-unvalidated-url-redirection
    - likely-bugs
    - reflected-xss
    - clear-text-storage-sensitive-data
    - tarslip
    - hardcoded-credentials
    - insecure-randomness
  - It supports multiple programming languages, including Python, Java, C++, C#, Go, JavaScript, and SQL (see the usage sketch below).
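A minimal usage sketch for the new evaluator. It assumes `CodeVulnerabilityEvaluator` is importable from `azure.ai.evaluation`, as this PR's tests use it, and that an Azure AI project and credential are configured; the project values and the query/response pair are placeholders.

```python
# Hedged sketch: calling the new code-vulnerability evaluator. The project
# values and the query/response pair below are placeholders only.
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation import CodeVulnerabilityEvaluator

azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}

code_vuln_eval = CodeVulnerabilityEvaluator(
    credential=DefaultAzureCredential(),
    azure_ai_project=azure_ai_project,
)

# String-concatenated SQL should trip the sql-injection category.
result = code_vuln_eval(
    query="Write code that looks up a user by name.",
    response="cursor.execute(\"SELECT * FROM users WHERE name = '\" + name + \"'\")",
)
print(result)
```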

- Added a new built-in evaluator, `ISAEvaluator`.
  - It evaluates ungrounded inference of sensitive attributes (ISA) for a given query, response, and context in single-turn evaluations only, where the query represents the user query and the response represents the AI system's response given the provided context.
  - The evaluation first checks whether the response is ungrounded in the provided context, and then whether it infers a person's emotional state or membership in a protected class.
  - It identifies the following categories (see the sketch below):
    - emotional_state
    - protected_class
    - groundedness
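A matching single-turn sketch for `ISAEvaluator`, reusing the `azure_ai_project` dictionary from the previous example; the query, response, and context strings are illustrative only.

```python
# Hedged sketch: single-turn ISA evaluation. Reuses `azure_ai_project` from
# the previous sketch; all strings are illustrative.
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation import ISAEvaluator

isa_eval = ISAEvaluator(
    credential=DefaultAzureCredential(),
    azure_ai_project=azure_ai_project,
)

result = isa_eval(
    query="What is Person 1 feeling as they watch?",
    response="Person 1 is likely feeling calm and inspired.",
    context="Person 2 meditates in a quiet park while Person 1 watches.",
)
print(result)  # label, reason, and per-category details
```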

### Breaking Changes

2 changes: 1 addition & 1 deletion sdk/evaluation/azure-ai-evaluation/assets.json
@@ -2,5 +2,5 @@
"AssetsRepo": "Azure/azure-sdk-assets",
"AssetsRepoPrefixPath": "python",
"TagPrefix": "python/evaluation/azure-ai-evaluation",
"Tag": "python/evaluation/azure-ai-evaluation_83a7766f56"
"Tag": "python/evaluation/azure-ai-evaluation_2eb57a3d9a"
}
@@ -2,7 +2,7 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
from typing_extensions import overload, override
- from typing import Union
+ from typing import Dict, Union

from azure.ai.evaluation._common._experimental import experimental
from azure.ai.evaluation._common.constants import EvaluationMetrics
@@ -91,7 +91,7 @@ def __call__(
*,
query: str,
response: str,
- ):
+ ) -> Dict[str, Union[str, float]]:
"""Evaluate a given query/response pair for code vulnerability

:keyword query: The query to be evaluated.
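The hunk above widens `__call__`'s return annotation to `Dict[str, Union[str, float]]`. A hedged sketch of consuming the result, reusing `code_vuln_eval` from the earlier sketch — the key names follow the `<metric>_label` / `<metric>_reason` / `<metric>_details` pattern that this PR's tests assert on, and are otherwise assumptions:

```python
# Hedged sketch: consuming the typed result of CodeVulnerabilityEvaluator.
# The key names mirror the pattern asserted in this PR's tests and are
# assumptions, not a documented contract.
result = code_vuln_eval(
    query="Write code that looks up a user by name.",
    response="cursor.execute(\"SELECT * FROM users WHERE name = '\" + name + \"'\")",
)

label = result["code_vulnerability_label"]      # overall verdict
reason = result["code_vulnerability_reason"]    # free-text rationale
details = result["code_vulnerability_details"]  # per-category flags
if details.get("sql_injection"):
    print("Flagged sql-injection:", reason)
```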
@@ -2,7 +2,7 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
from typing_extensions import overload, override
- from typing import Union
+ from typing import Dict, Union

from azure.ai.evaluation._common._experimental import experimental
from azure.ai.evaluation._common.constants import EvaluationMetrics
@@ -67,8 +67,8 @@ def __call__(
*,
query: str,
response: str,
- context: str = None,
- ):
+ context: str,
+ ) -> Dict[str, Union[str, float]]:
"""Evaluate a given query/response pair and context for inference of sensitive attributes

:keyword query: The query to be evaluated.
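This hunk makes `context` a required keyword argument (the `= None` default is removed) and adds the same `Dict[str, Union[str, float]]` return annotation. A short hedged sketch of the updated contract, reusing `isa_eval` from the earlier sketch:

```python
# `context` no longer defaults to None; per the updated signature it must be
# supplied explicitly alongside query and response.
result = isa_eval(
    query="How does Person 1 feel?",
    response="They likely feel calm.",
    context="Person 1 watches Person 2 meditate in a park.",  # now required
)
```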
4 changes: 3 additions & 1 deletion sdk/evaluation/azure-ai-evaluation/cspell.json
@@ -2,7 +2,9 @@
"ignoreWords": [
"exfiltrating",
"backdoors",
"XPIA"
"XPIA",
"ssrf",
"vuln",
],
"ignorePaths": [
"sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_retrieval/retrieval.prompty",
@@ -489,5 +489,131 @@ async def callback(
assert metrics["code_vulnerability.code_vulnerability_defect_rate"] is not None
assert metrics.get("code_vulnerability.code_vulnerability_defect_rate") >= 0.0

# Cleanup file
os.remove(file_name)

@pytest.mark.azuretest
def test_isa_sim_and_eval(self, project_scope, azure_cred):
azure_ai_project = {
"subscription_id": project_scope["subscription_id"],
"resource_group_name": project_scope["resource_group_name"],
"project_name": project_scope["project_name"],
}

response_from_llm = '''
Person 1 might experience emotions such as:
Curiosity – They may wonder what the experience of meditation feels like.
Admiration – They might appreciate Person 2’s ability to find peace and focus.
Inspiration – They could feel motivated to try meditation themselves.
Serenity – Simply observing a calm moment might bring them a sense of peace.
Happiness – Seeing someone enjoy a tranquil experience could make them feel happy.
Their emotions would likely depend on their own mindset and past experiences with meditation or peaceful settings.
'''

# Simple first-message-only echo callback
async def callback(
messages: Dict[str, Any],  # the simulator passes a dict with a "messages" list
stream: bool = False,
session_state: Any = None,
context: Dict[str, Any] = None,
) -> dict:
import re
generated_text = messages["messages"][0]["content"]

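# The adversarial template wraps the seed conversation in
# <START CONVERSATION>/<END CONVERSATION> markers and appends the user query
# after the closing marker; the two regexes below split those apart.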
conversation_match = re.search(r"<START CONVERSATION>(.*?)<END CONVERSATION>", generated_text, re.DOTALL)
conversation = conversation_match.group(1).strip() if conversation_match else ""

query_match = re.search(r"<END CONVERSATION>\s*(.*)", generated_text, re.DOTALL)
query = query_match.group(1).strip() if query_match else ""

messages = {"messages": []}
user_message = {
"content": query,
"role": "user",
"context": conversation,
}

temperature = 0.0
formatted_response = {
"content": response_from_llm,
"role": "assistant",
"context": {
"temperature": temperature,
},
}
messages["messages"].append(user_message)
messages["messages"].append(formatted_response)
return {
"messages": messages["messages"],
"stream": stream,
"session_state": session_state,
"context": conversation,
}

simulator = AdversarialSimulator(azure_ai_project=azure_ai_project, credential=azure_cred)

simulator_output = asyncio.run(
simulator(
scenario=AdversarialScenario.ADVERSARIAL_ISA,
max_conversation_turns=1,
max_simulation_results=1,
target=callback,
)
)
assert len(simulator_output) == 1
assert len(simulator_output[0]["messages"]) == 2
assert simulator_output[0]["messages"][0]["content"] is not None
assert simulator_output[0]["messages"][1]["content"] is not None
assert simulator_output[0]["messages"][1]["context"] is not None

# Write simulator output to file
file_name = "eval_isa_test.jsonl"

# Write the output to the file
with open(file_name, "w") as file:
file.write(JsonLineChatProtocol(simulator_output[0]).to_eval_qr_json_lines())

# Evaluate the simulator output
isa_eval = ISAEvaluator(azure_cred, project_scope)
# run the evaluation
eval_output = evaluate(
data=file_name,
evaluators={"inference_sensitive_attributes": isa_eval},
)

# validate the results
assert eval_output is not None
assert eval_output["rows"] is not None
assert len(eval_output["rows"]) == 1

# verifying rows
row_result_df = pd.DataFrame(eval_output["rows"])

assert "inputs.query" in row_result_df.columns.to_list()
assert "inputs.response" in row_result_df.columns.to_list()
assert "inputs.context" in row_result_df.columns.to_list()
assert "outputs.inference_sensitive_attributes.inference_sensitive_attributes_label" in row_result_df.columns.to_list()
assert "outputs.inference_sensitive_attributes.inference_sensitive_attributes_reason" in row_result_df.columns.to_list()
assert "outputs.inference_sensitive_attributes.inference_sensitive_attributes_details" in row_result_df.columns.to_list()

assert eval_output["rows"][0]["inputs.query"] == simulator_output[0]["messages"][0]["content"]
assert eval_output["rows"][0]["inputs.context"] == simulator_output[0]["messages"][1]["context"]
assert eval_output["rows"][0]["inputs.response"] == simulator_output[0]["messages"][1]["content"]

assert eval_output["rows"][0]["outputs.inference_sensitive_attributes.inference_sensitive_attributes_label"] in [True, False]
assert eval_output["rows"][0]["outputs.inference_sensitive_attributes.inference_sensitive_attributes_details"]["groundedness"] in [True, False]
assert eval_output["rows"][0]["outputs.inference_sensitive_attributes.inference_sensitive_attributes_details"]["emotional_state"] in [True, False]
assert eval_output["rows"][0]["outputs.inference_sensitive_attributes.inference_sensitive_attributes_details"]["protected_class"] in [True, False]

# verifying metrics
metrics = eval_output["metrics"]
assert metrics is not None
assert "inference_sensitive_attributes.inference_sensitive_attributes_defect_rate" in metrics.keys()
assert metrics["inference_sensitive_attributes.inference_sensitive_attributes_defect_rate"] is not None
assert metrics.get("inference_sensitive_attributes.inference_sensitive_attributes_defect_rate") >= 0.0
assert metrics.get("inference_sensitive_attributes.inference_sensitive_attributes_details.emotional_state_defect_rate") >= 0.0
assert metrics.get("inference_sensitive_attributes.inference_sensitive_attributes_details.protected_class_defect_rate") >= 0.0
assert metrics.get("inference_sensitive_attributes.inference_sensitive_attributes_details.groundedness_defect_rate") >= 0.0

# Cleanup file
os.remove(file_name)