### How to Find the Most Common Errors

This notebook parses the generated KQL queries (the `llmResult` field) stored in a .yaml results file and reports the most common syntax or semantic errors among them. It relies on the `Kusto.Language.dll` file, so make sure the file path to the .dll is correct. To build the full path, run `!pwd` in a notebook cell to print the current working directory, then append the file name.
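Instead of typing the absolute path by hand, you can build it from the working directory that `!pwd` prints and check that the file exists before passing it to `Reflection.Assembly.LoadFile`, which requires an absolute path. The following is a minimal sketch, assuming the .dll sits in the current working directory or in a folder you pass in; `resolve_dll_path` is a hypothetical helper, not part of the original pipeline.

```python
import os
from typing import Optional


def resolve_dll_path(dll_name: str = "Kusto.Language.dll",
                     search_dir: Optional[str] = None) -> str:
    """Return the absolute path to dll_name inside search_dir (default: the cwd).

    Hypothetical helper: raising FileNotFoundError here gives a clearer message
    than the .NET exception Assembly.LoadFile would raise later.
    """
    base = search_dir if search_dir is not None else os.getcwd()
    candidate = os.path.abspath(os.path.join(base, dll_name))
    if not os.path.isfile(candidate):
        raise FileNotFoundError(f"{dll_name} not found in {os.path.abspath(base)}")
    return candidate


# Usage, mirroring the original cell:
# file_path = resolve_dll_path()
# Reflection.Assembly.LoadFile(file_path)
```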

In [None]:
# Import necessary packages:
from collections import Counter
import yaml
import os
import pandas as pd
from pythonnet import load

# Start a .NET runtime for pythonnet: prefer CoreCLR, fall back to Mono.
try:
    load('coreclr')
except Exception:
    load('mono')

import clr
from System import Reflection

# Insert the FULL file path to the Kusto.Language.dll that is contained within this folder.
file_path = "/home/amethystuser/llm-threat-query/code/offline_metrics_pipeline/offline-metrics-pipeline/Kusto.Language.dll"
Reflection.Assembly.LoadFile(file_path)

from typing import Set, Dict, List, Literal
from Kusto.Language import KustoCode, GlobalState
from Kusto.Language.Symbols import DatabaseSymbol, TableSymbol, ColumnSymbol, ScalarTypes

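The next cell imports the local `query_dataset` and `helpers` modules. If these imports fail, the notebook is probably running from a different folder than the one that contains them; change the working directory to that folder (for example with `%cd <folder>` or `os.chdir(...)`) and rerun the cell.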

In [None]:
from query_dataset import QueryDataset
from helpers import *
import pprint
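If you prefer not to change directories by hand, the check can be scripted: look up each local module with `importlib.util.find_spec`, and only if one is missing, move into the project folder and put it on `sys.path`. This is a sketch under the assumption that all local modules live in a single folder; `ensure_importable` and its `project_dir` argument are hypothetical.

```python
import importlib
import importlib.util
import os
import sys


def ensure_importable(module_names, project_dir):
    """Change into project_dir if any module in module_names cannot be found.

    Hypothetical helper: it only relocates the notebook and does not import anything.
    Returns the modules that are still missing afterwards (empty list on success).
    """
    missing = [m for m in module_names if importlib.util.find_spec(m) is None]
    if missing:
        project_dir = os.path.abspath(project_dir)
        os.chdir(project_dir)
        # Make the folder importable even if '' (the cwd) is not on sys.path.
        if project_dir not in sys.path:
            sys.path.insert(0, project_dir)
        importlib.invalidate_caches()  # let the import system see the new location
        missing = [m for m in missing if importlib.util.find_spec(m) is None]
    return missing


# Usage (folder path is a placeholder):
# ensure_importable(["query_dataset", "helpers"], "/path/to/offline-metrics-pipeline")
```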

Next, specify the path to the .yaml file you want to analyze and the type of error to look for: either **syntax** or **semantic** (these are the only two options). Note that `semantic` mode runs `KustoCode.ParseAndAnalyze`, whose diagnostics can include syntax errors as well as semantic ones.
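The cells below assume the .yaml file has a top-level `queries` list whose entries hold the generated query under `llmResult`. A small sketch that checks this assumption and the `error_type` value up front, before any parsing starts; `extract_llm_queries` and `VALID_ERROR_TYPES` are hypothetical names, and unlike the original loop (which turns a missing result into the string `"None"`) it maps missing results to empty strings.

```python
from typing import Any, Dict, List

VALID_ERROR_TYPES = ("syntax", "semantic")


def extract_llm_queries(data: Dict[str, Any], error_type: str) -> List[str]:
    """Validate error_type and return every llmResult string from the loaded YAML."""
    if error_type not in VALID_ERROR_TYPES:
        raise ValueError(f"error_type must be one of {VALID_ERROR_TYPES}, got {error_type!r}")
    entries = data.get("queries")
    if not isinstance(entries, list):
        raise KeyError("expected a top-level 'queries' list in the YAML file")
    # Missing or null results become "" so each output stays aligned with its entry.
    return [str(e.get("llmResult") or "") for e in entries]


# Usage: queries = extract_llm_queries(yaml.safe_load(open(path_to_yaml)), error_type)
```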

In [None]:
# Change the path_to_yaml accordingly!

#  "/home/amethystuser/llm-threat-query/code/NL2KQL_Remakes/saleha-results/gemma-3-4b-it-refined.yaml"
path_to_yaml = "/home/amethystuser/llm-threat-query/code/offline_metrics_pipeline/offline-metrics-pipeline/gemma-3-4b-it-nl2kql-rq3-prompt-one-refined.yaml"
error_type = "semantic"  # or "syntax"

namespace = create_kusto_namespace()
setup_global_state("Defender", namespace)
global_state = namespace.get('global_state')

Next, run the following cell. It parses every query, collects the diagnostics, and prints each error message with its number of occurrences, most common first.

In [None]:
with open(path_to_yaml, 'r') as f:
    data = yaml.safe_load(f)

if error_type not in ("syntax", "semantic"):
    raise ValueError(f"error_type must be 'syntax' or 'semantic', got {error_type!r}")

sam_df = pd.DataFrame(data['queries'])
messages = []
failed = 0  # queries the parser could not process

for idx, row in sam_df.iterrows():
    query = str(row['llmResult'])  # same string for parsing and for slicing below
    try:
        if error_type == "syntax":
            code = KustoCode.Parse(query, global_state)
        else:
            code = KustoCode.ParseAndAnalyze(query, global_state)

        for diag in code.GetDiagnostics():
            messages.append({'query': query, 'severity': diag.Severity, 'message': diag.Message,
                             'length': diag.Length, 'start': diag.Start, 'end': diag.End,
                             # the exact substring the diagnostic points at
                             'troublemaker': query[diag.Start:diag.End]})
    except Exception as exc:
        failed += 1
        print(f"Skipping query {idx}: {exc}")

print(f"{failed} queries could not be processed")
lst_of_messages = [entry['message'] for entry in messages]
pprint.pprint(Counter(lst_of_messages))
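The counter shows how often each message occurs but not which part of the query triggered it. A sketch that pairs each of the top messages with its most frequent offending snippets, assuming the `message` and `troublemaker` keys collected in the `messages` list; `summarize_diagnostics` and `top_n` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def summarize_diagnostics(messages: List[dict],
                          top_n: int = 5) -> List[Tuple[str, int, List[Tuple[str, int]]]]:
    """Return (message, count, up to 3 most common offending snippets) for the top_n messages."""
    counts = Counter(m["message"] for m in messages)
    snippets: Dict[str, Counter] = defaultdict(Counter)
    for m in messages:
        snippets[m["message"]][m["troublemaker"]] += 1
    return [(msg, n, snippets[msg].most_common(3)) for msg, n in counts.most_common(top_n)]


# Usage:
# for msg, n, examples in summarize_diagnostics(messages):
#     print(n, msg, examples)
```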

In [None]:
print(f"\nTotal count: {len(lst_of_messages)}")
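The total above counts diagnostics, so one query that triggers several errors is counted several times. To see what fraction of queries has at least one error, a sketch like the following can be used; `affected_query_rate` is hypothetical, and because it identifies queries by their `query` text, identical query strings are counted once.

```python
from typing import List


def affected_query_rate(messages: List[dict], total_queries: int) -> float:
    """Fraction of queries with at least one diagnostic (queries keyed by their text)."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    affected = {m["query"] for m in messages}
    return len(affected) / total_queries


# Usage: print(f"Queries with errors: {affected_query_rate(messages, len(sam_df)):.1%}")
```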