# 03 SDK | 07 Chat with Json

## Use case

Your data is represented by medium-to-large JSON objects, and you want your customers to ask questions about that data and receive answers in a conversational manner. There are a few options to implement this:

- **Incorporating a Full-Text Search Engine**:
   - **Elasticsearch or Azure Cognitive Search**: Index the JSON file content using Elasticsearch or Azure Cognitive Search.
   - **ASP.NET Core Web API**: Build a web API to accept natural language queries, process them, and query the search engine for relevant results.

- **Building a Custom Solution with Language Models**:
   - **OpenAI API or Azure OpenAI Service**: Integrate a pre-trained language model to understand user queries.
   - **ASP.NET Core Web API**: Create an API to handle user inputs, query the JSON data, and return responses.

This notebook focuses on the second option. It does not build a full-fledged solution, but it provides the outline of one.

## Solution Approach

To extract dynamic content from `json` objects, we first need the schema and the user query. A prompt to the LLM containing both returns a response (the `Structured Query`), which is then parsed and executed against the JSON data to extract the relevant information.

### Domain Specific Query Language

The structured query format we use is a domain-specific query language (DSL) tailored for querying JSON datasets. This DSL allows for clear and concise representation of operations such as retrieval, summation, averaging, and filtering of data. It abstracts the complexity of JSON manipulation and provides a human-readable way to describe queries.

For example, consider this natural language query: "I need the total for all invoices in year 2021." The equivalent structured query would be:

```json
{
  "operation": "sum",
  "path": "financials.monthlyFinancials",
  "fields": ["invoices"],
  "filters": [
    {
      "field": "year",
      "operator": "==",
      "value": 2021
    }
  ]
}
```
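
One way to consume a structured query like the one above is to deserialize it into typed objects before acting on it. The following is a minimal sketch only: the `StructuredQuery` and `QueryFilter` records are hypothetical names that do not appear elsewhere in this notebook, and the filter value is kept as a `JsonElement` on the assumption that the DSL allows any JSON value there.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// The structured query from the example above, written as a C# verbatim string.
string sample = @"{
  ""operation"": ""sum"",
  ""path"": ""financials.monthlyFinancials"",
  ""fields"": [""invoices""],
  ""filters"": [
    { ""field"": ""year"", ""operator"": ""=="", ""value"": 2021 }
  ]
}";

// The DSL uses lower-case property names, so match them case-insensitively.
var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

StructuredQuery query = JsonSerializer.Deserialize<StructuredQuery>(sample, options)
    ?? throw new JsonException("The structured query was empty.");

Console.WriteLine($"{query.Operation} over '{query.Path}', fields: {string.Join(", ", query.Fields)}");
foreach (QueryFilter filter in query.Filters)
{
    Console.WriteLine($"  where {filter.Field} {filter.Operator} {filter.Value}");
}

// Hypothetical types (assumptions for illustration, not part of the notebook).
public record QueryFilter(string Field, string Operator, JsonElement Value);
public record StructuredQuery(string Operation, string Path, List<string> Fields, List<QueryFilter> Filters);
```

The rest of this notebook works with `JsonNode` directly instead, which avoids extra types at the cost of string-keyed lookups.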

## Prerequisites

In [13]:
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.12"
#r "nuget: DotNetEnv, 2.5.0"
#r "nuget: System.Linq"
#r "nuget: System.Text.Json, 8.0.0"

### Instantiate OpenAI API

In [14]:
using Azure; 
using Azure.AI.OpenAI;
using DotNetEnv;
using System.IO;
using System.Text.Json; 

static string _configurationFile = @"../01_DemoEnvironment/conf/application.env";
Env.Load(_configurationFile);


string assetsFolder = Path.Combine(Directory.GetCurrentDirectory(), "..", "..", "assets");

string oAiApiKey = Environment.GetEnvironmentVariable("SKIT_AOAI_APIKEY") ?? "SKIT_AOAI_APIKEY not found";
string oAiEndpoint = Environment.GetEnvironmentVariable("SKIT_AOAI_ENDPOINT") ?? "SKIT_AOAI_ENDPOINT not found";
string chatCompletionDeploymentName = Environment.GetEnvironmentVariable("SKIT_CHATCOMPLETION_DEPLOYMENTNAME") ?? "SKIT_CHATCOMPLETION_DEPLOYMENTNAME not found";

AzureKeyCredential azureKeyCredential = new AzureKeyCredential(oAiApiKey);
OpenAIClient openAIClient = new OpenAIClient(new Uri(oAiEndpoint), azureKeyCredential);

Console.WriteLine($"OpenAI Client created...");

OpenAI Client created...
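
The cell above falls back to placeholder strings such as `"SKIT_AOAI_APIKEY not found"` when a variable is missing, which tends to surface later as a less obvious error. A possible alternative, sketched here under the assumption that failing fast is preferable, is a small helper that throws as soon as a required setting is absent. `RequireEnv` and the `SKIT_DEMO_*` variable names are hypothetical and used only for this illustration.

```csharp
using System;

// Hypothetical helper: return the variable's value, or throw a descriptive error if it is missing.
static string RequireEnv(string name)
{
    var value = Environment.GetEnvironmentVariable(name);
    if (string.IsNullOrWhiteSpace(value))
    {
        throw new InvalidOperationException(
            $"Environment variable '{name}' is not set. Check the configuration file.");
    }
    return value;
}

// Demo with made-up variable names so the sketch runs without real credentials.
Environment.SetEnvironmentVariable("SKIT_DEMO_SETTING", "demo-value");
Console.WriteLine(RequireEnv("SKIT_DEMO_SETTING"));

try
{
    RequireEnv("SKIT_DEMO_MISSING");
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}
```

In the notebook itself, the same helper could replace the `?? "... not found"` fallbacks, for example `RequireEnv("SKIT_AOAI_ENDPOINT")`.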


### Json Schema, Object, and Query

In [15]:
string userQuery = "i need a list of all invoices in 2021";
string jsonSchema = File.ReadAllText(Path.Combine(assetsFolder,"docs", "03_SDK", "sampleSchema.json"));



In [16]:
using System.Diagnostics;
var stopwatch = Stopwatch.StartNew();
// string jsonObject = File.ReadAllText(Path.Combine(assetsFolder,"docs", "03_SDK", "large_sample.json"));
string jsonObject = File.ReadAllText(Path.Combine(assetsFolder,"docs", "03_SDK", "sampleObject.json"));
stopwatch.Stop();
Console.WriteLine($"Time taken to read the JSON file: {stopwatch.ElapsedMilliseconds} ms");

Time taken to read the JSON file: 7 ms


### Prompt to extract intent and entities

In [5]:
var prompt = $@"
You are given a schema of a JSON dataset and a natural language query. Your task is to convert the natural language query into a structured JSON format that specifies the operation, path, fields, and filters needed to execute the query on the JSON dataset.

Input Schema:
{jsonSchema}

Output Schema:
{{
  ""operation"": ""string"",
  ""path"": ""string"",
  ""fields"": [""string""],
  ""filters"": [
    {{
      ""field"": ""string"",
      ""operator"": ""string"",
      ""value"": ""any""
    }}
  ]
}}

Convert the following natural language query into a structured JSON format:

Example Query: ""I need the total for all invoices in year 2021.""
Example Output:
{{
  ""operation"": ""sum"",
  ""path"": ""financials.monthlyFinancials"",
  ""fields"": [""invoices""],
  ""filters"": [
    {{
      ""field"": ""year"",
      ""operator"": ""=="",
      ""value"": 2021
    }}
  ]
}}

Example Query: ""Calculate the average sales for the year 2020.""
Example Output:
{{
  ""operation"": ""average"",
  ""path"": ""financials.monthlyFinancials"",
  ""fields"": [""invoices""],
  ""filters"": [
    {{
      ""field"": ""year"",
      ""operator"": ""=="",
      ""value"": 2020
    }}
  ]
}}

The query to convert is supplied in the user message. Respond with the JSON output only.
";


### Method to extract intent using a call to OpenAI API

In [17]:
using System.Text.Json;
// using System.Text.Json.Serialization;



public async Task<string> GetStructuredQueryAsync(string userQuery)
{
    string sys_prompt = prompt;
    ChatCompletionsOptions simpleOption = new ChatCompletionsOptions()
    {
        // Request properties
        ResponseFormat = ChatCompletionsResponseFormat.JsonObject,
        MaxTokens = 500,
        Temperature = 0.7f,
        NucleusSamplingFactor = 0.0f,
        FrequencyPenalty = 0.0f,
        PresencePenalty = 0.0f,
        DeploymentName = chatCompletionDeploymentName
    };

    simpleOption.Messages.Add(new ChatRequestSystemMessage(sys_prompt));
    var userMessage = $"Query: {userQuery}. Output:";
    simpleOption.Messages.Add(new ChatRequestUserMessage( userMessage));

    Response<ChatCompletions> simpleResponse = await openAIClient.GetChatCompletionsAsync(simpleOption);

    // Return the content of the first choice in the response
    ChatCompletions simpleCompletions = simpleResponse.Value;
    return simpleCompletions.Choices[0].Message.Content;

}

### Test intent extraction

In [19]:
userQuery = "my boss wants to see all payments made on the year 20 that were over 1000";
var structuredQuery = await GetStructuredQueryAsync(userQuery);

Console.WriteLine(structuredQuery);

{
  "operation": "list",
  "path": "financials.monthlyFinancials",
  "fields": ["payments"],
  "filters": [
    {
      "field": "year",
      "operator": "==",
      "value": 2020
    },
    {
      "field": "payments",
      "operator": ">",
      "value": 1000
    }
  ]
}
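
Because the structured query is generated by a model, it may be worth checking it before running it against the data. The sketch below is one possible validation step, not part of the original notebook: `ValidateStructuredQuery` and `TryGetString` are hypothetical helpers, and the allow-lists simply mirror the operations and operators handled in the extraction code later in this notebook.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Nodes;

// Allow-lists mirroring the operations and operators handled by the extraction code.
var allowedOperations = new HashSet<string> { "list", "find", "retrieve", "sum", "average", "count" };
var allowedOperators = new HashSet<string> { "==", "!=", ">", "<", ">=", "<=" };

// Hypothetical helper: true only when the node is a JSON string.
static bool TryGetString(JsonNode node, out string value)
{
    value = null;
    return node is JsonValue v && v.TryGetValue(out value);
}

// Hypothetical helper: returns a list of problems; an empty list means the query looks usable.
List<string> ValidateStructuredQuery(string json)
{
    var problems = new List<string>();

    JsonNode root;
    try
    {
        root = JsonNode.Parse(json);
    }
    catch (JsonException ex)
    {
        problems.Add($"Not valid JSON: {ex.Message}");
        return problems;
    }

    if (root is not JsonObject query)
    {
        problems.Add("The top-level value must be a JSON object.");
        return problems;
    }

    if (!TryGetString(query["operation"], out string operation) || !allowedOperations.Contains(operation))
        problems.Add("'operation' is missing or not supported.");

    if (!TryGetString(query["path"], out _))
        problems.Add("'path' must be a string.");

    // The prompt's output schema always includes 'fields', so it is required here.
    if (query["fields"] is not JsonArray)
        problems.Add("'fields' must be an array.");

    // 'filters' is treated as optional; each entry needs a field, a known operator, and a value.
    if (query["filters"] is JsonArray filters)
    {
        for (int i = 0; i < filters.Count; i++)
        {
            bool ok = filters[i] is JsonObject f
                && TryGetString(f["field"], out _)
                && TryGetString(f["operator"], out string op)
                && allowedOperators.Contains(op)
                && f["value"] is not null;
            if (!ok) problems.Add($"Filter {i} is malformed or uses an unsupported operator.");
        }
    }
    else if (query["filters"] is not null)
    {
        problems.Add("'filters' must be an array when present.");
    }

    return problems;
}

string good = @"{ ""operation"": ""list"", ""path"": ""financials.monthlyFinancials"", ""fields"": [""payments""],
  ""filters"": [ { ""field"": ""year"", ""operator"": ""=="", ""value"": 2020 } ] }";
string bad = @"{ ""operation"": ""delete"", ""path"": ""financials.monthlyFinancials"", ""fields"": [""payments""],
  ""filters"": [ { ""field"": ""year"", ""operator"": ""~="", ""value"": 2020 } ] }";

Console.WriteLine($"Problems in good query: {ValidateStructuredQuery(good).Count}");
foreach (var problem in ValidateStructuredQuery(bad))
{
    Console.WriteLine(problem);
}
```

Returning a list of problems rather than throwing makes it easy to feed the issues back to the model in a retry prompt, should that be desired.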


### Json data extraction

This code uses the structured query, or `json action`, to perform filtering and aggregation on the JSON data. The `json action` is a structured JSON object that specifies the operation, path, fields, and filters needed to execute the query on the JSON dataset.
The code supports the following operations:
- `list` (also `find` or `retrieve`): Return the requested fields of each record.
- `sum`: Calculate the sum of each field.
- `average`: Calculate the average of each field.
- `count`: Count the number of records.

As a side note, a simple operation such as obtaining a specific field value based on a filter could use just a JSON path and a filter.
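
For the simple case in the side note, a direct lookup with `JsonNode` and LINQ may be enough. The sketch below assumes a miniature stand-in document: the field names (`year`, `month`, `payments`) follow the sample data used in this notebook, but the values are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical miniature document with the same shape as the sample data (values are invented).
string sampleJson = @"{
  ""financials"": {
    ""monthlyFinancials"": [
      { ""year"": 2020, ""month"": 1, ""payments"": 1200.50 },
      { ""year"": 2020, ""month"": 2, ""payments"": 800.00 },
      { ""year"": 2021, ""month"": 1, ""payments"": 1500.25 }
    ]
  }
}";

// Walk the path 'financials.monthlyFinancials' directly.
JsonNode root = JsonNode.Parse(sampleJson);
JsonArray months = root["financials"]["monthlyFinancials"].AsArray();

// Filter: first record for February 2020.
JsonNode match = months.FirstOrDefault(m =>
    m["year"].GetValue<int>() == 2020 && m["month"].GetValue<int>() == 2);

// Read a single field from the matching record (null when nothing matches).
decimal? payments = match?["payments"]?.GetValue<decimal>();
Console.WriteLine(payments.HasValue ? $"Payments: {payments}" : "No matching record.");
```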

In [20]:

using System;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Nodes;
using System.Collections.Generic;

// Function to get a JsonArray from a dynamic path
public static JsonArray GetJsonArrayFromPath(JsonNode node, string path)
{
    if (string.IsNullOrEmpty(path))
    {
        Console.WriteLine("Path is empty. Returning the root node.");
        if (node is JsonArray rootArray)
        {
            return rootArray;
        }
        else
        {
            rootArray = new JsonArray { node };
            return rootArray;
        }
    }

    var parts = path.Split('.');
    foreach (var part in parts)
    {
        node = node[part];
        if (node == null)
        {
            throw new InvalidOperationException($"The path '{path}' does not exist in the JSON object.");
        }
    }

    if (node is JsonArray arrayNode)
    {
        return arrayNode;
    }
    else
    {
        throw new InvalidOperationException($"The node at path '{path}' is not an array.");
    }
}

// Function to apply filters
public static IEnumerable<JsonNode> ApplyFilters(IEnumerable<JsonNode> data, JsonArray filters)
{
    foreach (var filter in filters)
    {
        var field = filter["field"].GetValue<string>();
        var operator_ = filter["operator"].GetValue<string>();
        var value = filter["value"];

        data = data.Where(item => 
        {
            var itemValue = item[field];
            try
            {
                // Console.WriteLine($"Applying filter: {field} {operator_} {value}");
                switch (operator_)
                {
                    case "==":
                        return itemValue.GetValue<decimal>() == value.GetValue<decimal>();
                    case "!=":
                        return itemValue.GetValue<decimal>() != value.GetValue<decimal>();
                    case ">":
                        return itemValue.GetValue<decimal>() > value.GetValue<decimal>();
                    case "<":
                        return itemValue.GetValue<decimal>() < value.GetValue<decimal>();
                    case ">=":
                        return itemValue.GetValue<decimal>() >= value.GetValue<decimal>();
                    case "<=":
                        return itemValue.GetValue<decimal>() <= value.GetValue<decimal>();
                    default:
                        throw new NotSupportedException($"Operator {operator_} is not supported");
                }
            }
            catch (FormatException ex)
            {
                Console.WriteLine($"FormatException: {ex.Message}");
                throw;
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Exception: {ex.Message}");
                throw;
            }
        });
    }
    return data;
}


// Method to find specific fields based on filters
public static List<Dictionary<string, object>> FindFields(string jsonObject, JsonNode query)
{
    var jsonData = JsonNode.Parse(jsonObject);
    var jsonPath = query["path"].GetValue<string>();
    JsonNode dataNode;

    if (string.IsNullOrEmpty(jsonPath))
    {
        dataNode = jsonData;
    }
    else
    {
        var parts = jsonPath.Split('.');
        dataNode = jsonData;
        foreach (var part in parts)
        {
            dataNode = dataNode[part];
            if (dataNode == null)
            {
                throw new InvalidOperationException($"The path '{jsonPath}' does not exist in the JSON object.");
            }
        }
    }

    var results = new List<Dictionary<string, object>>();
    var fields = query["fields"].AsArray().Select(f => f.GetValue<string>()).ToList();

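    if (dataNode is JsonArray arrayNode)
    {
        // Apply the query's filters (if any) before projecting the requested fields
        var filters = query["filters"]?.AsArray() ?? new JsonArray();
        foreach (var item in ApplyFilters(arrayNode, filters))
        {
            var result = new Dictionary<string, object>();
            foreach (var field in fields)
            {
                result[field] = item[field];
            }
            results.Add(result);
        }
    }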
    else if (dataNode is JsonObject objectNode)
    {
        var result = new Dictionary<string, object>();
        foreach (var field in fields)
        {
            result[field] = objectNode[field];
        }
        results.Add(result);
    }
    else
    {
        throw new InvalidOperationException($"The node at path '{jsonPath}' is neither an array nor an object.");
    }

    return results;
}


// Method to sum specific fields based on filters
public static Dictionary<string, decimal> SumFields(string jsonObject, JsonNode query)
{
    var jsonData = JsonNode.Parse(jsonObject);
    var jsonPath = query["path"].GetValue<string>();
    var data = GetJsonArrayFromPath(jsonData, jsonPath);
    var filters = query["filters"].AsArray();
    var filteredData = ApplyFilters(data, filters);
    var fields = query["fields"].AsArray().Select(f => f.GetValue<string>()).ToList();
    var results = new Dictionary<string, decimal>();

    foreach (var field in fields)
    {
        results[field] = filteredData.Sum(item => item[field].GetValue<decimal>());
    }

    return results;
}

// Method to average specific fields based on filters
public static Dictionary<string, decimal> AverageFields(string jsonObject, JsonNode query)
{
    var jsonData = JsonNode.Parse(jsonObject);
    var jsonPath = query["path"].GetValue<string>();
    var data = GetJsonArrayFromPath(jsonData, jsonPath);
    var filters = query["filters"].AsArray();
    var filteredData = ApplyFilters(data, filters);
    var fields = query["fields"].AsArray().Select(f => f.GetValue<string>()).ToList();
    var results = new Dictionary<string, decimal>();

    foreach (var field in fields)
    {
        results[field] = filteredData.Average(item => item[field].GetValue<decimal>());
    }

    return results;
}

// Method to count the records that match the filters
public static int CountFields(string jsonObject, JsonNode query)
{
    var jsonData = JsonNode.Parse(jsonObject);
    var jsonPath = query["path"].GetValue<string>();
    var data = GetJsonArrayFromPath(jsonData, jsonPath);
    var filters = query["filters"].AsArray();
    var filteredData = ApplyFilters(data, filters);
    return filteredData.Count();
}

// General method to execute the structured query and call the appropriate method
public static object ExecuteStructuredQuery(string jsonObject, string structuredQuery)
{
    var query = JsonNode.Parse(structuredQuery);
    var operation = query["operation"]?.GetValue<string>() ?? "list";

    return operation switch
    {
        "find" or "retrieve" => FindFields(jsonObject, query),
        "sum" => SumFields(jsonObject, query),
        "average" => AverageFields(jsonObject, query),
        "count" => CountFields(jsonObject, query),
        "list" => FindFields(jsonObject, query), // Default case is to list (find) fields
        _ => throw new NotSupportedException($"Operation {operation} is not supported")
    };
}

In [24]:
string userQuery = "i need a total of all payments and invoices made on 2020";
var structuredQuery = await GetStructuredQueryAsync(userQuery);
Console.WriteLine($"Structured Query: {structuredQuery}");

Structured Query: {
  "operation": "sum",
  "path": "financials.monthlyFinancials",
  "fields": ["payments", "invoices"],
  "filters": [
    {
      "field": "year",
      "operator": "==",
      "value": 2020
    }
  ]
}


In [25]:
var stopwatch = Stopwatch.StartNew();
var results = ExecuteStructuredQuery(jsonObject, structuredQuery);
stopwatch.Stop();
Console.WriteLine($"Time taken to extract data from JSON object: {stopwatch.ElapsedMilliseconds} ms");

Time taken to extract data from JSON object: 1 ms


## Print out the results

Since the return object may have dynamic fields populated based on the query, we traverse the object and print out the results.

In [26]:
if (results is List<Dictionary<string, object>> listResults)
{
    foreach (var result in listResults)
    {
        Console.WriteLine(string.Join(", ", result.Select(kvp => $"{kvp.Key}: {kvp.Value}")));
    }
}
else if (results is Dictionary<string, decimal> aggregateResults)
{
    foreach (var result in aggregateResults)
    {
        Console.WriteLine($"{result.Key}: {result.Value}");
    }
}
else if (results is int countResult)
{
    Console.WriteLine($"Count: {countResult}");
}

payments: 469030.39
invoices: 262753.09
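
If the results need to be handed to another component rather than printed, serializing them to JSON avoids branching on each possible result type. This is a minimal sketch under that assumption; the dictionary below is a stand-in that reuses the two values printed above instead of calling `ExecuteStructuredQuery`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Stand-in for an aggregate result; the values are copied from the output above.
object results = new Dictionary<string, decimal>
{
    ["payments"] = 469030.39m,
    ["invoices"] = 262753.09m
};

// Serialize using the runtime type so lists, dictionaries, and counts are all handled the same way.
string resultsJson = JsonSerializer.Serialize(results, results.GetType());
Console.WriteLine(resultsJson);
```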


## Creating JSON file based on the schema

In [None]:
#r "nuget: System.Text.Json"

using System;
using System.IO;
using System.Text.Json;
using System.Text.Json.Serialization;

var random = new Random();

var largeJson = new
{
    address = new {
        city = "Sample City",
        country = "Sample Country",
        postalCode = "12345",
        state = "Sample State",
        street = "Sample Street"
    },
    contactDetails = new {
        phone = "123-456-7890"
    },
    currency = "USD",
    customerId = "CUST123456",
    financials = new {
        _12MonthsSales = random.NextDouble() * 100,
        add = random.NextDouble() * 100000,
        cei = random.NextDouble() * 100000,
        creditMemos = random.NextDouble() * 100000,
        dso = random.NextDouble() * 100000,
        dueInvoices = random.NextDouble() * 100000,
        finScore = random.NextDouble() * 100000,
        monthlyFinancials = new object[10000], // 10,000 entries is about 1.8 MB; file size grows linearly with this number
        overDueInvoices = random.NextDouble() * 100000,
        totalSales = random.NextDouble() * 1000000
    },
    firstInvoiceDate = DateTime.UtcNow.ToString("o"),
    foreignId = "FOR123456",
    name = "Sample Name",
    sectorNaicsDescription = "Sample Description",
    sectorNaicsId = "NAICS123456",
    taxId = "TAX123456",
    verifiedName = "Verified Sample Name"
};

// Populate the monthlyFinancials array with random data
for (int i = 0; i < largeJson.financials.monthlyFinancials.Length; i++)
{
    largeJson.financials.monthlyFinancials[i] = new {
        add = random.NextDouble() * 100000,
        cei = random.NextDouble() * 100000,
        dso = random.NextDouble() * 100000,
        invoices = random.NextDouble() * 100000,
        month = random.Next(1, 13),
        openAR = random.NextDouble() * 100000,
        payments = random.NextDouble() * 100000,
        year = random.Next(2000, 2024)
    };
}

string json = JsonSerializer.Serialize(largeJson, new JsonSerializerOptions { WriteIndented = false });

// Save the generated JSON to a file
File.WriteAllText("medium_sample.json", json);

Console.WriteLine("Sample JSON file 'medium_sample.json' generated successfully.");
