In [1]:
#r "nuget: HtmlAgilityPack, 1.11.62"

## Fine-Tuning Source: National Gallery of Art

In an attempt to fine-tune **GPT-4 Omni Mini** (GPT-4o mini) to produce better image descriptions, this experiment uses the National Gallery of Art's existing descriptions of its collection highlights as the training set and the example short descriptions from its accessibility guidelines as the evaluation set.

### Models

In [2]:
using System.Text.Json.Serialization;

// Models for the NGA collection search API response

public record Artist(
    [property: JsonPropertyName("name")] string Name
);

public record ObjectResult(
    [property: JsonPropertyName("assistivetext")] string AssistiveText,
    [property: JsonPropertyName("imagepath")] string ImageUrl,
    [property: JsonPropertyName("artists")] Artist[] Artists,
    [property: JsonPropertyName("id")] int Id,
    [property: JsonPropertyName("title")] string Title,
    [property: JsonPropertyName("url")] string ObjectUrl
);

public record SearchResponse(
    [property: JsonPropertyName("results")] ObjectResult[] Results
);

// Models as needed to produce training and validation datasets

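// [JsonDerivedType] without a type discriminator enables serialization-only polymorphism:
// values declared as the abstract base are written with their runtime subtype's properties
// (e.g. "text" or "image_url"), and no "$type" field is added. The same applies to Message below.
[JsonDerivedType(typeof(TextContent))]
[JsonDerivedType(typeof(ImageContent))]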
public abstract record Content(
    [property: JsonPropertyName("type")] string Type
);

public record TextContent(
    [property: JsonPropertyName("text")] string Text
) : Content("text");

public record ImageUrl(
    [property: JsonPropertyName("url")] string Url
);

public record ImageContent(
    [property: JsonPropertyName("image_url")] ImageUrl ImageUrl
) : Content("image_url");

[JsonDerivedType(typeof(SystemMessage))]
[JsonDerivedType(typeof(UserMessage))]
[JsonDerivedType(typeof(AssistantMessage))]
public abstract record Message(
    [property: JsonPropertyName("role")] string Role
);

public record SystemMessage(
    [property: JsonPropertyName("content")] string Content
) : Message("system");

public record UserMessage(
    [property: JsonPropertyName("content")] Content[] Content
) : Message("user");

public record AssistantMessage(
    [property: JsonPropertyName("content")] string Content
) : Message("assistant");

public record TrainingData(
    [property: JsonPropertyName("messages")] Message[] Messages
);
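For reference, here is a minimal, self-contained sketch of the kind of line these records are meant to serialize to in a `.jsonl` file: a system turn, a user turn whose content is an array holding one `image_url` part, and an assistant turn. It uses anonymous types as stand-ins for the records above, and the prompt, image URL, and description are placeholders. The real records may emit keys in a different order, which JSON consumers should not care about.

```csharp
using System;
using System.Text.Json;

// Anonymous-type stand-ins for SystemMessage, UserMessage (with one ImageContent part),
// and AssistantMessage. All values here are placeholders.
var example = new
{
    messages = new object[]
    {
        new { role = "system", content = "You describe art for low vision folks." },
        new
        {
            role = "user",
            content = new object[]
            {
                new { type = "image_url", image_url = new { url = "https://example.com/painting.jpg" } }
            }
        },
        new { role = "assistant", content = "A description of the painting." }
    }
};

// One JSONL line: compact JSON with no indentation; the file adds a newline after each line.
string line = JsonSerializer.Serialize(example);
Console.WriteLine(line);
```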

### Training Dataset: Collection Highlights

<https://www.nga.gov/collection/highlights.html>

The collection highlights page is populated from a search API that returns JSON. The cell below requests up to 200 results directly from that API and writes the raw response to a JSON file in the *training-data* directory.

Example response format truncated to fields of interest:

```json
{
   "results": [
      {
         "assistivetext": "This text is what we want as it is used for <img> alt text and the Image Description block on the object page.",
         "imagepath": "https://this/is/what/we/want/default.jpg",
         // EXTRAS THAT MIGHT BE GOOD TO HAVE...
         "artists": [
            {
               "name": "Artist Name"
            }
         ],
         "id": 123,
         "title": "Object title",
         "url": "/collection/path/object-page.html"
      }
   ]
}
```
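As a sketch of pulling out the fields of interest, the snippet below parses a small hypothetical sample shaped like the response above (comments removed, since strict JSON does not allow them) with `JsonDocument` and keeps only results that have both `assistivetext` and `imagepath`. The sample values are invented; the notebook itself deserializes into the `SearchResponse` records instead.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical sample shaped like the search API response (values are invented).
string sample = """
    {
      "results": [
        { "assistivetext": "A woman in a blue dress reads a letter.", "imagepath": "https://example.com/a.jpg", "id": 1 },
        { "assistivetext": "", "imagepath": "https://example.com/b.jpg", "id": 2 }
      ]
    }
    """;

// Keep (description, image URL) pairs only when both fields are present and non-empty.
var pairs = new List<(string Text, string Image)>();
using (JsonDocument json = JsonDocument.Parse(sample))
{
    foreach (JsonElement result in json.RootElement.GetProperty("results").EnumerateArray())
    {
        string text = result.TryGetProperty("assistivetext", out JsonElement t) ? t.GetString() ?? "" : "";
        string image = result.TryGetProperty("imagepath", out JsonElement p) ? p.GetString() ?? "" : "";
        if (string.IsNullOrWhiteSpace(text) || string.IsNullOrWhiteSpace(image))
            continue;
        pairs.Add((text, image));
    }
}

Console.WriteLine($"{pairs.Count} usable result(s)");
```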

In [3]:
using System.Net.Http;
using System.IO;
using System.Linq;

// Sourced from the data used to build https://www.nga.gov/collection/highlights.html. The request
// lists the object IDs explicitly, so this list may need updating as the highlights change.
int[] objectIds = [
    119052, 108125, 132371, 78899, 1145, 105596, 60113, 102578, 80924, 194933,
    95419, 1437, 53157, 124525, 69660, 166425, 46569, 169312, 206066, 215454,
    86848, 52451, 54386, 1606, 50724, 12198, 432, 111152, 97, 79531,
    102589, 142289, 35080, 52614, 73890, 170952, 154931, 60, 92718, 56670,
    576, 92729, 74796, 138771, 67496, 1231, 92, 86864, 55819, 135512,
    73438, 109637, 132078, 26, 30776, 1225, 160348, 140327, 46303, 30667,
    75206, 88615, 166490, 85908, 166436, 41645, 165921, 39729, 118179, 55740,
    41, 46579, 66250, 41714, 209632, 11502, 97499, 1005, 53814, 166463,
    110292, 55742, 74978, 166484, 138710, 46665, 12189, 166446, 110216, 195052,
    46201, 164039, 43624, 41581, 163184, 30228, 166457, 139124, 46, 1138,
    70170, 120, 96688, 397, 159784, 51, 53130, 46471, 70179, 33253,
    46114, 1147, 177822, 91009, 142055
];

string searchUrl =
    "https://www.nga.gov/bin/ngaweb/collection-search-result/search.pageSize__200.pageNumber__1.lastFacet__pageSize.json?"
    + string.Join("&", objectIds.Select(id => $"artobj_objectIds={id}"));

// Dump the full JSON response to a file, creating the output directory if needed

HttpClient httpClient = new();
HttpResponseMessage response = await httpClient.GetAsync(searchUrl);
response.EnsureSuccessStatusCode();
string responseBody = await response.Content.ReadAsStringAsync();

Directory.CreateDirectory("training-data");
await File.WriteAllTextAsync("training-data/highlights.json", responseBody);

In [4]:
using System.Text.Json;

SearchResponse searchResponse = JsonSerializer.Deserialize<SearchResponse>(responseBody);

List<TrainingData> trainingDataset = new();

foreach (var obj in searchResponse.Results)
{
    if (string.IsNullOrWhiteSpace(obj.AssistiveText))
        continue;

    Message[] messages = 
    [
        new SystemMessage("You describe art for low vision folks. You will be given an image. Describe clearly and concisely only what is visible in the image. Avoid stylistic comparisons and suggestions."),
        new UserMessage([ new ImageContent(new ImageUrl(obj.ImageUrl)) ]),
        new AssistantMessage(obj.AssistiveText)
    ];
    trainingDataset.Add(new TrainingData(messages));
}

// Write to file, one JSON object per line (JSONL), overwriting any previous run

string outputFile = "training-data/training.jsonl";
File.WriteAllLines(outputFile, trainingDataset.Select(data => JsonSerializer.Serialize(data)));
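Before uploading, it can help to sanity-check each line of *training.jsonl*. This is a minimal sketch that assumes the three-turn structure built above (system, user with at least one `image_url` part, assistant). `IsValidLine` is a hypothetical helper, not part of any SDK, and it checks only shape, not the fine-tuning service's full validation rules.

```csharp
using System;
using System.Text.Json;

// Returns true when a JSONL line has exactly system, user, assistant turns
// and the user turn contains at least one image_url content part.
static bool IsValidLine(string line)
{
    try
    {
        using JsonDocument doc = JsonDocument.Parse(line);
        JsonElement root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object ||
            !root.TryGetProperty("messages", out JsonElement messages) ||
            messages.ValueKind != JsonValueKind.Array ||
            messages.GetArrayLength() != 3)
            return false;

        string[] expectedRoles = { "system", "user", "assistant" };
        for (int i = 0; i < 3; i++)
        {
            JsonElement m = messages[i];
            if (m.ValueKind != JsonValueKind.Object ||
                !m.TryGetProperty("role", out JsonElement role) ||
                role.ValueKind != JsonValueKind.String ||
                role.GetString() != expectedRoles[i] ||
                !m.TryGetProperty("content", out _))
                return false;
        }

        // The user turn must carry at least one image_url part.
        JsonElement userContent = messages[1].GetProperty("content");
        if (userContent.ValueKind != JsonValueKind.Array)
            return false;
        foreach (JsonElement part in userContent.EnumerateArray())
        {
            if (part.ValueKind == JsonValueKind.Object &&
                part.TryGetProperty("type", out JsonElement type) &&
                type.ValueKind == JsonValueKind.String &&
                type.GetString() == "image_url")
                return true;
        }
        return false;
    }
    catch (JsonException)
    {
        return false; // not parseable as JSON
    }
}

// Inline samples; against the real file, something like
// File.ReadLines("training-data/training.jsonl").All(IsValidLine) could be used.
string[] sampleLines =
{
    """{"messages":[{"role":"system","content":"s"},{"role":"user","content":[{"type":"image_url","image_url":{"url":"https://example.com/a.jpg"}}]},{"role":"assistant","content":"a"}]}""",
    """{"messages":[{"role":"user","content":"no image"}]}""",
    "not json"
};
foreach (string l in sampleLines)
    Console.WriteLine(IsValidLine(l));
```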

### Evaluation Dataset: Examples of Short Descriptions

<https://www.nga.gov/visit/accessibility/collection-image-descriptions/short-description-examples.html>

This page shows each image followed by its artist, title (linked to the object page), date, and image description. Each `<img>` tag also has an `alt` attribute containing the same text as the image description. That makes scraping straightforward: load the page, select every `figure` element, take its child `img` element, and build a user/assistant message pair from the image source and alt text.

In [5]:
using HtmlAgilityPack;

string shortDescriptionExamplesPage = "https://www.nga.gov/visit/accessibility/collection-image-descriptions/short-description-examples.html";

var html = await httpClient.GetStringAsync(shortDescriptionExamplesPage);
var doc = new HtmlDocument();
doc.LoadHtml(html);

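// Take the first <img> inside each <figure>, skipping figures without one
var imgNodes = doc.DocumentNode.Descendants("figure")
    .Select(fig => fig.Descendants("img").FirstOrDefault())
    .Where(img => img is not null);

// Descriptions already used for training. Comparing parsed text avoids missing matches
// that differ only by JSON escaping in the raw response.
var trainingTexts = searchResponse.Results
    .Where(r => !string.IsNullOrWhiteSpace(r.AssistiveText))
    .Select(r => r.AssistiveText.Trim())
    .ToHashSet();

List<TrainingData> evaluationDataset = new();

foreach (var img in imgNodes)
{
    // Decode HTML entities (e.g. &quot;) so the text is plain prose
    string alt = HtmlEntity.DeEntitize(img.GetAttributeValue("alt", "")).Trim();
    string src = img.GetAttributeValue("src", "");

    // Don't include if there is no alt text or image source
    if (string.IsNullOrWhiteSpace(alt) || string.IsNullOrWhiteSpace(src))
        continue;

    // Don't include if it's in the training set
    if (trainingTexts.Contains(alt))
        continue;

    // Resolve relative paths against the page so the image URL is absolute and fetchable
    string imageUrl = new Uri(new Uri(shortDescriptionExamplesPage), src).ToString();

    Message[] messages =
    [
        new SystemMessage("You describe art for low vision folks. You will be given an image. Describe clearly and concisely only what is visible in the image. Avoid stylistic comparisons and suggestions."),
        new UserMessage([ new ImageContent(new ImageUrl(imageUrl)) ]),
        new AssistantMessage(alt)
    ];
    evaluationDataset.Add(new TrainingData(messages));
}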

// Write to file, one JSON object per line (JSONL), overwriting any previous run

string outputFile = "training-data/evaluation.jsonl";
File.WriteAllLines(outputFile, evaluationDataset.Select(data => JsonSerializer.Serialize(data)));
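The cell above filters evaluation examples against the training descriptions. As an optional cross-check on the written files themselves, this sketch compares two JSONL sets by image URL and by a normalized description (whitespace collapsed, case ignored). `Extract`, `Normalize`, `FindOverlaps`, and `Line` are hypothetical helpers, and the normalization rule is an assumption about what counts as a duplicate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.RegularExpressions;

// Pulls (image URL, assistant text) from one line in the three-turn format built above.
static (string Image, string Text) Extract(string line)
{
    using JsonDocument doc = JsonDocument.Parse(line);
    JsonElement messages = doc.RootElement.GetProperty("messages");
    string image = messages[1].GetProperty("content")[0]
        .GetProperty("image_url").GetProperty("url").GetString() ?? "";
    string text = messages[2].GetProperty("content").GetString() ?? "";
    return (image, text);
}

// Collapse whitespace and ignore case so trivial formatting differences still match.
static string Normalize(string s) => Regex.Replace(s, @"\s+", " ").Trim().ToLowerInvariant();

// Evaluation lines whose image URL or normalized description also appears in training.
static List<string> FindOverlaps(IEnumerable<string> trainingLines, IEnumerable<string> evaluationLines)
{
    var train = trainingLines.Select(Extract).ToList();
    var images = train.Select(t => t.Image).ToHashSet();
    var texts = train.Select(t => Normalize(t.Text)).ToHashSet();
    return evaluationLines
        .Where(line => { var e = Extract(line); return images.Contains(e.Image) || texts.Contains(Normalize(e.Text)); })
        .ToList();
}

// Hypothetical helper that builds a line in the same shape as the notebook's output.
static string Line(string image, string text) => JsonSerializer.Serialize(new
{
    messages = new object[]
    {
        new { role = "system", content = "s" },
        new { role = "user", content = new object[] { new { type = "image_url", image_url = new { url = image } } } },
        new { role = "assistant", content = text }
    }
});

string[] training = { Line("https://example.com/a.jpg", "A red barn in snow.") };
string[] evaluation =
{
    Line("https://example.com/b.jpg", "a red  barn in snow."), // same text, different spacing and case
    Line("https://example.com/c.jpg", "A bowl of pears.")
};
var overlaps = FindOverlaps(training, evaluation);
Console.WriteLine($"{overlaps.Count} overlapping evaluation example(s)");
```

On the real files, the inputs would be `File.ReadLines("training-data/training.jsonl")` and `File.ReadLines("training-data/evaluation.jsonl")`.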

## Datasets

### Azure OpenAI Fine-tuning

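- The system message must be the same across training, validation, and general usage.
- Example file format (text-only chat; the datasets built above instead give the user message an array of content parts containing an `image_url`):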
    ```json
    {"messages": [{"role": "system", "content": "You are an Xbox customer support agent whose primary goal is to help users with issues they are experiencing with their Xbox devices. You are friendly and concise. You only provide factual answers to queries, and do not provide answers that are not related to Xbox."}, {"role": "user", "content": "Is Xbox better than PlayStation?"}, {"role": "assistant", "content": "I apologize, but I cannot provide personal opinions. My primary job is to assist you with any issues related to your Xbox device. Do you have any Xbox-related issues that need addressing?"}]}
    {"messages": [{"role": "system", "content": "You are an Xbox customer support agent whose primary goal is to help users with issues they are experiencing with their Xbox devices. You are friendly and concise. You only provide factual answers to queries, and do not provide answers that are not related to Xbox."}, {"role": "user", "content": "Can I play PC games on my Xbox?"}, {"role": "assistant", "content": "While Xbox consoles don't natively support PC games, they can play an array of Xbox games, many of which are also available on PC. Can I assist you with finding games compatible with your Xbox device?"}]}
    {"messages": [{"role": "system", "content": "You are an Xbox customer support agent whose primary goal is to help users with issues they are experiencing with their Xbox devices. You are friendly and concise. You only provide factual answers to queries, and do not provide answers that are not related to Xbox."}, {"role": "user", "content": "I'm having trouble connecting my Xbox to the Wi-Fi."}, {"role": "assistant", "content": "No worries, let's go through the network settings on your Xbox. Can you please tell me what happens when you try to connect it to the Wi-Fi?"}]}
    ```
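Because the system message must match across files, a quick check is to collect the distinct system messages from every line. This sketch assumes the `{"messages": [...]}` layout shown above; `DistinctSystemMessages` is a hypothetical helper, and the inline sample lines stand in for the real files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Returns the distinct system messages found across JSONL lines.
static HashSet<string> DistinctSystemMessages(IEnumerable<string> jsonlLines)
{
    var found = new HashSet<string>();
    foreach (string line in jsonlLines)
    {
        if (string.IsNullOrWhiteSpace(line))
            continue; // tolerate a trailing blank line
        using JsonDocument doc = JsonDocument.Parse(line);
        foreach (JsonElement m in doc.RootElement.GetProperty("messages").EnumerateArray())
        {
            if (m.GetProperty("role").GetString() == "system")
                found.Add(m.GetProperty("content").GetString() ?? "");
        }
    }
    return found;
}

// For the real files, pass the concatenated lines of training.jsonl and evaluation.jsonl;
// a count of 1 means every example shares the same system message.
string[] lines =
{
    """{"messages":[{"role":"system","content":"Describe the image."},{"role":"assistant","content":"a"}]}""",
    """{"messages":[{"role":"system","content":"Describe the image."},{"role":"assistant","content":"b"}]}""",
    """{"messages":[{"role":"system","content":"Describe the image!"},{"role":"assistant","content":"c"}]}"""
};
HashSet<string> prompts = DistinctSystemMessages(lines);
Console.WriteLine(prompts.Count == 1 ? "Consistent system message" : $"{prompts.Count} different system messages");
```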

### Hugging Face Fine-tuning

- Uses the Hugging Face `datasets` format; see [./nga-data-converter.ipynb](./nga-data-converter.ipynb) for code that transforms the `.jsonl` files, downloads the image data, and creates a dataset compatible with local training.