Build a document classifier

This sample demonstrates how to build a document classifier with your own data. A document classifier can accurately detect and identify documents you process within your application.

You can also create document classifiers through a graphical user interface such as the Document Intelligence Studio.

To get started, you'll need a Cognitive Services resource or a Form Recognizer resource. See the README for prerequisites and instructions.

Creating a DocumentModelAdministrationClient

To create a new DocumentModelAdministrationClient, you need the endpoint and credentials from your resource. In the sample below, you'll use a Form Recognizer API key credential by creating an AzureKeyCredential object, which lets you update the API key later, if needed, without creating a new client.

You can set endpoint and apiKey based on an environment variable, a configuration setting, or any way that works for your application.

using System;
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;

string endpoint = "<endpoint>";
string apiKey = "<apiKey>";
var credential = new AzureKeyCredential(apiKey);
var client = new DocumentModelAdministrationClient(new Uri(endpoint), credential);
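
As a minimal sketch of the environment-variable option mentioned above, the helper below reads a required value and fails early when it is missing. The class name ResourceSettings and the variable names FORM_RECOGNIZER_ENDPOINT and FORM_RECOGNIZER_API_KEY are assumptions made for illustration; they are not defined by the SDK.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Azure SDK.
public static class ResourceSettings
{
    // Assumed variable names; use whatever your deployment defines.
    public const string EndpointVariable = "FORM_RECOGNIZER_ENDPOINT";
    public const string ApiKeyVariable = "FORM_RECOGNIZER_API_KEY";

    // Returns the value of a required environment variable,
    // or throws if it is unset or blank.
    public static string GetRequired(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{name}' is not set.");
        }
        return value;
    }
}
```

With a helper like this, the placeholder strings in the snippet above could be replaced with ResourceSettings.GetRequired(ResourceSettings.EndpointVariable) and ResourceSettings.GetRequired(ResourceSettings.ApiKeyVariable), so a missing setting surfaces before any client is created.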

Build a document classifier

Document classifiers are trained with your own data, so they're tailored to your documents.

When the build operation completes, it returns a DocumentClassifierDetails instance that lists the document types the classifier can recognize.

// For this sample, you can use the training documents found in the `classifierTrainingFiles` folder.
// Upload the documents to your storage container and then generate a container SAS URL. Note
// that a container URI without SAS is accepted only when the container is public or has a
// managed identity configured.
//
// For instructions to set up documents for training in an Azure Blob Storage Container, please see:
// https://aka.ms/azsdk/formrecognizer/buildclassifiermodel

Uri trainingFilesUri = new Uri("<trainingFilesUri>");

// Reuses the `client` created in the section above; declaring it again in the
// same scope would not compile.

// Each source selects the training blobs under one prefix in the container.
var sourceA = new BlobContentSource(trainingFilesUri) { Prefix = "IRS-1040-A/train" };
var sourceB = new BlobContentSource(trainingFilesUri) { Prefix = "IRS-1040-B/train" };

var documentTypes = new Dictionary<string, ClassifierDocumentTypeDetails>()
{
    { "IRS-1040-A", new ClassifierDocumentTypeDetails(sourceA) },
    { "IRS-1040-B", new ClassifierDocumentTypeDetails(sourceB) }
};

// WaitUntil.Completed makes the call return only after the build operation has finished.
BuildDocumentClassifierOperation operation = await client.BuildDocumentClassifierAsync(WaitUntil.Completed, documentTypes);
DocumentClassifierDetails classifier = operation.Value;

Console.WriteLine($"  Classifier Id: {classifier.ClassifierId}");
Console.WriteLine($"  Created on: {classifier.CreatedOn}");

Console.WriteLine("  Document types the classifier can recognize:");
foreach (KeyValuePair<string, ClassifierDocumentTypeDetails> documentType in classifier.DocumentTypes)
{
    Console.WriteLine($"    {documentType.Key}");
}