This sample demonstrates how to build a document classifier with your own data. A document classifier can accurately detect and identify documents you process within your application.
Please note that document classifiers can also be created using a graphical user interface such as the Document Intelligence Studio.
To get started you'll need a Cognitive Services resource or a Form Recognizer resource. See README for prerequisites and instructions.
To create a new DocumentModelAdministrationClient you need the endpoint and credentials from your resource. The sample below uses a Form Recognizer API key credential by creating an AzureKeyCredential object, which lets you update the API key later without creating a new client.
You can set endpoint and apiKey based on an environment variable, a configuration setting, or any way that works for your application.
string endpoint = "<endpoint>";
string apiKey = "<apiKey>";
var credential = new AzureKeyCredential(apiKey);
var client = new DocumentModelAdministrationClient(new Uri(endpoint), credential);
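One way to avoid hard-coding these values is to read them from environment variables. The following is a minimal sketch rather than part of the original sample: the `SampleConfiguration` helper and the variable names `FORM_RECOGNIZER_ENDPOINT` and `FORM_RECOGNIZER_API_KEY` are hypothetical, so substitute whatever your application uses.

```csharp
using System;

// Hypothetical helper for illustration; it is not part of the Azure SDK.
public static class SampleConfiguration
{
    // Returns the value of a required environment variable, or throws a
    // descriptive exception when the variable is missing or blank.
    public static string GetRequiredSetting(string variableName)
    {
        var value = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{variableName}' is not set.");
        }
        return value;
    }
}
```

With this helper, endpoint could be assigned from `SampleConfiguration.GetRequiredSetting("FORM_RECOGNIZER_ENDPOINT")` and apiKey from `SampleConfiguration.GetRequiredSetting("FORM_RECOGNIZER_API_KEY")`. If the key is rotated later, calling `credential.Update("<newApiKey>")` on the existing AzureKeyCredential applies the new key without recreating the client.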
Document classifiers are trained with your own data, so they're tailored to your documents.
After building, a DocumentClassifierDetails instance is returned indicating the document types the classifier will recognize.
// For this sample, you can use the training documents found in the `classifierTrainingFiles` folder.
// Upload the documents to your storage container and then generate a container SAS URL. Note
// that a container URI without SAS is accepted only when the container is public or has a
// managed identity configured.
//
// For instructions to set up documents for training in an Azure Blob Storage Container, please see:
// https://aka.ms/azsdk/formrecognizer/buildclassifiermodel
Uri trainingFilesUri = new Uri("<trainingFilesUri>");
// Reuses the `client` created earlier; declaring it again in the same scope would not compile.
var sourceA = new BlobContentSource(trainingFilesUri) { Prefix = "IRS-1040-A/train" };
var sourceB = new BlobContentSource(trainingFilesUri) { Prefix = "IRS-1040-B/train" };
var documentTypes = new Dictionary<string, ClassifierDocumentTypeDetails>()
{
    { "IRS-1040-A", new ClassifierDocumentTypeDetails(sourceA) },
    { "IRS-1040-B", new ClassifierDocumentTypeDetails(sourceB) }
};
BuildDocumentClassifierOperation operation = await client.BuildDocumentClassifierAsync(WaitUntil.Completed, documentTypes);
DocumentClassifierDetails classifier = operation.Value;
Console.WriteLine($" Classifier Id: {classifier.ClassifierId}");
Console.WriteLine($" Created on: {classifier.CreatedOn}");
Console.WriteLine(" Document types the classifier can recognize:");
foreach (KeyValuePair<string, ClassifierDocumentTypeDetails> documentType in classifier.DocumentTypes)
{
    Console.WriteLine($" {documentType.Key}");
}
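The comments at the top of the snippet above note that a container URI without a SAS is accepted only for public containers or those with a managed identity configured. As a rough pre-flight check before starting a build, the sketch below tests whether a URI carries a SAS signature (the `sig` query parameter). The `TrainingContainerUri` class and its method are hypothetical and not part of the SDK. The check only confirms that a signature parameter is present; it does not verify that the token is valid, unexpired, or grants the permissions the service needs.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; it is not part of the Azure SDK.
public static class TrainingContainerUri
{
    // Returns true when the URI's query string contains a `sig` parameter,
    // which is where a SAS token carries its signature.
    public static bool LooksLikeSasUri(Uri containerUri)
    {
        if (containerUri is null)
        {
            throw new ArgumentNullException(nameof(containerUri));
        }

        return containerUri.Query
            .TrimStart('?')
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
            .Any(part => part.StartsWith("sig=", StringComparison.OrdinalIgnoreCase));
    }
}
```

Calling `TrainingContainerUri.LooksLikeSasUri(trainingFilesUri)` before the build could be used to log a warning when the container is expected to be private but the URI has no SAS attached.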