Guide for migrating to azure-ai-formrecognizer (4.0.0-beta.1 and above) from azure-ai-formrecognizer (3.1.x and lower)

This guide is intended to assist in the migration to azure-ai-formrecognizer (4.0.0-beta.1 and above) from azure-ai-formrecognizer (3.1.x and lower). It focuses on side-by-side comparisons of similar operations in the two package versions.

We assume that you are familiar with the previous SDK, azure-ai-formrecognizer (3.1.x and lower). If you are new to this library, please refer to the SDK README for azure-ai-formrecognizer directly rather than this migration guide.

Migration benefits

A natural question to ask when considering whether to adopt a new version of the library is what the benefits of doing so would be. As Azure Form Recognizer has matured and been embraced by a more diverse group of developers, we have been focused on learning the patterns and practices to best support developer productivity and add value to our customers.

To improve the development experience and address consistent feedback on the Form Recognizer SDK, this new version of the library introduces two new clients, DocumentAnalysisClient and DocumentModelAdministrationClient, that provide unified methods for analyzing documents and support the new features added by the service in API version 2022-08-31 and later.

The table below shows which clients support each API version:

API version    Supported clients
2023-07-31     DocumentAnalysisClient and DocumentModelAdministrationClient
2022-08-31     DocumentAnalysisClient and DocumentModelAdministrationClient
2.1            FormRecognizerClient and FormTrainingClient
2.0            FormRecognizerClient and FormTrainingClient
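As a rough illustration of the table above, the following sketch looks up which client family supports a given service API version. ClientFamilySelector and clientsFor are hypothetical names introduced here for illustration and are not part of the SDK; only the version strings and client names come from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of azure-ai-formrecognizer.
final class ClientFamilySelector {
    private static final Map<String, String> CLIENTS_BY_API_VERSION = new LinkedHashMap<>();

    static {
        String documentClients = "DocumentAnalysisClient and DocumentModelAdministrationClient";
        String formClients = "FormRecognizerClient and FormTrainingClient";
        CLIENTS_BY_API_VERSION.put("2023-07-31", documentClients);
        CLIENTS_BY_API_VERSION.put("2022-08-31", documentClients);
        CLIENTS_BY_API_VERSION.put("2.1", formClients);
        CLIENTS_BY_API_VERSION.put("2.0", formClients);
    }

    // Returns the client pair listed in the table for the given API version.
    static String clientsFor(String apiVersion) {
        String clients = CLIENTS_BY_API_VERSION.get(apiVersion);
        if (clients == null) {
            throw new IllegalArgumentException("Unknown API version: " + apiVersion);
        }
        return clients;
    }

    public static void main(String[] args) {
        System.out.println(clientsFor("2.1"));
        System.out.println(clientsFor("2023-07-31"));
    }
}
```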

The newer Form Recognizer client library also provides the ability to share in some improvements made to the Azure development experience, such as:

  • Unified methods, beginAnalyzeDocument and beginAnalyzeDocumentFromUrl, for analyzing text and structured data from documents. These methods take a modelId parameter that specifies the type of analysis to perform. The new return type, AnalyzeResult, removes the hierarchical dependencies between the previously known FormElements and moves them to a top-level, easily accessible position, such as AnalyzeResult.tables instead of RecognizedForm.pages.tables. The service has also matured to define cross-page elements by using the BoundingRegion model and by specifying the content and span information on document fields.
  • A unified return type DocumentModel indicating the document types the model can analyze and the specific fields it can analyze along with the estimated confidence for each field.
  • The ability to specify a modelId, instead of relying on a generated GUID, when creating, copying, or composing models, along with an optional description. See here for the supported model types.
  • A modified Generate Copy Authorization operation whose response returns the target resource information, so that it can be passed directly to the copy custom model method instead of being supplied by the user.
  • A List Models operation that now returns a paged list of prebuilt models in addition to the custom models that were built successfully. Also, with the getModel() method, users can get the field schema (the field names and types that the model can extract) for the model they specify, including prebuilt models.
  • Added methods for getting/listing operations of the past 24 hours, useful to track the status of model creation/copying operations and any resulting errors.
  • FormRecognizerClient and FormTrainingClient will continue to work targeting API version 2.1 and 2.0.

Note: In July 2023, the Azure Cognitive Services Form Recognizer service was renamed to Azure AI Document Intelligence. Any mention of Form Recognizer or Document Intelligence in the documentation refers to the same Azure service.

Please refer to the README for more information on these new clients.

Important changes

Instantiating clients

In 3.x.x, the FormRecognizerClient and the FormRecognizerAsyncClient are instantiated via the FormRecognizerClientBuilder.

In 4.x.x, we have added the DocumentAnalysisClient and the DocumentAnalysisAsyncClient, instantiated via the DocumentAnalysisClientBuilder. The sync and async operations are split between DocumentAnalysisClient and DocumentAnalysisAsyncClient.

Instantiating FormRecognizerClient client with 3.x.x:

FormRecognizerClient formRecognizerClient = new FormRecognizerClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

Instantiating DocumentAnalysisClient client with 4.x.x:

DocumentAnalysisClient documentAnalysisClient = new DocumentAnalysisClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

Similarly, with 4.x.x, we have added the DocumentModelAdministrationClient and DocumentModelAdministrationAsyncClient, instantiated via the DocumentModelAdministrationClientBuilder. The sync and async operations are split between DocumentModelAdministrationClient and DocumentModelAdministrationAsyncClient.

Instantiating FormTrainingClient client with 3.x.x:

FormTrainingClient formTrainingClient = new FormTrainingClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

Instantiating DocumentModelAdministrationClient client with 4.x.x:

DocumentModelAdministrationClient client =
    new DocumentModelAdministrationClientBuilder()
        .credential(new AzureKeyCredential("{key}"))
        .endpoint("{endpoint}")
        .buildClient();

Analyze documents

With 4.x.x, the unified methods beginAnalyzeDocument and beginAnalyzeDocumentFromUrl:

  • accept a string modelId, which can be any of the prebuilt model IDs or a custom model ID.
  • return an AnalyzeResult model that exposes document elements, such as key-value pairs, entities, tables, and document fields and values, at the top level of the returned model. By contrast, the previously returned RecognizedForm model included hierarchical relationships between FormElements; for instance, tables were an element of a FormPage and not a top-level element.
  • provide the functionality of beginRecognizeCustomForms, beginRecognizeContent, beginRecognizeReceipts, beginRecognizeInvoices, beginRecognizeIdentityDocuments, and beginRecognizeBusinessCards from the previous (azure-ai-formrecognizer 3.1.x and lower) package versions.
  • accept a unified AnalyzeDocumentOptions to specify pages and locale information for the outgoing request.
  • do not support the includeFieldElements parameter; with the DocumentAnalysisClient, text details are automatically included with API version 2022-08-31 and later.
  • do not take a readingOrder parameter, because the service uses natural reading order for the returned data.
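When migrating call sites, it can help to keep a lookup from each 3.x method to the model ID that the unified methods expect. The sketch below is a hypothetical helper (LegacyMethodToModelId and modelIdFor are not SDK names). The IDs prebuilt-receipt and prebuilt-layout appear in this guide; prebuilt-invoice, prebuilt-idDocument, and prebuilt-businessCard are assumed from the service's published prebuilt model list and should be checked against the API version you target.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical migration lookup, not part of azure-ai-formrecognizer.
final class LegacyMethodToModelId {
    private static final Map<String, String> PREBUILT_MODEL_IDS = new LinkedHashMap<>();

    static {
        PREBUILT_MODEL_IDS.put("beginRecognizeContent", "prebuilt-layout");
        PREBUILT_MODEL_IDS.put("beginRecognizeReceipts", "prebuilt-receipt");
        // The three IDs below are assumptions; verify them for your API version.
        PREBUILT_MODEL_IDS.put("beginRecognizeInvoices", "prebuilt-invoice");
        PREBUILT_MODEL_IDS.put("beginRecognizeIdentityDocuments", "prebuilt-idDocument");
        PREBUILT_MODEL_IDS.put("beginRecognizeBusinessCards", "prebuilt-businessCard");
    }

    // Returns the modelId to pass to beginAnalyzeDocument for a given 3.x method name.
    // For beginRecognizeCustomForms the caller's own custom model ID is used unchanged.
    static String modelIdFor(String legacyMethod, String customModelId) {
        if ("beginRecognizeCustomForms".equals(legacyMethod)) {
            return customModelId;
        }
        String modelId = PREBUILT_MODEL_IDS.get(legacyMethod);
        if (modelId == null) {
            throw new IllegalArgumentException("No mapping for method: " + legacyMethod);
        }
        return modelId;
    }

    public static void main(String[] args) {
        System.out.println(modelIdFor("beginRecognizeReceipts", null));
        System.out.println(modelIdFor("beginRecognizeCustomForms", "my-build-model"));
    }
}
```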

Using a prebuilt model

  • In 3.x.x, the beginRecognizeReceipts and beginRecognizeReceiptsFromUrl methods were used to analyze receipts.
  • In 4.x.x, beginRecognizeReceipts and beginRecognizeReceiptsFromUrl have been replaced with beginAnalyzeDocument and beginAnalyzeDocumentFromUrl, respectively.

NOTE: The beginAnalyzeDocument and beginAnalyzeDocumentFromUrl methods apply to all prebuilt models listed here.

Analyze receipt using 3.x.x beginRecognizeReceiptsFromUrl:

String receiptUrl = ""
    + "/contoso-allinone.jpg";
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> syncPoller =
    formRecognizerClient.beginRecognizeReceiptsFromUrl(receiptUrl);
List<RecognizedForm> receiptPageResults = syncPoller.getFinalResult();

for (int i = 0; i < receiptPageResults.size(); i++) {
    RecognizedForm recognizedForm = receiptPageResults.get(i);
    Map<String, FormField> recognizedFields = recognizedForm.getFields();
    System.out.printf("----------- Recognizing receipt info for page %d -----------%n", i);
    FormField merchantNameField = recognizedFields.get("MerchantName");
    if (merchantNameField != null) {
        if (FieldValueType.STRING == merchantNameField.getValue().getValueType()) {
            String merchantName = merchantNameField.getValue().asString();
            System.out.printf("Merchant Name: %s, confidence: %.2f%n",
                merchantName, merchantNameField.getConfidence());
        }
    }

    FormField merchantPhoneNumberField = recognizedFields.get("MerchantPhoneNumber");
    if (merchantPhoneNumberField != null) {
        if (FieldValueType.PHONE_NUMBER == merchantPhoneNumberField.getValue().getValueType()) {
            String merchantPhoneNumber = merchantPhoneNumberField.getValue().asPhoneNumber();
            System.out.printf("Merchant Phone number: %s, confidence: %.2f%n",
                merchantPhoneNumber, merchantPhoneNumberField.getConfidence());
        }
    }

    FormField transactionDateField = recognizedFields.get("TransactionDate");
    if (transactionDateField != null) {
        if (FieldValueType.DATE == transactionDateField.getValue().getValueType()) {
            LocalDate transactionDate = transactionDateField.getValue().asDate();
            System.out.printf("Transaction Date: %s, confidence: %.2f%n",
                transactionDate, transactionDateField.getConfidence());
        }
    }

    FormField receiptItemsField = recognizedFields.get("Items");
    if (receiptItemsField != null) {
        System.out.printf("Receipt Items: %n");
        if (FieldValueType.LIST == receiptItemsField.getValue().getValueType()) {
            List<FormField> receiptItems = receiptItemsField.getValue().asList();
            // each receipt item is a map of field name to field
            receiptItems.stream()
                .filter(receiptItem -> FieldValueType.MAP == receiptItem.getValue().getValueType())
                .map(formField -> formField.getValue().asMap())
                .forEach(formFieldMap -> formFieldMap.forEach((key, formField) -> {
                    if ("Quantity".equals(key)) {
                        if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
                            Float quantity = formField.getValue().asFloat();
                            System.out.printf("Quantity: %f, confidence: %.2f%n",
                                quantity, formField.getConfidence());
                        }
                    }
                }));
        }
    }
}

Analyze receipt data using 4.x.x beginAnalyzeDocumentFromUrl:

String receiptUrl = ""
    + "/azure-ai-formrecognizer/src/samples/resources/sample-documents/receipts/contoso-allinone.jpg";

SyncPoller<OperationResult, AnalyzeResult> analyzeReceiptPoller =
    documentAnalysisClient.beginAnalyzeDocumentFromUrl("prebuilt-receipt", receiptUrl);

AnalyzeResult receiptResults = analyzeReceiptPoller.getFinalResult();

for (int i = 0; i < receiptResults.getDocuments().size(); i++) {
    AnalyzedDocument analyzedReceipt = receiptResults.getDocuments().get(i);
    Map<String, DocumentField> receiptFields = analyzedReceipt.getFields();
    System.out.printf("----------- Analyzing receipt info %d -----------%n", i);
    DocumentField merchantNameField = receiptFields.get("MerchantName");
    if (merchantNameField != null) {
        if (DocumentFieldType.STRING == merchantNameField.getType()) {
            String merchantName = merchantNameField.getValueAsString();
            System.out.printf("Merchant Name: %s, confidence: %.2f%n",
                merchantName, merchantNameField.getConfidence());
        }
    }

    DocumentField merchantPhoneNumberField = receiptFields.get("MerchantPhoneNumber");
    if (merchantPhoneNumberField != null) {
        if (DocumentFieldType.PHONE_NUMBER == merchantPhoneNumberField.getType()) {
            String merchantPhoneNumber = merchantPhoneNumberField.getValueAsPhoneNumber();
            System.out.printf("Merchant Phone number: %s, confidence: %.2f%n",
                merchantPhoneNumber, merchantPhoneNumberField.getConfidence());
        }
    }

    DocumentField transactionDateField = receiptFields.get("TransactionDate");
    if (transactionDateField != null) {
        if (DocumentFieldType.DATE == transactionDateField.getType()) {
            LocalDate transactionDate = transactionDateField.getValueAsDate();
            System.out.printf("Transaction Date: %s, confidence: %.2f%n",
                transactionDate, transactionDateField.getConfidence());
        }
    }

    DocumentField receiptItemsField = receiptFields.get("Items");
    if (receiptItemsField != null) {
        System.out.printf("Receipt Items: %n");
        if (DocumentFieldType.LIST == receiptItemsField.getType()) {
            List<DocumentField> receiptItems = receiptItemsField.getValueAsList();
            // each receipt item is a map of field name to field
            receiptItems.stream()
                .filter(receiptItem -> DocumentFieldType.MAP == receiptItem.getType())
                .map(documentField -> documentField.getValueAsMap())
                .forEach(documentFieldMap -> documentFieldMap.forEach((key, documentField) -> {
                    if ("Name".equals(key)) {
                        if (DocumentFieldType.STRING == documentField.getType()) {
                            String name = documentField.getValueAsString();
                            System.out.printf("Name: %s, confidence: %.2f%n",
                                name, documentField.getConfidence());
                        }
                    }
                    if ("Quantity".equals(key)) {
                        if (DocumentFieldType.DOUBLE == documentField.getType()) {
                            Double quantity = documentField.getValueAsDouble();
                            System.out.printf("Quantity: %f, confidence: %.2f%n",
                                quantity, documentField.getConfidence());
                        }
                    }
                }));
        }
    }
}

Using a layout model

Analyze layout using 3.x.x beginRecognizeContent:

// recognize form content using file input stream
File form = new File("local/file_path/filename.png");
byte[] fileContent = Files.readAllBytes(form.toPath());
InputStream inputStream = new ByteArrayInputStream(fileContent);

SyncPoller<FormRecognizerOperationResult, List<FormPage>> recognizeContentPoller =
    formRecognizerClient.beginRecognizeContent(inputStream, form.length());

List<FormPage> contentPageResults = recognizeContentPoller.getFinalResult();

for (int i = 0; i < contentPageResults.size(); i++) {
    FormPage formPage = contentPageResults.get(i);
    System.out.printf("----Recognizing content info for page %d ----%n", i);
    // Page dimensions
    System.out.printf("Has width: %f and height: %f, measured with unit: %s.%n", formPage.getWidth(),
        formPage.getHeight(),
        formPage.getUnit());
    // Table information (tables are nested under each page in 3.x.x)
    formPage.getTables().forEach(formTable -> {
        System.out.printf("Table has %d rows and %d columns.%n", formTable.getRowCount(),
            formTable.getColumnCount());
        formTable.getCells().forEach(formTableCell ->
            System.out.printf("Cell has text %s.%n", formTableCell.getText()));
    });
    // Selection Mark
    formPage.getSelectionMarks().forEach(selectionMark -> System.out.printf(
        "Page: %s, Selection mark is '%s' within bounding box %s has a confidence score %.2f.%n",
        selectionMark.getPageNumber(), selectionMark.getState(), selectionMark.getBoundingBox().toString(),
        selectionMark.getConfidence()));
}

Analyze layout using 4.x.x beginAnalyzeDocument:

// analyze document layout using file input stream
File layoutDocument = new File("local/file_path/filename.png");
Path filePath = layoutDocument.toPath();
BinaryData layoutDocumentData = BinaryData.fromFile(filePath, (int) layoutDocument.length());

SyncPoller<OperationResult, AnalyzeResult> analyzeLayoutResultPoller =
    documentAnalysisClient.beginAnalyzeDocument("prebuilt-layout", layoutDocumentData);

AnalyzeResult analyzeLayoutResult = analyzeLayoutResultPoller.getFinalResult();

// pages
analyzeLayoutResult.getPages().forEach(documentPage -> {
    System.out.printf("Page has width: %.2f and height: %.2f, measured with unit: %s%n",
        documentPage.getWidth(),
        documentPage.getHeight(),
        documentPage.getUnit());

    // lines
    documentPage.getLines().forEach(documentLine ->
        System.out.printf("Line '%s' is within a bounding box %s.%n",
            documentLine.getContent(),
            documentLine.getBoundingPolygon().toString()));

    // selection marks
    documentPage.getSelectionMarks().forEach(documentSelectionMark ->
        System.out.printf("Selection mark is '%s' and is within a bounding box %s with confidence %.2f.%n",
            documentSelectionMark.getSelectionMarkState().toString(),
            documentSelectionMark.getBoundingPolygon().toString(),
            documentSelectionMark.getConfidence()));
});

// tables (top-level in 4.x.x rather than nested under each page)
List<DocumentTable> tables = analyzeLayoutResult.getTables();
for (int i = 0; i < tables.size(); i++) {
    DocumentTable documentTable = tables.get(i);
    System.out.printf("Table %d has %d rows and %d columns.%n", i, documentTable.getRowCount(),
        documentTable.getColumnCount());
    documentTable.getCells().forEach(documentTableCell -> {
        System.out.printf("Cell '%s', has row index %d and column index %d.%n", documentTableCell.getContent(),
            documentTableCell.getRowIndex(), documentTableCell.getColumnIndex());
    });
}
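
Code written against 3.x.x often walked tables page by page through FormPage.getTables(). With tables now at the top level of the result, one way to recover a per-page view is to group them by the page number of their bounding region. The sketch below uses small stand-in classes (Region and Table) instead of the SDK's BoundingRegion and DocumentTable so that it runs on its own; the class, field, and method names are assumptions for illustration only, and a table that spans several pages is filed under the first page it appears on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; Region and Table are stand-ins, not SDK types.
final class TablesByPage {
    // Stand-in for a bounding region: only the page number matters here.
    static final class Region {
        final int pageNumber;

        Region(int pageNumber) {
            this.pageNumber = pageNumber;
        }
    }

    // Stand-in for a top-level table that carries its bounding regions.
    static final class Table {
        final String label;
        final List<Region> regions;

        Table(String label, Region... regions) {
            this.label = label;
            this.regions = Arrays.asList(regions);
        }
    }

    // Groups table labels by the page of each table's first bounding region.
    static Map<Integer, List<String>> groupByFirstPage(List<Table> tables) {
        Map<Integer, List<String>> byPage = new TreeMap<>();
        for (Table table : tables) {
            if (table.regions.isEmpty()) {
                continue; // no location information, so the table cannot be placed
            }
            int page = table.regions.get(0).pageNumber;
            byPage.computeIfAbsent(page, k -> new ArrayList<>()).add(table.label);
        }
        return byPage;
    }

    public static void main(String[] args) {
        List<Table> tables = Arrays.asList(
            new Table("A", new Region(1)),
            new Table("B", new Region(2), new Region(3)), // cross-page table
            new Table("C", new Region(1)));
        System.out.println(groupByFirstPage(tables)); // prints {1=[A, C], 2=[B]}
    }
}
```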

Using a custom model

Analyze custom document using 3.x.x beginRecognizeCustomFormsFromUrl:

String formUrl = "{form_url}";
String modelId = "{custom_trained_model_id}";
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> recognizeFormPoller =
    formRecognizerClient.beginRecognizeCustomFormsFromUrl(modelId, formUrl);

List<RecognizedForm> recognizedForms = recognizeFormPoller.getFinalResult();

for (int i = 0; i < recognizedForms.size(); i++) {
    RecognizedForm form = recognizedForms.get(i);
    System.out.printf("----------- Recognized custom form info for page %d -----------%n", i);
    System.out.printf("Form type: %s%n", form.getFormType());
    System.out.printf("Form type confidence: %.2f%n", form.getFormTypeConfidence());
    form.getFields().forEach((label, formField) ->
        System.out.printf("Field %s has value %s with confidence score of %f.%n", label,
            formField.getValueData().getText(),
            formField.getConfidence()));
}

Analyze custom document using 4.x.x beginAnalyzeDocumentFromUrl:

String documentUrl = "{document-url}";
String modelId = "{custom-built-model-ID}";
SyncPoller<OperationResult, AnalyzeResult> analyzeDocumentPoller =
    documentAnalysisClient.beginAnalyzeDocumentFromUrl(modelId, documentUrl);

AnalyzeResult analyzeResult = analyzeDocumentPoller.getFinalResult();

for (int i = 0; i < analyzeResult.getDocuments().size(); i++) {
    final AnalyzedDocument analyzedDocument = analyzeResult.getDocuments().get(i);
    System.out.printf("----------- Analyzing custom document %d -----------%n", i);
    System.out.printf("Analyzed document has doc type %s with confidence : %.2f%n",
        analyzedDocument.getDocType(), analyzedDocument.getConfidence());
    analyzedDocument.getFields().forEach((key, documentField) -> {
        System.out.printf("Document Field content: %s%n", documentField.getContent());
        System.out.printf("Document Field confidence: %.2f%n", documentField.getConfidence());
        System.out.printf("Document Field Type: %s%n", documentField.getType());
        System.out.printf("Document Field found within bounding region: %s%n",
            documentField.getBoundingRegions().toString());
    });
}

analyzeResult.getPages().forEach(documentPage -> {
    System.out.printf("Page has width: %.2f and height: %.2f, measured with unit: %s%n",
        documentPage.getWidth(),
        documentPage.getHeight(),
        documentPage.getUnit());

    // lines
    documentPage.getLines().forEach(documentLine ->
        System.out.printf("Line '%s' is within a bounding box %s.%n",
            documentLine.getContent(),
            documentLine.getBoundingPolygon().toString()));

    // words
    documentPage.getWords().forEach(documentWord ->
        System.out.printf("Word '%s' has a confidence score of %.2f.%n",
            documentWord.getContent(),
            documentWord.getConfidence()));
});

// tables
List<DocumentTable> tables = analyzeResult.getTables();
for (int i = 0; i < tables.size(); i++) {
    DocumentTable documentTable = tables.get(i);
    System.out.printf("Table %d has %d rows and %d columns.%n", i, documentTable.getRowCount(),
        documentTable.getColumnCount());
    documentTable.getCells().forEach(documentTableCell -> {
        System.out.printf("Cell '%s', has row index %d and column index %d.%n",
            documentTableCell.getContent(),
            documentTableCell.getRowIndex(), documentTableCell.getColumnIndex());
    });
}

Analyzing general prebuilt document types with 4.x.x:

NOTE: Analyzing a document with the prebuilt-document model replaces training without labels in version 3.1.x of the library.

String documentUrl = "{document-url}";
String modelId = "prebuilt-document";
SyncPoller<OperationResult, AnalyzeResult> analyzeDocumentPoller =
    documentAnalysisClient.beginAnalyzeDocumentFromUrl(modelId, documentUrl);

AnalyzeResult analyzeResult = analyzeDocumentPoller.getFinalResult();

for (int i = 0; i < analyzeResult.getDocuments().size(); i++) {
    final AnalyzedDocument analyzedDocument = analyzeResult.getDocuments().get(i);
    System.out.printf("----------- Analyzing document %d -----------%n", i);
    System.out.printf("Analyzed document has doc type %s with confidence : %.2f%n",
        analyzedDocument.getDocType(), analyzedDocument.getConfidence());
}

analyzeResult.getPages().forEach(documentPage -> {
    System.out.printf("Page has width: %.2f and height: %.2f, measured with unit: %s%n",
        documentPage.getWidth(),
        documentPage.getHeight(),
        documentPage.getUnit());

    // lines
    documentPage.getLines().forEach(documentLine ->
        System.out.printf("Line '%s' is within a bounding box %s.%n",
            documentLine.getContent(),
            documentLine.getBoundingPolygon().toString()));

    // words
    documentPage.getWords().forEach(documentWord ->
        System.out.printf("Word '%s' has a confidence score of %.2f.%n",
            documentWord.getContent(),
            documentWord.getConfidence()));
});

// tables
List<DocumentTable> tables = analyzeResult.getTables();
for (int i = 0; i < tables.size(); i++) {
    DocumentTable documentTable = tables.get(i);
    System.out.printf("Table %d has %d rows and %d columns.%n", i, documentTable.getRowCount(),
        documentTable.getColumnCount());
    documentTable.getCells().forEach(documentTableCell -> {
        System.out.printf("Cell '%s', has row index %d and column index %d.%n",
            documentTableCell.getContent(),
            documentTableCell.getRowIndex(), documentTableCell.getColumnIndex());
    });
}

// Key-value
analyzeResult.getKeyValuePairs().forEach(documentKeyValuePair -> {
    System.out.printf("Key content: %s%n", documentKeyValuePair.getKey().getContent());
    System.out.printf("Key content bounding region: %s%n",
        documentKeyValuePair.getKey().getBoundingRegions().toString());

    System.out.printf("Value content: %s%n", documentKeyValuePair.getValue().getContent());
    System.out.printf("Value content bounding region: %s%n", documentKeyValuePair.getValue().getBoundingRegions().toString());
});

Build a custom document analysis model

  • In 3.x.x, creating a custom model with the beginTraining method required specifying useTrainingLabels to indicate whether to use labeled data.
  • In 4.x.x, the new general document model (prebuilt-document) replaces the train-without-labels functionality from 3.x.x; it extracts entities, key-value pairs, and layout from a document. Custom models built with the beginBuildDocumentModel method are otherwise always trained on labeled data.

Train a custom model using 3.x.x beginTraining:

String trainingFilesUrl = "{SAS_URL_of_your_container_in_blob_storage}";
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller =
    formTrainingClient.beginTraining(trainingFilesUrl,
        false, // useTrainingLabels: train without labels
        new TrainingOptions()
            .setModelName("my model trained without labels"),
        Context.NONE);

CustomFormModel customFormModel = trainingPoller.getFinalResult();

// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model name given by user: %s%n", customFormModel.getModelName());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());

System.out.println("Recognized Fields:");
// Loop through the submodels, which contain the fields they were trained on.
// Because the training documents are unlabeled, the fields are still grouped but have no label.
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
    System.out.printf("Submodel Id: %s%n", customFormSubmodel.getModelId());
    // Since the training data is unlabeled, we are unable to return the accuracy of this model
    customFormSubmodel.getFields().forEach((field, customFormModelField) ->
        System.out.printf("Field: %s Field Label: %s%n",
            field, customFormModelField.getLabel()));
});

customFormModel.getTrainingDocuments().forEach(trainingDocumentInfo -> {
    System.out.printf("Document name: %s%n", trainingDocumentInfo.getName());
    System.out.printf("Document status: %s%n", trainingDocumentInfo.getStatus());
    System.out.printf("Document page count: %d%n", trainingDocumentInfo.getPageCount());
    if (!trainingDocumentInfo.getErrors().isEmpty()) {
        System.out.println("Document Errors:");
        trainingDocumentInfo.getErrors().forEach(documentModelOperationError ->
            System.out.printf("Error code %s, Error message: %s%n", documentModelOperationError.getErrorCode(),
                documentModelOperationError.getMessage()));
    }
});

Build a custom document model using 4.x.x beginBuildDocumentModel:

// Build custom document analysis model
// The shared access signature (SAS) URL of your Azure Blob Storage container with your forms.
String blobContainerUrl = "{SAS_URL_of_your_container_in_blob_storage}";
String prefix = "{blob_name_prefix}";
// client is the DocumentModelAdministrationClient instantiated earlier in this guide
SyncPoller<OperationResult, DocumentModelDetails> buildOperationPoller =
    client.beginBuildDocumentModel(blobContainerUrl,
        DocumentModelBuildMode.TEMPLATE,
        prefix,
        new BuildDocumentModelOptions().setModelId("my-build-model").setDescription("model desc"),
        Context.NONE);

DocumentModelDetails documentModelDetails = buildOperationPoller.getFinalResult();

// Model Info
System.out.printf("Model ID: %s%n", documentModelDetails.getModelId());
System.out.printf("Model Description: %s%n", documentModelDetails.getDescription());
System.out.printf("Model created on: %s%n%n", documentModelDetails.getCreatedOn());
documentModelDetails.getDocumentTypes().forEach((key, documentTypeDetails) -> {
    System.out.printf("Document type: %s%n", key);
    documentTypeDetails.getFieldSchema().forEach((name, documentFieldSchema) -> {
        System.out.printf("Document field: %s%n", name);
        System.out.printf("Document field type: %s%n", documentFieldSchema.getType().toString());
        System.out.printf("Document field confidence: %.2f%n", documentTypeDetails.getFieldConfidence().get(name));
    });
});

Manage models

In 3.x.x, listing models with the listCustomModels method returned only custom trained models.

With 4.x.x, the list models operation, listModels:

  • returns a paged list of prebuilt models in addition to custom models.
  • no longer includes submodels; instead, a model can analyze different document types.
  • returns only custom models that were built successfully. Unsuccessful model operations can be viewed with the get and list operation methods (note that document model operation data persists for only 24 hours).

In version 3.1.x of the library, models that had not succeeded were still created, had to be deleted by the user, and were returned in the list models response.
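
Because the listing now mixes prebuilt and custom models, callers that only care about their own models may want to filter the results. The following is a minimal sketch, assuming prebuilt model IDs can be recognized by a prebuilt- prefix (as with prebuilt-receipt, prebuilt-layout, and prebuilt-document in this guide). It works on plain ID strings rather than the SDK's paged response, and ModelIdFilter and customOnly are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of azure-ai-formrecognizer.
final class ModelIdFilter {
    // Assumption: every prebuilt model ID starts with this prefix.
    static final String PREBUILT_PREFIX = "prebuilt-";

    // Keeps only the IDs that do not look like prebuilt models, preserving order.
    static List<String> customOnly(List<String> modelIds) {
        return modelIds.stream()
            .filter(id -> !id.startsWith(PREBUILT_PREFIX))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> listed = Arrays.asList(
            "prebuilt-receipt", "my-build-model", "prebuilt-layout", "invoices-v2");
        System.out.println(customOnly(listed)); // prints [my-build-model, invoices-v2]
    }
}
```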

Additional samples

