- Added property `QueryFields` to `AnalyzeDocumentOptions` to support field extraction without the need for added training.
- Added property `Features` to `AnalyzeDocumentOptions` to support add-on capabilities.
- Added properties `SimilarFontFamily`, `FontStyle`, `FontWeight`, `Color`, and `BackgroundColor` to `DocumentStyle`. These properties can only be populated when `DocumentAnalysisFeature.OcrFont` is enabled.
- Added properties `Annotations`, `Barcodes`, `Formulas`, `Images`, and `Kind` to `DocumentPage`. `Formulas` can only be populated when `DocumentAnalysisFeature.OcrFormula` is enabled.
- Added member `FormulaBlock` to `ParagraphRole`.
- Added methods in `DocumentAnalysisClient` to support custom document classification: `ClassifyDocument` and `ClassifyDocumentFromUri`.
- Added methods in `DocumentModelAdministrationClient` to support custom document classification: `BuildDocumentClassifier`, `GetDocumentClassifier`, `GetDocumentClassifiers`, and `DeleteDocumentClassifier`.
- Added a new `DocumentClassifierBuildOperationDetails` class. Instances of this class can now be returned in calls to `DocumentModelAdministrationClient.GetOperation`.
- Added member `DocumentClassifierBuild` to `DocumentOperationKind`.
- Added member `Boolean` to `DocumentFieldType`.
- Added method `AsBoolean` to `DocumentFieldValue` to support extracting values of boolean fields.
- Added property `Code` to the `CurrencyValue` class.
- Added properties `Unit`, `CityDistrict`, `StateDistrict`, `Suburb`, `House`, and `Level` to the `AddressValue` class.
- Added property `CommonName` to the `DocumentKeyValuePair` class.
- Added property `ExpiresOn` to the `DocumentModelDetails` and `DocumentModelSummary` classes.
- Added property `CustomNeuralDocumentModelBuilds` to the `ResourceDetails` class.
- `DocumentAnalysisClient` and `DocumentModelAdministrationClient` now target service API version `2023-02-28-preview` by default. Version `2022-08-31` can still be targeted if specified in the `DocumentAnalysisClientOptions`.
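As a minimal sketch of targeting the older service version, the options object can be constructed with an explicit version and passed to the client. The endpoint and key below are hypothetical placeholders, and the enum member name `V2022_08_31` is an assumption based on the SDK's usual naming, so check the `ServiceVersion` enum in the installed package before relying on it.

```csharp
using System;
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;

static class ServiceVersionSketch
{
    public static DocumentAnalysisClient CreatePinnedClient()
    {
        // Hypothetical placeholders: replace with a real endpoint and API key.
        var endpoint = new Uri("https://example.cognitiveservices.azure.com/");
        var credential = new AzureKeyCredential("your-api-key");

        // Assumption: the enum member for service version 2022-08-31 is named V2022_08_31.
        var options = new DocumentAnalysisClientOptions(
            DocumentAnalysisClientOptions.ServiceVersion.V2022_08_31);

        // The client sends requests against the pinned version instead of the default.
        return new DocumentAnalysisClient(endpoint, credential, options);
    }
}
```

The same pattern should apply to `DocumentModelAdministrationClient`, which accepts the same options type.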
- Added `GetWords` method to `DocumentLine`. It can be used to split the line into separate `DocumentWord` instances.
- Added derived classes to `DocumentModelOperationDetails` for each kind of operation: `DocumentModelBuildOperationDetails`, `DocumentModelCopyToOperationDetails`, and `DocumentModelComposeOperationDetails`.
- Added `DocumentField.ExpectedFieldType` property.
- The `DocumentAnalysisClient` and `DocumentModelAdministrationClient` now target the service version `2022-08-31`, so they don't support `2020-06-30-preview` anymore.
- Renamed `DocumentModelAdministrationClient` methods to use the term `DocumentModel` instead of `Model` only. For example, `BuildModel` and `GetModels` became `BuildDocumentModel` and `GetDocumentModels`.
  - Similarly, `Operation` types have been renamed to reflect this change. For example, `ComposeModelOperation` became `ComposeDocumentModelOperation`.
  - As a consequence, `BuildModelOptions` has been renamed to `BuildDocumentModelOptions`.
- Removed the `BoundingPolygon` type. All `BoundingPolygon` properties are now of type `IReadOnlyList<PointF>`.
- Moved all `DocumentField` conversion methods, such as `AsDate` and `AsString`, to the new `DocumentFieldValue` class. They can be accessed from the `DocumentField.Value` property.
- `DocumentField.ValueType` (now called `FieldType`) can now be `Unknown` when the field value couldn't be parsed by the service. In this case, `DocumentField.Content` can be used to get a textual representation of the field.
- Updated `DocumentField.AsDate` to return a `DateTimeOffset` instead of a `DateTime`.
- Renamed classes `DocumentModelOperationDetails` and `DocumentModelOperationSummary` to `OperationDetails` and `OperationSummary`, respectively.
- Moved property `Result` in `DocumentModelOperationDetails` (now called `OperationDetails`) to each of its new derived classes. The property can't be accessed from the base class anymore.
- Renamed class `DocTypeInfo` to `DocumentTypeDetails`.
- Renamed property `Offset` to `Index` in the `DocumentSpan` class.
- Renamed property `DocType` to `DocumentType` in the `AnalyzedDocument` class.
- Renamed property `DocTypes` to `DocumentTypes` in the `DocumentModelDetails` class.
- Renamed properties `DocumentModelCount` and `DocumentModelLimit` to `CustomDocumentModelCount` and `CustomDocumentModelLimit` in the `ResourceDetails` class.
- Removed property `BuildModelOptions.Prefix`. The prefix must now be set with the `prefix` parameter in the method `BuildModel`.
- Removed class `DocumentPageKind` and related properties.
- Made `BoundingRegion` a `struct` instead of a `class`.
- `BoundingRegion` now implements the `IEquatable<BoundingRegion>` interface.
- Overrode `BoundingRegion.ToString` to include information about its page number and its bounding polygon in its string representation.
- `DocumentSpan` now implements the `IEquatable<DocumentSpan>` interface.
- Overrode `DocumentSpan.ToString` to include information about its index and its length in its string representation.
- Renamed `LengthUnit` to `DocumentPageLengthUnit`. This change only affects the type defined in the `DocumentAnalysis` namespace.
- Renamed `SelectionMarkState` to `DocumentSelectionMarkState`. This change only affects the type defined in the `DocumentAnalysis` namespace.
- Renamed `CopyAuthorization` to `DocumentModelCopyAuthorization`. This change only affects the type defined in the `DocumentAnalysis` namespace.
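The move of the conversion methods to `DocumentFieldValue` and the new `Unknown` field type can be illustrated with a short sketch. The helper class and method names (`FieldReaderSketch`, `Describe`) are hypothetical, and the date format string is an arbitrary choice for the example.

```csharp
using System;
using Azure.AI.FormRecognizer.DocumentAnalysis;

static class FieldReaderSketch
{
    // Returns a display string for a field using the relocated conversion methods.
    public static string Describe(DocumentField field)
    {
        if (field.FieldType == DocumentFieldType.String)
        {
            // Conversion methods are now reached through the Value property.
            return field.Value.AsString();
        }

        if (field.FieldType == DocumentFieldType.Date)
        {
            // AsDate returns a DateTimeOffset rather than a DateTime.
            DateTimeOffset date = field.Value.AsDate();
            return date.ToString("yyyy-MM-dd");
        }

        // For Unknown (and any type not handled here), fall back to the raw text.
        return field.Content;
    }
}
```

Checking `FieldType` before calling a conversion method keeps the code from requesting a value of the wrong kind, and `Content` gives a usable fallback when the service could not parse the field.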
- Added `Length` property to `BoundingPolygon`.
- Added a public constructor to `CopyAuthorization`.
- Added properties `AccessToken` and `TargetResourceId` to `CopyAuthorization`.
- Updated all long-running operation client methods to a new pattern. This affects `StartAnalyzeDocument`, `StartAnalyzeDocumentFromUri`, `StartBuildModel`, `StartCopyModelTo`, and `StartCreateComposedModel` methods. Changes are:
  - Removed the "Start" prefix. For example, `StartAnalyzeDocument` was renamed to `AnalyzeDocument`.
  - Added a new required parameter: `waitUntil`. It specifies whether the operation should run to completion before returning or not, removing the need to call `WaitForCompletion` in most scenarios.
- Updated `DocumentModelInfo` and `DocumentModel`:
  - Renamed them to `DocumentModelSummary` and `DocumentModelDetails`, respectively.
  - Removed the inheritance between them.
- Updated `ModelOperationInfo` and `ModelOperation`:
  - Renamed them to `DocumentModelOperationSummary` and `DocumentModelOperationDetails`, respectively.
  - Removed the inheritance between them.
  - Updated `ResourceLocation` to be a `Uri` in both.
- Renamed `AccountProperties` to `ResourceDetails`.
- Renamed method `GetAccountProperties` to `GetResourceDetails`.
- Renamed method `StartCreateComposedModel` to `ComposeModel`.
- Renamed `BuildModelOptions.ModelDescription` to `Description`.
- Renamed `modelDescription` parameters to `description` in methods `GetCopyAuthorization` and `StartCreateComposedModel` (now called `ComposeModel`).
- Renamed `CopyAuthorization.ExpirationDateTime` to `ExpiresOn`.
- Removed `DocumentCaption` and `DocumentFootnote` features.
- Updated the return type of `StartCreateComposedModel` (now called `ComposeModel`) to a `ComposeModelOperation`.
- Renamed class `CopyModelOperation` to `CopyModelToOperation`.
- Renamed parameter `analyzeDocumentOptions` to `options` in the `StartAnalyzeDocument` and `StartAnalyzeDocumentFromUri` methods (now called `AnalyzeDocument` and `AnalyzeDocumentFromUri`).
- Renamed parameter `buildModelOptions` to `options` in the `StartBuildModel` method (now called `BuildModel`).
- `FormRecognizerClientOptions.Audience` and `DocumentAnalysisClientOptions.Audience` now default to `null`.
- In the `DocumentAnalysis` namespace, `CopyModelOperation.PercentCompleted` and `BuildModelOperation.PercentCompleted` now throw an `InvalidOperationException` if called before a call to `UpdateStatus`.
- Updated `CopyAuthorization.TargetModelLocation` to be a `Uri` instead of `string`.
- Removed method `DocumentAnalysisModelFactory.CopyAuthorization`.
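The new long-running operation pattern described in the first entry of this list can be sketched as follows. The class and method names are hypothetical, `prebuilt-invoice` is only an illustrative model ID, and the client and document URI are assumed to be supplied by the caller.

```csharp
using System;
using System.Threading.Tasks;
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;

static class WaitUntilSketch
{
    // WaitUntil.Completed: the call returns only after the service has finished,
    // so no separate WaitForCompletion call is needed.
    public static async Task<AnalyzeResult> AnalyzeAndWaitAsync(
        DocumentAnalysisClient client, Uri documentUri)
    {
        AnalyzeDocumentOperation operation = await client.AnalyzeDocumentFromUriAsync(
            WaitUntil.Completed, "prebuilt-invoice", documentUri);
        return operation.Value;
    }

    // WaitUntil.Started: the call returns as soon as the operation has started,
    // leaving the caller free to do other work before waiting for the result.
    public static async Task<AnalyzeResult> AnalyzeDeferredAsync(
        DocumentAnalysisClient client, Uri documentUri)
    {
        AnalyzeDocumentOperation operation = await client.AnalyzeDocumentFromUriAsync(
            WaitUntil.Started, "prebuilt-invoice", documentUri);

        // Other work could run here while the service processes the document.
        await operation.WaitForCompletionAsync();
        return operation.Value;
    }
}
```

`WaitUntil.Completed` covers the common case in one call, while `WaitUntil.Started` preserves the previous start-then-wait flow for callers that need it.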
- Added `Kind` property to the `DocumentPage` class.
- Added the `Paragraphs` property to the `AnalyzeResult` class. This property holds information about the paragraphs extracted from the input documents.
- Added `DocumentAnalysisClient` integration for ASP.NET Core (#27123).
- In the `DocumentAnalysis` namespace, renamed `BoundingBox` model and properties to `BoundingPolygon`. It will eventually be able to include more points to better fit the borders of a document element.
- Removed the support for analyzing entities. The `DocumentEntity` class and related properties have been removed from the SDK.
- Renamed `DocumentModelAdministrationClient.StartCopyModel` methods to `StartCopyModelTo`.
- Made `DocumentSpan` a `struct` instead of a `class`.
- In `AccountProperties`, renamed `Count` and `Limit` to `DocumentModelCount` and `DocumentModelLimit`.
- In `DocumentPage`, properties `Angle`, `Height`, `Unit`, and `Width` were made nullable.
- In `DocumentTableCell`, properties `Kind`, `RowSpan`, and `ColumnSpan` are not nullable anymore.
- In `DocumentLanguage`, renamed property `LanguageCode` to `Locale`.
- In the method `DocumentModelAdministrationClient.StartCreateComposedModel`, renamed parameter `modelIds` to `componentModelIds`.
- The `DocumentAnalysisClient` and `DocumentModelAdministrationClient` now target the service version `2022-06-30-preview`, so they don't support `2020-01-30-preview` anymore.
- `DocumentAnalysisModelFactory.DocumentPage` has a new `kind` parameter.
- Added the `DocumentField.AsCurrency` method and the `DocumentFieldType.Currency` enum value to support analyzed currency fields.
- Added the `Languages` property to the `AnalyzeResult` class. This property is populated when using the `prebuilt-read` model and holds information about the languages in which the document is written.
- Added the `Tags` property to the `BuildModelOptions` class. This property can be used to specify custom key-value attributes associated with the model to be built.
- Added the `Tags` property to the `DocumentModelInfo` and to the `ModelOperationInfo` classes.
- Added the `BuildMode` property to `DocTypeInfo` to indicate the technique used when building the corresponding model.
- Added the `DocumentAnalysisModelFactory` static class to the `Azure.AI.FormRecognizer.DocumentAnalysis` namespace. It contains methods for instantiating `DocumentAnalysis` models for mocking.
- Added the required parameter `buildMode` to `StartBuildModel` methods. Users must now choose the technique (`Template` or `Neural`) used to build models. For more information about the available build modes and their differences, see here.
- Added the `tags` parameter to the `GetCopyAuthorization` methods.
- Added the `tags` parameter to the `StartCreateComposedModel` methods.
- The `DocumentAnalysisClient` and `DocumentModelAdministrationClient` now target the service version `2022-01-30-preview`, so they don't support `2021-09-30-preview` anymore.
- FormRecognizerAudience and DocumentAnalysisAudience have been added to allow the user to select the Azure cloud where the resource is located. Issue 17192.
- `BuildModelOperation` and `CopyModelOperation` correctly populate the `PercentCompleted` property, instead of always having a value of `0`.
Note: Starting with version `2021-09-30-preview`, a new set of clients was introduced to leverage the newest features of the Form Recognizer service. Please see the Migration Guide for detailed instructions on how to update application code from client library version `3.1.X` or lower to the latest version.
- This version of the SDK defaults to the latest supported Service API version, which currently is `2021_09_30_preview`.
- Added class `DocumentAnalysisClient` to the new `Azure.AI.FormRecognizer.DocumentAnalysis` namespace. This will be the main client to use when analyzing documents for service versions `2021_09_30_preview` and higher. For lower versions, please use the `FormRecognizerClient`.
- Added methods `StartAnalyzeDocument` and `StartAnalyzeDocumentFromUri` to `DocumentAnalysisClient`. These methods substitute all existing `StartRecognize<...>` methods, such as `StartRecognizeContent` and `StartRecognizeReceiptsFromUri`.
- Added class `DocumentModelAdministrationClient` to the new `Azure.AI.FormRecognizer.DocumentAnalysis` namespace. This will be the main client to use for model management for service versions `2021_09_30_preview` and higher. For lower versions, please use the `FormTrainingClient`.
- Added methods `StartBuildModel`, `StartCopyModel`, `StartCreateComposedModel`, `GetCopyAuthorization`, `GetModel`, `GetModels`, `GetAccountProperties`, `DeleteModel`, `GetOperation`, `GetOperations`, and the equivalent async methods to `DocumentModelAdministrationClient`.
- Handles invoices and other recognition operations that return a `FormField` with `Text` and no `BoundingBox` or `Page` information.
- This General Availability (GA) release marks the stability of the changes introduced in package versions `3.1.0-beta.1` through `3.1.0-beta.4`.
- Updated the `FormRecognizerModelFactory` class to support missing model types for mocking.
- Added support for service version `2.0`. This can be specified in the `FormRecognizerClientOptions` object under the `ServiceVersion` enum. By default, the SDK targets the latest supported service version.
- The client defaults to the latest supported service version, which currently is `2.1`.
- Renamed `Id` to `Identity` in all the `StartRecognizeIdDocuments` functionalities. For example, the name of the method is now `StartRecognizeIdentityDocuments`.
- Renamed the model `ReadingOrder` to `FormReadingOrder`.
- The model `TextAppearance` now includes the properties `StyleName` and `StyleConfidence` that were part of the `TextStyle` object.
- Removed the model `TextStyle`.
- Renamed the method `AsCountryCode` to `AsCountryRegion`.
- Removed type `FieldValueGender`.
- Removed value `Gender` from the model `FieldValueType`.
- Updated dependency versions.
- Added support for pre-built passports and US driver licenses recognition with the `StartRecognizeIdDocuments` API.
- Expanded the set of document languages that can be provided to the `StartRecognizeContent` API.
- Added property `Pages` to `RecognizeBusinessCardsOptions`, `RecognizeCustomFormsOptions`, `RecognizeInvoicesOptions`, and `RecognizeReceiptsOptions` to specify the page numbers to recognize.
- Added property `ReadingOrder` to `RecognizeContentOptions` to specify the order in which recognized text lines are returned.
- The client defaults to the latest supported service version, which currently is `2.1-preview.3`.
- `StartRecognizeCustomForms` now throws a `RequestFailedException` when an invalid file is passed.
- Added protected constructors for mocking to `Operation` types, such as `TrainingOperation` and `RecognizeContentOperation`.
- Renamed the model `Appearance` to `TextAppearance`.
- Renamed the model `Style` to `TextStyle`.
- Renamed the extensible enum `TextStyle` to `TextStyleName`.
- Changed object type for property `Pages` under `RecognizeContentOptions` from `IEnumerable` to `IList`.
- Changed model type of `Locale` from `string` to `FormRecognizerLocale` in `RecognizeBusinessCardsOptions`, `RecognizeInvoicesOptions`, and `RecognizeReceiptsOptions`.
- Changed model type of `Language` from `string` to `FormRecognizerLanguage` in `RecognizeContentOptions`.
- It defaults to the latest supported service version, which currently is `2.1-preview.2`.
- Added integration for ASP.NET Core.
- Added support for pre-built business card recognition.
- Added support for pre-built invoices recognition.
- Added support for providing locale information when recognizing receipts and business cards. Supported locales include EN-US, EN-AU, EN-CA, EN-GB, EN-IN.
- Added support for providing the document language in `StartRecognizeContent` when recognizing a form.
- Added support to train and recognize custom forms with selection marks such as check boxes and radio buttons. This functionality is only available in train with labels scenarios.
- Added support to `StartRecognizeContent` to recognize selection marks such as check boxes and radio buttons.
- Added ability to create a composed model from the `FormTrainingClient` by calling method `StartCreateComposedModel`.
- Added ability to pass parameter `ModelName` to `StartTraining` methods.
- Added the properties `ModelName` and `Properties` to types `CustomFormModel` and `CustomFormModelInfo`.
- Added type `CustomFormModelProperties` that includes information like if a model is a composed model.
- Added property `ModelId` to `CustomFormSubmodel` and `TrainingDocumentInfo`.
- Added properties `ModelId` and `FormTypeConfidence` to `RecognizedForm`.
- Added property `Appearance` to `FormLine` to indicate the style of the extracted text. For example, "handwriting" or "other".
- Added property `BoundingBox` to `FormTable`.
- Added support for `ContentType` `image/bmp` in recognize content and prebuilt models.
- Added property `Pages` to `RecognizeContentOptions` to specify the page numbers to analyze.
- First stable release of the Azure.AI.FormRecognizer package.
- Renamed the model `BoundingBox` to `FieldBoundingBox`.
- Added `FormRecognizerModelFactory` static class to support mocking model types.
- Bug in TaskExtensions.EnsureCompleted method that causes it to unconditionally throw an exception in the environments with synchronization context
- The library now targets the service's v2.0 API, instead of the v2.0-preview.1 API.
- Updated version number from `1.0.0-preview.5` to `3.0.0-preview.1`.
- Added models `RecognizeCustomFormsOptions`, `RecognizeReceiptsOptions`, and `RecognizeContentOptions` instead of a generic `RecognizeOptions` to support passing configurable options to recognize APIs.
- Added model `TrainingOptions` to support passing configurable options to training APIs. This type now includes `TrainingFileFilter`.
- Renamed the `FieldValue` property `Type` to `ValueType`.
- Renamed the `TrainingDocumentInfo` property `DocumentName` to `Name`.
- Renamed the `TrainingFileFilter` property `IncludeSubFolders` to `IncludeSubfolders`.
- Renamed the `FormRecognizerClient.StartRecognizeCustomForms` parameter `formFileStream` to `form`.
- Renamed the `FormRecognizerClient.StartRecognizeCustomFormsFromUri` parameter `formFileUri` to `formUri`.
- Renamed `CustomFormModelStatus.Training` to `CustomFormModelStatus.Creating`.
- Renamed `FormValueType.Integer` to `FormValueType.Int64`.
- `FormField` property `ValueData` is now set to null if there is no text, bounding box or page number associated with it.
- Made the `TrainingFileFilter` constructor public.
- Fixed a bug in which `FormTrainingClient.GetCustomModel` threw an exception if the model was still being created (#13813).
- Fixed a bug in which the `BoundingBox` indexer and `ToString` method threw a `NullReferenceException` if it had no points (#13971).
- Fixed a bug in which a default `FieldValue` threw a `NullReferenceException` if `AsString` was called. The method now returns `null`.
- Added diagnostics functionality to the `FormRecognizerClient`, to the `FormTrainingClient` and to long-running operation types.
- Property `RequestedOn` renamed to `TrainingStartedOn` on `CustomFormModel` and `CustomFormModelInfo`.
- Property `CompletedOn` renamed to `TrainingCompletedOn` on `CustomFormModel` and `CustomFormModelInfo`.
- Property `LabelText` renamed to `LabelData` on `FormField`.
- Property `ValueText` renamed to `ValueData` on `FormField`.
- Property `TextContent` renamed to `FieldElements` on `FieldData` and `FormTableCell`.
- Parameter `formUrl` in `StartRecognizeContent` has been renamed to `formUri`.
- Parameter `receiptUrl` in `StartRecognizeReceipts` has been renamed to `receiptUri`.
- Parameter `accessToken` in `CopyAuthorization.FromJson` has been renamed to `copyAuthorization`.
- Parameter `IncludeTextContent` in `RecognizeOptions` has been renamed to `IncludeFieldElements`.
- Model `FieldText` renamed to `FieldData`.
- Model `FormContent` renamed to `FormElement`.
- Property `CopyAuthorization.ExpiresOn` type is now `DateTimeOffset`.
- `RecognizedReceipt` and `RecognizedReceiptsCollection` classes removed. Receipt field values must now be obtained from a `RecognizedForm`.
- Fixed a bug in which the `FormPage.TextAngle` property sometimes fell out of the (-180, 180] range (#13082).
- `FormRecognizerError.Code` renamed to `FormRecognizerError.ErrorCode`.
- `FormTrainingClient.GetModelInfos` renamed to `FormTrainingClient.GetCustomModels`.
- Property `CreatedOn` in types `CustomFormModel` and `CustomFormModelInfo` renamed to `RequestedOn`.
- Property `LastModified` in types `CustomFormModel` and `CustomFormModelInfo` renamed to `CompletedOn`.
- Property `Models` in `CustomFormModel` renamed to `Submodels`.
- Type `CustomFormSubModel` renamed to `CustomFormSubmodel`.
- `ContentType` renamed to `FormContentType`.
- Parameter `useLabels` in `FormTrainingClient.StartTraining` renamed to `useTrainingLabels`.
- Parameter `trainingFiles` in `FormTrainingClient.StartTraining` renamed to `trainingFilesUri`.
- Parameter `filter` in `FormTrainingClient.StartTraining` renamed to `trainingFileFilter`.
- Removed `Type` suffix from all `FieldValueType` values.
- Parameters `formFileStream` and `formFileUri` in `StartRecognizeContent` have been renamed to `form` and `formUrl` respectively.
- Parameters `receiptFileStream` and `receiptFileUri` in `StartRecognizeReceipts` have been renamed to `receipt` and `receiptUrl` respectively.
- `FormPageRange` is now a `struct`.
- `RecognizeContentOperation` now returns a `FormPageCollection`.
- `RecognizeReceiptsOperation` now returns a `RecognizedReceiptCollection`.
- `RecognizeCustomFormsOperation` now returns a `RecognizedFormCollection`.
- In preparation for service-side changes, `FieldValue.AsInt32` has been replaced by `FieldValue.AsInt64`, which returns a `long`.
- Parameter `useTrainingLabels` is now required for `FormTrainingClient.StartTraining`.
- Protected constructors have been removed from `Operation` types, such as `TrainingOperation` or `RecognizeContentOperation`.
- `USReceipt`, `USReceiptItem`, `USReceiptType` and `FormField{T}` types removed. Information about a `RecognizedReceipt` must now be extracted from its `RecognizedForm`.
- `ReceiptLocale` removed from `RecognizedReceipt`.
- An `InvalidOperationException` is now raised if trying to access the `Value` property of a `TrainingOperation` when a trained model is invalid.
- A `RequestFailedException` is now raised if a model with `status=="invalid"` is returned from the `StartTraining` and `StartTrainingAsync` methods.
- A `RequestFailedException` is now raised if an operation like `StartRecognizeReceipts` or `StartRecognizeContent` fails.
- An `InvalidOperationException` is now raised if trying to access the `Value` property of a `xxOperation` object when the executed operation failed.
- Method `GetFormTrainingClient` has been removed from `FormRecognizerClient` and `GetFormRecognizerClient` has been added to `FormTrainingClient`.
- `FormRecognizerClient` and `FormTrainingClient` support authentication with Azure Active Directory.
- Support to copy a custom model from one Form Recognizer resource to another.
- Headers and query parameters that were marked as `REDACTED` in error messages and logs are now exposed by default.
- Custom form recognition without labels can now handle multipaged forms (#11881).
- `RecognizedForm.Pages` now only contains pages whose numbers are within `RecognizedForm.PageRange`.
- `FieldText.TextContent` cannot be `null` anymore, and it will be empty when no element is returned from the service.
- Custom form recognition with labels can now parse results from forms that do not contain all of the expected labels (#11821).
- `FormRecognizerClient.StartRecognizeCustomFormsFromUri` now works with URIs that contain blank spaces, encoded or not (#11564).
- Receipt recognition can now parse results from forms that contain blank pages.
- All of `FormRecognizerClient`'s `FormRecognizerClientOptions` are now passed to the client returned by `FormRecognizerClient.GetFormTrainingClient`.
This is the first preview Azure Form Recognizer client library that follows the .NET Azure SDK Design Guidelines.
This library replaces the package Microsoft.Azure.CognitiveServices.FormRecognizer.
This package's documentation and samples demonstrate the new API.
- This library supports only the Form Recognizer Service v2.0-preview API
- The namespace/package name for Azure Form Recognizer client library has changed from `Microsoft.Azure.CognitiveServices.FormRecognizer` to `Azure.AI.FormRecognizer`
- Two client design:
  - `FormRecognizerClient` to recognize and extract fields/values on custom forms, receipts, and form content/layout.
  - `FormTrainingClient` to train custom models, and manage the custom models on your resource account.
- Different recognize methods based on input type: file stream or URI.
- File stream methods will automatically detect content-type of the input file.