Commit

feat(client-textract): This release adds support for specifying and extracting information from documents using the Queries feature within Analyze Document API
awstools committed Apr 19, 2022
1 parent ae5a8f2 commit 584f346
Showing 8 changed files with 322 additions and 16 deletions.
15 changes: 12 additions & 3 deletions clients/client-textract/src/Textract.ts
@@ -74,11 +74,16 @@ export class Textract extends TextractClient {
* All lines and words that are detected in the document are returned (including text that doesn't have a
* relationship with the value of <code>FeatureTypes</code>). </p>
* </li>
* <li>
* <p>Queries. A QUERY_RESULT Block object contains the answer to the query, the associated alias, and an ID that
* connects it to the query asked. This Block also contains a location and attached confidence score.</p>
* </li>
* </ul>
*
* <p>Selection elements such as check boxes and option buttons (radio buttons) can be detected in form data and in tables.
* A SELECTION_ELEMENT <code>Block</code> object contains information about a selection element,
* including the selection status.</p>
*
* <p>You can choose which type of analysis to perform by specifying the <code>FeatureTypes</code> list.
* </p>
* <p>The output is returned in a list of <code>Block</code> objects.</p>
@@ -165,7 +170,8 @@ export class Textract extends TextractClient {
/**
* <p>Analyzes identity documents for relevant information. This information is extracted
* and returned as <code>IdentityDocumentFields</code>, which records both the normalized
* field and value of the extracted text.</p>
* field and value of the extracted text. Unlike other Amazon Textract operations, <code>AnalyzeID</code>
* doesn't return any Geometry data.</p>
*/
public analyzeID(args: AnalyzeIDCommandInput, options?: __HttpHandlerOptions): Promise<AnalyzeIDCommandOutput>;
public analyzeID(args: AnalyzeIDCommandInput, cb: (err: any, data?: AnalyzeIDCommandOutput) => void): void;
@@ -192,7 +198,7 @@ export class Textract extends TextractClient {

/**
* <p>Detects text in the input document. Amazon Textract can detect lines of text and the
* words that make up a line of text. The input document must be an image in JPEG or PNG
* words that make up a line of text. The input document must be an image in JPEG, PNG, PDF, or TIFF
* format. <code>DetectDocumentText</code> returns the detected text in an array of <a>Block</a> objects. </p>
* <p>Each document page has an associated <code>Block</code> of type PAGE. Each PAGE <code>Block</code> object
* is the parent of LINE <code>Block</code> objects that represent the lines of detected text on a page. A LINE <code>Block</code> object is
@@ -262,14 +268,17 @@ export class Textract extends TextractClient {
* relationship with the value of the <code>StartDocumentAnalysis</code>
* <code>FeatureTypes</code> input parameter). </p>
* </li>
* <li>
* <p>Queries. A QUERY_RESULT Block object contains the answer to the query, the associated alias, and an ID that
* connects it to the query asked. This Block also contains a location and attached confidence score.</p>
* </li>
* </ul>
*
* <p>Selection elements such as check boxes and option buttons (radio buttons) can be detected in form data and in tables.
* A SELECTION_ELEMENT <code>Block</code> object contains information about a selection element,
* including the selection status.</p>
*
*
*
* <p>Use the <code>MaxResults</code> parameter to limit the number of blocks that are
* returned. If there are more results than specified in <code>MaxResults</code>, the value of
* <code>NextToken</code> in the operation response contains a pagination token for getting
Original file line number Diff line number Diff line change
@@ -41,11 +41,16 @@ export interface AnalyzeDocumentCommandOutput extends AnalyzeDocumentResponse, _
* All lines and words that are detected in the document are returned (including text that doesn't have a
* relationship with the value of <code>FeatureTypes</code>). </p>
* </li>
* <li>
* <p>Queries. A QUERY_RESULT Block object contains the answer to the query, the associated alias, and an ID that
* connects it to the query asked. This Block also contains a location and attached confidence score.</p>
* </li>
* </ul>
*
* <p>Selection elements such as check boxes and option buttons (radio buttons) can be detected in form data and in tables.
* A SELECTION_ELEMENT <code>Block</code> object contains information about a selection element,
* including the selection status.</p>
*
* <p>You can choose which type of analysis to perform by specifying the <code>FeatureTypes</code> list.
* </p>
* <p>The output is returned in a list of <code>Block</code> objects.</p>
3 changes: 2 additions & 1 deletion clients/client-textract/src/commands/AnalyzeIDCommand.ts
@@ -21,7 +21,8 @@ export interface AnalyzeIDCommandOutput extends AnalyzeIDResponse, __MetadataBea
/**
* <p>Analyzes identity documents for relevant information. This information is extracted
* and returned as <code>IdentityDocumentFields</code>, which records both the normalized
* field and value of the extracted text.</p>
* field and value of the extracted text. Unlike other Amazon Textract operations, <code>AnalyzeID</code>
* doesn't return any Geometry data.</p>
* @example
* Use a bare-bones client and the command you need to make an API call.
* ```javascript
Original file line number Diff line number Diff line change
@@ -23,7 +23,7 @@ export interface DetectDocumentTextCommandOutput extends DetectDocumentTextRespo

/**
* <p>Detects text in the input document. Amazon Textract can detect lines of text and the
* words that make up a line of text. The input document must be an image in JPEG or PNG
* words that make up a line of text. The input document must be an image in JPEG, PNG, PDF, or TIFF
* format. <code>DetectDocumentText</code> returns the detected text in an array of <a>Block</a> objects. </p>
* <p>Each document page has an associated <code>Block</code> of type PAGE. Each PAGE <code>Block</code> object
* is the parent of LINE <code>Block</code> objects that represent the lines of detected text on a page. A LINE <code>Block</code> object is
Original file line number Diff line number Diff line change
@@ -51,14 +51,17 @@ export interface GetDocumentAnalysisCommandOutput extends GetDocumentAnalysisRes
* relationship with the value of the <code>StartDocumentAnalysis</code>
* <code>FeatureTypes</code> input parameter). </p>
* </li>
* <li>
* <p>Queries. A QUERY_RESULT Block object contains the answer to the query, the associated alias, and an ID that
* connects it to the query asked. This Block also contains a location and attached confidence score.</p>
* </li>
* </ul>
*
* <p>Selection elements such as check boxes and option buttons (radio buttons) can be detected in form data and in tables.
* A SELECTION_ELEMENT <code>Block</code> object contains information about a selection element,
* including the selection status.</p>
*
*
*
* <p>Use the <code>MaxResults</code> parameter to limit the number of blocks that are
* returned. If there are more results than specified in <code>MaxResults</code>, the value of
* <code>NextToken</code> in the operation response contains a pagination token for getting
104 changes: 101 additions & 3 deletions clients/client-textract/src/models/models_0.ts
@@ -114,6 +114,7 @@ export namespace Document {

export enum FeatureType {
FORMS = "FORMS",
QUERIES = "QUERIES",
TABLES = "TABLES",
}

@@ -174,11 +175,79 @@ export namespace HumanLoopConfig {
});
}

/**
* <p>Each query contains the question you want to ask in the Text and the alias you want to associate.</p>
*/
export interface Query {
/**
* <p>Question that Amazon Textract will apply to the document. An example would be "What is the customer's SSN?"</p>
*/
Text: string | undefined;

/**
* <p>Alias attached to the query, for ease of location.</p>
*/
Alias?: string;

/**
* <p>List of pages associated with the query. The following is a list of rules for using this parameter.</p>
* <ul>
* <li>
* <p>If a page is not specified, it is set to <code>["1"]</code> by default.</p>
* </li>
* <li>
* <p>The following characters are allowed in the parameter's string:
* <code>0 1 2 3 4 5 6 7 8 9 - *</code>. No whitespace is allowed.</p>
* </li>
* <li>
* <p>When using <code>*</code> to indicate all pages, it must be the only element
* in the string.</p>
* </li>
* <li>
* <p>You can use page intervals, such as <code>["1-3", "1-1", "4-*"]</code>, where <code>*</code> indicates the last page of the
* document.</p>
* </li>
* <li>
* <p>Specified pages must be greater than 0 and less than or equal to the number of pages in the document.</p>
* </li>
* </ul>
*/
Pages?: string[];
}

export namespace Query {
/**
* @internal
*/
export const filterSensitiveLog = (obj: Query): any => ({
...obj,
});
}

/**
* <p></p>
*/
export interface QueriesConfig {
/**
* <p></p>
*/
Queries: Query[] | undefined;
}

export namespace QueriesConfig {
/**
* @internal
*/
export const filterSensitiveLog = (obj: QueriesConfig): any => ({
...obj,
});
}

export interface AnalyzeDocumentRequest {
/**
* <p>The input document as base64-encoded bytes or an Amazon S3 object. If you use the AWS CLI
* to call Amazon Textract operations, you can't pass image bytes. The document must be an image
* in JPEG or PNG format.</p>
* in JPEG, PNG, PDF, or TIFF format.</p>
* <p>If you're using an AWS SDK to call Amazon Textract, you might not need to base64-encode
* image bytes that are passed using the <code>Bytes</code> field. </p>
*/
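The rules listed in the `Pages` doc comment above can be sketched as a small client-side check. This is a minimal sketch and not part of `@aws-sdk/client-textract`: the names `isValidPageSpec` and `validatePages` are hypothetical, the page count is assumed to be known to the caller, and two points are interpretations rather than documented behavior (a bare `*` stands alone in its string while `4-*` remains a valid interval, and an interval's start may not exceed its end).

```typescript
// Hypothetical helpers; they only mirror the rules stated in the Query.Pages doc comment.
function isValidPageSpec(spec: string, pageCount: number): boolean {
  // "*" means all pages and must be the only content of the string.
  if (spec === "*") return true;
  // Otherwise accept a single page ("3") or an interval ("1-3", "4-*").
  // Only digits, "-" and "*" are allowed; whitespace makes the match fail.
  const match = /^([0-9]+)(?:-([0-9]+|\*))?$/.exec(spec);
  if (match === null) return false;
  const first = Number(match[1]);
  const last =
    match[2] === undefined ? first : match[2] === "*" ? pageCount : Number(match[2]);
  // Pages must be greater than 0 and at most the number of pages in the document.
  // Assumption: an interval such as "3-1" is rejected.
  return first >= 1 && last <= pageCount && first <= last;
}

function validatePages(pages: string[] | undefined, pageCount: number): boolean {
  // When Pages is not specified, the doc comment says it defaults to ["1"].
  const effective = pages ?? ["1"];
  return effective.every((spec) => isValidPageSpec(spec, pageCount));
}
```

Whether the service itself rejects reversed intervals is not stated in the diff, so treat that branch as a conservative guess.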
@@ -197,6 +266,11 @@ export interface AnalyzeDocumentRequest {
* <p>Sets the configuration for the human in the loop workflow for analyzing documents.</p>
*/
HumanLoopConfig?: HumanLoopConfig;

/**
* <p>Contains Queries and the alias for those Queries, as determined by the input. </p>
*/
QueriesConfig?: QueriesConfig;
}

export namespace AnalyzeDocumentRequest {
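To illustrate how the new `QueriesConfig` member fits into an `AnalyzeDocument` request, here is a minimal sketch. The interfaces are local mirrors of the shapes in this diff so the snippet runs without the SDK; `AnalyzeDocumentInputSketch` and `buildQueryRequest` are hypothetical names, the bucket and object key are placeholders, and pairing `QueriesConfig` with `FeatureTypes: ["QUERIES"]` is an assumption based on the new `FeatureType.QUERIES` value.

```typescript
// Local mirrors of the types added in this diff (not imported from the SDK).
interface Query {
  Text: string | undefined;
  Alias?: string;
  Pages?: string[];
}

interface QueriesConfig {
  Queries: Query[] | undefined;
}

type FeatureType = "FORMS" | "QUERIES" | "TABLES";

// Hypothetical, trimmed-down request shape for illustration only.
interface AnalyzeDocumentInputSketch {
  Document: { S3Object: { Bucket: string; Name: string } };
  FeatureTypes: FeatureType[];
  QueriesConfig?: QueriesConfig;
}

function buildQueryRequest(
  bucket: string,
  name: string,
  queries: Query[]
): AnalyzeDocumentInputSketch {
  return {
    Document: { S3Object: { Bucket: bucket, Name: name } },
    // Assumption: the Queries feature is selected alongside QueriesConfig.
    FeatureTypes: ["QUERIES"],
    QueriesConfig: { Queries: queries },
  };
}

const request = buildQueryRequest("example-bucket", "form.png", [
  { Text: "What is the customer's SSN?", Alias: "SSN", Pages: ["1"] },
]);
```

With the real client, an object of this shape would presumably be passed to `AnalyzeDocumentCommand`; that call is left out here to keep the sketch self-contained.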
@@ -214,6 +288,8 @@ export enum BlockType {
LINE = "LINE",
MERGED_CELL = "MERGED_CELL",
PAGE = "PAGE",
QUERY = "QUERY",
QUERY_RESULT = "QUERY_RESULT",
SELECTION_ELEMENT = "SELECTION_ELEMENT",
TABLE = "TABLE",
TITLE = "TITLE",
@@ -334,6 +410,7 @@ export namespace Geometry {
}

export enum RelationshipType {
ANSWER = "ANSWER",
CHILD = "CHILD",
COMPLEX_FEATURES = "COMPLEX_FEATURES",
MERGED_CELL = "MERGED_CELL",
@@ -463,6 +540,17 @@ export interface Block {
* value of <code>SelectionStatus</code> to determine the status of the selection
* element.</p>
* </li>
* <li>
* <p>
* <i>QUERY</i> - A question asked during the call of AnalyzeDocument. Contains an
* alias and an ID that attaches it to its answer.</p>
* </li>
* <li>
* <p>
* <i>QUERY_RESULT</i> - A response to a question asked during the call
* of analyze document. Comes with an alias and ID for ease of locating in a
* response. Also contains location and confidence score.</p>
* </li>
* </ul>
*/
BlockType?: BlockType | string;
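The QUERY and QUERY_RESULT descriptions above, together with the new `ANSWER` relationship type, suggest how a response could be walked to pair each question with its answers. The following is a sketch under assumptions: `BlockSketch` is a local, trimmed-down mirror of `Block`, `answersByAlias` is a hypothetical helper, and it assumes a QUERY block lists its QUERY_RESULT blocks in an `ANSWER` relationship and that the answer string is carried in the block's `Text` field.

```typescript
// Local, reduced mirrors of Relationship and Block (not imported from the SDK).
interface RelationshipSketch {
  Type?: string;
  Ids?: string[];
}

interface BlockSketch {
  Id?: string;
  BlockType?: string;
  Text?: string;
  Confidence?: number;
  Query?: { Text?: string; Alias?: string };
  Relationships?: RelationshipSketch[];
}

// Hypothetical helper: maps each query (by alias, else by question text) to its answers.
function answersByAlias(blocks: BlockSketch[]): Map<string, string[]> {
  // Index blocks by Id so relationship Ids can be resolved.
  const byId = new Map<string, BlockSketch>();
  for (const block of blocks) {
    if (block.Id !== undefined) byId.set(block.Id, block);
  }

  const result = new Map<string, string[]>();
  for (const block of blocks) {
    if (block.BlockType !== "QUERY") continue;
    const key = block.Query?.Alias ?? block.Query?.Text ?? block.Id ?? "";
    const answers: string[] = [];
    for (const rel of block.Relationships ?? []) {
      // Assumption: answers hang off the QUERY block via ANSWER relationships.
      if (rel.Type !== "ANSWER") continue;
      for (const id of rel.Ids ?? []) {
        const target = byId.get(id);
        if (target?.BlockType === "QUERY_RESULT" && target.Text !== undefined) {
          answers.push(target.Text);
        }
      }
    }
    result.set(key, answers);
  }
  return result;
}
```

A query with no `ANSWER` relationship maps to an empty array, which lets callers distinguish "asked but unanswered" from "never asked".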
@@ -574,6 +662,11 @@ export interface Block {
* considered to be a single-page document.</p>
*/
Page?: number;

/**
* <p></p>
*/
Query?: Query;
}

export namespace Block {
@@ -880,8 +973,8 @@ export class ThrottlingException extends __BaseException {
}

/**
* <p>The format of the input document isn't supported. Documents for synchronous operations can be in
* PNG or JPEG format only. Documents for asynchronous operations can be in PDF format.</p>
* <p>The format of the input document isn't supported. Documents for operations can be in
* PNG, JPEG, PDF, or TIFF format.</p>
*/
export class UnsupportedDocumentException extends __BaseException {
readonly name: "UnsupportedDocumentException" = "UnsupportedDocumentException";
@@ -1826,6 +1919,11 @@ export interface StartDocumentAnalysisRequest {
* be encrypted server side, using SSE-S3.</p>
*/
KMSKeyId?: string;

/**
* <p></p>
*/
QueriesConfig?: QueriesConfig;
}

export namespace StartDocumentAnalysisRequest {
75 changes: 75 additions & 0 deletions clients/client-textract/src/protocols/Aws_json1_1.ts
@@ -90,6 +90,8 @@ import {
OutputConfig,
Point,
ProvisionedThroughputExceededException,
QueriesConfig,
Query,
Relationship,
S3Object,
StartDocumentAnalysisRequest,
@@ -1117,6 +1119,10 @@ const serializeAws_json1_1AnalyzeDocumentRequest = (input: AnalyzeDocumentReques
input.HumanLoopConfig !== null && {
HumanLoopConfig: serializeAws_json1_1HumanLoopConfig(input.HumanLoopConfig, context),
}),
...(input.QueriesConfig !== undefined &&
input.QueriesConfig !== null && {
QueriesConfig: serializeAws_json1_1QueriesConfig(input.QueriesConfig, context),
}),
};
};

@@ -1265,6 +1271,44 @@ const serializeAws_json1_1OutputConfig = (input: OutputConfig, context: __SerdeC
};
};

const serializeAws_json1_1Queries = (input: Query[], context: __SerdeContext): any => {
return input
.filter((e: any) => e != null)
.map((entry) => {
if (entry === null) {
return null as any;
}
return serializeAws_json1_1Query(entry, context);
});
};

const serializeAws_json1_1QueriesConfig = (input: QueriesConfig, context: __SerdeContext): any => {
return {
...(input.Queries !== undefined &&
input.Queries !== null && { Queries: serializeAws_json1_1Queries(input.Queries, context) }),
};
};

const serializeAws_json1_1Query = (input: Query, context: __SerdeContext): any => {
return {
...(input.Alias !== undefined && input.Alias !== null && { Alias: input.Alias }),
...(input.Pages !== undefined &&
input.Pages !== null && { Pages: serializeAws_json1_1QueryPages(input.Pages, context) }),
...(input.Text !== undefined && input.Text !== null && { Text: input.Text }),
};
};

const serializeAws_json1_1QueryPages = (input: string[], context: __SerdeContext): any => {
return input
.filter((e: any) => e != null)
.map((entry) => {
if (entry === null) {
return null as any;
}
return entry;
});
};

const serializeAws_json1_1S3Object = (input: S3Object, context: __SerdeContext): any => {
return {
...(input.Bucket !== undefined && input.Bucket !== null && { Bucket: input.Bucket }),
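The serializers in this hunk all rely on the same conditional-spread idiom: a field is copied into the output only when it is neither `undefined` nor `null`. A reduced sketch of that pattern follows; `QuerySketch` and `serializeQuerySketch` are hypothetical names used for illustration and are not part of the generated protocol code.

```typescript
// Hypothetical input shape that tolerates null as well as undefined.
interface QuerySketch {
  Text?: string | null;
  Alias?: string | null;
}

function serializeQuerySketch(input: QuerySketch): { Alias?: string; Text?: string } {
  return {
    // `cond && { ... }` evaluates to false when the field is missing;
    // spreading false adds no keys, so the field is omitted entirely.
    ...(input.Alias !== undefined && input.Alias !== null && { Alias: input.Alias }),
    ...(input.Text !== undefined && input.Text !== null && { Text: input.Text }),
  };
}
```

Omitting the key, rather than emitting `"Alias": null`, keeps the JSON body limited to the members the caller actually set.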
@@ -1294,6 +1338,10 @@ const serializeAws_json1_1StartDocumentAnalysisRequest = (
}),
...(input.OutputConfig !== undefined &&
input.OutputConfig !== null && { OutputConfig: serializeAws_json1_1OutputConfig(input.OutputConfig, context) }),
...(input.QueriesConfig !== undefined &&
input.QueriesConfig !== null && {
QueriesConfig: serializeAws_json1_1QueriesConfig(input.QueriesConfig, context),
}),
};
};

@@ -1430,6 +1478,10 @@ const deserializeAws_json1_1Block = (output: any, context: __SerdeContext): Bloc
: undefined,
Id: __expectString(output.Id),
Page: __expectInt32(output.Page),
Query:
output.Query !== undefined && output.Query !== null
? deserializeAws_json1_1Query(output.Query, context)
: undefined,
Relationships:
output.Relationships !== undefined && output.Relationships !== null
? deserializeAws_json1_1RelationshipList(output.Relationships, context)
@@ -1921,6 +1973,29 @@ const deserializeAws_json1_1ProvisionedThroughputExceededException = (
} as any;
};

const deserializeAws_json1_1Query = (output: any, context: __SerdeContext): Query => {
return {
Alias: __expectString(output.Alias),
Pages:
output.Pages !== undefined && output.Pages !== null
? deserializeAws_json1_1QueryPages(output.Pages, context)
: undefined,
Text: __expectString(output.Text),
} as any;
};

const deserializeAws_json1_1QueryPages = (output: any, context: __SerdeContext): string[] => {
const retVal = (output || [])
.filter((e: any) => e != null)
.map((entry: any) => {
if (entry === null) {
return null as any;
}
return __expectString(entry) as any;
});
return retVal;
};

const deserializeAws_json1_1Relationship = (output: any, context: __SerdeContext): Relationship => {
return {
Ids: