Text Analytics is a cloud-based service that provides advanced natural language processing over raw text, and includes the following main functions:
- Sentiment Analysis
- Language Detection
- Key Phrase Extraction
- Named Entity Recognition
- Personally Identifiable Information Entity Recognition
- Linked Entity Recognition
- Healthcare Recognition (beta)
- Analyze Operation (beta)
Source code | Package (Maven) | API reference documentation | Product Documentation | Samples
- A Java Development Kit (JDK), version 8 or later.
- Azure Subscription
- Cognitive Services or Text Analytics account to use this package.
Text Analytics supports both multi-service and single-service access. Create a Cognitive Services resource if you plan to access multiple cognitive services under a single endpoint/key. For Text Analytics access only, create a Text Analytics resource.
You can create either resource using one of the following options:
Option 1: Azure Portal
Option 2: Azure CLI
Below is an example of how you can create a Text Analytics resource using the CLI:
# Create a new resource group to hold the text analytics resource -
# if using an existing resource group, skip this step
az group create --name my-resource-group --location westus2
# Create text analytics
az cognitiveservices account create \
    --name text-analytics-resource \
    --resource-group my-resource-group \
    --kind TextAnalytics \
    --sku F0 \
    --location westus2 \
    --yes
Note: This version targets Azure Text Analytics service API version v3.0.
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-textanalytics</artifactId>
    <version>5.1.0-beta.3</version>
</dependency>
To interact with the Text Analytics service, you need to create an instance of the Text Analytics client. Both the asynchronous and synchronous clients can be created by using TextAnalyticsClientBuilder: invoking buildClient() creates a synchronous client, while buildAsyncClient() creates its asynchronous counterpart.
You will need an endpoint and either a key or AAD TokenCredential to instantiate a client object.
You can find the endpoint for your Text Analytics resource in the Azure Portal under "Keys and Endpoint", or by using the Azure CLI:
# Get the endpoint for the text analytics resource
az cognitiveservices account show --name "resource-name" --resource-group "resource-group-name" --query "endpoint"
Once you have the value for the key, provide it as a string to the AzureKeyCredential. The key can be found in the Azure Portal under the "Keys and Endpoint" section of your Text Analytics resource, or by running the following Azure CLI command:
az cognitiveservices account keys list --resource-group <your-resource-group-name> --name <your-resource-name>
Use the key as the credential parameter to authenticate the client:
TextAnalyticsClient textAnalyticsClient = new TextAnalyticsClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();
The Azure Text Analytics client library provides a way to rotate the existing key.
AzureKeyCredential credential = new AzureKeyCredential("{key}");
TextAnalyticsClient textAnalyticsClient = new TextAnalyticsClientBuilder()
    .credential(credential)
    .endpoint("{endpoint}")
    .buildClient();
credential.update("{new_key}");
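As a rough illustration of why updating the credential is enough, the sketch below uses a hypothetical RotatingKey class. It is not part of the SDK and does not claim to show how AzureKeyCredential is implemented. It assumes that a client keeps a reference to the credential and reads the current key for each request, so an update becomes visible without rebuilding the client.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical stand-in for a key credential; NOT the SDK's AzureKeyCredential.
 * It only illustrates why updating a shared credential affects an existing client.
 */
final class RotatingKey {
    private final AtomicReference<String> key;

    RotatingKey(String initialKey) {
        this.key = new AtomicReference<>(requireNonEmpty(initialKey));
    }

    /** Returns the key that should be sent with the next request. */
    String current() {
        return key.get();
    }

    /** Atomically swaps in a new key; later calls to current() see it. */
    void update(String newKey) {
        key.set(requireNonEmpty(newKey));
    }

    private static String requireNonEmpty(String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("key must not be null or empty");
        }
        return value;
    }

    public static void main(String[] args) {
        RotatingKey credential = new RotatingKey("old-key");
        // A client would keep this reference and call current() on every request.
        System.out.println(credential.current());
        credential.update("new-key");
        System.out.println(credential.current());
    }
}
```

The AtomicReference keeps the swap safe when requests are issued from several threads while the key is being rotated.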
The Azure SDK for Java supports an Azure Identity package, making it easy to get credentials from the Microsoft identity platform.
Authentication with AAD requires some initial setup:
- Add the Azure Identity package
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
    <version>1.2.0</version>
</dependency>
- Register a new Azure Active Directory application
- Grant access to Text Analytics by assigning the "Cognitive Services User" role to your service principal.
After setup, you can choose which type of credential from azure.identity to use. As an example, DefaultAzureCredential can be used to authenticate the client. Set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET.
Authorization is easiest using DefaultAzureCredential. It finds the best credential to use in its running environment. For more information about using Azure Active Directory authorization with Text Analytics, please refer to the associated documentation.
TokenCredential defaultCredential = new DefaultAzureCredentialBuilder().build();
TextAnalyticsAsyncClient textAnalyticsClient = new TextAnalyticsClientBuilder()
    .endpoint("{endpoint}")
    .credential(defaultCredential)
    .buildAsyncClient();
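Before building the credential, it can help to confirm that the three environment variables named above are actually set. The following is a minimal sketch using only the JDK; the class AadEnvironmentCheck and its method are hypothetical helpers, not part of azure-identity. DefaultAzureCredential may also be able to authenticate through other mechanisms in its environment, so a missing variable is a hint rather than a definite failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of the Azure SDK. */
final class AadEnvironmentCheck {
    // Variable names taken from the setup instructions above.
    static final List<String> REQUIRED = Arrays.asList(
        "AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET");

    /** Returns the required variable names that are absent or blank in env. */
    static List<String> missingVariables(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingVariables(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("All AAD environment variables are set.");
        } else {
            System.out.println("Missing AAD environment variables: " + missing);
        }
    }
}
```

Passing the environment in as a Map keeps the check easy to exercise without changing the real process environment.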
The Text Analytics client library provides a TextAnalyticsClient and TextAnalyticsAsyncClient to do analysis on batches of documents. It provides both synchronous and asynchronous operations to access a specific use of Text Analytics, such as language detection or key phrase extraction.
A text input, also called a document, is a single unit of text to be analyzed by the predictive models in the Text Analytics service. Operations on a Text Analytics client may take a single document or a collection of documents to be analyzed as a batch. See the service limitations for documents, including document length limits, maximum batch size, and supported text encoding.
For each supported operation, the Text Analytics client provides method overloads to take a single document, a batch of documents as strings, or a batch of either TextDocumentInput or DetectLanguageInput objects. The overload taking the TextDocumentInput or DetectLanguageInput batch allows callers to give each document a unique ID, indicate that the documents in the batch are written in different languages, or provide a country hint about the language of the document.
An operation result, such as AnalyzeSentimentResult, is the result of a Text Analytics operation, containing a prediction or predictions about a single document and a list of warnings. An operation's result type may also include information about the input document and how it was processed. An operation result contains an isError property that identifies whether the operation succeeded or failed for the given document. When the operation results in an error, call getError() to get the TextAnalyticsError, which contains the reason for the failure. To find out how many characters are in your document, or how many operation transactions were used, call getStatistics() to get the TextDocumentStatistics, which contains both pieces of information.
An operation result collection, such as AnalyzeSentimentResultCollection, is the collection of results of a Text Analytics operation (in this case, sentiment analysis). It also includes the model version used for the operation and statistics for the batch of documents.
Note: It is recommended to use the batch methods in production environments, as they allow you to send one request with multiple documents. This is more performant than sending one request per document.
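When there are more documents than fit in one request, one approach is to split them into consecutive batches and call a batch method once per batch. The sketch below uses only the JDK; DocumentBatcher and the maxBatchSize parameter are hypothetical, and the actual maximum batch size should be taken from the service limits mentioned above rather than from this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of the Azure SDK. */
final class DocumentBatcher {
    /** Splits documents into consecutive batches of at most maxBatchSize items. */
    static <T> List<List<T>> partition(List<T> documents, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < documents.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, documents.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(documents.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList(
            "The hotel was dark and unclean.",
            "I like microsoft.",
            "Bonjour tout le monde");
        // The batch size of 2 is an arbitrary value chosen for illustration.
        for (List<String> batch : partition(documents, 2)) {
            // Each batch would be passed to one batch method call on the client.
            System.out.println("Batch of " + batch.size() + " document(s)");
        }
    }
}
```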
The following sections provide several code snippets covering some of the most common text analytics tasks, including:
- Analyze Sentiment
- Detect Language
- Extract Key Phrases
- Recognize Entities
- Recognize Personally Identifiable Information Entities
- Recognize Linked Entities
Text Analytics supports both synchronous and asynchronous client creation by using TextAnalyticsClientBuilder:
// Synchronous client
TextAnalyticsClient textAnalyticsClient = new TextAnalyticsClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();
// Asynchronous client
TextAnalyticsAsyncClient textAnalyticsAsyncClient = new TextAnalyticsClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildAsyncClient();
Run a Text Analytics predictive model to identify the positive, negative, neutral or mixed sentiment contained in the provided document or batch of documents.
String document = "The hotel was dark and unclean. I like microsoft.";
DocumentSentiment documentSentiment = textAnalyticsClient.analyzeSentiment(document);
System.out.printf("Analyzed document sentiment: %s.%n", documentSentiment.getSentiment());
documentSentiment.getSentences().forEach(sentenceSentiment ->
    System.out.printf("Analyzed sentence sentiment: %s.%n", sentenceSentiment.getSentiment()));
For samples on using the production recommended option AnalyzeSentimentBatch, see here.
To get more granular information about the opinions related to aspects of a product or service, also known as aspect-based sentiment analysis in Natural Language Processing (NLP), see the sample on sentiment analysis with opinion mining here.
Please refer to the service documentation for a conceptual discussion of sentiment analysis.
Run a Text Analytics predictive model to determine the language that the provided document or batch of documents are written in.
String document = "Bonjour tout le monde";
DetectedLanguage detectedLanguage = textAnalyticsClient.detectLanguage(document);
System.out.printf("Detected language name: %s, ISO 6391 name: %s, confidence score: %f.%n",
    detectedLanguage.getName(), detectedLanguage.getIso6391Name(), detectedLanguage.getConfidenceScore());
For samples on using the production recommended option DetectLanguageBatch, see here.
Please refer to the service documentation for a conceptual discussion of language detection.
Run a model to identify a collection of significant phrases found in the provided document or batch of documents.
String document = "My cat might need to see a veterinarian.";
System.out.println("Extracted phrases:");
textAnalyticsClient.extractKeyPhrases(document).forEach(keyPhrase -> System.out.printf("%s.%n", keyPhrase));
For samples on using the production recommended option ExtractKeyPhrasesBatch, see here.
Please refer to the service documentation for a conceptual discussion of key phrase extraction.
Run a predictive model to identify a collection of named entities in the provided document or batch of documents and categorize those entities into categories such as person, location, or organization. For more information on available categories, see Text Analytics Named Entity Categories.
String document = "Satya Nadella is the CEO of Microsoft";
textAnalyticsClient.recognizeEntities(document).forEach(entity ->
    System.out.printf("Recognized entity: %s, category: %s, subcategory: %s, confidence score: %f.%n",
        entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore()));
For samples on using the production recommended option RecognizeEntitiesBatch, see here.
Please refer to the service documentation for a conceptual discussion of named entity recognition.
Run a predictive model to identify a collection of Personally Identifiable Information (PII) entities in the provided document. It recognizes and categorizes PII entities in its input text, such as Social Security Numbers, bank account information, credit card numbers, and more. This endpoint is only supported for API versions v3.1-preview.1 and above.
String document = "My SSN is 859-98-0987";
PiiEntityCollection piiEntityCollection = textAnalyticsClient.recognizePiiEntities(document);
System.out.printf("Redacted Text: %s%n", piiEntityCollection.getRedactedText());
piiEntityCollection.forEach(entity -> System.out.printf(
    "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s,"
        + " confidence score: %f.%n",
    entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore()));
For samples on using the production recommended option RecognizePiiEntitiesBatch, see here.
Please refer to the service documentation for supported PII entity types.
Run a predictive model to identify a collection of entities found in the provided document or batch of documents, and include information linking the entities to their corresponding entries in a well-known knowledge base.
String document = "Old Faithful is a geyser at Yellowstone Park.";
textAnalyticsClient.recognizeLinkedEntities(document).forEach(linkedEntity -> {
    System.out.println("Linked Entities:");
    System.out.printf("Name: %s, entity ID in data source: %s, URL: %s, data source: %s.%n",
        linkedEntity.getName(), linkedEntity.getDataSourceEntityId(), linkedEntity.getUrl(), linkedEntity.getDataSource());
    linkedEntity.getMatches().forEach(match ->
        System.out.printf("Text: %s, confidence score: %f.%n", match.getText(), match.getConfidenceScore()));
});
For samples on using the production recommended option RecognizeLinkedEntitiesBatch, see here.
Please refer to the service documentation for a conceptual discussion of entity linking.
Text Analytics for health is a containerized service that extracts and labels relevant medical information from unstructured texts such as doctor's notes, discharge summaries, clinical documents, and electronic health records. Currently, Azure Active Directory (AAD) is not supported in the Healthcare recognition feature. To use this functionality, you must request access to the public preview. For more information, see How to: Use Text Analytics for health.
List<TextDocumentInput> documents = Arrays.asList(new TextDocumentInput("0",
    "RECORD #333582770390100 | MH | 85986313 | | 054351 | 2/14/2001 12:00:00 AM | "
        + "CORONARY ARTERY DISEASE | Signed | DIS | Admission Date: 5/22/2001 "
        + "Report Status: Signed Discharge Date: 4/24/2001 ADMISSION DIAGNOSIS: "
        + "CORONARY ARTERY DISEASE. HISTORY OF PRESENT ILLNESS: "
        + "The patient is a 54-year-old gentleman with a history of progressive angina over the past several months. "
        + "The patient had a cardiac catheterization in July of this year revealing total occlusion of the RCA and "
        + "50% left main disease , with a strong family history of coronary artery disease with a brother dying at "
        + "the age of 52 from a myocardial infarction and another brother who is status post coronary artery bypass grafting. "
        + "The patient had a stress echocardiogram done on July , 2001 , which showed no wall motion abnormalities ,"
        + "but this was a difficult study due to body habitus. The patient went for six minutes with minimal ST depressions "
        + "in the anterior lateral leads , thought due to fatigue and wrist pain , his anginal equivalent. Due to the patient's "
        + "increased symptoms and family history and history left main disease with total occasional of his RCA was referred "
        + "for revascularization with open heart surgery."
));
RecognizeHealthcareEntityOptions options = new RecognizeHealthcareEntityOptions().setIncludeStatistics(true);
SyncPoller<TextAnalyticsOperationResult, PagedIterable<HealthcareTaskResult>> syncPoller =
    textAnalyticsClient.beginAnalyzeHealthcare(documents, options, Context.NONE);
syncPoller.waitForCompletion();
syncPoller.getFinalResult().forEach(healthcareTaskResult ->
    healthcareTaskResult.getResult().forEach(healthcareEntitiesResult -> {
        System.out.println("Document entities: ");
        HealthcareEntityCollection healthcareEntities = healthcareEntitiesResult.getEntities();
        AtomicInteger ct = new AtomicInteger();
        healthcareEntities.forEach(healthcareEntity -> {
            System.out.printf("i = %d, Text: %s, category: %s, subcategory: %s, confidence score: %f.%n",
                ct.getAndIncrement(),
                healthcareEntity.getText(), healthcareEntity.getCategory(), healthcareEntity.getSubcategory(),
                healthcareEntity.getConfidenceScore());
            List<HealthcareEntityLink> links = healthcareEntity.getDataSourceEntityLinks();
            if (links != null) {
                links.forEach(healthcareEntityLink ->
                    System.out.printf("\tHealthcare data source ID: %s, data source: %s.%n",
                        healthcareEntityLink.getDataSourceId(), healthcareEntityLink.getDataSource()));
            }
        });
        healthcareEntities.getEntityRelations().forEach(
            healthcareEntityRelation ->
                System.out.printf("Is bidirectional: %s, target: %s, source: %s, relation type: %s.%n",
                    healthcareEntityRelation.isBidirectional(),
                    healthcareEntityRelation.getTargetLink(),
                    healthcareEntityRelation.getSourceLink(),
                    healthcareEntityRelation.getRelationType()));
    }));
To cancel a long-running healthcare task:
SyncPoller<TextAnalyticsOperationResult, Void> textAnalyticsOperationResultVoidSyncPoller
    = textAnalyticsClient.beginCancelHealthcareTask("{healthcare_task_id}",
        new RecognizeHealthcareEntityOptions().setPollInterval(Duration.ofSeconds(10)), Context.NONE);
PollResponse<TextAnalyticsOperationResult> poll = textAnalyticsOperationResultVoidSyncPoller.poll();
System.out.printf("Task status: %s.%n", poll.getStatus());
The Analyze functionality allows you to choose which of the supported Text Analytics features to execute on the same set of documents. Currently, the supported features are entity recognition, key phrase extraction, and Personally Identifiable Information (PII) recognition.
List<TextDocumentInput> documents = Arrays.asList(
    new TextDocumentInput("0",
        "We went to Contoso Steakhouse located at midtown NYC last week for a dinner party, and we adore"
            + " the spot! They provide marvelous food and they have a great menu. The chief cook happens to be"
            + " the owner (I think his name is John Doe) and he is super nice, coming out of the kitchen and "
            + "greeted us all. We enjoyed very much dining in the place! The Sirloin steak I ordered was tender"
            + " and juicy, and the place was impeccably clean. You can even pre-order from their online menu at"
            + " www.contososteakhouse.com, call 312-555-0176 or send email to order@contososteakhouse.com! The"
            + " only complaint I have is the food didn't come fast enough. Overall I highly recommend it!")
);
SyncPoller<TextAnalyticsOperationResult, PagedIterable<AnalyzeTasksResult>> syncPoller =
    textAnalyticsClient.beginAnalyzeTasks(documents,
        new AnalyzeTasksOptions().setDisplayName("{tasks_display_name}")
            .setKeyPhrasesExtractionTasks(Arrays.asList(new KeyPhrasesTask()))
            .setPiiEntitiesRecognitionTasks(Arrays.asList(new PiiTask())),
        Context.NONE);
syncPoller.waitForCompletion();
syncPoller.getFinalResult().forEach(analyzeJobState -> {
    analyzeJobState.getKeyPhraseExtractionTasks().forEach(taskResult -> {
        AtomicInteger counter = new AtomicInteger();
        for (ExtractKeyPhraseResult extractKeyPhraseResult : taskResult) {
            System.out.printf("%n%s%n", documents.get(counter.getAndIncrement()));
            System.out.println("Extracted phrases:");
            extractKeyPhraseResult.getKeyPhrases()
                .forEach(keyPhrases -> System.out.printf("\t%s.%n", keyPhrases));
        }
    });
    analyzeJobState.getEntityRecognitionPiiTasks().forEach(taskResult -> {
        AtomicInteger counter = new AtomicInteger();
        for (RecognizePiiEntitiesResult entitiesResult : taskResult) {
            System.out.printf("%n%s%n", documents.get(counter.getAndIncrement()));
            PiiEntityCollection piiEntityCollection = entitiesResult.getEntities();
            System.out.printf("Redacted Text: %s%n", piiEntityCollection.getRedactedText());
            piiEntityCollection.forEach(entity -> System.out.printf(
                "Recognized Personally Identifiable Information entity: %s, entity category: %s, "
                    + "entity subcategory: %s, offset: %s, confidence score: %f.%n",
                entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getOffset(),
                entity.getConfidenceScore()));
        }
    });
});
For more examples, such as asynchronous samples, see here.
Text Analytics clients raise exceptions. For example, if you try to detect the languages of a batch of text with the same document IDs, a 400 error is returned, indicating a bad request. In the following code snippet, the error is handled gracefully by catching the exception and displaying additional information about the error.
List<DetectLanguageInput> documents = Arrays.asList(
    new DetectLanguageInput("1", "This is written in English.", "us"),
    new DetectLanguageInput("1", "Este es un documento escrito en Español.", "es")
);
try {
    textAnalyticsClient.detectLanguageBatchWithResponse(documents, null, Context.NONE);
} catch (HttpResponseException e) {
    System.out.println(e.getMessage());
}
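Because a batch with repeated document IDs is rejected by the service, the IDs can also be checked locally before the batch is sent. The sketch below is one possible pre-flight check using only the JDK; DocumentIdCheck is a hypothetical helper and takes the IDs as plain strings rather than DetectLanguageInput objects.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; not part of the Azure SDK. */
final class DocumentIdCheck {
    /** Returns the IDs that occur more than once, in order of first repetition. */
    static Set<String> duplicateIds(List<String> ids) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String id : ids) {
            // Set.add returns false when the ID was already present.
            if (!seen.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Same IDs as in the failing batch shown above.
        Set<String> duplicates = duplicateIds(Arrays.asList("1", "1"));
        if (!duplicates.isEmpty()) {
            System.out.println("Duplicate document IDs: " + duplicates);
        }
    }
}
```

A check like this does not replace the exception handling shown above, since the service can reject a request for other reasons as well.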
You can set the AZURE_LOG_LEVEL environment variable to view logging statements made in the client library. For example, setting AZURE_LOG_LEVEL=2 would show all informational, warning, and error log messages. The log levels can be found here: log levels.
All client libraries by default use the Netty HTTP client. Adding the above dependency will automatically configure the client library to use the Netty HTTP client. Configuring or changing the HTTP client is detailed in the HTTP clients wiki.
All client libraries, by default, use the Tomcat-native Boring SSL library to enable native-level performance for SSL operations. The Boring SSL library is an uber jar containing native libraries for Linux / macOS / Windows, and provides better performance compared to the default SSL implementation within the JDK. For more information, including how to reduce the dependency size, refer to the performance tuning section of the wiki.
- Samples are explained in detail here.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.