<h1 style="color:orange; font-size:48px; text-align:center">Topic 5: Language Understanding solution with Azure AI Language</h1>

Azure AI Language Service offers a range of features for understanding human language. Applications can use it to communicate effectively with users and to derive insights from the language those users write.

# Categories of Features:
Azure AI Language Service classifies its features into two categories:

- **1) Pre-configured features** - These are out-of-the-box features that do not require additional training or model labeling. They're ready for use upon resource creation.
- **2) Learned features** - These require custom data labeling, model training, and deployment. They are tailored to specific needs based on the input data and training.

For in-depth details, refer to the official Azure AI Language service documentation at the following links:

https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/overview

https://learn.microsoft.com/en-us/python/api/azure-ai-textanalytics/azure.ai.textanalytics.textanalyticsclient?view=azure-python#azure-ai-textanalytics-textanalyticsclient-begin-abstract-summary

**Note:** This notebook covers only the Pre-configured features. In an upcoming module, we will look at the Custom/Learned features.

# How it Works:
All queries to the Azure AI Language Service follow a specific URL pattern:

https://{ENDPOINT}/text/analytics/{VERSION}/{FEATURE}

- **{ENDPOINT}:** The endpoint of your Language resource, to which the API request is sent, such as myLanguageService.cognitiveservices.azure.com.
- **{VERSION}:** The version number of the service, like v3.0 or 2022-05-01.
- **{FEATURE}:** This is the specific feature you are trying to access, like keyPhrases for key phrase detection.

A JSON body accompanying the POST request defines input documents, tasks, and other metadata.
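
The URL pattern above can be sketched in Python as follows. This is a minimal illustration rather than the SDK's implementation: the helper name build_language_request is hypothetical, and the body layout (a documents list with id, language, and text fields) is assumed from the v3.x REST API, so check the reference for the version you target. The request is only assembled here, not sent.

```python
import json


def build_language_request(endpoint_host, version, feature, texts, language="en"):
    """Assemble the URL and JSON body for a pre-configured feature call.

    Hypothetical helper: the body shape is an assumption based on the v3.x API.
    """
    url = f"https://{endpoint_host}/text/analytics/{version}/{feature}"
    body = {
        "documents": [
            {"id": str(i), "language": language, "text": text}
            for i, text in enumerate(texts, start=1)
        ]
    }
    return url, body


url, body = build_language_request(
    "myLanguageService.cognitiveservices.azure.com",
    "v3.0",
    "keyPhrases",
    ["Pollution is adversely affecting marine life."],
)
print(url)
print(json.dumps(body, indent=2))
```

Actually sending the request would also require the resource key, passed in the Ocp-Apim-Subscription-Key header.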

# Pre-configured Features
## 1. Summarization
Summarization condenses large pieces of text into shorter, key sentences that capture the essence of the content.

**Example:** If you provide a lengthy review of a book, the summarization feature could return a concise summary, highlighting the main points of the review.

### 1.1 Summary Approaches

Abstractive and extractive summarization are two different approaches to automatically generating concise and coherent summaries of longer texts, such as documents or news articles. They differ in how they generate summaries:

#### a) **Extractive Summarization**

   - Extractive summarization works by selecting and extracting sentences or phrases directly from the input text.
   - It identifies the most informative and important sentences or phrases from the original text and includes them in the summary.
   - The sentences selected for the summary are typically not modified; they are presented as-is from the original text.
   - Extractive summarization methods often use techniques like sentence scoring, ranking, and keyword extraction to determine the most relevant content.

**Advantages:**

   - Extractive summaries are often more factually accurate since they use sentences directly from the source text.
   - They are relatively easy to implement and evaluate.

**Limitations:**

   - Extractive summaries may not always be as coherent or fluent as abstractive summaries since they are composed of individual sentences that may not fit together seamlessly.
   - They might not capture the nuances or broader context of the text effectively.
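
The sentence-scoring idea described above can be illustrated with a toy word-frequency summarizer. This is only a sketch of the general technique, under the assumption that frequent words signal important sentences; it is not how Azure AI Language scores sentences, and the function name and sample text are made up for illustration.

```python
import re
from collections import Counter


def extractive_summary(text, max_sentences=2):
    """Return the highest-scoring sentences, unmodified and in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies across the whole text; very short words are skipped
    # as a crude stand-in for stop-word removal.
    frequencies = Counter(
        w for w in re.findall(r"[a-z']+", text.lower()) if len(w) > 3
    )

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(frequencies[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return [sentences[i] for i in chosen]


sample_text = (
    "Extractive summarization selects sentences from the source text. "
    "The weather was nice yesterday. "
    "Selected sentences are copied from the source without changes. "
    "Abstractive summarization writes new sentences instead."
)
summary = extractive_summary(sample_text, max_sentences=2)
print(" ".join(summary))
```

Because the selected sentences are returned verbatim, every sentence in the summary can be found unchanged in the input, which is the defining property of the extractive approach.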

In [1]:
# Reading Credentials from Excel
import pandas as pd
df_cred = pd.read_excel("credentials.xlsx")

## How to get KEY & ENDPOINT?

To create an Azure AI Services multi-service account and obtain the endpoint and key for a Python client app, you can follow these steps:

Step 1: Sign in to Azure Portal
- If you don't have an Azure account, you'll need to sign up for one. Once you have an account, log in to the Azure Portal.

Step 2: Create a New Azure AI Service Account

- In the Azure Portal, click on "Create a resource" in the left-hand menu.

- Search for "Azure AI" in the search bar and select "Azure AI."

- Click on "Create" to start configuring your Azure AI service account.

- In the "Basics" tab, provide the following information:

    - Subscription: Choose your Azure subscription.
    - Resource Group: Create a new resource group or select an existing one.
    - Region: Choose the region where you want to deploy the AI service.
    - Under the "Configuration" section, select "Multi-service."

- Provide a unique name for your multi-service account.

- Leave the rest of the options as their default values or configure them according to your needs.

- Review the configuration, and then click "Review + create."

- Review the summary, and if everything looks correct, click "Create" to create the multi-service account.

Step 3: Obtain Endpoint and Key
- After the multi-service account is created, follow these steps to obtain the endpoint and key:

- Once the deployment is complete, navigate to the multi-service account you just created in the Azure Portal.

- In the left-hand menu, under "Settings," click on "Keys and Endpoint."

- You will see two keys, usually named "Key 1" and "Key 2." Either key can be used for authentication. Copy one of the keys and store it securely.

- Below the keys, you will find the "Endpoint" URL. Copy the URL; this is the endpoint you will use to access the AI services.
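
Once copied, the key and endpoint have to reach your code without being hard-coded. This notebook reads them from an Excel file; the sketch below shows one possible alternative that reads them from environment variables. The variable names LANGUAGE_ENDPOINT and LANGUAGE_KEY and the helper name are assumptions chosen for illustration, not names required by the service.

```python
import os


def load_language_credentials(env=os.environ):
    """Read the endpoint and key from a mapping of environment variables.

    The variable names are hypothetical; use whatever names you exported.
    """
    endpoint = env.get("LANGUAGE_ENDPOINT", "").strip()
    key = env.get("LANGUAGE_KEY", "").strip()
    if not endpoint or not key:
        raise RuntimeError(
            "Set LANGUAGE_ENDPOINT and LANGUAGE_KEY before creating the client."
        )
    if not endpoint.startswith("https://"):
        raise ValueError("Use the full https:// endpoint URL copied from the portal.")
    return endpoint, key


# Demo with placeholder values instead of real secrets
endpoint, key = load_language_credentials(
    {
        "LANGUAGE_ENDPOINT": "https://myLanguageService.cognitiveservices.azure.com/",
        "LANGUAGE_KEY": "<your-key>",
    }
)
print(endpoint)
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by the first API call.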

In [2]:
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

endpoint = df_cred["endpoint"][0]
key = df_cred["key"][0]

text_analytics_client = TextAnalyticsClient(
   endpoint=endpoint,
   credential=AzureKeyCredential(key),
)

document = [
   "At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, "
   "human-centric approach to learning and understanding. As Chief Technology Officer of Azure AI Cognitive "
   "Services, I have been working with a team of amazing scientists and engineers to turn this quest into a "
   "reality. In my role, I enjoy a unique perspective in viewing the relationship among three attributes of "
   "human cognition: monolingual text (X), audio or visual sensory signals, (Y) and multilingual (Z). At the "
   "intersection of all three, there's magic-what we call XYZ-code as illustrated in Figure 1-a joint "
   "representation to create more powerful AI that can speak, hear, see, and understand humans better. "
   "We believe XYZ-code will enable us to fulfill our long-term vision: cross-domain transfer learning, "
   "spanning modalities and languages. The goal is to have pretrained models that can jointly learn "
   "representations to support a broad range of downstream AI tasks, much in the way humans do today. "
   "Over the past five years, we have achieved human performance on benchmarks in conversational speech "
   "recognition, machine translation, conversational question answering, machine reading comprehension, "
   "and image captioning. These five breakthroughs provided us with strong signals toward our more ambitious "
   "aspiration to produce a leap in AI capabilities, achieving multisensory and multilingual learning that "
   "is closer in line with how humans learn and understand. I believe the joint XYZ-code is a foundational "
   "component of this aspiration, if grounded with external knowledge sources in the downstream AI tasks."
]

poller = text_analytics_client.begin_extract_summary(document)
extract_summary_results = poller.result()
for result in extract_summary_results:
    if result.kind == "ExtractiveSummarization":
        print("Summary extracted: \n{}".format(
           " ".join([sentence.text for sentence in result.sentences]))
       )
    elif result.is_error is True:
        print("...Is an error with code '{}' and message '{}'".format(
           result.error.code, result.error.message
       ))

Summary extracted: 
At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, human-centric approach to learning and understanding. The goal is to have pretrained models that can jointly learn representations to support a broad range of downstream AI tasks, much in the way humans do today. Over the past five years, we have achieved human performance on benchmarks in conversational speech recognition, machine translation, conversational question answering, machine reading comprehension, and image captioning.


#### b) **Abstractive Summarization**

   - Abstractive summarization, on the other hand, aims to generate summaries that are not directly copied from the input text but are generated in a more human-like manner.
   - It involves understanding the input text, interpreting its meaning, and then generating concise and coherent summaries in natural language.
   - Abstractive summarization methods use techniques such as natural language generation (NLG) and may rephrase and restructure sentences to create summaries that are more fluent and coherent.
   - They often require more advanced natural language processing and machine learning techniques, including neural networks and language models.

**Advantages:**

   - Abstractive summaries can provide more human-like, fluent, and coherent summaries that may capture the essence of the text better.
   - They are flexible and can generate summaries even when there are no suitable sentences to extract directly.

**Limitations:**

   - Abstractive summarization is a more challenging task, and the quality of abstractive summaries can vary depending on the complexity of the text and the quality of the summarization model.
   - There is a risk of generating inaccurate or biased summaries, especially if the model misunderstands the input.

In [3]:
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

endpoint = df_cred["endpoint"][0]
key = df_cred["key"][0]

text_analytics_client = TextAnalyticsClient(
   endpoint=endpoint,
   credential=AzureKeyCredential(key))

document = [
   "At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, "
   "human-centric approach to learning and understanding. As Chief Technology Officer of Azure AI Cognitive "
   "Services, I have been working with a team of amazing scientists and engineers to turn this quest into a "
   "reality. In my role, I enjoy a unique perspective in viewing the relationship among three attributes of "
   "human cognition: monolingual text (X), audio or visual sensory signals, (Y) and multilingual (Z). At the "
   "intersection of all three, there's magic-what we call XYZ-code as illustrated in Figure 1-a joint "
   "representation to create more powerful AI that can speak, hear, see, and understand humans better. "
   "We believe XYZ-code will enable us to fulfill our long-term vision: cross-domain transfer learning, "
   "spanning modalities and languages. The goal is to have pretrained models that can jointly learn "
   "representations to support a broad range of downstream AI tasks, much in the way humans do today. "
   "Over the past five years, we have achieved human performance on benchmarks in conversational speech "
   "recognition, machine translation, conversational question answering, machine reading comprehension, "
   "and image captioning. These five breakthroughs provided us with strong signals toward our more ambitious "
   "aspiration to produce a leap in AI capabilities, achieving multisensory and multilingual learning that "
   "is closer in line with how humans learn and understand. I believe the joint XYZ-code is a foundational "
   "component of this aspiration, if grounded with external knowledge sources in the downstream AI tasks."
]

poller = text_analytics_client.begin_abstract_summary(document)
abstract_summary_results = poller.result()
for result in abstract_summary_results:
    if result.kind == "AbstractiveSummarization":
        print("Summaries abstracted:")
        for summary in result.summaries:
            print(f"{summary.text}\n")
    elif result.is_error is True:
        print("...Is an error with code '{}' and message '{}'".format(
           result.error.code, result.error.message
       ))

Summaries abstracted:
Microsoft has been working to advance AI beyond existing techniques by taking a more holistic, human-centric approach to learning and understanding. The Chief Technology Officer of Azure AI Cognitive Services, who enjoys a unique perspective in viewing the relationship among three attributes of human cognition: monolingual text, audio or visual sensory signals, and multilingual, has created XYZ-code, a joint representation to create more powerful AI that can speak, hear, see, and understand humans better. Over the past five years, Microsoft has achieved human performance on benchmarks in conversational speech recognition, machine translation, conversational question answering, machine reading comprehension, and image captioning.



## 2. Named Entity Recognition (NER)
NER identifies entities such as names of people, places, organizations, and other specifics within a given text.

**Example:** From the sentence "The Eiffel Tower is located in Paris.", NER would identify "Eiffel Tower" as a monument and "Paris" as a location.

In [4]:
import os
import typing
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

endpoint = df_cred["endpoint"][0]
key = df_cred["key"][0]

text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

reviews = [
   """I work for Foo Company, and we hired Contoso for our annual founding ceremony. The food
   was amazing and we all can't say enough good words about the quality and the level of service.""",
    
   """We at the Foo Company re-hired Contoso after all of our past successes with the company.
   Though the food was still great, I feel there has been a quality drop since their last time
   catering for us. Is anyone else running into the same problem?""",
    
   """Bar Company is over the moon about the service we received from Contoso, the best sliders ever!!!!"""
]

result = text_analytics_client.recognize_entities(reviews)
result = [review for review in result if not review.is_error]

for idx, review in enumerate(result):
    for entity in review.entities:
        print(f"Entity '{entity.text}' has category '{entity.category}'")

Entity 'Foo Company' has category 'Organization'
Entity 'Contoso' has category 'Organization'
Entity 'annual' has category 'DateTime'
Entity 'founding ceremony' has category 'Event'
Entity 'food' has category 'Product'
Entity 'Foo Company' has category 'Organization'
Entity 'Contoso' has category 'Person'
Entity 'food' has category 'Product'
Entity 'catering' has category 'Skill'
Entity 'Bar Company' has category 'Organization'
Entity 'Contoso' has category 'Organization'
Entity 'sliders' has category 'Product'
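
The same entity text can receive different categories across documents (note 'Contoso' above), so it is often useful to group entities by category and drop low-confidence ones. The sketch below shows one way to do that. The Entity class is a local stand-in for the SDK's entity objects, the helper name and the 0.6 threshold are arbitrary choices for illustration, and the sample values are copied from the combined-analysis output later in this notebook.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entity:
    """Stand-in for the SDK's entity objects (hypothetical, for local testing)."""

    text: str
    category: str
    confidence_score: float


def group_by_category(entities, min_confidence=0.6):
    """Group entity texts by category, skipping low-confidence entities."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity.confidence_score >= min_confidence:
            grouped[entity.category].append(entity.text)
    return dict(grouped)


sample = [
    Entity("Contoso Steakhouse", "Location", 0.68),
    Entity("midtown", "Location", 0.57),
    Entity("NYC", "Location", 0.7),
    Entity("last week", "DateTime", 0.8),
    Entity("dinner party", "Event", 0.93),
    Entity("chief cook", "PersonType", 0.86),
]
grouped = group_by_category(sample)
print(grouped)
```

The helper only reads the text, category, and confidence_score attributes, so it should work equally well on the entities returned by recognize_entities.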


## 3. Personally Identifiable Information (PII) Detection
In today's digital age, ensuring data privacy and security is paramount. This is particularly true for Personally Identifiable Information (PII) - any information that can be used to identify an individual. This might include names, addresses, social security numbers, email addresses, and more. Given the importance of safeguarding PII, how can you swiftly detect and handle such data in your applications?

Azure AI Language Service provides PII detection for exactly this purpose. In the Python SDK it is available through the recognize_pii_entities method of TextAnalyticsClient.

#### What is recognize_pii_entities?
This feature detects PII within a given text. For instance, if you supply a document containing "John's email address is john.doe@example.com", the service would identify "John" and "john.doe@example.com" as PII. It also categorizes each item it finds: here, "John" would be categorized as a person's name and "john.doe@example.com" as an email address.

Here's a hypothetical example:

In [5]:
documents = [
   """Parker Doe has repaid all of their loans as of 2020-04-25.
   Their SSN is 859-98-0987. To contact them, use their phone number
   555-555-5555. They are originally from Brazil and have Brazilian CPF number 998.214.865-68"""
]

result = text_analytics_client.recognize_pii_entities(documents)
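# Results are returned in input order; pair each one with its source text
# so the two stay aligned even when a document fails.
pairs = [(text, doc) for text, doc in zip(documents, result) if not doc.is_error]

print(
   "Let's compare the original document with the documents after redaction. "
   "I also want to comb through all of the entities that got redacted"
)
for text, doc in pairs:
    print(f"Document text: {text}")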
    print(f"Redacted document text: {doc.redacted_text}")
    for entity in doc.entities:
        print("...Entity '{}' with category '{}' got redacted".format(
           entity.text, entity.category
       ))

Let's compare the original document with the documents after redaction. I also want to comb through all of the entities that got redacted
Document text: Parker Doe has repaid all of their loans as of 2020-04-25.
   Their SSN is 859-98-0987. To contact them, use their phone number
   555-555-5555. They are originally from Brazil and have Brazilian CPF number 998.214.865-68
Redacted document text: ********** has repaid all of their loans as of **********.
   Their SSN is ***********. To contact them, use their phone number
   ************. They are originally from Brazil and have Brazilian CPF number 998.214.865-68
...Entity 'Parker Doe' with category 'Person' got redacted
...Entity '2020-04-25' with category 'DateTime' got redacted
...Entity '859-98-0987' with category 'USSocialSecurityNumber' got redacted
...Entity '555-555-5555' with category 'PhoneNumber' got redacted
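
The redacted text above replaces each detected entity with asterisks of the same length. The idea can be reproduced locally from an entity's offset and length, as in the sketch below. This is an illustration of the masking step only, not the service's implementation; the redact helper is hypothetical, and the span is computed from the sample sentence rather than returned by the API.

```python
def redact(text, spans, mask="*"):
    """Replace each (offset, length) span in text with mask characters."""
    chars = list(text)
    for offset, length in spans:
        chars[offset:offset + length] = mask * length
    return "".join(chars)


sentence = "Their SSN is 859-98-0987."
ssn = "859-98-0987"
# Locate the span by hand; with the SDK, the entity's offset and length
# attributes would supply these values.
span = (sentence.index(ssn), len(ssn))
redacted = redact(sentence, [span])
print(redacted)
```

Keeping the masked text the same length as the original means that offsets of other entities in the document remain valid after redaction.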


## 4. Key Phrase Extraction
Key phrase extraction identifies the main ideas or themes in a provided text.

**Example:** For the statement "Pollution is adversely affecting marine life.", the extracted key phrases might include "Pollution" and "marine life".

In [6]:
articles = [
   """
   Washington, D.C. Autumn in DC is a uniquely beautiful season. The leaves fall from the trees
   in a city chock-full of forests, leaving yellow leaves on the ground and a clearer view of the
   blue sky above...
   """,
   """
   Redmond, WA. In the past few days, Microsoft has decided to further postpone the start date of
   its United States workers, due to the pandemic that rages with no end in sight...
   """,
   """
   Redmond, WA. Employees at Microsoft can be excited about the new coffee shop that will open on campus
   once workers no longer have to work remotely...
   """
]

result = text_analytics_client.extract_key_phrases(articles)
for idx, doc in enumerate(result):
    if not doc.is_error:
        print("Key phrases in article #{}: {}".format(
           idx + 1,
           ", ".join(doc.key_phrases)
       ))

Key phrases in article #1: D.C. Autumn, beautiful season, clearer view, blue sky, yellow leaves, Washington, DC, trees, city, forests, ground
Key phrases in article #2: United States workers, start date, Redmond, WA, past, days, Microsoft, pandemic, end, sight
Key phrases in article #3: new coffee shop, Redmond, WA, Employees, Microsoft, campus, workers


## 5. Sentiment Analysis
Sentiment analysis determines the sentiment or tone behind a text, classifying it as positive, negative, neutral, or mixed.

**Example:** "I love this product!" would be classified as positive.
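
How the four labels relate to one another can be sketched as follows. This is a simplified illustration of the aggregation rule described in the service documentation (a document is mixed when it contains both positive and negative sentences), not the service's actual code. The function name document_sentiment is hypothetical, and the real service works from model confidence scores rather than bare labels.

```python
def document_sentiment(sentence_labels):
    """Combine per-sentence labels into a single document-level label."""
    has_positive = "positive" in sentence_labels
    has_negative = "negative" in sentence_labels
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"


print(document_sentiment(["positive", "neutral"]))
print(document_sentiment(["positive", "negative", "neutral"]))
```

Under this rule, a review that praises the food but complains about the wait would be labeled mixed rather than averaged into neutral.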

In [7]:
documents = [
   """I had the best day of my life. I decided to go sky-diving and it made me appreciate my whole life so much more.
   I developed a deep-connection with my instructor as well, and I feel as if I've made a life-long friend in her.""",
   """This was a waste of my time. All of the views on this drop are extremely boring, all I saw was grass. 0/10 would
   not recommend to any divers, even first timers.""",
   """This was pretty good! The sights were ok, and I had fun with my instructors! Can't complain too much about my experience""",
   """I only have one word for my experience: WOW!!! I can't believe I have had such a wonderful skydiving company right
   in my backyard this whole time! I will definitely be a repeat customer, and I want to take my grandmother skydiving too,
   I know she'll love it!"""
]


result = text_analytics_client.analyze_sentiment(documents, show_opinion_mining=True)
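# Results are returned in input order; pair each one with its source text
# so the two stay aligned even when a document fails.
pairs = [(text, doc) for text, doc in zip(documents, result) if not doc.is_error]

for text, doc in pairs:
    print(f"Overall sentiment: {doc.sentiment}")
    print(f"Document text: {text}")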
    print("="*110)

Overall sentiment: positive
Document text: I had the best day of my life. I decided to go sky-diving and it made me appreciate my whole life so much more.
   I developed a deep-connection with my instructor as well, and I feel as if I've made a life-long friend in her.
Overall sentiment: negative
Document text: This was a waste of my time. All of the views on this drop are extremely boring, all I saw was grass. 0/10 would
   not recommend to any divers, even first timers.
Overall sentiment: positive
Document text: This was pretty good! The sights were ok, and I had fun with my instructors! Can't complain too much about my experience
Overall sentiment: positive
Document text: I only have one word for my experience: WOW!!! I can't believe I have had such a wonderful skydiving company right
   in my backyard this whole time! I will definitely be a repeat customer, and I want to take my grandmother skydiving too,
   I know she'll love it!


## 6. Language Detection
Language detection identifies the language in which a given text is written.

**Example:** "Hola, ¿cómo estás?" would be identified as Spanish.

In [8]:
documents = [
    "I hated the movie. It was so slow!",
    "J'ai détesté le film. C'était si lent !",
    "ما له فلم څخه کرکه وکړه. دا دومره ورو وه!",
    "مجھے فلم سے نفرت تھی۔ یہ بہت سست تھا!",
    "The movie made it into my top ten favorites. What a great movie!",
]

languages = text_analytics_client.detect_language(documents)

language = []
confidence = []

# Collect the detected language name and confidence score for each document
for res in languages:
    language.append(res.primary_language.name)
    confidence.append(res.primary_language.confidence_score)

df = pd.DataFrame({"Message": documents, "Language": language, "Confidence": confidence})
df.head()

| | Message | Language | Confidence |
|---|---|---|---|
| 0 | I hated the movie. It was so slow! | English | 0.98 |
| 1 | J'ai détesté le film. C'était si lent ! | French | 1.0 |
| 2 | ما له فلم څخه کرکه وکړه. دا دومره ورو وه! | Pashto | 1.0 |
| 3 | مجھے فلم سے نفرت تھی۔ یہ بہت سست تھا! | Urdu | 1.0 |
| 4 | The movie made it into my top ten favorites. W... | English | 0.99 |


## Multiple Analyses at Once

**begin_analyze_actions** runs several analyses over one set of documents in a single request. It currently supports any combination of the following Language APIs:

- Entities Recognition
- PII Entities Recognition
- Linked Entity Recognition
- Key Phrase Extraction
- Sentiment Analysis
- Extractive Summarization (API version 2022-10-01-preview and newer)
- Abstractive Summarization (API version 2022-10-01-preview and newer)

In [10]:
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import (
    TextAnalyticsClient,
    RecognizeEntitiesAction,
    RecognizeLinkedEntitiesAction,
    RecognizePiiEntitiesAction,
    ExtractKeyPhrasesAction,
    AnalyzeSentimentAction,
)

endpoint = df_cred["endpoint"][0]
key = df_cred["key"][0]

text_analytics_client = TextAnalyticsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
)

documents = [
    'We went to Contoso Steakhouse located at midtown NYC last week for a dinner party, and we adore the spot! '
    'They provide marvelous food and they have a great menu. The chief cook happens to be the owner (I think his name is John Doe) '
    'and he is super nice, coming out of the kitchen and greeted us all.'
    ,

    'We enjoyed very much dining in the place! '
    'The Sirloin steak I ordered was tender and juicy, and the place was impeccably clean. You can even pre-order from their '
    'online menu at www.contososteakhouse.com, call 312-555-0176 or send email to order@contososteakhouse.com! '
    'The only complaint I have is the food didn\'t come fast enough. Overall I highly recommend it!'
]

poller = text_analytics_client.begin_analyze_actions(
    documents,
    display_name="Sample Text Analysis",
    actions=[
        RecognizeEntitiesAction(),
        RecognizePiiEntitiesAction(),
        ExtractKeyPhrasesAction(),
        RecognizeLinkedEntitiesAction(),
        AnalyzeSentimentAction(),
    ],
)

document_results = poller.result()
for doc, action_results in zip(documents, document_results):
    print(f"\nDocument text: {doc}")
    for result in action_results:
        if result.kind == "EntityRecognition":
            print("...Results of Recognize Entities Action:")
            for entity in result.entities:
                print(f"......Entity: {entity.text}")
                print(f".........Category: {entity.category}")
                print(f".........Confidence Score: {entity.confidence_score}")
                print(f".........Offset: {entity.offset}")

        elif result.kind == "PiiEntityRecognition":
            print("...Results of Recognize PII Entities action:")
            for pii_entity in result.entities:
                print(f"......Entity: {pii_entity.text}")
                print(f".........Category: {pii_entity.category}")
                print(f".........Confidence Score: {pii_entity.confidence_score}")

        elif result.kind == "KeyPhraseExtraction":
            print("...Results of Extract Key Phrases action:")
            print(f"......Key Phrases: {result.key_phrases}")

        elif result.kind == "EntityLinking":
            print("...Results of Recognize Linked Entities action:")
            for linked_entity in result.entities:
                print(f"......Entity name: {linked_entity.name}")
                print(f".........Data source: {linked_entity.data_source}")
                print(f".........Data source language: {linked_entity.language}")
                print(
                    f".........Data source entity ID: {linked_entity.data_source_entity_id}"
                )
                print(f".........Data source URL: {linked_entity.url}")
                print(".........Document matches:")
                for match in linked_entity.matches:
                    print(f"............Match text: {match.text}")
                    print(f"............Confidence Score: {match.confidence_score}")
                    print(f"............Offset: {match.offset}")
                    print(f"............Length: {match.length}")

        elif result.kind == "SentimentAnalysis":
            print("...Results of Analyze Sentiment action:")
            print(f"......Overall sentiment: {result.sentiment}")
            # Adjacent f-strings are joined, so no stray indentation ends up in the output
            print(
                f"......Scores: positive={result.confidence_scores.positive}; "
                f"neutral={result.confidence_scores.neutral}; "
                f"negative={result.confidence_scores.negative} \n"
            )

        elif result.is_error is True:
            print(
                f"...Is an error with code '{result.error.code}' and message '{result.error.message}'"
            )

    print("------------------------------------------")


Document text: We went to Contoso Steakhouse located at midtown NYC last week for a dinner party, and we adore the spot! They provide marvelous food and they have a great menu. The chief cook happens to be the owner (I think his name is John Doe) and he is super nice, coming out of the kitchen and greeted us all.
...Results of Recognize Entities Action:
......Entity: Contoso Steakhouse
.........Category: Location
.........Confidence Score: 0.68
.........Offset: 11
......Entity: midtown
.........Category: Location
.........Confidence Score: 0.57
.........Offset: 41
......Entity: NYC
.........Category: Location
.........Confidence Score: 0.7
.........Offset: 49
......Entity: last week
.........Category: DateTime
.........Confidence Score: 0.8
.........Offset: 53
......Entity: dinner party
.........Category: Event
.........Confidence Score: 0.93
.........Offset: 69
......Entity: chief cook
.........Category: PersonType
.........Confidence Score: 0.86
.........Offset: 166
......Entity: ow