## Azure AI Language

TODO list:
- Deployment and an introduction to Azure AI Language
- Some text for each solution, but keep the code in one cell if possible

## Custom text classification


## Conversational language understanding 


## Entity linking

## 🕵️‍♂️ Language detection

In [None]:
print(
    "In this sample we own a hotel with customers from all around the globe. We want to eventually "
    "translate these reviews into English so our manager can read them. However, we first need to know which language "
    "they are in for more accurate translation. This is the step we will be covering in this sample.\n"
)
# [START detect_language]
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

# The endpoint and key are read from environment variables so no secrets live in the notebook.
endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
key = os.environ["AZURE_LANGUAGE_KEY"]

text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
documents = [
    """
    The concierge Paulette was extremely helpful. Sadly when we arrived the elevator was broken, but with Paulette's help we barely noticed this inconvenience.
    She arranged for our baggage to be brought up to our room with no extra charge and gave us a free meal to refurbish all of the calories we lost from
    walking up the stairs :). Can't say enough good things about my experience!
    """,
    """
    最近由于工作压力太大，我们决定去富酒店度假。那儿的温泉实在太舒服了，我跟我丈夫都完全恢复了工作前的青春精神！加油！
    """
]

result = text_analytics_client.detect_language(documents)
# Results come back in input order. Pair each review with its result before
# dropping errors, so a failed document cannot shift the indices.
reviewed_docs = [(review, doc) for review, doc in zip(documents, result) if not doc.is_error]

print("Let's see what language each review is in!")

for idx, (review, doc) in enumerate(reviewed_docs):
    print("Review #{} is in '{}', which has ISO 639-1 name '{}'\n".format(
        idx, doc.primary_language.name, doc.primary_language.iso6391_name
    ))
# [END detect_language]
print(
    "When actually storing the reviews, we want to map each review to its ISO 639-1 name "
    "so everything is more standardized."
)

review_to_language = {}
for review, doc in reviewed_docs:
    review_to_language[review] = doc.primary_language.iso6391_name
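
Once every review is mapped to a language code, a natural next step before translation is to bucket the reviews by language. The following is a minimal sketch under a few assumptions: `group_reviews_by_language` is a hypothetical helper that is not part of the Azure SDK, it takes a plain dict shaped like `review_to_language` above, and the sample mapping uses illustrative placeholder codes rather than real service output.

```python
from collections import defaultdict
from typing import Dict, List


def group_reviews_by_language(review_to_language: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a review -> language code mapping into code -> list of reviews."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for review, language_code in review_to_language.items():
        # Strip the leading/trailing whitespace left over from triple-quoted strings.
        grouped[language_code].append(review.strip())
    return dict(grouped)


# Illustrative data only; in the sample this dict is built from detect_language results.
sample_mapping = {
    "The concierge Paulette was extremely helpful.": "en",
    "Can't say enough good things about my experience!": "en",
    "那儿的温泉实在太舒服了": "zh",
}

for code, reviews in group_reviews_by_language(sample_mapping).items():
    print(f"{code}: {len(reviews)} review(s) to translate")
```

Grouping first means each batch can be sent to a translation step with a single known source language.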

<h2 style="font-family: 'Comic Sans MS'">
    🗝️ Key phrase extraction
</h2>

## 🔍 Named Entity Recognition (NER)

## 🧩 Orchestration workflow


<h2 style="font-family: 'Comic Sans MS'">
    🆔 Personally Identifiable Information (PII)
</h2>

<h2 style="font-family: 'Comic Sans MS'">
    ❓ Custom question answering
</h2>

<h2 style="font-family: 'Comic Sans MS'">
    📊 Sentiment analysis and opinion mining
</h2>

In [None]:
import os
import typing
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
key = os.environ["AZURE_LANGUAGE_KEY"]

text_analytics_client = TextAnalyticsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key)
)

print("In this sample we will be a hotel owner going through reviews of their hotel to find complaints.")

print(
    "I first found a handful of reviews for my hotel. Let's see what we have to improve."
)

documents = [
    """
    The food and service were unacceptable, but the concierge were nice.
    After talking to them about the quality of the food and the process to get room service they refunded
    the money we spent at the restaurant and gave us a voucher for near by restaurants.
    """,
    """
    The rooms were beautiful. The AC was good and quiet, which was key for us as outside it was 100F and our baby
    was getting uncomfortable because of the heat. The breakfast was good too with good options and good servicing times.
    The thing we didn't like was that the toilet in our bathroom was smelly. It could have been that the toilet was broken before we arrived.
    Either way it was very uncomfortable. Once we notified the staff, they came and cleaned it and left candles.
    """,
    """
    Nice rooms! I had a great unobstructed view of the Microsoft campus but bathrooms were old and the toilet was dirty when we arrived.
    It was close to bus stops and groceries stores. If you want to be close to campus I will recommend it, otherwise, might be better to stay in a cleaner one
    """
]

result = text_analytics_client.analyze_sentiment(documents, show_opinion_mining=True)
doc_result = [doc for doc in result if not doc.is_error]

print("\nLet's first see the general sentiment of each of these reviews")
positive_reviews = [doc for doc in doc_result if doc.sentiment == "positive"]
mixed_reviews = [doc for doc in doc_result if doc.sentiment == "mixed"]
negative_reviews = [doc for doc in doc_result if doc.sentiment == "negative"]
print("...We have {} positive reviews, {} mixed reviews, and {} negative reviews. ".format(
    len(positive_reviews), len(mixed_reviews), len(negative_reviews)
))
print(
    "\nSince these reviews seem so mixed, and since I'm interested in finding exactly what it is about my hotel that should be improved, "
    "let's find the complaints users have about individual aspects of this hotel"
)

print(
    "\nIn order to do that, I'm going to extract targets of a negative sentiment. "
    "I'm going to map each of these targets to the mined opinion object we get back to aggregate the reviews by target. "
)
target_to_complaints: typing.Dict[str, typing.Any] = {}

for document in doc_result:
    for sentence in document.sentences:
        if sentence.mined_opinions:
            for mined_opinion in sentence.mined_opinions:
                target = mined_opinion.target
                if target.sentiment == 'negative':
                    target_to_complaints.setdefault(target.text, [])
                    target_to_complaints[target.text].append(mined_opinion)

print("\nLet's now go through the aspects of our hotel people have complained about and see what users have specifically said")

for target_name, complaints in target_to_complaints.items():
    print("Users have made {} complaint(s) about '{}', specifically saying that it's '{}'".format(
        len(complaints),
        target_name,
        "', '".join(
            [assessment.text for complaint in complaints for assessment in complaint.assessments]
        )
    ))


print(
    "\n\nLooking at the breakdown, I can see what aspects of my hotel need improvement, and based on both the number and "
    "content of the complaints users have made about my toilets, I need to get that fixed ASAP."
)

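The cell above prints complaints in whatever order the targets were first seen. To make the worst offender obvious, the aggregated dict can be ranked by complaint count. This is a minimal sketch, not SDK functionality: `rank_complaints` and `fake_opinion` are hypothetical names, and the stub objects only mimic the one attribute path used here (`assessments[].text`) so the example runs without a service call.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


def rank_complaints(
    target_to_complaints: Dict[str, List[Any]]
) -> List[Tuple[str, int, List[str]]]:
    """Return (target, complaint count, assessment texts), most complaints first."""
    ranked = [
        (
            target,
            len(complaints),
            [a.text for c in complaints for a in c.assessments],
        )
        for target, complaints in target_to_complaints.items()
    ]
    # sorted() is stable, so targets with equal counts keep their insertion order.
    return sorted(ranked, key=lambda row: row[1], reverse=True)


def fake_opinion(*assessment_texts: str) -> SimpleNamespace:
    """Hypothetical stand-in for a mined opinion; real ones come from analyze_sentiment."""
    return SimpleNamespace(assessments=[SimpleNamespace(text=t) for t in assessment_texts])


# Illustrative data shaped like target_to_complaints in the cell above.
sample = {
    "food": [fake_opinion("unacceptable")],
    "toilet": [fake_opinion("smelly"), fake_opinion("dirty")],
}

ranking = rank_complaints(sample)
for target, count, words in ranking:
    print(f"{target}: {count} complaint(s): {', '.join(words)}")
```

With the real `target_to_complaints` dict, the first row of the ranking is the aspect to fix first.
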
In [None]:
print(
    "In this sample we will be combing through reviews customers have left about their "
    "experience using our skydiving company, Contoso."
)
print(
    "We start out with a list of reviews. Let us extract the reviews we are sure are "
    "positive, so we can display them on our website and get even more customers!"
)

# [START analyze_sentiment]
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
key = os.environ["AZURE_LANGUAGE_KEY"]

text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

documents = [
    """I had the best day of my life. I decided to go sky-diving and it made me appreciate my whole life so much more.
    I developed a deep-connection with my instructor as well, and I feel as if I've made a life-long friend in her.""",
    """This was a waste of my time. All of the views on this drop are extremely boring, all I saw was grass. 0/10 would
    not recommend to any divers, even first timers.""",
    """This was pretty good! The sights were ok, and I had fun with my instructors! Can't complain too much about my experience""",
    """I only have one word for my experience: WOW!!! I can't believe I have had such a wonderful skydiving company right
    in my backyard this whole time! I will definitely be a repeat customer, and I want to take my grandmother skydiving too,
    I know she'll love it!"""
]


result = text_analytics_client.analyze_sentiment(documents, show_opinion_mining=True)
# Pair each input with its result before dropping errors, so the printed text
# still matches its sentiment even if one document fails.
reviewed = [(text, doc) for text, doc in zip(documents, result) if not doc.is_error]
docs = [doc for _, doc in reviewed]

print("Let's visualize the sentiment of each of these documents")
for text, doc in reviewed:
    print(f"Document text: {text}")
    print(f"Overall sentiment: {doc.sentiment}")
# [END analyze_sentiment]

print("Now, let us extract all of the positive reviews")
positive_reviews = [doc for doc in docs if doc.sentiment == 'positive']

print("We want to be very confident that our reviews are positive since we'll be posting them on our website.")
print("We're going to confirm our chosen reviews are positive using two different tests")

print(
    "First, we are going to check how confident the sentiment analysis model is that a document is positive. "
    "Let's go with a 90% confidence."
)
positive_reviews = [
    review for review in positive_reviews
    if review.confidence_scores.positive >= 0.9
]

print(
    "Finally, we also want to make sure every sentence is positive so we only showcase our best selves!"
)
positive_reviews_final = []
for idx, review in enumerate(positive_reviews):
    print(f"Looking at positive review #{idx + 1}")
    any_sentence_not_positive = False
    for sentence in review.sentences:
        print("...Sentence '{}' has sentiment '{}' with confidence scores '{}'".format(
            sentence.text,
            sentence.sentiment,
            sentence.confidence_scores
        ))
        if sentence.sentiment != 'positive':
            any_sentence_not_positive = True
    if not any_sentence_not_positive:
        positive_reviews_final.append(review)

print("We now have the final list of positive reviews we are going to display on our website!")
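
The two tests above (document-level confidence of at least 90%, then every sentence positive) can be folded into one reusable filter. This is a minimal sketch, assuming result objects that expose `sentiment`, `confidence_scores.positive`, and `sentences[].sentiment` as used in the cell above. `select_showcase_reviews` and `fake_doc` are hypothetical names, and the stubs stand in for real service results so the example runs offline.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def select_showcase_reviews(docs: Iterable[Any], min_confidence: float = 0.9) -> List[Any]:
    """Keep documents that are confidently positive and positive in every sentence."""
    selected = []
    for doc in docs:
        if doc.sentiment != "positive":
            continue  # overall sentiment must be positive
        if doc.confidence_scores.positive < min_confidence:
            continue  # the model must be confident enough
        if all(sentence.sentiment == "positive" for sentence in doc.sentences):
            selected.append(doc)
    return selected


def fake_doc(sentiment: str, positive_score: float, sentence_sentiments: List[str]) -> SimpleNamespace:
    """Hypothetical stand-in for a sentiment result; real ones come from analyze_sentiment."""
    return SimpleNamespace(
        sentiment=sentiment,
        confidence_scores=SimpleNamespace(positive=positive_score),
        sentences=[SimpleNamespace(sentiment=s) for s in sentence_sentiments],
    )


sample_docs = [
    fake_doc("positive", 0.98, ["positive", "positive"]),  # kept
    fake_doc("positive", 0.95, ["positive", "neutral"]),   # dropped: one neutral sentence
    fake_doc("positive", 0.70, ["positive"]),              # dropped: below the threshold
    fake_doc("negative", 0.01, ["negative"]),              # dropped: not positive overall
]

showcase = select_showcase_reviews(sample_docs)
print(f"{len(showcase)} of {len(sample_docs)} reviews qualify for the website")
```

Exposing the threshold as a parameter makes it easy to see how many reviews survive at, say, 0.8 versus 0.9 before committing to a cutoff.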

<h2 style="font-family: 'Comic Sans MS'">
    💊 Text Analytics for health
</h2>

<h2 style="font-family: 'Comic Sans MS'">
    ✍️ Summarization
</h2>

In [1]:
!pip install azure-ai-textanalytics
!pip install azure-core
!pip install azure-identity

In [None]:
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
key = os.environ["AZURE_LANGUAGE_KEY"]

text_analytics_client = TextAnalyticsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
)

document = [
    "At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, "
    "human-centric approach to learning and understanding. As Chief Technology Officer of Azure AI Cognitive "
    "Services, I have been working with a team of amazing scientists and engineers to turn this quest into a "
    "reality. In my role, I enjoy a unique perspective in viewing the relationship among three attributes of "
    "human cognition: monolingual text (X), audio or visual sensory signals, (Y) and multilingual (Z). At the "
    "intersection of all three, there's magic-what we call XYZ-code as illustrated in Figure 1-a joint "
    "representation to create more powerful AI that can speak, hear, see, and understand humans better. "
    "We believe XYZ-code will enable us to fulfill our long-term vision: cross-domain transfer learning, "
    "spanning modalities and languages. The goal is to have pretrained models that can jointly learn "
    "representations to support a broad range of downstream AI tasks, much in the way humans do today. "
    "Over the past five years, we have achieved human performance on benchmarks in conversational speech "
    "recognition, machine translation, conversational question answering, machine reading comprehension, "
    "and image captioning. These five breakthroughs provided us with strong signals toward our more ambitious "
    "aspiration to produce a leap in AI capabilities, achieving multisensory and multilingual learning that "
    "is closer in line with how humans learn and understand. I believe the joint XYZ-code is a foundational "
    "component of this aspiration, if grounded with external knowledge sources in the downstream AI tasks."
]

poller = text_analytics_client.begin_abstract_summary(document)
abstract_summary_results = poller.result()
for result in abstract_summary_results:
    # Check for failures first: error results carry no summaries.
    if result.is_error:
        print("...Is an error with code '{}' and message '{}'".format(
            result.error.code, result.error.message
        ))
    elif result.kind == "AbstractiveSummarization":
        print("Summaries abstracted:")
        for summary in result.summaries:
            print(f"{summary.text}\n")

Summaries abstracted:
The Chief Technology Officer of Azure AI Cognitive Services discusses Microsoft's commitment to advancing AI by integrating monolingual text, audio or visual signals, and multilingual capabilities, termed as the XYZ-code. This approach aims to create AI that can better understand humans across different domains and languages. Through their efforts, Microsoft has achieved human-level performance on key benchmarks in speech recognition, machine translation, conversational question answering, reading comprehension, and image captioning. The ultimate goal is to develop pretrained models that can learn from multiple modalities and languages, akin to human learning, and incorporate external knowledge sources for downstream AI tasks. This progress is seen as a stepping stone towards a significant leap in AI capabilities, with a focus on multisensory and multilingual learning. The XYZ-code is central to this vision, promising a more holistic and human-centric AI.

