<h1 style="color:orange; font-size:48px; text-align:center">Topic 7: Build custom text analytics solutions</h1>

# Introduction

Natural Language Processing (NLP) enables software to interpret human language. With the Azure AI Language service, developers can detect sentiment, identify languages, and classify text into user-defined categories.

In this guide, you'll:

- Understand the types of classification projects.
- Learn to develop custom text classification projects.
- Discover ways to label, train, and deploy your model.

## 1. Understanding Types of Classification Projects

Custom text classification, a feature of the Azure AI Language service, assigns developer-defined labels (classes) to text. For example, a video game description could be labeled "Adventure" or "Action".

There are two primary custom text classification projects:

- **Single label classification:** One class is assigned to a file. E.g., a video game can either be "Adventure" or "Strategy", but not both.

- **Multiple label classification:** A file can have more than one class. So, a video game might be both "Adventure" and "Strategy".

You choose the project type when you create the project. The difference between single and multiple label projects also affects how you label data, how you evaluate the model's accuracy, and which API task you call to classify text.
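In the Python client library (`azure-ai-textanalytics`), the project type decides which method you call: `begin_single_label_classify` for single label projects and `begin_multi_label_classify` for multiple label projects. The sketch below wraps that choice in a hypothetical helper, `classify_documents`; it assumes you already have a `TextAnalyticsClient` and your own project and deployment names.

```python
from typing import Any, List, Tuple


def classify_documents(
    client: Any,
    documents: List[str],
    project_name: str,
    deployment_name: str,
    multi_label: bool,
) -> List[Tuple[str, ...]]:
    """Hypothetical helper: call the classify method that matches the project type.

    `client` is expected to be an azure.ai.textanalytics.TextAnalyticsClient.
    Returns a tuple of predicted categories per document (empty if the document errored).
    """
    # Single and multiple label projects use different SDK methods.
    begin_classify = (
        client.begin_multi_label_classify if multi_label else client.begin_single_label_classify
    )
    poller = begin_classify(
        documents, project_name=project_name, deployment_name=deployment_name
    )
    categories = []
    for result in poller.result():
        if result.is_error:
            categories.append(())
        else:
            # One classification for single label projects, possibly several for multiple label.
            categories.append(tuple(c.category for c in result.classifications))
    return categories
```

For a multiple label project, a call such as `classify_documents(client, docs, "MyProject", "MyDeployment", multi_label=True)` (names hypothetical) could return `("Adventure", "Strategy")` for a single game description.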

## 2. Labeling Data for Classification

In single label projects, each file is assigned exactly one class; in multiple label projects, a file can have several. Label quality largely determines model quality, especially in multiple label projects, where every applicable label must be assigned to each file. Aim for a dataset that is clearly labeled, varied, and representative of the inputs the model will receive.
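Some of these labeling rules can be checked mechanically before training. The sketch below assumes your labels are kept in a hypothetical `file name -> list of classes` mapping (not the format Azure stores); it flags unlabeled files, extra labels in a single label project, unknown classes, and classes with no examples.

```python
from collections import Counter
from typing import Dict, List


def check_labels(labels: Dict[str, List[str]], classes: List[str], multi_label: bool) -> List[str]:
    """Return a list of problems found in a hypothetical file -> labels mapping."""
    problems = []
    allowed = set(classes)
    for file_name, file_labels in labels.items():
        if not file_labels:
            problems.append(f"{file_name}: no label assigned")
        elif not multi_label and len(file_labels) > 1:
            problems.append(f"{file_name}: single label project but {len(file_labels)} labels")
        for label in file_labels:
            if label not in allowed:
                problems.append(f"{file_name}: unknown class '{label}'")
    # Every class needs at least one labeled example for the model to learn it.
    counts = Counter(label for file_labels in labels.values() for label in file_labels)
    for cls in classes:
        if counts[cls] == 0:
            problems.append(f"class '{cls}' has no labeled files")
    return problems
```

The same mapping passes or fails depending on the project type: a file labeled both "Adventure" and "Strategy" is valid only when `multi_label=True`.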

## 3. Evaluating and Improving Your Model

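No classifier is perfect: it produces false positives (FP, assigning a label that does not apply) and false negatives (FN, missing a label that does apply), alongside true positives (TP, correctly assigned labels). Azure AI Language reports the following metrics to measure these errors:

- **Recall:** Of all the actual labels, the proportion the model predicted correctly: TP / (TP + FN).

- **Precision:** Of all the labels the model predicted, the proportion that are correct: TP / (TP + FP).

- **F1 Score:** The harmonic mean of precision and recall, which combines them into a single score.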
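These metrics can be computed from counts of true positives, false positives, and false negatives for each class. The sketch below applies the standard definitions to a single label project; it is an illustration, not the Azure service's own evaluation code, and the example labels are made up.

```python
from typing import Iterable, Tuple


def count_outcomes(pairs: Iterable[Tuple[str, str]], cls: str) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives for one class.

    Each pair is (actual label, predicted label) for one file in a single label project.
    """
    tp = fp = fn = 0
    for actual, predicted in pairs:
        if predicted == cls and actual == cls:
            tp += 1  # correctly predicted the class
        elif predicted == cls:
            fp += 1  # predicted the class, but the file has another label
        elif actual == cls:
            fn += 1  # the file has the class, but the model missed it
    return tp, fp, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard precision, recall, and F1 (harmonic mean), returning 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


pairs = [("Sports", "Sports"), ("Sports", "News"), ("News", "Sports"), ("News", "News"), ("Sports", "Sports")]
tp, fp, fn = count_outcomes(pairs, "Sports")
print(tp, fp, fn)  # 2 1 1
print(precision_recall_f1(tp, fp, fn))
```

For the "Sports" class here, precision, recall, and F1 all come out to 2/3.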

Based on these errors, you can refine your model (for example, by adding or correcting labeled data) and retrain it to improve performance with each iteration.

## 4. Building Text Classification Projects

Using Azure AI Language, developers can create custom text classification projects. The steps involved are:

- **Define Labels:** Determine the possible labels for your data.

- **Tag Data:** Properly label existing data.

- **Train Model:** Use labeled data to train your model.

- **View Model:** After training, assess the model's results.

- **Improve Model:** Identify errors and enhance model performance.

- **Deploy Model:** Make the trained model accessible via API.

- **Classify Text:** Use the deployed model to classify text.

Labeled data is divided into training and testing datasets:

- **Training Dataset:** Used to train the model.

- **Testing Dataset:** Used to verify the model's performance post-training.

You can split the data automatically, letting the service assign a percentage of the labeled files to the testing set, or manually, by assigning each file to a dataset yourself while labeling.
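An automatic split amounts to randomly assigning a fixed share of the labeled files to the testing set. The sketch below illustrates the idea with an assumed 80/20 ratio and a fixed random seed; it is not how Azure implements its split, but it can help you reason about it or pre-assign datasets for a manual split.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    files: Sequence[str], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Randomly split labeled files into (training, testing) lists.

    The 0.2 default (an 80/20 split) and the seed are assumptions for illustration.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

With ten files, `split_dataset(files)` returns eight training files and two testing files, and no file appears in both lists.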

# Guide to Create a Text Classifier with Azure AI Language

### Overview
This guide walks through creating a text classifier with custom text classification, one of the NLP capabilities of the Azure AI Language service. Let's go through each step in detail.

#### Steps
1. **Set Up Azure AI Language Service Resource**

Go to the Azure portal and sign in with your Microsoft account.

Search for Azure AI services and create a Language Service resource.

Choose the Custom text classification & extraction feature.

Configure your new resource with:

- Subscription: Your Azure subscription

- Resource group: A unique name

- Region: Any available

- Name: A unique resource name

- Pricing tier: Standard S

- Storage account: New storage account

- Storage account name: A unique name

- Storage account type: Standard LRS

- Responsible AI notice: Checked

2. **Retrieve Language Resource Key and Endpoint**

Navigate to the resource group in the Azure portal and choose the Azure AI Language resource.

Go to Keys and Endpoint and copy one of the keys and the endpoint. Save these for later use.
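Instead of saving the key in a file, one common option is to store the endpoint and key in environment variables and read them at runtime. A minimal sketch, assuming the hypothetical variable names `LANGUAGE_ENDPOINT` and `LANGUAGE_KEY`:

```python
import os
from typing import Tuple


def load_language_credentials() -> Tuple[str, str]:
    """Read the Language resource endpoint and key from environment variables.

    LANGUAGE_ENDPOINT and LANGUAGE_KEY are hypothetical names; use whatever names you set.
    """
    endpoint = os.environ.get("LANGUAGE_ENDPOINT")
    key = os.environ.get("LANGUAGE_KEY")
    missing = [
        name
        for name, value in (("LANGUAGE_ENDPOINT", endpoint), ("LANGUAGE_KEY", key))
        if not value
    ]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return endpoint, key
```

The returned values can be passed to `TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))`, the same constructor used in the step 9 code.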

3. **Upload Sample Articles**


Download sample articles from the provided GitHub repository.

Go to the Azure portal and select your storage account.

Choose Containers and create a new one named "articles".

Upload the sample articles to this container.
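The upload can also be scripted with the Azure Storage Blob client library for Python (`azure-storage-blob`). The sketch below assumes the sample articles are `.txt` files in a local folder and takes a `ContainerClient` for the "articles" container as a parameter; the helper name `upload_articles` is hypothetical.

```python
from pathlib import Path
from typing import Any, List


def upload_articles(container_client: Any, folder: str) -> List[str]:
    """Upload every .txt file in `folder` to the container; return the uploaded blob names.

    `container_client` is expected to be an azure.storage.blob.ContainerClient.
    """
    uploaded = []
    for path in sorted(Path(folder).glob("*.txt")):
        with path.open("rb") as data:
            # overwrite=True replaces an existing blob with the same name instead of failing
            container_client.upload_blob(name=path.name, data=data, overwrite=True)
        uploaded.append(path.name)
    return uploaded
```

A suitable client can be obtained with `BlobServiceClient.from_connection_string(conn_str).get_container_client("articles")`, where `conn_str` is your storage account's connection string.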

4. **Create a Custom Text Classification Project**

Sign into the Language Studio.

Ensure your Azure subscription and resource are selected.

Under the Classify text tab, choose Custom text classification.

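Create a new project: choose single label classification (the type used in this guide), select the "articles" container as the data source, and configure the remaining options.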

5. **Label Your Data**

Once your project is created, start labeling (or tagging) your data.

Use the provided classes: Classifieds, Sports, News, and Entertainment.

Assign each article the appropriate class and dataset (training or testing).
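Before training, it can help to confirm that every class has labeled articles in both datasets. A minimal sketch, assuming you keep a hypothetical list of `(file name, class, dataset)` records while labeling, with datasets named "Train" and "Test":

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Classes used in this guide
CLASSES = ("Classifieds", "Sports", "News", "Entertainment")
DATASETS = ("Train", "Test")  # assumed dataset names


def class_counts(labels: Iterable[Tuple[str, str, str]]) -> Dict[str, Counter]:
    """Count labeled articles per class in each dataset."""
    counts = {dataset: Counter() for dataset in DATASETS}
    for file_name, cls, dataset in labels:
        if cls not in CLASSES or dataset not in DATASETS:
            raise ValueError(f"{file_name}: unexpected class {cls!r} or dataset {dataset!r}")
        counts[dataset][cls] += 1
    return counts


def missing_classes(counts: Dict[str, Counter]) -> Dict[str, List[str]]:
    """List the classes that have no labeled articles in each dataset."""
    return {dataset: [cls for cls in CLASSES if counts[dataset][cls] == 0] for dataset in counts}
```

A non-empty list from `missing_classes` points to classes that still need labeled articles in that dataset.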

6. **Train Your Model**

Navigate to Training jobs.

Select **Start a training job**, name your model (e.g., ClassifyArticles), and start the training process.

7. **Evaluate Your Model**

Review model performance metrics.

Check the Test set details tab for any inconsistencies between the model's predictions and the actual labels.
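The comparison on the Test set details tab boils down to finding files whose predicted class differs from the actual one and seeing which classes get confused. A minimal sketch, assuming you have copied the actual and predicted classes into hypothetical `file name -> class` dictionaries:

```python
from collections import Counter
from typing import Dict, List, Tuple


def misclassified(actual: Dict[str, str], predicted: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (file, actual class, predicted class) for each test file the model got wrong."""
    return [
        (name, actual[name], predicted[name])
        for name in sorted(actual)
        if name in predicted and predicted[name] != actual[name]
    ]


def confusion_counts(actual: Dict[str, str], predicted: Dict[str, str]) -> Counter:
    """Count (actual, predicted) pairs; pairs with different classes show where errors cluster."""
    return Counter((actual[name], predicted[name]) for name in actual if name in predicted)
```

If one pair, such as ("News", "Entertainment"), dominates the errors, labeling more distinctive examples of those two classes is a natural next step.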

8. **Deploy Your Model**

Select Deploying model from the left panel.

Add a deployment and provide the necessary details.

9. **Test Your Model**

One option is to open **Test deployment** in Language Studio, enter some text, and run the test.

Alternatively, call the deployed model from Python with the `azure-ai-textanalytics` client library, for example from Azure Cloud Shell or a local notebook, as in the following code.

```python
# [START single_label_classify]
# Requires: pip install azure-ai-textanalytics pandas openpyxl
import pandas as pd
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

# Read the endpoint and key saved in step 2 from a local Excel file
# (assumes credentials.xlsx has "endpoint" and "key" columns).
df_cred = pd.read_excel("credentials.xlsx")
endpoint = df_cred["endpoint"].iloc[0]
key = df_cred["key"].iloc[0]

# Names of your Language Studio project and deployment
project_name = "SentimentAnalysis"
deployment_name = "sentdep"

document = ["""The first newspaper that originated in thee boundaries, of what now is Pakistan is the Lahore Chronicle which started appearing in 1849. 
The paper was started, by Syed Muhammad Azim, father of the historian from Punjab, Syed Muhammad Latif, in 1849"""]

text_analytics_client = TextAnalyticsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
)

# Submit a single label classification job and wait for it to finish
poller = text_analytics_client.begin_single_label_classify(
    document,
    project_name=project_name,
    deployment_name=deployment_name,
)
document_results = poller.result()

for doc, classification_result in zip(document, document_results):
    if classification_result.is_error:
        print(f"Document text '{doc}' has an error with code "
              f"'{classification_result.error.code}' and message '{classification_result.error.message}'")
    else:
        # A single label project returns one classification per document
        classification = classification_result.classifications[0]
        print(f"The document text '{doc}' was classified as '{classification.category}' "
              f"with confidence score {classification.confidence_score}.")
# [END single_label_classify]
```

Output:

```
The document text 'The first newspaper that originated in thee boundaries, of what now is Pakistan is the Lahore Chronicle which started appearing in 1849. 
The paper was started, by Syed Muhammad Azim, father of the historian from Punjab, Syed Muhammad Latif, in 1849' was classified as 'Entertainment' with confidence score 0.29.
```
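The prediction above has a low confidence score (0.29). One option for such results is to accept only predictions above a threshold and route the rest to manual review. A minimal sketch, assuming predictions are collected as hypothetical `(document id, category, score)` tuples and an arbitrary threshold of 0.5:

```python
from typing import Iterable, List, Tuple

# (document id, predicted category, confidence score)
Prediction = Tuple[str, str, float]


def triage(
    predictions: Iterable[Prediction], threshold: float = 0.5
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into (accepted, needs_review) by confidence score.

    The 0.5 default threshold is an arbitrary assumption; tune it on your own test set.
    """
    accepted: List[Prediction] = []
    needs_review: List[Prediction] = []
    for prediction in predictions:
        _doc_id, _category, score = prediction
        if score >= threshold:
            accepted.append(prediction)
        else:
            needs_review.append(prediction)
    return accepted, needs_review
```

With the SDK results from the code above, each tuple could be built from `classification.category` and `classification.confidence_score`.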


10. **Clean Up**

Delete your project from the Projects page in Language Studio.

Delete the Azure AI Language resource and the associated storage account in the Azure portal.

### Conclusion

By following this guide, you can use the Azure AI Language service to build a custom text classifier. Whether you're building a news categorization system, a sentiment analysis tool, or any other application that needs to understand and categorize text, custom text classification gives you a managed way to label data and to train, evaluate, and deploy your own model.