# Named Entity Recognition (NER)

Entity categories and entity types are documented at:
https://learn.microsoft.com/en-us/azure/ai-services/language-service/named-entity-recognition/concepts/named-entity-categories?tabs=ga-api&wt.mc_id=MVP_322781

## Install Library

In [None]:
%pip install azure-ai-textanalytics

## Load Azure Configurations

In [2]:
import os

# Load Azure configurations from environment variables
# Ensure that AZURE_AI_LANGUAGE_KEY and AZURE_AI_LANGUAGE_ENDPOINT are set in your environment
language_key = os.environ.get('AZURE_AI_LANGUAGE_KEY')
language_endpoint = os.environ.get('AZURE_AI_LANGUAGE_ENDPOINT')
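If a variable is missing, `os.environ.get` returns `None`. The problem then only shows up later, when the credential is built or the first request is sent. The sketch below fails fast instead. It assumes a hypothetical helper named `require_env`, which is not part of the Azure SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of azure-ai-textanalytics.
    """
    value = os.environ.get(name)
    if not value:
        # Name the missing variable so the notebook error is self-explanatory
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Possible usage in place of the os.environ.get calls above:
# language_key = require_env('AZURE_AI_LANGUAGE_KEY')
# language_endpoint = require_env('AZURE_AI_LANGUAGE_ENDPOINT')
```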

## Create a Text Analytics client

In [3]:
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Authenticate the client using the key and endpoint loaded above
def authenticate_client():
    """
    Authenticates the Azure Text Analytics client using the key and endpoint
    read from the environment.

    Returns:
        TextAnalyticsClient: An authenticated client for Azure Text Analytics.
    """
    ta_credential = AzureKeyCredential(language_key)
    text_analytics_client = TextAnalyticsClient(
        endpoint=language_endpoint,
        credential=ta_credential
    )
    return text_analytics_client

# Initialize the client
client = authenticate_client()

## Recognize Entities Function

In [4]:
def entity_recognition(client, documents):
    """
    Recognizes named entities in the provided documents using the Azure Text Analytics client.

    Args:
        client (TextAnalyticsClient): An authenticated Azure Text Analytics client.
        documents (list of str): A list of text documents to analyze for named entities.

    Returns:
        None: Prints the recognized entities and their details for each document.
    """
    # Call the Azure Text Analytics API to recognize entities in the documents
    result = client.recognize_entities(documents=documents)

    print("Named Entities:\n")

    # Iterate through the results for each document
    for idx, doc_result in enumerate(result):  # Enumerate to get the document index
        print(f"Document {idx + 1}:\n")  # Print the document number (1-based index)

        # Check if the result for the document is valid
        if not doc_result.is_error:
            # Iterate through the recognized entities in the document
            for entity in doc_result.entities:
                # Print details of each entity
                print(
                    "\tText: \t", entity.text,
                    "\tCategory: \t", entity.category,
                    "\tSubCategory: \t", entity.subcategory,
                    "\n\tConfidence Score: \t", round(entity.confidence_score, 2),
                    "\tLength: \t", entity.length,
                    "\tOffset: \t", entity.offset,
                    "\n"
                )
        else:
            # Print an error message if the document result contains an error
            print(f"\tError in document with ID {doc_result.id}: {doc_result.error}")

        # Separate consecutive documents with blank lines for readability
        print("\n")
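Printing is useful for exploration, but downstream code usually needs the entities as data. A common example is grouping them by category and dropping low-confidence matches. The sketch below assumes a hypothetical `Entity` dataclass. It stands in for the SDK's entity objects and mirrors the attributes the function above reads (`text`, `category`, `subcategory`, `confidence_score`, `length`, `offset`). The 0.8 cut-off is an arbitrary choice, not a service default.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Entity:
    """Hypothetical stand-in mirroring the entity attributes printed above."""
    text: str
    category: str
    subcategory: Optional[str]
    confidence_score: float
    length: int
    offset: int


def group_entities(entities: Iterable[Entity], min_score: float = 0.8) -> Dict[str, List[str]]:
    """Group entity texts by category, keeping only entities with score >= min_score."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        if entity.confidence_score >= min_score:
            grouped[entity.category].append(entity.text)
    return dict(grouped)


# Sample values taken from the Document 1 output in this notebook
sample = [
    Entity("John Doe", "Person", None, 1.0, 8, 0),
    Entity("Grand Hyatt", "Location", None, 0.73, 11, 137),
    Entity("Makati", "Location", "City", 1.0, 6, 152),
]
print(group_entities(sample))
# {'Person': ['John Doe'], 'Location': ['Makati']}
```

The function only reads attributes, so it should also accept `doc_result.entities` from a non-error result returned by `client.recognize_entities`. That path has not been exercised here.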

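The `Offset` and `Length` values locate each entity in the submitted string. In the Python SDK, offsets are counted in Unicode code points by default (the `string_index_type` option), so they can be used directly as Python slice indices. The sketch below uses them to mark entities in place. `highlight_entities` is a hypothetical helper that takes plain `(offset, length)` pairs, so it runs without calling the service. It also assumes the spans do not overlap.

```python
from typing import Iterable, Tuple


def highlight_entities(text: str, spans: Iterable[Tuple[int, int]]) -> str:
    """Wrap each (offset, length) span of text in square brackets.

    Hypothetical helper. Spans are applied from the end of the string backwards,
    so inserting brackets does not shift the offsets of earlier spans.
    Assumes the spans do not overlap.
    """
    for offset, length in sorted(spans, reverse=True):
        end = offset + length
        text = f"{text[:offset]}[{text[offset:end]}]{text[end:]}"
    return text


sentence = "John Doe, a renowned figure in the field of cybersecurity, recently attended"
# (offset, length) pairs for "John Doe" and "recently" from the Document 1 output
print(highlight_entities(sentence, [(0, 8), (59, 8)]))
# [John Doe], a renowned figure in the field of cybersecurity, [recently] attended
```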
In [5]:
documents = [
    """John Doe, a renowned figure in the field of cybersecurity, recently attended the International Cybersecurity 
    Conference held at the Grand Hyatt in Makati. As a Chief Information Security Officer (CISO) at SecureTech, 
    a leading company specializing in advanced security solutions, John demonstrated the latest product from SecureTech, 
    the CyberShield 3000, which is designed to protect against sophisticated cyber threats. His presentation included a live 
    demonstration of the product's capabilities, showcasing its ability to detect and neutralize threats in real-time. 
    The conference also featured discussions on emerging skills required for cybersecurity professionals, 
    emphasizing the need for continuous learning and adaptation.
    """,
    """Jane Smith, with extensive experience in urban planning, recently moved to a new address at 1234 Elm Street, Springfield. 
    As a Senior Urban Planner at GreenCity Solutions, a company dedicated to sustainable urban development, Jane's work involves 
    coordinating large-scale projects that aim to improve city infrastructure and promote eco-friendly practices. She frequently 
    collaborates with local government officials and attends various events such as the Annual Urban Development Summit. Jane's 
    expertise in project management and her proficiency in GIS mapping are highly valued in her field. She can be reached at her 
    phone number (555-1234) or via email (jane.smith@greencity.com) for consultations and project inquiries.
    """
]

entity_recognition(client, documents)

Named Entities:

Document 1:

	Text: 	 John Doe 	Category: 	 Person 	SubCategory: 	 None 
	Confidence Score: 	 1.0 	Length: 	 8 	Offset: 	 0 

	Text: 	 recently 	Category: 	 DateTime 	SubCategory: 	 None 
	Confidence Score: 	 0.98 	Length: 	 8 	Offset: 	 59 

	Text: 	 International Cybersecurity 	Category: 	 Event 	SubCategory: 	 None 
	Confidence Score: 	 0.91 	Length: 	 27 	Offset: 	 81 

	Text: 	 Conference 	Category: 	 Event 	SubCategory: 	 None 
	Confidence Score: 	 0.89 	Length: 	 10 	Offset: 	 114 

	Text: 	 Grand Hyatt 	Category: 	 Location 	SubCategory: 	 None 
	Confidence Score: 	 0.73 	Length: 	 11 	Offset: 	 137 

	Text: 	 Makati 	Category: 	 Location 	SubCategory: 	 City 
	Confidence Score: 	 1.0 	Length: 	 6 	Offset: 	 152 

	Text: 	 Chief Information Security Officer 	Category: 	 PersonType 	SubCategory: 	 None 
	Confidence Score: 	 0.93 	Length: 	 34 	Offset: 	 165 

	Text: 	 CISO 	Category: 	 PersonType 	SubCategory: 	 None 
	Confidence Score: 	 0.76 	Length: 	 4 	Offs