# OCI Language Service Examples

### What this file does:

Demonstrates OCI Language service capabilities for natural language processing (NLP) tasks using small language models. It covers sentiment analysis (aspects and sentences), key phrase extraction, named entity recognition (NER), text classification, PII identification and masking, language detection, and text translation. The notebook provides interactive examples with sample texts and reuses the structured result display helper from the oci_language.py file.

**Documentation to reference:**

- OCI Language: https://docs.oracle.com/en-us/iaas/language/using/home.htm
- Sentiment Analysis: https://docs.oracle.com/en-us/iaas/language/using/sentiment.htm
- Key Phrases: https://docs.oracle.com/en-us/iaas/language/using/keyphrases.htm
- Named Entities: https://docs.oracle.com/en-us/iaas/language/using/ner.htm
- Text Classification: https://docs.oracle.com/en-us/iaas/language/using/text-classification.htm
- PII Detection: https://docs.oracle.com/en-us/iaas/language/using/pii.htm
- Language Detection: https://docs.oracle.com/en-us/iaas/language/using/lang-detect.htm
- Translation: https://docs.oracle.com/en-us/iaas/language/using/translate-text.htm
- OCI Python SDK: https://github.com/oracle/oci-python-sdk/tree/master/src/oci/ai_language

**Relevant Slack channels:**

- #oci_ai_lang_service_users: *for questions on OCI Language service*
- #igiu-innovation-lab: *general discussions on your project*
- #igiu-ai-learning: *help with sandbox environment or running this code*

**Env setup:**

- sandbox.yaml: Contains OCI config, compartment, and other details.
- .env: Load environment variables (e.g., API keys if needed).
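- Configure the Jupyter working directory to match your workspace, so that relative paths such as `sandbox.yaml` resolve the same way as in your Python code:
  - In VS Code, open Settings > Extensions > Jupyter > Notebook File Root.
  - Change the value from `${fileDirname}` to `${workspaceFolder}`.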

**How to run in notebook:**

- Make sure your runtime environment has all dependencies and access to required config files.
- Run the notebook cells in order.

If you have errors running sample code, reach out for help in #igiu-ai-learning.

### Overview of Capabilities

OCI Language service provides efficient, cost-effective NLP using small models:

1. **Sentiment Analysis:** Detect positive/negative/neutral sentiment on aspects or full sentences.
2. **Key Phrase Extraction:** Identify important phrases in text.
3. **Named Entity Recognition (NER):** Extract entities like persons, organizations, locations.
4. **Text Classification:** Categorize text into predefined labels.
5. **PII Identification & Masking:** Detect and mask sensitive info (e.g., phone numbers, addresses).
6. **Language Detection:** Identify dominant language in text.
7. **Translation:** Translate text between supported languages (e.g., English to Dutch).

Supported languages: Primarily English (en), with multilingual support for some features.

### Step 1: Load Config and Initialize Client

Set up the OCI Language client and prepare sample texts for analysis. This initializes the connection and defines documents for batch processing.
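
Before running the cell in this step, it can help to confirm that `sandbox.yaml` contains the keys the code reads. The following is a minimal sketch and not part of the original notebook: `missing_sandbox_keys` is a hypothetical helper, and the sample dictionary only mirrors the three lookups used in this step (`configFile`, `profile`, and `compartment` under `oci`). The values shown are placeholders.

```python
# Hypothetical helper: checks a parsed sandbox config for the keys this step reads.
REQUIRED_OCI_KEYS = ("configFile", "profile", "compartment")


def missing_sandbox_keys(cfg):
    """Return the required keys under cfg["oci"] that are absent or empty."""
    oci_section = cfg.get("oci") or {}
    return [key for key in REQUIRED_OCI_KEYS if not oci_section.get(key)]


# Placeholder values for illustration only; use your own sandbox.yaml contents.
example_cfg = {
    "oci": {
        "configFile": "~/.oci/config",
        "profile": "DEFAULT",
        # "compartment" is intentionally left out to show the check
    }
}

print(missing_sandbox_keys(example_cfg))  # ['compartment']
```

The helper works on a plain dictionary, so one way to use it, assuming the parsed config exposes the `oci` section as a dictionary, is to pass `{"oci": scfg["oci"]}` after loading the file.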

In [None]:
import oci, os, json 
from dotenv import load_dotenv
from envyaml import EnvYAML

# Make sure your sandbox.yaml file is set up for your environment.
# You might have to specify the full path, depending on your `cwd`.
SANDBOX_CONFIG_FILE = "sandbox.yaml"

# Read the sandbox config
scfg = EnvYAML(SANDBOX_CONFIG_FILE)

# Read the OCI config file and profile named in sandbox.yaml
config = oci.config.from_file(os.path.expanduser(scfg["oci"]["configFile"]), scfg["oci"]["profile"])

# Compartment used for all Language service calls in this notebook
compartmentId = scfg["oci"]["compartment"]

lang_client = oci.ai_language.AIServiceLanguageClient(config)

# Sample texts for demonstration (contains sentiments, entities, PII)
test_string1 = """
    Oracle Cloud Infrastructure is built for enterprises seeking higher performance, lower costs, and easier cloud migration for their applications. 
    Customers choose Oracle Cloud Infrastructure over AWS for several reasons:
    First, they can consume cloud services in the public cloud or within their  own data center with Oracle Dedicated Region Cloud@Customer. 
    Second, they can migrate and run any workload as is on Oracle Cloud, including Oracle databases and applications, VMware, or bare metal servers. 
    Third, customers can easily implement security controls and automation to prevent misconfiguration errors and implement security best practices. 
    Fourth, they have lower risks with Oracle's end-to-end SLAs covering performance, availability, and manageability of services. 
    Finally, their workloads achieve better performance at a significantly lower cost with Oracle Cloud Infrastructure than AWS.
    
    Take a look at what makes Oracle Cloud Infrastructure a better cloud platform than AWS."""

test_string2 = " The restaurant Chinese Garden on 100  Broadway, Denver, CO-80503 serves delicious meal, but the food can be expensive."

test_string3 = " The wet, slushy rain in Denver can lead to accidents, but if yuo send an emai to help@denver.org they will come out and help which is awesome"

test_doc1 = oci.ai_language.models.TextDocument(
        key="oci",
        text=test_string1,
        language_code="en"
        )

test_doc2 = oci.ai_language.models.TextDocument(
        key="chinese_garden",
        text=test_string2,
        language_code="en"
        )
test_doc3 = oci.ai_language.models.TextDocument(
        key="Denver",
        text=test_string3,
        language_code="en"
        )

test_docs=[test_doc1, test_doc2, test_doc3]

# Function for structured display (from oci_language.py)
def display_analysis_results(response_type, response_data):
    print(f"📊 {response_type.upper()} RESULTS:")
    print("-" * 30)
    if response_data and response_data.documents:
        for doc in response_data.documents:
            if hasattr(doc, 'aspects') and doc.aspects:
                for aspect in doc.aspects:
                    print(f"  Text: '{aspect.text}'")
                    print(f"  Sentiment: {aspect.sentiment}")
                    print(f"  Length: {aspect.length}, Offset: {aspect.offset}")
            elif hasattr(doc, 'document_sentiment'):
                print(f"  Document sentiment: {doc.document_sentiment}")
                print(f"  scores: {doc.document_scores}")
                
            elif hasattr(doc, 'key_phrases') and doc.key_phrases:
                for phrase in doc.key_phrases:
                    print(f"  Phrase: '{phrase.text}' (Score: {phrase.score:.3f})")
            elif hasattr(doc, 'masked_text'):
                print(f"  Masked Text: {doc.masked_text}")
            elif hasattr(doc, 'entities') and doc.entities:
                for entity in doc.entities:
                    print(f"  Text: '{entity.text}'")
                    print(f"  Type: {entity.type}, Sub-type: {entity.sub_type}")
                    print(f"  Length: {entity.length}, Offset: {entity.offset}")
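            elif hasattr(doc, 'text_classification') and doc.text_classification:
                for classification in doc.text_classification:
                    print(f"  Label: {classification.label} (Score: {classification.score:.3f})")
            elif hasattr(doc, 'languages') and doc.languages:
                # Language detection results (Step 7)
                for language in doc.languages:
                    print(f"  Language: {language.name} ({language.code}), Score: {language.score:.3f}")
            elif hasattr(doc, 'translated_text'):
                # Translation results (Step 8)
                print(f"  Translated Text: {doc.translated_text}")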
     
    else:
        print("  No data available")
    print()

### Step 2: Sentiment Analysis

Analyze sentiment at the aspect level (specific topics) or at the sentence level. Each result is labeled positive, negative, neutral, or mixed, with accompanying scores.

**Experiment:** Try your own texts with mixed sentiments (e.g., product reviews) and switch between ASPECT and SENTENCE levels to compare granularity.
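
To compare granularity across levels, one option is to tally the sentiment labels returned for a document. The sketch below uses `SimpleNamespace` stand-ins rather than a live response. It assumes each aspect exposes a `.sentiment` attribute, as used by `display_analysis_results` above; `tally_sentiments` and the mock data are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def tally_sentiments(items):
    """Count sentiment labels across aspect-like objects that expose `.sentiment`."""
    return Counter(item.sentiment.lower() for item in items)


# Stand-in data shaped like the aspects printed by display_analysis_results.
mock_aspects = [
    SimpleNamespace(text="meal", sentiment="Positive"),
    SimpleNamespace(text="food", sentiment="Negative"),
    SimpleNamespace(text="service", sentiment="Positive"),
]

counts = tally_sentiments(mock_aspects)
print(counts.most_common())  # [('positive', 2), ('negative', 1)]
```

With a real response, the same function could be applied to `doc.aspects` for each document in `senti_res.data.documents`.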

In [None]:
# Sentiment analysis - aspect level
senti_details = oci.ai_language.models.BatchDetectLanguageSentimentsDetails(
    documents=test_docs,
    compartment_id=compartmentId,
)

senti_res = lang_client.batch_detect_language_sentiments(
    batch_detect_language_sentiments_details=senti_details,
    level=["ASPECT"],
)

display_analysis_results("Sentiment (Aspect)", senti_res.data)

In [None]:
# Sentiment analysis - sentence level
senti_details = oci.ai_language.models.BatchDetectLanguageSentimentsDetails(
    documents=test_docs,
    compartment_id=compartmentId,
)

senti_res = lang_client.batch_detect_language_sentiments(
    batch_detect_language_sentiments_details=senti_details,
    level=["SENTENCE"],
)

display_analysis_results("Sentiment (Sentence)", senti_res.data)

### Step 3: Key Phrase Extraction

Extracts salient phrases from text, ranked by relevance score.

**Experiment:** Test with longer documents (e.g., articles) and observe how scores reflect importance. Adjust text to include domain-specific terms.
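
When a long document yields many phrases, ranking by score and keeping the top few makes the output easier to read. This is a minimal sketch on stand-in objects: `top_key_phrases` is a hypothetical helper, and the scores below are made up for illustration rather than taken from the service.

```python
from types import SimpleNamespace


def top_key_phrases(phrases, n=3, min_score=0.0):
    """Return up to n (text, score) pairs, highest score first."""
    kept = [p for p in phrases if p.score >= min_score]
    ranked = sorted(kept, key=lambda p: p.score, reverse=True)
    return [(p.text, p.score) for p in ranked[:n]]


# Made-up phrases and scores, shaped like entries in doc.key_phrases.
mock_phrases = [
    SimpleNamespace(text="cloud migration", score=0.91),
    SimpleNamespace(text="several reasons", score=0.42),
    SimpleNamespace(text="security controls", score=0.87),
    SimpleNamespace(text="bare metal servers", score=0.78),
]

for text, score in top_key_phrases(mock_phrases, n=2):
    print(f"{text}: {score:.2f}")
```

The `min_score` cutoff is optional; it is there so low-relevance phrases can be dropped before the top-N limit is applied.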

In [None]:
# Key phrase extraction
keyphrase_extraction = lang_client.batch_detect_language_key_phrases(
    batch_detect_language_key_phrases_details=oci.ai_language.models.BatchDetectLanguageKeyPhrasesDetails(
        documents=test_docs,
        compartment_id=compartmentId,
    )
)

display_analysis_results("Key Phrases", keyphrase_extraction.data)

### Step 4: Named Entity Recognition (NER)

Identifies entities like PERSON, ORGANIZATION, LOCATION, with sub-types and offsets.

**Experiment:** Use texts with PII (names, addresses) or news articles. Check sub-types for finer classification (e.g., DATE vs. TIME).
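
Entity results carry a type plus an offset and length, which can be used to group entities and to locate them in the source text. The sketch below works on stand-in objects and assumes offsets are character indices into the submitted text, which is worth verifying against real output. The helper names, the sample sentence, and the type labels in the mock data are hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities_by_type(entities):
    """Map each entity type to the list of entity texts of that type."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity.type].append(entity.text)
    return dict(grouped)


def span_matches(source_text, entity):
    """Check that offset/length point at the entity text (assumes character offsets)."""
    start = entity.offset
    return source_text[start:start + entity.length] == entity.text


# Stand-in data; real type labels and offsets come from the service response.
sample = "Chinese Garden is in Denver."
mock_entities = [
    SimpleNamespace(text="Chinese Garden", type="ORGANIZATION", offset=0, length=14),
    SimpleNamespace(text="Denver", type="LOCATION", offset=21, length=6),
]

print(group_entities_by_type(mock_entities))
print(all(span_matches(sample, e) for e in mock_entities))
```

If `span_matches` returns `False` on real responses, the offset convention differs from the assumption here and the slicing logic would need adjusting.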

In [None]:
# Named entity extraction
# see: https://docs.oracle.com/en-us/iaas/language/using/ner.htm
ner_extraction = lang_client.batch_detect_language_entities(
    batch_detect_language_entities_details=oci.ai_language.models.BatchDetectLanguageEntitiesDetails(
        documents=test_docs,
        compartment_id=compartmentId,
    )
)

display_analysis_results("Named Entities", ner_extraction.data)

### Step 5: Text Classification

Classifies text into categories (e.g., sports, business) with confidence scores.

**Experiment:** Input texts from different domains (e.g., emails, tweets) and analyze score distributions. Combine with sentiment for richer insights.
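
When analyzing score distributions, it is often useful to accept the top label only when its score clears a threshold. The following sketch uses stand-in objects with `.label` and `.score`, matching the attributes printed by `display_analysis_results`. `best_label`, the 0.5 default threshold, and the mock scores are assumptions for illustration.

```python
from types import SimpleNamespace


def best_label(classifications, min_score=0.5):
    """Return (label, score) for the top-scoring class, or None if below min_score."""
    if not classifications:
        return None
    top = max(classifications, key=lambda c: c.score)
    if top.score < min_score:
        return None
    return (top.label, top.score)


# Made-up labels and scores, shaped like entries in doc.text_classification.
mock_classes = [
    SimpleNamespace(label="business", score=0.82),
    SimpleNamespace(label="sports", score=0.11),
]

print(best_label(mock_classes))
print(best_label(mock_classes, min_score=0.9))
```

Returning `None` for low-confidence results leaves it to the caller to decide whether to skip the document or route it for manual review.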

In [None]:
## Text Classification

#see: https://docs.oracle.com/en-us/iaas/language/using/text-class.htm

# Run text classification on the sample documents
text_classification = lang_client.batch_detect_language_text_classification(
    batch_detect_language_text_classification_details=oci.ai_language.models.BatchDetectLanguageTextClassificationDetails(
        documents=test_docs,
        compartment_id=compartmentId,
    )
)
print(text_classification.data)
display_analysis_results("Text Classification", text_classification.data)

### Step 6: PII Identification & Masking

Detects personally identifiable information (e.g., email addresses, phone numbers) and masks it (e.g., by replacing sensitive characters with `*`).

**Experiment:** Customize masking (e.g., leave more characters unmasked) or test with fabricated PII. Observe how it handles different formats (e.g., international phones).
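
To build intuition for the masking options before changing them, the sketch below mimics locally what "leave N characters unmasked, counted from the end" could look like on a single value. It is an approximation of how the `masking_character`, `leave_characters_unmasked`, and `is_unmasked_from_end` settings read, not the service's implementation. `mask_value` is a hypothetical helper, and the actual service output may differ, especially for short values.

```python
def mask_value(value, masking_character="*", leave_unmasked=0, unmask_from_end=False):
    """Mask a string locally, leaving `leave_unmasked` characters visible."""
    if leave_unmasked <= 0:
        return masking_character * len(value)
    if leave_unmasked >= len(value):
        # Assumption: nothing is masked when the value is shorter than the unmasked count.
        return value
    hidden = masking_character * (len(value) - leave_unmasked)
    if unmask_from_end:
        return hidden + value[-leave_unmasked:]
    return value[:leave_unmasked] + hidden


print(mask_value("help@denver.org", "*", 4, True))   # ***********.org
print(mask_value("help@denver.org", "*", 4, False))  # help***********
```

Comparing this local result with the `masked_text` returned by the service is a quick way to check whether the parameters behave as expected.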

In [None]:
# PII Identification

#see: https://docs.oracle.com/en-us/iaas/language/using/pii.htm


piiEntityMasking = oci.ai_language.models.PiiEntityMask(mode="MASK", masking_character="*", leave_characters_unmasked=4,
                                                        is_unmasked_from_end=True)

masking = {"ALL": piiEntityMasking}
pii_identification = lang_client.batch_detect_language_pii_entities(
            batch_detect_language_pii_entities_details=oci.ai_language.models.BatchDetectLanguagePiiEntitiesDetails(
                documents=test_docs,compartment_id = compartmentId,
                masking = masking
            )
        )

display_analysis_results("PII Masking", pii_identification.data)

### Step 7: Language Detection

Detects the dominant language in mixed or unknown text.

**Experiment:** Mix multiple languages in one text or use short phrases. Test edge cases like code-switching (e.g., Spanglish).
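
Short or mixed-language inputs can produce low-confidence detections, so it may help to group documents by detected language and set aside uncertain ones. This is a minimal sketch on stand-in objects: it assumes each document result exposes `key` and a `languages` list whose entries have `code` and `score`. `route_by_language`, the 0.5 threshold, and the mock scores are hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace


def route_by_language(doc_results, min_score=0.5, fallback="undetermined"):
    """Group document keys by their highest-scoring language code."""
    routes = defaultdict(list)
    for doc in doc_results:
        best = max(doc.languages or [], key=lambda lang: lang.score, default=None)
        if best is not None and best.score >= min_score:
            routes[best.code].append(doc.key)
        else:
            routes[fallback].append(doc.key)
    return dict(routes)


# Made-up scores for illustration; real values come from the service response.
mock_results = [
    SimpleNamespace(key="french", languages=[SimpleNamespace(code="fr", score=0.98)]),
    SimpleNamespace(key="dutch", languages=[SimpleNamespace(code="nl", score=0.99)]),
    SimpleNamespace(key="short", languages=[SimpleNamespace(code="es", score=0.35)]),
]

print(route_by_language(mock_results))
```

Documents that land under the fallback key are good candidates for the edge cases mentioned above, such as code-switching or very short phrases.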

In [None]:
# Language detection
# see: https://docs.oracle.com/en-us/iaas/language/using/lang-detect.htm


lang_doc1 = oci.ai_language.models.DominantLanguageDocument(
        key="french",
        text="Et encore une autre langue, es-possible qu'il le comprend ?",
        )

lang_doc2 = oci.ai_language.models.DominantLanguageDocument(
        key="dutch",
        text="Een tekst in mijn moedertaal om het een beetje moeilijker te maken voor de service",
        )
lang_doc3 = oci.ai_language.models.DominantLanguageDocument(
        key="english",
        text="This should be fairly easy to detect, I'll avoid using the name of the actual language in this text",
        )

lang_docs=[lang_doc1, lang_doc2, lang_doc3]

response = lang_client.batch_detect_dominant_language(
    batch_detect_dominant_language_details=oci.ai_language.models.BatchDetectDominantLanguageDetails(
        documents=lang_docs,
        compartment_id=compartmentId,
    )
)

print(response.data)
display_analysis_results("Language Detection", response.data)

### Step 8: Text Translation

Translates text between supported languages (source auto-detected or specified).

**Experiment:** Change target_language (e.g., 'es' for Spanish, 'hi' for Hindi) or translate back to original to check fidelity. Test with idiomatic expressions.
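
For the round-trip experiment, a rough way to quantify fidelity is to compare the original text with its back-translation. The sketch below uses `difflib.SequenceMatcher` from the standard library, which measures character-level similarity rather than meaning, so treat the number as a coarse signal only. `round_trip_similarity` is a hypothetical helper, and the back-translated string is a placeholder, not real service output.

```python
from difflib import SequenceMatcher


def _normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def round_trip_similarity(original, back_translated):
    """Return a 0..1 lexical similarity between a text and its back-translation."""
    matcher = SequenceMatcher(None, _normalize(original), _normalize(back_translated))
    return matcher.ratio()


original = "OCI will be the cloud engine for the artificial intelligence models."
# Placeholder back-translation for illustration only.
back_translated = "OCI will become the cloud engine for the artificial intelligence models."

print(f"similarity: {round_trip_similarity(original, back_translated):.2f}")
```

A low ratio does not necessarily mean a bad translation, since valid paraphrases also lower the score. Idiomatic expressions are a good place to inspect the texts by eye.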

In [None]:
# Translation
# https://docs.oracle.com/en-us/iaas/language/using/translate-text.htm
# Translate a few sentences from English to Dutch.  Feel free to change the text or the languages


key1 = "doc1"
key2 = "doc2"
text1 = "The Indy Autonomous Challenge is the worlds first head-to-head, high speed autonomous race taking place at the Indianapolis Motor Speedway"
text2 = "OCI will be the cloud engine for the artificial intelligence models that drive the MIT Driverless cars."
target_language = "nl" #TODO specify the target language

doc1 = oci.ai_language.models.TextDocument(key=key1, text=text1, language_code="en")
doc2 = oci.ai_language.models.TextDocument(key=key2, text=text2, language_code="en")
documents = [doc1, doc2]


batch_language_translation_details = oci.ai_language.models.BatchLanguageTranslationDetails(
    documents=documents,
    compartment_id=compartmentId,
    target_language_code=target_language,
)
output = lang_client.batch_language_translation(batch_language_translation_details)
print(output.data)
display_analysis_results("Translation", output.data)

### Practice Exercises and Discussion Prompts

1. **Sentiment & Classification Pipeline:**
   - Build a pipeline: Classify text, then analyze sentiment on top categories.
   - Experiment: Use customer feedback data; visualize sentiment trends per class.

2. **NER & PII in Data Cleaning:**
   - Extract entities from resumes or emails, then mask PII for anonymization.
   - Experiment: Test accuracy on varied formats (e.g., abbreviations, non-English names).

3. **Multilingual Workflow:**
   - Detect language, translate to English, then run sentiment/NER.
   - Experiment: Handle mixed-language texts; compare pre/post-translation results.

4. **Key Phrases for Summarization:**
   - Extract key phrases from long articles and generate a summary prompt for an LLM.
   - Experiment: Rank phrases by score and limit to top-N.

5. **Build a Mini-Project:**
   - Create a text analyzer app: Input text, output all analyses in a report.
   - Integrate with OCI Gen AI: Use Language outputs to prompt an LLM for insights.
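
As a starting point for the multilingual workflow in exercise 3, the chaining logic can be kept separate from the service calls by passing them in as functions. This is a sketch under stated assumptions: `analyze_multilingual` and the three `fake_*` callables are hypothetical stand-ins that you would replace with your own wrappers around the detection, translation, and analysis calls from the earlier steps.

```python
def analyze_multilingual(text, detect, translate, analyze, pivot="en"):
    """Detect the language, translate to the pivot language if needed, then analyze."""
    source = detect(text)
    working_text = text if source == pivot else translate(text, source, pivot)
    return {
        "source_language": source,
        "text": working_text,
        "analysis": analyze(working_text),
    }


# Stand-in callables so the flow can be exercised without calling the service.
def fake_detect(text):
    return "nl" if "een" in text.lower().split() else "en"


def fake_translate(text, source, target):
    return f"[{source}->{target}] {text}"


def fake_analyze(text):
    return {"length": len(text)}


print(analyze_multilingual("Een tekst", fake_detect, fake_translate, fake_analyze))
print(analyze_multilingual("A text", fake_detect, fake_translate, fake_analyze))
```

Injecting the callables keeps the pipeline testable offline and makes it straightforward to compare results with and without the translation step.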

**Discussion Prompts:**
- How do small models like OCI Language compare to LLMs for targeted NLP tasks (cost, speed, accuracy)?
- When would you chain multiple Language features (e.g., detect -> translate -> classify)?
- Discuss privacy implications of PII detection in production apps.

For help, reach out in #igiu-ai-learning!