# Audio Analysis with Amazon Bedrock Data Automation: The Voice of Information

This notebook continues your journey with Amazon Bedrock Data Automation (BDA) by exploring the rich world of audio data. While document processing provided the foundation for structured extraction, and image analysis allowed us to see beyond text, audio analysis represents another fundamental dimension of human communication: the voice.

As you work through this notebook, you'll build capabilities to:
- Convert spoken content into structured transcripts with speaker identification
- Generate comprehensive summaries of audio content
- Identify potentially sensitive or inappropriate content through moderation
- Classify audio according to standard categories
- Extract semantic insights from conversations and discussions

Under the hood, BDA handles all of this through one asynchronous job: it reads an audio file from S3 and writes transcripts with speaker labels, content moderation results, summaries, and more back to S3.

This notebook focuses on the core BDA workflow for audio analysis:

1. Preparing a sample audio file
2. Creating a BDA project with appropriate output configurations
3. Processing the audio with BDA
4. Analyzing the results (summaries, transcripts, and content moderation)

The setup cell below contains helper functions and initialization code. **It's collapsed by default**; you can expand it to see the details, but you don't need to understand all of it to follow the main BDA workflow.

In [None]:
# This cell contains setup code and helper functions.
# You can expand it to see the details, but you don't need to understand all of it.

%pip install "boto3>=1.37.4" --upgrade -qq

import boto3
import json
import uuid
import time
import os
import sagemaker
import matplotlib.pyplot as plt
from datetime import datetime
from IPython.display import Audio, clear_output, JSON, HTML, Markdown, display
import warnings
warnings.filterwarnings('ignore')

# Import all utilities from the consolidated utils module
from utils.utils import BDAAudioUtils, show_business_context, ensure_bda_results_dir

# Initialize our utility class
bda_utils = BDAAudioUtils()

# Display comprehensive business context for audio analysis
show_business_context("audio_complete")

print(f"Setup complete. BDA utilities initialized for region: {bda_utils.current_region}")
print(f"Using S3 bucket: {bda_utils.bucket_name}")

## Industry Applications

As you work through this notebook, consider how audio analysis could transform workflows in your specific domain:

**Call Centers**: How would automatic transcription, speaker identification, and sentiment analysis of customer calls improve your service quality and agent performance?

**Media & Entertainment**: What if your content libraries could automatically generate transcripts, summaries, and topic analysis for podcasts, interviews, and broadcasts?

**Healthcare**: How could transcription of patient interactions, medical dictations, and automated documentation transform your clinical workflows?

**Financial Services**: What insights could you gain from analyzing earnings calls, advisory conversations, and compliance monitoring of recorded interactions?

**Your Industry**: What types of audio content are critical to your organization's workflows? What insights within these recordings would provide the most business value if automatically extracted?

## 1. Prepare Sample Audio

First, we'll download a sample audio file and upload it to S3 for processing with BDA. The sample is a podcast recording whose spoken content BDA will analyze.

The audio will be stored in an S3 bucket that BDA can access. This step is required because BDA reads its input from S3.
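If you want to reproduce the download-and-upload step without the workshop's `bda_utils` helpers, a minimal sketch using only the standard library and a boto3 S3 client might look like the following. The function names `download_file` and `upload_to_s3_uri` are hypothetical, the S3 client is passed in (for example `boto3.client("s3")`), and this is not the implementation of `bda_utils.download_audio` or `bda_utils.upload_to_s3`.

```python
import shutil
import urllib.request


def download_file(url: str, local_path: str) -> str:
    """Download `url` to `local_path` and return the local path (hypothetical helper)."""
    # Stream the response to disk rather than reading the whole file into memory
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path


def upload_to_s3_uri(s3_client, local_path: str, bucket: str, key: str) -> str:
    """Upload `local_path` to s3://bucket/key and return that URI (hypothetical helper).

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3");
    its upload_file(Filename, Bucket, Key) method performs the transfer.
    """
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

For example, `upload_to_s3_uri(boto3.client("s3"), "podcastdemo.mp3", "my-bucket", "audio/podcastdemo.mp3")` would return `"s3://my-bucket/audio/podcastdemo.mp3"`, the URI form passed to BDA in `inputConfiguration`. The bucket and key are placeholders.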

In [None]:
# Download sample audio file
sample_audio = 'podcastdemo.mp3'
source_url = 'https://ws-assets-prod-iad-r-pdx-f3b3f9f1a7d6a3d0.s3.us-west-2.amazonaws.com/335119c4-e170-43ad-b55c-76fa6bc33719/podcastdemo.mp3'

# Download the audio with the workshop's download helper
local_path = bda_utils.download_audio(source_url, sample_audio)

# Display the audio player
display(Audio(sample_audio, autoplay=False))

# Upload to S3
s3_key = f'{bda_utils.data_prefix}/{sample_audio}'
s3_uri = bda_utils.upload_to_s3(sample_audio, s3_key)
print(f"Uploaded audio to S3: {s3_uri}")

## 2. Define BDA Configuration and Create Project

Now we'll define the standard output configuration for audio analysis and create a BDA project. This configuration determines what information BDA will extract from the audio.

### Key Configuration Options for Audio Analysis:

- **Audio Content Moderation**: Detects inappropriate or unsafe content in the audio
- **Topic Content Moderation**: Analyzes topics for potentially sensitive content
- **Transcript**: Generates a transcript of spoken content with speaker identification
- **Audio Summary**: Generates an overall summary of the audio content
- **Topic Summary**: Provides summaries for each detected topic
- **IAB Categories**: Classifies content into Internet Advertising Bureau categories
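These six options fall into two sections of the configuration: the first three belong under `extraction.category`, and the last three under `generativeField`. A small builder like the one below (`build_audio_output_config` is a hypothetical helper, not part of BDA or boto3) produces the same nested shape as the next cell and catches misspelled or misplaced option names locally, before any API call.

```python
# Option names as used in this notebook's standard output configuration
EXTRACTION_TYPES = {"AUDIO_CONTENT_MODERATION", "TOPIC_CONTENT_MODERATION", "TRANSCRIPT"}
GENERATIVE_TYPES = {"AUDIO_SUMMARY", "TOPIC_SUMMARY", "IAB"}


def build_audio_output_config(extraction_types, generative_types) -> dict:
    """Build an audio standardOutputConfiguration dict (hypothetical helper)."""
    extraction_types = list(extraction_types)
    generative_types = list(generative_types)
    # Reject names that are unknown or placed in the wrong section
    bad = sorted((set(extraction_types) - EXTRACTION_TYPES)
                 | (set(generative_types) - GENERATIVE_TYPES))
    if bad:
        raise ValueError(f"Unknown or misplaced option(s): {bad}")
    if not extraction_types or not generative_types:
        raise ValueError("Select at least one extraction and one generative option")
    return {
        "audio": {
            "extraction": {"category": {"state": "ENABLED", "types": extraction_types}},
            "generativeField": {"state": "ENABLED", "types": generative_types},
        }
    }
```

Calling `build_audio_output_config(["TRANSCRIPT"], ["AUDIO_SUMMARY", "IAB"])` yields a configuration that only transcribes, summarizes, and classifies, while `build_audio_output_config(["IAB"], ["AUDIO_SUMMARY"])` raises because `IAB` is a generative option.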

In [None]:
# Show business context for project architecture
show_business_context("project_architecture")

# Define standard output configuration for audio processing
standard_output_config = {
    "audio": {
        "extraction": {
            "category": {
                "state": "ENABLED", 
                "types": [
                    "AUDIO_CONTENT_MODERATION",  # Detect inappropriate content in audio
                    "TOPIC_CONTENT_MODERATION",  # Analyze topics for sensitive content
                    "TRANSCRIPT"                 # Generate transcript with speaker identification
                ]
            }
        },
        "generativeField": {
            "state": "ENABLED",
            "types": [
                "AUDIO_SUMMARY",                # Generate overall audio summary
                "TOPIC_SUMMARY",                # Generate summaries for each topic
                "IAB"                           # Classify into IAB categories
            ]
        }
    }
}

# Create a unique project name
project_name = f'bda-workshop-audio-project-{str(uuid.uuid4())[0:4]}'

# Create a BDA project with our standard output configuration
print("Creating BDA project for audio analysis...")
response = bda_utils.bda_client.create_data_automation_project(
    projectName=project_name,
    projectDescription='BDA workshop audio sample project',
    projectStage='DEVELOPMENT',
    standardOutputConfiguration=standard_output_config
)

# Get the project ARN
audio_project_arn = response.get("projectArn")
print(f"BDA project created with ARN: {audio_project_arn}")

## 3. Process Audio with BDA

Now we'll use the `invoke_data_automation_async` API to process our audio with BDA. This API starts an asynchronous job to analyze the audio and extract insights based on our project configuration.
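The request to `invoke_data_automation_async` has four parts: where to read the input, where to write results, which project (and stage) to apply, and which data automation profile to run under. The hypothetical helper below only assembles those keyword arguments, using the same keys and profile ARN format as the next cell; the default profile name `us.data-automation-v1` is copied from that cell.

```python
def build_invoke_request(input_uri: str, output_uri: str, project_arn: str,
                         region: str, account_id: str,
                         stage: str = "DEVELOPMENT",
                         profile: str = "us.data-automation-v1") -> dict:
    """Assemble keyword arguments for invoke_data_automation_async (hypothetical helper)."""
    profile_arn = f"arn:aws:bedrock:{region}:{account_id}:data-automation-profile/{profile}"
    return {
        "inputConfiguration": {"s3Uri": input_uri},    # audio to analyze
        "outputConfiguration": {"s3Uri": output_uri},  # where BDA writes results
        "dataAutomationConfiguration": {
            "dataAutomationProjectArn": project_arn,
            "stage": stage,  # must match the stage the project was created with
        },
        "dataAutomationProfileArn": profile_arn,
    }
```

The result can be unpacked into the call, e.g. `bda_utils.bda_runtime_client.invoke_data_automation_async(**request)`, which keeps the request easy to log or reuse across files.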

The API returns an invocation ARN that we can use to check the status of the processing job. Audio processing can take several minutes depending on the length and complexity of the audio.
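Because the job is asynchronous, the client has to poll `get_data_automation_status` until the job reaches a terminal state. The sketch below shows one way to approximate what `bda_utils.wait_for_completion` does; `poll_invocation` is a hypothetical name, the terminal states (`Success`, `ClientError`, `ServiceError`) are the ones the next cell checks, and the status function is passed in so a boto3 runtime client's `get_data_automation_status` can be used directly. A time budget instead of a fixed iteration count makes it easier to allow for the several minutes a longer recording may need.

```python
import time
from typing import Callable

SUCCESS_STATES = {"Success"}
ERROR_STATES = {"ClientError", "ServiceError"}


def poll_invocation(get_status: Callable[..., dict], invocation_arn: str,
                    delay: float = 10.0, timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll get_status(invocationArn=...) until a terminal state or the timeout (hypothetical)."""
    waited = 0.0
    while True:
        response = get_status(invocationArn=invocation_arn)
        status = response["status"]
        # Stop as soon as the job succeeds or fails; the caller inspects the response
        if status in SUCCESS_STATES or status in ERROR_STATES:
            return response
        if waited >= timeout:
            raise TimeoutError(f"{invocation_arn} still '{status}' after {waited:.0f} s")
        sleep(delay)
        waited += delay
```

Usage: `status_response = poll_invocation(bda_utils.bda_runtime_client.get_data_automation_status, invocation_arn)`. The injectable `sleep` argument exists only so the loop can be exercised without real waiting.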

In [None]:
# Show business context for processing pipeline
show_business_context("processing_pipeline")

# Invoke BDA to process the audio
print(f"Processing audio: {s3_uri}")
print(f"Results will be stored at: s3://{bda_utils.bucket_name}/{bda_utils.output_prefix}")

# Call the invoke_data_automation_async API
response = bda_utils.bda_runtime_client.invoke_data_automation_async(
    inputConfiguration={
        's3Uri': s3_uri  # The S3 location of our audio
    },
    outputConfiguration={
        's3Uri': f's3://{bda_utils.bucket_name}/{bda_utils.output_prefix}'  # Where to store results
    },
    dataAutomationConfiguration={
        'dataAutomationProjectArn': audio_project_arn,  # The project we created
        'stage': 'DEVELOPMENT'                          # Must match the project stage
    },
    dataAutomationProfileArn=f'arn:aws:bedrock:{bda_utils.current_region}:{bda_utils.account_id}:data-automation-profile/us.data-automation-v1'
)

# Get the invocation ARN
invocation_arn = response.get("invocationArn")
print(f"Invocation ARN: {invocation_arn}")

# Wait for processing to complete
status_response = bda_utils.wait_for_completion(
    get_status_function=bda_utils.bda_runtime_client.get_data_automation_status,
    status_kwargs={'invocationArn': invocation_arn},
    completion_states=['Success'],
    error_states=['ClientError', 'ServiceError'],
    status_path_in_response='status',
    max_iterations=60,   # 60 polls x 10 s delay: allows several minutes of processing
    delay=10
)

# Check if processing was successful
if status_response['status'] == 'Success':
    output_config_uri = status_response.get("outputConfiguration", {}).get("s3Uri")
    print("\nAudio processing completed successfully!")
    print(f"Output configuration: {output_config_uri}")
else:
    print(f"\nAudio processing failed with status: {status_response['status']}")
    # get_data_automation_status reports failures in errorType / errorMessage
    error_message = status_response.get('errorMessage') or status_response.get('error_message')
    if error_message:
        print(f"Error type: {status_response.get('errorType', 'unknown')}")
        print(f"Error message: {error_message}")

## 4. Access and Parse BDA Results

Now we'll access the BDA results from S3 and parse them. The results include the audio summary, transcript, content moderation analysis, and other metadata.
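The status response points at a file inside the job's output folder, and the next cell looks for the standard output at the hard-coded relative path `0/standard_output/0/result.json` within that folder. The hypothetical helper below derives both the bucket and the result key from the URI. It uses `posixpath` instead of `os.path` because S3 keys always use forward slashes, whatever the local operating system; the example URI in the usage note is made up.

```python
import posixpath
from urllib.parse import urlparse

# Relative location of result.json, hard-coded exactly as in the cell below
RESULT_SUFFIX = "0/standard_output/0/result.json"


def result_location(output_config_uri: str) -> tuple[str, str]:
    """Return (bucket, key) of result.json in the same folder as output_config_uri."""
    parsed = urlparse(output_config_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected an s3://bucket/key URI, got {output_config_uri!r}")
    # Drop the leading "/" from the path, then take the containing folder
    folder = posixpath.dirname(parsed.path.lstrip("/"))
    return parsed.netloc, posixpath.join(folder, RESULT_SUFFIX)
```

For a hypothetical `s3://my-bucket/output/job-123/job_metadata.json`, this returns `("my-bucket", "output/job-123/0/standard_output/0/result.json")`.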

In [None]:
# Show business context for speaker identification
show_business_context("speaker_identification")

# Extract the output location from the configuration URI
bucket, key = bda_utils.get_bucket_and_key(output_config_uri)
output_folder = os.path.dirname(key)
result_key = f"{output_folder}/0/standard_output/0/result.json"

# Create bda-results directory if it doesn't exist
ensure_bda_results_dir()

# Download the result file to the bda-results directory
local_result_file = '../bda-results/audio_result.json'
bda_utils.s3_client.download_file(bucket, result_key, local_result_file)
print(f"Downloaded result file to: {local_result_file}")

# Load the result data
with open(local_result_file, 'r') as f:
    result_data = json.load(f)

# Display a preview of the result structure
print("\nBDA Result Structure:")
top_level_keys = list(result_data.keys())
print(f"Top-level keys: {', '.join(top_level_keys)}")

audio_keys = list(result_data.get('audio', {}).keys())
print(f"Audio section keys: {', '.join(audio_keys)}")

## 5. Analyze Audio Summary and Transcript

Let's examine the audio summary and transcript generated by BDA. The summary provides a concise overview of the audio content, while the transcript captures the spoken content with speaker identification.
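When you move from interactive exploration to a pipeline, it helps to read these fields defensively, since either one may be missing (the cell below prints a fallback message in that case). A minimal sketch, using exactly the keys that cell reads (`audio.summary` and `audio.transcript.representation.text`); the helper name `summarize_result` and the word count are illustrative additions.

```python
def summarize_result(result: dict) -> dict:
    """Extract summary and transcript text from a parsed result.json (hypothetical helper)."""
    audio = result.get("audio", {})
    summary = audio.get("summary")
    transcript = audio.get("transcript", {}).get("representation", {}).get("text")
    return {
        "summary": summary,
        "transcript": transcript,
        # Whitespace-delimited word count; 0 when no transcript was produced
        "transcript_words": len(transcript.split()) if transcript else 0,
    }
```

Applied to `result_data`, this returns `None` for any missing field instead of raising a `KeyError`.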

In [None]:
# Show business context for audio summarization
show_business_context("audio_summarization")

# Display audio summary
print("=== Audio Summary ===\n")
if "summary" in result_data["audio"]:
    print(result_data["audio"]["summary"])
else:
    print("No summary available")

# Display audio transcript
print("\n=== Audio Transcript ===\n")
if "transcript" in result_data["audio"] and "representation" in result_data["audio"]["transcript"]:
    transcript_data = result_data["audio"]["transcript"]
    print(transcript_data["representation"]["text"])
    
    # Visualize speaker segments with the workshop's plotting helper
    print("\n=== Speaker Visualization ===\n")
    bda_utils.visualize_transcript(transcript_data)
else:
    print("No transcript available")

# Display audio statistics
if "statistics" in result_data:
    print("\n=== Audio Statistics ===\n")
    print(json.dumps(result_data["statistics"], indent=2))

## 6. Analyze Content Moderation Results

BDA provides content moderation analysis for audio, identifying potentially inappropriate or unsafe content. Let's examine these results to understand the content safety profile of our audio.
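A common next step after scoring is deciding what needs human review. The sketch below is generic: it assumes you have already collected moderation results into a plain mapping of category name to confidence between 0 and 1. This notebook does not show the raw moderation schema, so that input shape, the category names in the example, and the 0.5 default threshold are all assumptions. It returns the categories at or above the threshold, highest first.

```python
def flag_categories(scores: dict, threshold: float = 0.5) -> list:
    """Return (category, confidence) pairs at or above threshold, highest first (hypothetical)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    flagged = [(name, conf) for name, conf in scores.items() if conf >= threshold]
    # Sort by confidence so the most likely issues are reviewed first
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Raising the threshold trades recall for fewer false alarms; the right value depends on how costly a missed item is in your review process.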

In [None]:
# Show business context for audio content moderation
show_business_context("audio_content_moderation")

# Analyze content moderation results
print("=== Content Moderation Analysis ===\n")
bda_utils.analyze_content_moderation(result_data)

# Generate visual summary of moderation scores
print("\n=== Content Moderation Score Visualization ===\n")
bda_utils.generate_moderation_summary(result_data)

## Summary

In this notebook, we demonstrated how to use Amazon Bedrock Data Automation (BDA) to analyze audio files and extract valuable insights. We covered the key steps in the BDA workflow:

1. **Audio Preparation**: Uploaded a sample audio file to S3 for processing
2. **Configuration**: Defined standard output configuration for audio analysis
3. **Project Creation**: Created a BDA project with our configuration
4. **Audio Processing**: Used the `invoke_data_automation_async` API to process the audio
5. **Results Analysis**: Retrieved and analyzed the extracted insights

### Key BDA APIs Used:
- `create_data_automation_project`: Creates a project with configuration settings
- `invoke_data_automation_async`: Processes an audio file asynchronously
- `get_data_automation_status`: Checks the status of a processing job

### BDA Capabilities Demonstrated:
- Generating audio summaries
- Creating transcripts with speaker identification
- Detecting inappropriate content through content moderation
- Analyzing audio statistics
- Visualizing speaker distributions in conversations

These capabilities can be used for various applications such as podcast analysis, call center analytics, meeting transcription, content moderation, and building audio intelligence solutions.

## Looking Ahead: Dynamic Content Understanding

Having mastered text, image, and audio processing, you're now ready to explore the most complex modality: video. In the next module, "Dynamic Content Understanding", you'll discover how BDA can process video content to extract insights from moving images, audio tracks, and scene transitions, combining multiple modalities into a comprehensive understanding of video content.