# Apache Kafka Performance Testing Framework - Create Visualization

This notebook provides visualization capabilities for analyzing Apache Kafka performance test results.

The first cell sets up the required dependencies and AWS service clients for the performance analysis workflow.

In [None]:
# System and Utility Imports
import boto3

# Custom Module Imports
from utils import (
    get_test_details,
    query_experiment_details,
    aggregate_statistics,
    plot
)

# AWS Client Initialization
stepfunctions = boto3.client('stepfunctions')
cloudwatch_logs = boto3.client('logs')
cloudformation = boto3.client('cloudformation')

## Test Parameters Configuration

The next cell initializes the parameters needed to identify and retrieve test execution data from AWS Step Functions.

### Usage
Replace `'your-execution-arn-here'` with the actual ARN of your Step Functions execution. To analyze multiple test executions, add more dictionaries to the list:

```python
test_params.extend([
    {'execution_arn': 'arn-1'},
    {'execution_arn': 'arn-2'},
    # Add more ARNs as needed
])
```

In [None]:
# Test parameters configuration
test_params = []
test_params.extend([
    {'execution_arn': 'your-execution-arn-here'}  # Replace with your actual ARN
])
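
A placeholder or mistyped ARN only surfaces as an error once Step Functions is called, so it can help to check the shape of each ARN first. The sketch below assumes the usual layout of a standard-workflow execution ARN (`arn:<partition>:states:<region>:<account-id>:execution:<state-machine>:<execution-name>`). `is_execution_arn` and `example_params` are hypothetical names for illustration and are not part of `utils`.

```python
import re

# Hypothetical helper: checks only the general shape of a Step Functions
# execution ARN. It does not verify that the execution actually exists.
EXECUTION_ARN_RE = re.compile(
    r"^arn:aws[a-z-]*:states:[a-z0-9-]+:\d{12}:execution:[^:]+:.+$"
)

def is_execution_arn(arn: str) -> bool:
    return bool(EXECUTION_ARN_RE.match(arn))

# Illustrative values only: the account ID and names below are made up.
example_params = [
    {'execution_arn': 'arn:aws:states:us-east-1:123456789012:execution:my-state-machine:run-1'},
    {'execution_arn': 'your-execution-arn-here'},
]

# Collect entries that do not look like execution ARNs.
invalid = [
    p['execution_arn']
    for p in example_params
    if not is_execution_arn(p['execution_arn'])
]
print(invalid)
```

Running a check like this before the retrieval cell catches a placeholder that was never replaced.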

## Test Details Retrieval

This cell fetches detailed information about the Kafka performance tests using the provided execution ARNs.

The output summarizes both the cluster configuration and the performance test parameters for each execution.

In [None]:
# Get test details; report the failure, then re-raise so later cells
# do not run against missing data
try:
    test_details = get_test_details.get_test_details(test_params, stepfunctions, cloudformation)
except Exception as e:
    print(f"Error getting test details: {e}")
    raise

## Query Experiment Details

This cell retrieves and processes performance statistics from CloudWatch Logs for the Kafka performance tests.

The output provides the total number of statistics gathered for both producers and consumers.

In [None]:
# Query experiment details
producer_stats, consumer_stats, test_details = query_experiment_details.query_cw_logs(
    test_details,
    cloudwatch_logs
)
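
This notebook does not show how `query_cw_logs` works internally. As a rough illustration of one common way to pull data from CloudWatch Logs, the sketch below starts a Logs Insights query and polls until it finishes, using the boto3 `start_query` and `get_query_results` calls. The helper name `run_insights_query`, its parameters, and the assumption that Logs Insights is used at all are illustrative, not taken from `utils`.

```python
import time

def run_insights_query(logs_client, log_group, query, start, end,
                       poll_seconds=1.0, max_polls=60):
    """Hypothetical helper: run a Logs Insights query and wait for the result."""
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start,      # epoch seconds
        endTime=end,          # epoch seconds
        queryString=query,
    )['queryId']

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response['status']
        if status == 'Complete':
            # Each result row is a list of {'field': ..., 'value': ...} dicts;
            # flatten each row into a plain dict.
            return [
                {cell['field']: cell['value'] for cell in row}
                for row in response['results']
            ]
        if status in ('Failed', 'Cancelled', 'Timeout'):
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)

    raise TimeoutError("query did not complete in time")
```

With the client created in the first cell, a call might look like `run_insights_query(cloudwatch_logs, 'my-log-group', 'fields @message', start, end)`, where the log group name is a placeholder.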

## Data Preparation and Aggregation

This cell prepares the data for visualization by defining visualization parameters, setting up filters, and aggregating the data.

In [None]:
# Define partitions for the visualization
partitions = {
    'ignore_keys': [
        'topic_id', 
        'cluster_id', 
        'test_id', 
        'cluster_name'
    ],
    'title_keys': [
        'kafka_version',
        'broker_storage',
        'in_cluster_encryption',
        'producer.security.protocol'
    ],
    'row_keys': [
        'producer.acks',
        'producer.batch.size',
        'num_partitions'
    ],
    'column_keys': [
        'num_producers',
        'consumer_groups.num_groups'
    ],
    'metric_color_keys': [
        'broker_type'
    ]
}

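# Define filter functions; both accept every record by default.
# filter_fn is applied to the raw stats, filter_agg_fn to the aggregated producer stats.
filter_fn = lambda x: True
filter_agg_fn = lambda x: True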

# Apply filters and aggregate data
filtered_producer_stats = list(filter(filter_fn, producer_stats))
filtered_consumer_stats = list(filter(filter_fn, consumer_stats))

(producer_aggregated_stats, consumer_aggregated_stats, combined_stats) = aggregate_statistics.aggregate_cw_logs(
    filtered_producer_stats, 
    filtered_consumer_stats, 
    partitions,
    test_details
)

filtered_producer_aggregated_stats = list(filter(filter_agg_fn, producer_aggregated_stats))
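
The default filters keep every record. To narrow the analysis, a filter can test fields of each record instead. The sketch below assumes each record is a dictionary with a nested `test_params` dictionary. That structure, the sample values, and the name `only_six_producers` are assumptions for illustration, and the records returned by `query_cw_logs` may be laid out differently.

```python
# Hypothetical records: made-up values in an assumed structure.
sample_stats = [
    {'test_params': {'num_producers': 6, 'num_partitions': 36}, 'sent_mb_sec': 48.2},
    {'test_params': {'num_producers': 12, 'num_partitions': 36}, 'sent_mb_sec': 91.7},
    {'test_params': {'num_producers': 6, 'num_partitions': 72}, 'sent_mb_sec': 47.9},
]

def only_six_producers(record):
    """Keep only tests that ran with six producers."""
    return record['test_params']['num_producers'] == 6

selected = list(filter(only_six_producers, sample_stats))
print(len(selected))
```

A named function like this can be assigned to `filter_fn` in place of the accept-all lambda.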

## Latency Visualization

This cell creates a visualization of producer latency metrics across different throughput levels.

This visualization shows how producer latency changes as throughput increases, which helps reveal performance bottlenecks and the gap between median (p50) and tail (p99) latency.

In [None]:
# Create visualization
if filtered_producer_aggregated_stats:
    plot.plot_measurements(
        filtered_producer_aggregated_stats,
        ['latency_ms_p50', 'latency_ms_p99'],
        'producer put latency (ms)',
        xlogscale=True,
        ylogscale=True,
        xmin=1,
        ymin=1,
        **partitions
    )
else:
    print("\nNo data available for plotting!")
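
One way to put a number on the gap between the two plotted percentiles is the ratio of p99 to p50 latency. The sketch below assumes each aggregated record is a flat dictionary containing the `latency_ms_p50` and `latency_ms_p99` keys used in the plot. The record layout, the sample values, and the helper `tail_ratio` are assumptions for illustration.

```python
def tail_ratio(record):
    """Return p99 / p50 latency; a large value indicates a heavy tail."""
    p50 = record['latency_ms_p50']
    p99 = record['latency_ms_p99']
    return p99 / p50 if p50 > 0 else float('inf')

# Made-up sample values, for illustration only.
sample_latencies = [
    {'latency_ms_p50': 4.0, 'latency_ms_p99': 12.0},
    {'latency_ms_p50': 5.0, 'latency_ms_p99': 250.0},
]

ratios = [tail_ratio(r) for r in sample_latencies]
print(ratios)  # [3.0, 50.0]
```

A ratio that jumps sharply between neighboring throughput levels suggests the cluster is approaching a limit even if the median still looks healthy.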

## Throughput Visualization

This cell creates a visualization focusing on the actual throughput achieved by Kafka producers.

This visualization complements the latency plot by showing the throughput dimension of performance, allowing for a complete understanding of the throughput-latency tradeoff in Kafka deployments.

In [None]:
# Create throughput visualization
if filtered_producer_aggregated_stats:
    plot.plot_measurements(
        filtered_producer_aggregated_stats,
        ['sent_mb_sec'],
        'producer throughput (MB/s)',
        xlogscale=True,
        ylogscale=True,
        xmin=1,
        ymin=1,
        **partitions
    )
else:
    print("\nNo data available for plotting!")
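
To spot where achieved throughput stops keeping up with the requested load, each record's achieved value can be compared against its target. The sketch below uses the `sent_mb_sec` field from the plot and a hypothetical `target_mb_sec` field. That field name, the flat record layout, the 5% tolerance, and the sample values are all assumptions for illustration.

```python
def is_saturated(record, tolerance=0.05):
    """True when achieved throughput falls short of the target by more than `tolerance`."""
    return record['sent_mb_sec'] < (1 - tolerance) * record['target_mb_sec']

# Made-up sample values; 'target_mb_sec' is a hypothetical field name.
sample_throughput = [
    {'target_mb_sec': 64.0, 'sent_mb_sec': 63.8},
    {'target_mb_sec': 128.0, 'sent_mb_sec': 97.5},
]

saturated = [r for r in sample_throughput if is_saturated(r)]
print(len(saturated))  # 1
```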