1 change: 0 additions & 1 deletion .gitattributes

This file was deleted.

24 changes: 24 additions & 0 deletions .github/workflows/check-arns.yml
@@ -0,0 +1,24 @@
name: Check for Exposed ARNs

on:
  pull_request:

jobs:
  check-arns:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Check for exposed ARNs
        run: |
          # Find files containing ARN patterns with real 12-digit account IDs.
          # Exclude the .git directory, markdown files, and this workflow file
          # itself (grep's --exclude matches base names, not paths).
          if grep -r --exclude="*.md" --exclude-dir=".git" --exclude="check-arns.yml" -E 'arn:aws:[^:]+:[^:]+:[0-9]{12}:' .; then
            echo "ERROR: Found unsanitized ARNs in the repository"
            echo "Please replace account IDs with a placeholder such as <account-id>"
            echo "Files with exposed ARNs:"
            grep -r --exclude="*.md" --exclude-dir=".git" --exclude="check-arns.yml" -l -E 'arn:aws:[^:]+:[^:]+:[0-9]{12}:' .
            exit 1
          fi

          echo "All files checked - no exposed ARNs found"
2 changes: 1 addition & 1 deletion .gitignore
@@ -13,7 +13,7 @@ venv/
.java-version
/pyflink/
__pycache__/

.vscode/
/.run/

clean.sh
18 changes: 14 additions & 4 deletions CONTRIBUTING.md
@@ -59,17 +59,27 @@ The AWS team managing the repository reserves the right to modify or reject new
versions, external dependencies, permissions, and runtime configuration. Use [this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/java/KafkaConfigProviders/Kafka-SASL_SSL-ConfigProviders)
as a reference.
* Make sure the example works as explained in the README, without any implicit dependencies or configuration.
* Add an entry for the new example in the top-level [README](README.md), or in the subfolder's README if the example lives in a subfolder such as `java/FlinkCDC` or `java/Iceberg`.

#### AWS authentication and credentials

* AWS credentials must never be explicitly passed to the application.
* Any permissions must be provided through the IAM Role assigned to the Managed Service for Apache Flink application. When running locally, leverage the IDE AWS plugins (see the sketch below).
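For illustration only (the profile and region names are hypothetical), a local run can resolve credentials from a named profile or the IDE's AWS plugin rather than from application code:

```sh
# Hypothetical local setup: credentials are resolved from a configured
# profile (or the IDE AWS plugin), never hardcoded in the application.
export AWS_PROFILE=flink-dev     # assumes a profile created with `aws configure --profile flink-dev`
export AWS_REGION=us-east-1
aws sts get-caller-identity      # verify which identity the application will pick up
```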

#### Dependencies in PyFlink examples
* Use the pattern illustrated by [this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/python/GettingStarted)
to provide JAR dependencies and build the ZIP using Maven.
* If the application also requires Python dependencies, use the pattern illustrated by [this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/python/PythonDependencies)
leveraging `requirements.txt`.

* Use the pattern illustrated by [this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/python/GettingStarted)
to provide JAR dependencies and build the ZIP using Maven.
* If the application also requires Python dependencies, used in UDFs or for data processing in general, use the pattern illustrated by [this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/python/PythonDependencies)
leveraging `requirements.txt`.
* Only if the application requires Python dependencies during job initialization, in the `main()`, use the pattern
illustrated in [this other example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/python/PackagedPythonDependencies),
packaging the dependencies in the ZIP artifact (a packaging check is sketched below).
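As a quick packaging check (the artifact name is hypothetical; the Maven setup follows the linked examples):

```sh
# Build the application ZIP with Maven, then list its contents to confirm
# the JAR dependencies and, if used, requirements.txt were packaged.
mvn clean package
unzip -l target/my-pyflink-app-1.0.zip
```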

#### Top-level POM file for Java examples

* Also add the new Java example to the `pom.xml` file in the `java/` folder.


## Reporting Bugs/Feature Requests

87 changes: 81 additions & 6 deletions README.md
@@ -2,14 +2,89 @@

Example applications in Java, Python, Scala and SQL for Amazon Managed Service for Apache Flink (formerly known as Amazon Kinesis Data Analytics), illustrating various aspects of Apache Flink applications, and simple "getting started" base projects.

* [Java examples](./java)
* [Python examples](./python)
* [Scala examples](/scala)
* [Operational utilities and infrastructure code](./infrastructure)
## Table of Contents

## Security
### Java Examples

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
#### Getting Started
- [**Getting Started - DataStream API**](./java/GettingStarted) - Skeleton project for a basic Flink Java application using DataStream API
- [**Getting Started - Table API & SQL**](./java/GettingStartedTable) - Basic Flink Java application using Table API & SQL with DataStream API

#### Connectors
- [**Kinesis Connectors**](./java/KinesisConnectors) - Examples of Flink Kinesis Connector source and sink (standard and EFO)
- [**Kinesis Source Deaggregation**](./java/KinesisSourceDeaggregation) - Handling Kinesis record deaggregation in the Kinesis source
- [**Kafka Connectors**](./java/KafkaConnectors) - Examples of Flink Kafka Connector source and sink
- [**Kafka Config Providers**](./java/KafkaConfigProviders) - Examples of using Kafka Config Providers for secure configuration management
- [**DynamoDB Stream Source**](./java/DynamoDBStreamSource) - Reading from DynamoDB Streams as a source
- [**Kinesis Firehose Sink**](./java/KinesisFirehoseSink) - Writing data to Amazon Kinesis Data Firehose
- [**SQS Sink**](./java/SQSSink) - Writing data to Amazon SQS
- [**Prometheus Sink**](./java/PrometheusSink) - Sending metrics to Prometheus
- [**Flink CDC**](./java/FlinkCDC) - Change Data Capture examples using Flink CDC

#### Reading and writing files and transactional data lake formats
- [**Iceberg**](./java/Iceberg) - Working with Apache Iceberg and Amazon S3 Tables
- [**S3 Sink**](./java/S3Sink) - Writing JSON data to Amazon S3
- [**S3 Avro Sink**](./java/S3AvroSink) - Writing Avro format data to Amazon S3
- [**S3 Avro Source**](./java/S3AvroSource) - Reading Avro format data from Amazon S3
- [**S3 Parquet Sink**](./java/S3ParquetSink) - Writing Parquet format data to Amazon S3
- [**S3 Parquet Source**](./java/S3ParquetSource) - Reading Parquet format data from Amazon S3

#### Data Formats & Schema Registry
- [**Avro with Glue Schema Registry - Kinesis**](./java/AvroGlueSchemaRegistryKinesis) - Using Avro format with AWS Glue Schema Registry and Kinesis
- [**Avro with Glue Schema Registry - Kafka**](./java/AvroGlueSchemaRegistryKafka) - Using Avro format with AWS Glue Schema Registry and Kafka

#### Stream Processing Patterns
- [**Serialization**](./java/Serialization) - Serialization of record and state
- [**Windowing**](./java/Windowing) - Time-based window aggregation examples
- [**Side Outputs**](./java/SideOutputs) - Using side outputs for data routing and filtering
- [**Async I/O**](./java/AsyncIO) - Asynchronous I/O patterns with retries for external API calls
- [**Custom Metrics**](./java/CustomMetrics) - Creating and publishing custom application metrics

#### Utilities
- [**Flink Data Generator (JSON)**](./java/FlinkDataGenerator) - How to use a Flink application as a data generator for functional and load testing

### Python Examples

#### Getting Started
- [**Getting Started**](./python/GettingStarted) - Basic PyFlink application using Table API & SQL

#### Handling Python dependencies
- [**Python Dependencies**](./python/PythonDependencies) - Managing Python dependencies in PyFlink applications using `requirements.txt`
- [**Packaged Python Dependencies**](./python/PackagedPythonDependencies) - Managing Python dependencies packaged with the PyFlink application at build time

#### Connectors
- [**Datastream Kafka Connector**](./python/DatastreamKafkaConnector) - Using Kafka connector with PyFlink DataStream API
- [**Kafka Config Providers**](./python/KafkaConfigProviders) - Secure configuration management for Kafka in PyFlink
- [**S3 Sink**](./python/S3Sink) - Writing data to Amazon S3 using PyFlink
- [**Firehose Sink**](./python/FirehoseSink) - Writing data to Amazon Kinesis Data Firehose
- [**Iceberg Sink**](./python/IcebergSink) - Writing data to Apache Iceberg tables

#### Stream Processing Patterns
- [**Windowing**](./python/Windowing) - Time-based window aggregation examples with PyFlink/SQL
- [**User Defined Functions (UDF)**](./python/UDF) - Creating and using custom functions in PyFlink

#### Utilities
- [**Data Generator**](./python/data-generator) - Python script for generating sample data to Kinesis Data Streams
- [**Local Development on Apple Silicon**](./python/LocalDevelopmentOnAppleSilicon) - Setup guide for local development of Flink 1.15 on Apple Silicon Macs (not required with Flink 1.18 or later)


### Scala Examples

#### Getting Started
- [**Getting Started - DataStream API**](./scala/GettingStarted) - Skeleton project for a basic Flink Scala application using DataStream API

### Infrastructure & Operations

- [**Auto Scaling**](./infrastructure/AutoScaling) - Custom autoscaler for Amazon Managed Service for Apache Flink
- [**Scheduled Scaling**](./infrastructure/ScheduledScaling) - Scale applications up and down based on daily time schedules
- [**Monitoring**](./infrastructure/monitoring) - Extended CloudWatch Dashboard examples for monitoring applications
- [**Scripts**](./infrastructure/scripts) - Useful shell scripts for interacting with Amazon Managed Service for Apache Flink control plane API

---

## Contributing

See [Contributing Guidelines](CONTRIBUTING.md) for more information.

## License Summary

5 changes: 5 additions & 0 deletions infrastructure/AutoScaling/.gitignore
@@ -0,0 +1,5 @@
node_modules
cdk.out
/cdk.context.json
Autoscaler-*.yaml
/.DS_Store