1 change: 0 additions & 1 deletion .gitattributes

This file was deleted.

24 changes: 24 additions & 0 deletions .github/workflows/check-arns.yml
@@ -0,0 +1,24 @@
name: Check for Exposed ARNs

on:
  pull_request:

jobs:
  check-arns:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Check for exposed ARNs
        run: |
          # Find files containing ARN patterns with real 12-digit account IDs.
          # Exclude the .git directory, markdown files, and this workflow file
          # itself (grep's --exclude matches base names, not paths).
          if grep -r --exclude="*.md" --exclude-dir=".git" --exclude="check-arns.yml" -E 'arn:aws:[^:]+:[^:]+:[0-9]{12}:' .; then
            echo "ERROR: Found unsanitized ARNs in the repository"
            echo "Please replace account IDs with a placeholder such as <account-id>"
            echo "Files with exposed ARNs:"
            grep -r --exclude="*.md" --exclude-dir=".git" --exclude="check-arns.yml" -l -E 'arn:aws:[^:]+:[^:]+:[0-9]{12}:' .
            exit 1
          fi

          echo "All files checked - no exposed ARNs found"
2 changes: 1 addition & 1 deletion .gitignore
@@ -13,7 +13,7 @@ venv/
.java-version
/pyflink/
__pycache__/

.vscode/
/.run/

clean.sh
18 changes: 14 additions & 4 deletions CONTRIBUTING.md
@@ -59,17 +59,27 @@ The AWS team managing the repository reserves the right to modify or reject new
versions, external dependencies, permissions, and runtime configuration. Use [this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/java/KafkaConfigProviders/Kafka-SASL_SSL-ConfigProviders)
as a reference.
* Make sure the example works as explained in the README, without any implicit dependencies or configuration.
* Add an entry for the new example in the top-level [README](README.md), or in the subfolder's README if the example lives in a subfolder such as `java/FlinkCDC` or `java/Iceberg`.

#### AWS authentication and credentials

* AWS credentials must never be explicitly passed to the application.
* Any permissions must be provided through the IAM Role assigned to the Managed Service for Apache Flink application. When running locally, leverage the IDE AWS plugins (see the sketch below).
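For illustration only (the profile and region names are hypothetical), a local run can resolve credentials from a named profile or the IDE's AWS plugin rather than from application code:

```sh
# Hypothetical local setup: credentials are resolved from a configured
# profile (or the IDE AWS plugin), never hardcoded in the application.
export AWS_PROFILE=flink-dev     # assumes a profile created with `aws configure --profile flink-dev`
export AWS_REGION=us-east-1
aws sts get-caller-identity      # verify which identity the application will pick up
```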

#### Dependencies in PyFlink examples
* Use the pattern illustrated by [this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/python/GettingStarted)
to provide JAR dependencies and build the ZIP using Maven.
* If the application also requires Python dependencies, use the pattern illustrated by [this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/python/PythonDependencies)
leveraging `requirements.txt`.

* Use the pattern illustrated by [this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/python/GettingStarted)
to provide JAR dependencies and build the ZIP using Maven.
* If the application also requires Python dependencies, used in UDFs or for data processing in general, use the pattern illustrated by [this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/python/PythonDependencies)
leveraging `requirements.txt`.
* Only if the application requires Python dependencies during job initialization, in the `main()`, use the pattern
illustrated in [this other example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/python/PackagedPythonDependencies),
packaging the dependencies in the ZIP artifact (a packaging check is sketched below).
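As a quick packaging check (the artifact name is hypothetical; the Maven setup follows the linked examples):

```sh
# Build the application ZIP with Maven, then list its contents to confirm
# the JAR dependencies and, if used, requirements.txt were packaged.
mvn clean package
unzip -l target/my-pyflink-app-1.0.zip
```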

#### Top-level POM file for Java examples

* Also add the new Java example to the `pom.xml` file in the `java/` folder.


## Reporting Bugs/Feature Requests

87 changes: 81 additions & 6 deletions README.md
@@ -2,14 +2,89 @@

Example applications in Java, Python, Scala and SQL for Amazon Managed Service for Apache Flink (formerly known as Amazon Kinesis Data Analytics), illustrating various aspects of Apache Flink applications, and simple "getting started" base projects.

* [Java examples](./java)
* [Python examples](./python)
* [Scala examples](/scala)
* [Operational utilities and infrastructure code](./infrastructure)
## Table of Contents

## Security
### Java Examples

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
#### Getting Started
- [**Getting Started - DataStream API**](./java/GettingStarted) - Skeleton project for a basic Flink Java application using DataStream API
- [**Getting Started - Table API & SQL**](./java/GettingStartedTable) - Basic Flink Java application using Table API & SQL with DataStream API

#### Connectors
- [**Kinesis Connectors**](./java/KinesisConnectors) - Examples of Flink Kinesis Connector source and sink (standard and EFO)
- [**Kinesis Source Deaggregation**](./java/KinesisSourceDeaggregation) - Handling Kinesis record deaggregation in the Kinesis source
- [**Kafka Connectors**](./java/KafkaConnectors) - Examples of Flink Kafka Connector source and sink
- [**Kafka Config Providers**](./java/KafkaConfigProviders) - Examples of using Kafka Config Providers for secure configuration management
- [**DynamoDB Stream Source**](./java/DynamoDBStreamSource) - Reading from DynamoDB Streams as a source
- [**Kinesis Firehose Sink**](./java/KinesisFirehoseSink) - Writing data to Amazon Kinesis Data Firehose
- [**SQS Sink**](./java/SQSSink) - Writing data to Amazon SQS
- [**Prometheus Sink**](./java/PrometheusSink) - Sending metrics to Prometheus
- [**Flink CDC**](./java/FlinkCDC) - Change Data Capture examples using Flink CDC

#### Reading and writing files and transactional data lake formats
- [**Iceberg**](./java/Iceberg) - Working with Apache Iceberg and Amazon S3 Tables
- [**S3 Sink**](./java/S3Sink) - Writing JSON data to Amazon S3
- [**S3 Avro Sink**](./java/S3AvroSink) - Writing Avro format data to Amazon S3
- [**S3 Avro Source**](./java/S3AvroSource) - Reading Avro format data from Amazon S3
- [**S3 Parquet Sink**](./java/S3ParquetSink) - Writing Parquet format data to Amazon S3
- [**S3 Parquet Source**](./java/S3ParquetSource) - Reading Parquet format data from Amazon S3

#### Data Formats & Schema Registry
- [**Avro with Glue Schema Registry - Kinesis**](./java/AvroGlueSchemaRegistryKinesis) - Using Avro format with AWS Glue Schema Registry and Kinesis
- [**Avro with Glue Schema Registry - Kafka**](./java/AvroGlueSchemaRegistryKafka) - Using Avro format with AWS Glue Schema Registry and Kafka

#### Stream Processing Patterns
- [**Serialization**](./java/Serialization) - Serialization of record and state
- [**Windowing**](./java/Windowing) - Time-based window aggregation examples
- [**Side Outputs**](./java/SideOutputs) - Using side outputs for data routing and filtering
- [**Async I/O**](./java/AsyncIO) - Asynchronous I/O patterns with retries for external API calls
- [**Custom Metrics**](./java/CustomMetrics) - Creating and publishing custom application metrics

#### Utilities
- [**Flink Data Generator (JSON)**](./java/FlinkDataGenerator) - How to use a Flink application as a data generator for functional and load testing

### Python Examples

#### Getting Started
- [**Getting Started**](./python/GettingStarted) - Basic PyFlink application using Table API & SQL

#### Handling Python dependencies
- [**Python Dependencies**](./python/PythonDependencies) - Managing Python dependencies in PyFlink applications using `requirements.txt`
- [**Packaged Python Dependencies**](./python/PackagedPythonDependencies) - Managing Python dependencies packaged with the PyFlink application at build time

#### Connectors
- [**Datastream Kafka Connector**](./python/DatastreamKafkaConnector) - Using Kafka connector with PyFlink DataStream API
- [**Kafka Config Providers**](./python/KafkaConfigProviders) - Secure configuration management for Kafka in PyFlink
- [**S3 Sink**](./python/S3Sink) - Writing data to Amazon S3 using PyFlink
- [**Firehose Sink**](./python/FirehoseSink) - Writing data to Amazon Kinesis Data Firehose
- [**Iceberg Sink**](./python/IcebergSink) - Writing data to Apache Iceberg tables

#### Stream Processing Patterns
- [**Windowing**](./python/Windowing) - Time-based window aggregation examples with PyFlink/SQL
- [**User Defined Functions (UDF)**](./python/UDF) - Creating and using custom functions in PyFlink

#### Utilities
- [**Data Generator**](./python/data-generator) - Python script for generating sample data to Kinesis Data Streams
- [**Local Development on Apple Silicon**](./python/LocalDevelopmentOnAppleSilicon) - Setup guide for local development of Flink 1.15 on Apple Silicon Macs (not required with Flink 1.18 or later)


### Scala Examples

#### Getting Started
- [**Getting Started - DataStream API**](./scala/GettingStarted) - Skeleton project for a basic Flink Scala application using DataStream API

### Infrastructure & Operations

- [**Auto Scaling**](./infrastructure/AutoScaling) - Custom autoscaler for Amazon Managed Service for Apache Flink
- [**Scheduled Scaling**](./infrastructure/ScheduledScaling) - Scale applications up and down based on daily time schedules
- [**Monitoring**](./infrastructure/monitoring) - Extended CloudWatch Dashboard examples for monitoring applications
- [**Scripts**](./infrastructure/scripts) - Useful shell scripts for interacting with Amazon Managed Service for Apache Flink control plane API

---

## Contributing

See [Contributing Guidelines](CONTRIBUTING.md) for more information.

## License Summary

5 changes: 5 additions & 0 deletions infrastructure/AutoScaling/.gitignore
@@ -0,0 +1,5 @@
node_modules
cdk.out
/cdk.context.json
Autoscaler-*.yaml
/.DS_Store