Merge pull request #317 from Acxiom/develop
Release 1.9.0
dafreels committed Jul 20, 2022
2 parents 2d56553 + f0d8363 commit 57907dd
Showing 60 changed files with 462 additions and 963 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/develop.yml
@@ -10,7 +10,7 @@ jobs:
strategy:
matrix:
os: [ ubuntu-latest, macos-latest ]
spark: [ '2.4_2.11', '2.4_2.12', '3.0_2.12', '3.1_2.12' ]
spark: [ '2.4_2.11', '2.4_2.12', '3.0_2.12', '3.1_2.12', '3.2_2.12' ]
runs-on: ${{ matrix.os }}
steps:
- name: Source Checkout
2 changes: 1 addition & 1 deletion .github/workflows/pr.yml
@@ -10,7 +10,7 @@ jobs:
strategy:
matrix:
os: [ ubuntu-latest, macos-latest ]
spark: [ '2.4_2.11', '2.4_2.12', '3.0_2.12', '3.1_2.12' ]
spark: [ '2.4_2.11', '2.4_2.12', '3.0_2.12', '3.1_2.12', '3.2_2.12' ]
runs-on: ${{ matrix.os }}
steps:
- name: Source Checkout
4 changes: 2 additions & 2 deletions .github/workflows/release.yml
@@ -10,7 +10,7 @@ jobs:
strategy:
matrix:
os: [ ubuntu-latest ]
spark: [ '2.4_2.11', '2.4_2.12', '3.0_2.12', '3.1_2.12' ]
spark: [ '2.4_2.11', '2.4_2.12', '3.0_2.12', '3.1_2.12', '3.2_2.12' ]
runs-on: ${{ matrix.os }}
steps:
- name: Source Checkout
@@ -175,7 +175,7 @@ jobs:
path: ~/.m2
key: metalus-build-${{ env.cache-name }}
- name: Remove SNAPSHOT
run: -P spark_3.1-B versions:set -DremoveSnapshot
run: mvn -P spark_3.1 -B versions:set -DremoveSnapshot
- name: Set Metalus Version
run: |
echo 'METALUS_VERSION<<EOF' >> $GITHUB_ENV
3 changes: 2 additions & 1 deletion docs/executions.md
@@ -68,7 +68,8 @@ An execution may process a list of values in parallel by changing the _execution
attribute. The behavior is similar to [fork steps](fork-join.md) within pipelines with the exception that the fork and
join executions will run the provided pipelines. The _forkByValue_ attribute is a mapping string that will be applied
to the execution globals to locate the list used to spin up parallel processes. Within the _fork_ execution, the
individual fork value will be assigned to a global named _executionForkValue_. All child executions of the fork will process
individual fork value will be assigned to a global named _executionForkValue_. A second global named _executionForkValueIndex_
will also be set, containing the index of the value in the original list. All child executions of the fork will process
in parallel until a join execution (executionType will be join) is reached. The join execution will be executed once and
the output (pipelineParameters and globals) of the parallel executions will be merged into a list. A join execution is
required.
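To make the fork/join shape concrete, a minimal hypothetical application fragment is sketched below. The _executionType_ and _forkByValue_ attributes come from the description above; the _id_, _parents_, and _pipelineIds_ fields are illustrative assumptions, not confirmed schema.

```json
{
  "executions": [
    {
      "id": "PROCESS_FILES",
      "executionType": "fork",
      "forkByValue": "fileList",
      "pipelineIds": ["process_file_pipeline"]
    },
    {
      "id": "MERGE_RESULTS",
      "executionType": "join",
      "parents": ["PROCESS_FILES"],
      "pipelineIds": ["merge_results_pipeline"]
    }
  ]
}
```

Each pipeline run by the fork would read the current value from the _executionForkValue_ global and its position in the original list from _executionForkValueIndex_.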
4 changes: 0 additions & 4 deletions docs/readme.md
@@ -65,7 +65,6 @@
* [S3 Load to Bronze](../metalus-aws/docs/s3loadtobronze.md)
* Extensions
* [S3FileManager](../metalus-aws/docs/s3filemanager.md)
* [Kinesis Pipeline Driver](../metalus-aws/docs/kinesispipelinedriver.md)
* [AWS Secrets Manager Credential Provider](../metalus-aws/docs/awssecretsmanager-credentialprovider.md)
* [Metalus GCP](../metalus-gcp/readme.md)
* Steps
@@ -74,13 +73,10 @@
* [PubSubSteps](../metalus-gcp/docs/pubsubsteps.md)
* Extensions
* [GCSFileManager](../metalus-gcp/docs/gcsfilemanager.md)
* [PubSub Pipeline Driver](../metalus-gcp/docs/pubsubpipelinedriver.md)
* [GCP Secrets Manager Credential Provider](../metalus-gcp/docs/gcpsecretsmanager-credentialprovider.md)
* [Metalus Kafka](../metalus-kafka/readme.md)
* Steps
* [KafkaSteps](../metalus-kafka/docs/kafkasteps.md)
* Extensions
* [Kafka Pipeline Driver](../metalus-kafka/docs/kafkapipelinedriver.md)
* [Metalus Mongo](../metalus-mongo/readme.md)
* Steps
* [Mongo Steps](../metalus-mongo/docs/mongosteps.md)
9 changes: 0 additions & 9 deletions docs/streaming-utils.md
@@ -3,14 +3,5 @@
# Streaming Utilities
This utility object provides several functions useful for working with the Spark Streaming API.

## Create Streaming Context
Two functions are provided to build out a streaming context. Both require a SparkContext but differ in how the
duration is configured. One function takes a _Duration_ object, while the other takes a string type and string duration.

## Get Duration
Given a string type and string duration, this function will return a _Duration_ object.

## Set Termination State
Given a streaming context and a parameter map, this function determines whether the context should terminate
after a period of time or wait until the process is killed. If the parameter map contains the _terminationPeriod_ key,
the value will be converted to a long and treated as milliseconds. The context will end once that time has expired.
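For reference, a minimal Scala sketch of how the utilities described above (removed in this release) were typically invoked. The package name and exact signatures are assumptions based on the descriptions in this document, not confirmed API:

```scala
import com.acxiom.pipeline.utils.StreamingUtils // package name assumed
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Duration, StreamingContext}

val sparkContext: SparkContext = ??? // provided by the surrounding application

// Build a streaming context from an explicit Duration object...
val ssc: StreamingContext =
  StreamingUtils.createStreamingContext(sparkContext, Some(Duration(1000)))
// ...or from a string type and a string duration
val ssc2 = StreamingUtils.createStreamingContext(sparkContext, Some("seconds"), Some("30"))

// Convert a string type and string duration into a Duration object
val thirtySeconds: Duration = StreamingUtils.getDuration(Some("seconds"), Some("30"))

// Terminate after 10000 milliseconds; omit terminationPeriod to run until killed
StreamingUtils.setTerminationState(ssc, Map("terminationPeriod" -> "10000"))
```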
38 changes: 37 additions & 1 deletion manual_tests/manual-tests.sh
@@ -12,7 +12,7 @@ validateResult() {
usage() {
echo "manual-tests.sh [OPTIONS]"
echo "--save-metadata -> When true, all metadata generated during the test will be saved to the metadata_templates directory"
echo "--version -> The specific version to test. Allowed versions are: 2.4, 2.4_2.12, 3.0 and 3.1. Defaults to 'all'"
echo "--version -> The specific version to test. Allowed versions are: 2.4, 2.4_2.12, 3.0, 3.1, 3.2 and 3.3. Defaults to 'all'"
}

buildVersion="all"
@@ -91,6 +91,42 @@ then
validateResult ${?} "Failed Metadata Extractor Test"
fi

# 3.2
if [[ "${buildVersion}" == "3.2" || "${buildVersion}" == "all" ]]
then
echo "Testing Spark 3.2"
mvn -P spark_3.2 clean install
validateResult ${?} "Failed to build project"
manual_tests/spark-test.sh
validateResult ${?} "Failed Spark Test"
manual_tests/metadata-extractor-test.sh $storeMetadata
validateResult ${?} "Failed Metadata Extractor Test"
fi

# 3.3 TODO Add || "${buildVersion}" == "all" once Delta Lake supports Spark 3.3
if [[ "${buildVersion}" == "3.3" ]]
then
echo "Testing Spark 3.3"
mvn -P spark_3.2 clean install
validateResult ${?} "Failed to build project"
manual_tests/spark-test.sh
validateResult ${?} "Failed Spark Test"
manual_tests/metadata-extractor-test.sh $storeMetadata
validateResult ${?} "Failed Metadata Extractor Test"
fi

# 3.2 Scala 2.13 TODO Some libraries, such as scalamock, do not support Scala 2.13
#if [[ "${buildVersion}" == "3.2_2.13" || "${buildVersion}" == "all" ]]
#then
# echo "Testing Spark 3.2"
# mvn -P spark_3.2,scala_2.13 clean install
# validateResult ${?} "Failed to build project"
# manual_tests/spark-test.sh
# validateResult ${?} "Failed Spark Test"
# manual_tests/metadata-extractor-test.sh $storeMetadata
# validateResult ${?} "Failed Metadata Extractor Test"
#fi

# Set the version back to the original
version=`mvn -P spark_3.1 -q -Dexec.executable='echo' -Dexec.args='${project.version}' --non-recursive exec:exec`
mvn -P spark_3.1 versions:set -DnewVersion="${version}-SNAPSHOT"
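For reference, a typical invocation of this script, assuming the option style documented in `usage()` above, might look like:

```bash
# Build and test only the Spark 3.2 profile and keep the generated metadata.
# Option handling is assumed from the usage() text above.
manual_tests/manual-tests.sh --version 3.2 --save-metadata true
```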
50 changes: 41 additions & 9 deletions manual_tests/spark-test.sh
@@ -23,15 +23,15 @@ jarFiles=""
echo "Testing Spark ${sparkCompat} Scala ${scalaCompat}"
if [[ "${sparkCompat}" == "2.4" ]]
then
if [[ ! -f $serversDir/spark-2.4.7-bin-hadoop2.7.tgz ]]
if [[ ! -f $serversDir/spark-2.4.8-bin-hadoop2.7.tgz ]]
then
echo "Downloading 2.4 Spark"
curl -L https://downloads.apache.org/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz > $serversDir/spark-2.4.7-bin-hadoop2.7.tgz
curl -L https://downloads.apache.org/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz > $serversDir/spark-2.4.8-bin-hadoop2.7.tgz
curl -L https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.11/2.4.2/mongo-spark-connector_2.11-2.4.2.jar > $serversDir/mongo-spark-connector_2.11-2.4.2.jar
curl -L https://repo1.maven.org/maven2/org/mongodb/mongo-java-driver/3.12.7/mongo-java-driver-3.12.7.jar > $serversDir/mongo-java-driver-3.12.7.jar
tar xf $serversDir/spark-2.4.7-bin-hadoop2.7.tgz --directory $serversDir
tar xf $serversDir/spark-2.4.8-bin-hadoop2.7.tgz --directory $serversDir
fi
sparkDir="${serversDir}/spark-2.4.7-bin-hadoop2.7"
sparkDir="${serversDir}/spark-2.4.8-bin-hadoop2.7"
SPARK_HOME=$sparkDir
jarFiles="${serversDir}/mongo-spark-connector_2.11-2.4.2.jar,${serversDir}/mongo-java-driver-3.12.7.jar,"
fi
@@ -40,23 +40,23 @@ then
if [[ ! -f $serversDir/spark-3.0.2-bin-hadoop2.7.tgz ]]
then
echo "Downloading 3.0 Spark"
curl -L https://downloads.apache.org/spark/spark-3.0.2/spark-3.0.2-bin-hadoop2.7.tgz > $serversDir/spark-3.0.2-bin-hadoop2.7.tgz
curl -L https://downloads.apache.org/spark/spark-3.0.3/spark-3.0.3-bin-hadoop2.7.tgz > $serversDir/spark-3.0.3-bin-hadoop2.7.tgz
curl -L https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.12/3.0.1/mongo-spark-connector_2.12-3.0.1.jar > $serversDir/mongo-spark-connector_2.12-3.0.1.jar
curl -L https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-core/4.0.5/mongodb-driver-core-4.0.5.jar > $serversDir/mongodb-driver-core-4.0.5.jar
curl -L https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-sync/4.0.5/mongodb-driver-sync-4.0.5.jar > $serversDir/mongodb-driver-sync-4.0.5.jar
curl -L https://repo1.maven.org/maven2/org/mongodb/bson/4.0.5/bson-4.0.5.jar > $serversDir/bson-4.0.5.jar
tar xf $serversDir/spark-3.0.2-bin-hadoop2.7.tgz --directory $serversDir
tar xf $serversDir/spark-3.0.3-bin-hadoop2.7.tgz --directory $serversDir
fi
sparkDir="${serversDir}/spark-3.0.2-bin-hadoop2.7"
sparkDir="${serversDir}/spark-3.0.3-bin-hadoop2.7"
SPARK_HOME=$sparkDir
jarFiles="${serversDir}/mongo-spark-connector_2.12-3.0.1.jar,${serversDir}/mongodb-driver-sync-4.0.5.jar,${serversDir}/mongodb-driver-core-4.0.5.jar,${serversDir}/bson-4.0.5.jar,"
fi
if [[ "${sparkCompat}" == "3.1" ]]
then
if [[ ! -f $serversDir/spark-3.1.3-bin-hadoop2.7.tgz ]]
then
echo "Downloading 3.0 Spark"
curl -L https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.3-bin-hadoop2.7.tgz > $serversDir/spark-3.1.3-bin-hadoop2.7.tgz
echo "Downloading 3.1 Spark"
curl -L https://downloads.apache.org/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz > $serversDir/spark-3.1.3-bin-hadoop2.7.tgz
curl -L https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.12/3.0.1/mongo-spark-connector_2.12-3.0.1.jar > $serversDir/mongo-spark-connector_2.12-3.0.1.jar
curl -L https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-core/4.0.5/mongodb-driver-core-4.0.5.jar > $serversDir/mongodb-driver-core-4.0.5.jar
curl -L https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-sync/4.0.5/mongodb-driver-sync-4.0.5.jar > $serversDir/mongodb-driver-sync-4.0.5.jar
@@ -67,6 +67,38 @@
SPARK_HOME=$sparkDir
jarFiles="${serversDir}/mongo-spark-connector_2.12-3.0.1.jar,${serversDir}/mongodb-driver-sync-4.0.5.jar,${serversDir}/mongodb-driver-core-4.0.5.jar,${serversDir}/bson-4.0.5.jar,"
fi
if [[ "${sparkCompat}" == "3.2" ]]
then
if [[ ! -f $serversDir/spark-3.2.1-bin-hadoop2.7.tgz ]]
then
echo "Downloading 3.2 Spark"
curl -L https://dlcdn.apache.org/spark/spark-3.2.1/spark-3.2.1-bin-hadoop2.7.tgz > $serversDir/spark-3.2.1-bin-hadoop2.7.tgz
curl -L https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.12/3.0.2/mongo-spark-connector_2.12-3.0.2.jar > $serversDir/mongo-spark-connector_2.12-3.0.2.jar
curl -L https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-core/4.0.5/mongodb-driver-core-4.0.5.jar > $serversDir/mongodb-driver-core-4.0.5.jar
curl -L https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-sync/4.0.5/mongodb-driver-sync-4.0.5.jar > $serversDir/mongodb-driver-sync-4.0.5.jar
curl -L https://repo1.maven.org/maven2/org/mongodb/bson/4.0.5/bson-4.0.5.jar > $serversDir/bson-4.0.5.jar
tar xf $serversDir/spark-3.2.1-bin-hadoop2.7.tgz --directory $serversDir
fi
sparkDir="${serversDir}/spark-3.2.1-bin-hadoop2.7"
SPARK_HOME=$sparkDir
jarFiles="${serversDir}/mongo-spark-connector_2.12-3.0.2.jar,${serversDir}/mongodb-driver-sync-4.0.5.jar,${serversDir}/mongodb-driver-core-4.0.5.jar,${serversDir}/bson-4.0.5.jar,"
fi
if [[ "${sparkCompat}" == "3.3" ]]
then
if [[ ! -f $serversDir/spark-3.3.0-bin-hadoop2.tgz ]]
then
echo "Downloading 3.3 Spark"
curl -L https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.tgz > $serversDir/spark-3.3.0-bin-hadoop2.tgz
curl -L https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.12/3.0.2/mongo-spark-connector_2.12-3.0.2.jar > $serversDir/mongo-spark-connector_2.12-3.0.2.jar
curl -L https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-core/4.0.5/mongodb-driver-core-4.0.5.jar > $serversDir/mongodb-driver-core-4.0.5.jar
curl -L https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-sync/4.0.5/mongodb-driver-sync-4.0.5.jar > $serversDir/mongodb-driver-sync-4.0.5.jar
curl -L https://repo1.maven.org/maven2/org/mongodb/bson/4.0.5/bson-4.0.5.jar > $serversDir/bson-4.0.5.jar
tar xf $serversDir/spark-3.3.0-bin-hadoop2.tgz --directory $serversDir
fi
sparkDir="${serversDir}/spark-3.3.0-bin-hadoop2"
SPARK_HOME=$sparkDir
jarFiles="${serversDir}/mongo-spark-connector_2.12-3.0.2.jar,${serversDir}/mongodb-driver-sync-4.0.5.jar,${serversDir}/mongodb-driver-core-4.0.5.jar,${serversDir}/bson-4.0.5.jar,"
fi
cd ..
# Startup Mongo
mkdir -p $tmpDir/data
2 changes: 1 addition & 1 deletion manual_tests/testData/metalus-common/pipelines.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion manual_tests/testData/metalus-common/steps.json

Large diffs are not rendered by default.

3 changes: 1 addition & 2 deletions metalus-application/pom.xml
@@ -9,7 +9,7 @@
<parent>
<groupId>com.acxiom</groupId>
<artifactId>metalus</artifactId>
<version>1.8.7-SNAPSHOT</version>
<version>1.9.0-SNAPSHOT</version>
</parent>
<dependencies>
<dependency>
@@ -41,7 +41,6 @@
<exclude>log4j:log4j:jar:</exclude>
<exclude>org.slf4j:*:jar:</exclude>
<exclude>org.scala-lang:*:jar:</exclude>
<exclude>org.scala-lang.modules:*:jar:</exclude>
<exclude>org.apache.spark:jar:</exclude>
<exclude>org.scalamock:jar:</exclude>
<exclude>org.scalatest:jar:</exclude>
35 changes: 0 additions & 35 deletions metalus-aws/docs/kinesispipelinedriver.md

This file was deleted.

2 changes: 1 addition & 1 deletion metalus-aws/pom.xml
@@ -9,7 +9,7 @@
<parent>
<groupId>com.acxiom</groupId>
<artifactId>metalus</artifactId>
<version>1.8.7-SNAPSHOT</version>
<version>1.9.0-SNAPSHOT</version>
</parent>

<properties>
47 changes: 26 additions & 21 deletions metalus-aws/readme.md
@@ -6,26 +6,32 @@ is compiled against specific versions of the AWS Java SDK libraries based on the
column shows the different scopes that can be used when running the [DependencyManager](../docs/dependency-manager.md).
Only one of the scopes is needed.

|Spark|Library|Version|Scopes|
------|-------|-------|------|
|2.4|spark-streaming-kinesis-asl_2.11|2.4.6|extraction,stream
|2.4|amazon-kinesis-client|1.12.0|extraction,stream
|2.4|aws-java-sdk-secretsmanager|1.11.595|extraction,secretsmanager
|2.4|aws-java-sdk-s3|1.11.595|extraction,s3
|2.4|aws-java-sdk-core|1.11.595|extraction,sdk
|2.4|hadoop-aws|2.7.7|extraction,s3a
|3.0|spark-streaming-kinesis-asl_2.12|3.0.0|extraction,stream
|3.0|amazon-kinesis-client|1.12.0|extraction,stream
|3.0|aws-java-sdk-secretsmanager|1.11.655|extraction,secretsmanager
|3.0|aws-java-sdk-s3|1.11.655|extraction,s3
|3.0|aws-java-sdk-core|1.11.655|extraction,sdk
|3.0|hadoop-aws|2.7.7|extraction,s3a
|3.1|spark-streaming-kinesis-asl_2.12|3.0.0|extraction,stream
|3.1|amazon-kinesis-client|1.12.0|extraction,stream
|3.1|aws-java-sdk-secretsmanager|1.11.655|extraction,secretsmanager
|3.1|aws-java-sdk-s3|1.11.655|extraction,s3
|3.1|aws-java-sdk-core|1.11.655|extraction,sdk
|3.1|hadoop-aws|2.7.7|extraction,s3a
| Spark | Library                          | Version  | Scopes                    |
|-------|----------------------------------|----------|---------------------------|
| 2.4   | spark-streaming-kinesis-asl_2.11 | 2.4.6    | extraction,stream         |
| 2.4   | amazon-kinesis-client            | 1.12.0   | extraction,stream         |
| 2.4   | aws-java-sdk-secretsmanager      | 1.11.595 | extraction,secretsmanager |
| 2.4   | aws-java-sdk-s3                  | 1.11.595 | extraction,s3             |
| 2.4   | aws-java-sdk-core                | 1.11.595 | extraction,sdk            |
| 2.4   | hadoop-aws                       | 2.7.7    | extraction,s3a            |
| 3.0   | spark-streaming-kinesis-asl_2.12 | 3.0.0    | extraction,stream         |
| 3.0   | amazon-kinesis-client            | 1.12.0   | extraction,stream         |
| 3.0   | aws-java-sdk-secretsmanager      | 1.11.655 | extraction,secretsmanager |
| 3.0   | aws-java-sdk-s3                  | 1.11.655 | extraction,s3             |
| 3.0   | aws-java-sdk-core                | 1.11.655 | extraction,sdk            |
| 3.0   | hadoop-aws                       | 2.7.7    | extraction,s3a            |
| 3.1   | spark-streaming-kinesis-asl_2.12 | 3.1.3    | extraction,stream         |
| 3.1   | amazon-kinesis-client            | 1.12.0   | extraction,stream         |
| 3.1   | aws-java-sdk-secretsmanager      | 1.11.655 | extraction,secretsmanager |
| 3.1   | aws-java-sdk-s3                  | 1.11.655 | extraction,s3             |
| 3.1   | aws-java-sdk-core                | 1.11.655 | extraction,sdk            |
| 3.1   | hadoop-aws                       | 2.7.7    | extraction,s3a            |
| 3.2   | spark-streaming-kinesis-asl_2.12 | 3.2.1    | extraction,stream         |
| 3.2   | amazon-kinesis-client            | 1.12.0   | extraction,stream         |
| 3.2   | aws-java-sdk-secretsmanager      | 1.11.655 | extraction,secretsmanager |
| 3.2   | aws-java-sdk-s3                  | 1.11.655 | extraction,s3             |
| 3.2   | aws-java-sdk-core                | 1.11.655 | extraction,sdk            |
| 3.2   | hadoop-aws                       | 2.7.7    | extraction,s3a            |

## Step Classes
* [S3Steps](docs/s3steps.md)
@@ -40,5 +46,4 @@

## Extensions
* [S3FileManager](docs/s3filemanager.md)
* [Kinesis Pipeline Driver](docs/kinesispipelinedriver.md)
* [AWS Secrets Manager Credential Provider](docs/awssecretsmanager-credentialprovider.md)
