Merge branch 'develop'
kupferk committed Oct 7, 2022
2 parents db59d45 + f83c366 commit 3dba2aa
Showing 305 changed files with 28,203 additions and 42,255 deletions.
4 changes: 2 additions & 2 deletions .gitlab-ci.yml
@@ -137,7 +137,7 @@ build-hadoop2.7-spark3.2:

 build-hadoop3.3-spark3.2:
   stage: build
-  script: 'mvn ${MAVEN_CLI_OPTS} clean package -Phadoop-3.3 -Pspark-3.2 -Ddockerfile.skip'
+  script: 'mvn ${MAVEN_CLI_OPTS} clean package -Phadoop-3.3 -Pspark-3.2 -Dhadoop.version=3.3.1 -Ddockerfile.skip'
   artifacts:
     name: "flowman-dist-hadoop3.3-spark3.2"
     paths:
@@ -155,7 +155,7 @@ build-hadoop2.7-spark3.3:

 build-hadoop3.3-spark3.3:
   stage: build
-  script: 'mvn ${MAVEN_CLI_OPTS} clean package -Phadoop-3.3 -Pspark-3.3 -Ddockerfile.skip'
+  script: 'mvn ${MAVEN_CLI_OPTS} clean package -Phadoop-3.3 -Pspark-3.3 -Dhadoop.version=3.3.2 -Ddockerfile.skip'
   artifacts:
     name: "flowman-dist-hadoop3.3-spark3.3"
     paths:
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,19 @@
# Version 0.28.0

* Improve support for MariaDB / MySQL as data sinks
* github-245: Bump ejs, @vue/cli-plugin-babel, @vue/cli-plugin-eslint and @vue/cli-service in /flowman-studio-ui
* github-246: Bump ejs, @vue/cli-plugin-babel, @vue/cli-plugin-eslint and @vue/cli-service in /flowman-server-ui
* github-247: Automatically generate YAML schemas as part of build process
* github-248: Bump scss-tokenizer and node-sass in /flowman-server-u
* github-249: Add new options -X and -XX to increase logging
* github-251: Support for log4j2 Configuration
* github-252: Move sftp target into separate plugin
* github-253: SQL Server relation should support explicit staging table
* github-254: Use DATETIME2 for timestamps in MS SQL Server
* github-256: Provide Maven archetype for simple Flowman projects
* github-258: Support clustered indexes in MS SQL Server


# Version 0.27.0 - 2022-09-09

* github-232: [BUG] Column descriptions should be propagated in UNIONs
24 changes: 12 additions & 12 deletions build-release.sh
@@ -31,22 +31,22 @@ build_profile() {


 export JAVA_HOME=/usr/lib/jvm/java-1.8.0
-build_profile -phadoop-2.6 -pspark-2.4
-build_profile -phadoop-2.7 -pspark-2.4
+build_profile -Phadoop-2.6 -Pspark-2.4
+build_profile -Phadoop-2.7 -Pspark-2.4

 export JAVA_HOME=
-build_profile -phadoop-2.7 -pspark-3.0
-build_profile -phadoop-3.2 -pspark-3.0
-build_profile -phadoop-2.7 -pspark-3.1
-build_profile -phadoop-3.2 -pspark-3.1
-build_profile -phadoop-2.7 -pspark-3.2
-build_profile -phadoop-3.3 -pspark-3.2 -Dhadoop.version=3.3.1
-build_profile -phadoop-2.7 -pspark-3.3
-build_profile -phadoop-3.3 -pspark-3.3 -Dhadoop.version=3.3.2
+build_profile -Phadoop-2.7 -Pspark-3.0
+build_profile -Phadoop-3.2 -Pspark-3.0
+build_profile -Phadoop-2.7 -Pspark-3.1
+build_profile -Phadoop-3.2 -Pspark-3.1
+build_profile -Phadoop-2.7 -Pspark-3.2
+build_profile -Phadoop-3.3 -Pspark-3.2 -Dhadoop.version=3.3.1
+build_profile -Phadoop-2.7 -Pspark-3.3
+build_profile -Phadoop-3.3 -Pspark-3.3 -Dhadoop.version=3.3.2

 export JAVA_HOME=/usr/lib/jvm/java-1.8.0
-build_profile -pCDH-6.3
-build_profile -pCDP-7.1
+build_profile -PCDH-6.3
+build_profile -PCDP-7.1

 # Finally build default version
 export JAVA_HOME=
2 changes: 1 addition & 1 deletion docker/pom.xml
@@ -10,7 +10,7 @@
     <parent>
         <groupId>com.dimajix.flowman</groupId>
         <artifactId>flowman-root</artifactId>
-        <version>0.27.0</version>
+        <version>0.28.0</version>
         <relativePath>../pom.xml</relativePath>
     </parent>

149 changes: 0 additions & 149 deletions docs/cli/flowexec.md

This file was deleted.

47 changes: 47 additions & 0 deletions docs/cli/flowexec/index.md
@@ -0,0 +1,47 @@
# Flowman Executor (flowexec)

`flowexec` is the primary tool for running a whole project, for building individual targets
or for inspecting individual entities.

### General Parameters
* `-h` Displays help
* `-f <project_directory>` Specifies a directory other than the current one for locating a Flowman project
* `-P <profile_name>` Activates a profile defined in the Flowman project
* `-D <key>=<value>` Sets an environment variable
* `--conf <key>=<value>` Sets a Flowman or Spark configuration variable
* `--info` Dumps the active configuration to the console
* `--spark-logging <level>` Sets the log level for Spark
* `--spark-master <master>` Explicitly sets the address of the Spark master
* `--spark-name <application_name>` Sets the Spark application name
* `-X` or `--verbose` Enables logging at a more verbose level
* `-XX` or `--debug` Enables logging at debug level


### Exit Codes

`flowexec` provides different exit codes depending on the result of the execution:

| exit code | description |
|-----------|--------------------------------------------------------------------------------|
| 0 | Everything worked out nicely, no error. This includes skipped executions |
| 2 | There were individual errors, but the run was successful (Success with Errors) |
| 3 | There were execution errors |
| 4 | The command line was not correct |
| 5 | An uncaught exception occurred |
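
In a wrapper script, these exit codes could be translated into messages roughly as sketched below. The function name `describe_flowexec_exit` is a hypothetical helper and not part of Flowman; only the codes and their meanings are taken from the table above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flowman): translate a flowexec exit code
# into a short message, following the exit code table above.
describe_flowexec_exit() {
    case "$1" in
        0) echo "success" ;;
        2) echo "success with errors" ;;
        3) echo "execution errors" ;;
        4) echo "incorrect command line" ;;
        5) echo "uncaught exception" ;;
        *) echo "undocumented exit code $1" ;;
    esac
}

# Usage sketch: run flowexec, then report on its exit code.
#   flowexec -f examples/weather job build main year=2018
#   describe_flowexec_exit "$?"
```

Codes that do not appear in the table fall through to the last branch, so the helper stays usable if further exit codes exist.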


## Commands

All commands for `flowexec` are organized in *command groups*, for example project commands, job commands, target
commands and so on. Please find an overview with links to the detailed documentation below:


```eval_rst
.. toctree::
   :maxdepth: 1

   project
   job
   target
   misc
```
81 changes: 81 additions & 0 deletions docs/cli/flowexec/job.md
@@ -0,0 +1,81 @@
# Job Commands
This command group operates at the level of individual jobs, including jobs with names other than `main`.


## `list` - List all Jobs
The following command will list all jobs defined in a project:
```shell
flowexec job list
```


## `validate|create|build|verify|truncate|destroy` - Execute Job Phase
This set of commands is used for *executing a job phase*, or a complete lifecycle containing multiple individual
phases.
```shell
flowexec job <validate|create|build|verify|truncate|destroy> <job_name> <args>
```
This will execute the whole job given as `<job_name>` by running the desired lifecycle. The `<args>` parameter
refers to the parameters defined in the job. For example, the following job defines one parameter `processing_date`,
which needs to be specified on the command line.
```yaml
jobs:
  main:
    description: "Processes all outputs"
    parameters:
      - name: processing_date
        type: string
    targets:
      - some_hive_table
      - some_files
```
Additional parameters can be specified before or after `<args>` and are as follows:
* `-h` displays help.
* `-f` or `--force` forces execution of all targets in the job, even if Flowman considers the targets to be clean.
* `-t` or `--targets` explicitly specifies the targets to be executed. The targets can be specified as regular expressions.
* `-d` or `--dirty` explicitly marks individual targets as being dirty, i.e. they need a rebuild. The targets can be
  specified as regular expressions. The difference between `-d` and `-t` is that `-t` tells Flowman to rebuild
  the specified targets only if they are dirty, whereas `-d` actually taints specific targets as being dirty, i.e. they
  need a rebuild. The difference between `-f` and `-d` is that `-f` marks *all* targets as being dirty, while `-d` lets
  you explicitly select individual targets.
* `-k` or `--keep-going` proceeds with execution even if errors occur.
* `-j <n>` runs multiple job instances in parallel. This is very useful for running a job for a whole range of dates.
* `--dry-run` only simulates execution.
* `-nl` or `--no-lifecycle` only executes the specified lifecycle phase, without all preceding phases. For example,
  the whole lifecycle for `verify` includes the phases `create` and `build`, and these phases would be executed before
  `verify`. If this is not what you want, then use the option `-nl`.
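
The difference between `-t`, `-d` and `-f` can be contrasted with three variants of the same build. This is only a sketch: the target name `some_hive_table` is borrowed from the job definition above, the value `2022-10-07` for `processing_date` is an arbitrary illustration, and the function `print_variants` is a hypothetical helper that prints the commands instead of running them, so the snippet works without a Flowman installation.

```shell
#!/bin/sh
# Hypothetical helper: print (rather than run) three variants of one build.
# Remove the leading `echo` to execute a variant against a real project.
print_variants() {
    # Illustrative job arguments; the date value is an assumption.
    job_args="job build main processing_date=2022-10-07"

    # -t: restrict execution to the selected target; rebuilt only if dirty.
    echo flowexec $job_args -t some_hive_table
    # -d: mark the selected target as dirty, i.e. it needs a rebuild.
    echo flowexec $job_args -d some_hive_table
    # -f: treat all targets of the job as dirty.
    echo flowexec $job_args -f
}

print_variants
```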


### Examples
In order to forcibly build (i.e. run `VALIDATE`, `CREATE` and `BUILD` execution phases) the `main` job of a project
stored in the subdirectory `examples/weather` which defines an (optional) parameter `year`, simply run

```shell
flowexec -f examples/weather job build main year=2018 --force
```

If you only want to execute the `BUILD` phase and skip the two preceding phases, then you need to add the
command line option `-nl` to skip the lifecycle:

```shell
flowexec -f examples/weather job build main year=2018 -nl
```

The following example will only execute the `BUILD` phase of the job `daily`, which defines a parameter
`processing_datetime` with type datetime. The job will be executed for the whole date range from 2021-06-01 until
2021-08-10 with a step size of one day. Flowman will execute up to four jobs in parallel (`-j 4`).

```shell
flowexec job build daily processing_datetime:start=2021-06-01T00:00 processing_datetime:end=2021-08-10T00:00 processing_datetime:step=P1D --target parquet_lineitem --no-lifecycle -j 4
```
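
The three range arguments follow the same `<parameter>:start`, `<parameter>:end` and `<parameter>:step` pattern, so a script that launches several ranges might generate them. The sketch below assumes nothing beyond the syntax shown in the example above; `range_args` is a hypothetical helper, not a Flowman command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flowman): print the three range arguments
# for a job parameter in the <param>:start / :end / :step form shown above.
range_args() {
    param="$1"; start="$2"; end="$3"; step="$4"
    printf '%s:start=%s %s:end=%s %s:step=%s\n' \
        "$param" "$start" "$param" "$end" "$param" "$step"
}

# Usage sketch (word splitting of the unquoted substitution is intended):
#   flowexec job build daily $(range_args processing_datetime 2021-06-01T00:00 2021-08-10T00:00 P1D) --no-lifecycle -j 4
range_args processing_datetime 2021-06-01T00:00 2021-08-10T00:00 P1D
```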



## `job inspect` - Retrieving general information
The `job inspect` command provides some general information on an individual job, for example the list of all targets
within the job, its parameters and its environment variables.

The following example inspects the job `main`:
```shell
flowexec -f examples/weather job inspect main
```
10 changes: 10 additions & 0 deletions docs/cli/flowexec/misc.md
@@ -0,0 +1,10 @@
# Miscellaneous Commands


## `info` Command
As a small debugging utility, Flowman also provides an `info` command, which simply shows all environment variables
and configuration settings.
```shell
flowexec info
```
