Package updates. Small bug fixes (#71)
* Use the latest versions of Airflow, cwltool and constraints

* Fix unit tests

* Refactored tests

* Minor changes

* Fix docker-compose so that conformance tests can run

* Fix bug with ARG in Dockerfile

* Fix bug with passing ARG to the Dockerfile from docker-compose

* Minor changes

* Update logging template to correspond to the latest Airflow

* Update Dockerfile to be able to set Ubuntu & Python versions, GID, UID

Run everything from the airflow user

* Update Airflow to 2.1.4

* Fix conflict in Airflow 2.1.4 constraints

* Fix docker-compose to run tests for custom Ubuntu and Python

* Preparing to release
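
The ARG fixes above typically trace back to Docker's scoping rule: an `ARG` declared before `FROM` goes out of scope once the build stage starts and must be redeclared. A minimal sketch of the pattern, with illustrative argument names and defaults rather than the commit's actual files:

```
# docker-compose forwards these via build.args; defaults are illustrative
ARG UBUNTU_VERSION=20.04
FROM ubuntu:${UBUNTU_VERSION}
# ARGs declared before FROM are cleared at this point; redeclare to use them below
ARG UBUNTU_VERSION
ARG PYTHON_VERSION=3.8.10
RUN echo "Ubuntu ${UBUNTU_VERSION}, Python ${PYTHON_VERSION}"
```

On the docker-compose side the same names would be listed under `build.args` so they reach the Dockerfile at build time.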
michael-kotliar committed Sep 21, 2021
1 parent 9c2c852 commit a963b38
Showing 30 changed files with 1,807 additions and 858 deletions.
69 changes: 37 additions & 32 deletions .travis.yml
@@ -1,12 +1,11 @@
sudo: false
language: python

dist: bionic
os:
- linux

services:
- docker
- docker

python:
- 3.7
@@ -23,7 +22,7 @@ jobs:
- pip install . --constraint ./packaging/constraints/constraints-$TRAVIS_PYTHON_VERSION.txt
before_script: # to override the main config
- echo "Skip"
script: ./tests/run_unit_tests.sh
script: ./tests/unit_tests/run_unit_tests.sh
after_success:
- coveralls
deploy:
@@ -33,41 +32,47 @@ jobs:
secure: Mji1koR4nyt/KgoycpuvgIp9toFVNYaSxUmNY6EVt0pmIpRb/GKbw6TdyfAdtnSAwH3BcSUC/R1hCwyaXfv1GDPFYqv9Yg1MaNHR1clvo8E8KIIPt1JDqPM47lgPQQFFbwB+Cc6uSV0Nn9oDBkhWEPQqV3kI/GJkSUzSs/yjZqR4C+aZxsJzE+VX2ZzeGCD3x4mzhAAWan4MLrdgANaXQVTHhyHIhTp3l109FblYimMvx8HqKotMiM+32mVFxgwf/pMw/N8gDOFXd4VrtlaOqqHpn4VJko+jSNYuAdKn62N2KFKqExyU39ycvU9ngYaU38nmCjJdibRgNyxfdH6LfndS9xzu3KPY64ACLG1i8Ym+57Q7wSJZAb2WF/b8av1RnkKMUGHHYXBzVIGk7Abvuhde0DsV0lr9XsapQn7XySmhdBWYazZTr+AtgIdsx7AmHV1ug6nPp3tIQzW1+YAOf295Puwqbrn+SF3jYw6167jAl5M1a81kxqli1UTsLgpcaTbTD1ofwLn4gP3VuU1f4fKGzhrxl6ybHW+LpO/wkcN2wJDdBbqz5OQIYfshMQEooIODOw1OonmwbY3vcMATuvi7Hz3mIElqpu3TVxH9aoBzcvL1148wPhZF8u87T8nDgsHeUT66I56ILGcZszASolt2Cb6oPZmxg2jgajTREwk=
on:
tags: true

# Still valid, but will soon be deprecated if we start using the Airflow API
- name: DAG with embedded workflow (just one test)
script: cwl-airflow test --suite workflows/tests/conformance_tests.yaml --spin --range 1 --embed
- name: DAG with attached workflow using combined API call (just one test)
script: cwl-airflow test --suite workflows/tests/conformance_tests.yaml --spin --range 1 --combine
- name: DAG with embedded workflow using combined API call (just one test)
script: cwl-airflow test --suite workflows/tests/conformance_tests.yaml --spin --range 1 --embed --combine
- name: Test of `init --upgrade`
before_install:
- mkdir -p ~/airflow/dags
- cp ./tests/data/dags/bam_bedgraph_bigwig_single_old_format.py ~/airflow/dags
- cp ./tests/data/workflows/bam-bedgraph-bigwig-single.cwl ~/airflow/dags
install:
- pip install . --constraint ./packaging/constraints/constraints-$TRAVIS_PYTHON_VERSION.txt
before_script:
- cwl-airflow init --upgrade
- rm -f ~/airflow/dags/bam-bedgraph-bigwig-single.cwl
script: airflow dags list # to check if all DAGs are correct
- name: Test packaging for Ubuntu 18.04, Python 3.6
install:
- ./packaging/portable/ubuntu/pack.sh 18.04 3.6 $TRAVIS_BRANCH
- ls ./packaging/portable/ubuntu/build/
- tar xzf "./packaging/portable/ubuntu/build/python_3.6_with_cwl_airflow_${TRAVIS_BRANCH}_ubuntu_18.04.tar.gz"
before_script:
- ./python3/bin_portable/airflow --help # to generate airflow.cfg
- sed -i'.backup' -e 's/^executor.*/executor = LocalExecutor/g' ~/airflow/airflow.cfg
- sed -i'.backup' -e 's/^parsing_processes.*/parsing_processes = 1/g' ~/airflow/airflow.cfg
- sed -i'.backup' -e 's/^sql_alchemy_pool_enabled.*/sql_alchemy_pool_enabled = False/g' ~/airflow/airflow.cfg
- sed -i'.backup' -e 's/^dag_dir_list_interval =.*/dag_dir_list_interval = 60/g' ~/airflow/airflow.cfg
- sed -i'.backup' -e 's/^parallelism =.*/parallelism = 1/g' ~/airflow/airflow.cfg
- sed -i'.backup' -e 's/^sql_alchemy_conn.*/sql_alchemy_conn = mysql:\/\/airflow:airflow@127.0.0.1:6603\/airflow/g' ~/airflow/airflow.cfg
- ./python3/bin_portable/cwl-airflow init
- ./python3/bin_portable/airflow connections add process_report --conn-type http --conn-host localhost --conn-port 3070 # to add process_report connection
- ./python3/bin_portable/airflow scheduler > /dev/null 2>&1 &
- ./python3/bin_portable/cwl-airflow api --replay 600 > /dev/null 2>&1 &
script: ./python3/bin_portable/cwl-airflow test --suite workflows/tests/conformance_tests.yaml --spin --range 1

# TEST DEPRECATED as it is no longer needed. Everything has already been updated.
# - name: Test of `init --upgrade`
# before_install:
# - mkdir -p ~/airflow/dags
# - cp ./tests/data/dags/bam_bedgraph_bigwig_single_old_format.py ~/airflow/dags
# - cp ./tests/data/workflows/bam-bedgraph-bigwig-single.cwl ~/airflow/dags
# install:
# - pip install . --constraint ./packaging/constraints/constraints-$TRAVIS_PYTHON_VERSION.txt
# before_script:
# - cwl-airflow init --upgrade
# - rm -f ~/airflow/dags/bam-bedgraph-bigwig-single.cwl
# script: airflow dags list # to check if all DAGs are correct

# TEST DEPRECATED as the packaging scripts should be replaced
# - name: Test packaging for Ubuntu 18.04, Python 3.6
# install:
# - ./packaging/portable/ubuntu/pack.sh 18.04 3.6 $TRAVIS_BRANCH
# - ls ./packaging/portable/ubuntu/build/
# - tar xzf "./packaging/portable/ubuntu/build/python_3.6_with_cwl_airflow_${TRAVIS_BRANCH}_ubuntu_18.04.tar.gz"
# before_script:
# - ./python3/bin_portable/airflow --help # to generate airflow.cfg
# - sed -i'.backup' -e 's/^executor.*/executor = LocalExecutor/g' ~/airflow/airflow.cfg
# - sed -i'.backup' -e 's/^parsing_processes.*/parsing_processes = 1/g' ~/airflow/airflow.cfg
# - sed -i'.backup' -e 's/^sql_alchemy_pool_enabled.*/sql_alchemy_pool_enabled = False/g' ~/airflow/airflow.cfg
# - sed -i'.backup' -e 's/^dag_dir_list_interval =.*/dag_dir_list_interval = 60/g' ~/airflow/airflow.cfg
# - sed -i'.backup' -e 's/^parallelism =.*/parallelism = 1/g' ~/airflow/airflow.cfg
# - sed -i'.backup' -e 's/^sql_alchemy_conn.*/sql_alchemy_conn = mysql:\/\/airflow:airflow@127.0.0.1:6603\/airflow/g' ~/airflow/airflow.cfg
# - ./python3/bin_portable/cwl-airflow init
# - ./python3/bin_portable/airflow connections add process_report --conn-type http --conn-host localhost --conn-port 3070 # to add process_report connection
# - ./python3/bin_portable/airflow scheduler > /dev/null 2>&1 &
# - ./python3/bin_portable/cwl-airflow api --replay 600 > /dev/null 2>&1 &
# script: ./python3/bin_portable/cwl-airflow test --suite workflows/tests/conformance_tests.yaml --spin --range 1

before_install:
- git clone https://github.com/datirium/workflows.git --recursive
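
The conformance check the CI runs can be reproduced locally with the same commands the job definitions above use, assuming `cwl-airflow` is installed and the scheduler and API are already running:

```
git clone https://github.com/datirium/workflows.git --recursive
cwl-airflow test --suite workflows/tests/conformance_tests.yaml --spin --range 1
```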
32 changes: 32 additions & 0 deletions README.dev.md
@@ -0,0 +1,32 @@
## **Notes for developers**

When running on macOS, you might need to set the following environment variable before starting `airflow scheduler/webserver`:

```
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
```
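
The variable can also be scoped to a single command rather than exported for the whole shell session:

```
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES airflow scheduler
```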

**Conformance and unit tests were run for**
- macOS 11.4
- Python 3.8.6
- Ubuntu 18.04
- Python 3.6.8
- Python 3.7.9
- Python 3.8.10
- Ubuntu 20.04
- Python 3.6.8
- Python 3.7.9
- Python 3.8.10

*For Ubuntu, the Python versions were selected based on the latest available binary releases at the time of testing.

**To run conformance tests in a Docker container**
```
cd tests
./run_conformance_tests_docker.sh $UBUNTU_VERSION $PYTHON_VERSION $CWL_AIRFLOW_VERSION $REPO_URL $SUITE
```
**To run unit tests in a Docker container**
```
cd tests
./run_unit_tests_docker.sh $UBUNTU_VERSION $PYTHON_VERSION $CWL_AIRFLOW_VERSION
```
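
For example, to run the unit tests against one of the combinations listed above (the CWL-Airflow version argument is illustrative — substitute the release you are testing):

```
cd tests
./run_unit_tests_docker.sh 20.04 3.8.10 1.2.10
```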
9 changes: 2 additions & 7 deletions README.md
@@ -8,7 +8,7 @@

# **CWL-Airflow**

Python package to extend **[Apache-Airflow 2.0.1](https://airflow.apache.org)**
Python package to extend **[Apache-Airflow 2.1.4](https://airflow.apache.org)**
functionality with **[CWL v1.1](https://www.commonwl.org/v1.1/)** support

## **Cite as**
@@ -29,9 +29,4 @@ pip install cwl-airflow==1.0.18
Published version [documentation](https://cwl-airflow.readthedocs.io/en/1.0.18/)

## **Notes**

When running on MacOS, you might need to set up the following env variable before starting `airflow scheduler/webserver`

```
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
```
Notes for developers can be found in the [DEV README](https://github.com/Barski-lab/cwl-airflow/blob/master/README.dev.md)
2 changes: 1 addition & 1 deletion cwl_airflow/components/init/config.py
@@ -15,7 +15,7 @@
with CleanAirflowImport():
from airflow.configuration import conf
from airflow.exceptions import AirflowConfigException
from airflow.utils.dag_processing import list_py_file_paths
from airflow.utils.file import list_py_file_paths
from cwl_airflow.utilities.cwl import overwrite_deprecated_dag


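The same import change appears in `cwl_airflow/utilities/cwl.py` below: `list_py_file_paths` moved to `airflow.utils.file` in newer Airflow releases. The commit pins the new path; a version-tolerant alternative would be a guarded import (a sketch, not what the commit actually does):

```
try:
    from airflow.utils.file import list_py_file_paths  # newer Airflow
except ImportError:
    from airflow.utils.dag_processing import list_py_file_paths  # older Airflow
```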
30 changes: 24 additions & 6 deletions cwl_airflow/config_templates/airflow_local_settings.py
@@ -17,9 +17,6 @@
# under the License.
"""Airflow logging settings"""

# COPY of /airflow/config_templates/airflow_local_settings.py from Airflow 2.0.0
# with added cwltool logger and handler

import os
from pathlib import Path
from typing import Any, Dict, Union
@@ -57,6 +54,11 @@

PROCESSOR_FILENAME_TEMPLATE: str = conf.get('logging', 'LOG_PROCESSOR_FILENAME_TEMPLATE')


# COPY of /airflow/config_templates/airflow_local_settings.py from Airflow 2.1.3
# with added cwltool logger and handler


DEFAULT_LOGGING_CONFIG: Dict[str, Any] = {
'version': 1,
'disable_existing_loggers': False,
@@ -67,29 +69,38 @@
'class': COLORED_FORMATTER_CLASS if COLORED_LOG else 'logging.Formatter',
},
},
'filters': {
'mask_secrets': {
'()': 'airflow.utils.log.secrets_masker.SecretsMasker',
},
},
'handlers': {
'console': {
'class': 'airflow.utils.log.logging_mixin.RedirectStdHandler',
'formatter': 'airflow_coloured',
'stream': 'sys.stdout',
'filters': ['mask_secrets'],
},
'task': {
'class': 'airflow.utils.log.file_task_handler.FileTaskHandler',
'formatter': 'airflow',
'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
'filename_template': FILENAME_TEMPLATE,
'filters': ['mask_secrets'],
},
'processor': {
'class': 'airflow.utils.log.file_processor_handler.FileProcessorHandler',
'formatter': 'airflow',
'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
'filename_template': PROCESSOR_FILENAME_TEMPLATE,
'filters': ['mask_secrets'],
},
'cwltool': {
'class': 'airflow.utils.log.file_task_handler.FileTaskHandler',
'formatter': 'airflow',
'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
'filename_template': FILENAME_TEMPLATE + ".cwl"
'filename_template': FILENAME_TEMPLATE + ".cwl",
'filters': ['mask_secrets']
}
},
'loggers': {
@@ -102,6 +113,7 @@
'handlers': ['task'],
'level': LOG_LEVEL,
'propagate': False,
'filters': ['mask_secrets'],
},
'flask_appbuilder': {
'handler': ['console'],
@@ -111,12 +123,14 @@
'cwltool': {
'handlers': ['cwltool'],
'level': LOG_LEVEL,
'propagate': False
'propagate': False,
'filters': ['mask_secrets']
}
},
'root': {
'handlers': ['console'],
'level': LOG_LEVEL,
'filters': ['mask_secrets'],
},
}

@@ -258,6 +272,8 @@
ELASTICSEARCH_WRITE_STDOUT: bool = conf.getboolean('elasticsearch', 'WRITE_STDOUT')
ELASTICSEARCH_JSON_FORMAT: bool = conf.getboolean('elasticsearch', 'JSON_FORMAT')
ELASTICSEARCH_JSON_FIELDS: str = conf.get('elasticsearch', 'JSON_FIELDS')
ELASTICSEARCH_HOST_FIELD: str = conf.get('elasticsearch', 'HOST_FIELD')
ELASTICSEARCH_OFFSET_FIELD: str = conf.get('elasticsearch', 'OFFSET_FIELD')

ELASTIC_REMOTE_HANDLERS: Dict[str, Dict[str, Union[str, bool]]] = {
'task': {
@@ -272,6 +288,8 @@
'write_stdout': ELASTICSEARCH_WRITE_STDOUT,
'json_format': ELASTICSEARCH_JSON_FORMAT,
'json_fields': ELASTICSEARCH_JSON_FIELDS,
'host_field': ELASTICSEARCH_HOST_FIELD,
'offset_field': ELASTICSEARCH_OFFSET_FIELD,
},
}

@@ -280,5 +298,5 @@
raise AirflowException(
"Incorrect remote log configuration. Please check the configuration of option 'host' in "
"section 'elasticsearch' if you are using Elasticsearch. In the other case, "
"'remote_base_log_folder' option in 'core' section."
"'remote_base_log_folder' option in the 'logging' section."
)
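
With this template active (it is the `logging_config_class` that `cwl-airflow init` installs), any records emitted through the `cwltool` logger are routed by the dedicated handler into a separate task log with a `.cwl` suffix. A minimal sketch of the producing side; the message text is illustrative:

```
import logging

# Handled by the 'cwltool' FileTaskHandler defined above,
# so the record lands in "<task log>.cwl" instead of the main task log.
logging.getLogger("cwltool").info("cwltool step output captured")
```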
2 changes: 1 addition & 1 deletion cwl_airflow/utilities/cwl.py
Expand Up @@ -26,7 +26,7 @@
from airflow.utils.state import State
from airflow.utils.db import provide_session
from airflow.exceptions import AirflowConfigException
from airflow.utils.dag_processing import list_py_file_paths
from airflow.utils.file import list_py_file_paths
from airflow.api.common.experimental import delete_dag
from cwltool.argparser import get_default_args
from cwltool.main import (
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -19,7 +19,7 @@ Welcome to CWL-Airflow's documentation!
:target: https://pepy.tech/project/cwl-airflow


Python package to extend `Apache-Airflow 2.0.1 <https://airflow.apache.org>`_ functionality with `CWL v1.1 <https://www.commonwl.org/v1.1/>`_ support.
Python package to extend `Apache-Airflow 2.1.4 <https://airflow.apache.org>`_ functionality with `CWL v1.1 <https://www.commonwl.org/v1.1/>`_ support.

.. raw:: html

2 changes: 1 addition & 1 deletion docs/readme/how_to_use.md
@@ -21,7 +21,7 @@ optional arguments:

**Init command will run the following steps** for the specified `--home` and `--config` parameters (a sketch of the upgrade flow follows this list):
- Call `airflow --help` to create a default `airflow.cfg`
- Update `airflow.cfg` to hide paused DAGs, skip loading example DAGs and connections and **do not** pause newly created DAGs. Also, we set our custom `logging_config_class` to split Airflow and CWL related logs into the separate files. In case of upgrading from the previous version of CWL-Airflow that used Airflow < 2.0.0 to the latest one, `airflow.cfg` will be backuped and upgraded to fit Airflow 2.0.1. You will have to manually make sure that all custom fields were properly copied to the new `airflow.cfg`
- Update `airflow.cfg` to hide paused DAGs, skip loading example DAGs and connections, and **do not** pause newly created DAGs. We also set our custom `logging_config_class` to split Airflow and CWL-related logs into separate files. When upgrading from a previous version of CWL-Airflow that used Airflow < 2.0.0, `airflow.cfg` will be backed up and upgraded to fit Airflow 2.1.4; you will have to manually make sure that all custom fields were properly copied to the new `airflow.cfg`
- Call `airflow db init` to init/upgrade Airflow metadata database.
- If run with `--upgrade`, upgrade old CWLDAGs to correspond to the latest format, save original CWLDAGs into `deprecated_dags` folder.
- Put **clean_dag_run.py** into the DAGs folder.
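
A sketch of the upgrade flow described above; the `--home` and `--config` values shown are the usual defaults and may differ in your setup:

```
cwl-airflow init --upgrade --home ~/airflow --config ~/airflow/airflow.cfg
airflow dags list  # to check if all DAGs are correct
```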
