Setting up your environment

Shahin Saadati edited this page Jul 20, 2022 · 10 revisions

Requirements

General

  • Python >=3.8,<3.10. We currently use 3.8. For more info, see the Cloud Composer version list.
  • Poetry for installing and managing dependencies.
  • gcloud command-line tool with Google Cloud Platform credentials configured. Instructions can be found here.
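The general requirements above can be checked with a short script. This is a minimal sketch, not part of the repo: the function names `python_version_supported`, `have_command`, and `check_requirements` are hypothetical, and it assumes that `python3` on your PATH is the interpreter you intend to use for this project.

```shell
# Hypothetical helpers (not part of the repo) for checking the requirements.

# Succeeds if a "major.minor[.patch]" version string satisfies >=3.8,<3.10.
python_version_supported() {
  pv_major="${1%%.*}"        # text before the first dot
  pv_rest="${1#*.}"          # text after the first dot
  pv_minor="${pv_rest%%.*}"  # minor version, without any patch component
  [ "$pv_major" -eq 3 ] && [ "$pv_minor" -ge 8 ] && [ "$pv_minor" -lt 10 ]
}

# Succeeds if the named command is available on PATH.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

# Prints one status line each for python3, poetry, and gcloud.
check_requirements() {
  if have_command python3; then
    cr_version="$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')"
    if python_version_supported "$cr_version"; then
      echo "python3 $cr_version: ok"
    else
      echo "python3 $cr_version: outside >=3.8,<3.10"
    fi
  else
    echo "python3: not found"
  fi
  for cr_tool in poetry gcloud; do
    if have_command "$cr_tool"; then
      echo "$cr_tool: ok"
    else
      echo "$cr_tool: not found"
    fi
  done
}
```

After sourcing these definitions, running `check_requirements` reports which tools are missing or out of range. It only checks that `gcloud` is installed, not that credentials are configured.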

For building data pipelines

Environment setup

gcloud

Download and install the gcloud CLI tool here. Once installed, run the initial setup:

gcloud init

Python

We strongly recommend using a Python version manager on your machine. If you are a Mac user, one option is to use Homebrew to install pyenv. With pyenv installed, you can install any Python version you want. After installing pyenv, follow the instructions here to set up your shell environment properly.

Dependency management using Poetry

We use Poetry to make environment dependencies more deterministic and uniform across different machines.

If you haven't done so, install Poetry using these instructions. We recommend using Poetry's official installer:

# Note: We're currently using their preview branch for features not yet found in their official release.
curl -sSL https://install.python-poetry.org | python3 - --preview

Activate the project's virtual environment if it is not already active. You will need to run this command every time you return to this repo in a new console session:

poetry shell

Installing dependencies

Dependencies are specified in pyproject.toml and can be installed with:

poetry install

Additional setup

Pipeline development: Set up Airflow and run unit tests

After the dependencies are installed, initialize the Airflow database:

poetry run airflow db init

Finally, to ensure you have a proper setup for pipeline development, run the tests:

poetry run python -m pytest -v tests
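The install, database, and test steps above can be chained so that a failure stops the sequence early. This is a sketch under assumptions: `setup_pipeline_env` and the `POETRY` override variable are hypothetical names introduced here for illustration, not something the repo provides.

```shell
# Command used to invoke Poetry. Overridable (hypothetical convention) so the
# sequence can be dry-run, for example with POETRY=echo.
POETRY="${POETRY:-poetry}"

# Runs the three documented steps in order and stops at the first failure.
setup_pipeline_env() {
  "$POETRY" install &&
    "$POETRY" run airflow db init &&
    "$POETRY" run python -m pytest -v tests
}
```

Setting `POETRY=echo` before calling `setup_pipeline_env` prints the arguments of each command instead of running it, which is a cheap way to preview the sequence.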

Doc set development: Set up authentication for Colab

The default development environment for the documentation set is Colab. If you choose to develop doc set content using your local machine, you'll need to authenticate yourself using gcloud:

gcloud auth login
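To avoid logging in again when credentials already exist, you can first ask gcloud which account is active. The sketch below assumes `gcloud` is on your PATH. The function name `ensure_gcloud_login` is hypothetical.

```shell
# Hypothetical helper: log in only if gcloud reports no active account.
ensure_gcloud_login() {
  # Prints the active account's email, or nothing if there is none.
  egl_account="$(gcloud auth list --filter=status:ACTIVE --format='value(account)' 2>/dev/null)"
  if [ -n "$egl_account" ]; then
    echo "Already authenticated as $egl_account"
  else
    gcloud auth login
  fi
}
```

Calling `ensure_gcloud_login` either reports the active account or starts the usual `gcloud auth login` flow.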