Example code to accompany the post "Lean Data Engineering with Dagster and DuckDB" on the DataJargon Substack.
This is a Dagster project scaffolded with dagster project scaffold.
First, install your Dagster code location as a Python package. The -e (--editable) flag tells pip to install the package in "editable mode", so local code changes apply automatically as you develop. The [dev] extra also installs the project's development dependencies.
pip install -e ".[dev]"
Then, start the Dagster UI web server:
dagster dev
Open http://localhost:3000 in your browser to see the project.
You can specify new Python dependencies in setup.py.
Tests are in the lean_data_eng_tests directory, and you can run them with pytest. In a real-world setting, more tests would be warranted.
pytest lean_data_eng_tests
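One lightweight way to add tests, sketched here under assumptions: keep the transformation logic in a plain Python function that an asset calls, then test that function directly with pytest, with no Dagster or DuckDB session needed. The function summarize_orders, its row format, and the file name are hypothetical and not part of this project. (Dagster also provides helpers such as materialize for running assets in tests; they are not shown here.)

```python
# lean_data_eng_tests/test_transforms.py (hypothetical file)
# In a real project, summarize_orders would live in the package and be imported here;
# it is inlined so the sketch is self-contained.


def summarize_orders(rows: list[dict]) -> dict[str, float]:
    """Sum order amounts per customer; an asset could call this and persist the result."""
    totals: dict[str, float] = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


def test_summarize_orders_totals_per_customer():
    rows = [
        {"customer": "a", "amount": 10.0},
        {"customer": "b", "amount": 5.0},
        {"customer": "a", "amount": 2.5},
    ]
    assert summarize_orders(rows) == {"a": 12.5, "b": 5.0}


def test_summarize_orders_empty_input():
    # No rows should yield an empty summary rather than an error
    assert summarize_orders([]) == {}
```

pytest discovers any test_*.py file in the directory, so pytest lean_data_eng_tests picks these up without extra configuration.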
If you want to enable Dagster schedules or sensors for your jobs, the Dagster Daemon process must be running. Running dagster dev starts it automatically.
Once your Dagster Daemon is running, you can start turning on schedules and sensors for your jobs.
The easiest way to deploy your Dagster project is to use Dagster Cloud.
Check out the Dagster Cloud Documentation to learn more.
If you run into an error like Symbol not found: _CFRelease (this can happen with Conda/Anaconda environments), you likely have a problem with your gRPC (grpcio) installation. Reinstalling grpcio from Conda may fix it:
pip uninstall grpcio
conda install grpcio
After upgrading to Dagster >= 1.5, you may need a different fix (see thread): uninstall grpcio, set the build flags below, then rebuild grpcio from source.
pip uninstall grpcio
# Add these to your shell environment (e.g. ~/.zshrc)
export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1
export GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1
export GRPC_PYTHON_LDFLAGS=" -framework CoreFoundation"
pip install grpcio --no-binary :all:
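To check whether either fix worked, you can try importing grpc in the same environment before starting Dagster again. The sketch below is an assumption-laden convenience, not part of this project: check_grpc is a hypothetical helper, and it only confirms that grpcio's native extension loads. A broken build typically surfaces as an ImportError whose message contains the Symbol not found text.

```python
def check_grpc() -> tuple[bool, str]:
    """Try to import grpc; return (ok, version string or error message)."""
    try:
        import grpc  # loads grpcio's compiled extension
    except ImportError as exc:
        # Missing package or a failed native load (e.g. "Symbol not found: _CFRelease")
        return False, str(exc)
    # Fall back to "unknown" in case the version attribute is absent
    return True, getattr(grpc, "__version__", "unknown")


if __name__ == "__main__":
    ok, detail = check_grpc()
    print(("grpcio OK: " if ok else "grpcio failed to import: ") + detail)
```

Run it with the same Python interpreter that runs dagster dev; a different interpreter may see a different grpcio installation.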