JoeNaso/lean-data-engineering

Lean Data Engineering

Example code to accompany the post "Lean Data Engineering with Dagster and DuckDB" on the DataJargon Substack.

[Figure: Simple Dagster Asset Graph]

Getting started

This is a Dagster project scaffolded with dagster project scaffold.

First, install your Dagster code location as a Python package. The -e (--editable) flag tells pip to install the package in "editable mode", so local code changes take effect as you develop without reinstalling. The [dev] extra also installs the development dependencies declared in setup.py.

pip install -e ".[dev]"

Then, start the Dagster UI web server:

dagster dev

Open http://localhost:3000 in your browser to see the project.

Development

Adding new Python dependencies

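Add new Python dependencies to setup.py: runtime packages go in install_requires, and development-only tools go in the dev extra (the one installed by ".[dev]").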

Unit testing

Tests live in the lean_data_eng_tests directory and can be run with pytest. In a real-world setting, more tests would be warranted.

pytest lean_data_eng_tests

Schedules and sensors

To enable Dagster schedules or sensors for your jobs, the Dagster daemon process must be running; dagster dev starts it automatically.

Once your Dagster Daemon is running, you can start turning on schedules and sensors for your jobs.

Deploy on Dagster Cloud

The easiest way to deploy your Dagster project is to use Dagster Cloud.

Check out the Dagster Cloud Documentation to learn more.

Misc.

If you are running into an error like Symbol not found: _CFRelease (this can happen in Conda/Anaconda environments), you likely have an issue with gRPC (the grpcio package). One fix is to reinstall grpcio from conda instead of pip:

pip uninstall grpcio
conda install grpcio

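After upgrading to Dagster 1.5 or later, you may need a different fix (see thread): uninstall grpcio, configure the build to use the system OpenSSL and zlib and to link against CoreFoundation, then rebuild it from source: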

pip uninstall grpcio
# Add these to your environment or ~/.zshrc
export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1
export GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1
export GRPC_PYTHON_LDFLAGS=" -framework CoreFoundation"
pip install grpcio --no-binary :all:
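After either fix, one quick way to confirm that the native gRPC extension now loads is to import it. The helper below is a hypothetical convenience, not part of this project. It relies on the fact that a failed load of a compiled extension (such as a missing symbol) surfaces in Python as an ImportError.

```python
import importlib


def check_import(module_name: str) -> str:
    """Try to import a module and report success or the ImportError message."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:  # also covers ModuleNotFoundError and dlopen failures
        return f"{module_name}: FAILED ({exc})"
    return f"{module_name}: OK {getattr(module, '__version__', 'unknown version')}"


if __name__ == "__main__":
    # grpc is the import name of the grpcio package; dagster imports it at startup.
    for name in ("grpc", "dagster"):
        print(check_import(name))
```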

About

Lean Data Engineering with Dagster and DuckDB
