Releases: dlt-hub/dlt
0.2.0a20
What's Changed
- groups all config and secret values used by all sources in a single Python module into a single section (https://github.com/dlt-hub/dlt/blob/devel/docs/technical/secrets_and_config.md#the-sources-section)
- allows providing alternative credentials for sources with multiple auth methods; see the sketch after this list (https://github.com/dlt-hub/dlt/blob/devel/docs/technical/secrets_and_config.md#working-with-alternatives-of-credentials-union-types)
- gives write access to schemas in the source-decorated function, allowing settings and tables to be generated or modified (https://github.com/dlt-hub/dlt/blob/devel/docs/technical/working_with_schemas.md#schema-is-modified-in-the-source-function-body)
- all generators are closed if there's an exception in the pipe during extraction
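A hedged sketch of the union-typed credentials feature. The credential classes are assumed to be the specs that recent dlt exposes; the module path and class names may differ in this alpha:

```python
from typing import Union

import dlt
from dlt.sources.credentials import GcpOAuthCredentials, GcpServiceAccountCredentials


@dlt.source
def google_data(
    credentials: Union[GcpServiceAccountCredentials, GcpOAuthCredentials] = dlt.secrets.value,
):
    # dlt tries each member of the union and injects the first credential
    # type that the provided secret value resolves into
    @dlt.resource
    def accounts():
        yield {"project_id": credentials.project_id}

    return accounts
```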
Full Changelog: 0.2.0a19...0.2.0a20
0.2.0a19
What's Changed
- update 2023-02-16-dlthub-mission.md by @matthauskrzykowski in #131
- adding blog + mission post stub by @TyDunn in #130
- json/normalizer performance improvements and bugfixes by @rudolfix in #133
- adds technical documentation for contributors by @rudolfix in #139
New Contributors
- @matthauskrzykowski made their first contribution in #131
`dlt` library changes
- `orjson` is used instead of `simplejson` by default (with a fallback for PyPy)
- the `spawn` method is now supported to run the normalizer stage in parallel (so it works on macOS and Windows)
- indexes are not created by default on `duckdb` (they slowed down inserts), but remain enabled on `postgres`
- bumped `duckdb` to 0.7.0 and the newest `dbt-duckdb`
- bugfixes and additional tests
Full Changelog: 0.2.0a18...0.2.0a19
0.2.0a18
What's Changed
- small improvements to duckdb docs by @TyDunn in #128
- Fix current columns and new columns referencing the same object by @steinitzu in #127
- tests file rotation on schema changes + bumps duckdb to 0.7 by @rudolfix in #129
New Contributors
- @steinitzu made their first contribution in #127
0.2.0a17
What's Changed
- adds duckdb destination and dbt support by @rudolfix in #124
  The 🦆db destination is added and may be used like any other destination. The multithreaded loading is quite fast thanks to `duckdb` dropping the GIL when called. See more in our docs, and the sketch after this list.
  We also support the `dbt-duckdb` adapter; the jaffle shop example was converted to `duckdb` to showcase this ability.
- transaction support was added to `sql_client`, and BigQuery got multi-statement transactions via Sessions
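A minimal, hedged sketch of loading into the new `duckdb` destination; the pipeline, dataset, and table names are made up:

```python
import dlt

# a pipeline that loads into a local duckdb database file
pipeline = dlt.pipeline(
    pipeline_name="quack_demo", destination="duckdb", dataset_name="demo_data"
)

# load a couple of rows; dlt infers the schema of the "items" table
load_info = pipeline.run(
    [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}], table_name="items"
)
print(load_info)
```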
Full Changelog: 0.2.0a16...0.2.0a17
0.2.0a16
What's Changed
- uses structured data types to store json by @rudolfix in #121
- brings back the functionality to run dbt packages by @rudolfix in #122
This update changes how the `complex` data type is stored. Most often the `complex` data type is generated to hold `json` data when `nesting_level` is limited in the source. The complex types are stored as JSONB (postgres), SUPER (redshift), and JSON (BigQuery).
This update also makes it easy to run `dbt` packages together with `dlt` pipelines. Most of the work went into making the feature user friendly and avoiding dependency conflicts. It is also quite well tested. Take a look at these examples:
https://github.com/dlt-hub/dlt/blob/devel/docs/examples/dbt_run_jaffle.py
https://github.com/dlt-hub/dlt/blob/devel/docs/examples/chess/chess_dbt.py
https://github.com/dlt-hub/dlt/tree/devel/docs/examples/chess/dbt_transform
Running a `dbt` package takes just two lines, and it gives you more control than the CLI; see the sketch below.
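A hedged sketch of those two lines, following the linked jaffle shop example (the package URL comes from that example; the exact runner API may differ across versions):

```python
import dlt

pipeline = dlt.pipeline(
    pipeline_name="jaffle", destination="duckdb", dataset_name="jaffle_data"
)

# the two lines: bind a dbt package to the pipeline's destination credentials, then run it
dbt = dlt.dbt.package(pipeline, "https://github.com/dbt-labs/jaffle_shop.git")
models = dbt.run_all()

# each result reports the materialized model and its status message
for m in models:
    print(f"{m.model_name}: {m.message}")
```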
0.2.0a15
What's Changed
- improving docs UX by @TyDunn in #110
- docstrings and CLI improvements by @rudolfix in #114
- allows hash mismatch when loading a schema that needs migration
- makes the `full_refresh` flag configurable
- fixes config/secrets injection when a union with a base configuration is used; fixes secret hint detection
- properly restores the pipeline state when the extract step fails
0.2.0a10
What's Changed
- only specific default config values are generated by `dlt init`, e.g. the BigQuery location
- the correct postgres port is generated
- sends traces to Sentry if `RUNTIME__SENTRY_DSN` is present
- sends a Slack notification if `RUNTIME__SLACK_INCOMING_HOOK` is present (see the sketch after this list)
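A hedged sketch of enabling both integrations through dlt's environment config provider; the variable names come from the notes above, while the DSN and hook URLs are placeholders:

```python
import os

import dlt

# placeholders; use your real Sentry DSN and Slack incoming-webhook URL
os.environ["RUNTIME__SENTRY_DSN"] = "https://<key>@sentry.io/<project>"
os.environ["RUNTIME__SLACK_INCOMING_HOOK"] = "https://hooks.slack.com/services/<hook>"

# pipelines created after the vars are set send traces and notifications
pipeline = dlt.pipeline(pipeline_name="traced_demo", destination="duckdb")
```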
0.2.0a9
0.2.0a8
What's Changed
- deletes all pipeline state (schemas, state, intermediate files) if destination dataset is dropped
- synchronizes state with the destination in the `run` method; you can opt out from state sync with `restore_from_destination=false` in e.g. `config.toml` (see the sketch after this list)
- loads all schemas/sources into a single dataset by default. This simplifies the experience for less advanced users; you can switch back to the old behavior (each source/schema gets a separate dataset) with the `use_single_dataset=false` config option
- enables CTRL-C when running user code
- commits all files extracted from several sources after all user code has run
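A hedged sketch of opting out of both behaviors via environment variables instead of `config.toml`; the upper-cased names are an assumption based on how dlt maps config keys to env vars:

```python
import os

# assumption: top-level config.toml keys map to upper-cased env var names
os.environ["RESTORE_FROM_DESTINATION"] = "false"  # skip state sync with the destination
os.environ["USE_SINGLE_DATASET"] = "false"  # old behavior: one dataset per source/schema
```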
0.2.0a7
What's Changed
- adds the `max_table_nesting` argument to `dlt.source` to control the depth of parent-child table nesting (see the sketch after this list)
- fixes the `pipeline_name` when runtime configuration is embedded
- reacts to signals (e.g. CTRL-C) during extraction and other signal improvements
- passes GitHub vars to loggers/tracers
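A hedged sketch of capping nesting depth; the source and sample data are made up:

```python
import dlt


@dlt.source(max_table_nesting=1)
def events():
    @dlt.resource
    def raw():
        # "tags" can still unnest into a child table; anything nested deeper
        # than the limit is kept serialized as a complex/json value instead
        yield {"id": 1, "tags": ["a", "b"], "meta": {"history": [{"v": [1, 2]}]}}

    return raw
```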