All notable changes to the codebase are documented in this file.
- Added an `is_optional` parameter to `LazyNotebook.inputs` and `input_table`. This makes it possible to mark an `input_table` as optional, so that it does not raise an exception when the specified `file_path` is missing.
- Added `dtflw.storage.fs.FileStorageBase.write_table` method.
- Added a demo project showing how to abstract from a specific storage in notebooks.
- Fixed: some functions of `dtflw.databricks` used to catch and filter for `java.util.NoSuchElementException`. Filtering has been removed since the raised exception may be of a different class.
- Improved validation: reading an output table is skipped if its `expected_columns` are not set.
- Fixed bug with defining params for the act method of the notebook plugin.
- Added an `extras_require` section to `setup.py` for the extra packages.
- Fixed bug with multiple NotebookPlugins installed.
- Added `dtflw.storage.DbfsStorage`.
- Added a demo project. See `demos/dtflw_intro`.
- Prepared the package for PyPI.
- Added `dtflw.databricks.is_job_interactive`.
- Added `LazyNotebook.share_arguments`.
- Removed `dtflw.events`.
- Added functions `init_args`, `init_inputs` and `init_outputs` for initializing arguments in a callee notebook using the `dbutils.widgets` API.
- Updated `README.md`.
- Module `dtflw.io.storage` renamed to `dtflw.storage.fs`.
We are open-sourcing the `dtflw` framework to share our experience of building Databricks data pipelines. We think others may find it useful and inspiring, and we hope that it will serve them well.
Its initial version is `0.1.0`; the major digit `0` denotes that its public API may still change at any time and should not be considered stable.
- Added verbosity control for `DefaultLogger`.
- Added `dtflw.display.DefaultDisplay` service for interacting with a user in a notebook.
- Added `dtflw.init_flow` and `dtflw.io.azure.init_storage` factory functions.
- Updated the README and documents.
- Replaced `Runtime` with `PipelineState`. Renamed `LazyNotebook.collect_args` to `LazyNotebook.collect_arguments`.
- Added `flow`, `plugin`, `flow_context`, `input_table`, `output_table`, `lazy_notebook`.
- Refactored the unit tests so that they no longer need a running cluster or an Azure blob container.
- Added `assertions`, `databricks`, `events`, `logger`, `runtime`, `tables_repo`.
- Removed version restriction on `setuptools`.
- Added info to `setup.py`.
- Added a new function `dtflw.databricks.get_path_relative_to_project_dir`.
- Initialized the `dtflw` repo.
- Added `dtflw.io`.
- Set up unittest and a build definition.