Waterfall-logging

Waterfall-logging is a Python package that logs (distinct) column counts of a DataFrame, exports them as a Markdown table, and plots a waterfall statistics figure.

It provides implementations for pandas (PandasWaterfall) and PySpark (SparkWaterfall).
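To make the logged numbers concrete, here is a minimal sketch of the bookkeeping. It assumes, based on the example output in the Usage section, that values in `columns` are counted as non-null values, values in `distinct_columns` are counted as unique values, and each Δ column is the difference from the previous log entry. `MiniWaterfall` and `to_frame` are hypothetical names used for illustration; this is not the package's own implementation.

```python
import pandas as pd


class MiniWaterfall:
    """Hypothetical, simplified stand-in for PandasWaterfall (not the package's code)."""

    def __init__(self, table_name, columns, distinct_columns):
        self.table_name = table_name
        self.columns = list(columns)
        self.distinct_columns = list(distinct_columns)
        self.entries = []  # one dict per log() call

    def log(self, table, reason, configuration_flag=""):
        previous = self.entries[-1] if self.entries else None
        counts = {}
        # Plain columns: number of non-null values.
        for col in self.columns:
            counts[col] = int(table[col].count())
        # Distinct columns: number of unique non-null values.
        for col in self.distinct_columns:
            counts[col] = int(table[col].nunique())
        counts["Rows"] = len(table)

        entry = {"Table": self.table_name}
        for name, value in counts.items():
            entry[name] = value
            # The first entry has no predecessor, so its delta is 0.
            entry[f"Δ {name}"] = value - previous[name] if previous else 0
        entry["Reason"] = reason
        entry["Configurations flag"] = configuration_flag
        self.entries.append(entry)

    def to_frame(self):
        # One row per log() call, columns in insertion order.
        return pd.DataFrame(self.entries)
```

Feeding it the bicycle example from the Usage section reproduces the counts shown in that Markdown table (for example 3 distinct `bike_id` values before filtering and 2 after).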

Documentation with examples is available; see the Documentation section below for building it locally.

Developed by Louis de Bruijn, https://louisdebruijn.com.

Installation

Install to use

Install Waterfall-logging from PyPI:

pip install waterfall-logging

Install to contribute

git clone https://github.com/LouisdeBruijn/waterfall-logging
cd waterfall-logging
python -m pip install -e .

pre-commit install --hook-type pre-commit --hook-type pre-push

Documentation

The documentation can be built and served locally with MkDocs:

mkdocs serve

Usage

Detailed instructions are provided in the documentation. A minimal pandas example:

import pandas as pd
from waterfall_logging.log import PandasWaterfall

# Four bicycle rides; bike_id 1 appears twice.
bicycle_rides = pd.DataFrame(
    data=[
        ['Shimano', 'race', 28, '2023-02-13', 1],
        ['Gazelle', 'comfort', 31, '2023-02-15', 1],
        ['Shimano', 'race', 31, '2023-02-16', 2],
        ['Batavia', 'comfort', 30, '2023-02-17', 3],
    ],
    columns=['brand', 'ride_type', 'wheel_size', 'date', 'bike_id'],
)

# Values in `columns` are counted; values in `distinct_columns` are counted once per unique value.
bicycle_rides_log = PandasWaterfall(
    table_name='rides',
    columns=['brand', 'ride_type', 'wheel_size'],
    distinct_columns=['bike_id'],
)
bicycle_rides_log.log(table=bicycle_rides, reason='Logging initial column values', configuration_flag='')

# Keep only rides with a wheel size above 30, then log the new counts.
# The callable passed to .loc receives the whole DataFrame, not a single row.
bicycle_rides = bicycle_rides.loc[lambda df: df['wheel_size'] > 30]
bicycle_rides_log.log(
    table=bicycle_rides,
    reason='Remove small wheels',
    configuration_flag='small_wheel=False',
)

print(bicycle_rides_log.to_markdown())
'''
| Table   |   brand |   Δ brand |   ride_type |   Δ ride_type |   wheel_size |   Δ wheel_size |   bike_id |   Δ bike_id |   Rows |   Δ Rows | Reason                        | Configurations flag   |
|:--------|--------:|----------:|------------:|--------------:|-------------:|---------------:|----------:|------------:|-------:|---------:|:------------------------------|:----------------------|
| rides   |       4 |         0 |           4 |             0 |            4 |              0 |         3 |           0 |      4 |        0 | Logging initial column values |                       |
| rides   |       2 |        -2 |           2 |            -2 |            2 |             -2 |         2 |          -1 |      2 |       -2 | Remove small wheels           | small_wheel=False     |
'''
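A waterfall figure shows how each filtering step shrinks a count. As an illustration only, the sketch below assumes one common layout: the first bar spans from 0 to the initial count, and each later bar floats between the previous and the current count, so its height equals the absolute Δ. `waterfall_bars` is a hypothetical helper, not part of waterfall-logging; the package's own plotting method is described in its documentation.

```python
def waterfall_bars(counts):
    """Return a (bottom, height) pair per bar for successive logged counts.

    Hypothetical helper: the first bar starts at 0, every later bar floats
    between the previous and the current count.
    """
    if not counts:
        return []
    bars = [(0, counts[0])]
    for prev, cur in zip(counts, counts[1:]):
        # The bar covers the gap between the two counts.
        bars.append((min(prev, cur), abs(cur - prev)))
    return bars


# Rows column from the table above: 4 rows, then 2 after removing small wheels.
print(waterfall_bars([4, 2]))
```

For the Rows column above this yields `[(0, 4), (2, 2)]`; each pair could be passed to matplotlib's `Axes.bar` as its `bottom` and `height` arguments to draw the figure.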
