
Releases: data-describe/data-describe

v0.1.0b3

03 Dec 23:03
00467dc
Pre-release

This release contains an overhaul of the data_summary feature and minor bug fixes.

Changes

  • Updated the contributing guide @haishiro (#377) (#368)

Features

  • Reworked data summary (see below) @haishiro (#383)
  • Added progress bar when fitting topic model @truongc2 (#393)
  • Added support for Python 3.6 @haishiro (#369)

Bug Fixes

  • Fixed backend recursion bug @haishiro (#396)
  • Removed extra cell in User Guide @zack-soenen (#394)
  • Added kwargs to text preprocessing functions: filter_dictionary, create_doc_term_matrix, and create_tfidf_matrix @truongc2 (#386)

Maintenance

  • Disabled checks on draft PRs @truongc2 (#399)
  • Updated actions/setup-python requirement to v2.1.4 @dependabot (#421)
  • Added workflow dispatch events for manual workflow triggers @haishiro (#423)
  • Bumped peaceiris/actions-gh-pages from v3.7.0-8 to v3.7.3 @dependabot (#422)
  • Added dependabot @haishiro (#416)
  • Added script to rerun notebooks in CI prior to unit tests @truongc2 (#224)

Data Summary

  1. Added an overview display (DataFrame) showing row count, column count, and size in memory.
  2. Transposed the summary table so that each data column appears as a row. This layout scales better for datasets with many columns.
  3. Improved the performance of data_summary when using the pandas backend. The previous implementation, which used pandas .agg(), was very slow even on small datasets.
  4. Added the Unique metric: the number of unique values.
  5. Reordered the metrics so they follow a more logical order of inspection.
  6. Added display options:
    • as_percentage: Formats count metrics (zeros, nulls, top frequency) as a percentage of the total row count.
    • auto_float: Applies sensible defaults when displaying floats, avoiding scientific notation and excessive precision. Set this option to False to disable the new formatting.

v0.1.0b2

14 Oct 22:12
794e889
Pre-release

This patch addresses errors related to installing data-describe.

Bug Fixes

  • Fixed backend logic when unsupported data types are given @haishiro (#347)
  • Updated setup() metadata for PyPI @haishiro (#348)
  • Resolved errors raised when the semi-optional dependencies IPython and importlib.metadata are missing @haishiro (#346)
  • Data Heatmap: Added legend label and moved to object-oriented mpl API @haishiro (#343)

Maintenance

  • Updated CI GitHub Action @haishiro (#355)
  • Added codecov.io for coverage checks @haishiro (#350)

v0.1.0b1

11 Oct 22:45
8798cfc
Compare
Choose a tag to compare
v0.1.0b1 Pre-release
Pre-release

Changes

  • Standardized or updated documentation and naming conventions @haishiro (#328)
  • Moved backend implementations back into core @haishiro (#306)
  • Improved dependency management @haishiro (#302)

Features

  • Cleaned up Docker (Resolves #176) @haishiro (#205)

Bug Fixes

  • Fixed statsmodels being required when it should be optional @haishiro (#340)
  • Fixed pyscagnostics being required when it should be optional @haishiro (#339)
  • Prevented modin import on data-describe import @haishiro (#336)
  • Fixed presidio import on data-describe import @haishiro (#334)
  • Added random_state default to topic model @haishiro (#313)
  • Updated seaborn usage for upcoming 0.12 API @haishiro (#305)

Maintenance

  • Added exclude label for Release Drafter @haishiro (#337)
  • Disabled creation of alpha docs @haishiro (#326)
  • Added local api docs build directory to gitignore @haishiro (#335)
  • Enabled PyPI release @haishiro (#327)
  • Added black to pre-commit checks @haishiro (#318)
  • Limited publish of latest docs on relevant paths @haishiro (#316)
  • Updated GitHub cache action to v2 @haishiro (#315)

v0.1.0a2

28 Sep 03:27
Pre-release

This release includes multiple changes and bug fixes for the alpha testing period.

Changes

  • sklearn requirement bumped to 0.23 @haishiro (#279)
  • seaborn requirement bumped to 0.11 to use new displot function @haishiro (#287)
  • Documentation and build workflows now trigger on release published event instead of created @haishiro (#304)

Features

  • Added more details to example notebooks @haishiro (#282)

Bug Fixes

  • Fixed data_summary when a column is entirely null @haishiro (#301)
  • Fixed data heatmap ordering @haishiro (#283)
  • Fixed correlation matrix style to be more consistent (Resolves #236) @haishiro (#277)
  • Fixed link to contributing guide (Fixes #163) @haishiro (#280)
  • General stability improvements @haishiro (#274) (#275) (#273)
  • Renamed references to data describe in the documentation for consistency with branding @haishiro (#259)

Maintenance

  • Fixed and improved auto-generated documentation @haishiro (#252)
  • Fixed PyPI release pipeline @haishiro (#253)
  • Added Release Drafter for automated release notes @haishiro (#286)
  • Simplified and updated issue templates @haishiro (#261)

v0.1.0a1

26 Aug 22:04
b590f41
Pre-release

First release for private beta testing

New Features

  • Clustering
  • Correlation
  • Data Heatmap
  • Data Summary
  • Distributions
  • Scatter plots
  • Feature importance
  • Time series analysis
  • Text preprocessing
  • Topic Modeling
  • Sensitive data (privacy)
  • Dimensionality Reduction