Releases: data-describe/data-describe
v0.1.0b3
This release contains an overhaul of the data_summary feature and minor bug fixes.
Changes
Features
- Reworked data summary (see below) @haishiro (#383)
- Added progress bar when fitting topic model @truongc2 (#393)
- Added support for Python 3.6 @haishiro (#369)
Bug Fixes
- Fixed backend recursion bug @haishiro (#396)
- Removed extra cell in User Guide @zack-soenen (#394)
- Added kwargs to text preprocessing functions: filter_dictionary, create_doc_term_matrix, and create_tfidf_matrix @truongc2 (#386)
Maintenance
- Disabled checks on draft PRs @truongc2 (#399)
- Updated actions/setup-python requirement to v2.1.4 @dependabot (#421)
- Added workflow dispatch events for manual workflow triggers @haishiro (#423)
- Bumped peaceiris/actions-gh-pages from v3.7.0-8 to v3.7.3 @dependabot (#422)
- Added dependabot @haishiro (#416)
- Added script to rerun notebooks in CI prior to unit tests @truongc2 (#224)
Data Summary
- Added a display (DataFrame) of row count, column count, and size in memory
- Transposed the summary table so that the data columns are in rows. This orientation is intended to scale better on datasets with a large number of columns.
- Improved the performance of data_summary when using the pandas backend. The prior implementation using pandas .agg() resulted in very long computation times even for small datasets.
- Added a Unique metric: the number of unique values
- Changed the ordering of metrics. The motivation is to present the metrics in a more logical order of inspection.
- Added display options:
  - as_percentage: Format count metrics (zeros, nulls, top frequency) as a percentage of the total row count instead.
  - auto_float: Applies sensible defaults when displaying floats, avoiding scientific notation and excessive precision. Set this option to False to disable the new formatting.
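The release notes do not show the new data_summary implementation. As a rough illustration of the ideas above, the sketch below builds a transposed summary (one row per data column) from column-wise pandas methods rather than a single .agg() call, with as_percentage and auto_float switches that only mirror the new options. The names overview, summarize, and render are hypothetical helpers for illustration, not data-describe's API.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: row count, column count, and size in memory."""
    return pd.DataFrame(
        {
            "Rows": [df.shape[0]],
            "Columns": [df.shape[1]],
            # deep=True includes memory held by object (e.g. string) values
            "Size in Memory (bytes)": [int(df.memory_usage(deep=True).sum())],
        }
    )


def summarize(df: pd.DataFrame, as_percentage: bool = False) -> pd.DataFrame:
    """Hypothetical helper: one row per data column, one column per metric."""
    n_rows = len(df)
    summary = pd.DataFrame(
        {
            "Data Type": df.dtypes.astype(str),
            "Nulls": df.isna().sum(),
            "Zeros": (df == 0).sum(),
            # nunique() ignores missing values by default
            "Unique": df.nunique(),
            # value_counts() sorts descending, so the first entry is the top frequency
            "Top Frequency": df.apply(
                lambda s: int(s.value_counts().iloc[0]) if s.notna().any() else 0
            ),
        }
    )
    if as_percentage and n_rows:
        # Express count metrics as a percentage of the total row count
        count_cols = ["Nulls", "Zeros", "Top Frequency"]
        summary[count_cols] = summary[count_cols] / n_rows * 100
    return summary


def render(summary: pd.DataFrame, auto_float: bool = True) -> str:
    """Hypothetical display step: fixed-point floats, no scientific notation."""
    float_format = (lambda x: f"{x:,.2f}") if auto_float else None
    return summary.to_string(float_format=float_format)


df = pd.DataFrame({"a": [0, 1, 1, None], "b": ["x", "y", "y", "y"]})
print(overview(df))
print(render(summarize(df, as_percentage=True)))
```

Computing each metric with its own vectorized method keeps the per-column work inside pandas; whether this matches the actual speedup in #383 is not stated in these notes.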
v0.1.0b2
This patch focuses on addressing errors related to installation of data-describe.
Bug Fixes
- Fixed backend logic when unsupported data types are given @haishiro (#347)
- Updated setup() metadata for PyPI @haishiro (#348)
- Resolved errors when the semi-optional IPython and importlib.metadata dependencies are missing @haishiro (#346)
- Data Heatmap: Added legend label and moved to object-oriented mpl API @haishiro (#343)
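The notes do not say how the missing-dependency errors in #346 were resolved. A common pattern for treating IPython and importlib.metadata as semi-optional is to fall back gracefully when they are absent, sketched below under that assumption. installed_version and show are hypothetical names; importlib_metadata is the PyPI backport of importlib.metadata for Python versions before 3.8.

```python
from typing import Optional

try:
    # Standard library on Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    try:
        # Backport package for older Pythons (e.g. 3.6 and 3.7)
        from importlib_metadata import PackageNotFoundError, version
    except ImportError:
        version = None

        class PackageNotFoundError(Exception):
            """Placeholder so the except clause below stays valid."""


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of dist_name, or None if it cannot be found."""
    if version is None:
        return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def show(obj) -> None:
    """Use IPython's rich display when available, else fall back to print."""
    try:
        from IPython.display import display
    except ImportError:
        print(obj)
    else:
        display(obj)
```

Importing IPython inside show, rather than at module level, means a plain Python session never needs it installed.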
Maintenance
v0.1.0b1
Changes
- Standardized or updated documentation and naming conventions @haishiro (#328)
- Moved backend implementations back into core @haishiro (#306)
- Improved dependency management @haishiro (#302)
Features
Bug Fixes
- Fixed statsmodels being required when it should be optional @haishiro (#340)
- Fixed pyscagnostics being required when it should be optional @haishiro (#339)
- Prevented modin import on data-describe import @haishiro (#336)
- Fixed presidio import on data-describe import @haishiro (#334)
- Added random_state default to topic model @haishiro (#313)
- Updated seaborn usage for upcoming 0.12 API @haishiro (#305)
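Several of these fixes (statsmodels, pyscagnostics, modin, presidio) concern keeping optional dependencies from being imported or required up front. A minimal sketch of one way to do that defers the import until a feature is actually used and raises an actionable error if the package is missing. require_optional and decompose are hypothetical helpers, not data-describe's actual code.

```python
import importlib
from types import ModuleType
from typing import Optional


def require_optional(
    module_name: str, feature: str, install_name: Optional[str] = None
) -> ModuleType:
    """Import an optional dependency only at the moment a feature needs it.

    install_name is the pip distribution name when it differs from the
    top-level module name.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        package = install_name or module_name.split(".")[0]
        raise ImportError(
            f"{feature} requires the optional dependency '{package}'. "
            f"Install it with: pip install {package}"
        ) from err


def decompose(series, period: int):
    """Hypothetical feature: statsmodels is imported only when this runs."""
    seasonal = require_optional(
        "statsmodels.tsa.seasonal", "Time series decomposition", "statsmodels"
    )
    return seasonal.seasonal_decompose(series, period=period)
```

Because nothing optional is imported at module level, importing the package itself succeeds even when statsmodels is absent; the error surfaces only if decompose is called.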
Maintenance
- Added exclude label for Release Drafter @haishiro (#337)
- Disabled creation of alpha docs @haishiro (#326)
- Added local api docs build directory to gitignore @haishiro (#335)
- Enabled PyPI release @haishiro (#327)
- Added black to pre-commit checks @haishiro (#318)
- Limited publish of latest docs on relevant paths @haishiro (#316)
- Updated github cache action to v2 @haishiro (#315)
v0.1.0a2
This release includes multiple changes and bugfixes for the alpha testing period.
Changes
- sklearn requirement bumped to 0.23 @haishiro (#279)
- seaborn requirement bumped to 0.11 to use the new displot function @haishiro (#287)
- Documentation and build workflows now trigger on the release "published" event instead of "created" @haishiro (#304)
Features
- Added more details to example notebooks @haishiro (#282)
Bug Fixes
- Fixed data_summary when a column is entirely null @haishiro (#301)
- Fixed data heatmap ordering @haishiro (#283)
- Fixed correlation matrix style to be more consistent (Resolves #236) @haishiro (#277)
- Fixed link to contributing guide (Fixes #163) @haishiro (#280)
- General improvements to stability @haishiro (#274) (#275) (#273)
- Renamed references to data describe in documentation to be more consistent with branding @haishiro (#259)
Maintenance
v0.1.0a1
First release for private beta testing
New Features
- Clustering
- Correlation
- Data Heatmap
- Data Summary
- Distributions
- Scatter plots
- Feature importance
- Time series analysis
- Text preprocessing
- Topic Modeling
- Sensitive data (privacy)
- Dimensionality Reduction