Discretize Table #143

Closed
lars-reimann opened this issue Apr 1, 2023 · 3 comments · Fixed by #327
Labels
enhancement 💡 New feature or request released Included in a release

Comments

@lars-reimann
Member

lars-reimann commented Apr 1, 2023

Is your feature request related to a problem?

Discretization replaces a continuous variable with one that takes only a finite number of values. This is a preprocessing step that we should support.

Desired solution

  • Add a class `Discretizer` in `safeds.data.tabular.transformation` that wraps the `KBinsDiscretizer` of scikit-learn
  • Make the class a subclass of `TableTransformer`
  • For now, `__init__` should only take a parameter `number_of_bins` that controls how many bins are created
  • If `number_of_bins` is less than 2, raise a `ValueError`
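
A minimal sketch of what the requested class could look like, not the implementation that was eventually merged. To stay self-contained it works on NumPy arrays and does not subclass `TableTransformer`; the `fit`/`transform` method names, the error messages, and the choice of `encode="ordinal"` and `strategy="uniform"` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer


class Discretizer:
    """Hypothetical sketch; the real class would subclass TableTransformer."""

    def __init__(self, number_of_bins: int) -> None:
        # Fewer than 2 bins cannot split a column, so reject the value early.
        if number_of_bins < 2:
            raise ValueError("Parameter 'number_of_bins' must be at least 2.")
        self._number_of_bins = number_of_bins
        self._wrapped = None  # set once fit() has been called

    def fit(self, values: np.ndarray) -> "Discretizer":
        # Assumption: ordinal encoding and equal-width bins keep the output
        # a dense array with one bin index per input value.
        wrapped = KBinsDiscretizer(
            n_bins=self._number_of_bins,
            encode="ordinal",
            strategy="uniform",
        )
        wrapped.fit(values)
        self._wrapped = wrapped
        return self

    def transform(self, values: np.ndarray) -> np.ndarray:
        if self._wrapped is None:
            raise RuntimeError("The discretizer has not been fitted yet.")
        return self._wrapped.transform(values)


# One column with six values, split into three equal-width bins.
data = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
binned = Discretizer(number_of_bins=3).fit(data).transform(data)
print(binned.ravel())
```

Validating `number_of_bins` in `__init__` rather than in `fit` means an invalid configuration fails as soon as the object is created.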
lars-reimann added the enhancement 💡 New feature or request label Apr 1, 2023
robmeth self-assigned this May 19, 2023
robmeth linked a pull request May 26, 2023 that will close this issue
@guenterk

guenterk commented Jun 9, 2023

@robmeth: Please add a comment explaining why you marked this as blocked (in today's final stand-up meeting you mentioned the failing pandasEqualsTest...).

@Marsmaennchen221
Contributor

@robmeth Use the ordinal encoding. It returns the bin index of each value rather than a one-hot encoded sparse matrix, which will also resolve the problem with the tests. See #327 (comment)
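
The difference between the two encodings can be seen with a small, illustrative comparison. The sample values and `strategy="uniform"` are assumptions chosen to make the bins easy to follow; `encode="onehot"` is the scikit-learn default for `KBinsDiscretizer`.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import KBinsDiscretizer

# One column with six values, split into three equal-width bins.
values = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])

# One-hot encoding: a SciPy sparse matrix with one column per bin.
onehot = KBinsDiscretizer(
    n_bins=3, encode="onehot", strategy="uniform"
).fit_transform(values)

# Ordinal encoding: a dense array that holds the bin index of each value.
ordinal = KBinsDiscretizer(
    n_bins=3, encode="ordinal", strategy="uniform"
).fit_transform(values)

print(sparse.issparse(onehot), onehot.shape)
print(type(ordinal).__name__, ordinal.shape)
print(ordinal.ravel())
```

With the ordinal encoding the result keeps the shape of the input column, so it can be written back into a table column directly.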

robmeth added a commit that referenced this issue Jul 7, 2023
Closes #143.

### Summary of Changes

* Added a class `Discretizer` in `safeds.data.tabular.transformation`
that wraps the [`KBinsDiscretizer` of
`scikit-learn`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html)
* Made the class a subclass of `TableTransformer`
* The `__init__` for now only has a parameter `number_of_bins` to
control how many bins are created
* If `number_of_bins` is less than 2, it raises a `ValueError`
lars-reimann pushed a commit that referenced this issue Jul 13, 2023
## [0.15.0](v0.14.0...v0.15.0) (2023-07-13)

### Features

* Add copy method for tables ([#405](#405)) ([72e87f0](72e87f0)), closes [#275](#275)
* add gaussian noise to image ([#430](#430)) ([925a505](925a505)), closes [#381](#381)
* add schema conversions when adding new rows to a table and schema conversion when creating a new table ([#432](#432)) ([6e9ff69](6e9ff69)), closes [#404](#404) [#322](#322) [#127](#127)
* add test for empty tables for the method `Table.sort_rows` ([#431](#431)) ([f94b768](f94b768)), closes [#402](#402)
* added color adjustment feature ([#409](#409)) ([2cbee36](2cbee36)), closes [#380](#380)
* added test_repr table tests ([#410](#410)) ([cb77790](cb77790)), closes [#349](#349)
* discretize table ([#327](#327)) ([5e3da8d](5e3da8d)), closes [#143](#143)
* Improve error handling of TaggedTable ([#450](#450)) ([c5da544](c5da544)), closes [#150](#150)
* Maintain tagging in methods inherited from `Table` class ([#332](#332)) ([bc73a6c](bc73a6c)), closes [#58](#58)
* new error class `OutOfBoundsError` ([#438](#438)) ([1f37e4a](1f37e4a)), closes [#262](#262)
* rename several `Table` methods for consistency ([#445](#445)) ([9954986](9954986)), closes [#439](#439)
* suggest similar columns if column gets accessed that doesn't exist ([#385](#385)) ([6a097a4](6a097a4)), closes [#203](#203)

### Bug Fixes

* added the missing ids in parameterized tests ([#412](#412)) ([dab6419](dab6419)), closes [#362](#362)
* don't warn if `Imputer` transforms column without missing values ([#448](#448)) ([f0cb6a5](f0cb6a5))
* Warnings raised by underlying seaborn and numpy libraries ([#425](#425)) ([c4143af](c4143af)), closes [#357](#357)
@lars-reimann
Member Author

🎉 This issue has been resolved in version 0.15.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

lars-reimann added the released Included in a release label Jul 13, 2023

4 participants