The homogeneous NumPy and SciPy data objects currently expected are the most efficient to process for most operations. Extensive work would also be needed to support pandas categorical types. Restricting input to homogeneous types therefore reduces maintenance cost and encourages the use of efficient data structures.
Note, however, that ColumnTransformer makes it convenient to handle heterogeneous pandas DataFrames by mapping homogeneous subsets of columns, selected by name or dtype, to dedicated scikit-learn transformers. ColumnTransformer is therefore often used as the first step of a scikit-learn pipeline when dealing with heterogeneous DataFrames (see Pipeline: chaining estimators for more details).
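As an illustration of the column routing the FAQ describes, here is a minimal sketch. The toy DataFrame, its column names (`age`, `city`), and the choice of StandardScaler, OneHotEncoder and LogisticRegression are illustrative assumptions, not anything the FAQ prescribes. Numeric columns are selected by dtype with `make_column_selector`, the string column is selected by name, and the ColumnTransformer is the first step of a Pipeline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical heterogeneous DataFrame: one integer column and one string column.
X = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "city": ["Paris", "London", "Paris", "Berlin"],
})
y = [0, 1, 0, 1]

# Map homogeneous column subsets to dedicated transformers:
# numeric columns are selected by dtype, the "city" column by name.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), make_column_selector(dtype_include="number")),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# The ColumnTransformer is the first step of the pipeline; the estimator
# downstream only ever sees a homogeneous numeric array.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression()),
])
model.fit(X, y)
predictions = model.predict(X)
```

The scaled `age` column plus three one-hot columns for `city` give four numeric features per row, so the classifier never has to deal with mixed dtypes itself.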
Our FAQ is not up to date when it comes to pandas. As of version 1.2, we have pandas-in, pandas-out support through the set_output API, following SLEP018; see https://scikit-learn.org/1.4/auto_examples/release_highlights/plot_release_highlights_1_2_0.html#pandas-output-with-set-output-api.
Also, https://scikit-learn.org/dev/install.html mentions the purpose of pandas:
We should document this better.