
Simplify code using skrub TableReport and TableVectorizer #866

@ArturoAmorQ

Description

  • Add a notebook and video showing how all the pandas code in the "Visual inspection of data" subsection can be simplified using skrub.TableReport.
  • Replace ColumnTransformer with skrub.TableVectorizer, starting from the "Using numerical and categorical variables together" notebook:
    • In the same notebook, in the "Fitting a more powerful model" section, replace OrdinalEncoder with skrub.ToCategorical.
    • Explicitly mention that TableVectorizer selects columns automatically based on their dtype.
    • Introduce the concept of "low/high cardinality" and demonstrate the effect of cardinality_threshold on the "native-country" column of the Adult Census dataset.
    • Update the "Visualizing scikit-learn pipelines" video to use TableVectorizer (with scikit-learn >= 1.8).
    • Modify the wrap-up quizzes that use the Ames Housing dataset (i.e. M1, M4 and M5) to select a subset of numerical columns with pandas.
  • Redo the dataset descriptions using TableReport.
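The dtype-based column selection and the low/high cardinality split mentioned above can be pictured with a small, stdlib-only sketch. It is not skrub's implementation: the helper route_columns, the group names and the toy table are hypothetical, and the sketch assumes the rule (as we read the skrub documentation) that string columns with strictly fewer distinct values than cardinality_threshold count as low cardinality.

```python
from typing import Dict, List, Sequence


# Hypothetical helper: a simplified illustration of dtype- and
# cardinality-based column routing, NOT skrub's TableVectorizer code.
def route_columns(
    table: Dict[str, Sequence[object]], cardinality_threshold: int = 40
) -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = {
        "numeric": [],
        "low_cardinality": [],
        "high_cardinality": [],
    }
    for name, values in table.items():
        # Ignore missing values (None) when inspecting a column.
        present = [v for v in values if v is not None]
        # Numeric columns are selected by type alone (bool is excluded
        # because it is a subclass of int in Python).
        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        if is_numeric:
            groups["numeric"].append(name)
        # Assumed rule: strictly fewer distinct values than the threshold
        # means "low cardinality".
        elif len(set(present)) < cardinality_threshold:
            groups["low_cardinality"].append(name)
        else:
            groups["high_cardinality"].append(name)
    return groups


# Made-up toy rows loosely modelled on Adult Census columns.
toy = {
    "age": [25, 38, 28, 44, 18, 34],
    "education": ["Bachelors", "HS-grad", "Bachelors", "Masters", "HS-grad", "Some-college"],
    "native-country": ["United-States", "Mexico", "United-States", "India", "Germany", "Peru"],
}

for threshold in (5, 6):
    print(threshold, route_columns(toy, cardinality_threshold=threshold))
```

In the toy table, "education" has 4 distinct values and "native-country" has 5, so a threshold of 5 sends "native-country" to the high-cardinality group while a threshold of 6 moves it to the low-cardinality group. In recent skrub releases the real knob is TableVectorizer(cardinality_threshold=...), with the low_cardinality and high_cardinality parameters choosing the encoder applied to each group.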
