### Install the sk-transformers package from PyPI 🤖

In [None]:
%%capture
! pip install sk-transformers

# sk-transformers playground 🛝


In this notebook you can try transformers from the [sk-transformers](https://github.com/chrislemke/sk-transformers) package. 

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/chrislemke/sk-transformers/blob/main/examples/playground.ipynb)

## [Datetime transformer](https://chrislemke.github.io/sk-transformers/API-reference/transformer/datetime_transformer/)

### [`DateColumnsTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/datetime_transformer/#sk_transformers.datetime_transformer.DateColumnsTransformer)

Splits a date column into multiple columns.

In [None]:
import pandas as pd
from sk_transformers import DateColumnsTransformer

X = pd.DataFrame({"foo": ["2021-01-01", "2022-02-02", "2023-03-03"]})
transformer = DateColumnsTransformer(["foo"])
transformer.fit_transform(X)

### [`DurationCalculatorTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/datetime_transformer/#sk_transformers.datetime_transformer.DurationCalculatorTransformer)

Calculates the duration between two given dates.

In [None]:
import pandas as pd
from sk_transformers import DurationCalculatorTransformer

X = pd.DataFrame(
    {
        "foo": ["1960-01-01", "1970-01-01", "1990-01-01"],
        "bar": ["1960-01-01", "1971-01-01", "1988-01-01"],
    }
)
transformer = DurationCalculatorTransformer(("foo", "bar"), "days", "foo_bar_duration")
transformer.fit_transform(X)["foo_bar_duration"].to_numpy()
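
For comparison, the same day differences can be sketched with the standard library alone. This is only an illustration, not the transformer's implementation; in particular, computing `bar - foo` (and therefore the sign of the result) is an assumption.

```python
from datetime import date

# Same sample values as in the cell above.
foo = ["1960-01-01", "1970-01-01", "1990-01-01"]
bar = ["1960-01-01", "1971-01-01", "1988-01-01"]

# Assumption: the duration is taken as `bar - foo`; the transformer's
# actual sign convention may differ.
durations_in_days = [
    (date.fromisoformat(b) - date.fromisoformat(f)).days for f, b in zip(foo, bar)
]
print(durations_in_days)  # [0, 365, -731]
```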

### [`TimestampTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/datetime_transformer/#sk_transformers.datetime_transformer.TimestampTransformer)

Transforms a date column with a specified format into a timestamp column.

In [None]:
import pandas as pd
from sk_transformers import TimestampTransformer

X = pd.DataFrame({"foo": ["1960-01-01", "1970-01-01", "1990-01-01"]})
transformer = TimestampTransformer(["foo"])
transformer.fit_transform(X).to_numpy()
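
As a rough sketch of what a date-to-timestamp conversion involves, the standard library can parse each string with a format and convert it to seconds since the Unix epoch. The format string `date_format` and the choice of UTC are assumptions made for this example; the transformer may handle time zones differently.

```python
from datetime import datetime, timezone

dates = ["1960-01-01", "1970-01-01", "1990-01-01"]
date_format = "%Y-%m-%d"  # hypothetical format, chosen to match the sample data

# Parse each string, pin it to UTC (assumption), and convert to epoch seconds.
timestamps = [
    datetime.strptime(d, date_format).replace(tzinfo=timezone.utc).timestamp()
    for d in dates
]
print(timestamps)  # [-315619200.0, 0.0, 631152000.0]
```

Dates before 1970 produce negative timestamps.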

## [Encoder transformer](https://chrislemke.github.io/sk-transformers/API-reference/transformer/encoder_transformer/)

### [`MeanEncoderTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/encoder_transformer/#sk_transformers.encoder_transformer.MeanEncoderTransformer)

Scikit-learn API for the [feature-engine MeanEncoder](https://feature-engine.readthedocs.io/en/latest/api_doc/encoding/MeanEncoder.html).

In [None]:
import pandas as pd
from sk_transformers import MeanEncoderTransformer

X = pd.DataFrame({"foo": ["a", "b", "a", "c", "b", "a", "c", "a", "b", "c"]})
y = pd.Series([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])

encoder = MeanEncoderTransformer()
encoder.fit_transform(X, y)
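
Mean encoding replaces each category with the mean of the target observed for that category. The following standard-library sketch reproduces that idea on the sample data above; it is an illustration of the technique, not the feature-engine or sk-transformers code.

```python
from collections import defaultdict

categories = ["a", "b", "a", "c", "b", "a", "c", "a", "b", "c"]
target = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Accumulate the target sum and the row count per category.
totals = defaultdict(float)
counts = defaultdict(int)
for category, value in zip(categories, target):
    totals[category] += value
    counts[category] += 1

# Mean target per category, then map every row to its category mean.
category_means = {c: totals[c] / counts[c] for c in totals}
encoded = [category_means[c] for c in categories]
print(category_means)  # a -> 0.5, b -> 2/3, c -> 1/3
```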

## [Generic transformer](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/)

### [`AggregateTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.AggregateTransformer)

This transformer uses the Pandas `groupby` and `aggregate` methods to apply a function to a column grouped by another column.
Read more about the Pandas [`aggregate`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.aggregate.html) method
to understand how to use functions for aggregation. Unlike the Pandas method, this transformer only supports functions and string names.

In [None]:
import pandas as pd
from sk_transformers import AggregateTransformer

X = pd.DataFrame(
    {
        "foo": ["mr", "mr", "ms", "ms", "ms", "mr", "mr", "mr", "mr", "ms"],
        "bar": [46, 32, 78, 48, 93, 68, 53, 38, 76, 56],
    }
)

transformer = AggregateTransformer([("foo", ("bar", "mean", "bar_mean"))])
transformer.fit_transform(X).to_numpy()
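
In plain Pandas, a grouped aggregation that is written back to every row can be sketched with `groupby(...).transform(...)`. That the result is broadcast to the original rows as a new `bar_mean` column is an assumption made for this illustration.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "foo": ["mr", "mr", "ms", "ms", "ms", "mr", "mr", "mr", "mr", "ms"],
        "bar": [46, 32, 78, 48, 93, 68, 53, 38, 76, 56],
    }
)

# Compute the mean of `bar` per `foo` group and align it with the original rows.
X["bar_mean"] = X.groupby("foo")["bar"].transform("mean")
print(X)
```

Every `ms` row receives 68.75 (275 / 4) and every `mr` row receives 313 / 6.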

### [`ColumnDropperTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.ColumnDropperTransformer)

Drops columns from a dataframe using Pandas `drop` method.

In [None]:
import pandas as pd
from sk_transformers import ColumnDropperTransformer

X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
transformer = ColumnDropperTransformer(["foo"])
transformer.fit_transform(X)

### [`ColumnEvalTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.ColumnEvalTransformer)

Applies Pandas methods, given as strings, to columns.

In [None]:
import pandas as pd
from sk_transformers import ColumnEvalTransformer

X = pd.DataFrame({"foo": ["a", "b", "c"], "bar": [1, 2, 3]})
transformer = ColumnEvalTransformer(
    [
        ("foo", "str.upper()"),
        (
            "bar",
            "swifter.apply(lambda x: x + 1)",
        ),  # swifter is optional, but it speeds up the process!
    ]
)
transformer.fit_transform(X)

### [`DtypeTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.DtypeTransformer)

Transformer that converts a column to a different dtype.

In [None]:
import numpy as np
import pandas as pd
from sk_transformers import DtypeTransformer

X = pd.DataFrame({"foo": [1, 2, 3], "bar": ["a", "a", "b"]})
transformer = DtypeTransformer([("foo", np.float32), ("bar", "category")])
transformer.fit_transform(X).dtypes

### [`FunctionsTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.FunctionsTransformer)

This transformer is a plain wrapper around the [`sklearn.preprocessing.FunctionTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html).
Its main purpose is to apply multiple functions to different columns. Unlike the scikit-learn transformer,
this transformer *does not* support the `inverse_func`, `accept_sparse`, `feature_names_out`, and `inv_kw_args` parameters.

In [None]:
import numpy as np
import pandas as pd
from sk_transformers import FunctionsTransformer

X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
transformer = FunctionsTransformer([("foo", np.log1p, None), ("bar", np.sqrt, None)])
transformer.fit_transform(X).to_numpy()

### [`MapTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.MapTransformer)

This transformer iterates over all columns in the `features` list and applies the given callback to the column. For this it uses the [`pandas.Series.map`](https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html) method.

In [None]:
import pandas as pd
from sk_transformers import MapTransformer

X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
transformer = MapTransformer([("foo", lambda x: x + 1)])
transformer.fit_transform(X).to_numpy()

### [`LeftJoinTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.LeftJoinTransformer)

Performs a database-style left join using `pd.merge`. This transformer is suitable for
replacing values in a column of a dataframe by looking them up in another `pd.DataFrame`
or `pd.Series`. Note that the join is based on the index of the right dataframe.

In [None]:
import pandas as pd
from sk_transformers import LeftJoinTransformer

X = pd.DataFrame({"foo": ["A", "B", "C", "A", "C"]})
lookup_df = pd.Series([1, 2, 3], index=["A", "B", "C"], name="values")
transformer = LeftJoinTransformer([("foo", lookup_df)])
transformer.fit_transform(X)
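
The index-based join described above can be sketched directly with `pd.merge`: the left column is matched against the index of the lookup object. This is an illustration in plain Pandas and not necessarily how the transformer calls `pd.merge` internally.

```python
import pandas as pd

X = pd.DataFrame({"foo": ["A", "B", "C", "A", "C"]})
lookup = pd.Series([1, 2, 3], index=["A", "B", "C"], name="values")

# Left join: keys come from column `foo` on the left and from the index on the right.
merged = pd.merge(X, lookup, how="left", left_on="foo", right_index=True)
print(merged)
```

Because the right side is a named `pd.Series`, its name (`values`) becomes the name of the joined column.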

### [`NaNTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.NaNTransformer)

Replaces `NaN` values with a specified value. Internally, the Pandas [`fillna`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html) method is used.

In [None]:
from sk_transformers import NaNTransformer
import pandas as pd
import numpy as np

X = pd.DataFrame({"foo": [1, np.nan, 3], "bar": ["a", np.nan, "c"]})  # np.NaN was removed in NumPy 2.0
transformer = NaNTransformer([("foo", -999), ("bar", "-999")])
transformer.fit_transform(X)

### [`QueryTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.QueryTransformer)

Applies a list of queries to a dataframe.
If it operates on a dataset used for supervised learning, this transformer should
be applied to the dataframe containing both `X` and `y`, so that removing rows by queries
also removes the corresponding `y` values.
Read more about queries [here](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html).

In [None]:
import pandas as pd
from sk_transformers import QueryTransformer

X = pd.DataFrame({"foo": [1, 8, 3, 6, 5, 4, 7, 2]})
transformer = QueryTransformer(["foo > 4"])
transformer.fit_transform(X)
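
To illustrate why the features and the target should be filtered together, here is a small sketch in plain Pandas. The labels in `y` and the column name `target` are hypothetical and only serve this example.

```python
import pandas as pd

X = pd.DataFrame({"foo": [1, 8, 3, 6, 5, 4, 7, 2]})
y = pd.Series([0, 1, 0, 1, 0, 0, 1, 0], name="target")  # hypothetical labels

# Combine features and target so that a query removes both at once.
data = X.assign(target=y)
filtered = data.query("foo > 4")

# Split again after filtering; X and y stay aligned.
X_filtered = filtered.drop(columns="target")
y_filtered = filtered["target"]
print(X_filtered["foo"].tolist(), y_filtered.tolist())  # [8, 6, 5, 7] [1, 1, 0, 1]
```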

### [`ValueIndicatorTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.ValueIndicatorTransformer)

Adds a column to a dataframe indicating whether a value is equal to a specified value.
The idea behind this transformer is that it is often useful to know whether a `NaN` value was
present in the original data and has been changed by some imputation step.
Sometimes the presence of a `NaN` value is actually important information.
However, this transformer works with any kind of data.

`NaN`, `None` and `np.nan` are **not** caught by this implementation.

In [None]:
from sk_transformers import ValueIndicatorTransformer
import pandas as pd

X = pd.DataFrame({"foo": [1, -999, 3], "bar": ["a", "-999", "c"]})
transformer = ValueIndicatorTransformer([("foo", -999), ("bar", "-999")])
transformer.fit_transform(X)
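
A minimal sketch of an equality-based indicator shows why `NaN` needs special care: `NaN` never compares equal to anything, including itself. Whether the transformer uses a plain equality check is an assumption; the sketch only illustrates the general pitfall.

```python
import math

nan = float("nan")
values = [1, -999, nan]

# An equality check flags the placeholder value ...
flagged = [v == -999 for v in values]  # [False, True, False]

# ... but it can never flag NaN, because NaN != NaN.
nan_equal = [v == nan for v in values]  # [False, False, False]

# Detecting NaN requires an explicit check such as math.isnan.
nan_flags = [isinstance(v, float) and math.isnan(v) for v in values]  # [False, False, True]
```

This is one reason to replace `NaN` with a placeholder such as `-999` first and to indicate the placeholder afterwards.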

### [`ValueReplacerTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.ValueReplacerTransformer)

Uses the Pandas `replace` method to replace values in a column. This transformer loops over the `features` and applies
`replace` to the corresponding columns. If a column is not of type string but a valid regular expression is provided,
the column is temporarily converted to a string column and, after `replace` has been applied, converted back to its
original type. This conversion back can fail if the modified column is not compatible with its original type.

In [None]:
import pandas as pd
from sk_transformers import ValueReplacerTransformer

X = pd.DataFrame(
    {"foo": ["0000-01-01", "2022/01/08", "bar", "1982-12-7", "28-09-2022"]}
)
transformer = ValueReplacerTransformer(
    [
        (
            ["foo"],
            r"^(?!(19|20)\d\d[-\/.](0[1-9]|1[012]|[1-9])[-\/.](0[1-9]|[12][0-9]|3[01]|[1-9])$).*",
            "1900-01-01",
        )
    ]
)

transformer.fit_transform(X).to_numpy()
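
The regular expression in the cell above uses a negative lookahead: it matches the whole string only if the string is *not* a date of the form `YYYY-MM-DD` (with `-`, `/` or `.` as separator and a year starting with 19 or 20). The sketch below applies the same pattern with the standard `re` module to show which values get replaced; it does not go through Pandas or the transformer.

```python
import re

pattern = r"^(?!(19|20)\d\d[-\/.](0[1-9]|1[012]|[1-9])[-\/.](0[1-9]|[12][0-9]|3[01]|[1-9])$).*"
values = ["0000-01-01", "2022/01/08", "bar", "1982-12-7", "28-09-2022"]

# Strings that fail the date check are matched entirely and replaced;
# valid dates make the lookahead fail, so they are left untouched.
cleaned = [re.sub(pattern, "1900-01-01", v) for v in values]
print(cleaned)
# ['1900-01-01', '2022/01/08', '1900-01-01', '1982-12-7', '1900-01-01']
```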

### [`AllowedValuesTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/generic_transformer/#sk_transformers.generic_transformer.AllowedValuesTransformer)

Replaces all values that are not in a list of allowed values with a replacement value.
This performs a complementary transformation to that of the `ValueReplacerTransformer`.
It is useful for lumping several minor categories together: list the major categories
to keep, and every other value is replaced.

In [None]:
import pandas as pd
from sk_transformers import AllowedValuesTransformer

X = pd.DataFrame({"foo": ["a", "b", "c", "d", "e"]})
transformer = AllowedValuesTransformer([("foo", ["a", "b"], "other")])
transformer.fit_transform(X)
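
The underlying idea can be sketched in a few lines of plain Python. The names `allowed` and `replacement` are chosen for this illustration only.

```python
values = ["a", "b", "c", "d", "e"]
allowed = {"a", "b"}      # major categories to keep
replacement = "other"     # label for everything else

# Keep allowed values and lump all remaining ones into a single category.
lumped = [v if v in allowed else replacement for v in values]
print(lumped)  # ['a', 'b', 'other', 'other', 'other']
```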

## [Number transformer](https://chrislemke.github.io/sk-transformers/API-reference/transformer/number_transformer/)

### [`MathExpressionTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/number_transformer/#sk_transformers.number_transformer.MathExpressionTransformer)

Applies a function/operation to a column and a given value or column.
The operation can be a function from NumPy's [mathematical functions](https://numpy.org/doc/stable/reference/routines.math.html#mathematical-functions) or from Python's [`operator`](https://docs.python.org/3/library/operator.html#module-operator) module.

**Warning!** Some functions/operators may not work as expected. In particular, not all NumPy functions are supported. For example,
various NumPy functions return values that do not match the size of the source column.

In [None]:
import pandas as pd
from sk_transformers import MathExpressionTransformer

X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
transformer = MathExpressionTransformer([("foo", "np.add", "bar", None)])
transformer.fit_transform(X).to_numpy()
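
The size mismatch mentioned in the warning can be seen with NumPy alone: element-wise functions keep the length of the input, while reductions collapse it to a single value. This sketch works on bare arrays and does not involve the transformer.

```python
import operator

import numpy as np

foo = np.array([1, 2, 3])
bar = np.array([4, 5, 6])

# Element-wise operations return one value per row, so they fit a column.
same_length = np.add(foo, bar)            # array([5, 7, 9])
also_same_length = operator.mul(foo, 2)   # array([2, 4, 6])

# A reduction returns a single scalar, which no longer matches the column size.
collapsed = np.sum(foo)                   # 6
print(same_length.shape, np.ndim(collapsed))  # (3,) 0
```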

## [String transformer](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/)

### [`EmailTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/#sk_transformers.string_transformer.EmailTransformer)

Transforms an email address into multiple features.

In [None]:
import pandas as pd
from sk_transformers import EmailTransformer

X = pd.DataFrame({"foo": ["person-123@test.com"]})
transformer = EmailTransformer(["foo"])
transformer.fit_transform(X)

### [`IPAddressEncoderTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/#sk_transformers.string_transformer.IPAddressEncoderTransformer)

Encodes IPv4 and IPv6 address strings to a float representation.
To shrink the values to a reasonable size, IPv4 addresses are divided by 2^10 and IPv6 addresses are divided by 2^48.
These values can be changed using the `ip4_divisor` and `ip6_divisor` parameters.

In [None]:
import pandas as pd
from sk_transformers import IPAddressEncoderTransformer

X = pd.DataFrame({"foo": ["192.168.1.1", "2001:0db8:3c4d:0015:0000:0000:1a2f:1a2b"]})
transformer = IPAddressEncoderTransformer(["foo"])
transformer.fit_transform(X).to_numpy()
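
The encoding described above can be approximated with Python's `ipaddress` module: convert the address to its integer value and divide by the divisor for its IP version. The helper `encode_ip` and the constants are hypothetical names for this sketch, and the transformer's exact numerical handling may differ.

```python
import ipaddress

IP4_DIVISOR = 2**10  # default divisor for IPv4 stated above
IP6_DIVISOR = 2**48  # default divisor for IPv6 stated above


def encode_ip(address: str) -> float:
    """Convert an IP address string to a scaled float (illustrative helper)."""
    ip = ipaddress.ip_address(address)
    divisor = IP4_DIVISOR if ip.version == 4 else IP6_DIVISOR
    return int(ip) / divisor


print(encode_ip("192.168.1.1"))  # 3232235777 / 1024
print(encode_ip("2001:0db8:3c4d:0015:0000:0000:1a2f:1a2b"))
```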

### [`PhoneTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/#sk_transformers.string_transformer.PhoneTransformer)

Transforms a phone number into multiple features.

In [None]:
import pandas as pd
from sk_transformers import PhoneTransformer

X = pd.DataFrame({"foo": ["+49123456789", "0044987654321", "3167891234"]})
transformer = PhoneTransformer(["foo"])
transformer.fit_transform(X)

### [`StringSimilarityTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/#sk_transformers.string_transformer.StringSimilarityTransformer)

Calculates the similarity between two strings using the gestalt pattern matching algorithm from the `SequenceMatcher` class of Python's `difflib` module.

In [None]:
import pandas as pd
from sk_transformers import StringSimilarityTransformer

X = pd.DataFrame(
    {
        "foo": ["abcdefgh", "ijklmnop", "qrstuvwx"],
        "bar": ["ghabcdef", "ijklmnop", "qr000000"],
    }
)
transformer = StringSimilarityTransformer(("foo", "bar"))
transformer.fit_transform(X)["foo_bar_similarity"].to_numpy()
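
For reference, the similarity ratio of `SequenceMatcher` can be computed directly with `difflib`. The sketch assumes default settings (no junk function); any normalisation the transformer applies before comparing is not covered here.

```python
from difflib import SequenceMatcher

foo = ["abcdefgh", "ijklmnop", "qrstuvwx"]
bar = ["ghabcdef", "ijklmnop", "qr000000"]

# ratio() returns 2 * M / T, where M is the number of matching characters
# and T is the total number of characters in both strings.
similarities = [SequenceMatcher(None, a, b).ratio() for a, b in zip(foo, bar)]
print(similarities)  # [0.75, 1.0, 0.25]
```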

### [`StringSlicerTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/#sk_transformers.string_transformer.StringSlicerTransformer)

Slices all entries of specified string features using the `slice()` function.

Note: The arguments for the `slice()` function are passed as a tuple. As usual in Python,
a tuple with a single element must be written with a trailing comma, e.g. `(2,)`.

In [None]:
import pandas as pd
from sk_transformers import StringSlicerTransformer

X = pd.DataFrame({"foo": ["abc", "def", "ghi"], "bar": ["jkl", "mno", "pqr"]})
transformer = StringSlicerTransformer([("foo", (1, 3)), ("bar", (2,))])
transformer.fit_transform(X)
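
The tuple handling can be illustrated with the built-in `slice()` alone. The helper `slice_string` is a hypothetical name used only in this sketch.

```python
def slice_string(value: str, slice_args: tuple) -> str:
    """Unpack the tuple into slice() and apply it to the string."""
    return value[slice(*slice_args)]


print(slice_string("abc", (1, 3)))  # 'bc'  (start=1, stop=3)
print(slice_string("jkl", (2,)))    # 'jk'  (a single argument is the stop value)

# Without the trailing comma, (2) is just the integer 2, not a tuple.
not_a_tuple = (2)
print(type(not_a_tuple).__name__)   # 'int'
```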

### [`StringSplitterTransformer`](https://chrislemke.github.io/sk-transformers/API-reference/transformer/string_transformer/#sk_transformers.string_transformer.StringSplitterTransformer)

Uses the pandas `str.split` method to split a column of strings into multiple columns.

In [None]:
import pandas as pd
from sk_transformers import StringSplitterTransformer

X = pd.DataFrame({"foo": ["a_b", "c_d", "e_f"], "bar": ["g*h*i", "j*k*l", "m*n*o"]})
transformer = StringSplitterTransformer([("foo", "_", 2), ("bar", "*", 3)])
transformer.fit_transform(X)

**This is just the beginning. We will continue adding more transformers 🤖 to this notebook. If you have any suggestions, please [let us know](https://github.com/chrislemke/sk-transformers/issues). 🙏**