# Scikit-Transformers: BoolColumnTransformer

## Imports 

Import warnings and disable warnings for this notebook.

In [1]:
import warnings
warnings.filterwarnings("ignore")

Import the data libraries.

In [2]:
import pandas as pd
import numpy as np

Uncomment the following cell to install the necessary packages.

In [3]:
#!pip install scikit-transformers

Import `get_titanic` from scikit-transformers.

In [4]:
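try:
    from sktransf import get_titanic
except ImportError as e:
    # The public import failed: report the error and the install command,
    # then fall back to the private module path.
    print(e)
    print("Please install the package using the following command:")
    print("pip install scikit-transformers")
    from sktransf._get_titanic import get_titanic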

In [5]:
from sktransf import BoolColumnTransformer

## Data

Get the data from the [Kaggle](https://www.kaggle.com/c/titanic/data) Titanic dataset.

In [6]:
df, _ = get_titanic()


Display the first few rows of the data.

In [7]:
df.head()

| | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|
| 0 | 3 | 22.0 | 1 | 0 | 7.25 |
| 1 | 1 | 38.0 | 1 | 0 | 71.2833 |
| 2 | 3 | 26.0 | 0 | 0 | 7.925 |
| 3 | 1 | 35.0 | 1 | 0 | 53.1 |
| 4 | 3 | 35.0 | 0 | 0 | 8.05 |


Add a dummy column to the data, filled at random with the two values `"A"` and `"B"`.

In [8]:
df["dummy_column"] = np.random.choice(["A", "B"], size=df.shape[0])
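Because the column is drawn at random, its contents change on every run. If reproducible values are wanted, one option is a seeded NumPy `Generator`. The sketch below is a minimal illustration on a small stand-in frame: `demo_df` and the seed `42` are assumptions made for the example, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical seed, chosen only for illustration.
rng = np.random.default_rng(42)

# Small stand-in frame; in the notebook this would be the Titanic `df`.
demo_df = pd.DataFrame({"Pclass": [3, 1, 3, 1, 3]})

# Same idea as the cell above, but drawn from the seeded generator.
demo_df["dummy_column"] = rng.choice(["A", "B"], size=demo_df.shape[0])
print(demo_df)
```

Re-creating the generator with the same seed yields the same sequence of `"A"` and `"B"` values.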

Display a random sample of 10 rows from the updated DataFrame.

In [9]:
df.sample(10)

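| | Pclass | Age | SibSp | Parch | Fare | dummy_column |
|---|---|---|---|---|---|---|
| 585 | 1 | 18.0 | 0 | 2 | 79.65 | A |
| 221 | 2 | 27.0 | 0 | 0 | 13.0 | B |
| 511 | 3 | NaN | 0 | 0 | 8.05 | A |
| 432 | 2 | 42.0 | 1 | 0 | 26.0 | A |
| 481 | 2 | NaN | 0 | 0 | 0.0 | A |
| 799 | 3 | 30.0 | 1 | 1 | 24.15 | A |
| 305 | 1 | 0.92 | 1 | 2 | 151.55 | B |
| 194 | 1 | 44.0 | 0 | 0 | 27.7208 | A |
| 592 | 3 | 47.0 | 0 | 0 | 7.25 | B |
| 247 | 2 | 24.0 | 0 | 2 | 14.5 | A |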


## Using BoolColumnTransformer

Fit and transform the data with `BoolColumnTransformer`. In the output, the two-valued `dummy_column` appears as 0 and 1.

In [10]:
clean_df = BoolColumnTransformer().fit_transform(df)
clean_df

array([[3, 22.0, 1, 0, 7.25, 0],
       [1, 38.0, 1, 0, 71.2833, 0],
       [3, 26.0, 0, 0, 7.925, 1],
       ...,
       [3, nan, 1, 2, 23.45, 1],
       [1, 26.0, 0, 0, 30.0, 0],
       [3, 32.0, 0, 0, 7.75, 1]], dtype=object)

`clean_df` is an `np.ndarray`.
To get a DataFrame back instead, pass `force_df_out=True`:

In [11]:
clean_df = BoolColumnTransformer(force_df_out=True).fit_transform(df)
clean_df

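| | Pclass | Age | SibSp | Parch | Fare | dummy_column |
|---|---|---|---|---|---|---|
| 0 | 3 | 22.0 | 1 | 0 | 7.2500 | 0 |
| 1 | 1 | 38.0 | 1 | 0 | 71.2833 | 0 |
| 2 | 3 | 26.0 | 0 | 0 | 7.9250 | 1 |
| 3 | 1 | 35.0 | 1 | 0 | 53.1000 | 0 |
| 4 | 3 | 35.0 | 0 | 0 | 8.0500 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 886 | 2 | 27.0 | 0 | 0 | 13.0000 | 0 |
| 887 | 1 | 19.0 | 0 | 0 | 30.0000 | 1 |
| 888 | 3 | NaN | 1 | 2 | 23.4500 | 1 |
| 889 | 1 | 26.0 | 0 | 0 | 30.0000 | 0 |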
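If only the array is at hand, it can also be wrapped by hand with plain pandas. This is a minimal sketch on small stand-in data rather than the Titanic frame: `original`, `arr`, and `restored` are illustrative names, and the sketch assumes the array keeps the column order and row order of the input.

```python
import numpy as np
import pandas as pd

# Stand-in for the input frame and for the object-dtype array a transformer returned.
original = pd.DataFrame({"Pclass": [3, 1, 3], "Fare": [7.25, 71.2833, 7.925]})
arr = np.array([[3, 7.25], [1, 71.2833], [3, 7.925]], dtype=object)

# Reattach the column names and index from the input frame.
restored = pd.DataFrame(arr, columns=original.columns, index=original.index)

# Every column is `object` at this point; let pandas infer tighter dtypes.
restored = restored.infer_objects()
print(restored.dtypes)
```

`infer_objects` only converts columns whose values are all of one compatible type, so mixed columns stay `object`.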


Check the data types of the new DataFrame.

In [12]:
clean_df.dtypes

Pclass            int64
Age             float64
SibSp             int64
Parch             int64
Fare            float64
dummy_column     object
dtype: object
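Note that `dummy_column` is reported as `object` even though it now holds 0 and 1. If an integer dtype is needed downstream, one option is an explicit cast. The sketch below uses a small stand-in frame (`demo` is hypothetical, not the real `clean_df`) and assumes the column contains no missing values.

```python
import pandas as pd

# Stand-in frame: an object-dtype column that holds only 0 and 1.
demo = pd.DataFrame(
    {
        "Age": [22.0, 38.0, 26.0],
        "dummy_column": pd.Series([0, 0, 1], dtype=object),
    }
)

# Cast explicitly; this raises if the column holds values that are not integers.
demo["dummy_column"] = demo["dummy_column"].astype("int64")
print(demo.dtypes)
```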

Check the number of unique values in each column of the original `df`.

In [13]:
df.nunique()

Pclass            3
Age              88
SibSp             7
Parch             7
Fare            248
dummy_column      2
dtype: int64
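The outputs above suggest that a column with exactly two distinct values, such as `dummy_column`, ends up encoded as 0 and 1. A minimal plain-pandas sketch of that idea follows. `encode_two_valued_columns` is a hypothetical helper written for illustration, not the `BoolColumnTransformer` implementation, whose actual selection and mapping rules may differ.

```python
import pandas as pd


def encode_two_valued_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: map non-numeric columns with two distinct values to 0/1."""
    out = df.copy()
    for col in out.columns:
        is_numeric = pd.api.types.is_numeric_dtype(out[col])
        if not is_numeric and out[col].nunique() == 2:
            # Assumed convention: the value that sorts first becomes 0, the other 1.
            first, second = sorted(out[col].dropna().unique())
            out[col] = out[col].map({first: 0, second: 1})
    return out


demo = pd.DataFrame({"Pclass": [3, 1, 3, 1], "dummy_column": ["A", "B", "B", "A"]})
encoded = encode_two_valued_columns(demo)
print(encoded)
```

Numeric columns are left untouched here, which matches what the notebook output shows for `Pclass`, `Age`, `SibSp`, `Parch`, and `Fare`.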