# Scikit-Transformers: DropUniqueColumnTransformer

## Imports 

Import `warnings` and silence all warnings for this notebook.

In [1]:
import warnings
warnings.filterwarnings("ignore")

Import pandas for data handling.

In [2]:
import pandas as pd

Uncomment the following cell to install the necessary packages.

In [3]:
#!pip install scikit-transformers

Import scikit-transformers

In [4]:
try:
    from sktransf import get_titanic
except ImportError as e:
    # The top-level import failed: report it, then try the private module.
    print(e)
    print("Please install the package using the following command:")
    print("pip install scikit-transformers")
    from sktransf._get_titanic import get_titanic

In [5]:
from sktransf import DropUniqueColumnTransformer

## Data

Get the data from the [Kaggle](https://www.kaggle.com/c/titanic/data) Titanic dataset.

In [6]:
df, _ = get_titanic()


Display the first few rows of the data.

In [7]:
df.head()

Unnamed: 0,Pclass,Age,SibSp,Parch,Fare
0,3,22.0,1,0,7.25
1,1,38.0,1,0,71.2833
2,3,26.0,0,0,7.925
3,1,35.0,1,0,53.1
4,3,35.0,0,0,8.05


Add a dummy column that holds the same value in every row.

In [8]:
df["dummy_column"] = "dummy"

Display the updated DataFrame.

In [9]:
df

Unnamed: 0,Pclass,Age,SibSp,Parch,Fare,dummy_column
0,3,22.0,1,0,7.2500,dummy
1,1,38.0,1,0,71.2833,dummy
2,3,26.0,0,0,7.9250,dummy
3,1,35.0,1,0,53.1000,dummy
4,3,35.0,0,0,8.0500,dummy
...,...,...,...,...,...,...
886,2,27.0,0,0,13.0000,dummy
887,1,19.0,0,0,30.0000,dummy
888,3,,1,2,23.4500,dummy
889,1,26.0,0,0,30.0000,dummy
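Before dropping anything, it can help to check which columns are actually constant. The snippet below is a minimal sketch in plain pandas; the `toy` frame is a hypothetical stand-in for the Titanic data, and the check is an illustration of the idea rather than what `sktransf` does internally.

```python
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data above.
toy = pd.DataFrame(
    {
        "Pclass": [3, 1, 3],
        "Fare": [7.25, 71.2833, 7.925],
        "dummy_column": ["dummy", "dummy", "dummy"],
    }
)

# Number of distinct values per column (NaN is not counted by default).
unique_counts = toy.nunique()

# Columns with at most one distinct value carry no information.
constant_columns = unique_counts[unique_counts <= 1].index.tolist()
print(constant_columns)
```

Running the same check on `df` should flag `dummy_column`, since every row holds the string `"dummy"`.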


## Using DropUniqueColumnTransformer

Fit and transform the data with `DropUniqueColumnTransformer`. The constant `dummy_column` is dropped from the result.

In [10]:
clean_df = DropUniqueColumnTransformer().fit_transform(df)
clean_df

array([[ 3.    , 22.    ,  1.    ,  0.    ,  7.25  ],
       [ 1.    , 38.    ,  1.    ,  0.    , 71.2833],
       [ 3.    , 26.    ,  0.    ,  0.    ,  7.925 ],
       ...,
       [ 3.    ,     nan,  1.    ,  2.    , 23.45  ],
       [ 1.    , 26.    ,  0.    ,  0.    , 30.    ],
       [ 3.    , 32.    ,  0.    ,  0.    ,  7.75  ]])

`clean_df` is a `np.ndarray`.
To get a DataFrame back instead, pass `force_df_out=True`:

In [11]:
clean_df = DropUniqueColumnTransformer(force_df_out=True).fit_transform(df)
clean_df

Unnamed: 0,Pclass,Age,SibSp,Parch,Fare
0,3,22.0,1,0,7.2500
1,1,38.0,1,0,71.2833
2,3,26.0,0,0,7.9250
3,1,35.0,1,0,53.1000
4,3,35.0,0,0,8.0500
...,...,...,...,...,...
886,2,27.0,0,0,13.0000
887,1,19.0,0,0,30.0000
888,3,,1,2,23.4500
889,1,26.0,0,0,30.0000
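To make the behaviour above concrete, here is a minimal sketch of a transformer with the same observable effect. `ConstantColumnDropper` and the `toy` frame are hypothetical names introduced for illustration. This is not the `sktransf` implementation, and the `force_df_out` flag here only mimics the parameter used above.

```python
import numpy as np
import pandas as pd


class ConstantColumnDropper:
    """Hypothetical stand-in for illustration, not the sktransf class."""

    def __init__(self, force_df_out=False):
        self.force_df_out = force_df_out

    def fit(self, X, y=None):
        # Remember the columns that hold more than one distinct value.
        self.columns_to_keep_ = [c for c in X.columns if X[c].nunique() > 1]
        return self

    def transform(self, X):
        # Apply the selection learned in fit, even to new data.
        kept = X[self.columns_to_keep_]
        return kept if self.force_df_out else kept.to_numpy()

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# Hypothetical toy frame standing in for the Titanic data.
toy = pd.DataFrame(
    {
        "Pclass": [3, 1, 3],
        "Fare": [7.25, 71.2833, 7.925],
        "dummy_column": ["dummy"] * 3,
    }
)

as_array = ConstantColumnDropper().fit_transform(toy)
as_frame = ConstantColumnDropper(force_df_out=True).fit_transform(toy)
```

Learning the columns in `fit` and reusing them in `transform` keeps the output shape stable: the same columns are dropped from later data even if they are not constant there.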
