## Data Transformation

### `ColumnTransformer`
* Applies transformers to columns of an array or pandas DataFrame.
```
class sklearn.compose.ColumnTransformer(transformers, *, remainder='drop', sparse_threshold=0.3, n_jobs=None, transformer_weights=None, verbose=False)
```
* [Docs](https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html)
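
The `remainder` parameter in the signature above decides what happens to columns that are not listed in any transformer. A minimal sketch, assuming a made-up three-column frame (the column names and values are illustrative and not taken from the dataset used later):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; only 'age' is listed in a transformer.
toy = pd.DataFrame({
    "age": [20.0, 30.0, 40.0],
    "height": [150.0, 160.0, 170.0],
    "weight": [50.0, 60.0, 70.0],
})

# Default remainder='drop': columns not listed are discarded.
dropped = ColumnTransformer(
    [("scale", StandardScaler(), ["age"])]
).fit_transform(toy)

# remainder='passthrough': columns not listed are appended unchanged.
kept = ColumnTransformer(
    [("scale", StandardScaler(), ["age"])],
    remainder="passthrough",
).fit_transform(toy)

print(dropped.shape)  # (3, 1)
print(kept.shape)     # (3, 3)
```

With the default, any column you forget to list silently disappears from the output, so it is worth checking the output shape.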

> **OR**

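### `make_column_transformer`
* Builds a `ColumnTransformer` from `(transformer, columns)` pairs; step names are generated automatically from the transformer class names.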
```
sklearn.compose.make_column_transformer(*transformers, remainder='drop', sparse_threshold=0.3, n_jobs=None, verbose=False)
```
* [Docs](https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_transformer.html)
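
To make the relationship between the two constructors concrete, the sketch below builds the same transformer both ways on a small made-up frame and checks that the results agree. The toy data is an assumption for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 35, 26, 27],
})

# Explicit form: each step is a (name, transformer, columns) triple.
explicit = ColumnTransformer([
    ("standardscaler", StandardScaler(), ["Age"]),
    ("onehotencoder", OneHotEncoder(), ["Gender"]),
])

# Shorthand form: (transformer, columns) pairs, names are generated.
shorthand = make_column_transformer(
    (StandardScaler(), ["Age"]),
    (OneHotEncoder(), ["Gender"]),
)

step_names = [name for name, _, _ in shorthand.transformers]
print(step_names)  # ['standardscaler', 'onehotencoder']

same = np.allclose(explicit.fit_transform(toy), shorthand.fit_transform(toy))
print(same)  # True
```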


### Example:
> Let's transform the pandas DataFrame loaded from `Social_Network_Ads.csv`. Say we want to encode **Gender** with `OneHotEncoder()` and scale the **EstimatedSalary**, **Age** and **Purchased** columns with `StandardScaler()`. We can do it as follows.

```
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
```

```
dataframe = pd.read_csv('Social_Network_Ads.csv')
dataframe.head()
```

|   | User ID | Gender | Age | EstimatedSalary | Purchased |
|---|---|---|---|---|---|
| 0 | 15624510 | Male | 19 | 19000 | 0 |
| 1 | 15810944 | Male | 35 | 20000 | 0 |
| 2 | 15668575 | Female | 26 | 43000 | 0 |
| 3 | 15603246 | Female | 27 | 57000 | 0 |
| 4 | 15804002 | Male | 19 | 76000 | 0 |


```
column_transfomer = ColumnTransformer(
    [
        ("standardscaler", StandardScaler(), ['Age', 'EstimatedSalary', 'Purchased']),
        ("onehotencoder", OneHotEncoder(), ['Gender']),
    ]
)
column_transfomer
```

```
ColumnTransformer(transformers=[('standardscaler', StandardScaler(),
                                 ['Age', 'EstimatedSalary', 'Purchased']),
                                ('onehotencoder', OneHotEncoder(), ['Gender'])])
```

> **OR**

```
column_transfomer2 = make_column_transformer(
    (StandardScaler(), ['Age', 'EstimatedSalary', 'Purchased']),
    (OneHotEncoder(), ['Gender'])
)
column_transfomer2
```

```
ColumnTransformer(transformers=[('standardscaler', StandardScaler(),
                                 ['Age', 'EstimatedSalary', 'Purchased']),
                                ('onehotencoder', OneHotEncoder(), ['Gender'])])
```

### Fitting and transforming the data

```
newDF = column_transfomer2.fit_transform(dataframe)
newDF
```

```
array([[-1.78179743, -1.49004624, -0.74593581,  0.        ,  1.        ],
       [-0.25358736, -1.46068138, -0.74593581,  0.        ,  1.        ],
       [-1.11320552, -0.78528968, -0.74593581,  1.        ,  0.        ],
       ...,
       [ 1.17910958, -1.46068138,  1.34059793,  1.        ,  0.        ],
       [-0.15807423, -1.07893824, -0.74593581,  0.        ,  1.        ],
       [ 1.08359645, -0.99084367,  1.34059793,  1.        ,  0.        ]])
```

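> **Note** - In practice, split the data into training and test sets first, fit the transformer on the training set only, and then transform both sets with it. This keeps test-set statistics from leaking into the scaling. **Example:**

```
from sklearn.model_selection import train_test_split

# test_size and random_state are illustrative choices
X_train, X_test = train_test_split(dataframe, test_size=0.25, random_state=0)

# Learn the scaling statistics and categories from the training set only
column_transfomer.fit(X_train)

X_train_transformed = column_transfomer.transform(X_train)
X_test_transformed = column_transfomer.transform(X_test)
```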
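
To see what "fit on train, transform on test" means numerically, here is a small sketch with hand-made frames (the values are assumptions chosen for easy arithmetic). It checks that a test value is scaled with the mean and standard deviation of the training data, not its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical hand-made split
train = pd.DataFrame({"Age": [20.0, 30.0, 40.0]})
test = pd.DataFrame({"Age": [50.0]})

ct = ColumnTransformer([("scale", StandardScaler(), ["Age"])])
ct.fit(train)  # statistics come from the training rows only

train_mean = train["Age"].mean()      # 30.0
train_std = train["Age"].std(ddof=0)  # population std, as StandardScaler uses
expected = (50.0 - train_mean) / train_std

test_scaled = ct.transform(test)
print(np.isclose(test_scaled[0, 0], expected))  # True
```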

**By default, the transformer returns a NumPy array rather than a DataFrame, or a SciPy sparse matrix when the combined output is sparse enough (see `sparse_threshold`).**
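
One caveat about the return type: when a step such as `OneHotEncoder` produces sparse output and the overall density falls below `sparse_threshold` (0.3 by default), the result is a SciPy sparse matrix rather than a dense array. A small sketch with a made-up `city` column, which is an assumption for illustration:

```python
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column with ten distinct categories: one-hot density is 1/10.
toy = pd.DataFrame({"city": [f"city_{i}" for i in range(10)]})

default_ct = ColumnTransformer([("onehot", OneHotEncoder(), ["city"])])
dense_ct = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["city"])],
    sparse_threshold=0,  # 0 means: always return a dense array
)

sparse_out = default_ct.fit_transform(toy)
dense_out = dense_ct.fit_transform(toy)

print(sparse.issparse(sparse_out))  # True: density 0.1 is below the default 0.3
print(sparse.issparse(dense_out))   # False
```

In the example dataset the scaled numeric columns are almost entirely non-zero, so the density stays above the threshold and the result is a dense array.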