# Feature Transformations In The Iris Dataset

This notebook illustrates transforming features in the Iris dataset. It is an adaptation of the scikit-learn example [Concatenating multiple feature extraction methods](http://scikit-learn.org/stable/auto_examples/plot_feature_stacker.html#sphx-glr-auto-examples-plot-feature-stacker-py).

Its main point is that Ibex produces ``pandas.DataFrame`` structures with meaningful column headings.

## Loading The Data

First we load the dataset into a ``pandas.DataFrame``.

In [1]:
import pandas as pd                                                          
import numpy as np                                                           
from ibex.sklearn import datasets                                            
from ibex.sklearn.decomposition import PCA as PDPCA                          
from ibex.sklearn.feature_selection import SelectKBest as PDSelectKBest      

In [2]:
iris = datasets.load_iris()                                                  
features = iris['feature_names']                                             
iris = pd.DataFrame(                                                         
    np.c_[iris['data'], iris['target']],                                     
    columns=features+['class'])                                              

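The loading step can be checked with a minimal sketch. It assumes plain scikit-learn's `load_iris` in place of `ibex.sklearn.datasets` (a substitution made here so the snippet depends only on scikit-learn, pandas, and NumPy), and the name `raw` is introduced only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris  # assumption: plain scikit-learn instead of ibex.sklearn

raw = load_iris()
features = list(raw['feature_names'])

# np.c_ stacks the (150, 4) feature matrix and the (150,) target vector
# column-wise, giving a single (150, 5) array.
iris = pd.DataFrame(
    np.c_[raw['data'], raw['target']],
    columns=features + ['class'])

print(iris.shape)
print(iris.columns.tolist())
# Stacking integers next to floats upcasts them, so 'class' is stored as float.
print(iris['class'].dtype)
```

Because the stacked array has a single dtype, the class labels end up as floating-point values rather than integers.
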
## Transforming Features

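Now that the data is in a ``DataFrame``, we can transform the features. The ``+`` operator combines the two transformers, playing the role that ``FeatureUnion`` plays in the original example. Notice how, in the resulting ``DataFrame``, the column headers indicate the meaning of the (numerical) features.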

In [3]:
trn = PDPCA(n_components=2) + PDSelectKBest(k=1)                             
trn.fit(iris[features], iris['class']).transform(iris[features]).head()

| | pca | pca | selectkbest |
|---|---|---|---|
| | comp_0 | comp_1 | petal length (cm) |
| 0 | -2.684207 | 0.326607 | 1.4 |
| 1 | -2.715391 | -0.169557 | 1.4 |
| 2 | -2.88982 | -0.137346 | 1.3 |
| 3 | -2.746437 | -0.311124 | 1.5 |
| 4 | -2.728593 | 0.333925 | 1.4 |

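The two-level column headers in the output above can be used for selection. The following is a small sketch using pandas alone: it rebuilds the five displayed rows by hand (the variable names `columns`, `transformed`, `pca_part`, and `selected` are illustrative, not part of Ibex) and shows how the transformer name and the feature name address the columns.

```python
import pandas as pd

# Two-level column labels, as in the output above: the first level names
# the transformer, the second names the feature it produced.
columns = pd.MultiIndex.from_tuples([
    ('pca', 'comp_0'),
    ('pca', 'comp_1'),
    ('selectkbest', 'petal length (cm)'),
])

# The first five rows of the transformed data, copied from the output above.
transformed = pd.DataFrame(
    [[-2.684207,  0.326607, 1.4],
     [-2.715391, -0.169557, 1.4],
     [-2.889820, -0.137346, 1.3],
     [-2.746437, -0.311124, 1.5],
     [-2.728593,  0.333925, 1.4]],
    columns=columns)

# Selecting by the top level returns every column that transformer produced.
pca_part = transformed['pca']
print(pca_part.columns.tolist())

# A (transformer, feature) tuple selects a single column.
selected = transformed[('selectkbest', 'petal length (cm)')]
print(selected.tolist())
```

Selecting on the first level drops it from the result, so `pca_part` has the plain columns `comp_0` and `comp_1`.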

## Comparison To The Original Example

The following is the example in its original form.

In [4]:
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

iris = load_iris()
X, y = iris.data, iris.target
pca = PCA(n_components=2)
selection = SelectKBest(k=1)
combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])
combined_features.fit(X, y).transform(X)

array([[-2.68420713,  0.32660731,  1.4       ],
       [-2.71539062, -0.16955685,  1.4       ],
       [-2.88981954, -0.13734561,  1.3       ],
       [-2.7464372 , -0.31112432,  1.5       ],
       [-2.72859298,  0.33392456,  1.4       ],
       [-2.27989736,  0.74778271,  1.7       ],
       [-2.82089068, -0.08210451,  1.4       ],
       [-2.62648199,  0.17040535,  1.5       ],
       [-2.88795857, -0.57079803,  1.4       ],
       [-2.67384469, -0.1066917 ,  1.5       ],
       [-2.50652679,  0.65193501,  1.5       ],
       [-2.61314272,  0.02152063,  1.6       ],
       [-2.78743398, -0.22774019,  1.4       ],
       [-3.22520045, -0.50327991,  1.1       ],
       [-2.64354322,  1.1861949 ,  1.2       ],
       [-2.38386932,  1.34475434,  1.5       ],
       [-2.6225262 ,  0.81808967,  1.3       ],
       [-2.64832273,  0.31913667,  1.4       ],
       [-2.19907796,  0.87924409,  1.7       ],
       [-2.58734619,  0.52047364,  1.5       ],
       [-2.3105317 ,  0.39786782,  1.7       ],
       ...

Note that the result is a plain NumPy array, so, as is usual for such arrays, the meaning of each column is lost.
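
For comparison, the labels can be rebuilt by hand on top of plain scikit-learn. This is only a sketch of the bookkeeping involved, not a description of how Ibex works internally: the `comp_<i>` names are an assumption chosen to mirror the Ibex output shown earlier, and `fitted`, `kept`, and `labeled` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

iris = load_iris()
X, y = iris.data, iris.target

combined_features = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("univ_select", SelectKBest(k=1)),
])
transformed = combined_features.fit_transform(X, y)

# Look up the fitted transformers by the names given to the union.
fitted = dict(combined_features.transformer_list)

# Assumption: "comp_<i>" is a naming choice made here to mirror the Ibex output.
pca_labels = [("pca", "comp_%d" % i)
              for i in range(fitted["pca"].n_components_)]

# get_support() returns a boolean mask over the input columns that were kept.
mask = fitted["univ_select"].get_support()
kept = [name for name, keep in zip(iris.feature_names, mask) if keep]
selection_labels = [("univ_select", name) for name in kept]

labeled = pd.DataFrame(
    transformed,
    columns=pd.MultiIndex.from_tuples(pca_labels + selection_labels))
print(labeled.head())
```

Each transformer needs its own rule for naming its outputs, which is the manual work the Ibex version avoids.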