Feature Generation
==========

We define feature vector generation (FVG) as a function $f$ that takes one or more vectors $x_1, x_2, \ldots$, all of the same length, and outputs one or more vectors of that same length.
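As a rough illustration of this definition, the sketch below writes two FVG functions in plain NumPy. The names `hinge_pair`, `product` and `knot` are hypothetical and chosen only for this example; returning the output vectors as a tuple is likewise an assumption, not something the definition requires.

```python
import numpy as np

def hinge_pair(x, knot=0.0):
    """FVG with one input vector and two output vectors of the same length."""
    return np.maximum(x - knot, 0), np.maximum(knot - x, 0)

def product(x1, x2):
    """FVG with two input vectors and one output vector of the same length."""
    if x1.shape != x2.shape:
        raise ValueError("inputs must have the same length")
    return (x1 * x2,)

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([4.0, 5.0, 6.0])

left, right = hinge_pair(x1, knot=2.0)
(prod,) = product(x1, x2)
print(left, right, prod)
```

Every output has the same length as the inputs, which is the only constraint the definition places on $f$.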

We define feature matrix generation (FMG) as a function $g$ that takes a matrix $X$, a feature vector generation function $f$, and column indices $d_1, d_2, \ldots$, and appends the output of $f$ to $X$ as new columns:

$g(X, f, d_1, d_2, \ldots) = \left[ X \mid f(X_{d_1}, X_{d_2}, \ldots) \right]$
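A minimal NumPy sketch of $g$ follows, assuming that $X_{d}$ denotes column $d$ of $X$ and that $f$ returns its output vectors as a tuple. The function name `feature_matrix_generation` and the example `product` function are hypothetical.

```python
import numpy as np

def feature_matrix_generation(X, f, *dims):
    """Append the vectors returned by f(X[:, d_1], X[:, d_2], ...) to X as new columns."""
    new_cols = f(*(X[:, d] for d in dims))
    # column_stack keeps the 2-D matrix as is and turns each 1-D vector into a column
    return np.column_stack([X, *new_cols])

def product(x1, x2):
    """Example FVG: elementwise product of two columns."""
    return (x1 * x2,)

X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])

X_out = feature_matrix_generation(X_demo, product, 0, 1)
print(X_out)
```

The first two columns of `X_out` are the original matrix and the third is the generated feature, matching the block form $\left[ X \mid f(\cdot) \right]$.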

To form a quasi-group, we additionally require that all feature matrix generation functions:

1.  Are idempotent: $g(g(X)) = g(X)$
2.  Have an identity
3.  Are invertible
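One possible reading of requirements 2 and 3 is sketched below, under two assumptions that the text does not spell out: the identity is the $g$ built from an FVG that returns no vectors, and the inverse simply drops the appended columns. The names `no_features`, `square` and `undo` are hypothetical.

```python
import numpy as np

def g(X, f, *dims):
    """Feature matrix generation: append the outputs of f to X as columns."""
    new_cols = f(*(X[:, d] for d in dims))
    return np.column_stack([X, *new_cols])

def no_features(*vectors):
    """FVG that returns no vectors, so g leaves X unchanged (assumed identity)."""
    return ()

def square(x):
    """Example FVG: elementwise square of one column."""
    return (x ** 2,)

def undo(X_aug, n_original):
    """Assumed inverse: keep only the first n_original columns."""
    return X_aug[:, :n_original]

X_demo = np.arange(6.0).reshape(3, 2)

same = g(X_demo, no_features, 0)
restored = undo(g(X_demo, square, 1), X_demo.shape[1])
print(np.array_equal(same, X_demo), np.array_equal(restored, X_demo))
```

The inverse needs to know how many columns were original, so in practice that count would have to be stored alongside the transform.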


In [47]:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn import datasets

import numpy as np
import pandas as pd

from sklearn.pipeline import Pipeline, FeatureUnion

iris = datasets.load_iris()
X = iris.data
y = iris.target

In [43]:
class Identity(BaseEstimator, TransformerMixin):
    """
    Pass-through transformer: returns its input unchanged.

    Takes no parameters.
    """
    
    def __init__(self):
        pass
    
    def fit(self, x, y=None):
        return self

    def transform(self, x):
        return x

In [42]:
class HingeCreate(BaseEstimator, TransformerMixin):
    """
    This class is an example of a "feature vector generation" function.

    Parameters
    ----------

    mask : the index of the single column to apply the hinge transform to
    hinge : the value at which the hinge appears
    """
    
    def __init__(self, mask=0, hinge=0):
        self.mask = mask
        self.hinge = hinge
    
    def fit(self, x, y=None):
        """
        No-op for now: the hinge value is fixed at construction time.
        """
        # TODO: optionally learn the hinge value here (e.g. via Bayesian optimisation)
        # self.hinge = updated_val
        return self
    
    def transform(self, X_as_matrix):
        # index with a list so the selected column stays 2-D, shape (n_samples, 1)
        x_col = X_as_matrix[:, [self.mask]]

        # np.maximum clips elementwise at 0; np.max(a, 0) would reduce along axis 0 instead
        x_left = np.maximum(x_col - self.hinge, 0)
        x_right = np.maximum(self.hinge - x_col, 0)
        return np.hstack([x_left, x_right])

In [49]:
# this is our "g"
def HingeTransform(mask=0, hinge=3):
    hinge_transform = FeatureUnion([
        ('identity', Identity()),
        ('hinge', HingeCreate(mask, hinge))
    ])
    return hinge_transform
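The following is a self-contained usage sketch of the same idea on the iris data. So that it runs on its own, it swaps in scikit-learn's `FunctionTransformer` as a stand-in for the `Identity` and `HingeCreate` classes; the helper names `hinge_pair` and `make_hinge_union`, and the choice of column 1 with a hinge at 3.0, are assumptions made only for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer

def hinge_pair(X, mask=0, hinge=0.0):
    """Return the two hinge columns max(x - hinge, 0) and max(hinge - x, 0)."""
    col = X[:, [mask]]  # shape (n_samples, 1)
    return np.hstack([np.maximum(col - hinge, 0), np.maximum(hinge - col, 0)])

def make_hinge_union(mask=0, hinge=3.0):
    """Build [X | hinge features] as a FeatureUnion of two transformers."""
    return FeatureUnion([
        # FunctionTransformer with no func passes the input through unchanged
        ("identity", FunctionTransformer()),
        ("hinge", FunctionTransformer(hinge_pair,
                                      kw_args={"mask": mask, "hinge": hinge})),
    ])

X_iris = datasets.load_iris().data
X_aug = make_hinge_union(mask=1, hinge=3.0).fit_transform(X_iris)
print(X_iris.shape, X_aug.shape)
```

The output matrix keeps the four original columns and gains two hinge columns, one for each side of the hinge.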