# Visual Pipelines 

This notebook demonstrates a proof of concept for a visual pipeline for analytics. 

![Yellowbrick Prototype Pipeline Objects](figures/pipeline_prototype.png)

In [14]:
%matplotlib inline

import os
import sys 

# Modify the path 
sys.path.append("..")

import pandas as pd
import yellowbrick as yb 
import matplotlib as mpl 
import matplotlib.pyplot as plt 

## Load Datasets 

If the datasets do not exist, see the `download.py` script located in this directory.

In [2]:
FIXTURES  = os.path.join(os.getcwd(), "data")
credit    = pd.read_excel(os.path.join(FIXTURES, "credit.xls"), header=1)
concrete  = pd.read_excel(os.path.join(FIXTURES, "concrete.xls"))
occupancy = pd.read_csv(os.path.join(FIXTURES, "occupancy", "datatraining.txt"))
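
The loading cell fails with a file-not-found error when a fixture is absent, so a small pre-flight check may make the problem easier to spot. This is only a sketch: the helper name `missing_fixtures` is hypothetical, and the relative paths simply mirror the ones read above.

```python
import os


def missing_fixtures(root, relpaths):
    """Return the relative paths under `root` that do not exist on disk."""
    return [p for p in relpaths if not os.path.exists(os.path.join(root, p))]


# Paths mirror those read in the loading cell (assumed layout under ./data).
EXPECTED = [
    "credit.xls",
    "concrete.xls",
    os.path.join("occupancy", "datatraining.txt"),
]

missing = missing_fixtures(os.path.join(os.getcwd(), "data"), EXPECTED)
if missing:
    print("Missing datasets (see download.py):", ", ".join(missing))
```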

In [3]:
# Rename the columns of the datasets for ease of use. 
credit.columns = [
    'id', 'limit', 'sex', 'edu', 'married', 'age', 'apr_delay', 'may_delay',
    'jun_delay', 'jul_delay', 'aug_delay', 'sep_delay', 'apr_bill', 'may_bill',
    'jun_bill', 'jul_bill', 'aug_bill', 'sep_bill', 'apr_pay', 'may_pay', 'jun_pay',
    'jul_pay', 'aug_pay', 'sep_pay', 'default'
]

concrete.columns = [
    'cement', 'slag', 'ash', 'water', 'splast',
    'coarse', 'fine', 'age', 'strength'
]

occupancy.columns = [
    'date', 'temp', 'humid', 'light', 'co2', 'hratio', 'occupied'
]


In [12]:
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler 

model = Pipeline([
    ('scale', StandardScaler()), 
    ('model', LinearSVC())
])

model.steps

[('scale', StandardScaler(copy=True, with_mean=True, with_std=True)),
 ('model', LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
       intercept_scaling=1, loss='squared_hinge', max_iter=1000,
       multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
       verbose=0))]

## Evaluation Visualization Prototype

In [5]:
from sklearn.pipeline import Pipeline
from sklearn.base import BaseEstimator, TransformerMixin


class VisualPipeline(Pipeline):
    
    def draw(self):
        """
        Calls the draw method on every step that defines one; steps
        without a draw method are skipped.
        """
        for name, estimator in self.steps:
            try:
                estimator.draw()
            except AttributeError:
                continue 

                
class ClassifierEvaluation(object):
    
    def draw(self):
        yb.crplot()
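
To illustrate how `VisualPipeline.draw` dispatches over the steps, here is a minimal sketch. `RecordingVisualizer` is a hypothetical stand-in that counts `draw` calls instead of plotting, and the prototype class is repeated only so that the block runs on its own.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class VisualPipeline(Pipeline):
    """Same prototype as above, repeated so this block is self-contained."""

    def draw(self):
        for name, estimator in self.steps:
            try:
                estimator.draw()
            except AttributeError:
                continue


class RecordingVisualizer(BaseEstimator, TransformerMixin):
    """Hypothetical pass-through step that counts draw() calls."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

    def draw(self):
        self.draw_calls_ = getattr(self, "draw_calls_", 0) + 1


viz = RecordingVisualizer()
pipeline = VisualPipeline([
    ("scale", StandardScaler()),  # no draw(): skipped
    ("viz", viz),                 # has draw(): called
])
pipeline.draw()
print(viz.draw_calls_)
```

One thing to keep in mind with this design: `except AttributeError` also hides an `AttributeError` raised inside a step's own `draw`, so a `hasattr(estimator, "draw")` check might be a safer alternative.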

## `VisualizerMixin` class

Lives in `base.py` at the root of `yellowbrick`.

The intent is that visualizers extend Scikit-Learn's `BaseEstimator` and `TransformerMixin` as well as our `VisualizerMixin`, which gives them the following required methods:

 - `fit`    
 - `draw`    
 - `fit_draw`    

The idea is that `fit` is passed X data, and optionally y data, and prepares the data for drawing; `draw` then actually performs the drawing.

The `__init__` method should take styling arguments: size, color, whether or not to save to a file, markers, line styles, etc.

NOTE: check whether `TransformerMixin` extends `BaseEstimator`, and if it does, then also subclass `VisualizerMixin` from `BaseEstimator`, which lets us use `set_params` and `get_params` on rendering variables.
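
The check in the note, and the effect of inheriting from `BaseEstimator`, can be sketched as follows. `StyledVisualizer` and its styling arguments are hypothetical names chosen for illustration; only `get_params` and `set_params` come from Scikit-Learn.

```python
from sklearn.base import BaseEstimator, TransformerMixin

# Prints False: TransformerMixin is a plain mixin and does not derive
# from BaseEstimator.
print(issubclass(TransformerMixin, BaseEstimator))


class StyledVisualizer(BaseEstimator):
    """Hypothetical visualizer whose constructor only stores styling arguments."""

    def __init__(self, size=(8, 6), color="steelblue", marker="o"):
        self.size = size
        self.color = color
        self.marker = marker


viz = StyledVisualizer()
print(viz.get_params())          # styling arguments as a dict
viz.set_params(color="crimson")  # update a rendering variable in place
print(viz.color)
```

Because `get_params` introspects the `__init__` signature, this only works when every styling argument is an explicit keyword parameter stored under the same attribute name.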

In [7]:
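class VisualizerMixin(BaseEstimator, TransformerMixin):

    def __init__(self):
        # Styling arguments (size, color, ...) will be accepted here.
        pass

    def fit(self, X, y=None):
        # Prepare the data for drawing.
        pass

    def draw(self):
        # Render the data prepared by fit.
        pass

    def fit_draw(self, X, y=None):
        # Fit, then draw.
        pass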
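
To make the `fit` / `draw` / `fit_draw` split concrete, here is one possible sketch. `HistogramVisualizer` is a hypothetical example that, for brevity, extends only `BaseEstimator`; the styling arguments `bins` and `color` are assumptions as well.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside the notebook
import matplotlib.pyplot as plt
from sklearn.base import BaseEstimator


class HistogramVisualizer(BaseEstimator):
    """Hypothetical visualizer: fit() prepares the data, draw() renders it."""

    def __init__(self, bins=5, color="steelblue"):
        # Styling arguments only; no data is touched here.
        self.bins = bins
        self.color = color

    def fit(self, X, y=None):
        # Prepare the data for drawing: keep the first feature of each row.
        self.values_ = [row[0] for row in X]
        return self

    def draw(self):
        # Render the prepared data and return the axes for further tweaking.
        fig, ax = plt.subplots()
        ax.hist(self.values_, bins=self.bins, color=self.color)
        return ax

    def fit_draw(self, X, y=None):
        # Convenience method: fit, then draw.
        return self.fit(X, y).draw()


viz = HistogramVisualizer(bins=3)
ax = viz.fit_draw([[1.0], [2.0], [2.5], [4.0]])
```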

## `FeatureVisualizer` base class

Comes between transformers, or at the end of them, but before the estimator.

Methods:
 - `fit`    
 - `transform`    
 - `draw`    
 - `fit_draw`    
 - `fit_transform` (alias for `fit_draw`)    
 
Needs to extend `BaseEstimator`, `TransformerMixin`, and `VisualizerMixin`.

In [9]:
class FeatureVisualizer(VisualizerMixin):

    def __init__(self):
        pass

    def fit(self, X, y=None):
        pass

    def transform(self, X):
        pass

    def draw(self):
        pass

    def fit_draw(self, X, y=None):
        pass

    def fit_transform(self, X, y=None):
        # Alias for fit_draw.
        pass
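
One way the methods listed above could fit together is sketched below. `PassThroughFeatureVisualizer` is hypothetical: it assumes a feature visualizer should leave the data unchanged, `draw` returns the state it would plot rather than plotting, and the final estimator is omitted to keep the example short.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class PassThroughFeatureVisualizer(BaseEstimator, TransformerMixin):
    """Hypothetical feature visualizer that observes X and returns it unchanged."""

    def fit(self, X, y=None):
        # Record what a real draw() would need; here, per-feature means.
        self.feature_means_ = np.asarray(X).mean(axis=0)
        return self

    def transform(self, X):
        # Assumption: the visualizer does not alter the data passed downstream.
        return X

    def draw(self):
        # Placeholder for plotting; returns the state that would be rendered.
        return self.feature_means_

    def fit_draw(self, X, y=None):
        self.fit(X, y).draw()
        return self.transform(X)

    def fit_transform(self, X, y=None, **fit_params):
        # Alias for fit_draw, as in the method list above.
        return self.fit_draw(X, y)


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("viz", PassThroughFeatureVisualizer()),
])
Xt = pipe.fit_transform(X)
```

Since the visualizer sits after the scaler, it sees the scaled features, which is why the recorded means are close to zero here.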

## `ScoreVisualizer` class for estimators

Base class to follow an estimator in a pipeline. Not a transformer. Extends `BaseEstimator`, `VisualizerMixin`, `ScoringMixin`. 

Methods:
 - `fit`   
 - `predict`    
 - `score`    
 - `draw`    

Need to create a `ScoringMixin` whose scoring function saves the state of the scoring on the object so that we can draw it.

Need to think about how we'll access the model from `ScoreVisualizer`. The best option is to instantiate it with a model, which makes sense because we would call `fit` down into the estimator. The trick is that the visual pipeline needs to know not to run `fit` twice.

In [11]:
class ScoringMixin():

    def __init__(self):
        pass

    def score(self, X, y):
        pass

class ScoreVisualizer(VisualizerMixin, ScoringMixin):

    def __init__(self):
        pass

    def fit(self, X, y=None):
        pass

    def predict(self, X):
        pass

    def score(self, X, y):
        pass

    def draw(self):
        pass
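
The ideas above (a scoring mixin that saves its state, a visualizer instantiated with the model, and a guard against fitting twice) might look something like the following sketch. The names `WrappedScoreVisualizer` and `is_fitted` are hypothetical, and the flag is just one possible way to avoid a second `fit`; `DummyClassifier` stands in for a real model.

```python
from sklearn.base import BaseEstimator
from sklearn.dummy import DummyClassifier


class ScoringMixin:
    """Hypothetical mixin: score() stores its result so draw() can use it."""

    def score(self, X, y):
        self.score_ = self.model.score(X, y)
        return self.score_


class WrappedScoreVisualizer(ScoringMixin, BaseEstimator):
    """Hypothetical score visualizer instantiated with the model it wraps."""

    def __init__(self, model, is_fitted=False):
        self.model = model
        self.is_fitted = is_fitted

    def fit(self, X, y=None):
        # Skip fitting when the caller says the model is already fitted.
        if not self.is_fitted:
            self.model.fit(X, y)
        return self

    def predict(self, X):
        # Delegate prediction to the wrapped model.
        return self.model.predict(X)

    def draw(self):
        # Placeholder for plotting; returns the saved score.
        return self.score_


X, y = [[0], [1], [2], [3]], [0, 0, 0, 1]
viz = WrappedScoreVisualizer(DummyClassifier(strategy="most_frequent"))
viz.fit(X, y)
viz.score(X, y)
print(viz.draw())
```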