# Plan: Meta Learning by Combining Models

To improve prediction quality, I need to combine models using a meta-learning combiner. This involves taking probe-set predictions from models trained on the training data, then blending them to maximize prediction accuracy on the probe set.

The first step was to generate a large library of probe-set predictions, without tuning any hyperparameters or models for performance. This has been completed.

The second step is to generate a blend that maximizes accuracy when predicting the probe set. This will be accomplished by taking a random prediction from the library, adding it to the blend, and checking whether it improves the blend accuracy. If it doesn't, the prediction is dropped and the next one is tested. Predictions are drawn with replacement, as "Ensemble Selection from Libraries of Models" shows this gives a more stable blend accuracy curve as more models are added.
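The selection loop described above can be sketched in plain Python. This is only an illustration under stated assumptions: the blend here is a simple average of probabilities rather than a fitted blend model, and `accuracy`, `average_blend`, and `random_selection_with_replacement` are hypothetical helper names that are not part of `metaLearning`.

```python
import random


def accuracy(probs, labels, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the label."""
    hits = sum(1 for p, y in zip(probs, labels)
               if (1 if p >= threshold else 0) == y)
    return hits / float(len(labels))


def average_blend(prediction_set):
    """Average the probability vectors of every model currently in the blend."""
    n_models = float(len(prediction_set))
    return [sum(column) / n_models for column in zip(*prediction_set)]


def random_selection_with_replacement(library, labels, n_rounds=200, seed=0):
    """Try random library members; keep one only if it raises blend accuracy."""
    rng = random.Random(seed)
    blend, best_accuracy = [], 0.0
    for _ in range(n_rounds):
        candidate = rng.choice(library)  # with replacement: repeats are allowed
        blend.append(candidate)
        current_accuracy = accuracy(average_blend(blend), labels)
        if current_accuracy > best_accuracy:
            best_accuracy = current_accuracy
        else:
            blend.pop()  # no improvement, so discard the candidate
    return blend, best_accuracy
```

Because candidates are drawn with replacement, a model that is kept several times simply carries more weight in the average.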

Once the blend accuracy stops changing as further models are added, the parameters for the models and the blend model are stored. To improve accuracy, all models will then be re-trained on the combined training and probe sets, since the models see more training data than before. Next, the individual models predict the test data, and those predictions are combined using the stored blend parameters.
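A rough sketch of that final step, assuming each model exposes `fit` and `predict_proba` methods and that the stored blend parameters reduce to one weight per model (for example, how many times the model was selected). `PriorModel` and `refit_and_blend` are hypothetical names used only for illustration; `PriorModel` is a toy stand-in, not one of the real library models.

```python
class PriorModel(object):
    """Hypothetical stand-in model: predicts the positive rate seen in training."""

    def fit(self, X, y):
        self.rate_ = sum(y) / float(len(y))
        return self

    def predict_proba(self, X):
        return [self.rate_ for _ in X]


def refit_and_blend(models, X_train, y_train, X_probe, y_probe, X_test, weights):
    """Refit each model on train + probe, then blend their test predictions."""
    # Combine the training and probe sets so every model sees more data
    X_full = list(X_train) + list(X_probe)
    y_full = list(y_train) + list(y_probe)
    test_probs = [m.fit(X_full, y_full).predict_proba(X_test) for m in models]
    # Weighted average of the per-model probabilities for each test row
    total = float(sum(weights))
    return [sum(w * p for w, p in zip(weights, row)) / total
            for row in zip(*test_probs)]
```

The blend weights are held fixed here: only the individual models are refit, so the weights learned on the probe set are reused unchanged on the test predictions.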

# Meta Learning by Combining Models

In [1]:
import sys
import os

import socket
computer_name = socket.gethostname()
if computer_name == 'Alexs-MacBook-Pro.local':
    base_path = "/Users/alexsutherland/Documents/Programming/Python/Kaggle/Titanic---2015"
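else:
    # Raw string so the backslashes in the Windows path are not read as escapes
    base_path = r'C:\Users\Lundi\Documents\Programming\Python\Kaggle\Titanic - 2015'
sys.path.append(base_path)
# os.path.join picks the right separator on both macOS and Windows
sys.path.append(os.path.join(base_path, "Stacked Generalization"))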


import TitanicPreprocessor as tp
import TitanicPredictor as tpred
import metaLearning as metaLearn
meta_learn = metaLearn.metaLearning()
import sklearn.ensemble as skl_ensemble
import sklearn.linear_model as skl_lm
import sklearn.grid_search as skl_gs
import sklearn.cross_validation as skl_cv
import numpy as np
import pandas as pd

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns



## Generating models

In [2]:
meta_learn.generateExampleLogisticModels()

100 models added


## Taking a random prediction, adding it to the blend, and then seeing whether it improves the blend accuracy

Generate predictions for the entire dataset using 10-fold CV.

Then take the predictions for the full training set and classify them.
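As a sketch of what generating predictions with 10-fold CV means here: every row is predicted by a model that was fit without that row. The helpers `k_fold_indices` and `out_of_fold_predictions` are hypothetical, and `fit_predict` is an assumed callable that fits on the training rows and returns predictions for the held-out rows. The folds are contiguous for simplicity; a real run would probably shuffle or stratify first.

```python
def k_fold_indices(n_samples, n_folds=10):
    """Split range(n_samples) into n_folds contiguous, near-equal test folds."""
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(fit_predict, X, y, n_folds=10):
    """Predict every row with a model that never saw that row during fitting."""
    predictions = [None] * len(X)
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        fold_preds = fit_predict([X[i] for i in train_idx],
                                 [y[i] for i in train_idx],
                                 [X[i] for i in test_idx])
        # Write each held-out prediction back to its original row position
        for i, p in zip(test_idx, fold_preds):
            predictions[i] = p
    return predictions
```

The resulting list has one prediction per row of the full dataset, which is the shape a blend model needs as input.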

In [None]:
lm_clf = skl_lm.LogisticRegression()
y = meta_learn.data['y']
X = meta_learn.data['X']




In [15]:
# Find the best individual predictor
best_blend_accuracy = 0
for current_prediction_probs in meta_learn.model_prediction_probs:
    # Threshold at 0.5; an array (not a list or lazy map) keeps the comparison below elementwise
    current_predictions = (np.asarray(current_prediction_probs) >= 0.5).astype(int)
    current_accuracy = np.mean(current_predictions == y_probe)
    if current_accuracy > best_blend_accuracy:
        best_blend_accuracy = current_accuracy
print 'Best individual accuracy:', best_blend_accuracy

Best individual accuracy: 0.806722689076


In [5]:
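blend_prediction_set = []
# Take a random prediction from the library (sampling with replacement)
random_index = np.random.randint(0, high=len(meta_learn.model_prediction_probs))
current_predictions = meta_learn.model_prediction_probs[random_index]
blend_prediction_set.append(current_predictions)

# scikit-learn expects shape (n_samples, n_models), so stack the predictions as columns
blend_features = np.column_stack(blend_prediction_set)
lm_clf.fit(blend_features, y_train)
lm_clf.predict(blend_features)

# Still to do: drop the candidate when it does not improve the blend
#if current_blend_accuracy <= best_blend_accuracy:
    #blend_prediction_set.pop()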

NameError: name 'best_blend_accuracy' is not defined

## Testing that blend accuracy does not change when adding further models to the blend
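One possible way to test for this, sketched under the assumption that the best blend accuracy so far is recorded after every round. `has_plateaued`, `patience`, and `tolerance` are hypothetical names and defaults, not values taken from this project.

```python
def has_plateaued(accuracy_history, patience=20, tolerance=1e-4):
    """True once the last `patience` rounds gained no more than `tolerance`."""
    if len(accuracy_history) <= patience:
        return False  # not enough rounds yet to judge
    gain = accuracy_history[-1] - accuracy_history[-patience - 1]
    return gain <= tolerance
```

A larger `patience` makes the check more conservative, which matters when candidates are drawn at random and several unhelpful draws in a row are expected.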

## Storing parameters for the models and blend model
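A minimal sketch of one way to store these, assuming the model parameters and blend weights are plain dictionaries and lists that JSON can represent. The file layout and the names `save_blend` and `load_blend` are hypothetical.

```python
import json


def save_blend(path, model_params, blend_weights):
    """Write the model parameters and blend weights to a JSON file."""
    with open(path, "w") as handle:
        json.dump({"model_params": model_params,
                   "blend_weights": blend_weights}, handle, indent=2)


def load_blend(path):
    """Read back the (model_params, blend_weights) pair written by save_blend."""
    with open(path) as handle:
        stored = json.load(handle)
    return stored["model_params"], stored["blend_weights"]
```

Fitted estimator objects are not JSON-serializable, so this only covers the settings needed to rebuild and refit the models later.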

## Re-training models on the combined training and probe set

## Individual models predict the test data and are then combined using the blend parameters