# Classification: Communication & Delivery

## Goals

Build the pipeline that runs all necessary steps on 'live' data, that is, data that is not labeled.
Summarize the previous stages and make the results available to your customers or end users: deliver the model, a report of actionable insights and recommendations, and the predictions to the stakeholders.

## Skills

- Applying preprocessing to new data
- Using the `predict` function to apply a model to new data

## Methods

- `knn.predict()`
- `scaler.transform()`
- `DataFrame.join()`

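As a rough sketch of how the three methods above fit together, the example below fits a scaler and a KNN model on made-up data, applies both to new rows, and joins the predictions back to the features. The toy values, the choice of `StandardScaler`, and `n_neighbors=1` are assumptions for illustration only; the notebook does not show which scaler or settings were actually used.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Toy training data (made up for illustration only)
X_train = pd.DataFrame({"Age": [5.0, 8.0, 40.0, 60.0],
                        "Fare": [10.0, 12.0, 80.0, 90.0]})
y_train = np.array([1, 1, 0, 0])

# Fit the scaler and the model on the training data only
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=1).fit(scaler.transform(X_train), y_train)

# New, unlabeled data: reuse the fitted scaler, then predict
X_new = pd.DataFrame({"Age": [6.0, 55.0], "Fare": [11.0, 85.0]})
pred = knn.predict(scaler.transform(X_new))

# Attach the predictions to the features; both use the default 0..n-1 index
result = X_new.join(pd.DataFrame(pred, columns=["survived_hat"]))
print(result)
```

The scaler is fitted once on the training data and only `transform` is called on new data, so new rows are scaled with the same statistics the model was trained on.
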
## Preprocessing functions

*These functions should be kept in a separate file so they can be imported wherever they are needed.*

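One possible way to keep the helpers in a separate file is to save them in a module and import them from the notebook, for example with `from preprocessing import extractTitle`. The sketch below shows what such a file might contain. The file name `preprocessing.py`, the `Name` column, and the sample names are hypothetical, and the function works on a copy rather than modifying the input in place.

```python
# preprocessing.py (hypothetical file name)
import pandas as pd


def extractTitle(df, col, newcol="Title"):
    """Return a copy of df with the word preceding a period stored in newcol."""
    df = df.copy()
    df[newcol] = df[col].str.extract(r"([A-Za-z]+)\.", expand=False)
    return df


if __name__ == "__main__":
    # Quick manual check with made-up names
    demo = pd.DataFrame({"Name": ["Doe, Mr. John", "Roe, Miss. Ann"]})
    print(extractTitle(demo, "Name"))
```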
In [1]:
## Needed in Final Model
# We will need these for pre-processing
import numpy as np
import pandas as pd

def extractTitle(df, col, newcol = 'Title'):
    # Capture the word that precedes a period, e.g. 'Mr' in 'Mr.'
    df[newcol] = df[col].str.extract(r'([A-Za-z]+)\.', expand=False)
    return df

def getAgeClass(df, col, newcol = 'IsChild', cutoff = 18):
    conditions = [
        (df[col] >= cutoff), # adult
        (df[col] < cutoff), # child
    ]
    choices = ['False', 'True']
    # Rows with a missing age match neither condition and get the string 'NaN'
    df[newcol] = np.select(conditions, choices, default='NaN')
    return df


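def estIsChild(df, ix=0):
    # Keep the known value; getAgeClass marks unknown ages with the string 'NaN'
    known = df.IsChild.iloc[ix]
    if pd.notnull(known) and known != 'NaN':
        return known
    # Otherwise guess from the title: 'Master', or 'Miss' travelling with siblings
    elif (df.Title.iloc[ix] == 'Miss' and df.SibSp.iloc[ix] > 0) or (df.Title.iloc[ix] == 'Master'):
        return 'True'
    else:
        return 'False'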

def fillIsChild(df):
    # Estimate IsChild row by row on the frame that was passed in
    isChild = []
    for i in range(len(df)):
        isChild.append(estIsChild(df, ix=i))
    df['IsChild'] = isChild
    return df

def impute_age(df):
    byChild = df.groupby('IsChild')
    df.Age = byChild.Age.transform(lambda x: x.fillna(x.median()))
    return df

def createTitleBin(df, nonother = ['Miss','Mrs','Mr','Master']):
    df['TitleBin'] = np.where(np.isin(df.Title, nonother), df.Title, 'Other')
    return df

def dropIsChild(df):
    df = df.drop(['IsChild'], axis=1)
    return df

# X_test.Embarked = enc_embarked.transform(list(X_test.Embarked))
# X_test.Sex = enc_sex.transform(list(X_test.Sex))
# X_test.Title = enc_title.transform(list(X_test.Title))
# X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns.values)

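from sklearn.impute import SimpleImputer

def impute(train_df, test_df, variable_name, strategy = 'median', missing_values = np.nan):
    # SimpleImputer replaces the Imputer class that was removed from scikit-learn;
    # it expects 2-D input, hence the double brackets
    imp = SimpleImputer(missing_values = missing_values, strategy = strategy)
    train_df[[variable_name]] = imp.fit_transform(train_df[[variable_name]])
    test_df[[variable_name]] = imp.transform(test_df[[variable_name]])
    return train_df, test_df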

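The train/test pattern behind `impute` can be illustrated on a tiny example: the fill value is learned from the training column and then reused on the test column. This is a minimal sketch using scikit-learn's `SimpleImputer` (the current replacement for the older `Imputer` class); the `Fare` values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up data with one missing value in each frame
train = pd.DataFrame({"Fare": [10.0, 20.0, np.nan, 40.0]})
test = pd.DataFrame({"Fare": [np.nan, 7.5]})

imp = SimpleImputer(missing_values=np.nan, strategy="median")

# Learn the median from the training column and fill the training gaps
train[["Fare"]] = imp.fit_transform(train[["Fare"]])

# Reuse the training median on the test column (no refitting)
test[["Fare"]] = imp.transform(test[["Fare"]])

print(imp.statistics_)  # the median learned from the training data
print(test)
```

Filling the test column with the training median, rather than its own, keeps the preprocessing identical to what the model saw during training.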

## Pipeline

### Access data, scalers, and models

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

%store -r svm
%store -r knn
%store -r gnb
%store -r logit
%store -r clf

| | Title | Sex | AgeClass | AvgAge |
|---|---|---|---|---|
| 0 | Capt | male | adult | 70.0 |
| 1 | Col | male | adult | 58.0 |
| 2 | Countess | female | adult | 33.0 |
| 3 | Don | male | adult | 40.0 |
| 4 | Dr | female | adult | 49.0 |

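`%store -r` is an IPython magic, so it only works inside IPython or Jupyter. If the pipeline has to run as a plain Python script, one alternative (an assumption, not what this notebook does) is to serialize the fitted objects with the standard-library `pickle` module. The sketch below round-trips a toy KNN model through bytes; in practice the bytes would be written to and read from a file.

```python
import pickle

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy model fitted on made-up data
X = np.array([[0.0], [1.0], [10.0], [11.0]])
y = np.array([0, 0, 1, 1])
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Serialize the fitted model, then restore it as a separate object
blob = pickle.dumps(knn)
knn_restored = pickle.loads(blob)

# The restored model predicts without being refitted
restored_pred = knn_restored.predict(np.array([[0.4], [10.6]]))
print(restored_pred)
```

Pickled models should only be loaded from trusted sources and with a compatible scikit-learn version.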

### Run functions to pre-process data

In [3]:
# ignore warnings
import warnings
warnings.filterwarnings("ignore")

### Run KNN model on new data

In [5]:
# ignore warnings
import warnings
warnings.filterwarnings("ignore")

# scaler and X are assumed to come from earlier steps that are not shown here
X = scaler.transform(X)
survived_pred = knn.predict(X)
survived_hat = pd.DataFrame(survived_pred, columns=['survived_hat'])

### Join predictions back with features

In [6]:
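# Keep only the rows with no missing values
pred_df = df0[~df0.isnull().any(axis=1)]
# join() aligns on the index; survived_hat is indexed 0..n-1, so rows dropped
# above can misalign the predictions and leave NaN in survived_hat
pred_df = pred_df.join(survived_hat)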
pred_df.head()

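| | Age | Fare | Parch | Pclass | SibSp | Embarked | Sex | Title | survived_hat |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 34.0 | 7.8292 | 0 | 3 | 0 | 1 | 1 | 2 | 0.0 |
| 1 | 47.0 | 7.0 | 0 | 3 | 1 | 2 | 0 | 3 | 0.0 |
| 2 | 62.0 | 9.6875 | 0 | 2 | 0 | 1 | 1 | 2 | 0.0 |
| 3 | 27.0 | 8.6625 | 0 | 3 | 0 | 2 | 1 | 2 | 0.0 |
| 4 | 22.0 | 12.2875 | 1 | 3 | 1 | 2 | 0 | 3 | 1.0 |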

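Because `DataFrame.join` matches rows by index label, joining predictions onto a frame that has had rows filtered out can silently misalign them. The sketch below shows the effect on made-up data and one possible remedy, resetting the index before the join. The frame contents are hypothetical, and whether this affects the notebook's own data depends on whether `df0` actually contained missing rows.

```python
import numpy as np
import pandas as pd

# Made-up features; the middle row has a missing Age
df0 = pd.DataFrame({"Age": [34.0, np.nan, 62.0], "Fare": [7.8, 7.0, 9.7]})

# Dropping incomplete rows keeps the original labels 0 and 2
pred_df = df0[~df0.isnull().any(axis=1)]

# Predictions for the two remaining rows carry a fresh index 0, 1
survived_hat = pd.DataFrame([0, 1], columns=["survived_hat"])

# Label 2 has no partner in survived_hat, so that row gets NaN
misaligned = pred_df.join(survived_hat)

# Resetting the index first lines the rows up by position
aligned = pred_df.reset_index(drop=True).join(survived_hat)

print(misaligned)
print(aligned)
```

A float `survived_hat` column in the joined output is a hint that NaN values were introduced somewhere, since integer predictions are upcast to float when missing values appear.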