# Which customers are happy customers?


From frontline support teams to C-suites, customer satisfaction is a key measure of success. Unhappy customers don't stick around. What's more, unhappy customers rarely voice their dissatisfaction before leaving.

Santander Bank is asking Kagglers to help them identify dissatisfied customers early in their relationship. Doing so would allow Santander to take proactive steps to improve a customer's happiness before it's too late.

In this competition, you'll work with hundreds of anonymized features to predict if a customer is satisfied or dissatisfied with their banking experience.



![img](https://kaggle2.blob.core.windows.net/competitions/kaggle/4986/media/santander_custsat_red.png)


# Load Data

In [1]:
import numpy as np
import pandas as pd

data = pd.read_csv("../data/train.csv")
test = pd.read_csv("../data/test.csv")
feature = set(data.columns)
print(len(feature))

feature.discard("TARGET")
feature.discard("ID")

371


## Exercise contents
Remove every feature that takes a single value in the training set. If the same feature takes more than one value in the test set, keep it.

In [3]:
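def removeOneValueFeature():
    # Drop a feature only when it is constant in both the training and test sets.
    # Iterate over a copy, because the set is modified inside the loop.
    for i in list(feature):
        if len(data[i].unique()) == 1 and len(test[i].unique()) == 1:
            feature.discard(i)


removeOneValueFeature()
print(len(feature))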
    

335
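
The same filter can also be written without an explicit loop. The block below is a minimal sketch on toy data, not the notebook's own code: `toy_train`, `toy_test` and `constant_in_both` are hypothetical names introduced only for illustration. It assumes that "constant" means a single distinct value with NaN counted as a value, which matches `len(series.unique()) == 1`.

```python
import pandas as pd

# Toy stand-ins for the train and test sets (hypothetical values).
toy_train = pd.DataFrame({"a": [0, 0, 0], "b": [1, 2, 3], "c": [5, 5, 5]})
toy_test = pd.DataFrame({"a": [0, 0, 0], "b": [4, 4, 4], "c": [5, 6, 5]})


def constant_in_both(train_df, test_df):
    """Return the columns that hold a single distinct value in both frames."""
    # Only columns present in both frames can be compared.
    shared = train_df.columns.intersection(test_df.columns)
    # dropna=False counts NaN as a value, like len(series.unique()).
    train_constant = train_df[shared].nunique(dropna=False) == 1
    test_constant = test_df[shared].nunique(dropna=False) == 1
    return sorted(shared[(train_constant & test_constant).values])


dropped = constant_in_both(toy_train, toy_test)
print(dropped)
```

Here `a` is constant in both frames and is dropped, while `c` is constant only in the toy training frame and is therefore kept, as the rule above requires.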


# Cross-validate by ROC AUC


In [4]:
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score


def rocauc_score(model, features, labels, num_folds=5):
    # Mean ROC AUC of the model over num_folds unshuffled folds.
    kfolds = KFold(n_splits=num_folds)

    total_score = 0.0

    for train_index, test_index in kfolds.split(features):
        train_features = features.iloc[train_index]
        test_features = features.iloc[test_index]
        train_labels = labels.iloc[train_index]
        test_labels = labels.iloc[test_index]

        model.fit(train_features, train_labels)
        prediction = model.predict_proba(test_features)

        # Column 1 of predict_proba holds the probability of the positive class.
        pos_prediction = prediction[:, 1]
        score = roc_auc_score(test_labels.values, pos_prediction)
        total_score = total_score + score

    total_score = total_score / num_folds

    return total_score
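
To make the metric itself concrete: ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The block below is a small illustrative sketch of that definition in plain Python. `pairwise_auc` and the toy values are hypothetical, and the quadratic loop is only meant for tiny inputs. The notebook itself relies on `roc_auc_score`.

```python
def pairwise_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy labels and scores (hypothetical values).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = pairwise_auc(toy_labels, toy_scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```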

# Prediction

## Predict 

In [11]:
#from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import svm
import time

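# Features that remain after removing the constant ones
feature_names = pd.Series(list(feature))
# All original features: every column except ID (first) and TARGET (last),
# passed as an array of column names
feature_names2 = data.columns[1:370]
# The label is passed as a single column name
label_name = "TARGET"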
befo_time = time.time()
before = rocauc_score(RandomForestClassifier(n_estimators=200, n_jobs= -1), data[feature_names2], data[label_name])
befo_time = time.time() - befo_time

after_time = time.time()
after = rocauc_score(RandomForestClassifier(n_estimators=200, n_jobs= -1), data[feature_names], data[label_name])
after_time = time.time() - after_time

#svmsvc_score = rocauc_score(svm.SVC(), data[feature_names], data[label_name])
#Logistic_score = rocauc_score(LogisticRegression(n_jobs =-1), data[feature_names], data[label_name])

## Print score

In [12]:
print("before = %.5f" % before)
print(befo_time)
print("after = %.5f" % after)
print(after_time)


before = 0.76095
39.6575210094
after = 0.75989
37.7166130543
