# Which customers are happy customers?


From frontline support teams to C-suites, customer satisfaction is a key measure of success. Unhappy customers don't stick around. What's more, unhappy customers rarely voice their dissatisfaction before leaving.

Santander Bank is asking Kagglers to help them identify dissatisfied customers early in their relationship. Doing so would allow Santander to take proactive steps to improve a customer's happiness before it's too late.

In this competition, you'll work with hundreds of anonymized features to predict if a customer is satisfied or dissatisfied with their banking experience.



![img](https://kaggle2.blob.core.windows.net/competitions/kaggle/4986/media/santander_custsat_red.png)


# Load Data

In [1]:
import numpy as np
import pandas as pd

data = pd.read_csv("../data/train.csv")
 

## Exercise contents
Random forest with 50 trees

# Cross-validate by ROC AUC


In [8]:
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score


def rocauc_score(model, features, labels, num_folds=5):
    """Return the mean ROC AUC of `model` over `num_folds` cross-validation folds."""
    # sklearn.cross_validation was removed in scikit-learn 0.20;
    # KFold now takes n_splits and yields indices from split().
    kfolds = KFold(n_splits=num_folds)

    total_score = 0.0

    for train_index, test_index in kfolds.split(features):
        # Select rows by position so this works for any DataFrame/Series index.
        train_features = features.iloc[train_index]
        test_features = features.iloc[test_index]
        train_labels = labels.iloc[train_index]
        test_labels = labels.iloc[test_index]

        model.fit(train_features, train_labels)
        prediction = model.predict_proba(test_features)

        # Column 1 holds the predicted probability of the positive class.
        pos_prediction = prediction[:, 1]
        score = roc_auc_score(test_labels.to_numpy(), pos_prediction)
        total_score = total_score + score

    # Average the per-fold scores.
    return total_score / num_folds
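
As a cross-check, scikit-learn's built-in `cross_val_score` with `scoring="roc_auc"` computes a comparable per-fold ROC AUC. The sketch below is not part of the original notebook: it assumes a recent scikit-learn and uses a synthetic dataset from `make_classification` in place of `train.csv`. The names `X`, `y`, `fold_scores`, and `mean_auc` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the Santander training data (assumption, not the real data).
X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# Unshuffled 5-fold split, mirroring the KFold used in rocauc_score above.
fold_scores = cross_val_score(model, X, y, cv=KFold(n_splits=5), scoring="roc_auc")
mean_auc = fold_scores.mean()
print("mean ROC AUC = %.5f" % mean_auc)
```

On a target as imbalanced as this one, `StratifiedKFold` may be a safer choice than plain `KFold`, since it keeps the share of positive cases similar in every fold.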

# Prediction

## Predict 

In [4]:
#from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import svm

feature_names = data.columns[1:370]
# Passed as an array of column names
label_name = "TARGET"
# Passed as a single column name

tree50_score = rocauc_score(RandomForestClassifier(n_estimators=50, n_jobs= -1), data[feature_names], data[label_name])
#svmsvc_score = rocauc_score(svm.SVC(), data[feature_names], data[label_name])
#Logistic_score = rocauc_score(LogisticRegression(n_jobs =-1), data[feature_names], data[label_name])

  tree 50 = 0.73968


## Print score

In [6]:
print(" tree 50 = %.5f" % (tree50_score))

 tree 50 = 0.73968


# Submit

In [7]:
train = pd.read_csv("../data/train.csv")
test = pd.read_csv("../data/test.csv")
 
# Take the feature columns from the frame loaded in this cell, not from `data`.
feature_names = train.columns[1:370]

label_name = "TARGET"

model = RandomForestClassifier(n_estimators=50, n_jobs=-1)
model.fit(train[feature_names], train[label_name])
prediction = model.predict_proba(test[feature_names])

p = [x[1] for x in prediction]


submit = pd.DataFrame(data = {'TARGET' : p}, index = test["ID"] )
submit.index.names = ["ID"]
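
The positive-class probabilities can also be taken with a NumPy column slice rather than a list comprehension. The following is a minimal sketch on toy data, not the notebook's actual run: `toy_train`, `toy_test`, and the feature names `f1` and `f2` are hypothetical stand-ins for the real files.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy frames standing in for train.csv and test.csv.
toy_train = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "f1": [0.1, 0.9, 0.2, 0.8, 0.15, 0.85],
    "f2": [1, 0, 1, 0, 1, 0],
    "TARGET": [0, 1, 0, 1, 0, 1],
})
toy_test = pd.DataFrame({"ID": [7, 8], "f1": [0.12, 0.88], "f2": [1, 0]})

features = ["f1", "f2"]
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(toy_train[features], toy_train["TARGET"])

# predict_proba returns one column per class, ordered as in model.classes_;
# column 1 is the probability of TARGET == 1 here.
proba = model.predict_proba(toy_test[features])[:, 1]

# One row per test ID, with the predicted probability as TARGET.
submit = pd.DataFrame({"ID": toy_test["ID"], "TARGET": proba}).set_index("ID")
print(submit)
```

Submitting probabilities rather than hard 0/1 labels matters here because ROC AUC is computed from the ranking of the scores.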


In [11]:
from time import strftime, localtime

current_time = strftime("%Y.%m.%d %H.%M.%S", localtime())

submit.to_csv("../submit/%s.csv" % current_time)
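
`to_csv` does not create missing directories, so the write fails if `../submit` is absent. Below is a minimal sketch of creating the folder first. So that it can run anywhere, it writes a dummy frame to a temporary directory; `out_dir` and `out_path` are illustrative names.

```python
import os
import tempfile
from time import strftime, localtime

import pandas as pd

# Dummy submission frame (made-up values, for illustration only).
submit = pd.DataFrame({"TARGET": [0.1, 0.9]}, index=pd.Index([7, 8], name="ID"))

# A temporary folder stands in for "../submit".
out_dir = os.path.join(tempfile.mkdtemp(), "submit")
os.makedirs(out_dir, exist_ok=True)  # no error if the folder already exists

current_time = strftime("%Y.%m.%d %H.%M.%S", localtime())
out_path = os.path.join(out_dir, "%s.csv" % current_time)
submit.to_csv(out_path)

# Read the file back to confirm the ID/TARGET layout.
written_columns = pd.read_csv(out_path).columns.tolist()
print(written_columns)
```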