#### Background:

We are one of the fastest-growing startups in the logistics and delivery domain. We work with several partners to make on-demand deliveries to our customers. During the COVID-19 pandemic we face several challenges, and every day we work to address them.

We thrive on making our customers happy. As a growing startup with a global expansion strategy, we know that we need to keep our customers happy, and the only way to do that is to measure how happy each customer is. If we can predict what makes our customers happy or unhappy, we can take the necessary actions.

Getting feedback from customers is not easy, but we do our best to collect it continuously. This feedback is crucial to improving our operations at all levels.

We recently surveyed a select customer cohort. You are presented with a subset of this data; we will use the remaining data as a private test set.

#### Data Description:

Y = target attribute (Y) with values indicating 0 (unhappy) and 1 (happy) customers
X1 = my order was delivered on time
X2 = contents of my order was as I expected
X3 = I ordered everything I wanted to order
X4 = I paid a good price for my order 
X5 = I am satisfied with my courier
X6 = the app makes ordering easy for me 

Attributes X1 to X6 hold the responses to each question, on a scale from 1 to 5, where a lower number indicates weaker agreement with the statement and a higher number indicates stronger agreement.


#### Importing

The cell below imports the libraries used in this notebook.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.layers import Dense, Activation
from sklearn.linear_model import LogisticRegressionCV



In [36]:
HappinessSurvey2020 = pd.read_csv("ACME-HappinessSurvey2020.csv")


print(HappinessSurvey2020.shape)
HappinessSurvey2020.head()

(126, 7)


Unnamed: 0,Y,X1,X2,X3,X4,X5,X6
0,0,3,3,3,4,2,4
1,0,3,2,3,5,4,3
2,1,5,3,3,3,3,5
3,0,5,4,3,3,3,5
4,0,5,4,3,3,3,5


#### Train-Test Split

In [5]:
x_data = HappinessSurvey2020[['X1', 'X2', 'X3','X4','X5','X6']]
y_data = HappinessSurvey2020['Y']

X_train, X_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.15)
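
The split above is unseeded, so each run of the cell produces different train and test rows. A minimal sketch of a repeatable, stratified variant follows. It runs on a synthetic stand-in with the same layout as the survey data (the values are random, not real responses), and the names `demo`, `x_demo`, `y_demo`, `Xd_train`, and so on are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same layout as the survey data
# (126 rows, Y in {0, 1}, X1..X6 in 1..5). These are NOT the real responses.
rng = np.random.default_rng(0)
feature_cols = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6']
demo = pd.DataFrame(rng.integers(1, 6, size=(126, 6)), columns=feature_cols)
demo.insert(0, 'Y', rng.integers(0, 2, size=126))

x_demo = demo[feature_cols]
y_demo = demo['Y']

# stratify keeps the happy/unhappy ratio similar in both parts;
# random_state makes the split repeatable across runs.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    x_demo, y_demo, test_size=0.15, stratify=y_demo, random_state=42)

print(Xd_train.shape, Xd_test.shape)
```

With 126 rows and `test_size=0.15`, the split yields 107 training rows and 19 test rows, which matches the sizes implied by the scores reported later in this notebook.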

In [32]:
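# Z-score every column. Note that this also standardizes the target Y,
# which turns the 0/1 labels into continuous values.
normalized_HappinessSurvey2020 = ((HappinessSurvey2020-HappinessSurvey2020.mean())/HappinessSurvey2020.std())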

In [35]:
print(normalized_HappinessSurvey2020.shape)
normalized_HappinessSurvey2020.head()

(126, 7)


Unnamed: 0,Y,X1,X2,X3,X4,X5,X6
0,-1.095864,-1.666667,0.419999,-0.302435,0.289992,-1.438424,-0.313808
1,-1.095864,-1.666667,-0.476948,-0.302435,1.431836,0.304282,-1.549427
2,0.905279,0.833333,0.419999,-0.302435,-0.851852,-0.567071,0.921811
3,-1.095864,0.833333,1.316947,-0.302435,-0.851852,-0.567071,0.921811
4,-1.095864,0.833333,1.316947,-0.302435,-0.851852,-0.567071,0.921811


#### Normalize

In [33]:
normalized_x_data = normalized_HappinessSurvey2020[['X1', 'X2', 'X3','X4','X5','X6']]
normalized_y_data = normalized_HappinessSurvey2020['Y']

normalized_X_train, normalized_X_test, normalized_y_train, normalized_y_test = train_test_split(normalized_x_data, normalized_y_data, test_size=0.15)

In [34]:
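# This fit fails: normalized_y_train holds standardized (continuous) values
# rather than the 0/1 class labels that a classifier expects.
normalized_clf = SVC(kernel='linear',probability=True)
normalized_clf.fit(normalized_X_train,normalized_y_train)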

ValueError: Unknown label type: 'continuous'
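
The `ValueError` is consistent with `Y` having been standardized along with the features: the normalized table shows `Y` values such as -1.095864 and 0.905279, which scikit-learn treats as a continuous target. One possible way around it, sketched here on a synthetic stand-in rather than the real survey data, is to scale only the features and leave the labels as 0/1. The names `features`, `labels`, and `scaled_svc` are hypothetical, and the use of `StandardScaler` inside a pipeline is an assumption, not the notebook's original approach.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the survey data (random values, not real responses).
rng = np.random.default_rng(0)
feature_cols = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6']
features = pd.DataFrame(rng.integers(1, 6, size=(126, 6)), columns=feature_cols)
labels = pd.Series(rng.integers(0, 2, size=126), name='Y')

Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    features, labels, test_size=0.15, random_state=0)

# The scaler sees only the feature columns and is fit on the training rows;
# the labels are passed through untouched, so they remain 0/1 classes.
scaled_svc = make_pipeline(StandardScaler(),
                           SVC(kernel='linear', probability=True))
scaled_svc.fit(Xs_train, ys_train)

scaled_pred = scaled_svc.predict(Xs_test)
print(sorted(set(scaled_pred)))
```

Fitting the scaler inside the pipeline also avoids computing the mean and standard deviation on rows that later end up in the test set.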

#### Support-vector machine

In [31]:
clf = SVC(kernel='linear',probability=True)
clf.fit(X_train,y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=True, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [10]:
y_pred = clf.predict(X_test)
# Training accuracy is printed as a percentage; test accuracy as a fraction between 0 and 1.
print("Training Score:" , clf.score(X_train,y_train)*100)
print('Accuracy :' , accuracy_score(y_test,y_pred))

Training Score: 63.55140186915887
Accuracy : 0.42105263157894735
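
The test accuracy of 0.421 corresponds to 8 correct predictions out of 19 held-out rows, so a single prediction moves the score by about 5 percentage points. With so few test rows, one split gives a noisy estimate. A rough sketch of a steadier comparison uses stratified 5-fold cross-validation and a majority-class baseline. It runs on synthetic stand-in data, and `X_cv`, `y_cv`, `folds`, and the score variables are hypothetical names.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the survey data (random values, not real responses).
rng = np.random.default_rng(0)
X_cv = rng.integers(1, 6, size=(126, 6))
y_cv = rng.integers(0, 2, size=126)

# Five stratified folds: every row is used for testing exactly once.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Baseline that always predicts the most frequent class.
baseline_scores = cross_val_score(
    DummyClassifier(strategy='most_frequent'), X_cv, y_cv, cv=folds)
svc_scores = cross_val_score(SVC(kernel='linear'), X_cv, y_cv, cv=folds)

print('Baseline accuracy  : %.3f +/- %.3f'
      % (baseline_scores.mean(), baseline_scores.std()))
print('Linear SVC accuracy: %.3f +/- %.3f'
      % (svc_scores.mean(), svc_scores.std()))
```

A model is only informative if its cross-validated accuracy clearly exceeds the baseline; on the random stand-in data used here, no such gap should be expected.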


#### Logistic Regression

In [11]:
logReg = LogisticRegression()
logReg.fit(X_train,y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [13]:
logregpred = logReg.predict(X_test)
print("Training Score:" , logReg.score(X_train,y_train)*100)
print('Accuracy :' , accuracy_score(y_test,logregpred))

Training Score: 63.55140186915887
Accuracy : 0.3684210526315789
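
Since the stated goal is to learn what makes customers happy or unhappy, the fitted coefficients of a logistic regression can be inspected question by question. The sketch below shows one way to do that on synthetic stand-in data, so the printed numbers carry no meaning for the real survey. The names `X_lr`, `y_lr`, `lr_demo`, and `coef_table` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the survey data (random values, not real responses).
rng = np.random.default_rng(0)
feature_cols = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6']
X_lr = pd.DataFrame(rng.integers(1, 6, size=(126, 6)), columns=feature_cols)
y_lr = rng.integers(0, 2, size=126)

lr_demo = LogisticRegression(max_iter=1000)
lr_demo.fit(X_lr, y_lr)

# One coefficient per question: a positive value pushes the prediction
# toward class 1 (happy), a negative value toward class 0 (unhappy).
coef_table = pd.Series(lr_demo.coef_[0], index=feature_cols)
coef_table = coef_table.sort_values(ascending=False)
print(coef_table)
```

Because all six questions share the same 1 to 5 scale, the coefficients are roughly comparable in magnitude without further scaling.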


#### Logistic Regression Cross-Validation Estimator

In [14]:
logRegCV = LogisticRegressionCV()
logRegCV.fit(X_train,y_train)

LogisticRegressionCV(Cs=10, class_weight=None, cv=None, dual=False,
           fit_intercept=True, intercept_scaling=1.0, max_iter=100,
           multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
           refit=True, scoring=None, solver='lbfgs', tol=0.0001, verbose=0)

In [15]:
logregCVpred = logRegCV.predict(X_test)
print("Training Score:" , logRegCV.score(X_train,y_train)*100)
print('Accuracy :' , accuracy_score(y_test,logregCVpred))

Training Score: 60.747663551401864
Accuracy : 0.3684210526315789


#### Random Forest Classifier

In [24]:
ranfor1 = RandomForestClassifier(n_estimators= 8)
ranfor1.fit(X_train,y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=8, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

In [25]:
ranfor1pred = ranfor1.predict(X_test)
print("Training Score:" , ranfor1.score(X_train,y_train)*100)
print('Accuracy :' , accuracy_score(y_test,ranfor1pred))

Training Score: 94.39252336448598
Accuracy : 0.5789473684210527
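
A training score of about 94% next to a test accuracy of about 58% suggests the forest is fitting the training rows much more closely than it generalizes. One hedged option is to limit tree depth and read the out-of-bag estimate, which scores each row using only the trees that did not see it. The sketch runs on synthetic stand-in data. The settings `n_estimators=200` and `max_depth=3` are illustrative assumptions, and `X_rf`, `y_rf`, and `forest` are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the survey data (random values, not real responses).
rng = np.random.default_rng(0)
X_rf = rng.integers(1, 6, size=(126, 6))
y_rf = rng.integers(0, 2, size=126)

# Shallow trees reduce memorization; oob_score=True asks the forest to
# evaluate each row with the trees whose bootstrap sample excluded it.
forest = RandomForestClassifier(n_estimators=200, max_depth=3,
                                oob_score=True, random_state=0)
forest.fit(X_rf, y_rf)

print('Training accuracy  :', forest.score(X_rf, y_rf))
print('Out-of-bag accuracy:', forest.oob_score_)
print('Feature importances:', forest.feature_importances_)
```

The out-of-bag score needs no separate test split, which helps when only 126 rows are available.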


#### K Neighbors Classifier

In [21]:
neigh = KNeighborsClassifier(n_neighbors=6)
neigh.fit(X_train,y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=6, p=2,
           weights='uniform')

In [30]:
neighpred = neigh.predict(X_test)
print("Training Score:" , neigh.score(X_train,y_train)*100)
print('Accuracy :' , accuracy_score(y_test,neighpred))

Training Score: 72.89719626168224
Accuracy : 0.5263157894736842
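
The value `n_neighbors=6` above is one choice among many. A small cross-validated search over candidate values is sketched below on synthetic stand-in data. The candidate list is an illustrative assumption (odd values sidestep tied votes in a two-class problem), and `X_knn`, `y_knn`, and `search` are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the survey data (random values, not real responses).
rng = np.random.default_rng(0)
X_knn = rng.integers(1, 6, size=(126, 6))
y_knn = rng.integers(0, 2, size=126)

# Try several neighborhood sizes and keep the one with the best
# mean accuracy across 5 cross-validation folds.
candidate_k = [1, 3, 5, 7, 9, 11]
search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={'n_neighbors': candidate_k},
                      cv=5, scoring='accuracy')
search.fit(X_knn, y_knn)

print('Best n_neighbors:', search.best_params_['n_neighbors'])
print('Mean CV accuracy:', search.best_score_)
```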


#### Neural Network Classifier

In [28]:
def build_model():
    model = Sequential()
    model.add(Dense(500, activation='relu', input_dim=6))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(2, activation='softmax'))
    
    model.compile(optimizer='adam', 
                  loss='categorical_crossentropy', 
                  metrics=['accuracy'])
    
    
    return model

Kclf = KerasClassifier(build_model, epochs=10)
Kclf.fit(X_train,y_train)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x1a5da2eba8>

In [29]:
Kclfpred = Kclf.predict(X_test)
print("Training Score:" , Kclf.score(X_train,y_train)*100)
print('Accuracy :' , accuracy_score(y_test,Kclfpred))



Training Score: 68.22429895401001
Accuracy : 0.42105263157894735

