# Classification Project
This is a modified version of a classification project that I first completed as an exercise for an online course and then extended with my own analysis in TensorFlow. It benchmarks two approaches to classifying a person's income bracket from a set of demographic and employment features. The first approach uses a linear classifier from the TensorFlow library and the second uses a deep neural network (TensorFlow's `DNNClassifier`). The accuracy figures and confusion matrices for both are given below.

We'll be working with California census data, using various features of an individual to predict which income class they belong to (>50K or <=50K).

Here is some information about the data:

<table>
<thead>
<tr>
<th>Column Name</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>age</td>
<td>Continuous</td>
<td>The age of the individual</td>
</tr>
<tr>
<td>workclass</td>
<td>Categorical</td>
<td>The type of employer the  individual has (government,  military, private, etc.).</td>
</tr>
<tr>
<td>fnlwgt</td>
<td>Continuous</td>
<td>The number of people the census  takers believe that observation  represents (sample weight). This  variable will not be used.</td>
</tr>
<tr>
<td>education</td>
<td>Categorical</td>
<td>The highest level of education  achieved for that individual.</td>
</tr>
<tr>
<td>education_num</td>
<td>Continuous</td>
<td>The highest level of education in  numerical form.</td>
</tr>
<tr>
<td>marital_status</td>
<td>Categorical</td>
<td>Marital status of the individual.</td>
</tr>
<tr>
<td>occupation</td>
<td>Categorical</td>
<td>The occupation of the individual.</td>
</tr>
<tr>
<td>relationship</td>
<td>Categorical</td>
<td>Wife, Own-child, Husband,  Not-in-family, Other-relative,  Unmarried.</td>
</tr>
<tr>
<td>race</td>
<td>Categorical</td>
<td>White, Asian-Pac-Islander,  Amer-Indian-Eskimo, Other, Black.</td>
</tr>
<tr>
<td>gender</td>
<td>Categorical</td>
<td>Female, Male.</td>
</tr>
<tr>
<td>capital_gain</td>
<td>Continuous</td>
<td>Capital gains recorded.</td>
</tr>
<tr>
<td>capital_loss</td>
<td>Continuous</td>
<td>Capital Losses recorded.</td>
</tr>
<tr>
<td>hours_per_week</td>
<td>Continuous</td>
<td>Hours worked per week.</td>
</tr>
<tr>
<td>native_country</td>
<td>Categorical</td>
<td>Country of origin of the  individual.</td>
</tr>
<tr>
<td>income_bracket</td>
<td>Categorical</td>
<td>"&gt;50K" or "&lt;=50K", meaning  whether the person makes more  than \$50,000 annually.</td>
</tr>
</tbody>
</table>

### THE DATA

**Read in the census_data.csv data with pandas.**

In [3]:
import numpy as np
import pandas as pd

In [4]:
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline

In [5]:
census = pd.read_csv('census_data.csv')

In [6]:
census.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 14 columns):
age               32561 non-null int64
workclass         32561 non-null object
education         32561 non-null object
education_num     32561 non-null int64
marital_status    32561 non-null object
occupation        32561 non-null object
relationship      32561 non-null object
race              32561 non-null object
gender            32561 non-null object
capital_gain      32561 non-null int64
capital_loss      32561 non-null int64
hours_per_week    32561 non-null int64
native_country    32561 non-null object
income_bracket    32561 non-null object
dtypes: int64(5), object(9)
memory usage: 3.5+ MB


In [7]:
census.describe()

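<table>
<thead>
<tr><th></th><th>age</th><th>education_num</th><th>capital_gain</th><th>capital_loss</th><th>hours_per_week</th></tr>
</thead>
<tbody>
<tr><td>count</td><td>32561.0</td><td>32561.0</td><td>32561.0</td><td>32561.0</td><td>32561.0</td></tr>
<tr><td>mean</td><td>38.581647</td><td>10.080679</td><td>1077.648844</td><td>87.30383</td><td>40.437456</td></tr>
<tr><td>std</td><td>13.640433</td><td>2.57272</td><td>7385.292085</td><td>402.960219</td><td>12.347429</td></tr>
<tr><td>min</td><td>17.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td></tr>
<tr><td>25%</td><td>28.0</td><td>9.0</td><td>0.0</td><td>0.0</td><td>40.0</td></tr>
<tr><td>50%</td><td>37.0</td><td>10.0</td><td>0.0</td><td>0.0</td><td>40.0</td></tr>
<tr><td>75%</td><td>48.0</td><td>12.0</td><td>0.0</td><td>0.0</td><td>45.0</td></tr>
<tr><td>max</td><td>90.0</td><td>16.0</td><td>99999.0</td><td>4356.0</td><td>99.0</td></tr>
</tbody>
</table>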


In [8]:
census.head()

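<table>
<thead>
<tr><th></th><th>age</th><th>workclass</th><th>education</th><th>education_num</th><th>marital_status</th><th>occupation</th><th>relationship</th><th>race</th><th>gender</th><th>capital_gain</th><th>capital_loss</th><th>hours_per_week</th><th>native_country</th><th>income_bracket</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>39</td><td>State-gov</td><td>Bachelors</td><td>13</td><td>Never-married</td><td>Adm-clerical</td><td>Not-in-family</td><td>White</td><td>Male</td><td>2174</td><td>0</td><td>40</td><td>United-States</td><td>&lt;=50K</td></tr>
<tr><td>1</td><td>50</td><td>Self-emp-not-inc</td><td>Bachelors</td><td>13</td><td>Married-civ-spouse</td><td>Exec-managerial</td><td>Husband</td><td>White</td><td>Male</td><td>0</td><td>0</td><td>13</td><td>United-States</td><td>&lt;=50K</td></tr>
<tr><td>2</td><td>38</td><td>Private</td><td>HS-grad</td><td>9</td><td>Divorced</td><td>Handlers-cleaners</td><td>Not-in-family</td><td>White</td><td>Male</td><td>0</td><td>0</td><td>40</td><td>United-States</td><td>&lt;=50K</td></tr>
<tr><td>3</td><td>53</td><td>Private</td><td>11th</td><td>7</td><td>Married-civ-spouse</td><td>Handlers-cleaners</td><td>Husband</td><td>Black</td><td>Male</td><td>0</td><td>0</td><td>40</td><td>United-States</td><td>&lt;=50K</td></tr>
<tr><td>4</td><td>28</td><td>Private</td><td>Bachelors</td><td>13</td><td>Married-civ-spouse</td><td>Prof-specialty</td><td>Wife</td><td>Black</td><td>Female</td><td>0</td><td>0</td><td>40</td><td>Cuba</td><td>&lt;=50K</td></tr>
</tbody>
</table>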


**TensorFlow won't be able to understand strings as labels, so use the pandas .apply() method to apply a custom function that converts them to 0s and 1s.**

**Convert the label column to 0s and 1s instead of strings.**

In [9]:
def income_dummy(inc):
    # The raw values carry a leading space, e.g. ' <=50K'
    if inc == ' <=50K':
        return 0
    else:
        return 1

In [10]:
census['labels2'] = census['income_bracket'].apply(income_dummy)
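
One caveat with `income_dummy` is that its `else` branch maps every value other than `' <=50K'` to 1, including typos or missing entries. The sketch below is a stricter variant and is not part of the original notebook: the name `income_to_label` is hypothetical, and it assumes the raw strings differ from `<=50K` and `>50K` only by surrounding whitespace.

```python
def income_to_label(raw):
    """Map an income bracket string to 0 (<=50K) or 1 (>50K).

    Hypothetical helper: strips surrounding whitespace and rejects
    anything that is not one of the two expected brackets.
    """
    mapping = {"<=50K": 0, ">50K": 1}
    key = raw.strip()
    if key not in mapping:
        raise ValueError(f"unexpected income bracket: {raw!r}")
    return mapping[key]


print(income_to_label(" <=50K"))
print(income_to_label(" >50K"))
```

It could be used in place of the original function as `census['income_bracket'].apply(income_to_label)`.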

In [11]:
census['labels2'].describe()

count    32561.000000
mean         0.240810
std          0.427581
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max          1.000000
Name: labels2, dtype: float64

In [12]:
census.drop('income_bracket',axis=1,inplace=True)

In [13]:
census.head()

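<table>
<thead>
<tr><th></th><th>age</th><th>workclass</th><th>education</th><th>education_num</th><th>marital_status</th><th>occupation</th><th>relationship</th><th>race</th><th>gender</th><th>capital_gain</th><th>capital_loss</th><th>hours_per_week</th><th>native_country</th><th>labels2</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>39</td><td>State-gov</td><td>Bachelors</td><td>13</td><td>Never-married</td><td>Adm-clerical</td><td>Not-in-family</td><td>White</td><td>Male</td><td>2174</td><td>0</td><td>40</td><td>United-States</td><td>0</td></tr>
<tr><td>1</td><td>50</td><td>Self-emp-not-inc</td><td>Bachelors</td><td>13</td><td>Married-civ-spouse</td><td>Exec-managerial</td><td>Husband</td><td>White</td><td>Male</td><td>0</td><td>0</td><td>13</td><td>United-States</td><td>0</td></tr>
<tr><td>2</td><td>38</td><td>Private</td><td>HS-grad</td><td>9</td><td>Divorced</td><td>Handlers-cleaners</td><td>Not-in-family</td><td>White</td><td>Male</td><td>0</td><td>0</td><td>40</td><td>United-States</td><td>0</td></tr>
<tr><td>3</td><td>53</td><td>Private</td><td>11th</td><td>7</td><td>Married-civ-spouse</td><td>Handlers-cleaners</td><td>Husband</td><td>Black</td><td>Male</td><td>0</td><td>0</td><td>40</td><td>United-States</td><td>0</td></tr>
<tr><td>4</td><td>28</td><td>Private</td><td>Bachelors</td><td>13</td><td>Married-civ-spouse</td><td>Prof-specialty</td><td>Wife</td><td>Black</td><td>Female</td><td>0</td><td>0</td><td>40</td><td>Cuba</td><td>0</td></tr>
</tbody>
</table>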


### Perform a Train Test Split on the Data

In [14]:
X = census.drop('labels2',axis=1)
y = census['labels2']

In [15]:
from sklearn.model_selection import train_test_split

In [16]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

### Create the Feature Columns for tf.estimator

**Take note of categorical vs continuous values!**

In [17]:
census.columns

Index(['age', 'workclass', 'education', 'education_num', 'marital_status',
       'occupation', 'relationship', 'race', 'gender', 'capital_gain',
       'capital_loss', 'hours_per_week', 'native_country', 'labels2'],
      dtype='object')

**Import TensorFlow**

In [18]:
import tensorflow as tf

**Create the tf.feature_columns for the categorical values. Use vocabulary lists or just use hash buckets.**

In [19]:
# Each categorical column is typed out by hand here; a for-loop over the column names would be a tidier alternative
workclass = tf.feature_column.categorical_column_with_hash_bucket('workclass', hash_bucket_size=10)
education = tf.feature_column.categorical_column_with_hash_bucket('education', hash_bucket_size=17)
marital_status = tf.feature_column.categorical_column_with_hash_bucket('marital_status', hash_bucket_size=8)
occupation = tf.feature_column.categorical_column_with_hash_bucket('occupation', hash_bucket_size=20)
relationship = tf.feature_column.categorical_column_with_hash_bucket('relationship', hash_bucket_size=7)
race = tf.feature_column.categorical_column_with_hash_bucket('race', hash_bucket_size=6)
gender = tf.feature_column.categorical_column_with_hash_bucket('gender', hash_bucket_size=2)
native_country = tf.feature_column.categorical_column_with_hash_bucket('native_country', hash_bucket_size=45)
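
A hash bucket column assigns each string to one of `hash_bucket_size` slots by hashing it, so two different categories can land in the same slot. The sketch below illustrates the idea in plain Python. It uses MD5 as a stand-in hash (an assumption for illustration only; TensorFlow uses its own fingerprint function, so the actual bucket ids will differ), and the helper name `bucket_of` is hypothetical.

```python
import hashlib


def bucket_of(value, num_buckets):
    """Stand-in for hashing a category string into a bucket index."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


# The six relationship categories listed in the data description.
relationships = ["Wife", "Own-child", "Husband",
                 "Not-in-family", "Other-relative", "Unmarried"]

for size in (7, 100):
    buckets = {name: bucket_of(name, size) for name in relationships}
    distinct = len(set(buckets.values()))
    print(f"{size} buckets -> {distinct} distinct ids for "
          f"{len(relationships)} categories")
```

When the number of buckets is close to the number of categories, as it is above, collisions are likely. A larger `hash_bucket_size`, or `categorical_column_with_vocabulary_list` with the known category names, would avoid them.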

**Create the continuous feature_columns for the continuous values using numeric_column.**

In [20]:
capital_gain = tf.feature_column.numeric_column('capital_gain')
capital_loss = tf.feature_column.numeric_column('capital_loss')
hours_per_week = tf.feature_column.numeric_column('hours_per_week')
age = tf.feature_column.numeric_column('age')

In [21]:
# Bucketize age before training the classifier
age = tf.feature_column.bucketized_column(age, boundaries=[10,20,30,40,50,60,70,80,90])
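
Bucketizing replaces the raw age with the index of the interval it falls into. With the nine boundaries above there are ten buckets: below 10, [10, 20), and so on up to [80, 90), then 90 or above. A minimal sketch of that mapping using only the standard library follows. The function name `age_bucket` is hypothetical, and the code mirrors the left-closed interval convention rather than calling TensorFlow.

```python
from bisect import bisect_right

BOUNDARIES = [10, 20, 30, 40, 50, 60, 70, 80, 90]


def age_bucket(age):
    """Return the index of the bucket that `age` falls into.

    Buckets are (-inf, 10), [10, 20), ..., [80, 90), [90, +inf).
    """
    return bisect_right(BOUNDARIES, age)


for age in (17, 39, 40, 90):
    print(age, "->", age_bucket(age))
```

Since the ages in this dataset run from 17 to 90, the first bucket (below 10) is never used.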

**Put all these variables into a single list with the variable name feat_cols.**

In [45]:
feat_cols = [age, workclass, education, marital_status, occupation, relationship, race, gender, native_country, capital_gain, capital_loss, hours_per_week]

### Create Input Function

**batch_size is up to you, but do make sure to shuffle!**

In [46]:
input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train,
                                                batch_size=10, num_epochs=1000,
                                                shuffle=True)

#### Create your model with tf.estimator

**Create a LinearClassifier. (If you want to use a DNNClassifier, keep in mind you'll need to create embedded columns out of the categorical features that use strings; check out the previous lecture on this for more info.)**

In [47]:
linearClassifier = tf.estimator.LinearClassifier(feature_columns=feat_cols, n_classes=2)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_log_step_count_steps': 100, '_keep_checkpoint_max': 5, '_save_summary_steps': 100, '_tf_random_seed': 1, '_save_checkpoints_secs': 600, '_keep_checkpoint_every_n_hours': 10000, '_save_checkpoints_steps': None, '_model_dir': 'C:\\Users\\shahj\\AppData\\Local\\Temp\\tmp0jpun782', '_session_config': None}


**Train your model on the data for at least 5000 steps.**

In [48]:
linearClassifier.train(input_fn=input_func,steps=10000)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into C:\Users\shahj\AppData\Local\Temp\tmp0jpun782\model.ckpt.
INFO:tensorflow:step = 1, loss = 6.931472
INFO:tensorflow:global_step/sec: 129.503
INFO:tensorflow:step = 101, loss = 5.5073614 (0.782 sec)
INFO:tensorflow:global_step/sec: 150.599
INFO:tensorflow:step = 201, loss = 1.386481 (0.662 sec)
INFO:tensorflow:global_step/sec: 149.247
INFO:tensorflow:step = 301, loss = 1.0221984 (0.666 sec)
INFO:tensorflow:global_step/sec: 128.668
INFO:tensorflow:step = 401, loss = 114.11214 (0.787 sec)
INFO:tensorflow:global_step/sec: 139.833
INFO:tensorflow:step = 501, loss = 16.53448 (0.706 sec)
INFO:tensorflow:global_step/sec: 144.09
INFO:tensorflow:step = 601, loss = 2.1320727 (0.697 sec)
INFO:tensorflow:global_step/sec: 158.236
INFO:tensorflow:step = 701, loss = 12.649017 (0.633 sec)
INFO:tensorflow:global_step/sec: 143.871
INFO:tensorflow:step = 801, loss = 11.366744 (0.696 sec)
INFO:tensorflow:global_step/s

INFO:tensorflow:step = 8201, loss = 1.2693332 (0.705 sec)
INFO:tensorflow:global_step/sec: 134.564
INFO:tensorflow:step = 8301, loss = 4.3738675 (0.740 sec)
INFO:tensorflow:global_step/sec: 145.762
INFO:tensorflow:step = 8401, loss = 2.879949 (0.695 sec)
INFO:tensorflow:global_step/sec: 144.915
INFO:tensorflow:step = 8501, loss = 1.4454378 (0.685 sec)
INFO:tensorflow:global_step/sec: 146.189
INFO:tensorflow:step = 8601, loss = 3.7181957 (0.684 sec)
INFO:tensorflow:global_step/sec: 161.044
INFO:tensorflow:step = 8701, loss = 6.6376395 (0.619 sec)
INFO:tensorflow:global_step/sec: 172.71
INFO:tensorflow:step = 8801, loss = 2.5558 (0.580 sec)
INFO:tensorflow:global_step/sec: 179.619
INFO:tensorflow:step = 8901, loss = 9.712483 (0.557 sec)
INFO:tensorflow:global_step/sec: 146.618
INFO:tensorflow:step = 9001, loss = 2.1728253 (0.678 sec)
INFO:tensorflow:global_step/sec: 162.353
INFO:tensorflow:step = 9101, loss = 4.1912637 (0.617 sec)
INFO:tensorflow:global_step/sec: 160.526
INFO:tensorflow:

<tensorflow.python.estimator.canned.linear.LinearClassifier at 0x2a01afe9f60>
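
Two numbers in the setup and log above can be checked by hand. With `batch_size=10` and `steps=10000`, training consumes 100,000 examples, which is only a few passes over the 70% training split, so `num_epochs=1000` is never exhausted. The first logged loss, 6.931472, is also what one would expect if the estimator reports the loss summed over the batch and the untrained model predicts 0.5 for every example. Both of those are assumptions inferred from the numbers, not something stated in the notebook.

```python
import math

total_rows = 32561                          # from census.info()
test_rows = math.ceil(0.3 * total_rows)     # test_size=0.3, rounded up
train_rows = total_rows - test_rows

batch_size = 10
steps = 10000

examples_seen = batch_size * steps
epochs_covered = examples_seen / train_rows
print(f"training rows: {train_rows}")
print(f"epochs covered by {steps} steps: {epochs_covered:.2f}")

# Log loss of a 0.5 prediction is ln 2 per example; summed over one batch:
initial_batch_loss = batch_size * math.log(2)
print(f"expected first-step loss: {initial_batch_loss:.6f}")
```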

### Evaluation

**Create a prediction input function. Remember to only supply X_test data and keep shuffle=False.**

In [49]:
pred_input_func = tf.estimator.inputs.pandas_input_fn(x=X_test, batch_size=10,
                                                      num_epochs=1, shuffle=False)

**Use model.predict() and pass in your input function. This will produce a generator of predictions, which you can then transform into a list with list().**

In [50]:
predictions = list(linearClassifier.predict(input_fn=pred_input_func))

INFO:tensorflow:Restoring parameters from C:\Users\shahj\AppData\Local\Temp\tmp0jpun782\model.ckpt-10000


**Each item in your list will look like this:**

In [51]:
predictions

[{'class_ids': array([0], dtype=int64),
  'classes': array([b'0'], dtype=object),
  'logistic': array([0.2909895], dtype=float32),
  'logits': array([-0.89058316], dtype=float32),
  'probabilities': array([0.7090105 , 0.29098952], dtype=float32)},
 {'class_ids': array([0], dtype=int64),
  'classes': array([b'0'], dtype=object),
  'logistic': array([5.6648994e-05], dtype=float32),
  'logits': array([-9.77858], dtype=float32),
  'probabilities': array([9.999434e-01, 5.664899e-05], dtype=float32)},
 {'class_ids': array([1], dtype=int64),
  'classes': array([b'1'], dtype=object),
  'logistic': array([0.55518585], dtype=float32),
  'logits': array([0.22164631], dtype=float32),
  'probabilities': array([0.4448142 , 0.55518585], dtype=float32)},
 {'class_ids': array([0], dtype=int64),
  'classes': array([b'0'], dtype=object),
  'logistic': array([4.3361775e-05], dtype=float32),
  'logits': array([-10.045889], dtype=float32),
  'probabilities': array([9.9995661e-01, 4.3361775e-05], dtype=float
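
The fields in each prediction dictionary are tied together: `logistic` is the sigmoid of `logits`, `probabilities` is `[1 - logistic, logistic]`, and `class_ids` is the index of the larger probability. A small check in plain Python against the first and third predictions shown above (the helper name `sigmoid` is just for illustration):

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Logits copied from the first and third predictions above.
for logit in (-0.89058316, 0.22164631):
    p_one = sigmoid(logit)
    probabilities = [1.0 - p_one, p_one]
    class_id = probabilities.index(max(probabilities))
    print(f"logit={logit:+.4f}  P(class 1)={p_one:.4f}  class_id={class_id}")
```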

**Create a list of only the class_ids key values from the list of prediction dictionaries; these are the predictions you will use to compare against the real y_test values.**

In [52]:
final_pred = []

for pred in predictions:
    final_pred.append(pred['class_ids'][0])

In [53]:
final_pred

[0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, ...]


**Import classification_report from sklearn.metrics and then see if you can figure out how to use it to easily get a full report of your model's performance on the test data.**

In [54]:
from sklearn.metrics import classification_report, confusion_matrix

In [55]:
print(confusion_matrix(y_test,final_pred))
print('\n')
print(classification_report(y_test,final_pred))

[[6615  821]
 [ 785 1548]]


             precision    recall  f1-score   support

          0       0.89      0.89      0.89      7436
          1       0.65      0.66      0.66      2333

avg / total       0.84      0.84      0.84      9769
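
Every figure in the report can be recovered from the confusion matrix, whose rows are the true classes and whose columns are the predicted classes. The sketch below recomputes accuracy, precision, recall and F1 for class 1 from the four counts printed above; the variable names are illustrative.

```python
# Confusion matrix for the linear classifier, copied from the output above.
# Rows are true labels (0, 1); columns are predicted labels (0, 1).
tn, fp = 6615, 821
fn, tp = 785, 1548

accuracy = (tn + tp) / (tn + fp + fn + tp)
precision = tp / (tp + fp)   # of those predicted >50K, how many are right
recall = tp / (tp + fn)      # of those truly >50K, how many are found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy  = {accuracy:.2f}")
print(f"precision = {precision:.2f}")
print(f"recall    = {recall:.2f}")
print(f"f1        = {f1:.2f}")
```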



## Check to See if a DNN Classifier Does Better

I want to see if the DNN classifier does better than the Linear Classifier.

In [25]:
feat_cols

[_BucketizedColumn(source_column=_NumericColumn(key='age', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), boundaries=(10, 20, 30, 40, 50, 60, 70, 80, 90)),
 _HashedCategoricalColumn(key='workclass', hash_bucket_size=10, dtype=tf.string),
 _HashedCategoricalColumn(key='education', hash_bucket_size=17, dtype=tf.string),
 _HashedCategoricalColumn(key='marital_status', hash_bucket_size=8, dtype=tf.string),
 _HashedCategoricalColumn(key='occupation', hash_bucket_size=20, dtype=tf.string),
 _HashedCategoricalColumn(key='relationship', hash_bucket_size=7, dtype=tf.string),
 _HashedCategoricalColumn(key='race', hash_bucket_size=6, dtype=tf.string),
 _HashedCategoricalColumn(key='gender', hash_bucket_size=2, dtype=tf.string),
 _HashedCategoricalColumn(key='native_country', hash_bucket_size=45, dtype=tf.string),
 _NumericColumn(key='capital_gain', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='capital_loss', shape=(1,), default

In [27]:
embedded_workclass = tf.feature_column.embedding_column(workclass, dimension=9)
embedded_education = tf.feature_column.embedding_column(education, dimension=16)
embedded_marital_status = tf.feature_column.embedding_column(marital_status, dimension=7)
embedded_occupation = tf.feature_column.embedding_column(occupation, dimension=15)
embedded_relationship = tf.feature_column.embedding_column(relationship, dimension=6)
embedded_race = tf.feature_column.embedding_column(race, dimension=5)
embedded_gender = tf.feature_column.embedding_column(gender, dimension=2)
embedded_native_country = tf.feature_column.embedding_column(native_country, dimension=42)

In [28]:
feat_cols2 = [capital_gain, capital_loss, hours_per_week, age,  embedded_workclass, embedded_education, 
                      embedded_marital_status, embedded_occupation, embedded_native_country,
                      embedded_relationship, embedded_race, embedded_gender]
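
Each embedding column learns a table with one row per hash bucket and `dimension` values per row, and the DNN sees the concatenation of all embeddings, the three numeric columns, and the bucketized age. The sketch below tallies those sizes from the numbers used above. It is plain arithmetic rather than a TensorFlow call, and it assumes the bucketized age contributes one input per bucket (nine boundaries, so ten buckets).

```python
# (hash_bucket_size, embedding dimension) for each categorical column above.
embeddings = {
    "workclass": (10, 9),
    "education": (17, 16),
    "marital_status": (8, 7),
    "occupation": (20, 15),
    "relationship": (7, 6),
    "race": (6, 5),
    "gender": (2, 2),
    "native_country": (45, 42),
}

embedding_weights = sum(buckets * dim for buckets, dim in embeddings.values())
embedding_width = sum(dim for _, dim in embeddings.values())

numeric_inputs = 3       # capital_gain, capital_loss, hours_per_week
age_bucket_inputs = 10   # 9 boundaries -> 10 one-hot buckets

input_width = embedding_width + numeric_inputs + age_bucket_inputs
print("embedding weights:", embedding_weights)
print("DNN input width:", input_width)
```

With dimensions this close to the bucket counts, the embeddings compress very little, so smaller dimensions might be worth trying.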

In [29]:
input_func2 = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train,
                                                batch_size=10, num_epochs=1000,
                                                shuffle=True)

In [30]:
DNNClassifier = tf.estimator.DNNClassifier(hidden_units=[13,13, 13, 13], feature_columns=feat_cols2,n_classes=2)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_log_step_count_steps': 100, '_keep_checkpoint_max': 5, '_save_summary_steps': 100, '_tf_random_seed': 1, '_save_checkpoints_secs': 600, '_keep_checkpoint_every_n_hours': 10000, '_save_checkpoints_steps': None, '_model_dir': 'C:\\Users\\shahj\\AppData\\Local\\Temp\\tmpjz16kqoi', '_session_config': None}


In [32]:
DNNClassifier.train(input_fn=input_func2,steps=10000)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into C:\Users\shahj\AppData\Local\Temp\tmpjz16kqoi\model.ckpt.
INFO:tensorflow:step = 1, loss = 12.253739
INFO:tensorflow:global_step/sec: 126.549
INFO:tensorflow:step = 101, loss = 7.0110383 (0.795 sec)
INFO:tensorflow:global_step/sec: 159.755
INFO:tensorflow:step = 201, loss = 3.6758003 (0.625 sec)
INFO:tensorflow:global_step/sec: 153.139
INFO:tensorflow:step = 301, loss = 6.4826183 (0.652 sec)
INFO:tensorflow:global_step/sec: 148.14
INFO:tensorflow:step = 401, loss = 2.5851936 (0.676 sec)
INFO:tensorflow:global_step/sec: 162.854
INFO:tensorflow:step = 501, loss = 4.492189 (0.622 sec)
INFO:tensorflow:global_step/sec: 156.349
INFO:tensorflow:step = 601, loss = 5.677469 (0.624 sec)
INFO:tensorflow:global_step/sec: 167.237
INFO:tensorflow:step = 701, loss = 3.040234 (0.598 sec)
INFO:tensorflow:global_step/sec: 152.07
INFO:tensorflow:step = 801, loss = 3.2710392 (0.662 sec)
INFO:tensorflow:global_step/se

INFO:tensorflow:step = 8201, loss = 3.6338372 (0.614 sec)
INFO:tensorflow:global_step/sec: 158.739
INFO:tensorflow:step = 8301, loss = 1.4360163 (0.629 sec)
INFO:tensorflow:global_step/sec: 133.843
INFO:tensorflow:step = 8401, loss = 4.3632 (0.746 sec)
INFO:tensorflow:global_step/sec: 141.426
INFO:tensorflow:step = 8501, loss = 4.0012445 (0.706 sec)
INFO:tensorflow:global_step/sec: 128.173
INFO:tensorflow:step = 8601, loss = 2.9793577 (0.781 sec)
INFO:tensorflow:global_step/sec: 136.774
INFO:tensorflow:step = 8701, loss = 4.5035667 (0.730 sec)
INFO:tensorflow:global_step/sec: 144.499
INFO:tensorflow:step = 8801, loss = 2.3577082 (0.691 sec)
INFO:tensorflow:global_step/sec: 146.618
INFO:tensorflow:step = 8901, loss = 6.67253 (0.682 sec)
INFO:tensorflow:global_step/sec: 159.5
INFO:tensorflow:step = 9001, loss = 0.96872807 (0.628 sec)
INFO:tensorflow:global_step/sec: 151.744
INFO:tensorflow:step = 9101, loss = 4.3725033 (0.658 sec)
INFO:tensorflow:global_step/sec: 148.361
INFO:tensorflow:

<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x2a01cacf630>

In [33]:
eval_input_func = tf.estimator.inputs.pandas_input_fn(x=X_test,y=y_test,batch_size=10,
                                                      num_epochs=1,shuffle=False)

In [35]:
DNNClassifier.evaluate(eval_input_func)

INFO:tensorflow:Starting evaluation at 2019-06-17-17:32:13
INFO:tensorflow:Restoring parameters from C:\Users\shahj\AppData\Local\Temp\tmpjz16kqoi\model.ckpt-10000
INFO:tensorflow:Finished evaluation at 2019-06-17-17:32:21
INFO:tensorflow:Saving dict for global step 10000: accuracy = 0.8206572, accuracy_baseline = 0.7611833, auc = 0.87333333, auc_precision_recall = 0.6293771, average_loss = 0.36417878, global_step = 10000, label/mean = 0.23881666, loss = 3.6414149, prediction/mean = 0.24858457


{'accuracy': 0.8206572,
 'accuracy_baseline': 0.7611833,
 'auc': 0.87333333,
 'auc_precision_recall': 0.6293771,
 'average_loss': 0.36417878,
 'global_step': 10000,
 'label/mean': 0.23881666,
 'loss': 3.6414149,
 'prediction/mean': 0.24858457}
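
Several of these metrics can be cross-checked against the class supports in the test set (7436 negatives and 2333 positives). `accuracy_baseline` is the accuracy of always predicting the majority class, and `label/mean` is the fraction of positives. `loss` appears to be the total loss divided by the number of batches, which is why it is roughly `batch_size` times `average_loss`; that last point is inferred from the numbers rather than documented in the notebook.

```python
import math

negatives, positives = 7436, 2333   # class supports in the test set
n_test = negatives + positives
batch_size = 10

accuracy_baseline = max(negatives, positives) / n_test
label_mean = positives / n_test

# average_loss as reported by DNNClassifier.evaluate above.
average_loss = 0.36417878
n_batches = math.ceil(n_test / batch_size)
loss_per_batch = average_loss * n_test / n_batches

print(f"accuracy_baseline = {accuracy_baseline:.7f}")
print(f"label/mean        = {label_mean:.8f}")
print(f"loss per batch    = {loss_per_batch:.4f}")
```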

In [36]:
pred_input_func2 =  tf.estimator.inputs.pandas_input_fn(x=X_test,batch_size=10,
                                                num_epochs=1,shuffle=False)

In [40]:
predictions2 = list(DNNClassifier.predict(input_fn=pred_input_func2))

INFO:tensorflow:Restoring parameters from C:\Users\shahj\AppData\Local\Temp\tmpjz16kqoi\model.ckpt-10000


In [56]:
final_pred2 = []

for pred in predictions2:
    final_pred2.append(pred['class_ids'][0])

In [58]:
print("LINEAR CLASSIFICATION RESULTS:")
print(confusion_matrix(y_test,final_pred))
print('\n')
print(classification_report(y_test,final_pred))
print("DNN CLASSIFICATION RESULTS:")
print(confusion_matrix(y_test,final_pred2))
print('\n')
print(classification_report(y_test,final_pred2))

LINEAR CLASSIFICATION RESULTS:
[[6615  821]
 [ 785 1548]]


             precision    recall  f1-score   support

          0       0.89      0.89      0.89      7436
          1       0.65      0.66      0.66      2333

avg / total       0.84      0.84      0.84      9769

DNN CLASSIFICATION RESULTS:
[[6513  923]
 [ 829 1504]]


             precision    recall  f1-score   support

          0       0.89      0.88      0.88      7436
          1       0.62      0.64      0.63      2333

avg / total       0.82      0.82      0.82      9769



The Linear Classifier and the DNN Classifier perform similarly in terms of precision and recall, and the Linear Classifier is in fact slightly ahead: its overall accuracy is about 0.84 against 0.82 for the DNN Classifier, and its F1-score on the >50K class is 0.66 against 0.63.