<p style="font-size:24pt; text-decoration:underline; font-weight:bold; color:#003057; text-align:center">
    PACE - Applications of Machine Learning
</p>

<br>
<center>
        <a href = "mailto: dnagendra3@gatech.edu"><b>Deepa Phanish, PhD</b></a> : <a href = "https://pace.gatech.edu" target = "_blank"><b>PACE, Georgia Tech</b></a> <br>
    <a href = "mailto: ajezghani3@gatech.edu"><b>Aaron Jezghani, PhD</b></a> : <a href = "https://pace.gatech.edu" target = "_blank"><b>PACE, Georgia Tech</b></a> <br>
    <a href = "mailto: chris_blanton@ncsu.edu"><b>Chris Blanton, PhD</b></a> : <a href = "https://www.lib.ncsu.edu" target= "_blank"><b>Research Consulting, NCSU</b></a>
    
</center>

<font size=5 color=B3A369><u><b>Introduction</b></u></font>

<br>
<center><font size=4><i>[Machine Learning is a] field of study that gives computers the ability to learn without being explicitly programmed.</i> <br> --- Arthur Samuel, 1959</font></center>

Machine learning has existed for decades; however, until recently, computing power and data storage were too limited to allow machines to solve many problems in the field effectively. Advances in processing speed and storage density have taken machine learning from an abstract idea to the forefront of scientific research across many domains, including:
- Bioinformatics
- Molecular dynamics
- Astrophysics
- Signal processing
- Health data analytics
- Finance and marketing
- Urban planning
- ...and many more

The volume of data being generated and made available for research is rapidly increasing, and while the underlying technology is constantly being reworked to keep up, the challenge of making meaningful use of that data is ever more present. Even if we ignore the challenges of human bias, the rate at which humans can review data remains limited.

<font size=4 color=B3A369><b>Spam Filters: The Traditional Way</b></font>

1. We want to start by identifying some common features in spam emails; these could be:
    - phrases ("4U", "one simple trick", "aliens", "lottery"), 
    - questionable domains ("google.asdqwkjf92.ohno", "definitely-not-stealing-your-info.org"),
    - mismatches in sender's name/email ("Mary Smith from aaron.aaron@itsascam.net"),
    - etc.
2. We would write some rules to capture these bits
    - You may have already done this for your school/work emails to sort by topic!
3. We then execute the email rules to test their validity
    - We need to identify correctness, including false positives (good emails mislabeled as spam) and false negatives (spam emails that were missed by our rules)
4. We update our rules accordingly, and repeat until we're satisfied.
    - Note that as new approaches are deployed, we have to identify and define new features to update our filter.
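Steps 2 and 3 can be sketched in plain Python. The phrase list, the domain pattern, the sample emails, and the `is_spam` helper are all hypothetical. They were chosen only to mirror the examples above. Note that one legitimate email trips a rule and becomes a false positive.

```python
import re

# Step 2: hypothetical hand-written rules, each capturing one "feature" of spam
SPAM_PHRASES = ["4u", "one simple trick", "lottery"]
SUSPICIOUS_DOMAIN = re.compile(r"@[\w.-]*(itsascam|asdqwkjf92)\.")

def is_spam(sender: str, body: str) -> bool:
    """Flag an email if any hand-written rule fires."""
    text = body.lower()
    if any(phrase in text for phrase in SPAM_PHRASES):
        return True
    if SUSPICIOUS_DOMAIN.search(sender.lower()):
        return True
    return False

# Step 3: run the rules against labeled examples and count the mistakes
emails = [
    ("mary.smith@example.com", "Lunch tomorrow?", False),
    ("aaron.aaron@itsascam.net", "Hi, it's Mary Smith!", True),
    ("promo@example.com", "Win the lottery with one simple trick", True),
    ("boss@example.com", "Our new product is great 4u", False),  # legitimate, but trips a rule
]
false_positives = sum(is_spam(s, b) and not label for s, b, label in emails)
false_negatives = sum(not is_spam(s, b) and label for s, b, label in emails)
print(false_positives, false_negatives)
```

Step 4 would mean tweaking `SPAM_PHRASES` or the pattern by hand and rerunning the check.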

<img src="image/traditional-programming.png" alt="Traditional Programming" width="600"> 

<font size=4 color=B3A369><b>Spam Filters: Using Machine Learning</b></font>

1. We need a reasonably large collection of emails that have been labeled as spam or not spam.
    - The dataset may have defined characteristics (sender, message length, domain, etc.), or it may not, in which case we have to define our own.
    - Additionally, we can always define additional fields from existing data through a process known as **feature engineering** (e.g., _x_ and _y_ coordinates might be better presented as polar coordinates for data centered around some location).
2. We choose an algorithm that can consider the input characteristics and the categorization to determine which emails are spam and which are not.
    - There are numerous existing algorithms (linear/logistic regression, neural networks, nearest neighbor, etc.) that can be modified, or a new algorithm can be defined (this is a very active research area after all!).
3. We divide the dataset into training and test subsets, typically through some form of random selection.
    - Different algorithms perform differently depending on the data, so we want to ascertain our model's efficacy before deploying it in the wild.
    - Because of the underlying statistics used in ML, it can often be helpful to explore resampling techniques (such as cross-validation) to assess the validity of our model.
4. We tune our algorithm until we reach the desired level of accuracy, and then we deploy the model in the real world.
    - Unlike the traditional method, machine learning has the ability to adapt to novel spam data - the model might need retraining, but we can in theory still use the same algorithm (this is what you're doing when you provide feedback on applications like Google Maps).
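The same four steps can be sketched with scikit-learn. This assumes a tiny, made-up set of labeled emails, which is far too small for a real filter. Here `CountVectorizer` turns each email into word counts, and `LogisticRegression` stands in for "an algorithm". Both are illustrative choices, not the only options.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Step 1: a tiny, made-up labeled collection (real filters need far more data)
emails = [
    "win the lottery now", "one simple trick to get rich", "claim your prize 4u",
    "cheap pills one simple trick", "lottery winner claim now", "free prize inside",
    "meeting moved to 3pm", "lunch tomorrow?", "draft of the report attached",
    "notes from today's lecture", "can you review my code", "project meeting agenda",
]
labels = [1] * 6 + [0] * 6  # 1 = spam, 0 = not spam

# Step 3: hold out a random, stratified test subset before any fitting
train_emails, test_emails, y_train, y_test = train_test_split(
    emails, labels, test_size=0.25, stratify=labels, random_state=0)

# Step 2: word counts as features, logistic regression as the algorithm
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(train_emails, y_train)

# Step 4: measure accuracy on unseen emails before tuning or deploying
accuracy = model.score(test_emails, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The split happens on the raw text, and the vectorizer is fit inside the pipeline. As a result, the test emails never influence the learned vocabulary.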

<img src="image/machine_learning.png" alt="Machine Learning" width="600"> 

<font size=5 color=B3A369><u><b>Using the Correct Tools...Correctly</b></u></font>

There's often a difficult progression with ML projects from proof-of-concept to large-scale application, and a little foresight can reduce headaches significantly. Fortunately, frameworks such as TensorFlow and PyTorch significantly ease the transition from CPU to GPU, to the point where developing directly for GPU training is accessible. However, scaling issues, whether in data volume or in process distribution, often manifest as a project grows in scope. Additionally, hardware-specific optimizations may be missed as code is migrated to new hardware, especially if outdated versions of computational libraries are used.

Based on the hardware you've chosen to use, it's always worthwhile to explore the vendor's recommended settings. Since we'll be using Intel CPUs, we can take a look at [their recommendations](https://www.intel.com/content/www/us/en/developer/articles/technical/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference.html):

- KMP_AFFINITY=granularity=fine,verbose,compact,1,0 <font color=008080><em>#Bind OpenMP threads to physical cores</em></font>
- TF_ENABLE_ONEDNN_OPTS=1 <font color=008080><em>#Enable Intel® oneAPI Deep Neural Network Library capabilities</em></font>
- OMP_NUM_THREADS=${PBS_NP} <font color=008080><em>#One thread per physical core</em></font>
- KMP_BLOCKTIME=0 <font color=008080><em>#Time (ms) a thread waits after finishing parallel work before sleeping</em></font>

In [None]:
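import os
# Set these before TensorFlow is imported so that they take effect
os.environ['KMP_AFFINITY'] = "granularity=fine,verbose,compact,1,0"  # Bind threads to physical cores (no hyper-threading)
os.environ['TF_ENABLE_ONEDNN_OPTS'] = "1"  # oneDNN CPU optimizations
# One thread per core; outside a PBS job, fall back to os.cpu_count() (logical CPUs)
os.environ['OMP_NUM_THREADS'] = os.getenv('PBS_NP', str(os.cpu_count() or 1))
os.environ['KMP_BLOCKTIME'] = "0"  # Tune empirically; a value > 0 may help if non-threaded code runs between parallel regions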

<font size=5 color=B3A369><u><b>Practical Example: ML in Medicine</b></u></font>

When developing a machine learning workflow, it can be tempting to focus solely on the machine learning algorithm and model refinement. However, before that aspect can be considered, there are other challenges:
- Where do we get the data?
- Is the data formatted appropriately for the system?
- What framework/hardware will be used?

To explore these issues, we can use a standard training example from the community: automated breast cancer detection. As you may be aware, breast cancer is one of the most common cancers among women worldwide. Early diagnosis can greatly improve the outlook for patients. However, accurate diagnosis can be a challenge because it requires expert analysis, so areas lacking such experts can be greatly affected.

The Wisconsin breast cancer dataset consists of 30 features computed from digitized images of fine needle aspirate (FNA) biopsies of breast masses. This dataset has been previously studied in several papers, including [Breast Cancer Detection with Reduced Feature Set](https://www.hindawi.com/journals/cmmm/2015/265138/). Because each mass is labeled as benign or malignant, we can use the data to explore the application of machine learning techniques and gain insights into the viability of ML for real-world applications such as this.

<font size=4 color=B3A369><b>Our Plan of Action</b></font>

In this workshop, we will look at an example workflow on one of PACE's instructional clusters using CPUs for analysis. In short, we will use our labeled dataset of thirty features to train a classifier that will attempt to label new data as either benign or malignant. The steps we will take are as follows:
1. Import the necessary libraries to explore/analyze our data and develop our model.
2. Acquire our data and transform it to a usable form.
3. Explore our data and garner any initial insights that might help us in our efforts.
4. Prepare the data for training.
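5. Split the data into training and test subsets. (Ideally, this split should happen before any data-dependent preparation, such as scaling, so that no information from the test set leaks into training.)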
6. Pick our ML algorithm and train our model.
7. Test and evaluate our model to further explore the training process.

<font size=5 color=B3A369><u><b>1. Import Libraries</b></u></font>

There are numerous libraries that can be utilized in an ML project - we'll try to touch on several in this workshop to provide broader familiarity.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import datetime
import sklearn
#show images inline with code block
%matplotlib inline 

Sometimes, it is helpful to check versions of packages to verify capability/compatibility:

In [None]:
print("TensorFlow version: {}".format(tf.__version__))
print("Eager execution is: {}".format(tf.executing_eagerly()))
print("Keras version: {}".format(tf.keras.__version__))

**Eager execution** is enabled by default in TensorFlow 2.x (making it the default was meant to tackle a big usability issue in TensorFlow 1.x). It:
- evaluates operations immediately
- returns actual values rather than computational graphs to be run later
- calculates the values of tensors as they occur

Broadly speaking, it's meant to make things simpler and more accessible for beginners.

*However*, disabling this feature provides an opportunity for more optimization, as TensorFlow can extract the tensor computations and build a more efficient graph before executing them. Thus, more advanced users may prefer to run in this mode instead. For today, though, we'll keep eager execution enabled.
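In TensorFlow 2.x, a common way to get graph-level optimization without disabling eager execution globally is to wrap a function in `tf.function`. This traces the function into a graph. In the minimal sketch below, the `weighted_sum` function is purely illustrative.

```python
import tensorflow as tf

def weighted_sum(x, w):
    # Called directly, this runs eagerly: each operation executes immediately
    return tf.reduce_sum(w * x)

# tf.function traces the Python function into a graph that TensorFlow can optimize
graph_weighted_sum = tf.function(weighted_sum)

x = tf.constant([1.0, 2.0, 3.0])
w = tf.constant([0.5, 0.5, 0.5])
eager_result = weighted_sum(x, w)
graph_result = graph_weighted_sum(x, w)
print(float(eager_result), float(graph_result))  # both 3.0
```

Both calls return the same value. The difference is only in how TensorFlow executes them.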

<font size=5 color=B3A369><u><b>2. Acquire/Transform Data</b></u></font>

There are numerous sources for ML training sets, including directly from the Python libraries themselves. Picking one from the ML framework you're using has the advantage that it is usually formatted correctly, but since we want to explore the data transformation component of our workflow, we'll take our dataset from the SciKit-Learn datasets.

In [None]:
from sklearn.datasets import load_breast_cancer # Load the breast cancer dataset bundled with scikit-learn
cancer = load_breast_cancer()

Let's take a look at the data to get a better understanding of how it is formatted:

In [None]:
cancer

Although the data uses human-readable characters, it's not formatted for easy reading by a human. We can manipulate it to change that!

First, let's start by getting the names of the fields:

In [None]:
cancer.keys()

Using this Python dictionary, we can focus on a single component of the dataset rather than dumping a block of information. For example, we can read the data description:

In [None]:
print(cancer['DESCR'])

Because we want to develop a classifier, ultimately we need to explore the labels, or targets, for the data:

In [None]:
print(cancer['target'])

As you can see, the target data is presented as a binary encoding (either 0 or 1). If we want to know which value maps to which label, we can use the "target_names" field:

In [None]:
print(cancer['target_names'])

Since the array is 0-indexed, that means a value of '0' maps to 'malignant' and a value of '1' corresponds to 'benign'. 
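Because the codes are array indices, NumPy fancy indexing can translate every target value into its label in one step. The small sketch below reloads the dataset so it runs on its own.

```python
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
# Use each 0/1 target code as an index into target_names
label_names = cancer['target_names'][cancer['target']]
print(label_names[:5])
```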

Carrying on, the features of the set can be found by looking at the 'feature_names' entry:

In [None]:
print(cancer['feature_names'])

<font size=4 color=B3A369><b>Pandas Dataframes</b></font>

Since the data exhibits a fair amount of variety, we want to store it in an appropriate object. Pandas dataframes offer an excellent solution: they are data structures that provide labeled axes for heterogeneous data types, which is exactly what we want! To convert our dataset to a dataframe:

In [None]:
df_cancer = pd.DataFrame(np.c_[cancer['data'],cancer['target']],columns=np.append(cancer['feature_names'],['target']))
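As an aside, scikit-learn 0.23 and later can return this dataset as a dataframe directly via `as_frame=True`, which skips the `np.c_` step. The sketch below assumes such a version is installed.

```python
from sklearn.datasets import load_breast_cancer

# .frame holds the 30 feature columns plus a 'target' column in one DataFrame
cancer_frame = load_breast_cancer(as_frame=True).frame
print(cancer_frame.shape)  # (569, 31)
```

The `np.c_` approach stores everything as floats. This version keeps `target` as an integer column. The column names still contain spaces, though.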

Unfortunately, TensorFlow does not allow spaces in feature names, so we'll have to fix that. We can do this by replacing the spaces in each column name with a suitable character (e.g., an underscore) and updating our dataframe:

In [None]:
# Replace spaces with underscores in every column name (the row index is left untouched)
df_cancer.columns = [key.replace(" ", "_") for key in df_cancer.columns]
print(df_cancer.keys())

Now we can use the nice functionality of a dataframe to look at the beginning of our dataset:

In [None]:
df_cancer.head()

...or the end of it:

In [None]:
df_cancer.tail()

If we're feeling wild, we can even request more than 5 rows at a time!

In [None]:
df_cancer.tail(7)

<font size=4 color=B3A369><b>Feature Scaling</b></font>

One thing that may immediately jump out is the variation in scale of the values for the different features. This can cause a few issues in our analysis if we're not careful:
1. Some algorithms may perform poorly when the scales of features differ wildly. For example, if a Euclidean distance is calculated, the feature with the largest scale may completely dominate the calculation.
2. Gradient descent can converge more quickly if features are normalized, which can aid in training time.
3. If regularization is used as part of the loss function, feature scaling is important to ensure that coefficients are penalized properly.
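To see point 1 concretely, consider two hypothetical samples described by `mean_area` and `mean_smoothness`. The sample values and the per-feature minimums and maximums below are assumptions picked for illustration. They are not read from the dataset.

```python
import numpy as np

# Two hypothetical samples: (mean_area, mean_smoothness)
a = np.array([500.0, 0.08])
b = np.array([520.0, 0.14])

# Unscaled: the area difference (20) swamps the smoothness difference (0.06)
unscaled = np.linalg.norm(a - b)

# Min-max scale each feature with assumed ranges so both contribute comparably
lo = np.array([140.0, 0.05])   # assumed per-feature minimums
hi = np.array([2500.0, 0.16])  # assumed per-feature maximums
a_s, b_s = (a - lo) / (hi - lo), (b - lo) / (hi - lo)
scaled_diffs = np.abs(a_s - b_s)
print(unscaled, scaled_diffs)
```

Unscaled, the distance is about 20 and is driven almost entirely by area. After scaling, the smoothness difference (about 0.55 of its range) outweighs the area difference (under 0.01 of its range).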

Scikit-learn has built-in functionality that can perform our scaling and put all values in the range 0 to 1:

In [None]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df_scaled = df_cancer.copy()
feature_cols = df_scaled.columns.drop('target')  # every column except the label
df_scaled[feature_cols] = scaler.fit_transform(df_scaled[feature_cols])
print(df_scaled.head())
print(df_scaled.tail())
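For each feature, `MinMaxScaler` applies $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$. The minimum and maximum are taken over the data passed to `fit`. This small sketch on made-up numbers confirms that the manual formula matches the scaler.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 200.0],
              [3.0, 400.0],
              [5.0, 1000.0]])  # toy data: two features on very different scales

# Manual min-max scaling, column by column
col_min, col_max = X.min(axis=0), X.max(axis=0)
manual = (X - col_min) / (col_max - col_min)

sklearn_scaled = MinMaxScaler().fit_transform(X)
print(manual)
print(np.allclose(manual, sklearn_scaled))  # True
```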

<font size=5 color=B3A369><u><b>3. Explore Data</b></u></font>

Often it is assumed that with enough data, everything can be known. As it turns out, though, gathering the data isn't the only challenge of this approach; redundancy in the data and too many features can lead to inefficiencies in training. For example, does our dataset contain multiple descriptions of the same property that provide no additional insight? Or are there data points that seem purely superfluous?

The challenge, of course, is how to pare down our data meaningfully. Since we want our classifier to determine the target, we can start by exploring how strongly each feature is correlated with the target values:

In [None]:
print(df_scaled.corr()['target'])

Of course, it helps if we organize our list, so let's start there:

In [None]:
df_scaled.corr()['target'].sort_values()

The above can give us insights into the impact of any one feature in terms of determining the target value, but lacks any information about data redundancies that may exist between different features. 
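One way to surface such redundancies before plotting is to list feature pairs whose absolute correlation exceeds some threshold. In this sketch, the 0.95 cutoff is an arbitrary assumption. The dataset is reloaded with scikit-learn's original, space-separated column names so the snippet runs on its own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

df = load_breast_cancer(as_frame=True).frame.drop(columns="target")

corr = df.corr().abs()
# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
# Stack into (feature_a, feature_b) -> |r| and keep strongly correlated pairs
pairs = upper.stack()
redundant = pairs[pairs > 0.95].sort_values(ascending=False)
print(redundant)
```

Pairs such as mean radius and mean perimeter are expected to appear, since both describe cell size.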

<font size=4 color=B3A369><b>Dataset Visualization</b></font>

To explore the relationships between variables, we can utilize a correlation matrix heatmap, where the color indicates how strongly correlated each pair of features is. Values close to -1 show a strong negative correlation (one variable increases as the other decreases), while values close to +1 show a strong positive correlation (both increase or decrease together).

While we can use other libraries to generate this heatmap, the Seaborn library provides a very simple, dataframe-compatible function to achieve our goal:

In [None]:
sns.heatmap(df_scaled.corr(),annot=True)

However, the default size can be difficult to view. Let's try to make that a little better:

In [None]:
plt.figure(figsize=(20,10))
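ax = sns.heatmap(df_scaled.corr(),annot=True)
# Work around a matplotlib 3.1.1 bug that clipped the top and bottom heatmap rows;
# with other matplotlib versions these two lines are unnecessary
bottom, top = ax.get_ylim() 
ax.set_ylim(bottom+0.5, top-0.5)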

It is tempting to use all the features indiscriminately. In our case, however, several properties, such as cell size, are captured by multiple measurements (radius, perimeter, and area). Because of this redundancy, and because of computational expense, it is often advantageous to minimize the number of features.

Thinking about this mathematically, the features do not necessarily form an orthogonal basis set. This can lead to degenerate solutions, which may complicate the optimization process and lead either to a local extremum or to a failure to converge. **This applies to optimization in general, not strictly to ML.**

As a related, practical matter, the larger the feature set, the more expensive the calculation. By reducing the number of features, we try to increase the "signal-to-noise" ratio while decreasing the computational expense.

In our case, we will use the mean parameters as a starting point, because this reduces the number of candidate features from 30 to 10. Intuitively, mean values tend to be a good choice for measuring trends.

In [None]:
sns.pairplot(df_scaled, vars=['mean_radius','mean_texture','mean_perimeter','mean_area','mean_smoothness','mean_concave_points'])

We can use Seaborn to color-code each datapoint based on the target value. This can help identify feature relationships that are the most indicative of a certain target value.

In [None]:
g = sns.pairplot(df_scaled,hue='target', vars=['mean_radius','mean_texture','mean_perimeter','mean_area','mean_smoothness','mean_concave_points'])
# Below is to allow the legend to use words instead of numbers. 
handles = g._legend_data.values()
labels = ['Malignant','Benign'] 
g._legend.remove()
g.fig.legend(handles=handles,labels=labels, loc='center right',ncol=1)
g.fig.subplots_adjust(top=0.92,bottom=0.08,right=0.9)

Also, we might find it helpful to understand just how many values we have for each target label. A significant disparity can unintentionally bias our model.

In [None]:
sns.countplot(x=df_scaled['target'])
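The same information can be read off numerically. The scikit-learn description of this dataset reports 212 malignant and 357 benign samples, roughly a 37%/63% split. This sketch recomputes those counts.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
code_to_name = {i: str(name) for i, name in enumerate(cancer.target_names)}
counts = pd.Series(cancer.target).map(code_to_name).value_counts()
print(counts)                 # raw counts per class
print(counts / counts.sum())  # class proportions
```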

<font size=5 color=B3A369><u><b>4. Prepare Data</b></u></font>

Given the above, it seems fair to conclude that we have several values that are highly correlated, and that if we limit ourselves to just the mean values, we can reasonably represent the available data while maintaining efficiency (Disclaimer: this is just illustrative of a valid thought process, which may or may not actually be the optimal path forward with this dataset).

In preparation for our application, we can define the features and labels to use in our ML classifier:

In [None]:
features=['mean_radius','mean_concave_points','mean_perimeter','mean_area','mean_smoothness','mean_concavity','mean_texture']
labels=['target']

Next, we want to shuffle our dataset. It's not uncommon to be provided a dataset that is sorted, which can bias results, so randomizing the order is usually a good first step:

In [None]:
randomized_data = df_scaled.reindex(np.random.permutation(df_scaled.index))

Because we didn't reset the index, we can see the original row labels and confirm that the order is different:

In [None]:
randomized_data.head()
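`np.random.permutation` draws a new order on every run, so results can differ from one run to the next. If reproducibility matters, one option is pandas' `DataFrame.sample` with `frac=1` and a fixed `random_state`. The toy dataframe and the seed 42 below are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(6), "target": [0, 0, 0, 1, 1, 1]})  # sorted toy data

# frac=1 returns every row in random order; random_state makes the order repeatable
shuffled_a = df.sample(frac=1, random_state=42)
shuffled_b = df.sample(frac=1, random_state=42)
print(shuffled_a.index.tolist())
```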

<font size=5 color=B3A369><u><b>5. Split Data</b></u></font>

Next, we need to define our training and test datasets. The trick here is to designate some fraction of our total dataset to be used for training, and then use the remainder to validate our model. 

In [None]:
total_records = len(randomized_data)
training_set_size_portion = 0.8
training_set_size = int(total_records*training_set_size_portion)
test_set_size = total_records - training_set_size
print(total_records,training_set_size,test_set_size)

We can generate our test set by applying the tail function to randomized_data:

In [None]:
# Building the testing features and labels
testing_features = randomized_data.tail(test_set_size)[features].copy()
testing_labels = randomized_data.tail(test_set_size)[labels].copy()

Again, we can verify the content of the test data set:

In [None]:
testing_features.head()

...and see that the indices for the targets match those of the features:

In [None]:
testing_labels.head()

Next, we build our training set from the other portion of randomized_data and confirm the same about its indices:

In [None]:
training_features = randomized_data.head(training_set_size)[features].copy()
training_labels = randomized_data.head(training_set_size)[labels].copy()
print(training_features.head())
print(training_labels.head())
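As noted in the plan of action, the split should ideally come before data-dependent preparation. In this notebook, the scaler was fit on the full dataset, so the test set's minimums and maximums leak into training. The minimal leakage-free sketch below swaps in scikit-learn's `train_test_split`, with stratification, for the manual head/tail split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# Split first (stratified so both subsets keep the benign/malignant ratio)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=0)

# Fit the scaler on the training data only, then apply the same transform to both
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
print(X_train_scaled.shape, X_test_scaled.shape)
```

Test-set values may fall slightly outside [0, 1] after this transform. That is expected, because the scaler only knows the training range.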

Lastly, we want to define our feature columns for TensorFlow:

In [None]:
feature_columns = [tf.feature_column.numeric_column(key) for key in features]

In [None]:
print(feature_columns)

<font size=5 color=B3A369><u><b>6. Select ML Algorithm and Train</b></u></font>

There are many models that can be used to classify whether a mass is benign or malignant. In this example, we will use a neural network, a mathematical model loosely inspired by how neurons in the brain work.

The strength of neural networks has been demonstrated by their success on certain problems, especially classification. In this problem, we expect an underlying pattern linking the measurements to the cancer outcome (otherwise, how would a physician's determination be better than a random one?). Developing a neural network to classify each patient's data as malignant or benign therefore seems a fruitful approach.

<font size=4 color=B3A369><b>Neural Networks</b></font>

Neural networks are a type of machine learning algorithm inspired by neurons in the human brain. Just as neurons in the brain are interconnected and interact with each other, neural networks are formed from interconnected artificial neurons. Each neuron takes an input, performs a simple computation on it, and passes its output to the next neuron.

Let us look at a perceptron, that is, a single-layer neural network. 

The *perceptron* is a mathematical function that takes a set of inputs, performs some operation, and outputs the result. In this case,
$$ y = \sum_{i} w_{i}x_{i} + w_0,$$
where $x_i$ are the inputs, $w_i$ are the weights, and $w_0$ is the bias. Note that this is the equation of a line (or plane, hyperplane, ...). The weights determine the importance of each input. The bias shifts the activation function curve left or right, which moves the decision threshold.

The result of the perceptron acting on the inputs is then passed to the activation function, which determines how to classify the input. 
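A perceptron can be sketched in NumPy using a step activation. The step function is one simple choice; smooth activations such as the sigmoid are more common in neural networks. The weights here are hand-picked assumptions rather than learned values, and they make the perceptron compute logical AND.

```python
import numpy as np

def perceptron(x, w, w0):
    """Weighted sum y = sum_i w_i x_i + w_0, passed through a step activation."""
    y = np.dot(w, x) + w0
    return 1 if y > 0 else 0

# Hand-picked (not learned) weights that make the perceptron compute logical AND
w = np.array([1.0, 1.0])
w0 = -1.5
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [perceptron(np.array(x), w, w0) for x in inputs]
print(outputs)  # [0, 0, 0, 1]
```

With $w_0 = -1.5$, the decision boundary is the line $x_1 + x_2 = 1.5$. Changing the bias moves that line without changing its orientation.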

<font size=4 color=B3A369><b>Architecture of Neural Networks</b></font>

A neural network consists of 
* An input layer 
* Any number of hidden layers (these are called hidden because their outputs are not directly seen by an external observer)
* An output layer
* A set of weights and biases between each pair of layers, $\{w_i\}, \{b_i\}$
* An activation function for each layer, $\sigma$

<img src='image/neural_network_1.png'>

<font size=4 color=B3A369><b>Training Process</b></font>

Each iteration of the training process consists of the following steps:
1. Calculating the predicted output $\hat{y}$, known as _*Feedforward*_
2. Updating the weights and biases, known as _*Backpropagation*_

Schematically, this can be illustrated as 
<img src='image/nn_iteration.png'>

<font size=3 color=B3A369><b>Feedforward</b></font>

The forward pass simply evaluates the network in series. Each neuron computes the sum of the products of the weights and the activations that lead into it (plus its bias), then applies its activation function. In this way, we move forward through the network.

The loss function comes into play at this point, since we must measure the "goodness" of our prediction.
There are many possible choices for the *loss* function, such as the familiar *sum-of-squares error*, where $y_i$ is the true label and $\hat{y}_i$ the prediction for sample $i$ of $n$:
$$ \mathrm{loss} = \sum_{i=1}^n (y_i-\hat{y}_i)^2$$
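A feedforward pass and its sum-of-squares loss can be sketched in NumPy. The sketch uses a small, hypothetical 3-5-1 network with sigmoid activations and random data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.random((4, 3))              # 4 hypothetical samples, 3 input features
y = np.array([0.0, 1.0, 1.0, 0.0])  # hypothetical labels

# Randomly initialized weights and zero biases for a 3 -> 5 -> 1 network
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

# Feedforward: each layer takes a weighted sum plus bias, then applies the activation
hidden = sigmoid(X @ W1 + b1)
y_hat = sigmoid(hidden @ W2 + b2).ravel()

loss = np.sum((y - y_hat) ** 2)  # sum-of-squares error
print(y_hat, loss)
```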

<font size=3 color=B3A369><b>Backpropagation</b></font>

Having measured the error of our prediction, we can use it to improve the network. This is termed *backpropagation*: working backward from the output layer, we determine how much each weight and bias contributed to the error so that they can be updated.

This optimization works by minimizing the loss function. There are multiple methods for optimizing such high-dimensional functions. A popular one uses the derivative (gradient) of the loss with respect to the weights to step in the direction of greatest decrease, as in *gradient descent*.
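The following sketch ties feedforward, backpropagation, and gradient descent together. It trains a single sigmoid neuron on a toy AND dataset using the sum-of-squares loss above. The learning rate, step count, and zero initialization are arbitrary assumptions. The gradient comes from applying the chain rule by hand.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)  # logical AND as a toy target
w, b = np.zeros(2), 0.0
learning_rate = 0.5

def sse(w, b):
    return np.sum((y - sigmoid(X @ w + b)) ** 2)

initial_loss = sse(w, b)
for _ in range(2000):
    y_hat = sigmoid(X @ w + b)  # feedforward
    # Chain rule for loss = sum (y - y_hat)^2 with y_hat = sigmoid(z):
    # dloss/dz = -2 (y - y_hat) * y_hat * (1 - y_hat)
    dz = -2.0 * (y - y_hat) * y_hat * (1.0 - y_hat)
    w -= learning_rate * (X.T @ dz)  # step opposite the gradient
    b -= learning_rate * dz.sum()
final_loss = sse(w, b)
print(initial_loss, final_loss)
```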

<font size=4 color=B3A369><b>Hyperparameters</b></font>

*Hyperparameters* are the *variables that determine the network structure* and *how the network is trained*. Examples include the *learning rate*, the number of *epochs*, the *batch size*, and the number of *iterations*. These parameters are not learned by the network, so they must be specified by the model designer. 

An *epoch* is one complete pass of the entire training dataset forward and backward through the network. When the data is split into batches, the parameters (weights and biases) are updated after every batch, so a single epoch contains several updates. In short, `batch_size * iterations_per_epoch >= number_of_samples`.

An *iteration* is one update step on a single *batch*; the number of iterations per epoch is the number of batches needed to cover the dataset.

In some cases, the dataset must be divided into *batches* so that everything fits in memory during the calculations. Many ML frameworks natively support batching, but sometimes you may have to specify the batches manually.
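The relationship between these quantities is simple arithmetic. This sketch assumes the 455-sample training set from step 5 and a batch size of 128, which is the default `batch_size` of `pandas_input_fn`. It also assumes each epoch ends with its own partial batch.

```python
import math

n_samples = 455    # training-set size from step 5: int(569 * 0.8)
batch_size = 128   # assumed: pandas_input_fn's default batch_size
num_epochs = 15

iterations_per_epoch = math.ceil(n_samples / batch_size)  # the last batch is partial
total_iterations = iterations_per_epoch * num_epochs
print(iterations_per_epoch, total_iterations)  # 4 60
```

Fifteen epochs therefore correspond to only a few dozen update steps, well below the `steps=2000` passed to `classifier.train` later. Estimator training ends when either the step limit is reached or the input function runs out of data.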

<font size=4 color=B3A369><b>Let's Try It Out</b></font>

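For our initial attempt, let's define a DNN classifier with four hidden layers of 12, 10, 9, and 8 units. The heuristic $n_k \approx \mathrm{round}(2 n_{k-1}/3)$ for shrinking successive layers serves only as a loose guide here; the sizes below do not follow it exactly.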

In [None]:
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,hidden_units=[12,10,9,8], n_classes=2,model_dir='tmp/model')

<font size=4 color=B3A369><b>Train the Network</b></font>

Next, we define the training input function. 

The function that does this is 

`train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(x=training_features, y=training_labels['target'], num_epochs=15, shuffle=True)`

In this case, we will pass through the dataset 15 times, updating the weights and biases based on the loss.
See <https://www.tensorflow.org/api_docs/python/tf/estimator/inputs/pandas_input_fn> for complete documentation of the function.


In [None]:
train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(x=training_features,y=training_labels['target'],num_epochs=15,shuffle=True)

In [None]:
print(type(training_features['mean_radius']), type(training_labels['target']))

**Note:** If you are rerunning the calculation, it may be necessary to clean out the `tmp` directory, since the Estimator resumes from any checkpoints already saved in its `model_dir`.
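One way to do this from within the notebook is sketched below. The paths are the `model_dir` values used in this workshop. Removing them discards any saved checkpoints.

```python
import shutil
from pathlib import Path

for model_dir in ["tmp/model", "tmp/model2"]:
    # Delete saved checkpoints so the next train() call starts from scratch;
    # ignore_errors=True skips directories that do not exist yet
    shutil.rmtree(model_dir, ignore_errors=True)
    print(model_dir, "exists:", Path(model_dir).exists())
```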

In [None]:
classifier.train(input_fn=train_input_fn,steps=2000)

<font size=5 color=B3A369><u><b>7. Test Model</b></u></font>

Now that we've trained our model, let's test it out:

In [None]:
test_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(x=testing_features,y=testing_labels['target'],num_epochs=1,shuffle=False)  # one pass: each test sample is evaluated once

In [None]:
classifier.evaluate(input_fn=test_input_fn)

Our accuracy wasn't particularly high - and in fact, due to the stochastic nature of training, we can rerun it and get wildly different results (I've seen accuracy range from 50% to 95%!).

So how do we improve our model? There are multiple approaches we can consider:
- Increase hidden layers
- Change activation function
- Change activation function in output layer
- Increase number of neurons
- Weight initialization
- More data
- Normalization/scaling data
- Change learning algorithm parameters
- Change our algorithm

As an example, let's say we simply want to change algorithms and consider something simpler. For binary classification such as this, a linear model (`LinearClassifier`, which performs logistic regression when `n_classes=2`) is often a great starting point. (Note: a well-tuned neural network can achieve higher overall accuracy, but with such a small dataset there are many knobs to tune.)

In [None]:
classifier2 = tf.estimator.LinearClassifier(feature_columns=feature_columns,n_classes=2,model_dir='tmp/model2')

In [None]:
classifier2.train(input_fn=train_input_fn,steps=2000)

In [None]:
classifier2.evaluate(input_fn=test_input_fn)

So we can see that the accuracy improved, and consistently so - this is likely due to a variety of reasons, including:
- small data size (more susceptible to random behavior)
- less than optimal features (there were other, more highly correlated features)
- naive NN design (number of hidden layers, number of neurons, etc.)
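A single random split of a small dataset can give very different accuracies. k-fold cross-validation, one of the resampling techniques mentioned at the start, gives a more stable estimate. For brevity, this sketch swaps in scikit-learn's `LogisticRegression` for the TensorFlow estimators. The five folds and the seed are arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

data = load_breast_cancer(as_frame=True)
# The same seven mean features as step 4 (scikit-learn's names use spaces)
features = ["mean radius", "mean concave points", "mean perimeter", "mean area",
            "mean smoothness", "mean concavity", "mean texture"]
X, y = data.data[features], data.target

# The scaler is refit inside each fold, so no fold's test data influences its scaling
model = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(scores.round(3), scores.mean().round(3), scores.std().round(3))
```

The spread of the five scores gives a rough sense of how much any single split's accuracy can be trusted.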

Again, the focus here was less on ML theory and more on the tools, and we can still appreciate their benefits regardless of which model performed best.

[Please provide feedback!](https://gatech.co1.qualtrics.com/jfe/form/SV_55uzMYLufTuiLch)