# Introduction to Data Science
## A Machine Learning Review
***

Over the course of the semester we have discussed many different data science techniques and explored a lot of Python code for putting these concepts to use. The goal of this notebook is to review some of these concepts while tying them together. We will also review some of the Python code we used to apply everything we've learned to real data.

## 1. A (Quick) Discussion on Python and Package Control
Python makes use of *many* packages to do a wide range of tasks. Some of these packages are maintained by the same people who work on the Python programming language. Others are created by third-party teams. There are packages for basic tasks like simple math and telling time. Other packages are used mainly for handling data, scientific computing, or machine learning.

### 1.1 Data Structures
In addition to storing strings, integers, and decimal numbers (floats) in Python, we have been using two main data structures: Python *lists* and *dictionaries*. I want to explain the difference between these two very briefly.

Lists and dictionaries are both key-value stores: given a key (location) you can retrieve a value. A list uses ordered integer positions (indices) as its keys. A dictionary uses keys that you choose (such as strings), and those keys have no positional order. For example,

In [1]:
my_list = [5, 2, 6, 1, 7]
my_list_2 = ['bob', 'natalia', 'panos', 'michelle', 'foster']
print "First list: %s" % str(my_list)
print "Second list: %s" % str(my_list_2)
print "The first (0'th) element in the first list is: %d" % my_list[0]
print "The first (0'th) element in the second list is: %s" % my_list_2[0]
print "The last element in the first list is : %d" % my_list[-1]
print "The last element in the second list is : %s" % my_list_2[-1]

First list: [5, 2, 6, 1, 7]
Second list: ['bob', 'natalia', 'panos', 'michelle', 'foster']
The first (0'th) element in the first list is: 5
The first (0'th) element in the second list is: bob
The last element in the first list is : 7
The last element in the second list is : foster


In [2]:
my_dictionary = {'bob': 5, 'natalia': 2, 'panos': 6, 'michelle': 1, 'foster': 7}
print my_dictionary
print "The value for 'bob' is %d" % my_dictionary['bob']
print "The value for 'panos' is %d" % my_dictionary['panos']

{'michelle': 1, 'bob': 5, 'foster': 7, 'panos': 6, 'natalia': 2}
The value for 'bob' is 5
The value for 'panos' is 6


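Notice that the list printed in order while the dictionary did not! (This notebook runs Python 2; since Python 3.7 dictionaries remember insertion order.) Either way, I **cannot** refer to the first item of a dictionary by position. Values are looked up by key, not by location!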
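
To make the difference concrete, here is a minimal sketch. The dictionary `ages` and the other variable names are made up for illustration. It shows that a dictionary is looked up by key rather than by position, and that you can impose an order yourself by sorting the keys.

```python
# Hypothetical example data, not from the course data set.
ages = {'bob': 5, 'natalia': 2, 'panos': 6}

# Look up by key, never by position.
bob_age = ages['bob']

# Asking for "position 0" fails because 0 is not one of the keys.
try:
    ages[0]
    position_lookup_worked = True
except KeyError:
    position_lookup_worked = False

# If you need a predictable order, sort the keys yourself.
names_in_order = sorted(ages)
for name in names_in_order:
    print("%s -> %d" % (name, ages[name]))
```

Sorting the keys is only one option; the point is that any ordering is something you ask for, not something you index into.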

### 1.2 Packages

#### 1.2.1. General Use
These packages are used mainly to coordinate and structure your Python code. You can use `time` and `datetime` to keep track of how long it takes to run certain tasks or to format dates and times. The `os` and `sys` packages let you make calls to the computer and access programs outside of Python (e.g. the command line!). You can use `math` to do mathematical operations slightly more advanced than addition, subtraction, etc. (e.g. logarithms and square roots). The `re` package handles regular expressions for searching text.

In [3]:
import time
import datetime
import os
import sys
import math
import re
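
As a small illustration of the timing use case, here is a sketch that measures how long a made-up computation takes. The workload (summing square roots) is just a placeholder for whatever task you actually want to time.

```python
import math
import time

start = time.time()

# Placeholder workload: sum the square roots of the first 100,000 integers.
total = sum(math.sqrt(i) for i in range(100000))

elapsed = time.time() - start
print("Computed a total of %.1f in %.4f seconds" % (total, elapsed))
```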

#### 1.2.2. Data Handling
Python comes with packages for reading `csv` and `json` files natively. If you want to use something with more features, `pandas` is useful for creating data frames (a common data structure used in data science and machine learning). Some of you may be dealing with HTML data from web pages and will find Beautiful Soup 4 (`bs4`) useful.

In [4]:
import csv
import json
import pandas as pd
# import bs4

You may notice that when I imported pandas I decided to call it `pd`. This isn't necessary but is commonly used to give long packages a shorter name so that typing and reading them is easier.

#### 1.2.3. Scientific Computing
The `numpy` and `scipy` packages are probably two of the most popular Python packages. They give you the ability to use arrays and matrices (both dense and sparse). They also provide a ton of basic operations (max, min, argmax, argmin, etc.). For those of you with Matlab experience, you may notice a lot of similarity, since much of numpy's array interface resembles Matlab's.

In [5]:
import numpy as np
import scipy

Many people have asked why I use `np.max([1,2,3,4])` instead of just using Python's default function, `max([1,2,3,4])`. The answer is... I just happened to use the numpy version :) You can use whichever you like.
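
One place where the numpy version does earn its keep is on two-dimensional data, where you can ask for the maximum along rows or columns. A quick sketch with a made-up matrix:

```python
import numpy as np

values = [1, 2, 3, 4]
# On a flat list the two functions agree.
same_answer = (max(values) == np.max(values))

# A made-up 2x2 matrix for illustration.
matrix = np.array([[1, 5],
                   [7, 2]])

overall_max = np.max(matrix)          # largest entry anywhere
column_max = np.max(matrix, axis=0)   # largest entry in each column
row_max = np.max(matrix, axis=1)      # largest entry in each row

print("Overall: %d" % overall_max)
print("Per column: %s" % column_max.tolist())
print("Per row: %s" % row_max.tolist())
```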

#### 1.2.4. Machine Learning
The package we have been using all semester to do machine learning, scikit-learn (`sklearn`), is one of the most popular machine learning packages currently in use. Throughout the semester you have probably noticed that we have been using a *ton* of different functions and features. The documentation on sklearn is vast.

In [6]:
import sklearn

You may have also noticed that I often do something like

In [7]:
from sklearn import metrics
print "The accuracy is %.2f" % metrics.accuracy_score([1,1], [1,1])

The accuracy is 1.00


But I could also do something like this

In [8]:
from sklearn.metrics import accuracy_score
print "The accuracy is %.2f" % accuracy_score([1,1], [1,1])

The accuracy is 1.00


There is no correct way of doing it. It's just a matter of preference.

## 2. The Data Science Workflow
&nbsp;
<div style="float: left; width: 50%">
We've talked about the "data science workflow" a lot throughout the semester, but I just want to remind everyone of what it looks like.

<ol style="padding: 20px 0;">
<li>Business understanding</li>
<li>Data understanding</li>
<li>Data Preparation</li>
<li>Modeling</li>
<li>Evaluation</li>
<li>Deployment</li>
</ol>

While you have been working a lot on the business understanding phase of your project recently (as well as the others, I hope!), today we are going to focus a bit more on summarizing the hands-on skills you have learned.
</div>
<div style="float: left; width: 40%">
<img src="images/workflow.png" width="100%"/>
</div>

## 3. Data Exploration and Cleaning
We talked about this in our very first class and went on to mention it a few more times throughout the semester. However, I'd like to review some of it again given some common questions I've been getting.

### 3.1. Structured Data
Almost all of the data we have dealt with so far can be called *structured* data. This means that every record in the data set is organized and structured in some machine readable way. The three most popular ways of storing structured data are:

- **.csv or .tsv** - Can be thought of as rows and columns, where each column will represent a single feature. All rows must have something for each column.
- **JSON** - Looks similar to Python dictionaries. Each record is an unordered collection of `key: value` pairs.
- **XML** - Nested tags, similar in spirit to HTML.

The layout of any of these data types might seem straightforward, but there can be tons of complications. A file ending in `.csv` is *not* guaranteed to be well structured. It is still just a text file.
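
To see how the first two formats relate, here is a sketch that writes one made-up record as both JSON and CSV using only the standard library. The record itself is invented for illustration, and the snippet assumes Python 3 (where `io.StringIO` accepts ordinary strings).

```python
import csv
import io
import json

# A made-up record, loosely modeled on the columns used later in this section.
record = {"age": 63.0, "satisfaction": "neutral", "bio": "Feugiat diam, at ipsum."}

# JSON keeps the key next to every value.
json_text = json.dumps(record, sort_keys=True)

# CSV keeps only the values; the column order carries the meaning.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([record["age"], record["satisfaction"], record["bio"]])
csv_text = buffer.getvalue()

print(json_text)
print(csv_text)
```

Notice that the writer wraps the `bio` field in quotes on its own because it contains a comma. Files that skip this step are the ones that cause trouble.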

In [9]:
!head data/strings_ugly.csv

63.0,neutral,eu,3.952074059758619,-19.48620784914078,Sodales "vivamus" in, risus molestie, egestas in.,0
28.0,neutral,sa,3.5295183836057595,51.284180040232215,Pellentesque arcu sed.,1
37.0,high,sa,4.254526317975149,97.34526006557826,Neque odio, in nulla, lorem nec.,0
42.0,high,sa,4.924077485580787,80.24260604790156,Lorem non pretium.,0
56.0,high,af,6.436250132712625,42.78962533750958,Sem dictum dolor.,0
40.0,neutral,af,4.576757605316351,-1.0876572412988317,Neque condimentum.,1
69.0,neutral,eu,5.365851342999525,-15.770934329395772,Nisl fames ipsum, amet laoreet.,0
44.0,high,sa,2.912293368604344,73.85944600120466,Arcu quisque, vitae turpis integer, fusce luctus.,1
63.0,neutral,eu,4.376757476733249,3.9510213482794034,Feugiat diam, at ipsum.,0
56.0,neutral,eu,3.461138913269333,-46.426105926443086,Metus elit.,1


In [10]:
data = pd.read_csv("data/strings_ugly.csv")
data.head()

Unnamed: 0,63.0,neutral,eu,3.952074059758619,-19.48620784914078,"Sodales ""vivamus"" in",risus molestie,egestas in.,0
0,28,neutral,sa,3.529518,51.28418,Pellentesque arcu sed.,1,,
1,37,high,sa,4.254526,97.34526,Neque odio,in nulla,lorem nec.,0.0
2,42,high,sa,4.924077,80.242606,Lorem non pretium.,0,,
3,56,high,af,6.43625,42.789625,Sem dictum dolor.,0,,
4,40,neutral,af,4.576758,-1.087657,Neque condimentum.,1,,


That doesn't look anywhere close to what it should be. The first row was treated as a header, and the rows ended up with different numbers of fields. We can explicitly tell pandas which columns to expect.

In [11]:
# This is going to kill the notebook :(
# data = pd.read_csv("data/strings_ugly.csv", names=['age', 'satisfaction', 'location', 'time_spent', 'income', 'bio', 'purchased'])
# data.head()

Again, that can't be right. If you look at the data you'll see that there are commas inside one of the fields. We need to enclose those fields in quotes.

In [12]:
!head data/strings_quoted.csv

63.0,neutral,eu,3.952074059758619,-19.48620784914078,"Sodales "vivamus in, risus molestie, egestas in.",0
28.0,neutral,sa,3.5295183836057595,51.284180040232215,Pellentesque arcu sed.,1
37.0,high,sa,4.254526317975149,97.34526006557826,"Neque odio, in nulla, lorem nec.",0
42.0,high,sa,4.924077485580787,80.24260604790156,Lorem non pretium.,0
56.0,high,af,6.436250132712625,42.78962533750958,Sem dictum dolor.,0
40.0,neutral,af,4.576757605316351,-1.0876572412988317,Neque condimentum.,1
69.0,neutral,eu,5.365851342999525,-15.770934329395772,"Nisl fames ipsum, amet laoreet.",0
44.0,high,sa,2.912293368604344,73.85944600120466,"Arcu quisque, vitae turpis integer, fusce luctus.",1
63.0,neutral,eu,4.376757476733249,3.9510213482794034,"Feugiat diam, at ipsum.",0
56.0,neutral,eu,3.461138913269333,-46.426105926443086,Metus elit.,1


In [13]:
# This will also kill the notebook :(
# data = pd.read_csv("data/strings_quoted.csv", names=['age', 'satisfaction', 'location', 'time_spent', 'income', 'bio', 'purchased'], 
#                    quotechar="\"")
# data.head()

Getting closer, but it looks like we also have quotes inside the quoted string. We have to escape them.

In [14]:
!head data/strings_escaped.csv

63.0,neutral,eu,3.952074059758619,-19.48620784914078,"Sodales \"vivamus\" in, risus molestie, egestas in.",0
28.0,neutral,sa,3.5295183836057595,51.284180040232215,Pellentesque arcu sed.,1
37.0,high,sa,4.254526317975149,97.34526006557826,"Neque odio, in nulla, lorem nec.",0
42.0,high,sa,4.924077485580787,80.24260604790156,Lorem non pretium.,0
56.0,high,af,6.436250132712625,42.78962533750958,Sem dictum dolor.,0
40.0,neutral,af,4.576757605316351,-1.0876572412988317,Neque condimentum.,1
69.0,neutral,eu,5.365851342999525,-15.770934329395772,"Nisl fames ipsum, amet laoreet.",0
44.0,high,sa,2.912293368604344,73.85944600120466,"Arcu quisque, vitae turpis integer, fusce luctus.",1
63.0,neutral,eu,4.376757476733249,3.9510213482794034,"Feugiat diam, at ipsum.",0
56.0,neutral,eu,3.461138913269333,-46.426105926443086,Metus elit.,1


In [15]:
import pandas as pd
data = pd.read_csv("data/strings_escaped.csv", names=['age', 'satisfaction', 'location', 'time_spent', 'income', 'bio', 'purchased'], quotechar="\"", escapechar="\\")
data.head()

Unnamed: 0,age,satisfaction,location,time_spent,income,bio,purchased
0,63,neutral,eu,3.952074,-19.486208,"Sodales ""vivamus"" in, risus molestie, egestas in.",0
1,28,neutral,sa,3.529518,51.28418,Pellentesque arcu sed.,1
2,37,high,sa,4.254526,97.34526,"Neque odio, in nulla, lorem nec.",0
3,42,high,sa,4.924077,80.242606,Lorem non pretium.,0
4,56,high,af,6.43625,42.789625,Sem dictum dolor.,0


This can go on for a very long time until you find all the small nuances of your data file. Notice that we keep adding levels of complexity to our parser. Doing this at the command line is very tricky, which is why pandas and `read_csv()` are so useful. Unfortunately, many of the problems we just saw can only be solved by editing the raw data to conform to some kind of standard. Hopefully, most of your project data is already in a usable state!
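
If you don't have the data files handy, the same effect can be reproduced in memory. The sketch below uses the standard library's `csv` module rather than pandas (a swap made only so the example is self-contained), and the two rows are shortened, made-up versions of the first record above. It assumes Python 3.

```python
import csv
import io

# Shortened, made-up versions of the first record: one raw, one quoted and escaped.
ugly = u'63.0,neutral,eu,Sodales "vivamus" in, risus molestie.,0\n'
escaped = u'63.0,neutral,eu,"Sodales \\"vivamus\\" in, risus molestie.",0\n'

def parse(text, **options):
    """Return the fields of the first row in text."""
    return next(csv.reader(io.StringIO(text), **options))

# Without quoting, the comma inside the bio splits it into two fields.
ugly_fields = parse(ugly)

# With a quote character and an escape character, the bio survives intact.
escaped_fields = parse(escaped, quotechar='"', escapechar='\\')

print("Raw row has %d fields" % len(ugly_fields))
print("Escaped row has %d fields" % len(escaped_fields))
print("Recovered bio: %s" % escaped_fields[3])
```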

We seem to have three fields that aren't numeric (`satisfaction`, `location`, and `bio`). Since we need numeric features for all of our machine learning algorithms, let's convert them.

In [16]:
# Convert 'satisfaction' to be on a scale from -2 to +2
data['satisfaction'] = data['satisfaction'].replace(['very low', 'low', 'neutral', 'high', 'very high'], 
                                                    [-2, -1, 0, 1, 2])
data.head()

Unnamed: 0,age,satisfaction,location,time_spent,income,bio,purchased
0,63,0,eu,3.952074,-19.486208,"Sodales ""vivamus"" in, risus molestie, egestas in.",0
1,28,0,sa,3.529518,51.28418,Pellentesque arcu sed.,1
2,37,1,sa,4.254526,97.34526,"Neque odio, in nulla, lorem nec.",0
3,42,1,sa,4.924077,80.242606,Lorem non pretium.,0
4,56,1,af,6.43625,42.789625,Sem dictum dolor.,0


In [17]:
# We can convert location into dummy variables by binarizing.
# The last category is skipped on purpose: a row of all zeros identifies it.
for value in np.unique(data['location'])[0:-1]:
    data['location_' + value] = (data['location'] == value).astype(int)
data = data.drop(['location'], axis=1)
data.head()

Unnamed: 0,age,satisfaction,time_spent,income,bio,purchased,location_a,location_af,location_aus,location_eu,location_in,location_na
0,63,0,3.952074,-19.486208,"Sodales ""vivamus"" in, risus molestie, egestas in.",0,0,0,0,1,0,0
1,28,0,3.529518,51.28418,Pellentesque arcu sed.,1,0,0,0,0,0,0
2,37,1,4.254526,97.34526,"Neque odio, in nulla, lorem nec.",0,0,0,0,0,0,0
3,42,1,4.924077,80.242606,Lorem non pretium.,0,0,0,0,0,0,0
4,56,1,6.43625,42.789625,Sem dictum dolor.,0,0,1,0,0,0,0
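
pandas also has a built-in helper, `get_dummies`, that does the same kind of binarizing. Here is a sketch on a tiny made-up frame. Note one difference from the loop above: `drop_first=True` drops the *first* category rather than the last. The `.astype(int)` call is there because the default dummy dtype differs between pandas versions.

```python
import pandas as pd

# Tiny made-up frame for illustration.
frame = pd.DataFrame({'location': ['eu', 'sa', 'af', 'sa']})

# One 0/1 column per category, minus the first one ('af' here).
dummies = pd.get_dummies(frame['location'], prefix='location', drop_first=True).astype(int)

print(dummies)
```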


In [18]:
# Last up is our text data; let's use a binary vectorizer to convert it to numeric features

In [19]:
from sklearn.feature_extraction.text import CountVectorizer
binary_vectorizer = CountVectorizer(binary=True)
binary_vectorizer.fit(data['bio'])

vocabulary = binary_vectorizer.vocabulary_
# vocabulary maps word -> column index; build the column names in index order
bv_columns = [None] * len(vocabulary)
for word in vocabulary:
    bv_columns[vocabulary[word]] = "bio_" + word
    
bio_numeric = pd.DataFrame(binary_vectorizer.transform(data['bio']).todense(), columns=bv_columns)
data = pd.concat([data, bio_numeric], axis=1)
data = data.drop(['bio'], axis=1)
data.head()

Unnamed: 0,age,satisfaction,time_spent,income,purchased,location_a,location_af,location_aus,location_eu,location_in,...,bio_velit,bio_venenatis,bio_vestibulum,bio_vitae,bio_vivamus,bio_viverra,bio_voluptatem,bio_volutpat,bio_vulputate,bio_wisi
0,63,0,3.952074,-19.486208,0,0,0,0,1,0,...,0,0,0,0,1,0,0,0,0,0
1,28,0,3.529518,51.28418,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,37,1,4.254526,97.34526,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,42,1,4.924077,80.242606,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,56,1,6.43625,42.789625,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
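
To see what the binary vectorizer is doing without all the columns, here is a sketch on two made-up bios. It relies only on `vocabulary_`, which maps each word to its column index.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two made-up bios for illustration.
bios = ["Metus elit.", "Neque odio, in nulla, neque nec."]

vectorizer = CountVectorizer(binary=True)
matrix = vectorizer.fit_transform(bios)

# Recover the column names in column order from the word -> index mapping.
columns = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

print(columns)
print(matrix.toarray())
```

Even though "neque" appears twice in the second bio, its entry is 1, not 2. That is what `binary=True` means: presence or absence, not counts.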


While not required, you will often see people place their target variable into a Python variable called `Y` and put all their predictors into `X`.

In [20]:
X = data.drop(['purchased'], axis=1)
Y = data['purchased']

### 3.2. Unstructured Data
We've never really talked much about this in class, but some of you will have unstructured data in your projects. For example, web pages are a jumble of HTML tags. The formats between pages can be drastically different.
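
As a rough idea of what a first cleaning step might look like, here is a sketch that pulls the visible text out of a made-up HTML snippet. It uses the standard library's `html.parser` (Python 3) so that it runs without extra installs. Beautiful Soup, mentioned earlier, is a more full-featured option. The class name `TextExtractor` and the page contents are invented for this example.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text found between tags and ignores the tags themselves."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

# A made-up page for illustration.
page = "<html><body><h1>Store hours</h1><p>Open <b>9am</b> to 5pm</p></body></html>"

extractor = TextExtractor()
extractor.feed(page)
extractor.close()

page_text = " ".join(extractor.chunks)
print(page_text)
```

Once the text is out of the tags, it can go through the same vectorizing steps we used on the `bio` field.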

## 4. Modeling
We've covered two different kinds of modeling: supervised and unsupervised learning.

### 4.1. Supervised
Most of what we've done so far this semester involves having **labeled** data. For these data, we have a set of records where we know the value of the target variable. This allows us to learn some relationship between our feature set and the target variable. We've covered five machine learning algorithms that can do this. Here is a brief, and in no way comprehensive, overview.

<table>
<tr><td>Model</td>
<td>Overview</td>
<td>Pros</td>
<td>Cons</td>
<td>Use Case</td></tr>

<tr><td>Tree Structured</td>
 <td>Will create splits on any feature that gives maximum **information gain**.</td>
 <td>- Non-linear model (low bias) <br />
     - (specifically) Can arbitrarily carve up example space into "rectangular" regions<br />
     - Fast test for non-linearity in a data set</td>
 <td>- Separating planes will be perpendicular to a feature<br />
     - Prone to overfitting</td>
 <td>- Data with moderate numbers of (relevant) mixed numeric and categorical features</td></tr>
 
<tr><td>Logistic Regression</td>
 <td>Creates a hyperplane that can separate the data with the smallest **loss**.</td>
 <td>- Coefficients to interpret<br />
     - Low overfitting (low variance)</td>
 <td>- Coefficients to interpret<br />
     - No closed form solution<br />
     - Can be slow (need to think about optimization routine "under the hood")<br />
     - Will only learn "linear part" of true concept (high bias)</td>
 <td>- Always try it</td></tr>


<tr><td>SVM</td>
 <td>Creates a hyperplane that can separate the data with the maximal **margin**.</td>
 <td>- Different "kernels" available</td>
 <td>- No closed form solution<br />
     - Can be sloooooow </td>
 <td>- Text data</td></tr>

<tr><td>Naive Bayes</td>
 <td>Uses simple counts to calculate **conditional probabilities**.</td>
 <td>- Fast training<br />
     - Can be implemented with SQL queries (or in Excel!)</td>
 <td>- Treats all features as independent</td>
 <td>- Text data</td></tr>

<tr><td>k-NN</td>
 <td>Creates a **cluster** of the $k$-closest records and combines their labels (e.g., with majority voting or an average).</td>
 <td>- Works with any number of labels<br />
     - Very fast "learning" (lazy)</td>
 <td>- Very slow prediction</td>
 <td>- When choosing closest cases makes sense to the users/stakeholders</td></tr>
</table>
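
Since information gain drives the tree's splits, it may help to see it computed by hand. This is a minimal sketch on made-up labels, not how sklearn implements it internally. The function names `entropy` and `information_gain` are my own.

```python
import math

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = float(len(labels))
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / total
        result -= p * math.log(p, 2)
    return result

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = float(len(parent))
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Made-up labels: four positives and four negatives.
parent = [1, 1, 1, 1, 0, 0, 0, 0]

# A split that separates the classes perfectly, and one that does nothing.
gain_good = information_gain(parent, [[1, 1, 1, 1], [0, 0, 0, 0]])
gain_poor = information_gain(parent, [[1, 1, 0, 0], [1, 1, 0, 0]])

print("Gain of the clean split: %.2f" % gain_good)
print("Gain of the useless split: %.2f" % gain_poor)
```

A tree learner would pick the first split, since it removes all of the uncertainty about the label.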

### 4.2. Unsupervised
Unsupervised algorithms are used for data where there is no target variable or no labels for your target variable. This semester we covered two algorithms (and will cover a third one in the second half of class).

<table>
<tr><td>Model</td>
<td>Overview</td>
<td>Pros</td>
<td>Cons</td></tr>

<tr><td>K-Means</td>
 <td>Creates **$k$ clusters** where each record belongs to the cluster with the closest mean (center)</td>
 <td></td>
 <td>- k is very likely unknown<br />
     - Nondeterministic</td></tr>
 
<tr><td>Hierarchical Clustering</td>
 <td>Creates a hierarchy of progressively fewer, larger clusters by continually **merging clusters** that are closest together (starting from clusters that are single records)</td>
 <td>- The number of clusters does not need to be preset</td>
 <td></td></tr>

<!--<tr><td>Dimensionality Reduction</td>
 <td>Takes a set of records with $M$ features and reduces it to the top $m$ features (that explain the most variance), where $m < M$.</td>
 <td></td>
 <td></td></tr>-->
</table>
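
Here is a small K-Means sketch on made-up points that form two obvious groups. Setting `random_state` is one way to tame the nondeterminism listed in the table, since it fixes the random starting centers. The choice of `n_init=10` is just an assumption for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up points: three near (1, 1) and three near (8, 8).
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])

# We have to choose k ourselves; here the picture makes k=2 obvious.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)
centers = kmeans.cluster_centers_

print("Cluster labels: %s" % labels.tolist())
print("Cluster centers:")
print(centers)
```

Which group gets called cluster 0 and which gets called cluster 1 is arbitrary, so only compare labels within a single run.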

### 4.3 Implementation
All of these algorithms have an implementation in sklearn. Some algorithms, like SVM, have multiple implementations. Let's import one implementation of each.

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

Once you have imported your model, the general process for using it is the same.

In [None]:
model = DecisionTreeClassifier(criterion="entropy")
model.fit(X, Y) # there is no equal sign here!
prediction = model.predict(X)
probabilities = model.predict_proba(X)

It doesn't matter which model you are using; the pattern is always the same! (One caveat: a few models, such as `LinearSVC`, do not provide `predict_proba`.)

In [None]:
model = LogisticRegression()
model.fit(X, Y) # there is no equal sign here!
prediction = model.predict(X)
probabilities = model.predict_proba(X)
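
To underline how uniform the interface is, here is a sketch that loops over the five supervised models on a small made-up data set (`X_demo` and `Y_demo` are invented for this example). It also checks which models offer `predict_proba`, since not every one does.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier

# Made-up binary features; the label is 1 if either of the first two features is set.
rng = np.random.RandomState(0)
X_demo = rng.randint(0, 2, size=(40, 5))
Y_demo = X_demo[:, 0] | X_demo[:, 1]

models = [DecisionTreeClassifier(criterion="entropy"), LogisticRegression(),
          LinearSVC(), BernoulliNB(), KNeighborsClassifier()]

results = {}
for model in models:
    model.fit(X_demo, Y_demo)  # same call for every model
    name = type(model).__name__
    results[name] = {'prediction': model.predict(X_demo),
                     'has_proba': hasattr(model, 'predict_proba')}
    print("%s supports predict_proba: %s" % (name, results[name]['has_proba']))
```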

## 5. Assessing
How do we know if our model is any good? There are many ways of doing this! It depends on your particular use case.  Here are the two metrics we talked about most for classification/ranking. 

<table>
<tr><td>Metric</td>
 <td>Overview</td>
 <td>Pros</td>
 <td>Cons</td></tr>
<tr><td>Accuracy</td>
 <td>The percentage of things you got correct.</td>
 <td>- Easy to calculate and interpret</td>
 <td>- Doesn't account for business costs<br />
 - Doesn't account for baseline</td></tr>
<tr><td>ROC/AUC</td>
 <td>False positive rate vs. True positive rate.</td>
 <td>- Allows for fine grained assessment<br />
 - Deals with skew</td>
 <td>- Can be difficult to understand<br />
 - Exploring multiple ROC curves can become messy</td></tr>
</table>

Accuracy, ROC curves, and area under the ROC curve calculations are straightforward in sklearn.

In [None]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score

import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = 10, 8

In [None]:
model = LogisticRegression()
model.fit(X, Y)
prediction = model.predict(X)
probabilities = model.predict_proba(X)

print "The accuracy is %.3f" % accuracy_score(Y, prediction)
print "The AUC is %.3f" % roc_auc_score(Y, probabilities[:, 1])

fpr, tpr, thresholds = roc_curve(Y, probabilities[:, 1])
plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], '--')
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.show()
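
The "doesn't account for baseline" problem with accuracy is easy to see on skewed data. In this sketch the labels are made up (95 negatives, 5 positives) and the "model" simply predicts the majority class every time.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Made-up skewed labels: 95 negatives and 5 positives.
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that ignores its input and always says negative.
always_negative = np.zeros(100, dtype=int)
constant_scores = np.zeros(100)

baseline_accuracy = accuracy_score(y_true, always_negative)
baseline_auc = roc_auc_score(y_true, constant_scores)

print("Accuracy of always predicting negative: %.2f" % baseline_accuracy)
print("AUC of a constant score: %.2f" % baseline_auc)
```

The accuracy looks impressive even though the model has learned nothing, while the AUC sits at the level of random guessing. This is why it helps to know your baseline before celebrating a number.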

## 6. Assessing II (splitting)
What are the downsides to training and predicting on the same data? Overfitting! We know two ways to work around this: (1) train/test splitting, and (2) cross validation. Both of these things are, again, built into sklearn.

In [None]:
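# sklearn.cross_validation was removed in scikit-learn 0.20; these functions now live in model_selection
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score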

In [None]:
# Train/test splitting
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.75)

model = LogisticRegression()
model.fit(X_train, Y_train)
prediction = model.predict(X_test)
probabilities = model.predict_proba(X_test)

print "The accuracy is %.3f" % accuracy_score(Y_test, prediction)
print "The AUC is %.3f" % roc_auc_score(Y_test, probabilities[:, 1])

In [None]:
# Cross validation
model = LogisticRegression()

print "The accuracy is %.3f" % np.mean(cross_val_score(model, X, Y, cv=5, scoring="accuracy"))
print "The AUC is %.3f" % np.mean(cross_val_score(model, X, Y, cv=5, scoring="roc_auc"))
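
To see why evaluating on the training data is misleading, here is a sketch where the labels are pure noise, so there is nothing real to learn. The data are made up, and the imports use `sklearn.model_selection`, which is where these helpers live in current scikit-learn releases.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Made-up data: random features and labels that have nothing to do with them.
rng = np.random.RandomState(0)
X_noise = rng.normal(size=(400, 5))
Y_noise = rng.randint(0, 2, size=400)

X_train, X_test, Y_train, Y_test = train_test_split(
    X_noise, Y_noise, train_size=0.75, random_state=0)

# An unrestricted tree can memorize the training set.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, Y_train)

train_accuracy = accuracy_score(Y_train, model.predict(X_train))
test_accuracy = accuracy_score(Y_test, model.predict(X_test))

print("Accuracy on the training data: %.3f" % train_accuracy)
print("Accuracy on the held-out data: %.3f" % test_accuracy)
```

The tree scores perfectly on the data it memorized, but only the held-out number tells you how it would do on new records.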

## 7. Tuning and Complexity
By default, all the models we use in sklearn have some settings that manage how complex they are. We've learned quite a few of these complexity parameters (usually called hyperparameters) already. Finding the "best" value for them is usually called "hyperparameter tuning".

<table>
<tr><td>Model Type</td>
 <td>Parameter</td>
 <td>Explanation</td>
 <td>Good Range</td></tr>

<tr><td>Tree</td>
 <td>max_depth</td>
 <td>Maximum number of levels to build</td>
 <td>[1, log<sub>2</sub>(# records)]</td></tr>

<tr><td>Tree</td>
 <td>min_samples_split</td>
 <td>Minimum number of records that must be in a node for it to be split.</td>
 <td>[2, # records]</td></tr>
 
<tr><td>Tree</td>
 <td>min_samples_leaf</td>
 <td>Minimum number of records that must be at a node to call it a leaf.</td>
 <td>[1, # records]</td></tr>

<tr><td>LR</td>
 <td>C</td>
 <td>Regularization parameter. How heavily should the model be penalized for being complex?</td>
 <td>[10<sup>-10</sup>, 10<sup>10</sup>]</td></tr>

<tr><td>SVM</td>
 <td>C</td>
 <td>Similar to logistic regression</td>
 <td>[10<sup>-10</sup>, 10<sup>10</sup>]</td></tr>
 
<tr><td>NB</td>
 <td>alpha</td>
 <td>Smoothing constant. Essential to ensure 0 probabilities don't zero-out the product.  Used also to keep small counts from making a big difference.</td>
 <td>[0, ...]</td></tr>
 
<tr><td>k-NN</td>
 <td>k</td>
 <td>Number of neighbors, usually odd to avoid ties</td>
 <td>[1, number_records]</td></tr>
</table>

In [None]:
import numpy as np

hyper_parameters = range(-10, 11)
accuracies = []
aucs = []

for hyper_parameter in hyper_parameters:
    c = np.power(10.0, hyper_parameter)
    
    model = LogisticRegression(C=c)
    
    accuracies.append(np.mean(cross_val_score(model, X, Y, cv=5, scoring="accuracy")))
    aucs.append(np.mean(cross_val_score(model, X, Y, cv=5, scoring="roc_auc")))

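# hyper_parameters holds exponents, so convert back to the actual value of C before reporting
best_c_accuracy = np.power(10.0, hyper_parameters[np.argmax(accuracies)])
best_c_auc = np.power(10.0, hyper_parameters[np.argmax(aucs)])
print("Maximum accuracy is %.3f and occurred with C = %.3e" % (np.max(accuracies), best_c_accuracy))
print("Maximum AUC is %.3f and occurred with C = %.3e" % (np.max(aucs), best_c_auc))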
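
The same loop works for any row of the table. As a sketch, here is a sweep over a tree's `max_depth` using the suggested range of 1 up to log<sub>2</sub>(# records). The data set is made up for illustration, and the imports assume a current scikit-learn release.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Made-up data: the label depends on the sum of the first two features.
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 4))
Y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# Upper end of the suggested range: log2 of the number of records, rounded down.
max_depth_limit = int(np.log2(len(X_demo)))
depths = range(1, max_depth_limit + 1)

mean_accuracies = []
for depth in depths:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(model, X_demo, Y_demo, cv=5, scoring="accuracy")
    mean_accuracies.append(np.mean(scores))

best_depth = depths[int(np.argmax(mean_accuracies))]
print("Best max_depth is %d with accuracy %.3f" % (best_depth, np.max(mean_accuracies)))
```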