# Project 2: Supervised Learning
### Building a Student Intervention System

## 1. Classification vs Regression

Your goal is to identify students who might need early intervention - which type of supervised machine learning problem is this, classification or regression? Why?

It is a classification problem. Each student belongs to one of two groups: those who passed and those who did not. Because the target is one of two discrete classes rather than a continuous value, this is a (binary) classification problem.

## 2. Exploring the Data

Let's go ahead and read in the student dataset first.

_To execute a code cell, click inside it and press **Shift+Enter**._

In [2]:
# Import libraries
import numpy as np
import pandas as pd

In [3]:
# Read student data
student_data = pd.read_csv("student-data.csv")
print "Student data read successfully!"
# Note: The last column 'passed' is the target/label, all other are feature columns
print student_data

Student data read successfully!
    school sex  age address famsize Pstatus  Medu  Fedu      Mjob      Fjob  \
0       GP   F   18       U     GT3       A     4     4   at_home   teacher   
1       GP   F   17       U     GT3       T     1     1   at_home     other   
2       GP   F   15       U     LE3       T     1     1   at_home     other   
3       GP   F   15       U     GT3       T     4     2    health  services   
4       GP   F   16       U     GT3       T     3     3     other     other   
5       GP   M   16       U     LE3       T     4     3  services     other   
6       GP   M   16       U     LE3       T     2     2     other     other   
7       GP   F   17       U     GT3       A     4     4     other   teacher   
8       GP   M   15       U     LE3       A     3     2  services     other   
9       GP   M   15       U     GT3       T     3     4     other     other   
10      GP   F   15       U     GT3       T     4     4   teacher    health   
11      GP   F   15 

Now, can you find out the following facts about the dataset?
- Total number of students
- Number of students who passed
- Number of students who failed
- Graduation rate of the class (%)
- Number of features

_Use the code block below to compute these values. Instructions/steps are marked using **TODO**s._

In [4]:
# TODO: Compute desired values - replace each '?' with an appropriate expression/function call
n_students = len(student_data)
# The 'passed' column is the target label, so it is not counted as a feature.
n_features = student_data.dtypes.size -1 
n_passed = len(student_data[student_data.passed == "yes"])
n_failed = len(student_data[student_data.passed == "no"])
grad_rate = (float(n_passed)/n_students)*100
print "Total number of students: {}".format(n_students)
print "Number of students who passed: {}".format(n_passed)
print "Number of students who failed: {}".format(n_failed)
print "Number of features: {}".format(n_features)
print "Graduation rate of the class: {:.2f}%".format(grad_rate)

Total number of students: 395
Number of students who passed: 265
Number of students who failed: 130
Number of features: 30
Graduation rate of the class: 67.09%


## 3. Preparing the Data
In this section, we will prepare the data for modeling, training and testing.

### Identify feature and target columns
It is often the case that the data you obtain contains non-numeric features. This can be a problem, as most machine learning algorithms expect numeric data to perform computations with.

Let's first separate our data into feature and target columns, and see if any features are non-numeric.<br/>
**Note**: For this dataset, the last column (`'passed'`) is the target or label we are trying to predict.

In [5]:
# Extract feature (X) and target (y) columns
feature_cols = list(student_data.columns[:-1])  # all columns but last are features
target_col = student_data.columns[-1]  # last column is the target/label
print "Feature column(s):-\n{}".format(feature_cols)
print "Target column: {}".format(target_col)

X_all = student_data[feature_cols]  # feature values for all students
y_all = student_data[target_col]  # corresponding targets/labels
print "\nFeature values:-"
print X_all.head()  # print the first 5 rows

Feature column(s):-
['school', 'sex', 'age', 'address', 'famsize', 'Pstatus', 'Medu', 'Fedu', 'Mjob', 'Fjob', 'reason', 'guardian', 'traveltime', 'studytime', 'failures', 'schoolsup', 'famsup', 'paid', 'activities', 'nursery', 'higher', 'internet', 'romantic', 'famrel', 'freetime', 'goout', 'Dalc', 'Walc', 'health', 'absences']
Target column: passed

Feature values:-
  school sex  age address famsize Pstatus  Medu  Fedu     Mjob      Fjob  \
0     GP   F   18       U     GT3       A     4     4  at_home   teacher   
1     GP   F   17       U     GT3       T     1     1  at_home     other   
2     GP   F   15       U     LE3       T     1     1  at_home     other   
3     GP   F   15       U     GT3       T     4     2   health  services   
4     GP   F   16       U     GT3       T     3     3    other     other   

    ...    higher internet  romantic  famrel  freetime goout Dalc Walc health  \
0   ...       yes       no        no       4         3     4    1    1      3   
1   ...    

### Preprocess feature columns

As you can see, there are several non-numeric columns that need to be converted! Many of them are simply `yes`/`no`, e.g. `internet`. These can be reasonably converted into `1`/`0` (binary) values.

Other columns, like `Mjob` and `Fjob`, have more than two values, and are known as _categorical variables_. The recommended way to handle such a column is to create as many columns as possible values (e.g. `Fjob_teacher`, `Fjob_other`, `Fjob_services`, etc.), and assign a `1` to one of them and `0` to all others.

These generated columns are sometimes called _dummy variables_, and we will use the [`pandas.get_dummies()`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html?highlight=get_dummies#pandas.get_dummies) function to perform this transformation.
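
As a minimal sketch of what this transformation produces, the following pure-Python function builds the dummy columns for a single categorical column. The helper name `one_hot` and the toy `fjob` values are illustrative assumptions and are not part of the notebook, which relies on `pandas.get_dummies()` instead.

```python
def one_hot(values, prefix):
    """Return one 0/1 column per distinct value, keyed as '<prefix>_<value>'."""
    categories = sorted(set(values))
    return {
        "{}_{}".format(prefix, cat): [1 if v == cat else 0 for v in values]
        for cat in categories
    }

# Toy column (hypothetical sample of the 'Fjob' feature)
fjob = ["teacher", "other", "services", "other"]
dummies = one_hot(fjob, "Fjob")

# Each row has exactly one 1 across the generated columns
for name in sorted(dummies):
    print("{}: {}".format(name, dummies[name]))
```

Each original value ends up as a `1` in exactly one of the generated columns and a `0` in all the others.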

In [6]:
# Preprocess feature columns
def preprocess_features(X):
    outX = pd.DataFrame(index=X.index)  # output dataframe, initially empty
    # Check each column
    for col, col_data in X.iteritems():
        # If data type is non-numeric, try to replace all yes/no values with 1/0
        if col_data.dtype == object:
            col_data = col_data.replace(['yes', 'no'], [1, 0])
        # Note: This should change the data type for yes/no columns to int

        # If still non-numeric, convert to one or more dummy variables
        if col_data.dtype == object:
            col_data = pd.get_dummies(col_data, prefix=col)  # e.g. 'school' => 'school_GP', 'school_MS'
        outX = outX.join(col_data)  # collect column(s) in output dataframe

    return outX

X_all = preprocess_features(X_all)
print X_all.head
print "Processed feature columns ({}):-\n{}".format(len(X_all.columns), list(X_all.columns))

<bound method DataFrame.head of      school_GP  school_MS  sex_F  sex_M  age  address_R  address_U  \
0            1          0      1      0   18          0          1   
1            1          0      1      0   17          0          1   
2            1          0      1      0   15          0          1   
3            1          0      1      0   15          0          1   
4            1          0      1      0   16          0          1   
5            1          0      0      1   16          0          1   
6            1          0      0      1   16          0          1   
7            1          0      1      0   17          0          1   
8            1          0      0      1   15          0          1   
9            1          0      0      1   15          0          1   
10           1          0      1      0   15          0          1   
11           1          0      1      0   15          0          1   
12           1          0      0      1   15          0   

### Split data into training and test sets

So far, we have converted all _categorical_ features into numeric values. In this next step, we split the data (both features and corresponding labels) into training and test sets.

In [7]:
# First, decide how many training vs test samples you want
num_all = student_data.shape[0]  # same as len(student_data)
num_train = 300  # about 75% of the data
num_test = num_all - num_train

# TODO: Then, select features (X) and corresponding labels (y) for the training and test sets
# Note: Shuffle the data or randomly select samples to avoid any bias due to ordering in the dataset
from sklearn import cross_validation
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X_all,  y_all, test_size=0.24)

X_train_200 = X_train[0:200]
X_train_100 = X_train[0:100]

y_train_200 = y_train[0:200]
y_train_100 = y_train[0:100]

print "Training set: {} samples".format(X_train.shape[0])
print "Test set: {} samples".format(X_test.shape[0])
# Note: If you need a validation set, extract it from within training data

Training set: 300 samples
Test set: 95 samples
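
Because the split is random, the scores reported in the following sections can change from run to run. A minimal sketch of a reproducible split, assuming a fixed seed is acceptable, is shown here in pure Python. The helper `shuffled_split` and the seed value are hypothetical; in scikit-learn the same effect is usually obtained by passing `random_state` to `train_test_split`.

```python
import random

def shuffled_split(n_samples, n_train, seed):
    """Shuffle row indices with a fixed seed and cut them into train/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded generator, so the order is repeatable
    return indices[:n_train], indices[n_train:]

# Same sizes as in the notebook: 395 students, 300 of them for training
train_idx, test_idx = shuffled_split(n_samples=395, n_train=300, seed=42)
print("Training set: {} samples".format(len(train_idx)))
print("Test set: {} samples".format(len(test_idx)))
```

Calling the function again with the same seed returns the same two index lists, so the smaller training subsets and the test set stay fixed across runs.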


## 4. Training and Evaluating Models
Choose 3 supervised learning models that are available in scikit-learn, and appropriate for this problem. For each model:

- What are the general applications of this model? What are its strengths and weaknesses?
- Given what you know about the data so far, why did you choose this model to apply?
- Fit this model to the training data, try to predict labels (for both training and test sets), and measure the F<sub>1</sub> score. Repeat this process with different training set sizes (100, 200, 300), keeping test set constant.

Produce a table showing training time, prediction time, F<sub>1</sub> score on training set and F<sub>1</sub> score on test set, for each training set size.

Note: You need to produce 3 such tables - one for each model.
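
The F<sub>1</sub> score is the harmonic mean of precision and recall for the positive class: F<sub>1</sub> = 2 · precision · recall / (precision + recall). The sketch below computes it by hand for a handful of labels. The function name `f1_from_labels` and the toy labels are assumptions made for illustration; the notebook itself uses `sklearn.metrics.f1_score` with `pos_label='yes'`.

```python
def f1_from_labels(y_true, y_pred, pos_label="yes"):
    """Compute the F1 score of the positive class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)  # true positives
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)  # false positives
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive prediction: precision and recall are both 0
    precision = float(tp) / (tp + fp)
    recall = float(tp) / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels (hypothetical): 2 true positives, 1 false positive, 1 false negative
y_true = ["yes", "yes", "yes", "no", "no"]
y_pred = ["yes", "yes", "no", "yes", "no"]
score = f1_from_labels(y_true, y_pred)
print("F1 score: {:.3f}".format(score))
```

With precision and recall both equal to 2/3 in this toy case, the F<sub>1</sub> score is also 2/3.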

In [8]:

class ListTable(list):
    """ Overridden list class which renders an HTML Table in IPython Notebook. """
    def __init__(self, classifier):
        self.classifier = classifier
    
    def _repr_html_(self):
        html = ["<table>"]
        html.append("<tr>")
        html.append("<th rowspan='2'>"+self.classifier+"</th><th colspan='3'>Training set size</th>")
        html.append("</tr>")
        html.append("<tr>")
        html.append("<td> 100 </td>")
        html.append("<td> 200 </td>")
        html.append("<td> 300 </td>")        
        html.append("</tr>")
        for row in self:
            html.append("<tr>")
            
            for col in row:
                html.append("<td>{0}</td>".format(col))
            
            html.append("</tr>")
        html.append("</table>")
        return ''.join(html)
def insertData(classifier, t1, t2, t3, p1, p2, p3, ft1, ft2, ft3, fte1, fte2, fte3):
    table = ListTable(classifier)
    table.append(['Training time (secs)', t1, t2, t3])
    table.append(['Prediction time (secs)', p1, p2, p3])
    table.append(['F1 score for training set', ft1, ft2, ft3])
    table.append(['F1 score for test set', fte1, fte2, fte3])
    return table    
def transferIntoTable(classifier, sample100, sample200, sample300):
    return insertData(classifier, sample100.t, sample200.t, sample300.t, sample100.p, sample200.p, sample300.p, sample100.f1train, sample200.f1train, sample300.f1train, sample100.f1scoretest, sample200.f1scoretest, sample300.f1scoretest)

### Naive Bayes
Naive Bayes is a probabilistic classification algorithm that applies Bayes' theorem to estimate how likely each class is given the input features, and then predicts the most probable class. Its parameters are estimated by maximum likelihood. One advantage of Naive Bayes is that it needs only a small amount of training data to estimate these parameters, compared with many other machine learning algorithms. Because it assumes that the features are conditionally independent given the class, each feature distribution can be estimated separately as a one-dimensional distribution. This mitigates the curse of dimensionality, i.e. the rapid growth in the amount of data needed as more features are added.
The main disadvantage of Naive Bayes is that if a certain feature value never occurs together with a certain class label in the training data, the estimated probability for that combination is zero. Because the independence assumption means the per-feature probabilities are multiplied together, a single zero makes the whole product zero.
I mainly chose Naive Bayes because there is not much data in this example and it can still produce good parameter estimates from a small sample.
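
The zero-probability problem can be illustrated with a few lines of arithmetic. The counts below are made up, and the add-one (Laplace) smoothing shown as a remedy is a common technique rather than something this notebook applies. Note also that `GaussianNB`, used in the next cell, models each feature with a Gaussian, so this sketch describes the count-based variant of Naive Bayes.

```python
def likelihood(count, class_total, n_values, alpha=0.0):
    """Estimate P(feature value | class) from counts, with optional additive smoothing."""
    return (count + alpha) / float(class_total + alpha * n_values)

# Hypothetical counts among 10 students who passed, for two binary features:
# 6 of them have the first feature value, 0 of them have the second.
counts = [6, 0]
class_total = 10

unsmoothed = 1.0
smoothed = 1.0
for c in counts:
    unsmoothed *= likelihood(c, class_total, n_values=2)           # plain frequency
    smoothed *= likelihood(c, class_total, n_values=2, alpha=1.0)  # add-one smoothing

print("Without smoothing: {:.4f}".format(unsmoothed))
print("With add-one smoothing: {:.4f}".format(smoothed))
```

Without smoothing, the single zero count wipes out the evidence from the other feature; with smoothing the product stays small but non-zero.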

In [9]:
# Train a model
import time

def train_classifier(clf, X_train, y_train):
    print "Training {}...".format(clf.__class__.__name__)
    start = time.time()
    clf.fit(X_train, y_train)
    end = time.time()
    difference = round((end - start),3)
    print "Done!\nTraining time (secs): {:.3f}".format(difference)
    return difference

# TODO: Choose a model, import it and instantiate an object
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()

# Fit model to training data
train_classifier(clf, X_train, y_train)  # note: using entire training set here
print clf# you can inspect the learned model by printing it

Training GaussianNB...
Done!
Training time (secs): 0.002
GaussianNB()


In [10]:
def f1score(classifier, xtest, ytest, dataset):
    predicted = predict_labels(classifier, xtest, ytest)
    print "F1 score for {0} set({1}): {2}".format(dataset, xtest.shape[0], predicted[1])
    return predicted

In [11]:
# Predict on training set and compute F1 score
from sklearn.metrics import f1_score

def predict_labels(clf, features, target):
    print "Predicting labels using {}...".format(clf.__class__.__name__)
    start = time.time()
    y_pred = clf.predict(features)
    end = time.time()
    difference = round(end-start, 4)
    print "Done!\nPrediction time (secs): {:.4f}".format(end - start)
    return (difference, f1_score(target.values, y_pred, pos_label='yes'))

train_f1_score = predict_labels(clf, X_train, y_train)
f1score(clf, X_train, y_train, "training")

Predicting labels using GaussianNB...
Done!
Prediction time (secs): 0.0010
Predicting labels using GaussianNB...
Done!
Prediction time (secs): 0.0008
F1 score for training set(300): 0.793893129771


(0.0008, 0.79389312977099236)

In [12]:
# Predict on test data
f1score(clf, X_test, y_test, "test")

Predicting labels using GaussianNB...
Done!
Prediction time (secs): 0.0005
F1 score for test set(95): 0.65625


(0.0005, 0.65625)

In [13]:
# Train and predict using different training set sizes
class ClassifierInformation:
    def __init__(self, trainingtime, predictiontime, f1scoretrain, f1scoretest):
        self.t = trainingtime
        self.p = predictiontime
        self.f1train = f1scoretrain
        self.f1scoretest = f1scoretest
        
def train_predict(clf, X_train, y_train, X_test, y_test):
    print "------------------------------------------"
    print "Training set size: {}".format(len(X_train))
    trainingtime = train_classifier(clf, X_train, y_train)
#   print "F1 score for training set: {}".format(predict_labels(clf, X_train, y_train))
    predictedTraining = f1score(clf, X_train, y_train, "training")
#    print "F1 score for test set: {}".format(predict_labels(clf, X_test, y_test))
    predictedTest = f1score(clf, X_test, y_test, "test")
    return ClassifierInformation(trainingtime, predictedTraining[0], predictedTraining[1], predictedTest[1])

# TODO: Run the helper function above for desired subsets of training data
nbsample300 = train_predict(clf, X_train, y_train, X_test, y_test)
# Note: Keep the test set constant
nbsample200 = train_predict(clf, X_train_200, y_train_200, X_test, y_test)
nbsample100 = train_predict(clf, X_train_100, y_train_100, X_test, y_test)
nbModel = transferIntoTable("Naive Bayes", nbsample100, nbsample200, nbsample300)
nbModel

------------------------------------------
Training set size: 300
Training GaussianNB...
Done!
Training time (secs): 0.002
Predicting labels using GaussianNB...
Done!
Prediction time (secs): 0.0005
F1 score for training set(300): 0.793893129771
Predicting labels using GaussianNB...
Done!
Prediction time (secs): 0.0004
F1 score for test set(95): 0.65625
------------------------------------------
Training set size: 200
Training GaussianNB...
Done!
Training time (secs): 0.001
Predicting labels using GaussianNB...
Done!
Prediction time (secs): 0.0004
F1 score for training set(200): 0.76
Predicting labels using GaussianNB...
Done!
Prediction time (secs): 0.0002
F1 score for test set(95): 0.634920634921
------------------------------------------
Training set size: 100
Training GaussianNB...
Done!
Training time (secs): 0.001
Predicting labels using GaussianNB...
Done!
Prediction time (secs): 0.0002
F1 score for training set(100): 0.806451612903
Predicting labels using GaussianNB...
Done!
Pred

| Naive Bayes (by training set size) | 100 | 200 | 300 |
|---|---|---|---|
| Training time (secs) | 0.001 | 0.001 | 0.002 |
| Prediction time (secs) | 0.0002 | 0.0004 | 0.0005 |
| F1 score for training set | 0.806451612903 | 0.76 | 0.793893129771 |
| F1 score for test set | 0.624 | 0.634920634921 | 0.65625 |


### K-Nearest-Neighbors
K-Nearest-Neighbors (KNN) is used for classification and regression. A new input is compared with its k nearest neighbors in the training data, and their labels determine the output. All features are treated equally. Because the prediction is based on a whole neighborhood of labeled points rather than a single point, it is fairly robust to noisy training data. This can be improved further by weighting neighbors by distance. KNN is a lazy learner: it needs no real training phase, because the work is deferred to query time, when each new query is classified through its k nearest neighbors. Training is therefore cheap even when there is a lot of training data, which is not the case in this example.

However, there are also some disadvantages. As the number of features grows, the amount of data needed grows exponentially (curse of dimensionality), so a lot of data is required. The number of nearest neighbors has to be chosen. It is not always clear which distance measure produces the best results. A decision also has to be made about which features are important. At prediction time it is expensive compared with other algorithms, because the distances have to be computed for every query.

I used this algorithm because it is very simple and the features are treated equally.
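
To make the query-time behavior concrete, here is a minimal sketch of a KNN classifier in pure Python, assuming Euclidean distance and an unweighted majority vote. The function `knn_predict` and the toy data are hypothetical; the notebook uses scikit-learn's `KNeighborsClassifier`.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = []
    for point, label in zip(train_points, train_labels):
        # Euclidean distance; every feature contributes equally
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(point, query)))
        distances.append((dist, label))
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy data: two numeric features per student, with hypothetical labels
points = [(4, 0), (3, 2), (4, 1), (1, 10), (1, 12), (2, 9)]
labels = ["yes", "yes", "yes", "no", "no", "no"]

prediction = knn_predict(points, labels, query=(3, 1), k=3)
print("Predicted label: {}".format(prediction))
```

There is no fitting step at all: every call to `knn_predict` scans the whole training set, which is why prediction time grows with the training set size in the results below.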

In [14]:
# TODO: Train and predict using two other models
# 1 Knn Model
from sklearn.neighbors import KNeighborsClassifier
knn_clf = KNeighborsClassifier(n_neighbors=5)

# Fit model to training data
knnsample300 = train_predict(knn_clf, X_train, y_train, X_test, y_test)
# Note: Keep the test set constant
knnsample200 = train_predict(knn_clf, X_train_200, y_train_200, X_test, y_test)
knnsample100 = train_predict(knn_clf, X_train_100, y_train_100, X_test, y_test)
knnModel = transferIntoTable("KNN",knnsample100, knnsample200, knnsample300)
knnModel

------------------------------------------
Training set size: 300
Training KNeighborsClassifier...
Done!
Training time (secs): 0.002
Predicting labels using KNeighborsClassifier...
Done!
Prediction time (secs): 0.0075
F1 score for training set(300): 0.854460093897
Predicting labels using KNeighborsClassifier...
Done!
Prediction time (secs): 0.0018
F1 score for test set(95): 0.8
------------------------------------------
Training set size: 200
Training KNeighborsClassifier...
Done!
Training time (secs): 0.001
Predicting labels using KNeighborsClassifier...
Done!
Prediction time (secs): 0.0038
F1 score for training set(200): 0.796875
Predicting labels using KNeighborsClassifier...
Done!
Prediction time (secs): 0.0018
F1 score for test set(95): 0.802816901408
------------------------------------------
Training set size: 100
Training KNeighborsClassifier...
Done!
Training time (secs): 0.000
Predicting labels using KNeighborsClassifier...
Done!
Prediction time (secs): 0.0010
F1 score for tr

| KNN (by training set size) | 100 | 200 | 300 |
|---|---|---|---|
| Training time (secs) | 0.000 | 0.001 | 0.002 |
| Prediction time (secs) | 0.001 | 0.0038 | 0.0075 |
| F1 score for training set | 0.861538461538 | 0.796875 | 0.854460093897 |
| F1 score for test set | 0.746478873239 | 0.802816901408 | 0.8 |


### Support Vector Machine 
Support Vector Machines (SVMs) are supervised learning models used for classification and regression problems. An SVM looks for the boundary that separates the classes with the largest possible margin. Through the kernel trick, the data is implicitly mapped into a higher-dimensional space, so the resulting boundary can be linear or non-linear depending on the kernel used. On the one hand, a non-linear boundary can capture much more complex relationships in the data. On the other hand, training then takes longer. The algorithm is very effective in high-dimensional spaces because of its ability to model complex relationships. If the number of features is much greater than the number of samples in the data set, an SVM can perform poorly, so the choice of kernel is the main limitation.

If the data contains complex relationships, an SVM is a strong choice. However, it behaves somewhat like a black box, because the decision boundary is very difficult to interpret.

I chose SVM because the data has a lot of features, so complex relationships between several features are possible, and because it is very effective in high-dimensional spaces.

In [15]:
# Support Vector Machine model
from sklearn import svm
svc_clf = svm.SVC()
train_classifier(svc_clf, X_train, y_train)
svmsample300 = train_predict(svc_clf, X_train, y_train, X_test, y_test)
# Note: Keep the test set constant
svmsample200 = train_predict(svc_clf, X_train_200, y_train_200, X_test, y_test)
svmsample100 = train_predict(svc_clf, X_train_100, y_train_100, X_test, y_test)

svmModel = transferIntoTable("SVM",svmsample100, svmsample200, svmsample300)
svmModel


Training SVC...
Done!
Training time (secs): 0.008
------------------------------------------
Training set size: 300
Training SVC...
Done!
Training time (secs): 0.006
Predicting labels using SVC...
Done!
Prediction time (secs): 0.0041
F1 score for training set(300): 0.872727272727
Predicting labels using SVC...
Done!
Prediction time (secs): 0.0016
F1 score for test set(95): 0.813333333333
------------------------------------------
Training set size: 200
Training SVC...
Done!
Training time (secs): 0.003
Predicting labels using SVC...
Done!
Prediction time (secs): 0.0025
F1 score for training set(200): 0.867647058824
Predicting labels using SVC...
Done!
Prediction time (secs): 0.0017
F1 score for test set(95): 0.811188811189
------------------------------------------
Training set size: 100
Training SVC...
Done!
Training time (secs): 0.001
Predicting labels using SVC...
Done!
Prediction time (secs): 0.0007
F1 score for training set(100): 0.869565217391
Predicting labels using SVC...
Done!


| SVM (by training set size) | 100 | 200 | 300 |
|---|---|---|---|
| Training time (secs) | 0.001 | 0.003 | 0.006 |
| Prediction time (secs) | 0.0007 | 0.0025 | 0.0041 |
| F1 score for training set | 0.869565217391 | 0.867647058824 | 0.872727272727 |
| F1 score for test set | 0.8 | 0.811188811189 | 0.813333333333 |


## 5. Choosing the Best Model

- Based on the experiments you performed earlier, in 1-2 paragraphs explain to the board of supervisors what single model you chose as the best model. Which model is generally the most appropriate based on the available data, limited resources, cost, and performance?

In my opinion the best model is the SVM. It has the highest F<sub>1</sub> score on both the training and the test set at every training set size. I think it is a very good choice in this case, because the data set is small enough that training and prediction times stay low.

Its computational cost is also close to that of the other models. Training takes a little longer (0.006 seconds on 300 samples, versus 0.002 seconds for Naive Bayes and KNN). Prediction on the test set takes about 0.002 seconds, which is slower than Naive Bayes but comparable to KNN, and still fast in absolute terms.

Since the school prefers better results, I would go with the SVM!

- In 1-2 paragraphs explain to the board of supervisors in layman's terms how the final model chosen is supposed to work (for example if you chose a Decision Tree or Support Vector Machine, how does it make a prediction).

The image below is from the following source https://www.quora.com/What-does-support-vector-machine-SVM-mean-in-laymans-terms/answer/Premkumar-Natarajan?srid=7zNK
<img src="svm.PNG">

The image shows data, such as the given student data, plotted in a two-dimensional space, where each student's position depends on their features. There are two groups of students in the data. One group is colored blue: these students passed. The other group is colored red: these students failed. Now imagine a new student arrives who has to be classified into either the blue or the red group, i.e. we want to predict whether they will pass or fail. To do this we need a boundary between the students who passed and those who failed, which defines the area belonging to each group. As a delimiter we need a line that separates the groups clearly. The best delimiter is the one that gives the maximum margin to the closest data points of each group. In a nutshell, this is an optimization problem: find the line with the maximum margin between the groups, and SVM is the algorithm that finds it. When a straight line, as in the example, cannot do the job, a non-linear boundary can be used instead, which results in a more complex, curved line. This is made possible by the kernel trick. The developer chooses the kernel and its parameters, and for each choice the SVM algorithm finds the best boundary; the parameter values that give the best result can then be selected by trying several of them.
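
The idea of the maximum margin can be sketched numerically. The example below compares two hand-picked straight lines that both separate a toy data set and measures the margin of each. The points, the candidate lines and the helper `margin` are all hypothetical, and a real SVM solves an optimization problem over all possible lines instead of comparing a few candidates.

```python
import math

def margin(w, b, points, labels):
    """Smallest signed distance of any point to the line w.x + b = 0.

    Labels are +1 / -1; a negative result means some point is misclassified.
    """
    norm = math.sqrt(sum(wi ** 2 for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Toy 2-D data: +1 = passed (blue), -1 = failed (red); values are hypothetical
points = [(3, 3), (4, 3), (1, 1), (0, 1)]
labels = [1, 1, -1, -1]

# Two candidate boundaries that both separate the groups, given as (w, b)
candidates = {
    "x = 1.5": ((1.0, 0.0), -1.5),
    "x + y = 4": ((1.0, 1.0), -4.0),
}
margins = dict((name, margin(w, b, points, labels)) for name, (w, b) in candidates.items())
best = max(margins, key=margins.get)

for name in sorted(margins):
    print("{}: margin {:.3f}".format(name, margins[name]))
print("Widest margin: {}".format(best))
```

Both lines classify every toy point correctly, but the second one keeps the closest points of each group farther away, which is the property an SVM optimizes for.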


- Fine-tune the model. Use Gridsearch with at least one important parameter tuned and with at least 3 settings. Use the entire training set for this.
- What is the model's final F<sub>1</sub> score?

About 0.82 on the test set (about 0.93 on the full training set of 300 samples).

In [16]:
nbModel

| Naive Bayes (by training set size) | 100 | 200 | 300 |
|---|---|---|---|
| Training time (secs) | 0.001 | 0.001 | 0.002 |
| Prediction time (secs) | 0.0002 | 0.0004 | 0.0005 |
| F1 score for training set | 0.806451612903 | 0.76 | 0.793893129771 |
| F1 score for test set | 0.624 | 0.634920634921 | 0.65625 |


In [17]:
knnModel

| KNN (by training set size) | 100 | 200 | 300 |
|---|---|---|---|
| Training time (secs) | 0.000 | 0.001 | 0.002 |
| Prediction time (secs) | 0.001 | 0.0038 | 0.0075 |
| F1 score for training set | 0.861538461538 | 0.796875 | 0.854460093897 |
| F1 score for test set | 0.746478873239 | 0.802816901408 | 0.8 |


In [18]:
svmModel

| SVM (by training set size) | 100 | 200 | 300 |
|---|---|---|---|
| Training time (secs) | 0.001 | 0.003 | 0.006 |
| Prediction time (secs) | 0.0007 | 0.0025 | 0.0041 |
| F1 score for training set | 0.869565217391 | 0.867647058824 | 0.872727272727 |
| F1 score for test set | 0.8 | 0.811188811189 | 0.813333333333 |


In [19]:
# TODO: Fine-tune your model and report the best F1 score
from sklearn import grid_search
from sklearn.metrics import make_scorer
tuned_parameters = {'kernel':['rbf'], 'C':[1], 'gamma':[0.01,0.02,0.03,0.04,0.05]}
svc_tune = grid_search.GridSearchCV(svm.SVC(), tuned_parameters, cv=5, refit=True, scoring = make_scorer(f1_score, pos_label='yes'))

tunedsample300 = train_predict(svc_tune, X_train, y_train, X_test, y_test)
# Note: Keep the test set constant
tunedsample200 = train_predict(svc_tune, X_train_200, y_train_200, X_test, y_test)
tunedsample100 = train_predict(svc_tune, X_train_100, y_train_100, X_test, y_test)
print svc_tune.best_params_
svmModel = transferIntoTable("TunedSVM",tunedsample100, tunedsample200, tunedsample300)
svmModel

------------------------------------------
Training set size: 300
Training GridSearchCV...
Done!
Training time (secs): 0.235
Predicting labels using GridSearchCV...
Done!
Prediction time (secs): 0.0054
F1 score for training set(300): 0.926014319809
Predicting labels using GridSearchCV...
Done!
Prediction time (secs): 0.0016
F1 score for test set(95): 0.823529411765
------------------------------------------
Training set size: 200
Training GridSearchCV...
Done!
Training time (secs): 0.147
Predicting labels using GridSearchCV...
Done!
Prediction time (secs): 0.0021
F1 score for training set(200): 0.830324909747
Predicting labels using GridSearchCV...
Done!
Prediction time (secs): 0.0009
F1 score for test set(95): 0.818791946309
------------------------------------------
Training set size: 100
Training GridSearchCV...
Done!
Training time (secs): 0.104
Predicting labels using GridSearchCV...
Done!
Prediction time (secs): 0.0008
F1 score for training set(100): 0.923076923077
Predicting labe

| TunedSVM (by training set size) | 100 | 200 | 300 |
|---|---|---|---|
| Training time (secs) | 0.104 | 0.147 | 0.235 |
| Prediction time (secs) | 0.0008 | 0.0021 | 0.0054 |
| F1 score for training set | 0.923076923077 | 0.830324909747 | 0.926014319809 |
| F1 score for test set | 0.825806451613 | 0.818791946309 | 0.823529411765 |
