## Precision Vs Recall
As with the previous exercises, let's compare the performance of two classifiers (a decision tree and Gaussian Naive Bayes)
on the familiar Titanic dataset. Add a train/test split, then store the results in the
dictionary provided.
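Precision is the fraction of predicted positives that really are positive, TP / (TP + FP). Recall is the fraction of actual positives the model finds, TP / (TP + FN). The following is a minimal sketch on made-up labels (the arrays are illustrative, not Titanic data). It counts the three terms by hand and checks them against scikit-learn:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Illustrative labels only (1 = survived, 0 = did not survive)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted survived, did survive
fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted survived, did not
fn = np.sum((y_pred == 0) & (y_true == 1))  # survivors the model missed

manual_precision = tp / (tp + fp)
manual_recall = tp / (tp + fn)

print(manual_precision, precision_score(y_true, y_pred))
print(manual_recall, recall_score(y_true, y_pred))
```

Here the model makes few positive calls. Most of them are right (higher precision), but it misses half the survivors (lower recall).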

In [1]:
import numpy as np
import pandas as pd

In [50]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score as recall
from sklearn.metrics import precision_score as precision
from sklearn.naive_bayes import GaussianNB

In [51]:
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split

In [52]:
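# Load the dataset
X = pd.read_csv('titanic_data.csv')

# Keep only the numeric columns (public equivalent of the private _get_numeric_data)
X = X.select_dtypes(include=[np.number])
y = X['Survived']
# Drop the label and Age (Age has missing values)
X = X.drop(columns=['Age', 'Survived'])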


#### TODO: split the data into training and testing sets,
#### using the standard settings for train_test_split.
#### Then, train and test the classifiers with your newly split data instead of X and y.

In [53]:
X, x_test, y, y_test = train_test_split(X, y)  # X and y now hold only the training split
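"Standard settings" means no `test_size` or `train_size` argument. In that case `train_test_split` shuffles the rows and holds out 25% of them for testing. The notebook also passes no `random_state`, so every run draws a different split and the scores below will vary slightly between runs. The following is a small sketch on toy arrays (illustrative only) that shows the default proportions. It pins `random_state=0` for reproducibility:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 rows, 2 features (illustrative only)
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.arange(100) % 2

# No test_size/train_size given, so 25% of the rows go to the test set
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)
print(X_tr.shape, X_te.shape)  # (75, 2) (25, 2)
```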

In [54]:
clf1 = DecisionTreeClassifier()
clf1.fit(X, y)
dt_pred = clf1.predict(x_test)  # predict once, reuse for both metrics
DTC = recall(y_test, dt_pred), precision(y_test, dt_pred)
print("Decision Tree recall: {:.2f} and precision: {:.2f}".format(*DTC))

clf2 = GaussianNB()
clf2.fit(X, y)
nb_pred = clf2.predict(x_test)
GNB = recall(y_test, nb_pred), precision(y_test, nb_pred)
print("GaussianNB recall: {:.2f} and precision: {:.2f}".format(*GNB))
GNB[1]  # GaussianNB precision, echoed as the cell output


Decision Tree recall: 0.43 and precision: 0.51
GaussianNB recall: 0.40 and precision: 0.72


0.72340425531914898
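The trade-off in the section title comes from the decision threshold. `predict` effectively labels a row positive when its estimated probability is about 0.5 or more. Lowering that cut-off flags more rows as positive, so recall can only stay the same or rise, while precision usually drops. The following is a minimal sketch of this behavior. It uses synthetic data from `make_classification` as a stand-in for the Titanic features (an assumption made only to keep the example self-contained). It also applies a manual threshold to `GaussianNB.predict_proba`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import precision_score, recall_score

# Synthetic stand-in data (assumption: not the Titanic dataset)
X_syn, y_syn = make_classification(n_samples=500, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, random_state=0)

model = GaussianNB().fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]  # estimated probability of class 1

recalls = []
for threshold in (0.2, 0.5, 0.8):
    pred = (proba >= threshold).astype(int)  # positive if probability clears the cut-off
    p = precision_score(y_te, pred, zero_division=0)
    r = recall_score(y_te, pred)
    recalls.append(r)
    print("threshold {:.1f}: precision {:.2f}, recall {:.2f}".format(threshold, p, r))
```

Raising the threshold can only shrink the set of predicted positives. As a result, the recall values printed by the loop never increase from one threshold to the next.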

## Compute F1 Scores


In [28]:
from sklearn.metrics import f1_score


In [29]:
clf1 = DecisionTreeClassifier()
clf1.fit(X, y)
DTC = f1_score(y_test, clf1.predict(x_test))
print("Decision Tree F1 score: {:.2f}".format(DTC))

clf2 = GaussianNB()
clf2.fit(X, y)
GNB = f1_score(y_test, clf2.predict(x_test))
print("GaussianNB F1 score: {:.2f}".format(GNB))

F1_scores = {
 "Naive Bayes": GNB,
 "Decision Tree": DTC
}

Decision Tree F1 score: 0.50
GaussianNB F1 score: 0.47
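F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). It is pulled toward the smaller of the two, so a classifier cannot score well by maximizing just one of them. The following is a quick check on illustrative labels (not Titanic data) that the formula matches `f1_score`:

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Illustrative labels only
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])

p = precision_score(y_true, y_pred)  # 2/3
r = recall_score(y_true, y_pred)     # 1/2
harmonic = 2 * p * r / (p + r)       # harmonic mean = 4/7

print(harmonic, f1_score(y_true, y_pred))
```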


## Compute Mean Absolute Error


In [45]:
import numpy as np
import pandas as pd

# Load the dataset
from sklearn.datasets import load_linnerud

linnerud_data = load_linnerud()
X = linnerud_data.data
y = linnerud_data.target

from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error as mae
from sklearn.linear_model import LinearRegression
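Linnerud is a very small multi-output regression dataset: 20 rows, three exercise features and three physiological targets. Each regressor below therefore predicts three numbers per row. With the default 25% test split, only 5 rows are left for testing, so the error estimates are noisy. The following sketch inspects the shapes:

```python
from sklearn.datasets import load_linnerud

linnerud_data = load_linnerud()
print(linnerud_data.data.shape, linnerud_data.target.shape)  # (20, 3) (20, 3)
print(linnerud_data.feature_names)  # exercise measurements (inputs)
print(linnerud_data.target_names)   # physiological measurements (outputs)
```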


#### TODO: split the data into training and testing sets,
#### using the standard settings for train_test_split.
#### Then, train and test the regressors with your newly split data instead of X and y.

In [46]:
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y)

In [41]:
reg1 = DecisionTreeRegressor()
reg1.fit(x_train, y_train)
dtm = mae(y_test,reg1.predict(x_test))
print("Decision Tree mean absolute error: {:.2f}".format(dtm))


Decision Tree mean absolute error: 7.80


In [42]:
reg2 = LinearRegression()
reg2.fit(x_train, y_train)
lrm = mae(y_test,reg2.predict(x_test))
print("Linear regression mean absolute error: {:.2f}".format(lrm))

Linear regression mean absolute error: 7.52


In [43]:
results = {
 "Linear Regression": lrm,
 "Decision Tree": dtm
}
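Mean absolute error is the average of |y_i - ŷ_i| over the test rows. For a multi-output target like Linnerud's, `mean_absolute_error` computes one MAE per output column by default. It then averages those values uniformly (`multioutput='uniform_average'`). As a result, a target measured on a large scale (such as Weight) can dominate the single reported number. The following sketch uses made-up two-column values to reproduce both the per-target and the averaged figure with NumPy:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Illustrative two-target values (not Linnerud data)
y_true = np.array([[180.0, 34.0], [160.0, 32.0], [200.0, 38.0]])
y_pred = np.array([[170.0, 35.0], [165.0, 30.0], [190.0, 38.0]])

abs_err = np.abs(y_true - y_pred)
per_target = abs_err.mean(axis=0)  # MAE of each output column
overall = per_target.mean()        # uniform average across the outputs

print(per_target, mean_absolute_error(y_true, y_pred, multioutput='raw_values'))
print(overall, mean_absolute_error(y_true, y_pred))
```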

## Compute Mean Squared Error


In [49]:
from sklearn.metrics import mean_squared_error as mse


# A fresh random split, so these scores are not paired with the MAE cells above
x_train, x_test, y_train, y_test = train_test_split(X, y)

reg1 = DecisionTreeRegressor()
reg1.fit(x_train, y_train)
dtm = mse(y_test, reg1.predict(x_test))
print("Decision Tree mean squared error: {:.2f}".format(dtm))

reg2 = LinearRegression()
reg2.fit(x_train, y_train)
lrm = mse(y_test, reg2.predict(x_test))
print("Linear regression mean squared error: {:.2f}".format(lrm))

results = {
 "Linear Regression": lrm,
 "Decision Tree": dtm
}

Decision Tree mean squared error: 363.20
Linear regression mean squared error: 362.12
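Mean squared error averages (y_i - ŷ_i)², so it is reported in squared units of the target. This explains why the values here are far larger than the MAE figures earlier. Squaring also penalizes a single large miss much more than several moderate ones, and taking the square root (RMSE) returns to the target's own units. The following is a minimal sketch with made-up errors that share the same MAE:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Illustrative predictions only: both have the same MAE against y_true
y_true = np.zeros(4)
predictions = {
    "spread": np.array([2.0, 2.0, 2.0, 2.0]),  # four moderate misses
    "spiky": np.array([0.0, 0.0, 0.0, 8.0]),   # one large miss
}

scores = {}
for name, y_pred in predictions.items():
    mae_val = mean_absolute_error(y_true, y_pred)
    mse_val = mean_squared_error(y_true, y_pred)
    rmse_val = np.sqrt(mse_val)  # back in the target's own units
    scores[name] = (mae_val, mse_val, rmse_val)
    print("{}: MAE {:.1f}, MSE {:.1f}, RMSE {:.1f}".format(name, mae_val, mse_val, rmse_val))
```

Both prediction sets have an MAE of 2.0. The single large miss, however, quadruples the MSE (16.0 vs. 4.0) and doubles the RMSE.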
