In this exercise you will use the Portuguese sea battles data, which contains outcomes of naval battles between Portuguese and Dutch/British ships between 1583 and 1663. The dataset has the following features:

- Battle: Name of the battle place
- Year: Year of the battle
- Portuguese ships: Number of Portuguese ships
- Dutch ships: Number of Dutch ships
- English ships: Number of English ships
- Ratio: Ratio of Portuguese to Dutch/British ships
- Spanish Involvement: 1=Yes, 0=No
- Portuguese outcome: -1=Defeat, 0=Draw, 1=Victory

Use an SVM-based model to predict the Portuguese outcome of the battle from the number of ships involved on all sides and Spanish involvement. Try solving the same problem using two other classifiers that you know. Report and compare their results with those from the SVM.

Submit a single PDF file with no more than two pages.

In [1]:
import os, glob

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn import preprocessing
from sklearn import metrics

In [2]:
cols = ['Battle', 'Year', 'PortugeseShips', 'DutchShips', 'EnglishShips', 'Ratio', 
        'SpanishInvolvement', 'PortugeseOutcome']
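# A regex separator is only supported by the Python parser engine; naming it
# explicitly avoids the ParserWarning about falling back from the C engine.
df = pd.read_csv("http://users.stat.ufl.edu/~winner/data/armada.dat", sep=r'\s{2,}', 
                 names = cols, header = None, engine = 'python')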
df.head()



           Battle  Year  PortugeseShips  DutchShips  EnglishShips  Ratio  SpanishInvolvement  PortugeseOutcome
0          Bantam  1601               6           3             0    2.0                   0                 0
1  Malacca Strait  1606              14          11             0  1.273                   0                 0
2   Ilha das Naus  1606               6           9             0  0.667                   0                -1
3      Pulo Butum  1606               7           9             0  0.778                   0                 1
4          Surrat  1615               6           0             4    1.5                   0                 0
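
The Ratio values in the preview are consistent with dividing the Portuguese ship count by the combined Dutch and English count. The source does not state the formula, so the definition below is an assumption; this is only a quick check on the five rows shown (`sample` and `RatioCheck` are illustrative names):

```python
import pandas as pd

# The five battles shown in the df.head() preview above.
sample = pd.DataFrame({
    "Battle": ["Bantam", "Malacca Strait", "Ilha das Naus", "Pulo Butum", "Surrat"],
    "PortugeseShips": [6, 14, 6, 7, 6],
    "DutchShips": [3, 11, 9, 9, 0],
    "EnglishShips": [0, 0, 0, 0, 4],
    "Ratio": [2.0, 1.273, 0.667, 0.778, 1.5],
})

# Assumption: Ratio = Portuguese ships / (Dutch ships + English ships).
sample["RatioCheck"] = (
    sample["PortugeseShips"] / (sample["DutchShips"] + sample["EnglishShips"])
).round(3)

print(sample[["Battle", "Ratio", "RatioCheck"]])
```

If that definition holds for the whole file, Ratio carries no information beyond the three ship counts, which may be why the notebook leaves it out of the feature set.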


In [3]:
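#Creating X and y
#Features (columns 2, 3, 4, 6): PortugeseShips, DutchShips, EnglishShips, SpanishInvolvement
X = df.iloc[:, [2,3,4,6]]
#Target: -1=Defeat, 0=Draw, 1=Victory (kept as a one-column DataFrame)
y = df[['PortugeseOutcome']]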

#Creating training and testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
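
With so few battles, an unseeded split makes the scores in the following cells vary from run to run, and a rare outcome class can end up almost absent from the test set. One possible variant fixes `random_state` and stratifies on the label. It is sketched here on placeholder arrays: `X_demo` and `y_demo` are made up for illustration (arbitrary size and class balance) and are not the battle data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data, NOT the real battles: 28 rows, 4 integer features, 3 classes.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 20, size=(28, 4))
y_demo = np.array([-1] * 14 + [0] * 8 + [1] * 6)

# random_state makes the split repeatable; stratify keeps class proportions
# roughly equal in the training and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print(np.unique(y_tr, return_counts=True))
print(np.unique(y_te, return_counts=True))
```

In this notebook the same two arguments could be added to the existing `train_test_split(X, y, test_size=0.3)` call, with `stratify=y`. The numbers reported in the outputs come from an unseeded split, so they would not be reproduced exactly.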

#### SVC

In [4]:
#Building the SVC model and fitting the training data
model = SVC(kernel='linear')
model.fit(X_train,y_train.values.ravel())

#Predicting on the test data
predictions = model.predict(X_test)

#Printing the accuracy
print("Accuracy:", accuracy_score(y_test, predictions))
#Printing the confusion matrix
print(confusion_matrix(y_test,predictions))
#Printing the classification report
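#zero_division=0 reports 0 for classes that are never predicted, without a warning
print(classification_report(y_test, predictions, zero_division=0))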

Accuracy: 0.3333333333333333
[[1 5 0]
 [0 2 0]
 [0 1 0]]
              precision    recall  f1-score   support

          -1       1.00      0.17      0.29         6
           0       0.25      1.00      0.40         2
           1       0.00      0.00      0.00         1

    accuracy                           0.33         9
   macro avg       0.42      0.39      0.23         9
weighted avg       0.72      0.33      0.28         9
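
SVMs are generally sensitive to feature scale, and here ship counts sit alongside a 0/1 flag. A minimal sketch of standardizing the features inside a pipeline follows. It runs on placeholder data rather than the real battles (`X_demo`, `y_demo`, and `scaled_svc` are illustrative names), and whether scaling helps on the actual dataset is untested here.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data, NOT the real battles: three count-like columns and one 0/1 flag.
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.integers(0, 30, size=28),
    rng.integers(0, 30, size=28),
    rng.integers(0, 10, size=28),
    rng.integers(0, 2, size=28),
])
y_demo = np.array([-1] * 14 + [0] * 8 + [1] * 6)

# The scaler is fitted only on the data passed to fit(), then reused in predict().
scaled_svc = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scaled_svc.fit(X_demo, y_demo)

preds = scaled_svc.predict(X_demo)
print(preds[:5])
```

Keeping the scaler inside the pipeline means that, when the pipeline is fitted on `X_train` only, no statistics from the test rows leak into training.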





#### Logistic Regression

In [5]:
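#Creating Logistic Regression model
model = LogisticRegression()
#Flattening y_train to shape (n_samples,) to avoid a DataConversionWarning
model.fit(X_train, y_train.values.ravel())

#Predicting on Test set
predictions = model.predict(X_test)

#zero_division=0 reports 0 for classes that are never predicted, without a warning
print(accuracy_score(y_test, predictions))
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions, zero_division=0))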

0.6666666666666666
[[4 2 0]
 [0 2 0]
 [0 1 0]]
              precision    recall  f1-score   support

          -1       1.00      0.67      0.80         6
           0       0.40      1.00      0.57         2
           1       0.00      0.00      0.00         1

    accuracy                           0.67         9
   macro avg       0.47      0.56      0.46         9
weighted avg       0.76      0.67      0.66         9





#### KNN

In [6]:
#Creating KNN Classifier model
knn = KNeighborsClassifier(n_neighbors=3)

#Fitting the training data (y_train flattened to shape (n_samples,))
knn.fit(X_train, y_train.values.ravel())

#Predicting on the test data
predictions = knn.predict(X_test)

print("k=", 3)
#Printing accuracy score, confusion matrix and classification report
print(accuracy_score(y_test, predictions))
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions, zero_division=0))

k= 3
0.3333333333333333
[[2 4 0]
 [1 1 0]
 [1 0 0]]
              precision    recall  f1-score   support

          -1       0.50      0.33      0.40         6
           0       0.20      0.50      0.29         2
           1       0.00      0.00      0.00         1

    accuracy                           0.33         9
   macro avg       0.23      0.28      0.23         9
weighted avg       0.38      0.33      0.33         9


#### Decision Tree

In [7]:
# Creating the DTC and fitting the model
dtree = DecisionTreeClassifier()
dtree.fit(X_train,y_train)

#Predicting on test data
predictions = dtree.predict(X_test)

#Printing the accuracy score and confusion matrix
print(accuracy_score(y_test,predictions))
print(confusion_matrix(y_test,predictions))

0.2222222222222222
[[2 3 1]
 [1 0 1]
 [0 1 0]]


#### Random Forest

In [8]:
rfc = RandomForestClassifier(n_estimators=100)
# y_train is a column vector, but 1d array is expected. Therefore, we need to
# change the shape to (n_samples,)
rfc.fit(X_train, y_train.values.ravel())

predictions = rfc.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
print(confusion_matrix(y_test,predictions))

Accuracy: 0.2222222222222222
[[0 6 0]
 [0 2 0]
 [0 1 0]]
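
Each accuracy above is measured on only 9 test battles, so a single misclassification moves the score by about 11 percentage points, and the ranking of the models can change with the split. A hedged sketch of a steadier comparison uses stratified k-fold cross-validation. It runs on placeholder data, not the real battles, and `X_demo`, `y_demo`, `models`, and `cv_scores` are illustrative names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Placeholder data, NOT the real battles: 28 rows, 4 integer features, 3 classes.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 20, size=(28, 4))
y_demo = np.array([-1] * 14 + [0] * 8 + [1] * 6)

models = {
    "SVC (linear)": SVC(kernel="linear"),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
}

# Every sample is used for testing exactly once across the folds.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

cv_scores = {}
for name, clf in models.items():
    scores = cross_val_score(clf, X_demo, y_demo, cv=cv, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.2f} (std {scores.std():.2f})")
```

On the real data, `n_splits` should be chosen with the rarest outcome class in mind: scikit-learn warns when a class has fewer members than the number of folds.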
