
# Classifying Rocks and Mines using Sonar Data (Perceptron)
The dataset comes from Kaggle: https://www.kaggle.com/datasets/mattcarter865/mines-vs-rocks?resource=download

Points to note
*   Each row of the dataset holds 60 sonar readings (the features) followed by a label that marks the object as a rock or a mine

Steps to produce the prediction results
*   Preprocess the data
*   Split the data into training and test datasets
*   Create a Perceptron model with the scikit-learn library
*   Predict using the model and check the accuracy score
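
The steps above can be sketched end to end as follows. This is a minimal sketch, not the notebook's own code: it assumes a synthetic dataset from `make_classification` as a stand-in for the sonar file, and it chains the scaler and the perceptron with `make_pipeline`, which the notebook itself does not use. The names `X_demo`, `y_demo` and `model` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the sonar data (208 rows, 60 features, 2 classes).
X_demo, y_demo = make_classification(
    n_samples=208, n_features=60, n_informative=10, random_state=1
)

# Hold out 10% of the rows for testing, keeping the class proportions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=1, stratify=y_demo
)

# Chain scaling and the perceptron so the scaler is fitted on training rows only.
model = make_pipeline(StandardScaler(), Perceptron(eta0=0.1, random_state=1))
model.fit(X_tr, y_tr)

# Score the fitted model on the held-out rows.
test_accuracy = accuracy_score(y_te, model.predict(X_te))
print('Accuracy: %.3f' % test_accuracy)
```

The accuracy printed here describes the synthetic data only and says nothing about the sonar results below.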


# Reading and preprocessing the data

In [None]:
import pandas as pd
import numpy as np

data = pd.read_csv('/content/sonar.all-data.csv', header = None)
data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,51,52,53,54,55,56,57,58,59,60
0,0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,...,0.0027,0.0065,0.0159,0.0072,0.0167,0.018,0.0084,0.009,0.0032,R
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0084,0.0089,0.0048,0.0094,0.0191,0.014,0.0049,0.0052,0.0044,R
2,0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,...,0.0232,0.0166,0.0095,0.018,0.0244,0.0316,0.0164,0.0095,0.0078,R
3,0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,...,0.0121,0.0036,0.015,0.0085,0.0073,0.005,0.0044,0.004,0.0117,R
4,0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,...,0.0031,0.0054,0.0105,0.011,0.0015,0.0072,0.0048,0.0107,0.0094,R


In [None]:
data.shape  # (number of rows, number of columns)
# The last column (index 60) holds the class label; the other 60 columns are features

(208, 61)

# Separating features and labels
The label column uses two classes:
*   **M** is Mine
*   **R** is Rock

In [None]:
data[60].value_counts()

M    111
R     97
Name: 60, dtype: int64

In [None]:
X = data.drop(columns=60)  # all 60 feature columns; 'columns=' already implies axis=1
Y = data[60]
print(X.shape)
print('Class labels:', np.unique(Y))

(208, 60)
Class labels: ['M' 'R']


# Splitting into training and test datasets

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.1, random_state=1, stratify=Y)
print(X_train.shape)
print(X_test.shape)
print(y_test)

(187, 60)
(21, 60)
113    M
23     R
45     R
81     R
82     R
109    M
176    M
134    M
96     R
98     M
57     R
169    M
13     R
204    M
10     R
161    M
7      R
172    M
68     R
102    M
106    M
Name: 60, dtype: object
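
The `stratify=Y` argument asks `train_test_split` to keep roughly the same mine-to-rock ratio in both splits. The sketch below checks that on a stand-in label column with the same class counts as above (111 M, 97 R). The names `labels`, `features` and `proportions` are hypothetical, and the single dummy feature is there only because the function needs something to split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in label column with the same class counts as the sonar data.
labels = pd.Series(['M'] * 111 + ['R'] * 97)
# Dummy one-column feature matrix; only the labels matter for this check.
features = np.arange(len(labels)).reshape(-1, 1)

_, _, lab_train, lab_test = train_test_split(
    features, labels, test_size=0.1, random_state=1, stratify=labels
)

# Fraction of each class in the full set and in each split.
proportions = pd.DataFrame({
    'all': labels.value_counts(normalize=True),
    'train': lab_train.value_counts(normalize=True),
    'test': lab_test.value_counts(normalize=True),
})
print(proportions.round(3))
```

With only 21 test rows the proportions cannot match exactly, but they should stay close to those of the full dataset.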


# Training with Perceptron

First, standardise the features. The scaler is fitted on the training data only and then applied to both the training and test sets.

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
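
To make the scaling step concrete, the sketch below compares `StandardScaler` with the manual formula z = (x - mean) / std on random stand-in data. It is illustrative only: the arrays `train_demo` and `test_demo` are hypothetical and merely share the shapes of the splits above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Assumption: random stand-in data with the same shapes as the splits above.
train_demo = rng.uniform(0.0, 1.0, size=(187, 60))
test_demo = rng.uniform(0.0, 1.0, size=(21, 60))

# The scaler learns each column's mean and standard deviation from the training rows only.
scaler = StandardScaler().fit(train_demo)
train_std = scaler.transform(train_demo)
# The test rows are transformed with the training statistics, not their own.
test_std = scaler.transform(test_demo)

# Manual equivalent, using the population standard deviation (ddof=0).
manual = (train_demo - train_demo.mean(axis=0)) / train_demo.std(axis=0)

print(np.allclose(train_std, manual))        # True
print(np.abs(train_std.mean(axis=0)).max())  # effectively zero
print(np.abs(test_std.mean(axis=0)).max())   # small, but not zero
```

The test columns do not end up with exactly zero mean, which is expected: the test set is treated as unseen data and must not influence the scaling.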

Next, train a perceptron model on the standardised training data and check how well it fits that data

In [None]:
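from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

# Train a perceptron with learning rate eta0 = 0.1
ppn = Perceptron(eta0=0.1, random_state=1)
ppn.fit(X_train_std, y_train)

# Evaluate on the training data itself
y_train_pred = ppn.predict(X_train_std)
print(y_train_pred)
print(y_train)
print('Misclassified training samples:', (y_train != y_train_pred).sum())
print('Accuracy: %.3f' % accuracy_score(y_train, y_train_pred))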

['M' 'R' 'R' 'M' 'R' 'M' 'R' 'M' 'R' 'R' 'M' 'M' 'R' 'R' 'M' 'R' 'R' 'M'
 'R' 'R' 'M' 'R' 'R' 'R' 'M' 'M' 'R' 'R' 'R' 'M' 'M' 'M' 'R' 'R' 'M' 'M'
 'R' 'M' 'M' 'M' 'R' 'M' 'M' 'R' 'M' 'M' 'M' 'M' 'R' 'R' 'R' 'R' 'M' 'R'
 'M' 'M' 'R' 'M' 'R' 'M' 'R' 'R' 'M' 'R' 'M' 'M' 'R' 'R' 'M' 'M' 'M' 'M'
 'M' 'M' 'M' 'R' 'R' 'R' 'M' 'M' 'M' 'R' 'R' 'M' 'R' 'M' 'M' 'M' 'M' 'M'
 'M' 'R' 'M' 'M' 'M' 'R' 'R' 'M' 'M' 'M' 'R' 'R' 'R' 'M' 'M' 'R' 'R' 'R'
 'M' 'M' 'R' 'M' 'M' 'M' 'M' 'M' 'R' 'M' 'M' 'R' 'M' 'M' 'R' 'R' 'R' 'M'
 'R' 'M' 'R' 'M' 'M' 'M' 'R' 'M' 'M' 'M' 'R' 'R' 'M' 'M' 'M' 'M' 'R' 'M'
 'R' 'M' 'R' 'R' 'M' 'M' 'M' 'R' 'M' 'R' 'M' 'R' 'M' 'M' 'R' 'M' 'R' 'R'
 'M' 'R' 'R' 'M' 'R' 'M' 'R' 'M' 'R' 'M' 'R' 'M' 'M' 'R' 'R' 'R' 'M' 'R'
 'M' 'R' 'M' 'R' 'M' 'M' 'M']
115    M
38     R
56     R
123    M
18     R
      ..
140    M
5      R
154    M
131    M
203    M
Name: 60, Length: 187, dtype: object
Misclassified training samples: 16
Accuracy: 0.914


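The perceptron misclassifies 16 of the 187 training samples. A linear model can only reach zero training errors when the classes are linearly separable, so some errors are expected if they are not, and the small size of the dataset also limits what the model can learn. Tuning hyperparameters such as the learning rate (eta0) or the number of iterations (max_iter) may reduce the number of misclassified training samples.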
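
One possible way to tweak the parameters is a small cross-validated grid search. The sketch below is an assumption about how that could look, not something the notebook ran: it uses synthetic data in place of the sonar training set, and the grid values are arbitrary illustrations rather than recommendations. Cross-validation is used instead of training accuracy so that the chosen settings are not simply the ones that memorise the training rows best.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the 187 training rows.
X_demo, y_demo = make_classification(
    n_samples=187, n_features=60, n_informative=10, random_state=1
)

# make_pipeline names the steps after the lowercased class names.
pipe = make_pipeline(StandardScaler(), Perceptron(random_state=1))

# Illustrative grid; these values are assumptions, not tuned recommendations.
param_grid = {
    'perceptron__eta0': [0.01, 0.1, 1.0],
    'perceptron__max_iter': [100, 1000],
    'perceptron__penalty': [None, 'l2'],
}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
search.fit(X_demo, y_demo)

print(search.best_params_)
print('Cross-validated accuracy: %.3f' % search.best_score_)
```

To try this on the sonar data, `X_demo` and `y_demo` would be replaced with `X_train` and `y_train`, since the pipeline does its own scaling.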

Now we predict on the test set using the trained model

In [None]:
y_pred = ppn.predict(X_test_std)
print('Misclassified samples:', (y_test != y_pred).sum())

Misclassified samples: 7


We can calculate the accuracy on the test set

In [None]:
from sklearn.metrics import accuracy_score
print('Accuracy: %.3f' % accuracy_score(y_test, y_pred))

Accuracy: 0.667
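
As a quick cross-check, accuracy is the fraction of correctly classified samples, so both reported scores can be reproduced from the misclassification counts printed above. The variable names below are hypothetical.

```python
# Counts taken from the cell outputs above.
n_train, train_errors = 187, 16
n_test, test_errors = 21, 7

# accuracy = correct predictions / total predictions
train_accuracy = (n_train - train_errors) / n_train
test_accuracy = (n_test - test_errors) / n_test

print('Training accuracy: %.3f' % train_accuracy)  # 0.914
print('Test accuracy: %.3f' % test_accuracy)       # 0.667
```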


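Overall, this is not a strong result: the test accuracy (0.667) is well below the training accuracy (0.914), so the model generalises poorly, and with only 21 test samples the estimate is itself noisy. A logistic regression model could be tried in place of the perceptron to see whether it improves on this.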
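
A hedged sketch of how such a comparison might be set up is shown below. It assumes synthetic data in place of the sonar file and uses 5-fold stratified cross-validation rather than a single 21-sample test set, which is a design choice of this sketch and not something the notebook did. No claim is made about which model wins on the real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the sonar data (208 rows, 60 features).
X_demo, y_demo = make_classification(
    n_samples=208, n_features=60, n_informative=10, random_state=1
)

# Both candidates get the same scaling step so the comparison is like for like.
candidates = {
    'perceptron': make_pipeline(StandardScaler(), Perceptron(eta0=0.1, random_state=1)),
    'logistic regression': make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# Shuffled, stratified folds so each fold keeps the class balance.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

scores = {}
for name, estimator in candidates.items():
    scores[name] = cross_val_score(estimator, X_demo, y_demo, cv=cv, scoring='accuracy')
    print('%s: %.3f +/- %.3f' % (name, scores[name].mean(), scores[name].std()))
```

Averaging over several folds gives a steadier estimate than one small hold-out set, which matters on a dataset of this size.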