# SONAR Rock or Mine Prediction

Ref: [Sidhardhan's ML projects](https://www.youtube.com/watch?v=fiz1ORTBGpY&list=PLfFghEzKVmjvuSA67LszN1dZ-Dd_pkus6)
Author

> ### **Aim:**
To detect whether the object beneath the submarine is a Mine (M) or a Rock (R) using data from the submarine's SONAR system.

> ### **Workflow:**

1. Collect SONAR data (waves reflected from rock/metal in a lab setup)
1. Data collection and pre-processing
1. Train-test split
1. Use a logistic regression model (a supervised learning method that works well for binary classification)
1. Feed new data to the trained logistic regression model and evaluate its predictions of whether the object is a Rock (R) or a Mine (M)
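The workflow above can be sketched end to end as follows. This is a minimal sketch, not the notebook's code: since `sonar_data.csv` is not bundled here, it assumes synthetic data from scikit-learn's `make_classification` (208 samples and 60 features, to mirror the SONAR data's shape), and `max_iter=1000` is an assumed setting that gives the solver room to converge.

```python
# Minimal end-to-end sketch of the workflow.
# ASSUMPTION: synthetic data stands in for sonar_data.csv (208 rows x 60 features).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1-2. "Collect" data: 60 numeric features plus a binary label
X_demo, y_int = make_classification(n_samples=208, n_features=60, random_state=0)
y_demo = np.where(y_int == 1, "M", "R")  # reuse the SONAR label strings

# 3. Train-test split: 10% test, class proportions preserved
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.1, stratify=y_demo, random_state=1
)

# 4. Fit a logistic regression classifier (supervised learning)
demo_model = LogisticRegression(max_iter=1000)
demo_model.fit(Xd_train, yd_train)

# 5. Evaluate predictions on data the model has not seen
demo_accuracy = accuracy_score(yd_test, demo_model.predict(Xd_test))
print(f"Test accuracy on synthetic data: {demo_accuracy:.3f}")
```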


In [97]:
# importing required modules

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

## **Data Collection and Pre-processing**

In [98]:
# examining the data
sonar_df = pd.read_csv('sonar_data.csv', header=None)
print(f"The dataframe has {sonar_df.shape[0]} rows and {sonar_df.shape[1]} columns")
print("Here are some basic stats regarding the numerical data:")
sonar_df.describe()

The dataframe has 208 rows and 61 columns
Here are some basic stats regarding the numerical data:


| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | ... | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 |
| mean | 0.029164 | 0.038437 | 0.043832 | 0.053892 | 0.075202 | 0.10457 | 0.121747 | 0.134799 | 0.178003 | 0.208259 | ... | 0.016069 | 0.01342 | 0.010709 | 0.010941 | 0.00929 | 0.008222 | 0.00782 | 0.007949 | 0.007941 | 0.006507 |
| std | 0.022991 | 0.03296 | 0.038428 | 0.046528 | 0.055552 | 0.059105 | 0.061788 | 0.085152 | 0.118387 | 0.134416 | ... | 0.012008 | 0.009634 | 0.00706 | 0.007301 | 0.007088 | 0.005736 | 0.005785 | 0.00647 | 0.006181 | 0.005031 |
| min | 0.0015 | 0.0006 | 0.0015 | 0.0058 | 0.0067 | 0.0102 | 0.0033 | 0.0055 | 0.0075 | 0.0113 | ... | 0.0 | 0.0008 | 0.0005 | 0.001 | 0.0006 | 0.0004 | 0.0003 | 0.0003 | 0.0001 | 0.0006 |
| 25% | 0.01335 | 0.01645 | 0.01895 | 0.024375 | 0.03805 | 0.067025 | 0.0809 | 0.080425 | 0.097025 | 0.111275 | ... | 0.008425 | 0.007275 | 0.005075 | 0.005375 | 0.00415 | 0.0044 | 0.0037 | 0.0036 | 0.003675 | 0.0031 |
| 50% | 0.0228 | 0.0308 | 0.0343 | 0.04405 | 0.0625 | 0.09215 | 0.10695 | 0.1121 | 0.15225 | 0.1824 | ... | 0.0139 | 0.0114 | 0.00955 | 0.0093 | 0.0075 | 0.00685 | 0.00595 | 0.0058 | 0.0064 | 0.0053 |
| 75% | 0.03555 | 0.04795 | 0.05795 | 0.0645 | 0.100275 | 0.134125 | 0.154 | 0.1696 | 0.233425 | 0.2687 | ... | 0.020825 | 0.016725 | 0.0149 | 0.0145 | 0.0121 | 0.010575 | 0.010425 | 0.01035 | 0.010325 | 0.008525 |
| max | 0.1371 | 0.2339 | 0.3059 | 0.4264 | 0.401 | 0.3823 | 0.3729 | 0.459 | 0.6828 | 0.7106 | ... | 0.1004 | 0.0709 | 0.039 | 0.0352 | 0.0447 | 0.0394 | 0.0355 | 0.044 | 0.0364 | 0.0439 |


The last column (index = 60) states whether the object is a rock (R) or a mine (M).

In [99]:
# count how many samples belong to each class in the label column (60)
sonar_df[60].value_counts()

M    111
R     97
Name: 60, dtype: int64

To train a good model, we need:
- Lots of data for both categories
- A roughly equal amount of data for both categories

Given the scope of this beginner project, the amount of data is sufficient (ideally we would have at least 1000 samples per category). Since the two classes are almost balanced (111 mines vs. 97 rocks), the model should not be strongly biased toward either class, and the accuracy score is a meaningful measure of its performance.
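The balance claim can be checked with a quick calculation on the counts reported by `value_counts()` above. A useful reference point is the majority-class baseline: a trivial model that always predicts "M" would already be right 111 out of 208 times, so any accuracy score is best read against that number.

```python
# Class counts as reported by sonar_df[60].value_counts() above
class_counts = {"M": 111, "R": 97}
n_total = sum(class_counts.values())  # 208 samples

# Share of each class in the dataset (M ~ 0.534, R ~ 0.466)
proportions = {label: count / n_total for label, count in class_counts.items()}
print(proportions)

# Majority-class baseline: always predicting "M" is right 111/208 of the time
baseline_accuracy = max(class_counts.values()) / n_total
print(f"Majority-class baseline accuracy: {baseline_accuracy:.1%}")  # 53.4%
```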

Separate the input data X, i.e. the **features** (columns 0 to 59), from the data to be predicted, Y, i.e. the **labels** (column 60).

In [102]:
# separate the features (X) and the labels (Y)
X = sonar_df.drop(60, axis=1)
Y = sonar_df.iloc[:,60]
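Because the CSV is read with `header=None`, pandas labels the columns with the integers 0 to 60, which is why column 60 can be addressed directly by number. The sketch below shows the same feature/label separation on a tiny, made-up three-feature CSV (an assumption standing in for `sonar_data.csv`, where the label sits in column 60 rather than 3).

```python
import io

import pandas as pd

# ASSUMPTION: a tiny 3-feature CSV stands in for sonar_data.csv (which has 60 features)
toy_csv = io.StringIO(
    "0.02,0.03,0.04,R\n"
    "0.05,0.01,0.09,M\n"
    "0.07,0.02,0.03,M\n"
)
toy_df = pd.read_csv(toy_csv, header=None)  # no header row -> columns labelled 0, 1, 2, 3
label_col = toy_df.columns[-1]              # the label is the last column

X_toy = toy_df.drop(label_col, axis=1)  # features: every column except the label
y_toy = toy_df[label_col]               # labels: 'R' / 'M'

print(list(X_toy.columns))  # [0, 1, 2]
print(y_toy.tolist())       # ['R', 'M', 'M']
```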

## **Train-Test Split**

In [103]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, 
                                                    random_state = 1,
                                                    test_size = 0.1,
                                                    stratify = Y)

Outputs:
- `X_train` = training data; `Y_train` = corresponding labels of the training data
- `X_test` = testing data; `Y_test` = corresponding labels of the testing data

Parameters:
- 10% of the data will be in the testing set => `test_size = 0.1`
- the training and testing sets keep the same proportion of rocks (R) and mines (M) as the full dataset => `stratify = Y`
- the data is split the same way every time the function is called, so results are reproducible => `random_state = 1`
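To see what `stratify` does, the sketch below splits dummy labels with the same class counts as the SONAR data (111 M, 97 R) and prints the share of mines in each part. The labels and the row-number "features" are placeholders, not the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# ASSUMPTION: dummy labels with the SONAR class counts (111 M, 97 R);
# feature values don't matter here, so row numbers stand in for them
labels = np.array(["M"] * 111 + ["R"] * 97)
rows = np.arange(len(labels)).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(
    rows, labels, test_size=0.1, stratify=labels, random_state=1
)

# With stratify, each part keeps roughly the same share of mines as the full set
for name, part in [("full", labels), ("train", y_tr), ("test", y_te)]:
    share_m = np.mean(part == "M")
    print(f"{name:>5}: {len(part):3d} samples, {share_m:.1%} mines")
```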

## **Model Training and Evaluation**


In [104]:
# logistic regression model
sonar_log_model = LogisticRegression()

In [105]:
# fit training data to the model
sonar_log_model.fit(X_train, Y_train)

LogisticRegression()

Check accuracy on training data first...
- `X_train_pred`: prediction values by the model on training data
- `Y_train`: true/actual values
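Accuracy is simply the fraction of predictions that match the true labels. The sketch below, on five made-up labels, computes it by hand and with `accuracy_score`, whose documented argument order is `(y_true, y_pred)`; because accuracy is symmetric, swapping the arguments gives the same number.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels for five objects (not taken from the SONAR data)
y_true = np.array(["M", "R", "M", "M", "R"])
y_pred = np.array(["M", "R", "R", "M", "R"])

# accuracy = (number of matching labels) / (total number of labels) = 4/5
manual_accuracy = np.mean(y_true == y_pred)
sk_accuracy = accuracy_score(y_true, y_pred)  # argument order: (y_true, y_pred)

print(manual_accuracy, sk_accuracy)
```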


In [106]:
# accuracy on training data
X_train_pred = sonar_log_model.predict(X_train)
accuracy_X_train_pred = accuracy_score(Y_train, X_train_pred)  # accuracy_score(y_true, y_pred)
print(f"Accuracy on Training data: {accuracy_X_train_pred*100:.1f}%")

Accuracy on Training data: 83.4%


...then on testing data:
- `X_test_pred`: prediction values by the model on test data
- `Y_test`: true/actual values

In [107]:
# accuracy on test data
X_test_pred = sonar_log_model.predict(X_test)
accuracy_X_test_pred = accuracy_score(Y_test, X_test_pred)  # accuracy_score(y_true, y_pred)
print(f"Accuracy on Testing data: {accuracy_X_test_pred*100:.1f}%")

Accuracy on Testing data: 76.2%
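With only 208 samples and `test_size = 0.1`, the test set is small, so the test accuracy moves in coarse steps. The arithmetic below relies on `train_test_split` rounding a fractional test size up, and shows which counts of correct predictions the reported percentages correspond to: a single misclassified test sample shifts the score by about 4.8 percentage points, which is worth keeping in mind when comparing 83.4% with 76.2%.

```python
import math

n_samples = 208
n_test = math.ceil(0.1 * n_samples)  # fractional test_size is rounded up: 21 rows
n_train = n_samples - n_test         # 187 rows

# Each test sample is worth about 4.8 percentage points of accuracy
step = 100 / n_test
print(f"One test sample = {step:.1f} percentage points")

# Counts of correct predictions behind the reported scores
print(f"16/{n_test} = {16 / n_test:.1%}")      # 76.2%
print(f"156/{n_train} = {156 / n_train:.1%}")  # 83.4%
```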


## **Making a Predictive System**

Choose a random row from the dataset (one that is not in the training data!) as input and check whether the model predicts its label correctly.

In [108]:
from random import choice

def rand_row_idx(X_train):
  '''
  function that accepts a training set and returns (1) the row indices used in that training set and
  (2) a random row index from the full dataset X whose values haven't been used in that training set
  '''
  train_idxs = np.asarray(X_train.index)  # all row indices used in the training set
  remaining_idxs = X.index.difference(X_train.index).tolist()  # row indices of X not used for training
  return train_idxs, choice(remaining_idxs)  # selects a single unused row index at random

Check that the random row index is not one of the training-set indices...

In [109]:
# check whether a randomly chosen row index is in the list of training-set indices
train_idxs, candidate_idx = rand_row_idx(X_train)  # call once so both values come from the same draw
print(candidate_idx in train_idxs)

False


In [110]:
row_idx = rand_row_idx(X_train)[1]
row_idx

109

Get the values corresponding to the random row index and use the model to make a prediction.
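`predict` expects a 2-D input of shape `(n_samples, n_features)`, while a single row is 1-D. The sketch below shows the effect of `reshape(1, -1)` on a dummy 60-value row (an assumption standing in for a row of `X`); the `-1` tells NumPy to infer that dimension from the data.

```python
import numpy as np

# ASSUMPTION: a dummy row of 60 feature values stands in for a row of X
row = np.linspace(0.0, 0.5, 60)   # 1-D array, shape (60,)
demo_input = row.reshape(1, -1)   # 2-D array, shape (1, 60): one sample, 60 features

print(row.shape, demo_input.shape)  # (60,) (1, 60)
```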

In [111]:
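# reshape the selected row into a 2-D array of shape (1, 60): one sample, 60 features
# (row_idx is an index label taken from X.index, so select it by label with .loc)
new_input = np.asarray(X.loc[row_idx]).reshape(1,-1)

def rock_or_mine(input_data):
  '''function that takes in a set of values as input, and returns a prediction of whether
  those values correspond to a rock (R) or a mine (M)'''
  return sonar_log_model.predict(input_data)[0]  # predict returns an array; take its single label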

In [112]:
if rock_or_mine(new_input) == 'R':
  print("The object predicted is a: Rock")
else:
  print("The object predicted is a: Mine")

if rock_or_mine(new_input) == Y[row_idx]:
  print("This is a correct prediction :)")
else:
  print("This is an incorrect prediction :(")

The object predicted is a: Mine
This is a correct prediction :)
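The predictive system could also report how confident the model is. The sketch below is one possible extension, not part of the original notebook: `rock_or_mine_with_confidence` is a hypothetical helper built on `LogisticRegression.predict_proba`, and synthetic data from `make_classification` stands in for the SONAR data. For brevity it fits on all rows and predicts one of them; in the notebook you would use `sonar_log_model` and a held-out row instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# ASSUMPTION: synthetic stand-in for the SONAR data (208 rows x 60 features, labels 'M'/'R')
X_syn, y_int = make_classification(n_samples=208, n_features=60, random_state=0)
y_syn = np.where(y_int == 1, "M", "R")
syn_model = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)

def rock_or_mine_with_confidence(model, row):
    """Hypothetical helper: return the predicted label and its estimated probability."""
    probs = model.predict_proba(np.asarray(row).reshape(1, -1))[0]  # one probability per class
    best = int(np.argmax(probs))
    return model.classes_[best], probs[best]  # classes_ is sorted: ['M', 'R']

label, confidence = rock_or_mine_with_confidence(syn_model, X_syn[0])
name = "Rock" if label == "R" else "Mine"
print(f"The object predicted is a: {name} (estimated probability {confidence:.2f})")
```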
