# SONAR Rock vs Mine Prediction

**Problem Statement:**
The goal of this project is to build a machine learning model that can classify sonar signals as either rocks or mines based on how sonar waves reflect off these objects.

**How It Works:**
1. When sonar waves hit a metal mine, they produce a strong, distinct reflection because metal is dense and has a smooth surface.

2. When sonar waves hit a rock, the reflection is weaker and more scattered because rocks have rough and irregular surfaces.

3. By analyzing these reflected signals, a machine learning model can learn the patterns and classify new signals correctly.

**Steps:**

1. **Collect Data** – Use a dataset with sonar signals labeled as rocks or mines.

2. **Train a Model** – Use machine learning to recognize patterns in sonar reflections.

3. **Test & Improve** – Check accuracy and fine-tune the model for better results.

4. **Make Predictions** – Use the trained model to classify new sonar signals.

This project uses a sonar dataset. We first preprocess the data, then split it into training and test sets, train a logistic regression model on the training set, and finally feed new data to the trained model for prediction.
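The workflow described above can be sketched end to end in a few lines. This is only a minimal sketch: the data here is synthetic (random numbers standing in for the sonar readings), and the names `features`, `labels`, `clf`, and `new_sample` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the sonar data (an assumption, not the real dataset):
# 208 samples, 60 numeric features, labels "M" or "R".
rng = np.random.default_rng(0)
n_samples, n_features = 208, 60
labels = np.array(["M"] * 111 + ["R"] * 97)
features = rng.random((n_samples, n_features))
# Shift the "M" rows slightly so the two classes differ to some degree.
features[labels == "M"] += 0.15

# 1. split, 2. train, 3. evaluate, 4. predict on a new sample
x_tr, x_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.1, stratify=labels, random_state=1
)
clf = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
test_accuracy = accuracy_score(y_te, clf.predict(x_te))
new_sample = rng.random((1, n_features))
new_prediction = clf.predict(new_sample)

print("test accuracy:", test_accuracy)
print("prediction for new sample:", new_prediction[0])
```

The rest of the notebook follows the same four steps on the real dataset.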

### Importing the dependencies

In [1]:
import numpy as np   # arrays and numerical operations
import pandas as pd   # data manipulation
from sklearn.model_selection import train_test_split   # splitting the dataset into training and test sets
from sklearn.linear_model import LogisticRegression     # logistic regression classifier
from sklearn.metrics import accuracy_score     # evaluation metric for the supervised classification model

### Data Collection and Data Processing

In [2]:
# Load the dataset into a pandas DataFrame
sonardf = pd.read_csv("C:\\Users\\Tejas Nigade\\Downloads\\Copy of sonar data.csv")

In [3]:
sonardf  #displaying dataset

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,52,53,54,55,56,57,58,59,60,61
0,0.0200,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,...,0.0027,0.0065,0.0159,0.0072,0.0167,0.0180,0.0084,0.0090,0.0032,R
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0084,0.0089,0.0048,0.0094,0.0191,0.0140,0.0049,0.0052,0.0044,R
2,0.0262,0.0582,0.1099,0.1083,0.0974,0.2280,0.2431,0.3771,0.5598,0.6194,...,0.0232,0.0166,0.0095,0.0180,0.0244,0.0316,0.0164,0.0095,0.0078,R
3,0.0100,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,...,0.0121,0.0036,0.0150,0.0085,0.0073,0.0050,0.0044,0.0040,0.0117,R
4,0.0762,0.0666,0.0481,0.0394,0.0590,0.0649,0.1209,0.2467,0.3564,0.4459,...,0.0031,0.0054,0.0105,0.0110,0.0015,0.0072,0.0048,0.0107,0.0094,R
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
203,0.0187,0.0346,0.0168,0.0177,0.0393,0.1630,0.2028,0.1694,0.2328,0.2684,...,0.0116,0.0098,0.0199,0.0033,0.0101,0.0065,0.0115,0.0193,0.0157,M
204,0.0323,0.0101,0.0298,0.0564,0.0760,0.0958,0.0990,0.1018,0.1030,0.2154,...,0.0061,0.0093,0.0135,0.0063,0.0063,0.0034,0.0032,0.0062,0.0067,M
205,0.0522,0.0437,0.0180,0.0292,0.0351,0.1171,0.1257,0.1178,0.1258,0.2529,...,0.0160,0.0029,0.0051,0.0062,0.0089,0.0140,0.0138,0.0077,0.0031,M
206,0.0303,0.0353,0.0490,0.0608,0.0167,0.1354,0.1465,0.1123,0.1945,0.2354,...,0.0086,0.0046,0.0126,0.0036,0.0035,0.0034,0.0079,0.0036,0.0048,M


In [4]:
sonardf.head()   #displaying first five rows

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,52,53,54,55,56,57,58,59,60,61
0,0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,...,0.0027,0.0065,0.0159,0.0072,0.0167,0.018,0.0084,0.009,0.0032,R
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0084,0.0089,0.0048,0.0094,0.0191,0.014,0.0049,0.0052,0.0044,R
2,0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,...,0.0232,0.0166,0.0095,0.018,0.0244,0.0316,0.0164,0.0095,0.0078,R
3,0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,...,0.0121,0.0036,0.015,0.0085,0.0073,0.005,0.0044,0.004,0.0117,R
4,0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,...,0.0031,0.0054,0.0105,0.011,0.0015,0.0072,0.0048,0.0107,0.0094,R


In [19]:
sonardf.tail()    #displaying last 5 rows

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,52,53,54,55,56,57,58,59,60,61
203,0.0187,0.0346,0.0168,0.0177,0.0393,0.163,0.2028,0.1694,0.2328,0.2684,...,0.0116,0.0098,0.0199,0.0033,0.0101,0.0065,0.0115,0.0193,0.0157,M
204,0.0323,0.0101,0.0298,0.0564,0.076,0.0958,0.099,0.1018,0.103,0.2154,...,0.0061,0.0093,0.0135,0.0063,0.0063,0.0034,0.0032,0.0062,0.0067,M
205,0.0522,0.0437,0.018,0.0292,0.0351,0.1171,0.1257,0.1178,0.1258,0.2529,...,0.016,0.0029,0.0051,0.0062,0.0089,0.014,0.0138,0.0077,0.0031,M
206,0.0303,0.0353,0.049,0.0608,0.0167,0.1354,0.1465,0.1123,0.1945,0.2354,...,0.0086,0.0046,0.0126,0.0036,0.0035,0.0034,0.0079,0.0036,0.0048,M
207,0.026,0.0363,0.0136,0.0272,0.0214,0.0338,0.0655,0.14,0.1843,0.2354,...,0.0146,0.0129,0.0047,0.0039,0.0061,0.004,0.0036,0.0061,0.0115,M


In [5]:
#number of rows and columns
sonardf.shape

(208, 61)

In [6]:
sonardf.describe()    #statistical measures of the data

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,51,52,53,54,55,56,57,58,59,60
count,208.0,208.0,208.0,208.0,208.0,208.0,208.0,208.0,208.0,208.0,...,208.0,208.0,208.0,208.0,208.0,208.0,208.0,208.0,208.0,208.0
mean,0.029164,0.038437,0.043832,0.053892,0.075202,0.10457,0.121747,0.134799,0.178003,0.208259,...,0.016069,0.01342,0.010709,0.010941,0.00929,0.008222,0.00782,0.007949,0.007941,0.006507
std,0.022991,0.03296,0.038428,0.046528,0.055552,0.059105,0.061788,0.085152,0.118387,0.134416,...,0.012008,0.009634,0.00706,0.007301,0.007088,0.005736,0.005785,0.00647,0.006181,0.005031
min,0.0015,0.0006,0.0015,0.0058,0.0067,0.0102,0.0033,0.0055,0.0075,0.0113,...,0.0,0.0008,0.0005,0.001,0.0006,0.0004,0.0003,0.0003,0.0001,0.0006
25%,0.01335,0.01645,0.01895,0.024375,0.03805,0.067025,0.0809,0.080425,0.097025,0.111275,...,0.008425,0.007275,0.005075,0.005375,0.00415,0.0044,0.0037,0.0036,0.003675,0.0031
50%,0.0228,0.0308,0.0343,0.04405,0.0625,0.09215,0.10695,0.1121,0.15225,0.1824,...,0.0139,0.0114,0.00955,0.0093,0.0075,0.00685,0.00595,0.0058,0.0064,0.0053
75%,0.03555,0.04795,0.05795,0.0645,0.100275,0.134125,0.154,0.1696,0.233425,0.2687,...,0.020825,0.016725,0.0149,0.0145,0.0121,0.010575,0.010425,0.01035,0.010325,0.008525
max,0.1371,0.2339,0.3059,0.4264,0.401,0.3823,0.3729,0.459,0.6828,0.7106,...,0.1004,0.0709,0.039,0.0352,0.0447,0.0394,0.0355,0.044,0.0364,0.0439


In [7]:
sonardf.nunique()    #checking unique values in each column

1     177
2     182
3     190
4     181
5     193
     ... 
57    121
58    124
59    119
60    109
61      2
Length: 61, dtype: int64

In [8]:
print(sonardf["61"].value_counts())   # 111 mines vs 97 rocks, so there is no serious class imbalance in this dataset

61
M    111
R     97
Name: count, dtype: int64
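To put the class counts in perspective, the sketch below turns them into shares and computes the accuracy of a baseline that always predicts the majority class. The counts are copied from the output above; the names `class_counts`, `class_share`, and `majority_baseline` are illustrative only.

```python
import pandas as pd

# Class counts copied from the value_counts() output above.
class_counts = pd.Series({"M": 111, "R": 97}, name="count")

# Share of each class in the dataset (about 53% mines, 47% rocks).
class_share = class_counts / class_counts.sum()
print(class_share.round(3))

# Accuracy of a baseline that always predicts the majority class.
majority_baseline = class_share.max()
print("majority-class baseline accuracy:", round(majority_baseline, 3))
```

On the real DataFrame, `sonardf["61"].value_counts(normalize=True)` should give the same shares directly. Any model worth keeping should beat the majority-class baseline.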


### *The more data you have, the more accurate your model tends to be.*

Here,

**M --> Mines**

**R --> Rocks**

In [9]:
sonardf.groupby("61").mean()     # mean of each feature, grouped by the target column "61"

61,1,2,3,4,5,6,7,8,9,10,...,51,52,53,54,55,56,57,58,59,60
M,0.034989,0.045544,0.05072,0.064768,0.086715,0.111864,0.128359,0.149832,0.213492,0.251022,...,0.019352,0.016014,0.011643,0.012185,0.009923,0.008914,0.007825,0.00906,0.008695,0.00693
R,0.022498,0.030303,0.035951,0.041447,0.062028,0.096224,0.11418,0.117596,0.137392,0.159325,...,0.012311,0.010453,0.00964,0.009518,0.008567,0.00743,0.007814,0.006677,0.007078,0.006024
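One way to read the grouped means is to look at the gap between the two classes for each feature. The sketch below does this for four columns whose values are copied from the output above; the names `class_means` and `mean_gap` are illustrative, and a large gap is only a rough hint that a feature may help separate the classes.

```python
import pandas as pd

# Per-class means copied from the groupby output above (columns "1", "9", "10", "57").
class_means = pd.DataFrame(
    {
        "1": [0.034989, 0.022498],
        "9": [0.213492, 0.137392],
        "10": [0.251022, 0.159325],
        "57": [0.007825, 0.007814],
    },
    index=pd.Index(["M", "R"], name="61"),
)

# Difference between the mine mean and the rock mean, largest gap first.
mean_gap = (class_means.loc["M"] - class_means.loc["R"]).sort_values(ascending=False)
print(mean_gap)
```

Among these four columns, "10" and "9" show the largest difference between mines and rocks, while "57" is almost identical for both classes.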


### Separating data and labels

In [10]:
x = sonardf.drop(columns=["61"])    #x contains all columns except target column
y = sonardf["61"]       #y contains target column

In [11]:
print(x)     #printing x

          1       2       3       4       5       6       7       8       9  \
0    0.0200  0.0371  0.0428  0.0207  0.0954  0.0986  0.1539  0.1601  0.3109   
1    0.0453  0.0523  0.0843  0.0689  0.1183  0.2583  0.2156  0.3481  0.3337   
2    0.0262  0.0582  0.1099  0.1083  0.0974  0.2280  0.2431  0.3771  0.5598   
3    0.0100  0.0171  0.0623  0.0205  0.0205  0.0368  0.1098  0.1276  0.0598   
4    0.0762  0.0666  0.0481  0.0394  0.0590  0.0649  0.1209  0.2467  0.3564   
..      ...     ...     ...     ...     ...     ...     ...     ...     ...   
203  0.0187  0.0346  0.0168  0.0177  0.0393  0.1630  0.2028  0.1694  0.2328   
204  0.0323  0.0101  0.0298  0.0564  0.0760  0.0958  0.0990  0.1018  0.1030   
205  0.0522  0.0437  0.0180  0.0292  0.0351  0.1171  0.1257  0.1178  0.1258   
206  0.0303  0.0353  0.0490  0.0608  0.0167  0.1354  0.1465  0.1123  0.1945   
207  0.0260  0.0363  0.0136  0.0272  0.0214  0.0338  0.0655  0.1400  0.1843   

         10  ...      51      52      53      54   

In [12]:
print(y)    #printing y

0      R
1      R
2      R
3      R
4      R
      ..
203    M
204    M
205    M
206    M
207    M
Name: 61, Length: 208, dtype: object


### Training and Test Data

In [13]:
# Hold out 10% of the data for testing and use 90% for training.
# stratify=y keeps the proportion of "R" (rock) and "M" (mine) labels in y_train and y_test approximately the same as in y.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, stratify=y, random_state=1)

In [14]:
print(x.shape, x_train.shape, x_test.shape)

(208, 60) (187, 60) (21, 60)


In [15]:
print(y.shape, y_train.shape, y_test.shape)

(208,) (187,) (21,)
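The effect of `stratify` can be checked by comparing class shares before and after the split. This is a small sketch on stand-in labels with the same class counts as the dataset (111 "M", 97 "R"); `labels`, `dummy_features`, `lab_train`, and `lab_test` are names invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in labels with the same class counts as the dataset (111 "M", 97 "R").
labels = pd.Series(["M"] * 111 + ["R"] * 97)
dummy_features = np.zeros((len(labels), 1))  # placeholder; only the labels matter here

_, _, lab_train, lab_test = train_test_split(
    dummy_features, labels, test_size=0.1, stratify=labels, random_state=1
)

# Class shares in the full data and in each split should be close to each other.
print("full :", labels.value_counts(normalize=True).round(3).to_dict())
print("train:", lab_train.value_counts(normalize=True).round(3).to_dict())
print("test :", lab_test.value_counts(normalize=True).round(3).to_dict())
```

The shares cannot match exactly because the test set holds only 21 samples, but they should stay close to the overall split of roughly 53% mines and 47% rocks.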


### Model Training - Logistic Regression

In [16]:
model = LogisticRegression()     # initialize the logistic regression model
# train the model on the training data
model.fit(x_train, y_train)
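For intuition about what the fitted model does: in the two-class case, logistic regression computes a linear score from the features and passes it through the sigmoid function to get a probability. The sketch below checks this against scikit-learn on synthetic data (not the sonar data); `demo_x`, `demo_y`, and `demo_model` are names made up for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data (an assumption for illustration, not the sonar data).
rng = np.random.default_rng(0)
demo_x = rng.random((100, 5))
demo_y = np.where(demo_x[:, 0] + demo_x[:, 1] > 1.0, "R", "M")

demo_model = LogisticRegression().fit(demo_x, demo_y)

# Linear score w.x + b, then the sigmoid turns it into a probability for the
# second class in demo_model.classes_ (here "R", because classes are sorted).
scores = demo_x @ demo_model.coef_[0] + demo_model.intercept_[0]
manual_prob = 1.0 / (1.0 + np.exp(-scores))

# Compare with the probabilities scikit-learn reports for the same class.
sklearn_prob = demo_model.predict_proba(demo_x)[:, 1]
print(demo_model.classes_)
print(np.allclose(manual_prob, sklearn_prob))
```

A sample is assigned to the second class when this probability is above 0.5, which is the same as the linear score being positive.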

### Model Evaluation

In [17]:
# accuracy on training data
x_train_prediction = model.predict(x_train)   # the model makes predictions on the training data
training_data_accuracy = accuracy_score(y_train, x_train_prediction)   # compares the true labels (first argument) with the predictions (second argument)

In [18]:
print("Accuracy on training data : ", training_data_accuracy)   #printing accuracy

Accuracy on training data :  0.8342245989304813


In [21]:
# accuracy on test data
x_test_prediction = model.predict(x_test)    # the model makes predictions on the test data
test_data_accuracy = accuracy_score(y_test, x_test_prediction)   # compares the true labels (first argument) with the predictions (second argument)

In [22]:
print("Accuracy on testing data : ", test_data_accuracy)     #printing accuracy

Accuracy on testing data :  0.7619047619047619
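Accuracy is simply the fraction of predictions that match the true labels. The sketch below computes it by hand on toy labels (hypothetical values, not taken from the dataset) and then converts the two reported scores back into counts of correct predictions; the helper `accuracy` is written for this example only.

```python
import numpy as np

# Accuracy is the fraction of predictions that match the true labels.
def accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Toy labels (hypothetical, for illustration only): 4 of 5 predictions match.
toy_true = ["R", "M", "M", "R", "M"]
toy_pred = ["R", "M", "R", "R", "M"]
print(accuracy(toy_true, toy_pred))

# Convert the reported scores back into counts of correct predictions,
# using the split sizes printed earlier (187 training rows, 21 test rows).
train_correct = round(0.8342245989304813 * 187)
test_correct = round(0.7619047619047619 * 21)
print(train_correct, "of 187;", test_correct, "of 21")
```

The reported scores correspond to 156 of 187 training samples and 16 of 21 test samples. With only 21 test samples, a single prediction changes the test accuracy by about 4.8 percentage points, so the test score is a coarse estimate.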


### Making a Predictive System

In [26]:
input_data = (0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,0.1609,0.1582,0.2238,0.0645,0.066,0.2273,0.31,0.2999,0.5078,0.4797,0.5783,0.5071,0.4328,0.555,0.6711,0.6415,0.7104,0.808,0.6791,0.3857,0.1307,0.2604,0.5121,0.7547,0.8537,0.8507,0.6692,0.6097,0.4943,0.2744,0.051,0.2834,0.2825,0.4256,0.2641,0.1386,0.1051,0.1343,0.0383,0.0324,0.0232,0.0027,0.0065,0.0159,0.0072,0.0167,0.018,0.0084,0.009,0.0032)

# convert the input tuple to a NumPy array
input_data_as_numpy_array = np.asarray(input_data)

# reshape the array to a single row, since we are predicting for one instance
input_data_reshape = input_data_as_numpy_array.reshape(1,-1)
prediction = model.predict(input_data_reshape)
print(prediction)

# Breaking Down reshape(1, -1)
# 1 → Means one row (since we are predicting for one object).
# -1 → Automatically figures out the number of columns based on the input data size.

if prediction[0] == "R":
    print("The Object is Rock.")
else:
    print("The Object is Mine.")

['R']
The Object is Rock.
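The prediction steps above can be wrapped in a small helper so that each new reading needs only one call. This is a sketch, not part of the original notebook: the function name `classify_signal` is hypothetical, and the model used here is a stand-in trained on synthetic data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def classify_signal(fitted_model, readings):
    """Return a readable label for one sonar reading (a sequence of feature values)."""
    row = np.asarray(readings, dtype=float).reshape(1, -1)  # shape (1, n_features)
    label = fitted_model.predict(row)[0]
    return "Rock" if label == "R" else "Mine"

# Stand-in model trained on synthetic data (an assumption, not the notebook's model).
rng = np.random.default_rng(0)
demo_x = rng.random((50, 60))
demo_y = np.where(demo_x[:, 0] > 0.5, "R", "M")
demo_model = LogisticRegression().fit(demo_x, demo_y)

sample = rng.random(60)
# reshape(1, -1) turns a flat vector of 60 values into one row with 60 columns.
print(sample.shape, "->", sample.reshape(1, -1).shape)
print(classify_signal(demo_model, sample))
```

Two caveats apply to the notebook's own example. The values in `input_data` appear to match row 0 of the dataset, so the model may already have seen this sample during training, and the result is not a test on genuinely new data. Also, because the notebook's model was fit on a DataFrame, passing a bare NumPy array may trigger a scikit-learn warning about missing feature names; the prediction itself is still returned.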


