
Overview:

1. Data collection from Google Drive
2. Data cleaning and pre-processing
3. Checking class balance (the dataset is balanced)
4. Checking for missing values
5. Splitting into training and test sets
6. Model selection
7. Model training
8. Prediction and model evaluation

Importing required libraries

In [None]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Data collection

In [None]:
sonar = pd.read_csv('/content/drive/MyDrive/Colab_python/sonar data.csv', header=None)

# checking
sonar.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,51,52,53,54,55,56,57,58,59,60
0,0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,...,0.0027,0.0065,0.0159,0.0072,0.0167,0.018,0.0084,0.009,0.0032,R
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0084,0.0089,0.0048,0.0094,0.0191,0.014,0.0049,0.0052,0.0044,R
2,0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,...,0.0232,0.0166,0.0095,0.018,0.0244,0.0316,0.0164,0.0095,0.0078,R
3,0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,...,0.0121,0.0036,0.015,0.0085,0.0073,0.005,0.0044,0.004,0.0117,R
4,0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,...,0.0031,0.0054,0.0105,0.011,0.0015,0.0072,0.0048,0.0107,0.0094,R


Data cleaning and Pre-processing

In [None]:

# shape
sonar.shape

(208, 61)

In [None]:
# checking missing values
sonar.isnull().sum()

Unnamed: 0,0
0,0
1,0
2,0
3,0
4,0
...,...
56,0
57,0
58,0
59,0


In [None]:
# checking balanced state
sonar.value_counts(60)

60,count
M,111
R,97


Basic pre-processing is done: the dataset has no null values and the two classes are roughly balanced (111 M vs 97 R).
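
To put a number on "balanced", the class counts printed above can be converted to shares. This is a minimal sketch that uses only the counts from the `value_counts` output; the variable names are illustrative and not part of the notebook.

```python
# Class counts copied from the value_counts output above
class_counts = {"M": 111, "R": 97}

total = sum(class_counts.values())

# Share of each class in the whole dataset
class_shares = {label: count / total for label, count in class_counts.items()}

for label, share in class_shares.items():
    print(f"{label}: {share:.1%}")
```

The split is roughly 53 % to 47 %, which is why no resampling step is applied here.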

Splitting the dataset into features (X) and label (Y)

Features - X - independent variables - columns 0 to 59

Label - Y - dependent variable - column 60 (R = rock, M = mine)

In [None]:
X = sonar.drop(60,axis=1)
X.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,50,51,52,53,54,55,56,57,58,59
0,0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,...,0.0232,0.0027,0.0065,0.0159,0.0072,0.0167,0.018,0.0084,0.009,0.0032
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0125,0.0084,0.0089,0.0048,0.0094,0.0191,0.014,0.0049,0.0052,0.0044
2,0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,...,0.0033,0.0232,0.0166,0.0095,0.018,0.0244,0.0316,0.0164,0.0095,0.0078
3,0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,...,0.0241,0.0121,0.0036,0.015,0.0085,0.0073,0.005,0.0044,0.004,0.0117
4,0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,...,0.0156,0.0031,0.0054,0.0105,0.011,0.0015,0.0072,0.0048,0.0107,0.0094


In [None]:
Y = sonar[60]
Y.head()

Unnamed: 0,60
0,R
1,R
2,R
3,R
4,R


Training and Testing

In [None]:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.1,stratify=Y,random_state=3)
print(X.shape,X_train.shape,X_test.shape)

(208, 60) (187, 60) (21, 60)
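
The `stratify=Y` argument asks `train_test_split` to keep the class proportions similar in both parts. The sketch below illustrates this on synthetic stand-in labels with the same class counts as the dataset (111 M, 97 R). The placeholder features and the `mine_share` helper are assumptions made for the illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the label column: 111 mines and 97 rocks
labels = np.array(["M"] * 111 + ["R"] * 97)
# Placeholder features; only the row count matters for this illustration
features = np.zeros((labels.size, 60))

X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.1, stratify=labels, random_state=3
)

def mine_share(y):
    """Fraction of samples labelled 'M' (hypothetical helper)."""
    return float(np.mean(y == "M"))

print(len(y_tr), len(y_te))
print(
    f"mine share: all={mine_share(labels):.3f} "
    f"train={mine_share(y_tr):.3f} test={mine_share(y_te):.3f}"
)
```

With a test set this small, stratification helps avoid a split in which one class is badly under-represented by chance.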


Model Selection

In [None]:
# logistic regression

model = LogisticRegression()

model.fit(X_train,Y_train)

Prediction on the training data

In [None]:
X_train_pred = model.predict(X_train)

Accuracy on the training data

In [None]:
# accuracy_score expects (y_true, y_pred); the value is the same either way
accuracy_trained = accuracy_score(Y_train, X_train_pred)
print(f"accuracy of trained data prediction : {(accuracy_trained * 100):.2f} %")

accuracy of trained data prediction : 85.03 %


Prediction on the test data

In [None]:
X_test_pred = model.predict(X_test)

Accuracy on the test data

In [None]:
# accuracy_score expects (y_true, y_pred); the value is the same either way
accuracy_tested = accuracy_score(Y_test, X_test_pred)
print(f"accuracy of tested data prediction : {(accuracy_tested * 100):.2f} %")

accuracy of tested data prediction : 61.90 %
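
Accuracy alone does not show which class is being misclassified. As a hedged illustration, the sketch below computes both accuracy and a confusion matrix on a few toy labels. The labels are invented for the example and are not the notebook's predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels, invented for illustration only
y_true = ["R", "R", "M", "M", "M"]
y_pred = ["R", "M", "M", "M", "R"]

# scikit-learn's convention is (y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, ordered as in `labels`
cm = confusion_matrix(y_true, y_pred, labels=["M", "R"])

print(f"accuracy: {acc:.2f}")
print(cm)
```

In the notebook, the same two calls could be applied to `Y_test` and `X_test_pred` to see how many mines were predicted as rocks and vice versa.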


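Training accuracy - 85 %

Test accuracy - 62 % - noticeably lower than the training score, and measured on only 21 samples, so it is a rough estimate rather than a reliable figure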
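
For context, the two accuracy figures can be turned back into counts of correct predictions and compared with a majority-class baseline. This is a small worked calculation that uses only numbers already printed in the notebook; the variable names are illustrative.

```python
# Sizes taken from the split output: 187 training rows, 21 test rows
n_train, n_test = 187, 21

# Reported accuracies, as fractions
acc_train, acc_test = 0.8503, 0.6190

# Number of correct predictions implied by each accuracy
correct_train = round(acc_train * n_train)
correct_test = round(acc_test * n_test)

# Accuracy of always predicting the majority class (M: 111 of 208 rows)
baseline = 111 / 208

print(correct_train, correct_test)
print(f"majority-class baseline: {baseline:.2%}")
```

The test score corresponds to 13 correct predictions out of 21, so a single sample changes it by almost 5 percentage points. It is also not far above the roughly 53 % that a constant "M" prediction would reach on the full dataset.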

Building a predictive system for individual rock and mine samples

In [41]:
# rock data

pred_rock = (0.0368,0.0403,0.0317,0.0293,0.0820,0.1342,0.1161,0.0663,0.0155,0.0506,0.0906,0.2545,0.1464,0.1272,0.1223,0.1669,0.1424,0.1285,0.1857,0.1136,0.2069,0.0219,0.2400,0.2547,0.0240,0.1923,0.4753,0.7003,0.6825,0.6443,0.7063,0.5373,0.6601,0.8708,0.9518,0.9605,0.7712,0.6772,0.6431,0.6720,0.6035,0.5155,0.3802,0.2278,0.1522,0.0801,0.0804,0.0752,0.0566,0.0175,0.0058,0.0091,0.0160,0.0160,0.0081,0.0070,0.0135,0.0067,0.0078,0.0068)

rock_np_arr = np.asarray(pred_rock)

rock_np_arr = rock_np_arr.reshape(1,-1)
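
The `asarray` and `reshape(1,-1)` steps above turn one sample into the 2-D shape that `predict` expects. One possible way to wrap them, with a length check, is sketched below. `to_model_input` is a hypothetical helper, and `n_features=60` is taken from the shape of `X`.

```python
import numpy as np

def to_model_input(readings, n_features=60):
    """Convert one sample's readings into a (1, n_features) array.

    Hypothetical helper; n_features=60 matches the feature columns used above.
    """
    arr = np.asarray(readings, dtype=float)
    if arr.ndim != 1 or arr.size != n_features:
        raise ValueError(f"expected {n_features} values, got shape {arr.shape}")
    # A single sample must be 2-D for scikit-learn: one row, n_features columns
    return arr.reshape(1, -1)

# Placeholder input of the right length, just to show the resulting shape
sample = to_model_input([0.0] * 60)
print(sample.shape)
```

A mistyped tuple with 59 or 61 values would then fail early with a clear message instead of raising inside `model.predict`.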

Helper function to print a readable result

In [48]:
def proper(st):
    result = {'R' : 'Obtained prediction is Rock','M' : 'Obtained prediction is Mine'}
    print("output :",result[st])

Prediction of rock data

In [49]:
rock_prediction = model.predict(rock_np_arr)
proper(rock_prediction[0])

output : Obtained prediction is Rock


In [50]:
# mine data

pred_mine = (0.0249,0.0119,0.0277,0.0760,0.1218,0.1538,0.1192,0.1229,0.2119,0.2531,0.2855,0.2961,0.3341,0.4287,0.5205,0.6087,0.7236,0.7577,0.7726,0.8098,0.8995,0.9247,0.9365,0.9853,0.9776,1.0000,0.9896,0.9076,0.7306,0.5758,0.4469,0.3719,0.2079,0.0955,0.0488,0.1406,0.2554,0.2054,0.1614,0.2232,0.1773,0.2293,0.2521,0.1464,0.0673,0.0965,0.1492,0.1128,0.0463,0.0193,0.0140,0.0027,0.0068,0.0150,0.0012,0.0133,0.0048,0.0244,0.0077,0.0074)

mine_np_arr = np.asarray(pred_mine)

mine_np_arr = mine_np_arr.reshape(1,-1)

Prediction of mine data

In [51]:
mine_prediction = model.predict(mine_np_arr)
proper(mine_prediction[0])

output : Obtained prediction is Mine
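
Besides the predicted label, `LogisticRegression` can also report class probabilities through `predict_proba`, which shows how confident a prediction is. The sketch below demonstrates the call on synthetic stand-in data, not the sonar dataset. The cluster means, `max_iter=1000` and all variable names are assumptions made for the illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (NOT the sonar dataset): two shifted clusters
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0.3, 0.1, size=(50, 60)),  # rows labelled 'R'
    rng.normal(0.5, 0.1, size=(50, 60)),  # rows labelled 'M'
])
y_demo = np.array(["R"] * 50 + ["M"] * 50)

demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# One new sample, shaped (1, 60) like the reshaped arrays above
new_sample = rng.normal(0.5, 0.1, size=(1, 60))

# Columns of predict_proba follow the order of demo_model.classes_
probabilities = demo_model.predict_proba(new_sample)[0]
for label, p in zip(demo_model.classes_, probabilities):
    print(f"P({label}) = {p:.3f}")
```

In the notebook, the equivalent call would be `model.predict_proba(rock_np_arr)` or `model.predict_proba(mine_np_arr)`.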


This completes a basic rock vs mine prediction project using a logistic regression model. Thank you.