# 🧠 Sonar Rock vs Mine Classifier (Full ML Project)

This project uses a dataset of sonar signals to predict whether an object is a **rock** or a **mine**.

## 🔍 What You'll Learn:
- How to load and explore a dataset
- How to clean and preprocess data (encode labels, check class balance, scale features)
- How to train ML models (Logistic Regression & Random Forest)
- How to evaluate using accuracy, confusion matrix, and classification report
- How to use cross-validation and tune hyperparameters with GridSearchCV
- How to test with new data
- How to write a prediction function
- Bonus: Visualizations to understand model performance

In [None]:
# ✅ Step 1: Importing Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, ConfusionMatrixDisplay

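Step 8 relies on `ConfusionMatrixDisplay.from_estimator`, which was added in scikit-learn 1.0. A quick sanity check of the installed version, as a sketch (`HAS_FROM_ESTIMATOR` is just an illustrative flag name):

```python
import sklearn
from sklearn.metrics import ConfusionMatrixDisplay

# Parse "major.minor" from version strings such as "1.4.2" or "1.5.dev0"
major, minor = (int(part) for part in sklearn.__version__.split('.')[:2])

# Illustrative flag: from_estimator (used in Step 8) needs scikit-learn >= 1.0
HAS_FROM_ESTIMATOR = (major, minor) >= (1, 0)
print('scikit-learn', sklearn.__version__, '| from_estimator available:', HAS_FROM_ESTIMATOR)
```
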
In [None]:
# ✅ Step 2: Load the Dataset
df = pd.read_csv('https://raw.githubusercontent.com/krishnaik06/SONAR-Data-Analysis/master/sonar.all-data.csv', header=None)
# Adding column names for clarity
df.columns = [str(i) for i in range(60)] + ['Label']
df.head()

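If the GitHub URL is unreachable, `pd.read_csv` can read a local copy of the same file instead. A minimal loader sketch that also checks the layout this notebook assumes (60 feature columns plus an `R`/`M` label); `load_sonar` is a hypothetical helper, and the demo feeds it a two-row synthetic CSV so it runs offline:

```python
import io

import numpy as np
import pandas as pd


def load_sonar(source):
    """Read a headerless sonar CSV (60 features + label) and validate its layout."""
    frame = pd.read_csv(source, header=None)
    if frame.shape[1] != 61:
        raise ValueError(f'expected 61 columns, got {frame.shape[1]}')
    frame.columns = [str(i) for i in range(60)] + ['Label']
    unexpected = set(frame['Label'].unique()) - {'R', 'M'}
    if unexpected:
        raise ValueError(f'unexpected labels: {sorted(unexpected)}')
    return frame


# Two synthetic rows standing in for the real file (path or URL works the same way)
rng = np.random.default_rng(0)
rows = [','.join(f'{v:.4f}' for v in rng.random(60)) + f',{lab}' for lab in ('R', 'M')]
demo = load_sonar(io.StringIO('\n'.join(rows)))
print(demo.shape)  # (2, 61)
```
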
In [None]:
# ✅ Step 3: Understand the Structure of the Data
print('Shape:', df.shape)
print('\nInfo:')
df.info()  # info() prints its own summary and returns None, so don't wrap it in print()
print('\nMissing Values:', df.isnull().sum().sum())
print('\nClass Distribution:')
print(df['Label'].value_counts())

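Two checks worth adding to this step: class proportions (rather than raw counts) and whether every feature stays in the 0.0 to 1.0 range that the UCI description of this dataset gives for each band's energy. A sketch on a hypothetical four-row, three-feature frame; in the notebook, the same lines can run on `df` before encoding:

```python
import pandas as pd

# Hypothetical stand-in for df: 3 feature columns instead of 60, plus Label
toy = pd.DataFrame({
    '0': [0.02, 0.45, 0.10, 0.33],
    '1': [0.37, 0.12, 0.08, 0.91],
    '2': [0.50, 0.61, 0.22, 0.07],
    'Label': ['R', 'M', 'M', 'R'],
})

# Fraction of rows per class; values near 0.5 mean the classes are roughly balanced
proportions = toy['Label'].value_counts(normalize=True)
print(proportions)

# True only if every feature value lies in [0, 1]
features = toy.drop(columns='Label')
in_range = bool(((features >= 0) & (features <= 1)).all().all())
print('All features in [0, 1]:', in_range)  # All features in [0, 1]: True
```
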
In [None]:
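# ✅ Step 4: Encode Labels (M = 1 = Mine, R = 0 = Rock)
# LabelEncoder would assign codes alphabetically (M -> 0, R -> 1), the reverse of this
# convention, so map the labels explicitly
df['Label'] = df['Label'].map({'R': 0, 'M': 1})
df['Label'].value_counts()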

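It is worth verifying which integer each class actually receives. `LabelEncoder` assigns codes in sorted order of the class names, so on `'M'`/`'R'` it produces M = 0 and R = 1; an explicit dictionary is one way to guarantee the 1 = Mine convention that the plot title in Step 5 and `predict_sample` in Step 11 rely on. A sketch on a hypothetical five-label series:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

labels = pd.Series(['R', 'M', 'M', 'R', 'M'])

# LabelEncoder orders classes by sorted name, so 'M' gets 0 and 'R' gets 1
encoder = LabelEncoder().fit(labels)
mapping = {str(c): int(code) for c, code in zip(encoder.classes_, encoder.transform(encoder.classes_))}
print(mapping)  # {'M': 0, 'R': 1}

# An explicit dictionary pins down the 1 = Mine, 0 = Rock convention
explicit = labels.map({'R': 0, 'M': 1})
print(explicit.tolist())  # [0, 1, 1, 0, 1]
```
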
In [None]:
# ✅ Step 5: Visualize Class Distribution
sns.countplot(x='Label', data=df)
plt.title('Class Distribution (0 = Rock, 1 = Mine)')
plt.show()

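A plot that says more about the signals themselves is the average value of each feature per class, drawn as one line per class across the frequency bands. A sketch with `plot_class_means` as a hypothetical helper and a small random frame standing in for `df`:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_class_means(frame, label_col='Label'):
    """Plot the mean of every feature column, one line per class; returns (means, ax)."""
    means = frame.groupby(label_col).mean()  # rows = classes, columns = features
    fig, ax = plt.subplots(figsize=(8, 3))
    for cls, row in means.iterrows():
        ax.plot(range(len(row)), row.values, label=f'class {cls}')
    ax.set_xlabel('Feature index (frequency band)')
    ax.set_ylabel('Mean value')
    ax.legend()
    return means, ax


# Synthetic stand-in for df: 4 rows, 5 features, labels already encoded as 0/1
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.random((4, 5)), columns=[str(i) for i in range(5)])
demo['Label'] = [0, 1, 0, 1]
means, ax = plot_class_means(demo)
```

In the notebook, `plot_class_means(df)` followed by `plt.show()` would draw the Rock and Mine profiles over all 60 features.
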
In [None]:
# ✅ Step 6: Train-Test Split and Feature Scaling
X = df.drop('Label', axis=1)
y = df['Label']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

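What `stratify=y` and the train-only `fit_transform` buy can be checked directly. A sketch on synthetic data from `make_classification`, shaped like the sonar set (208 rows, 60 features, so a 0.2 test split yields 42 test rows); the numbers are illustrative, not results on the real data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 208 rows, 60 features, a mild class imbalance
X_demo, y_demo = make_classification(n_samples=208, n_features=60, weights=[0.47], random_state=0)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)
print(len(X_tr), len(X_te))  # 166 42

# stratify keeps the share of class 1 nearly identical in both splits
print(y_tr.mean().round(3), y_te.mean().round(3))

# Mean/std are learned from the training split only; train columns end up ~0 mean, ~1 std
scaler_demo = StandardScaler().fit(X_tr)
X_tr_scaled = scaler_demo.transform(X_tr)
X_te_scaled = scaler_demo.transform(X_te)  # test data reuses the training statistics
```
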
In [None]:
# ✅ Step 7: Train Logistic Regression and Random Forest
lr_model = LogisticRegression(max_iter=1000)  # more than the default 100 iterations, to avoid convergence warnings
lr_model.fit(X_train, y_train)
lr_pred = lr_model.predict(X_test)

rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)
rf_pred = rf_model.predict(X_test)

In [None]:
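After fitting, both models expose which features they lean on: `feature_importances_` for the forest (impurity-based, summing to 1) and `coef_` for logistic regression, whose magnitudes are only comparable because the inputs are standardized. A sketch on standardized synthetic data; `max_iter=1000` is a precaution added here against convergence warnings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 208 rows, 60 features, 10 of them informative
X_demo, y_demo = make_classification(n_samples=208, n_features=60, n_informative=10, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)

lr_demo = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)
rf_demo = RandomForestClassifier(random_state=42).fit(X_demo, y_demo)

# Indices of the five features with the highest impurity-based importance
top_rf = np.argsort(rf_demo.feature_importances_)[::-1][:5]
# Indices of the five largest absolute logistic regression coefficients
top_lr = np.argsort(np.abs(lr_demo.coef_[0]))[::-1][:5]
print('Top RF features:', top_rf)
print('Top LR features:', top_lr)
```
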
# ✅ Step 8: Evaluate Models
print('Logistic Regression Accuracy:', accuracy_score(y_test, lr_pred))
print('Random Forest Accuracy:', accuracy_score(y_test, rf_pred))

print('\nClassification Report (Random Forest):')
print(classification_report(y_test, rf_pred))

ConfusionMatrixDisplay.from_estimator(rf_model, X_test, y_test)
plt.title('Confusion Matrix - Random Forest')
plt.show()

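The classification report's precision and recall come straight from the confusion matrix. For mine detection, recall on class 1 (the share of real mines that get flagged) is arguably the number to watch, since a missed mine costs more than a false alarm. A worked sketch on ten hypothetical labels:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: 1 = Mine, 0 = Rock
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]

# For labels [0, 1] the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3 2 1 4

precision = tp / (tp + fp)  # 4 / 6: of the predicted mines, how many were mines
recall = tp / (tp + fn)     # 4 / 5: of the real mines, how many were caught
print(round(precision, 3), round(recall, 3))  # 0.667 0.8
```
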
In [None]:
# ✅ Step 9: Cross-Validation on the Full Dataset (unscaled X is fine: Random Forest is insensitive to feature scaling)
scores = cross_val_score(rf_model, X, y, cv=5)
print('Cross-validation scores:', scores)
print('Mean CV Score:', scores.mean())

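Passing unscaled `X` works for Random Forest, but for a scale-sensitive model like Logistic Regression the scaler should be fit inside each fold so validation rows never leak into its statistics. A sketch using a `Pipeline` and a shuffled `StratifiedKFold` (the shuffle and `random_state=42` are choices of this example, not of the original cell), on synthetic data, reporting the fold spread alongside the mean:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in shaped like the sonar data
X_demo, y_demo = make_classification(n_samples=208, n_features=60, random_state=0)

# The scaler is re-fit on each training fold, so validation rows never
# influence the scaling statistics
lr_pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# shuffle=True randomizes fold membership; without it, folds follow row order
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
pipe_scores = cross_val_score(lr_pipe, X_demo, y_demo, cv=cv)
print(f'Accuracy: {pipe_scores.mean():.3f} ± {pipe_scores.std():.3f}')
```
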
In [None]:
# ✅ Step 10: Hyperparameter Tuning (Random Forest)
param_grid = {'n_estimators': [50, 100, 150], 'max_depth': [None, 5, 10]}
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
grid.fit(X_train, y_train)
print('Best Parameters:', grid.best_params_)
best_model = grid.best_estimator_
print('Test Accuracy (Tuned Model):', accuracy_score(y_test, best_model.predict(X_test)))

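The grid in this step has 3 × 3 = 9 parameter combinations, so with `cv=3` GridSearchCV performs 27 fits, plus one final refit of the best combination on the whole training set (the default `refit=True`). The sketch below checks that count with `ParameterGrid` and then ranks every candidate from `cv_results_`, which shows how close the runners-up are rather than only `best_params_`; it runs on synthetic data:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid

grid_params = {'n_estimators': [50, 100, 150], 'max_depth': [None, 5, 10]}
n_candidates = len(ParameterGrid(grid_params))
print(n_candidates, 'candidates x 3 folds =', n_candidates * 3, 'fits')  # 9 candidates x 3 folds = 27 fits

# Synthetic stand-in for the scaled training split
X_demo, y_demo = make_classification(n_samples=166, n_features=60, random_state=0)
grid_demo = GridSearchCV(RandomForestClassifier(random_state=42), grid_params, cv=3)
grid_demo.fit(X_demo, y_demo)

# One row per candidate, best-ranked first
results = pd.DataFrame(grid_demo.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').to_string(index=False))
```
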
In [None]:
# ✅ Step 11: Predict on New Data
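def predict_sample(sample):
    # Expects 60 raw (unscaled) feature values; wrap them in a DataFrame with the
    # training column names so the scaler sees the feature names it was fit on
    sample = pd.DataFrame(np.asarray(sample, dtype=float).reshape(1, -1), columns=X.columns)
    sample_scaled = scaler.transform(sample)
    pred = best_model.predict(sample_scaled)[0]
    return 'Mine' if pred == 1 else 'Rock'

# X_test was already scaled in Step 6, so undo the scaling before passing a row in;
# otherwise predict_sample would scale it twice
raw_sample = scaler.inverse_transform(X_test[:1])[0]
print('Prediction:', predict_sample(raw_sample), '| True label:', y_test.iloc[0])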

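Keeping the scaler and model together in one `Pipeline` removes the need to remember which inputs are already scaled, and `predict_proba` gives a confidence rather than just a label. A sketch with a hypothetical `predict_with_confidence` helper and an adjustable `threshold` (also hypothetical); it assumes class 1 means Mine and is fit on synthetic data here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def predict_with_confidence(model, sample, threshold=0.5):
    """Classify one raw 60-value sample; assumes class 1 = Mine, 0 = Rock."""
    sample = np.asarray(sample, dtype=float).reshape(1, -1)
    proba = model.predict_proba(sample)[0]
    # Column order of predict_proba follows model.classes_
    p_mine = float(proba[list(model.classes_).index(1)])
    return ('Mine' if p_mine >= threshold else 'Rock'), p_mine


# Synthetic stand-in data; the scaler is redundant for a forest but mirrors the notebook
X_demo, y_demo = make_classification(n_samples=208, n_features=60, random_state=0)
pipe = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42)).fit(X_demo, y_demo)

label, p_mine = predict_with_confidence(pipe, X_demo[0])
print(label, round(p_mine, 2))
```

Lowering `threshold` below 0.5 flags more objects as mines, trading extra false alarms for fewer missed mines. A single fitted pipeline is also a convenient object to save (for example with `joblib.dump`) for the Streamlit step.
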
## ✅ Conclusion
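- Logistic Regression and Random Forest both work, and Random Forest is expected to perform better here; confirm it with the Step 8 accuracies and the cross-validation scores, since a 42-sample test set is noisy.
- Cross-validation gives a more reliable performance estimate than a single train-test split, and the spread of the fold scores shows how stable the model is.
- GridSearchCV searched over `n_estimators` and `max_depth` for a better-performing Random Forest.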
- Final model is wrapped in a simple prediction function.

**Next Steps:** Try deploying this with Streamlit!