## Importing Dependencies  
- **numpy** is used for numerical computations.  
- **pandas** is used for handling tabular data.  
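- **train_test_split** (from `sklearn.model_selection`) splits the data into training and test sets.  
- **LogisticRegression** (from `sklearn.linear_model`) is the classification model.  
- **accuracy_score** (from `sklearn.metrics`) measures performance as the fraction of correct predictions.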


In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

## Loading the Dataset
- The dataset is loaded with **pd.read_csv()**, the pandas function that reads a CSV (Comma-Separated Values) file into a DataFrame.  
- **header=None** is needed because the file has no header row. By default, pandas treats the first row as column names, which would silently turn the first sample into a header. With **header=None**, every row is treated as data, and pandas assigns integer column labels starting from 0 (here 0 to 60).  
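A minimal sketch of what **header=None** changes, using a hypothetical two-row CSV held in memory (`io.StringIO` stands in for the sonar file; `csv_text`, `df_default` and `df_none` are illustrative names):

```python
import io

import pandas as pd

# Hypothetical headerless CSV: two samples, two features, one label
csv_text = "0.02,0.03,R\n0.45,0.10,M\n"

# Default (header='infer'): the first sample is consumed as column names
df_default = pd.read_csv(io.StringIO(csv_text))

# header=None: every line is data and columns get integer labels 0, 1, 2
df_none = pd.read_csv(io.StringIO(csv_text), header=None)

print(df_default.shape, list(df_default.columns))  # (1, 3) ['0.02', '0.03', 'R']
print(df_none.shape, list(df_none.columns))        # (2, 3) [0, 1, 2]
```

With the default, one sample is lost and the column names become the strings of its values, which is why the notebook can later refer to the label column simply as `60`.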

In [2]:
sonar_data = pd.read_csv('/content/sonar data.csv', header=None)


## Displaying First Few Rows
- **head()** returns the first 5 rows by default (use `head(n)` for a different count), a quick way to check the structure and contents of the dataset.  

In [None]:
sonar_data.head()

## Understanding Dataset Shape
- **shape** is an attribute (not a method, so no parentheses) that returns a `(rows, columns)` tuple giving the dataset's dimensions.  
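A small sketch of reading **shape**, assuming a hypothetical all-zero DataFrame with the same layout this notebook describes (60 feature columns plus one label column); the 4 rows are arbitrary:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with the sonar layout: 4 samples x 61 columns (60 features + label)
df = pd.DataFrame(np.zeros((4, 61)))

n_rows, n_cols = df.shape  # attribute access, no parentheses
print(n_rows, n_cols)      # 4 61
print(len(df))             # 4: len() of a DataFrame is its number of rows
```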


In [None]:
sonar_data.shape

## Statistical Summary  
- **describe()** reports summary statistics for each numerical column: count, mean, standard deviation, minimum, 25th/50th/75th percentiles, and maximum. Non-numeric columns, such as the R/M label column, are excluded by default. It helps in understanding the distribution of the data.  
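A minimal sketch of which rows and columns **describe()** produces, using a hypothetical toy frame that mimics the sonar layout (numeric features plus a string label in column `60`):

```python
import pandas as pd

# Hypothetical toy frame: two numeric feature columns and a string label column
df = pd.DataFrame({
    0: [0.1, 0.2, 0.3, 0.4],
    1: [1.0, 2.0, 3.0, 4.0],
    60: ['R', 'M', 'R', 'M'],
})

summary = df.describe()
print(list(summary.index))     # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(list(summary.columns))   # [0, 1]: the string label column is skipped
print(summary.loc['mean', 1])  # 2.5
```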


In [None]:
sonar_data.describe()

## Checking Class Distribution
- The labels are in the column with index **60**, which is the 61st and last column (indices start at 0).  
- **value_counts()** counts how many samples belong to each class, **R** (rock) or **M** (mine), showing the distribution of the target labels.  
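A short sketch of **value_counts()** on a hypothetical label column (3 mines, 2 rocks), including the `normalize=True` option that turns counts into proportions, which is often easier to read when judging class balance:

```python
import pandas as pd

# Hypothetical label column with 3 mines and 2 rocks
labels = pd.Series(['M', 'R', 'M', 'M', 'R'])

counts = labels.value_counts()                # absolute counts, most frequent first
shares = labels.value_counts(normalize=True)  # same counts as fractions of the total

print(counts['M'], counts['R'])  # 3 2
print(shares['M'], shares['R'])  # 0.6 0.4
```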


In [None]:
sonar_data[60].value_counts()

In [None]:
sonar_data.groupby(60).mean()
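**groupby(60).mean()** groups the rows by the label in column `60` and averages every feature column within each class, giving one row per class. Comparing the two rows shows which features differ most between rocks and mines. A minimal sketch on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy frame: two feature columns and the label in column 60
df = pd.DataFrame({
    0: [1.0, 3.0, 5.0, 7.0],
    1: [2.0, 4.0, 6.0, 8.0],
    60: ['M', 'M', 'R', 'R'],
})

class_means = df.groupby(60).mean()  # index: class labels; columns: feature means
print(class_means)
print(class_means.loc['M', 0], class_means.loc['R', 0])  # 2.0 6.0
```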

- **X**: The features (independent variables) of the dataset, obtained by dropping the column with index **60** from **sonar_data**.
- **Y**: The target (dependent variable) of the dataset, which corresponds to the column with index **60** from **sonar_data**.
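- **drop(columns=60, axis=1)**: Returns a copy of the dataset without the column at index **60** (the target), giving the feature set **X**; **sonar_data** itself is unchanged. Because `columns=` already selects the column axis, `axis=1` is redundant here, and `drop(columns=60)` is equivalent.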
- **sonar_data[60]**: Selects the column at index **60** from **sonar_data** as the target variable **Y**.


In [None]:
# separate features (X) and labels (Y)
X = sonar_data.drop(columns=60, axis=1)
Y = sonar_data[60]
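A minimal sketch of the same separation on a hypothetical toy frame (the names `toy_df`, `features` and `labels` are illustrative and chosen so they do not overwrite the notebook's `X` and `Y`):

```python
import pandas as pd

# Hypothetical toy frame in the sonar layout: features in columns 0 and 1, label in 60
toy_df = pd.DataFrame({0: [0.1, 0.2], 1: [0.3, 0.4], 60: ['R', 'M']})

features = toy_df.drop(columns=60)  # new DataFrame without the label; columns= implies axis=1
labels = toy_df[60]                 # a single column selected as a Series

print(list(features.columns))     # [0, 1]
print(type(labels).__name__)      # Series
print(toy_df.shape)               # (2, 3): drop() returned a copy, toy_df is unchanged
```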

- **print(X)**: This will display the feature set **X**, which contains all the input variables of the dataset except for the column at index **60** (the target variable).
- **print(Y)**: This will display the target variable **Y**, which contains the values from the column at index **60** of the **sonar_data**.


In [None]:
print(X)
print(Y)

## Splitting Data into Train & Test
- **train_test_split()** divides the dataset into training and testing sets, ensuring the model is evaluated on unseen data.  
- **X**: The input features (independent variables) of the dataset.
- **Y**: The target labels (dependent variable) of the dataset.
- **test_size=0.1**: This specifies that **10%** of the dataset will be used for testing, while **90%** will be used for training.
- **stratify=Y**: Preserves the class proportions of `Y` in both the training and test sets. With a test set this small, stratification keeps one class from being over- or under-represented by chance; it matters most for imbalanced datasets.
- **random_state=1**: Fixes the random seed so the split is reproducible: running the code with the same value always produces the same split.
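A minimal sketch of what `stratify` and `random_state` guarantee, on hypothetical toy data with a 60/40 class ratio (`toy_X`, `toy_y` and the 25% test size are illustrative choices, not the notebook's values):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 samples with 2 features; 12 labelled 'M', 8 labelled 'R' (60% / 40%)
toy_X = np.arange(40).reshape(20, 2)
toy_y = np.array(['M'] * 12 + ['R'] * 8)

X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.25, stratify=toy_y, random_state=1
)

print(X_tr.shape, X_te.shape)                     # (15, 2) (5, 2)
print((y_te == 'M').sum(), (y_te == 'R').sum())   # 3 2: the 60/40 ratio is kept

# Same random_state, same split
X_tr2, X_te2, _, _ = train_test_split(
    toy_X, toy_y, test_size=0.25, stratify=toy_y, random_state=1
)
print(np.array_equal(X_te, X_te2))                # True
```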



In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, stratify=Y, random_state=1)

## Printing the Shapes (Dimensions) of the Feature Sets
**print(X.shape, X_train.shape, X_test.shape)** prints:
- **X.shape**: The shape of the original feature set **X**, showing the number of samples (rows) and features (columns).
- **X_train.shape**: The shape of the training feature set **X_train**, showing how many samples and features are used for training.
- **X_test.shape**: The shape of the testing feature set **X_test**, showing how many samples and features are used for testing.
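When `test_size` is a fraction, scikit-learn rounds the number of test samples up and gives the remaining samples to training. A minimal sketch with a hypothetical dataset size of 25, chosen so that 10% is not a whole number (`toy_X`, `toy_train` and `toy_test` are illustrative names):

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset size, chosen so that 10% of it is not a whole number
n_samples = 25
toy_X = np.zeros((n_samples, 60))

toy_train, toy_test = train_test_split(toy_X, test_size=0.1, random_state=1)

print(toy_X.shape, toy_train.shape, toy_test.shape)  # (25, 60) (22, 60) (3, 60)
print(math.ceil(0.1 * n_samples))                    # 3: the test count is rounded up
```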


In [None]:
print(X.shape, X_train.shape, X_test.shape)

- **print(X_train)**: This will display the feature set for training **(`X_train`)**, which contains the input variables used to train the model.
- **print(Y_train)**: This will display the target variable for training **(`Y_train`)**, which contains the labels or outputs corresponding to the training data.


In [None]:
print(X_train)
print(Y_train)

## Training the Logistic Regression Model

- Training a machine learning model means adjusting its parameters to fit the training data; in scikit-learn this is done with the **fit()** method.

- **LogisticRegression()**: A classification algorithm from **sklearn.linear_model** that predicts binary or multi-class outcomes from the input features.

- **fit()**: Trains the logistic regression model on the training data. It takes two arguments, the feature set **(X_train)** and the target variable **(Y_train)**, and learns the relationship between them.



In [None]:
model = LogisticRegression()
model.fit(X_train, Y_train)
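After **fit()**, the model exposes what it learned through attributes such as `classes_`, `coef_` and `intercept_`. A minimal sketch on hypothetical, well-separated toy clusters (`toy_X`, `toy_y` and `toy_model` are illustrative names that avoid overwriting the notebook's `model`). For a binary problem, `predict()` returns `classes_[1]` (here `'R'`) when the decision score is positive, and the columns of `predict_proba()` follow the order of `classes_`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two well-separated clusters with 3 features each
rng = np.random.default_rng(0)
toy_X = np.vstack([rng.normal(0.0, 1.0, (20, 3)), rng.normal(3.0, 1.0, (20, 3))])
toy_y = np.array(['M'] * 20 + ['R'] * 20)

toy_model = LogisticRegression()
fitted = toy_model.fit(toy_X, toy_y)  # fit() returns the estimator itself

print(fitted is toy_model)         # True
print(toy_model.classes_)          # ['M' 'R']: labels are sorted
print(toy_model.coef_.shape)       # (1, 3): one weight per feature (binary case)
print(toy_model.intercept_.shape)  # (1,)
print(toy_model.predict_proba(toy_X[:2]).sum(axis=1))  # each row sums to 1
```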

## Evaluating Model Performance
- **X_train_prediction = model.predict(X_train)**: Uses the trained logistic regression model to predict the labels for the training feature set **X_train** and stores them in **X_train_prediction**.

- **training_data_accuracy = accuracy_score(Y_train, X_train_prediction)**: Computes the accuracy on the training data by comparing the actual labels **(Y_train)** with the predicted labels **(X_train_prediction)**. **accuracy_score** from **sklearn.metrics** takes the true labels first and the predictions second (accuracy gives the same value either way, but many other metrics do not), and returns the proportion of correct predictions, stored in **training_data_accuracy**.


In [None]:
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)

In [None]:
print('Accuracy on training data : ', training_data_accuracy)

In [None]:
# accuracy on test data
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)

In [None]:
print('Accuracy on test data : ', test_data_accuracy)
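**accuracy_score** is simply the fraction of positions where the true and predicted labels match. A minimal sketch with hypothetical labels for five samples (4 of 5 correct); for scikit-learn classifiers, `model.score(X_test, Y_test)` returns the same mean accuracy in one call:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels for five samples (4 of 5 match)
y_true = np.array(['R', 'M', 'M', 'R', 'M'])
y_pred = np.array(['R', 'M', 'R', 'R', 'M'])

acc = accuracy_score(y_true, y_pred)  # signature: accuracy_score(y_true, y_pred)
manual = np.mean(y_true == y_pred)    # same quantity: fraction of element-wise matches

print(acc, manual)  # 0.8 0.8
```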

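## Making a Predictive System
- A new sonar reading (60 feature values) is converted to a NumPy array and reshaped to a single row, because **predict()** expects a 2-D input of shape `(n_samples, n_features)`.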

In [None]:
input_data = (0.0307,0.0523,0.0653,0.0521,0.0611,0.0577,0.0665,0.0664,0.1460,0.2792,0.3877,0.4992,0.4981,0.4972,0.5607,0.7339,0.8230,0.9173,0.9975,0.9911,0.8240,0.6498,0.5980,0.4862,0.3150,0.1543,0.0989,0.0284,0.1008,0.2636,0.2694,0.2930,0.2925,0.3998,0.3660,0.3172,0.4609,0.4374,0.1820,0.3376,0.6202,0.4448,0.1863,0.1420,0.0589,0.0576,0.0672,0.0269,0.0245,0.0190,0.0063,0.0321,0.0189,0.0137,0.0277,0.0152,0.0052,0.0121,0.0124,0.0055)

# changing the input_data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)

# reshape to one row, shape (1, 60), since we predict a single instance; -1 lets NumPy infer the column count
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

prediction = model.predict(input_data_reshaped)
print(prediction)

if prediction[0] == 'R':
    print('The object is a Rock')
else:
    print('The object is a Mine')
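The prediction steps above can be wrapped in a reusable function that also checks the input length, since a reading with the wrong number of values would otherwise fail inside scikit-learn with a less specific message. A minimal sketch: `predict_object` is a hypothetical helper (not part of scikit-learn), and the demo model is fitted on synthetic data rather than the sonar dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def predict_object(model, features):
    """Hypothetical helper: classify one sonar reading as 'Rock' or 'Mine'."""
    # predict() needs a 2-D array of shape (n_samples, n_features): here one row
    row = np.asarray(features, dtype=float).reshape(1, -1)
    if row.shape[1] != model.n_features_in_:
        raise ValueError(f"expected {model.n_features_in_} values, got {row.shape[1]}")
    label = model.predict(row)[0]
    return 'Rock' if label == 'R' else 'Mine'


# Demo on synthetic data (not the sonar dataset): 60 features, labels 'M' / 'R'
rng = np.random.default_rng(0)
toy_X = np.vstack([rng.normal(0.0, 1.0, (20, 60)), rng.normal(1.0, 1.0, (20, 60))])
toy_y = np.array(['M'] * 20 + ['R'] * 20)
toy_model = LogisticRegression(max_iter=1000).fit(toy_X, toy_y)

print(predict_object(toy_model, toy_X[0]))
```

In the notebook itself, the equivalent call would be `predict_object(model, input_data)`.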
