## Classification: Detect Pipe or Rock with Sonar Data

Classification of rock or mine (metal cylinder) with [sonar data](https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks)). Sonar (sound navigation and ranging) uses sound waves to detect objects, much as a bat uses echolocation to navigate and detect objects. The same principle underlies seismic surveys, which explore underground geologic formations non-invasively to locate oil or gas reserves.

```python
url = 'https://apmonitor.com/pds/uploads/Main/sonar_detection.txt'
```

The sonar data were collected from different angles and locations to distinguish rock from pipe. The measurements were made in a laboratory under controlled conditions as a case study for detecting underground pipe. There are 111 labeled patterns from the metal cylinder (pipe) and 97 patterns from rocks under similar conditions. Each sample is a set of 60 numbers between 0 and 1. Each number represents the integrated energy within a distinct frequency band over a given time period.

[Pipe / Rock Sonar Case Study](https://apmonitor.com/pds/index.php/Main/SonarDetection) on [Machine Learning for Engineers](https://apmonitor.com/pds/index.php/Main/SonarDetection)

Although this case study is specific to distinguishing metal pipe from rock, it is similar to detecting other underground features such as tunnels, mines, aquifers, and fluid-filled pipelines.

### Import Packages

Import *pandas*, *matplotlib*, and other packages that you need for this exercise.
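
The following is a minimal import cell that covers the data handling, plotting, and preprocessing steps in this exercise. The choice of `f_classif` for feature ranking is an assumption made here. The classifiers are imported later in their own section.

```python
# Core packages for data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# scikit-learn utilities for scaling, feature ranking, and splitting
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
```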

### Read Data

Read data as a Pandas dataframe with `data=pd.read_csv(url)`. Show 10 random rows with `data.sample(10)`. 
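
The step above can be wrapped in a hypothetical helper, `load_sonar`. It reads the file, prints the table size, and counts missing values, which is useful for the data-quality question below. This is a sketch, not part of the original exercise. If the file matches the description above, expect 208 rows (111 pipe + 97 rock) and 61 columns (60 features plus `Class`).

```python
import pandas as pd

url = 'https://apmonitor.com/pds/uploads/Main/sonar_detection.txt'

def load_sonar(source=url):
    """Read the sonar data (URL, path, or buffer) and report its size and missing values."""
    data = pd.read_csv(source)
    print('rows, columns:', data.shape)
    print('missing values:', int(data.isnull().sum().sum()))
    return data
```

In the notebook, call `data = load_sonar()` and then `data.sample(10)` to show 10 random rows.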

### Data Visualization



```python
# box plot of each sonar feature; fontsize=1 effectively hides tick labels in the 6x10 grid
data.plot(kind='box', subplots=True, layout=(6, 10),
          sharex=False, legend=False, fontsize=1,
          figsize=(12, 8))
plt.show()
```

```python
# number of samples in each class
data.groupby('Class').size()
```

```python
# bar chart of the class balance
data['Class'].value_counts().plot(kind='bar')
plt.show()
```

```python
# correlation matrix of the sonar features
# numeric_only=True skips the non-numeric 'Class' column (required in pandas >= 2.0)
fig, ax = plt.subplots(figsize=(10, 10))
cax = ax.matshow(data.corr(numeric_only=True), vmin=-1, vmax=1,
                 cmap='Spectral_r', interpolation='none')
fig.colorbar(cax)
plt.savefig('sonar_correlation.png')
plt.show()
```

```python
# kernel density estimate of each feature (pandas uses scipy for this plot)
data.plot(kind='density', subplots=True, layout=(6, 10),
          sharex=False, legend=False, fontsize=1,
          figsize=(12, 8))
plt.show()
```

```python
# histogram of each feature
data.hist(sharex=False, sharey=False, layout=(6, 10),
          xlabelsize=1, ylabelsize=1, figsize=(12, 8))
plt.show()
```

What insights do you gain from the data visualization and exploration? In particular, comment on the presence of uniform data distributions, outliers, missing data, and other data quality issues.
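
The visual answer can be backed with numbers. The hypothetical helper `data_quality_report` counts three things per feature: missing values, values outside the documented 0 to 1 range, and outliers under the common 1.5×IQR rule. The IQR rule is an assumption chosen here; it is not part of the exercise.

```python
import pandas as pd

def data_quality_report(data, label='Class'):
    """Per-feature counts of missing values, out-of-range values, and 1.5*IQR outliers."""
    X = data.drop(columns=label)
    q1, q3 = X.quantile(0.25), X.quantile(0.75)   # quartiles per column (NaN ignored)
    iqr = q3 - q1
    # comparisons broadcast each column against its own bounds
    outliers = ((X < q1 - 1.5 * iqr) | (X > q3 + 1.5 * iqr)).sum()
    return pd.DataFrame({
        'missing': X.isnull().sum(),
        'below_0': (X < 0).sum(),
        'above_1': (X > 1).sum(),
        'iqr_outliers': outliers,
    })
```

`data_quality_report(data).sum()` gives totals across all 60 features.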

### Scale Data

Scale the data with a [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) so that every feature (sonar return) has zero mean and unit variance.
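
Here is a sketch of the scaling step, assuming the label column is named `Class` as in the plots above. The helper name `scale_features` is hypothetical. One caveat: fitting the scaler on all samples before the train/test split lets test-set statistics leak into training. A stricter alternative is to fit on the training rows only and call `scaler.transform` on the test rows.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def scale_features(data, label='Class'):
    """Standardize each feature to zero mean and unit variance; return features, labels, scaler."""
    X = data.drop(columns=label)   # sonar frequency-band energies
    y = data[label]                # pipe / rock labels
    scaler = StandardScaler()
    X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns, index=X.index)
    return X_scaled, y, scaler
```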

### Best Features

Which features are most correlated with, or most influential for, predicting pipe versus rock? Use the scikit-learn `SelectKBest` class to produce a ranked list.
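
This sketch ranks features with `SelectKBest`. The scoring function `f_classif` (ANOVA F-test) is an assumption made here. `chi2` would reject standardized data because it requires non-negative inputs. The helper name `rank_features` is hypothetical.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

def rank_features(X, y, k=10):
    """Return the k features with the highest ANOVA F-score, best first."""
    selector = SelectKBest(score_func=f_classif, k=k)
    selector.fit(X, y)
    scores = pd.Series(selector.scores_, index=X.columns, name='F-score')
    return scores.sort_values(ascending=False).head(k)
```

For example, `rank_features(X_scaled, y, k=10)` lists the ten most discriminating frequency bands.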

### Train / Test Split

Split the data into **Test** and **Train** sets. Randomly assign 80% of the samples to the train set and 20% to the test set with the scikit-learn `train_test_split` function and `shuffle=True`.
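
This is a minimal version of the split. The `stratify=y` argument is an addition made here and is not required by the exercise. It keeps the pipe/rock ratio the same in both sets, which matters with only about 40 test samples. The fixed `random_state` makes the split reproducible.

```python
from sklearn.model_selection import train_test_split

def split_data(X, y, test_size=0.2, seed=0):
    """Shuffle and split into train/test sets, preserving class proportions."""
    return train_test_split(X, y, test_size=test_size, shuffle=True,
                            stratify=y, random_state=seed)
```

Usage: `X_train, X_test, y_train, y_test = split_data(X_scaled, y)`.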

### Import Classifier Packages

Use 8 classification methods. Possible classification methods include:

- AdaBoost
- Logistic Regression
- Naïve Bayes
- Stochastic Gradient Descent
- K-Nearest Neighbors
- Decision Tree
- Random Forest
- Support Vector Classifier
- Deep Learning Neural Network

The [Scikit-learn documentation](https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html) has additional information on classifiers.
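
One possible mapping from the methods above to scikit-learn classes is shown next. Using `GaussianNB` for Naïve Bayes and `MLPClassifier` (a multilayer perceptron) for the neural network is an assumption made here. A deeper network would need a separate deep learning framework.

```python
from sklearn.ensemble import AdaBoostClassifier        # AdaBoost
from sklearn.linear_model import LogisticRegression    # Logistic Regression
from sklearn.naive_bayes import GaussianNB             # Naive Bayes (Gaussian likelihood)
from sklearn.linear_model import SGDClassifier         # Stochastic Gradient Descent
from sklearn.neighbors import KNeighborsClassifier     # K-Nearest Neighbors
from sklearn.tree import DecisionTreeClassifier        # Decision Tree
from sklearn.ensemble import RandomForestClassifier    # Random Forest
from sklearn.svm import SVC                            # Support Vector Classifier
from sklearn.neural_network import MLPClassifier       # neural network (multilayer perceptron)
```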

### Initialize Classifiers
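
The following sketch initializes one classifier per listed method in a dictionary, so the training and plotting steps can loop over it. The helper name `make_classifiers` is hypothetical. The hyperparameters are illustrative defaults, not tuned values. The dictionary holds all nine methods; drop one if exactly eight are required.

```python
from sklearn import ensemble, linear_model, naive_bayes, neighbors, neural_network, svm, tree

def make_classifiers(seed=0):
    """Return a dict of unfitted classifiers keyed by method name."""
    return {
        'AdaBoost': ensemble.AdaBoostClassifier(random_state=seed),
        'Logistic Regression': linear_model.LogisticRegression(max_iter=1000),
        'Naive Bayes': naive_bayes.GaussianNB(),
        'Stochastic Gradient Descent': linear_model.SGDClassifier(random_state=seed),
        'K-Nearest Neighbors': neighbors.KNeighborsClassifier(n_neighbors=5),
        'Decision Tree': tree.DecisionTreeClassifier(random_state=seed),
        'Random Forest': ensemble.RandomForestClassifier(n_estimators=100, random_state=seed),
        'Support Vector Classifier': svm.SVC(kernel='rbf'),
        # hidden layer sizes are an illustrative guess, not tuned for this data
        'Neural Network': neural_network.MLPClassifier(hidden_layer_sizes=(50, 50),
                                                       max_iter=2000, random_state=seed),
    }
```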

### Train Classifiers
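
This is a minimal training loop, assuming `classifiers` is a dict of unfitted scikit-learn estimators and the train/test sets come from the split above. The hypothetical `train_and_score` also adds a majority-class `DummyClassifier` as a baseline, which is a choice made here. A useful model should beat it. The estimators are fitted in place, so they can be reused for predictions afterward.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

def train_and_score(classifiers, X_train, y_train, X_test, y_test):
    """Fit each classifier plus a majority-class baseline; return test accuracy, best first."""
    models = dict(classifiers)
    models['Baseline (majority class)'] = DummyClassifier(strategy='most_frequent')
    scores = {}
    for name, clf in models.items():
        clf.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, clf.predict(X_test))  # fraction correct
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Usage: `scores = train_and_score(classifiers, X_train, y_train, X_test, y_test)`.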

### Show Confusion Matrix Result

A confusion matrix counts the true positives, false positives, true negatives, and false negatives that a classifier produces on the test set.

```python
# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay replaces it
from sklearn.metrics import ConfusionMatrixDisplay
```

Generate a confusion matrix for each classifier.

### Interpretation of Results

Write an executive summary (2 pages maximum) on the results of the classifiers for the sonar data set. Report the confusion matrix for the test set. What do you recommend for detecting pipe versus rock?
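
A compact table can support the summary. The hypothetical `summary_table` collects accuracy, precision, and recall for each classifier from its test-set predictions. `positive_label` must be the label the file uses for pipe; it is not stated here, so check `data['Class'].unique()`. Sorting by recall ranks models by the fraction of pipes they detect. This assumes a missed pipe costs more than a false alarm.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score

def summary_table(predictions, y_test, positive_label):
    """Accuracy, precision, and recall (for positive_label) per classifier, by recall."""
    rows = {}
    for name, y_pred in predictions.items():
        rows[name] = {
            'accuracy': accuracy_score(y_test, y_pred),
            'precision': precision_score(y_test, y_pred, pos_label=positive_label,
                                         zero_division=0),
            'recall': recall_score(y_test, y_pred, pos_label=positive_label),
        }
    return pd.DataFrame(rows).T.sort_values('recall', ascending=False)
```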