### Seeds Dataset
____

The Wheat Seeds Dataset involves the prediction of species given measurements of seeds from different varieties of wheat.

It is a multi-class (3-class) classification problem. The number of observations for each class is balanced.

There are 210 observations with 7 input variables and 1 output variable. The variable names are as follows:

- Area.
- Perimeter.
- Compactness.
- Length of kernel.
- Width of kernel.
- Asymmetry coefficient.
- Length of kernel groove.
- Class (1, 2, 3).


___

[FOR MORE](https://archive.ics.uci.edu/ml/datasets/seeds)| [DOWNLOAD](https://raw.githubusercontent.com/selva86/datasets/master/seeds.csv)

In [116]:
from sklearn.neural_network import MLPClassifier
import numpy as np
import pandas as pd

In [117]:
df = pd.read_csv("seeds.csv")
df.head()

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8
0,15.26,14.84,0.871,5.763,3.312,2.221,5.22,1
1,14.88,14.57,0.8811,5.554,3.333,1.018,4.956,1
2,14.29,14.09,0.905,5.291,3.337,2.699,4.825,1
3,13.84,13.94,0.8955,5.324,3.379,2.259,4.805,1
4,16.14,14.99,0.9034,5.658,3.562,1.355,5.175,1
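
The columns load as V1 to V8 rather than with the descriptive names listed at the top. A minimal sketch of renaming them, assuming the columns follow the same order as that list; the snake_case names are illustrative and not part of the dataset, and two rows from the output above stand in for the file:

```python
import pandas as pd

# Hypothetical descriptive names, assumed to follow the order of the variable list.
COLUMN_NAMES = {
    "V1": "area",
    "V2": "perimeter",
    "V3": "compactness",
    "V4": "kernel_length",
    "V5": "kernel_width",
    "V6": "asymmetry_coefficient",
    "V7": "groove_length",
    "V8": "wheat_class",
}

# Two rows copied from the df.head() output, used here in place of seeds.csv.
sample = pd.DataFrame(
    [
        [15.26, 14.84, 0.871, 5.763, 3.312, 2.221, 5.22, 1],
        [14.88, 14.57, 0.8811, 5.554, 3.333, 1.018, 4.956, 1],
    ],
    columns=[f"V{i}" for i in range(1, 9)],
)

renamed = sample.rename(columns=COLUMN_NAMES)
print(list(renamed.columns))
```

The same `rename` call could be applied to `df` right after `pd.read_csv`.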


In [118]:
df.tail()

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8
205,12.19,13.2,0.8783,5.137,2.981,3.631,4.87,3
206,11.23,12.88,0.8511,5.14,2.795,4.325,5.003,3
207,13.2,13.66,0.8883,5.236,3.232,8.315,5.056,3
208,11.84,13.21,0.8521,5.175,2.836,3.598,5.044,3
209,12.3,13.34,0.8684,5.243,2.974,5.637,5.063,3


In [119]:
df['V8'].unique()

array([1, 2, 3], dtype=int64)
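
`unique()` shows which labels occur, but not how many rows carry each one or where they sit in the file. A short sketch of checking both, using a synthetic label column in place of `df['V8']`; the 70 rows per class and the sorted order are assumptions derived from the 210 balanced observations and the head/tail output:

```python
import pandas as pd

# Synthetic stand-in for df['V8']: assumed 70 labels per class, grouped by class.
labels = pd.Series([1] * 70 + [2] * 70 + [3] * 70, name="V8")

# Number of rows per class.
counts = labels.value_counts().sort_index()
print(counts)

# Row position at which each class first appears.
first_positions = labels.drop_duplicates().index.tolist()
print(first_positions)
```

If the first positions come out evenly spaced like this, the rows are grouped by class, which matters for any split made by row position.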

In [120]:
df.describe()

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8
count,210.0,210.0,210.0,210.0,210.0,210.0,210.0,210.0
mean,14.847524,14.559286,0.870999,5.628533,3.258605,3.700201,5.408071,2.0
std,2.909699,1.305959,0.023629,0.443063,0.377714,1.503557,0.49148,0.818448
min,10.59,12.41,0.8081,4.899,2.63,0.7651,4.519,1.0
25%,12.27,13.45,0.8569,5.26225,2.944,2.5615,5.045,1.0
50%,14.355,14.32,0.87345,5.5235,3.237,3.599,5.223,2.0
75%,17.305,15.715,0.887775,5.97975,3.56175,4.76875,5.877,3.0
max,21.18,17.25,0.9183,6.675,4.033,8.456,6.55,3.0


In [121]:
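# Split manually by row position instead of using a preprocessing helper.
# Caveat: row 170 lands in neither slice, so the test set has 39 rows, not 40.
# Caveat: the rows are ordered by class, so the last rows are all class 3.
X_train = df.iloc[:170, :-1]
X_test = df.iloc[171:, :-1]
y_train = df.iloc[:170, -1]
y_test = df.iloc[171:, -1]


print(X_train.count())
print(X_test.count())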

V1    170
V2    170
V3    170
V4    170
V5    170
V6    170
V7    170
dtype: int64
V1    39
V2    39
V3    39
V4    39
V5    39
V6    39
V7    39
dtype: int64
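
A positional split like the one above holds out only the last rows of the file. As an alternative, here is a sketch of a shuffled, stratified split with scikit-learn's `train_test_split`. It runs on a synthetic frame (random features, labels assumed to be sorted by class) rather than the real data, and `df_demo` and the other names are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the seeds frame: 7 random features, labels sorted by class.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(
    rng.normal(size=(210, 7)), columns=[f"V{i}" for i in range(1, 8)]
)
df_demo["V8"] = [1] * 70 + [2] * 70 + [3] * 70

X = df_demo.iloc[:, :-1]
y = df_demo.iloc[:, -1]

# Classes present in a positional hold-out taken from the end of the frame.
positional_classes = sorted(y.iloc[171:].unique().tolist())
print(positional_classes)

# Shuffled split that keeps each class's share roughly equal in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=40, stratify=y, random_state=1
)
stratified_classes = sorted(y_te.unique().tolist())
print(len(X_tr), len(X_te), stratified_classes)
```

With `stratify=y`, every class appears in the held-out part, so the evaluation covers all three varieties.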


In [149]:
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(10, 400), random_state=1)

In [150]:
clf.fit(X_train, y_train)

MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(10, 400), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
       solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)
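
The scikit-learn guide listed under the resources notes that multi-layer perceptrons are sensitive to feature scaling, and the inputs here range from below 1 (V3) to about 21 (V1). A sketch of standardizing inside a pipeline, run on synthetic data; the single hidden layer of 10 units and `max_iter=500` are illustrative choices, not the settings used in this notebook:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: three classes, two features on very different scales.
rng = np.random.default_rng(0)
y_demo = np.repeat([1, 2, 3], 30)
X_demo = np.column_stack([
    y_demo * 5.0 + rng.normal(scale=1.0, size=90),      # large-scale feature
    y_demo * 0.01 + rng.normal(scale=0.005, size=90),   # small-scale feature
])

# The pipeline fits the scaler on the data passed to fit(), then trains
# the classifier on the standardized values.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(solver="lbfgs", alpha=1e-5, hidden_layer_sizes=(10,),
                  random_state=1, max_iter=500),
)
model.fit(X_demo, y_demo)

demo_pred = model.predict(X_demo)
print(demo_pred[:5])
```

Calling `model.predict` on new rows applies the same scaling that was learned during `fit`, so no separate transform step is needed.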

In [151]:
# score must be called with data; it returns the mean accuracy on the given set
clf.score(X_test, y_test)

0.89743589743589747

In [152]:
len( clf.predict(X_train) )

170

In [153]:
clf.predict(X_train)

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1,
       1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2,
       2, 2, 3, 3, 1, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 1, 3, 3, 3, 3], dtype=int64)

In [154]:
y_pred=clf.predict(X_test)

In [155]:
from sklearn.metrics import accuracy_score

In [156]:
accuracy_score(y_test, y_pred)

0.89743589743589747

In [157]:
from sklearn.metrics import classification_report,confusion_matrix

In [158]:
print(confusion_matrix(y_test,y_pred))

[[ 0  0]
 [ 4 35]]
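
The matrix above has only two rows and columns because `confusion_matrix` uses just the labels that occur in `y_test` or `y_pred`, here 1 and 3. A sketch of forcing a 3x3 layout with the `labels` argument; the two lists are reconstructed from the counts in the matrix and are not the original arrays:

```python
from sklearn.metrics import confusion_matrix

# Reconstructed from the 2x2 matrix: 39 true class-3 rows,
# of which 4 were predicted as class 1 and 35 as class 3.
y_true_demo = [3] * 39
y_pred_demo = [1] * 4 + [3] * 35

# Rows are true classes, columns are predicted classes, both in the order 1, 2, 3.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[1, 2, 3])
print(cm)
```

The empty rows for classes 1 and 2 make it visible that the test set contains no examples of those varieties.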


In [159]:
print(classification_report(y_test,y_pred))

             precision    recall  f1-score   support

          1       0.00      0.00      0.00         0
          3       1.00      0.90      0.95        39

avg / total       1.00      0.90      0.95        39
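
The figures in the report can be reproduced by hand from the confusion-matrix counts. The sketch below uses only those counts; the variable names are illustrative:

```python
# Counts read off the confusion matrix: all 39 test rows are truly class 3.
true3_pred3 = 35
true3_pred1 = 4
support_3 = true3_pred3 + true3_pred1   # 39 true class-3 rows
support_1 = 0                           # no true class-1 rows in the test set

# Class 3: every row predicted as 3 really is 3, but 4 true 3s were missed.
precision_3 = true3_pred3 / true3_pred3
recall_3 = true3_pred3 / support_3
f1_3 = 2 * precision_3 * recall_3 / (precision_3 + recall_3)

# Class 1: recall would be 0 / 0, which is undefined, so it is left as None here.
recall_1 = None if support_1 == 0 else 0 / support_1

accuracy = true3_pred3 / support_3
print(round(precision_3, 2), round(recall_3, 2), round(f1_3, 2), accuracy)
```

The class 1 row of the report shows 0.00 for a recall that is actually undefined, because the test set holds no class 1 rows. The report therefore says nothing about how well classes 1 and 2 are recognized.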





#### Resources used:
http://scikit-learn.org/stable/modules/neural_networks_supervised.html
