# Neural Networks for Classification

In this project, you'll be working with one of the best-known machine learning datasets: the [Iris Data Set](https://archive.ics.uci.edu/ml/datasets/Iris) hosted at the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.html). Our goal is to train a network to identify the species of an iris from the flower's sepal length, sepal width, petal length, and petal width.

The dataset contains 50 data points for each of the three species, *Iris setosa*, *Iris versicolour*, and *Iris virginica*, for a total of 150 data points.
![Iris types](img/iris_types.jpg)

### Data Description

We can load the data into a `pandas` dataframe as follows:

In [1]:
import pandas as pd

iris = pd.read_csv('data/iris.csv')

# Display the first few rows of the dataframe
iris.head()

| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


We can count the non-null entries in each column to check that none of our data points have missing values:

In [12]:
iris.count()

Id               150
SepalLengthCm    150
SepalWidthCm     150
PetalLengthCm    150
PetalWidthCm     150
Species          150
dtype: int64
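`DataFrame.count()` counts only non-null entries, so a column with missing values would report fewer than 150. A minimal sketch on a small made-up frame (the values below are hypothetical, not taken from the Iris file) shows how `count()`, `len()`, and `isna().sum()` differ:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing petal width
toy = pd.DataFrame({
    "PetalLengthCm": [1.4, 4.7, 6.0],
    "PetalWidthCm": [0.2, np.nan, 2.5],
})

non_null = toy.count()        # non-null entries per column
missing = toy.isna().sum()    # missing entries per column
print(len(toy))               # total number of rows, including incomplete ones
print(non_null)
print(missing)
```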

And we can see the data type for each column like so:

In [13]:
iris.dtypes

Id                 int64
SepalLengthCm    float64
SepalWidthCm     float64
PetalLengthCm    float64
PetalWidthCm     float64
Species           object
dtype: object
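`Species` is stored as strings (`object`), while most classifiers, including a neural network's output layer, work with integer class labels. One way to encode them, sketched here with `pd.factorize` on a few hypothetical values in the same format as the CSV:

```python
import pandas as pd

# A few hypothetical Species values in the same format as the CSV
species = pd.Series(["Iris-setosa", "Iris-versicolor", "Iris-setosa", "Iris-virginica"])

# factorize assigns integer codes in order of first appearance
codes, uniques = pd.factorize(species)
print(codes)          # [0 1 0 2]
print(list(uniques))  # code i corresponds to uniques[i]
```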

### Visualization

In machine learning problems, it helps to visualize the data where possible to get a feel for the problem. The `seaborn` library has some great tools for this.

**Caution:** You may not have `seaborn` installed on your machine. If so, install it with `pip` from your shell (Mac OS X/Linux): `pip install seaborn`. On Windows, `pip` may fail to install `scipy`, which older `seaborn` releases require; in that case, install the package with `conda` or download and install a prebuilt wheel manually.



We can visualize the relationship between two features and the target classes using `seaborn`'s `FacetGrid`:

In [15]:
%matplotlib inline
import seaborn as sns
import matplotlib.pyplot as plt

sns.FacetGrid(iris, hue="Species", height=6) \
   .map(plt.scatter, "SepalLengthCm", "SepalWidthCm") \
   .add_legend()

ImportError: No module named 'seaborn'
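If `seaborn` isn't installed, a similar per-species scatter plot can be drawn with plain `matplotlib`. This sketch assumes scikit-learn's bundled copy of the dataset (`load_iris`, which this notebook uses later) instead of `data/iris.csv`, so the feature names differ; it also selects the non-interactive `Agg` backend so it runs outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris_bunch = load_iris()
X, y = iris_bunch.data, iris_bunch.target

fig, ax = plt.subplots(figsize=(6, 6))
# One scatter call per species so each gets its own color and legend entry
for label, name in enumerate(iris_bunch.target_names):
    mask = y == label
    ax.scatter(X[mask, 0], X[mask, 1], label=name)
ax.set_xlabel("sepal length (cm)")
ax.set_ylabel("sepal width (cm)")
ax.legend(title="Species")
```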

Or we can use `pairplot` to do this for all combinations of features!

In [16]:
sns.pairplot(iris.drop("Id", axis=1), hue="Species", height=3)

NameError: name 'sns' is not defined

From these plots, *Iris setosa* appears to be linearly separable from the other two species, most clearly in the petal measurements. This could prove useful when designing our network classifier.
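A quick numerical check of the separability claim: on petal length alone, a single threshold (a linear boundary in one dimension) splits *Iris setosa* from the other two species. The 2.5 cm cut-off is an illustrative value chosen for this sketch, and the data comes from scikit-learn's bundled copy rather than the CSV.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_bunch = load_iris()
petal_length = iris_bunch.data[:, 2]   # column 2 is 'petal length (cm)'
is_setosa = iris_bunch.target == 0     # label 0 is setosa

# Illustrative threshold: predict setosa when petal length < 2.5 cm
predicted_setosa = petal_length < 2.5
print(np.all(predicted_setosa == is_setosa))
# Largest setosa petal length vs. smallest petal length of the other species
print(petal_length[is_setosa].max(), petal_length[~is_setosa].min())
```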

Now that we've loaded our data and we know how it's structured, it's up to *you* to create a neural network classifier! I've given you some code to branch off of below. Good luck!

AttributeError: module 'sklearn' has no attribute 'placeholder'

[Womp womp](http://www.priceisrightfailhorn.com/). See if you can change the hyperparameters (learning rate, batch size, num_batches, etc.), the cost function (uncomment the one you'd like to use) or the structure of the network itself to yield better results!
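As a starting point for experimenting with those knobs, here is a minimal one-hidden-layer network written from scratch in NumPy. It is a sketch, not the starter code mentioned above: the architecture (10 `tanh` hidden units, softmax output), the cross-entropy cost, and every hyperparameter value are assumptions, and the names `learning_rate`, `batch_size`, and `num_batches` simply mirror the paragraph. It uses scikit-learn's bundled copy of the data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Hyperparameters (illustrative values; tune these)
learning_rate = 0.1
batch_size = 16
num_batches = 2000
hidden_units = 10

rng = np.random.default_rng(0)
data = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.4, random_state=4, stratify=data.target)

# Standardize features using training-set statistics only
mean, std = X_tr.mean(axis=0), X_tr.std(axis=0)
X_tr, X_te = (X_tr - mean) / std, (X_te - mean) / std

n_features, n_classes = X_tr.shape[1], 3
W1 = rng.normal(0, 0.5, (n_features, hidden_units))
b1 = np.zeros(hidden_units)
W2 = rng.normal(0, 0.5, (hidden_units, n_classes))
b2 = np.zeros(n_classes)

def forward(X):
    h = np.tanh(X @ W1 + b1)                     # hidden layer activations
    logits = h @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return h, p / p.sum(axis=1, keepdims=True)   # softmax probabilities

for step in range(num_batches):
    # Sample one mini-batch without replacement
    idx = rng.choice(len(X_tr), size=batch_size, replace=False)
    xb, yb = X_tr[idx], y_tr[idx]
    h, p = forward(xb)
    # Gradient of the mean cross-entropy w.r.t. the logits: (p - one_hot) / batch_size
    d_logits = p.copy()
    d_logits[np.arange(batch_size), yb] -= 1
    d_logits /= batch_size
    dW2, db2 = h.T @ d_logits, d_logits.sum(axis=0)
    d_h = (d_logits @ W2.T) * (1 - h ** 2)       # backprop through tanh
    dW1, db1 = xb.T @ d_h, d_h.sum(axis=0)
    # Plain SGD update
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

_, p_test = forward(X_te)
test_accuracy = np.mean(p_test.argmax(axis=1) == y_te)
print(test_accuracy)
```

Swapping the cost function or adding a second hidden layer only requires changing `forward` and the matching gradient lines.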

In [6]:
from sklearn.datasets import load_iris

In [7]:
iris = load_iris()
type(iris)

sklearn.datasets.base.Bunch
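A `Bunch` is a dictionary whose keys can also be read as attributes, which is why both `iris.data` and `iris["data"]` work. A short sketch:

```python
from sklearn.datasets import load_iris

iris_bunch = load_iris()

# The same array is reachable by key and by attribute
data_by_key = iris_bunch["data"]
data_by_attr = iris_bunch.data
print(sorted(iris_bunch.keys()))
```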

In [9]:
print(iris.data)

[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]
 [ 5.4  3.9  1.7  0.4]
 [ 4.6  3.4  1.4  0.3]
 [ 5.   3.4  1.5  0.2]
 [ 4.4  2.9  1.4  0.2]
 [ 4.9  3.1  1.5  0.1]
 [ 5.4  3.7  1.5  0.2]
 [ 4.8  3.4  1.6  0.2]
 [ 4.8  3.   1.4  0.1]
 [ 4.3  3.   1.1  0.1]
 [ 5.8  4.   1.2  0.2]
 [ 5.7  4.4  1.5  0.4]
 [ 5.4  3.9  1.3  0.4]
 [ 5.1  3.5  1.4  0.3]
 [ 5.7  3.8  1.7  0.3]
 [ 5.1  3.8  1.5  0.3]
 [ 5.4  3.4  1.7  0.2]
 [ 5.1  3.7  1.5  0.4]
 [ 4.6  3.6  1.   0.2]
 [ 5.1  3.3  1.7  0.5]
 [ 4.8  3.4  1.9  0.2]
 [ 5.   3.   1.6  0.2]
 [ 5.   3.4  1.6  0.4]
 [ 5.2  3.5  1.5  0.2]
 [ 5.2  3.4  1.4  0.2]
 [ 4.7  3.2  1.6  0.2]
 [ 4.8  3.1  1.6  0.2]
 [ 5.4  3.4  1.5  0.4]
 [ 5.2  4.1  1.5  0.1]
 [ 5.5  4.2  1.4  0.2]
 [ 4.9  3.1  1.5  0.1]
 [ 5.   3.2  1.2  0.2]
 [ 5.5  3.5  1.3  0.2]
 [ 4.9  3.1  1.5  0.1]
 [ 4.4  3.   1.3  0.2]
 [ 5.1  3.4  1.5  0.2]
 [ 5.   3.5  1.3  0.3]
 [ 4.5  2.3  1.3  0.3]
 [ 4.4  3.2  1.3  0.2]
 [ 5.   3.5

In [10]:
print(iris.feature_names)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


In [11]:
print(iris.target)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]


In [12]:
print(iris.target_names)

['setosa' 'versicolor' 'virginica']
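Because `target_names` is a NumPy array, indexing it with the integer labels turns every label into its species name. This sketch also confirms the 50-per-species count stated at the top:

```python
import numpy as np
from sklearn.datasets import load_iris

iris_bunch = load_iris()

# Fancy indexing: label i becomes target_names[i]
species_names = iris_bunch.target_names[iris_bunch.target]
names, counts = np.unique(species_names, return_counts=True)
print(dict(zip(names, counts)))
```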


In [14]:
print(type(iris.data))
print(type(iris.target))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>


In [15]:
print(iris.data.shape)

(150, 4)


In [16]:
print(iris.target.shape)

(150,)


In [17]:
X = iris.data
y = iris.target

In [18]:
from sklearn.neighbors import KNeighborsClassifier

In [19]:
knn = KNeighborsClassifier(n_neighbors=1)

In [20]:
print(knn)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=1, p=2,
           weights='uniform')


In [22]:
knn.fit(X, y)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=1, p=2,
           weights='uniform')
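With `n_neighbors=1` and `metric='minkowski', p=2` (the defaults printed above), the classifier labels a flower with the species of the single closest training flower by Euclidean distance. A from-scratch sketch of that rule; `predict_1nn` is a hypothetical helper, not part of scikit-learn:

```python
import numpy as np
from sklearn.datasets import load_iris

def predict_1nn(X_train, y_train, X_query):
    """Hypothetical helper: label each query row with the label of its nearest training row."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    # Pairwise Euclidean distances, shape (n_query, n_train), via broadcasting
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[dists.argmin(axis=1)]

iris_bunch = load_iris()
X, y = iris_bunch.data, iris_bunch.target
train_preds = predict_1nn(X, y, X)
print(np.mean(train_preds == y))
```

Scoring 1-NN on the data it was fitted on is nearly meaningless, since every point is its own nearest neighbor; that is why a held-out test set is needed.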

In [23]:
knn.predict([[3, 5, 4, 2]])

array([2])
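scikit-learn estimators expect a 2-D input of shape `(n_samples, n_features)`, even for a single sample; recent versions raise a `ValueError` for a flat 1-D list instead of reshaping it. Two equivalent ways to pass one flower, sketched:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris_bunch = load_iris()
knn = KNeighborsClassifier(n_neighbors=1).fit(iris_bunch.data, iris_bunch.target)

sample = np.array([3, 5, 4, 2])
as_row = sample.reshape(1, -1)         # shape (1, 4): one sample, four features
pred_a = knn.predict(as_row)
pred_b = knn.predict([[3, 5, 4, 2]])   # a nested list works too
print(as_row.shape, pred_a, pred_b)
```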

In [24]:
X_new = [[3,5,4,2],[5,4,3,2]]
knn.predict(X_new)

array([2, 1])

In [25]:
print(X.shape)
print(y.shape)

(150, 4)
(150,)


In [27]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=4)
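`test_size=0.4` holds out 40% of the 150 samples (60) for testing. A random split can leave the classes slightly unbalanced between the two sets; passing `stratify=y` (an addition, not used in the cell above) keeps the 1:1:1 species ratio in both:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y preserves the class proportions in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4, stratify=y)
print(np.bincount(y_train), np.bincount(y_test))
```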

In [28]:
print(X_train.shape)
print(X_test.shape)

(90, 4)
(60, 4)


In [29]:
print(y_train.shape)
print(y_test.shape)

(90,)
(60,)


In [36]:
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
logreg.fit(X_train,y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [39]:
from sklearn import metrics
y_pred = logreg.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))

0.95
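Accuracy is the fraction of test samples whose predicted label matches the true label, so 0.95 on the 60 held-out flowers means 57 correct predictions (0.95 × 60 = 57). A small sketch with hypothetical labels, computing it by hand and with `accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels for illustration
y_true = np.array([0, 1, 2, 2])
y_pred = np.array([0, 2, 2, 2])

manual = np.mean(y_true == y_pred)   # matching labels / total labels
print(manual, accuracy_score(y_true, y_pred))
```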


In [46]:
# My classifier: k-nearest neighbors with k = 11
# I don't think the model over-fits: I held out 40% of the data
# (test_size=0.4) as a test set, trained only on the other 60%,
# and the accuracy on the held-out data is high.

knn = KNeighborsClassifier(n_neighbors=11)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))

0.983333333333
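Choosing `n_neighbors` by its score on the test set makes that score an optimistic estimate, because the test data has influenced the model choice. A common alternative is to pick k by cross-validation. The sketch below scores each candidate k with 5-fold cross-validation; the candidate range 1 to 25 is an assumption, and for brevity it uses all 150 samples rather than only the training split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each candidate k
k_range = range(1, 26)
cv_scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y,
                       cv=5, scoring="accuracy").mean()
    for k in k_range
}
best_k = max(cv_scores, key=cv_scores.get)
print(best_k, cv_scores[best_k])
```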
