# Training using Pandas dataframes
In this notebook we'll take a look at how to use pandas dataframes to train a neural network in CNTK.
We're reusing the code from chapter 1, where we trained an iris classification model.

In [1]:
import cntk
import numpy

# Fix the random seeds so that results are reproducible.
cntk._cntk_py.set_fixed_random_seed(1337)
numpy.random.seed(1337)

## Building the model
The model we use here is a basic classification model with a single hidden layer and an output layer with three units, one for each possible species of flower. We use a sigmoid activation function on the hidden layer and a log-softmax activation function on the output layer.

In [2]:
from cntk import input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, sigmoid

model = Sequential([
    Dense(4, activation=sigmoid),
    Dense(3, activation=log_softmax)
])

features = input_variable(4)
labels = input_variable(3)

z = model(features)
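
To make the shapes in this model concrete, here is a minimal numpy sketch of the same forward pass: four features, a sigmoid hidden layer with four units, and a log-softmax output over three classes. The weights, the helper functions, and the sample batch are made up for illustration; this is not how CNTK stores or initialises its parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_softmax(x):
    # Subtract the row maximum for numerical stability.
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

rng = np.random.default_rng(0)

# Hypothetical, randomly initialised parameters (assumption, not CNTK's values).
W1, b1 = rng.normal(size=(4, 4)), np.zeros(4)  # 4 features -> 4 hidden units
W2, b2 = rng.normal(size=(4, 3)), np.zeros(3)  # 4 hidden units -> 3 classes

batch = rng.normal(size=(2, 4)).astype(np.float32)  # two made-up samples

hidden = sigmoid(batch @ W1 + b1)
output = log_softmax(hidden @ W2 + b2)

print(output.shape)                # (2, 3)
print(np.exp(output).sum(axis=1))  # each row of probabilities sums to 1
```

Exponentiating the log-softmax output gives one probability per species for every sample in the batch.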

## Loading and preprocessing the training data
We load up the iris dataset and preprocess it so that we end up with a set of floating point numbers as required by the model.
Note that we don't split the dataset into training and test sets here, because this sample only demonstrates that you can use pandas dataframes as a data source. In a real project you can use `train_test_split` from the `scikit-learn` library to perform the split.

In [3]:
import numpy as np

def one_hot(index, length):
    """Return a vector of `length` zeros with a 1 at position `index`."""
    result = np.zeros(length)
    result[index] = 1

    return result
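
As an alternative sketch, a whole column of class indices can be one-hot encoded at once by indexing into an identity matrix. The name `one_hot_batch` is hypothetical and not part of the notebook.

```python
import numpy as np

def one_hot_batch(indices, length):
    # Row i of the identity matrix is the one-hot vector for class i,
    # so indexing with an array of class ids encodes all of them at once.
    return np.eye(length, dtype=np.float32)[indices]

encoded = one_hot_batch(np.array([0, 2, 1]), 3)
print(encoded)
```

This avoids the Python-level loop over samples and already produces `float32` values.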

In [4]:
import numpy as np
import pandas as pd

df_source = pd.read_csv('iris.csv', 
    names=['sepal_length', 'sepal_width','petal_length','petal_width', 'species'], 
    index_col=False)

label_mapping = {
    'Iris-setosa': 0,
    'Iris-versicolor': 1,
    'Iris-virginica': 2
}

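# The first four columns hold the measurements used as features.
X = df_source.iloc[:, :4].values

# Map each species name to its index and one-hot encode it.
y = df_source['species'].values
y = np.array([one_hot(label_mapping[v], 3) for v in y])

# CNTK input variables use 32-bit floats by default.
X = X.astype(np.float32)
y = y.astype(np.float32)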
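
The split mentioned earlier could look roughly like the following sketch. It uses synthetic stand-ins named `X_demo` and `y_demo` (hypothetical names) with the shapes of the usual 150-row iris data, so it runs without `iris.csv`. The 80/20 ratio and the stratification are illustrative choices, not something the notebook prescribes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins with the same shapes as X and y (assumed 150 samples).
rng = np.random.default_rng(0)
X_demo = rng.random((150, 4)).astype(np.float32)
class_ids = np.repeat([0, 1, 2], 50)
y_demo = np.eye(3, dtype=np.float32)[class_ids]

# Hold out 20% of the samples, keeping the class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1337, stratify=class_ids)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```

The training arrays can then be passed to `loss.train` in place of the full dataset, with the held-out arrays kept for evaluation.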

## Training the model
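In the previous step we converted our data from a pandas dataframe to numpy arrays, which makes it fairly straightforward to train our model. We need a loss function and a learner to optimize the model. We also include a progress printer so we can see what the training process looks like. Keep in mind that `cross_entropy_with_softmax` applies softmax to its input itself, so it is usually given raw, unactivated model outputs.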

In [5]:
from cntk.losses import cross_entropy_with_softmax
from cntk.learners import sgd 
from cntk.logging import ProgressPrinter

progress_writer = ProgressPrinter(0)
loss = cross_entropy_with_softmax(z, labels)
learner = sgd(z.parameters, 0.1)

train_summary = loss.train(
    (X, y),
    parameter_learners=[learner],
    callbacks=[progress_writer],
    minibatch_size=16,
    max_epochs=5)

 average      since    average      since      examples
    loss       last     metric       last              
 ------------------------------------------------------
Learning rate per minibatch: 0.1
      1.1        1.1          0          0            16
    0.835      0.704          0          0            48
    0.993       1.11          0          0           112
     1.14       1.14          0          0            16
    0.902      0.783          0          0            48
     1.03       1.13          0          0           112
     1.19       1.19          0          0            16
     0.94      0.817          0          0            48
     1.06       1.16          0          0           112
     1.14       1.14          0          0            16
    0.907       0.79          0          0            48
     1.05       1.15          0          0           112
     1.07       1.07          0          0            16
    0.852      0.744          0          0            48
     1.01       1.14          0          0           112
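
For intuition about the loss columns above: the values stay near 1.1, which is close to ln(3) ≈ 1.0986, the cross-entropy of a uniform guess over three classes. The following numpy sketch shows that calculation for a single sample. The helper `cross_entropy_with_softmax_np` is a hypothetical illustration of the formula, not CNTK's implementation.

```python
import numpy as np

def cross_entropy_with_softmax_np(logits, target):
    # Numerically stable log-softmax followed by the negative log-likelihood.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -(target * log_probs).sum()

target = np.array([0.0, 1.0, 0.0])  # one-hot label for the second class

# Equal scores for all classes: the prediction is a uniform guess.
uniform = cross_entropy_with_softmax_np(np.zeros(3), target)

# A much higher score for the correct class: the loss drops sharply.
confident = cross_entropy_with_softmax_np(np.array([0.0, 5.0, 0.0]), target)

print(round(uniform, 4))    # 1.0986, i.e. ln(3)
print(confident < uniform)  # True
```

A loss that falls well below ln(3) during training would indicate that the model is doing better than guessing.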
