# Creating a CTF file
In chapter 3 of the book we're using CTF (CNTK Text Format) files to feed data into the neural network for training purposes. In this notebook we'll explain how you can convert a common CSV file into a CTF file for use with CNTK.

## Loading and processing the data
The output of this notebook is a CTF file that can be read directly by the CNTK trainer API.
To get there, the dataset must contain only numeric values. First we extract a features matrix and a labels matrix from the dataset. The labels are stored as strings, so we convert them to one-hot encoded vectors with the `LabelBinarizer` from `scikit-learn`.

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Load the CSV and assign column names.
df_source = pd.read_csv('iris.csv',
    names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'],
    index_col=False)

# The first four columns are the features; the last column is the species label.
features = df_source.iloc[:, :4].values
labels = df_source.iloc[:, -1].values

# One-hot encode the string labels.
label_encoder = LabelBinarizer()
labels = label_encoder.fit_transform(labels)
```
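
To make the encoding step concrete, here is a minimal sketch that runs `LabelBinarizer` on a handful of made-up species names. The sample values are illustrative and are not read from `iris.csv`. Each distinct label becomes one column, and the encoder returns integers, so the last line shows an optional cast in case you want floating point values.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Illustrative labels only; the notebook itself reads them from iris.csv.
sample_labels = np.array(['setosa', 'versicolor', 'virginica', 'setosa'])

encoder = LabelBinarizer()
encoded = encoder.fit_transform(sample_labels)

print(encoder.classes_)  # the class order used for the columns
print(encoded)           # one row per label, one column per class

# Optional: cast the integer output to floats.
encoded_float = encoded.astype(np.float32)
```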

## Writing the file to disk
CTF files are text files that contain the samples used to train or test your neural network. The format works like this: each line holds one or more samples that belong to a single sequence:

```
line=[sequence_id] (sample+)
sample=|feature (value*)
```

A sequence can span multiple lines: lines that share the same sequence ID are combined into one sequence.
Sequence IDs are optional. When you omit them, each line is treated as its own sequence.
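
As an illustration of how sequence IDs group lines, the sketch below formats two short sequences by hand. The helper `format_ctf_line` and the input names `x` and `y` are hypothetical, chosen only for this example; they are not part of CNTK.

```python
def format_ctf_line(inputs, sequence_id=None):
    """Format one CTF line from a dict that maps input names to values."""
    parts = []
    if sequence_id is not None:
        parts.append(str(sequence_id))
    for name, values in inputs.items():
        parts.append('|{} {}'.format(name, ' '.join(str(v) for v in values)))
    return ' '.join(parts)

lines = [
    # The first two lines share sequence ID 0, so they form one sequence.
    format_ctf_line({'x': [1.0, 2.0], 'y': [0, 1]}, sequence_id=0),
    format_ctf_line({'x': [3.0, 4.0]}, sequence_id=0),
    # A new sequence ID starts a new sequence.
    format_ctf_line({'x': [5.0, 6.0], 'y': [1, 0]}, sequence_id=1),
]
print('\n'.join(lines))
```

Calling the helper without `sequence_id` produces a line with no ID prefix, which is the layout this notebook uses for the iris data.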

The sample code below demonstrates how we can store our preprocessed dataset as a CTF file.
We'll iterate over all the rows in our dataset and store the values as samples in the file.

Note that when you store two different features on the same line, they are treated as part of the same sample.
We'll use this to keep the features and labels for a single sample together.

To store the values correctly, we'll iterate over the values in each row of the `features` and `labels` arrays, convert each value to a string, and then join them with a space as the separator.

```python
with open('iris.ctf', 'w') as output_file:
    # Write one line per row, pairing the features with the matching one-hot label.
    for feature_row, label_row in zip(features, labels):
        feature_values = ' '.join(str(x) for x in feature_row)
        label_values = ' '.join(str(x) for x in label_row)

        output_file.write('|features {} |labels {}\n'.format(feature_values, label_values))
```
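
A quick way to sanity-check a generated file is to parse it back into arrays. The sketch below does this in plain Python rather than through the CNTK reader. The `read_ctf` helper is hypothetical and assumes a file without sequence IDs, and the two sample rows are made up so that the example runs on its own.

```python
import os
import tempfile
import numpy as np

def read_ctf(path):
    """Parse a CTF file without sequence IDs into a dict of 2-D arrays."""
    columns = {}
    with open(path) as input_file:
        for line in input_file:
            # Each '|name v1 v2 ...' chunk holds the values for one input.
            for chunk in line.strip().split('|')[1:]:
                name, *values = chunk.split()
                columns.setdefault(name, []).append([float(v) for v in values])
    return {name: np.array(rows) for name, rows in columns.items()}

# Two made-up rows in the same layout as iris.ctf.
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.ctf')
with open(sample_path, 'w') as output_file:
    output_file.write('|features 5.1 3.5 1.4 0.2 |labels 1 0 0\n')
    output_file.write('|features 7.0 3.2 4.7 1.4 |labels 0 1 0\n')

data = read_ctf(sample_path)
print(data['features'].shape, data['labels'].shape)
```

Pointing `read_ctf` at `iris.ctf` should give a `features` array with four columns and a `labels` array with one column per species.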