## Create methods that will batch the data 


Batching the data requires dedicated methods. TensorFlow and sklearn ship with some batching utilities, but some problems need custom code.

Here we show what such methods look like. You can use them with any machine learning framework you need, either directly or after minor adjustments.
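
Before the full class, here is a minimal sketch of the batching idea on its own, written as a plain generator. The name `iterate_batches` and the toy arrays are made up for illustration. Unlike the class in the next cell, which drops any incomplete final batch, this sketch also yields the last partial batch.

```python
import numpy as np

def iterate_batches(inputs, targets, batch_size=None):
    """Yield (inputs, targets) slices of at most batch_size rows each."""
    n_samples = inputs.shape[0]
    # With no batch size given, return everything in a single batch
    if batch_size is None:
        batch_size = n_samples
    for start in range(0, n_samples, batch_size):
        stop = start + batch_size
        yield inputs[start:stop], targets[start:stop]

# Toy data (hypothetical): 10 samples with 3 features each
toy_inputs = np.arange(30, dtype=float).reshape(10, 3)
toy_targets = np.arange(10) % 2

batch_shapes = [x.shape for x, _ in iterate_batches(toy_inputs, toy_targets, batch_size=4)]
single_batch = [x.shape for x, _ in iterate_batches(toy_inputs, toy_targets)]
print(batch_shapes)   # [(4, 3), (4, 3), (2, 3)]
print(single_batch)   # [(10, 3)]
```

A generator is enough when you only need to loop once per epoch; a class becomes useful when you also want to keep state such as the batch count.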

In [1]:
import numpy as np

# Create a class that will do the batching for the algorithm
# This code is extremely reusable. You should just change Audiobooks_data everywhere in the code

class Audiobooks_Data_Reader():
    # Dataset is mandatory argument, while the batch_size is optional
    # If you don't input batch_size, it will automatically take the value: None
    def __init__(self, dataset, batch_size = None):
        
        # The dataset that loads is one of "train", "validation", "test".
        # e.g. if I call this class with x('train',5), it will load 'Audiobooks_data_train.npz' with a batch size of 5.
        npz = np.load('Audiobooks_data_{0}.npz'.format(dataset))
        
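        # Two variables that take the values of the inputs and the targets. Inputs are floats, targets are integers
        # The built-in float and int are used because the np.float and np.int aliases no longer exist in recent NumPy
        self.inputs, self.targets = npz['inputs'].astype(float), npz['targets'].astype(int)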
        
        # Counts the batch number, given the size you feed it later
        # If the batch size is None, we are either validating or testing, so we want to take the data in a single batch
        if batch_size is None:
            self.batch_size = self.inputs.shape[0]
        else:
            self.batch_size = batch_size
        self.curr_batch = 0
        self.batch_count = self.inputs.shape[0]//self.batch_size
        
    #  A method which loads the next batch 
    def __next__(self):
        if self.curr_batch >= self.batch_count:
            self.curr_batch = 0
            raise StopIteration()
            
        # You slice the dataset in batches and then the "next" function loads them one after the other
        batch_slice = slice(self.curr_batch*self.batch_size, (self.curr_batch+1)*self.batch_size)
        inputs_batch = self.inputs[batch_slice]
        targets_batch = self.targets[batch_slice] 
        self.curr_batch += 1


        # One-hot encode the targets. In this example it's a bit superfluous since we already have a 0/1 column
        # as a target, but we're giving you the code regardless, as it will be useful for any
        # classification task with more than one target column

        classes_num = 2
        targets_one_hot = np.zeros((targets_batch.shape[0], classes_num))
        targets_one_hot[range(targets_batch.shape[0]), targets_batch] = 1

        # The function will return the inputs batch and the one hot encoded targets
        return inputs_batch, targets_one_hot


    # A method needed for iterating over the batches, as we will put them in a loop.
    # This tells Python that the class we're defining is iterable, i.e. that we can use it like:
    # for input, output in data:
    #     do things
    # An iterator in Python is a class with a method __next__ that defines exactly how to iterate through its objects
    
    def __iter__(self):
        return self
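
The one-hot encoding step inside `__next__` relies on NumPy integer-array indexing. Here is a small standalone sketch of that step; the helper name `one_hot` and the sample labels are assumptions made for illustration.

```python
import numpy as np

def one_hot(labels, classes_num):
    """Return a (len(labels), classes_num) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], classes_num))
    # Row i gets a 1 in the column given by labels[i]
    encoded[np.arange(labels.shape[0]), labels] = 1
    return encoded

example = one_hot([0, 1, 1, 0], classes_num=2)
print(example)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]
```

The same helper works for more classes by raising `classes_num`, provided every label is an integer between 0 and `classes_num - 1`.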

      
        
        
        
        
        

## Create the machine learning algorithm 
We will create an algorithm by essentially copying the MNIST code and adjusting it where needed. Once more, I will put the whole code in a single cell, so we can simply rerun the cell to train a new model. That works because the whole algorithm is contained in the cell and it calls tf.reset_default_graph().
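
The cell that follows minimizes the mean softmax cross-entropy between the network outputs and the one-hot targets. As a rough picture of what that quantity is, here is a NumPy-only sketch. It is an illustration of the formula, not TensorFlow's implementation, and the function name and sample values are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_targets):
    """Per-sample cross-entropy between softmax(logits) and one-hot targets."""
    # Subtract the row maximum first for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Only the log-probability of the true class contributes to the loss
    return -(one_hot_targets * log_probs).sum(axis=1)

sample_logits = np.array([[2.0, 0.0], [0.0, 0.0]])
sample_targets = np.array([[1.0, 0.0], [0.0, 1.0]])

sample_losses = softmax_cross_entropy(sample_logits, sample_targets)
sample_mean_loss = sample_losses.mean()
print(sample_losses, sample_mean_loss)
```

The second sample has equal logits, so the predicted probability of the true class is 0.5 and its loss is ln 2; the first sample favors the correct class and gets a smaller loss.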

In [2]:
import tensorflow as tf

input_size = 10
output_size = 2
hidden_layer_size= 50

tf.reset_default_graph()

inputs = tf.placeholder(tf.float32,[None,input_size])
targets = tf.placeholder(tf.int32,[None,output_size])

weights_1 = tf.get_variable("weights_1", [input_size,hidden_layer_size])
biases_1 = tf.get_variable("biases_1",[hidden_layer_size])

outputs_1 = tf.nn.relu(tf.matmul(inputs,weights_1)+biases_1)

weights_2 = tf.get_variable("weights_2", [hidden_layer_size,hidden_layer_size])
biases_2 = tf.get_variable("biases_2",[hidden_layer_size])

outputs_2 = tf.nn.relu(tf.matmul(outputs_1,weights_2)+biases_2)

weights_3=tf.get_variable("weights_3",[hidden_layer_size,output_size])
biases_3 = tf.get_variable("biases_3",[output_size])

outputs = tf.matmul(outputs_2,weights_3)+biases_3

loss = tf.nn.softmax_cross_entropy_with_logits(logits=outputs,labels=targets)

mean_loss = tf.reduce_mean(loss)

optimize = tf.train.AdamOptimizer(learning_rate=0.001).minimize(mean_loss)

out_equals_target = tf.equal(tf.argmax(outputs,1),tf.argmax(targets,1))

accuracy = tf.reduce_mean(tf.cast(out_equals_target,tf.float32))


sess = tf.InteractiveSession()

initializer = tf.global_variables_initializer()

sess.run(initializer)

batch_size = 100
max_epochs = 50
prev_validation_loss = 9999999.

train_data = Audiobooks_Data_Reader('train', batch_size)
validation_data = Audiobooks_Data_Reader('validation')

for epoch_counter in range(max_epochs):
    
    curr_epoch_loss = 0.
    
    for input_batch, target_batch in train_data:
        _, batch_loss = sess.run([optimize, mean_loss],
           feed_dict = {inputs: input_batch, targets: target_batch})
            
        curr_epoch_loss += batch_loss
        
    curr_epoch_loss /= train_data.batch_count

    validation_loss = 0.
    validation_accuracy = 0.
    
    for input_batch, target_batch in validation_data:
        validation_loss,validation_accuracy = sess.run([mean_loss, accuracy],
        feed_dict = {inputs:input_batch, targets:target_batch})
        
    print ('Epoch'+str(epoch_counter+1)+
          '. Training loss: '+'{0:.3f}'.format(curr_epoch_loss)+
          '. Validation loss: '+ '{0:.3f}'.format(validation_loss)+
          '. Validation accuracy: '+'{0:.2f}'.format(validation_accuracy*100.)+'%')
    
    if validation_loss > prev_validation_loss:
        break
    
    prev_validation_loss = validation_loss
    
print ('End of training.')
        

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

Epoch1. Training loss: 0.661. Validation loss: 0.550. Validation accuracy: 69.80%
Epoch2. Training loss: 0.498. Validation loss: 0.459. Validation accuracy: 75.17%
Epoch3. Training loss: 0.439. Validation loss: 0.418. Validation accuracy: 74.94%
Epoch4. Training loss: 0.408. Validation loss: 0.398. Validation accuracy: 75.17%
Epoch5. Training loss: 0.389. Validation loss: 0.386. Validation accuracy: 75.39%
Epoch6. Training loss: 0.375. Validation loss: 0.379. Validation accuracy: 76.96%
Epoch7. Training loss: 0.365. Validation loss: 0.374. Validation accuracy: 78.97%
Epoch8. Training loss: 0.358. Validation loss: 0.371. Validation accuracy: 78.97%
Epoch9. Training loss: 0.353. Validation loss: 0.369. Validation accuracy: 78.75%
Epoch10. Training loss: 0.348. Validation loss: 0.367. Validation accur
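
The training loop above stops as soon as the validation loss of an epoch is higher than that of the previous epoch. Here is a minimal sketch of that stopping rule in isolation, using a made-up loss curve rather than the values from this run; the function name is hypothetical.

```python
def epochs_until_stop(validation_losses):
    """Return how many epochs run before training stops.

    Training stops after the first epoch whose validation loss
    is higher than the loss of the epoch before it.
    """
    prev_loss = float("inf")
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss > prev_loss:
            return epoch  # this epoch was trained, then the loop broke
        prev_loss = loss
    # The loss never increased, so every epoch was run
    return len(validation_losses)

# Hypothetical validation losses, one per epoch
hypothetical_losses = [0.55, 0.46, 0.42, 0.43, 0.40]
stopped_at = epochs_until_stop(hypothetical_losses)
print(stopped_at)  # 4
```

This rule reacts to a single increase, so a noisy validation loss can end training early. Waiting for several consecutive increases is a common alternative, although the notebook does not use it.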

## Test the model

In [3]:
test_data = Audiobooks_Data_Reader('test')

for input_batch, target_batch in test_data:
    test_accuracy = sess.run([accuracy],
        feed_dict={inputs: input_batch, targets: target_batch})

test_accuracy_percent = test_accuracy[0]*100.

print('Test accuracy: '+'{0:.2f}'.format(test_accuracy_percent)+'%')

Test accuracy: 84.15%
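
The accuracy reported above is the share of samples for which the index of the largest output equals the index of the 1 in the one-hot target. Here is a NumPy sketch of the same comparison on made-up values; the function name and the arrays are assumptions, not data from the model.

```python
import numpy as np

def accuracy_from_outputs(outputs, one_hot_targets):
    """Fraction of rows where the predicted class matches the target class."""
    predicted = np.argmax(outputs, axis=1)
    actual = np.argmax(one_hot_targets, axis=1)
    return float(np.mean(predicted == actual))

# Hypothetical network outputs and one-hot targets for four samples
demo_outputs = np.array([[2.0, -1.0], [0.3, 0.9], [1.5, 0.2], [-0.4, 0.1]])
demo_targets = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

demo_accuracy = accuracy_from_outputs(demo_outputs, demo_targets)
print('{0:.2f}%'.format(demo_accuracy * 100.))  # 75.00%
```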
