Inspired by https://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/.

### Problem Definition

For numbers between 1 and 100:
   * if the number is a multiple of 3, print "fizz",
   * if the number is a multiple of 5, print "buzz",
   * if the number is a multiple of both 3 and 5, print "fizzbuzz",
   * otherwise, print the number itself.
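
As a point of reference for the rules above, here is a minimal rule-based sketch in plain Python. The function name `fizzbuzz` is chosen only for this illustration. Note that the combined case has to be checked first, since a multiple of 15 also satisfies the "fizz" and "buzz" conditions.

```python
def fizzbuzz(n):
    """Return the expected FizzBuzz output for n as a string."""
    # Check the combined case first; otherwise multiples of 15 would
    # be caught by the "fizz" or "buzz" branch.
    if n % 15 == 0:
        return "fizzbuzz"
    if n % 3 == 0:
        return "fizz"
    if n % 5 == 0:
        return "buzz"
    return str(n)


# Expected outputs for the whole problem range, 1 to 100 inclusive.
expected = [fizzbuzz(n) for n in range(1, 101)]
print(expected[:15])
```

The neural network built in the rest of this notebook is meant to reproduce these outputs without being given the rule explicitly.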


In [1]:
import torch
import numpy as np
from torch import nn

We will represent input numbers as bits. We'll support 12-bit numbers, i.e. the range [0, 4095]. We'll train on the numbers in [101, 4095]. The final test, per the problem definition, is on [1, 100].

In [2]:
BITS = 12

In [3]:
def to_binary(i, num_bits):
    digits = np.array([i >> d & 1 for d in range(num_bits)])
    # Reverses the array to have the bits in the usual order.
    return digits[::-1].copy()

Example: convert 4 to its binary representation. The extra `[::-1]` flips the result, so the output below has the least significant bit first:

In [4]:
to_binary(4, BITS)[::-1]

array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
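
To sanity-check an encoding like this, a round trip back to the integer is handy. The sketch below uses plain Python lists instead of NumPy arrays; `to_bits` mirrors the bit order of `to_binary` (most significant bit first), and `from_bits` is a hypothetical inverse helper that is not part of the notebook.

```python
def to_bits(i, num_bits):
    # Most significant bit first, the same order to_binary returns.
    return [(i >> d) & 1 for d in reversed(range(num_bits))]


def from_bits(bits):
    # Hypothetical inverse: fold the bits back into an integer,
    # shifting left once per bit.
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value


bits = to_bits(101, 12)
print(bits)             # [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1]
print(from_bits(bits))  # 101
```

Any number that does not fit into `num_bits` loses its high bits in this encoding, which is why the supported range stops at 4095.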

The labels are encoded as indices into the `label_text` array defined below, i.e. 0, 1, 2, and 3. The labels mark the expected answers, e.g.:

* 333 is "fizz" -> 0
* 115 is "buzz" -> 1
* 225 is "fizzbuzz" -> 2, etc.

We won't label numbers < 101 because those will be used during the test.

In [5]:
label_text = [
    'fizz', 
    'buzz', 
    'fizzbuzz',
    ''
]

In [6]:
def get_label(i):
    if   i % 15 == 0: return 2
    elif i % 5  == 0: return 1
    elif i % 3  == 0: return 0
    else:             return 3

In [7]:
get_label(333), get_label(115), get_label(225)

(0, 1, 2)

Generate the training set. It consists of two arrays: the first contains the integers in bit format, and the second contains the corresponding labels as integers (0, 1, 2, 3), as shown above. I.e.:
```
1. train_x: [101, 102, 103, ..., 4095]
2. train_y: [  3,   0,   3, ...,    2]
```

In [8]:
train_x = np.array([to_binary(i, BITS) for i in range(101, 2**BITS)])
train_y = np.array([get_label(i) for i in range(101, 2**BITS)])

In [9]:
train_x[:3]

array([[0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0],
       [0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1]])

In [10]:
train_y[:3]

array([3, 0, 3])

In [11]:
print([label_text[i] for i in train_y[:3]])

['', 'fizz', '']
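
For context on the accuracy numbers later on, it can help to look at how the four classes are distributed over the training range. The sketch below recomputes the labels in plain Python; `get_label` repeats the notebook's rule so that the block runs on its own.

```python
from collections import Counter


def get_label(i):
    # Same labelling rule as in the notebook.
    if i % 15 == 0:
        return 2
    elif i % 5 == 0:
        return 1
    elif i % 3 == 0:
        return 0
    else:
        return 3


# Count how often each class index occurs in the training range.
counts = Counter(get_label(i) for i in range(101, 2**12))
total = sum(counts.values())
for index in sorted(counts):
    print(index, counts[index], round(counts[index] / total, 3))
```

Roughly half of the training samples fall into class 3 (the "print the number itself" case), so a model that always predicted class 3 would already be right about half the time.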


This network is a *classifier*, i.e. it assigns every input to one of four classes, [0, 1, 2, 3], which correspond to the string values ['fizz', 'buzz', 'fizzbuzz', ''].

For classifiers we use the negative log-likelihood loss. It expects log-probabilities as input, which is why the model below ends with a `LogSoftmax` layer. Training the network means minimizing the value of this function.

In [12]:
loss = nn.NLLLoss()
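
To make the loss more concrete, here is a small hand-rolled sketch of what the negative log-likelihood measures for a single sample. It uses plain Python rather than PyTorch, and the scores in `logits` are made up for illustration.

```python
import math


def log_softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def nll(log_probs, target):
    # Negative log-likelihood of the correct class for one sample.
    return -log_probs[target]


logits = [2.0, 0.5, 0.1, 1.0]  # made-up raw scores for the four classes
log_probs = log_softmax(logits)

# The loss is small when the correct class has the highest score
# and grows as the correct class becomes less likely.
print(nll(log_probs, 0))
print(nll(log_probs, 2))
```

With its default settings, `nn.NLLLoss` averages this per-sample value over the batch.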

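The accuracy function calculates the fraction of numbers in the test range that we classify correctly. Note that `range(1, 100)` stops at 99, so the code below evaluates 1 through 99 rather than the full [1, 100].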

In [13]:
def accuracy(model, loss):
    # `loss` is unused; it is kept so that existing calls still work.
    test_x = []
    test_y = []
    for i in range(1, 100):
        test_x.append(to_binary(i, BITS))
        test_y.append(get_label(i))
    x = torch.from_numpy(np.array(test_x)).float()
    y = torch.from_numpy(np.array(test_y))
    # Evaluation needs no gradients.
    with torch.no_grad():
        pred_y = model(x)
    # The predicted class is the one with the highest log-probability.
    pred = torch.argmax(pred_y, dim=1)
    acc = (y == pred).sum().float() / x.shape[0]
    return acc.item()

Create the model:

In [70]:
def create_net():
    return nn.Sequential(
        nn.Linear(BITS, 50),
        nn.ReLU(),
        nn.Linear(50, len(label_text)),
        nn.LogSoftmax(dim=-1)
    )

In [71]:
model = create_net()

In [72]:
opt = torch.optim.SGD(
    model.parameters(), 
    lr=5e-3, 
    momentum=0.9, 
    nesterov=True)

**Training loop**. We use stochastic gradient descent with mini-batches. This means we pick a small number of training samples, e.g. 32, and perform one training step on them. Then we pick the next 32 samples, and so on. Each step on a small batch is much cheaper than a step on the entire dataset, so the parameters are updated many more times per pass over the data.
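
The batching itself is only index arithmetic. As a minimal sketch, the hypothetical helper `batch_bounds` below yields the slice boundaries for each batch; the sample count 3995 is the size of the training range [101, 4095].

```python
def batch_bounds(num_samples, batch_size):
    """Yield (start, end) index pairs that cover num_samples in order."""
    for start in range(0, num_samples, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(start + batch_size, num_samples)


bounds = list(batch_bounds(3995, 32))
print(len(bounds))  # 125
print(bounds[0])    # (0, 32)
print(bounds[-1])   # (3968, 3995)
```

Stepping with `range(0, num_samples, batch_size)` never produces an empty batch, even when the number of samples is an exact multiple of the batch size.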

In [75]:
import sklearn.utils

# Shuffle once so that batches are not ordered by number.
train_x, train_y = sklearn.utils.shuffle(train_x, train_y)

train_x_t = torch.from_numpy(train_x).float()
train_y_t = torch.from_numpy(train_y)

print_every = 100
batch_size = 32
# Ceiling division, so an exact multiple of batch_size yields no empty batch.
batches = (train_x.shape[0] + batch_size - 1) // batch_size

for i in range(500):
    for b in range(batches):
        start = b * batch_size
        end = (b + 1) * batch_size
        bt = train_x_t[start:end]
        by = train_y_t[start:end]
        y_pred = model(bt)
        loss_val = loss(y_pred, by)
        opt.zero_grad()
        loss_val.backward()
        opt.step()
        # Report once every `print_every` epochs, after the first batch.
        if i % print_every == 1 and b == 0:
            print(i, b, "Loss:", loss_val.item(), "test accuracy:", accuracy(model, loss))
print(i, b, "Loss:", loss_val.item(), "test accuracy:", accuracy(model, loss))

1 0 Loss: 0.016308505088090897 test accuracy: 0.9898989796638489
101 0 Loss: 0.011160150170326233 test accuracy: 0.9898989796638489
201 0 Loss: 0.009250616654753685 test accuracy: 0.9898989796638489
301 0 Loss: 0.008243781514465809 test accuracy: 0.9898989796638489
401 0 Loss: 0.007405351847410202 test accuracy: 1.0
499 124 Loss: 0.013771540485322475 test accuracy: 1.0


The neural network reaches 100% accuracy on the numbers evaluated by `accuracy`, i.e. 1 through 99.

Predict function:

1. Convert the number into the bit format.
1. Apply the trained neural network to obtain the predicted 'class', i.e. 0, 1, 2, or 3.
1. Look up the text label for the returned class and return it along with the original number; if the label is empty, return the number itself instead.

In [76]:
def predict(num):
    enc = to_binary(num, BITS)
    enc = torch.from_numpy(enc).float()
    pred = model(enc)
    pred = torch.exp(pred)
    index = torch.argmax(pred).item()
    return (num, label_text[index] if label_text[index] != '' else num)

In [77]:
predict(1), predict(2), predict(3), predict(4), \
predict(5), predict(6), predict(7), predict(10), \
predict(30)

((1, 1),
 (2, 2),
 (3, 'fizz'),
 (4, 4),
 (5, 'buzz'),
 (6, 'fizz'),
 (7, 7),
 (10, 'buzz'),
 (30, 'fizzbuzz'))
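
To produce the full output required by the problem definition, the class indices can be rendered for every number from 1 to 100. The sketch below is independent of the model: `classify` is a hypothetical callable that maps a number to a class index, and `rule_based` is a stand-in so the block runs on its own. In the notebook, `classify` would wrap the trained model and `argmax`.

```python
LABEL_TEXT = ['fizz', 'buzz', 'fizzbuzz', '']


def render(num, class_index):
    # An empty label means "print the number itself".
    text = LABEL_TEXT[class_index]
    return text if text else str(num)


def fizzbuzz_lines(classify, start=1, stop=100):
    # `stop` is inclusive, matching the problem definition.
    return [render(n, classify(n)) for n in range(start, stop + 1)]


def rule_based(n):
    # Stand-in classifier that uses the exact rule instead of a model.
    if n % 15 == 0:
        return 2
    if n % 5 == 0:
        return 1
    if n % 3 == 0:
        return 0
    return 3


lines = fizzbuzz_lines(rule_based)
print("\n".join(lines[:15]))
```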