# Can we use a neural network to learn the modulo operation?

This is an attempt to answer the Quora question <a href="https://www.quora.com/unanswered/Can-a-neural-network-learn-the-modulo-operation">Can a neural network learn the modulo operation?</a>

In [1]:
import numpy as np
from sklearn.neural_network import MLPRegressor, MLPClassifier
from sklearn.metrics import r2_score

The modulo function takes two numbers a and n and outputs the remainder a - floor(a/n)*n, where floor(a/n) is the integer quotient. We'll generate training numbers in the range [1,128) and test on numbers in the range [128,256). Every number in these ranges is below 256 and therefore fits in 8 bits, which lets us convert the numbers to bytes for classification later.
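In Python, // is floor (integer) division, so the remainder formula can be checked directly against the built-in % operator. This is a minimal sketch that is not part of the original notebook. The helper name modulo is hypothetical:

```python
def modulo(a, n):
    """Remainder of a divided by n, computed as a - floor(a / n) * n."""
    return a - (a // n) * n

# Compare against Python's built-in % on every pair of byte-sized positive integers
mismatches = [(a, n) for a in range(1, 256) for n in range(1, 256)
              if modulo(a, n) != a % n]

print(modulo(15, 9))    # 6
print(len(mismatches))  # 0
```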

In [2]:
bytesize = 2**8

In [3]:
# randint's upper bound is exclusive: training values lie in [1,128), test values in [128,256)
a = np.random.randint(1,bytesize//2,10000).reshape(-1,1)
n = np.random.randint(1,bytesize//2,10000).reshape(-1,1)
a_test = np.random.randint(bytesize//2,bytesize,10000).reshape(-1,1)
n_test = np.random.randint(bytesize//2,bytesize,10000).reshape(-1,1)

In [4]:
y = a % n
y_test = a_test % n_test

We'll use the neural network implementation from scikit-learn, MLPRegressor, with its default settings.

In [5]:
model = MLPRegressor()

In [6]:
X = np.hstack([a,n])
X_test = np.hstack([a_test,n_test])

Our feature matrix consists of random number pairs and looks like this:

In [7]:
X

array([[ 64,  50],
       [109, 103],
       [ 58,  40],
       ...,
       [ 27,  29],
       [ 56,  44],
       [120,  53]])

Let's split the training set 80/20 for training and validation.

In [8]:
X_train = X[:8000]
X_val = X[8000:]
y_train = y[:8000]
y_val = y[8000:]

In [9]:
# ravel() turns the (N, 1) target column into the 1-D array scikit-learn expects
model.fit(X_train, y_train.ravel())

MLPRegressor(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [10]:
y_val_pred = model.predict(X_val)
y_val_true = y_val

Let's look at some sample predictions.

In [11]:
y_val_pred[:10]

array([2.58615012e+00, 2.04988135e+01, 3.56219857e+00, 6.33512900e+00,
       4.76152969e-04, 2.07706836e+01, 6.04556630e+00, 5.88709148e+01,
       4.65442661e+01, 1.72276852e+01])

In [12]:
y_val_true[:10]

array([[ 3],
       [26],
       [ 3],
       [ 2],
       [ 4],
       [21],
       [ 3],
       [60],
       [47],
       [16]], dtype=int32)

Not horrible. How about the R^2 score?
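For reference, R^2 is one minus the ratio of the model's squared error to the squared error of a constant predictor that always outputs the mean of the true values. A score of 1 is a perfect fit, 0 is no better than predicting the mean, and negative values are worse than the mean. The NumPy sketch below illustrates that definition for a single target. The function name r2 is hypothetical, and the sketch is not scikit-learn's implementation:

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of always predicting the mean
    return 1.0 - ss_res / ss_tot

print(r2([1, 2, 3], [1, 2, 3]))  # 1.0  (perfect fit)
print(r2([1, 2, 3], [2, 2, 2]))  # 0.0  (same as predicting the mean)
print(r2([1, 2, 3], [3, 2, 1]))  # -3.0 (worse than the mean)
```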

In [13]:
r2_score(y_val_true, y_val_pred)

0.869229538683644

That's actually pretty good! But the numbers in the validation set are in the same range as those in the training set, so that doesn't tell us a lot. Let's use the test numbers instead, which come from the disjoint range [128,256).

In [14]:
y_test_pred = model.predict(X_test)

In [15]:
r2_score(y_test, y_test_pred)

0.8129753909342599

Again, not bad at all. Here is a function to test some predictions.

In [16]:
def predict_modulo(a,n):
    print("Predicted:", model.predict([[a,n]])[0])
    print("True:", a%n)

In [17]:
predict_modulo(15,9)

Predicted: 4.458343817840735
True: 6
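Because the remainder is always an integer, one way to quantify "spot on" is to round each regression output and count exact matches. This is a minimal sketch, assuming a hypothetical helper exact_match_rate. With the notebook's variables, it could be called as exact_match_rate(y_test, y_test_pred). The example below uses three predictions from the validation sample above:

```python
import numpy as np

def exact_match_rate(y_true, y_pred):
    """Fraction of predictions equal to the true value after rounding to the nearest integer."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.rint(np.asarray(y_pred, dtype=float).ravel())
    return float(np.mean(y_pred == y_true))

# Predictions 2.586, 20.499 and 20.771 against true remainders 3, 26 and 21
print(exact_match_rate([[3], [26], [21]], [2.586, 20.499, 20.771]))  # 2 of 3 match
```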


The predictions are often not far off, but even though the R^2 score is decent, we can't really say that the neural net has learned the modulo operation, because it's rarely spot on.
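One plausible reason the R^2 score looks decent anyway is that, for many pairs, the remainder is a simple linear function of the inputs. If a < n, then a % n = a. If n <= a < 2n, then a % n = a - n. MLPRegressor's default ReLU network computes a piecewise-linear function, so it may fit these two pieces well without learning modulo in general. The sketch below is not from the original notebook, and the helper name two_piece_fraction is hypothetical. It counts how many pairs in a range fall into one of the two cases:

```python
def two_piece_fraction(lo, hi):
    """Fraction of pairs (a, n) with lo <= a, n < hi where a % n equals a or a - n."""
    covered = total = 0
    for a in range(lo, hi):
        for n in range(lo, hi):
            total += 1
            # a < n gives a % n == a; n <= a < 2n gives a % n == a - n
            if a % n in (a, a - n):
                covered += 1
    return covered / total

print(two_piece_fraction(1, 128))    # training range: about 0.75
print(two_piece_fraction(128, 256))  # test range: 1.0
```

By this count, about three quarters of the training pairs lie on one of the two pieces. On the test range every pair does, because a < 256 <= 2n whenever n >= 128. That may help explain why the test R^2 stays high even though the inputs are new.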

### Binary version

Perhaps we can do better by converting the numbers to a binary representation, which neural networks are so fond of. First, let's write some helper functions to convert numbers to 8-bit binary vectors and back.

In [18]:
def num2bin(num):
    return [int(c) for c in '{:08b}'.format(num)]

In [19]:
def bin2num(b):
    return int("".join([str(n) for n in b]),2)

In [20]:
def list2bin(numlist):
    return np.array([num2bin(num[0]) for num in numlist])

In [21]:
def list2num(binlist):
    return [bin2num(b) for b in binlist]
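The helpers above go through strings. As an alternative, NumPy can do the same most-significant-bit-first 8-bit conversion on a whole column at once with np.unpackbits and np.packbits. The sketch below checks that this approach agrees with num2bin for every byte value. The function names to_bits and from_bits are hypothetical:

```python
import numpy as np

def num2bin(num):
    # Same string-based helper as above: 8 bits, most significant bit first
    return [int(c) for c in '{:08b}'.format(num)]

def to_bits(column):
    """(N, 1) array of integers in [0, 256) -> (N, 8) array of bits, MSB first."""
    return np.unpackbits(np.asarray(column).astype(np.uint8), axis=1)

def from_bits(bits):
    """(N, 8) array of bits, MSB first -> flat array of integers."""
    return np.packbits(np.asarray(bits).astype(np.uint8), axis=1).ravel()

values = np.arange(256).reshape(-1, 1)
bits = to_bits(values)
assert bits.shape == (256, 8)
assert all(row.tolist() == num2bin(int(v)) for row, v in zip(bits, values.ravel()))
assert (from_bits(bits) == values.ravel()).all()
print(bits[5])  # [0 0 0 0 0 1 0 1]
```

With the notebook's (N, 1) columns, to_bits(a) would play the same role as list2bin(a).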

In [22]:
a = list2bin(a)
n = list2bin(n)
y = list2bin(y)
a_test = list2bin(a_test)
n_test = list2bin(n_test)
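# y_test = list2bin(y_test)  # not needed: y_test stays as integers because test predictions are decoded back to numbers before scoring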

In [23]:
X = np.hstack([a,n])
X_test = np.hstack([a_test,n_test])

Now each row of our feature matrix is a binary vector of length 16: the 8 bits of a followed by the 8 bits of n.

In [24]:
X

array([[0, 1, 0, ..., 0, 1, 0],
       [0, 1, 1, ..., 1, 1, 1],
       [0, 0, 1, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 1, 0, 1],
       [0, 0, 1, ..., 1, 0, 0],
       [0, 1, 1, ..., 1, 0, 1]])

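The target is now an 8-bit vector instead of a single number, so we're going to need a classifier this time. When y is a 2-D array of 0s and 1s, MLPClassifier treats the task as multi-label classification, with one logistic output per bit.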

In [25]:
model = MLPClassifier()

In [26]:
X_train = X[:8000]
X_val = X[8000:]
y_train = y[:8000]
y_val = y[8000:]

In [27]:
model.fit(X_train, y_train)



MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [28]:
y_val_pred = list2num(model.predict(X_val))
y_val_true = list2num(y_val)

In [29]:
y_val_pred[:20]

[0, 16, 3, 6, 0, 21, 3, 60, 47, 0, 14, 28, 13, 18, 23, 43, 1, 44, 31, 3]

In [30]:
y_val_true[:20]

[3, 26, 3, 2, 4, 21, 3, 60, 47, 16, 14, 28, 13, 18, 23, 43, 1, 44, 31, 3]
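Comparing the two lists above, 15 of the 20 decoded predictions are exact. The misses are off by whole bits rather than by small amounts: for example, 16 instead of 26 (binary 10000 vs 11010). The short sketch below counts exact matches and the number of wrong bits for each miss, using the 20 values printed above:

```python
# The 20 decoded predictions and true values printed above
pred = [0, 16, 3, 6, 0, 21, 3, 60, 47, 0, 14, 28, 13, 18, 23, 43, 1, 44, 31, 3]
true = [3, 26, 3, 2, 4, 21, 3, 60, 47, 16, 14, 28, 13, 18, 23, 43, 1, 44, 31, 3]

exact = sum(p == t for p, t in zip(pred, true))
# XOR marks the bits that differ; counting the 1s gives the Hamming distance
wrong_bits = [bin(p ^ t).count("1") for p, t in zip(pred, true) if p != t]

print(exact)       # 15
print(wrong_bits)  # [2, 2, 1, 1, 1]
```

A single wrong high bit can move the decoded number far from the true remainder, which is a different failure pattern from the regressor's small, continuous errors.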

In [31]:
r2_score(y_val_true, y_val_pred)

0.9521324692652988

As expected, the neural network does very well on the validation data, which comes from the same range as the training data. But what about the real test?

In [32]:
y_test_pred = list2num(model.predict(X_test))

In [33]:
r2_score(y_test, y_test_pred)

-0.893631377257317

No, that didn't work at all. A negative R^2 means the predictions are worse than always predicting the mean of y_test. The classification model has fit the range it was trained on without being able to generalize beyond it. In conclusion, learning a highly non-linear function such as the modulo operation using statistical inference is hard!
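One concrete reason the binary model fails to generalize is visible in the encoding itself. Every training input lies in [1,128), so its most significant bit is always 0. Every test input lies in [128,256), so that bit is always 1. Because that input is always 0 during training, the first-layer weights attached to it receive no gradient from the data, so their effect on the test inputs is essentially untrained.

The targets show the same gap. Since a % n < n, no training target ever has the top bit set. By contrast, roughly half of the test targets do: every pair with a < n, where a % n = a >= 128.

The sketch below is not part of the original notebook. It checks both facts, using the same MSB-first encoding as num2bin. The helper name msb is hypothetical:

```python
import numpy as np

def msb(values):
    """Most significant bit of each 8-bit value, i.e. '{:08b}'.format(v)[0]."""
    return np.asarray(values) >> 7

train_values = range(1, 128)    # range of a and n in training
test_values = range(128, 256)   # range of a and n in testing

print(np.unique(msb(train_values)))  # [0]
print(np.unique(msb(test_values)))   # [1]

# Remainders are below n, so training targets never reach 128,
# while test targets go up to 254
train_max = max(a % n for a in train_values for n in train_values)
test_max = max(a % n for a in test_values for n in test_values)
print(train_max, test_max)  # 126 254
```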