# Learning Non-linear Features in Neural Networks
---

I recently watched an [interview](https://youtu.be/UMpSrvGB4zs) of Geoffrey Hinton by Andrew Ng. In that interview the question of activation functions came up, and it was mentioned that it took many layers of sigmoid activations to get a ReLU activation. This brought up something I did not have an intuitive handle on: how does a neural network learn arbitrary non-linear functions? How many layers and how much data would it take to learn such a function?

Since I learn by doing, I decided to use this opportunity to play around with TensorFlow to answer these questions.

## The data

The non-linear functions I'm aiming to learn are multiplication and squaring. These are fairly simple and commonly used in feature engineering, so it might be useful to know how much it would take for a network to replicate them.
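To make concrete why a product counts as a non-linear feature, here is a minimal sketch (not part of the original experiment) that fits ordinary least squares to the product of two inputs, once with the raw inputs only and once with a hand-engineered product column added. The seed, the `(n, 2)` data layout, and the helper name `r_squared` are assumptions made for illustration.

```python
import numpy as np

# Assumption: a fresh generator and seed, chosen only for this illustration.
rng = np.random.default_rng(0)
n = 10000
X = rng.random((n, 2)) * 1000   # two features, uniform in [0, 1000)
y = X[:, 0] * X[:, 1]           # target: the product of the two features


def r_squared(features, target):
    """Fit least squares with an intercept column and return R^2."""
    A = np.column_stack([features, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    residual = target - A @ coef
    return 1.0 - residual.var() / target.var()


# Linear model on the raw inputs only.
r2_linear = r_squared(X, y)
# Same model after adding the hand-engineered product feature.
r2_product = r_squared(np.column_stack([X, X[:, 0] * X[:, 1]]), y)

print(f"R^2, raw inputs only:      {r2_linear:.4f}")
print(f"R^2, with product feature: {r2_product:.4f}")
```

The raw-input fit should land noticeably below 1, while the fit with the product column should be essentially exact. That gap is what a network would have to close on its own if nobody hands it the engineered feature.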

I'm going to start off with 10000 training examples. I'm not sure whether I want the dependent variables to be exact or to have some noise, so I will create both.

In [7]:
import numpy as np

np.random.seed(5)

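n = 10000
# Two input features per example, drawn uniformly from [0, 1000)
X = np.random.random([2, n])*1000
# Targets are row vectors of shape (1, n); noisy versions add Gaussian noise (std 10)
y_mul = np.reshape(X[0]*X[1], [1, X.shape[1]])
y_mul_noisy = y_mul + np.random.standard_normal(n)*10
y_sq = np.reshape(X[0]**2, [1, X.shape[1]])
y_sq_noisy = y_sq + np.random.standard_normal(n)*10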

print(X.shape, y_mul.shape, y_mul_noisy.shape, y_sq.shape)

(2, 10000) (1, 10000) (1, 10000) (1, 10000)


In [8]:
X

array([[ 221.99317109,  870.73230618,  206.71915534, ...,  448.51458681,
         309.19316068,  685.9578296 ],
       [ 307.7683588 ,  280.113877  ,  353.7053968 , ...,  621.344013  ,
         229.80883656,  577.66874193]])
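
Since the inputs range up to 1000, the products range up to roughly a million, while the added noise has a standard deviation of only 10. The sketch below gives a rough check of how small that noise is relative to the spread of the targets. It regenerates data with its own generator and seed (an assumption, so the values differ from the arrays shown above), and `noise_ratio` is a name introduced here for illustration.

```python
import numpy as np

# Assumption: a separate generator, so these numbers will not match the post's arrays.
rng = np.random.default_rng(5)
n = 10000
X = rng.random((2, n)) * 1000
y_mul = (X[0] * X[1]).reshape(1, n)
noise = rng.standard_normal(n) * 10

# Ratio of the noise spread to the spread of the clean targets.
noise_ratio = noise.std() / y_mul.std()
print(f"target std: {y_mul.std():.1f}")
print(f"noise std:  {noise.std():.2f}")
print(f"ratio:      {noise_ratio:.2e}")
```

If the ratio comes out tiny, as expected here, the noisy and exact targets are nearly indistinguishable at this scale, which is worth keeping in mind when comparing results on the two.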