## Imports

In [45]:
import torch
import numpy as np

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Let's define the activation function

In [2]:
def sigmoid(x):
    return 1/(1 + torch.exp(-x))

## Let's test our activation function!

Draw a few random numbers from a standard normal distribution.

In [3]:
x = torch.randn((1, 3))

In [4]:
x

tensor([[ 0.3299, -0.0124,  2.6945]])

Squash them into the range 0 to 1. Notice that large positive inputs map to values close to 1.

In [5]:
sigmoid(x)

tensor([[0.5817, 0.4969, 0.9367]])
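As a quick sanity check, the hand-written `sigmoid` can be compared against PyTorch's built-in `torch.sigmoid`. This is a minimal sketch; the input values are chosen here for illustration and are not from the notebook.

```python
import torch

def sigmoid(x):
    # Same definition as in the notebook
    return 1 / (1 + torch.exp(-x))

# Illustrative inputs: a large negative value, zero, and a large positive value
x = torch.tensor([[-5.0, 0.0, 5.0]])

ours = sigmoid(x)
builtin = torch.sigmoid(x)

print(ours)                            # values near 0, exactly 0.5, and near 1
print(torch.allclose(ours, builtin))   # the two implementations should agree
```

Zero maps to exactly 0.5, and the further an input is from zero, the closer the output gets to 0 or 1.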

## Let's do some math with features and weights

In [34]:
features = torch.randn((1, 5))
features
features.shape

tensor([[ 0.3301,  0.5344,  1.6967, -1.2788,  1.5472]])

torch.Size([1, 5])

In [35]:
weights = torch.randn_like(features)
weights
weights.shape

tensor([[-1.6660, -1.2729,  1.2155, -1.0954,  0.3891]])

torch.Size([1, 5])

Both are tensors. A tensor is a generalisation of vectors and matrices to any number of dimensions, and tensors are the basic data structure used to build a neural network.
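To make "generalisation of vectors and matrices" concrete, here is a small sketch that builds tensors with 0, 1, 2 and 3 dimensions and prints the number of dimensions and the shape of each. The variable names are chosen for illustration only.

```python
import torch

scalar = torch.tensor(3.0)               # 0 dimensions: a single number
vector = torch.tensor([1.0, 2.0, 3.0])   # 1 dimension
matrix = torch.ones((2, 3))              # 2 dimensions: rows and columns
cube = torch.zeros((2, 3, 4))            # 3 dimensions

for t in (scalar, vector, matrix, cube):
    print(t.ndim, tuple(t.shape))
```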

In [8]:
features * weights

tensor([[-0.0849,  0.0456,  0.2698, -0.4198,  0.8317]])

In [10]:
0.2235 * -0.38

-0.08493

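Using `*` only performs element-wise multiplication: each entry of the result is the product of the two entries at the same position. (The numbers in the two cells above come from an earlier run, so they do not match the `features` and `weights` printed before them.) What we want is the matrix product of these two tensors.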

In [11]:
help(torch.mm)

Help on built-in function mm:

mm(...)
    mm(input, mat2, out=None) -> Tensor
    
    Performs a matrix multiplication of the matrices :attr:`input` and :attr:`mat2`.
    
    If :attr:`input` is a :math:`(n \times m)` tensor, :attr:`mat2` is a
    :math:`(m \times p)` tensor, :attr:`out` will be a :math:`(n \times p)` tensor.
    
    .. note:: This function does not :ref:`broadcast <broadcasting-semantics>`.
              For broadcasting matrix products, see :func:`torch.matmul`.
    
    Args:
        input (Tensor): the first matrix to be multiplied
        mat2 (Tensor): the second matrix to be multiplied
        out (Tensor, optional): the output tensor.
    
    Example::
    
        >>> mat1 = torch.randn(2, 3)
        >>> mat2 = torch.randn(3, 3)
        >>> torch.mm(mat1, mat2)
        tensor([[ 0.4851,  0.5037, -0.3633],
                [-0.0760, -3.6705,  2.4784]])



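As the help text shows, the number of columns of the first matrix must equal the number of rows of the second. Both of our tensors have shape (1, 5), so we transpose `weights` to get shape (5, 1).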

In [12]:
torch.mm(features, weights.T)

tensor([[0.6424]])
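The relationship between the element-wise product and the matrix product can be checked directly: for two tensors of shape (1, n), multiplying the first by the transpose of the second gives the sum of the element-wise products. The sketch below uses small hand-picked values rather than the random tensors from the notebook.

```python
import torch

features = torch.tensor([[1.0, 2.0, 3.0]])    # shape (1, 3)
weights = torch.tensor([[0.5, -1.0, 2.0]])    # shape (1, 3)

elementwise = features * weights              # shape (1, 3): [0.5, -2.0, 6.0]
dot = torch.mm(features, weights.T)           # (1, 3) x (3, 1) -> shape (1, 1)

# Reshaping the weights with view gives the same (3, 1) column as the transpose
same = torch.mm(features, weights.view(3, 1))

print(elementwise.sum())   # tensor(4.5000)
print(dot)                 # tensor([[4.5000]])
```

Calling `torch.mm(features, weights)` without the transpose would raise an error here, because a (1, 3) matrix cannot be multiplied by another (1, 3) matrix.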

Let's add a bias term to this!

In [16]:
bias = torch.randn((1, 1))
bias

tensor([[0.8929]])

In [17]:
help(torch.sum)

Help on built-in function sum:

sum(...)
    sum(input, dtype=None) -> Tensor
    
    Returns the sum of all elements in the :attr:`input` tensor.
    
    Args:
        input (Tensor): the input tensor.
        dtype (:class:`torch.dtype`, optional): the desired data type of returned tensor.
            If specified, the input tensor is casted to :attr:`dtype` before the operation
            is performed. This is useful for preventing data type overflows. Default: None.
    
    Example::
    
        >>> a = torch.randn(1, 3)
        >>> a
        tensor([[ 0.1133, -0.9567,  0.2958]])
        >>> torch.sum(a)
        tensor(-0.5475)
    
    .. function:: sum(input, dim, keepdim=False, dtype=None) -> Tensor
    
    Returns the sum of each row of the :attr:`input` tensor in the given
    dimension :attr:`dim`. If :attr:`dim` is a list of dimensions,
    reduce over all of them.
    
    
    If :attr:`keepdim` is ``True``, the output tensor is of the same size
    as :attr:`input` 

### Explore torch.sum

In [18]:
a = torch.randn(4, 4)

In [19]:
a

tensor([[ 0.5372, -1.1006, -0.1742,  2.2311],
        [ 1.1747, -0.1617, -1.8551,  0.8190],
        [-1.2993,  1.3731,  1.2524,  0.1761],
        [ 0.8311, -1.0283,  0.9312,  0.4102]])

In [22]:
torch.sum(a, 1)

tensor([ 1.4936, -0.0230,  1.5022,  1.1442])

In [23]:
sum([ 0.5372, -1.1006, -0.1742,  2.2311])

1.4935

So passing 1 as the dimension sums across each row, giving one value per row.

In [24]:
torch.sum(a, 0)

tensor([ 1.2438, -0.9177,  0.1544,  3.6364])

In [25]:
sum([ 0.5372, 1.1747, -1.2993, 0.8311])

1.2437

So passing 0 as the dimension sums down each column, giving one value per column.
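The same behaviour is easier to verify by hand on a small tensor. The sketch below uses a 2 x 3 tensor with illustrative values and also shows the `keepdim` argument mentioned in the help text.

```python
import torch

a = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])                  # shape (2, 3)

row_sums = torch.sum(a, dim=1)                       # one sum per row
col_sums = torch.sum(a, dim=0)                       # one sum per column
row_sums_kept = torch.sum(a, dim=1, keepdim=True)    # keeps the reduced dim as size 1

print(row_sums)             # tensor([ 6., 15.])
print(col_sums)             # tensor([5., 7., 9.])
print(row_sums_kept.shape)  # torch.Size([2, 1])
```

One way to remember it: the dimension you pass is the one that disappears from the shape of the result.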

In [27]:
help(torch.arange)

Help on built-in function arange:

arange(...)
    arange(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor
    
    Returns a 1-D tensor of size :math:`\left\lceil \frac{\text{end} - \text{start}}{\text{step}} \right\rceil`
    with values from the interval ``[start, end)`` taken with common difference
    :attr:`step` beginning from `start`.
    
    Note that non-integer :attr:`step` is subject to floating point rounding errors when
    comparing against :attr:`end`; to avoid inconsistency, we advise adding a small epsilon to :attr:`end`
    in such cases.
    
    .. math::
        \text{out}_{{i+1}} = \text{out}_{i} + \text{step}
    
    Args:
        start (Number): the starting value for the set of points. Default: ``0``.
        end (Number): the ending value for the set of points
        step (Number): the gap between each pair of adjacent points. Default: ``1``.
        out (Tensor, optional): the output tensor.
   

In [30]:
help(a.view)

Help on built-in function view:

view(...) method of torch.Tensor instance
    view(*shape) -> Tensor
    
    Returns a new tensor with the same data as the :attr:`self` tensor but of a
    different :attr:`shape`.
    
    The returned tensor shares the same data and must have the same number
    of elements, but may have a different size. For a tensor to be viewed, the new
    view size must be compatible with its original size and stride, i.e., each new
    view dimension must either be a subspace of an original dimension, or only span
    across original dimensions :math:`d, d+1, \dots, d+k` that satisfy the following
    contiguity-like condition that :math:`\forall i = d, \dots, d+k-1`,
    
    .. math::
    
      \text{stride}[i] = \text{stride}[i+1] \times \text{size}[i+1]
    
    Otherwise, it will not be possible to view :attr:`self` tensor as :attr:`shape`
    without copying it (e.g., via :meth:`contiguous`). When it is unclear whether a
    :meth:`view` can be performe

In [26]:
b = torch.arange(4 * 5 * 6).view(4, 5, 6)
b

tensor([[[  0,   1,   2,   3,   4,   5],
         [  6,   7,   8,   9,  10,  11],
         [ 12,  13,  14,  15,  16,  17],
         [ 18,  19,  20,  21,  22,  23],
         [ 24,  25,  26,  27,  28,  29]],

        [[ 30,  31,  32,  33,  34,  35],
         [ 36,  37,  38,  39,  40,  41],
         [ 42,  43,  44,  45,  46,  47],
         [ 48,  49,  50,  51,  52,  53],
         [ 54,  55,  56,  57,  58,  59]],

        [[ 60,  61,  62,  63,  64,  65],
         [ 66,  67,  68,  69,  70,  71],
         [ 72,  73,  74,  75,  76,  77],
         [ 78,  79,  80,  81,  82,  83],
         [ 84,  85,  86,  87,  88,  89]],

        [[ 90,  91,  92,  93,  94,  95],
         [ 96,  97,  98,  99, 100, 101],
         [102, 103, 104, 105, 106, 107],
         [108, 109, 110, 111, 112, 113],
         [114, 115, 116, 117, 118, 119]]])

In [31]:
torch.sum(b, (2, 1))

tensor([ 435, 1335, 2235, 3135])
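Passing a tuple of dimensions reduces over all of them at once. As a sketch of what that means, summing over dimensions 2 and 1 together should match summing over dimension 2 first and then over dimension 1 of the result, leaving one total per 5 x 6 block.

```python
import torch

b = torch.arange(4 * 5 * 6).view(4, 5, 6)

both = torch.sum(b, (2, 1))           # reduce dims 2 and 1 in one call
stepwise = b.sum(dim=2).sum(dim=1)    # reduce dim 2, then dim 1 of the result

print(both)                           # tensor([ 435, 1335, 2235, 3135])
print(torch.equal(both, stepwise))    # True
```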

In [39]:
sigmoid(torch.mm(features, weights.T) + bias)

tensor([[0.9765]])
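The cell above is a complete single neuron: a weighted sum of the features, plus a bias, passed through the sigmoid. A minimal sketch of that computation as a reusable function follows. The name `neuron` and the input values are hypothetical, and `torch.sigmoid` stands in for the hand-written `sigmoid`.

```python
import torch

def neuron(features, weights, bias):
    """Hypothetical helper: sigmoid of the weighted sum of features plus bias."""
    # features and weights both have shape (1, n), so weights is transposed
    return torch.sigmoid(torch.mm(features, weights.T) + bias)

# Illustrative values chosen so the weighted sum is exactly zero
features = torch.tensor([[1.0, -1.0]])
weights = torch.tensor([[2.0, 2.0]])
bias = torch.tensor([[0.0]])

out = neuron(features, weights, bias)
print(out)   # tensor([[0.5000]]) because sigmoid(0) = 0.5
```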

## Stack them up!

In [40]:
torch.manual_seed(7)                     # Set the random seed so things are predictable

features = torch.randn((1, 3))           # Features are 3 random normal variables

rows, cols = features.shape
n_inputs = cols                          # Number of input units
n_hidden = 2                             # Number of hidden units
n_output = 1                             # Number of output units

w1 = torch.randn((n_inputs, n_hidden))   # Weights of hidden layer
w2 = torch.randn((n_hidden, n_output))   # Weights of output layer

bias1 = torch.randn((1, n_hidden))       # Bias for hidden layer
bias2 = torch.randn((1, n_output))       # Bias for output layer

h1 = sigmoid(torch.mm(features, w1) + bias1) # Output hidden layer
h2 = sigmoid(torch.mm(h1, w2) + bias2)       # Output layer
h2

<torch._C.Generator at 0x7f58351aac90>

tensor([[0.3171]])

As we can see, the output of our small neural network is 0.3171.
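The forward pass above can be wrapped in a function so the shapes at each layer are easier to follow. This is a sketch under a few assumptions: the name `forward` is hypothetical, `torch.sigmoid` replaces the hand-written `sigmoid`, and the batch of four rows at the end is an illustration that is not in the notebook.

```python
import torch

def forward(features, w1, b1, w2, b2):
    """Hypothetical helper: two-layer forward pass with sigmoid activations."""
    hidden = torch.sigmoid(torch.mm(features, w1) + b1)   # (rows, n_hidden)
    return torch.sigmoid(torch.mm(hidden, w2) + b2)       # (rows, n_output)

torch.manual_seed(7)
n_inputs, n_hidden, n_output = 3, 2, 1

features = torch.randn((1, n_inputs))
w1 = torch.randn((n_inputs, n_hidden))
w2 = torch.randn((n_hidden, n_output))
b1 = torch.randn((1, n_hidden))
b2 = torch.randn((1, n_output))

out = forward(features, w1, b1, w2, b2)
print(out.shape)         # torch.Size([1, 1])

# The same weights handle several rows at once; the biases broadcast over rows
batch = torch.randn((4, n_inputs))
batch_out = forward(batch, w1, b1, w2, b2)
print(batch_out.shape)   # torch.Size([4, 1])
```

Because the weights here are stored as (inputs, outputs), no transpose is needed, unlike the single-neuron example where `weights` had the same shape as `features`.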

## Torch to NumPy and back

In [46]:
a = np.random.rand(4,3)
a

array([[0.71631457, 0.42575967, 0.9241998 ],
       [0.27314966, 0.66047041, 0.3009574 ],
       [0.38655754, 0.98479923, 0.83085797],
       [0.02493058, 0.68285593, 0.07353881]])

In [47]:
b = torch.from_numpy(a)
b

tensor([[0.7163, 0.4258, 0.9242],
        [0.2731, 0.6605, 0.3010],
        [0.3866, 0.9848, 0.8309],
        [0.0249, 0.6829, 0.0735]], dtype=torch.float64)

In [48]:
b.numpy()

array([[0.71631457, 0.42575967, 0.9241998 ],
       [0.27314966, 0.66047041, 0.3009574 ],
       [0.38655754, 0.98479923, 0.83085797],
       [0.02493058, 0.68285593, 0.07353881]])
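One detail worth knowing about this round trip: `torch.from_numpy` does not copy the data, so the tensor and the array share the same memory. The sketch below, using a small array of ones chosen for illustration, shows that an in-place change to the tensor also changes the array, and that copying the array first avoids this.

```python
import numpy as np
import torch

a = np.ones((2, 3))
b = torch.from_numpy(a)           # b shares memory with a

b.mul_(2)                         # in-place multiply on the tensor
print(a)                          # the NumPy array now holds 2.0 everywhere

c = torch.from_numpy(a.copy())    # copy first to get an independent tensor
c.add_(1)                         # changes c only
print(a)                          # still 2.0 everywhere
print(b.dtype)                    # torch.float64, matching NumPy's default float
```

This also explains the `dtype=torch.float64` in the output above: the tensor keeps the array's dtype rather than PyTorch's default `float32`.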