
# Implementing the gradient descent algorithm


This notebook is based on the gradient descent exercise from the Udacity Deep Learning Nanodegree, which can be found here:

https://github.com/udacity/deep-learning-v2-pytorch/blob/master/intro-neural-networks/gradient-descent/GradientDescent.ipynb

The original version is implemented in Python with NumPy. This version uses only Swift, as an exercise in learning the language.

## Loading the dataset from GitHub
The original dataset is located here:

https://raw.githubusercontent.com/udacity/deep-learning-v2-pytorch/master/intro-neural-networks/gradient-descent/data.csv

In [1]:
import Foundation

let url = "https://raw.githubusercontent.com/udacity/deep-learning-v2-pytorch/master/intro-neural-networks/gradient-descent/data.csv"

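// author of this query function: https://gist.github.com/groz/85b95f663f79ba17946269ea65c2c0f4
// Downloads the resource at `address` synchronously and returns it as a UTF-8 string.
func query(address: String) -> String {
    let url = URL(string: address)
    // The semaphore blocks the caller until the asynchronous request has finished.
    let semaphore = DispatchSemaphore(value: 0)

    var result: String = ""

    let task = URLSession.shared.dataTask(with: url!) {(data, response, error) in
        // If the request fails or the data is not UTF-8, result stays empty instead of crashing.
        if let data = data, let text = String(data: data, encoding: .utf8) {
            result = text
        }
        semaphore.signal()
    }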
    
    task.resume()
    semaphore.wait()
    return result
}

let rawData = query(address: url)
let rows = rawData.components(separatedBy: "\n")
let featuresAndTargets = rows.map({ $0.components(separatedBy: ",") })
print(featuresAndTargets)

[["0.78051", "-0.063669", "1"], ["0.28774", "0.29139", "1"], ["0.40714", "0.17878", "1"], ["0.2923", "0.4217", "1"], ["0.50922", "0.35256", "1"], ["0.27785", "0.10802", "1"], ["0.27527", "0.33223", "1"], ["0.43999", "0.31245", "1"], ["0.33557", "0.42984", "1"], ["0.23448", "0.24986", "1"], ["0.0084492", "0.13658", "1"], ["0.12419", "0.33595", "1"], ["0.25644", "0.42624", "1"], ["0.4591", "0.40426", "1"], ["0.44547", "0.45117", "1"], ["0.42218", "0.20118", "1"], ["0.49563", "0.21445", "1"], ["0.30848", "0.24306", "1"], ["0.39707", "0.44438", "1"], ["0.32945", "0.39217", "1"], ["0.40739", "0.40271", "1"], ["0.3106", "0.50702", "1"], ["0.49638", "0.45384", "1"], ["0.10073", "0.32053", "1"], ["0.69907", "0.37307", "1"], ["0.29767", "0.69648", "1"], ["0.15099", "0.57341", "1"], ["0.16427", "0.27759", "1"], ["0.33259", "0.055964", "1"], ["0.53741", "0.28637", "1"], ["0.19503", "0.36879", "1"], ["0.40278", "0.035148", "1"], ["0.21296", "0.55169", "1"], ["0.48447", "0.56991", "1"], ["0.25476",
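The rows printed above are still arrays of strings, so they have to be converted to numbers before they can be used in any calculation. The following is a minimal sketch of one way to do that, assuming every valid row holds exactly two features followed by one target. `parseRows` is a hypothetical helper that is not part of the original notebook, and it works on plain Swift arrays rather than tensors.

```swift
import Foundation

// Hypothetical helper (not from the original notebook): splits CSV text into
// feature rows and target values, skipping blank or malformed lines.
func parseRows(_ csv: String) -> (features: [[Double]], targets: [Double]) {
    var features: [[Double]] = []
    var targets: [Double] = []

    for line in csv.components(separatedBy: "\n") {
        let fields = line.components(separatedBy: ",")
        // Assumption: a valid row has exactly two features and one target.
        guard fields.count == 3 else { continue }

        let values = fields.compactMap {
            Double($0.trimmingCharacters(in: .whitespacesAndNewlines))
        }
        // Skip rows in which any field failed to parse as a number.
        guard values.count == 3 else { continue }

        features.append([values[0], values[1]])
        targets.append(values[2])
    }
    return (features, targets)
}

// Two rows taken from the output above, followed by a trailing newline.
let sample = "0.78051,-0.063669,1\n0.28774,0.29139,1\n"
let parsed = parseRows(sample)
print(parsed.features)
print(parsed.targets)
```

Skipping rows with the wrong field count also drops the empty row that a trailing newline in the file would otherwise produce.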

## Sigmoid activation function
$$\sigma(x) = \frac{1}{1+e^{-x}}$$

Swift is strongly typed, so the input and output types of a function must be declared. The sigmoid has to handle a tensor of floating-point values; `Double` is used here.

Swift distinguishes between argument labels (external names) and parameter names (local names). Writing `_` as the argument label lets callers omit the label when calling the function.
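As a small illustration of the difference between labels and parameter names, here is a sketch with two purely illustrative functions. The names `scale` and `halve` are hypothetical and do not appear elsewhere in this notebook.

```swift
// Illustrative functions only; the names are hypothetical.

// `by` is the argument label used at the call site, `factor` is the
// parameter name used inside the function body. The first parameter
// uses `_`, so it is passed without a label.
func scale(_ value: Double, by factor: Double) -> Double {
    return value * factor
}

// Without `_` or a separate label, the parameter name is also the label.
func halve(value: Double) -> Double {
    return value / 2
}

let scaled = scale(3.0, by: 2.0)   // first argument needs no label
let halved = halve(value: 3.0)     // the label is required here
print(scaled, halved)
```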

In [0]:
import TensorFlow

func mySigmoid(_ x: Tensor<Double>) -> Tensor<Double> {
  return 1 / (1 + exp(-x))
}

### Test
Compare the result with TensorFlow's built-in `sigmoid` function.

In [3]:
let original = sigmoid(Tensor([1.0]))
let myResult = mySigmoid(Tensor([1.0]))
print(original[0], myResult[0])
assert(original[0] == myResult[0])

0.7310585786300049 0.7310585786300049
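The assertion above compares the two results for exact equality, which holds for the values printed here. In general, floating-point results from two implementations can differ in the last digits, so a comparison with a tolerance tends to be more robust. A minimal sketch in plain Swift follows; `scalarSigmoid` and `isClose` are hypothetical helpers that belong neither to this notebook nor to TensorFlow, and the default tolerance is an assumption.

```swift
import Foundation

// Hypothetical scalar counterpart of mySigmoid, using Foundation's exp.
func scalarSigmoid(_ x: Double) -> Double {
    return 1 / (1 + exp(-x))
}

// Hypothetical helper: true if a and b differ by at most `tolerance`.
// The default tolerance is an assumed value, not a recommendation.
func isClose(_ a: Double, _ b: Double, tolerance: Double = 1e-12) -> Bool {
    return abs(a - b) <= tolerance
}

let expected = 0.7310585786300049  // value printed by the test above
let actual = scalarSigmoid(1.0)
print(isClose(actual, expected))
```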


## Output (prediction) formula
$$\hat{y} = \sigma(w_1 x_1 + w_2 x_2 + b)$$

In [0]:
func myOutputFormula(_ inputs: Tensor<Double>, _ weights: Tensor<Double>, _ bias: Tensor<Double>) -> Tensor<Double> {
  return mySigmoid(matmul(inputs, weights) + bias)
}

### Test

**Input**: 3 rows with 2 features each; features are arranged in columns and each row is one sample, so the shape is [3,2]

**Nodes**: assume the next layer has 3 nodes (not required for the formula itself, only for the example data)

**Weights**: 2 features * 3 nodes = 6 weights are needed, shape is [2,3]

**Bias**: one bias value per node, shape is [3]; it is broadcast across the rows (samples)

**Result**: a 3x2 matrix multiplied with a 2x3 matrix results in a 3x3 matrix

**Calculation**:

*   matrix multiplication

>$0.1 * 1 + 1.1 * 4 = 4.5$

>$0.1 * 2 + 1.1 * 5 = 5.7$

>$0.1 * 3 + 1.1 * 6 = 6.9$

>$0.2 * 1 + 2.2 * 4 = 9$

>$0.2 * 2 + 2.2 * 5 = 11.4$

>$0.2 * 3 + 2.2 * 6 = 13.8$

>$0.3 * 1 + 3.3 * 4 = 13.5$

>$0.3 * 2 + 3.3 * 5 = 17.1$

>$0.3 * 3 + 3.3 * 6 = 20.7$

$$
\left[\begin{array}{cc} 
0.1 & 1.1\\
0.2 & 2.2 \\
0.3 & 3.3
\end{array}\right]
\left[\begin{array}{ccc} 
1 & 2 & 3\\ 
4 & 5 & 6
\end{array}\right]
=
\left[\begin{array}{ccc} 
4.5 & 5.7 & 6.9\\
9 & 11.4 & 13.8\\
13.5 & 17.1 & 20.7
\end{array}\right]
$$

* broadcast bias

>As described in [1], adding a bias vector to a matrix is allowed and is called broadcasting: the vector is added to each row of the matrix

$$
\left[\begin{array}{ccc} 
4.5 & 5.7 & 6.9\\
9 & 11.4 & 13.8\\
13.5 & 17.1 & 20.7
\end{array}\right]
+
\left[\begin{array}{ccc}
0.01 & 0.02 & 0.03
\end{array}\right]
=
\left[\begin{array}{ccc} 
4.51 & 5.72 & 6.93\\
9.01 & 11.42 & 13.83\\
13.51 & 17.12 & 20.73
\end{array}\right]
$$

* sigmoid function

>$\sigma(4.51) = 0.9891211899829261$


$$\sigma\left[\begin{array}{ccc} 
4.51 & 5.72 & 6.93\\
9.01 & 11.42 & 13.83\\
13.51 & 17.12 & 20.73
\end{array}\right]
=
\left[\begin{array}{ccc} 
0.9891211899829261 & 0.9967310104383614 & 0.9990229546827732 \\
0.9998778330705631 & 0.9999890263210334 & 0.9999990143859467\\
0.9999986426840268 & 0.9999999632820475 & 0.9999999990067114
\end{array}\right]$$

[1] http://www.deeplearningbook.org/contents/linear_algebra.html, page 32
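The three steps of the calculation (matrix multiplication, bias broadcast, sigmoid) can also be checked without TensorFlow. The following is a rough sketch using nested Swift arrays; `plainOutput` is a hypothetical function that is not part of the notebook, and it assumes the bias holds one value per column of the weight matrix.

```swift
import Foundation

// Hypothetical plain-Swift counterpart of myOutputFormula for nested arrays.
// inputs: [samples][features], weights: [features][nodes], bias: [nodes]
func plainOutput(_ inputs: [[Double]], _ weights: [[Double]], _ bias: [Double]) -> [[Double]] {
    return inputs.map { row -> [Double] in
        return (0..<bias.count).map { j -> Double in
            // Dot product of one sample with column j of the weights.
            var sum = 0.0
            for k in 0..<row.count {
                sum += row[k] * weights[k][j]
            }
            // Broadcasting: bias[j] is added to column j for every sample,
            // then the sigmoid is applied to the result.
            return 1 / (1 + exp(-(sum + bias[j])))
        }
    }
}

// Example data from the calculation above.
let x = [[0.1, 1.1], [0.2, 2.2], [0.3, 3.3]]
let w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
let b = [0.01, 0.02, 0.03]

let result = plainOutput(x, w, b)
for row in result {
    print(row)
}
```

The printed rows should agree with the sigmoid matrix above up to floating-point rounding.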

In [0]:
let features = Tensor<Double>([[0.1, 1.1], [0.2, 2.2], [0.3, 3.3]])
let weights = Tensor<Double>([[1, 2, 3], [4, 5, 6]])
let bias = Tensor<Double>([0.01, 0.02, 0.03])

print(features.shape)
print(weights.shape)
print(bias.shape)

let myOutput = myOutputFormula(features, weights, bias)
print(myOutput)
print(myOutput.shape)

[3, 2]
[2, 3]
[3]
[0.9891211899829261]
[[0.9891211899829261, 0.9967310104383614, 0.9990229546827732],
 [0.9998778330705631, 0.9999890263210334, 0.9999990143859467],
 [0.9999986426840268, 0.9999999632820475, 0.9999999990067114]]
[3, 3]


## Error function

Log-loss (which in this case equals the cross-entropy) for binary classification
$$Error(y, \hat{y}) = - y \log(\hat{y}) - (1-y) \log(1-\hat{y})$$

In [0]:
func myErrorFormula(_ y: Double, _ ŷ: Double) -> Double {
  return -(y * log(ŷ)) - ((1-y) * log(1-ŷ))
}

### Test

y: target value of each input data e.g. 0, 1, 0

ŷ: prediction e.g. 0.7, 0.4, 0.1


In [16]:
let ŷ = 0.9
let y = 1.0
myErrorFormula(y, ŷ)

0.10536051565782628
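The formula is undefined when a prediction is exactly 0 or 1, because the logarithm of 0 appears in one of the terms. One common way around this is to clamp the prediction slightly away from both ends. The sketch below does that and then averages the error over the example values from the test description; `clampedError` is a hypothetical variant of `myErrorFormula`, and the `epsilon` value is an assumption.

```swift
import Foundation

// Hypothetical variant of myErrorFormula that clamps the prediction so that
// log never receives exactly 0. The epsilon value is an assumption.
func clampedError(_ y: Double, _ yHat: Double, epsilon: Double = 1e-15) -> Double {
    let p = min(max(yHat, epsilon), 1 - epsilon)
    return -(y * log(p)) - ((1 - y) * log(1 - p))
}

// Example values from the test description above.
let targets = [0.0, 1.0, 0.0]
let predictions = [0.7, 0.4, 0.1]

// Per-sample errors, followed by their mean.
let errors = zip(targets, predictions).map { clampedError($0, $1) }
let meanError = errors.reduce(0, +) / Double(errors.count)
print(errors)
print(meanError)
```

For predictions that are not at the boundaries, the clamped version returns the same value as the unclamped formula.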
