# Complete - Linear Regression

Linear regression that predicts Weight from Height and Sex, combining a continuous variable with a one-hot encoded category.

Dataset with Height, Weight, Sex statistics from: 

https://raw.githubusercontent.com/Dataweekends/zero_to_deep_learning_video/master/data/weight-height.csv

**Swift with SciKit Learn MinMax normalization**

Use Python/Pandas to import the dataset, then use SciKit Learn to normalize the values with the MinMax scaler.
Based on https://github.com/JacopoMangiavacchi/Swift-TensorFlow-Sample-Notebooks

## Imports

In [0]:
import Python
import TensorFlow

## Setting up

In [0]:
let numpy = Python.import("numpy")
let pandas = Python.import("pandas")
let io = Python.import("io")
let requests = Python.import("requests")
let preprocessing = Python.import("sklearn.preprocessing")

## Getting a dataset

We've got a helper function that returns a normalised dataset as NumPy arrays. It uses the Python requests and pandas libraries to download and read the CSV file, pandas' `get_dummies` to one-hot encode the `Gender` column, and SKLearn's preprocessing module to scale every column to the range 0 to 1. Lots of Python!

In [0]:
func getNumpyNormalizedDataset() -> (PythonObject, PythonObject) 
{
    // Download the CSV and load it into a pandas DataFrame
    let url = "https://raw.githubusercontent.com/Dataweekends/zero_to_deep_learning_video/master/data/weight-height.csv"
    let s = requests.get(url).content
    let df = pandas.read_csv(io.StringIO(s.decode("utf-8")))

    // One-hot encode Gender into Gender_Female and Gender_Male columns
    let dummies = pandas.get_dummies(df[["Gender"]])
    let transformed = pandas.concat([df[["Height", "Weight"]], dummies], axis: 1)
    print(transformed)

    // Features: height plus the two gender columns. Target: weight.
    let X = transformed[["Height","Gender_Female","Gender_Male"]].values
    let Y = transformed[["Weight"]].values

    // Scale features and target to [0, 1], each with its own scaler
    let xScaler = preprocessing.MinMaxScaler()
    let yScaler = preprocessing.MinMaxScaler()
    let xNP = numpy.array(xScaler.fit_transform(X))
    let yNP = numpy.array(yScaler.fit_transform(Y))

    return (xNP, yNP)
}
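
To make the two preprocessing steps less opaque, here is a minimal plain-Swift sketch of what min-max scaling and one-hot encoding do to a column. It is not how pandas or SciKit Learn implement them; the function names `minMaxScale` and `oneHot` and the sample values are made up for illustration.

```swift
/// Rescales values linearly so the smallest maps to 0 and the largest to 1.
func minMaxScale(_ values: [Double]) -> [Double] {
    guard let lo = values.min(), let hi = values.max(), hi > lo else {
        // Empty or constant column: return zeros to avoid dividing by zero.
        return values.map { _ in 0.0 }
    }
    return values.map { ($0 - lo) / (hi - lo) }
}

/// Turns a categorical column into one 0/1 column per category.
func oneHot(_ labels: [String], categories: [String]) -> [[Double]] {
    return labels.map { label in
        categories.map { $0 == label ? 1.0 : 0.0 }
    }
}

let heights = [60.0, 65.0, 70.0]   // made-up sample values
let scaled = minMaxScale(heights)  // [0.0, 0.5, 1.0]
let encoded = oneHot(["Male", "Female"], categories: ["Female", "Male"])
// encoded is [[0.0, 1.0], [1.0, 0.0]]
print(scaled, encoded)
```

Each row of `encoded` has exactly one 1, which is why the model later receives two gender columns rather than one.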

## Creating the model

As usual, we need to create a `struct` to represent our model, adhering to the [`Layer` Protocol](https://www.tensorflow.org/swift/api_docs/Protocols/Layer).

Since this is a bit of a contrived example, we only need one layer (a [`Dense` layer](https://www.tensorflow.org/swift/api_docs/Structs/Dense)) that takes an `inputSize` and an `outputSize`, and is activated with [`identity`](https://www.tensorflow.org/swift/api_docs/Functions.html#identity_:). We use `identity` because we just want the output to be a linear function of the input.

We create an initialiser that takes the number of input variables, so the same model works for any number of features. The default is 1. Inside the initialiser, we define the layer.

We'll also need to provide a definition of our `@differentiable` `func`, `callAsFunction()`. In this case, we want it to return the `input` passed through the single layer.




In [0]:
struct LinearRegression: Layer 
{
    // A single dense layer holds the weights and the bias
    var layer: Dense<Float>

    init(variables: Int = 1) 
    {
        // One output (the prediction); identity keeps the layer linear
        layer = Dense<Float>(inputSize: variables, outputSize: 1, activation: identity)
    }

    @differentiable func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float>
    {
        return layer(input)
    }
}
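
For intuition, a single-output dense layer with an identity activation evaluates a weighted sum of its inputs plus a bias. The plain-Swift sketch below shows that arithmetic for one input row. The function name `linearPredict` and the parameter values are hypothetical, not values learned by the model above.

```swift
/// Computes w · x + b for one input row.
func linearPredict(_ x: [Float], weights: [Float], bias: Float) -> Float {
    precondition(x.count == weights.count, "one weight per input variable")
    // Start from the bias and add each input multiplied by its weight.
    return zip(x, weights).reduce(bias) { $0 + $1.0 * $1.1 }
}

// Hypothetical parameters chosen so the arithmetic is easy to follow.
let prediction = linearPredict([0.5, 0, 1], weights: [2, 1, 3], bias: 0.5)
print(prediction) // 0.5*2 + 0*1 + 1*3 + 0.5 = 4.5
```

Training then amounts to finding the weights and bias that make these predictions match the data.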

## Load our dataset 

We need to get some x and y data, each in the form of a `PythonObject`, using the helper function we defined, `getNumpyNormalizedDataset()`.


In [0]:
let (xNP, yNP) = getNumpyNormalizedDataset()

We also need to flatten each of them into a native Swift array of `Float`:

In [0]:
let xArray = xNP.tolist().flatMap{ $0.map{ Float($0)! }}
let yArray = yNP.tolist().flatMap{ $0.map{ Float($0)! }}

And then a native Swift for TensorFlow `Tensor`, for each of them:

In [0]:
let sampleCount = yArray.count // one target per row; 10000 for this dataset
let x = Tensor<Float>(shape: [sampleCount, 3], scalars: xArray)
let y = Tensor<Float>(shape: [sampleCount, 1], scalars: yArray)
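
The flattened arrays are in row-major order: all values of the first row, then all values of the second, and so on. The sketch below, in plain Swift with made-up values and a hypothetical helper named `element`, shows how a flat buffer and a column count are enough to recover any (row, column) entry.

```swift
let featureCount = 3
// Two made-up rows, already flattened in row-major order.
let flat: [Float] = [0.1, 1, 0,
                     0.9, 0, 1]
let rowCount = flat.count / featureCount // 2

/// Returns the value at (row, column) in a row-major flat buffer.
func element(_ flat: [Float], row: Int, column: Int, columns: Int) -> Float {
    return flat[row * columns + column]
}

print(rowCount, element(flat, row: 1, column: 0, columns: featureCount)) // 2 0.9
```

The same arithmetic gives the row count for the tensors above: the flat array's length divided by the number of columns.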

## Creating an instance of our model

We want a three-variable instance of our model, one variable for each input column (Height, Gender_Female, Gender_Male):

In [0]:
var model = LinearRegression(variables: 3)

## Creating an optimizer

We'll need an optimizer. SGD will do fine here:



In [0]:
let optimizer = SGD(for: model, learningRate: 0.03)
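
As a rough picture of what a plain SGD step does (ignoring options such as momentum), each parameter moves a small distance against its gradient. This plain-Swift sketch uses a hypothetical helper named `sgdStep` and a learning rate of 0.5 rather than 0.03, only to keep the arithmetic readable.

```swift
/// One plain SGD step: parameter minus learning rate times gradient.
func sgdStep(_ parameters: [Float], gradients: [Float], learningRate: Float) -> [Float] {
    return zip(parameters, gradients).map { $0 - learningRate * $1 }
}

// Made-up parameters and gradients for illustration.
let updated = sgdStep([1, -2], gradients: [0.5, -1], learningRate: 0.5)
print(updated) // [0.75, -1.5]
```

A smaller learning rate takes smaller steps, which is slower but less likely to overshoot the minimum.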

## Training the model

First, we need a hyperparameter for epochs:

In [0]:
let epochs = 2000

Then we need a training loop. 

For each epoch that we train, we:

* calculate the cost and its gradient with respect to the model, where the cost is the `meanSquaredError()` between the predicted and the expected values
* use the optimizer to update the model along the gradient 𝛁
* occasionally print out the current epoch and cost

In [0]:
for epoch in 1...epochs {
    let (cost, 𝛁model) = model.valueWithGradient { m -> Tensor<Float> in
        let ŷ = m(x)
        return meanSquaredError(predicted: ŷ, expected: y)
    }
    optimizer.update(&model, along: 𝛁model)
  
    if epoch % 100 == 0 {
        print("Epoch: \(epoch) Cost: \(cost)")
    }
}
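
The loop above relies on automatic differentiation for the gradient. To see the same steps without any library, here is a minimal plain-Swift sketch of gradient descent on a one-variable linear model, with the mean squared error gradients written out by hand. The dataset, learning rate, and variable names are assumptions made for this example only.

```swift
// Tiny made-up dataset that follows y = 2x + 1 exactly.
let xs: [Double] = [0, 1, 2, 3]
let ys: [Double] = [1, 3, 5, 7]
let n = Double(xs.count)

var w = 0.0
var b = 0.0
let learningRate = 0.05
var costs: [Double] = []

for _ in 1...2000 {
    // Forward pass: prediction error for every sample.
    let errors = zip(xs, ys).map { (w * $0 + b) - $1 }

    // Cost: mean of the squared errors.
    let cost = errors.map { $0 * $0 }.reduce(0, +) / n
    costs.append(cost)

    // Hand-derived gradients of the cost with respect to w and b.
    let sumErrorTimesX = zip(errors, xs).map { $0 * $1 }.reduce(0, +)
    let gradW = 2 * sumErrorTimesX / n
    let gradB = 2 * errors.reduce(0, +) / n

    // Update step: move both parameters against their gradients.
    w -= learningRate * gradW
    b -= learningRate * gradB
}

// w and b should end up very close to 2 and 1.
print("w: \(w), b: \(b), final cost: \(costs.last!)")
```

With three input variables the idea is unchanged; there is simply one weight and one gradient per variable.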

## Testing the model

In [0]:
print(model.inferring(from: [[0.7, 0, 1]])) // Normalised Height, Female, Male
// [[0.66004163]]

[[0.66004163]]
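
Because the target was min-max scaled, the prediction is a value between 0 and 1 rather than a weight in the dataset's original units. Undoing the scaling needs the minimum and maximum the scaler saw. The plain-Swift sketch below uses a hypothetical helper named `inverseMinMax` and made-up bounds; the real bounds would have to come from the fitted scaler, which the helper function above does not return.

```swift
/// Inverts min-max scaling: maps a value in [0, 1] back to the original range.
func inverseMinMax(_ scaled: Double, dataMin: Double, dataMax: Double) -> Double {
    return scaled * (dataMax - dataMin) + dataMin
}

// Hypothetical bounds for illustration only, not taken from the dataset.
let restored = inverseMinMax(0.5, dataMin: 60, dataMax: 260)
print(restored) // 160.0
```

SciKit Learn's `MinMaxScaler` also offers `inverse_transform` for this, provided the scaler fitted on the target is kept around.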
