# <font style="color:rgb(50,120,229)">Linear Regression using Libtorch</font>

In this chapter, we will show an example of using a neural network to predict housing prices. This is a classic **Linear Regression** problem, and a neural network made of a single neuron without an activation function is exactly a linear regression model. In this unit, we will use LibTorch to create such a simple neural network for linear regression.

But before going into that, let's look at what Linear Regression is and the problem we want to solve.


# <font style="color:rgb(50,120,229)">What is Linear Regression?</font>

<img src="https://www.learnopencv.com/wp-content/uploads/2018/02/cv4faces-mod10-ch2-linreg-example.png" width="600" />

Linear regression is a linear approach to modeling the relationship between variables. In the figure above, the values on the x-axis are the independent variable (the inputs, often called samples), and the values on the y-axis are the dependent variable (also known as the target). There are 5 points, and we want to find the straight line that minimizes the total error between the line and the points (the errors are shown by the arrows in the figure). In the usual least-squares formulation, this total error is the sum of the squared errors. In other words, we look for the slope and intercept of the line with the least error. Once we have modeled the given data points, we can predict the y-value for a new point on the x-axis.
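
To make "the line with the least error" concrete, here is a small sketch, separate from the chapter's LibTorch code, that fits a line to 1-D data with the closed-form least-squares solution: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the mean point. `fitLine` is a hypothetical helper name, and the sketch assumes the error being minimized is the sum of squared vertical distances.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the chapter's code): ordinary least squares
// for a single feature. Returns {slope, intercept} of the line
// y = slope * x + intercept that minimizes the sum of squared vertical errors.
std::pair<double, double> fitLine(const std::vector<double>& x,
                                  const std::vector<double>& y) {
  assert(x.size() == y.size() && x.size() >= 2);
  const double n = static_cast<double>(x.size());

  // Means of x and y
  double meanX = 0.0, meanY = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    meanX += x[i];
    meanY += y[i];
  }
  meanX /= n;
  meanY /= n;

  // slope = cov(x, y) / var(x); the 1/n factors cancel, so sums suffice
  double cov = 0.0, var = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    cov += (x[i] - meanX) * (y[i] - meanY);
    var += (x[i] - meanX) * (x[i] - meanX);
  }
  assert(var > 0.0);  // all x values identical: the slope is undefined
  const double slope = cov / var;
  return {slope, meanY - slope * meanX};
}
```

For points lying exactly on y = 2x + 1, the sketch recovers a slope of 2 and an intercept of 1. The network in this chapter approaches the same kind of solution iteratively with gradient descent instead of a closed form, which also carries over to the 13 features used below.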

We will learn how to create a simple network with a single layer to perform linear regression. We will use the [**Boston Housing dataset**](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html) as an example. Each sample contains 13 attributes of houses at a location in the Boston suburbs in the late 1970s, such as the average number of rooms and the crime rate. You can find the complete list of attributes [**here**](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html).

The `13` attributes form our `13`-dimensional independent variable. The targets are the median values of the houses at each location, in thousands of dollars (`k$`). Using these `13` features, we train a model that predicts the price of a house, and we evaluate its predictions on the validation data.

A schematic diagram of the network we want to create is shown below:

<img src="https://www.learnopencv.com/wp-content/uploads/2019/12/regression-keras-schema.png" width="400" />

The purpose of training is to find the weights (`w0` to `w12`) and bias (`b`) for which the network produces the correct output by looking at the input data. We say that the network is trained when the error between the predicted output and ground truth becomes very low and does not decrease further. We can then use these weights to predict the output for any new data.
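
A single neuron without an activation function computes a weighted sum of its inputs plus the bias. The plain C++ sketch below, with a hypothetical `neuronOutput` helper and no LibTorch involved, spells out that computation; for the Boston data, `x` and `w` would each hold 13 values (`w0` to `w12`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the chapter's code): output of one linear
// neuron, y = w0*x0 + w1*x1 + ... + w12*x12 + b for 13 features.
float neuronOutput(const std::vector<float>& x, const std::vector<float>& w,
                   float b) {
  assert(x.size() == w.size());  // one weight per input feature
  float y = b;
  for (std::size_t i = 0; i < x.size(); ++i) {
    y += w[i] * x[i];
  }
  return y;
}
```

Training only changes the values in `w` and `b`; the form of the computation stays the same.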

The network consists of just one neuron. We use the LibTorch Module API (`torch::nn::Module`) to define the network and add a single dense (fully connected) layer, `torch::nn::Linear`, whose number of inputs equals the number of features in the data (`13` in this case) and which has a single output. We then follow the same workflow as in the previous section: define the loss function and the optimizer, and train the model.



# <font style = "color:rgb(50,120,229)">Import Libraries</font>

LibTorch is the C++ distribution of PyTorch: it exposes the same tensor library and neural network building blocks through a C++ API. To use it, we include the `torch/torch.h` header file, which pulls in all functions and classes implemented in LibTorch. Including everything is not ideal because it increases compile time, but we need not worry about that for the time being.

```cpp
#include <algorithm>  // std::min, std::shuffle
#include <cstdint>
#include <iostream>
#include <random>     // std::mt19937
#include <string>
#include <vector>

#include <torch/torch.h>

#include "CSVReader.h"
```

Next, we define some constants that are used later during training:
- `trainBatchSize`: Number of training examples used in one iteration.
- `testBatchSize`: Number of test examples used in one iteration.
- `epochs`: Number of times the algorithm sees the entire training dataset during training.
- `logInterval`: Number of training iterations (batches) between printouts of the training statistics.
- `datasetPath`: Path of the data file (`BostonHousing.csv`).
- `device`: Whether to run on the CPU or the GPU. It defaults to the CPU (`torch::kCPU`). As we saw in the sample code, we can check whether a GPU is available; `main()` does this and switches the device to `torch::kCUDA` when one is found.

```cpp
struct Options {
  size_t trainBatchSize = 4;
  size_t testBatchSize = 100;
  size_t epochs = 1000;
  size_t logInterval = 20;
  // Path to the CSV file containing the Boston Housing data
  std::string datasetPath = "./BostonHousing.csv";
  // For CPU use torch::kCPU and for GPU use torch::kCUDA
  torch::DeviceType device = torch::kCPU;
};

static Options options;
```

# <font style = "color:rgb(50,120,229)">Data Pre-processing</font>

Let us first define the pre-processing done on the data. 

We perform normalization on the features. It is also called **Scaling** and is required when different features have different ranges. 

For example, consider a data set containing two features - 
- the number of bedrooms, `x1`, and 
- the cost of the flat (in `$`), `x2`. 

`x1` may range from `1` to `4`, whereas `x2` may range from `100,000` to `1,000,000`. If we do not normalize the data, the cost feature will have much more influence on the learned model, which effectively treats cost as a far more important feature than the number of bedrooms; that may or may not be true. So we normalize each feature to the range `0-1` and let the model learn which features matter during training.

We will carry out min-max normalization as the data pre-processing step so that all the feature values lie between `0` and `1`.

We read each value from the csv file as a string and convert it to a float. For each feature, we find its minimum and maximum values. Finally, to normalize a feature value, we subtract the minimum and divide by the range of the feature (maximum minus minimum).
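
Written as a formula, where $x_{\min}$ and $x_{\max}$ are the smallest and largest values of a feature in the data, each value $x$ of that feature becomes $x'$. As a worked example with the bedroom and cost ranges above, a hypothetical flat with 3 bedrooms and a cost of 550,000 (the two values are chosen only for illustration) maps to:

$$
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
$$

$$
x_1' = \frac{3 - 1}{4 - 1} \approx 0.667, \qquad
x_2' = \frac{550{,}000 - 100{,}000}{1{,}000{,}000 - 100{,}000} = 0.5
$$

Both features now lie on the same `0-1` scale, even though their raw values differ by five orders of magnitude.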

```cpp
// Min-max normalize every feature column to the range [0, 1].
// feat: raw CSV rows as strings; row 0 is the header and is skipped.
// rows: number of rows in feat (including the header)
// cols: number of feature columns to normalize
// Returns rows-1 rows (one per data point) with cols normalized values each.
std::vector<std::vector<float>> normalize_feature(
    const std::vector<std::vector<std::string>>& feat, int rows, int cols) {
  std::vector<std::vector<float>> data(rows - 1, std::vector<float>(cols, 0.0f));

  for (int i = 0; i < cols; i++) {   // each column holds one feature
    // Initialize min and max with the first data row (row 0 is the header).
    // std::stof converts a string to a float.
    float maxm = std::stof(feat[1][i]);
    float minm = maxm;

    // Scan all data rows of this column (feature) to find its min and max
    for (int j = 1; j < rows; j++) {
      float value = std::stof(feat[j][i]);
      if (value > maxm) maxm = value;
      if (value < minm) minm = value;
    }

    // Guard against division by zero when a feature is constant
    float range = (maxm - minm) > 0.0f ? (maxm - minm) : 1.0f;

    // Use min and max to rescale the feature values to lie between 0 and 1
    for (int j = 0; j < rows - 1; j++) {
      data[j][i] = (std::stof(feat[j + 1][i]) - minm) / range;
    }
  }

  return data;
}
```

# <font style = "color:rgb(50,120,229)">Create Dataset</font>
We read the data from the csv file `BostonHousing.csv` and split it with an `80:20` ratio into a training set and a validation set (which is not used for training). The first 80% of the rows go to the training set and the remaining 20% to the validation set.

We normalize the features using the function defined above, and then shuffle the training data before passing it to the training process.

The model will train on the training dataset, and the model evaluation will be on the validation dataset.
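
The split described here follows the row order of the file, and only the training portion is shuffled afterwards. If the rows in the CSV happen to be ordered in some systematic way, a common alternative is to shuffle all samples before splitting. The following is a minimal standard-library sketch of that variation, not the chapter's method; `splitTrainValidation` and the fixed seed are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the chapter's code): shuffle a copy of the
// samples with a fixed seed, then put the first `trainFraction` of them in the
// training set and the rest in the validation set.
template <typename T>
std::pair<std::vector<T>, std::vector<T>> splitTrainValidation(
    std::vector<T> samples, double trainFraction, unsigned seed) {
  assert(trainFraction >= 0.0 && trainFraction <= 1.0);
  std::mt19937 rng(seed);  // fixed seed makes the split reproducible
  std::shuffle(samples.begin(), samples.end(), rng);

  const auto cut = static_cast<std::ptrdiff_t>(trainFraction * samples.size());
  std::vector<T> train(samples.begin(), samples.begin() + cut);
  std::vector<T> validation(samples.begin() + cut, samples.end());
  return {std::move(train), std::move(validation)};
}
```

With the chapter's `Data` type, this could be called as `splitTrainValidation(allSamples, 0.8, 1)` on a vector holding every normalized (features, target) pair.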

```cpp
// Data holds pairs of (input_features, output)
using Data = std::vector<std::pair<std::vector<float>, float>>;

std::pair<Data, Data> readInfo() {
  Data train, test;

  // Read data from the CSV file.
  // The CSVReader class is defined in the CSVReader.h header file.
  CSVReader reader(options.datasetPath);
  std::vector<std::vector<std::string> > dataList = reader.getData();

  int N = dataList.size();      // Number of rows, including the header row
  int numPoints = N - 1;        // Number of data points (header excluded)
  // The last column is the output, so the feature size is the number of columns minus one.
  int fSize = dataList[0].size() - 1;
  std::cout << "Total number of features: " << fSize << std::endl;
  std::cout << "Total number of data points: " << numPoints << std::endl;
  // 80 percent of the data for training, the remaining 20 percent for validation
  int limit = static_cast<int>(0.8 * numPoints);

  // Min-max normalize the features; CSV row i ends up in data[i-1]
  std::vector<std::vector<float>> data = normalize_feature(dataList, N, fSize);

  std::vector<float> input(fSize);
  for (int i = 1; i < N; i++) {
    for (int j = 0; j < fSize; j++) {
      input[j] = data[i-1][j];
    }

    float output = std::stof(dataList[i][fSize]);

    // Split the data into train and test sets
    if (i <= limit) {
      train.push_back({input, output});
    } else {
      test.push_back({input, output});
    }
  }

  std::cout << "Total number of training data: " << train.size() << std::endl;
  std::cout << "Total number of test data: " << test.size() << std::endl;

  // Shuffle the training data with a fixed seed
  // (std::random_shuffle is deprecated in C++14 and removed in C++17)
  std::mt19937 rng(1);
  std::shuffle(train.begin(), train.end(), rng);

  return std::make_pair(train, test);
}
```

## <font style = "color:rgb(50,120,229)">Load Dataset using Torch</font>
LibTorch ships a few built-in datasets, such as MNIST (`torch::data::datasets::MNIST`), which can be loaded directly. For custom datasets, we write our own code to read the data.

Here, we have written a simple CSV reader, `CSVReader.h`. It reads the file line by line, splits each line on the delimiter (a comma by default), and stores the fields as strings. The whole csv file is therefore returned as a vector of rows, each of which is a vector of strings.

```cpp
#pragma once

#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

#include <boost/algorithm/string.hpp>  // boost::algorithm::split, boost::is_any_of

/*
 * A class to read data from a csv file.
 */
class CSVReader
{
  std::string fileName;
  std::string delimiter;

public:
  CSVReader(std::string filename, std::string delm = ",") :
      fileName(filename), delimiter(delm)
  { }

  // Function to fetch data from a CSV File
  std::vector<std::vector<std::string> > getData();
};

/*
* Parses the csv file line by line and returns the data
* as a vector of rows, where each row is a vector of strings.
* Defined inline because it lives in a header file.
*/
inline std::vector<std::vector<std::string> > CSVReader::getData()
{
  std::ifstream file(fileName);
  if (!file.is_open())
    throw std::runtime_error("Could not open file: " + fileName);

  std::vector<std::vector<std::string> > dataList;

  std::string line = "";
  // Iterate through each line and split the content using the delimiter
  while (std::getline(file, line))
  {
    std::vector<std::string> vec;
    boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
    dataList.push_back(vec);
  }
  // The file is closed automatically when `file` goes out of scope

  return dataList;
}
```
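
`CSVReader` depends on Boost (`boost::algorithm::split`). If Boost is not available, a standard-library-only replacement for the splitting step might look like the sketch below. `splitLine` is a hypothetical helper; it assumes a single-character delimiter (unlike `boost::is_any_of`, which accepts a set of characters) and, like Boost's split, keeps empty fields.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the chapter's code): split `line` on every
// occurrence of the single character `delim`, keeping empty fields, so that
// "a,,b" gives {"a", "", "b"} and "a,b," gives {"a", "b", ""}.
std::vector<std::string> splitLine(const std::string& line, char delim = ',') {
  std::vector<std::string> fields;
  std::string::size_type start = 0;
  while (true) {
    const auto pos = line.find(delim, start);
    if (pos == std::string::npos) {
      fields.push_back(line.substr(start));  // last field runs to end of line
      break;
    }
    fields.push_back(line.substr(start, pos - start));
    start = pos + 1;  // continue right after the delimiter
  }
  return fields;
}
```

Inside `getData()`, the Boost call could then be replaced by `splitLine(line, ...)`, passing the first character of the reader's delimiter string.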

# <font style = "color:rgb(50,120,229)">Neural Network</font>
Let's create the model that we will use to solve the linear regression problem.

Neural networks based on the C++ frontend are composed of reusable building blocks called modules.

[**`torch::nn::Module`**](https://pytorch.org/docs/stable/nn.html) is the base module class from which all other modules are derived. Each module can contain subobjects like parameters, buffers, and submodules. 

These subobjects must be registered explicitly. For this linear regression problem, we only need to register one dense (fully connected) layer, `torch::nn::Linear`.

We also need to implement the forward pass for the network. 

```cpp
// Linear Regression Model
struct Net : torch::nn::Module {
  /*
  The network for linear regression is just a single neuron (i.e. one Linear layer).
  Usage: auto net = std::make_shared<Net>(num_features, num_outputs);
  */
  Net(int num_features, int num_outputs) {
    neuron = register_module("neuron", torch::nn::Linear(num_features, num_outputs));
  }

  torch::Tensor forward(torch::Tensor x) {
    /*Flatten each sample so the input has shape [batch_size, num_features]*/
    x = x.reshape({x.size(0), -1});
    /*Pass the input tensor through the linear layer: y = x * W^T + b*/
    x = neuron->forward(x);
    return x;
  }

  /*Start with an empty module holder (nullptr); the actual layer is created
  and registered in the constructor. More details are given in the reference.*/
  torch::nn::Linear neuron{ nullptr };
};
```

## <font style = "color:rgb(50,120,229)">Train the Model</font>
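Training data is passed to the network in batches, and the mean squared error (MSE) between the predictions and the targets is used as the loss. `loss.backward()` computes the gradients of the loss with respect to the weights and the bias, and a stochastic gradient descent (SGD) optimizer, created in `main()` together with its learning rate, uses these gradients to update the parameters.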

```cpp
template <typename DataLoader>
void train(std::shared_ptr<Net> network, DataLoader& loader, torch::optim::Optimizer& optimizer,
           size_t epoch, size_t data_size) {
  size_t index = 0;
  /*Set network in the training mode*/
  network->train();
  float runningLoss = 0;

  for (auto& batch : loader) {
    auto data = batch.data.to(options.device);
    auto targets = batch.target.to(options.device).view({-1});
    // Execute the model on the input data and flatten the [batch, 1] output
    // so it has the same shape as targets (otherwise mse_loss would broadcast)
    auto output = network->forward(data).view({-1});

    // Use the mean squared error loss function to compute the loss
    auto loss = torch::mse_loss(output, targets);

    // Reset gradients
    optimizer.zero_grad();
    // Compute gradients
    loss.backward();
    // Update the parameters
    optimizer.step();

    runningLoss += loss.template item<float>();
    ++index;  // number of batches processed so far in this epoch

    if (index % options.logInterval == 0) {
      auto seen = std::min(data_size, index * options.trainBatchSize);
      std::cout << "Train Epoch: " << epoch << " " << seen << "/" << data_size
                << "\t\tAverage loss: " << runningLoss / index << std::endl;
    }
  }
}
```
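
To see what `torch::mse_loss`, `loss.backward()` and `optimizer.step()` do together for this one-neuron model, here is a hand-written sketch of a single plain SGD step in standard C++. The function name `sgdStep` is hypothetical; the gradients are derived by hand for the mean-reduced MSE (the default reduction of `torch::mse_loss`), and the sketch ignores options such as momentum or weight decay that `torch::optim::SGD` supports.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the chapter's code): one plain SGD step on
// the mean squared error of the linear model y_hat = w . x + b over a batch.
// Gradients are written out by hand (B = batch size):
//   L     = (1/B) * sum_k (y_hat_k - y_k)^2
//   dL/dw = (2/B) * sum_k (y_hat_k - y_k) * x_k
//   dL/db = (2/B) * sum_k (y_hat_k - y_k)
// Returns the batch loss computed before the update.
float sgdStep(const std::vector<std::vector<float>>& xs,
              const std::vector<float>& ys, std::vector<float>& w, float& b,
              float learningRate) {
  assert(!xs.empty() && xs.size() == ys.size());
  const float batch = static_cast<float>(xs.size());
  std::vector<float> gradW(w.size(), 0.0f);
  float gradB = 0.0f;
  float loss = 0.0f;

  for (std::size_t k = 0; k < xs.size(); ++k) {
    assert(xs[k].size() == w.size());
    // Forward pass for sample k
    float yHat = b;
    for (std::size_t i = 0; i < w.size(); ++i) yHat += w[i] * xs[k][i];
    const float err = yHat - ys[k];

    // Accumulate the loss and its gradients (the "backward" part)
    loss += err * err / batch;
    for (std::size_t i = 0; i < w.size(); ++i)
      gradW[i] += 2.0f * err * xs[k][i] / batch;
    gradB += 2.0f * err / batch;
  }

  // The "step" part: move every parameter a small step against its gradient
  for (std::size_t i = 0; i < w.size(); ++i) w[i] -= learningRate * gradW[i];
  b -= learningRate * gradB;
  return loss;
}
```

On the two points (1, 2) and (2, 4), starting from w = 0 and b = 0 with a learning rate of 0.1, the first step returns a loss of 10 and moves w to 1 and b to about 0.6; repeating the step drives the loss towards zero. LibTorch computes the same gradients automatically through autograd, which is why the chapter's code never writes them out.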

## <font style = "color:rgb(50,120,229)">Test the Model</font>
Similar to training, the test data is passed through the trained network to evaluate the model, and the loss is calculated on the test data. The first five predictions are printed along with their ground-truth values.

```cpp
template <typename DataLoader>
void test(std::shared_ptr<Net> network, DataLoader& loader, size_t data_size) {
  /*Set network in evaluation mode and disable gradient tracking*/
  network->eval();
  torch::NoGradGuard no_grad;

  double totalSquaredError = 0.0;
  bool printedSamples = false;

  for (const auto& batch : loader) {
    auto data = batch.data.to(options.device);
    auto targets = batch.target.to(options.device).view({-1});
    // Flatten the [batch, 1] output so it has the same shape as targets
    auto output = network->forward(data).view({-1});

    // Print up to five predictions from the first batch next to their ground truth
    if (!printedSamples) {
      int64_t numSamples = std::min<int64_t>(5, output.size(0));
      for (int64_t i = 0; i < numSamples; ++i) {
        std::cout << "Predicted: " << output[i].template item<float>() << "\t"
                  << "Groundtruth: " << targets[i].template item<float>() << std::endl;
      }
      printedSamples = true;
    }

    // Accumulate the squared errors over all test batches
    totalSquaredError += (output - targets).pow(2).sum().template item<float>();
  }

  // Mean squared error over the whole test set
  std::cout << "Test MSE: " << totalSquaredError / data_size << std::endl;
}
```
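
The test loss is an MSE, so its units are squared thousands of dollars. Taking the square root, the root mean squared error (RMSE), gives an error in the same units as the targets (`k$`), which is easier to interpret. The standard-C++ sketch below uses hypothetical helper names and mirrors the mean reduction of `torch::mse_loss`; it is not part of the chapter's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: mean squared error between predictions and ground
// truth, i.e. the average of the squared differences.
double meanSquaredError(const std::vector<double>& predicted,
                        const std::vector<double>& groundTruth) {
  assert(!predicted.empty() && predicted.size() == groundTruth.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < predicted.size(); ++i) {
    const double diff = predicted[i] - groundTruth[i];
    sum += diff * diff;
  }
  return sum / predicted.size();
}

// Hypothetical helper: root mean squared error, in the same units as the
// targets (thousands of dollars for this dataset).
double rootMeanSquaredError(const std::vector<double>& predicted,
                            const std::vector<double>& groundTruth) {
  return std::sqrt(meanSquaredError(predicted, groundTruth));
}
```

For example, predictions of 20 and 30 against ground-truth values of 23 and 27 are each off by 3 (thousand dollars), giving an MSE of 9 and an RMSE of 3.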

## <font style = "color:rgb(50,120,229)">Main Function</font>
The main function performs the following steps:
- **Data Processing**
    - Read the data from the csv file, normalize it, and split it into train and test data.
- **Data Loader**
    - Create data loaders, which provide options such as the batch size and the number of worker threads used to speed up data loading.
- **Model Initialization**
    - Create the network, move its parameters to the selected device, and create the SGD optimizer.
- **Training**
    - Call the train function once per epoch, for `epochs` epochs, and observe the loss values.
- **Testing**
    - After the last epoch, call the test function to evaluate the trained model on the test data.

```cpp
int main() {
  /*Sets manual seed from libtorch random number generators*/
  torch::manual_seed(1);

  /*Use CUDA for computation if available*/
  if (torch::cuda::is_available())
    options.device = torch::kCUDA;
  std::cout << "Running on: "
            << (options.device == torch::kCUDA ? "CUDA" : "CPU") << std::endl;

  /*Read data and split data into train and test sets*/
  auto data = readInfo();

  /*Uses Custom Dataset Class to load train data. Apply stack collation which takes 
  batch of tensors and stacks them into single tensor along the first dimension*/
  auto train_set =
      CustomDataset(data.first).map(torch::data::transforms::Stack<>());
  auto train_size = train_set.size().value();

  /*Data Loader provides options to speed up the data loading like batch size, number of workers*/
  auto train_loader =
      torch::data::make_data_loader(
          std::move(train_set), options.trainBatchSize);

  std::cout << "Training samples: " << train_size << std::endl;
  /*Uses Custom Dataset Class to load test data. Apply stack collation which takes 
  batch of tensors and stacks them into single tensor along the first dimension*/
  auto test_set =
      CustomDataset(data.second).map(torch::data::transforms::Stack<>());
  auto test_size = test_set.size().value();

  /*Test data loader similar to train data loader*/
  auto test_loader =
      torch::data::make_data_loader(
          std::move(test_set), options.testBatchSize);
  /*Create Linear  Regression Network*/
  auto net = std::make_shared<Net>(13, 1);

  /*Moving model parameters to correct device*/
  net->to(options.device);

  /*Using stochastic gradient descent optimizer with learning rate 0.000001*/
  torch::optim::SGD optimizer(
       net->parameters(), torch::optim::SGDOptions(0.000001));

  std::cout << "Training..." << std::endl;
  for (size_t i = 0; i < options.epochs; ++i) {
    /*Run one epoch of training over the whole training set*/
    train(net, *train_loader, optimizer, i + 1, train_size);
    std::cout << std::endl;
  }

  /*Evaluate the trained model on the test set after the last epoch*/
  std::cout << "Testing..." << std::endl;
  test(net, *test_loader, test_size);


  return 0;
}
```

# <font style="color:blue">Steps to Compile and Run the Code on Google Colab</font>

## <font style="color:green">Download LibTorch</font>

```
!wget https://download.pytorch.org/libtorch/cu101/libtorch-shared-with-deps-1.3.1.zip -O libtorch.zip
!unzip libtorch.zip
!rm -r libtorch.zip
```

## <font style="color:green">Download Code</font>

```
!wget "https://www.dropbox.com/s/rx6zsq5wmc18p2p/linear-regression.zip?dl=1" -O linear-regression.zip
!unzip linear-regression.zip
```

```python
import os
os.chdir("linear-regression")
```

## <font style="color:green">Compile</font>

```
!cmake -DCMAKE_PREFIX_PATH=$PWD/../libtorch .
!make
```

## <font style="color:green">Run </font>

```
!./lregression
```

## <font style = "color:rgb(50,120,229)">References</font>

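- Using the PyTorch C++ Frontend (PyTorch tutorial): https://pytorch.org/tutorials/advanced/cpp_frontend.html
- Custom dataset example in C++ (pytorch/examples): https://github.com/pytorch/examples/blob/master/cpp/custom-dataset/custom-dataset.cpp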