![](images/codete_workshops.jpg)

# KharkivJS

During this session, we go through a few JavaScript libraries and frameworks that are used for machine learning. This notebook is divided into four sections:
- data preparation and cleanup,
- scikit-learn as the most popular library for machine learning,
- building a linear regression method with tensorflow,
- a neural network with keras.

## Environment

Feel free to use the Docker image, which has Jupyter and a Node.js kernel installed, to set up the environment:
[https://hub.docker.com/r/kprzystalski/jsml/](https://hub.docker.com/r/kprzystalski/jsml/). You can get the notebooks from GitHub: [https://github.com/codete/kharkivjs](https://github.com/codete/kharkivjs).


## Introduction to machine learning

Before we go to the next steps, it's important to explain how machine learning works. Please take a look at the figure:

![](images/machine_learning_1.png)





## Data preparation and cleanup

As we want to use machine learning, we need to acquire some data for the training. Some libraries, like scikit-learn, provide a few popular datasets. You can use them for learning, but in most cases you will want to load your own data.

Two types of objects make working with a dataset easier: Series and DataFrame. You can think of a Series as a lite version of a DataFrame: it is a vector of data. A DataFrame is more complex, and you can think of it as a matrix object.

There are a few libraries that allow us to work with DataFrames:
- [Pandas](https://github.com/StratoDem/pandas-js)
- [DataFrameJS](https://www.npmjs.com/package/dataframe-js)
- [Recline](http://okfnlabs.org/recline/)
- [DataForgeJS](https://github.com/data-forge/data-forge-js)

The most popular is pandas. It originates in Python and has been ported ("forked") to JavaScript as pandas-js. Let's import the Series and DataFrame objects from pandas-js:

In [2]:
var Series = require('pandas-js').Series;
var DataFrame = require('pandas-js').DataFrame;

A Series can be built as follows:

In [3]:
var series1 = new Series([1, 2, 3, 4], {name: 'A Series', index: [2, 3, 4, 5]})

It can be printed:

In [4]:
series1

Series {
  _data: [ 1, 2, 3, 4 ],
  _axes: { '0': List [ 2, 3, 4, 5 ] },
  _AXIS_ORDERS: List [ 0 ],
  _values: List [ 1, 2, 3, 4 ],
  _dtype: DType { _name: 'int' },
  _name: 'A Series',
  _AXIS_LEN: 1,
  _sort_ascending: [Function: bound _sort_ascending],
  _sort_descending: [Function: bound _sort_descending] }

Similarly, we can create a new DataFrame:

In [12]:
var dataframe1 = new DataFrame([[1,2,3],[3,4,5]], {name: 'A DataFrame', index: [6,7]})

It can be printed in the same way as Series:

In [13]:
dataframe1

DataFrame {
  _data: Map { "0": 6	1
7	3
Name: 0, dtype: dtype(int), "1": 6	2
7	4
Name: 1, dtype: dtype(int), "2": 6	3
7	5
Name: 2, dtype: dtype(int) },
  _axes: { '0': List [ 6, 7 ], '1': Seq [ "0", "1", "2" ] },
  _AXIS_ORDERS: List [ 0, 1 ],
  _values: null,
  _AXIS_LEN: 2 }

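As we can see, a DataFrame is built from a list of lists. Exporting it with `to_json()` shows the values grouped by column and keyed by index label: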

In [17]:
dataframe1.to_json()

{ '0': { '6': 1, '7': 3 },
  '1': { '6': 2, '7': 4 },
  '2': { '6': 3, '7': 5 } }

In [18]:
dataframe1.toString()

'\t|  0  |  1  |  2  |\n--------------------\n6\t|  1  |  2  |  3  |\n7\t|  3  |  4  |  5  |\n'

## Data manipulation

We can easily manipulate the data in a DataFrame. Some of the methods known from Pandas, such as filtering, are available. Unfortunately, most JavaScript DataFrame implementations are still somewhat limited.

In [40]:
// The y values match the outputs printed in the cells below.
var df1 = new DataFrame([{x: 1, y: 2}, {x: 2, y: 3}, {x: 3, y: 4}]);

In [58]:
df1.filter(df1.get('x').gt(1)).to_json()

{ x: { '1': 2, '2': 3 }, y: { '1': 3, '2': 4 } }

In [42]:
df1

DataFrame {
  _data: Map { "x": 0	1
1	2
2	3
Name: x, dtype: dtype(int), "y": 0	2
1	3
2	4
Name: y, dtype: dtype(int) },
  _axes: { '0': List [ 0, 1, 2 ], '1': Seq [ "x", "y" ] },
  _AXIS_ORDERS: List [ 0, 1 ],
  _values: null,
  _AXIS_LEN: 2 }

In [51]:
dataframe1.filter(dataframe1.get('0').gt(1)).to_json()

{ '0': { '7': 3 }, '1': { '7': 4 }, '2': { '7': 5 } }
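The filtering calls above work in two steps: `gt(1)` produces a boolean mask, and `filter` keeps the rows where the mask is true. The following is a minimal plain-JavaScript sketch of that idea. It does not use pandas-js, the helper names `greaterThan` and `filterRows` are hypothetical, and unlike pandas-js it drops the index labels.

```javascript
// Plain-JavaScript sketch of boolean-mask filtering (no pandas-js required).
// A small column-oriented table with columns x and y.
const table = { x: [1, 2, 3], y: [2, 3, 4] };

// Step 1: compare a column with a threshold, similar to df1.get('x').gt(1).
// Hypothetical helper, not part of pandas-js.
function greaterThan(column, threshold) {
  return column.map(value => value > threshold);
}

// Step 2: keep only the rows whose mask entry is true, similar to df1.filter(mask).
// Hypothetical helper, not part of pandas-js.
function filterRows(columns, mask) {
  const result = {};
  for (const name of Object.keys(columns)) {
    result[name] = columns[name].filter((_, row) => mask[row]);
  }
  return result;
}

const mask = greaterThan(table.x, 1);
const filtered = filterRows(table, mask);
console.log(mask);
console.log(filtered);
```

The first row is dropped because its `x` value is not greater than 1, which mirrors the pandas-js result shown above.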

### Limitations

There are many DataFrame libraries available for JavaScript, more than in many other languages. Some are ported or cloned from Python, but they cover only part of the original functionality; pandas-js, for example, is not a full clone of Pandas. The other libraries have limitations as well.


### Data visualization

There are plenty of visualization libraries available for JavaScript. Some can be used to visualize the data from DataFrames. This is possible because we can export the data from a DataFrame as JSON. A few libraries that can be used for data visualization are:
- [D3.js](https://d3js.org/)
- [Recharts](https://github.com/recharts/recharts)
- [React-vis](https://github.com/uber/react-vis)

There are many more available. This is a huge advantage of JavaScript compared to Python.
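Charting libraries typically expect an array of row records, while `to_json()` returns the data grouped by column. The sketch below shows one possible conversion in plain JavaScript. The helper name `toRecords` and the extra `index` field are assumptions made for illustration, not part of pandas-js or of any charting library.

```javascript
// Convert a column-oriented object (column -> index label -> value) into an
// array of row records. Hypothetical helper, not part of pandas-js.
function toRecords(columns) {
  const rows = new Map(); // keeps index labels in first-seen order
  for (const [columnName, cells] of Object.entries(columns)) {
    for (const [indexLabel, value] of Object.entries(cells)) {
      if (!rows.has(indexLabel)) {
        rows.set(indexLabel, { index: indexLabel });
      }
      rows.get(indexLabel)[columnName] = value;
    }
  }
  return Array.from(rows.values());
}

// Same structure as the to_json() output of dataframe1 shown earlier.
const columnJson = {
  '0': { '6': 1, '7': 3 },
  '1': { '6': 2, '7': 4 },
  '2': { '6': 3, '7': 5 },
};

const records = toRecords(columnJson);
console.log(records);
```

Each record holds one row of the DataFrame, so it can be passed to a chart as a single data point.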

## Machine learning with Scikit-learn

Scikit-learn is one of the most popular libraries for machine learning. It is used almost everywhere, and because of that it was ported to JavaScript, although the ports lack much of its functionality. There are two ports of this library: jskit-learn and scikit-learn, but the first was last updated about **a year ago**, and the second **five years ago**.


## Tensorflow
 
First things first: tensors. These are N-dimensional arrays (usually 2D or 3D) that form the building blocks of the framework's computations.

![aa](images/tensorflow.png)

A graph can be divided into two parts:
- ops (operations): graph nodes that describe calculations consuming and producing tensors,
- tensors: graph edges that represent the values flowing through operations.

In the Python API, a graph has to be run inside a session, which is the mechanism for running the graph and feeding values into it. TensorFlow.js, used below, executes operations eagerly, so no session is needed.

Please note that the type of a tensor is inferred from the value passed to it. If we didn't specify tf.float32, it would be tf.int64, and those types are incompatible in TensorFlow.

Before moving on we have to introduce other Tensor types provided by the framework:

- tf.placeholder: a placeholder for a value that we provide later and that then becomes a concrete value. A good analogy is y = f(x), where x only represents some value.
- tf.Variable (created with `tf.variable` in TensorFlow.js): unlike tf.constant, this tensor can change its value during execution. Variables have to be initialized (in high-level APIs this is done for you).

In [1]:
const tf = require('@tensorflow/tfjs');

require('@tensorflow/tfjs-node');

{ io: 
   { NodeFileSystem: { [Function: NodeFileSystem] URL_SCHEME: 'file://' } },
  version: '0.1.13' }



Let's multiply two tensors with TensorFlow:

In [2]:
let var1 = tf.tensor2d([[1, 2]]);
let var2 = tf.tensor2d([[3], [4]]);
var1.matMul(var2).print();

Tensor
     [[11],]
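The `matMul` call above is a matrix product, not an element-wise operation. As a cross-check, the same result can be computed by hand. This is a minimal plain-JavaScript sketch; the local `matMul` function is written here for illustration and is not the TensorFlow.js implementation.

```javascript
// Plain-JavaScript matrix multiplication, to show what a matrix product computes.
function matMul(a, b) {
  if (a[0].length !== b.length) {
    throw new Error('Inner dimensions must match');
  }
  // Each output cell is the dot product of a row of a and a column of b.
  return a.map(row =>
    b[0].map((_, col) =>
      row.reduce((sum, value, k) => sum + value * b[k][col], 0)
    )
  );
}

const left = [[1, 2]];     // shape 1 x 2
const right = [[3], [4]];  // shape 2 x 1
const product = matMul(left, right);
console.log(product);      // the single cell is 1*3 + 2*4 = 11
```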


Tensorflow has a few levels of API:

![](images/tf_estimator_apis.png)

Let's imagine we have a simple linear regression to solve: $y=ax+b$. In TensorFlow it looks like the following:

In [3]:
const trainX = [1.2, 1.5, 2.4, 3.5];
const trainY = [1.5, 1.4, 2.0, 3.1];

const a = tf.variable(tf.scalar(Math.random()));
const b = tf.variable(tf.scalar(Math.random()));

The prediction function would be exactly the regression function:

In [4]:
function predict(x) { 
  return tf.tidy(function() {
    return a.mul(x).add(b);
  });
}

A loss function measures how good our model is: the lower the value, the better the fit. Here we use the mean squared error:

In [5]:
function loss(prediction, actualValues) { 
   const error = prediction.sub(actualValues).square().mean(); 
   return error; 
}
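To make the chained call `sub(...).square().mean()` concrete, here is the same calculation written in plain JavaScript. It is only a sketch for illustration: the function name `meanSquaredError` and the sample values in the second call are assumptions, not part of the workshop code.

```javascript
// Mean squared error in plain JavaScript: subtract, square, then average.
function meanSquaredError(predictions, actualValues) {
  if (predictions.length !== actualValues.length) {
    throw new Error('Inputs must have the same length');
  }
  const total = predictions.reduce(
    (sum, prediction, i) => sum + (prediction - actualValues[i]) ** 2,
    0
  );
  return total / predictions.length;
}

// Predictions equal to the targets give a loss of zero.
const perfectLoss = meanSquaredError([1.5, 1.4, 2.0, 3.1], [1.5, 1.4, 2.0, 3.1]);

// Errors of 2 and 4 give (2*2 + 4*4) / 2 = 10.
const workedLoss = meanSquaredError([3, 5], [1, 1]);

console.log(perfectLoss, workedLoss);
```

Squaring makes every error positive and penalizes large errors more than small ones.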

The higher the learning rate, the faster we approach the best fit, but it cannot be too high: with too large a step the optimizer overshoots the minimum and the loss may never converge.

In [6]:
const learningRate = 0.01;
const optimizer = tf.train.sgd(learningRate);
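The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on $f(w)=w^2$, whose gradient is $2w$ and whose minimum is at $w=0$. The function name `descend` and the three rates are chosen only for illustration.

```javascript
// Gradient descent on f(w) = w^2 (gradient 2w, minimum at w = 0).
function descend(learningRate, steps, start) {
  let w = start;
  for (let i = 0; i < steps; i++) {
    w = w - learningRate * 2 * w; // one gradient step
  }
  return w;
}

const small = descend(0.01, 50, 1.0);   // moves toward 0 slowly
const medium = descend(0.1, 50, 1.0);   // moves toward 0 faster
const tooLarge = descend(1.1, 50, 1.0); // every step overshoots, |w| grows
console.log({ small, medium, tooLarge });
```

Each step multiplies `w` by `1 - 2 * learningRate`, so the value shrinks only while that factor stays between -1 and 1.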

Training means minimizing the loss function:

In [7]:
function train() {
  optimizer.minimize(function() {
    // Forward pass on the training inputs.
    const predictions = predict(tf.tensor1d(trainX));
    // The scalar loss returned here is what the optimizer minimizes.
    return loss(predictions, tf.tensor1d(trainY));
  });
}

Run one training step to check that it works:

In [8]:
train()
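A single call performs only one optimization step, so a real fit needs many of them. The following plain-JavaScript sketch shows what repeated steps do for $y=ax+b$ with a mean squared error loss. It does not use TensorFlow.js; the names `mse` and `fit`, the starting point of zero, and the step count of 2000 are assumptions made for illustration.

```javascript
// Plain-JavaScript gradient descent for y = a*x + b with mean squared error.
const xs = [1.2, 1.5, 2.4, 3.5];
const ys = [1.5, 1.4, 2.0, 3.1];

// Loss for a given slope and intercept.
function mse(slope, intercept) {
  let total = 0;
  for (let i = 0; i < xs.length; i++) {
    total += (slope * xs[i] + intercept - ys[i]) ** 2;
  }
  return total / xs.length;
}

// Repeat the gradient step many times, starting from slope 0 and intercept 0.
function fit(steps, learningRate) {
  let slope = 0;
  let intercept = 0;
  for (let step = 0; step < steps; step++) {
    let gradSlope = 0;
    let gradIntercept = 0;
    for (let i = 0; i < xs.length; i++) {
      const error = slope * xs[i] + intercept - ys[i];
      gradSlope += (2 * error * xs[i]) / xs.length; // d(MSE)/d(slope)
      gradIntercept += (2 * error) / xs.length;     // d(MSE)/d(intercept)
    }
    slope -= learningRate * gradSlope;
    intercept -= learningRate * gradIntercept;
  }
  return { slope, intercept };
}

const initialLoss = mse(0, 0);
const model = fit(2000, 0.01);
const finalLoss = mse(model.slope, model.intercept);
console.log(model, initialLoss, finalLoss);
```

With TensorFlow.js the same effect would come from calling the training function in a loop, since the optimizer computes the gradients itself.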

It's time to check the prediction of our model:

In [15]:
var predicted_value = predict([2.0])

In [16]:
predicted_value.get([0])

2.40226149559021

## Neural networks with Keras

Keras is a framework that is based on protobuf and is used to build neural networks layer by layer.

![](images/autoencoders.png)

Keras.js loads models that were saved with Keras in Python so that they can be used in JavaScript. To load a model, we first need to save it in Python and then load the framework as below:

In [17]:
const KerasJS = require('keras-js')

In [26]:
var kerasmodel = new KerasJS.Model({
  filepath: '/home/codete/workshop/kerasmodel.bin',    
  filesystem: true,
  headers: {},    
  transferLayerOutputs: false,
  pauseAfterLayerCalls: false,
  visualizations: []    
})

## Summary

There are a few points that we should know after this session:
- JavaScript is rarely used for building a model,
- it can be easily used for prediction,
- it has support for GPU,
- JavaScript is not the first choice for machine learning,
- there is still a lot of work to do in this area.



## Where to go next ...

Find me at O'Reilly live trainings:

- [Sentiment analysis for chatbots in Python](https://www.safaribooksonline.com/live-training/courses/sentiment-analysis-for-chatbots-in-python/0636920185413/)
- [Building intelligent bots in Python](https://www.safaribooksonline.com/live-training/courses/building-intelligent-bots-in-python/0636920185390)

Drop me an email: **karol@codete.com**
or call me: 0048608508372.