<a href="https://colab.research.google.com/github/arangoml/arangopipe/blob/409_and_env_externalization/examples/Arangopipe_with_TensorFlow_Beginner_Guide.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



##### Copyright 2019 The TensorFlow Authors.

In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# ArangoML Pipeline Cloud
This notebook shows how to add ArangoML Pipeline to an existing machine learning workflow.

We took TensorFlow's beginner quickstart notebook, one of its simplest examples, and added our pipeline to capture and store metadata.

To learn more about ArangoML and the managed metadata pipeline, read our [release post](https://www.arangodb.com/2020/01/arangoml-pipeline-cloud-manage-machine-learning-metadata/).


# TensorFlow 2 quickstart for beginners

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/tutorials/quickstart/beginner"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/quickstart/beginner.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/docs/blob/master/site/en/tutorials/quickstart/beginner.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/docs/site/en/tutorials/quickstart/beginner.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

This short introduction uses [Keras](https://www.tensorflow.org/guide/keras/overview) to:

1. Build a neural network that classifies images.
2. Train this neural network.
3. And, finally, evaluate the accuracy of the model.

This is a [Google Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb) notebook file. Python programs are run directly in the browser—a great way to learn and use TensorFlow. To follow this tutorial, run the notebook in Google Colab by clicking the button at the top of this page.

1. In Colab, connect to a Python runtime: At the top-right of the menu bar, select *CONNECT*.
2. Run all the notebook code cells: Select *Runtime* > *Run all*.

# Install Required Packages

In [3]:
!pip install python-arango
!pip install -i https://test.pypi.org/simple/ arangopipe
!pip install pandas PyYAML==5.1.1 sklearn2
!pip install json-tricks 

Looking in indexes: https://test.pypi.org/simple/
Collecting arangopipe
  Downloading https://test-files.pythonhosted.org/packages/4e/a5/5c735a7b1893d5f61a647b3cefc569921526223f2ee44c52f154b54c896a/arangopipe-0.0.6.8.4-py3-none-any.whl
Installing collected packages: arangopipe
Successfully installed arangopipe-0.0.6.8.4


Download and install the TensorFlow 2 package. Import TensorFlow into your program:

In [4]:
from __future__ import absolute_import, division, print_function, unicode_literals

# Install TensorFlow
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf

TensorFlow 2.x selected.


# Initial Connection to a Managed Service ArangoPipe Database

In [0]:
from arangopipe.arangopipe_storage.arangopipe_api import ArangoPipe
from arangopipe.arangopipe_storage.arangopipe_admin_api import ArangoPipeAdmin
from arangopipe.arangopipe_storage.arangopipe_config import ArangoPipeConfig
from arangopipe.arangopipe_storage.managed_service_conn_parameters import ManagedServiceConnParam
mdb_config = ArangoPipeConfig()
msc = ManagedServiceConnParam()
conn_params = {
    msc.DB_SERVICE_HOST: "arangoml.arangodb.cloud",
    msc.DB_SERVICE_END_POINT: "createDB",
    msc.DB_SERVICE_NAME: "createDB",
    msc.DB_SERVICE_PORT: 8529,
    msc.DB_CONN_PROTOCOL: "https",
}

mdb_config = mdb_config.create_connection_config(conn_params)
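
Hard-coded connection settings are fine for a demo. As a minimal sketch of externalizing them, the helper below reads the same five settings from environment variables and falls back to the values used above. The environment variable names (`ARANGOML_HOST` and so on) and the helper `conn_settings_from_env` are hypothetical. The string keys mirror those printed by `mdb_config.cfg` later in this notebook, not the `msc` constants themselves, so treat that mapping as an assumption.

```python
import os

# Defaults copied from the connection cell above.
DEFAULTS = {
    "DB_service_host": "arangoml.arangodb.cloud",
    "DB_end_point": "createDB",
    "DB_service_name": "createDB",
    "DB_service_port": "8529",
    "conn_protocol": "https",
}

# Hypothetical environment variable names, one per setting.
ENV_NAMES = {
    "DB_service_host": "ARANGOML_HOST",
    "DB_end_point": "ARANGOML_END_POINT",
    "DB_service_name": "ARANGOML_SERVICE_NAME",
    "DB_service_port": "ARANGOML_PORT",
    "conn_protocol": "ARANGOML_PROTOCOL",
}


def conn_settings_from_env(env=None):
    """Return connection settings, preferring environment values over defaults."""
    env = os.environ if env is None else env
    settings = {key: env.get(ENV_NAMES[key], default)
                for key, default in DEFAULTS.items()}
    # The port arrives as text from the environment; store it as an integer.
    settings["DB_service_port"] = int(settings["DB_service_port"])
    return settings


# Override only the port; everything else falls back to the defaults.
settings = conn_settings_from_env({"ARANGOML_PORT": "8530"})
print(settings["DB_service_host"], settings["DB_service_port"])
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to try out without touching the real environment.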

In [6]:
%%capture
admin = ArangoPipeAdmin(reuse_connection = False, config = mdb_config)
ap_config = admin.get_config()
ap = ArangoPipe(config = ap_config)
# Error indicating "heart beat check was not found" is expected.

2020-02-11 03:44:54,615 - arangopipe_logger - ERROR - The dataset by name: heart beat check was not found in Arangopipe!


Load and prepare the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). Convert the samples from integers to floating-point numbers:

In [0]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
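
To make the scaling step concrete, here is a small sketch in plain Python that applies the same division to a toy 2 x 2 "image" of 8-bit intensities. The helper name `scale_pixels` and the sample values are illustrative only; in the notebook, NumPy performs this division over the whole array at once.

```python
def scale_pixels(image):
    """Map 8-bit pixel intensities (0-255) to floats in [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


# A toy 2 x 2 grayscale image with the darkest and brightest possible pixels.
tiny_image = [[0, 51], [204, 255]]
scaled = scale_pixels(tiny_image)
print(scaled)
```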


Set the identifying metadata for this project: the project name, dataset, featureset, and model. \
This information is then registered and stored.

In [0]:
proj_info = {"name": "MNIST Handwriting Analysis"}
proj_reg = admin.register_project(proj_info)

ds_info = {"name": "MNIST dataset",
           "description": "Classification task pertaining to classifying the digit in an image"}
ds_reg = ap.register_dataset(ds_info)

featureset = {'name': 'MNIST digits',
              'description': '28 x 28 pixel images with a label'}
fs_reg = ap.register_featureset(featureset, ds_reg["_key"])

model_info = {"name": "Neural Network",
              "type": "Neural network with Linear layer, ReLU activation, Dropout Layer (20%) and Softmax output layer"}
model_reg = ap.register_model(model_info, project = "MNIST Handwriting Analysis")

Build the `tf.keras.Sequential` model by stacking layers. Choose an optimizer and loss function for training:

In [0]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
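
As a rough illustration of what the final softmax layer and the `sparse_categorical_crossentropy` loss compute for a single example, the sketch below works through both in plain Python. It is a simplified version that leaves out details such as batching and numerical clipping, and the function names are chosen for this example, not taken from Keras.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(true_label, probabilities):
    """Negative log-probability assigned to the correct integer class label."""
    return -math.log(probabilities[true_label])


# Three raw scores for a three-class problem; class 0 is the true label.
probs = softmax([2.0, 1.0, 0.1])
example_loss = sparse_categorical_crossentropy(0, probs)
print(probs, example_loss)
```

The loss is "sparse" because the label is a plain integer class index, as in `y_train`, not a one-hot vector.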

Train and evaluate the model:

In [10]:
import uuid  # used to generate the run ID
from datetime import datetime

model.fit(x_train, y_train, epochs=5)

# Generate a unique ID for this run.
ruuid = uuid.uuid4()

# Record the current date and time as a POSIX timestamp.
now = datetime.now()
timestamp = datetime.timestamp(now)

# Model parameters and performance results to store with the run.
model_params = {"run_id": str(ruuid)}
loss, accuracy = model.evaluate(x_test, y_test, verbose=2)
print("model loss %.2f , model accuracy %.2f" % (loss, accuracy))
model_perf = {"loss": str(loss),
              "accuracy": str(accuracy),
              "run_id": str(ruuid),
              "timestamp": timestamp}


Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
10000/10000 - 0s - loss: 0.0783 - accuracy: 0.9753
model loss 0.08 , model accuracy 0.98
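
The bookkeeping in the cell above (a run ID, a timestamp, and stringified metrics) can be wrapped in a small helper. The sketch below is one possible arrangement; `make_run_record` is a hypothetical name, and it uses a timezone-aware UTC clock where the notebook uses `datetime.now()`.

```python
import uuid
from datetime import datetime, timezone


def make_run_record(loss, accuracy, run_id=None, now=None):
    """Build a performance record shaped like model_perf in this notebook."""
    run_id = uuid.uuid4() if run_id is None else run_id
    now = datetime.now(timezone.utc) if now is None else now
    return {
        "loss": str(loss),            # metrics are stored as strings
        "accuracy": str(accuracy),
        "run_id": str(run_id),        # random UUID identifying this run
        "timestamp": now.timestamp(), # seconds since the Unix epoch
    }


record = make_run_record(0.0783, 0.9753)
print(record["run_id"], record["timestamp"])
```

Accepting `run_id` and `now` as optional arguments makes the helper deterministic when needed, for example in tests.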


In [0]:
model_json = model.to_json()

In [0]:
from json_tricks import dumps
weights = model.get_weights()
json_weights = dumps(weights)
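
To show the round trip that weight serialization relies on, here is a minimal sketch using only the standard-library `json` module on nested lists of floats. This is a stand-in: the notebook uses `json_tricks` so that NumPy arrays can be encoded directly, which plain `json` cannot do. The toy weights and helper names are illustrative.

```python
import json


def weights_to_json(weights):
    """Serialize nested lists of floats to a JSON string."""
    return json.dumps(weights)


def weights_from_json(text):
    """Rebuild the nested lists from their JSON representation."""
    return json.loads(text)


# Toy stand-in for model.get_weights(): one kernel matrix and one bias vector.
toy_weights = [
    [[0.1, -0.2], [0.3, 0.4]],  # kernel of a 2 x 2 dense layer
    [0.0, 0.5],                 # bias vector
]
restored = weights_from_json(weights_to_json(toy_weights))
print(restored == toy_weights)
```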

In [0]:
model_params['json_weights'] = json_weights
model_params['model_json'] = model_json

In [0]:
run_info = {"dataset": ds_reg["_key"],
            "featureset": fs_reg["_key"],
            "run_id": str(ruuid),
            "model": model_reg["_key"],
            "model-params": model_params,
            "model-perf": model_perf,
            "pipeline": "Handwriting-Analysis-Pipeline",
            "tag": "MNIST_model_params_saved",
            "project": "MNIST Handwriting Analysis"}

ap.log_run(run_info)
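
Before logging a run, it can help to confirm that the dictionary is complete. The sketch below checks for the keys this notebook places in `run_info`. Whether `log_run` strictly requires every one of them is an assumption, and `missing_run_keys` is a hypothetical helper, not part of Arangopipe.

```python
# Keys used by the run_info dictionary in this notebook. Whether log_run
# strictly requires every one of them is an assumption.
EXPECTED_RUN_KEYS = frozenset({
    "dataset", "featureset", "run_id", "model",
    "model-params", "model-perf", "pipeline", "tag", "project",
})


def missing_run_keys(run_info):
    """Return the expected keys that are absent from run_info, sorted by name."""
    return sorted(EXPECTED_RUN_KEYS - set(run_info))


# A deliberately incomplete dictionary to show what the check reports.
partial = {"dataset": "123", "model": "456"}
print(missing_run_keys(partial))
```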

# Reusing the Previous Connection

In a subsequent session, you can reuse the connection you created earlier with the snippet below. Note that no connection information is supplied this time.

In [15]:
admin = ArangoPipeAdmin()  
ap_config = admin.get_config()
ap = ArangoPipe(config = ap_config)
# Error indicating "heart beat check was not found" is expected.

Host Connection: https://arangoml.arangodb.cloud:8529


2020-02-11 03:46:02,240 - arangopipe_logger - ERROR - The dataset by name: heart beat check was not found in Arangopipe!


Look up the model you stored in the database using the previous connection:

In [16]:
ap.lookup_model("Neural Network")

{'_id': 'models/24197844',
 '_key': '24197844',
 '_rev': '_aBHXqu---_',
 'name': 'Neural Network',
 'type': 'Neural network with Linear layer, ReLU activation, Dropout Layer (20%) and Softmax output layer'}
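
The returned document carries ArangoDB's standard identifiers: `_id` is the collection name and the `_key` joined by a slash. As a small sketch, the hypothetical helper below splits an `_id` into those two parts, using a trimmed copy of the result above as sample data.

```python
def split_document_id(doc_id):
    """Split an ArangoDB document _id into (collection, key)."""
    collection, _, key = doc_id.partition("/")
    return collection, key


# Trimmed copy of the lookup result shown above.
model_doc = {"_id": "models/24197844", "_key": "24197844", "name": "Neural Network"}
collection, key = split_document_id(model_doc["_id"])
print(collection, key)
```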

The image classifier is now trained to ~98% accuracy on this dataset. To learn more, read the [TensorFlow tutorials](https://www.tensorflow.org/tutorials/).

## Recreate a model from persisted state

In [0]:
saved_model_params = ap.lookup_modelparams(tag_value = "MNIST_model_params_saved")

In [0]:
saved_model = saved_model_params['model_json']
saved_model_weights = saved_model_params['json_weights']

In [19]:
mdb_config.cfg

{'arangodb': {'DB_end_point': 'createDB',
  'DB_service_host': 'arangoml.arangodb.cloud',
  'DB_service_name': 'createDB',
  'DB_service_port': 8529,
  'arangodb_replication_factor': None,
  'conn_protocol': 'https',
  'dbName': 'MLbf8etjaazjsu1qnxe9l3up',
  'password': 'MLjgp36v1c81j05j13e80s8p',
  'username': 'MLvcjvb1zt4mlxadu6iqfv'},
 'mlgraph': {'graphname': 'enterprise_ml_graph'}}

In [0]:
from json_tricks import loads
remat_weight = loads(saved_model_weights)
reinitialized_model = tf.keras.models.model_from_json(saved_model)
reinitialized_model.set_weights(remat_weight)


## Compare the predictions of the old and new model

In [0]:
new_predictions = reinitialized_model.predict(x_test)
old_predictions = model.predict(x_test)

In [22]:
import numpy as np
np.array_equal(new_predictions, old_predictions)

True
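
Exact equality holds here because the weights round-trip without loss. If a storage path ever introduced tiny floating-point differences, a tolerance-based comparison would be the more forgiving check. The sketch below shows the idea in plain Python; `predictions_match` and its tolerances are illustrative choices, and NumPy's `np.allclose` offers the same kind of comparison on arrays.

```python
import math


def predictions_match(a, b, rel_tol=1e-6, abs_tol=1e-8):
    """Compare two lists of probability rows element-wise within a tolerance."""
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            return False
        if not all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
                   for x, y in zip(row_a, row_b)):
            return False
    return True


old = [[0.1, 0.9], [0.7, 0.3]]
new = [[0.1, 0.9 + 1e-12], [0.7, 0.3]]  # differs only by a tiny rounding-sized amount
print(old == new, predictions_match(old, new))
```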