# IPFS in the Context of OpenMined

The goal of this notebook is to give a quick introduction to IPFS functionalities used in [Grid](https://github.com/OpenMined/Grid), which is a Peer-to-Peer On-Demand Compute Grid. Grid is one step toward Federated Learning. You can find an excellent notebook demo [here](https://github.com/OpenMined/Grid/blob/master/notebooks/DenverMLGridDemo.ipynb) with the presentation recorded [here](https://www.youtube.com/watch?v=iYP4sYz0jho&feature=youtu.be&t=1h13m21s). 

To run this notebook, install [Grid](https://github.com/OpenMined/Grid) and start a local IPFS daemon (**ipfs daemon**), which the client below connects to.

```python
import base64
import json
import pickle
from threading import Thread

import numpy as np
from filelock import FileLock

from grid.ipfsapi.client import Client as IpfsClient

# Connect to the HTTP API of the local IPFS daemon
api = IpfsClient()
```

## Privacy and Federated Learning

One of OpenMined's pillars is privacy: we want users to keep control over their data. A key technology for achieving this is [Federated Learning](https://research.googleblog.com/2017/04/federated-learning-collaborative.html). Today, to get predictions on their device, users typically send their personal data to a central server, where a company trains its machine learning model and then sends predictions back. With Federated Learning, the model is trained directly on the user's device, where the personal data lives, and that data is never sent to a central location. Only the gradients (the information the model learned from the new data) are sent to a central server, where the gradients received from many devices are aggregated to improve the model. This technique is a key ingredient in keeping your data private. You can find a great implementation of Federated Learning [here](https://medium.com/@mccorby/federated-learning-e79e054c33ef).
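The aggregation step can be pictured as a weighted average in the spirit of Federated Averaging: each device's gradient counts in proportion to the number of local examples it trained on, and only those numbers ever reach the server. This is a minimal pure-Python sketch with a hypothetical helper name (aggregate_updates) and a toy two-parameter model, not Grid's implementation.

```python
from typing import List, Sequence


def aggregate_updates(updates: Sequence[Sequence[float]],
                      num_examples: Sequence[int]) -> List[float]:
    """Weighted average of per-device updates (hypothetical helper).

    Each device contributes in proportion to its number of local examples;
    the raw training data itself never leaves the device.
    """
    if not updates or len(updates) != len(num_examples):
        raise ValueError("need one example count per update")
    total = sum(num_examples)
    dim = len(updates[0])
    averaged = [0.0] * dim
    for update, n in zip(updates, num_examples):
        if len(update) != dim:
            raise ValueError("all updates must have the same length")
        for i, value in enumerate(update):
            averaged[i] += value * n / total
    return averaged


# Three devices send gradients for a 2-parameter model, never their data
device_gradients = [[0.5, -1.0], [0.1, 0.2], [0.3, 0.0]]
device_sizes = [10, 30, 60]
global_gradient = aggregate_updates(device_gradients, device_sizes)

# The server applies one gradient-descent step to the shared weights
learning_rate = 0.1
weights = [1.0, 1.0]
weights = [w - learning_rate * g for w, g in zip(weights, global_gradient)]
```

Weighting by example count keeps a device with very little data from pulling the shared model as hard as one with a lot of data.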

## IPFS (InterPlanetary File System)

The [IPFS](https://ipfs.io/) protocol is at the core of Grid, OpenMined's Federated Learning implementation. Federated Learning requires exchanging data with many end users or data providers: ML models, gradients, or simply messages announcing that new data exists that could improve a model. The initial idea was to transfer this data over a blockchain, but storing and transferring data on-chain is expensive and slow. This [article](https://medium.com/@mycoralhealth/learn-to-securely-share-files-on-the-blockchain-with-ipfs-219ee47df54c) explains how IPFS and blockchains can be combined to transfer data securely.

Here are the main benefits of IPFS:
- Each file and all of the blocks within it are given a unique fingerprint called a cryptographic hash.
- IPFS removes duplicate content across the network and tracks the version history of every file.
- Each network node stores only content it is interested in, and some indexing information that helps figure out who is storing what.
- When looking up files, you're asking the network to find nodes storing the content behind a unique hash.
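The first and last bullets describe content addressing: an identifier is derived from the bytes themselves, so identical content always maps to the same key, is stored only once, and can be verified by re-hashing. The sketch below imitates the familiar "Qm..." form by base58-encoding a SHA-256 multihash of the raw bytes. This is a simplification under stated assumptions: real IPFS first chunks the file and wraps it in a DAG node, so these identifiers will not match what ipfs add returns, and ContentStore is a hypothetical name.

```python
import hashlib

# Bitcoin-style base58 alphabet (no 0, O, I or l), as used by "Qm..." hashes
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(raw: bytes) -> str:
    """Encode bytes as base58 (big-endian integer, leading zeros kept)."""
    num = int.from_bytes(raw, "big")
    encoded = ""
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = B58_ALPHABET[rem] + encoded
    # Each leading zero byte becomes a leading '1'
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def content_id(data: bytes) -> str:
    """Simplified identifier: base58(0x12 0x20 + sha256(data)).

    0x12 is the multihash code for sha2-256 and 0x20 the digest length (32).
    Real IPFS hashes a chunked DAG node, not the raw bytes.
    """
    return b58encode(bytes([0x12, 0x20]) + hashlib.sha256(data).digest())


class ContentStore:
    """Toy content-addressed store: the key is derived from the value."""

    def __init__(self):
        self._blocks = {}

    def add(self, data: bytes) -> str:
        cid = content_id(data)
        self._blocks.setdefault(cid, data)  # identical content is stored once
        return cid

    def get(self, cid: str) -> bytes:
        data = self._blocks[cid]
        # Anyone can check the content by re-hashing it
        if content_id(data) != cid:
            raise ValueError("content does not match its identifier")
        return data

    def __len__(self) -> int:
        return len(self._blocks)


store = ContentStore()
first = store.add(b"model weights v1")
duplicate = store.add(b"model weights v1")  # same bytes, same key
second = store.add(b"model weights v2")
```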

## Sharing Data Through IPFS

Sharing data through IPFS is simple and fast: when you add a file to IPFS, you get back its hash, which anyone can then use to retrieve the content.

### Sharing Tensors:

IPFS lets you add binary data, JSON, and other kinds of files. For example, to share a tensor, serialize it to bytes and add those bytes to IPFS using add_bytes.

```python
# Create a simple matrix
data = np.array([[1, 4], [65, 8]])
# Serialize the matrix to bytes using pickle
data_serialized = pickle.dumps(data)
# Add the bytes to IPFS; the return value is the content's hash
tensor_hash = api.add_bytes(data_serialized)
tensor_hash
```

```
'QmbWd4xUQzZjv35foX3uKeRTRgJ9X689vHURyECqQLUTKd'
```

As you can see, the function add_bytes returns a unique identifier, which is a hash.

To retrieve the content of a file identified by a hash, you just have to use the function cat.

```python
# Retrieve the file from IPFS by its hash
file_retrieved_from_ipfs = api.cat(tensor_hash)
# Deserialize the retrieved bytes using pickle
pickle.loads(file_retrieved_from_ipfs)
```

```
array([[ 1,  4],
       [65,  8]])
```
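pickle.loads can execute arbitrary code embedded in the bytes it is given, so unpickling tensors fetched from untrusted peers is risky. For plain numeric arrays, NumPy's .npy format loaded with allow_pickle=False is a safer choice. A minimal sketch; the helper names tensor_to_bytes and bytes_to_tensor are hypothetical, and the resulting bytes would go through api.add_bytes and api.cat exactly as above.

```python
import io

import numpy as np


def tensor_to_bytes(array: np.ndarray) -> bytes:
    """Serialize an array to .npy bytes (header with shape and dtype, then data)."""
    buffer = io.BytesIO()
    np.save(buffer, array, allow_pickle=False)
    return buffer.getvalue()


def bytes_to_tensor(blob: bytes) -> np.ndarray:
    """Deserialize .npy bytes; object arrays, which need pickle, are refused."""
    return np.load(io.BytesIO(blob), allow_pickle=False)


data = np.array([[1, 4], [65, 8]])
blob = tensor_to_bytes(data)      # e.g. tensor_hash = api.add_bytes(blob)
restored = bytes_to_tensor(blob)  # e.g. bytes_to_tensor(api.cat(tensor_hash))
```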

### Sharing a Keras Model:

With the simple concepts introduced above, you can even share a [Keras](https://keras.io/) model through IPFS.

```python
import keras


# Serialize a Keras model, add it to IPFS and return its hash
def keras2ipfs(api, model):
    return api.add_bytes(serialize_keras_model(model))


# Retrieve a Keras model from IPFS by its hash, then deserialize it
def ipfs2keras(api, model_addr):
    return deserialize_keras_model(api.cat(model_addr))


# Serialize a Keras model to bytes through a temporary HDF5 file.
# The file lock keeps concurrent calls from overwriting each other's file.
def serialize_keras_model(model):
    lock = FileLock('temp_model.h5.lock')
    with lock:
        model.save('temp_model.h5')
        with open('temp_model.h5', 'rb') as f:
            model_bin = f.read()
        return model_bin


# Deserialize bytes produced by serialize_keras_model back into a model
def deserialize_keras_model(model_bin):
    lock = FileLock('temp_model2.h5.lock')
    with lock:
        with open('temp_model2.h5', 'wb') as g:
            g.write(model_bin)
        model = keras.models.load_model('temp_model2.h5')
        return model
```

Let's see how it works in action. First, we create a basic deep learning model.

```python
import keras
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD
```

```python
# XOR toy dataset
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

model = Sequential()
model.add(Dense(8, input_shape=(2,)))
model.add(Activation('tanh'))
model.add(Dense(1))
model.add(Activation('sigmoid'))

# lr is the Keras 2.x argument name (newer releases use learning_rate)
sgd = SGD(lr=0.1)
model.compile(loss='binary_crossentropy', optimizer=sgd)

# Only 3 epochs: enough for the demo, far too few to fit XOR well
model.fit(x, y, epochs=3)
```

```
Epoch 1/3
Epoch 2/3
Epoch 3/3
```

Then we add the model to IPFS.

```python
model_hash = keras2ipfs(api, model)
model_hash
```

```
'QmerfiYTJFJahDf8arV1Kb5Jy3Brvxc5oVEzZKKDK1YHPM'
```

If someone else wants to make predictions with this model, they can retrieve it using model_hash.

```python
# Retrieve the Keras model from IPFS
retrieved_model = ipfs2keras(api, model_hash)
```

```python
# Make predictions with the retrieved model
new_input = np.array([[0, 0], [0, 1]])
retrieved_model.predict(new_input)
```

```
array([[0.4748178],
       [0.6431086]], dtype=float32)
```

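And voilà: the model retrieved from IPFS makes predictions. Note that these two inputs come from the training set, and after only three epochs the outputs are still far from the XOR targets 0 and 1; the point here is the transfer, not the accuracy.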

Hopefully these examples show how easy it is to transfer tensors, models, gradients, etc., through IPFS. You can also share a JSON-serializable Python dict with add_json, which performs the JSON encoding itself, so the dict is passed directly (encoding it first with json.dumps would store a JSON string instead of the dict):

```python
metadata = {'model_name': 'mnist', 'owner_id': '534564'}
data_hash = api.add_json(metadata)
```

```python
# Reading the content back and parsing it returns the original dict
json.loads(api.cat(data_hash))
```

```
{'model_name': 'mnist', 'owner_id': '534564'}
```
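To experiment with these calls without a running daemon, a hypothetical in-memory stand-in can mimic the three methods used so far: add_bytes, add_json and cat. FakeIpfsClient is an illustrative name, it keys content by a plain SHA-256 hex digest rather than a real IPFS hash, and its add_json encodes the object itself, which is why a dict should be passed rather than a string already produced by json.dumps.

```python
import hashlib
import json
from typing import Any, Dict


class FakeIpfsClient:
    """In-memory stand-in for add_bytes / add_json / cat (hypothetical)."""

    def __init__(self):
        self._store: Dict[str, bytes] = {}

    def add_bytes(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # not a real IPFS hash
        self._store[key] = data
        return key

    def add_json(self, obj: Any) -> str:
        # Encode exactly once, with a stable key order
        return self.add_bytes(json.dumps(obj, sort_keys=True).encode("utf-8"))

    def cat(self, key: str) -> bytes:
        return self._store[key]


fake_api = FakeIpfsClient()
meta_hash = fake_api.add_json({'model_name': 'mnist', 'owner_id': '534564'})
meta = json.loads(fake_api.cat(meta_hash))
```

Because keys are sorted before hashing, the same dict built in a different order maps to the same key, mirroring the content-addressing idea.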

## Publishing Messages with IPFS Pubsub

[Pubsub](https://ipfs.io/blog/25-pubsub/) (publish-subscribe) on IPFS also plays a major role in [Grid](https://github.com/OpenMined/Grid). For example, an OpenMined client node can use pubsub to ask a worker node to train a model with certain specs (the model, number of epochs, batch size, etc.), and then receive the trained model back from the worker.
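To make the client/worker exchange concrete, here is a sketch of what a training request and its reply might look like as JSON payloads on a pubsub topic. The message classes and fields (model_hash, epochs, batch_size, reply_topic, kind) are hypothetical and are not Grid's actual wire format; models travel as IPFS hashes rather than inline, which keeps pubsub messages small.

```python
import json
from dataclasses import asdict, dataclass
from typing import Union


@dataclass
class TrainRequest:
    """Hypothetical request a client publishes for workers."""
    model_hash: str   # IPFS hash of the serialized model to train
    epochs: int
    batch_size: int
    reply_topic: str  # topic on which the client waits for the result
    kind: str = "train"


@dataclass
class TrainResult:
    """Hypothetical reply a worker publishes after training."""
    request_model_hash: str
    trained_model_hash: str  # IPFS hash of the trained model
    kind: str = "result"


def encode(msg: Union[TrainRequest, TrainResult]) -> str:
    """Turn a message into the JSON string used as the pubsub payload."""
    return json.dumps(asdict(msg))


def decode(payload: str) -> Union[TrainRequest, TrainResult]:
    """Rebuild a message object from a received JSON payload."""
    fields = json.loads(payload)
    kind = fields.pop("kind")
    if kind == "train":
        return TrainRequest(**fields)
    if kind == "result":
        return TrainResult(**fields)
    raise ValueError(f"unknown message kind: {kind!r}")


request = TrainRequest(model_hash="QmerfiYTJFJahDf8arV1Kb5Jy3Brvxc5oVEzZKKDK1YHPM",
                       epochs=3, batch_size=4, reply_topic="results")
payload = encode(request)   # what a client would publish with pubsub_pub
received = decode(payload)  # what a worker would reconstruct on its side
```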

Let's look at an example: broadcasting the existence of a model, with its owner_id, on a channel (topic) called "model". To do so, use the pubsub_pub function. First run the command **ipfs pubsub sub "model"** in a terminal, then run the code below; the message should appear in that terminal. Pubsub was experimental in go-ipfs at the time, so the daemon may need to be started with **ipfs daemon --enable-pubsub-experiment**.

```python
message = {'model_name': 'mnist', 'owner_id': '534564'}

api.pubsub_pub(topic="model",
               payload=json.dumps(message),
               stream=True)
```

You can also listen for messages on the "model" channel. Some fields of each received message are base64-encoded, but you can decode them with the following decode_message function.

```python
def decode_message(encoded):
    """Decode a message received on pubsub.

    The from, data and seqno fields arrive base64-encoded. Returns None
    for anything without a 'from' field.
    """
    if 'from' not in encoded:
        return None
    decoded = {}
    decoded['from'] = base64.standard_b64decode(encoded['from'])
    decoded['data'] = base64.standard_b64decode(encoded['data']).decode('utf-8')
    decoded['seqno'] = base64.standard_b64decode(encoded['seqno'])
    decoded['topicIDs'] = encoded['topicIDs']
    decoded['encoded'] = encoded
    return decoded
```

To extract the data from the message, we can use the handle_message function below.

```python
def handle_message(message):
    """Parse the JSON payload of a decoded pubsub message."""
    msg = json.loads(message['data'])
    return msg
```

To listen to a specific channel, we use the pubsub_sub function from the IPFS API.

```python
def listen_to_channel_impl(channel, handle_message):
    new_messages = api.pubsub_sub(topic=channel, stream=True)

    # new_messages is a generator that keeps yielding new messages until
    # you return from the loop. Once you return, you are no longer
    # subscribed.
    for m in new_messages:
        message = decode_message(m)
        if message is not None:
            out = handle_message(message)
            if out is not None:
                print(out)
                return out
```

```python
def listen_to_channel(*args):
    """Listen for IPFS pubsub messages asynchronously.

    Creates the listener and calls your handler function on a new thread.
    """
    t1 = Thread(target=listen_to_channel_impl, args=args)
    t1.start()
```
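listen_to_channel prints the first handled message inside its thread, so the caller never gets the value back. One alternative, sketched here, hands results to the caller through a queue.Queue and marks the thread as a daemon so a listener blocked on the stream does not keep Python running. To stay testable without a daemon it accepts any iterable of messages; with IPFS you would pass api.pubsub_sub(topic="model", stream=True) as listen_to_channel_impl does, with a handler that applies decode_message, skips None, then calls handle_message. The name start_listener is hypothetical.

```python
import json
import queue
import threading
from typing import Any, Callable, Iterable, Optional


def start_listener(messages: Iterable[Any],
                   handle: Callable[[Any], Optional[Any]]) -> queue.Queue:
    """Consume messages on a daemon thread and queue every non-None result."""
    results: queue.Queue = queue.Queue()

    def run() -> None:
        for raw in messages:
            out = handle(raw)
            if out is not None:
                results.put(out)

    # daemon=True: a listener blocked on the stream won't keep Python running
    threading.Thread(target=run, daemon=True).start()
    return results


# Stand-in for api.pubsub_sub(...): already-decoded messages in a list
fake_stream = [{"data": '{"model_name": "mnist", "owner_id": "534564"}'},
               {"data": "null"}]  # parses to None, so it is not queued
inbox = start_listener(fake_stream, lambda m: json.loads(m["data"]))
first_message = inbox.get(timeout=5)  # blocks until a result arrives
```

Unlike listen_to_channel_impl, this version keeps listening after the first message; the caller decides how many results to read from the queue.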

Finally, we listen to the "model" channel.

```python
listen_to_channel("model", handle_message)
```

Because we are listening to the "model" channel, if we publish the message below, we will receive the same message back.

```python
message = {'model_name': 'mnist', 'owner_id': '534564'}

api.pubsub_pub(topic="model",
               payload=json.dumps(message),
               stream=True)
```

```
{'model_name': 'mnist', 'owner_id': '534564'}
```

Sharing data through IPFS and publishing messages on pubsub is the backbone of Grid. If you want to contribute to Grid or learn more about IPFS, do not hesitate to reach out on [OpenMined Slack](https://openmined.slack.com/).