## AddMNistData
In this notebook we use several libraries, including ibm_boto3, to download the MNIST dataset and upload it to IBM Cloud Object Storage. We do so in two ways: one for training a model and storing its results with Keras, and another for Watson Studio's Neural Network Modeler.

### I - Cloud Object Storage

In [9]:
!pip install wget

Collecting wget
  Downloading wget-3.2.zip
Building wheels for collected packages: wget
  Running setup.py bdist_wheel for wget ... done
  Stored in directory: /home/dsxuser/.cache/pip/wheels/6d/98/29/61ccc41148f871009126c2e844e26f73eeb25e12cca92228a5
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2


Import the libraries we need. ibm_boto3 allows Python developers to manage Cloud Object Storage.

In [10]:
import os
import json
import time
import wget
import keras
import numpy
import pickle
import warnings
import ibm_boto3
from keras.datasets import mnist 
from ibm_botocore.client import Config
from sklearn.model_selection import train_test_split

We define the credentials and endpoints we will use. You can find the endpoint information in the "Endpoint" section of your Cloud Object Storage dashboard.

In [3]:
cos_credentials = {
    # ****PLACE YOUR COS CREDENTIALS HERE****
    # (must include the 'apikey' and 'resource_instance_id' keys used below)
}
api_key = cos_credentials['apikey']
service_instance_id = cos_credentials['resource_instance_id']
auth_endpoint = 'https://iam.bluemix.net/oidc/token'
service_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
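
Before creating the client, it can help to confirm that the pasted credentials contain the two fields read above. The following is a minimal sketch, not part of the original notebook: the helper name `missing_credential_keys` and the example values are hypothetical.

```python
# The two fields this notebook reads from the credentials dictionary.
REQUIRED_KEYS = ('apikey', 'resource_instance_id')

def missing_credential_keys(credentials, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in `credentials`."""
    return [key for key in required if not credentials.get(key)]

# Hypothetical placeholder values, for illustration only.
example_credentials = {'apikey': 'my-api-key', 'resource_instance_id': ''}
missing = missing_credential_keys(example_credentials)
if missing:
    print('Missing credential fields: {}'.format(', '.join(missing)))
```

Running the same check on your own `cos_credentials` gives a readable message instead of a `KeyError` later on.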

We create a Boto resource by providing the type, endpoint_url and credentials.

In [4]:
cos = ibm_boto3.resource('s3',
                         ibm_api_key_id=api_key,
                         ibm_service_instance_id=service_instance_id,
                         ibm_auth_endpoint=auth_endpoint,
                         config=Config(signature_version='oauth'),
                         endpoint_url=service_endpoint)

### II - Download MNIST data and upload it to COS buckets

Let's create the buckets we will use to store training data and training results.

**Note:** Bucket names have to be globally unique, so we append a timestamp to each name.
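
A timestamp alone can still collide if two users run the notebook at the same moment. As a hedged alternative, the sketch below adds a short random suffix and checks the result against an assumed naming rule (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit). The function name `unique_bucket_name` is hypothetical and not part of the original notebook.

```python
import re
import time
import uuid

# Assumed naming rule: 3-63 characters, lowercase letters, digits, dots and
# hyphens, starting and ending with a letter or digit.
BUCKET_NAME_RE = re.compile(r'^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$')

def unique_bucket_name(prefix):
    """Build a bucket name from a prefix, the current time and a random suffix."""
    name = '{}-{}-{}'.format(prefix, int(time.time()), uuid.uuid4().hex[:8])
    if not BUCKET_NAME_RE.match(name):
        raise ValueError('Invalid bucket name: {!r}'.format(name))
    return name

print(unique_bucket_name('mnist-data'))
```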

In [18]:
timestamp = str(time.time())
buckets = ['mnist-data-' + timestamp, 'mnist-results-' + timestamp]
for bucket in buckets:
    if not cos.Bucket(bucket) in cos.buckets.all():
        print('Creating bucket "{}"...'.format(bucket))
        try:
            cos.create_bucket(Bucket=bucket)
        except ibm_boto3.exceptions.ibm_botocore.client.ClientError as e:
            print('Error: {}.'.format(e.response['Error']['Message']))

Creating bucket "mnist-data-1522195721.5449784"...
Creating bucket "mnist-results-1522195721.5449784"...


Now we should have our buckets created.

We will work with the Keras **MNIST** sample dataset. Let's download the training data and upload it to the 'mnist-data-<timestamp>' bucket created above.

The cell below creates the 'MNIST_KERAS_DATA' folder and downloads the file from the link.

In [7]:
link = 'https://s3.amazonaws.com/img-datasets/mnist.npz'

In [11]:
data_dir = 'MNIST_KERAS_DATA'
if not os.path.isdir(data_dir):
    os.mkdir(data_dir)

if not os.path.isfile(os.path.join(data_dir, link.split('/')[-1])):
    wget.download(link, out=data_dir)

!ls MNIST_KERAS_DATA

mnist.npz
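
The "create the folder, then download only if the file is missing" pattern above can also be wrapped in a small function. This is a sketch using only the standard library instead of wget. The name `download_if_missing` and its `fetch` parameter are hypothetical, added so the logic can be exercised without network access.

```python
import os
import urllib.request

def download_if_missing(url, data_dir, fetch=urllib.request.urlretrieve):
    """Download `url` into `data_dir` unless the file is already there.

    `fetch` is any callable taking (url, destination_path); it defaults to
    the standard library's urlretrieve.
    """
    os.makedirs(data_dir, exist_ok=True)
    # Derive the local file name from the last URL segment, as the cell above does.
    target = os.path.join(data_dir, url.split('/')[-1])
    if not os.path.isfile(target):
        fetch(url, target)
    return target
```

With the variables defined earlier, the call would be `download_if_missing(link, data_dir)`.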


Upload the data files to the data bucket created above.

In [19]:
bucket_name = buckets[0]
bucket_obj = cos.Bucket(bucket_name)

for filename in os.listdir(data_dir):
    # upload_file reads the file from its path, so there is no need to open it first
    bucket_obj.upload_file(os.path.join(data_dir, filename), filename)
    print('{} is uploaded.'.format(filename))

mnist.npz is uploaded.


### III - Download MNIST as Python pickle for Neural Network Modeler

Load the MNIST dataset with Keras and split the training portion into training and validation sets, holding out 16.6% for validation.

In [13]:
(X,y), (X_test,y_test) = mnist.load_data()
X_train, X_valid, y_train, y_valid = train_test_split(X,y, test_size=.166, random_state=42)

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


Save the training, validation and test sets as separate pickle files.

In [14]:
with open('mnist-nnm-train.pkl', 'wb') as f:
    pickle.dump((X_train, y_train), f, protocol=pickle.HIGHEST_PROTOCOL)

with open('mnist-nnm-valid.pkl', 'wb') as f:
    pickle.dump((X_valid, y_valid), f, protocol=pickle.HIGHEST_PROTOCOL)

with open('mnist-nnm-test.pkl', 'wb') as f:
    pickle.dump((X_test, y_test), f, protocol=pickle.HIGHEST_PROTOCOL)
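
Each file stores a `(features, labels)` tuple, so a reader can unpack it directly. The round trip below is a minimal sketch with tiny stand-in lists in place of the real arrays. The helper names `save_split` and `load_split` are hypothetical.

```python
import os
import pickle
import tempfile

def save_split(path, features, labels):
    """Write one dataset split as a (features, labels) tuple."""
    with open(path, 'wb') as f:
        pickle.dump((features, labels), f, protocol=pickle.HIGHEST_PROTOCOL)

def load_split(path):
    """Read back a split written by save_split."""
    with open(path, 'rb') as f:
        features, labels = pickle.load(f)
    return features, labels

# Tiny stand-in data; the notebook stores NumPy arrays in the same layout.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example-split.pkl')
    save_split(path, [[0, 255], [128, 64]], [7, 3])
    features, labels = load_split(path)
    print(len(features), len(labels))
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.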

In [20]:
nnm_bucket_name = 'mnist-nnm-' + timestamp
cos.create_bucket(Bucket=nnm_bucket_name)
nnm_bucket = cos.Bucket(nnm_bucket_name)
for filename in ['mnist-nnm-train.pkl', 'mnist-nnm-valid.pkl', 'mnist-nnm-test.pkl']:
    nnm_bucket.upload_file(filename, filename)

Upon finishing there should be four objects in total: one in the data bucket and three in the Neural Network Modeler bucket.

In [22]:
for obj in cos.Bucket('mnist-data-' + timestamp).objects.all():
    print('Object key: {}, size: {:5.1f}kB'.format(obj.key, obj.size/1024))

print()

for obj in cos.Bucket('mnist-nnm-' + timestamp).objects.all():
    print('Object key: {}, size: {:5.1f}kB'.format(obj.key, obj.size/1024))

Object key: mnist.npz, size: 11221.1kB

Object key: mnist-nnm-test.pkl, size: 7666.2kB
Object key: mnist-nnm-train.pkl, size: 38360.9kB
Object key: mnist-nnm-valid.pkl, size: 7635.6kB
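
The listed sizes can be cross-checked against the split. The sketch below assumes each sample is a 28x28 image of unsigned 8-bit pixels plus a one-byte label, which is how `mnist.load_data()` returns the arrays, and it ignores the small pickle overhead. The function names are hypothetical.

```python
# Assumption: every sample is a 28x28 uint8 image plus a single uint8 label.
BYTES_PER_SAMPLE = 28 * 28 + 1

def payload_kb(n_samples):
    """Raw array payload in kB, ignoring the small pickle overhead."""
    return n_samples * BYTES_PER_SAMPLE / 1024

def samples_from_kb(size_kb):
    """Approximate sample count implied by an object size in kB."""
    return round(size_kb * 1024 / BYTES_PER_SAMPLE)

# Sizes copied from the listing above.
for key, size_kb in [('mnist-nnm-test.pkl', 7666.2),
                     ('mnist-nnm-train.pkl', 38360.9),
                     ('mnist-nnm-valid.pkl', 7635.6)]:
    print('{}: about {} samples'.format(key, samples_from_kb(size_kb)))
```

Under these assumptions the sizes correspond to roughly 10,000 test, 50,040 training and 9,960 validation samples, which matches a 16.6% hold-out from the 60,000 training images.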
