# Back up assets from a project and restore them to a new one

This notebook demonstrates the backup and restore features of Cloud Pak for Data.
It uses the [`cpdctl`](https://github.com/IBM/cpdctl) CLI tool, which is available in the IBM GitHub repository.
It also introduces commands for working with assets: exporting them from a project, creating a new project, and importing the assets into it.

Some familiarity with Python is helpful. This notebook uses Python 3.7.

In [1]:
import base64
import json
import os
import platform
import requests
import tarfile
import zipfile
from IPython.display import display, HTML

## CPD Credentials
**Note**: when this notebook runs inside an IBM Cloud Pak for Data (CP4D) cluster, `cpdctl` uses [zero-configuration mode](https://github.com/IBM/cpdctl#zero-configuration), which means it can connect to CP4D without explicit configuration. In that case, skip the cells below that set the credential and URL variables, as well as the cells that run `cpdctl config ...` commands.

In [2]:
CPD_USER_NAME = 'YOUR CPD USER NAME'
CPD_USER_PASSWORD = 'YOUR CPD USER PASSWORD'
CPD_URL = 'YOUR CPD CLUSTER URL'
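If you prefer not to store credentials in the notebook itself, one option is to read them from environment variables. The sketch below is only an illustration: the `credential` helper and the environment variable names are assumptions, not something `cpdctl` requires.

```python
import os

def credential(name, placeholder):
    """Read a setting from the environment, falling back to a placeholder.

    The variable names used below are hypothetical; pick whatever fits
    your environment.
    """
    return os.environ.get(name, placeholder)

CPD_USER_NAME = credential("CPD_USER_NAME", "YOUR CPD USER NAME")
CPD_USER_PASSWORD = credential("CPD_USER_PASSWORD", "YOUR CPD USER PASSWORD")
CPD_URL = credential("CPD_URL", "YOUR CPD CLUSTER URL")
```

With this approach the placeholders are used only when the corresponding variable is not set.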

### Install the latest version of `cpdctl`

In [3]:
PLATFORM = platform.system().lower()
CPDCTL_ARCH = "{}_amd64".format(PLATFORM)
CPDCTL_RELEASES_URL="https://api.github.com/repos/IBM/cpdctl/releases"
CWD = os.getcwd()
PATH = os.environ['PATH']
CPD_CONFIG = os.path.join(CWD, '.cpdctl.config.yml')

response = requests.get(CPDCTL_RELEASES_URL)
assets = response.json()[0]['assets']
platform_asset = next(a for a in assets if CPDCTL_ARCH in a['name'])
cpdctl_url = platform_asset['url']
cpdctl_file_name = platform_asset['name']

response = requests.get(cpdctl_url, headers={'Accept': 'application/octet-stream'})
with open(cpdctl_file_name, 'wb') as f:
    f.write(response.content)
    
display(HTML('<code>cpdctl</code> binary downloaded from: <a href="{}">{}</a>'.format(platform_asset['browser_download_url'], platform_asset['name'])))
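The cell above selects the release asset with `next(...)`, which raises a bare `StopIteration` when no asset name contains the platform string (for example, on a non-amd64 machine). A minimal sketch of a more explicit lookup follows; the `pick_release_asset` helper and the sample asset names are hypothetical and only mimic the shape of the GitHub releases response.

```python
def pick_release_asset(assets, arch):
    """Return the first release asset whose name contains `arch`.

    Raises LookupError with the available names when nothing matches.
    """
    for asset in assets:
        if arch in asset.get("name", ""):
            return asset
    available = ", ".join(a.get("name", "?") for a in assets)
    raise LookupError(
        "no cpdctl asset for {!r}; available: {}".format(arch, available)
    )

# Hypothetical asset list, shaped like the 'assets' array of a release.
sample_assets = [
    {"name": "cpdctl_darwin_amd64.tar.gz"},
    {"name": "cpdctl_linux_amd64.tar.gz"},
    {"name": "cpdctl_windows_amd64.zip"},
]
print(pick_release_asset(sample_assets, "linux_amd64")["name"])
```

In the notebook, the call would take `assets` and `CPDCTL_ARCH` from the cell above in place of the sample values.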

In [4]:
%%capture

%env PATH={CWD}:{PATH}
%env CPD_CONFIG={CPD_CONFIG}

In [5]:
if cpdctl_file_name.endswith('tar.gz'):
    with tarfile.open(cpdctl_file_name, "r:gz") as tar:
        tar.extractall()
elif cpdctl_file_name.endswith('zip'):
    with zipfile.ZipFile(cpdctl_file_name, 'r') as zf:
        zf.extractall()

if CPD_CONFIG and os.path.exists(CPD_CONFIG):
    os.remove(CPD_CONFIG)
    
version_r = ! cpdctl version
CPDCTL_VERSION = version_r.s

print("cpdctl version: {}".format(CPDCTL_VERSION))

cpdctl version: 1.0.0


### Add CPD user and profile configuration

Add the "cpd_user" user to the `cpdctl` configuration

In [6]:
! cpdctl config user set cpd_user --username {CPD_USER_NAME} --password {CPD_USER_PASSWORD}
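The command above interpolates the user name and password into a shell command line without quoting, so values that contain spaces or shell metacharacters may be split or expanded by the shell. One possible precaution, sketched here with a made-up password, is to quote the values with `shlex.quote` first. The `SAFE_*` variable names are assumptions for illustration.

```python
import shlex

# Hypothetical values; in the notebook these come from the credentials cell.
example_user = "demouser"
example_password = "p@ss word$1"

# shlex.quote leaves simple strings unchanged and wraps anything else in
# single quotes so the shell treats it as one literal argument.
SAFE_USER_NAME = shlex.quote(example_user)
SAFE_USER_PASSWORD = shlex.quote(example_password)

print(SAFE_USER_NAME)
print(SAFE_USER_PASSWORD)
```

The quoted variables could then be used in the `!` command in place of the raw ones, for example `--password {SAFE_USER_PASSWORD}`.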

Add the "cpd" profile to the `cpdctl` configuration

In [7]:
! cpdctl config profile set cpd --url {CPD_URL}

List available profiles

In [1]:
! cpdctl config profile list

Name   Type      User       URL                                              Current
cpd    private   cpd_user   https://cpd-zen.apps.wp463case.cp.fyre.ibm.com   *


Switch the current profile

In [2]:
! cpdctl config profile use cpd

Switched to profile "cpd".


List available projects in "cpd" profile

In [11]:
! cpdctl project list

...
ID                                     Name          Created                    Description   Tags
7fb76cf7-25be-435d-818e-bd6e9b5254f5   cpdctl-demo   2021-01-29T08:01:23.363Z                 []


### Access the selected project assets

Get the first project ID and show details

In [12]:
result = ! cpdctl project list --output json --raw-output --jmes-query 'resources[0].metadata.guid'
PROJECT_ID = result.s

In [13]:
! cpdctl project get --project-id {PROJECT_ID}

...
ID:            7fb76cf7-25be-435d-818e-bd6e9b5254f5
Name:          cpdctl-demo
Created:       2021-01-29T08:01:23.363Z
Description:
Tags:          []


In [14]:
result = ! cpdctl project get --project-id {PROJECT_ID} --output json --jmes-query "entity.name" --raw-output
PROJECT_NAME = result.s
print("'{}' project ID is: {}".format(PROJECT_NAME, PROJECT_ID))

'cpdctl-demo' project ID is: 7fb76cf7-25be-435d-818e-bd6e9b5254f5


List assets in the project

In [15]:
! cpdctl asset search --project-id {PROJECT_ID} --type-name asset --query "*:*"

...
ID                                     Name                                                Created                    Description                 Type         State       Tags            Size
8a8c8daa-f6eb-4e2b-9526-b6f13a457785   car_rental_training_data.csv                        2021-01-29T08:54:47.000Z                               data_asset   available   [cpdctl-demo]   79518
17ebcd96-588e-4287-9cb9-eb4608a4693e   housing_data.csv                                    2021-01-29T10:20:26.000Z                               data_asset   available   [cpdctl-demo]   41399
edb6fe21-77c4-4cb3-aa6d-9e36d2b18edd   credit_risk_training.csv                            2021-01-29T08:51:00.000Z                               data_asset   available   []              689622
ceea9923-7ff7-4084-a560-818716e65b4d   Sample notebook                                     2021-01-29T08

### Download data asset

Get "credit_risk_training.csv" data asset ID

In [20]:
result = ! cpdctl asset search --project-id {PROJECT_ID} --type-name data_asset --query "asset.name:credit_risk_training.csv" --output json --jmes-query "results[0].metadata.asset_id" --raw-output
DATA_ASSET_ID = result.s
print("'credit_risk_training.csv' data asset ID is: {}".format(DATA_ASSET_ID))

'credit_risk_training.csv' data asset ID is: edb6fe21-77c4-4cb3-aa6d-9e36d2b18edd


Show the data asset details, get its attachment ID, and download the attachment

In [21]:
! cpdctl asset get --project-id {PROJECT_ID} --asset-id {DATA_ASSET_ID}

...
ID:            edb6fe21-77c4-4cb3-aa6d-9e36d2b18edd
Name:          credit_risk_training.csv
Created:       2021-01-29T08:51:00.000Z
Description:
Type:          data_asset
State:         available
Tags:          []
Size:          689622
Attachments:   ID                                     Name                       Type         Mime Type
               0d7bc498-8913-4a53-91d3-b419e8ba070a   credit_risk_training.csv   data_asset   text/csv


In [37]:
result = ! cpdctl asset get --project-id {PROJECT_ID} --asset-id {DATA_ASSET_ID} --output json -j "attachments[0].id" --raw-output
DATA_ATTACHMENT_ID = result.s
print("Data asset attachment ID is: {}".format(DATA_ATTACHMENT_ID))

Data asset attachment ID is: 535ed8dd-8ea9-4dc7-b686-bb4e7d70192c


In [38]:
! cpdctl asset attachment download --project-id {PROJECT_ID} --asset-id {DATA_ASSET_ID} --attachment-id {DATA_ATTACHMENT_ID} --output-file credit_risk_training.csv

...
OK
Output written to credit_risk_training.csv


### Upload a new data asset

Clean up the existing "housing_data.csv" data assets

In [22]:
UPLOAD_DATASET_NAME = 'housing_data.csv'
ASSET_QUERY = "asset.name:{}".format(UPLOAD_DATASET_NAME)

In [25]:
result = ! cpdctl asset search --project-id {PROJECT_ID} --type-name data_asset --query "{ASSET_QUERY}" --output json --jmes-query "results[*].metadata.asset_id" --raw-output
DATA_ASSET_IDS = json.loads(result.s)
for data_asset_id in DATA_ASSET_IDS:
    print("Deleting data asset with ID: {}".format(data_asset_id))
    ! cpdctl asset delete --project-id {PROJECT_ID} --asset-id {data_asset_id}
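The cleanup loop above assumes that the search always prints a JSON array. If the command printed nothing or `null` instead (an assumption about possible edge cases, not documented `cpdctl` behavior), `json.loads` would fail or return `None` and the loop would raise. A defensive parser could look like the following sketch; `parse_id_list` is a hypothetical helper.

```python
import json

def parse_id_list(raw):
    """Turn the text output of a JMESPath ID query into a list of IDs.

    Handles an empty string, JSON null, a single JSON string, and a
    JSON array of strings.
    """
    raw = raw.strip()
    if not raw:
        return []
    value = json.loads(raw)
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)

print(parse_id_list('["17ebcd96-588e-4287-9cb9-eb4608a4693e"]'))
print(parse_id_list("null"))
```

In the notebook this would replace the direct `json.loads(result.s)` call, as in `DATA_ASSET_IDS = parse_id_list(result.s)`.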

Download the full training set from GitHub

In [26]:
! curl https://raw.githubusercontent.com/pmservice/wml-sample-models/master/scikit-learn/boston/data/housing_data.csv -o {UPLOAD_DATASET_NAME}
! wc -l {UPLOAD_DATASET_NAME}

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 41399  100 41399    0     0   117k      0 --:--:-- --:--:-- --:--:--  117k
     507 housing_data.csv
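The `wc -l` check above depends on a Unix-like shell. On platforms where `wc` is not available, a small Python function can serve the same purpose. This is a sketch with a hypothetical `count_lines` helper, demonstrated on a throwaway file rather than the real dataset. Unlike `wc -l`, it also counts a final line that has no trailing newline.

```python
import os
import tempfile

def count_lines(path):
    """Count lines in a file, including a last line without a newline."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)

# Self-contained demonstration on a temporary three-line CSV file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,b\n1,2\n3,4\n")
print(count_lines(tmp.name))
os.remove(tmp.name)
```

In the notebook, the call would be `count_lines(UPLOAD_DATASET_NAME)`.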


Create a new data asset in the project from the downloaded file

In [27]:
! cpdctl asset data-asset upload --file {UPLOAD_DATASET_NAME} --project-id {PROJECT_ID} --progress true --tag "cpdctl-demo" --mime "text/csv"


...
40.65 KiB / 40.43 KiB [---------------------------------------] 100.54% ? p/s 0s
ID:            17ebcd96-588e-4287-9cb9-eb4608a4693e
Name:          housing_data.csv
Created:       2021-01-29T10:20:26.000Z
Description:
Type:          data_asset
State:         available
Tags:          [cpdctl-demo]
Size:          41399
Attachments:   ID                                     Name               Type         Mime Type
               7f6c9650-36e5-414e-9620-8e8f0bf90c71   housing_data.csv   data_asset   text/csv


In [28]:
result = ! cpdctl asset search --project-id {PROJECT_ID} --type-name data_asset --query "{ASSET_QUERY}" --output json --jmes-query "results[0].metadata.asset_id" --raw-output
NEW_DATA_ASSET_ID = result.s
print("'{}' data asset ID is: {}".format(UPLOAD_DATASET_NAME, NEW_DATA_ASSET_ID))

'housing_data.csv' data asset ID is: 17ebcd96-588e-4287-9cb9-eb4608a4693e


In [29]:
! cpdctl asset search --project-id {PROJECT_ID} --type-name data_asset --query "*:*"

...
ID                                     Name                           Created                    Description   Type         State       Tags            Size
8a8c8daa-f6eb-4e2b-9526-b6f13a457785   car_rental_training_data.csv   2021-01-29T08:54:47.000Z                 data_asset   available   [cpdctl-demo]   79518
17ebcd96-588e-4287-9cb9-eb4608a4693e   housing_data.csv               2021-01-29T10:20:26.000Z                 data_asset   available   [cpdctl-demo]   41399
edb6fe21-77c4-4cb3-aa6d-9e36d2b18edd   credit_risk_training.csv       2021-01-29T08:51:00.000Z                 data_asset   available   []              689622


### Export all assets from the selected project

In [32]:
EXPORT = {
    'all_assets': True
}
EXPORT_JSON = json.dumps(EXPORT)
result = ! cpdctl asset export start --project-id {PROJECT_ID} --assets '{EXPORT_JSON}' --name demo-project-assets --output json --jmes-query "metadata.id" --raw-output
EXPORT_ID = result.s
print('Export ID: {}'.format(EXPORT_ID))

Export ID: 0988ee86-33b1-4bf4-90cb-676009d3d463


In [33]:
! cpdctl asset export download --project-id {PROJECT_ID} --export-id {EXPORT_ID} --output-file project-assets.zip --progress

...
OK
Output written to project-assets.zip


In [34]:
! unzip -l project-assets.zip

Archive:  project-assets.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      358  01-29-2021 10:23   project.json
       44  01-29-2021 10:23   deflate.log
      416  01-29-2021 10:23   assettypes/folder_asset.json
      988  01-29-2021 10:23   assettypes/column_info.json
      234  01-29-2021 10:23   assettypes/policy_transform.json
      543  01-29-2021 10:23   assettypes/asset_terms.json
      459  01-29-2021 10:23   assettypes/omrs_relationship_message.json
      288  01-29-2021 10:23   assettypes/package_extension.json
      526  01-29-2021 10:23   assettypes/environment.json
      465  01-29-2021 10:23   assettypes/connection_credentials.json
      311  01-29-2021 10:23   assettypes/shiny_asset.json
     1442  01-29-2021 10:23   assettypes/job_run.json
    38778  01-29-2021 10:23   assettypes/wml_model.json
     2528  01-29-2021 10:23   assettypes/wml_remote_training_system.json
    27504  01-29-2021 10:23   assettypes/wml_training_def
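The archive listing above relies on the `unzip` command. Since the notebook already imports `zipfile`, the same inspection can be done in Python, which may be convenient where `unzip` is not installed. The sketch below uses a hypothetical `list_archive` helper and builds a tiny in-memory archive so that it runs on its own. The two entry names are borrowed from the listing above, but their contents are placeholders.

```python
import io
import zipfile

def list_archive(zip_source):
    """Return (name, uncompressed size) pairs for every entry in a zip."""
    with zipfile.ZipFile(zip_source) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]

# Build a small stand-in archive in memory (placeholder contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("project.json", "{}")
    zf.writestr("assettypes/environment.json", "{}")
buffer.seek(0)

for name, size in list_archive(buffer):
    print("{:>9}  {}".format(size, name))
```

For the real export, the call would be `list_archive("project-assets.zip")`.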

### Create a new project from backup

Ensure there is no restored project

In [12]:
RESTORED_PROJECT_NAME = 'cpdctl-demo-restored-project'

In [31]:
JMES_QUERY = "resources[?entity.name == '{}'].metadata.guid".format(RESTORED_PROJECT_NAME)
result = ! cpdctl project list --output json --jmes-query "{JMES_QUERY}"
PROJECT_IDS = json.loads(result.s)
if PROJECT_IDS:
    for project_id in PROJECT_IDS:
        print('Deleting project with ID: {}'.format(project_id))
        ! cpdctl project delete --project-id {project_id}
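The JMESPath expression `resources[?entity.name == '...'].metadata.guid` used above filters the project list by name and projects each match to its GUID. For readers less familiar with JMESPath, here is a rough plain-Python equivalent. The `guids_by_name` helper is hypothetical, the first sample record follows the `cpdctl project list --output json` shape shown in this notebook, and the second GUID is a made-up placeholder.

```python
def guids_by_name(listing, name):
    """Return the GUIDs of all projects whose entity.name equals `name`."""
    return [
        resource["metadata"]["guid"]
        for resource in listing.get("resources", [])
        if resource.get("entity", {}).get("name") == name
    ]

# Trimmed sample in the shape of the project list output.
sample_listing = {
    "resources": [
        {
            "entity": {"name": "cpdctl-demo-restored-project"},
            "metadata": {"guid": "26ec966c-5fd6-4d28-bd32-5ab0aa3fc51e"},
        },
        {
            "entity": {"name": "cpdctl-samples"},
            # Placeholder GUID, not taken from a real cluster.
            "metadata": {"guid": "00000000-0000-0000-0000-000000000000"},
        },
    ]
}

print(guids_by_name(sample_listing, "cpdctl-demo-restored-project"))
print(guids_by_name(sample_listing, "no-such-project"))
```

Because the filter returns a list, an empty result simply means there is nothing to delete.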

In [19]:
! cpdctl project list --output json

{
  "resources": [
    {
      "entity": {
        "creator": "demouser",
        "creator_iam_id": "1000331004",
        "name": "cpdctl-demo-restored-project",
        "public": false,
        "scope": {
          "bss_account_id": "999",
          "enforce_members": true
        },
        "storage": {
          "guid": "dcb94a38-4356-424f-9fa0-3e5b34c648ed",
          "type": "assetfiles"
        }
      },
      "metadata": {
        "created_at": "2021-01-29T10:30:12.419Z",
        "guid": "26ec966c-5fd6-4d28-bd32-5ab0aa3fc51e",
        "updated_at": "2021-01-29T10:30:14.083Z",
        "url": "/v2/projects/26ec966c-5fd6-4d28-bd32-5ab0aa3fc51e"
      }
    },
    {
      "entity": {
        "creator": "demouser",
        "creator_iam_id": "1000331004",
        "description": "",
        "name": "cpdctl-samples",
        "public": false,
        "scope": {
          "bss_account_id": "999",
          "enforce_members": true
        },
        "sto

Create a new project

In [44]:
import uuid
STORAGE = {"type": "assetfiles", "guid": str(uuid.uuid4())}
STORAGE_JSON = json.dumps(STORAGE)
result = ! cpdctl project create --name {RESTORED_PROJECT_NAME} --output json --raw-output --storage '{STORAGE_JSON}' --jmes-query 'location'
RESTORED_PROJECT_ID = result.s.split('/')[-1]
print("The new '{}' project ID is: {}".format(RESTORED_PROJECT_NAME, RESTORED_PROJECT_ID))

The new 'cpdctl-demo-restored-project' project ID is: 26ec966c-5fd6-4d28-bd32-5ab0aa3fc51e
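The project ID is taken from the last path segment of the returned `location` value. The expression `split('/')[-1]` yields an empty string if the value ever ends with a slash or carries trailing whitespace, so a slightly more tolerant variant may be useful. The `project_id_from_location` helper below is a hypothetical sketch, and the sample path follows the `/v2/projects/<id>` form shown in the project list output.

```python
def project_id_from_location(location):
    """Extract the last non-empty path segment from a location string."""
    return location.strip().rstrip("/").rsplit("/", 1)[-1]

print(project_id_from_location("/v2/projects/26ec966c-5fd6-4d28-bd32-5ab0aa3fc51e"))
```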


In [45]:
result = ! cpdctl asset import start --project-id {RESTORED_PROJECT_ID} --import-file project-assets.zip --output json --jmes-query "metadata.id" --raw-output
IMPORT_ID = result.s
print("The new import ID is: {}".format(IMPORT_ID))

The new import ID is: 1696b550-0415-42d8-a494-1ff2283d7f2f
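An import runs asynchronously, so its state may not be `completed` immediately after it is started. A generic polling loop such as the sketch below can wait for a terminal state. Everything here is an assumption for illustration: the `wait_for_state` helper, the `failed` state name, and the intermediate states in the demo. Only `completed` appears in this notebook's output. The `get_state` callable is where a real check, for example one based on `cpdctl asset import get`, would be plugged in.

```python
import time

def wait_for_state(get_state, done=("completed",), failed=("failed",),
                   interval=5.0, timeout=300.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Call get_state() repeatedly until it returns a terminal state.

    Returns the final state on success, raises RuntimeError on a failed
    state and TimeoutError when the deadline passes.
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in done:
            return state
        if state in failed:
            raise RuntimeError("operation ended in state {!r}".format(state))
        if clock() >= deadline:
            raise TimeoutError("still {!r} after {}s".format(state, timeout))
        sleep(interval)

# Demo with a fake state source; the intermediate names are made up.
fake_states = iter(["pending", "running", "completed"])
print(wait_for_state(lambda: next(fake_states), sleep=lambda seconds: None))
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise without real delays.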


In [46]:
! cpdctl asset import get --project-id {RESTORED_PROJECT_ID} --import-id {IMPORT_ID}

...
ID:        1696b550-0415-42d8-a494-1ff2283d7f2f
Created:   2021-01-29T10:30:39.607Z
State:     completed


In [47]:
! cpdctl asset search --query '*:*' --type-name asset --project-id {RESTORED_PROJECT_ID}

...
ID                                     Name                           Created                    Description   Type         State       Tags            Size
3c7ef01c-d5df-4b9a-8bd5-120c917a2928   housing_data.csv               2021-01-29T10:30:43.000Z                 data_asset   available   [cpdctl-demo]   41399
c59be03f-4867-4281-a493-47f36a6e5291   credit_risk_training.csv       2021-01-29T10:30:44.000Z                 data_asset   available   []              689622
5d7aa59f-2202-465c-8276-906a5c90d16f   car_rental_training_data.csv   2021-01-29T10:30:43.000Z                 data_asset   available   [cpdctl-demo]   79518
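Restored assets receive new IDs and creation timestamps, so a restore is easier to check by comparing names and sizes than by comparing IDs. The sketch below does this for the data assets listed in this notebook. The `diff_assets` helper is hypothetical, and the values are copied by hand from the two listings rather than fetched from a cluster.

```python
def diff_assets(source, restored):
    """Compare two collections of (name, size) pairs."""
    source, restored = set(source), set(restored)
    return {
        "missing": sorted(source - restored),
        "unexpected": sorted(restored - source),
    }

# (name, size) pairs from the source project's data asset listing.
source_assets = [
    ("car_rental_training_data.csv", 79518),
    ("housing_data.csv", 41399),
    ("credit_risk_training.csv", 689622),
]
# (name, size) pairs from the restored project's listing.
restored_assets = [
    ("housing_data.csv", 41399),
    ("credit_risk_training.csv", 689622),
    ("car_rental_training_data.csv", 79518),
]

print(diff_assets(source_assets, restored_assets))
```

Empty `missing` and `unexpected` lists indicate that every data asset arrived with the same name and size.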


### Author

Rafał Bigaj, System Architect with a long, successful record of building and leading teams, and broad, practical knowledge of cloud computing, machine learning, and distributed systems development.

Copyright © 2020 IBM. This notebook and its source code are released under the terms of the MIT License.