### Uploading the packaged egg file to remote HDFS

In [1]:
# Push the egg file to user HDFS directory in the cluster
import os
import dsx_core_utils
dsx_core_utils.upload_hdfs_file(
    source_path=os.environ['DSX_PROJECT_DIR']+'/SpamFilterScikit/dist/SpamFilterScikit-1.0-py2.7.egg', 
    target_path="/user/user1/SpamFilterScikit-1.0-py2.7.egg",
    webhdfsurl="https://zinc1.fyre.ibm.com:8443/gateway/jalv-126-master-1/webhdfs/v1")

upload success


### Uploading the dataset to the remote cluster

In [1]:
# Push the dataset to user HDFS directory in the cluster
import os
import dsx_core_utils
dsx_core_utils.upload_hdfs_file(
    source_path=os.environ['DSX_PROJECT_DIR']+'/datasets/SMSSpamCollection.csv', 
    target_path="/user/user1/SMSSpamCollection.csv",
    webhdfsurl="https://zinc1.fyre.ibm.com:8443/gateway/jalv-126-master-1/webhdfs/v1")

upload success
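
Both uploads go through the WebHDFS endpoint passed as `webhdfsurl`. As a rough sketch, the URL that the WebHDFS REST API would use to report the status of each uploaded file can be built as shown below. The helper name `webhdfs_status_url` is hypothetical and not part of `dsx_core_utils`, and the authentication required by the gateway is not shown.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

# Same WebHDFS endpoint as in the upload cells above.
WEBHDFS_URL = "https://zinc1.fyre.ibm.com:8443/gateway/jalv-126-master-1/webhdfs/v1"


def webhdfs_status_url(base_url, hdfs_path):
    """Hypothetical helper: build a WebHDFS GETFILESTATUS URL for an absolute HDFS path."""
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be an absolute HDFS path")
    # WebHDFS addresses a file as <base><path>?op=<OPERATION>.
    return base_url.rstrip("/") + quote(hdfs_path) + "?op=GETFILESTATUS"


uploaded = [
    "/user/user1/SpamFilterScikit-1.0-py2.7.egg",
    "/user/user1/SMSSpamCollection.csv",
]
status_urls = [webhdfs_status_url(WEBHDFS_URL, path) for path in uploaded]
for url in status_urls:
    print(url)
```

Issuing an authenticated GET against either URL should return the file's metadata if the upload succeeded. How credentials are supplied depends on the gateway configuration.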


### Connecting to remote Spark through DSX-HI

In [2]:
%load_ext sparkmagic.magics
from dsx_core_utils import proxy_util,dsxhi_util
proxy_util.configure_proxy_livy()

dsxhi_util.list_livy_endpoints()

success configuring sparkmagic livy.
['https://miniedge-hd.fyre.ibm.com:8443/gateway/jalv-126-master-1/livy2/v1', 'https://zinc1.fyre.ibm.com:8443/gateway/jalv-126-master-1/livy/v1', 'https://zinc1.fyre.ibm.com:8443/gateway/jalv-126-master-1/livy2/v1']


### Pushing the Python virtual environment to the cluster using DSX-HI

In [6]:
!cat /user-home/_global_/.remote-images/dsx-hi/dsx-scripted-ml-python2.json

{ "imageId": "26611bf7fe595f786139d6d2132de070fc813f6a0ef7a4e25857b79c8cd4b565",
  "scriptCommand": "anaconda2/bin/python2.7",
  "libPaths": ["usr/local/spark-2.0.2-bin-hadoop2.7/python","user-home/.scripts/common-helpers/batch/pmml","user-home/.scripts/common-helpers/saas"] }


### Create Session Properties
Using the values from `dsx-scripted-ml-python2.json`, we need to:

- (1) Pull the environment archive from HDFS into the YARN distributed cache using the Spark **--archives** option (the `archives` property in Livy)
- (2) Override the default `PYSPARK_PYTHON` with a path inside the unpacked archive, taken from the relative path `scriptCommand`

---

Example Livy properties for using the `dsx-scripted-ml-python2.tar.gz` virtual environment:
```
{"proxyUser": "user1", "archives": ["/user/dsxhi/environments/26611bf7fe595f786139d6d2132de070fc813f6a0ef7a4e25857b79c8cd4b565/dsx-scripted-ml-python2.tar.gz"],"conf":{"spark.yarn.appMasterEnv.PYSPARK_PYTHON":"dsx-scripted-ml-python2.tar.gz/anaconda2/bin/python"}}
```

### Files currently on HDFS:
```
/user/dsxhi/environments/26611bf7fe595f786139d6d2132de070fc813f6a0ef7a4e25857b79c8cd4b565/dsx-scripted-ml-python2.tar.gz
/user/dsxhi/environments/pythonAddons/pythonAddons.tar.gz
```
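
The two steps listed under "Create Session Properties" can be sketched in Python as follows. This is a minimal illustration, not the DSX-HI implementation: the function `build_livy_properties` and its parameters are hypothetical, and the `/user/dsxhi/environments/<imageId>/` layout is assumed from the HDFS listing above. The sketch uses `scriptCommand` verbatim (`anaconda2/bin/python2.7`), whereas the example properties above point at `anaconda2/bin/python`.

```python
import json

# Values copied from dsx-scripted-ml-python2.json as printed by the `cat` cell above.
image = {
    "imageId": "26611bf7fe595f786139d6d2132de070fc813f6a0ef7a4e25857b79c8cd4b565",
    "scriptCommand": "anaconda2/bin/python2.7",
}


def build_livy_properties(image, proxy_user, archive_name,
                          env_root="/user/dsxhi/environments"):
    """Hypothetical helper: assemble Livy session properties for a packaged environment."""
    # (1) Archive location on HDFS; YARN ships it to the containers via `archives`.
    archive = "{}/{}/{}".format(env_root, image["imageId"], archive_name)
    # (2) Interpreter path relative to the container working directory,
    #     where the archive is exposed under its file name.
    pyspark_python = "{}/{}".format(archive_name, image["scriptCommand"])
    return {
        "proxyUser": proxy_user,
        "archives": [archive],
        "conf": {"spark.yarn.appMasterEnv.PYSPARK_PYTHON": pyspark_python},
    }


props = build_livy_properties(image, "user1", "dsx-scripted-ml-python2.tar.gz")
print(json.dumps(props))
```

The printed JSON can be pasted into the properties field of the `%manage_spark` widget when creating the session.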


In [3]:
%manage_spark

Added endpoint https://zinc1.fyre.ibm.com:8443/gateway/jalv-126-master-1/livy2/v1
Starting Spark application


ID,YARN Application ID,Kind,State,Spark UI,Driver log,Current session?
48,application_1525285700946_0040,pyspark,idle,Link,Link,✔


SparkSession available as 'spark'.


### Distributing the egg file to the remote Spark cluster

In [4]:
%%spark
sc.addPyFile("hdfs:///user/user1/SpamFilterScikit-1.0-py2.7.egg")

### Invoking the LRModelScikit custom method on the remote cluster

In [5]:
%%spark
# Import the model class from the egg distributed with sc.addPyFile
from SpamFilterScikit import LRModelScikit

# Dataset uploaded to the user's HDFS directory earlier
filename = "hdfs:///user/user1/SMSSpamCollection.csv"

# Run the model on the cluster; `spark` is the session created by %manage_spark
LRModelScikit().execute(spark, filename)

Accuracy: 93.00%
y_test  y_pred  count
     1       1     94
     0       1      4
     1       0     10
     0       0     92
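
The reported accuracy can be checked against the confusion counts above: the correct predictions are the rows where `y_test` equals `y_pred`. The sketch below recomputes it, along with precision and recall. It assumes label `1` is the positive class, which the output does not state.

```python
# Confusion counts copied from the output above: (y_test, y_pred) -> count
counts = {(1, 1): 94, (0, 1): 4, (1, 0): 10, (0, 0): 92}

total = sum(counts.values())

# Accuracy: fraction of rows where the prediction matches the true label.
accuracy = (counts[(1, 1)] + counts[(0, 0)]) / float(total)

# Assumption: label 1 is the positive class.
precision = counts[(1, 1)] / float(counts[(1, 1)] + counts[(0, 1)])
recall = counts[(1, 1)] / float(counts[(1, 1)] + counts[(1, 0)])

print("total     = {}".format(total))
print("accuracy  = {:.2%}".format(accuracy))
print("precision = {:.2%}".format(precision))
print("recall    = {:.2%}".format(recall))
```

With 186 of 200 test rows predicted correctly, the accuracy matches the 93.00% printed by `LRModelScikit`.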