<a id='Create_Livy_Session'></a>
## Create Remote Livy Session

First, import `dsx_core_utils` and show a summary of all available DSXHI Endpoints to use within this Notebook.

In [1]:
import dsx_core_utils
DSXHI_SYSTEMS = dsx_core_utils.get_dsxhi_info(showSummary=True)

Available Hadoop systems: 

        systemName LIVYSPARK  LIVYSPARK2                  imageId
0   durotar-hdp301            livyspark2  dsx-scripted-ml-python3
1  asgardia-hdp264            livyspark2  dsx-scripted-ml-python2


Then, define any additional Spark Configurations. See [Livy Sessions REST API](https://livy.incubator.apache.org/docs/latest/rest-api.html) for additional properties.


In [2]:
myConfig={
 "queue": "default",
 "driverMemory": "2G",
 "numExecutors": 1
}

# Set up sparkmagic to connect to the selected registered HI
# system with the specified configs.
dsx_core_utils.setup_livy_sparkmagic(
  system="asgardia-hdp264", 
  livy="livyspark2",
  addlConfig=myConfig)

# (Re-)load spark magic to apply the new configs.
%reload_ext sparkmagic.magics

sparkmagic has been configured to use https://asgardian-edge.fyre.ibm.com:8443/gateway/jalv-dsx121g-master-1/livy2/v1 
success configuring sparkmagic livy.
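
The keys in `myConfig` match field names from the Livy `POST /sessions` request body. The following is a minimal sketch of a larger configuration, with a quick check that every key is a known Livy session field. The values are illustrative only, `my_config` and `LIVY_SESSION_FIELDS` are hypothetical names, and whether `addlConfig` forwards every one of these fields is an assumption to verify in your environment.

```python
# Hypothetical, larger configuration. Field names follow the Livy
# POST /sessions request body; the values are illustrative only.
my_config = {
    "queue": "default",
    "driverMemory": "2G",
    "executorMemory": "2G",
    "executorCores": 2,
    "numExecutors": 2,
    "conf": {"spark.sql.shuffle.partitions": "50"},
}

# Session fields documented for Livy's POST /sessions request body.
LIVY_SESSION_FIELDS = {
    "kind", "proxyUser", "jars", "pyFiles", "files", "driverMemory",
    "driverCores", "executorMemory", "executorCores", "numExecutors",
    "archives", "queue", "name", "conf", "heartbeatTimeoutInSecond",
}

# Catch misspelled keys before the configuration reaches sparkmagic.
unknown = set(my_config) - LIVY_SESSION_FIELDS
print(sorted(unknown))
```

An empty list means every key is a recognized session field. A misspelled key such as `driverMem` would show up in the printed list.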


In [3]:
session_name = 'workshop-part1'
livy_endpoint = 'https://asgardian-edge.fyre.ibm.com:8443/gateway/jalv-dsx121g-master-1/livy2/v1'
%spark add -s $session_name -l python -k -u $livy_endpoint

Starting Spark application


ID,YARN Application ID,Kind,State,Spark UI,Driver log,Current session?
454,application_1538175267044_0068,pyspark,idle,Link,Link,✔


SparkSession available as 'spark'.


**Create and use custom functions remotely**

Let's create two simple functions:

- **run_command** - Simple wrapper around `subprocess` that runs a Linux command within the Driver YARN Container.

- **spark_dfs_topandas** - Sample function that takes two Spark DFs and returns two Pandas DFs. `toPandas()` collects the entire DataFrame into the Spark Driver's memory, so it is generally not advisable for large data. Some ML tools, however, require a Pandas DF, so this is merely an example for such scenarios.


In [4]:
%%spark -s $session_name
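from subprocess import Popen, PIPE, STDOUT
import time

def run_command(command, sleepAfter=None):
    # Run a shell command in the driver's YARN container and print its combined stdout/stderr.
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
    output = p.stdout.read()
    print(output)
    # Optionally pause after the command; sleepAfter is in seconds.
    if sleepAfter is not None:
        time.sleep(sleepAfter)

def spark_dfs_topandas(DF1, DF2):
    # Collect both Spark DataFrames to the driver as Pandas DataFrames.
    return DF1.toPandas(), DF2.toPandas()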

In [6]:
%%spark -s $session_name

run_command("hostname -f")
run_command("pwd")

shad2.fyre.ibm.com

/hadoop/yarn/local/usercache/user1/appcache/application_1538175267044_0068/container_e25_1538175267044_0068_01_000001
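
The `run_command` helper above prints the raw output of the command, which on a Python 3 interpreter is shown as a bytes literal, and it returns nothing to the caller. As a minimal sketch, assuming Python 3 on the remote driver, a variant could decode the output and return it so it can be reused in later code. The name `run_command_text` and its `sleep_after` parameter are hypothetical and not part of the original script.

```python
import subprocess
import time

def run_command_text(command, sleep_after=None):
    """Run a shell command and return its combined stdout/stderr as text."""
    result = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout
        universal_newlines=True,    # decode bytes to str
    )
    # Optionally pause after the command; sleep_after is in seconds.
    if sleep_after is not None:
        time.sleep(sleep_after)
    return result.stdout.strip()

# Example: capture the output instead of only printing it.
greeting = run_command_text("echo hello")
print(greeting)
```

Returning the text makes the helper usable in expressions, for example to compare the driver's hostname across sessions. On a Python 2 image the original `Popen`-based version remains the safer choice.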

### 3. Transfer the `quicken_demo_utils.py` script saved above to your HDFS user directory.

- Use the shell command `!ls ../scripts` to see the relative path of the script that was just saved:


In [26]:
!ls ../scripts

quicken_demo_utils.py


- In a new cell, show the registered secure WebHDFS URLs that the logged-in user has access to:

In [27]:
import dsx_core_utils
dsx_core_utils.list_dsxhi_webhdfs_endpoints();

['https://asgardian-edge.fyre.ibm.com:8443/gateway/jalv-dsx121g-master-1/webhdfs/v1', 'https://durotar-edge.fyre.ibm.com:8443/gateway/jalv-dsx121g-master-1/webhdfs/v1']


- Use `dsx_core_utils.hdfs_util.upload_file` to upload a file from DSX to the desired path in HDFS:


In [39]:
dsxlocal_file_location="../scripts/quicken_demo_utils.py"
dsxhi_upload_hdfs_location="/user/user1/quicken_demo_utils.py"
webhdfs_endpoint="https://asgardian-edge.fyre.ibm.com:8443/gateway/jalv-dsx121g-master-1/webhdfs/v1"

dsx_core_utils.hdfs_util.upload_file(webhdfs_endpoint, dsxlocal_file_location, dsxhi_upload_hdfs_location)

upload success
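
To double-check the upload, the standard WebHDFS REST API exposes a `GETFILESTATUS` operation on the file's path. The sketch below only builds the URL for that operation and assumes Python 3. The helper name `webhdfs_op_url` is hypothetical, and sending the request, including whatever authentication the gateway requires, is left out because it depends on your environment.

```python
from urllib.parse import quote

def webhdfs_op_url(endpoint, hdfs_path, op="GETFILESTATUS"):
    """Build a WebHDFS REST URL for `op` on `hdfs_path` (hypothetical helper)."""
    # quote() keeps "/" intact and percent-encodes unsafe characters in the path.
    return "{}/{}?op={}".format(
        endpoint.rstrip("/"), quote(hdfs_path.lstrip("/")), op
    )

webhdfs_endpoint = "https://asgardian-edge.fyre.ibm.com:8443/gateway/jalv-dsx121g-master-1/webhdfs/v1"
status_url = webhdfs_op_url(webhdfs_endpoint, "/user/user1/quicken_demo_utils.py")
print(status_url)
```

The printed URL can then be requested with an authenticated HTTP client of your choice. A successful response describes the uploaded file.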


### 4. Test the .py file via sc.addPyFile in a new Livy Session

- Delete the old session with `%spark cleanup`
- Create a new session with `%spark add`

In [51]:
%spark cleanup

An error was encountered:
Invalid status code '500' from https://asgardian-edge.fyre.ibm.com:8443/gateway/jalv-dsx121g-master-1/livy2/v1/sessions/456 with error payload: 


In [44]:
%spark add -s $session_name -l python -k -u $livy_endpoint

Starting Spark application


ID,YARN Application ID,Kind,State,Spark UI,Driver log,Current session?
456,application_1538175267044_0070,pyspark,idle,Link,Link,✔


SparkSession available as 'spark'.


- Add the `quicken_demo_utils.py` file to the Spark context:

In [45]:
%%spark
sc.addPyFile("hdfs:///user/user1/quicken_demo_utils.py")

In [46]:
%%spark
import quicken_demo_utils as utils

- Test the imported utils

In [47]:
%%spark -s $session_name

utils.run_command("hostname -f")
utils.run_command("pwd")

shad1.fyre.ibm.com

/hadoop/yarn/local/usercache/user1/appcache/application_1538175267044_0070/container_e25_1538175267044_0070_01_000001