# Sparkling Water and Moving Data Around

Sparkling Water integrates H2O with Spark. The example below shows how to move data among a Pandas DataFrame, an H2OFrame, and a Spark DataFrame.

### 1. Define Context 

In [84]:
from pandas import read_csv, DataFrame

In [85]:
from pyspark import sql

In [86]:
from pysparkling import H2OContext

In [87]:
from h2o import import_file, H2OFrame

In [88]:
ss = sql.SparkSession.builder.getOrCreate()

In [89]:
hc = H2OContext.getOrCreate(ss)


Sparkling Water Context:
 * H2O name: sparkling-water-suyog_local-1545929364584
 * cluster size: 1
 * list of used nodes:
  (executorId, host, port)
  ------------------------
  (driver,10.50.0.248,54321)
  ------------------------

  Open H2O Flow in browser: http://10.50.0.248:54321 (CMD + click in Mac OSX)

    


### 2. Convert Pandas DataFrame to H2OFrame and Spark DataFrame

In [106]:
p_df = read_csv("/home/suyog/github/Sparkling-Water-with-Python/data/sample.csv")

In [107]:
type(p_df)

pandas.core.frame.DataFrame

In [108]:
p2s_df = ss.createDataFrame(p_df)

In [109]:
type(p2s_df)

pyspark.sql.dataframe.DataFrame

In [110]:
p2h_df = H2OFrame(p_df)

Parse progress: |█████████████████████████████████████████████████████████| 100%


In [111]:
type(p2h_df)

h2o.frame.H2OFrame
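
The `type()` checks above print a fully qualified class name whose first component identifies the owning library. A minimal sketch of a helper that uses this to label a frame follows. `frame_kind` is a hypothetical name, not part of pandas, PySpark, or H2O, and it assumes the module roots seen in the outputs above (`pandas`, `pyspark`, `h2o`).

```python
def frame_kind(frame):
    """Return 'pandas', 'spark', or 'h2o' for a supported frame object.

    Hypothetical helper: it inspects only the module that defines the
    frame's class (pandas.*, pyspark.*, h2o.*), so it needs none of
    those libraries to be imported here.
    """
    root = type(frame).__module__.split(".")[0]
    kinds = {"pandas": "pandas", "pyspark": "spark", "h2o": "h2o"}
    if root not in kinds:
        raise TypeError("unsupported frame type: %s" % type(frame).__name__)
    return kinds[root]
```

With the frames defined above, `frame_kind(p_df)`, `frame_kind(p2s_df)`, and `frame_kind(p2h_df)` would be expected to return `'pandas'`, `'spark'`, and `'h2o'` respectively.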

### 3. Convert Spark DataFrame to H2OFrame and Pandas DataFrame

In [112]:
ss

In [113]:
# sample.txt is in HDFS
s_df = ss.read.csv("data/sample.txt", header = False, inferSchema = True)

In [114]:
type(s_df)

pyspark.sql.dataframe.DataFrame

In [115]:
# Spark DataFrame -> Pandas DataFrame
s2p_df = s_df.toPandas()

In [116]:
type(s2p_df)

pandas.core.frame.DataFrame

In [117]:
# Spark DataFrame -> H2OFrame
s2h_df = hc.as_h2o_frame(s_df)

In [118]:
type(s2h_df)

h2o.frame.H2OFrame
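
After converting between frame types, a quick sanity check is to confirm that the row and column counts survived the trip. The sketch below assumes that Pandas DataFrames and H2OFrames expose a `shape` attribute and that a Spark DataFrame provides `count()` and `columns`. `frame_shape` is a hypothetical helper name, not part of any of the three libraries.

```python
def frame_shape(frame):
    """Return (rows, columns) for a Pandas, Spark, or H2O frame.

    Hypothetical helper. Spark DataFrames have no `shape`, and count()
    triggers a Spark job, so that branch is handled separately.
    """
    root = type(frame).__module__.split(".")[0]
    if root == "pyspark":
        return (frame.count(), len(frame.columns))
    # Pandas DataFrame and H2OFrame both expose a (rows, columns) tuple.
    rows, cols = frame.shape
    return (int(rows), int(cols))
```

Comparing `frame_shape(s_df)` with the shape of any frame converted from it should give equal tuples if no rows or columns were lost.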

### 4. Convert H2OFrame to Pandas DataFrame and Spark DataFrame

In [119]:
h_df = import_file("/home/suyog/github/Sparkling-Water-with-Python/data/sample.csv", header = 1, sep = ",")

Parse progress: |█████████████████████████████████████████████████████████| 100%


In [120]:
type(h_df)

h2o.frame.H2OFrame

In [121]:
h2p_df = h_df.as_data_frame()

In [122]:
type(h2p_df)

pandas.core.frame.DataFrame

In [123]:
h2s_df = hc.as_spark_frame(h_df)

In [124]:
type(h2s_df)

pyspark.sql.dataframe.DataFrame
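
As a recap, the conversions among the three frame types can be collected into one lookup table. This is only a summary sketch: the variable names (`p_df`, `s_df`, `h_df`, `ss`, `hc`) follow the walkthrough, the table stores the calls as strings rather than executing them, and `conversion_call` is a hypothetical helper. The two Spark-sourced entries assume PySpark's `toPandas()` and the `as_h2o_frame` method of the `H2OContext` version used here; method names may differ in other Sparkling Water releases.

```python
# (source, target) -> expression, using the variable names from this walkthrough.
# The strings are documentation only; nothing here touches Spark or H2O.
CONVERSIONS = {
    ("pandas", "spark"): "ss.createDataFrame(p_df)",
    ("pandas", "h2o"): "H2OFrame(p_df)",
    ("spark", "pandas"): "s_df.toPandas()",        # assumed PySpark call
    ("spark", "h2o"): "hc.as_h2o_frame(s_df)",     # assumed PySparkling call
    ("h2o", "pandas"): "h_df.as_data_frame()",
    ("h2o", "spark"): "hc.as_spark_frame(h_df)",
}


def conversion_call(source, target):
    """Look up the expression that converts a `source` frame to a `target` frame."""
    if source == target:
        raise ValueError("source and target are the same frame type")
    return CONVERSIONS[(source, target)]


print(conversion_call("h2o", "spark"))
```

Note that conversions involving an H2OFrame and a Spark DataFrame go through the `H2OContext` (`hc`), while those involving a Pandas DataFrame use the frame or session directly.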

https://statcompute.wordpress.com/2017/07/03/sparkling-water-and-moving-data-around/