> If you get the error `You must build Spark with Hive. Export 'SPARK_HIVE=true'`:
* Create a new `SQLContext` object and assign it to `sqlContext` before using it. This error occurs when several notebooks share the same pre-created (built-in) `sqlContext`.

* `from pyspark.sql import SQLContext`
* `sqlContext = SQLContext(sc)`

In [8]:
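%matplotlib inline
import pandas as pd
from pyspark.sql import SQLContext, Row
# On Spark 2.0+, pyspark.ml transformers such as VectorAssembler expect
# pyspark.ml.linalg.Vectors instead of pyspark.mllib.linalg.Vectors
from pyspark.mllib.linalg import Vectors
from pyspark.ml.feature import VectorAssembler

# Create a fresh SQLContext instead of relying on the shared built-in one
sqlContext = SQLContext(sc)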

sc.master

'local[*]'

In [1]:
# What version of Spark?
print(sc.version)

# How many cores do I have?
# (Windows only: IPython shell escape that reads the NUMBER_OF_PROCESSORS variable)
wrks = !echo %NUMBER_OF_PROCESSORS%
print(wrks[0] + ' cores/workers')

8 cores/workers
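The `%NUMBER_OF_PROCESSORS%` shell escape above reads a Windows environment variable, so it does not work on Linux or macOS. A portable sketch using only the Python standard library is shown below. With a `local[*]` master, `sc.defaultParallelism` normally reports the same count, unless `spark.default.parallelism` has been set explicitly.

```python
import os

# Number of logical CPUs; os.cpu_count() returns None if it cannot be determined,
# so fall back to 1 in that case.
n_cores = os.cpu_count() or 1
print(str(n_cores) + " cores")
```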


> `hour`, `mobile`, and `userFeatures` are the predictors `X`, while `clicked` is the label `Y` (the observed outcome) that we want to predict for each user `id`. We combine the three predictor columns into a single vector column, `features`; `id` only identifies the user and is not part of it.

In [4]:
# Raw rows, one per user: (id, hour, mobile, userFeatures, clicked)
usr_features = [(0, 18, 1.0, Vectors.dense([0.0, 10.0, 0.5]), 1.0),
                (1, 11, 0.5, Vectors.dense([0.3, 14.5, 7.5]), 0.0)]
usr_features

[(0, 18, 1.0, DenseVector([0.0, 10.0, 0.5]), 1.0),
 (1, 11, 0.5, DenseVector([0.3, 14.5, 7.5]), 0.0)]

In [9]:
# Create a DataFrame, naming each column; the column types are inferred from the data.
dataset = sqlContext.createDataFrame(usr_features,
                                     ["id", "hour", "mobile", "userFeatures", "clicked"])
dataset.collect()

[Row(id=0, hour=18, mobile=1.0, userFeatures=DenseVector([0.0, 10.0, 0.5]), clicked=1.0),
 Row(id=1, hour=11, mobile=0.5, userFeatures=DenseVector([0.3, 14.5, 7.5]), clicked=0.0)]
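`createDataFrame` pairs each name in the column list with the tuple field in the same position. The plain-Python sketch below mimics only that positional pairing with `collections.namedtuple`, which, like PySpark's `Row`, is a tuple with named fields. `UserRow` is a hypothetical stand-in, type inference is not modelled, and plain lists replace `DenseVector`.

```python
from collections import namedtuple

# Column names, in the same order as the fields of each raw tuple
columns = ["id", "hour", "mobile", "userFeatures", "clicked"]
UserRow = namedtuple("UserRow", columns)  # hypothetical stand-in for pyspark.sql.Row

# Plain lists stand in for DenseVector in this sketch
raw = [(0, 18, 1.0, [0.0, 10.0, 0.5], 1.0),
       (1, 11, 0.5, [0.3, 14.5, 7.5], 0.0)]

# Each value lands in the column whose name sits at the same position
rows = [UserRow(*values) for values in raw]
for row in rows:
    print(row)
```

Because the pairing is purely positional, reordering `columns` without reordering the tuples would silently attach values to the wrong names.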

In [10]:
# Assemble 'hour', 'mobile' and 'userFeatures' into a single vector column 'features'
assembler = VectorAssembler(inputCols=["hour", "mobile", "userFeatures"], outputCol="features")
output = assembler.transform(dataset)

print("Final Dataset: Columns 'hour', 'mobile', 'userFeatures' assembled to vector 'features'")
output.select("features", "clicked").show(truncate=False)

Final Dataset: Columns 'hour', 'mobile', 'userFeatures' assembled to vector 'features'
+-----------------------+-------+
|features               |clicked|
+-----------------------+-------+
|[18.0,1.0,0.0,10.0,0.5]|1.0    |
|[11.0,0.5,0.3,14.5,7.5]|0.0    |
+-----------------------+-------+
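`VectorAssembler` concatenates its input columns in `inputCols` order: each numeric column contributes one element and each vector column contributes all of its elements. That is why `hour`, `mobile`, and the 3-element `userFeatures` yield a 5-element `features` vector. The plain-Python sketch below reproduces that flattening for the two rows above. It is an illustration, not Spark's implementation, and it ignores nulls and sparse vectors; `assemble` and `rows_as_dicts` are hypothetical names.

```python
def assemble(row, input_cols):
    """Concatenate scalar and vector-valued columns into one flat list of floats."""
    features = []
    for col in input_cols:
        value = row[col]
        if isinstance(value, (list, tuple)):
            features.extend(float(v) for v in value)  # vector column: every element
        else:
            features.append(float(value))  # numeric column: a single element
    return features


# The two users from the DataFrame above, as plain dicts (lists stand in for DenseVector)
rows_as_dicts = [
    {"id": 0, "hour": 18, "mobile": 1.0, "userFeatures": [0.0, 10.0, 0.5], "clicked": 1.0},
    {"id": 1, "hour": 11, "mobile": 0.5, "userFeatures": [0.3, 14.5, 7.5], "clicked": 0.0},
]

input_cols = ["hour", "mobile", "userFeatures"]
for row in rows_as_dicts:
    print(assemble(row, input_cols), row["clicked"])
```

The printed lists match the `features` column in the table above, and `id` is left out because it is not listed in `input_cols`.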

