In [1]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()


# Format Models According to Your Use Case
To preprocess data for Spark's different advanced analytics tools, you must consider your end objective:
* In the case of most classification and regression algorithms, you want to get your data into a column of type Double to represent the label and a column of type Vector (either dense or sparse) to represent the features
* In the case of recommendation, you want to get your data into a column of users, a column of items (say movies or books), and a column of ratings
* In the case of unsupervised learning, a column of type Vector (either dense or sparse) is needed to represent the features
* In the case of graph analytics, you will want a DataFrame of vertices and a DataFrame of edges

In [2]:
sales = spark.read.format('csv')\
                .option('header', 'true')\
                .option('inferSchema', 'true')\
                .load('/home/kevin/Desktop/Big-Data-with-Pyspark/data/retail-data/by-day')\
                .coalesce(5)\
                .where('Description IS NOT NULL')

fakeIntDF = spark.read.parquet("/home/kevin/Desktop/Big-Data-with-Pyspark/data/simple-ml-integers")
simpleDF = spark.read.json('/home/kevin/Desktop/Big-Data-with-Pyspark/data/simple-ml')
scaleDF = spark.read.parquet('/home/kevin/Desktop/Big-Data-with-Pyspark/data/simple-ml-scaling')

                                                                                

In [4]:
sales.cache()
sales.show(5)

+---------+---------+--------------------+--------+-------------------+---------+----------+--------------+
|InvoiceNo|StockCode|         Description|Quantity|        InvoiceDate|UnitPrice|CustomerID|       Country|
+---------+---------+--------------------+--------+-------------------+---------+----------+--------------+
|   580538|    23084|  RABBIT NIGHT LIGHT|      48|2011-12-05 08:38:00|     1.79|   14075.0|United Kingdom|
|   580538|    23077| DOUGHNUT LIP GLOSS |      20|2011-12-05 08:38:00|     1.25|   14075.0|United Kingdom|
|   580538|    22906|12 MESSAGE CARDS ...|      24|2011-12-05 08:38:00|     1.65|   14075.0|United Kingdom|
|   580538|    21914|BLUE HARMONICA IN...|      24|2011-12-05 08:38:00|     1.25|   14075.0|United Kingdom|
|   580538|    22467|   GUMBALL COAT RACK|       6|2011-12-05 08:38:00|     2.55|   14075.0|United Kingdom|
+---------+---------+--------------------+--------+-------------------+---------+----------+--------------+

# Transformers
Transformers are functions that convert raw data in some way. This might be to create a new interaction variable (from two other variables), to normalize a column, or simply to turn a column into a Double so it can be input into a model. Transformers are primarily used in preprocessing or feature generation. Spark's transformers only include a transform method, because their behavior does not depend on the input data: there is nothing to learn, so there is nothing to fit.

<img src="/home/kevin/Desktop/Big-Data-with-Pyspark/images/08_spark_transformer.png">

On the left is the input DataFrame; on the right is the same DataFrame with a new column representing the output of the transformation.
The Tokenizer is an example of a transformer. It lowercases a string and splits it on whitespace, and it has nothing to learn from our data; it simply applies a function. Here is an example of how a Tokenizer is built to accept the input column, how it transforms the data, and the output of that transformation.

In [5]:
from pyspark.ml.feature import Tokenizer
tkn = Tokenizer().setInputCol('Description')
tkn.transform(sales.select('Description')).show(5)

+--------------------+------------------------------+
|         Description|Tokenizer_155d84a48675__output|
+--------------------+------------------------------+
|  RABBIT NIGHT LIGHT|          [rabbit, night, l...|
| DOUGHNUT LIP GLOSS |          [doughnut, lip, g...|
|12 MESSAGE CARDS ...|          [12, message, car...|
|BLUE HARMONICA IN...|          [blue, harmonica,...|
|   GUMBALL COAT RACK|          [gumball, coat, r...|
+--------------------+------------------------------+
only showing top 5 rows



# Estimators for Preprocessing
Estimators are another tool for preprocessing. An estimator is necessary when a transformation you would like to perform must be initialized with data or information about the input column. For example, if you wanted to scale the values in a column to have a mean of zero and unit variance, you would need to perform a pass over the entire dataset to calculate the mean and standard deviation used for that normalization. The figure below shows an estimator being fit to a particular input dataset, generating a transformer that is then applied to the input dataset to append a new column.

<img src="/home/kevin/Desktop/Big-Data-with-Pyspark/images/09_spark_estimator.png">


An example of this is StandardScaler, which scales each dimension of your input column using statistics computed from that column, so that each dimension has a variance of 1 (and a mean of zero when centering is enabled). For that reason, it must first perform a pass over the data to create the transformer.

In [6]:
from pyspark.ml.feature import StandardScaler
scaler = StandardScaler().setInputCol('features')
scaler.fit(scaleDF).transform(scaleDF).show(5)

                                                                                

+---+--------------+-----------------------------------+
| id|      features|StandardScaler_1988935f6eaf__output|
+---+--------------+-----------------------------------+
|  0|[1.0,0.1,-1.0]|               [1.19522860933439...|
|  1| [2.0,1.1,1.0]|               [2.39045721866878...|
|  0|[1.0,0.1,-1.0]|               [1.19522860933439...|
|  1| [2.0,1.1,1.0]|               [2.39045721866878...|
|  1|[3.0,10.1,3.0]|               [3.58568582800318...|
+---+--------------+-----------------------------------+
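
The fit-then-transform pattern can be sketched in plain Python. This is only an illustration, not Spark's implementation: the class names `UnitStdEstimator` and `UnitStdModel` are hypothetical, and the sketch assumes scaling by the sample standard deviation without centering, which is consistent with the first component of the output above.

```python
from statistics import stdev


class UnitStdModel:
    """The fitted 'transformer': it only holds what was learned during fit."""

    def __init__(self, std):
        self.std = std

    def transform(self, values):
        # Applying the model is a plain function of each value.
        return [v / self.std for v in values]


class UnitStdEstimator:
    """The 'estimator': it must see the data once before it can transform."""

    def fit(self, values):
        # One pass over the data to compute the sample standard deviation.
        return UnitStdModel(stdev(values))


# First component of each `features` vector shown in the output above.
first_component = [1.0, 2.0, 1.0, 2.0, 3.0]
model = UnitStdEstimator().fit(first_component)
scaled = model.transform(first_component)
print(scaled)
```

The printed values should line up with the leading entries of the StandardScaler output column, which suggests the default configuration scales without centering.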



## Transformer Properties
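All transformers let you specify the inputCol and the outputCol, which are the names of the input and output columns. If you do not set outputCol, Spark generates a default name from the transformer's unique ID followed by __output, as the outputs above show.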

# High-Level Transformers
High-level transformers, such as RFormula, allow you to concisely specify a number of transformations in one. They operate at a high level and let you avoid doing data manipulations or transformations one by one. In general, you should try to use the highest-level transformers you can, in order to minimize the risk of error and to help you focus on the business problem instead of the smaller details of implementation.

## RFormula
This is the easiest transformer to use when you have conventionally formatted data. Spark borrows this transformer from the R language to make it simple to declaratively specify a set of transformations for your data. With this transformer, values can be either numerical or categorical, and you do not need to extract values from strings or manipulate them in any way.
The RFormula will automatically handle categorical inputs by performing one-hot encoding. With the RFormula, numeric columns will be cast to Double but will not be one-hot encoded. If the label column is of type String, it will be transformed to Double with StringIndexer.


## SQL Transformers
A SQLTransformer allows you to leverage Spark's vast library of SQL-related manipulations. Any SELECT statement you can use in SQL is a valid transformation. The only thing you need to change is that instead of using the table name, you use the placeholder __THIS__. You might want to use SQLTransformer if you want to formally codify some DataFrame manipulation as a preprocessing step, or to try different SQL expressions for features during hyperparameter tuning.
You might also use a SQLTransformer to express all of your manipulations against the rawest form of your data, so that you can version different variations of those manipulations as transformers. This gives you the benefit of building and testing varying pipelines simply by swapping out transformers.

In [8]:
from pyspark.ml.feature import SQLTransformer

basicTransformation = SQLTransformer()\
                            .setStatement(
                                """
                                SELECT sum(Quantity), count(*), CustomerID
                                FROM __THIS__
                                GROUP BY CustomerID
                                """
                            )

basicTransformation.transform(sales).show(5)

+-------------+--------+----------+
|sum(Quantity)|count(1)|CustomerID|
+-------------+--------+----------+
|          119|      62|   14452.0|
|          440|     143|   16916.0|
|          630|      72|   17633.0|
|           34|       6|   14768.0|
|         1542|      30|   13094.0|
+-------------+--------+----------+
only showing top 5 rows



## VectorAssembler
The VectorAssembler is a tool you will use in nearly every pipeline you create. It concatenates all your features into one big vector that you can then pass to an estimator. It is typically used in the last step of a machine learning pipeline and takes as input a number of columns of numeric, Boolean, or Vector type. This is particularly useful if you are going to perform a number of manipulations using a variety of transformers and need to gather all of those results together.

In [9]:
from pyspark.ml.feature import VectorAssembler
assembler = VectorAssembler().setInputCols(['int1', 'int2', 'int3'])

assembler.transform(fakeIntDF).show(5)

+----+----+----+------------------------------------+
|int1|int2|int3|VectorAssembler_526ede91cf85__output|
+----+----+----+------------------------------------+
|   7|   8|   9|                       [7.0,8.0,9.0]|
|   4|   5|   6|                       [4.0,5.0,6.0]|
|   1|   2|   3|                       [1.0,2.0,3.0]|
+----+----+----+------------------------------------+



# Working with Continuous Features
There are two common kinds of transformers for continuous features. First, you can convert continuous features into categorical features via a process called bucketing. Second, you can scale and normalize your features according to several different requirements. These transformers only work on Double types.

In [10]:
contDF = spark.range(30).selectExpr('cast(id as double)')

## Bucketing
The most straightforward approach to bucketing or binning is using a Bucketizer. This splits a given continuous feature into the buckets of your designation. You specify how buckets should be created via an array or list of Double values that mark the bucket borders. This is useful because you may want to simplify the features in your dataset or simplify their representations for interpretation later on.

In [12]:
from pyspark.ml.feature import Bucketizer

bucketBorders = [-1.0, 5.0, 10.0, 250.0, 600.0]
bucketer = Bucketizer().setSplits(bucketBorders).setInputCol('id')
bucketer.transform(contDF).show()

+----+-------------------------------+
|  id|Bucketizer_173d5a4cb8ba__output|
+----+-------------------------------+
| 0.0|                            0.0|
| 1.0|                            0.0|
| 2.0|                            0.0|
| 3.0|                            0.0|
| 4.0|                            0.0|
| 5.0|                            1.0|
| 6.0|                            1.0|
| 7.0|                            1.0|
| 8.0|                            1.0|
| 9.0|                            1.0|
|10.0|                            2.0|
|11.0|                            2.0|
|12.0|                            2.0|
|13.0|                            2.0|
|14.0|                            2.0|
|15.0|                            2.0|
|16.0|                            2.0|
|17.0|                            2.0|
|18.0|                            2.0|
|19.0|                            2.0|
+----+-------------------------------+
only showing top 20 rows
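
The mapping from a value to its bucket can be sketched in plain Python. This is an illustration of the lookup, not Spark's code: the helper `bucket_index` is hypothetical, and it assumes each bucket includes its lower border and excludes its upper border, with the last bucket also including its upper border and out-of-range values treated as errors.

```python
from bisect import bisect_right


def bucket_index(value, splits):
    """Return i such that splits[i] <= value < splits[i + 1]."""
    if value < splits[0] or value > splits[-1]:
        raise ValueError(f"{value} is outside the range covered by the splits")
    if value == splits[-1]:
        # Assumption: the last bucket also includes its upper border.
        return len(splits) - 2
    # bisect_right counts the borders that are <= value.
    return bisect_right(splits, value) - 1


bucket_borders = [-1.0, 5.0, 10.0, 250.0, 600.0]
sample = [0.0, 4.0, 5.0, 9.0, 10.0, 19.0]
buckets = [float(bucket_index(v, bucket_borders)) for v in sample]
print(buckets)
```

Five borders define four buckets, which is why 5.0 and 10.0 each start a new bucket in the output above.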



In [14]:
from pyspark.ml.feature import QuantileDiscretizer
bucketer = QuantileDiscretizer().setNumBuckets(5).setInputCol('id')
fittedBucket = bucketer.fit(contDF)
fittedBucket.transform(contDF).show()

+----+----------------------------------------+
|  id|QuantileDiscretizer_9ea170bd25aa__output|
+----+----------------------------------------+
| 0.0|                                     0.0|
| 1.0|                                     0.0|
| 2.0|                                     0.0|
| 3.0|                                     0.0|
| 4.0|                                     0.0|
| 5.0|                                     1.0|
| 6.0|                                     1.0|
| 7.0|                                     1.0|
| 8.0|                                     1.0|
| 9.0|                                     1.0|
|10.0|                                     1.0|
|11.0|                                     2.0|
|12.0|                                     2.0|
|13.0|                                     2.0|
|14.0|                                     2.0|
|15.0|                                     2.0|
|16.0|                                     2.0|
|17.0|                                  

## Scaling and Normalization

### StandardScaler
StandardScaler standardizes a set of features to have zero mean and a standard deviation of 1. The withStd flag scales the data to unit standard deviation, while the withMean flag (false by default) centers the data prior to scaling it. With the defaults used below, the values are scaled but not centered.

In [22]:
from pyspark.ml.feature import StandardScaler, MinMaxScaler, MaxAbsScaler, ElementwiseProduct, Normalizer

sScaler = StandardScaler().setInputCol('features')
sScaler.fit(scaleDF).transform(scaleDF).show()

+---+--------------+-----------------------------------+
| id|      features|StandardScaler_c33ab4c5ee12__output|
+---+--------------+-----------------------------------+
|  0|[1.0,0.1,-1.0]|               [1.19522860933439...|
|  1| [2.0,1.1,1.0]|               [2.39045721866878...|
|  0|[1.0,0.1,-1.0]|               [1.19522860933439...|
|  1| [2.0,1.1,1.0]|               [2.39045721866878...|
|  1|[3.0,10.1,3.0]|               [3.58568582800318...|
+---+--------------+-----------------------------------+



### MinMaxScaler
This scales the values in a vector (component-wise) proportionally onto a range from a given minimum value to a given maximum value. If you specify the minimum value to be 0 and the maximum value to be 1, then all the values will fall between 0 and 1.

In [17]:
minMax = MinMaxScaler().setMin(5).setMax(10).setInputCol('features')
fittedMinMax = minMax.fit(scaleDF)
fittedMinMax.transform(scaleDF).show(5)

+---+--------------+---------------------------------+
| id|      features|MinMaxScaler_59e48dff86ce__output|
+---+--------------+---------------------------------+
|  0|[1.0,0.1,-1.0]|                    [5.0,5.0,5.0]|
|  1| [2.0,1.1,1.0]|                    [7.5,5.5,7.5]|
|  0|[1.0,0.1,-1.0]|                    [5.0,5.0,5.0]|
|  1| [2.0,1.1,1.0]|                    [7.5,5.5,7.5]|
|  1|[3.0,10.1,3.0]|                 [10.0,10.0,10.0]|
+---+--------------+---------------------------------+
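
The rescaling arithmetic can be checked by hand. The plain-Python sketch below uses a hypothetical helper `min_max_scale` applied to a single component; it is not Spark's implementation, and the handling of a constant column (mapping to the midpoint of the target range) is an assumption of this sketch.

```python
def min_max_scale(column, new_min, new_max):
    """Linearly map the column's own [min, max] onto [new_min, new_max]."""
    col_min, col_max = min(column), max(column)
    span = col_max - col_min
    if span == 0:
        # Assumption: a constant column maps to the midpoint of the range.
        return [(new_min + new_max) / 2 for _ in column]
    return [(v - col_min) / span * (new_max - new_min) + new_min for v in column]


# First component of each `features` vector in scaleDF, as shown above.
first_component = [1.0, 2.0, 1.0, 2.0, 3.0]
rescaled = min_max_scale(first_component, 5, 10)
print(rescaled)
```

The smallest value lands on 5, the largest on 10, and 2.0 (halfway between 1.0 and 3.0) lands on 7.5, matching the first entries in the output column.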



### MaxAbsScaler
This scales the data by dividing each value by the maximum absolute value of that feature. All values therefore end up between -1 and 1. This transformer does not shift or center the data at all in the process, so signs and zeros are preserved.

In [23]:
scaler = MaxAbsScaler().setInputCol('features')
fittedma = scaler.fit(scaleDF)
fittedma.transform(scaleDF).show(5)

+---+--------------+---------------------------------+
| id|      features|MaxAbsScaler_a1db2be59c62__output|
+---+--------------+---------------------------------+
|  0|[1.0,0.1,-1.0]|             [0.33333333333333...|
|  1| [2.0,1.1,1.0]|             [0.66666666666666...|
|  0|[1.0,0.1,-1.0]|             [0.33333333333333...|
|  1| [2.0,1.1,1.0]|             [0.66666666666666...|
|  1|[3.0,10.1,3.0]|                    [1.0,1.0,1.0]|
+---+--------------+---------------------------------+
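
A minimal plain-Python sketch of the same arithmetic, using a hypothetical helper `max_abs_scale` on the third component of the vectors (the one with negative values). It is an illustration only, not Spark's implementation.

```python
def max_abs_scale(column):
    """Divide every value by the largest absolute value in the column."""
    max_abs = max(abs(v) for v in column)
    if max_abs == 0:
        # An all-zero column has nothing to scale.
        return list(column)
    return [v / max_abs for v in column]


# Third component of each `features` vector in scaleDF, as shown above.
third_component = [-1.0, 1.0, -1.0, 1.0, 3.0]
scaled_third = max_abs_scale(third_component)
print(scaled_third)
```

Because nothing is subtracted, negative inputs stay negative; only the magnitude changes.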



### ElementwiseProduct
ElementwiseProduct allows us to scale each value in a vector by an arbitrary value. For example, given the scaling vector below (10.0, 15.0, 20.0) and the row [1.0, 0.1, -1.0], the output will be [10.0, 1.5, -20.0]. Naturally, the dimensions of the scaling vector must match the dimensions of the vector inside the relevant column.

In [25]:
from pyspark.ml.linalg import Vectors
scaleUpVec = Vectors.dense(10.0, 15.0, 20.0)
scalingUp = ElementwiseProduct()\
                    .setScalingVec(scaleUpVec)\
                    .setInputCol('features')

scalingUp.transform(scaleDF).show()

+---+--------------+---------------------------------------+
| id|      features|ElementwiseProduct_eb831ff4289b__output|
+---+--------------+---------------------------------------+
|  0|[1.0,0.1,-1.0]|                       [10.0,1.5,-20.0]|
|  1| [2.0,1.1,1.0]|                       [20.0,16.5,20.0]|
|  0|[1.0,0.1,-1.0]|                       [10.0,1.5,-20.0]|
|  1| [2.0,1.1,1.0]|                       [20.0,16.5,20.0]|
|  1|[3.0,10.1,3.0]|                      [30.0,151.5,60.0]|
+---+--------------+---------------------------------------+



### Normalizer
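The Normalizer allows us to scale multidimensional vectors using one of several power norms, set through the parameter p. For example, p = 1 gives the Manhattan norm and p = 2 gives the Euclidean norm. Unlike the scalers above, it works on each row's vector independently, so it has nothing to fit.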

In [26]:
manhattanDistance = Normalizer().setP(1).setInputCol('features')
manhattanDistance.transform(scaleDF).show()

+---+--------------+-------------------------------+
| id|      features|Normalizer_a4e74044116c__output|
+---+--------------+-------------------------------+
|  0|[1.0,0.1,-1.0]|           [0.47619047619047...|
|  1| [2.0,1.1,1.0]|           [0.48780487804878...|
|  0|[1.0,0.1,-1.0]|           [0.47619047619047...|
|  1| [2.0,1.1,1.0]|           [0.48780487804878...|
|  1|[3.0,10.1,3.0]|           [0.18633540372670...|
+---+--------------+-------------------------------+
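
The row-wise arithmetic can be sketched in plain Python. The helper `p_normalize` is hypothetical and only illustrates dividing a vector by its p-norm; it is not Spark's implementation and assumes a finite p of at least 1.

```python
def p_normalize(vector, p):
    """Divide a vector by its p-norm, (sum |x|^p)^(1/p)."""
    norm = sum(abs(x) ** p for x in vector) ** (1.0 / p)
    if norm == 0:
        # A zero vector cannot be rescaled; return it unchanged.
        return list(vector)
    return [x / norm for x in vector]


row = [1.0, 0.1, -1.0]
l1 = p_normalize(row, 1)  # Manhattan norm: |1.0| + |0.1| + |-1.0| = 2.1
l2 = p_normalize(row, 2)  # Euclidean norm
print(l1)
print(l2)
```

With p = 1 the first entry is 1.0 / 2.1, which agrees with the leading value shown for that row in the output above.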



# Working with Categorical Features

## StringIndexer
The simplest way to index is via the StringIndexer, which maps strings to different numerical IDs. Spark's StringIndexer also creates metadata attached to the DataFrame that specifies which inputs correspond to which outputs.

In [27]:
from pyspark.ml.feature import StringIndexer

indexer = StringIndexer().setInputCol('lab').setOutputCol('labelInd')
idxRes = indexer.fit(simpleDF).transform(simpleDF)
idxRes.show(5)

+-----+----+------+------------------+--------+
|color| lab|value1|            value2|labelInd|
+-----+----+------+------------------+--------+
|green|good|     1|14.386294994851129|     1.0|
| blue| bad|     8|14.386294994851129|     0.0|
| blue| bad|    12|14.386294994851129|     0.0|
|green|good|    15| 38.97187133755819|     1.0|
|green|good|    12|14.386294994851129|     1.0|
+-----+----+------+------------------+--------+
only showing top 5 rows



In [29]:
valIndexer = StringIndexer().setInputCol('value1').setOutputCol('valueInd')
valIndexer.fit(simpleDF).transform(simpleDF).show()

+-----+----+------+------------------+--------+
|color| lab|value1|            value2|valueInd|
+-----+----+------+------------------+--------+
|green|good|     1|14.386294994851129|     0.0|
| blue| bad|     8|14.386294994851129|     7.0|
| blue| bad|    12|14.386294994851129|     1.0|
|green|good|    15| 38.97187133755819|     3.0|
|green|good|    12|14.386294994851129|     1.0|
|green| bad|    16|14.386294994851129|     2.0|
|  red|good|    35|14.386294994851129|     5.0|
|  red| bad|     1| 38.97187133755819|     0.0|
|  red| bad|     2|14.386294994851129|     4.0|
|  red| bad|    16|14.386294994851129|     2.0|
|  red|good|    45| 38.97187133755819|     6.0|
|green|good|     1|14.386294994851129|     0.0|
| blue| bad|     8|14.386294994851129|     7.0|
| blue| bad|    12|14.386294994851129|     1.0|
|green|good|    15| 38.97187133755819|     3.0|
|green|good|    12|14.386294994851129|     1.0|
|green| bad|    16|14.386294994851129|     2.0|
|  red|good|    35|14.386294994851129|  
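
StringIndexer can also be applied to columns that are not strings, as the cell above does with value1; the values are treated as strings before being indexed. The plain-Python sketch below shows one way such an index could be built. It is an assumption-laden illustration, not Spark's implementation: the helper `fit_string_index` is hypothetical, and the ordering rule (most frequent label first, ties broken alphabetically on the string form) is an assumption that happens to reproduce the indices in the output above.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct value (as a string) to a Double index."""
    counts = Counter(str(v) for v in values)
    # Assumed ordering: descending frequency, then alphabetical.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


# value1 for the first eleven rows shown in the output above.
value1 = [1, 8, 12, 15, 12, 16, 35, 1, 2, 16, 45]
mapping = fit_string_index(value1)
print(mapping)
```

Note that the ordering is on strings, so "15" sorts before "2" and "8" comes last, which is why 8 receives the index 7.0.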

## Converting Indexed Values Back to Text
When inspecting your machine learning results, you're likely going to want to map back to the original values. Since MLlib classification models make predictions using the indexed values, this conversion is useful for converting model predictions (indices) back to the original categories. We can do this with IndexToString. Notice that no index-to-label key is supplied below; IndexToString reads the labels from the column metadata that StringIndexer attached.

In [30]:
from pyspark.ml.feature import IndexToString

labelReverse = IndexToString().setInputCol('labelInd')
labelReverse.transform(idxRes).show(5)

+-----+----+------+------------------+--------+----------------------------------+
|color| lab|value1|            value2|labelInd|IndexToString_f3a602786c2c__output|
+-----+----+------+------------------+--------+----------------------------------+
|green|good|     1|14.386294994851129|     1.0|                              good|
| blue| bad|     8|14.386294994851129|     0.0|                               bad|
| blue| bad|    12|14.386294994851129|     0.0|                               bad|
|green|good|    15| 38.97187133755819|     1.0|                              good|
|green|good|    12|14.386294994851129|     1.0|                              good|
+-----+----+------+------------------+--------+----------------------------------+
only showing top 5 rows



# Text Data Transformers

## Tokenizing Text


In [33]:
from pyspark.ml.feature import Tokenizer, RegexTokenizer, StopWordsRemover, NGram, CountVectorizer, HashingTF, IDF, Word2Vec


In [31]:

token = Tokenizer().setInputCol('Description').setOutputCol('Tokens')
tokenized = token.transform(sales.select('Description'))
tokenized.show(10, False)

+-------------------------------+-------------------------------------+
|Description                    |Tokens                               |
+-------------------------------+-------------------------------------+
|RABBIT NIGHT LIGHT             |[rabbit, night, light]               |
|DOUGHNUT LIP GLOSS             |[doughnut, lip, gloss]               |
|12 MESSAGE CARDS WITH ENVELOPES|[12, message, cards, with, envelopes]|
|BLUE HARMONICA IN BOX          |[blue, harmonica, in, box]           |
|GUMBALL COAT RACK              |[gumball, coat, rack]                |
|SKULLS  WATER TRANSFER TATTOOS |[skulls, , water, transfer, tattoos] |
|FELTCRAFT GIRL AMELIE KIT      |[feltcraft, girl, amelie, kit]       |
|CAMOUFLAGE LED TORCH           |[camouflage, led, torch]             |
|WHITE SKULL HOT WATER BOTTLE   |[white, skull, hot, water, bottle]   |
|ENGLISH ROSE HOT WATER BOTTLE  |[english, rose, hot, water, bottle]  |
+-------------------------------+-------------------------------------+

In [34]:
regex = RegexTokenizer()\
            .setInputCol('Description')\
            .setOutputCol('Tokens')\
            .setPattern(" ")\
            .setToLowercase(True)


regex.transform(sales.select('Description')).show(10, False)

+-------------------------------+-------------------------------------+
|Description                    |Tokens                               |
+-------------------------------+-------------------------------------+
|RABBIT NIGHT LIGHT             |[rabbit, night, light]               |
|DOUGHNUT LIP GLOSS             |[doughnut, lip, gloss]               |
|12 MESSAGE CARDS WITH ENVELOPES|[12, message, cards, with, envelopes]|
|BLUE HARMONICA IN BOX          |[blue, harmonica, in, box]           |
|GUMBALL COAT RACK              |[gumball, coat, rack]                |
|SKULLS  WATER TRANSFER TATTOOS |[skulls, water, transfer, tattoos]   |
|FELTCRAFT GIRL AMELIE KIT      |[feltcraft, girl, amelie, kit]       |
|CAMOUFLAGE LED TORCH           |[camouflage, led, torch]             |
|WHITE SKULL HOT WATER BOTTLE   |[white, skull, hot, water, bottle]   |
|ENGLISH ROSE HOT WATER BOTTLE  |[english, rose, hot, water, bottle]  |
+-------------------------------+-------------------------------------+
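
The two outputs differ on the row "SKULLS  WATER TRANSFER TATTOOS", which contains a double space: the Tokenizer emits an empty token, while the RegexTokenizer does not. The plain-Python sketch below gives one plausible explanation. Both helpers are hypothetical; the sketch assumes the Tokenizer splits on every single whitespace character and that the RegexTokenizer drops tokens shorter than a minimum length of 1.

```python
import re


def simple_tokenize(text):
    # Lowercase, then split on each individual whitespace character,
    # so two consecutive spaces produce an empty token between them.
    return re.split(r"\s", text.lower())


def regex_tokenize(text, pattern=" ", min_token_length=1):
    # Split on the pattern, then drop tokens that are too short,
    # which removes the empty token.
    tokens = re.split(pattern, text.lower())
    return [t for t in tokens if len(t) >= min_token_length]


description = "SKULLS  WATER TRANSFER TATTOOS"
simple_tokens = simple_tokenize(description)
regex_tokens = regex_tokenize(description)
print(simple_tokens)
print(regex_tokens)
```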

## Removing Common Words

In [35]:
stopWords = StopWordsRemover.loadDefaultStopWords('english')
stops = StopWordsRemover()\
            .setStopWords(stopWords)\
            .setInputCol('Tokens')

stops.transform(tokenized).show(5)

+--------------------+--------------------+-------------------------------------+
|         Description|              Tokens|StopWordsRemover_34be675e390b__output|
+--------------------+--------------------+-------------------------------------+
|  RABBIT NIGHT LIGHT|[rabbit, night, l...|                 [rabbit, night, l...|
| DOUGHNUT LIP GLOSS |[doughnut, lip, g...|                 [doughnut, lip, g...|
|12 MESSAGE CARDS ...|[12, message, car...|                 [12, message, car...|
|BLUE HARMONICA IN...|[blue, harmonica,...|                 [blue, harmonica,...|
|   GUMBALL COAT RACK|[gumball, coat, r...|                 [gumball, coat, r...|
+--------------------+--------------------+-------------------------------------+
only showing top 5 rows



## Creating Word Combinations

In [37]:
unigram = NGram().setInputCol('Tokens').setN(1)
bigram = NGram().setInputCol('Tokens').setN(2)

bigram.transform(tokenized.select('Tokens')).show(5)

+--------------------+--------------------------+
|              Tokens|NGram_bb8d152d9182__output|
+--------------------+--------------------------+
|[rabbit, night, l...|      [rabbit night, ni...|
|[doughnut, lip, g...|      [doughnut lip, li...|
|[12, message, car...|      [12 message, mess...|
|[blue, harmonica,...|      [blue harmonica, ...|
|[gumball, coat, r...|      [gumball coat, co...|
+--------------------+--------------------------+
only showing top 5 rows
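
An n-gram is a run of n consecutive tokens joined into one string. A minimal plain-Python sketch of the idea, using a hypothetical helper `ngrams` (not Spark's implementation) that joins tokens with a single space:

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["rabbit", "night", "light"]
bigrams = ngrams(tokens, 2)
unigrams = ngrams(tokens, 1)
print(bigrams)
print(unigrams)
```

With n = 1 the result is just the original tokens, and a token list shorter than n yields an empty list.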



## Converting Words into Numerical Representations

In [39]:
cv = CountVectorizer()\
            .setInputCol('Tokens')\
            .setOutputCol('countVec')\
            .setVocabSize(500)\
            .setMinTF(1)\
            .setMinDF(2)


fittedCV = cv.fit(tokenized)
fittedCV.transform(tokenized).show(5)

                                                                                

+--------------------+--------------------+--------------------+
|         Description|              Tokens|            countVec|
+--------------------+--------------------+--------------------+
|  RABBIT NIGHT LIGHT|[rabbit, night, l...|(500,[150,185,212...|
| DOUGHNUT LIP GLOSS |[doughnut, lip, g...|(500,[462,463,491...|
|12 MESSAGE CARDS ...|[12, message, car...|(500,[35,41,166],...|
|BLUE HARMONICA IN...|[blue, harmonica,...|(500,[10,16,36,35...|
|   GUMBALL COAT RACK|[gumball, coat, r...|(500,[228,281,407...|
+--------------------+--------------------+--------------------+
only showing top 5 rows



# Feature Selection