This FINAL homework recaps the Spark ML library:

0) Download the "Rain in Australia" dataset from Kaggle (it is also attached to this assigbnment): https://www.kaggle.com/jsphyg/weather-dataset-rattle-package

1) In a Jupyter Notebook, write a SparkML script that uses a Decision Tree Classifier to predict the RainTomorrow target varible

2) Split the data 80/20 train/test, using a seed of 12345

3) Use transformers to remove unnecessary columns (use your best judgement) and convert categorical variables into one-hot encoded variables

4) Use a parameter grid to determine the best parameters for: impurity - gini, entropy maxBins - 5, 10, 15 minInfoGain - 0.0, 0.2, 0.4 maxDepth - 3, 5, 7

5) Cross-validate with 4 folds

6) Use a pipeline to encapsulate all steps

7) Print the parameters from the best model selected

8) Calculate and print the Area under ROC Curve and Area under Precision-Recall Curve scores for your training and test data sets (these are built-in metrics, you do not need to calculate anything by hand)

Your script should be clean of all the testing and exploration and should only contain the necessary code to satisfy the above conditions

## library load

In [8]:
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import StringIndexer, VectorIndexer
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.feature import RFormula
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.mllib.evaluation import BinaryClassificationMetrics

## create data frame

In [9]:
spark = SparkSession.builder.getOrCreate()
df=spark.read.option("header", "true").csv("weatherAUS.csv")
df.show(5)

+----------+--------+-------+-------+--------+-----------+--------+-----------+-------------+----------+----------+------------+------------+-----------+-----------+-----------+-----------+--------+--------+-------+-------+---------+------------+
|      Date|Location|MinTemp|MaxTemp|Rainfall|Evaporation|Sunshine|WindGustDir|WindGustSpeed|WindDir9am|WindDir3pm|WindSpeed9am|WindSpeed3pm|Humidity9am|Humidity3pm|Pressure9am|Pressure3pm|Cloud9am|Cloud3pm|Temp9am|Temp3pm|RainToday|RainTomorrow|
+----------+--------+-------+-------+--------+-----------+--------+-----------+-------------+----------+----------+------------+------------+-----------+-----------+-----------+-----------+--------+--------+-------+-------+---------+------------+
|2008-12-01|  Albury|   13.4|   22.9|     0.6|         NA|      NA|          W|           44|         W|       WNW|          20|          24|         71|         22|     1007.7|     1007.1|       8|      NA|   16.9|   21.8|       No|          No|
|2008-12-02|

## I saw some NA block in the csv, so I drop them

In [10]:
col_list=df.toPandas().columns.values.tolist()
df.select ("RainTomorrow").distinct().collect()
df_clean=df.where(df["RainTomorrow"] != "NA")


## using Rformula to read data
http://zwmiller.com/projects/spark_ml_example_part2.html

In [11]:
formula = RFormula(formula="RainTomorrow ~ .",featuresCol="features",labelCol="label")
output = formula.fit(df_clean).transform(df_clean)

### A label indexer that maps a string column of labels to an ML column of label indices. If the input column is numeric, we cast it to string and index the string values.
### Class for indexing categorical feature columns in a dataset of Vector.
### follow the requirement to split the data


In [12]:
labelIndexer = StringIndexer(inputCol="label", outputCol="indexedLabel").fit(output)
featureIndexer = (VectorIndexer(inputCol="features", outputCol="indexedFeatures", maxCategories=2).fit(output))
(trainingData, testData) = output.randomSplit([0.8, 0.2], seed=12345)





22/04/27 23:44:30 WARN DAGScheduler: Broadcasting large task binary with size 1494.4 KiB
                                                                                

### DecisionTreeClassifier created
### pipline created

In [13]:
dt = DecisionTreeClassifier(labelCol="indexedLabel", featuresCol="indexedFeatures")
pipeline = Pipeline(stages=[labelIndexer, featureIndexer, dt])

In [14]:
evaluator = MulticlassClassificationEvaluator(
    labelCol="indexedLabel", predictionCol="prediction", metricName="accuracy")

### setting up the require vaule

In [15]:
grid = ParamGridBuilder().addGrid(dt.impurity, ["gini", "entropy"]).addGrid(dt.maxBins, [5, 10, 15]).addGrid(dt.minInfoGain, [0.0, 0.2, 0.4]).addGrid(dt.maxDepth, [3, 5, 7]).build()

In [16]:
cv= CrossValidator(estimator=pipeline, evaluator=evaluator, estimatorParamMaps=grid, numFolds=4)

## training

In [10]:
cvModel = cv.fit(trainingData)

22/04/27 10:58:32 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 10:58:35 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 10:58:38 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 10:58:44 WARN MemoryStore: Not enough space to cache rdd_275_3 in memory! (computed 67.5 MiB so far)
22/04/27 10:58:44 WARN BlockManager: Persisting block rdd_275_3 to disk instead.
22/04/27 10:58:44 WARN MemoryStore: Not enough space to cache rdd_275_0 in memory! (computed 103.7 MiB so far)
22/04/27 10:58:44 WARN BlockManager: Persisting block rdd_275_0 to disk instead.
22/04/27 10:58:44 WARN MemoryStore: Not enough space to cache rdd_275_1 in memory! (computed 103.7 MiB so far)
22/04/27 10:58:44 WARN BlockManager: Persisting block rdd_275_1 to disk instead.
22/04/27 10:58:45 WARN MemoryStore: Not enough space to cache rdd_275_2 in memory! (computed 355.0 MiB so far)
22/04/27 10:58:45 WARN BlockManager: Persisting block rd

22/04/27 10:59:24 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 10:59:24 WARN MemoryStore: Not enough space to cache rdd_368_2 in memory! (computed 103.7 MiB so far)
22/04/27 10:59:24 WARN MemoryStore: Not enough space to cache rdd_368_3 in memory! (computed 103.7 MiB so far)
22/04/27 10:59:24 WARN MemoryStore: Not enough space to cache rdd_368_1 in memory! (computed 103.7 MiB so far)
22/04/27 10:59:24 WARN MemoryStore: Not enough space to cache rdd_368_0 in memory! (computed 103.7 MiB so far)
22/04/27 10:59:26 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 10:59:26 WARN MemoryStore: Not enough space to cache rdd_368_0 in memory! (computed 103.7 MiB so far)
22/04/27 10:59:26 WARN MemoryStore: Not enough space to cache rdd_368_2 in memory! (computed 103.7 MiB so far)
22/04/27 10:59:26 WARN MemoryStore: Not enough space to cache rdd_368_3 in memory! (computed 103.7 MiB so far)
22/04/27 10:59:26 WARN MemoryStore: Not enough spac

22/04/27 11:00:05 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:00:05 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:00:06 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:00:06 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:00:06 WARN MemoryStore: Not enough space to cache rdd_561_3 in memory! (computed 67.5 MiB so far)
22/04/27 11:00:06 WARN BlockManager: Persisting block rdd_561_3 to disk instead.
22/04/27 11:00:07 WARN MemoryStore: Not enough space to cache rdd_561_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:00:07 WARN BlockManager: Persisting block rdd_561_0 to disk instead.
22/04/27 11:00:07 WARN MemoryStore: Not enough space to cache rdd_561_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:00:07 WARN BlockManager: Persisting block rdd_561_2 to disk instead.
22/04/27 11:00:07 WARN MemoryStore: Not enough space to cache rdd_561_1 in memory

22/04/27 11:00:40 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:00:40 WARN MemoryStore: Not enough space to cache rdd_672_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:00:40 WARN MemoryStore: Not enough space to cache rdd_672_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:00:40 WARN MemoryStore: Not enough space to cache rdd_672_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:00:40 WARN MemoryStore: Not enough space to cache rdd_672_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:00:42 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:00:43 WARN MemoryStore: Not enough space to cache rdd_672_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:00:43 WARN MemoryStore: Not enough space to cache rdd_672_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:00:43 WARN MemoryStore: Not enough space to cache rdd_672_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:00:43 WARN MemoryStore: Not enough spac

22/04/27 11:01:14 WARN MemoryStore: Not enough space to cache rdd_807_0 in memory! (computed 355.0 MiB so far)
22/04/27 11:01:14 WARN BlockManager: Persisting block rdd_807_0 to disk instead.
22/04/27 11:01:15 WARN MemoryStore: Not enough space to cache rdd_807_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:01:17 WARN MemoryStore: Not enough space to cache rdd_807_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:01:17 WARN MemoryStore: Not enough space to cache rdd_807_0 in memory! (computed 155.9 MiB so far)
22/04/27 11:01:17 WARN MemoryStore: Not enough space to cache rdd_807_1 in memory! (computed 355.2 MiB so far)
22/04/27 11:01:19 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:01:19 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:01:19 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:01:20 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:01:20 WAR

22/04/27 11:01:56 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:01:57 WARN MemoryStore: Not enough space to cache rdd_982_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:01:57 WARN MemoryStore: Not enough space to cache rdd_982_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:01:57 WARN MemoryStore: Not enough space to cache rdd_982_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:01:57 WARN MemoryStore: Not enough space to cache rdd_982_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:01:58 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:01:59 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:01:59 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:02:00 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:02:00 WARN MemoryStore: Not enough space to cache rdd_1023_2 in memory! (computed 67.5 MiB so far)
22/04/27 11:02:

22/04/27 11:02:35 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:02:36 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:02:36 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:02:37 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:02:37 WARN MemoryStore: Not enough space to cache rdd_1123_2 in memory! (computed 67.5 MiB so far)
22/04/27 11:02:37 WARN BlockManager: Persisting block rdd_1123_2 to disk instead.
22/04/27 11:02:37 WARN MemoryStore: Not enough space to cache rdd_1123_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:02:37 WARN BlockManager: Persisting block rdd_1123_1 to disk instead.
22/04/27 11:02:37 WARN MemoryStore: Not enough space to cache rdd_1123_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:02:37 WARN BlockManager: Persisting block rdd_1123_3 to disk instead.
22/04/27 11:02:38 WARN MemoryStore: Not enough space to cache rdd_1123_0 in

22/04/27 11:03:15 WARN MemoryStore: Not enough space to cache rdd_1298_2 in memory! (computed 355.0 MiB so far)
22/04/27 11:03:15 WARN BlockManager: Persisting block rdd_1298_2 to disk instead.
22/04/27 11:03:15 WARN MemoryStore: Not enough space to cache rdd_1298_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:03:18 WARN MemoryStore: Not enough space to cache rdd_1298_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:03:18 WARN MemoryStore: Not enough space to cache rdd_1298_2 in memory! (computed 155.9 MiB so far)
22/04/27 11:03:18 WARN MemoryStore: Not enough space to cache rdd_1298_0 in memory! (computed 155.9 MiB so far)
22/04/27 11:03:19 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:03:20 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:03:20 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:03:21 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:03:

22/04/27 11:03:51 WARN MemoryStore: Not enough space to cache rdd_1421_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:03:51 WARN MemoryStore: Not enough space to cache rdd_1421_0 in memory! (computed 155.9 MiB so far)
22/04/27 11:03:51 WARN MemoryStore: Not enough space to cache rdd_1421_1 in memory! (computed 355.2 MiB so far)
22/04/27 11:03:53 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:03:53 WARN MemoryStore: Not enough space to cache rdd_1421_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:03:53 WARN MemoryStore: Not enough space to cache rdd_1421_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:03:53 WARN MemoryStore: Not enough space to cache rdd_1421_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:03:53 WARN MemoryStore: Not enough space to cache rdd_1421_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:03:55 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:03:55 WARN MemoryStore: Not enou

22/04/27 11:04:31 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:04:31 WARN MemoryStore: Not enough space to cache rdd_1579_3 in memory! (computed 67.5 MiB so far)
22/04/27 11:04:31 WARN BlockManager: Persisting block rdd_1579_3 to disk instead.
22/04/27 11:04:32 WARN MemoryStore: Not enough space to cache rdd_1579_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:04:32 WARN BlockManager: Persisting block rdd_1579_1 to disk instead.
22/04/27 11:04:32 WARN MemoryStore: Not enough space to cache rdd_1579_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:04:32 WARN BlockManager: Persisting block rdd_1579_2 to disk instead.
22/04/27 11:04:32 WARN MemoryStore: Not enough space to cache rdd_1579_0 in memory! (computed 355.0 MiB so far)
22/04/27 11:04:32 WARN BlockManager: Persisting block rdd_1579_0 to disk instead.
22/04/27 11:04:33 WARN MemoryStore: Not enough space to cache rdd_1579_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:04:35 WARN Memor

22/04/27 11:05:04 WARN MemoryStore: Not enough space to cache rdd_1725_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:05:06 WARN MemoryStore: Not enough space to cache rdd_1725_0 in memory! (computed 155.9 MiB so far)
22/04/27 11:05:06 WARN MemoryStore: Not enough space to cache rdd_1725_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:05:06 WARN MemoryStore: Not enough space to cache rdd_1725_2 in memory! (computed 155.9 MiB so far)
22/04/27 11:05:07 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:05:07 WARN MemoryStore: Not enough space to cache rdd_1725_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:05:07 WARN MemoryStore: Not enough space to cache rdd_1725_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:05:07 WARN MemoryStore: Not enough space to cache rdd_1725_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:05:07 WARN MemoryStore: Not enough space to cache rdd_1725_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:05:09 W

22/04/27 11:05:40 WARN MemoryStore: Not enough space to cache rdd_1825_1 in memory! (computed 355.0 MiB so far)
22/04/27 11:05:40 WARN BlockManager: Persisting block rdd_1825_1 to disk instead.
22/04/27 11:05:41 WARN MemoryStore: Not enough space to cache rdd_1825_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:05:43 WARN MemoryStore: Not enough space to cache rdd_1825_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:05:43 WARN MemoryStore: Not enough space to cache rdd_1825_0 in memory! (computed 155.9 MiB so far)
22/04/27 11:05:44 WARN MemoryStore: Not enough space to cache rdd_1825_1 in memory! (computed 355.2 MiB so far)
22/04/27 11:05:45 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:05:46 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:05:46 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:05:47 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:05:

22/04/27 11:06:56 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:06:56 WARN MemoryStore: Not enough space to cache rdd_2123_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:06:56 WARN MemoryStore: Not enough space to cache rdd_2123_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:06:56 WARN MemoryStore: Not enough space to cache rdd_2123_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:06:56 WARN MemoryStore: Not enough space to cache rdd_2123_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:06:57 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:06:58 WARN MemoryStore: Not enough space to cache rdd_2123_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:06:58 WARN MemoryStore: Not enough space to cache rdd_2123_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:06:58 WARN MemoryStore: Not enough space to cache rdd_2123_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:06:58 WARN MemoryStore: Not enou

22/04/27 11:07:33 WARN MemoryStore: Not enough space to cache rdd_2281_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:07:33 WARN MemoryStore: Not enough space to cache rdd_2281_2 in memory! (computed 155.9 MiB so far)
22/04/27 11:07:33 WARN MemoryStore: Not enough space to cache rdd_2281_0 in memory! (computed 155.9 MiB so far)
22/04/27 11:07:35 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:07:36 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:07:36 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:07:36 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:07:37 WARN MemoryStore: Not enough space to cache rdd_2316_0 in memory! (computed 67.5 MiB so far)
22/04/27 11:07:37 WARN BlockManager: Persisting block rdd_2316_0 to disk instead.
22/04/27 11:07:37 WARN MemoryStore: Not enough space to cache rdd_2316_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:07:

22/04/27 11:08:18 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:08:18 WARN MemoryStore: Not enough space to cache rdd_2437_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:08:18 WARN MemoryStore: Not enough space to cache rdd_2437_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:08:18 WARN MemoryStore: Not enough space to cache rdd_2437_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:08:18 WARN MemoryStore: Not enough space to cache rdd_2437_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:08:20 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:08:20 WARN MemoryStore: Not enough space to cache rdd_2437_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:08:20 WARN MemoryStore: Not enough space to cache rdd_2437_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:08:20 WARN MemoryStore: Not enough space to cache rdd_2437_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:08:20 WARN MemoryStore: Not enou

22/04/27 11:08:53 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:08:53 WARN MemoryStore: Not enough space to cache rdd_2572_1 in memory! (computed 67.5 MiB so far)
22/04/27 11:08:53 WARN BlockManager: Persisting block rdd_2572_1 to disk instead.
22/04/27 11:08:53 WARN MemoryStore: Not enough space to cache rdd_2572_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:08:53 WARN BlockManager: Persisting block rdd_2572_0 to disk instead.
22/04/27 11:08:53 WARN MemoryStore: Not enough space to cache rdd_2572_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:08:53 WARN BlockManager: Persisting block rdd_2572_2 to disk instead.
22/04/27 11:08:56 WARN MemoryStore: Not enough space to cache rdd_2572_0 in memory! (computed 44.0 MiB so far)
22/04/27 11:08:56 WARN MemoryStore: Not enough space to cache rdd_2572_2 in memory! (computed 44.0 MiB so far)
22/04/27 11:08:56 WARN MemoryStore: Not enough space to cache rdd_2572_1 in memory! (computed 44.0 MiB so far)


22/04/27 11:10:05 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:10:05 WARN MemoryStore: Not enough space to cache rdd_2835_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:10:05 WARN MemoryStore: Not enough space to cache rdd_2835_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:10:05 WARN MemoryStore: Not enough space to cache rdd_2835_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:10:05 WARN MemoryStore: Not enough space to cache rdd_2835_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:10:07 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:10:08 WARN MemoryStore: Not enough space to cache rdd_2835_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:10:08 WARN MemoryStore: Not enough space to cache rdd_2835_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:10:08 WARN MemoryStore: Not enough space to cache rdd_2835_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:10:08 WARN MemoryStore: Not enou

22/04/27 11:10:45 WARN MemoryStore: Not enough space to cache rdd_3028_3 in memory! (computed 155.8 MiB so far)
22/04/27 11:10:45 WARN BlockManager: Persisting block rdd_3028_3 to disk instead.
22/04/27 11:10:49 WARN MemoryStore: Not enough space to cache rdd_3028_2 in memory! (computed 67.5 MiB so far)
22/04/27 11:10:49 WARN MemoryStore: Not enough space to cache rdd_3028_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:10:49 WARN MemoryStore: Not enough space to cache rdd_3028_0 in memory! (computed 44.0 MiB so far)
22/04/27 11:10:51 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:10:52 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:10:52 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:10:53 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:10:53 WARN MemoryStore: Not enough space to cache rdd_3063_2 in memory! (computed 67.5 MiB so far)
22/04/27 11:10:53

22/04/27 11:11:28 WARN MemoryStore: Not enough space to cache rdd_3186_0 in memory! (computed 355.0 MiB so far)
22/04/27 11:11:28 WARN BlockManager: Persisting block rdd_3186_0 to disk instead.
22/04/27 11:11:28 WARN MemoryStore: Not enough space to cache rdd_3186_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:11:31 WARN MemoryStore: Not enough space to cache rdd_3186_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:11:31 WARN MemoryStore: Not enough space to cache rdd_3186_0 in memory! (computed 155.9 MiB so far)
22/04/27 11:11:31 WARN MemoryStore: Not enough space to cache rdd_3186_2 in memory! (computed 355.2 MiB so far)
22/04/27 11:11:32 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:11:33 WARN MemoryStore: Not enough space to cache rdd_3186_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:11:33 WARN MemoryStore: Not enough space to cache rdd_3186_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:11:33 WARN MemoryStore: Not enough sp

22/04/27 11:12:06 WARN MemoryStore: Not enough space to cache rdd_3309_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:12:06 WARN MemoryStore: Not enough space to cache rdd_3309_0 in memory! (computed 155.9 MiB so far)
22/04/27 11:12:06 WARN MemoryStore: Not enough space to cache rdd_3309_2 in memory! (computed 355.2 MiB so far)
22/04/27 11:12:08 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:12:08 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:12:08 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:12:09 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:12:09 WARN MemoryStore: Not enough space to cache rdd_3344_1 in memory! (computed 67.5 MiB so far)
22/04/27 11:12:09 WARN BlockManager: Persisting block rdd_3344_1 to disk instead.
22/04/27 11:12:10 WARN MemoryStore: Not enough space to cache rdd_3344_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:12:

22/04/27 11:12:45 WARN MemoryStore: Not enough space to cache rdd_3490_2 in memory! (computed 355.0 MiB so far)
22/04/27 11:12:45 WARN BlockManager: Persisting block rdd_3490_2 to disk instead.
22/04/27 11:12:46 WARN MemoryStore: Not enough space to cache rdd_3490_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:12:48 WARN MemoryStore: Not enough space to cache rdd_3490_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:12:48 WARN MemoryStore: Not enough space to cache rdd_3490_1 in memory! (computed 155.9 MiB so far)
22/04/27 11:12:48 WARN MemoryStore: Not enough space to cache rdd_3490_0 in memory! (computed 355.2 MiB so far)
22/04/27 11:12:50 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:12:50 WARN MemoryStore: Not enough space to cache rdd_3490_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:12:50 WARN MemoryStore: Not enough space to cache rdd_3490_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:12:50 WARN MemoryStore: Not enough sp

22/04/27 11:13:22 WARN MemoryStore: Not enough space to cache rdd_3590_1 in memory! (computed 355.0 MiB so far)
22/04/27 11:13:22 WARN BlockManager: Persisting block rdd_3590_1 to disk instead.
22/04/27 11:13:23 WARN MemoryStore: Not enough space to cache rdd_3590_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:13:25 WARN MemoryStore: Not enough space to cache rdd_3590_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:13:25 WARN MemoryStore: Not enough space to cache rdd_3590_0 in memory! (computed 155.9 MiB so far)
22/04/27 11:13:25 WARN MemoryStore: Not enough space to cache rdd_3590_2 in memory! (computed 355.2 MiB so far)
22/04/27 11:13:27 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:13:28 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:13:28 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:13:28 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:13:

22/04/27 11:14:40 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:14:40 WARN MemoryStore: Not enough space to cache rdd_3888_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:14:40 WARN MemoryStore: Not enough space to cache rdd_3888_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:14:40 WARN MemoryStore: Not enough space to cache rdd_3888_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:14:40 WARN MemoryStore: Not enough space to cache rdd_3888_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:14:42 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:14:42 WARN MemoryStore: Not enough space to cache rdd_3888_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:14:42 WARN MemoryStore: Not enough space to cache rdd_3888_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:14:42 WARN MemoryStore: Not enough space to cache rdd_3888_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:14:42 WARN MemoryStore: Not enou

22/04/27 11:15:17 WARN MemoryStore: Not enough space to cache rdd_4046_0 in memory! (computed 355.0 MiB so far)
22/04/27 11:15:17 WARN BlockManager: Persisting block rdd_4046_0 to disk instead.
22/04/27 11:15:17 WARN MemoryStore: Not enough space to cache rdd_4046_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:15:19 WARN MemoryStore: Not enough space to cache rdd_4046_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:15:19 WARN MemoryStore: Not enough space to cache rdd_4046_2 in memory! (computed 155.9 MiB so far)
22/04/27 11:15:20 WARN MemoryStore: Not enough space to cache rdd_4046_1 in memory! (computed 355.2 MiB so far)
22/04/27 11:15:21 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:15:22 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:15:22 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:15:23 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:15:

22/04/27 11:15:57 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:15:57 WARN MemoryStore: Not enough space to cache rdd_4192_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:15:57 WARN MemoryStore: Not enough space to cache rdd_4192_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:15:57 WARN MemoryStore: Not enough space to cache rdd_4192_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:15:57 WARN MemoryStore: Not enough space to cache rdd_4192_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:15:59 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:15:59 WARN MemoryStore: Not enough space to cache rdd_4192_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:15:59 WARN MemoryStore: Not enough space to cache rdd_4192_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:15:59 WARN MemoryStore: Not enough space to cache rdd_4192_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:15:59 WARN MemoryStore: Not enou

22/04/27 11:16:37 WARN MemoryStore: Not enough space to cache rdd_4327_0 in memory! (computed 44.0 MiB so far)
22/04/27 11:16:37 WARN MemoryStore: Not enough space to cache rdd_4327_2 in memory! (computed 44.0 MiB so far)
22/04/27 11:16:37 WARN MemoryStore: Not enough space to cache rdd_4327_1 in memory! (computed 44.0 MiB so far)
22/04/27 11:16:39 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:16:39 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:16:39 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:16:40 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:16:40 WARN MemoryStore: Not enough space to cache rdd_4362_2 in memory! (computed 67.5 MiB so far)
22/04/27 11:16:40 WARN BlockManager: Persisting block rdd_4362_2 to disk instead.
22/04/27 11:16:41 WARN MemoryStore: Not enough space to cache rdd_4362_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:16:41 

22/04/27 11:17:20 WARN MemoryStore: Not enough space to cache rdd_4507_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:17:20 WARN MemoryStore: Not enough space to cache rdd_4507_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:17:20 WARN MemoryStore: Not enough space to cache rdd_4507_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:17:20 WARN MemoryStore: Not enough space to cache rdd_4507_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:17:22 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:17:22 WARN MemoryStore: Not enough space to cache rdd_4507_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:17:22 WARN MemoryStore: Not enough space to cache rdd_4507_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:17:22 WARN MemoryStore: Not enough space to cache rdd_4507_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:17:22 WARN MemoryStore: Not enough space to cache rdd_4507_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:17:24 

22/04/27 11:17:58 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:17:58 WARN MemoryStore: Not enough space to cache rdd_4600_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:17:58 WARN MemoryStore: Not enough space to cache rdd_4600_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:17:58 WARN MemoryStore: Not enough space to cache rdd_4600_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:17:58 WARN MemoryStore: Not enough space to cache rdd_4600_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:18:01 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:18:01 WARN MemoryStore: Not enough space to cache rdd_4600_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:18:01 WARN MemoryStore: Not enough space to cache rdd_4600_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:18:01 WARN MemoryStore: Not enough space to cache rdd_4600_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:18:01 WARN MemoryStore: Not enou

22/04/27 11:18:38 WARN MemoryStore: Not enough space to cache rdd_4793_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:18:38 WARN MemoryStore: Not enough space to cache rdd_4793_0 in memory! (computed 155.9 MiB so far)
22/04/27 11:18:39 WARN MemoryStore: Not enough space to cache rdd_4793_2 in memory! (computed 355.2 MiB so far)
22/04/27 11:18:40 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:18:41 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:18:41 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:18:41 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:18:42 WARN MemoryStore: Not enough space to cache rdd_4828_2 in memory! (computed 67.5 MiB so far)
22/04/27 11:18:42 WARN BlockManager: Persisting block rdd_4828_2 to disk instead.
22/04/27 11:18:42 WARN MemoryStore: Not enough space to cache rdd_4828_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:18:

22/04/27 11:19:14 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:19:15 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:19:15 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:19:16 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:19:16 WARN MemoryStore: Not enough space to cache rdd_4951_0 in memory! (computed 67.5 MiB so far)
22/04/27 11:19:16 WARN BlockManager: Persisting block rdd_4951_0 to disk instead.
22/04/27 11:19:16 WARN MemoryStore: Not enough space to cache rdd_4951_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:19:16 WARN BlockManager: Persisting block rdd_4951_3 to disk instead.
22/04/27 11:19:16 WARN MemoryStore: Not enough space to cache rdd_4951_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:19:16 WARN BlockManager: Persisting block rdd_4951_2 to disk instead.
22/04/27 11:19:17 WARN MemoryStore: Not enough space to cache rdd_4951_1 in

22/04/27 11:19:53 WARN MemoryStore: Not enough space to cache rdd_5074_1 in memory! (computed 355.0 MiB so far)
22/04/27 11:19:53 WARN BlockManager: Persisting block rdd_5074_1 to disk instead.
22/04/27 11:19:53 WARN MemoryStore: Not enough space to cache rdd_5074_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:19:55 WARN MemoryStore: Not enough space to cache rdd_5074_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:19:55 WARN MemoryStore: Not enough space to cache rdd_5074_0 in memory! (computed 155.9 MiB so far)
22/04/27 11:19:56 WARN MemoryStore: Not enough space to cache rdd_5074_2 in memory! (computed 355.2 MiB so far)
22/04/27 11:19:57 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:19:58 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:19:58 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:19:59 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:19:

22/04/27 11:20:33 WARN MemoryStore: Not enough space to cache rdd_5255_1 in memory! (computed 355.0 MiB so far)
22/04/27 11:20:33 WARN BlockManager: Persisting block rdd_5255_1 to disk instead.
22/04/27 11:20:33 WARN MemoryStore: Not enough space to cache rdd_5255_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:20:36 WARN MemoryStore: Not enough space to cache rdd_5255_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:20:36 WARN MemoryStore: Not enough space to cache rdd_5255_2 in memory! (computed 155.9 MiB so far)
22/04/27 11:20:36 WARN MemoryStore: Not enough space to cache rdd_5255_0 in memory! (computed 355.2 MiB so far)
22/04/27 11:20:37 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:20:38 WARN MemoryStore: Not enough space to cache rdd_5255_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:20:38 WARN MemoryStore: Not enough space to cache rdd_5255_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:20:38 WARN MemoryStore: Not enough sp

22/04/27 11:21:54 WARN MemoryStore: Not enough space to cache rdd_5565_1 in memory! (computed 355.0 MiB so far)
22/04/27 11:21:54 WARN BlockManager: Persisting block rdd_5565_1 to disk instead.
22/04/27 11:21:54 WARN MemoryStore: Not enough space to cache rdd_5565_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:21:57 WARN MemoryStore: Not enough space to cache rdd_5565_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:21:57 WARN MemoryStore: Not enough space to cache rdd_5565_0 in memory! (computed 155.9 MiB so far)
22/04/27 11:21:57 WARN MemoryStore: Not enough space to cache rdd_5565_2 in memory! (computed 355.2 MiB so far)
22/04/27 11:21:58 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:21:59 WARN MemoryStore: Not enough space to cache rdd_5565_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:21:59 WARN MemoryStore: Not enough space to cache rdd_5565_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:21:59 WARN MemoryStore: Not enough sp

22/04/27 11:22:32 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:22:32 WARN MemoryStore: Not enough space to cache rdd_5653_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:22:32 WARN MemoryStore: Not enough space to cache rdd_5653_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:22:32 WARN MemoryStore: Not enough space to cache rdd_5653_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:22:32 WARN MemoryStore: Not enough space to cache rdd_5653_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:22:34 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:22:35 WARN MemoryStore: Not enough space to cache rdd_5653_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:22:35 WARN MemoryStore: Not enough space to cache rdd_5653_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:22:35 WARN MemoryStore: Not enough space to cache rdd_5653_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:22:35 WARN MemoryStore: Not enou

22/04/27 11:23:12 WARN MemoryStore: Not enough space to cache rdd_5846_2 in memory! (computed 355.0 MiB so far)
22/04/27 11:23:12 WARN BlockManager: Persisting block rdd_5846_2 to disk instead.
22/04/27 11:23:13 WARN MemoryStore: Not enough space to cache rdd_5846_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:23:15 WARN MemoryStore: Not enough space to cache rdd_5846_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:23:15 WARN MemoryStore: Not enough space to cache rdd_5846_2 in memory! (computed 155.9 MiB so far)
22/04/27 11:23:16 WARN MemoryStore: Not enough space to cache rdd_5846_0 in memory! (computed 355.2 MiB so far)
22/04/27 11:23:17 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:23:18 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:23:18 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:23:18 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:23:

22/04/27 11:23:57 WARN MemoryStore: Not enough space to cache rdd_6004_1 in memory! (computed 8.4 MiB so far)
22/04/27 11:23:57 WARN MemoryStore: Not enough space to cache rdd_6004_0 in memory! (computed 67.5 MiB so far)
22/04/27 11:23:57 WARN MemoryStore: Not enough space to cache rdd_6004_2 in memory! (computed 67.5 MiB so far)
22/04/27 11:23:58 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:23:58 WARN MemoryStore: Not enough space to cache rdd_6004_0 in memory! (computed 44.0 MiB so far)
22/04/27 11:23:58 WARN MemoryStore: Not enough space to cache rdd_6004_2 in memory! (computed 44.0 MiB so far)
22/04/27 11:23:58 WARN MemoryStore: Not enough space to cache rdd_6004_1 in memory! (computed 44.0 MiB so far)
22/04/27 11:24:00 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:24:00 WARN MemoryStore: Not enough space to cache rdd_6004_2 in memory! (computed 44.0 MiB so far)
22/04/27 11:24:00 WARN MemoryStore: Not enough space

22/04/27 11:24:37 WARN MemoryStore: Not enough space to cache rdd_6162_0 in memory! (computed 355.0 MiB so far)
22/04/27 11:24:37 WARN BlockManager: Persisting block rdd_6162_0 to disk instead.
22/04/27 11:24:37 WARN MemoryStore: Not enough space to cache rdd_6162_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:24:39 WARN MemoryStore: Not enough space to cache rdd_6162_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:24:39 WARN MemoryStore: Not enough space to cache rdd_6162_2 in memory! (computed 155.9 MiB so far)
22/04/27 11:24:39 WARN MemoryStore: Not enough space to cache rdd_6162_0 in memory! (computed 155.9 MiB so far)
22/04/27 11:24:41 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:24:42 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:24:42 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:24:42 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:24:

22/04/27 11:25:16 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:25:16 WARN MemoryStore: Not enough space to cache rdd_6308_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:25:16 WARN MemoryStore: Not enough space to cache rdd_6308_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:25:16 WARN MemoryStore: Not enough space to cache rdd_6308_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:25:16 WARN MemoryStore: Not enough space to cache rdd_6308_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:25:18 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:25:19 WARN MemoryStore: Not enough space to cache rdd_6308_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:25:19 WARN MemoryStore: Not enough space to cache rdd_6308_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:25:19 WARN MemoryStore: Not enough space to cache rdd_6308_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:25:19 WARN MemoryStore: Not enou

22/04/27 11:25:52 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:25:52 WARN MemoryStore: Not enough space to cache rdd_6443_1 in memory! (computed 67.5 MiB so far)
22/04/27 11:25:52 WARN BlockManager: Persisting block rdd_6443_1 to disk instead.
22/04/27 11:25:52 WARN MemoryStore: Not enough space to cache rdd_6443_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:25:52 WARN BlockManager: Persisting block rdd_6443_2 to disk instead.
22/04/27 11:25:52 WARN MemoryStore: Not enough space to cache rdd_6443_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:25:52 WARN BlockManager: Persisting block rdd_6443_3 to disk instead.
22/04/27 11:25:53 WARN MemoryStore: Not enough space to cache rdd_6443_0 in memory! (computed 355.0 MiB so far)
22/04/27 11:25:53 WARN BlockManager: Persisting block rdd_6443_0 to disk instead.
22/04/27 11:25:53 WARN MemoryStore: Not enough space to cache rdd_6443_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:25:56 WARN Memor

22/04/27 11:26:36 WARN MemoryStore: Not enough space to cache rdd_6623_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:26:40 WARN MemoryStore: Not enough space to cache rdd_6623_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:26:40 WARN MemoryStore: Not enough space to cache rdd_6623_1 in memory! (computed 155.9 MiB so far)
22/04/27 11:26:40 WARN MemoryStore: Not enough space to cache rdd_6623_2 in memory! (computed 355.2 MiB so far)
22/04/27 11:26:42 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:26:42 WARN MemoryStore: Not enough space to cache rdd_6623_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:26:42 WARN MemoryStore: Not enough space to cache rdd_6623_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:26:42 WARN MemoryStore: Not enough space to cache rdd_6623_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:26:42 WARN MemoryStore: Not enough space to cache rdd_6623_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:26:44 W

22/04/27 11:27:30 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:27:31 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:27:31 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:27:32 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:27:32 WARN MemoryStore: Not enough space to cache rdd_6769_0 in memory! (computed 67.5 MiB so far)
22/04/27 11:27:32 WARN BlockManager: Persisting block rdd_6769_0 to disk instead.
22/04/27 11:27:32 WARN MemoryStore: Not enough space to cache rdd_6769_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:27:32 WARN BlockManager: Persisting block rdd_6769_3 to disk instead.
22/04/27 11:27:32 WARN MemoryStore: Not enough space to cache rdd_6769_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:27:32 WARN BlockManager: Persisting block rdd_6769_2 to disk instead.
22/04/27 11:27:33 WARN MemoryStore: Not enough space to cache rdd_6769_1 in

22/04/27 11:28:12 WARN MemoryStore: Not enough space to cache rdd_6944_2 in memory! (computed 44.0 MiB so far)
22/04/27 11:28:12 WARN MemoryStore: Not enough space to cache rdd_6944_1 in memory! (computed 29.2 MiB so far)
22/04/27 11:28:12 WARN MemoryStore: Not enough space to cache rdd_6944_0 in memory! (computed 67.5 MiB so far)
22/04/27 11:28:14 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:28:15 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:28:15 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:28:16 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:28:16 WARN MemoryStore: Not enough space to cache rdd_6979_2 in memory! (computed 67.5 MiB so far)
22/04/27 11:28:16 WARN BlockManager: Persisting block rdd_6979_2 to disk instead.
22/04/27 11:28:17 WARN MemoryStore: Not enough space to cache rdd_6979_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:28:17 

22/04/27 11:28:51 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:28:51 WARN MemoryStore: Not enough space to cache rdd_7067_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:28:51 WARN MemoryStore: Not enough space to cache rdd_7067_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:28:51 WARN MemoryStore: Not enough space to cache rdd_7067_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:28:51 WARN MemoryStore: Not enough space to cache rdd_7067_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:28:53 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:28:53 WARN MemoryStore: Not enough space to cache rdd_7067_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:28:53 WARN MemoryStore: Not enough space to cache rdd_7067_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:28:53 WARN MemoryStore: Not enough space to cache rdd_7067_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:28:53 WARN MemoryStore: Not enou

22/04/27 11:29:29 WARN MemoryStore: Not enough space to cache rdd_7225_2 in memory! (computed 67.5 MiB so far)
22/04/27 11:29:29 WARN BlockManager: Persisting block rdd_7225_2 to disk instead.
22/04/27 11:29:29 WARN MemoryStore: Not enough space to cache rdd_7225_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:29:29 WARN BlockManager: Persisting block rdd_7225_1 to disk instead.
22/04/27 11:29:29 WARN MemoryStore: Not enough space to cache rdd_7225_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:29:29 WARN BlockManager: Persisting block rdd_7225_3 to disk instead.
22/04/27 11:29:30 WARN MemoryStore: Not enough space to cache rdd_7225_0 in memory! (computed 355.0 MiB so far)
22/04/27 11:29:30 WARN BlockManager: Persisting block rdd_7225_0 to disk instead.
22/04/27 11:29:30 WARN MemoryStore: Not enough space to cache rdd_7225_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:29:33 WARN MemoryStore: Not enough space to cache rdd_7225_1 in memory! (computed 103.7 MiB so far)
22

22/04/27 11:30:08 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:30:08 WARN MemoryStore: Not enough space to cache rdd_7371_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:30:08 WARN MemoryStore: Not enough space to cache rdd_7371_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:30:08 WARN MemoryStore: Not enough space to cache rdd_7371_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:30:08 WARN MemoryStore: Not enough space to cache rdd_7371_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:30:10 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:30:10 WARN MemoryStore: Not enough space to cache rdd_7371_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:30:10 WARN MemoryStore: Not enough space to cache rdd_7371_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:30:10 WARN MemoryStore: Not enough space to cache rdd_7371_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:30:10 WARN MemoryStore: Not enou

22/04/27 11:31:25 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:31:25 WARN MemoryStore: Not enough space to cache rdd_7681_1 in memory! (computed 67.5 MiB so far)
22/04/27 11:31:25 WARN BlockManager: Persisting block rdd_7681_1 to disk instead.
22/04/27 11:31:25 WARN MemoryStore: Not enough space to cache rdd_7681_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:31:25 WARN BlockManager: Persisting block rdd_7681_3 to disk instead.
22/04/27 11:31:25 WARN MemoryStore: Not enough space to cache rdd_7681_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:31:25 WARN BlockManager: Persisting block rdd_7681_2 to disk instead.
22/04/27 11:31:26 WARN MemoryStore: Not enough space to cache rdd_7681_0 in memory! (computed 355.0 MiB so far)
22/04/27 11:31:26 WARN BlockManager: Persisting block rdd_7681_0 to disk instead.
22/04/27 11:31:26 WARN MemoryStore: Not enough space to cache rdd_7681_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:31:29 WARN Memor

22/04/27 11:32:41 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:32:41 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:32:41 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:32:42 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:32:43 WARN MemoryStore: Not enough space to cache rdd_7962_1 in memory! (computed 67.5 MiB so far)
22/04/27 11:32:43 WARN BlockManager: Persisting block rdd_7962_1 to disk instead.
22/04/27 11:32:43 WARN MemoryStore: Not enough space to cache rdd_7962_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:32:43 WARN BlockManager: Persisting block rdd_7962_2 to disk instead.
22/04/27 11:32:43 WARN MemoryStore: Not enough space to cache rdd_7962_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:32:43 WARN BlockManager: Persisting block rdd_7962_3 to disk instead.
22/04/27 11:32:44 WARN MemoryStore: Not enough space to cache rdd_7962_0 in

22/04/27 11:33:21 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:33:21 WARN MemoryStore: Not enough space to cache rdd_8073_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:33:21 WARN MemoryStore: Not enough space to cache rdd_8073_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:33:21 WARN MemoryStore: Not enough space to cache rdd_8073_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:33:21 WARN MemoryStore: Not enough space to cache rdd_8073_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:33:23 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:33:24 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:33:24 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:33:25 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:33:25 WARN MemoryStore: Not enough space to cache rdd_8120_2 in memory! (computed 67.5 MiB so far)
22/04/27 11

22/04/27 11:33:59 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:33:59 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:33:59 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:34:00 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:34:00 WARN MemoryStore: Not enough space to cache rdd_8243_3 in memory! (computed 67.5 MiB so far)
22/04/27 11:34:00 WARN BlockManager: Persisting block rdd_8243_3 to disk instead.
22/04/27 11:34:01 WARN MemoryStore: Not enough space to cache rdd_8243_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:34:01 WARN BlockManager: Persisting block rdd_8243_2 to disk instead.
22/04/27 11:34:01 WARN MemoryStore: Not enough space to cache rdd_8243_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:34:01 WARN BlockManager: Persisting block rdd_8243_1 to disk instead.
22/04/27 11:34:01 WARN MemoryStore: Not enough space to cache rdd_8243_0 in

22/04/27 11:34:37 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:34:37 WARN MemoryStore: Not enough space to cache rdd_8383_3 in memory! (computed 103.7 MiB so far)
22/04/27 11:34:37 WARN MemoryStore: Not enough space to cache rdd_8383_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:34:37 WARN MemoryStore: Not enough space to cache rdd_8383_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:34:37 WARN MemoryStore: Not enough space to cache rdd_8383_0 in memory! (computed 103.7 MiB so far)
22/04/27 11:34:39 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:34:40 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:34:40 WARN DAGScheduler: Broadcasting large task binary with size 2.8 MiB
22/04/27 11:34:41 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:34:41 WARN MemoryStore: Not enough space to cache rdd_8424_0 in memory! (computed 67.5 MiB so far)
22/04/27 11

22/04/27 11:35:20 WARN DAGScheduler: Broadcasting large task binary with size 3.0 MiB
22/04/27 11:35:21 WARN MemoryStore: Not enough space to cache rdd_8524_3 in memory! (computed 67.5 MiB so far)
22/04/27 11:35:21 WARN BlockManager: Persisting block rdd_8524_3 to disk instead.
22/04/27 11:35:21 WARN MemoryStore: Not enough space to cache rdd_8524_2 in memory! (computed 103.7 MiB so far)
22/04/27 11:35:21 WARN BlockManager: Persisting block rdd_8524_2 to disk instead.
22/04/27 11:35:21 WARN MemoryStore: Not enough space to cache rdd_8524_1 in memory! (computed 103.7 MiB so far)
22/04/27 11:35:21 WARN BlockManager: Persisting block rdd_8524_1 to disk instead.
22/04/27 11:35:22 WARN MemoryStore: Not enough space to cache rdd_8524_0 in memory! (computed 355.0 MiB so far)
22/04/27 11:35:22 WARN BlockManager: Persisting block rdd_8524_0 to disk instead.
22/04/27 11:35:22 WARN MemoryStore: Not enough space to cache rdd_8524_3 in memory! (computed 44.0 MiB so far)
22/04/27 11:35:24 WARN Memor

In [11]:
test_metric = evaluator.evaluate(cvModel.transform(testData))

22/04/27 11:37:25 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
                                                                                

In [12]:
test_metric

0.7981008444584603

## Print the parameters from the best model selected

In [26]:
best_Model = cvModel.bestModel
for i in range(len(best_Model.stages)):
    print(best_Model.stages[i])

StringIndexerModel: uid=StringIndexer_722d5a54cc00, handleInvalid=error
VectorIndexerModel: uid=VectorIndexer_fca0cf2b4740, numFeatures=8018, handleInvalid=error
DecisionTreeClassificationModel: uid=DecisionTreeClassifier_69aa8a77b962, depth=7, numNodes=179, numClasses=2, numFeatures=8018


In [28]:
java_model = best_Model.stages[-1]._java_obj
{param.name: java_model.getOrDefault(java_model.getParam(param.name)) 
    for param in grid[0]}

{'impurity': 'gini', 'maxBins': 5, 'minInfoGain': 0.0, 'maxDepth': 7}

In [29]:
predictionAndLabels = best_Model.transform(testData)

## Calculate and print the Area under ROC Curve and Area under Precision-Recall Curve scores

In [30]:

metrics = BinaryClassificationMetrics(predictionAndLabels.select("prediction","indexedLabel").rdd)


print("Area under PR = %s" % metrics.areaUnderPR)


print("Area under ROC = %s" % metrics.areaUnderROC)

22/04/27 11:50:09 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB
22/04/27 11:50:11 WARN DAGScheduler: Broadcasting large task binary with size 2.4 MiB

Area under PR = 0.4831531580955708
Area under ROC = 0.6112066519249517


                                                                                