# Spam SMS Classifier

##### Required dependencies

In [32]:
from pyspark.sql import SparkSession
from pyspark.ml.feature import StringIndexer, Tokenizer, StopWordsRemover, CountVectorizer, IDF, VectorAssembler
from pyspark.ml.classification import LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, NaiveBayes
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

## Load data and inspect it

In [33]:
spark = SparkSession.builder.appName("SMS Spam Classifier").getOrCreate()

import logging
logger = spark._jvm.org.apache.log4j
logging.getLogger("py4j").setLevel(logging.ERROR)
logger.LogManager.getLogger("org").setLevel(logger.Level.ERROR)

##### We can now load the data into a Spark DataFrame and rename the columns:

In [34]:
data = spark.read.format("csv").option("header", "false").option("delimiter", "\t").load("sms_spam_collection.txt")
data = data.withColumnRenamed("_c0", "class").withColumnRenamed("_c1", "text")

In [35]:
data.show()

+-----+--------------------+
|class|                text|
+-----+--------------------+
|  ham|Go until jurong p...|
|  ham|Ok lar... Joking ...|
| spam|Free entry in 2 a...|
|  ham|U dun say so earl...|
|  ham|Nah I don't think...|
| spam|FreeMsg Hey there...|
|  ham|Even my brother i...|
|  ham|As per your reque...|
| spam|WINNER!! As a val...|
| spam|Had your mobile 1...|
|  ham|I'm gonna be home...|
| spam|SIX chances to wi...|
| spam|URGENT! You have ...|
|  ham|I've been searchi...|
|  ham|I HAVE A DATE ON ...|
| spam|XXXMobileMovieClu...|
|  ham|Oh k...i'm watchi...|
|  ham|Eh u remember how...|
|  ham|Fine if thats th...|
| spam|England v Macedon...|
+-----+--------------------+
only showing top 20 rows



## Transform data

In [36]:
# Next, convert the "class" column to numeric values
indexer = StringIndexer(inputCol="class", outputCol="label")
data = indexer.fit(data).transform(data)
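
By default, `StringIndexer` assigns indices by descending label frequency, so the majority class `ham` becomes `0.0` and `spam` becomes `1.0` (consistent with the label mean of about 0.134 reported further down). The following plain-Python sketch mimics that ordering on made-up labels. The helper name `frequency_desc_index` and the alphabetical tie-break are assumptions for illustration, not part of the notebook.

```python
from collections import Counter

def frequency_desc_index(labels):
    """Map each distinct label to a float index, most frequent label first."""
    counts = Counter(labels)
    # Sort by descending count; ties are broken alphabetically (an assumption
    # made here so the sketch is deterministic).
    ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))
    return {lab: float(i) for i, lab in enumerate(ordered)}

# Toy labels (not the real dataset): ham is the majority class.
toy_labels = ["ham", "ham", "spam", "ham", "spam", "ham"]
mapping = frequency_desc_index(toy_labels)
print(mapping)
```

Knowing which class maps to `1.0` matters later when reading per-class metrics such as precision or recall for spam.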

In [37]:
# Convert the text column into a list of words and drop the stop words
tokenizer = Tokenizer(inputCol="text", outputCol="words")
stopwords_remover = StopWordsRemover(inputCol="words", outputCol="filtered_words")
data = tokenizer.transform(data)
data = stopwords_remover.transform(data)
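
To make the effect of this step concrete, here is a rough plain-Python approximation of what `Tokenizer` followed by `StopWordsRemover` produces: the text is lowercased and split on whitespace, then common words are filtered out. The stop-word set below is a tiny hypothetical stand-in; Spark's default English list is much longer.

```python
import re

# Tiny hypothetical stop-word list, for illustration only.
STOP_WORDS = {"i", "the", "a", "to", "in", "is", "you", "have"}

def tokenize(text):
    # Lowercase, then split on single whitespace characters.
    # Punctuation is NOT stripped, so "URGENT!" becomes "urgent!".
    return re.split(r"\s", text.lower())

def remove_stop_words(tokens):
    # Keep only tokens that are not in the stop-word set.
    return [t for t in tokens if t not in STOP_WORDS]

tokens = tokenize("URGENT! You have won a prize")
filtered = remove_stop_words(tokens)
print(tokens)    # ['urgent!', 'you', 'have', 'won', 'a', 'prize']
print(filtered)  # ['urgent!', 'won', 'prize']
```

Because punctuation stays attached to words, `urgent!` and `urgent` end up as different vocabulary entries. If that is undesirable, `RegexTokenizer` from `pyspark.ml.feature` may be a better fit, since it splits on a configurable pattern.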

In [38]:
# Turn the word lists into word-count vectors
count_vectorizer = CountVectorizer(inputCol="filtered_words", outputCol="raw_features")
model = count_vectorizer.fit(data)
data = model.transform(data)

In [39]:
# Rescale the raw counts by inverse document frequency (TF-IDF)
idf = IDF(inputCol="raw_features", outputCol="features")
idf_model = idf.fit(data)
data = idf_model.transform(data)
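
The two cells above compute TF-IDF weights: raw term counts multiplied by an inverse document frequency. Spark ML documents its IDF as `log((m + 1) / (df + 1))`, where `m` is the number of documents and `df` is the number of documents containing the term. The sketch below reproduces that formula in plain Python on three made-up documents. The helper names `fit_idf` and `tf_idf` are hypothetical.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Compute idf(t) = ln((m + 1) / (df(t) + 1)) over tokenized documents."""
    m = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log((m + 1) / (df_t + 1)) for term, df_t in df.items()}

def tf_idf(doc, idf):
    """Weight raw term counts by a previously fitted IDF table."""
    tf = Counter(doc)
    # Terms unseen at fit time are dropped, as they have no vocabulary slot.
    return {term: count * idf[term] for term, count in tf.items() if term in idf}

# Toy corpus, for illustration only.
docs = [["free", "entry", "free"], ["ok", "lar"], ["free", "msg"]]
idf = fit_idf(docs)
weights = tf_idf(docs[0], idf)
print(weights)
```

Note that `fit_idf` only sees the documents it is given. In this notebook the vocabulary and IDF are fitted on the full dataset before the train/test split, so test-set statistics influence the features. Fitting these stages on the training split only (for example inside a `pyspark.ml.Pipeline`) would likely give a cleaner estimate of generalization.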

In [40]:
# With a single vector input, VectorAssembler simply copies "features" into "features_vector"
assembler = VectorAssembler(inputCols=["features"], outputCol="features_vector")
data = assembler.transform(data)

In [41]:
data.describe().show()

+-------+-----+--------------------+------------------+
|summary|class|                text|             label|
+-------+-----+--------------------+------------------+
|  count| 5574|                5574|              5574|
|   mean| null|               645.0|0.1340150699677072|
| stddev| null|                null|0.3406990688361999|
|    min|  ham| &lt;#&gt;  in mc...|               0.0|
|    max| spam|… we r stayin her...|               1.0|
+-------+-----+--------------------+------------------+



In [42]:
(training_data, test_data) = data.randomSplit([0.7, 0.3], seed=100)

In [43]:
training_data.describe().show()

+-------+-----+--------------------+------------------+
|summary|class|                text|             label|
+-------+-----+--------------------+------------------+
|  count| 3907|                3907|              3907|
|   mean| null|               645.0| 0.135398003583312|
| stddev| null|                null|0.3421919853904309|
|    min|  ham| &lt;#&gt;  in mc...|               0.0|
|    max| spam|… we r stayin her...|               1.0|
+-------+-----+--------------------+------------------+



In [44]:
test_data.describe().show()

+-------+-----+--------------------+------------------+
|summary|class|                text|             label|
+-------+-----+--------------------+------------------+
|  count| 1667|                1667|              1667|
|   mean| null|                null|0.1307738452309538|
| stddev| null|                null|0.3372540246678357|
|    min|  ham| and  picking the...|               0.0|
|    max| spam|Ü thk of wat to e...|               1.0|
+-------+-----+--------------------+------------------+



## Use different models

##### Logistic regression, decision tree, Naive Bayes

In [45]:
# Create the models
log_reg = LogisticRegression(featuresCol="features_vector", labelCol="label", maxIter=10)
dt = DecisionTreeClassifier(featuresCol="features_vector", labelCol="label")
nb = NaiveBayes(featuresCol="features_vector", labelCol="label")

# Train the models
log_reg_model = log_reg.fit(training_data)
dt_model = dt.fit(training_data)
nb_model = nb.fit(training_data)


In [46]:
log_reg_predictions = log_reg_model.transform(test_data)
dt_predictions = dt_model.transform(test_data)
nb_predictions = nb_model.transform(test_data)

In [47]:
evaluator = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="accuracy")

In [None]:
log_reg_accuracy = evaluator.evaluate(log_reg_predictions)
dt_accuracy = evaluator.evaluate(dt_predictions)
nb_accuracy = evaluator.evaluate(nb_predictions)

print("Logistic regression accuracy = %g" % (log_reg_accuracy))
print("Decision tree accuracy = %g" % (dt_accuracy))
print("Naive bayes accuracy = %g" % (nb_accuracy))

Logistic regression accuracy = 0.982004
Decision tree accuracy = 0.925615
Naive bayes accuracy = 0.916617
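
Accuracy alone can be misleading here because the classes are imbalanced: the test set has 1667 rows with a label mean of about 0.1308, so always predicting `ham` already scores roughly 0.87. The sketch below computes that baseline from the figures reported above and shows how precision, recall and F1 for the spam class are derived. The `y_true` and `y_pred` lists are made-up toy values, not the notebook's predictions, and both helper names are hypothetical.

```python
def majority_baseline_accuracy(n_rows, positive_rate):
    """Accuracy obtained by always predicting the majority class."""
    n_pos = round(n_rows * positive_rate)
    return max(n_pos, n_rows - n_pos) / n_rows

def precision_recall_f1(y_true, y_pred, positive=1.0):
    """Precision, recall and F1 for one class, from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Row count and label mean taken from test_data.describe() above.
baseline = majority_baseline_accuracy(1667, 0.1307738452309538)
print(f"Majority-class baseline accuracy = {baseline:.4f}")

# Toy labels, for illustration only (1.0 = spam).
y_true = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
y_pred = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Against that baseline, the decision tree and Naive Bayes scores above are only a few points better than predicting `ham` every time. `MulticlassClassificationEvaluator` also accepts `metricName="f1"`, `"weightedPrecision"` and `"weightedRecall"`, which may be worth reporting alongside accuracy.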


## Random Forest with CrossValidator

In [49]:
# Note: this model reads the "features" column directly; "features_vector" holds the same vectors
rf = RandomForestClassifier(labelCol="label", featuresCol="features")

paramGrid = ParamGridBuilder() \
    .addGrid(rf.numTrees, [10, 20, 30]) \
    .addGrid(rf.maxDepth, [2, 4, 6, 8]) \
    .build()

evaluator = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="accuracy")

cv = CrossValidator(estimator=rf, estimatorParamMaps=paramGrid, evaluator=evaluator, numFolds=5)

rf_model = cv.fit(training_data)
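
To get a feel for the cost of this search, the sketch below enumerates the same grid in plain Python and counts the model fits: every parameter combination is trained once per fold, and `CrossValidator` then refits the best combination on the full training set. The dictionary keys simply mirror the parameter names used above.

```python
from itertools import product

# Same values as the ParamGridBuilder call above.
num_trees = [10, 20, 30]
max_depth = [2, 4, 6, 8]
num_folds = 5

# Cartesian product of the two parameter lists.
grid = [{"numTrees": n, "maxDepth": d} for n, d in product(num_trees, max_depth)]

cv_fits = len(grid) * num_folds  # one fit per combination per fold
total_fits = cv_fits + 1         # plus the final refit of the best combination

print(len(grid), cv_fits, total_fits)  # 12 60 61
```

After fitting, the returned `CrossValidatorModel` exposes `bestModel` and `avgMetrics`, which can be used to see which combination won and how close the others were. If the search is too slow, the `parallelism` parameter of `CrossValidator` may help.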

23/02/27 14:18:46 WARN DAGScheduler: Broadcasting large task binary with size 1551.5 KiB
23/02/27 14:18:46 WARN DAGScheduler: Broadcasting large task binary with size 1558.8 KiB
23/02/27 14:18:46 WARN DAGScheduler: Broadcasting large task binary with size 1302.1 KiB
23/02/27 14:18:46 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:18:46 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:18:46 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:18:49 WARN DAGScheduler: Broadcasting large task binary with size 1532.8 KiB
23/02/27 14:18:49 WARN DAGScheduler: Broadcasting large task binary with size 1542.0 KiB
23/02/27 14:18:49 WARN DAGScheduler: Broadcasting large task binary with size 1551.5 KiB
23/02/27 14:18:50 WARN DAGScheduler: Broadcasting large task binary with size 1558.8 KiB
23/02/27 14:18:50 WARN DAGScheduler: Broadcasting large task binary with size 1567.1 KiB
23/02/27 14:18:50 WARN DAGScheduler: Broadcasting large task binary with size 1576.1 KiB
23/02/27 14:18:50 WARN DAGScheduler: Broadcasting large task binary with size 1311.1 KiB
23/02/27 14:18:50 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:18:50 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:18:50 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:18:53 WARN DAGScheduler: Broadcasting large task binary with size 1532.8 KiB
23/02/27 14:18:53 WARN DAGScheduler: Broadcasting large task binary with size 1542.0 KiB
23/02/27 14:18:54 WARN DAGScheduler: Broadcasting large task binary with size 1551.5 KiB
23/02/27 14:18:54 WARN DAGScheduler: Broadcasting large task binary with size 1558.8 KiB
23/02/27 14:18:54 WARN DAGScheduler: Broadcasting large task binary with size 1567.1 KiB
23/02/27 14:18:54 WARN DAGScheduler: Broadcasting large task binary with size 1576.1 KiB
23/02/27 14:18:54 WARN DAGScheduler: Broadcasting large task binary with size 1586.0 KiB
23/02/27 14:18:54 WARN DAGScheduler: Broadcasting large task binary with size 1595.8 KiB
23/02/27 14:18:54 WARN DAGScheduler: Broadcasting large task binary with size 1320.3 KiB
23/02/27 14:18:54 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:18:54 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:18:54 WAR

                                                                                

23/02/27 14:18:57 WARN DAGScheduler: Broadcasting large task binary with size 1538.1 KiB
23/02/27 14:18:58 WARN DAGScheduler: Broadcasting large task binary with size 1550.0 KiB
23/02/27 14:18:58 WARN DAGScheduler: Broadcasting large task binary with size 1317.2 KiB
23/02/27 14:18:58 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:18:58 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:18:58 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:19:01 WARN DAGScheduler: Broadcasting large task binary with size 1538.1 KiB
23/02/27 14:19:02 WARN DAGScheduler: Broadcasting large task binary with size 1550.0 KiB
23/02/27 14:19:02 WARN DAGScheduler: Broadcasting large task binary with size 1562.5 KiB
23/02/27 14:19:02 WARN DAGScheduler: Broadcasting large task binary with size 1575.9 KiB
23/02/27 14:19:02 WARN DAGScheduler: Broadcasting large task binary with size 1328.9 KiB
23/02/27 14:19:02 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:02 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:02 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:19:05 WARN DAGScheduler: Broadcasting large task binary with size 1538.1 KiB
23/02/27 14:19:06 WARN DAGScheduler: Broadcasting large task binary with size 1550.0 KiB
23/02/27 14:19:06 WARN DAGScheduler: Broadcasting large task binary with size 1562.5 KiB
23/02/27 14:19:06 WARN DAGScheduler: Broadcasting large task binary with size 1575.9 KiB
23/02/27 14:19:06 WARN DAGScheduler: Broadcasting large task binary with size 1591.1 KiB
23/02/27 14:19:06 WARN DAGScheduler: Broadcasting large task binary with size 1606.3 KiB
23/02/27 14:19:06 WARN DAGScheduler: Broadcasting large task binary with size 1343.1 KiB
23/02/27 14:19:06 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:06 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:06 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:19:09 WARN DAGScheduler: Broadcasting large task binary with size 1538.1 KiB
23/02/27 14:19:10 WARN DAGScheduler: Broadcasting large task binary with size 1550.0 KiB
23/02/27 14:19:10 WARN DAGScheduler: Broadcasting large task binary with size 1562.5 KiB
23/02/27 14:19:10 WARN DAGScheduler: Broadcasting large task binary with size 1575.9 KiB
23/02/27 14:19:10 WARN DAGScheduler: Broadcasting large task binary with size 1591.1 KiB
23/02/27 14:19:10 WARN DAGScheduler: Broadcasting large task binary with size 1606.3 KiB
23/02/27 14:19:10 WARN DAGScheduler: Broadcasting large task binary with size 1619.4 KiB
23/02/27 14:19:11 WARN DAGScheduler: Broadcasting large task binary with size 1632.2 KiB
23/02/27 14:19:11 WARN DAGScheduler: Broadcasting large task binary with size 1358.0 KiB
23/02/27 14:19:11 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:11 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:11 WAR

                                                                                

23/02/27 14:19:14 WARN DAGScheduler: Broadcasting large task binary with size 1527.3 KiB
23/02/27 14:19:15 WARN DAGScheduler: Broadcasting large task binary with size 1531.4 KiB
23/02/27 14:19:15 WARN DAGScheduler: Broadcasting large task binary with size 1271.1 KiB
23/02/27 14:19:15 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:15 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:15 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:19:18 WARN DAGScheduler: Broadcasting large task binary with size 1527.3 KiB
23/02/27 14:19:18 WARN DAGScheduler: Broadcasting large task binary with size 1531.4 KiB
23/02/27 14:19:18 WARN DAGScheduler: Broadcasting large task binary with size 1536.2 KiB
23/02/27 14:19:18 WARN DAGScheduler: Broadcasting large task binary with size 1541.1 KiB
23/02/27 14:19:19 WARN DAGScheduler: Broadcasting large task binary with size 1274.8 KiB
23/02/27 14:19:19 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:19 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:19 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:19:22 WARN DAGScheduler: Broadcasting large task binary with size 1527.3 KiB
23/02/27 14:19:22 WARN DAGScheduler: Broadcasting large task binary with size 1531.4 KiB
23/02/27 14:19:22 WARN DAGScheduler: Broadcasting large task binary with size 1536.2 KiB
23/02/27 14:19:22 WARN DAGScheduler: Broadcasting large task binary with size 1541.1 KiB
23/02/27 14:19:22 WARN DAGScheduler: Broadcasting large task binary with size 1544.3 KiB
23/02/27 14:19:22 WARN DAGScheduler: Broadcasting large task binary with size 1548.9 KiB
23/02/27 14:19:23 WARN DAGScheduler: Broadcasting large task binary with size 1278.9 KiB
23/02/27 14:19:23 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:23 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:23 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:19:26 WARN DAGScheduler: Broadcasting large task binary with size 1527.3 KiB
23/02/27 14:19:26 WARN DAGScheduler: Broadcasting large task binary with size 1531.4 KiB
23/02/27 14:19:26 WARN DAGScheduler: Broadcasting large task binary with size 1536.2 KiB
23/02/27 14:19:26 WARN DAGScheduler: Broadcasting large task binary with size 1541.1 KiB
23/02/27 14:19:26 WARN DAGScheduler: Broadcasting large task binary with size 1544.3 KiB
23/02/27 14:19:26 WARN DAGScheduler: Broadcasting large task binary with size 1548.9 KiB
23/02/27 14:19:26 WARN DAGScheduler: Broadcasting large task binary with size 1553.1 KiB
23/02/27 14:19:27 WARN DAGScheduler: Broadcasting large task binary with size 1558.7 KiB
23/02/27 14:19:27 WARN DAGScheduler: Broadcasting large task binary with size 1284.1 KiB
23/02/27 14:19:27 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:27 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:27 WAR

                                                                                

23/02/27 14:19:30 WARN DAGScheduler: Broadcasting large task binary with size 1532.5 KiB
23/02/27 14:19:30 WARN DAGScheduler: Broadcasting large task binary with size 1540.8 KiB
23/02/27 14:19:30 WARN DAGScheduler: Broadcasting large task binary with size 1294.5 KiB
23/02/27 14:19:30 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:30 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:31 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:19:34 WARN DAGScheduler: Broadcasting large task binary with size 1532.5 KiB
23/02/27 14:19:34 WARN DAGScheduler: Broadcasting large task binary with size 1540.8 KiB
23/02/27 14:19:34 WARN DAGScheduler: Broadcasting large task binary with size 1549.3 KiB
23/02/27 14:19:34 WARN DAGScheduler: Broadcasting large task binary with size 1557.2 KiB
23/02/27 14:19:34 WARN DAGScheduler: Broadcasting large task binary with size 1302.3 KiB
23/02/27 14:19:34 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:34 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:34 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:19:37 WARN DAGScheduler: Broadcasting large task binary with size 1532.5 KiB
23/02/27 14:19:38 WARN DAGScheduler: Broadcasting large task binary with size 1540.8 KiB
23/02/27 14:19:38 WARN DAGScheduler: Broadcasting large task binary with size 1549.3 KiB
23/02/27 14:19:38 WARN DAGScheduler: Broadcasting large task binary with size 1557.2 KiB
23/02/27 14:19:38 WARN DAGScheduler: Broadcasting large task binary with size 1566.2 KiB
23/02/27 14:19:38 WARN DAGScheduler: Broadcasting large task binary with size 1574.7 KiB
23/02/27 14:19:38 WARN DAGScheduler: Broadcasting large task binary with size 1311.5 KiB
23/02/27 14:19:38 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:38 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:38 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:19:41 WARN DAGScheduler: Broadcasting large task binary with size 1532.5 KiB
23/02/27 14:19:42 WARN DAGScheduler: Broadcasting large task binary with size 1540.8 KiB
23/02/27 14:19:42 WARN DAGScheduler: Broadcasting large task binary with size 1549.3 KiB
23/02/27 14:19:42 WARN DAGScheduler: Broadcasting large task binary with size 1557.2 KiB
23/02/27 14:19:42 WARN DAGScheduler: Broadcasting large task binary with size 1566.2 KiB
23/02/27 14:19:42 WARN DAGScheduler: Broadcasting large task binary with size 1574.7 KiB
23/02/27 14:19:42 WARN DAGScheduler: Broadcasting large task binary with size 1584.2 KiB
23/02/27 14:19:42 WARN DAGScheduler: Broadcasting large task binary with size 1594.0 KiB
23/02/27 14:19:43 WARN DAGScheduler: Broadcasting large task binary with size 1320.7 KiB
23/02/27 14:19:43 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:43 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:43 WAR

                                                                                

23/02/27 14:19:46 WARN DAGScheduler: Broadcasting large task binary with size 1537.8 KiB
23/02/27 14:19:46 WARN DAGScheduler: Broadcasting large task binary with size 1550.2 KiB
23/02/27 14:19:46 WARN DAGScheduler: Broadcasting large task binary with size 1317.4 KiB
23/02/27 14:19:46 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:46 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:46 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:19:49 WARN DAGScheduler: Broadcasting large task binary with size 1537.8 KiB
23/02/27 14:19:50 WARN DAGScheduler: Broadcasting large task binary with size 1550.2 KiB
23/02/27 14:19:50 WARN DAGScheduler: Broadcasting large task binary with size 1562.8 KiB
23/02/27 14:19:50 WARN DAGScheduler: Broadcasting large task binary with size 1574.8 KiB
23/02/27 14:19:50 WARN DAGScheduler: Broadcasting large task binary with size 1328.7 KiB
23/02/27 14:19:50 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:50 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:50 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:19:53 WARN DAGScheduler: Broadcasting large task binary with size 1537.8 KiB
23/02/27 14:19:54 WARN DAGScheduler: Broadcasting large task binary with size 1550.2 KiB
23/02/27 14:19:54 WARN DAGScheduler: Broadcasting large task binary with size 1562.8 KiB
23/02/27 14:19:54 WARN DAGScheduler: Broadcasting large task binary with size 1574.8 KiB
23/02/27 14:19:54 WARN DAGScheduler: Broadcasting large task binary with size 1587.8 KiB
23/02/27 14:19:54 WARN DAGScheduler: Broadcasting large task binary with size 1599.6 KiB
23/02/27 14:19:54 WARN DAGScheduler: Broadcasting large task binary with size 1339.2 KiB
23/02/27 14:19:55 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:55 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:55 WARN DAGScheduler: Broadcasting large task binary with size 1366.1 KiB


                                                                                

23/02/27 14:19:58 WARN DAGScheduler: Broadcasting large task binary with size 1537.8 KiB
23/02/27 14:19:58 WARN DAGScheduler: Broadcasting large task binary with size 1550.2 KiB
23/02/27 14:19:58 WARN DAGScheduler: Broadcasting large task binary with size 1562.8 KiB
23/02/27 14:19:58 WARN DAGScheduler: Broadcasting large task binary with size 1574.8 KiB
23/02/27 14:19:58 WARN DAGScheduler: Broadcasting large task binary with size 1587.8 KiB
23/02/27 14:19:58 WARN DAGScheduler: Broadcasting large task binary with size 1599.6 KiB
23/02/27 14:19:58 WARN DAGScheduler: Broadcasting large task binary with size 1613.7 KiB
23/02/27 14:19:59 WARN DAGScheduler: Broadcasting large task binary with size 1629.5 KiB
23/02/27 14:19:59 WARN DAGScheduler: Broadcasting large task binary with size 1353.4 KiB
23/02/27 14:19:59 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:19:59 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:19:59 WAR

                                                                                

23/02/27 14:20:02 WARN DAGScheduler: Broadcasting large task binary with size 1529.1 KiB
23/02/27 14:20:02 WARN DAGScheduler: Broadcasting large task binary with size 1533.7 KiB
23/02/27 14:20:03 WARN DAGScheduler: Broadcasting large task binary with size 1271.1 KiB
23/02/27 14:20:03 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:20:03 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:20:03 WARN DAGScheduler: Broadcasting large task binary with size 1366.0 KiB


                                                                                

23/02/27 14:20:06 WARN DAGScheduler: Broadcasting large task binary with size 1529.1 KiB
23/02/27 14:20:06 WARN DAGScheduler: Broadcasting large task binary with size 1533.7 KiB
23/02/27 14:20:06 WARN DAGScheduler: Broadcasting large task binary with size 1538.7 KiB
23/02/27 14:20:06 WARN DAGScheduler: Broadcasting large task binary with size 1542.5 KiB
23/02/27 14:20:07 WARN DAGScheduler: Broadcasting large task binary with size 1274.8 KiB
23/02/27 14:20:07 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:20:07 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:20:07 WARN DAGScheduler: Broadcasting large task binary with size 1366.0 KiB


                                                                                

23/02/27 14:20:10 WARN DAGScheduler: Broadcasting large task binary with size 1529.1 KiB
23/02/27 14:20:10 WARN DAGScheduler: Broadcasting large task binary with size 1533.7 KiB
23/02/27 14:20:10 WARN DAGScheduler: Broadcasting large task binary with size 1538.7 KiB
23/02/27 14:20:10 WARN DAGScheduler: Broadcasting large task binary with size 1542.5 KiB
23/02/27 14:20:10 WARN DAGScheduler: Broadcasting large task binary with size 1546.4 KiB
23/02/27 14:20:10 WARN DAGScheduler: Broadcasting large task binary with size 1551.1 KiB
23/02/27 14:20:11 WARN DAGScheduler: Broadcasting large task binary with size 1279.9 KiB
23/02/27 14:20:11 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:20:11 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:20:11 WARN DAGScheduler: Broadcasting large task binary with size 1366.0 KiB


                                                                                

23/02/27 14:20:14 WARN DAGScheduler: Broadcasting large task binary with size 1529.1 KiB
23/02/27 14:20:14 WARN DAGScheduler: Broadcasting large task binary with size 1533.7 KiB
23/02/27 14:20:14 WARN DAGScheduler: Broadcasting large task binary with size 1538.7 KiB
23/02/27 14:20:14 WARN DAGScheduler: Broadcasting large task binary with size 1542.5 KiB
23/02/27 14:20:14 WARN DAGScheduler: Broadcasting large task binary with size 1546.4 KiB
23/02/27 14:20:14 WARN DAGScheduler: Broadcasting large task binary with size 1551.1 KiB
23/02/27 14:20:14 WARN DAGScheduler: Broadcasting large task binary with size 1555.0 KiB
23/02/27 14:20:15 WARN DAGScheduler: Broadcasting large task binary with size 1559.9 KiB
23/02/27 14:20:15 WARN DAGScheduler: Broadcasting large task binary with size 1284.4 KiB
23/02/27 14:20:15 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:20:15 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:20:15 WAR

                                                                                

23/02/27 14:20:18 WARN DAGScheduler: Broadcasting large task binary with size 1534.4 KiB
23/02/27 14:20:18 WARN DAGScheduler: Broadcasting large task binary with size 1543.1 KiB
23/02/27 14:20:18 WARN DAGScheduler: Broadcasting large task binary with size 1294.7 KiB
23/02/27 14:20:18 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:20:19 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:20:19 WARN DAGScheduler: Broadcasting large task binary with size 1366.0 KiB


                                                                                

23/02/27 14:20:22 WARN DAGScheduler: Broadcasting large task binary with size 1534.4 KiB
23/02/27 14:20:22 WARN DAGScheduler: Broadcasting large task binary with size 1543.1 KiB
23/02/27 14:20:22 WARN DAGScheduler: Broadcasting large task binary with size 1554.1 KiB
23/02/27 14:20:22 WARN DAGScheduler: Broadcasting large task binary with size 1563.1 KiB
23/02/27 14:20:22 WARN DAGScheduler: Broadcasting large task binary with size 1303.2 KiB
23/02/27 14:20:22 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:20:22 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:20:23 WARN DAGScheduler: Broadcasting large task binary with size 1366.0 KiB


                                                                                

23/02/27 14:20:26 WARN DAGScheduler: Broadcasting large task binary with size 1534.4 KiB
23/02/27 14:20:26 WARN DAGScheduler: Broadcasting large task binary with size 1543.1 KiB
23/02/27 14:20:26 WARN DAGScheduler: Broadcasting large task binary with size 1554.1 KiB
23/02/27 14:20:26 WARN DAGScheduler: Broadcasting large task binary with size 1563.1 KiB
23/02/27 14:20:26 WARN DAGScheduler: Broadcasting large task binary with size 1570.4 KiB
23/02/27 14:20:26 WARN DAGScheduler: Broadcasting large task binary with size 1578.0 KiB
23/02/27 14:20:26 WARN DAGScheduler: Broadcasting large task binary with size 1312.1 KiB
23/02/27 14:20:27 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:20:27 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:20:27 WARN DAGScheduler: Broadcasting large task binary with size 1366.0 KiB


                                                                                

23/02/27 14:20:30 WARN DAGScheduler: Broadcasting large task binary with size 1534.4 KiB
23/02/27 14:20:30 WARN DAGScheduler: Broadcasting large task binary with size 1543.1 KiB
23/02/27 14:20:30 WARN DAGScheduler: Broadcasting large task binary with size 1554.1 KiB
23/02/27 14:20:30 WARN DAGScheduler: Broadcasting large task binary with size 1563.1 KiB
23/02/27 14:20:30 WARN DAGScheduler: Broadcasting large task binary with size 1570.4 KiB
23/02/27 14:20:30 WARN DAGScheduler: Broadcasting large task binary with size 1578.0 KiB
23/02/27 14:20:30 WARN DAGScheduler: Broadcasting large task binary with size 1587.7 KiB
23/02/27 14:20:31 WARN DAGScheduler: Broadcasting large task binary with size 1597.6 KiB
23/02/27 14:20:31 WARN DAGScheduler: Broadcasting large task binary with size 1320.5 KiB
23/02/27 14:20:31 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:20:31 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:20:31 WAR

                                                                                

23/02/27 14:20:34 WARN DAGScheduler: Broadcasting large task binary with size 1539.6 KiB
23/02/27 14:20:34 WARN DAGScheduler: Broadcasting large task binary with size 1552.5 KiB
23/02/27 14:20:35 WARN DAGScheduler: Broadcasting large task binary with size 1317.9 KiB
23/02/27 14:20:35 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:20:35 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:20:35 WARN DAGScheduler: Broadcasting large task binary with size 1366.0 KiB


                                                                                

23/02/27 14:20:38 WARN DAGScheduler: Broadcasting large task binary with size 1539.6 KiB
23/02/27 14:20:38 WARN DAGScheduler: Broadcasting large task binary with size 1552.5 KiB
23/02/27 14:20:38 WARN DAGScheduler: Broadcasting large task binary with size 1566.4 KiB
23/02/27 14:20:38 WARN DAGScheduler: Broadcasting large task binary with size 1580.2 KiB
23/02/27 14:20:38 WARN DAGScheduler: Broadcasting large task binary with size 1330.0 KiB
23/02/27 14:20:39 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:20:39 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:20:39 WARN DAGScheduler: Broadcasting large task binary with size 1366.0 KiB


                                                                                

23/02/27 14:20:42 WARN DAGScheduler: Broadcasting large task binary with size 1539.6 KiB
23/02/27 14:20:42 WARN DAGScheduler: Broadcasting large task binary with size 1552.5 KiB
23/02/27 14:20:42 WARN DAGScheduler: Broadcasting large task binary with size 1566.4 KiB
23/02/27 14:20:42 WARN DAGScheduler: Broadcasting large task binary with size 1580.2 KiB
23/02/27 14:20:42 WARN DAGScheduler: Broadcasting large task binary with size 1591.0 KiB
23/02/27 14:20:43 WARN DAGScheduler: Broadcasting large task binary with size 1605.9 KiB
23/02/27 14:20:43 WARN DAGScheduler: Broadcasting large task binary with size 1343.1 KiB
23/02/27 14:20:43 WARN DAGScheduler: Broadcasting large task binary with size 1227.0 KiB
23/02/27 14:20:43 WARN DAGScheduler: Broadcasting large task binary with size 1227.1 KiB
23/02/27 14:20:43 WARN DAGScheduler: Broadcasting large task binary with size 1366.0 KiB


                                                                                

23/02/27 14:20:46 WARN DAGScheduler: Broadcasting large task binary with size 1539.6 KiB
23/02/27 14:20:46 WARN DAGScheduler: Broadcasting large task binary with size 1552.5 KiB
23/02/27 14:20:46 WARN DAGScheduler: Broadcasting large task binary with size 1566.4 KiB
23/02/27 14:20:47 WARN DAGScheduler: Broadcasting large task binary with size 1580.2 KiB
23/02/27 14:20:47 WARN DAGScheduler: Broadcasting large task binary with size 1591.0 KiB
23/02/27 14:20:47 WARN DAGScheduler: Broadcasting large task binary with size 1605.9 KiB
23/02/27 14:20:47 WARN DAGScheduler: Broadcasting large task binary with size 1620.3 KiB
23/02/27 14:20:47 WARN DAGScheduler: Broadcasting large task binary with size 1637.3 KiB
23/02/27 14:20:47 WARN DAGScheduler: Broadcasting large task binary with size 1357.0 KiB
23/02/27 14:20:47 WARN DAGScheduler: Broadcasting large task binary with size 1193.3 KiB
23/02/27 14:20:47 WARN DAGScheduler: Broadcasting large task binary with size 1193.4 KiB
23/02/27 14:20:47 WAR

                                                                                

23/02/27 14:20:51 WARN DAGScheduler: Broadcasting large task binary with size 1513.3 KiB


                                                                                

23/02/27 14:20:52 WARN DAGScheduler: Broadcasting large task binary with size 1516.9 KiB
23/02/27 14:20:52 WARN DAGScheduler: Broadcasting large task binary with size 1521.3 KiB
23/02/27 14:20:52 WARN DAGScheduler: Broadcasting large task binary with size 1526.2 KiB
23/02/27 14:20:52 WARN DAGScheduler: Broadcasting large task binary with size 1533.0 KiB
23/02/27 14:20:52 WARN DAGScheduler: Broadcasting large task binary with size 1536.9 KiB
23/02/27 14:20:52 WARN DAGScheduler: Broadcasting large task binary with size 1540.4 KiB
23/02/27 14:20:52 WARN DAGScheduler: Broadcasting large task binary with size 1544.7 KiB


In [50]:
predictions = rf_model.transform(test_data)

In [53]:
accuracy = evaluator.evaluate(predictions)
print("Accuracy = %g" % (accuracy))

Accuracy = 0.885423


## Compare models

In [62]:
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

model_names = ["Logistic regression", "Decision tree", "Naive bayes", "Random forest"]
models = [log_reg_model, dt_model, nb_model, rf_model]

# Initialize the evaluators
evaluator_f1 = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="f1")
evaluator_acc = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="accuracy")
evaluator_pres = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="weightedPrecision")
evaluator_recall = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="weightedRecall")

# Create a list of dictionaries to store the results
results = []
for i, model in enumerate(models):
    # Evaluate the model on the test set
    test_predictions = model.transform(test_data)
    f1_score = evaluator_f1.evaluate(test_predictions)
    accuracy = evaluator_acc.evaluate(test_predictions)
    precision = evaluator_pres.evaluate(test_predictions)
    recall = evaluator_recall.evaluate(test_predictions)
    # Add the results to the list
    results.append({
        "Model": model_names[i],
        "F1 Score": f1_score,
        "Accuracy": accuracy,
        "Precision": precision,
        "Recall": recall
    })



In [63]:
# Print the results table
print("{:<25} {:<10} {:<10} {:<10} {:<10}".format('Model', 'F1 Score', 'Accuracy', 'Precision', 'Recall'))
print("=" * 90)
for result in results:
    print("{:<25} {:<10.2f} {:<10.2f} {:<10.2f} {:<10.2f}".format(result['Model'], result['F1 Score'], result['Accuracy'], result['Precision'], result['Recall']))

Model                     F1 Score   Accuracy   Precision  Recall    
Logistic regression       0.98       0.98       0.98       0.98      
Decision tree             0.92       0.93       0.92       0.93      
Naive bayes               0.92       0.92       0.94       0.92      
Random forest             0.84       0.89       0.90       0.89      


The logistic regression model performed best, with an F1 score, accuracy, precision and recall of 0.98 each. On this test set it classifies SMS messages as spam or ham very reliably.

The decision tree model also performed well, with an F1 score of 0.92, a precision of 0.92 and a recall of 0.93.

The Naive Bayes model was comparable to the decision tree: it reached the same F1 score (0.92) and a nearly identical recall (0.92 versus 0.93), with a slightly higher precision (0.94 versus 0.92).

The random forest model performed worst on every metric, with an F1 score of 0.84, a precision of 0.90 and a recall of 0.89.
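
To make the comparison above easier to check, the rounded scores from the table can be ranked per metric. This is only a minimal sketch in plain Python: the `table` list re-enters the rounded values printed above rather than recomputing them, and `rank_by` is a hypothetical helper that is not part of the notebook.

```python
# Rounded scores copied from the results table above (not recomputed).
table = [
    {"Model": "Logistic regression", "F1 Score": 0.98, "Accuracy": 0.98, "Precision": 0.98, "Recall": 0.98},
    {"Model": "Decision tree", "F1 Score": 0.92, "Accuracy": 0.93, "Precision": 0.92, "Recall": 0.93},
    {"Model": "Naive bayes", "F1 Score": 0.92, "Accuracy": 0.92, "Precision": 0.94, "Recall": 0.92},
    {"Model": "Random forest", "F1 Score": 0.84, "Accuracy": 0.89, "Precision": 0.90, "Recall": 0.89},
]


def rank_by(rows, metric):
    """Return the model names sorted from best to worst on the given metric.

    Ties keep the order of the input rows, because sorted() is stable.
    """
    ordered = sorted(rows, key=lambda row: row[metric], reverse=True)
    return [row["Model"] for row in ordered]


for metric in ("F1 Score", "Accuracy", "Precision", "Recall"):
    print(metric, "->", rank_by(table, metric))
```

With the full-precision values in `results` instead of the rounded ones, the same helper could be used to break the tie between the decision tree and Naive Bayes on F1 score.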

In general, the choice of model depends on the priorities of the application. Here, however, logistic regression has the highest score on every metric, so it is the natural choice whether precision, recall or a balance of the two matters most. Among the other three models, Naive Bayes has the highest precision (0.94) and the decision tree the highest recall (0.93). Note that the precision and recall reported here are weighted averages over both classes, so an application that cares specifically about the spam class should also check the per-class scores.
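
The precision and recall in the table come from the `weightedPrecision` and `weightedRecall` metrics, which average the per-class scores weighted by class frequency. The sketch below illustrates, in plain Python and on made-up labels, how such weighted scores can differ from the scores of the spam class alone. The helpers `per_class_metrics` and `weighted_average` are hypothetical, and the toy data is an assumption for illustration, not taken from the SMS dataset.

```python
def per_class_metrics(labels, predictions):
    """Return {class: (precision, recall, support)} for every class in labels."""
    metrics = {}
    for cls in sorted(set(labels)):
        true_pos = sum(1 for y, p in zip(labels, predictions) if y == cls and p == cls)
        predicted = sum(1 for p in predictions if p == cls)
        support = sum(1 for y in labels if y == cls)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support
        metrics[cls] = (precision, recall, support)
    return metrics


def weighted_average(metrics, index):
    """Average one metric (0 = precision, 1 = recall), weighted by class support."""
    total = sum(support for _, _, support in metrics.values())
    return sum(values[index] * values[2] for values in metrics.values()) / total


# Toy data (assumed): 8 ham (0.0) and 2 spam (1.0) messages,
# where the classifier misses one of the two spam messages.
labels = [0.0] * 8 + [1.0] * 2
predictions = [0.0] * 8 + [1.0, 0.0]

metrics = per_class_metrics(labels, predictions)
spam_precision, spam_recall, _ = metrics[1.0]

print("spam precision:", spam_precision)
print("spam recall:", spam_recall)
print("weighted precision:", round(weighted_average(metrics, 0), 3))
print("weighted recall:", round(weighted_average(metrics, 1), 3))
```

In this toy case the weighted recall is 0.9 even though only half of the spam messages are caught. Weighted recall is also always equal to accuracy, which is why the Accuracy and Recall columns of the table match for every model.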