### YouTube Comments Analysis

The dataset provided for this project consists of comments on videos related to animals and/or pets. The dataset is 240MB compressed; it was downloaded from the Google Drive link below:
https://drive.google.com/file/d/1o3DsS3jN_t2Mw3TsV0i7ySRmh9kyYi1a/view?usp=sharing

In [3]:
# link: https://drive.google.com/file/d/1o3DsS3jN_t2Mw3TsV0i7ySRmh9kyYi1a/view?usp=sharing
# need to install googledrivedownloader==0.4 on cluster beforehand
from google_drive_downloader import GoogleDriveDownloader as gdd
gdd.download_file_from_google_drive(file_id='1o3DsS3jN_t2Mw3TsV0i7ySRmh9kyYi1a', dest_path='./../../dbfs/laioffer/spark_hw3/data/animal_comments.gz')



In [4]:

%sh
gunzip -k /dbfs/laioffer/spark_hw3/data/animal_comments.gz
ls ../../dbfs/laioffer/spark_hw3/data/

In [5]:
%sh
cd /dbfs/laioffer/spark_hw3/data/

In this section, we download the data from Google Drive using the googledrivedownloader package and then unzip the file. The unzipped file is now at /dbfs/laioffer/spark_hw3/data/animal_comments
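
If the `%sh` magic is not available, the unzip step can also be done from Python. The following is a minimal sketch using only the standard library; the helper name `gunzip_keep` is hypothetical, and it keeps the compressed file in place as `gunzip -k` does.

```python
import gzip
import shutil


def gunzip_keep(src_path, dest_path):
    """Decompress src_path into dest_path and leave src_path untouched."""
    with gzip.open(src_path, "rb") as compressed, open(dest_path, "wb") as plain:
        # Copy in chunks so the whole file never has to fit in memory.
        shutil.copyfileobj(compressed, plain)


# Example call with the paths used in this notebook:
# gunzip_keep("/dbfs/laioffer/spark_hw3/data/animal_comments.gz",
#             "/dbfs/laioffer/spark_hw3/data/animal_comments")
```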

#### 0. Data Exploration and Cleaning

In this section, we first do a preliminary exploration of the original dataset and then drop all rows whose comment is null.

In [9]:
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .appName("youtube analysis") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

In [10]:
df_clean = spark.table("animal_comments")
df_clean.show(10)

In [11]:
df_clean.count() 

In [12]:
# Drop rows whose comment is null.
df_clean = df_clean.na.drop(subset=["comment"])
df_clean.count()

In [13]:
df_clean.show()

In [14]:
# find users who indicate they own a dog or a cat
from pyspark.sql.functions import when
from pyspark.sql.functions import col

# you can use your own way to extract the label

df_clean = df_clean.withColumn("label", \
                           (when(col("comment").like("%my dog%"), 1) \
                           .when(col("comment").like("%I have a dog%"), 1) \
                           .when(col("comment").like("%my cat%"), 1) \
                           .when(col("comment").like("%I have a cat%"), 1) \
                           .when(col("comment").like("%my puppy%"), 1) \
                           .when(col("comment").like("%my pup%"), 1) \
                           .when(col("comment").like("%my kitty%"), 1) \
                           .when(col("comment").like("%my pussy%"), 1) \
                           .otherwise(0)))

In [15]:
df_clean
display(df_clean)

creator_name,userid,comment,label
Doug The Pug,87.0,I shared this to my friends and mom the were lol,0
Doug The Pug,87.0,Super cute 😀🐕🐶,0
bulletproof,530.0,stop saying get em youre literally dumb . have some common sense or dont own this kind of dog. fucking retarded I swear,0
Meu Zoológico,670.0,Tenho uma jiboia e um largato,0
ojatro,1031.0,I wanna see what happened to the pigs after that please,0
Tingle Triggers,1212.0,Well shit now Im hungry,0
Hope For Paws - Official Rescue Channel,1806.0,when I saw the end it said to adopt I saw different animal sites I was mad that they separated the cute little pups after being together for a long time,0
Hope For Paws - Official Rescue Channel,2036.0,Holy crap. That is quite literally the most adorable pup Ive ever seen.,0
Life Story,2637.0,武器はクエストで貰えるんじゃないんですか？,0
Brian Barczyk,2698.0,Call the teddy Larry,0


In [16]:
print("have pet:", df_clean.filter(col('label') == 1.0).distinct().count())
print("no pet:", df_clean.filter(col('label') == 0.0).distinct().count())

In the above cells, I labeled users as pet owners when their comments contain phrases such as "my dog" or "I have a cat". This is a keyword rule, not manual annotation. The ratio of pet owners to non pet owners is roughly 1:143 (40110:5717102).
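
The labeling rule is plain substring matching, so it can be sketched outside Spark. The snippet below is only an illustration: `OWNER_PHRASES` lists a subset of the phrases from the labeling cell, and `label_comment` is a hypothetical helper. Like SQL `LIKE`, the match is case-sensitive, so "My dog" at the start of a sentence is not caught.

```python
# A subset of the phrases used in the LIKE patterns of the labeling cell.
OWNER_PHRASES = ["my dog", "I have a dog", "my cat", "I have a cat", "my pup", "my kitty"]


def label_comment(comment):
    """Return 1 if the comment contains any owner phrase, else 0 (case-sensitive)."""
    return int(any(phrase in comment for phrase in OWNER_PHRASES))


# Class ratio from the counts reported in the text.
have_pet, no_pet = 40110, 5717102
negatives_per_positive = no_pet / have_pet  # about 142.5 unlabeled rows per labeled row

print(label_comment("I took my dog to the park"))
print(label_comment("My dog is cute"))
print(round(negatives_per_positive, 1))
```

Lowercasing the comment before matching would be one way to catch the capitalized variants, at the cost of needing lowercase patterns as well.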

#### 1. Data preprocessing and Build the classifier

In the section below, I first use a RegexTokenizer to split each comment into lowercase words, treating non-word characters as gaps. I then use Word2Vec to compute neural word embeddings. The output is one feature vector per comment.
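
To make the tokenization step concrete, here is a rough pure-Python approximation of what the tokenizer does with the pattern `\W`. It assumes the tokenizer's default settings (lowercasing, splitting on gaps, dropping empty tokens) and uses `re.ASCII` to mimic Java's ASCII-only `\W`; the function name `regex_tokenize` is hypothetical.

```python
import re

# Java's \W is ASCII-only by default, so re.ASCII is used here to mimic it.
_GAP = re.compile(r"\W", re.ASCII)


def regex_tokenize(text, min_token_length=1):
    """Lowercase, split on every non-word character, and drop empty tokens."""
    return [tok for tok in _GAP.split(text.lower()) if len(tok) >= min_token_length]


print(regex_tokenize("Super cute 😀🐕🐶"))
print(regex_tokenize("武器はクエストで貰えるんじゃないんですか？"))
```

Under this approximation emoji and non-Latin scripts are treated as gaps, which is consistent with the empty `words` list shown for the Japanese comment in the output of section 2.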

In [20]:
from pyspark.ml.feature import RegexTokenizer, Word2Vec
from pyspark.ml.classification import LogisticRegression

# regular expression tokenizer
regexTokenizer = RegexTokenizer(inputCol="comment", outputCol="words", pattern="\\W")

word2Vec = Word2Vec(inputCol="words", outputCol="features")

In [21]:
from pyspark.ml import Pipeline

pipeline = Pipeline(stages=[regexTokenizer, word2Vec])

# Fit the pipeline to training documents.
pipeline_fit = pipeline.fit(df_clean)
dataset = pipeline_fit.transform(df_clean)

In [22]:
dataset.show()
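
The `features` column holds one vector per comment rather than one per word. As I understand Spark's Word2Vec model, it gets there by averaging the vectors of the words in each comment. The sketch below illustrates that idea with hypothetical 2-dimensional embeddings; dividing by the total token count, including words missing from the vocabulary, is an assumption about Spark's behavior rather than something shown in this notebook.

```python
def document_vector(words, word_vectors, size):
    """Average word vectors over all tokens; unknown words contribute zeros."""
    if not words:
        # An empty comment maps to an all-zero vector.
        return [0.0] * size
    total = [0.0] * size
    for word in words:
        vec = word_vectors.get(word)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return [t / len(words) for t in total]


# Hypothetical toy embeddings, for illustration only.
toy_vectors = {"super": [1.0, 0.0], "cute": [0.0, 1.0]}
print(document_vector(["super", "cute"], toy_vectors, 2))
```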

To train and test the model, we first split the data into training and test sets. As shown in the previous section, the data is highly imbalanced: very few rows are labeled 1 (pet owners) and the vast majority are labeled 0. Such imbalance is a major obstacle when training a classifier. To address it, we randomly undersample the rows labeled 0.

In [24]:
(lable1_train,lable1_test)=dataset.filter(col('label')==1).randomSplit([0.7, 0.3],seed = 100)
(lable0_train, lable0_ex)=dataset.filter(col('label')==0).randomSplit([0.01, 0.99],seed = 100)
(lable0_test, lable0_ex2)=lable0_ex.randomSplit([0.004, 0.996],seed = 100)


In [25]:
trainingData = lable0_train.union(lable1_train)
testData=lable0_test.union(lable1_test)

In [26]:
print("Dataset Count: " + str(dataset.count()))
print("Training Dataset Count: " + str(trainingData.count()))
print("Test Dataset Count: " + str(testData.count()))
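
As a rough sanity check on the counts printed above, the expected split sizes can be worked out from the split weights. This sketch assumes the class counts reported earlier (40110 and 5717102), which were computed on distinct rows, and `randomSplit` is only approximate, so the actual counts will differ somewhat.

```python
def expected_split_sizes(n_pos, n_neg):
    """Expected row counts implied by the randomSplit weights used above."""
    return {
        "train_pos": round(0.7 * n_pos),
        "test_pos": round(0.3 * n_pos),
        "train_neg": round(0.01 * n_neg),
        # The negative test set is 0.4% of the 99% left after the first split.
        "test_neg": round(0.99 * 0.004 * n_neg),
    }


sizes = expected_split_sizes(40110, 5717102)
train_pos_share = sizes["train_pos"] / (sizes["train_pos"] + sizes["train_neg"])
test_pos_share = sizes["test_pos"] / (sizes["test_pos"] + sizes["test_neg"])
print(sizes, round(train_pos_share, 2), round(test_pos_share, 2))
```

Under these assumptions, positives make up about a third of both the training and the test set, compared with well under 1% of the full data.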

##### LogisticRegression

In the section below, we build a logistic regression model to classify pet owners based on their YouTube comments. We first fit the model on the training data, then use ParamGridBuilder and CrossValidator to tune and validate it.
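
To give a sense of how much work this tuning involves, the sketch below enumerates the parameter grid defined in the next cell and counts the model fits needed for 5-fold cross validation. It is plain Python and does not touch Spark; the variable names are illustrative.

```python
from itertools import product

# Same values as the ParamGridBuilder call in the next cell.
grid = {
    "maxIter": [5, 10],
    "regParam": [0.1, 0.01, 0.001],
    "elasticNetParam": [0.75, 0.8, 0.85],
}

# Every combination of one value per parameter.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

num_folds = 5
fits_during_search = len(combinations) * num_folds

print(len(combinations), fits_during_search)
```

Each of the 18 combinations is trained once per fold, so the search costs 90 fits before the best combination is refit on the full training set.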

In [29]:
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.tuning import CrossValidator,ParamGridBuilder
from pyspark.ml.evaluation import BinaryClassificationEvaluator

#build a logistic model 
lr = LogisticRegression(featuresCol="features",labelCol="label" )

#train model
lrModel = lr.fit(trainingData)

#tune the model over max iterations, regularization parameter and elastic net mixing parameter
param_grid = ParamGridBuilder().addGrid(lr.maxIter,[5, 10]).addGrid(lr.regParam, [0.1, 0.01, 0.001]).addGrid(lr.elasticNetParam, [0.75, 0.8, 0.85]).build()

#set up the evaluator and define the proper metrics as AUC
evaluator = BinaryClassificationEvaluator(labelCol="label", rawPredictionCol="rawPrediction", metricName="areaUnderROC")

#5-fold cross validation
crossval = CrossValidator(estimator=lr,
                          estimatorParamMaps=param_grid,
                          evaluator=evaluator,
                          numFolds=5)

In [30]:
#fit the model and extract the best model
model = crossval.fit(trainingData)
best_model = model.bestModel

In [31]:
#use best model to predict the test data
predictions=best_model.transform(testData)
predictions.show(10)

In [32]:

print ("**Best Model**")
print (" Elastic:" +str(best_model._java_obj.parent().getElasticNetParam()))
print (" MaxIter:" +str(best_model._java_obj.parent().getMaxIter()))
print (" RegParam:" + str(best_model._java_obj.parent().getRegParam()))

In [33]:
training_summary = best_model.summary
training_summary.roc.show()

In [34]:
print("areaUnderROC(AUC): " + str(training_summary.areaUnderROC))
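
The AUC printed above can be read as the probability that a randomly chosen positive row receives a higher score than a randomly chosen negative row. The pairwise sketch below illustrates that definition on hypothetical scores; it is quadratic in the number of rows and is not how Spark computes the metric, so values may differ slightly from the evaluator's.

```python
def auc_from_scores(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores: 3 of the 4 positive/negative pairs are ordered correctly.
print(auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```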

In [35]:
predictions.show(10)


In the above section, we trained, tuned and tested the model. Next, we evaluate the best logistic regression model using accuracy, precision, recall and AUC.
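
The first three metrics follow directly from the confusion-matrix counts. A small helper such as the one below (a sketch, with a hypothetical name and hypothetical counts) computes them once and guards against division by zero when a model predicts no positives.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: of the rows predicted positive, how many are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of the truly positive rows, how many were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=8, fp=2, tn=85, fn=5))
```

The same helper could be reused for the random forest and gradient boosting models instead of repeating the arithmetic in each evaluation cell.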

In [37]:
TP = predictions.filter((col("label") == 1) & (col("prediction") == 1)).count()
FP = predictions.filter((col("label") == 0) & (col("prediction") == 1)).count()
TN = predictions.filter((col("label") == 0) & (col("prediction") == 0)).count()
FN = predictions.filter((col("label") == 1) & (col("prediction") == 0)).count()

accuracy = (TP + TN) / (TP + FP + TN + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)

print ("True Positives:", TP)
print ("False Positives:", FP)
print ("True Negatives:", TN)
print ("False Negatives:", FN)
print ("Test Accuracy:", accuracy)
print ("Test Precision:", precision)
print ("Test Recall:", recall)
print ("Test AUC of ROC:", evaluator.evaluate(predictions))  # AUC on the test predictions, not the training summary

##### RandomForest

After the logistic regression model, we try a random forest model. We repeat the same train and test procedure as for logistic regression.

In [40]:
from pyspark.ml.classification import RandomForestClassifier

#build a random forest model
rf = RandomForestClassifier(labelCol="label" , featuresCol="features", numTrees=10)

#tune the model over number of trees and max depth
param_grid = ParamGridBuilder().addGrid(rf.numTrees, [5, 10]).addGrid(rf.maxDepth, [3, 4, 5]).build()

#set up the evaluator and define the proper metrics as AUC
evaluator = BinaryClassificationEvaluator(labelCol="label", rawPredictionCol="rawPrediction", metricName="areaUnderROC")

#5-fold cross validation
crossval = CrossValidator(estimator=rf, estimatorParamMaps=param_grid, evaluator=evaluator, numFolds=5)

model = crossval.fit(trainingData)

#extract the best model
best_model = model.bestModel

#predict with test data
predictions=best_model.transform(testData)
predictions.show(10)

In [41]:
print ("**Best Model**")
print (" numTree:" +str(best_model._java_obj.parent().getNumTrees()))
print (" maxDepth:" + str(best_model._java_obj.parent().getMaxDepth()))

In [42]:
from pyspark.ml.evaluation import BinaryClassificationEvaluator

evaluator = BinaryClassificationEvaluator(labelCol="label", rawPredictionCol="rawPrediction", metricName="areaUnderROC")
AUC = evaluator.evaluate(predictions)

In [43]:
TP = predictions.filter((col("label") == 1) & (col("prediction") == 1)).count()
FP = predictions.filter((col("label") == 0) & (col("prediction") == 1)).count()
TN = predictions.filter((col("label") == 0) & (col("prediction") == 0)).count()
FN = predictions.filter((col("label") == 1) & (col("prediction") == 0)).count()

accuracy = (TP + TN) / (TP + FP + TN + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)

print ("True Positives:", TP)
print ("False Positives:", FP)
print ("True Negatives:", TN)
print ("False Negatives:", FN)
print ("Test Accuracy:", accuracy)
print ("Test Precision:", precision)
print ("Test Recall:", recall)
print ("Test AUC of ROC:", AUC)

##### Gradient boosting

In this part, we train a gradient boosting tree model.

In [46]:
from pyspark.ml.classification import GBTClassifier

GDBT= GBTClassifier(labelCol="label" , featuresCol="features")


param_grid = ParamGridBuilder().addGrid(GDBT.stepSize, [0.0001, 0.001, 0.01, 0.1, 0.2, 0.3]).addGrid(GDBT.maxDepth, [3, 4, 5]).build()

evaluator = BinaryClassificationEvaluator(labelCol="label", rawPredictionCol="rawPrediction", metricName="areaUnderROC")

crossval = CrossValidator(estimator=GDBT,
                          estimatorParamMaps=param_grid,
                          evaluator=evaluator,
                          numFolds=5)
model = crossval.fit(trainingData)
best_model = model.bestModel
predictions=best_model.transform(testData)
predictions.show(10)

In [47]:
print ("**Best Model**")
print (" learning rate:" +str(best_model._java_obj.parent().getStepSize()))
print (" max depth:" + str(best_model._java_obj.parent().getMaxDepth()))

In [48]:
from pyspark.ml.evaluation import BinaryClassificationEvaluator

evaluator = BinaryClassificationEvaluator(labelCol="label", rawPredictionCol="rawPrediction", metricName="areaUnderROC")
AUC = evaluator.evaluate(predictions)

In [49]:
TP = predictions.filter((col("label") == 1) & (col("prediction") == 1)).count()
FP = predictions.filter((col("label") == 0) & (col("prediction") == 1)).count()
TN = predictions.filter((col("label") == 0) & (col("prediction") == 0)).count()
FN = predictions.filter((col("label") == 1) & (col("prediction") == 0)).count()

accuracy = (TP + TN) / (TP + FP + TN + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)

print ("True Positives:", TP)
print ("False Positives:", FP)
print ("True Negatives:", TN)
print ("False Negatives:", FN)
print ("Test Accuracy:", accuracy)
print ("Test Precision:", precision)
print ("Test Recall:", recall)
print ("Test AUC of ROC:", AUC)

The best gradient boosting model has a test AUC of 0.97, the highest of the three models, so we choose it, with a learning rate of 0.3 and a max depth of 5.

#### 2. Classify All The Users

The users labeled 1 are those we are confident are pet owners. The users labeled 0 did not indicate ownership in their comments; they are probably not pet owners, but we cannot be sure. We therefore run the rows labeled 0 through the best model to classify those users, then combine the newly classified set with the previously labeled set to obtain all pet owners in the dataset.
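
The combination step amounts to a set union over user IDs. The sketch below illustrates it on hypothetical `(userid, label)` and `(userid, prediction)` pairs; in the notebook itself the equivalent would be done on Spark DataFrames. Working with sets also removes duplicates for users who left several comments.

```python
def all_pet_owner_ids(labeled_rows, predicted_rows):
    """Union of users labeled 1 by the keyword rule and users predicted 1 by the model."""
    labeled = {uid for uid, label in labeled_rows if label == 1}
    predicted = {uid for uid, prediction in predicted_rows if prediction == 1}
    return labeled | predicted


# Hypothetical rows: user 87 appears in both inputs and is counted once.
labeled_rows = [(87, 1), (530, 0), (670, 0)]
predicted_rows = [(530, 1), (670, 0), (87, 1)]
print(sorted(all_pet_owner_ids(labeled_rows, predicted_rows)))
```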

In [53]:
df_all0 = dataset.filter((col("label") == 0))
display(df_all0)

creator_name,userid,comment,label,words,features
Doug The Pug,87.0,I shared this to my friends and mom the were lol,0,"List(i, shared, this, to, my, friends, and, mom, the, were, lol)","List(1, 100, List(), List(0.01784441399980675, -0.020379101772877304, 0.12678783395412294, -0.03029621936465529, -0.017744215416975996, 0.022211988710544327, -0.11156877909194339, 0.1452448835427111, 0.11735361069440842, 0.025073884224349804, -0.10923891103911129, 0.018014919690110466, -0.08526893506165255, -0.11025032222228633, -0.11347918474877423, -0.009192273698069834, -0.057526062581349506, 0.061800442043353214, -0.12616047330877997, -0.058853128416971726, 0.007348398805003275, 0.015894510867920788, 0.03985926652835174, -0.04098683371293274, 0.0723224056260237, -0.05881796857680787, 0.10799326906404035, 0.08547606941481883, -0.034434194189750335, 0.050004090055484667, 0.0025668262758038263, 0.05669034407897429, 0.019723338244313545, 0.01534969667607749, -0.01758848630230535, -0.06886818903413686, 0.056710246950387955, 0.036821977997368034, -0.048310428519140594, 0.21720414032990282, -0.12300450515679338, -0.04557176570365713, -0.02733937947248871, -0.020434588532556187, 0.093020345677029, 0.011690069836649029, 0.030635233291170814, -0.13460135779512877, 0.020283695500852034, 0.016415423967621544, -0.11034134131940929, -0.19184056059880691, -0.1208180675113743, 0.07561470805244012, 0.24518262865868481, -0.07774396096779541, 0.07317259772257372, -0.20141763443296606, 0.019589733671058308, 0.004427224820987745, 0.06102355414027856, 0.04739846458489245, 0.16153187728063628, 0.03162644363262437, -0.012427768077362667, 0.006310680373148485, 0.13922911255874418, 0.06705699482170696, 0.11559060859409245, 0.03718750894239003, 0.034705567554655405, -0.0036969812912998764, 0.14047886058688164, -0.010464219884438948, -0.13436831580474973, -0.01427018828690052, 0.05217442593791268, 0.20534982532262802, 0.0060194998302242975, -0.007256170819428834, 0.0036896970123052597, -0.03589680528437549, 0.027363426475362346, -0.0011968112296678803, 
-0.03598377921364524, 0.06176092894747853, -0.03407896360890432, -0.032485040904165624, -0.14415372704917734, 0.09035833958875049, 0.03175183148546652, -0.11052434180270541, -0.1539065070788969, 0.02812724154103886, 0.038774443472820254, 0.054216154935685074, -6.559804420579563E-4, -0.06792547320947051, -0.04848683320663192, 0.031643440227278254))"
Doug The Pug,87.0,Super cute 😀🐕🐶,0,"List(super, cute)","List(1, 100, List(), List(-0.2883199453353882, -0.03971422091126442, 0.26716596633195877, 0.17103654332458973, 0.2350746262818575, 0.10617415606975555, 0.30452148616313934, 0.17845770105486736, -0.023327077738940716, 0.300587996840477, -0.2319678533822298, 0.22827358543872833, 0.2698281481862068, 0.1508161723613739, -0.1575203153770417, 0.04653379041701555, -0.016199510544538498, -0.2715160623192787, 0.07917454093694687, 0.171640757471323, 0.08451637253165245, 0.4065053462982178, 0.31024640426039696, -0.1791418492794037, -0.3042733445763588, -0.11411438323557377, 0.17948328703641891, -0.1565072238445282, 0.14295890927314758, -0.002457171678543091, -0.1484678965061903, 0.41716181486845016, -0.16310758143663406, -0.3879905492067337, 0.10439009498804808, 0.0014057625085115433, 0.2724960520863533, -0.0018823854625225067, 0.08369084633886814, 0.1448155753314495, -0.03386777639389038, 0.11670710518956184, -0.0755307525396347, 0.052051153033971786, -0.08758007176220417, -0.02676476538181305, 0.08695698529481888, -0.11278240010142326, 0.4293353706598282, -0.13709503412246704, -0.050914641469717026, 0.12392759881913662, -0.045077405869960785, -0.0732060894370079, 0.12585338577628136, 0.08409356884658337, 0.2712007239460945, 0.10406625643372536, 0.20672351121902466, 0.13562875613570213, 0.24732454866170883, 0.07576844841241837, -0.04651647061109543, 0.011338777840137482, 0.2172185704112053, -0.09791882336139679, -0.0460000354796648, 0.21088873967528343, 0.044737400487065315, -0.18638974241912365, -0.06498082354664803, 0.20667102932929993, -0.17841801792383194, -0.22705168277025223, -0.4249563291668892, 0.034786973148584366, 0.18176992610096931, 0.06318297516554594, -0.010194666683673859, 0.1647276058793068, 0.2260626144707203, -0.23128195945173502, 0.1281356979161501, -0.07588844373822212, 0.026304202154278755, 0.22604744881391525, -0.018072262406349182, 0.4002431035041809, 0.03795202076435089, 0.1791907474398613, 
0.12564293667674065, -0.16447316110134125, 0.2370682805776596, -0.10017823334783316, 0.07772793640833697, 0.103927843272686, -0.11603093147277832, 0.05644687544554472, 0.01362236263230443, -0.02477256953716278))"
bulletproof,530.0,stop saying get em youre literally dumb . have some common sense or dont own this kind of dog. fucking retarded I swear,0,"List(stop, saying, get, em, youre, literally, dumb, have, some, common, sense, or, dont, own, this, kind, of, dog, fucking, retarded, i, swear)","List(1, 100, List(), List(0.08585900381546129, 0.04208597424440086, 0.08249367206272754, -0.008411236573010683, 0.008137771371614443, 0.04400505489585075, -0.12954170151550154, 0.08466068710285155, 0.049628152046352625, -0.03674552877518264, -0.1931223515421152, 0.03454256109596992, 0.004359977203421295, 0.052314599086953836, -0.08042790660295975, -0.060365692670033735, -0.028364144561981615, 0.03881079259074547, -0.137428077805618, 0.011922013903544708, -0.06579695318148218, 0.14349806016649713, 0.10307211126200855, 0.057865193081935024, -0.05760633731684224, -0.004308296706188809, 0.030147163111740323, -0.02772899361496622, -0.044461995236236944, 0.07464848943478004, -0.1025375396791126, -0.061319230293685745, -0.003744677063712681, 0.045211446386846633, -0.08734069097871808, -0.06354627445001494, 0.04449668763713403, -0.06389527591487752, -0.004942479363473301, 0.1085072738033804, -0.08428703633729707, -0.04533675935288722, -0.03477057264271108, 0.020301252297940664, -0.09993949426676739, -0.05200714228505438, -8.167975836179473E-4, 0.0020017583261836658, -0.2433781647546725, -0.12385272404009647, -0.05547042743472213, -0.09662072906609286, -0.024695127077972178, 0.0369223722002723, 0.1940999547527595, -0.028215138749642807, 0.02917546033859253, -0.05291050844508308, 0.03138770738785917, 0.0881773603826084, 0.040544234247962864, -0.029662308283150196, 0.008130689701912079, 0.030083369421349333, -0.10090587461705912, -0.0900836932388219, 0.1965838595378128, 7.846931164914912E-4, 0.03448068718849258, -0.023970582035624168, -0.042764452645893805, 0.02005134122869508, 0.012764517556537281, 0.0037613090297037906, -0.09222409789535133, -0.131900544045493, -0.025915199890732765, 
0.029729792001572525, -0.05215729853476991, -0.01994773152876984, 0.11495871003717184, -0.0674022606320002, 0.055880339054221455, -0.00618245176685212, 0.04374877668239854, -0.026285359254953535, 0.06398306295953014, -0.12871376814490015, -0.1747991120739078, 0.10462294720028611, -0.042684632395817476, -0.2186086117713289, -0.11873984269120477, 0.010780376805500551, 0.18443348707461898, 0.11128544346006079, -0.01684554750946435, -0.15458258738825945, 0.0743878089230169, 0.08705778494053944))"
Meu Zoológico,670.0,Tenho uma jiboia e um largato,0,"List(tenho, uma, jiboia, e, um, largato)","List(1, 100, List(), List(0.28823004026586807, -0.32453093243141967, 0.12151295971125364, -0.3926486996933818, 0.18536886821190515, 0.04811889259144664, 0.30825150913248456, -0.011646497839440901, 0.21095866026977697, -0.09620827188094457, 0.332518607378006, 0.15809332331021625, 0.6045018583536148, -0.08421315625309944, 0.10595400111439326, 0.014667528759067256, -0.3235469857851664, -0.16155724351604778, -0.24601540683458248, -0.0840630008218189, -0.15877828126152355, 0.45900797098875046, 0.11396388057619333, -0.031328052282333374, 0.1850082216163476, -0.11433597778280576, 0.030652588760858634, -0.016583967643479504, 0.4181657458345095, 0.1480483526053528, 0.24245671307047206, -0.25848883017897606, 0.5652512808640797, 0.1993839293718338, -0.44100701312224067, 0.10414692192959288, 0.06612642730275789, -0.40891249353686965, 0.048866581792632736, -0.433157796661059, -0.02461772753546635, 0.25362938145796454, 0.052562362203995384, -0.33123183250427246, -0.33251720666885376, -0.021200658132632572, 0.10228040766863462, 0.2979748720924059, 0.32874753636618453, -0.01194591944416364, 0.25278356422980625, 0.41863887260357535, 0.42658789207537967, 0.09255454565087953, 0.11989557192039986, 0.05145480755406121, 0.010159508402769763, -0.4485742876616617, -0.26301706333955127, -0.4354307229320208, 0.24161480863889057, -0.22051537859564024, -0.17520783149908917, 0.3449477801720301, -0.320939173301061, -0.02829432673752308, -0.17417717445641756, -0.11967955343425274, -0.26035069550077117, 0.2733688559383154, -0.3521327767521143, -0.3359258820613225, -0.1105911685153842, -0.40241974343856174, 0.36800799270470935, -0.4450947940349579, 0.23914000516136485, -0.33817852164308226, -0.07987665796341994, -0.18445014697499573, 0.1329975943081081, 0.24943491847564775, -0.06704528133074442, 0.21501761302351952, -0.19014255500709015, 0.47341155757506687, -0.018885438640912373, 0.050146427781631545, 
0.2714543516437212, 0.03757862956263125, 0.0706576673934857, 0.12539308983832598, 0.43872160961230594, -0.3500975507001082, -0.6654191936055819, -0.11719376997401316, 0.05279066041111946, -0.5481110823651154, -0.11700985064574827, 0.21837918025751907))"
ojatro,1031.0,I wanna see what happened to the pigs after that please,0,"List(i, wanna, see, what, happened, to, the, pigs, after, that, please)","List(1, 100, List(), List(0.042817209593274376, -0.015012672424993732, 0.01927428125319156, -0.06028429147872058, 0.07104550769806586, 0.011958611282435331, -0.13548468798398972, 0.09437937865203078, -0.014831292019648987, -0.02384819361296567, -0.14015533131631938, 0.034354494884610176, 0.04264847673899071, -0.05787822214717215, -0.13563473327932032, -0.016594090033322573, -0.034157192673195495, 0.10670426691120322, -0.15979437015696682, 0.06272807243195447, -0.06855261181573787, 0.04310772576454011, -0.03230928604237058, -0.028790740465576, -0.0637180157460865, -0.05795184912329371, 0.10774460282515395, -0.0579472160898149, -0.007871851854195649, 0.07959682121872902, -0.010653612288561735, -0.01401199535890059, 0.02424312631641938, 0.058793571502478284, -0.011164892380210487, -0.06277390992776914, -0.0231837679378011, -0.06673655049367384, -0.0770063624209301, 0.12793844277885827, -0.1057710735635324, -0.02436069670048627, 0.0767031532119621, 0.008414155549623749, 0.059272418983957985, -0.03694230385802009, 0.08024492419578813, 0.0712669680572369, -0.010868507606739347, -0.0070401450449770146, 0.021143624890886382, -0.2765934697606347, -0.11498213830319318, -0.0016246668317101219, 0.005139282481236892, -0.05033673517358363, 0.07389707787131722, -0.0736122260039503, 0.08809690041975542, -0.023387845436280426, 0.05932074663525617, -0.016515642404556274, 0.09660727395252748, -0.010589544407346033, 0.006970954974266616, -0.0840985685671595, 0.12387748384340243, 0.09998970952900973, 0.015124801207672466, 0.19657938192937185, -0.0794581109022891, 0.06186794557354667, 0.18497282740744678, -0.018481158566745846, -0.11906917067244649, -0.14833605153994128, 0.049253190105611626, 0.13460026410492984, 0.09517664479261095, -0.034416083923795006, 0.10392725349149921, -0.2371203564107418, 0.08185868270017885, 0.018137062815102665, 
0.09835339219055393, -0.033361030954190275, -0.03254643827676773, -0.10830833524166586, -0.10770199160006913, 0.156005780805241, 9.573648937723854E-4, -0.1742777916687456, -0.1867713912771168, 0.10794603079557419, 0.08145058802752332, 0.03674917710437016, -0.06851461123336446, -0.07041944025761702, 0.08406129851937294, -0.058400989954613826))"
Tingle Triggers,1212.0,Well shit now Im hungry,0,"List(well, shit, now, im, hungry)","List(1, 100, List(), List(-0.034047055244445804, -0.06653760075569153, 0.09781830608844758, 0.02079472839832306, 0.14327257052063944, -0.04297992140054703, -3.8585364818573E-4, 0.07819402068853379, 0.16029004827141763, 0.15125021189451218, -0.23241555392742158, 0.09723891690373421, 0.12308629006147385, 0.0317805789411068, 0.03275787318125367, -0.11917050033807755, -0.06467104256153107, -0.034539140202105044, -0.062004129588603976, -0.036856125295162204, -0.04603645205497742, 0.0031181126832962036, 0.13887058496475221, -0.036994665116071704, -0.00485038529150188, -0.19836164638400078, 0.06248506456613541, 0.0951465129852295, -0.0148252472281456, 0.06730623096227646, -0.11981069706380368, -0.013956573605537415, -0.03280579149723053, -0.2381417602300644, 0.006836168840527535, 0.008548136055469514, -0.003861731290817261, -0.11204413548111916, 0.0203007809817791, 0.3072248484939337, 0.060469505935907365, -0.13499518595635893, 0.09859980568289757, 0.09637097399681807, -0.07224482595920563, 0.16189672499895097, 0.020810607820749283, -0.010273276269435883, -0.07309120297431947, 0.001907344162464142, -0.07013193159364164, -0.17279730662703516, -0.0743634257465601, -0.06256519854068757, 0.16481812670826912, -0.060364498198032385, 0.08280292553827168, -0.16718699894845487, 0.0735963448882103, -0.03195268660783768, 0.001643351942766458, -0.16110493689775468, 0.14026555344462396, -0.10241281539201737, -0.03470916599035263, -0.17132967039942743, 0.2516831271350384, -0.022276724502444268, -0.0041623409837484365, -0.30073073059320454, -0.08299471586942674, 0.06528998464345932, -0.026103063672780993, 0.1649008193984628, -0.06280723884701729, -0.16998861217871308, 0.13783688172698022, 0.0064664877951145176, 0.02554207965731621, 0.12370595410466195, 0.11304216980934144, -0.033051708713173866, -0.1660622775554657, 0.029700400680303576, -0.02672766819596291, 0.08484700024127961, 4.2596533894538884E-4, 
-0.13887673765420913, -0.1742345690727234, 0.12264016792178155, 0.008312474191188813, -0.25025034248828887, 0.09600013457238675, 0.2263250455260277, 0.11535194739699364, 0.13138854503631592, -0.20214648246765138, -0.12556779459118844, 0.0010377878788858652, 0.2571211729198694))"
Hope For Paws - Official Rescue Channel,1806.0,when I saw the end it said to adopt I saw different animal sites I was mad that they separated the cute little pups after being together for a long time,0,"List(when, i, saw, the, end, it, said, to, adopt, i, saw, different, animal, sites, i, was, mad, that, they, separated, the, cute, little, pups, after, being, together, for, a, long, time)","List(1, 100, List(), List(0.04950299204867934, -0.07334895347875933, 0.04444359466733951, -0.05763527192175388, 0.09713920240380591, 0.061312827634655176, -0.12214255104622533, 0.058524883562518705, 0.01790244651267365, 0.026558385981667425, -0.15350745594309223, 0.006281442196679211, 0.06058827560064533, 0.002550971093437364, -0.10647670275929774, 0.024572697761017948, -0.05724374655513994, 0.10766194478398369, -0.05193761706111893, 0.03216902289779917, 0.04553567429989456, 0.0969245070591569, 0.04802879063607824, 0.04283828325118989, -0.04417718363730537, 0.025230098936346267, 0.09739288836417179, 0.024667083623728926, -0.01176404002528157, 0.028654053634513288, -0.046103705501844804, 0.09745505855478827, 0.03723248669637307, 0.061920112419512965, 0.005874774914475218, -0.04255919008245391, 0.026065276995781927, -0.03144846308315473, 0.012060646838959186, 0.1576260370052149, -0.052871634368784726, -0.014648273947738832, 0.018892053154207045, 0.02775755548669446, 0.05860792370813508, 0.049946701154112816, 0.10588463652698743, -0.006178939249366522, -0.01737382589659143, -0.06489188360395812, -0.09097565813065175, -0.14346653538485687, -0.1499751643427918, 0.008721845232010368, 0.1943939561925588, -0.049080794346669024, 0.04060656772626023, -0.08659877279593098, 0.06615362896193419, 0.05127573524030947, 0.03953847139247603, 0.005577134929837719, 0.05452308688132513, -0.03448264643309578, -0.0013646627475898112, -0.09399900586915112, 0.05433418956254759, 0.08218438600588049, 0.049440368041095716, -0.012535418196010494, 0.042559715581216635, 0.07163926615424815, 
0.11285202511616291, 0.027615498886593887, -0.04652505814127864, -0.05726189567001476, 0.02392353021329449, 0.1250206063231153, 0.07124477363522014, -0.0807583850372823, 0.010803469126262972, -0.1277988098802105, -0.03417744157054732, -0.015240334383692712, 0.004410357635107732, 0.022661232194232363, -0.017285220595377106, 0.00801400787942487, -0.07586700553374906, 0.040369214219672066, 0.05488434985219951, -0.22897371523562937, -0.08317140074476839, 0.05843323380536129, 0.018574969910625967, 0.16227434287148138, -0.13500058911590565, -0.010825927264147227, 0.03306409398904971, 0.07802009737990316))"
Hope For Paws - Official Rescue Channel,2036.0,Holy crap. That is quite literally the most adorable pup Ive ever seen.,0,"List(holy, crap, that, is, quite, literally, the, most, adorable, pup, ive, ever, seen)","List(1, 100, List(), List(-0.07561409459091151, -0.045587355736643076, 0.2515913133437817, 0.018170873706157394, -0.05798564313982542, 0.05879999147369885, -0.17557791333932143, 0.14973499482640854, -0.02113119627420719, 0.1429468788063297, -0.18575404539632684, -0.024860362307383466, 0.052306503773881845, 0.03709070026301421, -0.13535885534320888, -0.07942296822483723, 0.08778911922127008, 0.13000955833838537, -0.13777787438952008, -0.08311219821469143, 0.09632759856489989, 0.18938253569201782, 0.0748147749556945, -0.12026262713166384, -0.21186068431528, -0.06321428544246234, -0.04944531648204877, -0.20444863590483484, 0.019511032751044977, 0.08541971960893044, -0.017027911443549853, -0.046167719536102735, 0.03622285679627497, -0.05237821647180961, -0.041007683307935416, -0.005548443931799669, 0.035594680530126564, -0.045453099462275326, 0.00759867664713126, 0.24938696522552234, -0.10890222059634443, -0.08376205074959077, 0.13938099375137916, 0.10694857633027893, 0.002966033509717538, 0.057966033999736495, 0.12411851149338943, -0.04714865789103967, -0.07627986185252666, -0.07297572694145717, -0.06472515390039636, -0.16965753699724492, -0.03477278043730901, 0.06876687705516815, 0.20311555352348548, 0.01698677413738691, -0.03835656658674662, 0.01970018048842366, 0.09026167197869375, 0.14072309190837237, 0.04471550638285967, 0.01904991355079871, 0.1424054946893683, -0.0011112636599976283, -5.555266084579321E-4, -0.11078361823008612, 0.024640806019306183, -0.005101241171360016, 0.25020409432741314, 0.011887878698941607, -0.056031934702052526, -0.01557908536723027, -0.023956552291145693, 0.10908797297340174, -0.0256420593135632, -0.1119336187839508, -0.14224808835066283, 0.23793137589326274, 0.02953416675042648, -0.026943036569998816, 0.13639305216761738, 
0.012837069176244909, -0.03752217011956068, 0.005535363864440185, 0.032897274368084393, -0.015307027022712506, -0.046224140060635716, 2.736771588948054E-5, -0.051528473026477374, 0.11648002582100722, -0.09573741987920725, -0.27984678802581936, 0.022399001145878665, 0.028149825664093863, 0.056848636792542845, 0.017716513230250433, -0.16133901304923573, -0.11279063729139475, -0.04774549438689764, 0.04601471224584832))"
Life Story,2637.0,武器はクエストで貰えるんじゃないんですか？,0,List(),"List(0, 100, List(), List())"
Brian Barczyk,2698.0,Call the teddy Larry,0,"List(call, the, teddy, larry)","List(1, 100, List(), List(0.07877420785371214, -0.13355843117460608, 0.3007301352918148, 0.04747941280947998, 0.03207775438204408, -0.1230789190158248, -0.3439836925826967, 0.07431583164725453, -0.0024210046976804733, -0.18732835166156292, -0.15603073686361313, 0.037834652699530125, 0.20630016177892685, -0.09962722100317478, -0.31899852049537003, -0.02139435149729252, -0.2935159797780216, 0.17479249089956284, 0.10335477162152529, -0.01139368861913681, -0.04302087426185608, -0.0632171556353569, -0.047406155616045, -0.16660610283724964, 0.016383370166295208, -0.05390241276472807, -0.18198371678590775, -0.12673680344596505, 0.015470384212676436, 0.28644467890262604, -0.1293459963053465, 0.10863261111080647, 0.04137715697288513, 0.16550677921622992, -0.04092426714487374, -0.007809099275618792, 0.14519253373146057, 0.10844758711755276, 0.1489410288631916, 0.09854512382298708, -0.10877818986773491, 0.1264948584139347, -0.004563726484775543, 0.046346817165613174, 0.20973960030823946, -0.008077459875494242, 0.057948376750573516, -0.07959950901567936, 0.0026207920163869858, -0.04562824871391058, -0.15371153503656387, -0.06569951074197888, -0.08594276383519173, 0.04974360577762127, -0.03388275089673698, 0.10372670087963343, 0.09192037582397461, -0.07061243336647749, 0.02200358174741268, 0.05790180340409279, -0.10421942453831434, -0.03363431803882122, -0.09812624473124743, -0.1499440437182784, 0.11277554696425796, 0.014993683435022831, 0.050365918315947056, 0.1010418077930808, 0.06413785740733147, 0.033937874250113964, 0.09314431250095367, 0.1769010853022337, -0.10207697004079819, -0.027531016618013382, -0.11828675214201212, -0.18146945349872112, 0.1645746510475874, 0.041952547384426, 0.24254196416586637, -0.10762782488018274, 0.0737443957477808, -0.20986556448042393, -0.0786440409719944, 0.055217692628502846, 0.015029767993837595, 0.24473231646697968, -0.1263260841369629, -0.12474229834151629, 
-0.14612796451183385, 0.18531354144215584, 0.05691302241757512, -0.054089755518361926, 0.17883873730897903, 0.04811604923452251, -0.009316734969615936, 0.14216326363384724, -0.05900656711310148, 0.08958140853792429, 0.050484505482017994, -0.047841928666457534))"


In [54]:
pred_all = best_model.transform(df_all0)
pred_all.show(10)

In [55]:

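# Number of distinct users in the whole dataset
total_user = dataset.select('userid').distinct().count()
# Distinct users labeled as owners by the keyword rules
owner_labeled = dataset.filter(col('label') == 1.0).select('userid').distinct()
# Distinct users predicted as owners by the model
owner_pred = pred_all.filter(col('prediction') == 1.0).select('userid').distinct()
# Union the two sets so that a user appearing in both is counted once
owner_total = owner_labeled.union(owner_pred).distinct().count()

fraction = owner_total / total_user
print('Fraction of the users who are cat/dog owners (ML estimate): ', round(fraction, 3))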
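
A note on the arithmetic: the fraction is meant to be a share of users, so each owner should be counted once. If the same user can show up both among the keyword-labeled owners and among the predicted owners, adding the two counts would count that user twice. Whether such overlap exists depends on how `df_all0` was constructed. The sketch below uses made-up user ids (not taken from the dataset) to show the difference between summing counts and taking a set union:

```python
# Toy user ids, invented for illustration only (not taken from the dataset).
all_users = {1, 2, 3, 4, 5, 6, 7, 8}
labeled_owners = {1, 2, 3}   # e.g. matched a keyword rule such as "my dog"
predicted_owners = {3, 4}    # e.g. flagged by the classifier

# Adding the two counts treats user 3 as two different owners.
naive_fraction = (len(labeled_owners) + len(predicted_owners)) / len(all_users)

# A set union counts each user exactly once.
owners = labeled_owners | predicted_owners
union_fraction = len(owners) / len(all_users)

print(round(naive_fraction, 3))  # 0.625
print(round(union_fraction, 3))  # 0.5
```

The same idea carries over to Spark by selecting `userid`, taking the union of the two DataFrames, and calling `distinct()` before `count()`.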

#### 3. Find out the topics important to the pet owners.

In this section, we look for the hidden topics that matter most to pet owners, using a Latent Dirichlet Allocation (LDA) model. Before training the model, we preprocess the data by removing all the stop words from the comment texts.
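
To make the preprocessing step concrete, here is a minimal plain-Python sketch of what stop-word removal does to one tokenized comment. The `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical and exist only for this illustration; the notebook itself relies on Spark's `StopWordsRemover`, which ships its own, much longer default list.

```python
# A tiny, hypothetical stop-word list for illustration only; Spark's
# StopWordsRemover uses its own, much longer default English list.
STOP_WORDS = {"now", "i", "to", "that", "with", "my", "a", "the", "is"}

def remove_stop_words(tokens):
    """Return the tokens that are not in STOP_WORDS (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

# Tokens of the comment "Now I want to try that with my dog!!!"
tokens = ["now", "i", "want", "to", "try", "that", "with", "my", "dog"]
filtered = remove_stop_words(tokens)
print(filtered)  # ['want', 'try', 'dog']
```

Dropping these high-frequency function words keeps the topic model from spending its vocabulary on words that appear in almost every comment.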

In [58]:
from pyspark.ml.feature import StopWordsRemover

# Keep only the comments labeled as coming from pet owners
find_topics = dataset.filter(col('label') == 1.0)
display(find_topics)


creator_name,userid,comment,label,words,features
The Dodo,24401.0,Now I want to try that with my dog!!!,1,"List(now, i, want, to, try, that, with, my, dog)","List(1, 100, List(), List(0.056104992619819105, 0.050285025499761105, 0.11073548977987634, -0.04980612132284376, 0.07449254652278291, 0.0707605845398373, -0.1215410472618209, -0.0019604708585474226, 0.012262929230928421, -0.023461801103419725, -0.09825960174202919, 0.017990300224887, -0.05660707738974856, -0.05604188113162915, -0.0530484484301673, -0.030050284746620387, -0.06695526838302612, 0.09616125592341025, -0.12502048992448384, 0.02976697931687037, 0.03735411746634377, 0.0725842545636826, -0.011665587986095084, -0.004889869855509863, 0.09053938608202669, -0.09636249429442816, 0.1071703533331553, -0.06528256088495255, -0.044271935100874134, 0.010242050720585717, -0.009210775088932779, 0.10560674137539333, 0.06459492072463036, 0.04222452330092589, -0.12644597536159885, 0.02462833643787437, -0.006274375236696667, -0.0874204476777878, -0.07095742308431201, 0.1777695222861237, -0.12643878327475652, -0.07588901743292809, 0.08988336670315927, 0.0565632503065798, 0.04414659324619505, -0.028848238910237946, -0.03874571041928397, 0.03944606220142709, -0.018145894217822287, 0.02805910880366961, -0.029242295771837234, -0.20419202082686952, -0.09253901884787612, 0.01294391933414671, 0.1389701979027854, -0.1819789818384581, 0.06204639095813036, -0.11334567310081588, 0.13792605987853473, -0.02634068847530418, 0.05851504881219524, -0.015389719977974892, 0.04392962530255318, -0.07011616147226757, 0.010950049488908714, -0.03974952274519536, 0.22382714288930097, 0.10857764010628064, 0.04235041982287334, -0.002972521896784504, -0.03819861407909128, 0.0015110371427403556, 0.21475149856673345, -0.006662546657025814, -0.09540232224389911, -0.07881565397191378, 0.03798630378312535, 0.06034875247213575, 0.027262467063135568, -6.380845508020785E-4, 0.10569482472621732, -0.13261165635453329, 0.03771849597493807, 0.0350563522014353, 0.07201329411731826, 0.054166774161987834, 
0.05912539611260096, -0.04117297815779845, -0.07249593155251609, 0.04673711996939447, 2.5771144363615247E-5, -0.18946548447840741, -0.14548150677647856, 0.08335085461537042, 0.1571948150069349, 0.07877916180425219, 0.040688058050970234, -0.047293868381530046, 0.05456212920964592, -0.019719206650430955))"
Cole & Marmalade,43112.0,I blow smoke in my cats ear right to his brain,1,"List(i, blow, smoke, in, my, cats, ear, right, to, his, brain)","List(1, 100, List(), List(0.08869852024045857, -0.06921172980219126, 0.18917915267361837, -0.08392188867384738, -0.0724193842404268, 0.10376042754135349, -0.058406558226455345, 0.0159420735118064, 0.05600973074350887, 0.01039572474970059, -0.04562356373803182, 0.030906513693149795, 0.0059270535553382206, -0.04341583893719045, -0.17895731431516734, -0.004143979188732125, -0.03654006860134277, 0.03178462005135688, -0.10622292587702925, -0.04558534479954026, -0.013947658748789267, 0.06744514371861111, 0.02970695360140367, -0.0164144096726721, 0.06996051780879498, 0.024682228749787267, -0.07928733188997615, 0.046117245304313575, 0.023222364273599604, 0.06468839943408966, -0.002066387710246173, 0.08365636691451073, 0.012941054957495495, 0.09589683203111318, -0.06967235153371638, -9.155168452046134E-4, 0.03265694935213436, 0.01826491951942444, -0.005401689220558514, 0.20245216143402187, 0.01650929484855045, 0.060834553760783325, 0.134439828382297, -0.052835835281505504, 0.18231407502158123, 0.07378573647954248, -0.04818564484065229, 0.022946046749976547, -0.10144006308506835, 0.0026925958439030433, -0.12242365831678564, -0.19196268285370685, -0.08273805415427143, 0.04770046032287858, 0.08549785512414845, -0.11029721813445742, -0.014890094139528546, -0.20521191278980538, 0.10971558762883599, 0.038205173882571136, 0.03480411214034327, -0.013642556457356974, 0.027942037040537056, -0.01652187896384434, 0.001935300501910123, -0.029198415132916787, 0.2588126098906452, 0.04059295136142861, -0.03442023753781211, 0.07050857538442043, -0.007512927140024575, -0.16148450784385204, 0.03597743157297373, -0.05818572504953905, -0.0366797342219136, -0.010016130791469053, 0.04046215862035751, 0.08391308649019762, -0.05434482972222296, 0.07448233689435504, 0.046204947951165115, -0.2110433133149689, 0.10268333757465536, 0.04623666727407412, 
-0.0717510161074725, -0.07471170598133044, -0.06152430464598266, 0.004804257684471932, -0.13928939334370874, 0.001338709145784378, 0.01508875038813461, -0.28702452643351123, -0.1858761322430589, 0.10357968238267032, 0.07336551188067957, 0.05918144714087248, -0.037593810903755104, -0.08173921098932624, 0.07026694173162634, 0.09809573265639218))"
Zak Georges Dog Training rEvolution,63541.0,my dog lucky wont eat of his bowl hell only eat out peoples hands how do i get him to eat out of his bowl,1,"List(my, dog, lucky, wont, eat, of, his, bowl, hell, only, eat, out, peoples, hands, how, do, i, get, him, to, eat, out, of, his, bowl)","List(1, 100, List(), List(0.10748952932655811, 0.014644682668149472, 0.08934833567589522, 0.009441435635089875, 0.025657075680792334, 0.09812133725732565, -0.01962353616952896, 0.029013569969683885, 0.009428030345588923, -0.003647192046046257, -0.16639159008860588, -0.05196924354881048, 0.0018827961105853319, -0.055542717138887386, -0.19661732560023665, -0.0501146237552166, -0.05416282210499048, -0.011591513827443124, -0.14914250085130334, 0.023592431340366603, 0.013343537133187056, 0.04975450048223138, 0.02729053871706128, -0.06142535637132824, 0.09496789803728461, -0.015828610341995956, -0.009805352189578117, -0.01980819404125214, -0.028880756441503765, 0.11180601492524148, 0.011253205633256585, 0.05874194085597992, 0.04951195687055588, 0.07038082040846348, -0.03124967847019434, 0.022823593467473983, 0.09406973034143448, -0.020199880618602038, -0.027754758941009643, 0.1670206787250936, -0.11512951612472534, 0.05014008476398885, 0.07663952831178904, -0.014752077534794808, -0.010578340291976929, 0.022688640635460614, -0.025474815517663955, 0.05117227878421545, -0.10220073504373431, -0.0582200781442225, -0.08599848784506321, -0.12213692542165518, -0.041749214492738246, 0.0549134910851717, 0.14948210485279562, -0.05121075112372637, 0.07873335648328066, -0.13352054953575135, 0.11157105255872012, 0.06987216506153346, 0.05791072851512581, 0.08128534320741893, 0.04146762937307358, -0.026311614364385606, 0.07583582952618599, -0.12543021693825723, 0.19047610230743886, 0.05275855224579573, 0.0157853052765131, -0.07231538116931915, -0.04306560337543488, -0.14965583343058825, 0.1336092309653759, -0.003211762346327305, -0.037312900889664886, -0.0642353942990303, -0.018296890314668416, 
0.1494659322500229, 0.06107750616967678, 0.06813134398311377, 0.11150614440441131, -0.13636238925158978, 0.05599132403731346, -0.036151823550462724, 0.09755648454651236, -0.12309702232480049, 0.1186497800797224, -0.0782052796613425, -0.09464672787114978, -0.038166700303554534, 0.13439188666641713, -0.2470585160702467, -0.08304008305072784, 0.1917586261034012, 0.15490343786776067, 0.14426070343703032, -0.008124075904488564, -0.01551724759861827, 0.07698118776082993, 0.07731568362563848))"
DarkDynastyK9s,175184.0,thats what my dog do,1,"List(thats, what, my, dog, do)","List(1, 100, List(), List(-0.09564292691648008, 0.16462829597294332, 0.13252610415220262, -0.052013386785984043, 0.014756854623556137, 0.13843841552734376, -0.1823470711708069, 0.12769070565700533, -0.03565411856397987, -0.031332043744623665, -0.06168989650905132, 0.030762221664190292, -0.04752304404973984, -0.11053234394639731, -0.17680432163178922, -0.03909990340471268, -0.05957886688411236, 0.061067018099129206, -0.058718220749869945, 0.037072740495204926, -0.09399931244552136, 0.08790078088641168, -0.04123937813565135, -0.012186171114444734, -0.0636980147100985, -0.09660586854442954, 0.001008649915456772, -0.15447964519262314, -0.016032578796148302, 0.13144652247428895, -0.11358829215168953, 0.06966525912284852, -0.015710038063116372, -0.025388557836413386, -0.08677423745393753, -0.018934500636532903, 0.04253383628965821, -0.10946894930675627, -0.0766807071864605, 0.03894123239442707, -0.2503435000777245, 0.010344111919403078, -0.03892035335302353, 0.1135632373392582, -0.052910816669464115, -0.10652170404791833, 0.016431376338005066, 0.027367955446243288, 0.08871594667434693, 0.004134295135736466, -0.1194342575967312, -0.05897194836288691, -0.04990578927099705, 0.06792684420943261, 0.12695522047579288, -0.06038951799273491, 0.029665660113096237, -0.13801816403865816, 0.08343312032520772, -0.035468858852982524, 0.04120863024145365, -0.06664248518645764, 0.11089197993278505, -0.0702677458524704, -0.04083194918930531, -0.07723752334713936, 0.20355706214904787, 0.11953682191669941, 0.021428924798965455, 0.0012067884206771852, 0.0577259223908186, -0.057205121591687204, 0.13611112534999847, 0.10124586906749755, 0.08587825186550618, -0.2325075887143612, -0.012903738021850587, 0.1450848251581192, -0.17819387279450893, 0.05789556559175253, 0.1685182109475136, -0.10727767571806908, 0.07380505800247193, -0.015694399923086168, 0.09528872817754747, -0.0039727121591567995, 0.019180922210216524, 
-0.07218359708786011, -0.144282166659832, 0.1507846012711525, -0.05875306557863951, -0.07523798453621566, -0.14457167796790601, 0.058826351165771486, 0.1253655094653368, 0.10395298302173615, 0.0010764800012111665, -0.15582600831985474, 0.05146027207374573, -1.5361718833446504E-4))"
The Pet Collective,203881.0,Im so happy i think Im almost crying Im hugging my dog Ik its not a cat but its a animal that need love,1,"List(im, so, happy, i, think, im, almost, crying, im, hugging, my, dog, ik, its, not, a, cat, but, its, a, animal, that, need, love)","List(1, 100, List(), List(-0.05398442433215678, -0.02690101669092352, 0.15329515499373275, -0.0013575746367375055, 0.0451279921811268, 0.0802868596316936, -0.06910796894226223, -0.033523242066924766, 0.018101139556771763, 0.11036963985922435, -0.14762373264723766, 0.11117139179259539, -0.02509756257253078, -0.006181019653467956, -0.05033245147205889, -0.004965288972016424, -0.10997954647367199, 0.06613334401724083, -0.058526939557244376, -0.023276787716895342, 0.017569821560755372, 0.14039537143738318, 0.03950079889424766, 0.1411870188312605, -0.048798505536979064, -0.15686424092564266, 0.011517212338124711, -0.026004756606804826, -0.0200081246875925, 0.07693644287064672, -0.06844690527456501, 0.10529281131069486, 3.9601504492263E-5, 0.009615272341761738, -0.02939806859164188, -0.05184134654943288, 0.083778018442293, -0.012561295123305172, -0.05173788328344623, 0.11351599753834307, -0.08465031984572609, -0.06956894532777369, -0.0041835741527999435, 0.11740112452146907, -0.018487621874858935, 0.06705094027953842, 0.07100990228354931, 0.029699681443162262, -0.04921681307799493, 0.07639144159232576, -0.04345887123296658, -0.09284922996691117, -0.05399921440402977, -0.01195154230420788, 0.17257845432808, 0.00275037115594993, 0.06203689909307286, -0.0703157012273247, 0.05186401059230168, -0.026640171340356268, 0.03263158441404812, -0.1019473784447958, 0.07918710971110461, -0.04214536435999131, 0.004617083158033589, -0.1166918040253222, 0.16721440269611776, 0.02697160283666259, -0.010480001583346166, -0.07071679107806024, -0.01975782999458412, 0.11475409970929225, 0.045532207198751465, 0.06700454978272319, -0.10957059046874443, -0.08887787151616067, 0.13860819490704063, 0.035471170102634154, 
0.003682435412580768, -0.0014850485216205318, 0.06948087106381232, -0.1451086476445198, -0.08430564534501173, 0.05330204016839464, 0.060614833530659475, 0.04186622746055946, 0.023232739185914397, -0.029491022384415068, -0.09057623305125162, 0.10020273465973635, -0.003786271942468981, -0.18888507046115893, -0.1448547017450134, 0.030329394532600418, 0.10676514382551734, 0.0784747542347759, -0.06944403331848055, -0.13490187642552579, -0.022126214132489015, 0.006738901827096318))"
Cole & Marmalade,263721.0,My cat scratches at it I spray at her but not her so it scars her if she keeps doing it I will spray her ya she stoped for a wile then now she is doing it agin ☹️ ya I always like my door shut and if she is in here in the morning she will want out and Im like IM TRYING TO SLEEEP STOOOOP PLZ IM TIRD 😭😭😭😭then someone will let her out and Im like yaaaaas 5 mins of peace but its hard for me to sleep alone like I have to have my kitty or I get sad and lonely and feel kinda unsafe but she make me feel safe and she keeps me safe and I keep her safe!,1,"List(my, cat, scratches, at, it, i, spray, at, her, but, not, her, so, it, scars, her, if, she, keeps, doing, it, i, will, spray, her, ya, she, stoped, for, a, wile, then, now, she, is, doing, it, agin, ya, i, always, like, my, door, shut, and, if, she, is, in, here, in, the, morning, she, will, want, out, and, im, like, im, trying, to, sleeep, stoooop, plz, im, tird, then, someone, will, let, her, out, and, im, like, yaaaaas, 5, mins, of, peace, but, its, hard, for, me, to, sleep, alone, like, i, have, to, have, my, kitty, or, i, get, sad, and, lonely, and, feel, kinda, unsafe, but, she, make, me, feel, safe, and, she, keeps, me, safe, and, i, keep, her, safe)","List(1, 100, List(), List(0.08892963050558107, -0.044371326281238466, 0.09978573151203173, -0.045767875729999956, 0.052303697541872446, 0.09098385617647681, -0.15103663436527695, 0.004321259711193101, 0.04437867554788095, 0.04877398683569364, -0.13825778947061587, 0.06632762969445227, 0.004891863453292077, -0.007631408491761249, -0.10503829757292424, -0.015406708465889096, -0.10833743714254289, 0.0761504532945823, -0.09161401816996775, 0.015529490640713652, -0.004295060155732978, -0.0063514388110038015, -3.7733181529948787E-4, 0.007857008801487785, 0.004573388250953998, -0.04562900040294945, 0.021859476019369228, 4.9536533263181484E-5, -0.022661071794573218, 0.04756359961785136, -0.0044238023350067854, 0.11037687031691143, 
-0.011904400425978125, 0.048569430433769496, -0.03523075871353578, -0.013851773512038973, 0.08262732361477139, 0.005512548847154023, -0.042577816488882224, 0.13339615749165176, 0.021314190292257756, -0.03255167192940961, 0.05356420207798721, 0.008663303956704876, 0.015660997780580672, 0.0038688872910795672, 0.09040838533321455, 0.04085704516089003, 0.0020991581728711964, 0.05722804416549362, -0.06499386937620358, -0.1804324771538602, -0.11789409409187013, 0.04734353698105195, 0.121972706101294, -0.045715898316449316, 0.04421009540167307, -0.09907925850711763, 0.11631837536885042, 0.026470444307872844, 0.06717842202323804, -0.033105928240524184, 0.07500598335689339, 0.0405515435887801, -0.03730030972961216, -0.07481943815101628, 0.1614242686454447, 0.06819156640361348, -0.018991255882992258, -0.07628077359197871, -0.0443161801329904, 0.043600594388320334, 0.1373616614850283, 0.04259615846263665, -0.10498141160520214, -0.07897802164605368, 0.022545411807274627, 0.0690379718045381, 0.013233280327591685, 0.048473253324177235, 0.039202946005389094, -0.057835344651398515, -0.0013439477395055996, 0.0074752132857083195, 0.01547184458092576, 0.0185621588926522, 0.05911821916684388, -0.03824348013885385, -0.07966603092650222, -0.00434031848975968, 0.07254762706279214, -0.18165962850194303, -0.11188931117067114, 0.10730028675442876, 0.07125363648904773, 0.11183617927975231, -0.020386254238650096, -0.0647009885184572, 0.01201373558667969, 0.021675737326844566))"
Steff J,273292.0,Since my cat is getting old Im gonna start calling him by a new name..GRANDPAW!!How is cat food sold?USUALLY PURR CAN!!GIVEAWAY ENTRY!!!!,1,"List(since, my, cat, is, getting, old, im, gonna, start, calling, him, by, a, new, name, grandpaw, how, is, cat, food, sold, usually, purr, can, giveaway, entry)","List(1, 100, List(), List(0.0325969964480744, -0.03650977107911156, 0.1468297669556565, 0.09168297625505008, 0.020023395584967848, 0.09237976289855747, -0.10297979137976654, 0.08910668589389668, -0.018717976081041764, 0.023102818307681728, -0.04658379973485493, 0.029232820954348426, 0.030236806147373643, 0.028639072634317787, -0.10122672683344439, -0.0024164968098585424, -0.10137865830284472, 0.06234645571273107, -0.0675752131232562, -0.12002240451581132, -0.006162540700573188, 0.04707743304495055, 0.13032048775886115, 0.04528839755445146, 0.021570527917132355, -0.12983979967136222, 0.07173433638392733, -0.08650219139571373, -0.08154259722393294, 0.05427968480552619, 0.05305215295475836, 0.04580529052943278, -0.058394044082468524, 0.08789312123106077, 0.027990356472750697, -0.017811833701741237, 0.0843355067616078, -0.052813981910451106, -0.07262430316768587, 0.1626758732331487, -0.03783753908310945, -0.029592906733831536, 0.08790324117021206, 0.03043234778418152, 0.03100671397987753, -0.0044643093760196985, 0.0072258786083414005, 0.028346467297524214, 0.012955783337999422, 0.026012457835559662, -0.08906379139695604, -0.17238188398858675, -0.02106395965585342, 0.05161698900449735, 0.08125119876617996, -0.009626127949629266, 0.13878015679522204, -0.17342515646193465, 0.04482704680413008, 0.05120851525750298, -0.02648684493480967, -0.051239912290699206, 0.02505428965937776, 0.02449160679959907, -0.026899683969811752, -0.03286230134276243, 0.1160185978246423, 0.17205303190320803, 0.03989706912006323, -0.013876503297629265, 0.046334126229899436, 0.0019253070394580182, 0.05308249918743968, 0.09654696858846225, -0.13854889349582106, -0.05298917499693254, 
0.06367718213452743, 0.15149579182840311, -0.017078677568441402, -0.05089319945098116, 0.13029765849933028, -0.04915309041881791, -0.04424641177488061, -2.914033471964873E-4, 0.04720566822036815, 0.041986435929384947, 0.03321166569367051, -0.03105014508876663, -0.10656442106343234, 0.06570122122334747, 0.07167331295428225, -0.15724312953758413, -0.07691446639812337, 0.08175923682462712, 0.08432308878176488, 0.15667301319682828, -0.08904654308347605, -0.0901149782137229, 0.06213219959261971, 0.044170539754514515))"
Cole & Marmalade,329273.0,I have several plants of catnip planted around our garden but my cats dont really seem bothered by it? Are my cats constantly high or something???,1,"List(i, have, several, plants, of, catnip, planted, around, our, garden, but, my, cats, dont, really, seem, bothered, by, it, are, my, cats, constantly, high, or, something)","List(1, 100, List(), List(0.08043761795852333, -0.0848665013551139, 0.11690790535738836, -0.02916046167508914, 0.06438320799833701, 0.08158667528858551, -0.1344547827119151, 0.10894716283879602, -0.005565675297895304, 0.04070444756115858, -0.07081055440581763, 0.02258428525573646, -0.03514408097208406, -0.08611423433579218, -0.08683115020036125, -0.1413969470259662, 0.02322115070329836, 0.13378589704202917, -0.07525088754482567, -0.030551491401498567, 0.06255148390594584, 0.0446624270024096, 0.05664820532099559, 0.0229275761745297, 0.05798538072177997, -0.047387773694936186, 0.14484589449756852, 0.007097603377098075, -0.015954853841461815, 0.05654748461137598, 9.461470091572176E-4, 0.1691642263904214, -0.04299540579533921, 0.0368061841178972, -0.10529920322677265, 0.01729609278173974, 0.0511127501153029, 0.0012016879269280112, -0.00846209335857286, 0.10591365966516046, -0.04121926351217553, -0.009742742135690955, 0.017385845144207664, -0.028755835902232393, 0.10067337443335699, -0.0073165112676528785, -0.058821330921581164, 0.01481999965527883, -0.09140744452508023, -0.030607431947898407, -0.09472566506323907, -0.10537988674612, -0.09684653179003642, 0.05671891190398198, 0.1338168875887417, -0.04749959337865361, 0.15981720452411818, -0.10444010016866602, 0.03044240133693585, 0.05014522287708063, 0.1254113692020138, 0.08557238171880062, 0.06941378646745132, 0.027026707073673606, -0.037779283709824085, -0.05292242736770557, 0.1730718816129061, 0.014346443815156817, -0.01583501438681896, -0.10706416990321417, 0.011943452221413072, -0.10426266111720067, 0.1387705671815918, -0.0010215817019343376, -0.09125631943774912, 
-0.013026720521828303, -0.08465714380145073, 0.15020910184830427, -0.06246855217390336, -0.03746670434394708, -0.004310081909912137, -0.08834189888483916, 0.055746927588748246, -0.036123050609603524, 0.027397138281510428, -0.024724047362374574, 0.11544270971073554, -0.10003010839080582, -0.19126505338443586, 0.05813539687257547, 0.04823589360771271, -0.2625854258927015, -0.17376816688248745, 0.05281952283201883, 0.04528381263550658, 0.11487981940333088, -0.09751916392885436, -0.049829747151726715, 0.0774564566448904, 0.06650437103011288))"
Hope For Paws - Official Rescue Channel,344406.0,This is so sad because my dog died and the mom looks just like her and I started crying,1,"List(this, is, so, sad, because, my, dog, died, and, the, mom, looks, just, like, her, and, i, started, crying)","List(1, 100, List(), List(-5.102614431004775E-4, -0.047942626878227056, 0.19011140386819053, -0.052413718603355315, 0.11812341990145413, 0.09673578044595686, -0.12525269996963048, 0.002293009261943792, 0.04633900974141924, 0.15883788878196164, -0.13011114987985867, 0.09660543422949941, -0.015609777544772153, 0.03138606127743658, -0.14799209561591084, -0.015201671196049765, -0.07413860041599132, 0.08362883723365437, -0.09828701066343408, -0.014097620487997406, 0.10805481966388852, 0.13328945754390012, 0.08199600853320015, 0.07903025225785218, -0.049559674124650066, -0.028928905112766905, -0.05001240803271924, 0.02030676613120656, 0.020264116985919442, 0.06605740810597413, -0.02514996075708615, 0.05193319485375755, 0.012129995579782284, 0.025541890829213355, 0.007172186564850179, -0.030135929881668598, 0.09886289121105188, 0.054448659262178754, -0.02090154372547802, 0.11891959358840004, -0.05120898905749383, -0.05978761485924846, -0.04510084029875303, 0.12212991415473975, 0.046233927095799064, -0.0024205195521445648, 0.11040470871682229, -0.04115456668660045, 0.026215898574599505, 0.10922169734380746, -0.11225181244509784, -0.1908385479136517, -0.10961961344276604, 0.005427628167365727, 0.20234415865821861, -0.014868540748322725, 0.03132941426807328, -0.15733727244170087, 0.09937777311394089, 0.0541601764332307, 0.007928091505738465, -0.054078496208316396, 0.11941576135608269, 0.007973076481568185, 0.02450728959305898, -0.15694623817015732, 0.09728416396108897, 0.09342300337984373, 0.09711515491730288, -0.036031212265554224, 0.0432349088552751, 0.07712631272787755, 0.10800686842565865, 0.061187869820155595, -0.1235445341781566, -0.06833310194901729, 0.10870640347466656, 0.09349994096708925, -0.048614398135166416, 
-0.09405471703135652, 0.1095797775411292, -0.11107218177302887, -0.00420853327714691, -0.051098980597759545, -0.018917162363466463, 0.03557988145927849, 0.0714131507434343, -0.05256184120930572, -0.11006696590859638, 0.04982554461610944, 0.029345413473875898, -0.13480844662377708, -0.13684584651338427, 0.06166692416330701, 0.09278924944565484, 0.08454524980563866, 0.022114200321467298, -0.08126730964470066, -0.019091564690155025, 0.06695040401169344))"
stacyvlogs,346294.0,my cat died today im sad woching this video,1,"List(my, cat, died, today, im, sad, woching, this, video)","List(1, 100, List(), List(-0.007834469019952748, 0.027541239720044863, 0.13935521990060806, -0.07501901908674173, 0.1846330331948896, 0.0462243614731253, 0.010944797657430172, -0.04282891243282291, 0.08776161726564169, 0.20493924534983105, -0.07578308688890602, 0.09236786152339643, -0.1120851207524538, -0.005323226108885137, -0.0871997620496485, -0.028782181441783905, -0.12265767840047677, 0.05817171484361299, -0.11340960135890377, -0.08821825103627311, 0.05893745335439841, 0.029793783711890377, 0.1337790654765235, 0.07171898273130257, -0.0942183743075778, -0.059960121030194886, 0.023942916757530634, 0.0016127512272861267, 0.03200614192367841, 0.046645301290684275, -0.0021026891966660815, 0.12934411855207548, -0.035469002297355064, -0.09027702154384719, 0.03656469740801387, -0.019333558777968086, -0.014211850447787179, -6.032471234599749E-4, -0.07040999974641535, 0.18889564648270607, -0.07925261257009374, 0.011747638048190208, -0.04079557370601428, 0.15504711866378784, 0.010001058379809061, -0.09306987074928151, 0.0639548779775699, -0.12949937623408106, 0.048086995672848486, 0.048843664634558887, -0.15228520178546506, -0.11296150740236044, -0.022728233287731804, -0.014385100454092026, 0.09285158415635426, 0.061062212205595434, -1.1735450890329148E-4, -0.1793293585586879, 0.034917670767754316, -0.05008082091808319, -0.010018772549099391, -0.13791593888567552, 0.1516243006206221, 0.029511247244146135, 0.022503794274396364, -0.05020020798676544, 0.10907972686820559, 0.05514562626679738, 0.05963066458288166, 0.01605759561061859, 0.11186136677861214, 0.15230233160157997, 0.03498431336548593, 0.07644686723748842, -0.043504896884163216, -0.10098452659116851, 0.28697717415828566, 0.07133984627823034, 0.018955121955109965, -0.0565152827443348, 0.05693579837679863, -0.08512597075766987, -0.08276738836947414, 0.0591610065764851, 0.005243846111827427, 
0.05496647622850206, 0.05399964791205194, -0.05135098208362857, -0.1755362840162383, 0.08473383262753487, 0.05826879292726517, -0.16485844200683963, -0.1479613035917282, 0.07515330012473795, 0.08358601708379056, 0.0017860179973973166, -0.13119194092966305, -0.1597871199871103, -0.0960506378647147, 0.11417293905590971))"


In [59]:
remover = StopWordsRemover(inputCol="words", outputCol="filtered")
removed = remover.transform(find_topics)
display(removed)

creator_name,userid,comment,label,words,features,filtered
The Dodo,24401.0,Now I want to try that with my dog!!!,1,"List(now, i, want, to, try, that, with, my, dog)","List(1, 100, List(), List(0.056104992619819105, 0.050285025499761105, 0.11073548977987634, -0.04980612132284376, 0.07449254652278291, 0.0707605845398373, -0.1215410472618209, -0.0019604708585474226, 0.012262929230928421, -0.023461801103419725, -0.09825960174202919, 0.017990300224887, -0.05660707738974856, -0.05604188113162915, -0.0530484484301673, -0.030050284746620387, -0.06695526838302612, 0.09616125592341025, -0.12502048992448384, 0.02976697931687037, 0.03735411746634377, 0.0725842545636826, -0.011665587986095084, -0.004889869855509863, 0.09053938608202669, -0.09636249429442816, 0.1071703533331553, -0.06528256088495255, -0.044271935100874134, 0.010242050720585717, -0.009210775088932779, 0.10560674137539333, 0.06459492072463036, 0.04222452330092589, -0.12644597536159885, 0.02462833643787437, -0.006274375236696667, -0.0874204476777878, -0.07095742308431201, 0.1777695222861237, -0.12643878327475652, -0.07588901743292809, 0.08988336670315927, 0.0565632503065798, 0.04414659324619505, -0.028848238910237946, -0.03874571041928397, 0.03944606220142709, -0.018145894217822287, 0.02805910880366961, -0.029242295771837234, -0.20419202082686952, -0.09253901884787612, 0.01294391933414671, 0.1389701979027854, -0.1819789818384581, 0.06204639095813036, -0.11334567310081588, 0.13792605987853473, -0.02634068847530418, 0.05851504881219524, -0.015389719977974892, 0.04392962530255318, -0.07011616147226757, 0.010950049488908714, -0.03974952274519536, 0.22382714288930097, 0.10857764010628064, 0.04235041982287334, -0.002972521896784504, -0.03819861407909128, 0.0015110371427403556, 0.21475149856673345, -0.006662546657025814, -0.09540232224389911, -0.07881565397191378, 0.03798630378312535, 0.06034875247213575, 0.027262467063135568, -6.380845508020785E-4, 0.10569482472621732, -0.13261165635453329, 0.03771849597493807, 0.0350563522014353, 0.07201329411731826, 0.054166774161987834, 
0.05912539611260096, -0.04117297815779845, -0.07249593155251609, 0.04673711996939447, 2.5771144363615247E-5, -0.18946548447840741, -0.14548150677647856, 0.08335085461537042, 0.1571948150069349, 0.07877916180425219, 0.040688058050970234, -0.047293868381530046, 0.05456212920964592, -0.019719206650430955))","List(want, try, dog)"
Cole & Marmalade,43112.0,I blow smoke in my cats ear right to his brain,1,"List(i, blow, smoke, in, my, cats, ear, right, to, his, brain)","List(1, 100, List(), List(0.08869852024045857, -0.06921172980219126, 0.18917915267361837, -0.08392188867384738, -0.0724193842404268, 0.10376042754135349, -0.058406558226455345, 0.0159420735118064, 0.05600973074350887, 0.01039572474970059, -0.04562356373803182, 0.030906513693149795, 0.0059270535553382206, -0.04341583893719045, -0.17895731431516734, -0.004143979188732125, -0.03654006860134277, 0.03178462005135688, -0.10622292587702925, -0.04558534479954026, -0.013947658748789267, 0.06744514371861111, 0.02970695360140367, -0.0164144096726721, 0.06996051780879498, 0.024682228749787267, -0.07928733188997615, 0.046117245304313575, 0.023222364273599604, 0.06468839943408966, -0.002066387710246173, 0.08365636691451073, 0.012941054957495495, 0.09589683203111318, -0.06967235153371638, -9.155168452046134E-4, 0.03265694935213436, 0.01826491951942444, -0.005401689220558514, 0.20245216143402187, 0.01650929484855045, 0.060834553760783325, 0.134439828382297, -0.052835835281505504, 0.18231407502158123, 0.07378573647954248, -0.04818564484065229, 0.022946046749976547, -0.10144006308506835, 0.0026925958439030433, -0.12242365831678564, -0.19196268285370685, -0.08273805415427143, 0.04770046032287858, 0.08549785512414845, -0.11029721813445742, -0.014890094139528546, -0.20521191278980538, 0.10971558762883599, 0.038205173882571136, 0.03480411214034327, -0.013642556457356974, 0.027942037040537056, -0.01652187896384434, 0.001935300501910123, -0.029198415132916787, 0.2588126098906452, 0.04059295136142861, -0.03442023753781211, 0.07050857538442043, -0.007512927140024575, -0.16148450784385204, 0.03597743157297373, -0.05818572504953905, -0.0366797342219136, -0.010016130791469053, 0.04046215862035751, 0.08391308649019762, -0.05434482972222296, 0.07448233689435504, 0.046204947951165115, -0.2110433133149689, 0.10268333757465536, 0.04623666727407412, 
-0.0717510161074725, -0.07471170598133044, -0.06152430464598266, 0.004804257684471932, -0.13928939334370874, 0.001338709145784378, 0.01508875038813461, -0.28702452643351123, -0.1858761322430589, 0.10357968238267032, 0.07336551188067957, 0.05918144714087248, -0.037593810903755104, -0.08173921098932624, 0.07026694173162634, 0.09809573265639218))","List(blow, smoke, cats, ear, right, brain)"
Zak Georges Dog Training rEvolution,63541.0,my dog lucky wont eat of his bowl hell only eat out peoples hands how do i get him to eat out of his bowl,1,"List(my, dog, lucky, wont, eat, of, his, bowl, hell, only, eat, out, peoples, hands, how, do, i, get, him, to, eat, out, of, his, bowl)","List(1, 100, List(), List(0.10748952932655811, 0.014644682668149472, 0.08934833567589522, 0.009441435635089875, 0.025657075680792334, 0.09812133725732565, -0.01962353616952896, 0.029013569969683885, 0.009428030345588923, -0.003647192046046257, -0.16639159008860588, -0.05196924354881048, 0.0018827961105853319, -0.055542717138887386, -0.19661732560023665, -0.0501146237552166, -0.05416282210499048, -0.011591513827443124, -0.14914250085130334, 0.023592431340366603, 0.013343537133187056, 0.04975450048223138, 0.02729053871706128, -0.06142535637132824, 0.09496789803728461, -0.015828610341995956, -0.009805352189578117, -0.01980819404125214, -0.028880756441503765, 0.11180601492524148, 0.011253205633256585, 0.05874194085597992, 0.04951195687055588, 0.07038082040846348, -0.03124967847019434, 0.022823593467473983, 0.09406973034143448, -0.020199880618602038, -0.027754758941009643, 0.1670206787250936, -0.11512951612472534, 0.05014008476398885, 0.07663952831178904, -0.014752077534794808, -0.010578340291976929, 0.022688640635460614, -0.025474815517663955, 0.05117227878421545, -0.10220073504373431, -0.0582200781442225, -0.08599848784506321, -0.12213692542165518, -0.041749214492738246, 0.0549134910851717, 0.14948210485279562, -0.05121075112372637, 0.07873335648328066, -0.13352054953575135, 0.11157105255872012, 0.06987216506153346, 0.05791072851512581, 0.08128534320741893, 0.04146762937307358, -0.026311614364385606, 0.07583582952618599, -0.12543021693825723, 0.19047610230743886, 0.05275855224579573, 0.0157853052765131, -0.07231538116931915, -0.04306560337543488, -0.14965583343058825, 0.1336092309653759, -0.003211762346327305, -0.037312900889664886, -0.0642353942990303, -0.018296890314668416, 
0.1494659322500229, 0.06107750616967678, 0.06813134398311377, 0.11150614440441131, -0.13636238925158978, 0.05599132403731346, -0.036151823550462724, 0.09755648454651236, -0.12309702232480049, 0.1186497800797224, -0.0782052796613425, -0.09464672787114978, -0.038166700303554534, 0.13439188666641713, -0.2470585160702467, -0.08304008305072784, 0.1917586261034012, 0.15490343786776067, 0.14426070343703032, -0.008124075904488564, -0.01551724759861827, 0.07698118776082993, 0.07731568362563848))","List(dog, lucky, wont, eat, bowl, hell, eat, peoples, hands, get, eat, bowl)"
DarkDynastyK9s,175184.0,thats what my dog do,1,"List(thats, what, my, dog, do)","List(1, 100, List(), List(-0.09564292691648008, 0.16462829597294332, 0.13252610415220262, -0.052013386785984043, 0.014756854623556137, 0.13843841552734376, -0.1823470711708069, 0.12769070565700533, -0.03565411856397987, -0.031332043744623665, -0.06168989650905132, 0.030762221664190292, -0.04752304404973984, -0.11053234394639731, -0.17680432163178922, -0.03909990340471268, -0.05957886688411236, 0.061067018099129206, -0.058718220749869945, 0.037072740495204926, -0.09399931244552136, 0.08790078088641168, -0.04123937813565135, -0.012186171114444734, -0.0636980147100985, -0.09660586854442954, 0.001008649915456772, -0.15447964519262314, -0.016032578796148302, 0.13144652247428895, -0.11358829215168953, 0.06966525912284852, -0.015710038063116372, -0.025388557836413386, -0.08677423745393753, -0.018934500636532903, 0.04253383628965821, -0.10946894930675627, -0.0766807071864605, 0.03894123239442707, -0.2503435000777245, 0.010344111919403078, -0.03892035335302353, 0.1135632373392582, -0.052910816669464115, -0.10652170404791833, 0.016431376338005066, 0.027367955446243288, 0.08871594667434693, 0.004134295135736466, -0.1194342575967312, -0.05897194836288691, -0.04990578927099705, 0.06792684420943261, 0.12695522047579288, -0.06038951799273491, 0.029665660113096237, -0.13801816403865816, 0.08343312032520772, -0.035468858852982524, 0.04120863024145365, -0.06664248518645764, 0.11089197993278505, -0.0702677458524704, -0.04083194918930531, -0.07723752334713936, 0.20355706214904787, 0.11953682191669941, 0.021428924798965455, 0.0012067884206771852, 0.0577259223908186, -0.057205121591687204, 0.13611112534999847, 0.10124586906749755, 0.08587825186550618, -0.2325075887143612, -0.012903738021850587, 0.1450848251581192, -0.17819387279450893, 0.05789556559175253, 0.1685182109475136, -0.10727767571806908, 0.07380505800247193, -0.015694399923086168, 0.09528872817754747, -0.0039727121591567995, 0.019180922210216524, 
-0.07218359708786011, -0.144282166659832, 0.1507846012711525, -0.05875306557863951, -0.07523798453621566, -0.14457167796790601, 0.058826351165771486, 0.1253655094653368, 0.10395298302173615, 0.0010764800012111665, -0.15582600831985474, 0.05146027207374573, -1.5361718833446504E-4))","List(thats, dog)"
The Pet Collective,203881.0,Im so happy i think Im almost crying Im hugging my dog Ik its not a cat but its a animal that need love,1,"List(im, so, happy, i, think, im, almost, crying, im, hugging, my, dog, ik, its, not, a, cat, but, its, a, animal, that, need, love)","List(1, 100, List(), List(-0.05398442433215678, -0.02690101669092352, 0.15329515499373275, -0.0013575746367375055, 0.0451279921811268, 0.0802868596316936, -0.06910796894226223, -0.033523242066924766, 0.018101139556771763, 0.11036963985922435, -0.14762373264723766, 0.11117139179259539, -0.02509756257253078, -0.006181019653467956, -0.05033245147205889, -0.004965288972016424, -0.10997954647367199, 0.06613334401724083, -0.058526939557244376, -0.023276787716895342, 0.017569821560755372, 0.14039537143738318, 0.03950079889424766, 0.1411870188312605, -0.048798505536979064, -0.15686424092564266, 0.011517212338124711, -0.026004756606804826, -0.0200081246875925, 0.07693644287064672, -0.06844690527456501, 0.10529281131069486, 3.9601504492263E-5, 0.009615272341761738, -0.02939806859164188, -0.05184134654943288, 0.083778018442293, -0.012561295123305172, -0.05173788328344623, 0.11351599753834307, -0.08465031984572609, -0.06956894532777369, -0.0041835741527999435, 0.11740112452146907, -0.018487621874858935, 0.06705094027953842, 0.07100990228354931, 0.029699681443162262, -0.04921681307799493, 0.07639144159232576, -0.04345887123296658, -0.09284922996691117, -0.05399921440402977, -0.01195154230420788, 0.17257845432808, 0.00275037115594993, 0.06203689909307286, -0.0703157012273247, 0.05186401059230168, -0.026640171340356268, 0.03263158441404812, -0.1019473784447958, 0.07918710971110461, -0.04214536435999131, 0.004617083158033589, -0.1166918040253222, 0.16721440269611776, 0.02697160283666259, -0.010480001583346166, -0.07071679107806024, -0.01975782999458412, 0.11475409970929225, 0.045532207198751465, 0.06700454978272319, -0.10957059046874443, -0.08887787151616067, 0.13860819490704063, 0.035471170102634154, 
0.003682435412580768, -0.0014850485216205318, 0.06948087106381232, -0.1451086476445198, -0.08430564534501173, 0.05330204016839464, 0.060614833530659475, 0.04186622746055946, 0.023232739185914397, -0.029491022384415068, -0.09057623305125162, 0.10020273465973635, -0.003786271942468981, -0.18888507046115893, -0.1448547017450134, 0.030329394532600418, 0.10676514382551734, 0.0784747542347759, -0.06944403331848055, -0.13490187642552579, -0.022126214132489015, 0.006738901827096318))","List(im, happy, think, im, almost, crying, im, hugging, dog, ik, cat, animal, need, love)"
Cole & Marmalade,263721.0,My cat scratches at it I spray at her but not her so it scars her if she keeps doing it I will spray her ya she stoped for a wile then now she is doing it agin ☹️ ya I always like my door shut and if she is in here in the morning she will want out and Im like IM TRYING TO SLEEEP STOOOOP PLZ IM TIRD 😭😭😭😭then someone will let her out and Im like yaaaaas 5 mins of peace but its hard for me to sleep alone like I have to have my kitty or I get sad and lonely and feel kinda unsafe but she make me feel safe and she keeps me safe and I keep her safe!,1,"List(my, cat, scratches, at, it, i, spray, at, her, but, not, her, so, it, scars, her, if, she, keeps, doing, it, i, will, spray, her, ya, she, stoped, for, a, wile, then, now, she, is, doing, it, agin, ya, i, always, like, my, door, shut, and, if, she, is, in, here, in, the, morning, she, will, want, out, and, im, like, im, trying, to, sleeep, stoooop, plz, im, tird, then, someone, will, let, her, out, and, im, like, yaaaaas, 5, mins, of, peace, but, its, hard, for, me, to, sleep, alone, like, i, have, to, have, my, kitty, or, i, get, sad, and, lonely, and, feel, kinda, unsafe, but, she, make, me, feel, safe, and, she, keeps, me, safe, and, i, keep, her, safe)","List(1, 100, List(), List(0.08892963050558107, -0.044371326281238466, 0.09978573151203173, -0.045767875729999956, 0.052303697541872446, 0.09098385617647681, -0.15103663436527695, 0.004321259711193101, 0.04437867554788095, 0.04877398683569364, -0.13825778947061587, 0.06632762969445227, 0.004891863453292077, -0.007631408491761249, -0.10503829757292424, -0.015406708465889096, -0.10833743714254289, 0.0761504532945823, -0.09161401816996775, 0.015529490640713652, -0.004295060155732978, -0.0063514388110038015, -3.7733181529948787E-4, 0.007857008801487785, 0.004573388250953998, -0.04562900040294945, 0.021859476019369228, 4.9536533263181484E-5, -0.022661071794573218, 0.04756359961785136, -0.0044238023350067854, 0.11037687031691143, 
-0.011904400425978125, 0.048569430433769496, -0.03523075871353578, -0.013851773512038973, 0.08262732361477139, 0.005512548847154023, -0.042577816488882224, 0.13339615749165176, 0.021314190292257756, -0.03255167192940961, 0.05356420207798721, 0.008663303956704876, 0.015660997780580672, 0.0038688872910795672, 0.09040838533321455, 0.04085704516089003, 0.0020991581728711964, 0.05722804416549362, -0.06499386937620358, -0.1804324771538602, -0.11789409409187013, 0.04734353698105195, 0.121972706101294, -0.045715898316449316, 0.04421009540167307, -0.09907925850711763, 0.11631837536885042, 0.026470444307872844, 0.06717842202323804, -0.033105928240524184, 0.07500598335689339, 0.0405515435887801, -0.03730030972961216, -0.07481943815101628, 0.1614242686454447, 0.06819156640361348, -0.018991255882992258, -0.07628077359197871, -0.0443161801329904, 0.043600594388320334, 0.1373616614850283, 0.04259615846263665, -0.10498141160520214, -0.07897802164605368, 0.022545411807274627, 0.0690379718045381, 0.013233280327591685, 0.048473253324177235, 0.039202946005389094, -0.057835344651398515, -0.0013439477395055996, 0.0074752132857083195, 0.01547184458092576, 0.0185621588926522, 0.05911821916684388, -0.03824348013885385, -0.07966603092650222, -0.00434031848975968, 0.07254762706279214, -0.18165962850194303, -0.11188931117067114, 0.10730028675442876, 0.07125363648904773, 0.11183617927975231, -0.020386254238650096, -0.0647009885184572, 0.01201373558667969, 0.021675737326844566))","List(cat, scratches, spray, scars, keeps, spray, ya, stoped, wile, agin, ya, always, like, door, shut, morning, want, im, like, im, trying, sleeep, stoooop, plz, im, tird, someone, let, im, like, yaaaaas, 5, mins, peace, hard, sleep, alone, like, kitty, get, sad, lonely, feel, kinda, unsafe, make, feel, safe, keeps, safe, keep, safe)"
Steff J,273292.0,Since my cat is getting old Im gonna start calling him by a new name..GRANDPAW!!How is cat food sold?USUALLY PURR CAN!!GIVEAWAY ENTRY!!!!,1,"List(since, my, cat, is, getting, old, im, gonna, start, calling, him, by, a, new, name, grandpaw, how, is, cat, food, sold, usually, purr, can, giveaway, entry)","List(1, 100, List(), List(0.0325969964480744, -0.03650977107911156, 0.1468297669556565, 0.09168297625505008, 0.020023395584967848, 0.09237976289855747, -0.10297979137976654, 0.08910668589389668, -0.018717976081041764, 0.023102818307681728, -0.04658379973485493, 0.029232820954348426, 0.030236806147373643, 0.028639072634317787, -0.10122672683344439, -0.0024164968098585424, -0.10137865830284472, 0.06234645571273107, -0.0675752131232562, -0.12002240451581132, -0.006162540700573188, 0.04707743304495055, 0.13032048775886115, 0.04528839755445146, 0.021570527917132355, -0.12983979967136222, 0.07173433638392733, -0.08650219139571373, -0.08154259722393294, 0.05427968480552619, 0.05305215295475836, 0.04580529052943278, -0.058394044082468524, 0.08789312123106077, 0.027990356472750697, -0.017811833701741237, 0.0843355067616078, -0.052813981910451106, -0.07262430316768587, 0.1626758732331487, -0.03783753908310945, -0.029592906733831536, 0.08790324117021206, 0.03043234778418152, 0.03100671397987753, -0.0044643093760196985, 0.0072258786083414005, 0.028346467297524214, 0.012955783337999422, 0.026012457835559662, -0.08906379139695604, -0.17238188398858675, -0.02106395965585342, 0.05161698900449735, 0.08125119876617996, -0.009626127949629266, 0.13878015679522204, -0.17342515646193465, 0.04482704680413008, 0.05120851525750298, -0.02648684493480967, -0.051239912290699206, 0.02505428965937776, 0.02449160679959907, -0.026899683969811752, -0.03286230134276243, 0.1160185978246423, 0.17205303190320803, 0.03989706912006323, -0.013876503297629265, 0.046334126229899436, 0.0019253070394580182, 0.05308249918743968, 0.09654696858846225, -0.13854889349582106, -0.05298917499693254, 
0.06367718213452743, 0.15149579182840311, -0.017078677568441402, -0.05089319945098116, 0.13029765849933028, -0.04915309041881791, -0.04424641177488061, -2.914033471964873E-4, 0.04720566822036815, 0.041986435929384947, 0.03321166569367051, -0.03105014508876663, -0.10656442106343234, 0.06570122122334747, 0.07167331295428225, -0.15724312953758413, -0.07691446639812337, 0.08175923682462712, 0.08432308878176488, 0.15667301319682828, -0.08904654308347605, -0.0901149782137229, 0.06213219959261971, 0.044170539754514515))","List(since, cat, getting, old, im, gonna, start, calling, new, name, grandpaw, cat, food, sold, usually, purr, giveaway, entry)"
Cole & Marmalade,329273.0,I have several plants of catnip planted around our garden but my cats dont really seem bothered by it? Are my cats constantly high or something???,1,"List(i, have, several, plants, of, catnip, planted, around, our, garden, but, my, cats, dont, really, seem, bothered, by, it, are, my, cats, constantly, high, or, something)","List(1, 100, List(), List(0.08043761795852333, -0.0848665013551139, 0.11690790535738836, -0.02916046167508914, 0.06438320799833701, 0.08158667528858551, -0.1344547827119151, 0.10894716283879602, -0.005565675297895304, 0.04070444756115858, -0.07081055440581763, 0.02258428525573646, -0.03514408097208406, -0.08611423433579218, -0.08683115020036125, -0.1413969470259662, 0.02322115070329836, 0.13378589704202917, -0.07525088754482567, -0.030551491401498567, 0.06255148390594584, 0.0446624270024096, 0.05664820532099559, 0.0229275761745297, 0.05798538072177997, -0.047387773694936186, 0.14484589449756852, 0.007097603377098075, -0.015954853841461815, 0.05654748461137598, 9.461470091572176E-4, 0.1691642263904214, -0.04299540579533921, 0.0368061841178972, -0.10529920322677265, 0.01729609278173974, 0.0511127501153029, 0.0012016879269280112, -0.00846209335857286, 0.10591365966516046, -0.04121926351217553, -0.009742742135690955, 0.017385845144207664, -0.028755835902232393, 0.10067337443335699, -0.0073165112676528785, -0.058821330921581164, 0.01481999965527883, -0.09140744452508023, -0.030607431947898407, -0.09472566506323907, -0.10537988674612, -0.09684653179003642, 0.05671891190398198, 0.1338168875887417, -0.04749959337865361, 0.15981720452411818, -0.10444010016866602, 0.03044240133693585, 0.05014522287708063, 0.1254113692020138, 0.08557238171880062, 0.06941378646745132, 0.027026707073673606, -0.037779283709824085, -0.05292242736770557, 0.1730718816129061, 0.014346443815156817, -0.01583501438681896, -0.10706416990321417, 0.011943452221413072, -0.10426266111720067, 0.1387705671815918, -0.0010215817019343376, -0.09125631943774912, 
-0.013026720521828303, -0.08465714380145073, 0.15020910184830427, -0.06246855217390336, -0.03746670434394708, -0.004310081909912137, -0.08834189888483916, 0.055746927588748246, -0.036123050609603524, 0.027397138281510428, -0.024724047362374574, 0.11544270971073554, -0.10003010839080582, -0.19126505338443586, 0.05813539687257547, 0.04823589360771271, -0.2625854258927015, -0.17376816688248745, 0.05281952283201883, 0.04528381263550658, 0.11487981940333088, -0.09751916392885436, -0.049829747151726715, 0.0774564566448904, 0.06650437103011288))","List(several, plants, catnip, planted, around, garden, cats, dont, really, seem, bothered, cats, constantly, high, something)"
Hope For Paws - Official Rescue Channel,344406.0,This is so sad because my dog died and the mom looks just like her and I started crying,1,"List(this, is, so, sad, because, my, dog, died, and, the, mom, looks, just, like, her, and, i, started, crying)","List(1, 100, List(), List(-5.102614431004775E-4, -0.047942626878227056, 0.19011140386819053, -0.052413718603355315, 0.11812341990145413, 0.09673578044595686, -0.12525269996963048, 0.002293009261943792, 0.04633900974141924, 0.15883788878196164, -0.13011114987985867, 0.09660543422949941, -0.015609777544772153, 0.03138606127743658, -0.14799209561591084, -0.015201671196049765, -0.07413860041599132, 0.08362883723365437, -0.09828701066343408, -0.014097620487997406, 0.10805481966388852, 0.13328945754390012, 0.08199600853320015, 0.07903025225785218, -0.049559674124650066, -0.028928905112766905, -0.05001240803271924, 0.02030676613120656, 0.020264116985919442, 0.06605740810597413, -0.02514996075708615, 0.05193319485375755, 0.012129995579782284, 0.025541890829213355, 0.007172186564850179, -0.030135929881668598, 0.09886289121105188, 0.054448659262178754, -0.02090154372547802, 0.11891959358840004, -0.05120898905749383, -0.05978761485924846, -0.04510084029875303, 0.12212991415473975, 0.046233927095799064, -0.0024205195521445648, 0.11040470871682229, -0.04115456668660045, 0.026215898574599505, 0.10922169734380746, -0.11225181244509784, -0.1908385479136517, -0.10961961344276604, 0.005427628167365727, 0.20234415865821861, -0.014868540748322725, 0.03132941426807328, -0.15733727244170087, 0.09937777311394089, 0.0541601764332307, 0.007928091505738465, -0.054078496208316396, 0.11941576135608269, 0.007973076481568185, 0.02450728959305898, -0.15694623817015732, 0.09728416396108897, 0.09342300337984373, 0.09711515491730288, -0.036031212265554224, 0.0432349088552751, 0.07712631272787755, 0.10800686842565865, 0.061187869820155595, -0.1235445341781566, -0.06833310194901729, 0.10870640347466656, 0.09349994096708925, -0.048614398135166416, 
-0.09405471703135652, 0.1095797775411292, -0.11107218177302887, -0.00420853327714691, -0.051098980597759545, -0.018917162363466463, 0.03557988145927849, 0.0714131507434343, -0.05256184120930572, -0.11006696590859638, 0.04982554461610944, 0.029345413473875898, -0.13480844662377708, -0.13684584651338427, 0.06166692416330701, 0.09278924944565484, 0.08454524980563866, 0.022114200321467298, -0.08126730964470066, -0.019091564690155025, 0.06695040401169344))","List(sad, dog, died, mom, looks, like, started, crying)"
stacyvlogs,346294.0,my cat died today im sad woching this video,1,"List(my, cat, died, today, im, sad, woching, this, video)","List(1, 100, List(), List(-0.007834469019952748, 0.027541239720044863, 0.13935521990060806, -0.07501901908674173, 0.1846330331948896, 0.0462243614731253, 0.010944797657430172, -0.04282891243282291, 0.08776161726564169, 0.20493924534983105, -0.07578308688890602, 0.09236786152339643, -0.1120851207524538, -0.005323226108885137, -0.0871997620496485, -0.028782181441783905, -0.12265767840047677, 0.05817171484361299, -0.11340960135890377, -0.08821825103627311, 0.05893745335439841, 0.029793783711890377, 0.1337790654765235, 0.07171898273130257, -0.0942183743075778, -0.059960121030194886, 0.023942916757530634, 0.0016127512272861267, 0.03200614192367841, 0.046645301290684275, -0.0021026891966660815, 0.12934411855207548, -0.035469002297355064, -0.09027702154384719, 0.03656469740801387, -0.019333558777968086, -0.014211850447787179, -6.032471234599749E-4, -0.07040999974641535, 0.18889564648270607, -0.07925261257009374, 0.011747638048190208, -0.04079557370601428, 0.15504711866378784, 0.010001058379809061, -0.09306987074928151, 0.0639548779775699, -0.12949937623408106, 0.048086995672848486, 0.048843664634558887, -0.15228520178546506, -0.11296150740236044, -0.022728233287731804, -0.014385100454092026, 0.09285158415635426, 0.061062212205595434, -1.1735450890329148E-4, -0.1793293585586879, 0.034917670767754316, -0.05008082091808319, -0.010018772549099391, -0.13791593888567552, 0.1516243006206221, 0.029511247244146135, 0.022503794274396364, -0.05020020798676544, 0.10907972686820559, 0.05514562626679738, 0.05963066458288166, 0.01605759561061859, 0.11186136677861214, 0.15230233160157997, 0.03498431336548593, 0.07644686723748842, -0.043504896884163216, -0.10098452659116851, 0.28697717415828566, 0.07133984627823034, 0.018955121955109965, -0.0565152827443348, 0.05693579837679863, -0.08512597075766987, -0.08276738836947414, 0.0591610065764851, 0.005243846111827427, 
0.05496647622850206, 0.05399964791205194, -0.05135098208362857, -0.1755362840162383, 0.08473383262753487, 0.05826879292726517, -0.16485844200683963, -0.1479613035917282, 0.07515330012473795, 0.08358601708379056, 0.0017860179973973166, -0.13119194092966305, -0.1597871199871103, -0.0960506378647147, 0.11417293905590971))","List(cat, died, today, im, sad, woching, video)"


CountVectorizer builds a vocabulary of known words from a collection of tokenized documents and encodes each document as a vector of term counts over that vocabulary. We use the vectors it generates to fit an LDA model. Since our goal is to find topics in the text, after training the LDA model we extract its topics and use a user-defined function to translate the term indices in each topic back into readable English words.
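To make the roles of `minTF` and `minDF` more concrete, here is a minimal pure-Python sketch of the same idea on a toy corpus. It is not Spark's implementation: the helper names `fit_vocabulary` and `transform` and the toy documents are made up for illustration, and ties in term frequency are broken alphabetically only to keep the output deterministic.

```python
from collections import Counter


def fit_vocabulary(docs, min_df=1):
    """Return the terms found in at least `min_df` documents, most frequent first."""
    term_counts = Counter()  # total occurrences of each term across the corpus
    doc_counts = Counter()   # number of documents that contain each term
    for tokens in docs:
        term_counts.update(tokens)
        doc_counts.update(set(tokens))
    kept = [term for term in term_counts if doc_counts[term] >= min_df]
    # Sort by corpus frequency; break ties alphabetically (an illustrative choice).
    return sorted(kept, key=lambda term: (-term_counts[term], term))


def transform(tokens, vocabulary, min_tf=1):
    """Encode one document as {vocabulary index: count}, dropping counts below `min_tf`."""
    index = {term: i for i, term in enumerate(vocabulary)}
    counts = Counter(t for t in tokens if t in index)
    return {index[t]: c for t, c in counts.items() if c >= min_tf}


# Hypothetical toy corpus of already-filtered tokens.
docs = [
    ["cat", "died", "today", "sad"],
    ["dog", "eat", "bowl", "eat", "bowl"],
    ["cat", "food", "cat"],
    ["dog", "sad", "dog"],
]

vocab = fit_vocabulary(docs, min_df=2)
encoded = [transform(d, vocab, min_tf=2) for d in docs]
print(vocab)
print(encoded)
```

One thing the sketch makes visible: with a per-document threshold of 2, a short document in which no vocabulary word repeats is encoded as an empty vector, so it may be worth checking how many comments end up empty before fitting LDA.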

In [61]:
from pyspark.mllib.linalg import Vector, Vectors
from pyspark.ml.clustering import LDA
from pyspark.ml.feature import CountVectorizer, CountVectorizerModel

removed = removed.select('userid','filtered')

# minTF: minimum number of times a term must appear in a document to be counted for that document
# minDF: minimum number of documents a term must appear in to enter the vocabulary
cv = CountVectorizer(inputCol="filtered", outputCol="features", minTF=2, minDF=4)

countVectorModel = cv.fit(removed)

countVectors = countVectorModel.transform(removed).select("userid", "features").cache()

print(len(countVectorModel.vocabulary))  # vocabulary size (number of distinct terms kept)

# k: number of topics; maxIter: maximum number of iterations
lda = LDA(k=10, maxIter=50)


ldaModel = lda.fit(countVectors)


# Print topics and top-weighted terms
topics = ldaModel.describeTopics(maxTermsPerTopic=20)
print("The topics described by their top-weighted terms:")
topics.show(truncate=False)




In [62]:
transformed = ldaModel.transform(countVectors)
transformed.show(truncate=False)

In [63]:
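from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

vocabArray = countVectorModel.vocabulary

# Map each topic's term indices back to the corresponding vocabulary words
ListOfIndexToWords = udf(lambda wl: [vocabArray[w] for w in wl], ArrayType(StringType()))
# Format term weights to four decimal places (defined here but not used below)
FormatNumbers = udf(lambda nl: ["{:1.4f}".format(x) for x in nl], ArrayType(StringType()))

topics.select(ListOfIndexToWords(topics.termIndices).alias('Words list in Top 10 Topics')).show(truncate=False, n=10)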
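The index-to-word translation can also be sketched without Spark. The example below assumes rows shaped like the output of `describeTopics` (columns `topic`, `termIndices`, `termWeights`). The vocabulary, the rows, and the helper name `topic_terms` are hypothetical and only illustrate the lookup and the four-decimal formatting.

```python
def topic_terms(term_indices, term_weights, vocabulary):
    """Pair each term index with its vocabulary word and a formatted weight."""
    return [
        (vocabulary[i], "{:1.4f}".format(w))
        for i, w in zip(term_indices, term_weights)
    ]


# Hypothetical vocabulary and topic rows, for illustration only.
vocab_array = ["dog", "cat", "love", "food", "video"]
rows = [
    {"topic": 0, "termIndices": [1, 3, 2], "termWeights": [0.41, 0.2, 0.05]},
    {"topic": 1, "termIndices": [0, 4], "termWeights": [0.5, 0.125]},
]

readable = {
    r["topic"]: topic_terms(r["termIndices"], r["termWeights"], vocab_array)
    for r in rows
}
for topic, terms in readable.items():
    print(topic, terms)
```

Keeping the weights next to the words can help when naming topics, since a word with a very small weight says little about the topic even if it appears in the top-term list.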


Since the model only clusters the data, it returns each topic as a list of words rather than a named topic. I interpreted the top 4 topics myself from the words listed in each topic.

In [65]:
print("**Top 4 topics**")
print("cats with babies")
print("breakfast")
print("dogs with farm")
print("dogs friendly with people or other animals")

#### 4. Identify Creators With Cat And Dog Owners In The Audience

In [67]:
dataset.createOrReplaceTempView("dataset_sql")

In [68]:
%sql
Select creator_name, count(userid) From dataset_sql Where label = 1.0 Group by creator_name Order by count(userid) desc limit 10


creator_name,count(userid)
The Dodo,4110
Cole & Marmalade,2904
Gohan The Husky,2353
Zak Georges Dog Training rEvolution,2212
Hope For Paws - Official Rescue Channel,1873
Vet Ranch,1748
Gone to the Snow Dogs,1718
Brian Barczyk,1609
Robin Seplut,1584
Taylor Nicole Dean,1575
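Note that `count(userid)` counts rows, so if `dataset` holds one row per comment (an assumption, based on the sample rows shown earlier), a user who comments several times on the same creator is counted several times. The sketch below uses made-up rows to show how a ranking by comments can differ from a ranking by distinct users, which is what `COUNT(DISTINCT userid)` would give in SQL.

```python
from collections import Counter, defaultdict

# Hypothetical (creator_name, userid, label) rows, one per comment.
rows = [
    ("The Dodo", 1.0, 1.0),
    ("The Dodo", 1.0, 1.0),
    ("The Dodo", 1.0, 1.0),
    ("The Dodo", 2.0, 0.0),
    ("Vet Ranch", 4.0, 1.0),
    ("Vet Ranch", 5.0, 1.0),
]

# Equivalent of count(userid) ... WHERE label = 1.0 GROUP BY creator_name
comment_counts = Counter(creator for creator, _, label in rows if label == 1.0)

# Equivalent of COUNT(DISTINCT userid) with the same filter and grouping
users_by_creator = defaultdict(set)
for creator, userid, label in rows:
    if label == 1.0:
        users_by_creator[creator].add(userid)
owner_counts = {creator: len(users) for creator, users in users_by_creator.items()}

top_by_comments = comment_counts.most_common(10)
top_by_owners = sorted(owner_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:10]
print(top_by_comments)
print(top_by_owners)
```

Which count is more appropriate depends on the goal: comment volume reflects engagement, while distinct users reflects audience size.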


#### 5. Analysis and Future Work

In this project, we used three different classification models to identify pet owners from YouTube comments. The gradient boosting model slightly outperformed the other two, with an AUC of 0.97. Based on the classifier, around 15 percent of all users in the dataset are pet owners. We also used an LDA model to find the topics pet owners are most interested in: "cats with babies", "breakfast", "dogs with farm", and "dogs friendly with people or other animals".

Some problems I encountered during the project:
1.  The predicted share of pet owners among all users, 15 percent, seems too high compared with the 1 percent of users labeled as owners in the original dataset. The reason is that we trained the model on undersampled data to deal with the class imbalance, and its precision there was 0.88, meaning about 12 percent of the users it predicted as pet owners were actually non-owners. Because the original dataset contains far more non-owners (labeled 0) than the undersampled training data, running all the data through the model falsely predicts many non-owners as pet owners.
2.  The users we labeled 0 are those who did not say or otherwise indicate that they have pets. We cannot be sure they do not have pets, so using them as negatives to train the classifiers may affect the classifiers' true accuracy.
3.  How to better interpret the topics extracted from the LDA model. Since LDA is unsupervised and describes each topic only as a weighted list of words, we have to look at the word list for each topic and guess what the topic is about. This can be a bit tricky.
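The effect described in problem 1 can be checked with a rough back-of-the-envelope calculation. The sketch below takes the precision of 0.88 and the 1 percent labeled share from the text, but the recall of 0.90 and the 1:1 class balance of the undersampled evaluation set are assumptions made only for illustration. It also treats the labeled share as the true share of owners, which problem 2 suggests is an underestimate.

```python
def false_positive_rate_from_balanced(precision, recall):
    """On a 1:1 balanced set, precision = recall / (recall + fpr); solve for fpr."""
    return recall * (1.0 - precision) / precision


def predicted_positive_share(prevalence, recall, fpr):
    """Share of all users predicted positive: true positives plus false positives."""
    return prevalence * recall + (1.0 - prevalence) * fpr


precision_balanced = 0.88  # reported in the text for the undersampled data
recall_assumed = 0.90      # ASSUMPTION: not reported in the text
prevalence = 0.01          # share of users labeled as owners, from the text

fpr = false_positive_rate_from_balanced(precision_balanced, recall_assumed)
share = predicted_positive_share(prevalence, recall_assumed, fpr)
precision_full = prevalence * recall_assumed / share

print("false positive rate: {:.4f}".format(fpr))
print("predicted owner share on all users: {:.4f}".format(share))
print("precision on all users: {:.4f}".format(precision_full))
```

Under these assumptions the predicted owner share comes out at roughly 13 percent, the same order of magnitude as the 15 percent observed, with most predicted owners being false positives. This is consistent with the explanation above, although it does not prove it.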