## Content
Build classification models that identify cat and dog owners from the text of their comments.
* Upload and Import Data

## Upload and Import Data

In [0]:
file_location = "/FileStore/tables/animals_comments_csv.gz"
df = spark.read.load(file_location, format='csv', header = True, inferSchema = True)
df.show(10)

+--------------------+------+-------------------------------------+
|        creator_name|userid|                              comment|
+--------------------+------+-------------------------------------+
|        Doug The Pug|  87.0|                 I shared this to ...|
|        Doug The Pug|  87.0|                   Super cute  😀🐕🐶|
|         bulletproof| 530.0|                 stop saying get e...|
|       Meu Zoológico| 670.0|                 Tenho uma jiboia ...|
|              ojatro|1031.0|                 I wanna see what ...|
|     Tingle Triggers|1212.0|                 Well shit now Im ...|
|Hope For Paws - O...|1806.0|                 when I saw the en...|
|Hope For Paws - O...|2036.0|                 Holy crap. That i...|
|          Life Story|2637.0|武器はクエストで貰えるんじゃないん...|
|       Brian Barczyk|2698.0|                 Call the teddy Larry|
+--------------------+------+-------------------------------------+
only showing top 10 rows



## Data Preprocessing

In [0]:
df.dtypes

Out[2]: [('creator_name', 'string'), ('userid', 'double'), ('comment', 'string')]

In [0]:
print("Number of rows in df:", df.count())

Number of rows in df: 5820035


In [0]:
# Count null values in each column.
# Nulls make up less than 1% of the data, so rows with a null comment or userid are removed below.
print('Number of null values in creator_name: ',df.filter(df['creator_name'].isNull()).count())
print('Number of null values in userid: ',df.filter(df['userid'].isNull()).count())
print('Number of null values in comment: ',df.filter(df['comment'].isNull()).count())

Number of null values in creator_name:  32050
Number of null values in userid:  565
Number of null values in comment:  1051
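
The "less than 1%" claim in the cell comment can be checked directly from the counts printed above. The sketch below is plain Python that uses only those reported numbers; the variable names are illustrative and not part of the notebook.

```python
# Counts copied from the notebook output above.
total_rows = 5820035
null_counts = {"creator_name": 32050, "userid": 565, "comment": 1051}

# Share of rows with a null in each column, as a percentage.
null_pct = {name: 100.0 * n / total_rows for name, n in null_counts.items()}

for name, pct in null_pct.items():
    print(f"{name}: {pct:.4f}% null")
```

Every column stays well under 1%, with `creator_name` the largest at roughly half a percent.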


### Removing Missing Data

In [0]:
# Remove rows with a missing comment or a missing userid, then drop exact duplicate rows
def remove_null(df):
    df_drop = df.filter(df['comment'].isNotNull())
    df_drop = df_drop.filter(df_drop['userid'].isNotNull())
    df_drop = df_drop.dropDuplicates()
  
    print('After dropping, we have ', str(df_drop.count()), 'row in dataframe')
    return df_drop
df = remove_null(df)

After dropping, we have  5757212 row in dataframe
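
The row count falls by 5820035 - 5757212 = 62823, while at most 1051 + 565 = 1616 rows have a null comment or userid, so most of the removed rows must be exact duplicates. The following is a minimal plain-Python sketch of the same filter-then-deduplicate logic on toy data. The function name `remove_null_rows` and the toy rows are hypothetical and only illustrate the behaviour; the notebook itself does this with Spark.

```python
def remove_null_rows(rows):
    """Drop rows with a missing comment or userid, then drop exact duplicates."""
    seen = set()
    kept = []
    for row in rows:
        creator_name, userid, comment = row
        if comment is None or userid is None:
            continue  # mirrors the two isNotNull() filters
        if row in seen:
            continue  # mirrors dropDuplicates(): the whole row is compared
        seen.add(row)
        kept.append(row)
    return kept

# Hypothetical toy rows, not taken from the dataset.
toy_rows = [
    ("A", 1.0, "nice dog"),
    ("A", 1.0, "nice dog"),   # exact duplicate, dropped
    ("B", None, "hello"),     # missing userid, dropped
    ("C", 3.0, None),         # missing comment, dropped
    (None, 4.0, "cute"),      # missing creator_name only, kept
]
cleaned = remove_null_rows(toy_rows)
print(cleaned)
```

Note that rows with a null `creator_name` survive, as they do in `remove_null` above.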


### Convert all comments to lower case

In [0]:
import pyspark.sql.functions as F
df = df.withColumn('comment', F.lower(F.col('comment')))
display(df)

creator_name,userid,comment
Brian Barczyk,28322.0,since i watched these snake videos i am not afraid of snakes anymore and now i really want a snake for my birthday
WaysideWaifs,89939.0,why are humans so inhuman...
MaxluvsMya,3267.0,why are you always awake so late lol
Brave Wilderness,130178.0,what is worst noni or the bullet ant sting?
Hope For Paws - Official Rescue Channel,1806.0,when i saw the end it said to adopt i saw different animal sites i was mad that they separated the cute little pups after being together for a long time
Brian Barczyk,2698.0,call the teddy larry
The Dodo,24401.0,u need a scuba tank so you can stay down longer
BarkBox,41679.0,well never stop getting barkbox for our dog he likes it too much. does anyone know if there is a barkbox option to get just one toy a month? thats all our dog can handle
RaleighLink14,65992.0,since bratayley put up this video annie only went to see her horse 2 times. they travel to much and i totally agree what raleigh said. also green horse and green rider... it isnt a great mix. im not being rude its just my opinion. xx
Viktor Larkhill,89939.0,omg.. i pray the doggies get well soon and finds lovely forever homes.... viktor you are an inspiration to many... god bless you..


### Data Sampling (around 10% data)

In [0]:
# Preprocessing the full dataset and training models takes too long on the free Databricks tier, so this project uses only 500,000 rows.
from pyspark.sql.functions import rand 
df.orderBy(rand(seed=0)).createOrReplaceTempView("data")
df_sample = spark.sql("select * from data limit 500000")
display(df_sample)

creator_name,userid,comment
Waggle TV,935462.0,omg so cuuuttteeee! i my heart melted 😍 😍 😍 😍 😍 🐱🐱🐱🐱🐈🐈🐈🐈
Brave Wilderness,823749.0,will i be with you next time?
The Pet Collective,178914.0,is it stopped? lol
Talking Kitty Cat,2370875.0,do another vid
Brave Wilderness,1018655.0,you should try ofc if your in ohio
Info Marvel,132417.0,quiero un funko xd
Hope For Paws - Official Rescue Channel,2267432.0,aww this video is so sweet...i was crying when i saw the name😢
Brave Wilderness,854684.0,and it was big as that one
Ericas Slot World,32528.0,awesome wins
Shehry Vlogs,1269044.0,happy ramzan😃🌹💮ramzan mubarak.....


In [0]:
df_sample.count()

Out[8]: 500000
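
Shuffling with a fixed seed and then taking the first 500,000 rows gives a reproducible uniform sample of exactly that size. The sketch below shows the same idea in plain Python and computes the fraction actually kept. The helper `sample_rows` and the stand-in population are hypothetical.

```python
import random

def sample_rows(rows, n, seed=0):
    """Return n rows chosen uniformly at random without replacement, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(rows, n)

population = list(range(100))          # hypothetical stand-in for row ids
first = sample_rows(population, 10)
second = sample_rows(population, 10)   # same seed, so the same sample

# Fraction of the cleaned data kept by the notebook (counts reported above).
sample_fraction = 500000 / 5757212
print(f"sample fraction: {sample_fraction:.3f}")
```

The kept fraction is a little under 9%, in line with the "around 10%" in the heading. In Spark, `DataFrame.sample(fraction=..., seed=...)` avoids a full sort, but it returns only approximately the requested number of rows.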

### Construct Label Feature

In [0]:
# Flag comments whose author states that they own a dog or a cat
cond = (df_sample["comment"].like("%my dog%") | df_sample["comment"].like("%i have a dog%")
        | df_sample["comment"].like("%my cat%") | df_sample["comment"].like("%i have a cat%")
        | df_sample["comment"].like("%my dogs%") | df_sample["comment"].like("%my cats%")
        | df_sample["comment"].like("%i have dogs%") | df_sample["comment"].like("%i have cats%")
        | df_sample["comment"].like("%my puppy%") | df_sample["comment"].like("%my kitten%")
        | df_sample["comment"].like("%i have a puppy%") | df_sample["comment"].like("%i have puppies%"))

df_sample = df_sample.withColumn('dog_cat', cond)
# Treat a comment as "no pet" only if it contains neither "my" nor "have";
# the "my dog" / "my cat" checks are already implied by the "%my%" check
df_sample = df_sample.withColumn('no_pet', ~df_sample["comment"].like("%my%") & ~df_sample["comment"].like("%have%")
                              & ~df_sample["comment"].like("%my dog%") & ~df_sample["comment"].like("%my cat%"))

display(df_sample)

creator_name,userid,comment,dog_cat,no_pet
Waggle TV,935462.0,omg so cuuuttteeee! i my heart melted 😍 😍 😍 😍 😍 🐱🐱🐱🐱🐈🐈🐈🐈,False,False
Brave Wilderness,823749.0,will i be with you next time?,False,True
The Pet Collective,178914.0,is it stopped? lol,False,True
Talking Kitty Cat,2370875.0,do another vid,False,True
Brave Wilderness,1018655.0,you should try ofc if your in ohio,False,True
Info Marvel,132417.0,quiero un funko xd,False,True
Hope For Paws - Official Rescue Channel,2267432.0,aww this video is so sweet...i was crying when i saw the name😢,False,True
Brave Wilderness,854684.0,and it was big as that one,False,True
Ericas Slot World,32528.0,awesome wins,False,True
Shehry Vlogs,1269044.0,happy ramzan😃🌹💮ramzan mubarak.....,False,True
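
The two flags are not complements of each other: a comment can be neither `dog_cat` nor `no_pet`, as the first row above shows. The sketch below reproduces the substring rules in plain Python to make that visible. `PET_PATTERNS` and `label_comment` are illustrative names, and the first example comment is hypothetical.

```python
# The distinct substrings that matter in the LIKE conditions above
# ("my dogs" / "my cats" are already covered by "my dog" / "my cat").
PET_PATTERNS = [
    "my dog", "i have a dog", "my cat", "i have a cat",
    "i have dogs", "i have cats", "my puppy", "my kitten",
    "i have a puppy", "i have puppies",
]

def label_comment(comment):
    """Return (dog_cat, no_pet) for one lower-cased comment."""
    dog_cat = any(p in comment for p in PET_PATTERNS)
    # "my dog" / "my cat" already contain "my", so two checks suffice here.
    no_pet = "my" not in comment and "have" not in comment
    return dog_cat, no_pet

examples = [
    "my dog loves this channel",              # hypothetical comment
    "will i be with you next time?",          # from the sample above
    "omg so cuuuttteeee! i my heart melted",  # from the sample above, emoji omitted
]
labels = [label_comment(c) for c in examples]
for comment, flags in zip(examples, labels):
    print(flags, comment)
```

Because the match is on raw substrings, `"my"` also matches inside words such as "army", so the `no_pet` rule is deliberately conservative: ambiguous comments are left unlabelled rather than counted as negatives.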


### Tokenizing

In [0]:
from pyspark.ml.feature import RegexTokenizer, Word2Vec
regexTokenizer = RegexTokenizer(inputCol="comment", outputCol="words", pattern="\\W")
df_sample = regexTokenizer.transform(df_sample)
df_sample.show(10)

+--------------------+---------+--------------------+-------+------+--------------------+
|        creator_name|   userid|             comment|dog_cat|no_pet|               words|
+--------------------+---------+--------------------+-------+------+--------------------+
|           Waggle TV| 935462.0|omg so cuuuttteee...|  false| false|[omg, so, cuuuttt...|
|    Brave Wilderness| 823749.0|will i be with yo...|  false|  true|[will, i, be, wit...|
|  The Pet Collective| 178914.0|  is it stopped? lol|  false|  true|[is, it, stopped,...|
|   Talking Kitty Cat|2370875.0|      do another vid|  false|  true|  [do, another, vid]|
|    Brave Wilderness|1018655.0|you should try  o...|  false|  true|[you, should, try...|
|         Info Marvel| 132417.0|  quiero un funko xd|  false|  true|[quiero, un, funk...|
|Hope For Paws - O...|2267432.0|aww this video is...|  false|  true|[aww, this, video...|
|    Brave Wilderness| 854684.0|and it was big as...|  false|  true|[and, it, was, bi...|
|   Ericas
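
To see what a split on `\W` does to a comment, here is a rough plain-Python equivalent. It assumes the pattern treats every character outside `[a-zA-Z0-9_]` as a separator, which matches the token lists shown in this notebook (emoji and punctuation disappear). `regex_tokenize` is a hypothetical helper, not a Spark API.

```python
import re

def regex_tokenize(text, pattern=r"\W"):
    """Lower-case the text, split on the pattern, and drop empty tokens."""
    # re.ASCII makes \W treat emoji and other non-ASCII characters as
    # separators, which is what the token lists in this notebook show.
    parts = re.split(pattern, text.lower(), flags=re.ASCII)
    return [tok for tok in parts if tok]

tokens_a = regex_tokenize("is it stopped? lol")
tokens_b = regex_tokenize("happy ramzan😃🌹💮ramzan mubarak.....")
print(tokens_a)
print(tokens_b)
```

Under the same assumption, a comment written entirely in a non-Latin script would tokenize to an empty list, so such comments contribute nothing to the word vectors.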

### Obtain Word Vectors Using Word2Vec

In [0]:
word2Vec = Word2Vec(vectorSize=30, minCount=1, inputCol="words", outputCol="wordVector")
model = word2Vec.fit(df_sample)
data= model.transform(df_sample)
display(data)

creator_name,userid,comment,dog_cat,no_pet,words,wordVector
Waggle TV,935462.0,omg so cuuuttteeee! i my heart melted 😍 😍 😍 😍 😍 🐱🐱🐱🐱🐈🐈🐈🐈,False,False,"List(omg, so, cuuuttteeee, i, my, heart, melted)","Map(vectorType -> dense, length -> 30, values -> List(-0.07466187940112182, -0.20838288410699793, 0.1324264944830377, -0.024825547555727617, -0.12878190326903546, 0.03794203592198236, 0.27583101764321327, 0.04214263265021145, -0.10951728985777923, -0.0470111785190446, -0.012633550113865306, 0.187299339965518, -0.13388548059655087, 0.08390451595187187, -0.050774987787008286, -0.14597763047952736, -0.08779163957972612, 0.03465812662450064, -0.17476791170026573, -0.0288113286452634, -0.1104191640791084, -0.21200774902743952, 0.09595432797712938, 0.21904150902160574, 0.2589101195335388, -0.07689503580331802, -0.2554157045004623, -0.4088724540280444, -0.0403162034095398, 0.15587798372975417))"
Brave Wilderness,823749.0,will i be with you next time?,False,True,"List(will, i, be, with, you, next, time)","Map(vectorType -> dense, length -> 30, values -> List(-0.15598009253985115, -0.14778589191181318, 0.015028277678149087, -0.0442960544356278, 0.07165342409695898, -0.09080295317939349, 0.0918543870959963, -0.015054452632154736, -0.25161305389233996, -0.09471620725733892, -0.07438113221100398, 0.26989313015448196, 0.2526968690965857, -0.4030399693708334, 0.07358090552900519, -0.10044817067682743, 0.07172632696373121, 0.204570255109242, -0.21608497548316205, 0.09086645261517592, 0.15106882687125886, 0.0899548019681658, 0.08376904683453695, -0.06153537213270153, -0.19207279809883662, -0.15848060431224958, -0.1900717381920133, -0.03134851264102118, 0.012265282550028392, -0.05460901611617633))"
The Pet Collective,178914.0,is it stopped? lol,False,True,"List(is, it, stopped, lol)","Map(vectorType -> dense, length -> 30, values -> List(-0.016521641286090016, -0.18212509714066982, 0.17548917466774583, -0.07953933178214356, -0.1193255740654422, -0.13871250208467245, 0.27975744009017944, 0.18910090997815132, -0.08840985130518675, -0.2557426504790783, 0.1024506411049515, 0.26153228245675564, 0.27808234794065356, 0.029358407482504845, 0.05377034191042185, 0.012973509263247252, -0.0032087923027575016, -0.09663892351090908, -0.3679237812757492, 0.06315898150205612, 0.17904045432806015, -0.006913928315043449, -0.020719077438116074, 0.002955511212348938, 0.09306730143725872, 0.010158304125070572, -0.026430388912558556, -0.09001512243412435, -0.12717261794023216, 0.06006549345329404))"
Talking Kitty Cat,2370875.0,do another vid,False,True,"List(do, another, vid)","Map(vectorType -> dense, length -> 30, values -> List(-0.15587492287158966, -0.13115408147374788, 0.042994365096092224, -0.07564760247866312, -0.1827316315223773, -0.015837964912255604, -0.05348749707142512, -0.23304010927677155, -0.1670026332139969, 0.07268123825391133, 0.1043395921587944, -0.017412965496381123, 0.20958862950404483, -0.21023389200369516, -0.12187538792689641, 0.03374408682187398, 0.21416370570659637, 0.12094537013520797, -0.24724893644452095, 0.3040754795074463, 0.12318279842535654, 0.5078013042608897, 0.04353859151403109, -0.040791230276227, 0.2529054780801137, -0.050668080647786454, -0.2665165600677331, 0.0848734900355339, -0.1322258673608303, -0.3244616960485776))"
Brave Wilderness,1018655.0,you should try ofc if your in ohio,False,True,"List(you, should, try, ofc, if, your, in, ohio)","Map(vectorType -> dense, length -> 30, values -> List(-0.1768311089836061, -0.10910743288695812, -0.047438159002922475, -0.09360082124476321, -0.13761433097533882, -0.03112967498600483, -0.006305700575467199, 0.040213123662397265, -0.11776560975704342, 0.06069771503098309, -0.24581995548214763, 0.42824289202690125, 0.08176304576772964, -0.25065838173031807, 0.11232816893607378, -0.009214894904289395, 0.07649663253687322, 0.11297939528594725, -0.21079476663726382, 0.1600798841973301, -0.040204108954640105, 0.2089224299415946, 0.005775229772552848, -0.051605339627712965, -0.08753122016787529, -0.07694539311341941, -0.10380100849579321, -0.010449278866872191, -0.0692386431619525, -0.010960160405375063))"
Info Marvel,132417.0,quiero un funko xd,False,True,"List(quiero, un, funko, xd)","Map(vectorType -> dense, length -> 30, values -> List(-0.04965228866785765, -0.6036486402153969, 0.9172139875590801, -0.6959440410137177, 0.27431584644364193, -0.46230283193290234, -0.9302610550075769, 0.7558090910315514, 0.40262281708419323, 0.24870476126670837, 0.047494953498244286, -0.885614313185215, -0.5211499631404877, -0.31131035881116986, -0.48750505270436406, -0.43783731386065483, 0.1564502976834774, 0.021454849280416965, 0.8348748190328479, 0.3472715523093939, 0.21744293998926878, -0.34486994333565235, -0.7017205283045769, -0.25977401644922793, 0.18203275464475155, -0.2312862789258361, -0.1671841866336763, -0.1449403204023838, 0.25162531435489655, 0.3543026074767113))"
Hope For Paws - Official Rescue Channel,2267432.0,aww this video is so sweet...i was crying when i saw the name😢,False,True,"List(aww, this, video, is, so, sweet, i, was, crying, when, i, saw, the, name)","Map(vectorType -> dense, length -> 30, values -> List(-0.17388105471452164, -0.17635770647653512, 0.11213947852541292, -0.007284881813185555, -0.14718350142772707, -0.05663168563374451, 0.3527114072016307, 0.19368381478956767, -0.20305205056709902, -0.21203413140028715, 0.00430180478308882, 0.129939379170537, -0.006958977452346257, -0.07725366221607795, -0.14426058158278465, -0.0017233137373945542, 0.11165118802871023, -0.06048488883035523, -0.16882682372150676, -0.07396566428776298, 0.05908920326536255, -0.004882618917950562, -0.05748370928423745, 0.18845020163072537, 0.24485371421490396, -0.0716034984216094, -0.1841510182712227, -0.20048266170280318, -0.11113322439736553, 0.06684482516720891))"
Brave Wilderness,854684.0,and it was big as that one,False,True,"List(and, it, was, big, as, that, one)","Map(vectorType -> dense, length -> 30, values -> List(-0.0903529967846615, -0.12684933734791617, 0.10432945830481392, -0.05057435802050999, 0.04517484935266631, -0.02650251651981047, 0.18763755900519233, 0.2345056358192648, -0.1387855006115777, -0.15196838229894638, -0.06980751308479478, 0.2662965697901589, 0.12900855631700583, 0.018280794577939168, 0.06906960372413908, 0.046483908713396103, 0.1286927850118705, -0.1379280047757285, -0.18772244988940656, 0.15806229379294173, 0.16668830386229924, 0.013362118708235875, 0.10945012234151363, 0.005151095667055675, 0.06136037392674812, -0.0794419630962823, -0.20508211851119995, 0.01025815973324435, -0.04423427688223975, -0.005580992570945195))"
Ericas Slot World,32528.0,awesome wins,False,True,"List(awesome, wins)","Map(vectorType -> dense, length -> 30, values -> List(-0.23501548171043396, -0.15518826618790627, 0.16493510082364082, 0.26655468344688416, 0.1584939043968916, 0.4814159870147705, -0.09030993282794952, 0.005224987864494324, -0.0020321980118751526, -0.638104185461998, 0.23814036324620247, 0.012775108218193054, -0.13262592628598213, -0.17935652658343315, 0.02705739438533783, 0.06833615200594068, 0.319446824491024, 0.5959937945008278, -0.5805033892393112, 0.10222168639302254, 0.2918735593557358, 0.4069572687149048, -0.2841838710010052, 0.0022728145122528076, 0.23494999669492245, -0.03319522738456726, 0.04919132590293884, -0.028248414397239685, -0.48357412219047546, -0.20368111436255276))"
Shehry Vlogs,1269044.0,happy ramzan😃🌹💮ramzan mubarak.....,False,True,"List(happy, ramzan, ramzan, mubarak)","Map(vectorType -> dense, length -> 30, values -> List(0.019915775395929813, -0.21618522331118584, -0.021221652626991272, 0.15618060529232025, -0.04125012829899788, 0.19000037061050534, -0.1630452978424728, 0.010360442101955414, -0.08709165826439857, 0.1385887023061514, -0.06043743249028921, -0.005516913719475269, -0.01654851995408535, -0.07726339239161462, -0.12535075470805168, -0.08393264748156071, -0.04690156411379576, 0.17561960592865944, -0.013811320066452026, -0.16415990889072418, -0.0018517039716243744, -0.14683019020594656, -0.22210168093442917, -0.10989185376092792, -0.00905473344027996, -0.31734729558229446, 0.055108504835516214, -0.281668484210968, -0.2028075009584427, 0.03616253437940031))"


In [0]:
data.count()

Out[12]: 500000
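
Each comment ends up with a single 30-dimensional `wordVector` because Spark's Word2Vec model represents a document as the average of its word vectors. The sketch below shows that averaging step in plain Python with a tiny made-up embedding table. The 3-dimensional vectors and the names `average_vectors` and `toy_embeddings` are hypothetical.

```python
def average_vectors(words, embeddings, size):
    """Average the embedding vectors of the words in one comment."""
    if not words:
        return [0.0] * size  # an empty comment maps to the zero vector
    total = [0.0] * size
    for word in words:
        for i, value in enumerate(embeddings[word]):
            total[i] += value
    return [value / len(words) for value in total]

# Hypothetical 3-dimensional embeddings (the notebook uses vectorSize=30).
toy_embeddings = {"awesome": [1.0, 0.0, 2.0], "wins": [3.0, 2.0, 0.0]}
doc_vector = average_vectors(["awesome", "wins"], toy_embeddings, size=3)
print(doc_vector)
```

Averaging discards word order, so "my dog bit me" and "me bit my dog" would receive the same vector.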

### Fix Imbalanced Data - Undersampling

In [0]:
# Negative labels outnumber positive labels by roughly 90 to 1 (see the counts printed below)
data_pets = data.filter(F.col('dog_cat') == True) 
data_no_pets = data.filter(F.col('no_pet') ==  True)
print("Number of confirmed user who own dogs or cats: ", data_pets.count())
print("Number of confirmed user who don't have pet's: ", data_no_pets.count())

Number of confirmed user who own dogs or cats:  4572
Number of confirmed user who don't have pet's:  416025


In [0]:
# Target label ratio: positive : negative = 1 : 2
data_no_pets.orderBy(rand()).createOrReplaceTempView("table")

Num_Pos_Label = data.filter(F.col('dog_cat') == True).count() 
Num_Neg_Label = data.filter(F.col('no_pet') ==  True).count()

# Pass the target row count into the SQL query: keep twice as many negatives as positives
data_no_pets_down = spark.sql("select * from table limit {}".format(Num_Pos_Label*2))

In [0]:
print('Now after balancing the labels, we have ')
print('Positive label: ', Num_Pos_Label)
print('Negative label: ', data_no_pets_down.count())

Now after balancing the labels, we have 
Positive label:  4572
Negative label:  9144
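
The undersampling step keeps every positive row and a random subset of negatives twice as large (2 × 4572 = 9144). A minimal plain-Python sketch of that idea follows. The helper `undersample` and the toy rows are hypothetical, and the fixed seed is an addition for reproducibility (the notebook's `rand()` call is unseeded).

```python
import random

def undersample(positives, negatives, ratio=2, seed=0):
    """Keep all positives and at most ratio * len(positives) random negatives."""
    rng = random.Random(seed)
    n_keep = min(len(negatives), ratio * len(positives))
    return positives + rng.sample(negatives, n_keep)

# Hypothetical toy data with heavy class imbalance.
pos = [("pet", i) for i in range(5)]
neg = [("no_pet", i) for i in range(400)]

balanced = undersample(pos, neg)
n_neg_kept = sum(1 for label, _ in balanced if label == "no_pet")
print(len(balanced), n_neg_kept)
```

Any metric computed on data balanced this way reflects the 1:2 ratio, not the original class distribution.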


### Target Feature Encoding

In [0]:
def get_label(data_pets,data_no_pets_down):
    data_labeled = data_pets.select('dog_cat','wordVector').union(data_no_pets_down.select('dog_cat','wordVector'))
    return data_labeled

data_labeled = get_label(data_pets,data_no_pets_down)
data_labeled.show(10)

+-------+--------------------+
|dog_cat|          wordVector|
+-------+--------------------+
|   true|[-0.1902915891259...|
|   true|[-0.0985827284372...|
|   true|[-0.1430899834781...|
|   true|[-0.0264325229864...|
|   true|[-0.3016146466963...|
|   true|[-0.1632387682440...|
|   true|[-0.1459623438882...|
|   true|[-0.1018698478943...|
|   true|[-0.1086992367337...|
|   true|[-0.1035014218746...|
+-------+--------------------+
only showing top 10 rows



In [0]:
data_labeled

Out[20]: DataFrame[dog_cat: boolean, wordVector: vector]

In [0]:
# Convert the Boolean dog_cat flag to an integer label (1 = pet owner, 0 = no pet).
# A built-in cast gives the same result as a Python UDF without the serialization overhead.
from pyspark.sql.functions import col

df_labeled = data_labeled.withColumn('label', col('dog_cat').cast('integer'))
df_labeled.show(10)

+-------+--------------------+-----+
|dog_cat|          wordVector|label|
+-------+--------------------+-----+
|   true|[-0.1902915891259...|    1|
|   true|[-0.0985827284372...|    1|
|   true|[-0.1430899834781...|    1|
|   true|[-0.0264325229864...|    1|
|   true|[-0.3016146466963...|    1|
|   true|[-0.1632387682440...|    1|
|   true|[-0.1459623438882...|    1|
|   true|[-0.1018698478943...|    1|
|   true|[-0.1086992367337...|    1|
|   true|[-0.1035014218746...|    1|
+-------+--------------------+-----+
only showing top 10 rows



## Model Selection and Evaluation

### Training and Test Data Split

In [0]:
train, test = df_labeled.randomSplit([0.8, 0.2], seed=12345)

### Logistic Regression

In [0]:
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier


In [0]:
lr = LogisticRegression(featuresCol="wordVector",labelCol="label" , maxIter=10, regParam=0.1, elasticNetParam=0.8)

# Fit the model on the training data using the fixed hyperparameters above
# (no TrainValidationSplit or parameter grid is used here).
lrModel = lr.fit(train)

# Make predictions on the held-out test data.
lr_predictions = lrModel.transform(test)
lr_predictions.show(10)

+-------+--------------------+-----+--------------------+--------------------+----------+
|dog_cat|          wordVector|label|       rawPrediction|         probability|prediction|
+-------+--------------------+-----+--------------------+--------------------+----------+
|   true|[-0.4852680265903...|    1|[0.21563014565240...|[0.55369962770904...|       0.0|
|   true|[-0.4090231597423...|    1|[-0.1617485063958...|[0.45965080498417...|       1.0|
|   true|[-0.4039231002330...|    1|[-0.1790656375623...|[0.45535282602227...|       1.0|
|   true|[-0.3966523755341...|    1|[-0.0708856887650...|[0.48228599460391...|       1.0|
|   true|[-0.3866898834705...|    1|[0.31097878457853...|[0.57712415334824...|       0.0|
|   true|[-0.3690780947605...|    1|[0.20052949025479...|[0.54996505146449...|       0.0|
|   true|[-0.3656601145863...|    1|[-0.1030458086708...|[0.47426131918633...|       1.0|
|   true|[-0.3514966964721...|    1|[-0.5499973971744...|[0.36586501286484...|       1.0|
|   true|[

In [0]:
# Report accuracy, precision, recall, and ROC-AUC for a predictions DataFrame
def get_evaluation_result(predictions):
    evaluator = BinaryClassificationEvaluator(
      labelCol="label", rawPredictionCol="rawPrediction", metricName="areaUnderROC")
    AUC = evaluator.evaluate(predictions)
    TP = predictions[(predictions["label"] == 1) & (predictions["prediction"] == 1.0)].count()
    FP = predictions[(predictions["label"] == 0) & (predictions["prediction"] == 1.0)].count()
    TN = predictions[(predictions["label"] == 0) & (predictions["prediction"] == 0.0)].count()
    FN = predictions[(predictions["label"] == 1) & (predictions["prediction"] == 0.0)].count()

    accuracy = (TP + TN)*1.0 / (TP + FP + TN + FN)
    precision = TP*1.0 / (TP + FP)
    recall = TP*1.0 / (TP + FN)
    print ("Test Accuracy:", accuracy)
    print ("Test Precision:", precision)
    print ("Test Recall:", recall)
    print ("Test AUC of ROC:", AUC)

print("Prediction result summary for Logistic Regression Model:  ")
get_evaluation_result(lr_predictions)

Prediction result summary for Logistic Regression Model:  
Test Accuracy: 0.693745506829619
Test Precision: 0.8
Test Recall: 0.09586056644880174
Test AUC of ROC: 0.9043422209131635
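
The logistic regression pairs a high AUC (about 0.90) with very low recall (about 0.10). One possible reading, offered here as an assumption rather than something the notebook verifies, is that the model ranks comments well but most positive-class probabilities fall just below the default 0.5 cut-off. The plain-Python sketch below uses hypothetical scores to show how lowering the threshold changes recall in that situation. `precision_recall_at` is an illustrative helper.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when predicting 1 for scores >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical positive-class probabilities: well ranked, but mostly below 0.5.
scores = [0.55, 0.46, 0.45, 0.44, 0.30, 0.28, 0.25, 0.20]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

at_default = precision_recall_at(0.5, scores, labels)
at_lower = precision_recall_at(0.4, scores, labels)
print("threshold 0.5:", at_default)
print("threshold 0.4:", at_lower)
```

If this reading holds for the real model, the `threshold` parameter of `LogisticRegression` would be one place to experiment, ideally tuned on a validation split, not the test set.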


### Random Forest

In [0]:
rf = RandomForestClassifier(labelCol="label", featuresCol="wordVector", numTrees=15)
rf_model = rf.fit(train)
rf_predictions = rf_model.transform(test)
rf_predictions.show(10)

+-------+--------------------+-----+--------------------+--------------------+----------+
|dog_cat|          wordVector|label|       rawPrediction|         probability|prediction|
+-------+--------------------+-----+--------------------+--------------------+----------+
|   true|[-0.4852680265903...|    1|[8.1689355209498,...|[0.54459570139665...|       0.0|
|   true|[-0.4090231597423...|    1|[2.92927524127241...|[0.19528501608482...|       1.0|
|   true|[-0.4039231002330...|    1|[5.55409926311906...|[0.37027328420793...|       1.0|
|   true|[-0.3966523755341...|    1|[4.36597548586913...|[0.29106503239127...|       1.0|
|   true|[-0.3866898834705...|    1|[7.44143199858576...|[0.49609546657238...|       1.0|
|   true|[-0.3690780947605...|    1|[5.94873407172154...|[0.39658227144810...|       1.0|
|   true|[-0.3656601145863...|    1|[4.46494719494271...|[0.29766314632951...|       1.0|
|   true|[-0.3514966964721...|    1|[2.92927524127241...|[0.19528501608482...|       1.0|
|   true|[

In [0]:
print("Prediction result summary for Random Forest Model:  ")
get_evaluation_result(rf_predictions)

Prediction result summary for Random Forest Model:  
Test Accuracy: 0.8838964773544212
Test Precision: 0.8789808917197452
Test Recall: 0.7516339869281046
Test AUC of ROC: 0.9494603635445604
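
Precision and recall can be folded into a single F1 score to compare the two models on one number. The sketch below uses only the precision and recall values printed above; the `f1_score` helper is defined here and is not part of the notebook.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the test results above.
reported = {
    "logistic_regression": (0.8, 0.09586056644880174),
    "random_forest": (0.8789808917197452, 0.7516339869281046),
}

f1 = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, score in f1.items():
    print(f"{name}: F1 = {score:.4f}")
```

On this test split the random forest's F1 is roughly 0.81 against roughly 0.17 for logistic regression, which reflects the large gap in recall.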
