# Naive Bayes in Hadoop MR
__`MIDS w261: Machine Learning at Scale | UC Berkeley School of Information | Spring 2020`__

We use Hadoop MapReduce to implement a simple parallelized machine learning algorithm: Naive Bayes.

## Notebook Setup

In [1]:
# imports
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
%reload_ext autoreload
%autoreload 2

In [36]:
# global vars (paths) - ADJUST AS NEEDED
JAR_FILE = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar"
HDFS_DIR = "/user/root/HW2"
HOME_DIR = "" # FILL IN HERE eg. /media/notebooks/Assignments/HW2

In [3]:
# save path for use in Hadoop jobs (-cmdenv PATH={PATH})
from os import environ
PATH = environ['PATH']

In [4]:
# data path
ENRON = "data/enronemail_1h.txt"

In [5]:
# make the HDFS directory if it doesn't already exist
!hdfs dfs -ls 
!hdfs dfs -mkdir {HDFS_DIR}

Found 1 items
drwxr-xr-x   - root supergroup          0 2020-01-25 16:02 demo2


In [6]:
!pwd

/media/notebooks/Assignments/HW2


In [9]:
# take a look at the first 100 characters of the first 5 records (RUN THIS CELL AS IS)
!head -n 5 /media/notebooks/Assignments/HW2/{ENRON} | cut -c-100

0001.1999-12-10.farmer	0	 christmas tree farm pictures	NA
0001.1999-12-10.kaminski	0	 re: rankings	 thank you.
0001.2000-01-17.beck	0	 leadership development pilot	" sally:  what timing, ask and you shall receiv
0001.2000-06-06.lokay	0	" key dates and impact of upcoming sap implementation over the next few week
0001.2001-02-07.kitchen	0	 key hr issues going forward	 a) year end reviews-report needs generating 


In [10]:
# see how many messages/lines are in the file 
#(this number may be off by 1 if the last line doesn't end with a newline)
!wc -l /media/notebooks/Assignments/HW2/{ENRON}

100 /media/notebooks/Assignments/HW2/data/enronemail_1h.txt


In [11]:
# make the HDFS directory if it doesn't already exist
!hdfs dfs -mkdir {HDFS_DIR}

mkdir: `/user/root/HW2': File exists


In [12]:
# load the data into HDFS (RUN THIS CELL AS IS)
!hdfs dfs -copyFromLocal /media/notebooks/Assignments/HW2/{ENRON} {HDFS_DIR}/enron.txt

In [13]:
!hdfs dfs -ls {HDFS_DIR}

Found 1 items
-rw-r--r--   1 root supergroup     204559 2020-01-26 11:22 /user/root/HW2/enron.txt


# Enron Ham/Spam EDA.
Before building the classifier, let's get acquainted with our data. We're interested in which words occur more often in spam emails than in legitimate ("ham") emails.

We implement two Hadoop MapReduce jobs to count and sort word occurrences by document class. 
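The EnronEDA scripts are not shown here, so the following is only a rough local sketch of what the counting step computes. The mapper emits one `(word, class, 1)` triple per token and the reducer sums them per `(word, class)`, producing lines in the `word \t class \t count` layout that the `grep` in part b returns. The tokenizer regex, the helper names `eda_map`/`eda_reduce`, and the two sample records are hypothetical. In the real job, Hadoop's shuffle groups and sorts the mapper output before the reducer sees it; a dictionary stands in for that here.

```python
import re
from collections import defaultdict

# Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def eda_map(line):
    """Yield (word, label, 1) for each token in one tab-separated email record.

    Assumed record layout (matching the `head` output above):
        doc_id \t label \t subject \t body
    """
    fields = line.rstrip("\n").split("\t")
    label = fields[1]
    text = " ".join(fields[2:]).lower()
    for word in TOKEN_RE.findall(text):
        yield word, label, 1


def eda_reduce(triples):
    """Sum the counts for each (word, label) pair, as the reducer would."""
    totals = defaultdict(int)
    for word, label, count in triples:
        totals[(word, label)] += count
    return totals


# Two made-up records in the same layout as enronemail_1h.txt
records = [
    "d1\t1\t financial assistance\t we offer assistance now",
    "d2\t0\t meeting notes\t thanks for the assistance",
]
counts = eda_reduce(t for line in records for t in eda_map(line))
for (word, label), n in sorted(counts.items()):
    print(f"{word}\t{label}\t{n}")  # word \t class \t count, like results.txt
```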

In [14]:
# part a - do your work in the provided scripts then RUN THIS CELL AS IS
!chmod a+x EnronEDA/mapper.py
!chmod a+x EnronEDA/reducer.py

In [37]:
# part a - clear output directory in HDFS (RUN THIS CELL AS IS)
!hdfs dfs -rm -r {HDFS_DIR}/eda-output

rm: `/user/root/HW2/eda-output': No such file or directory


In [38]:
# part a - Hadoop streaming job (RUN THIS CELL AS IS)
!hdfs dfs -rm -r {HDFS_DIR}/eda-output
!hadoop jar {JAR_FILE} \
  -files EnronEDA/reducer.py,EnronEDA/mapper.py \
  -mapper mapper.py \
  -reducer reducer.py \
  -input {HDFS_DIR}/enron.txt \
  -output {HDFS_DIR}/eda-output \
  -numReduceTasks 2 \
  -cmdenv PATH={PATH}

rm: `/user/root/HW2/eda-output': No such file or directory
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.6.0-cdh5.16.2.jar] /tmp/streamjob7889522164996968704.jar tmpDir=null
20/01/26 15:05:54 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/26 15:05:54 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/26 15:05:55 INFO mapred.FileInputFormat: Total input paths to process : 1
20/01/26 15:05:55 INFO mapreduce.JobSubmitter: number of splits:2
20/01/26 15:05:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1579903014542_0027
20/01/26 15:05:55 INFO impl.YarnClientImpl: Submitted application application_1579903014542_0027
20/01/26 15:05:55 INFO mapreduce.Job: The url to track the job: http://docker.w261:8088/proxy/application_1579903014542_0027/
20/01/26 15:05:55 INFO mapreduce.Job: Running job: job_1579903014542_0027
20/01/26 15:06:02 INFO mapreduce.Job: Job job_1579903014542_0027 running in uber mode : false

In [39]:
# part a - retrieve results from HDFS & copy them into a local file (RUN THIS CELL AS IS)
!hdfs dfs -cat {HDFS_DIR}/eda-output/part-0000* > EnronEDA/results.txt

In [40]:
# part b - write your grep command here
!grep 'assistance' EnronEDA/results.txt

assistance	1	8
assistance	0	2


In [46]:
# part d - clear the output directory in HDFS (RUN THIS CELL AS IS)
!hdfs dfs -rm -r {HDFS_DIR}/eda-sort-output

Deleted /user/root/HW2/eda-sort-output


In [49]:
# part d - write your Hadoop streaming job here
!hdfs dfs -rm -r {HDFS_DIR}/eda-sort-output
!hadoop jar {JAR_FILE} \
  -D stream.num.map.output.key.fields=3 \
  -D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -D mapreduce.partition.keycomparator.options="-k2,2nr -k3,3nr" \
  -mapper /bin/cat \
  -reducer /bin/cat \
  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
  -input {HDFS_DIR}/eda-output \
  -output {HDFS_DIR}/eda-sort-output \
  -numReduceTasks 2 \
  -cmdenv PATH={PATH}

Deleted /user/root/HW2/eda-sort-output
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.6.0-cdh5.16.2.jar] /tmp/streamjob8844326108999041629.jar tmpDir=null
20/01/26 15:36:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/26 15:36:05 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/26 15:36:05 INFO mapred.FileInputFormat: Total input paths to process : 2
20/01/26 15:36:05 INFO mapreduce.JobSubmitter: number of splits:2
20/01/26 15:36:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1579903014542_0031
20/01/26 15:36:06 INFO impl.YarnClientImpl: Submitted application application_1579903014542_0031
20/01/26 15:36:06 INFO mapreduce.Job: The url to track the job: http://docker.w261:8088/proxy/application_1579903014542_0031/
20/01/26 15:36:06 INFO mapreduce.Job: Running job: job_1579903014542_0031
20/01/26 15:36:13 INFO mapreduce.Job: Job job_1579903014542_0031 running in uber mode : false

In [50]:
# part d - view the top 10 records from each partition (RUN THIS CELL AS IS)
for idx in range(2):
    print(f"\n===== part-0000{idx}=====\n")
    !hdfs dfs -cat {HDFS_DIR}/eda-sort-output/part-0000{idx} | head


===== part-00000=====

and	1	392	
a	1	347	
you	1	345	
of	1	336	
for	1	204	
it	1	152	
that	1	145	
on	1	136	
is	1	135	
will	1	102	
cat: Unable to write to output stream.

===== part-00001=====

the	1	698	
to	1	566	
your	1	357	
in	1	236	
com	1	153	
this	1	143	
i	1	140	
or	1	117	
we	1	116	
with	1	116	
cat: Unable to write to output stream.
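Within each reducer, `-k2,2nr -k3,3nr` asks `KeyFieldBasedComparator` to order records by field 2 (class) and then by field 3 (count), both numerically and in reverse. Because this job sets no `mapreduce.partition.keypartitioner.options`, `KeyFieldBasedPartitioner` hashes the whole three-field key, so each reducer receives an arbitrary subset of words. That is why both part files above open with class-1 records and neither holds a global top 10. The sketch below reproduces only the per-reducer ordering using Python's `sorted`. The sample rows come from job outputs in this notebook, and the helper name `part_d_key` is hypothetical.

```python
# Sample records (word \t class \t count) taken from this notebook's outputs
rows = [
    "the\t0\t549",
    "and\t1\t392",
    "the\t1\t698",
    "to\t0\t398",
    "to\t1\t566",
]


def part_d_key(line):
    """Sort key emulating -k2,2nr -k3,3nr: class descending, then count descending."""
    _word, label, count = line.split("\t")
    # Negate the numeric fields so Python's ascending sort yields descending order
    return (-int(label), -int(count))


for line in sorted(rows, key=part_d_key):
    print(line)
```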


In [None]:
# part e - clear the output directory in HDFS (RUN THIS CELL AS IS)
!hdfs dfs -rm -r {HDFS_DIR}/eda-sort-output

In [44]:
# part e - write your Hadoop streaming job here
!hdfs dfs -rm -r {HDFS_DIR}/eda-sort-output
!hadoop jar {JAR_FILE} \
  -D stream.num.map.output.key.fields=3 \
  -D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -D mapreduce.partition.keycomparator.options="-k3,3nr" \
  -D mapreduce.partition.keypartitioner.options="-k2,2" \
  -mapper /bin/cat \
  -reducer /bin/cat \
  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
  -input {HDFS_DIR}/eda-output \
  -output {HDFS_DIR}/eda-sort-output \
  -numReduceTasks 2 \
  -cmdenv PATH={PATH}

Deleted /user/root/HW2/eda-sort-output
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.6.0-cdh5.16.2.jar] /tmp/streamjob3006420551311855225.jar tmpDir=null
20/01/26 15:31:09 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/26 15:31:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/26 15:31:10 INFO mapred.FileInputFormat: Total input paths to process : 2
20/01/26 15:31:10 INFO mapreduce.JobSubmitter: number of splits:2
20/01/26 15:31:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1579903014542_0029
20/01/26 15:31:11 INFO impl.YarnClientImpl: Submitted application application_1579903014542_0029
20/01/26 15:31:11 INFO mapreduce.Job: The url to track the job: http://docker.w261:8088/proxy/application_1579903014542_0029/
20/01/26 15:31:11 INFO mapreduce.Job: Running job: job_1579903014542_0029
20/01/26 15:31:19 INFO mapreduce.Job: Job job_1579903014542_0029 running in uber mode : false

In [45]:
# part e - view the top 10 records from each partition (RUN THIS CELL AS IS)
for idx in range(2):
    print(f"\n===== part-0000{idx}=====\n")
    !hdfs dfs -cat {HDFS_DIR}/eda-sort-output/part-0000{idx} | head


===== part-00000=====

the	0	549	
to	0	398	
ect	0	382	
and	0	278	
of	0	230	
hou	0	206	
a	0	196	
in	0	182	
for	0	170	
on	0	135	
cat: Unable to write to output stream.

===== part-00001=====

the	1	698	
to	1	566	
and	1	392	
your	1	357	
a	1	347	
you	1	345	
of	1	336	
in	1	236	
for	1	204	
com	1	153	
cat: Unable to write to output stream.


__Expected output:__
<table>
<th>part-00000:</th>
<th>part-00001:</th>
<tr><td><pre>
the	0	549	
to	0	398	
ect	0	382	
and	0	278	
of	0	230	
hou	0	206	
a	0	196	
in	0	182	
for	0	170	
on	0	135
</pre></td>
<td><pre>
the	1	698	
to	1	566	
and	1	392	
your	1	357	
a	1	347	
you	1	345	
of	1	336	
in	1	236	
for	1	204	
com	1	153
</pre></td></tr>
</table>
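Part e instead partitions on field 2 alone (`-k2,2`) and sorts on field 3 (`-k3,3nr`), so each class ends up in its own part file, ordered by count. The local simulation below assumes that class `c` lands in partition `c`. Hadoop actually hashes the selected field and takes the result modulo the number of reducers, so for other data or reducer counts the class-to-file mapping is not guaranteed. `simulate_part_e` is a hypothetical helper.

```python
from collections import defaultdict


def simulate_part_e(lines, num_partitions=2):
    """Route records by class (field 2), then sort each partition by count (field 3) descending."""
    partitions = defaultdict(list)
    for line in lines:
        word, label, count = line.split("\t")
        # Assumption: class value modulo partitions stands in for Hadoop's hash partitioning
        partitions[int(label) % num_partitions].append((word, label, int(count)))
    return {
        p: sorted(recs, key=lambda r: -r[2])
        for p, recs in sorted(partitions.items())
    }


rows = ["to\t0\t398", "the\t1\t698", "the\t0\t549", "and\t1\t392", "to\t1\t566"]
for p, recs in simulate_part_e(rows).items():
    print(f"===== part-0000{p} =====")
    for word, label, count in recs:
        print(f"{word}\t{label}\t{count}")
```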

# Enron Ham/Spam NB Classifier & Results.

__Test/Train split__

In [289]:
# part a - test/train split (RUN THIS CELL AS IS)
!head -n 80 data/enronemail_1h.txt > data/enron_train.txt
!tail -n 20 data/enronemail_1h.txt > data/enron_test.txt
!hdfs dfs -copyFromLocal data/enron_train.txt {HDFS_DIR}
!hdfs dfs -copyFromLocal data/enron_test.txt {HDFS_DIR}

__Training__ (Enron MNB Model _without smoothing_ )
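Without smoothing, a multinomial NB model uses the maximum-likelihood estimate P(w | c) = count(w, c) / total_words(c). The class totals in the sketch are not printed anywhere in this notebook. They are back-calculated from the expected outputs below (2 / 0.000172547666293 ≈ 11591 ham tokens, 4 / 0.000296823983378 ≈ 13476 spam tokens), so treat them as assumptions. `mle_cond_prob` is a hypothetical helper, not the code in `train_reducer.py`.

```python
# Assumed class totals, back-calculated from the expected model output
HAM_TOTAL = 11591   # total word tokens in ham (class 0) training emails
SPAM_TOTAL = 13476  # total word tokens in spam (class 1) training emails


def mle_cond_prob(word_count, class_total):
    """Maximum-likelihood P(word | class); exactly zero if the word never occurs in the class."""
    return word_count / class_total


# Counts for "assistance" from the expected model line: ham=2, spam=4
print(mle_cond_prob(2, HAM_TOTAL), mle_cond_prob(4, SPAM_TOTAL))
```

A word seen in only one class gets probability zero for the other, which is the problem Laplace smoothing addresses in the next model.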

In [398]:
# part b -  Unsmoothed model (FILL IN THE MISSING CODE BELOW)

# clear the output directory
!hdfs dfs -rm -r {HDFS_DIR}/enron-model

# hadoop command
!hadoop jar {JAR_FILE} \
  -D stream.num.map.output.key.fields=2 \
  -D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -D mapreduce.partition.keypartitioner.options="-k1,1" \
  -D mapreduce.partition.keycomparator.options="-k2,2" \
  -files NaiveBayes/train_mapper.py,NaiveBayes/train_reducer.py \
  -mapper train_mapper.py \
  -reducer train_reducer.py \
  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
  -input {HDFS_DIR}/enron_train.txt \
  -output {HDFS_DIR}/enron-model \
  -cmdenv PATH={PATH} \
  -numReduceTasks 2

# save the model locally
!mkdir -p NaiveBayes/Unsmoothed
!hdfs dfs -cat {HDFS_DIR}/enron-model/part-000* > NaiveBayes/Unsmoothed/NBmodel.txt

Deleted /user/root/HW2/enron-model
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.6.0-cdh5.16.2.jar] /tmp/streamjob1738499756421908179.jar tmpDir=null
20/01/30 22:43:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/30 22:43:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/30 22:43:42 INFO mapred.FileInputFormat: Total input paths to process : 1
20/01/30 22:43:42 INFO mapreduce.JobSubmitter: number of splits:2
20/01/30 22:43:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1579903014542_0111
20/01/30 22:43:42 INFO impl.YarnClientImpl: Submitted application application_1579903014542_0111
20/01/30 22:43:42 INFO mapreduce.Job: The url to track the job: http://docker.w261:8088/proxy/application_1579903014542_0111/
20/01/30 22:43:42 INFO mapreduce.Job: Running job: job_1579903014542_0111
20/01/30 22:43:49 INFO mapreduce.Job: Job job_1579903014542_0111 running in uber mode : false

In [488]:
# part b - check your UNSMOOTHED model results (RUN THIS CELL AS IS)
!grep assistance NaiveBayes/Unsmoothed/NBmodel.txt
# EXPECTED OUTPUT: assistance	2,4,0.000172547666293,0.000296823983378

assistance	2,4,0.0001725476662928134,0.00029682398337785694


In [489]:
# part b - check your UNSMOOTHED model results (RUN THIS CELL AS IS)
!grep money NaiveBayes/Unsmoothed/NBmodel.txt
# EXPECTED OUTPUT: money	1,22,8.62738331464e-05,0.00163253190858

money	1,22,8.62738331464067e-05,0.001632531908578213
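Each model line appears to follow the layout `word \t count_c0,count_c1,P(word|c0),P(word|c1)`. The EDA output for `assistance` (class 0: 2, class 1: 8 over all 100 emails) suggests the first column is ham (class 0) and the second is spam (class 1). The hypothetical `parse_model_line` helper below shows one way a classifier could read that format. The layout is inferred from the `grep` output above and is not documented in the notebook.

```python
def parse_model_line(line):
    """Split one NBmodel.txt line into (word, (count_c0, count_c1), (P(w|c0), P(w|c1)))."""
    word, payload = line.rstrip("\n").split("\t")
    c0, c1, p0, p1 = payload.split(",")
    return word, (int(c0), int(c1)), (float(p0), float(p1))


line = "money\t1,22,8.62738331464067e-05,0.001632531908578213"
word, counts, probs = parse_model_line(line)
print(word, counts, probs)
```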


__Training__ (Enron MNB Model _with Laplace +1 smoothing_ )
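Laplace (+1) smoothing adds one to every word count and adds the vocabulary size |V| to each class total: P(w | c) = (count(w, c) + 1) / (total_words(c) + |V|). As in the unsmoothed case, the constants below are back-calculated from the expected outputs and are assumptions: 3 / 0.000185804533631 ≈ 16146 = 11591 + 4555 and 5 / 0.000277300205202 ≈ 18031 = 13476 + 4555, which implies |V| = 4555. `laplace_cond_prob` is a hypothetical helper, not the code in `train_reducer_smooth.py`.

```python
# Assumed values, back-calculated from the expected smoothed model output
HAM_TOTAL = 11591   # total word tokens in ham (class 0) training emails
SPAM_TOTAL = 13476  # total word tokens in spam (class 1) training emails
VOCAB_SIZE = 4555   # number of distinct words in the training set


def laplace_cond_prob(word_count, class_total, vocab_size):
    """Add-one smoothed P(word | class); strictly positive even for unseen words."""
    return (word_count + 1) / (class_total + vocab_size)


# Counts for "assistance": ham=2, spam=4
print(laplace_cond_prob(2, HAM_TOTAL, VOCAB_SIZE),
      laplace_cond_prob(4, SPAM_TOTAL, VOCAB_SIZE))
```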

In [469]:
# part b -  Smoothed model (FILL IN THE MISSING CODE BELOW)

# clear the output directory
!hdfs dfs -rm -r {HDFS_DIR}/smooth-model

# hadoop command
!hadoop jar {JAR_FILE} \
  -D stream.num.map.output.key.fields=2 \
  -D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -D mapreduce.partition.keypartitioner.options="-k1,1" \
  -D mapreduce.partition.keycomparator.options="-k2,2" \
  -files NaiveBayes/train_mapper.py,NaiveBayes/train_reducer_smooth.py \
  -mapper train_mapper.py \
  -reducer train_reducer_smooth.py \
  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
  -input {HDFS_DIR}/enron_train.txt \
  -output {HDFS_DIR}/smooth-model \
  -cmdenv PATH={PATH} \
  -numReduceTasks 2

# apply POST PROCESS and save the model locally
!mkdir -p NaiveBayes/Smoothed
!hdfs dfs -cat {HDFS_DIR}/smooth-model/part-000* | sort | NaiveBayes/smooth_postprocess.py > NaiveBayes/Smoothed/NBmodel.txt

Deleted /user/root/HW2/smooth-model
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.6.0-cdh5.16.2.jar] /tmp/streamjob5053958865615182082.jar tmpDir=null
20/01/31 00:47:14 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/31 00:47:14 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/31 00:47:15 INFO mapred.FileInputFormat: Total input paths to process : 1
20/01/31 00:47:15 INFO mapreduce.JobSubmitter: number of splits:2
20/01/31 00:47:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1579903014542_0120
20/01/31 00:47:15 INFO impl.YarnClientImpl: Submitted application application_1579903014542_0120
20/01/31 00:47:15 INFO mapreduce.Job: The url to track the job: http://docker.w261:8088/proxy/application_1579903014542_0120/
20/01/31 00:47:15 INFO mapreduce.Job: Running job: job_1579903014542_0120
20/01/31 00:47:23 INFO mapreduce.Job: Job job_1579903014542_0120 running in uber mode : false

In [470]:
# part b - check your SMOOTHED model results (RUN THIS CELL AS IS)
!grep assistance NaiveBayes/Smoothed/NBmodel.txt
# EXPECTED OUTPUT: assistance	2,4,0.000185804533631,0.000277300205202

assistance	2,4,0.0001858045336306206,0.00027730020520215184


In [471]:
# part b - check your SMOOTHED model results (RUN THIS CELL AS IS)
!grep money NaiveBayes/Smoothed/NBmodel.txt
# EXPECTED OUTPUT: money	1,22,0.000123869689087,0.00127558094393

money	1,22,0.0001238696890870804,0.0012755809439298986


__Evaluation__
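`classify_mapper.py` is not shown, and the two numeric columns it prints are not necessarily the scores computed here. The sketch below illustrates standard multinomial NB scoring in log space: log P(c) + Σ count(w) · log P(w | c), choosing the class with the larger score. The priors and probabilities are toy numbers for a three-word vocabulary. The choices to skip words absent from the model and to send a class's score to -inf on a zero probability are assumptions. The second call shows how a single zero in an unsmoothed model can veto a class outright.

```python
import math
from collections import Counter


def classify(tokens, priors, cond_probs):
    """Return ({class: log score}, predicted class) for a multinomial NB model.

    priors:     {class: P(class)}
    cond_probs: {word: {class: P(word | class)}}
    """
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for word, n in Counter(tokens).items():
            if word not in cond_probs:
                continue  # assumption: ignore words the model has never seen
            p = cond_probs[word][c]
            if p > 0:
                score += n * math.log(p)
            else:
                score = -math.inf  # log(0): this class can no longer win
        scores[c] = score
    # On a tie (e.g. both -inf), max() returns the first class in dict order
    return scores, max(scores, key=scores.get)


priors = {0: 0.25, 1: 0.75}
smoothed = {
    "chinese": {0: 2 / 9, 1: 3 / 7},
    "tokyo": {0: 2 / 9, 1: 1 / 14},
    "japan": {0: 2 / 9, 1: 1 / 14},
}
unsmoothed = {
    "chinese": {0: 1 / 3, 1: 5 / 8},
    "tokyo": {0: 1 / 3, 1: 0.0},
    "japan": {0: 1 / 3, 1: 0.0},
}
doc = "chinese chinese chinese tokyo japan".split()
print(classify(doc, priors, smoothed)[1])    # 1
print(classify(doc, priors, unsmoothed)[1])  # 0
```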

In [474]:
# part c - write your code in NaiveBayes/evaluation_reducer.py then RUN THIS CELL
!chmod a+x NaiveBayes/evaluation_reducer.py

In [544]:
# part c - unit test your evaluation job on the Chinese model (RUN THIS CELL AS IS)
!cat NaiveBayes/chineseTest.txt | NaiveBayes/classify_mapper.py 
!cat NaiveBayes/chineseTest.txt | NaiveBayes/classify_mapper.py | NaiveBayes/evaluation_reducer.py

d5	1	6.134092622766479	7.532326167942548	1
d6	1	3.0081547935545476	3.6041382256608454	1
d7	0	3.819085009771877	6.93634273583425	1
d8	0	1.6218604324346575	4.990432586777937	1
d5	1	6.134092622766479	7.532326167942548	 True
d6	1	3.0081547935545476	3.6041382256608454	 True
d7	0	3.819085009771877	6.93634273583425	 False
d8	0	1.6218604324346575	4.990432586777937	 False
# Documents: 	4.0
True Positives:	2.0
True Negatives:	0.0
False Positives:	2.0
False Negatives:	0.0
Accuracy:	0.5
Precision:	0.5
Recall:	1.0
F-Score:	0.6666666666666666
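The unit-test numbers above follow the usual definitions with spam (class 1) as the positive class: accuracy = (TP + TN) / N, precision = TP / (TP + FP), recall = TP / (TP + FN), and F-score as the harmonic mean of precision and recall. The hypothetical `evaluate` helper below recomputes them from `(true, predicted)` pairs. It is a sketch, not the contents of `evaluation_reducer.py`. Returning 0.0 when a denominator is zero is an assumed convention. A reducer that computes metrics this way must see every record, so the evaluation jobs assume a single reduce task.

```python
def evaluate(pairs, positive=1):
    """Compute confusion counts and summary metrics from (true, predicted) label pairs."""
    tp = fp = tn = fn = 0
    for true, pred in pairs:
        if pred == positive:
            if true == positive:
                tp += 1
            else:
                fp += 1
        elif true == positive:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
            "Accuracy": (tp + tn) / n, "Precision": precision,
            "Recall": recall, "F-Score": f_score}


# (true, predicted) labels for d5-d8 as printed by classify_mapper.py above
print(evaluate([(1, 1), (1, 1), (0, 1), (0, 1)]))
```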


In [545]:
# part c - Evaluate the UNSMOOTHED Model Here (FILL IN THE MISSING CODE)

# clear the output directory
!hdfs dfs -rm -r {HDFS_DIR}/unsmooth-results

# hadoop job
!hadoop jar {JAR_FILE} \
  -files NaiveBayes/Unsmoothed/NBmodel.txt,NaiveBayes/classify_mapper.py,NaiveBayes/evaluation_reducer.py \
  -mapper classify_mapper.py \
  -reducer evaluation_reducer.py \
  -input {HDFS_DIR}/enron_test.txt \
  -output {HDFS_DIR}/unsmooth-results \
  -cmdenv PATH={PATH} \
  -numReduceTasks 1

# retrieve results locally
!hdfs dfs -cat {HDFS_DIR}/unsmooth-results/part-000* > NaiveBayes/Unsmoothed/results.txt
!cat NaiveBayes/Unsmoothed/results.txt | column -t

Deleted /user/root/HW2/unsmooth-results
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.6.0-cdh5.16.2.jar] /tmp/streamjob1169574045150028489.jar tmpDir=null
20/01/31 21:37:17 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/31 21:37:17 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/31 21:37:17 INFO mapred.FileInputFormat: Total input paths to process : 1
20/01/31 21:37:18 INFO mapreduce.JobSubmitter: number of splits:2
20/01/31 21:37:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1579903014542_0147
20/01/31 21:37:18 INFO impl.YarnClientImpl: Submitted application application_1579903014542_0147
20/01/31 21:37:18 INFO mapreduce.Job: The url to track the job: http://docker.w261:8088/proxy/application_1579903014542_0147/
20/01/31 21:37:18 INFO mapreduce.Job: Running job: job_1579903014542_0147
20/01/31 21:37:25 INFO mapreduce.Job: Job job_1579903014542_0147 running in uber mode : false

In [546]:
# part c - Evaluate the SMOOTHED Model Here (FILL IN THE MISSING CODE)

# clear the output directory
!hdfs dfs -rm -r {HDFS_DIR}/smooth-results

# hadoop job
!hadoop jar {JAR_FILE} \
  -files NaiveBayes/Smoothed/NBmodel.txt,NaiveBayes/classify_mapper.py,NaiveBayes/evaluation_reducer.py \
  -mapper classify_mapper.py \
  -reducer evaluation_reducer.py \
  -input {HDFS_DIR}/enron_test.txt \
  -output {HDFS_DIR}/smooth-results \
  -cmdenv PATH={PATH} \
  -numReduceTasks 1

# retrieve results locally
!hdfs dfs -cat {HDFS_DIR}/smooth-results/part-000* > NaiveBayes/Smoothed/results.txt
!cat NaiveBayes/Smoothed/results.txt | column -t

Deleted /user/root/HW2/smooth-results
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.6.0-cdh5.16.2.jar] /tmp/streamjob4751150189798274221.jar tmpDir=null
20/01/31 21:38:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/31 21:38:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/01/31 21:38:04 INFO mapred.FileInputFormat: Total input paths to process : 1
20/01/31 21:38:04 INFO mapreduce.JobSubmitter: number of splits:2
20/01/31 21:38:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1579903014542_0148
20/01/31 21:38:04 INFO impl.YarnClientImpl: Submitted application application_1579903014542_0148
20/01/31 21:38:04 INFO mapreduce.Job: The url to track the job: http://docker.w261:8088/proxy/application_1579903014542_0148/
20/01/31 21:38:04 INFO mapreduce.Job: Running job: job_1579903014542_0148
20/01/31 21:38:12 INFO mapreduce.Job: Job job_1579903014542_0148 running in uber mode : false

In [547]:
# part c - display results 
# NOTE: feel free to modify the tail commands to match the format of your results file
print('=========== UNSMOOTHED MODEL ============')
!tail -n 9 NaiveBayes/Unsmoothed/results.txt
print('=========== SMOOTHED MODEL ============')
!tail -n 9 NaiveBayes/Smoothed/results.txt

# Documents: 	20.0
True Positives:	2.0
True Negatives:	0.0
False Positives:	9.0
False Negatives:	9.0
Accuracy:	0.1
Precision:	0.18181818181818182
Recall:	0.18181818181818182
F-Score:	0.18181818181818182
# Documents: 	20.0
True Positives:	11.0
True Negatives:	6.0
False Positives:	3.0
False Negatives:	0.0
Accuracy:	0.85
Precision:	0.7857142857142857
Recall:	1.0
F-Score:	0.88
