revert accidental part4 change

1 parent f4edf52, commit f26e3724732da280d94633e10fd7600fef283135
dvryaboy committed Sep 2, 2012
Showing with 4 additions and 1 deletion.
  1. +4 −1 part4/src/scripts/wc.pig
5 part4/src/scripts/wc.pig
@@ -10,8 +10,11 @@ stopPipe = FILTER stopPipe BY stop != 'stop';
tokenPipe = FOREACH docPipe GENERATE doc_id, FLATTEN(TOKENIZE(LOWER(text), ' [](),.')) AS token;
tokenPipe = FILTER tokenPipe BY token MATCHES '\\w.*';
--- perform a left join to remove stop words
+--- perform a left join to remove stop words, discarding the rows
+--- which joined with stop words, i.e., were non-null after left join
tokenPipe = JOIN tokenPipe BY token LEFT, stopPipe BY stop using 'replicated';
+tokenPipe = FILTER tokenPipe BY stopPipe::stop is NULL;
+
-- DUMP tokenPipe;
-- determine the word counts
