@rjurney
Contributor

@rjurney rjurney commented Dec 24, 2013

register ../../lib/stanford-postagger-withModel.jar
register ../../target/varaha-1.0-SNAPSHOT.jar

reviews = LOAD 'data/ten.avro' USING AvroStorage();
foo = FOREACH reviews GENERATE business_id, varaha.text.StanfordTokenize(text) AS tagged;
DUMP foo;

(41J1FgfIsmsLRCZ3QILG6w,{(truly),(impressive),(facility),(came),(for),(two),(books),(not),(knowing),(this),(location),(-LRB-),(normally),(Appaloosa),(-RRB-),(The),(staff),(was),(very),(helpful),(and),(found),(what),(wanted),(very),(quickly),(was),(there),(minutes),(tops),(would),(highly),(recommend),(this),(Library),(anyone),(interested),('ll),(coming),(back),(very),(soon),(for),(next),(batch)})
(4YX4ZtUqs6xtcc4AdjbpeQ,{(Other),(circle),(are),(much),(cleaner),(than),(this),(one),(The),(best),(thing),(about),(this),(store),(the),(Employees),(are),(friendly),(and),(nice),('ve),(been),(this),(location),(the),(morning),(and),(the),(evening),(and),(there),(must),(point),(where),(the),(shift),(changes),(and),(they),(stop),(cleaning),(the),(bathrooms),(and),(emptying),(the),(trash),(the),(morning),(everything),(clean),(the),(time),(evening),(rolls),(around),(there),(are),(odd),(smells),(all),(over),(the),(store),(shame),(since),(larger),(newer),(looking),(store),(that),(n't),(cleaner),('ll),(back),(hopes),(they),(clean),(little),(more)})
(5kRug3bEienrpovtPRVVwg,{(Went),(with),(husband),(Richardson),(Rokerij),(for),(the),(first),(time),(raved),(about),(this),(place),(went),(Wednesday),(night),(with),(reservation),(The),(wait),(was),(about),(hour),(Luckily),(there),(were),(bar),(seats),(that),(became),(available),(took),(them),(ordered),(the),(cheese),(flatbread),(appetizer),(and),(was),(delicious),(had),(large),(salad),(for),(dinner),(which),(was),(perfect),(was),(not),(very),(hungry),(husband),(had),(the),(chicken),(enchiladas),(that),(tasted),(and),(were),(very),(good),(The),(food),(cooked),(order),(did),(take),(while),(get),(our),(meal),(but),(was),(worth),(the),(wait),(and),(service),(was),(excellent),(While),(waiting),(chatted),(with),(several),(people),(the),(bar),(and),(one),(couple),(offered),(taste),(their),(appetizer),(returned),(the),(favor),(when),(flatbread),(came),(One),(more),(thing),(not),(leave),(without),(getting),(the),(decadent),(truffle),(dessert),(Heavenly),(but),(not),(over),(done),(any),(way),(All),(all),(great),(experience),(recommend),(reservations)})

reviews = LOAD 'data/ten.avro' USING AvroStorage();
reviews = LIMIT reviews 1000;
bar = FOREACH reviews GENERATE business_id, FLATTEN(varaha.text.SentenceTokenize(text)) AS tokenized_sentences;
bar = FOREACH bar GENERATE business_id, varaha.text.StanfordPOSTagger(tokenized_sentences) AS tagged;
DUMP bar;

(6VRbbNQe5ouWmwsMebUMkg,{(My,PRP$),(friend,NN),(added,VBD),(some,DT),(sugar,NN),(to,TO),(it,PRP),(and,CC),(it,PRP),(turned,VBD),(okay/good,NN),(.,.)})
(6VRbbNQe5ouWmwsMebUMkg,{(Entrees,NNS),(average,VBP),(about,IN),($,$),(10,CD),(-,:),($,$),(13,CD),(.,.)})
(6VRbbNQe5ouWmwsMebUMkg,{(Naan,NN),(ranges,NNS),(from,IN),(about,IN),($,$),(1.50,CD),(-,:),($,$),(3,CD),(.,.)})
(6VRbbNQe5ouWmwsMebUMkg,{(Appetizers,NNS),(during,IN),(happy,JJ),(hour,NN),(range,NN),(from,IN),($,$),(3,CD),(-,:),($,$),(8,CD),(+,CC),(.,.)})
(6VRbbNQe5ouWmwsMebUMkg,{(Add,VB),(in,IN),(alcohol,NN),(and,CC),(you,PRP),('re,VBP),(looking,VBG),(at,IN),(a,DT),(not,RB),(inexpensive,JJ),(meal,NN),(but,CC),(definitely,RB),(good,JJ),(quality,NN),(.,.)})
(6oRAC4uyJCsJl1X0WZpVSA,{(love,VB),(the,DT),(gyro,NN),(plate,NN),(.,.)})
(6oRAC4uyJCsJl1X0WZpVSA,{(Rice,NNP),(is,VBZ),(so,RB),(good,JJ),(and,CC),(I,PRP),(also,RB),(dig,VBP),(their,PRP$),(candy,NN),(selection,NN),(:,:),(-RRB-,-RRB-)})

reviews = LOAD 'data/ten.avro' USING AvroStorage();
reviews = LIMIT reviews 1000;
bar = FOREACH reviews GENERATE business_id, varaha.text.StanfordPOSTagger(varaha.text.StanfordTokenize(text)) AS tokens;
DUMP bar;

(-UnYs8XvV1M983xZoREdng,{(have,VB),(say,VB),(loved,NN),(Vino,NNP),(First,NNP),(off,RB),(very,RB),(unpretentious,JJ),(not,RB),(very,RB),(knowledgeable,JJ),(about,IN),(wine,NN),(tend,VBP),(shy,JJ),(away,RB),(from,IN),(places,NNS),(that,WDT),(have,VBP),(attitude,NN),(also,RB),(had,VBD),(one,CD),(the,DT),(1000,CD),(outstanding,JJ),(Groupons,NNS),(about,IN),(expire,VBP),(And,CC),(spite,NN),(the,DT),(fact,NN),(that,IN),(just,RB),(about,IN),(everyone,NN),(coming,VBG),(that,IN),(evening,NN),(had,VBD),(Groupon,NNP),(the,DT),(staff,NN),(was,VBD),(fantastic,JJ),(they,PRP),(not,RB),(have,VBP),(kitchen,NN),(all,DT),(appetizers,NNS),(are,VBP),(cold,JJ),(but,CC),(had,VBD),(nice,JJ),(cheese,NN),(plate,NN),(which,WDT),(included,VBD),(cheeses,NNS),(olives,NNS),(nuts,NNS),(grapes,NNS),(and,CC),(dried,VBD),(fruit,NN),(only,RB),(complaint,NN),(was,VBD),(that,IN),(the,DT),(lahvosh-like,JJ),(crackers,NNS),(were,VBD),(really,RB),(oily,JJ),(and,CC),(not,RB),(good,JJ),(all,DT),(Lose,VB),(those,DT),(and,CC),(would,MD),(have,VB),(been,VBN),(much,RB),(better,RBR),(for,IN),(the,DT),(wine,NN),(was,VBD),(actually,RB),(better,JJR),(than,IN),(expected,VBN),(Although,IN),(n't,RB),(generally,RB),(care,VB),(for,IN),(really,RB),(sweet,JJ),(wines,NNS),(both,CC),(the,DT),(Summer,NN),(Rain,NN),(and,CC),(Peachy,JJ),(Keen,JJ),(were,VBD),(really,RB),(enjoyable,JJ),(just,RB),(think,VB),(them,PRP),(more,RBR),(crisp,JJ),(summer,NN),(beverage,NN),(than,IN),(wine,NN),(was,VBD),(surprised,VBN),(like,IN),(the,DT),(Pinot,NNP),(Grigio,NNP),(much,RB),(did,VBD),(and,CC),(may,MD),(have,VB),(purchased,VBN),(bottle,NN),(but,CC),(was,VBD),(not,RB),(available,JJ),(that,IN),(evening,NN),(The,DT),(Miscela,NNP),(Italian,NNP),(blend,VB),(was,VBD),(miss,VB),(for,IN),(-LRB-,-LRB-),(too,RB),(acidic,JJ),(for,IN),(taste,NN),(-RRB-,-RRB-),(but,CC),(the,DT),(Malbec,NNP),(was,VBD),(better,JJR),(For,IN),(after,IN),(dinner,NN),(wines,NNS),(the,DT),(Grande,NNP),(Finale,NNP),(was,VBD),(over-the-top,JJ),(sweet,JJ),(would,MD),(probably,RB),(not,RB),(drink,VB),(more,JJR),(than,IN),(tasting,NN),(The,DT),(Porto,NNP),(Cocoa,NNP),(however,RB),(was,VBD),(fantastic,JJ),(generally,RB),(stay,VB),(away,RB),(from,IN),(Port,NNP),(because,IN),(dislike,NN),(the,DT),(brandy,NN),(burn,VBP),(But,CC),(one,CD),(whiff,NN),(this,DT),(and,CC),(was,VBD),(hooked,VBN),(before,IN),(tasted,VBN),(While,IN),(not,RB),(like,IN),(terribly,RB),(sweet,JJ),(you,PRP),(definitely,RB),(get,VBP),(the,DT),(essence,NN),(chocolate,NN),(bought,VBD),(bottle,NN),(take,VB),(home,NN),(fact,NN),(but,CC),(only,RB),(saw,VBD),(one,CD),(wee,NN),(little,JJ),(glass,NN),(husband,NN),(apparently,RB),(mistook,VBD),(for,IN),(Yoo-hoo,NN),(and,CC),(drank,VBD),(the,DT),(rest,NN),(Great,JJ),(place,NN),(begin,VB),(your,PRP$),(evening,NN),(And,CC),(because,IN),(many,JJ),(these,DT),(young,JJ),(wines,NNS),(are,VBP),(sweeter,JJR),(even,RB),(non-wine-drinking,JJ),(husband,NN),(enjoyed,VBN)})

@alienrobotwizard
Owner

@rjurney Would you mind squashing these commits so I can look at a single diff?

@rjurney
Contributor Author

rjurney commented Dec 29, 2013

Yeah, I can do that. I think you can also do that in the interface?

On Sunday, December 29, 2013, Jacob wrote:

> @rjurney Would you mind squashing these commits so I can look at a single diff?

@alienrobotwizard
Owner

Sorry for taking so long to get to this. Overall it looks good. Can we put the UDFs that rely strictly on the Stanford NLP package in their own namespace? varaha.text is getting a little crowded.
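
For illustration only, here is a rough sketch of how a script from this PR might look after such a move. The package name `varaha.text.stanford` is a hypothetical placeholder and was not agreed on in this thread; the sketch also assumes the UDF class names stay the same. `DEFINE` is standard Pig Latin for aliasing a UDF, which would keep the longer qualified names out of the script body.

```pig
REGISTER ../../lib/stanford-postagger-withModel.jar;
REGISTER ../../target/varaha-1.0-SNAPSHOT.jar;

-- ASSUMPTION: 'varaha.text.stanford' is a hypothetical package name used only
-- for illustration; the actual namespace was not decided in this conversation.
DEFINE StanfordTokenize  varaha.text.stanford.StanfordTokenize();
DEFINE StanfordPOSTagger varaha.text.stanford.StanfordPOSTagger();

reviews = LOAD 'data/ten.avro' USING AvroStorage();
reviews = LIMIT reviews 1000;

-- Same pipeline as the third example in the PR description, using the aliases.
tagged = FOREACH reviews GENERATE business_id, StanfordPOSTagger(StanfordTokenize(text)) AS tokens;
DUMP tagged;
```

With aliases like these, a later package rename would only touch the `DEFINE` lines rather than every `FOREACH`.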

@rjurney
Contributor Author

rjurney commented Jan 15, 2014

Yeah, I'll do that.

On Tuesday, January 14, 2014, Jacob wrote:

> Sorry for taking so long to get to this. Overall it looks good. Can we put the udfs that rely strictly on the stanford nlp package in their own namespace? varaha.text is getting a little crowded.
