Naive Bayes Agent #1899
Hey @nogre, welcome!
This seems like a really good start to a useful Agent. What are you using it for?
# - generic to work with all types of tokens, not just text

module NBayes
Is this a copy of the code from the gem? It might make more sense to do this Agent as its own gem using huginn_agent and require nbayes from its gemspec.
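A minimal sketch of what such a gemspec could look like. The gem name, version, and summary here are placeholders chosen for illustration, not values taken from any published gem; only the two dependency names come from the comment above.

```ruby
# Hypothetical gemspec sketch; name, version and summary are placeholders.
spec = Gem::Specification.new do |s|
  s.name    = "huginn_naive_bayes_agent" # assumed gem name
  s.version = "0.0.1"                    # placeholder
  s.summary = "Naive Bayes classification agent for Huginn" # placeholder

  # Depend on the classifier gem instead of copying its source into the agent.
  s.add_runtime_dependency "huginn_agent"
  s.add_runtime_dependency "nbayes"
end
```

With the dependency declared this way, the agent file would only need a `require "nbayes"` rather than an embedded copy of the module.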
I'd recommend adding rspec coverage for your agent too, whether it's added to Huginn directly or pulled into its own gem.
I had to change 1 line in the NBayes gem to make the saving to Agent memory work easily. So it might be possible to work around this. There is a comment in the code towards the bottom of the NBayes module if you want to take a look (line 345).
As I stated above, I'm new to Ruby, so I'm a bit unsure about how all this works. Same goes for rspec. But I will give it a shot.
I'm building a sort of Google News-esque site that aggregates blog posts, called The Practical Ontologist. I use a lot of RSS agents and then format the output, then take all the output and build the site with Jekyll (more on how it works). What I want to do is have different sections instead of a single page, as it is right now. So, if I can figure out how to get things working and trained well, this agent will classify and tag different posts. Then I'll have sub-pages that focus on specific topics.
if arg.instance_of? String
  File.open(arg, "w") {|f| YAML.dump(self, f) }
else
  YAML.dump(self) # XXX only line modified so far for Huginn Agent
What did this line say before you changed it?
YAML.dump(arg)
The only change is from 'arg' to 'self'. The original line merely YAML-encoded the passed argument, so it must have been used only for testing the YAML output and has no effect on the rest of the code. 'self' returns the training data.
Would it be worth sending a pull request to the NBayes gem to make this change?
I just checked and this function can be avoided without much hassle. Instead of calling nbayes.dump, the code can just say YAML.dump(nbayes). So I'll make this change, and then there won't be a need to modify NBayes.
It might be worth it to still make the pull request just to add the useful functionality to NBayes.
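A rough sketch of that approach, using a small stand-in class rather than the real NBayes object so it runs without the gem. `TinyClassifier` and the `"nbayes_data"` memory key are made up for illustration; only `YAML.dump` on the whole object reflects the change described above.

```ruby
require "yaml"

# Hypothetical stand-in for a trained classifier; not the NBayes gem's class.
class TinyClassifier
  attr_reader :counts

  def initialize
    @counts = {} # category => { token => count }
  end

  def train(tokens, category)
    bucket = (@counts[category] ||= {})
    tokens.each { |t| bucket[t] = bucket.fetch(t, 0) + 1 }
  end
end

classifier = TinyClassifier.new
classifier.train(%w[ruby gem rails], "programming")

# Serialise the whole object to a YAML string, e.g. to keep in agent memory.
# The "nbayes_data" key is an assumption made for this sketch.
memory = { "nbayes_data" => YAML.dump(classifier) }

# Restore it later. Newer Psych versions refuse to load arbitrary objects via
# YAML.load, so prefer unsafe_load where it exists and fall back otherwise.
# Only do this with YAML the agent wrote itself.
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
restored = YAML.public_send(loader, memory["nbayes_data"])
```

Because the object is dumped from the outside, the gem's own dump method never has to be called or patched.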
if arg.instance_of? String
  File.open(arg, "w") {|f| YAML.dump(self, f) }
else
  YAML.dump(self) # XXX only line modified so far for Huginn Agent NO LONGER NEEDED
Can you remove this code from your Agent and add the gem to the Gemfile instead, if it's no longer needed?
I'm working on making this agent into a gem and just requiring NBayes. Just been a little busy and it is taking me time to figure this stuff out.
I've (finally) built a working agent gem. Currently full features require manual installation of the nbayes dependency from Github, but basic Bayesian classification works regardless. I raised an issue for them to update their rubygems files to their latest nbayes version.
The rspec still needs to get done, though.
Nice! Building the gem is the hard part. Do you have questions about how to get rspec running?
exists as gem: https://github.com/nogre/huginn_naive_bayes_agent
Nice!
Naive Bayes machine learning agent: Provides an agent front end to the NBayes gem: "NBayes is a full-featured, Ruby implementation of Naive Bayes". The agent takes events from some sources as training data and then classifies other events based on that data. A field in the event payload is analyzed and another is populated with the categories. Then the event is emitted. Training data is stored in the agent's memory. I don't know Ruby well; everything looks like it is working, but that is based on only a few preliminary tests, so any and all feedback is more than welcome.
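To illustrate the train-then-classify flow described above, here is a self-contained sketch in plain Ruby. It is not the NBayes gem or the agent's actual code: `TinyNaiveBayes`, `tokenize`, `classify_event`, and the `"title"` / `"category"` field names are all assumptions made for the example, and the classifier is a simple multinomial Naive Bayes with add-one smoothing.

```ruby
# Hypothetical stand-in classifier; not the NBayes gem's implementation.
class TinyNaiveBayes
  def initialize
    @token_counts = {} # category => { token => count }
    @doc_counts = {}   # category => number of training examples
    @vocab = {}        # token => true, for vocabulary size
  end

  def train(tokens, category)
    @doc_counts[category] = @doc_counts.fetch(category, 0) + 1
    counts = (@token_counts[category] ||= {})
    tokens.each do |t|
      counts[t] = counts.fetch(t, 0) + 1
      @vocab[t] = true
    end
  end

  # Returns the category with the highest log posterior, or nil if untrained.
  def classify(tokens)
    return nil if @doc_counts.empty?
    total_docs = @doc_counts.values.sum.to_f
    scores = @doc_counts.keys.map do |category|
      counts = @token_counts[category]
      total_tokens = counts.values.sum
      log_prob = Math.log(@doc_counts[category] / total_docs)
      tokens.each do |t|
        # Add-one (Laplace) smoothing so unseen tokens do not zero the score.
        log_prob += Math.log((counts.fetch(t, 0) + 1.0) / (total_tokens + @vocab.size))
      end
      [category, log_prob]
    end
    scores.max_by { |_, score| score }.first
  end
end

# Assumed tokenizer: lowercase words only.
def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# Reads one payload field, writes the predicted category into another,
# and returns the new payload (the "emitted" event in this sketch).
def classify_event(classifier, payload, source_field = "title", target_field = "category")
  payload.merge(target_field => classifier.classify(tokenize(payload[source_field])))
end

nb = TinyNaiveBayes.new
nb.train(tokenize("Kant on metaphysics and ontology"), "philosophy")
nb.train(tokenize("New Ruby gem released for Rails"), "programming")

event = classify_event(nb, { "title" => "A new gem for Ruby developers" })
event["category"] # => "programming"
```

In the real agent the training data would live in the agent's memory between runs; here it simply stays in the `nb` object.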