Skip to content

neo4jrb/kvitter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Neo4j.rb example application

This application shows how to use neo4j together with rails 3.1 by parsing Twitter feeds and creating a social graph. It first shows some basic operations which you are probably used to from the ActiveRecord/ActiveModel API. In the last section we take real advantage of the neo4j engine by implementing a recommendation algorithm for finding new twitter users.

Have fun and feel free to clone it.

Installation

Make sure you have Java JDK 1.6+ installed

Install latest JRuby, example

rvm use jruby-1.6.7

Install Rails (>= 3.1.1)

gem install rails

1. Create a rails project

Use my rails template which will disable active record and enable neo4j instead

rails new kvitter -m http://andreasronge.github.com/neo4j/rails.rb

Edit the Gemfile and add the twitter gem

cd kvitter
emacs Gemfile # and add
gem 'twitter', '1.7.2'

Download all the dependencies

bundle

2. Scaffold

Run the following commands:

rails generate scaffold Tweet text:string link:string date:datetime tweet_id:string --indices tweet_id date text --has_n tags mentions links --has_one tweeted_by:tweeted
rails generate scaffold User twid:string link:string --indices twid --has_n tweeted follows knows used_tags mentioned_from:mentions
rails generate scaffold Link url:string --indices url --has_n tweets:links short_urls:redirected_link --has_one redirected_link
rails generate scaffold Tag name:string --indices name --has_n tweets:tags used_by_users:used_tags

There is nothing magical happening here like neo4j configuration or migrations. It only creates plain new Ruby classes (models, controllers and views) and routing. The relationships and properties are only specified in the model classes.

3. Start Rails

Test the basic crud operations

rails s

Open browser: localhost:3000/tags

4 Search using Twitter API

Add the following:

app/controllers/tag_controller.rb:

The following code does a twitter search and parses the result. It creates and connects the Tweet, Link, User and Tag model classes.

def search
  @tag = Tag.find(params[:id])

  search = Twitter::Search.new
  result = search.hashtag(@tag.name)

  curr_page = 0
  while curr_page < 2 do
    result.each do |item|
      parsed_tweet_hash = Tweet.parse(item)
      next if Tweet.find_by_tweet_id(parsed_tweet_hash[:tweet_id])
      tweet = Tweet.create!(parsed_tweet_hash)

      twid = item['from_user'].downcase
      user = User.find_or_create_by(:twid => twid)
      user.tweeted << tweet
      user.save

      parse_tweet(tweet, user)
    end
    result.fetch_next_page
    curr_page += 1
  end

  redirect_to @tag
end

def parse_tweet(tweet, user)
  tweet.text.gsub(/(@\w+|https?:\/\/[a-zA-Z0-9\-\.~\:\?#\[\]\!\@\$&,\*+=;,\/]+|#\w+)/).each do |t|
    case t
      when /^@.+/
        t = t[1..-1].downcase
        next if t.nil?
        other = User.find_or_create_by(:twid => t)
        user.knows << other unless t == user.twid || user.knows.include?(other)
        user.save
        tweet.mentions << other
      when /#.+/
        t = t[1..-1].downcase
        tag = Tag.find_or_create_by(:name => t)
        tweet.tags << tag unless tweet.tags.include?(tag)
        user.used_tags << tag unless user.used_tags.include?(tag)
        user.save
      when /https?:.+/
        link = Link.find_or_create_by(:url => t)
        tweet.links << (link.redirected_link || link)
    end
  end
  tweet.save!
end

app/models/tweet.rb:

Change index on text to fulltext lucene:

property :text, :type => String, :index => :fulltext

Add a to_s and parse method and change the index type of text to fulltext (we need that later on, see below).

def to_s
  text.gsub(/(@\w+|https?\S+|#\w+)/,"").strip
end

def self.parse(item)
  {:tweet_id => item['id_str'],
   :text => item['text'],
   :date => Time.parse(item['created_at']),
   :link => "http://twitter.com/#{item['from_user']}/statuses/#{item['id_str']}"
  }
end

Notice : in Neo4j it is not necessary to specify the types of properties. By setting :type => String we force that each Tweet object will store the property ‘type’ as strings.

config/routes.rb:

change

resource :tags

to

resources :tags do
  get :search, :on => :member
end

app/views/tags/show.html.erb

Add a button to the view:

<%= button_to "Search", [:search, @tag], :method => :get %>

Test the application now by opening a browser localhost:3000/tags Create a new tag and press the button ‘search’ You will now found tweets, users and links

Follow URL shortenings

When you looked at all the links, most of them are short urls like t.co Since we are more interested in the real link and who has tweeted about them we must follow the URL by doing a HTTP head request.

We use a before save callback to create a the real link which correspond to where the short url is directed to. The short link and the real link are connected in a redirected_link relationship.

app/models/link.rb

class Link < Neo4j::Rails::Model
  property :url, :type => String, :index => :exact
  has_n(:tweets).from(:links)
  has_n(:short_urls).from(:redirected_link)
  has_one :redirected_link

  # Add the following:
  before_save :create_redirect_link

  SHORT_URLS = %w[t.co bit.ly ow.ly goo.gl tiny.cc tinyurl.com doiop.com readthisurl.com memurl.com tr.im cli.gs short.ie kl.am idek.net short.ie is.gd hex.io asterl.in j.mp].to_set

  def to_s
    url
  end

  private

  def self.short_url?(url)
    domain = url.split('/')[2]
    domain && SHORT_URLS.include?(domain)
  end

  def create_redirect_link
    return if !self.class.short_url?(url)
    uri = URI.parse(url)
    http = Net::HTTP.new(uri.host, uri.port)
    http.read_timeout = 200
    req = Net::HTTP::Head.new(uri.request_uri)
    res = http.request(req)
    redirect = res['location']
    if redirect && url != redirect
      self.redirected_link = Link.find_or_create_by(:url => redirect.strip)
    end
  rescue Timeout::Error
    puts "Can't acccess #{url}"
  rescue Error
    puts "Can't call #{url}"
  rescue Net::HTTPBadResponse
    puts "Bad response for #{url}"
  end

In order to show both outgoing redirected_link and incoming redirected_link (by using short_urls method) add the following:

<p>
  <b>Short Urls:</b>
  <% @link.short_urls.each do |link| %>
    <%= link_to link, link %> <br/>
  <% end %>
</p>

<% if @link.redirected_link %>
<p>
  <b>Redirects to</b>
  <%= link_to @link.redirected_link, @link.redirected_link %>
</p>
<% end %>

Don’t display URL shortening

The index page shows all the links, including links like bit.ly and the the real one. To only return the real URLs we can use rules, which is a bit similar to scope in ActiveRecord.

app/models/links.rb

rule(:real) { redirected_link.nil?}

This means that it will group all links under the rule :real which does not have a redirected_link To return all those nodes, use the class method #real. Notice you can do some really interesting queries using rules and cypher together, see “Rules-Cypher”:github.com/andreasronge/neo4j/wiki/Neo4j%3A%3AWrapper-Rules-and-Functions

Btw, the Neo4j::Rails::Model#all method is also implemented as a rule. You can also chain rules, just like scopes in Active Record.

def index
   @links = Link.real

   respond_to do |format|
     format.html # index.html.erb
     format.json { render :json => @links }
   end
 end

Since the rules do work by creating relationships when new nodes are created/updated/deleted we must do another search or stop the rails server and delete the database ‘db/neo4j-developement’

Pagination

The list of all tweets (localhost:3000/tweets ) does not look good. It needs some pagination. Add the two gem in Gemfile:

gem 'neo4j-will_paginate', :git => 'git://github.com/andreasronge/neo4j-will_paginate.git'
gem 'will_paginate'

app/controllers/tweet_controller.rb

Neo4j.rb comes included with the will_paginate gem. Change the index method

@tweets = Tweet.all

to

@tweets = Tweet.all.paginate(:page => params[:page], :per_page => 10)

Pagination is support for all traversals and lucene queries.

app/views/tweets/index.html.erb

Add the following line before the table:

<%= will_paginate(@tweets) %>

Searching and Sorting

Lets say we want to sort the tweets by the text. Lucene has two types of indexes: exact and fulltext. The exact index is perfect for keywords while the fulltext is for longer texts. We have already changed the index of text to fulltext for the app/models/tweets.rb See lucene.apache.org/java/3_0_0/queryparsersyntax.html for the query syntax.

app/views/tweets/index.html.erb

Add the following form before the table

<%= form_for(:tweets, :method => :get) do |f| %>
<div class="field">
  <%= text_field_tag :query %>
    <%= f.submit "Search" %>
</div>
<% end %>

app/controllers/tweets_controller.rb

In the index method we now should handle the query parameter. Add the following before the respond_to.

def index
  query = params[:query]
    if query.present?
      @tweets = Tweet.all("text:#{query}", :type => :fulltext).paginate(:page => params[:page], :per_page => 10)
    else
      @tweets = Tweet.all.paginate(:page => params[:page], :per_page => 10)
    end

  # respond_to ...
end

Test it !

Add SVG Visualization !

Would it not be cool to visualize who knows who ?

Download the D3 Javascript library

Download the D3 version 2.4.6 javascript library from github.com/mbostock/d3/archives/master.

Unzip it and move the following files to the app/assets/javascript folder

  • d3.geom.js

  • d3.js

  • d3.layout.js

app/controllers/users_controller.rb

Create a JSON API needed by the javascript

def index
  @users = User.all

  respond_to do |format|
    format.html # index.html.erb
    format.json do
      nodes = @users.map{|u| {:name => u.twid, :value => u.tweeted.size}}
      links = []
      @users.each do |user|
        links += user.knows.map {|other| { :source => nodes.find_index{|n| n[:name] == user.twid}, :target => nodes.find_index{|n| n[:name] == other.twid}}}
      end
      render :json => {:nodes => nodes, :links => links}
    end
  end
end

Test the API, open a browser localhost:3000/users.json it should return something like this:

{"nodes":[{"name":"geoaxis","value":1},{"name":"peterneubauer","value":1},{"name":"dolugen","value":1},{"name":"neo4j","value":8},{"name":"emileifrem","value":1},{"name":"linnbar_consult","value":1},{"name":"skillsmatter","value":0},{"name":"swgoof","value":1},{"name":"bytor99999","value":0},{"name":"teropaananen","value":1},{"name":"tobyorourke","value":1},{"name":"sannegrinovero","value":1},{"name":"emmanuelbernard","value":0},{"name":"jimwebber","value":0},{"name":"ianrobinson","value":0},{"name":"mesirii","value":1},{"name":"kings13y","value":0},{"name":"chitrasen","value":1},{"name":"einarhreindal","value":1},{"name":"smunchang","value":1},{"name":"technige","value":1},{"name":"josh_adell","value":1},{"name":"mmuekk","value":1},{"name":"asbkar","value":1},{"name":"noppanit","value":1},{"name":"roddare","value":1},{"name":"tanyaespe","value":1},{"name":"erikusaj","value":1},{"name":"druidjaidan","value":1}],"links":[{"source":0,"target":1},{"source":2,"target":3},{"source":2,"target":4},{"source":3,"target":7},{"source":3,"target":4},{"source":3,"target":8},{"source":3,"target":9},{"source":3,"target":10},{"source":3,"target":11},{"source":3,"target":12},{"source":3,"target":13},{"source":3,"target":14},{"source":3,"target":1},{"source":3,"target":16},{"source":4,"target":9},{"source":5,"target":6},{"source":7,"target":4},{"source":9,"target":4},{"source":10,"target":4},{"source":11,"target":12},{"source":11,"target":13},{"source":11,"target":14},{"source":15,"target":9},{"source":15,"target":4},{"source":17,"target":9},{"source":17,"target":4},{"source":18,"target":6},{"source":19,"target":4},{"source":19,"target":3},{"source":20,"target":21},{"source":22,"target":6},{"source":23,"target":6},{"source":24,"target":1},{"source":25,"target":1},{"source":26,"target":4},{"source":26,"target":3},{"source":27,"target":6},{"source":28,"target":6}]}

app/assets/javascripts/users.js

Add the following coffeescript (be careful with the indentation):

# Place all the behaviors and hooks related to the matching controller here.
# All this logic will automatically be available in application.js.
# You can use CoffeeScript in this file: http://jashkenas.github.com/coffee-script/

$(document).ready ->
  w = 1260
  h = 1300
  fill = d3.scale.category20()

  vis = d3.select("#graph").append("svg:svg").attr("width", w).attr("height", h)

  d3.json("/users.json", (json) ->
    force = d3.layout.force()
      .charge(-320)
      .linkDistance(160)
      .nodes(json.nodes)
      .links(json.links)
      .size([w, h])
      .start()

    link = vis.selectAll("line.link")
      .data(json.links)
      .enter().append("svg:line")
      .attr("class", "link")
      .style("stroke-width", (d) -> Math.sqrt(d.value))
      .attr("x1", (d) -> d.source.x)
      .attr("y1", (d) -> d.source.y)
      .attr("x2", (d) -> d.target.x)
      .attr("y2", (d) -> d.target.y)

    node = vis.selectAll("g.node")
      .data(json.nodes)
      .enter().append("svg:g")
      .attr("transform", (d) -> "translate(" + d.x + "," + d.y + ")")
      .attr("class", "node")
      .call(force.drag)

    node.append("svg:circle")
      .attr("r", (d) -> if d.value > 25 then 50 else d.value*2 + 5)
      .style("fill", (d) -> '#fea')

    node.append("svg:title").text((d) -> d.name)
    node.append("svg:text")
      .attr("text-anchor", "middle")
      .attr("dy", ".3em")
      .text((d) -> d.name)

    vis.style("opacity", 1e-6)
      .transition()
      .duration(0)
      .style("opacity", 1)

    force.on("tick", ->
      link.attr("x1", (d) -> d.source.x).attr("y1", (d) -> d.source.y).attr("x2", (d) -> d.target.x).attr("y2", (d) -> d.target.y)
      node.attr("transform", (d) -> "translate(" + d.x + "," + d.y + ")")
    )
  )

app/views/users/index.html.erb

Add the following at the bottom of the file

<div id='graph'> </div>

app/assets/javascripts/application.js

Make sure things are loading in the correct order.

//= require jquery
//= require jquery_ujs
//= require d3
//= require d3.geom
//= require d3.layout
//= require users

app/assets/stylesheets/application.css

circle.node {
    stroke: #fff;
    stroke-width: 1.5px;
}

line.link {
    stroke: #999;
    stroke-opacity: .6;
}

Test it

Open a browser localhost:3000/users and scroll down

Better Navigation for Users

app/views/users/show.erb

Add the following before the link_to lines

<p>
  <b>Used tags</b>
  <% @user.used_tags.each do |tag| %>
    <%= link_to tag.name, tag %> <br/>
  <% end %>
</p>

<p>
  <b>Knows:</b><br/>
  <% @knows.each do |user| %>
    <%= link_to user, user %> <br/>
  <% end %>
  <%= will_paginate(@knows) %>
</p>

<p>
  <b>Mentioned from:</b><br/>
  <% @mentioned_from.each do |tweet| %>
    <%= link_to tweet, tweet %> <br/>
  <% end %>
</p>

<p>
  <b>Tweets</b><br/>
  <% @user.tweeted.each do |tweet| %>
    <%= link_to tweet, tweet %> <br/>
  <% end %>
</p>

app/controllers/users_controller.rb

Change the show function to:

def show
  @user = User.find(params[:id])
  @knows = @user.knows.paginate(:page => params[:page], :per_page => 10)

  @mentioned_from = @user.mentioned_from

  respond_to do |format|
    format.html # show.html.erb
    format.json { render :json => @user }
  end
end

Notice that we do pagination of a traversal (knows).

Better Navigation for Tweets

app/views/tweets/show.html.rb

Show the relationships from and to a tweet:

<p>
  <b>Text:</b>
  <%= @tweet.text %>
</p>

<p>
  <b>Link:</b>
  <%= link_to @tweet.link, @tweet.link  %>
</p>

<p>
  <b>Date:</b>
  <%= @tweet.date %>
</p>

<p>
  <b>Tweet:</b>
  <%= @tweet.tweet_id %>
</p>

<p>
  <b>Tweeted by</b>
  <%= link_to @tweet.tweeted_by, @tweet.tweeted_by %>
</p>

<p>
  <b>Tags:</b><br/>
  <% @tweet.tags.each do |tag| %>
    <%= link_to tag.name, tag %> <br/>
  <% end %>
</p>

<p>
  <b>Links:</b><br/>
  <% @tweet.links.each do |link| %>
    <%= link_to link, link %> <br/>
  <% end %>
</p>

Better Navigation for Tags

app/views/tags/show.html.erb

<p>

<b>Tweets:</b><br/>
<% @tweets.each do |tweet| %>
  <%= link_to tweet, tweet %> <br/>
<% end %>
<%= will_paginate(@tweets) %>

</p>

app/controller/

Add the following line in the show method after the Tag.find line

@tweets = @tag.tweets.paginate(:page => params[:page], :per_page => 10)

Recommendations

app/controllers/user_controllers.rb

The following algorithm works like this:

1. Get all the users who are also using my tags.
2. For each of those users get their used tags and compare with mine.
3. Select the user who has the most similar tags to mine.

Add the following at the bottom of the file user_controllers.rb

private

def recommend(user)
  my_tags = user.used_tags.to_a
  my_friends = user.knows.to_a

  # we are here using the raw java API - that's why using _java_node, raw and wrapper
  other_users =  user._java_node.outgoing(:used_tags).incoming(:used_tags).raw.depth(2).filter{|path| path.length == 2 && !my_friends.include?(path.end_node)}

  # for all those users, find the person who has the max number of same tags as I have
  found = other_users.max_by{|friend| (friend.outgoing(:used_tags).raw.map{|tag| tag[:name]} & my_tags).size }

  found && found.wrapper # load the ruby wrapper around the neo4j java node
end

app/controllers/user_controllers.rb

Add one line to the show method:

def show
  @user = User.find(params[:id])
  @recommend = recommend(@user)

  respond_to do |format|
    format.html # show.html.erb
    format.json { render :json => @user }
  end
end

app/views/users/show.html.erb

Display a recommendation if available

<% if @recommend %>
<p>
  <b>Recommend</b>
  <%= link_to @recommend.twid, @recommend %>
</p>
<% end %>

About

Example application using neo4j.rb

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published