imports initial code

commit 7ef62857726cfeac3f21da92aa5e39f099b6e3bc 1 parent d6c4ee0
authored May 19, 2012
2  .gitignore
@@ -0,0 +1,2 @@
+export/*
+json/*
46  README.md
@@ -1,4 +1,44 @@
-edgeryders-mapper
-=================
+Dragon Trainer - Edgeryders network extraction scripts
+======================================================
+
+In this repository you'll find the code used to extract a network from the raw data of the Edgeryders site.
+
+
+The experiment code is driven by the ```edgeryders_export.rb``` file and uses the classes defined in the ```edgeryders_dataset.rb``` and ```dataset/*.rb``` files. Running the edgeryders_export.rb file produces the network in Pajek format (.net files), which can also be used in other network analysis tools (e.g. Tulip or Gephi).
+
+Running the script
+------------------
+
+#### Prerequisites
+
+To run the script you need the Ruby interpreter installed on your computer. See http://www.ruby-lang.org/ for details on how to install it for your OS.
+
+#### Downloading the code
+
+Download the code from GitHub and save it in a directory on your computer.
+
+#### Preparing the data
+
+Obtain the data from the Edgeryders site and put the resulting JSON files in the ```json``` directory.
+
+The files should be named as follows:
+
+* _nodes.json_ for the file containing the dump of the Drupal node objects
+* _comments.json_ for the file containing the comment objects
+* _users.json_ for the file containing the user objects
+
+#### Extracting the network
+
+From the command line:
+
+* cd into the edgeryders-mapper directory
+* run the command ```ruby -rubygems edgeryders_export.rb```
+
+While running, the script logs traces to the screen, along with any errors or warnings it finds.
+
+After the script has run you'll find in the ```export``` directory a new sub-directory named YYYYMMDD-HHmm after the current date and time, which will contain the extracted .net files, including:
+
+* _edgeryders-ANON.net_ contains the network without any direct user identifying information
+* _edgeryders-NAMES.net_ contains the network which uses the user's name to identify the nodes
+
 
-Edgeryders network extraction scripts
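The "Preparing the data" step can be checked before running the export. The sketch below is not part of this commit; it assumes the file names that `edgeryders_export.rb` reads in this commit (`json/users.json`, `json/nodes.json`, `json/comments.json`), and `missing_inputs` is a hypothetical helper name.

```ruby
require 'tmpdir'
require 'fileutils'

# File names as read by edgeryders_export.rb in this commit.
EXPECTED_INPUTS = %w[users.json nodes.json comments.json]

# Returns the expected input files that are absent from +dir+.
# (Hypothetical helper, not defined anywhere in the repository.)
def missing_inputs(dir, names = EXPECTED_INPUTS)
  names.reject { |name| File.file?(File.join(dir, name)) }
end

# Usage against a throwaway directory, so the sketch runs anywhere.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'users.json'))
  puts "Missing: #{missing_inputs(dir).join(', ')}"
end
```

In a real checkout the call would be `missing_inputs('json')`, run from the repository root.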
33  dataset/artifact.rb
@@ -0,0 +1,33 @@
+require 'dataset/element'
+
+# An artifact of the site produced by a member.
+# It is connected to many other artifacts;
+# through those connections to other artifacts,
+# its author and the other artifacts' authors collaborate.
+class Artifact < Element
+  attr_accessor :code, :author, :children, :timestamp
+
+  def initialize(code, author, raw_data)
+    @code = code
+    @author = author
+    @children = Array.new
+    @data = raw_data
+    ts = @data["timestamp"]||@data["created"]
+    @timestamp = Time.at( ts.to_i ) if ts
+  end
+
+  def pretty(depth=0)
+    puts %{#{" "*depth}- #{@code}}
+    @children.each do |c|
+      c.pretty(depth+1)
+    end
+  end
+
+  def name
+    code
+  end
+
+  def dump_data
+    @data.to_json
+  end
+end
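To see what the `children` tree and `pretty` produce, here is a minimal self-contained sketch. `SketchArtifact` is a hypothetical stand-in that keeps only `code` and `children`, and `outline` returns the lines instead of printing them, so it is not the `Artifact` class itself.

```ruby
# Hypothetical stand-in for Artifact: only the fields the tree needs.
SketchArtifact = Struct.new(:code, :children) do
  # Returns the indented outline as an array of lines, one per artifact,
  # using the same "one space per depth level" layout as Artifact#pretty.
  def outline(depth = 0)
    lines = ["#{' ' * depth}- #{code}"]
    children.each { |child| lines.concat(child.outline(depth + 1)) }
    lines
  end
end

reply   = SketchArtifact.new('comment.11', [])
comment = SketchArtifact.new('comment.10', [reply])
post    = SketchArtifact.new('mission_case.1', [comment])

puts post.outline
```

The codes follow the `mission_case.<nid>` and `comment.<cid>` scheme used by `edgeryders_dataset.rb`; the ids themselves are made up.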
50  dataset/element.rb
@@ -0,0 +1,50 @@
+# base element class
+class Element
+
+  # THIS SHOULD BE OVERRIDDEN IF USING THE VALIDATIONS
+  def legal_attributes
+    [ ]
+  end
+
+  def method_missing name, *args
+    @data[name.to_s]
+  end
+
+  def existed_at? date_as_string
+    self.createdDate[0..date_as_string.size-1] < date_as_string
+  end
+
+private
+  def default_values! hash, attribs
+    attribs.keys.each do |k|
+      default_value! hash, k, attribs[k]
+    end
+  end
+
+  def default_value! hash, key, def_val
+    hash[key] = def_val if not hash.include? key
+  end
+
+  def verify_well_formedness! hash
+    verify_all_keys_are_legal! hash
+    verify_all_attributes_are_present! hash
+  end
+
+  def verify_all_keys_are_legal! hash
+    hash.keys.each do |k|
+      if not legal_attributes.include? k
+        raise "#{self.class.name} PARSE ERROR: Wrong key >>#{k}<< in data: #{hash.inspect}"
+      end
+    end
+  end
+
+  def verify_all_attributes_are_present! hash
+    legal_attributes.each do |att|
+      if not hash.keys.include? att
+        raise "#{self.class.name} PARSE ERROR: Missing key >>#{att}<< in data: #{hash.inspect}"
+      end
+    end
+  end
+
+
+end
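`Element#method_missing` turns every unknown method call into a lookup in the raw data hash. The sketch below shows a stricter variant of the same idea, as an illustration only: `HashBackedRecord` is a hypothetical class, and unlike `Element` it raises for keys that are not in the hash and keeps `respond_to?` in sync.

```ruby
# Hypothetical stand-in for Element, not the class from this commit.
class HashBackedRecord
  def initialize(data)
    @data = data
  end

  # Serve only keys that exist in the hash; anything else falls through to
  # the default NoMethodError instead of silently returning nil.
  def method_missing(name, *args)
    key = name.to_s
    return @data[key] if args.empty? && @data.key?(key)
    super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @data.key?(name.to_s) || super
  end
end

record = HashBackedRecord.new('uid' => '42', 'name' => 'alice')
puts record.name
puts record.respond_to?(:uid)
puts record.respond_to?(:email)
```

The dataset code relies on `nil` for absent keys (for example `artifact.pid` on a node without a `pid`), so this stricter variant is not a drop-in replacement for `Element`.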
15  dataset/member.rb
@@ -0,0 +1,15 @@
+require 'dataset/element'
+
+# the member is a person producing artifacts on a site
+class Member < Element
+  attr_accessor :code
+
+  def initialize(raw_data)
+    @data = raw_data
+    @code = "#{raw_data["uid"]}"
+  end
+
+  def to_s
+    "#{@code}"
+  end
+end
16  dataset/relationship.rb
@@ -0,0 +1,16 @@
+class Relationship
+  attr_accessor :a, :b
+
+  def initialize( from, to )
+    @a = from
+    @b = to
+  end
+
+  def signature
+    %{#{a.to_s}_#{b.to_s}}
+  end
+
+  def to_s
+    %{#{a.to_s} --> #{b.to_s}}
+  end
+end
10  dataset/site.rb
@@ -0,0 +1,10 @@
+# a site where members interact on artifacts
+# it has a list of artifacts and a list of members
+class Site
+  attr_accessor :artifacts, :members
+
+  def initialize
+    @artifacts = Array.new
+    @members = Hash.new
+  end
+end
10  dataset/timestamped_relationship.rb
@@ -0,0 +1,10 @@
+require 'dataset/relationship'
+
+class TimestampedRelationship < Relationship
+  attr_accessor :timestamp
+
+  def initialize( from, to, timestamp )
+    super(from, to)
+    @timestamp = timestamp
+  end
+end
20  dataset/weighted_network.rb
@@ -0,0 +1,20 @@
+require 'dataset/weighted_relationship'
+
+class WeightedNetwork
+
+  def initialize
+    @relations_map = Hash.new
+  end
+
+  def <<(relationship)
+    if rel = @relations_map[relationship.signature]
+      rel.weight += 1
+    else
+      @relations_map[relationship.signature] = WeightedRelationship.new relationship.a, relationship.b
+    end
+  end
+
+  def relationships
+    @relations_map.values
+  end
+end
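`WeightedNetwork#<<` collapses repeated relationships into one edge whose weight counts the repetitions, keyed by `signature`. A minimal sketch of that accumulation, using a hypothetical `Edge` struct and `accumulate` helper rather than the classes from this commit:

```ruby
# Hypothetical stand-in, trimmed to what the counting needs.
Edge = Struct.new(:a, :b, :weight) do
  # Same "from_to" key shape as Relationship#signature.
  def signature
    "#{a}_#{b}"
  end
end

# Folds a list of [from, to] pairs into weighted edges, one per signature.
def accumulate(pairs)
  by_signature = {}
  pairs.each do |from, to|
    edge = Edge.new(from, to, 1)
    if (seen = by_signature[edge.signature])
      seen.weight += 1
    else
      by_signature[edge.signature] = edge
    end
  end
  by_signature.values
end

edges = accumulate([%w[1 2], %w[1 2], %w[2 1], %w[1 2]])
edges.each { |e| puts "#{e.a} -> #{e.b} (weight #{e.weight})" }
```

Because the signature is ordered, `1 -> 2` and `2 -> 1` are counted as separate edges, which matches how `Relationship#signature` is built.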
11  dataset/weighted_relationship.rb
@@ -0,0 +1,11 @@
+require 'dataset/relationship'
+
+class WeightedRelationship < Relationship
+  attr_accessor :weight
+
+  def initialize( from, to, weight=1 )
+    super(from, to)
+    @weight = weight||1
+  end
+
+end
220  edgeryders_dataset.rb
@@ -0,0 +1,220 @@
+require 'rubygems'
+require 'json'
+
+require 'dataset/element'
+require 'dataset/site'
+require 'dataset/artifact'
+require 'dataset/member'
+require 'dataset/relationship'
+require 'dataset/timestamped_relationship'
+require 'dataset/weighted_relationship'
+require 'dataset/weighted_network'
+
+class EdgerydersDataset
+
+  attr_accessor :site, :timed_relationships, :artifacts_map, :weighted_network
+
+  def initialize args
+    @site = Site.new
+
+    jusers = JSON.parse args[:json_users]
+    jusers["users"].each do |m|
+      member = Member.new(m["user"])
+      @site.members[member.code] = member
+    end
+
+    @artifacts_map = Hash.new
+    @other_nodes_map = Hash.new #we use this for debugging purposes
+
+    jnodes = JSON.parse args[:json_nodes]
+    jnodes["nodes"].each do |n|
+      if n["node"]["type"] == 'mission_case'
+        artifact = Artifact.new( "mission_case.#{n["node"]["nid"]}", @site.members[n["node"]["uid"]], n["node"] )
+        @site.artifacts << artifact
+        @artifacts_map[artifact.code] = artifact
+      else
+        @other_nodes_map[n["node"]["nid"]] = n["node"]
+      end
+    end
+
+    jcomments = JSON.parse args[:json_comments]
+    jcomments["comments"].each do |c|
+      comment = Artifact.new( "comment.#{c["comment"]["cid"]}", @site.members[c["comment"]["uid"]], c["comment"] )
+      @artifacts_map[comment.code] = comment
+    end
+
+    # processing threaded comments and building the artifacts tree
+    # we do this in a second step to prevent problems from the json file
+    # possibly not being ordered
+    @errors = {:without_pid=>[], :parent_not_a_mission_case=>[], :without_nid=>[]}
+    @artifacts_map.each do |code, artifact|
+      if artifact.pid
+        # this is a threaded comment
+        parent = @artifacts_map["comment.#{artifact.pid}"]
+        @errors[:without_pid] << artifact unless parent
+      elsif artifact.cid
+        # this is a comment on a mission case
+        parent = @artifacts_map["mission_case.#{artifact.nid}"]
+        unless parent
+          if other_node = @other_nodes_map[artifact.nid]
+            @errors[:parent_not_a_mission_case] << [artifact, other_node]
+          else
+            @errors[:without_nid] << artifact
+          end
+        end
+
+      end
+
+      parent.children << artifact if parent
+    end
+
+    puts "The following comments had a pid defined but no parent comment was found:\n"
+    @errors[:without_pid].each do |artifact|
+      puts "    #{artifact.dump_data}"
+    end
+    puts "======\n"
+
+    puts "The following comments had a nid defined that was not a mission_case (the node found is shown in ()):\n"
+    @errors[:parent_not_a_mission_case].each do |artifact, other_node|
+      puts "    #{artifact.dump_data} (#{other_node.inspect})"
+    end
+    puts "======\n"
+
+    puts "The following comments had a nid defined but no node with that nid was found:\n"
+    @errors[:without_nid].each do |artifact|
+      puts "    #{artifact.dump_data}"
+    end
+    puts "======\n"
+
+  end
+
+  # builds the list of timed relationships formed
+  # by connecting two members A and B if A commented
+  # on a post or comment from B
+  def build_timed_relationships!
+    @timed_relationships = Array.new
+    @site.artifacts.each do |artifact|
+      @timed_relationships += build_timed_relationships_for(artifact)
+    end
+  end
+
+  # recursively builds the relationships to the
+  # artifact author from the artifact children authors
+  # recurses on the children
+  def build_timed_relationships_for(artifact)
+    rels = Array.new
+
+    if artifact.author.nil?
+      puts "Error reading artifact #{artifact.code}: author not defined [[ #{ artifact.dump_data } ]]"
+      return rels
+    end
+
+    artifact.children.each do |child|
+      if child.author.nil?
+        puts "Error reading artifact #{child.code} (child of: #{artifact.code}) author not defined [[ #{ child.dump_data } ]]"
+      else
+        puts "Warning reading artifact #{child.code}: same author of the parent #{artifact.code} [[ #{ artifact.author.code } ]]" if child.author.to_s == artifact.author.to_s
+
+        rels << TimestampedRelationship.new(child.author, artifact.author, child.timestamp)
+        rels += build_timed_relationships_for(child)
+      end
+    end
+    rels
+  end
+
+  def build_member_to_member_thread_network!(options={})
+    puts "\nBuilding the member to member network based on threaded comments\n"
+    build_timed_relationships!
+    @weighted_network = WeightedNetwork.new
+    @timed_relationships.each do |rel|
+      @weighted_network << rel if allowed_relationship?( rel, options )
+    end
+  end
+
+  # builds the list of timed relationships formed
+  # by connecting members A and post B if A commented
+  # on B or on a comment on B
+  def build_member_post_relationships!
+    @member_post_relationships = Array.new
+    @site.artifacts.each do |artifact|
+      @member_post_relationships += build_timed_member_post_relationships_for(artifact, artifact)
+    end
+  end
+
+  # recursively builds the relationships to the
+  # root_artifact from the artifact author and children authors
+  # recurses on the children
+  def build_timed_member_post_relationships_for(root_artifact, artifact)
+    rels = Array.new
+
+    if artifact.author.nil?
+      puts "Error reading artifact #{artifact.code}: author not defined [[ #{ artifact.dump_data } ]]"
+      return rels
+    else
+      rels << TimestampedRelationship.new(artifact.author, root_artifact, artifact.timestamp)
+    end
+
+    artifact.children.each do |child|
+      rels += build_timed_member_post_relationships_for(root_artifact, child)
+    end
+    rels
+  end
+
+  def build_member_to_post_network!(options={})
+    puts "\nBuilding the member to post network\n"
+    build_member_post_relationships!
+    @weighted_network = WeightedNetwork.new
+    @member_post_relationships.each do |rel|
+      @weighted_network << rel if allowed_relationship?( rel, options )
+    end
+  end
+
+  def allowed_relationship?( rel, options={} )
+
+    excluded_users = ['0']
+    excluded_users += (options[:excluded_users]||[])
+
+    ( !rel.a.is_a?(Member) || !excluded_users.include?(rel.a.code) ) && ( !rel.b.is_a?(Member) || !excluded_users.include?(rel.b.code) )
+  end
+
+  def export_pajek( filename, options )
+    write_file filename, convert_to_pajek(@weighted_network.relationships, options)
+
+    puts
+    puts "EXPORT PAJEK WITH OPTIONS #{options.inspect} DONE"
+    puts
+  end
+
+
+  def convert_to_pajek( relationships, options={} )
+    member_node_field = options[:member_node_field]||:code
+    exclude_isolated = options[:exclude_isolated]||false
+
+    if exclude_isolated
+      contributors = relationships.map{|r| [r.a.send(member_node_field), r.b.send(member_node_field)]}.flatten.uniq
+    else
+      contributors = @site.members.values.map{|m| m.send(member_node_field) }
+    end
+
+    pajek = "*Vertices #{contributors.size}" +"\r\n"
+
+    contributors.each_with_index do |c,i|
+      pajek << %{#{i+1} "#{c}"}+"\r\n"
+    end
+
+    pajek << "*Edges"+"\r\n"
+
+    relationships.each do |r|
+      a = contributors.index(r.a.send(member_node_field))+1
+      b = contributors.index(r.b.send(member_node_field))+1
+      pajek << %{#{a} #{b} #{r.weight}}+"\r\n"
+    end
+
+    return pajek
+  end
+
+  def write_file filename, content
+    File.open(filename, 'w') {|f| f.write content }
+  end
+
+end
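`convert_to_pajek` writes a `*Vertices` section with 1-based ids, followed by an `*Edges` section of `from to weight` triples, each line terminated by CRLF. The sketch below reproduces that layout on plain data. `to_pajek` and its `[from, to, weight]` input shape are assumptions made for this example, and it assumes vertex labels are unique.

```ruby
# Builds a Pajek .net document from vertex labels and weighted edges.
# +edges+ is a list of [from_label, to_label, weight] triples (a hypothetical
# input shape chosen for this sketch).
def to_pajek(labels, edges)
  # Pajek vertex ids are 1-based positions in the vertex list.
  ids = {}
  labels.each_with_index { |label, i| ids[label] = i + 1 }

  lines = ["*Vertices #{labels.size}"]
  labels.each { |label| lines << %(#{ids[label]} "#{label}") }
  lines << '*Edges'
  # fetch raises KeyError if an edge names a label that is not a vertex.
  edges.each { |from, to, weight| lines << "#{ids.fetch(from)} #{ids.fetch(to)} #{weight}" }
  lines.join("\r\n") + "\r\n"
end

puts to_pajek(%w[12 34 56], [['12', '34', 3], ['56', '12', 1]])
```

The hash lookup replaces the repeated `Array#index` scan used in `convert_to_pajek`; that is a design choice of this sketch, not a change made in the commit.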
42  edgeryders_export.rb
@@ -0,0 +1,42 @@
+require 'edgeryders_dataset'
+require 'fileutils'
+
+ts = Time.new.strftime("%Y%m%d-%H%M")
+EXCLUDED_USERS = ['229', '624', '353', '595', '426', '462', '185', '592'] # these are spambots or other blocked users
+
+puts "------------------------"
+puts "Loading and parsing"
+
+dataset = EdgerydersDataset.new :json_users => File.read('json/users.json'),
+                                :json_nodes => File.read('json/nodes.json'),
+                                :json_comments => File.read('json/comments.json')
+
+puts "------------------------"
+
+dataset.build_member_to_member_thread_network!(:excluded_users=>EXCLUDED_USERS)
+
+puts ""
+puts "Members count: #{dataset.site.members.size}"
+puts "Connected members count: #{dataset.weighted_network.relationships.map{|r| [r.a, r.b]}.flatten.uniq.size}"
+puts "Edges count: #{dataset.weighted_network.relationships.size}"
+puts ""
+puts "Exporting ..."
+
+FileUtils.mkdir_p "export/#{ts}"
+dataset.export_pajek "export/#{ts}/edgeryders-ANON.net", :member_node_field=>:code, :exclude_isolated=>false
+dataset.export_pajek "export/#{ts}/edgeryders-NAMES.net", :member_node_field=>:name, :exclude_isolated=>false
+
+puts "------------------------"
+
+dataset.build_member_to_post_network!(:excluded_users=>EXCLUDED_USERS)
+
+puts ""
+puts "Members count: #{dataset.site.members.size}"
+puts "Posts count: #{dataset.site.artifacts.size}"
+puts "Edges count: #{dataset.weighted_network.relationships.size}"
+puts ""
+puts "Exporting ..."
+
+FileUtils.mkdir_p "export/#{ts}"
+dataset.export_pajek "export/#{ts}/edgeryders-members-to-post-ANON.net", :member_node_field=>:code, :exclude_isolated=>true
+dataset.export_pajek "export/#{ts}/edgeryders-members-to-post-NAMES.net", :member_node_field=>:name, :exclude_isolated=>true
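The first half of the export script builds the member-to-member network: every reply links its author to the author of the artifact it replies to, and pairs that involve an excluded user are dropped. A miniature version of that pipeline, under simplifying assumptions: `Post`, `reply_pairs` and `allowed_pairs` are hypothetical names, authors are plain id strings, and the nil-author handling of `build_timed_relationships_for` is left out.

```ruby
# Hypothetical stand-in: an artifact is just an author id plus its replies.
Post = Struct.new(:author, :children)

# Every reply yields one [replier, parent_author] pair, at every depth.
def reply_pairs(post)
  post.children.flat_map do |child|
    [[child.author, post.author]] + reply_pairs(child)
  end
end

# Drops pairs that touch an excluded user id. The id '0' is always
# excluded, as in allowed_relationship?.
def allowed_pairs(pairs, excluded)
  blocked = ['0'] + excluded
  pairs.reject { |from, to| blocked.include?(from) || blocked.include?(to) }
end

thread = Post.new('10', [
  Post.new('20', [Post.new('10', []), Post.new('229', [])]),
  Post.new('30', [])
])

pairs = allowed_pairs(reply_pairs(thread), ['229'])
pairs.each { |from, to| puts "#{from} --> #{to}" }
```

Here user `229` stands for one of the ids listed in `EXCLUDED_USERS`; the other ids are made up for the example.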
