adds graph cluster detection
Change-Id: I83ec95a69a8f01f682ddc8f75fc0e8f0e66ee5ab
Reviewed-on: https://gerrit.instructure.com/45455
Reviewed-by: Coraline Ehmke <coraline@instructure.com>
Product-Review: Michael Ziwisky <mziwisky@instructure.com>
QA-Review: Michael Ziwisky <mziwisky@instructure.com>
Tested-by: Michael Ziwisky <mziwisky@instructure.com>
mziwisky committed Dec 15, 2014
1 parent 8da9310 commit 386bc8a
Showing 13 changed files with 1,406 additions and 33 deletions.
674 changes: 674 additions & 0 deletions COPYING.txt

Large diffs are not rendered by default.

33 changes: 13 additions & 20 deletions LICENSE.txt
Original file line number Diff line number Diff line change
@@ -1,22 +1,15 @@
Copyright (c) 2014 Coraline Ada Ehmke / Instructure.inc

-MIT License
-
-Permission is hereby granted, free of charge, to any person obtaining
-a copy of this software and associated documentation files (the
-"Software"), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject to
-the following conditions:
-
-The above copyright notice and this permission notice shall be
-included in all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
-LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
-OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program in the file COPYING.txt. If not, see
+<http://www.gnu.org/licenses/>
8 changes: 8 additions & 0 deletions README.md
@@ -39,6 +39,14 @@ and need to pull in updates from the
[society-assets](https://github.com/CoralineAda/society-assets) package, you
can do so on the command line with `$ bower update`.

## Recognition

The graph clustering algorithm used in this software is called MCL, described
in:

* Stijn van Dongen, _Graph Clustering by Flow Simulation_, PhD thesis,
University of Utrecht, May 2000. [micans.org/mcl](http://micans.org/mcl)
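
For readers unfamiliar with MCL, here is a minimal dense-matrix sketch of its two alternating steps: expansion (squaring a column-stochastic matrix) and inflation (raising entries to a power and renormalizing each column). It is an illustration only, not the code added in this commit. The method names, the fixed iteration count, and the way clusters are read off are assumptions made for the example.

```ruby
# Toy MCL on a dense adjacency matrix (array of rows). Hypothetical helper
# names; each column is treated as a probability distribution.

# Scale every column so that it sums to 1.
def normalize_columns(m)
  n = m.size
  sums = (0...n).map { |j| (0...n).sum { |i| m[i][j] } }
  m.map { |row| row.each_with_index.map { |v, j| sums[j].zero? ? 0.0 : v / sums[j] } }
end

# Expansion: one matrix squaring, i.e. two steps of a random walk.
def expand(m)
  n = m.size
  Array.new(n) { |i| Array.new(n) { |j| (0...n).sum { |k| m[i][k] * m[k][j] } } }
end

# Inflation: raise entries to a power, then renormalize the columns.
def inflate(m, power)
  normalize_columns(m.map { |row| row.map { |v| v**power } })
end

# Returns clusters as arrays of node indices.
def toy_mcl(adjacency, inflation: 2.0, iterations: 20)
  # Add self-loops, then make the matrix column-stochastic.
  m = adjacency.each_with_index.map do |row, i|
    row.each_with_index.map { |v, j| i == j ? 1.0 : v.to_f }
  end
  m = normalize_columns(m)
  iterations.times { m = inflate(expand(m), inflation) }

  # Rows that keep weight on their own diagonal act as attractors; the
  # columns they still reach form a cluster.
  attractor_rows = (0...m.size).select { |i| m[i][i] > 0.1 }
  attractor_rows.map { |i| (0...m.size).select { |j| m[i][j] > 0.1 } }.uniq
end

# Two disconnected pairs: 0-1 and 2-3.
p toy_mcl([
  [0, 1, 0, 0],
  [1, 0, 0, 0],
  [0, 0, 0, 1],
  [0, 0, 1, 0]
]) # => [[0, 1], [2, 3]]
```

A higher inflation value sharpens each column faster and tends to give smaller clusters, while a lower one gives coarser clusters.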

## Contributing

Please note that this project is released with a [Contributor Code of Conduct]
5 changes: 3 additions & 2 deletions lib/society.rb
@@ -5,6 +5,7 @@
require_relative "society/association_processor"
require_relative "society/reference_processor"
require_relative "society/edge"
require_relative "society/clusterer"
require_relative "society/formatter/graph/json"
require_relative "society/formatter/report/html"
require_relative "society/formatter/report/json"
@@ -14,8 +15,8 @@

module Society

-def self.new(path_to_files)
-Society::Parser.for_files(path_to_files)
+def self.new(*paths_to_files)
+Society::Parser.for_files(*paths_to_files)
end

end
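
The change to `Society.new` swaps a single argument for a splat. A small sketch of what that means for callers, using a hypothetical `SplatSketch` module in place of `Society` and a stub in place of `Society::Parser.for_files` (neither is part of the commit):

```ruby
# Hypothetical stand-in for the changed entry point; the real method
# forwards to Society::Parser.for_files, which is not reproduced here.
module SplatSketch
  # Collects any number of positional arguments into an array.
  def self.for_files(*file_paths)
    file_paths
  end

  # The splat on the way in and on the way out keeps the list flat.
  def self.new(*paths_to_files)
    for_files(*paths_to_files)
  end
end

p SplatSketch.new("app/models")          # => ["app/models"]
p SplatSketch.new("app/models", "lib")   # => ["app/models", "lib"]
```

Existing single-path calls keep working unchanged. In this stand-in, a caller holding an array of paths would need to splat it (`SplatSketch.new(*paths)`) to avoid passing one nested array.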
188 changes: 188 additions & 0 deletions lib/society/clusterer.rb
@@ -0,0 +1,188 @@
# This file has been translated from the `minimcl` perl script, which was
# sourced from the MCL-edge library, release 14-137. The homepage for MCL-edge
# is http://micans.org/mcl
#
# The original copyright for `minimcl` is as follows:
#
# (C) Copyright 2006, 2007, 2008, 2009 Stijn van Dongen
#
# This file is part of MCL. You can redistribute and/or modify MCL under the
# terms of the GNU General Public License; either version 3 of the License or
# (at your option) any later version. You should have received a copy of the
# GPL along with MCL, in the file COPYING.

module Society
class Clusterer

def initialize(params={})
@params = DEFAULT_PARAMS.merge(params)
end

# returns an array of arrays of nodes
def cluster(graph)
m = matrix_from(graph)
clusters = mcl(m)
clusters.map { |index, members| members.keys }
.sort_by(&:size).reverse
end

private

attr_reader :params

DEFAULT_PARAMS = {
inflation: 2.0
}

# TODO: "weights" are ignored right now, but soon will be an attribute of
# the edge
def matrix_from(graph)
matrix = SparseMatrix.new
graph.edges.each do |edge|
a = edge.from
b = edge.to
matrix[a][b] = 1
matrix[b][a] = 1
end
matrix
end

def mcl(matrix)
matrix_add_loops(matrix)
matrix_make_stochastic(matrix)
chaos = 1.0
while (chaos > 0.001) do
sq = matrix_square(matrix)
chaos = matrix_inflate(sq)
matrix = sq
end
matrix_interpret(matrix)
end

def matrix_square(matrix)
squared = SparseMatrix.new
matrix.each do |node, vector|
squared[node] = matrix_multiply_vector(matrix, vector)
end
squared
end

def matrix_multiply_vector(matrix, vector)
result_vec = SparseVector.new
vector.each do |entry, val|
matrix[entry].each do |f, matrix_val|
result_vec[f] += val * matrix_val
end
end
result_vec
end

def matrix_make_stochastic(matrix)
matrix_inflate(matrix, 1)
end

def matrix_add_loops(matrix)
matrix.each do |key,_|
matrix[key][key] = 1.0
matrix[key][key] = vector_max(matrix[key])
end
end

def vector_max(vector)
vector.values.max || 0.0
end

def vector_sum(vector)
vector.values.reduce(&:+) || 0.0
end

# prunes small elements as well
def matrix_inflate(matrix, inflation = params[:inflation])
chaos = 0.0
matrix.each do |node, vector|
sum = 0.0
sumsq = 0.0
max = 0.0
vector.each do |node2, value|
vector.delete node2 and next if value < 0.00001

inflated = value ** inflation
vector[node2] = inflated
sum += inflated
end
if sum > 0.0
vector.each do |node2, value|
vector[node2] /= sum
sumsq += vector[node2] ** 2
max = [vector[node2], max].max
end
end
chaos = [max - sumsq, chaos].max
end
chaos # only meaningful if input is stochastic
end

# assumes but does not check doubly idempotent matrix.
# can handle attractor systems of size < 10.
# recognizes/preserves overlap.
def matrix_interpret(matrix)
clusters = SparseMatrix.new
attrid = {}
clid = 0

# crude removal of small elements
matrix.each do |n, vec|
vec.each do |nb, val|
matrix[n].delete nb if val < 0.1
end
end

attr = {}
matrix.each_key do |key|
attr[key] = 1 if matrix[key].key? key
end

attr.each_key do |a|
next if attrid.key?(a)
aa = [a]
while aa.size > 0 do
bb = []
aa.each do |aaa|
attrid[aaa] = clid
matrix[aaa].each_key { |akey| bb.push(akey) if attr.key? akey }
end
aa = bb.select { |b| !attrid.key?(b) }
end
clid += 1
end

matrix.each do |n, val|
if !attr.key?(n)
val.keys.select { |x| attr.key? x }.each do |a|
clusters[attrid[a]][n] += 1
end
else
clusters[attrid[n]][n] += 1
end
end

clusters
end


module SparseMatrix
def self.new
Hash.new do |hash, key|
hash[key] = SparseVector.new
end
end
end

module SparseVector
def self.new
Hash.new(0.0)
end
end
end
end
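
The clusterer stores its matrix as nested hashes with defaults. The sketch below mirrors `SparseMatrix.new` and `SparseVector.new` with two hypothetical top-level helpers (the names `new_sparse_matrix` and `new_sparse_vector` are not in the commit) to show how the two kinds of `Hash` default behave:

```ruby
# Stand-ins mirroring SparseVector.new and SparseMatrix.new from the diff.
def new_sparse_vector
  Hash.new(0.0) # missing entries read as 0.0 but are not stored
end

def new_sparse_matrix
  # The block default stores a fresh row the first time a key is read.
  Hash.new { |hash, key| hash[key] = new_sparse_vector }
end

matrix = new_sparse_matrix
matrix[:a][:b] = 1            # writing through a missing row creates it
matrix[:b][:a] = 1

p matrix.keys                 # => [:a, :b]
p matrix[:a][:missing]        # => 0.0
p matrix[:a].key?(:missing)   # => false
matrix[:a][:c] += 0.5         # += reads the default, then stores the sum
p matrix[:a].to_a             # => [[:b, 1], [:c, 0.5]]
p matrix[:z]                  # => {}
p matrix.key?(:z)             # => true
```

One consequence worth keeping in mind: merely reading a missing row inserts it, and Ruby raises if a key is added to a hash while that hash is being iterated. Because `matrix_from` writes both `matrix[a][b]` and `matrix[b][a]`, every node that appears in an edge already has a row before the iteration loops run.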

18 changes: 12 additions & 6 deletions lib/society/formatter/graph/json.rb
@@ -6,8 +6,7 @@ module Graph
class JSON

def initialize(graph)
-@nodes = graph.nodes
-@edges = graph.edges
+@graph = graph
end

def to_json
@@ -22,20 +21,27 @@ def to_hash
from: node_names.index(edge.from),
to: node_names.index(edge.to)
}
-end
+end,
+clusters: clusters_of_indices
}
end

private

-attr_reader :nodes, :edges
+attr_reader :graph

def node_names
-@node_names ||= nodes.map(&:full_name).uniq
+@node_names ||= graph.nodes.map(&:full_name).uniq
end

def named_edges
-@named_edges ||= edges.map { |edge| Edge.new(from: edge.from.full_name, to: edge.to.full_name) }
+@named_edges ||= graph.edges.map { |edge| Edge.new(from: edge.from.full_name, to: edge.to.full_name) }
end

def clusters_of_indices
Society::Clusterer.new.cluster(graph).map do |cluster|
cluster.map { |node| graph.nodes.index(node) }
end
end

end
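
The new `clusters` key holds each cluster as a list of node positions rather than node objects. A self-contained sketch of that mapping, using a hypothetical `SketchNode` struct and a free-standing `clusters_to_indices` helper instead of the formatter's private method:

```ruby
# Hypothetical node type; the real nodes come from the parsed graph.
SketchNode = Struct.new(:full_name)

# Replace every node in every cluster by its position in the node list.
def clusters_to_indices(nodes, clusters)
  clusters.map do |cluster|
    cluster.map { |node| nodes.index(node) }
  end
end

nodes = %w[User Post Comment Mailer].map { |name| SketchNode.new(name) }
clusters = [[nodes[0], nodes[1], nodes[2]], [nodes[3]]]
p clusters_to_indices(nodes, clusters) # => [[0, 1, 2], [3]]
```

Note that the diff indexes edges against `node_names` (which is passed through `uniq`) but indexes clusters against `graph.nodes`. The two numbering schemes appear to agree only as long as no two nodes share a `full_name`.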
2 changes: 1 addition & 1 deletion lib/society/object_graph.rb
@@ -11,4 +11,4 @@ def initialize(nodes: nodes=[], edges: edges=[])

end

-end
+end
4 changes: 2 additions & 2 deletions lib/society/parser.rb
@@ -2,8 +2,8 @@ module Society

class Parser

-def self.for_files(file_path)
-new(::Analyst.for_files(file_path))
+def self.for_files(*file_paths)
+new(::Analyst.for_files(*file_paths))
end

def self.for_source(source)
37 changes: 37 additions & 0 deletions spec/clusterer_spec.rb
@@ -0,0 +1,37 @@
require 'spec_helper'
require_relative 'fixtures/clustering/clusterer_fixtures.rb'

describe Society::Clusterer do

describe "#cluster" do
let(:clusterer) { Society::Clusterer.new }

it "detects clusters" do
clusters = clusterer.cluster(MCL::GRAPH_1).map(&:sort)
expected = MCL::CLUSTERS_1.map(&:sort)
expect(clusters).to match_array(expected)
end

context "with inflation parameter set to 1.7" do
let(:clusterer) { Society::Clusterer.new(inflation: 1.7) }

it "detects clusters at coarser granularity" do
clusters = clusterer.cluster(MCL::GRAPH_1).map(&:sort)
expected = MCL::CLUSTERS_1_I17.map(&:sort)
expect(clusters).to match_array(expected)
end
end

context "with inflation parameter set to 4.0" do
let(:clusterer) { Society::Clusterer.new(inflation: 4.0) }

it "detects clusters at finer granularity" do
clusters = clusterer.cluster(MCL::GRAPH_1).map(&:sort)
expected = MCL::CLUSTERS_1_I40.map(&:sort)
expect(clusters).to match_array(expected)
end
end
end

end
