 Computes the PageRank of URLs from an input file. The input file should
 be in the format:
 URL         neighbor URL
 URL         neighbor URL
 URL         neighbor URL
 ...
 where each URL and its neighbor are separated by one or more whitespace characters.

 This is an example implementation for learning how to use Spark. For more conventional use,
 please refer to org.apache.spark.graphx.lib.PageRank.


In [1]:

var filename =  "../data/pagerank_data.txt";
var iters =  10;


In [2]:
    var conf = new SparkConf().setAppName("JavaScript Page Rank");
    var sc = new SparkContext(conf);


 Loads the input file. It should be in the format:
     URL         neighbor URL
     URL         neighbor URL
     URL         neighbor URL
     ...


In [3]:
    var lines = sc.textFile(filename, 1);


 Parses each line into a (URL, neighbor URL) pair, drops duplicate pairs, and groups the neighbors by URL.


In [4]:
    var links = lines.mapToPair(function (s) {
        print(" s " + s); // debug: echo each input line
        var parts = s.split(/\s+/);
        return new Tuple(parts[0], parts[1]);
    }).distinct().groupByKey().cache();
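
 To see the shape of `links` without a cluster, the same parse, de-duplicate, and group steps can be sketched in plain JavaScript. This is only an illustration under stated assumptions: `buildLinks` and the sample lines are hypothetical and are not part of the Spark or EclairJS API.

```javascript
// Plain-JavaScript sketch (no Spark): build url -> [neighbors] from text lines.
// buildLinks and the sample input are hypothetical, for illustration only.
function buildLinks(lines) {
    var seen = new Set();   // plays the role of distinct()
    var links = new Map();  // plays the role of groupByKey()
    lines.forEach(function (s) {
        var parts = s.trim().split(/\s+/);
        if (parts.length < 2) {
            return; // skip blank or malformed lines
        }
        var key = parts[0] + " " + parts[1];
        if (seen.has(key)) {
            return; // duplicate (URL, neighbor) pair
        }
        seen.add(key);
        if (!links.has(parts[0])) {
            links.set(parts[0], []);
        }
        links.get(parts[0]).push(parts[1]);
    });
    return links;
}

// "a b" appears twice and the last line is blank; both are ignored.
var sampleLinks = buildLinks(["a b", "a   c", "a b", "b a", ""]);
```

 Unlike the cell above, this sketch also skips blank or malformed lines, which is an added assumption about the input rather than behavior of the original code.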


 Initializes the rank of every URL that links to at least one other URL to 1.0.


In [5]:
    var ranks = links.mapValues(function () {
        return 1.0;
    });
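
 One consequence worth noting is that only URLs appearing in the first column receive an initial rank. A minimal plain-JavaScript sketch, using a hypothetical adjacency map named `exampleLinks` in place of the `links` RDD:

```javascript
// Hypothetical adjacency map standing in for `links`.
var exampleLinks = new Map([["a", ["b", "c"]], ["b", ["a"]]]);

// Equivalent in spirit to links.mapValues(function () { return 1.0; }).
var exampleRanks = new Map();
exampleLinks.forEach(function (neighbors, url) {
    exampleRanks.set(url, 1.0);
});
// "c" has no outgoing links, so it is not a key and gets no initial rank.
```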


 Iteratively recalculates URL ranks using the PageRank algorithm.


In [6]:
    for (var current = 0; current < iters; current++) {
        // Calculates URL contributions to the rank of other URLs.
        var contribs = links.join(ranks).values()
            .flatMapToPair(function (tuple) {
                var t = tuple[0];
                var urlCount = t.length;
                var results = new List();
                for (var n = 0; n < urlCount; n++) {
                    results.add(new Tuple(t[n], tuple[1] / urlCount));
                }
                return results;
            });

        // Re-calculates URL ranks based on neighbor contributions.
        ranks = contribs.reduceByKey(function (a, b) {
            return a + b;
        }).mapValues(function (sum) {
            return 0.15 + sum * 0.85;
        });
    }
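
 The loop above can be traced by hand on a small graph. The following plain-JavaScript sketch mirrors the same steps (split each rank evenly among neighbors, sum the contributions per URL, then apply 0.15 + 0.85 * sum). The function `pageRank` and the two-node graph are hypothetical and exist only for illustration; they do not use Spark.

```javascript
// Plain-JavaScript sketch of the same update rule; not the Spark implementation.
// links: Map of url -> array of neighbor urls (hypothetical in-memory stand-in).
function pageRank(links, iters) {
    var ranks = new Map();
    links.forEach(function (neighbors, url) {
        ranks.set(url, 1.0);
    });

    for (var i = 0; i < iters; i++) {
        var sums = new Map();
        // Like links.join(ranks): only URLs present in both maps contribute.
        links.forEach(function (neighbors, url) {
            if (!ranks.has(url)) {
                return;
            }
            var share = ranks.get(url) / neighbors.length;
            neighbors.forEach(function (n) {
                sums.set(n, (sums.get(n) || 0) + share);
            });
        });
        // Like reduceByKey(...).mapValues(...): apply the damping formula.
        var next = new Map();
        sums.forEach(function (sum, url) {
            next.set(url, 0.15 + 0.85 * sum);
        });
        ranks = next;
    }
    return ranks;
}

// Two pages that link only to each other keep a rank of about 1.0.
var demoRanks = pageRank(new Map([["a", ["b"]], ["b", ["a"]]]), 10);
```

 As in the Spark version, a URL that receives no contributions in an iteration drops out of `ranks`, because the new ranks are built only from the summed contributions.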


 Collects all URL ranks and prints them to the console.


In [7]:
    var output = ranks.collect();
    var result = "";
    for (var i = 0; i < output.length; i++) {
        result += output[i][0] + " has rank: " + output[i][1] + ".\n";
    }
    print(result);

    sc.stop();
