Create gh-pages branch via GitHub
0 parents commit 92e0ad9d5d62d124890e37cb55af5dc52c50a0f4 @blatyo committed Jun 30, 2012
BIN images/arrow-down.png
BIN images/octocat-small.png
334 index.html
@@ -0,0 +1,334 @@
+<!doctype html>
+<html>
+ <head>
+ <meta charset="utf-8">
+ <meta http-equiv="X-UA-Compatible" content="chrome=1">
+ <title>Page rankr by blatyo</title>
+
+ <link rel="stylesheet" href="stylesheets/styles.css">
+ <link rel="stylesheet" href="stylesheets/pygment_trac.css">
+ <script src="javascripts/scale.fix.js"></script>
+ <meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no">
+ <!--[if lt IE 9]>
+ <script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
+ <![endif]-->
+ </head>
+ <body>
+ <div class="wrapper">
+ <header>
+ <h1 class="header">Page rankr</h1>
+ <p class="header">Easy way to retrieve Google Page Rank, Alexa Rank, index counts, and backlink counts</p>
+
+ <ul>
+ <li class="download"><a class="buttons" href="https://github.com/blatyo/page_rankr/zipball/master">Download ZIP</a></li>
+ <li class="download"><a class="buttons" href="https://github.com/blatyo/page_rankr/tarball/master">Download TAR</a></li>
+ <li><a class="buttons github" href="https://github.com/blatyo/page_rankr">View On GitHub</a></li>
+ </ul>
+
+ <p class="header">This project is maintained by <a class="header name" href="https://github.com/blatyo">blatyo</a></p>
+
+
+ </header>
+ <section>
+ <p><a href="http://travis-ci.org/blatyo/page_rankr"><img src="http://travis-ci.org/blatyo/page_rankr.png" alt="Build Status"></a>
+Provides an easy way to retrieve Google Page Rank, Alexa Rank, backlink counts, and index counts.</p>
+
+<p>Check out a little <a href="http://isitpopular.heroku.com">web app</a> I wrote up that uses it or look at the <a href="https://github.com/blatyo/is_it_popular">source</a>.</p>
+
+<h2>Get it!</h2>
+
+<div class="highlight">
+<pre>gem install PageRankr
+</pre>
+</div>
+
+
+<h2>Use it!</h2>
+
+<div class="highlight">
+<pre><span class="nb">require</span> <span class="s1">'page_rankr'</span>
+</pre>
+</div>
+
+
+<h3>Backlinks</h3>
+
+<p>Backlinks are the result of doing a search with a query like "link:<a href="http://www.google.com">www.google.com</a>". The number of returned results indicates how many sites point to that URL. If a site is not tracked, then <code>nil</code> is returned.</p>
+
+<div class="highlight">
+<pre><span class="no">PageRankr</span><span class="o">.</span><span class="n">backlinks</span><span class="p">(</span><span class="s1">'www.google.com'</span><span class="p">,</span> <span class="ss">:google</span><span class="p">,</span> <span class="ss">:bing</span><span class="p">)</span> <span class="c1">#=&gt; {:google=&gt;161000, :bing=&gt;208000000}</span>
+<span class="no">PageRankr</span><span class="o">.</span><span class="n">backlinks</span><span class="p">(</span><span class="s1">'www.google.com'</span><span class="p">,</span> <span class="ss">:yahoo</span><span class="p">)</span> <span class="c1">#=&gt; {:yahoo=&gt;256300062}</span>
+</pre>
+</div>
+
+
+<p>If you don't specify a search engine, then all of them are used.</p>
+
+<div class="highlight">
+<pre><span class="c1"># this</span>
+<span class="no">PageRankr</span><span class="o">.</span><span class="n">backlinks</span><span class="p">(</span><span class="s1">'www.google.com'</span><span class="p">)</span>
+ <span class="c1">#=&gt; {:google=&gt;23000, :bing=&gt;215000000, :yahoo=&gt;250522337, :alexa=&gt;727036}</span>
+
+<span class="c1"># is equivalent to</span>
+<span class="no">PageRankr</span><span class="o">.</span><span class="n">backlinks</span><span class="p">(</span><span class="s1">'www.google.com'</span><span class="p">,</span> <span class="ss">:google</span><span class="p">,</span> <span class="ss">:bing</span><span class="p">,</span> <span class="ss">:yahoo</span><span class="p">,</span> <span class="ss">:alexa</span><span class="p">)</span>
+ <span class="c1">#=&gt; {:google=&gt;23000, :bing=&gt;215000000, :yahoo=&gt;250522337, :alexa=&gt;727036}</span>
+</pre>
+</div>
+
+
+<p>You can also use the alias <code>backlink</code> instead of <code>backlinks</code>.</p>
+
+<p>Valid search engines are: <code>:google, :bing, :yahoo, :alexa</code> (altavista and alltheweb now redirect to yahoo). To get this list you can do:</p>
+
+<div class="highlight">
+<pre><span class="no">PageRankr</span><span class="o">.</span><span class="n">backlink_trackers</span> <span class="c1">#=&gt; [:alexa, :bing, :google, :yahoo]</span>
+</pre>
+</div>
+
+
+<h3>Indexes</h3>
+
+<p>Indexes are the result of doing a search with a query like "site:<a href="http://www.google.com">www.google.com</a>". The number of returned results indicates how many pages of a domain are indexed by a particular search engine. If the site is not indexed, <code>nil</code> is returned.</p>
+
+<div class="highlight">
+<pre><span class="no">PageRankr</span><span class="o">.</span><span class="n">indexes</span><span class="p">(</span><span class="s1">'www.google.com'</span><span class="p">,</span> <span class="ss">:google</span><span class="p">)</span> <span class="c1">#=&gt; {:google=&gt;4860000}</span>
+<span class="no">PageRankr</span><span class="o">.</span><span class="n">indexes</span><span class="p">(</span><span class="s1">'www.google.com'</span><span class="p">,</span> <span class="ss">:bing</span><span class="p">)</span> <span class="c1">#=&gt; {:bing=&gt;2120000}</span>
+</pre>
+</div>
+
+
+<p>If you don't specify a search engine, then all of them are used.</p>
+
+<div class="highlight">
+<pre><span class="c1"># this</span>
+<span class="no">PageRankr</span><span class="o">.</span><span class="n">indexes</span><span class="p">(</span><span class="s1">'www.google.com'</span><span class="p">)</span>
+ <span class="c1">#=&gt; {:bing=&gt;2120000, :google=&gt;4860000, :yahoo =&gt; 4863000}</span>
+
+<span class="c1"># is equivalent to</span>
+<span class="no">PageRankr</span><span class="o">.</span><span class="n">indexes</span><span class="p">(</span><span class="s1">'www.google.com'</span><span class="p">,</span> <span class="ss">:google</span><span class="p">,</span> <span class="ss">:bing</span><span class="p">,</span> <span class="ss">:yahoo</span><span class="p">)</span>
+ <span class="c1">#=&gt; {:bing=&gt;2120000, :google=&gt;4860000, :yahoo =&gt; 4863000}</span>
+</pre>
+</div>
+
+
+<p>You can also use the alias <code>index</code> instead of <code>indexes</code>.</p>
+
+<p>Valid search engines are: <code>:google, :bing, :yahoo</code>. To get this list you can do:</p>
+
+<div class="highlight">
+<pre><span class="no">PageRankr</span><span class="o">.</span><span class="n">index_trackers</span> <span class="c1">#=&gt; [:bing, :google, :yahoo]</span>
+</pre>
+</div>
+
+
+<h3>Ranks</h3>
+
+<p>Ranks are ratings that indicate how popular a site is. The most famous example of this is the Google page rank.</p>
+
+<div class="highlight">
+<pre><span class="no">PageRankr</span><span class="o">.</span><span class="n">ranks</span><span class="p">(</span><span class="s1">'www.google.com'</span><span class="p">,</span> <span class="ss">:google</span><span class="p">)</span> <span class="c1">#=&gt; {:google=&gt;10}</span>
+</pre>
+</div>
+
+
+<p>If you don't specify a rank provider, then all of them are used.</p>
+
+<div class="highlight">
+<pre><span class="no">PageRankr</span><span class="o">.</span><span class="n">ranks</span><span class="p">(</span><span class="s1">'www.google.com'</span><span class="p">,</span> <span class="ss">:alexa_us</span><span class="p">,</span> <span class="ss">:alexa_global</span><span class="p">,</span> <span class="ss">:google</span><span class="p">)</span>
+ <span class="c1">#=&gt; {:alexa_us=&gt;1, :alexa_global=&gt;1, :google=&gt;10}</span>
+
+<span class="c1"># this also gives the same result</span>
+<span class="no">PageRankr</span><span class="o">.</span><span class="n">ranks</span><span class="p">(</span><span class="s1">'www.google.com'</span><span class="p">)</span>
+ <span class="c1">#=&gt; {:alexa_us=&gt;1, :alexa_global=&gt;1, :google=&gt;10}</span>
+</pre>
+</div>
+
+
+<p>You can also use the alias <code>rank</code> instead of <code>ranks</code>.</p>
+
+<p>Valid rank trackers are: <code>:alexa_us, :alexa_global, :google</code>. To get this list you can do:</p>
+
+<div class="highlight">
+<pre><span class="no">PageRankr</span><span class="o">.</span><span class="n">rank_trackers</span> <span class="c1">#=&gt; [:alexa_global, :alexa_us, :google]</span>
+</pre>
+</div>
+
+
+<p>Alexa ranks are ordinal: 1 is the most popular site, and larger numbers are less popular. Google page ranks are in the range 0-10, where 10 is the most popular. If a site is unindexed, then the rank will be <code>nil</code>.</p>
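+<p>Because a missing rank comes back as <code>nil</code>, it can help to drop those entries before using the values. A minimal sketch, using a hand-written hash shaped like the results above rather than a live lookup; the values are made up, and it assumes a tracker with no data keeps its key with a <code>nil</code> value:</p>

```ruby
# Hypothetical result, shaped like the hashes returned by PageRankr.ranks.
# Assumption: a tracker with no data keeps its key with a nil value.
ranks = { :alexa_us => 1, :alexa_global => nil, :google => 10 }

# Drop trackers that returned nil before using the values.
known = ranks.reject { |_tracker, value| value.nil? }

known.each do |tracker, value|
  puts "#{tracker}: #{value}"
end
```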
+
+<h2>Use it a la carte!</h2>
+
+<p>In versions &gt;= 3, everything should be usable in a much more a la carte manner. If all you care about is Google page rank (which I speculate is common), you can get that all by itself:</p>
+
+<div class="highlight">
+<pre><span class="nb">require</span> <span class="s1">'page_rankr/ranks/google'</span>
+
+<span class="n">tracker</span> <span class="o">=</span> <span class="no">PageRankr</span><span class="o">::</span><span class="no">Ranks</span><span class="o">::</span><span class="no">Google</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="s2">"myawesomesite.com"</span><span class="p">)</span>
+<span class="n">tracker</span><span class="o">.</span><span class="n">run</span> <span class="c1">#=&gt; 2</span>
+</pre>
+</div>
+
+
+<p>Also, once a tracker has run, three values will be accessible from it:</p>
+
+<div class="highlight">
+<pre><span class="c1"># The value extracted. Tracked is aliased to rank for PageRankr::Ranks, backlink for PageRankr::Backlinks, and index for PageRankr::Indexes.</span>
+<span class="n">tracker</span><span class="o">.</span><span class="n">tracked</span> <span class="c1">#=&gt; 2</span>
+
+<span class="c1"># The value extracted with the jsonpath, xpath, or regex before being cleaned.</span>
+<span class="n">tracker</span><span class="o">.</span><span class="n">raw</span> <span class="c1">#=&gt; "2"</span>
+
+<span class="c1"># The body of the response</span>
+<span class="n">tracker</span><span class="o">.</span><span class="n">body</span> <span class="c1">#=&gt; "&lt;html&gt;&lt;head&gt;..."</span>
+</pre>
+</div>
+
+
+<h2>Rate limiting and proxies</h2>
+
+<p>One of the annoying things about each of these services is that they really don't like you scraping data from them. In order to deal with this issue, they throttle traffic from a single machine. The simplest way to get around this is to use proxy machines to make the requests. </p>
+
+<p>In PageRankr &gt;= 3.2.0, this is much simpler. The first thing you'll need is a proxy service. Two are provided <a href="https://github.com/blatyo/page_rankr/tree/master/lib/page_rankr/proxy_services">here</a>. A proxy service must define a <code>proxy</code> method that takes two arguments. It should return a string like <code>user:password@192.168.1.1:50501</code>.</p>
+
+<p>Once you have a proxy service, you can tell PageRankr to use it. For example:</p>
+
+<div class="highlight">
+<pre><span class="no">PageRankr</span><span class="o">.</span><span class="n">proxy_service</span> <span class="o">=</span> <span class="no">PageRankr</span><span class="o">::</span><span class="no">ProxyServices</span><span class="o">::</span><span class="no">Random</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="o">[</span>
+ <span class="s1">'user:password@192.168.1.1:50501'</span><span class="p">,</span>
+ <span class="s1">'user:password@192.168.1.2:50501'</span>
+<span class="o">]</span><span class="p">)</span>
+</pre>
+</div>
+
+
+<p>Once PageRankr knows about your proxy service, any request that is made will ask for a proxy from the proxy service. It does this by calling the <code>proxy</code> method. When it calls the <code>proxy</code> method, it passes the name of the tracker (e.g. <code>:ranks_google</code>) and the site that is being looked up. Hopefully, this information is sufficient for you to build a much smarter proxy service than the ones provided (pull requests welcome!).</p>
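+<p>As a rough illustration of that contract, here is a sketch of a custom proxy service that hands out proxies in round-robin order. The class name <code>RoundRobinProxyService</code> and its internals are hypothetical; the only parts taken from the description above are the two-argument <code>proxy</code> method and the format of the returned string:</p>

```ruby
# Hypothetical proxy service: hands out proxies in round-robin order.
# The only contract assumed from the text above is a `proxy` method that
# accepts the tracker name and the site and returns a proxy string.
# Note: this simple counter is not thread-safe.
class RoundRobinProxyService
  def initialize(proxies)
    @proxies = proxies
    @index = 0
  end

  # tracker_name: e.g. :ranks_google; site: the site being looked up.
  # Both are ignored here, but a smarter service could use them.
  def proxy(tracker_name, site)
    chosen = @proxies[@index % @proxies.length]
    @index += 1
    chosen
  end
end

service = RoundRobinProxyService.new([
  'user:password@192.168.1.1:50501',
  'user:password@192.168.1.2:50501'
])

# It would then be registered the same way as the provided services:
# PageRankr.proxy_service = service
```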
+
+<h2>Fix it!</h2>
+
+<p>If you ever find something is broken, it should now be much easier to fix it with version &gt;= 1.3.0. For example, if the xpath used to look up a backlink is broken, just override the method for that class to provide the correct xpath.</p>
+
+<div class="highlight">
+<pre><span class="k">module</span> <span class="nn">PageRankr</span>
+ <span class="k">class</span> <span class="nc">Backlinks</span>
+ <span class="k">class</span> <span class="nc">Bing</span>
+ <span class="k">def</span> <span class="nf">xpath</span>
+ <span class="s2">"//my/new/awesome/@xpath"</span>
+ <span class="k">end</span>
+ <span class="k">end</span>
+ <span class="k">end</span>
+<span class="k">end</span>
+</pre>
+</div>
+
+
+<h2>Extend it!</h2>
+
+<p>If you ever come across a site that provides a rank or backlinks, you can hook that class up to automatically be used with PageRankr. PageRankr does this by looking up all the classes namespaced under Backlinks, Indexes, and Ranks.</p>
+
+<div class="highlight">
+<pre><span class="nb">require</span> <span class="s1">'page_rankr/backlink'</span>
+
+<span class="k">module</span> <span class="nn">PageRankr</span>
+ <span class="k">class</span> <span class="nc">Backlinks</span>
+ <span class="k">class</span> <span class="nc">Foo</span>
+ <span class="kp">include</span> <span class="no">Backlink</span>
+
+ <span class="c1"># This method is required</span>
+ <span class="k">def</span> <span class="nf">url</span>
+ <span class="s2">"http://example.com/"</span>
+ <span class="k">end</span>
+
+ <span class="c1"># This method specifies the parameters for the url. It is optional, but likely required for the class to be useful.</span>
+ <span class="k">def</span> <span class="nf">params</span>
+ <span class="p">{</span><span class="ss">:q</span> <span class="o">=&gt;</span> <span class="n">tracked_url</span><span class="p">}</span>
+ <span class="k">end</span>
+
+ <span class="c1"># You can use a method named either xpath, jsonpath, or regex with the appropriate query type</span>
+ <span class="k">def</span> <span class="nf">xpath</span>
+ <span class="s2">"//backlinks/text()"</span>
+ <span class="k">end</span>
+
+ <span class="c1"># Optionally, you could override the clean method if the current implementation isn't sufficient</span>
+ <span class="c1"># def clean(backlink_count)</span>
+ <span class="c1"># #do some of my own cleaning</span>
+ <span class="c1"># super(backlink_count) # strips non-digits and converts it to an integer or nil</span>
+ <span class="c1"># end</span>
+ <span class="k">end</span>
+ <span class="k">end</span>
+<span class="k">end</span>
+
+<span class="no">PageRankr</span><span class="o">::</span><span class="no">Backlinks</span><span class="o">::</span><span class="no">Foo</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="s2">"myawesomesite.com"</span><span class="p">)</span><span class="o">.</span><span class="n">run</span> <span class="c1">#=&gt; 3</span>
+<span class="no">PageRankr</span><span class="o">.</span><span class="n">backlinks</span><span class="p">(</span><span class="s2">"myawesomesite.com"</span><span class="p">,</span> <span class="ss">:foo</span><span class="p">)</span><span class="o">[</span><span class="ss">:foo</span><span class="o">]</span> <span class="c1">#=&gt; 3</span>
+</pre>
+</div>
+
+
+<p>Then, just make sure you require both the class and PageRankr, and whenever you call <code>PageRankr.backlinks</code> it'll be able to use your class.</p>
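+<p>The comment in the example above describes the default cleaning step as stripping non-digits and converting the result to an integer or <code>nil</code>. A standalone sketch of that behavior follows; <code>clean_count</code> is a hypothetical helper written for illustration, not the gem's actual implementation:</p>

```ruby
# Hypothetical stand-in for the cleaning step described above:
# strip non-digits, then convert to an integer, or nil if no digits remain.
# This is not the gem's actual implementation.
def clean_count(raw)
  digits = raw.to_s.gsub(/\D/, '')
  digits.empty? ? nil : digits.to_i
end

clean_count("About 1,230,000 results") #=> 1230000
clean_count("No results")              #=> nil
```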
+
+<h2>Note on Patches/Pull Requests</h2>
+
+<ul>
+<li>Fork the project.</li>
+<li>Make your feature addition or bug fix.</li>
+<li>Add tests for it. This is important so I don't break it in a
+future version unintentionally.</li>
+<li>Commit, but do not change the Rakefile, version, or history.
+(If you want to have your own version, that is fine, but bump the version in a commit by itself that I can ignore when I pull.)</li>
+<li>Send me a pull request. Bonus points for topic branches.</li>
+</ul><h2>TODO Version 4</h2>
+
+<ul>
+<li>Detect request throttling</li>
+</ul><h2>Contributors</h2>
+
+<ul>
+<li>
+<a href="https://github.com/Druwerd">Dru Ibarra</a> - Use Google Search API instead of scraping.</li>
+<li>
+<a href="https://github.com/iterationlabs">Iteration Labs, LLC</a> - Compete rank tracker and domain indexes.</li>
+<li>
+<a href="http://www.marc-seeger.de">Marc Seeger</a> (<a href="http://www.acquia.com">Acquia</a>) - Ignore invalid ranks that Alexa returns for incorrect sites.</li>
+<li>
+<a href="https://github.com/rymai">Rémy Coutable</a> - Update public_suffix_service gem.</li>
+<li>
+<a href="https://github.com/titanous">Jonathan Rudenberg</a> - Fix compete scraper.</li>
+<li>
+<a href="https://github.com/d11wtq">Chris Corbyn</a> - Fix google page rank url.</li>
+<li>
+<a href="https://github.com/i0rek">Hans Haselberg</a> - Update typhoeus gem.</li>
+<li>
+<a href="https://github.com/priithaamer">Priit Haamer</a> - Fix google backlinks lookup.</li>
+<li>
+<a href="https://github.com/martyMM">Marty McKenna</a> - Idea for proxy service</li>
+</ul><h2>Shout Out</h2>
+
+<p>Gotta give credit where credit's due!</p>
+
+<p>Original inspiration from:</p>
+
+<ul>
+<li><a href="https://github.com/alexmipego/PageRankSharp">PageRankSharp</a></li>
+<li><a href="http://snipplr.com/view/18329/google-page-range-lookup/">Google Page Range Lookup/</a></li>
+<li><a href="http://www.sitetoolcenter.com/free-website-scripts/ajax-pr-checker.php">AJAX PR Checker</a></li>
+</ul><h2>Copyright</h2>
+
+<p>Copyright (c) 2010 Allen Madsen. See LICENSE for details.</p>
+ </section>
+ <footer>
+ <p><small>Hosted on <a href="https://pages.github.com">GitHub Pages</a> using the Dinky theme</small></p>
+ </footer>
+ </div>
+ <!--[if !IE]><script>fixScale(document);</script><!--<![endif]-->
+ <script type="text/javascript">
+ var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
+ document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
+ </script>
+ <script type="text/javascript">
+ try {
+ var pageTracker = _gat._getTracker("UA-33059328-1");
+ pageTracker._trackPageview();
+ } catch(err) {}
+ </script>
+
+ </body>
+</html>
20 javascripts/scale.fix.js
@@ -0,0 +1,20 @@
+fixScale = function(doc) {
+
+ var addEvent = 'addEventListener',
+ type = 'gesturestart',
+ qsa = 'querySelectorAll',
+ scales = [1, 1],
+ meta = qsa in doc ? doc[qsa]('meta[name=viewport]') : [];
+
+ function fix() {
+ meta.content = 'width=device-width,minimum-scale=' + scales[0] + ',maximum-scale=' + scales[1];
+ doc.removeEventListener(type, fix, true);
+ }
+
+ if ((meta = meta[meta.length - 1]) && addEvent in doc) {
+ fix();
+ scales = [.25, 1.6];
+ doc[addEvent](type, fix, true);
+ }
+
+};
1 params.json
@@ -0,0 +1 @@
+{"tagline":"Easy way to retrieve Google Page Rank, Alexa Rank, index counts, and backlink counts","google":"UA-33059328-1","note":"Don't delete this file! It's used internally to help with page regeneration.","name":"Page rankr","body":"[![Build Status](http://travis-ci.org/blatyo/page_rankr.png)](http://travis-ci.org/blatyo/page_rankr)\r\nProvides an easy way to retrieve Google Page Rank, Alexa Rank, backlink counts, and index counts.\r\n\r\nCheck out a little [web app][1] I wrote up that uses it or look at the [source][2].\r\n\r\n[1]: http://isitpopular.heroku.com\r\n[2]: https://github.com/blatyo/is_it_popular\r\n\r\n## Get it!\r\n\r\n``` bash\r\ngem install PageRankr\r\n```\r\n\r\n## Use it!\r\n\r\n``` ruby\r\nrequire 'page_rankr'\r\n```\r\n\r\n### Backlinks\r\n\r\nBacklinks are the result of doing a search with a query like \"link:www.google.com\". The number of returned results indicates how many sites point to that url. If a site is not tracked then `nil` is returned.\r\n\r\n``` ruby\r\nPageRankr.backlinks('www.google.com', :google, :bing) #=> {:google=>161000, :bing=>208000000}\r\nPageRankr.backlinks('www.google.com', :yahoo) #=> {:yahoo=>256300062}\r\n```\r\n\r\nIf you don't specify a search engine, then all of them are used.\r\n\r\n``` ruby\r\n# this\r\nPageRankr.backlinks('www.google.com')\r\n #=> {:google=>23000, :bing=>215000000, :yahoo=>250522337, :alexa=>727036}\r\n\r\n# is equivalent to\r\nPageRankr.backlinks('www.google.com', :google, :bing, :yahoo, :alexa)\r\n #=> {:google=>23000, :bing=>215000000, :yahoo=>250522337, :alexa=>727036}\r\n```\r\n\r\nYou can also use the alias `backlink` instead of `backlinks`.\r\n\r\nValid search engines are: `:google, :bing, :yahoo, :alexa` (altavista and alltheweb now redirect to yahoo). To get this list you can do:\r\n\r\n``` ruby\r\nPageRankr.backlink_trackers #=> [:alexa, :bing, :google, :yahoo]\r\n```\r\n\r\n### Indexes\r\n\r\nIndexes are the result of doing a search with a query like \"site:www.google.com\". 
The number of returned results indicates how many pages of a domain are indexed by a particular search engine. If the site is not indexed `nil` is returned.\r\n\r\n``` ruby\r\nPageRankr.indexes('www.google.com', :google) #=> {:google=>4860000}\r\nPageRankr.indexes('www.google.com', :bing) #=> {:bing=>2120000}\r\n```\r\n\r\nIf you don't specify a search engine, then all of them are used.\r\n\r\n``` ruby\r\n# this\r\nPageRankr.indexes('www.google.com')\r\n #=> {:bing=>2120000, :google=>4860000, :yahoo => 4863000}\r\n\r\n# is equivalent to\r\nPageRankr.indexes('www.google.com', :google, :bing, :yahoo)\r\n #=> {:bing=>2120000, :google=>4860000, :yahoo => 4863000}\r\n```\r\n\r\nYou can also use the alias `index` instead of `indexes`.\r\n\r\nValid search engines are: `:google, :bing, :yahoo`. To get this list you can do:\r\n\r\n``` ruby\r\nPageRankr.index_trackers #=> [:bing, :google, :yahoo]\r\n```\r\n\r\n### Ranks\r\n\r\nRanks are ratings assigned to specify how popular a site is. The most famous example of this is the google page rank.\r\n\r\n``` ruby\r\nPageRankr.ranks('www.google.com', :google) #=> {:google=>10}\r\n```\r\n\r\nIf you don't specify a rank provider, then all of them are used.\r\n\r\n``` ruby\r\nPageRankr.ranks('www.google.com', :alexa_us, :alexa_global, :google)\r\n #=> {:alexa_us=>1, :alexa_global=>1, :google=>10}\r\n\r\n# this also gives the same result\r\nPageRankr.ranks('www.google.com')\r\n #=> {:alexa_us=>1, :alexa_global=>1, :google=>10}\r\n```\r\n\r\nYou can also use the alias `rank` instead of `ranks`.\r\n\r\nValid rank trackers are: `:alexa_us, :alexa_global, :google`. To get this you can do:\r\n\r\n``` ruby\r\nPageRankr.rank_trackers #=> [:alexa_global, :alexa_us, :google]\r\n```\r\n\r\nAlexa ranks are descending where 1 is the most popular. Google page ranks are in the range 0-10 where 10 is the most popular. 
If a site is unindexed then the rank will be nil.\r\n\r\n## Use it a la carte!\r\n\r\nFrom versions >= 3, everything should be usable in a much more a la carte manner. If all you care about is google page rank (which I speculate is common) you can get that all by itself:\r\n\r\n``` ruby\r\nrequire 'page_rankr/ranks/google'\r\n\r\ntracker = PageRankr::Ranks::Google.new(\"myawesomesite.com\")\r\ntracker.run #=> 2\r\n```\r\n\r\nAlso, once a tracker has run three values will be accessible from it:\r\n\r\n``` ruby\r\n# The value extracted. Tracked is aliased to rank for PageRankr::Ranks, backlink for PageRankr::Backlinks, and index for PageRankr::Indexes.\r\ntracker.tracked #=> 2\r\n\r\n# The value extracted with the jsonpath, xpath, or regex before being cleaned.\r\ntracker.raw #=> \"2\"\r\n\r\n# The body of the response\r\ntracker.body #=> \"<html><head>...\"\r\n```\r\n\r\n## Rate limiting and proxies\r\n\r\nOne of the annoying things about each of these services is that they really don't like you scraping data from them. In order to deal with this issue, they throttle traffic from a single machine. The simplest way to get around this is to use proxy machines to make the requests. \r\n\r\nIn PageRankr >= 3.2.0, this is much simpler. The first thing you'll need is a proxy service. Two are provided [here](https://github.com/blatyo/page_rankr/tree/master/lib/page_rankr/proxy_services). A proxy service must define a `proxy` method that takes two arguments. It should return a string like `user:password@192.168.1.1:50501`.\r\n\r\nOnce you have a proxy service, you can tell PageRankr to use it. For example:\r\n\r\n``` ruby\r\nPageRankr.proxy_service = PageRankr::ProxyServices::Random.new([\r\n 'user:password@192.168.1.1:50501',\r\n 'user:password@192.168.1.2:50501'\r\n])\r\n```\r\n\r\nOnce PageRankr knows about your proxy service, any request that is made will ask for a proxy from the proxy service. It does this by calling the `proxy` method. 
When it calls the `proxy` method, it passed the name of the tracker (e.g. `:ranks_google`) and the site that is being looked up. Hopefully, this information is sufficient for you to build a much smarter proxy service than the ones provided (pull requests welcome!).\r\n\r\n## Fix it!\r\n\r\nIf you ever find something is broken it should now be much easier to fix it with version >= 1.3.0. For example, if the xpath used to lookup a backlink is broken, just override the method for that class to provide the correct xpath.\r\n\r\n``` ruby\r\nmodule PageRankr\r\n class Backlinks\r\n class Bing\r\n def xpath\r\n \"//my/new/awesome/@xpath\"\r\n end\r\n end\r\n end\r\nend\r\n```\r\n\r\n## Extend it!\r\n\r\nIf you ever come across a site that provides a rank or backlinks you can hook that class up to automatically be use with PageRankr. PageRankr does this by looking up all the classes namespaced under Backlinks, Indexes, and Ranks.\r\n\r\n``` ruby\r\nrequire 'page_rankr/backlink'\r\n\r\nmodule PageRankr\r\n class Backlinks\r\n class Foo\r\n include Backlink\r\n\r\n # This method is required\r\n def url\r\n \"http://example.com/\"\r\n end\r\n\r\n # This method specifies the parameters for the url. 
It is optional, but likely required for the class to be useful.\r\n def params\r\n {:q => tracked_url}\r\n end\r\n\r\n # You can use a method named either xpath, jsonpath, or regex with the appropriate query type\r\n def xpath\r\n \"//backlinks/text()\"\r\n end\r\n\r\n # Optionally, you could override the clean method if the current implementation isn't sufficient\r\n # def clean(backlink_count)\r\n # #do some of my own cleaning\r\n # super(backlink_count) # strips non-digits and converts it to an integer or nil\r\n # end\r\n end\r\n end\r\nend\r\n\r\nPageRankr::Backlinks::Foo.new(\"myawesomesite.com\").run #=> 3\r\nPageRankr.backlinks(\"myawesomesite.com\", :foo)[:foo] #=> 3\r\n```\r\n\r\nThen, just make sure you require the class and PageRankr and whenever you call PageRankr.backlinks it'll be able to use your class.\r\n\r\n## Note on Patches/Pull Requests\r\n\r\n* Fork the project.\r\n* Make your feature addition or bug fix.\r\n* Add tests for it. This is important so I don't break it in a\r\n future version unintentionally.\r\n* Commit, do not mess with rakefile, version, or history.\r\n (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)\r\n* Send me a pull request. 
Bonus points for topic branches.\r\n\r\n## TODO Version 4\r\n* Detect request throttling\r\n\r\n## Contributors\r\n* [Dru Ibarra](https://github.com/Druwerd) - Use Google Search API instead of scraping.\r\n* [Iteration Labs, LLC](https://github.com/iterationlabs) - Compete rank tracker and domain indexes.\r\n* [Marc Seeger](http://www.marc-seeger.de) ([Acquia](http://www.acquia.com)) - Ignore invalid ranks that Alexa returns for incorrect sites.\r\n* [Rémy Coutable](https://github.com/rymai) - Update public_suffix_service gem.\r\n* [Jonathan Rudenberg](https://github.com/titanous) - Fix compete scraper.\r\n* [Chris Corbyn](https://github.com/d11wtq) - Fix google page rank url.\r\n* [Hans Haselberg](https://github.com/i0rek) - Update typhoeus gem.\r\n* [Priit Haamer](https://github.com/priithaamer) - Fix google backlinks lookup.\r\n* [Marty McKenna](https://github.com/martyMM) - Idea for proxy service\r\n\r\n## Shout Out\r\nGotta give credit where credits due!\r\n\r\nOriginal inspiration from:\r\n\r\n* [PageRankSharp](https://github.com/alexmipego/PageRankSharp)\r\n* [Google Page Range Lookup/](http://snipplr.com/view/18329/google-page-range-lookup/)\r\n* [AJAX PR Checker](http://www.sitetoolcenter.com/free-website-scripts/ajax-pr-checker.php)\r\n\r\n## Copyright\r\n\r\nCopyright (c) 2010 Allen Madsen. See LICENSE for details."}
69 stylesheets/pygment_trac.css
@@ -0,0 +1,69 @@
+.highlight { background: #ffffff; }
+.highlight .c { color: #999988; font-style: italic } /* Comment */
+.highlight .err { color: #a61717; background-color: #e3d2d2 } /* Error */
+.highlight .k { font-weight: bold } /* Keyword */
+.highlight .o { font-weight: bold } /* Operator */
+.highlight .cm { color: #999988; font-style: italic } /* Comment.Multiline */
+.highlight .cp { color: #999999; font-weight: bold } /* Comment.Preproc */
+.highlight .c1 { color: #999988; font-style: italic } /* Comment.Single */
+.highlight .cs { color: #999999; font-weight: bold; font-style: italic } /* Comment.Special */
+.highlight .gd { color: #000000; background-color: #ffdddd } /* Generic.Deleted */
+.highlight .gd .x { color: #000000; background-color: #ffaaaa } /* Generic.Deleted.Specific */
+.highlight .ge { font-style: italic } /* Generic.Emph */
+.highlight .gr { color: #aa0000 } /* Generic.Error */
+.highlight .gh { color: #999999 } /* Generic.Heading */
+.highlight .gi { color: #000000; background-color: #ddffdd } /* Generic.Inserted */
+.highlight .gi .x { color: #000000; background-color: #aaffaa } /* Generic.Inserted.Specific */
+.highlight .go { color: #888888 } /* Generic.Output */
+.highlight .gp { color: #555555 } /* Generic.Prompt */
+.highlight .gs { font-weight: bold } /* Generic.Strong */
+.highlight .gu { color: #800080; font-weight: bold; } /* Generic.Subheading */
+.highlight .gt { color: #aa0000 } /* Generic.Traceback */
+.highlight .kc { font-weight: bold } /* Keyword.Constant */
+.highlight .kd { font-weight: bold } /* Keyword.Declaration */
+.highlight .kn { font-weight: bold } /* Keyword.Namespace */
+.highlight .kp { font-weight: bold } /* Keyword.Pseudo */
+.highlight .kr { font-weight: bold } /* Keyword.Reserved */
+.highlight .kt { color: #445588; font-weight: bold } /* Keyword.Type */
+.highlight .m { color: #009999 } /* Literal.Number */
+.highlight .s { color: #d14 } /* Literal.String */
+.highlight .na { color: #008080 } /* Name.Attribute */
+.highlight .nb { color: #0086B3 } /* Name.Builtin */
+.highlight .nc { color: #445588; font-weight: bold } /* Name.Class */
+.highlight .no { color: #008080 } /* Name.Constant */
+.highlight .ni { color: #800080 } /* Name.Entity */
+.highlight .ne { color: #990000; font-weight: bold } /* Name.Exception */
+.highlight .nf { color: #990000; font-weight: bold } /* Name.Function */
+.highlight .nn { color: #555555 } /* Name.Namespace */
+.highlight .nt { color: #000080 } /* Name.Tag */
+.highlight .nv { color: #008080 } /* Name.Variable */
+.highlight .ow { font-weight: bold } /* Operator.Word */
+.highlight .w { color: #bbbbbb } /* Text.Whitespace */
+.highlight .mf { color: #009999 } /* Literal.Number.Float */
+.highlight .mh { color: #009999 } /* Literal.Number.Hex */
+.highlight .mi { color: #009999 } /* Literal.Number.Integer */
+.highlight .mo { color: #009999 } /* Literal.Number.Oct */
+.highlight .sb { color: #d14 } /* Literal.String.Backtick */
+.highlight .sc { color: #d14 } /* Literal.String.Char */
+.highlight .sd { color: #d14 } /* Literal.String.Doc */
+.highlight .s2 { color: #d14 } /* Literal.String.Double */
+.highlight .se { color: #d14 } /* Literal.String.Escape */
+.highlight .sh { color: #d14 } /* Literal.String.Heredoc */
+.highlight .si { color: #d14 } /* Literal.String.Interpol */
+.highlight .sx { color: #d14 } /* Literal.String.Other */
+.highlight .sr { color: #009926 } /* Literal.String.Regex */
+.highlight .s1 { color: #d14 } /* Literal.String.Single */
+.highlight .ss { color: #990073 } /* Literal.String.Symbol */
+.highlight .bp { color: #999999 } /* Name.Builtin.Pseudo */
+.highlight .vc { color: #008080 } /* Name.Variable.Class */
+.highlight .vg { color: #008080 } /* Name.Variable.Global */
+.highlight .vi { color: #008080 } /* Name.Variable.Instance */
+.highlight .il { color: #009999 } /* Literal.Number.Integer.Long */
+
+.type-csharp .highlight .k { color: #0000FF }
+.type-csharp .highlight .kt { color: #0000FF }
+.type-csharp .highlight .nf { color: #000000; font-weight: normal }
+.type-csharp .highlight .nc { color: #2B91AF }
+.type-csharp .highlight .nn { color: #000000 }
+.type-csharp .highlight .s { color: #A31515 }
+.type-csharp .highlight .sc { color: #A31515 }
413 stylesheets/styles.css
@@ -0,0 +1,413 @@
+@import url(https://fonts.googleapis.com/css?family=Arvo:400,700,400italic);
+
+/* MeyerWeb Reset */
+
+html, body, div, span, applet, object, iframe,
+h1, h2, h3, h4, h5, h6, p, blockquote, pre,
+a, abbr, acronym, address, big, cite, code,
+del, dfn, em, img, ins, kbd, q, s, samp,
+small, strike, strong, sub, sup, tt, var,
+b, u, i, center,
+dl, dt, dd, ol, ul, li,
+fieldset, form, label, legend,
+table, caption, tbody, tfoot, thead, tr, th, td,
+article, aside, canvas, details, embed,
+figure, figcaption, footer, header, hgroup,
+menu, nav, output, ruby, section, summary,
+time, mark, audio, video {
+ margin: 0;
+ padding: 0;
+ border: 0;
+ font: inherit;
+ vertical-align: baseline;
+}
+
+
+/* Base text styles */
+
+body {
+ padding:10px 50px 0 0;
+ font-family:"Helvetica Neue", Helvetica, Arial, sans-serif;
+ font-size: 14px;
+ color: #232323;
+ background-color: #FBFAF7;
+ margin: 0;
+ line-height: 1.8em;
+ -webkit-font-smoothing: antialiased;
+
+}
+
+h1, h2, h3, h4, h5, h6 {
+ color:#232323;
+ margin:36px 0 10px;
+}
+
+p, ul, ol, table, dl {
+ margin:0 0 22px;
+}
+
+h1, h2, h3 {
+ font-family: Arvo, Monaco, serif;
+ line-height:1.3;
+ font-weight: normal;
+}
+
+h1, h2, h3 {
+ display: block;
+ border-bottom: 1px solid #ccc;
+ padding-bottom: 5px;
+}
+
+h1 {
+ font-size: 30px;
+}
+
+h2 {
+ font-size: 24px;
+}
+
+h3 {
+ font-size: 18px;
+}
+
+h4, h5, h6 {
+ font-family: Arvo, Monaco, serif;
+ font-weight: 700;
+}
+
+a {
+ color:#C30000;
+ font-weight:200;
+ text-decoration:none;
+}
+
+a:hover {
+ text-decoration: underline;
+}
+
+a small {
+ font-size: 12px;
+}
+
+em {
+ font-style: italic;
+}
+
+strong {
+ font-weight:700;
+}
+
+ul li {
+ list-style: inside;
+ padding-left: 25px;
+}
+
+ol li {
+ list-style: decimal inside;
+ padding-left: 20px;
+}
+
+blockquote {
+ margin: 0;
+ padding: 0 0 0 20px;
+ font-style: italic;
+}
+
+dl, dt, dd, dl p {
+ color: #444;
+}
+
+dl dt {
+ font-weight: bold;
+}
+
+dl dd {
+ padding-left: 20px;
+ font-style: italic;
+}
+
+dl p {
+ padding-left: 20px;
+ font-style: italic;
+}
+
+hr {
+ border:0;
+ background:#ccc;
+ height:1px;
+ margin:0 0 24px;
+}
+
+/* Images */
+
+img {
+ position: relative;
+ margin: 0 auto;
+ max-width: 650px;
+ padding: 5px;
+ margin: 10px 0 32px 0;
+ border: 1px solid #ccc;
+}
+
+
+/* Code blocks */
+
+code, pre {
+ font-family: Monaco, "Bitstream Vera Sans Mono", "Lucida Console", Terminal, monospace;
+ color:#000;
+ font-size:14px;
+}
+
+pre {
+ padding: 4px 12px;
+ background: #FDFEFB;
+ border-radius:4px;
+ border:1px solid #D7D8C8;
+ overflow: auto;
+ overflow-y: hidden;
+ margin-bottom: 32px;
+}
+
+
+/* Tables */
+
+table {
+ width:100%;
+}
+
+table {
+ border: 1px solid #ccc;
+ margin-bottom: 32px;
+ text-align: left;
+ }
+
+th {
+ font-family: 'Arvo', Helvetica, Arial, sans-serif;
+ font-size: 18px;
+ font-weight: normal;
+ padding: 10px;
+ background: #232323;
+ color: #FDFEFB;
+ }
+
+td {
+ padding: 10px;
+ background: #ccc;
+ }
+
+
+/* Wrapper */
+.wrapper {
+ width:960px;
+}
+
+
+/* Header */
+
+header {
+ background-color: #171717;
+ color: #FDFDFB;
+ width:170px;
+ float:left;
+ position:fixed;
+ border: 1px solid #000;
+ -webkit-border-top-right-radius: 4px;
+ -webkit-border-bottom-right-radius: 4px;
+ -moz-border-radius-topright: 4px;
+ -moz-border-radius-bottomright: 4px;
+ border-top-right-radius: 4px;
+ border-bottom-right-radius: 4px;
+ padding: 34px 25px 22px 50px;
+ margin: 30px 25px 0 0;
+ -webkit-font-smoothing: antialiased;
+}
+
+p.header {
+ font-size: 16px;
+}
+
+h1.header {
+ font-family: Arvo, sans-serif;
+ font-size: 30px;
+ font-weight: 300;
+ line-height: 1.3em;
+ border-bottom: none;
+ margin-top: 0;
+}
+
+
+h1.header, a.header, a.name, header a{
+ color: #fff;
+}
+
+a.header {
+ text-decoration: underline;
+}
+
+a.name {
+ white-space: nowrap;
+}
+
+header ul {
+ list-style:none;
+ padding:0;
+}
+
+header li {
+ list-style-type: none;
+ width:132px;
+ height:15px;
+ margin-bottom: 12px;
+ line-height: 1em;
+ padding: 6px 6px 6px 7px;
+
+ background: #AF0011;
+ background: -moz-linear-gradient(top, #AF0011 0%, #820011 100%);
+ background: -webkit-gradient(linear, left top, left bottom, color-stop(0%,#AF0011), color-stop(100%,#820011));
+ background: -webkit-linear-gradient(top, #AF0011 0%,#820011 100%);
+ background: -o-linear-gradient(top, #AF0011 0%,#820011 100%);
+ background: -ms-linear-gradient(top, #AF0011 0%,#820011 100%);
+ background: linear-gradient(to bottom, #AF0011 0%,#820011 100%);
+
+ border-radius:4px;
+ border:1px solid #0D0D0D;
+
+ -webkit-box-shadow: inset 0px 1px 1px 0 rgba(233,2,38, 1);
+ box-shadow: inset 0px 1px 1px 0 rgba(233,2,38, 1);
+
+}
+
+header li:hover {
+ background: #C3001D;
+ background: -moz-linear-gradient(top, #C3001D 0%, #950119 100%);
+ background: -webkit-gradient(linear, left top, left bottom, color-stop(0%,#C3001D), color-stop(100%,#950119));
+ background: -webkit-linear-gradient(top, #C3001D 0%,#950119 100%);
+ background: -o-linear-gradient(top, #C3001D 0%,#950119 100%);
+ background: -ms-linear-gradient(top, #C3001D 0%,#950119 100%);
+ background: linear-gradient(to bottom, #C3001D 0%,#950119 100%);
+}
+
+a.buttons {
+ -webkit-font-smoothing: antialiased;
+ background: url(../images/arrow-down.png) no-repeat;
+ font-weight: normal;
+ text-shadow: rgba(0, 0, 0, 0.4) 0 -1px 0;
+ padding: 2px 2px 2px 22px;
+ height: 30px;
+}
+
+a.github {
+ background: url(../images/octocat-small.png) no-repeat 1px;
+}
+
+a.buttons:hover {
+ color: #fff;
+ text-decoration: none;
+}
+
+
+/* Section - for main page content */
+
+section {
+ width:650px;
+ float:right;
+ padding-bottom:50px;
+}
+
+
+/* Footer */
+
+footer {
+ width:170px;
+ float:left;
+ position:fixed;
+ bottom:10px;
+ padding-left: 50px;
+}
+
+@media print, screen and (max-width: 960px) {
+
+ div.wrapper {
+ width:auto;
+ margin:0;
+ }
+
+ header, section, footer {
+ float:none;
+ position:static;
+ width:auto;
+ }
+
+ footer {
+ border-top: 1px solid #ccc;
+ margin:0 84px 0 50px;
+ padding:0;
+ }
+
+ header {
+ padding-right:320px;
+ }
+
+ section {
+ padding:20px 84px 20px 50px;
+ margin:0 0 20px;
+ }
+
+ header a small {
+ display:inline;
+ }
+
+ header ul {
+ position:absolute;
+ right:130px;
+ top:84px;
+ }
+}
+
+@media print, screen and (max-width: 720px) {
+ body {
+ word-wrap:break-word;
+ }
+
+ header {
+ padding:10px 20px 0;
+ margin-right: 0;
+ }
+
+ section {
+ padding:10px 0 10px 20px;
+ margin:0 0 30px;
+ }
+
+ footer {
+ margin: 0 0 0 30px;
+ }
+
+ header ul, header p.view {
+ position:static;
+ }
+}
+
+@media print, screen and (max-width: 480px) {
+
+ header ul li.download {
+ display:none;
+ }
+
+ footer {
+ margin: 0 0 0 20px;
+ }
+
+ footer a{
+ display:block;
+ }
+
+}
+
+@media print {
+ body {
+ padding:0.4in;
+ font-size:12pt;
+ color:#444;
+ }
+}
