
added uriPath

commit deb52393bd1c56c77db784a9214cc7dd0396a223 1 parent e624162
Greg Molnar authored April 10, 2013
1  README.markdown
@@ -219,6 +219,7 @@ var conditionID = myCrawler.addFetchCondition(function(parsedURL) {
 	return !parsedURL.path.match(/\.pdf$/i);
 });
 ```
+NOTE: simplecrawler uses slightly different terminology to URIjs. `parsedURL.path` includes the query string too. If you want the path without the query string, use `parsedURL.uriPath`.
 
 ##### Removing a fetch condition
 
7  lib/crawler.js
@@ -257,9 +257,10 @@ Crawler.prototype.processURL = function(URL,context) {
 	// simplecrawler uses slightly different terminology to URIjs. Sorry!
 	return {
 		"protocol": newURL.protocol() || "http",
-		"host":		newURL.hostname(),
-		"port":		newURL.port() || 80,
-		"path":		newURL.resource()
+		"host":	newURL.hostname(),
+		"port":	newURL.port() || 80,
+		"path":	newURL.resource(),
+		"uriPath": newURL.path()
 	};
 };
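
To illustrate why the new `uriPath` field matters for fetch conditions like the README's PDF filter, here is a minimal sketch. It does not call simplecrawler or URIjs: the helper `sketchParsedURL` and the `example.com` URL are hypothetical, and the object is built with the standard `URL` class on the assumption that `path` carries the query string while `uriPath` does not, as the README note in this commit describes.

```javascript
// Sketch only: approximates the object shape returned by processURL in the diff.
// `sketchParsedURL` is a hypothetical helper, not part of simplecrawler.
function sketchParsedURL(href) {
	var u = new URL(href); // standard URL class (global in modern Node.js and browsers)
	return {
		"protocol": u.protocol.replace(/:$/, "") || "http",
		"host": u.hostname,
		"port": Number(u.port) || 80,
		"path": u.pathname + u.search, // path plus query string
		"uriPath": u.pathname // path without the query string
	};
}

var parsedURL = sketchParsedURL("http://example.com/docs/report.pdf?download=1");

// The README condition tests `path`; the trailing query string hides the
// ".pdf" extension from the `$` anchor, so the URL is still allowed.
var allowedByPath = !parsedURL.path.match(/\.pdf$/i);

// The same condition against `uriPath` sees the bare path and rejects the PDF.
var allowedByUriPath = !parsedURL.uriPath.match(/\.pdf$/i);

console.log(allowedByPath, allowedByUriPath);
```

Under these assumptions, a condition that should skip PDFs regardless of query parameters would test `parsedURL.uriPath` rather than `parsedURL.path`.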
 
