Portable Class Library and Client to create sitemap.xml and search for links in a website
Latest commit 303a340 May 12, 2014
| Name | Latest commit message | Commit time |
| --- | --- | --- |
| SiteMapperBash | Bug converting to UTF-8 + LinkSpider-logo | May 12, 2014 |
| SiteMapperLib | LinkSpiderLogo | May 12, 2014 |
| .gitattributes | 1st Milestone Done. | May 3, 2014 |
| .gitignore | Initial commit | Apr 25, 2014 |
| LICENSE | Initial commit | Apr 25, 2014 |
| LingSpider-logo.png | LinkSpiderLogo | May 12, 2014 |
| README.md | Update README.md | May 12, 2014 |


# Link Spider

LinkSpider logo

Link Spider is a high-performance Portable Class Library that searches for links in a website or a single webpage and can optionally create a standard sitemap.xml file.

This project also includes a console client, a utility that generates the sitemap.xml of any site.

  • The library uses parallel tasks to maximize performance
  • Supports async/await operations
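The README doesn't show LinkSpider's own API, so the following is only an illustration of the general technique, not the library's implementation: a minimal Python asyncio sketch of parallel, breadth-first link exploration. `fetch_links` and the `FAKE_SITE` dictionary are hypothetical stand-ins for real HTTP fetching and HTML parsing, so the example runs offline; a semaphore caps how many pages are fetched at once.

```python
import asyncio
from urllib.parse import urlparse

# Hypothetical in-memory "website": page URL -> links found on that page.
# It stands in for real HTTP fetching + HTML parsing so the sketch runs offline.
FAKE_SITE = {
    "http://yoursite.com/": ["http://yoursite.com/a", "http://yoursite.com/b", "http://other.com/"],
    "http://yoursite.com/a": ["http://yoursite.com/b", "http://yoursite.com/c"],
    "http://yoursite.com/b": ["http://yoursite.com/"],
    "http://yoursite.com/c": [],
}


async def fetch_links(url: str) -> list[str]:
    """Hypothetical fetcher: simulate network latency, then return the page's links."""
    await asyncio.sleep(0.01)
    return FAKE_SITE.get(url, [])


async def crawl(start_url: str, max_concurrency: int = 8) -> set[str]:
    """Breadth-first crawl: all pages of one level are fetched concurrently."""
    host = urlparse(start_url).netloc
    semaphore = asyncio.Semaphore(max_concurrency)  # cap on simultaneous fetches
    visited: set[str] = set()
    frontier = {start_url}

    async def limited_fetch(url: str) -> list[str]:
        async with semaphore:
            return await fetch_links(url)

    while frontier:
        visited |= frontier
        # Fetch every page of the current level in parallel.
        results = await asyncio.gather(*(limited_fetch(u) for u in frontier))
        next_frontier = set()
        for links in results:
            for link in links:
                # Follow only same-host links that have not been visited yet.
                if urlparse(link).netloc == host and link not in visited:
                    next_frontier.add(link)
        frontier = next_frontier
    return visited


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("http://yoursite.com/"))))
```

Fetching one whole level at a time with `asyncio.gather` keeps the bookkeeping simple; a production crawler would more likely use a work queue so that one slow page doesn't hold up the next level.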

## LinkSpider Portable Class Library Features

### LinkSpider Class

  • High performance
  • Explores single webpages using parallel tasks
  • Explores whole websites using parallel tasks
  • Ready for async/await operations
  • Lists broken links in the website
  • Lists all website links
  • Lists all external links
  • Supports exploration filters that skip pages whose URLs contain given patterns
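As a rough illustration of how a page's links can be sorted into website links and external links, and how an exploration filter can skip URLs, here is a minimal sketch using only Python's standard library (`html.parser`, `urllib.parse`). The names `LinkExtractor` and `classify_links` are hypothetical and are not part of LinkSpider; treating "external" as "different host name" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def classify_links(html: str, page_url: str, exclude_patterns=()):
    """Return (internal, external, to_explore) sets for one page.

    internal:   http(s) links on the same host as page_url
    external:   http(s) links on any other host
    to_explore: internal links containing none of exclude_patterns
    """
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc
    internal, external = set(), set()
    for link in parser.links:
        parts = urlparse(link)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel: and similar schemes
        if parts.netloc == host:
            internal.add(link)
        else:
            external.add(link)
    to_explore = {u for u in internal if not any(p in u for p in exclude_patterns)}
    return internal, external, to_explore
```

With `exclude_patterns=["/tag/"]`, a link to `/tag/news` is still reported as an internal link but is not queued for further exploration.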

### SitemapTarantula Class

  • Builds a standard sitemap.xml file
  • Supports output filtering to exclude links whose URLs contain given patterns
  • Supports output in Unicode and UTF-8 encodings
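The sitemap format itself is defined by the sitemaps.org protocol: a `<urlset>` root in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace containing one `<url>` element with a `<loc>` child per page. The following is a minimal Python sketch of building such a file with an output filter; `build_sitemap` is a hypothetical helper, not SitemapTarantula's API, and it writes only the required `<loc>` element (the optional `<lastmod>`, `<changefreq>` and `<priority>` are omitted).

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, exclude_patterns=(), encoding="utf-8") -> bytes:
    """Serialize URLs as a sitemaps.org <urlset>, skipping excluded patterns."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(urls)):  # de-duplicate and keep a stable order
        if any(p in url for p in exclude_patterns):
            continue  # output filter: keep this URL out of the sitemap
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the text automatically.
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding=encoding, xml_declaration=True)
```

The sitemaps.org protocol requires sitemap files to be UTF-8 encoded, so UTF-8 is the safer choice when the file is meant for search engines; the `encoding` parameter here only mirrors the Unicode/UTF-8 option listed above.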

## LinkSpider Console Features

Full support for all LinkSpider Portable Class Library features.

### Samples

#### Fast create

This command creates two files:

  • sitemap.xml: standard sitemap
  • plain.txt: plain-text file listing all links in the website

`LinkSpiderConsole.exe --u http://yoursite.com`

#### Customizing output files (--s, --p)

--s sets the sitemap file name and --p sets the plain-text file name.

`LinkSpiderConsole.exe --u http://yoursite.com --s YOURsitemap.xml --p YOURplain.txt`

#### Navigation Filtering (--n)

This option skips exploring links whose URLs contain any of the given patterns. The following sample skips any URL that contains one of these fragments:

  • /tag/
  • /pages/

`LinkSpiderConsole.exe --u http://yoursite.com --n /tag/,/pages/`

This is very useful for improving performance, since no time is wasted on irrelevant URLs.
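Assuming the filter is a plain, case-sensitive substring match on the full URL (the README says only "containing", so this is an assumption), the comma-separated `--n` value could be handled like this minimal Python sketch; `parse_patterns` and `should_explore` are hypothetical names, not the console's code.

```python
def parse_patterns(arg: str) -> list[str]:
    """Split a comma-separated option value like "/tag/,/pages/" into patterns."""
    return [p.strip() for p in arg.split(",") if p.strip()]


def should_explore(url: str, patterns: list[str]) -> bool:
    """Return False when the URL contains any of the navigation-filter fragments."""
    return not any(p in url for p in patterns)


patterns = parse_patterns("/tag/,/pages/")
candidates = [
    "http://yoursite.com/blog/post-1",
    "http://yoursite.com/tag/csharp",
    "http://yoursite.com/pages/2",
    "http://yoursite.com/tags/",  # "/tag/" is not a substring of "/tags/"
]
to_visit = [u for u in candidates if should_explore(u, patterns)]
print(to_visit)  # ['http://yoursite.com/blog/post-1', 'http://yoursite.com/tags/']
```

Substring matching is why the sample patterns include both slashes: `/tag/` excludes `/tag/csharp` but not `/tags/`, whereas a bare `tag` would also exclude any URL that merely contains the word.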

#### Sitemap Filtering (--m)

This option keeps links whose URLs contain any of the given patterns out of the sitemap file. The following sample leaves out of the sitemap any link that contains one of these fragments:

  • /tag/
  • /pages/

`LinkSpiderConsole.exe --u http://yoursite.com --m /tag/,/pages/`

#### Single webpage mode

Every execution mode listed above also supports the special --o parameter, which analyzes only the page passed with --u.

`LinkSpiderConsole.exe --u http://yoursite.com/myWebPage --m /tag/,/pages/ --o`

### Additional files

The tool can optionally create two more files:

  • brokenLinks.txt: broken links that target the current site
  • externalLinks.txt: all links that target other domains
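A minimal Python sketch of how crawl results could be split into these two files, under stated assumptions: each discovered URL has already been requested and mapped to its HTTP status code (or `None` if the request failed), a link is "broken" when it points at the crawled host and returned a 4xx/5xx status or no response, and every link to another host is "external". `partition_links` and `write_list` are hypothetical helpers, not part of LinkSpider.

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse


def partition_links(statuses: Dict[str, Optional[int]], site_host: str) -> Tuple[List[str], List[str]]:
    """Return (broken, external) URL lists, sorted alphabetically.

    statuses maps each discovered URL to its HTTP status code, or None if
    the request failed entirely (DNS error, timeout, refused connection).
    """
    broken, external = [], []
    for url, status in sorted(statuses.items()):
        if urlparse(url).netloc != site_host:
            external.append(url)  # any link to another host
        elif status is None or status >= 400:
            broken.append(url)  # same-host link with an error or no response
    return broken, external


def write_list(path: Path, urls: List[str]) -> None:
    """Write one URL per line as UTF-8 text, e.g. brokenLinks.txt."""
    path.write_text("".join(u + "\n" for u in urls), encoding="utf-8")
```

Comparing `netloc` strings means `www.yoursite.com` and `yoursite.com` count as different hosts; a real tool has to decide how to treat such aliases.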