Brendan Coles edited this page Dec 1, 2017 · 52 revisions

Suggestions, feature requests and discussion go here. See the TODO page for features which have not yet been implemented.


Direction

Things have been going well. WhatWeb 0.4.5 is a good, stable tool and has earned community recognition.

We've been tearing webpages apart and fingerprinting them piece by piece. We've built plugins for many web applications, client side libraries and HTML elements, but now we have a few important issues to consider regarding WhatWeb's direction.

Design Philosophy

  • Always use an intuitive interface. Never force a user to choose an option when a default is better. The following command must always work: ./whatweb slashdot.org
  • Never take choices away from the user. Each automatic decision should be the default for a configurable option. Example: following redirects.
  • Avoid premature over-engineering. Do not implement core code to handle types of information that few plugins currently return. Allow plugins to return the information in generic formats such as :string instead. Wait until many plugins are returning the same type of information, such as operating system, file paths, versions, or modules, before considering how to solve this problem in the core. Premature over-engineering is the type of error that kills a project.
  • When a solution to a problem is inelegant, do not implement it in WhatWeb. Instead, continue to meditate on the problem for as long as required. If you need a fast solution, hack up your own version of WhatWeb and do not introduce the patch into the core. I have done this many times.
  • WhatWeb must grow horizontally and vertically together. WhatWeb must be good at solving one type of problem before entering a new area. For example, WhatWeb must be competent at identifying a system before it starts becoming good at identifying versions of systems. If WhatWeb is known to be patchy in its coverage, this could kill the project. This is the rationale behind not implementing security checks yet. It also fits the Unix philosophy of doing one thing and doing it really well.
  • Breaking backwards compatibility is OK.

Multi-App plugins

We're at a fork in the road on this one. On one side, we can fingerprint each application individually and write a plugin for each one. On the other, we can incorporate many different applications of the same type into one plugin, for example all third-party JavaScript libraries.

bcoles: I'm in favor of categorizing plugins rather than combining multiple applications into a single plugin. I'd rather see output Google-Analytics[713526426] than Third-Party-Library[Google-Analytics[713526426]]

Exceptions:

An exception would be fingerprinting generic web apps, for example admin panels or web backdoors. These are applications which can only be fingerprinted generically using subtle clues, such as "/admin/", "/login/" or "?cmd=" in the URL. Such a clue doesn't necessarily mean that an admin panel or backdoor is present, but it's a good indication.

It is also acceptable to write plugins which return different models for hardware. It is not feasible to write a different plugin for every model.

Output becomes a wall of text

We now have numerous plugins which return a file path from the source of an HTML element. For example,

  • Redirect-Location
  • Frame
  • RSS-Feed
  • Mailto
  • Title
  • Script
  • Shortcut Icon

These types of plugins are great for plugin development, data mining or noticing patterns across networks.

The problem is that the WhatWeb output becomes a massive wall of text, even in --log-brief mode.

One way around this is by putting these types of plugins in a "plugin development" category and allowing the user to enable/disable certain categories.

For now most of these plugins are in the ./plugins-disabled directory.

One solution is a new output format combined with plugin categories (see below).


Categories

Should plugins be categorized? If so, should they be layered (i.e. sub-categories)?

  • Server
  • Language
  • Program
  • Third Party Library

or

  • HTML Elements
  • Program
  • Vendor
  • Server
  • Development
  • Config/Log files

or

  • HTTP Server. Apache, Nginx
  • Language. PHP, ASP, ASP.NET, ColdFusion
  • Framework. Cake, Zend, Ruby on Rails (can this be inferred from the language and CMS?)
  • CMS/Blog. WordPress, Joomla, Drupal
  • JS Library. Scriptaculous, Prototype, jQuery, Google Analytics
  • Hardware devices. Xerox Printers, Cisco routers, D-link cameras
  • Common. Title, Subdomains, Uncommon-headers, X-Powered-By, Mailto
  • Hashes. Header-hash, footer-hash

bcoles: Categories for plugins should be defined as an array of tags within the plugin file. Tagging is superior to categorization.
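
A rough sketch of how tag-based filtering might work. The :tags key, the tag names and the select_plugins helper are all hypothetical and are not part of the current plugin format; the point is only that a plugin can carry several tags, whereas a category tree forces a single place for it.

```ruby
# Hypothetical plugin records. The :tags key is an assumption for illustration,
# not part of the current WhatWeb plugin format.
PLUGINS = [
  { name: "Apache",           tags: ["server"] },
  { name: "WordPress",        tags: ["cms", "blog"] },
  { name: "Google-Analytics", tags: ["js", "analytics"] },
  { name: "Title",            tags: ["common", "html-element"] }
].freeze

# Return the names of plugins that carry at least one of the wanted tags
# (or any tag, if none are requested) and none of the excluded tags.
def select_plugins(plugins, include_tags: [], exclude_tags: [])
  selected = plugins.select do |plugin|
    wanted = include_tags.empty? || (plugin[:tags] & include_tags).any?
    wanted && (plugin[:tags] & exclude_tags).empty?
  end
  selected.map { |plugin| plugin[:name] }
end

p select_plugins(PLUGINS, include_tags: ["blog", "analytics"])
p select_plugins(PLUGINS, exclude_tags: ["html-element"])
```

With tags, WordPress can be both a CMS and a blog without any decision about which category "owns" it.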

I (Andrew) like the above categories best, but the list is far from complete. The first categories break down nicely into an OSI-like set of layers. The 'hardware devices' category should be considered to cover all layers from the server to the JS library. The common category defines plugins that are common to all types of websites, not necessarily commonly found plugins. The hashes are kept separate from the common plugins because hashes are primarily used to discover common content after a scan, and a user may wish to disable them.

Here is a set of categories from builtwith.com:

  • Ads
  • Analytics
  • Blog
  • CDN
  • CMS
  • DocInfo
  • Ecommerce
  • Encoding (utf-8, big5)
  • Feeds (feed types and feed providers)
  • Framework (includes languages and frameworks)
  • JS (javascript libraries, not including analytics)
  • Media (Media provider such as youtube)
  • Server
  • Software (operating systems)
  • Widgets

Here is a set of categories from Wappalyzer:

  • CMS
  • Message Boards
  • Database managers
  • Documentation tools
  • Widgets
  • Web shops
  • Photo galleries
  • Wikis
  • Hosting panels
  • Analytics
  • Blogs
  • JavaScript frameworks
  • Issue trackers
  • Video Players
  • Comment Systems
  • CAPTCHAs
  • Font scripts
  • Web frameworks
  • Miscellaneous
  • Editors
  • LMS
  • Web servers
  • Cache tools

Some problems are:

  • Encoding should be a plugin value, not a plugin
  • Ecommerce contains a lot of CMSs
  • Blogs and CMSs overlap, as with WordPress
  • Client-Side fits into a lot of categories, but should probably be kept separate

Note: the Analytics category could be included in JS, but it is better for it to have its own category.


How to scan websites that need authentication?

Types of authentication to potentially support:

  • HTTP Basic Authentication - currently supported by --header
  • HTTP Digest Authentication - currently supported by --header
  • URL parameter with session token
  • HTTP Cookies - currently supported by --header
  • SSL Certificate Support
  • HTTP Forms with passwords

Curl supports these and it might make sense for WhatWeb to copy curl's command line syntax.
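
As an illustration of the first item in the list above, the value passed to --header for HTTP Basic Authentication can be built as below. This is a sketch only: basic_auth_header is a hypothetical helper and is not part of WhatWeb, and admin/admin are placeholder credentials.

```ruby
# Build an HTTP Basic "Authorization" header line (scheme described in RFC 7617).
# basic_auth_header is a hypothetical helper; the credentials are placeholders.
def basic_auth_header(username, password)
  # "m0" packs the string as Base64 without inserting line breaks.
  token = ["#{username}:#{password}"].pack("m0")
  "Authorization: Basic #{token}"
end

puts basic_auth_header("admin", "admin")
# prints: Authorization: Basic YWRtaW46YWRtaW4=
```

The printed line is what would be supplied as the argument to --header.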

One method, not necessarily a good one, is to load WhatWeb with username and password combinations which it will try whenever it discovers a password prompt.

Using HTTP authorization would be nice for fingerprinting devices with default credentials. This belongs in aggression level 5, which has not yet been implemented.


POST data

Aung Khant: Some frameworks issue a unique error response when sent an invalid POST request

:url_post=>'/', :post_data=>'null=null'

bcoles: POST can be achieved with custom Ruby, but POST request support would be worth adding. Support for OPTIONS requests may also be useful, for example for WebDAV.
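
For reference, a minimal sketch of the "custom Ruby" route using the standard Net::HTTP library. It builds the requests without sending them; build_probe_post is a hypothetical helper, and the path and form data mirror the :url_post / :post_data suggestion above.

```ruby
require "net/http"

# Build (but do not send) a deliberately meaningless POST request.
# build_probe_post is a hypothetical helper, not part of WhatWeb.
def build_probe_post(path, form_fields)
  request = Net::HTTP::Post.new(path)
  request.set_form_data(form_fields) # sets the body and the Content-Type header
  request
end

post = build_probe_post("/", "null" => "null")
puts post.method          # POST
puts post.body            # null=null
puts post["Content-Type"] # application/x-www-form-urlencoded

# An OPTIONS request (e.g. for probing WebDAV) is built the same way.
options = Net::HTTP::Options.new("/")
puts options.method       # OPTIONS

# Sending either request needs network access; the host here is a placeholder:
#   response = Net::HTTP.start("example.com", 80) { |http| http.request(post) }
```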


Should WhatWeb exploit vulnerabilities to test them?

Andrew: No. Not yet at least. I want good coverage of plugins to identify systems first including aggressive plugins to detect exact version numbers.

Plugins that test for vulnerabilities, if or when introduced, should be at a different aggression level, maybe 5. Exploiting full path disclosure, default credentials and weak access controls fit into this category.


Recursive Mode

The anemone library does not support redirects. It is also limited to extracting links from <a href="*"> tags. It may be worthwhile to rewrite the anemone library at some point in the distant future.


Returning Data

According to the WhatWeb design philosophy: avoid premature over-engineering. Do not implement core code to handle types of information that few plugins currently return.

The following are candidate data types for plugins to return (like :version, :string, :firmware, etc.), as it may be useful to separate them from results in :string=>:

  • :hostname=>
    • Internal host name - not widely used
  • :ip=>
    • Used for internal IP addresses and the IP plugin - not widely used
  • :mac=>
    • MAC address - not widely used
  • :year=>
    • The age of an installation can often be roughly determined by the year(s) in copyright messages. Several plugins report the year.
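
As an illustration of the :year idea, the years in copyright messages could be pulled out roughly as follows. This is a sketch only: copyright_years is a hypothetical helper, and the pattern is a simple assumption that will miss some notices and does not reflect how any existing plugin works.

```ruby
# Collect the four-digit years that appear in copyright notices.
# copyright_years is a hypothetical helper; the pattern is a rough assumption.
def copyright_years(html)
  # A "notice" runs from a copyright marker to the next tag or line break.
  notices = html.scan(/(?:copyright|&copy;|\(c\))[^<\n]*/i)
  years = notices.flat_map { |notice| notice.scan(/\b(?:19|20)\d{2}\b/) }
  years.map(&:to_i).uniq.sort
end

html = "<p>Copyright 2008-2011 Example Corp</p><p>Call 2015550199</p>"
p copyright_years(html) # [2008, 2011]
```

The most recent year gives a rough upper bound on when the page was last touched; the phone number in the example is ignored because it is not inside a copyright notice.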

How should WhatWeb save/store webpages?

Add option to save HTTP response (HTML + HTTP headers).

  • option 1 (hostnames backwards by TLD, IPs forwards by octet)

    • login.yahoo.com becomes: com/yahoo/login/head and com/yahoo/login/body
    • 208.51.4.1 becomes: 208/51/4/1/head and 208/51/4/1/body
  • option 2 (md5 hash of url, this is kind of brutal)

    • 9e107d9d372bb6826bd81d3542a419d6.head
    • 9e107d9d372bb6826bd81d3542a419d6.body
  • option 3 (URL encode every special character after the hostname. should dots remain dots?)

    • login.yahoo.com%2findex.html.head
    • login.yahoo.com%2findex.html.body
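
The three naming schemes can be sketched as follows. The function names are hypothetical, and the sketch only produces the base name; the head/body suffix would be appended afterwards. Note that Ruby's URI.encode_www_form_component emits uppercase hex (%2F rather than %2f), leaves dots as dots and encodes spaces as "+".

```ruby
require "digest/md5"
require "uri"

# Option 1: hostnames reversed label by label, IPv4 addresses kept in octet order.
def path_by_hierarchy(host)
  labels = host.split(".")
  ipv4 = host.match?(/\A\d{1,3}(\.\d{1,3}){3}\z/)
  (ipv4 ? labels : labels.reverse).join("/")
end

# Option 2: MD5 hex digest of the whole URL.
def path_by_md5(url)
  Digest::MD5.hexdigest(url)
end

# Option 3: hostname as-is, everything after it percent-encoded.
def path_by_encoding(url)
  uri = URI.parse(url)
  uri.host + URI.encode_www_form_component(uri.request_uri)
end

puts path_by_hierarchy("login.yahoo.com")                    # com/yahoo/login
puts path_by_hierarchy("208.51.4.1")                         # 208/51/4/1
puts path_by_md5("http://login.yahoo.com/index.html")
puts path_by_encoding("http://login.yahoo.com/index.html")   # login.yahoo.com%2Findex.html
```

Option 2 is the only scheme that cannot be reversed back into a URL, which is part of why it feels brutal.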

Thoughts:

  • WhatWeb now supports reading HTTP headers + HTML content from a single local file so it's probably not necessary to separate the two.
  • large sets - splitting the hostnames across directories (option 1)
  • small sets - one directory for all hosts (keep the dots)
  • URL encode every special character for the path
  • Store files in optional folder? There should also be options for saving to DBs like gridfs, sqlite, etc

Custom Plugins

This feature should provide a gentle introduction into custom usage of WhatWeb and eventually lead into plugin writing.

Aims of the feature:

Reduce the barrier to entry for custom searching with WhatWeb and remove the need for anyone to write this:

echo "\n\n" | netcat whatweb.net 80 | grep -Eo "<title>([^<]+)<\/title>"

For example:

$ ./whatweb --custom-plugin "{:string=>/<title>([^<]+)<\/title>/i}" whatweb.net

This option allows WhatWeb to act as a powerful, threaded, grep-powered platform for HTTP(S).
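
Conceptually, the custom match above does something like the following against each response body. This is a simplified sketch and not WhatWeb's actual matching code; apply_match is a hypothetical helper and the HTML is made up.

```ruby
# A match hash in the same shape as the command-line example above.
custom_match = { :string => /<title>([^<]+)<\/title>/i }

# Apply a match hash to a response body and return every captured string.
# apply_match is a hypothetical helper, not WhatWeb's real implementation.
def apply_match(match, body)
  body.scan(match[:string]).flatten
end

body = "<html><head><TITLE>Example Page</TITLE></head><body></body></html>"
p apply_match(custom_match, body) # ["Example Page"]
```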

Unfortunately, the --custom-plugin option needs to be escaped and in some cases, such as :regexp=>//, needs to be double-escaped because it is parsed directly from the command line. This results in a complicated and unintuitive command line argument.

Splitting each match method up into its own command line argument would help reduce the complexity:

option 1 --custom-plugin-text, --custom-plugin-regex

option 2 --find-text, --find-regex, --find-md5

option 3 --match-text, --match-regex, --match-md5

option 4 --grep-text, --grep-regex, --grep-md5


Graphical User Interface

A GUI would be nice. Options:

  • Add GUI to WhatWeb (Ruby) and launch with command line option --gui
  • Add GUI to WhatWeb (Ruby) and provide two branches: CLI and GUI
  • Write a separate application (wrapper). Using Ruby would make sense.

bcoles: I'm concerned that using a wrapper will be slow. That said, I've written a threaded GUI wrapper in C# for use on Windows systems as a working proof of concept. Contact me if you would like a copy. Keep in mind that it is a proof of concept only and suffers from the following flaws:

  • you cannot select plugins (all enabled plugins are run by default)
  • logging is limited to brief-logging
  • scanning local files is buggy (Windows file paths are not escaped properly)

Addons

Addons in the ./addons directory allow users to extend WhatWeb. These tools have been kept separate for several reasons:

  • This helps us keep unsupported features out of the core until they have been thoroughly tested.
  • It follows the UNIX philosophy: do one thing and do it well.
  • It assists in preventing premature over-engineering.

The following are potential addons which might be worth writing.

build-report

A tool to build a report file. Use XML+XSL format?

  • Could include (fav)icons for different software.
  • CVE#/OSVDB#/bugtraq#/etc optional.
  • Allow report generation based on grouping:
    • this URL matches these plugins, or
    • this plugin matches these URLs
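
The two groupings are the same data pivoted in opposite directions. A minimal sketch, assuming scan results are available as [url, plugin] pairs (the data and both function names are made up for illustration):

```ruby
# Hypothetical scan results: one [url, plugin] pair per match.
RESULTS = [
  ["http://a.example/", "Apache"],
  ["http://a.example/", "WordPress"],
  ["http://b.example/", "Apache"]
].freeze

# "This URL matches these plugins"
def plugins_by_url(results)
  results.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
end

# "This plugin matches these URLs"
def urls_by_plugin(results)
  results.group_by(&:last).transform_values { |pairs| pairs.map(&:first) }
end

p plugins_by_url(RESULTS)
p urls_by_plugin(RESULTS)
```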

passive-vuln-detection

A tool to return CVE#/OSVDB#/bugtraq#/etc for known vulnerable software versions.


Max File Size

Set a maximum file size for remote files to stop WhatWeb getting "stuck" on huge files or streaming data.

--max-filesize=SIZE    Set the maximum allowed file size for remote files. Default: (1MB)
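
One way such a limit might be enforced is to read the response in chunks and stop once the cap is reached. The sketch below uses a StringIO as a stand-in for the network stream; read_limited is a hypothetical helper, and treating 1MB as 1024 * 1024 bytes is an assumption.

```ruby
require "stringio"

MAX_FILESIZE = 1024 * 1024 # assumed meaning of the proposed 1MB default

# Read at most max_bytes from an IO-like object, chunk by chunk.
# Returns [data, truncated]. read_limited is a hypothetical helper.
def read_limited(io, max_bytes, chunk_size = 4096)
  data = String.new
  while data.bytesize < max_bytes
    chunk = io.read([chunk_size, max_bytes - data.bytesize].min)
    break if chunk.nil? # end of stream
    data << chunk
  end
  [data, !io.eof?]
end

huge = StringIO.new("x" * (2 * MAX_FILESIZE))
data, truncated = read_limited(huge, MAX_FILESIZE)
puts data.bytesize # 1048576
puts truncated     # true
```

With a real HTTP response the same idea applies to a streamed body: stop consuming chunks once the cap is hit rather than waiting for the transfer to finish.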

Follow frames

Many websites still use frames on intro pages. A --follow-frames option would allow WhatWeb to grab these URLs instead of being stuck trying to fingerprint an HTML frameset.

--follow-frames=WHEN    Control when to follow frames. WHEN may be `never',
                        `frame-only', `iframe-only', `same-site', `same-domain'
                        or `always'. Default: never

Should frames be followed by default? Should off-site frames be ignored, or should that be a configurable option? Would never or same-site be the best default?
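
To make the WHEN values concrete, here is a sketch of the decision logic. The meanings of same-site and same-domain are not defined above, so the interpretations in the comments are assumptions, and follow_frame? is a hypothetical helper.

```ruby
require "uri"

# ASSUMED semantics (the proposal only names the values):
#   same-site   -> the frame host equals the page host
#   same-domain -> the frame host shares its last two labels with the page host
#                  (a crude stand-in for a proper registered-domain check)
def follow_frame?(when_policy, page_url, frame_src, tag)
  page  = URI.parse(page_url)
  frame = URI.join(page_url, frame_src) # resolves relative src values
  case when_policy
  when "never"       then false
  when "always"      then true
  when "frame-only"  then tag == "frame"
  when "iframe-only" then tag == "iframe"
  when "same-site"   then frame.host == page.host
  when "same-domain"
    frame.host.split(".").last(2) == page.host.split(".").last(2)
  else
    raise ArgumentError, "unknown policy: #{when_policy}"
  end
end

page = "http://www.example.com/"
puts follow_frame?("same-site",   page, "main.html", "frame")                 # true
puts follow_frame?("same-site",   page, "http://cdn.example.com/x", "iframe") # false
puts follow_frame?("same-domain", page, "http://cdn.example.com/x", "iframe") # true
```

Under these assumptions, same-site would follow the typical intro-page frameset while never leaving the scanned host.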


Extract Injection Points

Andre Gironda: i would love to see WhatWeb identify candidate insertion points for testing - especially marking insertion points that are user controllable HTML element attributes

bcoles: any suggestions on how the results for candidates for insertion should be formatted?

Andre Gironda: ProxMon and Casaba Watcher tools do it right - they are open-source

bcoles: This could be achieved with a plugin. Something like:

  • GET params: split base_uri by ? then &
    • Extract params from /base_uri[^'"]+\?([^=]+)=([^&]+)/
  • POST params: The ./plugins-disabled/POST-Parameters.rb plugin exists for this purpose
  • Elements: grep for the GET param values and extract the relevant HTML element type
    • Will most likely result in false positives unless non-default GET parameter values are sent
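
The GET parameter step might look roughly like this. It is a sketch rather than an existing plugin: get_params is a hypothetical helper, and it uses a simpler pattern than the one above, looking only at href, src and action attributes.

```ruby
require "uri"

# Collect GET parameter names and values from URLs found in
# href, src and action attributes. get_params is a hypothetical helper.
def get_params(html)
  urls = html.scan(/(?:href|src|action)=["']([^"']+\?[^"']+)["']/i).flatten
  urls.flat_map do |url|
    # Undo HTML escaping of "&", then keep the part between "?" and any "#".
    query = url.gsub("&amp;", "&").split("?", 2).last.split("#", 2).first
    URI.decode_www_form(query).map do |name, value|
      { url: url, name: name, value: value }
    end
  end
end

html = %q(<a href="/item.php?id=42&amp;cat=news">x</a><form action='/search?q=test'>)
get_params(html).each { |param| puts "#{param[:name]}=#{param[:value]}" }
# prints:
#   id=42
#   cat=news
#   q=test
```

Each returned value could then be searched for in the page body to find the HTML element that reflects it, subject to the false-positive caveat above.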