Update the crawl.open_api plugin with additional parameters #17194

Merged
Changes from 2 commits
6 changes: 4 additions & 2 deletions w3af/core/data/parsers/doc/open_api/main.py
@@ -59,9 +59,10 @@ class OpenAPI(BaseParser):
'swagger',
'paths')

def __init__(self, http_response):
def __init__(self, http_response, no_validation=False):
super(OpenAPI, self).__init__(http_response)
self.api_calls = []
self.no_validation = no_validation

@staticmethod
def content_type_match(http_resp):
@@ -136,7 +137,8 @@ def parse(self):
"""
Extract all the API endpoints using the bravado Open API parser
"""
specification_handler = SpecificationHandler(self.get_http_response())
specification_handler = SpecificationHandler(self.get_http_response(),
self.no_validation)

for data in specification_handler.get_api_information():
request_factory = RequestFactory(*data)
15 changes: 13 additions & 2 deletions w3af/core/data/parsers/doc/open_api/specification.py
@@ -50,9 +50,10 @@


class SpecificationHandler(object):
def __init__(self, http_response):
def __init__(self, http_response, no_validation=False):
self.http_response = http_response
self.spec = None
self.no_validation = no_validation

def get_http_response(self):
return self.http_response
@@ -108,13 +109,23 @@ def _set_operation_params(self, operation):
def _parse_spec_from_dict(self, spec_dict, retry=True):
"""
load_spec_dict will load the open api document into a dict. We use this
function to parse the dict into a bravado Spec instance.
function to parse the dict into a bravado Spec instance. By default,
it validates the spec, but validation may be disabled
by passing `no_validation=True` to the constructor.

:param spec_dict: The output of load_spec_dict
:return: A Spec instance which holds all the dict information in an
accessible way.
"""
config = {'use_models': False}
if self.no_validation:
om.out.debug('Open API spec validation disabled')
config.update({
Owner

I haven't played with these settings. What will happen when the validation fails? Are errors written somewhere? Maybe we should write them to the debug log?

Contributor Author

If validation fails, Spec.from_dict() basically returns None, so no API endpoints are added for further testing. If I understand correctly, the exceptions below are logged only if debug mode is enabled. Here is an example of an error I saw (it may depend on issues in a particular API spec):

The document at "file:///path/to/swagger.yaml" is not a valid Open API specification. The following exception was raised while parsing the dict into a specification object: "('expected string or buffer', TypeError('expected string or buffer',))"

I didn't see any additional info, though. I am not sure whether printing a full stack trace would help, and I didn't find a way to enable additional logging in bravado. Maybe we could always print a warning when validation fails, not only when debug mode is enabled. What do you think?

Owner

With that log line we're fine. Thanks for the detailed answer.
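
For reference, the configuration assembled in this hunk can be sketched as a standalone helper. The function name `build_spec_config` is hypothetical and not part of w3af; the dictionary keys are the ones shown in the diff, which the PR passes on when the bravado Spec instance is built.

```python
def build_spec_config(no_validation=False):
    """Return the parser config dict, mirroring the logic in this hunk.

    Hypothetical helper for illustration only; it is not part of w3af.
    """
    # Base setting taken from the diff
    config = {'use_models': False}

    if no_validation:
        # Turn off the three validation switches listed in the diff
        config.update({
            'validate_swagger_spec': False,
            'validate_requests': False,
            'validate_responses': False,
        })

    return config
```

With the default argument only `use_models` is set, so validation behaves as it did before this change.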

'validate_swagger_spec': False,
'validate_requests': False,
'validate_responses': False
})

url_string = self.http_response.get_url().url_string

try:
89 changes: 80 additions & 9 deletions w3af/plugins/crawl/open_api.py
@@ -24,14 +24,19 @@
import w3af.core.data.kb.config as cf

from w3af.core.data.options.opt_factory import opt_factory
from w3af.core.data.parsers.doc.url import URL
from w3af.core.data.request.fuzzable_request import FuzzableRequest
from w3af.core.data.options.option_types import QUERY_STRING, HEADER
from w3af.core.data.options.option_types import QUERY_STRING, HEADER, BOOL, STRING
from w3af.core.data.options.option_list import OptionList
from w3af.core.data.parsers.doc.open_api import OpenAPI
from w3af.core.data.db.disk_set import DiskSet
from w3af.core.data.kb.info import Info
from w3af.core.controllers.plugins.crawl_plugin import CrawlPlugin
from w3af.core.controllers.core_helpers.fingerprint_404 import is_404
from w3af.core.data.dc.headers import Headers
from w3af.core.data.url.HTTPResponse import HTTPResponse

import os.path


class open_api(CrawlPlugin):
@@ -68,17 +73,23 @@ def __init__(self):
# User configured variables
self._query_string_auth = ''
self._header_auth = ''
self._no_spec_validation = False
self._custom_spec_location = ''

def crawl(self, fuzzable_request):
"""
Try to extract all the API endpoints from various locations.
Try to extract all the API endpoints from various locations
if no custom location is specified.

:param fuzzable_request: A fuzzable_request instance that contains
(among other things) the URL to test.
"""
self._enable_file_name_fuzzing()
self._analyze_common_paths(fuzzable_request)
self._analyze_current_path(fuzzable_request)
if self._has_custom_spec_location():
self._analyze_custom_spec()
else:
self._enable_file_name_fuzzing()
self._analyze_common_paths(fuzzable_request)
self._analyze_current_path(fuzzable_request)

def _enable_file_name_fuzzing(self):
"""
@@ -142,7 +153,19 @@ def _extract_api_calls(self, spec_url):
if is_404(http_response):
return

parser = OpenAPI(http_response)
self._extract_api_calls_from_response(spec_url, http_response)

def _extract_api_calls_from_response(self, spec_url, http_response):
"""
Try to parse an API specification from an HTTP response.
Send all the newly found fuzzable requests to the core
after adding any authentication data that might have been configured.

:param spec_url: The URL of the API specification
:param http_response: An HTTP response
:return: None
"""
parser = OpenAPI(http_response, self._no_spec_validation)
parser.parse()

self._report_to_kb_if_needed(http_response, parser)
@@ -292,6 +315,35 @@ def _analyze_current_path(self, fuzzable_request):
self._spec_url_generator_current_path(fuzzable_request)
)

def _has_custom_spec_location(self):
"""
Checks if the plugin is configured to use a custom API specification
from a local file.

:return: True if the plugin is configured to read a custom API spec
"""
return self._custom_spec_location != ''

def _analyze_custom_spec(self):
"""
Load a custom API specification from a local file and try to parse it.

:return: None
"""
if not self._first_run:
return
self._first_run = False

url = URL('file://%s' % os.path.abspath(self._custom_spec_location))

with open(self._custom_spec_location, 'r') as f:
custom_spec_as_string = f.read()

headers = Headers([('content-type', 'application/json')])
Owner

I don't remember exactly how headers were used (or if they were), but OpenAPI supports both YAML and JSON.

Could you check in the rest of the code whether these headers are used to decide how the contents of the file are parsed? If so, the header should be generated dynamically based on the file extension.

Contributor Author

Correct, both YAML and JSON are supported. It looks like the header is just ignored, or at least it doesn't affect parsing the spec much, because I tested it with YAML and it worked well. I just followed your advice here:

#15087 (comment)

I'll check whether it makes sense to set the header to the correct type, or whether it's fine to drop it.

Contributor Author

I see that OpenAPI.can_parse() checks the Content-Type header, but after some testing I don't think that Content-Type: application/json breaks processing of an Open API spec in YAML format. Nevertheless, I think it would be better to specify the correct content type here, at least to prevent further confusion.
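
One way to pick the header from the file extension, as suggested earlier in this thread, is sketched below. The helper name `guess_spec_content_type` and the `application/yaml` value are assumptions made for illustration; whether `OpenAPI.can_parse()` accepts that value would need to be checked against the w3af code.

```python
import os.path

# Hypothetical mapping from file extension to content type.
# 'application/yaml' is an assumption, not something taken from w3af.
_SPEC_CONTENT_TYPES = {
    '.json': 'application/json',
    '.yaml': 'application/yaml',
    '.yml': 'application/yaml',
}


def guess_spec_content_type(path, default='application/json'):
    """Return a content type for a spec file based on its extension."""
    _, extension = os.path.splitext(path)
    # Unknown extensions fall back to the default the PR currently hardcodes
    return _SPEC_CONTENT_TYPES.get(extension.lower(), default)
```

The result could then replace the hardcoded value, for example `Headers([('content-type', guess_spec_content_type(self._custom_spec_location))])`.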

http_response = HTTPResponse(200, custom_spec_as_string, headers, url, url, _id=1)

self._extract_api_calls_from_response(url, http_response)

def get_options(self):
"""
:return: A list of option objects for this plugin.
@@ -314,6 +366,18 @@ def get_options(self):
o = opt_factory('header_auth', self._header_auth, d, HEADER, help=h)
ol.add(o)

d = 'Disable Open API spec validation'
h = 'By default, the plugin validates the Open API specification before extracting endpoints.'
o = opt_factory('no_spec_validation', self._no_spec_validation, d, BOOL, help=h)
ol.add(o)

d = 'Path to Open API specification'
h = ('By default, the plugin looks for the API specification on the target,'
' but sometimes applications do not provide an API specification.'
' Set this parameter to specify a local path to the API specification.')
o = opt_factory('custom_spec_location', self._custom_spec_location, d, STRING, help=h)
Owner

Take a look at the input types (STRING and BOOL are input types). There is one for input files which will make sure (during the configuration phase) that the file exists.

Owner

Change this one and we're A-OK for merging 👍

Contributor Author

Good point! I'll update it, thanks!

Contributor Author

I updated the type to INPUT_FILE. Now, if the file doesn't exist, the plugin reports it during configuration. Crawling and testing then proceed as usual, but most likely no API will be tested because no API spec was provided. That looks like expected behavior to me, but it can hide configuration issues if w3af is run with a script (-s option). Does w3af have an option to exit with a non-zero code if a configuration step fails? It would help to identify configuration issues (this is probably out of scope for this PR).
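
Until such an option exists, a scripted run could check the spec path itself before starting the scan. The sketch below is a hypothetical stand-alone pre-flight check, not a w3af feature; the names `check_spec_file` and `main` and the exit codes are assumptions.

```python
import os.path
import sys


def check_spec_file(path):
    """Return an error message if the spec file is unusable, else None."""
    if not path:
        return 'No specification path was provided'
    if not os.path.isfile(path):
        return 'Specification file not found: %s' % path
    return None


def main(argv):
    """Return 0 if the spec file exists, 1 if it does not, 2 on bad usage."""
    if len(argv) != 2:
        sys.stderr.write('usage: check_spec.py <spec-file>\n')
        return 2

    error = check_spec_file(argv[1])
    if error is not None:
        sys.stderr.write(error + '\n')
        return 1

    return 0
```

A wrapper script would finish with `sys.exit(main(sys.argv))` and start w3af only when the exit code is 0.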

ol.add(o)

return ol

def set_options(self, options_list):
@@ -326,6 +390,8 @@ def set_options(self, options_list):
"""
self._query_string_auth = options_list['query_string_auth'].get_value()
self._header_auth = options_list['header_auth'].get_value()
self._no_spec_validation = options_list['no_spec_validation'].get_value()
self._custom_spec_location = options_list['custom_spec_location'].get_value()

def get_long_desc(self):
"""
@@ -340,13 +406,18 @@ def get_long_desc(self):
* swagger.json
* openapi.json
* openapi.yaml

The user can also set the Open API specification URL as the scan target
to provide the required information.

To provide the required information, the user can also set
the Open API specification URL as the scan target,
or set the 'custom_spec_location' configuration parameter
to the path of a local file which contains the specification.

Most APIs require authentication, this plugin supports authentication
using query string parameters and HTTP headers. The user can configure
them using these configuration parameters:
* query_string_auth
* header_auth

By default, the plugin validates the Open API specification.
Validation may be disabled with the 'no_spec_validation' configuration parameter.
"""