UnicodeDecodeError @ html_file: 'utf8' codec can't decode byte #4219

Closed
1d3df9903ad opened this Issue Jul 25, 2014 · 18 comments

3 participants

@1d3df9903ad

Reproducing this issue

  • URL to reproduce sent by @ST2Labs via email with the subject "Reproducir el error de w3af" ("Reproduce the w3af error")
  • It's possible to reproduce the error using the scan configuration provided by @DavidKutik, but in most cases I hit "Too many consecutive errors" #8698 first

Analysis

Note that this error is raised by a plugin (html_file) but is not caught by the exception handler, so the user gets an ugly unhandled-exception pop-up window.
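A minimal sketch of the kind of guard that would keep one failing output plugin from surfacing as an unhandled exception. It is written for Python 3 (the report below is from Python 2.7), and `OutputManager`, `BrokenHtmlPlugin` and `ConsolePlugin` are hypothetical, simplified stand-ins, not w3af's actual classes.

```python
import logging

log = logging.getLogger("output_manager_sketch")


class OutputManager:
    """Hypothetical, simplified stand-in for an output manager."""

    def __init__(self, plugins):
        self.plugins = plugins

    def end_output_plugins(self):
        """Call end() on every plugin; one failure must not stop the rest."""
        errors = []
        for plugin in self.plugins:
            try:
                plugin.end()
            except Exception as exc:  # log and continue instead of crashing the GUI
                log.error("Output plugin %s failed in end(): %r",
                          type(plugin).__name__, exc)
                errors.append((type(plugin).__name__, exc))
        return errors


class BrokenHtmlPlugin:
    """Hypothetical plugin that fails the way html_file did."""

    def end(self):
        b"\x80".decode("utf-8")  # raises UnicodeDecodeError


class ConsolePlugin:
    """Hypothetical plugin that finishes cleanly."""

    def __init__(self):
        self.ended = False

    def end(self):
        self.ended = True


console = ConsolePlugin()
errors = OutputManager([BrokenHtmlPlugin(), console]).end_output_plugins()
print(errors[0][0], console.ended)  # BrokenHtmlPlugin True
```

Catching `Exception` broadly is a deliberate choice here: at scan end, reporting the failure is more useful than aborting the remaining output plugins.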

Version Information

  Python version: 2.7.3 (default, Mar 13 2014, 11:03:55) [GCC 4.7.2]
  GTK version: 2.24.10
  PyGTK version: 2.24.0
  w3af version:
    w3af - Web Application Attack and Audit Framework
    Version: 1.6.0.4
    Distribution: Kali Linux
    Author: Andres Riancho and the w3af team.

#8688 also confirmed this issue in:

  Python version: 2.7.8 (default, Oct 20 2014, 15:05:19) [GCC 4.9.1]
  GTK version: 2.24.25
  PyGTK version: 2.24.0
  w3af version:
    w3af - Web Application Attack and Audit Framework
    Version: 1.6.45
    Revision: b7cffaa62a - 26 Feb 2015 13:30
    Branch: master
    Local changes: No
    Author: Andres Riancho and the w3af team.

Traceback

Traceback (most recent call last):
  File "/usr/share/w3af/w3af/core/ui/gui/main.py", line 595, in start_scan_wrap
    real_scan_start()
  File "/usr/share/w3af/w3af/core/ui/gui/main.py", line 586, in real_scan_start
    self.w3af.start()
  File "/usr/share/w3af/w3af/core/controllers/w3afCore.py", line 222, in start
    self.scan_end_hook()
  File "/usr/share/w3af/w3af/core/controllers/w3afCore.py", line 396, in scan_end_hook
    om.out.end_output_plugins()
  File "/usr/share/w3af/w3af/core/controllers/output_manager.py", line 138, in end_output_plugins
    self.__end_output_plugins_impl()
  File "/usr/share/w3af/w3af/core/controllers/output_manager.py", line 149, in __end_output_plugins_impl
    o_plugin.end()
  File "/usr/share/w3af/w3af/plugins/output/html_file.py", line 309, in end
    severity))
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 1654: invalid start byte
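The last line of the traceback can be reproduced in isolation: 0x80 is a UTF-8 continuation byte, so it can never start a character, which is what "invalid start byte" means. A sketch in Python 3 (which spells the codec 'utf-8' in the message, where Python 2.7 prints 'utf8'); the sample payload is made up. It decodes strictly and then with error handlers that substitute the bad byte instead of raising.

```python
# Hypothetical payload: ASCII text with a lone 0x80 byte in the middle
# (0x80 is the euro sign in Windows-1252, but invalid as a UTF-8 start byte).
raw = b"Price: \x80 10"

try:
    raw.decode("utf-8")  # strict decoding (the default) raises
except UnicodeDecodeError as exc:
    error_reason, error_position = exc.reason, exc.start
    print(exc)  # 'utf-8' codec can't decode byte 0x80 in position 7: invalid start byte

# Lenient alternatives that never raise for this input:
replaced = raw.decode("utf-8", "replace")          # bad byte becomes U+FFFD
escaped = raw.decode("utf-8", "backslashreplace")  # bad byte becomes the text \x80
cp1252 = raw.decode("cp1252")                      # only correct if the data really is cp1252
```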
@andresriancho andresriancho changed the title from [Auto-Generated] Bug Report - UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 1654: invalid start byte to html output file - UnicodeDecodeError: 'utf8' codec can't decode byte Jul 25, 2014
@andresriancho andresriancho changed the title from html output file - UnicodeDecodeError: 'utf8' codec can't decode byte to UnicodeDecodeError @ html_file: 'utf8' codec can't decode byte Mar 4, 2015
@DavidKutik

Well, it indeed has nothing to do with the output file name; sorry about that.

I was also able to reproduce the bug when scanning DVWA using only these plugins:
audit - rfd
auth - detailed
crawl - web_spider
output - console, html_file

but even then, only about 5/6 times.
I noticed that in between it says: "warning: xdot version 1.7, but supported is 1.6"

@andresriancho
Owner

@DavidKutik thanks for confirming that the filename has nothing to do with it 👍

Regarding your configuration and the steps to reproduce:

  • Could you please save the scan configuration to a profile and then copy+paste the file here? The profile file should be located at ~/.w3af/profiles/profile-name.pw3af and it looks like an INI file.
  • What about DVWA? Which version of that are you running?
  • Are you running DVWA in Kali?
  • How did you install DVWA?

The more information I have, the closer I can get to your environment and the easier it is to fix the issue. Thanks!

I noticed that in between it says: "warning: xdot version 1.7, but supported is 1.6"

This doesn't seem related, but thanks for mentioning it.

@DavidKutik

I am using DVWA version 1.8 (release date: 11/01/2011) from the owaspbwa VM 1.1.1 (2013.09.28).

@DavidKutik
[profile]
description = output_html_error
name = test_error

[grep.error_pages]

[crawl.web_spider]
only_forward = True
follow_regex = .*dvwa.*
ignore_regex = .*logout.*|.*setup.*|.*security\.php.*|^((?!dvwa).)*$

[auth.detailed]
username = user
password = user
username_field = username
password_field = password
auth_url = http://192.168.121.141/dvwa/login.php
check_url = http://192.168.121.141/dvwa/index.php
check_string = Welcome
data_format = %u=%U&%p=%P&Login=Login
method = POST

[output.html_file]
output_file = ~/w3af_output_html_utf8_error.html
verbose = True

[output.console]
verbose = True

[target]
target = http://192.168.121.141/dvwa/

[misc-settings]
fuzz_cookies = False
fuzz_form_files = True
fuzz_url_filenames = False
fuzz_url_parts = False
fuzzed_files_extension = gif
fuzzable_headers =
form_fuzzing_mode = tmb
stop_on_first_exception = False
max_discovery_time = 120
interface = ppp0
local_ip_address = None
non_targets =
msf_location = /opt/metasploit3/bin/

[http-settings]
timeout = 15
headers_file =
basic_auth_user =
basic_auth_passwd =
basic_auth_domain =
ntlm_auth_domain =
ntlm_auth_user =
ntlm_auth_passwd =
ntlm_auth_url =
cookie_jar_file =
ignore_session_cookies = False
proxy_port = 8080
proxy_address =
user_agent = w3af.org
rand_user_agent = False
max_file_size = 400000
max_http_retries = 2
max_requests_per_second = 0
always_404 =
never_404 =
string_match_404 =
url_parameter =

[audit.rfd]
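Because a .pw3af profile is INI-style, it can be inspected with Python's standard configparser. A minimal sketch, assuming Python 3; the profile text is a trimmed copy of the one above, and `enabled_plugins` is a hypothetical helper that assumes every `type.name` section is an enabled plugin. Interpolation is disabled because values such as `data_format = %u=%U&%p=%P&Login=Login` contain `%` characters that the default interpolation would reject.

```python
import configparser

# Trimmed copy of the profile above (illustrative subset only).
PROFILE = """
[profile]
description = output_html_error
name = test_error

[grep.error_pages]

[crawl.web_spider]
only_forward = True
follow_regex = .*dvwa.*

[auth.detailed]
data_format = %u=%U&%p=%P&Login=Login
method = POST

[output.html_file]
output_file = ~/w3af_output_html_utf8_error.html
verbose = True

[output.console]
verbose = True

[target]
target = http://192.168.121.141/dvwa/

[audit.rfd]
"""


def enabled_plugins(parser):
    """Hypothetical helper: group every 'type.name' section by plugin type."""
    plugins = {}
    for section in parser.sections():
        if "." in section:
            plugin_type, plugin_name = section.split(".", 1)
            plugins.setdefault(plugin_type, []).append(plugin_name)
    return plugins


# interpolation=None stops '%u=%U...' from being parsed as interpolation syntax.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(PROFILE)

print(enabled_plugins(parser))
print(parser.get("auth.detailed", "data_format"))
```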
@andresriancho
Owner

Amazing! Thanks for all your help, I'll increase the priority of this issue so I can fix it soon(ish).

@DavidKutik

The ignore_regex value, without the spaces, is:
ignore_regex = .*logout.*|.*setup.*|.*security\.php.*|^((?!dvwa).)*$
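A quick way to sanity-check the ignore_regex value outside w3af is to run it against a few DVWA URLs. A sketch in Python 3; whether web_spider applies the regex with `re.match` or `re.search` is an assumption here, and the sample URLs are illustrative. The last alternative, `^((?!dvwa).)*$`, matches any URL that does not contain "dvwa" at all.

```python
import re

# ignore_regex from the profile above, without the spaces.
IGNORE_REGEX = re.compile(r".*logout.*|.*setup.*|.*security\.php.*|^((?!dvwa).)*$")

urls = [
    "http://192.168.121.141/dvwa/index.php",
    "http://192.168.121.141/dvwa/logout.php",
    "http://192.168.121.141/dvwa/security.php",
    "http://192.168.121.141/dvwa/vulnerabilities/sqli/",
    "http://192.168.121.141/mutillidae/",
]

# Assumption: a URL is skipped when the regex matches starting at its first character.
results = {url: IGNORE_REGEX.match(url) is not None for url in urls}

for url, ignored in results.items():
    print("ignored" if ignored else "crawled", url)
```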

@andresriancho andresriancho self-assigned this Mar 4, 2015
@andresriancho
Owner

Setup environment

  • Change the IP address from 192.168.121.141 to whatever I get from the VM

Reproduce the issue

  • Scan 1: PASS
  • Scan 2: PASS
  • Scan 3: PASS
  • Scan 4: enabled all grep plugins to see if I can reach the line of code where html_file fails. The scans I'm running don't find any vulnerabilities, which makes it impossible to reach the buggy line:
            self._write_to_file(information_row % (color, i_class,
                                                   port,
                                                   desc,
                                                   escaped_url,
                                                   severity))

Scan 4 didn't trigger the bug, but it found more than 10 vulnerabilities.

  • Scan 5 (same as above): PASS
  • Scan 6 (same as above): PASS
  • Scan 7 (same as above): PASS
  • Scan 8, enabled all audit plugins: ...
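Once a scan does reach the `self._write_to_file(information_row % (...))` call, a small diagnostic can check each of the six values passed into information_row and print the repr() of any byte string that is not valid UTF-8. A sketch in Python 3; `find_undecodable` and the sample values are hypothetical. Under Python 2, `%` formatting implicitly decodes byte strings when they are mixed with unicode strings, which is one plausible route to the traceback above.

```python
def find_undecodable(fields, encoding="utf-8"):
    """Return (name, repr) for every bytes value that fails strict decoding."""
    bad = []
    for name, value in fields.items():
        if isinstance(value, bytes):
            try:
                value.decode(encoding)
            except UnicodeDecodeError:
                bad.append((name, repr(value)))
    return bad


# Hypothetical values mirroring the information_row arguments.
fields = {
    "color": "#FFFF00",
    "i_class": "Information",
    "port": "80",
    "desc": b"Response contains \x80\x81 bytes",
    "escaped_url": "http://192.168.121.141/dvwa/index.php",
    "severity": "Low",
}

for name, value_repr in find_undecodable(fields):
    print(name, value_repr)
```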

Reproduce the issue (2)

  • Install Kali
  • Install w3af
  • Run a scan from the Kali installation against DVWA; my workstation doesn't seem to be affected by this issue.

Fix

  • Identify which vulnerability triggers the bug in html_file.py, line 309, in end()
  • repr() the data that triggers the error
  • Write a quick unittest which triggers the error
  • Properly encode the string
  • Assert that the unittest PASSes
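The unittest steps above can be sketched as follows, assuming Python 3. `INFORMATION_ROW`, `to_text` and `render_information_row` are hypothetical names; the plugin was eventually rewritten to use jinja2 templates, so this is not the committed fix. Decoding with "replace" is one choice among several (for example, honoring the page's declared charset, or "backslashreplace" to keep the original byte visible), and the sketch does not HTML-escape the description.

```python
import unittest

# Hypothetical stand-in for html_file's information_row template (six fields).
INFORMATION_ROW = ('<tr><td style="color: %s">%s</td><td>%s</td>'
                   '<td>%s</td><td>%s</td><td>%s</td></tr>')


def to_text(value, encoding="utf-8"):
    """Decode bytes leniently; convert anything else with str()."""
    if isinstance(value, bytes):
        return value.decode(encoding, "replace")
    return str(value)


def render_information_row(color, i_class, port, desc, escaped_url, severity):
    """Hypothetical fix: normalize every field to text before formatting."""
    values = (color, i_class, port, desc, escaped_url, severity)
    return INFORMATION_ROW % tuple(to_text(v) for v in values)


class InformationRowEncodingTest(unittest.TestCase):

    def test_invalid_utf8_in_description(self):
        # The failing byte from the traceback: 0x80 is an invalid start byte.
        row = render_information_row("red", "Vuln", 80, b"bad \x80 byte",
                                     "http://example.com/", "High")
        self.assertIsInstance(row, str)
        self.assertIn("bad \ufffd byte", row)

    def test_plain_text_is_unchanged(self):
        row = render_information_row("red", "Vuln", 80, "ok", "/x", "High")
        self.assertIn("<td>ok</td>", row)
```

Saved as a module, it can be run with `python -m unittest <module>`.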
@andresriancho
Owner

@DavidKutik could you please send me a screenshot showing how your knowledge base looks after finishing the scan? Mine looks like this (I've enabled more plugins than the ones you had in the original profile, so don't worry about the difference):

[Screenshot from 2015-03-04 15 55 22]

@andresriancho
Owner

@DavidKutik also please send me the output of running env in the same console where you run w3af_gui. This will help me understand whether you have Kali installed in another language or have some unusual encoding configured.

Thanks for all your help!

@DavidKutik

XDG_VTNR=7
LC_PAPER=de_DE.UTF-8
LC_ADDRESS=de_DE.UTF-8
XDG_SESSION_ID=c2
XDG_GREETER_DATA_DIR=/var/lib/lightdm-data/cw
LC_MONETARY=de_DE.UTF-8
SAL_USE_VCLPLUGIN=gtk
CLUTTER_IM_MODULE=xim
SESSION=Lubuntu
GPG_AGENT_INFO=/run/user/1000/keyring-EGOlRk/gpg:0:1
XDG_MENU_PREFIX=lxde-
SHELL=/bin/bash
TERM=xterm
LC_NUMERIC=de_DE.UTF-8
UPSTART_SESSION=unix:abstract=/com/ubuntu/upstart-session/1000/1972
USER=cw
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.Z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.jpg=01;35:.jpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.axv=01;35:.anx=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.axa=00;36:.oga=00;36:.spx=00;36:.xspf=00;36:
LC_TELEPHONE=de_DE.UTF-8
XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0
XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
SSH_AUTH_SOCK=/run/user/1000/keyring-EGOlRk/ssh
FTP_PROXY=$ftp_proxy
DEFAULTS_PATH=/usr/share/gconf/Lubuntu.default.path
XDG_CONFIG_DIRS=/etc/xdg/lubuntu:/etc/xdg/xdg-Lubuntu:/usr/share/upstart/xdg:/etc/xdg
DESKTOP_SESSION=Lubuntu
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
QT_IM_MODULE=xim
LC_IDENTIFICATION=de_DE.UTF-8
XDG_SESSION_TYPE=x11
PWD=/home/cw
JOB=dbus
XMODIFIERS=@im=ibus
LANG=en_US.UTF-8
GDM_LANG=en_US
MANDATORY_PATH=/usr/share/gconf/Lubuntu.mandatory.path
LC_MEASUREMENT=de_DE.UTF-8
IM_CONFIG_PHASE=1
GDMSESSION=Lubuntu
HTTPS_PROXY=$https_proxy
_LXSESSION_PID=2111
SESSIONTYPE=lxsession
SHLVL=1
HOME=/home/cw
XDG_SEAT=seat0
LANGUAGE=en_US
XDG_CONFIG_HOME=/home/cw/.config
HTTP_PROXY=$http_proxy
UPSTART_INSTANCE=
UPSTART_EVENTS=started xsession
XDG_SESSION_DESKTOP=Lubuntu
LOGNAME=cw
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-g3ALY5rV2f
XDG_DATA_DIRS=/etc/xdg/lubuntu:/usr/local/share:/usr/share:/usr/share/gdm:/var/lib/menu-xdg:/usr/share/Lubuntu:/usr/local/share/:/usr/share/
QT4_IM_MODULE=xim
LESSOPEN=| /usr/bin/lesspipe %s
TEXTDOMAIN=im-config
INSTANCE=
UPSTART_JOB=lxsession
XDG_RUNTIME_DIR=/run/user/1000
DISPLAY=:0
XDG_CURRENT_DESKTOP=LXDE
GTK_IM_MODULE=xim
LESSCLOSE=/usr/bin/lesspipe %s %s
LC_TIME=de_DE.UTF-8
TEXTDOMAINDIR=/usr/share/locale/
LC_NAME=de_DE.UTF-8
XAUTHORITY=/home/cw/.Xauthority
_=/usr/bin/env

@DavidKutik

It's Lubuntu 14.10 Desktop 64-bit, in English with a German keyboard.
Linux cw-2 3.16.0-31-generic #41-Ubuntu SMP Tue Feb 10 15:24:04 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

@DavidKutik

If you want to render HTML responses, you need to install at least one of rendering engines: python-webkit, python-gtkmozembed, python-gtkhtml2

@DavidKutik

I installed python-webkit; the error still occurs.
[Screenshot: kb]

@andresriancho
Owner

Thanks for the quick answers.

It's a Lubuntu 14.10 Desktop 64-bit in English and German-Keyboard

OK, that explains some of the variables in your env that refer to de, for example LC_NAME=de_DE.UTF-8. But LANG is still LANG=en_US.UTF-8, which is the same as mine. I think Python uses that as the default encoding; I'll have to research it a bit.

[Quoted image: knowledge base screenshot]

Hmmm... that output can't be generated using the profile you sent me. For example, the un_ssl key in the knowledge base is added by the audit.un_ssl plugin, which was not part of the profile you originally sent. So, the question is: does the profile in this comment reproduce the error for you?

@andresriancho
Owner

What's returned for you when running this command?

[pablo:/w3af] develop(+5/-5) ± python -c 'import sys; print sys.getfilesystemencoding()'
UTF-8
[pablo:/w3af] develop(+5/-5) ± python -c 'import locale; print locale.getpreferredencoding(False)'
ANSI_X3.4-1968
[pablo:/w3af] develop(+5/-5) ±
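The two one-liners above, plus `sys.getdefaultencoding()` (the codec used for implicit bytes/text conversion), can be collected into one report. A sketch that runs under Python 3, so its output will not match the Python 2 results in this thread: Python 3 sets LC_CTYPE from the environment at startup, so `locale.getpreferredencoding(False)` generally reflects LANG/LC_* there rather than printing ANSI_X3.4-1968. Comparing the `False` and `True` variants shows whether a difference comes from setlocale() not having been called.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings that influence how Python converts bytes and text."""
    return {
        # Codec for implicit bytes/text conversion: 'ascii' by default on
        # Python 2, always 'utf-8' on Python 3.
        "default": sys.getdefaultencoding(),
        # Codec used for file names.
        "filesystem": sys.getfilesystemencoding(),
        # Locale codec without calling setlocale() first.
        "preferred_no_setlocale": locale.getpreferredencoding(False),
        # Locale codec after temporarily applying the LANG/LC_* settings.
        "preferred_with_setlocale": locale.getpreferredencoding(True),
    }


for name, value in encoding_report().items():
    print(name, value)
```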
@DavidKutik

The kb screenshot is from another profile with more (nearly all) audit plugins enabled.
With the output_html_error profile the knowledge base was empty last night, since the target is not
vulnerable to RFD.
By the way, I just reran the test to confirm this, and the error didn't occur.

cw@cw-2:~$ python -c 'import sys; print sys.getfilesystemencoding()'
UTF-8
cw@cw-2:~$ python -c 'import locale; print locale.getpreferredencoding(False)'
ANSI_X3.4-1968

@andresriancho andresriancho added a commit that referenced this issue Mar 13, 2015
@andresriancho * html_file: Improve plugin to use jinja2 templates #8866
* UnicodeDecodeError @ html_file: 'utf8' codec can't decode byte #4219
941c97e
@andresriancho
Owner

@DavidKutik thanks for all your help with this issue, I've finally "fixed it" by refactoring the plugin. More information here:

http://w3af.org/not-a-web-designer

Please test the new plugin to make sure it works as expected, and report any new issues if it doesn't.
