Better Error Reporting #259

rhunwicks · 2015-05-06T06:45:31Z

At the moment, when an error occurs with raise_errors = False we store the traceback and the error message and show it to the user.

However, we don't show the actual data in the row that caused the error, which makes it hard to troubleshoot what went wrong. This is particularly true if the dataset has been manipulated in before_import; for example, I have a lot of imports where I have to un-crosstab an Excel spreadsheet before importing it, so the rows in the dataset don't match the rows in the spreadsheet.

One simple solution is to allow the results.Error class to store the row:

class Error(object):

    def __init__(self, error, traceback=None, row=None):
        self.error = error
        self.traceback = traceback
        self.row = row

And then for import_data to save it:

            except Exception as e:
                tb_info = traceback.format_exc(2)
                row_result.errors.append(Error(e, tb_info, row))

And then finally for the template to display it:

      {% for line, errors in result.row_errors %}
        {% for error in errors %}
          <li>
            {% trans "Line number" %}: {{ line }} - {{ error.error }}
            <div>{{ error.row }}</div>
            <div class="traceback">{{ error.traceback|linebreaks }}</div>

This works for me without any special effort to coerce the row to a string. I get output like:

Line number: 91 - 'RelatedManager' object has no attribute 'geographicunitsetversion'
OrderedDict([(u'geographicunitsetversion', <GeographicUnitSetVersion: Standard Admin Units: null - null>), (u'geographicunitset', <GeographicUnitSet: Standard Admin Units>), (u'end', None), (u'start', None), (u'data_source_organization', None), (u'data_source_document', None), (u'parent_code', u'21106'), (u'unit_type', u'admin2'), (u'name', u'St Georges County'), (u'code', u'2110605'), (u'set', u'Standard Admin Units'), (u'admin0', u'USA'), (u'fnid0', u'US'), (u'full_name', None), (u'country', u'US'), (u'fnid1', u'21106'), (u'fnid2', u'2110605'), (u'fnid3', u'0'), (u'fnid4', u'0'), (u'admin1', u'MD'), (u'admin2', u'St Georges County'), (u'admin3', u''), (u'admin4', u'')])
Traceback (most recent call last):
File "/home/roger/.virtualenvs/fdw/local/lib/python2.7/site-packages/import_export/resources.py", line 361, in import_data
if self.skip_row(instance, original):
File "/home/roger/.virtualenvs/fdw/local/lib/python2.7/site-packages/import_export/resources.py", line 266, in skip_row
if field.get_value(instance) != field.get_value(original):
AttributeError: 'RelatedManager' object has no attribute 'geographicunitsetversion'

That output is much more useful to me than the default.

The text was updated successfully, but these errors were encountered:

bmihelac · 2015-05-06T09:29:28Z

Looks good to me, care to make PR?

rhunwicks · 2015-05-07T05:20:56Z

Sure, can I suggest another improvement too.

I currently have a problem where I get an exception in a before_import which is added to result.base_errors but not raised because raise_errors = False, and then all subsequent updates fail because of a TransactionManagementError - i.e. Postgresql won't allow any more updates because of an error in the transaction, which was the original error in the before_import. The second error is raised (it's in a different Resource with raise_errors = True) but doesn't mean anything, but I no longer have access to the result object.

I'd like to add a Null logger to the resources.py and then use logger.exception() everywhere we have an exception handler. That should have no impact on the existing code at all, but would allow users who have added a custom logger to see all the exceptions that occurred.

rhunwicks · 2015-05-07T05:51:31Z

Also, I noticed that in the import.html template we show the traceback for row errors but not for base errors. I'd like to correct that, unless there was a specific reason for not showing them.

rhunwicks · 2015-05-07T12:08:54Z

fwiw, with the exception logging place, the test output now looks like:

~/git/django-import-export$ ./runtests.sh 
Creating test database for alias 'default'...
......................................................ERROR:root:Column 'id': invalid literal for int() with base 10: 'foo'
Traceback (most recent call last):
  File "/home/roger/git/django-import-export/import_export/resources.py", line 355, in import_data
    instance, new = self.get_or_init_instance(instance_loader, row)
  File "/home/roger/git/django-import-export/import_export/resources.py", line 186, in get_or_init_instance
    instance = self.get_instance(instance_loader, row)
  File "/home/roger/git/django-import-export/import_export/resources.py", line 183, in get_instance
    return instance_loader.get_instance(row)
  File "/home/roger/git/django-import-export/import_export/instance_loaders.py", line 32, in get_instance
    params[field.attribute] = field.clean(row)
  File "/home/roger/git/django-import-export/import_export/fields.py", line 59, in clean
    raise ValueError("Column '%s': %s" % (self.column_name, e))
ValueError: Column 'id': invalid literal for int() with base 10: 'foo'
..ERROR:root:Column 'birthday': Enter a valid date/time.
Traceback (most recent call last):
  File "/home/roger/git/django-import-export/import_export/resources.py", line 373, in import_data
    self.import_obj(instance, row, real_dry_run)
  File "/home/roger/git/django-import-export/import_export/resources.py", line 238, in import_obj
    self.import_field(field, obj, data)
  File "/home/roger/git/django-import-export/import_export/resources.py", line 230, in import_field
    field.save(obj, data)
  File "/home/roger/git/django-import-export/import_export/fields.py", line 94, in save
    setattr(obj, self.attribute, self.clean(data))
  File "/home/roger/git/django-import-export/import_export/fields.py", line 59, in clean
    raise ValueError("Column '%s': %s" % (self.column_name, e))
ValueError: Column 'birthday': Enter a valid date/time.
....................
----------------------------------------------------------------------
Ran 76 tests in 0.767s

OK
Destroying test database for alias 'default'...

I.e. the tests all run successfully but the errors get logged to the console. Is that OK?

rhunwicks · 2015-05-07T12:18:04Z

I can suppress the exception logging when running tests by adding to settings.py

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'handlers': {
        'console': {
            'class': 'logging.NullHandler'
        }
    },
    'root': {
        'handlers': ['console'],
    }}

int-ua · 2015-07-09T20:56:39Z

Please add some comment here when this code is released, it should help me a lot. Maybe add a "released" label?

bmihelac closed this as completed in 0b144b6 May 21, 2015

int-ua mentioned this issue Jul 13, 2015

How to import CSV file without ID column? #299

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better Error Reporting #259

Better Error Reporting #259

rhunwicks commented May 6, 2015

bmihelac commented May 6, 2015

rhunwicks commented May 7, 2015

rhunwicks commented May 7, 2015

rhunwicks commented May 7, 2015

rhunwicks commented May 7, 2015

int-ua commented Jul 9, 2015

Better Error Reporting #259

Better Error Reporting #259

Comments

rhunwicks commented May 6, 2015

bmihelac commented May 6, 2015

rhunwicks commented May 7, 2015

rhunwicks commented May 7, 2015

rhunwicks commented May 7, 2015

rhunwicks commented May 7, 2015

int-ua commented Jul 9, 2015