csv format #2

jimpriest · 2015-08-10T17:51:41Z

It looks like the original author had ideas for output formats other than plain text. I see HTML as one format in the code.

I was curious how hard it would be to add CSV? It appears I could copy _write_plain_text_report in reporter.py and tweak?

I'm tinkering with the code now and if I come up with anything will send it back.

bartdag · 2015-08-10T18:37:46Z

Hi, yes, CSV would make a lot of sense, thanks!

A few guidelines if you wish to contribute to this one (otherwise I could write it, but not this week):

- Use the Python csv module (it will handle automatic quoting gracefully).
- If the output format is CSV, the program should write CSV to every output (email, console, file) to keep things simple.
- It would be nice to attach the CSV to the email, but that is something I could add later on (for now, the CSV could simply go in the body of the email).
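
To illustrate the first guideline, here is a minimal sketch (Python 3) of how the csv module quotes fields on its own. The status message and URLs are made up for illustration and are not output from the tool, and `rows_to_csv` is a hypothetical helper name.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to a CSV string using the csv module."""
    # newline="" lets the csv module control line endings itself.
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data: the status contains a comma and double quotes,
# and the page URL contains a comma.
sample = [
    ("Status", "Page", "Parent Page"),
    ('error, "timeout"', "http://example.com/a?x=1,2", "http://example.com/"),
]
csv_text = rows_to_csv(sample)
print(csv_text)
```

Only the fields that need quoting get quoted, and embedded double quotes are doubled, so the text can be read back with `csv.reader` without losing anything. A string built this way could also serve as the email body mentioned in the last guideline.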

If you have other thoughts or questions, do not hesitate to post them. Thanks!

jimpriest · 2015-08-25T15:05:24Z

I'm hacking on the CSV output but, being new to Python, I'm not sure about a few things.

I basically copied _write_plain_text_report to _write_csv_report and have been modifying from there...

I'm not sure how best to integrate with 'output_files'. I can hack up something with oprint:

    oprint("STATUS,PAGE,PARENT", files=output_files)

    for page in pages.values():
        oprint("{0},{1},".format(
            page.get_status_message(), page.url_split.geturl()),
            files=output_files)
        for source in page.sources:
            oprint(",,{0}".format(source.origin.geturl()), files=output_files)

But as you mentioned we should probably use the CSV module:

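    import csv

    # "wb" is the right mode for the csv module on Python 2;
    # on Python 3, use open(path, "w", newline="") instead.
    with open('/home/jpriest/wwwroot/pylinkvalidator3/pylinkvalidator/bin/test.csv', "wb") as f:
        writer = csv.writer(f)
        writer.writerow(('Status', 'Page', 'Parent Page'))
        for page in pages.values():
            # Pad the parent column so every row has three fields.
            writer.writerow((page.get_status_message(), page.url_split.geturl(), ''))
            for source in page.sources:
                writer.writerow(('', '', source.origin.geturl()))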

But it seems like I should integrate with output_file(s) somehow as that is used everywhere else.

Can you offer some guidance on how I might proceed? :)

bartdag · 2015-08-26T09:51:52Z

Hi Jim, your last snippet looks promising. Here are a few stubs to get you started:

    # in report(...)
    if config.options.format == FORMAT_PLAIN:
        _write_plain_text_report(site, config, output_files, total_time)
    elif config.options.format == FORMAT_CSV:
        _write_csv_report(site, config, output_files, total_time)

    def _write_csv_report(...):
        csv_writers = [csv.writer(output_file) for output_file in output_files]

        # maybe write a first row/header here

        for page in pages.values():
            # here we just output the result of each url
            # if we want to include the source of each url, I guess we would
            # need to repeat the result and the original url on each row as if we had
            # denormalized/joined multiple database tables
            writerow([page.get_status_message(), page.url_split.geturl()], writers=csv_writers)

    def writerow(row, writers):
        for writer in writers:
            writer.writerow(row)
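
The denormalized layout described in the comments of the stub above could look like the following sketch (Python 3). `Page` and `Source` are hypothetical stand-ins built with `namedtuple` that carry plain strings instead of the project's real page objects, and `denormalized_rows` and `write_csv_report` are illustrative names, not part of pylinkvalidator.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-ins for the crawler's page objects (assumption).
Source = namedtuple("Source", ["origin"])
Page = namedtuple("Page", ["status", "url", "sources"])


def denormalized_rows(pages):
    """Yield one (status, url, parent) row per source of each page."""
    for page in pages:
        if not page.sources:
            # Keep pages without a known parent, e.g. the start URL.
            yield (page.status, page.url, "")
        for source in page.sources:
            yield (page.status, page.url, source.origin)


def write_csv_report(pages, output_files):
    """Write the same CSV rows to every file-like object in output_files."""
    writers = [csv.writer(output_file) for output_file in output_files]
    rows = [("Status", "Page", "Parent Page")]
    rows.extend(denormalized_rows(pages))
    for row in rows:
        for writer in writers:
            writer.writerow(row)


# Made-up data: one broken link referenced from two parent pages.
pages = [
    Page("ok", "http://example.com/", []),
    Page("not found (404)", "http://example.com/missing",
         [Source("http://example.com/"), Source("http://example.com/about")]),
]
console, report_file = io.StringIO(), io.StringIO()
write_csv_report(pages, [console, report_file])
print(console.getvalue())
```

Repeating the status and URL on each row costs some redundancy, but every row is then self-contained, which makes the file easy to sort or filter in a spreadsheet.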
