csv format #2

Open
jimpriest opened this issue Aug 10, 2015 · 3 comments
Comments

@jimpriest

It looks like the original author had ideas for output formats other than plain text. I see HTML as one format in the code.

I was curious how hard it would be to add CSV. It appears I could copy _write_plain_text_report in reporter.py and tweak it?

I'm tinkering with the code now, and if I come up with anything I will send it back.

@bartdag
Owner

bartdag commented Aug 10, 2015

Hi, yes, CSV would make a lot of sense, thanks!

A few guidelines if you wish to contribute to this one (otherwise I could write it, but not this week):

  1. use the Python csv module (it will handle quoting automatically)
  2. if the output format is CSV, the program should write CSV to every output (email, console, file) to keep things simple.
  3. it would be nice to attach the CSV to the email, but that is something I could add later on (e.g., for now, the CSV could just go in the body of the email)
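
To illustrate the first guideline, here is a minimal sketch of why the csv module is safer than formatting rows by hand. It assumes Python 3 (for `io.StringIO` with text), and the row values are made up for the example; they are not taken from the project.

```python
import csv
import io

# Hypothetical row: the status message and the URL both contain characters
# that would break a hand-formatted CSV line.
row = ['not found (404), "broken"', "http://example.com/a,b", "http://example.com/"]

# Naive formatting: the embedded commas silently create extra columns.
naive_line = "{0},{1},{2}".format(*row)

# csv.writer quotes any field that contains a delimiter or a quote character.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
csv_line = buffer.getvalue()

# Reading the line back yields the original three fields.
parsed = next(csv.reader(io.StringIO(csv_line)))
```

Because the writer only needs an object with a `write` method, the same call works for a file, the console, or an in-memory buffer whose contents go into an email body.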

If you have other thoughts or questions, do not hesitate to post them. Thanks!

@jimpriest
Author

I'm hacking on the CSV output, but being new to Python, I'm not sure about a few things.

I basically copied _write_plain_text_report to _write_csv_report and have been modifying from there...

I'm not sure how best to integrate with 'output_files'. I can hack up something with oprint:

    oprint("STATUS,PAGE,PARENT", files=output_files)

    for page in pages.values():
        oprint("{0},{1},".format(
            page.get_status_message(), page.url_split.geturl()),
            files=output_files)
        for source in page.sources:
            oprint(",,{0}".format(source.origin.geturl()), files=output_files)

But as you mentioned we should probably use the CSV module:

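    import csv

    # "wb" is what the csv module expects on Python 2; on Python 3 the file
    # would be opened with open(path, "w", newline="") instead.
    f = open('/home/jpriest/wwwroot/pylinkvalidator3/pylinkvalidator/bin/test.csv', "wb")
    writer = csv.writer(f)
    writer.writerow(('Status', 'Page', 'Parent Page'))
    for page in pages.values():
        # one row per page: status and URL
        writer.writerow((page.get_status_message(), page.url_split.geturl()))
        # one row per referring page, in the third column only
        for source in page.sources:
            writer.writerow(('', '', source.origin.geturl()))
    f.close()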

But it seems like I should integrate with output_file(s) somehow as that is used everywhere else.

Can you offer some guidance on how I might proceed? :)

@bartdag
Owner

bartdag commented Aug 26, 2015

Hi Jim, here are a few stubs to get you started. Your last snippet looks promising:

    # in report(...)
    if config.options.format == FORMAT_PLAIN:
        _write_plain_text_report(site, config, output_files, total_time)
    elif config.options.format == FORMAT_CSV:
        _write_csv_report(site, config, output_files, total_time)

    def _write_csv_report(...):
        csv_writers = [csv.writer(output_file) for output_file in output_files]

        # maybe write a first row/header here

        for page in pages.values():
            # here we just output the result of each url
            # if we want to include the source of each url, i guess we would
            # need to repeat the result and the original url on each row as if we had
            # denormalized/joined multiple database tables
            writerow([page.get_status_message(), page.url_split.geturl()], writers=csv_writers)

    def writerow(row, writers):
        for writer in writers:
            writer.writerow(row)
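
For reference, the denormalized layout mentioned in the comments above could look roughly like the sketch below. It is only an illustration under stated assumptions: it targets Python 3, `Page`, `Source`, and `write_csv_report` are hypothetical stand-ins rather than the project's real classes or functions, and in-memory buffers play the role of `output_files`.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-ins for the crawler's page objects; the real classes
# expose more than this and may differ in detail.
Source = namedtuple("Source", ["origin_url"])
Page = namedtuple("Page", ["status", "url", "sources"])


def write_csv_report(pages, output_files):
    """Write one row per (page, source) pair to every output file."""
    writers = [csv.writer(output_file) for output_file in output_files]

    def writerow(row):
        # Fan the same row out to every destination.
        for writer in writers:
            writer.writerow(row)

    writerow(["Status", "Page", "Parent Page"])
    for page in pages:
        if not page.sources:
            # A page without a known source still gets a row.
            writerow([page.status, page.url, ""])
        for source in page.sources:
            # Status and URL are repeated on each row (denormalized).
            writerow([page.status, page.url, source.origin_url])


pages = [
    Page("not found (404)", "http://example.com/missing",
         [Source("http://example.com/"), Source("http://example.com/about")]),
    Page("ok (200)", "http://example.com/", []),
]

# Two in-memory buffers standing in for the console and a report file.
console, report_file = io.StringIO(), io.StringIO()
write_csv_report(pages, [console, report_file])
```

Repeating the status and URL on every row keeps each line self-contained, so the file can be sorted or filtered in a spreadsheet without losing track of which page a source belongs to.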
