Exporting to CSV after python parse #105

BlvdJoe · 2015-08-06T15:14:30Z

First of all - Love this tremendously. Second, I've been trying to use usaddress together with the csv module and pandas to import addresses from a CSV, parse the rows, and then export the rows back out to a CSV.

However, when I run the code listed below, it (1) doesn't parse the column headers correctly and (2) doesn't then parse the fields correctly. This approach was recommended to me over on StackOverflow, so please let me know if (1) this approach is silly and/or (2) I'm merely suffering from noob-itis as I am, in fact, terribly new to Python.

import pandas as pd
import usaddress
from pandas import DataFrame

data = ['Robie House, 5757 South Woodlawn Avenue, Chicago, IL 60637',
    'State & Lake, Chicago']

tagged_addresses = [usaddress.parse(line) for line in data]

address_df = pd.DataFrame(tagged_addresses)

address_df.to_csv('moooutput.csv')

cathydeng · 2015-08-06T17:36:47Z

hey @BlvdJoe!

for what you're trying to do, the tag method would make more sense than the parse method. also, great timing with this question because I just wrote some code to do this task yesterday:

import csvkit
import usaddress

# expected format in input.csv: first column 'id', second column 'address'
with open('input.csv', 'rU') as f:
    reader = csvkit.DictReader(f)

    all_rows = []
    for row in reader:
        try:
            parsed_addr = usaddress.tag(row['address'])
            row_dict = parsed_addr[0]
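        except usaddress.RepeatedLabelError:
            # tag() raises this when the same label shows up more than once in one address
            row_dict = {'error':'True'}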

        row_dict['id'] = row['id']
        all_rows.append(row_dict)

field_list = ['id','AddressNumber', 'AddressNumberPrefix', 'AddressNumberSuffix', 'BuildingName', 
              'CornerOf','IntersectionSeparator','LandmarkName','NotAddress','OccupancyType',
              'OccupancyIdentifier','PlaceName','Recipient','StateName','StreetName',
              'StreetNamePreDirectional','StreetNamePreModifier','StreetNamePreType',
              'StreetNamePostDirectional','StreetNamePostModifier','StreetNamePostType',
              'SubaddressIdentifier','SubaddressType','USPSBoxGroupID','USPSBoxGroupType',
              'USPSBoxID','USPSBoxType','ZipCode', 'error']

with open('output.csv', 'wb') as outfile:
    writer = csvkit.DictWriter(outfile, field_list)
    writer.writeheader()
    writer.writerows(all_rows)

I just used csvkit to read from & write to csv. for the output, I also added an error column to record when addresses encounter a RepeatedLabelError.
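For anyone reading this on Python 3: the snippet above looks like it targets Python 2 (the `'rU'` and `'wb'` file modes), so here is a rough sketch of the same idea using only the standard `csv` module. It assumes the same input layout (an `id` column and an `address` column). The helper names `tag_rows`, `column_order` and `convert` are made up for this example and are not part of usaddress. The tagger is passed in as an argument; the only thing assumed about it is that it returns a `(components, address_type)` pair the way `usaddress.tag` does. Unlike the code above, this sketch builds the header from the labels it actually sees rather than a fixed field list, and it writes the exception's class name into the `error` column.

```python
import csv


def tag_rows(rows, tagger, id_field="id", address_field="address"):
    """Tag each row's address and return one flat dict per input row.

    `tagger` is any callable returning (components, address_type),
    which is the shape usaddress.tag returns.
    """
    tagged = []
    for row in rows:
        try:
            components, _address_type = tagger(row[address_field])
            out = dict(components)
            out["error"] = ""
        except Exception as exc:  # e.g. usaddress.RepeatedLabelError
            # Record the failure instead of dropping the row.
            out = {"error": type(exc).__name__}
        out[id_field] = row[id_field]
        tagged.append(out)
    return tagged


def column_order(tagged, id_field="id"):
    """Return the header: id first, labels in first-seen order, error last."""
    labels = []
    for row in tagged:
        for key in row:
            if key not in (id_field, "error") and key not in labels:
                labels.append(key)
    return [id_field] + labels + ["error"]


def convert(in_path, out_path, tagger):
    """Read in_path, tag every address, and write the result to out_path."""
    # newline="" is what the csv docs recommend for files handed to csv readers/writers.
    with open(in_path, newline="") as src:
        tagged = tag_rows(csv.DictReader(src), tagger)
    with open(out_path, "w", newline="") as dst:
        # DictWriter fills labels missing from a row with an empty string.
        writer = csv.DictWriter(dst, fieldnames=column_order(tagged))
        writer.writeheader()
        writer.writerows(tagged)
```

With usaddress installed, this would be called as `convert('input.csv', 'output.csv', usaddress.tag)`. Catching `Exception` is deliberately broad here so that one bad row doesn't stop the whole run; narrow it to `usaddress.RepeatedLabelError` if you only want to record that case.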

BlvdJoe · 2015-08-07T14:35:44Z

@cathydeng Thank you!! I deleted my previous comment as I figured out what was causing my previous error. You're great!

cathydeng closed this as completed Aug 6, 2015