# Function Assignment

## Summary of Geocoding Function (Planned Enhancements)

The current function reads a CSV file of vendor names and addresses, geocodes each address with OSMnx, and saves the results to an output CSV, skipping any address that is already in that file. Planned improvements include:

- Adding a counter to calculate the percentage of addresses successfully geocoded.
- Incorporating a progress tracker to show geocoding progress.
- Using regex to clean addresses by removing apartment numbers, suite numbers, or other unusual address parts before geocoding.

These enhancements will make the function more robust and user-friendly for larger datasets.
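
The regex cleaning step could look something like the sketch below. It is only a starting point: the name `clean_address`, the pattern `UNIT_PATTERN`, and the list of unit keywords (APT, SUITE, STE, UNIT, `#`) are assumptions made for illustration, and the real data may need other rules.

```python
import re

# Hypothetical pattern: a unit keyword (or '#') followed by an identifier
# such as "4B" or "200". The keyword list is an assumption, not exhaustive.
UNIT_PATTERN = re.compile(
    r"\s*(?:\b(?:APT|APARTMENT|SUITE|STE|UNIT)\b\.?|#)\s*[\w-]+",
    re.IGNORECASE,
)

def clean_address(address):
    """Return the address with unit designators and stray spacing removed."""
    cleaned = UNIT_PATTERN.sub("", address)
    cleaned = re.sub(r"\s+,", ",", cleaned)    # drop spaces before commas
    cleaned = re.sub(r"\s{2,}", " ", cleaned)  # collapse repeated spaces
    return cleaned.strip()

print(clean_address("123 MAIN ST APT 4B, LOS ANGELES, CA, 90001"))
```

The cleaned string would be passed to the geocoder, while the original address is kept in the output so rows can still be matched back to the input file.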



In [9]:
import os
import pandas as pd
import osmnx as ox

def geocode_addresses(file_path, address_col='Address', name_col='Vendor', output_folder='data', output_file='geocoded_addresses.csv'):
    """Geocode the addresses in a CSV file and append new results to an output CSV."""

    # Reads the CSV file
    df = pd.read_csv(file_path)
    
    # Make sure the output folder exists (no error if it is already there)
    os.makedirs(output_folder, exist_ok=True)
    
    output_path = os.path.join(output_folder, output_file)
    
    # If the file already exists, read it to avoid duplicates
    if os.path.exists(output_path):
        geocoded_df = pd.read_csv(output_path)
    else:
        geocoded_df = pd.DataFrame(columns=[name_col, address_col, 'lat', 'lon'])
    
    # Loop through each row in the input file
    for index, row in df.iterrows():
        vendor = row[name_col]
        address = row[address_col]
        
        # Skip if this address already exists in the output
        if address in geocoded_df[address_col].values:
            continue
        
        try:
            # Geocode the address
            lat, lon = ox.geocoder.geocode(address)
            
            # Add to geocoded DataFrame
            geocoded_df = pd.concat([geocoded_df, pd.DataFrame({
                name_col: [vendor],
                address_col: [address],
                'lat': [lat],
                'lon': [lon]
            })], ignore_index=True)
            
            # f-string avoids manual string concatenation
            print(f"Geocoded: {address} -> ({lat}, {lon})")
        
        except Exception:
            # A bare `except:` would also swallow KeyboardInterrupt and SystemExit
            print(f"Could not geocode: {address}")
    
    # Saves results
    geocoded_df.to_csv(output_path, index=False)
    print(f"Geocoded data saved to: {output_path}")

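The planned success counter and progress tracker could be sketched roughly as follows. This is a standalone illustration, not the function above: `geocode_with_stats` is a hypothetical helper, and the geocoder is passed in as `geocode_fn` so the loop can be tried with a stand-in function before it is wired to `ox.geocoder.geocode`.

```python
def geocode_with_stats(addresses, geocode_fn):
    """Geocode each address, print progress, and return results plus the success rate."""
    results = {}
    total = len(addresses)
    succeeded = 0

    for i, address in enumerate(addresses, start=1):
        try:
            results[address] = geocode_fn(address)
            succeeded += 1
        except Exception:
            # Record the failure so it can be reviewed later
            results[address] = None
        # Simple progress tracker: position in the list out of the total
        print(f"[{i}/{total}] processed: {address}")

    # Percentage of addresses successfully geocoded (0.0 for an empty list)
    success_rate = 100 * succeeded / total if total else 0.0
    print(f"Geocoded {succeeded} of {total} addresses ({success_rate:.1f}%)")
    return results, success_rate
```

With OSMnx imported as `ox`, a call might look like `geocode_with_stats(df['Address'].tolist(), ox.geocoder.geocode)`. A progress-bar library could replace the printed counter for very large files.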

### Example Usage

To run the geocoding function, pass the input file path and the column names used in your CSV:

```python
geocode_addresses('my_vendors.csv', address_col='Vendor_Address', name_col='Vendor_Name')
```

In [5]:
geocode_addresses('testData.csv', address_col='Address', name_col='Vendor Name')


Geocoded: 9956  BALDWIN PL., EL MONTE, CA, 91731 -> (34.0710498, -118.0526633)
Geocoded: 9950  JEFFERSON BLVD , CULVER CITY, CA, 90232 -> (34.0111864, -118.3897509)
Geocoded: 9911  GIDLEY STREET, EL MONTE, CA, 91731 -> (34.0855759, -118.0514925)
Geocoded: 9900  BELL RANCH DR , SANTA FE SPRINGS, CA, 90670 -> (33.9499496, -118.0683831)
Geocoded: 9854  NATIONAL BLVD , LOS ANGELES, CA, 90034 -> (34.0313027, -118.4001463)
Geocoded: 9834  NORWALK BLVD, SANTA FE SPRINGS, CA, 90670 -> (33.9485997, -118.0725808)
Could not geocode: 9812  ALBURTIS AVE, SANTA FE SPGS, CA, 90670
Geocoded: 981  S WESTERN AVE, LOS ANGELES, CA, 90006 -> (34.0531823, -118.3091539)
Could not geocode: 9760  JERSEY AVE , SANTA FE SPGS, CA, 90670
Geocoded: 9734  VARIEL AVE, CHATSWORTH, CA, 91311 -> (34.2470554, -118.592825)
Geocoded: 9714  ARTESIA BLVD, BELLFLOWER, CA, 90706 -> (33.874741, -118.1269573)
Could not geocode: 9702  E. RUSH STREET, S. EL MONTE, CA, 91733
Geocoded: 970  S VILLAGE OAKS DRIVE , COVINA, CA, 91724 -

In [6]:
geocode_addresses('testData_2.csv', address_col='Address', name_col='Vendor Name')

Could not geocode: 9812  ALBURTIS AVE, SANTA FE SPGS, CA, 90670
Could not geocode: 9760  JERSEY AVE , SANTA FE SPGS, CA, 90670
Could not geocode: 9702  E. RUSH STREET, S. EL MONTE, CA, 91733
Could not geocode: 9624  HERMOSA AVE, RCH CUCAMONGA, CA, 91730
Could not geocode: 941  W. 190TH ST., GARDENA, CA, 90248
Could not geocode: 9362  PARKSTONE CIRCLE, ROSEVILLE, CA, 95747
Could not geocode: 9362  PARKSTONE CIR, ROSEVILLE, CA, 95747
Could not geocode: 926  N WILMINGTON BL , WILMINGTON, CA, 90744
Could not geocode: 9253  1/2 CEDROS AVE, PANORAMA CITY, CA, 91402
Could not geocode: 9253  1/2 CEDROS AVE, PANORAMA CITY, CA, 91402
Could not geocode: 903  CALLE AMANACER , SAN CLEMENTE, CA, 92673
Geocoded: 8908  BALBOA BLVD, NORTHRIDGE, CA, 91325 -> (34.2324363, -118.502465)
Geocoded: 889  N DOUGLAS STREET, EL SEGUNDO, CA, 90245 -> (33.9285381, -118.3835711)
Geocoded: 889  N DOUGLAS ST, EL SEGUNDO, CA, 90245 -> (33.9285381, -118.3835711)
Geocoded: 8885  RESEARCH DR, IRVINE, CA, 92618 -> (33.643