### Geocoder Application Sketch

This notebook tests the code for a geocoder web application deployed with Flask. The app takes a CSV file with an address column and adds two columns to it, holding the latitude and longitude of each address.

Two columns from an existing CSV are combined into a single 'Address' column and saved to a separate file, 'forGeocoding.csv', which is used to test the app.

In [205]:
import pandas as pd
from geopy.geocoders import ArcGIS

In [206]:
# Load an arbitrary csv file that has address columns
df=pd.read_csv("scrapedPages.csv")

In [207]:
# create geocoder instance and test geocoding service
geoLocator=ArcGIS()
location=geoLocator.geocode("1003 Winchester Blvd. Rock Springs, WY 82901")
location.latitude

41.58501000000007

In [208]:
# create a new column which combines the address and locality of 'scrapedPages.csv' and remove irrelevant columns
def getAddress(add,local):
    return add+" "+local

df["Address"]=[getAddress(add,local) for add,local in zip(df["Address"],df["Locality"])]
df2=df.drop(columns=["Locality","Lot Size","Half Baths","Unnamed: 0","Full Baths"]) # columns= already implies axis=1
df2.loc[0:4].to_csv("forGeocoding.csv",index=False) # save only first 5 rows to be used to test the app

##### Start the app sketch

In [209]:
# read the newly generated csv
df2=pd.read_csv("forGeocoding.csv")

# function to geocode addresses supplied
def getLocation(address):
    # reuse the geoLocator instance created earlier instead of building a new one per call
    return geoLocator.geocode(address)

# Test the function
example_address=df2.loc[1,'Address'] # .loc selects by row label and column name
getLocation(example_address)

Location(82901, Rock Springs, Wyoming, (41.58501000000007, -109.21828999999997, 0.0))
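geopy geocoders return `None` when the service finds no match, so reading `.latitude` from the result would raise an `AttributeError` for an unmatched address. The following is a minimal sketch of a guard, not part of the original notebook: `safe_coords`, `fake_geocode` and `FakeLocation` are hypothetical names, and the geocoder is passed in as a callable so the example runs without network access.

```python
from collections import namedtuple

# Stand-in for geopy's Location, exposing only the two attributes used here.
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])

def fake_geocode(address):
    """Hypothetical offline geocoder: knows one address, returns None otherwise."""
    known = {
        "1003 Winchester Blvd. Rock Springs, WY 82901": FakeLocation(41.58501, -109.21829),
    }
    return known.get(address)

def safe_coords(geocode, address):
    """Return (latitude, longitude), or (None, None) when there is no match."""
    location = geocode(address)
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)

print(safe_coords(fake_geocode, "1003 Winchester Blvd. Rock Springs, WY 82901"))
print(safe_coords(fake_geocode, "no such place"))
```

In the notebook, `geoLocator.geocode` could be passed in place of `fake_geocode`.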

In [210]:
# geocode each address once, then populate two new dataframe columns from the results via list comprehensions
locations=[getLocation(address) for address in df2["Address"]]
df2["Latitude"]=[location.latitude for location in locations]
df2["Longitude"]=[location.longitude for location in locations]
print(df2)

      Price                                       Address  Beds   Area  \
0  $725,000              0 Gateway Rock Springs, WY 82901   NaN    NaN   
1  $452,900  1003 Winchester Blvd. Rock Springs, WY 82901   4.0    NaN   
2  $396,900          600 Talladega Rock Springs, WY 82901   5.0  3,154   
3  $389,900     3239 Spearhead Way Rock Springs, WY 82901   4.0  3,076   
4  $254,000     522 Emerald Street Rock Springs, WY 82901   3.0  1,172   

    Latitude   Longitude  
0  41.584308 -109.248052  
1  41.585010 -109.218290  
2  41.594230 -109.271630  
3  41.591470 -109.265400  
4  41.583430 -109.204740  
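Each geocoding call is a network request, so it can help to wrap the column-building step in a function that looks up every address exactly once. This is a sketch under assumptions rather than the app's actual code: `add_coordinates` and `counting_geocode` are hypothetical names, and the stand-in geocoder returns made-up coordinates purely to show the flow.

```python
from types import SimpleNamespace

import pandas as pd

calls = []  # records every lookup so the number of requests can be inspected

def counting_geocode(address):
    """Hypothetical offline geocoder that logs each call."""
    calls.append(address)
    if address == "unknown":
        return None  # mimic a failed lookup
    return SimpleNamespace(latitude=41.0, longitude=-109.0)

def add_coordinates(df, geocode, address_col="Address"):
    """Return a copy of df with Latitude/Longitude columns, one lookup per row."""
    out = df.copy()
    locations = [geocode(address) for address in out[address_col]]
    out["Latitude"] = [loc.latitude if loc is not None else None for loc in locations]
    out["Longitude"] = [loc.longitude if loc is not None else None for loc in locations]
    return out

demo = pd.DataFrame({"Address": ["a", "b", "unknown"]})
result = add_coordinates(demo, counting_geocode)
print(result)
print(len(calls), "lookups for", len(demo), "rows")
```

Returning a copy keeps the input dataframe unchanged, which makes the function easier to test in isolation.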


In [216]:
# for deployment: check that one of the columns of the uploaded file is named either 'address' or 'Address'
def checkFileColumns(df):
    return "address" in df.columns or "Address" in df.columns

# Test that it works for a column named 'address'
df3=df2.rename(columns={"Address":"address"})
checkFileColumns(df3)

True
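The check above accepts only the exact names `address` and `Address`, so a header such as `ADDRESS` or one with stray spaces would be rejected. If a looser match were wanted, one possible variant is sketched below; `find_address_column` is a hypothetical helper that returns the matching column name (or `None`), which also tells the caller which column to geocode.

```python
def find_address_column(columns):
    """Return the first column whose name is 'address', ignoring case and
    surrounding whitespace, or None if there is no such column."""
    for name in columns:
        if str(name).strip().lower() == "address":
            return name
    return None

print(find_address_column(["Price", "ADDRESS", "Beds"]))
print(find_address_column(["Price", "Beds"]))
```

With a dataframe, the call would be `find_address_column(df.columns)`.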

In [218]:
# create a file without an address column to test error handling when the user uploads such a file
df4=df3.drop(columns="address")
df4.loc[0:4].to_csv("noAddress.csv",index=False) # write only the first 5 rows of the df without an address column to csv

# Test
noaddress_df=pd.read_csv("noAddress.csv")
checkFileColumns(noaddress_df)

False
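In the deployed app, this check would presumably run on the uploaded file before any geocoding starts. A minimal sketch of that step, assuming the upload arrives as a file-like object: `load_upload` and its error message are hypothetical, and in-memory buffers stand in for real uploads.

```python
import io

import pandas as pd

def load_upload(file_obj):
    """Read an uploaded CSV and reject it early if it has no address column."""
    df = pd.read_csv(file_obj)
    if "address" not in df.columns and "Address" not in df.columns:
        raise ValueError("Please upload a CSV with an 'Address' or 'address' column.")
    return df

# A file with an address column loads normally.
good = load_upload(io.StringIO('Address,Beds\n"600 Talladega Rock Springs, WY 82901",5\n'))
print(list(good.columns))

# A file without one raises an error the app can show to the user.
try:
    load_upload(io.StringIO("Price,Beds\n1,4\n"))
except ValueError as err:
    print(err)
```

Raising an exception keeps the validation separate from however the web layer chooses to report the problem.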