Load the data

In [2]:
import pandas as pd

df = pd.read_csv("../Data/haunted_places_v1.tsv", sep="\t")

Check the number of missing (NA) values in each column. Every place has a "city" and a "state". Some places lack "location" (the location name), and some lack "county".

**I decided not to extract geo entities from the description, because GeoTopicParser cannot tell which part of the input string (the exact location name, or the geo entities extracted from the description with spaCy) matters more. In general, the exact location name should take priority when the city name and state are available.**

In [4]:
na_counts = df.isna().sum()
na_counts

Unnamed: 0                                      0
city                                            0
country                                         0
description                                     0
location                                        3
state                                           0
state_abbrev                                    0
longitude                                       0
latitude                                        0
city_longitude                                 11
city_latitude                                  11
audio evidence                                  0
image/video/visual evidence                     0
haunted places date                             0
haunted places witness count                    0
time of day                                     0
apparition type                                 0
event type                                      0
binge drinking rate (%)                        16
median drinks per binge (overall)              16


Define a function that builds a structured input string for GeoTopicParser from each row.

In [82]:
def build_geo_query(row):
    """Join the non-missing location, city, county and state into one query string."""
    parts = []
    if pd.notna(row.get("location")):
        parts.append(row["location"])
    if pd.notna(row.get("city")):
        parts.append(row["city"])
    if pd.notna(row.get("county")):
        parts.append(row["county"] + " County")
    if pd.notna(row.get("state")):
        parts.append(row["state"])
    return ", ".join(parts)
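
To see what the query string looks like, here is a small sketch that runs the same logic on two illustrative rows: one complete, and one with no location name or county. The function is restated in compact form only so the snippet runs on its own, and the `full` and `partial` rows are hand-written examples.

```python
import pandas as pd

def build_geo_query(row):
    # Same logic as the function above, restated so this snippet is self-contained
    parts = []
    for col, suffix in [("location", ""), ("city", ""), ("county", " County"), ("state", "")]:
        if pd.notna(row.get(col)):
            parts.append(row[col] + suffix)
    return ", ".join(parts)

# Illustrative rows, written by hand for this example
full = pd.Series({"location": "Ada Cemetery", "city": "Ada", "county": "Kent", "state": "Michigan"})
partial = pd.Series({"location": None, "city": "Ada", "state": "Michigan"})

print(build_geo_query(full))     # Ada Cemetery, Ada, Kent County, Michigan
print(build_geo_query(partial))  # Ada, Michigan
```

Missing parts are skipped rather than left as empty slots, so the query never contains stray commas.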

Define the request to the server and check the output.

Before sending the request, make sure the server has been configured and started.
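
One quick way to confirm that something is listening before sending requests is a plain TCP connection test. This is only a sketch: `server_is_listening` is a hypothetical helper, and the host and port are assumed from the URL used in this notebook. It checks that the port is open, not that the service behind it is the expected one.

```python
import socket

def server_is_listening(host="localhost", port=8765, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False

if not server_is_listening():
    print("Nothing is listening on localhost:8765; start the server first.")
```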

In [98]:
import requests

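def geolocate(place):
    # "s" is the search string; c=1 requests a single best match
    r = requests.get(
        "http://localhost:8765/api/search",
        params={"s": place, "c": 1},
        timeout=10,  # avoid hanging the whole loop on one stalled request
    )
    print(r.url)
    r.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return r.json()

print(geolocate("Los Angeles"))  # Test it with a single string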

http://localhost:8765/api/search?s=Los+Angeles&c=1
{'Los Angeles': [{'name': 'Los Angeles', 'countryCode': 'US', 'admin1Code': 'CA', 'admin2Code': '037', 'latitude': 34.05223, 'longitude': -118.24368}]}


It worked!!
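
The response is a dictionary keyed by the query string, with a list of matches as its value. A small sketch of pulling the coordinates out of that structure follows; `first_match` is a hypothetical helper, and it assumes other responses have the same shape as the one shown above.

```python
def first_match(result, query):
    """Return (latitude, longitude) of the first match for `query`, or None."""
    matches = result.get(query) if isinstance(result, dict) else None
    if not matches:
        # Query key missing, or present with an empty list of matches
        return None
    top = matches[0]
    return top["latitude"], top["longitude"]

# Response copied from the test request above
response = {'Los Angeles': [{'name': 'Los Angeles', 'countryCode': 'US',
                             'admin1Code': 'CA', 'admin2Code': '037',
                             'latitude': 34.05223, 'longitude': -118.24368}]}

print(first_match(response, "Los Angeles"))   # (34.05223, -118.24368)
print(first_match(response, "Ada Cemetery"))  # None
```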

Run the request for each row. If it does not return the expected output, fall back to the original latitude/longitude from the dataset.

In [100]:
# Build geo queries for all rows
df["geo_query"] = df.apply(build_geo_query, axis=1)

lats = []
lons = []

# Loop through each row and apply geolocate
for i, row in df.iterrows():
    query = row["geo_query"]

    # Default to the dataset's own coordinates; overwrite only on a successful lookup
    lat = row.get("latitude")
    lon = row.get("longitude")

    if isinstance(query, str) and query.strip():
        try:
            result = geolocate(query)
            # The response is keyed by the query string; take its first match, if any
            if result and result.get(query):
                loc = result[query][0]
                lat, lon = loc["latitude"], loc["longitude"]
        except Exception as e:
            print(f"Geolocate error at row {i} for query '{query}': {e}")

    lats.append(lat)
    lons.append(lon)

# Save to new columns
df["lat"] = lats
df["lon"] = lons


http://localhost:8765/api/search?s=Ada+Cemetery%2C+Ada%2C+Kent+County%2C+Michigan&c=1
http://localhost:8765/api/search?s=North+Adams+Rd.%2C+Addison%2C+Hillsdale+County%2C+Michigan&c=1
http://localhost:8765/api/search?s=Ghost+Trestle%2C+Adrian%2C+Lenawee+County%2C+Michigan&c=1
http://localhost:8765/api/search?s=Siena+Heights+University%2C+Adrian%2C+Lenawee+County%2C+Michigan&c=1
http://localhost:8765/api/search?s=Albion+College%2C+Albion%2C+Calhoun+County%2C+Michigan&c=1
http://localhost:8765/api/search?s=Riverside+Cemetery%2C+Albion%2C+Calhoun+County%2C+Michigan&c=1
http://localhost:8765/api/search?s=Hell%27s+Bridge%2C+Algoma+Township%2C+Kent+County%2C+Michigan&c=1
http://localhost:8765/api/search?s=Morrow+Road%2C+Algonac%2C+St.+Clair+County%2C+Michigan&c=1
http://localhost:8765/api/search?s=Elks+Lodge%2C+Allegan%2C+Allegan+County%2C+Michigan&c=1
http://localhost:8765/api/search?s=The+Grill+House+and+the+Rock+Bottom+Bar%2C+Allegan%2C+Allegan+County%2C+Michigan&c=1
http://localhost:8765
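
Because a failed lookup silently falls back to the dataset's coordinates, it can be worth checking how many rows actually received new values. The sketch below uses a made-up `demo` frame with the same column names as above. A row is treated as unchanged when both coordinates are identical to the originals, which in practice most likely means the fallback was used.

```python
import pandas as pd

# Illustrative frame with the same column names as above; values are made up
demo = pd.DataFrame({
    "latitude":  [42.96, 41.97, 41.90],
    "longitude": [-85.50, -84.38, -84.04],
    "lat":       [42.96, 41.99, 41.90],
    "lon":       [-85.50, -84.35, -84.04],
})

# A row counts as "unchanged" when both coordinates equal the originals
unchanged = (demo["lat"] == demo["latitude"]) & (demo["lon"] == demo["longitude"])
print(f"{unchanged.sum()} of {len(demo)} rows kept their original coordinates")
```

Running the same comparison on `df` after the loop gives a rough measure of how often the geoparser contributed a result.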

In [128]:
df.head()

Unnamed: 0.1,Unnamed: 0,city,country,description,location,state,state_abbrev,longitude,latitude,city_longitude,...,crime_rate_per_100000,MURDER,ROBBERY,BURGLRY,nearest_historical_place,num_historical_places_5mi,year_of_nearest_historical_place,geo_query,lat,lon
0,0,Ada,United States,Ada witch - Sometimes you can see a misty blue...,Ada Cemetery,Michigan,MI,-85.504893,42.962106,-85.49548,...,395.689239,22.0,639.0,3878.0,Ada Covered Bridge,2,1970.0,"Ada Cemetery, Ada, Kent County, Michigan",42.962106,-85.504893
1,1,Addison,United States,A little girl was killed suddenly while waitin...,North Adams Rd.,Michigan,MI,-84.381843,41.971425,-84.347168,...,190.88523,0.0,2.0,180.0,,0,,"North Adams Rd., Addison, Hillsdale County, Mi...",41.971425,-84.381843
2,2,Adrian,United States,If you take Gorman Rd. west towards Sand Creek...,Ghost Trestle,Michigan,MI,-84.035656,41.904538,-84.037166,...,205.670041,4.0,22.0,335.0,Downtown Adrian Commercial Historic District,3,1986.0,"Ghost Trestle, Adrian, Lenawee County, Michigan",41.904538,-84.035656
3,3,Adrian,United States,"In the 1970's, one room, room 211, in the old ...",Siena Heights University,Michigan,MI,-84.017565,41.905712,-84.037166,...,205.670041,4.0,22.0,335.0,Civil War Memorial,3,1972.0,"Siena Heights University, Adrian, Lenawee Coun...",41.905712,-84.017565
4,4,Albion,United States,Kappa Delta Sorority - The Kappa Delta Sororit...,Albion College,Michigan,MI,-84.745177,42.244006,-84.75303,...,602.168696,11.0,112.0,1405.0,Superior Street Commercial Historic District,1,1997.0,"Albion College, Albion, Calhoun County, Michigan",42.244006,-84.745177


In [132]:
df.to_csv("../Data/haunted_places_geoparsed.csv", index=False)