# Contact Geocode Search - Optimized Version

---

***Using Nominatim (OpenStreetMap) to collect geocodes for contact records, with caching, deduplication, and batched threaded requests.***

## 🚀 Key Optimizations

1. **Intelligent Caching**: Persistent cache to avoid redundant API calls across sessions
2. **Deduplication**: Automatically identifies and processes unique addresses only
3. **Batch Processing**: Sends requests in small batches with a pause between batches
4. **Threaded Execution**: Runs up to five lookups concurrently within each batch
5. **Progress Tracking**: Real-time progress indicators with detailed statistics
6. **Retry Mechanisms**: Retries failed addresses with abbreviations expanded and suite numbers removed
7. **Cache Analytics**: Detailed reporting on API call savings and success rates

## 📋 Workflow

1. Load contacts whose latitude or longitude is missing
2. Deduplicate addresses to minimize API calls
3. Check persistent cache for previously geocoded addresses
4. Process unique addresses in batches with threading
5. Apply fallback strategies for failed attempts
6. Export results with comprehensive analytics
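
The first three workflow steps can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the notebook's implementation: `geocode_unique` and `fake_lookup` are hypothetical names, `lookup` stands in for the real geocoder call, and the cache is a plain in-memory dict.

```python
def geocode_unique(addresses, lookup, cache):
    """Geocode each distinct address once, consulting `cache` first.

    `lookup` is a hypothetical callable standing in for the real
    geocoder; it returns a (lat, lon) tuple or None.
    """
    calls = 0
    for address in addresses:
        key = address.strip().lower()   # normalize so trivial variants share a key
        if key in cache:                # duplicate or previously cached address
            continue
        cache[key] = lookup(address)    # one lookup per unique address
        calls += 1
    # Map every input record back to its (possibly shared) result
    results = [cache[a.strip().lower()] for a in addresses]
    return results, calls


# Toy run with a fake lookup so the sketch needs no network access
def fake_lookup(address):
    return (39.0, -76.0)


records = [
    "1 Main St, Baltimore, MD",
    "1 main st, baltimore, md ",
    "2 Oak Ave, Bowie, MD",
]
coords, api_calls = geocode_unique(records, fake_lookup, cache={})
print(api_calls)  # 2: the first two records normalize to the same key
```

Passing a cache that was loaded from disk instead of an empty dict gives the cross-session behavior described in step 3.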

## 💡 Performance Benefits

- **Reduced API Calls**: Caching and deduplication skip addresses that have already been looked up
- **Faster Processing**: Threads overlap network waits within each batch
- **Persistent Cache**: Results survive notebook restarts
- **Smart Retries**: Multiple strategies for difficult addresses
- **Detailed Analytics**: Comprehensive reporting on optimization effectiveness
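
A persistent cache of this kind can be as simple as a JSON file that is read at startup and rewritten after each run. The sketch below is one possible shape, assuming a JSON-serializable dict; `load_cache` and `save_cache` are illustrative names. Writing to a temporary file and renaming it is an added precaution, so an interrupted write does not leave a truncated cache behind.

```python
import json
import os
import tempfile


def load_cache(path):
    """Return the cached dict, or an empty dict if the file is missing or corrupt."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_cache(cache, path):
    """Write to a temporary file first, then swap it into place."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(cache, f, indent=2, ensure_ascii=False)
    os.replace(tmp_path, path)  # replaces the old file in a single step


# Round trip in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "geocode_cache.json")
    save_cache({"abc123": {"lat": 39.0, "lon": -76.0}, "def456": None}, demo_path)
    reloaded = load_cache(demo_path)
```

A `None` value survives the round trip as JSON `null`, which is how a "no match" result can be remembered alongside successful lookups.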

In [1]:
# Standard library
import hashlib
import json
import os
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Third-party
import pandas as pd
from geopy.geocoders import Nominatim
from tqdm.notebook import tqdm

In [2]:
df = pd.read_excel('../data/raw/Referrals_App_Preferred_Providers.xlsx')
df = df.rename(columns=lambda x: x.strip())
df

Unnamed: 0,Full Name,Person ID,Contact's Work Address,Contact's Work Phone,Contact's Details: Latitude,Contact's Details: Longitude,Contact's Details: Specialty,Custom
0,ATI Physical Therapy - Fallston,996736879.0,"2315 Bel Air Rd, #C3, Fallston, MD 21047-2703",(443) 417-2499,,,Chiropractic/Physical Therapy,1
1,ATI Physical Therapy - Federal Hill,996736666.0,"871 E Fort Ave, Unit 17, Baltimore, MD 21230",(410) 622-3310,,,Chiropratic/Physical Therapy,1
2,ATI Physical Therapy - Forestville,996736881.0,"2950 Donnell Dr, Unit G, Forestville, MD 20747...",(240) 492-3410,,,Chiropractic/Physical Therapy,1
3,ATI Physical Therapy - Germantown,996736889.0,"21030H Frederick Rd, Ste 1038, Germantown, MD ...",(240) 243-0048,,,Chiropractic/Physical Therapy,1
4,"ATI Physical Therapy - Hyattsville, MD",996736902.0,"2900 Belcrest Center Dr, Ste 104-A, Hyattsvill...",(301) 276-3295,,,Chiropractic/Physical Therapy,1
...,...,...,...,...,...,...,...,...
112,Terrapin Care Center,993887329.0,"9658 Baltimore Ave, Ste 420, College Park, MD ...",(301) 220-1930,,,Physical Therapy,1
113,The Centers for Advanced Orthopaedics - Parkwa...,996738962.0,"13 Western MD Parkway, Ste 104, Hagerstown, MD...",(301) 665-4575,,,,1
114,The Orthopaedic Center - Germantown,996738953.0,"12850 Middlebrook Road, Ste 307, Germantown, M...",(301) 251-1433,,,,1
115,Washington Circle Orthopaedic Associates,996740755.0,"3 Washington Circle NW, Ste 404, Washington, D...",(202) 333-2820,,,,1


In [3]:
df = df.drop_duplicates(ignore_index=True)

missing_lat = df["Contact's Details: Latitude"].isna()
missing_lon = df["Contact's Details: Longitude"].isna()
has_address = df["Contact's Work Address"].notna()

# Keep rows that have an address but lack at least one coordinate
df = df[has_address & (missing_lat | missing_lon)].reset_index(drop=True)
df

Unnamed: 0,Full Name,Person ID,Contact's Work Address,Contact's Work Phone,Contact's Details: Latitude,Contact's Details: Longitude,Contact's Details: Specialty,Custom
0,ATI Physical Therapy - Fallston,996736879.0,"2315 Bel Air Rd, #C3, Fallston, MD 21047-2703",(443) 417-2499,,,Chiropractic/Physical Therapy,1
1,ATI Physical Therapy - Federal Hill,996736666.0,"871 E Fort Ave, Unit 17, Baltimore, MD 21230",(410) 622-3310,,,Chiropratic/Physical Therapy,1
2,ATI Physical Therapy - Forestville,996736881.0,"2950 Donnell Dr, Unit G, Forestville, MD 20747...",(240) 492-3410,,,Chiropractic/Physical Therapy,1
3,ATI Physical Therapy - Germantown,996736889.0,"21030H Frederick Rd, Ste 1038, Germantown, MD ...",(240) 243-0048,,,Chiropractic/Physical Therapy,1
4,"ATI Physical Therapy - Hyattsville, MD",996736902.0,"2900 Belcrest Center Dr, Ste 104-A, Hyattsvill...",(301) 276-3295,,,Chiropractic/Physical Therapy,1
...,...,...,...,...,...,...,...,...
111,South Point Medical Solutions,996748058.0,"5 Park Center Ct, Ste 200, Owings Mills, MD 21117",410-358-2518,,,Medical Equipment & Medical Supplies,1
112,Terrapin Care Center,993887329.0,"9658 Baltimore Ave, Ste 420, College Park, MD ...",(301) 220-1930,,,Physical Therapy,1
113,The Centers for Advanced Orthopaedics - Parkwa...,996738962.0,"13 Western MD Parkway, Ste 104, Hagerstown, MD...",(301) 665-4575,,,,1
114,The Orthopaedic Center - Germantown,996738953.0,"12850 Middlebrook Road, Ste 307, Germantown, M...",(301) 251-1433,,,,1


In [4]:
# Optimized geocoding with threaded batch processing and caching

class OptimizedGeocoder:
    def __init__(self, cache_file='../data/processed/geocode_cache.json', batch_size=10, delay_between_batches=2.0):
        self.cache_file = cache_file
        self.batch_size = batch_size
        self.delay_between_batches = delay_between_batches
        self.cache = self._load_cache()
        self.geolocator = Nominatim(user_agent="contact_geocode_search_optimized")
        
    def _load_cache(self):
        """Load existing geocode cache to avoid redundant API calls."""
        if os.path.exists(self.cache_file):
            try:
                with open(self.cache_file, 'r', encoding='utf-8') as f:
                    return json.load(f)
            except (json.JSONDecodeError, FileNotFoundError):
                pass
        return {}
    
    def _save_cache(self):
        """Save the geocode cache to disk."""
        # Fall back to the current directory when cache_file has no directory part
        os.makedirs(os.path.dirname(self.cache_file) or '.', exist_ok=True)
        with open(self.cache_file, 'w', encoding='utf-8') as f:
            json.dump(self.cache, f, indent=2, ensure_ascii=False)
    
    def _normalize_address(self, address):
        """Normalize address for consistent caching."""
        if pd.isna(address) or not address:
            return None
        return str(address).strip().lower()
    
    def _get_cache_key(self, address):
        """Generate a consistent cache key for an address."""
        normalized = self._normalize_address(address)
        if not normalized:
            return None
        return hashlib.md5(normalized.encode('utf-8')).hexdigest()
    
    def geocode_single(self, address):
        """Geocode a single address with caching."""
        cache_key = self._get_cache_key(address)
        if not cache_key:
            return None
            
        # Check cache first
        if cache_key in self.cache:
            cached_result = self.cache[cache_key]
            if cached_result is None:
                return None
            return type('Location', (), {
                'latitude': cached_result['lat'], 
                'longitude': cached_result['lon']
            })()
        
        # Geocode if not in cache
        try:
            location = self.geolocator.geocode(address, timeout=10)
            if location:
                # Cache successful result
                self.cache[cache_key] = {
                    'lat': location.latitude,
                    'lon': location.longitude,
                    'address': address
                }
                return location
            else:
                # Cache unsuccessful result to avoid repeated lookups
                self.cache[cache_key] = None
                return None
        except Exception as e:
            print(f"Error geocoding '{address}': {e}")
            # Do not cache errors: timeouts and rate-limit responses are
            # transient, and caching them would block later retries
            return None
    
    def batch_geocode_threaded(self, addresses_data):
        """
        Process geocoding in batches using threading for better performance.
        addresses_data: list of tuples (address, full_name, person_id)
        """
        results = []
        
        # Deduplicate addresses while preserving person mapping
        unique_addresses = {}
        for address, full_name, person_id in addresses_data:
            cache_key = self._get_cache_key(address)
            if cache_key:
                if cache_key not in unique_addresses:
                    unique_addresses[cache_key] = {
                        'address': address,
                        'records': []
                    }
                unique_addresses[cache_key]['records'].append({
                    'full_name': full_name,
                    'person_id': person_id
                })
        
        print(f"Geocoding {len(unique_addresses)} unique addresses (down from {len(addresses_data)} total records)")
        
        # Process in batches
        unique_keys = list(unique_addresses.keys())
        total_batches = (len(unique_keys) + self.batch_size - 1) // self.batch_size
        
        with tqdm(total=len(unique_keys), desc="Geocoding addresses") as pbar:
            for batch_idx in range(total_batches):
                start_idx = batch_idx * self.batch_size
                end_idx = min((batch_idx + 1) * self.batch_size, len(unique_keys))
                batch_keys = unique_keys[start_idx:end_idx]
                
                # Process batch with threading
                with ThreadPoolExecutor(max_workers=min(5, len(batch_keys))) as executor:
                    future_to_key = {
                        executor.submit(
                            self.geocode_single, 
                            unique_addresses[key]['address']
                        ): key for key in batch_keys
                    }
                    
                    for future in as_completed(future_to_key):
                        key = future_to_key[future]
                        address_data = unique_addresses[key]
                        
                        try:
                            location = future.result()
                            
                            # Add results for all records with this address
                            for record in address_data['records']:
                                if location:
                                    results.append({
                                        "Full Name": record['full_name'],
                                        "Person ID": record['person_id'],
                                        "Latitude": location.latitude,
                                        "Longitude": location.longitude,
                                        "Address": address_data['address']
                                    })
                                else:
                                    results.append({
                                        "Full Name": record['full_name'],
                                        "Person ID": record['person_id'],
                                        "Latitude": None,
                                        "Longitude": None,
                                        "Address": address_data['address']
                                    })
                        except Exception as e:
                            print(f"Error processing address '{address_data['address']}': {e}")
                            # Add failed results for all records with this address
                            for record in address_data['records']:
                                results.append({
                                    "Full Name": record['full_name'],
                                    "Person ID": record['person_id'],
                                    "Latitude": None,
                                    "Longitude": None,
                                    "Address": address_data['address']
                                })
                        
                        pbar.update(1)
                
                # Rate limiting between batches
                if batch_idx < total_batches - 1:
                    time.sleep(self.delay_between_batches)
        
        # Save cache after processing
        self._save_cache()
        
        return results

# Initialize the optimized geocoder
geocoder = OptimizedGeocoder(batch_size=8, delay_between_batches=1.5)

# Prepare data for batch geocoding
addresses_to_geocode = [
    (row["Contact's Work Address"], row["Full Name"], row["Person ID"]) 
    for _, row in df.iterrows()
]

print(f"Starting optimized geocoding for {len(addresses_to_geocode)} records...")

# Perform batch geocoding
start_time = time.time()
geocoded_results = geocoder.batch_geocode_threaded(addresses_to_geocode)
end_time = time.time()

print(f"Geocoding completed in {end_time - start_time:.2f} seconds")
print(f"Cache hits saved significant API calls!")

# Convert results to DataFrame
geocoded_df = pd.DataFrame(geocoded_results)

# Display summary
successful_geocodes = geocoded_df['Latitude'].notna().sum()
total_records = len(geocoded_df)
print(f"Successfully geocoded: {successful_geocodes}/{total_records} records ({successful_geocodes/total_records*100:.1f}%)")

geocoded_df

Starting optimized geocoding for 116 records...
Geocoding 114 unique addresses (down from 116 total records)


Geocoding addresses:   0%|          | 0/114 [00:00<?, ?it/s]

Geocoding completed in 87.32 seconds
Cache hits saved significant API calls!
Successfully geocoded: 10/116 records (8.6%)


Unnamed: 0,Full Name,Person ID,Latitude,Longitude,Address
0,ATI Physical Therapy - Fallston,996736879.0,,,"2315 Bel Air Rd, #C3, Fallston, MD 21047-2703"
1,ATI Physical Therapy - Forestville,996736881.0,,,"2950 Donnell Dr, Unit G, Forestville, MD 20747..."
2,"ATI Physical Therapy - Hyattsville, MD",996736902.0,,,"2900 Belcrest Center Dr, Ste 104-A, Hyattsvill..."
3,ATI Physical Therapy - Federal Hill,996736666.0,,,"871 E Fort Ave, Unit 17, Baltimore, MD 21230"
4,ATI Physical Therapy - Germantown,996736889.0,,,"21030H Frederick Rd, Ste 1038, Germantown, MD ..."
...,...,...,...,...,...
111,"Rosenthal, Siekanowicz, and Gowda - Greenbelt",996742174.0,,,"8721 Greenbelt Road, Ste 203, Greenbelt, MD 20770"
112,"Rosenthal, Siekanowicz, and Gowda - Silver Spring",996742169.0,,,"10313 Georgia Avenue, Ste 107, Silver Spring, ..."
113,Terrapin Care Center,993887329.0,,,"9658 Baltimore Ave, Ste 420, College Park, MD ..."
114,The Orthopaedic Center - Germantown,996738953.0,,,"12850 Middlebrook Road, Ste 307, Germantown, M..."
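
Nominatim's public usage policy asks for at most one request per second. A pool of worker threads with a pause between batches does not by itself enforce a per-request limit. Whether throttling contributed to the low success rate in the run above cannot be determined from the output shown. One way to enforce the limit across threads is a small shared throttle, sketched below. `Throttle` is a hypothetical helper, not part of geopy; geopy also ships a `RateLimiter` wrapper in `geopy.extra.rate_limiter` that serves a similar purpose.

```python
import threading
import time


class Throttle:
    """Allow at most one call per `min_interval` seconds across all threads."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self.min_interval
        time.sleep(max(0.0, slot - now))


# Demo with a short interval so it finishes quickly
throttle = Throttle(min_interval=0.05)
stamps = []


def worker():
    throttle.wait()                  # would precede each geocoder call
    stamps.append(time.monotonic())


start = time.monotonic()
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
stamps.sort()
elapsed = stamps[-1] - start         # four calls need at least three intervals
```

In the notebook's class, a call to `wait()` placed immediately before `self.geolocator.geocode(...)` would keep cache hits fast while spacing out real requests.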


In [5]:
# Analysis and Quality Check of Geocoded Results

print("=== Geocoding Results Analysis ===")
print(f"Total records processed: {len(geocoded_df)}")

# Success rate analysis
successful_mask = geocoded_df['Latitude'].notna() & geocoded_df['Longitude'].notna()
successful_count = successful_mask.sum()
success_rate = (successful_count / len(geocoded_df)) * 100

print(f"Successfully geocoded: {successful_count}")
print(f"Failed to geocode: {len(geocoded_df) - successful_count}")
print(f"Success rate: {success_rate:.1f}%")

# Show failed geocoding attempts for manual review
failed_geocodes = geocoded_df[~successful_mask][['Full Name', 'Address']].drop_duplicates()
if not failed_geocodes.empty:
    print(f"\n=== Failed Geocoding Attempts ({len(failed_geocodes)} unique addresses) ===")
    print(failed_geocodes.to_string(index=False))
else:
    print("\n✅ All addresses were successfully geocoded!")

# Cache statistics
print(f"\n=== Cache Statistics ===")
cache_size = len(geocoder.cache)
print(f"Cache entries: {cache_size}")

# Show duplicate addresses that were geocoded once and shared across records
address_counts = geocoded_df.groupby('Address').size()
duplicates = address_counts[address_counts > 1]
if not duplicates.empty:
    total_duplicate_records = duplicates.sum()
    unique_duplicate_addresses = len(duplicates)
    api_calls_saved = total_duplicate_records - unique_duplicate_addresses
    print(f"Duplicate addresses found: {unique_duplicate_addresses}")
    print(f"Total records with duplicate addresses: {total_duplicate_records}")
    print(f"API calls saved by deduplication: {api_calls_saved}")

# Display successfully geocoded results
successful_geocodes_df = geocoded_df[successful_mask].copy()
print(f"\n=== Successfully Geocoded Results ===")
successful_geocodes_df

=== Geocoding Results Analysis ===
Total records processed: 116
Successfully geocoded: 10
Failed to geocode: 106
Success rate: 8.6%

=== Failed Geocoding Attempts (106 unique addresses) ===
                                                        Full Name                                                        Address
                                  ATI Physical Therapy - Fallston                  2315 Bel Air Rd, #C3, Fallston, MD 21047-2703
                               ATI Physical Therapy - Forestville            2950 Donnell Dr, Unit G, Forestville, MD 20747-3256
                           ATI Physical Therapy - Hyattsville, MD      2900 Belcrest Center Dr, Ste 104-A, Hyattsville, MD 20782
                              ATI Physical Therapy - Federal Hill                   871 E Fort Ave, Unit 17, Baltimore, MD 21230
                                ATI Physical Therapy - Germantown            21030H Frederick Rd, Ste 1038, Germantown, MD 20876
                              ATI Ph

Unnamed: 0,Full Name,Person ID,Latitude,Longitude,Address
22,Baltimore Work Rehab – GBO,996748127.0,39.178524,-76.615258,"7138 Ritchie Highway, Glen Burnie, MD 21061"
38,"Maryland Physicians Associates - Baltimore, MD",993905608.0,39.321578,-76.573221,"3301 Belair Rd, Baltimore, MD 21213"
62,"Pivot Physical Therapy - CHESAPEAKE BEACH, MD",996747437.0,38.697974,-76.534512,"8420 Bayside Rd, Chesapeake Beach, MD 20732"
70,"Pivot Physical Therapy - ELKTON, MD",996747453.0,39.610906,-75.835989,"133 N Bridge St, Elkton, MD 21921"
71,"Pivot Physical Therapy - FREDERICKSBURG, VA (E...",996747504.0,38.278254,-77.490196,"1149 Emancipation Hwy, Fredericksburg, VA 22401"
76,"Pivot Physical Therapy - HAGERSTOWN, MD (VALLE...",996747476.0,39.624999,-77.771133,"17301 Valley Mall Rd, Hagerstown, MD 21740"
79,"Pivot Physical Therapy - HAMPTON, VA (EXECUTIVE)",996747510.0,37.047219,-76.396602,"2106 Executive Dr, Hampton, VA 23666"
81,"Pivot Physical Therapy - HAYMARKET, VA",996747511.0,38.81822,-77.645557,"6444 Trading Sq, Haymarket, VA 20169"
83,"Pivot Physical Therapy - LANSDOWNE, MD",996747465.0,39.25062,-76.66674,"3551 Washington Blvd, Lansdowne, MD 21227"
89,"Pivot Physical Therapy - NEWPORT NEWS, VA (DEN...",996747520.0,37.137515,-76.521689,"612 Denbigh Blvd, Newport News, VA 23608"
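
The savings from deduplication are simple arithmetic: records minus unique addresses. In the run above, 116 records collapsed to 114 unique addresses, so 116 - 114 = 2 lookups were avoided. The sketch below shows the same calculation on toy data; `dedup_savings` is an illustrative name, and the lowercase/strip normalization is assumed to mirror the cache key normalization.

```python
def dedup_savings(addresses):
    """Return (unique_count, calls_saved) for a list of address strings."""
    unique = {a.strip().lower() for a in addresses}
    unique_count = len(unique)
    calls_saved = len(addresses) - unique_count
    return unique_count, calls_saved


# Toy data: three records, two of which share an address
sample = [
    "5 Park Center Ct, Ste 200, Owings Mills, MD 21117",
    "5 Park Center Ct, Ste 200, Owings Mills, MD 21117",
    "3301 Belair Rd, Baltimore, MD 21213",
]
unique_count, calls_saved = dedup_savings(sample)
print(unique_count, calls_saved)  # 2 1
```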


In [6]:
# Filter to only successful geocoding results for export
successful_geocodes_df = geocoded_df[
    geocoded_df['Latitude'].notna() & geocoded_df['Longitude'].notna()
].copy()

# Remove the Address column for cleaner export (keeping only the essential data)
export_df = successful_geocodes_df[['Full Name', 'Person ID', 'Latitude', 'Longitude']].copy()

print(f"Exporting {len(export_df)} successfully geocoded records")
export_df

Exporting 10 successfully geocoded records


Unnamed: 0,Full Name,Person ID,Latitude,Longitude
22,Baltimore Work Rehab – GBO,996748127.0,39.178524,-76.615258
38,"Maryland Physicians Associates - Baltimore, MD",993905608.0,39.321578,-76.573221
62,"Pivot Physical Therapy - CHESAPEAKE BEACH, MD",996747437.0,38.697974,-76.534512
70,"Pivot Physical Therapy - ELKTON, MD",996747453.0,39.610906,-75.835989
71,"Pivot Physical Therapy - FREDERICKSBURG, VA (E...",996747504.0,38.278254,-77.490196
76,"Pivot Physical Therapy - HAGERSTOWN, MD (VALLE...",996747476.0,39.624999,-77.771133
79,"Pivot Physical Therapy - HAMPTON, VA (EXECUTIVE)",996747510.0,37.047219,-76.396602
81,"Pivot Physical Therapy - HAYMARKET, VA",996747511.0,38.81822,-77.645557
83,"Pivot Physical Therapy - LANSDOWNE, MD",996747465.0,39.25062,-76.66674
89,"Pivot Physical Therapy - NEWPORT NEWS, VA (DEN...",996747520.0,37.137515,-76.521689


In [7]:
# Export successfully geocoded contacts
export_df.to_excel('../data/processed/Geocoded_Contacts.xlsx', index=False)

# Also save the full results (including failed attempts) for reference
geocoded_df.to_excel('../data/processed/All_Geocoding_Results.xlsx', index=False)

print("✅ Export completed!")
print("- Geocoded_Contacts.xlsx: Successfully geocoded records only")
print("- All_Geocoding_Results.xlsx: Complete results including failed attempts")

✅ Export completed!
- Geocoded_Contacts.xlsx: Successfully geocoded records only
- All_Geocoding_Results.xlsx: Complete results including failed attempts


In [8]:
# Optional: Retry failed geocoding attempts with alternative strategies

failed_addresses = geocoded_df[
    geocoded_df['Latitude'].isna() | geocoded_df['Longitude'].isna()
]['Address'].drop_duplicates().tolist()

if failed_addresses:
    print(f"Found {len(failed_addresses)} failed addresses. Attempting alternative geocoding strategies...")
    
    class EnhancedGeocoder(OptimizedGeocoder):
        def geocode_with_fallbacks(self, address):
            """Try multiple geocoding strategies for difficult addresses."""
            if not address or pd.isna(address):
                return None
                
            # Strategy 1: Original address
            result = self.geocode_single(address)
            if result:
                return result
            
            # Strategy 2: Expand abbreviations as whole words only, so street
            # names that merely start with "Ste" or "Apt" are left untouched
            import re
            cleaned_address = re.sub(r'\bSte\b\.?', 'Suite', address.replace('#', ''))
            cleaned_address = re.sub(r'\bApt\b\.?', 'Apartment', cleaned_address)
            if cleaned_address != address:
                result = self.geocode_single(cleaned_address)
                if result:
                    return result
            
            # Strategy 3: Drop the suite/unit/apartment component, including
            # its leading comma, e.g. ", Ste 104-A" or ", #C3"
            simplified_address = re.sub(
                r',?\s*(?:\b(?:Suite|Ste|Apt|Apartment|Unit)\b\.?\s*|#\s*)[\w-]+',
                '', address, flags=re.IGNORECASE
            ).strip()
            if simplified_address != address and len(simplified_address) > 10:
                result = self.geocode_single(simplified_address)
                if result:
                    return result
            
            return None
    
    # Try enhanced geocoding for failed addresses
    enhanced_geocoder = EnhancedGeocoder()
    retry_results = []
    
    for address in tqdm(failed_addresses, desc="Retrying failed addresses"):
        location = enhanced_geocoder.geocode_with_fallbacks(address)
        if location:
            retry_results.append({
                'address': address,
                'latitude': location.latitude,
                'longitude': location.longitude
            })
    
    if retry_results:
        print(f"✅ Successfully geocoded {len(retry_results)} additional addresses on retry!")
        retry_df = pd.DataFrame(retry_results)
        
        # Update the original results with successful retries
        for _, retry_row in retry_df.iterrows():
            mask = geocoded_df['Address'] == retry_row['address']
            geocoded_df.loc[mask, 'Latitude'] = retry_row['latitude']
            geocoded_df.loc[mask, 'Longitude'] = retry_row['longitude']
        
        # Save updated cache
        enhanced_geocoder._save_cache()
        
        print("Updated results with retry successes.")
    else:
        print("No additional addresses could be geocoded with enhanced strategies.")
else:
    print("🎉 No failed addresses to retry - all geocoding was successful!")

Found 104 failed addresses. Attempting alternative geocoding strategies...


Retrying failed addresses:   0%|          | 0/104 [00:00<?, ?it/s]

✅ Successfully geocoded 31 additional addresses on retry!
Updated results with retry successes.
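
The fallback idea can also be expressed as a generator of progressively simpler address strings, which makes the rewriting rules easy to test without calling the geocoder. The sketch below is an illustration under assumptions: `address_variants` and `UNIT_PATTERN` are hypothetical names, and the pattern covers only the unit markers that appear in this dataset (`Ste`, `Suite`, `Apt`, `Apartment`, `Unit`, `#`).

```python
import re

# Matches an optional comma, a unit marker, and the unit identifier after it
UNIT_PATTERN = re.compile(
    r",?\s*(?:\b(?:Suite|Ste|Apt|Apartment|Unit)\b\.?\s*|#\s*)[\w-]+",
    flags=re.IGNORECASE,
)


def address_variants(address):
    """Yield progressively simpler forms of `address`, without repeats."""
    seen = set()
    candidates = [
        address,                                                   # as given
        re.sub(r"\bSte\b\.?", "Suite", address.replace("#", "")),  # expanded
        UNIT_PATTERN.sub("", address).strip(),                     # unit removed
    ]
    for candidate in candidates:
        if candidate and candidate not in seen:
            seen.add(candidate)
            yield candidate


variants = list(address_variants("2315 Bel Air Rd, #C3, Fallston, MD 21047-2703"))
for v in variants:
    print(v)
# 2315 Bel Air Rd, #C3, Fallston, MD 21047-2703
# 2315 Bel Air Rd, C3, Fallston, MD 21047-2703
# 2315 Bel Air Rd, Fallston, MD 21047-2703
```

A retry loop can then try each variant in order and stop at the first one that returns a location.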


In [9]:
retry_df = retry_df.rename(columns={
    'address': 'Address',
    'latitude': 'Latitude',
    'longitude': 'Longitude',
})
retry_df

Unnamed: 0,Address,Latitude,Longitude
0,"4000 Mitchellville Road, Ste A214, Bowie, MD 2...",38.94497,-76.723258
1,"810 Landmark Drive, Ste 110, Glen Burnie, MD 2...",39.147324,-76.640452
2,"9470 Annapolis Road, Ste 403, Lanham, MD 20706",38.965486,-76.845115
3,"10301 Georgia Avenue, Ste 105A, Silver Spring,...",39.02374,-77.045416
4,"1829 Howell Rd, Ste 4, Hagerstown, MD 21740",39.617553,-77.695009
5,"6615 Reisterstown Road, Ste 109, Baltimore, MD...",39.357475,-76.703968
6,"7411 Riggs Road, Ste 108, Langley Park, MD 20783",39.001987,-76.972214
7,"5 Park Center Ct, Ste 200, Owings Mills, MD 21117",39.398659,-76.753959
8,"9831 Greenbelt Rd, Ste 208, Lanham, MD 20706",38.989865,-76.835014
9,"10285 Little Patuxent Parkway, Ste 400, Columb...",39.213146,-76.855801


In [10]:
df

Unnamed: 0,Full Name,Person ID,Contact's Work Address,Contact's Work Phone,Contact's Details: Latitude,Contact's Details: Longitude,Contact's Details: Specialty,Custom
0,ATI Physical Therapy - Fallston,996736879.0,"2315 Bel Air Rd, #C3, Fallston, MD 21047-2703",(443) 417-2499,,,Chiropractic/Physical Therapy,1
1,ATI Physical Therapy - Federal Hill,996736666.0,"871 E Fort Ave, Unit 17, Baltimore, MD 21230",(410) 622-3310,,,Chiropratic/Physical Therapy,1
2,ATI Physical Therapy - Forestville,996736881.0,"2950 Donnell Dr, Unit G, Forestville, MD 20747...",(240) 492-3410,,,Chiropractic/Physical Therapy,1
3,ATI Physical Therapy - Germantown,996736889.0,"21030H Frederick Rd, Ste 1038, Germantown, MD ...",(240) 243-0048,,,Chiropractic/Physical Therapy,1
4,"ATI Physical Therapy - Hyattsville, MD",996736902.0,"2900 Belcrest Center Dr, Ste 104-A, Hyattsvill...",(301) 276-3295,,,Chiropractic/Physical Therapy,1
...,...,...,...,...,...,...,...,...
111,South Point Medical Solutions,996748058.0,"5 Park Center Ct, Ste 200, Owings Mills, MD 21117",410-358-2518,,,Medical Equipment & Medical Supplies,1
112,Terrapin Care Center,993887329.0,"9658 Baltimore Ave, Ste 420, College Park, MD ...",(301) 220-1930,,,Physical Therapy,1
113,The Centers for Advanced Orthopaedics - Parkwa...,996738962.0,"13 Western MD Parkway, Ste 104, Hagerstown, MD...",(301) 665-4575,,,,1
114,The Orthopaedic Center - Germantown,996738953.0,"12850 Middlebrook Road, Ste 307, Germantown, M...",(301) 251-1433,,,,1


In [11]:
retry_df = pd.merge(df, retry_df, left_on="Contact's Work Address", right_on='Address', how='right')
retry_df = retry_df[['Full Name', 'Person ID', 'Latitude', 'Longitude']]
retry_df

Unnamed: 0,Full Name,Person ID,Latitude,Longitude
0,Anne Arundel Orthopaedic Surgeons - Bowie,996738884.0,38.94497,-76.723258
1,Anne Arundel Orthopaedic Surgeons - Glen Burnie,996740570.0,39.147324,-76.640452
2,Enid Cruise,993703766.0,38.965486,-76.845115
3,Joel Fechter,993715273.0,39.02374,-77.045416
4,Mid-MD Musculoskeletal Institute - Hagerstown,996738938.0,39.617553,-77.695009
5,"Maryland Physicians Associates - Baltimore, MD",993716066.0,39.357475,-76.703968
6,"Maryland Physicians Associates - Langley Park, MD",996605068.0,39.001987,-76.972214
7,"Maryland Physicians Associates - Owings Mills, MD",993904919.0,39.398659,-76.753959
8,South Point Medical Solutions,996748058.0,39.398659,-76.753959
9,Maryland Primary and Urgent Care,996742399.0,38.989865,-76.835014
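
The merge on address is one-to-many: when several contacts share an address, each of them receives the same coordinates, as rows 7 and 8 above show. A plain-Python analogue of that join is sketched below. The dict and list structures and the name `attach_coordinates` are illustrative stand-ins for the DataFrames, and the contact names are placeholders.

```python
def attach_coordinates(contacts, coords_by_address):
    """Return one output row per contact whose address has coordinates.

    `contacts` is a list of dicts with 'name' and 'address' keys;
    `coords_by_address` maps an address string to a (lat, lon) tuple.
    """
    rows = []
    for contact in contacts:
        coords = coords_by_address.get(contact["address"])
        if coords is not None:       # contacts without a match are skipped
            rows.append({"name": contact["name"], "lat": coords[0], "lon": coords[1]})
    return rows


shared_address = "5 Park Center Ct, Ste 200, Owings Mills, MD 21117"
contacts = [
    {"name": "Contact A", "address": shared_address},
    {"name": "Contact B", "address": shared_address},
    {"name": "Contact C", "address": "1 Unmatched Rd"},
]
coords_by_address = {shared_address: (39.398659, -76.753959)}
merged = attach_coordinates(contacts, coords_by_address)
print(len(merged))  # 2: both contacts at the shared address get coordinates
```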


In [12]:
full_results = pd.concat([export_df, retry_df], ignore_index=True, axis=0)
full_results

Unnamed: 0,Full Name,Person ID,Latitude,Longitude
0,Baltimore Work Rehab – GBO,996748127.0,39.178524,-76.615258
1,"Maryland Physicians Associates - Baltimore, MD",993905608.0,39.321578,-76.573221
2,"Pivot Physical Therapy - CHESAPEAKE BEACH, MD",996747437.0,38.697974,-76.534512
3,"Pivot Physical Therapy - ELKTON, MD",996747453.0,39.610906,-75.835989
4,"Pivot Physical Therapy - FREDERICKSBURG, VA (E...",996747504.0,38.278254,-77.490196
5,"Pivot Physical Therapy - HAGERSTOWN, MD (VALLE...",996747476.0,39.624999,-77.771133
6,"Pivot Physical Therapy - HAMPTON, VA (EXECUTIVE)",996747510.0,37.047219,-76.396602
7,"Pivot Physical Therapy - HAYMARKET, VA",996747511.0,38.81822,-77.645557
8,"Pivot Physical Therapy - LANSDOWNE, MD",996747465.0,39.25062,-76.66674
9,"Pivot Physical Therapy - NEWPORT NEWS, VA (DEN...",996747520.0,37.137515,-76.521689


In [15]:
full_results = full_results.drop_duplicates()
full_results

Unnamed: 0,Full Name,Person ID,Latitude,Longitude
0,Baltimore Work Rehab – GBO,996748127.0,39.178524,-76.615258
1,"Maryland Physicians Associates - Baltimore, MD",993905608.0,39.321578,-76.573221
2,"Pivot Physical Therapy - CHESAPEAKE BEACH, MD",996747437.0,38.697974,-76.534512
3,"Pivot Physical Therapy - ELKTON, MD",996747453.0,39.610906,-75.835989
4,"Pivot Physical Therapy - FREDERICKSBURG, VA (E...",996747504.0,38.278254,-77.490196
5,"Pivot Physical Therapy - HAGERSTOWN, MD (VALLE...",996747476.0,39.624999,-77.771133
6,"Pivot Physical Therapy - HAMPTON, VA (EXECUTIVE)",996747510.0,37.047219,-76.396602
7,"Pivot Physical Therapy - HAYMARKET, VA",996747511.0,38.81822,-77.645557
8,"Pivot Physical Therapy - LANSDOWNE, MD",996747465.0,39.25062,-76.66674
9,"Pivot Physical Therapy - NEWPORT NEWS, VA (DEN...",996747520.0,37.137515,-76.521689


In [16]:
full_results.to_excel('../data/processed/Geocoded_Contacts.xlsx', index=False)