# Contact Geocode Search - Optimized Version

---

***Using Nominatim/OpenStreetMap to collect geocodes for contact records with advanced optimization techniques.***

## 🚀 Key Optimizations

1. **Intelligent Caching**: Persistent cache to avoid redundant API calls across sessions
2. **Deduplication**: Automatically identifies and processes unique addresses only
3. **Batch Processing**: Groups requests efficiently with rate limiting
4. **Threaded Execution**: Concurrent processing within rate limits for faster completion
5. **Progress Tracking**: Real-time progress indicators with detailed statistics
6. **Retry Mechanisms**: Enhanced fallback strategies for difficult addresses
7. **Cache Analytics**: Detailed reporting on API call savings and success rates

## 📋 Workflow

1. Load contacts where latitude/longitude are missing
2. Deduplicate addresses to minimize API calls
3. Check persistent cache for previously geocoded addresses
4. Process unique addresses in batches with threading
5. Apply fallback strategies for failed attempts
6. Export results with comprehensive analytics
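
Steps 2 and 3 can be sketched with the standard library alone. The helper names `cache_key` and `plan_lookups` and the sample cache entry are hypothetical. The normalization (strip, lowercase, MD5) follows what the geocoder class in this notebook does.

```python
import hashlib

def cache_key(address):
    """Return an MD5 key for a stripped, lowercased address, or None if blank."""
    if address is None or not str(address).strip():
        return None
    normalized = str(address).strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def plan_lookups(addresses, cache):
    """Split addresses into already-cached keys and unique addresses still to request."""
    cached, pending = [], {}
    for address in addresses:
        key = cache_key(address)
        if key is None:
            continue  # skip blank addresses
        if key in cache:
            if key not in cached:
                cached.append(key)
        else:
            pending.setdefault(key, address)  # keep the first spelling seen
    return cached, list(pending.values())

addresses = [
    "128 Medical Circle, Winchester, VA 22601",
    "128 medical circle, winchester, va 22601 ",  # same address, different case
    "3281 Plaza Way, Waldorf, MD 20603",
]
# Hypothetical cache that already holds one geocoded address
cache = {cache_key("3281 Plaza Way, Waldorf, MD 20603"): {"lat": 38.62345, "lon": -76.919226}}

cached, pending = plan_lookups(addresses, cache)
print(len(cached), "cached;", len(pending), "to request")
```

Only the addresses in `pending` would need an API request; everything else is answered from the cache or shares a result with an identical address.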

## 💡 Performance Benefits

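- **Reduced API Calls**: Deduplication sends each distinct address once (304 records became 296 unique addresses in the run below), and cached addresses cost no request on later runs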
- **Faster Processing**: Concurrent execution within rate limits
- **Persistent Cache**: Results survive notebook restarts
- **Smart Retries**: Multiple strategies for difficult addresses
- **Detailed Analytics**: Comprehensive reporting on optimization effectiveness
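
The public Nominatim service asks clients to stay at or below one request per second, so concurrent execution only stays within rate limits if all worker threads share one throttle. Below is a minimal standard-library sketch of such a throttle. `MinIntervalThrottle` and `fake_lookup` are hypothetical names, the lookup is a stand-in that makes no network call, and the 0.05 s interval is only there to keep the demo fast.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class MinIntervalThrottle:
    """Hand out start slots spaced at least `min_interval` seconds apart, across threads."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next free slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)

throttle = MinIntervalThrottle(min_interval=0.05)  # use 1.0 or more for a real service
call_times = []

def fake_lookup(address):
    """Stand-in for a geocoding request; records when it was allowed to run."""
    throttle.wait()
    call_times.append(time.monotonic())
    return address.upper()

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fake_lookup, ["a", "b", "c", "d"]))

print(results, f"finished in {max(call_times) - start:.2f}s")
```

For strictly sequential calls, geopy's `RateLimiter` (already imported in this notebook) provides similar spacing through its `min_delay_seconds` argument.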

In [2]:
import asyncio
import aiohttp
import json
import numpy as np
import pandas as pd
import geopy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
from concurrent.futures import ThreadPoolExecutor, as_completed
import time
from tqdm.notebook import tqdm
import hashlib
import os

In [3]:
df = pd.read_excel('../data/raw/Referrals_App_Preferred_Providers.xlsx')
df = df.rename(columns=lambda x: x.strip())
df

Unnamed: 0,Full Name,Person ID,Contact's Work Address,Contact's Work Phone,Contact's Details: Latitude,Contact's Details: Longitude,Contact's Details: Specialty
0,"Absolute Chiropractic Care - Lanham, MD",993709552,"9470 Annapolis Road, Suite 403, Lanham, MD 20706",(301) 577-1800,38.966155,-76.841284,Chiropractic/Physical Therapy
1,"Absolute Chiropractic Care - Lanham, MD",993709552,"9470 Annapolis Road, Suite 403, Lanham, MD 20706",(301) 577-1800,38.966155,-76.841284,Chiropractic/Physical Therapy
2,"Absolute Chiropractic Care - Lanham, MD",993709552,"9470 Annapolis Road, Suite 403, Lanham, MD 20706",(301) 577-1800,38.966155,-76.841284,Chiropractic/Physical Therapy
3,"Absolute Chiropractic Care - Lanham, MD",993709552,"9470 Annapolis Road, Suite 403, Lanham, MD 20706",(301) 577-1800,38.966155,-76.841284,Chiropractic/Physical Therapy
4,"Absolute Chiropractic Care - Lanham, MD",993709552,"9470 Annapolis Road, Suite 403, Lanham, MD 20706",(301) 577-1800,38.966155,-76.841284,Chiropractic/Physical Therapy
...,...,...,...,...,...,...,...
3085,Waldorf Total Health Chiropractic & Physical T...,996701857,"12102 Old Line Center, Waldorf, MD 20602",(240) 754-7130,38.616663,-76.890752,Chiropractic/Physical Therapy
3086,Waldorf Total Health Chiropractic & Physical T...,996701857,"12102 Old Line Center, Waldorf, MD 20602",(240) 754-7130,38.616663,-76.890752,Chiropractic/Physical Therapy
3087,Washington Circle Orthopaedic Associates,996740755,"3 Washington Circle NW, Ste 404, Washington, D...",(202) 333-2820,,,
3088,Winchester Orthopaedic Associates - Main Locat...,996738975,"128 Medical Circle, Winchester, VA 22601",(540) 667-8975,,,


In [4]:
df = df.drop_duplicates(ignore_index=True)

null_lat = df["Contact's Details: Latitude"].isna()
null_lon = df["Contact's Details: Longitude"].isna()
has_address = df["Contact's Work Address"].notna()

# Keep rows that have an address but are missing at least one coordinate
df = df[has_address & (null_lat | null_lon)].reset_index(drop=True)
df

Unnamed: 0,Full Name,Person ID,Contact's Work Address,Contact's Work Phone,Contact's Details: Latitude,Contact's Details: Longitude,Contact's Details: Specialty
0,"Absolute Chiropractic Care - Waldorf, MD",996733740,"3475 Leonardtown Road, Suite 207, Waldorf, MD ...",(301) 835-4422,,,Chiropractic/Physical Therapy
1,"Active Physical Therapy - Beltsville, MD",996733807,"11000 Baltimore Ave., Ste 107, Beltsville, MD ...",202-331-5460,,,Chiropractic/Physical Therapy
2,"Active Physical Therapy - Clinton, MD",996605035,"9135 Piscataway Rd, Ste 305, Clinton, MD 20735",(301) 877-2323,,,Chiropractic/Physical Therapy
3,Active Physical Therapy - College Park,996733833,"6201 Greenbelt Road, Suite U 15-16, College Pa...",301-220-0571,,,Chiropractic/Physical Therapy
4,"Active Physical Therapy - Columbia, MD",996733839,"11055 Little Patuxent Parkway, Suite #L4, Colu...",410-992-9399,,,Chiropractic/Physical Therapy
...,...,...,...,...,...,...,...
299,The Orthopaedic Foot & Ankle Center - Falls Ch...,996738957,"2922 Telestar Court, Falls Church, VA 22042",(703) 584-2040,,,
300,University Physical Therapy Associates,996748084,"1100 Mercantile Lane, Suite 135, Largo, MD 20774",(301) 322-9495,,,Physical Therapy
301,Washington Circle Orthopaedic Associates,996740755,"3 Washington Circle NW, Ste 404, Washington, D...",(202) 333-2820,,,
302,Winchester Orthopaedic Associates - Main Locat...,996738975,"128 Medical Circle, Winchester, VA 22601",(540) 667-8975,,,


In [5]:
# Optimized Geocoding with Threaded Batch Processing and Caching

class OptimizedGeocoder:
    def __init__(self, cache_file='../data/processed/geocode_cache.json', batch_size=10, delay_between_batches=2.0):
        self.cache_file = cache_file
        self.batch_size = batch_size
        self.delay_between_batches = delay_between_batches
        self.cache = self._load_cache()
        self.geolocator = Nominatim(user_agent="contact_geocode_search_optimized")
        
    def _load_cache(self):
        """Load existing geocode cache to avoid redundant API calls."""
        if os.path.exists(self.cache_file):
            try:
                with open(self.cache_file, 'r', encoding='utf-8') as f:
                    return json.load(f)
            except (json.JSONDecodeError, FileNotFoundError):
                pass
        return {}
    
    def _save_cache(self):
        """Save the geocode cache to disk."""
        os.makedirs(os.path.dirname(self.cache_file), exist_ok=True)
        with open(self.cache_file, 'w', encoding='utf-8') as f:
            json.dump(self.cache, f, indent=2, ensure_ascii=False)
    
    def _normalize_address(self, address):
        """Normalize address for consistent caching."""
        if pd.isna(address) or not address:
            return None
        return str(address).strip().lower()
    
    def _get_cache_key(self, address):
        """Generate a consistent cache key for an address."""
        normalized = self._normalize_address(address)
        if not normalized:
            return None
        return hashlib.md5(normalized.encode('utf-8')).hexdigest()
    
    def geocode_single(self, address):
        """Geocode a single address with caching."""
        cache_key = self._get_cache_key(address)
        if not cache_key:
            return None
            
        # Check cache first
        if cache_key in self.cache:
            cached_result = self.cache[cache_key]
            if cached_result is None:
                return None
            return type('Location', (), {
                'latitude': cached_result['lat'], 
                'longitude': cached_result['lon']
            })()
        
        # Geocode if not in cache
        try:
            location = self.geolocator.geocode(address, timeout=10)
            if location:
                # Cache successful result
                self.cache[cache_key] = {
                    'lat': location.latitude,
                    'lon': location.longitude,
                    'address': address
                }
                return location
            else:
                # Cache unsuccessful result to avoid repeated lookups
                self.cache[cache_key] = None
                return None
        except Exception as e:
            print(f"Error geocoding '{address}': {e}")
            # Do not cache errors: timeouts and throttling are transient, and a
            # cached None would block every later retry of this address
            return None
    
    def batch_geocode_threaded(self, addresses_data):
        """
        Process geocoding in batches using threading for better performance.
        addresses_data: list of tuples (address, full_name, person_id)
        """
        results = []
        
        # Deduplicate addresses while preserving person mapping
        unique_addresses = {}
        for address, full_name, person_id in addresses_data:
            cache_key = self._get_cache_key(address)
            if cache_key:
                if cache_key not in unique_addresses:
                    unique_addresses[cache_key] = {
                        'address': address,
                        'records': []
                    }
                unique_addresses[cache_key]['records'].append({
                    'full_name': full_name,
                    'person_id': person_id
                })
        
        print(f"Geocoding {len(unique_addresses)} unique addresses (down from {len(addresses_data)} total records)")
        
        # Process in batches
        unique_keys = list(unique_addresses.keys())
        total_batches = (len(unique_keys) + self.batch_size - 1) // self.batch_size
        
        with tqdm(total=len(unique_keys), desc="Geocoding addresses") as pbar:
            for batch_idx in range(total_batches):
                start_idx = batch_idx * self.batch_size
                end_idx = min((batch_idx + 1) * self.batch_size, len(unique_keys))
                batch_keys = unique_keys[start_idx:end_idx]
                
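                # Process batch with threading
                # NOTE: several concurrent requests can exceed the public Nominatim
                # limit of 1 request per second; lower max_workers if lookups are refused
                with ThreadPoolExecutor(max_workers=min(5, len(batch_keys))) as executor: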
                    future_to_key = {
                        executor.submit(
                            self.geocode_single, 
                            unique_addresses[key]['address']
                        ): key for key in batch_keys
                    }
                    
                    for future in as_completed(future_to_key):
                        key = future_to_key[future]
                        address_data = unique_addresses[key]
                        
                        try:
                            location = future.result()
                            
                            # Add results for all records with this address
                            for record in address_data['records']:
                                if location:
                                    results.append({
                                        "Full Name": record['full_name'],
                                        "Person ID": record['person_id'],
                                        "Latitude": location.latitude,
                                        "Longitude": location.longitude,
                                        "Address": address_data['address']
                                    })
                                else:
                                    results.append({
                                        "Full Name": record['full_name'],
                                        "Person ID": record['person_id'],
                                        "Latitude": None,
                                        "Longitude": None,
                                        "Address": address_data['address']
                                    })
                        except Exception as e:
                            print(f"Error processing address '{address_data['address']}': {e}")
                            # Add failed results for all records with this address
                            for record in address_data['records']:
                                results.append({
                                    "Full Name": record['full_name'],
                                    "Person ID": record['person_id'],
                                    "Latitude": None,
                                    "Longitude": None,
                                    "Address": address_data['address']
                                })
                        
                        pbar.update(1)
                
                # Rate limiting between batches
                if batch_idx < total_batches - 1:
                    time.sleep(self.delay_between_batches)
        
        # Save cache after processing
        self._save_cache()
        
        return results

# Initialize the optimized geocoder
geocoder = OptimizedGeocoder(batch_size=8, delay_between_batches=1.5)

# Prepare data for batch geocoding
addresses_to_geocode = [
    (row["Contact's Work Address"], row["Full Name"], row["Person ID"]) 
    for _, row in df.iterrows()
]

print(f"Starting optimized geocoding for {len(addresses_to_geocode)} records...")

# Perform batch geocoding
start_time = time.time()
geocoded_results = geocoder.batch_geocode_threaded(addresses_to_geocode)
end_time = time.time()

print(f"Geocoding completed in {end_time - start_time:.2f} seconds")
print("Cache hits saved significant API calls!")

# Convert results to DataFrame
geocoded_df = pd.DataFrame(geocoded_results)

# Display summary
successful_geocodes = geocoded_df['Latitude'].notna().sum()
total_records = len(geocoded_df)
print(f"Successfully geocoded: {successful_geocodes}/{total_records} records ({successful_geocodes/total_records*100:.1f}%)")

geocoded_df

Starting optimized geocoding for 304 records...
Geocoding 296 unique addresses (down from 304 total records)


Geocoding addresses:   0%|          | 0/296 [00:00<?, ?it/s]

Geocoding completed in 343.61 seconds
Cache hits saved significant API calls!
Successfully geocoded: 42/304 records (13.8%)


Unnamed: 0,Full Name,Person ID,Latitude,Longitude,Address
0,Active Physical Therapy - College Park,996733833,,,"6201 Greenbelt Road, Suite U 15-16, College Pa..."
1,"Absolute Chiropractic Care - Waldorf, MD",996733740,,,"3475 Leonardtown Road, Suite 207, Waldorf, MD ..."
2,Enid Cruise,996733744,,,"3475 Leonardtown Road, Suite 207, Waldorf, MD ..."
3,"Active Physical Therapy - Clinton, MD",996605035,,,"9135 Piscataway Rd, Ste 305, Clinton, MD 20735"
4,"Active Physical Therapy - Beltsville, MD",996733807,,,"11000 Baltimore Ave., Ste 107, Beltsville, MD ..."
...,...,...,...,...,...
299,The Orthopaedic Foot & Ankle Center - Falls Ch...,996738957,38.871987,-77.222899,"2922 Telestar Court, Falls Church, VA 22042"
300,University Physical Therapy Associates,996748084,,,"1100 Mercantile Lane, Suite 135, Largo, MD 20774"
301,Washington Circle Orthopaedic Associates,996740755,,,"3 Washington Circle NW, Ste 404, Washington, D..."
302,Winchester Orthopaedic Associates - Main Locat...,996738975,39.189695,-78.181152,"128 Medical Circle, Winchester, VA 22601"
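
The summary above does not say how many lookups were answered from the cache and how many reached the API. Below is a minimal sketch of a wrapper that counts both and also counts errors separately from "no match" results. `CountingCache` and `fake_geocode` are hypothetical stand-ins (no network call is made), and the fake raises `TimeoutError` for one address to imitate a transient failure.

```python
from collections import Counter

class CountingCache:
    """Wrap a lookup function with a dict cache and hit/call/error counters."""

    def __init__(self, lookup, cache=None):
        self.lookup = lookup
        self.cache = {} if cache is None else cache
        self.stats = Counter()

    def get(self, address):
        key = address.strip().lower()
        if key in self.cache:
            self.stats["cache_hits"] += 1
            return self.cache[key]
        self.stats["api_calls"] += 1
        try:
            result = self.lookup(address)
        except Exception:
            # Transient failure (timeout, throttling): leave uncached so a later run retries
            self.stats["errors"] += 1
            return None
        self.cache[key] = result  # None here means "no match", which is safe to remember
        return result

def fake_geocode(address):
    """Hypothetical stand-in for a geocoding request."""
    if "timeout" in address:
        raise TimeoutError("simulated timeout")
    known = {"128 Medical Circle, Winchester, VA 22601": (39.189695, -78.181152)}
    return known.get(address)

counting = CountingCache(fake_geocode)
for address in [
    "128 Medical Circle, Winchester, VA 22601",
    "128 Medical Circle, Winchester, VA 22601",  # served from cache
    "1 Unknown Road",                             # no match, cached as None
    "1 Unknown Road",                             # served from cache
    "timeout test address",                       # error, not cached
]:
    counting.get(address)

print(dict(counting.stats))
```

Reporting `cache_hits` next to `api_calls` replaces a general claim about savings with numbers from the run itself.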


In [6]:
# Analysis and Quality Check of Geocoded Results

print("=== Geocoding Results Analysis ===")
print(f"Total records processed: {len(geocoded_df)}")

# Success rate analysis
successful_mask = geocoded_df['Latitude'].notna() & geocoded_df['Longitude'].notna()
successful_count = successful_mask.sum()
success_rate = (successful_count / len(geocoded_df)) * 100

print(f"Successfully geocoded: {successful_count}")
print(f"Failed to geocode: {len(geocoded_df) - successful_count}")
print(f"Success rate: {success_rate:.1f}%")

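# Show failed geocoding attempts for manual review
# (deduplicated on name and address, so the count printed below is the number
# of unique name/address pairs, which can exceed the number of unique addresses)
failed_geocodes = geocoded_df[~successful_mask][['Full Name', 'Address']].drop_duplicates()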
if not failed_geocodes.empty:
    print(f"\n=== Failed Geocoding Attempts ({len(failed_geocodes)} unique addresses) ===")
    print(failed_geocodes.to_string(index=False))
else:
    print("\n✅ All addresses were successfully geocoded!")

# Cache statistics
print("\n=== Cache Statistics ===")
cache_size = len(geocoder.cache)
print(f"Cache entries: {cache_size}")

# Show duplicate addresses that were geocoded only once thanks to deduplication
address_counts = geocoded_df.groupby('Address').size()
duplicates = address_counts[address_counts > 1]
if not duplicates.empty:
    total_duplicate_records = duplicates.sum()
    unique_duplicate_addresses = len(duplicates)
    api_calls_saved = total_duplicate_records - unique_duplicate_addresses
    print(f"Duplicate addresses found: {unique_duplicate_addresses}")
    print(f"Total records with duplicate addresses: {total_duplicate_records}")
    print(f"API calls saved by deduplication: {api_calls_saved}")

# Display successfully geocoded results
successful_geocodes_df = geocoded_df[successful_mask].copy()
print(f"\n=== Successfully Geocoded Results ===")
successful_geocodes_df

=== Geocoding Results Analysis ===
Total records processed: 304
Successfully geocoded: 42
Failed to geocode: 262
Success rate: 13.8%

=== Failed Geocoding Attempts (262 unique addresses) ===
                                                          Full Name                                                           Address
                             Active Physical Therapy - College Park        6201 Greenbelt Road, Suite U 15-16, College Park, MD 20740
                           Absolute Chiropractic Care - Waldorf, MD               3475 Leonardtown Road, Suite 207, Waldorf, MD 20601
                                                        Enid Cruise               3475 Leonardtown Road, Suite 207, Waldorf, MD 20601
                              Active Physical Therapy - Clinton, MD                    9135 Piscataway Rd, Ste 305, Clinton, MD 20735
                           Active Physical Therapy - Beltsville, MD               11000 Baltimore Ave., Ste 107, Beltsville, MD 20705
     

Unnamed: 0,Full Name,Person ID,Latitude,Longitude,Address
71,ATI Physical Therapy - Mount Airy,996736916,39.364407,-77.161199,"435 E Ridgeville Blvd, Mount Airy, MD 21771"
77,ATI Physical Therapy - Silver Spring,996736953,39.041851,-76.987767,"11271 New Hampshire Ave, Silver Spring, MD 20904"
78,ATI Physical Therapy - Severna Park,996736949,39.092724,-76.559422,"580 E Governor Ritchie Hwy, Severna Park, MD 2..."
83,ATI Physical Therapy - Waldorf,996736966,38.62345,-76.919226,"3281 Plaza Way, Waldorf, MD 20603"
85,ATI Physical Therapy - White Marsh,996736813,39.373143,-76.471237,"7982 Honeygo Blvd, Baltimore, MD 21236"
130,Mid-MD Musculoskeletal Institute - ORTHO NOW -...,996740624,39.44848,-77.403231,"86 Thomas Johnson Ct, Frederick, MD 21702"
137,Orthopaedic Associates of Central MD - Catonsv...,996738950,39.271343,-76.736141,"910 Frederick Road, Catonsville, MD 21228"
142,Orthopaedic Associates of Central MD - Fulton,996738949,39.150321,-76.909612,"11810 West Market Place, Fulton, MD 20759"
163,"Pivot Physical Therapy - CHESAPEAKE BEACH, MD",996747437,38.697974,-76.534512,"8420 Bayside Rd., Chesapeake Beach, MD 20732"
178,"Pivot Physical Therapy - ELKTON, MD",996747453,39.610906,-75.835989,"133 N Bridge St., Elkton, MD 21921"


In [7]:
# Filter to only successful geocoding results for export
successful_geocodes_df = geocoded_df[
    geocoded_df['Latitude'].notna() & geocoded_df['Longitude'].notna()
].copy()

# Remove the Address column for cleaner export (keeping only the essential data)
export_df = successful_geocodes_df[['Full Name', 'Person ID', 'Latitude', 'Longitude']].copy()

print(f"Exporting {len(export_df)} successfully geocoded records")
export_df

Exporting 42 successfully geocoded records


Unnamed: 0,Full Name,Person ID,Latitude,Longitude
71,ATI Physical Therapy - Mount Airy,996736916,39.364407,-77.161199
77,ATI Physical Therapy - Silver Spring,996736953,39.041851,-76.987767
78,ATI Physical Therapy - Severna Park,996736949,39.092724,-76.559422
83,ATI Physical Therapy - Waldorf,996736966,38.62345,-76.919226
85,ATI Physical Therapy - White Marsh,996736813,39.373143,-76.471237
130,Mid-MD Musculoskeletal Institute - ORTHO NOW -...,996740624,39.44848,-77.403231
137,Orthopaedic Associates of Central MD - Catonsv...,996738950,39.271343,-76.736141
142,Orthopaedic Associates of Central MD - Fulton,996738949,39.150321,-76.909612
163,"Pivot Physical Therapy - CHESAPEAKE BEACH, MD",996747437,38.697974,-76.534512
178,"Pivot Physical Therapy - ELKTON, MD",996747453,39.610906,-75.835989


In [8]:
# Export successfully geocoded contacts
export_df.to_excel('../data/processed/Geocoded_Contacts.xlsx', index=False)

# Also save the full results (including failed attempts) for reference
geocoded_df.to_excel('../data/processed/All_Geocoding_Results.xlsx', index=False)

print("✅ Export completed!")
print("- Geocoded_Contacts.xlsx: Successfully geocoded records only")
print("- All_Geocoding_Results.xlsx: Complete results including failed attempts")

✅ Export completed!
- Geocoded_Contacts.xlsx: Successfully geocoded records only
- All_Geocoding_Results.xlsx: Complete results including failed attempts


In [None]:
# Optional: Retry failed geocoding attempts with alternative strategies

failed_addresses = geocoded_df[
    geocoded_df['Latitude'].isna() | geocoded_df['Longitude'].isna()
]['Address'].drop_duplicates().tolist()

if failed_addresses:
    print(f"Found {len(failed_addresses)} failed addresses. Attempting alternative geocoding strategies...")
    
    class EnhancedGeocoder(OptimizedGeocoder):
        def geocode_with_fallbacks(self, address):
            """Try multiple geocoding strategies for difficult addresses."""
            if not address or pd.isna(address):
                return None
                
            # Strategy 1: Original address
            result = self.geocode_single(address)
            if result:
                return result
            
            # Strategy 2: Clean up common address issues
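            import re  # local import keeps this cell self-contained

            # Strategy 2: Clean up common address issues (whole words only, so
            # names such as "Sterling" are not rewritten)
            cleaned_address = address.replace('#', ' ')
            cleaned_address = re.sub(r'\bSte\b\.?', 'Suite', cleaned_address)
            cleaned_address = re.sub(r'\bApt\b\.?', 'Apartment', cleaned_address)
            cleaned_address = re.sub(r'\s{2,}', ' ', cleaned_address).strip()
            if cleaned_address != address:
                result = self.geocode_single(cleaned_address)
                if result:
                    return result
            
            # Strategy 3: Try without the suite/apartment component
            # (assumes address components are separated by commas)
            simplified_address = re.sub(
                r',?\s*(?:\b(?:Suite|Ste|Apt|Apartment)\b|#)[^,]*', '', address, flags=re.IGNORECASE
            ).strip(' ,')
            if simplified_address != address and len(simplified_address) > 10:
                result = self.geocode_single(simplified_address)
                if result:
                    return result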
            
            return None
    
    # Try enhanced geocoding for failed addresses
    enhanced_geocoder = EnhancedGeocoder()
    retry_results = []
    
    for address in tqdm(failed_addresses, desc="Retrying failed addresses"):
        location = enhanced_geocoder.geocode_with_fallbacks(address)
        if location:
            retry_results.append({
                'address': address,
                'latitude': location.latitude,
                'longitude': location.longitude
            })
    
    if retry_results:
        print(f"✅ Successfully geocoded {len(retry_results)} additional addresses on retry!")
        retry_df = pd.DataFrame(retry_results)
        
        # Update the original results with successful retries
        for _, retry_row in retry_df.iterrows():
            mask = geocoded_df['Address'] == retry_row['address']
            geocoded_df.loc[mask, 'Latitude'] = retry_row['latitude']
            geocoded_df.loc[mask, 'Longitude'] = retry_row['longitude']
        
        # Save updated cache
        enhanced_geocoder._save_cache()
        
        print("Updated results with retry successes.")
    else:
        print("No additional addresses could be geocoded with enhanced strategies.")
else:
    print("🎉 No failed addresses to retry - all geocoding was successful!")

Found 254 failed addresses. Attempting alternative geocoding strategies...


Retrying failed addresses:   0%|          | 0/254 [00:00<?, ?it/s]

✅ Successfully geocoded 171 additional addresses on retry!
Updated results with retry successes.
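
To see what the fallback strategies send to the geocoder, the sketch below lists the candidate strings for an address in the order they would be tried. `candidate_addresses` and its regular expressions are hypothetical and use only the standard library. They assume comma-separated address components and illustrate the idea rather than reproduce the notebook's exact patterns. The last sample address is made up.

```python
import re

# A unit component such as "Ste 305", "Suite# 810" or "#12", up to the next comma
UNIT_PATTERN = re.compile(
    r",?\s*(?:\b(?:Suite|Ste|Apt|Apartment|Unit)\b|#)[^,]*",
    flags=re.IGNORECASE,
)

def candidate_addresses(address):
    """Yield distinct address variants, from most to least specific."""
    variants = [
        address,
        # Expand "Ste" as a whole word and drop "#" characters
        re.sub(r"\bSte\b\.?", "Suite", address.replace("#", " ")),
        # Drop the unit component entirely
        UNIT_PATTERN.sub("", address),
    ]
    seen = set()
    for variant in variants:
        cleaned = re.sub(r"\s{2,}", " ", variant).strip(" ,")
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            yield cleaned

samples = [
    "9135 Piscataway Rd, Ste 305, Clinton, MD 20735",
    "128 Medical Circle, Winchester, VA 22601",
    "10 Example Road, Suite #L4, Sampletown",  # made-up address
]
for sample in samples:
    for candidate in candidate_addresses(sample):
        print(candidate)
    print()
```

An address without a unit component yields a single candidate, so no extra requests are spent on it.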


In [14]:
retry_df = retry_df.rename(columns={
    'address': 'Address',
    'latitude': 'Latitude',
    'longitude': 'Longitude',
})
retry_df

Unnamed: 0,Address,Latitude,Longitude
0,"3475 Leonardtown Road, Suite 207, Waldorf, MD ...",38.552619,-76.834143
1,"9135 Piscataway Rd, Ste 305, Clinton, MD 20735",38.764330,-76.904304
2,"11000 Baltimore Ave., Ste 107, Beltsville, MD ...",39.034371,-76.909439
3,"7310 Governor Ritchie Hwy, Suite# 810, Glen Bu...",39.171452,-76.620563
4,"333 Hawaii Ave NE, Ste 200, Washington, DC 20011",38.941287,-77.000894
...,...,...,...
166,"14995 Shady Grove Road, Ste 350, Rockville, MD...",39.101200,-77.191073
167,"1550 Wilson Blvd, Ste 105, Arlington, VA 22209",38.894256,-77.075909
168,"2112 F Street NW, Ste 305, Washington, DC 20037",38.897097,-77.047190
169,"1100 Mercantile Lane, Suite 135, Largo, MD 20774",38.906565,-76.837583


In [16]:
df

Unnamed: 0,Full Name,Person ID,Contact's Work Address,Contact's Work Phone,Contact's Details: Latitude,Contact's Details: Longitude,Contact's Details: Specialty
0,"Absolute Chiropractic Care - Waldorf, MD",996733740,"3475 Leonardtown Road, Suite 207, Waldorf, MD ...",(301) 835-4422,,,Chiropractic/Physical Therapy
1,"Active Physical Therapy - Beltsville, MD",996733807,"11000 Baltimore Ave., Ste 107, Beltsville, MD ...",202-331-5460,,,Chiropractic/Physical Therapy
2,"Active Physical Therapy - Clinton, MD",996605035,"9135 Piscataway Rd, Ste 305, Clinton, MD 20735",(301) 877-2323,,,Chiropractic/Physical Therapy
3,Active Physical Therapy - College Park,996733833,"6201 Greenbelt Road, Suite U 15-16, College Pa...",301-220-0571,,,Chiropractic/Physical Therapy
4,"Active Physical Therapy - Columbia, MD",996733839,"11055 Little Patuxent Parkway, Suite #L4, Colu...",410-992-9399,,,Chiropractic/Physical Therapy
...,...,...,...,...,...,...,...
299,The Orthopaedic Foot & Ankle Center - Falls Ch...,996738957,"2922 Telestar Court, Falls Church, VA 22042",(703) 584-2040,,,
300,University Physical Therapy Associates,996748084,"1100 Mercantile Lane, Suite 135, Largo, MD 20774",(301) 322-9495,,,Physical Therapy
301,Washington Circle Orthopaedic Associates,996740755,"3 Washington Circle NW, Ste 404, Washington, D...",(202) 333-2820,,,
302,Winchester Orthopaedic Associates - Main Locat...,996738975,"128 Medical Circle, Winchester, VA 22601",(540) 667-8975,,,


In [20]:
retry_df = pd.merge(df, retry_df, left_on="Contact's Work Address", right_on='Address', how='right')
retry_df = retry_df[['Full Name', 'Person ID', 'Latitude', 'Longitude']]
retry_df

Unnamed: 0,Full Name,Person ID,Latitude,Longitude
0,"Absolute Chiropractic Care - Waldorf, MD",996733740,38.552619,-76.834143
1,Enid Cruise,996733744,38.552619,-76.834143
2,"Active Physical Therapy - Clinton, MD",996605035,38.764330,-76.904304
3,"Active Physical Therapy - Beltsville, MD",996733807,39.034371,-76.909439
4,Active Physical Therapy - Glen Burnie,996733868,39.171452,-76.620563
...,...,...,...,...
172,The Orthopaedic Center - Rockville,996740670,39.101200,-77.191073
173,The Orthopaedic Foot & Ankle Center - Arlington,996738956,38.894256,-77.075909
174,"The Orthopaedic Center - Washington, D.C.",996738955,38.897097,-77.047190
175,University Physical Therapy Associates,996748084,38.906565,-76.837583


In [21]:
full_results = pd.concat([export_df, retry_df], ignore_index=True, axis=0)
full_results

Unnamed: 0,Full Name,Person ID,Latitude,Longitude
0,ATI Physical Therapy - Mount Airy,996736916,39.364407,-77.161199
1,ATI Physical Therapy - Silver Spring,996736953,39.041851,-76.987767
2,ATI Physical Therapy - Severna Park,996736949,39.092724,-76.559422
3,ATI Physical Therapy - Waldorf,996736966,38.623450,-76.919226
4,ATI Physical Therapy - White Marsh,996736813,39.373143,-76.471237
...,...,...,...,...
214,The Orthopaedic Center - Rockville,996740670,39.101200,-77.191073
215,The Orthopaedic Foot & Ankle Center - Arlington,996738956,38.894256,-77.075909
216,"The Orthopaedic Center - Washington, D.C.",996738955,38.897097,-77.047190
217,University Physical Therapy Associates,996748084,38.906565,-76.837583


In [23]:
full_results = full_results.drop_duplicates()
full_results

Unnamed: 0,Full Name,Person ID,Latitude,Longitude
0,ATI Physical Therapy - Mount Airy,996736916,39.364407,-77.161199
1,ATI Physical Therapy - Silver Spring,996736953,39.041851,-76.987767
2,ATI Physical Therapy - Severna Park,996736949,39.092724,-76.559422
3,ATI Physical Therapy - Waldorf,996736966,38.623450,-76.919226
4,ATI Physical Therapy - White Marsh,996736813,39.373143,-76.471237
...,...,...,...,...
214,The Orthopaedic Center - Rockville,996740670,39.101200,-77.191073
215,The Orthopaedic Foot & Ankle Center - Arlington,996738956,38.894256,-77.075909
216,"The Orthopaedic Center - Washington, D.C.",996738955,38.897097,-77.047190
217,University Physical Therapy Associates,996748084,38.906565,-76.837583


In [24]:
full_results.to_excel('../data/processed/Geocoded_Contacts.xlsx', index=False)