# Find the ZIP Codes Within a 9-Mile Radius of Each ZIP Code

For consideration: 

"Big-box retailers (e.g., Walmart, Target): These stores often draw customers from a larger area, so a radius of 10 to 15 miles is reasonable."

## 0. Load Libraries 

In [7]:
#Libraries and Settings
import pandas as pd

from pyzipcode import ZipCodeDatabase

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows',10)



## 1. Load Data

In [8]:
#Data File to Dataframe
file='/Users/c32/Documents/NYCDSA/Projects/DATA/Ready_Data/1_Load_Geographic_Data.csv'
df=pd.read_csv(file, converters={'zip':str})
df.shape

(33129, 10)

## 2. Calculate the List of Surrounding ZIP Codes

Distance calculation: the lookup uses the Haversine formula to compute the great-circle distance between the central ZIP code and every other ZIP code in the database. The formula treats the Earth as a sphere and returns the shortest distance between two points along its surface.
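To make the formula concrete, here is a minimal standalone sketch of a Haversine distance in miles. It is not pyzipcode's internal code: the function name `haversine_miles` and the Earth-radius constant are assumptions made for illustration, and the two coordinate pairs are taken from the `gps_coordinates` values of ZIP codes 59029 and 59106 in the sample output at the end of this notebook.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius, in miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Fromberg, MT (59029) to Billings, MT (59106)
distance = haversine_miles(45.40732, -108.80085, 45.80792, -108.6834)
print(round(distance, 1))
```

A ZIP code counts as "in radius" when this distance from the central ZIP code's coordinates is at most `mile_radius`.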

In [9]:
#Set the mile radius here:
mile_radius=9

zcdb = ZipCodeDatabase()

def surrounding_zips(center_zip):
    """Return the zip codes within mile_radius miles of the center of center_zip."""
    try:
        return [z.zip for z in zcdb.get_zipcodes_around_radius(center_zip, mile_radius)]
    except Exception:
        #Lookup failed (e.g. the zip code is not in the pyzipcode database): fall back to the zip code itself.
        return [center_zip]

### --> THIS CALCULATION TAKES OVER 1 MINUTE !! <-- ###
df['zips_around']=df['zip'].apply(surrounding_zips)

In [10]:
#Investigate the resulting lists of zip codes.
df['length_of_list'] = df['zips_around'].apply(len)

#How many zip codes have each list length
a=df['length_of_list'].value_counts()

#Zip codes with the most neighbors first
result=df[['zip', 'length_of_list']].sort_values(by='length_of_list', ascending=False)
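The cell above computes its summaries without displaying them. As a small illustration of what they contain, the sketch below runs the same steps on a toy frame. The `toy`, `length_counts`, and `densest` names are hypothetical, and the three rows are copied from the sample output at the end of this notebook.

```python
import pandas as pd

# Toy stand-in for the real dataframe (three rows from the sample output)
toy = pd.DataFrame({
    "zip": ["58784", "59029", "59106"],
    "zips_around": [
        ["58784"],
        ["59029", "59041"],
        ["59002", "59044", "59102", "59106"],
    ],
})
toy["length_of_list"] = toy["zips_around"].apply(len)

# Number of zip codes per list length, ordered by length
length_counts = toy["length_of_list"].value_counts().sort_index()

# Zip codes with the most neighbors first
densest = toy[["zip", "length_of_list"]].sort_values(by="length_of_list", ascending=False)

print(length_counts)
print(densest)
```

A list length of 1 means no other ZIP code was found in the radius, or that the lookup failed and fell back to the ZIP code itself, so the two cases cannot be told apart from the length alone.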

## 3. Save the Results

In [11]:
#After all the changes, save the dataframe to a csv file.
#Note: this writes to the same path as the input file loaded in step 1 and overwrites it.

import os
outname = '1_Load_Geographic_Data.csv'
outdir = '/Users/c32/Documents/NYCDSA/Projects/DATA/Ready_Data'
os.makedirs(outdir, exist_ok=True) #also creates any missing parent directories
fullname = os.path.join(outdir, outname)

df.to_csv(fullname, header=True, index=False)
print("Saved!")

Saved!
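One thing to keep in mind when saving: a CSV file has no list type, so the `zips_around` lists are written as text and come back as plain strings on the next load. The sketch below shows one possible way to restore them with `ast.literal_eval` as a `read_csv` converter. It uses an in-memory buffer and a one-row toy frame (both assumptions for illustration) rather than the real file.

```python
import ast
import io

import pandas as pd

# One-row toy frame with a list-valued column
frame = pd.DataFrame({"zip": ["59029"], "zips_around": [["59029", "59041"]]})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
frame.to_csv(buffer, header=True, index=False)

# A plain reload returns the list as a string
buffer.seek(0)
naive = pd.read_csv(buffer, converters={"zip": str})
print(type(naive.loc[0, "zips_around"]))

# Parsing the column with ast.literal_eval restores real Python lists
buffer.seek(0)
restored = pd.read_csv(buffer, converters={"zip": str, "zips_around": ast.literal_eval})
print(type(restored.loc[0, "zips_around"]))
```

`ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for this kind of parsing.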


In [12]:
df.head()

| | zip | city | state | state_short | county | county_code | population | density | timezone | gps_coordinates | zips_around | length_of_list |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 58784 | Stanley | North Dakota | ND | Mountrail,Burke | 3806138013 | 3528.0 | 4.2 | America/Chicago | 48.36434, -102.42438 | [58784] | 1 |
| 1 | 59029 | Fromberg | Montana | MT | Carbon | 30009 | 847.0 | 3.2 | America/Denver | 45.40732, -108.80085 | [59029, 59041] | 2 |
| 2 | 59047 | Livingston | Montana | MT | Park | 30067 | 12728.0 | 5.5 | America/Denver | 45.54805, -110.569 | [59047] | 1 |
| 3 | 59072 | Roundup | Montana | MT | Musselshell,Petroleum | 3006530069 | 4328.0 | 1.7 | America/Denver | 46.4691, -108.53766 | [59072] | 1 |
| 4 | 59106 | Billings | Montana | MT | Yellowstone | 30111 | 18281.0 | 72.7 | America/Denver | 45.80792, -108.6834 | [59002, 59044, 59102, 59106] | 4 |
