In this notebook we will be getting the necessary latitudes and longitudes for our project. First we will load our libraries:

In [21]:
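# Install third-party packages before importing them
!pip install geocoder
!conda install -c conda-forge folium=0.5.0 --yes

import json

import numpy as np
import pandas as pd
import geocoder  # looks up coordinates for an address or postal code
from geopy.geocoders import Nominatim  # converts an address into latitude and longitude
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium  # map rendering

print('Libraries imported.')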

Collecting package metadata: ...working... done
Solving environment: ...working... 
  - anaconda::ca-certificates-2019.1.23-0, anaconda::openssl-1.1.1b-he774522_1
  - anaconda::openssl-1.1.1b-he774522_1, defaults::ca-certificates-2019.1.23-0
  - anaconda::ca-certificates-2019.1.23-0, defaults::openssl-1.1.1b-he774522_1
  - defaults::ca-certificates-2019.1.23-0, defaults::openssl-1.1.1b-he774522_1done

# All requested packages already installed.

Libraries imported.


Next we will load our cleaned and organized data frame with postal codes and area information. I have included the file on GitHub:

In [22]:
df = pd.read_csv(r'C:\Users\John\Desktop\notebooks\STL.csv')
# Drop the leftover index column that was saved with the data
df = df.drop('Unnamed: 0', axis=1)

We need to add the latitude and longitude of each postal code to the data frame, so we will initialize both columns with 1s as placeholders.

In [23]:
df['Latitude'] = 1.0
df['Longitude'] = 1.0
df.head()

,ZipCode,AreaName,Latitude,Longitude
0,63005,Chesterfield,1.0,1.0
1,63010,Arnold,1.0,1.0
2,63011,Ballwin,1.0,1.0
3,63012,Barnhart,1.0,1.0
4,63013,Beaufort,1.0,1.0
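
One caveat with a placeholder of 1.0 is that it looks like a real number if a lookup later fails. A minimal sketch of an alternative, assuming a small made-up frame (`sample`) standing in for the real `df`, uses `NaN` so that rows that were never filled in can be found with `isna()`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data frame
sample = pd.DataFrame({
    'ZipCode': [63005, 63010],
    'AreaName': ['Chesterfield', 'Arnold'],
})

# NaN marks "not geocoded yet" and cannot be mistaken for a coordinate
sample['Latitude'] = np.nan
sample['Longitude'] = np.nan

# Pretend that only the first lookup succeeded
sample.at[0, 'Latitude'] = 38.656650
sample.at[0, 'Longitude'] = -90.586180

# Rows that still need coordinates
missing = sample[sample['Latitude'].isna()]
print(missing['ZipCode'].tolist())
```

The 1.0 placeholders used in this notebook work just as well as long as every lookup succeeds.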


So it looks like everything is in order! Great! Now we need to pull the coordinates with geocoder. I have instead used the .arcgis provider, which seems to work very well. We will tie it all up with a little for loop that writes the coordinates into the data frame.

In [24]:
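for index, row in df.iterrows():
    # Look up each ZIP code with the ArcGIS provider
    g = geocoder.arcgis('{}, Missouri'.format(row.ZipCode))
    lat_lng_coords = g.latlng
    # Skip rows where the lookup returned nothing, keeping the placeholder
    if lat_lng_coords:
        df.at[index, 'Latitude'] = lat_lng_coords[0]
        df.at[index, 'Longitude'] = lat_lng_coords[1]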
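
Because the loop depends on a network service, it can help to separate the lookup from the bookkeeping. The following is only a sketch: `geocode_zip_codes`, `fake_lookup`, and the `retries` parameter are hypothetical names introduced here, and the lookup is a fake offline table so the example runs without contacting any service.

```python
def geocode_zip_codes(zip_codes, lookup, state='Missouri', retries=2):
    """Return ({zip: (lat, lng)}, [failed zips]) using a caller-supplied lookup."""
    coords, failed = {}, []
    for zip_code in zip_codes:
        query = '{}, {}'.format(zip_code, state)
        result = None
        # Try the lookup up to retries + 1 times before giving up
        for _ in range(retries + 1):
            result = lookup(query)
            if result:  # a non-empty [lat, lng] pair means success
                break
        if result:
            coords[zip_code] = (result[0], result[1])
        else:
            failed.append(zip_code)
    return coords, failed


# Hypothetical offline lookup, used only for this demonstration
fake_table = {'63005, Missouri': [38.656650, -90.586180]}

def fake_lookup(query):
    return fake_table.get(query)

coords, failed = geocode_zip_codes([63005, 99999], fake_lookup)
print(coords)
print(failed)
```

With the real service, `lookup` could be `lambda q: geocoder.arcgis(q).latlng`, which is the same call the loop above makes. The list of failed ZIP codes shows which rows would still need attention.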

In [25]:
df

,ZipCode,AreaName,Latitude,Longitude
0,63005,Chesterfield,38.656650,-90.586180
1,63010,Arnold,38.436944,-90.366567
2,63011,Ballwin,38.600194,-90.542303
3,63012,Barnhart,38.336075,-90.402166
4,63013,Beaufort,38.413321,-91.169170
5,63014,Berger,38.674814,-91.337210
6,63015,Catawissa,38.419626,-90.781781
7,63016,Cedar Hill,38.357401,-90.644971
8,63017,Chesterfield,38.677780,-90.507360
9,63019,Crystal City,38.223530,-90.381590


Let's visualize this:

In [26]:
address = 'Saint Louis, MO'

geolocator = Nominatim(user_agent="STL_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Saint Louis are {}, {}'.format(latitude, longitude))

The geographical coordinates of Saint Louis are 38.6268039, -90.1994097


In [27]:
map_STL = folium.Map(location=[latitude, longitude], zoom_start=9)

# add markers to map
for lat, lng, zipcode, areaname in zip(df['Latitude'], df['Longitude'], df['ZipCode'], df['AreaName']):
    label = '{}, {}'.format(zipcode, areaname)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_STL)  
    
map_STL

This is a very large area, so I am going to limit the dimensions of the area we are examining.
The northern boundary will be set by West Alton, latitude 38.867190. The western boundary will be St. Peters, longitude -90.619140. The southern boundary will be Arnold, latitude 38.436944.

In [28]:
# Work on a copy so the full data frame stays available
df2 = df.copy()

In [29]:
# Keep only rows inside the northern, southern and western limits.
# .loc replaces the deprecated .ix indexer.
df2 = df2.loc[df2.Latitude <= 38.867190]
df2 = df2.loc[df2.Latitude >= 38.436944]
df2 = df2.loc[df2.Longitude >= -90.619140]


In [30]:
df2

,ZipCode,AreaName,Latitude,Longitude
0,63005,Chesterfield,38.656650,-90.586180
2,63011,Ballwin,38.600194,-90.542303
8,63017,Chesterfield,38.677780,-90.507360
11,63021,Ballwin,38.564928,-90.523348
14,63026,Fenton,38.495220,-90.427090
17,63031,Florissant,38.807249,-90.361309
18,63033,Florissant,38.806502,-90.297042
19,63034,Old Jamestown,38.824705,-90.294735
21,63038,Wildwood,38.570080,-90.602080
25,63042,Hazelwood,38.776450,-90.367570
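
The three boundary conditions can also be combined into a single boolean mask. This is a minimal sketch rather than the code used in this notebook: `within_bounds` is a hypothetical helper, and `sample` holds four rows copied from the geocoded table earlier in the notebook.

```python
import pandas as pd

def within_bounds(frame, north, south, west):
    """Keep rows on or inside the northern, southern and western limits."""
    mask = (
        (frame['Latitude'] <= north)
        & (frame['Latitude'] >= south)
        & (frame['Longitude'] >= west)
    )
    return frame.loc[mask]

# Four rows taken from the geocoded results
sample = pd.DataFrame({
    'ZipCode': [63005, 63011, 63013, 63019],
    'AreaName': ['Chesterfield', 'Ballwin', 'Beaufort', 'Crystal City'],
    'Latitude': [38.656650, 38.600194, 38.413321, 38.223530],
    'Longitude': [-90.586180, -90.542303, -91.169170, -90.381590],
})

kept = within_bounds(sample, north=38.867190, south=38.436944, west=-90.619140)
print(kept['ZipCode'].tolist())
```

Beaufort and Crystal City fall south of the southern limit, so only the Chesterfield and Ballwin rows remain, matching the first rows of the filtered table.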


In [31]:
print('The final DataFrame shape is {}'.format(df2.shape),'\n')

The final DataFrame shape is (65, 4) 



In [32]:
df2.to_csv('STL_cords.csv')
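
A note on the saved file: by default `to_csv` also writes the index, which is the likely source of the `Unnamed: 0` column that had to be dropped when `STL.csv` was loaded. The sketch below illustrates the difference with a made-up two-row frame and in-memory buffers instead of real files. Passing `index=False` is an optional variation, not what this notebook does.

```python
import io

import pandas as pd

# Hypothetical two-row frame used only for this demonstration
sample = pd.DataFrame({
    'ZipCode': [63005, 63011],
    'AreaName': ['Chesterfield', 'Ballwin'],
})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
columns_with_index = pd.read_csv(with_index).columns.tolist()
print(columns_with_index)

# index=False leaves the index out of the file
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
columns_without_index = pd.read_csv(without_index).columns.tolist()
print(columns_without_index)
```

Since `df2` keeps the row labels of the original frame after filtering, saving the index as done here is still useful if those labels need to be traced back later.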