
<h1 align=center><font size = 5>Using Location Coordinates to Link to Corresponding Neighborhoods</font></h1>

In [4]:
# Import necessary Libraries
import pandas as pd
import math
import numpy as np
from numpy import ndarray

## Files Required
<a href="#item1">Augusta_Crime_Data.csv</a> aka df_Crime_Raw_Data
- This file contains longitude data under column 'X' and latitude data under 'Y'

<a href="#item1">Sample Augusta Neighbourhoods.csv</a> aka df_Neighborhood_Raw_Data
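- This file contains neighborhood names under column 'Nearest Neighbourhood' and coordinates under columns 'Longitude' and 'Latitude'. Note that in the sample shown in this notebook the values under 'Longitude' are around 33.4 and the values under 'Latitude' are around -82.0, so the two labels appear to be swapped.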

These files have either been forwarded to you on Fiverr or can be downloaded from:
1. https://www.dropbox.com/s/g5eipr4ij8te6ie/Augusta_Crime_Data.csv?dl=0
2. https://www.dropbox.com/s/47fl6rzwuhd83lw/Sample%20Augusta%20Neighbourhoods.csv?dl=0

## Solution Required

- I would like the Augusta_Crime_Data.csv file to have a new column that lists the nearest neighborhood for each row.


#### How will this be done?
- We will use the Haversine formula.
    - The Haversine formula calculates the great-circle distance between 2 coordinates.
- Sample Augusta Neighbourhoods.csv contains the list of neighborhoods that will need to be listed in the new neighborhood column we will create in Augusta_Crime_Data.csv.
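
For reference, the Haversine distance between two points is commonly written as below. The symbols are chosen here for illustration: $\varphi_1, \varphi_2$ are the two latitudes and $\lambda_1, \lambda_2$ the two longitudes, all in radians, and $R$ is the radius of the sphere. The three lines correspond to the variables a, c and d in the function used in this notebook.

$$
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2-\varphi_1}{2}\right) + \cos\varphi_1\,\cos\varphi_2\,\sin^2\!\left(\frac{\lambda_2-\lambda_1}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right) \\
d &= R\,c
\end{aligned}
$$

With $R = 6371000$ metres, dividing $d$ by 1000 gives the distance in kilometres.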

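In this project the Haversine formula has been prepared as a function, distanceFromAugustaGolf(lon,lat), which returns the distance in kilometres between the point passed in and a reference point hard-coded in the function body: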

    def distanceFromAugustaGolf (lon,lat):
        import math
        lon1 = -82.066007
        lat1 = 33.472074
        R = 6371000;

        dLat = (33.4747346-lat) * math.pi / 180 # Convert degrees to radians
        dLon = (-82.0497226-lon) * math.pi / 180 # Convert degrees to radians
        a = math.sin(dLat/2) * math.sin(dLat/2) + math.cos(lat * math.pi / 180 ) * math.cos(33.4747346 * math.pi / 180 ) * math.sin(dLon/2) * math.sin(dLon/2)
        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
        d = (R * c)/1000
        return d
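
The function above measures distance to one fixed point. For the task described here, a version that takes both points as arguments may be easier to reuse. The following is a minimal sketch under that assumption; the name `haversine_km` and the constant `EARTH_RADIUS_M` are hypothetical and not part of the original notebook.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres, same value as R above


def haversine_km(lon_a, lat_a, lon_b, lat_b):
    """Great-circle distance in kilometres between two (lon, lat) points given in degrees."""
    phi_a, phi_b = math.radians(lat_a), math.radians(lat_b)
    d_phi = phi_b - phi_a
    d_lambda = math.radians(lon_b - lon_a)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_a) * math.cos(phi_b) * math.sin(d_lambda / 2) ** 2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return EARTH_RADIUS_M * c / 1000  # metres to kilometres


# Distance between the first crime record and the hard-coded reference point
print(haversine_km(-82.066007, 33.472074, -82.0497226, 33.4747346))
```

Because both points are parameters, the same function can be called once per crime and neighborhood pair.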
    

## Written out Example

##### Create dataframes
1. Augusta_Crime_Data.csv = <em><strong>df_Crime_Raw_Data</strong></em>
2. Sample Augusta Neighbourhoods.csv = <em><strong>df_Neighborhood_Raw_Data</strong></em>


- Let's say <em><strong>df_Crime_Raw_Data</strong></em> has 5 rows of data.
- Let's say <em><strong>df_Neighborhood_Raw_Data</strong></em> has the following data:
    
| | Nearest Neighbourhood | City | State | Coordinates | Longitude | Latitude |
|---|---|---|---|---|---|---|
| 0 | Academy Baker Ave | Augusta | Georgia | 33.4836288, -82.0845725 | 33.483629 | -82.084572 |
| 1 | Albion Acres | Augusta | Georgia | 33.4125043, -82.0220507 | 33.412504 | -82.022051 |
| 2 | Barton Chapel | Augusta | Georgia | 33.4424915, -82.0883992 | 33.442491 | -82.088399 |
| 3 | Bath-Edie | Augusta | Georgia | 33.3333868, -82.2163489 | 33.333387 | -82.216349 |

- Let's take the longitude value at index [0] under column 'X' and the latitude value at index [0] under column 'Y', and use the Haversine function <em><strong>distanceFromAugustaGolf(lon,lat)</strong></em> to find the distance between that point and each entry in the 'Nearest Neighbourhood' list found in <em><strong>df_Neighborhood_Raw_Data</strong></em>.
- A temporary column 'Distance from Crime' is created, and the index of its smallest value identifies the corresponding 'Nearest Neighbourhood': Academy Baker Ave, Albion Acres, Barton Chapel or Bath-Edie.
    
    <em><strong>index = df_Neighborhood_Raw_Data['Distance from Crime'].idxmin()</strong></em>
        
    - If it was Albion Acres, then row [0] of a new column 'Neighborhood' in df_Crime_Raw_Data will list Albion Acres.
        
### Goal:
- Take the longitude value at index [1] under column 'X' and the latitude value at index [1] under 'Y' and apply the Haversine formula in the same way. If the nearest neighborhood is Barton Chapel, then row [1] of the 'Neighborhood' column in df_Crime_Raw_Data will list Barton Chapel.
- Take the longitude value at index [2] under column 'X' and the latitude value at index [2] under 'Y' and apply the Haversine formula in the same way. If the nearest neighborhood is Academy Baker Ave, then row [2] of the 'Neighborhood' column in df_Crime_Raw_Data will list Academy Baker Ave.
- ...and so on for every remaining row.
            

In [5]:
# df_Crime_Raw_Data
df_Crime_Raw_Data = pd.read_csv('Augusta_Crime_Data.csv', header=0)
df_Crime_Raw_Data.head()

Unnamed: 0,X,Y,OBJECTID,CASE_NUMBER,CROSS_STREET,INTERSECTION,STREET_ADDRESS,MONTH,ZIPCODE,MAJOR_CRIME_CATEGORY,CATEGORY,SUBCATEGORY1,SUBCATEGORY2,BEAT,REPORTED_DATE,YEAR
0,-82.066007,33.472074,1,2017-00412862,,,3220 HERITAGE CIR,11-Nov,30909,Violent Crimes,Homicide,,,13.0,2017-11-05T01:25:29.000Z,2017
1,-82.025184,33.422534,2,2017-00414244,,,2714 MARGARET CT,11-Nov,30906,Violent Crimes,Robbery,Individual,Armed,24.0,2017-11-06T11:35:44.000Z,2017
2,-82.080317,33.421334,3,2017-00415999,,,2664 BARTON CHAPEL RD,11-Nov,30906,Violent Crimes,Robbery,Individual,Armed,21.0,2017-11-07T17:46:49.000Z,2017
3,-82.068223,33.390878,4,2017-00420299,,,2509 DRUMCLIFF CT,11-Nov,30815,Violent Crimes,Robbery,Individual,Armed,30.0,2017-11-11T00:24:23.000Z,2017
4,-82.047967,33.399386,5,2017-00416380,,,2350 WINDSOR SPRING RD,11-Nov,30906,Violent Crimes,Robbery,Individual,Armed,29.0,2017-11-08T00:53:19.000Z,2017


In [6]:
df_Crime_Raw_Data['Y'][0]

33.472073699999996

In [7]:
df_Crime_Raw_Data['X'][0]

-82.06600665

In [8]:
# df_Neighborhood_Raw_Data
df_Neighborhood_Raw_Data = pd.read_csv('Sample Augusta Neighbourhoods.csv', header=0)
df_Neighborhood_Raw_Data.head()

Unnamed: 0,Nearest Neighbourhood,City,State,Coordinates,Longitude,Latitude
0,Academy Baker Ave,Augusta,Georgia,"33.4836288, -82.0845725",33.483629,-82.084572
1,Albion Acres,Augusta,Georgia,"33.4125043, -82.0220507",33.412504,-82.022051
2,Barton Chapel,Augusta,Georgia,"33.4424915, -82.0883992",33.442491,-82.088399
3,Bath-Edie,Augusta,Georgia,"33.3333868, -82.2163489",33.333387,-82.216349
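
In the output above, the 'Longitude' column holds values near 33.4 and the 'Latitude' column values near -82.0, while the 'Coordinates' column lists the 33.4 value first. This suggests the two labels are swapped in the sample file. A minimal sketch for swapping the labels, using a hypothetical in-memory frame `sample` in place of df_Neighborhood_Raw_Data and a hypothetical result name `fixed`:

```python
import pandas as pd

# Hypothetical stand-in for df_Neighborhood_Raw_Data, labelled as in the file
sample = pd.DataFrame({
    'Nearest Neighbourhood': ['Academy Baker Ave', 'Albion Acres'],
    'Longitude': [33.483629, 33.412504],    # these look like latitudes
    'Latitude': [-82.084572, -82.022051],   # these look like longitudes
})

# Swap the two labels in a single rename call; `sample` itself is left untouched
fixed = sample.rename(columns={'Longitude': 'Latitude', 'Latitude': 'Longitude'})
print(fixed)
```

The cells in this notebook pass the 'Latitude' column as the longitude argument, which compensates for the swap. If the labels are corrected first, those calls would need to change as well.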


In [9]:
# Uses the Haversine formula to calculate the distance in km from (lon, lat) to a fixed reference point.
# NOTE: lon1/lat1 (the coordinates of crime record 0) are assigned below but never used;
# the reference point actually used in the calculation is the hard-coded (-82.0497226, 33.4747346).
def distanceFromAugustaGolf (lon,lat):
    import math
    lon1 = -82.066007  # unused
    lat1 = 33.472074   # unused
    R = 6371000 # radius of the earth in meters, https://en.wikipedia.org/wiki/Earth_radius
    dLat = (33.4747346-lat) * math.pi / 180 # Latitude difference, converted from degrees to radians
    dLon = (-82.0497226-lon) * math.pi / 180 # Longitude difference, converted from degrees to radians
    a = math.sin(dLat/2) * math.sin(dLat/2) + math.cos(lat * math.pi / 180 ) * math.cos(33.4747346 * math.pi / 180 ) * math.sin(dLon/2) * math.sin(dLon/2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    d = (R * c)/1000 # Convert meters to kilometers
    return d

In [10]:
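# The 'Latitude' column is passed as lon and 'Longitude' as lat because the values in the two columns appear swapped in the file
distanceFromAugustaGolf (df_Neighborhood_Raw_Data['Latitude'][0],df_Neighborhood_Raw_Data['Longitude'][0])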

3.3801158227191515

In [11]:
# Distance in km from each neighborhood to the reference point used inside distanceFromAugustaGolf
Distance_Cal_Crime = np.empty(len(df_Neighborhood_Raw_Data), dtype=float)
for i in range (len(df_Neighborhood_Raw_Data)):
    # 'Latitude' is passed as lon and 'Longitude' as lat, as in the single-row call
    Distance_Cal_Crime[i] = distanceFromAugustaGolf (df_Neighborhood_Raw_Data['Latitude'][i],df_Neighborhood_Raw_Data['Longitude'][i])

In [12]:
Distance_Cal_Crime

array([ 3.38011582,  7.38067038,  5.07223768, 22.05146859])

In [13]:
df_Neighborhood_Raw_Data['Distance from Crime'] = Distance_Cal_Crime

In [14]:
df_Neighborhood_Raw_Data.head()

Unnamed: 0,Nearest Neighbourhood,City,State,Coordinates,Longitude,Latitude,Distance from Crime
0,Academy Baker Ave,Augusta,Georgia,"33.4836288, -82.0845725",33.483629,-82.084572,3.380116
1,Albion Acres,Augusta,Georgia,"33.4125043, -82.0220507",33.412504,-82.022051,7.38067
2,Barton Chapel,Augusta,Georgia,"33.4424915, -82.0883992",33.442491,-82.088399,5.072238
3,Bath-Edie,Augusta,Georgia,"33.3333868, -82.2163489",33.333387,-82.216349,22.051469


In [15]:
df_Neighborhood_Raw_Data['Distance from Crime'].min()

3.3801158227191515

In [16]:
index = df_Neighborhood_Raw_Data['Distance from Crime'].idxmin() 

In [17]:
df_Neighborhood_Raw_Data['Nearest Neighbourhood'][index]

'Academy Baker Ave'

In [18]:
df_Crime_Raw_Data['Neighborhood'] = np.nan

In [19]:
df_Crime_Raw_Data['Neighborhood'][0] = df_Neighborhood_Raw_Data['Nearest Neighbourhood'][index]

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_with_indexer(indexer, value)
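
The warning above comes from chained indexing (`df[...][0] = ...`), which may end up assigning to a temporary copy. A sketch of the same assignment done in a single `.loc` call, on a hypothetical stand-in frame `crimes`; the column is created with `None` so that it can hold strings:

```python
import pandas as pd

# Hypothetical stand-in for df_Crime_Raw_Data
crimes = pd.DataFrame({'X': [-82.066007, -82.025184],
                       'Y': [33.472074, 33.422534]})

# An object column of missing values, ready to receive neighborhood names
crimes['Neighborhood'] = None

# Row label and column label in one indexer: no intermediate copy is involved
crimes.loc[0, 'Neighborhood'] = 'Academy Baker Ave'
print(crimes)
```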


In [79]:
df_Crime_Raw_Data.head()

Unnamed: 0,X,Y,OBJECTID,CASE_NUMBER,CROSS_STREET,INTERSECTION,STREET_ADDRESS,MONTH,ZIPCODE,MAJOR_CRIME_CATEGORY,CATEGORY,SUBCATEGORY1,SUBCATEGORY2,BEAT,REPORTED_DATE,YEAR,Neighborhood
0,-82.066007,33.472074,1,2017-00412862,,,3220 HERITAGE CIR,11-Nov,30909,Violent Crimes,Homicide,,,13.0,2017-11-05T01:25:29.000Z,2017,Academy Baker Ave
1,-82.025184,33.422534,2,2017-00414244,,,2714 MARGARET CT,11-Nov,30906,Violent Crimes,Robbery,Individual,Armed,24.0,2017-11-06T11:35:44.000Z,2017,
2,-82.080317,33.421334,3,2017-00415999,,,2664 BARTON CHAPEL RD,11-Nov,30906,Violent Crimes,Robbery,Individual,Armed,21.0,2017-11-07T17:46:49.000Z,2017,
3,-82.068223,33.390878,4,2017-00420299,,,2509 DRUMCLIFF CT,11-Nov,30815,Violent Crimes,Robbery,Individual,Armed,30.0,2017-11-11T00:24:23.000Z,2017,
4,-82.047967,33.399386,5,2017-00416380,,,2350 WINDSOR SPRING RD,11-Nov,30906,Violent Crimes,Robbery,Individual,Armed,29.0,2017-11-08T00:53:19.000Z,2017,


In [20]:
df_Crime_Raw_Data['Neighborhood']

0    Academy Baker Ave
1                  NaN
2                  NaN
3                  NaN
4                  NaN
5                  NaN
6                  NaN
7                  NaN
8                  NaN
Name: Neighborhood, dtype: object

## Goal

#### What I need is to loop through distanceFromAugustaGolf (df_Crime_Raw_Data['X'][i] , df_Crime_Raw_Data['Y'][i]) for every row i, measuring the distance to each neighborhood

#### Find  df_Neighborhood_Raw_Data['Distance from Crime'].idxmin() for each i

#### Use that index to look up the Neighborhood with the shortest distance and add it to df_Crime_Raw_Data['Neighborhood'][i]
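
One possible way to carry out the three steps above is sketched below with NumPy broadcasting rather than an explicit loop. The helper `nearest_neighbourhoods`, the frames `crimes` and `hoods`, and the column names `name`, `lat`, `lon` and `Distance km` are all hypothetical. The sample values are copied from the tables in this notebook, with the neighborhood coordinates taken from the 'Coordinates' column (latitude first).

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0


def nearest_neighbourhoods(crime_lon, crime_lat, hood_lon, hood_lat, hood_names):
    """For each crime, return the closest neighbourhood name and its Haversine distance in km."""
    # Shapes (n_crimes, 1) and (1, n_hoods) broadcast to a full distance matrix
    lon1 = np.radians(np.asarray(crime_lon, dtype=float))[:, None]
    lat1 = np.radians(np.asarray(crime_lat, dtype=float))[:, None]
    lon2 = np.radians(np.asarray(hood_lon, dtype=float))[None, :]
    lat2 = np.radians(np.asarray(hood_lat, dtype=float))[None, :]

    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_KM * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

    # Column position of the smallest distance in each row
    closest = dist.argmin(axis=1)
    return np.asarray(hood_names)[closest], dist[np.arange(len(closest)), closest]


crimes = pd.DataFrame({
    'X': [-82.066007, -82.025184, -82.080317],   # longitude
    'Y': [33.472074, 33.422534, 33.421334],      # latitude
})
hoods = pd.DataFrame({
    'name': ['Academy Baker Ave', 'Albion Acres', 'Barton Chapel', 'Bath-Edie'],
    'lat': [33.4836288, 33.4125043, 33.4424915, 33.3333868],
    'lon': [-82.0845725, -82.0220507, -82.0883992, -82.2163489],
})

names, dist_km = nearest_neighbourhoods(crimes['X'], crimes['Y'],
                                        hoods['lon'], hoods['lat'], hoods['name'])
crimes['Neighborhood'] = names
crimes['Distance km'] = dist_km
print(crimes)
```

The distance matrix has one row per crime and one column per neighborhood, so memory grows with the product of the two table sizes. For very large files, processing the crime rows in chunks would be one way to limit that.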





# Below are coding ideas I tried that failed

In [90]:
def Nearest_Neighbourhood_Finder (lat,lon,Neighbor_df):

    #Libraries Required
    import math
    from numpy import ndarray
    
    #CONSTANTS
    R = 6371000;
    
    #Arrays Required
    distance_CalArray = ndarray((len(Neighbor_df),),float)
    dLat = ndarray((len(Neighbor_df),),float)
    dLon = ndarray((len(Neighbor_df),),float)
    a = ndarray((len(Neighbor_df),),float)
    c = ndarray((len(Neighbor_df),),float)
    d = ndarray((len(Neighbor_df),),float)
    d = ndarray((len(Neighbor_df),),float)
    Distance_Array = ndarray((len(Neighbor_df),),float)
    #Neighbor_df = ndarray((len(Neighbor_df),),float)
    
    #Data Frame Required
    #Principal_df = pd.DataFrame(columns = 'Loop Through Figures')
    
    #Initializing First Neigborhood
    Neighbor_df['Distance from Crime'] = np.nan
    #Nearest_Neighborhood = Neighbor_df.loc['Nearest Neighbourhood'][0] #Related to second part
    
    #Loop through Nearest Neighbourhood Data Frame
    for i in range (len(Neighbor_df)):
        dLat[i] = (Neighbor_df['Longitude'][i]-lat) * math.pi / 180 
        dLon[i] = (Neighbor_df['Latitude'][i]-lon) * math.pi / 180 
        a[i] = math.sin(dLat[i]/2) * math.sin(dLat[i]/2) + math.cos(lat * math.pi / 180 ) * math.cos(Neighbor_df['Longitude'][i] * math.pi / 180 ) * math.sin(dLon[i]/2) * math.sin(dLon[i]/2)
        c[i] = 2 * math.atan2(math.sqrt(a[i]), math.sqrt(1-a[i]))
        #Neighbor_df['Loop Through Figures'][i] = (R * c[i])/1000 #Go old School
        #Distance_Array[i] = (R * c[i])/1000
        
        
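        Neighbor_df['Distance from Crime'][i] = (R * c[i])/1000  # chained indexing; this is what triggers SettingWithCopyWarning
        #Nearest_Neighborhood_StepOne = Neighbor_df['Nearest Neighbourhood'][Neighbor_df['Loop Through Figures'].idxmin()]
        #return Nearest_Neighborhood_StepOne
        return Neighbor_df.head()  # this return is inside the for loop, so the function exits after the first row (i = 0)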
        
        
        
        #Neighbor_df[i] = Distance_Array[i]
        #return Distance_Array
        #Was second part
        #Nearest_Neighborhood_StepOne = Neighbor_df['Nearest Neighbourhood'][Neighbor_df['Loop Through Figures'].idxmin()]
        #return Nearest_Neighborhood_StepOne

In [91]:
TESTING = Nearest_Neighbourhood_Finder (33.472074, -82.066007,df_Neighborhood_Raw_Data)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [98]:
TESTING 
#Result lists 'Distance from Crime' for the first row only

Unnamed: 0,Nearest Neighbourhood,City,State,Coordinates,Longitude,Latitude,Distance from Crime
0,Academy Baker Ave,Augusta,Georgia,"33.4836288, -82.0845725",33.483629,-82.084572,2.148432
1,Albion Acres,Augusta,Georgia,"33.4125043, -82.0220507",33.412504,-82.022051,
2,Barton Chapel,Augusta,Georgia,"33.4424915, -82.0883992",33.442491,-82.088399,
3,Bath-Edie,Augusta,Georgia,"33.3333868, -82.2163489",33.333387,-82.216349,
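
In the attempt above, `return` sits inside the `for` loop, so the function exits after the first neighborhood and the remaining rows stay empty. Below is a sketch of a loop version with the `return` moved after the loop, and with distances collected in a plain list instead of being written back through chained indexing. The name `nearest_neighbourhood` and the parameters `lat_col` and `lon_col` are hypothetical. The sample frame keeps the file's column labels, where 'Longitude' holds the latitude values.

```python
import math
import pandas as pd


def nearest_neighbourhood(lat, lon, neighbor_df, lat_col, lon_col):
    """Return (name, distance_km) of the row in neighbor_df closest to (lat, lon)."""
    R = 6371000  # Earth radius in metres
    distances = []
    for _, row in neighbor_df.iterrows():
        d_lat = math.radians(row[lat_col] - lat)
        d_lon = math.radians(row[lon_col] - lon)
        a = (math.sin(d_lat / 2) ** 2
             + math.cos(math.radians(lat)) * math.cos(math.radians(row[lat_col]))
             * math.sin(d_lon / 2) ** 2)
        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
        distances.append(R * c / 1000)
    # Only after every row has a distance is the minimum looked up
    result = pd.Series(distances, index=neighbor_df.index)
    best = result.idxmin()
    return neighbor_df.loc[best, 'Nearest Neighbourhood'], result[best]


hoods = pd.DataFrame({
    'Nearest Neighbourhood': ['Academy Baker Ave', 'Albion Acres', 'Barton Chapel', 'Bath-Edie'],
    'Longitude': [33.483629, 33.412504, 33.442491, 33.333387],      # latitude values
    'Latitude': [-82.084572, -82.022051, -82.088399, -82.216349],   # longitude values
})

name, dist_km = nearest_neighbourhood(33.472074, -82.066007, hoods,
                                      lat_col='Longitude', lon_col='Latitude')
print(name, dist_km)
```

The function does not modify `neighbor_df`, so it can be called once per crime row without the temporary 'Distance from Crime' column.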


In [89]:
def Nearest_Neighbourhood_Finder (lat,lon,Neighbor_df):

    #Libraries Required
    import math
    from numpy import ndarray
    
    #CONSTANTS
    R = 6371000;
    
    #Arrays Required
    distance_CalArray = ndarray((len(Neighbor_df),),float)
    dLat = ndarray((len(Neighbor_df),),float)
    dLon = ndarray((len(Neighbor_df),),float)
    a = ndarray((len(Neighbor_df),),float)
    c = ndarray((len(Neighbor_df),),float)
    d = ndarray((len(Neighbor_df),),float)
    d = ndarray((len(Neighbor_df),),float)
    Distance_Array = ndarray((len(Neighbor_df),),float)
    #Neighbor_df = ndarray((len(Neighbor_df),),float)
    
    #Data Frame Required
    #Principal_df = pd.DataFrame(columns = 'Loop Through Figures')
    
    #Initializing First Neigborhood
    Neighbor_df['Distance from Crime'] = np.nan
    #Nearest_Neighborhood = Neighbor_df.loc['Nearest Neighbourhood'][0] #Related to second part
    
    #Loop through Nearest Neighbourhood Data Frame
    for i in range (len(Neighbor_df)):
        dLat[i] = (Neighbor_df['Longitude'][i]-lat) * math.pi / 180 
        dLon[i] = (Neighbor_df['Latitude'][i]-lon) * math.pi / 180 
        a[i] = math.sin(dLat[i]/2) * math.sin(dLat[i]/2) + math.cos(lat * math.pi / 180 ) * math.cos(Neighbor_df['Longitude'][i] * math.pi / 180 ) * math.sin(dLon[i]/2) * math.sin(dLon[i]/2)
        c[i] = 2 * math.atan2(math.sqrt(a[i]), math.sqrt(1-a[i]))
        #Neighbor_df['Loop Through Figures'][i] = (R * c[i])/1000 #Go old School
        #Distance_Array[i] = (R * c[i])/1000
        
        
        #Neighbor_df['Distance from Crime'][i] = (R * c[i])/1000
        # With the line above commented out, 'Distance from Crime' is still all NaN when idxmin() is called
        Nearest_Neighborhood_StepOne = Neighbor_df['Nearest Neighbourhood'][Neighbor_df['Distance from Crime'].idxmin()]
        return Nearest_Neighborhood_StepOne
        #return Neighbor_df.head()
        

        
        #Neighbor_df[i] = Distance_Array[i]
        #return Distance_Array
        #Was second part
        #Nearest_Neighborhood_StepOne = Neighbor_df['Nearest Neighbourhood'][Neighbor_df['Loop Through Figures'].idxmin()]
        #return Nearest_Neighborhood_StepOne

In [88]:
TESTING = Nearest_Neighbourhood_Finder (33.472074, -82.066007,df_Neighborhood_Raw_Data)

TypeError: cannot do label indexing on <class 'pandas.core.indexes.range.RangeIndex'> with these indexers [nan] of <class 'float'>
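
The error above is consistent with calling `idxmin()` on a column that is still entirely NaN, since the line that fills 'Distance from Crime' is commented out in this attempt. How pandas reacts to an all-NaN column differs between versions. A small guard, using the hypothetical helper `safe_idxmin`, makes the empty case explicit:

```python
import numpy as np
import pandas as pd


def safe_idxmin(series):
    """Return the index label of the smallest non-NaN value, or None if there is none."""
    valid = series.dropna()
    if valid.empty:
        return None
    return valid.idxmin()


print(safe_idxmin(pd.Series([np.nan, np.nan])))       # no valid distances yet
print(safe_idxmin(pd.Series([3.38, np.nan, 5.07])))   # label of the 3.38 entry
```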

In [95]:
def Nearest_Neighbourhood_Finder (lat,lon,Neighbor_df):

    #Libraries Required
    import math
    from numpy import ndarray
    
    #CONSTANTS
    R = 6371000;
    
    #Arrays Required
    distance_CalArray = ndarray((len(Neighbor_df),),float)
    Distance_Array = ndarray((len(Neighbor_df),),float)
    #Neighbor_df = ndarray((len(Neighbor_df),),float)
    
    #Data Frame Required
    #Principal_df = pd.DataFrame(columns = 'Loop Through Figures')
    
    #Initializing First Neigborhood
    Neighbor_df['Distance from Crime'] = np.nan
    #Nearest_Neighborhood = Neighbor_df.loc['Nearest Neighbourhood'][0] #Related to second part
    
    #Loop through Nearest Neighbourhood Data Frame
    for i in range (len(Neighbor_df)):
        dLat = (Neighbor_df['Longitude'][i]-lat) * math.pi / 180 
        dLon = (Neighbor_df['Latitude'][i]-lon) * math.pi / 180 
        a = math.sin(dLat/2) * math.sin(dLat/2) + math.cos(lat * math.pi / 180 ) * math.cos(Neighbor_df['Longitude'][i] * math.pi / 180 ) * math.sin(dLon/2) * math.sin(dLon/2)
        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
        #Neighbor_df['Loop Through Figures'][i] = (R * c[i])/1000 #Go old School
        #Distance_Array[i] = (R * c[i])/1000
        
        
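        Neighbor_df['Distance from Crime'][i] = (R * c)/1000  # chained indexing; this is what triggers SettingWithCopyWarning
        #Nearest_Neighborhood_StepOne = Neighbor_df['Nearest Neighbourhood'][Neighbor_df['Loop Through Figures'].idxmin()]
        #return Nearest_Neighborhood_StepOne
        return Neighbor_df.head()  # this return is inside the for loop, so the function exits after the first row (i = 0)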
        
        
        
        #Neighbor_df[i] = Distance_Array[i]
        #return Distance_Array
        #Was second part
        #Nearest_Neighborhood_StepOne = Neighbor_df['Nearest Neighbourhood'][Neighbor_df['Loop Through Figures'].idxmin()]
        #return Nearest_Neighborhood_StepOne

In [96]:
TESTING = Nearest_Neighbourhood_Finder (33.472074, -82.066007,df_Neighborhood_Raw_Data)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [97]:
TESTING 
#Result lists 'Distance from Crime' for the first row only

Unnamed: 0,Nearest Neighbourhood,City,State,Coordinates,Longitude,Latitude,Distance from Crime
0,Academy Baker Ave,Augusta,Georgia,"33.4836288, -82.0845725",33.483629,-82.084572,2.148432
1,Albion Acres,Augusta,Georgia,"33.4125043, -82.0220507",33.412504,-82.022051,
2,Barton Chapel,Augusta,Georgia,"33.4424915, -82.0883992",33.442491,-82.088399,
3,Bath-Edie,Augusta,Georgia,"33.3333868, -82.2163489",33.333387,-82.216349,

