# **Automated Location Analysis Tool**


In [1]:
import requests
import pandas as pd
import numpy as np

## Overview 
The goal of the project is to create a dashboard, which outputs an automated location analysis for a specified address. The tool should be designed such that it is useful for a variety of actors, for whom a structured assessment of a location presents beneficial (productivity improvement). For instance, it could be useful for: 
- the general public: e.g. holiday makers assessing hotels or someone who wants to assess a potential new flat
- real estate agents researching attractive features of some property to publish in adverts
- office managers assessing potential office locations or researching amenities, e.g. to provide some suggestion to employees
- architects or city planners assessing the surrounding area of a site to inform decisions regarding potential uses
- corporates assessing possible locations for

Overall, depending on the area of application, the requirements regarding outputs for such a tool/dashboard might differ, as are the relative importance of features considered. Such could, for instance, be a representative/central location, rental prices, public transport connectivity, and the location's surrounding area with all amenities that it has to offer. 

In the most basic version, the dashboard will showcase an objective assessment of the latter two features. Especially considering the surrounding area, the relative importance of different categories of amenities will differ depending on user groups. Thus, aside from inputting the location (in the format (street, no., city, (country))), the user should choose a selection of amenities in order to allow for some individualization. 

The output will be in the form of a sorted list/table including the identified feature and its distance as the crow flies and via road to the location. 

If time permits, measures will be created that give some insight into the performance of a location regarding the other features. Also, reference locations could be generated to provide some measure of relative performance.





## Project Management:

### TP 0 - Structure
- Goals: 
    - create necessary directory structure to facilitate streamlined execution of the program
    - split tasks into various task points and identify points that can be worked on in parallel such as to optimise the time until completion of the basic model

### TP 1 - Input Processing (geopy)
- Goal: take a raw input by the user and transform it into the necessary format needed for subsequent analyses

- UI: text input - street, number, city, country (transformed into lon/lat coordinates using geopy) 
    - at this stage no output shown, will be passed silently to next task 
    - store adress in nice formatted way 



#### Geopy and Nominatim


In [2]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent='automated_location_analysis')
location1 = geolocator.geocode('Rudi Dutschke str, 26, berlin')

type(location1)

geopy.location.Location

In [3]:
location1.longitude

13.3913836

In [4]:
location1.latitude

52.506883

### TP 2 - Overpass API (overpy)
- Goal: query the overpass API using overpy to get a selection of different amenities within a specified radius around the location

- Subtasks:
    - figure out how to use the long/lat output from TP 1 to specify an **area to be queried**
    - collect different **types of amenities** from the overpass language guide/documentation (node types)
    - construct query
        - output json
        - parse names of amenities into list with lat/lon coordinates

    
- UI: suggestion - tick boxes to select amenities
    - output in background: df with the name, category, address, lat/lon
    - should check computing time for different settings, likely only a handful feasible, especially in really central locations
    - especially for the routing, limited capacity; unless install on local machine
    


#### Overpass request

Structure:
```python 
[out:json][timeout:25];
(
  node["amenity"="post_box"]({{bbox}});
  way["amenity"="post_box"]({{bbox}});
  relation["amenity"="post_box"]({{bbox}});
);
out body;
>;
out skel qt;
```

To define the query: 
- need to collect the nodes, ways, relations we want to collect
- define the bbox 
- out statement (e.g. out center to get single coordinate per object, likely preferable)

In [5]:
%%time
overpass_url = 'https://z.overpass-api.de/api/interpreter'

overpass_query = """
[out:json];
area["ISO3166-1"="DE"][admin_level=2];
(node["amenity"="biergarten"](area);
 way["amenity"="biergarten"](area);
 rel["amenity"="biergarten"](area);
);
out center;
"""
response = requests.get(overpass_url, 
                        params={'data': overpass_query})

data = response.json()

CPU times: user 39.9 ms, sys: 9.52 ms, total: 49.4 ms
Wall time: 1min 2s


In [None]:
%%time
import overpy
api = overpy.Overpass()

query = """
[out:json];
area["ISO3166-1"="DE"][admin_level=2];
(node["amenity"="biergarten"](area);
 way["amenity"="biergarten"](area);
 rel["amenity"="biergarten"](area);
);
out center;"""

result = api.query(query)

In [None]:
result.get_nodes()

### TP 3 - Get distance
- Goal: use the lon/lat obtained in TP 1 and 
    1. compute the distance as the crow flies to the location of interest using geopy
    2. compute the distance via road using a to be selected API (e.g. openrouteservice)
    
- UI: output list/table/df; 
    - first location outputted in nice format
    - list/table/df with ammenities sorted by category (possibly split in multiple dfs)
        - columns: name, (category), address, distance as the crow flies, distance by road, approx time using modes of transport (simple interact with average speed factors for walking, bike, e-scooter)

In [None]:
%%time
import openrouteservice
from openrouteservice import convert

coords = ((8.34234,48.23424),(8.34423,48.26424))

client = openrouteservice.Client(key='5b3ce3597851110001cf62482818c293528942238de6f690d9ec3b11') # Specify your personal API key

# decode_polyline needs the geometry only
geometry = client.directions(coords, profile='cycling-regular')['routes'][0]['geometry']

decoded = convert.decode_polyline(geometry)



In [None]:
import requests
mode = 'foot-walking'
url = f'https://api.openrouteservice.org/v2/directions/{mode}'
#headers = {
#    'Accept': 'application/json, application/geo+json, application/gpx+xml, img/png; charset=utf-8',
#}
params = {
    'api_key': '5b3ce3597851110001cf62482818c293528942238de6f690d9ec3b11',
    'start': '13.3913836,52.506883',
    'end': '13.3889505, 52.5151154', 
}
call = requests.get(url, params=params)#, headers=headers)

In [None]:
call

#### Openrouteservice

In [None]:
%%time
import requests
mode = 'foot-walking'
base_url = f'https://api.openrouteservice.org/v2/directions/{mode}'
body = {"coordinates":[[8.681495,49.41461],[8.686507,49.41943],[8.687872,49.420318]]}

headers = {
    'Accept': 'application/json, application/geo+json, application/gpx+xml, img/png; charset=utf-8',
    'Authorization': '5b3ce3597851110001cf62482818c293528942238de6f690d9ec3b11',
    'Content-Type': 'application/json; charset=utf-8'
}

result = requests.post(base_url, json=body,headers=headers)

In [None]:
result.json()

In [None]:
%%time

'''
OLD (alternative with post request)
'''
import requests
mode = 'foot-walking'
base_url = f'https://api.openrouteservice.org/v2/directions/{mode}'
body = {"coordinates":[[8.681495,49.41461],[8.687872,49.420318]]}
headers = {
    'Accept': 'application/json, application/geo+json, application/gpx+xml, img/png; charset=utf-8',
    'Authorization': '5b3ce3597851110001cf62482818c293528942238de6f690d9ec3b11',
    'Content-Type': 'application/json; charset=utf-8'
}

result = requests.post(base_url, json=body,headers=headers)
result.json()

In [None]:
import requests
mode = 'foot-walking'
url = f'https://api.openrouteservice.org/v2/directions/{mode}'
#headers = {
#    'Accept': 'application/json, application/geo+json, application/gpx+xml, img/png; charset=utf-8',
#}
params = {
    'api_key': '5b3ce3597851110001cf62482818c293528942238de6f690d9ec3b11',
    'start': '13.3913836,52.506883',
    'end': '13.3889505, 52.5151154', 
}
call = requests.get(url, params=params)

#### Routing Functions

In [2]:
%%time 
import requests
api_key = '5b3ce3597851110001cf62482818c293528942238de6f690d9ec3b11'
mode = 'cycling-electric'

def routing(mode,lat_start,lon_start,lat_end,lon_end,api_key):
    ''' 
    The function takes mode of transport as well coordinates for
    the start and end point of a routing request as input.

    The intermediate output is a geoJSON file, which subsequently 
    '''
    # specify mode and base-url
    url = f'https://api.openrouteservice.org/v2/directions/{mode}'

    # specify the start and end coordinates
    ## start
    lat_start = lat_start
    lon_start = lon_start
    ## end
    lat_end = lat_end
    lon_end = lon_end
    

    # create params list for query
    params = {
        'api_key': api_key,
        'start': f'{lon_start},{lat_start}',
        'end': f'{lon_end},{lat_end}',
    }
    # make request
    result_geojson = requests.get(url,params=params)
    return result_geojson

CPU times: user 11 µs, sys: 1e+03 ns, total: 12 µs
Wall time: 16.9 µs


In [3]:
%%time
def routing_mode_transport(lat_start,lon_start,lat_end,lon_end,api_key):
    '''
    This function utilises the routing function to make the routing api request
    and outputs the geojsons to be used in the distance_duration_road
    function.
    '''
    modes_transport = ['foot-walking','cycling-regular','cycling-electric']
    modes_geojsons = []
    for mode in modes_transport:
        result = routing(mode,lat_start,lon_start,lat_end,lon_end,api_key)
        modes_geojsons.append(result)
    return modes_geojsons
    

CPU times: user 7 µs, sys: 1e+03 ns, total: 8 µs
Wall time: 16 µs


In [4]:
%%time
import requests
def dist_dur_road(result_geojson):
    ''' 
    The function uses the output of the routing api request 
    to extract the distance by road in km and the travel-time in 
    minutes between the location of interest and an amenity 
    utilising the mobility mode selected (walk, bike, e-bike) 
    '''
    # return json and transform distance to km
    # duration to minutes
    distance = round((result_geojson.json()['features'
                          ][0]['properties'
                              ]['summary'
                               ]['distance']) / 1000, 2)

    duration = round((result_geojson.json()['features'
                          ][0]['properties'
                      
                              ]['summary'
                               ]['duration']) / 60, 2)
    return distance, duration

CPU times: user 10 µs, sys: 1 µs, total: 11 µs
Wall time: 14.1 µs


In [5]:
%%time
def dist_dur_all_modes(lat_start,lon_start,lat_end,lon_end,api_key):
    '''
    Puts all the other information together to output a list containing
    the distance and duration for 3 modes of transport (walk, cycle, 
    e-cycle)
    '''
    geojsons = routing_mode_transport(lat_start,lon_start,lat_end,lon_end,api_key)
    dist_dur = []
    for geojson in geojsons:
        dist_dur.append(dist_dur_road(geojson))
    return dist_dur

CPU times: user 7 µs, sys: 2 µs, total: 9 µs
Wall time: 16.2 µs


In [28]:
dist_dur_all_modes('foot-walking',52.506883, 13.3913836, 52.5151154, 13.3889505,'5b3ce3597851110001cf62482818c293528942238de6f690d9ec3b11')

<Response [200]>

#### Next steps:
- create dummy dataframe to use as input for the above to use the above functions to create a final dataframe
- create a function that uses the dataframe as the input 

1. have the dataframe
2. want to add distance and duration via different modes to the location from the location inputted
    - possible way:
        - create array that contains the distance and duration information for each mobility mode
        - add as column to df

In [94]:
file_path = 'data/draft_data.csv'
df_test = pd.read_csv(file_path)
df_test = df_test.drop(columns=['Unnamed: 0','@id'])
df_test = df_test.replace(np.nan, 'Not Available')
df_test = df_test[0:12]

In [95]:
df_test

Unnamed: 0,amenity,name,@lat,@lon,contact:phone,contact:website,addr:city,addr:street
0,restaurant,Bocca di Bacco,52.515115,13.38895,+49 30 206 72828,http://www.boccadibacco.de/,Berlin,Friedrichstraße
1,restaurant,Restaurant Borchardt,52.515003,13.390388,+49 30 81886262,Not Available,Berlin,Französische Straße
2,restaurant,Charlotte & Fritz,52.515037,13.390772,Not Available,https://charlotteundfritz.com/,Berlin,Charlottenstraße
3,restaurant,Aigner,52.514689,13.390928,Not Available,Not Available,Berlin,Französische Straße
4,restaurant,Lutter & Wegner,52.513212,13.391178,Not Available,Not Available,Berlin,Charlottenstraße
5,restaurant,Malatesta,52.512405,13.391307,Not Available,Not Available,Berlin,Charlottenstraße
6,restaurant,Fellini,52.512054,13.392276,Not Available,Not Available,Not Available,Not Available
7,restaurant,Shan's Bistro,52.513082,13.393553,Not Available,Not Available,Not Available,Not Available
8,restaurant,Maximilians,52.511344,13.389181,+49 30 20450559,https://www.maximilians-berlin.de/,Berlin,Friedrichstraße
9,restaurant,Izumi Sushi Bar,52.511292,13.388712,Not Available,Not Available,Berlin,Kronenstraße


##### Step 1: define the center location 

In [17]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent='automated_location_analysis')
location_start = geolocator.geocode('rudi dutschke str 26 , berlin')

# define lat and lon start
lat_start, lon_start = (location_start.latitude, location_start.longitude)
location_address = location_start.address

##### Step 2: take information from the dataframe (lat_end, lon_end)

In [18]:
lat_end, lon_end = float(df_test.loc[0,'@lat']), float(df_test.loc[0,'@lon'])

##### Step 3: use the above information and functions to make get the distance and duration

In [27]:
%%time
api_key = '5b3ce3597851110001cf62482818c293528942238de6f690d9ec3b11'
dist_dur_all_modes(lat_start,lon_start,lat_end,lon_end,api_key)

KeyError: 'features'

##### Step 4: split the above information


In [23]:
dist_walk = []
dur_walk = []
dist_cycl_reg = []
dur_cycl_reg = []
dist_cycl_e = []
dur_cycl_e =[]
request = dist_dur_all_modes(lat_start,lon_start,lat_end,lon_end,api_key)

In [24]:
dist_walk.append(request[0][0])
dur_walk.append(request[0][1])
dist_cycl_reg.append(request[1][0])
dur_cycl_reg.append(request[1][1])
dist_cycl_e.append(request[2][0])
dur_cycl_e.append(request[2][1])

##### Step 5: combine the above and loop over the dataframe to get the arrays

In [99]:
%%time
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent='automated_location_analysis')
location_start = geolocator.geocode('rudi dutschke str 26 , berlin')

# define lat and lon start
lat_start, lon_start = (location_start.latitude, location_start.longitude)
location_address = location_start.address

api_key = '5b3ce3597851110001cf62482818c293528942238de6f690d9ec3b11'

# create empty lists to hold the results of the calls
dist_walk = []
dur_walk = []
dist_cycl_reg = []
dur_cycl_reg = []
dist_cycl_e = []
dur_cycl_e =[]
count = 0

for i in range((df_test.shape[0])):
    lat_end, lon_end = float(df_test.loc[i,'@lat']), float(df_test.loc[i,'@lon'])
    request = dist_dur_all_modes(lat_start,lon_start,lat_end,lon_end,api_key)
    dist_walk.append(request[0][0])
    dur_walk.append(request[0][1])
    dist_cycl_reg.append(request[1][0])
    dur_cycl_reg.append(request[1][1])
    dist_cycl_e.append(request[2][0])
    dur_cycl_e.append(request[2][1])

CPU times: user 959 ms, sys: 146 ms, total: 1.1 s
Wall time: 10.3 s


##### Step 6: Define functions to transform data

In [96]:
def transform_km(float_km):
    str_km = f"{str(float_km)} km"
    return str_km

def transform_min(float_min):
    float_min_s = str(int(float(str(float_min).split('.')[1])/100*60))
    float_min_min = str(int(str(float_min).split('.')[0]))
    str_min = f'{float_min_min} min {float_min_s} s'
    return str_min

##### Step 7: Add the arrays to the dataframe and transform the data

In [97]:
df_test['dist_walk'] = dist_walk
df_test['dist_walk'] = df_test['dist_walk'].apply(lambda x: transform_km(x))

df_test['dur_walk'] = dur_walk
df_test['dur_walk'] = df_test['dur_walk'].apply(lambda x: transform_min(x))

df_test['dist_cycl_reg'] = dist_cycl_reg
df_test['dist_cycl_reg'] = df_test['dist_cycl_reg'].apply(lambda x: transform_km(x))

df_test['dur_cycl_reg'] = dur_cycl_reg
df_test['dur_cycl_reg'] = df_test['dur_cycl_reg'].apply(lambda x: transform_min(x))

df_test['dist_cycl_e'] = dist_cycl_e
df_test['dist_cycl_e'] = df_test['dist_cycl_e'].apply(lambda x: transform_km(x))

df_test['dur_cycl_e'] = dur_cycl_e
df_test['dur_cycl_e'] = df_test['dur_cycl_e'].apply(lambda x: transform_min(x))

In [98]:
df_test

Unnamed: 0,amenity,name,@lat,@lon,contact:phone,contact:website,addr:city,addr:street,dist_walk,dur_walk,dist_cycl_reg,dur_cycl_reg,dist_cycl_e,dur_cycl_e
0,restaurant,Bocca di Bacco,52.515115,13.38895,+49 30 206 72828,http://www.boccadibacco.de/,Berlin,Friedrichstraße,1.14 km,13 min 42 s,1.15 km,4 min 5 s,1.15 km,3 min 27 s
1,restaurant,Restaurant Borchardt,52.515003,13.390388,+49 30 81886262,Not Available,Berlin,Französische Straße,1.02 km,12 min 12 s,1.02 km,3 min 23 s,1.02 km,2 min 46 s
2,restaurant,Charlotte & Fritz,52.515037,13.390772,Not Available,https://charlotteundfritz.com/,Berlin,Charlottenstraße,0.99 km,11 min 53 s,0.99 km,3 min 1 s,0.99 km,2 min 4 s
3,restaurant,Aigner,52.514689,13.390928,Not Available,Not Available,Berlin,Französische Straße,0.95 km,11 min 2 s,0.95 km,3 min 10 s,0.95 km,2 min 35 s
4,restaurant,Lutter & Wegner,52.513212,13.391178,Not Available,Not Available,Berlin,Charlottenstraße,0.79 km,9 min 27 s,0.79 km,2 min 37 s,0.79 km,2 min 9 s
5,restaurant,Malatesta,52.512405,13.391307,Not Available,Not Available,Berlin,Charlottenstraße,0.69 km,8 min 20 s,0.69 km,2 min 18 s,0.69 km,1 min 5 s
6,restaurant,Fellini,52.512054,13.392276,Not Available,Not Available,Not Available,Not Available,0.72 km,8 min 35 s,0.72 km,2 min 24 s,0.72 km,1 min 58 s
7,restaurant,Shan's Bistro,52.513082,13.393553,Not Available,Not Available,Not Available,Not Available,0.88 km,10 min 31 s,0.88 km,2 min 56 s,0.88 km,2 min 2 s
8,restaurant,Maximilians,52.511344,13.389181,+49 30 20450559,https://www.maximilians-berlin.de/,Berlin,Friedrichstraße,0.74 km,8 min 53 s,0.75 km,2 min 29 s,0.75 km,2 min 3 s
9,restaurant,Izumi Sushi Bar,52.511292,13.388712,Not Available,Not Available,Berlin,Kronenstraße,0.79 km,9 min 30 s,0.8 km,2 min 43 s,0.8 km,2 min 13 s


In [92]:
str_test = 1.03
str(int(str(str_test).split('.')[0]))

'1'

### TP 4 - (Optional) Multi-Criteria Decision Analysis
1. generate reference locations or loop over some list of scraped office locations in berlin and generate reference score or distribution (alternatively also create raster and compute for all boxes some score)
2. centrality of the location: generate random location within bounding box and get distance to inputted location

# References

- [Immobilienscout24](https://www.immobilienscout24.de/gewerbe/ratgeber/standortanalyse/standortkriterien-buero.html)