- Identify the Geospatial Attributes You Need: **shopping centres, schools, restaurants, public transportation, hospitals**


- Use the Overpass API with the overpy Wrapper:

This will help you fetch data from OpenStreetMap (OSM). You will need to learn the basics of Overpass QL, the query language of the Overpass API, but there are plenty of examples online.

- Consider Web Scraping for Additional Data:

If you find that some geospatial data is not available on OSM or is inconsistent, you can consider using Selenium to scrape relevant websites. Remember to always check the website's robots.txt file and terms of service to ensure you're allowed to scrape it.

- Check Local or National Data Portals:

These portals, like Open Data DK for Denmark, might have datasets that are relevant to your analysis. They might offer direct downloads or APIs for data access.
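
The robots.txt check mentioned under web scraping can be automated with Python's standard library. The sketch below is a minimal illustration, not a complete compliance check (terms of service still have to be read by hand). The helper name `is_allowed`, the example.com URLs and the robots.txt content are made up so that the example runs offline.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, inlined so no network access is needed.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(EXAMPLE_ROBOTS, "https://example.com/listings/"))
print(is_allowed(EXAMPLE_ROBOTS, "https://example.com/private/data"))
```

For a live site, `RobotFileParser.set_url()` followed by `read()` downloads the file instead of parsing an inline string.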

# Overview of the code
### 1. Fetch and structure data for malls (for practical purposes, I started with just the malls)
### 2. Generalize the approach by applying the same methods to schools, restaurants, public transportation and hospitals
### 3. Combine the mall, school, restaurant, public transportation and hospital data

In [44]:
#pip install overpy
# Remove "#" from the above if you have not installed overpy yet.

## 1. Data for malls

### First, we collect data for malls

In [17]:
import overpy

api = overpy.Overpass()

# Query for shopping centers
result = api.query("""
[out:json];
(
  node["shop"="mall"](around:10000,55.6761,12.5683);
  way["shop"="mall"](around:10000,55.6761,12.5683);
  relation["shop"="mall"](around:10000,55.6761,12.5683);
);
(._;>;);
out body;
""")

# Structure the data.
# Note: the recursion step "(._;>;);" also returns every vertex of the matched ways,
# so result.nodes contains those vertices too; untagged ones end up with the name 'n/a'.
nodes = [{'name': center.tags.get('name', 'n/a'), 'location': (center.lon, center.lat)} for center in result.nodes]
ways = [{'name': way.tags.get('name', 'n/a'), 'nodes': [(node.lon, node.lat) for node in way.nodes]} for way in result.ways]

# For relations, collect the member ways with role "outer" from the result
relations = []
for relation in result.relations:
    outer_refs = {member.ref for member in relation.members if member.role == "outer"}
    relation_ways = [way for way in result.ways if way.id in outer_refs]
    relations.append({'name': relation.tags.get('name', 'n/a'), 'ways': relation_ways})


In [18]:
def get_centroid(nodes):
    """Return the mean position of a list of (lon, lat) pairs.

    This is the plain average of the vertices, not an area-weighted centroid.
    """
    x_coords = [node[0] for node in nodes]
    y_coords = [node[1] for node in nodes]
    centroid_x = sum(x_coords) / len(nodes)
    centroid_y = sum(y_coords) / len(nodes)
    return (centroid_x, centroid_y)

malls_centroids = {}

# Handle nodes:
for node in nodes:
    mall_name = node['name']
    if mall_name not in malls_centroids:
        malls_centroids[mall_name] = node['location']

# Handle ways:
for way in ways:
    mall_name = way['name']
    if mall_name not in malls_centroids:
        malls_centroids[mall_name] = get_centroid(way['nodes'])

# Handle relations:
for relation in relations:
    mall_name = relation['name']
    if mall_name not in malls_centroids:
        # For each way in the relation, retrieve its nodes and calculate its centroid
        relation_centroids = [get_centroid([(node.lon, node.lat) for node in way.nodes]) for way in relation['ways']]
        malls_centroids[mall_name] = get_centroid(relation_centroids)

print(malls_centroids)


{'n/a': (Decimal('12.5310490'), Decimal('55.6818036')), 'Frederiksberg centeret': (Decimal('12.5333550'), Decimal('55.6822430')), 'Q-Park Illum': (Decimal('12.5795608'), Decimal('55.6797769')), 'K': (Decimal('12.4571370'), Decimal('55.6789057')), 'P': (Decimal('12.4579915'), Decimal('55.6799249')), 'Frihedens Butikscenter': (Decimal('12.48406003333333333333333333'), Decimal('55.626984875')), 'Frederiksberg Centret': (Decimal('12.53223812962962962962962963'), Decimal('55.68172073703703703703703704')), 'Rødovre Centrum': (Decimal('12.457133042'), Decimal('55.679388752')), 'Fisketorvet': (Decimal('12.56121518604651162790697674'), Decimal('55.66183853720930232558139535')), "Field's": (Decimal('12.57764880588235294117647059'), Decimal('55.63046102352941176470588235')), 'Nørrebro Bycenter': (Decimal('12.53812059622641509433962264'), Decimal('55.70324353207547169811320755')), 'Hvidovrevejs Butikstorv': (Decimal('12.4740983'), Decimal('55.65329371')), 'Hvidovre C': (Decimal('12.475971210714285
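
`get_centroid` averages the vertices of each way, so an outline with many vertices on one side pulls the result toward that side. Closed OSM ways also repeat their first node at the end, which counts that vertex twice. If this matters for the analysis, an area-weighted centroid (shoelace formula) is one alternative. The sketch below is an assumption-laden illustration: the name `polygon_centroid` is hypothetical, coordinates are converted to `float`, and longitude/latitude are treated as planar, which is only a reasonable approximation for building-sized shapes.

```python
def polygon_centroid(points):
    """Area-weighted centroid of a simple polygon given as (x, y) pairs."""
    pts = [(float(x), float(y)) for x, y in points]
    if not pts:
        raise ValueError("polygon_centroid needs at least one point")
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]  # drop the repeated closing vertex of a closed way

    def mean(ps):
        return (sum(x for x, _ in ps) / len(ps), sum(y for _, y in ps) / len(ps))

    if len(pts) < 3:
        return mean(pts)  # a point or a segment has no area

    area2 = cx = cy = 0.0  # area2 is twice the signed area
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        return mean(pts)  # degenerate outline, fall back to the vertex mean
    return (cx / (3 * area2), cy / (3 * area2))


# A 2 x 2 square whose top edge carries three extra vertices.
square = [(0, 0), (2, 0), (2, 2), (1.5, 2), (1, 2), (0.5, 2), (0, 2), (0, 0)]
print(polygon_centroid(square))
```

For this square the area-weighted result sits at the geometric centre, while the plain vertex mean is shifted toward the densely sampled top edge.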

## 2. Now we collect data for the rest using the same method

**Data Collection**

In [30]:
import overpy

api = overpy.Overpass()

def get_data_for_amenity(amenity, around_distance=10000, lat=55.6761, lon=12.5683, key="amenity"):
    """Fetch nodes, ways and relations tagged key=amenity within around_distance metres of (lat, lon)."""
    query = f"""
    [out:json];
    (
      node["{key}"="{amenity}"](around:{around_distance},{lat},{lon});
      way["{key}"="{amenity}"](around:{around_distance},{lat},{lon});
      relation["{key}"="{amenity}"](around:{around_distance},{lat},{lon});
    );
    (._;>;);
    out body;
    """
    return api.query(query)

school_data = get_data_for_amenity("school")
restaurant_data = get_data_for_amenity("restaurant")
bus_station_data = get_data_for_amenity("bus_station")
# Train stations are tagged railway=station in OSM, so the key has to be overridden.
# (Passing "station" as the second positional argument would set around_distance instead.)
train_station_data = get_data_for_amenity("station", key="railway")
hospital_data = get_data_for_amenity("hospital")


### Structure the data

In [33]:
def structure_data(raw_data):
    # Nodes
    nodes = [{'name': node.tags.get('name', 'n/a'), 'location': (node.lon, node.lat)} for node in raw_data.nodes]
    
    # Ways
    ways = [{'name': way.tags.get('name', 'n/a'), 'nodes': [(node.lon, node.lat) for node in way.nodes]} for way in raw_data.ways]
    
    # Relations
    relations = []
    for relation in raw_data.relations:
        relation_ways = [way for way in raw_data.ways if way.id in [member.ref for member in relation.members if member.role == "outer"]]
        relations.append({'name': relation.tags.get('name', 'n/a'), 'ways': relation_ways})
    
    return nodes, ways, relations

schools_nodes, schools_ways, schools_relations = structure_data(school_data)
restaurants_nodes, restaurants_ways, restaurants_relations = structure_data(restaurant_data)
bus_stations_nodes, bus_stations_ways, bus_stations_relations = structure_data(bus_station_data)
train_stations_nodes, train_stations_ways, train_stations_relations = structure_data(train_station_data)
hospitals_nodes, hospitals_ways, hospitals_relations = structure_data(hospital_data)


### Compute Centroids

In [39]:
def extract_centroids_from_parsed_data(nodes, ways, relations):
    centroids = {}

    # Handle nodes:
    for node in nodes:
        name = node['name']
        if name not in centroids:
            centroids[name] = node['location']

    # Handle ways:
    for way in ways:
        name = way['name']
        if name not in centroids:
            centroids[name] = get_centroid(way['nodes'])

    # Handle relations:
    for relation in relations:
        name = relation['name']
        if name not in centroids and relation['ways']:
            relation_centroids = [get_centroid([(node.lon, node.lat) for node in way.nodes]) for way in relation['ways']]
            centroids[name] = get_centroid(relation_centroids) or "No Centroid"

    return centroids

schools_centroids = extract_centroids_from_parsed_data(schools_nodes, schools_ways, schools_relations)
restaurants_centroids = extract_centroids_from_parsed_data(restaurants_nodes, restaurants_ways, restaurants_relations)
bus_stations_centroids = extract_centroids_from_parsed_data(bus_stations_nodes, bus_stations_ways, bus_stations_relations)
train_stations_centroids = extract_centroids_from_parsed_data(train_stations_nodes, train_stations_ways, train_stations_relations)
hospitals_centroids = extract_centroids_from_parsed_data(hospitals_nodes, hospitals_ways, hospitals_relations)

print("Schools:", schools_centroids)
print("Restaurants:", restaurants_centroids)
print("Bus Stations:", bus_stations_centroids)
print("Train Stations:", train_stations_centroids)
print("Hospitals:", hospitals_centroids)


Schools: {'n/a': (Decimal('12.6255513'), Decimal('55.6337732')), 'Instituttet for Blinde og Svagsynede - IBOS': (Decimal('12.5634854'), Decimal('55.7267963')), 'Fritidshjemmet Ærtebjerg': (Decimal('12.4628603'), Decimal('55.6350689')), 'Skolen på Nyelandsvej': (Decimal('12.5303254'), Decimal('55.6828139')), 'Trekronergade Freinetskole': (Decimal('12.5197857'), Decimal('55.6572129')), 'Skolen ved Milestedet': (Decimal('12.4506967'), Decimal('55.6700978')), 'Niels Brock': (Decimal('12.5745879'), Decimal('55.6825046')), 'Flyvevåbnets Officersskole': (Decimal('12.5730322'), Decimal('55.7174668')), "Krebs' Skole": (Decimal('12.5757128'), Decimal('55.6898694')), 'Christianshavn Døttreskole': (Decimal('12.5939114'), Decimal('55.6733005')), 'Frederiksberg HF-kursus': (Decimal('12.4985739'), Decimal('55.6799579')), 'Gefion Gymnasium': (Decimal('12.5813831'), Decimal('55.6882961')), 'Prins Henriks Skole': (Decimal('12.5496938'), Decimal('55.6739171')), 'Fabrikken': (Decimal('12.6141227'), Decima
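
Because the dictionaries above are keyed by name, all unnamed features share the single `'n/a'` entry and only the first feature with a given name is kept. One possible alternative, sketched below, keys each record by element type and OSM id instead. The function name `index_by_osm_id` and the record layout are hypothetical, and the ids and coordinates are made up for illustration.

```python
def index_by_osm_id(features):
    """Map (element_type, osm_id) to a record so that equal names never collide."""
    indexed = {}
    for feature in features:
        # OSM ids are only unique within one element type, so the type is part of the key.
        key = (feature["type"], feature["id"])
        indexed[key] = {"name": feature["name"], "location": feature["location"]}
    return indexed


# Made-up records: two unnamed features and one named feature.
features = [
    {"type": "node", "id": 1, "name": "n/a", "location": (12.60, 55.63)},
    {"type": "node", "id": 2, "name": "n/a", "location": (12.55, 55.70)},
    {"type": "way", "id": 1, "name": "Gefion Gymnasium", "location": (12.58, 55.69)},
]
indexed = index_by_osm_id(features)
print(len(indexed))
```

With overpy, the id is available as the `.id` attribute of each node, way and relation.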

## 3. Now we combine the data and write it to a CSV file

In [40]:
import csv

# Combine all data
all_centroids = {
    "Malls": malls_centroids,
    "Schools": schools_centroids,
    "Restaurants": restaurants_centroids,
    "Bus Stations": bus_stations_centroids,
    "Train Stations": train_stations_centroids,
    "Hospitals": hospitals_centroids
}

# Write data to CSV with UTF-8 encoding
with open("geospatial_data_wrong_order.csv", "w", newline='', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file)
    # Write header
    writer.writerow(["Type", "Name", "Longitude", "Latitude"])
    
    for category, centroids in all_centroids.items():
        for name, location in centroids.items():
            if location != "No Centroid":
                writer.writerow([category, name, location[0], location[1]])
            else:
                writer.writerow([category, name, "No Centroid", "No Centroid"])

print("Data written to 'geospatial_data_wrong_order.csv'")


Data written to 'geospatial_data_wrong_order.csv'


In [41]:
import pandas as pd

# Load the CSV written in the previous cell
df = pd.read_csv('geospatial_data_wrong_order.csv')

# Reordering the columns
df = df[['Type', 'Name', 'Latitude', 'Longitude']]

# Saving it back to the CSV (or to a new CSV if you prefer)
df.to_csv('geospatial_data_right_order.csv', index=False)
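
The reordering step can be avoided by writing the columns in the target order from the start. The sketch below shows the idea under two assumptions: every value in the dictionary is a `(lon, lat)` pair (no `"No Centroid"` strings), and a small made-up sample stands in for `all_centroids`. The function name `write_centroids_csv` is hypothetical, and the output goes to an in-memory buffer rather than a file.

```python
import csv
import io


def write_centroids_csv(all_centroids, out):
    """Write Type, Name, Latitude, Longitude rows to the file-like object out."""
    writer = csv.writer(out)
    writer.writerow(["Type", "Name", "Latitude", "Longitude"])
    for category, centroids in all_centroids.items():
        for name, (lon, lat) in centroids.items():
            # Centroids are stored as (lon, lat); swap them when writing.
            writer.writerow([category, name, lat, lon])


# Illustrative sample with rounded coordinates, not the full dataset.
sample = {"Malls": {"Fisketorvet": (12.5612, 55.6618)}}
buffer = io.StringIO()
write_centroids_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk, pass a file opened with `newline=''` and `encoding='utf-8'`, as in the cell that writes the original CSV.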