## Notebook for Open Data Extraction - Emotional Cities Project

In [6]:
import pandas as pd
import numpy as np
import os
import torch
from PIL import Image
from tqdm.notebook import tqdm  # Use tqdm.notebook for Jupyter integration
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

## Mapillary 

Remember to get a Mapillary API token and replace the xx in the Mapillary script with your own :)

In [None]:
# Downloading images 

# input your own paths here
# Adjust number of images to download, image size and radius

!python3 mapillary.py --dest_dir "/your/dest_dir/path" --input_file "/path/to/input/file/with/Coordinates.csv" --image_size 2048 --n_images 250 --radius 24

# This creates one directory per coordinate, containing all downloaded images, each with its own imageId
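
A quick sanity check after the download can help spot coordinates with few or no images. The sketch below assumes one sub-directory per coordinate (as in the `55.68333_12.57167` example path used later in this notebook) and assumes the images are stored as `.jpg`, `.jpeg` or `.png` files; `count_images_per_coordinate` is a helper name made up for this example, not part of `mapillary.py`.

```python
from pathlib import Path

# Assumed image extensions; adjust if mapillary.py saves another format.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_coordinate(dest_dir):
    """Return {directory_name: number_of_images} for each coordinate folder."""
    counts = {}
    for coord_dir in sorted(Path(dest_dir).iterdir()):
        if not coord_dir.is_dir():
            continue  # skip stray files in the destination directory
        counts[coord_dir.name] = sum(
            1
            for f in coord_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images_per_coordinate("/your/dest_dir/path")` returns a dictionary that can be scanned for coordinates with far fewer images than `--n_images`.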

#### class descriptions
class_descriptions = {
    13: "Road Features", 24: "Road Features", 41: "Road Features",  # Road, Lane Marking, Manhole
    2: "Pedestrian Areas", 15: "Pedestrian Areas",  # Sidewalk, Curb
    17: "Building",  # Building
    6: "Wall",  # Wall
    3: "Fence",  # Fence
    45: "Pole", 47: "Pole",  # Pole, Utility Pole
    48: "Traffic Light",  # Traffic Light
    50: "Traffic Sign",  # Traffic Sign (Front)
    30: "Vegetation",  # Vegetation
    29: "Terrain",  # Terrain
    27: "Sky",  # Sky
    19: "Person",  # Person
    20: "Riders", 21: "Riders", 22: "Riders",  # Bicyclist, Motorcyclist, Other Rider
    55: "Car",  # Car
    61: "Truck",  # Truck
    54: "Bus",  # Bus
    58: "On Rails",  # On Rails
    57: "Motorcycle",  # Motorcycle
    52: "Bicycle",  # Bicycle
    1: "Car",  # Caravan
    53: "Truck"  # Trailer
}
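
To illustrate how a grouping like the one above is applied, here is a minimal sketch on a toy array rather than a real segmentation map. `grouped_percentages`, `example_classes` and `toy_map` are names made up for this example; only the ids and labels are taken from the mapping above.

```python
import numpy as np

# Subset of the mapping above, repeated so the example is self-contained.
example_classes = {13: "Road Features", 24: "Road Features", 27: "Sky", 17: "Building"}


def grouped_percentages(label_map, class_map):
    """Percentage of pixels per grouped label; unmapped class ids are ignored."""
    ids, counts = np.unique(label_map, return_counts=True)
    total = label_map.size
    result = {}
    for class_id, count in zip(ids.tolist(), counts.tolist()):
        label = class_map.get(class_id)
        if label is not None:
            # Several class ids can share one label, so accumulate.
            result[label] = result.get(label, 0.0) + 100.0 * count / total
    return result


# Toy 2x4 "segmentation map": class 99 is not in the mapping and is dropped.
toy_map = np.array([[13, 13, 24, 27],
                    [27, 27, 17, 99]])
percentages = grouped_percentages(toy_map, example_classes)
```

Because unmapped ids are dropped, the percentages of one image do not necessarily sum to 100.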

In [None]:
# Segmentation to get percentage of features for each coordinate (averaged over all images) 

!python3 mapillarySegmentation.py --input_dir "/path/to/directory/with/Images" --dest_dir "/your/dest_dir/path"

# This will give you a csv file with all the features and their percentages for each coordinate

# The approach could be changed if there are other opinions on how this should be done, e.g. a cumulative or normalised mean could be used instead
# The classes can also be adjusted to get more or fewer features, in the segmentation.py file
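
The difference between averaging and a cumulative aggregate can be sketched on made-up numbers. This is one possible reading of "cumulative" (pooling pixels over all images of a coordinate), not necessarily what `mapillarySegmentation.py` does; the pixel counts and image sizes below are hypothetical.

```python
import pandas as pd

# Hypothetical per-image pixel counts; img2 is larger and contains no car.
pixel_counts = pd.DataFrame(
    [{"Sky": 400, "Car": 100}, {"Sky": 800, "Car": 0}],
    index=["img1", "img2"],
)
total_pixels = pd.Series({"img1": 1000, "img2": 4000})

# Option A: mean of per-image percentages (every image weighs the same).
per_image_pct = pixel_counts.div(total_pixels, axis=0) * 100
mean_of_percentages = per_image_pct.mean()

# Option B: cumulative (pixel-pooled) percentage (larger images weigh more).
pooled_percentages = pixel_counts.sum() / total_pixels.sum() * 100
```

When all images have the same size and absent classes are counted as 0 %, the two options coincide. Note that pandas skips missing values by default, so a class that is simply missing from some rows (NaN instead of 0) is averaged only over the images where it appears.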

##### Determining the number of images needed for a stable std, based on one coordinate that is known to be complex

In [2]:
import os
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation
import pandas as pd
from tqdm import tqdm
import numpy as np

# Initialize the processor and model globally
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-mapillary-vistas-semantic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-mapillary-vistas-semantic")

def process_image(image_path):
    """Process a single image and return semantic segmentation results as percentages."""
    image = Image.open(image_path).convert("RGB")  # force 3 channels (grayscale/RGBA inputs)
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_map = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]

    unique_classes, counts = torch.unique(predicted_map, return_counts=True)
    total_pixels = counts.sum().item()
    features = {}

    class_descriptions = {
        13: "Road Features", 24: "Road Features", 41: "Road Features",
        2: "Pedestrian Areas", 15: "Pedestrian Areas",
        17: "Building",
        6: "Wall", 3: "Fence", 45: "Pole", 47: "Pole",
        48: "Traffic Light", 50: "Traffic Sign",
        30: "Vegetation", 29: "Terrain", 27: "Sky",
        19: "Person", 20: "Riders", 21: "Riders", 22: "Riders",
        55: "Car", 61: "Truck", 54: "Bus", 58: "On Rails", 57: "Motorcycle", 52: "Bicycle"
    }

    for cls, count in zip(unique_classes.numpy(), counts.numpy()):
        label = class_descriptions.get(cls, "Unknown")
        if label != "Unknown":
            if label not in features:
                features[label] = 0
            features[label] += count.item()

    features = {k: (v / total_pixels) * 100 for k, v in features.items()}
    return features

def aggregate_and_check_stability(input_dir):
    """Aggregate features over images and check stability of mean feature percentages."""
    feature_list = []
    images = [img for img in os.listdir(input_dir) if img.lower().endswith(('.jpg', '.jpeg', '.png'))]
    std_dev = float("nan")  # stays NaN until at least two images have been processed

    for img_name in tqdm(images, desc="Processing images"):
        img_path = os.path.join(input_dir, img_name)
        features = process_image(img_path)
        feature_list.append(features)

        if len(feature_list) > 1:
            df = pd.DataFrame(feature_list)
            # Classes absent from an image are NaN here and are skipped by df.mean()
            mean_features = df.mean()
            # Note: this is the spread across the per-feature means, not across images
            std_dev = mean_features.std()
            if std_dev < 0.05:  # Stability threshold
                print(f"Stability achieved with {len(feature_list)} images.")
                break

    print(f"Processed {len(feature_list)} images. Current Mean Std Deviation: {std_dev}")

# Define the directory containing images
input_dir = '/home/s184310/3.Project/data/testing_data/nørreportTest/55.68333_12.57167'
aggregate_and_check_stability(input_dir)

Processing images: 100%|██████████| 226/226 [05:07<00:00,  1.36s/it]

Processed 226 images. Current Mean Std Deviation: 8.72959947935563
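
The value printed above is the standard deviation across the per-feature means (`df.mean().std()`), so it reflects how different the feature shares are from each other rather than how much the means still move as images are added. That may be why the 0.05 threshold was not reached after 226 images. A possible alternative, sketched below under my own assumptions, is to stop once the standard error of the mean of every feature drops below a threshold. `images_needed_for_stability`, the threshold of 0.5 percentage points and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd


def images_needed_for_stability(feature_list, threshold=0.5, min_images=5):
    """Return the smallest n for which every feature's standard error of the
    mean (in percentage points) is below `threshold`, or None if never reached."""
    df = pd.DataFrame(feature_list).fillna(0.0)  # absent class = 0 % of pixels
    for n in range(min_images, len(df) + 1):
        subset = df.iloc[:n]
        sem = subset.std(ddof=1) / np.sqrt(n)  # per-feature standard error
        if (sem < threshold).all():
            return n
    return None


# Synthetic per-image percentages, only to show the call.
rng = np.random.default_rng(0)
synthetic = [
    {"Sky": s, "Building": b}
    for s, b in zip(rng.normal(30, 5, 300), rng.normal(40, 5, 300))
]
n_needed = images_needed_for_stability(synthetic)
```

In practice `feature_list` would be the list of dictionaries returned by `process_image`, and the threshold would need to be chosen for the precision the analysis requires.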





## OpenStreetMap

GET CARLSO TO GIVE YOU ACCESS, it's already set up :)

For this you need access to the server that hosts a map of the city you are working with; this works around the limit on how often you can query OSM.
Data is therefore extracted from a local copy of OSM running on a private server. This setup makes extraction much faster; against the public API it would take days or weeks even for a very short list of points (I believe Overpass begins to throttle after a certain number of requests). Once everything is set up, the code can be run against the private server endpoint. For setting up a private OSM/Overpass server, see this repo: https://github.com/wiktorn/Overpass-API. This page might also help: https://wiki.openstreetmap.org/wiki/Overpass_API/Installation
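
As a rough idea of what a request against such a private endpoint could look like, the sketch below builds an Overpass QL query for all nodes around a point and posts it with the standard library. The URL is a placeholder I made up and must be replaced with the address of your server; `build_around_query` and `run_query` are hypothetical helpers, not functions from the project scripts.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint of the private Overpass instance; replace with yours.
OVERPASS_URL = "http://localhost:12345/api/interpreter"


def build_around_query(lat, lon, radius_m):
    """Overpass QL: all nodes within `radius_m` metres of (lat, lon), as JSON."""
    return f"[out:json][timeout:25];node(around:{radius_m},{lat},{lon});out body;"


def run_query(query, url=OVERPASS_URL):
    """POST the query to the Overpass endpoint and return the parsed JSON."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(url, data=data, timeout=60) as response:
        return json.load(response)


# Only builds the query string; run_query(query) needs a reachable server.
query = build_around_query(55.68333, 12.57167, 50)
```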


### Urban Metrics

In [None]:
# Retrieving urban metrics for a CSV file of coordinates

# Again, input your file paths and adjust the number of nodes and the radius (OSM works with nodes that can be found within a certain radius)
# Network can be adjusted to be walk, bike, drive or all 
!python3 OsmUrbanMetrics.py --dest_dir="/path" --input_file="/path/coordinates.csv" --radius=49 --num_nodes=55 --network="all"

# This will give you a csv file with all the urban metrics for each coordinate
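
How `OsmUrbanMetrics.py` combines `--radius` and `--num_nodes` is not shown in this notebook. One plausible reading, sketched here purely as an assumption, is to keep the nodes that lie within the radius of the coordinate and cap them at the nearest `num_nodes`. `haversine_m` and `nearest_nodes` are hypothetical helpers, and the radius is assumed to be in metres.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_nodes(center, nodes, radius_m, num_nodes):
    """Keep nodes within radius_m of center, nearest first, at most num_nodes."""
    lat0, lon0 = center
    with_dist = [(haversine_m(lat0, lon0, lat, lon), (lat, lon)) for lat, lon in nodes]
    in_range = sorted(d for d in with_dist if d[0] <= radius_m)
    return [node for _, node in in_range[:num_nodes]]
```

For example, `nearest_nodes((55.68333, 12.57167), candidate_nodes, 49, 55)` would mirror the `--radius=49 --num_nodes=55` arguments used above, with `candidate_nodes` being a list of `(lat, lon)` tuples.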

### POIs

For now this just retrieves POIs and then cleans them based on the tags you want (see tags.py). Later on (after information from Portugal), metrics such as vibrancy will be calculated from the retrieved POIs.

In [None]:
!python3 OsmPoiRetrieval.py --dest_dir="/path" --input_file="/path/coordinates.csv" --radius=72

In [None]:
!python3 OsmPoiTidy.py --dest_dir="/path" --input_file="/path/poiRetrieved.csv"
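
The tag-based cleaning described above could look roughly like the following sketch. The column names (`osm_id`, `key`, `value`) and the tag whitelist are assumptions made for illustration; the real whitelist lives in tags.py, and the real output columns of `OsmPoiRetrieval.py` may differ.

```python
import pandas as pd

# Hypothetical tag whitelist; the real one lives in tags.py.
WANTED_TAGS = {"amenity": {"cafe", "restaurant"}, "shop": {"bakery"}}


def tidy_pois(pois, wanted_tags):
    """Keep rows whose (key, value) pair is in wanted_tags and drop duplicates."""
    mask = pois.apply(
        lambda row: row["value"] in wanted_tags.get(row["key"], set()), axis=1
    )
    return pois[mask].drop_duplicates(subset=["osm_id"]).reset_index(drop=True)


# Made-up POIs: id 2 is duplicated and the bench is not in the whitelist.
raw = pd.DataFrame({
    "osm_id": [1, 2, 2, 3],
    "key": ["amenity", "shop", "shop", "amenity"],
    "value": ["cafe", "bakery", "bakery", "bench"],
})
tidy = tidy_pois(raw, WANTED_TAGS)
```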