## Jupyter notebook 04: This notebook uses the **Mapillary API** to request images and metadata for a set of points and stores them in a local database

[Mapillary API Documentation](https://www.mapillary.com/developer/api-documentation?locale=pt_PT)

#### Import the necessary libraries

In [3]:
# Import standard-library and pre-installed modules
import os
import sys
from IPython.display import display, Markdown

In [4]:
# Set the project root directory as the working directory
os.chdir('..')

In [3]:
# Get current working directory
os.getcwd()

'/Users/darlanmnunes/Dev/DSc_git/PhD_Thesis_Step3_OSM_Toponyms'

In [5]:
# Import and Reload the modules to ensure any changes are reflected
import importlib

import src.mapillary_api as mapillary_api
import src.mapillary_tile_downloader as mapillary_tile_downloader
import src.mapillary_metadata_enricher as mapillary_metadata_enricher

importlib.reload(mapillary_api)
importlib.reload(mapillary_tile_downloader)
importlib.reload(mapillary_metadata_enricher)

<module 'src.mapillary_metadata_enricher' from '/Users/darlanmnunes/Dev/DSc_git/PhD_Thesis_Step3_OSM_Toponyms/src/mapillary_metadata_enricher.py'>

### Mapillary Coverage Tiles

 - Retrieve image points
 
 - [Stack Overflow - GIS (Exporting vector tiles or saving them locally using QGIS)](https://gis.stackexchange.com/questions/458377/exporting-vector-tiles-or-saving-them-locally-using-qgis)
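
Coverage tiles are addressed by zoom/x/y indices, so a bounding box first has to be translated into the list of tiles it touches. The sketch below uses the standard Web Mercator (XYZ, "slippy map") tile formulas. The helper names `lonlat_to_tile` and `tiles_for_bbox` are hypothetical, and it is only an assumption that `processar_area_abrangente` enumerates tiles in a similar way.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tiles_for_bbox(bbox, zoom):
    """List every (x, y) tile touched by a [minx, miny, maxx, maxy] bbox."""
    minx, miny, maxx, maxy = bbox
    # Tile y grows southwards, so the south-west corner gives the largest y.
    x_min, y_max = lonlat_to_tile(minx, miny, zoom)
    x_max, y_min = lonlat_to_tile(maxx, maxy, zoom)
    return [
        (x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]


# Bounding box of the study area used in this notebook
bbox = [-44.06327447695697, -20.05950536981058,
        -43.85721816338328, -19.77674605928553]
tiles = tiles_for_bbox(bbox, 14)
print(len(tiles))
print(lonlat_to_tile(-43.97377, -19.86441, 14))
```

For this bounding box at zoom 14 the enumeration gives 11 columns by 15 rows of tiles, which is consistent with the tile count reported in the processing log, and the sample point falls in the same `tile_x`/`tile_y` pair recorded for it in the output table.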

In [38]:
# This cell processes a specific area using the Mapillary coverage tiles to retrieve
# the image acquisition points and save them to a GeoPackage file.
# To run it, make sure you have a valid Mapillary API token and that the
# module `src.mapillary_tile_downloader` is correctly implemented.

from src.mapillary_tile_downloader import (
    ler_token_mapillary,
    processar_area_abrangente,
    salvar_resultados
)
import geopandas as gpd
from shapely.geometry import shape
import time
import json

# Settings
TOKEN = ler_token_mapillary()
ZOOM = 14  # Zoom level for the area of interest (Mapillary serves image points at zoom 14)
OUTPUT_GPKG = "results/3_mapillary_coverage/mapillary_coverage_bh.gpkg"

# Load the area of interest (example: bbox from a GeoJSON)
with open("data/input_code1/limite_bh.geojson") as f:
    geojson = json.load(f)

# Build the geometry and get its bbox
geom = shape(geojson['features'][0]['geometry'])
bbox = list(geom.bounds)  # [minx, miny, maxx, maxy]

print(f"Bbox simplificada: {bbox}")
print(f"Zoom: {ZOOM}")

# Process the whole area
start_time = time.time()
gdf_pontos = processar_area_abrangente(bbox, TOKEN, ZOOM)
tempo_processamento = time.time() - start_time

print(f"\nProcessamento concluído em {tempo_processamento:.2f} segundos")
print(f"Total de pontos encontrados: {len(gdf_pontos)}")

# Save the results
if not gdf_pontos.empty:
    salvar_resultados(gdf_pontos, OUTPUT_GPKG)
    
    # Show a sample
    print("\nAmostra dos dados:")
    print(gdf_pontos.head())
else:
    print("Nenhum ponto encontrado na área")

Bbox simplificada: [-44.06327447695697, -20.05950536981058, -43.85721816338328, -19.77674605928553]
Zoom: 14
Processando 165 tiles para a área...
  → Processados 10/165 tiles
  → Processados 20/165 tiles
  → Processados 30/165 tiles
  → Processados 40/165 tiles
  → Processados 50/165 tiles
  → Processados 60/165 tiles
  → Processados 70/165 tiles
  → Processados 80/165 tiles
  → Processados 90/165 tiles
  → Processados 100/165 tiles
  → Processados 110/165 tiles
  → Processados 120/165 tiles
  → Processados 130/165 tiles
  → Processados 140/165 tiles
  → Processados 150/165 tiles
  → Processados 160/165 tiles

Processamento concluído em 194.97 segundos
Total de pontos encontrados: 1811900
Pontos salvos em results/3_mapillary_coverage/mapillary_coverage_bh.gpkg

Amostra dos dados:
           image_id    captured_at  compass_angle       creator_id  \
0   192623852640864  1621771233793     263.109192  100125502238568   
1   528484455000560  1621771251296      59.773071  100125502238568   

In [7]:
gdf_pontos.head()

| | image_id | captured_at | compass_angle | creator_id | sequence_id | is_pano | organization_id | tile_z | tile_x | tile_y | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3039220326296461 | 1621519575300 | 251.195389 | 100125502238568 | zyg89gl3zsj6xkckgpbhmg | False | 1805884000000000.0 | 14 | 6190 | 9114 | POINT (-43.97377 -19.86441) |
| 1 | 214577133714463 | 1621519424295 | 341.703125 | 100125502238568 | 926rkf3jvfeq6zheung4mw | False | 1805884000000000.0 | 14 | 6190 | 9114 | POINT (-43.96903 -19.86694) |
| 2 | 5519452621463181 | 1485428228435 | 231.7662 | 109069567993884 | XVjYQP8XpXmXfwI4uTeVVA | False | | 14 | 6190 | 9114 | POINT (-43.96766 -19.86764) |
| 3 | 163825782344898 | 1571996030669 | 0.0 | 102898865287539 | kypzdup0a0yhmbf7rvduwa | False | | 14 | 6190 | 9114 | POINT (-43.97356 -19.86392) |
| 4 | 788800808506055 | 1621513606289 | 342.171326 | 100125502238568 | 6aa5skxfbh3vq3gdd60azm | False | 1805884000000000.0 | 14 | 6190 | 9114 | POINT (-43.97382 -19.86371) |


In [9]:
import geopandas as gpd

# Load the GeoDataFrame produced by the processing step
INPUT_GPKG = "results/3_mapillary_coverage/mapillary_coverage_bh_tmp2.gpkg"
gdf = gpd.read_file(INPUT_GPKG, layer='mapillary_coverage_bh')
# Show summary information about the GeoDataFrame
print(f"Total de pontos carregados: {len(gdf)}")
# Show the first rows of the GeoDataFrame
display(gdf.head())

Total de pontos carregados: 294625


| | image_id | captured_at | compass_angle | creator_id | sequence_id | is_pano | organization_id | tile_z | tile_x | tile_y | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3072526566357766 | 1572195344169 | 297.985027 | 102898865287539 | 3q8pmbfh7er75ht3k8lurf | False | | 14 | 6191 | 9110 | POINT (-43.95452 -19.78721) |
| 1 | 348236080054466 | 1572195337669 | 349.341969 | 102898865287539 | 3q8pmbfh7er75ht3k8lurf | False | | 14 | 6191 | 9110 | POINT (-43.95446 -19.78731) |
| 2 | 1273196033113396 | 1571806803000 | 182.602277 | 102898865287539 | 26xtebbhwxk2dyd9roxqvk | False | 536091800000000.0 | 14 | 6191 | 9110 | POINT (-43.94896 -19.78401) |
| 3 | 788288095392725 | 1572195777169 | 0.0 | 102898865287539 | tmu26pv17mgm9kknq38ucy | False | | 14 | 6191 | 9110 | POINT (-43.95459 -19.78733) |
| 4 | 3013483902262419 | 1571806807000 | 182.57021 | 102898865287539 | 26xtebbhwxk2dyd9roxqvk | False | 536091800000000.0 | 14 | 6191 | 9110 | POINT (-43.94899 -19.78475) |


In [17]:
import pandas as pd
import geopandas as gpd
from datetime import datetime, timezone

# Create the new 'captured_date' column by converting the millisecond timestamp
gdf = gdf.copy()  # Work on a copy to avoid SettingWithCopyWarning
gdf.insert(
    loc=gdf.columns.get_loc('captured_at') + 1,
    column='captured_date',
    value=gdf['captured_at'].apply(
        lambda x: datetime.fromtimestamp(int(x)/1000, tz=timezone.utc).isoformat() if pd.notnull(x) else None
    )
)

# Display the result to check it
display(gdf.head())

# Save again as GPKG
gdf.to_file('results/3_mapillary_coverage/mapillary_coverage2_bh.gpkg', driver='GPKG')

| | image_id | captured_at | captured_date | compass_angle | creator_id | sequence_id | is_pano | organization_id | tile_z | tile_x | tile_y | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2854792694743984 | 1621701673296 | 2021-05-22T16:41:13.296000+00:00 | 69.188881 | 103853795191073 | 6yhbktqcnm4lcoee8qh26p | False | 1805884000000000.0 | 14 | 6186 | 9120 | POINT (-44.06166 -19.97378) |
| 1 | 165794388771973 | 1621701666795 | 2021-05-22T16:41:06.795000+00:00 | 103.273102 | 103853795191073 | 6yhbktqcnm4lcoee8qh26p | False | 1805884000000000.0 | 14 | 6186 | 9120 | POINT (-44.06194 -19.97374) |
| 2 | 210964464051400 | 1621701660294 | 2021-05-22T16:41:00.294000+00:00 | 186.616745 | 103853795191073 | byscxdw4cjpdjmk7apwv6k | False | 1805884000000000.0 | 14 | 6186 | 9120 | POINT (-44.06255 -19.97369) |
| 3 | 391405085453576 | 1621701661794 | 2021-05-22T16:41:01.794000+00:00 | 188.42276 | 103853795191073 | byscxdw4cjpdjmk7apwv6k | False | 1805884000000000.0 | 14 | 6186 | 9120 | POINT (-44.06242 -19.97371) |
| 4 | 4221865561213359 | 1621701667295 | 2021-05-22T16:41:07.295000+00:00 | 101.169991 | 103853795191073 | 6yhbktqcnm4lcoee8qh26p | False | 1805884000000000.0 | 14 | 6186 | 9120 | POINT (-44.06191 -19.97375) |
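
The `captured_at` values are Unix timestamps in milliseconds. As an alternative to the row-by-row `apply` used above, the conversion can be sketched in vectorized form with `pd.to_datetime`. The helper name `captured_at_to_utc` is hypothetical, and unlike the cell above it returns timezone-aware timestamps rather than ISO 8601 strings.

```python
import pandas as pd


def captured_at_to_utc(ms):
    """Convert a Series of epoch milliseconds to timezone-aware UTC timestamps.

    Missing or unparseable values become NaT instead of raising.
    """
    return pd.to_datetime(ms, unit="ms", utc=True, errors="coerce")


# Two 'captured_at' values taken from the output table above
sample = pd.Series([1621701673296, 1621701666795], name="captured_at")
captured = captured_at_to_utc(sample)
print(captured)
```

Keeping the column as real timestamps makes later filtering by date range straightforward. Whether a given file driver stores timezone-aware datetimes faithfully should be checked before relying on it.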


### Mapillary image metadata

- Retrieve image metadata.
- This step enriches the entire set of points retrieved from the Mapillary coverage tiles with image metadata.
- It takes a long time to process (about 3.5 hours for 278,279 images in the logged run), and the image URLs expire after around 30 days.
- We observed that this is not the best stage at which to retrieve image metadata: besides the very long processing time, the *thumb URLs* go stale within days.
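
Because each image needs its own metadata request, the enrichment is dominated by network latency, which is why the call in this notebook takes `max_workers` and `batch_size` arguments. The following is a minimal sketch of that pattern, not the implementation in `src.mapillary_metadata_enricher`: the names `chunked`, `fetch_all`, and `fake_fetch` are hypothetical, and the real request is replaced by a stand-in so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_all(image_ids, fetch_one, max_workers=15, batch_size=10000):
    """Fetch metadata for every ID, batch by batch, using a thread pool.

    `fetch_one` is any callable that takes an image ID and returns a dict.
    IDs whose request raises an exception are skipped.
    """
    def safe_fetch(image_id):
        try:
            return fetch_one(image_id)
        except Exception:
            return None

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in chunked(list(image_ids), batch_size):
            # pool.map preserves input order, so IDs and results line up.
            for image_id, meta in zip(batch, pool.map(safe_fetch, batch)):
                if meta is not None:
                    results[image_id] = meta
    return results


def fake_fetch(image_id):
    """Stand-in for a real API call; fails for one ID to show the skip path."""
    if image_id == "3":
        raise RuntimeError("simulated API failure")
    return {"id": image_id, "width": 4000}


meta = fetch_all(["1", "2", "3", "4"], fake_fetch, max_workers=2, batch_size=2)
print(sorted(meta))
```

Processing in batches bounds the number of pending results held in memory and gives natural checkpoints for saving partial output during a multi-hour run.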

In [8]:
# This cell enriches the Mapillary points retrieved from the Mapillary coverage tiles with image metadata.
# To run it, the Mapillary API token must be set up in your environment and
# the module `src.mapillary_metadata_enricher` must be importable.

from src.mapillary_metadata_enricher import (
    ler_token_mapillary,
    enriquecer_geodataframe,
    salvar_geodataframe_enriquecido
)
import geopandas as gpd
import time
from tqdm.notebook import tqdm
import logging

# Configure detailed logging
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# 1. Initial setup
TOKEN = ler_token_mapillary()
INPUT_GPKG = "results/3_mapillary_coverage/mapillary_coverage_bh_tmp3.gpkg"
OUTPUT_GPKG = "results/3_mapillary_coverage/mapillary_coverage_bh_tmp3_meta.gpkg"

# 2. Load the existing data
logger.info(f"Carregando dados de {INPUT_GPKG}")
gdf = gpd.read_file(INPUT_GPKG, layer='mapillary_coverage_bh')
logger.info(f"Total de pontos carregados: {len(gdf)}")
logger.info(f"Colunas originais: {list(gdf.columns)}")

# 3. Enrich the data
start_time = time.time()
gdf_enriched = enriquecer_geodataframe(
    gdf,
    TOKEN,
    max_workers=15,
    batch_size=10000
)
enrichment_time = time.time() - start_time

logger.info(f"\nEnriquecimento concluído em {enrichment_time/60:.2f} minutos")
logger.info(f"Colunas originais: {len(gdf.columns)}")
logger.info(f"Colunas enriquecidas: {len(gdf_enriched.columns)}")
logger.info(f"Novas colunas adicionadas: {len(gdf_enriched.columns) - len(gdf.columns)}")

# 4. Save the results
salvar_geodataframe_enriquecido(gdf_enriched, OUTPUT_GPKG)

2025-07-07 16:43:13,069 - INFO - Carregando dados de results/3_mapillary_coverage/mapillary_coverage_bh_tmp3.gpkg
2025-07-07 16:43:49,807 - INFO - Total de pontos carregados: 278279
2025-07-07 16:43:49,807 - INFO - Colunas originais: ['image_id', 'captured_at', 'compass_angle', 'creator_id', 'sequence_id', 'is_pano', 'organization_id', 'tile_z', 'tile_x', 'tile_y', 'geometry']
2025-07-07 16:43:49,819 - INFO - Iniciando enriquecimento para 278279 imagens únicas
Obtendo metadados: 100%|██████████| 278279/278279 [3:26:27<00:00, 22.46it/s]
2025-07-07 20:10:17,533 - INFO - Metadados obtidos para 276545 imagens
2025-07-07 20:10:17,534 - INFO - Criando DataFrame de metadados...
2025-07-07 20:13:12,986 - INFO - Campos de metadados obtidos (38): ['image_id', 'altitude', 'atomic_scale', 'camera_parameters', 'camera_type', 'captured_at', 'compass_angle', 'computed_altitude', 'computed_compass_angle', 'computed_geometry', 'computed_rotation', 'creator', 'exif_orientation', 'geometry', 'height', 'i

True

In [None]:
# 5. Detailed check that the data was enriched
logger.info("\nChecking enriched data...")

if not gdf_enriched.empty:
    # Check that the original fields were preserved
    original_cols = set(gdf.columns)
    preserved_cols = original_cols.intersection(set(gdf_enriched.columns))
    logger.info(f"Original fields preserved: {len(preserved_cols)}/{len(original_cols)}")
    
    # Identify the new fields (sorted so the output is reproducible)
    new_cols = sorted(set(gdf_enriched.columns) - original_cols)
    logger.info(f"\nTotal new fields: {len(new_cols)}")
    logger.info(f"New fields: {new_cols}")
    
    # Check the essential fields
    essential_fields = [
        'altitude', 'atomic_scale', 'camera_parameters', 'camera_type',
        'captured_at', 'compass_angle', 'computed_altitude', 'computed_compass_angle',
        'computed_geometry', 'computed_rotation', 'creator_username', 'exif_orientation',
        'geometry', 'height', 'is_pano', 'make', 'model', 'thumb_256_url',
        'thumb_1024_url', 'thumb_2048_url', 'thumb_original_url', 'merge_cc',
        'mesh_url', 'sequence', 'sfm_cluster_url', 'width', 'detections'
    ]
    
    logger.info("\nEssential fields present:")
    for field in essential_fields:
        # Look in all enriched columns: fields such as 'captured_at' and 'geometry'
        # already exist in the original data, so they never appear in new_cols
        exists = field in gdf_enriched.columns
        logger.info(f"{field}: {'✔' if exists else '✘'}")
    
    # Data sample
    sample_cols = ['image_id'] + new_cols[:10]
    logger.info("\nSample of enriched data:")
    logger.info(gdf_enriched[sample_cols].head(3).to_string(index=False))

#### Merge the Mapillary GeoPackage files enriched with image metadata

In [None]:
import geopandas as gpd
import pandas as pd
import glob
from tqdm.notebook import tqdm

# 1. List the files
arquivos = sorted(glob.glob("results/3_mapillary_coverage/mapillary_coverage_bh_tmp*_meta.gpkg"))

# 2. Initialize the list and set the reference CRS, with validation
gdfs = []
crs_ref = None

for i, f in enumerate(tqdm(arquivos, desc="Reading GPKG files")):
    gdf = gpd.read_file(f)
    
    if i == 0:
        crs_ref = gdf.crs  # store the CRS of the first file
    else:
        if gdf.crs != crs_ref:
            raise ValueError(f"Inconsistent CRS in file: {f}\nExpected: {crs_ref}, found: {gdf.crs}")
    
    gdfs.append(gdf)

# 3. Concatenate
gdf_merged = gpd.GeoDataFrame(pd.concat(gdfs, ignore_index=True), crs=crs_ref)

# 4. Save the result
gdf_merged.to_file("results/3_mapillary_coverage/mapillary_coverage_bh_meta.gpkg", driver="GPKG")

### Update the thumbnail URLs of Mapillary images

- Mapillary thumbnail URLs expire after some time (around 30 days).
- They are "signed URLs": temporary links generated by the Mapillary CDN that carry a signature.
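
Since stored URLs go stale, it helps to know when a given link stops working before deciding to request a fresh one. The thumbnail URLs in this notebook carry an `oe` query parameter (for example `oe=68BAC148`). The sketch below assumes that this value is the expiry time as a hexadecimal Unix timestamp. That reading is a widely reported convention for this CDN, not something the Mapillary documentation states, so treat it as an unverified assumption. The helper names `signed_url_expiry` and `is_expired` and the `example.com` URL are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url):
    """Return the expiry encoded in the `oe` query parameter, or None.

    ASSUMPTION: `oe` holds a hexadecimal Unix timestamp (undocumented).
    """
    values = parse_qs(urlparse(url).query).get("oe")
    if not values:
        return None
    try:
        return datetime.fromtimestamp(int(values[0], 16), tz=timezone.utc)
    except ValueError:
        return None


def is_expired(url, now=None):
    """Treat a URL as expired when its expiry is unknown or already past."""
    expiry = signed_url_expiry(url)
    if expiry is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiry


# Shortened illustrative URL reusing the `oe` value seen in this notebook
url = "https://example.com/thumb.jpg?stp=s256x144&oe=68BAC148&_nc_sid=201bca"
print(signed_url_expiry(url))
```

Under that assumption, a download step could call `is_expired` on each stored URL and request new thumbnail fields from the API only for the images that need them.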

In [6]:
from src.mapillary_metadata_enricher import (
    ler_token_mapillary,
)
import requests

# 1. Initial setup
TOKEN = ler_token_mapillary()
IMAGE_ID = '344333660458660'  # Example point

headers = {"Authorization": f"OAuth {TOKEN}"}

url = f"https://graph.mapillary.com/{IMAGE_ID}?fields=thumb_256_url,thumb_1024_url,thumb_2048_url,thumb_original_url"

response = requests.get(url, headers=headers)
if response.status_code == 200:
    data = response.json()
    print("Thumb 256:", data.get("thumb_256_url"))
    print("Thumb 1024:", data.get("thumb_1024_url"))
    print("Thumb 2048:", data.get("thumb_2048_url"))
    print("Thumb original:", data.get("thumb_original_url"))
else:
    print("Error:", response.status_code, response.text)

Thumb 256: https://scontent.fplu40-1.fna.fbcdn.net/m1/v/t6/An_Bpa4-7jhAKVi5DrFFtEPkz4zrkqTt58cwbfo6f0m8VANZFNYyInvuMo5Oc_H0nvYoXPAMcrnaNZ6osvCVo8w87uLtW4oPvFzUzr-HqRjb5dwKekpOA9bFrOXlYU9GF7FtZYR5iI23IqL-2sLZLAY?stp=s256x144&edm=ALXxkZ8EAAAA&_nc_gid=XAaXgHqgG2rsz-VOYjuRcg&_nc_oc=AdmpCaxu6J_dFc3dZqwNnOmfWbo--co5xZIyZTXFcQdL3Fy8c-TE-UlwSlhnjkgemQQdQKyIJUX2laHHFZhOmB-5&ccb=10-5&oh=00_AfXBlJsUldgocAvhuzxGOEeNJOKSl0OQZYkGNdeDGlYlzQ&oe=68BAC148&_nc_sid=201bca
Thumb 1024: https://scontent.fplu40-1.fna.fbcdn.net/m1/v/t6/An_Bpa4-7jhAKVi5DrFFtEPkz4zrkqTt58cwbfo6f0m8VANZFNYyInvuMo5Oc_H0nvYoXPAMcrnaNZ6osvCVo8w87uLtW4oPvFzUzr-HqRjb5dwKekpOA9bFrOXlYU9GF7FtZYR5iI23IqL-2sLZLAY?stp=s1024x576&edm=ALXxkZ8EAAAA&_nc_gid=XAaXgHqgG2rsz-VOYjuRcg&_nc_oc=AdmpCaxu6J_dFc3dZqwNnOmfWbo--co5xZIyZTXFcQdL3Fy8c-TE-UlwSlhnjkgemQQdQKyIJUX2laHHFZhOmB-5&ccb=10-5&oh=00_AfVKTFmrbIdujpCN-dwkhwFjnXS-jrH9Jw_fOCiTh_u4IQ&oe=68BAC148&_nc_sid=201bca
Thumb 2048: https://scontent.fplu40-1.fna.fbcdn.net/m1/v/t6/An_Bpa4-7jhAKVi5DrFFtEPk