PMTiles Optimization

In [None]:
How Does PMTiles Help with Optimization?
PMTiles is a single-file storage format for vector and raster map tiles, designed for efficient storage and transmission of geospatial data. Here’s how it optimizes performance:

🔹 1. One File Instead of Thousands
📌 Problem: Standard tile storage consists of hundreds of thousands of small files, which are slow to upload, sync, and manage.

✅ PMTiles: Stores all tiles in one file. Clients still request each tile they need, but every request is an HTTP range request against that same file, so there is only one object to host and cache.

🔹 2. Better Performance on S3/CDN
📌 Problem: Serving millions of small tile files from Amazon S3 or a CDN is slow and expensive, especially for large maps.

✅ PMTiles: Uses range requests to fetch only the needed parts of the file, reducing costs and increasing speed.
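
A range request is an ordinary HTTP GET that carries a `Range` header. The sketch below is illustrative only: `range_header` and `read_range` are hypothetical helpers, and an in-memory buffer with made-up bytes stands in for the remote archive so the example runs without a network.

```python
import io


def range_header(offset: int, length: int) -> dict:
    """Build the HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_range(archive: io.BufferedIOBase, offset: int, length: int) -> bytes:
    """Simulate the server side of a range request using a local file object."""
    archive.seek(offset)
    return archive.read(length)


# Stand-in for a large archive: 1000 bytes of made-up data.
archive = io.BytesIO(bytes(range(250)) * 4)

headers = range_header(offset=500, length=4)
chunk = read_range(archive, offset=500, length=4)
print(headers, len(chunk))
```

Only the four requested bytes are read; the rest of the archive is never touched, which is the property that keeps transfer small for a large tile file.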

🔹 3. Efficient Compression
📌 Problem: Tile containers such as MBTiles (.mbtiles) are SQLite databases, which are not designed to be read directly over HTTP.

✅ PMTiles: Can be compressed (gzip, Brotli), reducing bandwidth usage and speeding up downloads.
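
To give a feel for how much repetitive geospatial attributes can shrink, here is a small sketch using Python's standard `gzip` module. The payload is made-up JSON chosen for readability; real vector tiles are binary, so actual ratios will differ.

```python
import gzip
import json

# Made-up, repetitive payload standing in for tile attributes.
features = [
    {"layer": "roads", "source": "OpenStreetMap", "id": i} for i in range(500)
]
raw = json.dumps(features).encode("utf-8")

compressed = gzip.compress(raw)
restored = gzip.decompress(compressed)

print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes")
```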

🔹 4. No Database Required
📌 Problem: MBTiles is a SQLite database, so reading tiles requires a SQLite library and usually a tile server in front of it.

✅ PMTiles: Works as a static file, eliminating the need for a database. It can be hosted on S3, GitHub Pages, Cloudflare R2, etc.

🔹 5. Streaming & Preloading
✅ Loads only the needed tiles, instead of downloading the entire dataset.
✅ Supports browser caching, reducing load times.

Vector tiles are a format used for storing and rendering geospatial data on maps. Instead of storing pre-rendered images (like raster tiles), vector tiles contain raw geometric data (points, lines, polygons) that can be dynamically styled.

🟢 Map is divided into tiles – The map is split into smaller squares at different zoom levels (e.g., Zoom 0 = entire Earth, Zoom 10 = city streets).
🟢 Stores geometry, not images – Each tile contains geographical features (e.g., roads, buildings) in vector format.
🟢 Flexible & dynamic rendering – The client (e.g., browser) can customize the map in real-time, changing colors, styles, or labels.
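
The tiling scheme can be sketched in a few lines. This assumes the common Web Mercator XYZ scheme, in which each zoom level splits every tile into four; `lonlat_to_tile` and `tiles_at_zoom` are illustrative helper names, not part of any library used in this notebook.

```python
import math


def tiles_at_zoom(zoom: int) -> int:
    """Number of tiles covering the world at a zoom level (2^z by 2^z grid)."""
    return 4 ** zoom


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple:
    """Return the (x, y) index of the XYZ tile containing a lon/lat point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


print(tiles_at_zoom(0), tiles_at_zoom(10))
print(lonlat_to_tile(-122.05, 47.53, 12))
```

At zoom 0 a single tile covers the whole Earth; by zoom 10 the grid already has more than a million tiles, which is why clients fetch only the tiles in view.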

✅ Dynamic styling – Easily change colors, labels, or visibility of roads, buildings, and POIs.
✅ Less data transfer – Compressed vector data is downloaded only when needed, reducing bandwidth usage.
✅ High-resolution rendering – Maps look crisp at any zoom level, without pixelation or blurriness.

PMTiles is a format for storing and distributing map tiles, including vector tiles, in a single file. It allows you to store and stream map tiles without the need for a tile server.
How is PMTiles different from serving individual vector tiles (e.g. MVT files)?
PMTiles is a single file, not thousands of separate tile files. The tiles inside it can still be MVT-encoded.
Works without a tile server: instead of a classic architecture with a tile server (e.g. TileServer GL), you can host the PMTiles file on AWS S3, Cloudflare R2 or another object store that supports HTTP range requests.
Fast and optimized downloads: the client downloads only the necessary fragments of the file, instead of downloading the entire map.
Well suited to web maps: compatible with MapLibre, Leaflet, Deck.gl, Kepler.gl and other map libraries.
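
Because the archive is a plain file, a client can identify it by reading its first few bytes. The sketch below assumes the PMTiles version 3 header layout (a 7-byte `PMTiles` magic string followed by a one-byte version number inside a 127-byte header); verify this against the specification before relying on it. `parse_magic` is a hypothetical helper and `fake_header` is synthetic data, not a real archive.

```python
HEADER_LENGTH = 127  # assumed fixed header size for PMTiles v3


def parse_magic(first_bytes: bytes) -> int:
    """Check the magic string and return the spec version byte."""
    if len(first_bytes) < 8:
        raise ValueError("need at least 8 bytes to read magic and version")
    if first_bytes[:7] != b"PMTiles":
        raise ValueError("not a PMTiles archive")
    return first_bytes[7]


# Synthetic header: magic, version 3, then zero padding.
fake_header = b"PMTiles" + bytes([3]) + bytes(HEADER_LENGTH - 8)
version = parse_magic(fake_header)
print(version)
```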

1. Create PMTiles vector tiles for rendering maps

In [1]:
from sedona.spark import *

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)


Create a Spatial DataFrame with a geometry column and a layer column. The geometry column contains the features to render in the map. The layer column is a string that names the group each feature belongs to. Records within the same layer can be styled together, independently of other layers. In this example, features that represent buildings are in the buildings layer and those that represent roads are in the roads layer.
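
To illustrate what styling layers independently means on the client side, the sketch below builds layer definitions in the shape used by MapLibre-style style documents, where `source-layer` selects a layer inside the vector tiles. The source id `tiles`, the layer ids, and the colors are made up for illustration and are not produced by this notebook.

```python
def style_layers(source_id: str) -> list:
    """Return one style entry per tile layer, each with its own paint rules."""
    return [
        {
            "id": "buildings-fill",
            "type": "fill",  # polygons are drawn as filled areas
            "source": source_id,
            "source-layer": "buildings",
            "paint": {"fill-color": "#d9b38c"},
        },
        {
            "id": "roads-line",
            "type": "line",  # linestrings are drawn as strokes
            "source": source_id,
            "source-layer": "roads",
            "paint": {"line-color": "#555555", "line-width": 1},
        },
    ]


layers = style_layers("tiles")
print([layer["source-layer"] for layer in layers])
```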

In [2]:
from sedona.sql.st_constructors import ST_GeomFromText
from sedona.sql.st_predicates import ST_Intersects
import pyspark.sql.functions as f

In [3]:
# Set to False to generate tiles for the entire dataset, True to generate only for region_wkt area
filter = True
region_wkt = "POLYGON ((-122.097931 47.538528, -122.048836 47.566566, -121.981888 47.510012, -122.057076 47.506302, -122.097931 47.538528))"
filter_expression = ST_Intersects(f.col("geometry"), ST_GeomFromText(f.lit(region_wkt)))
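
Before filtering two very large tables, it can help to sanity-check the region. The following sketch computes the bounding box of the WKT polygon in plain Python; `wkt_bbox` is a hypothetical helper that only handles simple coordinate lists like the one used here.

```python
import re

region_wkt = (
    "POLYGON ((-122.097931 47.538528, -122.048836 47.566566, "
    "-121.981888 47.510012, -122.057076 47.506302, -122.097931 47.538528))"
)


def wkt_bbox(wkt: str) -> tuple:
    """Return (min_lon, min_lat, max_lon, max_lat) for a simple WKT geometry."""
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    if not pairs:
        raise ValueError("no coordinates found in WKT")
    lons = [float(lon) for lon, _ in pairs]
    lats = [float(lat) for _, lat in pairs]
    return min(lons), min(lats), max(lons), max(lats)


bbox = wkt_bbox(region_wkt)
print(bbox)
```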

In [4]:
import pyspark.sql.functions as f

buildings_df = (
    sedona.table("wherobots_open_data.overture_2024_02_15.buildings_building")
    .select(
        f.col("geometry"),
        f.lit("buildings").alias("layer"),
        f.element_at(f.col("sources"), 1).dataset.alias("source")
    )
)

buildings_df.show()

+--------------------+---------+--------------------+
|            geometry|    layer|              source|
+--------------------+---------+--------------------+
|POLYGON ((-49.438...|buildings|Microsoft ML Buil...|
|POLYGON ((-49.438...|buildings|Google Open Build...|
|POLYGON ((-49.438...|buildings|Microsoft ML Buil...|
|POLYGON ((-49.438...|buildings|Google Open Build...|
|POLYGON ((-49.441...|buildings|Google Open Build...|
|POLYGON ((-49.440...|buildings|Google Open Build...|
|POLYGON ((-49.441...|buildings|Microsoft ML Buil...|
|POLYGON ((-49.441...|buildings|Google Open Build...|
|POLYGON ((-49.440...|buildings|Microsoft ML Buil...|
|POLYGON ((-49.442...|buildings|Google Open Build...|
|POLYGON ((-49.442...|buildings|Google Open Build...|
|POLYGON ((-49.442...|buildings|Google Open Build...|
|POLYGON ((-49.442...|buildings|Google Open Build...|
|POLYGON ((-49.442...|buildings|Microsoft ML Buil...|
|POLYGON ((-49.442...|buildings|Google Open Build...|
... (output truncated)

In [7]:
num_rows = buildings_df.count()
print(f"Number of rows in buildings_df: {num_rows}")

Number of rows in buildings_df: 2350057850


In [6]:
roads_df = (
    sedona.table("wherobots_open_data.overture_2024_02_15.transportation_segment")
    .select(
        f.col("geometry"),
        f.lit("roads").alias("layer"),
        f.element_at(f.col("sources"), 1).dataset.alias("source")
    )
)

roads_df.show()

+--------------------+-----+-------------+
|            geometry|layer|       source|
+--------------------+-----+-------------+
|LINESTRING (7.034...|roads|OpenStreetMap|
|LINESTRING (7.037...|roads|OpenStreetMap|
|LINESTRING (7.032...|roads|OpenStreetMap|
|LINESTRING (7.033...|roads|OpenStreetMap|
|LINESTRING (7.031...|roads|OpenStreetMap|
|LINESTRING (7.031...|roads|OpenStreetMap|
|LINESTRING (7.031...|roads|OpenStreetMap|
|LINESTRING (7.033...|roads|OpenStreetMap|
|LINESTRING (7.034...|roads|OpenStreetMap|
|LINESTRING (7.030...|roads|OpenStreetMap|
|LINESTRING (7.037...|roads|OpenStreetMap|
|LINESTRING (7.037...|roads|OpenStreetMap|
|LINESTRING (7.041...|roads|OpenStreetMap|
|LINESTRING (7.051...|roads|OpenStreetMap|
|LINESTRING (7.037...|roads|OpenStreetMap|
|LINESTRING (7.050...|roads|OpenStreetMap|
|LINESTRING (7.054...|roads|OpenStreetMap|
|LINESTRING (7.051...|roads|OpenStreetMap|
|LINESTRING (7.052...|roads|OpenStreetMap|
|LINESTRING (7.052...|roads|OpenStreetMap|
+--------------------+-----+-------------+

In [8]:
num_rows = roads_df.count()
print(f"Number of rows in roads_df: {num_rows}")

Number of rows in roads_df: 303347527


Prepare a single spatial DataFrame that combines the roads and buildings features.

In [9]:
features_df = roads_df.union(buildings_df)

if filter:
    # Reuse the filter expression defined earlier instead of rebuilding it.
    features_df = features_df.filter(filter_expression)

features_df.count()

                                                                                

8718

In [10]:
SedonaKepler.create_map(features_df, "buildings and roads")

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


                                                                                

KeplerGl(data={'buildings and roads': {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…

Create Tiles as a PMTiles Archive
Use the vtiles.generate_pmtiles method to create a PMTiles archive. PMTiles is a performant, simple, and optimized format for storing vector tiles.

For more control, a GenerationConfig object can optionally be provided as an argument to control which tiles are created and their contents. A PMTilesConfig object can optionally be provided to control the header information of the PMTiles Archive.

In [11]:
from wherobots import vtiles
import os

full_tiles_path = os.getenv("USER_S3_PATH") + "tiles.pmtiles"
vtiles.generate_pmtiles(features_df, full_tiles_path)
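
The path above is built by string concatenation, which fails if `USER_S3_PATH` is unset and yields a wrong key if the value lacks a trailing slash. A slightly more defensive variant could look like the sketch below; `tiles_uri` is a hypothetical helper and the bucket name is made up.

```python
from typing import Optional


def tiles_uri(base: Optional[str], filename: str) -> str:
    """Join a base path and a file name with exactly one slash between them."""
    if not base:
        raise ValueError("base path is not set (is USER_S3_PATH defined?)")
    return base.rstrip("/") + "/" + filename


# Made-up base paths, with and without a trailing slash.
with_slash = tiles_uri("s3://example-bucket/data/", "tiles.pmtiles")
without_slash = tiles_uri("s3://example-bucket/data", "tiles.pmtiles")
print(with_slash)
```

In the notebook this would be called as `tiles_uri(os.getenv("USER_S3_PATH"), "tiles.pmtiles")`.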


                                                                                

Visualize the vector tiles with leafmap. The vtiles.show_pmtiles function creates a signed URL, styles the tiles, and returns a Leafmap object.

In [12]:
vtiles.show_pmtiles(full_tiles_path)

For a quicker preview, vtiles.generate_quick_pmtiles generates tiles with a preset GenerationConfig. It saves time by limiting the features processed to 100 million and generating fewer zoom levels at a higher resolution. At high zooms, the low precision that results from the low maximum zoom may be evident. The Scala/Java API exposes the getQuickConfig method, which provides the same GenerationConfig; it can be passed to the vtiles.generate or vtiles.generatePMTiles methods for the same tile generation behavior.
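
Why a low maximum zoom costs precision can be estimated with a little arithmetic. The sketch below assumes Web Mercator tiles and a tile coordinate grid of 4096 units per side (a common vector tile default, not a value confirmed for this tool); `tile_unit_size_m` is an illustrative helper.

```python
import math

EARTH_CIRCUMFERENCE_M = 40_075_016.686  # equatorial circumference in meters


def tile_unit_size_m(zoom: int, lat_deg: float = 0.0, extent: int = 4096) -> float:
    """Approximate ground size of one tile coordinate unit, in meters."""
    tile_width_m = (
        EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / (2 ** zoom)
    )
    return tile_width_m / extent


# Each extra zoom level halves the size of one coordinate unit.
for zoom in (6, 10, 14):
    print(zoom, round(tile_unit_size_m(zoom, lat_deg=47.5), 3))
```

If tiles stop at a low maximum zoom, a client that zooms in further keeps stretching the same coordinates, so the coarser unit size at that last zoom level becomes visible as blocky geometry.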

In [13]:
sample_tiles_path = os.getenv("USER_S3_PATH") + "sampleTiles.pmtiles"
vtiles.generate_quick_pmtiles(features_df, sample_tiles_path)
