Traditional map-making relies heavily on Geographic Information Systems (GIS), which require domain expertise and can be time-consuming, especially for repetitive tasks. Recent advances in generative AI (GenAI), particularly image diffusion models, offer new opportunities for automating and democratizing the map-making process. However, these models struggle with accurate map creation due to limited control over spatial composition and semantic layout. To address this, we integrate vector data to guide map generation in different styles, specified by textual prompts. Our model is the first to generate accurate maps in controlled styles, and we have integrated it into a web application to improve its usability and accessibility. We conducted a user study with professional cartographers to assess the fidelity of generated maps, the usability of the web application, and the implications of ever-emerging GenAI for map-making. The findings suggest the potential of our application and, more generally, of GenAI models to help both non-expert users and professionals create maps more efficiently. We also outline further technical improvements and emphasize the new role of cartographers in advancing the paradigm of AI-assisted map-making.
Step-by-step instructions on how to use vector data to control Stable Diffusion with ControlNet for accurate map tile generation.
```bash
git clone https://github.com/lllyasviel/ControlNet.git && cd ControlNet
conda env create -f environment.yml
conda activate control
```

Furthermore, read the official ControlNet tutorial. No action is required in Steps 0, 1, and 2. In Step 3, you have to decide which version of Stable Diffusion (SD) to use. In our work we employed SD 1.5. As this version belongs to the group of legacy, deprecated SD models, you need to download "v1-5-pruned.ckpt" from here instead. Alternatively, SD 2.1 ("v2-1_512-ema-pruned.ckpt") or even newer versions could be used, but we did not test this.
Afterwards, run the correct script depending on your chosen version of SD. In our case with SD 1.5, it would be:
```bash
python tool_add_control.py ./models/v1-5-pruned.ckpt ./models/control_sd15_ini.ckpt
```

A ControlNet is now attached to the chosen SD model.
As already described in the tutorial, a ControlNet dataset is a triple structure consisting of target, source, and prompt:
In our case, the workflow to create such a dataset looked as follows:
First, we collected map sheets (raster data) along with the corresponding vector data. In QGIS we symbolized the vector data (i.e., assigned each vector layer a unique color and adjusted point sizes and line thicknesses) and adapted it (i.e., masked text labels and niche features) to achieve the best possible alignment with the raster data. Afterwards, the raster data and the adjusted, aligned vector data were saved separately as large .png files, which were then cut into much smaller raster map and vector map tiles. It is important to already define the desired scale in the QGIS print layout before exporting the large .png files.
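For illustration, the tiling step can be done with a few lines of Python; the file names, output layout, and naming scheme below are assumptions, not our exact script:

```python
from pathlib import Path
from PIL import Image

TILE = 512  # tile edge length in pixels, matching the training resolution

def tile_png(big_png: str, out_dir: str, prefix: str) -> None:
    """Cut a large exported .png into non-overlapping 512 x 512 tiles."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    img = Image.open(big_png)
    w, h = img.size
    for row, top in enumerate(range(0, h - TILE + 1, TILE)):
        for col, left in enumerate(range(0, w - TILE + 1, TILE)):
            img.crop((left, top, left + TILE, top + TILE)).save(
                out / f"{prefix}_{row}_{col}.png")

# The raster and vector exports must be tiled identically so that
# each target tile stays pixel-aligned with its source tile:
tile_png("raster_map.png", "target", "tile")
tile_png("vector_map.png", "source", "tile")
```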
As a result:
- target is a folder containing raster map tiles (in .png format) of size 512 x 512 pixels.
- source is a folder containing the corresponding vector data in the form of vector map images (in .png format) of size 512 x 512 pixels. This folder thus contains the input conditioning images with which Stable Diffusion will be controlled.
- prompt is a .json file linking each image from target and source to a text prompt.
The contents of target, source, and prompt.json should look as follows:
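The target and source folders simply contain the paired .png tiles. The prompt.json follows the JSON-lines format of the official ControlNet tutorial: one entry per tile, each linking a source and a target image to a text prompt. The file names and prompt wording below are illustrative:

```json
{"source": "source/tile_0_0.png", "target": "target/tile_0_0.png", "prompt": "map tile in Swisstopo style"}
{"source": "source/tile_0_1.png", "target": "target/tile_0_1.png", "prompt": "map tile in Swisstopo style"}
```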
Note:
- This workflow is straightforward when perfectly corresponding vector data exists, which is usually the case for modern map styles. For historical map styles, where the maps have not yet been vectorized, the procedure becomes more challenging. In our work, we used the modern Swisstopo vector data even for the historical Siegfried and Old National maps, adjusting the vector layers (i.e., removing certain layers completely) to achieve the best possible alignment.
- It is important to mask all map labels in the raster data AND the vector data (see the neon blue areas, which correspond to the mask layer). If text is not masked in the training set, the generated map tiles will contain fake labels (i.e., illegible text consisting of made-up letters). For text label masking we employed keras-ocr (see the sketch after this list). Furthermore, niche classes that rarely appear should be masked as well.
- In our work, all tiles were at a scale of 1:5000. While it is possible to use smaller scales such as 1:25000, the outputs will likely be blurry, with smaller objects rendered inadequately.
- It is also possible to train multiple map styles at once.
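As a minimal sketch of the text masking mentioned above: keras-ocr detects word boxes, which are then painted over with the mask color. The mask color, the file paths, and the idea of returning the polygons so they can be reused for the vector tile are assumptions about the details, not our exact pipeline:

```python
import cv2
import numpy as np
import keras_ocr

# Hypothetical neon blue mask color (RGB); the same color must be painted at
# the same locations in BOTH the raster tile and the corresponding vector tile.
MASK_COLOR = (49, 217, 255)

pipeline = keras_ocr.pipeline.Pipeline()  # downloads detector + recognizer weights

def mask_text(raster_path: str, out_path: str) -> list:
    """Detect text in a raster tile and fill each word box with MASK_COLOR.

    Returns the detected boxes so they can be painted into the vector tile too.
    """
    image = keras_ocr.tools.read(raster_path)     # RGB uint8 array
    predictions = pipeline.recognize([image])[0]  # list of (word, 4x2 corner box)
    boxes = [box.astype(np.int32) for _word, box in predictions]
    cv2.fillPoly(image, boxes, MASK_COLOR)
    cv2.imwrite(out_path, cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
    return boxes
```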
Open loadDataset.py and adjust all three paths so that they point to the correct training data folder containing target, source, and prompt.json.
Then run the script. Optionally check if the dataset was loaded correctly using dataset_test.py as a sanity check.
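loadDataset.py itself is part of this repository; for orientation, a dataset class in the style of the official ControlNet tutorial would look roughly like the following sketch (the class name and root path are hypothetical):

```python
import json
import cv2
import numpy as np
from torch.utils.data import Dataset

class MapDataset(Dataset):
    """Loads the (target, source, prompt) triples described above."""

    def __init__(self, root: str = './training/maps/'):
        self.root = root
        with open(root + 'prompt.json', 'rt') as f:
            self.data = [json.loads(line) for line in f]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        item = self.data[idx]
        source = cv2.cvtColor(cv2.imread(self.root + item['source']), cv2.COLOR_BGR2RGB)
        target = cv2.cvtColor(cv2.imread(self.root + item['target']), cv2.COLOR_BGR2RGB)
        source = source.astype(np.float32) / 255.0           # conditioning image in [0, 1]
        target = (target.astype(np.float32) / 127.5) - 1.0   # target image in [-1, 1]
        # Key names follow the ControlNet training code (jpg/txt/hint).
        return dict(jpg=target, txt=item['prompt'], hint=source)
```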
First, replace the logger.py located in the cldm folder with the logger.py from this repository. There, check the comments and adjust the variables marked with NEED.
Open trainCN.py, adjust the settings (see our paper and the ControlNet tutorial for reasonable values) and run the script to train ControlNet. To train using Low VRAM Mode, edit the config.py file accordingly.
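For orientation, a minimal training loop in the spirit of the official tutorial_train.py could look like the sketch below; every value is a placeholder, and trainCN.py together with the adjusted logger.py should be used in practice:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from cldm.model import create_model, load_state_dict
from cldm.logger import ImageLogger

# Placeholder hyperparameters; see the paper and the ControlNet tutorial.
resume_path = './models/control_sd15_ini.ckpt'  # SD 1.5 with ControlNet attached
batch_size = 4
learning_rate = 1e-5

model = create_model('./models/cldm_v15.yaml').cpu()
model.load_state_dict(load_state_dict(resume_path, location='cpu'))
model.learning_rate = learning_rate
model.sd_locked = True          # keep the SD decoder frozen
model.only_mid_control = False

# MapDataset is the dataset class sketched above.
dataloader = DataLoader(MapDataset(), num_workers=0,
                        batch_size=batch_size, shuffle=True)
trainer = pl.Trainer(gpus=1, precision=32,
                     callbacks=[ImageLogger(batch_frequency=300)])
trainer.fit(model, dataloader)
```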
Note:
- The better the alignment between target and source, the better the resulting model.
- Keep in mind that the evaluation loop might take some time to execute each epoch; ideally, the validation set should therefore not consist of thousands of tiles. In our work we chose 100 tiles.
- Validation is done by computing the MSE between the target (ground truth) and the generated map tile (model output); see the sketch after this list. This metric is only meaningful when training with perfectly corresponding vector data!
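The metric itself is just a per-tile mean squared error; a minimal sketch (the [0, 1] normalization is our assumption):

```python
import numpy as np

def tile_mse(generated: np.ndarray, target: np.ndarray) -> float:
    """MSE between a generated tile and its ground-truth raster tile.

    Both inputs are uint8 RGB arrays of shape (512, 512, 3).
    """
    g = generated.astype(np.float32) / 255.0
    t = target.astype(np.float32) / 255.0
    return float(np.mean((g - t) ** 2))
```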
After training, adjust and run evaluateCN.py to qualitatively evaluate the model on a test set. The generated map tiles are saved as a NumPy array and can then, if needed, be stitched together.
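A sketch of the stitching step, assuming the tiles were saved row-major (the file name and grid shape are hypothetical):

```python
import numpy as np
from PIL import Image

tiles = np.load('generated_tiles.npy')  # assumed shape: (rows * cols, 512, 512, 3)
rows, cols = 4, 4                       # grid dimensions of the mapped extent

mosaic = (tiles.reshape(rows, cols, 512, 512, 3)
               .transpose(0, 2, 1, 3, 4)          # bring pixel rows together
               .reshape(rows * 512, cols * 512, 3))
Image.fromarray(mosaic.astype(np.uint8)).save('stitched_map.png')
```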
Note:
- When training the model with historical raster data without perfectly corresponding vector data, the generated map tiles can be of poor quality. One way to increase the output quality is to generate multiple versions of the same tile using different seeds (set `seed = -1` and `num_samples = 6`, or any other value larger than 1). Then, using a method of your choice, automatically select the best generated version. In our work we did this automatic selection by employing a segmentation model and also computing the standard deviation of pixel values in the background regions (to check by how much the generated background textures differ from the original ground-truth background texture); see the sketch after the figure below.
The figure below shows six versions of a generated map tile: different seeds were used, but the input vector map tile remained the same.

![six_versions]()
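As a sketch of the background-texture criterion (the segmentation-model part of our selection is omitted; `BACKGROUND` and `reference_std`, the texture statistic measured on real tiles, are stand-ins):

```python
import numpy as np

BACKGROUND = np.array([255, 255, 255])  # background class color in the vector tile

def background_std(tile: np.ndarray, vector_tile: np.ndarray) -> float:
    """Standard deviation of pixel values where the vector tile is background."""
    mask = np.all(vector_tile == BACKGROUND, axis=-1)
    return float(tile[mask].std())

def select_best(candidates: list, vector_tile: np.ndarray, reference_std: float):
    """Pick the candidate whose background texture is closest to the reference."""
    return min(candidates,
               key=lambda t: abs(background_std(t, vector_tile) - reference_std))
```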
Our four ControlNet models can be downloaded here.
- Swisstopo.ckpt: specialized model for the Swisstopo style
- OldNational.ckpt: specialized model for the Old National style
- Siegfried.ckpt: specialized model for the Siegfried style
- Combined.ckpt: combined model, capable of generating map tiles in all three styles and used in our web app

The table below lists the class labels and RGB color codes used to symbolize the vector data, along with the styles in which each class appears:
| Class label | RGB color code | Swisstopo | Old National | Siegfried |
|---|---|---|---|---|
| Background | (255, 255, 255) | ✓ | ✓ | ✓ |
| Building | (82, 82, 82) | ✓ | ✓ | ✓ |
| Coordinate grid | (237, 240, 64) | ✓ | ✓ | ✓ |
| Railway (single track) | (219, 30, 42) | ✓ | ✓ | ✓ |
| Railway (multi track) | (144, 20, 28) | ✓ | ✓ | ✓ |
| Railway bridge | (226, 132, 115) | ✓ | | |
| Highway | (247, 128, 30) | ✓ | ✓ | |
| Highway gallery | (231, 119, 28) | ✓ | ✓ | ✓ |
| Road | (149, 74, 162) | ✓ | ✓ | ✓ |
| Through road | (255, 103, 227) | ✓ | | |
| Connecting road | (128, 135, 37) | ✓ | | |
| Path | (0, 0, 0) | ✓ | ✓ | ✓ |
| Depth contour | (63, 96, 132) | ✓ | ✓ | |
| River | (41, 163, 215) | ✓ | ✓ | ✓ |
| Lake | (55, 126, 184) | ✓ | ✓ | ✓ |
| Stream | (89, 180, 208) | ✓ | ✓ | ✓ |
| Tree | (63, 131, 55) | ✓ | | |
| Contour line | (164, 113, 88) | ✓ | ✓ | |
| Forest | (77, 175, 74) | ✓ | ✓ | ✓ |
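For convenience, the same palette translated into code, with values taken from the table above (the key names are hypothetical):

```python
# RGB color codes for symbolizing the vector layers, from the table above.
CLASS_COLORS = {
    'background':           (255, 255, 255),
    'building':             (82, 82, 82),
    'coordinate_grid':      (237, 240, 64),
    'railway_single_track': (219, 30, 42),
    'railway_multi_track':  (144, 20, 28),
    'railway_bridge':       (226, 132, 115),
    'highway':              (247, 128, 30),
    'highway_gallery':      (231, 119, 28),
    'road':                 (149, 74, 162),
    'through_road':         (255, 103, 227),
    'connecting_road':      (128, 135, 37),
    'path':                 (0, 0, 0),
    'depth_contour':        (63, 96, 132),
    'river':                (41, 163, 215),
    'lake':                 (55, 126, 184),
    'stream':               (89, 180, 208),
    'tree':                 (63, 131, 55),
    'contour_line':         (164, 113, 88),
    'forest':               (77, 175, 74),
}
```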
Run webapp.py with Combined.ckpt as the underlying model.
