# Map Image Database and Geo-queries




# Preparing the map database

We have access to a number of map image datasets from the National Library of Scotland. Each dataset is accompanied by a metadata description containing properties that we would like to search. To support searches by geolocation and other features, the map metadata was ingested into the **map image database**.

The following map datasets are available:

* 01 Town Plans London
* 02 Town Plans England 1840s 1890s
* 03 Lancs Dorset 1900s 1940s
* 04 Scotland 1840s 1880s
* 05 Lancs Dorset 1880s 1890s
* 06 Scotland OS 1 inch 1850s 1900s
* 07 Scotland 1 inch 1890s 1900s



## Map metadata
The map images are available as blobs in Azure. For each dataset there is a metadata file (shapefile) with text properties and geospatial properties of the maps.

For example, these are the metadata fields for one of the maps:

Key | Value 
---|---
COUNTY| Fife
DATES | Surveyed: 1854<br>Published: 1855
IMAGE | 74426819
IMAGETHUMB | https://deriv.nls.uk/dcn4/7442/74426819.4.jpg
IMAGEURL | https://maps.nls.uk/view/74426819
SHEET | Fife, Sheet 2
SHEET_MAP | 2
SHEET_NO | 002

There is also a geospatial descriptor of the map's bounding box: a polygon (in this case a rectangle) given as latitude/longitude coordinates. For example:

```
        "location": {
            "type": "Polygon",
            "coordinates": [
                [
                    [ 56.40466580157687, -3.038775818686538 ],
                    [ 56.462475957491904, -3.038386446430906 ],
                    [ 56.46205321830267, -2.881755037222251 ],
                    [ 56.40424397748634, -2.882381897770028 ],
                    [ 56.40466580157687, -3.038775818686538 ]
                ]
            ]
        }
```
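(If you enter one of these coordinate pairs into Google Maps, it will show a location in Scotland.) Note that the pairs above are written latitude first, whereas GeoJSON itself uses longitude/latitude order.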
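
To make the descriptor concrete, the following minimal sketch reads the polygon above as a plain Python dictionary and computes the extent of its bounding box. The helper name `bounding_box` is an illustration and not part of the original notebook, and it assumes the latitude-first order used in the example.

```python
# The descriptor from the example above, as a plain Python dictionary.
location = {
    "type": "Polygon",
    "coordinates": [
        [
            [56.40466580157687, -3.038775818686538],
            [56.462475957491904, -3.038386446430906],
            [56.46205321830267, -2.881755037222251],
            [56.40424397748634, -2.882381897770028],
            [56.40466580157687, -3.038775818686538],
        ]
    ],
}


def bounding_box(polygon):
    """Return (min_lat, min_lon, max_lat, max_lon) of the outer ring.

    Hypothetical helper; assumes each vertex is [latitude, longitude].
    """
    ring = polygon["coordinates"][0]
    lats = [lat for lat, lon in ring]
    lons = [lon for lat, lon in ring]
    return min(lats), min(lons), max(lats), max(lons)


ring = location["coordinates"][0]
# A polygon ring is closed: the first and last vertices are identical.
print(ring[0] == ring[-1])
print(bounding_box(location))
```

The extent spans roughly 0.06 degrees of latitude and 0.16 degrees of longitude, which matches a single map sheet rather than a whole county.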

## Creating the map image database
In order to search the metadata for all the maps and select certain areas or time properties, the map image metadata needs to be indexed. 

Elasticsearch is a good match for this task as it supports geospatial data and comes with a visual interface, Kibana, that includes a map view.

To ingest the metadata, it is extracted from the shapefiles and converted into JSON documents that can be inserted into Elasticsearch.
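
The notebook does not show the ingestion code itself. As a rough sketch of the last step only, and assuming the shapefile records have already been read into Python dictionaries, the documents could be serialised for Elasticsearch's `_bulk` endpoint as newline-delimited JSON. The function name `to_bulk_body` and the sample record are illustrative.

```python
import json


def to_bulk_body(index_name, documents):
    """Serialise documents as newline-delimited JSON for the `_bulk` endpoint.

    Hypothetical helper: each document is preceded by an action line that
    names the target index, and the body ends with a newline.
    """
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# One record with a few of the metadata fields shown earlier.
records = [
    {"COUNTY": "Fife", "IMAGE": "74426819", "SHEET": "Fife, Sheet 2"},
]
body = to_bulk_body("04_os_six_inch_scotland_1840s_1880s", records)
print(body)
```

Such a body would typically be sent with `requests.post` to `<search_url>/_bulk` using the `Content-Type: application/x-ndjson` header; the index mapping (for example a `geo_shape` type for `location`) would need to exist beforehand.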

For convenience, some of the metadata properties are preprocessed to create new properties. For example, the date range is extracted from the DATES property. The original DATES value

```
Surveyed: 1852-3<br>Published: 1854
```

yields the following properties:

Key | Value 
---|---
Surveyed_start | 1852
Surveyed_end | 1853
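
The extraction described above can be sketched with a regular expression. This is an illustration under stated assumptions, not the preprocessing code that was used to build the database: the helper name `parse_year_range` is hypothetical, and it assumes that an abbreviated end year such as `1852-3` borrows its leading digits from the start year.

```python
import re


def parse_year_range(dates, label):
    """Extract (start, end) years for `label` from a DATES string.

    Hypothetical helper: handles 'Surveyed: 1854', 'Surveyed: 1860-1862'
    and the abbreviated form 'Surveyed: 1852-3'. Returns None if the
    label is missing.
    """
    match = re.search(re.escape(label) + r":\s*(\d{4})(?:-(\d{1,4}))?", dates)
    if match is None:
        return None
    start = match.group(1)
    end = match.group(2) or start
    # An abbreviated end year borrows its leading digits from the start year.
    end = start[: len(start) - len(end)] + end
    return int(start), int(end)


dates = "Surveyed: 1852-3<br>Published: 1854"
print(parse_year_range(dates, "Surveyed"))   # (1852, 1853)
print(parse_year_range(dates, "Published"))  # (1854, 1854)
```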



# Accessing the map database
There are two options:

* The maps can be inspected visually with Kibana, the user interface to Elasticsearch.
* We can run a query to search for maps that intersect certain areas.



## Visual interface Kibana

The map image database is available here:

http://51.140.84.209:5601/app/maps

Click on any of the maps to visually inspect the coverage of the map images. If you hover the mouse over any of the highlighted areas, a selection of map properties will be displayed.

## Searching the database from a notebook

First we need to install one dependency: pyproj, the Python interface to PROJ, a library for cartographic projections and coordinate transformations (see http://pyproj4.github.io/pyproj/stable/).

In [0]:
!pip install pyproj

Collecting pyproj
  Downloading https://files.pythonhosted.org/packages/16/59/43869adef45ce4f1cf7d5c3aef1ea5d65d449050abdda5de7a2465c5729d/pyproj-2.2.1-cp36-cp36m-manylinux1_x86_64.whl (11.2MB)
Installing collected packages: pyproj
Successfully installed pyproj-2.2.1


Import the libraries used in the code below.

In [0]:
from functools import partial
import json
import pandas as pd
import requests

import shapely
from shapely.geometry import Point
from shapely.ops import transform
import pyproj

We define the coordinates of specific places we may want to search for. Feel free to add your own: for example, you can look up the latitude and longitude on the wiki page of a town or place. (Maps might not be available for every place, but we can find that out when we search the database.)

In [0]:
coordinates = {
    'Blackburn': Point(53.7449, -2.4769),
    'Burnley': Point(53.789, -2.248),
    'Barrow': Point(54.1108, -3.2261),
    'Salford': Point(53.483, -2.2931),
    'Dundee': Point(56.462, -2.9707),
    'Govan': Point(55.8615, -4.3083),
    'Weymouth': Point(50.613, -2.457),
    'Ashton-under-Lyne': Point(53.489708, -2.095241),
    'Dorchester': Point(50.7154, -2.4367),
    'Poole': Point(50.716667, -1.983333),
    'Manchester': Point(53.479444, -2.245278)
}

Now we can define the URL of the database we're going to search and have a look at the indices that are available. In Elasticsearch, an index is a collection of documents; in our case these are the metadata entries of our map images.

In [0]:
search_url = 'http://51.140.84.209:9200'

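# list all indices and key them by their two-digit dataset prefix
response = requests.get('{}/_cat/indices?format=json'.format(search_url))
indices = {}
for i in response.json():
    index_name = i['index']
    # names starting with '.' are internal indices, so skip them
    if not index_name.startswith('.'):
        indices[index_name[:2]] = index_name
indices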

{'01': '01_os_town_plans_london_1890s',
 '02': '02_os_town_plans_england_1840s_1890s',
 '03': '03_os_25_inch_lancs_dorset_1900s_1940s',
 '04': '04_os_six_inch_scotland_1840s_1880s',
 '05': '05_os_six_inch_lancs_dorset_1880s_1890s',
 '06': '06_os_one_inch_1850s_1900s_scotland_lancs_dorset_london',
 '07': '07_os_one_inch_1890s_1900s_scotland_lancs_dorset_london'}

## Helper functions
We create a few functions that will be useful later.

This function creates a circular area (in fact a polygon that approximates a circle) around the given centre, with the given radius in metres. The centre is provided as a `Point` holding a latitude/longitude pair.

In [0]:
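def create_circle(centre, radius):
    # WGS 84 (degrees) and a metric projection (EPSG:32633 is UTM zone 33N)
    proj_4326 = pyproj.Proj(init='EPSG:4326')
    proj_32633 = pyproj.Proj(init='EPSG:32633')
    transform_to_32633 = partial(pyproj.transform, proj_4326, proj_32633)
    transform_to_4326 = partial(pyproj.transform, proj_32633, proj_4326)
    # project the centre, buffer it by `radius` metres (a resolution of 3
    # gives a 12-sided polygon), then project the polygon back to degrees
    proj_circle = transform(transform_to_32633, centre).buffer(radius, 3)
    circle = transform(transform_to_4326, proj_circle)
    return circle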
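
The helper above projects the point with EPSG:32633 (UTM zone 33N, centred on 15°E), a zone that does not cover Great Britain. Note also that shapely treats the first value of a `Point` as x, and the `init=` form of `pyproj.Proj` takes x as longitude, while the points above were created latitude first. Both choices may distort the buffered area, so it should be read as an approximation of a circle of the given radius. As a cross-check that needs no projection library, here is a minimal sketch that places polygon vertices at a fixed great-circle distance from the centre on a spherical Earth. The names `circle_vertices` and `haversine_m` are hypothetical, and the spherical model is itself an approximation.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; a spherical approximation


def circle_vertices(lat, lon, radius_m, segments=12):
    """Return a closed ring of (lat, lon) vertices `radius_m` from the centre.

    Hypothetical helper using the spherical destination-point formula.
    """
    lat1, lon1 = math.radians(lat), math.radians(lon)
    d = radius_m / EARTH_RADIUS_M  # angular distance in radians
    ring = []
    for i in range(segments):
        bearing = 2 * math.pi * i / segments
        lat2 = math.asin(math.sin(lat1) * math.cos(d)
                         + math.cos(lat1) * math.sin(d) * math.cos(bearing))
        lon2 = lon1 + math.atan2(
            math.sin(bearing) * math.sin(d) * math.cos(lat1),
            math.cos(d) - math.sin(lat1) * math.sin(lat2))
        ring.append((math.degrees(lat2), math.degrees(lon2)))
    ring.append(ring[0])  # close the ring
    return ring


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


ring = circle_vertices(56.462, -2.9707, 10000)  # 10 km around Dundee
print(len(ring))
# Largest deviation of any vertex from the requested radius, in metres.
print(max(abs(haversine_m(56.462, -2.9707, la, lo) - 10000) for la, lo in ring))
```

Because the vertices lie on the circle, the polygon is inscribed in it and covers slightly less area than a true circle; increasing `segments` narrows the gap.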

The next function formats an Elasticsearch geo-query that searches for all the maps that intersect a given geometric shape. (The geometric shape will be the circular area that we created above.)

In [0]:
def create_geoquery(geojson):
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "geo_shape": {
                            "location": {
                                "shape": geojson,
                                "relation": "INTERSECTS"
                            }
                        }
                    }
                ],
                "filter": [
                    {
                        "match_all": {}
                    }
                ],
                "should": [],
                "must_not": []
            }
        }
    }
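
The `filter` clause in the query above matches everything, which leaves room for narrowing a search by date. The following sketch shows one way that could look. It is a hypothetical variant, not part of the original notebook, and it assumes that properties such as `Surveyed_start` are indexed as numbers so that a `range` filter applies to them.

```python
import json


def create_geoquery_with_years(geojson, field, first_year, last_year):
    """Geo-query restricted to maps whose `field` lies in a year range.

    Hypothetical variant of `create_geoquery`; `field` is assumed to be a
    numeric property such as 'Surveyed_start' or 'Published_start'.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"geo_shape": {"location": {"shape": geojson,
                                                "relation": "INTERSECTS"}}}
                ],
                "filter": [
                    {"range": {field: {"gte": first_year, "lte": last_year}}}
                ],
            }
        }
    }


# A small triangle in GeoJSON [longitude, latitude] order, for illustration only.
triangle = {
    "type": "Polygon",
    "coordinates": [[[-3.0, 56.4], [-2.9, 56.4], [-2.95, 56.5], [-3.0, 56.4]]],
}
query = create_geoquery_with_years(triangle, "Surveyed_start", 1850, 1860)
print(json.dumps(query, indent=2))
```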

The last helper function runs a geo-query that searches for all the maps that intersect a given area and returns the results as a JSON object. Finally we put it all together to search an area around a centre point.


In [0]:
def find_maps(es_index, area):
    geojson = shapely.geometry.mapping(area)
    # our points are (latitude, longitude) but GeoJSON expects
    # [longitude, latitude], so swap each vertex of the outer ring
    flipped = [[coord[1], coord[0]] for coord in geojson['coordinates'][0]]
    geojson['coordinates'] = [flipped]
    geoquery = create_geoquery(geojson)
    response = requests.post('{}/{}/_search'.format(search_url, es_index), json=geoquery)
    # fail early on HTTP errors instead of on a missing 'hits' key
    response.raise_for_status()
    hits = response.json()['hits']
    return hits

def search_area(es_index, centre, radius):
    # radius is in metres; centre is one of the Points defined above
    area = create_circle(centre, radius)
    return find_maps(es_index, area)
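
The coordinate flip inside `find_maps` can be tried out on its own. The sketch below uses a hypothetical helper, `flip_coordinates`, that differs from the inline version in two ways: it handles every ring of the polygon and it leaves its input untouched.

```python
def flip_coordinates(geojson):
    """Return a copy of a GeoJSON Polygon with each [a, b] vertex as [b, a].

    Hypothetical helper; unlike the inline flip in `find_maps` it handles
    every ring of the polygon and does not modify the input.
    """
    flipped_rings = [
        [[coord[1], coord[0]] for coord in ring]
        for ring in geojson["coordinates"]
    ]
    return {"type": geojson["type"], "coordinates": flipped_rings}


# The bounding box shown earlier, in latitude/longitude order.
lat_lon = {
    "type": "Polygon",
    "coordinates": [[
        [56.40466580157687, -3.038775818686538],
        [56.462475957491904, -3.038386446430906],
        [56.46205321830267, -2.881755037222251],
        [56.40424397748634, -2.882381897770028],
        [56.40466580157687, -3.038775818686538],
    ]],
}
lon_lat = flip_coordinates(lat_lon)
print(lon_lat["coordinates"][0][0])
```

Flipping twice returns the original polygon, which makes the helper easy to check.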

The result of the search is a JSON object that includes the following:
 * total : the total number of documents that matched the search (the count itself is in *total.value*)
 * hits : the list of matching documents (by default Elasticsearch returns at most 10 per request) and for each of them:
     * *_index* : the index the document belongs to
     * *_source* : the actual document
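
To illustrate this structure without contacting the server, here is a small hand-written stand-in for the value returned by `find_maps`. Only the keys used in this notebook are included, the total is cut down to two documents, and the type of the `IMAGE` value (string rather than number) is an assumption. The helper name `image_ids` is hypothetical.

```python
# A hand-written stand-in for a search result; not real server output.
sample_hits = {
    "total": {"value": 2, "relation": "eq"},
    "hits": [
        {"_index": "04_os_six_inch_scotland_1840s_1880s",
         "_source": {"IMAGE": "74426819", "COUNTY": "Fife"}},
        {"_index": "04_os_six_inch_scotland_1840s_1880s",
         "_source": {"IMAGE": "74426823", "COUNTY": "Fife"}},
    ],
}


def image_ids(hits):
    """Return the IMAGE property of every document in a search result."""
    return [hit["_source"]["IMAGE"] for hit in hits["hits"]]


print(sample_hits["total"]["value"])
print(image_ids(sample_hits))
```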

Finally, here's another helper function that creates a dataframe from the results of a query.

In [0]:
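def create_dataframe(hits):
    # build a one-row frame from each hit's '_source' document and stack them;
    # every row keeps the index label '_source'
    return pd.concat(map(lambda h: pd.DataFrame.from_dict(h)['_source'].to_frame().T, hits['hits']))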

## Searching for maps

Using Dundee as an example, we're going to search the dataset of six-inch OS maps of Scotland.

The query searches for maps that cover places within a 10 km radius of Dundee (or rather of the geo-point that we defined above as the centre of Dundee).



In [0]:
hits = search_area('04_os_six_inch_scotland_1840s_1880s', centre=coordinates['Dundee'], radius=10000)
print('Hits: {}'.format(hits['total']['value']))

Hits: 9


In this case we found 9 maps. Let's create a dataframe from the metadata that was returned by the search so we can inspect it. The IMAGE property contains the image ID that matches the filename in the sample dataset.

In [0]:
df = create_dataframe(hits)
df

| | COUNTY | DATES | IMAGE | IMAGETHUMB | IMAGEURL | Published | Published_end | Published_start | SHEET | SHEET_MAP | SHEET_NO | Surveyed | Surveyed_end | Surveyed_start | location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| _source | Fife | Surveyed: 1854<br>Published: 1855 | 74426819 | https://deriv.nls.uk/dcn4/7442/74426819.4.jpg | https://maps.nls.uk/view/74426819 | {'gte': 1855, 'lte': 1855} | 1855 | 1855 | Fife, Sheet 2 | 2 | 2 | {'gte': 1854, 'lte': 1854} | 1854 | 1854 | {'type': 'Polygon', 'coordinates': [[[-3.03877... |
| _source | Fife | Surveyed: 1854<br>Published: 1855 | 74426823 | https://deriv.nls.uk/dcn4/7442/74426823.4.jpg | https://maps.nls.uk/view/74426823 | {'gte': 1855, 'lte': 1855} | 1855 | 1855 | Fife, Sheet 6 | 6 | 6 | {'gte': 1854, 'lte': 1854} | 1854 | 1854 | {'type': 'Polygon', 'coordinates': [[[-3.03916... |
| _source | Forfarshire | Surveyed: 1860<br>Published: 1865 | 74426926 | https://deriv.nls.uk/dcn4/7442/74426926.4.jpg | https://maps.nls.uk/view/74426926 | {'gte': 1865, 'lte': 1865} | 1865 | 1865 | Forfarshire, Sheet XLIX | XLIX | 49 | {'gte': 1860, 'lte': 1860} | 1860 | 1860 | {'type': 'Polygon', 'coordinates': [[[-3.15865... |
| _source | Forfarshire | Surveyed: 1860-1862<br>Published: 1865 | 74426931 | https://deriv.nls.uk/dcn4/7442/74426931.4.jpg | https://maps.nls.uk/view/74426931 | {'gte': 1865, 'lte': 1865} | 1865 | 1865 | Forfarshire, Sheet LIV | LIV | 54 | {'gte': 1862, 'lte': 1862} | 1862 | 1862 | {'type': 'Polygon', 'coordinates': [[[-3.00192... |
| _source | Perthshire | Surveyed: 1861<br>Published: 1867 | 74428155 | https://deriv.nls.uk/dcn4/7442/74428155.4.jpg | https://maps.nls.uk/view/74428155 | {'gte': 1867, 'lte': 1867} | 1867 | 1867 | Perthshire, Sheet LXXVI | LXXVI | 76 | {'gte': 1861, 'lte': 1861} | 1861 | 1861 | {'type': 'Polygon', 'coordinates': [[[-3.14189... |
| _source | Perthshire | Surveyed: 1861<br>Published: 1867 | 74428167 | https://deriv.nls.uk/dcn4/7442/74428167.4.jpg | https://maps.nls.uk/view/74428167 | {'gte': 1867, 'lte': 1867} | 1867 | 1867 | Perthshire, Sheet LXXXVIII | LXXXVIII | 88 | {'gte': 1861, 'lte': 1861} | 1861 | 1861 | {'type': 'Polygon', 'coordinates': [[[-3.14284... |
| _source | Perthshire | Surveyed: 1861<br>Published: 1866 | 74428179 | https://deriv.nls.uk/dcn4/7442/74428179.4.jpg | https://maps.nls.uk/view/74428179 | {'gte': 1866, 'lte': 1866} | 1866 | 1866 | Perthshire, Sheet C | C | 100 | {'gte': 1861, 'lte': 1861} | 1861 | 1861 | {'type': 'Polygon', 'coordinates': [[[-3.14378... |
| _source | Forfarshire | Surveyed: 1858<br>Published: 1865 | 74426927 | https://deriv.nls.uk/dcn4/7442/74426927.4.jpg | https://maps.nls.uk/view/74426927 | {'gte': 1865, 'lte': 1865} | 1865 | 1865 | Forfarshire, Sheet L | L | 50 | {'gte': 1858, 'lte': 1858} | 1858 | 1858 | {'type': 'Polygon', 'coordinates': [[[-3.00190... |
| _source | Forfarshire | Surveyed: 1859<br>Published: 1865 | 74426930 | https://deriv.nls.uk/dcn4/7442/74426930.4.jpg | https://maps.nls.uk/view/74426930 | {'gte': 1865, 'lte': 1865} | 1865 | 1865 | Forfarshire, Sheet LIII | LIII | 53 | {'gte': 1859, 'lte': 1859} | 1859 | 1859 | {'type': 'Polygon', 'coordinates': [[[-3.15843... |


The following example shows how to search all the indices in Elasticsearch. For each index we print the number of hits, followed by the image ID of each matching document.

In [0]:
for key, index in indices.items():
    hits = search_area(index, centre=coordinates['Dundee'], radius=10000)
    num_hits = hits['total']['value']
    print('{}: {} hits'.format(index, num_hits))
    if 'hits' in hits:
        for hit in hits['hits']:
            print(hit['_source']['IMAGE'])
        print()

07_os_one_inch_1890s_1900s_scotland_lancs_dorset_london: 2 hits
74489062
74489065

04_os_six_inch_scotland_1840s_1880s: 9 hits
74426819
74426823
74426926
74426931
74428155
74428167
74428179
74426927
74426930

06_os_one_inch_1850s_1900s_scotland_lancs_dorset_london: 4 hits
74489061
74489062
74489065
74489064

02_os_town_plans_england_1840s_1890s: 0 hits

05_os_six_inch_lancs_dorset_1880s_1890s: 0 hits

03_os_25_inch_lancs_dorset_1900s_1940s: 0 hits

01_os_town_plans_london_1890s: 0 hits

