# Map Layers
**Pre-processing Dataset**

This notebook prepares the map base-layer dataset.

First, we scale the $x, y$ coordinates to integer values through a linear transformation.
Next, we annotate the labelled coordinates with a `tier`, which serves as a proxy for the level in the "Wikipedia Vital Articles" list. This lets us control which labels are shown when.

We write out the prepared dataset as `json` serialized objects to disk.

The file naming convention is `l{a, b, c}-{data}.json`, which reads as `layer-{a, b, c}-{data-type}`. Don't read too much into it: as long as the convention is consistent, we'll be fine.
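As a quick illustration of the naming convention, here is a minimal sketch of a helper that builds such file names. The function name `layer_filename` and its layer check are hypothetical and not part of this notebook; only the `l{layer}-{data}.json` pattern comes from the convention above.

```python
def layer_filename(layer: str, data: str) -> str:
    """Build a file name following the `l{layer}-{data}.json` convention.

    Hypothetical helper, for illustration only.
    """
    # The convention only names layers a, b and c.
    if layer not in {"a", "b", "c"}:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"l{layer}-{data}.json"


print(layer_filename("b", "labels"))  # lb-labels.json
```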

In [1]:
import pandas as pd
import json

from pydash import py_

In [2]:
# Fetch the base-layer map as a dataframe
mbase_url = 'https://noop-pub.s3.amazonaws.com/opt/atlas/atlas-optimal-02.json'

df_mbase = pd.read_json(mbase_url)

df_mbase.head()


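| | label | labelOpacity | markerSize | portal | x | y |
|---|---|---|---|---|---|---|
| 0 | | 0.3 | 0.2 | sci | -8.12 | -4.301 |
| 1 | | 0.3 | 0.2 | sci | -11.263 | -3.278 |
| 2 | | 0.3 | 0.2 | sci | -10.163 | -6.365 |
| 3 | | 0.3 | 0.2 | sci | -10.697 | -2.326 |
| 4 | | 0.3 | 0.2 | sci | -10.684 | -3.34 |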


### Linear Transformation

We'll shift the $x$ and $y$ coordinates along both axes and scale them linearly to integers,
so that every value lands in the non-negative integer domain.

The transformation is implemented as follows:

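1. Define the offset $ \vec \partial $ by which each position vector $ \vec s = \langle x, y \rangle $ is shifted:

    $ \vec \partial = \langle \min(x), \min(y) \rangle $

2. Choose a scalar $ z = 10^n $, where $ n $ is the desired number of decimal digits of precision.

3. Apply the linear transformation to $ \vec s $ and round the result to the nearest integer:

    $ \vec s_i = z(\vec s - \vec \partial) $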
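The steps above can be sketched in plain Python on a handful of points. The function name `to_integer_grid` is an assumption made for this illustration. Note that the offset here is the minimum of the five sample rows only, so the resulting integers differ from those computed on the full dataset.

```python
def to_integer_grid(values, n=3):
    """Shift values so the minimum maps to 0, scale by 10**n, round to int."""
    offset = min(values)   # per-axis minimum (the shift)
    z = 10 ** n            # scale factor
    return [round((v - offset) * z) for v in values]


# x and y of the first five rows of the base layer
xs = [-8.12, -11.263, -10.163, -10.697, -10.684]
ys = [-4.301, -3.278, -6.365, -2.326, -3.34]

x_t = to_integer_grid(xs)
y_t = to_integer_grid(ys)

print(x_t)  # [3143, 0, 1100, 566, 579]
print(y_t)  # [2064, 3087, 0, 4039, 3025]
```

Rounding before the integer conversion matters: the floating-point product can land slightly below the intended value, and plain truncation would then be off by one.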

In [15]:
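# Per-axis minimum: the offset that shifts every coordinate to start at 0
xmin, ymin = df_mbase.x.min(), df_mbase.y.min()

# Scale factor: 10^3 keeps three decimal digits of precision
z = 1e3

# Vectorized shift and scale, rounded to the nearest integer
df_mbase['x_t'] = ((df_mbase.x - xmin) * z).round().astype('int32')
df_mbase['y_t'] = ((df_mbase.y - ymin) * z).round().astype('int32')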

df_mbase.head()

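| | label | labelOpacity | markerSize | portal | x | y | x_t | y_t |
|---|---|---|---|---|---|---|---|---|
| 0 | | 0.3 | 0.2 | sci | -8.12 | -4.301 | 5214 | 4873 |
| 1 | | 0.3 | 0.2 | sci | -11.263 | -3.278 | 2071 | 5896 |
| 2 | | 0.3 | 0.2 | sci | -10.163 | -6.365 | 3171 | 2809 |
| 3 | | 0.3 | 0.2 | sci | -10.697 | -2.326 | 2637 | 6848 |
| 4 | | 0.3 | 0.2 | sci | -10.684 | -3.34 | 2650 | 5834 |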


In [38]:
# We want to keep the "tier" information according to the "Wikipedia Vital Articles"
# hierarchy. The `markerSize` property is a direct proxy for the 8 levels, which we
# transform to integers and add to the column `tier`.

df_mbase['tier'] = ((df_mbase.markerSize * 10)
                    .round()  # guard against float error before the integer cast
                    .astype('int32'))

df_mbase.tail()

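| | label | labelOpacity | markerSize | portal | x | y | x_t | y_t | tier |
|---|---|---|---|---|---|---|---|---|---|
| 120389 | | 0.3 | 0.2 | soc | 7.591 | -2.499 | 20925 | 6675 | 2 |
| 120390 | | 0.3 | 0.2 | soc | 9.026 | -2.412 | 22360 | 6762 | 2 |
| 120391 | | 0.3 | 0.2 | soc | 11.275 | -2.056 | 24609 | 7118 | 2 |
| 120392 | | 0.3 | 0.2 | soc | 12.214 | -1.102 | 25548 | 8072 | 2 |
| 120393 | SOCIÉTÉ | 1.0 | 0.1 | soc | 10.0 | -3.0 | 23334 | 6174 | 1 |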
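For reference, a minimal plain-Python sketch of the `markerSize` to tier mapping. It rounds before the integer conversion so that floating-point error cannot truncate a value downward. The helper name `tier_from_marker_size` is hypothetical, and the assumption that the 8 levels correspond to marker sizes 0.1 through 0.8 is mine; the output above only confirms 0.1 and 0.2.

```python
def tier_from_marker_size(marker_size: float) -> int:
    """Map a marker size such as 0.2 to an integer tier such as 2."""
    return int(round(marker_size * 10))


# Assumed marker sizes for the 8 levels (only 0.1 and 0.2 appear in the output above)
marker_sizes = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
tiers = [tier_from_marker_size(m) for m in marker_sizes]

print(tiers)  # [1, 2, 3, 4, 5, 6, 7, 8]
```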


In [67]:
# Keep only the rows that have a label

df_labels = (df_mbase
             .dropna(subset=['label'])
             .sort_values(by='tier'))

# ... and ensure that the labels are not `_` separated.
df_labels['label'] = df_labels.label.str.replace('_', ' ')

# Select the label, portal, tier, x_t, and y_t columns.
# We'll rename `x_t` and `y_t` to `x` and `y`.
columns = ['label', 'portal', 'tier', 'x_t', 'y_t']

df_lb_labels = df_labels[columns].rename(columns={'x_t': 'x', 'y_t': 'y'})

# et voila:
df_lb_labels.head()


| | label | portal | tier | x | y |
|---|---|---|---|---|---|
| 120393 | SOCIÉTÉ | soc | 1 | 23334 | 6174 |
| 85392 | HISTOIRE | hist | 1 | 13834 | 4174 |
| 111949 | SPORT ET LOISIRS | spo | 1 | 22834 | 13174 |
| 104702 | ARTS | art | 1 | 18334 | 9174 |
| 65197 | GÉOGRAPHIE | geo | 1 | 13334 | 14674 |
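To show how the `tier` column could control which labels are shown, here is a rough sketch. It assumes that lower tiers are more important (tier 1 being the portal names) and that a consumer picks a maximum tier per zoom level. The function `visible_labels` is hypothetical, and the third record is a made-up row added only to have a second tier to filter on.

```python
def visible_labels(records, max_tier):
    """Return the records whose tier is at most `max_tier`."""
    return [r for r in records if r["tier"] <= max_tier]


records = [
    {"label": "SOCIÉTÉ", "portal": "soc", "tier": 1, "x": 23334, "y": 6174},
    {"label": "HISTOIRE", "portal": "hist", "tier": 1, "x": 13834, "y": 4174},
    # Made-up row for illustration; not taken from the dataset.
    {"label": "Example article", "portal": "art", "tier": 3, "x": 100, "y": 200},
]

zoomed_out = visible_labels(records, max_tier=1)
zoomed_in = visible_labels(records, max_tier=3)

print([r["portal"] for r in zoomed_out])  # ['soc', 'hist']
print([r["portal"] for r in zoomed_in])   # ['soc', 'hist', 'art']
```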


In [85]:
# Write it out to disk.
# [!] NOTE: `force_ascii` must be False so that non-ASCII labels stay as UTF-8
#           instead of being escaped. That alone did not seem to work in the
#           Jupyter environment, so we pass a file handle opened with an
#           explicit utf-8 encoding.

with open('./lb-labels.json', 'w', encoding='utf-8') as fp:
    df_lb_labels.to_json(fp, orient='records', force_ascii=False)
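A small round-trip check of the same idea using only the standard library, as a sketch. It writes one record to a temporary file (not to `./lb-labels.json`) with `ensure_ascii=False`, the `json` module's counterpart to `force_ascii=False`, and confirms that the accented label is stored literally and reads back unchanged.

```python
import json
import os
import tempfile

records = [{"label": "SOCIÉTÉ", "portal": "soc", "tier": 1, "x": 23334, "y": 6174}]

# Temporary location so the real output file is left untouched
path = os.path.join(tempfile.mkdtemp(), "lb-labels.json")

with open(path, "w", encoding="utf-8") as fp:
    json.dump(records, fp, ensure_ascii=False)

with open(path, encoding="utf-8") as fp:
    raw = fp.read()

loaded = json.loads(raw)

print("SOCIÉTÉ" in raw)       # True: the label is stored as literal UTF-8
print(loaded == records)      # True: the round trip is lossless
print(json.dumps("SOCIÉTÉ"))  # "SOCI\u00c9T\u00c9" (the default escapes non-ASCII)
```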