Usage of the package
========

First, some imports

In [1]:
import gzip
import pickle
import random
from collections import namedtuple

from territories import Territory, MissingTreeCache, Partition

## Creation of the tree

The first step is to create a tree of known entities. Depending on the tree size, this can be a very compute-intensive task, which is why the tree is stored on disk by default once it has been created.

Here we will create a very simple tree out of the **tree.txt** file.

In [2]:
Node = namedtuple('Node', ('id', 'parent_id', 'label', 'level'))
split = lambda x: (arg if arg != 'null' else None for arg in x[:-1].split('; '))

try:
    Territory.load_tree()
except MissingTreeCache:
    with open("tree.txt", "r") as file:
        lines = file.readlines()
        stream = [Node(*split(x)) for x in lines]
        Territory.build_tree(data_stream=stream, save_tree=False)
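To make the expected input concrete, here is a minimal sketch of what the `split` helper does to a single line. The sample line is hypothetical (the actual contents of **tree.txt** are not shown here); only the `'; '` separator and the `null` placeholder are taken from the cell above, and `parse_line` is an illustrative name, not part of the package.

```python
from collections import namedtuple

Node = namedtuple('Node', ('id', 'parent_id', 'label', 'level'))


def parse_line(line: str) -> Node:
    """Turn one '; '-separated line into a Node, mapping 'null' to None."""
    fields = line.rstrip('\n').split('; ')
    return Node(*(field if field != 'null' else None for field in fields))


# Hypothetical line describing a root node, which has no parent
root = parse_line("CNTRY:F; null; France; CNTRY\n")
print(root)
```

Unlike `x[:-1]`, `rstrip('\n')` also copes with a last line that has no trailing newline.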

Then, you can start to create territories from arbitrary territorial units.

The territory associated with a set of entities is represented compactly: if all leaves of a parent node are included in the territory, they are simply replaced by their parent node.
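The replacement rule can be pictured with a few lines of plain Python. This is only a sketch of the idea on a deliberately tiny, hypothetical tree, and it assumes the selection initially contains leaves only; the `compact` function and the `children` mapping are illustrative names, not the package's internals.

```python
def compact(selected, children):
    """Replace complete sets of siblings by their parent, bottom-up.

    `children` maps each parent to the set of its direct children.
    """
    result = set(selected)
    changed = True
    while changed:
        changed = False
        for parent, kids in children.items():
            if kids and kids <= result:
                # every child is selected: keep the parent instead
                result -= kids
                result.add(parent)
                changed = True
    return result


# Toy tree, much smaller than the real one
children = {
    "France": {"Rhône", "Seine-Saint-Denis"},
    "Rhône": {"Lyon", "Villeurbanne"},
    "Seine-Saint-Denis": {"Pantin", "Montreuil"},
}

partial = compact({"Lyon", "Villeurbanne", "Pantin"}, children)
full = compact({"Lyon", "Villeurbanne", "Pantin", "Montreuil"}, children)
print(sorted(partial))  # ['Pantin', 'Rhône']
print(sorted(full))     # ['France']
```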

In [3]:
# some nodes of the tree
print('\n'.join([f"{e.name} | {e.level}" for e in random.sample(Territory.tree.nodes(), 12)]))

Marseille | COM
Grand Lyon | DEP
Lyon | COM
Rhône | DEP
Saint Étienne | COM
Paris | COM
Villeurbane | COM
Pantin | COM
Sud | REG
France | CNTRY
Nogent | COM
île-de-france | REG


In [5]:
with open("tree_large.gzip", "rb") as file:
    lines = pickle.loads(gzip.decompress(file.read()))

stream = [Node(*split(x)) for x in lines]
Territory.build_tree(data_stream=stream, save_tree=False)

a = Territory.from_names("COM:69123", "COM:93055", "COM:94052")
b = Territory.from_names("COM:27429", "REG:84", "DEP:75")
c = Territory.from_names("COM:38185", "COM:31555", "REG:11")
d = Territory.from_names("COM:33063", "COM:13055", "REG:28")
e = Territory.from_names("COM:35238", "COM:35047", "DEP:27")
f = Territory.from_names("COM:59350", "COM:38442", "REG:53")
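The cell above reads `tree_large.gzip` as a gzip-compressed pickle of a list of lines. As a sketch of that layout, the snippet below writes and re-reads a file in the same format using only the standard library. The two sample lines and the file name `toy_tree.gzip` are hypothetical.

```python
import gzip
import pickle
import tempfile
from pathlib import Path

# Hypothetical lines in the same '; '-separated layout as tree.txt
lines = [
    "CNTRY:F; null; France; CNTRY\n",
    "REG:84; CNTRY:F; Auvergne-Rhône-Alpes; REG\n",
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_tree.gzip"
    # Write: pickle the list, then gzip the resulting bytes
    path.write_bytes(gzip.compress(pickle.dumps(lines)))
    # Read back with the same two calls as above, in reverse order
    restored = pickle.loads(gzip.decompress(path.read_bytes()))

print(restored == lines)  # True
```

As with any pickle, such a file should only be loaded when it comes from a trusted source.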

You can create a territory from the names of territorial units.

In [6]:
ter = Territory.from_names("DEP:69", "COM:59350", "ARR:75106")
ter

Paris 6e|Lille|Rhône

If some names are invalid, a `NotOnTree` exception is raised.

In [7]:
try:
    Territory.from_names("DEP:69", "do not exist", "garbage")
except Exception as e:
    print(e)

do not exist, garbage where not found in the territorial tree


Territories are JSON serializable, so you can return them directly from an API endpoint.

In [8]:
import json

print(json.dumps(ter, indent=4))

[
    {
        "name": "Lille",
        "tu_id": "COM:59350",
        "atomic": true,
        "level": "COM",
        "postal_code": null,
        "inhabitants": null
    },
    {
        "name": "Paris 6e",
        "tu_id": "ARR:75106",
        "atomic": true,
        "level": "ARR",
        "postal_code": null,
        "inhabitants": null
    },
    {
        "name": "Rh\u00f4ne",
        "tu_id": "DEP:69",
        "atomic": false,
        "level": "DEP",
        "postal_code": null,
        "inhabitants": null
    }
]


## Operations on territories


The usual operations on territories work as expected:

In [9]:
# addition

print(a, c)
print(a + c)

Lyon|Nogent-sur-Marne|Pantin Grenoble|Toulouse|Île-de-France
Grenoble|Lyon|Toulouse|Île-de-France


In [10]:
# subtraction

print(a, d)
print(a - d)

Lyon|Nogent-sur-Marne|Pantin Bordeaux|Marseille|Normandie
Lyon|Nogent-sur-Marne|Pantin


More importantly, set operations are also supported:

In [11]:
# intersection
print(f"Intersection of {a} and {d} is {a & d}")

# union
print(f"Union of {c} and {f} is {f | c}")

Intersection of Lyon|Nogent-sur-Marne|Pantin and Bordeaux|Marseille|Normandie is {}
Union of Grenoble|Toulouse|Île-de-France and Lille|Saint-Pierre-de-Chartreuse|Bretagne is Grenoble|Lille|Saint-Pierre-de-Chartreuse|Toulouse|Bretagne|Île-de-France


Territorial units may have parents or children, but a `Territory` does not. Since a territory may be made up of several territorial units, it has a lowest common ancestor (LCA).

In [12]:
lyon_and_grenoble = Territory.from_names("COM:38185", "COM:69123")

lyon_and_grenoble.lowest_common_ancestor()

ancestors : {26880, 26869}


Auvergne-Rhône-Alpes
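The notion of lowest common ancestor can be illustrated without the package. The sketch below works on a small hand-written child-to-parent map; the function names and the `parent` dictionary are illustrative assumptions and say nothing about how `Territory` computes the result internally.

```python
def ancestors_path(node, parent):
    """Return [node, parent(node), ..., root] using a child -> parent map."""
    path = [node]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def lowest_common_ancestor(nodes, parent):
    """Deepest node that is an ancestor of (or equal to) every given node."""
    first, *rest = nodes
    common = ancestors_path(first, parent)  # ordered from deepest to root
    for node in rest:
        seen = set(ancestors_path(node, parent))
        common = [n for n in common if n in seen]
    return common[0] if common else None


# Toy extract of the hierarchy
parent = {
    "France": None,
    "Auvergne-Rhône-Alpes": "France",
    "Rhône": "Auvergne-Rhône-Alpes",
    "Isère": "Auvergne-Rhône-Alpes",
    "Lyon": "Rhône",
    "Grenoble": "Isère",
}

lca = lowest_common_ancestor(["Lyon", "Grenoble"], parent)
print(lca)  # Auvergne-Rhône-Alpes
```

In this sketch a single node is its own lowest common ancestor; the package may make a different choice.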

You can retrieve all ancestors of a territory with the `.ancestors()` method, and all of its descendants with the `.descendants()` method:

In [13]:
a.ancestors()

{France,
 Île-de-France,
 Auvergne-Rhône-Alpes,
 Val-de-Marne,
 Seine-Saint-Denis,
 Rhône}

In [14]:
print(a)
a.descendants(include_itself=True)

Lyon|Nogent-sur-Marne|Pantin


{Pantin,
 Nogent-sur-Marne,
 Lyon,
 Lyon 9e,
 Lyon 8e,
 Lyon 7e,
 Lyon 6e,
 Lyon 5e,
 Lyon 4e,
 Lyon 3e,
 Lyon 2e,
 Lyon 1er}

Territories are truthy when they are not empty, but you should probably use the `is_empty()` method for clarity.

In [15]:
if Territory.from_names("DEP:69"):
    print("not empty")

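# an empty territory is falsy, so this branch is skipped
if Territory.from_names():
    print("empty")

# the explicit check is the one that prints "empty"
if Territory.from_names().is_empty():
    print("empty")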

not empty
empty
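For readers unfamiliar with how an object becomes usable in an `if`, here is a minimal sketch of the general Python mechanism (`__bool__`). The `Bag` class is a hypothetical stand-in; it is not claimed to mirror the `Territory` implementation.

```python
class Bag:
    """Toy container that is falsy when empty, like a built-in collection."""

    def __init__(self, *items):
        self.items = set(items)

    def is_empty(self) -> bool:
        return not self.items

    def __bool__(self) -> bool:
        # truthiness is simply the opposite of emptiness
        return not self.is_empty()


print(bool(Bag("DEP:69")), bool(Bag()), Bag().is_empty())  # True False True
```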


In [16]:
min(ter).level

<Partition.DEP: 3>

In [17]:
# Partition was already imported in the first cell
Partition.DEP > Partition.ARR

True
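The comparison above suggests that `Partition` members are ordered. One common way to get such behaviour in Python is an `IntEnum`, sketched below. Only the value 3 for `DEP` appears in the output above; the class name `Level` and every other value are assumptions made for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordering in which DEP ranks above ARR, as shown above."""
    ARR = 1
    COM = 2
    DEP = 3    # the only value visible in the notebook output
    REG = 4
    CNTRY = 5


print(Level.DEP > Level.ARR)           # True
print(max(Level.COM, Level.REG).name)  # REG
```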

In [18]:
# build a territory from 1000 randomly sampled territorial units
s = random.sample(Territory.tree.nodes(), 1000)
ter = Territory.from_names(*(tu.tu_id for tu in s))

# count the units at COM level that this territory covers
len([tu.tu_id for tu in ter.descendants(include_itself=True) if tu.level == Partition.COM])

2571

In [7]:
import base64

enc = lambda s : "https://platform.datapolitics.fr/territoires/" + base64.b64encode(s.encode('utf-8')).decode('utf-8')
# split on the first ':' only, since the id part may itself contain ':'
dec = lambda x: base64.b64decode(x.encode()).decode('utf-8').split(':', 1)

hid = 'dmlkZW8tcGFyYWdyYXBoLWxvY2FsLXYxOnNjQWhMZ1BXeERjXzI='

dec(hid)

['video-paragraph-local-v1', 'scAhLgPWxDc_2']

In [4]:
# eid = "14a2cfa0-0f54-40d1-805b-db39d76a44fd/03._DCE_AFPA_SaintHerblain_Principe_de_raccordement.pdf#5"
eid = "s3://fr.datapolitics.content/2168/172b9_MGN_rapport_CV-2019.pdf#68"
# index = "tender-attachment-private-v1"
index = "speech-paragraph-local-v1"

base64.b64encode(f"{index}:{eid}".encode()).decode('utf-8')

'c3BlZWNoLXBhcmFncmFwaC1sb2NhbC12MTpzMzovL2ZyLmRhdGFwb2xpdGljcy5jb250ZW50LzIxNjgvMTcyYjlfTUdOX3JhcHBvcnRfQ1YtMjAxOS5wZGYjNjg='
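The two cells above encode and decode identifiers of the form `index:eid`. Because an entity id may itself contain `:` (as in the `s3://` example), the sketch below splits on the first colon only, so that encoding and decoding round-trip. The function names `encode_id` and `decode_id` are illustrative.

```python
import base64


def encode_id(index: str, eid: str) -> str:
    """Join index and entity id with ':' and base64-encode the result."""
    return base64.b64encode(f"{index}:{eid}".encode("utf-8")).decode("utf-8")


def decode_id(token: str) -> tuple[str, str]:
    """Inverse of encode_id; split only on the first ':'."""
    index, eid = base64.b64decode(token).decode("utf-8").split(":", 1)
    return index, eid


index = "speech-paragraph-local-v1"
eid = "s3://fr.datapolitics.content/2168/172b9_MGN_rapport_CV-2019.pdf#68"
print(decode_id(encode_id(index, eid)) == (index, eid))  # True
```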

In [8]:
import re

prepositions = {'de', 'du', 'des', 'le', 'la', 'les', 'd', 'l', 'el', 'von'}
capitalize_name = lambda name: ''.join(word.lower() if word.lower() in prepositions else word.capitalize() for word in re.split(r'([-\']|\s+)', name) if word)

capitalize_name("hugu de maz")


'Hugu de Maz'

In [None]:
raw_tree = Territory.save_tree(return_bytes=True)

with open("full_territorial_tree.gzip", "wb") as file:
    file.write(gzip.compress(raw_tree))

In [1]:
from territories import Territory

Territory.load_tree()


t = Territory.from_names("DEP:69")

In [4]:
[(i.name, i.inhabitants) for i in t.descendants(include_itself=True) if i.inhabitants is None]

[('Lyon 3e', None),
 ('Lyon 9e', None),
 ('Lyon 2e', None),
 ('Lyon 8e', None),
 ('Lyon 7e', None),
 ('Lyon 4e', None),
 ('Lyon 6e', None),
 ('Lyon 5e', None),
 ('Lyon 1er', None)]