Usage of the package
========

First, some imports

In [1]:
import gzip
import pickle
import random
from collections import namedtuple

from territories import Territory, MissingTreeCache, Partition

## Creation of the tree

The first step is to create a tree of known entities. Depending on the tree size, this can be a very compute-intensive task, which is why the tree is stored on disk by default once it has been created.

Here we will create a very simple tree out of the **tree.txt** file.

In [2]:
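# One record per territorial unit, with the fields in the order used by tree.txt
Node = namedtuple('Node', ('id', 'parent_id', 'label', 'level'))
# Drop the last character (the newline), split on '; ' and map the literal 'null' to None
split = lambda x: (arg if arg != 'null' else None for arg in x[:-1].split('; '))

try:
    # Reuse the tree cached on disk when there is one
    Territory.load_tree()
except MissingTreeCache:
    with open("tree.txt", "r") as file:
        lines = file.readlines()
    stream = [Node(*split(x)) for x in lines]
    Territory.build_tree(data_stream=stream, save_tree=False)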

Then you can start creating territories from arbitrary territorial units.

The territorial units that make up a territory are stored compactly: if all the leaves of a parent node are included in the territory, they are simply replaced by their parent node.
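To make the compaction rule concrete, here is a minimal sketch on a toy tree. It is not the package's implementation: the `CHILDREN` mapping and the `compact` helper are hypothetical names introduced for illustration only.

```python
# Toy tree: parent -> children (hypothetical data, not read from tree.txt)
CHILDREN = {
    "France": ["Rhône", "Isère"],
    "Rhône": ["Lyon", "Villeurbanne"],
    "Isère": ["Grenoble"],
}


def compact(units):
    """Replace every complete group of siblings by its parent until nothing changes."""
    units = set(units)
    changed = True
    while changed:
        changed = False
        for parent, children in CHILDREN.items():
            if parent not in units and units.issuperset(children):
                units -= set(children)
                units.add(parent)
                changed = True
    return units


print(compact({"Lyon", "Villeurbanne"}))              # {'Rhône'}
print(compact({"Lyon"}))                              # {'Lyon'}
print(compact({"Lyon", "Villeurbanne", "Grenoble"}))  # {'France'}
```

In the last call the replacement cascades: the two communes collapse into Rhône, Grenoble collapses into Isère, and the two departments then collapse into France.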

In [3]:
# a few random nodes of the tree
print('\n'.join([f"{e.name} | {e.level}" for e in random.sample(Territory.tree.nodes(), 12)]))

Dampierre-Saint-Nicolas | COM
Bruys | COM
Champcenest | COM
Jouancy | COM
Mathieu | COM
Cuperly | COM
Le Fête | COM
Saint-Germain | COM
Orthevielle | COM
Mélicocq | COM
Machault | COM
Bayenghem-lès-Éperlecques | COM


In [None]:
with open("tree_large.gzip", "rb") as file:
    lines = pickle.loads(gzip.decompress(file.read()))

stream = [Node(*split(x)) for x in lines]
Territory.build_tree(data_stream=stream, save_tree=False)

a = Territory.from_tu_ids("COM:69123", "COM:93055", "COM:94052")
b = Territory.from_tu_ids("COM:27429", "REG:84", "DEP:75")
c = Territory.from_tu_ids("COM:38185", "COM:31555", "REG:11")
d = Territory.from_tu_ids("COM:33063", "COM:13055", "REG:28")
e = Territory.from_tu_ids("COM:35238", "COM:35047", "DEP:27")
f = Territory.from_tu_ids("COM:59350", "COM:38442", "REG:53")

You can create a territory from the identifiers of territorial units.

In [None]:
ter = Territory.from_tu_ids("DEP:69", "COM:59350", "ARR:75106")
ter

Paris 6e|Lille|Rhône

If some identifiers are invalid, a `NotOnTree` exception is raised.

In [None]:
try:
    Territory.from_tu_ids("DEP:69", "do not exist", "garbage")
except Exception as error:  # not named `e`, which would delete the territory `e` defined earlier
    print(error)

do not exist, garbage where not found in the territorial tree


Territories are JSON-serializable, so you can simply return them from an API endpoint:

In [7]:
import json

print(json.dumps(ter, indent=4))

[
    {
        "name": "Paris 6e",
        "tu_id": "ARR:75106",
        "atomic": true,
        "level": "ARR",
        "postal_code": null,
        "inhabitants": null
    },
    {
        "name": "Rh\u00f4ne",
        "tu_id": "DEP:69",
        "atomic": false,
        "level": "DEP",
        "postal_code": null,
        "inhabitants": null
    },
    {
        "name": "Lille",
        "tu_id": "COM:59350",
        "atomic": true,
        "level": "COM",
        "postal_code": null,
        "inhabitants": null
    }
]


## Operations on territories


The usual operations on territories work as expected:

In [8]:
# addition

print(a, c)
print(a + c)

Lyon|Nogent-sur-Marne|Pantin Grenoble|Toulouse|Île-de-France
Grenoble|Lyon|Toulouse|Île-de-France


In [9]:
# subtraction

print(a, d)
print(a - d)

Lyon|Nogent-sur-Marne|Pantin Bordeaux|Marseille|Normandie
Lyon|Nogent-sur-Marne|Pantin


More importantly, set operations are also supported:

In [10]:
# intersection
print(f"Intersection of {a} and {d} is {a & d}")

# union
print(f"Union of {c} and {f} is {f | c}")

Intersection of Lyon|Nogent-sur-Marne|Pantin and Bordeaux|Marseille|Normandie is {}
Union of Grenoble|Toulouse|Île-de-France and Lille|Saint-Pierre-de-Chartreuse|Bretagne is Grenoble|Lille|Saint-Pierre-de-Chartreuse|Toulouse|Bretagne|Île-de-France
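One way to reason about the operators above is to expand each territory into the leaves it covers and apply ordinary set operations to them. The sketch below does this on a toy tree. How the package actually computes these results is not described here, and `CHILDREN`, `leaves` and `expand` are hypothetical names.

```python
# Toy tree: parent -> children (hypothetical data)
CHILDREN = {"Rhône": ["Lyon", "Villeurbanne"], "Isère": ["Grenoble"]}


def leaves(unit):
    """Expand a unit into the set of leaves it covers."""
    children = CHILDREN.get(unit)
    if not children:
        return {unit}
    return set().union(*(leaves(child) for child in children))


def expand(units):
    """Expand a collection of units into the set of all covered leaves."""
    return set().union(*(leaves(unit) for unit in units))


a = {"Rhône"}
b = {"Lyon", "Grenoble"}
print(sorted(expand(a) & expand(b)))  # ['Lyon']
print(sorted(expand(a) | expand(b)))  # ['Grenoble', 'Lyon', 'Villeurbanne']
print(sorted(expand(a) - expand(b)))  # ['Villeurbanne']
```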


Territorial units may have parents or children, but a Territory does not. Since a territory may be made up of several territorial units, it has a lowest common ancestor (LCA) instead.

In [None]:
lyon_and_grenoble = Territory.from_tu_ids("COM:38185", "COM:69123")

lyon_and_grenoble.lowest_common_ancestor()

Auvergne-Rhône-Alpes
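As an illustration of the idea, the lowest common ancestor can be found by walking from each unit up to the root and keeping the deepest node shared by every path. This is a sketch on toy data, not the package's code. The `PARENT` mapping and both helpers are hypothetical.

```python
# Toy child -> parent mapping (hypothetical data); the root has no parent
PARENT = {
    "Lyon": "Rhône",
    "Grenoble": "Isère",
    "Rhône": "Auvergne-Rhône-Alpes",
    "Isère": "Auvergne-Rhône-Alpes",
    "Auvergne-Rhône-Alpes": "France",
    "France": None,
}


def path_to_root(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_ancestor(*nodes):
    """Return the deepest node that lies on every path to the root."""
    first, *others = (path_to_root(node) for node in nodes)
    common = set(first).intersection(*map(set, others))
    # `first` is ordered from deepest to shallowest, so the first hit is the lowest
    return next(node for node in first if node in common)


print(lowest_common_ancestor("Lyon", "Grenoble"))  # Auvergne-Rhône-Alpes
```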

You can retrieve all the ancestors of a territory with the `.ancestors()` method, and all of its descendants with the `.descendants()` method:

In [12]:
a.ancestors()

{France,
 Île-de-France,
 Auvergne-Rhône-Alpes,
 Val-de-Marne,
 Seine-Saint-Denis,
 Rhône}

In [13]:
print(a)
a.descendants(include_itself=True)

Lyon|Nogent-sur-Marne|Pantin


{Pantin,
 Nogent-sur-Marne,
 Lyon,
 Lyon 9e,
 Lyon 8e,
 Lyon 7e,
 Lyon 6e,
 Lyon 5e,
 Lyon 4e,
 Lyon 3e,
 Lyon 2e,
 Lyon 1er}

Territories evaluate to `True` when they are not empty, but you should probably use the `is_empty()` method for clarity.

In [None]:
if Territory.from_tu_ids("DEP:69"):
    print("not empty")

# an empty territory is falsy, so this branch prints nothing
if Territory.from_tu_ids():
    print("empty")

if Territory.from_tu_ids().is_empty():
    print("empty")

not empty
empty
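The truthiness shown above is standard Python behaviour for container-like objects. The sketch below reproduces it with a hypothetical `UnitSet` class. Whether the package relies on `__len__` or defines `__bool__` is an assumption, not something stated here.

```python
class UnitSet:
    """Hypothetical stand-in for a territory: a thin wrapper around a set."""

    def __init__(self, *units):
        self.units = set(units)

    def __len__(self):
        return len(self.units)

    def is_empty(self):
        return not self.units


# No __bool__ is defined, so Python falls back on __len__ for truth testing
print(bool(UnitSet("DEP:69")))  # True
print(bool(UnitSet()))          # False
print(UnitSet().is_empty())     # True
```

The explicit `is_empty()` call reads better in conditions because the intent does not depend on the reader knowing the truth-testing rules.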


In [15]:
min(ter).level

<Partition.DEP: 3>

In [16]:
from territories import Partition


Partition.DEP > Partition.ARR

True
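Comparisons such as `Partition.DEP > Partition.ARR` are what an ordered enumeration provides. The sketch below uses a hypothetical `Level` enum built on `IntEnum`. Only `DEP = 3` and `DEP > ARR` are taken from the outputs above. Every other value is a placeholder, and the package may define its levels differently.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordered levels; all values except DEP are placeholders."""
    ARR = 0
    COM = 1
    DEP = 3
    REG = 4
    CNTRY = 5


print(Level.DEP > Level.ARR)                                        # True
print(max(Level.COM, Level.REG, Level.DEP).name)                    # REG
print([l.name for l in sorted([Level.REG, Level.ARR, Level.DEP])])  # ['ARR', 'DEP', 'REG']
```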

In [17]:
from territories import Partition

s = random.sample(Territory.tree.nodes(), 1000)
ter = Territory.from_names(*(tu.tu_id for tu in s))

len([tu.tu_id for tu in ter.descendants(include_itself=True) if tu.level == Partition.COM])

3352

In [6]:
66/400

0.165

In [2]:
import base64

enc = lambda s : "https://platform.datapolitics.fr/territoires/" + base64.b64encode(s.encode('utf-8')).decode('utf-8')
dec = lambda x: base64.b64decode(x.encode()).decode('utf-8').split(':')

hid = 'dGVuZGVyLWF0dGFjaG1lbnQtcHJpdmF0ZS12MTo2MWIyYmNjYy0yNDk2LTRjNzQtYjQxYy00MjFjNjYzZjg4NmUvQ0NBUy1DQ1RQLURDRS1MNC0xMDQtQV8gQ0NUUCBMb3QgMDQgQ1ZDIFBsb21iZXJpZS5wZGYjMjM='

# https://platform.datapolitics.fr/document/==?topicId=1981


dec(hid)

['tender-attachment-private-v1',
 '61b2bccc-2496-4c74-b41c-421c663f886e/CCAS-CCTP-DCE-L4-104-A_ CCTP Lot 04 CVC Plomberie.pdf#23']

In [8]:
import numpy as np

np.sin(np.pi/4)

np.float64(0.7071067811865475)

In [4]:
# eid = "14a2cfa0-0f54-40d1-805b-db39d76a44fd/03._DCE_AFPA_SaintHerblain_Principe_de_raccordement.pdf#5"
eid = "83/b6165bc5042ca018c8765bdf2d5d40dbbb7f858a_dm1-2025-editiqu#50"
# index = "tender-attachment-private-v1"
index = "speech-paragraph-local-v1"

base64.b64encode(f"{index}:{eid}".encode()).decode('utf-8')

'c3BlZWNoLXBhcmFncmFwaC1sb2NhbC12MTo4My9iNjE2NWJjNTA0MmNhMDE4Yzg3NjViZGYyZDVkNDBkYmJiN2Y4NThhX2RtMS0yMDI1LWVkaXRpcXUjNTA='
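The encoding and decoding steps used in these cells can be wrapped in a pair of helpers. The sketch below assumes the `index:eid` layout seen above. The names `encode_id` and `decode_id` are hypothetical, and the split is limited to the first colon so that an identifier containing `:` still round-trips.

```python
import base64


def encode_id(index, eid):
    """Join index and entity id with ':' and base64-encode the result."""
    return base64.b64encode(f"{index}:{eid}".encode("utf-8")).decode("utf-8")


def decode_id(hid):
    """Inverse of encode_id; split only on the first ':' so the id may contain colons."""
    index, eid = base64.b64decode(hid).decode("utf-8").split(":", 1)
    return index, eid


hid = encode_id("speech-paragraph-local-v1", "83/some:file#50")
print(decode_id(hid))  # ('speech-paragraph-local-v1', '83/some:file#50')
```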

In [20]:
import re

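# lowercase particles that must not be capitalized inside a name
prepositions = {'de', 'du', 'des', 'le', 'la', 'les', 'd', 'l', 'el', 'von'}

def capitalize_name(name):
    # split on hyphens, apostrophes and whitespace, keeping the separators
    parts = (word for word in re.split(r'([-\']|\s+)', name) if word)
    return ''.join(word.lower() if word.lower() in prepositions else word.capitalize() for word in parts)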

capitalize_name("hugu de maz")


'Hugu de Maz'

In [21]:
# raw_tree = Territory.save_tree(return_bytes=True)

# with open("full_territorial_tree.gzip", "wb") as file:
#     file.write(gzip.compress(raw_tree))

In [1]:
from territories import Territory

Territory.load_tree()


t = Territory.from_names("DEP:69")



In [23]:
[(i.name, i.inhabitants) for i in t.descendants(include_itself=True) if i.inhabitants is None]

[('Lyon 6e', None),
 ('Lyon 7e', None),
 ('Lyon 3e', None),
 ('Lyon 9e', None),
 ('Lyon 1er', None),
 ('Lyon 5e', None),
 ('Lyon 8e', None),
 ('Lyon 2e', None),
 ('Lyon 4e', None)]

In [24]:
def format_entity_label(self) -> str:
    if self.entity:
        return self.entity.name
    else:
        if self.territory.type < Partition.CNTRY:
            if len(self.territory) > 2:
                return ", ".join(tu.name for tu in sorted(self.territory)[:2]) + '...'
            return ", ".join(tu.name for tu in sorted(self.territory))
    return ""

In [40]:
c = ("Argenteuil",
"Athis-Mons",
"Juvisy-sur-Orge",
"Morangis",
"Paray-Vieille-Poste",
"Savigny-sur-Orge",
"Viry-Châtillon",
"Hauts-de-Seine",
"Paris",
"Seine-Saint-Denis",
"Val-de-Marne")

ter = Territory(*[n for n in Territory.tree.nodes() if n.name in c])
ter

Argenteuil|Athis-Mons|Juvisy-sur-Orge|Morangis|Morangis|Paray-Vieille-Poste|Savigny-sur-Orge|Viry-Châtillon|Hauts-de-Seine|Paris|Seine-Saint-Denis|Val-de-Marne

In [26]:
ter.parent()

Marne|Île-de-France

In [29]:
Territory.from_names("COM:93055", "COM:93053")

Noisy-le-Sec|Pantin

In [37]:
str(Territory.from_names("COM:93055"))

'Pantin'

In [30]:
Territory.tree.predecessors(Territory.from_name("COM:93055").tree_id)

[Seine-Saint-Denis]

In [33]:
ter = Territory.from_names("COM:93055", "COM:93053")
ter

Noisy-le-Sec|Pantin

In [35]:
def format_entity_label(ter) -> str:
    format_com = lambda t: f"{t.name} ({Territory.get_parent(t).name})" if t.level == Partition.COM else t.name

    if ter.type < Partition.CNTRY:
        if len(ter) > 2:
            return ", ".join(format_com(tu) for tu in sorted(ter)[:2]) + '...'
        return ", ".join(format_com(tu) for tu in sorted(ter))
    return ""


format_entity_label(ter)

'Pantin (Seine-Saint-Denis), Noisy-le-Sec (Seine-Saint-Denis)'

In [52]:
p = Territory.from_names("DEP:75", "COM:69132").parent()
p

Rhône|Île-de-France

In [54]:
[(d, d.tu_id) for d in p]

[(Île-de-France, 'REG:11'), (Rhône, 'DEP:69')]

In [12]:
import re

report = "@Document-8 je suis @Document-9 bb zsdkzpokdzld"

reverse_id_mapping = {"8" : "111111111", "9" : "222222222"}


re.sub(r'@Document-(\d+)', lambda m: f'{reverse_id_mapping[m.group(1)]}', report)


'111111111 je suis 222222222 bb zsdkzpokdzld'
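The lambda above raises a `KeyError` as soon as the report contains a placeholder whose number is missing from the mapping. A more tolerant variant, sketched here with a hypothetical `resolve` helper, leaves unknown placeholders untouched.

```python
import re

reverse_id_mapping = {"8": "111111111", "9": "222222222"}


def resolve(match):
    # Fall back on the original placeholder when the number is not in the mapping
    return reverse_id_mapping.get(match.group(1), match.group(0))


report = "@Document-8 je suis @Document-42"
print(re.sub(r"@Document-(\d+)", resolve, report))  # 111111111 je suis @Document-42
```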

In [6]:

report = """ArithmeticError

je suis une rppport md avec [des](lien)

"""


prefix = "https://platform.datapolitics.fr/territoires/"
replacer = lambda m: f'[{m.group(1)}]({prefix}{m.group(2)})' if not m.group(2).startswith(prefix) else m.group(0)
re.sub(r'\[([^\]]+)\]\((?:https?://)?([^)]+)\)', replacer, report).strip('`')

'ArithmeticError\n\nje suis une rppport md avec [des](https://platform.datapolitics.fr/territoires/lien)\n\n'

In [None]:
import re

message = "vrvfdv <|APPROVED|> TRUE\noekodk"


message.split("<|APPROVED|>")

['vrvfdv ', ' TRUE\noekodk']

In [28]:
import re

message = "vrvfdv \n<|APPROVED|>    TRUE"


m = re.findall(r"<\|APPROVED\|>([ a-zA-Z]+|$)", message, re.MULTILINE)
print(m)
any("TRUE" == matched.strip() for matched in m)

['    TRUE']


True

In [None]:
re.findall(r"<|APPROVED|>([ a-zA-Z]+)<|APPROVED|>", "<|APPROVED|> YES <|APPROVED|>")

['', '', ' YES ', '', '']
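The empty strings in this result come from the unescaped `|` characters, which the regex engine reads as alternation: the pattern matches `<`, `APPROVED` or `>` on their own, with an empty capture group. Escaping the tag, for instance with `re.escape`, restores the intended behaviour. The `TAG` name below is only illustrative.

```python
import re

# re.escape neutralises the pipes so the tag is matched literally
TAG = re.escape("<|APPROVED|>")

print(re.findall(TAG + r"([ a-zA-Z]+)" + TAG, "<|APPROVED|> YES <|APPROVED|>"))
# [' YES ']
```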