## Ce notebook sert à créer le modèle graphique suivant : 

**Format de l'entrée :** 
- 1ère ligne : $n, m$ deux entiers. $n$ le nombre points, $m$ nombre d'arrêtes.
- $n$ lignes suivantes : matrice $P \in [0, 1]^{n \times 6}$. $P_{ic} =$ probabilité que le points $i$ soit de label $c$ (d'après le classifier).
- $m$ lignes suivantes : arrêtes au format $i, j, d_{ij}$ (de type `int, int, float`)

On a $n \approx 10^6$ et $m \approx 5 n$

**Format de la sorties :**
- n lignes : $x \in \{1, ..., 6\}^n$ résulats de la minimisation de l'énergie $E(x)$ avec BP ou TRW. 

**Energie $E(x)$:**

$E(x) = \sum_{i=1}^n f(P_{i, x_i}) +  \sum_{(i, j) \in \mathcal E} g(d_{ij}) \mathbb 1_{x_i \neq x_j} $

Avec $f : [0, 1] \rightarrow \mathbb R$ décroissante et $g : \mathbb R_+ \rightarrow \mathbb R_+$ décroissante. Pour commencer on peut prendre : 

$E(x) = \sum_{i=1}^n - P_{i, x_i} +  \sum_{(i, j) \in \mathcal E} \alpha \frac{1}{d_{ij}} \mathbb 1_{x_i \neq x_j} $ avec $\alpha \in \mathbb R_+$ à régler 

Cette modélisation vient exprimer le fait suivant : "deux points qui sont proches ont une forte chance de partager le même label". Le but de cette modélisation/minimisation est de trouver des labels $x$ meilleurs que ceux du classifier (qui sont les $(\arg\max_c P_{ic})_i$).

**Score** :

Le score est calculer par rapports aux vrais labels $x^{true} \in \{0, 1, ..., 6\}^n$ (le label $0$ représente les "unclassifed". Les points "unclassifed" n'interviennent pas dans le score.) 

$\text{Score}(x^{true}, x) = \frac 1 6 \sum_{c = 1}^6 \text{IoU}(x^{true}, x, c)$ où $\text{IoU}$ signifie Intersection over Union : $$\text{IoU}(x^{true}, x, c) = \frac{ \{i, x^{true}_i = c\} \cap  \{i, x_i = c\} }{\{i, x^{true}_i = c\} \cup  \{i, x_i = c \text{ et } x^{true}_i \neq 0\}}$$

Pour l'instant, le classifier fait environ $0.45$ mais est très mauvais sur certaines classes (les piétons notamment).

In [1]:
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
import lightgbm as lgb
from multiprocessing import Pool

import pickle
from ply import write_ply, read_ply
from utils import neighborhood, min_radius_max_knns, local_PCA
import pandas as pd

In [2]:
label_names = {0: 'Unclassified', 1: 'Ground', 2: 'Building', 3: 'Poles',
               4: 'Pedestrians', 5: 'Cars', 6: 'Vegetation'}

class CloudDataset:
    
    def __init__(self, name, points, labels=None):
        self.name = name
        self.points = points
        self.labels = labels
    
    def compute_neighborhoods(self, radius=0.25, k=300):
        self.neighs, self.eigvals, self.eigvs = neighborhood(self.points, radius=0.25, k=300)
    
    def compute_features(self, feature_functions):
        features = []
        for func in feature_functions:
            f = func(self.neighs, self.eigvals, self.eigvs)
            if(f.ndim == 1):
                f = f.reshape(-1, 1)
            features.append(f)
        self.features = np.concatenate(features, axis=1)
       
    @property
    def X(self):
        if(self.labels is None):
            return self.features
        else:
            return self.features[self.labels > 0, :]
    
    @property
    def y(self):
        if(self.labels is None):
            return -np.ones(self.points.shape[0])
        else:
            return self.labels[self.labels > 0] - 1
    
    def save_neighborhoods(self, fname=None):
        if(fname is None):
            fname = f"data/{self.name}.pkl"
        data = [self.neighs, self.eigvals, self.eigvs]
        with open(fname, "wb") as f:
            pickle.dump(data, f)
            
    def load_neighborhoods(self, fname=None):
        if(fname is None):
            fname = f"data/{self.name}.pkl"
        with open(fname, "rb") as f:
            data = pickle.load(f)
        self.neighs, self.eigvals, self.eigvs = data
        
    def add_ground_distance_feature(self, ground_bidx=None):
        if(ground_bidx is None):
            ground_bidx = self.labels == 1
        ground = self.points[ground_bidx, :]
        eigvals, eigvects = local_PCA(ground)
        g0 = ground.mean(axis=0)
        points_grounded = (self.points - g0) @ eigvects
        self.features = np.concatenate((self.features, points_grounded[:, [2]]), axis=1)
        
    def unlabel(self):
        if(self.labels is not None):
            self.mem_labels = self.labels
            self.labels = None
        
    def relabel(self):
        if(self.labels is None):
            self.labels = self.mem_labels
        
    def __repr__(self):
        labeled = "unlabeled" if self.labels is None else "labeled"
        return f"CloudDataset<{self.name}, {self.points.shape[0]} points, {labeled}>"
    
class MultiDataset:
    
    def __init__(self, datasets, test_idx=1):
        self.datasets = datasets
        self.test_idx = test_idx
        
    def train_test_split(self):
        self.testset.unlabel()
        ys, Xs = [], []
        for dataset in self.trainsets:
            dataset.relabel()
            Xs.append(dataset.X)
            ys.append(dataset.y)
        
        X_train = np.concatenate(Xs, axis=0)
        y_train = np.concatenate(ys)
        return X_train, self.testset.X, y_train, self.testset.mem_labels
    
    @property
    def testset(self):
        return self.datasets[self.test_idx]
    
    @property
    def trainsets(self):
        datasets = self.datasets[:]
        datasets.pop(self.test_idx)
        return datasets

In [3]:
def load_points(fname):
    cloud_ply = read_ply(fname)
    points = np.vstack((cloud_ply['x'], cloud_ply['y'], cloud_ply['z'])).T
    if('class' in cloud_ply.dtype.fields):
        labels = cloud_ply['class']
        return points, labels
    else:
        return points

paris, paris_labels = load_points("data/MiniParis1.ply")
lille1, lille1_labels = load_points("data/MiniLille1.ply")
lille2, lille2_labels = load_points("data/MiniLille2.ply")
dijon = load_points("data/MiniDijon9.ply")
paris_wh = paris[:, 1] <= 20
paris1, paris1_label = paris[paris_wh], paris_labels[paris_wh]
paris2, paris2_label = paris[~paris_wh], paris_labels[~paris_wh]

datasets = [
    CloudDataset("paris1", paris1, paris1_label),
    CloudDataset("paris2", paris2, paris2_label),
    CloudDataset("lille1", lille1, lille1_labels),
    CloudDataset("lille2", lille2, lille2_labels),
    CloudDataset("dijon", dijon)
]
datasets

[CloudDataset<paris1, 1896865 points, labeled>,
 CloudDataset<paris2, 2262453 points, labeled>,
 CloudDataset<lille1, 1901853 points, labeled>,
 CloudDataset<lille2, 2500428 points, labeled>,
 CloudDataset<dijon, 3079187 points, unlabeled>]

In [4]:
recompute = False

if(recompute):
    for dataset in datasets:
        dataset.compute_neighborhoods(radius=0.15, k=100)
        dataset.save_neighborhoods()
        print(f"-- {dataset.name} neighborhoods computed --")
else:
    for dataset in datasets:
        dataset.load_neighborhoods()
        print(f"-- {dataset.name} neighborhoods loaded --")

-- paris1 neighborhoods loaded --
-- paris2 neighborhoods loaded --
-- lille1 neighborhoods loaded --
-- lille2 neighborhoods loaded --
-- dijon neighborhoods loaded --


In [16]:
def compute_4(an, eigvals, eigs):
    verticality = 2/np.pi * np.arcsin(np.abs(eigs[:, -1, -1]))
    linearity = 1 - eigvals[:, 1] / np.minimum(eigvals[:, 0], 1e-8)
    planarity = (eigvals[:, 1] - eigvals[:, 2]) / np.minimum(eigvals[:, 0], 1e-8)
    sphericity = eigvals[:, 2] / np.minimum(eigvals[:, 0], 1e-8)
    return np.vstack((verticality, linearity, planarity, sphericity)).T

def raw_eigenvalues(an, eigvals, eigs):
    return eigvals

def raw_eigenvector(an, eigvals, eigs):
    return eigs.reshape(-1, 9)

def density(ans, eigvals, eigs):
    return np.array([ns.size for ns in ans], dtype=float).reshape(-1, 1)

feature_functions = [compute_4, raw_eigenvalues, raw_eigenvector, density]

for dataset in datasets:
    dataset.compute_features(feature_functions)

  This is separate from the ipykernel package so we can avoid doing imports until
  after removing the cwd from sys.path.
  after removing the cwd from sys.path.
  """


In [6]:
# plt.figure(figsize=(6, 14))
# for i, data in enumerate(datas):
#     plt.subplot(5, 1, i+1)
#     plt.hist([ns.size for ns in data[0]], bins=100)

In [17]:
test_idx = 1
train_multidataset = MultiDataset(datasets[:-1], test_idx)
X_train, X_test, y_train, labels_test = train_multidataset.train_test_split()
np.bincount(labels_test)

array([199742, 790686, 501958,  14467,  10567,  37790, 707243])

In [44]:
num_round = 100
param = {'num_leaves': 31, 'max_depth': -1, 'objective': 'multiclass', 'num_class': 6, 'max_bin': 30}

train_data = lgb.Dataset(X_train, label=y_train)
bst = lgb.train(param, train_data, num_round)
y_pred = np.concatenate((np.zeros((X_test.shape[0], 1)), bst.predict(X_test)), axis=1) 
labels_pred = np.argmax(y_pred[:, 1:], axis=1) + 1

In [45]:
def print_results(labels_test, labels_pred):
    precisions = []
    recalls = []
    IoUs = []
    for c in range(1, 7):
        precisions.append(np.mean(labels_test[labels_pred == c] == c))
        recalls.append(np.mean(labels_pred[labels_test == c] == c))
        IoUs.append(np.sum((labels_pred == c) & (labels_test == c)) / \
                    np.sum(((labels_pred == c) & (labels_test != 0)) | (labels_test == c)))
        

    return pd.DataFrame({"precision": precisions, "recall": recalls, "IoU": IoUs}, 
                        index=[str(i) + " - " + label_names[i] for i in range(1, 7)])

res = print_results(labels_test, labels_pred)
print("Score:", res["IoU"].mean())
res

Score: 0.45656655547717073


Unnamed: 0,precision,recall,IoU
1 - Ground,0.944945,0.965382,0.924908
2 - Building,0.626729,0.861086,0.714446
3 - Poles,0.290129,0.207645,0.140861
4 - Pedestrians,0.082372,0.018927,0.01632
5 - Cars,0.510388,0.150172,0.139634
6 - Vegetation,0.851386,0.892152,0.80323


In [34]:
# c = 4
# idxs_c = np.argsort(y_pred[:, c])[::-1][:10]
# print(np.bincount(labels_test[idxs_c]) / idxs_c.size)
# print(np.bincount(labels_test[idxs_c])[c] / idxs_c.size)
# plt.plot(y_pred[idxs_c, c])