# LiDAR Point Cloud Preprocessing Pipeline

This notebook takes **lvx2 data** containing LiDAR point clouds as input, and performs the following pipeline to extract player point clouds and generate bird’s-eye view representations.

---

## Pipeline Overview

1. **Per-frame point cloud extraction**  
2. **Per-LiDAR extrinsic matrix application and integration**  
   - Calibration between multiple LiDARs  
3. **Alignment of the integrated point cloud with the world coordinate system**  
   - Court coordinates (file name) as the reference  
   - Calibration between LiDAR coordinates and world coordinates  
4. **Coordinate-based filtering to remove unnecessary point clouds**  
   - Elimination of gymnasium structures  
   - Apply file-specific hyperparameters (【file name】)  
5. **Bird’s-eye view projection**  
   - Projection using only x and y coordinates  
6. **Object detection on bird’s-eye view (batch processing with YOLO)**  
7. **Conversion between bounding box image coordinates and world coordinates**  

---

## Note on Pipeline Output

The final output of the pipeline is a CSV file containing:  

- **Detections in image coordinates** (used as input for the tracker)  
- **Detections in world coordinates** (used for LiDAR–camera detection matching)  
- **[frame, confidence] values** (used for detection filtering)  

---
## Additional Notes on Processing

- The pipeline steps (1–7) are repeated on a per-frame basis.  
- Due to the gap between the LVX2 DataFrame storage scheme and the LiDAR device’s FPS, even-numbered frames often contain very sparse point clouds.  
- To address this, odd and even frames are integrated, effectively making the processing cycle operate in 2-frame units.



In [None]:
from ultralytics import YOLO
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cv2
from mpl_toolkits.mplot3d import Axes3D
from typing import Dict, Union, List, Tuple, Optional
import os
import sys
from natsort import natsorted
import json
import struct
from scripts.frame_calibration import extract_synchronized_images
import csv
from collections import defaultdict

### 1. Per-frame point cloud extraction

In [2]:
class LivoxFileReader_v2:
    def __init__(self, file_name):
        self.file_name = file_name
        self.data = self._read_file()  # インスタンスの初期化時に一度だけファイルを読み込む

    def _read_file(self):
        """ファイル全体を読み込み、バイナリデータとして保持する"""
        with open(self.file_name, 'rb') as rf:
            return rf.read()

    def get_lidar_ids(self):
        """読み込んだデータからLiDAR IDsを取得する関数"""
        lidar_ids = []
        Idx = 28
        device_count = self.data[Idx]
        Idx += 1
        for i in range(device_count):
            lidar_id = struct.unpack("=i", self.data[Idx+32:Idx+36])[0]
            lidar_ids.append(lidar_id)
            Idx += 63
        print("DeviceCount", device_count)
        return lidar_ids

    def get_frameInfo_list(self, frame_indices=None):
        """
        指定されたフレーム番号の情報を取得する
        frame_indices: 取得したいフレーム番号のリスト（Noneの場合は全フレームを取得）
        """
        frameInfo_list = []
        Idx = 28
        device_count = self.data[Idx]
        Idx += 1 + device_count * 63  # デバイス情報をスキップ
        file_size = len(self.data)

        frame_counter = 0
        selected_frames = set(frame_indices) if frame_indices is not None else None
        found_frames = set()   # ← 追加: 見つかった番号を記録

        while Idx < file_size:
            if selected_frames is None or frame_counter in selected_frames:
                current_frame_start = struct.unpack("=q", self.data[Idx:Idx+8])[0]
                next_frame_start = struct.unpack("=q", self.data[Idx+8:Idx+16])[0]
                frameInfo_list.append([current_frame_start, next_frame_start])
                found_frames.add(frame_counter)   # ← 追加: 発見

                if selected_frames and frame_counter >= max(selected_frames):
                    break

            # 次のフレームの位置に移動
            Idx = struct.unpack("=q", self.data[Idx+8:Idx+16])[0]
            frame_counter += 1

        # ← 追加: リクエストしたけど見つからなかったフレームをログ出力
        if selected_frames:
            missing = selected_frames - found_frames
            if missing:
                print(f"[WARNING] 以下のフレーム番号はファイル中に存在しません: {sorted(missing)}")

        return np.array(frameInfo_list)
    
    def livoxFileReadPointForEachLidar(self, frameInfo, lidar_id):
        d = self.data
        # print("current frame", frame[0])
        pointsPerFrame = []
        # フレームヘッダ分を処理
        Idx = frameInfo[0] + 24
        # 点群データブロック処理
        while Idx < frameInfo[1] :
            # print("lidar_id", struct.unpack("=i", d[Idx+1:Idx+5])[0])
            # if struct.unpack("=i", d[Idx+1:Idx+5])[0] == lidar_id:
            # print("current lidar_id", struct.unpack("=i", d[Idx+1:Idx+5])[0])
            current_lidar_id = struct.unpack("=i", d[Idx+1:Idx+5])[0]
            dtype = d[Idx + 17]
            Idx = Idx + 27
            if dtype==1:
                # print("dtype",1)
                for i in range(96):
                    B2D_X = struct.unpack("=l", d[Idx:Idx+4])[0]
                    B2D_Y = struct.unpack("=l", d[Idx+4:Idx+8])[0]
                    B2D_Z = struct.unpack("=l", d[Idx+8:Idx+12])[0]
                    # print(B2D_X)
                    point_tmp = [B2D_X, 
                                B2D_Y,
                                B2D_Z
                                    ]
                    # print(B2D_X, B2D_Y, B2D_X)
                    # print(len(point_tmp))
                    if current_lidar_id == lidar_id:
                        pointsPerFrame.append(point_tmp)
                    # print("point_tmp", point_tmp)
                    Idx = Idx + 14
            elif dtype==2:
                print("dtype",2)
        return pointsPerFrame

### 2. Per-LiDAR extrinsic matrix application and integration

In [3]:
class PointCloudTransformer:
    def __init__(self, transformation_matrix_file, lidar_ids):
        self.transformation_matrices = self._load_transformation_matrices(transformation_matrix_file, lidar_ids)

    def _load_transformation_matrices(self, file_name, lidar_ids):
        """変換行列を読み込む（lidar_idsを使用して辞書形式でマッピング）"""
        matrices = {}
        with np.load(file_name) as data:
            for i, lidar_id in enumerate(lidar_ids):
                key = f'transformation_pcd{i+1}'
                matrices[lidar_id] = data[key]
        return matrices

    def apply_transformation(self, points, lidar_id):
        """指定されたLiDAR IDに対応する変換行列を適用して点群を変換"""
        transformation_matrix = self.transformation_matrices[lidar_id]
        transformed_points = np.dot(points, transformation_matrix[:3, :3].T) + transformation_matrix[:3, 3]
        return transformed_points

### 3. Alignment of the integrated point cloud with 

In [None]:
def alignment_points_to_world(LiDAR_points):
        world_points  = np.column_stack((
            merged_points[:,1] - 600,
            -merged_points[:,0] + 7600,
            merged_points[:,2]
        )) / 1000

        return

### 4. Coordinate-based filtering to remove unnecessary point clouds

In [4]:
def crop_points_by_court(points: np.ndarray, landmark_csv: str, z_range: Tuple[float, float] = None) -> np.ndarray:
    """
    コートの4隅の座標を取得し、点群をその範囲内にフィルタリングする。
    :param points: 世界座標系の点群 (N,3)
    :param landmark_csv: コートのランドマークを含むCSVファイル
    :param z_range: (z_min, z_max) の範囲を手動で指定（Noneの場合は適用しない）
    :return: フィルタ後の点群
    """
    landmarks = pd.read_csv(landmark_csv)
    x_min, x_max = landmarks['x'].min(), landmarks['x'].max()
    y_min, y_max = landmarks['y'].min(), landmarks['y'].max()
    
    mask = (
        (points[:, 0] >= x_min) & (points[:, 0] <= x_max) &
        (points[:, 1] >= y_min) & (points[:, 1] <= y_max)
    )
    
    # z範囲のフィルタリングを追加
    if z_range is not None:
        z_min, z_max = z_range
        mask &= (points[:, 2] >= z_min) & (points[:, 2] <= z_max)
    
    return points[mask]

### 5. Bird’s-eye view projection

In [5]:
def generate_birdseye_view(points: np.ndarray, landmark_csv: str, target_size: int = 640) -> np.ndarray:
    """
    点群を2D鳥観図画像に変換（座標範囲に基づきアスペクト比を維持）
    :param points: 世界座標系の点群 (N,3)
    :param landmark_csv: コートのランドマークを含むCSVファイル
    :param target_size: 出力画像サイズ（デフォルト: 長辺が640）
    :return: 2D 鳥観図 (numpy 配列)
    """
    # コートの四隅の座標を取得
    landmarks = pd.read_csv(landmark_csv)
    x_min, x_max = landmarks['x'].min(), landmarks['x'].max()
    y_min, y_max = landmarks['y'].min(), landmarks['y'].max()
    
    # スケール計算（アスペクト比を維持するため最小のスケールを採用）
    scale_x = target_size / (x_max - x_min)
    scale_y = target_size / (y_max - y_min)
    scale = min(scale_x, scale_y)
    
    width = int((x_max - x_min) * scale)
    height = int((y_max - y_min) * scale)
    print(f"width:{width}")
    print(f"height:{height}")
    img = np.ones((height, width, 3), dtype=np.uint8) * 255  # 白背景
    
    # 座標を画像サイズに変換（整数化の際にroundを使用）
    x_img = np.round((points[:, 0] - x_min) * scale).astype(int)
    y_img = height - np.round((points[:, 1] - y_min) * scale).astype(int) - 1

    # 画像範囲外の座標をクリップ
    x_img = np.clip(x_img, 0, width - 1)
    y_img = np.clip(y_img, 0, height - 1)
    
    img[y_img, x_img] = (255, 0, 0)  # 青色の点を描画

    return img

### 6. Object detection on bird’s-eye view (batch processing with YOLO)

In [16]:
def batch_detect_objects_yolo_fast(
    images: List[np.ndarray],
    model: YOLO,
    conf_threshold: float = 0.25,
    batch_size: int = 4
) -> List[List[Tuple[int, Tuple[float, float, float, float], float]]]:
    """
    images: まとめて処理したい画像リスト
    model: 外で一度だけ YOLO(...) しておいたインスタンス
    conf_threshold: 信頼度閾値
    batch_size: 一度に何枚ずつ推論するか
    戻り値: 各画像ごとの検出結果リストのリスト
    """
    all_detections: List[List[Tuple[int, Tuple[float, float, float, float], float]]] = []

    # バッチごとにスライスして推論
    for start in range(0, len(images), batch_size):
        batch_imgs = images[start:start + batch_size]
        # ultralytics はリスト渡しで自動バッチ推論
        results = model(batch_imgs, conf=conf_threshold)

        # 各画像結果をまとめる
        for res in results:
            # GPU→CPU をまとめて一回だけ
            xywhs  = res.boxes.xywh.cpu().numpy()           # shape=(N,4)
            confs  = res.boxes.conf.cpu().numpy()           # shape=(N,)
            clss   = res.boxes.cls.cpu().numpy().astype(int) # shape=(N,)

            dets: List[Tuple[int, Tuple[float,float,float,float], float]] = []
            for (x, y, w, h), conf, cls_id in zip(xywhs, confs, clss):
                dets.append((cls_id, (x, y, w, h), float(conf)))

            all_detections.append(dets)

    return all_detections


### 7. Conversion between bounding box image coordinates and world coordinates

In [None]:
def extract_class0_detections(
    detections: List[Tuple[int, Tuple[float, float, float, float], float]]
) -> List[Tuple[float, Tuple[float, float, float, float]]]:
    """
    クラスID=0 の検出だけを取り出し、
    (confidence, (x_center, y_center, w, h)) のタプルリストで返す。
    :param detections: List[(class_id, (x_center, y_center, w, h), confidence)]
    :return: List[(confidence, (x_center, y_center, w, h))]
    """
    return [
        (conf, bbox)
        for (cls_id, bbox, conf) in detections
        if cls_id == 0
    ]

def generate_occlusion_bboxes(bboxes: List[Tuple[float, float, float, float]]) -> List[Tuple[float, float, float, float]]:
    """
    重なり（IoU > 0）のあるbboxグループを見つけて、それらを囲むocclusion_bboxを返す。
    出力はYOLO形式 (x_center, y_center, width, height)
    """
    n = len(bboxes)
    visited = [False] * n
    groups = []

    def dfs(i, group):
        visited[i] = True
        group.append(bboxes[i])
        for j in range(n):
            if not visited[j] and iou(bboxes[i], bboxes[j]) > 0:
                dfs(j, group)

    for i in range(n):
        if not visited[i]:
            group = []
            dfs(i, group)
            if len(group) > 1:  # 少なくとも1つは重なってる
                groups.append(group)

    # 各グループから囲むbboxを生成
    occlusion_bboxes = []
    for group in groups:
        x_mins = []
        y_mins = []
        x_maxs = []
        y_maxs = []

        for box in group:
            x, y, w, h = box
            x_mins.append(x - w/2)
            y_mins.append(y - h/2)
            x_maxs.append(x + w/2)
            y_maxs.append(y + h/2)

        x_min = min(x_mins)
        y_min = min(y_mins)
        x_max = max(x_maxs)
        y_max = max(y_maxs)

        x_center = (x_min + x_max) / 2
        y_center = (y_min + y_max) / 2
        width = x_max - x_min
        height = y_max - y_min

        occlusion_bboxes.append((x_center, y_center, width, height))

    return occlusion_bboxes

In [9]:
def project_bbox_to_world(bboxes: List[Tuple[float, float, float, float]], lidar_points: np.ndarray) -> np.ndarray:
    """
    BBoxを世界座標系に射影し、ボクセルの範囲 (x_min, y_min, z_min, x_max, y_max, z_max) を返す。
    :param bboxes: BBoxリスト (x, y, w, h)
    :param lidar_points: LiDAR点群
    :return: ボクセルの範囲リスト (M,6) (x_min, y_min, z_min, x_max, y_max, z_max)
    """
    voxel_bounds = []

    for bbox_x, bbox_y, bbox_w, bbox_h in bboxes:
        mask = (
            (lidar_points[:, 0] >= bbox_x - bbox_w / 2) &
            (lidar_points[:, 0] <= bbox_x + bbox_w / 2) &
            (lidar_points[:, 1] >= bbox_y - bbox_h / 2) &
            (lidar_points[:, 1] <= bbox_y + bbox_h / 2)
        )
        points_in_bbox = lidar_points[mask]
        
        if len(points_in_bbox) > 0:
            min_z = np.min(points_in_bbox[:, 2])  # BBox内の最小Z座標
            max_z = np.max(points_in_bbox[:, 2])  # BBox内の最大Z座標

            x_min, x_max = bbox_x - bbox_w / 2, bbox_x + bbox_w / 2
            y_min, y_max = bbox_y - bbox_h / 2, bbox_y + bbox_h / 2
            z_min, z_max = min_z, max_z  # 実際の高さ範囲を適用
            
            voxel_bounds.append((x_min, y_min, z_min, x_max, y_max, z_max))
    
    return np.array(voxel_bounds)


In [10]:
def visualize_points_landmarks_voxels(world_points: np.ndarray, landmark_csv: str, voxels: Optional[np.ndarray] = None, show_3D: bool = False,
    elev: int = 30,
    azim: int = 45,
    voxel_edge_color: str = 'green'
):
    """
    点群、ランドマーク、ボクセルのみを可視化する。

    Parameters
    ----------
    world_points : np.ndarray (N x 3)
        ワールド座標系の点群 (各行が [X, Y, Z])。show_3D=False の場合は Z を無視して XY 平面上にプロット。
    landmark_csv : str
        ランドマーク CSV ファイルへのパス。カラムに 'x','y','z' が含まれることを想定。
    voxels : np.ndarray (M x 6), optional
        各行が (x_min, y_min, z_min, x_max, y_max, z_max) を表す行列。3D 表示時にワイヤーフレームで描画。
    show_3D : bool, default=False
        False の場合は 2D (X–Y 平面) プロット、True の場合は 3D プロット。
    elev : int, default=30
        show_3D=True のときの仰角（elevation）。
    azim : int, default=45
        show_3D=True のときの方位角（azimuth）。
    voxel_edge_color : str, default='green'
        ボクセルのワイヤーフレームを描くときの線色。
    """

    # ──── ランドマーク読み込み ────
    landmarks = pd.read_csv(landmark_csv)
    x_landmarks = landmarks['x'].to_numpy()
    y_landmarks = landmarks['y'].to_numpy()
    # 3D なら z も使う
    z_landmarks = landmarks['z'].to_numpy()

    # ──── 可視化準備 ────
    fig = plt.figure(figsize=(9, 9))
    if show_3D:
        ax = fig.add_subplot(111, projection='3d')
        ax.view_init(elev=elev, azim=azim)
    else:
        ax = fig.add_subplot(111)

    # ──── ランドマークをプロット ────
    if show_3D:
        ax.scatter(x_landmarks, y_landmarks, z_landmarks,
                   c='orange', label='Landmarks', s=20)
    else:
        ax.scatter(x_landmarks, y_landmarks,
                   c='orange', label='Landmarks', s=20)

    # ──── 点群をプロット ────
    if show_3D:
        ax.scatter(world_points[:, 0], world_points[:, 1], world_points[:, 2],
                   c='blue', s=1, label='World Points')
    else:
        ax.scatter(world_points[:, 0], world_points[:, 1],
                   c='blue', s=1, label='World Points')

    # ──── ボクセル（ワイヤーフレーム）をプロット ────
    if voxels is not None and len(voxels) > 0 and show_3D:
        # 各ボクセルについて、8 頂点を組み立て → 12 辺を繋いでワイヤーフレーム表示
        for x_min, y_min, z_min, x_max, y_max, z_max in voxels:
            vertices = np.array([
                [x_min, y_min, z_min],
                [x_max, y_min, z_min],
                [x_max, y_max, z_min],
                [x_min, y_max, z_min],
                [x_min, y_min, z_max],
                [x_max, y_min, z_max],
                [x_max, y_max, z_max],
                [x_min, y_max, z_max]
            ])
            edges = [
                (0, 1), (1, 2), (2, 3), (3, 0),  # 下底面
                (4, 5), (5, 6), (6, 7), (7, 4),  # 上底面
                (0, 4), (1, 5), (2, 6), (3, 7)   # 側面
            ]
            for (i, j) in edges:
                p1 = vertices[i]
                p2 = vertices[j]
                ax.plot3D(
                    [p1[0], p2[0]],
                    [p1[1], p2[1]],
                    [p1[2], p2[2]],
                    color=voxel_edge_color
                )

    # ──── 軸スケール調整（3D のときのみ） ────
    if show_3D:
        # 各軸範囲をそろえて正方立方体のように表示する
        x_limits = ax.get_xlim()
        y_limits = ax.get_ylim()
        z_limits = ax.get_zlim()
        max_range = max(
            x_limits[1] - x_limits[0],
            y_limits[1] - y_limits[0],
            z_limits[1] - z_limits[0]
        ) / 2.0
        mid_x = (x_limits[1] + x_limits[0]) / 2.0
        mid_y = (y_limits[1] + y_limits[0]) / 2.0
        mid_z = (z_limits[1] + z_limits[0]) / 2.0
        ax.set_xlim(mid_x - max_range, mid_x + max_range)
        ax.set_ylim(mid_y - max_range, mid_y + max_range)
        ax.set_zlim(mid_z - max_range, mid_z + max_range)

    # ──── 凡例・タイトル ────
    ax.legend()
    plt.title("Points + Landmarks + Voxels Visualization")
    plt.show()


### Input_filepaths

In [None]:
landmark_csv = "aoki_project/data/read_only/ref_position_handy_laser.csv" 
lidar_model_path = "aoki_project/assets/models/0624_near_only.pt" 
matrices_file = "aoki_project/data/interim/lidar_extrinsics/1127.npz" 
lvx_file = "aoki_project/assets/pcap_data/1stday_3rdmeasure_2024-11-27_11-49-26.lvx2" 
near_end_path = "aoki_project/data/read_only/frames_img/near-end/lidar" 
far_end_path = "aoki_project/data/read_only/frames_img/far-end/lidar" 

## パイプライン実行部分

In [None]:
from scripts.file_dict_generator import generate_file_dict
from scripts.filename_list_generator import collect_sorted_image_filenames

near_1_1_dict = {'00523-01000': [1814, 1971], 
                '02262-02493': [2386, 2461], 
                '06815-07138': [3881, 3987], 
                '07831-08092': [4215, 4301], 
                '11757-12355': [5499, 5696], 
                '12703-13514': [5814, 6080], 
                '17981-18423': [7547, 7692], 
                '18851-19317': [7834, 7984]}

near_1_3_dict = {'00754-01276': [915, 1086], 
                '01943-02233': [1307, 1402], 
                '02987-03451': [1647, 1799], 
                '04060-04466': [2002, 2135], 
                '08323-08787': [3398, 3551], 
                '14094-14413': [5290, 5395], 
                '16298-16588': [6019, 6114], 
                '18473-18763': [6780, 6828], 
                '24708-25143': [8778, 8921], 
                '27405-27650': [9665, 9745]}

measurement_list = ["measurement1", "measurement2", "measurement3"]

file_dict_list = {
    "day1-1" : None,
    "day2-2" : None,
    "day1-3" : None
}

for key, measurement in zip(file_dict_list.keys(), measurement_list):
    file_dict_list[key] = generate_file_dict(near_end_path, measurement)

print(f"file_dict_list : {file_dict_list}")

def convert_to_lvx_frames(frames: List[int], day: int) -> range:
    # LVX フレームは元フレームの2倍を中心に,微妙な補正がかかる

    if day == 1:
        calibrate_number = 0
    elif day == 2:
        calibrate_number = 2

    start = frames[0] * 2 + calibrate_number
    stop  = frames[-1] * 2 + 2 + calibrate_number

    return range(start, stop)

frame_indices = []

#前処理対象のデータを変更したい場合、ここから↓を弄る

#デフォルト：mesurementごとにまとめて前処理したい場合
for folder_number in  file_dict_list["day1-3"]:
    file_range = file_dict_list["day1-3"][folder_number]
    frame_indice = convert_to_lvx_frames(file_range, 1)
    print(frame_indice)
    frame_indices += frame_indice

print(f"frame_indices:{frame_indices}")

#オプション：直接フレーム範囲を指定したいとき用
file_range = [1830, 1850]
frame_indices = convert_to_lvx_frames(file_range, 1)
print(frame_indices)

filename_indices = [x  // 2  for x in frame_indices[::2]]
print(filename_indices)

print(file_dict_list["day1-3"])
filename_list = collect_sorted_image_filenames(near_end_path, file_dict_list["day1-3"])
print(filename_list)

file_dict_list : {'day1-1': {'00523-01000': [1814, 1971], '02262-02493': [2386, 2461], '06815-07138': [3881, 3987], '07831-08092': [4215, 4301], '11757-12355': [5499, 5696], '12703-13514': [5814, 6080], '17981-18423': [7547, 7692], '18851-19317': [7834, 7984]}, 'day2-2': {'00290-00899': [1584, 1783], '04148-04408': [2850, 2935], '07628-07882': [3991, 4075], '10891-11108': [5064, 5135], '13631-14035': [5966, 6099]}, 'day1-3': {'00754-01276': [915, 1086], '01943-02233': [1307, 1402], '02987-03451': [1647, 1799], '04060-04466': [2002, 2135], '08323-08787': [3398, 3551], '14094-14413': [5290, 5395], '16298-16588': [6019, 6114], '18473-18763': [6780, 6828], '24708-25143': [8778, 8921], '27405-27650': [9665, 9745]}}
range(1830, 2174)
range(2614, 2806)
range(3294, 3600)
range(4004, 4272)
range(6796, 7104)
range(10580, 10792)
range(12038, 12230)
range(13560, 13658)
range(17556, 17844)
range(19330, 19492)
frame_indices:[1830, 1831, 1832, 1833, 1834, 1835, 1836, 1837, 1838, 1839, 1840, 1841, 184

In [None]:
import time
from ultralytics import YOLO
import numpy as np
import pandas as pd

# ──────────────────────────────────────────────────────────
# Variable about detection process
BATCH_SIZE        = 4
lidar_yolo_model = YOLO(lidar_model_path)

# ──────────────────────────────────────────────────────────
# Create instance for load point cloud and transformation
LvxReader = LivoxFileReader_v2(lvx_file)
frameInfo = LvxReader.get_frameInfo_list(frame_indices=frame_indices)
lidar_ids = LvxReader.get_lidar_ids()
lidar_ids[0],lidar_ids[1],lidar_ids[2] = lidar_ids[1],lidar_ids[2],lidar_ids[0]
print(lidar_ids)
pcdTransformer = PointCloudTransformer(matrices_file, lidar_ids)
points_one_frame = {lidar_id: [] for lidar_id in lidar_ids}

# ──────────────────────────────────────────────────────────
# Timer_variable
total_livox_time    = 0.0    # Livox 読み込みのみの累積時間
total_other_time    = 0.0    # その他処理の累積時間
total_append_count  = 0      # all_detections.append 呼び出し回数

batch_images     = []
batch_frame_idxs = []

# ──────────────────────────────────────────────────────────

# Per-frame loop
for i, frame in enumerate(frameInfo):
    if i % 2 == 0:
        transformed_points = []

    # 1. Per-frame point cloud extraction
    for index, lidar_id in enumerate(lidar_ids):
        t0 = time.perf_counter()
        pts = LvxReader.livoxFileReadPointForEachLidar(frame, lidar_id)
        t1 = time.perf_counter()
        total_livox_time += (t1 - t0)

    # 2. Per-LiDAR extrinsic matrix application and integration
    for index, lidar_id in enumerate(lidar_ids):
        points_one_frame[lidar_id] = np.array(pts)
        transformed_points.append(
            pcdTransformer.apply_transformation(points_one_frame[lidar_id], lidar_id)
        )

    # integration of even-frame and odd-frame point clouds
    if i % 2 != 0:
        t_other_start = time.perf_counter()
        merged_points = np.vstack(transformed_points)

        # 3. Alignment of the integrated point cloud with the world coordinate system
        world_points = alignment_points_to_world(merged_points)

        # 4. Coordinate-based filtering to remove unnecessary point clouds
        landmarks       = pd.read_csv(landmark_csv)
        court_x_min, court_x_max = landmarks['x'].min(), landmarks['x'].max()
        court_y_min, court_y_max = landmarks['y'].min(), landmarks['y'].max()
        cropped_points = crop_points_by_court(
            world_points, landmark_csv, z_range=(0.5, 2.5)
        )

        # 5. Bird’s-eye view projection
        bev = generate_birdseye_view(cropped_points, landmark_csv)

        # 6. Object detection on bird’s-eye view (batch processing with YOLO)
        batch_images.append(bev)
        batch_frame_idxs.append(filename_indices[int((i-1)/2)])
        if len(batch_images) == BATCH_SIZE:
            batch_results = batch_detect_objects_yolo_fast(
                batch_images,
                lidar_yolo_model,
                conf_threshold=0.25,
                batch_size=BATCH_SIZE
            )

            # 7. Conversion between bounding box image coordinates and world coordinates
            for frame_idx, dets in zip(batch_frame_idxs, batch_results):
                detections_frames.append(dets)
                class0_detections = extract_class0_detections(dets)

                original_birdview_width  = 640
                original_birdview_height = 343
                scale_x = original_birdview_width  / (court_x_max - court_x_min)
                scale_y = original_birdview_height / (court_y_max - court_y_min)
                scale   = min(scale_x, scale_y)

                for confidence, (x_img, y_img, w_img, h_img) in class0_detections:
                    flipped_y = original_birdview_height - y_img
                    bbox_x_world = (x_img   / scale) + court_x_min
                    bbox_y_world = (flipped_y / scale) + court_y_min
                    bbox_w_world = w_img   / scale
                    bbox_h_world = h_img   / scale

                    single_world_bbox = [
                        (bbox_x_world, bbox_y_world, bbox_w_world, bbox_h_world)
                    ]
                    voxel_bounds = project_bbox_to_world(
                        single_world_bbox, cropped_points
                    )
                    bbox_world_voxel = (
                        tuple(voxel_bounds[0]) if voxel_bounds.shape[0] == 1 else None
                    )

                    all_detections.append({
                        'frame':      frame_idx,
                        'confidence': confidence,
                        'bbox_img':   (x_img, y_img, w_img, h_img),
                        'bbox_world': bbox_world_voxel
                    })
                    total_append_count += 1

            batch_images.clear()
            batch_frame_idxs.clear()

        t_other_end = time.perf_counter()
        total_other_time += (t_other_end - t_other_start)

# ---- Remaining batches after the loop are processed in the same process ----
if batch_images:
    t_other_start = time.perf_counter()

    batch_results = batch_detect_objects_yolo_fast(
        batch_images,
        lidar_yolo_model,
        conf_threshold=0.25,
        batch_size=BATCH_SIZE
    )

    for frame_idx, dets in zip(batch_frame_idxs, batch_results):
        detections_frames.append(dets)
        class0_detections = extract_class0_detections(dets)

        original_birdview_width  = 640
        original_birdview_height = 343
        scale_x = original_birdview_width  / (court_x_max - court_x_min)
        scale_y = original_birdview_height / (court_y_max - court_y_min)
        scale   = min(scale_x, scale_y)

        for confidence, (x_img, y_img, w_img, h_img) in class0_detections:
            flipped_y = original_birdview_height - y_img
            bbox_x_world = (x_img   / scale) + court_x_min
            bbox_y_world = (flipped_y / scale) + court_y_min
            bbox_w_world = w_img   / scale
            bbox_h_world = h_img   / scale

            single_world_bbox = [
                (bbox_x_world, bbox_y_world, bbox_w_world, bbox_h_world)
            ]
            voxel_bounds = project_bbox_to_world(
                single_world_bbox, cropped_points
            )
            bbox_world_voxel = (
                tuple(voxel_bounds[0]) if voxel_bounds.shape[0] == 1 else None
            )

            all_detections.append({
                'frame':      frame_idx,
                'confidence': confidence,
                'bbox_img':   (x_img, y_img, w_img, h_img),
                'bbox_world': bbox_world_voxel
            })
            total_append_count += 1

    t_other_end = time.perf_counter()
    total_other_time += (t_other_end - t_other_start)

# ──────────────────────────────────────────────────────────
# Timer_result
print(f"LiDAR読み込み累積時間: {total_livox_time:.3f} 秒")
print(f"その他処理累積時間    : {total_other_time:.3f} 秒")
print(f"all_detections追加回数 : {total_append_count}")
print(
    f"1要素追加あたりの平均時間: "
    f"{(total_other_time/total_append_count)*1000:.3f} ms"
)

width:640
height:343
width:640
height:343
width:640
height:343
width:640
height:343

0: 352x640 12 Persons, 3.2ms
1: 352x640 13 Persons, 3.2ms
2: 352x640 11 Persons, 3.2ms
3: 352x640 13 Persons, 3.2ms
Speed: 5.6ms preprocess, 3.2ms inference, 0.8ms postprocess per image at shape (1, 3, 352, 640)
width:640
height:343
width:640
height:343
width:640
height:343
width:640
height:343

0: 352x640 16 Persons, 3.0ms
1: 352x640 13 Persons, 3.0ms
2: 352x640 14 Persons, 3.0ms
3: 352x640 14 Persons, 3.0ms
Speed: 1.1ms preprocess, 3.0ms inference, 0.8ms postprocess per image at shape (1, 3, 352, 640)
width:640
height:343
width:640
height:343
width:640
height:343
width:640
height:343

0: 352x640 13 Persons, 3.0ms
1: 352x640 15 Persons, 3.0ms
2: 352x640 13 Persons, 3.0ms
3: 352x640 18 Persons, 3.0ms
Speed: 1.1ms preprocess, 3.0ms inference, 0.8ms postprocess per image at shape (1, 3, 352, 640)
width:640
height:343
width:640
height:343
width:640
height:343
width:640
height:343

0: 352x640 13 Persons, 3

  scale_y = original_birdview_height / (court_y_max - court_y_max)


<pstats.Stats at 0x1460620a0c90>

### CSV File Export Process

In [None]:
def write_labels_for_class0_pix(
    detections_img: List[List[Tuple[int, Tuple[float, float, float, float], float]]],
    detections_world: List[np.ndarray],
    image_shape: Tuple[int, int, int],
    save_label_path: str,
    filename_indices: List[int]
) -> None:
    """
    CSV 形式で書き出し：
    cls_id, x1, y1, w, h, confidence, frame, x_min_world, y_min_world, z_min_world, x_max_world, y_max_world, z_max_world
    すべてピクセル座標（x1,y1,w,h は int）として出力します。

    save_label_path にはファイル名（例: "out/my_labels.csv"）を含めて指定してください。
    """
    # save_label_path のディレクトリ部分を取り出し、存在しなければ作成
    dir_path = os.path.dirname(save_label_path)
    if dir_path:
        os.makedirs(dir_path, exist_ok=True)

    csv_path = save_label_path

    header = [
        "cls_id",
        "x1", "y1", "w", "h",
        "confidence",
        "frame",
        "x_min_world", "y_min_world", "z_min_world",
        "x_max_world", "y_max_world", "z_max_world"
    ]
    with open(csv_path, "w", newline="", encoding="utf-8") as fw:
        writer = csv.writer(fw)
        writer.writerow(header)

        img_h, img_w = image_shape[:2]
        for dets, world_arr, frame_idx in zip(detections_img, detections_world, filename_indices):
            for (cls_id, (x_c, y_c, w_pix, h_pix), conf), world in zip(dets, world_arr):
                if cls_id != 0:
                    continue
                # 中心座標→左上座標に変換して丸め
                x1 = int(round(x_c - w_pix / 2))
                y1 = int(round(y_c - h_pix / 2))
                w  = int(round(w_pix))
                h  = int(round(h_pix))
                writer.writerow([
                    cls_id,
                    x1, y1, w, h,
                    f"{conf:.6f}",
                    frame_idx,
                    f"{world[0]:.6f}", f"{world[1]:.6f}", f"{world[2]:.6f}",
                    f"{world[3]:.6f}", f"{world[4]:.6f}", f"{world[5]:.6f}",
                ])

    print(f"[PIXEL CSV] saved {csv_path}")

[Timing] Cell [16] → 0.003 sec


In [None]:
filename_list = list(near_1_3_dict)

# 事前に定義しておく
image_shape = (343, 640, 3)

for i,rd in enumerate(runs_data):
    output_csv_path = f"official_time_test/detection_csv/{filename_list[i]}.csv"
    write_labels_for_class0_pix(
        detections_img   = rd['detections_img'],
        detections_world = rd['detections_world'],
        image_shape      = image_shape,
        save_label_path  = output_csv_path,
        filename_indices = rd['filename_indices'],
    )
    print(f"[PIXEL CSV] saved frames {rd['start']}-{rd['end']} → {path}")


[PIXEL CSV] saved official_time_test/detection_csv/00754-01276.csv
[PIXEL CSV] saved frames 915-1086 → official_time_test/detection_csv/00754-01276.csv
[PIXEL CSV] saved official_time_test/detection_csv/01943-02233.csv


[PIXEL CSV] saved frames 1307-1402 → official_time_test/detection_csv/01943-02233.csv
[PIXEL CSV] saved official_time_test/detection_csv/02987-03451.csv
[PIXEL CSV] saved frames 1647-1799 → official_time_test/detection_csv/02987-03451.csv
[PIXEL CSV] saved official_time_test/detection_csv/04060-04466.csv
[PIXEL CSV] saved frames 2002-2135 → official_time_test/detection_csv/04060-04466.csv
[PIXEL CSV] saved official_time_test/detection_csv/08323-08787.csv
[PIXEL CSV] saved frames 3398-3551 → official_time_test/detection_csv/08323-08787.csv
[PIXEL CSV] saved official_time_test/detection_csv/14094-14413.csv
[PIXEL CSV] saved frames 5290-5395 → official_time_test/detection_csv/14094-14413.csv
[PIXEL CSV] saved official_time_test/detection_csv/16298-16588.csv
[PIXEL CSV] saved frames 6019-6114 → official_time_test/detection_csv/16298-16588.csv
[PIXEL CSV] saved official_time_test/detection_csv/18473-18763.csv
[PIXEL CSV] saved frames 6780-6828 → official_time_test/detection_csv/18473-18763

### (※Option) BEV_img File Export Process

In [None]:
# from PIL import Image

# # NumPy→PIL オブジェクトに変換して、保存
# for i, filename in enumerate(filename_indices):
#     bev_pil = Image.fromarray(cropped_birdseyeview_frames[i])
#     bev_pil.save(f"cropped_near_1-3/cropped_birdsview_organized_near_1-3/27651-/lidar_{(105 + i):06}_{filename:06}.jpg")

[Timing] Cell [20] → 0.000 sec


### (※Option) Post-processing to remove detections with high IoU (duplicate error suppression)

In [None]:
class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, a: int, b: int):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def bbox_iou_pixels(box1: Tuple[int,int,int,int],
                    box2: Tuple[int,int,int,int]) -> float:
    """
    Calculate IoU between two pixel bboxes.
    box = (x1, y1, w, h) where (x1, y1) is top-left.
    """
    x1_a, y1_a, w_a, h_a = box1
    x2_a, y2_a = x1_a + w_a, y1_a + h_a
    x1_b, y1_b, w_b, h_b = box2
    x2_b, y2_b = x1_b + w_b, y1_b + h_b

    xi1, yi1 = max(x1_a, x1_b), max(y1_a, y1_b)
    xi2, yi2 = min(x2_a, x2_b), min(y2_a, y2_b)
    inter_w = max(0, xi2 - xi1)
    inter_h = max(0, yi2 - yi1)
    inter = inter_w * inter_h
    union = w_a * h_a + w_b * h_b - inter
    return inter / union if union > 0 else 0.0


def bbox_overlap_ratio(box1: Tuple[int,int,int,int],
                       box2: Tuple[int,int,int,int]) -> float:
    """
    Calculate overlap ratio between two pixel bboxes.
    overlap_ratio = intersection_area / min(area(box1), area(box2))
    box = (x1, y1, w, h)
    """
    x1_a, y1_a, w_a, h_a = box1
    x2_a, y2_a = x1_a + w_a, y1_a + h_a
    x1_b, y1_b, w_b, h_b = box2
    x2_b, y2_b = x1_b + w_b, y1_b + h_b

    xi1, yi1 = max(x1_a, x1_b), max(y1_a, y1_b)
    xi2, yi2 = min(x2_a, x2_b), min(y2_a, y2_b)
    inter_w = max(0, xi2 - xi1)
    inter_h = max(0, yi2 - yi1)
    inter = inter_w * inter_h

    area_a = w_a * h_a
    area_b = w_b * h_b
    min_area = min(area_a, area_b)
    return inter / min_area if min_area > 0 else 0.0


def merge_labels_from_csv_pixel(
    input_csv: str,
    image_shape: Tuple[int,int,int],
    save_label_path: str,
    thresh: float,
    output_format: str = "csv",
    mode: str = "iou"
) -> None:
    """
    Pixel-based post-processing with two merge modes:
    1) Read CSV with columns:
       cls_id, x1, y1, w, h, confidence, frame,
       x_min_world, y_min_world, z_min_world,
       x_max_world, y_max_world, z_max_world
    2) For each frame, cluster boxes whose overlap score > thresh:
       - mode="iou": IoU score
       - mode="min_ratio": overlap / min(box_area)
    3) Merge each cluster:
       - Pixel bbox: take min x1,y1 and max x2,y2, average confidence
       - World bbox: min of mins and max of maxes
    4) Write out either CSV (same columns) or YOLO .txt per frame
    """
    img_h, img_w = image_shape[:2]

    # --- 1) Read ---
    df = pd.read_csv(input_csv)
    print("Loaded columns:", df.columns.tolist())

    # --- CSV 出力準備 ---
    if output_format.lower() == "csv":
        dir_path = os.path.dirname(save_label_path)
        if dir_path:
            os.makedirs(dir_path, exist_ok=True)
        fout = open(save_label_path, "w", newline="", encoding="utf-8")
        writer = csv.writer(fout)
        writer.writerow(df.columns.tolist())

    # --- 2)&3) フレームごとにクラスタリング & マージ ---
    for frame_id, group in df.groupby("frame"):
        records: List[Dict] = group.to_dict("records")
        n = len(records)
        if n == 0:
            continue

        uf = UnionFind(n)
        for i in range(n):
            for j in range(i+1, n):
                box_i = (records[i]["x1"], records[i]["y1"], records[i]["w"], records[i]["h"])
                box_j = (records[j]["x1"], records[j]["y1"], records[j]["w"], records[j]["h"])
                if mode == "iou":
                    score = bbox_iou_pixels(box_i, box_j)
                else:
                    score = bbox_overlap_ratio(box_i, box_j)
                if score > thresh:
                    uf.union(i, j)

        clusters: Dict[int, List[Dict]] = {}
        for idx in range(n):
            root = uf.find(idx)
            clusters.setdefault(root, []).append(records[idx])

        if output_format.lower() == "yolo":
            os.makedirs(save_label_path, exist_ok=True)

        for members in clusters.values():
            x1_vals = [m["x1"] for m in members]
            y1_vals = [m["y1"] for m in members]
            x2_vals = [m["x1"] + m["w"] for m in members]
            y2_vals = [m["y1"] + m["h"] for m in members]
            confs   = [m["confidence"] for m in members]

            x1_m = min(x1_vals)
            y1_m = min(y1_vals)
            x2_m = max(x2_vals)
            y2_m = max(y2_vals)
            w_m  = x2_m - x1_m
            h_m  = y2_m - y1_m
            conf_m = float(np.mean(confs))

            world_mins = np.min(
                [[m["x_min_world"], m["y_min_world"], m["z_min_world"]] for m in members],
                axis=0
            )
            world_maxs = np.max(
                [[m["x_max_world"], m["y_max_world"], m["z_max_world"]] for m in members],
                axis=0
            )

            if output_format.lower() == "yolo":
                cx = x1_m + w_m / 2
                cy = y1_m + h_m / 2
                out_txt = os.path.join(save_label_path, f"frame_{int(frame_id):06d}.txt")
                with open(out_txt, "w", encoding="utf-8") as fw:
                    fw.write(
                        f"{members[0]['cls_id']} {int(round(cx))} {int(round(cy))} "
                        f"{int(round(w_m))} {int(round(h_m))} {conf_m:.6f}\n"
                    )
            else:
                row = [
                    members[0]["cls_id"],
                    x1_m, y1_m, w_m, h_m,
                    f"{conf_m:.6f}",
                    int(frame_id),
                    f"{world_mins[0]:.6f}", f"{world_mins[1]:.6f}", f"{world_mins[2]:.6f}",
                    f"{world_maxs[0]:.6f}", f"{world_maxs[1]:.6f}", f"{world_maxs[2]:.6f}"
                ]
                writer.writerow(row)

    if output_format.lower() == "csv":
        fout.close()
        print(f"[CSV] merged labels saved to: {save_label_path}")
    else:
        print(f"[YOLO] merged labels saved under directory: {save_label_path}")

[Timing] Cell [21] → 0.004 sec


In [22]:
for filename in near_1_3_dict:
    thresh = 0.3
    mode = "iou"

    merge_labels_from_csv_pixel(
        input_csv       = f"official_time_test/detection_csv/{filename}.csv",
        image_shape     = (343, 640, 3), #上のcsv生成関数呼び出しの方で定義されている
        save_label_path = f"official_time_test/detection_csv/{filename}_merged.csv",
        thresh      = thresh,
        output_format   = "csv",   # または "yolo"
        mode = mode
    )

Loaded columns: ['cls_id', 'x1', 'y1', 'w', 'h', 'confidence', 'frame', 'x_min_world', 'y_min_world', 'z_min_world', 'x_max_world', 'y_max_world', 'z_max_world']
[CSV] merged labels saved to: official_time_test/detection_csv/00754-01276_merged.csv
Loaded columns: ['cls_id', 'x1', 'y1', 'w', 'h', 'confidence', 'frame', 'x_min_world', 'y_min_world', 'z_min_world', 'x_max_world', 'y_max_world', 'z_max_world']
[CSV] merged labels saved to: official_time_test/detection_csv/01943-02233_merged.csv
Loaded columns: ['cls_id', 'x1', 'y1', 'w', 'h', 'confidence', 'frame', 'x_min_world', 'y_min_world', 'z_min_world', 'x_max_world', 'y_max_world', 'z_max_world']
[CSV] merged labels saved to: official_time_test/detection_csv/02987-03451_merged.csv
Loaded columns: ['cls_id', 'x1', 'y1', 'w', 'h', 'confidence', 'frame', 'x_min_world', 'y_min_world', 'z_min_world', 'x_max_world', 'y_max_world', 'z_max_world']
[CSV] merged labels saved to: official_time_test/detection_csv/04060-04466_merged.csv
Loaded c

### (※Option) Post-processing to remove detections with low 

In [None]:
def filter_csv_by_confidence(
    input_csv: str,
    output_csv: str,
    threshold: float,
    inclusive: bool = True
) -> None:
    """
    input_csv を読み込み、confidence が閾値以上（inclusive=True）または
    閾値より大きい（inclusive=False）だけを残して output_csv に書き出します。

    Args:
        input_csv:  フィルタ対象の元 CSV パス
        output_csv: フィルタ後の CSV パス
        threshold:  confidence の閾値 (0.0–1.0)
        inclusive:  閾値を含むかどうか (>= なら True, > なら False)
    """
    # 読み込み
    df = pd.read_csv(input_csv)

    # フィルタ
    if inclusive:
        df_filt = df[df["confidence"] >= threshold]
    else:
        df_filt = df[df["confidence"] >  threshold]

    # 出力先フォルダ作成
    os.makedirs(os.path.dirname(output_csv), exist_ok=True)

    # 書き出し
    df_filt.to_csv(output_csv, index=False)
    print(f"✅ {len(df_filt)} 件を残して {output_csv} に書き出しました。")


[Timing] Cell [24] → 0.002 sec


In [None]:
# confidence >= 0.6 の行だけを残す
# filter_csv_by_confidence(
#     input_csv   = "./labels/merged_labels_0.3.csv",
#     output_csv  = "./labels/merged_labels_0.3_conf0.4.csv",
#     threshold   = 0.4,
#     inclusive   = True
# )

[Timing] Cell [25] → 0.000 sec
