### Point Cloud Dataset Preparation Task

With this script we intend to create base segmentation dataset for RandLA-Net's
training to segment out the apple class objects.
For this task, we first prepare two base separate point cloud datasets with `Fuji-SfM` and `PFuji-Size` datasets. And, then we merge these two datasets to train our models.

Also, by just using the `Fuji-SfM` dataset we have observed that the model is unable to train and generalize the point cloud segmentation learning.
Second, we have also observed that at further upsampling is required even at patch level data PCs for making the learning process more easier for the segmentation model.

Here, in our experimentation we will prepare two merged dataset zip files for training, validation and evaluation. First, will contain the base merged dataset with x, y, z-coordinates and dropped out r,g,b-color information of 30 %. Second, the other dataset will additionally the normal information as well with 30 % of the data being dropped out for making the learning more robust.

### 1. PFuji-Size Data Extraction and Base Point Cloud File Preparation

In [None]:
# installing laspy & opend3d package, post installation restart this python environment
!pip install "laspy[lazrs,laszip]"
!pip install open3d

In [1]:
# for loading the dataset zips into the runtime
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# import statements
import laspy
import random
import numpy as np
import open3d as o3d
from tqdm import tqdm
# for loading all the directory files
from os import listdir
from os.path import isfile, join

In [None]:
# Fuji-SfM and PFuji-Size dataset files for generating datasets
# TODO: update the below paths based on your project setup
!ls drive/MyDrive/point-cloud-prototyping/datasets/*.zip

#### 1.a Data Extration and Preparation of `PFuji-Size` dataset

In [None]:
# unzipping the PFuji-Size dataset point cloud datasets for apple and tree class
# TODO: update the below paths based on your project setup
!unzip drive/MyDrive/point-cloud-prototyping/datasets/2-trees_point_clouds.zip -d .
!unzip drive/MyDrive/point-cloud-prototyping/datasets/3-apple_point_clouds.zip -d .

In [None]:
# for more color robustness we are going to use west side for 2018 orchard
# and east side for 2020 point cloud dataset building, and build two seperate
# '.pkl' point cloud files for the dataset
!ls 2-trees_point_clouds/

Orch2018_east.laz  Orch2018_west.laz  Orch2020-east.laz  Orch2020-west.laz


In [None]:
# extract '.laz' data from the 'Orch2018_west.laz' and 'Orch18_east.laz' files
# only normalization of rgb values but not the xyz coordinates
# TODO: update the below paths based on your project setup
with laspy.open('2-trees_point_clouds/Orch2018_west.laz') as orch_eghtn:
    las = orch_eghtn.read()
    x_coords = (las.X * las.header.scales[0]) + las.header.offsets[0]
    y_coords = (las.Y * las.header.scales[1]) + las.header.offsets[1]
    z_coords = (las.Z * las.header.scales[2]) + las.header.offsets[2]

    r_vals = (las.red - np.min(las.red)) / (np.max(las.red) - np.min(las.red))
    g_vals = (las.green - np.min(las.green)) / (np.max(las.green) - np.min(las.green))
    b_vals = (las.blue - np.min(las.blue)) / (np.max(las.blue) - np.min(las.blue))
    orch_eghtn_pcl_arr_west = np.column_stack((x_coords, y_coords, z_coords,
                                          r_vals, g_vals, b_vals))

In [None]:
# TODO: update the below paths based on your project setup
with laspy.open('2-trees_point_clouds/Orch2018_east.laz') as orch_eghtn:
    las = orch_eghtn.read()
    x_coords = (las.X * las.header.scales[0]) + las.header.offsets[0]
    y_coords = (las.Y * las.header.scales[1]) + las.header.offsets[1]
    z_coords = (las.Z * las.header.scales[2]) + las.header.offsets[2]

    r_vals = (las.red - np.min(las.red)) / (np.max(las.red) - np.min(las.red))
    g_vals = (las.green - np.min(las.green)) / (np.max(las.green) - np.min(las.green))
    b_vals = (las.blue - np.min(las.blue)) / (np.max(las.blue) - np.min(las.blue))
    orch_eghtn_pcl_arr_east = np.column_stack((x_coords, y_coords, z_coords,
                                          r_vals, g_vals, b_vals))

In [None]:
# exploratory analysis for the pcl array for
# the 'Orch2018_west.laz' & 'Orch2018_east.laz' files
print(orch_eghtn_pcl_arr_west, orch_eghtn_pcl_arr_west.shape)
print(orch_eghtn_pcl_arr_east, orch_eghtn_pcl_arr_east.shape)

[[ 0.80662251  3.47955847 -0.18946247  0.61568627  0.52362205  0.40392157]
 [ 0.78669149  3.49546671 -0.1783798   0.55294118  0.45275591  0.3372549 ]
 [ 0.80395353  3.47906494 -0.18989898  0.59215686  0.49606299  0.37647059]
 ...
 [-0.78412449  1.18851221  3.78472638  0.65882353  0.73622047  0.89019608]
 [-0.78315645  1.18451405  3.78412104  0.65490196  0.73228346  0.87058824]
 [-0.77879816  1.15837657  3.78567338  0.63529412  0.70866142  0.8627451 ]] (8884179, 6)
[[-1.53670645 -0.09859817 -0.25578997  0.62598425  0.52380952  0.38823529]
 [-1.53300416 -0.09175679 -0.26506999  0.52362205  0.48015873  0.36470588]
 [-1.5359515  -0.09223113 -0.26814881  0.57874016  0.55555556  0.4627451 ]
 ...
 [ 1.23130023  3.5960362  -0.20349368  0.56692913  0.51190476  0.45882353]
 [ 1.22598445  3.59545898 -0.20638813  0.48425197  0.44444444  0.41568627]
 [ 1.23562944  3.59649801 -0.2042381   0.57480315  0.52777778  0.47843137]] (8763029, 6)


In [None]:
# combine two numpy array pcl data for adding more color data info
orch_eghtn_pcl_arr = np.concatenate((orch_eghtn_pcl_arr_west,
                                     orch_eghtn_pcl_arr_east), axis=0)

In [None]:
# exploratory analysis for the pcl array for the combined pcl dataset
print(orch_eghtn_pcl_arr, orch_eghtn_pcl_arr.shape)

[[ 0.80662251  3.47955847 -0.18946247  0.61568627  0.52362205  0.40392157]
 [ 0.78669149  3.49546671 -0.1783798   0.55294118  0.45275591  0.3372549 ]
 [ 0.80395353  3.47906494 -0.18989898  0.59215686  0.49606299  0.37647059]
 ...
 [ 1.23130023  3.5960362  -0.20349368  0.56692913  0.51190476  0.45882353]
 [ 1.22598445  3.59545898 -0.20638813  0.48425197  0.44444444  0.41568627]
 [ 1.23562944  3.59649801 -0.2042381   0.57480315  0.52777778  0.47843137]] (17647208, 6)


In [None]:
# TODO: update the below paths based on your project setup
with laspy.open('2-trees_point_clouds/Orch2020-east.laz') as orch_twnty:
    las = orch_twnty.read()
    x_coords = (las.X * las.header.scales[0]) + las.header.offsets[0]
    y_coords = (las.Y * las.header.scales[1]) + las.header.offsets[1]
    z_coords = (las.Z * las.header.scales[2]) + las.header.offsets[2]
    r_vals = (las.red - np.min(las.red)) / (np.max(las.red) - np.min(las.red))
    g_vals = (las.green - np.min(las.green)) / (np.max(las.green) - np.min(las.green))
    b_vals = (las.blue - np.min(las.blue)) / (np.max(las.blue) - np.min(las.blue))
    orch_twnty_pcl_arr_east = np.column_stack((x_coords, y_coords, z_coords,
                                          r_vals, g_vals, b_vals))

In [None]:
# TODO: update the below paths based on your project setup
with laspy.open('2-trees_point_clouds/Orch2020-west.laz') as orch_twnty:
    las = orch_twnty.read()
    x_coords = (las.X * las.header.scales[0]) + las.header.offsets[0]
    y_coords = (las.Y * las.header.scales[1]) + las.header.offsets[1]
    z_coords = (las.Z * las.header.scales[2]) + las.header.offsets[2]
    r_vals = (las.red - np.min(las.red)) / (np.max(las.red) - np.min(las.red))
    g_vals = (las.green - np.min(las.green)) / (np.max(las.green) - np.min(las.green))
    b_vals = (las.blue - np.min(las.blue)) / (np.max(las.blue) - np.min(las.blue))
    orch_twnty_pcl_arr_west = np.column_stack((x_coords, y_coords, z_coords,
                                          r_vals, g_vals, b_vals))

In [None]:
# exploratory analysis for the pcl array for
# the 'Orch2020-west.laz' & 'Orch2020-east.laz' files
print(orch_twnty_pcl_arr_west, orch_twnty_pcl_arr_west.shape)
print(orch_twnty_pcl_arr_east, orch_twnty_pcl_arr_east.shape)

[[-0.25839853  2.95192218 -0.11821502  0.16078431  0.14117647  0.10980392]
 [-0.40850973  3.02875566 -0.1349324   0.6627451   0.62352941  0.46666667]
 [-0.40811172  3.03178954 -0.13438649  0.75294118  0.70196078  0.56862745]
 ...
 [ 0.21464436 -0.56504387  3.99386597  0.81960784  0.86666667  0.95686275]
 [ 0.21260631 -0.57097203  3.99224329  0.63529412  0.68627451  0.82745098]
 [ 0.211767   -0.56696969  3.99352241  0.78039216  0.82745098  0.89803922]] (9043033, 6)
[[-1.5013206  -0.4447605  -0.25654498  0.21960784  0.2         0.14117647]
 [-1.52162778 -0.44923183 -0.26363975  0.2         0.18823529  0.13333333]
 [-1.52707398 -0.45156905 -0.26595756  0.20784314  0.2         0.13333333]
 ...
 [ 0.06417678  3.3079443   3.59787273  0.72941176  0.8         0.94509804]
 [ 0.06322569  3.29847312  3.59692883  0.78823529  0.83921569  0.93333333]
 [ 0.06022703  3.31223679  3.596627    0.7372549   0.8         0.89411765]] (9609280, 6)


In [None]:
# combine two numpy array pcl data for adding more color data info
orch_twnty_pcl_arr = np.concatenate((orch_twnty_pcl_arr_west,
                                     orch_twnty_pcl_arr_east), axis=0)

In [None]:
# exploratory analysis for the pcl array for the combined pcl dataset
print(orch_twnty_pcl_arr, orch_twnty_pcl_arr.shape)

[[-0.25839853  2.95192218 -0.11821502  0.16078431  0.14117647  0.10980392]
 [-0.40850973  3.02875566 -0.1349324   0.6627451   0.62352941  0.46666667]
 [-0.40811172  3.03178954 -0.13438649  0.75294118  0.70196078  0.56862745]
 ...
 [ 0.06417678  3.3079443   3.59787273  0.72941176  0.8         0.94509804]
 [ 0.06322569  3.29847312  3.59692883  0.78823529  0.83921569  0.93333333]
 [ 0.06022703  3.31223679  3.596627    0.7372549   0.8         0.89411765]] (18652313, 6)


In [None]:
# loading the apple point clouds from '.txt' files of 2018 and 2020 orchards data
!ls 3-apple_point_clouds/2018_* | head -3

3-apple_point_clouds/2018_01_001.txt
3-apple_point_clouds/2018_01_002.txt
3-apple_point_clouds/2018_01_003.txt


In [None]:
!ls 3-apple_point_clouds/2018_* | tail -3

3-apple_point_clouds/2018_01_338.txt
3-apple_point_clouds/2018_01_339.txt
3-apple_point_clouds/2018_01_340.txt


In [None]:
!ls 3-apple_point_clouds/2020_01* | head -3

3-apple_point_clouds/2020_01_001.txt
3-apple_point_clouds/2020_01_002.txt
3-apple_point_clouds/2020_01_003.txt


In [None]:
!ls 3-apple_point_clouds/2020_01* | tail -3

3-apple_point_clouds/2020_01_335.txt
3-apple_point_clouds/2020_01_336.txt
3-apple_point_clouds/2020_01_337.txt


In [None]:
# peek into the file structure of 2018 and 2020 orchards data
!head -5 3-apple_point_clouds/2018_01_001.txt

-0.22910452 -0.09944689 0.65560591 153 122 79 0.000000 1.000000 0.000000 -0.223879 -0.709210 -0.668505
-0.22910452 -0.09944689 0.65560591 153 122 79 0.000000 1.000000 0.000000 -0.223879 -0.709210 -0.668505
-0.22910452 -0.09944689 0.65560591 153 122 79 0.000000 1.000000 0.000000 -0.223879 -0.709210 -0.668505
-0.22910452 -0.09944689 0.65560591 153 122 79 0.000000 1.000000 0.000000 -0.223879 -0.709210 -0.668505
-0.22910452 -0.09944689 0.65560591 153 121 79 0.000000 1.000000 0.000000 -0.223879 -0.709210 -0.668505


In [None]:
!head -5 3-apple_point_clouds/2018_01_338.txt

0.60063934 2.21131873 0.23708095 144 122 52 0.000000 4.000000 1.000000 0.597226 0.246101 -0.763384
0.61113667 2.20790339 0.24079944 150 125 62 0.000000 1.000000 1.000000 0.473407 0.225360 -0.851527
0.58905882 2.21994638 0.22884357 148 125 49 0.000000 1.000000 1.000000 0.543262 -0.115061 -0.831642
0.60526359 2.20557427 0.23708454 142 120 56 0.000000 1.000000 1.000000 0.509396 0.107619 -0.853776
0.60024565 2.20175362 0.23327842 143 118 52 0.000000 1.000000 1.000000 0.536338 0.026062 -0.843601


In [None]:
# verifying whether the '.laz' extraction was done correctly
print(np.max(orch_eghtn_pcl_arr[:, 0]), np.min(orch_eghtn_pcl_arr[:, 0]))
print(np.max(orch_eghtn_pcl_arr[:, 1]), np.min(orch_eghtn_pcl_arr[:, 1]))
print(np.max(orch_eghtn_pcl_arr[:, 2]), np.min(orch_eghtn_pcl_arr[:, 2]))

2.47710729 -1.98462462
3.74508071 -0.34017122
3.98604536 -0.28742504


In [None]:
!head -5 3-apple_point_clouds/2020_01_001.txt

-0.32185173 -0.33695343 0.36797228 137 129 57 0.000000 1.000000 0.000000 -0.517100 0.349566 -0.781288
-0.32471815 -0.33817196 0.36393949 131 119 59 0.000000 1.000000 0.000000 -0.054135 -0.134761 0.989398
-0.32538888 -0.34947816 0.36468124 122 111 46 0.000000 1.000000 0.000000 -0.019902 0.998259 0.055517
-0.31823555 -0.33507586 0.36933643 124 118 56 0.000000 1.000000 0.000000 -0.797107 0.570673 -0.197364
-0.32462609 -0.34697980 0.36600029 121 113 48 0.000000 1.000000 0.000000 -0.448421 0.822370 -0.350181


In [None]:
!head -5 3-apple_point_clouds/2020_01_337.txt

-0.15741143 1.37385952 2.87322736 41 47 21 0.000000 1.000000 0.000000 0.050819 -0.877652 0.476597
-0.15816760 1.36673987 2.86788082 44 50 22 0.000000 1.000000 0.000000 -0.330476 -0.796654 0.506091
-0.15923078 1.36683381 2.87112236 44 50 22 0.000000 1.000000 0.000000 -0.432213 -0.652491 0.622453
-0.15587974 1.37375402 2.87360597 41 46 21 0.000000 1.000000 0.000000 -0.662164 0.391354 -0.639046
-0.15897395 1.36760497 2.87156463 43 49 21 0.000000 1.000000 0.000000 -0.574453 -0.503699 0.645206


In [None]:
# verifying whether the '.laz' extraction was done correctly
print(np.max(orch_twnty_pcl_arr[:, 0]), np.min(orch_twnty_pcl_arr[:, 0]))
print(np.max(orch_twnty_pcl_arr[:, 1]), np.min(orch_twnty_pcl_arr[:, 1]))
print(np.max(orch_twnty_pcl_arr[:, 2]), np.min(orch_twnty_pcl_arr[:, 2]))

1.3716284 -1.59532678
3.56140113 -0.80763555
3.73156381 -0.41045624


In [None]:
# TODO: update the below paths based on your project setup
PFUJI_ANN_PATH = '3-apple_point_clouds/'
ann_data_files = [f for f in listdir(PFUJI_ANN_PATH)
                    if isfile(join(PFUJI_ANN_PATH, f))]
ann_2018_data_files, ann_2020_data_files = [], []
ann_2018_data_files.append([f for f in ann_data_files if '2018_01_' in f])
ann_2020_data_files.append([f for f in ann_data_files if '2020_01_' in f])
ann_2018_data_files, ann_2020_data_files = ann_2018_data_files[0], ann_2020_data_files[0]

In [None]:
# verifying the length of the created lists in the project
print(len(ann_data_files), len(ann_2018_data_files), len(ann_2020_data_files))

640 303 312


In [None]:
# loading all the files into a numpy array and appending those array values to list
pcl_apple_data_list_2018, pcl_apple_data_list_2020 = [], []

for files in tqdm(ann_2018_data_files):
    pcl_temp_arr = np.loadtxt(PFUJI_ANN_PATH + files)
    pcl_temp_arr = pcl_temp_arr[:, 0:6]
    pcl_temp_arr[:, 3:6] = pcl_temp_arr[:,3:6] / 255
    pcl_apple_data_list_2018 = pcl_apple_data_list_2018 + pcl_temp_arr.tolist()

100%|██████████| 303/303 [01:23<00:00,  3.64it/s]


In [None]:
print(len(pcl_apple_data_list_2018))

4779434


In [None]:
for files in tqdm(ann_2020_data_files):
    pcl_temp_arr = np.loadtxt(PFUJI_ANN_PATH + files)
    pcl_temp_arr = pcl_temp_arr[:, 0:6]
    pcl_temp_arr[:, 3:6] = pcl_temp_arr[:,3:6] / 255
    pcl_apple_data_list_2020 = pcl_apple_data_list_2020 + pcl_temp_arr.tolist()

100%|██████████| 312/312 [00:54<00:00,  5.75it/s]


In [None]:
print(len(pcl_apple_data_list_2020))

3215507


In [None]:
# convert these lists into corresponding numpy arrays and append
appl_eghtn_pcl_arr = np.asarray(pcl_apple_data_list_2018)
appl_twnty_pcl_arr = np.asarray(pcl_apple_data_list_2020)
print(appl_eghtn_pcl_arr.shape, appl_twnty_pcl_arr.shape)
print(orch_eghtn_pcl_arr.shape, orch_twnty_pcl_arr.shape)

(4779434, 6) (3215507, 6)
(17647208, 6) (18652313, 6)


In [None]:
appl_eghtn_arr_one = np.ones(4779434)
appl_twnty_arr_one = np.ones(3215507)
orch_eghtn_arr_zer = np.zeros(17647208)
orch_twnty_arr_zer = np.zeros(18652313)

In [None]:
appl_eghtn_pcl_arr = np.column_stack((appl_eghtn_pcl_arr, appl_eghtn_arr_one))
appl_twnty_pcl_arr = np.column_stack((appl_twnty_pcl_arr, appl_twnty_arr_one))
orch_eghtn_pcl_arr = np.column_stack((orch_eghtn_pcl_arr, orch_eghtn_arr_zer))
orch_twnty_pcl_arr = np.column_stack((orch_twnty_pcl_arr, orch_twnty_arr_zer))

In [None]:
# exploratory data analysis for further improving the segmentation model training
# observing the point cloud limit of apple class arrays in the orchard 2018 PCLs
print(np.min(appl_eghtn_pcl_arr[:, 0]), np.max(appl_eghtn_pcl_arr[:, 0]),
      (np.max(appl_eghtn_pcl_arr[:, 0]) - np.min(appl_eghtn_pcl_arr[:, 0])))

print(np.min(appl_eghtn_pcl_arr[:, 1]), np.max(appl_eghtn_pcl_arr[:, 1]),
      (np.max(appl_eghtn_pcl_arr[:, 1]) - np.min(appl_eghtn_pcl_arr[:, 1])))

print(np.min(appl_eghtn_pcl_arr[:, 2]), np.max(appl_eghtn_pcl_arr[:, 2]),
      (np.max(appl_eghtn_pcl_arr[:, 2]) - np.min(appl_eghtn_pcl_arr[:, 2])))

-0.60293728 0.70358139 1.30651867
-0.10622169 2.92781162 3.03403331
0.18001336 3.22922134 3.0492079800000003


In [None]:
print(np.min(orch_eghtn_pcl_arr[:, 0]), np.max(orch_eghtn_pcl_arr[:, 0]),
      (np.max(orch_eghtn_pcl_arr[:, 0]) - np.min(orch_eghtn_pcl_arr[:, 0])))

print(np.min(orch_eghtn_pcl_arr[:, 1]), np.max(orch_eghtn_pcl_arr[:, 1]),
      (np.max(orch_eghtn_pcl_arr[:, 1]) - np.min(orch_eghtn_pcl_arr[:, 1])))

print(np.min(orch_eghtn_pcl_arr[:, 2]), np.max(orch_eghtn_pcl_arr[:, 2]),
      (np.max(orch_eghtn_pcl_arr[:, 2]) - np.min(orch_eghtn_pcl_arr[:, 2])))

-1.98462462 2.47710729 4.46173191
-0.36339155 3.74508071 4.10847226
-0.41931444 3.98604536 4.4053598


In [None]:
# reduction of array size of application of deep learning models on this dataset
orch_eghtn_pcl_arr = orch_eghtn_pcl_arr[orch_eghtn_pcl_arr[:,0] >= -0.75]
orch_eghtn_pcl_arr = orch_eghtn_pcl_arr[orch_eghtn_pcl_arr[:,0] <= 0.75]

orch_eghtn_pcl_arr = orch_eghtn_pcl_arr[orch_eghtn_pcl_arr[:,1] >= -0.25]
orch_eghtn_pcl_arr = orch_eghtn_pcl_arr[orch_eghtn_pcl_arr[:,1] <= 3.0]

orch_eghtn_pcl_arr = orch_eghtn_pcl_arr[orch_eghtn_pcl_arr[:,2] >= 0.0]
orch_eghtn_pcl_arr = orch_eghtn_pcl_arr[orch_eghtn_pcl_arr[:,2] <= 3.25]
print(orch_eghtn_pcl_arr, orch_eghtn_pcl_arr.shape)

[[0.15063816 2.86242628 0.07597023 ... 0.12598425 0.07058824 0.        ]
 [0.15288696 2.86359286 0.07708284 ... 0.12992126 0.0627451  0.        ]
 [0.15185295 2.86449361 0.07853915 ... 0.13385827 0.0627451  0.        ]
 ...
 [0.7420758  2.66130161 1.85778499 ... 0.67063492 0.66666667 0.        ]
 [0.72970802 2.65539908 1.86040342 ... 0.43253968 0.2        0.        ]
 [0.73756379 2.67185354 1.85862207 ... 0.32539683 0.23529412 0.        ]] (11694528, 7)


In [None]:
# exploratory data analysis for further improving the segmentation model training
# observing the point cloud limit of apple class arrays in the orchard 2020 PCLs
print(np.min(appl_twnty_pcl_arr[:, 0]), np.max(appl_twnty_pcl_arr[:, 0]),
      (np.max(appl_twnty_pcl_arr[:, 0]) - np.min(appl_twnty_pcl_arr[:, 0])))

print(np.min(appl_twnty_pcl_arr[:, 1]), np.max(appl_twnty_pcl_arr[:, 1]),
      (np.max(appl_twnty_pcl_arr[:, 1]) - np.min(appl_twnty_pcl_arr[:, 1])))

print(np.min(appl_twnty_pcl_arr[:, 2]), np.max(appl_twnty_pcl_arr[:, 2]),
      (np.max(appl_twnty_pcl_arr[:, 2]) - np.min(appl_twnty_pcl_arr[:, 2])))

-0.83455122 0.82778805 1.66233927
-0.38381717 2.58056998 2.96438715
0.04616591 2.94004345 2.89387754


In [None]:
print(np.min(orch_twnty_pcl_arr[:, 0]), np.max(orch_twnty_pcl_arr[:, 0]),
      (np.max(orch_twnty_pcl_arr[:, 0]) - np.min(orch_twnty_pcl_arr[:, 0])))

print(np.min(orch_twnty_pcl_arr[:, 1]), np.max(orch_twnty_pcl_arr[:, 1]),
      (np.max(orch_twnty_pcl_arr[:, 1]) - np.min(orch_twnty_pcl_arr[:, 1])))

print(np.min(orch_twnty_pcl_arr[:, 2]), np.max(orch_twnty_pcl_arr[:, 2]),
      (np.max(orch_twnty_pcl_arr[:, 2]) - np.min(orch_twnty_pcl_arr[:, 2])))

-1.59532678 1.4571104000000001 3.05243718
-0.8208921 3.56140113 4.38229323
-0.41938722 3.99765754 4.4170447600000005


In [None]:
# reduction of array size of application of deep learning models on this dataset
orch_twnty_pcl_arr = orch_twnty_pcl_arr[orch_twnty_pcl_arr[:,0] >= -1.00]
orch_twnty_pcl_arr = orch_twnty_pcl_arr[orch_twnty_pcl_arr[:,0] <= 1.00]

orch_twnty_pcl_arr = orch_twnty_pcl_arr[orch_twnty_pcl_arr[:,1] >= -0.50]
orch_twnty_pcl_arr = orch_twnty_pcl_arr[orch_twnty_pcl_arr[:,1] <= 2.75]

orch_twnty_pcl_arr = orch_twnty_pcl_arr[orch_twnty_pcl_arr[:,2] >= 0.0]
orch_twnty_pcl_arr = orch_twnty_pcl_arr[orch_twnty_pcl_arr[:,2] <= 3.0]
print(orch_twnty_pcl_arr, orch_twnty_pcl_arr.shape)

[[-0.40458879  2.00968909  0.01972301 ...  0.16078431  0.14117647
   0.        ]
 [-0.41081741  2.00792456  0.02378946 ...  0.17647059  0.14509804
   0.        ]
 [-0.41779482  2.00558281  0.03036218 ...  0.16078431  0.14901961
   0.        ]
 ...
 [ 0.58761132  2.698668    2.95487523 ...  0.91372549  0.96862745
   0.        ]
 [ 0.59226578  2.69258618  2.94578671 ...  0.91372549  0.97254902
   0.        ]
 [ 0.59276831  2.69483328  2.95398736 ...  0.91764706  0.96862745
   0.        ]] (10997997, 7)


In [None]:
# fraction of apple class present in comparison to the background tree class
print(appl_eghtn_pcl_arr.shape[0] / orch_eghtn_pcl_arr.shape[0]) # double upsampling for apple class
print(appl_twnty_pcl_arr.shape[0] / orch_twnty_pcl_arr.shape[0]) # triple upsampling for apple class
# upsampling crashes the session for such large array, best to create patches and then upsample the pcl dataset

0.4086897735419506
0.29237205647537456


In [None]:
# merging apple and tree class for 2018 orchard pcl data
comp_eghtn_pcl_arr = np.concatenate((appl_eghtn_pcl_arr,
                                     orch_eghtn_pcl_arr), axis=0)

print(comp_eghtn_pcl_arr, comp_eghtn_pcl_arr.shape)

[[-0.28477782  0.83724743  1.37119353 ...  0.38823529  0.2
   1.        ]
 [-0.28477782  0.83724743  1.37119353 ...  0.38823529  0.2
   1.        ]
 [-0.28534016  0.83995342  1.35774648 ...  0.45882353  0.23137255
   1.        ]
 ...
 [ 0.7420758   2.66130161  1.85778499 ...  0.67063492  0.66666667
   0.        ]
 [ 0.72970802  2.65539908  1.86040342 ...  0.43253968  0.2
   0.        ]
 [ 0.73756379  2.67185354  1.85862207 ...  0.32539683  0.23529412
   0.        ]] (16473962, 7)


In [None]:
# cleaning out the memory with deletion of extra arrays
del appl_eghtn_pcl_arr
del orch_eghtn_pcl_arr
del orch_eghtn_pcl_arr_west
del orch_eghtn_pcl_arr_east

In [None]:
# save the numpy array in current working directory
# TODO: update the below paths based on your project setup
np.save('pfuji-size-pcl-processed-2018.npy', comp_eghtn_pcl_arr)

In [None]:
# merging apple and tree class for 2020 orchard pcl data
comp_twnty_pcl_arr = np.concatenate((appl_twnty_pcl_arr,
                                     orch_twnty_pcl_arr), axis=0)

print(comp_twnty_pcl_arr, comp_twnty_pcl_arr.shape)

In [None]:
# cleaning out the memory with deletion of extra arrays
del appl_twnty_pcl_arr
del orch_twnty_pcl_arr
del orch_twnty_pcl_arr_west
del orch_twnty_pcl_arr_east

In [None]:
# save the numpy array in current working directory
# TODO: update the below paths based on your project setup
np.save('pfuji-size-pcl-processed-2020.npy', comp_twnty_pcl_arr)

In [None]:
!ls

2-trees_point_clouds  pfuji-size-pcl-processed-2018.npy
3-apple_point_clouds  pfuji-size-pcl-processed-2020.npy
drive		      sample_data


In [None]:
# storing a zipped version of the dataset for further processing into a merged dataset creation
# TODO: update the below paths based on your project setup
!zip -r drive/MyDrive/point-cloud-prototyping/datasets/pfuji-size-pcl-processed-2018.zip pfuji-size-pcl-processed-2018.npy
!zip -r drive/MyDrive/point-cloud-prototyping/datasets/pfuji-size-pcl-processed-2020.zip pfuji-size-pcl-processed-2020.npy

  adding: pfuji-size-pcl-processed-2018.npy (deflated 52%)
  adding: pfuji-size-pcl-processed-2020.npy (deflated 51%)


### 2. Fuji-SfM Data Extraction and Base Point Cloud File Preparation

In [None]:
# installing open3d library for data upsampling task
!pip install open3d

In [9]:
# listing all directory and data processing import statements
import pickle
import numpy as np
import open3d as o3d
from os import listdir
from tqdm import tqdm
from os.path import isfile, join

In [None]:
# extracting 3D point cloud data for creating base point cloud npy file
# TODO: update the below paths based on your project setup
!unzip drive/MyDrive/point-cloud-prototyping/datasets/3-3D_data.zip -d .

In [6]:
# structure of the extracted txt dataset file
!ls 3-3D_data/

Fuji_apple_trees_point_cloud.txt  GroundTruth-3D_apples_locations  readme.txt


In [None]:
# loading txt data file onto the pcl numpy array
# TODO: update the below paths based on your project setup
pc_arr = np.loadtxt('3-3D_data/Fuji_apple_trees_point_cloud.txt')

In [None]:
# exploratory data analysis for the general shape of the loaded point cloud
print(pc_arr)
print(pc_arr.shape)

[[ 11.50647926  75.69534302 312.91494751 206.         206.
  202.        ]
 [ 11.52078819  75.62413788 312.97903442 205.         207.
  203.        ]
 [ 11.52173042  75.6230011  312.97857666 144.         151.
  137.        ]
 ...
 [ 10.69560528  79.81440735 308.80877686 114.          98.
   60.        ]
 [ 10.70045471  79.81336975 308.80587769 107.         115.
   64.        ]
 [ 10.69812202  79.81700134 308.80447388 109.         123.
   79.        ]]
(10853801, 6)


In [None]:
# list containing array coordinate values of the bounding boxes in apple class objects
bb_box_coord_lst = [] 

In [None]:
# listing training and testing directories for the 3D apple pcl annotations
!ls 3-3D_data/GroundTruth-3D_apples_locations

test_set  training_set


In [None]:
# test set apple bounding box coordinate list
# TODO: update the below paths based on your project setup
bb_coord_path_test = '3-3D_data/GroundTruth-3D_apples_locations/test_set'
bb_coord_files_test = [f for f in listdir(bb_coord_path_test) if isfile(join(bb_coord_path_test, f))]
bb_coord_files_test[:10]

['Tree01-annotations_000097.txt',
 'Tree06-annotations_000017.txt',
 'Tree01-annotations_000011.txt',
 'Tree05-annotations_000091.txt',
 'Tree07-annotations_000075.txt',
 'Tree04-annotations_000011.txt',
 'Tree07-annotations_000073.txt',
 'Tree03-annotations_000014.txt',
 'Tree01-annotations_000082.txt',
 'Tree04-annotations_000100.txt']

In [None]:
# train set apple bounding box coordinate list
# TODO: update the below paths based on your project setup
bb_coord_path_train = '3-3D_data/GroundTruth-3D_apples_locations/training_set'
bb_coord_files_train = [f for f in listdir(bb_coord_path_train) if isfile(join(bb_coord_path_train, f))]
bb_coord_files_train[:10]

['Tree11-annotations_000092.txt',
 'Tree09-annotations_000116.txt',
 'Tree11-annotations_000031.txt',
 'Tree09-annotations_000111.txt',
 'Tree11-annotations_000051.txt',
 'Tree10-annotations_000055.txt',
 'Tree11-annotations_000117.txt',
 'Tree10-annotations_000113.txt',
 'Tree09-annotations_000046.txt',
 'Tree11-annotations_000155.txt']

In [None]:
# reader loop for test set bounding box annotation preparation
for bb_test_file in bb_coord_files_test:
    bb_test_arr = np.loadtxt(bb_coord_path_test+'/'+bb_test_file)
    bb_test_arr = bb_test_arr[1:]
    bb_coords = [
            np.min(bb_test_arr[:,0]), np.max(bb_test_arr[:,0]),
            np.min(bb_test_arr[:,1]), np.max(bb_test_arr[:,1]),
            np.min(bb_test_arr[:,2]), np.max(bb_test_arr[:,2])
        ]
    
    bb_box_coord_lst.append(bb_coords)

In [None]:
# reader loop for train set bounding box annotation  preparation
for bb_train_file in bb_coord_files_train:
    bb_train_arr = np.loadtxt(bb_coord_path_train+'/'+bb_train_file)
    bb_train_arr = bb_train_arr[1:]
    bb_coords = [
            np.min(bb_test_arr[:,0]), np.max(bb_test_arr[:,0]),
            np.min(bb_test_arr[:,1]), np.max(bb_test_arr[:,1]),
            np.min(bb_test_arr[:,2]), np.max(bb_test_arr[:,2])
        ]
    bb_box_coord_lst.append(bb_coords)

In [None]:
# converting array to list for effective pcl data editing operation
pc_list = pc_arr.tolist()

In [None]:
# preparation of apple pcl list data
apple_pc_list = []

In [None]:
# iterating over the bounding box coordinate limits of different apple class objects
# and also iterating over the complete point cloud to separate out the apple point cloud
pc_idx = 0
for pc_coord in tqdm(pc_list):
    for bb_coord_vals in bb_box_coord_lst:
        if ( pc_coord[0] >= bb_coord_vals[0] and pc_coord[0] <= bb_coord_vals[1] ) \
        and ( pc_coord[1] >= bb_coord_vals[2] and pc_coord[1] <= bb_coord_vals[3] ) \
        and ( pc_coord[2] >= bb_coord_vals[4] and pc_coord[2] <= bb_coord_vals[5] ):
            apple_pc_list.append(pc_list.pop(pc_idx))
            # print(len(apple_pc_list))
            break
    pc_idx = pc_idx + 1
# note: most time-consuming bottleneck step with approximately two hour processing time
# limitation note: the apple class objects are annotated with a square bounding box
# and might include some noisy tree point cloud also leading to noisy labels

 95%|█████████▌| 10352754/10853801 [2:04:52<06:02, 1381.69it/s]


In [None]:
# listing the resultant point cloud size for the apple and the tree objects
len(apple_pc_list), len(pc_list), pc_arr.shape

(501047, 10352754, (10853801, 6))

In [None]:
# appending label to the resultant point cloud rows of apple class
apple_pc_list_lbl = []
for apple_pc in apple_pc_list:
    apple_pc = apple_pc[:6]
    apple_pc.append(1.0)
    apple_pc_list_lbl.append(apple_pc)

In [None]:
# appending label to the resultant point cloud rows of tree class
bg_pc_list_lbl = []
for bg_pc in pc_list:
    bg_pc.append(0.0)
    bg_pc_list_lbl.append(bg_pc)

In [None]:
# merging two point cloud lists for generating the resultant pcl list
pc_total_list = apple_pc_list_lbl + bg_pc_list_lbl
print( len(pc_total_list), len(apple_pc_list_lbl), len(bg_pc_list_lbl))

10853801 501047 10352754


In [None]:
# generating the final resultant array from the pcl list
pc_arr_new = np.array(pc_total_list)
pc_arr_new.shape

(10853801, 7)

In [11]:
# basic exploratory analysis to check the value
pc_arr_new

array([[ 11.73235703,  74.63858032, 311.94485474, ...,  68.        ,
         15.        ,   1.        ],
       [ 11.73261738,  74.63868713, 311.94223022, ...,  59.        ,
         19.        ,   1.        ],
       [ 11.73064613,  74.6374588 , 311.94668579, ...,  82.        ,
         16.        ,   1.        ],
       ...,
       [ 10.69560528,  79.81440735, 308.80877686, ...,  98.        ,
         60.        ,   0.        ],
       [ 10.70045471,  79.81336975, 308.80587769, ..., 115.        ,
         64.        ,   0.        ],
       [ 10.69812202,  79.81700134, 308.80447388, ..., 123.        ,
         79.        ,   0.        ]])

In [None]:
import pickle
with open('drive/MyDrive/faro-prototyping/datasets/fuji_apple_data_preprocessed.pkl','wb') as f:
    pickle.dump(pc_arr_new, f)

In [10]:
with open('drive/MyDrive/point-cloud-prototyping/datasets/fuji_apple_data_preprocessed.pkl','rb') as f:
     pc_arr_new = pickle.load(f)
     print(pc_arr_new.shape)

(10853801, 7)


In [12]:
# y-axis coordinate limits
print(np.min(pc_arr_new[:,1]), np.max(pc_arr_new[:,1]), (np.max(pc_arr_new[:,1]) - np.min(pc_arr_new[:,1])) )
# x-axis coordinate limits
print(np.min(pc_arr_new[:,0]), np.max(pc_arr_new[:,0]), (np.max(pc_arr_new[:,0]) - np.min(pc_arr_new[:,0])) )
# z-axis coordinate limits
print(np.min(pc_arr_new[:,2]), np.max(pc_arr_new[:,2]), (np.max(pc_arr_new[:,2]) - np.min(pc_arr_new[:,2])) )

64.82430267 79.90513611 15.080833440000006
10.40850639 13.90547562 3.4969692300000013
307.55532837 313.66433716 6.1090087900000185


In [None]:
# checking for the subset PC data with apple class label
# y-axis coordinate limits
print(np.min(pc_arr_new[pc_arr_new[:,6] == 1][:,1]), np.max(pc_arr_new[pc_arr_new[:,6] == 1][:,1]), (np.max(pc_arr_new[pc_arr_new[:,6] == 1][:,1]) - np.min(pc_arr_new[pc_arr_new[:,6] == 1][:,1])) )
# x-axis coordinate limits
print(np.min(pc_arr_new[pc_arr_new[:,6] == 1][:,0]), np.max(pc_arr_new[pc_arr_new[:,6] == 1][:,0]), (np.max(pc_arr_new[pc_arr_new[:,6] == 1][:,0]) - np.min(pc_arr_new[pc_arr_new[:,6] == 1][:,0])) )
# z-axis coordinate limits
print(np.min(pc_arr_new[pc_arr_new[:,6] == 1][:,2]), np.max(pc_arr_new[pc_arr_new[:,6] == 1][:,2]), (np.max(pc_arr_new[pc_arr_new[:,6] == 1][:,2]) - np.min(pc_arr_new[pc_arr_new[:,6] == 1][:,2])) )

67.28730011 74.84210968 7.554809569999989
11.45102787 12.98692036 1.5358924899999984
308.83605957 312.26205444 3.425994870000011


In [13]:
# reduction of array size of application of deep learning models on this dataset
pc_temp = pc_arr_new[pc_arr_new[:,0] >= 11.40]
pc_temp = pc_temp[pc_temp[:,0] <= 13.15]

pc_temp = pc_temp[pc_temp[:,1] >= 67.25]
pc_temp = pc_temp[pc_temp[:,1] <= 75.00]

pc_temp = pc_temp[pc_temp[:,2] >= 308.75]
pc_temp = pc_temp[pc_temp[:,2] <= 312.50]

In [14]:
# y-axis coordinate limits
print(np.min(pc_temp[:,1]), np.max(pc_temp[:,1]), (np.max(pc_temp[:,1]) - np.min(pc_temp[:,1])) )
# x-axis coordinate limits
print(np.min(pc_temp[:,0]), np.max(pc_temp[:,0]), (np.max(pc_temp[:,0]) - np.min(pc_temp[:,0])) )
# z-axis coordinate limits
print(np.min(pc_temp[:,2]), np.max(pc_temp[:,2]), (np.max(pc_temp[:,2]) - np.min(pc_temp[:,2])) )

67.25000763 75.0 7.749992370000001
11.4000082 13.14999771 1.749989509999999
308.75 312.49996948 3.7499694800000043


In [15]:
# checking the current shape of the reduced pcl dataset
print(pc_temp.shape)

(5149657, 7)


In [18]:
# procedural logic for the upsampling function

# separate out apple class sub array with rgb values
# build a KDTree of the separated out sub array
# loop over upsampling_factor number of points
# then random point selection based on index
# post that, query the nearest neighbor to that random point
# preserve rgb information, separate array for actual upsampling operation

# also, update the whole PC to create new upsampling temporary PC
# after certain number of upsampled new PC size chunk is created
# for better upsampling consistency.

def upsample_minor_class(pcl_arr, upsampling_factor):
    assert upsampling_factor >= 1.0 and upsampling_factor <= 5.0
    pcl_arr_cls = pcl_arr[pcl_arr[:,6] == 1]
    pcl_arr_bg = pcl_arr[pcl_arr[:,6] == 0]
    # open3d pcl processing
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(pcl_arr_cls[:,0:3])
    # building KD-Tree for NN query
    pcd_tree = o3d.geometry.KDTreeFlann(pcd)
    upsample_count = int((upsampling_factor - 1.0)*pcl_arr_cls.shape[0])
    # upsampled points nested list
    upsampled_points = []
    umsample_limit_flg = 0
    for i in tqdm(range(upsample_count)):
        rand_ind = random.randint(0,pcl_arr_cls.shape[0]-1) 
        [k, idx_vals, _] = pcd_tree.search_knn_vector_3d(pcd.points[rand_ind], 4)
        idx_vals = idx_vals[1:]
        rand_neighbor = random.choice(idx_vals)
        # creating new upsampled points
        arr_vals = (pcl_arr_cls[rand_ind] + pcl_arr_cls[rand_neighbor]) / 2
        upsampled_points.append(list(arr_vals))
        if len(upsampled_points) > 0.8 * pcl_arr_cls.shape[0] and umsample_limit_flg == 0:
            pcl_arr_cls =  np.concatenate((pcl_arr_cls, np.array(upsampled_points)), axis=0)
            pcd = o3d.geometry.PointCloud()
            pcd.points = o3d.utility.Vector3dVector(pcl_arr_cls[:,0:3])
            # building KD-Tree for NN query
            pcd_tree = o3d.geometry.KDTreeFlann(pcd)
            umsample_limit_flg = 1
            upsampled_points = []

    pcl_arr_cls =  np.concatenate((pcl_arr_cls, np.array(upsampled_points)), axis=0)
    pcl_arr_final = np.concatenate((pcl_arr_cls, pcl_arr_bg), axis=0)
    return pcl_arr_final

In [19]:
pcl_arr_upsampled = upsample_minor_class(pc_temp, 5)
print(pcl_arr_upsampled.shape)

100%|██████████| 2004188/2004188 [01:28<00:00, 22550.17it/s]


(7153845, 7)


In [23]:
# post upsampling operation point cloud size qualitative analysis for fuji-sfm effective dataset
print(pc_temp[pc_temp[:, 6] == 1].shape, pc_temp[pc_temp[:, 6] == 0].shape)
print(pcl_arr_upsampled[pcl_arr_upsampled[:, 6] == 1].shape, pcl_arr_upsampled[pcl_arr_upsampled[:, 6] == 0].shape)
print(pc_temp[pc_temp[:, 6] == 1].shape[0] / pc_temp[pc_temp[:, 6] == 0].shape[0])
print(pcl_arr_upsampled[pcl_arr_upsampled[:, 6] == 1].shape[0] / pcl_arr_upsampled[pcl_arr_upsampled[:, 6] == 0].shape[0])

(501047, 7) (4648610, 7)
(2505235, 7) (4648610, 7)
0.10778426239241408
0.5389213119620704


In [24]:
# saving the numpy array of the processed point cloud
# TODO: update the below paths based on your project setup
np.save('fuji-sfm-pcl-processed.npy', pcl_arr_upsampled)

In [26]:
# moving the processed point cloud into 
# TODO: update the below paths based on your project setup
!zip -r drive/MyDrive/point-cloud-prototyping/datasets/fuji-sfm-pcl-processed.zip fuji-sfm-pcl-processed.npy

  adding: fuji-sfm-pcl-processed.npy (deflated 60%)
