# im23D_pipeline Pipeline Notebooks

These notebooks walk a user through the base functionality of the pipeline.

The overall goal of this project is to take data from (almost) any dataset. The datasets to be verified are Pix3D and ShapeNet v2 (available via login request at the [ShapeNet Website](https://shapenet.org/)).

Overall, this repo builds on a few packages: [pydantic](https://pydantic-docs.helpmanual.io/), [pydantic_cli](https://github.com/mpkocher/pydantic-cli) (for the command-line interface), and [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) (for remote file systems and a relatively uniform API for working with files).


## Where did the code come from?

This repo is a composition of multiple different sources including:

In [1]:
# Download an example archive into the ../data folder and extract it there

!wget --no-check-certificate -O "../data/ShapeNetCore.v2_example.zip" "https://onedrive.live.com/download?cid=B654A7CBA0D23C19&resid=B654A7CBA0D23C19%21745801&authkey=ABOpWCcHdArmSxQ"

!unzip "../data/ShapeNetCore.v2_example.zip" -d "../data"

--2022-12-04 18:50:38--  https://onedrive.live.com/download?cid=B654A7CBA0D23C19&resid=B654A7CBA0D23C19%21745801&authkey=ABOpWCcHdArmSxQ
Resolving onedrive.live.com (onedrive.live.com)... 13.107.42.13
Connecting to onedrive.live.com (onedrive.live.com)|13.107.42.13|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://ignpwq.bn.files.1drv.com/y4m9pwsDUxzqsftbLEgG5ZPcRgFwfK0IAS6wdu70TkneY-_nHSyUcdeZNFTVxw3ezNTs5qaqZcQ4kgizlkKfawieq8cokNhfUuPeZr5Qzg2PohBcVKpDEF3jWo9pqUQmKz_SwT72Cb3sRmgoMc2YPOpFXiDqcRSZkuPuVcSqWluBFxp2IObYRuMfCZIYVVf8j1FqorOtZAlANnEmVU-sIh2og/ShapeNetCore.v2_example-skaro-ubuntu.zip?download&psid=1 [following]
--2022-12-04 18:50:42--  https://ignpwq.bn.files.1drv.com/y4m9pwsDUxzqsftbLEgG5ZPcRgFwfK0IAS6wdu70TkneY-_nHSyUcdeZNFTVxw3ezNTs5qaqZcQ4kgizlkKfawieq8cokNhfUuPeZr5Qzg2PohBcVKpDEF3jWo9pqUQmKz_SwT72Cb3sRmgoMc2YPOpFXiDqcRSZkuPuVcSqWluBFxp2IObYRuMfCZIYVVf8j1FqorOtZAlANnEmVU-sIh2og/ShapeNetCore.v2_example-skaro-ubuntu.zip?download&psid=1
Re

In [1]:
# Auto reload magic,
# ONLY run if you're developing
# and changing a bunch of stuff
%load_ext autoreload
%autoreload 2

In [2]:
from pathlib import Path
from im23D_pipeline.datasets import ShapeNetCoreDataset
from im23D_pipeline.pydantic_models import ShapeNetModel

In [3]:
dataset_path = Path("../data/ShapeNetCore.v2/")

In [4]:
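# Configuration handed to the pydantic model for validation
test_config = {"dataset_folder": dataset_path, "verbose": True}

shape_net_validated_inputs = ShapeNetModel(**test_config)

# Export the validated fields once and reuse the resulting dict
validated = shape_net_validated_inputs.dict()
test = validated["dataset_folder"]

# Number of .obj files discovered under the dataset folder
len(validated["dataset_list"])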

finding all files in: ../data/ShapeNetCore.v2/**/*.obj
found 52472 number of obj files in the dataset


52472
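The log lines above suggest that validation searches recursively for `**/*.obj` under the dataset folder. As a rough, stdlib-only sketch of that idea (the `find_obj_files` helper and the tiny directory tree are hypothetical and are not the actual `ShapeNetModel` implementation):

```python
import tempfile
from pathlib import Path
from typing import List


def find_obj_files(dataset_folder: Path, verbose: bool = False) -> List[str]:
    """Hypothetical helper: recursively collect .obj files under a folder."""
    dataset_folder = Path(dataset_folder)
    if not dataset_folder.is_dir():
        raise ValueError(f"{dataset_folder} is not a directory")
    if verbose:
        print(f"finding all files in: {dataset_folder}/**/*.obj")
    # rglob("*.obj") matches the same files as glob("**/*.obj"); sort for a stable order
    files = sorted(p.as_posix() for p in dataset_folder.rglob("*.obj"))
    if verbose:
        print(f"found {len(files)} obj files in the dataset")
    return files


# Throwaway directory tree that mimics the ShapeNet layout (assumed names)
with tempfile.TemporaryDirectory() as tmp:
    mesh = Path(tmp) / "02691156" / "abc123" / "models" / "model_normalized.obj"
    mesh.parent.mkdir(parents=True)
    mesh.touch()
    demo_files = find_obj_files(Path(tmp), verbose=True)
```

Raising early on a missing folder keeps a bad `dataset_folder` from silently producing an empty file list.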

In [5]:
test.as_posix()

'../data/ShapeNetCore.v2'

In [6]:
shape_net_validated_inputs.dataset_list[100]

'/home/bartelsaa/dev/misc/gatech/img_to_3d_pipeline/notebooks/../data/ShapeNetCore.v2/02691156/157a81baeb10914566cf1b4a8fc3914e/models/model_normalized.obj'
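The path above appears to follow a `<synsetId>/<modelId>/models/<file>.obj` layout, where `02691156` is the synset ID. A small sketch of pulling those two identifiers out of such a path, assuming that layout holds for every mesh (`split_shapenet_path` is a hypothetical helper, not part of `im23D_pipeline`):

```python
from pathlib import PurePosixPath
from typing import Tuple


def split_shapenet_path(obj_path: str) -> Tuple[str, str]:
    """Hypothetical helper: return (synset_id, model_id) for a mesh path.

    Assumes the path ends in <synsetId>/<modelId>/models/<file>.obj.
    """
    parts = PurePosixPath(obj_path).parts
    if len(parts) < 4:
        raise ValueError(f"unexpected mesh path: {obj_path!r}")
    # Count from the end so the location of the dataset root does not matter
    return parts[-4], parts[-3]


example_path = (
    "/home/bartelsaa/dev/misc/gatech/img_to_3d_pipeline/notebooks/../data/"
    "ShapeNetCore.v2/02691156/157a81baeb10914566cf1b4a8fc3914e/models/model_normalized.obj"
)
synset_id, model_id = split_shapenet_path(example_path)
```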

## Now let's instantiate the dataset with it and start generating data useful to the models!

In [7]:
shape_net_data_set = ShapeNetCoreDataset(shape_net_validated_inputs)

shape_net_data_set.data_catalog_path

-----------------meta data file-----------------

 <bound method NDFrame.head of      synsetId                                               name  \
0    02691156                           airplane,aeroplane,plane   
1    02690373                                           airliner   
2    03809312  narrowbody aircraft,narrow-body aircraft,narro...   
3    04583620  widebody aircraft,wide-body aircraft,wide-body...   
4    02842573                                            biplane   
..        ...                                                ...   
349  04363082                                       surface ship   
350  04567746                                       weather ship   
351  04610013                                 yacht,racing yacht   
352  04554684            washer,automatic washer,washing machine   
353  04591713                                        wine bottle   

                                              children  numInstances  \
0    [02690373, 02842573, 0286

ValueError: Metadata inference failed in `apply`.

You have supplied a custom function and Dask is unable to 
determine the type of output that that function returns. 

To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.

Original error is below:
------------------------
ValueError('not enough values to unpack (expected 4, got 1)')

Traceback:
---------
  File "/home/bartelsaa/anaconda3/envs/im23D_pipeline/lib/python3.10/site-packages/dask/dataframe/utils.py", line 195, in raise_on_meta_error
    yield
  File "/home/bartelsaa/anaconda3/envs/im23D_pipeline/lib/python3.10/site-packages/dask/dataframe/core.py", line 6560, in _emulate
    return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
  File "/home/bartelsaa/anaconda3/envs/im23D_pipeline/lib/python3.10/site-packages/dask/utils.py", line 1103, in __call__
    return getattr(__obj, self.method)(*args, **kwargs)
  File "/home/bartelsaa/anaconda3/envs/im23D_pipeline/lib/python3.10/site-packages/pandas/core/frame.py", line 9565, in apply
    return op.apply().__finalize__(self, method="apply")
  File "/home/bartelsaa/anaconda3/envs/im23D_pipeline/lib/python3.10/site-packages/pandas/core/apply.py", line 746, in apply
    return self.apply_standard()
  File "/home/bartelsaa/anaconda3/envs/im23D_pipeline/lib/python3.10/site-packages/pandas/core/apply.py", line 873, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/home/bartelsaa/anaconda3/envs/im23D_pipeline/lib/python3.10/site-packages/pandas/core/apply.py", line 889, in apply_series_generator
    results[i] = self.f(v)
  File "/home/bartelsaa/dev/misc/gatech/img_to_3d_pipeline/src/im23D_pipeline/datasets/shapenet_dataset.py", line 85, in _assign_metadata_and_labels_for_meshes
    _, sysNetID, modelId, _ = str(mesh_parts).split(pathlib.os.sep)
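The traceback ends in a four-way tuple unpacking of `str(mesh_parts).split(pathlib.os.sep)`. When no `meta=` is given, Dask infers the output type by calling the function on small placeholder data, so the string being split is likely a dummy value without any path separator, which would explain "expected 4, got 1". Besides passing `meta=` as the error message suggests, the split itself can be made tolerant of unexpected input. A minimal sketch, assuming a four-component value; `parse_mesh_parts` and both sample strings are hypothetical, not the code in `shapenet_dataset.py`:

```python
import os
from typing import Optional, Tuple


def parse_mesh_parts(mesh_parts: object) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical defensive split: return (synset_id, model_id) or (None, None)."""
    pieces = str(mesh_parts).split(os.sep)
    if len(pieces) != 4:
        # Placeholder or malformed input: return empty labels instead of raising
        return None, None
    _, synset_id, model_id, _ = pieces
    return synset_id, model_id


# Assumed shape of a real value: four components joined by the OS separator
real_value = os.sep.join(["", "02691156", "157a81baeb10914566cf1b4a8fc3914e", "models"])
# Stand-in for the kind of placeholder string seen during metadata inference
placeholder = "foo"

parsed_real = parse_mesh_parts(real_value)
parsed_placeholder = parse_mesh_parts(placeholder)
```

Returning `(None, None)` keeps the output shape the same for every input, which also makes it straightforward to describe in an explicit `meta=`.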


In [17]:
import dask.dataframe as dd
import pandas as pd

# Catalog parts live next to the dataset; this resolves to the same folder as the absolute path
catalog_dir = dataset_path / "datacatalog_parts"

# Read a single part with pandas and report its row count
test = pd.read_csv(catalog_dir / "datacatalog-0.csv", index_col=0)
print(len(test))

# Read all parts lazily as one Dask dataframe
data_catalog = dd.read_csv(str(catalog_dir / "datacatalog-*.csv"))

5248


In [18]:
len(data_catalog.index)

52472
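The two counts above show that one catalog part holds 5248 rows while the full catalog holds 52472, the same as the number of `.obj` files found earlier. A quick stdlib-only way to see how rows are spread across the parts could look like the sketch below; `count_rows_per_part` and the generated demo files are hypothetical, and the sketch assumes each part has a single header row:

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict


def count_rows_per_part(catalog_dir: Path, pattern: str = "datacatalog-*.csv") -> Dict[str, int]:
    """Hypothetical helper: count data rows (header excluded) in each CSV part."""
    counts: Dict[str, int] = {}
    for part in sorted(Path(catalog_dir).glob(pattern)):
        with part.open(newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row
            counts[part.name] = sum(1 for _ in reader)
    return counts


# Build two tiny demo parts in a temporary folder and count their rows
with tempfile.TemporaryDirectory() as tmp:
    for index, n_rows in enumerate([3, 2]):
        with (Path(tmp) / f"datacatalog-{index}.csv").open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["", "path"])
            writer.writerows([[i, f"mesh_{i}.obj"] for i in range(n_rows)])
    part_counts = count_rows_per_part(Path(tmp))
    total_rows = sum(part_counts.values())
```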