## ENV SETUP

1. Install uv (or do it your own way)
2. Run `uv sync`
3. Run `source .venv/bin/activate`

You're good to go.

In [2]:
!git clone https://github.com/MecAgent/mecagent-technical-test.git

fatal: destination path 'mecagent-technical-test' already exists and is not an empty directory.


In [3]:
!cd mecagent-technical-test

In [4]:
cd mecagent-technical-test/

/content/mecagent-technical-test


In [5]:
!uv python install

In [6]:
!uv sync

Resolved 76 packages in 0.71ms
Audited 70 packages in 0.02ms


In [7]:
!source .venv/bin/activate

In [8]:
!pip install fsspec==2023.9.2




# Instructions

The task: create the best CadQuery code generator model.

1. Load the dataset (147K pairs of images and CadQuery code).
2. Create a baseline model and evaluate it with the given metrics.
3. Improve the baseline model by any means and evaluate it again.
4. Explain your choices and possible bottlenecks.
5. Show what enhancements you would have made if you had more time.

You can do *WHATEVER* you want. Be creative: the result is not what matters most.
You can create new model architectures, reuse ones you used in the past, fine-tune, etc.

If you are GPU poor, there are solutions. The absolute value is not what matters; the relative improvement between the baseline and the enhanced model is.

In [9]:
from datasets import load_dataset
ds = load_dataset("CADCODER/GenCAD-Code", num_proc=16, split=["train", "test"], cache_dir="/Volumes/BIG-DATA/HUGGINGFACE_CACHE")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  table = cls._concat_blocks(blocks, axis=0)


In [31]:
!pip install cadquery trimesh



## Evaluation Metrics

1. Valid Syntax Rate assesses the validity of the code by executing it and checking whether any errors are raised.
2. Best IOU assesses the similarity between the meshes generated by two code samples.
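
As a rough illustration of what the first metric measures, the sketch below executes each snippet and reports the fraction that run without raising. It is not the repository's `evaluate_syntax_rate_simple`, whose implementation is not shown here. `valid_rate` and the sample snippets are hypothetical names made up for this example, and a real checker would presumably also make `cq` available to the executed code.

```python
def valid_rate(codes):
    """Fraction of snippets that execute without raising (hypothetical helper)."""
    if not codes:
        return 0.0
    ok = 0
    for name, source in codes.items():
        try:
            # Compile and run each snippet in a fresh namespace so that
            # snippets cannot influence each other.
            exec(compile(source, name, "exec"), {})
            ok += 1
        except Exception:
            # Any syntax or runtime error counts as an invalid snippet.
            pass
    return ok / len(codes)


# Made-up snippets: one valid, one with a syntax error, one with a runtime error.
snippets = {
    "good": "x = 1 + 1",
    "syntax_error": "x = (1 +",
    "runtime_error": "x = undefined_name",
}
rate = valid_rate(snippets)
print("Valid rate:", rate)
```

Because `exec` runs arbitrary code, generated code should only be evaluated this way in an isolated environment.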

In [11]:
from metrics.valid_syntax_rate import evaluate_syntax_rate_simple
from metrics.best_iou import get_iou_best

In [12]:
## Example usage of the metrics
sample_code = """
height = 60.0
width = 80.0
thickness = 10.0
diameter = 22.0

# make the base
result = (
    cq.Workplane("XY")
    .box(height, width, thickness)
)
"""

sample_code_2 = """
height = 60.0
width = 80.0
thickness = 10.0
diameter = 22.0
padding = 12.0

# make the base
result = (
    cq.Workplane("XY")
    .box(height, width, thickness)
    .faces(">Z")
    .workplane()
    .hole(diameter)
    .faces(">Z")
    .workplane()
    .rect(height - padding, width - padding, forConstruction=True)
    .vertices()
    .cboreHole(2.4, 4.4, 2.1)
)
"""
codes = {
    "sample_code": sample_code,
    "sample_code_2": sample_code_2,
}
vsr = evaluate_syntax_rate_simple(codes)
print("Valid Syntax Rate:", vsr)
iou = get_iou_best(sample_code, sample_code_2)
print("IOU:", iou)

Valid Syntax Rate: 1.0
IOU: 0.5834943417057687
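
To make the IOU number above more concrete, here is a minimal sketch of intersection over union on voxel sets. It is an assumption-laden toy, not the repository's `get_iou_best` (whose internals, including whatever the "best" search does, are not shown here). `box_voxels` and `iou` are hypothetical helpers, and the shapes are made-up stand-ins for a plate with and without a hole.

```python
from itertools import product


def box_voxels(nx, ny, nz):
    """Set of integer voxel coordinates filling an nx * ny * nz box."""
    return set(product(range(nx), range(ny), range(nz)))


def iou(a, b):
    """Intersection over union of two voxel sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# A solid plate, and the same plate with a 2 x 2 hole cut through it.
solid = box_voxels(6, 8, 1)
hole = {(x, y, 0) for x in (2, 3) for y in (3, 4)}
with_hole = solid - hole

score = iou(solid, with_hole)
print("IOU:", score)
```

Identical shapes score 1.0 and disjoint shapes score 0.0, so removing material from one shape lowers the score in proportion to the missing volume.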


## Have Fun

## CODE DEVELOPMENT

### Data analysis

In [13]:
print(ds)

[Dataset({
    features: ['image', 'deepcad_id', 'cadquery', 'token_count', 'prompt', 'hundred_subset'],
    num_rows: 147289
}), Dataset({
    features: ['image', 'deepcad_id', 'cadquery', 'token_count', 'prompt', 'hundred_subset'],
    num_rows: 7355
})]


In [14]:
train_set = ds[0]
test_set = ds[1]

In [None]:
# Analysis of the dataset
import pprint

def analyse_data(dataset, name=None):
    """Print the schema and a few sample values of a dataset split."""
    print(name, '\n')
    pprint.pprint(dataset.features)

    print("\n", "*" * 50)
    print("\nCheck values in features")
    print("\nFeature hundred_subset:")
    print(set(dataset['hundred_subset']))

    print("\n", "*" * 50)
    print("\nFeature deepcad_id:")
    print(dataset['deepcad_id'][0:10])

    print("\n", "*" * 50)
    print("Image dimensions:")
    for i in range(10):
        img = dataset[i]['image']
        print(f"Image {i}: Width={img.width}, Height={img.height}")

    print("\n", "*" * 50)
    print("\nPrint example cadquery and prompt")
    print('Cadquery: ', dataset[0]['cadquery'])
    print('Prompt: ', dataset[0]['prompt'])

# analyse_data prints its report and returns None, so call it directly
# instead of wrapping it in print().
analyse_data(train_set, "TRAIN_SET")
analyse_data(test_set, "TEST_SET")
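
Since the dataset exposes a `token_count` column and the preprocessing later truncates to a fixed length, one further check is how many samples such a cutoff would cut short. The sketch below is only an illustration: `truncated_fraction` is a hypothetical helper, the counts are made up, and it assumes `token_count` was computed with a tokenizer comparable to the one used for training, which the source does not confirm.

```python
def truncated_fraction(token_counts, max_length):
    """Share of samples whose token count exceeds max_length."""
    counts = list(token_counts)
    if not counts:
        return 0.0
    too_long = sum(1 for c in counts if c > max_length)
    return too_long / len(counts)


# Made-up counts for illustration only. With the real data one would pass
# train_set["token_count"] instead.
example_counts = [120, 180, 256, 300, 512]
frac = truncated_fraction(example_counts, max_length=256)
print(f"{frac:.0%} of samples would be truncated at 256 tokens")
```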

### Extract Features of importance

In [16]:
#Delete unnecessary columns
train_set = train_set.remove_columns(["deepcad_id", "hundred_subset"])
test_set = test_set.remove_columns(["deepcad_id", "hundred_subset"])



In [17]:
print(train_set, test_set)


Dataset({
    features: ['image', 'cadquery', 'token_count', 'prompt'],
    num_rows: 147289
}) Dataset({
    features: ['image', 'cadquery', 'token_count', 'prompt'],
    num_rows: 7355
})


In [18]:
from tqdm import tqdm

In [19]:
import os
import json
from tqdm import tqdm

os.makedirs("train_images", exist_ok=True)

# Write one JSON object per line (JSON Lines), which load_dataset("json", ...) can read.
with open("train.json", "w") as f:
    for idx, sample in tqdm(enumerate(train_set), total=len(train_set)):
        img_path = f"train_images/img_{idx}.png"
        sample["image"].save(img_path)
        entry = {
            "id": idx,
            "image": img_path,
            "prompt": sample["prompt"],
            "cadquery": sample["cadquery"],
        }
        f.write(json.dumps(entry) + "\n")


100%|██████████| 147289/147289 [27:24<00:00, 89.59it/s]


In [20]:
os.makedirs("test_images", exist_ok=True)

# Same JSON Lines layout as the training export.
with open("test.json", "w") as f:
    for idx, sample in tqdm(enumerate(test_set), total=len(test_set)):
        img_path = f"test_images/img_{idx}.png"
        sample["image"].save(img_path)
        entry = {
            "id": idx,
            "image": img_path,
            "prompt": sample["prompt"],
            "cadquery": sample["cadquery"],
        }
        f.write(json.dumps(entry) + "\n")

100%|██████████| 7355/7355 [01:22<00:00, 89.23it/s]


In [21]:
print(len(os.listdir("train_images")), len(os.listdir('test_images')))

147289 7355
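
The export cells above write one JSON object per line, so the files are JSON Lines even though they carry a `.json` extension. As a sanity check beyond counting images, the records can be read back line by line. The sketch below is self-contained and uses made-up records in a temporary file. `read_jsonl` is a hypothetical helper, not part of the repository.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Made-up records that mimic the layout written by the export cells.
records = [
    {"id": 0, "image": "train_images/img_0.png", "prompt": "p0", "cadquery": "c0"},
    {"id": 1, "image": "train_images/img_1.png", "prompt": "p1", "cadquery": "c1"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.json")
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    loaded = list(read_jsonl(path))

print(len(loaded), loaded[0]["image"])
```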


In [None]:
!pip install torch transformers datasets

In [None]:
!pip install --upgrade transformers huggingface_hub


In [None]:
!pip install -U bitsandbytes

In [33]:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121


  Attempting uninstall: nvidia-cusolver-cu12
    Found existing installation: nvidia-cusolver-cu12 11.6.1.9
    Uninstalling nvidia-cusolver-cu12-11.6.1.9:
      Successfully uninstalled nvidia-cusolver-cu12-11.6.1.9
  Attempting uninstall: torchvision
    Found existing installation: torchvision 0.21.0+cu124
    Uninstalling torchvision-0.21.0+cu124:
      Successfully uninstalled torchvision-0.21.0+cu124
  Attempting uninstall: torchaudio
    Found existing installation: torchaudio 2.6.0+cu124
    Uninstalling torchaudio-2.6.0+cu124:
      Successfully uninstalled torchaudio-2.6.0+cu124
Successfully installed nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nvtx-cu12-12.1.105 torch-2.5.1+cu121 torchaudio-2.5.1+cu121 torchvision-0.20.1+cu121 triton-3.1.0


In [13]:
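from transformers import AutoModel, AutoTokenizer, AutoProcessor

# Load model directly
model_name = "openbmb/MiniCPM-V"
# The repository ships custom code, so pass trust_remote_code=True to every
# loader; otherwise each call stops at an interactive confirmation prompt.
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token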

MiniCPMForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly defined. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

The repository openbmb/MiniCPM-V contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/openbmb/MiniCPM-V .
 You can inspect the repository content at https://hf.co/openbmb/MiniCPM-V.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y

In [34]:
# Reuse the EOS token for padding. It is already in the vocabulary, so no new
# token is added and the embedding matrix does not need resizing.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

In [35]:

from datasets import load_dataset

dataset = load_dataset("json", data_files={"train": "/content/mecagent-technical-test/train.json", "test": "/content/mecagent-technical-test/test.json"})


In [39]:
from PIL import Image
import os


def preprocess(example):
    # Load image (adjust the path as needed)
    image = Image.open(os.path.join("/content/mecagent-technical-test/", example["image"])).convert("RGB")
    prompt = example["prompt"]
    target = example["cadquery"]

    # Tokenize input: both text and image
    model_inputs = processor(
        text=prompt,
        images=image,
        padding="max_length",
        truncation=True,
        max_length=256,
        return_tensors=None,
    )
    # Tokenize target as labels (text only)
    labels = processor.tokenizer(
        target,
        padding="max_length",
        truncation=True,
        max_length=256,
        return_tensors=None,
    )["input_ids"]
    model_inputs["labels"] = labels
    return model_inputs


# Apply preprocessing
tokenized_train = dataset["train"].map(preprocess)
tokenized_test = dataset["test"].map(preprocess)


Map:   0%|          | 0/147289 [00:00<?, ? examples/s]

ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.
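
The error above is raised while padding. The pad token was set on `tokenizer`, whereas `preprocess` pads through the separately loaded `processor`, which likely still has none. Once padding works, a second issue remains: labels padded to `max_length` contain pad ids, and those positions would count toward the loss unless they are masked. The sketch below shows the usual convention of replacing padded label positions with -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. It is plain Python on made-up token ids, and `pad_with_mask` is a hypothetical helper rather than a `transformers` API.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def pad_with_mask(token_ids, max_length, pad_id):
    """Truncate or right-pad token ids and build a matching mask and labels."""
    ids = list(token_ids)[:max_length]
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,
        # 1 marks real tokens, 0 marks padding.
        "attention_mask": [1] * len(ids) + [0] * n_pad,
        # Padded positions are excluded from the loss.
        "labels": ids + [IGNORE_INDEX] * n_pad,
    }


# Made-up token ids for illustration.
example = pad_with_mask([5, 6, 7], max_length=5, pad_id=0)
print(example)
```

Masking by position rather than by comparing against the pad id matters when the pad token is the EOS token, since a real EOS at the end of the target should still be learned.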

In [None]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer)


In [None]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,  # adjust as needed
    per_device_eval_batch_size=2,
    num_train_epochs=3,
    fp16=True,  # set to True if using GPU with fp16 support
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_dir="./logs",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    tokenizer=processor,
)
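
For a sense of scale, the settings above can be turned into an approximate number of optimizer steps. The sketch assumes a single device and no gradient accumulation (neither is stated in the source), and `optimizer_steps` is a hypothetical helper, not a `Trainer` method.

```python
import math


def optimizer_steps(num_samples, per_device_batch_size, num_epochs,
                    num_devices=1, grad_accum_steps=1):
    """Approximate optimizer steps for a full training run."""
    samples_per_step = per_device_batch_size * num_devices * grad_accum_steps
    # The last, partially filled batch of each epoch still counts as a step.
    steps_per_epoch = math.ceil(num_samples / samples_per_step)
    return steps_per_epoch * num_epochs


# 147289 training rows, batch size 2, 3 epochs, as configured above.
steps = optimizer_steps(147289, per_device_batch_size=2, num_epochs=3)
print("Approximate optimizer steps:", steps)
```

With these values the run comes to roughly 220,935 steps, which is worth weighing against the available GPU time before launching a full run. Training on a subset is one way to keep a baseline comparison affordable.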


In [None]:
trainer.train()

In [None]:
results = trainer.evaluate()
print(results)