Everything is explained in the good_luck.ipynb file.

CadQuery Code Generator

This project implements a simple, fully custom image-to-code architecture to generate CadQuery scripts from mechanical CAD images using a CNN + LSTM model. It was developed for a technical challenge, with the objective of being simple, efficient, and explainable.

Objective

Generate CadQuery Python code that recreates the geometry shown in a CAD image.
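For context, here is a hedged example of the kind of script the model should emit; this specific plate-with-hole shape is illustrative and not taken from the dataset:

import cadquery as cq

# Illustrative target script: a simple plate with a centered through-hole.
result = (
    cq.Workplane("XY")   # sketch on the XY plane
    .box(40, 30, 5)      # base plate, 40 x 30 x 5
    .faces(">Z")         # select the top face
    .workplane()         # new workplane on that face
    .hole(8)             # 8 mm through-hole
)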


Model Architecture

The solution follows an encoder-decoder design (a minimal sketch follows the diagram below):

  • Encoder (CnnEncoder):

    • A lightweight convolutional neural network that reduces a 128x128 image to a 512-dimensional feature vector.
    • Acts as a visual extractor for the geometry.
  • Decoder (CodeLSTMDecoder):

    • A 2-layer LSTM conditioned on the image features.
    • Trained to generate valid Python code token-by-token using GPT2 tokenization.
        Image                     Tokens
       (RGB)                     (Code)
         |                          |
    [ CnnEncoder ]             [ LSTM Decoder ]
         |                          |
      Features   -->     "import cadquery..."
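A minimal PyTorch sketch of the two components is given below. The conditioning scheme (here the image feature initializes the LSTM hidden state), channel counts, and hidden sizes are assumptions; the actual implementation is in good_luck.ipynb.

import torch
import torch.nn as nn

class CnnEncoder(nn.Module):
    """Lightweight CNN: (B, 3, 128, 128) image -> (B, 512) feature vector."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),     # -> 64x64
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),    # -> 32x32
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),   # -> 16x16
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),  # -> 8x8
            nn.AdaptiveAvgPool2d(1),                                  # -> 1x1
        )
        self.fc = nn.Linear(256, feat_dim)

    def forward(self, images):
        return self.fc(self.conv(images).flatten(1))

class CodeLSTMDecoder(nn.Module):
    """2-layer LSTM generating code tokens, conditioned on the image feature."""
    def __init__(self, vocab_size: int, feat_dim: int = 512, hidden: int = 512):
        super().__init__()
        self.hidden = hidden
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.init_h = nn.Linear(feat_dim, 2 * hidden)  # feature -> initial hidden state
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, features, token_ids):
        # Map the image feature to the initial hidden state of both LSTM layers.
        h0 = torch.tanh(self.init_h(features)).view(-1, 2, self.hidden)
        h0 = h0.permute(1, 0, 2).contiguous()           # (num_layers, B, hidden)
        c0 = torch.zeros_like(h0)
        x, _ = self.lstm(self.embed(token_ids), (h0, c0))
        return self.out(x)                               # (B, T, vocab_size) logits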

Tokenizer: the GPT2 tokenizer is used for code, as it handles indentation and Python-specific tokens well. pad_token is explicitly set to eos_token so that padded batches can be built.
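A sketch of the tokenizer setup; the max_length of 256 is an assumed value:

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships with no pad token

code = 'import cadquery as cq\nresult = cq.Workplane("XY").box(1, 2, 3)'
batch = tokenizer(code, padding="max_length", max_length=256,
                  truncation=True, return_tensors="pt")
print(batch.input_ids.shape)  # torch.Size([1, 256])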


Training Strategy

  • Training was done from scratch on the full CADCODER/GenCAD-Code dataset (~147k samples).
  • The model was trained for only 10 epochs on an AWS EC2 instance equipped with an NVIDIA L4 GPU.

Device: NVIDIA L4 (16 GB VRAM)
CUDA version: 12.8
Training time: ~7 h
Memory usage: ~16.7 GB

Loss curve

Training loss over 10 epochs (figure)

Initial loss: 5.77. Final loss after 10 epochs: ~0.15. The loss decreased consistently, even with a simple model.

Link to Pretrained Model

You can download the pretrained model weights here:
Download from Google Drive


Evaluation

We evaluate two metrics:

  1. Valid Syntax Rate (VSR): using evaluate_syntax_rate_simple.

    • Checks whether the generated code runs without error (see the sketch after this list).
  2. IOU (Shape Similarity): using get_iou_best.

    • Compares the generated shape with the ground truth.
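evaluate_syntax_rate_simple and get_iou_best are the challenge's evaluation helpers. The sketch below only illustrates the idea behind the syntax-rate metric; it is not their actual implementation:

def valid_syntax_rate(codes):
    """Fraction of generated scripts that execute without raising an exception."""
    ok = 0
    for code in codes:
        try:
            exec(code, {})  # run in a throwaway namespace (requires cadquery installed)
            ok += 1
        except Exception:
            pass
    return ok / max(len(codes), 1)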

Due to limited compute resources, I was only able to evaluate a subset of the test set. On a representative example (sampleIdx = 840), the results were:

  • Valid Syntax Rate: 1.00
  • IOU (shape similarity): 0.64

While syntax validity is good, the overall IOU remains relatively low. This indicates that the model often generates code that compiles, but the resulting shape is still far from accurate. These results are not yet satisfying, and further improvements are needed, especially in structural consistency and geometry alignment. Additionally, the model frequently struggles to correctly generate the initial import cadquery as cq statement. To ensure the generated code is syntactically valid and executable, this import line is automatically corrected in the evaluation script when missing or malformed.
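A sketch of what such a correction can look like (the actual logic in eval.py may differ):

def ensure_cadquery_import(code: str) -> str:
    """Prepend the CadQuery import when the generated code lacks a usable one."""
    lines = code.splitlines()
    if not any(line.strip() == "import cadquery as cq" for line in lines):
        # Drop a malformed cadquery import the model may have emitted first.
        if lines and lines[0].lstrip().startswith("import") and "cadquery" in lines[0]:
            lines = lines[1:]
        lines.insert(0, "import cadquery as cq")
    return "\n".join(lines)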


How to Run

1. Train the model

python train.py

This will:

  • Load the dataset
  • Initialize model from scratch
  • Train for up to 30 epochs (the reported results used 10)
  • Save weights in weights/ folder

No virtualenv configuration provided, but using one is recommended for reproducibility.
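For illustration, the core of the training loop is teacher-forced next-token prediction. The sketch below reuses the CnnEncoder, CodeLSTMDecoder, and tokenizer from the earlier sketches; the optimizer and learning rate are assumptions, not a copy of train.py:

import torch
import torch.nn as nn

encoder = CnnEncoder()
decoder = CodeLSTMDecoder(vocab_size=len(tokenizer))
criterion = nn.CrossEntropyLoss(ignore_index=tokenizer.pad_token_id)  # mask padding (pad == eos here)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

def train_step(images, token_ids):
    """One teacher-forced step: predict token t+1 from tokens <= t and the image."""
    features = encoder(images)                      # (B, 512)
    logits = decoder(features, token_ids[:, :-1])   # (B, T-1, vocab)
    loss = criterion(logits.reshape(-1, logits.size(-1)),
                     token_ids[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()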

2. Run evaluation

python eval.py

This loads one sample, runs inference, and prints:

  • Ground truth code
  • Generated code
  • Syntax validity rate
  • IOU (3D shape comparison)
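Inference can be sketched as greedy decoding, again building on the earlier sketches; eval.py may use a different decoding strategy:

import torch

@torch.no_grad()
def generate(image, max_len=256):
    """Greedy decoding: encode the image, then emit one token at a time."""
    feature = encoder(image.unsqueeze(0))               # (1, 512)
    ids = torch.tensor([[tokenizer.bos_token_id]])      # GPT-2 reuses eos as bos
    for _ in range(max_len):
        logits = decoder(feature, ids)                  # (1, T, vocab)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)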

Design Choices

Component              Reason
GPT2 Tokenizer         Handles Python-like syntax well; avoids whitespace errors
CNN Encoder            Simple and lightweight; avoids heavy ViTs for faster prototyping
LSTM Decoder           Easy to train on limited GPU; interpretable code output
No pretrained weights  Full training from scratch to control all layers and demonstrate skill

Possible Improvements

  • Use a pretrained image encoder (e.g., ResNet18 or ViT) to improve feature extraction
  • Replace LSTM with Transformer decoder (e.g., GPT2)
  • Improve the training workflow with callbacks, logging, etc.
  • Add support for exporting the model and tokenizer to ONNX for deployment and inference optimization
  • Test different image sizes, max token lengths, and tokenization strategies

Conclusion

This project shows a functional baseline for CAD-to-code generation, using simple components and no external training tricks. The model is small, explainable, and performs reasonably well with only 10 epochs.

With more time and compute, this approach could be scaled to achieve substantially better performance.


Author: R.Choukri
Date: June 2025
